Tag: Policy

The National Data Library should help people deliver trustworthy data services

In its 2024 manifesto the UK Labour Party promised to build a National Data Library that would:

“bring together existing research programmes and help deliver data-driven public services, whilst maintaining strong safeguards and ensuring all of the public benefit.”

Great, there’s a lot to do on data in the UK. But, unfortunately, the manifesto commitment has a pretty broad scope and the term ‘library’ can be confusing. It is not surprising that people are finding it hard to agree what to do next.

Perhaps the National Data Library should look like the warehouse from Indiana Jones

Here’s a suggestion for what the “National Data Library” should be and how it could get started.

The National Data Library should help public sector teams deliver trustworthy data services by providing guidance, reusable platforms / technical components, and growing communities of practice.

It should start with practical work to build and improve the government’s existing data services. This will help it learn what is needed to deliver trustworthy data services across the public sector.

The visions and priorities for the library are broad, and that is making it hard to define

At a political level the description in the manifesto might seem to make sense, but when you poke at it the sentence starts to fall apart.

The concepts of bringing together “existing research programmes” and helping “deliver data-driven public services” are different.

Public services tend to need access to reference data, information about rules, and to collect and process data from and about individuals and businesses that use the service. They are shaped by democratic debate and a range of legislation including data protection and administrative law. There are many existing public service-focussed data initiatives inside the government.

Research programmes tend to need access to large datasets from one or more organisations. They have oversight, such as research ethics committees, and researchers are likely to be accredited. A research programme is a lot more likely to seek informed consent, for both participation in the research and data use, than a public service. There are many existing research programmes running in universities, philanthropic and public sector organisations.

The original proposal for a UK National Data Library, from centre-right think tank Onward, said that it should be “a centralised, secure platform to collate high-quality data for scientists and [AI] start-ups”. While data for scientists might mean the same thing as data for research programmes, data for AI start-ups certainly does not. So, that is another set of needs to understand, prioritise and design for.

Finally, the Labour government places an emphasis on mission-led government with a set of initiatives that cut across government departments, other parts of the public sector and wider society.

These missions will rely on data to understand problems, make change, and report progress to the public (perhaps as official statistics) and senior politicians. To deliver on their missions, teams are also likely to need to use data to create impact in ways that differ from people’s classic conception of a public service. So that is another set of things to think about.

An old project that looked at ways to use data to create impact

And the library will prioritise between all of these different areas and their possible use cases, while maintaining strong safeguards and ensuring all of the public benefit?

It does not surprise me that no one has yet managed to work out what the national data library should be

And the idea of a data library has been leading people in confusing directions

Meanwhile there’s another problem. The term ‘library’ seems to be confusing things.

Like most people who do policy – and other things – I can take a strange joy from exploring definitions and meanings, but the term ‘library’ seems to be proving unhelpful. People seem to be thinking of it as a single object or thing that contains all the data.

Box-and-wire diagrams are in fashion this year, aren’t they?

A central data portal with a catalogue for all of the data seems to be a popular idea.

Yet a lesson we have learnt is that a single, big portal will not meet people’s varying needs when publishing data, searching for it, or making use of it. A broad scope like the one set out in the Labour manifesto needs lots of catalogues, portals and other things to ensure that data gets to people who need it and are allowed to use it.

A less popular idea, but an idea that is still visible in policy circles, is a single technology platform – such as the one described by Onward. This would be a platform where all of the data is accessible with common governance, technology and standards.

Unfortunately a single platform for data would be a great target for hackers, stifle innovation, and change democratic accountability in ways that are hard to predict. 

Government is not one organisation. It is thousands of organisations with varying goals that fulfil their democratic lines of accountability in different ways. This means that different governance, technology and standards will often be appropriate. The world of using health data for medical research is pretty different to the world of using data to improve local authority services for planning applications.

And, just like a central portal, a central platform would not be able to meet the wide range of needs of data users.

Given the focus on the word ‘library’ I’m even expecting someone to be daft enough to suggest library cards, perhaps with fines for people who don’t ‘return’ a particular piece of data on schedule…

The UK needs an approach that works for many different contexts and that builds on the work that is already being done by teams across the country.

There is not much discussion of the need to improve the government’s existing data services

Lots of data services already exist across the UK that might fall into the National Data Library’s broad scope.

Research programmes like Genomics England, Biobank, OpenSAFELY, the Justice Data Lab, Research Data Scotland and the ONS Integrated Data Service. Public service initiatives like CDDO’s data marketplace, the MHCLG Planning Data platform, the Information Gateway for digital verification services, or Democracy Club. Services providing data for innovation by startups, like DfE’s Content Store and Geovation.

Each of these services has stakeholders with different behaviours, needs, motivations and skills. They also have delivery teams with different capabilities, strengths, and weaknesses. In the different services data might use different standards, because they meet different needs. There are multiple legal and governance frameworks. Data protection is not the only law at play here.

Some of these existing services are great, some are heading in the right direction, some don’t seem to understand their stakeholders, and some simply don’t exist even when they should.

Rather than building a single thing, the National Data Library should make it easier for teams across the public sector to deliver data services like these.

The National Data Library needs to help people deliver trustworthy data services

Despite the variations between these services, there will be some shared problems, and lessons that have been learnt about how they can be tackled. This is an area where the National Data Library could usefully focus.

Here are some ideas for the support capabilities that could be provided:

  • guidebooks and manuals for how to design trustworthy data services and their associated governance and oversight mechanisms. Organisations can then use these core practices or adapt and build on them in their own context.
  • a shared library of user research to make it easier for data service teams to understand the range of people impacted by their work such as researchers, data analysts, policymakers, and members of the public.
  • a design system and patterns; the data design patterns that currently exist tend to have been developed outside the public sector.
  • reusable, tested and well-maintained components, for example platforms – or simpler technical libraries – for publishing data or information about research projects, tools that make it easy to create data portals, verify data against standards, or understand data bias and its potential discriminatory effects.
  • a data linking service, guidance, public engagement, transparency and approval mechanisms for linking data across government and/or non-government research infrastructures. Perhaps London shows a way?
  • data transparency and control services, that empower individuals, communities and regulators to understand and control how data is used at times and places that are relevant to them.
  • a National Data Academy that provides training courses and coaching in data skills.
  • funding, to support experiments, pilots and discoveries in under-resourced public sector organisations.
  • communities of practice to maintain and improve all of the above. Some of these communities already exist, both formally and informally, but many need more support. Communities of practice could usefully exist in teams building data services and in the groups of people – like researchers and start-ups – that use them.
Design patterns are a way that digital teams communicate repeatable solutions to common problems. These images are from IF’s design pattern catalogue.
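One of the component ideas above – verifying data against standards – is concrete enough to sketch. Here is a minimal illustration in Python; the schema format, field names and `ADDRESS_SCHEMA` example are all invented for this sketch, loosely inspired by table-schema style approaches:

```python
# Minimal sketch of a "verify data against a standard" component.
# The schema format here is hypothetical; a real component would
# build on an established standard such as a published table schema.

ADDRESS_SCHEMA = {
    "fields": [
        {"name": "uprn", "type": "integer", "required": True},
        {"name": "address", "type": "string", "required": True},
        {"name": "postcode", "type": "string", "required": False},
    ]
}

def validate_row(row: dict, schema: dict) -> list[str]:
    """Return a list of human-readable problems with a single row."""
    problems = []
    for field in schema["fields"]:
        name, ftype = field["name"], field["type"]
        value = row.get(name)
        if value is None or value == "":
            if field["required"]:
                problems.append(f"missing required field: {name}")
            continue
        if ftype == "integer":
            try:
                int(value)
            except (TypeError, ValueError):
                problems.append(f"{name} should be an integer, got {value!r}")
    return problems

row = {"uprn": "not-a-number", "address": "10 Downing Street"}
print(validate_row(row, ADDRESS_SCHEMA))
```

A real component would handle far more cases, but even checks this simple catch many of the quality problems that stop data being reused.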

To stress: these are just ideas.

Other people might have better ideas. A team that helped people deliver trustworthy data services would have to learn what was needed by:

  • doing the practical work to build/improve some data services themselves,
  • working with existing teams to understand their challenges,
  • and listening to various viewpoints from outside the public sector.

This was roughly how GDS got going in improving digital services.

A data library team would also need to be careful of the overlap with existing things that help with the goal of delivering trustworthy data services. For example, the ICO’s regulatory guidance on data protection, HRA’s advice on health research, or even the OSR’s regulatory guidance and the ONS guidebooks on statistics.

But these are not insurmountable challenges and, from my own work across government, I’m confident that there are many needs for support that are not being met and many opportunities to tackle the problems together.

The existing data service teams across government can then use these new support capabilities to help them build trustworthy data services better, cheaper and faster.

Getting things started

It is important that a national data library team that helps people deliver trustworthy services begins with some practical work to help public sector organisations improve or build data services. This would deliver some early impact, create momentum, and generate learning and capabilities in that core team.

In selecting these data services it will need to look for some variety. This will help illuminate the problem / opportunity space.

The final decisions for where to start should be based on government priorities but – from my own knowledge about common problems / opportunities – perhaps it could include:

  • a data service that openly provides access to authoritative, non-personal data held by the public sector that could be widely used by public services and startups, for example address data
  • a data service that provides authorised organisations and researchers with secure access to attributes about individuals, for example their age or eligibility for benefits
  • a platform that helps research programmes publish accessible information about research projects throughout their lifecycle so that individuals and communities can understand how they are impacted by research activities
  • a platform that helps research programmes publish accessible information about the data they hold and types of research they support so that researchers can find what they need more easily
  • a service that makes it easier for multiple local authorities to publish data locally and then aggregates it for use nationally, for example information about elections or places where clean energy infrastructure could be built

By doing this work the national data library team can start to develop the much-needed guides, components and communities of practice that can deliver more trustworthy data services across the public sector.

The National Data Library should help people deliver trustworthy data services

The commitment behind the National Data Library provides an opportunity to improve how the UK public sector, researchers and startups use public sector data. But this opportunity will not be realised if the UK ends up in endless abstract policy debates, workshops and roundtables or – even worse – building big new central portals and technical platforms that everyone is told to use.

Building capabilities that support existing teams to deliver more trustworthy data services across the public sector will be far more impactful and start to deliver the change that is needed.

Want to read more stuff?

If you’re interested in the National Data Library then here are some links I found useful when forming my views:

And thanks to Ellie, Steve, Andy and others who I bounced around these ideas with as I was writing them up. All mistakes and idiocies are always my own.

Three thoughts from last week’s address data debate

The UK has an official list of building addresses and their locations – ‘address data’. This data is a vital resource for building public and private services that rely on locations, and is part of our national data infrastructure. At the moment, the UK’s address data is expensive, hard to access, not always accurate, and hard to correct. This causes problems for businesses and other organisations that rely on address data – and ultimately it affects us all.

A bit of legislation that would require the government to publish a list of addresses for the UK for free was debated in the House of Lords last week. Owen Boswarva has extracted the relevant text bits from the full Hansard record. James O’Malley has videos.

The debate had contributions from Labour, Liberal Democrat, Green and Conservative backbenchers. The Minister for the Conservative government then rejected the amendment.

Reading and watching back the debate made me think about three things:

  • The government agreed to share deeper analysis, which is good news
  • But it misunderstands why previous attempts to recreate the UK’s address file failed, and that is bad news – and not just for addresses
  • The risks of openly publishing address data, or of not publishing it, are misunderstood

The government agreed to share deeper analysis

The Minister said that they were “very happy to share deeper analysis” of address data. This is good news, both because better evidence can create a better debate but also as it indicates that the government actually has some analysis.

The Geospatial Commission said they had no analysis

In 2022 the Geospatial Commission responded to a Freedom of Information (FOI) request by saying that it did not assess address data when preparing its strategy. Similarly in 2023, when the Geospatial Commission was agreeing a £31m contract with the Royal Mail, they said that they did not perform any analysis of the costs, benefits or alternative options.

There were some previous projects that did do deeper work. For example, in 2017 the government spent £500k, out of a potential budget of £5m, investigating how to create an open address file.

The results of a 2017 project were not published

Unfortunately 95% of that money was spent by the Ordnance Survey and the government has refused to share the results. Perhaps now is the time to share the work that Ordnance Survey did?

Those FOI requests, and Baroness Bennett’s question about the benefits being seen by other countries that have openly published address data, provide tips on the kind of ‘deeper analysis’ that should be performed and made available.

A map of open address data around the world from OpenAddresses.io

A misunderstanding of why previous attempts to recreate UK address data failed

The Minister referred to previous attempts to recreate the UK’s address data, saying

“the resulting dataset had, I am afraid, critical quality issues”.

Viscount Camrose

As someone who spent part of 2014/15 working on a project to recreate the UK’s address data, I can say that is not why our project was stopped. The Minister might want to ask officials for more details, as we learned some interesting lessons that the government needs to learn too.

The kind of innovation that government policy wanted to support

Our approach to recreating the UK’s address data was to start with data that the UK government already publishes. In line with the government’s “open by default” data policy, organisations like the Land Registry, Companies House, and the Valuation Office Agency spend money to make the data they hold available for other people to use. Some of this data contains address information.

We took this government data and extracted the addresses to form a starting dataset of millions of records. We could then ‘learn’ additional addresses through a combination of statistical techniques and information provided, with meaningful consent of course, by users of address services. This was all built into an API designed to make online services work better for more people.
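For illustration, the core of that pipeline can be sketched in a few lines of Python. The records, field names and normalisation rules below are invented for this sketch; real address matching needed statistical techniques and a lot more care:

```python
import re

# Illustrative records as they might appear in different published
# government datasets (field names invented for this sketch).
company_records = [{"registered_address": "10 Downing St, London, SW1A 2AA"}]
property_records = [{"addr": "10 DOWNING STREET LONDON SW1A2AA"}]

def normalise(address: str) -> str:
    """Crude normalisation so the same address written differently
    collapses to one key. Real matching needs far more than this."""
    s = address.upper()
    s = re.sub(r"\bST\b", "STREET", s)     # expand a common abbreviation
    s = re.sub(r"[^A-Z0-9 ]", " ", s)      # drop punctuation
    s = re.sub(r"\s+", " ", s).strip()     # collapse whitespace
    # treat 'SW1A2AA' and 'SW1A 2AA' as the same postcode
    s = re.sub(r"([A-Z]{1,2}\d[A-Z\d]?) ?(\d[A-Z]{2})$", r"\1 \2", s)
    return s

# Deduplicate the combined records into a starting address list.
seen = {}
for record in company_records:
    seen.setdefault(normalise(record["registered_address"]), record)
for record in property_records:
    seen.setdefault(normalise(record["addr"]), record)

print(len(seen))  # the two differently formatted records collapse to 1
```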

We intended to make the bulk data available for free, and then generate just enough revenue for sustainability – perhaps from high volume users of the API. We set ourselves up as a not-for-profit company.

It was the kind of innovation that the government’s open by default policy is intended to support.

Much of the government’s open data was not ‘open’, and this creates legal risks

Unfortunately we found that much of the government’s open data was not actually ‘open’.

The government’s copyright licence (the Open Government Licence, or OGL) excludes third party intellectual property rights. The third parties who hold IP rights in address data, Royal Mail and the Ordnance Survey, are litigious, and many of the government organisations that published the data were unable to be clear on whether or not there were Royal Mail or Ordnance Survey rights in the data they published. We only used datasets where the publishing organisation told us it was ‘safe’.

But even though it was government organisations publishing the data they would not be liable if there was a legal issue. We would be. So we needed insurance cover.

I am reliably informed that multiple people received legal warning letters for this Private Eye piece that used address data to understand foreign ownership of UK properties. I wonder how Private Eye responded.

But given the risks, only one insurance company was willing to offer cover, and that was on unrealistic terms. So, we stopped the project.

To put it another way, an innovative, not-for-profit business could not use the data that multiple government organisations published to support innovation, because another government organisation might take legal action.

There are new plans to publish more government data, and they risk the same problems

Zooming forward in time from the ancient history of 2014/15 and back to the present day, various UK government departments are currently making new plans to publish more government data.

This is because of initiatives like the Vallance report on pro-innovation regulation of technologies and a desire to support the UK’s AI industry. High-quality, authoritative government reference data is one way of reducing the hallucinations that the current generation of AI models suffer from. Sounds sensible, right?

But publishing widely used address data is a lot simpler and safer than much of the planned work, yet the government failed to do so in a way that allowed organisations to clearly understand what they legally could, or couldn’t, do with it. Will this new wave of government data come with instructions telling AI models and engineers not to do anything with addresses? And what other third party rights might be lurking in there? Or will government just make AI’s copyright issues even more complicated?

If the government does not understand why its previous attempts to publish data did not yield the desired benefits then I fear a lot more wasted money in the future.

The risks of openly publishing address data are misunderstood

In the debate Lord Bassam said

“there is a balance to be struck between privacy issues and the need to ensure that service delivery and commercial activity operate on a level playing field”

Lord Bassam

It is good that politicians consider privacy issues, but this misunderstands the risks.

Address data does not create new privacy risks

The list of addresses does not tell us where specific individuals live; the only personal data involved is likely to be that of people who name their business address after themselves. Instead, address data tells us where people might live, work and play, but not who is living, working or playing there.

(As an aside: I don’t want to imply that there are no risks of privacy, or other human rights, breaches with non-personal geographic data. For example in a separatist war in Sudan in 2011 atrocities were carried out because satellite data showed where particular groups of people were. But, hopefully, the UK is a long way from a separatist war and, let’s be honest, truly harmful actors will either simply buy the address data or use an illegal copy.)

The harms created by the lack of access to address data are more pressing

By contrast Lord Clement-Jones pointed out that 

The harms created by the lack of access to address data are more pressing

Lord Clement-Jones

While Baroness Harding pointed to the issues with the current data quality, saying:

“the quality of the data is not good enough… Anybody who has tried to build a service that delivers things to human beings in the physical world knows that errors in the database can cause huge problems. It might not feel like a huge problem if it concerns your latest Amazon delivery but, if it concerns the urgent dispatch of an ambulance, it is life and death.”

Baroness Harding

Elsewhere the National Audit Office has pointed to the challenges of creating and using the shielding list of people with extreme clinical vulnerabilities during the pandemic. One of the challenges was inconsistent address data in different formats in different IT systems and organisations. This is one of the many challenges that opening up the official list of address data will help with, because over time more organisations will refer to and use the same reference data.
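This is the practical benefit of shared reference data. A small Python sketch, with invented records, of how two systems that both store a common identifier (a UPRN-style ID, as in the UK’s official address data) can be joined reliably even when their address strings disagree:

```python
# Sketch of why shared reference data helps: two systems that both
# store a common identifier can be joined with no fuzzy address
# matching. All records and field names are invented for illustration.

gp_system = [
    {"uprn": 100023336956, "patient": "A", "addr": "10, DOWNING ST"},
]
council_system = [
    {"uprn": 100023336956, "addr": "10 Downing Street, SW1A 2AA", "vulnerable": True},
]

# Index one system by the shared identifier.
by_uprn = {record["uprn"]: record for record in council_system}

# Build a shielding-style list by joining on the shared identifier,
# even though the address strings themselves disagree.
shielding = [
    p["patient"]
    for p in gp_system
    if by_uprn.get(p["uprn"], {}).get("vulnerable")
]
print(shielding)  # ['A']
```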

If the funding model changes then will quality drop?

There is one risk that was not discussed in the debate though.

If the maintenance and publication of address data is not funded from licence fees collected by Royal Mail and Ordnance Survey then will the quality drop?

This is where there is an important balance to be struck as people and organisations need the correct incentives to publish useful data.

Bluntly, this is the risk I worry about the most. Money is only one type of incentive but it is an important one in this context and it is one of the reasons why I’m so keen to see some deeper analysis of the current costs.

Experience tells me that the current costs are significantly overstated – particularly Royal Mail’s claimed costs of ~£25m/year for ~300,000 changes/year. But however much the costs can be reduced, it will still cost money to publish quality address data.

Making the publication of the data a statutory duty, as this amendment would have done, is one way to help tackle this risk. It requires the government to fund and do the work.

Perhaps the money might come from general taxation, and the increase in economic activity that will come from publishing the data? Or perhaps from a small increase in registration fees collected by local authorities who do most of the work to create addresses? Or a small increase in the Land Registry transaction fees, after all they handle nearly 50 million transactions per year?

Other countries have changed legacy business models, the UK should too

Whatever the final decision, it will need some coordination and activity from a few public bodies willing and able to work together to publish address data as a public service.

And that’s where I hope the government is really focussing its analysis. Not on whether to publish address data for free, but on how to do it.

Because in the 21st century it is pretty sensible for high-income countries to make reference data, like addresses, as widely available as possible. That is why peers from so many different parties supported this amendment, and why so many other countries are doing the work.

The hard part of the work is changing the legacy business models and incentives of government organisations so that they make it happen. Other countries have done that, and it’s long past time for the UK to do the same.

Robots terms of service

In 2023 one of the AI debates was about when information and data on the web can be used to train AI models.

In late December we saw another billion dollar court case as the New York Times alleged that Microsoft and OpenAI had unlawfully used news articles to create AI models. 

In 2024 and beyond, then, as well as the debate about how information can be used in relation to AI, I expect we’re going to see more debate about how services can be used by AI.

If we peer into the future, perhaps we need terms of service for robots?

AI services will connect services from multiple existing organisations in new ways

As Sarah Gold puts it, “when applied to technical infrastructure, LLMs become a kind of connective tissue…[they] will connect different systems – at scale. They will execute complex and multi-part tasks, across different departments and organisations”.

From a consumer perspective this will manifest as different kinds of services, ranging from learned services that are deliberately designed for particular tasks, like moving home or arranging a holiday, to more general-purpose AI agents that can help with a range of tasks.

The technology to enable these kinds of services is getting ever closer to working at scale, but services are not only made of technology.

A concept of a learned service that helps a family move home, by Projects by IF.

Service providers will have relationships with both users and AI providers

From the perspective of existing service providers this new wave of AI services will look like another relationship in addition to the existing relationship with service users. 

With AI agents there are important relationships between users, service providers, and the organisations that provide AI services. The new relationship between the AI service provider and its service users is also very important, but this post focuses on the relationship with existing service providers. Picture by me with assistance from DALL-E.

These kinds of three way relationships obviously already exist. Many people use travel agents to help arrange holidays. Supermarkets bring together food from multiple suppliers and make it available in one place. My sisters and I help my elderly mother use various services.

But AI has the potential to create new arrangements at speed, at scale, and without pre-existing contracts. To provide a simple example, an AI service could ring a series of hotels to make bookings for a train trip across Europe.

Many service providers will not be happy with AI services using their services

But just as existing service providers have not been happy with AI companies using information, many service providers will not be happy with AI services using their services.

Some of this discomfort will be from a simple fear of competition, but in other cases it will be because of other fears such as:

  • consumers being dissatisfied because a service does not meet their expectations, perhaps because an AI service generated an incorrect description of a hotel
  • risk of regulatory action, perhaps the AI service does not collect identity information in a way that meets local requirements
  • that it will generate degrading work for humans, for example through a large number of AI service providers using computers to make repeated phone calls for information
  • whether the existing service provider and AI service provider are receiving fair shares of the value created by the combined service

Robots terms of service

Some of these fears can, and will, be overcome by existing mechanisms.

Liability laws are being updated. AI services that take the mickey will be sued. Some AI and service providers will negotiate new contracts that create new rules for payment of commission, or for how workers should be treated. This will all need to happen across a large number of sectors, industries and geographies.

But I also wonder if we need to look at some other existing concepts, like terms of service: the often lengthy bits of legal text that humans agree to when we use a service.

Picture by me with assistance from DALL-E.

If we are heading to a future where new three way relationships between humans, service providers, and AI-powered services can – and probably will – be created at speed, scale and without pre-existing contracts then, perhaps, service providers will need new terms of services that describe how AI robots can use their services?
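To make the idea concrete, here is a purely speculative sketch of what such a file, and an agent-side check against it, might look like, by analogy with robots.txt. The file name, directives and semantics are all invented:

```python
# Purely speculative sketch: a robots.txt-style terms-of-service file
# that an AI agent might be expected to fetch and honour before using
# a service. Every directive here is invented for illustration.

ROBOTS_TOS = """\
agent-access: allowed
actions: quote, availability
actions-denied: booking
commission: 2%
contact: partnerships@example-hotel.test
"""

def parse_tos(text: str) -> dict:
    """Parse the hypothetical key: value format into a dict."""
    policy = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            policy[key.strip()] = value.strip()
    return policy

def may_perform(policy: dict, action: str) -> bool:
    """Check whether an agent may perform an action under the policy."""
    denied = {a.strip() for a in policy.get("actions-denied", "").split(",") if a.strip()}
    allowed = {a.strip() for a in policy.get("actions", "").split(",") if a.strip()}
    return policy.get("agent-access") == "allowed" and action in allowed and action not in denied

policy = parse_tos(ROBOTS_TOS)
print(may_perform(policy, "availability"))  # True: agents may check availability
print(may_perform(policy, "booking"))      # False: bookings need a human, or a contract
```

A real mechanism would need standardisation, legal weight and enforcement; this only illustrates the shape of the idea.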

The UK government wants to create £11bn/year of value from location data, will its paper on location data ethics help?

The UK government has a strategy to increase the use of location data – data about the location of people, events, and places. Its economists have said that this could create up to £11bn of economic benefit per year. That is an attractive goal for a government that has promised and needs economic growth.

As part of this strategy the UK government recognises the need to retain public trust in the use of location data.

That is useful and sensible. An overly simplistic focus on economic growth from collecting and using data will lead to harm to people and communities.

The body responsible for the location data strategy, the Geospatial Commission, recently published a policy paper on location data ethics with a goal of building public trust in the use of location data.

Unfortunately the paper’s analysis is incomplete and unlikely to have any real impact.

This risks harm to people and will contribute to a failure to deliver the promised £11bn/year of economic benefits. 

The paper proposes an ABC of ethical use of location data

The paper recognises that it is “vital that its use retains public trust and confidence”. It used a public dialogue, quantitative survey, and analysis from the Geospatial Commission’s team to suggest three ethical building blocks, on top of existing legislation, to help achieve this goal of public trust.

The building blocks are:

  • Accountability – Governing location data responsibly, with the appropriate oversight and security
  • Bias – Considering and mitigating different types of bias, and highlighting the positive benefits of location data
  • Clarity – Being clear about how location data will be used and the rights of individuals

Unfortunately, the paper’s analysis is incomplete

The paper’s analysis is incomplete. To provide just four examples, it:

  • has a smaller set of considerations than other geospatial ethics work
  • implicitly assumes that all people have the same capacity to make decisions
  • does not discuss ethical issues about places
  • and does not consider the business models of the organisations that collect and provide location data

There are three ethical building blocks compared to ten principles in the Locus Charter

These building blocks are a subset of the principles captured in the international Locus Charter for responsible use of location data.

As a result the proposed building blocks do not include protecting the vulnerable, protecting rights or data minimisation.


The Locus Charter has 10 principles; the Geospatial Commission covers accountability (#10) and bias (#5), and adds clarity.

Several UK organisations, from the public, private and third sectors, participated in the global work that created the Locus Charter.

The paper does not explain why this existing work has not been reused or why only a subset of the Locus Charter has been included. 

Not all individuals or decisions are the same

The paper states that it wants to help individuals make “more informed, meaningful choices” but does not recognise that not all individuals or decisions are the same.

There is no consideration of factors such as someone having no choice in whether to use a service, having low skills, being busy or stressed when they make a decision, how people’s decisions can affect other people, or how some people may be more at risk than others.

The paper uses an analogy with food labelling as part of the Geospatial Commission’s guidance. It is not a useful analogy. Making food safe requires many more activities than clear packaging. For example, there are controls on food ingredients, training requirements for staff, and inspections by regulators.

The paper does not even reference the age-appropriate design code – a statutory code of practice produced by the UK’s Information Commissioner’s Office.

This code was intended to help protect and empower children by ensuring that more services are designed for their needs. The design code states that online services likely to be accessed by children should have geolocation options turned off by default.

The ethics of place and communities are neglected

The paper says that location data is about people, events, and places. Unfortunately, the ethical considerations that are discussed are mostly concerned with location data about individual people, or personal location data; the ethics of place and communities are neglected.

You would look in vain for discussion of ethical issues such as:

  • whether and how places – such as domestic refuge centres, abortion clinics or military bases – might need to be protected to prevent harm
  • whether the UK’s regional naming systems – such as Welsh placenames, or Scottish tenement flat numbers – need to be better supported
  • whether and what decisions about the collection, sharing and use of location data should be made by devolved administrations, city-regions, or the local communities that live in those places

Business models are an ethical issue but are not mentioned

Finally, many ethical issues stem from business models and incentives.

The paper considers neither the business models of the UK’s public sector geospatial agencies nor those of the private sector firms, for example those in the advertising industry, that collect and manage location data.

There is no discussion of whether these existing business models create harm for people or reduce the social and economic benefits that location data could create. For example, bad address data causes people to be denied credit cards and struggle to register to vote, while the sale of location data by data brokers is coming under increasing scrutiny.

The paper is unlikely to have any significant impact

These are just four gaps in the analysis. I could go on (for example, there is no recognition of the distributional impacts of the benefits and risks across different places and different groups of people). But as well as being incomplete, the paper is unlikely to have any significant impact.

This is because of:

  • the hugely diverse range of organisations and people it needs to affect
  • its lack of grounding in how teams work
  • its failure to assess existing legislation or consider legislative change

Will thousands of people in a huge range of sectors read the paper?

As the paper says, location data is ubiquitous. To create impact the Geospatial Commission needs to affect the behaviour of the thousands of organisations and people that collect, share, use and maintain location data.

The paper uses examples from a huge range of sectors and disciplines: transport, wearable technology, drones, space, smart homes, healthcare, statistics, social sciences, online services, public services, private services, and the UK’s ubiquitous potholes.

A pothole in Newport on the Isle of Wight, CC-BY-3.0 by Editor5807

Are organisations and workers in all of these sectors and disciplines likely to follow the recommendations in the paper? Does the paper align with existing practices, challenges, and opportunities in their contexts? Are they even aware of the Geospatial Commission?

Insufficiently grounded in how teams work

Rather than looking at policy papers, the teams who build products and services or perform statistical analysis tend to use manuals/guides, reuse openly available code, data and design patterns, and learn from how other teams do their work.

The teams who developed the Locus Charter understand this. They blog, publish code, and have a repository of useful material.

Within the UK this is also one of the reasons why the Government Digital Service worked in the open. It regularly blogged, published open source code, and created a service design manual for public services built by central government.

It is also why the Government Statistical Service created a manual for reproducible analytical pipelines (RAPs), and why the National Statistician’s Data Ethics Advisory Committee (NSDEC) publish their decisions and decision making processes. 

Consciously or not, these guides, source code and data embody particular ethical practices and decisions made by teams. It is noticeable that the Geospatial Commission’s work is stuck at the level of principles and policy papers, rather than producing its own examples and guides or embedding its proposed principles into existing manuals and guides.

It does not appear that the Geospatial Commission have thought about how to take their work to the places where digital and statistical teams do their work.

Searching the GDS design manual for location data brings up a single link about protecting user research participants

The paper does not consider legislative change

Organisations and people will also respond to legislation and regulation, but while the paper states that existing legislation provides a baseline it does not describe the existing legislation or assess whether the legal and regulatory framework needs to change.

As previously mentioned the paper does not cover the ICO’s Age Appropriate Design Code and its section on geolocation.

It also does not mention the existing legal definition of location data under the PECR regulations (a definition narrower than the one used in the Geospatial Commission’s work), the accompanying guidance, or the work that has taken place in multiple organisations to comply with PECR.

The PECR definition of location data is narrower than the Geospatial Commission’s definition

The Geospatial Commission was established as an “independent, expert committee” but despite this independence the paper does not reference the UK government’s ongoing plans to reform data protection law or assess whether the changes will help or hinder the Commission’s strategy.

Meanwhile, possible new changes to legislation are not considered in the paper, and no reason for this omission is given.

These changes might be to help deliver on the Geospatial Commission’s objectives of economic growth, to enforce the Geospatial Commission’s proposed “ABC” building blocks, or to help protect people and communities from the increased risks that more data collection, sharing and use will cause.

Even the simplest of legal changes, such as making personal location data a new type of special category data, or powers for appropriate public sector organisations or regulators to make more reference data about places openly available and reusable, are not considered.

This paper will not help deliver the promised benefits

I am deliberately not taking a position on whether the Geospatial Commission’s goal of creating £11bn of economic value from increased use of location data is feasible or desirable, but I cannot see how this paper will help.

As the Geospatial Commission correctly identified, their strategy needs to retain public trust in the use of location data. A failure to do this risks harm to people and will contribute to a failure to deliver the promised benefits. And failing to deliver those promised benefits will harm the government’s wider economic objectives. This paper will not help retain public trust and can only contribute to the inevitable failure to deliver the benefits.

Rather than an incomplete set of ethical building blocks we need more practical guidance and tools for teams to use, and an assessment of the legislative and regulatory framework that the government’s proposed larger location data market will need.

Three policy ideas to help the UK adapt faster to the internet

The UK is having a general election on December 12th. Over the next week political parties will put out their manifestos. Those manifestos will contain lots of commitments about what the parties will do if they are elected.

When I looked at the manifestos for the last general election in 2017 I was disappointed at their lack of recognition of the changes the world was going through because of technology. To help this time, here are three simple tech policy ideas for any party. They’re focussed on helping the UK adapt to the current wave of technology change. They are a bit late for the manifestos, but they still might be useful.

A bit of context

First, a bit of context. Technology is always changing but it has changed a lot in the last few decades with the proliferation of computers, the internet, the web, and data. These technologies have changed things for governments.

Some citizens now have higher expectations from public services. They expect public services to behave like those they get from Google, Amazon or whichever service is hot this year, *checks notes*, such as ByteDance’s TikTok. Technology is enabling things that some may think should be public services — like accurate mapping data on smartphones, or being able to have a video call with a doctor.

Other citizens now have more fear. Perhaps because they are excluded from those services because they lack skills or access to the internet or perhaps they are at risk of being discriminated against because technology is being used to perpetuate, or accentuate, existing societal biases.

Using new technology to help deliver public services that work for everyone is a tough job that, despite good work by the Government Digital Service, government still has not cracked.

Image from For Everyone via the Web Foundation

New technology has also enabled new businesses, markets and types of services to emerge. Things like smartphones, social media, cloud computing, online retailers, online advertising, and the “sharing economy”. The world is now more interconnected. Someone in Wales can rapidly build an online service and start selling it to people in India, and vice versa. Meanwhile because the technologies have also been adopted by existing companies they affect government’s role in existing markets.

Technological waves of change like this are not new — I recommend reading some history about the after-effects of the invention of ocean sailing, printing, electricity, or television — but governments have been particularly slow to adapt to this wave of technological change.

Why? Perhaps because the technologies have changed things globally. Perhaps because of the type of governments that we have had. Perhaps because of lobbying by businesses. Who knows. Future historians will be better placed to assess this.

Anyway, my suggestions are not about the details of each of these areas. Instead they are about how to increase the rate of adaptation for the next government. About how to get more radical change.

Tackle the fear around technology and politics

There is a lot of fear about what technology means for politics. Misuse of data by companies and political organisations. Highly targeted advertising reducing accountability. Foreign governments interfering in elections. This fear is exacerbating a pre-existing low level of trust in and disengagement from UK democracy.

Political parties should start with themselves. They need to be open about how they are using data and online advertising and publish data about their candidates to help voters make more informed decisions. Political parties should not use micro-targeted advertising during the election, and should challenge their opposition to follow their lead. Where necessary they should err on the side of caution when using advertising tools. After all, much targeted advertising is already likely to be illegal under existing legislation. Doing these things will help politicians learn how to responsibly use technology while competing for power. That will help them use technology responsibly if they get in to power.

Whoever gets into power should then ban targeted political advertising until it is shown to be reasonably safe. To understand the effects researchers will need access to data held by the big technology platforms like Facebook, Twitter, Google and Apple. Organisations in the USA have faced challenges when trying to do this with Facebook but approaches like the ONS ‘five safes’ and the Ministry of Justice data lab show that parts of the public sector have the necessary skills to design ways to do it. Government should use models like this to give accredited researchers access to data held by the platforms to inform future policy decisions and, perhaps, when to relax the ban for certain kinds of ads.

Develop technology literacy in more of the public sector

To implement a party’s manifesto commitments — whether it be implementing municipal socialism, moving to a zero carbon society, (re)creating an independent Scotland, agreeing new trade deals (if Brexit actually happens), free broadband, a charter of digital rights, or implementing an industrial strategy and increasing R&D — public sector staff need to understand how technology affects their work and technology experts need to understand the public sector.

Sometimes a horrified face emerges from behind my polite face. I apologise to everyone who has seen it.

Unfortunately too many people still do not get it. In my own meetings with governments I am often surprised, and sometimes horrified, by whole teams of people with limited technology literacy making significant decisions about technology. (Similarly, I am often surprised, and sometimes horrified, by teams of technology experts making significant decisions that impact on policy or operations with no real experience in those areas.)

Not every public sector worker needs to be a technology expert, and it is certainly not true that everyone needs to know how to code, but it is necessary to have technology literacy in many more parts of government. More public sector workers need to understand both the benefits and the limitations of new technology and the techniques that people, like me, use to build it.

This is one of the most important things to focus on. Different skills are needed by different roles, but an underlying element of technology literacy is useful for everyone.

To start providing this technology literacy I would recommend vocally demonstrating that technology experience is as valued as other skill sets, encouraging more technology experts to join teams that lack that experience, and seconding non-technology staff into technology teams. In both cases people can then listen to and learn from each other.

An independent inquiry into technology regulation

Finally, regulation. Technological change needs changes to regulators and can lead to the need for new ones. There are a growing number of known gaps in technology regulation. Some of these gaps affect public services, like the police. Others affect public spaces, like facial recognition. Some affect new services like social media. Others existing ones, like insurance. In some cases it is not clear if regulators are appropriately enforcing existing rules, like equalities and data protection legislation, while there will be a large number of gaps that people simply haven’t spotted yet.

Previous governments have set in motion various initiatives, such as considering the need for a new social media regulator, a national data strategy, and a Centre for Data Ethics and Innovation (CDEI), but these initiatives are not adequate. They are controlled and appointed by the current politicians, operate within current civil service structures, and are mostly taking place in London. The changes brought about by technology are too fundamental for this approach to work. The UK needs something more strategic, more radical, more independent, and more citizen-facing.

An independent inquiry into technology regulation should be set up. It should have representatives from around the UK; with different political views; with experience from the public sector, private sector and civil society; and from both citizens that love modern technology and from the groups that are most at risk of discrimination. It should look across the whole technology landscape, have the power to call witnesses, and be empowered to make a series of recommendations for changes to legislation and regulation to help set the UK on a better path for the next decade.

Inquiries like this can happen faster than you think. The recent German Data Ethics Commission took just 12 months to come up with a set of excellent recommendations. Setting a similar timescale for an inquiry in the UK will allow the next Parliament and the next Government to focus on delivery.

It is necessary and possible for the UK to adapt to technology faster

Politicians and their teams can learn how to use technology more responsibly by tackling the fear around technology and politics; mixing up teams in the public sector can help staff learn from each other; and an independent inquiry into technology regulation can help set the UK on a better path to the future.

The UK needs to adapt to technology faster. For the good of everyone in the UK, but particularly those who are being disadvantaged by irresponsible use of technology, can we do it? Please?

AI and the Committee on Standards in Public Life

The UK has a Committee on Standards in Public Life (CSPL). It advises the Prime Minister on ethical standards across the whole of public life in England (yes, only England — ethics must be a devolved matter).

A picture of some people by L S Lowry (via Flickr)

The committee is currently investigating Artificial Intelligence and whether the existing frameworks and regulations are sufficient to ensure that high standards of conduct are upheld as technologically assisted decision-making is adopted more widely across the public sector.

Big topic. After all, AI is a range of techniques that use people, mathematics, software and data to make guesses at the answer to things. It can help, and hinder, with lots of the huge array of things that the public sector does.

I represented the Open Data Institute (ODI) on a roundtable for this investigation. A couple of people have asked me what the roundtable was like and what I said. Here’s a quick blogpost.

Preparing for a roundtable

The ODI team get invited to lots of roundtables and events. We decide which ones to do, and who does them, based on a range of criteria. The invitation for this one went to our CEO, Jeni Tennison, who passed it to me. My goal was to help the committee, learn from what other attendees were saying, and test some of our ideas in front of this audience.

We did our usual preparation by sharing the questions around the team in the office and telling our network that we were going along to hear what advice they gave us. That technique provides a lot of input. It also helps me represent the ODI and the ODI’s network, rather than simply myself and my own views.

I summarised it down to a few key points to try and make, and then tried not to over-prepare. Over-preparation is the worst sin: it makes me sound even duller than normal.

Rounding a table

The roundtable itself was at Imperial College in London.

The setup was more informal and the committee was more friendly and asked more insightful questions than most similar things I’ve done. That was good. My background is technical and private sector — I previously spent 20 years working with telecoms operators building products, systems and networks — so I always worry that I’ll misunderstand or miscommunicate particular words or phrases. That would damage both me and the organisation I represent.

Anyway, I managed to get over versions of some of the things that we’d prepared and/or that we regularly discuss in the office and that were relevant to how the roundtable took shape:

  • that there is little transparency over use of AI in the public sector and of how the UK government’s Data Ethics Framework is being used. I know that there is good and bad work being done, but mostly because I know some of the people doing it. How are the general public meant to know?
  • that we need to focus more on the people who design, build and buy AI services. Exploring what responsibility and accountability they should have and how we give them the space, time and money to design those services so that they support democracy, openness, transparency and accountability as well as being efficient and easy to use
  • that the current focus on ethical principles and AI principles does not seem to be having a useful effect. That instead we need to couple those top-down interventions with more bottom-up practical tools (like the framework or ODI’s Data Ethics Canvas) and more research into how the people designing, building or buying AI systems make decisions and what will influence them to comply with the law and think about the ethical implications of their actions
  • that control, distribution of benefits and harms, rights and responsibilities about AI models would be a useful area to explore
  • that eliminating bias is the wrong goal. Bias exists in our society, some of that bias becomes encoded in data and technology. AI relies on the past to predict the future, but the past might not reflect the present let alone the world we want. We should build systems that take us towards the future we want, and that can adapt as things change
  • that in a world which is increasingly online-first and where we risk the state disappearing behind a smartphone screen and automated decisions, that the principles of public life should be updated to put the need for humanity front and centre

I also learnt a lot from other attendees with some interesting things for myself and the team back in the office to chew over.

After the roundtable

A couple of weeks after the roundtable I was sent the transcript to review. The committee will publish that transcript openly — which is good and healthy. Attendees get to see the transcript first so they can suggest corrections to simple grammatical errors or transcription problems. That’s why I’m not commenting on or sharing what other people said.

It is important to review the transcript. There are sometimes errors. For example, in this transcript I was recorded as saying that my boss, Jeni, was “whiter than me” rather than “wiser than me”. I have no idea how I’d measure the former but I certainly know that she’s the latter. Some of the words and thoughts in this blogpost come from Jeni and others in the team like Olivier, Miranda, Renate, Jack &c &c &c.

Reading the transcript also helps me understand the difference between the clarity of my speech and the clarity of my writing. I’ve left most of my spoken errors in place. Just like the state we can’t only communicate in words and pictures that are sent through a computer. Most of us need to get better at speaking with humans.

A crap analogy

I was home recently and took my sister’s dog for a walk. When we were young we had dogs, Spud and Gyp, so it was a walk I’d taken before. A few things had changed. One was that there was less dog poo.

Me (left) taking my sister’s dog for a walk around Fairhaven Lake.

It was strange comparing the memories of those messy streets, including muck left behind by Spud, to the reality of the present day, with dog walkers cleaning up and signs warning of penalties if they do not. There has been a change in our social norms. In return for the right to walk a dog, most people now accept that they need to clean up behind them.

My day job is doing policy for the Open Data Institute. Policy is about changing outcomes, hopefully for the better.

On their own, legislation and guidance won’t fix challenges like data ethics, making data as openly available as possible, or the many other complex challenges that limit the social and economic value that societies get from data. It will need social change too.

I’m interested in how that change happens, including how society decided dog walkers should clean up the dunghills created by dogs.

People like having dogs, but dogs make a lot of shit

I found a blogpost about a book by Michael Brandow telling the tale of the introduction of a poop scooping law in New York City. I got a copy of the book and settled down for a read.

It would take a lot of rain to clean up 500,000 pounds of dog feces. (image Taxi Driver, copyright a big film company)

People like having dogs (*). They like having a companion. They like going for walks. Dogs can make people feel safer, particularly in a city that had as high a crime rate as 1970s New York. But dogs make a lot of shit (**).

In 1974 New York City’s Bureau of Animal Affairs estimated that 500,000 pounds of dog faeces were hitting the streets every day. The city’s population was growing. More people meant more dogs, more dog excrement and less space to step around it. That affected not just dog walkers but everyone else using the streets.

This sounded analogous to the interweb’s superhighways. While some people are having fun, other people are stepping in the dog doo-doo we make. I read on.

The dog doo-doo battle of many armies

There was a long battle to clean up New York City; it lasted for most of the 1970s. The battle involved many familiar armies.

There were a mix of civil society groups in the battle. Some wanted cleaner streets, others just wanted to keep walking their dogs, and some saw the opportunity for self-publicity. There were also people who didn’t care about the battle being waged under their feet.

A search on Amazon shows 1,357 results for ‘poop scoop’

There were businesses in the battle too. Some businesses simply wanted cleaner streets outside their shops. A pet food association objected to the final legislation because of the impact it might have on their customers, dog owners. Other businesses saw new opportunities. There was a boom in innovative, and probably disruptive, dirt cleaning solutions that continues to this day.

When dog owners look like their dogs is it correlation or causation? And which way is the causation? (source: National Library of Ireland on the commons)

Different government organisations took positions. In 1970 a new city Environmental Protection Agency had been created. Its leadership saw the opportunity to clear up a problem affecting citizens. Other organisations didn’t want the cost of enforcing new legislation and argued for others to take the lead.

Some organisations seemed to see a chance to pass part of the cost, and blame, for cleaning the streets to dog walkers. I suspect many other government organisations were wondering why all this effort was being spent on canine coprolites.

Meanwhile politicians were trying to navigate between all of these interest groups to tackle both this problem and others facing the city.

Politicians talking crap

Throughout the 1970s some argued that people could be persuaded to change behaviour without legislation through campaigns and leaflets. Both civil society groups and government organisations tried to do this and had some effect in parts of the city.

A waste receiver for dogs

Others said dogs should use bathrooms in houses, use different sides of the street on alternate days, or even be banned from the streets altogether. The mess caused by dogs risked all the enjoyment being taken away.

Some dog walkers, government organisations and politicians said that it was government’s job to scoop the poop and that government should have more resources for street cleaning.

There were politicians who thought that no legislation was needed as other problems took a higher priority. One politician said that he was keen for the legislation to happen as it would encourage city staff to focus on dogs rather than car parking fines. All politicians were heavily lobbied, by dog lovers and dog poo haters.

I can see a common pattern here. Regardless of whether the policy is about data or doo-doo we need public debate to gather ideas and decide who has to do what, what resources they have to do it with, and whether they get paid for the doing.

There was a campaign over public health issues, with statements that an illness called toxocariasis, which can be caused by worms in dog excrement, was causing loss of eyesight in children. This risk appears to have been significantly overstated (although it does look as though the incidence of toxocariasis has fallen in the UK since dog waste laws were introduced there), but it was an effective campaign.

The debate raged until Ed Koch became Mayor and took a different tack. Rather than having another go at getting a new law passed in New York City’s legislature, he took the problem to the politicians in the New York State Senate. At the state level politicians debated how different solutions are needed in cities compared to more rural areas, and passed legislation that only affected large cities (***). The law gave the city the power to fine people who didn’t scoop their pooch’s poop.

In all policy work sometimes you have to explore a few paths before you get to your goal.

Clearing up dog shit is good for society

Throughout the debate there was a common thread. A city that welcomed dogs but that had less dog faeces scattered around would be a better city.

Dog owners enjoyed the company of their dogs, but other people in their local communities were affected by their enjoyment. Pavements, or sidewalks in NYC, are shared spaces. Use and misuse of that shared space affects everyone who lives in the city. After a debate dog owners were prepared to take on the task of clearing up some of their mess for the benefit of wider society.

A super pooper scooper sign in North Vancouver communicating the new social norm in multiple languages. Image via “New York’s poop scoop law: dogs, the dirt and due process” by Michael Brandow

It is hard to know what was most effective — the debate, the civil society campaigns, the leaflets and signs, government loudly declaring that it had legislated, or the final push of fines. I’ve struggled to find good crap data. But the repeated legislative battles show us that NYC policymakers thought a law was required.

The book includes an interview saying that six years after the legislation was passed, 60% of dog owners were cooperating with the law. After a dog doo-doo battle which led to legislation for England and Wales in 1996, a larger shift in public behaviour was seen after more time had elapsed. A study in 2014 by three researchers from the University of Central Lancashire, 10 miles from my hometown, reported that only 3% of British people would not pick up their dog’s poo.

The shift from the streets and dog walkers of my childhood to one where only 3% of British people will not pick up dog poo is a significant change for the better (****). That is social change in action. Social change that made my walk a bit easier. Even though I now had to clear up after my sister’s dog everyone, including me, could enjoy the park a little bit more.

But, does this tale teach us how to make data better?

A crap analogy

Well, not directly. The title of this blogpost wasn’t a joke. It is a crap analogy. Our motives for using data are different from the simple motives of walking a dog (have fun, feel safe). Data is not like doggy doodah.

While data is not like doggy doodah, Misha Rabinovich has shown that you can use data about faeces to make art. This artwork is temporarily installed at the Open Data Institute for a 2018 exhibition. I wonder if it subliminally got me thinking about this blogpost.

We can all agree what dog poo is, but we cannot all agree on the mess being created by how people are collecting, sharing and using data. We haven’t reached an agreement on what ‘good’ looks like and what outcome we are trying to achieve.

Meanwhile, although the data ecosystem contains many of the same actors — individuals, civil society groups, businesses, and government organisations — each with their own changing motives and power, it is more than a physical city. There are multiple virtual global villages which manifest themselves in our physical towns, cities, nations and continents. Someone in the UK can create mess on a virtual street used by people in Uruguay, Ukraine and Uganda. It is trickier to deliberately change social norms and create better outcomes in such a complex system.

But the tale should remind us that given time and effort people are willing to change behaviour and reduce the negative impacts they have on other people. Do you need a New Year’s resolution for 2018? Let’s keep having fun with data, but let’s think more about other people and clean up some of the shit that we’re creating.

(*) and other pets, such as cats, that also lead to interesting tales about data

(**) data about other swear words is available

(***) UK politicians and dog waste policymakers would possibly benefit from reading that 1978 New York State Senate debate, as it seems that the UK is still discovering that while bagging it and binning it works in cities, in more rural areas you need to stick it and flick it.

(****) despite the improvements some people want city streets that are completely clean of the odious dog ordure. You will regularly see news articles about towns and cities saying that they might use CCTV tracking, registration schemes, and dog DNA databases to catch offenders. A company called MrDogPoop claims to have “the most powerful Dog Poop DNA matching database in the world” to help track down poops that avoid the scoop. These city-wide schemes tend to disappear when people realise the cost and debate uncovers that a rover registration scheme is too much of a stretch to our social norms.

Data and policy talk — November 2017

An approximate transcript of the talk I gave at the Data gedreven Beleidsontwikkeling / Data Congress event in November 2017.

Hi, I’m from the Open Data Institute, or ODI.

I’ve been asked to do a talk about “data and policy”. First, an apology. I don’t speak Dutch and sometimes I speak English too fast, and sometimes too quietly. That makes it harder for people who don’t speak English as a first language. Sorry. Shout at me if I do that and I’ll speak more clearly.

I want to start by expanding on the word policy. It means different things in different contexts.

Merriam-Webster has a definition of policy that says “a high-level overall plan embracing the general goals and acceptable procedures especially of a governmental body”.

That is a classic definition but there are other meanings and contexts.

Within organisations there will be policies for compliance with data regulation, like GDPR, or for how data should be collected, used, stored, shared or opened. Businesses, civil society and not-for-profit organisations will also have public policy positions on “government policies that affect the whole population”.

At the Open Data Institute lots of members of the team deal with all of these meanings of policy in different contexts. Most of my work is on public policy, but I’m trying to influence both governments and businesses.

The ODI is not-for-profit. We work globally, our headquarters are in the UK. We were founded by Sir Tim Berners-Lee, the inventor of the web, and Sir Nigel Shadbolt, an AI pioneer. We are not partisan but we are political. Data is a political topic. Open is a political statement. Our mission is knowledge for everyone.

A (hopefully) comprehensive map of where ODI has done work, where nodes have formed and where members are.

It’s our 5th birthday this year. Yay us 🙂 I’m going to share some policy lessons from those 5 years. The lessons have been learned from our work around the globe, our peer network of nodes and our network of members.

Policy is one of the capabilities we use to help us deliver our mission and strategy. We also do a lot of work with technology, training people, gathering evidence, building communities and incubating startups.

First, let’s talk about open data. Open data is vital and incredibly important but we learnt that if we only talk about and use open data then we can’t deliver our mission. Instead we work across the data spectrum.

the data spectrum

The data spectrum is about access: who can get to data so they can use it, share it, and so on. Some data should be kept closed within an organisation, like sales reports. Other data should be shared: the police need to be able to see your driving licence, medical records can help with research, Twitter data can help us understand how social media is impacting our societies. Lots of data should be open data, things like bus timetables, maps and addresses.

At the ODI we learnt that we need to talk about and use the full spectrum of data to both get more open data and deliver on our mission.

We also learnt that we need to combat the very strange view that data is oil or coal or other types of fossil fuels.

I can, and often do, talk in economic theory about the different qualities of data and oil, but there is a more important difference. It creates the wrong mentality. People fight over control of oil. They want to hoard it for themselves. They want to sell it for huge amounts of money. This is not the way to get the most value from data, an increasingly abundant resource. The thinking generated by treating data like oil reduces innovative use of data and causes loss of trust by societies in how data is used.


Instead we need to turn data into infrastructure. It is already heading in that direction but we need to strengthen that momentum. Great infrastructure is boring, reliable, safe and easy to use. It’s there when we need it. Data is decades away from being boring, trust me *pause for ironic, self-knowing laughter*, but that’s the direction to head in. Turning data from every part of society — especially the public and private sectors – into safe, trusted and easy to use infrastructure that underpins every sector of our economy and our societies.

And that infrastructure will be built on a foundation of datasets that are made available as open data, for anyone to access, use and share. That foundation of open data makes it easier to publish and use other data.

The third lesson is about goals. Sometimes it can feel to other people like the goal of the open data movement is only to publish more open data or to put data on portals. That’s the wrong goal.


We think, talk about and use open data as a tool. One of several tools in the toolbox.

A toolbox that we, and others, use to tackle problems. Like finding a job that you enjoy, combatting corruption, finding your way around a city, responding to the threat of anti-microbial resistance, helping with house planning and building, or understanding the growth of new sectors and business models like the sharing economy (something we’re looking at in our new R&D programme).

The fourth lesson is about chance. Chance is great. Very unexpected things happen when you open up data. One of my personal favourites is that the UK government opened up radar data that was originally gathered for planning flood defences and people used it to discover both new places to grow vines and new Roman roads that criss-cross parts of the country. Fantastic. But that doesn’t always work.


Instead we learnt that we need to put more focus on creating impact by design: looking for problems, working with people who are experts in tackling them, and helping them to use data as one of the tools in their toolbox. When we do that, chance can still happen, but we also have a much higher chance of impact, and impact is necessary for sustainable change.

So those lessons are some of the ways we learnt to think about data over the last 5 years — about the full spectrum of data, about data as a tool, about impact by design, and about data as infrastructure. Those mental models are part of our approach to public policy.

But through our work and delivery we have also learnt some of the most effective levers that we have to create impact. In our policy work we amplify those levers and encourage others to use them or build their own.

First, practical advocacy.

Over the years we’ve developed a set of guides and a toolbox. They’re openly licensed. Anyone can use them, or fork them and change them. That can be a challenge for an organisation that needs to bring in revenues, but it’s the right thing to do for an organisation with an open culture and a big mission. We don’t want to do everything; even if we tried we wouldn’t be able to. We want to make it possible for other people to do what we do.


The practical advocacy tools keep on expanding.

We recently launched the first version of a data ethics canvas to help organisations using data understand, openly debate and decide on ethical issues about collecting, sharing and using data. Interestingly when we looked into data ethics we found that most of the debate was about personal data in the closed and shared parts of the data spectrum. People had missed the ethical issues around open data and non-personal data. The canvas might help fix that.


As part of our research & development programme we’re exploring how open data is being used in public sector service delivery and how it could be used more. There are some famous stories about open data helping to reimagine public services but we are still seeing the same old stories and not enough momentum. We’re hoping that through our research we can help understand the barriers to change, and build some methods and patterns that will help people do more things to use data to improve public services.

Patterns are important. We’ve also developed a set of design patterns for policymakers that use data to help them create impact. While data policy people might know data, many other policymakers don’t. We need to reach them, put data into their context in language they understand, and help them understand how it can help them tackle their problems.


Through approaches like evidence-based policy many policymakers have realised that data can help inform policy, but these patterns also help show policymakers how data can help deliver policy. Whether that policy is reducing costs, improving an uncompetitive market, or helping consumers switch between service providers.

The next big lever is networks, peer networks in particular.

Peer networks are horizontal organisational structures with members who share similar identities, circumstances or contexts. We run global, African and European peer networks for open data and have seen their power in developing learnings and creating change. We’ve learnt from how they have grown and how the people in them interact.


We’ve been seeing peer networks start to emerge in other work we do. Things like ODINE (the Open Data Incubator Europe), Datapitch (another Europe-wide startup incubator), and the sector programmes.

We believe that fostering other peer networks: in sectors, in particular disciplines (like policy), or in particular geographies will help build a better future faster. We’ve published a method report that we, or others, can use to do that.

Finally, sector programmes. We’ve been working with whole sectors to help them work together to use data. We can get more done if we work together.

Most people are familiar with organisations like the Open Government Partnership. Less well known are groups like GODAN (the Global Open Data for Agriculture & Nutrition initiative), which brings together governments, businesses and farmers to open up agriculture data to solve problems.


OpenActive is opening up sport data to make people more physically active. Places that offer a whole range of sports: football, squash, badminton, table tennis, running are opening up data and they’re also building an ecosystem of organisations that will use that data to make it easier for more people to play the sports they love.

In an initiative called open banking the UK retail banking sector is opening up data about products, locations and cash machines and creating open APIs so that people can choose to share data held about them by banks with people that they trust. We hope it will make it easier for more people to create better services for bank customers. It could also improve national statistics, help improve the UK’s identity framework, help tackle financial inclusion or many other things. We’re talking to other countries on multiple continents about helping them implement open banking too.

There are more sectors, like transport, coming together as they start to see the power of working together to solve common problems. We need to encourage sectors to understand and unlock the value of open data by focussing on infrastructure, skills and open innovation.


Finally, we’re launching a report today on the grocery retail sector and GDPR, based on consumer research, sector interviews and our thinking about sectors. We want to encourage the retail sector to work together to focus on opportunities, and to use the data it holds in ways that both build trust with shoppers and give them better services.


But there’s an important point to understand with all of these levers. We are not building a new product or smartphone game. We are changing systems. This takes time. We are only a few decades into a large wave of technology driven change that will take many more decades to see through to the end.

Take geospatial data. People have been campaigning for open UK geospatial data for decades. Just last week there was another major commitment: a new Geospatial Commission and £80m of new government funding to maximise the value created by location data, starting by opening up the UK government’s most detailed maps. It will take a few more years before the impact of that commitment is fully seen.


And that’s why there’s another vital lesson. Having fun. Being optimistic. Sometimes it can feel like things are moving slowly or in a bad direction and that things will never get better. But just as open is a political statement, so optimism is a political act. Having fun helps me be optimistic. Choosing to be optimistic both helps the day go faster and creates the momentum we need to help create a better future.

Thank you.

Learning from historical waves

As I’ve been starting to get to grips with technology policy over the last few years one of the things that has fascinated me is how little reference to history there is. When I read historical books and talk to people about technology and innovation history I find some frequent gaps. We need to learn from history if we are to make the best of the opportunity created by the current waves of innovation and technology.

Whatsapp and Columbus

The Landing of Columbus by John Vanderlyn

For example, people talking about the wonders of technology talk about how few staff WhatsApp had when they were bought by Facebook, yet don’t talk about how few people sailed in the Niña, the Pinta, and the Santa Maria when Columbus sailed across the Atlantic. After Columbus’ expedition more and more people crossed the Atlantic, for exploration, for business and for pleasure.

WhatsApp’s success built on the internet, the web, cryptography and smartphones. Similarly Columbus relied on inventions in navigation and shipbuilding. Neither could have achieved what they did without those previous inventions. Are they analogous?

Learning lessons from history

Recently I read a couple of books that helped me sort out some of my thinking about lessons from previous waves of technology-driven change. The books were Ruling The Waves by Deborah L. Spar and The Master Switch by Tim Wu. They are good books. If you’re interested in technology policy you should read them too. I’ll lend you my copies if you want.

Ruling The Waves uses ocean sailing, telegraph, radio, satellite television, cryptography, personal computer operating systems and digital music to explore innovation. It proposes that they show four common phases: innovation, commercialisation, creative anarchy and rules. Different actors dominate in each of those phases.

There are piratical adventures in the early years before the surviving, and now dominant, winners encourage government to work with them to bring order to the new technology. Using the model of this book shows that my silly WhatsApp/Columbus analogy is fatally flawed. Columbus was in the innovation phase, while WhatsApp (and other messaging services) are in either the creative anarchy or rules phase. They’re very different kinds of innovators.

Ruling the Waves argues that the eventual rules tend to be dominated by intellectual and property rights. It shows that it can take decades, or even centuries, from innovation until stable rules are in place.

The Master Switch uses the Greek myth of the titan Kronos devouring his children as an analogy for existing monopolies devouring startups. This is Goya’s version of that myth, using the titan’s Roman name of Saturn.

The Master Switch looks at lessons from the telephone, radio, broadcast and cable television, and Apple to propose that all information technologies go through a cycle of decentralisation to centralisation ending with a corporate (or state) monopoly where innovation, the economy and consumers suffer.

It argues that a separation principle can help prevent this fate.

This principle would keep a distance between young industries and existing monopolies to enable new technologies to show their worth; between different markets to make it harder for monopolies to spread; and between the public and private sectors to prevent government from favouring friendly monopolies.

After reading the books I was more convinced than ever that the waves of change brought about by the internet and web will take decades, if not centuries, to be absorbed into our societies. It is seductive but false to think that we can legislate for technology and data quickly. We have to allow for experiments to learn the right legislative and regulatory frameworks.

Gaps in the lessons

But there were gaps in the books. That’s not unique. I see the same gaps in lots of technology policy and thinking.

Despite the best efforts of Victorian inventors the vast majority of dinner tables do not yet feature a miniature railway delivering food to bearded men. Picture from Victorian Inventions by Leonard de Vries

Major enabling waves of technology like the internet and web underpin lots of other innovation — like smartphones, social media and search engines—that each have their own journeys to go through. Some of these smaller waves will have lasting impact, some may disappear and get washed away, others are badly timed and will come back in a while. But the waves don’t stop. They are continuous. That is one of the reasons why open culture is so important. It keeps us open to innovation, new ideas and challenges from outside of a small circle of friends and organisations.

Both books miss the impact of data in the current period of change and that much of this data is personal data. It is data about you, me and billions of other people. Most data is about interactions between people, or between people and organisations staffed by other people. It is difficult, if not impossible, to determine who ‘owns’ data. For most data there will be multiple people and organisations who have rights. This makes it hard to rely on property rights as a way to shape and bring rules to the market. The challenge of building good governance for data infrastructure will need a more systemic response than property rights.

There’s a whole world of innovation out there. (Gall-Peters projection, image by Strebe CC-BY-SA 3.0)

The books also focus on the US and UK, with some excursions into mainland Europe. While they describe the differences between European and US approaches to regulation, with Europe typically intervening more, I would love to see more about the lessons learned by other countries. The web, the internet and data infrastructure cross, and therefore soften, national boundaries. Learning from and listening to other countries and societies will become even more important as these waves of technology reach their full power. These excellent recent reports from the Web Foundation are useful for those in a US/UK filter bubble who want to start listening more widely.

Innovation has limits

And finally both books miss the influence of societies and people. They are books about economy, regulation and business. They miss the social side of the change.

Lots of the impact of technology is societal as well as economic. Similarly the forces that impact on and affect technology change are both societal and economic. People adapt to technology and innovation, but sometimes they push back and reject it. Those rejections can be learned from.

The innovations that led to Christopher Columbus crossing the Atlantic also led to industrialised slavery. Slavery might have helped create the modern world but it is an evil that should not have happened and should not still be happening. We could have intervened earlier and stronger to stop it. A modern world similar, but not the same as, our current one would still have been built. It would have taken longer but it would have damaged billions fewer people in the process. Our societal norms now reject slavery and many of the other things that that particular innovation enabled.

As our societies matured we embedded some of those societal norms and values into legislation. Human rights, workers’ rights, anti-discrimination, health and safety, and data protection are some obvious examples. They are strong signals from society indicating where innovation is encouraged and where it isn’t.

The precise rules will vary by country, and the boundaries of legislation will need to adapt as we learn how to do things better, but at the core of the legislation are societal norms and values. We cannot and should not forget our values as we go through this wave of change. Those values do change, but that change should be vigorously and openly debated.

Something the team at the ODI say a lot.

Innovation can take strange paths and be used for unintended purposes. We need to engage and work openly with societies and people if we are to both understand the limits and share the benefits of the current waves of technology.

What does this have to do with my job?

Over the last couple of years I’ve been working at the Open Data Institute, where I spend about 50% of my time working with the private and public sectors delivering projects and building services. We help businesses and governments understand and adapt to the wave of change brought about by data. The other 50% of my time is spent developing our policy thinking based on what I, the rest of the team, and our network learn from delivery and research.

In that second half of my time one of the many things I’ve been helping on is developing a line of thinking that data is becoming a new form of infrastructure. That a data infrastructure which is as open as possible is one that will create the most impact and be best for people, businesses, societies and the planet and that we need to build an open future for data.

Clearly data is not “good” infrastructure right now, too many people can’t get the data that they need, so we think a lot about how governments and businesses can help strengthen it. We look at history when we do that. This is all part of my research. How did we recognise things becoming infrastructure in the past? How did we learn how to design and build good infrastructure? How long did it take? Do historical examples contain useful lessons?

What should I read next?

Anyway, like all of my blogs, I’m thinking out loud. These are some of the things my recent work and reading about history has made me think about. The gaps in the last two books led me to pick a book on the anthropology of roads as my next one. What should I read or who should I talk to after that?
