The UK has an official list of building addresses and their locations – ‘address data’. This data is a vital resource for building public and private services that rely on locations, and is part of our national data infrastructure. At the moment, the UK’s address data is expensive, hard to access, not always accurate, and hard to correct. This causes problems for businesses and other organisations that rely on address data – and ultimately it affects us all.
An amendment that would require the government to publish a list of UK addresses for free was debated in the House of Lords last week. Owen Boswarva has extracted the relevant passages from the full Hansard record, and James O’Malley has videos.
The debate had contributions from Labour, Liberal Democrat, Green and Conservative backbenchers. The Minister for the Conservative government then rejected the amendment.
Reading and watching back the debate made me think about three things:
- The government agreed to share deeper analysis, which is good news
- But it misunderstands why previous attempts to recreate the UK’s address data failed, which is bad news – and not just for addresses
- The risks of openly publishing address data, or of not publishing it, are misunderstood
The government agreed to share deeper analysis
The Minister said that they were “very happy to share deeper analysis” of address data. This is good news, both because better evidence can lead to a better debate and because it indicates that the government actually has some analysis.
The Geospatial Commission said they had no analysis
In 2022 the Geospatial Commission responded to a Freedom of Information (FOI) request by saying that it did not assess address data when preparing its strategy. Similarly, in 2023, when it was agreeing a £31m contract with Royal Mail, the Commission said that it had not performed any analysis of the costs, benefits or alternative options.
There were some previous projects that did do deeper work. For example, in 2017 the government spent £500k, out of a potential budget of £5m, investigating how to create an open address file.
The results of a 2017 project were not published
Unfortunately, 95% of that money was spent by the Ordnance Survey, and the government has refused to share the results. Perhaps now is the time to share the work that the Ordnance Survey did?
Those FOI requests, and Baroness Bennett’s question about the benefits seen by other countries that have openly published address data, point to the kind of ‘deeper analysis’ that should be performed and made available.
A misunderstanding of why previous attempts to recreate UK address data failed
The Minister referred to previous attempts to recreate the UK’s address data, saying:
“the resulting dataset had, I am afraid, critical quality issues”.
VISCOUNT CAMROSE
I spent part of 2014/15 working on a project to recreate the UK’s address data, and that was not why our project stopped. The Minister might want to ask officials for more details, as we learned some interesting lessons that the government needs to learn too.
The kind of innovation that government policy wanted to support
Our approach to recreating the UK’s address data was to start with data that the UK government already publishes. In line with the government’s “open by default” data policy, organisations like the Land Registry, Companies House, and the Valuation Office Agency spend money to make the data they hold available for other people to use. Some of this data contains address information.
We took this government data and extracted the addresses to form a starting dataset of millions of records. We could then ‘learn’ additional addresses through a combination of statistical techniques and information provided, with meaningful consent of course, by users of address services. This was all built into an API designed to make online services work better for more people.
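The post does not describe the matching techniques in detail. Purely as an illustration of the first step (pulling addresses out of several published datasets and merging duplicates into one starting dataset), a minimal Python sketch might look like this. The source names, example addresses, abbreviation table and the `normalise`/`merge` functions are all hypothetical, not taken from the original project, and real address matching needs far more rules than this.

```python
import re

# Hypothetical example records: the source names and address strings are
# illustrative only, not taken from any real government dataset.
SOURCES = {
    "land_registry": ["10 High Street, Anytown, AB1 2CD",
                      "Flat 2, 5 Mill Lane, Anytown, AB1 3EF"],
    "companies_house": ["10 HIGH ST, ANYTOWN, AB1 2CD",
                        "Unit 4 Station Road, Othertown, XY9 8ZZ"],
    "valuation_office": ["5 Mill Lane Flat 2, Anytown, AB1 3EF"],
}

# A few common abbreviations to expand before comparing (an assumption;
# a real system would need a much larger, carefully tested rule set).
ABBREVIATIONS = {"st": "street", "rd": "road", "ln": "lane"}


def normalise(address: str) -> tuple:
    """Reduce an address string to a crude comparable key."""
    text = address.lower()
    # Assume the postcode is the last comma-separated part.
    *rest, postcode = [part.strip() for part in text.split(",")]
    tokens = re.findall(r"[a-z0-9]+", " ".join(rest))
    tokens = [ABBREVIATIONS.get(t, t) for t in tokens]
    # Sorting tokens lets "flat 2 5 mill lane" match "5 mill lane flat 2".
    # This is deliberately crude and can merge addresses that differ only
    # in word order (e.g. "flat 5, 2 ..." vs "flat 2, 5 ...").
    return (postcode.replace(" ", "").upper(), tuple(sorted(tokens)))


def merge(sources: dict) -> dict:
    """Build a starting dataset: one entry per normalised key, recording
    an example spelling and which sources mention that address."""
    merged = {}
    for source, addresses in sources.items():
        for address in addresses:
            key = normalise(address)
            entry = merged.setdefault(key, {"example": address, "sources": set()})
            entry["sources"].add(source)
    return merged


if __name__ == "__main__":
    for entry in merge(SOURCES).values():
        print(entry["example"], sorted(entry["sources"]))
```

With these five example records the sketch produces three distinct addresses, two of which are confirmed by more than one source. Recording which sources mention each address is one simple way a starting dataset could carry some indication of confidence before any statistical ‘learning’ is layered on top.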
We intended to make the bulk data available for free, and then generate just enough revenue for sustainability – perhaps from high volume users of the API. We set ourselves up as a not-for-profit company.
It was the kind of innovation that the government’s open by default policy is intended to support.
Much of the government’s open data was not ‘open’, which creates legal risks
Unfortunately we found that much of the government’s open data was not actually ‘open’.
The government’s copyright licence (the Open Government Licence, or OGL) excludes third-party intellectual property rights. The third parties who hold IP rights in address data, Royal Mail and the Ordnance Survey, are litigious, and many of the government organisations that published the data could not say clearly whether Royal Mail or the Ordnance Survey held rights in the data they published. We only used datasets where the publishing organisation told us it was ‘safe’.
But even though government organisations were publishing the data, they would not be liable if there was a legal issue. We would be. So we needed insurance cover.
Given the risks, only one insurance company was willing to offer cover, and only on unrealistic terms. So we stopped the project.
To put it another way, an innovative, not-for-profit business could not use the data that multiple government organisations published to support innovation, because another government organisation might take legal action.
There are new plans to publish more government data, and they risk the same problems
Zooming forward from the ancient history of 2014/15 to the present day, various UK government departments are making new plans to publish more government data.
This is because of initiatives like the Vallance report on pro-innovation regulation of technologies and a desire to support the UK’s AI industry. High-quality, authoritative government reference data is one way of reducing the hallucinations that the current generation of AI models suffer from. Sounds sensible, right?
Publishing widely used address data is a lot simpler and safer than much of the planned work, and yet the government failed to do it in a way that let organisations clearly understand what they legally could, or couldn’t, do with the data. Will this new wave of government data come with instructions telling AI models and engineers not to do anything with addresses? What other third-party rights might be lurking in there? Or will the government just make AI’s copyright issues even more complicated?
If the government does not understand why its previous attempts to publish data did not yield the desired benefits, then I fear a lot more money will be wasted in the future.
The risks of openly publishing address data are misunderstood
In the debate, Lord Bassam said:
“there is a balance to be struck between privacy issues and the need to ensure that service delivery and commercial activity operate on a level playing field”
LORD BASSAM
It is good that politicians consider privacy issues, but this misunderstands the risks.
Address data does not create new privacy risks
The list of addresses does not tell us where specific individuals live; the only personal data involved is likely to belong to people who have named their business address after themselves. Instead, address data tells us where people might live, work and play, but not who is living, working or playing there.
(As an aside, I don’t want to imply that non-personal geographic data carries no risk of privacy, or other human rights, breaches. For example, in a separatist war in Sudan in 2011, atrocities were carried out because satellite data showed where particular groups of people were. But, hopefully, the UK is a long way from a separatist war and, let’s be honest, truly harmful actors will either simply buy the address data or use an illegal copy.)
The harms created by the lack of access to address data are more pressing
By contrast, Lord Clement-Jones pointed out that
“The harms created by the lack of access to address data are more pressing”
LORD CLEMENT-JONES
Meanwhile, Baroness Harding pointed to problems with the quality of the current data, saying:
“the quality of the data is not good enough…. Anybody who has tried to build a service that delivers things to human beings in the physical world knows that errors in the database can cause huge problems. It might not feel like a huge problem if it concerns your latest Amazon delivery but, if it concerns the urgent dispatch of an ambulance, it is life and death.”
BARONESS HARDING
Elsewhere, the National Audit Office has pointed to the challenges of creating and using the shielding list of people with extreme clinical vulnerabilities during the pandemic. One of those challenges was address data held in inconsistent formats across different IT systems and organisations. This is one of the many challenges that opening up the official list of addresses will help with, because over time more organisations will refer to and use the same reference data.
If the funding model changes then will quality drop?
There is one risk that was not discussed in the debate, though.
If the maintenance and publication of address data are no longer funded from licence fees collected by Royal Mail and the Ordnance Survey, will the quality drop?
This is where there is an important balance to be struck, as people and organisations need the right incentives to publish useful data.
Bluntly, this is the risk I worry about the most. Money is only one type of incentive, but it is an important one in this context, and it is one of the reasons why I’m so keen to see some deeper analysis of the current costs.
Experience tells me that the current costs are significantly overstated – particularly those of Royal Mail, which claims costs of ~£25m/year for ~300,000 changes/year. But however much the costs can be reduced, it will still cost money to publish quality address data.
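To put Royal Mail’s claimed figures in perspective, dividing the approximate annual cost by the approximate number of changes gives a rough cost per change. Both inputs are the approximate figures quoted above, so the result is only indicative:

```latex
\[
\frac{25{,}000{,}000\ \text{GBP/year}}{300{,}000\ \text{changes/year}} \approx 83\ \text{GBP per change}
\]
```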
Making the publication of the data a statutory duty, as this amendment would have done, is one way to help tackle this risk. It requires the government to fund and do the work.
Perhaps the money could come from general taxation, helped by the increase in economic activity that publishing the data will bring? Or from a small increase in the registration fees collected by local authorities, who do most of the work of creating addresses? Or from a small increase in Land Registry transaction fees? After all, the Land Registry handles nearly 50 million transactions per year.
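As a purely illustrative calculation, suppose a sum equal to Royal Mail’s claimed ~£25m/year were recovered solely through Land Registry fees (a hypothetical, since the options above could be combined and the claimed cost may well be overstated). Spread over roughly 50 million transactions, that would add about 50p to each one:

```latex
\[
\frac{25{,}000{,}000\ \text{GBP/year}}{50{,}000{,}000\ \text{transactions/year}} \approx 0.50\ \text{GBP per transaction}
\]
```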
Other countries have changed legacy business models, and the UK should too
Whatever the final decision, it will need coordination and activity from a few public bodies willing and able to work together to publish address data as a public service.
And that’s where I hope the government is really focussing its analysis. Not on whether to publish address data for free, but on how to do it.
Because in the 21st century it is pretty sensible for high-income countries to make reference data, like addresses, as widely available as possible. That is why peers from so many different parties supported this amendment, and why so many other countries are doing the work.
The hard part of the work is changing the legacy business models and incentives of government organisations so that they make it happen. Other countries have done that, and it’s long past time for the UK to do the same.