Author: peterkwells

Two events on different types of ‘public good’

‘Public good’ is a term that is frequently used in debates about data, statistics and AI. It has featured prominently in UK government policy and strategy, but the term isn’t always well defined or explored.

But in the UK it can seem like there is little recognition that there are different uses of the term ‘public good’ in debates elsewhere in the world, particularly with the emergence of the concept of ‘digital public good’. Sadly UK policy seems to have gradually become more insular over the last decade as the country has wrangled with the results of the Brexit referendum and a seemingly never-ending carousel of ministers with responsibility for this area.

The UK’s new Labour government’s draft industrial strategy, published in 2022, said that “our second mission will be to harness data for the public good“. It has created a new “digital centre” within DSIT and plans to build a National Data Library.

It seemed a good time to explore the concept of ‘public good’ a bit more, so Job de Roij and I from the RSS’s Data Ethics and Governance section have organised a couple of online events with speakers with expertise in both the UK and more globally.

Both events are free for RSS members, and £10 for non-members.

Read on for more about the two events, the confirmed speakers, and some background reading on the topic.

Event 1: For the public good

The UK Statistics Authority’s Five Year Strategy 2020-2025 is called Statistics for the public good. It also features in Labour’s draft industrial strategy.

While some work has been undertaken to unpick this term by understanding how the UK public think of it, and to review how the public good can be enhanced to support policy makers, regulators and practitioners, there is more work to be done to ensure that statistics and statistical processes truly serve the good of the UK public.

Speakers

Starting questions

What do we mean by “serving the public good”? What are the gaps in our understanding of how to make things serve the public good? How do we fill those gaps?

Event 2: As a public good

Meanwhile there is growing attention around the world to the concept of ‘digital public goods’.

The UN defines digital public goods as “solutions and systems that enable the effective provision of essential society-wide functions and services in the public and private sectors”.

Identity assurance and payment systems are well-known examples of digital public goods. Certificate transparency, which underpins website security, is a less well-known example.

In the world of statistics things like national statistics or the new ONS Integrated Data Service could be grouped into the concept of digital public goods. But what other kinds of digital public goods might, or should, exist that are relevant to statisticians? 

Speakers

Starting questions

What is a digital public good? When are data, AI and statistics public goods, and when are they not? Are any digital public goods missing that could help statisticians serve the public good? How do we build and govern digital public goods?

Further reading

If you want to read more about this topic then some links are below. Skeet or mail me if you think other things should be added.

Three thoughts from last week’s address data debate

The UK has an official list of building addresses and their locations – ‘address data’. This data is a vital resource for building public and private services that rely on locations, and is part of our national data infrastructure. At the moment, the UK’s address data is expensive, hard to access, not always accurate, and hard to correct. This causes problems for businesses and other organisations that rely on address data – and ultimately it affects us all.

A bit of legislation that would require the government to publish a list of addresses for the UK for free was debated in the House of Lords last week. Owen Boswarva has extracted the relevant bits of text from the full Hansard record. James O’Malley has videos.

The debate had contributions from Labour, Liberal Democrat, Green and Conservative backbenchers. The Minister for the Conservative government then rejected the amendment.

Reading and watching back the debate made me think about three things:

  • The government agreed to share deeper analysis, which is good news
  • But it misunderstands why previous attempts to recreate a UK address file failed, which is bad news – and not just for addresses
  • The risks of openly publishing address data, or of not publishing it, are misunderstood

The government agreed to share deeper analysis

The Minister said that they were “very happy to share deeper analysis” of address data. This is good news, both because better evidence can create a better debate but also as it indicates that the government actually has some analysis.

The Geospatial Commission said they had no analysis

In 2022 the Geospatial Commission responded to a Freedom of Information (FOI) request by saying that it did not assess address data when preparing its strategy. Similarly in 2023, when the Geospatial Commission was agreeing a £31m contract with the Royal Mail, they said that they did not perform any analysis of the costs, benefits or alternative options.

There were some previous projects that did do deeper work. For example, in 2017 the government spent £500k, out of a potential budget of £5m, investigating how to create an open address file.

The results of a 2017 project were not published

Unfortunately 95% of that money was spent by the Ordnance Survey and the government has refused to share the results. Perhaps now is the time to share the work that Ordnance Survey did?

Those FOI requests, and Baroness Bennett’s question about the benefits that other countries who have openly published address data are seeing, provide tips on the kind of ‘deeper analysis’ that should be performed and made available.

A map of open address data around the world from OpenAddresses.io

A misunderstanding of why previous attempts to recreate UK address data failed

The Minister referred to previous attempts to recreate the UK’s address data, saying

“the resulting dataset had, I am afraid, critical quality issues”

Viscount Camrose

As someone who spent part of 2014/15 working on a project to recreate the UK’s address data, I can say that was not why our project was stopped. The Minister might want to ask officials for more details, as we learned some interesting lessons that the government needs to learn too.

The kind of innovation that government policy wanted to support

Our approach to recreating the UK’s address data was to start with data that the UK government already publishes. In line with the government’s “open by default” data policy, organisations like the Land Registry, Companies House, and the Valuation Office Agency spend money to make the data they hold available for other people to use. Some of this data contains address information.

We took this government data and extracted the addresses to form a starting dataset of millions of records. We could then ‘learn’ additional addresses through a combination of statistical techniques and information provided, with meaningful consent of course, by users of address services. This was all built into an API designed to make online services work better for more people.
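The extract-and-merge step described above can be sketched roughly as follows. This is an illustrative outline only: the dataset contents and the normalisation rules are hypothetical, not the ones our project actually used.

```python
import re

def normalise(address: str) -> str:
    """Canonicalise an address string so the same address
    from different datasets compares equal."""
    a = address.upper()
    a = re.sub(r"[^\w\s]", " ", a)   # drop punctuation
    a = re.sub(r"\s+", " ", a).strip()
    return a

def merge_addresses(*datasets):
    """Combine address lists extracted from several published
    datasets, keeping one record per distinct normalised address."""
    merged = {}
    for records in datasets:
        for raw in records:
            merged.setdefault(normalise(raw), raw)
    return merged

# Hypothetical extracts from two open government datasets
land_registry = ["1 Acacia Avenue, NE26", "2 Acacia Avenue NE26"]
companies_house = ["1 ACACIA AVENUE NE26", "3 Acacia Avenue, NE26"]

combined = merge_addresses(land_registry, companies_house)
print(len(combined))  # 3 distinct addresses
```

The ‘learning’ of additional addresses from user feedback would then add new keys to the same canonical set over time.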

We intended to make the bulk data available for free, and then generate just enough revenue for sustainability – perhaps from high volume users of the API. We set ourselves up as a not-for-profit company.

It was the kind of innovation that the government’s open by default policy is intended to support.

Much of the government’s open data was not ‘open’, which creates legal risks

Unfortunately we found that much of the government’s open data was not actually ‘open’.

The government’s copyright licence (the Open Government Licence, or OGL) excludes third party intellectual property rights. The third parties who hold IP rights in address data, Royal Mail and the Ordnance Survey, are litigious, and many of the government organisations that published the data were unable to be clear on whether or not there were Royal Mail or Ordnance Survey rights in the data they published. We only used datasets where the publishing organisation told us it was ‘safe’.

But even though it was government organisations publishing the data they would not be liable if there was a legal issue. We would be. So we needed insurance cover.

I am reliably informed that multiple people received legal warning letters for this Private Eye piece that used address data to understand foreign ownership of UK properties. I wonder how Private Eye responded.

But given the risks only one insurance company was willing to offer cover and that was on unrealistic terms. So, we stopped the project.

To put it another way, an innovative, not-for-profit business could not use the data that multiple government organisations published to support innovation, because another government organisation might take legal action.

There are new plans to publish more government data, and they risk the same problems

Zooming forward in time from the ancient history of 2014/15 and back to the present day, various UK government departments are currently making new plans to publish more government data.

This is because of initiatives like the Vallance report on pro-innovation regulation of technologies and a desire to support the UK’s AI industry. High-quality, authoritative government reference data is one way of reducing the hallucinations that the current generation of AI models suffer from. Sounds sensible, right?

But publishing widely used address data is a lot simpler and safer than much of the planned work, yet the government failed to do so in a way that allowed organisations to clearly understand what they legally could, or couldn’t, do with it. Will this new wave of government data come with instructions telling AI models and engineers not to do anything with addresses? And what other third party rights might be lurking in there? Or will government just make AI’s copyright issues even more complicated?

If the government does not understand why its previous attempts to publish data did not yield the desired benefits then I fear a lot more wasted money in the future.

The risks of openly publishing address data are misunderstood

In the debate Lord Bassam said

“there is a balance to be struck between privacy issues and the need to ensure that service delivery and commercial activity operate on a level playing field”

Lord Bassam

It is good that politicians consider privacy issues, but this misunderstands the risks.

Address data does not create new privacy risks

The list of addresses does not tell us where specific individuals live; the only personal data involved is likely to be that of people who name their business address after themselves. Instead, address data tells us where people might live, work and play, but not who is living, working or playing there.

(As an aside: I don’t want to imply that there are no risks of privacy, or other human rights, breaches with non-personal geographic data. For example in a separatist war in Sudan in 2011 atrocities were carried out because satellite data showed where particular groups of people were. But, hopefully, the UK is a long way from a separatist war and, let’s be honest, truly harmful actors will either simply buy the address data or use an illegal copy.)

The harms created by the lack of access to address data are more pressing

By contrast Lord Clement-Jones pointed out that

“The harms created by the lack of access to address data are more pressing”

Lord Clement-Jones

While Baroness Harding pointed at the issues with the current data quality, saying:

“the quality of the data is not good enough… Anybody who has tried to build a service that delivers things to human beings in the physical world knows that errors in the database can cause huge problems. It might not feel like a huge problem if it concerns your latest Amazon delivery but, if it concerns the urgent dispatch of an ambulance, it is life and death.”

Baroness Harding

Elsewhere the National Audit Office has pointed to the challenges of creating and using the shielding list of people with extreme clinical vulnerabilities during the pandemic. One of the challenges was inconsistent address data in different formats in different IT systems and organisations. This is one of the many challenges that opening up the official list of address data will help with, because over time more organisations will refer to and use the same reference data.
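The benefit of shared reference data can be sketched in miniature. In the illustrative example below, two IT systems format the same address differently, but both can be resolved to the same identifier in a common reference list (the reference data and lookup rules here are hypothetical; the UK’s real identifier of this kind is the UPRN, the Unique Property Reference Number).

```python
# Hypothetical canonical reference list: address -> identifier
reference = {
    "1 ACACIA AVENUE NE26": 100000000001,
    "2 ACACIA AVENUE NE26": 100000000002,
}

def to_uprn(raw: str):
    """Match a messy address string against the shared reference
    data, returning its identifier or None if no match."""
    key = " ".join(raw.upper().replace(",", " ").split())
    return reference.get(key)

# The same property, formatted differently by two IT systems,
# resolves to the same record
print(to_uprn("1 Acacia Avenue, NE26"))   # 100000000001
print(to_uprn("1  ACACIA AVENUE NE26"))   # 100000000001
```

Once both systems hold the same identifier, combining their records, as the shielding list required, becomes a simple join rather than fuzzy string matching.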

If the funding model changes then will quality drop?

There is one risk that was not discussed in the debate though.

If the maintenance and publication of address data is not funded from licence fees collected by Royal Mail and Ordnance Survey then will the quality drop?

This is where there is an important balance to be struck as people and organisations need the correct incentives to publish useful data.

Bluntly, this is the risk I worry about the most. Money is only one type of incentive but it is an important one in this context and it is one of the reasons why I’m so keen to see some deeper analysis of the current costs.

Experience tells me that the current costs are significantly overstated – particularly the Royal Mail who claim costs of ~£25m/year for ~300,000 changes/year. But however much the costs can be reduced it will still cost money to publish quality address data.
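Taking the quoted figures at face value, the implied cost per change is easy to compute:

```python
annual_cost = 25_000_000      # Royal Mail's claimed cost, ~£25m/year
changes_per_year = 300_000    # ~300,000 address changes/year

cost_per_change = annual_cost / changes_per_year
print(f"£{cost_per_change:.0f} per change")  # → £83 per change
```

Around £83 to record each change is the kind of number that deeper analysis ought to test.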

Making the publication of the data a statutory duty, as this amendment would have done, is one way to help tackle this risk. It requires the government to fund and do the work.

Perhaps the money might come from general taxation, and the increase in economic activity that will come from publishing the data? Or perhaps from a small increase in registration fees collected by local authorities who do most of the work to create addresses? Or a small increase in the Land Registry transaction fees, after all they handle nearly 50 million transactions per year?

Other countries have changed legacy business models, the UK should too

Whatever the final decision it will need some coordination and activity from a few public bodies willing and able to work together to publish address data as a public service.

And that’s where I hope the government is really focussing its analysis. Not on whether to publish address data for free, but on how to do it.

Because in the 21st century it is pretty sensible for high-income countries to make reference data, like addresses, as widely available as possible. That is why peers from so many different parties supported this amendment, and why so many other countries are doing the work.

The hard part of the work is changing the legacy business models and incentives of government organisations so that they make it happen. Other countries have done that, and it’s long past time for the UK to do the same.

A list of books I read in January 2024

Note

I read a lot of papers and blogposts for work too. Things that I need to write down so they can be found again tend to be bookmarked on Pinboard.

How much extra spam will the UK’s Data Protection and Digital Information Bill create?

The UK’s Data Protection and Digital Information Bill continues to work its way through Parliament. The UK government hopes to get it completed in the first half of 2024.

The bill is complex with lots of different parts. When the UK government first started promoting the bill they said that one of the ways it would help the public was by reducing cookie pop-ups, reducing the chance of people being pestered by seemingly unnecessary alerts.

Unfortunately, the bill will do little about cookies – that’s a problem that industry is trying to ‘solve’ – but it looks like it could significantly increase the amount of unwanted spam and letters that people receive. From some figures it looks like there could be a 25% increase. Uh oh.

Image by DALL-E and me.

The bill makes it easier for more organisations to send unwanted mail

Current UK legislation and guidance effectively says that unless organisations have consent then they need to carry out a number of tests to decide whether they have a ‘legitimate interest’ in sending direct marketing to people.

The data protection regulator says that those tests mean organisations need to consider things like the nuisance factor of unwanted adverts and the effect they might have on people in vulnerable situations. Sounds sensible.

If the regulator’s guidance is not followed then organisations can be fined. That also sounds sensible.

But the new bill explicitly says that direct marketing – a category that includes things like posted or emailed adverts – is an example of a legitimate interest.

The bill is long and complex (you’ll find the change in the section on “lawfulness of processing”), but what does it mean for people?

Industry thinks this will mean that a lot more money is spent on adverts

At a conference last year the CEO of the Direct Marketing Association said that this change is an important clarification, and that they expect it to mean that an extra £250m will be spent on printing and posting adverts through people’s letterboxes.

The Advertising Association said that £1.1bn was spent on direct mail in 2021 so an extra £250m means about a 25% increase in the amount of printed adverts that we’ll all get.

If those estimates are correct then it seems reasonable to think that there’ll be a similar 25% increase in the number of emailed adverts.
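The arithmetic behind that rough 25% figure, using the industry numbers above:

```python
current_spend = 1_100_000_000   # £1.1bn spent on direct mail in 2021
extra_spend = 250_000_000       # expected extra £250m

increase = extra_spend / current_spend
print(f"{increase:.0%} increase")  # → 23% increase, roughly the 25% quoted
```

So the claim is a little generous, but the order of magnitude holds: roughly a quarter more printed adverts.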

Obviously some adverts are useful, but people hate spam

Last year I worked on a project with the team at IF that researched how people felt about advertising.

It was pretty clear that most people like some advertising – I mean, who wouldn’t want a discount for their favourite food in the middle of a cost of living crisis?

But it was also very clear that people hated unwanted advertising, particularly when it came through their letterboxes and into their emails, and that there was too much of it already.

Another image by DALL-E and me. I have a lot less hair than this.

Unwanted adverts make life harder for everyone because we need to wade through them to find meaningful things, like the increasing number of notifications that public services send us about our taxes, health, or benefits.

It makes it particularly hard for people in vulnerable situations. Some people find it harder to sift through the volume of letters and emails to find the important things, while people with specific vulnerabilities might be targeted by bad organisations.

That is why the ICO recommends those tests under the current legislation. It helps reduce the proportion of unwanted, or actively harmful, adverts that people receive.

In IF’s research we also found that people wanted other ways to reduce unwanted adverts, for example by using their legal right to object. Unfortunately that legal right is not being respected.

Instead of fixing these things – and giving people more useful and controllable advertising – the government seems to be changing the legislation so that more adverts can get sent.

Do the industry’s figures on increased spend on advertising feel ‘right’?

It is genuinely hard to tell if the industry’s figures are accurate. 

Government has published an impact assessment for the bill. It says that this change will save organisations about £4.5m per year and notes the potential risks to people in vulnerable situations.

The impact assessment does not attempt to quantify those risks whether in monetary terms, in terms of the number of people affected, or the number of extra unwanted adverts that people will receive. It does not bring to life how an increase in marketing will affect people.

But will advertisers really spend even more money on advertising? Or just shift it between different types of advertising like direct mail, email and online adverts? Perhaps they will spend more money but it will simply get swallowed up within the opaque online advertising industry?

Who knows.

But given that the expected benefits are a tiny £4.5m a year in reduced costs, perhaps more people should be asking how much extra spam people will get in return?

Robots terms of service

In 2023 one of the AI debates was about when information and data on the web can be used to train AI models.

In late December we saw another billion dollar court case as the New York Times alleged that Microsoft and OpenAI had unlawfully used news articles to create AI models. 

In 2024 and beyond, as well as the debate about how information can be used in relation to AI, I expect we’re going to see more debate about how services can be used by AI.

If we peer into the future, perhaps we need terms of service for robots?

AI services will connect services from multiple existing organisations in new ways

As Sarah Gold puts it, “when applied to technical infrastructure, LLMs become a kind of connective tissue…[they] will connect different systems – at scale. They will execute complex and multi-part tasks, across different departments and organisations”.

From a consumer perspective this will manifest as different kinds of services, ranging from learned services that are deliberately designed for particular tasks, like moving home or arranging a holiday, to more general-purpose AI agents that can help with a range of tasks.

The technology to enable these kinds of services is getting ever closer to working at scale, but services are not only made of technology.

A concept of a learned service that helps a family move home, by Projects by IF.

Service providers will have relationships with both users and AI providers

From the perspective of existing service providers this new wave of AI services will look like another relationship in addition to the existing relationship with service users. 

With AI agents there are important relationships between users, service providers, and the organisations that provide AI services. The new relationship between the AI service provider and its service users is also very important, but this post focuses on the relationship with existing service providers. Picture by me with assistance from DALL-E.

These kinds of three way relationships obviously already exist. Many people use travel agents to help arrange holidays. Supermarkets bring together food from multiple suppliers and make it available in one place. My sisters and I help my elderly mother use various services.

But AI has the potential to create new arrangements at speed, at scale, and without pre-existing contracts. To provide a simple example, an AI service could ring a series of hotels to make bookings for a train trip across Europe.

Many service providers will not be happy with AI services using their services

But just as existing service providers have not been happy with AI companies using information, many service providers will not be happy with AI services using their services.

Some of this discomfort will be from a simple fear of competition, but in other cases it will be because of other fears such as:

  • consumers being dissatisfied because a service does not meet their expectations, perhaps because an AI service generated an incorrect description of a hotel
  • risk of regulatory action, perhaps the AI service does not collect identity information in a way that meets local requirements
  • that it will generate degrading work for humans, for example through a large number of AI service providers using computers to make repeated phone calls for information
  • whether the existing service provider and AI service provider are receiving fair shares of the value created by the combined service

Robots terms of service

Some of these fears can, and will, be overcome by existing mechanisms.

Liability laws are being updated. AI services that take the mickey will be sued. Some AI and service providers will negotiate new contracts that create new rules for payment of commission, or for how workers should be treated. This will all need to happen across a large number of sectors, industries, geographies. 

But I also wonder if we need to look at some other existing concepts like terms of service, one of the, often lengthy, bits of legal text that humans get when we agree to use a service.

Picture by me with assistance from DALL-E.

If we are heading to a future where new three way relationships between humans, service providers, and AI-powered services can – and probably will – be created at speed, scale and without pre-existing contracts then, perhaps, service providers will need new terms of services that describe how AI robots can use their services?
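To make the speculation concrete, here is one possible shape for such terms, loosely analogous to robots.txt: a machine-readable policy that an AI agent checks before using a service. Everything here is hypothetical – the field names, the capabilities, and the idea that services would publish such a document at all.

```python
# A hypothetical machine-readable "robots terms of service"
# that a hotel's website might publish for AI agents
ai_terms = {
    "agents-allowed": True,
    "booking": {"allowed": True, "commission": 0.02},
    "phone-enquiries": {"allowed": False},  # no automated calls
}

def may_use(terms: dict, capability: str) -> bool:
    """Would an AI agent be permitted to use this capability?"""
    if not terms.get("agents-allowed", False):
        return False
    return terms.get(capability, {}).get("allowed", False)

print(may_use(ai_terms, "booking"))          # True
print(may_use(ai_terms, "phone-enquiries"))  # False
```

Like robots.txt, a policy file only works if agents are built, or required, to respect it; the harder questions of liability and enforcement would still sit behind it.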

Addresses, memes, and badges

A journalist managed to meme themselves, myself – and some wiser people – into a meeting with the Shadow Business Secretary, Jonathan Reynolds, to talk about how to free up the UK’s address data. There were even badges.

Jonathan Reynolds MP, James O’Malley, Hadley Beeman, Anna Powell-Smith and me. My PAF badge is in my pocket ‘cos wearing badges is not enough, in days like these.

Address data – i.e. the list of properties in the UK: 1 Acacia Avenue NE26, 2 Acacia Avenue NE26 – might seem a bit dull, but it’s actually really fundamental to our lives.

It’s digital infrastructure that’s used every time you order a pizza online, every time you get home insurance, and that time the government needed to combine umpteen datasets to work out who was most at risk from the pandemic and then help them shield from it.

Anyway, UK address data has been partly privatised (into the Royal Mail), partly behind paywalls (Ordnance Survey), and tangled up in some really complicated copyright and business model stuff.

That holds back innovation, investment and makes all of the services we use a little bit worse as people need to spend time wrangling money, copyright, licensing rather than cracking on with doing stuff that can make our lives better.

It’s possible for a government to fix that mess, but it needs some will power. And the last few governments have, to be kind, had other things to do.

Jonathan Reynolds posted on X about the event. Unfortunately X’s product radar is so broken that I can’t simply embed the post nowadays. So, here’s a screenshot.

So, in the run up to the next election a few of us are doing the rounds to ask the political parties who might form the next government whether they’ll have a go at sorting it out.

Anyway, you can read more on the meeting on James’ substack, and subscribe to his newsletter too.

If you subscribe then James might even send you a ‘free the PAF’ badge…

The Post Office scandal and the law about computer evidence

The Post Office scandal is one of the most outrageous miscarriages of justice in recent UK history.

For over a decade people were wrongly accused of defrauding the Post Office. Many were prosecuted. Many others will have paid money to the Post Office to stop the accusations. Eighty-three convictions have already been overturned, with more likely to come. There were jail sentences, deaths, divorces, bankruptcies and suicides before justice started to be restored. A public inquiry is under way.

One thing the scandal shows is that the legal presumption that computers are operating correctly risks causing a lot more harm.

The building on the left of this picture used to be a combined sweet shop and post office; it was run by my grandparents. Fortunately they retired before the Horizon system was implemented. Image © Google.

Some of the problems were caused by computers

The root cause of the scandal is often described as being bad technology. Horizon, a system developed and operated for the Post Office by Fujitsu, was clearly not up to the job. It incorrectly reported that subpostmasters owed the Post Office money. But stories from subpostmasters, journalists, court cases and the ongoing inquiry show a broader set of failures.

The failures spanned a full stack of user interfaces, technology, organisations and public policy. Image by Projects by IF.

Security researchers at University College London have done a neat summary of many of the technical issues with the Horizon system that were identified by a judge in 2019.

Nick Wallis’s book, The Great Post Office Scandal, also contains evidence of other issues. Sometimes Horizon’s touchscreen interface was too slow to keep up with the speed that people could enter orders, while the training and tools provided to subpostmasters were inadequate.

But there were organisational failures too

Employees of the Post Office and Fujitsu falsely told victims, politicians, and courts that the IT system was working correctly. Both organisations had evidence of its ongoing failures.

To make things worse the Post Office was in the rare position of having the legal power to both investigate and bring private prosecutions. There appears to have been little oversight of how the Post Office was using those powers by the Post Office’s sole shareholder, the UK government.

A report to the public inquiry that was published last week said that the policies and training provided to the Post Office’s investigation and prosecution teams were not adequate to ensure that investigations and prosecutions were fair, auditable, and in accordance with the interests of justice.

And the law is an ass

And, for the icing on the cake, the Post Office’s prosecutions relied on a piece of law that says that courts should presume that computers are operating correctly. It is for defendants to demonstrate that a computer is operating incorrectly.

As the Post Office case shows it is incredibly difficult for defendants to do that when some organisations are as broken and incapable of being honest as the Post Office and Fujitsu. The evidence that they held of system failures was not shared with defendants or the courts.

The UK government has said that it has no plans to review this piece of law. As the technology industry is currently moving towards more probabilistic forms of technology, like the current wave of AI foundation models, that will be an increasingly strange position to take.

This is a law that will not just harm people in extreme cases, like the Post Office, but in many other cases too.

But if the law is an ass, then the law is an ass. There need to be campaigns to review and change the law, but in the meantime we can also try and make the law less of an ass.

A nice picture of an ass that I found on the web.

Making the law less of an ass

Over the years most people that design, build and operate computer systems, products and services have realised that we need to mitigate the effects of computer faults.

But there’s less recognition, and less guidance, on how to reduce the chance of law, like the presumption that computers are reliable, being misused to harm people. The intent of this law was not to falsely jail people, but that is the outcome it has led to.

So, if you are responsible for designing, building or operating a computer system then you need to think about how and when to provide evidence that that system does have faults and flaws.

That means doing the basic things like collecting and storing evidence about problems with the computer, its user interfaces, and the surrounding organisational processes. The kind of things that most organisations do.
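One way to make that evidence harder to quietly discard is an append-only fault log in which each entry commits to the one before it. This is a minimal sketch of the idea, not a recommendation of any particular product or standard:

```python
import hashlib
import json
import time

def append_fault(log: list, description: str) -> dict:
    """Append a fault record whose hash chains to the previous
    entry, so later tampering with the history is detectable."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    entry = {
        "time": time.time(),
        "description": description,
        "prev": prev_hash,
    }
    entry["hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()
    log.append(entry)
    return entry

log = []
append_fault(log, "Balance discrepancy reported by branch terminal")
append_fault(log, "Touchscreen input dropped under load")

# Each entry commits to the one before it
assert log[1]["prev"] == log[0]["hash"]
```

Rewriting or deleting an old entry breaks every hash that follows it, which gives defendants and courts something to check rather than someone’s word to take.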

But it also means being prepared to do what can be a hard thing. To disclose that evidence, even when an organisation’s leadership – like the Post Office’s – decide not to.

Because a fair justice system that does not falsely prosecute and jail people is one of the most important foundations of society. And, as our society increasingly runs on computers, how the justice system handles evidence about computers is going to become ever more important to all our lives.

Why the public sector needs trust to innovate

The public sector is facing pressure to innovate in response to changing needs, financial pressures, and emerging technologies. To innovate it needs to be trusted. The public sector can earn and maintain trust by aligning data and technology use with democratic values.

The pressure to innovate is coming from multiple directions.

Read more

Part 1: Towards a market of trustworthy AI foundation models

Foundation models, such as OpenAI’s GPT-4 and Google’s LaMDA, underpin many recent AI services. These models, which include generative artificial intelligence and large language models, make services like OpenAI’s ChatGPT and Google’s Bard possible.

The UK Competition and Markets Authority (CMA) is carrying out an initial review of AI foundation models, focusing on potential competition and consumer protection considerations.

There are many possible intervention points where the introduction of new measures could reduce consumer harm. In this response from Projects by IF to the CMA we argue that the most effective measures will be to require model providers to deliver trustworthy foundation models, including supporting documentation and resources, that are designed to help product teams build better services.

IF’s Responsible Technology by Design framework

Read more

Want to reduce harmful design? Make good design easy.

People are rightfully concerned about being manipulated online. One of the causes is deceptive, or harmful, design practices. Two of the UK’s regulators have published a report on harmful design practices. This makes it even more important for organisations to act, but it is not enough. Regulators and service providers need to make it easier to do good design. Here’s where to get started.

Read more

The report says that cookie banners like this are harmful, and probably illegal, under both data protection and competition law.
