
Three policy ideas to help the UK adapt faster to the internet

The UK is having a general election on December 12th. Over the next week political parties will put out their manifestos. Those manifestos will contain lots of commitments about what the parties will do if they are elected.

When I looked at the manifestos for the last general election in 2017 I was disappointed at their lack of recognition of the changes the world was going through because of technology. To help this time, here are three simple tech policy ideas for any party. They’re focussed on helping the UK adapt to the current wave of technology change. They are a bit late for the manifestos, but they still might be useful.

A bit of context

First, a bit of context. Technology is always changing but it has changed a lot in the last few decades with the proliferation of computers, the internet, the web, and data. These technologies have changed things for governments.

Some citizens now have higher expectations from public services. They expect public services to behave like those they get from Google, Amazon or whichever service is hot this year, *checks notes*, such as ByteDance’s TikTok. Technology is enabling things that some may think should be public services — like accurate mapping data on smartphones, or being able to have a video call with a doctor.

Other citizens now have more fear. Perhaps because they are excluded from those services through a lack of skills or internet access, or perhaps because they are at risk of being discriminated against when technology is used to perpetuate, or accentuate, existing societal biases.

Using new technology to help deliver public services that work for everyone is a tough job that, despite good work by the Government Digital Service, government still has not cracked.

Image from For Everyone via the Web Foundation

New technology has also enabled new businesses, markets and types of services to emerge. Things like smartphones, social media, cloud computing, online retailers, online advertising, and the “sharing economy”. The world is now more interconnected. Someone in Wales can rapidly build an online service and start selling it to people in India, and vice versa. Meanwhile because the technologies have also been adopted by existing companies they affect government’s role in existing markets.

Technological waves of change like this are not new — I recommend reading some history about the after-effects of the invention of ocean sailing, printing, electricity, or television — but governments have been particularly slow to adapt to this wave of technological change.

Why? Perhaps because the technologies have changed things globally. Perhaps because of the type of governments that we have had. Perhaps because of lobbying by businesses. Who knows. Future historians will be better placed to assess this.

Anyway, my suggestions are not about the details of each of these areas. Instead they are about how to increase the rate of adaptation for the next government. About how to get more radical change.

Tackle the fear around technology and politics

There is a lot of fear about what technology means for politics. Misuse of data by companies and political organisations. Highly targeted advertising reducing accountability. Foreign governments interfering in elections. This fear is exacerbating a pre-existing low level of trust in and disengagement from UK democracy.

Political parties should start with themselves. They need to be open about how they are using data and online advertising and publish data about their candidates to help voters make more informed decisions. Political parties should not use micro-targeted advertising during the election, and should challenge their opposition to follow their lead. Where necessary they should err on the side of caution when using advertising tools. After all, much targeted advertising is already likely to be illegal under existing legislation. Doing these things will help politicians learn how to responsibly use technology while competing for power. That will help them use technology responsibly if they get into power.

Whoever gets into power should then ban targeted political advertising until it is shown to be reasonably safe. To understand the effects researchers will need access to data held by the big technology platforms like Facebook, Twitter, Google and Apple. Organisations in the USA have faced challenges when trying to do this with Facebook but approaches like the ONS ‘five safes’ and the Ministry of Justice data lab show that parts of the public sector have the necessary skills to design ways to do it. Government should use models like this to give accredited researchers access to data held by the platforms to inform future policy decisions and, perhaps, when to relax the ban for certain kinds of ads.

Develop technology literacy in more of the public sector

To implement a party’s manifesto commitments — whether it be implementing municipal socialism, moving to a zero carbon society, (re)creating an independent Scotland, agreeing new trade deals (if Brexit actually happens), free broadband, a charter of digital rights, or implementing an industrial strategy and increasing R&D — public sector staff need to understand how technology affects their work and technology experts need to understand the public sector.

Sometimes a horrified face emerges from behind my polite face. I apologise to everyone who has seen it.

Unfortunately too many people still do not get it. In my own meetings with governments I am often surprised, and sometimes horrified, by whole teams of people with limited technology literacy making significant decisions about technology. (Similarly, I am often surprised, and sometimes horrified, by teams of technology experts making significant decisions that impact on policy or operations with no real experience in those areas.)

Not every public sector worker needs to be a technology expert, and it is certainly not true that everyone needs to know how to code, but it is necessary to have technology literacy in many more parts of government. More public sector workers need to understand both the benefits and the limitations of new technology and the techniques that people, like me, use to build it.

This is one of the most important things to focus on. Different skills are needed by different roles, but an underlying element of technology literacy is useful for everyone.

To start providing this technology literacy I would recommend vocally demonstrating that technology experience is as valued as other skill sets, encouraging more technology experts to join teams that lack that experience, and seconding non-technology staff into technology teams. In both cases people can then listen to and learn from each other.

An independent inquiry into technology regulation

Finally, regulation. Technological change needs changes to regulators and can lead to the need for new ones. There are a growing number of known gaps in technology regulation. Some of these gaps affect public services, like the police. Others affect public spaces, like facial recognition. Some affect new services like social media. Others existing ones, like insurance. In some cases it is not clear if regulators are appropriately enforcing existing rules, like equalities and data protection legislation, while there will be a large number of gaps that people simply haven’t spotted yet.

Previous governments have set in train various initiatives such as considering the need for a new social media regulator, a national data strategy, and a Centre for Data Ethics and Innovation (CDEI), but these initiatives are not adequate. They are controlled and appointed by the current politicians, operate within current civil service structures, and are mostly taking place in London. The changes brought about by technology are too fundamental for this approach to work. The UK needs something more strategic, more radical, more independent, and more citizen-facing.

An independent inquiry into technology regulation should be set up. It should have representatives from around the UK; with different political views; with experience from the public sector, private sector and civil society; and from both citizens that love modern technology and from the groups that are most at risk of discrimination. It should look across the whole technology landscape, have the power to call witnesses, and be empowered to make a series of recommendations for changes to legislation and regulation to help set the UK on a better path for the next decade.

Inquiries like this can happen faster than you think. The recent German Data Ethics Commission took just 12 months to come up with a set of excellent recommendations. Setting a similar timescale for an inquiry in the UK will allow the next Parliament and the next Government to focus on delivery.

It is necessary and possible for the UK to adapt to technology faster

Politicians and their teams can learn how to use technology more responsibly by tackling the fear around technology and politics; mixing up teams in the public sector can help staff learn from each other; and an independent inquiry into technology regulation can help set the UK on a better path to the future.

The UK needs to adapt to technology faster. For the good of everyone in the UK, but particularly those who are being disadvantaged by irresponsible use of technology, can we do it? Please?

AI and the Committee on Standards in Public Life

The UK has a Committee on Standards in Public Life (CSPL). It advises the Prime Minister on ethical standards across the whole of public life in England (yes, only England — ethics must be a devolved matter).

A picture of some people by L S Lowry (via Flickr)

The committee is currently investigating Artificial Intelligence and whether the existing frameworks and regulations are sufficient to ensure that high standards of conduct are upheld as technologically assisted decision-making is adopted more widely across the public sector.

Big topic. After all, AI is a range of techniques that uses people, mathematics, software and data to make guesses at the answer to things. It can help, and hinder, many of the huge array of things that the public sector does.

I represented the Open Data Institute (ODI) on a roundtable for this investigation. A couple of people have asked me what the roundtable was like and what I said. Here’s a quick blogpost.

Preparing for a roundtable

The ODI team get invited to lots of roundtables and events. We decide which ones to do and who does them based on a range of criteria. The invitation for this one went to our CEO, Jeni Tennison, who passed it to me. My goal was to help the committee, learn from what other attendees were saying, and test some of our ideas in front of this audience.

We did our usual preparation by sharing the questions around the team in the office and telling our network that we were going along to hear what advice they gave us. That technique provides a lot of input. It also helps me represent the ODI and the ODI’s network, rather than simply myself and my own views.

I summarised it down to a few key points to try and make, and then tried not to over-prepare. Over-preparation is the worst sin: it makes me sound even duller than normal.

Rounding a table

The roundtable itself was at Imperial College in London.

The setup was more informal and the committee was more friendly and asked more insightful questions than most similar things I’ve done. That was good. My background is technical and private sector — I previously spent 20 years working with telecoms operators building products, systems and networks — so I always worry that I’ll misunderstand or miscommunicate particular words or phrases. That would damage both me and the organisation I represent.

Anyway, I managed to get over versions of some of the things that we’d prepared and/or that we regularly discuss in the office and that were relevant to how the roundtable took shape:

  • that there is little transparency over use of AI in the public sector and of how the UK government’s Data Ethics Framework is being used. I know that there is good and bad work being done, but mostly because I know some of the people doing it. How are the general public meant to know?
  • that we need to focus more on the people who design, build and buy AI services. Exploring what responsibility and accountability they should have and how we give them the space, time and money to design those services so that they support democracy, openness, transparency and accountability as well as being efficient and easy to use
  • that the current focus on ethical principles and AI principles does not seem to be having a useful effect. That instead we need to couple those top-down interventions with more bottom-up practical tools (like the framework or ODI’s Data Ethics Canvas) and more research into how the people designing, building or buying AI systems make decisions and what will influence them to comply with the law and think about the ethical implications of their actions
  • that control, distribution of benefits and harms, rights and responsibilities about AI models would be a useful area to explore
  • that eliminating bias is the wrong goal. Bias exists in our society, some of that bias becomes encoded in data and technology. AI relies on the past to predict the future, but the past might not reflect the present let alone the world we want. We should build systems that take us towards the future we want, and that can adapt as things change
  • that in a world which is increasingly online-first, and where we risk the state disappearing behind a smartphone screen and automated decisions, the principles of public life should be updated to put the need for humanity front and centre

I also learnt a lot from other attendees with some interesting things for myself and the team back in the office to chew over.

After the roundtable

A couple of weeks after the roundtable I was sent the transcript to review. The committee will publish that transcript openly — which is good and healthy. Attendees get to see the transcript first so they can suggest corrections to simple grammatical errors or transcription problems. That’s why I’m not commenting on or sharing what other people said.

It is important to review the transcript. There are sometimes errors. For example, in this transcript I was recorded as saying that my boss, Jeni, was “whiter than me” rather than “wiser than me”. I have no idea how I’d measure the former but I certainly know that she’s the latter. Some of the words and thoughts in this blogpost come from Jeni and others in the team like Olivier, Miranda, Renate, Jack &c &c &c.

Reading the transcript also helps me understand the difference between the clarity of my speech and the clarity of my writing. I’ve left most of my spoken errors in place. Just like the state, we can’t communicate only in words and pictures sent through a computer. Most of us need to get better at speaking with humans.

A crap analogy

I was home recently and took my sister’s dog for a walk. When we were young we had dogs, Spud and Gyp, so it was a walk I’d taken before. A few things had changed. One was that there was less dog poo.

Me (left) taking my sister’s dog for a walk around Fairhaven Lake.

It was strange comparing the memories of those messy streets, including muck left behind by Spud, to the reality of the present day with dog walkers cleaning up and signs warning of penalties if they did not. There has been a change in our social norms. In return for the right to walk a dog, most people now accept that they need to clear up behind them.

My day job is doing policy for the Open Data Institute. Policy is about changing outcomes, hopefully for the better.

On their own, legislation and guidance won’t fix challenges like data ethics, making data as openly available as possible, or the many other complex challenges that limit the social and economic value that societies get from data. It will need social change too.

I’m interested in how that change happens, including how society decided dog walkers should clean up the dunghills created by dogs.

People like having dogs, but dogs make a lot of shit

I found a blogpost about a book by Michael Brandow telling the tale of the introduction of a poop scooping law in New York City. I got a copy of the book and settled down for a read.

It would take a lot of rain to clean up 500,000 pounds of dog faeces. (image Taxi Driver, copyright a big film company)

People like having dogs (*). They like having a companion. They like going for walks. Dogs can make people feel safer, particularly in a city that had as high a crime rate as 1970s New York. But dogs make a lot of shit (**).

In 1974 New York City’s Bureau of Animal Affairs estimated that 500,000 pounds of dog faeces were hitting the streets every day. The city’s population was growing. More people meant more dogs, more dog excrement and less space to step around it. That affected not just dog walkers but everyone else using the streets.

This sounded analogous to the interweb’s superhighways. While some people are having fun, other people are stepping in the dog doo-doo we make. I read on.

The dog doo-doo battle of many armies

There was a long battle to clean up New York City; it lasted for most of the 1970s. The battle involved many familiar armies.

There were a mix of civil society groups in the battle. Some wanted cleaner streets, others just wanted to keep walking their dogs, and some saw the opportunity for self-publicity. There were also people who didn’t care about the battle being waged under their feet.

A search on Amazon shows 1,357 results for ‘poop scoop’

There were businesses in the battle too. Some businesses simply wanted cleaner streets outside their shops. A pet food association objected to the final legislation because of the impact it might have on their customers, dog owners. Other businesses saw new opportunities. There was a boom in innovative, and probably disruptive, dirt cleaning solutions that continues to this day.

When dog owners look like their dogs is it correlation or causation? And which way is the causation? (source: National Library of Ireland on the commons)

Different government organisations took positions. In 1970 a new city Environmental Protection Agency had been created. Its leadership saw the opportunity to clear up a problem affecting citizens. Other organisations didn’t want the cost of enforcing new legislation and argued for others to take the lead.

Some organisations seemed to see a chance to pass part of the cost, and blame, for cleaning the streets to dog walkers. I suspect many other government organisations were wondering why all this effort was being spent on canine coprolites.

Meanwhile politicians were trying to navigate between all of these interest groups to tackle both this problem and others facing the city.

Politicians talking crap

Throughout the 1970s some argued that people could be persuaded to change behaviour without legislation through campaigns and leaflets. Both civil society groups and government organisations tried to do this and had some effect in parts of the city.

A waste receiver for dogs

Others said dogs should use bathrooms in houses, use different sides of the street on alternate days, or even be banned from the streets altogether. The mess caused by dogs risked all the enjoyment being taken away.

Some dog walkers, government organisations and politicians said that it was government’s job to scoop the poop and that government should have more resources for street cleaning.

There were politicians who thought that no legislation was needed as other problems took a higher priority. One politician said that he was keen for the legislation to happen as it would encourage city staff to focus on dogs rather than car parking fines. All politicians were heavily lobbied, by dog lovers and dog poo haters.

I can see a common pattern here. Regardless of whether the policy is about data or doo-doo we need public debate to gather ideas and decide who has to do what, what resources they have to do it with, and whether they get paid for the doing.

There was a campaign over public health issues, with statements that an illness called toxocariasis, which can be caused by worms in dog excrement, was causing loss of eyesight in children. This risk appears to have been significantly overstated, although the incidence of toxocariasis does look to have fallen in the UK since dog waste laws were introduced there, but it was an effective campaign.

The debate raged until Ed Koch became Mayor and took a different tack. Rather than having another go at getting a new law passed in New York City’s legislature, he took the problem to the politicians at the New York State Senate. At the state level politicians debated how different solutions are needed in cities compared to more rural areas, and passed legislation that only affected large cities (***). The law gave the city the power to fine people who didn’t scoop their pooch’s poop.

In all policy work sometimes you have to explore a few paths before you get to your goal.

Clearing up dog shit is good for society

Throughout the debate there was a common thread. A city that welcomed dogs but that had less dog faeces scattered around would be a better city.

Dog owners enjoyed the company of their dogs, but other people in their local communities were affected by their enjoyment. Pavements, or sidewalks in NYC, are shared spaces. Use and misuse of that shared space affects everyone who lives in the city. After a debate dog owners were prepared to take on the task of clearing up some of their mess for the benefit of wider society.

A super pooper scooper sign in North Vancouver communicating the new social norm in multiple languages. Image via “New York’s poop scoop law: dogs, the dirt and due process” by Michael Brandow

It is hard to know what was most effective — the debate, the civil society campaigns, the leaflets and signs, government loudly declaring that it had legislated, or the final push of fines. I’ve struggled to find good crap data. But the repeated legislative battles show us that NYC policymakers thought a law was required.

The book includes an interview saying that six years after the legislation was passed, 60% of dog owners were cooperating with the law. After a dog doo-doo battle which led to legislation for England and Wales in 1996, a larger shift in public behaviour was seen after more time had elapsed. A study in 2014 by three researchers from the University of Central Lancashire, 10 miles from my hometown, reported that only 3% of British people would not pick up their dog’s poo.

The shift from the streets and dog walkers of my childhood to one where only 3% of British people will not pick up dog poo is a significant change for the better (****). That is social change in action. Social change that made my walk a bit easier. Even though I now had to clear up after my sister’s dog everyone, including me, could enjoy the park a little bit more.

But, does this tale teach us how to make data better?

A crap analogy

Well, not directly. The title of this blogpost wasn’t a joke. It is a crap analogy. Our motives for using data are different from the simple motives of walking a dog (have fun, feel safe). Data is not like doggy doodah.

While data is not like doggy doodah, Misha Rabinovich has shown that you can use data about faeces to make art. This artwork is temporarily installed at the Open Data Institute for a 2018 exhibition. I wonder if it subliminally got me thinking about this blogpost.

We can all agree what dog poo is, but we cannot all agree on the mess being created by how people are collecting, sharing and using data. We haven’t reached an agreement on what ‘good’ looks like and what outcome we are trying to achieve.

Meanwhile, although the data ecosystem contains many of the same actors — individuals, civil society groups, businesses, and government organisations — each with their own changing motives and power, it is more than a physical city. There are multiple virtual global villages which manifest themselves in our physical towns, cities, nations and continents. Someone in the UK can create mess on a virtual street used by people in Uruguay, Ukraine and Uganda. It is trickier to deliberately change social norms and create better outcomes in such a complex system.

But the tale should remind us that given time and effort people are willing to change behaviour and reduce the negative impacts they have on other people. Do you need a New Year’s resolution for 2018? Let’s keep having fun with data, but let’s think more about other people and clean up some of the shit that we’re creating.

(*) and other pets, such as cats, that also lead to interesting tales about data

(**) data about other swear words is available

(***) UK politicians and dog waste policymakers would possibly benefit from reading that 1978 New York State Senate debate as it seems that the UK is still discovering that while bagging it and binning it works in cities, in more rural areas you need to stick it and flick it.

(****) despite the improvements some people want city streets that are completely clean of the odious dog ordure. You will regularly see news articles about towns and cities saying that they might use CCTV tracking, registration schemes, and dog DNA databases to catch offenders. A company called MrDogPoop claims to have “the most powerful Dog Poop DNA matching database in the world” to help track down poops that avoid the scoop. These city-wide schemes tend to disappear when people realise the cost and debate uncovers that a rover registration scheme is too much of a stretch to our social norms.

An example I use when talking about data and services

In my job at the Open Data Institute I sometimes talk with people, from businesses and governments, about how better use of data can help them design and deliver better services. I’ve been using a public sector example recently that I’ve not written down. Here it is.

Ways to get bus timetable data to people who need it

The example I use is bus timetables. People need to know the times and routes of buses so they can make a journey and get to their destination. When I use the example I talk through four of the patterns that can be seen in many cities and towns around the world for services that get bus timetable data to people who need it.

  1. Mass market private sector services: many cities and towns now have bus timetables available as open data. Private sector services like Google Maps, Apple Maps and CityMapper pick up this data and build it into a service which they aim at the mass market of smartphone users. The services work in many cities and might have other features such as information about restaurants and pubs. They get their open bus timetable data either directly or through a data aggregator, like TransportAPI or ITOWorld, who collate data from multiple cities / transport providers. That takes away some of the effort from using open data and makes it easier for more people to build services.
  2. Targeted private/public sector services: smart cities and towns recognise that the mass market services don’t always meet all needs, particularly accessibility. If you look closely you can often find small bits of public services meeting the needs of some users, or a transport authority running a challenge to help focus the private sector market on meeting particular user needs. Left to its own devices the private sector might only target the profitable and easy-to-serve mass market, a challenge can help change that to build more accessible services or to experiment with new technologies like AI or voice interfaces. Targeted services often use the same data aggregators as the mass market services. It’s the same data, just presented for a different set of user needs.

A bus stop outside Piccadilly Station in Manchester

3. LocalBusTimes: a local website and/or smartphone app where people can look up the timetables for a journey they want to make. It might be for a whole town or a single bus company. It probably started by only providing bus timetable data; nowadays I think more of them recommend a route. The local authority or bus company typically run the LocalBusTimes service themselves.

4. Physical services: not everyone has or uses a smartphone when they need bus timetable data. There are many reasons for this. To give just a few: there might be no coverage, they might not be able to afford a smartphone, they might have run out of credit/data, they might not want a smartphone, their city might not have made bus timetable data available or they might simply have run out of battery. That’s why bus stations have information desks, why bus stops have timetables printed and stuck to them and why people ask other people “when’s the next bus?” on the street. Someone has used the bus timetable data as part of the design for the bus stop or as part of designing an operational process to help a human answer another human’s questions.
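The data flow behind all four patterns can be sketched in a few lines. Open bus timetable data is commonly published in the GTFS format, which includes a stop_times file mapping trips to stops and departure times. The sketch below is a minimal illustration, not any particular city’s feed: the stop ID, trip IDs and times are invented, and real GTFS files have more columns.

```python
import csv
import io

# A toy GTFS-style stop_times extract. The rows here are made up for
# illustration; a real feed would be a much larger stop_times.txt file.
STOP_TIMES = """trip_id,departure_time,stop_id
route9_0700,07:05:00,piccadilly_gardens
route9_0730,07:35:00,piccadilly_gardens
route9_0800,08:05:00,piccadilly_gardens
route42_0715,07:20:00,piccadilly_gardens
"""

def next_departures(stop_id, after, limit=3):
    """Answer "when's the next bus?" for a stop, from timetable data.

    Times in HH:MM:SS format sort correctly as strings, so a plain
    string comparison is enough for this sketch.
    """
    rows = csv.DictReader(io.StringIO(STOP_TIMES))
    times = sorted(
        row["departure_time"]
        for row in rows
        if row["stop_id"] == stop_id and row["departure_time"] > after
    )
    return times[:limit]

print(next_departures("piccadilly_gardens", "07:10:00"))
# → ['07:20:00', '07:35:00', '08:05:00']
```

The same lookup sits behind each pattern: a mass market app, a targeted accessible service, a LocalBusTimes site and a printed timetable at a bus stop are all different presentations of this one query.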

Some of the reactions I get to my example

No one has told me, yet, that my example is stupid or dull. Feel free to be the first to do that.

When I talk through this example with people the usual reaction is that while lots of people knew about the transport sector and data few people had thought of all the patterns or wondered about how they could be applied to their work in another sector.

Most people had used the mass market services but very few had thought of using the market, in this case through open data and challenges, to help them meet their own goals. Those who had considered it thought that they risked losing control to the market, and hadn’t realised that they could still discover whether user needs were being met — for example through user research — and could use a variety of ways to shape the market to target unmet needs. Challenges are just one of the ways to do that. Governments can legislate. Both businesses and governments can use procurement, strike deals, make different types of data more open, either fully open or in a more controlled way through APIs, or use lots of other forms of soft power to shape the market around them.

I also find that few people had thought of the physical services pattern as part of the overall service. I find that sad. It also shows that I’m in a bit of a bubble and exposed to only some views. The tech world is overly focussed on services that end in smartphones and websites. I expect/hope that’s a passing phase.

Why I’m writing this down now

I’m writing this down now because I’ve been using the example for a while. It’s good to publish it to get my thinking straight, to show some of the reactions I get and to learn from new reactions. As I often say, data is becoming infrastructure that will be as open as possible. Businesses and governments need to adapt to that future. They have different goals, and different needs for democratic accountability, but can learn from and collaborate with each other. I’m expecting to do some more work on public sector service delivery models over the next few months. It’s good to share thinking early, even shoddy thinking. It’ll help make that work better.

Open your effing data

Warning: this post contains content that will be offensive to some people.

The post is a version of a talk I gave at the ODIFridays series of lectures at the HQ of the Open Data Institute in London. The slides and a video of the talk are at the end of the post. Like most of my talks I adlibbed a bit. The post has links to most of the material I adlibbed from; others are at the end of the slides. It includes some thoughts on swearwords, Roger Mellie, democracy, censorship, Blackpool FC, artificial intelligence, context and an apology to my mum.

One of the UK’s regulators, Ofcom, commissioned research on offensive language last year. The research got lots of headlines. It was a nice opportunity for papers and websites to make cheap gags about swear words.

A report from the Metro on the publication of the report.

But it also gave me an opportunity to open up some swear word data and to use that example to talk with people and think about things like democracy, censorship, context and artificial intelligence. I made some cheap gags about swear words too.

Data needs context

Ofcom published the research in an openly licensed 126-page document and a 15-page quick reference guide.

from the report that Ipsos Mori did for Ofcom

The newspapers extracted the data from the PDF to write their stories. I extracted the data too. (btw some work that our friends at ODI Leeds and Adobe are doing might make my cut and pasting easier in the future…)

Unfortunately, at first I missed the all-important context for the data. I discovered the mistake by checking my data with the helpful team at Ofcom.

Take a look at the data or if you want to use it in a project or service there’s a CSV in github.

After some discussion within the ODI and with Ofcom’s research team we ended up with this. The same data as the PDF but in a format that is both human and machine readable.
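As a sketch of what “machine readable” buys you, here’s how a few rows in that shape could be loaded and grouped with standard tooling (the column names and category labels below are illustrative assumptions, not the actual headers of the published CSV):

```python
import csv
import io

# Illustrative extract; the real file's headers and category labels may differ.
data = """word,category
ginger,Mild
bollocks,Medium
"""

# Group the words by severity so a downstream service can look them up.
by_category = {}
for row in csv.DictReader(io.StringIO(data)):
    by_category.setdefault(row["category"], []).append(row["word"])

print(by_category)  # {'Mild': ['ginger'], 'Medium': ['bollocks']}
```

A few lines like this are impossible against the PDF, which is the point of publishing the same data in both forms.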

Now, a big part of our job at the Open Data Institute is “getting data to people who need it”. Normally I start with problems but this time I had started with data. My bad. Now to find out who needed it and how they would use it.

Some of the things people use this swear word data for

As I put the data out on twitter there was a background mantra of “arse…balls….knob…bastard…” from around the office. One person then wrote a little script that people could use to get their computers to say the list of words. Soon I could hear both human and machine voices swearing away. The swearing mantra was charming, if a little unsettling, but I had my serious face on. Why do people swear?

Well a bit of research showed an academic saying:

The main purpose of swearing is to express emotions, especially anger and frustration.

Seems fair. I suspect that a lot of people get frustrated at not being able to get data they need to do something. That explained the background mantra from the Open Data Institute office, but what about other uses of the data?

Roger Mellie, copyright Viz. Note that the swear word data might allow people to block his language, but not his gestures.

The content of the report told us about some other users. It would help TV broadcasters and presenters understand how people would react to things that they said on air and so help the presenters decide what they could say.

For example the word “bollocks” was seen as somewhat vulgar if it referred to testicles but less problematic if it was being used to call something ‘nonsense’.

This might mean that people did or did not say words in certain contexts. It might lead to some content only being accessible if a PIN was entered to unlock it.

This data was created because of democracy

Democratic processes can need data to be created. Image Nick Youngson, CC-BY-SA-3.0 via http://thebluediamondgallery.com/d/democracy.html

But the biggest user of the report is Ofcom themselves. Ofcom commissioned the research because, through our democratic processes, we have decided that there are limits to free speech on TV & radio and made it Ofcom’s job to regulate those limits. They needed the data to help with this job, so they commissioned Ipsos MORI to produce it by performing user research through focus groups, interviews and follow-ups based on a long list of potentially offensive words and phrases.

We have given Ofcom the power to fine organisations and people that breach their codes. By publishing the report openly, they were helping broadcasters understand how they might use those powers and therefore discouraging breaches. This probably makes the system cheaper and more effective.

Broadcasters are likely to have their own guidance to help them meet the expectations of their target audiences. They could merge Ofcom’s list with their own to help them meet both society’s needs and their own users’ needs.

Similar data is maintained in contexts outside of TV and radio

In Britain Mary Whitehouse was a famous campaigner from the 1960s to the 1980s against things that she found offensive. I can imagine Mary being keen on data-driven censorship. Image fair use via Wikipedia.

The data includes the word ginger, describing it as ‘mild language, generally of little concern’, but the word ginger can also be used to describe a very tasty type of biscuit. A filter that used the swear word data to block offensive words might ban ginger nuts. That would be bad. This is a common problem with simple data-driven solutions: they ignore context.
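A context-free filter of the kind that would ban the biscuit takes only a few lines; this hypothetical sketch shows the problem:

```python
# A deliberately naive filter: block any text containing a listed word,
# with no notion of the context the word appears in.
MILD_WORDS = {"ginger"}

def naive_filter(text: str) -> bool:
    """Return True if the text would be blocked."""
    lowered = text.lower()
    return any(word in lowered for word in MILD_WORDS)

print(naive_filter("You ginger!"))                   # True: the insult
print(naive_filter("Fancy a ginger nut with tea?"))  # True: the biscuit, wrongly
```

The filter can’t tell the two uses apart because the data alone carries no context.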

I couldn’t find a list of offensive biscuit names but there are other sets that are similar to the swear word data used in contexts other than TV and radio.

The UK has a list of suppressed car registration plates

It is the job of part of the UK government, the DVLA, to maintain a list of combinations of letters and numbers that you cannot put on a car. Unfortunately, and curiously, the list is not published openly, but sometimes it is made available after freedom of information requests.

An extract from the suppressed car registration plate list via Whatdotheyknow

The list of suppressed car registration plates helps prevent confusion over typographically similar symbols, like 0 (zero) and O (oh). It blocks language that is likely to be considered offensive, for example “*B** UMS” and “*R**APE**”.

The list also explicitly contains the names of terrorist groups such as the UVF, UDA and UFF. Another terrorist organisation, the IRA, is already banned, like any other combination beginning with I, because of the potential for confusion between 1 (one) and I (eye).

More controversially, the acronym for the far-right British National Party, BNP, is also on the list. The BNP are allowed to stand in the UK’s democratic election process. How was that decision made? Unfortunately, just as the list isn’t publicly available, neither is the methodology.
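The 0/O and 1/I confusion mentioned above can be handled by normalising look-alike characters before checking a plate. A minimal sketch, where the mapping and banned entries are illustrative rather than the DVLA’s actual rules:

```python
# Characters that look alike on a number plate are treated as interchangeable.
LOOKALIKES = str.maketrans({"0": "O", "1": "I"})

BANNED = {"IRA", "UVF"}  # illustrative entries only

def is_suppressed(plate: str) -> bool:
    """Check a plate against the banned list after normalising look-alikes."""
    normalised = plate.upper().translate(LOOKALIKES)
    return any(banned in normalised for banned in BANNED)

print(is_suppressed("1RA 23X"))  # True: '1' reads as 'I', giving 'IRA'
```

Normalising first means one banned entry covers every typographic variant of it.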

Context affects what words are offensive

The UK’s democratic processes produce other lists of offensive words.

The Speaker in the UK’s parliament can request that politicians withdraw words when debating with their opponents, so-called unparliamentary language. The way in which words are deemed to be unparliamentary or not is unclear. In 2015 the opposition leader Ed Miliband was allowed to call the then Prime Minister David Cameron “dodgy”, yet in 2016 an opposition backbencher, Dennis Skinner, was asked to leave a debate because he called David Cameron “dodgy Dave”. The word “dodgy” isn’t on Ofcom’s list: it’s offensive to call an MP “dodgy” in a parliamentary debate but not to call them it on television.

The list of unparliamentary language is currently unpublished. To help UK politicians make better decisions about being unparliamentary or not, I compiled some examples into a list. Parliaments in other countries, and other UK nations, have similar lists. They show the importance of geographic context.

The Australian parliamentary records show offence was taken at the term “suck-holing”, a word that in 1977 was deemed offensive in the Australian parliament but that will be meaningless to most British people and has never been used in the British parliament. I wonder if a British MP would get away with using it.

The word “Oyston” is offensive to me and my community of fans of Blackpool football club. The offensiveness is not only because of this cringeworthy picture but because of how the Oyston family treats fans.

Another example of offensive language in a particular context is the word “Oyston”.

The Oyston family own the football club that I support, Blackpool FC. Because of their actions against fans, calling a fan “an Oyston” on one of the websites used by Blackpool fans would be offensive. How would anyone outside of the community of Blackpool fans discover this?

There are related examples that may help us understand how we could do this.

Collaborative maintenance of data

Hatebase maintains a list of hate speech from around the world. The data is maintained by automated processes and manual interaction to cater for how hate speech changes over time and in different places. Hate speech can be used to encourage violence against people and communities. The collaborative maintenance process allows people to debate which words are hate speech or not.

“popular” types of hate speech from Hatebase.

An interesting experiment would be to see if the Hatebase dataset could have helped predict violent events through rises of hate speech in parliaments, newspapers and social media. Do get in touch with them if you have money to fund that research.

Other people could learn from the example of Hatebase. If British politicians wanted to, and could get to grips with GitHub, they could collaboratively maintain my initial list of unparliamentary language and create something that would help them understand the boundaries of offensiveness.

Offensiveness is affected by time, place and communities

Rebecca Roache in Aeon magazine.

By this point in my own research it was clear to me that offensiveness is affected by time, place and communities.

When I checked I found that swearing philosophers were, of course, already aware of this. As often happens I was a technologist rediscovering ground that others had already covered. But technology can also affect how and which words become offensive.

People create new offensive words

Oyston is an example of a word that became offensive to a small group of people before becoming offensive to a larger group. Blackpool fans have effectively used social media and the press — oh, and talks & blogposts like this ;) — as part of a campaign to get the Oyston family out of our football club. An effect of this has been to spread the understanding of the offensiveness of the Oystons from the seaside to wider parts of the footballing community. A more famous example is the case of Rick Santorum who found his surname defined as an offensive word in a campaign led by Dan Savage.

This is a challenge to any list of swear words and a risk for people who use them. People create new offensive words for their own purposes. They game systems.

A t-shirt with the universally unique identifier for beef curtains.

Would people game the swear word data I created from Ofcom’s list? Yes, of course they would.

An example quickly came to mind. When I published the Ofcom offensive word list as open data, in line with good practice I gave every entry a universally unique identifier (UUID). UUIDs make it easier for machines to use the data.

If this data was to get widely used then how long would it be before people started to circumvent the system by being interviewed on telly wearing t-shirts with the UUID of a swear word? Perhaps over time the UUIDs, or parts of them, would become offensive? “That fella’s a right 81cb.“, they’d say. Maybe the UUIDs would need to be added to the list as they became offensive?
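Minting those identifiers is trivial; a sketch of the approach (these UUIDs are freshly generated on every run, unlike the stable ones in the published dataset):

```python
import uuid

# Hypothetical entries; the published CSV has its own stable identifiers.
words = ["ginger", "bollocks"]
entries = [{"id": str(uuid.uuid4()), "word": w} for w in words]

# A machine can now reference an entry without embedding the word itself,
# e.g. by the first few characters of its UUID.
for entry in entries:
    print(entry["id"][:4], entry["word"])
```

Which is exactly what makes the t-shirt gag possible: the opaque identifier becomes a stand-in for the word.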

People adapt and change. That is one of the best things about people and one of the biggest challenges we face when maintaining and using data. We need to build in mechanisms to change datasets over time as needs and uses change.

Swear words-as-a-service is hard

It was clear that the swear word data was easy to build, and also clear that it would be more difficult to maintain and to make useful in multiple contexts.

I knew that many companies were already maintaining similar lists as, like many other people, I had seen, laughed at and evaded filters on websites that had turned the British town of Scunthorpe into the apparently inoffensive “S***horpe” due to simplistic and bad data-driven algorithms. I do wonder how useful those filters and services are.
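The Scunthorpe problem comes from substring matching; matching on whole words avoids it, as this hypothetical comparison shows:

```python
import re

BLOCKLIST = ["cunt"]  # the substring hiding inside 'Scunthorpe'

def censor_substring(text: str) -> str:
    """Replace a listed word wherever it occurs, even inside other words."""
    for word in BLOCKLIST:
        text = re.sub(word, "*" * len(word), text, flags=re.IGNORECASE)
    return text

def censor_whole_word(text: str) -> str:
    """Replace a listed word only when it stands alone."""
    for word in BLOCKLIST:
        text = re.sub(rf"\b{word}\b", "*" * len(word), text, flags=re.IGNORECASE)
    return text

print(censor_substring("Welcome to Scunthorpe"))   # 'Welcome to S****horpe'
print(censor_whole_word("Welcome to Scunthorpe"))  # 'Welcome to Scunthorpe'
```

Whole-word matching saves the town but still ignores context: it would censor “bollocks” even when it means ‘nonsense’, which Ofcom’s research rates as less problematic.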

Many of the website filters I had seen are simple and flawed because of their lack of context and their inability to adapt to people’s changing behaviour. Thinking ahead, I wondered if people would start to apply machine learning / artificial intelligence (ML/AI) and create services that could automatically learn new swear words. Perhaps this could be used on a massive scale to reduce the damage caused by offensive language on the web?

A couple of snippets from this patent

I knew that I wouldn’t be the first person to think of this idea. While 2016 had been the year when every problem could be fixed with a blockchain, 2017 is the year of ML/AI.

A quick search of patent libraries showed that in 2015 Google had registered a patent to classify offensive words using machine learning. Unfortunately it looks rubbish. The training mechanism worked on a large set of text samples but failed to recognise the context in which the text was being used. The resulting service might be slightly better than current filters but would still be data-driven rather than informed by data.

Maybe, like Hatebase, it would help if users were to train the machines that provided the service. After all, Google, like most other large internet companies, uses thousands of people — including you — to help train its services. I started to consider what I had learned about offensive language and think of the tasks that Google would need to give to swear word raters to train their machine:

Task: go to a football ground in Gdansk, Poland. Play this video to people near you. Observe their attitude to you, and each other, over the following seven days and then categorise the offensiveness of the video. Repeat this exercise every 3 months.

Hmm… I quickly realised that this might be a quixotic mission and that AI/ML might provide a better service, but still only a partial one. There would be no perfect service. People decide what is offensive, not machines. If the service only considered some contexts then the people who controlled the machines and trained them on those contexts would be the ones who decided where it was useful. Swear word data isn’t like the location of bus stops or the list of transactions in a bank account. The context is even more important.

This is one of the challenges of the web and providing data and services for it. The web is pervasive. It interacts with the physical world in many places. It appears in multiple contexts. I use the web to watch broadcast news, like that regulated by Ofcom. I use it to keep up to date on politics, where the unparliamentary rules are useful. I talk about football, and the Oystons, on message boards. I keep up to date on current affairs, and feel helpless at the levels of hate speech deployed at people in the UK and abroad. I chat to friends, both publicly on sites like Twitter and Facebook and also privately in messaging applications.

Datasets and services that reduce offensive content on the web will need to cater for all of these different contexts, and more. Even if they do, some people will still work around them. Data and technology may be able to help but will only ever be part of a solution to something that is fundamentally a more human problem: our need to express our emotions in language.

Sorry mum

It was clear from my investigations that we could usefully create data about swear words, i.e. words that are offensive. The need for this data came from people who swear, people who didn’t want to swear, and societies & communities trying to decide the boundaries of what was offensive or not. It would be useful if the research and rules for deciding on what was offensive were open. And if people could collaborate to decide on what was offensive, the data would be more useful because it would cater for more contexts. But it was also clear that, while technology creates new possibilities to reduce offensiveness, people will still adapt to achieve the goals they want. So it goes.

The other thing that was clear from the talk was my own and my audience’s squeamishness with some of the words. In my case it was certainly because of one of my most important contexts: my upbringing and my family. I’d like to end this post the same way I ended the talk, by apologising to my mum. Sorry mum.

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — -

The questions from the audience showed the importance of context

At the end of the talk at the ODI the audience raised several points about offensive language that had not been covered in the talk, such as the use of racial and religious slurs. I was already covering a wide topic. Racial and religious offensiveness cover even more ground. I couldn’t cover everything.

Image from The Wanderers, based on a book by Richard Price. The film includes a fantastic scene in a 1960s New York school where people of different religions and ethnicities try, and fail, to remember all of the offensive names they have for each other.

I did find it interesting that the audience in the room hadn’t heard of some of the words in the list. Particularly choc ice, blood claat and bum claat: words that in my experience — white, middle class, mostly Northern England and South London — are used against black people or within black communities, and in the case of the latter two more specifically within Jamaican communities.

That people hadn’t heard of these words says something about the context of the audience. A context where those words may not have been seen as offensive. Perhaps next time I talk on this topic I should try and sneak in some offensive language from different contexts to see what happens.

Watch the original talk or read the slides

If you want you can watch a recording of the talk (which includes some swear-a-long fun):

You can also see the presentation on slideshare or google slides, whichever you prefer.

Words from leasehold and commonhold reform APPG

Approximate words spoken at the meeting of the UK Parliament’s All Party Parliamentary Group (APPG) on residential leasehold and commonhold. The meeting was chaired by Jim Fitzpatrick MP and Sir Peter Bottomley MP. There were 60–70 people in the room: MPs, Peers, conveyancing firms, big homebuilding companies and people suffering under bad leasehold terms.

Yes it’s 900 years away but why should anyone produce or sign a contract that commits them to spend this? (source: Telegraph)

I spoke after Patrick Collinson from the Guardian, who has written extensively about leaseholds in England and Wales and the issues some leaseholds cause for people; Bob Bessell of Retirement Security; and Phillip Rainey QC a specialist in property litigation and expert in leaseholds.

Phillip discussed various policy options to tackle the challenges. The options included banning ground rents, limiting how much they could increase in value, and many other subtle tweaks.

I then had 5 minutes.

Hello, thank you for inviting me. I’m from the Open Data Institute (ODI). You may not have heard of us. (murmurs of agreement)

We were founded 4 years ago by Sir Tim Berners-Lee, the inventor of the web, and Sir Nigel Shadbolt. Our CEO is Jeni Tennison; she apologises for not being here. So do I, as I’ve ended up creating an all-male panel. That’s bad.

We are global. We connect, enable and inspire people to innovate with data. Or “to get stuff done that makes things better by being more open”, as I sometimes say.

I am not a housing or leasehold specialist; my job is to get data to people who need it. Leasehold Knowledge Partnership are part of our current UK startup programme. They’ve been helping us understand the problems in leasing; we’ve been helping them understand whether more data can help.

At the ODI we think of data as a new form of infrastructure. It has become essential infrastructure without us realising it.

Like most physical infrastructure – for example roads – data creates most value when it is as open as possible while respecting privacy.

When data is open and available for anyone to use it is easier for people to use it to make decisions and solve problems.

Take leaseholds. Let’s imagine if more information was open while respecting the privacy of homeowners.

  • People expect easy access to data in the web age. Many homebuyers use sites like RightMove and Zoopla as they look for a home. Opening up leasehold data would enable those services to help people make an informed decision. For example they could compare terms with other properties, leasehold or not, in the area and see what’s reasonable. Some of the cases Patrick mentioned happened because people lacked information when buying a home.

  • Conveyancers and estate agents would have access to more data too. They could get things done faster and give better advice to homebuyers.
  • Researchers would be able to model the market; help people understand how it is working and suggest improvements
  • Legislators would be able to get better information about problems, where legislation is needed or where soft power could be used to influence things
  • With better access to data government could test a policy idea, like the ones Phillip suggested, in a region before deciding whether to roll it out nationally

Much of this data is available but it is locked away. In government offices, in the offices of house building firms, in law firms or in contracts held by leaseholders and freeholders.

Some of our big public registries and institutions – things like the Land Registry, Ordnance Survey, the Met Office — were created to make this type of information available to people who need it but it feels like they haven’t adapted to changing times and 21st century needs.

Getting this data open can take time and cost money. Not that much, technology can be cheaper than some people might tell you. But getting the data open and using it to change markets, like leasehold, can also affect business models. That’s usually more significant.

We need to support those organisations to change their business models; move to a future where we have data infrastructure that is as open as possible while respecting privacy; and help meet society’s 21st century needs. That might mean they also need to help open up data held outside government.

In closing I’d ask both the members of the APPG and all of the leasehold experts in the room to think about the power of the web, what people expect in the modern age and how the tools and techniques of the web and data can help build a better housing market. One that can reduce the number of cases like those that Patrick Collinson has written about over the last few months.

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — -

After the various speeches, questions were asked by people in the room. The questions came from a more diverse group of people than the all-male panel (grr!).

I was asked whether there was enough data available for someone in Ellesmere Port to get a reasonable view on whether their leasehold flat will be worthless in 10 years time. I’m checking that today.

Someone else raised the issue of freehold management companies surprising people with unnecessary administration fees — for example £250 for a simple bit of paperwork that is necessary if the homeowner wants to sell their home. That’s an issue my wife and I are well aware of having just sold our leasehold flat in London. We plan to blog on how data helped and where some data was missing.

Someone else asked whether we knew if the problem with leaseholds was bigger than in the 1970s. The answer from the panel was a bit vague but Phillip Rainey raised an important point. He said that the problem was getting worse because lawyers were producing new tighter leasehold clauses that benefitted the freeholder. He said that lawyers used the web to share these new clauses so they were all getting better in a way that made the situation worse for leaseholders.

You see, technology can be used for good and bad and — as a very wise person once said — knowledge is power.

To help level out power imbalances we need to share the knowledge and the skills to use it with everyone.

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — -

After these questions the event was closed by Peter Bottomley who discussed next week’s leasehold reform debate in Parliament and how he intends to name names.

{Update 22 December: the Hansard transcript of the debate is now up}

Words for the launch of the APPG on data analytics

These are the approximate words I said at the launch of the new All-Party Parliamentary Group (APPG) on data analytics on 31 October. An APPG brings together representatives from different political parties from both the House of Commons and House of Lords to pursue a particular topic or interest. Daniel Zeichner MP’s speech from the launch is also online. Other speakers were from TfL, Experian, CompareTheMarket and the Institute for Environmental Analytics. In person I wandered off topic a bit based on audience reactions but I promise that there were no cat jokes.

Hi, thank you to everyone who’s come along and for inviting us to speak. I work at the Open Data Institute, or ODI as it’s more commonly known. The ODI’s mission is to connect, equip and inspire people around the world to innovate with data.

It is based in London but the network is global. We have nodes and members on six continents and in every nation of the UK. We do research, train people, advise them, introduce them to people with similar interests, give them simple tools to help them publish and use data, incubate startups and encourage thinking on fundamental issues such as data infrastructure and how to use personal data in a way that creates trust. We do this with large businesses, startups, charities and governments. We are a global voice for the better use of data to deliver social, environmental and economic impact.

The ODI is a not-for-profit and was founded five years ago by Tim Berners-Lee and Nigel Shadbolt. Both of them are at the yearly ODI summit which takes place at the British Film Institute tomorrow.

Bringing people together to solve common problems

The ODI team at the 2015 summit. Don’t let anyone convince you that diversity in tech is impossible, it’s not. Image by Paul Clarke, CC-BY-SA.

The summit is kind of unique, as is the ODI. It brings together large corporates with charities and startups; people interested in global development and democracy with people interested in the latest smart cities and transport trends; people from local government, national government and reps from global institutions. The attendees and speakers come from around the world. They all believe that openness and data can benefit them and everyone else too. (you can watch a stream of many of the summit sessions)

Which brings me to this all-party parliamentary group on data analytics. I’m a big fan of democracy and I’m also a big fan of things that bring together people from different backgrounds such as elected representatives and peers from across the political spectrum to find common points of interest, or problems, where people can work together to get things done and make things better. It’s the type of approach we use to help bring together large sectors like banking and agriculture, another one will be announced tomorrow. I won’t spoil the surprise. (it was sports)

An age of data abundance

We are in an age of data abundance with billions more people and devices coming online. It’s ever cheaper to collect, use and publish data. A web of data is evolving that sits alongside and behind the web of documents which changed our lives when Tim Berners-Lee invented the web 20-odd years ago. Our experience from the last 5 years is that data will create most value when it is as open as possible while respecting privacy: an open future. But the future is uncertain.

We need to work together to shape an open future because, whilst the current wave of technology change has brought many benefits, it also carries many risks: privacy risks, monopoly risks, democratic risks. We need to overcome those risks and project a positive message to get to a good future.

Tim famously said “this is for everyone” when tweeting about the world wide web from the launch of the London Olympics in 2012. The type of open thinking that Tim showed when he gave away the web is going to be necessary if we are going to realise the brilliant potential of this new web of data to benefit everyone.

And that open thinking is what we hope to see from this all-party parliamentary group. As well as the rest of us we need government and legislators to play an active part in making this happen. Government can lead by example.

Data for everyone

We can benefit everyone if we build data infrastructure (vital reference datasets like maps, lists of local authorities and addresses, and tools, processes, policy, legislation, organisations) which is reliable, adaptable, trustworthy, and as open as possible. Open in the sense of culture as well as open data.

We need to provide data skills for citizens, business and policymakers, with policymakers using data both for evidence and as a tool to achieve their policy ends.

And we need to encourage open innovation. A bridge between academic research, public, private and third sectors, and a thriving startup ecosystem where new ideas and approaches can grow. Innovation that solves problems.

We describe this as the open future. A future where we’ve understood and tackled those risks, made data as open as possible and created benefits for citizens, businesses and government. Data for everyone.

There were questions

After we talked the audience asked questions covering a whole range of topics from data in manufacturing and engineering; trust in use of data; public sector reform; EU proposals for copyright and how that impacted on organisations holding data; and whether people should be paid when their data is used. A wide range, as you’d expect from something that connects together and underpins sectors across the economy.

The last two questions I found particularly interesting. Both seemed to come from applying models from the physical world to something, data, that has different qualities. Data is non-rivalrous, it benefits from network effects, etcetera. That’s why the economics of data are different from those of other things and are still being researched. The questions also seemed to come from an implicit assumption that we can use the concept of ownership in the physical sense of the word. We need to be careful in how we use the language of ownership to address questions about data. Physical world metaphors don’t readily fit the data world, and even our understandings and expectations of ownership in the physical world aren’t as simple as they seem. This blog from Ellen Broad is a good read and is what I channelled in my response. I hope the APPG thinks deeply about those questions and the concept of ‘data ownership’. Its members will be part of shaping the legislative environment that will help us get to that open future.

An open city is a better city

Approximate words from a talk at the Holyrood Connect: Data Forum in September 2016. Approximate as I tend to ad-lib in person as I see shocked, or occasionally, pleased faces in front of me. I also had a bad cold so ad-libbed even more than normal. The slides are also available online.

— — — –

Hi, I’m Peter. I do some stuff at the Open Data Institute (ODI). I’m here to talk about how an open city is a better city.

First some background and a couple of concepts: the data spectrum and data infrastructure. Then some current examples of data analytics in cities, and their limitations, followed by some UK examples of people building more open cities with more benefits. I’ll end up with some principles to help get you started and a bit about what’s coming in the future. Ok, background:


The ODI was founded four years ago by people like Tim Berners-Lee and Nigel Shadbolt. It is headquartered in the UK but its team works around the world. There are currently 29 nodes in 18 countries. In the UK that includes places like Aberdeen, Leeds, Belfast, Devon, Bristol and Cardiff.

The ODI’s mission is to connect, equip and inspire people around the world to innovate with data. We believe in knowledge for everyone. We help the public sector, third sector, academia and businesses to get more impact from data. Last week there were research fellows in the office from Madrid and Singapore debating and sharing ideas about geospatial data and privacy, crowdsourcing and smart cities. In the last few weeks the HQ team have been doing stuff in the UK, in Malaysia, New York, Mexico and Tanzania.


The ODI works across the data spectrum. Some people worry about personal health records being “made open”. Some confuse commercial and personal data, or mix up “big data” with “open data”. To unpack data’s challenges and its benefits, we need to be precise about what these things mean. They should be clear and familiar to everyone, so we can all have informed conversations about how we use them, how they affect us and how we plan for the future. And it doesn’t have to be complicated. It can be simple. In one image. Whether big, medium or small, whether state, commercial or personal, the important thing about data is how it is licensed and who can use it: closed data can only be used within one organisation; shared data can only be used by some organisations (because of rules or price restrictions); open data can be used by anyone for any purpose.

The ODI works to improve data infrastructure. Data has become vital infrastructure over the last few years. It underpins transparency, accountability, public services, business innovation and civil society. Data such as statistics, maps and real-time sensor readings help us to make decisions, build services and gain insight. Data infrastructure will only become more vital as our populations grow and our economies and societies become ever more reliant on getting value from data.

I often hear people say that data is the new fuel or that it’s oil for the digital revolution. Daft analogies. Data doesn’t get burnt up when we use it; we can use it again and again and again. It doesn’t get extracted from the ground (unless it’s geological data). The analogy we use for data infrastructure is roads. Roads help us navigate to a location. Data helps us make a decision. Roads have signs and maps to tell us how to use them. So does data. Well, hopefully.

Lots of cities are improving data infrastructure

Now back to the theme of cities and data. Cities and local authorities around the world are using and improving data infrastructure. It may not feel like it sometimes, but they are.

Many public sector organisations are developing skills and creating more impact by using their own data to make better decisions. Whether it be where to spend money on social care, what time to pick up the bins or how to design a local authority website so that it’s easy to use. In each case the organisation is having to learn how to gather data, analyse it and use it to make a better decision.

These are all activities in the closed part of the data spectrum.

Half-spectrum doesn’t give you all the value.

We’re also seeing more and more public sector organisations work together and share data to make better decisions. Down in Manchester local authorities are sharing data to help vulnerable children. In London local authorities are sharing and analysing data to look for unlicensed houses in multiple occupation, which can be unsafe places to live. This type of big data analytics takes inspiration from places like Chicago, which has been using data about graffiti tags to tackle gang violence, or New York City and Amsterdam, which have analysed data from across the city to work out which characteristics were the best indicators of fire risk and help prevent fires.

These activities take place in the closed and shared part of the data spectrum.

All the data and all the open

But let’s go back a bit. When I talked about data infrastructure I said it underpins transparency, accountability, public services, business innovation and civil society.

All of the previous examples are about public services. The rest of the benefits of data infrastructure are missing. There’s some business innovation — for example from data analytics companies selling into the public sector — but only a portion.

Why is that? Let’s look again at the full data spectrum. We’re missing public data and open data.

At the ODI we say that cities, their businesses and their citizens get most impact from a data infrastructure that is as open as possible while respecting privacy. There’s lots of research showing this and there’s also practical examples. I’ll cover some in a bit.

It’s true you know.

The reason that open data infrastructure creates the most impact lies in the qualities of data itself. For example, data benefits from network effects: it becomes more useful and creates more value as more people use and maintain it.

When you work openly and use as much open data as possible then more people can work together to solve problems, make decisions, find insights and build services. You benefit from network effects. You can build a better city. One that benefits everyone.

This is particularly true if you combine all the data — closed, shared and open — with all the open. Open culture. Open source. Open government. Open standards. Open innovation. Etcetera.

There’s lots of examples, here are some

Let’s take a few examples showing some different aspects.

First, Bath and Strava, the cycling app. Strava users cycling around Bath can choose to share their closed personal data with a community group called Bath:Hacked. That group preserve privacy, analyse the data and are working with the council to use it to improve cycling routes. Interestingly there’s anecdotal evidence that people are cycling and using the app more because they can see that the data they collect benefits the city and themselves. Win win. Meanwhile Bath:Hacked are sharing what they’re doing online.

As a coffee drinker I am unsurprised by the decline in tea-drinking in Britain (source: Defra, ODI and Kiln)

There are two reasons for that. First, by opening up the knowledge for everyone, other people can use it and can tell Bath how they are using it. People can learn from each other. Second, openness about how organisations secure and manage personal data builds trust. It can improve quality too. Take Defra, who recently did a privacy impact assessment in the open, with people outside the organisation commenting, before releasing diaries showing the diet habits of 150,000 households. By debating with their community they worked out that some of this data, which would otherwise all have been kept closed, could be made open for anyone to use. Transparency and open debate about personal data can make things better.

Another example: I was talking to someone from Devon council last week. They had published a map of places where people could get help. Unfortunately the map was wrong. Because both the data and the source code were open, a friendly person could fix it for them and send them the corrected version. Problem fixed within a few hours. Thank you, friendly person.

Another. In places like Manchester and Leeds people from the public sector, private sector and civil society are working to build a low-cost open infrastructure for the internet of things. They’re helping each other, drawing on each other’s skills and experience as needed. On that infrastructure people will be able to build and deploy sensors to monitor air quality or the height of a river, and anyone will be able to use the data to decide whether to place a new school near a road or a set of new houses by a river, whether to buy a house, or whether to evacuate one as the waters are rising…

These things cost money but they don’t need to cost the big money that so many technology projects do. The cost of software, hardware and hence data is falling dramatically. You can now build an air quality sensor for less than £100, and a LIDAR sensor — a device that measures distance using lasers — which used to cost tens of thousands of pounds can now be had for a few hundred. (That’s part of the reason we’re hearing so much about automated cars. They need those sensors too.) As much as possible of the data from that infrastructure will be open; that’s the culture of the community. That will allow other people to use it too, for little more than the cost of serving data that has already been collected. The infrastructure is designed for open.

And to continue the theme of culture: in Aberdeen the team in the council run hackathons open to anyone and learn innovative techniques from civil society and businesses to help the council deliver other services. Those hackathons will also help with the Scottish government’s digital skills initiative that I was reading about on the train yesterday. An initiative that could also be supported by the new work that the Open Government Partnership are starting with the Scottish government.

Back to Leeds. The city council has funded ODI Leeds to act as a neutral space outside the council that can be used to convene businesses, academia, civil society and the public sector to understand and define problems; share data to explore ideas and then open the data as much as possible to allow people to build solutions. Those solutions could be built by new startups or established businesses. Arup, the global construction firm, use similar open innovation techniques working with startups to help improve how they build stuff. It’s like the data analytics examples we saw earlier but it uses the full spectrum.

In each of these cases we can see people from multiple sectors working together to solve common problems as openly as possible. In the process new businesses are built, there’s transparency and accountability, civil society is engaged, and there are better public services too. All of the things our data infrastructure supports.

There’s countless more examples across the world for those who look.

How do I build open data infrastructure?

But, I often hear people ask, how do I do this?

As you may have realised from these examples, data infrastructure is not only about data. Data infrastructure includes datasets; the technology, training and processes that make them usable; policies and regulation such as those for data sharing and protection; and the organisations and people that collect, maintain and use data. The datasets may come from anywhere on the data spectrum. But the more open the data infrastructure, the more value it will create, as more people can use it.

Principles to help people build better data infrastructure.

Based on the ODI’s own work and research on what works and what doesn’t at city, national and global levels, we’ve published some principles to help other people build better data infrastructure.

The first and last principles are key. Design for open and encourage open innovation.

Based on our experience we believe we need a number of things to work together to create the space for open innovation to happen: strategy, policy, training, technology, research, a tech community, and engagement. With that engagement you’re looking to build a receptive internal customer (for example a councillor in a city), a responsive tech community and an engaged civic community willing to work with you. With open innovation the best answers can come from anywhere. You just need to get started and have the courage to try.

Anyway, I hope that was interesting, and useful, but before I go I want to leave with you another thought as to why getting to grips with open and data is so important.

The web of data is coming.

Over the last 25 years we’ve all been building the web of documents. Billions of webpages linked together. It’s fabulous. But the billions of people, sensors and services that are connected to the web and the internet produce, publish and use data. A web of data is now evolving that sits alongside and behind the web of documents.

That might seem like a challenging thing and something we can’t control but I would encourage everyone to see it as an opportunity. By getting to grips with your data infrastructure and making it as open as possible you will be positioning your city and the businesses and citizens that live in it to thrive in that future. That sounds like a pretty important mission to be cracking on with. It’s about building for the open future.

An open city is a better city.

There’s countless other examples to demonstrate why an open city is better and to help you understand how to grow your city in a way that works for your problems and your challenges. But, as a start, I’d encourage all of you to pick a problem and get started. Work together with your businesses and citizens to solve that problem and start building that open city and make things better for everyone.

Open addresses: will the address wars ever end?

This is the (rough) text of a talk I gave at the British Computer Society (BCS) Location Information Specialist Group’s 3rd annual addressing update seminar in August 2016. There were more jokes in person. And some Pikachu. The slides for my talk are also online as are those for Ant Beck’s talk.

Hi, I’m Peter. I do some stuff at the Open Data Institute (ODI). The ODI was founded three years ago. Its mission is to connect, equip and inspire people around the world to innovate with data. Its headquarters are in the UK but it works around the world.

I’m here to talk about open addresses in the UK. To understand the tale it’s useful to start off with a (shortened) bit of history.

Ancient history…

Addresses and other types of geospatial data were early targets for open data releases. They are vital datasets that make it possible to build many, many services and products. Way back in 2006 Charles Arthur and Michael Cross wrote in the Guardian to ask the UK government to “give us back our crown jewels”. They pointed out the complex arrangements for maintaining address data and how the data was sold to fund those complex arrangements. They even pointed out the issues it generated for the 2001 census.

In 2009 the UK government announced that Tim Berners-Lee, one of the ODI’s founders, was going to help it open up data and in 2010 government said that postcodes and address data were going to be early releases. Victory!

Some of the tales from 2013

But it was a Pyrrhic victory. Whilst government released many thousands of datasets, the promised address data was not among them. In 2013 the Royal Mail was privatised along with its rights to help create and sell that address data. The complex arrangements that were pointed out in 2006 just got more complex. And, in the meantime, another census happened, with the inevitable, and costly, need to build another new address list.

The open data community was rightly sad, and probably got a bit angry. They knew how important that data was. They kept working to make things better. They didn’t just tweet, they organised.

More recent history…

In 2014 the Cabinet Office’s release of data fund provided some money to the ODI to explore whether it was possible to rebuild the UK’s address list and publish it as open data. The ODI pulled together lots of people who work with addresses to share and debate ideas.

The homepage of Open Addresses

This led to the launch of Open Addresses UK. I was one of the team working for Open Addresses. We worked as openly as possible with regular blogs and open source code.

We explored the benefits of better address data for the UK. We found that we could help fix problems such as the months it can take before new addresses are added to computer systems across the country. Months during which someone might not be able to order a pizza, get home insurance or register to vote. We looked at the economic evidence from case studies of other countries, such as Denmark, that have released address data as open data. If the success of Denmark scaled in proportion to the population of the country then the UK could expect to see an extra £110 million a year of social and economic value. Value that we don’t get at the moment because paid data creates less economic value than open data.

We looked at funding models. We started off with £383k of funding from the Cabinet Office. We got some extra funding from BCS (thank you). We knew that we would need to be able to show people what our services would look like before we could start bringing in funding from the users of address services.

From talking with potential users of those services we learnt about the challenges of address entry on many websites. User research supported our theory that moving to free-format address entry would both make life easier for many people and lead to better quality address data going into organisations. We built a working demo of that service.
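To illustrate why free-format entry can still yield structured data, here is a deliberately naive sketch (not Open Addresses’ actual code, and the postcode pattern is heavily simplified):

```python
import re

def parse_free_format(text):
    """Naive sketch: split free-format input on commas/newlines and
    pull out anything that looks like a UK postcode (simplified pattern)."""
    parts = [p.strip() for p in re.split(r"[,\n]", text) if p.strip()]
    postcode = next(
        (p.upper() for p in parts
         if re.fullmatch(r"[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}", p.upper())),
        None)
    return {"lines": [p for p in parts if p.upper() != postcode],
            "postcode": postcode}

print(parse_free_format("8 Acacia Avenue, Bath, BA1 1AA"))
# → {'lines': ['8 Acacia Avenue', 'Bath'], 'postcode': 'BA1 1AA'}
```

A real service would validate the parsed parts against known address data rather than trusting a regular expression, but even this much is friendlier than forcing users through rigid form fields.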

We knew we needed to gather address data. Following on from the discovery phase we built a model that would allow any organisation or individual to contribute their own address data; allow anyone to add large sets of open data containing addresses, if they followed guidelines and confirmed that they were legally allowed to publish that address data as open data; and put in place a takedown policy to investigate and remove any infringing data. For the legally minded: we were set up to host the data. This was important. In the past people had been threatened with legal action by the Royal Mail over address data, and the hosting model provided a defence.

Unfortunately we hit a snag.

Digital cholera makes me sad.

We learned that one of the largest open data sets held by government was tainted by what we called ‘digital cholera’. It contained third party rights that government was not authorised to licence as open data. This was no good. We wanted to publish address data that was safe to use.

We didn’t want to spend the limited grant funding on more and more legal advice or court battles (sorry lawyers…). So we concentrated on other approaches.

We used clean open data sets and statistical techniques to multiply the address data we already had. For example, “if house number 1 exists and house number 5 exists then house number 3 probably exists”.
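The house-number rule in that example can be sketched as a small function. This is an illustration of the idea, not the statistical model Open Addresses actually ran:

```python
def infer_house_numbers(known):
    """If numbers 1 and 5 exist on the same side of a street, number 3
    probably exists too: fill same-parity gaps between observed numbers."""
    nums = sorted(set(known))
    inferred = set()
    for lo, hi in zip(nums, nums[1:]):
        if lo % 2 == hi % 2:  # odd and even sides are numbered separately
            inferred.update(range(lo + 2, hi, 2))
    return sorted(inferred - set(nums))

print(infer_house_numbers([1, 5]))     # → [3]
print(infer_house_numbers([2, 8, 9]))  # → [4, 6]
```

Inferred addresses like these are guesses, which is one reason a confidence model (below in the text) matters.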

We started developing a collaborative maintenance model. People could use our address services to both improve their own services and improve the address data that everyone was using. The model would enable us to learn and publish new address information (such as alternative addresses — like Rose Cottage rather than 8 Acacia Avenue and new addresses) as people started to use them. This would increase the speed of publishing new information and improve data quality. By crowdsourcing data through APIs the data would get better as more people used it.

The team recognised that these new ways of collecting address data would impact on confidence. So, we started developing a model that would allow the platform to declare a level of confidence in each address. The model allowed for different levels of trust based on how frequently we’d seen an address, who reported it, and how long ago they’d reported it. Data users could use the APIs to determine confidence and choose whether to trust an address for their particular use case.
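As an illustration only (not the model the team actually built), a confidence score combining frequency, recency and reporter trust might look like this; the six-month half-life is an assumed parameter:

```python
import math

def confidence(sightings, now):
    """Toy confidence score for an address. `sightings` is a list of
    (seen_at, weight) pairs: reports decay with age (assumed six-month
    half-life), and the total is squashed into the range [0, 1)."""
    half_life = 180 * 24 * 3600  # seconds; an assumption for illustration
    score = sum(weight * 0.5 ** ((now - seen_at) / half_life)
                for seen_at, weight in sightings)
    return 1 - math.exp(-score)

now = 1_000_000_000
print(confidence([], now))  # → 0.0: never seen, no confidence
print(confidence([(now, 1.0)], now) <
      confidence([(now, 1.0), (now, 1.0)], now))  # → True: more reports, more confidence
```

The key design point survives even in this toy version: confidence is never binary, so each data user can pick their own threshold for their own use case.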

But all this time the clock was ticking. There was limited funding. From the beginning we knew that we were testing two hypotheses.

Two hypotheses. Both are true.

Unfortunately we discovered that both hypotheses were true. We could build much better address services using modern approaches, but the intellectual property issues would keep hindering us.

A report was published to share the lessons of what worked, and what didn’t. As you’ll see in the report, even with all of our mitigations against intellectual property violations in place, Open Addresses was only able to find one insurer who would provide it with cover for defence against intellectual property infringement claims. The insurers were too concerned that the Royal Mail would take legal action to protect their revenues from address data.

A blog was published about the shades of grey in open data. And then Open Addresses went to sleep.

Someone else would have to take up the challenge of opening up address data and making things better for everyone.


While Open Addresses was happening so were other things. Lots of things. I’m obviously interested in the data ones.

The ODI was thinking about who owned our data infrastructure. Data is infrastructure to a modern society. Just like roads. Roads help us navigate to a location. Data helps us make a decision.

Spot the infrastructure in this excellent picture by Paul Downey.

The government was also working on its policy of government-as-a-platform. Companies House were opening up their data and putting it on the web. The Land Registry described itself as a steel thread that we could all build on.

Things started to come together with the description of registers as authoritative lists that we can all trust. We could all build things on top of government’s open registers.

Registers are data infrastructure. An important part of data infrastructure is geospatial data, like addresses.


In the 2016 budget it was announced that government had allocated £5m to explore options to open up address data.

It is important to understand that this is about exploring options. As Open Addresses had learnt, UK addresses are pretty complex. We have centuries of legacy to deal with.

Matt Hancock, who was the Minister for the Cabinet Office when the announcement was made, likened it to the ‘US administration (decision) to allow GPS data to be made freely available for civilian use in the 1980s, which he said had “kick-started a multi-billion dollar proliferation of digital goods and services”’.

He got the importance of this data being open. Not that surprising when you know that his parents ran a company that built “software that allows you to type your postcode into the internet and bring up your address”.

Government is building a common language about addresses.

Government is exploring the options as openly as possible. They are sharing their research into topics such as the need for, and complexity of, address matching, and the need for a common language for addresses. They are trialling technology approaches, and you can see the source code for yourself: it’s open. This all forms part of the bigger picture of building registers as infrastructure for the government-as-a-platform strategy. In fact, just this week government announced an early version of an authoritative register of English local authorities.

Whilst not all of the work is in the open (remember, the arrangements for UK address data are complex commercially and legally) it is clear that many government organisations — such as the Cabinet Office, Ordnance Survey, BEIS and Treasury — are working together to explore the options and business case for an open register. Good ☺

Will the address wars ever end?

All of the above is what I said in the talk at the BCS addressing update seminar. At the end the audience debated some of the issues raised. The legal issues seemed to confuse some people — derived database rights are tricky. Eventually I was asked the most important question: will this new UK government initiative to create an open address register succeed?

The honest answer is “I don’t know” but I do trust the people working on it. They are good and there is clear political will to get this problem sorted. With good people and political support it’s possible to do hard things. I choose to be optimistic. I think they’ll succeed. Good ☺

The web of data is coming.

It is important for the UK that they do. We need to build for the future web of data.

Other countries recognise the value of data infrastructure that is as open as possible. The USA, Australia and France have all recently made strong moves to get their address data open.

Data infrastructure is a competitive advantage in the 21st century. We need to move on from old licensing and funding models that don’t make the best use of the qualities of the web and data.

Let’s build better data infrastructure that makes things better for everyone.

Hacker Noon is how hackers start their afternoons. We’re a part of the @AMI family. We are now accepting submissions and happy to discuss advertising & sponsorship opportunities.

If you enjoyed this story, we recommend reading our latest tech stories and trending tech stories. Until next time, don’t take the realities of the world for granted!

Gov cats

In recent years the UK government has got into the habit of announcing that it has employed cats. Downing Street, the Foreign Office and the Treasury all have cats whilst the Cabinet Office are about to appoint one. An unusual habit for a government but, I suppose, life should be full of strangeness.

One afternoon I was feeling simultaneously bored and whimsical, a risky combination, so I spent 10 minutes building a UK gov cat register — a list of these cats — which I published on the web.

the cat register

The cat register is open data. Anyone can use it for any purpose. It is also open for contributions. Anyone can suggest changes and help improve it. Some people have done so already.

This week I created a dashboard for the cat register. That should have been relatively simple too but it took a little longer. Some of my skills are a bit rusty.

the cat dashboard

A list of cats that work for the UK government might seem like a silly joke – it was 🙂 – but it also gave me a chance to use, and give feedback on, some new tools developed by the Open Data Institute (ODI)’s Labs team.

Here’s what I did. It might help others publish some open data or build a dashboard. If you read it all you’ll also learn who Schrödinger’s gov cats are…

How I built cat register

I started off by pulling together some of the available data: names; the department the cats worked in; the dates when they started (or ended) their work; and social media accounts. Yes, UK government cats have social media accounts: both official and unofficial. The data was gathered into a spreadsheet application and saved as a CSV file.
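For a sense of what that looks like, here is a minimal sketch of such a register as CSV. The column names are my guess at a sensible structure, not the register’s published schema; the cats themselves are real:

```python
import csv
import io

rows = [
    {"name": "Larry",      "organisation": "Downing Street", "start": "2011"},
    {"name": "Palmerston", "organisation": "Foreign Office", "start": "2016"},
    {"name": "Gladstone",  "organisation": "HM Treasury",    "start": "2016"},
]

# Write the rows out as CSV, header first.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["name", "organisation", "start"])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

Keeping the data this simple is part of what makes it easy for other people to suggest changes: a one-line diff adds a cat.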

I will shamefully admit that I did not think too much about the needs of potential users of the data. After all, this was a whimsical experiment which users would be able to help maintain if they wanted to be whimsical too. I also concluded that privacy would not be an issue as animals do not have rights under the General Data Protection Regulation. In less whimsical circumstances I would recommend completing a privacy assessment before publishing a dataset.

Octopub screen for adding a dataset

I used the ODI Labs’ Octopub tool to publish the CSV file. Octopub automatically creates an open data certificate and uses Github to store and publish the data with all of the functionality that provides.

After that step the data was accessible on the web, openly licensed to make it clear that people can use it and was open for collaboration so that people could help improve it. Do use the cat data, read how to submit some extra data or raise an issue if you want to.

This bit was easy. A dashboard was a little harder.

A minimum viable cat dashboard

To help with metrics and dashboards the Labs team have created Bothan: it brings you information in the form of a free platform for storing and publishing metrics as JSON or simple visualisations. This capability is built on top of another web tool, Heroku, that allows new applications to be quickly deployed to the web.

Bothan’s name is inspired by a pretty obscure line of dialogue about the many spies who died getting the plans for the Death Star in Return of the Jedi. I suspect the Labs team had many failures when building their tool…

The ODI’s lab teams have also built some sample code which can be copied and configured to present Bothan visualisations as a dashboard using Github Pages (another free tool).

Setting up a Bothan instance and reconfiguring an existing dashboard was relatively easy but automating the process of getting data, like the total number of cats, from the register into Bothan proved harder.

The team recommended Zapier, a web tool designed to help automate workflows. It’s less open than the other tools — I couldn’t easily share my config and the pricing plan seemed to scale fast — but it looked like it would do the job and help get even more cats on the web. The team have even integrated Bothan with Zapier to make it easy. Unfortunately I had to get to grips with the Python scripting language and my last foray into similar stuff was a while ago. Luckily there was help both on the web and in the office.

a bit of Zapier configuration which, to put it another way, says “if there’s a change to cat register, then run an algorithm and store the results in the Bothan metrics platform”

After getting the tech working I shared a couple of early drafts on Twitter and got some feedback (at which point I learnt that Google had given me the wrong answer for the total number of cats in the UK; if only searching for data was as easy as searching for documents), and improved the dashboard to the point that I was happy to call it a minimum viable dashboard.

One bit of configuration and code looks for changes to the cat register and calculates new metrics from those values, whilst another bit looks for changes to some official UK government data about cats. Everything runs automatically.
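The metric calculation itself can be sketched as a small function over the register’s CSV text. The column names here are an assumption for illustration, not the register’s actual schema:

```python
import csv
import io

def total_cats(register_csv, status=None):
    """Compute a simple metric from the register CSV text: the total
    number of cats, optionally filtered by status."""
    rows = csv.DictReader(io.StringIO(register_csv))
    return sum(1 for row in rows
               if status is None or row.get("status") == status)

register = "name,status\nLarry,Active\nPeta,Inactive\n"
print(total_cats(register))            # → 2
print(total_cats(register, "Active"))  # → 1
```

A function like this sits well in an automated workflow: each time the register changes, recompute the metric and push the new value to the metrics platform.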

You will find a bit more detail and the code for the dashboard on Github. Feel free to suggest new features.

Peta is Schrödinger’s cat

Schrödinger’s cats

You might have noticed that the dashboard has an entry for “Schrödinger’s cats”. The reason is quite simple: just like the cat in Schrödinger’s famous experiment, I could find no data confirming whether some cats are alive or dead. I could make an educated assumption (after all, one cat started duty in 1964…) but I thought it was worth leaving the status unclear. I simply left them marked “Inactive” and imagined the life of a retired UK government cat.

some cats from the swinging 60’s. Picture courtesy of National Archives via Wikipedia

Anyone who uses the data can make their own assumption about those cats, whilst leaving it unclear might incentivise someone to help find the missing data and, perhaps, discover that an elderly cat from the swinging 60’s is still patrolling the corridors and clubs of Whitehall.

That incentivisation is interesting. A good register should, like any data infrastructure, provide a foundation on which people can build services and find insights, but a good dashboard should incentivise behaviour in line with a particular goal or strategy. My goal was to get even more cats on the web. The register and dashboard were a way of getting other people to help me. Submit more cats.

Publish your own data or build your own dashboard

But enough of cats, for now. My whimsy also helped me explore a little bit of data publishing. Octopub, Bothan, Zapier and Python all turned out to be fairly easy to use so, if you fancy giving open data a go, why don’t you publish your own dataset or create your own dashboard?

You could start with a whimsical project (penguin register anyone?) or perhaps something more useful like this list of data science courses in Europe prepared as part of the ODI learning team’s work for the European Data Science Academy.

If the documentation for each of those tools doesn’t help you with a problem then there are plenty of people around to ask and, once you’ve learnt the answer, you can always suggest ways to improve the documentation and help the next person.

The hardest bit about publishing (cat) data is getting started. Tools like Octopub and Bothan are there to make it easy.

———

Update 21 April: since writing this blogpost I have done a bit more work on cat data, privacy and complexity.

