AI and the Committee for Standards in Public Life

The UK has a Committee for Standards in Public Life (CSPL). It advises the Prime Minister on ethical standards across the whole of public life in England (yes, only England — ethics must be a devolved matter).

A picture of some people by L S Lowry (via Flickr)

The committee is currently investigating Artificial Intelligence and whether the existing frameworks and regulations are sufficient to ensure that high standards of conduct are upheld as technologically assisted decision-making is adopted more widely across the public sector.

Big topic. After all, AI is a range of techniques that use people, mathematics, software and data to make guesses at the answer to things. It can help, and hinder, with lots of the huge array of things that the public sector does.

I represented the Open Data Institute (ODI) on a roundtable for this investigation. A couple of people have asked me what the roundtable was like and what I said. Here’s a quick blogpost.

Preparing for a roundtable

The ODI team get invited to lots of roundtables and events. We decide which ones to do and who does them based on a range of criteria. The invitation for this one went to our CEO, Jeni Tennison, who passed it to me to do. My goal was to help the committee, learn from what other attendees were saying, and test some of our ideas in front of this audience.

We did our usual preparation by sharing the questions around the team in the office and telling our network that we were going along to hear what advice they gave us. That technique provides a lot of input. It also helps me represent the ODI and the ODI’s network, rather than simply myself and my own views.

I summarised it down to a few key points to try and make, and then tried not to over-prepare. Over-preparation is the worst sin: it makes me sound even duller than normal.

Rounding a table

The roundtable itself was at Imperial College in London.

The setup was more informal and the committee was more friendly and asked more insightful questions than most similar things I’ve done. That was good. My background is technical and private sector — I previously spent 20 years working with telecoms operators building products, systems and networks — so I always worry that I’ll misunderstand or miscommunicate particular words or phrases. That would damage both me and the organisation I represent.

Anyway, I managed to get over versions of some of the things that we’d prepared and/or that we regularly discuss in the office and that were relevant to how the roundtable took shape:

  • that there is little transparency over use of AI in the public sector and of how the UK government’s Data Ethics Framework is being used. I know that there is good and bad work being done, but mostly because I know some of the people doing it. How are the general public meant to know?
  • that we need to focus more on the people who design, build and buy AI services. Exploring what responsibility and accountability they should have and how we give them the space, time and money to design those services so that they support democracy, openness, transparency and accountability as well as being efficient and easy to use
  • that the current focus on ethical principles and AI principles does not seem to be having a useful effect. That instead we need to couple those top-down interventions with more bottom-up practical tools (like the framework or ODI’s Data Ethics Canvas) and more research into how the people designing, building or buying AI systems make decisions and what will influence them to comply with the law and think about the ethical implications of their actions
  • that control, distribution of benefits and harms, rights and responsibilities about AI models would be a useful area to explore
  • that eliminating bias is the wrong goal. Bias exists in our society, and some of that bias becomes encoded in data and technology. AI relies on the past to predict the future, but the past might not reflect the present let alone the world we want. We should build systems that take us towards the future we want, and that can adapt as things change
  • that in a world which is increasingly online-first and where we risk the state disappearing behind a smartphone screen and automated decisions, the principles of public life should be updated to put the need for humanity front and centre

I also learnt a lot from other attendees with some interesting things for myself and the team back in the office to chew over.

After the roundtable

A couple of weeks after the roundtable I was sent the transcript to review. The committee will publish that transcript openly — which is good and healthy. Attendees get to see the transcript first so they can suggest corrections to simple grammatical errors or transcription problems. That’s why I’m not commenting on or sharing what other people said.

It is important to review the transcript. There are sometimes errors. For example, in this transcript I was recorded as saying that my boss, Jeni, was “whiter than me” rather than “wiser than me”. I have no idea how I’d measure the former but I certainly know that she’s the latter. Some of the words and thoughts in this blogpost come from Jeni and others in the team like Olivier, Miranda, Renate, Jack &c &c &c.

Reading the transcript also helps me understand the difference between the clarity of my speech and the clarity of my writing. I’ve left most of my spoken errors in place. Just like the state we can’t only communicate in words and pictures that are sent through a computer. Most of us need to get better at speaking with humans.

The data wasteland is polluted

Part of the ODI’s theory of change

At the Open Data Institute we use a theory of change. It is one of the tools that we use internally to help us make decisions and externally to explain to people what we do and how we do it.

Our theory of change describes the farmland, oilfield and wasteland futures and helps us try to steer between the extremes of the oilfield and wasteland futures to get to the farmland.

The wasteland future emerges when there are unaddressed fears arising from legitimate concerns — such as who has access to data and how it might be used.

We frequently talk through the theory of change to explain what we do and how we do it. We try to provide pauses in the conversation to get other people to give their opinions. It helps people to think and learn for themselves. It helps us learn too. We hear what other people think happens in the wasteland future. How they think people and organisations will react to their fears being unaddressed.

Most of the people we talk with think that the wasteland future has a lack of data. They realise that with a lack of trust many people and organisations will reduce how much data they share. They imagine people refusing to use services because they don’t trust them, and that organisations similarly refuse to share data because they fear being punished. They think the data stops flowing.

A smaller group of people realise the wasteland is more complex and weird. People’s behaviour will change in many different ways. Humans are fun like that.

Some people might post inaccurate data. Perhaps you will post fake claims of jogging exploits to social media if it is the only way to get a fair life insurance deal. Other people will hide in the data. Maybe we will give our children common names so they are hard to identify or so they appear to be from an ethnic group that is not discriminated against.

Similarly businesses will feel the need to create fake data. Organisations that fear that their supply chain data is being captured and used unfairly by their competitors might start to create ever more complex corporate structures to hide the data. Obviously reducing the chance of this unfair behaviour will also make it harder for regulators and civil society to know if a business is acting fairly.

I’m sure that even if you hadn’t thought of them at first you can now think of many more things that happen in the wasteland future.

You can see some of this future now. There are already people and organisations hiding in the flows of data. Some of those people need and deserve help to hide because they have a genuine fear of harm, perhaps due to their political beliefs, ethnicity or sexuality. Equally there are others who are trying to evade fair scrutiny, for example tax dodgers and other criminals, and organisations providing services to help them do so. But if we increasingly fear harm then more people will want and need these services and, inevitably, they will become ever cheaper and used by more of us.

As this behaviour becomes widespread we will see data that is massively biased and misleading. People and organisations that use data-enabled services to tackle global challenges such as global warming, to price a life insurance premium in a way that doesn’t unfairly discriminate, or to decide whether or not to take a job will struggle. That would not be good for any of us.

Navigating a route between the wasteland future and a different future where we get more economic and social value from data will not be easy. There will always be some people who need to pollute and hide in data to protect themselves from harm, and we need to allow that to happen. Understanding and addressing people’s fears is not only a technical challenge, it is also a social and political one. To retain trust we need businesses and governments to adapt to people’s ever-changing expectations in a range of cultural contexts.

An increasing fear of how data is used will not simply stop people using services or sharing data, it will change people’s behaviour in a range of ways. If that happens we can expect data to be increasingly poor quality, biased and misleading. And that pollution will make data less useful to help people, communities and organisations make decisions that hold the potential to improve all of our lives. Some of that potential is false (the use of data required is too scary and people do not want or need it) but that is why it is important to understand and address the concerns we can if societies are to navigate towards the farmland.

You can read more about the ODI’s strategy and theory of change on our site.

You don’t control your Facebook posts, the reasons why are more complex than you might think

A Facebook UK video advert (https://www.facebook.com/FacebookUK/videos/1635229329867267/)

It told me that my “photos and posts” belong to me and that “[Facebook] won’t use them without [my] permission”.

The same advert has appeared in the feed of friends and work colleagues based in the UK. It seems to be part of a campaign. Perhaps the campaign is related to the European Union’s imminent General Data Protection Regulation and the growing public awareness that there is debate around data, how it is used, and whether to trust those uses.

There is a similar message in Facebook’s terms and conditions saying:

“You own all of the content and information you post on Facebook, and you can control how it is shared through your privacy and application settings”.

Both messages are simplistic, at best. I don’t fully own or control the content I post on Facebook. It doesn’t only belong to or affect me. By over-simplifying its messaging Facebook, like many other organisations, is missing the chance to help explain how its services work and help us all make better decisions when sharing content.

Social media content is more complex than you might think

This will sound counter-intuitive to many. I mean shouldn’t I have control over my data on Facebook? It’s about me! I created it!!

Don’t be silly. Data ‘ownership’ is not as straightforward as it sounds. Most of my content on Facebook is not only about me. It is about other people too.

These people are not my friends. They are from a film called Peter’s Friends. But it shows some people in a picture they may regret in later life.

My list of friends is a list of relationships with other people. People tag someone in a post saying that they went to a restaurant or pub with them, or share a picture or comment about a group of friends.

Most of us will think about our friends’ feelings when sharing content about them on social media, but we don’t always know what will be important to them. The rules aren’t written down. Many of us will have had the experience of sharing something and then having a friend say “hi, do you mind deleting that post because of X…”.

Sometimes we listen to those objections and sometimes we don’t. Our friends might not be able to delete our Facebook content without our consent but their views are part of the complex set of things we think about when posting. They can unfriend us in real-life as well as on social media.

Adverse impact on other people

Beyond affecting a personal relationship there are many types of adverse impact that a Facebook post might have. Affecting copyright owners is one. Copyright has many many flaws but it is one of the ways societies help creators benefit from their work.

A picture by a famous artist, Mr and Mrs Clark and Percy. Image used under fair use. Copyright David Hockney.

If I did own all the content I posted on Facebook then presumably I could post a picture created by someone else and start to make money off it by selling things. Money that could have gone to the artist.

I could, but I shouldn’t.

Both Facebook and I recognise that we need to abide by copyright legislation and that governments help enforce it. A copyright holder can complain directly to Facebook, or through the relevant national or international rules. The content is not mine to own, control and use how I wish. If I breach copyright in a way that unfairly impacts creators then fewer nice things get created. That would be bad.

Germany recently passed a new law stating that social media platforms have to take down hate speech within 1–7 days or face large fines.

Going deeper into adverse impact it could be that someone on Facebook posts something with the intent of causing harm.

To give just a few examples the content might libel someone, use hate speech, endorse terrorism, or use a sexual image of someone without their consent.

Facebook is a global service, and the legislation and definitions of those things will change from country to country, but in many countries those things would be illegal. A poster would lose control of the content, and perhaps even their liberty, as democratic governments use the powers given to them by people to stop the content from being seen and shared.

Facebook has its own moderation rules and tools that allow Facebook’s moderators to intervene proactively or for people to report content and get it removed. Again, that removal can happen without the poster’s consent. The poster is not in control.

Not all of the adverse impacts that moderation rules try to prevent are illegal and intentional. Others are unethical, or against social norms for a particular community or society. Moderation exists because the adverse impact from my posts might damage the health and goals of a community.

Both sassy socialist memes, with 1 million followers, and sassy libertarian memes, with 200 followers, are real Facebook groups.

Moderation is not only done by Facebook and governments. Many community groups within Facebook have their own moderators and policies. Group moderators can also remove content without a poster’s consent.

Perhaps the moderators of sassy socialist memes or sassy libertarian memes will remove content I post in their groups if my content just ain’t sassy enough. The local Facebook group for the town I live in, like many other local Facebook groups, certainly has a fierce response to excessive advertising or outsiders criticising the town.

Other people can benefit from content

Shifting to a more positive, and less sassy, note, people should also be aware of other people who can benefit from content they post. As the Financial Times recently noted, “an explosion of [trustworthy data, such as that posted on Facebook] would give us the capability to understand our world in far more detail than ever before”. Facebook already shares some of the data you post so that other people can benefit. I think it should do more.

OpenStreetMap’s data is freely available as open data and used by governments, businesses, communities and individuals all over the world.

For example, Facebook users help maintain data about things like cafes, restaurants and leisure centres. We don’t only need this type of data in Facebook, we need it in many other parts of our lives, so Facebook have been exploring how to share data with the community-maintained OpenStreetMap. That will help everyone using the thousands of services that use OpenStreetMap. The Facebook users are not in control of this flow of data but they, and many other people, will benefit.

In other sectors rather than downloading data I can give a third party that I trust the right to access it

In other contexts Facebook users might want to share content that they post with a third party that they trust.

The EU’s General Data Protection Regulation strengthens this want into a right, although it is a right with limitations.

I might decide to do this so that it benefits my local community, for example helping local government understand feelings on a particular topic, to help deliver another service I want to receive, for example by asking my friends if they want to join me on a new photo-sharing service, or to help me learn things about my own behaviour and habits.

Unfortunately despite Facebook telling me that I can control how data is shared I can’t easily share that data with third parties.

Facebook allows people to download data they post, but it is not in a standard format and I can’t simply give another organisation that I trust the right to access it to the same extent that, say, the UK banking sector is starting to do.

The UK’s banking sector is expecting to see increased competition and new services as a result of making it easier for people to share data. Perhaps social media firms and the people who use their services would benefit from a similar collaborative effort to determine how to safely share data, which mostly includes other people, without creating adverse impacts.

It is good that Facebook is starting to share data to create benefits outside of their own service. They should do more of it by sharing carefully anonymised data openly, more sensitive data in secure conditions with researchers working for the public good, and by giving people ways to safely share data that they post with third parties that they trust.

Explaining this stuff is hard, but it is necessary

This stuff is complex and can be hard to explain in an accessible way, but it is necessary to understand the complexity before trying to make it simple.

Like many other types of content and data, Facebook posts and photos can be about more than one person. The content can create adverse impacts for those other people but it can also create benefits. Because of this, users are not fully in control of the content they post, and they certainly don’t own it in the same way that we might own a house or car. Instead civil society, governments and service providers need to work together to design ways to help give people more control and to maximise the social and economic benefits, while minimising the adverse impacts.

Over-simplifying this necessary complexity risks us slipping into a world where instead individuals fully control the data that they create. That is the world that Facebook’s ad is describing to many people. How silly. That world will reduce the benefits and increase the risk of harms.

We don’t need more lengthy and unreadable terms and conditions but as the debate over data grows it would be helpful if major service providers like Facebook took greater responsibility in helping to create a more informed debate and helping people to make better decisions.

Open data and advocacy — EU datathon

Approximate words of the talk I gave at the EU datathon in November 2017.

Hi, I’m from the Open Data Institute, or ODI. I’ve been asked to do a quick talk before the next panel about “open data and advocacy”. I’ll keep it quick so you can get to the panel and the Q&A. Asking questions is much more fun than listening to a presentation 🙂

We’re a not-for-profit. We work globally, our headquarters are in the UK. We were founded 5 years ago by Sir Tim Berners-Lee, the inventor of the web, and Sir Nigel Shadbolt, an AI pioneer. Our mission is knowledge for everyone.

As you might have seen on the first slide it’s our 5th birthday this year. Yay us. So, I want to share a bit about what we’ve learned about advocacy and open data in that time.

First, let’s talk about open data. Open data is vital and incredibly important but if we only talk about and use open data then we can’t deliver our mission. Instead we work across the data spectrum.

the data spectrum

The data spectrum is about access. Who can get to data so they can use it or share it or etcetera. Some data should be kept closed within an organisation, like sales reports. Other data should be shared: the police need to be able to see your driving licence, medical records can help with research, Twitter data can help us understand how social media is impacting our societies. Lots of data should be open, like bus timetables, maps and addresses.

We need to talk about and use the full spectrum of data if we are to get more open data made available so that anyone can access, use and share it.

The second lesson is about goals. Sometimes it can feel to other people like the goal of the open data movement is only to publish more open data or to put data on portals. That’s the wrong goal.

We think, talk about and use open data as a tool.

A tool that we use to solve problems. Like finding a job that you enjoy, combatting corruption, finding your way around a city, responding to the threat of anti-microbial resistance, helping with house planning and building, or understanding the growth of new sectors and business models like the sharing economy (something we’re looking at in our new R&D programme).

The third lesson is about chance. Chance is great. Very unexpected things happen when you open up data. One of my personal favourites is that the UK government opened up radar data that was originally gathered for planning flood defences and people used it to discover both new places to grow wine and new Roman roads that criss-cross parts of the country. Fantastic. But that doesn’t always work.

We need more focus on creating impact by design. Looking for problems, working with people who are experts in tackling them and getting them the data they need. To move data to the right place on the spectrum. When we do that then chance can also happen, but we also have a much higher chance of impact.

We also learnt that we need to combat the very strange view that data is oil or coal or other types of fossil fuels. I can talk in economic theory about the different qualities of data and oil, but there’s a more important difference. It creates the wrong mentality. People fight over control of oil. They want to hoard it for themselves. They want to sell it for huge amounts of money.

Instead we need to turn data into infrastructure. It is already heading in that direction but we need to strengthen that momentum. Great infrastructure is boring, reliable and safe to use. It’s there when we need it. Data is decades away from being boring, trust me *pause for ironic, self-knowing laughter*, but that’s the direction to head in. Turning data from the public and private sectors into infrastructure that underpins every sector of our economy and societies.

And that infrastructure will be built on a foundation of datasets that are made available as open data, for anyone to access, use and share. That foundation of open data makes it easier to publish and use other data. It’s a powerful way of thinking.

So those lessons are some of the ways we learnt to think — about the full spectrum of data, about data as a tool, about impact by design, and about data as infrastructure. Those mental models have helped our advocacy.

But over the last five years we have also learnt some methods that work to create impact.

We’ve been working with whole sectors to help them use data.

The UK retail banking sector is opening up data about products, locations and cash machines and creating open APIs so that people can choose to share data held about them by banks with people that they trust. We hope it will make it easier for more people to create better services for bank customers. We’re talking to other countries on multiple continents about helping them to make the same change. GODAN (the Global Open Data for Agriculture & Nutrition initiative), which we work with, is working globally to open agriculture data to solve problems.

OpenActive is opening up sport data to make people more physically active. Places that offer a whole range of sports: football, squash, badminton, table tennis, running are opening up data and they’re also building an ecosystem of organisations that will use that data to make it easier for more people to play the sports they love.

There are more sectors, like transport, coming together as they start to see the power of working together to solve common problems. We need to encourage sectors to understand and unlock the value of open data by focussing on infrastructure, skills and open innovation.

We’re launching a report next week on the grocery retail sector and GDPR based on consumer research, sector interviews and our thinking about sectors. We want to encourage the retail sector to work together to focus on opportunities, and to use the data they hold in ways that build trust with shoppers and give them better services.

As well as sector programmes we work on practical advocacy. Here are two examples.

  • A set of design patterns for policymakers that use data to help them create impact. While data policy people know data, many other policymakers don’t. We need to reach them and put data into their context, in language they understand and tackling problems they need to solve.
  • A data ethics canvas to help organisations using data understand, openly debate and decide on ethical issues about collecting, sharing and using data. Interestingly when we looked at data ethics we found that most of the debate was about personal data in the closed and shared parts of the data spectrum. People had missed the ethical issues around open data.

We’ve also been working on networks. Peer networks are horizontal organisational structures with members who share similar identities, circumstances or contexts. We run global, African and European peer networks for open data and have seen their power in developing learnings and creating change. We’ve learnt from how they have grown and how the people in them interact.

We’ve been seeing peer networks start to emerge in other work we do. Things like ODINE (open data incubator Europe), Datapitch (another Europe-wide startup incubator), and the sector programmes.

We believe that fostering other peer networks: in sectors, in particular disciplines (like policy), or in particular geographies will help build a better future faster. We’ve published a method report that we, or others, can use to do that.

Oh and finally, there’s another vital method. Having fun. Sometimes it can feel like things are moving slowly or in a bad direction and that things will never get better. But just as open is a political statement, we should also be aware that optimism is a political act. Having fun helps me be optimistic. Choosing to be optimistic both helps the day go faster and helps create a better future.

Thank you. I hope this talk and the rest of the event is both fun and useful.

Learning from historical waves

As I’ve been starting to get to grips with technology policy over the last few years one of the things that has fascinated me is how little reference to history there is. When I read historical books and talk to people about technology and innovation history I find some frequent gaps. We need to learn from history if we are to make the best of the opportunity created by the current waves of innovation and technology.

WhatsApp and Columbus

The Landing of Columbus by John Vanderlyn

For example, people talking about the wonders of technology talk about how few staff WhatsApp had when they were bought by Facebook, yet don’t talk about how few people sailed in the Niña, the Pinta, and the Santa Maria when Columbus sailed across the Atlantic. After Columbus’ expedition more and more people crossed the Atlantic, for exploration, for business and for pleasure.

WhatsApp’s success built on the internet, the web, cryptography and smartphones. Similarly Columbus relied on inventions in navigation and shipbuilding. Neither could have achieved what they did without those previous inventions. Are they analogous?

Learning lessons from history

Recently I read a couple of books that helped me sort out some of my thinking about lessons from previous waves of technology-driven change. The books were Ruling The Waves by Deborah L. Spar and The Master Switch by Tim Wu. They are good books. If you’re interested in technology policy you should read them too. I’ll lend you my copies if you want.

Ruling The Waves uses ocean sailing, telegraph, radio, satellite television, cryptography, personal computer operating systems and digital music to explore innovation. It proposes that they show four common phases: innovation, commercialisation, creative anarchy and rules. Different actors dominate in each of those phases.

There are piratical adventures in the early years before the surviving, and now dominant, winners encourage government to work with them to bring order to the new technology. Using the model of this book would show that my silly WhatsApp/Columbus analogy is fatally flawed. Columbus was in the innovation phase, while WhatsApp (and other messaging services) are in either the creative anarchy or rules phase. They’re very different kinds of innovators.

Ruling the Waves argues that the eventual rules tend to be dominated by intellectual and property rights. It shows that it can take decades, or even centuries, from innovation until stable rules are in place.

The Master Switch uses the Greek myth of the titan Kronos devouring his children as an analogy for existing monopolies devouring startups. This is Goya’s version of that myth, using the titan’s Roman name of Saturn.

The Master Switch looks at lessons from the telephone, radio, broadcast and cable television, and Apple to propose that all information technologies go through a cycle of decentralisation to centralisation ending with a corporate (or state) monopoly where innovation, the economy and consumers suffer.

It argues that a separation principle can help prevent this fate.

This principle would keep a distance between young industries and existing monopolies to enable new technologies to show their worth; between different markets to make it harder for monopolies to spread; and between the public and private sectors to prevent government from favouring friendly monopolies.

After reading the books I was more convinced than ever that the waves of change brought about by the internet and web will take decades, if not centuries, to be absorbed into our societies. It is seductive but false to think that we can legislate for technology and data quickly. We have to allow for experiments to learn the right legislative and regulatory frameworks.

Gaps in the lessons

But there were gaps in the books. That’s not unique. I see the same gaps in lots of technology policy and thinking.

Despite the best efforts of Victorian inventors the vast majority of dinner tables do not yet feature a miniature railway delivering food to bearded men. Picture from Victorian Inventions by Leonard de Vries

Major enabling waves of technology like the internet and web underpin lots of other innovation — like smartphones, social media and search engines—that each have their own journeys to go through. Some of these smaller waves will have lasting impact, some may disappear and get washed away, others are badly timed and will come back in a while. But the waves don’t stop. They are continuous. That is one of the reasons why open culture is so important. It keeps us open to innovation, new ideas and challenges from outside of a small circle of friends and organisations.

Both books miss the impact of data in the current period of change and that much of this data is personal data. It is data about you, me and billions of other people. Most data is about interactions between people, or between people and organisations staffed by other people. It is difficult, if not impossible, to determine who ‘owns’ data. For most data there will be multiple people and organisations who have rights. This makes it hard to rely on property rights as a way to shape and bring rules to the market. The challenge of building good governance for data infrastructure will need a more systemic response than property rights.

There’s a whole world of innovation out there. (Gall-Peters projection, image by Strebe CC-BY-SA 3.0)

The books also focus on the US and UK, with some excursions into mainland Europe. While they describe the differences between European and US approaches to regulation, with Europe typically intervening more, I would love to see more about the lessons learned by other countries. The web, the internet and data infrastructure cross, and therefore soften, national boundaries. Learning from and listening to other countries and societies will become even more important as these waves of technology reach their full power. These excellent recent reports from the Web Foundation are useful for those in a US/UK filter bubble who want to start listening more widely.

Innovation has limits

And finally both books miss the influence of societies and people. They are books about economy, regulation and business. They miss the social side of the change.

Lots of the impact of technology is societal as well as economic. Similarly the forces that impact on and affect technology change are both societal and economic. People adapt to technology and innovation, but sometimes they push back and reject it. Those rejections can be learned from.

The innovations that led to Christopher Columbus crossing the Atlantic also led to industrialised slavery. Slavery might have helped create the modern world but it is an evil that should not have happened and should not still be happening. We could have intervened earlier and stronger to stop it. A modern world similar, but not the same as, our current one would still have been built. It would have taken longer but it would have damaged billions fewer people in the process. Our societal norms now reject slavery and many of the other things that that particular innovation enabled.

As our societies matured we embedded some of those societal norms and values into legislation. Human rights, workers’ rights, anti-discrimination, health and safety, and data protection are some obvious examples. They are strong signals from society indicating where innovation is encouraged and where it isn’t.

The precise rules will vary by country. The boundaries of legislation will contain things that need to adapt as we learn how to do things better, but at the core of the legislation are societal norms and values. We cannot and should not forget our values as we go through this wave of change. Those values do change but that change should be vigorously and openly debated.

Something the team at the ODI say a lot.

Innovation can take strange paths and be used for unintended purposes. We need to engage and work openly with societies and people if we are to both understand the limits and share the benefits of the current waves of technology.

What does this have to do with my job?

Over the last couple of years I’ve been working at the Open Data Institute where I spend about 50% of my time working with the private and public sectors delivering projects and building services. We help businesses and governments understand and adapt to the wave of change being brought about by data. The other 50% of my time is spent developing our policy thinking based on what I and the rest of the team and network learnt from delivery and research.

In that second half of my time one of the many things I’ve been helping on is developing a line of thinking that data is becoming a new form of infrastructure. That a data infrastructure which is as open as possible is one that will create the most impact and be best for people, businesses, societies and the planet and that we need to build an open future for data.

Clearly data is not “good” infrastructure right now, too many people can’t get the data that they need, so we think a lot about how governments and businesses can help strengthen it. We look at history when we do that. This is all part of my research. How did we recognise things becoming infrastructure in the past? How did we learn how to design and build good infrastructure? How long did it take? Do historical examples contain useful lessons?

What should I read next?

Anyway, like all of my blogs, I’m thinking out loud. These are some of the things my recent work and reading about history has made me think about. The gaps in the last two books led me to pick a book on the anthropology of roads as my next one. What should I read or who should I talk to after that?

An example I use when talking about data and services

In my job at the Open Data Institute I sometimes talk with people, from businesses and governments, about how better use of data can help them design and deliver better services. I’ve been using a public sector example recently that I’ve not written down. Here it is.

Ways to get bus timetable data to people who need it

The example I use is bus timetables. People need to know the times and routes of buses so they can make a journey and get to their destination. When I use the example I talk through four of the patterns that can be seen in many cities and towns around the world for services that get bus timetable data to people who need it.

  1. Mass market private sector services: many cities and towns now have bus timetables available as open data. Private sector services like Google Maps, Apple Maps and CityMapper pick up this data and build it into a service which they aim at the mass market of smartphone users. The services work in many cities and might have other features such as information about restaurants and pubs. They get their open bus timetable data either directly or through a data aggregator, like TransportAPI or ITOWorld, who collate data from multiple cities / transport providers. That takes away some of the effort from using open data and makes it easier for more people to build services.
  2. Targeted private/public sector services: smart cities and towns recognise that the mass market services don’t always meet all needs, particularly accessibility. If you look closely you can often find small bits of public services meeting the needs of some users, or a transport authority running a challenge to help focus the private sector market on meeting particular user needs. Left to its own devices the private sector might only target the profitable and easy-to-serve mass market; a challenge can help change that to build more accessible services or to experiment with new technologies like AI or voice interfaces. Targeted services often use the same data aggregators as the mass market services. It’s the same data, just presented for a different set of user needs.

A bus stop outside Piccadilly Station in Manchester

3. LocalBusTimes: a local website and/or smartphone app where people can look up the timetables for a journey they want to make. It might be for a whole town or a single bus company. It probably started by only providing bus timetable data, nowadays I think more of them recommend a route. The local authority or bus company typically run the LocalBusTimes service themselves.

4. Physical services: not everyone has or uses a smartphone when they need bus timetable data. There are many reasons for this. To give just a few: there might be no coverage, they might not be able to afford a smartphone, they might have run out of credit/data, they might not want a smartphone, their city might not have made bus timetable data available or they might simply have run out of battery. That’s why bus stations have information desks, why bus stops have timetables printed and stuck to them and why people ask other people “when’s the next bus?” on the street. Someone has used the bus timetable data as part of the design for the bus stop or as part of designing an operational process to help a human answer another human’s questions.

Some of the reactions I get to my example

No one, yet…, has told me that my example is stupid or dull. Feel free to be first to do that.

When I talk through this example with people the usual reaction is that while lots of people knew about the transport sector and data few people had thought of all the patterns or wondered about how they could be applied to their work in another sector.

Most people had used the mass market services but very few people had thought of using the market, in this case through open data and challenges, to help them meet their own goals. Those that had thought that they risked losing control to the market and hadn’t realised that they could still discover if user needs were being met — for example through user research — and could use a variety of ways to shape the market to target unmet needs. Challenges are just one of the ways to do that. Governments can legislate. Both businesses and governments can use procurement, strike deals, make different types of data more open, either fully open or in a more controlled way through APIs, or lots of other forms of soft power to shape the market around them.

I also find that few people had thought of the physical services pattern as part of the overall service. I find that sad. It also shows that I’m in a bit of a bubble and exposed to only some views. The tech world is overly focussed on services that end in smartphones and websites. I expect/hope that’s a passing phase.

Why I’m writing this down now

I’m writing this down now because I’ve been using the example for a while. It’s good to publish it to get my thinking straight, to show some of the reactions I get and to learn from new reactions. As I often say, data is becoming infrastructure that will be as open as possible. Businesses and governments need to adapt to that future. They have different goals, and needs for democratic accountability, but can learn from and collaborate with each other. I’m expecting to do some more work on public sector service delivery models over the next few months. It’s good to share, even shoddy, thinking early. It’ll help make that work better.

Open your effing data

Warning: this post contains content that will be offensive to some people.

The post is a version of a talk I gave at the ODIFridays series of lectures at the HQ of the Open Data Institute in London. The slides and a video of the talk are at the end of the post. Like most of my talks I adlibbed a bit. The post has links to most of the material I adlibbed from, others are at the end of the slides. It includes some thoughts on swearwords, Roger Mellie, democracy, censorship, Blackpool FC, artificial intelligence, context and an apology to my mum.

One of the UK’s regulators, Ofcom, commissioned research on offensive language last year. The research got lots of headlines. It was a nice opportunity for papers and websites to make cheap gags about swear words.

A report from the Metro on the publication of the report.

But it also gave me an opportunity to open up some swear word data and to use that example to talk with people and think about things like democracy, censorship, context and artificial intelligence. I made some cheap gags about swear words too.

Data needs context

Ofcom published the research in an openly licensed 126-page document and a 15-page quick reference guide.

from the report that Ipsos Mori did for Ofcom

The newspapers extracted the data from the PDF to write their stories. I extracted the data too. (btw some work that our friends at ODI Leeds and Adobe are doing might make my cut and pasting easier in the future…)

Unfortunately at first I missed the all important context for the data. I discovered the mistake by checking my data with the helpful team at Ofcom.

Take a look at the data or if you want to use it in a project or service there’s a CSV in github.

After some discussion within the ODI and with Ofcom’s research team we ended up with this. The same data as the PDF but in a format that is both human and machine readable.

Now, a big part of our job at the Open Data Institute is “getting data to people who need it”. Normally I start with problems but this time I had started with data. My bad. Now to find out who needed it and how they would use it.

Some of the things people use this swear word data for

As I put the data out on twitter there was a background mantra of “arse…balls….knob…bastard…” from around the office. One person then wrote a little script that people could use to get their computers to say the list of words. Soon I could hear both human and machine voices swearing away. The swearing mantra was charming, if a little unsettling, but I had my serious face on. Why do people swear?

Well a bit of research showed an academic saying:

The main purpose of swearing is to express emotions, especially anger and frustration.

Seems fair. I suspect that a lot of people get frustrated at not being able to get data they need to do something. That explained the background mantra from the Open Data Institute office, but what about other uses of the data?

Roger Mellie, copyright Viz. Note that the swear word data might allow people to block his language, but not his gestures.

The content of the report told us about some other users. It would help TV broadcasters and presenters understand how people would react to things that they said on air and so help the presenters decide what they could say.

For example the word “bollocks” was seen as somewhat vulgar if it referred to testicles but less problematic if it was being used to call something ‘nonsense’.

This might mean that people did or did not say words in certain contexts. It might lead to some content only being accessible if a PIN was entered to unlock it.

This data was created because of democracy

Democratic processes can need data to be created. Image Nick Youngson, CC-BY-SA-3.0 via http://thebluediamondgallery.com/d/democracy.html

But the biggest user of the report is Ofcom themselves. Ofcom commissioned the research because through our democratic processes we have decided that there are limits to free speech on TV & radio and made it Ofcom’s job to regulate those limits. They needed the data to help with this job so Ofcom commissioned Ipsos MORI to produce the data by performing user research through focus groups, interviews and follow-ups based on a long list of potentially offensive words and phrases.

We have given Ofcom the power to fine organisations and people that breach their codes. By publishing the report openly, they were helping broadcasters understand how they might use those powers and therefore discouraging breaches. This probably makes the system cheaper and more effective.

Broadcasters are likely to have their own guidance to help them meet the expectations of their target audiences. They could merge Ofcom’s list with their own list to help them meet both society’s needs and their own user’s needs.

Similar data is maintained in contexts outside of TV and radio

In Britain Mary Whitehouse was a famous campaigner from the 1960s to the 1980s against things that she found offensive. I can imagine Mary being keen on data-driven censorship. Image fair use via Wikipedia.

The data includes the word ginger saying it is ‘mild language, generally of little concern’, but the word ginger can also be used to describe a very tasty type of biscuit. A filter that used the swear word data to block offensive words might ban ginger nuts. That would be bad. This is a common problem with simple data-driven solutions. They ignore context.

I couldn’t find a list of offensive biscuit names but there are other sets that are similar to the swear word data used in contexts other than TV and radio.

The UK has a list of suppressed car registration plates

It is the job of part of the UK government, the DVLA, to maintain a list of combinations of letters and numbers that you cannot put on a car. Unfortunately, and curiously, the list is not published openly, but sometimes it is made available after freedom of information requests.

An extract from the suppressed car registration plate list via Whatdotheyknow

The list of suppressed car registration plates helps prevent confusion over typographically similar symbols, like 0 (zero) and O (oh). It blocks language that is likely to be considered offensive, for example “*B** UMS” and “*R**APE**”.
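
The look-alike part of that job can be sketched in a few lines of Python. This is only an illustration: the mapping, the function names and the sample plates are all made up for the example, and the DVLA's real rules are not public, so nothing here should be read as their method.

```python
# Hypothetical mapping of look-alike characters to one canonical symbol.
# An illustration only, not the DVLA's actual rule set.
CONFUSABLE = str.maketrans({"O": "0", "I": "1"})


def canonical(plate: str) -> str:
    """Upper-case a plate, drop spaces and fold look-alike characters together."""
    return plate.upper().replace(" ", "").translate(CONFUSABLE)


def easily_confused(plate_a: str, plate_b: str) -> bool:
    """True when two different plates read the same once look-alikes are folded."""
    a = plate_a.upper().replace(" ", "")
    b = plate_b.upper().replace(" ", "")
    return a != b and canonical(a) == canonical(b)


# Made-up plates: the first pair differs only by I versus 1.
print(easily_confused("IRA 1", "1RA 1"))
print(easily_confused("ABC 2", "ABD 2"))
```

A check like this catches visual confusion but says nothing about offensiveness, which still needs a list that people maintain.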

The list also explicitly contains the names of terrorist groups such as the UVF, UDA and UFF. Another terrorist organisation, the IRA, are already banned, like any other organisation beginning with I, because of the potential for confusion between 1 (one) and I (aye).

More controversially the acronym for the far-right British National Party, BNP, is also on the list. The BNP are allowed to stand in the UK’s democratic election process. How was that decision made? Unfortunately just as the list isn’t publicly available neither is the methodology.

Context affects what words are offensive

The UK’s democratic processes produce other lists of offensive words.

The speaker in the UK’s parliament can request that politicians withdraw words when debating with their opponents, so-called unparliamentary language. The way in which words are deemed to be unparliamentary or not is unclear. In 2015 the opposition leader Ed Miliband was allowed to call the then Prime Minister David Cameron “dodgy”, yet in 2016 an opposition backbencher Dennis Skinner was asked to leave a debate because he called David Cameron “dodgy Dave”. The word “dodgy” isn’t on Ofcom’s list: it’s offensive to call an MP “dodgy” in a parliamentary debate but not to call them it on television.

The list of unparliamentary language is currently unpublished. To help UK politicians make better decisions about being unparliamentary or not I compiled some examples into a list. Parliaments in other countries, and other UK nations, have similar lists. They show the importance of geographic context.

The Australian parliamentary records show offence was taken against the term “suck-holing”, a word that in 1977 was decided to be offensive in the Australian parliament but that will be meaningless to most British people and has never been used in the British parliament. I wonder if a British MP would get away with using it.

The word “Oyston” is offensive to me and my community of fans of Blackpool football club. The offensiveness is not only because of this cringeworthy picture but because of how the Oyston family treats fans.

Another example of offensive language in a particular context is the word “Oyston”.

The Oyston family own the football club that I support, Blackpool FC. Because of their actions against fans, being called an Oyston fan on one of the websites used by Blackpool fans would be offensive. How would anyone outside of the community of Blackpool fans discover this?

There are related examples that may help us understand how we could do this.

Collaborative maintenance of data

Hatebase maintains a list of hate speech from around the world. The data is maintained by automated processes and manual interaction to cater for how hate speech changes over time and in different places. Hate speech can be used to encourage violence against people and communities. The collaborative maintenance process allows people to debate which words are hate speech or not.

“popular” types of hate speech from Hatebase.

An interesting experiment would be to see if the Hatebase dataset could have helped predict violent events through rises of hate speech in parliaments, newspapers and social media. Do get in touch with them if you have money to fund that research.

Other people could learn from the example of Hatebase. If British politicians wanted, and could get to grips with github, then they could collaboratively maintain my initial list of unparliamentary language and create something that would help them understand the boundaries of offensiveness.

Offensiveness is affected by time, place and communities

Rebecca Roache in Aeon magazine.

By this point in my own research I was clear that the context of offensiveness is affected by time, place and communities.

When I checked I found that swearing philosophers were, of course, already aware of this. As often happens I was a technologist rediscovering ground that others had already covered. But technology can also affect how and which words become offensive.

People create new offensive words

Oyston is an example of a word that became offensive to a small group of people before becoming offensive to a larger group. Blackpool fans have effectively used social media and the press — oh, and talks & blogposts like this ;) — as part of a campaign to get the Oyston family out of our football club. An effect of this has been to spread the understanding of the offensiveness of the Oystons from the seaside to wider parts of the footballing community. A more famous example is the case of Rick Santorum who found his surname defined as an offensive word in a campaign led by Dan Savage.

This is a challenge to any list of swear words and a risk for people who use them. People create new offensive words for their own purposes. They game systems.

A t-shirt with the universally unique identifier for beef curtains.

Would people game the swear word data I created from Ofcom’s list? Yes, of course they would.

An example quickly came to mind. When I published the Ofcom offensive word list as open data then in line with good practice I gave every entry a universally unique identifier (UUID). UUIDs make it easier for machines to use the data.
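
Here is a minimal sketch of what giving every entry an identifier could look like in Python. It assumes random (version 4) UUIDs and a two-column layout of `id` and `word`; I haven't checked which UUID version or column names the published CSV uses, so treat both as hypothetical. The sample words are just two that appear earlier in this post.

```python
import csv
import io
import uuid


def add_identifiers(words):
    """Return one row per word, each with a freshly generated random UUID."""
    return [{"id": str(uuid.uuid4()), "word": word} for word in words]


def to_csv(rows):
    """Serialise the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "word"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = add_identifiers(["ginger", "balls"])
print(to_csv(rows))
```

Random UUIDs change every time the script runs, so in practice the identifiers would be generated once and then stored alongside the words. That stability is what lets other datasets and services refer to an entry without repeating the word itself.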

If this data was to get widely used then how long would it be before people started to circumvent the system by being interviewed on telly wearing t-shirts with the UUID of a swear word? Perhaps over time the UUIDs, or parts of them, would become offensive? “That fella’s a right 81cb.”, they’d say. Maybe the UUIDs would need to be added to the list as they became offensive?

People adapt and change. That is one of the best things about people and one of the biggest challenges we face when maintaining and using data. We need to build in mechanisms to change datasets over time as needs and uses change.

Swear words-as-a-service is hard

It is clear that swear word data was easy to build and also clear that it would be more difficult to maintain and make it useful in multiple contexts.

I knew that many companies were already maintaining similar lists as, like many other people, I had seen, laughed and evaded filters on websites that had turned the British town of Scunthorpe into the apparently inoffensive “S***horpe” due to simplistic and bad data-driven algorithms. I do wonder how useful those filters and services are.
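
The Scunthorpe failure is easy to reproduce. The sketch below is my guess at how a simplistic filter works, not the code of any real service; the two-word block list and the function names are invented for the example. It uses "arse" and "parse" as a tamer stand-in for the same substring problem.

```python
import re

BLOCKED = ["arse", "ginger"]  # illustrative entries only


def mask_substrings(text):
    """Naive filter: star out a blocked word wherever its letters appear."""
    for word in BLOCKED:
        text = re.sub(re.escape(word), "*" * len(word), text, flags=re.IGNORECASE)
    return text


def mask_whole_words(text):
    """Slightly better: only star out blocked words that stand alone."""
    for word in BLOCKED:
        pattern = r"\b" + re.escape(word) + r"\b"
        text = re.sub(pattern, "*" * len(word), text, flags=re.IGNORECASE)
    return text


print(mask_substrings("Please parse the file"))       # Please p**** the file
print(mask_whole_words("Please parse the file"))      # Please parse the file
print(mask_whole_words("A ginger biscuit with tea"))  # A ****** biscuit with tea
```

Matching whole words fixes the innocent "parse" but still stars out the biscuit. Neither version knows anything about context, which is the real gap.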

Many of the website filters I had seen are simple and flawed because of the lack of context and their inability to adapt to people’s changing behaviour but thinking ahead I wondered if people would start to apply machine learning / artificial intelligence (ML/AI) and create services that could automatically learn new swear words? Perhaps this could be used on a massive scale to reduce the damage caused by offensive language on the web?

A couple of snippets from this patent

I knew that I wouldn’t be the first person to think of this idea. While 2016 had been the year when every problem could be fixed with a blockchain, 2017 is the year of ML/AI.

A quick search of patent libraries showed that in 2015 Google had registered a patent to classify offensive words using machine learning. Unfortunately it looks rubbish. The training mechanism worked on a large set of text samples, it failed to recognise the context in which the text was being used. The resulting service might be slightly better than current filters but would still be data-driven rather than informed by data.

Maybe, like Hatebase, it would help if users were to train the machines that provided the service. After all Google, like most other large internet companies, use thousands of people, including you, to help train their services. I started to consider what I had learnt about offensive language and think of the tasks that Google would need to give to swear word raters to train their machine:

Task: go to a football ground in Gdansk, Poland. Play this video to people near you. Observe their attitude to you, and each other, over the following seven days and then categorise the offensiveness of the video. Repeat this exercise every 3 months.

Hmm… I quickly realised that this might be a Quixotic mission and that AI/ML might provide a better service but still only a partial one. There would be no perfect service. People decide what is offensive, not machines. If the service only considered some contexts then the people who controlled the machines and trained them on those contexts would be the ones who decided where it was useful. Swear word data isn’t like the location of bus stops or the list of transactions in a bank account. The context is even more important.

This is one of the challenges of the web and providing data and services for it. The web is pervasive. It interacts with the physical world in many places. It appears in multiple contexts. I use the web to watch broadcast news, like that regulated by Ofcom. I use it to keep up to date on politics, where the unparliamentary rules are useful. I talk about football, and the Oystons, on message boards. I keep up to date on current affairs, and feel helpless at the levels of hate speech deployed at people in the UK and abroad. I chat to friends, both publicly on sites like Twitter and Facebook and also privately in messaging applications.

Datasets and services that reduce offensive content on the web will need to cater for all of these different contexts, and more. Even if they do, some people will still work around them. Data and technology may be able to help the problem but it will only ever be part of a solution to something that is fundamentally a more human problem. Our need to express our emotions in language.

Sorry mum

It was clear from my investigations that we could usefully create data about swear words, i.e. words that are offensive. That the need for this data came from people who swear, people who didn’t want to swear and societies & communities trying to decide the boundaries between what was offensive or not. That it would be useful if the research and rules for deciding on what was offensive were open. And that if people could collaborate to decide on what was offensive that the data would be more useful because it would cater for more contexts. But it was also clear that while technology creates new possibilities to reduce offensiveness that people will still adapt to achieve the goal they want. So it goes.

The other thing that was clear from the talk was mine and my audience’s squeamishness with some of the words. In my case it was certainly because of one of my most important contexts: my upbringing and my family. I’d like to end this post the same way I ended the talk by apologising to my mum. Sorry mum.

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — -

The questions from the audience showed the importance of context

At the end of the talk at the ODI the audience raised several points about offensive language that had not been covered in the talk, such as the use of racial and religious slurs. I was already covering a wide topic. Racial and religious offensiveness cover even more ground. I couldn’t cover everything.

Image from The Wanderers, based on a book by Richard Price. The film includes a fantastic scene in a 1960s New York school where people of different religions and ethnicity try, and fail, to remember all of the offensive names they have for each other.

I did find it interesting that the audience in the room hadn’t heard of some of the words in the list. Particularly choc ice, blood claat and bum claat, words that in my — white, middle class, mostly Northern England and South London experience — are used against black people or in black communities. In the case of the latter two more specifically within Jamaican communities.

That people hadn’t heard of these words says something about the context of the audience. A context where those words may not have been seen as offensive. Perhaps next time I talk on this topic I should try and sneak in some offensive language from different contexts to see what happens.

Watch the original talk or read the slides

If you want you can watch a recording of the talk (which includes some swear-a-long fun):

You can also see the presentation on slideshare or google slides, whichever you prefer.

Roman roads and data infrastructure

I occasionally walk around, wave my arms and proclaim:

data is infrastructure, just like roads

I alternately blame and praise the brilliant Jeni Tennison for this strange affliction. I praise Jeni for coming up with the wonderful analogy of roads for data, I blame her for infecting me with the bug of excitedly talking about it to anybody and everybody so that I can learn from what they think.

A clip from one of Jeni’s talks on data infrastructure.

I recently proclaimed that data was like roads to a friend who has a degree in classics and spent a career teaching in primary schools. She is very well-read.

My friend asked me if I thought that as a society we were well advanced in building our data infrastructure.

No, I said, it’s only been a few decades since the invention of the internet / web which led to the current massive growth in data, I suspect it will take a decade or two before we learn how to do data things well.

I think you’re right, she replied, after all the data infrastructure that you describe sounds a lot like the Roman roads and it took us a couple of millennia to start getting roads right.

Really? I said. Roman roads? That sounds interesting….

Roman roads were for the economy as well as the military

A Roman army in an Asterix comic. They will have tried, and failed, to conquer. Copyright René Goscinny and Albert Uderzo

Our usual vision of a Roman road is either a muddy field being dug up by a team of archaeologists or an army of Roman soldiers marching to try and conquer a new land. But Roman roads were used by other people too. They were an important component of the Roman economy.

People transported goods along them for trade and materials for building new houses. Books have been written about the impact of roads on Roman Egypt and Italy: the Romans had sophisticated pricing models, integrated their roads with other modes of transport and evolved governance arrangements to manage the development of their roads.

But Roman roads were not only for armies and traders. They were also used to transport messages, taxes and people. Along the cursus publicus, or public way, there were mansios, or waystations.

Data is not really roads

Before I go further into this tale I should be clear that I don’t really think data is exactly like roads. It’s an analogy. All analogies are imperfect. But I do think data is becoming a new, strange and vital form of infrastructure for a 21st century society. It’s very important that we debate and learn how to get the best out of it.

A first edition of the first UK Highway Code by Mikey Ashworth, CC-BY-2.0

The analogy of roads helps break people out of the usual mindset when thinking about data. The frequent comparison with oil is particularly misplaced.

The analogy of roads is much more relevant. It captures the importance of maintenance; the need for big, open roads between large towns and the value of smaller roads for villages; the dangers of toll roads and expensive or complicated licensing; and the need for rulebooks for how to use the roads.

It’s a pretty decent analogy, as analogies go, but my friend had started talking about Roman roads.

Roman roads helped co-opt other economies

Mansios were set up along the roads. They were maintained by the Roman government and used by officials and armies. Officials from the government and their animals could sleep, get washed and get fed. Many other people could use the mansios too but they would have to pay for the privilege.

The money people paid would go to the upkeep of the mansios and to the running of the cursus publicus. The cursus publicus was a transportation system, both for people and for messages. Officials and their information would travel for free. Everyone else would have to pay. It was a massive toll road network set up across a range of nations with preferential access for one group of people.

Other people would pay because the Roman roads were so much better than the roads they could build themselves. There was no real competition: if you wanted to go from A to B you had to go Roman. As a result many of the mansios gradually grew into towns.

A Roman coin showing Marcus Aurelius. Copyright: CC-BY-SA 3.0 by Rasiel at English Wikipedia

The impact wasn’t just to preferentially improve the economy of one group of people, the Romans, and their towns but also to help impose Roman culture and standards by making people use their language and their currency. It is a myth that the width of our railways comes from Roman roads — that was due to a different bit of infrastructure, the railways that were invented in the North of England — but many European town names and locations still reflect their Roman origins.

After telling me the tale of Roman roads my friend turned to me and said: isn’t that what you just described? Aren’t Google, Microsoft, Amazon and those big government agencies a modern cursus publicus?

Oh, I said, yes they are.

What have the Romans ever done for us

As I noted earlier “data is roads” is just an analogy and IANARH (I am not a Roman historian) but the similarity of the Roman system to our current data infrastructure was both striking and reassuring.

The Roman road system was striking in its similarities, even down to people bemoaning what the road builders have done while using their roads, recognising that what they’ve done is actually very good and realising that in many cases it couldn’t have happened without them.

It was also reassuring. History is full of repeated patterns and perhaps the current stage of evolution of our data infrastructure is a necessary stage in a pattern that repeats when new infrastructure emerges.

We learnt that roads needed to be run as a system

Roman roads might have started off as a form of military and economic conquest but we gradually learnt more about the need for roads to be run as a system for the good of everyone in society. This took a while, as did our understanding of government’s role in making that happen. The case for this involvement evolved as we understood the decisions that needed to be made.

A thousand years after the fall of the Roman empire the UK decided that governments should take a stronger role in roads with the first Highways Act in the UK. 300 years later the Rebecca Riots against toll roads contributed to the gradual removal of charges and the transfer of responsibility to central and local government for maintaining most roads. Private roads, for example the path to your house or the bit of road to a local factory, were not transferred but governments make sure that we have a duty of care to visitors and workers.

The Rebecca Riots, courtesy Wikipedia and the Illustrated London News

The UK still builds some toll roads but, generally, they are on a lease. For example the M6 toll road near Birmingham will be a toll road for 53 years until the initial investment is paid back. Meanwhile countries worked together on the Vienna Convention on Road Signs and Signals, which was agreed in 1968 and came into force in 1978, to standardise rules of the road. These common standards help with safety and make it easier for people in one country to drive to a location in another, whether it’s for pleasure or business. And at this point we come full circle back to my road and data analogies, which tells me that it’s time to stop…

But one final thought. Many of the major roads in European countries are still based on the old Roman ones. I wonder if in 2000 years our data infrastructure will still show signs of its 21st century origins and the decisions of the people who are building it now?

Cat data is complex, and that’s ok

Last year I openly published data about some of the cats that work for the UK government. I ended up giving a talk about it. When publishing the data and giving the talk I skipped over the potential data protection and privacy issues.

Why are you talking about my data?

Some of those potential issues came up again recently when our family cat, Bugsy, was being transferred to our new home. I was nervous about the cat arriving safe and on time. A friend asked:

can’t you publish some data showing the cat on his journey?

Such a short and simple question. This is my long and complex answer. Most of my friends are patient people.

This post might sound like it is going to be whimsical (ok, there will be some cat whimsy…) but there is a serious point. Publishing and thinking about cat data helped me think and talk about other data things with more people.

Thinking and talking about data protection, ownership and control for cat data will have the same effect. It is pretty important that more people know how complex they are.

This cat data deserves data protection

Different countries have their own data protection and privacy laws. Personal data can be hard to define but at the Open Data Institute we encourage people to look at relevant legislation and start by simply saying:

Data from which a person can be identified is personal data.

If data can be combined with other information to identify a person, that data will still be personal data.

If there is personal data in a dataset then we should consider relevant data protection legislation and the universal human right of privacy.

At this point I expect that lots of people reading this post will be thinking that a cat is not a person so neither the personal data definition nor human rights apply.

This is true but, like other animals, cats do have rights. Some people argue that pets are becoming people, in a legal sense, and that animals deserve democratic representation. Perhaps cats do not have data protection rights today but if that might change in the future then perhaps I need to worry about it today.

A cat called Paddington chasing its own tail. Picture by Bill Abbot, CC-BY-SA.

Whilst this would be a fascinating topic to explore, unfortunately, to paraphrase a recent article by Luciano Floridi on the rights of robots and artificial intelligence, I’m in danger of chasing my own tail when I should be focussing on the current opportunities and challenges with data that affect people. People like me. Our cat wasn’t moving home in a few years’ time, he was moving now; and I was nervous.

There is a simple reason why I need to think about data protection if I was to publish this cat data. Whether cats realise it or not, their data can refer to people. My cat lives in the same house as me. If you knew the destination of its journey then you would know where I live. If you knew the date when it was being transferred to a new home then you might be able to guess that my old or new home is empty. Etcetera.

So if I was to publish data about Bugsy’s journey I would need to think about the impact on privacy using a methodology like the one provided by the UK’s Information Commissioner’s Office (ICO) before I published the data.

Ownership of cat data is complex

I occasionally hear people saying that defining a legal right to personal data ownership will make this process easy. My privacy, my data, my choice. I doubt my cat cares about human laws but, according to the law, I own him. So I might legally own data about my cat and would have the legal right to choose to publish it. Unfortunately data ownership is not that simple and nor is cat data.

How is my cat’s identity defined? Some cats have microchips, and Edinburgh University have even given a library card to a cat so it can prove its identity and demonstrate its entitlement to borrow books, but our cat just has a phone number on its collar. Is that sufficient?

Defining legal ownership of cats in data seems simple.

Meanwhile Bugsy is a family cat. He is owned by me and my wife. It might look like that joint ownership can easily be defined in data, but the world is more complex than my simple model. How are my identity and that of my wife defined? How would we verify our identities to say that we are allowed to track our cat on his journey? Identity management is hard.

And once we get past those issues I might find that my wife disagrees on how the cat’s data can be used. We both own and live at the same house that the cat is being transferred to. The data refers to both of us. My wife might think my nervousness is utterly ridiculous and not worth risking our privacy for. There have been several legal disputes over the ownership of pets. I don’t think it would calm my cat moving nerves if I was to take my wife to court over ownership of cat data.

Meanwhile we’re still missing something quite important. The cat isn’t travelling alone on his journey. He is being transported by an employee of a company. What about that company’s potential rights to own the data produced by their service? What about the cat transporter’s privacy?

Controlling cat data

At this point, when answering that simple question from a friend about publishing data about Bugsy’s journey to make me feel less nervous, I started to talk more about consent.

Data protection isn’t just for the online world. We also need to think about the offline world and the billions of people who don’t use computers.

Giving people choice and ongoing control over how you use their data is becoming more important. It’s one of Tim Berners-Lee’s three challenges for the web. Some trading blocs, like the EU, and individual nations, like the UK, have decided that it is necessary to put in place new legislation that strengthens people’s rights over data. Consent is not always necessary but the ICO recently published some draft guidance on consent under that new legislation which I could use to help publish cat data.

My wife knows quite a bit about data so could give informed consent which I could record. I could also ask the cat transporter and their employer if they were willing to consent. To be clear I would want to give the cat transporter the choice of saying no. A world where people who transport cats have less privacy than other people does not sound like a sensible world.

Unfortunately given the impending journey I did not have time to think about or research the cat transporter’s needs and skills. The ICO’s guidance says that I can assume that “adults have the capacity to consent unless you have reason to believe the contrary”, and I knew how to be open about how I planned to use the data, but without more research I would not know how to design something so that the cat transporter could choose whether to consent, or not. I might mistakenly assume that an online only service was good enough, despite a large proportion of the UK population having no access to the internet or insufficient skills to use it. The cat transporter could be one of those people.

And all I would have achieved by this point was possibly gaining consent. I would not have given the cat transporter control over the data about their journey. With that control they could reuse the data for another purpose, such as reclaiming their petrol costs or seeing what cat data tells us about people moving house around the country. My wife, the cat transporter, their employer and I all had rights to the cat data and should all be able to have some control over its use.

Sometimes you need to keep things simple

At this point my wife and friend both firmly interrupted me and told me I was not being utterly ridiculous but being completely and utterly ridiculous. I was trying to design a perfect solution that would work for many cats and purposes, rather than keeping things simple and starting with a solution for a particular problem: my nervousness about our cat.

My wife rang the cat transportation company and asked them to text us a couple of times during the journey. They agreed, of course. Sensible wife.

Data is complex, and that’s ok

Now you might read all of this and ask:

if we have to think through all of this complexity every time we’re thinking of publishing data how will we ever build anything?

The team at the Open Data Institute, where I work, do the hard work to try and make data as simple and easy as possible so more organisations can get data to people who need it.

That requires us to work on lots of things including how to publish data; how people will search for it; the skills they need; how to use it in organisations, large and small, or whole sectors; and how to get data to benefit everyone. Lots of other people do similar things.

But sometimes I wonder if we and other people can make it sound too easy.

So when we’re encouraging more people to do wonderful things with data then as well as the brilliant possibilities we also talk about the challenges using both real examples and whimsical ones like the ones I faced with my cat data. Whimsical tales sometimes help convey simple messages.

We can build a better future with data but we need to solve problems and be realistic about the complexity if we are to build one that works for people. Data is complex, and that’s ok.

Make data great again

Data is becoming increasingly important to our societies. We live in an age of data abundance and, without many of us realising, data has become a new type of infrastructure and a critical one at that. The age of data abundance has led to brilliant new services and can help our societies tackle challenges such as climate change and population growth, but it also creates risks to privacy and concentrations of power.

Societies need to be able to debate what this age of data abundance means for them. People need to make decisions about the relationship between individuals, communities, societies and data. We need to pick a future vision for our relationship with data and then take steps towards it. Many governments and societies are having this debate now.

In my job I put forward the Open Data Institute’s position on those decisions while also trying to encourage a more public debate. I want a debate because I, and the lovely people I work with, want the decision to be made by societies around the world.

To make this debate as broad and informed as possible, I need what I say to be understandable by as many people as possible. I try to use plain language and frequently test new language and concepts to see if they are understandable. Sometimes I test things through tweets or blogs, like this one; at other times by talking with people from differing backgrounds and perspectives.

By testing, listening and learning I have made some of the language more accessible but I’ve also realised that something was more important than I first thought: politics. Both my politics and that of others.

Let me try and explain.

Choices about data

Sometimes people say they want to help people make better choices about data. I did that a few times in this blog about an open future for data.

I was talking about the ideas in that blog with a left-wing British politician who stopped me mid-sentence and asked if I was a Blairite nowadays. No, I replied. “Then why are you using the language of Blair’s choice agenda?”, they asked.

Image copyright the BBC. Taken from a blog stating that the comedy show Yes (Prime) Minister was the most cunning political propaganda ever conceived

Further testing of the language caused another person to recoil and suggest that if I kept talking about choices I might be accused of being a secret Thatcherite pushing the theory of public choice. Hmm….

I’d used the word ‘choice’ because I thought it was plain language but it was clear that the decision risked putting in place a political barrier for some of the other ideas in the blog. This is a problem.

Data is political

When thinking about and debating technology and data with other technologists it can be easy to fall into a trap of thinking that every decision can be based on empirical evidence, that there is a single right answer and that we can make that right answer a reality by designing and building the right technology. This is nonsense.

In our debates about data we need to decide issues of access, ownership, regulation and the relationship between citizens and the state. These are political decisions.

Whilst we might have individual opinions about data we need a state and legal system to help put decisions into practice. States will allow technologists to innovate and try things out but there comes a time when existing legislation will be more strongly applied or new legislation will be put in place as society’s needs change. This happened and continues to happen with road traffic; it will happen with data.

By broadening the debate we are helping that decision to be made democratically. Democracy might have seemed under strain in some countries in 2016 but as Churchill said:

Indeed it has been said that democracy is the worst form of Government except for all those other forms that have been tried from time to time

To put it more simply, politics and democracy are important and data, as with most things, is political.

Words already carry political meaning

The “white heat of technology” makes me think of Harold Wilson and the 1960s UK Labour party. Because of my political history I have positive feelings about the phrase despite the speech being followed by the scrapping of several high-profile technology projects. Image copyright PA.

Words are a tool political people use to reach our hearts. Sometimes those words are a catchy slogan. At other times it’s a frame: a guiding metaphor or image for a political argument.

Political slogans and language are designed to appeal to a group of people, build on existing beliefs and make them choose a particular path.

Some words carry a particular meaning in the present because they have been used in a political context in the past. Marx said it more poetically:

The tradition of all dead generations weighs like a nightmare on the brains of the living.

The word “choice” resonated amongst some people involved in British politics that I spoke to because of those traditions and their political history. It will have brought back nightmares for some and heavenly dreams for others.

Data is not about left or right wing politics

In economic terms each of these cakes is rivalrous: only one person can eat them. Cake is not like data: multiple people can use data at the same time. Picture of cake by Hani AlYousif, CC BY-NC-ND 2.0

Our societies and political systems are used to making political decisions about many types of resources, for example oil or water, but data has different qualities to the physical resources that are embedded in our political systems, debates and legislation.

To give two regularly used examples: data is non-rivalrous (unlike a piece of cake, many people can use data at the same time) and data benefits from network effects (it becomes more valuable as more people use and maintain it).

These differences are one of the reasons the team at the Open Data Institute talk about data as analogous to roads:

Data is infrastructure. Just like roads. Roads help us navigate to a location. Data helps us make a decision.

The “data is roads” analogy breaks people out of the traditional mindset. It helps open their minds to thinking differently.

I think that, as with the web, these different qualities mean that a closed-open axis is a more useful way of thinking than the traditional left and right-wing political axis.

But it will be harder to get people to think about the decisions along that closed-open axis if our words and ideas cause them to think of old left and right-wing political battles.

Take back control of data

Data differs from other resources in many other ways. One that is becoming increasingly evident and important is that data is sometimes about identifiable people, sometimes it isn’t and sometimes it’s a bit complicated.

Much of the current debate about data is dominated by personal data: the stuff which is about identifiable people. Many people believe that there is an asymmetry of power and privacy as data about us is controlled by governments and corporations.

Taavi Kotka, the Estonian Chief Information Officer, at MyData 2016 in Helsinki. Watch the full video.

Taavi Kotka, the Chief Information Officer of Estonia, recently gave a talk in which he broached the idea of adding a fifth freedom to the EU’s existing four freedoms for free movement of goods, workers, services and capital. The talk was mostly about personal data and the concept of personal data stores that could allow individuals to control how data about them is used.

Whilst I agree that more personal control over personal data is important, the talk brought up memories of Margaret Thatcher and my teenage political nightmares. The talk did not mention society’s need to access and use that data. Taking back control of data by giving control to individuals misses out the challenges of digital inclusion and the role of other important parts of society like families, communities and nations. Different levels of control, rights and responsibilities are likely to need to be given to these different groups. To give just one example, vital medical research and national statistics need to use large amounts of personal data. This can’t be neglected or left solely to the decisions of individuals.

But, as I realised, this time I was the one allowing my political history to do the interpretation for me and I was the one who wasn’t listening to the underlying argument. Taavi Kotka was using language that built on his political history while talking in English to a Finnish audience. Even though I work for a global organisation my initial reaction was from a UK perspective. My bad.

The political debate about data is happening now

The EU is currently discussing complex concepts such as data control and data ownership through the free flow of data initiative. Major geopolitical organisations, like the EU, can have a large impact on countries outside their membership: for example, the UK government has committed to following current EU data protection regulation after it exits the EU. That EU debate involves politicians from multiple countries, each with their own rich histories and perspectives. There are many other debates in countries around the world.

If you want to help build a great future for data then as well as building new services you may want to get involved in either this or other multinational, national and local debates.

But if you do, remember to think about politics: both other people’s politics and your own. That way you will be best placed to help people think about the decisions not in terms of traditional left and right-wing politics but instead in terms more suited to the different challenges and possibilities of data.
