
Cat data is complex, and that’s ok

2017-03-12 / peterkwells

Last year I openly published data about some of the cats that work for the UK government. I ended up giving a talk about it. When publishing the data and giving the talk I skipped over the potential data protection and privacy issues.

Some of those potential issues came up again recently when our family cat, Bugsy, was being transferred to our new home. I was nervous about the cat arriving safe and on time. A friend asked:

can’t you publish some data showing the cat on his journey?

Such a short and simple question. This is my long and complex answer. Most of my friends are patient people.

This post might sound like it is going to be whimsical (ok, there will be some cat whimsy…) but there is a serious point. Publishing and thinking about cat data helped me think and talk about other data things with more people.

Thinking and talking about data protection, ownership and control for cat data will have the same effect. It is pretty important that more people know how complex they are.

This cat data deserves data protection

Different countries have their own data protection and privacy laws. Personal data can be hard to define but at the Open Data Institute we encourage people to look at relevant legislation and start by simply saying:

Data from which a person can be identified is personal data.

If data can be combined with other information to identify a person, that data will still be personal data.

If there is personal data in a dataset then we should consider relevant data protection legislation and the universal human right of privacy.

At this point I expect that lots of people reading this post will be thinking that a cat is not a person so neither the personal data definition nor human rights apply.

This is true but, like other animals, cats do have rights. Some people argue that pets are becoming people, in a legal sense, and that animals deserve democratic representation. Perhaps cats do not have data protection rights today but if that might change in the future then perhaps I need to worry about it today.

A cat called Paddington chasing its own tail. Picture by Bill Abbot, CC-BY-SA.

Whilst this would be a fascinating topic to explore, unfortunately, to paraphrase a recent article by Luciano Floridi on the rights of robots and artificial intelligence, I’m in danger of chasing my own tail when I should be focussing on the current opportunities and challenges with data that affect people. People like me. Our cat wasn’t moving home in a few years’ time, he was moving now; and I was nervous.

There is a simple reason why I need to think about data protection if I was to publish this cat data. Whether cats realise it or not, their data can refer to people. My cat lives in the same house as me. If you knew the destination of its journey then you would know where I live. If you knew the date when it was being transferred to a new home then you might be able to guess that my old or new home is empty. Etcetera.
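That linkage can be shown with a minimal sketch. Everything in it is hypothetical: the record layout, the `people_linked_to` helper and the made-up addresses are assumptions for illustration, not real data about Bugsy or anyone else. It only shows how a record about a cat, combined with other information, can point at a person.

```python
# Hypothetical published record about a cat's journey. It names no person.
journey = {"cat": "Bugsy", "destination": "1 Example Street"}

# Hypothetical "other information" that someone else might already hold.
household_register = [
    {"name": "Person A", "address": "1 Example Street"},
    {"name": "Person B", "address": "2 Example Street"},
]


def people_linked_to(journey, register):
    """Return the names of people whose address matches the journey's destination."""
    return [
        person["name"]
        for person in register
        if person["address"] == journey["destination"]
    ]


# Combining the two sources identifies a person, so the cat data
# behaves like personal data even though it is "about" a cat.
print(people_linked_to(journey, household_register))  # prints ['Person A']
```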

So if I was to publish data about Bugsy’s journey I would need to think about the impact on privacy using a methodology like the one provided by the UK’s Information Commissioner’s Office (ICO) before I published the data.

Ownership of cat data is complex

I occasionally hear people saying that defining a legal right to personal data ownership will make this process easy. My privacy, my data, my choice. I doubt my cat cares about human laws but, according to the law, I own him. So I might legally own data about my cat and would have the legal right to choose to publish it. Unfortunately data ownership is not that simple and nor is cat data.

How is my cat’s identity defined? Some cats have microchips, and Edinburgh University have even given a library card to a cat so it can prove its identity and demonstrate its entitlement to borrow books, but our cat just has a phone number on its collar. Is that sufficient?

Defining legal ownership of cats in data seems simple.

Meanwhile Bugsy is a family cat. He is owned by me and my wife. It might look like that joint ownership can easily be defined in data, but the world is more complex than my simple model. How are my identity and that of my wife defined? How would we verify our identities to say that we are allowed to track our cat on his journey? Identity management is hard.

And once we get past those issues I might find that my wife disagrees on how the cat’s data can be used. We both own and live at the same house that the cat is being transferred to. The data refers to both of us. My wife might think my nervousness is utterly ridiculous and not worth risking our privacy for. There have been several legal disputes over the ownership of pets. I don’t think it would calm my cat-moving nerves if I was to take my wife to court over ownership of cat data.

Meanwhile we’re still missing something quite important. The cat isn’t travelling alone on his journey. He is being transported by an employee of a company. What about that company’s potential rights to own the data produced by their service? What about the cat transporter’s privacy?

Controlling cat data

At this point, when answering that simple question from a friend about publishing data about Bugsy’s journey to make me feel less nervous, I started to talk more about consent.

Data protection isn’t just for the online world. We also need to think about the offline world and the billions of people who don’t use computers.

Giving people choice and ongoing control over how you use their data is becoming more important. It’s one of Tim Berners-Lee’s three challenges for the web. Some trading blocs, like the EU, and individual nations, like the UK, have decided that it is necessary to put in place new legislation that strengthens people’s rights over data. Consent is not always necessary but the ICO recently published some draft guidance on consent under that new legislation which I could use to help publish cat data.

My wife knows quite a bit about data so could give informed consent which I could record. I could also ask the cat transporter and their employer if they were willing to consent. To be clear I would want to give the cat transporter the choice of saying no. A world where people who transport cats have less privacy than other people does not sound like a sensible world.

Unfortunately given the impending journey I did not have time to think about or research the cat transporter’s needs and skills. The ICO’s guidance says that I can assume that “adults have the capacity to consent unless you have reason to believe the contrary”, and I knew how to be open about how I planned to use the data, but without more research I would not know how to design something so that the cat transporter could choose whether to consent, or not. I might mistakenly assume that an online only service was good enough, despite a large proportion of the UK population having no access to the internet or insufficient skills to use it. The cat transporter could be one of those people.

And all I would have achieved by this point was possibly gaining consent. I would not have given the cat transporter control over the data about their journey. With that control they could reuse the data for another purpose, such as reclaiming their petrol costs or seeing what cat data tells us about people moving house around the country. My wife, the cat transporter, their employer and I all had rights to the cat data and should all be able to have some control over its use.

Sometimes you need to keep things simple

At this point my wife and friend both firmly interrupted me and told me I was not being utterly ridiculous but being completely and utterly ridiculous. I was trying to design a perfect solution that would work for many cats and purposes, rather than keeping things simple and starting with a solution for a particular problem. My nervousness about our cat.

My wife rang the cat transportation company and asked them to text us a couple of times during the journey. They agreed, of course. Sensible wife.

Data is complex, and that’s ok

Now you might read all of this and ask:

if we have to think through all of this complexity every time we’re thinking of publishing data, how will we ever build anything?

I don't think the cat is happy I've come home. pic.twitter.com/w11ZwGPv0i

— Peter Wells (@peterkwells) February 24, 2017

The team at the Open Data Institute, where I work, do the hard work to try and make data as simple and easy as possible so more organisations can get data to people who need it.

That requires us to work on lots of things including how to publish data; how people will search for it; the skills they need; how to use it in organisations, large and small, or whole sectors; and how to get data to benefit everyone. Lots of other people do similar things.

But sometimes I wonder if we and other people can make it sound too easy.

So when we’re encouraging more people to do wonderful things with data, as well as the brilliant possibilities we also talk about the challenges, using both real examples and whimsical ones like the ones I faced with my cat data. Whimsical tales sometimes help convey simple messages.

We can build a better future with data but we need to solve problems and be realistic about the complexity if we are to build one that works for people. Data is complex, and that’s ok.

An open city is a better city

2016-09-27 / peterkwells

Approximate words from a talk at the Holyrood Connect: Data Forum in September 2016. Approximate as I tend to ad-lib in person as I see shocked or, occasionally, pleased faces in front of me. I also had a bad cold so ad-libbed even more than normal. The slides are also available online.

— — — –

Hi, I’m Peter. I do some stuff at the Open Data Institute (ODI). I’m here to talk about how an open city is a better city.

First some background and a couple of concepts: the data spectrum and data infrastructure. Then some current examples of data analytics in cities, and their limitations, followed by some UK examples of people building more open cities with more benefits. I’ll end up with some principles to help get you started and a bit about what’s coming in the future. Ok, background:

Background

The ODI was founded four years ago by people like Tim Berners-Lee and Nigel Shadbolt. It is headquartered in the UK but its team works around the world. There are currently 29 nodes in 18 countries. In the UK that includes places like Aberdeen, Leeds, Belfast, Devon, Bristol and Cardiff.

The ODI’s mission is to connect, equip and inspire people around the world to innovate with data. We believe in knowledge for everyone. We help the public sector, third sector, academia and businesses to get more impact from data. Last week there were research fellows in the office from Madrid and Singapore debating and sharing ideas about geospatial data and privacy, crowdsourcing and smart cities. In the last few weeks the HQ team have been doing stuff in the UK, in Malaysia, New York, Mexico and Tanzania.

Concepts

The ODI works across the data spectrum. Some of us worry about personal health records being “made open”. Some confuse commercial and personal data, or mix up “big data” with “open data”. To unpack data’s challenges and its benefits, we need to be precise about what these things mean. They should be clear and familiar to everyone, so we can all have informed conversations about how we use them, how they affect us and how we plan for the future. And it doesn’t have to be complicated. It can be simple. In one image. Whether big, medium or small, whether state, commercial or personal, the important thing about data is how it is licensed and who can use it. Closed data can only be used within one organisation, shared data can only be used by some organisations (because of rules or price restrictions), and open data can be used by anyone for any purpose.
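For readers who think in code, here is a rough sketch of those three positions on the spectrum. It is an illustration only: the `Spectrum`, `Dataset` and `can_use` names and the example datasets are assumptions made up for this sketch, not an ODI tool or standard, and real licensing is far richer than a set of organisation names.

```python
from dataclasses import dataclass
from enum import Enum


class Spectrum(Enum):
    CLOSED = "closed"  # usable within one organisation only
    SHARED = "shared"  # usable by some organisations, under rules or for a price
    OPEN = "open"      # usable by anyone, for any purpose


@dataclass
class Dataset:
    name: str
    position: Spectrum
    # Organisations allowed to use the data; ignored for open data.
    permitted_orgs: frozenset = frozenset()


def can_use(dataset: Dataset, organisation: str) -> bool:
    """Answer the key question on the spectrum: who can use this data?"""
    if dataset.position is Spectrum.OPEN:
        return True
    return organisation in dataset.permitted_orgs


# Hypothetical examples, one per position on the spectrum.
bins = Dataset("bin collection rounds", Spectrum.CLOSED, frozenset({"council"}))
care = Dataset("social care referrals", Spectrum.SHARED, frozenset({"council", "health board"}))
air = Dataset("air quality readings", Spectrum.OPEN)

print(can_use(bins, "startup"))  # prints False
print(can_use(care, "health board"))  # prints True
print(can_use(air, "startup"))  # prints True
```

The size or subject of the dataset never appears in `can_use`; only the licence position and the list of permitted users do, which is the point of the spectrum.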

The ODI works to improve data infrastructure. Data has become vital infrastructure over the last few years. It underpins transparency, accountability, public services, business innovation and civil society. Data such as statistics, maps and real-time sensor readings help us to make decisions, build services and gain insight. Data infrastructure will only become more vital as our populations grow and our economies and societies become ever more reliant on getting value from data.

I often hear people say that data is the new fuel or that it’s oil for the digital revolution. Daft analogies. Data doesn’t get burnt up when we use it; we can use it again and again and again. It doesn’t get extracted from the ground, unless it’s geological data. The analogy we use for data infrastructure is roads. Roads help us navigate to a location. Data helps us make a decision. Roads have signs and maps to tell us how to use them. So does data, well hopefully.

Lots of cities are improving data infrastructure

Now back to the theme of cities and data. Cities and local authorities around the world are using and improving data infrastructure. It may not feel like it sometimes, but they are.

Many public sector organisations are developing skills and creating more impact by using their own data to make better decisions. Whether it be where to spend money on social care, what time to pick up the bins or how to design a local authority website so that it’s easy to use. In each case the organisation is having to learn how to gather data, analyse it and use it to make a better decision.

These are all activities in the closed part of the data spectrum.

Half-spectrum doesn’t give you all the value.

We’re also seeing more and more public sector organisations work together and share data to make better decisions. Down in Manchester local authorities are sharing data to help vulnerable children. In London local authorities are sharing and analysing data to look for unlicensed houses of multiple occupancy, which can be unsafe places to live. This type of big data analytics takes inspiration from places like Chicago, which has been using data about graffiti tags to tackle gang violence, or New York City and Amsterdam, which have analysed data from across the city to work out what characteristics were the best indicators for fire and help prevent it.

These activities take place in the closed and shared parts of the data spectrum.

All the data and all the open

But let’s go back a bit. When I talked about data infrastructure I said it underpins transparency, accountability, public services, business innovation and civil society.

All of the previous examples are about public services. The rest of the benefits of data infrastructure are missing. There’s some business innovation (for example from data analytics companies selling into the public sector) but only a portion.

Why is that? Let’s look again at the full data spectrum. We’re missing public data and open data.

At the ODI we say that cities, their businesses and their citizens get most impact from a data infrastructure that is as open as possible while respecting privacy. There’s lots of research showing this and there’s also practical examples. I’ll cover some in a bit.

The reason that open data infrastructure creates most impact is the qualities of data. For example, it benefits from network effects. Data becomes more useful and creates more value as more people use and maintain it.

When you work openly and use as much open data as possible then more people can work together to solve problems, make decisions, find insights and build services. You benefit from network effects. You can build a better city. One that benefits everyone.

This is particularly true if you combine all the data — closed, shared and open — with all the open. Open culture. Open source. Open government. Open standards. Open innovation. Etcetera.

There’s lots of examples, here are some

Let’s take a few examples showing some different aspects.

First, Bath and Strava, the cycling app. Strava users cycling around Bath can choose to share their closed personal data with a community group called Bath:Hacked. That group preserve privacy, analyse the data and are working with the council to use it to improve cycling routes. Interestingly there’s anecdotal evidence that people are cycling and using the app more because they can see that the data they collect benefits the city and themselves. Win win. Meanwhile Bath:Hacked are sharing what they’re doing online.

As a coffee drinker I am unsurprised by the decline in tea-drinking in Britain (source: Defra, ODI and Kiln)

There are two reasons for that. First, by opening up the knowledge for everyone other people can use it and other people can tell Bath how they are using it. People can learn with each other. Second, openness about how organisations secure and manage personal data builds trust. It can improve quality too. Take Defra, who recently did a privacy impact assessment in the open, with people outside the organisation commenting, before releasing diaries showing the diet habits of 150,000 households. They worked out by debating with their community that some of this data, which would otherwise have all been kept closed, could be made open for anyone to use. Transparency and open debate about personal data can make things better.

Another example: I was talking to someone from Devon council last week. They published a map of places where people could get help. Unfortunately the map was wrong. Because both the data and the source code were open a friendly person could fix it for them and send them the corrected version. Problem fixed within a few hours. Thank you friendly person.

Another. In places like Manchester and Leeds people from the public sector, private sector and civil society are working to build a low-cost open infrastructure for the internet of things. They’re helping each other using each other’s skills and experience as needed. On the infrastructure people will be able to build and deploy sensors to monitor air quality or the height of a river and anyone will be able to use the data to decide whether to place a new school near a road or a set of new houses by a river, whether to buy a house or whether to evacuate a house as the waters are rising…

These things cost money but they don’t need to cost the big money that so many projects with technology do. The cost of software, hardware and hence data is falling dramatically. You can now build an air quality sensor for less than £100, and you can get a LIDAR sensor (a device that can measure distance using lasers) that used to cost tens of thousands of pounds for a few hundred pounds. (That’s part of the reason we’re hearing about automated cars so much. They need those sensors too.) As much as possible of the data from that infrastructure will be open; that’s the culture of the community. That will allow other people to use it too for only the cost of allowing people to use the data that has already been collected. The infrastructure is designed for open.

And to continue the theme of culture. In Aberdeen the team in the council run hackathons open to anyone and learn innovative techniques from civil society and businesses to help the council deliver other services. Those hackathons will also help with the Scottish government’s digital skills initiative that I was reading about on the train yesterday. An initiative that could also be supported by the new work that the Open Government Partnership are starting with the Scottish government.

Back to Leeds. The city council has funded ODI Leeds to act as a neutral space outside the council that can be used to convene businesses, academia, civil society and the public sector to understand and define problems; share data to explore ideas and then open the data as much as possible to allow people to build solutions. Those solutions could be built by new startups or established businesses. Arup, the global construction firm, use similar open innovation techniques working with startups to help improve how they build stuff. It’s like the data analytics examples we saw earlier but it uses the full spectrum.

In each of these cases we can see people from multiple sectors working together to solve common problems as openly as possible. In the process new businesses are built, there’s transparency and accountability, civil society are engaged, and there’s better public services too. All of the things our data infrastructure supports.

There’s countless more examples across the world for those who look.

How do I build open data infrastructure?

But, I often hear people ask, how do I do this?

As you may have realised from these examples, data infrastructure is not only about data. Data infrastructure includes datasets; the technology, training and processes that make them usable; policies and regulation such as those for data sharing and protection; and the organisations and people that collect, maintain and use data. We can all see that the datasets may be from anywhere in the data spectrum. But the more open the data infrastructure, the more value it will create as more people can use it.

Principles to help people build better data infrastructure.

Based on the ODI’s own work and research on what works and what doesn’t at city, national and global levels, we’ve published some principles to help other people build better data infrastructure.

The first and last principles are key. Design for open and encourage open innovation.

Based on our experience we believe we need a number of things to work together to create the space for open innovation to happen: strategy, policy, training, technology, research, a tech community, and engagement. With that engagement you’re looking to build a receptive internal customer (for example a councillor in a city), a responsive tech community and an engaged civic community willing to work with you. With open innovation the best answers can come from anywhere. You just need to get started and have the courage to try.

Anyway, I hope that was interesting, and useful, but before I go I want to leave you with another thought as to why getting to grips with open and data is so important.

The web of data is coming.

Over the last 25 years we’ve all been building the web of documents. Billions of webpages linked together. It’s fabulous. But the billions of people, sensors and services that are connected to the web and the internet produce, publish and use data. A web of data is now evolving that sits alongside and behind the web of documents.

That might seem like a challenging thing and something we can’t control but I would encourage everyone to see it as an opportunity. By getting to grips with your data infrastructure and making it as open as possible you will be positioning your city and the businesses and citizens that live in it to thrive in that future. That sounds like a pretty important mission to be cracking on with. It’s about building for the open future.

An open city is a better city.

There’s countless other examples to demonstrate why an open city is better and to help you understand how to grow your city in a way that works for your problems and your challenges. But, as a start, I’d encourage all of you to pick a problem and get started. Work together with your businesses and citizens to solve that problem and start building that open city and make things better for everyone.