An open city is a better city

Approximate words from a talk at the Holyrood Connect: Data Forum in September 2016. Approximate because I tend to ad-lib in person as I see shocked or, occasionally, pleased faces in front of me. I also had a bad cold so ad-libbed even more than normal. The slides are also available online.

— — — –

Hi, I’m Peter. I do some stuff at the Open Data Institute (ODI). I’m here to talk about how an open city is a better city.

First some background and a couple of concepts: the data spectrum and data infrastructure. Then some current examples of data analytics in cities, and their limitations, followed by some UK examples of people building more open cities with more benefits. I’ll end up with some principles to help get you started and a bit about what’s coming in the future. Ok, background:


Background

The ODI was founded four years ago by people like Tim Berners-Lee and Nigel Shadbolt. It is headquartered in the UK but its team works around the world. There are currently 29 nodes in 18 countries. In the UK that includes places like Aberdeen, Leeds, Belfast, Devon, Bristol and Cardiff.

The ODI’s mission is to connect, equip and inspire people around the world to innovate with data. We believe in knowledge for everyone. We help the public sector, third sector, academia and businesses to get more impact from data. Last week there were research fellows in the office from Madrid and Singapore debating and sharing ideas about geospatial data and privacy, crowdsourcing and smart cities. In the last few weeks the HQ team have been doing stuff in the UK, in Malaysia, New York, Mexico and Tanzania.

Concepts

The ODI works across the data spectrum. Some of us worry about personal health records being “made open”. Some confuse commercial and personal data, or mix up “big data” with “open data”. To unpack data’s challenges and its benefits, we need to be precise about what these things mean. They should be clear and familiar to everyone, so we can all have informed conversations about how we use them, how they affect us and how we plan for the future. And it doesn’t have to be complicated. It can be simple. In one image. Whether big, medium or small, whether state, commercial or personal, the important thing about data is how it is licensed and who can use it. Closed data can only be used within one organisation; shared data can only be used by some organisations (because of rules or price restrictions); open data can be used by anyone for any purpose.


The ODI works to improve data infrastructure. Data has become vital infrastructure over the last few years. It underpins transparency, accountability, public services, business innovation and civil society. Data such as statistics, maps and real-time sensor readings help us to make decisions, build services and gain insight. Data infrastructure will only become more vital as our populations grow and our economies and societies become ever more reliant on getting value from data.

I often hear people say that data is the new fuel or that it’s oil for the digital revolution. Daft analogies. Data doesn’t get burnt up when we use it, we can use it again and again and again. It doesn’t get extracted from the ground: unless it’s geological data. The analogy we use for data infrastructure is roads. Roads help us navigate to a location. Data helps us make a decision. Roads have signs and maps to tell us how to use them. So does data, well hopefully.

Lots of cities are improving data infrastructure

Now back to the theme of cities and data. Cities and local authorities around the world are using and improving data infrastructure. It may not feel like it sometimes, but they are.

Many public sector organisations are developing skills and creating more impact by using their own data to make better decisions. Whether it be where to spend money on social care, what time to pick up the bins or how to design a local authority website so that it’s easy to use. In each case the organisation is having to learn how to gather data, analyse it and use it to make a better decision.

These are all activities in the closed part of the data spectrum.

Half-spectrum doesn’t give you all the value.

We’re also seeing more and more public sector organisations work together and share data to make better decisions. Down in Manchester local authorities are sharing data to help vulnerable children. In London local authorities are sharing and analysing data to look for unlicensed houses of multiple occupancy, which can be unsafe places to live. This type of big data analytics takes inspiration from places like Chicago, which has been using data about graffiti tags to tackle gang violence, or New York City and Amsterdam, which have analysed data from across the city to work out which characteristics were the best indicators of fire and so help prevent it.

These activities take place in the closed and shared part of the data spectrum.

All the data and all the open

But let’s go back a bit. When I talked about data infrastructure I said it underpins transparency, accountability, public services, business innovation and civil society.

All of the previous examples are about public services. The rest of the benefits of data infrastructure are missing. There’s some business innovation (for example from data analytics companies selling into the public sector) but only a portion.

Why is that? Let’s look again at the full data spectrum. We’re missing public data and open data.

At the ODI we say that cities, their businesses and their citizens get most impact from a data infrastructure that is as open as possible while respecting privacy. There’s lots of research showing this and there are also practical examples. I’ll cover some in a bit.

It’s true you know.

The reason that open data infrastructure creates most impact lies in the qualities of data. For example, it benefits from network effects. Data becomes more useful and creates more value as more people use and maintain it.

When you work openly and use as much open data as possible then more people can work together to solve problems, make decisions, find insights and build services. You benefit from network effects. You can build a better city. One that benefits everyone.

This is particularly true if you combine all the data — closed, shared and open — with all the open. Open culture. Open source. Open government. Open standards. Open innovation. Etcetera.

There’s lots of examples, here are some

Let’s take a few examples showing some different aspects.

First, Bath and Strava, the cycling app. Strava users cycling around Bath can choose to share their closed personal data with a community group called Bath:Hacked. That group preserve privacy, analyse the data and are working with the council to use it to improve cycling routes. Interestingly there’s anecdotal evidence that people are cycling and using the app more because they can see that the data they collect benefits the city and themselves. Win win. Meanwhile Bath:Hacked are sharing what they’re doing online.

As a coffee drinker I am unsurprised by the decline in tea-drinking in Britain (source: Defra, ODI and Kiln)

There are two reasons for that. First, by opening up the knowledge for everyone, other people can use it and other people can tell Bath how they are using it. People can learn with each other. Second, openness about how organisations secure and manage personal data builds trust. It can improve quality too. Take Defra, who recently did a privacy impact assessment in the open, with people outside the organisation commenting, before releasing diaries showing the diet habits of 150,000 households. They worked out, by debating with their community, that some of this data, which would otherwise have all been kept closed, could be made open for anyone to use. Transparency and open debate about personal data can make things better.

Another example, I was talking to someone from Devon council last week. They published a map of places where people could get help. Unfortunately the map was wrong. Because both the data and the source code were open a friendly person could fix it for them and send them the corrected version. Problem fixed within a few hours. Thank you friendly person.


Another. In places like Manchester and Leeds people from the public sector, private sector and civil society are working to build a low-cost open infrastructure for the internet of things. They’re helping each other, using each other’s skills and experience as needed. On that infrastructure people will be able to build and deploy sensors to monitor air quality or the height of a river, and anyone will be able to use the data to decide whether to place a new school near a road or a set of new houses by a river, whether to buy a house, or whether to evacuate a house as the waters are rising…

These things cost money but they don’t need to cost the big money that so many technology projects do. The cost of software, hardware and hence data is falling dramatically. You can now build an air quality sensor for less than £100. You can get a LIDAR sensor (a device that can measure distance using lasers) that used to cost tens of thousands of pounds for a few hundred pounds. (That’s part of the reason we’re hearing about automated cars so much. They need those sensors too.) As much as possible of the data from that infrastructure will be open; that’s the culture of the community. That will allow other people to use it too, for only the cost of allowing people to use the data that has already been collected. The infrastructure is designed for open.

And to continue the theme of culture. In Aberdeen the team in the council run hackathons open to anyone and learn innovative techniques from civil society and businesses to help the council deliver other services. Those hackathons will also help with the Scottish government’s digital skills initiative that I was reading about on the train yesterday. An initiative that could also be supported by the new work that the Open Government Partnership are starting with the Scottish government.

Back to Leeds. The city council has funded ODI Leeds to act as a neutral space outside the council that can be used to convene businesses, academia, civil society and the public sector to understand and define problems; share data to explore ideas and then open the data as much as possible to allow people to build solutions. Those solutions could be built by new startups or established businesses. Arup, the global construction firm, use similar open innovation techniques working with startups to help improve how they build stuff. It’s like the data analytics examples we saw earlier but it uses the full spectrum.


In each of these cases we can see people from multiple sectors working together to solve common problems as openly as possible. In the process new businesses are built, there’s transparency and accountability, civil society is engaged, and there are better public services too. All of the things our data infrastructure supports.

There are countless more examples across the world for those who look.

How do I build open data infrastructure?

But, I often hear people ask, how do I do this?

As you may have realised from these examples, data infrastructure is not only about data. Data infrastructure includes datasets; the technology, training and processes that make them usable; policies and regulation such as those for data sharing and protection; and the organisations and people that collect, maintain and use data. We can all see that the datasets may be from anywhere in the data spectrum. But the more open the data infrastructure, the more value it will create as more people can use it.

Principles to help people build better data infrastructure.

Based on the ODI’s own work and research on what works and what doesn’t at city, national and global levels, we’ve published some principles to help other people build better data infrastructure.

The first and last principles are key. Design for open and encourage open innovation.

Based on our experience we believe we need a number of things to work together to create the space for open innovation to happen: strategy, policy, training, technology, research, a tech community, and engagement. With that engagement you’re looking to build a receptive internal customer (for example a councillor in a city), a responsive tech community and an engaged civic community willing to work with you. With open innovation the best answers can come from anywhere. You just need to get started and have the courage to try.

Anyway, I hope that was interesting, and useful, but before I go I want to leave you with another thought as to why getting to grips with open and data is so important.

The web of data is coming.


Over the last 25 years we’ve all been building the web of documents. Billions of webpages linked together. It’s fabulous. But the billions of people, sensors and services that are connected to the web and the internet produce, publish and use data. A web of data is now evolving that sits alongside and behind the web of documents.

That might seem like a challenging thing and something we can’t control but I would encourage everyone to see it as an opportunity. By getting to grips with your data infrastructure and making it as open as possible you will be positioning your city and the businesses and citizens that live in it to thrive in that future. That sounds like a pretty important mission to be cracking on with. It’s about building for the open future.

An open city is a better city.

There are countless other examples to demonstrate why an open city is better and to help you understand how to grow your city in a way that works for your problems and your challenges. But, as a start, I’d encourage all of you to pick a problem and get started. Work together with your businesses and citizens to solve that problem and start building that open city and make things better for everyone.

Open addresses: will the address wars ever end?

This is the (rough) text of a talk I gave at the British Computer Society (BCS) Location Information Specialist Group’s 3rd annual addressing update seminar in August 2016. There were more jokes in person. And some Pikachu. The slides for my talk are also online as are those for Ant Beck’s talk.

Hi, I’m Peter. I do some stuff at the Open Data Institute (ODI). The ODI was founded three years ago. Its mission is to connect, equip and inspire people around the world to innovate with data. Its headquarters are in the UK but it works around the world.

I’m here to talk about open addresses in the UK. To understand the tale it’s useful to start off with a (shortened) bit of history.

Ancient history…

Addresses and other types of geospatial data were early targets for open data releases. They are vital datasets that make it possible to build many, many services and products. Way back in 2006 Charles Arthur and Michael Cross wrote in the Guardian to ask the UK government to “give us back our crown jewels”. They pointed out the complex arrangements for maintaining address data and how the data was sold to fund those complex arrangements. They even pointed out the issues it generated for the 2001 census.

In 2009 the UK government announced that Tim Berners-Lee, one of the ODI’s founders, was going to help it open up data and in 2010 government said that postcodes and address data were going to be early releases. Victory!

Some of the tales from 2013

But it was a Pyrrhic victory. Whilst government released many thousands of datasets, the promised address data was not one of them. In 2013 the Royal Mail was privatised along with its rights to help create and sell that address data. The complex arrangements that were pointed out in 2006 just got more complex. And, in the meantime, another census happened with the inevitable, and costly, need to build another new address list.

The open data community was rightly sad, and probably got a bit angry. They knew how important that data was. They kept working to make things better. They didn’t just tweet, they organised.

More recent history…

In 2014 the Cabinet Office’s release of data fund provided some money to the ODI to explore whether it was possible to rebuild the UK’s address list and publish it as open data. The ODI pulled together lots of people who work with addresses to share and debate ideas.

The homepage of Open Addresses

This led to the launch of Open Addresses UK. I was one of the team working for Open Addresses. We worked as openly as possible with regular blogs and open source code.

We explored the benefits of better address data for the UK. We found that we could help fix problems such as the months it can take before new addresses are added to computer systems across the country. Months during which someone might not be able to order a pizza, get home insurance or register to vote. We looked at the economic evidence from case studies of other countries, such as Denmark, that have released address data as open data. If the success of Denmark scaled in proportion to the population of the country then the UK could expect to see an extra £110 million a year of social and economic value. Value that we don’t get at the moment because paid data creates less economic value than open data.

We looked at funding models. We started off with £383k of funding from the Cabinet Office. We got some extra funding from BCS (thank you). We knew that we would need to be able to show people what our services would look like before we could start bringing in funding from the users of address services.

From talking with potential users of those services we learnt about the challenges of address entry on many websites. User research supported our theory that moving to free-format address entry would both make life easier for many people and lead to better quality address data going into organisations. We built a working demo of that service.

We knew we needed to gather address data. Following on from the discovery phase we built a model that would allow any organisation or individual to contribute their own address data; that would allow anyone to add large sets of open data containing addresses if they followed guidelines and confirmed that they were legally allowed to publish that address data as open data; and put in place a takedown policy to investigate and remove any infringing data. For the legally minded, we were set up to host the data. This was important. In the past people had been threatened with legal action by the Royal Mail over address data and the hosting model provided a defence.

Unfortunately we hit a snag.

Digital cholera makes me sad.

We learned that one of the largest open data sets held by government was tainted by what we called ‘digital cholera’. It contained third party rights that government was not authorised to license as open data. This was no good. We wanted to publish address data that was safe to use.

We didn’t want to spend the limited grant funding on more and more legal advice or court battles (sorry lawyers…). So we concentrated on other approaches.

We used clean open data sets and statistical techniques to multiply the address data we already had. For example, “if house number 1 exists and house number 5 exists then house number 3 probably exists”.
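A minimal sketch of that kind of inference, in Python. This is not the code Open Addresses used: the function name `infer_house_numbers` is made up for illustration, and it assumes that odd and even numbers run separately along a street, so it only fills gaps between known numbers of the same parity.

```python
def infer_house_numbers(known_numbers):
    """Guess house numbers that probably exist between known ones.

    Assumption (for illustration only): odd and even numbers are
    allocated separately, so a gap is only filled between two known
    numbers of the same parity.
    """
    known = set(known_numbers)
    inferred = set()
    for parity in (0, 1):
        same_side = sorted(n for n in known if n % 2 == parity)
        for low, high in zip(same_side, same_side[1:]):
            # Every same-parity number strictly between two known numbers
            inferred.update(range(low + 2, high, 2))
    return sorted(inferred - known)


print(infer_house_numbers([1, 5]))        # [3]
print(infer_house_numbers([1, 5, 2, 8]))  # [3, 4, 6]
```

Addresses produced this way are guesses, so a real service would want to mark them as inferred rather than observed.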

We started developing a collaborative maintenance model. People could use our address services to both improve their own services and improve the address data that everyone was using. The model would enable us to learn and publish new address information (such as alternative addresses — like Rose Cottage rather than 8 Acacia Avenue and new addresses) as people started to use them. This would increase the speed of publishing new information and improve data quality. By crowdsourcing data through APIs the data would get better as more people used it.

The team recognised that these new ways of collecting address data would impact on confidence. So, we started developing a model that would allow the platform to declare a level of confidence in each address. The model allowed for different levels of trust based on how frequently we’d seen an address, who reported it, and how long ago they’d reported it. Data users could use the APIs to determine confidence and choose whether to trust an address for their particular use case.
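To make the idea concrete, here is a rough sketch of a confidence score built from those three signals. Everything in it is an assumption for illustration: the `Sighting` type, the trust weights, the one-year half-life and the way sightings are combined are not taken from the Open Addresses platform.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical trust weights per kind of reporter (illustrative values).
SOURCE_TRUST = {
    "local_authority": 0.9,
    "business": 0.7,
    "individual": 0.5,
    "inferred": 0.3,
}


@dataclass
class Sighting:
    source: str    # who reported the address
    seen_on: date  # when they reported it


def confidence(sightings, today, half_life_days=365):
    """Combine sightings of one address into a score between 0 and 1.

    Each sighting starts at its source's trust weight and is halved for
    every `half_life_days` since it was reported. Sightings are treated
    as independent, so each extra one raises the score without it ever
    exceeding 1.
    """
    doubt = 1.0
    for s in sightings:
        age_days = max((today - s.seen_on).days, 0)
        weight = SOURCE_TRUST.get(s.source, 0.1) * 0.5 ** (age_days / half_life_days)
        doubt *= 1.0 - weight
    return 1.0 - doubt


today = date(2016, 8, 1)
recent = [Sighting("local_authority", date(2016, 7, 1))]
stale = [Sighting("local_authority", date(2012, 7, 1))]
print(confidence(recent, today) > confidence(stale, today))  # True
```

A data user could then pick their own threshold, for example accepting lower scores for sending marketing post than for registering someone to vote.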

But all this time the clock was ticking. There was limited funding. From the beginning we knew that we were testing two hypotheses.

Two hypotheses. Both are true.

Unfortunately we discovered that both hypotheses were true. We could build much better address services using modern approaches, but the intellectual property issues would keep hindering us.

A report was published to share the lessons of what worked, and what didn’t. As you’ll see in the report, even with all of our mitigations against intellectual property violations in place, Open Addresses was only able to find one insurer who would provide it with cover for defence against intellectual property infringement claims. The insurers were too concerned that the Royal Mail would take legal action to protect their revenues from address data.

A blog was published about the shades of grey in open data. And then Open Addresses went to sleep.

Someone else would have to take up the challenge of opening up address data and making things better for everyone.

Meanwhile…

While Open Addresses was happening so were other things. Lots of things. I’m obviously interested in the data ones.

The ODI was thinking about who owned our data infrastructure. Data is infrastructure to a modern society. Just like roads. Roads help us navigate to a location. Data helps us make a decision.

Spot the infrastructure in this excellent picture by Paul Downey.

The government was also working on its policy of government-as-a-platform. Companies House were opening up their data and putting it on the web. The Land Registry described itself as a steel thread that we could all build on.

Things started to come together with the description of registers as authoritative lists that we could all trust. We could all build things on top of government’s open registers.

Registers are data infrastructure. An important part of data infrastructure is geospatial data, like addresses.

Now

In the 2016 budget it was announced that government had allocated £5m to explore options to open up address data.

It is important to understand that this is about exploring options. As Open Addresses had learnt UK addresses are pretty complex. We have centuries of legacy to deal with.

Matt Hancock, who was the Minister for the Cabinet Office when the announcement was made, likened it to the ‘US administration (decision) to allow GPS data to be made freely available for civilian use in the 1980s, which he said had “kick-started a multi-billion dollar proliferation of digital goods and services”’.

He got the importance of this data being open. Not that surprising when you know that his parents ran a company that built “software that allows you to type your postcode into the internet and bring up your address”.

Government is building a common language about addresses.

Government is exploring the options as openly as possible. They are sharing their research into topics such as the need for, and complexity of, address matching and the need for a common language for addresses. They are trialling technology approaches; you can see the source code for yourself: it’s open. And this all forms part of the bigger picture of building registers as infrastructure for the government-as-a-platform strategy. In fact just this week government announced an early version of an authoritative register for English local authorities.

Whilst not all of the work is in the open (remember, the arrangements for UK address data are complex commercially and legally) it is clear that many government organisations — such as the Cabinet Office, Ordnance Survey, BEIS and Treasury — are working together to explore the options and business case for an open register. Good ☺

Will the address wars ever end?

All of the above is what I said in the talk at the BCS addressing update seminar. At the end the audience debated some of the issues raised. The legal issues seemed to confuse some people — derived database rights are tricky. Eventually I was asked the most important question: will this new UK government initiative to create an open address register succeed?

The honest answer is “I don’t know” but I do trust the people working on it. They are good and there is clear political will to get this problem sorted. With good people and political support it’s possible to do hard things. I choose to be optimistic. I think they’ll succeed. Good ☺

The web of data is coming.

It is important for the UK that they do. We need to build for the future web of data.

Other countries recognise the value of data infrastructure that is as open as possible. The USA, Australia and France have all recently made strong moves to get their address data open.

Data infrastructure is a competitive advantage in the 21st century. We need to move on from old licensing and funding models that don’t make the best use of the qualities of the web and data.

Let’s build better data infrastructure that makes things better for everyone.




Hacker Noon is how hackers start their afternoons. We’re a part of the @AMI family. We are now accepting submissions and happy to discuss advertising & sponsorship opportunities.

If you enjoyed this story, we recommend reading our latest tech stories and trending tech stories. Until next time, don’t take the realities of the world for granted!


Gov cats

In recent years the UK government has got into the habit of announcing that it has employed cats. Downing Street, the Foreign Office and the Treasury all have cats whilst the Cabinet Office are about to appoint one. An unusual habit for a government but, I suppose, life should be full of strangeness.

One afternoon I was feeling simultaneously bored and whimsical, a risky combination, so I spent 10 minutes building a UK gov cat register — a list of these cats — which I published on the web.

the cat register

The cat register is open data. Anyone can use it for any purpose. It is also open for contributions. Anyone can suggest changes and help improve it. Some people have done so already.

This week I created a dashboard for the cat register. That should have been relatively simple too but it took a little longer. Some of my skills are a bit rusty.

the cat dashboard

A list of cats that work for the UK government might seem like a silly joke – it was 🙂 – but it also gave me a chance to use, and give feedback on, some new tools developed by the Open Data Institute (ODI)’s Labs team.

Here’s what I did. It might help others publish some open data or build a dashboard. If you read it all you’ll also learn who Schrödinger’s gov cats are…

How I built the cat register

I started off by pulling together some of the available data: names; the department the cats worked in; the dates when they started (or ended) their work; and social media accounts. Yes, UK government cats have social media accounts: both official and unofficial. The data was gathered into a spreadsheet application and saved as a CSV file.

I will shamefully admit that I did not think too much about the needs of potential users of the data. After all, this was a whimsical experiment which users would be able to help maintain if they wanted to be whimsical too. I also concluded that privacy would not be an issue as animals do not have rights under the General Data Protection Regulation. In less whimsical circumstances I would recommend completing a privacy assessment before publishing a dataset.

Octopub screen for adding a dataset

I used the ODI Labs’ Octopub tool to publish the CSV file. Octopub automatically creates an open data certificate and uses Github to store and publish the data with all of the functionality that provides.

After that step the data was accessible on the web, openly licensed to make it clear that people can use it and was open for collaboration so that people could help improve it. Do use the cat data, read how to submit some extra data or raise an issue if you want to.

This bit was easy. A dashboard was a little harder.

A minimum viable cat dashboard

To help with metrics and dashboards the Labs team have created Bothan: it brings you information in the form of a free platform for storing and publishing metrics as JSON or simple visualisations. This capability is built on top of another web tool, Heroku, that allows new applications to be quickly deployed to the web.

Bothan’s name is inspired by a pretty obscure line of dialogue about the many spies who died getting the plans for the death star in Return of the Jedi. I suspect the Labs team had many failures when building their tool…

The ODI’s Labs team have also built some sample code which can be copied and configured to present Bothan visualisations as a dashboard using Github Pages (another free tool).

Setting up a Bothan instance and reconfiguring an existing dashboard was relatively easy but automating the process of getting data, like the total number of cats, from the register into Bothan proved harder.

The team recommended Zapier, a web tool designed to help automate workflows. It’s less open than the other tools — I couldn’t easily share my config and the pricing plan seemed to scale fast — but it looked like it would do the job and help get even more cats on the web. The team have even integrated Bothan with Zapier to make it easy. Unfortunately I had to get to grips with the Python scripting language and my last foray into similar stuff was a while ago. Luckily there was help both on the web and in the office.

a bit of Zapier configuration which, to put it another way, says “if there’s a change to cat register, then run an algorithm and store the results in the Bothan metrics platform”

After getting the tech working I shared a couple of early drafts on twitter; got some feedback (at which point I learnt that Google had given me the wrong answer for the total number of cats in the UK; if only searching for data was as easy as searching for documents); and improved it to a point where I was happy to call it a minimum viable dashboard.

There is one bit of configuration and code looking for changes to the cat register and calculating new metrics for those values; whilst another bit is looking for changes to some official UK government data about cats. Everything runs automatically.
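As a rough idea of what the “calculating new metrics” step might look like, here is a small Python sketch. It is not the code from the dashboard: the column names, the placeholder rows and the metric names are all assumptions, and the step that stores the result in Bothan is left out.

```python
import csv
import io
import json

# Stand-in for the published CSV file. The columns and rows are
# placeholders, not the real cat register.
SAMPLE_CSV = """name,department,status
Cat A,Department X,Active
Cat B,Department Y,Active
Cat C,Department X,Inactive
"""


def cat_metrics(csv_text):
    """Count the cats in the register, split by status."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    active = sum(1 for r in rows if r["status"].strip().lower() == "active")
    return {
        "total-cats": len(rows),
        "active-cats": active,
        "inactive-cats": len(rows) - active,
    }


# Metrics as JSON, ready to hand to whatever stores them
print(json.dumps(cat_metrics(SAMPLE_CSV)))
```

In an automated setup a function like this would run each time the register changes, with the resulting numbers sent on to the metrics platform.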

You will find a bit more detail and the code for the dashboard on Github. Feel free to suggest new features.

Peta is Schrödinger’s cat

Schrödinger’s cats

You might have noticed that the dashboard has an entry for “Schrödinger’s cats”. The reason for that is quite simple: just like the cat in Schrödinger’s famous experiment, I could find no data that confirms whether some cats are alive or dead. I could make an educated assumption (after all, one cat started duty in 1964…) but I thought it was worth leaving the status unclear. I simply left them marked “Inactive” and imagined the life of a retired UK government cat.

some cats from the swinging 60’s. Picture courtesy of National Archives via Wikipedia

Anyone who uses the data can make their own assumption about those cats whilst leaving it unclear might incentivise someone to help find the missing data and, perhaps, discover that an elderly cat from the swinging 60’s is still patrolling the corridors and clubs of Whitehall.

That incentivisation is interesting. A good register should, like any data infrastructure, be providing a foundation on which people can build services and find insights, but a good dashboard should be incentivising behaviour in line with a particular goal or strategy. My goal was to get even more cats on the web. The register and dashboard were a way of getting other people to help me. Submit more cats.

Publish your own data or build your own dashboard

But enough of cats, for now. My whimsy also helped me explore a little bit of data publishing. Octopub, Bothan, Zapier and Python all turned out to be fairly easy to use so, if you fancy giving open data a go, why don’t you publish your own dataset or create your own dashboard?

You could start with a whimsical project (penguin register anyone?) or perhaps something more useful like this list of data science courses in Europe prepared as part of the ODI learning team’s work for the European Data Science Academy.

If the documentation for each of those tools doesn’t help you with a problem then there are plenty of people around to ask and, once you’ve learnt the answer, you can always suggest ways to improve the documentation and help the next person.

The hardest bit about publishing (cat) data is getting started. Tools like Octopub and Bothan are there to make it easy.

— — -

Update 21 April: since writing this blogpost I have done a bit more work on cat data, privacy and complexity.






City data marketplaces are a distraction, let’s improve data infrastructure

I chaired a debate at the Open Data Institute (ODI) titled “What does a good data market look like?” on 29 April. It was a timely debate.

Data is infrastructure at city, national and global levels. It is vital to our societies. It is important to strengthen it. Stable, reliable and well-maintained data infrastructure helps us make better decisions, it brings us new services and it supports innovation.

There are voices arguing that we need to move beyond the open data portal. Nesta have argued that the Mayor of London should build a city data marketplace and Hitachi are building one in Copenhagen.

At the ODI we are keen to learn, and to share what we learn, so we arranged a debate to discuss the idea of a city data marketplace. The panellists were Eddie Copeland, the Director of Government Innovation at Nesta; Leigh Dodds, a founder of Bath:Hacked and ODI associate; and Yodit Stanton, the founder of Open Sensors, a startup building IoT data infrastructure. Questions were taken from the audience in the room whilst those watching the live-stream asked questions via the #ODIFridays hashtag.

Chairing the debate firmed up my views.

A city data marketplace is just another centralised website. We need to move beyond that model and improve how we can discover data on the web. At the same time we can build a better market for data and spend more time experimenting and learning about the role of governments and cities in that market. A better market for data can help strengthen our data infrastructure.

Better and more open data infrastructure will help both cities and the organisations that work in them solve problems and deliver services.

A city data marketplace is different to a market for data

A city data marketplace has previously been described as “an online marketplace that connects organisations and individuals that have useful data with those that want it.” During the debate it was also described as an “appstore for data” and a “TaskRabbit or eBay for data”. The marketplace would support both open data and data shared for a fee. It would support data publication and exchange between the public and private sector. A city data marketplace is in one place, on one platform and focussed on a city.

The data spectrum.

By contrast, the market for data uses the web where organisations publish data and APIs. The market for data supports the full data spectrum. Like the city data marketplace it includes both open data and data shared for a fee; and supports data publication and exchange between the public and private sector. The traditional open data portal helps people discover public sector open data in the market for data but the market for data is mostly decentralised, just like the web. The market for data already exists.

Data marketplaces have failed before

Leigh Dodds shared his experience as product manager of a company that tried and failed to build a sustainable data marketplace.

He related the tale of multiple other organisations that have tried and failed, with only those focussed on specific sectors, such as social media data, living to tell the tale. The debate did not surface other examples of successful data marketplaces, certainly not one that supported more than a single sector. A city supports multiple sectors.

The debate recognised that existing web publishing capabilities and search techniques were not always making it easy to find data in the market whether it be published by governments, city authorities or businesses.

Laura Koesten, a PhD student based at the ODI, is researching this problem. It is a hard one. As Benedict Evans has observed: “All curation grows until it requires search. All search grows until it requires curation”.

There were some user needs that people thought weren’t being met by data portals

The debate discussed some user needs that existing open data portals may not be adequately meeting. I was unconvinced that a city data marketplace would help meet these needs any more than the existing market for data.

Even seemingly simple needs such as discovery are affected by a number of factors including the literacy of the person searching for data and how the data has been described during publishing.

More complex needs, such as sharing a common problem to get others to help fix it (whether the problem be in banking, housing, jobs or education), are harder still to meet. They require a range of on and offline activities that can take years to complete. Whole new institutions might need to be built to fully address a problem.

There is a team at the UK’s Government Digital Service (GDS) that are researching the user needs for the data.gov.uk portal. Opening up that research and considering it alongside research on city data portals will help everyone learn more about what needs exist so we can design better services to meet them.

The role of government was unclear

The panel discussed the role of government. The views included government building and hosting a city data marketplace, using the marketplace to publish its own data, using the marketplace to buy data, encouraging use of open standards and recognising that the data that government holds is data that it holds on behalf of society.

In general, the role of government in a data marketplace or a market for data was under-discussed. As the chair I take full responsibility as we ran out of time! I think this would have been the most important bit of the debate.

I believe we need to think harder about government’s role

Our governments have chosen to actively shape technology markets: for example by encouraging the uptake of open standards and open source. They are also being active by choosing to use and publish open data.

The UK government, like many around the world, recognises that “It is critical that businesses have the ability to create new and innovative products without being hampered by cost, by licensing conditions, or the inertia caused by uncertainty and doubt.”

But the plans in Copenhagen and the ones that Nesta have floated for London include a marketplace that helps people buy and sell data. Paid for data often has a licence that restricts how you can use it: for example some Surrey councils can’t publish planning applications as open data because of their data supplier. This reduces the value that people can create from the data. Governments and cities that are open-by-default should not be encouraging paid data models.

As Jeni Tennison recently said:

Using data to make a decision is like travelling on a series of roads. To get from point A to point B without open data is like stopping at toll booths at each road junction.

Some journeys you just wouldn’t want to make because they are too much of a pain. So some decisions you will not be able to make because that data is too difficult to access.

Perhaps people think it is necessary to pay to get the private sector to provide data, but many businesses are making the same choice as governments. That is true of large enterprises, of startups like Guru Systems and Open Sensors and, as Yodit pointed out, of the customers of Open Sensors. They choose to publish some data as open data as it helps their businesses grow, solve problems and deliver services.

Good governments and cities will, where it is useful, use this open data to improve their services. Better ones will go further and encourage more open data.

Government could encourage mobile phone operators to publish the aggregated footfall data that they use for network planning, or credit card companies to publish the aggregated consumer spend data that they already collect and aggregate. Cities could choose to encourage taxi firms or, in the future, driverless car operators to publish aggregated open data such as traffic congestion or road maps. More open and collaborative mapping models can reduce costs for businesses and the public sector.

To take London as a specific example: the Mayor has responsibility for transport. Wouldn’t aggregated open data from private sector transport firms help a city meet the needs of citizens regardless of who owns the tube train, bus, black cab, car, or bicycle they happen to use? A more efficient transport market is better for the citizens that use it, the public and private sector firms that provide services in it, and the politicians that have democratic responsibility for it.

By encouraging a more open data infrastructure, governments and cities won’t just deliver more efficient public services: they will support innovation, transparency and accountability, help everyone get better services and grow our economies in the process.

We need to think more about our market of data

I came away from the debate unconvinced by the idea of a data marketplace as it was described. I do not think that our market for data needs another centralised website owned by a single organisation. The web is at its best when it is as open and decentralised as possible, and so is our data infrastructure.

The debate did help me deepen my thinking about the market for data and its importance for the future though. If we are to strengthen our data infrastructure and make it as reliable and open as possible, so that it can help support innovation whilst respecting better principles for personal data usage, then we do need to improve that market and encourage it towards openness. Rather than building city data marketplaces perhaps our cities should experiment and learn how to improve the market and their data infrastructure by getting open data out of more organisations and making it easier to discover.

I’d love to hear more thoughts on this topic and talk to other people thinking about and, ideally, helping build better markets for data and more open data infrastructures.

Drop me a note or leave a comment if you can help.

Panama, open data, infrastructure data, open Panama

We’ve been talking about the Panama Papers at the Open Data Institute. This blog came from discussions with some of the lovely team there.

This part looks at the Panama Papers through the lens of data infrastructure; the second part looks at it through the lens of personal data and privacy.

The recent leak of records (known as “the Panama Papers”) from Mossack Fonseca, a law firm specialising in offshore transactions, has put the issue of how much people and corporations pay in tax on the front pages of our newspapers. The Prime Minister of Iceland has resigned. The Chinese government has censored online discussion. There have been reactions and investigations across the world.

The story is also about data. The Panama Papers are not open data. They contain leaked, or hacked, information some of which has been placed in the public domain, but it is clearly an important story and one where data plays a vital role.

People want everyone to pay what we think is a fair amount of tax. People want to be represented by politicians that they trust. People are angry because they think that global tax avoidance is leading to people not paying fair amounts of tax and they are losing trust in their politicians because they feel politicians are benefiting from global tax avoidance rather than trying to stop it.

Societies across the world want their governments and politicians to be more open, and openness can help create trust. Open data’s origins are closely related to the open government movement. Amongst many other things the open parts of our data infrastructure underpin transparency and accountability and can help reduce corruption.

Open data, open government and infrastructure to reduce corruption

Open data has been shown to improve our governments in many ways. Last year open data for election results helped improve confidence in the fairness of Burkina Faso’s election. This year people are experimenting with improvements to the way that the UK publishes election results.

Organisations like Spend Network (*) and tools like 18F’s Calc bring together contracting and procurement data, making it simple for anyone to search and use. Journalists and citizens can look for corruption and challenge their governments on how they are spending money, whilst different parts of government can make better decisions and check they are being charged similar sums to their peers. Publishing the spend increases transparency. Using the data improves accountability and efficiency.

Open Corporates (*) collect together data published by governments and corporations across the world to help us understand how corporations, such as Goldman Sachs, are structured. On their site any of us can search for corporations where Mossack Fonseca are an officer. Linking together the data in the Panama Papers with data from Open Corporates and information on company and property ownership (unfortunately both can be challenging to find and use, even in countries that are the most enthusiastic endorsers of open government…) allowed journalists to start to discover the real owners of some of the property owned by companies based offshore.

In countries, like the UK, where the price of property is a major political issue the Panama Papers should help us all understand the benefits of a beneficial ownership register and start making steps towards its implementation at both national and global levels. This will not be simple. Company structures can be designed to obscure ownership. People will need to match up and link data from many jurisdictions. But progress can be made.

Journalists, citizens and governments have used other lists to analyse the Panama Papers. Wired reported that the German newspaper Süddeutsche Zeitung used lists of known criminals, politicians and famous athletes. Various governments’ financial sanctions lists will have provided another way to analyse the data.

Lists of politicians, criminals, people under sanctions, property ownership and beneficial ownership are not ‘just datasets’. When used to analyse leaks of data like the Panama Papers these lists become part of a global data infrastructure for anti-corruption. The UK, and its overseas dependencies, will be important contributors to this infrastructure.
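To make that idea a little more concrete, here is a minimal Python sketch of a list being used as infrastructure: checking the names in a set of records against a reference list. Everything in it is hypothetical, including the field names (`company`, `officer`), the function names and the made-up sample data. It only does exact matching after tidying up case and spacing; real analysis of leaked data would need fuzzier matching, identifiers and a lot of human checking.

```python
def normalise(name):
    """Lower-case a name and collapse whitespace so trivial differences do not block a match."""
    return " ".join(name.lower().split())


def find_matches(records, reference_list):
    """Return the records whose (hypothetical) 'officer' field appears in the reference list."""
    wanted = {normalise(name) for name in reference_list}
    return [record for record in records if normalise(record["officer"]) in wanted]


# Made-up records and a made-up reference list, purely for illustration.
records = [
    {"company": "Example Holdings Ltd", "officer": "Alex  Example"},
    {"company": "Sample Trading SA", "officer": "Sam Sample"},
]
reference = ["alex example"]

matches = find_matches(records, reference)
print(matches)
```

The more open and reliable the reference lists are, the more people can run this kind of check for themselves.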

Making this infrastructure as open and reliable as possible will increase how many people can use the data. This will improve every country’s efforts to combat tax avoidance and other forms of corruption.

Transparency alone will not solve the problems caused by global tax avoidance but it is a step in the right direction.

The Panama Papers can lead to better data infrastructure for anti-corruption

The Panama Papers highlight the urgent need to make progress on getting people and organisations to pay a fair amount of tax.

The next steps in building a more reliable and open data infrastructure for anti-corruption should include a global register of beneficial ownership. It will improve our efforts to combat tax avoidance and other forms of corruption.

In the next part I’ll touch on some of the personal data and privacy issues raised by the Panama Papers and how I hope it leads to a wider and more informed debate about the role of data in our society.

(*) a former ODI startup
