Tag: Privacy

You don’t control your Facebook posts, the reasons why are more complex than you might think

2018-03-11 / peterkwells / 0 Comments

A Facebook advert in my feed told me that my “photos and posts” belong to me and that “[Facebook] won’t use them without [my] permission”.

The same advert has appeared in the feeds of friends and work colleagues based in the UK. It seems to be part of a campaign. Perhaps the campaign is related to the European Union’s imminent General Data Protection Regulation and to growing public awareness of the debate about data: how it is used, and whether to trust those uses.

There is a similar message in Facebook’s terms and conditions saying:

“You own all of the content and information you post on Facebook, and you can control how it is shared through your privacy and application settings”.

Both messages are simplistic, at best. I don’t fully own or control the content I post on Facebook. It doesn’t only belong to or affect me. By over-simplifying its messaging, Facebook, like many other organisations, is missing the chance to explain how its services work and to help us all make better decisions when sharing content.

Social media content is more complex than you might think

This will sound counter-intuitive to many. I mean, shouldn’t I have control over my data on Facebook? It’s about me! I created it!!

Don’t be silly. Data ‘ownership’ is not as straightforward as it sounds. Most of my content on Facebook is not only about me. It is about other people too.

These people are not my friends. They are from a film called Peter’s Friends. But it shows some people in a picture they may regret in later life.

My list of friends is a list of relationships with other people. People tag someone in a post to say that they went to a restaurant or pub with them, or share a picture or comment about a group of friends.

Most of us will think about our friends’ feelings when sharing content about them on social media, but we don’t always know what will be important to them. The rules aren’t written down. Many of us will have had the experience of sharing something and then having a friend say “hi, do you mind deleting that post because of X…”.

Sometimes we listen to those objections and sometimes we don’t. Our friends might not be able to delete our Facebook content without our consent, but their views are part of the complex set of things we think about when posting. They can unfriend us in real life as well as on social media.

Adverse impact on other people

Beyond affecting a personal relationship, there are many types of adverse impact that a Facebook post might have. Affecting copyright owners is one. Copyright has many, many flaws, but it is one of the ways societies help creators benefit from their work.

A picture by a famous artist, Mr and Mrs Clark and Percy. Image used under fair use. Copyright David Hockney.

If I did own all the content I posted on Facebook then presumably I could post a picture created by someone else and start to make money off it by selling things. Money that could have gone to the artist.

I could, but I shouldn’t.

Both Facebook and I recognise that we need to abide by copyright legislation and that governments help enforce it. A copyright holder can complain directly to Facebook, or through the relevant national or international rules. The content is not mine to own, control and use as I wish. If I breach copyright in a way that unfairly impacts creators then fewer nice things get created. That would be bad.

Germany recently passed a new law stating that social media platforms have to take down hate speech within 1–7 days or face large fines.

Going deeper into adverse impact, it could be that someone on Facebook posts something with the intent of causing harm.

To give just a few examples, the content might libel someone, use hate speech, endorse terrorism, or use a sexual image of someone without their consent.

Facebook is a global service, and the legislation and definitions of those things will change from country to country, but in many countries those things would be illegal. A poster would lose control of the content, and perhaps even their liberty, as democratic governments use the powers given to them by people to stop the content from being seen and shared.

Facebook has its own moderation rules and tools that allow Facebook’s moderators to intervene proactively or for people to report content and get it removed. Again, that removal can happen without the poster’s consent. The poster is not in control.

Not all of the adverse impacts that moderation rules try to prevent are illegal and intentional. Others are unethical, or against social norms for a particular community or society. Moderation exists because the adverse impact from my posts might damage the health and goals of a community.

Both sassy socialist memes, with 1 million followers, and sassy libertarian memes, with 200 followers, are real Facebook groups.

Moderation is not only done by Facebook and governments. Many community groups within Facebook have their own moderators and policies. Group moderators can also remove content without a poster’s consent.

Perhaps the moderators of sassy socialist memes or sassy libertarian memes will remove content I post in their groups if my content just ain’t sassy enough. The local Facebook group for the town I live in, like many other local Facebook groups, certainly has a fierce response to excessive advertising or outsiders criticising the town.

Explaining this stuff is hard, but it is necessary

This stuff is complex and can be hard to explain in an accessible way, but it is necessary to understand the complexity before trying to make it simple.

Like many other types of content and data, Facebook posts and photos can be about more than one person. The content can create adverse impacts for those other people, but it can also create benefits. Because of this, users are not fully in control of the content they post, and they certainly don’t own it in the same way that we might own a house or car. Instead, civil society, governments and service providers need to work together to design ways to help give people more control and to maximise the social and economic benefits, while minimising the adverse impacts.

Over-simplifying this necessary complexity risks us slipping into a world where instead individuals fully control the data that they create. That is the world that Facebook’s ad is describing to many people. How silly. That world will reduce the benefits and increase the risk of harms.

We don’t need more lengthy and unreadable terms and conditions, but as the debate over data grows it would be helpful if major service providers like Facebook took greater responsibility for helping to create a more informed debate and helping people to make better decisions.

Cat data is complex, and that’s ok

2017-03-12 / peterkwells / 0 Comments

Last year I openly published data about some of the cats that work for the UK government. I ended up giving a talk about it. When publishing the data and giving the talk I skipped over the potential data protection and privacy issues.

Some of those potential issues came up again recently when our family cat, Bugsy, was being transferred to our new home. I was nervous about the cat arriving safe and on time. A friend asked:

can’t you publish some data showing the cat on his journey?

Such a short and simple question. This is my long and complex answer. Most of my friends are patient people.

This post might sound like it is going to be whimsical (OK, there will be some cat whimsy), but there is a serious point. Publishing and thinking about cat data helped me think and talk about other data things with more people.

Thinking and talking about data protection, ownership and control for cat data will have the same effect. It is pretty important that more people know how complex these topics are.

This cat data deserves data protection

Different countries have their own data protection and privacy laws. Personal data can be hard to define but at the Open Data Institute we encourage people to look at relevant legislation and start by simply saying:

Data from which a person can be identified is personal data.

If data can be combined with other information to identify a person, that data will still be personal data.

If there is personal data in a dataset then we should consider relevant data protection legislation and the universal human right to privacy.

At this point I expect that lots of people reading this post will be thinking that a cat is not a person, so neither the personal data definition nor human rights apply.

This is true but, like other animals, cats do have rights. Some people argue that pets are becoming people, in a legal sense, and that animals deserve democratic representation. Perhaps cats do not have data protection rights today but if that might change in the future then perhaps I need to worry about it today.

A cat called Paddington chasing its own tail. Picture by Bill Abbot, CC-BY-SA.

This would be a fascinating topic to explore but, to paraphrase a recent article by Luciano Floridi on the rights of robots and artificial intelligence, I’m in danger of chasing my own tail when I should be focussing on the current opportunities and challenges with data that affect people. People like me. Our cat wasn’t moving home in a few years’ time, he was moving now, and I was nervous.

There is a simple reason why I need to think about data protection if I was to publish this cat data. Whether cats realise it or not, their data can refer to people. My cat lives in the same house as me. If you knew the destination of its journey then you would know where I live. If you knew the date when it was being transferred to a new home then you might be able to guess that my old or new home is empty. Etcetera.
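The linkage problem above can be sketched in a few lines of Python. Everything in it is hypothetical: the field names, the placeholder postcode and the register of residents, which stands in for any dataset that links addresses to people. The journey record names nobody, but joining it with the register identifies the people who live at the destination.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CatJourney:
    # Hypothetical record: nothing in it names or describes a person.
    cat_name: str
    destination_postcode: str
    travel_date: date


# Hypothetical second dataset (placeholder postcode and residents), standing in
# for any register that links addresses to the people who live there.
RESIDENTS_BY_POSTCODE: dict[str, list[str]] = {
    "ZZ1 1ZZ": ["Resident A", "Resident B"],
}


def people_identified(journey: CatJourney, register: dict[str, list[str]]) -> list[str]:
    """Return the people a journey record points to once it is joined with a register."""
    return register.get(journey.destination_postcode, [])


journey = CatJourney("Bugsy", "ZZ1 1ZZ", date(2017, 3, 1))  # hypothetical date
print(people_identified(journey, RESIDENTS_BY_POSTCODE))  # ['Resident A', 'Resident B']
```

The `travel_date` field adds the second risk: it suggests when the old home, the new one, or both may be empty.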

So if I was to publish data about Bugsy’s journey I would need to think about the impact on privacy using a methodology like the one provided by the UK’s Information Commissioner’s Office (ICO) before I published the data.

Ownership of cat data is complex

I occasionally hear people saying that defining a legal right to personal data ownership will make this process easy. My privacy, my data, my choice. I doubt my cat cares about human laws but, according to the law, I own him. So I might legally own data about my cat and would have the legal right to choose to publish it. Unfortunately data ownership is not that simple and nor is cat data.

How is my cat’s identity defined? Some cats have microchips, and Edinburgh University have even given a library card to a cat so it can prove its identity and demonstrate its entitlement to borrow books, but our cat just has a phone number on its collar. Is that sufficient?

Defining legal ownership of cats in data seems simple.

Meanwhile Bugsy is a family cat. He is owned by me and my wife. It might look as if that joint ownership can easily be defined in data, but the world is more complex than my simple model. How are my identity and my wife’s defined? How would we verify our identities to show that we are allowed to track our cat on his journey? Identity management is hard.
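A "simple model" of joint ownership might look something like this Python sketch. The `Person` and `Cat` classes and the `may_track` rule are hypothetical, chosen to show how little such a model captures; the owner names and phone number are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    # Just a name: nothing here proves the person is who they claim to be.
    name: str


@dataclass
class Cat:
    name: str
    collar_phone: str  # the only identifier some cats carry
    owners: list[Person] = field(default_factory=list)


def may_track(cat: Cat, requester: Person) -> bool:
    """Naive rule: any listed owner may track the cat's journey.

    Anyone who can supply an owner's name passes this check, which is the
    identity management problem described above.
    """
    return any(owner.name == requester.name for owner in cat.owners)


# Placeholder owners and phone number.
bugsy = Cat("Bugsy", "07700 900123", [Person("Owner 1"), Person("Owner 2")])
print(may_track(bugsy, Person("Owner 1")))   # True
print(may_track(bugsy, Person("Stranger")))  # False
```

Note what the model leaves out: how an owner proves their identity, what happens when the two owners disagree, and the people who are not owners but whom the journey data also describes.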

And once we get past those issues I might find that my wife disagrees on how the cat’s data can be used. We both own and live at the same house that the cat is being transferred to. The data refers to both of us. My wife might think my nervousness is utterly ridiculous and not worth risking our privacy for. There have been several legal disputes over the ownership of pets. I don’t think it would calm my cat-moving nerves if I was to take my wife to court over ownership of cat data.

Meanwhile we’re still missing something quite important. The cat isn’t travelling alone on his journey. He is being transported by an employee of a company. What about that company’s potential rights to own the data produced by their service? What about the cat transporter’s privacy?

Controlling cat data

At this point, when answering that simple question from a friend about publishing data about Bugsy’s journey to make me feel less nervous, I started to talk more about consent.

Data protection isn’t just for the online world. We also need to think about the offline world and the billions of people who don’t use computers.

Giving people choice and ongoing control over how you use their data is becoming more important. It’s one of Tim Berners-Lee’s three challenges for the web. Some trading blocs, like the EU, and individual nations, like the UK, have decided that it is necessary to put in place new legislation that strengthens people’s rights over data. Consent is not always necessary, but the ICO recently published some draft guidance on consent under that new legislation which I could use to help publish cat data.

My wife knows quite a bit about data, so she could give informed consent which I could record. I could also ask the cat transporter and their employer if they were willing to consent. To be clear, I would want to give the cat transporter the choice of saying no. A world where people who transport cats have less privacy than other people does not sound like a sensible world.
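Recording consent could be sketched as follows, assuming each consent is tied to one person and one purpose and can be withdrawn later. `ConsentRecord` and `may_publish` are hypothetical names, and this illustrates the idea only; it is not the ICO's guidance expressed in code.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ConsentRecord:
    person: str                              # who gave consent
    purpose: str                             # the specific use they agreed to
    given_at: datetime
    withdrawn_at: Optional[datetime] = None  # consent can be taken back later

    def withdraw(self, when: Optional[datetime] = None) -> None:
        self.withdrawn_at = when if when is not None else datetime.now(timezone.utc)

    def is_active(self) -> bool:
        return self.withdrawn_at is None


def may_publish(records: list[ConsentRecord], people: list[str], purpose: str) -> bool:
    """True only if every affected person has active consent for this purpose.

    A refusal needs no record: saying no simply means no consent exists.
    """
    active = {(r.person, r.purpose) for r in records if r.is_active()}
    return all((p, purpose) in active for p in people)


purpose = "publish data about Bugsy's journey"
t = datetime(2017, 3, 1, 9, 0, tzinfo=timezone.utc)  # hypothetical time
records = [ConsentRecord("my wife", purpose, t)]
affected = ["my wife", "cat transporter"]
print(may_publish(records, affected, purpose))  # False: the transporter has not agreed
records.append(ConsentRecord("cat transporter", purpose, t))
print(may_publish(records, affected, purpose))  # True
records[0].withdraw(t)
print(may_publish(records, affected, purpose))  # False again after withdrawal
```

Recording consent is only a first step: it does not give anyone control over reuse of the data, and an online-only way of collecting it would exclude people who are not online.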

Unfortunately, given the impending journey, I did not have time to think about or research the cat transporter’s needs and skills. The ICO’s guidance says that I can assume that “adults have the capacity to consent unless you have reason to believe the contrary”, and I knew how to be open about how I planned to use the data, but without more research I would not know how to design something so that the cat transporter could choose whether to consent, or not. I might mistakenly assume that an online-only service was good enough, despite a large proportion of the UK population having no access to the internet or insufficient skills to use it. The cat transporter could be one of those people.

And all I would have achieved by this point was possibly gaining consent. I would not have given the cat transporter control over the data about their journey. With that control they could reuse the data for another purpose, such as reclaiming their petrol costs or seeing what cat data tells us about people moving house around the country. My wife, the cat transporter, their employer and I all had rights to the cat data and should all be able to have some control over its use.

Sometimes you need to keep things simple

At this point my wife and friend both firmly interrupted me and told me I was not being utterly ridiculous but being completely and utterly ridiculous. I was trying to design a perfect solution that would work for many cats and purposes, rather than keeping things simple and starting with a solution to a particular problem: my nervousness about our cat.

My wife rang the cat transportation company and asked them to text us a couple of times during the journey. They agreed, of course. Sensible wife.

Data is complex, and that’s ok

Now you might read all of this and ask:

if we have to think through all of this complexity every time we’re thinking of publishing data how will we ever build anything?

I don't think the cat is happy I've come home. pic.twitter.com/w11ZwGPv0i

— Peter Wells (@peterkwells) February 24, 2017

The team at the Open Data Institute, where I work, do the hard work to try and make data as simple and easy as possible so more organisations can get data to people who need it.

That requires us to work on lots of things including how to publish data; how people will search for it; the skills they need; how to use it in organisations, large and small, or whole sectors; and how to get data to benefit everyone. Lots of other people do similar things.

But sometimes I wonder if we and other people can make it sound too easy.

So when we encourage more people to do wonderful things with data, we talk about the challenges as well as the brilliant possibilities, using both real examples and whimsical ones like the ones I faced with my cat data. Whimsical tales sometimes help convey simple messages.

We can build a better future with data but we need to solve problems and be realistic about the complexity if we are to build one that works for people. Data is complex, and that’s ok.

A data perspective on the IP bill

2015-12-22 / peterkwells / 0 Comments

Recently the UK Government issued a draft of legislation that would alter its powers to investigate illegal activity on the internet: the Investigatory Powers (IP) bill. The draft is being debated in committee and in public, and will be debated in Parliament before a decision is made on the final text.

I have chosen to consider the IP Bill through the prism of data. The bill is lengthy and hard to unpick but it moves large datasets from closed to shared on the data spectrum. Sometimes with unclear ownership and governance.

As a whole the bill creates risks to our economy and privacy with the aim of increasing our security. We risk causing significant damage as a result.

Perhaps we should go back to first principles and consider other ways to use data to improve our security.

Bulk communications data for telephone calls

To understand the implications we need to look in detail at the data impact of the draft bill.

First, a dataset containing bulk communications data for telephone calls. The telecoms industry calls these Call Detail Records (CDRs).

CDRs contain information for each telephone call made to or from a UK number. When the IP bill was released it was confirmed, through an oblique reference to the 1984 Telecommunications Act in the IP bill debate, that CDRs have been collected since 2001. This was the first time that government had confirmed that CDRs were being collected by the intelligence agencies. Alongside the draft bill, more detail was released about how CDRs will be acquired and used.

The dataset contains what is known as telephony metadata: the calling number, called number, date, time and duration for each call. It does not contain the content (what was said) or the identity of the person at either end of the call, but under some conditions the identity could be inferred with a high level of confidence. For example, with access to other data, CDRs could be linked to the identity of the bill payer at either end. That might disclose the name of a person, a business, a charity, a government department. If you knew that I called a letting agent and removal firm on a given day you might be able to guess my purpose.
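To make that inference concrete, here is a minimal Python sketch. The record fields follow the list above; the phone numbers, call times and the business directory that plays the role of "other data" are all hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class CallDetailRecord:
    # The metadata fields described above: no call content, no names.
    calling_number: str
    called_number: str
    start: datetime
    duration_seconds: int


# Hypothetical "other data": a business directory keyed by phone number.
DIRECTORY = {
    "01632 960001": "letting agent",
    "01632 960002": "removal firm",
}


def businesses_called(records, number: str, day: date, directory: dict[str, str]) -> list[str]:
    """List the known businesses that `number` called on `day`."""
    return sorted(
        directory[r.called_number]
        for r in records
        if r.calling_number == number
        and r.start.date() == day
        and r.called_number in directory
    )


records = [
    CallDetailRecord("07700 900123", "01632 960001", datetime(2015, 12, 1, 9, 30), 240),
    CallDetailRecord("07700 900123", "01632 960002", datetime(2015, 12, 1, 11, 0), 180),
]
print(businesses_called(records, "07700 900123", date(2015, 12, 1), DIRECTORY))
# ['letting agent', 'removal firm']: enough to guess a house move
```

Neither record reveals anything on its own; the guess comes from joining metadata with a second dataset.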

Besides the CDR dataset disclosed in the IP bill, the police, the security services and multiple other public sector organisations can also gain access to CDRs, and the identity of the person who pays the bill, under the Regulation of Investigatory Powers Act (RIPA), but using a different access method.

RIPA states that communications companies must store this data within their own organisation. Authorised bodies make requests, under certain conditions, to search the data. The communications company provides data that matches the request. For example, a communications company might receive a request for details of “all calls from phone number X within time period Y to Z”, and the requesting body might have made the request to support a criminal investigation.
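The RIPA arrangement, in which the data stays with the communications company and only matching records are released, can be modelled as a small Python class. This is a hypothetical sketch, not how any real company implements it; the body names, numbers and dates are placeholders, and the time period is treated as inclusive at both ends.

```python
from datetime import datetime
from typing import NamedTuple


class CDR(NamedTuple):
    calling_number: str
    called_number: str
    start: datetime
    duration_seconds: int


class CommsCompany:
    """Hypothetical model of the RIPA arrangement: the company keeps its records
    and answers specific requests instead of handing over the whole dataset."""

    def __init__(self, records, authorised_bodies):
        self._records = list(records)              # the data stays in-house
        self._authorised = set(authorised_bodies)

    def handle_request(self, body: str, number: str,
                       period_start: datetime, period_end: datetime) -> list[CDR]:
        if body not in self._authorised:
            raise PermissionError(f"{body!r} is not authorised to make requests")
        # "All calls from phone number X within time period Y to Z" (inclusive).
        return [r for r in self._records
                if r.calling_number == number
                and period_start <= r.start <= period_end]


company = CommsCompany(
    records=[
        CDR("07700 900123", "01632 960001", datetime(2015, 12, 1, 9, 30), 240),
        CDR("07700 900123", "01632 960002", datetime(2015, 12, 8, 11, 0), 180),
        CDR("07700 900456", "01632 960001", datetime(2015, 12, 1, 10, 0), 60),
    ],
    authorised_bodies={"Example Police Force"},
)
matches = company.handle_request("Example Police Force", "07700 900123",
                                 datetime(2015, 12, 1), datetime(2015, 12, 2))
print(len(matches))  # 1
```

The permission check only settles who may ask. It cannot judge whether a particular request is proportionate, which is the part that oversight by a regulator is meant to cover.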

There have been many cases where people have used RIPA to access data in dubious circumstances such as investigating the sources of journalists or checking whether parents live in a catchment area for a school.

The Interception of Communications Commissioner’s Office (IOCCO) has regulatory responsibility for RIPA and can take action against bodies that use RIPA incorrectly. From IOCCO’s reports it appears that they were asked to take responsibility for regulating the newly disclosed bulk CDR dataset and, in return, requested early in 2015 that it be put on a clearer legislative footing.

This is a complex tale. It is tricky to even spot this dataset being disclosed in the debate in Parliament let alone unpick its history and understand the impact.

A simpler way to consider the data is using the Open Data Institute’s data spectrum.

Data spectrum image by the Open Data Institute. The circles represent datasets.

Communications companies need CDRs to provide services and bill customers. The CDRs used for billing would be in the internal access part of the spectrum: they are only accessible by the communications company (1). The Information Commissioner’s Office is the regulator responsible for ensuring that the data is kept securely. Customers understand that the dataset exists and receive a derived version in the form of their personal telephone bill (2).

The data that is retained by communications companies to meet their requirements under RIPA and the dataset that is gathered and retained by the intelligence services under the newly disclosed powers are different datasets. Both are derived from the internal access dataset that is used to provide services and bill customers.

The RIPA dataset sits within the named access part of the data spectrum (3): it is maintained by the communications company, which provides access to authorised public sector organisations. IOCCO’s function as regulator is to verify and report that all parties are complying with the rules under which named access is permitted.

It is unclear whether the dataset newly disclosed by the IP bill is gathered directly by intelligence agencies or provided to them by communications companies. It could be internal access or named access on the data spectrum (4).
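The spectrum positions described above can be written down as a small lookup, which makes explicit where the uncertainty sits. This Python sketch rests on assumptions: it maps internal access to the closed area and named access to the shared area, as this post uses the terms, and the dataset labels paraphrase the numbered circles in the diagram.

```python
from enum import Enum


class Access(Enum):
    INTERNAL = "internal access"  # only the organisation that holds the data
    NAMED = "named access"        # the holder plus specific, named organisations


# The broad area of the spectrum each access level falls in, as this post uses the terms.
AREA = {Access.INTERNAL: "closed", Access.NAMED: "shared"}

# Datasets discussed above, keyed by their circle number in the diagram.
DATASETS = {
    "(1) CDRs used for billing": {Access.INTERNAL},
    "(3) CDRs retained under RIPA": {Access.NAMED},
    # Not yet known which, so both positions remain possible.
    "(4) bulk CDRs disclosed by the IP bill": {Access.INTERNAL, Access.NAMED},
}


def possible_areas(dataset: str) -> set[str]:
    """Return every area of the spectrum a dataset might sit in."""
    return {AREA[level] for level in DATASETS[dataset]}


for name in DATASETS:
    print(name, sorted(possible_areas(name)))
```

A dataset with more than one possible area is exactly the kind of question that the draft bill leaves open.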

Given a historic lack of visibility and independent scrutiny, it is difficult for people commenting on the draft of the IP bill to answer some key questions about this dataset. Does the bill provide an appropriate level of regulatory scrutiny? Has the data proved operationally useful to the intelligence services since it was first collected in 2001? Has it, like that accessed under RIPA, been misused?

Internet Connection Records

The second dataset we will consider is Internet Connection Records (ICRs). The intelligence services have informed government that they need access to this new dataset to protect our security. The IP Bill requires communication companies to gather ICRs, retain them and provide access on request.

In the draft bill the ICR definition is loose and no example is given of precisely what it might look like and contain. It is clear that ICRs will contain a level of metadata for internet usage: i.e. which websites are accessed and when.

Communications companies need to use this data temporarily to connect customers to websites (5). They may retain an aggregated form of the data to understand the behaviour of their customers and help with marketing and network management. They are unlikely to retain detailed data because of the cost of gathering and securely retaining it, the risk that would be identified by a privacy impact assessment, and the low value to their business.

Under the IP bill communications providers are asked to derive and retain ICRs for all of their customers (6). Like the RIPA datasets this would be held by the communications company but provided to police and intelligence services on request. This is a dataset in the named access part of the data spectrum.

Attempts to recreate ICRs show that they are likely to contain personal data which when linked together and analysed would be extremely revealing about individuals. The data is far more revealing than the bulk communications dataset for telephone records.
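On its own, a record of which website was accessed and when says little. A minimal Python sketch, with assumed fields (the draft bill gives no example of an ICR) and made-up `.example` sites, of how simply linking records by subscriber and sorting them by time starts to reconstruct a day in someone's life:

```python
from collections import defaultdict
from datetime import datetime
from typing import NamedTuple


class ICR(NamedTuple):
    # Assumed fields: the draft bill gives no example of an ICR's contents.
    subscriber: str
    website: str
    accessed_at: datetime


def browsing_timelines(records):
    """Link records per subscriber into a time-ordered list of websites visited."""
    visits = defaultdict(list)
    for r in records:
        visits[r.subscriber].append((r.accessed_at, r.website))
    return {sub: [site for _, site in sorted(v)] for sub, v in visits.items()}


day = datetime(2015, 12, 1)
records = [
    ICR("subscriber-1", "jobsite.example", day.replace(hour=21, minute=10)),
    ICR("subscriber-1", "news.example", day.replace(hour=8, minute=5)),
    ICR("subscriber-1", "clinic.example", day.replace(hour=12, minute=30)),
    ICR("subscriber-1", "pharmacy.example", day.replace(hour=12, minute=45)),
]
print(browsing_timelines(records)["subscriber-1"])
# ['news.example', 'clinic.example', 'pharmacy.example', 'jobsite.example']
```

Run over months of records for every customer, the same few lines of linking would build the kind of detailed profile that makes a leak of this dataset so damaging.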

This is concerning.

By moving data from closed to shared we increase the risk of malicious actors gaining access. Every point of data storage, transfer or access is open to attack. The data that malicious actors could gain access to contains personal data about all UK citizens who use telephones or use the internet.

To give a simple example of the potential damage, details continue to emerge of the personal impact of the hack of a single website, the Ashley Madison dating site in America. A leak of an ICR dataset from either the intelligence services or a major communications company would show not just who used that one website but every website that had been browsed and when. An increasing number of services are delivered and used online. Anyone using the data would be able to paint a realistic picture of each of our lives.

The data should be kept securely, but it is difficult to guarantee protection against malicious attacks. The UK ISP TalkTalk recently suffered a hack. And one of the many things that the Edward Snowden leaks showed us is that even if there are safeguards in place to protect against such attacks, there are still ways for human beings to circumvent them.

To provide confidence that this data is being kept securely will require significant organisational transparency from both the intelligence services and communications companies. It would require disclosure of how the data is used, how it is shared, who it is shared with, why it is shared, how it is secured, whether it has been breached, who audits it, and who regulates the access. It would also require us to trust the answer to each of these questions and that the auditor and regulator are performing their functions to our satisfaction.

Bulk Personal Datasets

The final dataset to consider is bulk personal datasets (BPDs). BPDs contain personal data relating to a large number of individuals. The majority of the individuals in a BPD are not of interest to the intelligence agencies, but the agencies still deem the whole dataset to be useful.

It became clear in March 2015 that the intelligence services had been gathering, retaining and using BPDs for some time. Privacy International started legal action against the UK intelligence agencies in June 2015. The IP Bill creates a regulatory framework for how BPDs are gathered and retained.

It is good that BPDs (7) are being defined and regulated in the draft bill, but it is not yet clear how many and which BPDs are being gathered and retained, how they are being used or who they are being shared with. They could be internal access or named access.

The definition in the draft bill is broad. BPDs could be telephone directories, property records and electoral registers. BPDs could also be medical records, travel records, financial records, or membership records for sports clubs or political parties. They could be biometric records such as the planned NHS genomics dataset.

The Intelligence and Security Committee (ISC) of the UK Parliament has reported that “each Agency reported that they had disciplined — or in some cases dismissed — staff for inappropriately accessing personal information held in these (bulk personal) datasets in recent years”.

Without additional clarity, openness and ongoing transparency it is unclear whether BPDs create a larger or smaller risk to privacy than the new ICR dataset. Due to the disciplinary action reported by the ISC it is clear that some damage has already been done.

Increasing the risks to people’s privacy will damage the digital economy

There are other changes to communications data in the bill. For example, there is the power to interfere, including in bulk, with consumer equipment such as smartphones and with communications companies’ network equipment, and the potential requirement for online services to provide ways to bypass encryption.

Image by Juniper Networks of a NetScreen firewall. It was recently disclosed that the software on these firewalls had been interfered with. It is not known who, or even how many individuals or organisations, performed this act.

But the overall message is consistent.

By increasing the number of people who can access communications data and bulk personal datasets, we are moving data from the closed to the shared area of the data spectrum. Every additional person or organisation that can use the data increases the risk of invading privacy. We need to protect data against these risks.

If protection fails then malicious actors could use their access to discriminate against us, to steal our identity or steal our possessions. They could even choose to openly publish data that was always meant to be kept private. If such breaches occur then it will cause personal and economic damage. Lives will be damaged.

We will ask for compensation from the services that we think failed us and we will stop using services — from the public and private sectors — that we don’t trust.

To protect against the risk of personal damage some individuals can be expected to use personal cryptography to protect their data.

To protect against the risk of economic damage that will threaten their businesses, internet companies can be expected to respond by continuing to deploy end-to-end cryptography, hence reducing the value of the data that is collected under the powers in the bill. The battle between internet companies and governments will be played out behind closed doors and in the media. Perhaps the internet companies will escalate their response and mobilise their users to lobby governments and sway policy.

With a continuing battle, occasional data hacks and the corresponding loss of trust we risk individuals choosing the locked down future for their data. This is a future where all data is as closed as possible. That would reduce the value we can gain from data and increase the damage to our economies.

We need to tread carefully when changing the way we store and use data on such a large scale. We need a more informed and open debate. We may find that in an attempt to reduce the immediate danger to our security we are risking irreversible damage to trust between the state and citizens, to our economy and to our privacy.

We want to be world leaders in the digital economy but are risking both the digital economy and our privacy as the intelligence services believe that mass surveillance and ‘big data’ is the best way to use data to protect the security of citizens. Perhaps we should go back to first principles and reconsider whether that is the case?

[Note 2: this isn’t an authoritative assessment of the IP Bill. The bill’s impacts are complex. An internet search will find you many more assessments. This is a useful legal review.]

[Note: this blogpost was updated on 23 December to include a section on bulk personal datasets and clarify some detailed points in the last section]
