When open data is too open.

The map below should concern you. This visualization, made by James Siddle, shows a single commuter's journeys using London's public bicycles in a six-month period between 2012 and 2013. Purple lines indicate round trips while orange lines represent one-way journeys.

(Map: James Siddle)

Even without an intimate knowledge of London’s geography, it is hard not to reach a few obvious conclusions. This commuter appears to live in the Limehouse neighborhood, at the southeast corner of the map, and to work at King's Cross, toward the northwest. She probably has close friends, family, or a partner in Bow, at the eastern edge of the map. Control for time, and that theory gets stronger:

(Map: James Siddle)

Those are journeys made between 4 a.m. and 10 a.m. They head in one direction: towards King’s Cross (in fact, to the only cycle docking station near the Guardian’s headquarters). And they come from two places, suggesting this person spends the night at a location that is not home.
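The kind of time filter described above takes only a few lines of code once a rider ID is attached to each trip. The sketch below is illustrative only: the record layout, the rider IDs and every journey in it are invented, and it is not Siddle's code or the actual format of the Transport for London files. It keeps one rider's trips that start between 4 a.m. and 10 a.m. and counts where they begin and end.

```python
from collections import Counter
from datetime import datetime

# Hypothetical trip records: (rider_id, start_time, start_station, end_station).
# The field layout and all values are invented for illustration; they are not
# taken from the Transport for London dataset.
TRIPS = [
    ("rider_1", "2012-09-03 08:05", "Limehouse", "King's Cross"),
    ("rider_1", "2012-09-03 18:40", "King's Cross", "Limehouse"),
    ("rider_1", "2012-09-04 08:10", "Limehouse", "King's Cross"),
    ("rider_1", "2012-09-07 08:30", "Bow", "King's Cross"),
    ("rider_1", "2012-09-08 21:15", "Limehouse", "Bow"),
    ("rider_2", "2012-09-03 07:50", "Bow", "Limehouse"),
]


def morning_profile(trips, rider_id, start_hour=4, end_hour=10):
    """Count where one rider's morning trips start and end."""
    origins, destinations = Counter(), Counter()
    for rider, started, origin, destination in trips:
        hour = datetime.strptime(started, "%Y-%m-%d %H:%M").hour
        # Keep only this rider's trips that start inside the morning window.
        if rider == rider_id and start_hour <= hour < end_hour:
            origins[origin] += 1
            destinations[destination] += 1
    return origins, destinations


origins, destinations = morning_profile(TRIPS, "rider_1")
print("Morning origins:", dict(origins))
print("Morning destinations:", dict(destinations))
```

In this made-up sample, every morning trip ends at a single station while the starting points split between two, which is the same pattern the filtered map shows.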

Siddle says he had no desire to dig deeper, but a determined individual with just a little more information—a geocoded photograph, a tweet complaining about full docking stations—could probably identify this supposedly anonymous individual. "All that’s needed to work out who this profile belongs to is one bit of connecting information," writes Siddle on his blog.


Siddle obtained this information through datasets made publicly available by Transport for London (TfL), the authority that runs most of the public transport in the British capital. He says he was shocked when he downloaded the data in February. The documentation that accompanied it did not indicate that the data would include customer IDs (TfL says it has now plugged the hole).

"It's not something you should have in that dataset," Siddle says. "Because there is no direct way to tie it to people, it's kind of in a grey area. But because of the nature of the data, all it takes is a little other data to know who that person is. For prolific bike users that's their life."

Another cyclist's movements show just how rich Transport for London’s dataset is. (James Siddle)

An interactive version of some of his findings, which allows you to filter by time of day and number of journeys per route, shows just how revealing the information can be. Pick morning or afternoon for "random_profile_2" and you can see where the cyclist probably works and lives. Click on “evening” and you know where he socializes.

Urban authorities, countries and international agencies around the world routinely release datasets to the public in the hope that tinkerers such as Siddle will find creative ways to make use of them, and perhaps even help the services improve. In aggregate, such data are harmless. But as Quartz has reported several times over the past year, data linked to individuals can be used to draw detailed pictures of a person’s movements, connections, political beliefs and relationships.
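The difference between aggregate and individual data can be shown with a short sketch. The records, station pairs and the minimum-count threshold below are all invented for illustration, and nothing here describes how TfL prepares its files. The function discards the rider ID, counts trips per route, and drops routes used fewer than a chosen number of times, a common precaution that reduces the risk of singling someone out but does not guarantee anonymity.

```python
from collections import Counter

# Hypothetical per-trip records: (rider_id, start_station, end_station).
# All values are invented for illustration.
RAW_TRIPS = [
    ("rider_1", "Limehouse", "King's Cross"),
    ("rider_2", "Limehouse", "King's Cross"),
    ("rider_3", "Limehouse", "King's Cross"),
    ("rider_1", "Bow", "King's Cross"),
]


def aggregate_routes(trips, min_count=3):
    """Count trips per route, dropping rider IDs and rarely used routes."""
    # The rider ID is ignored, so the output cannot be grouped by person.
    counts = Counter((origin, destination) for _, origin, destination in trips)
    # min_count is an arbitrary, assumed threshold for suppressing small counts.
    return {route: n for route, n in counts.items() if n >= min_count}


route_counts = aggregate_routes(RAW_TRIPS)
print(route_counts)
```

The published table would then say how busy each route is without showing that any particular sequence of journeys belongs to one person.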

Siddle says he alerted Transport for London before publishing his blog post but didn't hear back. TfL's general manager of cycle hire, Nick Aldworth, said:

"We’re committed to improving transparency across all our services and publish a range of data for customers and stakeholders online. Due to an administrative error, anonymised user identification numbers were shown against individual trips made between 22 July 2012 and 2 February 2013. The data, which did not identify any individual customers online, was removed as soon as the matter was brought to our attention."

This post originally appeared on Quartz.
