When open data is too open.

The map below should concern you. This visualization, made by James Siddle, shows a single commuter's journeys using London's public bicycles in a six-month period between 2012 and 2013. Purple lines indicate round trips while orange lines represent one-way journeys.


Even without an intimate knowledge of London’s geography, it is hard not to reach a few obvious conclusions. This commuter appears to live in the Limehouse neighborhood, at the southeast corner of the map, and works at King's Cross, toward the northwest. She probably has close friends, family, or a partner in Bow, at the eastern edge of the map. Control for time, and that theory gets stronger:


Those are journeys made between 4 a.m. and 10 a.m. They head in one direction: towards King’s Cross (in fact, to the only cycle docking station near the Guardian’s headquarters). And they come from two places, suggesting this person spends the night at a location that is not home.
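The time filter described above takes very little code. The sketch below is an illustration only: the rows, station names, and customer ID are invented, and the column layout is an assumption rather than the actual format of the Transport for London files. It keeps one ID's trips that start between 4 a.m. and 10 a.m. (treated here as 4:00 up to, but not including, 10:00) and counts each start-to-end pair.

```python
from collections import Counter
from datetime import datetime

# Invented sample rows: (customer_id, start_station, end_station, start_time).
# The ID and station names are hypothetical, not taken from the TfL data.
TRIPS = [
    ("user_42", "Station A", "Station C", "2012-09-03 07:45"),
    ("user_42", "Station A", "Station C", "2012-09-04 07:50"),
    ("user_42", "Station B", "Station C", "2012-09-05 08:10"),
    ("user_42", "Station C", "Station A", "2012-09-05 18:30"),
    ("user_7", "Station D", "Station C", "2012-09-03 08:00"),
]


def morning_routes(trips, customer_id, start_hour=4, end_hour=10):
    """Count (start, end) pairs for one ID among trips whose start hour
    falls in the half-open range [start_hour, end_hour)."""
    counts = Counter()
    for cid, start, end, timestamp in trips:
        hour = datetime.strptime(timestamp, "%Y-%m-%d %H:%M").hour
        if cid == customer_id and start_hour <= hour < end_hour:
            counts[(start, end)] += 1
    return counts


routes = morning_routes(TRIPS, "user_42")
for (start, end), n in routes.most_common():
    print(f"{start} -> {end}: {n}")
```

In this toy sample every morning trip for the chosen ID ends at the same station and starts from one of two others, the same pattern the map shows. Nothing in the code needs a name: the persistent ID is enough to group the journeys.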

Siddle says he had no desire to dig deeper, but a determined individual with just a little more information—a geocoded photograph, a tweet complaining about full docking stations—could probably identify this supposedly anonymous individual. "All that’s needed to work out who this profile belongs to is one bit of connecting information," writes Siddle on his blog.

When open data is too open

Siddle obtained this information through datasets made publicly available by Transport for London, the authority that controls all transport in the British capital. He says he was shocked when he downloaded the data in February. The documentation that accompanied it did not indicate that the data would include customer IDs (TfL says it has now plugged the hole).

"It's not something you should have in that dataset," Siddle says. "Because there is no direct way to tie it to people, it's kind of in a grey area. But because of the nature of the data, all it takes is a little other data to know who that person is. For prolific bike users that's their life."

Another cyclist's movements show just how rich Transport for London’s dataset is. (James Siddle)

An interactive version of some of Siddle's findings, which allows you to filter by time of day and number of journeys per route, shows just how revealing the information can be. Pick morning or afternoon for "random_profile_2" and you can see where the cyclist probably works and lives. Click on “evening” and you know where he socializes.

Urban authorities, countries and international agencies around the world routinely release datasets to the public in the hope that tinkerers such as Siddle will find creative ways to make use of them, and perhaps even help the services improve. In aggregate, such data are harmless. But as Quartz has reported several times over the past year, data linked to individuals can be used to draw detailed pictures of a person’s movements, connections, political beliefs and relationships.

Siddle says he alerted Transport for London before publishing his blog post but didn't hear back. TfL's general manager of cycle hire, Nick Aldworth, said:

"We’re committed to improving transparency across all our services and publish a range of data for customers and stakeholders online. Due to an administrative error, anonymised user identification numbers were shown against individual trips made between 22 July 2012 and 2 February 2013. The data, which did not identify any individual customers online, was removed as soon as the matter was brought to our attention."

This post originally appeared on Quartz.
