
A detailed guide from Harvard helps governments protect residents’ personal information in open-data initiatives.

For any city, open data is a double-edged sword; the most useful information can also be the most sensitive. To help officials balance the risks and benefits, researchers at Harvard University have created a playbook for open data, complete with best practices, examples of what has and hasn’t worked so far, and a thorough checklist of what to consider when embarking on a new data project.

In the absence of clear-cut regulations, cities have long been somewhat haphazard about how they release data and how they protect it. It's not uncommon for officials to face public backlash after releasing seemingly innocuous data, such as transit information, only to discover afterward that, combined with other datasets, it contains enough information to track individuals. Cybersecurity experts call this the mosaic effect, and it can undermine traditional data protection measures such as anonymizing data or removing certain identifiers.
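The mosaic effect can be pictured as a simple database join. The sketch below is a hypothetical illustration, not a method from the report: the records, names, and field names are invented, and it assumes that both datasets share "quasi-identifiers" (ZIP code, birth date, sex). Stripping names from one dataset does not help if those remaining fields single out a person in another, public dataset.

```python
# Hypothetical, made-up records for illustration only.
# A "de-identified" dataset: names removed, quasi-identifiers kept.
health_records = [
    {"zip": "02138", "birth_date": "1970-03-12", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_date": "1975-02-14", "sex": "M", "diagnosis": "asthma"},
]

# A public dataset that lists names alongside the same attributes.
voter_roll = [
    {"name": "Jane Doe", "zip": "02138", "birth_date": "1970-03-12", "sex": "F"},
    {"name": "John Roe", "zip": "02139", "birth_date": "1975-02-14", "sex": "M"},
    {"name": "Ann Poe", "zip": "02139", "birth_date": "1980-05-01", "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def link(deidentified, public, keys=QUASI_IDENTIFIERS):
    """Re-identify records whose quasi-identifiers match exactly one public record."""
    # Index the public records by their quasi-identifier values.
    index = {}
    for person in public:
        index.setdefault(tuple(person[k] for k in keys), []).append(person)

    matches = []
    for record in deidentified:
        candidates = index.get(tuple(record[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match means the record is re-identified
            matches.append((candidates[0]["name"], record["diagnosis"]))
    return matches


print(link(health_records, voter_roll))
# [('Jane Doe', 'hypertension'), ('John Roe', 'asthma')]
```

Only unique matches count as re-identifications here; if several public records share the same combination, the attacker cannot tell them apart, which is why coarsening those fields is a common mitigation.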

Still, with advocates pushing for more open government at the local level, and with municipalities locked in a race to become truly smart cities, the trove of public data is only going to grow. Even the federal government isn’t immune to this challenge.

A 2000 study shows how anonymized health records could be combined with voter registration records to re-identify most people in the U.S. (Harvard University)

“In this new world of such rich data, often times it’s not [the threat] of someone getting data they shouldn't have access to, but being able to mine what’s already out there for information that shouldn’t have been stored,” says Ben Green, the report’s lead author and a graduate student at Harvard University’s Berkman Klein Center for Internet & Society.

The playbook makes four main recommendations for technology officers in municipal government, and each is broken down into, as Green puts it, “here’s what you need to know, here’s what you need to do, and then here’s how you do it.”

Find the balance between risk and value: Zero risk is impossible, a point cities perhaps don’t acknowledge enough. But according to the researchers, the trick is to find a level of risk that officials and the public are willing to accept. That starts with a thorough risk-benefit analysis before designing any data-sharing program: sussing out the vulnerabilities, the potential threats and their likelihood, their impact, and the limits of traditional risk-mitigation efforts. In determining the value, the key questions are who will use the data, who benefits from it, and how.
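One common way to make a risk-benefit analysis concrete is a likelihood-times-impact score. The sketch below is a hypothetical illustration and not the playbook's own method: the 1-to-5 scales, the `Threat` class, the `assess` function, and the idea of scoring "value" on the same scale are all assumptions chosen to show how an acceptable level of risk might be written down explicitly.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    description: str
    likelihood: int  # hypothetical scale: 1 (rare) to 5 (almost certain)
    impact: int      # hypothetical scale: 1 (negligible) to 5 (severe)

    @property
    def score(self) -> int:
        # Classic risk-matrix score: likelihood times impact (1 to 25).
        return self.likelihood * self.impact


def assess(threats, value: int, risk_tolerance: int) -> str:
    """Return a release recommendation for one dataset.

    value and risk_tolerance are assumed to use the same 1-25 scale as the
    threat scores; both are judgment calls a city would set with public input.
    """
    worst = max(t.score for t in threats)
    if worst > risk_tolerance:
        return "do not release as-is: mitigate or restrict access"
    if value < worst:
        return "release not justified: risk outweighs expected value"
    return "release"


threats = [
    Threat("re-identification via linkage with other datasets", 3, 5),
    Threat("exposure of home addresses", 2, 4),
]
print(assess(threats, value=18, risk_tolerance=16))  # release
```

The point of the exercise is less the arithmetic than forcing the tolerance and the expected value to be stated up front, so officials and the public can argue about the numbers rather than about an unstated gut feeling.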

Consider privacy at each stage of the data lifecycle: That lifecycle includes data collection, maintenance, release, and retirement, when data that is no longer relevant should be unpublished and removed. It’s typical for cities to think about privacy only when data is about to be released, but Green says those concerns should be considered from the very first stage. That is to say, cities shouldn’t collect excessive data that isn’t relevant to the project and that could become a vulnerability in the future, such as recording home addresses while surveying passersby.
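Minimizing data at the collection stage can be as simple as an allowlist applied before anything is stored. In this hypothetical sketch, the survey and its field names (`timestamp`, `intersection_id`, `travel_mode`, `home_address`) are invented for illustration; the report does not prescribe this code.

```python
# Hypothetical fields a pedestrian-count survey actually needs.
REQUIRED_FIELDS = {"timestamp", "intersection_id", "travel_mode"}


def minimize(response: dict) -> dict:
    """Keep only the fields the project needs; drop everything else before storage."""
    return {k: v for k, v in response.items() if k in REQUIRED_FIELDS}


raw = {
    "timestamp": "2017-06-01T08:15",
    "intersection_id": "A12",
    "travel_mode": "walk",
    "home_address": "123 Main St",  # not needed for the project, so never stored
}
stored = minimize(raw)
print(stored)
# {'timestamp': '2017-06-01T08:15', 'intersection_id': 'A12', 'travel_mode': 'walk'}
```

An allowlist (keep only named fields) is used instead of a blocklist (drop named fields) so that any new field added to the survey later is excluded by default until someone decides it is needed.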

Develop a structure for privacy management: “The harder challenge is developing the internal and operational expertise, and valuing protecting privacy as an essential component of [an] open data program,” Green says. Since federal and state guidelines are few and often outdated, the rules for releasing data should at least be consistent at the city level. The researchers call for cities to develop their own privacy standards and establish a formal process for releasing data.

Keep the public informed: Nearly 80 percent of Americans are concerned about government surveillance, according to Pew surveys cited in the report. So the researchers stress the need for cities to engage the public, earning its support by showing how open data has benefited the city and its trust by being transparent about the entire process. When Chicago, for example, released data on more than 100 million taxi trips, it also published a detailed post about how that data was prepared and altered to protect privacy, a step that the researchers write should be “common practice.”

Even with more than a hundred pages of tips and guidelines, though, the playbook doesn’t settle the specifics. It won’t tell cities exactly which fields to remove from their published data, for example, or how large geographic blocks should be when grouping data to protect people’s addresses. In part, that reflects the researchers’ own limitations, Green admits: The answers depend on the kind of data and can require trial and error with complex computer modeling.

But also, “we don't want to be prescriptive and define a value that a city government and public should have,” he says. “We’re happy leaving decision-making power in the city's hand.”
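The trial-and-error search for a block size can be sketched as trying progressively coarser grids until no block is too sparse. This is a hypothetical illustration, not guidance from the playbook: it assumes point locations in projected coordinates (for example, meters), square grid cells, and a minimum count `k` per occupied cell, a k-anonymity-style threshold chosen by the city.

```python
import math
from collections import Counter


def cell_counts(points, cell_size):
    """Count records per square grid cell of side cell_size (same units as the coordinates)."""
    return Counter((math.floor(x / cell_size), math.floor(y / cell_size)) for x, y in points)


def smallest_safe_cell(points, k, candidate_sizes):
    """Return the finest grid size in which every occupied cell holds at least k records."""
    for size in sorted(candidate_sizes):
        if min(cell_counts(points, size).values()) >= k:
            return size
    return None  # no candidate is coarse enough; suppress outliers or rethink the release


# Hypothetical record locations in meters.
points = [(10, 10), (20, 20), (30, 30), (150, 40), (160, 60), (170, 80), (900, 900)]
print(smallest_safe_cell(points, k=3, candidate_sizes=[100, 500, 1000]))  # 1000
```

Here a single isolated record at (900, 900) forces the whole dataset up to the coarsest grid, which shows why the right answer depends on the data. A city might instead suppress that one outlier and publish at a finer resolution. Note that this check only protects location; it does not guard against linkage through other fields.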

