The 2015 Big Data Expo in Guiyang, China. Reuters Staff / Reuters

Checking in on the latest advancements, and the challenges that remain.

There’s been no shortage of hype about the relationship between cities and data, especially so-called big data. For large numbers of tech companies, cities, and even a growing number of urbanists, data promises to solve all manner of urban problems, from predictive policing to improving traffic flow to promoting energy efficiency.

An even bigger potential role for new kinds of data lies in helping researchers and policy-makers better understand how cities and neighborhoods grow and evolve—but only if done right.

The legitimately exciting use of new data

A growing number of researchers are using data from internet sources such as Google, Twitter, and Yelp to develop new insights into cities and urban change. The sociologists Robert Sampson and Jackelyn Hwang have used Street View images to examine the role of race in the process of gentrification and neighborhood transformation. Similarly, a study from the U.K. Spatial Economics Research Centre used geo-tagged photos on Flickr to determine levels of urbanity in London and Berlin. Mobility data from Uber and Lyft—and even taxicabs—has also been used in several recent studies, which my CityLab colleague Laura Bliss and former colleague Eric Jaffe have chronicled. Data from real estate sites such as Zillow and Trulia is also being used to analyze housing price trends across neighborhoods, cities, and metro areas.

Other research has used reviewer data from Yelp to study gentrification and unequal urban consumption patterns. One study used Yelp reviews to shed light on the connection between gentrification and race in Brooklyn. Another study, from the NBER, employed Yelp data to find out how ethnic and racial segregation affects consumption levels in New York City.

Twitter data has been used to chart regional preferences and patterns of behavior. A study from the Oxford Internet Institute mapped the flow of online content and ideas across cultures. The cartography blog Floating Sheep has used data from Twitter, Google, and Wikipedia to map everything from beer and pizza to weed, bowling, and strip clubs. And my own team has used data from MySpace to track the leading centers for popular music genres across the U.S. and the world.

More recently, a team of Italian researchers combined data from Foursquare and OpenStreetMap, among other sources, to test Jane Jacobs’ theories of urban vitality and diversity in six Italian cities. Their study confirmed many of Jacobs’ key insights about the importance of short blocks, mixed land uses, walkability, dense concentrations of talented workers, and urban public spaces.

In addition to data from websites, satellite data offers the possibility of amassing systematic and comparable data across global cities, where little, if any, has previously been available. Several studies (including my own) have used satellite data to get at the economic output of cities and metros around the world. And a 2012 study in the American Economic Review used light emissions from satellites as a proxy for the spatial organization and economic size of global cities. While this data is subject to considerable limits, it provides at least rough estimates of the overall size and economic scale of cities across the world.

Accurately characterizing “big data”

Not all data from new sources qualifies as “big data,” which, as its name implies, refers to truly massive amounts of information. Max Nathan of the London School of Economics breaks down actual big data into three key categories: internet data from sites like Yelp, Twitter, or Google, along with other commercial data; government-sponsored data collected by cities or towns; and Census and related data. One example is a 2014 NESTA study, which used big data from the London-based firm Growth Intelligence to map patterns of information and technology businesses in the U.K. Another comes from a forthcoming study in the American Journal of Sociology, which uses data from millions of 3-1-1 service requests to examine neighborhood conflict among residents of different ethnicities.

According to Nathan, big data can be thought of in terms of “four Vs”: variety, volume (millions or billions of observations), velocity (real-time data), and veracity (raw data). Actual big data often requires data analytics methods like machine learning to process and derive meaning from such large troves of information. The ongoing Livehoods Project from the School of Computer Science at Carnegie Mellon University, for instance, uses machine learning to analyze 18 million check-ins on Foursquare to determine the structure and characteristics of eight different cities. When used appropriately, big data and new data analytics can help researchers discern urban structures and patterns that traditional data and methods might not uncover on their own.

A particularly good example of the use of big data is a recent NBER study by Harvard and MIT researchers, which uses computer vision to better understand geographic differences in income and housing prices. Although the paper covers plenty of ground, perhaps the most interesting section involves the use of Google Street View to predict income levels and housing prices in Boston and New York between 2007 and 2014. The study links 12,200 images of New York City and over 3,600 images of Boston to data on median family income and home values from the 2006-2011 American Community Survey. It then examines the extent to which the positive physical attributes shown in these images (such as size and green space) attract more affluent residents and predict incomes and housing prices.

Ultimately, the study finds that “images can predict income at the block group level far better than race or education does.” The study notes that a key purpose of big data is to help illuminate the role of smaller geographic areas in our urban economies, which are harder to get at with traditional Census data. The authors conclude that big data offers “some hope that Google Street View and similar predictors will enable us to better understand patterns of wealth and poverty worldwide.”

Problems and limitations

While big data may ultimately be able to advance our observation of and theories about cities, a growing number of scholars urge caution in using it. A 2014 workshop, which brought together 40 or so leading urban social scientists and data users, identified six key issues surrounding big data, spanning data quality and compatibility, the use of new analytical techniques, and questions of privacy and security. As the workshop summary notes:

Developing theory to go with the new methods and data is critical, and is often sidelined. Engineering and control theory (or big data “without theory”) work well when there is a measurable outcome, a simple policy to correct for it, and fast enough reaction time that the correction can be implemented while it is still appropriate. In cities, this is the process used to optimize service delivery. But this theory does not work well for complex systems with long time horizons, like most social systems.

In other words, big data and new data analytics are only as good as the questions we pose and the theories we generate to make sense of them. No matter how powerful they may be, new data sources and analytic techniques are no real substitute for nuanced human reasoning about cities. The real power, of course, lies in using these new tools to test and deepen the insights of cutting-edge urban theory. My own hope is that we can eventually combine them in ways that deepen our understanding of the underlying “urban genomics” of neighborhoods, cities, and urban areas.
