Linda Poon is a staff writer at CityLab covering science and urban technology, including smart cities and climate change. She previously covered global health and development for NPR’s Goats and Soda blog.
Where official census data is sparse, MIT researchers find that restaurant review websites can offer similar demographic and economic information.
Online review sites can tell you a lot about a city’s restaurant scene, and they can reveal a lot about the city itself, too.
Researchers at MIT recently found that information about restaurants gathered from popular review sites can be used to uncover a number of socioeconomic factors of a neighborhood, including its employment rates and demographic profiles of the people who live, work, and travel there.
A report published last week in the Proceedings of the National Academy of Sciences explains how the researchers used data from Dianping, a Yelp-like site in China, to infer details that would usually be gleaned from an official government census. The model could prove especially useful for cities that lack reliable or up-to-date government data, particularly in developing countries with limited resources to conduct regular surveys.
“We wanted to explore a new way of using restaurant data to predict those very small neighborhood-level attributes like income, population, employment, and consumption, without relying on official census data,” says Siqi Zheng, an urban development professor at MIT Futures Lab with a special focus on China.
Zheng and her colleagues tested out their machine-learning model using restaurant data from nine Chinese cities of various sizes—from crowded ones like Beijing, with a population of more than 10 million, to smaller ones like Baoding, a city of fewer than 3 million people.
They pulled data from 630,000 restaurants listed on Dianping, including each business’s location, menu prices, opening day, and customer ratings. Then they ran it through a machine-learning model with official census data and with anonymous location and spending data gathered from cell phones and bank cards. By comparing the information, they were able to determine where the restaurant data reflected the other data they had about neighborhoods’ characteristics.
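The general idea can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' model: it swaps their machine-learning model for a one-variable least-squares fit, and the listing records, neighborhood labels, and population figures are all invented for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical listing records: (neighborhood_id, menu_price, rating).
# All values are invented for illustration, not taken from the study.
LISTINGS = [
    ("A", 40.0, 4.0),
    ("A", 60.0, 4.5),
    ("A", 50.0, 3.5),
    ("B", 20.0, 3.0),
    ("B", 30.0, 4.0),
    ("C", 80.0, 5.0),
]

# Hypothetical "ground truth" daytime populations, standing in for the
# census and mobile phone data the article describes.
POPULATION = {"A": 3100.0, "B": 2100.0, "C": 1100.0}


def neighborhood_features(listings):
    """Aggregate per-restaurant records into per-neighborhood features."""
    grouped = defaultdict(list)
    for hood, price, rating in listings:
        grouped[hood].append((price, rating))
    return {
        hood: {
            "count": len(rows),
            "mean_price": mean(p for p, _ in rows),
            "mean_rating": mean(r for _, r in rows),
        }
        for hood, rows in grouped.items()
    }


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


features = neighborhood_features(LISTINGS)
hoods = sorted(features)

# Fit population against restaurant count, the simplest possible feature.
slope, intercept = fit_line(
    [features[h]["count"] for h in hoods],
    [POPULATION[h] for h in hoods],
)
```

A real pipeline would use many more features (cuisine type, opening date, price bands) and a more flexible model, but the shape is the same: summarize the listings by neighborhood, then learn a mapping from those summaries to the measured attribute.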
They found that the local restaurant scene can predict, with 95 percent accuracy, variations in a neighborhood’s daytime and nighttime populations, which are measured using mobile phone data. It can also predict, with 90 and 93 percent accuracy, respectively, the number of businesses and the volume of consumer spending. The types of cuisine offered and the kinds of eateries available (coffee shops versus traditional teahouses, for example) can also predict the proportion of immigrants or the age and income breakdown of residents. The predictions are more accurate for neighborhoods near urban centers than for those near suburbs, and for smaller cities, where neighborhoods don’t vary as widely as those in bigger metropolises.
A model trained on data from one data-rich city can be accurate enough to apply to other cities in the same country, according to the study.
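To make the cross-city idea concrete, here is a hedged sketch of how such a transfer test might be set up: fit a model on one city, then score it on a second city it has never seen. The two city datasets are invented, the model is a simple least-squares line rather than the study's machine-learning model, and R² is used here as an assumed scoring metric, not necessarily the accuracy measure the researchers report.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def r_squared(xs, ys, slope, intercept):
    """Share of variance in ys explained by the fitted line."""
    y_bar = mean(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical (restaurant_count, population) pairs, one per neighborhood.
CITY_A = [(10, 5200.0), (20, 9800.0), (30, 15100.0), (40, 19900.0)]  # data-rich
CITY_B = [(5, 2400.0), (15, 7900.0), (25, 12300.0)]  # held-out city

train_x, train_y = zip(*CITY_A)
test_x, test_y = zip(*CITY_B)

# Train on city A only, then evaluate on city B without refitting.
slope, intercept = fit_line(train_x, train_y)
score = r_squared(test_x, test_y, slope, intercept)
```

A high score on the held-out city suggests the relationship learned in one place carries over to another; a low score would mean each city needs its own training data.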
Together, the predictions provide urban planners with the up-to-date socioeconomic attributes needed to “make the decisions on where to provide public services,” says Zheng. “They need to understand the demand side.” As for the private sector, predictions about daytime activity can inform decisions about where to open retail businesses or invest in real estate.
It makes sense that the local restaurant scene can paint a picture of the neighborhood it’s in. “It’s one of the most decentralized and deregulated local industries, especially in China,” Zheng tells CityLab. That is, restaurants are almost all privately owned enterprises driven by demand, with low barriers to entry compared to other industries. Plus, restaurants are everywhere, and they often change over time to reflect changes in the neighborhood.
In that sense, Zheng and her team think this method can be applied anywhere, and will be especially useful in low-income nations. There is a socioeconomic data gap among countries, and among cities within a country, she says, “even though we are now in the era of big data.”
And while the online platforms may not gather their information in scientific ways, the sheer abundance of it makes it a useful complement to government data. In the U.S., for example, Yelp data can reflect the economic health of cities and shed light on connections between food, race, and gentrification.
Zheng recognizes the methodology’s limitations: There has to be enough data on the restaurants to feed into the machine learning model first, and that may not be available in all countries. But if data from only a handful of cities can help predict the characteristics of all cities within a country—more than 600 in China, for example—small, under-resourced cities can benefit. And perhaps future studies could test whether one model, using data from cities in just one country, can predict the neighborhood profile for any city across the globe.