Linda Poon is a staff writer at CityLab covering science and urban technology, including smart cities and climate change. She previously covered global health and development for NPR’s Goats and Soda blog.
You can learn a lot about an area just from the cars parked on its streets.
If you walk through a city and see more pickup trucks than sedans parked on the side of the road, there’s a good chance most residents there vote Republican. This sounds like just another stereotype—that Republicans cruise around in pickups while Democrats prefer the Toyota Prius. But maybe there’s some truth to it after all.
That’s what a team of artificial intelligence researchers at Stanford University has found in an effort to predict demographics and voting patterns based solely on Google Street View images of cars.
Working with some 50 million Street View images from over 200 cities, researchers developed two algorithms. One detected and classified the cars into more than 2,600 distinct categories based on things like the make, model, body type, and age. (Given how blurry many of the images are, that was quite the accomplishment.) Then, using data from the Census and the 2008 elections, they trained another algorithm to predict the income level, racial makeup, educational attainment, and voting patterns for different tracts and precincts based on what cars are present.
Among the findings: Toyota and Honda vehicles are strongly correlated with Asian neighborhoods, in line with surveys that suggest car owners of Asian descent prefer Asian brands over American ones. Meanwhile, black neighborhoods are more strongly associated with Buick, Oldsmobile, and Chrysler vehicles. The presence of pickup trucks, Volkswagens, and Aston Martins indicates mostly white neighborhoods.
It’s not perfect, for sure. “It probably will never be 100 percent [accurate],” said Jonathan Krause, one of the researchers. But when he and his team compared their model’s predictions to actual data from the American Community Survey, the estimates weren’t far off. The model accurately determined that Seattle, Washington, is 69 percent Caucasian, with African Americans mostly residing in the southern neighborhoods. Similarly, the model was correct in predicting that the lowest-income ZIP code in Tampa, Florida, was at the southern tip.
More surprising, even to the researchers themselves, is how accurately the ratio of pickup trucks to sedans in a precinct (an area of about 1,000 residents) predicts whether residents there lean Democratic or Republican. In Gilbert, Arizona, the model correctly identified the voting patterns of 58 out of 60 precincts, a 97 percent accuracy rate. Overall, the model indicated that a city with more sedans has an 88 percent chance of voting for a Democrat in the next election, while one with more pickup trucks has an 82 percent chance of voting Republican.
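For readers who want to see the sedan-versus-pickup rule spelled out, here is a minimal sketch in Python. It is not the researchers' model, which was trained on thousands of car categories. It only illustrates the simplified rule of thumb described above, plus the arithmetic behind the Gilbert figure. The names `PrecinctCounts`, `predict_lean`, and `accuracy_rate` are hypothetical, and treating a tie as Republican is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class PrecinctCounts:
    """Hypothetical container for car counts seen in one precinct."""
    name: str
    sedans: int
    pickups: int


def predict_lean(precinct: PrecinctCounts) -> str:
    """Simplified rule of thumb: more sedans than pickups suggests a
    Democratic lean; otherwise Republican (ties included, by assumption)."""
    if precinct.sedans > precinct.pickups:
        return "Democratic"
    return "Republican"


def accuracy_rate(correct: int, total: int) -> float:
    """Share of precincts whose predicted lean matched the actual result."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


# Made-up counts, for illustration only.
example = PrecinctCounts(name="Example precinct", sedans=120, pickups=45)
print(predict_lean(example))          # Democratic

# The Gilbert, Arizona figure from the article: 58 of 60 precincts.
print(f"{accuracy_rate(58, 60):.0%}")  # 97%
```

As Krause stresses, a rule like this only makes sense when applied to whole precincts or cities, never to an individual driver.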
That’s not to validate the stereotypes, though, Krause said. “We’re not really talking this at an individual level, so it’s not that you drive a pickup truck, therefore you are a Republican,” he told CityLab. “These studies are based on aggregating over entire precincts and at even higher levels than that.” And more importantly, he stressed, these aren’t causal links: “It’s more that we see that these things co-occur.”
Plus, given how several counties flipped their support in last year’s election, the team can’t say for sure how the result would look using more recent Street View images and voting data from 2016. (The Street View images used in this research are from 2013.)
The use of machine learning like this is a potential game-changer for large-scale surveys, though it comes with a caveat, one best described by the former Wall Street analyst Cathy O’Neil in her book Weapons of Math Destruction:
The math-powered applications powering the data economy were based on choices made by fallible human beings. Some of these choices were no doubt made with the best intentions. Nevertheless, many of the models encoded human prejudice, misunderstanding and bias into the software systems that increasingly managed our lives…. And they tended to punish the poor and the oppressed in our society, while making the rich richer.
That isn’t lost on the research team and, increasingly, the AI community. “There is a growing recognition in the field that your algorithm is only as unbiased as the data that you give it,” Krause said. “The wrong way to use our study is applying it to an individual level, which would be dangerous to do.”
But it can help with something like the American Community Survey, which the researchers note costs the government more than $250 million per year to conduct. Krause and his team think that as AI technology advances, it can not only cut down on labor and costs, but more importantly, reduce the time lag.
“I don’t think [the model] is accurate enough to replace the manual process, but if you apply something like this before you collect the survey data, you can get more up-to-date information, although it’s a bit noisier,” he said. “And maybe you can use this to figure out where areas are changing very quickly or which neighborhood is getting worse,” giving policymakers a head start in implementing the right initiatives early on.
And as society changes (say, Millennials stop driving or cities finally figure out how to go car-free) models like his can be trained to analyze other possibly telling aspects, like building architecture, or the type of trees planted, or maybe even pedestrians—though he’s also fully aware of the privacy concerns there.