The method recently worked in an experiment that analyzed Chicago crime, but privacy questions abound.

Initially, Matthew Gerber didn't believe Twitter could help predict where crimes might occur. For one thing, Twitter's 140-character limit leads to slang and abbreviations and neologisms that are hard to analyze from a linguistic perspective. Beyond that, while criminals occasionally taunt law enforcement via Twitter, few are dumb or bold enough to tweet their plans ahead of time. "My hypothesis was there was nothing there," says Gerber.

But then, that's why you run the data. Gerber, a systems engineer at the University of Virginia's Predictive Technology Lab, did indeed find something there. He reports in a new research paper that public Twitter data improved the predictions for 19 of 25 types of crime that occurred early last year in metropolitan Chicago, compared with predictions based on historical crime patterns alone. Predictions for stalking, criminal damage, and gambling saw the biggest bump.

"I was surprised," says Gerber. "In the thousands of tweets that I've read, you don't see people saying things like, 'I'm going to rob somebody tonight.'"

The experiment began with Gerber collecting more than 1.5 million public tweets tagged with GPS coordinates within the city limits between January and March 2013. (Important privacy side note: Twitter users must opt in to GPS tagging.) Meanwhile, he gathered information on all the documented crimes that occurred over that same period.

Next, Gerber created a computer algorithm that sorted the tweets into neighborhoods measuring 1 kilometer by 1 kilometer, then analyzed the content of the tweets in each neighborhood to find out what people were tweeting about. The content was then lumped into hundreds of "topics." For instance, the foremost topic in the neighborhood around Chicago O'Hare pertained to travel, with tweets including words like gate, plane, flight, and of course, delayed.
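For readers who want to see what that first step might look like, here is a minimal Python sketch of sorting geotagged tweets into 1-kilometer squares and tallying the words in each. It is an illustration, not Gerber's code: the reference corner, the function names `grid_cell` and `words_by_cell`, and the sample tweets are all assumptions, and plain word counts stand in for a real topic model.

```python
import math
from collections import Counter, defaultdict

# Hypothetical south-west corner of the study area (an assumption, not from the study).
ORIGIN_LAT, ORIGIN_LON = 41.60, -87.95
KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude


def grid_cell(lat, lon, cell_km=1.0):
    """Return the (row, col) of the cell_km-by-cell_km square containing a point."""
    # A degree of longitude shrinks with latitude; this flat approximation
    # is reasonable over a single city.
    km_per_deg_lon = KM_PER_DEG_LAT * math.cos(math.radians(ORIGIN_LAT))
    north_km = (lat - ORIGIN_LAT) * KM_PER_DEG_LAT
    east_km = (lon - ORIGIN_LON) * km_per_deg_lon
    return (math.floor(north_km / cell_km), math.floor(east_km / cell_km))


def words_by_cell(tweets):
    """Map each grid cell to a Counter of the lowercase words tweeted there."""
    counts = defaultdict(Counter)
    for lat, lon, text in tweets:
        counts[grid_cell(lat, lon)].update(text.lower().split())
    return counts


# Invented tweets, for illustration only.
tweets = [
    (41.9790, -87.9040, "flight delayed at gate"),
    (41.9792, -87.9045, "plane delayed again"),
    (41.8807, -87.6742, "bulls game tonight"),
]
cells = words_by_cell(tweets)
for cell, words in cells.items():
    print(cell, words.most_common(3))
```

In this toy data the two airport tweets fall in the same square, so "delayed" becomes that square's most common word. A topic model would go a step further and group related words such as gate, plane, and flight into a single theme.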

Things get a bit technical from here. In basic terms, Gerber's model compared topics in a neighborhood to the historical crime data from that same spot in the city for a given month. The model formed correlations between topics and crimes, then used those correlations to predict crime in the same neighborhood for a subsequent month. The method is similar to the way Google Flu Trends uses search terms to predict outbreaks.

"So in the past maybe there's a cluster of thefts that occurred in a particular neighborhood," says Gerber. "The model will take that and say that's a neighborhood that's really prone to theft, now let's look at the Twitter content that's been generated. It's looking at that cluster of theft, and that Twitter content, and it's saying ok these words are highly associated with theft. And it makes a prediction on that basis."
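The train-on-one-month, predict-the-next idea can be sketched in a few lines of Python. This is a toy under stated assumptions, not the model from the paper: it uses a small hand-written logistic regression, the three features (a historical crime density and the shares of a "sports" and a "travel" topic) are hypothetical, and every number is invented.

```python
import math


def predict(weights, x):
    """Estimated probability of a crime in a cell, given its feature values."""
    z = weights[0] + sum(w * v for w, v in zip(weights[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, epochs=500, lr=0.5):
    """Fit logistic-regression weights (bias first) by batch gradient descent."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            error = predict(weights, x) - y
            grad[0] += error
            for j, value in enumerate(x):
                grad[j + 1] += error * value
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


# Training month, one row per grid cell:
# [historical crime density, share of "sports" topic, share of "travel" topic]
train_x = [
    [0.8, 0.7, 0.1],
    [0.6, 0.6, 0.0],
    [0.7, 0.1, 0.1],
    [0.1, 0.1, 0.8],
    [0.2, 0.0, 0.7],
    [0.1, 0.6, 0.1],
]
# 1 = a crime of the chosen type was recorded in that cell during the month.
train_y = [1, 1, 0, 0, 0, 1]

weights = train_logistic(train_x, train_y)

# Following month: two cells with the same crime history but different tweet content.
sports_heavy = predict(weights, [0.3, 0.7, 0.1])
sports_light = predict(weights, [0.3, 0.05, 0.1])
print(f"sports-heavy cell: {sports_heavy:.2f}, sports-light cell: {sports_light:.2f}")
```

Because the invented crimes cluster in cells where the sports topic is prominent, the fitted model gives that topic a positive weight and rates the sports-heavy cell as riskier, even though both cells share the same crime history. That is the sense in which tweet content can add information that historical patterns alone do not carry.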

For 19 of the 25 types of crime that occurred during these months in Chicago, Gerber's model did a better job of predicting them than did the historical crime data alone. Of course, the method says nothing about why Twitter data improved the predictions. Gerber speculates that people are tweeting about plans that correlate highly with illegal activity, as opposed to tweeting about crimes themselves.

Let's use criminal damage as an example. The algorithm identified 700 Twitter topics related to criminal damage; of these, one topic involved the words "united center blackhawks bulls" and so on. Gather enough sports fans with similar tweets and some are bound to get drunk enough to damage public property after the game. Again, this scenario extrapolates far beyond what the data show, but it offers a possible window into the algorithm's predictive power.

The map on the left shows predicted crime threat based on historical patterns; the one on the right includes Twitter data. (Via Decision Support Systems)

From a logistical standpoint, it wouldn't be too difficult for police departments to use this method in their own predictions; both the Twitter data and the modeling software Gerber used are freely available. The big question, he says, is whether a department uses the same historical crime "hot spot" data as a baseline for comparison. If not, a new round of tests would have to be done to show that the addition of Twitter data still offers a predictive upgrade.

There's also the matter of public acceptance. Data-driven crime prediction tends to raise any number of civil rights concerns. In 2012, privacy advocates criticized the FBI for a similar plan to use Twitter for crime predictions. In recent months the Chicago Police Department's own methods have been knocked as a high-tech means of racial profiling. Gerber says his algorithms don't target any individuals and only cull data posted voluntarily to a public account.

"We lump everybody together and look at the aggregate of what are people talking about in this neighborhood," he says. "In that sense it feels a little more innocent than what people might immediately imagine when they hear about this kind of work."

