The New York City Police Department publishes an enormously dense database every year profiling its controversial “stop-and-frisk” encounters between officers and residents throughout the city. In a given year, there may be 500,000 or more such stops on the city’s streets, in which police question (and sometimes search) people who appear suspicious. Plot all of these stop-and-frisks on a map using their geotagged latitude and longitude coordinates, and the picture looks something like this:
“It just sort of looks like Christmas lights,” says Chris Herwig, a data analyst at Mapbox. He created the map above using the city’s 2008 data. The stop-and-frisks neatly overlay the city’s street network, but that’s about all the map reveals. “The problem is that it’s very dense data that we’re talking about,” Herwig says. “You can’t really see patterns with this. You just see these lines that coincide with the streets – yes. But it’s hard to look at that and get some kind of analytical insight into where the pattern is coming from.”
This is a common problem with any dense dataset about urban life: so many small points of information crowd onto the map (incidents of crime, 3-1-1 calls, school enrollment) that they start to blend into one another. In New York City, stop-and-frisks occur much more frequently in some neighborhoods than in others. But that density is hard to see, precisely because the points pile up on top of one another.
This gave Herwig another idea for how to visualize data points that essentially pile up in one place. "Why don’t I treat them like elevation?" he says. Dense information has a topography in the same way that physical terrain does. Borrowing the geographer’s contour lines, Herwig ultimately translated the map above into this one:
Here he displays information about an otherwise flat place using a method borrowed from mapping mountains, and the idea could have numerous other applications. "What this seemed to work well with was densely packed data points, geographic points that are relatively dense in their distribution, but also kind of random all over the city," Herwig says. "Eventually you see there are patterns to it." Zoom in on certain corners of the city, like this neighborhood in Brooklyn, and veritable peaks appear:
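The article doesn’t describe Herwig’s actual tooling, so what follows is only a minimal sketch of the “treat density as elevation” idea under stated assumptions. Points are binned into a square grid with a hypothetical cell size. The counts are then blurred with a small Gaussian kernel, a simple stand-in for kernel density estimation, so that piles of points become smooth hills. Finally, each cell gets an elevation band equal to the number of contour levels it reaches. The function names, grid size, and level values are illustrative and are not taken from Herwig’s or Rich’s maps.

```python
import math


def density_grid(points, min_x, min_y, cell_size, nx, ny):
    """Count how many (x, y) points fall in each cell of an nx-by-ny grid.

    In a real dataset x would be longitude and y latitude (an assumption here).
    """
    grid = [[0] * nx for _ in range(ny)]
    for x, y in points:
        col = int((x - min_x) / cell_size)
        row = int((y - min_y) / cell_size)
        if 0 <= col < nx and 0 <= row < ny:  # ignore points outside the grid
            grid[row][col] += 1
    return grid


def smooth(grid, radius=1, sigma=1.0):
    """Gaussian-weighted blur: each cell gathers weighted counts from its neighbors."""
    ny, nx = len(grid), len(grid[0])
    kernel = {
        (dy, dx): math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
        for dy in range(-radius, radius + 1)
        for dx in range(-radius, radius + 1)
    }
    surface = [[0.0] * nx for _ in range(ny)]
    for r in range(ny):
        for c in range(nx):
            total = 0.0
            for (dy, dx), w in kernel.items():
                rr, cc = r + dy, c + dx
                if 0 <= rr < ny and 0 <= cc < nx:
                    total += w * grid[rr][cc]
            surface[r][c] = total
    return surface


def elevation_bands(surface, levels):
    """Band index per cell: how many contour levels the 'elevation' reaches."""
    return [[sum(1 for level in levels if v >= level) for v in row] for row in surface]


# Hypothetical demo: 20 points piled on one spot and 2 on another, on a 10x10 grid.
points = [(0.55, 0.55)] * 20 + [(0.15, 0.85)] * 2
grid = density_grid(points, min_x=0.0, min_y=0.0, cell_size=0.1, nx=10, ny=10)
surface = smooth(grid)
bands = elevation_bands(surface, levels=[1, 5, 10])
for row in reversed(bands):  # print with north at the top
    print(" ".join(str(b) for b in row))
```

In the printed grid, the heavy pile shows up as a small “peak” of band 3 ringed by lower bands, while the light pile is only a low rise. A finished map would trace the boundaries between bands as contour lines with a GIS or plotting tool; the banding step here just shows how raw point counts become “terrain.”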
For comparison, Herwig has also mapped the most recent stop-and-frisk data, from 2011. As an illustration of the idea's broader reach, data journalist Gerald Rich recently took Herwig's method and used it to map an entirely different dataset at the national scale: the density of offensive place names. (Rich was inspired by this Jon Stewart clip about a New Hampshire town called "Jew Pond.") Using a dictionary of racial slurs, Rich combed through the geographic place names kept by the US Geological Survey and turned up locations like "Squaw Everest" and "Dead Negro Draw."
These names exhibit a density of their own around the country, appearing more often in some regions than others (Native-American slurs, Rich notes, are concentrated around the Appalachians). This is his topo map of information elevation:
Most interesting on Rich's map, some of the dirty-place-name density actually bears a relationship to the real geography of the land. Here's a view of the Appalachians (with the real mountains shown beneath), where inhabited places in general, and dirty place names among them, cluster densely in parts of the range: