The three dozen inspectors at the Chicago Department of Public Health scrutinize 16,000 eating establishments to protect diners from gut-bombing food sickness. Some of those pose more of a health risk than others; approximately 15 percent of inspections catch a critical violation.
For years, Chicago, like nearly every city in the U.S., scheduled these inspections by working down the complete list of food vendors and making sure each one got a visit within the mandated timeframe. That process ensured that everyone got inspected, but not that the most likely health code violators got inspected first. And speed matters in this case. Every day that unsanitary vendors serve food is a new chance for diners to get violently ill, paying in time, pain, and medical expenses.
That’s why, in 2014, Chicago’s Department of Innovation and Technology started sifting through publicly available city data and built an algorithm to predict which restaurants were most likely to be in violation of health codes, based on the characteristics of previously recorded violations. The program generated a ranked list of which establishments the inspectors should look at first. The project is notable not just because it worked—the algorithm identified violations significantly earlier than business as usual did—but because the team made it as easy as possible for other cities to replicate the approach.
And yet, more than a year after Chicago published its code, only one local government, in metro D.C., has tried to do the same thing. All cities face the challenge of keeping their food safe and therefore have much to gain from this data program. The challenge, then, isn’t just to design data solutions that work, but to do so in a way that facilitates sharing them with other cities. The Chicago example reveals the obstacles that might prevent a good urban solution from spreading to other cities, but also how to overcome them.
Cracking the code
After an initial test failed, the Chicago innovation team retooled which variables they used to predict health violations (nine of them, including previous violations, nearby sanitation complaints, and length of time since the last inspection) and how they weighted them. In September and October 2014, they generated a list of priority inspections and compared the projected violations with what inspectors actually found. The results were clear: the algorithm found violations 7.5 days earlier, on average, than inspectors operating as usual did.
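To make the ranking step concrete, here is a minimal Python sketch of how a priority list like Chicago's could be produced, assuming a logistic-style score. It is not Chicago's published R code: only three of the nine variables appear, and the feature names, weights, and intercept are hypothetical values chosen for illustration, not fitted from inspection data. The hypothetical `mean_days_earlier` helper shows one way a "days earlier" comparison could be computed: for each violation both schedules catch, subtract the day the ranked schedule would find it from the day inspectors found it as usual, then average.

```python
import math
from dataclasses import dataclass

# Hypothetical weights for illustration only; a real model would estimate
# these from a city's historical inspection records.
WEIGHTS = {
    "past_critical_violations": 0.8,
    "nearby_sanitation_complaints": 0.3,
    "days_since_last_inspection": 0.004,
}
INTERCEPT = -2.5  # hypothetical baseline log-odds of a violation


@dataclass
class Establishment:
    name: str
    features: dict


def violation_probability(est: Establishment) -> float:
    """Logistic score: weighted sum of features squashed into (0, 1)."""
    z = INTERCEPT + sum(w * est.features.get(k, 0.0) for k, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank_for_inspection(establishments):
    """Order establishments from most to least likely to be in violation."""
    return sorted(establishments, key=violation_probability, reverse=True)


def mean_days_earlier(baseline_days, ranked_days):
    """Average of (baseline day found - ranked day found), keyed by
    establishment, over violations found under both schedules."""
    gains = [baseline_days[k] - ranked_days[k]
             for k in baseline_days if k in ranked_days]
    return sum(gains) / len(gains) if gains else 0.0


if __name__ == "__main__":
    shops = [
        Establishment("Diner A", {"past_critical_violations": 2,
                                  "nearby_sanitation_complaints": 5,
                                  "days_since_last_inspection": 400}),
        Establishment("Cafe B", {"past_critical_violations": 0,
                                 "nearby_sanitation_complaints": 1,
                                 "days_since_last_inspection": 90}),
        Establishment("Grill C", {"past_critical_violations": 1,
                                  "nearby_sanitation_complaints": 0,
                                  "days_since_last_inspection": 200}),
    ]
    for shop in rank_for_inspection(shops):
        print(f"{shop.name}: {violation_probability(shop):.2f}")
```

With these made-up inputs, Diner A (several past violations, many nearby complaints, long overdue) ranks first, followed by Grill C and then Cafe B. Because the weights depend on local data, the scoring would have to be refit and re-tested in any new city before it drives real schedules.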
“That trial gave us enough confidence that we were able to roll it out to drive day-to-day decisions,” Chicago’s chief data officer Tom Schenk tells CityLab.
Chicago started using the prediction tool for daily operations in February 2015, and the transition went smoothly, says Raed Mansour, innovation projects lead for the Department of Public Health. That’s because the department was careful to incorporate the algorithm in a way that minimally altered existing business practices. Inspectors still get their assignments from a manager, for instance, but the manager now builds those schedules from the algorithm’s ranked list. The department will evaluate the program after a year, and Mansour expects its performance to meet or exceed the metrics from the test run.
But that was never meant to be the end of it. Back in November 2014, Schenk published the code for the algorithm on the programming website GitHub, so anyone in any other city could see exactly what Chicago did and adapt the program to their own community’s needs. That’s about as far as they could go to promote it, short of knocking on the door of every city hall in America. But the months since then have shown that it takes more than code to launch a municipal data program.
It still takes work
Just because an idea is good doesn’t mean it will spread. The New Yorker’s Atul Gawande dissected this difficulty with the example of solutions to the two scourges of surgery: pain and infection. After the first public demonstration of anesthesia in 1846, the technology proliferated throughout the world in a matter of months, making surgery significantly less frantic. But antiseptic methods, like washing hands and sterilizing the operating room, took decades to gain wide acceptance. The evidence that they saved lives was out there, but evidence alone doesn’t change people’s behavior.
“People talking to people is still how the world’s standards change,” Gawande wrote.
Digital communication means that nowadays the talking doesn’t need to happen face to face. Urban data innovators can share their ideas and projects remotely, provided they know where to look and who to talk to. But there are still several hurdles between the idea stage and an active city data service.
For starters, Schenk says, there can be intellectual property issues. If the code belongs to someone, another city can’t just take it. The open data approach deals with this problem: cities can choose to share their work with whoever may be interested. But if programmers build a project with expensive proprietary software, other city governments probably won’t have access to it. That’s why the Chicago team worked in R, an open-source statistical programming language.
That leaves the requirement that a city have someone on staff with the technical ability to work with that software. This is less of a problem now, Schenk says, because it’s getting easier for governments to find eager partners at academic institutions and community groups who have the expertise and want to help. But that’s not all.
“The specifics do change between cities,” Schenk says. “To even pick up code and adapt it to your specific business practice still takes work.” Maybe another city’s public health department collects or formats its data differently, so the algorithm needs to account for that. Maybe the variables most strongly associated with health violations differ from city to city. At the very least, before a municipality spends taxpayer dollars to convert its restaurant inspections to a data-driven approach, it needs to test that the approach works there.
Leaving the nest
Chicago passed around the free samples, but a year later only one government has taken a bite: Montgomery County, Maryland, just northwest of Washington, D.C. The county hired a private company called Open Data Nation to adapt Chicago’s code for use in the new location. Carey Anne Nadeau, who heads the company, ran a two-month test of the adapted algorithm in fall 2015. In the first month, it identified 27 percent more violations than business as usual and found them three days earlier.
“The big win is it’s replicable—this is the first time anyone has been able to adapt the algorithm from its initial development,” she tells CityLab. “It’s possible to do this outside of Chicago.”
Not only is it possible outside of Chicago, but it’s possible in a radically different built environment, says Montgomery County Chief Innovation Officer Dan Hoffman. The county sprawls across 500 square miles, including urban, suburban, and rural territory. Success there speaks to the robustness of the approach.
To get to that point, Nadeau’s team added some variables to the roster used in Chicago, like Yelp reviews and nearby construction permits (construction seems to stir up pests and dust, leading to deterioration of food safety). So far, the revamped algorithm has succeeded only in theory, so real-world trials are needed to see whether those results hold up in day-to-day operations. Next up for Open Data Nation is to produce a mobile app for health inspectors and to build out similar algorithms for 10 other cities.
In so doing, Open Data Nation suggests one way to bridge the civic data gap between cities: private-sector partnerships. If a city doesn’t have the resources or expertise to code a predictive algorithm, it can contract the work out to companies that do. The company earns revenue, of course, and if it all works out, the government saves money through efficiency. “As Montgomery County grows and adds more restaurants,” Nadeau notes, “they don’t have to grow their food inspection budget to respond to the growth in the city.”
To fully capitalize on predictive uses of urban data, cities could work with a growing community of data-savvy companies and researchers, and philanthropic foundations could fill in some of the funding gaps. The Knight Foundation is already supporting civic data experiments with its Prototype Fund, for instance, as is Bloomberg Philanthropies with its What Works Cities initiative.
But there’s still a significant benefit to having more data experts within city governments, says Eric Potash, a postdoctoral researcher at the University of Chicago’s Center for Data Science and Public Policy. He’s working with Chicago’s public health department on a project to use data to predict lead contamination in housing before it poisons children. He points out that collecting data is messy, because the relevant information is scattered across separate departments. Having an advocate “on the inside” can speed up that process considerably.
Finally, it’s important to remember that predictive modeling is simply an optimization tool. It is only as good as the government practice it is used to make more effective.
“If you tell it to optimize for finding kids that are going to be poisoned by lead, that sounds great,” Potash says. “But there are lots of other cases where governments are using predictive technologies and people aren’t as excited—predictive policing is the obvious example. It’s not as simple as ‘prediction solves all problems.’”