Emma Green is a staff writer at The Atlantic, where she covers politics, policy, and religion.
Be wary of self-selection bias when measuring engagement with digital platforms, a New York City official warns.
Like businesses and organizations everywhere, city governments are getting on board with big data. In particular, the New York City mayor’s office has embraced analytical tools. According to The New York Times, the city’s Office of Policy and Strategic Planning processes a terabyte of information (the equivalent of about 143 million pieces of paper) about its citizens every day, and the New York Police Department famously uses Compstat, an analytics system, to track and respond to crime. But cities have to be wary of how they use data, cautioned Chris Corcoran, the deputy in the Mayor’s Office of Analytics, in a panel discussion at The Atlantic’s CityLab summit.
“I used to work for an energy efficiency company, and we had people input information about their energy use, and we had a participation of about 30 percent. There was a huge self-selection bias.
“If you looked at Foursquare data for New York City, and you looked at day-to-day operations of where Foursquare operates, you would assume everyone lives in Midtown – you would never see anything in the outer boroughs.
“As we talk about crowdsourcing information, at least from the cities’ perspective, we have to be cautious about the huge, silent majority of people who are not participating in all these platforms, and how we still make sure we’re using data to serve their needs.”
As cities figure out new ways to use data to their advantage, this warning – from a data scientist, no less! – is worth remembering: Although social media and digital platforms can be powerful tools for connecting with residents, geography and income can easily bias who is able to use those tools.