Before self-driving vehicles can truly operate autonomously, they’ll need to master Street Sign 101. They might be almost there.
Traffic signs are designed to flag down human drivers, relying on blocky shapes and bold colors to grab our gaze from long distances as we whoosh past at top speed. But how do you train a robot to spot and interpret these signs, in all their mystifying international variations?
That’s one of the many challenges facing autonomous vehicle (AV) pioneers like Google, Uber, et al. as they lay the groundwork for our self-driving automotive future. The sign problem is particularly knotty. The rise of AVs makes a stronger-than-ever case for standardizing traffic signage within national borders (road safety practitioners have long argued for the same thing, for human benefit), and possibly from country to country. Once humanity gives up the keys, we won’t need street signs at all—the robots will communicate with one another, and with the roads themselves, much more efficiently.
But for now, and during the Great Transition, AVs need to understand signs, because they will be sharing the roads with people-driven cars, and we need signs. (Though fans of the Dutch woonerf model of street design disagree.) AVs also need to be sign-literate in various cultures. Beyond the variations in straightforward stuff like “Yield” and “Do Not Enter,” there are moose crossings in Canada, polar bear warnings in Norway, and volcano route markings in Washington State. Ideally, AVs should be able to tell them apart. On top of that, they must recognize changing details in their surroundings at any time, or type, of day.
Some robotic vehicles are learning signs from photographs of real-world proving grounds. Others are “studying” global signage in their labs, using images from one of the world’s largest databases of street-view photographs. That’s what the Swedish startup Mapillary, a crowdsourced alternative to Google Maps’ Street View, is up to. It’s using machine-learning technology to sort through its 114 million ground-level images, spanning 1.6 million miles of streets, and pluck out the features most relevant to robotic cars, which it then sells in neat datasets to AV manufacturers and tech companies. Right now, the focus is on Traffic Signs 101.
Autonomous vehicle software can be “trained” to read thousands of signs in Mapillary’s library of 500-odd sign types from more than 60 countries, using what engineers call “neural networks”: computer systems that “study” lots of tweaked versions of a thing to learn how to infer what it is, at any angle or in any condition. The more traffic signs the software sees, the better it can commit to memory the difference between, say, an underpass and a train crossing, and how those might be represented in Japan versus Norway versus Nicaragua. Then the car can respond accordingly in the real world.
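For the curious, here is roughly what “tweaked versions of a thing” can look like in practice. The sketch below is purely illustrative and is not Mapillary’s pipeline or anyone’s production code: the tiny grid of pixels, the function names, and the tweak ranges are all invented for this example. It takes one labeled picture and produces several slightly brighter, darker, shifted, or noisier copies, the kind of variety a neural network would then be trained on.

```python
import random

# A hypothetical 4x4 grayscale "sign" (0 = black, 255 = white), for illustration only.
TINY_SIGN = [
    [0, 200, 200, 0],
    [200, 255, 255, 200],
    [200, 255, 255, 200],
    [0, 200, 200, 0],
]


def clamp(value):
    """Keep a pixel value inside the valid 0-255 range."""
    return max(0, min(255, value))


def adjust_brightness(image, factor):
    """Scale every pixel by `factor` to mimic brighter or dimmer lighting."""
    return [[clamp(round(p * factor)) for p in row] for row in image]


def shift_right(image, pixels, fill=0):
    """Slide the image right by `pixels` columns, padding the left edge with `fill`."""
    return [[fill] * pixels + row[:len(row) - pixels] for row in image]


def add_noise(image, amount, rng):
    """Nudge each pixel up or down by at most `amount` to mimic a grainy camera."""
    return [[clamp(p + rng.randint(-amount, amount)) for p in row] for row in image]


def augment(image, label, copies, seed=0):
    """Return `copies` tweaked versions of `image`, each paired with the same label."""
    rng = random.Random(seed)  # seeded so the same call yields the same tweaks
    examples = []
    for _ in range(copies):
        tweaked = adjust_brightness(image, rng.uniform(0.5, 1.5))
        tweaked = shift_right(tweaked, rng.randint(0, 1))
        tweaked = add_noise(tweaked, 10, rng)
        examples.append((tweaked, label))
    return examples


if __name__ == "__main__":
    for picture, label in augment(TINY_SIGN, "stop", copies=3):
        print(label, picture)
```

Note that the sketch never mirrors the image: flipping a sign left to right can change what it means, which is one reason the choice of tweaks matters.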
According to Jan Erik Solem, Mapillary’s CEO, his company’s databases offer a much more efficient way for software manufacturers to get car-brains up to speed than collecting snapshots of traffic signs on their own. And, as Solem argues, the variations and quirks inherent in Mapillary’s crowdsourced imagery may make his product particularly useful for AV training. Engineers are developing all manner of cameras to accompany the LIDAR, radar, and sensor technologies that help AVs “see” the world. Mapillary’s photo pool is built from photographs snapped by many different camera types, under all kinds of weather and lighting conditions. “The bottleneck for software engineers is about finding good-enough training data with enough variability,” Solem says. He thinks his product is a match.
Mapillary’s reason-for-being stems from a similar logic: Gathering a lot of variable, even blurry images cheaply might be more useful than gathering a lot of crisp, uniform ones at great expense. Solem founded his company because he thought he could compete with some of Google Maps’ Street View products for way less overhead. Rather than deploy an enormous fleet of camera-mounted vehicles to traverse every street on the planet, Solem, an AI specialist by training, set out to collect free, crowdsourced imagery of roads all over the world. Mapillary’s real genius lies in its computer-vision software, which weaves its snapshots together and renders a place explorable and comprehensive, much like Google’s Street View.
And what about Google? Is Waymo, Alphabet’s self-driving spin-off company, “training” its self-driving software with Street View images? Waymo is no longer building its own AVs, but it is developing software and hardware. Google has been roaming the earth with its all-seeing camera cars (which kind of resemble its earliest self-driving vehicles, come to think of it) since 2007. To imagine that at least part of the goal all along was to feed the image-hungry brains of the company’s self-driving car moonshot might have seemed somewhat tin-foil-ish even five years ago. Now it makes Google seem incredibly far-sighted.
Is it true? A Waymo communications representative said that questions about AV-training protocol came too close to elements of the company’s work that have to stay under wraps, and has not provided further details. (I’ll update when and if I hear more.) So I asked around: Two mapping industry leaders said, off the record, that they believed Alphabet’s cars very likely do “learn” from Street View. One speculated the images are likely used “somewhere” among a mix of datasets; the other guessed Street View was used “extensively.”
Raj Rajkumar, a roboticist and autonomous vehicle authority at Carnegie Mellon University, thought it was less likely: Since the sky is always blue in the land of Google Maps, there may not be enough weather/light variation to make the images useful as training tools. Most of the training data, he said, would be collected by the cars themselves, in the real-world environments they’re tested in.
Either way, this little story about traffic signs turns out to be a startling confluence of technological trajectories. Mapillary and Street View both contain an ocean of photographs pulled together by the Internet’s far-reaching currents, one thanks to the proliferation of iPhones and other mobile devices, the other to advancements in 360-degree cameras. Both rely heavily on recent advances in image-recognition software, and ever-more accurate remote-sensing technology. Both have generated “textbooks” full of images for computers to learn from, even if that wasn’t the original intent. Autonomous vehicles need all of these supporting tools to truly take off on their own. We’re living in that moment of convergence.
Prognostications of self-driving cars’ pending arrival on real-world streets can sometimes feel far-fetched, even scary. Could these vehicles really be around the corner, reading our stop signs and talking to our sidewalks, when right now some can’t even manage a red light?
It’s true there’s still a ways to go before truly driverless robo-cars arrive, especially with regard to regulating them. But a glance backwards gives an opposite impression: Much like Mapillary’s millions of disconnected images stitched together by a highly “intelligent” machine, a bunch of technologies that used to be distinct (maps, cameras, and computer vision) are now intertwined. Together, they’ve already brought us pretty far. With that in perspective, it seems a little more natural that we’d be nearly ready to drive.