From left, U of T researchers Wenjie Luo, Associate Professor Raquel Urtasun, and Bin Yang at Uber’s Advanced Technologies Group (ATG) Toronto. Credit: Ryan Perez

A self-driving vehicle has to detect objects, track them over time, and predict where they will be in the future in order to plan a safe manoeuvre. These tasks are typically trained independently from one another, which could result in disasters should any one task fail.
Researchers at the University of Toronto's department of computer science and Uber's Advanced Technologies Group (ATG) in Toronto have developed an algorithm that jointly reasons about all these tasks – the first to bring them all together. Importantly, their solution takes as little as 30 milliseconds per frame.
"We try to optimize as a whole so we can correct mistakes between each of the tasks themselves," says Wenjie Luo, a Ph.D. student in computer science. "When done jointly, uncertainty can be propagated and computation shared."
Luo and Bin Yang, a Ph.D. student in computer science, along with their graduate supervisor, Raquel Urtasun, an associate professor of computer science and head of Uber ATG Toronto, will present their paper, Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net, at this week's Computer Vision and Pattern Recognition (CVPR) conference in Salt Lake City, the premier annual computer vision event.
To start, Uber collected a large-scale dataset of several North American cities using roof-mounted LiDAR scanners that emit laser beams to measure distances. The dataset includes more than a million frames, collected from 6,500 different scenes.
Urtasun says the output of the LiDAR is a point cloud in three-dimensional space that needs to be understood by an artificial intelligence (AI) system. This data is unstructured in nature, and is thus considerably different from structured data typically fed into AI systems, such as images.
"If the task is detecting objects, you can try to detect objects everywhere but there's too much free space, so a lot of computation is done for nothing. In bird's eye view, the objects we try to recognize sit on the ground and thus it's very efficient to reason about where things are," says Urtasun.
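The bird's-eye-view idea Urtasun describes can be illustrated with a minimal sketch. It assumes a simple occupancy grid that counts LiDAR points per ground-plane cell; the function name `bev_occupancy`, the ranges, and the cell size are hypothetical and are not taken from the researchers' paper or code.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in metres; an assumed layout


def bev_occupancy(
    points: List[Point],
    x_range: Tuple[float, float],
    y_range: Tuple[float, float],
    cell_size: float,
) -> List[List[int]]:
    """Project 3D points onto the ground plane and count points per cell."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    n_cols = int((x_max - x_min) / cell_size)
    n_rows = int((y_max - y_min) / cell_size)
    grid = [[0] * n_cols for _ in range(n_rows)]
    for x, y, _z in points:
        # Skip points that fall outside the region of interest.
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue
        # Height is dropped here; only the ground-plane position is kept.
        col = min(int((x - x_min) / cell_size), n_cols - 1)
        row = min(int((y - y_min) / cell_size), n_rows - 1)
        grid[row][col] += 1
    return grid


# Four illustrative points; the last one lies outside the grid.
points = [(0.5, 0.5, 0.2), (0.7, 0.4, 1.1), (3.2, 1.5, 0.3), (10.0, 0.0, 0.0)]
grid = bev_occupancy(points, x_range=(0.0, 4.0), y_range=(0.0, 2.0), cell_size=1.0)
for row in grid:
    print(row)
```

The scattered points become a regular 2-D grid, so empty space above the ground no longer costs any computation. A real system would likely keep some height information rather than discard it, which this sketch does not attempt.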
To deal with large amounts of unstructured data, Ph.D. student Shenlong Wang and researchers from Uber ATG developed a special AI tool.
"A picture is a 2-D grid. A 3-D model is a bunch of 3-D meshes. But here, what we capture [with LiDAR] is just a bunch of points, and they are scattered in that space, which for traditional AI is very difficult to deal with," says Wang.
Urtasun explains there's a reason AI works really well on images. Images are rectangular objects, made up of tiny pixels, also rectangular, so the algorithms work well on analyzing grid-like structures. But the LiDAR data is without any regular structure, making it difficult for AI systems to learn.
Their results for processing scattered points directly are not limited to self-driving, but apply to any domain where there is unstructured data, including chemistry and social networks.
Nine papers from Urtasun's lab will be presented at CVPR. Mengye Ren, a Ph.D. student in computer science, Andrei Pokrovsky, a staff software engineer at Uber ATG, Yang and Urtasun also sought faster computation and developed SBNet: Sparse Blocks Network for Fast Inference.
"We want the network to be as fast as possible so that it can detect and make decisions in real time, based on the current situation," says Ren. "For example, humans look at certain regions we feel are important to perceive, so we apply this to self-driving."
To increase the speed of the whole computation, says Ren, they've devised a sparse computation scheme based on which regions are important. As a result, their algorithm proved up to 10 times faster than existing methods.
"The car sees everything, but it focuses most of its computation on what's important, saving computation," says Urtasun.
"So when there's a lot of cars [on the road], the computation doesn't become too sparse, so we don't miss any vehicles. But when it's sparse, it will adaptively change the computation," says Ren.
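The idea of spending computation only on important regions can be sketched as follows. This is a toy illustration under stated assumptions, not SBNet's actual implementation: the helper names `active_blocks` and `sparse_apply` are hypothetical, the importance mask is supplied by hand, and a per-cell function stands in for a neural network layer.

```python
from typing import Callable, List, Tuple

Grid = List[List[float]]
Mask = List[List[int]]


def active_blocks(mask: Mask, block: int) -> List[Tuple[int, int]]:
    """Return the top-left (row, col) of each block with a nonzero mask cell."""
    rows, cols = len(mask), len(mask[0])
    found = []
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            if any(
                mask[r][c]
                for r in range(r0, min(r0 + block, rows))
                for c in range(c0, min(c0 + block, cols))
            ):
                found.append((r0, c0))
    return found


def sparse_apply(
    grid: Grid, mask: Mask, block: int, fn: Callable[[float], float]
) -> Tuple[Grid, int]:
    """Apply fn only inside active blocks; return the output and cells visited."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]  # inactive cells stay at zero
    visited = 0
    for r0, c0 in active_blocks(mask, block):
        for r in range(r0, min(r0 + block, rows)):
            for c in range(c0, min(c0 + block, cols)):
                out[r][c] = fn(grid[r][c])
                visited += 1
    return out, visited


# A 4x4 grid where only one cell is marked as important.
grid = [[1.0] * 4 for _ in range(4)]
mask = [[0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
out, visited = sparse_apply(grid, mask, block=2, fn=lambda v: v * 2)
print(visited, "of", 16, "cells computed")
```

When the mask marks many blocks as active, as on a busy road, the work approaches that of a dense pass; when few blocks are active, most of the grid is skipped.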
The researchers released the SBNet code because it is broadly useful for speeding up processing on small devices, including smartphones.
Urtasun says the overall impact of her group's research has increased significantly now that their algorithms are implemented in Uber's self-driving fleet, rather than residing solely in academic papers.
"We're trying to solve self-driving," says Urtasun, "which is one of the fundamental problems of this century."
More information: Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net: openaccess.thecvf.com/content_ … _CVPR_2018_paper.pdf