When Anna Kozminski joined Facebook as a software program manager in 2018, her mandate was clear: Help cut the cord on VR devices, so anyone, anywhere, could put on a headset and instantly be immersed in virtual reality, without having to set up external tracking cameras to capture their movements.
“We wanted to create a system that lets you move and explore a VR world just as naturally and easily as you would in real life,” says Kozminski.
Kozminski joined a team whose mission was to create the first full-featured “inside-out” tracking system for a consumer VR device. The technology would have to track the full range of a person’s movements (known as six degrees of freedom) and be able to pinpoint the location of the two handheld controllers as well as the headset.
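To make "six degrees of freedom" concrete, here is a minimal sketch of how the three tracked objects could be represented: three translation values plus three rotation values each. The `Pose6DoF` class, its field names, and the use of Euler angles are illustrative assumptions, not Oculus Insight's actual data model (production trackers commonly store orientation as a quaternion instead).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose6DoF:
    """Hypothetical 6DoF pose: 3 translational + 3 rotational degrees of freedom."""
    x: float      # meters, left/right
    y: float      # meters, up/down
    z: float      # meters, forward/back
    yaw: float    # radians, rotation about the vertical axis
    pitch: float  # radians, rotation about the side-to-side axis
    roll: float   # radians, rotation about the front-to-back axis

    def translated(self, dx: float, dy: float, dz: float) -> "Pose6DoF":
        """Return a new pose shifted by the given offsets, orientation unchanged."""
        return Pose6DoF(self.x + dx, self.y + dy, self.z + dz,
                        self.yaw, self.pitch, self.roll)


# The system tracks three objects at once: the headset and two controllers.
# The starting values below are made up for illustration.
tracked = {
    "headset": Pose6DoF(0.0, 1.5, 0.0, 0.0, 0.0, 0.0),
    "left_controller": Pose6DoF(-0.25, 1.0, 0.5, 0.0, 0.0, 0.0),
    "right_controller": Pose6DoF(0.25, 1.0, 0.5, 0.0, 0.0, 0.0),
}

# Taking half a step forward moves the headset along z only.
tracked["headset"] = tracked["headset"].translated(0.0, 0.0, 0.5)
```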
Previously, VR devices relied on external sensors to track these movements. These sensors, typically cameras attached to a PC, work well, but they make VR less portable and more complicated to set up.
“With inside-out tracking in the headset, VR becomes as easy as putting on headphones to listen to music,” says Kozminski.
But the team’s mission was far from easy. They had to take state-of-the-art computer vision technology from the research lab and make it work in a consumer device that anyone could use. The tracking had to be accurate down to less than a millimeter — enough to capture a subtle tilt of your head or twitch of your hand. It had to be robust enough to work in a nearly infinite variety of conditions found in real-world homes. And it had to be efficient enough to work on a battery-powered device like Oculus Quest.
To do this, Kozminski and her team used computer vision and hand-built algorithms to generate a real-time 3D map of your immediate surroundings so the headset could compute your position within the map and translate it into VR.
We call this system Oculus Insight, and it’s what makes our new Oculus Quest and Rift S headsets possible. And today, we’re pulling back the curtain on how a team of engineers in Zurich, Menlo Park, and Seattle brought that technology to life.
The foundation of Oculus Insight’s inside-out tracking is simultaneous localization and mapping, or SLAM, which uses computer vision (CV) algorithms to fuse incoming data from multiple sensors in order to fix the position of an object within a constantly updated digital map. SLAM has been used in robotics and in AR camera effects on smartphones and was demoed in the Oculus Santa Cruz VR headset prototype in 2016. But Oculus Insight required an unprecedented level of precision and efficiency, and that meant adapting the latest research on tracking and computer vision.
“A lot of these technologies really start in academia, inside the lab,” Kozminski notes. It’s no coincidence, then, that she’s part of Facebook’s Zurich-based team of engineers, many of whom came from Zurich Eye, a joint program of ETH Zurich and the University of Zurich that researched self-navigating systems.
To build a new, more advanced version of SLAM, the engineering team drew from Facebook’s years of AI research and engineering work, building systems to understand the objects and actions that appear in videos and creating highly efficient computer vision algorithms that work well on mobile devices.
Oskar Linde, the lead machine perception architect on the Facebook team that worked on Oculus Insight, was already experienced in building ultra-efficient SLAM systems. Linde had cofounded 13th Lab, a startup that demonstrated the world’s first use of visual SLAM in a consumer application as part of a mobile AR game in 2011. When Facebook acquired 13th Lab and its SLAM technology in 2014, Linde joined the company to develop standalone inside-out tracking for VR headsets, forming the team that kicked off the continuous development of Oculus Insight.
By 2017, the Oculus Insight team was in full swing, though they faced a central challenge: creating SLAM-based tracking that was both extremely precise and efficient enough to run on a mobile device like Oculus Quest. Linde was joined by Engineering Manager Joel Hesch, who had previously studied SLAM applications using visual, laser, and inertial sensors to aid robot navigation, and worked on mobile AR and VR applications. Hesch joined Facebook to lead the team that would bring Oculus Insight to Quest and Rift S.
Linde, Hesch, Kozminski, and their team drew upon Facebook’s previous SLAM work for mobile AR use as well as the tracking technology in the original Oculus Rift VR system, but they had to find new ways to adapt and reengineer them for inside-out tracking in VR headsets.
On a smartphone, SLAM uses the phone’s camera to create “world-locked” photo and video effects. But with VR, there are multiple cameras, additional sensors, and three different objects to track in 3D space.
“We’ve got three moving pieces at once: the headset and then two additional controllers,” says Kozminski. “We need to get that pose exactly right every time.”
There are other complications, too. The infrared LEDs in the two hand controllers drastically change appearance as they move closer to or farther from the headset when you swing a virtual sword or maneuver a virtual spaceship. Oculus Insight also uses other sensors, drawing acceleration and rotational velocity data from the inertial measurement units (IMUs) located in the headset and controllers. The system must process all of these data points in real time and, in the case of Quest, on a mobile chipset.
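Why combine IMU data with camera data at all? A rough one-dimensional sketch shows the idea: integrating IMU acceleration alone drifts quickly when the sensor has even a small bias, while periodically blending in a camera-derived position keeps the estimate bounded. The function `fuse_position`, the `blend` parameter, and the simple complementary-filter approach are assumptions made for illustration; the article does not describe the fusion method Oculus Insight actually uses.

```python
def fuse_position(imu_accels, camera_fixes, dt, blend=0.1):
    """Illustrative 1D complementary filter (not the production algorithm).

    imu_accels:   measured accelerations in m/s^2, one per time step
    camera_fixes: camera-derived positions in meters, or None when no
                  visual fix is available for that step
    dt:           time step in seconds
    blend:        fraction of the camera/IMU disagreement corrected per fix
    """
    position, velocity = 0.0, 0.0
    estimates = []
    for accel, fix in zip(imu_accels, camera_fixes):
        # Dead reckoning: integrate acceleration twice.
        velocity += accel * dt
        position += velocity * dt
        # Pull the estimate toward the visual fix when one exists.
        if fix is not None:
            position += blend * (fix - position)
        estimates.append(position)
    return estimates


# A stationary device whose IMU reports a constant 0.5 m/s^2 bias.
steps, dt = 100, 0.01
biased_imu = [0.5] * steps

imu_only = fuse_position(biased_imu, [None] * steps, dt)
with_camera = fuse_position(biased_imu, [0.0] * steps, dt)

print(f"IMU only drift after 1 s:    {imu_only[-1]:.4f} m")
print(f"IMU + camera drift after 1 s: {with_camera[-1]:.4f} m")
```

In this toy setup the camera correction only adjusts position, not velocity, so some residual error remains; a fuller filter would estimate the bias as well.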
To address all these challenges, the Oculus Insight team methodically refined their system. They built new computer vision algorithms to boost the system’s tracking precision and speed. They recorded thousands of hours of video in a wide range of sample environments, and then they used the footage to teach the system to identify features in its environment. By spotting and tracking, say, the corners of a couch or the edge of a table, Oculus Insight can triangulate a person’s exact location within a room, in real time — similar to the way our eyes detect objects to help orient us.
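The idea of fixing a position from recognized features can be sketched in two dimensions. The example below uses range-based trilateration as a simplified stand-in: given three landmarks at known map positions and the distance to each, it solves for the observer's location. This is an assumption chosen for brevity; a camera-based system like the one described works from image observations rather than direct distance measurements, and the function name `locate` and the landmark coordinates are hypothetical.

```python
import math


def locate(landmarks, distances):
    """Solve for (x, y) from three known 2D landmarks and the distance to each.

    Subtracting the first circle equation from the other two leaves a
    2x2 linear system, solved here with Cramer's rule.
    """
    (x0, y0), (x1, y1), (x2, y2) = landmarks
    d0, d1, d2 = distances

    a11, a12 = 2 * (x1 - x0), 2 * (y1 - y0)
    a21, a22 = 2 * (x2 - x0), 2 * (y2 - y0)
    b1 = d0**2 - d1**2 + x1**2 - x0**2 + y1**2 - y0**2
    b2 = d0**2 - d2**2 + x2**2 - x0**2 + y2**2 - y0**2

    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("landmarks are collinear; position is ambiguous")

    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - b1 * a21) / det
    return x, y


# Made-up map: a couch corner, a table edge, and a doorway (meters).
landmarks = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
true_position = (1.0, 1.0)
distances = [math.dist(true_position, lm) for lm in landmarks]

estimated = locate(landmarks, distances)
```

Note the collinearity check: if all three landmarks lie on one line, the geometry cannot distinguish a position from its mirror image, which is one reason a tracker benefits from features spread around the room.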
The team also used arrays of extremely accurate OptiTrack motion-capture cameras — the same kind of cameras used for Hollywood VFX productions. By comparing the measurements recorded with the OptiTrack cameras with the data from Oculus Insight, the engineers were able to fine-tune the system’s computer vision algorithms so they would be accurate within a millimeter.
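A comparison like this usually comes down to an error metric between two trajectories. As a minimal sketch, the root-mean-square position error below treats the motion-capture measurements as ground truth and reports how far the tracker's estimates deviate on average. The function name `position_rmse` and the sample coordinates are hypothetical, and the article does not say which metrics the team actually used.

```python
import math


def position_rmse(estimated, reference):
    """Root-mean-square distance (meters) between paired 3D positions."""
    if not estimated or len(estimated) != len(reference):
        raise ValueError("trajectories must be non-empty and equal in length")
    squared_errors = [math.dist(e, r) ** 2 for e, r in zip(estimated, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up data: reference positions from motion capture, and tracker
# estimates that are each off by half a millimeter along x.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
estimated = [(x + 0.0005, y, z) for x, y, z in reference]

rmse = position_rmse(estimated, reference)
within_one_millimeter = rmse < 0.001
```

In practice the two systems would also need their clocks and coordinate frames aligned before such a comparison is meaningful; this sketch assumes that has already been done.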
While a research lab prioritizes measurable, accurate, and repeatable results, building for the people who would use this technology in their daily lives required a shift to perceptual metrics. In other words: How does a given experience actually feel for the person inside VR?
To address perceptual artifacts like swimminess (the disorienting sensation you get when your physical and virtual positions and movements don’t line up) and judder (visual strobing and smearing between frames), the engineering team got creative.
The Zurich team spent extensive time and energy testing Oculus Insight with the OptiTrack motion capture system in a number of environments and under various conditions, using themselves as test subjects.
Today, Oculus Insight makes it easier than ever for people to experience VR. With Rift S, you simply plug your headset into your PC, with no other cables or connections required. Quest cuts out the computer entirely to enable room-scale experiences right out of the box. But our vision for the future extends beyond what’s possible today.
The same technology that currently powers Oculus Insight (as well as AR experiences on Facebook, Instagram, and Messenger) will ultimately translate to new experiences on future devices. Eventually, it will be the foundation of lightweight, stylish AR glasses.
We still have a long way to go, but Oculus Insight brings us one step closer.