Building products to help people connect is a key tenet at Facebook Reality Labs. In the past few years, we’ve shared our progress on machine perception, lifelike avatars, and even brain-computer interfaces. Last month, FRL Research Chief Scientist Michael Abrash talked about the road to AR glasses at Facebook Connect. In the latest entry of our Inside Facebook Reality Labs series, we’ll learn how researchers “cracked” hand tracking technology on the Quest, the first input solution of its kind to debut on a standalone VR headset, and share a few research updates on text input.
Hand tracking is one of many endeavors inside FRL Research designed to make interacting with technology feel more approachable. From VR headsets to AR glasses, we're building the next computing platform to benefit as many different people as possible. Hand tracking technology opens immersive computing to more people by adopting a “come as you are” approach to human-computer interaction.
FRL’s work building more natural devices started with Touch controllers. Comfortable to hold and packed with sophisticated sensors, Touch controllers delivered lifelike hand presence and made even the most basic interactions in VR (like grabbing a door handle) feel more like the real thing. But for all their considerable benefits, Touch controllers can’t replicate the expressiveness of a peace sign or the efficiency of typing. When it comes to doing things naturally, nothing currently beats the human hand. Intuitive, adaptable, and understood by billions of people, human hands help us accomplish routine tasks, creative works, and everything in between — the ideal input for most activities. But more than five years ago, when FRL Research started working on hand tracking for VR, no one had ever shipped consumer-quality, controller-free hand tracking. The challenge was nothing less than developing the technology from scratch: making VR more approachable by creating a new interface based on the human hand. Happily, we succeeded.
We explore how we accomplished that, as well as the further potential of hand tracking, through two new publications. This week at UIST 2020, a symposium on user interface software and technology, FRL Research debuted a novel method that enables touch typing without a physical keyboard. We showed that hand tracking can potentially deliver the efficiency and familiarity of traditional typing, but on any flat surface. And at SIGGRAPH 2020, an annual conference on computer graphics, FRL researchers shared how they delivered practical hand tracking on Quest through numerous breakthroughs in mobile computing. Today, every Quest headset ships with hand tracking out of the box.
Natural typing without a keyboard
As part of our ongoing hand tracking research, FRL Research is constantly exploring new, experimental forms of text input, which is a critical task for both communication and productivity. This week at UIST, FRL Research proposed a novel method that enables touch typing on any flat surface — without a keyboard. This new approach uses hand motion from a marker-based hand tracking system as input and decodes the motion directly into the text the typist intended to type. While still early in the research phase, this exploration illustrates the potential of hand tracking for productivity scenarios.
To support touch typing without a physical keyboard — and without the benefit of haptic feedback from individual keys — the team had to make sense of erratic typing patterns. They adopted statistical decoding techniques from automatic speech recognition: where speech recognition uses an acoustic model to predict phonemes from audio frames, their system uses a motion model to predict keystrokes from hand motion.
This, along with a language model, predicted what people intended to type despite ambiguous hand motion. Using this new method, typists averaged 73 words per minute with a 2.4% uncorrected error rate using their hands, a flat surface, and nothing else, achieving speed and accuracy similar to what the same typists achieved on a physical keyboard.
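To make the idea concrete, here is a minimal, hypothetical sketch of that style of statistical decoding — not FRL's actual system. The motion-model probabilities, the tiny bigram language model, and all names here are invented for illustration; the point is only how a decoder can combine ambiguous per-keystroke evidence with a language model via beam search.

```python
import math

# Toy stand-in for the motion model: for each detected keystroke, a
# probability distribution over candidate keys (invented values).
# In a real system these would come from hand-motion features.
motion_probs = [
    {"c": 0.6, "x": 0.4},   # ambiguous first keystroke
    {"a": 0.5, "s": 0.5},   # ambiguous second keystroke
    {"t": 0.9, "r": 0.1},
]

# Toy bigram language model: P(next_char | prev_char). "^" marks start.
bigram = {
    ("^", "c"): 0.5, ("^", "x"): 0.1,
    ("c", "a"): 0.7, ("c", "s"): 0.1,
    ("x", "a"): 0.2, ("x", "s"): 0.2,
    ("a", "t"): 0.8, ("a", "r"): 0.1,
    ("s", "t"): 0.3, ("s", "r"): 0.1,
}

def decode(motion_probs, bigram, beam_width=4):
    """Beam search combining motion-model and language-model scores."""
    beams = [("", "^", 0.0)]  # (text so far, last char, log score)
    for frame in motion_probs:
        candidates = []
        for text, prev, score in beams:
            for ch, p_motion in frame.items():
                p_lm = bigram.get((prev, ch), 1e-6)
                candidates.append(
                    (text + ch, ch,
                     score + math.log(p_motion) + math.log(p_lm))
                )
        candidates.sort(key=lambda c: c[2], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][0]

print(decode(motion_probs, bigram))  # → cat
```

Even though every individual keystroke here is ambiguous, the language model tips the decoder toward the plausible word, which is the same intuition behind recovering intended text from erratic typing motion.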
This surprising result led researchers to investigate why hand tracking was more effective than other touch-surface input methods, like tapping on a tablet. The team discovered that hand tracking is uniquely good at isolating individual fingers and their trajectories as they reach for keys — information that is missing from capacitive sensing on tablets and smartphones today.
Hand tracking on Oculus Quest
While the potential of hand tracking is massive, delivering hands that work in virtual space as they do in real life is equally daunting. In work presented at SIGGRAPH 2020, researchers described the numerous challenges of delivering hand tracking on the Oculus Quest.
The first issue is “working volume” — the physical space where your hands are tracked. Without sufficient working volume, your hands would simply disappear while still in the visual field of view, abruptly interrupting user interaction. The second issue is tracking jitter, which produces jerky and unnatural hand movements. Jitter is currently an under-explored area in hand tracking research, with most benchmark datasets today based on still frames, not fluid motion. Finally, there’s the hardware resource requirement. Past research has largely focused on high-end PCs with a powerful graphics processing unit. But the original Quest has about two orders of magnitude less compute power than a typical high-end PC.
FRL’s hand tracking solution maximizes working volume by leveraging all four cameras on Oculus Quest. Initially, the plan was to use just two of the cameras, but researchers noticed their hands would vanish even within the field of view of the display. Using all four cameras substantially increased tracking volume but introduced a fresh computer vision challenge: taking visual data from four different sources and stitching it together in real time. At any given time, your hands may be within the field of view of one or more of the Quest cameras. As your hand moves, it can leave one camera’s field of view and enter another’s. FRL researchers had to build a new framework that could track the hand as it moved between cameras and stitch the four views together.
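The camera-handoff idea can be illustrated with a small, hypothetical sketch — this is not FRL's actual pipeline, and the camera names and field-of-view angles are invented. It models each headset camera as a horizontal angular field of view and shows why overlapping coverage keeps a moving hand visible when no single camera could.

```python
# Illustrative camera FOVs in degrees (invented values, not Quest specs).
CAMERAS = {
    "top_left": (-100, 10), "top_right": (-10, 100),
    "bottom_left": (-130, -20), "bottom_right": (20, 130),
}

def visible_in(angle):
    """Cameras whose field of view contains the hand's bearing."""
    return {name for name, (lo, hi) in CAMERAS.items() if lo <= angle <= hi}

# A hand sweeping from far left to far right is always seen by at least
# one camera, though no single camera covers the whole sweep — so the
# tracker must hand the hand off between overlapping views.
sweep = list(range(-120, 121, 10))
always_covered = all(visible_in(a) for a in sweep)
single_camera_enough = any(
    all(lo <= a <= hi for a in sweep) for lo, hi in CAMERAS.values()
)
print(always_covered, single_camera_enough)  # → True False
```

In this toy setup, the stitching problem reduces to deciding, per frame, which cameras currently see the hand and merging their observations; the real multi-view system must additionally reconcile each camera's full image data in real time.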
The final technical hurdle was Quest’s mobile processor, which must share its limited compute with games and applications. The answer to the power problem came through building efficient neural network architectures. Over the years, the research team has made significant progress in tailoring neural network designs to specific mobile processors, such as the Snapdragon 835 and Hexagon DSP on the Quest. The team also leveraged the regularity of hand motion to predict where the hand would be next, which reduced neural network evaluations. Altogether, these optimizations mean that hand tracking on the original Quest uses only 7% of the battery under everyday use.
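The motion-prediction trick can be sketched in a few lines — again a hypothetical illustration rather than FRL's implementation. The idea: extrapolate the hand's next position from its recent motion, and fall back to the expensive detection network only when the extrapolation misses.

```python
# Hypothetical sketch: exploit the regularity of hand motion so the
# (expensive) detection network only runs when prediction fails.

def extrapolate(prev, curr):
    """Constant-velocity prediction of the next 2D hand position."""
    return (2 * curr[0] - prev[0], 2 * curr[1] - prev[1])

def track(observations, tolerance=5.0):
    """Return (detector runs, frames where the detector was skipped)."""
    detector_runs = 0
    skipped = 0
    history = []
    for obs in observations:
        if len(history) >= 2:
            pred = extrapolate(history[-2], history[-1])
            err = ((pred[0] - obs[0]) ** 2 + (pred[1] - obs[1]) ** 2) ** 0.5
            if err <= tolerance:
                skipped += 1        # prediction good enough; skip network
                history.append(pred)
                continue
        detector_runs += 1          # fall back to the full network
        history.append(obs)
    return detector_runs, skipped

# A hand moving smoothly to the right: after two warm-up frames, the
# predictor covers the remaining three frames.
frames = [(0, 0), (10, 0), (20, 0), (30, 0), (40, 0)]
print(track(frames))  # → (2, 3)
```

For smooth, regular motion the detector runs only occasionally, which is exactly where the power savings come from; erratic motion would push the error past the tolerance and trigger the full network again.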
To keep hand motion smooth and realistic, researchers also developed a hand keypoint estimation network. Traditional systems predict keypoints, like fingertips, from individual still images. But predicting the hand from a single still image leads to jittery and inconsistent results between cameras. Instead, researchers posed the problem differently, allowing the network to access previously predicted keypoints. With this design change, researchers demonstrated their network could successfully track partially visible hands at an image boundary. By conditioning the network on this new, temporal information, the team achieved a significant reduction in jitter without sacrificing accuracy, both of which are essential ingredients when building lifelike hand tracking.
Experience hand tracking today
We’re still in the early stages of building more natural ways to interact with our devices. But this year, we’ve seen developers bring incredible experiences to life through controller-free hand tracking on Quest. Waltz of the Wizard gives players the power to conjure spells and brew arcane ingredients with the snap of a finger. In the Emmy Award-winning interactive experience The Line, users can manipulate knobs, switches, and other objects to enhance the story.
Spatial allows you to join colleagues in virtual meeting rooms where you can use your hands to convey emotion more naturally, gesture, and even high-five teammates. In The Curious Tale of the Stolen Pets, players use their hands to interact with miniature worlds and solve mysteries. You can explore these experiences by heading over to the Oculus Blog. And to hear about how Fast Travel Games CTO Kristoffer Benjaminsson built intuitive hand tracking in Stolen Pets, check out the Developer Super Session talk from Facebook Connect. If you’re a developer yourself, you can learn how to build hand tracking into your project on the Developer Blog.
Hand tracking is just one method we’re exploring to help people do more with technology. And despite the progress we’ve made, we’re still at the beginning of a long road. Finding new and better ways of doing things demands further research and creative exploration. So while we’re thrilled with the milestones already achieved at FRL Research, we’re even more excited about what’s coming in the future.
Hand tracking research is a cross-functional effort involving multiple teams at Facebook. Key contributors include Shangchen Han, Beibei Liu, Randi Cabezas, Christopher Twigg, Peizhao Zhang, Jeff Petkau, Tsz-Ho Yu, Chun-Jung Tai, Muzaffer Ackbay, Zheng Wang, Asaf Nitzan, Gang Dong, Yuting Ye, Lingling Tao, Chengde Wan, Manish Kushwaha, Weiguang Si, Yue Yu, Mark Richardson, Matt Durasoff, Peter Vajda, Eldad Isaac, Ed Wei, Rajesh Shenoy, Dima Svetlov, and Robert Wang.