When it comes to the futuristic AR/VR and neural interface research that will underpin the coming metaverse, seeing is believing. Much of this research — led by Reality Labs Chief Scientist Michael Abrash — is still several years out from possibly making its way into consumers’ hands, and not everyone can make the trek to our laboratories in Redmond, Washington, to get hands-on with the tech. But the proof is in the products. Soon, people will be able to experience Meta Quest Pro for themselves and see the fruits of some of that research first-hand, from our hand tracking technology to pancake lenses — both of which began life as research projects. Following years of prototyping in the lab, our research and product teams worked as a single unit to get these technologies into people’s hands. And there’s much more on the horizon.
At Reality Labs, we’re inventing a new computing platform — one built around people, connections, and the relationships that matter. Today at Meta Connect, Abrash joined Mark Zuckerberg on the virtual stage to give an update on our research efforts that are laying the groundwork for this future. There’s a lot of ground to cover, so let’s dive right in.
As computing took hold over the past half century and gave rise to the personal computer, laptop, tablet, and smartphone, the graphical user interface or GUI largely defined people’s experience of the digital world, opening up entirely new possibilities for interacting with it. But as we look ahead to future augmented reality (AR) glasses and next-generation VR headsets, we see the need for a paradigm shift in human-computer interaction — one that will help these new devices function in any of the myriad situations you might encounter in the course of a day, while also letting you stay fully present and focused on the world and people around you.
Along with new inputs (we’re betting on wrist-based interaction using electromyography, or EMG), our vision for the future calls for personalized artificial intelligence (AI). That AI will work in concert with future devices, particularly AR glasses, to sense and reconstruct the world around you and to understand the context in which you’re using your device. With this understanding, the AI will be able to make suggestions and take action proactively to help you get things done, ideally so seamlessly that you may not even notice.
Combine personalized AI with EMG, which uses the neuromuscular signals through your wrist directly as input, and you have an intuitive, almost frictionless interface. This will be the first truly human-centered interface, in part because it won’t force people to learn a difficult new control scheme. Instead, your future devices will learn and adapt to you as you use them. By combining machine learning and neuroscience, this future interface will work for different people while accounting for differences in physiology, size, and more through a process known as “co-adaptive learning.”
In the first demo, you can see two people playing an arcade game using wrist-based EMG. While they’re both using the same gesture for their in-game controls, no two people are alike, so they perform that gesture in slightly different ways. Each time one of them performs the gesture, the algorithm adapts to interpret that person’s signals, so each person’s natural gesture is quickly recognized with high reliability. In other words, the system gets better at understanding them over time.
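To make the idea of per-person adaptation concrete, here is a minimal sketch. It is not the algorithm used in the demo. It assumes a simple nearest-prototype decoder, made-up two-dimensional "EMG features," and a game that knows which gesture the player intended, so each sample can be used as a labeled update. The class name `AdaptiveGestureDecoder` and all numbers are hypothetical, and the sketch covers only the system's half of co-adaptation, not how the person adapts in turn.

```python
import math


class AdaptiveGestureDecoder:
    """Nearest-prototype decoder that adapts to one user (illustrative only)."""

    def __init__(self, prototypes, learning_rate=0.2):
        # prototypes: gesture name -> feature vector (hypothetical EMG features)
        self.prototypes = {name: list(vec) for name, vec in prototypes.items()}
        self.learning_rate = learning_rate

    def predict(self, features):
        # Choose the gesture whose prototype is closest in Euclidean distance.
        return min(
            self.prototypes,
            key=lambda name: math.dist(self.prototypes[name], features),
        )

    def adapt(self, features, gesture):
        # Move that gesture's prototype a fraction of the way toward the sample.
        proto = self.prototypes[gesture]
        for i, x in enumerate(features):
            proto[i] += self.learning_rate * (x - proto[i])


# Generic starting prototypes shared by every new user (made-up numbers).
decoder = AdaptiveGestureDecoder({"pinch": [1.0, 0.0], "rest": [0.0, 0.0]})

# This particular user's pinch produces a weaker, skewed signal.
user_pinch = [0.4, 0.3]

before = decoder.predict(user_pinch)   # generic model misreads it as "rest"
for _ in range(10):                    # ten in-game repetitions of the gesture
    decoder.adapt(user_pinch, "pinch")
after = decoder.predict(user_pinch)    # adapted model reads it as "pinch"
```

With these toy numbers, the generic model initially mistakes this user's pinch for rest, and after a handful of repetitions it recognizes the same signal as a pinch. A real system would work with far richer signals and models, but the loop has the same shape: observe the person's own version of the gesture, then shift the decoder toward it.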
And the potential of co-adaptive learning extends beyond full gestures to microgestures. The algorithm learns in real time how to respond to the EMG signals the person is activating and sending to their hand with only the slightest of hand movements. The system recognizes the actions the person has already decided to perform by decoding those signals at their wrist and translating them into digital commands.
This lets them communicate their intended actions to the computer with almost no hand movement. Just as the GUI marked a sea change in how we interact with computers, EMG has the potential to completely transform the way we interact with the digital world — letting us not just do more, but do it in the way we want to do it.
At Connect, Zuckerberg showed a working demo that lets you control an AR/VR device with motor neuron signals.
With a few subtle movements, you can check your messages, take a photo, and more. The more subtle the gestures, the more seamless and effortless the experience becomes — letting you stay in the moment without having to dig out your phone.
This is just the beginning. True AR glasses and their future interface promise to unlock even more useful interactions, like being able to right-click on a physical object or location to pull up detailed information about it, better control our devices without having to look away from the world around us, and get proactive assistance from a personalized, AI-powered digital assistant. Together, they’ll deliver a more natural, human-centric approach to computing — and likely open up new possibilities we haven’t even thought of yet.
These future interfaces will be truly life-changing. One particularly inspiring example is Carnegie Mellon University’s NavCog project to help people with visual impairments.
Two years ago at Connect, we unveiled Project Aria, a new research project to help us build the first generation of wearable AR devices. As part of that announcement, we shared our pilot program with Carnegie Mellon University (CMU) and its Cognitive Assistance Laboratory to build 3D maps of museums and airports. Those maps will have multiple applications, including helping people with visual impairments better navigate their surroundings indoors, where GPS signals often don’t reach.
Today, we shared an update on that work.
Developed by CMU, NavCog is a smartphone app designed to help people with visual impairments better navigate indoor environments. CMU has been working on the open source project since 2014, with many collaborators around the world. CMU researchers used Project Aria to build a 3D map of the Pittsburgh International Airport. They could then use that map to train AI localization models running on a mobile phone. With NavCog, people can figure out where they are in the airport without having to rely as heavily on external Bluetooth beacons placed around the airport. This is a significant development because it points to the ability to eventually deploy NavCog at scale.
The ability to build and manipulate 3D objects will play a key role in the metaverse. For example, imagine displaying a treasured family heirloom in your Meta Horizon Home in VR, or being able to use AR to see what a new painting looks like in your living room before you buy the physical object. It’s hard to build 3D objects from scratch, and using physical objects as templates could be easier and faster. But there’s no seamless way to do that today, so we’re researching two different technologies to help solve that problem.
The first one uses machine learning-based Neural Radiance Fields (NeRFs) to reconstruct the appearance of a 3D object from multiple 2D images taken at different angles. This method can reproduce even the finest details of an object.
There’s some processing time involved, but the results are impressive. The texture comes through really clearly, and you can pick up fine details like individual strands of fur.
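For readers curious about how a NeRF turns a learned 3D representation back into a 2D image, the sketch below shows the volume-rendering step from the original NeRF formulation: samples along a camera ray are blended according to how much light each one blocks. It is an illustration under assumptions, not a description of our research pipeline. In a real NeRF, the densities and colors come from a trained neural network; here they are hand-picked, and the function name `composite_ray` is hypothetical.

```python
import math


def composite_ray(densities, colors, deltas):
    """Blend samples along one camera ray into a single RGB color.

    densities: volume density at each sample (higher means more opaque)
    colors:    RGB triple at each sample
    deltas:    distance between consecutive samples
    """
    transmittance = 1.0  # fraction of light not yet blocked along the ray
    pixel = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha          # contribution to the pixel
        for channel in range(3):
            pixel[channel] += weight * color[channel]
        transmittance *= 1.0 - alpha            # light left for later samples
    return pixel


# Toy ray: empty space, then a dense red surface, then blue hidden behind it.
densities = [0.0, 1000.0, 1000.0]
colors = [(0.0, 1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
deltas = [0.1, 0.1, 0.1]
pixel = composite_ray(densities, colors, deltas)
```

In this toy example the pixel comes out essentially pure red: the empty first sample contributes nothing, the dense red sample absorbs nearly all the light, and the blue sample behind it is occluded. Training a NeRF amounts to adjusting the network so that pixels rendered this way match the input photos from every angle, which is part of why processing takes time.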
The second technology captures geometry and appearance directly. We use a different technique called inverse rendering to scan an object and bring its digital twin into VR or AR. It responds dynamically to lighting in a VR environment. And when you toss it in the air, throw it against a wall, or bounce it off the ground, it responds in the same way that the physical object would.
Neither of these technologies operates in real time yet, and they both have their limitations. But they’re important steps on the path toward the goal of helping people easily make physical objects a part of their virtual world.
We know that avatars in the metaverse will come in a variety of styles. But what if you could have a truly photorealistic representation of yourself and use that to interact with other people? It would be the most powerful remote connection technology that’s ever existed, creating a genuine sense of social presence — the feeling that you’re right there with another person or people, despite the physical distance that might separate you.
Last year, we shared some early progress on full-body Codec Avatars. What sets Codec Avatars apart from other high-quality avatars you might have seen is that they can be automatically generated and driven in real time to match how real people look and move. This means they’re not limited to pre-set movements: you can control them live.
We’ve continued our work to develop this technology, and today we revealed that it’s possible to change Codec Avatars’ virtual outfits.
That means that, in the future, you’ll be able to seamlessly jump from an action-packed gaming session to an important business meeting and then back to your home environment to hang out with friends or family in the metaverse.
We also showed our latest progress on Codec Avatars 2.0 — our research project to make the facial expressions of our Codec Avatars truer to our physical forms — alongside Zuckerberg’s own 2.0 Codec Avatar.
We’ve made Codec Avatars 2.0 far more expressive. Beyond simple things like looking left, right, up, and down, we’ve incorporated non-verbal cues that people rely on to communicate with each other and understand tone — things like raising an eyebrow, squinting, widening your eyes, and scrunching your nose. By better capturing those subtle expressions, Codec Avatars 2.0 are now far more lifelike and natural. You can even control the lighting on your Codec Avatar 2.0 to add another dimension of realism.
Codec Avatars are impressive, but they take a long time to generate, so we’re working on something much faster for people to use in the future: Instant Codec Avatars.
Instant Codec Avatars are much faster and easier to make than Codec Avatars. All you need for the scan is your phone and some decent lighting. You scan your face from multiple angles with a neutral expression for about 30 seconds, then you spend an additional 90 seconds making a variety of expressions as you continue the scan, for roughly two minutes of scanning in total. It currently takes a few hours to generate an Instant Codec Avatar following the scanning process, and the team is working to reduce the overall processing time.
We’ve made a lot of progress with Instant Codec Avatars. While they aren’t on par with the quality and realism of Codec Avatars 2.0, Instant Codec Avatars still feel realistic and expressive. And because all you need for the scan is your phone, they’re far more accessible.
This work is still at the research stage and may or may not ever make its way into an actual product. Still, it’s a glimpse at where the technology is headed over the next five to 10 years. And our research over the past several years has laid a compelling foundation on which we’ll continue to build toward the future.
Reality Labs brings together a world-class team of researchers, developers, and engineers to build the future of connection within virtual and augmented reality.