At Facebook Reality Labs, we are building a future where real and virtual worlds can mix freely, making our daily lives easier, more productive, and better connected. Power consumption is one of the challenges in getting to that future. To get to augmented reality and virtual reality (AR/VR) devices that can be worn comfortably for as long as you want, including all day, VR headsets must become more power-efficient, and AR glasses must consume far less power still. As part of building AR/VR systems that reach that next level, we’re developing graphics systems that dramatically decrease power consumption without compromising image quality.
DeepFovea is one of several neural network-based approaches that we’ve developed to meet this challenge. DeepFovea is a rendering system that applies the recently invented AI concept of generative adversarial networks (GANs) to mimic the way our peripheral vision perceives the world in everyday life, leveraging that perceptually matched architecture to deliver unprecedented graphics efficiency. DeepFovea’s neural rendering goes well beyond the traditional foveated rendering systems used in Oculus products today, enabling the generation of images that are perceptually indistinguishable from full-resolution images yet require fewer than 10 percent as many pixels to be rendered. Existing foveated rendering methods still render about half as many pixels as the full-resolution image, so DeepFovea’s order-of-magnitude reduction relative to full resolution represents a new milestone in perceptual rendering.
We first presented the DeepFovea method at SIGGRAPH Asia in November 2019. Today, we are publishing the complete demo to our DeepFovea repository to help the graphics research community deepen its exploration into state-of-the-art perceptual rendering.
The key to DeepFovea is the physiology of the human eye. When the eye looks directly at an object, the photons from that object land in what is known as the foveal region of the retina, or fovea for short. The fovea is the only part of the retina with high resolution, and it is a very small fraction of the overall retina. The highest-resolution region is only 3 degrees across, out of the more than 150-degree field of view of the human eye, and resolution drops by an order of magnitude within 10 degrees from the center of the fovea. We feel like we have high-resolution vision for a much wider field of view, but that is because our brain maintains a model of our surroundings and fills in missing details, while at the same time moving the fovea to any object of interest so quickly we’re not aware that we couldn’t see that object clearly a split second earlier. In truth, we have a tiny area of high-resolution vision — only the size of your thumbnail held at arm’s length — with only very blurry perception of everything that surrounds it.
That’s not to say that peripheral vision isn’t important. It is important for balance, motion detection, and ambient awareness, and cues the brain where to look next. But its ability to distinguish detail is sharply limited.
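The falloff described above can be sketched numerically. The following is a minimal illustration, not a physiological model and not part of DeepFovea: it assumes a flat plateau over the 3-degree foveal region and a simple hyperbolic decline fitted so that resolution is ten times lower at 10 degrees from the center. The function name `relative_resolution` and the constants are hypothetical, chosen only to match the figures quoted in the text.

```python
FOVEA_RADIUS_DEG = 1.5   # half of the roughly 3-degree high-resolution region
TENFOLD_DROP_DEG = 10.0  # eccentricity where resolution is about 10x lower

# Assumed falloff rate, fitted so that relative_resolution(10.0) is 0.1.
FALLOFF_PER_DEG = 9.0 / (TENFOLD_DROP_DEG - FOVEA_RADIUS_DEG)


def relative_resolution(eccentricity_deg: float) -> float:
    """Return resolution relative to the fovea (1.0 means full resolution)."""
    if eccentricity_deg < 0:
        raise ValueError("eccentricity must be non-negative")
    if eccentricity_deg <= FOVEA_RADIUS_DEG:
        return 1.0  # inside the assumed foveal plateau
    excess = eccentricity_deg - FOVEA_RADIUS_DEG
    return 1.0 / (1.0 + FALLOFF_PER_DEG * excess)


# Print the modeled resolution at a few angles away from the gaze direction.
for ecc in (0, 1.5, 5, 10, 30):
    print(f"{ecc:>4} deg -> {relative_resolution(ecc):.3f}")
```

Under these assumptions the curve stays at full resolution inside the fovea and keeps shrinking toward the edge of the visual field, which is the shape a foveated renderer tries to exploit.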
DeepFovea works to generate images that match the resolution of the retina, using the minimum necessary amount of data to do so. Given an image that is rendered sparsely, with a variable resolution that matches the resolution of the retina at each point based on where the fovea is pointed at any given moment, DeepFovea infers the missing data. Crucially, it does so in a way that, given the resolution and image-processing characteristics of the retina, produces results that are perceptually indistinguishable from a full-resolution image. That doesn’t mean that the results are identical — in fact, they’re generally not even close when you look at them with the fovea, as you can see in the example below — but the lower-resolution processing of the peripheral vision can’t perceive the difference.
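To make the idea of a sparsely rendered, gaze-dependent input concrete, here is a minimal sketch of one way such a sampling pattern could be generated. It is an assumption for illustration only: the text does not specify how DeepFovea chooses which pixels to render. The density curve, the random per-pixel selection, the linear pixel-to-degree mapping, and the names `sampling_density` and `sparse_mask` are all hypothetical.

```python
import math
import random


def sampling_density(ecc_deg: float) -> float:
    """Assumed fraction of pixels to render at a given eccentricity."""
    fovea_radius = 1.5          # degrees, half of the ~3-degree fovea
    falloff = 9.0 / 8.5         # fitted so density is 0.1 at 10 degrees
    if ecc_deg <= fovea_radius:
        return 1.0
    return 1.0 / (1.0 + falloff * (ecc_deg - fovea_radius))


def sparse_mask(width, height, gaze_x, gaze_y, pixels_per_degree, seed=0):
    """Return a 2D list of booleans: True where a pixel would be rendered."""
    rng = random.Random(seed)   # seeded so the pattern is reproducible
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            # Small-angle approximation: pixel distance maps linearly to degrees.
            ecc = math.hypot(x - gaze_x, y - gaze_y) / pixels_per_degree
            row.append(rng.random() < sampling_density(ecc))
        mask.append(row)
    return mask


# A 120x80 grid at 2 pixels per degree covers a 60x40-degree field of view.
WIDTH, HEIGHT = 120, 80
mask = sparse_mask(WIDTH, HEIGHT, gaze_x=60, gaze_y=40, pixels_per_degree=2.0)
fraction_kept = sum(sum(row) for row in mask) / (WIDTH * HEIGHT)
print(f"fraction of pixels rendered: {fraction_kept:.3f}")
```

Every pixel near the gaze point is kept, while the periphery is sampled only occasionally. In a real system the mask would be recomputed whenever eye tracking reports a new gaze position, and the reconstruction network would fill in the pixels that were skipped.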
DeepFovea infers the missing peripheral information by using a generative adversarial network (GAN). We train DeepFovea’s neural network by feeding it millions of real videos whose peripheral quality has been artificially degraded. The degraded videos simulate the sparse peripheral input the network will see at run time, and the GAN-based design helps the network learn how to fill in the missing details based on statistics from all the videos it has seen.
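The data-preparation step can be illustrated with a small sketch. The text does not describe the exact degradation procedure, so this assumes the simplest possible version: pixels outside a keep-mask are zeroed out, and the untouched frame serves as the reconstruction target. The helper `make_training_pair` and the tiny grayscale frame are hypothetical, and the GAN itself (generator and discriminator) is deliberately not shown.

```python
from typing import List, Tuple

Frame = List[List[float]]   # grayscale frame, one float per pixel
Mask = List[List[bool]]     # True where a pixel is kept


def make_training_pair(frame: Frame, keep: Mask) -> Tuple[Frame, Frame]:
    """Return (degraded_input, target) for one frame under an assumed scheme."""
    same_rows = len(frame) == len(keep)
    if not same_rows or any(len(r) != len(m) for r, m in zip(frame, keep)):
        raise ValueError("frame and mask must have the same shape")
    # Dropped pixels are set to 0.0; kept pixels pass through unchanged.
    degraded = [
        [pixel if kept else 0.0 for pixel, kept in zip(row, mask_row)]
        for row, mask_row in zip(frame, keep)
    ]
    target = [row[:] for row in frame]  # copy, so the original is not aliased
    return degraded, target


frame = [[0.2, 0.4, 0.6],
         [0.1, 0.9, 0.3],
         [0.5, 0.7, 0.8]]
keep = [[False, True, False],
        [True, True, True],
        [False, True, False]]

degraded, target = make_training_pair(frame, keep)
print(degraded)
```

A network trained on many such pairs sees only the degraded input and is scored on how convincing its reconstruction of the target is, which is where the adversarial part of the design comes in.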
As a result, a renderer can render an order of magnitude fewer pixels overall, and fewer still in places (pixel density can decrease by as much as 99 percent along the periphery of a 60x40-degree field of view), thereby saving a great deal of power. Yet thanks to DeepFovea, together with eye tracking, the viewer will perceive exactly the same scene at exactly the same quality.
DeepFovea also ensures that flicker, aliasing, and other video artifacts in the periphery remain undetected by the human eye.
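As a rough worked example of the pixel savings described above, the arithmetic below compares per-frame pixel budgets. The 1920x1080 frame size is a hypothetical choice, not a figure from the text; the one-half and one-tenth ratios are the ones quoted for conventional foveated rendering and for DeepFovea, with the latter treated as an upper bound.

```python
WIDTH, HEIGHT = 1920, 1080  # hypothetical frame resolution, for illustration

full_resolution = WIDTH * HEIGHT
conventional_foveated = full_resolution // 2   # "about half as many pixels"
deepfovea_upper_bound = full_resolution // 10  # "fewer than 10 percent"

print(f"full resolution:       {full_resolution:>9,} pixels")
print(f"conventional foveated: {conventional_foveated:>9,} pixels")
print(f"DeepFovea (at most):   {deepfovea_upper_bound:>9,} pixels")
```

Under these assumptions, the sparse input is at least ten times smaller than a full frame and at least five times smaller than what a conventional foveated renderer produces.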
Our ultimate goal is to make real-time foveated rendering run on lightweight, highly power-efficient AR/VR devices that can be worn all day. DeepFovea marks an important step toward that goal: it sets a new standard for the efficiency of perceptual rendering by demonstrating no perceived quality loss while rendering fewer than 10 percent as many pixels as conventional renderers. This approach is also hardware agnostic, making DeepFovea compatible with a wide range of AR/VR research systems.
While DeepFovea unlocks an important method for efficient rendering in AR and VR, this is just the beginning of the exploration of ultra-low-power perceptual rendering. By releasing the DeepFovea demo in addition to our research paper, we hope to provide a useful framework for researchers in graphics and vision science who are interested in contributing to the advancement of perceptual and neural rendering technologies.