UPDATE (March 30, 2020): Today in Nature Neuroscience, UCSF researchers Joseph Makin, David Moses, and Edward Chang published the results of a recent study that set a new benchmark for decoding speech directly from brain activity.
While other studies have shown that spoken words and phrases can be decoded from signals recorded from the surface of the brain, decoding quality has been limited, with error rates north of 60% for 100-word vocabularies. UCSF’s results redefine what’s possible for brain-to-text decoding, showing average error rates as low as 3% when tested with vocabularies of up to 300 words.
This key breakthrough is enabled by UCSF’s use of modern machine learning techniques: recurrent neural networks inspired by state-of-the-art speech recognition and language translation algorithms. It’s a significant step toward UCSF’s long-term goal of developing a prosthesis to restore speech communication for people who have lost the ability to speak — a mission we support.
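For readers curious about what such a decoder looks like in broad strokes, the sketch below shows the kind of encoder-decoder recurrent network used in machine translation, where one RNN summarizes a window of recorded neural features and a second RNN emits the corresponding word sequence. It is a rough illustration only; the class name, the assumption of per-electrode band-power features, and the layer sizes are ours, not details of the published UCSF model.

```python
# A minimal sketch of an encoder-decoder RNN for brain-to-text decoding.
# This is NOT the UCSF model: the class name, the feature choice (band power
# per electrode), and the layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class SpeechDecoderSketch(nn.Module):
    def __init__(self, n_channels: int, hidden: int, vocab_size: int):
        super().__init__()
        self.encoder = nn.GRU(n_channels, hidden, batch_first=True)  # reads neural features
        self.embed = nn.Embedding(vocab_size, hidden)                # embeds previously emitted words
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)      # emits the word sequence
        self.readout = nn.Linear(hidden, vocab_size)

    def forward(self, neural: torch.Tensor, words: torch.Tensor) -> torch.Tensor:
        # neural: (batch, time, n_channels) band-power features for one spoken sentence
        # words:  (batch, seq) word ids used for teacher forcing during training
        _, state = self.encoder(neural)                  # summarize the neural recording
        dec_out, _ = self.decoder(self.embed(words), state)
        return self.readout(dec_out)                     # logits over the word vocabulary

# Illustrative shapes: 250 electrodes, 200 time steps, 300-word vocabulary
model = SpeechDecoderSketch(n_channels=250, hidden=256, vocab_size=300)
logits = model(torch.randn(8, 200, 250), torch.randint(0, 300, (8, 12)))
```

In this framing, the recorded brain activity plays the role of the source language and the spoken sentence plays the role of the target, which is why advances in machine translation carry over so directly.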
This work illustrates how applying new machine learning techniques can accelerate brain-computer interface (BCI) development for many different applications. At Facebook Reality Labs, we’re focused on exploring how non-invasive BCI can redefine the AR/VR experience. New research helps illuminate the path forward in our mission to develop a non-invasive silent speech interface for the next computing platform. We are proud to sponsor and support academic research groups like UCSF.
To hear more about the latest work emerging from our own BCI group at Facebook Reality Labs, check out FRL Research Director Mark Chevillet’s recent talk from the ApplySci Wearable Tech + Digital Health + Neurotech conference.
TL;DR: At F8 2017, we announced our brain-computer interface (BCI) program, outlining our goal to build a non-invasive, wearable device that lets people type by simply imagining themselves talking. As one part of that work, we’ve been supporting a team of researchers at the University of California, San Francisco (UCSF) who, in a series of studies, are working to help patients with neurological damage speak again by detecting intended speech from brain activity in real time. UCSF works directly with all research participants, and all data remains onsite at UCSF. Today, the UCSF team is sharing some of its findings in a Nature Communications article, which provides insight into some of the work they’ve done so far — and how far we have left to go to achieve fully non-invasive BCI as a potential input solution for AR glasses.
This is the third entry in our “Inside Facebook Reality Labs” blog series. In the first post, we go behind the scenes at FRL Pittsburgh, where Yaser Sheikh and his team are building state-of-the-art lifelike avatars. The second entry explored haptic gloves for virtual reality, and we’ll have more to share throughout the year. In the meantime, check out FRL Chief Scientist Michael Abrash’s series introduction to learn more.
For Emily Mugler, it was a moment that changed everything.
It was late April 2017, and Regina Dugan had just taken the stage at F8, where she asked a provocative question: “What if you could type directly from your brain?”
She went on to detail exciting research taking place at Facebook with the goal of letting people type simply by imagining what they wanted to say — all without ever saying a word or typing a single keystroke.
“I sent in my CV that night,” Mugler recalls. At the time, she was at Northwestern, putting her PhD in bioengineering to work by attempting to decode speech based on data from awake brain surgeries. In those procedures, the patient remains conscious while speech and brain signals are recorded, often to help surgeons map which tissue can be safely removed without damaging cognitive function.
Years before, Mugler had also worked with patients who could no longer speak as a result of ALS, or Lou Gehrig’s disease. At the time, she had used electroencephalography (EEG) — a technique that measures the brain’s electrical activity through a swim-cap-like array of small electrodes worn on the head — to make it easier for them to communicate. Unfortunately, the system was difficult for people to use and very slow going. “It sometimes took 70 minutes for a patient to type a single sentence,” she says. This work motivated her throughout her doctoral studies and beyond as she sought out better, more efficient ways to help improve people’s lives.
She started looking at speech as a more effective communication method. And while the use of electrocorticography (ECoG), which records brain activity from electrodes placed on the surface of the brain, for this kind of decoding had begun in earnest in 2004, there were only a handful of labs around the world tackling the problem at the time. Moreover, because ECoG requires surgically placed electrodes on the cortex, it continued to fall short of a fully non-invasive solution — one that would work outside of the body.
Watching the F8 keynote, Mugler was excited by the work being done at Facebook to explore BCI. Here, it seemed, was at least the potential for answers.
Imagine a world where all the knowledge, fun, and utility of today’s smartphones were instantly accessible and completely hands-free. Where you could spend quality time with the people who matter most in your life, whenever you want, no matter where in the world you happen to be. And where you could connect with others in a meaningful way, regardless of external distractions, geographic constraints, and even physical disabilities and limitations.
That’s a future we believe in, and one we think will be fully realized in the ultimate form factor of a pair of stylish, augmented reality (AR) glasses. As Chief Scientist Michael Abrash and the team at Facebook Reality Labs (FRL) see it, we’re standing on the edge of the next great wave in human-oriented computing, one in which the combined technologies of AR and VR converge and revolutionize how we interact with the world around us — as well as the ways in which we work, play, and connect.
That future is still a long way off, but the early-stage research taking place today is the first step toward delivering on its promise. And one of the most interesting and challenging problems that needs to be solved is the question of input. In other words, once we put on a pair of AR glasses, how will we actually use them?
“It’s going to be something completely new, as clean a break from anything that’s come before as the mouse/GUI-based interface was from punch cards, printouts, and teletype machines,” writes Abrash.
And that’s where BCI comes in.
“I grew up reading about BCI in William Gibson and Neal Stephenson novels — and I’ve spent most of my adult life trying to figure out if BCI is going to be a real thing... though I’m still surprised that turned out to be a real job,” says Mark Chevillet, a Research Director at FRL. Controlling machines with our minds isn’t science fiction anymore. Real people have used BCI technologies to feed themselves, hold the hand of a loved one, and even fly a jet simulator. But that’s only been possible to date using implanted electrodes. “There’s no other way to do it today, but our team’s long-term goal is to make these things possible in a non-invasive, wearable device.”
Before joining FRL to build and lead a team dedicated to BCI research, Chevillet was at the Johns Hopkins University Applied Physics Laboratory (APL). “The thing that drew me to APL was their work combining cutting-edge robotics and surgically implanted electrodes to develop a prosthetic arm for wounded combat veterans that could be controlled just like their natural arm,” he says.
Inspired, Chevillet helped APL build up a broader program of related research projects in applied neuroscience to see where the technology could go in the future — and how it could potentially help even more people in a non-invasive way. “I led an internal research program to see if we could use some of the ideas from motor prosthetics and some of my other work in cognitive neuroscience to make a communications prosthetic for people who can’t speak,” he explains. “We were also trying to figure out if we could use a non-invasive approach rather than implanted electrodes — so the technology could be used by many more people.”
After being recruited by Facebook to build a team focused on BCI, Chevillet was encouraged to propose and pursue an ambitious, long-term research project that pushed the boundaries of what was possible and redefined the state of the art. And that’s when it hit him: BCI for communication using a wearable device wasn’t just relevant for people who couldn’t speak — it might also be a very powerful way for people to interact with their digital devices.
While voice is gaining traction as an input mechanism for smart home devices and phones, it’s not always practical. What if you’re in a crowded room, walking down a noisy city street, or in a hushed art gallery? As Chevillet realized, “Most people have used the voice assistant on their phone, but they hardly ever use it in front of other people.”
But what if you could have the hands-free convenience and speed of voice with the discreetness of typing?
With that vision in mind, Chevillet set out to build an interdisciplinary team that could figure out whether his dream of a non-invasive, wearable BCI device might become a reality.
“We clearly needed somebody who really understood speech production and the neuroscience behind it, and we lucked out and found Emily — or she found us,” he says. As the team grew, the interdisciplinary nature of the problem attracted a diverse array of top-notch researchers, with backgrounds ranging from biomedical imaging to machine learning and quantum computing. But there was one problem the FRL team was not equipped to solve — before they could figure out exactly what type of wearable device to build, they needed to know whether a silent speech interface was even possible and, if so, which neural signals were actually needed to make it work.
As it stands today, that question can only be answered using implanted electrodes, which is why Chevillet reached out to his long-time colleague Edward Chang — a world-renowned neurosurgeon at the University of California, San Francisco (UCSF), where he also runs a leading brain mapping and speech neuroscience research team dedicated to developing new treatments for patients with neurological disorders.
“I went to see Eddie in his office and just sorta laid out the vision,” Chevillet recalls. “I explained that I wanted to know if speech BCI could be possible using a wearable device, and how supporting his team’s research involving implanted electrodes might move our own non-invasive research forward.”
It turned out that Chang had long been planning to develop a communication device for patients who could no longer speak as a result of conditions like brainstem stroke, spinal cord injury, and neurodegenerative disease. But he knew that pulling off such an ambitious goal in the near term would require resources that weren’t readily available. The opportunity seemed like an exciting one.
After discussing the importance of UCSF’s research aimed at improving the lives of people suffering from paralysis and other forms of speech impairment, as well as Facebook’s interest in the long-term potential of BCI to change the way we interact with technology more broadly, the two decided to team up with the shared goal of demonstrating whether it might really be possible to decode speech from brain activity in real time.
Today in Nature Communications, Chang and David Moses, a postdoctoral scholar in Chang’s lab at UCSF, published the results of a study demonstrating that brain activity recorded while people speak could be used to almost instantly decode what they were saying into text on a computer screen. While previous decoding work has been done offline, the key contribution in this paper is that the UCSF team was able to decode a small set of full, spoken words and phrases from brain activity in real time — a first in the field of BCI research. The researchers emphasize that their algorithm is so far only capable of recognizing a small set of words and phrases, but ongoing work aims to translate much larger vocabularies with dramatically lower error rates.
The past decade has seen tremendous strides in neuroscience — we know a lot more about how the brain understands and produces speech. At the same time, new AI research has improved our ability to translate speech to text. Taken together, these technologies could one day help people communicate by imagining what they want to say — a possibility that could dramatically improve the lives of people living with paralysis.
The work done for the Nature Communications paper involved volunteer research participants with normal speech who were already undergoing brain surgery to treat epilepsy. It’s part of a larger research program at UCSF that has been supported in part by FRL in a project we call Project Steno. The final phase of Project Steno will involve a year-long study to see whether it’s possible to use brain activity to restore a single research participant’s ability to communicate in the face of disability. In addition to providing funding, a small team of Facebook researchers is working directly with Chang and his lab to provide input and engineering support. UCSF oversees the research program and works directly with research volunteers. Facebook researchers have limited access to de-identified data, which remain onsite at UCSF and under its control at all times.
Ultimately, the researchers hope to reach a real-time decoding speed of 100 words per minute with a 1,000-word vocabulary and word error rate of less than 17%. And by demonstrating a proof of concept using implanted electrodes as part of their effort to help patients with speech loss, we hope UCSF’s work will inform our development of the decoding algorithms and technical specifications needed for a fully non-invasive, wearable device.
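For context, word error rate is simply the number of word substitutions, insertions, and deletions needed to turn a decoded sentence into the reference sentence, divided by the length of the reference. The short sketch below is an illustration of that calculation only, not project code.

```python
# Illustrative only (not project code): word error rate as the edit distance
# between the decoded and reference word sequences, divided by reference length.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits turning the first i reference words into the first j decoded words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                     # delete every remaining reference word
    for j in range(len(hyp) + 1):
        dp[0][j] = j                     # insert every remaining decoded word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, substitution)
    return dp[-1][-1] / len(ref)

print(word_error_rate("how are you feeling today", "how are you today"))  # 0.2, i.e., 20% WER
```

At the target above, fewer than roughly one word in six would be decoded incorrectly.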
We’re a long way away from being able to get the same results that we’ve seen at UCSF in a non-invasive way. To get there, FRL is continuing to explore non-invasive BCI methods with other partners, including the Mallinckrodt Institute of Radiology at Washington University School of Medicine and APL at Johns Hopkins.
And we’re starting with a system using near-infrared light.
Like other cells in your body, neurons consume oxygen when they’re active. So if we can detect shifts in oxygen levels within the brain, we can indirectly measure brain activity. Think of a pulse oximeter — the clip-like sensor with a glowing red light you’ve probably had attached to your index finger at the doctor’s office. Just as a pulse oximeter measures the oxygen saturation of your blood through your finger, we can use near-infrared light to measure blood oxygenation in the brain from outside the body in a safe, non-invasive way. This is similar to the signals measured today in functional magnetic resonance imaging (fMRI) — but with a portable, wearable device made from consumer-grade parts.
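For the technically curious, the underlying calculation is the modified Beer-Lambert law used in near-infrared spectroscopy: changes in how strongly tissue attenuates light at two wavelengths can be converted into changes in oxygenated and deoxygenated hemoglobin. The sketch below illustrates that conversion with placeholder numbers; it is not based on any FRL hardware.

```python
# A rough sketch of the modified Beer-Lambert law behind near-infrared
# measurements of blood oxygenation. All numbers below (extinction
# coefficients, optode spacing, pathlength factor) are illustrative
# placeholders, not parameters of any real device.
import numpy as np

# Rows: wavelengths (~760 nm, ~850 nm); columns: [HbO2, HbR] extinction coefficients
E = np.array([[1486.0, 3843.0],
              [2526.0, 1798.0]])
SOURCE_DETECTOR_CM = 3.0    # distance between light source and detector on the scalp
PATHLENGTH_FACTOR = 6.0     # accounts for light scattering through tissue

def hemoglobin_changes(d_od_760: float, d_od_850: float) -> np.ndarray:
    """Convert changes in optical density at two wavelengths into changes in
    oxygenated (HbO2) and deoxygenated (HbR) hemoglobin concentration."""
    path = SOURCE_DETECTOR_CM * PATHLENGTH_FACTOR
    d_od = np.array([d_od_760, d_od_850])
    # delta_OD = E @ delta_concentration * path  ->  solve for delta_concentration
    return np.linalg.solve(E * path, d_od)

d_hbo2, d_hbr = hemoglobin_changes(0.01, 0.02)  # rising HbO2 roughly tracks local brain activity
```

In a real headset, many such source-detector pairs would be arranged across the head to map these oxygenation changes over different brain regions.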
We don’t expect this system to solve the problem of input for AR anytime soon. It’s currently bulky, slow, and unreliable. But the potential is significant, so we believe it’s worthwhile to keep improving this state-of-the-art technology over time. And while measuring oxygenation may never allow us to decode imagined sentences, being able to recognize even a handful of imagined commands, like “home,” “select,” and “delete,” would provide entirely new ways of interacting with today's VR systems — and tomorrow's AR glasses.
We’re also exploring ways to move away from measuring blood oxygenation as the primary means of detecting brain activity and toward measuring movement of the blood vessels and even the neurons themselves. Thanks to the commercialization of optical technologies for smartphones and LiDAR, we think we can create small, convenient BCI devices that will let us measure neural signals closer to those we currently record with implanted electrodes — and maybe even decode silent speech one day.
It could take a decade, but we think we can close the gap.
Technology is not inevitable, and it is never neutral — it’s always situated within a specific social and historical context. There have been dramatic advancements in neurotechnologies over the last five years, driven in part by considerable international investment across the US, Europe, Japan, China, and Australia. As a result, society’s understanding of what’s possible — and what it could mean for people — will need some time to catch up.
While BCI as a compelling input mechanism for AR is clearly still a long way off, it’s never too early to start thinking through the important questions that will need to be answered before such a potentially powerful technology can responsibly make its way into commercial products. For example, how can we ensure the devices are safe and secure? Will they work for everyone, regardless of skin tone? And how do we help people manage their privacy and data in the way they want?
“We’ve already learned a lot,” says Chevillet. “For example, early in the program, we asked our collaborators to share some de-identified electrode recordings from epilepsy patients with us so we could validate how their software worked. While this is very common in the research community and is now required by some journals, as an added layer of protection, we no longer have electrode data delivered to us at all.”
“We know the technology better than anyone else, so we should start talking about this now with people in the community,” Mugler adds. “Just because we aren’t all trained bioethicists, that doesn’t mean we can’t or shouldn’t be part of the conversation. In fact, it’s our responsibility to make sure everyone is made properly aware of the scientific advances, so we can all have an informed discussion about the future of this technology.”
That’s why we support and encourage our partner institutions to publish their research in peer-reviewed journals — and why we’re telling this story today.
“We can’t anticipate or solve all of the ethical issues associated with this technology on our own,” Chevillet says. “What we can do is recognize when the technology has advanced beyond what people know is possible, and make sure that information is delivered back to the community. Neuroethical design is one of our program’s key pillars — we want to be transparent about what we’re working on so that people can tell us their concerns about this technology.”
The promise of AR lies in its ability to seamlessly connect people to the world that surrounds them — and to each other. Rather than looking down at a phone screen or breaking out a laptop, we can maintain eye contact and retrieve useful information and context without ever missing a beat. It’s a tantalizing vision, but one that will require an enterprising spirit, hefty amounts of determination, and an open mind.
The question of input for all-day wearable AR glasses remains to be solved, and BCI represents a compelling path towards a solution. A decade from now, the ability to type directly from our brains may be accepted as a given. Not long ago, it sounded like science fiction. Now, it feels within plausible reach. And our responsibility to ensure that these technologies work for everyone begins today.