Scientists have long been experimenting to develop a brain-computer interface that could enable people with a spinal cord injury, locked-in syndrome, ALS, or another paralyzing condition to talk again. Current assistive technologies for people whose condition leaves them unable to speak “are not very natural and intuitive,” said Stephanie Martin, who is part of a European consortium on decoding speech from brain activity. Patients gaze at a screen displaying letters, scalp electrodes sense brain waves that encode eye movement and position, and the chosen letters spell words that a speech synthesizer says aloud. The late cosmologist Stephen Hawking, who had ALS, used a system like this. But scientists think they can do better by “directly exploiting neural correlates of speech,” Martin said.
More and more experts therefore think a system that decodes whether a person is silently saying “yes,” “no,” “hungry,” “pain,” or “water” is now within reach, thanks to parallel advances in neuroscience, engineering, and machine learning. “We think we’re getting enough of an understanding of the brain signals that encode silent speech that we could soon make something practical,” said Brian Pasley of the University of California, Berkeley. “Even something modest could be meaningful to patients. I’m convinced it’s possible.” Further in the future, Facebook and others envision similar technology facilitating consumer products that translate thoughts into text messages and emails. No typing or Siri necessary.
The first brain-computer interfaces (BCIs) read electrical signals in the motor cortex corresponding to the intention to move and used software to translate the signals into instructions to operate a computer cursor or robotic arm. In 2016, scientists at the University of Pittsburgh went a step further, adding sensors to a mind-controlled robotic arm so it produced sensations of touch.
Speech BCIs face even higher hurdles. Decoding the intention to articulate a word involves reading more brain signals than decoding movement does, and it hasn’t been clear precisely which areas of the brain are involved. The main challenge is that language is encoded in an extensive brain network, and current recording techniques can’t monitor the whole brain with high enough spatial and temporal resolution, said Stephanie Martin of the University of Geneva, who last year won an award for her progress toward a speech BCI. The brain is also very noisy, and the electrical activity that encodes speech tends to get drowned out by other signals. “That makes it hard to extract the speech patterns with a high accuracy,” she said.
Neuroscientists are teaming up with electrical engineers to develop a system of implants, decoders, and speech synthesizers that would read a patient’s intended words, as encoded in brain signals, and turn them into audible speech. One aspect of speech BCIs could one day make their use widespread, Guenther said: the hardware is much less expensive than robot arms, which can cost hundreds of thousands of dollars.
Now, a group of scientists have developed a way to translate brain waves directly into human speech, potentially giving patients who are unable to use their voices another way to communicate. In a scientific first, Columbia neuroengineers have created a system that translates thought into intelligible, recognizable speech. By monitoring someone’s brain activity, the technology can reconstruct the words a person hears with unprecedented clarity. This breakthrough, which harnesses the power of speech synthesizers and artificial intelligence, could lead to new ways for computers to communicate directly with the brain.
“Our voices help connect us to our friends, family and the world around us, which is why losing the power of one’s voice due to injury or disease is so devastating,” said Nima Mesgarani, Ph.D., the paper’s senior author and a principal investigator at Columbia University’s Mortimer B. Zuckerman Mind Brain Behavior Institute. “With today’s study, we have a potential way to restore that power. We’ve shown that, with the right technology, these people’s thoughts could be decoded and understood by any listener.”
The key piece of technology that makes this possible is the translation algorithm, which could still be improved a great deal. With a more advanced algorithm coupled with a better understanding of the brain, we might someday be able to actually give people who lack speech a real alternative.
Engineers translate brain signals directly into speech
Decades of research have shown that when people speak, or even imagine speaking, telltale patterns of activity appear in their brain. Distinct (but recognizable) patterns of signals also emerge when we listen to someone speak, or imagine listening. Experts, trying to record and decode these patterns, see a future in which thoughts need not remain hidden inside the brain but instead could be translated into verbal speech at will.
But accomplishing this feat has proven challenging. Early efforts to decode brain signals by Dr. Mesgarani and others focused on simple computer models that analyzed spectrograms, which are visual representations of sound frequencies. But because this approach failed to produce anything resembling intelligible speech, Dr. Mesgarani’s team turned instead to a vocoder, a computer algorithm that can synthesize speech after being trained on recordings of people talking.
“This is the same technology used by Amazon Echo and Apple Siri to give verbal responses to our questions,” said Dr. Mesgarani, who is also an associate professor of electrical engineering at Columbia’s Fu Foundation School of Engineering and Applied Science.
“Working with Dr. Mehta, we asked epilepsy patients already undergoing brain surgery to listen to sentences spoken by different people, while we measured patterns of brain activity,” said Dr. Mesgarani. “These neural patterns trained the vocoder.”
Next, the researchers asked those same patients to listen to speakers reciting digits from 0 to 9, while recording brain signals that could then be run through the vocoder. The sound produced by the vocoder in response to those signals was analyzed and cleaned up by neural networks, a type of artificial intelligence that mimics the structure of neurons in the biological brain. The end result was a robotic-sounding voice reciting a sequence of numbers. To test the accuracy of the recording, Dr. Mesgarani and his team asked individuals to listen to the recording and report what they heard.
“We found that people could understand and repeat the sounds about 75% of the time, which is well above and beyond any previous attempts,” said Dr. Mesgarani. The improvement in intelligibility was especially evident when comparing the new recordings to the earlier, spectrogram-based attempts. “The sensitive vocoder and powerful neural networks represented the sounds the patients had originally listened to with surprising accuracy.” Dr. Mesgarani and his team plan to test more complicated words and sentences next, and they want to run the same tests on brain signals emitted when a person speaks or imagines speaking. Ultimately, they hope their system could be part of an implant, similar to those worn by some epilepsy patients, that translates the wearer’s thoughts directly into words.
“In this scenario, if the wearer thinks ‘I need a glass of water,’ our system could take the brain signals generated by that thought, and turn them into synthesized, verbal speech,” said Dr. Mesgarani. “This would be a game changer. It would give anyone who has lost their ability to speak, whether through injury or disease, the renewed chance to connect to the world around them.” This paper is titled “Towards reconstructing intelligible speech from the human auditory cortex.”
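The spectrograms mentioned above as the starting point of the earlier decoding attempts can be illustrated with a short sketch. It is not the researchers' code: it computes a plain short-time Fourier magnitude spectrogram of a synthetic two-tone signal, and the function names, frame length, hop size, and sample rate are all assumptions chosen for illustration.

```python
import cmath
import math

def spectrogram(samples, frame_len=64, hop=32):
    """Return a list of frames; each frame holds the magnitudes of
    frequency bins 0 .. frame_len // 2 (a plain short-time DFT)."""
    # A Hann window tapers the edges of each frame to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

def peak_frequency(frame, sample_rate, frame_len):
    """Frequency in Hz of the strongest bin in one spectrogram frame."""
    k = max(range(len(frame)), key=lambda i: frame[i])
    return k * sample_rate / frame_len

def tone(freq_hz, n_samples, sample_rate):
    """A pure sine tone, used here as a stand-in for real audio."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate)
            for t in range(n_samples)]

SAMPLE_RATE = 8000  # Hz, illustrative value
FRAME_LEN = 64      # samples per frame, illustrative value

# Synthetic signal: 0.1 s at 500 Hz followed by 0.1 s at 1500 Hz.
signal = tone(500, 800, SAMPLE_RATE) + tone(1500, 800, SAMPLE_RATE)
spec = spectrogram(signal, frame_len=FRAME_LEN)

print("first frame peaks at", peak_frequency(spec[0], SAMPLE_RATE, FRAME_LEN), "Hz")
print("last frame peaks at", peak_frequency(spec[-1], SAMPLE_RATE, FRAME_LEN), "Hz")
```

Each row of `spec` is one time slice and each column one frequency band, which is the grid a spectrogram image displays as brightness or color. Real speech would fill many bands at once rather than a single bin.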
DARPA RATS program
Following his research into speech signal processing for DARPA’s RATS program, Dr. Nima Mesgarani of Columbia University’s Zuckerman Institute and fellow researchers announced that a brain-computer interface (BCI) has been used to turn brainwave patterns into speech with the help of a speech synthesizer.
Existing speech signal processing technologies are inadequate for most noisy or degraded speech signals that are important to military intelligence. DARPA launched the Robust Automatic Transcription of Speech (RATS) program in 2010 to create algorithms and software for performing the following tasks on potentially speech-containing signals received over communication channels that are extremely noisy and/or highly distorted:
- Speech Activity Detection: Determine whether a signal includes speech or is just background noise or music.
- Language Identification: Once a speech signal has been detected, identify the language being spoken.
- Speaker Identification: Once a speech signal has been detected, identify whether the speaker is an individual on a list of known speakers.
- Key Word Spotting: Once a speech signal has been detected and the language has been identified, spot specific words or phrases from a list of terms of interest.
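To make the first task in the list concrete, here is a minimal sketch of speech activity detection by energy thresholding. This is a textbook baseline, not a RATS algorithm, and it is far cruder than the program's goals require: it cannot tell speech from music and would struggle on heavily distorted channels. The function names, frame length, threshold ratio, and the synthetic signal (a tone standing in for speech) are all assumptions.

```python
import math
import random

def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [sum(x * x for x in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def detect_activity(samples, frame_len=160, threshold_ratio=4.0, noise_frames=5):
    """Flag a frame as active when its energy exceeds threshold_ratio times
    the noise floor. The noise floor is estimated from the first
    noise_frames frames, which are assumed to contain no speech."""
    energies = frame_energies(samples, frame_len)
    noise_floor = sum(energies[:noise_frames]) / noise_frames
    return [e > threshold_ratio * noise_floor for e in energies]

# Synthetic example: 1 s of low-level Gaussian noise at 8 kHz, with a
# louder 200 Hz tone (a stand-in for speech) between 0.4 s and 0.6 s.
random.seed(0)
RATE = 8000
signal = [random.gauss(0, 0.05) for _ in range(RATE)]
for t in range(3200, 4800):
    signal[t] += 0.5 * math.sin(2 * math.pi * 200 * t / RATE)

flags = detect_activity(signal)
active = [i for i, is_active in enumerate(flags) if is_active]
print("active frames:", active[0], "to", active[-1])
```

The later tasks in the list (language, speaker, and keyword identification) would run only on the frames flagged here, which is why detection comes first.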
In a 2014 presentation entitled “Reverse engineering the neural mechanisms involved in speech processing,” Dr. Mesgarani referenced the RATS program and talked about “decoding speech signals and attentional focus directly from the brain activity,” which was realized today with the creation of a brain-computer interface that turns brainwave patterns into speech.
As the latest research from Columbia University’s Zuckerman Institute shows, “Reconstructing speech from the neural responses recorded from the human auditory cortex […] opens up the possibility of using this technique as a speech brain-computer interface to restore speech in severely paralyzed patients.”
How Brain Waves Surf Sound Waves to Produce Speech
Decades ago, the noted computational neuroscientist David Marr observed that “trying to understand perception by understanding neurons is like trying to understand a bird’s flight by understanding only feathers.” In the 1970s, Marr argued that brains and other information processing systems need to be studied in terms of the specific problems they face and the solutions they find (what he called a computational level of analysis) to yield answers about the reasons behind their behavior. Looking only at what the systems do (an algorithmic analysis) or how they physically do it (an implementational analysis) is not enough.
More curiously, some studies have found that when people listen to spoken language, an entrained signal (brain activity locked to the rhythm of the incoming sound) also shows up in the part of the motor cortex that controls speech. Assaneo and Poeppel took a fresh approach with a hypothesis that tied the real-world behavior of language to the observed neurophysiology. They noticed that the frequency of the entrained signals in the auditory cortex is commonly about 4.5 hertz, which also happens to be the mean rate at which syllables are spoken in languages around the world.
“When we perceive intelligible speech, the brain network being activated is more complex and extended,” Assaneo explained. If the signals in the auditory cortex drive those in the motor cortex, then they should stay entrained to each other throughout the tests. If the motor cortex signal is independent, it should not change.
But what Assaneo observed was rather more interesting and surprising, Poeppel said: The auditory and speech motor activities did stay entrained, but only up to about 5 hertz. Once the audio changed faster than spoken language typically does, the motor cortex dropped out of sync. A computational model later confirmed that these results were consistent with the idea that the motor cortex has its own internal oscillator that naturally operates at around 4 to 5 hertz.
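The idea of an internal oscillator that follows a driving rhythm only within a limited band can be illustrated with a toy model. The sketch below is a generic forced phase oscillator, not the computational model used in the study, and every parameter except the roughly 4.5 hertz natural frequency is an assumption. Its locking range is also symmetric around the natural frequency, so it does not reproduce the reported pattern at slow rates. It shows only the qualitative point that an oscillator with its own preferred rhythm falls out of step when the stimulus is too far from that rhythm.

```python
import cmath
import math

def phase_locking(stim_hz, natural_hz=4.5, coupling_hz=1.0,
                  duration=20.0, dt=0.001, settle=2.0):
    """Phase-locking value (0 to 1) between a sinusoidal stimulus and a
    phase oscillator obeying dtheta/dt = w0 + K * sin(phi_stim - theta).
    A value near 1 means the oscillator keeps a fixed phase lag to the
    stimulus; lower values mean the phase difference keeps drifting."""
    w0 = 2 * math.pi * natural_hz      # oscillator's own angular frequency
    w_stim = 2 * math.pi * stim_hz     # stimulus angular frequency
    k = 2 * math.pi * coupling_hz      # coupling strength (assumed value)
    theta = 0.0
    total = 0j
    count = 0
    for i in range(int(duration / dt)):
        t = i * dt
        phi = w_stim * t
        if t >= settle:                # skip the initial transient
            total += cmath.exp(1j * (phi - theta))
            count += 1
        theta += dt * (w0 + k * math.sin(phi - theta))  # forward Euler step
    return abs(total) / count

for hz in (4.0, 4.5, 5.0, 6.0, 7.0):
    print(f"{hz:.1f} Hz stimulus: phase locking {phase_locking(hz):.2f}")
```

In this toy, the oscillator locks whenever the stimulus rate is within `coupling_hz` of `natural_hz`, which follows directly from the equation: a fixed phase lag exists only if the frequency mismatch can be cancelled by the bounded sine term.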