You read my mind: speech decoding with AI

Written by Emma Hall (Digital Editor)

New research offers promising next steps in developing BCI technology for speech decoding and communication.

Have you heard of brain-computer interfaces (BCIs)?

In the past month, research teams have reported huge milestones in BCIs, speech decoding and neuroprosthetics.

First up, researchers at UC San Francisco and UC Berkeley (both CA, USA) created a BCI that allowed a woman with severe limb and vocal paralysis to talk again through a digital avatar. The technology marks an enormous breakthrough: while previous speech-decoding research has shown promise in restoring communication through text generation, this is the first time that speech audio and facial expressions have also been decoded from brain signals.

The system works using 253 electrodes placed on the area of the brain’s surface that is vital for speech. These electrodes intercept the signals that would otherwise travel to the muscles of the face, jaw, larynx and tongue, and were connected to a bank of computers via a cable plugged into a port on the woman’s head. Using this setup, an AI system was trained to recognize the woman’s unique brain-activity patterns for speech by having her repeat sentences drawn from a 1024-word vocabulary.
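
For readers curious what ‘training an AI system on brain activity’ can look like in practice, the sketch below shows one very simplified version of the idea in Python (using PyTorch): a small recurrent network takes multichannel electrode features and predicts which word from a fixed vocabulary was attempted. The channel count and vocabulary size come from the study, but the architecture, feature shapes and synthetic data are assumptions for illustration only, not the team’s actual pipeline.

```python
# Minimal sketch (not the authors' code): decoding word labels from
# multichannel cortical features with a small recurrent network.
# Shapes, architecture and training data are illustrative placeholders.
import torch
import torch.nn as nn

N_CHANNELS = 253      # electrodes in the implant described in the study
VOCAB_SIZE = 1024     # size of the study's sentence vocabulary
SEQ_LEN = 100         # hypothetical number of neural time bins per word

class NeuralWordDecoder(nn.Module):
    """GRU over electrode time series -> word logits (illustrative only)."""
    def __init__(self, n_channels: int, vocab_size: int, hidden: int = 256):
        super().__init__()
        self.rnn = nn.GRU(n_channels, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels) of neural features
        _, h = self.rnn(x)               # h: (1, batch, hidden)
        return self.head(h.squeeze(0))   # (batch, vocab_size)

# Placeholder data standing in for recorded neural activity and word labels.
features = torch.randn(32, SEQ_LEN, N_CHANNELS)
labels = torch.randint(0, VOCAB_SIZE, (32,))

model = NeuralWordDecoder(N_CHANNELS, VOCAB_SIZE)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(3):  # repeated sentence prompts would supply real batches
    optimizer.zero_grad()
    loss = loss_fn(model(features), labels)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```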

To reconstruct speech, the system used a speech-synthesis algorithm personalized, via a recording of her speaking, to sound like the woman’s voice prior to her paralysis. The facial avatar was animated with AI software that simulates facial-muscle movements and merged with the decoded speech signals using machine learning, letting the avatar move its jaw, lips and tongue and display expressions of sadness, surprise and happiness.
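
Purely as an illustration of the general idea, the sketch below maps decoded speech-related features onto a handful of avatar animation parameters (jaw, lips, tongue and a few expressions) with a simple linear fit. The blendshape names, feature sizes and data are hypothetical; the study itself paired a personalized speech synthesizer with specialized facial-animation software rather than anything this simple.

```python
# Illustrative sketch only: mapping decoded speech-related features to
# avatar animation weights with a simple least-squares fit.
# All names, shapes and data below are placeholders, not the study's method.
import numpy as np

rng = np.random.default_rng(0)

N_FRAMES = 500        # hypothetical animation frames
N_FEATURES = 64       # hypothetical decoded speech/articulatory features
BLENDSHAPES = ["jaw_open", "lip_round", "tongue_raise",
               "smile", "surprise", "sad"]

# Placeholder training pairs: decoded features and the target blendshape
# weights a reference recording or animator would provide.
decoded = rng.normal(size=(N_FRAMES, N_FEATURES))
targets = rng.uniform(0, 1, size=(N_FRAMES, len(BLENDSHAPES)))

# Least-squares mapping from decoded features to blendshape weights.
weights, *_ = np.linalg.lstsq(decoded, targets, rcond=None)

def animate_frame(features: np.ndarray) -> dict:
    """Return clipped blendshape weights for one frame of decoded features."""
    raw = features @ weights
    return {name: float(np.clip(v, 0.0, 1.0))
            for name, v in zip(BLENDSHAPES, raw)}

print(animate_frame(decoded[0]))
```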

The system is a substantial upgrade over commercially available devices, translating brain signals into text far more quickly, at almost 80 words per minute. The team aims to create a wireless version of the system in the future, and hopes that this research will eventually lead to an FDA-approved system for speech decoding.

Next up, a team of researchers from Radboud University (Nijmegen, Netherlands) and UMC Utrecht (Netherlands) also managed to translate brain signals into audible speech.

Unlike the previous study, this research was conducted on non-paralyzed volunteers with temporary brain implants. The volunteers’ brain activity was recorded while they voiced a series of words out loud, and AI models were used to map that brain activity directly onto audible speech. Interestingly, the reconstructed speech resembled the original speakers in articulation and pronunciation.
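
The core of ‘directly mapping brain activity onto audible speech’ can be caricatured as a regression problem: predict speech spectrogram frames from simultaneously recorded neural frames, then hand the result to a vocoder to produce sound. The sketch below does this with synthetic data and ridge regression; the feature dimensions and the method are assumptions for illustration, not the models used by the Radboud and UMC Utrecht team.

```python
# Hedged sketch of the general idea (not the study's pipeline): regress
# recorded neural activity frame-by-frame onto speech spectrogram frames.
# A separate vocoder would then turn predicted frames into audible speech.
# All data below is synthetic and the dimensions are assumptions.
import numpy as np

rng = np.random.default_rng(1)

N_FRAMES = 2000     # paired (neural, audio) frames recorded while speaking
N_NEURAL = 128      # hypothetical neural feature dimension
N_MEL = 80          # mel-spectrogram bins, a common speech representation

X = rng.normal(size=(N_FRAMES, N_NEURAL))   # neural features
Y = rng.normal(size=(N_FRAMES, N_MEL))      # target log-mel frames

# Ridge regression in closed form: W = (X^T X + lam*I)^-1 X^T Y
lam = 1.0
W = np.linalg.solve(X.T @ X + lam * np.eye(N_NEURAL), X.T @ Y)

def decode_spectrogram(neural_frames: np.ndarray) -> np.ndarray:
    """Predict log-mel frames from neural frames; feed these to a vocoder."""
    return neural_frames @ W

predicted = decode_spectrogram(X[:10])
print(predicted.shape)   # (10, 80): ten predicted spectrogram frames
```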

Not only were the AI models highly accurate in synthesizing the correct individual words (92–100% accuracy across participants), but the reconstructed speech was also intelligible and coherent, like a real voice.

UMC Utrecht researcher Julia Berezutskaya does, however, caution against some disadvantages: ‘In these experiments, we asked participants to say twelve words out loud, and those were the words we tried to detect. In general, predicting individual words is less complicated than predicting entire sentences. In the future, large language models that are used in AI research can be beneficial. Our goal is to predict full sentences and paragraphs of what people are trying to say based on their brain activity alone. To get there, we’ll need more experiments, more advanced implants, larger datasets and advanced AI models. All these processes will still take a number of years, but it looks like we’re heading in the right direction.’

Both studies provide a promising outlook for people who lose the ability to speak after a neurological injury, a loss that is devastating to communication and can lead to social isolation. For now, they represent the next steps in developing BCI devices for communication.