AI Detects Anxiety and Depression Comorbidity from Voice Recordings

A research team based in Illinois, U.S., has used AI to detect comorbid anxiety disorders (AD) and major depressive disorder (MDD) from verbal fluency tests. They trained four models to distinguish AD/MDD by extracting features, such as patterns of silence and speaking volume, from voice recordings.
AD/MDD Prevalence is at an All-Time High and Continues to Rise
The number of people experiencing a common mental illness has risen slowly and steadily throughout the 2000s. In 2023, UK mental health services received a record 5 million referrals, reflecting the growing need for expanded testing and treatment options.
Traditional screening methods, such as self-reported questionnaires, clinical interviews, and observational assessments, are struggling to keep pace with this rising demand. And while mental health specialists are needed now more than ever, workforces remain understaffed and under-resourced.
Moreover, traditional mental health assessments, which often rely on a single verbal interview to confirm diagnoses, are widely regarded as inefficient and prone to bias. This limited approach can overlook the complexity of a patient’s symptoms and is susceptible to subjective interpretation, potentially leading to inconsistent evaluations and misdiagnoses.
To address these concerns, scientists are looking to automate parts of the sector, using AI-based screening for mental illness to speed up and standardize the assessment process.
One study, published by the Acoustical Society of America, reveals compelling insights into the transformative potential of machine learning (ML) in diagnosing comorbid AD/MDD.
Led by Mary Pietrowicz, Senior Research Scientist at the National Center for Supercomputing Applications at the University of Illinois, the team utilized four different AI models to assess the severity of anxiety and depression through pre-recorded verbal fluency tests (VFTs).
Mary first became interested in the impact of mental illness on voice patterns 11 years ago, while listening to oral history recordings in which patients recounted recent, highly stressful experiences:
“While listening to some of these interviews, I realized that the speakers were telling me something about their physical and emotional well-being in their acoustic voices; it was absolutely striking to hear.”
A few years later, Mary continued her research by analyzing interviews with individuals who had recently experienced highly stressful situations. She noticed that, through acoustic voice modeling, she was able to track the severity of their symptoms, further reinforcing the idea that vocal patterns could provide valuable insights into mental health conditions.
VFTs are commonly used by psychologists to gauge cognitive ability. During the assessment, participants are asked to name as many objects as possible from a given category in an allotted time, usually 60 seconds.
Studies on VFTs have found that AD patients talk more quickly and with greater pitch variation, while those with MDD typically talk slower and with less pitch variation. However, little is known about the way comorbid AD/MDD affects speech.
“Speech and voice profiles differ across people with AD-only, MDD-only, and comorbid AD/MDD. Some of the effects of AD and MDD oppose one another,” explains Mary. “Much of the existing research overlooks these distinctions and fails to address the unique characteristics of comorbid AD/MDD.”
The research group trained both traditional ML and more complex deep learning (DL) models using VFT recordings from 40 female participants with and without AD/MDD. During these tests, volunteers were asked to name as many animals as they could, as quickly as possible, within exactly one minute.
The AI models then learned to correlate vocal patterns in these recordings with known AD/MDD severity scores, which had been determined beforehand by combining the results of two standard assessments for depression and anxiety: the Patient Health Questionnaire-9 (PHQ-9) and the Generalized Anxiety Disorder-7 (GAD-7).
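The study’s exact scheme for combining the two questionnaires isn’t reproduced here, but a minimal sketch shows how such a combined label could be derived. The banding cutoffs below are the standard published severity bands for each questionnaire; the combination rule itself is an assumption for illustration, not the paper’s method.

```python
# Hypothetical construction of a combined AD/MDD severity label.
# The cutoffs are the standard severity bands for each questionnaire;
# the rule for combining them is an illustrative assumption.

def phq9_band(score: int) -> int:
    """Map a PHQ-9 total (0-27) to a 0-4 depression severity band."""
    for band, cutoff in enumerate([5, 10, 15, 20]):
        if score < cutoff:
            return band
    return 4

def gad7_band(score: int) -> int:
    """Map a GAD-7 total (0-21) to a 0-3 anxiety severity band."""
    for band, cutoff in enumerate([5, 10, 15]):
        if score < cutoff:
            return band
    return 3

def comorbid_label(phq9: int, gad7: int) -> int:
    """Toy combined label: 1 if both scales reach at least moderate
    severity, else 0. This threshold is an assumption."""
    return int(phq9_band(phq9) >= 2 and gad7_band(gad7) >= 2)

print(comorbid_label(phq9=14, gad7=11))  # -> 1 (both at least moderate)
```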
Dissecting the AI Toolkit
So, what kinds of AI models were put to the test, and why four different systems? The group wanted to find out which patterns in the voice recordings correlate most strongly with comorbid AD/MDD. To achieve this, they tested four AI systems, each with unique strengths in analyzing speech, allowing them to compare approaches and identify the most effective method for detecting the relevant speech patterns.
Three classical ML systems were trained to look at both acoustic and phonemic features of VFTs, capturing the physical properties of sound waves and linguistic behaviors, respectively. The models are all non-parametric ML systems, meaning they do not rely on fixed mathematical functions, such as linear equations, to analyze training datasets. This flexibility enables them to capture more complex patterns. Three of the four AI systems used in the study are:
- K-Nearest Neighbors (KNNs). KNNs don’t create formulas or rules during training. Instead, when they need to make a prediction, they look at the stored training data and find the K closest data points (neighbors) to the new input. The prediction is then based on a majority vote among those neighbors. KNNs are particularly adept at handling non-linearly separable data, in which no clear, straight-line boundary can separate the classes.
- Bagging Model (BAG). Bagging is an ensemble ML method that combines multiple base models into a single, more robust predictor. It is a generalized form of Random Forest (RF) that extends beyond decision trees, allowing the integration of various weak learners. Each model is trained on random subsets of the data, and their predictions are aggregated into one result to improve accuracy and reduce variance.
- Random Forest (RF). A specialized type of bagging that combines multiple decision trees. Each tree is trained on a random sample of the data, and at each ‘split’ in a tree only a random subset of the features is considered, ensuring that each tree makes different errors and provides unique information.
Both BAG and RF are comparatively interpretable and less prone to overfitting, since each weak learner trains on a random subset of the data rather than fixating on any particular features of the training set (a sketch comparing all three classical models follows below).
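As a rough illustration of how these three classifiers might be compared, here is a minimal scikit-learn sketch. The feature matrix and labels are random placeholders standing in for the per-recording summary statistics described below; the hyperparameters are defaults, not the study’s settings.

```python
# Sketch: compare KNN, bagging, and random forest classifiers on
# placeholder summary-feature vectors (one row per recording).
import numpy as np
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 12))    # placeholder: 40 recordings x 12 summary features
y = rng.integers(0, 2, size=40)  # placeholder: binary comorbid AD/MDD labels

models = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "BAG": BaggingClassifier(n_estimators=100, random_state=0),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
}

for name, model in models.items():
    # Cross-validation: with only 40 recordings, a single train/test
    # split would give an unreliable accuracy estimate.
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: mean accuracy {scores.mean():.2f} (+/- {scores.std():.2f})")
```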
Each of these models analyzed statistical audio features, evaluating properties such as mean pitch variation or average pause duration, rather than scrutinizing every moment along the sound wave.
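For a sense of what such summary features look like in practice, here is a minimal sketch using librosa. The specific features, silence threshold, and pitch range are illustrative assumptions, not the study’s actual feature set.

```python
# Sketch: compute per-recording summary statistics of the kind the
# classical models consume. Thresholds and feature choices are guesses.
import librosa
import numpy as np

def summary_features(path: str) -> np.ndarray:
    y, sr = librosa.load(path, sr=16000)

    # Pause behavior: split into non-silent intervals and measure the
    # gaps between them (the top_db threshold is an assumption).
    speech = librosa.effects.split(y, top_db=30)
    gaps = [(s2 - e1) / sr for (_, e1), (s2, _) in zip(speech, speech[1:])]

    # Pitch behavior: track f0 and summarize its spread; pyin returns
    # NaN for unvoiced frames, hence the nan-aware statistics.
    f0, _, _ = librosa.pyin(y, fmin=65, fmax=400, sr=sr)

    return np.array([
        np.mean(gaps) if gaps else 0.0,  # mean pause duration (s)
        np.max(gaps) if gaps else 0.0,   # longest pause (s)
        np.nanmean(f0),                  # mean pitch (Hz)
        np.nanstd(f0),                   # pitch variation (Hz)
    ])
```

Each recording collapses to a short fixed-length vector, which is exactly the form of input that KNN, BAG, and RF expect.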
The research group also utilized a fourth, more complex AI system: a DL technique called Long Short-Term Memory networks (LSTMs). Unlike KNN, BAG, and RF, which rely on summary statistics of sound features, LSTMs process audio frame by frame. This sequential approach allows LSTMs to capture temporal patterns in sound, making them particularly effective for analyzing complex, time-dependent signals.
The LSTM framework allows AI to uncover complex patterns in voice recordings that the three classical models may have struggled to capture.
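As an illustration of this frame-by-frame approach, here is a minimal PyTorch sketch of an LSTM classifier over frame-level acoustic features. The layer sizes, feature dimension, and frame rate are assumptions, not the architecture reported in the paper.

```python
# Sketch: an LSTM that reads frame-level acoustic features (e.g., one
# feature vector per 10 ms frame) and emits one comorbidity probability
# per recording. Dimensions are illustrative guesses.
import torch
import torch.nn as nn

class VoiceLSTM(nn.Module):
    def __init__(self, n_features: int = 40, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, n_frames, n_features), one row per audio frame
        _, (h_n, _) = self.lstm(frames)            # h_n: (1, batch, hidden)
        return torch.sigmoid(self.head(h_n[-1]))   # (batch, 1) probability

model = VoiceLSTM()
clips = torch.randn(8, 6000, 40)  # 8 one-minute clips at 100 frames/s
print(model(clips).shape)         # torch.Size([8, 1])
```

Using the final hidden state to summarize the whole recording is one common way to turn a frame-by-frame sequence into a single prediction.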
When put to the test, the AI models achieved accuracies ranging from 70% to 83%, with the LSTM framework performing best in the study.
“This study confirms and describes the comorbid AD/MDD voice signal and distinguishes it from healthy voices,” says Mary. “It lays the groundwork both for the development of AI screening tools using acoustic voice and models that look at alternative streams of data.”
Wider Implications: Will This Change the Trajectory of Mental Health Screening?
As Mary describes, this study could have widespread implications for psychological assessment. Even as their prevalence grows, mental health disorders such as AD and MDD remain subject to misdiagnosis, bias, and inefficient screening.
As a result, many individuals struggling with AD and MDD fly under the radar and miss out on the help they need, and delayed treatment exacerbates their conditions. The lack of accessible, standardized diagnostic tools further compounds the issue.
AI could rapidly assess a patient’s risk of comorbid AD/MDD from a simple one-minute verbal test, picking up on subtle speaking patterns that may be missed by clinicians. This innovation could be particularly impactful in reducing the stigma around mental health by shifting the diagnostic process away from purely subjective assessments to data-driven, evidence-based evaluations.
However, the accuracy of voice analysis in diagnosing AD and MDD is still an evolving area of research. The evidence supporting its potential is promising, but the approach has not yet been validated for broad use. For now, it should be viewed as a complementary screening method rather than a definitive diagnostic tool.