Parkinson’s drug screening boosted 10x using AI
Researchers have used artificial intelligence techniques to massively accelerate the search for Parkinson’s disease treatments.
The researchers, from the University of Cambridge (UK), designed and used an AI-based strategy to identify compounds that block the clumping, or aggregation, of alpha-synuclein, the protein that characterizes Parkinson’s.
The team used machine learning techniques to quickly screen a chemical library containing millions of entries, and identified five highly potent compounds for further investigation.
Parkinson’s affects more than six million people worldwide, with that number projected to triple by 2040. No disease-modifying treatments for the condition are currently available. The process of screening large chemical libraries for drug candidates – which needs to happen well before potential treatments can be tested on patients – is enormously time-consuming and expensive, and often unsuccessful.
Using machine learning, the researchers were able to speed up the initial screening process by ten-fold, and reduce the cost by a thousand-fold, which could mean that potential treatments for Parkinson’s reach patients much faster. The results are reported in the journal Nature Chemical Biology.
Parkinson’s is the fastest-growing neurological condition worldwide. In the UK, one in 37 people alive today will be diagnosed with Parkinson’s in their lifetime. In addition to motor symptoms, Parkinson’s can also affect the gastrointestinal system, nervous system, sleeping patterns, mood and cognition, and can contribute to a reduced quality of life and significant disability.
Proteins carry out important processes in cells, but in people with Parkinson’s, one of these proteins goes rogue and causes the death of nerve cells. When alpha-synuclein misfolds, it can form abnormal clusters called Lewy bodies, which build up within brain cells, stopping them from functioning properly.
“One route to search for potential treatments for Parkinson’s requires the identification of small molecules that can inhibit the aggregation of alpha-synuclein, which is a protein closely associated with the disease,” said Michele Vendruscolo from the Yusuf Hamied Department of Chemistry, who led the research. “But this is an extremely time-consuming process – just identifying a lead candidate for further testing can take months or even years.”
While clinical trials for Parkinson’s are currently underway, no disease-modifying drug has been approved, reflecting the inability to directly target the molecular species that cause the disease.
This has been a major obstacle in Parkinson’s research: without methods to identify the correct molecular targets and engage with them, the development of effective treatments has been severely hampered.
The Cambridge team developed a machine learning method in which chemical libraries containing millions of compounds are screened to identify small molecules that bind to the amyloid aggregates and block their proliferation.
A small number of top-ranking compounds were then tested experimentally to select the most potent inhibitors of aggregation. The information gained from these experimental assays was fed back into the machine learning model in an iterative manner, so that after a few iterations, highly potent compounds were identified.
“Instead of screening experimentally, we screen computationally,” said Vendruscolo. “By using the knowledge we gained from the initial screening with our machine learning model, we were able to train the model to identify the specific regions on these small molecules responsible for binding, then we can re-screen and find more potent molecules.”
Using this method, the Cambridge team developed compounds to target pockets on the surfaces of the aggregates, which are responsible for the exponential proliferation of the aggregates themselves. These compounds are hundreds of times more potent, and far cheaper to develop, than previously reported ones.
“Machine learning is having a real impact on the drug discovery process – it’s speeding up the whole process of identifying the most promising candidates,” said Vendruscolo. “For us this means we can start work on multiple drug discovery programmes – instead of just one. So much is possible due to the massive reduction in both time and cost – it’s an exciting time.”
The research was conducted in the Chemistry of Health Laboratory in Cambridge, which was established with the support of the UK Research Partnership Investment Fund (UKRPIF) to promote the translation of academic research into clinical programmes.
Digging Deeper into the AI
In the initial phase of the research, the team employed docking simulations to screen a vast array of molecules, generating an initial set of candidate compounds. These simulations predict the preferred orientation of one molecule when bound to another, focusing primarily on estimating binding affinities.
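Docking outputs are typically post-processed by ranking candidates on their predicted binding score. The sketch below is a hedged illustration of that step only: the compound names and score values are invented for this example, not taken from the study.

```python
# Hypothetical post-processing of docking results. Docking tools commonly
# report a binding score where lower (more negative) means stronger predicted
# binding; here we rank candidates and keep the top fraction for later stages.
# All names and values below are illustrative, not from the study.

docking_scores = {
    "compound_A": -9.2,   # kcal/mol, invented values
    "compound_B": -6.1,
    "compound_C": -10.4,
    "compound_D": -7.8,
    "compound_E": -5.3,
}

def top_candidates(scores, fraction=0.4):
    """Return the best-scoring fraction of compounds (most negative first)."""
    ranked = sorted(scores, key=scores.get)  # ascending: strongest binders first
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]

print(top_candidates(docking_scores))  # the strongest predicted binders
```

In practice this filtering is what turns a library of millions of entries into a tractable candidate set for the machine learning stages that follow.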
Candidate Molecule Generation
Following the docking stage, the output dataset is fed into a variational autoencoder (VAE), which compresses the molecular data into a more manageable form known as the latent space. The VAE attempts to reconstruct the molecular structures with high fidelity. Through iterative learning, the VAE discerns the underlying ‘rules’ of molecular structure, allowing it to sample and generate new molecular structures from the latent space. This capability significantly broadens the diversity of compounds available for testing beyond traditional methods, with the potential to discover molecules that exhibit superior binding properties and more effective inhibitory actions against alpha-synuclein aggregation.
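To make the VAE mechanics concrete, here is a minimal sketch in NumPy. Tiny untrained linear layers stand in for real encoder and decoder networks, and a random binary fingerprint stands in for a real molecular representation; the dimensions and weights are assumptions made purely for illustration.

```python
import numpy as np

# Schematic VAE sketch: untrained linear layers replace real networks, and
# molecules are represented as fixed-length binary fingerprints. Everything
# here is illustrative, not the study's actual model.

rng = np.random.default_rng(0)
n_features, n_latent = 16, 4          # fingerprint length, latent-space size

W_enc = rng.normal(size=(n_features, 2 * n_latent)) * 0.1  # -> [mu | logvar]
W_dec = rng.normal(size=(n_latent, n_features)) * 0.1

def encode(x):
    """Map a fingerprint to the mean and log-variance of its latent code."""
    h = x @ W_enc
    return h[:n_latent], h[n_latent:]          # mu, logvar

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps (the 'reparameterization trick')."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z):
    """Map a latent code back to per-bit probabilities of a fingerprint."""
    return 1.0 / (1.0 + np.exp(-(z @ W_dec)))  # sigmoid

x = rng.integers(0, 2, size=n_features).astype(float)  # a toy fingerprint
mu, logvar = encode(x)
x_rec = decode(reparameterize(mu, logvar))             # reconstruction path

# Generating *new* candidates: sample the latent space directly and decode.
z_new = rng.normal(size=n_latent)
x_new = decode(z_new)
```

The key point is the last two lines: once trained, a VAE can be sampled at arbitrary latent points, producing candidate structures that were never in the original library.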
Predicting Potency
The molecular structures produced by the autoencoder are next fed into a predictive framework consisting of a Random Forest Regressor (RFR), which estimates the potency of various compounds based on learned molecular features.
To add a layer of precision, the predictions from the RFR are further refined by a Gaussian Process Regressor (GPR). This model assesses the uncertainty associated with each prediction, providing valuable probabilistic insights. This is particularly important, as it indicates the reliability of the predictions and can guide researchers on whether to proceed cautiously or gather more data.
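As a hedged illustration of how a Gaussian process attaches an uncertainty to each prediction, the sketch below implements exact GP regression from scratch on a synthetic one-dimensional “molecular feature.” The kernel choice, data, and noise level are assumptions for this example, not the study’s configuration.

```python
import numpy as np

# Minimal Gaussian-process regression: posterior mean AND standard deviation.
# The 1-D feature and potency values are synthetic; a real model would use
# rich molecular descriptors.

def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two sets of 1-D points."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

X_train = np.array([0.0, 1.0, 2.0, 3.0])   # features of assayed compounds
y_train = np.array([0.2, 0.8, 0.9, 0.3])   # measured potencies (synthetic)

noise = 1e-4
K = rbf(X_train, X_train) + noise * np.eye(len(X_train))
K_inv = np.linalg.inv(K)

def predict(x_query):
    """Posterior mean and standard deviation at the query points."""
    k_star = rbf(x_query, X_train)
    mean = k_star @ K_inv @ y_train
    var = 1.0 - np.sum(k_star @ K_inv * k_star, axis=1)  # prior variance = 1
    return mean, np.sqrt(np.maximum(var, 0.0))

mean, std = predict(np.array([1.5, 5.0]))
# Near the data (x = 1.5) the std is small; far from it (x = 5.0) it is
# large, signalling that more experimental data is needed before trusting
# the model there.
```

It is exactly this per-prediction standard deviation that lets researchers decide whether to trust a predicted potency or to synthesize and assay more compounds first.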
Iterative Learning and Optimization
The approach employed in this study is iterative, meaning the AI system continuously adapted based on new experimental data obtained from in vitro assays. As new compounds were synthesized and tested, the outcomes were fed back into both the VAE and the predictive models (RFR and GPR). This feedback mechanism allowed for the continuous adjustment of the models’ parameters, enhancing their accuracy in identifying effective inhibitors. Such an iterative process fosters a dynamic exploration of the chemical space, progressively sharpening the focus on the most promising compounds, and refining the exploration of molecular features critical to successful drug development.
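The loop described above can be sketched as a toy active-learning cycle. In this hedged example, a synthetic assay function stands in for the in vitro experiments, and a simple distance-weighted surrogate with an uncertainty bonus stands in for the study’s VAE/RFR/GPR pipeline; all functions and values are illustrative assumptions.

```python
import numpy as np

# Toy screen-test-retrain loop. The "assay" is a synthetic stand-in for an
# in vitro aggregation experiment; the surrogate and uncertainty bonus stand
# in for the study's learned models. Everything here is illustrative.

def assay(x):
    """Mock experiment: the true (unknown) potency peaks at x = 2."""
    return float(np.exp(-((x - 2.0) ** 2)))

library = np.round(np.linspace(0.0, 5.0, 101), 2)   # candidate "compounds"
tested_x = [0.0, 5.0]                               # initial screen
tested_y = [assay(0.0), assay(5.0)]

def surrogate(x):
    """Predicted potency: distance-weighted average over tested compounds."""
    d = np.abs(np.array(tested_x) - x) + 1e-6
    w = 1.0 / d
    return float(np.sum(w * np.array(tested_y)) / np.sum(w))

def acquisition(x):
    """Balance predicted potency against uncertainty (distance to data)."""
    uncertainty = float(np.min(np.abs(np.array(tested_x) - x)))
    return surrogate(x) + 0.5 * uncertainty

for _ in range(5):                                  # a few screening rounds
    untested = [x for x in library if x not in tested_x]
    pick = max(untested, key=acquisition)           # most promising candidate
    tested_x.append(pick)                           # "synthesize and test"...
    tested_y.append(assay(pick))                    # ...and feed results back

best = tested_x[int(np.argmax(tested_y))]           # best compound so far
```

Each round, the loop balances exploiting compounds the model already believes are potent against exploring regions of chemical space where it is uncertain, which is why the feedback of assay results into the models matters so much.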
Ultimately, these methods demonstrate the power of multi-stage machine learning approaches and set an exciting foundation for future research.