VirtuDockDL: A Graph Neural Network Platform for Drug Discovery

Written by Neha Bhatti (Reporter)

Researchers from The University of Lahore (Pakistan), Government College University Faisalabad (Pakistan), Shenzhen University (China) and Taif University (Saudi Arabia) have collaborated to develop VirtuDockDL, a Python-based platform that integrates deep learning (DL) to streamline the drug discovery process.   

The drug discovery process is fraught with challenges: only about one in every one million screened compounds becomes a marketable drug. While technological advancements have expanded the number of druggable molecular targets and potential drug compounds, current screening methods, namely high-throughput and ultra-high-throughput screening, often fall short.

The most prominent limitation of these methods is that they are unable to efficiently manage the overwhelming number of potential candidates. This necessitates the adoption of more advanced computational tools to bridge the gap.  

Introducing VirtuDockDL 

Described in Nature Scientific Reports, VirtuDockDL is an automated deep-learning-based platform designed to streamline drug discovery. It combines molecular graph construction, graph neural network (GNN) modeling, virtual screening, and compound clustering into a unified framework to identify potential drug candidates.

Machine learning (ML) approaches have been used increasingly, and successfully, in drug discovery to analyze data and predict outcomes. However, DL approaches offer distinct advantages over traditional ML.

Unlike traditional ML, which relies on manually crafted features, DL algorithms extract features directly from raw data. This means that scientists do not need to decide manually which features of the drug discovery data are the most important.

VirtuDockDL uses advanced DL models to identify the compounds in a library that are most likely to work as drugs, refines the identified compounds, and then uses traditional docking tools, such as AutoDock Vina, to predict binding poses and affinities. This approach offers researchers a much more streamlined workflow, allowing them to focus their money and effort only on compounds with high potential as drugs.
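The screen-first, dock-later idea can be illustrated with a minimal sketch. The compound names, scores, threshold, and the function `shortlist_for_docking` are all hypothetical and are not part of VirtuDockDL; the point is only that a cheap model score decides which compounds reach the more expensive docking step.

```python
def shortlist_for_docking(predictions, threshold=0.5, max_compounds=3):
    """Return the names of the top-scoring compounds that clear the threshold.

    `predictions` maps a compound name to a model-predicted activity
    probability (hypothetical values; a real platform would produce these
    with its trained model).
    """
    passing = [(name, p) for name, p in predictions.items() if p >= threshold]
    # Highest predicted probability first.
    passing.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in passing[:max_compounds]]


# Made-up scores for five placeholder compounds.
predicted_activity = {
    "cmpd_A": 0.97,
    "cmpd_B": 0.12,
    "cmpd_C": 0.81,
    "cmpd_D": 0.55,
    "cmpd_E": 0.49,
}

shortlist = shortlist_for_docking(predicted_activity, threshold=0.5, max_compounds=2)
print(shortlist)
```

Only the shortlisted compounds would then be passed to a docking tool, which is where most of the compute time is typically spent.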

How does VirtuDockDL work? 

VirtuDockDL employs a multi-step computational process:

  1. Molecule Selection and Generation: The process begins by identifying active molecules (inhibitors) and inactive molecules (non-inhibitors). New molecules are generated using de novo synthesis, ensuring a diverse library of compounds for screening.
  2. Feature Selection, Modeling, and Screening: Molecular structures are encoded as Simplified Molecular Input Line Entry System (SMILES) strings, derived from databases like PubChem and the Protein Data Bank. SMILES strings are transformed into molecular graphs optimized for GNN models. These graphs encode critical descriptors, such as molecular weight and fingerprints, allowing the GNN to screen a compound library and identify the compounds with the potential to be therapeutic drugs.
  3. Protein Refinement: The 3D structures of proteins are refined using OpenMM, an open-source toolkit that simulates how molecules move and interact. This process involves energy minimization and ensures the protein is in its most stable configuration, which enhances the accuracy of the docking simulation later on.
  4. Ligand Docking: This is performed using AutoDock Vina, which predicts the binding affinity of ligands/drugs to their receptor. Based on the binding affinity data, this tool identifies which receptors the ligand is most likely to bind to.
  5. Visualization and Benchmarking: The simulation results are visualized for better interpretation and compared to experimental data to assess their accuracy and predictive power. This benchmarking ensures that the virtual screening results align with real-world biological behavior. 
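
To make step 2 more concrete, here is a toy sketch of turning a SMILES string into a graph and running one round of neighbor aggregation, the basic operation behind a GNN layer. It is not VirtuDockDL's code: the parser handles only a small SMILES subset (no aromatic atoms, charges, or brackets), the function names are made up for illustration, and real pipelines typically rely on a cheminformatics toolkit and a deep learning library instead.

```python
def smiles_to_graph(smiles):
    """Parse a restricted SMILES subset into (atoms, bonds).

    Supported: atoms C, N, O, S, F, Cl, Br; bond symbols '-', '=', '#';
    parenthesised branches; single-digit ring closures.
    """
    two_letter = ("Cl", "Br")
    one_letter = set("CNOSF")
    bond_orders = {"-": 1, "=": 2, "#": 3}

    atoms = []          # atoms[i] is the element symbol of node i
    bonds = []          # (i, j, order) edges between node indices
    prev = None         # index of the atom the next atom bonds to
    pending = 1         # bond order to use for the next bond
    branch_stack = []   # atoms to return to when a branch closes
    open_rings = {}     # ring digit -> (atom index, bond order)

    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if smiles[i:i + 2] in two_letter:
            symbol, step = smiles[i:i + 2], 2
        elif ch in one_letter:
            symbol, step = ch, 1
        else:
            symbol, step = None, 1

        if symbol is not None:
            atoms.append(symbol)
            idx = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, idx, pending))
            prev, pending = idx, 1
        elif ch in bond_orders:
            pending = bond_orders[ch]
        elif ch == "(":
            branch_stack.append(prev)
        elif ch == ")":
            prev = branch_stack.pop()
        elif ch.isdigit():
            if ch in open_rings:
                start, order = open_rings.pop(ch)
                bonds.append((start, prev, max(order, pending)))
            else:
                open_rings[ch] = (prev, pending)
            pending = 1
        else:
            raise ValueError(f"unsupported SMILES character: {ch!r}")
        i += step
    return atoms, bonds


def one_hot(atoms):
    """Encode each atom as a one-hot vector over the elements present."""
    vocab = sorted(set(atoms))
    return [[1 if a == v else 0 for v in vocab] for a in atoms]


def aggregate_neighbors(features, bonds):
    """One round of sum aggregation: each node adds its neighbors' features."""
    updated = [list(f) for f in features]
    for a, b, _order in bonds:
        for k in range(len(features[a])):
            updated[a][k] += features[b][k]
            updated[b][k] += features[a][k]
    return updated


atoms, bonds = smiles_to_graph("CC(=O)O")  # acetic acid
features = aggregate_neighbors(one_hot(atoms), bonds)
print(atoms, bonds, features)
```

After one aggregation round, each atom's vector summarizes its immediate chemical neighborhood. A trained GNN stacks several such rounds with learned weights before predicting whether the molecule is likely to be active.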

Case study: Targeting the Marburg Virus VP35 Protein 

A notable application of VirtuDockDL is targeting the VP35 protein of the Marburg virus (MARV), a pathogen that has high fatality rates and lacks effective validated inhibitors.

The researchers used positive datasets prepared from the literature and public databases, as well as negative datasets generated via de novo synthesis. Using VirtuDockDL’s automated screening process, alongside docking tools that predicted docking scores and binding affinities, they identified potential VP35 inhibitors, namely Clomipramine and Marizomib.

Researchers also found that the tool was able to identify other druggable targets, such as the HER2 protein associated with cancer, TEM-1 Beta-Lactamase associated with bacterial infections, and the CYP51 enzyme associated with fungal infections.  

Benchmarking: Comparing VirtuDockDL to Other Screening Tools 

VirtuDockDL was benchmarked against other popular virtual screening tools, including PyRMD, RosettaVS, and MzDOCK. Researchers found that VirtuDockDL outperformed these platforms by combining both ligand-based and structure-based virtual screening. Its advanced GNN model achieved remarkable performance metrics. 

VirtuDockDL had an accuracy of 99%, an F1 score of 0.992, and an area under the curve (AUC) of 0.99. These results highlight VirtuDockDL’s capabilities and its advantage over other ML- and non-ML-based docking tools.
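
For readers unfamiliar with these metrics, the sketch below shows how accuracy, F1 score, and area under the curve are commonly computed for a binary active/inactive classifier. It assumes the reported area under the curve refers to the ROC curve, and the labels and scores are made-up toy values, not data from the study.

```python
def confusion_counts(labels, predictions):
    """Count true/false positives and negatives for binary 0/1 labels."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(labels, predictions):
    """Fraction of compounds classified correctly."""
    tp, fp, fn, tn = confusion_counts(labels, predictions)
    return (tp + tn) / len(labels)


def f1_score(labels, predictions):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    tp, fp, fn, _tn = confusion_counts(labels, predictions)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def roc_auc(labels, scores):
    """Probability that a random active outscores a random inactive.

    Ties count as half. This pairwise form equals the area under the ROC curve.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data: 1 = active (inhibitor), 0 = inactive. Values are illustrative only.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.2, 0.6, 0.1]
toy_predictions = [1 if s >= 0.5 else 0 for s in toy_scores]

print(accuracy(toy_labels, toy_predictions))
print(f1_score(toy_labels, toy_predictions))
print(roc_auc(toy_labels, toy_scores))
```

Accuracy and F1 depend on the chosen decision threshold (0.5 here), whereas the AUC is computed from the raw scores and measures how well actives are ranked above inactives.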

Future Prospects of VirtuDockDL in Drug Discovery   

VirtuDockDL demonstrated impressive accuracy and outperformed traditional screening tools as well as newer ML-based platforms. Its broad applicability in drug discovery stems from its ability to accurately identify high-affinity inhibitors across various targets. This includes screening against druggable targets in oncology and in bacterial and fungal infections.

The AI-driven GNN model makes VirtuDockDL stand out in automating the drug discovery process, increasing efficiency, and reducing the time and cost involved.  

Despite its advantages, some challenges remain. Currently, there is an overwhelming abundance of data that requires efficient screening and organization, making VirtuDockDL an invaluable tool for this task. However, in the long run, the primary challenge for any computational screening method lies in the need for high-quality, diverse datasets, which demand substantial computational resources. Many experts have raised concerns that a shortage of data could halt AI progress and its application in practical settings.

Nevertheless, VirtuDockDL emerges as a critical innovation, offering the potential to optimize and sustain the drug discovery process, which is arguably one of the least efficient fields in science. By addressing current bottlenecks, it represents a significant step forward in accelerating and transforming drug development.