The drug discovery pipeline is notoriously arduous, slow, and expensive: bringing a new drug to market often takes more than a decade and billions of dollars [6]. This has spurred interest in computational approaches that accelerate drug development and make it more efficient. Machine learning (ML) and, increasingly, quantum machine learning (QML) are emerging as powerful tools with the potential to transform several stages of the drug discovery process [10, 19]. This review surveys the current state of the art in applying ML and QML to drug discovery, covering key applications, emerging trends, and future directions. It focuses on the generation of novel molecules, the prediction of drug-target interactions, drug repurposing, and the mitigation of challenges such as data scarcity and limited interpretability. The field is evolving rapidly, with new methods and applications appearing continually and promising to reshape the future of medicine.

Generative Models for Drug Design

One of the most promising applications of ML in drug discovery is the de novo design of drug molecules. Generative models are trained on existing datasets of known drugs and can then create novel molecules with desired properties [4, 19]. These models offer the potential to explore vast chemical spaces and identify promising drug candidates more efficiently than traditional methods [12].

Energy-based generative models, such as TagMol [1], are designed for target-specific drug discovery. TagMol generates molecules whose binding affinity scores are comparable to those of real molecules. The same study reports that models based on graph attention networks (GATs) learn faster and better than graph convolutional network (GCN) baselines. Similarly, [4] proposes several QML techniques, including generative adversarial networks (GANs), for generating small drug molecules.

Variational autoencoders (VAEs) are another popular approach to drug design. These models learn a latent representation of molecular structures and generate new molecules by sampling from this latent space [11, 18]. However, as highlighted in [11], the limitations of near-term quantum computers hinder representation learning in high-dimensional spaces. The authors present a scalable quantum generative autoencoder (SQ-VAE) for simultaneously reconstructing and sampling drug molecules, and a corresponding vanilla variant (SQ-AE) for better reconstruction. Their results suggest that a quantum computing advantage can be achieved for normalized low-dimensional molecules, and that high-dimensional molecules generated by quantum generative autoencoders have better drug properties within the same learning period. The work in [18] builds a compact discrete variational autoencoder (DVAE) with a reduced-size Restricted Boltzmann Machine (RBM) in its latent layer; the model is small enough to fit a state-of-the-art D-Wave quantum annealer and generates novel chemical structures with medicinal chemistry and synthetic accessibility properties. Separately, a hybrid quantum-classical deep learning model tailored for binding affinity prediction reports a 6% improvement in prediction accuracy over existing classical models, along with significantly more stable convergence [9].
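
The idea of generating molecules by sampling from a latent space can be illustrated with a minimal classical sketch. It is not the SQ-VAE or DVAE architecture from [11] or [18]: the encoder outputs (mu, log_var), the toy linear decoder, and its weights W and b are all hypothetical values chosen for illustration, and the decoder output stands in for a molecular fingerprint rather than a real molecule.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), element-wise."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))

def decode(z, weights, bias):
    """Toy linear decoder plus sigmoid: latent vector -> bit probabilities."""
    logits = [sum(w * zi for w, zi in zip(row, z)) + b
              for row, b in zip(weights, bias)]
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]

rng = random.Random(0)

# Hypothetical encoder output for one training molecule (2-D latent space).
mu, log_var = [0.3, -1.2], [-0.5, 0.1]
z_posterior = reparameterize(mu, log_var, rng)  # used during training

# Generating a new candidate: sample from the prior instead of the posterior.
z_prior = [rng.gauss(0.0, 1.0) for _ in mu]

# Hypothetical decoder parameters mapping 2 latent dims to 3 fingerprint bits.
W = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]
b = [0.0, 0.0, 0.1]
bit_probs = decode(z_prior, W, b)
```

During training, the KL term keeps the encoder's posterior close to the prior, which is what makes samples drawn from the prior decode to plausible outputs.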

Conditional diversity networks offer another approach to drug design. These networks can generate potential drug molecules from a prototype, which is especially valuable in drug discovery where researchers often start from a molecule with some of the desired properties [36].

Predicting Drug-Target Interactions

Identifying drug-target interactions (DTIs) is a critical step in drug discovery. Accurate prediction of these interactions can significantly reduce the time and cost associated with identifying lead compounds [5, 15, 16, 27, 28, 29].

Several studies have explored deep learning models for DTI prediction [5, 16, 27, 28, 29, 30]. DrugMAN [5] integrates heterogeneous information from multiple biological networks using a mutual attention network, which allows the model to capture complex relationships between drugs and targets and improves prediction performance. The study in [16] predicts drug-target binding affinity with a modified gated recurrent unit (GRU) and a graph neural network (GNN), which extract feature vectors from the target protein sequence and the drug molecular graph, respectively. GraphPrint [27] incorporates 3D protein structure features into drug-target affinity prediction: it builds graph representations of protein 3D structures from amino acid residue coordinates and combines them with drug graph representations and traditional features to jointly learn drug-target affinity. The results indicate that 3D structure-based features provide information complementary to traditional features. Finally, [28] proposes SiamDTI, a siamese DTI prediction method that uses a cross-field information fusion strategy to capture both local and global protein information.

Knowledge graphs and knowledge graph embedding (KGE) models have also shown promise for DTI prediction [15, 21]. In [15], a causal intervention-based confidence measure assesses the triplet score to improve the accuracy of the DTI prediction model. The study in [21] proposes an inductive relational graph convolutional network (RGCN) that learns informative relation embeddings even in the few-shot regime and can be applied to the Drug Repurposing Knowledge Graph (DRKG) to discover drugs for COVID-19. Beyond knowledge graphs, [29] proposes a self-attention-based multi-view representation learning approach for modeling drug-target interactions, which achieves competitive prediction performance and offers biologically plausible interpretations of the predicted interactions. Furthermore, [30] proposes a convolutional neural network for EEG-mediated DTI prediction, which makes it possible to identify similarities in the mechanisms of action and effects of psychotropic drugs.
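
To make the notion of a triplet score concrete, the sketch below scores (drug, relation, target) triples with a TransE-style distance. This is an assumption made for illustration: the text does not say which KGE models [15] or [21] use, and the entity names and 2-D embeddings are hypothetical values that a real model would learn from the knowledge graph.

```python
import math

def transe_score(head, relation, tail):
    """TransE plausibility: negative L2 distance between head + relation and tail."""
    return -math.sqrt(sum((h + r - t) ** 2
                          for h, r, t in zip(head, relation, tail)))

# Hypothetical 2-D embeddings; real models learn these from the knowledge graph.
entities = {
    "drug_A": [0.1, 0.2],
    "protein_X": [0.6, 0.9],
    "protein_Y": [-0.8, 0.3],
}
relations = {"targets": [0.5, 0.7]}

def rank_tails(head, relation, candidates):
    """Rank candidate tail entities for (head, relation, ?) by descending score."""
    scored = [(transe_score(entities[head], relations[relation], entities[t]), t)
              for t in candidates]
    return [t for _, t in sorted(scored, reverse=True)]

ranking = rank_tails("drug_A", "targets", ["protein_X", "protein_Y"])
```

A score close to zero means the triple is considered plausible. A confidence measure such as the one in [15] would operate on top of scores of this kind.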

Drug Repurposing

Drug repurposing, the process of identifying new uses for existing drugs, offers a faster and more cost-effective approach to drug discovery compared to de novo drug development [3, 13, 20, 21, 24, 31, 37, 39]. By leveraging existing safety and efficacy data, drug repurposing can significantly accelerate the drug development process [13].

Several studies have explored ML and AI for drug repurposing. NeuroCADR [13] is a drug repurposing system that takes a multi-pronged approach combining k-nearest neighbor (KNN) algorithms, random forest classification, and decision trees; it identified novel drug candidates for epilepsy. In [20], a Knowledge Graph-based Machine Learning framework for explainably predicting Drugs Treating Diseases (KGML-xDTD) is proposed, which achieves state-of-the-art performance both in drug repurposing prediction and in recapitulating human-curated drug mechanism-of-action (MOA) paths. In the context of the COVID-19 pandemic, [37] proposes Dr-COVID, a graph neural network (GNN) based drug repurposing model that constructs a four-layered heterogeneous graph to model the complex interactions between drugs, diseases, genes, and anatomies. The study in [31] proposes a multi-agent framework that enhances the drug repurposing process using state-of-the-art machine learning techniques and knowledge integration. Similarly, [39] develops a semi-supervised drug embedding that incorporates two sources of information: (1) the underlying chemical grammar inferred from the chemical structures of drugs and drug-like molecules (unsupervised), and (2) hierarchical relations encoded in an expert-crafted hierarchy of approved drugs (supervised).
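
As a rough illustration of one of the classifier families mentioned above, the following sketch implements a k-nearest-neighbor vote over molecular fingerprints. It is not the NeuroCADR pipeline: the use of Tanimoto similarity, the bit-set fingerprint representation, and the toy training data are all assumptions made for this example.

```python
from collections import Counter

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def knn_predict(query, labelled, k=3):
    """Majority label among the k training items most similar to the query."""
    neighbours = sorted(labelled, key=lambda item: tanimoto(query, item[0]),
                        reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (set of fingerprint bits, activity label).
training = [
    ({1, 2, 3, 4}, "active"),
    ({1, 2, 3, 5}, "active"),
    ({2, 3, 4, 6}, "active"),
    ({7, 8, 9}, "inactive"),
    ({7, 8, 10}, "inactive"),
]
prediction = knn_predict({1, 2, 3, 6}, training, k=3)
```

In a multi-pronged system, a prediction like this could be combined with the outputs of tree-based models, although how [13] combines them is not described here.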

Self-supervised learning can also address the label sparsity that hampers drug repurposing. The study in [24] proposes a multi-task self-supervised learning framework for computational drug repositioning that tackles label sparsity by learning a better drug representation. The framework uses data augmentation strategies and contrastive learning to mine the internal relationships of the original drug features, together with a multi-input decoding network that improves the reconstruction ability of the autoencoder.
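
The contrastive part of such a framework can be sketched with an InfoNCE-style loss, which pulls two augmented views of the same drug together and pushes views of other drugs apart. The choice of InfoNCE, the cosine similarity, the temperature, and the toy feature vectors are assumptions for illustration; the exact loss used in [24] is not specified in this review.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss: -log softmax of the positive pair among all candidates."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[0]

# Hypothetical embeddings: two augmented views of one drug, plus two other drugs.
anchor = [1.0, 0.0, 0.5]
aligned = [0.9, 0.1, 0.6]
negatives = [[-1.0, 0.2, 0.0], [0.0, -1.0, 0.3]]

loss_good = info_nce(anchor, aligned, negatives)
# Treating an unrelated drug as the positive yields a much larger loss.
loss_bad = info_nce(anchor, negatives[0], [aligned, negatives[1]])
```

No labels are needed to compute this loss, which is why objectives of this kind are attractive when labeled drug-disease associations are sparse.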

Addressing Challenges in Drug Discovery with ML/QML

While ML and QML offer significant promise for drug discovery, several challenges need to be addressed to fully realize their potential. These include data scarcity, the need for interpretability, and the development of methods for handling novel compounds.

Data Scarcity and Cold Start Problems

The availability of high-quality, labeled data is often a limiting factor in ML applications, particularly in drug discovery [24, 25]. Many drug discovery tasks, such as predicting drug-target interactions or drug responses, suffer from data scarcity, making it difficult to train robust and accurate models [8, 24].

Several approaches have been developed to address this challenge. Self-supervised learning techniques learn representations from unlabeled data, which can then improve the performance of supervised models [24, 26]. Transfer learning, in which knowledge learned on related tasks is transferred to the target task, can also be effective [8]. For instance, [8] addresses the cold start problem by transferring knowledge from chemical-chemical interaction (CCI) and protein-protein interaction (PPI) tasks to the drug-target interaction task. Because these tasks are similar in nature, the representations learned on CCI and PPI transfer smoothly to drug-target interaction prediction. The study in [25] compares classical and quantum classifiers for QSAR prediction and seeks to demonstrate a quantum advantage in generalization when data availability is limited.

Interpretability

Many ML models, particularly deep learning models, are often considered "black boxes," making it difficult to understand why they make certain predictions [2, 17]. This lack of interpretability can be a major barrier to the adoption of ML in drug discovery, as it can be challenging to trust and validate the predictions made by these models.

Explainable artificial intelligence (XAI) techniques are being developed to address this challenge [17]. XAI methods aim to provide insight into the decision-making process of ML models, making them more transparent and understandable. The survey in [17] gives a comprehensive overview of the state of the art in XAI for drug discovery, covering the available XAI methods, their applications, and their challenges and limitations. The study in [29] proposes a self-attention-based multi-view representation learning approach for modeling drug-target interactions that offers biologically plausible interpretations of those interactions. The KGML-xDTD framework in [20] provides knowledge graph path explanations for drug repurposing predictions by combining prediction outcomes with existing biological knowledge and publications.
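
One way attention-based models yield interpretations is by exposing their attention weights, which can be read as a ranking of which parts of the input mattered most for a prediction. The sketch below computes scaled dot-product attention weights for a single query. It is not the multi-view model of [29]; the residue names and embedding values are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over a list of keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)

# Hypothetical embeddings: one drug substructure attending over three residues.
drug_query = [1.0, 0.5]
residues = {"ASP86": [2.0, 1.0], "GLY12": [0.1, -0.3], "LYS33": [-1.0, 0.4]}

weights = attention_weights(drug_query, list(residues.values()))
# Residues sorted by how much attention the drug substructure assigns to them.
ranked = sorted(zip(residues, weights), key=lambda pair: pair[1], reverse=True)
```

Attention weights are only a heuristic explanation, so a ranking like this would still need to be checked against known binding sites or experimental evidence.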

Handling Novel Structures

Many ML models struggle to generalize to novel chemical structures or biological targets that are not well-represented in the training data [21, 28]. This is particularly problematic in drug discovery, where researchers are often interested in identifying novel drug candidates or targeting previously unexplored proteins.

Several strategies are being explored to address this challenge. Graph neural networks (GNNs), which operate directly on graph-structured data, are particularly well suited to modeling molecular structures [21, 26, 27]. For example, the inductive RGCN in [21] learns informative relation embeddings even in the few-shot regime, and the cross-field information fusion strategy in [28] captures both local and global protein information.
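
The reason GNNs suit molecules is that they update each atom's representation from its bonded neighbors, so the same operation applies to a molecule of any size or shape. A minimal sketch of one message-passing round follows. It uses plain mean aggregation and omits the learned weight matrices and nonlinearities that real GNN layers apply; the one-hot atom features are hypothetical.

```python
def message_pass(features, edges):
    """One round of mean aggregation over each atom and its bonded neighbours."""
    neighbours = {node: [node] for node in features}  # include a self-loop
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    dim = len(next(iter(features.values())))
    return {
        node: [sum(features[n][d] for n in nbrs) / len(nbrs) for d in range(dim)]
        for node, nbrs in neighbours.items()
    }

# Ethanol heavy atoms C-C-O with hypothetical features [is_carbon, is_oxygen].
atoms = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}
bonds = [(0, 1), (1, 2)]

hidden = message_pass(atoms, bonds)
# After one round, the central carbon's vector already reflects the oxygen.
```

Stacking several such rounds lets information travel further along the bond graph, which is how structural context beyond immediate neighbors enters each atom's representation.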

Quantum Machine Learning: A New Frontier

Quantum computing offers the potential to overcome some of the limitations of classical ML, particularly in handling complex data and performing computationally intensive tasks [4, 6, 10, 11]. QML algorithms can potentially accelerate drug discovery by enabling more accurate simulations, faster molecular property predictions, and the efficient exploration of chemical space [4, 6, 10].

Several studies have explored the application of QML to drug discovery [4, 9, 10, 11, 18, 25]. For example, [4] proposes a suite of QML techniques to generate small drug molecules, classify binding pockets in proteins, and generate large drug molecules. The study in [10] discusses the theoretical foundations of quantum machine learning, including data encoding, variational quantum circuits, and hybrid quantum-classical approaches. The study in [11] presents a scalable quantum generative autoencoder (SQ-VAE) for simultaneously reconstructing and sampling drug molecules. The study in [18] builds a compact discrete variational autoencoder (DVAE) with a reduced-size Restricted Boltzmann Machine (RBM) in its latent layer, which fits a state-of-the-art D-Wave quantum annealer and generates novel chemical structures with medicinal chemistry and synthetic accessibility properties. The study in [9] introduces a hybrid quantum-classical deep learning model tailored for binding affinity prediction. The study in [25] compares the performance of classical and quantum classifiers in QSAR prediction.
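
Data encoding and variational quantum circuits can be illustrated with a single-qubit simulation in plain Python. This is a toy sketch, not a circuit from any of the cited works: it assumes angle encoding with an RY rotation, one trainable RY rotation, and a Pauli-Z measurement, and it uses the parameter-shift rule to obtain the gradient a classical optimizer would need.

```python
import math

def ry(theta):
    """2x2 real matrix of a single-qubit RY rotation."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]

def apply(gate, state):
    """Multiply a 2x2 gate into a 2-element state vector."""
    return [sum(g * a for g, a in zip(row, state)) for row in gate]

def expectation_z(x, theta):
    """Encode x with RY(x), apply trainable RY(theta), then return <Z>."""
    state = [1.0, 0.0]                # start in |0>
    state = apply(ry(x), state)       # angle encoding of the classical feature
    state = apply(ry(theta), state)   # variational (trainable) layer
    return state[0] ** 2 - state[1] ** 2

def parameter_shift_grad(x, theta):
    """Gradient of <Z> with respect to theta via the parameter-shift rule."""
    return 0.5 * (expectation_z(x, theta + math.pi / 2)
                  - expectation_z(x, theta - math.pi / 2))

value = expectation_z(0.3, 0.4)
grad = parameter_shift_grad(0.3, 0.4)
```

For this circuit the expectation equals cos(x + theta), so the parameter-shift gradient equals -sin(x + theta). In a hybrid workflow, the expectation values would come from quantum hardware while the parameter update runs on a classical computer.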

Hybrid quantum-classical approaches, which combine the strengths of both quantum and classical computing, are particularly promising [9, 10, 18]. These approaches can leverage quantum computers for specific tasks, such as molecular simulations, while using classical computers for other aspects of the drug discovery pipeline.

Future Directions

The application of ML and QML to drug discovery is still in its early stages, and significant opportunities remain for future research and development. Several key areas warrant further investigation:

  • Development of more robust and interpretable models: Future research should focus on developing ML models that are more robust to noise and data scarcity, and that provide more interpretable predictions. XAI techniques will be critical for building trust and confidence in these models.
  • Integration of multi-modal data: Drug discovery involves a wide range of data sources, including chemical structures, genomic data, clinical trial data, and literature. Future research should focus on developing ML models that can effectively integrate and analyze multi-modal data to gain a more comprehensive understanding of drug action and disease mechanisms.
  • Advancements in quantum machine learning: QML has the potential to significantly accelerate drug discovery, but the technology is still in its infancy. Future research should focus on developing more efficient QML algorithms, building larger and more powerful quantum computers, and exploring the application of QML to a wider range of drug discovery tasks.
  • Automated drug discovery pipelines: The ultimate goal is to create automated drug discovery pipelines that can quickly and efficiently identify new drug candidates. This will require the integration of various ML and QML techniques, as well as the development of new methods for data management, model training, and validation.
  • Addressing ethical considerations: As ML and QML become more widely used in drug discovery, it is important to address ethical considerations, such as data privacy, bias, and the potential for misuse.

The ongoing convergence of quantum computing and artificial intelligence has the potential to revolutionize the field of drug discovery, leading to faster, cheaper, and more effective treatments for a wide range of diseases [6, 19]. While challenges remain, the rapid pace of innovation in both ML and QML suggests that these technologies will play an increasingly important role in the future of medicine.

    ==================================================

    References

  [1] Junde Li, Collin Beaudoin, Swaroop Ghosh. Energy-based Generative Models for Target-specific Drug Discovery. arXiv:2212.02404v1 (2022). Available at: http://arxiv.org/abs/2212.02404v1
  [2] Kun Li, Yida Xiong, Hongzhi Zhang, Xiantao Cai, Bo Du, Wenbin Hu. Small Molecule Drug Discovery Through Deep Learning: Progress, Challenges, and Opportunities. arXiv:2502.08975v1 (2025). Available at: http://arxiv.org/abs/2502.08975v1
  [3] Kun Li, Yong Luo, Xiantao Cai, Wenbin Hu, Bo Du. Zero-shot Learning of Drug Response Prediction for Preclinical Drug Screening. arXiv:2310.12996v1 (2023). Available at: http://arxiv.org/abs/2310.12996v1
  [4] Junde Li, Mahabubul Alam, Congzhou M Sha, Jian Wang, Nikolay V. Dokholyan, Swaroop Ghosh. Drug Discovery Approaches using Quantum Machine Learning. arXiv:2104.00746v1 (2021). Available at: http://arxiv.org/abs/2104.00746v1
  [5] Yuanyuan Zhang, Yingdong Wang, Chaoyong Wu, Lingmin Zhana, Aoyi Wang, Caiping Cheng, Jinzhong Zhao, Wuxia Zhang, Jianxin Chen, Peng Li. Drug-target interaction prediction by integrating heterogeneous information with mutual attention network. arXiv:2404.03516v1 (2024). Available at: http://arxiv.org/abs/2404.03516v1
  [6] Yidong Zhou, Jintai Chen, Jinglei Cheng, Gopal Karemore, Marinka Zitnik, Frederic T. Chong, Junyu Liu, Tianfan Fu, Zhiding Liang. Quantum-machine-assisted Drug Discovery: Survey and Perspective. arXiv:2408.13479v3 (2024). Available at: http://arxiv.org/abs/2408.13479v3
  [7] Yi Zhong, Xueyu Chen, Yu Zhao, Xiaoming Chen, Tingfang Gao, Zuquan Weng. Graph-augmented Convolutional Networks on Drug-Drug Interactions Prediction. arXiv:1912.03702v1 (2019). Available at: http://arxiv.org/abs/1912.03702v1
  [8] Tri Minh Nguyen, Thin Nguyen, Truyen Tran. Mitigating cold start problems in drug-target affinity prediction with interaction knowledge transferring. arXiv:2202.01195v1 (2022). Available at: http://arxiv.org/abs/2202.01195v1
  [9] L. Domingo, M. Chehimi, S. Banerjee, S. He Yuxun, S. Konakanchi, L. Ogunfowora, S. Roy, S. Selvaras, M. Djukic, C. Johnson. A hybrid quantum-classical fusion neural network to improve protein-ligand binding affinity predictions for drug discovery. arXiv:2309.03919v3 (2023). Available at: http://arxiv.org/abs/2309.03919v3
  [10] Anthony M. Smaldone, Yu Shee, Gregory W. Kyro, Chuzhi Xu, Nam P. Vu, Rishab Dutta, Marwa H. Farag, Alexey Galda, Sandeep Kumar, Elica Kyoseva, Victor S. Batista. Quantum Machine Learning in Drug Discovery: Applications in Academia and Pharmaceutical Industries. arXiv:2409.15645v1 (2024). Available at: http://arxiv.org/abs/2409.15645v1
  [11] Junde Li, Swaroop Ghosh. Scalable Variational Quantum Circuits for Autoencoder-based Drug Discovery. arXiv:2112.12563v1 (2021). Available at: http://arxiv.org/abs/2112.12563v1
  [12] Abhijit Gupta. CardiGraphormer: Unveiling the Power of Self-Supervised Learning in Revolutionizing Drug Discovery. arXiv:2307.00859v4 (2023). Available at: http://arxiv.org/abs/2307.00859v4
  [13] Srilekha Mamidala. NeuroCADR: Drug Repurposing to Reveal Novel Anti-Epileptic Drug Candidates Through an Integrated Computational Approach. arXiv:2309.13047v1 (2023). Available at: http://arxiv.org/abs/2309.13047v1
  [14] Josip Mesarić. Novel prediction methods for virtual drug screening. arXiv:2202.06635v1 (2022). Available at: http://arxiv.org/abs/2202.06635v1
  [15] Wenting Ye, Chen Li, Yang Xie, Wen Zhang, Hong-Yu Zhang, Bowen Wang, Debo Cheng, Zaiwen Feng. Causal Intervention for Measuring Confidence in Drug-Target Interaction Prediction. arXiv:2306.00041v2 (2023). Available at: http://arxiv.org/abs/2306.00041v2
  [16] Boyuan Liu. Drug-target affinity prediction method based on consistent expression of heterogeneous data. arXiv:2211.06792v1 (2022). Available at: http://arxiv.org/abs/2211.06792v1
  [17] Roohallah Alizadehsani, Solomon Sunday Oyelere, Sadiq Hussain, Rene Ripardo Calixto, Victor Hugo C. de Albuquerque, Mohamad Roshanzamir, Mohamed Rahouti, Senthil Kumar Jagatheesaperumal. Explainable Artificial Intelligence for Drug Discovery and Development — A Comprehensive Survey. arXiv:2309.12177v2 (2023). Available at: http://arxiv.org/abs/2309.12177v2
  [18] A. I. Gircha, A. S. Boev, K. Avchaciov, P. O. Fedichev, A. K. Fedorov. Hybrid quantum-classical machine learning for generative chemistry and drug design. arXiv:2108.11644v3 (2021). Available at: http://arxiv.org/abs/2108.11644v3
  [19] Catrin Hasselgren, Tudor I. Oprea. Artificial Intelligence for Drug Discovery: Are We There Yet? arXiv:2307.06521v1 (2023). Available at: http://arxiv.org/abs/2307.06521v1
  [20] Chunyu Ma, Zhihan Zhou, Han Liu, David Koslicki. KGML-xDTD: A Knowledge Graph-based Machine Learning Framework for Drug Treatment Prediction and Mechanism Description. arXiv:2212.01384v2 (2022). Available at: http://arxiv.org/abs/2212.01384v2
  [21] Vassilis N. Ioannidis, Da Zheng, George Karypis. Few-shot link prediction via graph neural networks for Covid-19 drug-repurposing. arXiv:2007.10261v1 (2020). Available at: http://arxiv.org/abs/2007.10261v1
  [22] Rıza Özçelik, Derek van Tilborg, José Jiménez-Luna, Francesca Grisoni. Structure-based drug discovery with deep learning. arXiv:2212.13295v1 (2022). Available at: http://arxiv.org/abs/2212.13295v1
  [23] Clemens Isert, Kenneth Atz, Gisbert Schneider. Structure-based drug design with geometric deep learning. arXiv:2210.11250v1 (2022). Available at: http://arxiv.org/abs/2210.11250v1
  [24] Xinxing Yang, Genke Yang, Jian Chu. Self-supervised Learning for Label Sparsity in Computational Drug Repositioning. arXiv:2206.00262v1 (2022). Available at: http://arxiv.org/abs/2206.00262v1
  [25] Wei-Yin Chiang, Po-Yu Kao, Tzu-Lan Yeh, Ya-Chu Yang, Yen-Chu Lin, Alex Zhavoronkov. Enhancing Drug Discovery: Quantum Machine Learning for QSAR Prediction with Incomplete Data. arXiv:2501.13395v1 (2025). Available at: http://arxiv.org/abs/2501.13395v1
  [26] Pengyong Li, Jun Wang, Yixuan Qiao, Hao Chen, Yihuan Yu, Xiaojun Yao, Peng Gao, Guotong Xie, Sen Song. Learn molecular representations from large-scale unlabeled molecules for drug discovery. arXiv:2012.11175v1 (2020). Available at: http://arxiv.org/abs/2012.11175v1
  [27] Amritpal Singh. GraphPrint: Extracting Features from 3D Protein Structure for Drug Target Affinity Prediction. arXiv:2407.10452v1 (2024). Available at: http://arxiv.org/abs/2407.10452v1
  [28] Hongzhi Zhang, Xiuwen Gong, Shirui Pan, Jia Wu, Bo Du, Wenbin Hu. A Cross-Field Fusion Strategy for Drug-Target Interaction Prediction. arXiv:2405.14545v1 (2024). Available at: http://arxiv.org/abs/2405.14545v1
  [29] Brighter Agyemang, Wei-Ping Wu, Michael Yelpengne Kpiebaareh, Zhihua Lei, Ebenezer Nanor, Lei Chen. Multi-View Self-Attention for Interpretable Drug-Target Interaction Prediction. arXiv:2005.00397v2 (2020). Available at: http://arxiv.org/abs/2005.00397v2
  [30] Konstantin Y. Kalitin, Alexey A. Nevzorov, Denis A. Babkov, Alexander A. Spasov, Olga Y. Mukha. Deep learning analysis of intracranial EEG for recognizing drug effects and mechanisms of action. arXiv:2009.12984v3 (2020). Available at: http://arxiv.org/abs/2009.12984v3
  [31] Yoshitaka Inoue, Tianci Song, Tianfan Fu. DrugAgent: Explainable Drug Repurposing Agent with Large Language Model-based Reasoning. arXiv:2408.13378v3 (2024). Available at: http://arxiv.org/abs/2408.13378v3
  [32] Tianyue Cheng, Tianchi Fan, Landi Wang. Genetic Constrained Graph Variational Autoencoder for COVID-19 Drug Discovery. arXiv:2104.11674v1 (2021). Available at: http://arxiv.org/abs/2104.11674v1
  [33] Jianyuan Deng, Zhibo Yang, Iwao Ojima, Dimitris Samaras, Fusheng Wang. Artificial Intelligence in Drug Discovery: Applications and Techniques. arXiv:2106.05386v4 (2021). Available at: http://arxiv.org/abs/2106.05386v4
  [34] Christopher Tosh, Daniel Hsu. Diameter-based Interactive Structure Discovery. arXiv:1906.02101v2 (2019). Available at: http://arxiv.org/abs/1906.02101v2
  [35] Xianbin Ye, Ziliang Li, Fei Ma, Zongbi Yi, Pengyong Li, Jun Wang, Peng Gao, Yixuan Qiao, Guotong Xie. CandidateDrug4Cancer: An Open Molecular Graph Learning Benchmark on Drug Discovery for Cancer. arXiv:2203.00836v2 (2022). Available at: http://arxiv.org/abs/2203.00836v2
  [36] Shahar Harel, Kira Radinsky. Accelerating Prototype-Based Drug Discovery using Conditional Diversity Networks. arXiv:1804.02668v1 (2018). Available at: http://arxiv.org/abs/1804.02668v1
  [37] Siddhant Doshi, Sundeep Prabhakar Chepuri. Dr-COVID: Graph Neural Networks for SARS-CoV-2 Drug Repurposing. arXiv:2012.02151v1 (2020). Available at: http://arxiv.org/abs/2012.02151v1
  [38] Yizhen Zheng, Huan Yee Koh, Maddie Yang, Li Li, Lauren T. May, Geoffrey I. Webb, Shirui Pan, George Church. Large Language Models in Drug Discovery and Development: From Disease Mechanisms to Clinical Trials. arXiv:2409.04481v1 (2024). Available at: http://arxiv.org/abs/2409.04481v1
  [39] Ke Yu, Shyam Visweswaran, Kayhan Batmanghelich. Hyperbolic Molecular Representation Learning for Drug Repositioning. arXiv:2208.06361v1 (2022). Available at: http://arxiv.org/abs/2208.06361v1
  [40] Alun Stokes, William Hum, Jonathan Zaslavsky. A Minimal-Input Multilayer Perceptron for Predicting Drug-Drug Interactions Without Knowledge of Drug Structure. arXiv:2005.10644v1 (2020). Available at: http://arxiv.org/abs/2005.10644v1