Scientific Publications

Artificial Intelligence for Drug Discovery: Are We There Yet?

Annual Review of Pharmacology and Toxicology 2024 64:1

Drug discovery is adapting to novel technologies such as data science, informatics, and artificial intelligence (AI) to accelerate effective treatment development while reducing costs and animal experiments. AI is transforming drug discovery, as indicated by increasing interest from investors, industrial and academic scientists, and legislators. Successful drug discovery requires optimizing properties related to pharmacodynamics, pharmacokinetics, and clinical outcomes. This review discusses the use of AI in the three pillars of drug discovery: diseases, targets, and therapeutic modalities, with a focus on small-molecule drugs. AI technologies, such as generative chemistry, machine learning, and multiproperty optimization, have enabled several compounds to enter clinical trials. The scientific community must carefully vet known information to address the reproducibility crisis. The full potential of AI in drug discovery can only be realized with sufficient ground truth and appropriate human intervention at later pipeline stages.

Autophagy dark genes: Can we find them with machine learning?

Natural Sciences. 2023; 3:e20220067.

Identifying novel autophagy (ATG) associated genes in humans remains an important task for understanding this fundamental physiological process. Machine learning (ML) can highlight potentially “missing pieces” linking core ATG genes with understudied, “dark” genes by mining functional genomic data. Here, a set of 103 genes (out of 288 from the Autophagy Database) was used as the training set. These genes were selected because they carry ATG-associated terms in 3 secondary sources: GO (Gene Ontology), the Kyoto Encyclopedia of Genes and Genomes pathway, and UniProt keywords, which provides additional confirmation of their importance in ATG. As negative labels, an OMIM list of genes associated with monogenic diseases was used (after excluding the 288 ATG-associated genes). Data related to these genes from 17 different sources were compiled and used to derive a trained MetaPath/XGBoost (MPxgb) ML model for distinguishing ATG and non-ATG genes (10-fold cross-validated, 100-times randomized models, median area under the curve = 0.994 ± 0.008). Sixteen ATG-relevant variables explained 64% of the total model gain. Overall, 23% of the top 251 predicted genes are annotated in the Autophagy Database, whereas 193 genes (77%) are not. In 2019, we suggested that some of these 193 genes may represent “ATG dark genes.” A literature search in 2022 for the top 20 predicted ATG dark genes found that 9 were subsequently reported as ATG genes during the intervening 3.5 years. A post-factum evaluation of data leakage (the presence of ATG-associated terms in the top 40 ML features) confirms that 7 out of these 9 genes and 2 out of 3 other recently validated predictions from the bottom 20 are novel. Those genes with the largest number of ATG features would be most likely to yield valuable experimental insights. Modern high-throughput testing would be capable of spanning the full list of 193 genes reported here. Our analysis demonstrates that ML can guide genomics research to gain a more complete functional and pathway annotation of complex processes.

Novel drug targets in 2022

Nature Reviews Drug Discovery 22, 437 (2023)

In 2022, the number of novel drugs approved in the USA, European Union and Japan declined, dropping from ~60 in recent years to 48. Of these, 35 have well-established mechanism-of-action (MoA) targets (Nat. Rev. Drug Discov. 16, 19–34; 2017) as described in package inserts and primary literature. Here, we focus on the 12 drugs approved in 2022 with novel MoA targets (Table 1). These targets have not been modulated by any previously approved drug.

Molecular Complexity: You Know It When You See It

J. Med. Chem. 2023, https://doi.org/10.1021/acs.jmedchem.3c01507

Molecular complexity (MC) lacks a universal definition, but various studies address it in contexts ranging from ligand–receptor interactions to DNA sequencing, with the overarching emphasis being its significance in synthetic organic chemistry and pharmaceutical research. Efforts to quantify MC in drug discovery have been numerous, but a unified approach remains challenging. Strategies based on graph theory, information theory, and substructural feature counts employed to gauge MC are often correlated to molecular weight (MW). Herbert Waldmann and his team introduced a new MC metric called the spacial score (SPS), which is based on factors like atom hybridization and stereoisomeric considerations. While SPS and its normalized version, nSPS, correlate with the natural product likeness score, they do not align with traditional chemical properties. We examined nSPS trends for approved drugs and found no significant changes in MC over eight decades, nor did nSPS capture drug innovation during that period. Furthermore, our analysis indicates that while the majority of approved drugs have an nSPS value between 10 and 20, this metric does not correlate with key drug properties like target bioactivity and oral bioavailability. Mirroring a chemist’s intuitive sense of chemical complexity, nSPS addresses the need for a precise empirical tool while a universal definition of MC remains elusive.

Exploring DrugCentral: from molecular structures to clinical effects

J Comput Aided Mol Des (2023). https://doi.org/10.1007/s10822-023-00529-x

DrugCentral, accessible at https://drugcentral.org, is an open-access online drug information repository. It covers over 4950 drugs, incorporating structural, physicochemical, and pharmacological details to support drug discovery, development, and repositioning. With around 20,000 bioactivity data points, manual curation enhances information from several major digital sources. Approximately 724 mechanism-of-action (MoA) targets offer updated drug target insights. The platform captures clinical data: over 14,300 on- and off-label uses, 27,000 contraindications, and around 340,000 adverse drug events from pharmacovigilance reports. DrugCentral encompasses information from molecular structures to marketed formulations, providing a comprehensive pharmaceutical reference. Users can easily navigate basic drug information and key features, making DrugCentral a versatile, unique resource. Furthermore, we present a use-case example where we utilize experimentally determined data from DrugCentral to support drug repurposing. A minimum activity threshold t should be considered against novel targets to repurpose a drug. Analyzing 1156 bioactivities for human MoA targets suggests a general threshold of 1 µM (t = 6 when expressed as −log[Activity(M)]). This applies to 87% of the drugs. Moreover, t can be refined empirically based on water solubility (S): t = 3 − logS, for logS < −3. Alongside the drug repurposing classification scheme, which considers intellectual property rights, market exclusivity protections, and market accessibility, DrugCentral provides valuable data to prioritize candidates for drug repurposing programs efficiently.
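
The threshold rule in the abstract above can be illustrated with a minimal Python sketch. This is not code from the paper or from DrugCentral. The function names (activity_to_p, min_activity_threshold, meets_threshold) are hypothetical, and the sketch assumes that logS is the base-10 logarithm of water solubility in mol/L and that a drug passes when its −log[Activity(M)] is at or above t.

```python
import math
from typing import Optional


def activity_to_p(activity_molar: float) -> float:
    """Convert an activity in mol/L to the -log10 scale (1 µM gives about 6)."""
    if activity_molar <= 0:
        raise ValueError("activity must be a positive concentration in mol/L")
    return -math.log10(activity_molar)


def min_activity_threshold(log_s: Optional[float] = None) -> float:
    """Return the minimum activity threshold t on the -log10 scale.

    General rule from the abstract: t = 6 (1 µM).
    Solubility refinement from the abstract: t = 3 - logS, for logS < -3.
    Assumption: when logS is unknown or >= -3, the general t = 6 is used.
    """
    if log_s is not None and log_s < -3:
        return 3 - log_s
    return 6.0


def meets_threshold(activity_molar: float, log_s: Optional[float] = None) -> bool:
    """Assumed pass criterion: -log10(activity) is at or above t."""
    return activity_to_p(activity_molar) >= min_activity_threshold(log_s)


if __name__ == "__main__":
    # Hypothetical candidate: 0.5 µM activity against a novel target.
    print(meets_threshold(5e-7))              # general threshold, t = 6
    print(meets_threshold(5e-7, log_s=-5.0))  # poorly soluble drug, t = 8
```

Under these assumptions the two rules agree at logS = −3, where 3 − logS equals 6, so the refinement only raises the required potency for less soluble drugs.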