Accelerating Drug Discovery With the AI Behind ChatGPT – Screening 100 Million Compounds a Day

by Henrik Andersen

MIT and Tufts University researchers have developed an AI model called ConPLex that revolutionizes drug discovery by predicting drug-protein interactions without requiring the calculation of molecular structures. This groundbreaking model can screen more than 100 million compounds per day, offering significant potential for reducing drug development failure rates and costs.

By leveraging a language model to analyze protein-drug interactions, scientists can rapidly screen vast libraries of potential drug compounds. These libraries hold immense promise for treating various diseases, including cancer and heart disease. However, experimentally testing each compound against all potential targets is time-consuming and impractical.

To expedite drug discovery, computational methods have been employed to screen these libraries in recent years. However, many of these methods are time-consuming as they calculate the three-dimensional structures of target proteins from their amino acid sequences to predict their interactions with drug molecules.

MIT and Tufts University researchers have developed an alternative computational approach based on a large language model, the type of AI of which ChatGPT is a prominent example. These models analyze large amounts of text and determine which words (or, in this case, amino acids) are most likely to be associated with one another. The newly developed model, ConPLex, can match target proteins with potential drug molecules without requiring computationally intensive calculations of molecular structures.

This approach enables researchers to screen more than 100 million compounds in a single day, more than existing models can handle. Bonnie Berger, head of the Computation and Biology group in MIT’s Computer Science and Artificial Intelligence Laboratory, says the work addresses the need for efficient and accurate in silico screening of potential drug candidates. The scalability of ConPLex makes possible large-scale screens for assessing off-target effects, repurposing drugs, and evaluating the impact of mutations on drug binding.
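To put that figure in perspective, a quick back-of-the-envelope calculation converts 100 million compounds per day into a per-second rate. This is an illustration of the headline number only, not a benchmark from the paper, and the variable names are made up for the example.

```python
# Back-of-the-envelope conversion of the headline throughput figure.
# Illustrative only: this is arithmetic on the quoted number, not a measurement.
COMPOUNDS_PER_DAY = 100_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

per_second = COMPOUNDS_PER_DAY / SECONDS_PER_DAY
print(f"{per_second:.0f} compounds per second")  # prints "1157 compounds per second"
```

In other words, sustaining that pace means evaluating more than a thousand candidate compounds every second.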

The paper was published in the Proceedings of the National Academy of Sciences on June 8. Lenore Cowen, a professor of computer science at Tufts University, is a senior author. The lead authors are Rohit Singh, a research scientist at MIT’s Computer Science and Artificial Intelligence Laboratory, and Samuel Sledzieski, an MIT graduate student. Bryan Bryson, an associate professor of biological engineering at MIT and a member of the Ragon Institute of MGH, MIT, and Harvard, is also an author. The researchers have also made their model publicly available online for other scientists to use.

In recent years, researchers have made substantial progress in building models that predict protein structures from amino acid sequences. However, using these models to predict how a large library of potential drugs interacts with a cancerous protein, for instance, has proven challenging because calculating the protein structures requires considerable time and computational power.

Another hurdle is that these models have a limited ability to rule out decoy compounds, which closely resemble successful drugs but do not interact effectively with the target. Because decoys differ only slightly from genuine drugs, the models may still predict that they will bind.

While models addressing this fragility have been designed, they are typically tailored to specific classes of drug molecules and are ill-suited for large-scale screening due to lengthy computations.

MIT researchers opted for an alternative approach based on a protein model they developed in 2019. They worked with a database containing over 20,000 proteins and encoded this information into meaningful numerical representations of each amino acid sequence. These representations capture associations between sequence and structure.

The language model gives similar representations to proteins that have distinct sequences but potentially similar structures or functions, placing them close together in a shared language space. The researchers can then use these representations to make predictions.

In their study, the researchers employed the protein model to determine which protein sequences interact with specific drug molecules. A neural network transforms the numerical representations of both the protein sequences and the drug molecules into a common shared space. Trained on known protein-drug interactions, the network learned to associate specific features of proteins with drug-binding ability without calculating the 3D structure of the molecules.
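The idea of a common shared space can be illustrated with a minimal sketch. It assumes that proteins and drugs each arrive as fixed-length feature vectors, that each is projected by a single neural-network layer, and that closeness in the shared space is measured with cosine similarity. The dimensions, the random untrained weights, and the function names are all hypothetical; this is not the ConPLex code or its actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature sizes, chosen only for illustration.
PROTEIN_DIM, DRUG_DIM, SHARED_DIM = 64, 32, 16

# Random weights stand in for a trained network; a real model would
# learn these from known protein-drug interactions.
W_protein = rng.normal(size=(PROTEIN_DIM, SHARED_DIM))
W_drug = rng.normal(size=(DRUG_DIM, SHARED_DIM))


def project(features, weights):
    """Map features into the shared space (one ReLU layer) and L2-normalize."""
    z = np.maximum(features @ weights, 0.0)
    norms = np.linalg.norm(z, axis=-1, keepdims=True)
    return z / np.clip(norms, 1e-12, None)  # avoid division by zero


def interaction_scores(protein_vec, drug_matrix):
    """Cosine similarity between one protein and every drug in a library."""
    p = project(protein_vec, W_protein)   # shape: (SHARED_DIM,)
    d = project(drug_matrix, W_drug)      # shape: (n_drugs, SHARED_DIM)
    return d @ p                          # shape: (n_drugs,)


# Synthetic stand-ins for one target protein and a 1,000-compound library.
protein = rng.normal(size=PROTEIN_DIM)
drugs = rng.normal(size=(1000, DRUG_DIM))

scores = interaction_scores(protein, drugs)
top5 = np.argsort(scores)[::-1][:5]  # indices of the five highest-scoring compounds
print(top5, scores[top5])
```

In a setup like this, once every compound in a library has been projected into the shared space, scoring a new protein against the whole library reduces to a single matrix product. No 3D structure is computed at any step, which is one plausible reason an approach of this kind can scale to very large screens.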
