Featured Research

Biomolecules

Evaluation of Peppermint Leaf Flavonoids as SARS-CoV-2 Spike Receptor-Binding Domain Attachment Inhibitors to the Human ACE2 Receptor: A Molecular Docking Study

One strategy for combating COVID-19 is virtual screening for possible inhibitors of the attachment of the SARS-CoV-2 Spike receptor-binding domain (RBD) to the human ACE2 receptor. Here, we performed a molecular docking study to propose potential candidates to prevent RBD/ACE2 attachment. These candidates are sixteen different flavonoids present in the peppermint leaf. Results showed that Luteolin 7-O-neohesperidoside is the peppermint flavonoid with the highest binding affinity for the RBD/ACE2 complex (about -9.18 kcal/mol). On the other hand, Sakuranetin was the one with the lowest affinity (about -6.38 kcal/mol). Binding affinities of the other peppermint flavonoids ranged from -6.44 kcal/mol to -9.05 kcal/mol. The binding site surface analysis showed pocket-like regions on the RBD/ACE2 complex that yield several interactions (mostly hydrogen bonds) between the flavonoid and the amino acid residues of the proteins. This study can open channels for understanding the roles of flavonoids against COVID-19 infection.
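To give a feel for what these docking scores would mean if they were read as binding free energies, the following minimal sketch converts them to an implied dissociation constant through the standard relation dG = RT ln(Kd). Treating a docking score as a true free energy is an assumption made here for illustration only, as is the temperature of 298.15 K; the function name `dissociation_constant` is hypothetical and not part of the study.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def dissociation_constant(delta_g_kcal: float, temperature_k: float = 298.15) -> float:
    """Return the Kd (mol/L) implied by a binding free energy, using dG = RT ln(Kd)."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))


# Docking scores quoted in the abstract (assumed here to behave like free energies).
scores = {
    "Luteolin 7-O-neohesperidoside": -9.18,
    "Sakuranetin": -6.38,
}

for name, dg in scores.items():
    print(f"{name}: {dg} kcal/mol -> Kd ~ {dissociation_constant(dg):.2e} M")
```

Under these assumptions, the 2.8 kcal/mol gap between the strongest and weakest binders corresponds to roughly a hundredfold difference in implied Kd, which is why small differences in docking score are usually read as rankings rather than absolute affinities.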


Evolution Is All You Need: Phylogenetic Augmentation for Contrastive Learning

Self-supervised representation learning of biological sequence embeddings alleviates computational resource constraints on downstream tasks while circumventing expensive experimental label acquisition. However, existing methods mostly borrow directly from large language models designed for NLP, rather than being designed with bioinformatics philosophies in mind. Recently, contrastive mutual information maximization methods have achieved state-of-the-art representations for ImageNet. In this perspective piece, we discuss how viewing evolution as natural sequence augmentation and maximizing information across phylogenetic "noisy channels" is a biologically and theoretically desirable objective for pretraining encoders. We first review the current contrastive learning literature, then provide an illustrative example showing that contrastive learning with evolutionary augmentation can serve as a representation learning objective that maximizes the mutual information between biological sequences and their conserved function, and finally outline the rationale for this approach.
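The contrastive objective described above can be sketched with the widely used InfoNCE loss, where each anchor embedding is pulled toward the embedding of its evolutionary "augmentation" (for example, a homologous sequence) and pushed away from the others in the batch. This is a minimal illustration, not the authors' implementation: the function name `info_nce`, the temperature of 0.1, and the toy two-dimensional embeddings are all assumptions.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    positives[i] is the positive view for anchors[i]; every other
    positives[j] in the batch acts as a negative for anchors[i].
    """
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [dot(ai, pj) / temperature for pj in p]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)


# Toy batch: embeddings of two sequences and of their (hypothetical) homologs.
sequences = [[1.0, 0.0], [0.0, 1.0]]
homologs = [[0.9, 0.1], [0.2, 0.8]]
print(info_nce(sequences, homologs))
```

Minimizing this loss tightens a lower bound on the mutual information between the two views, which is the sense in which such objectives are described as mutual information maximization.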


Evolving methods for rational de novo design of functional RNA molecules

Artificial RNA molecules with novel functionality have many applications in synthetic biology, pharmacy and white biotechnology. The de novo design of such devices using computational methods and prediction tools is a resource-efficient alternative to experimental screening and selection pipelines. In this review, we describe methods common to many such computational approaches, thoroughly dissect these methods and highlight open questions for the individual steps. Initially, it is essential to investigate the biological target system, the regulatory mechanism that will be exploited, as well as the desired components in order to define design objectives. Subsequent computational design is needed to combine the selected components and to obtain novel functionality. This process can usually be split into constrained sequence sampling, the formulation of an optimization problem and an in silico analysis to narrow down the number of candidates with respect to secondary goals. Finally, experimental analysis is important to check whether the defined design objectives are indeed met in the target environment and detailed characterization experiments should be performed to improve the mechanistic models and detect missing design requirements.


Explainable Deep Relational Networks for Predicting Compound-Protein Affinities and Contacts

Predicting compound-protein affinity is critical for accelerating drug discovery. Recent progress made by machine learning focuses on accuracy but leaves much to be desired for interpretability. Through molecular contacts underlying affinities, our large-scale interpretability assessment finds commonly used attention mechanisms inadequate. We thus formulate a hierarchical multi-objective learning problem whose predicted contacts form the basis for predicted affinities. We further design a physics-inspired deep relational network, DeepRelations, with an intrinsically explainable architecture. Specifically, various atomic-level contacts or "relations" lead to molecular-level affinity prediction, and the embedded attentions are regularized with predicted structural contexts and supervised with partially available training contacts. DeepRelations shows superior interpretability to the state-of-the-art: without compromising affinity prediction, it boosts the AUPRC of contact prediction 9.5-, 16.9-, 19.3- and 5.7-fold for the test, compound-unique, protein-unique, and both-unique sets, respectively. Our study represents the first dedicated model development and systematic model assessment for interpretable machine learning of compound-protein affinity.


Exploring Chemical Space using Natural Language Processing Methodologies for Drug Discovery

Text-based representations of chemicals and proteins can be thought of as unstructured languages codified by humans to describe domain-specific knowledge. Advances in natural language processing (NLP) methodologies for spoken languages have accelerated the application of NLP to elucidate hidden knowledge in textual representations of these biochemical entities, and to use that knowledge to construct models that predict molecular properties or design novel molecules. This review outlines the impact made by these advances on drug discovery and aims to further the dialogue between medicinal chemists and computer scientists.


Exploring RNA structure and dynamics through enhanced sampling simulations

RNA function is intimately related to its structural dynamics. Molecular dynamics simulations are useful for exploring biomolecular flexibility but are severely limited by the accessible timescale. Enhanced sampling methods allow this timescale to be effectively extended in order to probe biologically-relevant conformational changes and chemical reactions. Here, we review the role of enhanced sampling techniques in the study of RNA systems. We discuss the challenges and promises associated with the application of these methods to force-field validation, exploration of conformational landscapes and ion/ligand-RNA interactions, as well as catalytic pathways. Important technical aspects of these methods, such as the choice of the biased collective variables and the analysis of multi-replica simulations, are examined in detail. Finally, a perspective on the role of these methods in the characterization of RNA dynamics is provided.


Exploring the Regulatory Function of the N-terminal Domain of SARS-CoV-2 Spike Protein Through Molecular Dynamics Simulation

SARS-CoV-2 is the virus that caused the COVID-19 pandemic. Early viral infection is mediated by the SARS-CoV-2 homo-trimeric Spike (S) protein with its receptor binding domains (RBDs) in the receptor-accessible state. We performed molecular dynamics simulations of the S protein with a focus on the function of its N-terminal domains (NTDs). Our study reveals that the NTD acts as a "wedge" and plays a crucial regulatory role in the conformational changes of the S protein. The complete RBD structural transition is allowed only when the neighboring NTD, which typically blocks the RBD's movements as a wedge, detaches and swings away. Based on this NTD "wedge" model, we propose the NTD-RBD interface as a potential drug target.


Featuring ACE2 binding SARS-CoV and SARS-CoV-2 through a conserved evolutionary pattern of amino acid residues

Spike (S) glycoproteins mediate coronavirus entry into the host cell. The S1 subunit of S proteins contains the receptor-binding domain (RBD), which is able to recognize different host receptors, highlighting the remarkable capacity of coronaviruses to adapt to their hosts over the course of viral evolution. While the RBD in spike proteins determines the virus-receptor interaction, the active residues lie in the receptor-binding motif (RBM), a region within the RBD that plays a fundamental role in binding the outer surface of the receptor. Here, we address the hypothesis that SARS-CoV and SARS-CoV-2 strains able to use angiotensin-converting enzyme 2 (ACE2) proteins have adapted their RBM over the course of viral evolution to explore a specific conformational topology driven by the residues YGF to infect host cells. We also speculate that this YGF-based mechanism can act as a protein signature located in the RBM to distinguish coronaviruses able to use ACE2 as a cell entry receptor.


First-Passage Time Distributions in Two-State Protein Folding Kinetics: Exploring the Native-Like States vs Overcoming the Free Energy Barrier

Using a beta-hairpin protein as a representative example of two-state folders, we studied how the exploration of native-like states affects the folding kinetics. It has been found that the first-passage time (FPT) distributions are essentially single-exponential not only for the times to overcome the free energy barrier that separates unfolded and native-like states but also for the times to find the native state among the native-like ones. If the protein explores native-like states for a time much longer than the time to overcome the free energy barrier, which was found to be characteristic of high temperatures, the resulting FPT distribution to reach the native state remains close to exponential but the mean FPT (MFPT) is determined not by the height of the free energy barrier but by the time to explore native-like states. The mean time to overcome the free energy barrier is found to be in reasonable agreement with the Kramers rate formula and generally far shorter than the MFPT to reach the native state. The time to find the native state among native-like ones increases with temperature, which explains the known U-shape dependence of the MFPTs on temperature.


Fixed-Length Protein Embeddings using Contextual Lenses

The Basic Local Alignment Search Tool (BLAST) is currently the most popular method for searching databases of biological sequences. BLAST compares sequences via similarity defined by a weighted edit distance, which makes it computationally expensive. In contrast to edit distance, a vector similarity approach can be accelerated substantially using modern hardware or hashing techniques. Such an approach would require fixed-length embeddings for biological sequences. There has been recent interest in learning fixed-length protein embeddings using deep learning models, under the hypothesis that the hidden layers of supervised or semi-supervised models could produce useful vector embeddings. We consider transformer (BERT) protein language models that are pretrained on the TrEMBL data set and learn fixed-length embeddings on top of them with contextual lenses. The embeddings are trained to predict the family a protein belongs to for sequences in the Pfam database. We show that for nearest-neighbor family classification, pretraining offers a noticeable boost in performance and that the corresponding learned embeddings are competitive with BLAST. Furthermore, we show that the raw transformer embeddings, obtained via static pooling, do not perform well on nearest-neighbor family classification, which suggests that learning embeddings in a supervised manner via contextual lenses may be a compute-efficient alternative to fine-tuning.
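Nearest-neighbor family classification over fixed-length embeddings, as described above, can be illustrated with a few lines of code. This is a minimal sketch under stated assumptions: cosine similarity is one common choice of vector similarity and is not confirmed by the abstract, and the three-dimensional embeddings, the family labels, and the function name `nearest_family` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_family(query, reference):
    """Return the label of the reference embedding most similar to the query.

    reference is a list of (embedding, family_label) pairs.
    """
    best = max(reference, key=lambda item: cosine(query, item[0]))
    return best[1]


# Toy reference set: one labeled embedding per (hypothetical) protein family.
reference = [
    ([0.9, 0.1, 0.0], "family_A"),
    ([0.0, 0.8, 0.2], "family_B"),
    ([0.1, 0.0, 0.9], "family_C"),
]

print(nearest_family([0.7, 0.2, 0.1], reference))
```

Because every sequence maps to a vector of the same length, the brute-force scan shown here can be swapped for an approximate nearest-neighbor index or a hashing scheme, which is the source of the speed-up over alignment-based search mentioned in the abstract.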
