Featured Research

Biomolecules

Machine learning a model for RNA structure prediction

RNA function crucially depends on its structure. Thermodynamic models currently used for secondary structure prediction rely on computing the partition function of folding ensembles, and can thus estimate minimum free-energy structures and ensemble populations. These models sometimes fail to identify native structures unless complemented by auxiliary experimental data. Here, we build a set of models that combine thermodynamic parameters, chemical probing data (DMS, SHAPE), and co-evolutionary data (Direct Coupling Analysis, DCA) through a network that outputs perturbations to the ensemble free energy. The perturbations are trained to increase the ensemble populations of a representative set of known native RNA structures. In the chemical probing nodes of the network, a convolutional window combines neighboring reactivities, revealing their structural information content and the contribution of local conformational ensembles. Regularization is used to limit overfitting and improve transferability. The most transferable model is selected through a cross-validation strategy that estimates performance on systems not used for training. With the selected model, we obtain increased ensemble populations for native structures and more accurate predictions on an independent validation set. The flexibility of the approach allows the model to be easily retrained and adapted to incorporate arbitrary experimental information.
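
To make the "convolutional window" and "perturbation to the ensemble free energy" ideas concrete, here is a minimal sketch under stated assumptions. It is not the authors' network: the window weights, the convention that the pseudo-energy penalizes pairing of a nucleotide, and the three-structure toy ensemble (standing in for a full partition function calculation) are all hypothetical. It shows how a perturbation built from neighboring reactivities can raise the Boltzmann population of a native structure.

```python
import math

RT = 0.6163  # kcal/mol at 37 °C (gas constant times 310.15 K)

def window_perturbation(reactivity, weights):
    """Per-nucleotide pseudo-energy from a centered convolutional window.

    Each nucleotide's value combines its own reactivity with those of its
    neighbors; positions past the sequence ends are zero-padded.
    """
    half = len(weights) // 2
    padded = [0.0] * half + list(reactivity) + [0.0] * half
    return [sum(w * padded[i + j] for j, w in enumerate(weights))
            for i in range(len(reactivity))]

def structure_perturbation(paired, per_nt):
    # Hypothetical convention: the pseudo-energy penalizes pairing of a nucleotide.
    return sum(p for is_paired, p in zip(paired, per_nt) if is_paired)

def population(energies, perturbations, index):
    """Boltzmann population of one structure in a small enumerated ensemble."""
    g = [e + p for e, p in zip(energies, perturbations)]
    g_min = min(g)  # shift energies for numerical stability
    weights = [math.exp(-(x - g_min) / RT) for x in g]
    return weights[index] / sum(weights)

# Toy 6-nt example; all numbers are illustrative, not fitted parameters.
reactivity = [0.1, 0.1, 0.9, 0.8, 0.1, 0.1]
window = [0.25, 0.5, 0.25]  # hypothetical learned convolution weights
per_nt = window_perturbation(reactivity, window)

structures = [
    [1, 1, 0, 0, 1, 1],  # "native": reactive positions 2-3 left unpaired
    [0, 1, 1, 1, 1, 0],  # competitor that pairs the reactive positions
    [0, 0, 0, 0, 0, 0],  # fully unpaired
]
energies = [-1.0, -1.2, 0.0]  # hypothetical thermodynamic free energies (kcal/mol)

before = population(energies, [0.0] * 3, index=0)
after = population(energies, [structure_perturbation(s, per_nt) for s in structures], index=0)
print(f"native population: {before:.2f} -> {after:.2f}")  # native population: 0.39 -> 0.54
```

In a trainable version, `window` (and analogous weights for DMS, SHAPE and DCA inputs) would be adjusted to maximize the log population of known native structures across a training set, with a regularization term on the weights to limit overfitting.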

Biomolecules

Machine learning and AI-based approaches for bioactive ligand discovery and GPCR-ligand recognition

In the last decade, machine learning and artificial intelligence applications have received a significant boost in performance and attention in both academic research and industry. Much of the success of recent state-of-the-art methods can be attributed to the latest developments in deep learning. When applied to scientific domains that involve non-tabular data, such as images or text, deep learning has been shown to outperform not only conventional machine learning but also highly specialized tools developed by domain experts. This review aims to summarize AI-based research for GPCR bioactive ligand discovery, with a particular focus on the most recent achievements and research trends. To make this article accessible to a broad audience of computational scientists, we provide instructive explanations of the underlying methodology, including overviews of the most commonly used deep learning architectures and feature representations of molecular data. We highlight the latest AI-based research that has led to the successful discovery of GPCR bioactive ligands. Equal focus is given to machine learning-based technology that has been applied to ligand discovery in general and has the potential to pave the way for successful GPCR bioactive ligand discovery in the future. The review concludes with a brief outlook on recent research trends in deep learning, such as active learning and semi-supervised learning, which have great potential for advancing bioactive ligand discovery.

Biomolecules

Machine learning-assisted directed protein evolution with combinatorial libraries

To reduce the experimental effort associated with directed protein evolution and to explore the sequence space encoded by mutating multiple positions simultaneously, we incorporate machine learning into the directed evolution workflow. Combinatorial sequence space can be quite expensive to sample experimentally, but machine learning models trained on tested variants provide a fast way to evaluate sequence space computationally. We validate this approach on a large published empirical fitness landscape for the IgG-binding protein GB1, demonstrating that machine learning-guided directed evolution finds variants with higher fitness than those found by other directed evolution approaches. We then provide an example application in evolving an enzyme to produce each of the two possible product enantiomers (stereodivergence) of a new-to-nature carbene Si-H insertion reaction. The approach predicted libraries enriched in functional enzymes and fixed seven mutations in two rounds of evolution to identify variants for selective catalysis with 93% and 79% enantiomeric excess (ee). By greatly increasing throughput with in silico modeling, machine learning enhances the quality and diversity of sequence solutions for a protein engineering problem.
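
The core loop described above (screen a few variants, train a model, then score the whole combinatorial library in silico) can be sketched as follows. This is a minimal illustration, not the published pipeline: the three mutated positions, the 96-variant training plate, ridge regression on one-hot features, and the `screen` function (a hidden additive landscape standing in for the wet-lab assay) are all assumptions made for the example.

```python
import itertools
import numpy as np

AA = "ACDEFGHIKLMNPQRSTVWY"
N_POS = 3  # hypothetical number of simultaneously mutated positions

def one_hot(variants):
    """Encode each variant (a string of N_POS residues) as a flat one-hot vector."""
    X = np.zeros((len(variants), N_POS * len(AA)))
    for row, v in enumerate(variants):
        for pos, aa in enumerate(v):
            X[row, pos * len(AA) + AA.index(aa)] = 1.0
    return X

def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression; centering handles the intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return lambda Z: (Z - x_mean) @ w + y_mean

rng = np.random.default_rng(1)
library = ["".join(c) for c in itertools.product(AA, repeat=N_POS)]  # 20^3 = 8000 variants

# HYPOTHETICAL stand-in for the wet-lab screen: hidden additive landscape plus noise.
hidden = rng.normal(size=N_POS * len(AA))
def screen(variants):
    return one_hot(variants) @ hidden + rng.normal(scale=0.1, size=len(variants))

tested_idx = rng.choice(len(library), size=96, replace=False)  # e.g. one 96-well plate
tested = [library[i] for i in tested_idx]
model = fit_ridge(one_hot(tested), screen(tested))

pred = model(one_hot(library))                   # score every variant in silico
ranking = np.argsort(pred)[::-1]                 # best predicted first
next_round = [library[i] for i in ranking[:10]]  # candidates for the next experimental round
print(next_round)
```

Real fitness landscapes are rarely additive, so in practice one would compare several model classes by cross-validation on the tested variants before trusting the ranking.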

Biomolecules

Machine learning-guided directed evolution for protein engineering

Machine learning (ML)-guided directed evolution is a new paradigm for biological design that enables optimization of complex functions. ML methods use data to predict how sequence maps to function without requiring a detailed model of the underlying physics or biological pathways. To demonstrate ML-guided directed evolution, we introduce the steps required to build ML sequence-function models and use them to guide engineering, making recommendations at each stage. This review covers basic concepts relevant to using ML for protein engineering as well as the current literature and applications of this new engineering paradigm. ML methods accelerate directed evolution by learning from information contained in all measured variants and using that information to select sequences that are likely to be improved. We then provide two case studies that demonstrate the ML-guided directed evolution process. We also look to future opportunities where ML will enable discovery of new protein functions and uncover the relationship between protein sequence and function.
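
The step "using that information to select sequences that are likely to be improved" can be illustrated with an uncertainty-aware selection rule. The sketch below is an assumption, not necessarily what the review recommends: it trains a bootstrap ensemble of ridge models on measured variants and ranks untested candidates by an upper confidence bound (ensemble mean plus `beta` times ensemble spread), which trades off exploiting predicted winners against exploring uncertain regions. The sequences and the `hidden` landscape are synthetic placeholders.

```python
import numpy as np

AA = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(seqs):
    """Flat one-hot encoding of equal-length protein sequences."""
    X = np.zeros((len(seqs), len(seqs[0]) * len(AA)))
    for row, s in enumerate(seqs):
        for pos, aa in enumerate(s):
            X[row, pos * len(AA) + AA.index(aa)] = 1.0
    return X

def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression with a centered intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return lambda Z: (Z - x_mean) @ w + y_mean

def ucb_select(X_train, y_train, X_cand, n_models=10, beta=1.0, batch=8, seed=0):
    """Rank candidates by ensemble mean + beta * ensemble spread (upper confidence bound)."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, len(y_train), size=len(y_train))  # bootstrap resample
        preds.append(fit_ridge(X_train[idx], y_train[idx])(X_cand))
    preds = np.array(preds)
    score = preds.mean(axis=0) + beta * preds.std(axis=0)
    return np.argsort(score)[::-1][:batch], score

rng = np.random.default_rng(42)
def random_seqs(n, length=5):
    return ["".join(rng.choice(list(AA), size=length)) for _ in range(n)]

measured, candidates = random_seqs(40), random_seqs(500)
hidden = rng.normal(size=5 * len(AA))  # HYPOTHETICAL additive landscape standing in for assay data
y = one_hot(measured) @ hidden
chosen, score = ucb_select(one_hot(measured), y, one_hot(candidates))
print([candidates[i] for i in chosen])
```

Setting `beta=0` recovers pure greedy selection on the ensemble mean; larger values favor candidates the models disagree on.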

Biomolecules

Machine-learning a virus assembly fitness landscape

Realistic evolutionary fitness landscapes are notoriously difficult to construct. A recent cutting-edge model of virus assembly consists of a dodecahedral capsid with 12 corresponding packaging signals in three affinity bands. The whole genome/phenotype space, consisting of 3^12 (531,441) genomes, has been explored via computationally expensive stochastic assembly models, giving a fitness landscape in terms of assembly efficiency. Using recent machine-learning techniques, namely a neural network, we show that this intensive computation can be short-circuited in a matter of minutes with very high accuracy.
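
The "short-circuit" idea is to train a neural-network surrogate on a subset of simulated genomes and then query it instead of the simulator. Below is a minimal NumPy sketch under explicit assumptions: each genome is encoded as 12 affinity-band labels (one-hot, 36 features), the network has one tanh hidden layer trained by full-batch gradient descent, and `simulated_efficiency` is a hypothetical stand-in for the stochastic assembly model, not the model used in the study.

```python
import numpy as np

N_SIGNALS, N_BANDS = 12, 3
TOTAL_GENOMES = N_BANDS ** N_SIGNALS  # 3^12 = 531,441

def one_hot(genomes):
    """(n, 12) affinity-band indices -> (n, 36) one-hot features."""
    n = genomes.shape[0]
    x = np.zeros((n, N_SIGNALS * N_BANDS))
    cols = np.arange(N_SIGNALS) * N_BANDS + genomes
    x[np.arange(n)[:, None], cols] = 1.0
    return x

def simulated_efficiency(genomes):
    """HYPOTHETICAL stand-in for the expensive stochastic assembly model:
    efficiency rises with the fraction of signals in band 0 (assumed strongest)."""
    strong = (genomes == 0).mean(axis=1)
    return 1.0 / (1.0 + np.exp(-8.0 * (strong - 0.5)))

rng = np.random.default_rng(0)
train = rng.integers(0, N_BANDS, size=(2000, N_SIGNALS))  # genomes we "paid" to simulate
X, y = one_hot(train), simulated_efficiency(train)[:, None]

# One hidden layer of tanh units, trained by full-batch gradient descent on MSE.
H, lr = 32, 0.05
W1 = rng.normal(0.0, 0.1, size=(X.shape[1], H)); b1 = np.zeros(H)
W2 = rng.normal(0.0, 0.1, size=(H, 1)); b2 = np.zeros(1)

def forward(inputs):
    hidden = np.tanh(inputs @ W1 + b1)
    return hidden, hidden @ W2 + b2

losses = []
for _ in range(2000):
    h, pred = forward(X)
    err = pred - y
    losses.append(float(np.mean(err ** 2)))
    g_out = 2.0 * err / len(X)               # dLoss/dpred
    g_hid = (g_out @ W2.T) * (1.0 - h ** 2)  # backprop through tanh (uses W2 before update)
    W2 -= lr * (h.T @ g_out); b2 -= lr * g_out.sum(axis=0)
    W1 -= lr * (X.T @ g_hid); b1 -= lr * g_hid.sum(axis=0)

# Query the surrogate on genomes it never saw instead of running the simulator.
held_out = rng.integers(0, N_BANDS, size=(500, N_SIGNALS))
_, surrogate = forward(one_hot(held_out))
mse = float(np.mean((surrogate[:, 0] - simulated_efficiency(held_out)) ** 2))
print(f"train MSE {losses[0]:.4f} -> {losses[-1]:.4f}, held-out MSE {mse:.4f}")
```

Once trained, scoring all 531,441 genomes is a batch of matrix products, which is why a surrogate can replace days of stochastic simulation with minutes of inference.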

Biomolecules

Macromolecule Classification Based on the Amino-acid Sequence

Deep learning plays a vital role in many fields that involve data. It has emerged as a strong and efficient framework that can be applied to a broad spectrum of complex learning problems that were difficult to solve with traditional machine learning techniques. In this study, we focused on classifying protein sequences with deep learning techniques. The study of amino acid sequences is vital in the life sciences. We used different word-embedding techniques from natural language processing to represent amino acid sequences as vectors. Our main goal was to classify sequences into four classes: DNA, RNA, protein, and hybrid. After several tests, we achieved almost 99% training and test accuracy. We experimented with CNN, LSTM, bidirectional LSTM, and GRU architectures.
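
Treating an amino acid sequence like a sentence requires splitting it into "words" and mapping those to vectors. One common choice, assumed here rather than taken from the study, is overlapping 3-mers. The sketch below builds a k-mer vocabulary, encodes a sequence as a fixed-length list of token ids (with hypothetical reserved ids for padding and unknown k-mers), and looks up an embedding table. The random table is a placeholder for embeddings that would in practice be learned, for example by word2vec or jointly with the CNN/LSTM/GRU classifier.

```python
import numpy as np

PAD, UNK = 0, 1  # reserved token ids (hypothetical convention)

def kmers(seq, k=3):
    """Split a sequence into overlapping k-mers, the 'words' of the sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def build_vocab(sequences, k=3):
    """Assign an integer id to every k-mer seen in the training sequences."""
    vocab = {}
    for s in sequences:
        for w in kmers(s, k):
            if w not in vocab:
                vocab[w] = len(vocab) + 2  # ids 0 and 1 are reserved
    return vocab

def encode(seq, vocab, max_len, k=3):
    """Map a sequence to a fixed-length list of token ids (truncate or pad)."""
    ids = [vocab.get(w, UNK) for w in kmers(seq, k)][:max_len]
    return ids + [PAD] * (max_len - len(ids))

train = ["MKTAYIAKQR", "GAVLIMKTAY"]  # toy sequences
vocab = build_vocab(train)
ids = encode("MKTAYW", vocab, max_len=10)

# Placeholder embedding table; a real model would learn these vectors.
rng = np.random.default_rng(0)
table = rng.normal(size=(len(vocab) + 2, 8))
x = table[ids]  # (max_len, 8) matrix that a CNN, LSTM or GRU layer would consume
print(ids, x.shape)  # [2, 3, 4, 1, 0, 0, 0, 0, 0, 0] (10, 8)
```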

Biomolecules

Mapping active allosteric loci SARS-CoV Spike Proteins by means of Protein Contact Networks

Coronaviruses are a class of viruses responsible for the recent outbreak of severe acute respiratory syndrome in humans. The molecular machinery behind viral entry, and thus infectivity, is based on the formation of a complex between the viral spike protein and angiotensin-converting enzyme 2 (ACE2). Detecting putative allosteric sites on the viral spike protein could open a path to allosteric drugs that weaken the spike-ACE2 interface and thus reduce viral infectivity. In this work, we present the results of applying the Protein Contact Network (PCN) paradigm to the SARS-CoV spike-ACE2 complex for both the 2003 SARS virus and the recent 2019 coronavirus (SARS-CoV-2). The results point to a specific region, present in both structures, that is predicted to act as an allosteric site modulating the binding of the spike protein to ACE2.
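
A protein contact network treats residues as nodes and links residues whose Cα atoms lie within a distance window; a 4 to 8 Å window is commonly used in the PCN literature, but treat these thresholds as an assumption here. The sketch builds such a network from coordinates and splits it into two modules using the sign of the Fiedler vector of the graph Laplacian, one standard spectral-clustering step. The analyses actually used to flag the allosteric region may differ, and the toy coordinates below are hypothetical.

```python
import numpy as np

def contact_network(ca_coords, d_min=4.0, d_max=8.0):
    """Adjacency matrix linking residues whose C-alpha atoms lie d_min..d_max angstroms apart."""
    xyz = np.asarray(ca_coords, dtype=float)
    dist = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)
    adj = ((dist >= d_min) & (dist <= d_max)).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-contacts
    return adj

def spectral_bipartition(adj):
    """Split the network in two by the sign of the Fiedler vector of the Laplacian."""
    laplacian = np.diag(adj.sum(axis=1)) - adj
    _, vecs = np.linalg.eigh(laplacian)  # eigenvalues returned in ascending order
    return vecs[:, 1] > 0                # second-smallest eigenvector

# Toy chain: six residues spaced 5 A apart, so only sequence neighbors are in contact.
coords = [(5.0 * i, 0.0, 0.0) for i in range(6)]
adj = contact_network(coords)
labels = spectral_bipartition(adj)
print(adj.sum(axis=1), labels)  # node degrees and module membership
```

On a real spike-ACE2 structure, the coordinates would come from the Cα atoms of a PDB entry, and modules or residues bridging modules become candidate regions for allosteric modulation.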

Biomolecules

Mass-Resolved Electronic Circular Dichroism Ion Spectroscopy

DNA and proteins are chiral: their three-dimensional structures cannot be superimposed on their mirror images. Circular dichroism spectroscopy is widely used to characterize chiral compounds, but data interpretation is difficult in the case of mixtures. For the first time, we recorded the electronic circular dichroism spectra of DNA helices separated in a mass spectrometer. We electrosprayed guanine-rich strands with various secondary structures as negative ions, irradiated them with a laser, and measured the difference in electron photodetachment efficiency between left and right circularly polarized light. The reconstructed circular dichroism ion spectra resemble the solution ones, thereby allowing us to assign the DNA helical topology. The ability to measure circular dichroism directly on biomolecular ions expands the capabilities of mass spectrometry for structural analysis.
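
The measured quantity, a difference in photodetachment efficiency between the two circular polarizations, can be turned into a spectrum in several ways. The sketch below assumes one of them: a per-wavelength yield (charge-reduced ions per precursor ion, normalized by laser pulse energy) and the anisotropy (dissymmetry) factor g = 2(Y_L - Y_R)/(Y_L + Y_R) familiar from solution CD. The normalization and all counts are hypothetical, and the study's actual data treatment may differ.

```python
def detachment_yield(n_detached, n_precursor, pulse_energy):
    """Electron photodetachment yield: charge-reduced ions per precursor ion,
    normalized by laser pulse energy (a hypothetical normalization choice)."""
    return n_detached / (n_precursor * pulse_energy)

def dissymmetry(yield_left, yield_right):
    """Anisotropy factor g = 2 (Y_L - Y_R) / (Y_L + Y_R)."""
    total = yield_left + yield_right
    if total == 0:
        raise ValueError("no photodetachment signal at this wavelength")
    return 2.0 * (yield_left - yield_right) / total

# Hypothetical counts: wavelength (nm) -> {polarization: (detached, precursor, pulse energy in mJ)}
scans = {
    260: {"L": (1200, 100_000, 1.0), "R": (1000, 100_000, 1.0)},
    290: {"L": (800, 100_000, 1.0), "R": (900, 100_000, 1.0)},
}
spectrum = {
    wl: dissymmetry(detachment_yield(*s["L"]), detachment_yield(*s["R"]))
    for wl, s in scans.items()
}
print(spectrum)  # positive band at 260 nm, negative band at 290 nm
```

The sign pattern across wavelengths, rather than the absolute magnitude, is what gets compared with solution CD spectra to assign helical topology.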

Biomolecules

Mathematical deep learning for pose and binding affinity prediction and ranking in D3R Grand Challenges

Advanced mathematics, such as multiscale weighted colored graphs and element-specific persistent homology, and machine learning, including deep neural networks, were integrated to construct mathematical deep learning models for pose and binding affinity prediction and ranking in the last two D3R Grand Challenges in computer-aided drug design and discovery. D3R Grand Challenge 2 (GC2) focused on pose prediction, binding affinity ranking, and free energy prediction for Farnesoid X receptor ligands. Our models took first place in absolute free energy prediction for free energy Set 1 in Stage 2. The latest competition, D3R Grand Challenge 3 (GC3), is considered the most difficult challenge so far. It has 5 subchallenges involving Cathepsin S and five kinase targets, namely VEGFR2, JAK2, p38-α, TIE2, and ABL1. GC3 comprises 26 official competitive tasks in total, and our predictions were ranked first in 10 of them.
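
To give a feel for "multiscale weighted colored graph" features, here is a simplified sketch, not the authors' implementation. Atoms are "colored" by element, each protein-ligand atom pair within a cutoff contributes a generalized exponential kernel exp(-(r/eta)^kappa), contributions are summed per element pair, and the procedure is repeated at several length scales `eta`. The cutoff, kernel exponent, scale set and toy atoms are assumptions for illustration; persistent-homology features would be computed separately and concatenated before the neural network.

```python
import math
from collections import defaultdict

def kernel(r, eta, kappa=2.0):
    """Generalized exponential kernel exp(-(r/eta)^kappa); eta sets the length scale."""
    return math.exp(-((r / eta) ** kappa))

def element_pair_features(protein, ligand, eta, kappa=2.0, cutoff=12.0):
    """Sum kernel-weighted protein-ligand atom pairs for each (protein element, ligand element) color.

    protein, ligand: lists of (element, (x, y, z)) in angstroms.
    """
    feats = defaultdict(float)
    for p_el, p_xyz in protein:
        for l_el, l_xyz in ligand:
            r = math.dist(p_xyz, l_xyz)
            if r <= cutoff:
                feats[(p_el, l_el)] += kernel(r, eta, kappa)
    return dict(feats)

def multiscale_features(protein, ligand, etas=(1.0, 2.5, 5.0)):
    """Collect element-pair features over several kernel scales (hypothetical scale set)."""
    return {(eta,) + pair: value
            for eta in etas
            for pair, value in element_pair_features(protein, ligand, eta).items()}

protein = [("C", (0.0, 0.0, 0.0)), ("N", (0.0, 0.0, 20.0))]  # toy atoms; N lies beyond the cutoff
ligand = [("C", (1.0, 0.0, 0.0))]
features = multiscale_features(protein, ligand)
print(features)
```

A fixed, ordered list of (scale, element pair) keys turns each complex into a feature vector of constant length, which is what a downstream regressor or deep network needs.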

Biomolecules

Mathematical modeling and analysis of the pathway network consisting of symmetrical complexes with N monomers, like the activation of MMP2

The activation of matrix metalloproteinase 2 (MMP2) is a crucial event during tumor metastasis and invasion, and its pathway network involves three monomers. The pathway network of this activation obeys a set of specified reaction rules. According to these rules, once a homodimer has formed, the individual molecules localize around it in a particular order and symmetrically. We generalized the homodimer pathway network obeying similar reaction rules by changing the number of monomers involved from 3 to N. In previous work, we found that the molecules in the pathway network can be classified into reaction groups. We derived the law of mass conservation among the groups, and each group concentration converges to its equilibrium solution. Using these results, we derive the concentrations of the complexes theoretically and show that each complex concentration converges to its equilibrium value. Hence, the pathway network with homodimer-symmetric complexes is asymptotically stable, and we identify the regulatory parameter of the target complex in the network. Our mathematical approach may help explain the mechanism of this type of pathway network by revealing the mathematical laws that govern it.
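
The two properties emphasized above, a mass-conservation law and convergence of every concentration to its equilibrium, can be checked numerically on the simplest possible member of this family. The sketch below uses a toy homodimerization 2M ⇌ D with mass-action kinetics, not the N-monomer MMP2 network itself; the rate constants and initial concentrations are hypothetical. The conserved quantity is M + 2D, and the equilibrium monomer concentration is the positive root of 2·k_on·M² + k_off·M − k_off·T = 0.

```python
import math

def rates(m, d, k_on, k_off):
    """Mass-action rates for the toy reaction 2M <-> D; returns (dM/dt, dD/dt)."""
    forward = k_on * m * m
    backward = k_off * d
    return -2.0 * forward + 2.0 * backward, forward - backward

def rk4_step(m, d, dt, k_on, k_off):
    """One classical fourth-order Runge-Kutta step."""
    k1 = rates(m, d, k_on, k_off)
    k2 = rates(m + dt / 2 * k1[0], d + dt / 2 * k1[1], k_on, k_off)
    k3 = rates(m + dt / 2 * k2[0], d + dt / 2 * k2[1], k_on, k_off)
    k4 = rates(m + dt * k3[0], d + dt * k3[1], k_on, k_off)
    m += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
    d += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return m, d

def equilibrium_monomer(total, k_on, k_off):
    """Positive root of 2 k_on M^2 + k_off M - k_off T = 0, using D = (T - M) / 2."""
    return (-k_off + math.sqrt(k_off ** 2 + 8.0 * k_on * k_off * total)) / (4.0 * k_on)

k_on, k_off = 1.0, 0.5  # hypothetical rate constants
m, d = 1.0, 0.0         # start from pure monomer
total = m + 2.0 * d     # conserved total monomer count

for _ in range(2000):   # integrate to t = 20
    m, d = rk4_step(m, d, 0.01, k_on, k_off)

print(f"M = {m:.6f}, analytic M_eq = {equilibrium_monomer(total, k_on, k_off):.6f}, "
      f"M + 2D = {m + 2 * d:.12f}")
```

Because M + 2D is a linear invariant, Runge-Kutta integration preserves it up to rounding error, so any drift would signal a mistake in the rate equations rather than in the solver.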
