Featured Research

Biomolecules

DeepAcid: Classification of macromolecule type based on sequences of amino acids

The study of amino acid sequences is vital in the life sciences. In this paper, we use deep learning to classify macromolecules by type from their amino acid sequences. Deep learning has emerged as a strong and efficient framework that can be applied to a broad spectrum of complex learning problems that were difficult to solve with traditional machine learning techniques. We use word embeddings from NLP to represent amino acid sequences as vectors, and we compare several deep learning models for macromolecule classification, including CNN, LSTM, and GRU. A convolutional neural network can extract features from the vectorized amino acid sequences. The extracted features are then fed to different types of models to train a robust classifier. Our results show that Word2vec embeddings combined with VGG-16 perform better than LSTM and GRU. Our approach achieves an error rate of 1.5%. Code is available at this https URL
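As an illustration of how a sequence can be turned into "words" before an embedding such as Word2vec is applied, here is a minimal sketch. The abstract does not say how sequences are tokenized, so the overlapping 3-mer scheme, the helper names (`kmer_tokens`, `build_vocab`, `encode`) and the toy sequences are all assumptions made for this example.

```python
from collections import Counter


def kmer_tokens(sequence, k=3):
    """Split an amino acid sequence into overlapping k-mers ("words")."""
    sequence = sequence.strip().upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_vocab(token_lists):
    """Map each distinct k-mer to an integer id; 0 is reserved for padding."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    # Most frequent k-mers get the smallest ids; ties are broken alphabetically.
    ordered = sorted(counts, key=lambda t: (-counts[t], t))
    return {tok: idx for idx, tok in enumerate(ordered, start=1)}


def encode(tokens, vocab, length):
    """Convert tokens to ids, truncating or zero-padding to a fixed length.

    Unknown k-mers also map to 0 in this simplified sketch.
    """
    ids = [vocab.get(tok, 0) for tok in tokens][:length]
    return ids + [0] * (length - len(ids))


# Toy sequences for illustration only; they are not taken from the paper.
sequences = ["MKVLAAGIV", "MKVLG"]
tokenized = [kmer_tokens(s) for s in sequences]
vocab = build_vocab(tokenized)
encoded = [encode(toks, vocab, length=8) for toks in tokenized]
```

The fixed-length integer lists in `encoded` are the kind of input an embedding layer would typically consume; the embedding itself and the downstream classifier are outside the scope of this sketch.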


DeepAffinity: Interpretable Deep Learning of Compound-Protein Affinity through Unified Recurrent and Convolutional Neural Networks

Motivation: Drug discovery demands rapid quantification of compound-protein interaction (CPI). However, there is a lack of methods that can predict compound-protein affinity from sequences alone with high applicability, accuracy, and interpretability. Results: We present a seamless integration of domain knowledge and learning-based approaches. Under novel representations of structurally-annotated protein sequences, a semi-supervised deep learning model that unifies recurrent and convolutional neural networks has been proposed to exploit both unlabeled and labeled data, for jointly encoding molecular representations and predicting affinities. Our representations and models outperform conventional options in achieving relative error in IC50 within 5-fold for test cases and 20-fold for protein classes not included for training. Performance for new protein classes with few labeled data is further improved by transfer learning. Furthermore, separate and joint attention mechanisms are developed and embedded in our model to add to its interpretability, as illustrated in case studies for predicting and explaining selective drug-target interactions. Lastly, alternative representations using protein sequences or compound graphs and a unified RNN/GCNN-CNN model using graph CNN (GCNN) are also explored to reveal algorithmic challenges ahead. Availability: Data and source code are available at this https URL Supplementary Information: Supplementary data are available at this http URL
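To make the "within 5-fold" figure concrete, the sketch below computes a fold error between a predicted and a measured IC50 and relates it to the difference on the pIC50 (negative log10) scale. It assumes the usual reading of "N-fold", namely that the ratio of the two values lies between 1/N and N; the function names and the example concentrations are illustrative and not taken from the paper.

```python
import math


def fold_error(predicted_ic50, measured_ic50):
    """Return the ratio between the two values, always expressed as >= 1."""
    ratio = predicted_ic50 / measured_ic50
    return max(ratio, 1.0 / ratio)


def pic50(ic50_molar):
    """Convert an IC50 in mol/L to pIC50 = -log10(IC50)."""
    return -math.log10(ic50_molar)


# Hypothetical values: measured 100 nM, predicted 300 nM.
measured = 1e-7
predicted = 3e-7

error = fold_error(predicted, measured)
log_gap = abs(pic50(predicted) - pic50(measured))

# A 5-fold tolerance corresponds to a pIC50 gap of at most log10(5).
within_5_fold = log_gap <= math.log10(5)
```

Under this reading, a 5-fold error is a gap of about 0.7 log units and a 20-fold error is about 1.3 log units.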


DeepAtom: A Framework for Protein-Ligand Binding Affinity Prediction

The cornerstone of computational drug design is the calculation of binding affinity between two biological counterparts, especially a chemical compound, i.e., a ligand, and a protein. Predicting the strength of protein-ligand binding with reasonable accuracy is critical for drug discovery. In this paper, we propose a data-driven framework named DeepAtom to accurately predict the protein-ligand binding affinity. With a 3D Convolutional Neural Network (3D-CNN) architecture, DeepAtom can automatically extract binding-related atomic interaction patterns from the voxelized complex structure. Compared with other CNN-based approaches, our light-weight model design effectively improves the model's representational capacity, even with the limited available training data. With validation experiments on the PDBbind v.2016 benchmark and the independent Astex Diverse Set, we demonstrate that the DeepAtom approach, which depends less on feature engineering, consistently outperforms the other state-of-the-art scoring methods. We also compile and propose a new benchmark dataset to further improve model performance. With the new dataset as training input, DeepAtom achieves Pearson's R=0.83 and RMSE=1.23 pK units on the PDBbind v.2016 core set. The promising results demonstrate that DeepAtom models can potentially be adopted in computational drug development protocols such as molecular docking and virtual screening.
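The two reported metrics, Pearson's R and RMSE, can be computed as in the following sketch. These are the standard textbook definitions, written with the Python standard library only; the affinity values are made-up pK numbers used purely to exercise the functions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rmse(predicted, measured):
    """Root-mean-square error between predictions and measurements."""
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical binding affinities in pK units (not from the paper).
measured_pk = [4.2, 5.8, 6.5, 7.9, 9.1]
predicted_pk = [4.9, 5.5, 7.0, 7.2, 8.8]

r = pearson_r(predicted_pk, measured_pk)
error = rmse(predicted_pk, measured_pk)
```

R measures how well predictions rank and track the measurements linearly, while RMSE measures the typical size of the error in pK units, so the two are usually reported together.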


DeepSurf: A surface-based deep learning approach for the prediction of ligand binding sites on proteins

The knowledge of potentially druggable binding sites on proteins is an important preliminary step towards the discovery of novel drugs. The computational prediction of such areas can be boosted by following the recent major advances in the deep learning field and by exploiting the increasing availability of proper data. In this paper, a novel computational method for the prediction of potential binding sites is proposed, called DeepSurf. DeepSurf combines a surface-based representation, where a number of 3D voxelized grids are placed on the protein's surface, with state-of-the-art deep learning architectures. After being trained on the large database of scPDB, DeepSurf demonstrates superior results on three diverse testing datasets, by surpassing all its main deep learning-based competitors, while attaining competitive performance to a set of traditional non-data-driven approaches.
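A voxelized grid of the kind mentioned above can be sketched as a simple occupancy count. This is a deliberately simplified illustration: it uses one axis-aligned cubic grid with a single channel, whereas practical methods typically use several feature channels and place and orient grids relative to the protein surface. The function name, grid size, resolution, and coordinates are all assumptions for this example.

```python
import math


def voxelize(atoms, center, size=4, resolution=1.0):
    """Count atoms per cell in a cubic grid of size**3 voxels around `center`.

    `atoms` is an iterable of (x, y, z) coordinates and `resolution` is the
    voxel edge length in the same units. Atoms outside the grid are ignored.
    """
    half = size * resolution / 2.0
    grid = [[[0] * size for _ in range(size)] for _ in range(size)]
    for atom in atoms:
        # Shift each coordinate so the grid's lower corner sits at the origin.
        index = [
            math.floor((coord - (c - half)) / resolution)
            for coord, c in zip(atom, center)
        ]
        if all(0 <= i < size for i in index):
            grid[index[0]][index[1]][index[2]] += 1
    return grid


# Hypothetical atom coordinates; the last one falls outside the grid.
atoms = [(0.5, 0.5, 0.5), (-1.5, -1.5, -1.5), (5.0, 0.0, 0.0)]
grid = voxelize(atoms, center=(0.0, 0.0, 0.0))
total = sum(cell for plane in grid for row in plane for cell in row)
```

The resulting 3D array is the sort of input a 3D convolutional network can consume directly.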


Deflation reveals dynamical structure in nondominant reaction coordinates

The output of molecular dynamics simulations is high-dimensional, and the degrees of freedom among the atoms are related in intricate ways. Therefore, a variety of analysis frameworks have been introduced in order to distill complex motions into lower-dimensional representations that model the system dynamics. These dynamical models have been developed to optimally approximate the system's global kinetics. However, the separate aims of optimizing global kinetics and modeling a process of interest diverge when the process of interest is not the slowest process in the system. Here, we introduce deflation into state-of-the-art methods in molecular kinetics in order to preserve the use of variational optimization tools when the slowest dynamical mode is not the same as the one we seek to model and understand. First, we showcase deflation for a simple toy system and introduce the deflated variational approach to Markov processes (dVAMP). Using dVAMP, we show that nondominant reaction coordinates produced using deflation are more informative than their counterparts generated without deflation. Then, we examine a protein folding system in which the slowest dynamical mode is not folding. Following a dVAMP analysis, we show that deflation can be used to obscure this undesired slow process from a kinetic model, in this case a VAMPnet. The incorporation of deflation into current methods opens the door for enhanced sampling strategies and more flexible, targeted model building.
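The general idea of deflation can be shown on a small symmetric matrix: once the dominant eigenpair is found, it is subtracted out so that the same optimization then converges to the next mode. The sketch below uses classical Hotelling deflation with power iteration. It is not the dVAMP method itself, and the 2x2 matrix is a toy stand-in rather than a real kinetic model.

```python
import math


def matvec(a, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def power_iteration(a, iterations=500):
    """Estimate the dominant eigenpair of a symmetric matrix."""
    # Start from a vector with unequal entries so that it is unlikely to be
    # orthogonal to the mode we want to find.
    v = [1.0 / (i + 1) for i in range(len(a))]
    for _ in range(iterations):
        w = matvec(a, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(vi * wi for vi, wi in zip(v, matvec(a, v)))
    return eigenvalue, v


def deflate(a, eigenvalue, v):
    """Hotelling deflation: remove the component eigenvalue * v v^T."""
    n = len(a)
    return [[a[i][j] - eigenvalue * v[i] * v[j] for j in range(n)]
            for i in range(n)]


# Toy symmetric matrix with eigenvalues 3 and 1.
matrix = [[2.0, 1.0], [1.0, 2.0]]

lam1, v1 = power_iteration(matrix)
deflated = deflate(matrix, lam1, v1)
lam2, v2 = power_iteration(deflated)
```

After deflation, the previously dominant mode has eigenvalue close to zero, so the second call recovers the nondominant mode, which is orthogonal to the first.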


Delineating elastic properties of kinesin linker and their sensitivity to point mutations

We analyze free energy estimators from simulation trials mimicking single-molecule pulling experiments on the neck linker of a kinesin motor. For that purpose, we have performed a version of steered molecular dynamics (SMD) calculations. The sample trajectories have been analyzed to derive the distribution of work done on the system. In order to induce unfolding of the linker, we have stretched the molecule at a constant pulling force and allowed for a subsequent relaxation of its structure. The use of fluctuation relations (FR) relevant to non-equilibrium systems subject to thermal fluctuations allows us to assess the difference in free energy between stretched and relaxed conformations. To further understand the effects of potential mutations on the elastic properties of the linker, we have performed similar in silico studies on a structure formed of a polyalanine sequence (Ala-only) and on three other structures, created by substituting selected types of amino acid residues in the linker's sequence with alanine (Ala). The results of the SMD simulations indicate a crucial role played by the asparagine (Asn) and lysine (Lys) residues in controlling the stretching and relaxation properties of the linker domain of the motor.
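One widely used fluctuation relation is the Jarzynski equality, which estimates a free energy difference from the work values of repeated non-equilibrium pulls as ΔF = -kT ln⟨exp(-W/kT)⟩. The abstract does not state which relation or estimator the authors use, so treating it as Jarzynski is an assumption here, and the work values below are invented for illustration.

```python
import math


def jarzynski_free_energy(work_values, kT):
    """Estimate the free energy difference from a sample of work values.

    Implements dF = -kT * ln(mean(exp(-W / kT))), using a log-sum-exp shift
    so that large |W / kT| values do not overflow or underflow.
    """
    exponents = [-w / kT for w in work_values]
    shift = max(exponents)
    mean_shifted = sum(math.exp(e - shift) for e in exponents) / len(exponents)
    return -kT * (shift + math.log(mean_shifted))


# kT in kJ/mol at roughly 300 K (gas constant times temperature).
kT = 0.0083145 * 300.0

# Hypothetical work values in kJ/mol from repeated pulling trajectories.
work = [12.0, 14.5, 11.2, 16.8, 13.1]

delta_f = jarzynski_free_energy(work, kT)
mean_work = sum(work) / len(work)
```

Because the exponential average is dominated by low-work trajectories, the estimate never exceeds the mean work, and the gap between the two reflects the work dissipated on average during pulling.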


Design Of Drug-Like Protein-Protein Interaction Stabilizers Guided By Chelation-Controlled Bioactive Conformation Stabilization

The protein-protein interactions (PPIs) of 14-3-3 proteins are a model system for studying PPI stabilization. The complex natural product Fusicoccin A stabilizes many 14-3-3 PPIs but is not amenable for use in SAR studies, motivating the search for more drug-like chemical matter. However, drug-like 14-3-3 PPI stabilizers enabling such study have remained elusive. An X-ray crystal structure of a PPI in complex with an extremely low potency stabilizer uncovered an unexpected non-protein interacting, ligand-chelated Mg2+ leading to the discovery of metal ion-dependent 14-3-3 PPI stabilization potency. This originates from a novel chelation-controlled bioactive conformation stabilization effect. Metal chelation has been associated with pan-assay interference compounds (PAINS) and frequent hitter behavior, but chelation can evidently also lead to true potency gains and find use as a medicinal chemistry strategy to guide compound optimization. To demonstrate this, we exploited the effect to design the first potent, selective and drug-like 14-3-3 PPI stabilizers.


Design of metalloproteins and novel protein folds using variational autoencoders

The design of novel proteins has many applications but remains an attritional process with success in isolated cases. Meanwhile, deep learning technologies have exploded in popularity in recent years and are increasingly applicable to biology due to the rise in available data. We attempt to link protein design and deep learning by using variational autoencoders to generate protein sequences conditioned on desired properties. Potential copper and calcium binding sites are added to non-metal binding proteins without human intervention and compared to a hidden Markov model. In another use case, a grammar of protein structures is developed and used to produce sequences for a novel protein topology. One candidate structure is found to be stable by molecular dynamics simulation. The ability of our model to confine the vast search space of protein sequences and to scale easily has the potential to assist in a variety of protein design tasks.


Designing a Prospective COVID-19 Therapeutic with Reinforcement Learning

The SARS-CoV-2 pandemic has created a global race for a cure. One approach focuses on designing a novel variant of the human angiotensin-converting enzyme 2 (ACE2) that binds more tightly to the SARS-CoV-2 spike protein and diverts it from human cells. Here we formulate a novel protein design framework as a reinforcement learning problem. We generate new designs efficiently through the combination of a fast, biologically-grounded reward function and sequential action-space formulation. The use of Policy Gradients reduces the compute budget needed to reach consistent, high-quality designs by at least an order of magnitude compared to standard methods. Complexes designed by this method have been validated by molecular dynamics simulations, confirming their increased stability. This suggests that combining leading protein design methods with modern deep reinforcement learning is a viable path for discovering a COVID-19 cure and may accelerate design of peptide-based therapeutics for other diseases.


Determining the atomic charge of calcium ion requires the information of its coordination geometry in an EF-hand motif

It is challenging to parameterize the force field for calcium ions (Ca2+) in calcium-binding proteins because of their unique coordination chemistry that involves the surrounding atoms required for stability. In this work, we observed wide variation in Ca2+ binding loop conformations of the Ca2+-binding protein calmodulin (CaM), which adopts the most populated ternary structures determined from the MD simulations, followed by ab initio quantum mechanical (QM) calculations on all twelve amino acids in the loop that coordinate Ca2+ in aqueous solution. Ca2+ charges were derived by fitting to the electrostatic potential (ESP) in the context of a classical or polarizable force field (PFF). We discovered that the atomic radius of Ca2+ in conventional force fields is too large for the QM calculation to capture the variation in the coordination geometry of Ca2+ in its ionic form, leading to unphysical charges. Specifically, we found that the fitted atomic charges of Ca2+ in the context of PFF depend on the coordinating geometry of electronegative atoms from the amino acids in the loop. Although nearby water molecules do not influence the atomic charge of Ca2+, they are crucial for compensating for the coordination of Ca2+ due to the conformational flexibility in the EF-hand loop. Our method advances the development of force fields for metal ions and protein binding sites in dynamic environments.
