Featured Research

Biomolecules

LinearFold: linear-time approximate RNA folding by 5'-to-3' dynamic programming and beam search

Motivation: Predicting the secondary structure of an RNA sequence is useful in many applications. Existing algorithms (based on dynamic programming) suffer from a major limitation: their runtimes scale cubically with the RNA length, and this slowness limits their use in genome-wide applications. Results: We present a novel alternative O(n^3)-time dynamic programming algorithm for RNA folding that is amenable to heuristics that make it run in O(n) time and O(n) space, while producing a high-quality approximation to the optimal solution. Inspired by incremental parsing for context-free grammars in computational linguistics, our alternative dynamic programming algorithm scans the sequence in a left-to-right (5'-to-3') direction rather than in a bottom-up fashion, which allows us to employ the effective beam pruning heuristic. Our work, though inexact, is the first RNA folding algorithm to achieve linear runtime (and linear space) without imposing constraints on the output structure. Surprisingly, our approximate search results in even higher overall accuracy on a diverse database of sequences with known structures. More interestingly, it leads to significantly more accurate predictions on the longest sequence families in that database (16S and 23S Ribosomal RNAs), as well as improved accuracies for long-range base pairs (500+ nucleotides apart), both of which are well known to be challenging for the current models. Availability: Our source code is available at this https URL, and our webserver is at this http URL (sequence limit: 100,000nt).
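The left-to-right scan with beam pruning can be illustrated with a toy sketch. This is not the LinearFold implementation: it maximizes the number of canonical base pairs instead of using an energy model, and it does not merge equivalent states by dynamic programming. The names `beam_fold`, `can_pair`, `beam`, and `min_loop` are hypothetical and exist only for this illustration.

```python
# Toy left-to-right beam search over folding actions (assumption: score = number
# of closed canonical pairs; the real algorithm uses an energy model and DP merging).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a, b):
    """Return True for Watson-Crick and G-U wobble pairs."""
    return (a, b) in PAIRS


def beam_fold(seq, beam=100, min_loop=3):
    """Scan seq 5'-to-3', keeping at most `beam` partial structures per step.

    Each state is (pairs_closed, stack_of_open_positions, structure_so_far).
    Returns (pairs_closed, dot-bracket string) of the best surviving state.
    """
    states = [(0, (), "")]
    for j, base in enumerate(seq):
        candidates = []
        for score, stack, struct in states:
            # Action 1: leave position j unpaired.
            candidates.append((score, stack, struct + "."))
            # Action 2: open a bracket at j (it may or may not be closed later).
            candidates.append((score, stack + (j,), struct + "("))
            # Action 3: close the most recent open bracket, if it can pair with j.
            if stack:
                i = stack[-1]
                if j - i > min_loop and can_pair(seq[i], base):
                    candidates.append((score + 1, stack[:-1], struct + ")"))
        # Beam pruning: keep only the highest-scoring partial structures.
        candidates.sort(key=lambda s: s[0], reverse=True)
        states = candidates[:beam]
    best_score, best_stack, best_struct = max(states, key=lambda s: s[0])
    # Brackets that were opened but never closed are reported as unpaired.
    chars = list(best_struct)
    for i in best_stack:
        chars[i] = "."
    return best_score, "".join(chars)


if __name__ == "__main__":
    print(beam_fold("GGGAAACCC", beam=100))
    print(beam_fold("GGGAAACCC", beam=1))
```

With a beam of 100 the toy search recovers the three-pair hairpin for this short sequence, while a beam of 1 keeps only the all-unpaired path, which shows how an overly narrow beam trades accuracy for speed.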

Biomolecules

LinearPartition: Linear-Time Approximation of RNA Folding Partition Function and Base Pairing Probabilities

RNA secondary structure prediction is widely used to understand RNA function. Recently, there has been a shift away from the classical minimum free energy (MFE) methods to partition function-based methods that account for folding ensembles and can therefore estimate structure and base pair probabilities. However, the classical partition function algorithm scales cubically with sequence length, and is therefore a slow calculation for long sequences. This slowness is even more severe than for cubic-time MFE-based methods due to a larger constant factor in runtime. Inspired by the success of our recently proposed LinearFold algorithm that predicts the approximate MFE structure in linear time, we design a similar linear-time heuristic algorithm, LinearPartition, to approximate the partition function and base pairing probabilities, which is shown to be orders of magnitude faster than Vienna RNAfold and CONTRAfold (e.g., 2.5 days vs. 1.3 minutes on a sequence with length 32,753 nt). More interestingly, the resulting base pairing probabilities are even better correlated with the ground truth structures. LinearPartition also leads to a small accuracy improvement when used for downstream structure prediction on the families with the longest sequences (16S and 23S rRNA), as well as a substantial improvement on long-distance base pairs (500+ nt apart).
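For reference, the quantities being approximated can be written in the standard textbook form below. The notation is an assumption chosen for this note, not taken from the abstract: S(x) is the set of admissible secondary structures of sequence x, ΔG(s) is the free energy of structure s, R is the gas constant, and T is the temperature.

```latex
% Partition function over all admissible structures of sequence x
Z(x) = \sum_{s \in S(x)} e^{-\Delta G(s) / RT}

% Probability that positions i and j are paired in the ensemble
p_{ij} = \frac{1}{Z(x)} \sum_{\substack{s \in S(x) \\ (i,j) \in s}} e^{-\Delta G(s) / RT}
```

An MFE method reports only the single structure minimizing ΔG(s), whereas these sums weight every structure in the ensemble, which is why the base pairing probabilities can be estimated.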

Biomolecules

Liquid-liquid microphase separation leads to formation of membraneless organelles

Proteins and nucleic acids can spontaneously self-assemble into membraneless droplet-like compartments, both in vitro and in vivo. Key components of these droplets are multi-valent proteins that possess several adhesive domains with specific interaction partners (whose number determines the total valency of the protein) separated by disordered regions. Here, using multi-scale simulations, we show that such proteins self-organize into micro-phase separated droplets of various sizes, as opposed to the Flory-like macro-phase separated equilibrium state of homopolymers or equilibrium physical gels. We show that the micro-phase separated state is a dynamic outcome of the interplay between two competing processes: a diffusion-limited encounter between proteins, and the dynamics within small clusters that result in exhaustion of available valencies, whereby all specifically interacting domains find their interacting partners within smaller clusters, leading to arrested phase separation. We first model these multi-valent chains as bead-spring polymers with multiple adhesive domains separated by semi-flexible linkers and use Langevin Dynamics (LD) to assess how key timescales depend on the molecular properties of associating polymers. Using the timescales from LD simulations, we develop a coarse-grained kinetic model to study this phenomenon at longer times. Consistent with LD simulations, the macro-phase separated state was only observed at high concentrations and large interaction valencies. Further, in the regime where cluster sizes approach macro-phase separation, the condensed phase becomes dynamically solid-like, suggesting that it might no longer be biologically functional. Therefore, the micro-phase separated state could be a hallmark of functional droplets formed by proteins with the sticker-spacer architecture.

Biomolecules

Local sequence-structure relationships in proteins

We seek to understand the interplay between amino acid sequence and local structure in proteins. Are some amino acids unique in their ability to fit harmoniously into certain local structures? What is the role of sequence in sculpting the putative native state folds from myriad possible conformations? In order to address these questions, we represent the local structure of each C-alpha atom of a protein by just two angles, theta and mu, and we analyze a set of more than 4000 protein structures from the PDB. We use a hierarchical clustering scheme to divide the 20 amino acids into six distinct groups based on their similarity to each other in fitting local structural space. We present the results of a detailed analysis of patterns of amino acid specificity in adopting local structural conformations and show that the sequence-structure correlation is not very strong compared to a random assignment of sequence to structure. Yet, our analysis may be useful to determine an effective scoring rubric for quantifying the match of an amino acid to its putative local structure.

Biomolecules

Localization of Energetic Frustration in Proteins

We present a detailed heuristic method to quantify the degree of local energetic frustration manifested by protein molecules. Current applications are realized in computational experiments where a protein structure is visualized highlighting the energetic conflicts or the concordance of the local interactions in that structure. Minimally frustrated linkages highlight the stable folding core of the molecule. Sites of high local frustration, in contrast, often indicate functionally relevant regions such as binding, active or allosteric sites.

Biomolecules

Low-frequency vibrations of water molecules in minor groove of the DNA double helix

The dynamics of the structured water molecules in the hydration shell of the DNA double helix are of paramount importance for understanding many biological mechanisms. In particular, the vibrational dynamics of the water spine that forms in the DNA minor groove is the subject of the present study. Within the framework of the developed phenomenological model, based on the approach of DNA conformational vibrations, the modes of H-bond stretching, backbone vibrations, and water translational vibrations have been established. The calculated frequencies of translational vibrations of water molecules vary from 167 to 205 cm⁻¹ depending on the nucleotide sequence. The modes of water vibrations lie higher than the modes of internal conformational vibrations of DNA. The calculated frequencies of water vibrations have shown a sufficient agreement with the experimental low-frequency vibrational spectra of DNA. The obtained modes of water vibrations are observed in the same region of the vibrational spectra of DNA as translational vibrations of water molecules in the bulk phase. To distinguish the vibrations of water molecules in the DNA minor groove from those in bulk water, the dynamics of DNA with heavy water was also considered. The results have shown that in the case of heavy water the vibrational frequencies decrease by about 10 cm⁻¹, which may be used in experiments to identify the mode of water vibrations in the spine of hydration in the DNA minor groove.
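The size of the reported heavy-water shift can be sanity-checked with a back-of-the-envelope estimate. The sketch below is not the authors' phenomenological model: it assumes each water molecule moves as a rigid point mass in a harmonic well whose force constant is unchanged by isotope substitution, so the frequency scales as the inverse square root of the mass. The function name and the rounded masses (18 and 20 atomic mass units for H2O and D2O) are assumptions made for this illustration.

```python
import math


def isotope_shifted_frequency(freq_cm1, mass_light=18.0, mass_heavy=20.0):
    """Scale a harmonic frequency by sqrt(m_light / m_heavy).

    Assumes the force constant is unchanged on isotope substitution and that
    the whole molecule translates as a single mass (a crude approximation).
    """
    return freq_cm1 * math.sqrt(mass_light / mass_heavy)


if __name__ == "__main__":
    # The two frequencies are the ends of the range quoted in the abstract.
    for freq in (167.0, 205.0):
        shifted = isotope_shifted_frequency(freq)
        print(f"{freq:.0f} cm^-1 -> {shifted:.1f} cm^-1 (shift {freq - shifted:.1f})")
```

Under these assumptions the estimated decrease is on the order of 10 cm⁻¹ across the quoted range, in line with the magnitude stated in the abstract.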

Biomolecules

MENSADB: A Thorough Structural Analysis of Membrane Protein Dimers

Membrane Proteins (MPs) account for around 15-39% of the human proteome and assume a critical role in a vast set of cellular and physiological mechanisms, including molecular transport, nutrient uptake, toxin and waste product clearance, respiration, and signaling. While roughly 60% of all FDA-approved drugs target MPs, there is a shortage of structural and biochemical data on them, as data collection is hindered mainly by their localization in the lipid bilayer. We present here the MEmbrane protein dimer Novel Structure Analyser database (MENSAdb), a real-time web application exposing a broad array of fundamental features of MP surfaces and their interfacial regions. In particular, we present conservation, four distinctive Accessible Solvent Area (ASA) descriptors, average and environment-specific B-factors, intermolecular contacts at 2.5 and 4.0 Å distance cutoffs, salt bridges, hydrogen bonds, hydrophobic interactions, pi-pi interactions, t-stacking and cation-pi interactions. Additionally, users can closely inspect differences in values between three distinct residue classes: i) non-surface, ii) surface and non-interfacial and iii) interfacial. The database is freely available at this http URL.
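A distance-cutoff contact count of the kind mentioned above can be sketched in a few lines. How MENSAdb actually defines and computes contacts is not stated in the abstract, so the function `count_contacts`, the all-pairs definition, and the toy coordinates below are assumptions made purely for illustration.

```python
import math


def count_contacts(chain_a, chain_b, cutoff):
    """Count atom pairs, one atom from each chain, within `cutoff` angstroms.

    chain_a and chain_b are sequences of (x, y, z) coordinates in angstroms.
    """
    return sum(
        1
        for atom_a in chain_a
        for atom_b in chain_b
        if math.dist(atom_a, atom_b) <= cutoff
    )


if __name__ == "__main__":
    # Hypothetical coordinates, not taken from any real structure.
    chain_a = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    chain_b = [(2.0, 0.0, 0.0), (0.0, 3.5, 0.0)]
    for cutoff in (2.5, 4.0):
        print(cutoff, count_contacts(chain_a, chain_b, cutoff))
```

The all-pairs loop is quadratic in the number of atoms; for full structures a spatial index such as a cell list or k-d tree would normally replace it.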

Biomolecules

MUFold-BetaTurn: A Deep Dense Inception Network for Protein Beta-Turn Prediction

Beta-turn prediction is useful in protein function studies and experimental design. Although recent approaches using machine-learning techniques such as SVM, neural networks, and K-NN have achieved good results for beta-turn prediction, there is still significant room for improvement. As previous predictors utilized features in a sliding window of 4-20 residues to capture interactions among sequentially neighboring residues, such feature engineering may result in incomplete or biased features, and neglect interactions among long-range residues. Deep neural networks provide a new opportunity to address these issues. Here, we propose a deep dense inception network (DeepDIN) for beta-turn prediction, which takes advantage of the state-of-the-art deep neural network designs of DenseNet and the inception network. A test on the recent BT6376 benchmark shows that DeepDIN significantly outperformed the previous best predictor, BetaTPred3, in both overall prediction accuracy and nine-type beta-turn classification. A tool, called MUFold-BetaTurn, was developed, which is the first beta-turn prediction tool utilizing deep neural networks. The tool can be downloaded at this http URL.

Biomolecules

Machine Learning Harnesses Molecular Dynamics to Discover New μ Opioid Chemotypes

Computational chemists typically assay drug candidates by virtually screening compounds against crystal structures of a protein despite the fact that some targets, like the μ Opioid Receptor and other members of the GPCR family, traverse many non-crystallographic states. We discover new conformational states of μOR with molecular dynamics simulation and then machine learn ligand-structure relationships to predict opioid ligand function. These artificial intelligence models identified a novel μ opioid chemotype.

Biomolecules

Machine Learning for Classification of Protein Helix Capping Motifs

The biological function of a protein stems from its 3-dimensional structure, which is thermodynamically determined by the energetics of interatomic forces between its amino acid building blocks (the order of amino acids, known as the sequence, defines a protein). Given the costs (time, money, human resources) of determining protein structures via experimental means such as X-ray crystallography, can we better describe and compare protein 3D structures in a robust and efficient manner, so as to gain meaningful biological insights? We begin by considering a relatively simple problem, limiting ourselves to just protein secondary structural elements. Historically, many computational methods have been devised to classify amino acid residues in a protein chain into one of several discrete secondary structures, of which the most well-characterized are the geometrically regular α-helix and β-sheet; irregular structural patterns, such as 'turns' and 'loops', are less understood. Here, we present a study of Deep Learning techniques to classify the loop-like end cap structures which delimit α-helices. Previous work used highly empirical and heuristic methods to manually classify helix capping motifs. Instead, we use structural data directly--including (i) backbone torsion angles computed from 3D structures, (ii) macromolecular feature sets (e.g., physicochemical properties), and (iii) helix cap classification data (from CAPS-DB)--as the ground truth to train a bidirectional long short-term memory (BiLSTM) model to classify helix cap residues. We tried different network architectures and scanned hyperparameters in order to train and assess several models; we also trained a Support Vector Classifier (SVC) to use as a baseline. Ultimately, we achieved 85% class-balanced accuracy with a deep BiLSTM model.
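Class-balanced accuracy is commonly defined as the mean of the per-class recalls, which keeps a rare class (such as cap residues) from being swamped by a frequent one. The sketch below uses that common definition; whether the authors computed their metric in exactly this way is an assumption, and the function name and toy labels are hypothetical.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Toy labels: 2 cap residues and 8 non-cap residues, with a classifier
    # that always predicts the majority class.
    y_true = ["cap"] * 2 + ["non-cap"] * 8
    y_pred = ["non-cap"] * 10
    plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print(plain, balanced_accuracy(y_true, y_pred))
```

In this toy case plain accuracy is 0.8 while balanced accuracy is 0.5, which illustrates why the balanced figure is the more informative one for imbalanced residue classes.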

