Featured Research

Biomolecules

A Systematic Comparison Study on Hyperparameter Optimisation of Graph Neural Networks for Molecular Property Prediction

Graph neural networks (GNNs) have been proposed for a wide range of graph-related learning tasks. In recent years in particular, an increasing number of GNN systems have been applied to predict molecular properties. However, a direct impediment is selecting appropriate hyperparameters that achieve satisfactory performance at lower computational cost. Meanwhile, many molecular datasets are far smaller than the datasets used in typical deep learning applications, and most hyperparameter optimization (HPO) methods have not been examined for their efficiency on such small datasets in the molecular domain. In this paper, we conduct a theoretical analysis of the common and specific features of two popular state-of-the-art HPO algorithms, TPE and CMA-ES, and we compare them with random search (RS), which is used as a baseline. Experimental studies are carried out on several MoleculeNet benchmarks, from different perspectives, to investigate the impact of RS, TPE, and CMA-ES on HPO of GNNs for molecular property prediction. From our experiments, we conclude that RS, TPE, and CMA-ES each have individual advantages in tackling different specific molecular problems. Finally, we believe our work will motivate further research on GNNs as applied to molecular machine learning problems in chemistry and materials sciences.
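
The random search baseline mentioned in this abstract can be sketched in a few lines. This is a minimal illustration only, not the paper's experimental setup: the search space, the hyperparameter names (`learning_rate`, `hidden_dim`), and the toy objective standing in for a GNN validation loss are all assumptions made for the example.

```python
import math
import random


def random_search(objective, space, n_trials, seed=0):
    """Sample n_trials configurations independently and keep the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        # Each hyperparameter is drawn from its own sampler, independently.
        params = {name: sample(rng) for name, sample in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space (not taken from the paper).
space = {
    "learning_rate": lambda rng: 10 ** rng.uniform(-4, -1),  # log-uniform
    "hidden_dim": lambda rng: rng.choice([32, 64, 128, 256]),
}


def toy_objective(params):
    """Stand-in for a validation loss; a real study would train a GNN here."""
    lr_term = (math.log10(params["learning_rate"]) + 2) ** 2
    dim_term = abs(params["hidden_dim"] - 128) / 128
    return lr_term + dim_term


if __name__ == "__main__":
    print(random_search(toy_objective, space, n_trials=200))
```

Because every trial is independent, random search parallelises trivially, which is one reason it is a common baseline against model-based methods such as TPE and CMA-ES.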

Biomolecules

A Tale of Two Desolvation Potentials: An Investigation of Protein Behavior Under High Hydrostatic Pressure

Hydrostatic pressure is a common perturbation used to probe the conformations of proteins. Two common forms of pressure-dependent potentials of mean force (PMFs), derived from hydrophobic molecules, are available for coarse-grained molecular simulations of protein folding and unfolding under hydrostatic pressure. Although both PMFs include a desolvation barrier separating the well of a direct contact from the well of a solvent-mediated contact, how these features vary with hydrostatic pressure is still debated. There is a need for a systematic comparison of these two PMFs on a protein. We investigated the two different pressure dependencies of the desolvation potential in a structure-based protein model using coarse-grained molecular simulations. We compared them to the known behavior of a real protein based on experimental evidence. We showed that the protein's folding transition curve on the pressure-temperature phase diagram depends on the relationship between the potential well minima and pressure. For a protein whose total volume decreases under pressure, it is essential for the PMF to carry the feature that the direct contact well is less stable than the water-mediated contact well at high pressure. We also comment on the practicality and importance of structure-based minimalist models for understanding the phenomenological behavior of a protein over a wide range of phase space.

Biomolecules

A Two-Step Biopolymer Nucleation Model Shows a Nonequilibrium Critical Point

Biopolymer self-assembly pathways are central to biological activity, but are complicated by the ability of the monomeric subunits of biopolymers to adopt different conformational states. As a result, biopolymer nucleation often involves a two-step mechanism where the monomers first condense to form a metastable intermediate, and this then converts to a stable polymer by conformational rearrangement of its constituent monomers. While existing mathematical models neglect the dynamics by which intermediates convert to stable polymers, experiments and simulations show that these dynamics frequently occur on comparable timescales to condensation of intermediates and growth of mature polymers, and thus cannot be ignored. Moreover, nucleation intermediates are responsible for cell toxicity in pathologies such as Alzheimer's, Parkinson's, and prion diseases. Due to the relationship between conformation and biological function, the slow conversion dynamics of these species will strongly affect their toxicity. In this study, we present a modified Oosawa model which explicitly accounts for simultaneous assembly and conversion. To describe the conversion dynamics, we propose an experimentally motivated initiation-propagation (IP) mechanism in which the stable phase arises locally within the intermediate, and then spreads through additional conversion events induced by nearest-neighbor interactions, analogous to one-dimensional Glauber dynamics. Our mathematical analysis shows that the competing timescales of assembly and conversion result in a nonequilibrium critical point, separating a regime where intermediates are kinetically unstable from one where conformationally mixed intermediates can accumulate. Our work provides the first general model of two-step biopolymer nucleation, which can be used to quantitatively predict the concentration and composition of biologically crucial intermediates.

Biomolecules

A Workflow for Exploring Ligand Dissociation from a Macromolecule: Efficient Random Acceleration Molecular Dynamics Simulation and Interaction Fingerprints Analysis of Ligand Trajectories

The dissociation of ligands from proteins and other biomacromolecules occurs over a wide range of timescales. For most pharmaceutically relevant inhibitors, these timescales are far beyond those that are accessible by conventional molecular dynamics (MD) simulation. Consequently, to explore ligand egress mechanisms and compute dissociation rates, it is necessary to enhance the sampling of ligand unbinding. Random Acceleration MD (RAMD) is a simple method to enhance ligand egress from a macromolecular binding site, which enables the exploration of ligand egress routes without prior knowledge of the reaction coordinates. Furthermore, the tauRAMD procedure can be used to compute the relative residence times of ligands. When combined with a machine-learning analysis of protein-ligand interaction fingerprints (IFPs), molecular features that affect ligand unbinding kinetics can be identified. Here, we describe the implementation of RAMD in GROMACS 2020, which provides significantly improved computational performance, with scaling to large molecular systems. For the automated analysis of RAMD results, we developed MD-IFP, a set of tools for the generation of IFPs along unbinding trajectories and for their use in the exploration of ligand dynamics. We demonstrate that the analysis of ligand dissociation trajectories by mapping them onto the IFP space enables the characterization of ligand dissociation routes and metastable states. The combined implementation of RAMD and MD-IFP provides a computationally efficient and freely available workflow that can be applied to hundreds of compounds in a reasonable computational time and will facilitate the use of tauRAMD in drug design.

Biomolecules

A clustering-based biased Monte Carlo approach to protein titration curve prediction

In this work, we developed an efficient approach to compute ensemble averages in systems with pairwise-additive energetic interactions between the entities. Methods involving full enumeration of the configuration space have exponential complexity. Sampling methods such as Markov Chain Monte Carlo (MCMC) algorithms have been proposed to tackle the exponential complexity of these problems; however, in scenarios where significant energetic coupling exists between the entities, the efficiency of such algorithms can be diminished. We used a strategy to improve the efficiency of MCMC by taking advantage of the cluster structure in the interaction energy matrix to bias the sampling. We pursued two different schemes for the biased MCMC runs and showed that they are valid MCMC schemes. We used both synthesized and real-world systems to show the improved performance of our biased MCMC methods compared to the regular MCMC method. In particular, we applied these algorithms to the problem of estimating protonation ensemble averages and titration curves of residues in a protein.
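
To make the contrast between full enumeration and sampling concrete, the sketch below computes the average occupancy of each site in a small system with pairwise-additive energies, first exactly and then with plain single-flip Metropolis MCMC. It illustrates only the regular MCMC baseline, not the clustering-based biased schemes of the paper. The energy form, the parameter names `h` and `J`, and the numbers in the usage example are assumptions chosen for illustration.

```python
import itertools
import math
import random


def energy(state, h, J):
    """Pairwise-additive energy: site terms h[i] plus couplings J[i][j], i < j."""
    n = len(state)
    e = sum(h[i] * state[i] for i in range(n))
    e += sum(J[i][j] * state[i] * state[j]
             for i in range(n) for j in range(i + 1, n))
    return e


def exact_average(h, J, beta=1.0):
    """Boltzmann-weighted occupancy of each site by enumerating all 2**n states."""
    n = len(h)
    z = 0.0
    occ = [0.0] * n
    for state in itertools.product((0, 1), repeat=n):
        w = math.exp(-beta * energy(state, h, J))
        z += w
        for i, s in enumerate(state):
            occ[i] += w * s
    return [o / z for o in occ]


def metropolis_average(h, J, beta=1.0, n_steps=100_000, seed=0):
    """Estimate the same occupancies with single-site-flip Metropolis sampling."""
    rng = random.Random(seed)
    n = len(h)
    state = [0] * n
    e = energy(state, h, J)
    occ = [0] * n
    for _ in range(n_steps):
        i = rng.randrange(n)
        state[i] ^= 1  # propose flipping one site
        e_new = energy(state, h, J)
        if e_new <= e or rng.random() < math.exp(-beta * (e_new - e)):
            e = e_new  # accept the move
        else:
            state[i] ^= 1  # reject: undo the flip
        for k in range(n):
            occ[k] += state[k]
    return [o / n_steps for o in occ]


if __name__ == "__main__":
    h = [-1.0, 0.5, 0.0]                       # hypothetical site energies
    J = [[0, 1.0, 0], [0, 0, -0.5], [0, 0, 0]]  # hypothetical couplings
    print(exact_average(h, J))
    print(metropolis_average(h, J))
```

Enumeration visits 2**n states and becomes infeasible for large n, while the sampler's cost is set by the number of steps. Strong couplings slow the mixing of single-flip moves, which is the inefficiency the abstract's biased schemes aim to address.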

Biomolecules

A comparative analysis for SARS-CoV-2

COVID-19 has affected the world tremendously. It is critical that biological experiments and clinical designs are informed by computational approaches for time- and cost-effective solutions. Comparative analyses in particular can play a key role in revealing structural changes in proteins due to mutations, which can lead to behavioural changes, such as the increased binding of the SARS-CoV-2 surface glycoprotein to human ACE2 receptors. The aim of this report is to provide an easy-to-follow tutorial for biologists and others without delving into different bioinformatics tools. More complex analyses, such as the use of large-scale computational methods, can then be utilised. Starting with a SARS-CoV-2 genome sequence, the report shows how to visualise DNA sequence features, derive amino acid sequences, and align different genomes to analyse mutations and differences. The report provides further insights into how the SARS-CoV-2 surface glycoprotein mutated for higher binding affinity to human ACE2 receptors, compared to the SARS-CoV protein, by integrating existing 3D protein models.
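
Two of the steps listed in this abstract, deriving amino acid sequences and comparing sequences for mutations, can be sketched with the standard genetic code. This is a minimal sketch, not the report's own tooling: the function names and the short toy sequences are assumptions, and the comparison assumes two already-aligned sequences of equal length with no gaps.

```python
# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a coding DNA sequence in frame 0; '*' marks a stop codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")  # 'X' for unknown codons
                   for i in range(0, usable, 3))


def point_differences(seq_a, seq_b):
    """List (1-based position, residue in a, residue in b) for aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i + 1, a, b)
            for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


if __name__ == "__main__":
    reference = translate("ATGTTTGTTTTTCTTGTT")  # toy sequence
    variant = translate("ATGTTTGTTTATCTTGTT")    # toy sequence, one base changed
    print(reference, variant, point_differences(reference, variant))
```

Real genomes need a proper alignment step before positions can be compared, since insertions and deletions shift the coordinates.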

Biomolecules

A computational insight of the improved nicotine binding with ACE2-SARS-CoV-2 complex with its clinical impact

Because smokers have been reported to show mild adverse clinical symptoms of SARS-CoV-2, this in silico study explores the effect of nicotine binding to the soluble angiotensin converting enzyme II (ACE2) receptor with or without SARS-CoV-2 binding. Nicotine established a stable interaction with the conserved amino acid residues Asp382, Gly405, His378 and Tyr385 through His401 of the soluble ACE2 that seals its interaction with the INS1. Nicotine binding also significantly reduced the affinity score of ACE2 with INS1 to -12.6 kcal/mol (versus -15.7 kcal/mol without nicotine) and the interface area to 1933.6 square angstroms (versus 2057.3 square angstroms without nicotine). Nicotine exhibited a higher binding affinity score with the ACE2-SARS-CoV-2 complex, at -6.33 kcal/mol (versus -5.24 kcal/mol without SARS-CoV-2), and a lower inhibitory constant of 22.95 micromolar (versus 151.69 micromolar without SARS-CoV-2). Even though ACE2 is not a potential receptor for nicotine binding in healthy people, in COVID-19 patients nicotine may exhibit better binding affinity with the ACE2 receptor. Overall, nicotine's strong preference for the ACE2-SARS-CoV-2 complex might drastically reduce SARS-CoV-2 virulence by intervening in the interaction of the conserved ACE2 residues with the spike (S1) protein of SARS-CoV-2.
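
A binding free energy and an inhibitory constant are commonly related by K_i = exp(ΔG / RT). The abstract does not say how its constants were obtained, so the sketch below simply assumes this relation at 298.15 K, with the affinity score treated as ΔG; the function name and the temperature are assumptions made for the example.

```python
import math

R_KCAL_PER_MOL_K = 1.98720e-3  # gas constant in kcal/(mol*K)


def inhibition_constant_uM(delta_g_kcal_per_mol, temperature_k=298.15):
    """Estimate K_i in micromolar from a binding free energy via exp(dG / RT).

    Assumes the score behaves as a free energy; this is an illustration,
    not the procedure documented in the abstract.
    """
    k_i_molar = math.exp(delta_g_kcal_per_mol / (R_KCAL_PER_MOL_K * temperature_k))
    return k_i_molar * 1e6


if __name__ == "__main__":
    for score in (-6.33, -5.24):
        print(score, "kcal/mol ->", inhibition_constant_uM(score), "micromolar")
```

Under these assumptions, -6.33 kcal/mol corresponds to roughly 23 micromolar, in line with the 22.95 micromolar quoted in the abstract. The estimate for -5.24 kcal/mol has the same order of magnitude as the quoted 151.69 micromolar but does not match it exactly, and the exponential makes the result sensitive to rounding of the score.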

Biomolecules

A digital microarray using interferometric detection of plasmonic nanorod labels

DNA and protein microarrays are a high-throughput technology that allows the simultaneous quantification of tens of thousands of different biomolecular species. The mediocre sensitivity and dynamic range of traditional fluorescence microarrays compared to other techniques have been the technology's Achilles' heel and have prevented their adoption for many biomedical and clinical diagnostic applications. Previous efforts to enhance the sensitivity of microarray readout to the single-molecule ('digital') regime have either required signal-amplifying chemistry or sacrificed throughput, nixing the platform's primary advantages. Here, we report the development of a digital microarray which extends both the sensitivity and dynamic range of microarrays by about three orders of magnitude. This technique uses functionalized gold nanorods as single-molecule labels and an interferometric scanner which can rapidly enumerate individual nanorods by imaging them with a 10x objective lens. This approach does not require any chemical enhancement such as silver deposition, and scans arrays with a throughput similar to commercial fluorescence devices. By combining single-nanoparticle enumeration with ensemble measurements of spots when the particles are very dense, this system achieves a dynamic range of about one million directly from a single scan.

Biomolecules

A hydrophobic-interaction-based mechanism triggers docking between the SARS-CoV-2 spike and angiotensin-converting enzyme 2

A recent experimental study found that the binding affinity between the cellular receptor human angiotensin converting enzyme 2 (ACE2) and the receptor-binding domain (RBD) in the spike (S) protein of novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is more than 10-fold higher than that of the original severe acute respiratory syndrome coronavirus (SARS-CoV). However, the main-chain structures of the SARS-CoV-2 RBD are almost the same as those of the SARS-CoV RBD. Understanding the physical mechanism responsible for the outstanding affinity between the SARS-CoV-2 S and ACE2 is an "urgent challenge" for developing blockers, vaccines and therapeutic antibodies against the coronavirus disease 2019 (COVID-19) pandemic. Considering the mechanisms of hydrophobic interaction, hydration shell, surface tension, and the shielding effect of water molecules, this study reveals a hydrophobic-interaction-based mechanism by which SARS-CoV-2 S and ACE2 bind together in an aqueous environment. The hydrophobic interaction between the SARS-CoV-2 S and ACE2 proteins is found to be significantly greater than that between SARS-CoV S and ACE2. At the docking site, the hydrophobic portions of the hydrophilic side chains of SARS-CoV-2 S are found to be involved in the hydrophobic interaction between SARS-CoV-2 S and ACE2. We propose a method to design live attenuated viruses by mutating several key amino acid residues of the spike protein to decrease the hydrophobic surface areas at the docking site. Mutation of a small number of residues can greatly reduce the hydrophobic binding of the coronavirus to the receptor, which may significantly reduce the infectivity and transmissibility of the virus.

Biomolecules

A method for partitioning the information contained in a protein sequence between its structure and function

Proteins employ the information stored in the genetic code and translated into their sequences to carry out well-defined functions in the cellular environment. The possibility of encoding such functions is controlled by the balance between the amount of information supplied by the sequence and that left after the protein has folded into its structure. We developed a computational algorithm to evaluate the amount of information necessary to specify the protein structure, taking into account the thermodynamic properties of protein folding. We thus show that the information remaining in the protein sequence after encoding its structure (the 'information gap') is very close to what is needed to encode its function and interactions. Then, by predicting the information gap directly from the protein sequence, we show that it may be possible to use these insights from information theory to discriminate between ordered and disordered proteins, to identify unknown functions, and to optimize designed protein sequences.
