Featured Research

Biomolecules

Cross-Modal Fusion Between Data in SAXS and Cryo-EM for Biomolecular Structure Determination

Cryo-Electron Microscopy (cryo-EM) has become an extremely powerful method for resolving structural details of large biomolecular complexes. However, challenging problems in single-particle methods remain open because of (1) the low signal-to-noise ratio in EM; and (2) the potential anisotropy and lack of coverage of projection directions relative to the body-fixed coordinate system for some complexes. Whereas (1) is usually addressed by class averaging (and increasingly by rapid advances in microscope and sensor technology), (2) is an artifact of the mechanics of interaction of biomolecular complexes and the vitrification process. In the absence of tilt series, (2) remains a problem, which is addressed here by supplementing EM data with Small-Angle X-Ray Scattering (SAXS). Although SAXS is of relatively low resolution and carries much less information than EM, we show that it is nevertheless possible to use SAXS to fill in blind spots in EM in difficult cases where the range of projection directions is limited.

Biomolecules

Cross-Modality Protein Embedding for Compound-Protein Affinity and Contact Prediction

Compound-protein pairs dominate FDA-approved drug-target pairs, and the prediction of compound-protein affinity and contact (CPAC) could help accelerate drug discovery. In this study, we consider proteins as multi-modal data including 1D amino-acid sequences and (sequence-predicted) 2D residue-pair contact maps. We empirically evaluate the embeddings of the two single modalities for their accuracy and generalizability in CPAC prediction (i.e., structure-free interpretable compound-protein affinity prediction). We rationalize their performance in terms of two challenges: embedding individual modalities and learning generalizable embedding-label relationships. We further propose two models involving cross-modality protein embedding and establish that the one with cross interaction (thus capturing correlations among modalities) outperforms state-of-the-art methods and our single-modality models in affinity, contact, and binding-site predictions for proteins never seen in the training set.

Biomolecules

Crowding-induced Elongated Conformation of Urea-unfolded Apoazurin: Investigating the Role of Crowder Shape In Silico

Here, we show by solution nuclear magnetic resonance measurements that the urea-unfolded protein apoazurin becomes elongated when the synthetic crowding agent dextran 20 is present, in contrast to the prediction from the macromolecular crowding effect based on the argument of volume exclusion. To investigate the complex interactions beyond volume exclusion, we employed coarse-grained molecular dynamics simulations to explore the conformational ensemble of apoazurin in a box of monodisperse crowders under strong chemically denaturing conditions. The elongated conformation of unfolded apoazurin appears to result from the interplay of the effective attraction between the protein and crowders and the shape of the crowders. With a volume-conserving crowder model, we show that the crowder shape provides an anisotropic direction of the depletion force, in which a bundle of surrounding rod-like crowders stabilizes an elongated conformation of unfolded apoazurin in the presence of effective attraction between the protein and crowders.

Biomolecules

DHX36-mediated G-quadruplex unfolding is ATP-independent?

Chen et al. solved the crystal structure of bovine DHX36 bound to a DNA with a G-quadruplex (G4) and a single-stranded DNA segment. They proposed that their mechanism may represent a general model for how a G4-unfolding helicase recognizes and unfolds G4 DNA. Their conclusion is interesting; however, we noticed that their linear DNA substrate (DNAMyc), which harbors a Myc-promoter-derived G4-forming sequence, was used directly without pre-folding. This raises the question of whether the structure they obtained really reflects DHX36-mediated G4 recognition and unfolding, or merely represents a DHX36-binding-induced quasi-folded G4 structure. Using a combination of polymerase extension, DMS footprinting, stopped-flow, and smFRET assays, we obtained clear evidence that does not support their ATP-independent one-base translocation structural model. We further show that the oscillation of the FRET signal they observed should correspond to repetitive G4 binding, not unfolding, by DHX36.

Biomolecules

DNA Torsion-based Model of Cell Fate Phase Transitions

All stem cell fate transitions, including the metabolic reprogramming of stem cells and the somatic reprogramming of fibroblasts into pluripotent stem cells, can be understood from a unified theoretical model of cell fates. Each cell fate transition can be regarded as a phase transition in DNA supercoiling. However, there has been a dearth of quantitative biophysical models to explain and predict the behavior of these phase transitions. A generalized Ising model is proposed to define such phase transitions. The model predicts that, apart from temperature-induced phase transitions, there exist DNA torsion frequency-induced phase transitions. Major transitions in epigenetic states, from stem cell activation to differentiation and reprogramming, can be explained by such torsion frequency-induced phase transitions, with important implications for regenerative medicine and medical diagnostics in the future.

Biomolecules

DNA energy constraints shape biological evolutionary trajectories

Most living systems rely on double-stranded DNA (dsDNA) to store their genetic information and perpetuate themselves. This biological information has been considered the main target of evolution. However, here we show that symmetries and patterns in the dsDNA sequence can emerge from the physical peculiarities of the dsDNA molecule itself and the maximum entropy principle alone, rather than from biological or environmental evolutionary pressure. This randomness accounts for the human codon biases and context-dependent mutation patterns in human populations. Thus, the DNA "exceptional symmetries", which emerge from the randomness, have to be taken into account when looking for the information encoded in DNA. Our results suggest that the double helix energy constraints and, more generally, the physical properties of the dsDNA are the hard drivers of the overall DNA sequence architecture, whereas the biological selective processes act as soft drivers, which only under extraordinary circumstances overtake the overall entropy content of the genome.

Biomolecules

Deciphering general characteristics of residues constituting allosteric communication paths

Considering all the PDB-annotated allosteric proteins (from ASD - AlloSteric Database) belonging to four different classes (kinases, nuclear receptors, peptidases and transcription factors), this work attempts to decipher consistent patterns in the residues constituting the allosteric communication sub-system (ACSS). The thermal fluctuations of hydrophobic residues in ACSSs were found to be significantly higher than those in the non-ACSS part of the same proteins, while polar residues showed the opposite trend. The basic residues and hydroxyl residues were found to be slightly more predominant than the acidic residues and amide residues in ACSSs, and hydrophobic residues were found extremely frequently in kinase ACSSs. Despite having different sequences and different lengths, the ACSSs were found to be structurally quite similar to each other - suggesting a preferred structural template for communication. ACSS structures recorded low RMSD and high Akaike Information Criterion (AIC) scores among themselves. While the ACSS networks for all the groups of allosteric proteins showed low degree centrality and closeness centrality, the betweenness centrality magnitudes revealed nonuniform behavior. Though cliques and communities could be identified within the ACSSs, a maximal common subgraph considering all the ACSSs could not be generated, primarily due to the diversity in the dataset. Barring one particular case, the entire ACSS for any class of allosteric proteins did not demonstrate "small world" behavior, though the sub-graphs of the ACSSs, in certain cases, were found to form small-world networks.
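For readers unfamiliar with the network measures named above, the following is a minimal sketch of degree and closeness centrality on an unweighted, connected graph. The three-residue path graph and the function names are hypothetical illustrations, not data or code from the study, and the normalizations shown are the common textbook ones, which may differ from those the authors used.

```python
from collections import deque

def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}

def closeness_centrality(graph):
    """(n - 1) divided by the sum of shortest-path lengths from each node.

    Assumes an unweighted, connected graph with at least two nodes.
    """
    n = len(graph)
    result = {}
    for source in graph:
        # Breadth-first search gives shortest path lengths in hops.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total = sum(dist.values())
        result[source] = (n - 1) / total if total else 0.0
    return result

# Hypothetical toy residue-contact graph: a simple path A - B - C.
toy_graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
degree = degree_centrality(toy_graph)
closeness = closeness_centrality(toy_graph)
```

In this toy graph the middle node B scores highest on both measures, which is the intuition behind using centrality to flag residues that sit on communication paths.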

Biomolecules

Deciphering the Protein Motion of S1 Subunit in SARS-CoV-2 Spike Glycoprotein Through Integrated Computational Methods

The novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a major worldwide public health emergency that has infected over 1.5 million people. The partially open state of the S1 subunit in the spike glycoprotein is considered vital for infection of host cells and is regarded as a key target for neutralizing antibodies. However, the mechanism of the transition from the closed state to the partially open state remains unclear. Here, we applied a combination of Markov state models, transition path theory and random forests to analyze the S1 motion. Our results reveal a complete conformational movement of the receptor-binding domain, from buried, through partially open, to detached states. We also numerically confirmed the transition probabilities between those states. Based on the asymmetry in both the dynamic behavior and the backbone Cα importance, we further suggest a relation between chains in the trimeric spike protein, which may help in vaccine design and antibody neutralization.
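As a rough illustration of the Markov state model idea mentioned above, the sketch below estimates a transition probability matrix by counting transitions in a discretized trajectory at a fixed lag time. The state labels mirror the three states named in the abstract, but the trajectory, the function name and the simple row normalization are assumptions for illustration only, not the authors' pipeline.

```python
def estimate_transition_matrix(traj, n_states, lag=1):
    """Row-normalized transition counts at the given lag (lag >= 1)."""
    counts = [[0] * n_states for _ in range(n_states)]
    # Pair each frame with the frame `lag` steps later and count the jump.
    for i, j in zip(traj[:-lag], traj[lag:]):
        counts[i][j] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # States never visited as a starting point get an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

# Hypothetical toy trajectory: 0 = buried, 1 = partially open, 2 = detached.
toy_traj = [0, 0, 1, 0, 1, 1, 2, 1, 0, 0]
T = estimate_transition_matrix(toy_traj, n_states=3)
```

Each row of `T` gives the estimated probabilities of moving from one state to every state after one lag step. Production analyses typically use reversible estimators and validate the choice of lag time, which this sketch omits.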

Biomolecules

Decoy Selection for Protein Structure Prediction Via Extreme Gradient Boosting and Ranking

Identifying one or more biologically active/native decoys from millions of non-native decoys is one of the major challenges in computational structural biology. The extreme imbalance between positive and negative samples (native and non-native decoys) in a decoy set makes the problem even more complicated. Consensus methods show varied success in handling the challenge of decoy selection, and they face issues with clustering large decoy sets and decoy sets that do not show much structural similarity. Recent investigations into energy landscape-based decoy selection approaches show promise. However, lack of generalization over varied test cases remains a bottleneck for these methods. We propose a novel decoy selection method, ML-Select, a machine learning framework that exploits the energy landscape associated with the structure space probed through template-free decoy generation. The proposed method outperforms both clustering and energy ranking-based methods, while consistently offering better performance on varied test cases. Moreover, ML-Select shows promising results even for decoy sets consisting mostly of low-quality decoys. ML-Select is a useful method for decoy selection. This work suggests further research into more effective ways to adopt machine learning frameworks to achieve robust performance for decoy selection in template-free protein structure prediction.

Biomolecules

Deep Generative Model Driven Protein Folding Simulation

Significant progress in computer hardware and software has enabled molecular dynamics (MD) simulations to model complex biological phenomena such as protein folding. However, enabling MD simulations to access biologically relevant timescales (e.g., beyond milliseconds) remains challenging. The challenges include (1) quantifying which set of states has already been (sufficiently) sampled in an ensemble of MD runs, and (2) identifying novel states from which simulations can be initiated to sample rare events (e.g., folding events). With the recent success of deep learning and artificial intelligence techniques in analyzing large datasets, we posit that these techniques can also be used to adaptively guide MD simulations to model such complex biological phenomena. Leveraging our recently developed unsupervised deep learning technique to cluster protein folding trajectories into partially folded intermediates, we build an iterative workflow that enables our generative model to be coupled with all-atom MD simulations to fold small protein systems on emerging high-performance computing platforms. We demonstrate our approach in folding Fs-peptide and the ββα (BBA) fold, FSD-EY. Our adaptive workflow enables us to achieve an overall root-mean-squared deviation (RMSD) to the native state of 1.6 Å and 4.4 Å for Fs-peptide and FSD-EY, respectively. We also highlight some emerging challenges in the context of designing scalable workflows when data-intensive deep learning techniques are coupled to compute-intensive MD simulations.
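For context on the RMSD figures quoted above, here is a minimal sketch of the root-mean-squared deviation between two sets of atomic coordinates. It assumes the two structures have the same atoms in the same order and are already superimposed. In practice an optimal alignment is usually applied first, and the abstract does not state the exact protocol used. The function name and the coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) coordinates.

    Assumes matching atom order and no further superposition is needed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and equal in size")
    # Sum of squared per-atom distances over all three axes.
    squared = sum(
        (a - b) ** 2
        for point_a, point_b in zip(coords_a, coords_b)
        for a, b in zip(point_a, point_b)
    )
    return math.sqrt(squared / len(coords_a))

# Hypothetical two-atom example in Å: the second atom is displaced by 2 Å in z.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
deviation = rmsd(native, model)
```

Because the squared distances are averaged over atoms before the square root is taken, a single badly placed atom raises the RMSD less than its own displacement would suggest.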

