Ryan R. Cheng
Rice University
Publication
Featured research published by Ryan R. Cheng.
Proceedings of the National Academy of Sciences of the United States of America | 2014
Faruck Morcos; Nicholas P. Schafer; Ryan R. Cheng; José N. Onuchic; Peter G. Wolynes
Significance Natural protein sequences, being the result of random mutation coupled with natural selection, have remarkable properties that are not typical of unselected random sequences, including the ability to robustly fold to an organized structure that is needed to function. We estimate the selection temperature, the effective temperature at which sequences were selected by evolution, for eight protein families and compare these values with experimental data for folding temperatures of proteins in each family. The selection temperature measures the importance of maintaining the stability and structural specificity of the folded state on the evolutionary process. For all families, the selection temperature is below physiological temperature, indicating that maintaining the structural integrity of the folded state is an important constraint on evolution. The energy landscape used by nature over evolutionary timescales to select protein sequences is essentially the same as the one that folds these sequences into functioning proteins, sometimes in microseconds. We show that genomic data, physical coarse-grained free energy functions, and family-specific information theoretic models can be combined to give consistent estimates of energy landscape characteristics of natural proteins. One such characteristic is the effective temperature Tsel at which these foldable sequences have been selected in sequence space by evolution. Tsel quantifies the importance of folded-state energetics and structural specificity for molecular evolution. Across all protein families studied, our estimates for Tsel are well below the experimental folding temperatures, indicating that the energy landscapes of natural foldable proteins are strongly funneled toward the native state.
Proceedings of the National Academy of Sciences of the United States of America | 2014
Ryan R. Cheng; Faruck Morcos; Herbert Levine; José N. Onuchic
Significance Our study uses amino acid coevolutionary information to better understand how bacterial two-component signaling (TCS) proteins preferentially interact with their correct partners while avoiding interactions with nonpartners. We extract coevolutionary couplings from sequences of TCS partners and study how coevolution is necessary to maintain their ability to transfer signals with high specificity. We use these coevolving couplings to devise a metric, which can predict the effects of mutations on the quality of signal transmission observed in vitro and provide support to the hypothesis that hybrid TCS proteins have reduced specificity. Our metric can potentially be used to redesign a TCS protein to preferentially interact with a nonpartner. Furthermore, our study can potentially be extended to networks of interacting proteins. A challenge in molecular biology is to distinguish the key subset of residues that allow two-component signaling (TCS) proteins to recognize their correct signaling partner such that they can transiently bind and transfer a signal, i.e., a phosphoryl group. Detailed knowledge of this information would allow one to search sequence space for mutations that can be used to systematically tune the signal transmission between TCS partners as well as potentially encode a TCS protein to preferentially transfer signals to a nonpartner. Motivated by the notion that this detailed information is found in sequence data, we explore the sequence coevolution between signaling partners to better understand how mutations can positively or negatively alter their ability to transfer signal.
Using direct coupling analysis for determining evolutionarily conserved protein–protein interactions, we apply a metric called the direct information score to quantify mutational changes in the interaction between TCS proteins and demonstrate that it accurately correlates with experimental mutagenesis studies probing the mutational change in measured in vitro phosphotransfer. Furthermore, by subtracting from our metric an appropriate null model corresponding to generic, conserved features in TCS signaling pairs, we can isolate the determinants that give rise to interaction specificity and recognition, which are variable among different TCS partners. Our methodology forms a potential framework for the rational design of TCS systems by allowing one to quickly search sequence space for mutations or even entirely new sequences that can increase or decrease our metric, as a proxy for increasing or decreasing phosphotransfer ability between TCS proteins.
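The direct information score mentioned above can be illustrated with a minimal sketch. It assumes the commonly used definition from the direct coupling analysis literature, DI = Σ P_dir(a,b) ln[P_dir(a,b) / (f_i(a) f_j(b))], and it takes the pairwise direct probability table as given rather than inferring it from sequences. The function name and argument layout are hypothetical and are not taken from the paper.

```python
import numpy as np

def direct_information(p_dir, f_i, f_j):
    """Direct information for one pair of sites (illustrative sketch).

    p_dir : (q, q) array, direct pair probability P_dir(a, b); assumed given.
    f_i   : (q,) array, single-site frequencies at site i.
    f_j   : (q,) array, single-site frequencies at site j.
    """
    p_dir = np.asarray(p_dir, dtype=float)
    # Probability the pair would have if the two sites were independent.
    background = np.outer(f_i, f_j)
    # Terms with P_dir = 0 contribute nothing (0 * log 0 is taken as 0).
    mask = p_dir > 0
    return float(np.sum(p_dir[mask] * np.log(p_dir[mask] / background[mask])))
```

Under this definition the score is zero when the two sites are statistically independent and grows as the direct pair distribution departs from the product of the single-site frequencies.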
Proceedings of the National Academy of Sciences of the United States of America | 2017
Michele Di Pierro; Ryan R. Cheng; Erez Lieberman Aiden; Peter G. Wolynes; José N. Onuchic
Significance In the nucleus of eukaryotic cells, the genome is organized in three dimensions in an architecture that depends on cell type. This organization is a key element of transcriptional regulation, and its disruption often leads to disease. We demonstrate that it is possible to predict how a genome will fold based on the epigenetic marks that decorate chromatin. Epigenetic marking patterns are used to predict the corresponding ensemble of 3D structures by leveraging both energy landscape theory and neural network-based machine learning. These predictions are extensively validated by the results of DNA-DNA ligation assays and fluorescence microscopy, which are found to be in exceptionally good agreement with theory. Inside the cell nucleus, genomes fold into organized structures that are characteristic of cell type. Here, we show that this chromatin architecture can be predicted de novo using epigenetic data derived from chromatin immunoprecipitation-sequencing (ChIP-Seq). We exploit the idea that chromosomes encode a 1D sequence of chromatin structural types. Interactions between these chromatin types determine the 3D structural ensemble of chromosomes through a process similar to phase separation. First, a neural network is used to infer the relation between the epigenetic marks present at a locus, as assayed by ChIP-Seq, and the genomic compartment in which those loci reside, as measured by DNA-DNA proximity ligation (Hi-C). Next, types inferred from this neural network are used as an input to an energy landscape model for chromatin organization [Minimal Chromatin Model (MiChroM)] to generate an ensemble of 3D chromosome conformations at a resolution of 50 kilobases (kb). After training the model, dubbed Maximum Entropy Genomic Annotation from Biomarkers Associated to Structural Ensembles (MEGABASE), on odd-numbered chromosomes, we predict the sequences of chromatin types and the subsequent 3D conformational ensembles for the even chromosomes. 
We validate these structural ensembles by using ChIP-Seq tracks alone to predict Hi-C maps, as well as distances measured using 3D fluorescence in situ hybridization (FISH) experiments. Both sets of experiments support the hypothesis of phase separation being the driving process behind compartmentalization. These findings strongly suggest that epigenetic marking patterns encode sufficient information to determine the global architecture of chromosomes and that de novo structure prediction for whole genomes may be increasingly possible.
Proceedings of the National Academy of Sciences of the United States of America | 2016
Fang Bai; Faruck Morcos; Ryan R. Cheng; Hualiang Jiang; José N. Onuchic
Significance Protein−protein interfaces have become an emerging class of molecular targets for the design of therapeutic drugs. However, major challenges exist for the correct identification of binding sites on the protein surface as well as drug-like modulators of protein−protein interaction. An integrated approach using molecular fragment docking and coevolutionary analysis is presented to face these challenges. This approach can accurately predict and characterize the binding sites for protein−protein interactions as well as provide clusters of bound, fragment-sized molecules on the druggable regions of the predicted binding site. These bound, molecular fragments can be chemically combined to create candidate drugs. Protein−protein interactions play a central role in cellular function. Improving the understanding of complex formation has many practical applications, including the rational design of new therapeutic agents and the mechanisms governing signal transduction networks. The generally large, flat, and relatively featureless binding sites of protein complexes pose many challenges for drug design. Fragment docking and direct coupling analysis are used in an integrated computational method to estimate druggable protein−protein interfaces. (i) This method explores the binding of fragment-sized molecular probes on the protein surface using a molecular docking-based screen. (ii) The energetically favorable binding sites of the probes, called hot spots, are spatially clustered to map out candidate binding sites on the protein surface. (iii) A coevolution-based interface interaction score is used to discriminate between different candidate binding sites, yielding potential interfacial targets for therapeutic drug design. This approach is validated for important, well-studied disease-related proteins with known pharmaceutical targets, and also identifies targets that have yet to be studied. 
Moreover, therapeutic agents are proposed by chemically connecting the fragments that are strongly bound to the hot spots.
Molecular Biology and Evolution | 2016
Ryan R. Cheng; Olle Nordesjö; Ryan L. Hayes; Herbert Levine; Samuel Coulbourn Flores; José N. Onuchic; Faruck Morcos
Two-component signaling (TCS) is the primary means by which bacteria sense and respond to the environment. TCS involves two partner proteins working in tandem, which interact to perform cellular functions while limiting interactions with non-partners (i.e., cross-talk). We construct a Potts model for TCS that can quantitatively predict how mutating amino acid identities affect the interaction between TCS partners and non-partners. The parameters of this model are inferred directly from protein sequence data. This approach drastically reduces the computational complexity of exploring the sequence-space of TCS proteins. As a stringent test, we compare its predictions to a recent comprehensive mutational study, which characterized the functionality of 204 mutational variants of the PhoQ kinase in Escherichia coli. We find that our best predictions accurately reproduce the amino acid combinations found in experiment, which enable functional signaling with its partner PhoP. These predictions demonstrate the evolutionary pressure to preserve the interaction between TCS partners as well as prevent unwanted cross-talk. Further, we calculate the mutational change in the binding affinity between PhoQ and PhoP, providing an estimate of the amount of destabilization needed to disrupt TCS.
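To make the idea of scoring mutations with a Potts model concrete, here is a minimal sketch using the generic Potts form H(s) = -Σ_i h_i(s_i) - Σ_{i<j} J_ij(s_i, s_j). The array layout, the function names, and the use of the full-sequence energy are assumptions for illustration; the paper's TCS-specific scoring (for example, which couplings are summed) is not reproduced here.

```python
import numpy as np

def potts_energy(seq, fields, couplings):
    """Generic Potts energy of an integer-encoded sequence.

    seq       : (L,) integer states, one per site.
    fields    : (L, q) array of local fields h_i(a).
    couplings : (L, L, q, q) array of pairwise couplings J_ij(a, b).
    """
    seq = np.asarray(seq)
    length = len(seq)
    energy = -sum(fields[i, seq[i]] for i in range(length))
    # Sum each pair once (i < j).
    for i in range(length):
        for j in range(i + 1, length):
            energy -= couplings[i, j, seq[i], seq[j]]
    return float(energy)

def mutation_delta(seq, site, new_state, fields, couplings):
    """Energy change H(mutant) - H(wild type) for a single-site substitution."""
    mutant = np.array(seq, copy=True)
    mutant[site] = new_state
    return potts_energy(mutant, fields, couplings) - potts_energy(seq, fields, couplings)
```

With this sign convention a positive change means the mutant is less favorable under the model than the starting sequence.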
Protein Science | 2016
Ryan R. Cheng; Mohit Raghunathan; Jeffrey K. Noel; José N. Onuchic
Recent developments in global statistical methodologies have advanced the analysis of large collections of protein sequences for coevolutionary information. Coevolution between amino acids in a protein arises from compensatory mutations that are needed to maintain the stability or function of a protein over the course of evolution. This gives rise to quantifiable correlations between amino acid sites within the multiple sequence alignment of a protein family. Here, we use the maximum entropy‐based approach called mean field Direct Coupling Analysis (mfDCA) to infer a Potts model Hamiltonian governing the correlated mutations in a protein family. We use the inferred pairwise statistical couplings to generate the sequence‐dependent heterogeneous interaction energies of a structure‐based model (SBM) where only native contacts are considered. Considering the ribosomal S6 protein and its circular permutants as well as the SH3 protein, we demonstrate that these models quantitatively agree with experimental data on folding mechanisms. This work serves as a new framework for generating coevolutionary data‐enriched models that can potentially be used to engineer key functional motions and novel interactions in protein systems.
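As a rough illustration of the mean-field step described above, the sketch below estimates Potts couplings from a toy alignment using the naive mean-field relation J = -C⁻¹, where C is the pseudocount-regularized covariance of one-hot encoded sequences. The function name, the integer encoding, the pseudocount value, and the choice to drop the last state as a gauge are assumptions for illustration, not the authors' pipeline. Sequence reweighting and the construction of the structure-based model are omitted.

```python
import numpy as np

def mean_field_couplings(msa, q, pseudocount=0.5):
    """Naive mean-field estimate of Potts couplings (illustrative sketch).

    msa : (n_seq, L) integer array with states in 0..q-1.
    q   : number of states per site.
    Returns J with shape (L, q-1, L, q-1); the last state is the gauge state.
    """
    n_seq, length = msa.shape
    # One-hot encode, dropping the last state so the covariance is invertible.
    onehot = np.zeros((n_seq, length, q - 1))
    for a in range(q - 1):
        onehot[:, :, a] = (msa == a)
    x = onehot.reshape(n_seq, length * (q - 1))

    # Empirical one- and two-site frequencies, mixed with a uniform prior.
    f_i = (1 - pseudocount) * x.mean(axis=0) + pseudocount / q
    f_ij = (1 - pseudocount) * (x.T @ x) / n_seq + pseudocount / q**2
    # Within a site, two different states never co-occur: the block is diagonal.
    for i in range(length):
        block = slice(i * (q - 1), (i + 1) * (q - 1))
        f_ij[block, block] = np.diag(f_i[block])

    covariance = f_ij - np.outer(f_i, f_i)
    couplings = -np.linalg.inv(covariance)
    return couplings.reshape(length, q - 1, length, q - 1)
```

The entries of the returned array with the same site index on both sides are self-terms of the inverse covariance and are normally ignored; only the blocks between different sites are read as pairwise couplings.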
Journal of Bacteriology | 2016
Joseph S. Boyd; Ryan R. Cheng; Mark L. Paddock; Cigdem Sancar; Faruck Morcos; Susan S. Golden
Two-component systems (TCS) that employ histidine kinases (HK) and response regulators (RR) are critical mediators of cellular signaling in bacteria. In the model cyanobacterium Synechococcus elongatus PCC 7942, TCSs control global rhythms of transcription that reflect an integration of time information from the circadian clock with a variety of cellular and environmental inputs. The HK CikA and the SasA/RpaA TCS transduce time information from the circadian oscillator to modulate downstream cellular processes. Despite immense progress in understanding of the circadian clock itself, many of the connections between the clock and other cellular signaling systems have remained enigmatic. To narrow the search for additional TCS components that connect to the clock, we utilized direct-coupling analysis (DCA), a statistical analysis of covariant residues among related amino acid sequences, to infer coevolution of new and known clock TCS components. DCA revealed a high degree of interaction specificity between SasA and CikA with RpaA, as expected, but also with the phosphate-responsive response regulator SphR. Coevolutionary analysis also predicted strong specificity between RpaA and a previously undescribed kinase, HK0480 (herein CikB). A knockout of the gene for CikB (cikB) in a sasA cikA null background eliminated the RpaA phosphorylation and RpaA-controlled transcription that is otherwise present in that background and suppressed cell elongation, supporting the notion that CikB is an interactor with RpaA and the clock network. This study demonstrates the power of DCA to identify subnetworks and key interactions in signaling pathways and of combinatorial mutagenesis to explore the phenotypic consequences. Such a combined strategy is broadly applicable to other prokaryotic systems. IMPORTANCE Signaling networks are complex and extensive, comprising multiple integrated pathways that respond to cellular and environmental cues.
A TCS interaction model, based on DCA, independently confirmed known interactions and revealed a core set of subnetworks within the larger HK-RR set. We validated high-scoring candidate proteins via combinatorial genetics, demonstrating that DCA can be utilized to reduce the search space of complex protein networks and to infer undiscovered specific interactions for signaling proteins in vivo. Significantly, new interactions that link circadian response to cell division and fitness in a light/dark cycle were uncovered. The combined analysis also uncovered a more basic core clock, illustrating the synergy and applicability of a combined computational and genetic approach for investigating prokaryotic signaling networks.
bioRxiv | 2017
Ryan R. Cheng; Ellinor Haglund; Nicholas Tiee; Faruck Morcos; Herbert Levine; Joseph A. Adams; Patricia A. Jennings; José N. Onuchic
The selection of mutations that encode new interactions between bacterial two-component signaling (TCS) proteins remains a significant challenge. Recent work constructed a coevolutionary landscape where mutations can readily be selected to maintain signal transfer interactions between partner TCS proteins without introducing unwanted crosstalk. A bigger challenge is to select mutations for a TCS protein from the landscape to enhance, suppress, or have a neutral effect on its basal signal transfer with a non-partner. This study focuses on the computational selection of 12 single-point mutations to a response regulator from Bacillus subtilis and their effect on phosphotransfer with a histidine kinase from Escherichia coli. These mutations are experimentally expressed to directly test the theoretical predictions, of which seven mutants successfully perturb phosphotransfer in the predicted manner. Furthermore, differential scanning calorimetry is used to monitor any protein stability effects caused by the mutations, which could be detrimental to proper protein function.
PLOS ONE | 2018
Ryan R. Cheng; Ellinor Haglund; Nicholas Tiee; Faruck Morcos; Herbert Levine; Joseph A. Adams; Patricia A. Jennings; José N. Onuchic
Selecting amino acids to design novel protein-protein interactions that facilitate catalysis is a daunting challenge. We propose that a computational coevolutionary landscape based on sequence analysis alone offers a major advantage over expensive, time-consuming brute-force approaches currently employed. Our coevolutionary landscape allows prediction of single amino acid substitutions that produce functional interactions between non-cognate, interspecies signaling partners. In addition, it can also predict mutations that maintain segregation of signaling pathways across species. Specifically, predictions of phosphotransfer activity from the Escherichia coli histidine kinase EnvZ to the non-cognate receiver Spo0F from Bacillus subtilis were compiled. Twelve mutations designed to enhance, suppress, or have a neutral effect on kinase phosphotransfer activity to a non-cognate partner were selected. We experimentally tested the ability of the kinase to relay phosphate to the respective designed Spo0F receiver proteins against the theoretical predictions. Our key finding is that the coevolutionary landscape theory, with limited structural data, can significantly reduce the search-space for successful prediction of single amino acid substitutions that modulate phosphotransfer between the two-component His-Asp relay partners in a predicted fashion. This combined approach offers significant improvements over large-scale mutation studies currently used for protein engineering and design.
Cell | 2018
Laura Vian; Aleksandra Pekowska; Suhas S.P. Rao; Kyong-Rim Kieffer-Kwon; Seolkyoung Jung; Laura Baranello; Su-Chen Huang; Laila El Khattabi; Marei Dose; Nathanael Pruett; Adrian L. Sanborn; Andres Canela; Yaakov Maman; Anna Oksanen; Wolfgang Resch; Xingwang Li; Byoungkoo Lee; Alexander L. Kovalchuk; Zhonghui Tang; Steevenson Nelson; Michele Di Pierro; Ryan R. Cheng; Ido Machol; Brian Glenn St Hilaire; Neva C. Durand; Muhammad S. Shamim; Elena Stamenova; José N. Onuchic; Yijun Ruan; André Nussenzweig