Jinbo Xu
Toyota Technological Institute at Chicago
Nature Protocols | 2012
Morten Källberg; Haipeng Wang; Sheng Wang; Jian Peng; Zhiyong Wang; Hui Lu; Jinbo Xu
A key challenge of modern biology is to uncover the functional role of the protein entities that compose cellular proteomes. To this end, the availability of reliable three-dimensional atomic models of proteins is often crucial. This protocol presents a community-wide web-based method using RaptorX (http://raptorx.uchicago.edu/) for protein secondary structure prediction, template-based tertiary structure modeling, alignment quality assessment and sophisticated probabilistic alignment sampling. RaptorX distinguishes itself from other servers by the quality of the alignment between a target sequence and one or multiple distantly related template proteins (especially those with sparse sequence profiles) and by a novel nonlinear scoring function and a probabilistic-consistency algorithm. Consequently, RaptorX delivers high-quality structural models for many targets with only remote templates. At present, it takes RaptorX ∼35 min to finish processing a sequence of 200 amino acids. Since its official release in August 2011, RaptorX has processed ∼6,000 sequences submitted by ∼1,600 users from around the world.
Proceedings of the National Academy of Sciences of the United States of America | 2008
Rohit Singh; Jinbo Xu; Bonnie Berger
Protein–protein interactions (PPIs) and their networks play a central role in all biological processes. Akin to the complete sequencing of genomes and their comparative analysis, complete descriptions of interactomes and their comparative analysis are fundamental to a deeper understanding of biological processes. A first step in such an analysis is to align two or more PPI networks. Here, we introduce an algorithm, IsoRank, for global alignment of multiple PPI networks. The guiding intuition here is that a protein in one PPI network is a good match for a protein in another network if their respective sequences and neighborhood topologies are a good match. We encode this intuition as an eigenvalue problem in a manner analogous to Google's PageRank method. Using IsoRank, we compute a global alignment of the Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, Mus musculus, and Homo sapiens PPI networks. We demonstrate that incorporating PPI data in ortholog prediction results in improvements over existing sequence-only approaches and over predictions from local alignments of the yeast and fly networks. Previous methods have been effective at identifying conserved, localized network patterns across pairs of networks. This work takes the further step of performing a global alignment of multiple PPI networks. It simultaneously uses sequence similarity and network data and, unlike previous approaches, explicitly models the tradeoff inherent in combining them. We expect IsoRank, with its simultaneous handling of node similarity and network similarity, to be applicable across many scientific domains.
Research in Computational Molecular Biology | 2007
Rohit Singh; Jinbo Xu; Bonnie Berger
We describe an algorithm, IsoRank, for global alignment of two protein-protein interaction (PPI) networks. IsoRank aims to maximize the overall match between the two networks; in contrast, much previous work has focused on the local alignment problem, that is, identifying many possible alignments, each corresponding to a local region of similarity. IsoRank is guided by the intuition that a protein should be matched with a protein in the other network if and only if the neighbors of the two proteins can also be well matched. We encode this intuition as an eigenvalue problem, in a manner analogous to Google's PageRank method. We use IsoRank to compute the first known global alignment between the S. cerevisiae and D. melanogaster PPI networks. The common subgraph has 1420 edges and describes conserved functional components between the two species. Comparisons of our results with those of a well-known algorithm for local network alignment indicate that the globally optimized alignment resolves ambiguity introduced by multiple local alignments. Finally, we interpret the results of global alignment to identify functional orthologs between yeast and fly; our functional ortholog prediction method is much simpler than a recently proposed approach and yet provides results that are more comprehensive.
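The neighbor-matching intuition in this abstract can be illustrated with a small power iteration. This is only a minimal sketch, not the authors' implementation: it assumes the update rule in which the score of a node pair (i, j) is the sum of the scores of neighboring pairs (u, v), each divided by the product of the degrees of u and v, and it omits both the sequence-similarity term and the step that extracts an alignment from the scores. The function name `isorank_scores`, the dict-of-sets graph format, and the toy graphs are all hypothetical.

```python
def isorank_scores(adj1, adj2, iterations=50):
    """Topology-only pairwise scores for two undirected graphs.

    adj1, adj2: dict mapping each node to the set of its neighbors
    (assumed symmetric). Returns a dict {(i, j): score} summing to 1.
    """
    pairs = [(i, j) for i in adj1 for j in adj2]
    scores = {pair: 1.0 / len(pairs) for pair in pairs}  # uniform start
    for _ in range(iterations):
        updated = {}
        for i, j in pairs:
            total = 0.0
            # Each neighboring pair (u, v) spreads its score evenly
            # over all of its own neighboring pairs.
            for u in adj1[i]:
                for v in adj2[j]:
                    total += scores[(u, v)] / (len(adj1[u]) * len(adj2[v]))
            updated[(i, j)] = total
        norm = sum(updated.values())
        scores = {pair: s / norm for pair, s in updated.items()}
    return scores


# Toy input: two copies of a triangle with one pendant node.
g1 = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
g2 = {"w": {"x", "z"}, "x": {"w", "z"}, "z": {"w", "x", "y"}, "y": {"z"}}
scores = isorank_scores(g1, g2)
best_pair = max(scores, key=scores.get)
print(best_pair)
```

On this toy input the highest-scoring pair matches the two degree-3 hubs, `c` and `z`. Repeated application of the update is one simple way to approximate the principal eigenvector of the underlying eigenvalue problem.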
Journal of Bioinformatics and Computational Biology | 2003
Jinbo Xu; Ming Li; Dongsup Kim; Ying Xu
This paper presents a novel linear programming approach to protein three-dimensional (3D) structure prediction via threading. Based on the contact map graph of the protein 3D structure template, the protein threading problem is formulated as a large-scale integer programming (IP) problem. The IP formulation is then relaxed to a linear programming (LP) problem and solved by the canonical branch-and-bound method. The final solution is globally optimal with respect to the energy function. In particular, our energy function includes pairwise interaction preferences and allows variable gaps, the two key factors that make the protein threading problem NP-hard. A surprising result is that, most of the time, the relaxed linear programs generate integral solutions directly. Our algorithm has been implemented as a software package, RAPTOR (RApid Protein Threading by Operation Research technique). A large-scale benchmark test for fold recognition shows that RAPTOR significantly outperforms other programs at the fold similarity level. The CAFASP3 evaluation, a blind and public test by the protein structure prediction community, ranks RAPTOR first among individual prediction servers in terms of recognition capability and alignment accuracy for Fold Recognition (FR) family targets. RAPTOR also performs very well in recognizing the hard Homology Modeling (HM) targets. RAPTOR was implemented at the University of Waterloo and can be accessed at http://www.cs.uwaterloo.ca/~j3xu/RAPTOR_form.htm.
Proteins | 2011
Jian Peng; Jinbo Xu
This work presents RaptorX, a statistical method for template-based protein modeling that improves alignment accuracy by exploiting structural information in a single or multiple templates. RaptorX consists of three major components: single-template threading, alignment quality prediction, and multiple-template threading. This work summarizes the methods used by RaptorX and presents its CASP9 result analysis, aiming to identify major bottlenecks in RaptorX and template-based modeling and to suggest directions for further study. Our results show that template structural information substantially helps both single-template and multiple-template protein threading, especially when closely related templates are unavailable, and that there is still considerable room for improvement in both alignment and template selection. The RaptorX web server is available at http://raptorx.uchicago.edu.
Current Genomics | 2007
Lai; Jinbo Xu
The ribosome is essential for protein synthesis. The composition and structure of ribosomes from several organisms have been determined, and it is well documented that ribosomal RNAs (rRNAs) and ribosomal proteins (RPs) constitute this important organelle. Many RPs also fill various roles that are independent of protein biosynthesis, called extraribosomal functions. These functions include DNA replication, transcription and repair, RNA splicing and modification, cell growth and proliferation, regulation of apoptosis and development, and cellular transformation. Previous investigations have revealed that RP regulation in colorectal carcinomas (CRC) differs from that found in colorectal adenoma or normal mucosa, with some RPs being up-regulated while others are down-regulated. The expression patterns of RPs are associated with the differentiation, progression or metastasis of CRC. Additionally, the recent literature has shown that the perturbation of specific RPs may promote certain genetic diseases and tumorigenesis. Because of the implications of RPs in disease, especially malignancy, our review sought to address several questions. Why do expression levels or categories of RPs differ in different diseases, most notably in CRC? Is this a cause or consequence of the diseases? What are their possible roles in the diseases? We review the known extraribosomal functions of RPs and associated changes in colorectal cancer and attempt to clarify the possible roles of RPs in colonic malignancy.
PLOS Computational Biology | 2017
Sheng Wang; Siqi Sun; Zhen Li; Renyu Zhang; Jinbo Xu
Motivation: Protein contacts contain key information for the understanding of protein structure and function, and thus contact prediction from sequence is an important problem. Recently, exciting progress has been made on this problem, but the predicted contacts for proteins without many sequence homologs are still of low quality and not very useful for de novo structure prediction. Method: This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformations of sequential features; the second residual network conducts a series of 2-dimensional convolutional transformations of pairwise information, including the output of the first residual network, EC information and pairwise potential. By using very deep residual networks, we can accurately model contact occurrence patterns and complex sequence-structure relationships and thus obtain higher-quality contact prediction regardless of how many sequence homologs are available for the proteins in question. Results: Our method greatly outperforms existing methods and leads to much more accurate contact-assisted folding. Tested on 105 CASP11 targets, 76 past CAMEO hard targets, and 398 membrane proteins, the average top L long-range prediction accuracy obtained by our method, one representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints but without any force fields can yield correct folds (i.e., TMscore>0.6) for 203 of the 579 test proteins, while that using MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62 of them, respectively. Our contact-assisted models also have much better quality than template-based models, especially for membrane proteins. The 3D models built from our contact prediction have TMscore>0.5 for 208 of the 398 membrane proteins, while those from homology modeling have TMscore>0.5 for only 10 of them. Further, even though it was trained mostly on soluble proteins, our deep learning method works very well on membrane proteins. In the recent blind CAMEO benchmark, our fully automated web server implementing this method successfully folded 6 targets with a new fold and only 0.3L-2.3L effective sequence homologs, including one β protein of 182 residues, one α+β protein of 125 residues, one α protein of 140 residues, one α protein of 217 residues, one α/β protein of 260 residues and one α protein of 462 residues. Our method also achieved the highest F1 score on free-modeling targets in the latest CASP (Critical Assessment of Structure Prediction), although it was not fully implemented back then. Availability: http://raptorx.uchicago.edu/ContactMap/
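The "top L" and "top L/10" long-range accuracies quoted above can be computed along the following lines. This is a hedged sketch rather than the paper's evaluation code: it assumes the common convention that a long-range pair has sequence separation of at least 24 residues, and that accuracy is the fraction of the top-ranked predicted pairs that are true contacts. The function name, argument names, and toy data are hypothetical.

```python
def top_long_range_accuracy(predicted, true_contacts, length, divisor=1, min_sep=24):
    """Fraction of the top (length // divisor) long-range predictions that are true.

    predicted: dict {(i, j): probability} over residue index pairs.
    true_contacts: set of (i, j) pairs with i < j.
    min_sep=24 is an assumed long-range threshold, not taken from the abstract.
    """
    long_range = [
        (prob, min(i, j), max(i, j))
        for (i, j), prob in predicted.items()
        if abs(i - j) >= min_sep
    ]
    long_range.sort(key=lambda item: item[0], reverse=True)  # most confident first
    k = max(1, length // divisor)
    top = long_range[:k]
    if not top:
        return 0.0
    hits = sum(1 for _, i, j in top if (i, j) in true_contacts)
    # Assumption: divide by the number of pairs actually evaluated.
    return hits / len(top)


# Toy example for a 30-residue protein; (3, 5) is short-range and is ignored.
predicted = {(0, 25): 0.9, (1, 28): 0.8, (2, 29): 0.1, (3, 5): 0.99}
true_contacts = {(0, 25), (2, 29)}
print(top_long_range_accuracy(predicted, true_contacts, length=30, divisor=15))
```

With `divisor=15` the two most confident long-range pairs are evaluated and one of them is a true contact, so the printed value is 0.5. Some evaluations divide by k even when fewer than k long-range pairs are predicted; that choice is left as an assumption here.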
Scientific Reports | 2016
Sheng Wang; Jian Peng; Jianzhu Ma; Jinbo Xu
Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.
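For readers unfamiliar with the metrics, Q3 and Q8 are per-residue accuracies over 3-state and 8-state secondary structure labels. The sketch below is illustrative only and is not taken from DeepCNF: it assumes one widely used reduction of the eight DSSP states to three (H, G, I to helix; E, B to strand; everything else to coil), and the function names and example strings are hypothetical.

```python
# Assumed 8-to-3 state reduction; other conventions exist.
EIGHT_TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_to_three_states(ss8):
    """Map an 8-state label string to 3 states; unlisted labels become coil (C)."""
    return "".join(EIGHT_TO_THREE.get(state, "C") for state in ss8)


def per_residue_accuracy(predicted, observed):
    """Fraction of positions where the predicted label equals the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("label strings must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


observed8 = "HGIEBTSL"
predicted8 = "HHHEETTL"
q8 = per_residue_accuracy(predicted8, observed8)
q3 = per_residue_accuracy(
    reduce_to_three_states(predicted8), reduce_to_three_states(observed8)
)
print(q8, q3)
```

In this toy case Q8 is 0.5 while Q3 is 1.0, because every 8-state error falls within the same 3-state class. This is one reason Q8 figures tend to be lower than Q3 figures for the same predictor.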
Research in Computational Molecular Biology | 2005
Jinbo Xu
This paper proposes a novel tree decomposition based side-chain assignment algorithm, which can obtain the globally optimal solution of the side-chain packing problem very efficiently. Theoretically, the computational complexity of this algorithm is O((N + M)n_{rot}^{tw + 1})
Journal of the ACM | 2006
Jinbo Xu; Bonnie Berger