Jianzhu Ma

Toyota Technological Institute at Chicago

Publication

Featured research published by Jianzhu Ma.

Scientific Reports | 2016

Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields.

Sheng Wang; Jian Peng; Jianzhu Ma; Jinbo Xu

Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.
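
For readers unfamiliar with the Q3 metric quoted above, the following is a minimal sketch of how it is usually computed: the fraction of residues whose 3-state label (helix H, strand E, coil C) is predicted correctly. The function name and the example strings are hypothetical and are not taken from the DeepCNF software.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose 3-state label (H, E, C) matches the observed label."""
    if len(predicted) != len(observed):
        raise ValueError("label strings must have equal length")
    if not observed:
        raise ValueError("label strings must be non-empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical 10-residue example: 8 of the 10 labels agree.
example_q3 = q3_accuracy("HHHHCCEEEC", "HHHCCCEEEE")
```

Q8 accuracy is computed the same way over the 8-state label alphabet.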

Methods in Molecular Biology | 2014

RaptorX server: A Resource for Template-Based Protein Structure Modeling

Morten Källberg; Gohar Margaryan; Sheng Wang; Jianzhu Ma; Jinbo Xu

Assigning functional properties to a newly discovered protein is a key challenge in modern biology. To this end, computational modeling of the three-dimensional atomic arrangement of the amino acid chain is often crucial in determining the role of the protein in biological processes. We present a community-wide web-based protocol, RaptorX server (http://raptorx.uchicago.edu), for automated protein secondary structure prediction, template-based tertiary structure modeling, and probabilistic alignment sampling. Given a target sequence, RaptorX server is able to detect even remotely related template sequences by means of a novel nonlinear context-specific alignment potential and probabilistic consistency algorithm. Using the protocol presented here it is thus possible to obtain high-quality structural models for many target protein sequences when only distantly related protein domains have experimentally solved structures. At present, RaptorX server can perform secondary and tertiary structure prediction of a 200 amino acid target sequence in approximately 30 min.

Bioinformatics | 2013

Protein threading using context-specific alignment potential

Jianzhu Ma; Sheng Wang; Feng Zhao; Jinbo Xu

Motivation: Template-based modeling, including homology modeling and protein threading, is the most reliable method for protein 3D structure prediction. However, alignment errors and template selection are still the main bottlenecks for current template-based modeling methods, especially when proteins under consideration are distantly related. Results: We present a novel context-specific alignment potential for protein threading, including alignment and template selection. Our alignment potential measures the log-odds ratio of one alignment being generated from two related proteins to being generated from two unrelated proteins, by integrating both local and global context-specific information. The local alignment potential quantifies how well one sequence residue can be aligned to one template residue based on context-specific information of the residues. The global alignment potential quantifies how well two sequence residues can be placed into two template positions at a given distance, again based on context-specific information. By accounting for correlation among a variety of protein features and making use of context-specific information, our alignment potential is much more sensitive than the widely used context-independent or profile-based scoring function. Experimental results confirm that our method generates significantly better alignments and threading results than the best profile-based methods on several large benchmarks. Our method works particularly well for distantly related proteins or proteins with sparse sequence profiles because of the effective integration of context-specific, structure and global information. Availability: http://raptorx.uchicago.edu/download/. Contact: [email protected]
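
The log-odds idea in this abstract can be illustrated with a small sketch. It assumes, purely for illustration, that each aligned residue pair contributes independently and that the two probabilities per pair are already known; the paper's potential instead estimates them from local and global context, which is not reproduced here. All names and numbers are hypothetical.

```python
import math


def log_odds_score(p_related: float, p_unrelated: float) -> float:
    """log( P(aligned pair | related proteins) / P(aligned pair | unrelated proteins) )."""
    if p_related <= 0 or p_unrelated <= 0:
        raise ValueError("probabilities must be positive")
    return math.log(p_related / p_unrelated)


def alignment_score(pairs) -> float:
    """Sum per-pair log-odds; assumes independent pairs (a simplification)."""
    return sum(log_odds_score(p_rel, p_unrel) for p_rel, p_unrel in pairs)


# Hypothetical probabilities for three aligned residue pairs.
score = alignment_score([(0.4, 0.1), (0.3, 0.3), (0.1, 0.2)])
```

A positive total favors the hypothesis that the two proteins are related, a negative total favors the unrelated hypothesis, and a pair with equal probabilities contributes zero.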

Bioinformatics | 2015

Protein contact prediction by integrating joint evolutionary coupling analysis and supervised learning

Jianzhu Ma; Sheng Wang; Zhiyong Wang; Jinbo Xu

MOTIVATION Protein contact prediction is important for protein structure and functional study. Both evolutionary coupling (EC) analysis and supervised machine learning methods have been developed, making use of different information sources. However, contact prediction is still challenging especially for proteins without a large number of sequence homologs. RESULTS This article presents a group graphical lasso (GGL) method for contact prediction that integrates joint multi-family EC analysis and supervised learning to improve accuracy on proteins without many sequence homologs. Different from existing single-family EC analysis that uses residue coevolution information in only the target protein family, our joint EC analysis uses residue coevolution in both the target family and its related families, which may have divergent sequences but similar folds. To implement this, we model a set of related protein families using Gaussian graphical models and then coestimate their parameters by maximum-likelihood, subject to the constraint that these parameters shall be similar to some degree. Our GGL method can also integrate supervised learning methods to further improve accuracy. Experiments show that our method outperforms existing methods on proteins without thousands of sequence homologs, and that our method performs better on both conserved and family-specific contacts. AVAILABILITY AND IMPLEMENTATION See http://raptorx.uchicago.edu/ContactMap/ for a web server implementing the method. CONTACT [email protected] SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.

Bioinformatics | 2012

A conditional neural fields model for protein threading

Jianzhu Ma; Jian Peng; Sheng Wang; Jinbo Xu

Motivation: Alignment errors are still the main bottleneck for current template-based protein modeling (TM) methods, including protein threading and homology modeling, especially when the sequence identity between two proteins under consideration is low (<30%). Results: We present a novel protein threading method, CNFpred, which achieves much more accurate sequence–template alignment by employing a probabilistic graphical model called a Conditional Neural Field (CNF), which aligns one protein sequence to its remote template using a non-linear scoring function. This scoring function accounts for correlation among a variety of protein sequence and structure features, makes use of information in the neighborhood of two residues to be aligned, and is thus much more sensitive than the widely used linear or profile-based scoring function. To train this CNF threading model, we employ a novel quality-sensitive method, instead of the standard maximum-likelihood method, to maximize directly the expected quality of the training set. Experimental results show that CNFpred generates significantly better alignments than the best profile-based and threading methods on several public (but small) benchmarks as well as our own large dataset. CNFpred outperforms others regardless of the lengths or classes of proteins, and works particularly well for proteins with sparse sequence profiles due to the effective utilization of structure information. Our methodology can also be adapted to protein sequence alignment. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

Research in Computational Molecular Biology | 2014

MRFalign: Protein Homology Detection through Alignment of Markov Random Fields

Jianzhu Ma; Sheng Wang; Zhiyong Wang; Jinbo Xu

Sequence-based protein homology detection has been extensively studied, but it still remains very challenging for remote homologs with divergent sequences. So far the most sensitive method for homology detection is based upon comparison of protein sequence profiles, which are usually derived from a multiple sequence alignment (MSA) of sequence homologs in a protein family and represented as a position-specific scoring matrix (PSSM) or a Hidden Markov Model (HMM). HMM is more sensitive than PSSM because the former contains position-specific gap information and also takes into account correlation among sequentially adjacent residues. The main issue with HMM lies in that it makes use of only position-specific amino acid mutation patterns and very short-range residue correlation, but not long-range residue interaction. However, remote homologs may have very divergent sequences and are only similar at the level of long-range residue interaction pattern, which is not encoded in current popular PSSM or HMM models.

International Journal of Molecular Sciences | 2015

DeepCNF-D: Predicting Protein Order/Disorder Regions by Weighted Deep Convolutional Neural Fields

Sheng Wang; Shunyan Weng; Jianzhu Ma; Qingming Tang

Intrinsically disordered proteins or protein regions are involved in key biological processes including regulation of transcription, signal transduction, and alternative splicing. Accurately predicting order/disorder regions ab initio from the protein sequence is a prerequisite step for further analysis of functions and mechanisms for these disordered regions. This work presents a learning method, weighted DeepCNF (Deep Convolutional Neural Fields), to improve the accuracy of order/disorder prediction by exploiting the long-range sequential information and the interdependency between adjacent order/disorder labels and by assigning different weights for each label during training and prediction to solve the label imbalance issue. Evaluated by the CASP9 and CASP10 targets, our method obtains 0.855 and 0.898 AUC values, which are higher than the state-of-the-art single ab initio predictors.
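
The label-weighting idea can be sketched with a per-residue weighted negative log-likelihood. This is a simplified illustration only: it treats residues independently, whereas the weighted DeepCNF described above also models the interdependency between adjacent labels. The function name, argument names, and weights are hypothetical.

```python
import math


def weighted_nll(probs_disorder, labels, weight_disorder, weight_order=1.0):
    """Weighted negative log-likelihood over residues.

    probs_disorder: predicted probability that each residue is disordered.
    labels: 1 for disordered, 0 for ordered.
    Up-weighting the rare disordered class makes its errors cost more.
    """
    total = 0.0
    for p, y in zip(probs_disorder, labels):
        if y == 1:
            total -= weight_disorder * math.log(p)
        else:
            total -= weight_order * math.log(1.0 - p)
    return total


# Hypothetical example: one disordered residue among four, weighted 3x.
loss = weighted_nll([0.5, 0.5, 0.5, 0.5], [1, 0, 0, 0], weight_disorder=3.0)
```

With a weight of 3 on the single disordered residue, it contributes as much to the loss as the three ordered residues combined.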

Advances in Protein Chemistry | 2014

Algorithms, applications, and challenges of protein structure alignment.

Jianzhu Ma; Sheng Wang

As a fundamental problem in computational structure biology, protein structure alignment has attracted the focus of the community for more than 20 years. While the pairwise structure alignment could be applied to measure the similarity between two proteins, which is a first step for homology search and fold space construction, the multiple structure alignment could be used to understand evolutionary conservation and divergence from a family of protein structures. Structure alignment is an NP-hard problem, which is only computationally tractable by using heuristics. Three levels of heuristics for pairwise structure alignment have been proposed, from the representations of protein structure, the perspectives of viewing protein as a rigid-body or flexible, to the scoring functions as well as the search algorithms for the alignment. For multiple structure alignment, the fourth level of heuristics is applied on how to merge all input structures to a multiple structure alignment. In this review, we first present a small survey of current methods for protein pairwise and multiple alignment, focusing on those that are publicly available as web servers. In more detail, we also discuss the advancements on the development of the new approaches to increase the pairwise alignment accuracy, to efficiently and reliably merge input structures to the multiple structure alignment. Finally, besides broadening the spectrum of the applications of structure alignment for protein template-based prediction, we also list several open problems that need to be solved in the future, such as the large complex alignment and the fast database search.

Nature Methods | 2018

Using deep learning to model the hierarchical structure and function of a cell

Jianzhu Ma; Michael Ku Yu; Samson Fong; Keiichiro Ono; Eric Sage; Barry Demchak; Roded Sharan; Trey Ideker

Although artificial neural networks are powerful classifiers, their internal structures are hard to interpret. In the life sciences, extensive knowledge of cell biology provides an opportunity to design visible neural networks (VNNs) that couple the model's inner workings to those of real systems. Here we develop DCell, a VNN embedded in the hierarchical structure of 2,526 subsystems comprising a eukaryotic cell (http://d-cell.ucsd.edu/). Trained on several million genotypes, DCell simulates cellular growth nearly as accurately as laboratory observations. During simulation, genotypes induce patterns of subsystem activities, enabling in silico investigations of the molecular mechanisms underlying genotype–phenotype associations. These mechanisms can be validated, and many are unexpected; some are governed by Boolean logic. Cumulatively, 80% of the importance for growth prediction is captured by 484 subsystems (21%), reflecting the emergence of a complex phenotype. DCell provides a foundation for decoding the genetics of disease, drug resistance and synthetic life.

Bioinformatics | 2016

AUCpreD: proteome-level protein disorder prediction by AUC-maximized deep convolutional neural fields

Sheng Wang; Jianzhu Ma; Jinbo Xu

MOTIVATION Protein intrinsically disordered regions (IDRs) play an important role in many biological processes. Two key properties of IDRs are (i) the occurrence is proteome-wide and (ii) the ratio of disordered residues is about 6%, which makes it challenging to accurately predict IDRs. Most IDR prediction methods use sequence profiles to improve accuracy, which prevents their application to proteome-wide prediction since it is time-consuming to generate sequence profiles. On the other hand, methods that do not use sequence profiles fare much worse than those that do. METHOD This article formulates IDR prediction as a sequence labeling problem and employs a new machine learning method called Deep Convolutional Neural Fields (DeepCNF) to solve it. DeepCNF is an integration of deep convolutional neural networks (DCNN) and conditional random fields (CRF); it can model not only complex sequence-structure relationship in a hierarchical manner, but also correlation among adjacent residues. To deal with the highly imbalanced order/disorder ratio, instead of training DeepCNF by the widely used maximum-likelihood method, we develop a novel approach to train it by maximizing area under the ROC curve (AUC), which is an unbiased measure for class-imbalanced data. RESULTS Our experimental results show that our IDR prediction method AUCpreD outperforms existing popular disorder predictors. More importantly, AUCpreD works very well even without sequence profiles, comparing favorably to or even outperforming many methods using sequence profiles. Therefore, our method works for proteome-wide disorder prediction while yielding similar or better accuracy than the others. AVAILABILITY AND IMPLEMENTATION http://raptorx2.uchicago.edu/StructurePropertyPred/predict/ CONTACT [email protected], [email protected] SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
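
To see why AUC suits imbalanced data, it helps to compute it directly. The sketch below uses the standard pairwise (Mann-Whitney) definition: the probability that a randomly chosen disordered residue scores higher than a randomly chosen ordered one, with ties counting one half. It only evaluates AUC; it is not the AUC-maximizing training procedure described in the abstract, and the names and data are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# A predictor that gives every residue the same score is uninformative,
# however rare the disordered class is: its AUC is exactly 0.5.
constant_auc = roc_auc([0.0] * 10, [1] + [0] * 9)
```

Such a constant predictor would still be correct on 9 of these 10 residues if it called everything ordered, which is why plain accuracy is misleading when only a small fraction of residues is disordered.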
