Jianzhao Gao
Nankai University
Publication
Featured research published by Jianzhao Gao.
PLOS ONE | 2012
Jianzhao Gao; Eshel Faraggi; Yaoqi Zhou; Jishou Ruan; Lukasz Kurgan
Accurate identification of immunogenic regions in a given antigen chain is a difficult and actively pursued problem. Although accurate predictors for T-cell epitopes are already in place, the prediction of the B-cell epitopes requires further research. We overview the available approaches for the prediction of B-cell epitopes and propose a novel and accurate sequence-based solution. Our BEST (B-cell Epitope prediction using Support vector machine Tool) method predicts epitopes from antigen sequences, in contrast to some methods that predict only from short sequence fragments, using a new architecture based on averaging selected scores generated from sliding 20-mers by a Support Vector Machine (SVM). The SVM predictor utilizes a comprehensive and custom-designed set of inputs generated by combining information derived from the chain, sequence conservation, similarity to known (training) epitopes, and predicted secondary structure and relative solvent accessibility. Empirical evaluation on benchmark datasets demonstrates that BEST outperforms several modern sequence-based B-cell epitope predictors, including ABCPred, the method by Chen et al. (2007), BCPred, COBEpro, BayesB, and CBTOPE, when considering the predictions from antigen chains and from the chain fragments. Our method obtains a cross-validated area under the receiver operating characteristic curve (AUC) for the fragment-based prediction at 0.81 and 0.85, depending on the dataset. The AUCs of BEST on the benchmark sets of full antigen chains equal 0.57 and 0.6, which is significantly and slightly better than the next best method we tested. We also present case studies to contrast the propensity profiles generated by BEST and several other methods.
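The sliding-window idea in this abstract can be illustrated with a minimal sketch. It is not the BEST implementation: the abstract says that selected scores are averaged, whereas this sketch simply averages every window score that covers a residue, and `toy_scorer` is a hypothetical stand-in for the trained SVM.

```python
from typing import Callable, List


def per_residue_profile(sequence: str,
                        score_window: Callable[[str], float],
                        window: int = 20) -> List[float]:
    """Average, for each residue, the scores of all windows covering it.

    Assumption: all covering windows are averaged; the abstract describes
    averaging *selected* scores, and that selection rule is not given there.
    """
    n = len(sequence)
    if n < window:
        raise ValueError("sequence is shorter than the window")
    totals = [0.0] * n
    counts = [0] * n
    for start in range(n - window + 1):
        score = score_window(sequence[start:start + window])
        for i in range(start, start + window):
            totals[i] += score
            counts[i] += 1
    # Every residue is covered by at least one window, so counts are >= 1.
    return [t / c for t, c in zip(totals, counts)]


def toy_scorer(fragment: str) -> float:
    """Hypothetical scorer (not an SVM): fraction of charged residues."""
    return sum(fragment.count(aa) for aa in "DEKR") / len(fragment)


if __name__ == "__main__":
    profile = per_residue_profile("A" * 20 + "K" * 20, toy_scorer)
    print([round(p, 2) for p in profile])
```

Residues near the chain termini are covered by fewer windows than residues in the middle, which is why the sketch tracks a per-residue count instead of dividing by a fixed window size.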
Structure | 2011
Ke Chen; Marcin J. Mizianty; Jianzhao Gao; Lukasz Kurgan
Protein function annotation and rational drug discovery rely on the knowledge of binding sites for small organic compounds, and yet the quality of existing binding site predictors was never systematically evaluated. We assess predictions of ten representative geometry-, energy-, threading-, and consensus-based methods on a new benchmark data set that considers apo and holo protein structures with multiple binding sites for biologically relevant ligands. Statistical tests show that threading-based Findsite outperforms other predictors when its templates have high similarity with the input protein. However, Findsite is equivalent or inferior to some geometry-, energy-, and consensus-based methods when the similarity is lower. We demonstrate that geometry-, energy-, and consensus-based predictors benefit from the usage of holo structures and that the top four methods, Findsite, Q-SiteFinder, ConCavity, and MetaPocket, perform better for larger binding sites. Predictions from these four methods are complementary, and our simple meta-predictor improves over the best single predictor.
Proteins | 2010
Jianzhao Gao; Tuo Zhang; Hua Zhang; Shiyi Shen; Jishou Ruan; Lukasz Kurgan
Protein folding rates vary by several orders of magnitude and they depend on the topology of the fold and the size and composition of the sequence. Although recent works show that the rates can be predicted from the sequence, allowing for high‐throughput annotations, they consider only the sequence and its predicted secondary structure. We propose a novel sequence‐based predictor, PFR‐AF, which utilizes solvent accessibility and residue flexibility predicted from the sequence, to improve predictions and provide insights into the folding process. The predictor includes three linear regressions for proteins with two‐state, multistate, and unknown (mixed‐state) folding kinetics. PFR‐AF on average outperforms current methods when tested on three datasets. The proposed approach provides high‐quality predictions in the absence of similarity between the predicted and the training sequences. The PFR‐AF predictions are characterized by high (between 0.71 and 0.95, depending on the dataset) correlation and the lowest (between 0.75 and 0.9) mean absolute errors with respect to the experimental rates, as measured using out‐of‐sample tests. Our models reveal that for the two‐state chains inclusion of solvent‐exposed Ala may accelerate the folding, while increased content of Ile may reduce the folding speed. We also demonstrate that increased flexibility of coils facilitates faster folding and that proteins with larger content of solvent‐exposed strands may fold at a slower pace. The increased flexibility of the solvent‐exposed residues is shown to elongate folding, which also holds, with a lower correlation, for buried residues. Two case studies are included to support our findings.
Amino Acids | 2012
Hua Zhang; Tuo Zhang; Jianzhao Gao; Jishou Ruan; Shiyi Shen; Lukasz Kurgan
Proteins fold through a two-state (TS) process, with no visible intermediates, or a multi-state (MS) process, via at least one intermediate. We analyze sequence-derived factors that determine folding types by introducing a novel sequence-based folding type predictor called FOKIT. This method implements a logistic regression model with six input features which hybridize information concerning amino acid composition and predicted secondary structure and solvent accessibility. FOKIT provides predictions with average Matthews correlation coefficient (MCC) between 0.58 and 0.91 measured using out-of-sample tests on four benchmark datasets. These results are shown to be competitive with or better than the results of four modern predictors. We also show that FOKIT outperforms these methods when predicting chains that share low similarity with the chains used to build the model, which is an important advantage given the limited number of annotated chains. We demonstrate that inclusion of solvent accessibility helps in discrimination of the folding kinetic types and that three of the features constitute statistically significant markers that differentiate TS and MS folders. We found that increased contents of exposed Trp and buried Leu are indicative of MS folding, which implies that the exposure/burial of certain hydrophobic residues may play an important role in the formation of the folding intermediates. Our conclusions are supported by two case studies.
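For readers unfamiliar with the Matthews correlation coefficient quoted above, the following sketch computes it from the four cells of a binary confusion matrix using the standard definition. The counts in the usage lines are made up for illustration and are not taken from the FOKIT paper; treating one folding type as the "positive" class is likewise an assumption of this example.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Standard MCC for a binary confusion matrix.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts: multi-state folders taken as the positive class.
    print(matthews_corrcoef(tp=5, tn=5, fp=0, fn=0))
    print(matthews_corrcoef(tp=0, tn=0, fp=5, fn=5))
```

MCC ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (every prediction correct), and it stays informative when the two classes have different sizes.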
Briefings in Bioinformatics | 2016
Yuedong Yang; Jianzhao Gao; Jihua Wang; Rhys Heffernan; Jack Hanson; Kuldip Kumar Paliwal; Yaoqi Zhou
Protein secondary structure prediction began in 1951 when Pauling and Corey predicted helical and sheet conformations for the protein polypeptide backbone even before the first protein structure was determined. Sixty-five years later, powerful new methods breathe new life into this field. The highest three-state accuracy without relying on structure templates is now at 82–84%, a number unthinkable just a few years ago. These improvements came from increasingly larger databases of protein sequences and structures for training, the use of template secondary structure information and more powerful deep learning techniques. As we are approaching the theoretical limit of three-state prediction (88–90%), alternatives to secondary structure prediction (prediction of backbone torsion angles and Cα-atom-based angles and torsion angles) not only have more room for further improvement but also allow direct prediction of three-dimensional fragment structures with constantly improved accuracy. About 20% of all 40-residue fragments in a database of 1199 non-redundant proteins have <6 Å root-mean-squared distance from the native conformations by SPIDER2. More powerful deep learning methods with improved capability of capturing long-range interactions begin to emerge as the next generation of techniques for secondary structure prediction. The time has come to finish off the final stretch of the long march towards protein secondary structure prediction.
Methods of Molecular Biology | 2014
Jianzhao Gao; Lukasz Kurgan
Computational identification of B-cell epitopes from antigen chains is a difficult and actively pursued research topic. Efforts towards the development of methods for the prediction of linear epitopes span the last three decades, while only recently several predictors of conformational epitopes were released. We review a comprehensive set of 13 recent approaches that predict linear and 4 methods that predict conformational B-cell epitopes from the antigen sequences. We introduce several databases of B-cell epitopes, since the availability of the corresponding data is at the heart of the development and validation of computational predictors. We also offer practical insights concerning the use and availability of these B-cell epitope predictors, and motivate and discuss future research in this area.
PLOS ONE | 2015
Wei Zheng; Jishou Ruan; Gang Hu; Kui Wang; Michelle Hanlon; Jianzhao Gao
The prediction of conformational B-cell epitopes plays an important role in immunoinformatics. Several computational methods have been proposed on the basis of discrimination determined by the solvent-accessible surface between epitopes and non-epitopes, but the performance of existing methods is far from satisfactory. In this paper, depth functions and the k-th surface convex hull are used to analyze epitopes and exposed non-epitopes. On each layer of the protein, we compute relative solvent accessibility and four different types of depth functions, i.e., Chakravarty depth, DPX, half-sphere exposure and half space depth, to analyze the location of epitopes on different layers of the proteins. We found that conformational B-cell epitopes are rich in charged residues Asp, Glu, Lys, Arg, His; aliphatic residues Gly, Pro; non-charged residues Asn, Gln; and aromatic residue Tyr. Conformational B-cell epitopes are rich in coils. Conservation of epitopes is not significantly lower than that of exposed non-epitopes. The average depths (obtained by four methods) for epitopes are significantly lower than those of non-epitopes on the surface according to the Wilcoxon rank sum test. Epitopes are more likely to be located in the outer layer of the convex hull of a protein. On the benchmark dataset, the cumulative 10th convex hull covers 84.6% of exposed residues on the protein surface area, and nearly 95% of epitope sites. These findings may be helpful in building a predictor for epitopes.
Bioinformatics | 2016
Jianzhao Gao; Yuedong Yang; Yaoqi Zhou
MOTIVATION: Backbone structures and solvent accessible surface area of proteins benefit from continuous real value prediction because it removes the arbitrariness of defining the boundary between different secondary-structure and solvent-accessibility states. However, the lack of a confidence score for predicted values has limited their applications. Here we investigated whether or not we can make a reasonable prediction of absolute errors for predicted backbone torsion angles, Cα-atom-based angles and torsion angles, solvent accessibility, contact numbers and half-sphere exposures by employing deep neural networks. RESULTS: We found that angle-based errors can be predicted most accurately, with a Spearman correlation coefficient (SPC) between predicted and actual errors of about 0.6. This is followed by solvent accessibility (SPC∼0.5). The errors on contact-based structural properties are the most difficult to predict (SPC between 0.2 and 0.3). We showed that predicted errors are significantly better error indicators than the average errors based on secondary-structure and amino-acid residue types. We further demonstrated the usefulness of predicted errors in model quality assessment. These error or confidence indicators are expected to be useful for prediction, assessment, and refinement of protein structures. AVAILABILITY AND IMPLEMENTATION: The method is available at http://sparks-lab.org as a part of the SPIDER2 package. CONTACT: [email protected] or [email protected] SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
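The agreement measure reported in this abstract, the Spearman correlation between predicted and actual errors, can be sketched in plain Python as the Pearson correlation of average ranks. This is a generic illustration, not code from the SPIDER2 package, and the error values in the usage lines are invented.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises ValueError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation computed as Pearson correlation of the ranks."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    predicted_error = [5.0, 12.0, 30.0, 8.0]   # hypothetical values
    actual_error = [4.0, 15.0, 22.0, 9.0]      # hypothetical values
    print(spearman(predicted_error, actual_error))
```

A rank-based measure suits this setting because a confidence score only has to order residues from reliable to unreliable; it does not need to match the actual error on a linear scale.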
PLOS ONE | 2016
Jianzhao Gao; Wei Cui; Yajun Sheng; Jishou Ruan; Lukasz Kurgan
Ion channels are a class of membrane proteins that attract a significant amount of basic research and are potential drug targets. High-throughput identification of these channels is hampered by the low availability of their structures and the observation that the use of sequence similarity offers limited predictive quality. Consequently, several machine learning predictors of ion channels from protein sequences that do not rely on high sequence similarity were developed. However, only one of these methods offers a wide scope by predicting ion channels, their types and four major subtypes of the voltage-gated channels. Moreover, this and other existing predictors utilize relatively simple predictive models that limit their accuracy. We propose a novel and accurate predictor of ion channels, their types and the four subtypes of the voltage-gated channels called PSIONplus. Our method combines a support vector machine model and a sequence similarity search with BLAST. The originality of PSIONplus stems from the use of a more sophisticated machine learning model that for the first time in this area utilizes evolutionary profiles and predicted secondary structure, solvent accessibility and intrinsic disorder. We empirically demonstrate that the evolutionary profiles provide the strongest predictive input among new and previously used input types. We also show that all new types of inputs contribute to the prediction. Results on an independent test dataset reveal that PSIONplus obtains relatively good predictive performance and outperforms existing methods. It secures accuracies of 85.4% and 68.3% for the prediction of ion channels and their types, respectively, and the average accuracy of 96.4% for the discrimination of the four ion channel subtypes. A standalone version of PSIONplus is freely available from https://sourceforge.net/projects/psion/
BioMed Research International | 2013
Kui Wang; Jianzhao Gao; Shiyi Shen; Jack A. Tuszynski; Jishou Ruan; Gang Hu
Since proteins carry out their functions through interactions with other molecules, accurately identifying the protein-ligand binding site plays an important role in protein functional annotation and rational drug discovery. In the past two decades, many algorithms have been presented to predict the protein-ligand binding site. In this paper, we introduce a statistical depth function to define negative samples and propose an SVM-based method which integrates sequence and structural information to predict binding sites. The results show that the present method performs better than the existing ones. The accuracy, sensitivity, and specificity on the training set are 77.55%, 56.15%, and 87.96%, respectively; on the independent test set, the accuracy, sensitivity, and specificity are 80.36%, 53.53%, and 92.38%, respectively.
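The three measures reported in this abstract follow the standard definitions over a binary confusion matrix, sketched below. The counts in the usage lines are invented for illustration; the abstract does not give the underlying confusion matrices, so the published percentages cannot be reproduced from it.

```python
from typing import Dict


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity from confusion-matrix counts.

    Here "positive" would mean a binding-site residue and "negative" a
    non-binding residue; that labeling is an assumption of this sketch.
    """
    total = tp + tn + fp + fn
    if total == 0 or (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("need at least one positive and one negative sample")
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # fraction of positives recovered
        "specificity": tn / (tn + fp),   # fraction of negatives recovered
    }


if __name__ == "__main__":
    # Hypothetical counts: 10 binding residues, 90 non-binding residues.
    for name, value in binary_metrics(tp=8, tn=85, fp=5, fn=2).items():
        print(f"{name}: {value:.4f}")
```

When negatives far outnumber positives, as is typical for binding-site residues, accuracy is dominated by specificity, which is consistent with the reported accuracy lying closer to the specificity than to the sensitivity.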