Jiangming Sun
Tongji University
Publication
Featured research published by Jiangming Sun.
Bioinformatics | 2012
Dapeng Li; Tonghua Li; Peisheng Cong; Wenwei Xiong; Jiangming Sun
MOTIVATION The precise prediction of protein secondary structure is of key importance for the prediction of 3D structure and biological function. Although many excellent methods developed over the last few decades have reached prediction accuracies of up to 80%, progress seems to have hit a bottleneck, and further gains in accuracy have proven difficult. RESULTS We propose, for the first time, a structural position-specific scoring matrix (SPSSM) and establish an unprecedented database of 9 million sequences and their SPSSMs. Combined with a purpose-designed BLAST tool, this database yields a novel prediction tool: SPSSMPred. When SPSSMPred was validated on a large dataset (10,814 entries), its Q3 accuracy for protein secondary structure prediction was 93.4%. On the two latest EVA sets, our approach achieved accuracies of 82.7% and 82.0%, far higher than those of other predictors. For further evaluation, we tested our approach on newly determined sequences (141 entries) and obtained an accuracy of 89.6%. On a set of low-homology proteins (40 entries), SPSSMPred still achieved a Q3 value of 84.6%. AVAILABILITY The SPSSMPred server is available at http://cal.tongji.edu.cn/SPSSMPred/
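Q3, the accuracy figure quoted above, is the percentage of residues whose three-state secondary structure label (helix H, strand E, coil C) matches the observed label. The minimal Python sketch below assumes the predicted and observed states have already been reduced to H/E/C strings of equal length; the function name q3_accuracy is hypothetical. When Q3 is reported for a whole dataset, residues can either be pooled across all chains or Q3 can be averaged per chain, and the two conventions give slightly different numbers.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Return the percentage of residues whose H/E/C state is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have the same length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    observed = "CCHHHHHCCEEEEC"
    predicted = "CCHHHHCCCEEECC"  # two residues mislabeled
    print(f"Q3 = {q3_accuracy(predicted, observed):.1f}%")  # Q3 = 85.7%
```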
BMC Bioinformatics | 2011
Zehui Tang; Tonghua Li; Rida Liu; Wenwei Xiong; Jiangming Sun; Yaojuan Zhu; Guanyan Chen
BACKGROUND The β-turn is a type of protein secondary structure that plays an important role in protein configuration and function, so accurate methods for identifying β-turns in protein sequences are valuable. Several β-turn prediction methods have been developed; however, prediction quality remains a challenge and there is substantial room for improvement. The proposed method innovates in two ways: discovering effective features and constructing a new model architecture. RESULTS We used predicted secondary structures, predicted shape strings and the position-specific scoring matrix (PSSM) as input features, and proposed a novel two-layer model to enhance the prediction. On the BT426 dataset we achieved the highest values on all four evaluation measures: Q(total) = 87.2%, MCC = 0.66, Q(observed) = 75.9%, and Q(predicted) = 73.8%. The results show that the two-layer model discriminates between β-turns and non-β-turns better than a single model, as reflected in its higher Q(predicted). Moreover, the predicted shape strings based on the structural alignment approach greatly improve performance, and the same improvements were also observed on the BT547 and BT823 datasets. CONCLUSION In this article, we present a comprehensive method for the prediction of β-turns. Experiments show that the proposed method substantially improves on competing prediction methods.
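The four β-turn measures are residue-level statistics computed from a confusion matrix of turn versus non-turn residues: Q(total) is overall accuracy, Q(observed) is the fraction of observed turn residues predicted as turns, Q(predicted) is the fraction of predicted turn residues that are real turns, and MCC is the Matthews correlation coefficient. The sketch below uses these textbook definitions, which we assume match the paper's; the function name is hypothetical and the counts in the usage line are toy numbers, not BT426 results.

```python
import math


def turn_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Residue-level turn metrics from confusion counts (turn = positive class)."""
    n = tp + tn + fp + fn
    if n == 0:
        raise ValueError("no residues to score")
    q_total = (tp + tn) / n
    # Q(observed): share of observed turn residues that were predicted as turns.
    q_observed = tp / (tp + fn) if tp + fn else 0.0
    # Q(predicted): share of predicted turn residues that really are turns.
    q_predicted = tp / (tp + fp) if tp + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"q_total": q_total, "q_observed": q_observed,
            "q_predicted": q_predicted, "mcc": mcc}


if __name__ == "__main__":
    # Toy counts for illustration only.
    print(turn_metrics(tp=3, tn=5, fp=1, fn=1))
```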
Nucleic Acids Research | 2012
Jiangming Sun; Shengnan Tang; Wenwei Xiong; Peisheng Cong; Tonghua Li
Many studies have shown that the shape string is an extremely important structural representation because it is more complete than classical secondary structure: it also provides detailed information in regions assigned as random coil. However, few services support systematic analysis of protein shape strings. To fill this gap, we have developed an accurate shape string predictor based on two innovative technologies: a knowledge-driven sequence alignment and a sequence shape string profile method. Performance on blind test data demonstrates that the proposed method predicts protein shape strings accurately. The DSP server provides both the predicted shape string and the sequence shape string profile for each query sequence. With this information, users can compare protein structures or display protein evolution in shape string space. The DSP server is available at both http://cheminfo.tongji.edu.cn/dsp/ and its main mirror http://chemcenter.tongji.edu.cn/dsp/.
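The abstract does not spell out how the sequence shape string profile is built. One plausible reading, sketched below purely as an assumption, treats it like an ordinary sequence profile: stack the shape strings of hits already aligned to the query, record per-position letter frequencies, and take the most frequent letter as a naive consensus prediction. The letters used here are placeholders, not necessarily the real shape string alphabet, the function names are hypothetical, and the knowledge-driven alignment step itself is not shown.

```python
from collections import Counter


def shape_profile(aligned: list[str], alphabet: str) -> list[dict[str, float]]:
    """Per-position letter frequencies over shape strings aligned to a query.

    Gap characters (or any symbol outside `alphabet`) are ignored in each column.
    """
    if not aligned:
        raise ValueError("need at least one aligned shape string")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned shape strings must all have the query length")
    profile = []
    for i in range(length):
        counts = Counter(s[i] for s in aligned if s[i] in alphabet)
        total = sum(counts.values())
        profile.append({a: counts[a] / total if total else 0.0 for a in alphabet})
    return profile


def consensus(profile: list[dict[str, float]], gap: str = "-") -> str:
    """Most frequent letter per column (ties go to the earlier alphabet letter)."""
    return "".join(
        max(col, key=col.get) if any(col.values()) else gap for col in profile
    )


if __name__ == "__main__":
    hits = ["RRAAS", "RRAAS", "RVA-S"]  # hypothetical aligned shape strings
    prof = shape_profile(hits, alphabet="ARSV")
    print(consensus(prof))  # RRAAS
```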
Nucleic Acids Research | 2013
Shengnan Tang; Tonghua Li; Peisheng Cong; Wenwei Xiong; Zhiheng Wang; Jiangming Sun
Knowledge of the subcellular localizations (SCLs) of plant proteins is related to their functions and aids in understanding the regulation of biological processes at the cellular level. We present PlantLoc, a highly accurate and fast webserver for predicting the multi-label SCLs of plant proteins. The PlantLoc server has two innovative features: it builds localization motif libraries with a recursive method that requires neither alignment nor Gene Ontology information, and it uses a simple architecture that identifies plant protein SCLs rapidly and accurately without a machine learning algorithm. For each query, PlantLoc reports the predicted SCLs, confidence estimates, the substantiality motif, and that motif's location in the sequence. Benchmarked on a new test dataset, PlantLoc achieved the highest accuracy (overall accuracy of 80.8%) in identifying plant protein SCLs compared with other plant SCL prediction webservers. Its ability to predict multiple sites was also significantly higher than that of any other webserver. The predicted substantiality motifs also have great potential for analyzing relationships with protein functional regions. The PlantLoc server is available at http://cal.tongji.edu.cn/PlantLoc/.
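PlantLoc predicts SCLs by matching localization motifs rather than by running a trained classifier. The toy sketch below illustrates only that general idea: each location has a motif library, every library with a motif occurring in the query contributes a label (so several labels can be returned), and each hit is reported with its 1-based start position. The motifs, the exact-substring matching, and the function name are hypothetical; PlantLoc's recursive library building and its confidence estimates are not modeled.

```python
EXAMPLE_LIBRARY = {
    # Hypothetical motif libraries for illustration; not PlantLoc's real motifs.
    "chloroplast": ["MASS", "RRL"],
    "nucleus": ["PKKKRKV"],
    "mitochondrion": ["RRLSS"],
}


def predict_locations(
    sequence: str, library: dict[str, list[str]]
) -> dict[str, list[tuple[str, int]]]:
    """Return every location with at least one motif hit, plus (motif, 1-based start) pairs."""
    hits: dict[str, list[tuple[str, int]]] = {}
    for location, motifs in library.items():
        found = []
        for motif in motifs:
            start = sequence.find(motif)
            while start != -1:  # record overlapping occurrences too
                found.append((motif, start + 1))
                start = sequence.find(motif, start + 1)
        if found:
            hits[location] = found
    return hits


if __name__ == "__main__":
    print(predict_locations("MASSRRLPKKKRKVAA", EXAMPLE_LIBRARY))
    # {'chloroplast': [('MASS', 1), ('RRL', 5)], 'nucleus': [('PKKKRKV', 8)]}
```

Returning every matching location, instead of a single best one, is what makes the output naturally multi-label.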
Amino Acids | 2012
Yaojuan Zhu; Tonghua Li; Dapeng Li; Yun Zhang; Wenwei Xiong; Jiangming Sun; Zehui Tang; Guanyan Chen
Numerous methods for predicting γ-turns in proteins have been developed. However, their results are generally poor, with a Matthews correlation coefficient (MCC) ≤0.18. Here, we develop a method to improve the accuracy of γ-turn prediction. First, we use the geometric mean metric as the optimization criterion for evaluating support vector machine performance on the highly imbalanced γ-turn dataset. This metric seeks to maximize both sensitivity and specificity while keeping them balanced. Second, we designed a predictor that generates protein shape strings by structural alignment against a protein structure database, and introduced the predicted shape string as a new variable for γ-turn prediction. Combining these two elements, we developed a new γ-turn prediction method. Trained and tested on a benchmark dataset of 320 non-homologous protein chains using fivefold cross-validation, the method performs well: the overall prediction accuracy Qtotal reaches 92.2% and the MCC is 0.38, outperforming existing γ-turn prediction methods. Our results indicate that the protein shape string is useful for predicting protein tight turns and that dihedral angle information is a reasonable variable for machine learning to predict protein folding. The dataset used in this work and the software for generating predicted shape strings from the structure database are freely available from the anonymous FTP site ftp://cheminfo.tongji.edu.cn/GammaTurnPrediction/.
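The geometric mean criterion is G-mean = sqrt(sensitivity × specificity). On an imbalanced dataset such as γ-turns, a model that labels every residue a non-turn can score high accuracy while finding no turns at all; its G-mean is zero, so choosing SVM settings by G-mean favors models that keep both rates reasonable. The small sketch below uses made-up confusion counts, and the candidate names stand in for hypothetical SVM parameter settings.

```python
import math


def g_mean(tp: int, fn: int, tn: int, fp: int) -> float:
    """Geometric mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    return (tp + tn) / (tp + fn + tn + fp)


# Made-up confusion counts (tp, fn, tn, fp) on 10 turn and 90 non-turn residues.
CANDIDATES = {
    "predict_all_non_turn": (0, 10, 90, 0),
    "balanced_setting": (7, 3, 72, 18),
}

if __name__ == "__main__":
    for name, counts in CANDIDATES.items():
        print(name, f"accuracy={accuracy(*counts):.2f}", f"G-mean={g_mean(*counts):.2f}")
    best = max(CANDIDATES, key=lambda name: g_mean(*CANDIDATES[name]))
    print("selected:", best)  # selected: balanced_setting
```

Accuracy alone would pick the degenerate all-non-turn model (0.90 versus 0.79); G-mean picks the model that actually finds turns.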
Journal of Theoretical Biology | 2012
Yinxia Hu; Tonghua Li; Jiangming Sun; Shengnan Tang; Wenwei Xiong; Dapeng Li; Guanyan Chen; Peisheng Cong
The subcellular localization of proteins is closely related to their functions. In this work, we propose a novel approach based on localization motifs to improve the accuracy of predicting the subcellular localization of Gram-positive bacterial proteins. Our approach achieved an overall success rate of 89.5% in five-fold cross-validation and 97.7% on an independent test dataset. Moreover, when tested on a new experimentally determined set of Gram-positive bacterial proteins, it achieved an overall success rate of 96.3%.
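Five-fold cross-validation splits the data into five disjoint folds, trains on four, tests on the fifth, and rotates until every protein has been tested exactly once; in the simple single-label case, the overall success rate is then the fraction of proteins whose location is predicted correctly. The minimal, unstratified sketch below uses only the standard library; the shuffle seed and function names are illustrative assumptions, not the paper's protocol.

```python
import random
from typing import Iterator


def k_fold_splits(n: int, k: int = 5, seed: int = 0) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_indices, test_indices) for k disjoint, shuffled folds."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for m, fold in enumerate(folds) if m != i for j in fold)
        yield train, test


def overall_success_rate(predicted: list[str], observed: list[str]) -> float:
    """Fraction of proteins whose predicted location equals the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need equal-length, non-empty label lists")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


if __name__ == "__main__":
    for fold, (train, test) in enumerate(k_fold_splits(12), start=1):
        print(f"fold {fold}: {len(train)} train, {len(test)} test")
```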
Talanta | 2011
Qian Qiao; Tonghua Li; Jiangming Sun; Xiaoyan Liu; Jianke Ren; Jian Fei
Previous studies have shown that the C57 and 129 strains of mice display marked differences in behavioural performance, neuroanatomy, neurochemistry and synaptic plasticity. However, few metabolomic studies of their biofluids have been performed. As part of a series of metabolic phenotyping studies, this study uses gas chromatography-mass spectrometry (GC-MS) to examine the effects of gender and strain on serum metabolite composition and variation in normal C57BL/6J and 129S1/SvImJ mice. The 129S1/SvImJ strain is phenotypically distinct from the C57BL/6J strain, and characteristic metabotypes are produced for both male and female mice of each strain. These data demonstrate that the C57BL/6J and 129S1/SvImJ strains show a wide range of metabolic differences across glycine, serine and threonine metabolism; valine, leucine and isoleucine biosynthesis; and tricarboxylic acid cycle pathways. Notably, the serum concentration of glyceric acid is significantly higher in the 129S1/SvImJ strain than in the C57BL/6J strain, an important consideration for studies that use the 129S1/SvImJ mouse as a model of human D-glyceric aciduria. We infer that a deficiency of D-glycerate kinase would explain such glyceric acid accumulation in the 129S1/SvImJ strain. More broadly, these differential metabolite data provide insight into specific metabolic pathways and lay the groundwork for integrated studies of these mouse models.
PLOS ONE | 2013
Xiaoyan Zhang; Longjian Lu; Qi Song; Qianqian Yang; Dapeng Li; Jiangming Sun; Tonghua Li; Peisheng Cong
Motivation The precise prediction of protein domains, which are the structural, functional and evolutionary units of proteins, has been a research focus in recent years. Although many methods have been presented for predicting protein domains and boundaries, there is still room to improve prediction accuracy. Results In this study we present a novel approach, DomHR, which is an accurate predictor of protein domain boundaries based on a novel hinge region strategy. A hinge region was defined as a segment of amino acids that covers part of a domain region and a boundary region. We developed a strategy to construct profiles of domain-hinge-boundary (DHB) features generated by sequence-domain/hinge/boundary alignment against a database of known domain structures. The DHB features had three elements: normalized domain, hinge, and boundary probabilities. The DHB features were used as input to identify domain boundaries in a sequence. DomHR used a nonredundant dataset as the training set, DHB features and predicted shape strings as features, and a conditional random field as the classification algorithm. In predicted hinge regions, each residue was assigned to a domain or a boundary according to a decision threshold. After the decision thresholds were optimized, DomHR was evaluated by cross-validation, large-scale prediction, an independent test and CASP (Critical Assessment of Techniques for Protein Structure Prediction) tests. All results confirmed that DomHR outperformed other well-established, publicly available domain boundary predictors in prediction accuracy. Availability DomHR is available at http://cal.tongji.edu.cn/domain/.
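DomHR resolves residues in predicted hinge regions with a tuned decision threshold. The sketch below shows one hypothetical way such a rule could work and is not DomHR's actual procedure. Each residue's raw domain, hinge, and boundary scores are normalized into probabilities. A residue whose hinge probability is largest is labeled a boundary (B) if its boundary probability reaches the threshold and a domain residue (D) otherwise; all other residues take whichever of their domain and boundary probabilities is larger. The conditional random field that DomHR uses for classification is omitted, the scores are invented, and the threshold is a plain parameter here rather than an optimized value.

```python
def normalize(scores: tuple[float, float, float]) -> tuple[float, float, float]:
    """Scale (domain, hinge, boundary) scores so they sum to 1."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    d, h, b = (s / total for s in scores)
    return d, h, b


def label_residues(raw_scores: list[tuple[float, float, float]], threshold: float = 0.3) -> str:
    """Label each residue D (domain) or B (boundary) from (domain, hinge, boundary) scores."""
    labels = []
    for scores in raw_scores:
        p_domain, p_hinge, p_boundary = normalize(scores)
        if p_hinge >= max(p_domain, p_boundary):
            # Hinge residue: the threshold decides between boundary and domain.
            labels.append("B" if p_boundary >= threshold else "D")
        else:
            labels.append("B" if p_boundary > p_domain else "D")
    return "".join(labels)


if __name__ == "__main__":
    scores = [(8, 1, 1), (3, 5, 2), (1, 5, 4), (1, 2, 7)]  # invented raw scores
    print(label_residues(scores))                 # DDBB
    print(label_residues(scores, threshold=0.5))  # DDDB
```

Sweeping `threshold` over a validation set and keeping the best-scoring value would be the natural analogue of the threshold optimization described in the abstract.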
Biochimie | 2013
Duo-Duo Wang; Tonghua Li; Jiangming Sun; Dapeng Li; Wenwei Xiong; Wen-Yan Wang; Shengnan Tang
Protein-DNA interactions are involved in many biological processes essential for gene expression and regulation. To understand the molecular mechanisms of protein-DNA recognition, it is crucial to analyze and identify the DNA-binding residues of protein-DNA complexes. Here, we propose a novel descriptor, the shape string, and two related features, shape string PSSM and shape string pair composition, to characterize DNA-binding residues. We used these new features together with the position-specific scoring matrix (PSSM) for modeling and prediction. Results on a benchmark dataset showed that our approach significantly improved the accuracy of the predictor: overall accuracy reached 85.86%, with 85.02% sensitivity and 86.02% specificity. The results also demonstrated that the shape string is a powerful descriptor for predicting DNA-binding residues, and the two related features further enhanced its predictive value.
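The abstract does not define shape string pair composition. By analogy with dipeptide composition, a reasonable assumption is the normalized frequency of each ordered pair of adjacent shape letters within a window around the residue being classified, which yields a fixed-length vector of |alphabet|² values. The sketch below computes that vector; the alphabet, window half-width, and function names are placeholders rather than the paper's settings.

```python
from itertools import product


def pair_composition(shape: str, alphabet: str) -> dict[str, float]:
    """Normalized counts of ordered adjacent letter pairs (|alphabet|**2 features)."""
    counts = {a + b: 0 for a, b in product(alphabet, repeat=2)}
    for a, b in zip(shape, shape[1:]):
        if a + b in counts:  # skip pairs containing gaps or unknown letters
            counts[a + b] += 1
    total = sum(counts.values())
    return {pair: (c / total if total else 0.0) for pair, c in counts.items()}


def window(shape: str, center: int, half_width: int) -> str:
    """Substring of up to 2*half_width+1 letters centered on `center`, clipped at the ends."""
    start = max(0, center - half_width)
    return shape[start:center + half_width + 1]


if __name__ == "__main__":
    shape = "RRASVVAR"  # hypothetical predicted shape string
    features = pair_composition(window(shape, center=3, half_width=2), alphabet="ARSV")
    print({pair: round(v, 2) for pair, v in features.items() if v})
    # {'AS': 0.25, 'RA': 0.25, 'SV': 0.25, 'VV': 0.25}
```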
PLOS ONE | 2012
Qi Song; Tonghua Li; Peisheng Cong; Jiangming Sun; Dapeng Li; Shengnan Tang
Motivation Turns are a critical element of protein structure and play a crucial role in loops, folds, and interactions. Current methods are well developed for predicting individual turn types, such as α-turns, β-turns, and γ-turns. However, further protein structure and function prediction requires a unified model that can accurately predict all types of turns simultaneously. Results In this study, we present a novel approach, TurnP, which can investigate all the turns in a protein with a unified model. The main characteristics of TurnP are: (i) newly exploited features of structural evolution information (secondary structure and shape string of the protein) based on structural homologies, (ii) consideration of all types of turns in a unified model, and (iii) the practical ability to predict all turns in a query accurately and simultaneously. TurnP uses predicted secondary structures and predicted shape strings, both obtained with high accuracy by innovative methods previously developed by our group. Sequence and structural evolution features (sequence profiles, secondary structure profiles and shape string profiles) are then generated by sequence and structure alignment. When TurnP was validated on a non-redundant dataset (4,107 entries) by five-fold cross-validation, it achieved an accuracy of 88.8% and a sensitivity of 71.8%, exceeding most state-of-the-art predictors for individual turn types. Independent tests on newly determined sequences and on the EVA and CASP9 datasets also gave outstanding turn prediction results, confirming the good performance of TurnP in practical applications.