Publication


Featured research published by Zhan-Chao Li.


Amino Acids | 2009

Prediction of protein structural classes by Chou’s pseudo amino acid composition: approached using continuous wavelet transform and principal component analysis

Zhan-Chao Li; Xi-Bin Zhou; Zong Dai; Xiaoyong Zou

Prior knowledge of a protein's structural class can provide useful information about its overall structure, so quick and accurate determination of protein structural class by computational methods is very important in protein science. One key to any computational method is an accurate representation of the protein sample. Here, based on the concept of Chou’s pseudo-amino acid composition (AAC, Chou, Proteins: structure, function, and genetics, 43:246–255, 2001), a novel feature extraction method that combined continuous wavelet transform (CWT) with principal component analysis (PCA) was introduced for the prediction of protein structural classes. First, a digital signal was obtained by mapping each amino acid according to various physicochemical properties. Second, CWT was used to extract a new feature vector based on the wavelet power spectrum (WPS), which contains richer sequence-order information in both the frequency and time domains, and PCA was then used to reorganize the feature vector to reduce information redundancy and computational complexity. Finally, a pseudo-amino acid composition feature vector was formed to represent the primary sequence by coupling the AAC vector with a set of new WPS feature vectors in an orthogonal space obtained by PCA. As a showcase, the rigorous jackknife cross-validation test was performed on the working datasets. The results indicated that prediction quality was improved, and that the current approach to protein representation may serve as a useful complementary vehicle for classifying other attributes of proteins, such as enzyme family class, subcellular localization, membrane protein type and protein secondary structure.
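The first two steps of this abstract (mapping residues to a numeric signal, then taking a wavelet power spectrum) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which physicochemical properties, mother wavelet or scales were used, so the Kyte-Doolittle hydropathy index, the Mexican hat wavelet and the function names below are all assumptions chosen for illustration, and the PCA step is not shown.

```python
import math

# Kyte-Doolittle hydropathy index. ASSUMPTION: the abstract only says
# "various physicochemical properties"; this scale is one illustrative choice.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequence_to_signal(sequence):
    """Map each standard residue to its hydropathy value (others are skipped)."""
    return [HYDROPATHY[res] for res in sequence.upper() if res in HYDROPATHY]


def mexican_hat(t):
    """Unnormalized Mexican hat (Ricker) mother wavelet. ASSUMED wavelet choice."""
    return (1.0 - t * t) * math.exp(-t * t / 2.0)


def wavelet_power_spectrum(signal, scales):
    """Discretized CWT: power[s][b] = (1/sqrt(a) * sum_k x[k] * psi((k - b) / a))**2."""
    n = len(signal)
    spectrum = []
    for scale in scales:
        row = []
        for shift in range(n):
            coeff = sum(
                signal[k] * mexican_hat((k - shift) / scale) for k in range(n)
            )
            row.append((coeff / math.sqrt(scale)) ** 2)
        spectrum.append(row)
    return spectrum


# Hypothetical short sequence, used only to show the shapes involved.
signal = sequence_to_signal("MKVLAAGIV")
power = wavelet_power_spectrum(signal, scales=[1.0, 2.0, 4.0])
print(len(power), len(power[0]))  # number of scales, sequence length
```

Each row of `power` holds the wavelet power at one scale for every sequence position. In the approach described above, features derived from such a spectrum would then be decorrelated by PCA before being combined with the composition vector.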


Amino Acids | 2008

Prediction of protein structure class by coupling improved genetic algorithm and support vector machine

Zhan-Chao Li; Xi-Bin Zhou; Ying Lin; Xiaoyong Zou

Structural class characterizes the overall folding type of a protein or its domain. Most existing methods for determining the structural class of a protein are based on a group of features that carries only one kind of discriminative information. As a result, other types of discriminative information associated with the primary sequence are missed entirely, which undoubtedly reduces the success rate of prediction. We present a novel method for the prediction of protein structure class that couples an improved genetic algorithm (GA) with the support vector machine (SVM). The improved GA was applied to the selection of an optimized feature subset and the optimization of SVM parameters. Jackknife tests on the working datasets indicated that the prediction accuracies for the different classes were in the range of 97.8–100%, with an overall accuracy of 99.5%. The results indicate that the approach has a high potential to become a useful tool in bioinformatics.
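The jackknife test mentioned in this and several other abstracts on the page is a leave-one-out procedure: each sample in turn is held out and predicted by a model built from all remaining samples. The sketch below shows only that evaluation loop. A 1-nearest-neighbour rule stands in for the GA-optimized SVM, which is not reproduced here, and the toy data and function names are invented for illustration.

```python
import math


def nearest_neighbor_predict(train_x, train_y, query):
    """Return the label of the training sample closest to `query` (1-NN stand-in)."""
    best_label, best_dist = None, math.inf
    for features, label in zip(train_x, train_y):
        dist = math.dist(features, query)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def jackknife_accuracy(samples, labels):
    """Leave-one-out test: predict each sample from a model built on all the others."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_neighbor_predict(train_x, train_y, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


# Toy feature vectors (made up): two well-separated classes.
toy_samples = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.8, 5.3)]
toy_labels = ["alpha", "alpha", "alpha", "beta", "beta", "beta"]
print(jackknife_accuracy(toy_samples, toy_labels))
```

Because every sample is used for testing exactly once and the split involves no random choice, the jackknife result for a given dataset and classifier is deterministic.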


Amino Acids | 2008

Improved prediction of subcellular location for apoptosis proteins by the dual-layer support vector machine

Xi-Bin Zhou; Chao Chen; Zhan-Chao Li; Xuenong Zou

Summary. Apoptosis proteins play an important role in the development and homeostasis of an organism. The accurate prediction of subcellular location for apoptosis proteins is very helpful for understanding the mechanism of apoptosis and their biological functions. However, most existing predictive methods rely on a single classifier, which limits further improvement of their performance. In this paper, a novel predictive method, which is essentially a multi-classifier system, is proposed by combining a dual-layer support vector machine (SVM) with multiple compositions, including amino acid composition (AAC), dipeptide composition (DPC) and amphiphilic pseudo amino acid composition (Am-Pse-AAC). As a demonstration, the predictive performance of our method was evaluated on two datasets of apoptosis proteins: the standard dataset ZD98 generated by Zhou and Doctor, and a larger dataset ZW225 generated by Zhang et al. With the jackknife test, the overall accuracies of our method on the two datasets reach 94.90% and 88.44%, respectively. The promising results indicate that our method can be a complementary tool for the prediction of subcellular location.
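Two of the sequence representations named in this abstract, AAC and DPC, have simple standard definitions: the normalized frequencies of the 20 amino acids and of the 400 ordered residue pairs. The following is a minimal sketch under the assumption that non-standard residues are skipped and that dipeptides are counted over overlapping adjacent positions. The function names are illustrative, and Am-Pse-AAC is not covered.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(sequence):
    """20-dimensional vector of residue frequencies (non-standard residues skipped)."""
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    total = 0
    for residue in sequence.upper():
        if residue in counts:
            counts[residue] += 1
            total += 1
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def dipeptide_composition(sequence):
    """400-dimensional vector of adjacent-pair frequencies (overlapping windows)."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[pair] / total if total else 0.0 for pair in DIPEPTIDES]


# Hypothetical peptide, used only to show the vector sizes.
aac = amino_acid_composition("MKVLAAGIVK")
dpc = dipeptide_composition("MKVLAAGIVK")
print(len(aac), len(dpc))
```

In a multi-classifier setup like the one described, each representation would feed its own first-layer classifier; that layering is not shown here.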


BMC Bioinformatics | 2010

Classification of G-protein coupled receptors based on support vector machine with maximum relevance minimum redundancy and genetic algorithm

Zhan-Chao Li; Xuan Zhou; Zong Dai; Xiaoyong Zou

Background: Because a priori knowledge about the function of G protein-coupled receptors (GPCRs) can provide useful information to pharmaceutical research, the determination of their function is a meaningful topic in protein science. However, with the rapid increase of GPCR sequences entering databanks, the gap between the number of known sequences and the number of known functions is widening rapidly, and it is both time-consuming and expensive to determine their function based only on experimental techniques. Therefore, it is vitally significant to develop a computational method for quick and accurate classification of GPCRs.

Results: In this study, a novel three-layer predictor based on support vector machine (SVM) and feature selection is developed for predicting and classifying GPCRs directly from amino acid sequence data. The maximum relevance minimum redundancy (mRMR) criterion is applied to pre-evaluate features with discriminative information, while a genetic algorithm (GA) is used to find the optimized feature subsets. SVM is used for the construction of classification models. The overall accuracies of the three-layer predictor at the levels of superfamily, family and subfamily are obtained by cross-validation tests on two non-redundant datasets. The results are about 0.5% to 16% higher than those of GPCR-CA and GPCRPred.

Conclusion: The high success rates indicate that the proposed predictor is a useful automated tool for predicting GPCRs. GPCR-SVMFS, a corresponding executable program for GPCR prediction and classification, can be acquired freely on request from the authors.


Journal of Molecular Graphics & Modelling | 2010

QSAR modeling of peptide biological activity by coupling support vector machine with particle swarm optimization algorithm and genetic algorithm.

Xuan Zhou; Zhan-Chao Li; Zong Dai; Xiaoyong Zou

A novel method coupling the particle swarm optimization (PSO) algorithm with the genetic algorithm (GA) was proposed to simultaneously optimize the kernel parameters of a support vector machine (SVM) and determine the optimized feature subset. By coupling GA with PSO, the particles produced in each generation of the PSO algorithm were processed by the crossover and mutation operators of GA, so that the particles could maintain diversity, escape from local optima and find the global optimum quickly and accurately. To evaluate the proposed method, four peptide datasets were employed for the investigation of quantitative structure-activity relationship (QSAR). Structural and physicochemical features derived from amino acid sequences were used to represent the peptides for QSAR. For the four datasets, the correlation coefficients (R) were 1.0000, 0.9508, 1.0000 and 0.9995 on the training sets and 0.9922, 0.9687, 0.9022 and 0.7404 on the test sets, respectively. The root-mean-square errors (RMSEs) were 0.0000, 0.0986, 0.0000 and 0.0203 on the training sets and 0.2522, 0.2782, 0.9625 and 0.2928 on the test sets, respectively. A protein dataset consisting of 277 proteins was also employed to evaluate the method for predicting protein structural class, and a good overall success rate was obtained. The results indicated that the proposed method might hold a high potential to become a useful tool in peptide QSAR and protein prediction research.
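The coupling described in this abstract, where each PSO generation is followed by GA crossover and mutation on the particles, can be sketched on a toy problem. The code below minimizes a simple sphere function rather than tuning SVM kernel parameters and feature subsets, and every operator detail (one-point crossover on random pairs, Gaussian mutation, all parameter values and names) is an assumption for illustration, not the published algorithm.

```python
import random


def sphere(position):
    """Toy objective with its minimum (0.0) at the origin."""
    return sum(x * x for x in position)


def hybrid_pso_ga(objective, dim=3, swarm_size=20, generations=60,
                  inertia=0.7, c1=1.5, c2=1.5,
                  crossover_rate=0.5, mutation_rate=0.1, seed=0):
    """PSO whose particles are also recombined and mutated each generation.

    All default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    positions = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(swarm_size)]
    velocities = [[0.0] * dim for _ in range(swarm_size)]
    personal_best = [p[:] for p in positions]
    personal_score = [objective(p) for p in positions]
    best_index = min(range(swarm_size), key=personal_score.__getitem__)
    global_best = personal_best[best_index][:]
    global_score = personal_score[best_index]
    initial_score = global_score

    for _ in range(generations):
        # PSO step: pull each particle toward its own best and the swarm best.
        for i in range(swarm_size):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c1 * r1 * (personal_best[i][d] - positions[i][d])
                    + c2 * r2 * (global_best[d] - positions[i][d])
                )
                positions[i][d] += velocities[i][d]

        # GA crossover: swap position tails between randomly paired particles.
        order = list(range(swarm_size))
        rng.shuffle(order)
        for a, b in zip(order[0::2], order[1::2]):
            if dim > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, dim)
                positions[a][cut:], positions[b][cut:] = (
                    positions[b][cut:], positions[a][cut:]
                )

        # GA mutation: small Gaussian perturbation to keep the swarm diverse.
        for i in range(swarm_size):
            for d in range(dim):
                if rng.random() < mutation_rate:
                    positions[i][d] += rng.gauss(0.0, 0.5)

        # Record improvements; the stored bests never get worse.
        for i in range(swarm_size):
            score = objective(positions[i])
            if score < personal_score[i]:
                personal_best[i], personal_score[i] = positions[i][:], score
                if score < global_score:
                    global_best, global_score = positions[i][:], score

    return global_best, global_score, initial_score


best, score, initial = hybrid_pso_ga(sphere)
print(score <= initial)
```

For SVM tuning, a particle would instead encode the kernel parameters together with a feature mask, and the objective would be a cross-validated error; that encoding is left out of this sketch.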


Analytica Chimica Acta | 2011

Identification of protein methylation sites by coupling improved ant colony optimization algorithm and support vector machine.

Zhan-Chao Li; Xuan Zhou; Zong Dai; Xiaoyong Zou

Protein methylation is involved in dozens of biological processes and plays an important role in adjusting protein physicochemical properties, conformation and function. However, with the rapid increase of protein sequences entering databanks, the gap between the number of known sequences and the number of known methylation annotations is widening rapidly. Therefore, it is vitally significant to develop a computational method for quick and accurate identification of methylation sites. In this study, a novel predictor (Methy_SVMIACO) based on support vector machine (SVM) and an improved ant colony optimization algorithm (IACO) is developed to identify methylation sites. The IACO is used to find the optimal feature subset and SVM parameters, while the SVM performs the identification of methylation sites. Comparison of the IACO with conventional ACO shows that the IACO converges quickly toward the global optimal solution and is a more useful tool for feature selection and SVM parameter optimization. In a 10-fold cross-validation test, Methy_SVMIACO achieves a sensitivity of 85.71%, a specificity of 86.67%, an accuracy of 86.19% and a Matthews correlation coefficient (MCC) of 0.7238 for lysine, and a sensitivity of 89.08%, a specificity of 94.07%, an accuracy of 91.56% and an MCC of 0.8323 for arginine. Analysis of the optimal feature subset shows that some upstream and downstream residues play an important role in the methylation of arginine and lysine. Compared with other existing methods, Methy_SVMIACO provides higher accuracy, sensitivity and specificity, indicating that the current method may serve as a powerful complementary tool to other existing approaches in this area. Methy_SVMIACO can be acquired freely on request from the authors.
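The four figures of merit quoted in this abstract follow from the counts of a binary confusion matrix. The sketch below uses the standard definitions; the function name and the example counts are made up for illustration and are not taken from the paper.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts.

    Assumes at least one positive and one negative sample, so that the
    sensitivity and specificity denominators are non-zero.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Made-up counts: 10 true sites (8 found) and 10 non-sites (9 rejected).
metrics = classification_metrics(tp=8, fn=2, tn=9, fp=1)
print(metrics)
```

Unlike accuracy, MCC ranges from -1 to 1 and stays informative when the two classes are of very different sizes, which is why it is commonly reported alongside the other three.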


Computers in Biology and Medicine | 2012

Prediction of methylation CpGs and their methylation degrees in human DNA sequences

Xuan Zhou; Zhan-Chao Li; Zong Dai; Xiaoyong Zou

DNA methylation plays a key role in the regulation of gene expression. The most common type of DNA modification is the methylation of cytosine in the CpG dinucleotide. DNA methylation has been detected mostly by experimental methods, which are time-consuming and expensive and cannot easily meet the requirements of modern large-scale sequencing technology. Accordingly, it is necessary to develop automatic, reliable prediction methods for DNA methylation. In this study, the trinucleotide composition, a 64-dimensional feature vector of the occurrence frequencies of the 64 trinucleotides in the DNA sequence, was used to build an SVM model for the prediction of CpG methylation degrees in humans. The model was evaluated by jackknife validation, and the correlation coefficient (R) and root-mean-square error (RMSE) were 0.8223 and 0.2042, respectively. The proposed method was also used to predict methylation sites; in jackknife validation, the Matthews correlation coefficient (MCC) and accuracy (ACC) were 0.6263 and 0.8133, respectively. The good results indicated that the proposed method is a useful tool for DNA methylation prediction research.
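The trinucleotide composition described in this abstract is straightforward to compute. The following minimal sketch assumes overlapping windows of length three, a fixed alphabetical ordering of the 64 trinucleotides, and that windows containing ambiguous bases such as N are skipped; none of these details are stated in the abstract, and the function name is illustrative.

```python
from itertools import product

# The 64 trinucleotides in a fixed order: AAA, AAC, AAG, ..., TTT.
TRINUCLEOTIDES = ["".join(triplet) for triplet in product("ACGT", repeat=3)]


def trinucleotide_composition(dna):
    """64-dimensional vector of trinucleotide occurrence frequencies."""
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    seq = dna.upper()
    total = 0
    for i in range(len(seq) - 2):
        triplet = seq[i:i + 3]
        if triplet in counts:  # skips windows with N or other symbols
            counts[triplet] += 1
            total += 1
    return [counts[t] / total if total else 0.0 for t in TRINUCLEOTIDES]


# Hypothetical fragment around a CpG site, used only to show the output size.
features = trinucleotide_composition("TTACGCGGATCCGA")
print(len(features))
```

A vector like `features` would be the input to the SVM, used either as a regressor for methylation degree or as a classifier for methylation status.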


Analytica Chimica Acta | 2012

Identification of human protein complexes from local sub-graphs of protein–protein interaction network based on random forest with topological structure features

Zhan-Chao Li; Yan-Hua Lai; Li-Li Chen; Xuan Zhou; Zong Dai; Xiaoyong Zou

In the post-genomic era, one of the most important and challenging tasks is to identify protein complexes and further elucidate their molecular mechanisms in specific biological processes. Previous computational approaches usually identify protein complexes from protein interaction networks based on dense sub-graphs and incomplete prior information. Additionally, these approaches pay little attention to the biological properties of proteins, and there is no common metric for evaluating their performance. It is therefore necessary to construct a novel method for identifying protein complexes and elucidating their function. In this study, a novel approach is proposed to identify protein complexes using random forest and topological structure. Each protein complex is represented by a graph of interactions, in which a descriptor of the protein primary structure characterizes the biological properties of each protein and is used to weight the corresponding vertex. Topological structure features are developed and used to characterize protein complexes. The random forest algorithm is used to build the prediction model and identify protein complexes from local sub-graphs instead of dense sub-graphs. As a demonstration, the proposed approach is applied to human protein interaction data, and satisfactory results are obtained, with an accuracy of 80.24%, a sensitivity of 81.94%, a specificity of 80.07% and a Matthews correlation coefficient of 0.4087 in a 10-fold cross-validation test. Some new protein complexes are identified, and analysis based on Gene Ontology shows that the complexes are likely to be true complexes and play important roles in the pathogenesis of some diseases. PCI-RFTS, a corresponding executable program for protein complex identification, can be acquired freely on request from the authors.


Talanta | 2011

Predicting methylation status of human DNA sequences by pseudo-trinucleotide composition.

Xuan Zhou; Zhan-Chao Li; Zong Dai; Xiaoyong Zou

DNA methylation plays a key role in the regulation of gene expression. The most common type of DNA modification is the methylation of cytosine in the CpG dinucleotide. DNA methylation has been detected mostly by experimental methods; however, these methods are time-consuming and expensive and cannot easily meet the requirements of modern large-scale sequencing technology. Accordingly, it is necessary to develop automatic and reliable prediction methods for DNA methylation. In this study, the pseudo-trinucleotide composition was proposed, and a novel method was developed in which a support vector machine (SVM) takes the pseudo-trinucleotide composition as the input representation of a DNA sequence for DNA methylation prediction. The model was evaluated on two datasets: the dataset of Rollins (dataset_1) and a dataset of healthy human records collected from the MethDB database (dataset_2). For dataset_1, the Matthews correlation coefficient (MCC) and accuracy (ACC) by jackknife validation were 0.8051 and 0.6098, respectively. For dataset_2, the MCC and ACC were 0.8500 and 0.7203, respectively. The good prediction results reveal that the pseudo-trinucleotide composition is an effective representation method for DNA sequences and plays a very important role in the prediction of DNA function.


Amino Acids | 2012

Classification of G proteins and prediction of GPCRs-G proteins coupling specificity using continuous wavelet transform and information theory

Zhan-Chao Li; Xuan Zhou; Zong Dai; Xiaoyong Zou

The coupling between G protein-coupled receptors (GPCRs) and guanine nucleotide-binding proteins (G proteins) regulates various signal transductions from the extracellular space into the cell. However, the coupling mechanism between GPCRs and G proteins is still unknown, and experimental determination of their coupling specificity and function is both expensive and time-consuming. Therefore, it is significant to develop a theoretical method to predict the coupling specificity between GPCRs and G proteins, as well as their function, from their primary sequences. In this study, a novel four-layer predictor (GPCRsG_CWTIT) based on support vector machine (SVM), continuous wavelet transform (CWT) and information theory (IT) is developed to classify G proteins and predict the coupling specificity between GPCRs and G proteins. SVM is used to construct the models, while CWT and IT are used to characterize the primary structure of the proteins. The performance of GPCRsG_CWTIT is evaluated with cross-validation tests on various working datasets. The overall accuracy for G proteins at the levels of class and family is 98.23% and 85.42%, respectively. The accuracy of the coupling specificity prediction varies from 74.60% to 94.30%. These results indicate that the proposed predictor is an effective and feasible tool for predicting the coupling specificity between GPCRs and G proteins, as well as their functions, using only the full protein sequence. The establishment of such an accurate prediction method will facilitate drug discovery by improving the ability to identify and predict protein–protein interactions. GPCRsG_CWTIT and the dataset can be acquired freely on request from the authors.

Collaboration


Top Co-Authors

Zong Dai
Sun Yat-sen University

Xuan Zhou
Sun Yat-sen University

Xi-Bin Zhou
Sun Yat-sen University

Chao Chen
Sun Yat-sen University

Li-Li Chen
Sun Yat-sen University

Yan-Hua Lai
Sun Yat-sen University

Xuenong Zou
Sun Yat-sen University