Publication


Featured research published by Yu-Dong Cai.


Biophysical Journal | 2003

Support Vector Machines for Predicting Membrane Protein Types by Using Functional Domain Composition

Yu-Dong Cai; Guo-Ping Zhou; Kuo-Chen Chou

Membrane proteins are generally classified into the following five types: (1) type I membrane protein; (2) type II membrane protein; (3) multipass transmembrane proteins; (4) lipid chain-anchored membrane proteins; and (5) GPI-anchored membrane proteins. In this article, based on the concept of using the functional domain composition to define a protein, the Support Vector Machine algorithm is developed for predicting the membrane protein type. High success rates are obtained by both the self-consistency and jackknife tests. The current approach, complemented with the powerful covariant discriminant algorithm based on the pseudo-amino acid composition that has incorporated quasi-sequence-order effect as recently proposed by K. C. Chou (2001), may become a very useful high-throughput tool in the area of bioinformatics and proteomics.


Computational Biology and Chemistry | 2002

Prediction of protein structural classes by support vector machines.

Yu-Dong Cai; Xiao-Jun Liu; Xue-biao Xu; Kuo-Chen Chou

In this paper, we apply a new machine learning method, the support vector machine, to the prediction of protein structural class. The method is trained on a database derived from SCOP, which classifies domains of known structure according to their evolutionary relationships and the principles that govern their 3D structure. High success rates are obtained in both the self-consistency and jackknife tests. This indicates that the structural class of a protein is considerably correlated with its amino acid composition, and that the support vector machine can be regarded as a powerful computational tool for predicting the structural classes of proteins.


PLOS ONE | 2010

Predicting drug-target interaction networks based on functional groups and biological features.

Zhisong He; Jian Zhang; Xiao-He Shi; Le-Le Hu; Xiangyin Kong; Yu-Dong Cai; Kuo-Chen Chou

Background: Study of drug-target interaction networks is an important topic for drug development. It is both time-consuming and costly to determine compound-protein interactions or potential drug-target interactions by experiments alone. As a complement, the in silico prediction methods can provide us with very useful information in a timely manner. Methods/Principal Findings: To realize this, drug compounds are encoded with functional groups and proteins encoded by biological features including biochemical and physicochemical properties. The optimal feature selection procedures are adopted by means of the mRMR (Maximum Relevance Minimum Redundancy) method. Instead of classifying the proteins as a whole family, target proteins are divided into four groups: enzymes, ion channels, G-protein-coupled receptors and nuclear receptors. Thus, four independent predictors are established using the Nearest Neighbor algorithm as their operation engine, with each to predict the interactions between drugs and one of the four protein groups. As a result, the overall success rates by the jackknife cross-validation tests achieved with the four predictors are 85.48%, 80.78%, 78.49%, and 85.66%, respectively. Conclusion/Significance: Our results indicate that the network prediction system thus established is quite promising and encouraging.
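The jackknife test with a Nearest Neighbor engine described in this abstract can be illustrated with a minimal sketch. It assumes each drug-target pair has already been encoded as a numeric feature vector and uses plain Euclidean distance; the function names and the toy data are hypothetical, and the distance measure actually used in the paper is not stated here.

```python
import math

def nearest_neighbor_label(query, samples):
    """Return the label of the sample closest to `query` (Euclidean distance)."""
    _, best_label = min(samples, key=lambda s: math.dist(query, s[0]))
    return best_label

def jackknife_success_rate(samples):
    """Leave each sample out in turn, predict it from the remaining samples,
    and return the fraction of samples predicted correctly."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if nearest_neighbor_label(features, rest) == label:
            correct += 1
    return correct / len(samples)

# Hypothetical toy data: (feature vector, 1 = interacting pair, 0 = non-interacting pair)
toy_pairs = [
    ((0.0, 0.1), 0), ((0.1, 0.0), 0), ((0.2, 0.2), 0),
    ((1.0, 0.9), 1), ((0.9, 1.0), 1), ((0.8, 0.8), 1),
]
rate = jackknife_success_rate(toy_pairs)
print(f"jackknife success rate: {rate:.2%}")
```

Because every sample is predicted by a model that never saw it, the jackknife rate is a stricter estimate than a self-consistency (resubstitution) test on the same data.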


Journal of Chemical Information and Modeling | 2005

Prediction of membrane protein types by incorporating amphipathic effects.

Kuo-Chen Chou; Yu-Dong Cai

According to their intramolecular arrangement and position in a cell, membrane proteins are generally classified into the following six types: (1) type I transmembrane, (2) type II transmembrane, (3) multipass transmembrane, (4) lipid chain-anchored membrane, (5) GPI-anchored membrane, and (6) peripheral membrane. Situated in a heteropolar environment, these six types of membrane proteins must have quite different amphiphilic sequence-order patterns in order to stabilize their respective frameworks. To incorporate such a feature into the predictor, the amphiphilic pseudo amino acid composition has been formulated that contains a series of hydrophobic and hydrophilic correlation factors. The success rates thus obtained have been remarkably enhanced in identifying the types of membrane proteins, as demonstrated by the jackknife test and independent data set test, respectively.


BMC Bioinformatics | 2001

Support Vector Machines for predicting protein structural class

Yu-Dong Cai; Xiao-Jun Liu; Xue-biao Xu; Guo-Ping Zhou

Background: We apply a new machine learning method, the so-called Support Vector Machine method, to predict the protein structural class. The Support Vector Machine method is performed based on the database derived from SCOP, in which protein domains are classified based on known structures and the evolutionary relationships and the principles that govern their 3-D structure. Results: High rates of both self-consistency and jackknife tests are obtained. The good results indicate that the structural class of a protein is considerably correlated with its amino acid composition. Conclusions: It is expected that the Support Vector Machine method and the elegant component-coupled method, also known as the covariant discrimination algorithm, if complemented with each other, can provide a powerful computational tool for predicting the structural classes of proteins.
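The amino acid composition that this abstract correlates with structural class is a 20-dimensional frequency vector. A minimal sketch of that representation follows; the function name and the toy sequence are hypothetical, and the resulting vector is what a classifier such as a Support Vector Machine would take as input.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes

def amino_acid_composition(sequence: str) -> list[float]:
    """Return the 20 residue frequencies of `sequence`, in AMINO_ACIDS order."""
    counts = Counter(r for r in sequence.upper() if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]

# Hypothetical toy sequence, for illustration only
composition = amino_acid_composition("MKVLAAGIVK")
```

The vector discards all sequence-order information, which is the limitation that pseudo amino acid composition was later introduced to address.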


Proteins | 2003

Predicting protein quaternary structure by pseudo amino acid composition

Kuo-Chen Chou; Yu-Dong Cai

In the protein universe, many proteins are composed of two or more polypeptide chains, generally referred to as subunits, that associate through noncovalent interactions and, occasionally, disulfide bonds. With the number of protein sequences entering into data banks rapidly increasing, we are confronted with a challenge: how to develop an automated method to identify the quaternary attribute for a new polypeptide chain (i.e., whether it is formed just as a monomer, or as a dimer, trimer, or any other oligomer). This is important, because the functions of proteins are closely related to their quaternary attribute. For example, some critical ligands only bind to dimers but not to monomers; some marvelous allosteric transitions only occur in tetramers but not other oligomers; and some ion channels are formed by tetramers, whereas others are formed by pentamers. To explore this problem, we adopted the pseudo amino acid composition originally proposed for improving the prediction of protein subcellular location (Chou, Proteins, 2001; 43:246–255). The advantage of using the pseudo amino acid composition to represent a protein is that it has paved a way that can take into account a considerable amount of sequence‐order effects to significantly improve prediction quality. Results obtained by resubstitution, jack‐knife, and independent data set tests, have indicated that the current approach might be quite promising in dealing with such an extremely complicated and difficult problem. Proteins 2003.
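A simplified sketch of a pseudo amino acid composition is given here to show how sequence-order effects enter the representation. It is not the exact formulation of the cited paper: as an assumption, it uses a single property (Kyte-Doolittle hydropathy) where the original combines several physicochemical properties, and the names `lam` and `weight` and their default values are illustrative choices.

```python
import statistics

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values, used as a stand-in property (assumption).
HYDROPATHY = {
    "A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8, "G": -0.4, "H": -3.2,
    "I": 4.5, "K": -3.9, "L": 3.8, "M": 1.9, "N": -3.5, "P": -1.6, "Q": -3.5,
    "R": -4.5, "S": -0.8, "T": -0.7, "V": 4.2, "W": -0.9, "Y": -1.3,
}

# Standardize the property to zero mean and unit standard deviation over the 20 residues.
_values = list(HYDROPATHY.values())
_mean = statistics.fmean(_values)
_std = statistics.pstdev(_values)
NORMALIZED = {aa: (v - _mean) / _std for aa, v in HYDROPATHY.items()}

def pseudo_amino_acid_composition(sequence, lam=3, weight=0.05):
    """Return 20 + lam components: residue frequencies followed by
    lam sequence-order correlation factors, jointly normalized to sum to 1."""
    seq = [aa for aa in sequence.upper() if aa in NORMALIZED]
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    freqs = [seq.count(aa) / n for aa in AMINO_ACIDS]
    # theta_k averages the squared property difference between residues k positions apart.
    thetas = [
        sum((NORMALIZED[seq[i + k]] - NORMALIZED[seq[i]]) ** 2 for i in range(n - k)) / (n - k)
        for k in range(1, lam + 1)
    ]
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]

# Hypothetical toy sequence, for illustration only
vector = pseudo_amino_acid_composition("MKVLAAGIVKDE", lam=3)
```

The first 20 components carry the conventional composition and the last `lam` components carry the order-dependent correlation factors, so two sequences with identical composition but different residue order yield different vectors.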


PLOS ONE | 2012

Identification of Colorectal Cancer Related Genes with mRMR and Shortest Path in Protein-Protein Interaction Network

Bi-Qing Li; Tao Huang; Lei Liu; Yu-Dong Cai; Kuo-Chen Chou

One of the most important and challenging problems in biomedicine and genomics is how to identify the disease genes. In this study, we developed a computational method to identify colorectal cancer-related genes based on (i) the gene expression profiles, and (ii) the shortest path analysis of functional protein association networks. The former has been used to select differentially expressed genes as disease genes for quite a long time, while the latter has been widely used to study the mechanism of diseases. With the existing protein-protein interaction data from STRING (Search Tool for the Retrieval of Interacting Genes), a weighted functional protein association network was constructed. By means of the mRMR (Maximum Relevance Minimum Redundancy) approach, six genes were identified that can distinguish the colorectal tumors and normal adjacent colonic tissues from their gene expression profiles. Meanwhile, according to the shortest path approach, we further found an additional 35 genes, of which some have been reported to be relevant to colorectal cancer and some are very likely to be relevant to it. Interestingly, the genes we identified from both the gene expression profiles and the functional protein association network include more cancer genes than the genes identified from the gene expression profiles alone. Besides, these genes also had greater functional similarity with the reported colorectal cancer genes than the genes identified from the gene expression profiles alone. All these indicate that our method as presented in this paper is quite promising. The method may become a useful tool, or at least play a complementary role to the existing method, for identifying colorectal cancer genes. It has not escaped our notice that the method can be applied to identify the genes of other diseases as well.
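The shortest path step can be sketched with Dijkstra's algorithm on a weighted network. This is an illustration under stated assumptions: the gene names and edge weights are hypothetical, and how the paper converts STRING scores into edge weights is not specified in this abstract. Genes lying on a shortest path between two seed genes are the kind of additional candidates the abstract refers to.

```python
import heapq

def shortest_path(graph, source, target):
    """Dijkstra's algorithm. Returns (total_weight, path), or (inf, []) if unreachable."""
    queue = [(0.0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return float("inf"), []

# Hypothetical undirected network: lower weight means a stronger association.
toy_network = {
    "G1": {"G2": 0.2, "G3": 0.9},
    "G2": {"G1": 0.2, "G4": 0.3},
    "G3": {"G1": 0.9, "G4": 0.1},
    "G4": {"G2": 0.3, "G3": 0.1},
}
cost, path = shortest_path(toy_network, "G1", "G3")
intermediate_genes = path[1:-1]  # candidates lying between the two seed genes
```

In this toy network the indirect route through G2 and G4 is cheaper than the direct G1-G3 edge, so both intermediate genes are reported as candidates.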


Journal of Computational Chemistry | 2002

Support vector machines for predicting HIV protease cleavage sites in protein

Yu-Dong Cai; Xiao-Jun Liu; Xue-biao Xu; Kuo-Chen Chou

Knowledge of the polyprotein cleavage sites by HIV protease will refine our understanding of its specificity, and the information thus acquired is useful for designing specific and efficient HIV protease inhibitors. The pace in searching for the proper inhibitors of HIV protease will be greatly expedited if one can find an accurate, robust, and rapid method for predicting the cleavage sites in proteins by HIV protease. In this article, a Support Vector Machine is applied to predict the cleavability of oligopeptides by proteases with multiple and extended specificity subsites. We selected HIV‐1 protease as the subject of the study. Two hundred ninety‐nine oligopeptides were chosen for the training set, while the other 63 oligopeptides were taken as a test set. Because of its high rate of self‐consistency (299/299=100%), a good result in the jackknife test (286/299=95%) and correct prediction rate (55/63 = 87%), it is expected that the Support Vector Machine method can be referred to as a useful assistant technique for finding effective inhibitors of HIV protease, which is one of the targets in designing potential drugs against AIDS. The principle of the Support Vector Machine method can also be applied to analyzing the specificity of other multisubsite enzymes.
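The success rates quoted in this abstract follow directly from the reported counts, and a short check makes the arithmetic explicit. The counts are taken from the abstract; the helper name is hypothetical.

```python
def success_rate(correct: int, total: int) -> float:
    """Percentage of correctly predicted oligopeptides."""
    return 100.0 * correct / total

# (correct, total) pairs as reported in the abstract
reported = {
    "self-consistency": (299, 299),
    "jackknife": (286, 299),
    "independent test set": (55, 63),
}
rates = {name: success_rate(c, t) for name, (c, t) in reported.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f}%")
```

To two decimals the jackknife and independent test rates are 95.65% and 87.30%, which the abstract reports as whole-number percentages.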


Journal of Cellular Biochemistry | 2003

Prediction and classification of protein subcellular location—sequence‐order effect and pseudo amino acid composition

Kuo-Chen Chou; Yu-Dong Cai

Given a protein sequence, how can its subcellular location be identified? With the rapid increase in newly found protein sequences entering databanks, the problem has become more and more important because the function of a protein is closely correlated with its localization. To deal with the challenge in practice, a dataset has been established that allows identification among the following 14 subcellular locations: (1) cell wall, (2) centriole, (3) chloroplast, (4) cytoplasm, (5) cytoskeleton, (6) endoplasmic reticulum, (7) extracellular, (8) Golgi apparatus, (9) lysosome, (10) mitochondria, (11) nucleus, (12) peroxisome, (13) plasma membrane, and (14) vacuole. Compared with the datasets constructed by previous investigators, the current one covers the largest range of localizations, and hence many proteins that were entirely out of the picture in previous treatments can now be investigated. Meanwhile, to enhance the potential and flexibility in taking into account the sequence‐order effect, the series‐mode pseudo‐amino‐acid‐composition has been introduced as a representation for a protein. High success rates are obtained by the re‐substitution test, jackknife test, and independent dataset test, respectively. It is anticipated that the current automated method can be developed into a high-throughput tool for practical use in both basic research and the pharmaceutical industry.


PLOS ONE | 2010

Analysis and Prediction of the Metabolic Stability of Proteins Based on Their Sequential Features, Subcellular Locations and Interaction Networks

Tao Huang; Xiao-He Shi; Ping Wang; Zhisong He; Kai-Yan Feng; Le-Le Hu; Xiangyin Kong; Yixue Li; Yu-Dong Cai; Kuo-Chen Chou

Metabolic stability is a very important characteristic of proteins that is related to their global flexibility, intramolecular fluctuations, various internal dynamic processes, as well as many marvelous biological functions. Determining a protein's metabolic stability would provide us with useful information for in-depth understanding of the dynamic action mechanisms of proteins. Although several experimental methods have been developed to measure the metabolic stability of proteins, they are time-consuming and expensive. Reported in this paper is a computational method, which is characterized by (1) integrating various properties of proteins, such as biochemical and physicochemical properties, subcellular locations, network properties and protein complex property, (2) using the mRMR (Maximum Relevance & Minimum Redundancy) principle and the IFS (Incremental Feature Selection) procedure to optimize the prediction engine, and (3) being able to classify proteins into four types: “short”, “medium”, “long”, and “extra-long” half-life spans. It was revealed through our analysis that the following seven features played major roles in determining the stability of proteins: (1) KEGG enrichment scores of the protein and its neighbors in the network, (2) subcellular locations, (3) polarity, (4) amino acid composition, (5) hydrophobicity, (6) secondary structure propensity, and (7) the number of protein complexes in which the protein is involved. It was observed that there was an intriguing correlation between the predicted metabolic stability of some proteins and the real half-life of the drugs designed to target them. These findings might provide useful insights for designing protein-stability-relevant drugs. The computational method can also be used as a large-scale tool for annotating the metabolic stability for the avalanche of protein sequences generated in the post-genomic age.
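The mRMR ranking step named in this abstract can be sketched as a greedy selection that rewards relevance to the class label and penalizes redundancy with features already chosen. This sketch assumes discrete feature values and the difference form of the criterion (relevance minus mean redundancy); the function names and toy data are hypothetical. The IFS procedure, not shown, would then evaluate a classifier on growing prefixes of this ranking.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

def mrmr_rank(features, target):
    """Greedy mRMR ranking: at each step pick the feature that maximizes
    relevance to `target` minus mean redundancy with the selected features."""
    remaining = list(features)
    selected = []

    def score(name):
        relevance = mutual_information(features[name], target)
        if not selected:
            return relevance
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance - redundancy

    while remaining:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical toy data: the target is "a OR b"; "a_copy" duplicates "a".
toy_features = {
    "a":      [0, 0, 1, 1, 0, 0, 1, 1],
    "a_copy": [0, 0, 1, 1, 0, 0, 1, 1],
    "b":      [0, 1, 0, 1, 0, 1, 0, 1],
}
toy_target = [0, 1, 1, 1, 0, 1, 1, 1]
ranking = mrmr_rank(toy_features, toy_target)
```

In the toy data the duplicated feature is as relevant as the original but fully redundant with it, so it is pushed to the end of the ranking behind the independent feature `b`.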

Collaboration


Top Co-Authors

Tao Huang (Chinese Academy of Sciences)
Lei Chen (Shanghai Maritime University)
Kai-Yan Feng (University of Manchester)
Xiangyin Kong (Chinese Academy of Sciences)
Yixue Li (Chinese Academy of Sciences)
Yu-Hang Zhang (Chinese Academy of Sciences)
Bi-Qing Li (Chinese Academy of Sciences)