
Publication


Featured research published by Xiao-He Shi.


PLOS ONE | 2010

Predicting drug-target interaction networks based on functional groups and biological features.

Zhisong He; Jian Zhang; Xiao-He Shi; Le-Le Hu; Xiangyin Kong; Yu-Dong Cai; Kuo-Chen Chou

Background: The study of drug-target interaction networks is an important topic in drug development. Determining compound-protein interactions or potential drug-target interactions by experiment alone is both time-consuming and costly. As a complement, in silico prediction methods can provide very useful information in a timely manner. Methods/Principal Findings: Drug compounds are encoded with functional groups, and proteins are encoded with biological features, including biochemical and physicochemical properties. Optimal features are selected with the mRMR (Maximum Relevance Minimum Redundancy) method. Instead of treating the proteins as a single family, target proteins are divided into four groups: enzymes, ion channels, G-protein-coupled receptors and nuclear receptors. Four independent predictors are then established using the Nearest Neighbor algorithm as their operation engine, each predicting the interactions between drugs and one of the four protein groups. The overall success rates achieved by the four predictors in jackknife cross-validation tests are 85.48%, 80.78%, 78.49%, and 85.66%, respectively. Conclusion/Significance: Our results indicate that the network prediction system thus established is quite promising and encouraging.
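The evaluation scheme named in this abstract, a Nearest Neighbor predictor scored by jackknife (leave-one-out) cross-validation, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the Euclidean distance, the function names and the toy data are all assumptions made for the example.

```python
import math

def nearest_neighbor_predict(train_x, train_y, query):
    """Return the label of the training vector closest to `query`.

    Euclidean distance is an assumption; the abstract does not name the metric.
    """
    best_label, best_dist = None, math.inf
    for x, y in zip(train_x, train_y):
        d = math.dist(x, query)
        if d < best_dist:
            best_label, best_dist = y, d
    return best_label

def jackknife_accuracy(features, labels):
    """Leave-one-out test: each sample is predicted from all the others."""
    correct = 0
    for i in range(len(features)):
        rest_x = features[:i] + features[i + 1:]
        rest_y = labels[:i] + labels[i + 1:]
        if nearest_neighbor_predict(rest_x, rest_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)

# Toy data (made up): two well-separated clusters of 2-D feature vectors.
toy_x = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
toy_y = [0, 0, 0, 1, 1, 1]
toy_accuracy = jackknife_accuracy(toy_x, toy_y)
```

In a setting like the one described, one such predictor would be trained per protein group, with each feature vector combining the drug and protein encodings.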


PLOS ONE | 2010

Analysis and Prediction of the Metabolic Stability of Proteins Based on Their Sequential Features, Subcellular Locations and Interaction Networks

Tao Huang; Xiao-He Shi; Ping Wang; Zhisong He; Kai-Yan Feng; Le-Le Hu; Xiangyin Kong; Yixue Li; Yu-Dong Cai; Kuo-Chen Chou

Metabolic stability is a very important property of proteins that is related to their global flexibility, intramolecular fluctuations, various internal dynamic processes, and many biological functions. Determining the metabolic stability of proteins would provide useful information for an in-depth understanding of their dynamic action mechanisms. Although several experimental methods have been developed to measure protein metabolic stability, they are time-consuming and expensive. Reported in this paper is a computational method that features (1) integrating various properties of proteins, such as biochemical and physicochemical properties, subcellular locations, network properties and protein complex property, (2) using the mRMR (Maximum Relevance & Minimum Redundancy) principle and the IFS (Incremental Feature Selection) procedure to optimize the prediction engine, and (3) being able to classify proteins into four types of half-life span: “short”, “medium”, “long”, and “extra-long”. Our analysis revealed that the following seven features played major roles in determining the stability of proteins: (1) KEGG enrichment scores of the protein and its neighbors in the network, (2) subcellular locations, (3) polarity, (4) amino acid composition, (5) hydrophobicity, (6) secondary structure propensity, and (7) the number of protein complexes in which the protein is involved. An intriguing correlation was observed between the predicted metabolic stability of some proteins and the real half-life of the drugs designed to target them. These findings might provide useful insights for designing protein-stability-relevant drugs. The computational method can also be used as a large-scale tool for annotating the metabolic stability of the avalanche of protein sequences generated in the post-genomic age.
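The mRMR principle mentioned above ranks features by trading relevance to the class label against redundancy with features already chosen. The sketch below is one common greedy formulation (relevance minus mean redundancy, both measured by mutual information on discrete values); it is an assumption that this variant matches the one used in the paper, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )

def mrmr_rank(columns, target):
    """Greedy mRMR ranking of feature columns.

    At each step, pick the remaining feature with the highest
    (relevance to target) - (mean redundancy with selected features).
    """
    relevance = [mutual_information(col, target) for col in columns]
    remaining = list(range(len(columns)))
    selected = []
    while remaining:
        def score(j):
            if not selected:
                return relevance[j]
            redundancy = sum(
                mutual_information(columns[j], columns[s]) for s in selected
            ) / len(selected)
            return relevance[j] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data (made up): feature 1 duplicates feature 0, feature 2 is a weaker
# but non-redundant signal, so it should be ranked ahead of the duplicate.
toy_target = [0, 0, 1, 1, 2, 2, 3, 3]
toy_columns = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 1, 0, 1, 1, 1],
]
toy_ranking = mrmr_rank(toy_columns, toy_target)
```

The ranking produced this way is the usual input to an incremental feature selection step, which decides how many of the top-ranked features to keep.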


PLOS ONE | 2011

Predicting Functions of Proteins in Mouse Based on Weighted Protein-Protein Interaction Network and Protein Hybrid Properties

Le-Le Hu; Tao Huang; Xiao-He Shi; Wencong Lu; Yu-Dong Cai; Kuo-Chen Chou

Background: With the huge number of uncharacterized protein sequences generated in the post-genomic age, it is highly desirable to develop effective computational methods for quickly and accurately predicting their functions. The information thus obtained would be very useful for both basic research and drug development. Methodology/Principal Findings: Although many efforts have been made in this regard, most of them were based on either sequence similarity or protein-protein interaction (PPI) information. However, the former often fails if a query protein has little or no sequence similarity to any protein of known function, while the latter has a similar problem if the relevant PPI information is not available. In view of this, a new approach is proposed that hybridizes the PPI information and the biochemical/physicochemical features of protein sequences. The overall first-order success rates of the new predictor for the functions of mouse proteins on the training set and test set were 69.1% and 70.2%, respectively, and the success rate covered by the results of the top-4 order from a total of 24 orders was 65.2%. Conclusions/Significance: The results indicate that the new approach is quite promising and may open a new avenue for addressing this difficult and complicated problem.


Amino Acids | 2012

Prediction of lysine ubiquitination with mRMR feature selection and analysis

Yu-Dong Cai; Tao Huang; Le-Le Hu; Xiao-He Shi; Lu Xie; Yixue Li

Ubiquitination, one of the most important post-translational modifications of proteins, occurs when ubiquitin (a small 76-amino acid protein) is attached to lysine on a target protein. It often commits the labeled protein to degradation and plays important roles in regulating many cellular processes implicated in a variety of diseases. Since ubiquitination is rapid and reversible, it is time-consuming and labor-intensive to identify ubiquitination sites using conventional experimental approaches. To efficiently discover lysine-ubiquitination sites, a sequence-based predictor of ubiquitination sites was developed based on the nearest neighbor algorithm. We used the maximum relevance and minimum redundancy principle to identify the key features and the incremental feature selection procedure to optimize the prediction engine. PSSM conservation scores, amino acid factors and disorder scores of the surrounding sequence formed the optimized 456 features. The Matthews correlation coefficient (MCC) of our ubiquitination site predictor was 0.142 in a jackknife cross-validation test on a large benchmark dataset. In an independent test, the MCC of our method was 0.139, higher than those of the existing ubiquitination site predictors UbiPred and UbPred, whose MCCs on the same test set were 0.135 and 0.117, respectively. Our analysis shows that the conservation of amino acids at and around lysine plays an important role in ubiquitination site prediction. Moreover, disorder and ubiquitination are strongly related. These findings might provide useful insights for studying the mechanisms of ubiquitination and modulating the ubiquitination pathway, potentially leading to therapeutic strategies in the future.
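For readers unfamiliar with the metric reported above, the Matthews correlation coefficient is computed from the four cells of a binary confusion matrix. The sketch below uses the standard formula; the function name and the convention of returning 0.0 when the denominator is zero are choices made for this example, and the counts are made up rather than taken from the paper.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when any marginal is empty (a common convention, assumed here).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Made-up counts: a perfect predictor, an inverted one, and a chance-level one.
mcc_perfect = matthews_corrcoef(tp=5, tn=5, fp=0, fn=0)
mcc_inverted = matthews_corrcoef(tp=0, tn=0, fp=5, fn=5)
mcc_chance = matthews_corrcoef(tp=1, tn=1, fp=1, fn=1)
```

MCC ranges from -1 to 1 and stays informative on imbalanced data, which is why values that look small in absolute terms can still separate competing site predictors.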


Biochimie | 2011

Prediction and analysis of protein palmitoylation sites.

Le-Le Hu; Si-Bao Wan; Shen Niu; Xiao-He Shi; Haipeng Li; Yu-Dong Cai; Kuo-Chen Chou

Palmitoylation is a universal and important lipid modification involved in a series of basic cellular processes, such as membrane trafficking, protein stability and protein aggregation. With the avalanche of new protein sequences generated in the post-genomic era, it is highly desirable to develop computational methods for rapidly and effectively identifying the potential palmitoylation sites of uncharacterized proteins, so as to provide timely information for revealing the mechanism of protein palmitoylation. Using the Incremental Feature Selection approach based on amino acid factors, conservation, disorder features, and specific features of the palmitoylation site, a new predictor named IFS-Palm was developed. The overall success rate achieved in a jackknife test on a newly constructed benchmark dataset was 90.65%. An in-depth analysis showed that palmitoylation was closely correlated with the features of the upstream residue directly adjacent to the cysteine site as well as with the conservation of cysteine. The protein disorder region might also play an important role in this post-translational modification. These findings may provide useful insights for revealing the mechanisms of palmitoylation.


Molecules | 2010

Analysis of Protein Pathway Networks Using Hybrid Properties

Lei Chen; Tao Huang; Xiao-He Shi; Yu-Dong Cai; Kuo-Chen Chou

Given a protein-forming system, i.e., a system consisting of a certain number of different proteins, can it form a biologically meaningful pathway? This is a fundamental problem in systems biology and proteomics. During the past decade, a vast amount of information on different organisms, at both the genetic and metabolic levels, has been accumulated and systematically stored in various specific databases, such as KEGG, ENZYME, BRENDA, EcoCyc and MetaCyc. These data have made it feasible to address this essential problem. In this paper, we have analyzed known regulatory pathways in humans by extracting different (biological and graphic) features from each of 17,069 protein-forming systems, of which 169 are positive pathways, i.e., known regulatory pathways taken from KEGG, while 16,900 are negative, i.e., they do not form a biologically meaningful pathway. Each of these protein-forming systems was represented by 352 features, of which 88 are graph features and 264 are biological features. To analyze these features, the “Minimum Redundancy Maximum Relevance” and “Incremental Feature Selection” techniques were used to select a set of 22 optimal features for deciding whether a protein-forming system is able to form a biologically meaningful pathway. Cross-validation showed that the overall success rate thus obtained in identifying the positive pathways was 79.88%. It is anticipated that this novel approach and its encouraging, although preliminary, result may stimulate extensive investigations into this important topic.
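The Incremental Feature Selection step described here can be sketched as follows: given features already ranked (for example by mRMR), evaluate the top-k features for every k and keep the prefix that scores best. This is an illustrative sketch only. The leave-one-out nearest-neighbor evaluator, the Euclidean distance, the function names and the toy data are assumptions, not details taken from the paper.

```python
import math

def loo_nn_accuracy(rows, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier."""
    hits = 0
    for i, query in enumerate(rows):
        nearest = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: math.dist(rows[j], query),
        )
        hits += labels[nearest] == labels[i]
    return hits / len(rows)

def incremental_feature_selection(rows, labels, ranked):
    """Score the top-k ranked features for k = 1..len(ranked).

    Returns the best-scoring prefix of `ranked` and the score for every k.
    """
    scores = []
    for k in range(1, len(ranked) + 1):
        subset = ranked[:k]
        projected = [tuple(row[j] for j in subset) for row in rows]
        scores.append(loo_nn_accuracy(projected, labels))
    best_k = scores.index(max(scores)) + 1
    return ranked[:best_k], scores

# Toy data (made up): features 0 and 1 together separate the classes,
# while feature 2 is large-scale noise that hurts the classifier.
toy_rows = [
    (0.0, 0.0, 0.0),
    (0.6, 0.0, 9.0),
    (0.5, 1.0, 0.1),
    (1.0, 1.0, 9.1),
]
toy_labels = [0, 0, 1, 1]
toy_best, toy_scores = incremental_feature_selection(toy_rows, toy_labels, [0, 1, 2])
```

On the toy data the score peaks at two features and drops once the noisy third feature is added, which is the kind of curve used to pick an optimal feature count such as the 22 reported above.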


Journal of Proteome Research | 2009

Identifying Protein Complexes Using Hybrid Properties

Lei Chen; Xiao-He Shi; Xiangyin Kong; Zhenbing Zeng; Yu-Dong Cai

Protein complexes, which integrate multiple gene products, perform all sorts of fundamental biological functions in cells. Much effort has been put into identifying protein complexes using computational approaches. The vast majority of these approaches search for densely connected regions in the protein-protein interaction (PPI) network/graph. In this research, we try an alternative approach that analyzes protein complexes using hybrid features, and we present a method to determine whether multiple (more than two) proteins from yeast can form a protein complex. The data set consists of 493 positive protein complexes and 9878 negative protein complexes. Every complex is represented by graph features, where proteins in the complex form a graph (web) of interactions, and by features derived from biological properties, including protein length, biochemical properties and physicochemical properties. These features are filtered and optimized by the Minimum Redundancy Maximum Relevance method, Incremental Feature Selection and Forward Feature Selection, with the Nearest Neighbor Algorithm as the prediction/identification model. A jackknife cross-validation test is employed to evaluate the identification accuracy. The highest accuracy for identifying real protein complexes using the filtered features is 69.17%, and feature analysis shows that, among the adopted features, graph features play the main role in determining protein complexes.


PLOS ONE | 2010

Prediction and Analysis of Protein Hydroxyproline and Hydroxylysine

Le-Le Hu; Shen Niu; Tao Huang; Kai Wang; Xiao-He Shi; Yu-Dong Cai

Background: Hydroxylation is an important post-translational modification and is closely related to various diseases. Besides biotechnology experiments, in silico prediction methods are an alternative way to identify potential hydroxylation sites. Methodology/Principal Findings: In this study, we developed a novel sequence-based method for identifying the two main types of hydroxylation sites: hydroxyproline and hydroxylysine. First, feature selection was performed on three kinds of features: amino acid indices (AAindex), which include various physicochemical and biochemical properties of amino acids; Position-Specific Scoring Matrices (PSSM), which represent the evolutionary information of amino acids; and the structural disorder of amino acids in a sliding window 13 amino acids long. The prediction models were then built using the incremental feature selection method. The prediction accuracies, evaluated by jackknife cross-validation, are 76.0% on the hydroxyproline dataset and 82.1% on the hydroxylysine dataset. Feature analysis suggested that the physicochemical and biochemical properties and the evolutionary information of amino acids contribute much to the identification of protein hydroxylation sites, while structural disorder had little relation to protein hydroxylation. It was also found that the amino acid adjacent to the hydroxylation site tends to exert more influence on hydroxylation determination than other sites. Conclusions/Significance: These findings may provide useful insights for exploiting the mechanisms of hydroxylation.
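The abstract mentions a sliding window 13 amino acids long. A minimal sketch of how candidate sites might be cut out of a sequence with such a window is shown below. It assumes the window is centered on the candidate residue and that positions beyond the sequence ends are padded with a placeholder character; neither detail is stated in the abstract, and the function name and example sequence are hypothetical.

```python
def site_windows(sequence, residue, width=13, pad="X"):
    """Return (1-based position, window) for every occurrence of `residue`.

    The window is centered on the residue; `pad` fills positions that fall
    outside the sequence (an assumption made for this sketch).
    """
    half = width // 2
    padded = pad * half + sequence + pad * half
    return [
        (i + 1, padded[i:i + width])
        for i, aa in enumerate(sequence)
        if aa == residue
    ]

# Made-up peptide: collect windows around every proline (P).
toy_windows = site_windows("MKPGAPLKP", "P")
```

Each window would then be turned into a numeric feature vector, for example by looking up AAindex, PSSM and disorder values for every position in the window.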


Biopolymers | 2011

Prediction and analysis of protein methylarginine and methyllysine based on Multisequence features

Le-Le Hu; Zhen Li; Kai Wang; Shen Niu; Xiao-He Shi; Yu-Dong Cai; Haipeng Li

Protein methylation, one of the most important post-translational modifications, typically takes place on arginine or lysine residues. This reversible modification is involved in a series of basic cellular processes. Identifying methylated proteins and their sites will facilitate the understanding of the molecular mechanism of methylation. Besides experimental methods, computational prediction of methylated sites is highly desirable for its convenience and speed. Here, we propose a method dedicated to predicting the methylated sites of proteins. Feature selection was performed on sequence conservation, physicochemical/biochemical properties, and structural disorder by applying the maximum relevance minimum redundancy and incremental feature selection methods. The prediction models were built with the nearest neighbor algorithm and evaluated by jackknife cross-validation. We built 11 and 9 predictors for methylarginine and methyllysine, respectively, and integrated them to predict methylated sites. The average prediction accuracies are 74.25% and 77.02% for the methylarginine and methyllysine training sets, respectively. Feature analysis suggested that evolutionary information and physicochemical/biochemical properties play important roles in the recognition of methylated sites. These findings may provide valuable information for exploiting the mechanisms of methylation. Our method may serve as a useful tool for biologists to find the potential methylated sites of proteins.


Protein and Peptide Letters | 2013

A Sequence-based Approach for Predicting Protein Disordered Regions

Tao Huang; Zhisong He; Weiren Cui; Yu-Dong Cai; Xiao-He Shi; Le-Le Hu; Kuo-Chen Chou

Protein disordered regions are associated with critical cellular functions such as transcriptional regulation, translation and cellular signal transduction, and they are responsible for various diseases. Although experimental methods have been developed to determine these regions, they are time-consuming and expensive. It is therefore highly desirable to develop computational methods that can provide this kind of information rapidly and inexpensively. Here we propose a sequence-based computational approach for predicting protein disordered regions by means of the Nearest Neighbor algorithm, in which the conservation, amino acid factors and secondary structure status of each amino acid in a fixed-length sliding window are taken as the encoding features. Feature selection based on mRMR (Maximum Relevance Minimum Redundancy) is applied to obtain an optimal 51-feature set that includes 39 conservation features and 12 secondary structure features. With the optimal 51 features, our predictor yielded quite promising MCC (Matthews correlation coefficient) values: 0.371 on a rigorous benchmark dataset tested by 5-fold cross-validation and 0.219 on an independent test dataset. Our results suggest that conservation and secondary structure play important roles in intrinsically disordered proteins.

Collaboration


Dive into Xiao-He Shi's collaborations.

Top Co-Authors

Tao Huang
Chinese Academy of Sciences

Zhisong He
CAS-MPG Partner Institute for Computational Biology

Haipeng Li
CAS-MPG Partner Institute for Computational Biology

Xiangyin Kong
Chinese Academy of Sciences

Yixue Li
Chinese Academy of Sciences