Publication


Featured research published by Chin-Sheng Yu.


Proteins | 2006

Prediction of protein subcellular localization

Chin-Sheng Yu; Yu-Ching Chen; Chih-Hao Lu; Jenn-Kang Hwang

Because a protein's function is usually related to its subcellular localization, the ability to predict subcellular localization directly from protein sequences will be useful for inferring protein functions. Recent years have seen a surging interest in the development of novel computational tools to predict subcellular localization. At present, these approaches, based on a wide range of algorithms, have achieved varying degrees of success for specific organisms and for certain localization categories. A number of authors have noticed that sequence similarity is useful in predicting subcellular localization. For example, Nair and Rost (Protein Sci 2002;11:2836–2847) have carried out extensive analysis of the relation between sequence similarity and identity in subcellular localization, and have found a close relationship between them above a certain similarity threshold. However, many existing benchmark data sets used for the prediction accuracy assessment contain highly homologous sequences, with some data sets comprising sequences of up to 80–90% sequence identity. Using these benchmark test data will surely lead to overestimation of the performance of the methods considered. Here, we develop an approach based on a two‐level support vector machine (SVM) system: the first level comprises a number of SVM classifiers, each based on a specific type of feature vector derived from sequences; the second‐level SVM classifier functions as the jury machine to generate the probability distribution of decisions for possible localizations. We compare our approach with a global sequence alignment approach and other existing approaches for two benchmark data sets, one comprising prokaryotic sequences and the other eukaryotic sequences. Furthermore, we carried out all‐against‐all sequence alignment for several data sets to investigate the relationship between sequence homology and subcellular localization.
Our results, which are consistent with previous studies, indicate that the homology search approach performs well down to 30% sequence identity, although its performance deteriorates considerably for sequences sharing lower sequence identity. A data set of high homology levels will undoubtedly lead to biased assessment of the performances of the predictive approaches—especially those relying on homology search or sequence annotations. Our two‐level classification system based on SVM does not rely on homology search; therefore, its performance remains relatively unaffected by sequence homology. When compared with other approaches, our approach performed significantly better. Furthermore, we also develop a practical hybrid method, which combines the two‐level SVM classifier and the homology search method, as a general tool for the sequence annotation of subcellular localization. Proteins 2006.
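The hybrid idea described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the two methods are combined, so the rule below (trust the homolog's annotation when the best hit reaches a sequence-identity cutoff, otherwise fall back to the classifier) is an assumption, as are the function name `hybrid_predict`, its arguments, and the use of 30% as the switch point. The 30% figure is taken only from the abstract's observation that homology search performs well down to that identity level.

```python
def hybrid_predict(best_hit, classifier_prediction, identity_cutoff=30.0):
    """Choose between a homology-based and a classifier-based localization.

    best_hit: (percent_identity, localization) for the closest annotated
        homolog, or None when the search found nothing (hypothetical format).
    classifier_prediction: localization from a sequence-based classifier.
    identity_cutoff: assumed switch point, in percent identity.
    Returns (localization, source_of_prediction).
    """
    if best_hit is not None and best_hit[0] >= identity_cutoff:
        return best_hit[1], "homology"
    return classifier_prediction, "classifier"


if __name__ == "__main__":
    print(hybrid_predict((62.5, "periplasm"), "cytoplasm"))
    print(hybrid_predict((18.0, "periplasm"), "cytoplasm"))
    print(hybrid_predict(None, "cytoplasm"))
```

Keeping the cutoff as a parameter makes it easy to test how sensitive the combined prediction is to the identity level at which homology is trusted.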


Protein Science | 2004

Predicting subcellular localization of proteins for Gram‐negative bacteria by support vector machines based on n‐peptide compositions

Chin-Sheng Yu; Chih-Jen Lin; Jenn-Kang Hwang

Gram‐negative bacteria have five major subcellular localization sites: the cytoplasm, the periplasm, the inner membrane, the outer membrane, and the extracellular space. The subcellular location of a protein can provide valuable information about its function. With the rapid increase of sequenced genomic data, the need for an automated and accurate tool to predict subcellular localization becomes increasingly important. We present an approach to predict subcellular localization for Gram‐negative bacteria. This method uses support vector machines trained on multiple feature vectors based on n‐peptide compositions. For a standard data set comprising 1443 proteins, the overall prediction accuracy reaches 89%, which, to the best of our knowledge, is the highest prediction rate ever reported. Our prediction accuracy is 14% higher than that of the recently developed multimodular PSORT‐B. Because of its simplicity, this approach can be easily extended to other organisms and should be a useful tool for the high‐throughput and large‐scale analysis of proteomic and genomic data.
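To make the term "n-peptide composition" concrete, here is a minimal sketch of one plausible way to turn a sequence into such a feature vector: count every overlapping window of n residues and normalize by the number of windows. The abstract does not give the exact coding scheme, so the function names, the handling of non-standard residues, and the toy sequence are assumptions for illustration only.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def n_peptide_composition(sequence, n):
    """Return the relative frequency of every possible n-peptide.

    Overlapping windows of length n are counted; windows that contain a
    non-standard residue are skipped (an assumption of this sketch).
    """
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    counts = dict.fromkeys(keys, 0)
    total = 0
    for i in range(len(sequence) - n + 1):
        window = sequence[i:i + n]
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        return {k: 0.0 for k in keys}
    return {k: c / total for k, c in counts.items()}


def feature_vector(sequence, n):
    """Flatten the composition into a fixed-order list of length 20**n."""
    comp = n_peptide_composition(sequence, n)
    return [comp[k] for k in sorted(comp)]


if __name__ == "__main__":
    seq = "MKKLLPTAAAGLLLLAAQPAMA"  # hypothetical toy sequence
    print(len(feature_vector(seq, 1)), len(feature_vector(seq, 2)))
```

Because the vector length grows as 20 to the power n, amino acid (n = 1) and dipeptide (n = 2) compositions stay small, while larger n quickly becomes sparse for typical protein lengths.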


PLOS ONE | 2014

CELLO2GO: a web server for protein subCELlular LOcalization prediction with functional gene ontology annotation.

Chin-Sheng Yu; Chih-Wen Cheng; Wen-Chi Su; Kuei-Chung Chang; Shao-Wei Huang; Jenn-Kang Hwang; Chih-Hao Lu

CELLO2GO (http://cello.life.nctu.edu.tw/cello2go/) is a publicly available, web-based system for screening various properties of a targeted protein and its subcellular localization. Herein, we describe how this platform is used to obtain brief or detailed gene ontology (GO)-type categories, including subcellular localization(s), for the queried proteins by combining the CELLO localization-predicting and BLAST homology-searching approaches. Given a query protein sequence, CELLO2GO uses BLAST to search for homologous sequences that are GO annotated in an in-house database derived from the UniProt KnowledgeBase database. At the same time, CELLO attempts to predict at least one subcellular localization on the basis of the species in which the protein is found. When homologs for the query sequence have been identified, the number of terms found for each of their GO categories, i.e., cellular compartment, molecular function, and biological process, is summed and presented as pie charts representing possible functional annotations for the queried protein. Although the experimental subcellular localization of a protein may not be known, and thus not annotated, CELLO can confidently suggest a subcellular localization. CELLO2GO should be a useful tool for research involving complex subcellular systems because it combines CELLO and BLAST into one platform and its output is easily manipulated so that user-specific questions may be readily addressed.
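The term-summing step described in this abstract can be sketched as follows. This is not CELLO2GO's code: the input format (one dictionary of GO terms per homolog), the function names, and the example annotations are hypothetical, and the category keys use the Gene Ontology's own naming, where the first category is called "cellular component".

```python
from collections import Counter

# GO's three top-level categories (the abstract calls the first one
# "cellular compartment").
CATEGORIES = ("cellular_component", "molecular_function", "biological_process")


def summarize_go_terms(homolog_annotations):
    """Sum GO term occurrences per category over all homologs.

    homolog_annotations: list of dicts mapping a category name to the list
    of GO terms annotated on one homolog (hypothetical input format).
    """
    summary = {cat: Counter() for cat in CATEGORIES}
    for annotation in homolog_annotations:
        for cat in CATEGORIES:
            summary[cat].update(annotation.get(cat, []))
    return summary


def pie_fractions(counter):
    """Convert term counts into the fractions a pie chart would display."""
    total = sum(counter.values())
    if total == 0:
        return {}
    return {term: n / total for term, n in counter.items()}


if __name__ == "__main__":
    homologs = [  # made-up annotations for illustration
        {"cellular_component": ["cytoplasm"], "molecular_function": ["ATP binding"]},
        {"cellular_component": ["cytoplasm", "membrane"]},
    ]
    summary = summarize_go_terms(homologs)
    print(pie_fractions(summary["cellular_component"]))
```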


PLOS ONE | 2012

Prediction of Metal Ion-Binding Sites in Proteins Using the Fragment Transformation Method

Chih-Hao Lu; Yu-Feng Lin; Jau-Ji Lin; Chin-Sheng Yu

The structure of a protein determines its function and its interactions with other factors. Regions of proteins that interact with ligands, substrates, and/or other proteins tend to be conserved both in sequence and structure, and the residues involved are usually in close spatial proximity. More than 70,000 protein structures are currently found in the Protein Data Bank, and approximately one-third contain metal ions essential for function. Identifying and characterizing metal ion–binding sites experimentally is time-consuming and costly. Many computational methods have been developed to identify metal ion–binding sites, and most use only sequence information. For the work reported herein, we developed a method that uses sequence and structural information to predict the residues in metal ion–binding sites. Six types of metal ion–binding templates (those involving Ca2+, Cu2+, Fe3+, Mg2+, Mn2+, and Zn2+) were constructed using the residues within 3.5 Å of the center of the metal ion. Using the fragment transformation method, we then compared known metal ion–binding sites with the templates to assess the accuracy of our method. Our method achieved an overall accuracy of 94.6% with a true positive rate of 60.5% at a 5% false positive rate and therefore constitutes a significant improvement in metal-binding site prediction.
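The 3.5 Å criterion used to build the templates can be illustrated with a short distance filter. This sketch assumes that a residue counts as part of the site when any of its atoms lies within the cutoff of the metal ion; the abstract does not say which atoms are measured, and the function name, input format, and coordinates below are made up for illustration.

```python
import math


def residues_near_metal(atoms, metal_xyz, cutoff=3.5):
    """Return the sorted ids of residues with an atom within `cutoff` Å.

    atoms: iterable of (residue_id, (x, y, z)) pairs, coordinates in Å
        (hypothetical input format).
    metal_xyz: (x, y, z) of the metal ion center.
    """
    near = set()
    for residue_id, xyz in atoms:
        if math.dist(xyz, metal_xyz) <= cutoff:
            near.add(residue_id)
    return sorted(near)


if __name__ == "__main__":
    atoms = [  # made-up coordinates
        ("HIS94", (0.0, 0.0, 2.1)),
        ("HIS96", (2.0, 0.0, 0.0)),
        ("GLU117", (5.0, 5.0, 5.0)),
        ("HIS94", (0.0, 0.0, 6.0)),
    ]
    print(residues_near_metal(atoms, (0.0, 0.0, 0.0)))
```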


Proteins | 2003

Fine-grained protein fold assignment by support vector machines using generalized n-peptide coding schemes and jury voting from multiple-parameter sets

Chin-Sheng Yu; Jung-Ying Wang; Jinn-Moon Yang; Ping-Chiang Lyu; Chih-Jen Lin; Jenn-Kang Hwang

In the coarse‐grained fold assignment of major protein classes, such as all‐α, all‐β, α + β, α/β proteins, one can easily achieve high prediction accuracy from primary amino acid sequences. However, the fine‐grained assignment of folds, such as those defined in the Structural Classification of Proteins (SCOP) database, presents a challenge due to the larger number of folds available. A recent study yielded a reasonable prediction accuracy of 56.0% on an independent set of the 27 most populated folds. In this communication, we apply the support vector machine (SVM) method, using a combination of protein descriptors based on properties derived from n‐peptide composition and jury voting, to fine‐grained fold prediction, and are able to achieve an overall prediction accuracy of 69.6% on the same independent set, significantly higher than the previous results. On 10‐fold cross‐validation, we obtained a prediction accuracy of 65.3%. Our results show that SVM coupled with suitable global sequence‐coding schemes can significantly improve the fine‐grained fold prediction. Our approach should be useful in structure prediction and modeling. Proteins 2003;50:531–536.
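Jury voting, as mentioned in this abstract, combines the fold labels proposed by several classifiers into one decision. The abstract does not state the voting rule, so the sketch below assumes a plain majority vote with ties broken in favor of the label that appears first; the function name and the example labels are hypothetical.

```python
from collections import Counter


def jury_vote(predictions):
    """Return the most frequent label; ties go to the label seen first.

    predictions: list of labels, one per classifier (for example, one per
    feature-coding scheme or parameter set).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:  # first-seen order breaks ties
        if counts[label] == best:
            return label


if __name__ == "__main__":
    votes = ["TIM barrel", "globin-like", "TIM barrel"]  # made-up labels
    print(jury_vote(votes))
```

A weighted vote, or a second-level classifier trained on the first-level outputs, would be a natural alternative to this simple rule.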


Proteins | 2007

Predicting disulfide connectivity patterns

Chih-Hao Lu; Yu-Ching Chen; Chin-Sheng Yu; Jenn-Kang Hwang

Disulfide bonds play an important role in stabilizing protein structure and regulating protein function. Therefore, the ability to infer disulfide connectivity from protein sequences will be valuable in structural modeling and functional analysis. However, predicting disulfide connectivity directly from sequences presents a challenge to computational biologists due to the nonlocal nature of disulfide bonds, i.e., the close spatial proximity of the cysteine pair that forms the disulfide bond does not necessarily imply a short sequence separation of the cysteine residues. Recently, Chen and Hwang (Proteins 2005;61:507–512) treated this problem as a multiple class classification by defining each distinct disulfide pattern as a class. They used multiple support vector machines based on a variety of sequence features to predict the disulfide patterns. Their results compare favorably with those in the literature for a benchmark dataset sharing less than 30% sequence identity. However, since the number of disulfide patterns grows rapidly when the number of disulfide bonds increases, their method performs unsatisfactorily for proteins with a large number of disulfide bonds. In this work, we propose a novel method to represent disulfide connectivity in terms of cysteine pairs, instead of disulfide patterns. Since the number of bonding states of the cysteine pairs is independent of that of disulfide bonds, the problem of class explosion is avoided. The bonding states of the cysteine pairs are predicted using support vector machines together with genetic algorithm optimization for feature selection. The complete disulfide patterns are then determined from the connectivity matrices that are constructed from the predicted bonding states of the cysteine pairs. Our approach outperforms the current approaches in the literature. Proteins 2007.
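The class explosion mentioned in this abstract can be checked with a short calculation. Assuming every cysteine is bonded, B disulfide bonds connect 2B cysteines, and the number of distinct connectivity patterns is (2B)! / (B! 2^B), while the number of cysteine pairs is only C(2B, 2) and each pair has just two states (bonded or not). The sketch below also picks a complete pattern from per-pair scores by exhaustive search; that search is a stand-in chosen for illustration, not the procedure used in the paper, and all names and scores are hypothetical.

```python
from math import comb, factorial


def disulfide_pattern_count(bonds):
    """Number of ways to pair 2*bonds cysteines into `bonds` disulfides."""
    return factorial(2 * bonds) // (factorial(bonds) * 2 ** bonds)


def cysteine_pair_count(bonds):
    """Number of candidate cysteine pairs among 2*bonds cysteines."""
    return comb(2 * bonds, 2)


def enumerate_patterns(cysteines):
    """Yield every complete pairing of an even-length, sorted index list."""
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], cysteines[1:]
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for pattern in enumerate_patterns(remaining):
            yield [(first, partner)] + pattern


def best_pattern(pair_scores, n_cysteines):
    """Pick the pairing with the highest summed pair score.

    pair_scores: dict mapping (i, j) with i < j to a predicted bonding
    score (hypothetical input). Exhaustive, so only for small n_cysteines.
    """
    return max(
        enumerate_patterns(list(range(n_cysteines))),
        key=lambda pattern: sum(pair_scores[pair] for pair in pattern),
    )


if __name__ == "__main__":
    for b in range(2, 6):
        print(b, disulfide_pattern_count(b), cysteine_pair_count(b))
```

For five bonds there are 945 possible patterns but only 45 cysteine pairs to score, which is why a pair-level formulation keeps the classification problem small.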


PLOS ONE | 2011

Identification of Antifreeze Proteins and Their Functional Residues by Support Vector Machine and Genetic Algorithms based on n-Peptide Compositions

Chin-Sheng Yu; Chih-Hao Lu

For the first time, multiple sets of n-peptide compositions from antifreeze protein (AFP) sequences of various cold-adapted fish and insects were analyzed using support vector machine and genetic algorithms. The identification of AFPs is difficult because they exist as evolutionarily divergent types, and because their sequences and structures are present in limited numbers in currently available databases. Our results reveal that it is feasible to identify the shared sequential features among the various structural types of AFPs. Moreover, we were able to identify residues involved in ice binding without requiring knowledge of the three-dimensional structures of these AFPs. This approach should be useful for genomic and proteomic studies involving cold-adapted organisms.


Journal of Chemical Information and Modeling | 2016

MIB: Metal Ion-Binding Site Prediction and Docking Server

Yu-Feng Lin; Chih-Wen Cheng; Chung-Shiuan Shih; Jenn-Kang Hwang; Chin-Sheng Yu; Chih-Hao Lu

The structure of a protein determines its biological function(s) and its interactions with other factors; the binding regions tend to be conserved in sequence and structure, and the interacting residues involved are usually close together in 3D space. The Protein Data Bank currently contains more than 110 000 protein structures, approximately one-third of which contain metal ions. Identifying and characterizing metal ion-binding sites is thus essential for investigating a protein's function(s) and interactions. However, experimental approaches are time-consuming and costly. The web server reported here was built to predict metal ion-binding residues and to generate the predicted metal ion-bound 3D structure. Binding templates have been constructed for regions that bind 12 types of metal ions. The templates include residues within 3.5 Å of the metal ion, and the fragment transformation method was used for structural comparison between query proteins and templates without any data training. With scoring functions based on the similarity of structure and binding residues, binding-residue prediction is supported for twelve kinds of metal ions (Ca2+, Cu2+, Fe3+, Mg2+, Mn2+, Zn2+, Cd2+, Fe2+, Ni2+, Hg2+, Co2+, and Cu+). MIB also provides metal ion docking after prediction. The MIB server is available at http://bioinfo.cmu.edu.tw/MIB/ .


Proteins | 2006

The fragment transformation method to detect the protein structural motifs.

Chih-Hao Lu; Yeong-Shin Lin; Yu-Ching Chen; Chin-Sheng Yu; Shi-Yu Chang; Jenn-Kang Hwang

Identifying functional structural motifs in protein structures of unknown function has become increasingly important in recent years due to the progress of the structural genomics initiatives. Although certain structural patterns such as the Asp‐His‐Ser catalytic triad are easy to detect because of their conserved residues and stringently constrained geometry, it is usually more challenging to detect general structural motifs such as the ββα‐metal binding motif, which has a much more variable conformation and sequence. At present, the identification of these motifs usually relies on manual procedures based on different structure and sequence analysis tools. In this study, we develop a structural alignment algorithm combining both structural and sequence information to identify the local structure motifs. We applied our method to the following examples: the ββα‐metal binding motif and the treble clef motif. The ββα‐metal binding motif plays an important role in nonspecific DNA interactions and cleavage in host defense and apoptosis. The treble clef motif is a zinc‐binding motif adaptable to diverse functions such as the binding of nucleic acid and hydrolysis of phosphodiester bonds. Our results are encouraging, indicating that we can effectively identify these structural motifs in an automatic fashion. Our method may provide a useful means for automatic functional annotation through detecting structural motifs associated with particular functions. Proteins 2006.


BioMed Research International | 2014

EXIA2: Web Server of Accurate and Rapid Protein Catalytic Residue Prediction

Chih-Hao Lu; Chin-Sheng Yu; Yu-Tung Chien; Shao-Wei Huang

We propose a method (EXIA2) of catalytic residue prediction based on protein structure without needing homology information. The method is based on the special side chain orientation of catalytic residues. We found that the side chain of catalytic residues usually points to the center of the catalytic site. The special orientation is usually observed in catalytic residues but not in noncatalytic residues, which usually have random side chain orientation. When combined with PSI-Blast sequence conservation, the method is shown to be currently the most accurate catalytic residue prediction method. It performs better than other competing methods on several benchmark datasets that include over 1,200 enzyme structures. The areas under the ROC curve (AUC) on these benchmark datasets are in the range from 0.934 to 0.968.
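One simple way to quantify whether a side chain "points to the center of the catalytic site" is the cosine of the angle between the side chain direction and the direction toward that center. This is only a sketch of the geometric idea; the abstract does not give EXIA2's actual scoring function, and the choice of the Cα atom as origin, a single side chain reference point, and all names and coordinates below are assumptions.

```python
import math


def orientation_cosine(ca_xyz, side_chain_xyz, site_center_xyz):
    """Cosine of the angle between CA->side chain and CA->site center.

    A value near 1 means the side chain points toward the site center,
    near -1 means it points away, and near 0 means it is perpendicular.
    """
    v = [s - c for s, c in zip(side_chain_xyz, ca_xyz)]
    w = [t - c for t, c in zip(site_center_xyz, ca_xyz)]
    norm_v = math.hypot(*v)
    norm_w = math.hypot(*w)
    if norm_v == 0 or norm_w == 0:
        raise ValueError("degenerate vector: two of the points coincide")
    return sum(a * b for a, b in zip(v, w)) / (norm_v * norm_w)


if __name__ == "__main__":
    # Made-up coordinates in Å: the side chain points straight at the center.
    print(orientation_cosine((0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (6.0, 0.0, 0.0)))
```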

Collaboration


Dive into Chin-Sheng Yu's collaboration.

Top Co-Authors

Chih-Hao Lu, National Chiao Tung University

Jenn-Kang Hwang, National Chiao Tung University

Yu-Ching Chen, National Chiao Tung University

Chih-Jen Lin, National Taiwan University

Chih-Wen Cheng, National Chiao Tung University

Shao-Wei Huang, National Chiao Tung University

Yu-Feng Lin, National Chiao Tung University

Jau-Ji Lin, National Chiao Tung University

Jinn-Moon Yang, National Chiao Tung University

Jung-Ying Wang, National Taiwan University