
Chih-Hao Lu

National Chiao Tung University



Publication

Featured research published by Chih-Hao Lu.

Proteins | 2006

Prediction of protein subcellular localization

Chin-Sheng Yu; Yu-Ching Chen; Chih-Hao Lu; Jenn-Kang Hwang

Because a protein's function is usually related to its subcellular localization, the ability to predict subcellular localization directly from protein sequences will be useful for inferring protein functions. Recent years have seen a surge of interest in the development of novel computational tools to predict subcellular localization. At present, these approaches, based on a wide range of algorithms, have achieved varying degrees of success for specific organisms and for certain localization categories. A number of authors have noticed that sequence similarity is useful in predicting subcellular localization. For example, Nair and Rost (Protein Sci 2002;11:2836–2847) carried out an extensive analysis of the relation between sequence similarity and identity of subcellular localization, and found that the two are closely related above a certain similarity threshold. However, many existing benchmark data sets used to assess prediction accuracy contain highly homologous sequences; some data sets comprise sequences with up to 80–90% sequence identity. Using these benchmark data will lead to an overestimate of the performance of the methods considered. Here, we develop an approach based on a two‐level support vector machine (SVM) system: the first level comprises a number of SVM classifiers, each based on a specific type of feature vector derived from sequences; the second-level SVM classifier functions as a jury machine that generates the probability distribution of decisions over possible localizations. We compare our approach with a global sequence alignment approach and other existing approaches on two benchmark data sets, one comprising prokaryotic sequences and the other eukaryotic sequences. Furthermore, we carried out all‐against‐all sequence alignment for several data sets to investigate the relationship between sequence homology and subcellular localization.
Our results, which are consistent with previous studies, indicate that the homology search approach performs well down to 30% sequence identity, although its performance deteriorates considerably for sequences sharing lower sequence identity. A data set with high levels of homology will undoubtedly lead to a biased assessment of the performance of predictive approaches, especially those relying on homology search or sequence annotations. Our two‐level classification system based on SVM does not rely on homology search; therefore, its performance remains relatively unaffected by sequence homology. When compared with other approaches, our approach performed significantly better. We also develop a practical hybrid method, which combines the two‐level SVM classifier and the homology search method, as a general tool for the sequence annotation of subcellular localization. Proteins 2006.
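The abstract's point about homologous benchmark sets can be made concrete with a minimal Python sketch. It assumes the sequences have already been aligned to a common length (for example, rows of one multiple alignment, with '-' marking gaps). `percent_identity` and `reduce_redundancy` are hypothetical helpers, not part of the authors' software, and identity is counted over gap-free columns, which is only one of several common definitions. The greedy filter keeps a sequence only if it shares at most 30% identity with every sequence already kept, the threshold discussed above.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences ('-' = gap).

    Only columns where neither sequence has a gap are counted (an assumption).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    aligned = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not aligned:
        return 0.0
    matches = sum(1 for x, y in aligned if x == y)
    return 100.0 * matches / len(aligned)


def reduce_redundancy(aligned_seqs: dict, max_identity: float = 30.0) -> list:
    """Greedily keep sequences so that no kept pair exceeds max_identity percent."""
    kept = []
    for name, seq in aligned_seqs.items():
        if all(percent_identity(seq, aligned_seqs[k]) <= max_identity for k in kept):
            kept.append(name)
    return kept


# Hypothetical aligned toy sequences.
msa = {
    "p1": "MKTAYIAK",
    "p2": "MKTAYIAR",  # differs from p1 at one of eight columns (87.5% identity)
    "p3": "GSHLE-QW",  # shares no identical gap-free column with p1
}
print(reduce_redundancy(msa))  # ['p1', 'p3']
```

Because the filter is greedy, the retained set depends on the order in which sequences are visited.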

PLOS ONE | 2014

CELLO2GO: a web server for protein subCELlular LOcalization prediction with functional gene ontology annotation.

Chin-Sheng Yu; Chih-Wen Cheng; Wen-Chi Su; Kuei-Chung Chang; Shao-Wei Huang; Jenn-Kang Hwang; Chih-Hao Lu

CELLO2GO (http://cello.life.nctu.edu.tw/cello2go/) is a publicly available, web-based system for screening various properties of a targeted protein, including its subcellular localization. Herein, we describe how this platform is used to obtain brief or detailed gene ontology (GO) categories, including subcellular localization(s), for queried proteins by combining the CELLO localization-predicting and BLAST homology-searching approaches. Given a query protein sequence, CELLO2GO uses BLAST to search for homologous sequences that are GO annotated in an in-house database derived from the UniProt Knowledgebase (UniProtKB). At the same time, CELLO attempts to predict at least one subcellular localization on the basis of the species in which the protein is found. When homologs of the query sequence have been identified, the number of terms found in each of their GO categories (cellular component, molecular function, and biological process) is summed and presented as pie charts representing possible functional annotations for the queried protein. Even when the experimental subcellular localization of a protein is not known, and thus not annotated, CELLO can confidently suggest a subcellular localization. CELLO2GO should be a useful tool for research involving complex subcellular systems because it combines CELLO and BLAST into one platform and its output is easily manipulated, so that user-specific questions can be readily addressed.
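The summation step described above can be sketched in Python, assuming each BLAST hit has already been mapped to its GO terms grouped by namespace. The toy annotations and the helper names `summarize_go` and `to_fractions` are hypothetical; CELLO2GO's actual data formats are not reproduced here.

```python
from collections import Counter


def summarize_go(hits):
    """Count each GO term separately within each GO namespace across all homolog hits."""
    summary = {}
    for hit in hits:
        for namespace, terms in hit.items():
            summary.setdefault(namespace, Counter()).update(terms)
    return summary


def to_fractions(counter):
    """Turn raw term counts into pie-chart fractions (empty dict if there are no terms)."""
    total = sum(counter.values())
    return {term: n / total for term, n in counter.items()} if total else {}


# Hypothetical GO annotations of two BLAST homologs of a query protein.
hits = [
    {"cellular_component": ["nucleus"],
     "molecular_function": ["DNA binding"],
     "biological_process": ["transcription"]},
    {"cellular_component": ["nucleus", "cytoplasm"],
     "molecular_function": ["DNA binding"],
     "biological_process": []},
]
summary = summarize_go(hits)
for namespace, counts in summary.items():
    print(namespace, to_fractions(counts))
```

Each namespace gets its own distribution, which matches presenting one pie chart per GO category.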

Proteins | 2008

Deriving protein dynamical properties from weighted protein contact number

Chih-Peng Lin; Shao-Wei Huang; Yan-Long Lai; Shih-Chung Yen; Chien-Hua Shih; Chih-Hao Lu; Cuen-Chao Huang; Jenn-Kang Hwang

It has recently been shown that in proteins the atomic mean‐square displacement (or B‐factor) can be related to the number of neighboring atoms (or protein contact number), and that this relationship allows one to compute B‐factor profiles directly from the protein contact number. This method, referred to as the protein contact model, is appealing, since it requires neither trajectory integration nor matrix diagonalization. As a result, the protein contact model can be applied to very large proteins and can be implemented as a high‐throughput computational tool to compute atomic fluctuations in proteins. Here, we show that this relationship can be further refined to one between the atomic mean‐square displacement and the weighted protein contact number, the weight being the square of the reciprocal distance between the contacting pair. In addition, we show that this relationship can be used to compute the cross‐correlation of atomic motion (the B‐factor is essentially the auto‐correlation of atomic motion). For a nonhomologous dataset comprising 972 high‐resolution X‐ray protein structures (resolution <2.0 Å and sequence identity <25%), the mean correlation coefficient between the X‐ray and computed B‐factors based on the weighted protein contact‐number model is 0.61, which is better than that of the original contact‐number model (0.51) and other methods. We also show that, for some selected cases, the correlation maps computed with the weighted contact‐number model are globally similar to those computed through normal mode analysis. Our results underscore the relationship between protein dynamics and protein packing. We believe that our method will be useful in the study of the protein structure‐dynamics relationship. Proteins 2008.
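A minimal Python sketch of the weighted contact number (WCN) described above, where the contact between atoms i and j is weighted by 1/r_ij². It assumes, as in contact-number models generally, that an atom's predicted fluctuation (B-factor) is inversely proportional to its WCN, up to a scale factor. The function names and toy coordinates are hypothetical, and atom selection (for example, Cα atoms only) is left to the caller. A Pearson correlation is included as one way to compare a computed profile with X-ray B-factors.

```python
import math


def weighted_contact_number(coords):
    """WCN_i = sum over all j != i of 1 / r_ij**2."""
    wcn = []
    for i, a in enumerate(coords):
        total = 0.0
        for j, b in enumerate(coords):
            if i != j:
                r2 = sum((p - q) ** 2 for p, q in zip(a, b))
                total += 1.0 / r2
        wcn.append(total)
    return wcn


def predicted_bfactors(coords):
    """Assumed model: B_i proportional to 1 / WCN_i (scale factor omitted)."""
    return [1.0 / w for w in weighted_contact_number(coords)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy example: four atoms on a line, 1 Å apart.
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(weighted_contact_number(atoms))
print(predicted_bfactors(atoms))  # the end atoms, with fewer close neighbors, get the largest values
```

The double loop is O(N²); large structures would normally use a distance cutoff or a spatial index.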

PLOS ONE | 2012

Prediction of Metal Ion-Binding Sites in Proteins Using the Fragment Transformation Method

Chih-Hao Lu; Yu-Feng Lin; Jau-Ji Lin; Chin-Sheng Yu

The structure of a protein determines its function and its interactions with other factors. Regions of proteins that interact with ligands, substrates, and/or other proteins tend to be conserved in both sequence and structure, and the residues involved are usually in close spatial proximity. More than 70,000 protein structures are currently found in the Protein Data Bank, and approximately one-third contain metal ions essential for function. Identifying and characterizing metal ion–binding sites experimentally is time-consuming and costly. Many computational methods have been developed to identify metal ion–binding sites, and most use only sequence information. For the work reported herein, we developed a method that uses sequence and structural information to predict the residues in metal ion–binding sites. Six types of metal ion–binding templates, for Ca2+, Cu2+, Fe3+, Mg2+, Mn2+, and Zn2+, were constructed using the residues within 3.5 Å of the center of the metal ion. Using the fragment transformation method, we then compared known metal ion–binding sites with the templates to assess the accuracy of our method. Our method achieved an overall accuracy of 94.6% with a true positive rate of 60.5% at a 5% false positive rate, and therefore constitutes a significant improvement in metal-binding site prediction.
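The template-construction rule above (residues within 3.5 Å of the metal ion center) can be sketched in Python. The atom-record layout, the choice to select a residue when any of its atoms falls within the cutoff, and the function name `binding_residues` are assumptions for illustration; the fragment transformation comparison itself is not shown.

```python
import math


def binding_residues(atoms, metal_xyz, cutoff=3.5):
    """Return (residue_id, residue_name) for residues with any atom within `cutoff` Å of the ion."""
    selected = {}
    for res_id, res_name, _atom_name, xyz in atoms:
        if math.dist(xyz, metal_xyz) <= cutoff:
            selected[res_id] = res_name
    return sorted(selected.items())


# Hypothetical atom records: (residue id, residue name, atom name, (x, y, z) in Å).
atoms = [
    (10, "CYS", "SG", (2.3, 0.0, 0.0)),
    (10, "CYS", "CA", (4.0, 1.0, 0.0)),
    (14, "HIS", "NE2", (0.0, 2.1, 0.0)),
    (30, "LEU", "CD1", (5.0, 5.0, 5.0)),
]
zinc = (0.0, 0.0, 0.0)
print(binding_residues(atoms, zinc))  # [(10, 'CYS'), (14, 'HIS')]
```

`math.dist` requires Python 3.8 or later.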

Proteins | 2007

Predicting disulfide connectivity patterns

Chih-Hao Lu; Yu-Ching Chen; Chin-Sheng Yu; Jenn-Kang Hwang

Disulfide bonds play an important role in stabilizing protein structure and regulating protein function. Therefore, the ability to infer disulfide connectivity from protein sequences will be valuable in structural modeling and functional analysis. However, predicting disulfide connectivity directly from sequences is challenging because of the nonlocal nature of disulfide bonds: the close spatial proximity of the cysteine pair that forms a disulfide bond does not necessarily imply a short sequence separation between the cysteine residues. Recently, Chen and Hwang (Proteins 2005;61:507–512) treated this problem as a multiclass classification by defining each distinct disulfide pattern as a class. They used multiple support vector machines based on a variety of sequence features to predict the disulfide patterns. Their results compare favorably with those in the literature for a benchmark dataset sharing less than 30% sequence identity. However, since the number of disulfide patterns grows rapidly as the number of disulfide bonds increases, their method performs unsatisfactorily when there are many disulfide bonds. In this work, we propose a novel method that represents disulfide connectivity in terms of cysteine pairs instead of disulfide patterns. Since the number of bonding states of a cysteine pair is independent of the number of disulfide bonds, the problem of class explosion is avoided. The bonding states of the cysteine pairs are predicted using support vector machines, with genetic algorithm optimization for feature selection. The complete disulfide patterns are then determined from connectivity matrices constructed from the predicted bonding states of the cysteine pairs. Our approach outperforms the current approaches in the literature. Proteins 2007.
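The class-explosion argument can be checked numerically. With 2n fully bonded cysteines there are (2n − 1)!! possible connectivity patterns, while the number of cysteine pairs grows only quadratically and each pair has just two bonding states regardless of n. The Python sketch below also shows one way to turn predicted pair scores into a complete pattern: an exhaustive search for the pairing with the highest total score. This decoding step is an assumption, not necessarily the authors' procedure, and it assumes every cysteine is bonded (an even count). All names and scores are hypothetical.

```python
from itertools import combinations


def pair_count(n_cys):
    """Cysteine pairs to classify (each as bonded or not bonded)."""
    return n_cys * (n_cys - 1) // 2


def pattern_count(n_bonds):
    """Distinct connectivity patterns for 2 * n_bonds fully bonded cysteines: (2n - 1)!!."""
    count = 1
    for k in range(1, 2 * n_bonds, 2):
        count *= k
    return count


def best_pattern(scores, cysteines):
    """Exhaustively find the complete pairing with the highest summed pair score.

    `scores` maps frozenset({i, j}) -> predicted bonding score for cysteines i and j.
    Assumes every cysteine is bonded, so len(cysteines) must be even.
    """
    if not cysteines:
        return 0.0, []
    first, rest = cysteines[0], cysteines[1:]
    best = (float("-inf"), [])
    for partner in rest:
        remaining = [c for c in rest if c != partner]
        sub_score, sub_pairs = best_pattern(scores, remaining)
        total = scores[frozenset((first, partner))] + sub_score
        if total > best[0]:
            best = (total, [(first, partner)] + sub_pairs)
    return best


for n in range(1, 6):
    print(n, "bonds:", pattern_count(n), "patterns vs", pair_count(2 * n), "pairs")

# Hypothetical predicted bonding scores for four cysteines (sequence positions 1-4).
cys = [1, 2, 3, 4]
pair_scores = dict(zip(map(frozenset, combinations(cys, 2)), [0.1, 0.2, 0.9, 0.8, 0.3, 0.1]))
print(best_pattern(pair_scores, cys))  # highest-scoring pairing: (1, 4) and (2, 3)
```

Exhaustive search is only practical for a handful of bonds; a maximum-weight matching algorithm scales better.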

Proteins | 2008

On the relationship between the protein structure and protein dynamics.

Chih-Hao Lu; Shao-Wei Huang; Yan-Long Lai; Chih-Peng Lin; Chien-Hua Shih; Cuen-Chao Huang; Wei-Lun Hsu; Jenn-Kang Hwang

Recently, we developed a method (Shih et al., Proteins: Structure, Function, and Bioinformatics 2007;68:34–38) to compute the correlation of fluctuations in proteins. This method, referred to as the protein fixed‐point (PFP) model, is based on the positional vectors of atoms issuing from the fixed point, which is the point of least fluctuation in a protein. One corollary of this model is that atoms lying on the same shell centered at the fixed point have the same thermal fluctuations. In practice, this model provides a convenient way to compute the average dynamical properties of proteins directly from their geometrical shapes without the need for any mechanical model, so no trajectory integration or sophisticated matrix operations are needed. As a result, it is more efficient than molecular dynamics simulation or normal mode analysis. Although the PFP model was successfully applied to a number of proteins of various folds in the previous study, it was not clear to what extent the model can be applied. In this article, we carried out a comprehensive analysis of the PFP model for a dataset comprising 972 high‐resolution X‐ray structures with pairwise sequence identity ≤25%. We found that in most cases the PFP model works well. However, in a protein comprising multiple domains, each domain should be treated separately as an independent dynamical module with its own fixed point, whereas a protein complex of several subunits that functions as a biological unit should be considered as one single dynamical module with one fixed point. Under these considerations, the resultant correlation coefficient between the computed and the X‐ray structural B‐factors for the data set is 0.59, and 75% (727/972) of the proteins have a correlation coefficient ≥0.5.
Our result shows that the fixed‐point model is indeed quite general and will be a useful tool for high throughput analysis of dynamical properties of proteins. Proteins 2008.
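A minimal Python sketch of how the module rule above changes a fixed-point prediction. It assumes a PFP form in which an atom's mean-square fluctuation grows with its squared distance from the fixed point, and it approximates each module's fixed point by the centroid of that module's atoms. Both are assumptions here, as are the function names and toy coordinates. Giving each domain its own label treats it as an independent dynamical module; giving every atom the same label treats the whole structure (for example, a functional complex) as one module.

```python
def centroid(points):
    """Geometric center of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def pfp_profile(coords, modules=None):
    """Assumed PFP form: fluctuation_i proportional to |r_i - r0|**2.

    r0 is approximated by the centroid of the atom's dynamical module; `modules`
    gives one module label per atom (default: the whole structure is one module).
    """
    if modules is None:
        modules = [0] * len(coords)
    centers = {
        m: centroid([xyz for xyz, lab in zip(coords, modules) if lab == m])
        for m in set(modules)
    }
    return [
        sum((xyz[k] - centers[m][k]) ** 2 for k in range(3))
        for xyz, m in zip(coords, modules)
    ]


# Hypothetical two-domain toy structure.
coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0), (14.0, 0.0, 0.0)]
print(pfp_profile(coords, modules=[0, 0, 1, 1]))  # [1.0, 1.0, 4.0, 4.0]
print(pfp_profile(coords))                        # [42.25, 20.25, 12.25, 56.25]
```

Treating the two domains as one module changes the predicted profile substantially, which is why the choice of modules matters for the correlation with X-ray B-factors.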

PLOS ONE | 2011

Identification of Antifreeze Proteins and Their Functional Residues by Support Vector Machine and Genetic Algorithms based on n-Peptide Compositions

Chin-Sheng Yu; Chih-Hao Lu

For the first time, multiple sets of n-peptide compositions from antifreeze protein (AFP) sequences of various cold-adapted fish and insects were analyzed using support vector machine and genetic algorithms. The identification of AFPs is difficult because they exist as evolutionarily divergent types, and because their sequences and structures are present in limited numbers in currently available databases. Our results reveal that it is feasible to identify the shared sequential features among the various structural types of AFPs. Moreover, we were able to identify residues involved in ice binding without requiring knowledge of the three-dimensional structures of these AFPs. This approach should be useful for genomic and proteomic studies involving cold-adapted organisms.
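n-Peptide composition features of the kind analyzed above can be computed with a short Python sketch. For a given n, a sequence becomes a vector of the frequencies of all 20^n possible n-peptides, counted over overlapping windows. The function name and the handling of nonstandard residues (windows containing them are skipped) are assumptions; the SVM training and genetic-algorithm feature selection are not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def npeptide_composition(seq, n):
    """Frequencies of all 20**n n-peptides over overlapping windows of `seq`.

    Windows containing a nonstandard residue are skipped (an assumption).
    """
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    counts = dict.fromkeys(keys, 0)
    windows = 0
    for i in range(len(seq) - n + 1):
        peptide = seq[i:i + n]
        if peptide in counts:
            counts[peptide] += 1
            windows += 1
    return [counts[k] / windows if windows else 0.0 for k in keys]


features = npeptide_composition("ACAC", 2)
print(len(features))  # 400 dipeptide features
keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
print({k: f for k, f in zip(keys, features) if f})  # AC occurs in 2 of 3 windows, CA in 1
```

Vectors for several values of n can be concatenated or fed to separate classifiers; the vector length grows as 20^n, which is one reason feature selection is useful.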

Journal of Chemical Information and Modeling | 2016

MIB: Metal Ion-Binding Site Prediction and Docking Server

Yu-Feng Lin; Chih-Wen Cheng; Chung-Shiuan Shih; Jenn-Kang Hwang; Chin-Sheng Yu; Chih-Hao Lu

The structure of a protein determines its biological function(s) and its interactions with other factors; the binding regions tend to be conserved in sequence and structure, and the interacting residues involved are usually close together in 3D space. The Protein Data Bank currently contains more than 110 000 protein structures, approximately one-third of which contain metal ions. Identifying and characterizing metal ion-binding sites is thus essential for investigating a protein's function(s) and interactions. However, experimental approaches are time-consuming and costly. The web server reported here was built to predict metal ion-binding residues and to generate the predicted metal ion-bound 3D structure. Binding templates have been constructed from the binding residues of 12 types of metal ions. The templates include residues within 3.5 Å of the metal ion, and the fragment transformation method is used for structural comparison between query proteins and templates without any data training. Predictions rely on adjusted scoring functions based on the similarity of structure and of binding residues. Binding-residue prediction is supported for twelve kinds of metal ions (Ca2+, Cu2+, Fe3+, Mg2+, Mn2+, Zn2+, Cd2+, Fe2+, Ni2+, Hg2+, Co2+, and Cu+). After prediction, MIB also docks the metal ions into the structure. The MIB server is available at http://bioinfo.cmu.edu.tw/MIB/ .

Proteins | 2006

The fragment transformation method to detect the protein structural motifs.

Chih-Hao Lu; Yeong-Shin Lin; Yu-Ching Chen; Chin-Sheng Yu; Shi-Yu Chang; Jenn-Kang Hwang

Identifying functional structural motifs in protein structures of unknown function has become increasingly important in recent years owing to progress in structural genomics initiatives. Although certain structural patterns, such as the Asp‐His‐Ser catalytic triad, are easy to detect because of their conserved residues and stringently constrained geometry, it is usually more challenging to detect more general structural motifs such as the ββα‐metal binding motif, which has a much more variable conformation and sequence. At present, the identification of these motifs usually relies on manual procedures based on different structure and sequence analysis tools. In this study, we develop a structural alignment algorithm that combines structural and sequence information to identify local structural motifs. We applied our method to two examples: the ββα‐metal binding motif and the treble clef motif. The ββα‐metal binding motif plays an important role in nonspecific DNA interactions and cleavage in host defense and apoptosis. The treble clef motif is a zinc‐binding motif adaptable to diverse functions, such as nucleic acid binding and hydrolysis of phosphodiester bonds. Our results are encouraging, indicating that we can effectively identify these structural motifs automatically. Our method may provide a useful means for automatic functional annotation by detecting structural motifs associated with particular functions. Proteins 2006.

BioMed Research International | 2014

EXIA2: Web Server of Accurate and Rapid Protein Catalytic Residue Prediction

Chih-Hao Lu; Chin-Sheng Yu; Yu-Tung Chien; Shao-Wei Huang

We propose a method (EXIA2) for catalytic residue prediction based on protein structure that does not require homology information. The method is based on the distinctive side-chain orientation of catalytic residues: we found that the side chains of catalytic residues usually point toward the center of the catalytic site. This orientation is usually observed in catalytic residues but not in noncatalytic residues, whose side chains are usually randomly oriented. When combined with PSI-BLAST sequence conservation, the method is shown to be the most accurate catalytic residue prediction method currently available. It performs better than competing methods on several benchmark datasets that include over 1,200 enzyme structures, with areas under the ROC curve (AUC) ranging from 0.934 to 0.968.
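The orientation criterion can be illustrated with a Python sketch that measures the angle between a residue's side-chain direction and the direction to the catalytic-site center. Representing the side-chain direction as the vector from the Cα atom to the side-chain centroid, taking the site center as given, and the function name `orientation_angle` are all assumptions for illustration; EXIA2's actual scoring is not reproduced. A small angle corresponds to a side chain pointing toward the site center.

```python
import math


def orientation_angle(ca, side_chain_centroid, site_center):
    """Angle (degrees) between the CA->side-chain vector and the CA->site-center vector."""
    u = [s - c for s, c in zip(side_chain_centroid, ca)]
    v = [t - c for t, c in zip(site_center, ca)]
    dot = sum(a * b for a, b in zip(u, v))
    cos_theta = dot / (math.hypot(*u) * math.hypot(*v))
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(cos_theta))


site = (5.0, 0.0, 0.0)  # hypothetical catalytic-site center
print(orientation_angle((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), site))  # 0.0: points at the site
print(orientation_angle((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), site))  # about 90: perpendicular
```

Multi-argument `math.hypot` requires Python 3.8 or later.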
