Publication


Featured research published by Changhui Yan.


BMC Bioinformatics | 2006

Predicting DNA-binding sites of proteins from amino acid sequence

Changhui Yan; Michael Terribilini; Feihong Wu; Robert L. Jernigan; Drena Dobbs; Vasant G. Honavar

Background: Understanding the molecular details of protein-DNA interactions is critical for deciphering the mechanisms of gene regulation. We present a machine learning approach for the identification of amino acid residues involved in protein-DNA interactions.
Results: We start with a Naïve Bayes classifier trained to predict whether a given amino acid residue is a DNA-binding residue based on its identity and the identities of its sequence neighbors. The input to the classifier consists of the identities of the target residue and 4 sequence neighbors on each side of the target residue. The classifier is trained and evaluated (using leave-one-out cross-validation) on a non-redundant set of 171 proteins. Our results indicate the feasibility of identifying interface residues based on local sequence information. The classifier achieves 71% overall accuracy with a correlation coefficient of 0.24, 35% specificity and 53% sensitivity in identifying interface residues as evaluated by leave-one-out cross-validation. We show that the performance of the classifier is improved by using sequence entropy of the target residue (the entropy of the corresponding column in multiple alignment obtained by aligning the target sequence with its sequence homologs) as additional input. The classifier achieves 78% overall accuracy with a correlation coefficient of 0.28, 44% specificity and 41% sensitivity in identifying interface residues. Examination of the predictions in the context of 3-dimensional structures of proteins demonstrates the effectiveness of this method in identifying DNA-binding sites from sequence information. In 33% (56 out of 171) of the proteins, the classifier identifies the interaction sites by correctly recognizing at least half of the interface residues. In 87% (149 out of 171) of the proteins, the classifier correctly identifies at least 20% of the interface residues. This suggests the possibility of using such classifiers to identify potential DNA-binding motifs and to gain potentially useful insights into sequence correlates of protein-DNA interactions.
Conclusion: Naïve Bayes classifiers trained to identify DNA-binding residues using sequence information offer a computationally efficient approach to identifying putative DNA-binding sites in DNA-binding proteins and recognizing potential DNA-binding motifs.
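The windowed input described in this abstract (the target residue plus 4 neighbors on each side) can be illustrated with a minimal Naïve Bayes sketch. This is not the authors' implementation: the padding symbol, the Laplace smoothing, and the function names `windows`, `train`, and `predict` are assumptions made for illustration, and the entropy feature is omitted.

```python
import math
from collections import defaultdict

WINDOW = 4          # neighbors on each side of the target residue, as in the abstract
PAD = "-"           # assumed padding symbol for positions beyond the sequence ends
ALPHABET_SIZE = 21  # 20 amino acids plus the padding symbol (used for smoothing)


def windows(sequence):
    """Yield, for each residue, the 9 identities centered on that residue."""
    padded = PAD * WINDOW + sequence + PAD * WINDOW
    for i in range(len(sequence)):
        yield padded[i:i + 2 * WINDOW + 1]


def train(examples):
    """examples: iterable of (sequence, labels); labels is a '0'/'1' string per residue."""
    class_counts = defaultdict(int)
    feature_counts = defaultdict(int)  # (label, window position, residue) -> count
    for sequence, labels in examples:
        for window, label in zip(windows(sequence), labels):
            class_counts[label] += 1
            for pos, residue in enumerate(window):
                feature_counts[(label, pos, residue)] += 1
    return class_counts, feature_counts


def predict(sequence, model):
    """Return a '0'/'1' string with the most probable label for each residue."""
    class_counts, feature_counts = model
    total = sum(class_counts.values())
    result = []
    for window in windows(sequence):
        scores = {}
        for label, n in class_counts.items():
            score = math.log(n / total)  # log prior
            for pos, residue in enumerate(window):
                count = feature_counts.get((label, pos, residue), 0)
                # Laplace-smoothed log likelihood (smoothing is an assumption)
                score += math.log((count + 1) / (n + ALPHABET_SIZE))
            scores[label] = score
        result.append(max(scores, key=scores.get))
    return "".join(result)
```

Calling `train` on labeled sequences and then `predict` on a new sequence gives one label per residue. The toy sequences in any such test are hypothetical and carry no biological meaning.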


Intelligent Systems in Molecular Biology | 2004

A two-stage classifier for identification of protein-protein interface residues

Changhui Yan; Drena Dobbs; Vasant G. Honavar

Motivation: The ability to identify protein-protein interaction sites and to detect specific amino acid residues that contribute to the specificity and affinity of protein interactions has important implications for problems ranging from rational drug design to analysis of metabolic and signal transduction networks. Results: We have developed a two-stage method consisting of a support vector machine (SVM) and a Bayesian classifier for predicting surface residues of a protein that participate in protein-protein interactions. This approach exploits the fact that interface residues tend to form clusters in the primary amino acid sequence. Our results show that the proposed two-stage classifier outperforms previously published sequence-based methods for predicting interface residues. We also present results obtained using the two-stage classifier on an independent test set of seven CAPRI (Critical Assessment of PRedicted Interactions) targets. The success of the predictions is validated by examining the predictions in the context of the three-dimensional structures of protein complexes.


Journal of Bioinformatics and Computational Biology | 2011

A graph-based semantic similarity measure for the Gene Ontology

Marco A. Alvarez; Changhui Yan

Existing methods for calculating semantic similarities between pairs of Gene Ontology (GO) terms and gene products often rely on external databases like Gene Ontology Annotation (GOA) that annotate gene products using the GO terms. This dependency leads to some limitations in real applications. Here, we present a semantic similarity algorithm (SSA) that relies exclusively on the GO. When calculating the semantic similarity between a pair of input GO terms, SSA takes into account the shortest path between them, the depth of their nearest common ancestor, and a novel similarity score calculated between the definitions of the involved GO terms. In our work, we use SSA to calculate semantic similarities between pairs of proteins by combining pairwise semantic similarities between the GO terms that annotate the involved proteins. The reliability of SSA was evaluated by comparing the resulting semantic similarities between proteins with the functional similarities between proteins derived from expert annotations or sequence similarity. Comparisons with existing state-of-the-art methods showed that SSA is highly competitive with the other methods. SSA provides a reliable measure of semantic similarity independent of external databases of functional-annotation observations.
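Two of the graph quantities named in this abstract, the shortest path between two terms and the depth of their nearest common ancestor, can be computed on a small directed acyclic graph as sketched below. The toy ontology and all term names are hypothetical (not real GO identifiers), the path is assumed to run through a common ancestor, and the sketch does not reproduce how SSA combines these values or scores term definitions.

```python
from collections import deque

# Hypothetical toy ontology: child term -> list of parent terms.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "dna_binding": ["binding"],
    "rna_binding": ["binding"],
    "kinase": ["catalysis"],
}


def ancestors_with_distance(term):
    """Breadth-first walk upward; returns {ancestor: fewest edges from term}."""
    dist = {term: 0}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for parent in PARENTS[current]:
            if parent not in dist:
                dist[parent] = dist[current] + 1
                queue.append(parent)
    return dist


def depth(term):
    """Fewest edges from the term up to the root."""
    return ancestors_with_distance(term)["root"]


def path_and_common_ancestor(a, b):
    """Return (path length through the nearest common ancestor, that ancestor, its depth)."""
    da, db = ancestors_with_distance(a), ancestors_with_distance(b)
    common = set(da) & set(db)
    # Nearest = smallest combined distance; the term name breaks ties deterministically.
    nearest = min(common, key=lambda t: (da[t] + db[t], t))
    return da[nearest] + db[nearest], nearest, depth(nearest)
```

In this toy graph, sibling terms under `binding` are two edges apart with a common ancestor at depth 1, while terms in different branches meet only at the root.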


BMC Bioinformatics | 2008

Identification of deleterious non-synonymous single nucleotide polymorphisms using sequence-derived information

Jing Hu; Changhui Yan

Background: As the number of non-synonymous single nucleotide polymorphisms (nsSNPs), also known as single amino acid polymorphisms (SAPs), increases rapidly, computational methods that can distinguish disease-causing SAPs from neutral SAPs are needed. Many methods have been developed to distinguish disease-causing SAPs based on both structural and sequence features of the mutation point. One limitation of these methods is that they are not applicable to cases where protein structures are not available. In this study, we explore the feasibility of classifying SAPs into disease-causing and neutral mutations using only information derived from protein sequence.
Results: We compiled a set of 686 features that were derived from protein sequence. For each feature, the distance between the wild-type residue and mutant-type residue was computed. Then a greedy approach was used to select the features that were useful for the classification of SAPs. Ten features were selected. Using the selected features, a decision tree method can achieve 82.6% overall accuracy with 0.607 Matthews Correlation Coefficient (MCC) in cross-validation. When tested on an independent set that was not seen by the method during the training and feature selection, the decision tree method achieves 82.6% overall accuracy with 0.604 MCC. We also evaluated the proposed method on all SAPs obtained from Swiss-Prot, where the method achieves 0.42 MCC with 73.2% overall accuracy. This method allows users to make reliable predictions when protein structures are not available. Unlike previous studies, in which only a small set of features was arbitrarily chosen and considered, here we used an automated method to systematically discover useful features from a large set of features well-annotated in public databases.
Conclusion: The proposed method is a useful tool for the classification of SAPs, especially when the structure of the protein is not available.
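The abstract does not spell out its greedy procedure, so the following is a generic forward-selection sketch rather than the authors' method. The `score` callable is a hypothetical stand-in for whatever evaluation is used (for example, cross-validated MCC of a decision tree), and `mcc` is the standard Matthews correlation coefficient computed from a confusion matrix.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def greedy_forward_selection(features, score, max_features=None):
    """Repeatedly add the single feature that most improves score(selected).

    Stops when no remaining feature improves the score, or when
    max_features have been selected.
    """
    selected = []
    best = float("-inf")
    remaining = list(features)
    while remaining and (max_features is None or len(selected) < max_features):
        candidate, candidate_score = None, best
        for feature in remaining:
            s = score(selected + [feature])
            if s > candidate_score:
                candidate, candidate_score = feature, s
        if candidate is None:  # no feature improves the current score
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best = candidate_score
    return selected, best
```

Greedy selection evaluates far fewer subsets than an exhaustive search over 686 features would require, at the cost of possibly missing feature combinations that only help jointly.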


Protein Journal | 2010

An Analysis of Reentrant Loops

Changhui Yan; Jingru Luo

Reentrant loops are an important structural motif in alpha-helical transmembrane proteins. A reentrant loop is a structural motif that goes only halfway through the membrane and then turns back to the side from which it originates. The question of what causes the reentrant loops to form such a unique topology is still unanswered. In this study, we try to answer this question by analyzing the hydrophobicity distribution on the amino acid sequences of the reentrant loops. Our results show that reentrant loops have very low hydrophobicity around the deepest point buried in the membrane and relatively high hydrophobicity close to the membrane surfaces. We speculate that this hydrophobicity distribution is a major force that stabilizes the unique reentrant loop structure. Our results also show that this hydrophobicity distribution results in special patterns on protein sequences, which can be captured using profile hidden Markov models (HMMs). The resulting profile HMMs can detect reentrant loops on protein sequences with high sensitivity and perfect specificity.
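A hydrophobicity distribution along a sequence is commonly summarized as a sliding-window average, as sketched below. The abstract does not say which hydrophobicity scale or window size was used, so the Kyte-Doolittle scale and the default window of 5 are assumptions, and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the abstract names none).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_profile(sequence, window=5):
    """Average hydropathy in a window centered on each residue.

    Windows are truncated at the sequence ends rather than padded.
    """
    half = window // 2
    profile = []
    for i in range(len(sequence)):
        start = max(0, i - half)
        end = min(len(sequence), i + half + 1)
        values = [KYTE_DOOLITTLE[r] for r in sequence[start:end]]
        profile.append(sum(values) / len(values))
    return profile
```

On a made-up sequence such as `LLLDDDLLL`, the profile dips in the middle and rises toward both ends, which is the qualitative shape the abstract reports for reentrant loops.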


Pattern Recognition Letters | 2009

Discrimination of disease-related non-synonymous single nucleotide polymorphisms using multi-scale RBF kernel fuzzy support vector machine

Wen Ju; Juan Shan; Changhui Yan; Heng-Da Cheng

In this paper, we develop a multi-scale RBF kernel fuzzy support vector machine (MSKFSVM) and apply it to the identification of disease-associated non-synonymous single nucleotide polymorphisms (nsSNPs). The experimental results show that the proposed MSKFSVM outperforms the traditional SVM method.


BMC Bioinformatics | 2008

Discrimination of outer membrane proteins with improved performance

Changhui Yan; Jing Hu; Yingfeng Wang

Background: Outer membrane proteins (OMPs) perform diverse functional roles in Gram-negative bacteria. Identification of outer membrane proteins is an important task.
Results: This paper presents a method for distinguishing outer membrane proteins (OMPs) from non-OMPs (that is, globular proteins and inner membrane proteins (IMPs)). First, we calculated the average residue compositions of OMPs, globular proteins and IMPs separately using a training set. Then for each protein from the test set, its distances to the three groups were calculated based on residue composition using a weighted Euclidean distance (WED) approach. Proteins from the test set were classified into OMP versus non-OMP classes based on the least distance. The proposed method can distinguish between OMPs and non-OMPs with 91.0% accuracy and 0.639 Matthews correlation coefficient (MCC). We then improved the method by including homologous sequences in the calculation of residue composition and using a feature-selection method to select the single residues and di-peptides that were useful for OMP prediction. The final method achieves an accuracy of 96.8% with 0.859 MCC. In direct comparisons, the proposed method outperforms previously published methods.
Conclusion: The proposed method can identify OMPs with improved performance. It will be very helpful for the discovery of OMPs on a genome scale.
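The first version of the method described here (group-average residue compositions, then classification by least weighted Euclidean distance) can be sketched as follows. The abstract does not state how the weights are chosen, so they are passed in as a parameter; the function names and any example sequences are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Fraction of each of the 20 amino acids in the sequence."""
    n = len(sequence)
    return [sequence.count(a) / n for a in AMINO_ACIDS]


def mean_composition(sequences):
    """Average residue composition over a group of training sequences."""
    comps = [composition(s) for s in sequences]
    return [sum(col) / len(comps) for col in zip(*comps)]


def weighted_euclidean(x, y, weights):
    """sqrt(sum_i w_i * (x_i - y_i)^2); the weights are supplied by the caller."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def classify(sequence, group_means, weights):
    """Assign the nearest group, then collapse globular and IMP into 'non-OMP'."""
    comp = composition(sequence)
    nearest = min(
        group_means,
        key=lambda g: weighted_euclidean(comp, group_means[g], weights),
    )
    return "OMP" if nearest == "OMP" else "non-OMP"
```

Here `group_means` is a dictionary keyed by `"OMP"`, `"globular"`, and `"IMP"`, each built with `mean_composition`. With uniform weights the distance reduces to the ordinary Euclidean distance.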


Journal of Biological Chemistry | 2015

Mechanism of N-Acylthiourea-mediated Activation of Human Histone Deacetylase 8 (HDAC8) at Molecular and Cellular Levels

Raushan K. Singh; Kyongshin Cho; Satish K. R. Padi; Junru Yu; Manas K. Haldar; Tanmay Mandal; Changhui Yan; Gregory R. Cook; Bin Guo; Sanku Mallik; D. K. Srivastava

Background: N-Acylthiourea (TM-2-51) is an HDAC8-selective activator. Results: TM-2-51 binds to HDAC8 at two sites in a positive cooperative manner, and it produces anticancer effect in neuroblastoma cells. Conclusion: TM-2-51 modulates the binding thermodynamics/kinetics of substrate/inhibitor to HDAC8, and it enhances the cellular expression of p53/p21. Significance: These mechanistic studies will shed light on designing HDAC-selective activators as potential therapeutic agents. We reported previously that an N-acylthiourea derivative (TM-2-51) serves as a potent and isozyme-selective activator for human histone deacetylase 8 (HDAC8). To probe the molecular mechanism of the enzyme activation, we performed a detailed account of the steady-state kinetics, thermodynamics, molecular modeling, and cell biology studies. The steady-state kinetic data revealed that TM-2-51 binds to HDAC8 at two sites in a positive cooperative manner. Isothermal titration calorimetric and molecular modeling data conformed to the two-site binding model of the enzyme-activator complex. We evaluated the efficacy of TM-2-51 on SH-SY5Y and BE(2)-C neuroblastoma cells, wherein the HDAC8 expression has been correlated with cellular malignancy. Whereas TM-2-51 selectively induced cell growth inhibition and apoptosis in SH-SY5Y cells, it showed no such effects in BE(2)-C cells, and this discriminatory feature appears to be encoded in the p53 genotype of the above cells. Our mechanistic and cellular studies on HDAC8 activation have the potential to provide insight into the development of novel anticancer drugs.


Computational Biology and Chemistry | 2008

Short Communication: A method for discovering transmembrane beta-barrel proteins in Gram-negative bacterial proteomes

Jing Hu; Changhui Yan

Transmembrane beta-barrel (TMB) proteins play pivotal roles in many aspects of bacterial functions. This paper presents a k-nearest neighbor (K-NN) method for discriminating TMB and non-TMB proteins. We start with a method that makes predictions based on a distance computed from residue composition and gradually improve the prediction performance by including homologous sequences and searching for a set of residues and di-peptides for calculating the distance. The final method achieves an accuracy of 97.1%, with 0.876 MCC, 86.4% sensitivity and 98.8% specificity. A web server based on the proposed method is available at http://yanbioinformatics.cs.usu.edu:8080/TMBKNNsubmit.
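The starting point described in this abstract, a k-nearest neighbor vote over a distance computed from residue composition, can be sketched as below. The value of k, the plain Euclidean distance, and the function names are assumptions; the later refinements (homologous sequences, selected residues and di-peptides) are not shown.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Fraction of each of the 20 amino acids in the sequence."""
    n = len(sequence)
    return [sequence.count(a) / n for a in AMINO_ACIDS]


def knn_predict(query, training, k=3):
    """Majority label among the k training sequences closest in composition.

    training: list of (sequence, label) pairs.
    """
    q = composition(query)
    neighbors = sorted(
        training,
        key=lambda item: math.dist(q, composition(item[0])),
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

An odd k avoids tied votes in a two-class problem such as TMB versus non-TMB.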


Pacific Symposium on Biocomputing | 2005

Identifying Interaction Sites in "Recalcitrant" Proteins: Predicted Protein and RNA Binding Sites in Rev Proteins of HIV-1 and EIAV Agree with Experimental Data

Michael Terribilini; Jae-Hyung Lee; Changhui Yan; Robert L. Jernigan; Susan Carpenter; Vasant G. Honavar; Drena Dobbs

Protein-protein and protein-nucleic acid interactions are vitally important for a wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses. We have developed machine learning approaches for predicting which amino acids of a protein participate in its interactions with other proteins and/or nucleic acids, using only the protein sequence as input. In this paper, we describe an application of classifiers trained on datasets of well-characterized protein-protein and protein-RNA complexes for which experimental structures are available. We apply these classifiers to the problem of predicting protein and RNA binding sites in the sequence of a clinically important protein for which the structure is not known: the regulatory protein Rev, essential for the replication of HIV-1 and other lentiviruses. We compare our predictions with published biochemical, genetic and partial structural information for HIV-1 and EIAV Rev and with our own published experimental mapping of RNA binding sites in EIAV Rev. The predicted and experimentally determined binding sites are in very good agreement. The ability to reliably predict the residues of a protein that directly contribute to specific binding events, without the requirement for structural information regarding either the protein or complexes in which it participates, can potentially generate new disease intervention strategies.

Collaboration

Top co-authors of Changhui Yan:

Vasant G. Honavar, Pennsylvania State University

Wen Cheng, North Dakota State University