Publication


Featured research published by Hae-Jin Hu.


IEEE Transactions on Nanobioscience | 2004

Improved protein secondary structure prediction using support vector machine with a new encoding scheme and an advanced tertiary classifier

Hae-Jin Hu; Yi Pan; Robert W. Harrison; Phang C. Tai

Prediction of protein secondary structures is an important problem in bioinformatics and has many applications. Recent secondary structure prediction studies are mostly based on neural networks or the support vector machine (SVM). The SVM method is a comparatively new learning system which has mostly been used in pattern recognition problems. In this study, SVM is used as a machine learning tool for the prediction of secondary structure, and several encoding schemes, including the orthogonal matrix, the hydrophobicity matrix, the BLOSUM62 substitution matrix, and a combined matrix of these, are applied and optimized to improve the prediction accuracy. Also, the optimal window length for six SVM binary classifiers is established by testing different window sizes, and our new encoding scheme is tested based on this optimal window size via sevenfold cross-validation tests. The results show a 2% increase in the accuracy of the binary classifiers compared with the instances in which the classical orthogonal matrix is used. Finally, to combine the results of the six SVM binary classifiers, a new tertiary classifier which combines the results of one-versus-one binary classifiers is introduced, and its performance is compared with those of existing tertiary classifiers. According to the results, the Q3 prediction accuracy of the new tertiary classifier reaches 78.8%, which is better than the best result reported in the literature.
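As a rough illustration of the orthogonal (one-hot) encoding with a sliding window mentioned in this abstract, the sketch below turns a protein sequence into one feature vector per residue. The 21st "padding" slot, the function names, and the default window of 13 are assumptions made for the example; the abstract does not state the optimal window length or the exact vector layout.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def orthogonal_encode(residue):
    """One-hot vector of length 21; the last slot marks padding or unknown residues (assumed layout)."""
    vec = [0] * (len(AMINO_ACIDS) + 1)
    idx = AMINO_ACIDS.find(residue)
    vec[idx if idx >= 0 else len(AMINO_ACIDS)] = 1
    return vec


def window_features(sequence, window=13):
    """Encode every residue together with its neighbours in a centred window.

    `window` must be odd; 13 is only a placeholder, not the value found in the paper.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    padded = "-" * half + sequence + "-" * half  # pad both ends so each residue has a full window
    features = []
    for i in range(len(sequence)):
        row = []
        for residue in padded[i:i + window]:
            row.extend(orthogonal_encode(residue))
        features.append(row)
    return features
```

Each row has `window * 21` entries and could be passed to any binary classifier; swapping `orthogonal_encode` for a BLOSUM62 or hydrophobicity lookup would give the other encodings the abstract lists.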


Journal of Bacteriology | 2011

Nonclassical Protein Secretion by Bacillus subtilis in the Stationary Phase Is Not Due to Cell Lysis

Chun-Kai Yang; Hosam E. Ewis; XiaoZhou Zhang; Chung-Dar Lu; Hae-Jin Hu; Yi Pan; Ahmed T. Abdelal; Phang C. Tai

The carboxylesterase Est55 has been cloned and expressed in Bacillus subtilis strains. Est55, which lacks a classical, cleavable N-terminal signal sequence, was found to be secreted during the stationary phase of growth such that there is more Est55 in the medium than inside the cells. Several cytoplasmic proteins were also secreted in large amounts during late stationary phase, indicating that secretion in B. subtilis is not unique to Est55. These proteins, which all have defined cytoplasmic functions, include GroEL, DnaK, enolase, pyruvate dehydrogenase subunits PdhB and PdhD, and SodA. The release of Est55 and those proteins into the growth medium is not due to gross cell lysis, a conclusion that is supported by several lines of evidence: constant cell density and secretion in the presence of chloramphenicol, constant viability count, the absence of EF-Tu and SecA in the culture medium, and the lack of effect of autolysin-deficient mutants. The shedding of these proteins by membrane vesicles into the medium is minimal. More importantly, we have identified a hydrophobic α-helical domain within enolase that contributes to its secretion. Thus, upon the genetic deletion or replacement of a potential membrane-embedding domain, the secretion of plasmid gene-encoded mutant enolase is totally blocked, while the wild-type chromosomal enolase is secreted normally in the same cultures during the stationary phase, indicating differential specificity. We conclude that the secretion of Est55 and several cytoplasmic proteins without signal peptides in B. subtilis is a general phenomenon and is not a consequence of cell lysis or membrane shedding; instead, their secretion is through a process(es) in which protein domain structure plays a contributing factor.


Expert Systems With Applications | 2006

Transmembrane segments prediction and understanding using support vector machine and decision tree

Jieyue He; Hae-Jin Hu; Robert W. Harrison; Phang C. Tai; Yi Pan

In recent years, there have been many studies focusing on improving the accuracy of prediction of transmembrane segments, and many significant results have been achieved. In spite of these considerable results, the existing methods lack the ability to explain the process of how a learning result is reached and why a prediction decision is made. The explanation of a decision is important for the acceptance of machine learning technology in bioinformatics applications such as protein structure prediction. While support vector machines (SVMs) have shown strong generalization ability in a number of application areas, including protein structure prediction, they are black box models and hard to understand. Decision trees, on the other hand, provide insightful interpretation but have lower prediction accuracy. In this paper, we present an innovative approach to rule generation for understanding prediction of transmembrane segments by integrating the merits of both SVMs and decision trees. This approach combines SVMs with decision trees into a new algorithm called SVM_DT. The results of the experiments for prediction of transmembrane segments on the 165 low-resolution test data set show not only that the comprehensibility of SVM_DT is much better than that of SVMs, but also that the test accuracy of these rules is high. Rules with confidence values over 90% have an average prediction accuracy of 93.4%. We also found that the confidence and prediction accuracy values of the rules generated by SVM_DT are quite consistent. We believe that SVM_DT can be used not only for transmembrane segment prediction, but also for understanding the prediction. The prediction and its interpretation can be used for guiding biological experiments.


IEEE Transactions on Nanobioscience | 2007

To Be or Not to Be: Predicting Soluble SecAs as Membrane Proteins

Hae-Jin Hu; Jeanetta Holley; Jieyue He; Robert W. Harrison; Hsiuchin Yang; Phang C. Tai; Yi Pan

SecA is an important component of protein translocation in bacteria, and exists in soluble and membrane-integrated forms. Most membrane prediction programs predict SecA as being a soluble protein, with the exception of TMpred and TopPred. However, the membrane associated predicted segments by TMpred and TopPred are inconsistent across bacterial species in spite of high sequence homology. In this paper we describe a new method for membrane protein prediction, PSSM_SVM, which provides consistent results for integral membrane domains of SecAs across bacterial species. This PSSM encoding scheme demonstrates the highest accuracy in terms of Q2 among the common prediction methods, and produces consistent results on blind test data. None of the previously described methods showed this kind of consistency when tested against the same blind test set. This scheme predicts traditional transmembrane segments and most of the soluble proteins accurately. The PSSM scheme applied to the membrane-associated protein SecA shows characteristic features. In the set of 223 known SecA sequences, the PSSM_SVM prediction scheme predicts eight to nine residue embedded membrane segments. This predicted region is part of a 12 residue helix from known X-ray crystal structures of SecAs. This information could be important for determining the structure of SecA proteins in the membrane which have different conformational properties from other transmembrane proteins, as well as other soluble proteins that may similarly integrate into lipid bi-layers.


granular computing | 2006

Hybrid SVM kernels for protein secondary structure prediction

Gulsah Altun; Hae-Jin Hu; Dumitru Brinza; Robert W. Harrison; Alexander Zelikovsky; Yi Pan

The Support Vector Machine is a powerful methodology for solving problems in nonlinear classification, function estimation and density estimation. When data are not linearly separable, they are mapped to a high-dimensional feature space using a nonlinear function which can be computed through a positive definite kernel in the input space. Using a suitable kernel function for a particular problem and input data can change the prediction results remarkably and improve the accuracy. The goal of this work is to find the best kernel functions that can be applied to different types of data and problems. In this paper, we propose two hybrid kernels, SVMSM+RBF and SVMEDIT+RBF. SVMSM+RBF is designed by combining the best-performing RBF kernel with the substitution matrix (SM) based kernel developed by Vanschoenwinkel and Manderick. The SVMEDIT+RBF kernel combines the RBF kernel and the edit kernel devised by Li and Jiang. We tested these two hybrid kernels on one of the widely studied problems in bioinformatics, the protein secondary structure prediction problem. For this problem, our results reached 91% accuracy on the H/E binary classifier.
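The idea of a hybrid kernel can be sketched as follows. The abstract does not say how the two kernels are combined, so this example assumes a weighted sum (a sum of positive definite kernels is itself positive definite). The `match_kernel` below is a deliberately simple stand-in for a sequence kernel, not the substitution-matrix kernel of Vanschoenwinkel and Manderick or the edit kernel of Li and Jiang; all function names and parameters are hypothetical.

```python
import math


def rbf_kernel(x, y, gamma=0.1):
    """Standard RBF kernel exp(-gamma * ||x - y||^2) on numeric vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def match_kernel(s, t):
    """Toy sequence kernel: number of aligned positions with identical residues.

    Equivalent to a dot product of one-hot encodings, so it is a valid kernel.
    It only stands in for a real substitution-matrix or edit-distance kernel.
    """
    return float(sum(1 for a, b in zip(s, t) if a == b))


def hybrid_kernel(x_vec, y_vec, x_seq, y_seq, gamma=0.1, weight=1.0):
    """Assumed combination rule: RBF on numeric features plus a weighted sequence kernel."""
    return rbf_kernel(x_vec, y_vec, gamma) + weight * match_kernel(x_seq, y_seq)
```

A kernel of this shape can be evaluated pairwise to build a Gram matrix for an SVM implementation that accepts precomputed kernels.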


Rule Extraction from Support Vector Machines | 2008

Rule Extraction from SVM for Protein Structure Prediction

Jieyue He; Hae-Jin Hu; Bernard Chen; Phang C. Tai; Robert W. Harrison; Yi Pan

In recent years, much research has focused on improving the accuracy of protein structure prediction, and many significant results have been achieved. However, the existing methods lack the ability to explain the process of how a learning result is reached and why a prediction decision is made. The explanation of a decision is important for the acceptance of machine learning technology in bioinformatics applications such as protein structure prediction. Support vector machines (SVMs) have shown better performance than most traditional machine learning approaches in a variety of application areas. However, SVMs are still black box models. They do not produce comprehensible models that account for the predictions they make. To overcome this limitation, in this chapter we present two new approaches to rule generation for understanding protein structure prediction. Based on the strong generalization ability of the SVM and the interpretability of the decision tree, one approach combines SVMs with decision trees into a new algorithm called SVM_DT. Another method combines SVMs with an association rule (AR) based scheme, called SVM_PCPAR. We also provide a method of rule aggregation that uses conceptual clustering to condense a large number of rules into super rules. The results of the experiments for protein structure prediction show not only that the comprehensibility of SVM_DT and SVM_PCPAR is much better than that of SVMs, but also that the test accuracy of these rules is comparable. We believe that SVM_DT and SVM_PCPAR can be used for protein structure prediction and for understanding the prediction as well. The prediction and its interpretation can be used for guiding biological experiments.


computational systems bioinformatics | 2005

Rule clustering and super-rule generation for transmembrane segments prediction

Jieyue He; Bernard Chen; Hae-Jin Hu; Robert W. Harrison; Phang C. Tai; Yisheng Dong; Yi Pan

The explanation of a decision is important for the acceptance of machine learning technology in bioinformatics applications such as protein structure prediction. In past research, we combined SVM with a decision tree to extract rules for understanding transmembrane segment prediction. However, this produced about 20,000 rules, a number too large to interpret. In this paper, a novel approach of rule clustering (SVM_DT_C) for super-rule generation is presented. We use K-means clustering to group the huge number of rules and generate new super-rules. The experimental results show that the super-rules produced by SVM_DT_C can be analyzed manually by a researcher, and that these super-rules are not only new but also achieve very high transmembrane prediction accuracy (exceeding 95%) most of the time.
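The clustering step described here can be illustrated with a small K-means sketch. It assumes each rule has already been encoded as a numeric vector and treats each final centroid as a candidate "super-rule"; both the encoding and that interpretation are assumptions for this example, since the abstract does not describe them. Initial centroids are passed in explicitly so the run is deterministic.

```python
def kmeans(points, centroids, iterations=20):
    """Plain K-means (Lloyd's algorithm) on lists of equal-length numeric vectors.

    Returns (assignment, centroids), where assignment[i] is the cluster index of points[i].
    """
    centroids = [list(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid (squared Euclidean distance).
        new_assignment = [
            min(
                range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])),
            )
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids
```

With thousands of rules, each centroid summarises one cluster, which is what makes manual inspection of the reduced rule set feasible.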


international symposium on bioinformatics research and applications | 2007

A feature selection algorithm based on graph theory and random forests for protein secondary structure prediction

Gulsah Altun; Hae-Jin Hu; Stefan Gremalschi; Robert W. Harrison; Yi Pan

The protein secondary structure prediction problem is one of the widely studied problems in bioinformatics. Predicting the secondary structure of a protein is an important step for determining its tertiary structure and thus its function. This paper explores the protein secondary structure problem using a novel feature selection algorithm combined with a machine learning approach based on random forests. For feature reduction, we propose an algorithm that uses a graph-theoretical approach which finds cliques in the non-position-specific evolutionary profiles of proteins obtained from BLOSUM62. The features selected by this algorithm are then used for condensing the position-specific evolutionary information obtained from PSI-BLAST. Our results show that we are able to save a significant amount of space and time and still achieve high accuracy even when the features of the data are reduced by 25%.
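The clique-finding step can be sketched with the Bron-Kerbosch algorithm on a residue similarity graph. In this example two residues are connected when their substitution score meets a threshold; the threshold, the function names, and the handful of scores in the usage note are illustrative assumptions, and the abstract does not specify how the cliques are turned into selected features.

```python
def similarity_graph(scores, threshold):
    """Build an undirected graph: residues a and b are adjacent if score(a, b) >= threshold."""
    adj = {}
    for (a, b), score in scores.items():
        adj.setdefault(a, set())
        adj.setdefault(b, set())
        if score >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def maximal_cliques(adj):
    """Enumerate all maximal cliques with the basic Bron-Kerbosch recursion (no pivoting)."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(sorted(current))  # nothing can extend it: maximal clique
            return
        for v in sorted(candidates):
            expand(current | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)
```

For example, with toy scores `{("I", "L"): 2, ("I", "V"): 3, ("L", "V"): 1, ("K", "R"): 2, ("I", "K"): -3}` and a threshold of 1, the cliques are the groups I/L/V and K/R. One plausible use is to merge the profile columns of residues in the same clique, which reduces the feature count.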


computational intelligence in bioinformatics and computational biology | 2004

Factoring tertiary classification into binary classification improves neural network for protein secondary structure prediction

Wei Zhong; Gulsah Altun; Hae-Jin Hu; Robert W. Harrison; Phang C. Tai; Yi Pan

Protein secondary structure prediction is one of the most important problems in bioinformatics research. When the traditional tertiary classifier is used in our neural network, 72% accuracy is reached. Since the neural network might not work very well in three-class classification for certain domains, the three-class problem is reduced to six binary class problems for the first time to carry out protein secondary structure prediction. With the combination of six binary classifiers, we experiment with and test several tertiary classifiers. Additionally, three new tertiary classifiers are proposed in this study: MAX_HEC, ONE_TO_ONE_MAX and ONE_TO_ONE_VOTE. ONE_TO_ONE_VOTE outperforms the six other experimental tertiary classifiers in this study. The ONE_TO_ONE_VOTE tertiary classifier with the PSSM encoding scheme obtains 74.02% test accuracy on the RS126 dataset. To the best of our knowledge, this is the best result for the RS126 dataset with the cross-validation method for neural networks. The improvement in prediction accuracy indicates that decomposition of the multiclass problem into several binary class problems may be applied to other areas of computational biology in order to increase the generalization power of neural networks.
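A one-versus-one voting scheme of the kind named in this abstract can be sketched as below. Each pairwise classifier (for example H/E, E/C, C/H) is assumed to return a signed score, where a non-negative value favours the first class of the pair. The tie-break by accumulated margin is an assumption for the example; the abstract does not describe how the voting classifier resolves ties.

```python
from collections import Counter


def one_vs_one_vote(pair_scores):
    """Combine pairwise decisions into a single class label by majority vote.

    pair_scores maps (first_class, second_class) -> signed decision value;
    score >= 0 votes for first_class, otherwise for second_class.
    """
    votes = Counter()
    margins = Counter()
    for (first, second), score in pair_scores.items():
        winner = first if score >= 0 else second
        votes[winner] += 1
        margins[winner] += abs(score)
    # Rank by number of votes, then by summed margin to break ties (assumed rule).
    return max(votes, key=lambda label: (votes[label], margins[label]))
```

For a residue with scores `{("H", "E"): 0.8, ("E", "C"): -0.3, ("C", "H"): -0.5}`, H collects two votes and C one, so the predicted state is H.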


computational intelligence in bioinformatics and computational biology | 2004

Transmembrane segments prediction with support vector machine based on high performance encoding schemes

Hae-Jin Hu; Robert W. Harrison; Phang C. Tai; Yi Pan

A new prediction scheme for transmembrane (TM) segments was developed based on the support vector machine (SVM). To apply this SVM for prediction more efficiently, three optimization processes were performed: encoding scheme, sliding window size and parameter optimization. In the encoding scheme optimization, the position-specific scoring matrix (PSSM) encoding scheme proved to be the most informative one, and the prediction accuracy (Q2) with this scheme reached up to 92%. Based on the performance comparison with previous studies, this PSSM encoding scheme demonstrates the highest prediction accuracy among the common prediction methods, and the accuracy improvement is more than 13%. To verify this scheme, a blind test was performed with the E. coli SecE and E. coli SecY transmembrane proteins, and the result shows a decent match with the SwissProt database information and the TopPred results. However, another blind test with five SecA proteins leaves room for discussion, since it shows TM segments about 8-9 residues long for all five proteins.
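The Q2 measure used here is the per-residue accuracy of a two-state prediction, and segment lengths such as the 8-9 residue SecA segments can be read off the predicted label string. The sketch below shows both computations; the label alphabet (`M` for membrane, `-` for non-membrane) and the function names are assumptions for the example.

```python
def q2_accuracy(predicted, observed):
    """Percentage of residues whose two-state label is predicted correctly."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("label strings must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def segments(labels, state="M"):
    """Return (start, end) index pairs, end exclusive, for each run of `state` labels."""
    runs = []
    start = None
    for i, label in enumerate(labels):
        if label == state and start is None:
            start = i  # a new segment begins
        elif label != state and start is not None:
            runs.append((start, i))  # the segment ended at the previous residue
            start = None
    if start is not None:
        runs.append((start, len(labels)))  # segment reaches the end of the sequence
    return runs
```

Segment lengths follow as `end - start` for each pair, which makes it easy to flag unusually short predicted segments.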

Collaboration

Top Co-Authors

Yi Pan
Georgia State University

Phang C. Tai
Georgia State University

Gulsah Altun
Georgia State University

Bernard Chen
University of Central Arkansas

Chun-Kai Yang
Georgia State University