Publication


Featured research published by Guo-Sheng Han.


Journal of Theoretical Biology | 2014

A two-stage SVM method to predict membrane protein types by incorporating amino acid classifications and physicochemical properties into a general form of Chou's PseAAC

Guo-Sheng Han; Zu-Guo Yu; Vo Anh

Membrane proteins play important roles in many biochemical processes and are also attractive drug targets for various diseases. Elucidating membrane protein types provides clues for understanding the structure and function of proteins. We recently developed a novel system for predicting protein subnuclear localizations. In this paper, we propose a simplified version of that system for predicting membrane protein types directly from protein primary structures; it incorporates amino acid classifications and physicochemical properties into a general form of pseudo-amino acid composition. In this simplified system, we design a two-stage multi-class support vector machine combined with a two-step optimal feature selection process, which proved very effective in our experiments. The performance of the method is evaluated on two benchmark datasets consisting of five types of membrane proteins. The overall prediction accuracies for the five types are 93.25% and 96.61% in the jackknife test and the independent dataset test, respectively. These results indicate that our method is effective and valuable for predicting membrane protein types. A web server for the proposed method is available at http://www.juemengt.com/jcc/memty_page.php.
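The abstract builds on Chou's pseudo-amino acid composition (PseAAC). As a rough illustration only, here is a minimal sketch of the standard type-1 PseAAC vector: 20 amino acid frequencies followed by λ sequence-order correlation factors, all divided by a common normalizer. The choice of a single property (Kyte-Doolittle hydropathy), λ = 3 and weight 0.05 are assumptions for illustration, and `pseaac` is a hypothetical helper. The paper's general form additionally incorporates amino acid classifications and further properties, and its two-stage SVM and feature selection are not reproduced here.

```python
# Kyte-Doolittle hydropathy values, used here as the single (assumed) physicochemical property.
KYTE_DOOLITTLE = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
                  "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
                  "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def standardize(values):
    """Shift and scale a property table to zero mean and unit variance over the 20 amino acids."""
    mean = sum(values.values()) / len(values)
    sd = (sum((v - mean) ** 2 for v in values.values()) / len(values)) ** 0.5
    return {aa: (v - mean) / sd for aa, v in values.items()}


def pseaac(seq, props, lam=3, weight=0.05):
    """Type-1 PseAAC: 20 composition terms plus lam correlation factors (hypothetical helper)."""
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    freqs = [seq.count(aa) / n for aa in AMINO_ACIDS]
    thetas = []
    for j in range(1, lam + 1):
        # mean squared property difference between residues that are j positions apart
        total = sum(
            sum((p[seq[i + j]] - p[seq[i]]) ** 2 for p in props) / len(props)
            for i in range(n - j)
        )
        thetas.append(total / (n - j))
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


props = [standardize(KYTE_DOOLITTLE)]
vec = pseaac("MKTAYIAKQRQISFVKSHFSRQ", props)  # toy sequence, standard residues only
```

The resulting 20 + λ vector sums to 1 and is the kind of fixed-length input an SVM classifier can consume.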


International Journal of Molecular Sciences | 2010

Proper distance metrics for phylogenetic analysis using complete genomes without sequence alignment.

Zu-Guo Yu; Xiao-Wen Zhan; Guo-Sheng Han; Roger Wei Wang; Vo Anh; Ka Hou Chu

A shortcoming of most correlation distance methods that are based on alignment-free composition vectors and were developed for phylogenetic analysis of complete genomes is that the resulting “distances” are not proper distance metrics in the strict mathematical sense. In this paper we propose two new correlation-related distance metrics to replace the old one in our dynamical language approach. Four genome datasets are used to evaluate the effects of this replacement from a biological point of view. We find that the two proper distance metrics yield trees whose topologies are the same as, or similar to, those obtained with the old “distance”, and that agree with the tree of life based on 16S rRNA in most of the basic branches. Hence the two proper correlation-related distance metrics proposed here improve our dynamical language approach for phylogenetic analysis.
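To make "not a proper metric" concrete, the sketch below assumes the old "distance" has the form (1 − C)/2, with C the cosine correlation of two composition vectors (the form used by composition-vector methods such as CVTree; the abstract does not state it). It contrasts that with the chord distance sqrt(2(1 − C)), a textbook correlation-based quantity that equals the Euclidean distance between unit-normalized vectors and therefore satisfies the triangle inequality. The chord distance is not necessarily either of the two metrics the paper proposes, and the helper names are hypothetical.

```python
import math


def cosine_correlation(u, v):
    """Cosine of the angle between two composition vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def old_style_distance(u, v):
    """Assumed old form (1 - C) / 2: symmetric and non-negative, but can violate the triangle inequality."""
    return (1 - cosine_correlation(u, v)) / 2


def chord_distance(u, v):
    """sqrt(2 (1 - C)) = Euclidean distance between unit-normalized vectors, hence a metric on directions."""
    return math.sqrt(max(0.0, 2 * (1 - cosine_correlation(u, v))))


# Three toy vectors at 0, 45 and 90 degrees.
a, b, c = (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)
print(old_style_distance(a, b) + old_style_distance(b, c), old_style_distance(a, c))
print(chord_distance(a, b) + chord_distance(b, c), chord_distance(a, c))
```

With these vectors, the (1 − C)/2 route through b is shorter than the direct value from a to c, which breaks the triangle inequality; the chord distance does not have this defect.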


Computational Biology and Chemistry | 2015

Laplacian normalization and random walk on heterogeneous networks for disease-gene prioritization

Zhi-Qin Zhao; Guo-Sheng Han; Zu-Guo Yu; Jinyan Li

Random walk on heterogeneous networks is a recently emerging approach to effective disease gene prioritization. Laplacian normalization is a technique for normalizing the edge weights of a network. We use this technique to normalize the gene matrix and the phenotype matrix before constructing the heterogeneous network, and we also use the same idea to define the transition matrices of the heterogeneous network. Our method performs remarkably better than existing methods at recovering known gene-phenotype relationships. The Shannon information entropy of the distribution of transition probabilities in our networks is smaller than in networks constructed by existing methods, implying that a higher number of top-ranked genes can be verified as disease genes. In fact, in many cases the most probable gene-phenotype relationships, ranked within the top 3 or top 5 of our gene lists, can be confirmed in the OMIM database. Our algorithms show remarkably superior performance over state-of-the-art algorithms for recovering gene-phenotype relationships. All MATLAB code is available on request by email.
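The pipeline described above can be sketched as follows, assuming the usual symmetric Laplacian normalization D^(-1/2) W D^(-1/2) and a random walk with restart. In this sketch the gene and phenotype networks are normalized separately, joined through a gene-phenotype link matrix, and the combined block matrix is normalized once more to act as the propagation matrix. The restart probability 0.7, the toy networks and all function names are assumptions; the paper's exact transition matrices are not given in the abstract.

```python
import numpy as np


def laplacian_normalize(w):
    """Symmetric normalization D^{-1/2} W D^{-1/2}; isolated nodes keep all-zero rows/columns."""
    w = np.asarray(w, dtype=float)
    deg = w.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])
    return w * inv_sqrt[:, None] * inv_sqrt[None, :]


def heterogeneous_matrix(gene_net, pheno_net, gene_pheno):
    """Normalize each network, join them with the bipartite links, then normalize the whole block."""
    g = laplacian_normalize(gene_net)
    p = laplacian_normalize(pheno_net)
    a = np.asarray(gene_pheno, dtype=float)
    return laplacian_normalize(np.block([[g, a], [a.T, p]]))


def random_walk_with_restart(m, p0, restart=0.7, tol=1e-10, max_iter=10_000):
    """Iterate p <- (1 - restart) * M p + restart * p0 until the L1 change falls below tol."""
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (m @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Toy data: 3 genes in a chain, 2 linked phenotypes; gene 0 -> phenotype 0, gene 2 -> phenotype 1.
gene_net = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
pheno_net = [[0, 1], [1, 0]]
gene_pheno = [[1, 0], [0, 0], [0, 1]]
M = heterogeneous_matrix(gene_net, pheno_net, gene_pheno)
p0 = np.zeros(5)
p0[3] = 1.0  # seed the walk at phenotype 0
scores = random_walk_with_restart(M, p0)
ranking = np.argsort(-scores[:3])  # candidate genes, best first
```

Because the normalized matrix is symmetric with spectral norm at most 1, the factor (1 − restart) makes each iteration a contraction, so the walk converges to a unique score vector.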


Molecular Phylogenetics and Evolution | 2015

Whole-proteome based phylogenetic tree construction with inter-amino-acid distances and the conditional geometric distribution profiles.

Xian-Hua Xie; Zu-Guo Yu; Guo-Sheng Han; Wei-Feng Yang; Vo Anh

There has been growing interest in alignment-free methods for whole-genome comparison and phylogenomic studies. In this study, we propose an alignment-free method for phylogenetic tree construction using whole-proteome sequences. Based on inter-amino-acid distances, we first convert the whole-proteome sequences into inter-amino-acid distance vectors, which we call observed inter-amino-acid distance profiles. We then use conditional geometric distribution profiles (the distributions expected for sequences in which the amino acids are placed randomly and independently) as reference distribution profiles. Finally, the relative deviation between the observed and reference distribution profiles is used to define a simple metric that reflects the phylogenetic relationships between the whole-proteome sequences of different organisms. We name our method IAGDP (inter-amino-acid distances and conditional geometric distribution profiles). We evaluate the method on two data sets: a benchmark data set of 29 genomes used in previously published papers, and a data set of 67 mammal genomes. Our results demonstrate that the new method is useful and efficient.
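A minimal sketch of the IAGDP idea follows, under several assumptions that the abstract does not settle. The sketch takes an inter-amino-acid distance to be the gap between consecutive occurrences of the same amino acid within a protein. If residues were placed randomly and independently with frequency p, that gap would follow a geometric law p(1 − p)^(k−1); here "conditional" is read as conditioning on k ≤ K and renormalizing. The relative deviations (observed − reference)/reference are concatenated, and, as a hypothetical stand-in for the paper's metric, compared with a Euclidean distance. All function names are hypothetical.

```python
import math
from collections import Counter


def gap_counts(proteins, aa, max_k):
    """Count distances k = 1..max_k between consecutive occurrences of aa inside each protein."""
    counts = Counter()
    for seq in proteins:
        pos = [i for i, c in enumerate(seq) if c == aa]
        counts.update(b - a for a, b in zip(pos, pos[1:]) if b - a <= max_k)
    return counts


def conditional_geometric(p, max_k):
    """Geometric law P(k) = p (1 - p)^(k - 1), conditioned on k <= max_k (renormalized)."""
    raw = [p * (1 - p) ** (k - 1) for k in range(1, max_k + 1)]
    total = sum(raw)
    return [x / total for x in raw]


def deviation_profile(proteins, alphabet, max_k):
    """Concatenate (observed - reference) / reference over all amino acids and distances."""
    n = sum(len(s) for s in proteins)
    vec = []
    for aa in alphabet:
        p = sum(s.count(aa) for s in proteins) / n
        if not 0 < p < 1:
            vec.extend([0.0] * max_k)  # absent (or sole) residue: no usable profile
            continue
        counts = gap_counts(proteins, aa, max_k)
        total = sum(counts.values())
        for k, ref in enumerate(conditional_geometric(p, max_k), start=1):
            obs = counts[k] / total if total else 0.0
            vec.append((obs - ref) / ref)
    return vec


def proteome_distance(a, b, alphabet="ACDEFGHIKLMNPQRSTVWY", max_k=10):
    """Euclidean distance between deviation profiles (an assumed stand-in for the paper's metric)."""
    va = deviation_profile(a, alphabet, max_k)
    vb = deviation_profile(b, alphabet, max_k)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))


# Toy "proteomes" over a reduced alphabet, for illustration only.
print(proteome_distance(["ACDAACD", "AACCDD"], ["ADADAD", "CCCAAA"], alphabet="ACD", max_k=5))
```

A pairwise distance matrix built this way could then be passed to any distance-based tree method such as neighbor joining.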


Fuzzy Systems and Knowledge Discovery | 2009

Distinguishing Coding from Non-coding Sequences in a Prokaryote Complete Genome Based on the Global Descriptor

Guo-Sheng Han; Zu-Guo Yu; Vo Anh; Raymond H. Chan

Recognition of coding sequences in a complete genome is an important problem in DNA sequence analysis. Their rapid and accurate recognition contributes to a range of related research and applications. In this paper, we aim to distinguish coding sequences from non-coding sequences in a prokaryote complete genome. We select a data set of 51 available bacterial genomes. We then apply the global descriptor method to the coding/non-coding primary sequences and obtain 36 parameters for each sequence. These parameters define feature spaces whose points represent the coding/non-coding sequences in our data set. To evaluate the method, we apply Fisher's linear discriminant algorithm and obtain relatively satisfactory discriminant accuracies. The average accuracies of the global descriptor method (36 parameters) for the training and test sets are 97.81% and 97.49%, respectively. Finally, we compare the method with Z curve methods on the same data set. When our method is combined with the Z curve method, higher accuracies are obtained. This good performance indicates that the global descriptor method may complement existing methods for the gene finding problem.
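The evaluation step above relies on Fisher's linear discriminant. The following minimal two-class sketch projects feature vectors onto w = S_w^(-1)(m1 − m0) and thresholds at the midpoint of the projected class means (the midpoint rule is an assumption). The 36 global-descriptor parameters are not reproduced; the 2-D points are hypothetical toy data standing in for non-coding (class 0) and coding (class 1) sequences.

```python
import numpy as np


def fisher_lda(x0, x1):
    """Return the Fisher direction and a midpoint threshold for two classes of row vectors."""
    m0, m1 = x0.mean(axis=0), x1.mean(axis=0)
    # within-class scatter = sum of per-class scatter matrices (n * biased covariance)
    s_w = (np.cov(x0, rowvar=False, bias=True) * len(x0)
           + np.cov(x1, rowvar=False, bias=True) * len(x1))
    w = np.linalg.solve(s_w, m1 - m0)
    threshold = 0.5 * (w @ m0 + w @ m1)
    return w, threshold


def predict(x, w, threshold):
    """Assign class 1 when the projection exceeds the threshold, else class 0."""
    return (x @ w > threshold).astype(int)


# Hypothetical toy features (not real sequence descriptors).
x0 = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
x1 = np.array([[3, 3], [4, 3], [3, 4], [4, 4]])
w, threshold = fisher_lda(x0, x1)
print(predict(np.vstack([x0, x1]), w, threshold))
```

Solving the linear system avoids forming S_w^(-1) explicitly; with many correlated descriptors, a small ridge term on S_w is a common safeguard against singularity.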


Current Bioinformatics | 2014

Secondary Structure Element Alignment Kernel Method for Prediction of Protein Structural Classes

Guo-Sheng Han; Zu-Guo Yu; Vo Anh

In this paper, we aim to predict protein structural classes for low-homology data sets based on predicted secondary structures. We propose a new and simple kernel method, named SSEAKSVM, to predict protein structural classes. The secondary structures of all protein sequences are obtained with the tool PSIPRED, and a linear kernel based on secondary structure element alignment scores is then constructed to train a support vector machine classifier without parameter tuning. SSEAKSVM was evaluated on two low-homology datasets, 25PDB and 1189, with sequence homology of 25% and 40%, respectively. The jackknife test is used to evaluate our method and compare it with other existing methods. The overall accuracies on these two data sets are 86.3% and 84.5%, respectively, which are higher than those obtained by other existing methods. In particular, compared with other methods, our method achieves higher accuracies (88.1% and 88.5%) in differentiating the α + β class and the α/β class. This suggests that our method is valuable for predicting protein structural classes, particularly for low-homology protein sequences. The source code of the method in this paper can be downloaded at http://math.xtu.edu.cn/myphp/math/research/source/SSEAK_source_code.rar.
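One plausible reading of "a linear kernel on the basis of secondary structure element alignment scores" is sketched below. Each per-residue H/E/C prediction is collapsed into one symbol per element. Every protein is then represented by its vector of global alignment scores against all training proteins, and the kernel is the matrix of dot products of those vectors, which is positive semidefinite by construction. The scoring scheme (match +1, mismatch −1, gap −1), the toy strings and the function names are hypothetical; the authors' own source code at the URL above is the authoritative version.

```python
import itertools

import numpy as np


def collapse_to_elements(ss):
    """Turn a per-residue H/E/C string into one symbol per secondary structure element."""
    return "".join(symbol for symbol, _ in itertools.groupby(ss))


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score with a linear gap penalty (two-row dynamic programming)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def alignment_kernel(ss_strings):
    """Linear kernel on vectors of alignment scores against every training protein."""
    elements = [collapse_to_elements(s) for s in ss_strings]
    features = np.array([[global_alignment_score(x, y) for y in elements] for x in elements],
                        dtype=float)
    return features @ features.T


# Hypothetical PSIPRED-style predictions for three proteins.
ss = ["HHHHCCEEEECC", "CCEEEECCHHHH", "HHHHHHCCHHHH"]
K = alignment_kernel(ss)
```

A matrix like `K` can be passed directly to a classifier that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.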


Fuzzy Systems and Knowledge Discovery | 2013

Effects of amino acid classification on prediction of protein structural classes

Zhi Mao; Guo-Sheng Han; Ting-Ting Wang

We use the Lempel-Ziv complexity method to investigate the effects of amino acid classification on the prediction of protein structural classes. First, we find that different amino acid classifications contribute differently to the prediction of protein structural classes, and that some classifications even perform better than using no classification at all. This prompted us to examine whether combining amino acid classifications can further improve prediction performance. Finally, we convert each Lempel-Ziv complexity distance matrix into a novel kernel matrix and use Bayesian multiple kernel learning to combine all the kernels. Our method is tested on four benchmark datasets and consistently outperforms previous methods. This suggests that our proposed method is valuable for predicting protein structural classes.
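The core steps above (reduce the alphabet by an amino acid classification, compute Lempel-Ziv complexities, turn pairwise distances into a kernel) can be sketched as below. The three-class reduction, the normalized distance max(c(SQ) − c(S), c(QS) − c(Q)) / max(c(SQ), c(QS)) and the Gaussian-style transform exp(−d²/2σ²) are all assumptions, since the abstract does not give the paper's formulas. That transform is not guaranteed to be positive semidefinite, and the Bayesian multiple kernel learning step is not sketched.

```python
import math

# Hypothetical three-class reduction: hydrophobic (h), polar (p), charged (c).
HYPOTHETICAL_CLASSES = {"h": "ACFILMVWY", "p": "GHNPQST", "c": "DEKR"}
REDUCE = {aa: label for label, members in HYPOTHETICAL_CLASSES.items() for aa in members}


def reduce_sequence(seq):
    """Map each residue to its class label; unknown residues become 'x'."""
    return "".join(REDUCE.get(aa, "x") for aa in seq)


def lz76_complexity(s):
    """Number of phrases in the Lempel-Ziv (1976) exhaustive-history parsing of s."""
    i, c, n = 0, 0, len(s)
    while i < n:
        length = 1
        # grow the phrase while it can still be copied from the text before its last symbol
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        c += 1
        i += length
    return c


def lz_distance(s, q):
    """Assumed normalized LZ distance; symmetric by construction."""
    c_s, c_q = lz76_complexity(s), lz76_complexity(q)
    c_sq, c_qs = lz76_complexity(s + q), lz76_complexity(q + s)
    return max(c_sq - c_s, c_qs - c_q) / max(c_sq, c_qs)


def lz_kernel(seqs, sigma=0.5):
    """Gaussian-style transform of pairwise distances on reduced sequences; diagonal fixed at 1."""
    reduced = [reduce_sequence(s) for s in seqs]
    n = len(reduced)
    k = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = lz_distance(reduced[i], reduced[j])
            k[i][j] = k[j][i] = math.exp(-d * d / (2 * sigma * sigma))
    return k


kernel = lz_kernel(["MKTAYIAKQR", "MKTAYLAKQR", "GGSSGGSSPP"])  # toy sequences
```

Each candidate classification yields its own kernel in this scheme, which is what makes a multiple kernel learning step a natural way to combine them.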


International Conference on Natural Computation | 2012

Some comparison on whole-proteome phylogeny of large dsDNA viruses based on dynamical language approach and feature frequency profiles method

Li-Qian Zhou; Zu-Guo Yu; Guo-Sheng Han; Guang-ming Zhou; Desheng Wang

There has been growing interest in alignment-free methods for phylogenetic analysis using complete genome data. Among them, the CVTree method, the feature frequency profiles method and the dynamical language approach have been used to investigate the whole-proteome phylogeny of large dsDNA viruses. Using the data set of large dsDNA viruses from Gao and Qi (BMC Evol. Biol. 2007), the phylogenetic results based on the CVTree method and the dynamical language approach were compared in Yu et al. (BMC Evol. Biol. 2010). In this paper, we first apply the dynamical language approach to the data set of large dsDNA viruses from Wu et al. (Proc. Natl. Acad. Sci. USA 2009) and compare our phylogenetic results with those based on the feature frequency profiles method. We then construct the whole-proteome phylogeny of the larger data set obtained by combining these two data sets. According to the report of the International Committee on Taxonomy of Viruses (ICTV), the trees from our analyses are in good agreement with the latest classification of large dsDNA viruses.


Current Bioinformatics | 2016

Protein Folding Kinetic Order Prediction from Amino Acid Sequence Based on Horizontal Visibility Network

Zhi-Qin Zhao; Zu-Guo Yu; Vo Anh; Jing-Yang Wu; Guo-Sheng Han


EasyChair Preprints | 2018

A new method for identification of pre-microRNAs based on hybrid features

Yuan-Lin Ma; Zu-Guo Yu; Guo-Sheng Han; Vo Anh

Collaboration



Top Co-Authors


Vo Anh

Queensland University of Technology


Bo Li

Xiangtan University
