Guoji Zhang

South China University of Technology

Publication

Featured research published by Guoji Zhang.

Pattern Recognition | 2012

Semi-supervised classification based on random subspace dimensionality reduction

Guoxian Yu; Guoji Zhang; Carlotta Domeniconi; Zhiwen Yu; Jane You

Graph structure is vital to graph based semi-supervised learning. However, the problem of constructing a graph that reflects the underlying data distribution has been seldom investigated in semi-supervised learning, especially for high dimensional data. In this paper, we focus on graph construction for semi-supervised learning and propose a novel method called Semi-Supervised Classification based on Random Subspace Dimensionality Reduction, SSC-RSDR in short. Different from traditional methods that perform graph-based dimensionality reduction and classification in the original space, SSC-RSDR performs these tasks in subspaces. More specifically, SSC-RSDR generates several random subspaces of the original space and applies graph-based semi-supervised dimensionality reduction in these random subspaces. It then constructs graphs in these processed random subspaces and trains semi-supervised classifiers on the graphs. Finally, it combines the resulting base classifiers into an ensemble classifier. Experimental results on face recognition tasks demonstrate that SSC-RSDR not only has superior recognition performance with respect to competitive methods, but also is robust against a wide range of values of input parameters.
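The random-subspace step described in the abstract above can be sketched as follows. This is only a minimal illustration of drawing random feature subsets and restricting the data to them; the function names, the `seed` parameter, and the choice of sampling without replacement are assumptions, and the graph-based dimensionality reduction and classifier training from the paper are not reproduced here.

```python
import random

def random_subspaces(n_features, subspace_dim, n_subspaces, seed=0):
    """Draw n_subspaces random feature subsets, each of size subspace_dim.

    Sampling is without replacement inside a subset (an assumption);
    different subsets may overlap.
    """
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_features), subspace_dim))
            for _ in range(n_subspaces)]

def project(X, idx):
    """Restrict every sample (a list of feature values) to the features in idx."""
    return [[row[j] for j in idx] for row in X]
```

Each projected data set would then be passed to whatever per-subspace learner is used, and the resulting base classifiers combined into an ensemble.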

IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2013

Protein Function Prediction using Multi-label Ensemble Classification

Guoxian Yu; Huzefa Rangwala; Carlotta Domeniconi; Guoji Zhang; Zhiwen Yu

High-throughput experimental techniques produce several kinds of heterogeneous proteomic and genomic data sets. To computationally annotate proteins, it is necessary and promising to integrate these heterogeneous data sources. Some methods transform these data sources into different kernels or feature representations. Next, these kernels are linearly (or nonlinearly) combined into a composite kernel. The composite kernel is utilized to develop a predictive model to infer the function of proteins. A protein can have multiple roles and functions (or labels). Therefore, multilabel learning methods are also adapted for protein function prediction. We develop a transductive multilabel classifier (TMC) to predict multiple functions of proteins using several unlabeled proteins. We also propose a method called transductive multilabel ensemble classifier (TMEC) for integrating the different data sources using an ensemble approach. The TMEC trains a graph-based multilabel classifier on each single data source, and then combines the predictions of the individual classifiers. We use a directed birelational graph to capture the relationships between pairs of proteins, between pairs of functions, and between proteins and functions. We evaluate the effectiveness of the TMC and TMEC to predict the functions of proteins on three benchmarks. We show that our approaches perform better than recently proposed protein function prediction methods on composite and multiple kernels. The code, data sets used in this paper and supplemental material are available at https://sites.google.com/site/guoxian85/tmec.

Knowledge Discovery and Data Mining | 2012

Transductive multi-label ensemble classification for protein function prediction

Guoxian Yu; Carlotta Domeniconi; Huzefa Rangwala; Guoji Zhang; Zhiwen Yu

Advances in biotechnology have made available multitudes of heterogeneous proteomic and genomic data. Integrating these heterogeneous data sources, to automatically infer the function of proteins, is a fundamental challenge in computational biology. Several approaches represent each data source with a kernel (similarity) function. The resulting kernels are then integrated to determine a composite kernel, which is used for developing a function prediction model. Proteins are also found to have multiple roles and functions. As such, several approaches cast the protein function prediction problem within a multi-label learning framework. In our work we develop an approach that takes advantage of several unlabeled proteins, along with multiple data sources and multiple functions of proteins. We develop a graph-based transductive multi-label classifier (TMC) that is evaluated on a composite kernel, and also propose a method for data integration using the ensemble framework, called transductive multi-label ensemble classifier (TMEC). The TMEC approach trains a graph-based multi-label classifier for each individual kernel, and then combines the predictions of the individual models. Our contribution is the use of a bi-relational directed graph that captures relationships between pairs of proteins, between pairs of functions, and between proteins and functions. We evaluate the ability of TMC and TMEC to predict the functions of proteins by using two yeast datasets. We show that our approach performs better than recently proposed protein function prediction methods on composite and multiple kernels.

Applied Soft Computing | 2012

Semi-supervised ensemble classification in subspaces

Guoxian Yu; Guoji Zhang; Zhiwen Yu; Carlotta Domeniconi; Jane You; Guoqiang Han

Graph-based semi-supervised classification depends on a well-structured graph. However, it is difficult to construct a graph that faithfully reflects the underlying structure of the data distribution, especially for data with a high dimensional representation. In this paper, we focus on graph construction and propose a novel method called semi-supervised ensemble classification in subspaces, SSEC in short. Unlike traditional methods that execute graph-based semi-supervised classification in the original space, SSEC performs semi-supervised linear classification in subspaces. More specifically, SSEC first divides the original feature space into several disjoint feature subspaces. Then, it constructs a neighborhood graph in each subspace, and trains a semi-supervised linear classifier on this graph, which will serve as the base classifier in an ensemble. Finally, SSEC combines the obtained base classifiers into an ensemble classifier using the majority-voting rule. Experimental results on facial image classification show that SSEC not only has higher classification accuracy than the competitive methods, but is also effective over a wide range of input parameter values.
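The partition-and-vote skeleton of the abstract above can be sketched as follows. This is a hedged illustration only: the base learner is a plain nearest-centroid classifier used as a stand-in, not the graph-based semi-supervised linear classifier of the paper, and all function names and the interleaved way of splitting features are assumptions.

```python
from collections import Counter

def split_features(n_features, n_subspaces):
    """Partition feature indices into disjoint, near-equal groups (interleaved)."""
    return [list(range(i, n_features, n_subspaces)) for i in range(n_subspaces)]

def fit_centroids(X, y, idx):
    """Stand-in base learner: per-class mean over the features in idx."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(idx))
        for k, j in enumerate(idx):
            acc[k] += row[j]
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}

def predict_centroid(centroids, row, idx):
    """Assign row to the class whose centroid is closest within the subspace."""
    def dist(c):
        return sum((row[j] - m) ** 2 for j, m in zip(idx, centroids[c]))
    return min(centroids, key=dist)

def ensemble_predict(X, y, row, n_subspaces):
    """Train one base classifier per disjoint subspace and take a majority vote."""
    subspaces = split_features(len(row), n_subspaces)
    votes = [predict_centroid(fit_centroids(X, y, idx), row, idx)
             for idx in subspaces]
    return Counter(votes).most_common(1)[0][0]
```

Because the subspaces are disjoint, each base classifier sees a different part of the feature vector, which is what makes the votes worth combining.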

Knowledge and Information Systems | 2015

Semi-supervised classification based on subspace sparse representation

Guoxian Yu; Guoji Zhang; Zili Zhang; Zhiwen Yu; Lin Deng

The graph plays an important role in graph-based semi-supervised classification. However, due to noisy and redundant features in high-dimensional data, it is not trivial to construct a well-structured graph on high-dimensional samples. In this paper, we take advantage of sparse representation in random subspaces for graph construction and propose a method called Semi-Supervised Classification based on Subspace Sparse Representation, SSC-SSR in short. SSC-SSR first generates several random subspaces from the original space and then seeks sparse representation coefficients in these subspaces. Next, it trains semi-supervised linear classifiers on graphs that are constructed from these coefficients. Finally, it combines these classifiers into an ensemble classifier by solving a linear regression problem. Unlike traditional graph-based semi-supervised classification methods, the graphs of SSC-SSR are data-driven instead of manually specified in advance. An empirical study on face image classification tasks demonstrates that SSC-SSR not only has superior recognition performance with respect to competitive methods, but is also effective over a wide range of input parameter values.

IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2014

Protein function prediction with incomplete annotations

Guoxian Yu; Huzefa Rangwala; Carlotta Domeniconi; Guoji Zhang; Zhiwen Yu

Automated protein function prediction is one of the grand challenges in computational biology. Multi-label learning is widely used to predict functions of proteins. Most multi-label learning methods make predictions for unlabeled proteins under the assumption that the labeled proteins are completely annotated, i.e., without any missing functions. However, in practice, we may have only a subset of the ground-truth functions for a protein, and whether the protein has other functions is unknown. To predict protein functions with incomplete annotations, we propose a Protein Function Prediction method with Weak-label Learning (ProWL) and its variant ProWL-IF. Both ProWL and ProWL-IF can replenish the missing functions of proteins. In addition, ProWL-IF makes use of the knowledge that a protein cannot have certain functions, which can further boost the performance of protein function prediction. Our experimental results on protein-protein interaction networks and gene expression benchmarks validate the effectiveness of both ProWL and ProWL-IF.

Neurocomputing | 2012

Local and global structure preserving based feature selection

Yazhou Ren; Guoji Zhang; Guoxian Yu; Xuan Li

Feature selection is of great importance in data mining tasks, especially for exploring high dimensional data. Laplacian Score, a recently proposed feature selection method, makes use of the local manifold structure of samples to select features and achieves good performance. However, it ignores the global structure of samples, and the selected features are highly redundant. To address these issues, we propose a feature selection method based on local and global structure preserving, LGFS in short. LGFS first uses two graphs, a nearest neighborhood graph and a farthest neighborhood graph, to describe the underlying local and global structure of samples, respectively. It then defines a criterion that favors features that preserve local and global structure well. To remove redundancy among the selected features, Extended LGFS (E-LGFS) is introduced by taking advantage of normalized mutual information to measure the dependency between a pair of features. We conduct extensive experiments on two artificial data sets, six UCI data sets and two publicly available face databases to evaluate LGFS and E-LGFS. The experimental results show that our methods can achieve higher accuracies than the other unsupervised methods compared.

Ambient Intelligence | 2015

Image encryption algorithm with compound chaotic maps

Xuan Li; Guoji Zhang; Xia-Yan Zhang

This paper proposes a novel image encryption scheme based on two even-symmetric chaotic maps and a skew tent chaotic map. In the permutation process, a P-box produced by sorting an even-symmetric chaotic sequence is applied to shuffle the positions of all image pixels. In the diffusion process, both the even-symmetric chaotic maps and the skew tent map are used to generate the keystream. The pixels in the permuted image determine which of the two even-symmetric chaotic maps is iterated for the next byte in the keystream each time, so the keystream is closely related to the plain image. The performance and security of the proposed method are evaluated thoroughly through histogram, correlation of adjacent pixels, information entropy and sensitivity analysis. Results are encouraging and suggest that the scheme is reliable enough to be adopted for secure image communication applications.
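The sort-based permutation step mentioned in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions: the chaotic sequence here comes from a skew tent map because the abstract does not define the even-symmetric map used in the paper, the function names and parameter values are hypothetical, and the diffusion stage is omitted. It is not a secure cipher.

```python
def skew_tent_sequence(x0, p, n):
    """Iterate the skew tent map n times from x0, with 0 < p < 1."""
    xs, x = [], x0
    for _ in range(n):
        x = x / p if x <= p else (1.0 - x) / (1.0 - p)
        xs.append(x)
    return xs

def make_pbox(x0, p, n):
    """P-box: the index order that sorts the chaotic sequence."""
    seq = skew_tent_sequence(x0, p, n)
    return sorted(range(n), key=lambda i: seq[i])

def permute(pixels, pbox):
    """Shuffle pixel positions: output position k takes pixel pbox[k]."""
    return [pixels[i] for i in pbox]

def unpermute(shuffled, pbox):
    """Invert permute() given the same P-box."""
    out = [0] * len(pbox)
    for pos, i in enumerate(pbox):
        out[i] = shuffled[pos]
    return out
```

The pair `(x0, p)` plays the role of a key in this sketch: the same values regenerate the same P-box, so the receiver can undo the shuffle with `unpermute`.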

International Conference on Data Mining | 2013

Weighted-Object Ensemble Clustering

Yazhou Ren; Carlotta Domeniconi; Guoji Zhang; Guoxian Yu

Ensemble clustering, also known as consensus clustering, aims to generate a stable and robust clustering through the consolidation of multiple base clusterings. In recent years many ensemble clustering methods have been proposed, most of which treat each clustering and each object as equally important. Some approaches make use of weights associated with clusters, or with clusterings, when assembling the different base clusterings. Boosting algorithms developed for classification have also led to the idea of considering weighted objects during the clustering process. However, not much effort has been put towards incorporating weighted objects into the consensus process. To fill this gap, in this paper we propose an approach called Weighted-Object Ensemble Clustering (WOEC). We first estimate how difficult it is to cluster an object by constructing the co-association matrix that summarizes the base clustering results, and we then embed the corresponding information as weights associated to objects. We propose three different consensus techniques to leverage the weighted objects. All three reduce the ensemble clustering problem to a graph partitioning one. We present extensive experimental results which demonstrate that our WOEC approach outperforms state-of-the-art consensus clustering methods and is robust to parameter settings.
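The co-association matrix mentioned in the abstract above can be sketched as follows. The matrix itself is the standard construction (the fraction of base clusterings that place two objects in the same cluster). The weighting function is only one plausible way to turn it into a per-object difficulty score and is an assumption, not necessarily the formula used in the paper; all names are hypothetical.

```python
def co_association(clusterings):
    """A[i][j] = fraction of base clusterings that put objects i and j together.

    clusterings is a list of label lists, one per base clustering.
    """
    n = len(clusterings[0])
    m = len(clusterings)
    return [[sum(c[i] == c[j] for c in clusterings) / m for j in range(n)]
            for i in range(n)]

def difficulty_weights(A):
    """Assumed difficulty score: objects whose co-association values sit near
    0.5 (the base clusterings disagree about them) receive larger weights."""
    n = len(A)
    return [4.0 / n * sum(a * (1.0 - a) for a in A[i]) for i in range(n)]
```

With three base clusterings `[[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]`, the second object is the one the clusterings disagree on, so it receives the largest weight under this score.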

Knowledge and Information Systems | 2017

Weighted-object ensemble clustering: methods and analysis

Yazhou Ren; Carlotta Domeniconi; Guoji Zhang; Guoxian Yu

Ensemble clustering has attracted increasing attention in recent years. Its goal is to combine multiple base clusterings into a single consensus clustering of increased quality. Most of the existing ensemble clustering methods treat each base clustering and each object as equally important, while some approaches make use of weights associated with clusters, or with clusterings, when assembling the different base clusterings. Boosting algorithms developed for classification have led to the idea of considering weighted objects during the clustering process. However, not much effort has been put toward incorporating weighted objects into the consensus process. To fill this gap, in this paper, we propose a framework called Weighted-Object Ensemble Clustering (WOEC). We first estimate how difficult it is to cluster an object by constructing the co-association matrix that summarizes the base clustering results, and we then embed the corresponding information as weights associated with objects. We propose three different consensus techniques to leverage the weighted objects. All three reduce the ensemble clustering problem to a graph partitioning one. We experimentally demonstrate the gain in performance that our WOEC methodology achieves with respect to state-of-the-art ensemble clustering methods, as well as its stability and robustness.
