Gongde Guo
Fujian Normal University
Publications
Featured research published by Gongde Guo.
Cooperative Information Systems | 2003
Gongde Guo; Hui Wang; David Bell; Yaxin Bi; Kieran Greer
The k-Nearest-Neighbours (kNN) method is simple but effective for classification. Its major drawbacks are (1) low efficiency: being a lazy learning method, it is prohibitive in many applications such as dynamic web mining over a large repository; and (2) its dependency on the selection of a "good value" for k. In this paper, we propose a novel kNN-type method for classification that is aimed at overcoming these shortcomings. Our method constructs a kNN model for the data, which replaces the data itself as the basis of classification. The value of k is automatically determined, varies across data sets, and is optimal in terms of classification accuracy. The construction of the model reduces the dependency on k and makes classification faster. Experiments were carried out on public datasets from the UCI machine learning repository to test our method. The experimental results show that the kNN-based model compares well with C5.0 and kNN in terms of classification accuracy, but is more efficient than the standard kNN.
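The model-construction idea can be sketched in a few lines. The following is a hypothetical simplification, not the paper's exact algorithm: each training point is credited with the largest neighbourhood containing only same-class points, representatives are then picked greedily to cover the data, and classification uses only the representatives instead of the full training set.

```python
import math

def build_knn_model(X, y):
    """Greedy sketch of kNN model construction: keep the points whose
    pure same-class neighbourhoods cover the most uncovered data."""
    n = len(X)
    coverage = []
    for i in range(n):
        # Radius up to the nearest point of a different class.
        d_other = min((math.dist(X[i], X[j]) for j in range(n) if y[j] != y[i]),
                      default=math.inf)
        covered = {j for j in range(n)
                   if y[j] == y[i] and math.dist(X[i], X[j]) < d_other}
        coverage.append(covered)
    reps, uncovered = [], set(range(n))
    while uncovered:
        best = max(uncovered, key=lambda i: len(coverage[i] & uncovered))
        reps.append(best)
        uncovered -= coverage[best] | {best}
    return reps

def predict(X, y, reps, query):
    # Classify by the class of the nearest representative.
    best = min(reps, key=lambda i: math.dist(X[i], query))
    return y[best]

X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = ['a', 'a', 'a', 'b', 'b', 'b']
reps = build_knn_model(X, y)
print(predict(X, y, reps, (0.5, 0.5)))  # 'a'
print(predict(X, y, reps, (5.5, 5.5)))  # 'b'
```

Because only the representatives are kept, classification touches far fewer points than standard kNN, which is the efficiency gain the abstract describes.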
Soft Computing | 2006
Gongde Guo; Hui Wang; David Bell; Yaxin Bi; Kieran Greer
An investigation is conducted on two well-known similarity-based learning approaches to text categorization: the k-nearest neighbors (kNN) classifier and the Rocchio classifier. After identifying the weaknesses and strengths of each technique, a new classifier called the kNN model-based classifier (kNN Model) is proposed. It combines the strengths of both kNN and Rocchio. A text categorization prototype, which implements kNN Model along with kNN and Rocchio, is described. An experimental evaluation of the different methods is carried out on two common document corpora: the 20-newsgroup collection and the ModApte version of the Reuters-21578 collection of news stories. The experimental results show that the proposed kNN model-based method outperforms the kNN and Rocchio classifiers, and is therefore a good alternative to kNN and Rocchio in some application areas.
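The Rocchio side of this comparison reduces to one centroid per class. The sketch below is a minimal illustration with raw term frequencies and cosine similarity; production text classifiers would use TF-IDF weighting, and the toy documents and labels are invented for the example.

```python
import math
from collections import Counter

def rocchio_train(docs, labels):
    """Rocchio classifier: build one centroid (mean term-frequency
    vector) per class from the training documents."""
    centroids = {}
    for label in set(labels):
        class_docs = [d for d, l in zip(docs, labels) if l == label]
        total = Counter()
        for d in class_docs:
            total.update(d.split())
        centroids[label] = {t: c / len(class_docs) for t, c in total.items()}
    return centroids

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rocchio_predict(centroids, doc):
    # Assign the document to the most similar class centroid.
    vec = Counter(doc.split())
    return max(centroids, key=lambda l: cosine(vec, centroids[l]))

docs = ["stock market trade", "market shares stock",
        "goal match football", "football match win"]
labels = ["finance", "finance", "sport", "sport"]
c = rocchio_train(docs, labels)
print(rocchio_predict(c, "stock trade win"))  # finance
```

Rocchio's strength is that a handful of centroids summarise the whole corpus; its weakness, which kNN Model addresses, is that one centroid per class cannot represent multi-modal classes.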
Conference on Intelligent Text Processing and Computational Linguistics | 2004
Gongde Guo; Hui Wang; David Bell; Yaxin Bi; Kieran Greer
An investigation has been conducted on two well-known similarity-based learning approaches to text categorization: the k-nearest neighbor (kNN) classifier and the Rocchio classifier. After identifying the weaknesses and strengths of each technique, we propose a new classifier, called the kNN model-based classifier, that unifies the strengths of kNN and Rocchio and adapts them to the characteristics of text categorization problems.
Modeling Decisions for Artificial Intelligence | 2004
Yaxin Bi; David Bell; Hui Wang; Gongde Guo; Kieran Greer
In this paper, we present an investigation into the combination of four different classification methods for text categorization using Dempster's rule of combination. These methods include the Support Vector Machine (SVM), kNN (nearest neighbours), the kNN model-based approach (kNNM), and Rocchio. We first present an approach for effectively combining the different classification methods. We then apply these methods to the 20-newsgroup benchmark collection, individually and in combination. Our experimental results show that the best combination of the different classifiers on the 10 groups of the benchmark data achieves 91.07% classification accuracy, which is on average 2.68% better than that of the best individual method, SVM.
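The combination rule at the heart of this approach fits in a few lines. This is a generic implementation of Dempster's rule over mass functions whose focal elements are frozensets of class labels; the example mass values are invented, and the paper's own evidence structure is richer than this.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule: multiply masses of all focal-element pairs,
    keep non-empty intersections, renormalise away the conflict."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    k = 1.0 - conflict  # total non-conflicting mass
    return {s: w / k for s, w in combined.items()}

# Two classifiers' outputs expressed as evidence over classes {c1, c2};
# mass on the whole frame {c1, c2} represents the classifier's ignorance.
m1 = {frozenset({'c1'}): 0.7, frozenset({'c1', 'c2'}): 0.3}
m2 = {frozenset({'c1'}): 0.6, frozenset({'c2'}): 0.2,
      frozenset({'c1', 'c2'}): 0.2}
m = dempster_combine(m1, m2)
print(m[frozenset({'c1'})])  # ~0.86: both pieces of evidence reinforce c1
```

Because both classifiers lean towards c1, the combined mass on c1 exceeds either individual mass, which is the effect the ensemble exploits.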
International Journal of Machine Learning and Cybernetics | 2012
Gongde Guo; Si Chen; Lifei Chen
Traditional clustering algorithms are often defeated by high dimensionality. To find clusters hiding in different subspaces, soft subspace clustering has become an effective means of dealing with high-dimensional data. However, most existing soft subspace clustering algorithms contain parameters that are difficult for users to determine in real-world applications. A new soft subspace clustering algorithm named SC-IFWSA is proposed. It uses an improved feature-weight self-adjustment mechanism, IFWSA, to adaptively update the weights of all features for each cluster according to the importance of the features to clustering quality, and it does not require users to set any parameter values. In addition, SC-IFWSA overcomes a weakness of the traditional FWSA mechanism, which may fail to calculate feature weights in some particular cases. Experimental results on ten data sets demonstrate the effectiveness and feasibility of the proposed method in comparison with related approaches.
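The core idea, per-cluster feature weights adjusted from the data rather than set by the user, can be sketched as a k-means variant. The weight update below (inverse within-cluster dispersion, normalised per cluster) is a simplified stand-in for the paper's IFWSA rule, and the naive first-k initialisation is an assumption for the demo.

```python
def soft_subspace_kmeans(X, k, iters=10):
    """k-means with one weight vector per cluster: after each update,
    features with smaller within-cluster dispersion get larger weights."""
    dims = len(X[0])
    centers = [list(X[i]) for i in range(k)]      # naive init: first k points
    weights = [[1.0 / dims] * dims for _ in range(k)]
    assign = [0] * len(X)
    for _ in range(iters):
        # Assign each point to the cluster with smallest weighted distance.
        for idx, x in enumerate(X):
            assign[idx] = min(range(k), key=lambda c: sum(
                weights[c][d] * (x[d] - centers[c][d]) ** 2
                for d in range(dims)))
        # Update centres, then self-adjust the feature weights.
        for c in range(k):
            members = [x for x, a in zip(X, assign) if a == c]
            if not members:
                continue
            centers[c] = [sum(x[d] for x in members) / len(members)
                          for d in range(dims)]
            disp = [sum((x[d] - centers[c][d]) ** 2 for x in members) + 1e-9
                    for d in range(dims)]
            inv = [1.0 / v for v in disp]
            weights[c] = [v / sum(inv) for v in inv]
    return assign, weights

# Two clusters separated on feature 0; feature 1 is noise, so each
# cluster's learned weight on feature 0 should dominate.
X = [(0.0, 0.0), (10.0, 1.0), (0.0, 5.0), (10.0, 6.0), (0.1, 2.0), (10.1, 3.0)]
assign, weights = soft_subspace_kmeans(X, 2)
```

The noisy feature's weight collapses towards zero, which is exactly how soft subspace clustering lets each cluster live in its own subspace.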
Applied Artificial Intelligence | 2007
Yaxin Bi; David Bell; Hui Wang; Gongde Guo; Jiwen Guan
In this paper we investigate the combination of four machine learning methods for text categorization using Dempster's rule of combination. These methods include the Support Vector Machine (SVM), kNN (nearest neighbor), the kNN model-based approach (kNNM), and Rocchio. We first present a general representation of the outputs of the different classifiers, in particular, modeling each output as a piece of evidence using a novel evidence structure called the focal element triplet. Furthermore, we investigate an effective method for combining pieces of evidence derived from classifiers generated by 10-fold cross-validation. Finally, we evaluate our methods on the 20-newsgroup and Reuters-21578 benchmark data sets and perform a comparative analysis with majority voting for combining multiple classifiers, along with the previous result. Our experimental results show that the best combined classifier improves on the performance of the individual classifiers, and that Dempster's rule of combination outperforms majority voting in combining multiple classifiers.
Systems, Man and Cybernetics | 2005
Gongde Guo; Daniel Neagu
This study focuses on combination schemes for multiple classifiers to achieve better classification performance than individual models can obtain, for real-world applications such as toxicity prediction of chemical compounds. The classifiers studied include kNN (k-nearest neighbors), wkNN (weighted kNN), kNNModel (the kNN model-based classifier), and CPC (the contextual probability-based classifier), all of which are similarity-based methods. We first review these learning methods and the methods for combining classifiers, and then present three similarity-based combination methods as the basis of our experiments. The experimental results show the promise of this approach.
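Fixed combination rules over the member classifiers' posterior estimates are the usual baseline for such schemes. The sketch below shows generic sum, product, and max rules; the paper's similarity-based schemes refine this idea, and the example posteriors are invented.

```python
def combine(prob_lists, scheme="sum"):
    """Combine per-classifier class-probability dicts with a fixed rule."""
    classes = set().union(*prob_lists)
    if scheme == "sum":
        score = {c: sum(p.get(c, 0.0) for p in prob_lists) for c in classes}
    elif scheme == "product":
        score = {c: 1.0 for c in classes}
        for p in prob_lists:
            for c in classes:
                score[c] *= p.get(c, 0.0)
    elif scheme == "max":
        score = {c: max(p.get(c, 0.0) for p in prob_lists) for c in classes}
    else:
        raise ValueError(scheme)
    return max(score, key=score.get)

# Three classifiers' posterior estimates for one test compound:
outputs = [{'toxic': 0.60, 'safe': 0.40},
           {'toxic': 0.25, 'safe': 0.75},
           {'toxic': 0.70, 'safe': 0.30}]
print(combine(outputs, "sum"))      # toxic
print(combine(outputs, "product"))  # toxic
print(combine(outputs, "max"))      # safe
```

Note that the three rules can disagree on the same instance, which is why the choice of combination scheme matters for the final accuracy.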
International Journal of Machine Learning and Cybernetics | 2013
Nan Li; Gongde Guo; Lifei Chen; Si Chen
The kNNModel algorithm is an improved version of the k-nearest-neighbor method; however, it suffers from high time complexity and reduced performance when dealing with complex data. An optimal-subspace classification method called IKNNModel is proposed in this paper. It projects different training samples onto their own optimal subspaces and constructs the corresponding class clusters and pure clusters as the basis of classification. For data sets with complex structure, that is, where the training samples of different categories overlap with one another in the original space or have high dimensionality, the proposed method can easily construct the corresponding clusters for the overlapping samples in their own subspaces. Experimental results show that, compared with kNNModel, the proposed method not only significantly improves classification performance on data sets with complex structure, but also improves classification efficiency.
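One way to picture per-class subspaces is PCA run separately on each class, with classification by reconstruction error. This is only an illustrative stand-in for the paper's optimal-subspace construction, with invented toy data: each class lies along its own line in 3-D.

```python
import numpy as np

def class_subspaces(X, y, dim=1):
    """Fit one low-dimensional PCA subspace per class."""
    subspaces = {}
    for label in np.unique(y):
        Xc = X[y == label]
        mean = Xc.mean(axis=0)
        # Rows of vt are principal directions, largest variance first.
        _, _, vt = np.linalg.svd(Xc - mean, full_matrices=False)
        subspaces[label] = (mean, vt[:dim])
    return subspaces

def predict(subspaces, x):
    """Assign x to the class whose subspace reconstructs it best."""
    best, best_err = None, np.inf
    for label, (mean, basis) in subspaces.items():
        centred = x - mean
        recon = mean + (centred @ basis.T) @ basis
        err = np.linalg.norm(x - recon)
        if err < best_err:
            best, best_err = label, err
    return best

# Class 'a' varies along the x-axis; class 'b' along the y-axis at z=5.
X = np.array([[-2, 0, 0], [-1, 0, 0], [1, 0, 0], [2, 0, 0],
              [0, -2, 5], [0, -1, 5], [0, 1, 5], [0, 2, 5]], dtype=float)
y = np.array(['a'] * 4 + ['b'] * 4)
subspaces = class_subspaces(X, y)
print(predict(subspaces, np.array([3.0, 0.1, 0.2])))  # a
```

Points that overlap in the full space can still be separated by how well each class's own subspace explains them, which is the intuition behind the per-sample subspace projection in the abstract.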
International Conference on Advances in Pattern Recognition | 2005
Gongde Guo; Daniel Neagu; Mark T. D. Cronin
This paper proposes a kNN model-based feature selection method aimed at improving the efficiency and effectiveness of the ReliefF method by: (1) using a kNN model as the starter selection, to choose a set of more meaningful representatives to replace the original data for feature selection; (2) integrating the Heterogeneous Value Difference Metric to handle heterogeneous applications (those with both ordinal and nominal features); and (3) presenting a simple method of difference-function calculation based on the inductive information in each representative obtained by the kNN model. We have evaluated the performance of the proposed kNN model-based feature selection method on the Phenols toxicity dataset with two different endpoints. Experimental results indicate that the proposed feature selection method significantly improves classification accuracy on the trial dataset.
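The Relief family of methods, which this paper extends, scores a feature up when it differs on the nearest miss and down when it differs on the nearest hit. The sketch below uses a simplified mixed-type difference function (normalised gap for numeric features, 0/1 overlap for nominal ones) as a stand-in for the full Heterogeneous Value Difference Metric, on invented data.

```python
import random

def diff(a, b, is_numeric, value_range):
    """Per-feature difference: normalised gap for numeric features,
    0/1 overlap for nominal ones (a simplified stand-in for HVDM)."""
    if is_numeric:
        return abs(a - b) / value_range if value_range else 0.0
    return 0.0 if a == b else 1.0

def relief_weights(X, y, is_numeric, m=30, seed=1):
    """Relief-style feature weighting over m sampled instances."""
    rng = random.Random(seed)
    dims = len(X[0])
    ranges = [(max(x[d] for x in X) - min(x[d] for x in X))
              if is_numeric[d] else 1.0 for d in range(dims)]

    def dist(a, b):
        return sum(diff(a[d], b[d], is_numeric[d], ranges[d])
                   for d in range(dims))

    w = [0.0] * dims
    for _ in range(m):
        i = rng.randrange(len(X))
        hit = min((j for j in range(len(X)) if j != i and y[j] == y[i]),
                  key=lambda j: dist(X[i], X[j]))
        miss = min((j for j in range(len(X)) if y[j] != y[i]),
                   key=lambda j: dist(X[i], X[j]))
        for d in range(dims):
            w[d] += (diff(X[i][d], X[miss][d], is_numeric[d], ranges[d])
                     - diff(X[i][d], X[hit][d], is_numeric[d], ranges[d])) / m
    return w

# Feature 0 (numeric) separates the classes; feature 1 (nominal) is noise.
X = [(0.0, 'r'), (0.1, 'g'), (0.2, 'r'), (1.0, 'g'), (1.1, 'r'), (1.2, 'g')]
y = ['low', 'low', 'low', 'high', 'high', 'high']
w = relief_weights(X, y, is_numeric=[True, False])
```

The paper's contribution is to run this scoring over kNN-model representatives rather than all instances, so the inner nearest-hit/nearest-miss search touches far fewer points.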
International Journal of Computational Intelligence and Applications | 2005
Gongde Guo; Daniel Neagu
A robust method, fuzzy kNNModel, for toxicity prediction of chemical compounds is proposed. The method is based on a supervised clustering method, called kNNModel, but employs fuzzy partitioning instead of crisp partitioning to group clusters. The merits of fuzzy kNNModel are two-fold: (1) it avoids having to choose, for each data set, the parameter e (the allowed error rate in a cluster) and the parameter N (the minimal number of instances covered by a cluster); (2) it better captures the characteristics of boundary data by assigning them different degrees of membership, between 0 and 1, to different clusters. The experimental results of fuzzy kNNModel on thirteen public data sets from the UCI machine learning repository and seven toxicity data sets from real-world applications are compared with those of fuzzy c-means clustering, k-means clustering, kNN, fuzzy kNN, and kNNModel in terms of classification performance. This application shows that fuzzy kNNModel is a promising method for the toxicity prediction of chemical compounds.
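The graded membership idea the abstract refers to can be shown with the standard fuzzy c-means membership formula: a point's degree in each cluster is inversely related to its distance to that cluster's centre, and the degrees sum to 1. This is the textbook formula, not the paper's fuzzy kNNModel procedure, and the centres are invented for the example.

```python
import math

def fuzzy_memberships(x, centers, m=2.0):
    """Fuzzy c-means style memberships of point x to each centre,
    with fuzzifier m > 1; boundary points get intermediate degrees."""
    d = [math.dist(x, c) for c in centers]
    if any(v == 0.0 for v in d):          # x sits exactly on a centre
        return [1.0 if v == 0.0 else 0.0 for v in d]
    u = []
    for i in range(len(centers)):
        denom = sum((d[i] / d[j]) ** (2.0 / (m - 1.0))
                    for j in range(len(centers)))
        u.append(1.0 / denom)
    return u

centers = [(0.0, 0.0), (4.0, 0.0)]
print(fuzzy_memberships((1.0, 0.0), centers))  # ~[0.9, 0.1]
print(fuzzy_memberships((2.0, 0.0), centers))  # ~[0.5, 0.5]: boundary point
```

A point midway between two clusters gets membership near 0.5 in each, which is exactly the boundary behaviour crisp partitioning cannot express.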