Guohua Liang

University of Technology, Sydney

Publication

Featured research published by Guohua Liang.

Australasian Joint Conference on Artificial Intelligence | 2011

An empirical study of bagging predictors for imbalanced data with different levels of class distribution

Guohua Liang; Xingquan Zhu; Chengqi Zhang

Research into learning from imbalanced data has increasingly captured the attention of both academia and industry, especially when the class distribution is highly skewed. This paper compares the Area Under the Receiver Operating Characteristic Curve (AUC) performance of bagging in the context of learning from different imbalanced levels of class distribution. Despite the popularity of bagging in many real-world applications, some questions have not been clearly answered in the existing research, e.g., which bagging predictors may achieve the best performance for applications, and whether bagging is superior to single learners when the levels of class distribution change. We perform a comprehensive evaluation of the AUC performance of bagging predictors with 12 base learners at different imbalanced levels of class distribution by using a sampling technique on 14 imbalanced data-sets. Our experimental results indicate that Decision Table (DTable) and RepTree are the learning algorithms with the best bagging AUC performance. Most AUC performances of bagging predictors are statistically superior to single learners, except for Support Vector Machines (SVM) and Decision Stump (DStump).
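
To make the evaluation metric concrete, the following is a minimal sketch of how AUC can be computed from classifier scores, using the standard interpretation of AUC as the probability that a randomly chosen positive (minority) instance is scored higher than a randomly chosen negative one. The function name `auc_score` and the 0/1 label encoding are assumptions for illustration; the abstract does not say how the authors computed AUC.

```python
def auc_score(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 class labels (1 = positive/minority class; assumed encoding).
    scores: iterable of classifier scores, higher means "more positive".
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative instance")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Example: three of the four positive/negative pairs are ordered correctly.
print(auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Because the value depends only on how positives are ranked relative to negatives, it is not dominated by the majority class in the way overall accuracy is.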

Australasian Joint Conference on Artificial Intelligence | 2012

A comparative study of sampling methods and algorithms for imbalanced time series classification

Guohua Liang; Chengqi Zhang

Mining time series data and imbalanced data are two of ten challenging problems in data mining research. Imbalanced time series classification (ITSC) involves these two challenging problems, which take place in many real-world applications. In the existing research, the structure-preserving over-sampling (SPO) method has been proposed for solving the ITSC problems. It is claimed by its authors to achieve better performance than other over-sampling and state-of-the-art methods in time series classification (TSC). However, it is unclear whether an under-sampling method with various learning algorithms is more effective than over-sampling methods, e.g., SPO for ITSC, because research has shown that under-sampling methods are more effective and efficient than over-sampling methods. We propose a comparative study between an under-sampling method with various learning algorithms and over-sampling methods, e.g., SPO. Statistical tests, namely the Friedman test and a post-hoc test, are applied to determine whether there is a statistically significant difference between methods. The experimental results demonstrate that the under-sampling technique with KNN is the most effective method and can achieve results that are superior to the existing complicated SPO method for ITSC.
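
As a rough illustration of the statistical comparison mentioned above, the sketch below computes the Friedman chi-square statistic from a table of per-data-set scores. It uses the textbook formula without a tie correction; the function names and the toy scores are assumptions for illustration and are not taken from the paper.

```python
def average_ranks(row):
    """Rank the values in one row (rank 1 = largest), averaging tied ranks."""
    order = sorted(range(len(row)), key=lambda j: -row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def friedman_statistic(scores):
    """scores[i][j] is the score of method j on data set i.

    Returns (chi2, mean_ranks), where chi2 is the uncorrected Friedman statistic
    12N / (k(k+1)) * (sum_j R_j^2 - k(k+1)^2 / 4) and R_j is the mean rank of method j.
    """
    n = len(scores)
    k = len(scores[0])
    rank_rows = [average_ranks(row) for row in scores]
    mean_ranks = [sum(r[j] for r in rank_rows) / n for j in range(k)]
    chi2 = 12.0 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4.0
    )
    return chi2, mean_ranks


# Toy scores (made up): 4 data sets, 3 methods, method 0 always best.
toy = [[0.9, 0.8, 0.7], [0.95, 0.85, 0.6], [0.8, 0.7, 0.65], [0.99, 0.9, 0.5]]
chi2, ranks = friedman_statistic(toy)
print(chi2, ranks)  # 8.0 [1.0, 2.0, 3.0]
```

The statistic is compared against a chi-square distribution with k - 1 degrees of freedom; a post-hoc test is only meaningful when this overall test rejects the hypothesis that all methods perform equally.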

Advanced Data Mining and Applications | 2011

An empirical evaluation of bagging with different algorithms on imbalanced data

Guohua Liang; Chengqi Zhang

This study investigates the effectiveness of bagging with respect to different learning algorithms on imbalanced data-sets. The purpose of this research is to investigate the performance of bagging based on two unique approaches: (1) classify base learners with respect to 12 different learning algorithms in general terms, and (2) evaluate the performance of bagging predictors on data with imbalanced class distributions. The former approach develops a method to categorize base learners by using two-dimensional robustness and stability decomposition on 48 benchmark data-sets, while the latter approach investigates the performance of bagging predictors by using the evaluation metrics True Positive Rate (TPR), Geometric mean (G-mean) for the accuracy on the majority and minority classes, and the Receiver Operating Characteristic (ROC) curve on 12 imbalanced data-sets. Our studies assert that both stability and robustness are important factors for building high performance bagging predictors on data with imbalanced class distributions. The experimental results demonstrated that PART and Multi-layer Perceptron (MLP) are the learning algorithms with the best bagging performance on 12 imbalanced data-sets. Moreover, only four out of 12 bagging predictors are statistically superior to single learners based on both G-mean and TPR evaluation metrics over 12 imbalanced data-sets.
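
For readers unfamiliar with the two metrics named above, here is a small sketch of TPR and G-mean computed from predicted labels. G-mean is taken in its usual form, the square root of the product of the true positive rate and the true negative rate. The function name and the convention that the minority class is labelled `1` are assumptions for illustration.

```python
import math


def tpr_and_gmean(y_true, y_pred, positive=1):
    """Return (TPR, G-mean) for a binary problem.

    TPR    = TP / (TP + FN), accuracy on the positive (minority) class.
    TNR    = TN / (TN + FP), accuracy on the negative (majority) class.
    G-mean = sqrt(TPR * TNR).
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return tpr, math.sqrt(tpr * tnr)


# A classifier that finds only one of two minority instances has 90% overall
# accuracy here, but its TPR is 0.5 and its G-mean is sqrt(0.5).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(tpr_and_gmean(y_true, y_pred))
```

A classifier that always predicts the majority class gets a G-mean of zero regardless of its overall accuracy, which is why the metric suits imbalanced data.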

International Journal of Machine Learning and Cybernetics | 2014

The effect of varying levels of class distribution on bagging for different algorithms: An empirical study

Guohua Liang; Xingquan Zhu; Chengqi Zhang

Many real world applications involve highly imbalanced class distribution. Research into learning from imbalanced class distribution is considered to be one of ten challenging problems in data mining research, and it has increasingly captured the attention of both academia and industry. In this work, we study the effects of different levels of imbalanced class distribution on bagging predictors by using under-sampling techniques. Despite the popularity of bagging in many real-world applications, some questions have not been clearly answered in the existing research, such as the effect of varying the levels of class distribution on different bagging predictors, e.g., whether bagging is superior to single learners when the levels of class distribution change. Most classification learning algorithms are designed to maximize the overall accuracy rate and assume that training instances are uniformly distributed; however, the overall accuracy does not represent correct prediction on the minority class, which is the class of interest to users. The overall accuracy metric is therefore ineffective for evaluating the performance of classifiers in extremely imbalanced data. This study investigates the effect of varying levels of class distribution on different bagging predictors based on the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) as a performance metric, using an under-sampling technique on 14 data-sets with imbalanced class distributions. Our experimental results indicate that Decision Table (DTable) and RepTree are the learning algorithms with the best bagging AUC performance. The AUC performances of bagging predictors are statistically superior to single learners, with the exception of Support Vector Machines (SVM) and Decision Stump (DStump).

Conference on Information and Knowledge Management | 2012

An efficient and simple under-sampling technique for imbalanced time series classification

Guohua Liang; Chengqi Zhang

Imbalanced time series classification (TSC), which arises in many real-world applications, has increasingly captured the attention of researchers. Previous work has proposed an intelligent structure-preserving over-sampling method (SPO), which the authors claimed achieved better performance than other existing over-sampling and state-of-the-art methods in TSC. The main disadvantage of over-sampling methods is that they significantly increase the computational cost of training a classification model due to the addition of new minority class instances to balance data-sets with high dimensional features. These challenging issues have motivated us to find a simple and efficient solution for imbalanced TSC. Statistical tests are applied to validate our conclusions. The experimental results demonstrate that this proposed simple random under-sampling technique with SVM is efficient and can achieve results that compare favorably with the existing complicated SPO method for imbalanced TSC.
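
The general idea of random under-sampling can be sketched as follows: instances are discarded at random until every class is the size of the smallest class. The abstract does not give the exact procedure or the target class ratio, so the balancing to equal class sizes, the function name `random_under_sample`, and the `seed` parameter are all assumptions for illustration.

```python
import random
from collections import defaultdict


def random_under_sample(X, y, seed=0):
    """Randomly drop instances so that every class has as many instances
    as the smallest class. Returns the reduced (X, y) in original order."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)
    n_min = min(len(idxs) for idxs in by_class.values())
    keep = []
    for idxs in by_class.values():
        # Sampling without replacement; the smallest class is kept in full.
        keep.extend(rng.sample(idxs, n_min))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (made up): 2 minority series and 8 majority series.
X = [[float(i), float(i) + 1.0] for i in range(10)]
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
X_bal, y_bal = random_under_sample(X, y)
print(len(X_bal), y_bal.count(1), y_bal.count(0))  # 4 2 2
```

Unlike over-sampling, this shrinks the training set, which is the source of the efficiency gain the abstract refers to; the cost is that potentially useful majority instances are thrown away.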

Australasian Joint Conference on Artificial Intelligence | 2013

An Effective Method for Imbalanced Time Series Classification: Hybrid Sampling

Guohua Liang

Most traditional supervised classification learning algorithms are ineffective for highly imbalanced time series classification, which has received considerably less attention than imbalanced data problems in data mining and machine learning research. Bagging is one of the most effective ensemble learning methods, yet it has drawbacks on highly imbalanced data. Sampling methods are considered to be effective for tackling the highly imbalanced data problem, but both over-sampling and under-sampling have disadvantages; thus it is unclear which sampling schema will improve the performance of bagging predictors for solving highly imbalanced time series classification problems. This paper addresses the limitations of existing over-sampling and under-sampling techniques and proposes a new approach, a hybrid sampling technique to enhance bagging, for solving these challenging problems. Comparing this new approach with previous approaches (over-sampling, SPO, and under-sampling with various learning algorithms) on benchmark data-sets, the experimental results demonstrate that the proposed approach is able to dramatically improve on the performance of previous approaches. Statistical tests, namely the Friedman test and the post-hoc Nemenyi test, are used to draw valid conclusions.
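
The abstract does not describe how the hybrid sampling is carried out, so the following is only a generic sketch of the idea of combining the two directions: the minority class is over-sampled and the majority class is under-sampled until both reach a common target size. The midpoint target, the use of plain random duplication as the over-sampler, and all names here are assumptions for illustration, not the paper's method.

```python
import random


def hybrid_sample(X, y, minority=1, seed=0):
    """Generic hybrid sampling sketch (not the paper's algorithm).

    Over-samples the minority class by random duplication and under-samples
    the majority class without replacement, so that both end up at the
    midpoint size (n_minority + n_majority) // 2 (an assumed target).
    """
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_idx = [i for i, label in enumerate(y) if label != minority]
    if not min_idx or not maj_idx:
        raise ValueError("both classes must be present")
    target = (len(min_idx) + len(maj_idx)) // 2
    # Over-sample: keep every minority instance, then add random duplicates.
    extra = [rng.choice(min_idx) for _ in range(max(0, target - len(min_idx)))]
    # Under-sample: keep a random subset of the majority class.
    kept_maj = rng.sample(maj_idx, min(target, len(maj_idx)))
    chosen = min_idx + extra + kept_maj
    return [X[i] for i in chosen], [y[i] for i in chosen]


# Toy data (made up): 2 minority and 8 majority instances -> 5 of each.
X = [[float(i)] for i in range(10)]
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
X_h, y_h = hybrid_sample(X, y)
print(y_h.count(1), y_h.count(0))  # 5 5
```

Meeting in the middle means fewer duplicated minority instances than pure over-sampling and fewer discarded majority instances than pure under-sampling, which is the trade-off a hybrid scheme aims for.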

International Conference on Information Technology in Medicine and Education | 2009

Design an applied student model for Intelligent Tutoring System

Lizhen Liu; Minhua Wu; Hua Wang; Guohua Liang

Student modeling is a very important and time-consuming process in an Intelligent Tutoring System. Furthermore, for many Web-based educational systems, student personality is hard to apply explicitly. To address this problem, we propose an approach that combines fuzzy compositive evaluation with a Bayesian network to design an applied student model. This article describes improvements made to the method and its application to selecting a corresponding tutoring strategy by reasoning about student actions, which supports useful tutoring services in a practical tutoring system. The proposed framework has been integrated within the Discrete Mathematics Tutor System. Experimental results show the benefit of this integration.

Web Intelligence | 2008

A Learning Process Using SVMs for Multi-agents Decision Classification

Yanshan Xiao; Feiqi Deng; Bo Liu; Shouqiang Liu; Dan Luo; Guohua Liang

In order to resolve the decision classification problem in multi-agent systems, this paper first introduces the architecture of a multi-agent system. It then proposes a support vector machine based assessment approach, which is able to learn rules from previous assessment results provided by domain experts. Finally, experiments are conducted on an artificial dataset to illustrate how the proposed approach works, and the results show that the proposed method has effective learning ability for decision classification problems.

National Conference on Artificial Intelligence | 2011