
Publication


Featured research published by Hoai Bach Nguyen.


Congress on Evolutionary Computation | 2014

Filter based backward elimination in wrapper based PSO for feature selection in classification

Hoai Bach Nguyen; Bing Xue; Ivy Liu; Mengjie Zhang

The advances in data collection increase the dimensionality of the data (i.e. the total number of features) in many fields, which poses a challenge to many existing feature selection approaches. This paper develops a new feature selection approach based on particle swarm optimisation (PSO) and a local search that mimics the typical backward elimination feature selection method. The proposed algorithm uses a wrapper based fitness function, i.e. the classification error rate. The local search is performed only on the global best and uses a filter based measure, which aims to take advantage of both filter and wrapper approaches. The proposed approach is tested and compared with three recent PSO based feature selection algorithms and two typical traditional feature selection methods. Experiments on eight benchmark datasets show that the proposed algorithm can be successfully used to select a significantly smaller number of features and simultaneously improve the classification performance over using all features. The proposed approach outperforms the three PSO based algorithms and the two traditional methods.
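The wrapper-based binary PSO loop this abstract builds on can be sketched as follows. This is a minimal illustration, not the paper's implementation: the sigmoid transfer rule, the parameter values, and the names `binary_pso_feature_selection` and `mask_to_subset` are assumptions, and a toy fitness function stands in for a real classifier's error rate.

```python
import math
import random

def mask_to_subset(mask):
    """Indices of the selected features in a bit mask."""
    return tuple(d for d, bit in enumerate(mask) if bit)

def binary_pso_feature_selection(fitness, n_features, n_particles=10, iters=30, seed=0):
    """Minimal binary PSO: each particle is a bit mask over the features and
    `fitness` maps a feature subset to an error to minimise (a wrapper method
    would plug in a classifier's error rate here)."""
    rng = random.Random(seed)
    w, c1, c2 = 0.7, 1.5, 1.5                      # inertia and acceleration weights
    pos = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(mask_to_subset(p)) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # sigmoid transfer: velocity becomes the probability of selecting feature d
                pos[i][d] = rng.random() < 1.0 / (1.0 + math.exp(-vel[i][d]))
            f = fitness(mask_to_subset(pos[i]))
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return mask_to_subset(gbest), gbest_fit

# Toy problem: features 0-2 are relevant; the error counts wrongly in/excluded features.
relevant = {0, 1, 2}
toy_error = lambda subset: len(relevant ^ set(subset)) / 10
best_subset, best_error = binary_pso_feature_selection(toy_error, n_features=10)
```

The paper's local search on the global best (filter-guided backward elimination) would slot in after `gbest` is updated; it is omitted here.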


European Conference on Applications of Evolutionary Computation | 2015

Gaussian Transformation Based Representation in Particle Swarm Optimisation for Feature Selection

Hoai Bach Nguyen; Bing Xue; Ivy Liu; Peter Andreae; Mengjie Zhang

In classification, feature selection is an important but challenging task, which requires a powerful search technique. Particle swarm optimisation (PSO) has recently gained much attention for solving feature selection problems, but the current representation typically forms a high-dimensional search space. A new representation based on feature clusters was recently proposed to reduce the dimensionality and improve the performance, but it does not form a smooth fitness landscape, which may limit the performance of PSO. This paper proposes a new Gaussian based transformation rule for interpreting a particle as a feature subset, which is combined with the feature cluster based representation to develop a new PSO-based feature selection algorithm. The proposed algorithm is examined and compared with two recent PSO-based algorithms, where the first uses a Gaussian based updating mechanism and the conventional representation, and the second uses the feature cluster representation without using a Gaussian distribution. Experiments on commonly used datasets of varying difficulty show that the proposed algorithm achieves better performance than the other two algorithms in terms of the classification performance and the number of features on both the training sets and the test sets. Further analyses show that the Gaussian transformation rule improves stability, i.e. similar features are selected in different independent runs, and the most important features are almost always selected.


Simulated Evolution and Learning | 2014

PSO and Statistical Clustering for Feature Selection: A New Representation

Hoai Bach Nguyen; Bing Xue; Ivy Liu; Mengjie Zhang

Classification tasks often involve a large number of features, where irrelevant or redundant features may reduce the classification performance. Such tasks typically require a feature selection process to choose a small subset of relevant features for classification. This paper proposes a new representation in particle swarm optimisation (PSO) to utilise statistical clustering information to solve feature selection problems. The proposed algorithm is examined and compared with two conventional feature selection algorithms and two existing PSO based algorithms on eight benchmark datasets of varying difficulty. The experimental results show that the proposed algorithm can be successfully used for feature selection to considerably reduce the number of features and achieve similar or significantly higher classification accuracy than using all features. It achieves significantly better classification accuracy than one conventional method, although the number of features is larger. Compared with the other conventional method and the two PSO methods, the proposed algorithm achieves better performance in terms of both the classification performance and the number of features.


Evolutionary Intelligence | 2016

Mutual information for feature selection: estimation or counting?

Hoai Bach Nguyen; Bing Xue; Peter Andreae

In classification, feature selection is an important pre-processing step to simplify the dataset and improve the data representation quality, which makes classifiers better, easier to train, and easier to understand. Because of its ability to analyse non-linear interactions between features, mutual information has been widely applied to feature selection. Alongside counting approaches, the traditional way to calculate mutual information, many mutual information estimations have been proposed to allow mutual information to work directly on continuous datasets. This work focuses on comparing the effect of the counting approach and the kernel density estimation (KDE) approach in feature selection using particle swarm optimisation as the search mechanism. The experimental results on 15 different datasets show that KDE can work well on both continuous and discrete datasets. In addition, feature subsets evolved by KDE achieve similar or better classification performance than those from the counting approach. Furthermore, the results on artificial datasets with various interactions show that KDE is able to correctly capture the interactions between features, in terms of both relevance and redundancy, which cannot be achieved using the counting approach.
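The counting baseline this paper compares against can be sketched in a few lines: estimate every probability by frequency counting over paired discrete samples. This is an illustrative sketch (the name `mi_counting` is an assumption); the KDE approach studied in the paper instead replaces these frequency estimates with kernel density estimates so that continuous features need not be discretised first.

```python
import math
from collections import Counter

def mi_counting(xs, ys):
    """I(X;Y) in nats for paired discrete samples, with every probability
    estimated by frequency counting."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # sum over observed joint events: p(x,y) * log( p(x,y) / (p(x) p(y)) )
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

x = [0, 0, 1, 1, 0, 1]
print(mi_counting(x, x))                         # identical variables: I(X;X) = H(X) = log 2
print(mi_counting([0, 0, 1, 1], [0, 1, 0, 1]))   # exactly independent samples: 0.0
```

A counting estimate like this degrades on continuous data, since binning or exact-value counting produces unreliable probabilities, which is the weakness the KDE estimator targets.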


Soft Computing | 2016

New mechanism for archive maintenance in PSO-based multi-objective feature selection

Hoai Bach Nguyen; Bing Xue; Ivy Liu; Peter Andreae; Mengjie Zhang

In classification problems, a large number of features are typically used to describe the problem's instances. However, not all of these features are useful for classification. Feature selection is usually an important pre-processing step to overcome the "curse of dimensionality". Feature selection aims to choose a small number of features to achieve similar or better classification performance than using all features. This paper presents a particle swarm optimisation (PSO)-based multi-objective feature selection approach to evolving a set of non-dominated feature subsets which achieve high classification performance. The proposed algorithm uses local search techniques to improve a Pareto front and is compared with a pure multi-objective PSO algorithm, three well-known evolutionary multi-objective algorithms and a current state-of-the-art PSO-based multi-objective feature selection approach. Their performances are examined on 12 benchmark datasets. The experimental results show that in most cases, the proposed multi-objective algorithm generates better Pareto fronts than all other methods.
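The non-dominated archive at the core of such multi-objective feature selection methods can be sketched as a plain dominance filter over (classification error, number of features) pairs, both minimised. This is only the baseline mechanism, not the new maintenance mechanism the paper proposes; the function names are illustrative assumptions.

```python
def dominates(a, b):
    """a dominates b when a is no worse in every objective and strictly
    better in at least one (all objectives minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def update_archive(archive, candidate):
    """Add a candidate solution, here a tuple of objective values such as
    (classification error, number of features), to a non-dominated archive."""
    if any(dominates(a, candidate) for a in archive):
        return archive                              # dominated: discard the candidate
    # keep only archive members the candidate does not dominate, then add it
    return [a for a in archive if not dominates(candidate, a)] + [candidate]

archive = []
for solution in [(0.10, 5), (0.08, 7), (0.12, 4), (0.09, 5)]:
    archive = update_archive(archive, solution)
# (0.09, 5) dominates (0.10, 5), so the final front keeps the other three points.
```

The resulting archive is the set of trade-offs returned to the user: lower error at the cost of more features, or fewer features at the cost of higher error.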


European Conference on Applications of Evolutionary Computation | 2017

Surrogate-Model Based Particle Swarm Optimisation with Local Search for Feature Selection in Classification

Hoai Bach Nguyen; Bing Xue; Peter Andreae

Evolutionary computation (EC) techniques have been applied widely to many problems because of their powerful search ability. However, EC based algorithms are usually computationally intensive, especially with an expensive fitness function. In order to solve this issue, many surrogate models have been proposed to reduce the computation time by approximating the fitness function, but they are hardly applied to EC based feature selection. This paper develops a surrogate model for particle swarm optimisation based wrapper feature selection by selecting a small number of instances to create a surrogate training set. Furthermore, based on the surrogate model, we propose a sampling local search, which improves the current best solution by utilising information from the previous evolutionary iterations. Experiments on 10 datasets show that the surrogate training set can reduce the computation time without affecting the classification performance. Meanwhile, the sampling local search results in a significantly smaller number of features, especially on large datasets. The combination of the two proposed ideas successfully reduces the number of features and achieves better performance than using all features, a recent sequential feature selection algorithm, original PSO, and PSO with only one of the two ideas, on most datasets.


European Conference on Applications of Evolutionary Computation | 2016

Mutual Information Estimation for Filter Based Feature Selection Using Particle Swarm Optimization

Hoai Bach Nguyen; Bing Xue; Peter Andreae

Feature selection is a pre-processing step in classification, which selects a small set of important features to improve the classification performance and efficiency. Mutual information is very popular in feature selection because it is able to detect non-linear relationships between features. However, the existing mutual information approaches only consider two-way interactions between features. In addition, in most methods, mutual information is calculated by a counting approach, which may lead to inaccurate results. This paper proposes a filter feature selection algorithm based on particle swarm optimization (PSO), named PSOMIE, which employs a novel fitness function using nearest neighbor mutual information estimation (NNE) to measure the quality of a feature set. PSOMIE is compared with using all features and two traditional feature selection approaches. The experimental results show that the mutual information estimation successfully guides PSO to search for a small number of features while maintaining or improving the classification performance over using all features and the traditional feature selection methods. In addition, PSOMIE provides a strong consistency between training and test results, which may be used to avoid the overfitting problem.


Proceedings of the Second Australasian Conference on Artificial Life and Computational Intelligence - Volume 9592 | 2016

A Subset Similarity Guided Method for Multi-objective Feature Selection

Hoai Bach Nguyen; Bing Xue; Mengjie Zhang

This paper presents a particle swarm optimisation (PSO)-based multi-objective feature selection method for evolving a set of non-dominated feature subsets and achieving high classification performance. First, a multi-objective PSO algorithm named MOPSO-SRD is applied to solve feature selection problems. The results of this algorithm are then compared with those of the proposed multi-objective PSO algorithm, called MOPSO-SiD. MOPSO-SiD is specifically designed for feature selection problems, in which a subset similarity distance measure in the solution space is used to select a leader for each particle in the swarm. This distance measure is also used to update the archive set, which holds the final solutions returned by the MOPSO-SiD algorithm. The results show that both algorithms successfully evolve a set of non-dominated solutions, which include a small number of features while achieving similar or better performance than using all features. In addition, in most cases MOPSO-SiD selects smaller feature subsets than MOPSO-SRD, and outperforms single objective PSO for feature selection and a traditional feature selection method.


Memetic Computing | 2018

PSO with surrogate models for feature selection: static and dynamic clustering-based methods

Hoai Bach Nguyen; Bing Xue; Peter Andreae

Feature selection is an important but often expensive process, especially with a large number of instances. This problem can be addressed by using a small training set, i.e. a surrogate set. In this work, we propose to use a hierarchical clustering method to build various surrogate sets, which allows us to analyze the effect of surrogate sets with different qualities and quantities on the feature subsets. Further, a dynamic surrogate model is proposed to automatically adjust surrogate sets for different datasets. Based on this idea, a feature selection system is developed using particle swarm optimization as the search mechanism. The experiments show that the hierarchical clustering method can build better surrogate sets to reduce the computational time, improve the feature selection performance, and alleviate overfitting. The dynamic method can automatically choose suitable surrogate sets to further improve the classification accuracy.
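The clustering-based surrogate idea can be sketched as follows: agglomeratively cluster the instances, then keep one representative per cluster as the reduced training set. This is a simplified single-linkage sketch under assumed details (the linkage criterion, the medoid choice, and the name `surrogate_set` are illustrative, not the paper's exact construction).

```python
def surrogate_set(points, k):
    """Cluster the instances agglomeratively (single linkage) down to k
    clusters, then keep one medoid per cluster as the surrogate training set."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # merge the pair of clusters with the smallest single-linkage distance
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: min(dist(p, q)
                                      for p in clusters[ij[0]] for q in clusters[ij[1]]))
        clusters[i] += clusters.pop(j)
    # medoid: the instance with the smallest total distance to its cluster mates
    return [min(c, key=lambda p: sum(dist(p, q) for q in c)) for c in clusters]

instances = [(0, 0), (0.1, 0), (5, 5), (5, 5.1), (10, 0), (10, 0.1)]
surrogate = surrogate_set(instances, k=3)   # one representative per natural cluster
```

Evaluating wrapper fitness on `surrogate` instead of `instances` is what cuts the computation time; the paper's dynamic variant additionally adjusts the surrogate set size per dataset.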


Simulated Evolution and Learning | 2017

A Hybrid GA-GP Method for Feature Reduction in Classification

Hoai Bach Nguyen; Bing Xue; Peter Andreae

Feature reduction is an important pre-processing step in classification and other artificial intelligence applications. Its aim is to improve the quality of feature sets. There are two main types of feature reduction: feature construction and feature selection. Most current feature reduction algorithms focus on just one of the two types because they require different representations. This paper proposes a new representation which supports a feature reduction algorithm that combines feature selection and feature construction. The algorithm uses new genetic operators to update the new representation. The proposed algorithm is compared with two conventional feature selection algorithms, a genetic algorithm-based feature selection algorithm, and a genetic programming-based algorithm which evolves feature sets containing both original and high-level features. The experimental results on 10 different datasets show that the new representation can help to produce a smaller number of features and improve the classification accuracy over using all features on most datasets. In comparison with other feature selection or construction algorithms, the proposed algorithm achieves similar or better classification performance on all datasets.

Collaboration


Dive into Hoai Bach Nguyen's collaboration.

Top Co-Authors

Bing Xue
Victoria University of Wellington

Peter Andreae
Victoria University of Wellington

Mengjie Zhang
Victoria University of Wellington

Ivy Liu
Victoria University of Wellington

Hisao Ishibuchi
University of Science and Technology