Network


Latest external collaboration on country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Binh Tran is active.

Publication


Featured researches published by Binh Tran.


Memetic Computing | 2016

Genetic programming for feature construction and selection in classification on high-dimensional data

Binh Tran; Bing Xue; Mengjie Zhang

Classification on high-dimensional data with thousands to tens of thousands of dimensions is a challenging task due to the high dimensionality and the quality of the feature set. The problem can be addressed by using feature selection to choose only informative features or feature construction to create new high-level features. Genetic programming (GP) using a tree-based representation can be used for both feature construction and implicit feature selection. This work presents a comprehensive study to investigate the use of GP for feature construction and selection on high-dimensional classification problems. Different combinations of the constructed and/or selected features are tested and compared on seven high-dimensional gene expression problems, and different classification algorithms are used to evaluate their performance. The results show that the constructed and/or selected feature sets can significantly reduce the dimensionality and maintain or even increase the classification accuracy in most cases. The cases with overfitting occurred are analysed via the distribution of features. Further analysis is also performed to show why the constructed feature can achieve promising classification performance.


simulated evolution and learning | 2014

Improved PSO for Feature Selection on High-Dimensional Datasets

Binh Tran; Bing Xue; Mengjie Zhang

Classification on high-dimensional i.e. thousands of dimensions data typically requires feature selection FS as a pre-processing step to reduce the dimensionality. However, FS is a challenging task even on datasets with hundreds of features. This paper proposes a new particle swarm optimisation PSO based FS approach to classification problems with thousands or tens of thousands of features. The proposed algorithm is examined and compared with three other PSO based methods on five high-dimensional problems of varying difficulty. The results show that the proposed algorithm can successfully select a much smaller number of features and significantly increase the classification accuracy over using all features. The proposed algorithm outperforms the other three PSO methods in terms of both the classification performance and the number of features. Meanwhile, the proposed algorithm is computationally more efficient than the other three PSO methods because it selects a smaller number of features and employs a new fitness evaluation strategy.


simulated evolution and learning | 2014

Overview of Particle Swarm Optimisation for Feature Selection in Classification

Binh Tran; Bing Xue; Mengjie Zhang

Feature selection is a process of selecting a subset of relevant features from a large number of original features to achieve similar or better classification performance and improve the computation efficiency. As an important data pre-processing technique, research into feature selection has been carried out over the past four decades. Determining an optimal feature subset is a complicated problem. Due to the limitations of conventional methods, evolutionary computation EC has been proposed to solve feature selection problems. Particle swarm optimisation PSO is an EC technique which recently has caught much interest from researchers in the field. This paper presents a review of PSO for feature selection in classification. After describing the background of feature selection and PSO, recent work involving PSO for feature selection is reviewed. Current issues and challenges are also presented for future research.


european conference on genetic programming | 2017

Using Feature Clustering for GP-Based Feature Construction on High-Dimensional Data

Binh Tran; Bing Xue; Mengjie Zhang

Feature construction is a pre-processing technique to create new features with better discriminating ability from the original features. Genetic programming (GP) has been shown to be a prominent technique for this task. However, applying GP to high-dimensional data is still challenging due to the large search space. Feature clustering groups similar features into clusters, which can be used for dimensionality reduction by choosing representative features from each cluster to form the feature subset. Feature clustering has been shown promising in feature selection; but has not been investigated in feature construction for classification. This paper presents the first work of utilising feature clustering in this area. We propose a cluster-based GP feature construction method called CGPFC which uses feature clustering to improve the performance of GP for feature construction on high-dimensional data. Results on eight high-dimensional datasets with varying difficulties show that the CGPFC constructed features perform better than the original full feature set and features constructed by the standard GP constructor based on the whole feature set.


ieee symposium series on computational intelligence | 2016

Multiple feature construction in classification on high-dimensional data using GP

Binh Tran; Mengjie Zhang; Bing Xue

Feature construction and feature selection are common pre-processing techniques to obtain smaller but better discriminating feature sets than the original ones. These two techniques are essential in high-dimensional data with thousands or tens of thousands of features where there may exist many irrelevant and redundant features. Genetic programming (GP) is a powerful technique that has shown promising results in feature construction and feature selection. However, constructing multiple features for high-dimensional data is still challenging due to its large search space. In this paper, we propose a GP-based method that simultaneously performs multiple feature construction and feature selection to automatically transform high-dimensional datasets into much smaller ones. Experiment results on six datasets show that the size of the generated feature set is less than 4% of the original feature set size and it significantly improves the performance of K-Nearest Neighbour, Naive Bayes and Decision Tree algorithms on 15 out of 18 comparisons. Compared with the single feature construction method using GP, the proposed method has better performance on half cases and similar on the other half. Comparisons between the constructed features, the selected features and the combination of both constructed and selected features by the propose method reveal different preferences of the three learning algorithms on these feature sets.


IEEE Transactions on Systems, Man, and Cybernetics | 2018

A New Representation in PSO for Discretization-Based Feature Selection

Binh Tran; Bing Xue; Mengjie Zhang

In machine learning, discretization and feature selection (FS) are important techniques for preprocessing data to improve the performance of an algorithm on high-dimensional data. Since many FS methods require discrete data, a common practice is to apply discretization before FS. In addition, for the sake of efficiency, features are usually discretized individually (or univariate). This scheme works based on the assumption that each feature independently influences the task, which may not hold in cases where feature interactions exist. Therefore, univariate discretization may degrade the performance of the FS stage since information showing feature interactions may be lost during the discretization process. Initial results of our previous proposed method [evolve particle swarm optimization (EPSO)] showed that combining discretization and FS in a single stage using bare-bones particle swarm optimization (BBPSO) can lead to a better performance than applying them in two separate stages. In this paper, we propose a new method called potential particle swarm optimization (PPSO) which employs a new representation that can reduce the search space of the problem and a new fitness function to better evaluate candidate solutions to guide the search. The results on ten high-dimensional datasets show that PPSO select less than 5% of the number of features for all datasets. Compared with the two-stage approach which uses BBPSO for FS on the discretized data, PPSO achieves significantly higher accuracy on seven datasets. In addition, PPSO obtains better (or similar) classification performance than EPSO on eight datasets with a smaller number of selected features on six datasets. Furthermore, PPSO also outperforms the three compared (traditional) methods and performs similar to one method on most datasets in terms of both generalization ability and learning capacity.


european conference on applications of evolutionary computation | 2016

Bare-Bone Particle Swarm Optimisation for Simultaneously Discretising and Selecting Features for High-Dimensional Classification

Binh Tran; Bing Xue; Mengjie Zhang

Feature selection and discretisation have shown their effectiveness for data preprocessing especially for high-dimensional data with many irrelevant features. While feature selection selects only relevant features, feature discretisation finds a discrete representation of data that contains enough information but ignoring some minor fluctuation. These techniques are usually applied in two stages, discretisation and then selection since many feature selection methods work only on discrete features. Most commonly used discretisation methods are univariate in which each feature is discretised independently; therefore, the feature selection stage may not work efficiently since information showing feature interaction is not considered in the discretisation process. In this study, we propose a new method called PSO-DFS using bare-bone particle swarm optimisation (BBPSO) for discretisation and feature selection in a single stage. The results on ten high-dimensional datasets show that PSO-DFS obtains a substantial dimensionality reduction for all datasets. The classification performance is significantly improved or at least maintained on nine out of ten datasets by using the transformed “small” data obtained from PSO-DFS. Compared to applying the two-stage approach which uses PSO for feature selection on the discretised data, PSO-DFS achieves better performance on six datasets, and similar performance on three datasets with a much smaller number of features selected.


congress on evolutionary computation | 2016

A PSO based hybrid feature selection algorithm for high-dimensional classification

Binh Tran; Mengjie Zhang; Bing Xue

Recent research has shown that Particle Swarm Optimisation is a promising approach to feature selection. However, applying it on high-dimensional data with thousands to tens of thousands of features is still challenging because of the large search space. While filter approaches are time efficient and scalable for high-dimensional data, they usually obtain lower classification accuracy than wrapper approaches. On the other hand, wrapper methods require a longer running time than filter methods due to the learning algorithm involved in fitness evaluation. This paper proposes a new strategy of combining filter and wrapper approaches in a single evolutionary process in order to achieve smaller feature subsets with better classification performance in a shorter time. A new local search heuristic using symmetric uncertainty is proposed to refine the solutions found by PSO and a new hybrid fitness function is used to better evaluate candidate solutions. The proposed method is examined and compared with three recent PSO based methods on eight high-dimensional problems of varying difficulty. The results show that the new hybrid PSO is more effective and efficient than the other methods.


Connection Science | 2016

Investigation on particle swarm optimisation for feature selection on high-dimensional data: local search and selection bias

Binh Tran; Bing Xue; Mengjie Zhang; Su Nguyen

ABSTRACT Feature selection is an essential step in classification tasks with a large number of features, such as in gene expression data. Recent research has shown that particle swarm optimisation (PSO) is a promising approach to feature selection. However, it also has potential limitation to get stuck into local optima, especially for gene selection problems with a huge search space. Therefore, we developed a PSO algorithm (PSO-LSRG) with a fast “local search” combined with a gbest resetting mechanism as a way to improve the performance of PSO for feature selection. Furthermore, since many existing PSO-based feature selection approaches on the gene expression data have feature selection bias, i.e. no unseen test data is used, 2 sets of experiments on 10 gene expression datasets were designed: with and without feature selection bias. As compared to standard PSO, PSO with gbest resetting only, and PSO with local search only, PSO-LSRG obtained a substantial dimensionality reduction and a significant improvement on the classification performance in both sets of experiments. PSO-LSRG outperforms the other three algorithms when feature selection bias exists. When there is no feature selection bias, PSO-LSRG selects the smallest number of features in all cases, but the classification performance is slightly worse in a few cases, which may be caused by the overfitting problem. This shows that feature selection bias should be avoided when designing a feature selection algorithm to ensure its generalisation ability on unseen data.


simulated evolution and learning | 2017

Automatic Feature Construction for Network Intrusion Detection

Binh Tran; Stjepan Picek; Bing Xue

The notion of cyberspace became impossible to separate from the notions of cyber threat and cyberattack. Since cyberattacks are getting easier to run, they are also becoming more serious threats from the economic damage perspective. Consequently, we are evident of a continuous adversarial relationship between the attackers trying to mount as powerful as possible attacks and defenders trying to stop the attackers in their goals. To defend against such attacks, defenders have at their disposal a plethora of techniques but they are often falling behind the attackers due to the fact that they need to protect the whole system while the attacker needs to find only a single weakness to exploit. In this paper, we consider one type of a cyberattack – network intrusion – and investigate how to use feature construction via genetic programming in order to improve the intrusion detection accuracy. The obtained results show that feature construction offers improvements in a number of tested scenarios and therefore should be considered as an important step in defense efforts. Such improvements are especially apparent in scenario with the highly unbalanced data, which also represents the most interesting case from the defensive perspective.

Collaboration


Dive into the Binh Tran's collaboration.

Top Co-Authors

Avatar

Bing Xue

Victoria University of Wellington

View shared research outputs
Top Co-Authors

Avatar

Mengjie Zhang

Victoria University of Wellington

View shared research outputs
Top Co-Authors

Avatar

Su Nguyen

Victoria University of Wellington

View shared research outputs
Top Co-Authors

Avatar

Stjepan Picek

Katholieke Universiteit Leuven

View shared research outputs
Researchain Logo
Decentralizing Knowledge