Publication


Featured research published by Thanh-Binh Le.


Pattern Recognition Letters | 2015

Modified criterion to select useful unlabeled data for improving semi-supervised support vector machines

Thanh-Binh Le; Sang-Woon Kim

Highlights:
- A small amount of unlabeled data was selected to enhance the classification accuracy of S3VMs.
- To select them efficiently, the impacts of the labeled data and the unlabeled data were balanced.
- The class-conditional probabilities of unlabeled samples were utilized as uncertainty levels.
- Run-time characteristics and error rates of the modified criterion were empirically evaluated.

Recent studies have demonstrated that semi-supervised learning (SSL) approaches that use both labeled and unlabeled data are more effective and robust than those that use only labeled data. In SemiBoost, a boosting framework for SSL, a similarity-based criterion is developed to select (and utilize) a small amount of useful unlabeled data. However, it sometimes does not work appropriately, particularly when the unlabeled data are near the decision boundary. To address this concern, this paper modifies the selection criterion using the class-conditional probability in addition to the similarity: first, the criterion is decomposed into three terms (a positive-class term, a negative-class term, and an unlabeled term); second, when computing the confidences of unlabeled data, the estimated conditional probability is used to adjust the impacts of the three terms on the confidences; third, some unlabeled data with higher confidences are selected and, together with the labeled data, used for re-training a supervised classifier. This select-and-train process is repeated until a termination condition is met. The experimental results, obtained using semi-supervised support vector machines (S3VMs) with benchmark data, demonstrate that the proposed algorithm can compensate for the shortcomings of the traditional S3VMs and, compared with previous approaches, achieves further improved results in terms of classification accuracy.
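
For readers unfamiliar with the generic select-and-train loop described above, a minimal sketch follows, assuming scikit-learn's SVC as the supervised base classifier; the simple threshold on predict_proba is an illustrative stand-in for the paper's modified similarity/class-conditional criterion.

```python
# Minimal sketch of the select-and-train loop (not the paper's exact criterion).
# The confidence rule below uses only the classifier's class-conditional
# probabilities; the paper combines them with SemiBoost-style similarities.
import numpy as np
from sklearn.svm import SVC

def select_and_train(X_lab, y_lab, X_unlab, conf_threshold=0.9, max_iter=10):
    X_l, y_l, X_u = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    clf = SVC(probability=True).fit(X_l, y_l)
    for _ in range(max_iter):
        if len(X_u) == 0:
            break
        proba = clf.predict_proba(X_u)        # class-conditional probabilities
        conf = proba.max(axis=1)              # confidence of the predicted label
        picked = conf >= conf_threshold       # keep only high-confidence samples
        if not picked.any():
            break                             # termination condition met
        pseudo = clf.predict(X_u[picked])     # pseudo-labels for selected samples
        X_l = np.vstack([X_l, X_u[picked]])
        y_l = np.concatenate([y_l, pseudo])
        X_u = X_u[~picked]                    # remove selected samples from the pool
        clf = SVC(probability=True).fit(X_l, y_l)  # re-train the supervised classifier
    return clf
```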


international conference on pattern recognition applications and methods | 2014

On Selecting Helpful Unlabeled Data for Improving Semi-Supervised Support Vector Machines

Thanh-Binh Le; Sang-Woon Kim

Recent studies have demonstrated that semi-supervised learning (SSL) approaches that use both labeled and unlabeled data are more effective and robust than those that use only labeled data. However, it is also well known that using unlabeled data is not always helpful in SSL algorithms. Thus, in order to select a small amount of helpful unlabeled samples, various selection criteria have been proposed in the literature. One criterion is based on the prediction of an ensemble classifier and the similarity between pairwise training samples. However, because the criterion is only concerned with the distance information among the samples, it sometimes does not work appropriately, particularly when the unlabeled samples are near the decision boundary. To address this concern, a method of training semi-supervised support vector machines (S3VMs) using a selection criterion is investigated; this criterion is a modified version of that used in SemiBoost. In addition to the quantities of the original criterion, the estimated conditional class probability is used to compute the confidence values of the unlabeled data. Then, some unlabeled samples with higher confidences are selected and, together with the labeled data, used for retraining the ensemble classifier. The experimental results, obtained using artificial and real-life benchmark datasets, demonstrate that the proposed mechanism can compensate for the shortcomings of the traditional S3VMs and, compared with previous approaches, can achieve further improved results in terms of classification accuracy.


Neurocomputing | 2016

On measuring confidence levels using multiple views of feature set for useful unlabeled data selection

Thanh-Binh Le; Sang-Woon Kim

This paper concerns the use of multiple views of a feature set to select a small amount of useful unlabeled data. In the semi-supervised learning (SSL) approach, using a selection strategy, strongly discriminative examples are first selected from the unlabeled data and then, together with the labeled data, used for training a (supervised) classifier or for re-training the ensemble classifier. In this scenario, the selection strategy plays an important role in improving classification performance. This paper investigates a new selection strategy for the case in which the data are composed of multiple views: first, multiple views of the data are derived independently; second, each view is used to measure a corresponding confidence level with which the examples to be selected are evaluated; third, the confidence levels measured from the multiple views are combined as a weighted average to derive the target confidence; this select-and-train process is repeated for a pre-defined number of iterations. The experimental results, obtained using semi-supervised support vector machines for synthetic and real-life benchmark data, demonstrate that the proposed mechanism can compensate for the shortcomings of traditional strategies. In particular, the results demonstrate that when the data is appropriately decomposed into multiple views, this strategy can achieve further improved results in terms of classification accuracy.

Highlights:
- A small amount of unlabeled data was selected to enhance classification accuracy.
- To select them efficiently, multiple views of the labeled and unlabeled data were utilized.
- The confidence levels of the multiple views are combined to derive the target confidence.
- Run-time characteristics and error rates of the proposed criterion were empirically evaluated.
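
A rough sketch of the weighted-average confidence step is shown below, assuming each view is simply a subset of feature columns; the view construction and the weights are illustrative placeholders rather than the paper's actual procedure.

```python
# Sketch: weighted average of per-view confidences (view splits and weights are illustrative).
import numpy as np
from sklearn.svm import SVC

def multiview_confidence(X_lab, y_lab, X_unlab, views, weights):
    """views: list of column-index arrays, one feature subset per view.
    weights: one non-negative weight per view (normalized below)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    conf = np.zeros(len(X_unlab))
    for cols, w in zip(views, weights):
        clf = SVC(probability=True).fit(X_lab[:, cols], y_lab)  # per-view classifier
        proba = clf.predict_proba(X_unlab[:, cols])
        conf += w * proba.max(axis=1)          # weighted contribution of this view
    return conf                                # target confidence per unlabeled sample
```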


International Conference on Electronics, Information and Communications (ICEIC) | 2014

Simply recycled selection and incrementally reinforced selection methods applicable for semi-supervised learning algorithms

Thanh-Binh Le; Sang-Woon Kim

This paper presents an empirical study on selecting a small amount of useful unlabeled data with which the classification accuracy of semi-supervised learning (SSL) algorithms can be improved. In particular, two selection strategies, named simply recycled selection and incrementally reinforced selection, are considered and empirically compared. The experimental results, obtained with well-known benchmark data sets, demonstrate that the latter works better than the former in terms of classification accuracy.


Neurocomputing | 2017

Multi-view based unlabeled data selection using feature transformation methods for semiboost learning

Thanh-Binh Le; Sugwon Hong; Sang-Woon Kim

Highlights:
- To enhance classification accuracy, useful unlabeled data are selected based on multiple views.
- To obtain the multiple views of the feature set, transformation-based methods are used.
- Run-time characteristics and error rates of the proposed method were empirically evaluated.
- The method outperformed traditional methods, including decomposition-based methods.

SemiBoost (Mallapragada et al., 2009) is a boosting framework for semi-supervised learning, in which unlabeled data as well as labeled data contribute to learning. Various strategies have been proposed in the literature to perform the task of selecting useful unlabeled data in SemiBoost. Recently, a multi-view based strategy was proposed in Le and Kim (2016), in which the feature set of the data is decomposed into subsets (i.e., multiple views) using a feature-decomposition method. In the decomposition process, the strategy inevitably incurs some loss of information. To avoid this drawback, this paper considers feature-transformation methods, rather than the decomposition method, to obtain the multiple views. More specifically, in the feature-transformation method, a number of views are obtained from the entire feature set using the same number of different mapping functions. After deriving the views of the data, each view is used to measure a corresponding confidence level with which the examples to be selected are evaluated. Then, the confidence levels measured from the multiple views are combined as a weighted average to derive a target confidence. The experimental results, obtained using support vector machines for well-known benchmark data, demonstrate that the proposed mechanism can compensate for the shortcomings of the traditional strategies. In addition, the results demonstrate that when the data is transformed appropriately into multiple views, the strategy can achieve further improvement in terms of classification accuracy.
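
A brief illustration of transformation-based views follows, using PCA and Gaussian random projection as stand-in mapping functions; the abstract does not name the actual transformations used in the paper.

```python
# Sketch: build multiple views via feature transformations instead of column splits.
# PCA and random projection are illustrative stand-ins for the paper's mapping functions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.random_projection import GaussianRandomProjection
from sklearn.svm import SVC

def transformed_view_confidence(X_lab, y_lab, X_unlab, n_components=5, weights=(0.5, 0.5)):
    transforms = [
        PCA(n_components=n_components).fit(X_lab),                  # view 1: PCA mapping
        GaussianRandomProjection(n_components=n_components,
                                 random_state=0).fit(X_lab),        # view 2: random projection
    ]
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    conf = np.zeros(len(X_unlab))
    for t, w in zip(transforms, weights):
        clf = SVC(probability=True).fit(t.transform(X_lab), y_lab)  # per-view classifier
        conf += w * clf.predict_proba(t.transform(X_unlab)).max(axis=1)
    return conf
```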


international conference on natural computation | 2016

Choosing unlabeled examples for SemiBoost using modified cuckoo search algorithms

Trung Hai Nguyen; Thanh-Binh Le; Sang-Woon Kim

SemiBoost (SB) is a boosting framework in which both labeled and unlabeled examples contribute to learning. In order to perform the task of choosing unlabeled examples in SB, various strategies have been proposed in the literature. Meanwhile, cuckoo search (CS) is a meta-heuristic algorithm whose solutions have been reported to be better than those obtained by particle swarm optimization (PSO) and genetic algorithms (GA). Recently, rather than using the original CS algorithm, several modifications have been proposed to improve its search performance. This paper presents an empirical comparison of modified CS algorithms used to select useful unlabeled examples for SB learning. In particular, several modification strategies were considered and compared empirically, including a CS modified by using the probability estimate in initialization (CS-pE); a modification for unconstrained optimization problems that changes the step size; a modified CS using Centroidal Voronoi Tessellations in initialization; and modified CSs using adaptive parameters. The experimental results show that the CS-pE strategy works better than the other modifications on synthetic and real-life benchmark data in terms of classification accuracy. In addition, the results show that when a set of fine-tuned parameters is provided and the search process is appropriately initialized, the strategy achieves further improvement in accuracy.
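
A highly simplified cuckoo-search sketch for choosing a subset of unlabeled examples is shown below; the binary encoding, Lévy-flight step, and validation-accuracy fitness are illustrative assumptions, not the modified variants (CS-pE, CVT initialization, adaptive parameters) compared in the paper.

```python
# Highly simplified cuckoo-search sketch for choosing unlabeled examples.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def fitness(mask, X_lab, y_lab, X_unlab, X_val, y_val):
    """Train on labeled data plus pseudo-labeled selected samples; score on a validation set."""
    base = SVC().fit(X_lab, y_lab)
    if mask.any():
        pseudo = base.predict(X_unlab[mask])
        X = np.vstack([X_lab, X_unlab[mask]])
        y = np.concatenate([y_lab, pseudo])
    else:
        X, y = X_lab, y_lab
    return SVC().fit(X, y).score(X_val, y_val)

def levy_step(size, beta=1.5):
    """Mantegna's approximation of a Lévy-distributed step."""
    from math import gamma, sin, pi
    sigma = (gamma(1 + beta) * sin(pi * beta / 2) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0, sigma, size)
    v = rng.normal(0, 1, size)
    return u / np.abs(v) ** (1 / beta)

def cuckoo_select(X_lab, y_lab, X_unlab, X_val, y_val, n_nests=10, n_iter=20, pa=0.25):
    n = len(X_unlab)
    nests = rng.random((n_nests, n))                      # continuous positions in [0, 1]
    scores = np.array([fitness(m, X_lab, y_lab, X_unlab, X_val, y_val)
                       for m in nests > 0.5])             # fitness of each binary mask
    for _ in range(n_iter):
        # Generate new solutions by Lévy flights around the current nests.
        cand = np.clip(nests + 0.01 * levy_step((n_nests, n)), 0, 1)
        cand_scores = np.array([fitness(m, X_lab, y_lab, X_unlab, X_val, y_val)
                                for m in cand > 0.5])
        better = cand_scores > scores                     # keep improved nests
        nests[better], scores[better] = cand[better], cand_scores[better]
        # Abandon a fraction pa of the worst nests and rebuild them at random.
        worst = np.argsort(scores)[: int(pa * n_nests)]
        nests[worst] = rng.random((len(worst), n))
        scores[worst] = [fitness(m, X_lab, y_lab, X_unlab, X_val, y_val)
                         for m in nests[worst] > 0.5]
    return nests[np.argmax(scores)] > 0.5                 # boolean mask over X_unlab
```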


international conference on pattern recognition applications and methods | 2015

On Selecting Useful Unlabeled Data Using Multi-view Learning Techniques

Thanh-Binh Le; Sang-Woon Kim

In a semi-supervised learning approach, using a selection strategy, strongly discriminative examples are first selected from the unlabeled data and then, together with the labeled data, used for training a (supervised) classifier. This paper investigates a new selection strategy for the case when the data are composed of multiple views: first, multiple views of the data are derived independently; second, each view is used to measure a corresponding confidence level with which the examples to be selected are evaluated; third, the confidence levels measured from the multiple views are combined as a weighted average to derive a target confidence; this select-and-train process is repeated for a predefined number of iterations. The experimental results, obtained using synthetic and real-life benchmark data, demonstrate that the proposed mechanism can compensate for the shortcomings of the traditional strategies. In particular, the results demonstrate that when the data is appropriately decomposed into multiple views, the strategy can achieve further improved results in terms of classification accuracy.


international conference on industrial, engineering and other applications of applied intelligent systems | 2015

Comparison of Adjusted Methods for Selecting Useful Unlabeled Data for Semi-Supervised Learning Algorithms

Thanh-Binh Le; Sang-Woon Kim

This paper presents a comparison of methods for selecting a small amount of useful unlabeled data to improve the classification accuracy of semi-supervised learning (SSL) algorithms. In particular, three selection approaches, namely, the simply adjusted approach based on an uncertainty level, the normalized-and-adjusted approach, and the entropy-based adjusted approach, are considered and compared empirically. The experimental results, obtained from synthetic and real-life benchmark data using semi-supervised support vector machines (S3VMs), demonstrate that the entropy-based approach works slightly better than the other ones in terms of classification accuracy.
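
The abstract does not give the three adjustment formulas; the following sketch shows one plausible reading of each, applied to a classifier's predicted class probabilities.

```python
# Illustrative confidence adjustments (the paper's exact formulas are not given in the abstract).
import numpy as np

def adjusted_confidences(proba):
    """proba: (n_samples, n_classes) predicted class probabilities for unlabeled data."""
    raw = proba.max(axis=1)                                 # uncertainty-level based confidence
    simply_adjusted = raw                                   # use the uncertainty level directly
    normalized = (raw - raw.min()) / (np.ptp(raw) + 1e-12)  # normalized-and-adjusted variant
    entropy = -(proba * np.log(proba + 1e-12)).sum(axis=1)  # Shannon entropy of the prediction
    entropy_adjusted = 1.0 - entropy / np.log(proba.shape[1])  # low entropy -> high confidence
    return simply_adjusted, normalized, entropy_adjusted
```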


Pattern Recognition Letters | 2014

On incrementally using a small portion of strong unlabeled data for semi-supervised learning algorithms

Thanh-Binh Le; Sang-Woon Kim


international conference on pattern recognition applications and methods | 2012

On Improving Semi-Supervised MarginBoost Incrementally Using Strong Unlabeled Data

Thanh-Binh Le; Sang-Woon Kim
