Huei Diana Lee | Researchain

Publication

Featured research published by Huei Diana Lee.

Electronic Notes in Theoretical Computer Science | 2013

A Comparison of Multi-label Feature Selection Methods using the Problem Transformation Approach

Newton Spolaôr; Everton Alvares Cherman; Maria Carolina Monard; Huei Diana Lee

Feature selection is an important task in machine learning, which can effectively reduce the dataset dimensionality by removing irrelevant and/or redundant features. Although a large body of research deals with feature selection in single-label data, in which measures have been proposed to filter out irrelevant features, this is not the case for multi-label data. This work proposes multi-label feature selection methods which use the filter approach. To this end, two standard multi-label feature selection approaches, which transform the multi-label data into single-label data, are used. Besides these two problem transformation approaches, we use ReliefF and Information Gain to measure the goodness of features. This gives rise to four multi-label feature selection methods. A thorough experimental evaluation of these methods was carried out on 10 benchmark datasets. Results show that ReliefF is able to select fewer features without diminishing the quality of the classifiers constructed using the features selected.
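As a rough illustration of the filter idea described above, the sketch below scores discrete features with Information Gain after a binary-relevance style transformation, treating each label as its own single-label problem and averaging the gains. The abstract does not name the two transformations, so binary relevance is an assumption here, and the function names and toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def information_gain(feature, label):
    """IG(label; feature) = H(label) - H(label | feature) for discrete columns."""
    total = len(label)
    conditional = 0.0
    for value in set(feature):
        subset = [l for f, l in zip(feature, label) if f == value]
        conditional += (len(subset) / total) * entropy(subset)
    return entropy(label) - conditional


def br_information_gain(X, Y):
    """Average each feature's IG over all labels (assumed binary-relevance scheme).

    X: list of rows of discrete feature values; Y: list of rows of 0/1 labels.
    Returns one score per feature; higher means more informative.
    """
    n_features, n_labels = len(X[0]), len(Y[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        gains = [information_gain(column, [row[k] for row in Y]) for k in range(n_labels)]
        scores.append(sum(gains) / n_labels)
    return scores


# Hypothetical toy data: feature 0 determines both labels, feature 1 is noise.
X = [[0, 1], [0, 0], [1, 1], [1, 0]]
Y = [[0, 0], [0, 0], [1, 1], [1, 1]]
scores = br_information_gain(X, Y)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

A filter method would then keep the top-ranked features before any classifier is trained, which is what makes the approach classifier-independent.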

Neurocomputing | 2016

A systematic review of multi-label feature selection and a new method based on label construction

Newton Spolaôr; Maria Carolina Monard; Grigorios Tsoumakas; Huei Diana Lee

Each example in a multi-label dataset is associated with multiple labels, which are often correlated. Learning from this data can be improved when dimensionality reduction tasks, such as feature selection, are applied. The standard approach for multi-label feature selection transforms the multi-label dataset into single-label datasets before using traditional feature selection algorithms. However, this approach often ignores label dependence. In this work, we propose an alternative method, LCFS, that constructs new labels based on relations between the original labels. By doing so, the label set from the data is augmented with second-order information before applying the standard approach. To assess LCFS, an experimental evaluation using Information Gain as a measure to estimate the importance of features was carried out on 10 benchmark multi-label datasets. This evaluation compared four LCFS settings with the standard approach, using random feature selection as a reference. For each dataset, the performance of a feature selection method is estimated by the quality of the classifiers built from the data described by the features selected by the method. The results show that a simple LCFS setting gave rise to classifiers similar to, or better than, the ones built using the standard approach. Furthermore, this work also pioneers the use of the systematic review method to survey the related work on multi-label feature selection. The summary of the 99 papers found promotes the idea that exploring label dependence during feature selection can lead to good results.

Highlights: By constructing new labels, LCFS considers label relations from a multi-label dataset. An LCFS setting achieved performance competitive with the standard approach. LCFS helped build classifiers that outperform those based on the experimental references. We also pioneer the use of systematic review in the multi-label feature selection literature. The summary of the 99 papers found evidence that agrees with the LCFS results.

Brazilian Conference on Intelligent Systems | 2013

ReliefF for Multi-label Feature Selection

Newton Spolaôr; Everton Alvares Cherman; Maria Carolina Monard; Huei Diana Lee

The feature selection process aims to select a subset of relevant features to be used in model construction, reducing data dimensionality by removing irrelevant and redundant features. Although effective feature selection methods to support single-label learning abound, this is not the case for multi-label learning. Furthermore, most of the multi-label feature selection methods proposed first transform the multi-label data into single-label data, to which a traditional feature selection method is then applied. However, applying single-label feature selection methods after transforming the data can hinder the exploration of label dependence, an important issue in multi-label learning. This work proposes a new multi-label feature selection algorithm, RF-ML, by extending the single-label feature selection algorithm ReliefF. Unlike strictly univariate measures for feature ranking, RF-ML takes into account the effect of interacting attributes and deals with multi-label data directly, without any data transformation. Using synthetic datasets, the proposed algorithm is experimentally compared to ReliefF applied to multi-label data previously transformed into single-label data using two well-known data transformation approaches. Results show that the proposed algorithm stands out by ranking the relevant features as the best ones more often.
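For readers unfamiliar with the Relief family, the following is a minimal sketch of basic single-label Relief, the predecessor of ReliefF. It is not RF-ML and not full ReliefF (which uses k neighbours and handles multiple classes and missing values); it only shows the core idea of rewarding features that differ on the nearest example of another class and penalising features that differ on the nearest example of the same class. The function name and toy data are hypothetical.

```python
def relief_weights(X, y):
    """Basic Relief for numeric features scaled to [0, 1] and a single class label.

    For every example, find its nearest neighbour of the same class (hit) and of
    a different class (miss); reward features that differ on the miss and
    penalise features that differ on the hit.
    """
    n, m = len(X), len(X[0])
    weights = [0.0] * m

    def distance(a, b):
        return sum(abs(u - v) for u, v in zip(a, b))  # Manhattan distance

    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        if not hits or not misses:
            continue  # cannot score an example lacking a hit or a miss
        hit = min(hits, key=lambda j: distance(X[i], X[j]))
        miss = min(misses, key=lambda j: distance(X[i], X[j]))
        for f in range(m):
            weights[f] += (abs(X[i][f] - X[miss][f]) - abs(X[i][f] - X[hit][f])) / n
    return weights


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
X = [[0.0, 0.2], [0.1, 0.9], [0.9, 0.1], [1.0, 0.8]]
y = [0, 0, 1, 1]
weights = relief_weights(X, y)
```

Because the neighbours are found in the full feature space, the weight of one feature depends on the values of the others, which is why Relief-style measures are not strictly univariate.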

Brazilian Symposium on Artificial Intelligence | 2012

Filter approach feature selection methods to support multi-label learning based on ReliefF and Information Gain

Newton Spolaôr; Everton Alvares Cherman; Maria Carolina Monard; Huei Diana Lee

In multi-label learning, each example in the dataset is associated with a set of labels, and the task of the generated classifier is to predict the label set of unseen examples. Feature selection is an important task in machine learning, which aims to find a small number of features that describes the dataset as well as, or even better than, the original set of features does. This can be achieved by removing irrelevant and/or redundant features according to some importance criterion. Although effective feature selection methods to support classification for single-label data abound, this is not the case for multi-label data. This work proposes two multi-label feature selection methods which use the filter approach. This approach evaluates statistics of the data independently of any particular classifier. To this end, ReliefF, a single-label feature selection method, and an adaptation of the Information Gain measure for multi-label data are used to find the features that should be selected. Both methods were experimentally evaluated on ten benchmark datasets, taking into account the reduction in the number of features as well as the quality of the generated classifiers, and showed promising results.

International Conference on Evolutionary Multi-Criterion Optimization | 2011

Multi-objective genetic algorithm evaluation in feature selection

Newton Spolaôr; Ana Carolina Lorena; Huei Diana Lee

Feature Selection may be viewed as a search for optimal feature subsets considering one or more importance criteria. This search may be performed with Multi-objective Genetic Algorithms. In this work, we present an application of these algorithms for combining different filter approach criteria, which rely on general characteristics of the data, such as feature-class correlation, to perform the search for subsets of features. We conducted experiments on public data sets, and the results show the potential of this proposal when compared to mono-objective genetic algorithms and two popular filter algorithms.
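The multi-objective view can be made concrete with Pareto dominance, the relation such algorithms typically use to compare candidate subsets scored on several criteria at once. The sketch below covers only that comparison step, not the genetic algorithm itself; the two criteria and their values are hypothetical, and every criterion is assumed to be maximised.

```python
def dominates(a, b):
    """True if score vector a is at least as good as b everywhere and better somewhere.

    All criteria are assumed to be maximised.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the candidates whose score vectors no other candidate dominates.

    candidates: dict mapping a feature subset (frozenset) to its score tuple.
    """
    return {
        subset: scores
        for subset, scores in candidates.items()
        if not any(dominates(other, scores) for other in candidates.values())
    }


# Hypothetical scores on two filter criteria for three feature subsets.
candidates = {
    frozenset({0}): (0.9, 0.2),
    frozenset({1}): (0.4, 0.8),
    frozenset({0, 1}): (0.3, 0.1),
}
front = pareto_front(candidates)
```

Here the subsets {0} and {1} trade one criterion against the other and both stay on the front, while {0, 1} is worse on both and is discarded.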

Expert Systems With Applications | 2016

Prototype system for feature extraction, classification and study of medical images

Jefferson Tales Oliva; Huei Diana Lee; Newton Spolaôr; Cláudio Saddy Rodrigues Coy; Feng Chung Wu

Highlights: MIAS 3.0 supports automatic feature extraction and medical image classification. The system was developed according to prototyping, a software engineering approach. Experts found that MIAS 3.0 meets the proposed requirements and is a promising tool. The experimental evaluation of MIAS 3.0 was conducted on 67 image fragments. A J48 classifier built from Amadasun and Haralick features was the best MIAS setting.

Colonoscopy exam images are useful for identifying diseases such as colorectal cancer, which is one of the most common cancers worldwide. Computational image analysis and machine learning techniques can help experts identify abnormalities in these images. In this work, we present and evaluate MIAS 3.0, which aims to help experts study and analyze colon tissue images. To do so, the system first extracts features from these images. Currently, Amadasun, Haralick and Laws texture descriptors are supported. Then, the described images are classified as normal or abnormal. In this version, the J48, nearest neighbor, backpropagation-based multilayer perceptron, naive Bayes, and support vector machine classification algorithms are implemented. MIAS was developed with open source technologies using a software engineering approach to improve flexibility and maintainability. In this work, MIAS was quantitatively assessed by applying it to a set of 134 tissue image fragments. The classifiers built from this set were compared according to the cross-validation and contingency table strategies. The system was also qualitatively evaluated against 12 heuristics by 12 volunteers from Health and Exact Sciences. The issues found were categorized according to Rolf Molich's severity scale. As a result, the J48 classifier achieved the highest sensitivity (85.07%) and a reasonable average error (18.68%). In the qualitative evaluation, 61.26% of the issues found were not considered serious. These assessments suggest that MIAS can help domain experts with minimal knowledge of informatics to conduct more complete studies of medical images, by identifying patterns regarding different abnormalities.

Brazilian Conference on Intelligent Systems | 2014

Label Construction for Multi-label Feature Selection

Newton Spolaôr; Maria Carolina Monard; Grigorios Tsoumakas; Huei Diana Lee

Multi-label learning handles datasets where each instance is associated with multiple labels, which are often correlated. Like other machine learning tasks, multi-label learning suffers from the curse of dimensionality, which can be mitigated by dimensionality reduction tasks, such as feature selection. The standard approach for multi-label feature selection transforms the multi-label dataset into single-label datasets before using traditional feature selection algorithms. However, this approach often ignores label dependence. This work proposes an alternative method, LCFS, which constructs new labels based on relations between the original labels to augment the label set of the original dataset. Afterwards, the augmented dataset is submitted to the standard multi-label feature selection approach. Experiments using Information Gain as a measure to evaluate features were carried out on 10 multi-label benchmark datasets. For each dataset, the quality of the features selected was assessed by the quality of the classifiers built using the features selected by the standard approach in the original dataset, as well as in the dataset constructed by four LCFS settings. The results show that setting LCFS with simple strategies using pairs of labels gives rise to better classifiers than the ones built using the standard approach in the original dataset. Moreover, these good results are achieved when a small number of features is selected.
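The label construction step might look roughly like the sketch below, which appends one new binary label for each pair of original labels. The abstract does not say which relation LCFS computes or how pairs are chosen, so using every pair and combining with XOR are assumptions made purely for illustration; the function name is hypothetical.

```python
from itertools import combinations


def construct_pair_labels(Y, combine=lambda a, b: a ^ b):
    """Append one constructed label per pair of original labels.

    Y: list of rows of 0/1 labels. `combine` is a hypothetical choice (XOR by
    default); the abstract does not say which relation LCFS uses.
    """
    n_labels = len(Y[0])
    pairs = list(combinations(range(n_labels), 2))
    return [row + [combine(row[i], row[j]) for i, j in pairs] for row in Y]


# Hypothetical label matrix with three original labels.
Y = [[1, 0, 1], [0, 0, 1]]
augmented = construct_pair_labels(Y)
```

The augmented label matrix would then be passed to the standard transformation-based feature selection, so that the feature scores also reflect second-order label relations.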

Ibero-American Conference on AI | 2006

A fractal dimension based filter algorithm to select features for supervised learning

Huei Diana Lee; Maria Carolina Monard; Feng Chung Wu

Feature selection plays an important role in machine learning and is often applied as a data pre-processing step. Its objective is to choose a subset of the original set of features that describes a data set, according to some importance criterion, by removing irrelevant and/or redundant features, as they may decrease data quality and reduce the comprehensibility of hypotheses induced by supervised learning algorithms. Most state-of-the-art feature selection algorithms focus mainly on finding relevant features. However, it has been shown that relevance alone is not sufficient to select important features. It is also important to deal with the problem of feature redundancy. To select some features and discard others, it is necessary to measure the goodness (importance) of features, and many importance measures have been proposed. This work proposes a filter algorithm that decouples relevance and redundancy analysis, and introduces the use of Fractal Dimension to deal with redundant features. Empirical results on several data sets show that Fractal Dimension is an appropriate criterion to filter out redundant features for supervised learning.

Brazilian Symposium on Neural Networks | 2010

Use of Multiobjective Genetic Algorithms in Feature Selection

Newton Spolaôr; Ana Carolina Lorena; Huei Diana Lee

The intelligent analysis of databases may be affected by the presence of unimportant features, which motivates the application of Feature Selection. By treating this task as a search and optimization process, it is possible to use the synergy between Genetic Algorithms and Multi-objective Optimization to carry out the search for (quasi) optimal subsets of features considering possibly conflicting importance criteria. This work presents an application of Multi-objective Genetic Algorithms to the Feature Selection problem, combining different criteria measuring the importance of the subsets of features.

Acta Cirurgica Brasileira | 2004

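Energia total de ruptura: um teste biomecânico para avaliação de material biológico com propriedade viscoelástica não linear (Total energy of rupture: a biomechanical test for evaluating biological material with a non-linear viscoelastic property)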

Feng Chung Wu; Huei Diana Lee; Renato Bobsin Machado; Sérgio Dalmás; Cláudio Saddy Rodrigues Coy; Juvenal Ricardo Navarro Góes; João José Fagundes

Purpose: To present the Total Energy of Rupture biomechanical test for evaluating the intrinsic resistance of the rat left colon, which exhibits a non-linear viscoelastic property. Methods: The Total Energy of Rupture test (ETR, from the Portuguese Energia Total de Ruptura) and the Biomechanical Data Acquisition and Analysis System (SABI 2.0) were implemented based on physical-mechanical, computational and biomechanical concepts. Fifteen left colon specimens from adult Wistar rats were used in the experiments. Results: The ETR biomechanical test made it possible to calculate the accumulated total energy required to rupture the specimens during the mechanical trial. Automating and managing data acquisition and analysis also made it possible to generate graphs and descriptive and statistical reports. Conclusion: Based on physical-mechanical, computational and biomechanical concepts, the Total Energy of Rupture test provides a mathematical analysis of the behaviour of the rat left colon segment during the experiments, and appears to be a viable method for measuring the intrinsic resistance of this biological material with a non-linear viscoelastic property.
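One plausible reading of "accumulated total energy" is the work done on the specimen, that is, the area under the force-displacement curve up to the point of rupture. The abstract gives no formula, so the sketch below is an assumption: it integrates sampled force over displacement with the trapezoidal rule. The function name and the sample readings are hypothetical and are not data from the study.

```python
def total_energy(displacement_m, force_n):
    """Area under a force-displacement curve by the trapezoidal rule, in joules.

    displacement_m: displacements in metres, in acquisition order.
    force_n: forces in newtons sampled at the same instants.
    """
    if len(displacement_m) != len(force_n):
        raise ValueError("displacement and force must have the same length")
    energy = 0.0
    for k in range(1, len(force_n)):
        step = displacement_m[k] - displacement_m[k - 1]
        energy += 0.5 * (force_n[k] + force_n[k - 1]) * step
    return energy


# Hypothetical readings ending at rupture, where the force drops to zero.
displacement = [0.0, 0.001, 0.002, 0.003]
force = [0.0, 1.0, 2.0, 0.0]
energy = total_energy(displacement, force)
```

Integrating the whole curve, rather than reading off only the peak force, is what lets a single number summarise a material whose force response is non-linear.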
