Newton Spolaôr | Researchain

Publication

Featured research published by Newton Spolaôr.

Electronic Notes in Theoretical Computer Science | 2013

A Comparison of Multi-label Feature Selection Methods using the Problem Transformation Approach

Newton Spolaôr; Everton Alvares Cherman; Maria Carolina Monard; Huei Diana Lee

Feature selection is an important task in machine learning, which can effectively reduce the dataset dimensionality by removing irrelevant and/or redundant features. Although a large body of research deals with feature selection in single-label data, in which measures have been proposed to filter out irrelevant features, this is not the case for multi-label data. This work proposes multi-label feature selection methods which use the filter approach. To this end, two standard multi-label feature selection approaches, which transform the multi-label data into single-label data, are used. Besides these two problem transformation approaches, we use ReliefF and Information Gain to measure the goodness of features. This gives rise to four multi-label feature selection methods. A thorough experimental evaluation of these methods was carried out on 10 benchmark datasets. Results show that ReliefF is able to select fewer features without diminishing the quality of the classifiers constructed using the features selected.
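As an illustration of the filter approach described above, the following minimal Python sketch assumes Binary Relevance as the problem transformation (the abstract does not name the two approaches) and scores discrete features by Information Gain. The function names and the averaging of per-label scores are assumptions made for this example, not the authors' implementation.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(feature, label):
    """IG(label; feature) = H(label) - H(label | feature) for discrete data."""
    n = len(label)
    conditional = 0.0
    for v in set(feature):
        subset = [l for f, l in zip(feature, label) if f == v]
        conditional += len(subset) / n * entropy(subset)
    return entropy(label) - conditional


def br_feature_scores(X, Y):
    """Assumed Binary Relevance scheme: score each feature against each
    label separately, then average the per-label scores."""
    n_features, n_labels = len(X[0]), len(Y[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        per_label = [
            information_gain(column, [y[k] for y in Y]) for k in range(n_labels)
        ]
        scores.append(sum(per_label) / n_labels)
    return scores


# Toy data: feature 0 determines both labels, feature 1 is irrelevant.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [[0, 1], [0, 1], [1, 0], [1, 0]]
scores = br_feature_scores(X, Y)
```

A filter method would then keep the features whose score exceeds a threshold, or the top-ranked ones, before any classifier is trained.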

Neurocomputing | 2016

A systematic review of multi-label feature selection and a new method based on label construction

Newton Spolaôr; Maria Carolina Monard; Grigorios Tsoumakas; Huei Diana Lee

Each example in a multi-label dataset is associated with multiple labels, which are often correlated. Learning from this data can be improved when dimensionality reduction tasks, such as feature selection, are applied. The standard approach for multi-label feature selection transforms the multi-label dataset into single-label datasets before using traditional feature selection algorithms. However, this approach often ignores label dependence. In this work, we propose an alternative method, LCFS, that constructs new labels based on relations between the original labels. By doing so, the label set from the data is augmented with second-order information before applying the standard approach. To assess LCFS, an experimental evaluation using Information Gain as a measure to estimate the importance of features was carried out on 10 benchmark multi-label datasets. This evaluation compared four LCFS settings with the standard approach, using random feature selection as a reference. For each dataset, the performance of a feature selection method is estimated by the quality of the classifiers built from the data described by the features selected by the method. The results show that a simple LCFS setting gave rise to classifiers similar to, or better than, the ones built using the standard approach. Furthermore, this work also pioneers the use of the systematic review method to survey the related work on multi-label feature selection. The summary of the 99 papers found promotes the idea that exploring label dependence during feature selection can lead to good results.

Highlights: By constructing new labels, LCFS considers label relations from a multi-label dataset. An LCFS setting achieved performance competitive with the standard approach. LCFS helped classifiers outperform those based on the experimental references. We also pioneer the use of systematic review on the multi-label feature selection literature. The summary of the 99 papers found evidence that agrees with the LCFS achievements.

Brazilian Conference on Intelligent Systems | 2013

ReliefF for Multi-label Feature Selection

Newton Spolaôr; Everton Alvares Cherman; Maria Carolina Monard; Huei Diana Lee

The feature selection process aims to select a subset of relevant features to be used in model construction, reducing data dimensionality by removing irrelevant and redundant features. Although effective feature selection methods to support single-label learning abound, this is not the case for multi-label learning. Furthermore, most of the multi-label feature selection methods proposed first transform the multi-label data into single-label data, to which a traditional feature selection method is then applied. However, the application of single-label feature selection methods after transforming the data can hinder exploring label dependence, an important issue in multi-label learning. This work proposes a new multi-label feature selection algorithm, RF-ML, by extending the single-label feature selection ReliefF algorithm. RF-ML, unlike strictly univariate measures for feature ranking, takes into account the effect of interacting attributes to directly deal with multi-label data without any data transformation. Using synthetic datasets, the proposed algorithm is experimentally compared to the ReliefF algorithm applied to multi-label data previously transformed to single-label data using two well-known data transformation approaches. Results show that the proposed algorithm stands out by ranking the relevant features as the best ones more often.
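To give a feel for the family of algorithms that RF-ML extends, the sketch below implements the original single-label Relief weighting (one nearest hit and one nearest miss per instance), which is simpler than both ReliefF and RF-ML. It is not the authors' algorithm; the function name, the range normalization, and the toy data are assumptions made for illustration.

```python
def relief_weights(X, y):
    """Simplified Relief for numeric features and single-label data.

    A feature gains weight when it differs on the nearest instance of
    another class (miss) and loses weight when it differs on the nearest
    instance of the same class (hit).
    """
    n, d = len(X), len(X[0])
    lo = [min(row[j] for row in X) for j in range(d)]
    # Feature ranges for normalization; constant features get range 1.0.
    span = [(max(row[j] for row in X) - lo[j]) or 1.0 for j in range(d)]

    def diff(j, a, b):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    weights = [0.0] * d
    for i, xi in enumerate(X):
        hits = [X[k] for k in range(n) if k != i and y[k] == y[i]]
        misses = [X[k] for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue  # no neighbour of one kind, so skip this instance
        hit = min(hits, key=lambda row: dist(xi, row))
        miss = min(misses, key=lambda row: dist(xi, row))
        for j in range(d):
            weights[j] += (diff(j, xi, miss) - diff(j, xi, hit)) / n
    return weights


# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]]
y = [0, 0, 1, 1]
weights = relief_weights(X, y)
```

Because neighbours are found in the full feature space, the weights reflect feature interactions, which is the property the abstract contrasts with strictly univariate measures.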

Brazilian Symposium on Artificial Intelligence | 2012

Filter approach feature selection methods to support multi-label learning based on ReliefF and Information Gain

Newton Spolaôr; Everton Alvares Cherman; Maria Carolina Monard; Huei Diana Lee

In multi-label learning, each example in the dataset is associated with a set of labels, and the task of the generated classifier is to predict the label set of unseen examples. Feature selection is an important task in machine learning, which aims to find a small number of features that describes the dataset as well as, or even better than, the original set of features does. This can be achieved by removing irrelevant and/or redundant features according to some importance criterion. Although effective feature selection methods to support classification for single-label data abound, this is not the case for multi-label data. This work proposes two multi-label feature selection methods which use the filter approach. This approach evaluates statistics of the data independently of any particular classifier. To this end, ReliefF, a single-label feature selection method, and an adaptation of the Information Gain measure for multi-label data are used to find the features that should be selected. Both methods were experimentally evaluated on ten benchmark datasets, taking into account the reduction in the number of features as well as the quality of the generated classifiers, showing promising results.

Electronic Notes in Theoretical Computer Science | 2014

A Framework to Generate Synthetic Multi-label Datasets

Jimena Torres Tomás; Newton Spolaôr; Everton Alvares Cherman; Maria Carolina Monard

A controlled environment based on known properties of the dataset used by a learning algorithm is useful to empirically evaluate machine learning algorithms. Synthetic (artificial) datasets are used for this purpose. Although there are publicly available frameworks to generate synthetic single-label datasets, this is not the case for multi-label datasets, in which each instance is associated with a set of labels that are usually correlated. This work presents Mldatagen, a multi-label dataset generator framework we have implemented, which is publicly available to the community. Currently, two strategies have been implemented in Mldatagen: hypersphere and hypercube. For each label in the multi-label dataset, these strategies randomly generate a geometric shape (hypersphere or hypercube), which is populated with randomly generated points (instances). Afterwards, each instance is labeled according to the shapes it belongs to, which defines its multi-label. Experiments with a multi-label classification algorithm on six synthetic datasets illustrate the use of Mldatagen.
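The hypersphere strategy described above can be sketched roughly as follows. This is an illustrative approximation, not Mldatagen itself: the function name, the parameter names, the sampling ranges for centers and radii, and the use of rejection sampling are all assumptions.

```python
import math
import random


def generate_hypersphere_dataset(n_labels, n_features, n_per_label, seed=0):
    """Generate one random hypersphere per label, fill each with random
    points, and label every point by all the hyperspheres containing it."""
    rng = random.Random(seed)

    # One (center, radius) pair per label; the ranges are arbitrary choices.
    spheres = []
    for _ in range(n_labels):
        center = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
        radius = rng.uniform(0.3, 0.8)
        spheres.append((center, radius))

    def sample_inside(center, radius):
        # Rejection sampling from the bounding hypercube; adequate for a
        # few dimensions, inefficient when n_features is large.
        while True:
            point = [c + rng.uniform(-radius, radius) for c in center]
            if math.dist(point, center) <= radius:
                return point

    X, Y = [], []
    for center, radius in spheres:
        for _ in range(n_per_label):
            point = sample_inside(center, radius)
            X.append(point)
            # Multi-label: 1 for every hypersphere that contains the point.
            Y.append([int(math.dist(point, c) <= r) for c, r in spheres])
    return X, Y, spheres


X, Y, spheres = generate_hypersphere_dataset(
    n_labels=3, n_features=2, n_per_label=20, seed=42
)
```

Overlapping hyperspheres are what produce instances with more than one label, so label correlation in such a dataset follows from the geometry of the shapes.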

Brazilian Conference on Intelligent Systems | 2014

Label Construction for Multi-label Feature Selection

Newton Spolaôr; Maria Carolina Monard; Grigorios Tsoumakas; Huei Diana Lee

Multi-label learning handles datasets where each instance is associated with multiple labels, which are often correlated. Like other machine learning tasks, multi-label learning also suffers from the curse of dimensionality, which can be mitigated by dimensionality reduction tasks, such as feature selection. The standard approach for multi-label feature selection transforms the multi-label dataset into single-label datasets before using traditional feature selection algorithms. However, this approach often ignores label dependence. This work proposes an alternative method, LCFS, which constructs new labels based on relations between the original labels to augment the label set of the original dataset. Afterwards, the augmented dataset is submitted to the standard multi-label feature selection approach. Experiments using Information Gain as a measure to evaluate features were carried out on 10 multi-label benchmark datasets. For each dataset, the quality of the features selected was assessed by the quality of the classifiers built using the features selected by the standard approach in the original dataset, as well as in the dataset constructed by four LCFS settings. The results show that setting LCFS with simple strategies using pairs of labels gives rise to better classifiers than the ones built using the standard approach in the original dataset. Moreover, these good results are accomplished when a small number of features are selected.
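The label construction step can be pictured with a small sketch. The abstract does not state which relations LCFS uses or how label pairs are chosen, so the example below assumes binary labels, takes every pair of labels, and combines each pair with a logical AND; the function name and the operator are hypothetical.

```python
from itertools import combinations


def construct_pair_labels(Y, op=lambda a, b: a & b):
    """Augment a binary label matrix with one new label per label pair.

    `op` is the assumed relation between two labels (logical AND by
    default); other choices such as XOR can be passed in.
    """
    pairs = list(combinations(range(len(Y[0])), 2))
    augmented = [list(y) + [op(y[i], y[j]) for i, j in pairs] for y in Y]
    return augmented, pairs


Y = [[1, 0, 1],
     [1, 1, 0]]
augmented, pairs = construct_pair_labels(Y)
```

The augmented label matrix would then be passed to a standard transformation-based feature selection method in place of the original labels.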

Computers in Education | 2017

Robotics applications grounded in learning theories on tertiary education: A systematic review

Newton Spolaôr; Fabiane Barreto Vavassori Benitti

Empirical evidence suggests the effectiveness of robotics as a complementary learning tool in tertiary education. In this context, some experiences benefited from the link between educational practice and theory. However, a comprehensive survey of initiatives that explore this link in universities and colleges is missing. This work systematically reviews quantitatively assessed robotics applications, grounded in learning theories, in tertiary institutions. By applying a review protocol in different bibliographic databases, 15 papers were selected for synthesis. As a result, experiences developing non-robotic concepts and skills in universities and colleges were found. In most of the cases, Computer Science and Engineering undergraduate courses were involved. In addition, empirical results reported by the selected publications suggest that some literature proposals can be useful in practice. Based on the panorama obtained, this work also points out future directions for practitioners and researchers in education.

Ibero-American Conference on Artificial Intelligence | 2014

Evaluating ReliefF-Based Multi-Label Feature Selection Algorithm

Newton Spolaôr; Maria Carolina Monard

In multi-label learning, each instance is associated with multiple labels, which are often correlated. Like other machine learning tasks, multi-label learning also suffers from the curse of dimensionality, which can be mitigated by feature selection. This work experimentally evaluates four multi-label feature selection algorithms that use the filter approach. Three of them are based on the ReliefF algorithm, which takes into account interacting features. The quality of the selected features is assessed by three different learning algorithms. Evaluating multi-label learning algorithms is a complicated task, as multiple evaluation measures, which might optimize different loss functions, should be considered. To this end, \(General_B\), a baseline algorithm which learns by only looking at the multi-labels of the dataset, is used as a reference. Results show that feature selection helped improve the performance of classifiers initially worse than \(General_B\) and highlight ReliefF-based algorithms in some experimental settings.

International Symposium on Neural Networks | 2010

Complexity measures of supervised classification tasks: A case study for cancer gene expression data

Marcílio Carlos Pereira de Souto; Ana Carolina Lorena; Newton Spolaôr; Ivan G. Costa

Machine Learning algorithms have been widely used for gene expression data classification, despite the fact that these data often have intrinsic limitations, such as high dimensionality and a small number of examples. Few studies try to characterize to what extent these aspects can influence the performance of the classification models induced. In this paper we compute different measures characterizing the complexity of gene expression data sets for cancer diagnosis. We then investigate how these measures relate to the classification performances achieved by support vector machines, a popular Machine Learning technique usually employed in the analysis of gene expression data. The results obtained indicate that some of the complexity indices utilized are indeed successful in explaining the difficulty involved in the classification of cancer gene expression data.

Brazilian Symposium on Neural Networks | 2010

Use of Multiobjective Genetic Algorithms in Feature Selection

Newton Spolaôr; Ana Carolina Lorena; Huei Diana Lee

The intelligent analysis of databases may be affected by the presence of unimportant features, which motivates the application of Feature Selection. By treating this task as a search and optimization process, it is possible to use the synergy between Genetic Algorithms and Multi-objective Optimization to carry out the search for (quasi) optimal subsets of features considering possibly conflicting importance criteria. This work presents an application of Multi-objective Genetic Algorithms to the Feature Selection problem, combining different criteria measuring the importance of the subsets of features.
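A central building block of multi-objective search is Pareto dominance: with conflicting criteria there is usually no single best feature subset, only a set of non-dominated trade-offs. The sketch below shows only that step, not a full genetic algorithm and not the authors' implementation; the feature names, the criteria values, and the convention that every criterion is maximized are assumptions made for the example.

```python
def dominates(a, b):
    """True if score vector `a` is at least as good as `b` on every
    criterion and strictly better on at least one (all maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(
        x > y for x, y in zip(a, b)
    )


def pareto_front(candidates):
    """Keep the (subset, scores) pairs not dominated by any other pair."""
    return [
        c for c in candidates
        if not any(dominates(other[1], c[1]) for other in candidates)
    ]


# Hypothetical feature subsets scored on two importance criteria.
candidates = [
    (("f1",), (0.9, 0.2)),
    (("f1", "f2"), (0.8, 0.1)),  # worse than ("f1",) on both criteria
    (("f3",), (0.5, 0.9)),
]
front = pareto_front(candidates)
```

In a multi-objective genetic algorithm, a dominance check of this kind typically guides which candidate subsets survive to the next generation.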
