Márcio P. Basgalupp
Federal University of São Paulo
Publication
Featured research published by Márcio P. Basgalupp.
systems man and cybernetics | 2012
Rodrigo C. Barros; Márcio P. Basgalupp; A. de Carvalho; Alex Alves Freitas
This paper presents a survey of evolutionary algorithms that are designed for decision-tree induction. In this context, most of the paper focuses on approaches that evolve decision trees as an alternative heuristic to the traditional top-down divide-and-conquer approach. Additionally, we present some alternative methods that make use of evolutionary algorithms to improve particular components of decision-tree classifiers. The paper's original contributions are the following. First, it provides an up-to-date overview that is fully focused on evolutionary algorithms and decision trees and does not concentrate on any specific evolutionary approach. Second, it provides a taxonomy, which addresses works that evolve decision trees and works that design decision-tree components by the use of evolutionary algorithms. Finally, a number of references are provided that describe applications of evolutionary algorithms for decision-tree induction in different domains. At the end of this paper, we address some important issues and open questions that can be the subject of future research.
Information Sciences | 2011
Rodrigo C. Barros; Duncan D. Ruiz; Márcio P. Basgalupp
Model trees are a particular case of decision trees employed to solve regression problems. They have the advantage of presenting an interpretable output, helping the end-user to gain confidence in the prediction and providing the basis for the end-user to gain new insight into the data, confirming or rejecting hypotheses previously formed. Moreover, model trees present an acceptable level of predictive performance in comparison to most techniques used for solving regression problems. Since generating the optimal model tree is an NP-complete problem, traditional model tree induction algorithms make use of a greedy top-down divide-and-conquer strategy, which may not converge to the globally optimal solution. In this paper, we propose a novel algorithm based on the evolutionary algorithms paradigm as an alternative heuristic to generate model trees in order to improve the convergence to globally near-optimal solutions. We call our new approach evolutionary model tree induction (E-Motion). We test its predictive performance using public UCI data sets, and we compare the results to traditional greedy regression/model tree induction algorithms, as well as to other evolutionary approaches. Results show that our method presents a good trade-off between predictive performance and model comprehensibility, which may be crucial in many machine learning applications.
International Journal of Bio-inspired Computation | 2009
Márcio P. Basgalupp; André Carlos Ponce Leon Ferreira de Carvalho; Rodrigo C. Barros; Duncan D. Ruiz; Alex Alves Freitas
Among the several tasks to which evolutionary algorithms have been successfully applied, the induction of classification rules and decision trees has been shown to be a relevant approach for several application domains. Decision tree induction algorithms represent one of the most popular techniques for dealing with classification problems. However, conventionally used decision tree induction algorithms present limitations due to the strategy they usually implement: recursive top-down data partitioning through a greedy split evaluation. The main problem with this strategy is quality loss during the partitioning process, which can lead to statistically insignificant rules. In this paper, we propose a new GA-based algorithm for decision tree induction. The proposed algorithm aims to avoid the greedy strategy and convergence to local optima. To this end, it is based on a lexicographic multi-objective approach. In order to evaluate the proposed algorithm, it is compared with a well-known and frequently used decision tree induction algorithm using different public datasets. According to the experimental results, the proposed algorithm is able to avoid the previously described problems, reporting accuracy gains. Even more important, the proposed algorithm induced models with a significant reduction in complexity, as measured by tree size.
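A lexicographic multi-objective comparison of the kind mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Fitness` class, the priority order (accuracy first, tree size second), and the `accuracy_tolerance` value are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fitness:
    """Hypothetical fitness record for one candidate decision tree."""
    accuracy: float  # higher is better
    tree_size: int   # number of nodes, lower is better


def lexicographic_better(a: Fitness, b: Fitness, accuracy_tolerance: float = 0.01) -> bool:
    """Return True if candidate `a` is preferred over candidate `b`.

    Objectives are compared in priority order (assumed: accuracy, then size).
    A lower-priority objective is consulted only when the higher-priority
    one differs by no more than the tolerance.
    """
    if abs(a.accuracy - b.accuracy) > accuracy_tolerance:
        return a.accuracy > b.accuracy
    if a.tree_size != b.tree_size:
        return a.tree_size < b.tree_size
    # Both objectives are tied within tolerance: fall back to raw accuracy.
    return a.accuracy > b.accuracy


small_tree = Fitness(accuracy=0.900, tree_size=15)
large_tree = Fitness(accuracy=0.905, tree_size=40)
print(lexicographic_better(small_tree, large_tree))
```

With these assumed values the two accuracies fall within the tolerance, so the smaller tree is preferred, which mirrors the abstract's emphasis on simpler models at comparable accuracy.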
acm symposium on applied computing | 2009
Márcio P. Basgalupp; Rodrigo C. Barros; André Carlos Ponce Leon Ferreira de Carvalho; Alex Alves Freitas; Duncan D. Ruiz
Decision trees are widely disseminated as an effective solution for classification tasks. Decision tree induction algorithms have some limitations though, due to the typical strategy they implement: recursive top-down partitioning through a greedy split evaluation. This strategy is limiting in the sense that there is quality loss while the partitioning process occurs, creating statistically insignificant rules. In order to prevent the greedy strategy and to avoid converging to local optima, we present a novel Genetic Algorithm for decision tree induction based on a lexicographic multi-objective approach, and we compare it with the most well-known algorithm for decision tree induction, J48, over distinct public datasets. The results show the feasibility of using this technique as a means to avoid the previously described problems, reporting not only a comparable accuracy but also, importantly, a significantly simpler classification model in the employed datasets.
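The greedy split evaluation that this abstract describes as a limitation can be illustrated with a small sketch. It assumes information gain as the split measure and a single numeric attribute; both choices, and the function names `entropy` and `best_threshold`, are illustrative and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_threshold(values, labels):
    """Greedily pick the threshold on one numeric attribute with the highest
    information gain. Returns (gain, threshold); threshold is None when no
    split improves on the unsplit node."""
    base = entropy(labels)
    pairs = sorted(zip(values, labels))
    best = (0.0, None)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal attribute values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - weighted
        if gain > best[0]:
            best = (gain, threshold)
    return best


print(best_threshold([1, 2, 3, 4], ["a", "a", "b", "b"]))
```

A top-down inducer applies this locally optimal choice at every node and never revisits it, which is why the resulting tree may be far from the best tree overall.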
IEEE Transactions on Evolutionary Computation | 2014
Rodrigo C. Barros; Márcio P. Basgalupp; Alex Alves Freitas; André Carlos Ponce Leon Ferreira de Carvalho
Decision-tree induction algorithms are widely used in machine learning applications in which the goal is to extract knowledge from data and present it in a graphically intuitive way. The most successful strategy for inducing decision trees is the greedy top-down recursive approach, which has been continuously improved by researchers over the past 40 years. In this paper, we propose a paradigm shift in the research of decision trees: instead of proposing a new manually designed method for inducing decision trees, we propose automatically designing decision-tree induction algorithms tailored to a specific type of classification data set (or application domain). Following recent breakthroughs in the automatic design of machine learning algorithms, we propose a hyper-heuristic evolutionary algorithm for designing decision-tree algorithms (HEAD-DT) that evolves design components of top-down decision-tree induction algorithms. By the end of the evolution, we expect HEAD-DT to generate a new and possibly better decision-tree algorithm for a given application domain. We perform extensive experiments on 35 real-world microarray gene expression data sets to assess the performance of HEAD-DT, and compare it with well-known decision-tree algorithms such as C4.5, CART, and REPTree. Results show that HEAD-DT is capable of generating algorithms that significantly outperform the baseline manually designed decision-tree algorithms regarding predictive accuracy and F-measure.
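The idea of evolving the design components of an induction algorithm, rather than a tree, can be sketched as below. This is only a toy encoding under stated assumptions: the gene names and option lists in `DESIGN_SPACE` are hypothetical and do not reproduce HEAD-DT's actual genome, and fitness evaluation (training the encoded algorithm on data) is omitted.

```python
import random

# Hypothetical design space: each gene selects one component of a
# top-down decision-tree induction algorithm.
DESIGN_SPACE = {
    "split_criterion": ["information_gain", "gini", "gain_ratio"],
    "min_samples_to_split": [2, 5, 10, 20],
    "max_depth": [3, 5, 10, None],
    "pruning": ["none", "reduced_error", "pessimistic"],
}


def random_individual(rng):
    """Sample one algorithm configuration uniformly from the design space."""
    return {gene: rng.choice(options) for gene, options in DESIGN_SPACE.items()}


def mutate(individual, rng):
    """Return a copy of `individual` with exactly one gene changed."""
    child = dict(individual)
    gene = rng.choice(list(DESIGN_SPACE))
    alternatives = [value for value in DESIGN_SPACE[gene] if value != child[gene]]
    child[gene] = rng.choice(alternatives)
    return child


rng = random.Random(0)
parent = random_individual(rng)
child = mutate(parent, rng)
print(parent)
print(child)
```

In a full hyper-heuristic, each individual would be decoded into a working induction algorithm and scored by the quality of the trees it produces on the target data sets.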
acm symposium on applied computing | 2013
Márcio P. Basgalupp; Rodrigo C. Barros; Tiago Silva da Silva; André Carlos Ponce Leon Ferreira de Carvalho
Software effort prediction is an important task within software engineering. In particular, machine learning algorithms have been widely employed for this task, bearing in mind their capability of providing accurate predictive models for the analysis of project stakeholders. Nevertheless, none of these algorithms has become the de facto standard for metrics prediction given the particularities of different software projects. Among these intelligent strategies, decision trees and evolutionary algorithms have been continuously employed for software metrics prediction, though mostly independently of each other. A recent work has proposed evolving decision trees through an evolutionary algorithm, and applying the resulting tree in the context of software maintenance effort prediction. In this paper, we raise the search-space level of an evolutionary algorithm by proposing the evolution of a decision-tree algorithm instead of the decision tree itself, an approach known as a hyper-heuristic. Our findings show that the decision-tree algorithm automatically generated by a hyper-heuristic is capable of statistically outperforming state-of-the-art top-down and evolution-based decision-tree algorithms, as well as traditional logistic regression. The ability to generate a highly accurate, comprehensible predictive model is crucial in software projects, considering that it allows the stakeholder to properly manage the team's resources with improved confidence in the model predictions.
Evolutionary Computation | 2013
Rodrigo C. Barros; Márcio P. Basgalupp; André Carlos Ponce Leon Ferreira de Carvalho; Alex Alves Freitas
This study reports the empirical analysis of a hyper-heuristic evolutionary algorithm that is capable of automatically designing top-down decision-tree induction algorithms. Top-down decision-tree algorithms are of great importance, considering their ability to provide an intuitive and accurate knowledge representation for classification problems. The automatic design of these algorithms seems timely, given the large literature accumulated over more than 40 years of research in the manual design of decision-tree induction algorithms. The proposed hyper-heuristic evolutionary algorithm, HEAD-DT, is extensively tested using 20 public UCI datasets and 10 microarray gene expression datasets. The algorithms automatically designed by HEAD-DT are compared with traditional decision-tree induction algorithms, such as C4.5 and CART. Experimental results show that HEAD-DT is capable of generating algorithms which are significantly more accurate than C4.5 and CART.
genetic and evolutionary computation conference | 2012
Rodrigo C. Barros; Márcio P. Basgalupp; André Carlos Ponce Leon Ferreira de Carvalho; Alex Alves Freitas
Decision tree induction is one of the most employed methods to extract knowledge from data, since the representation of knowledge is very intuitive and easily understandable by humans. The most successful strategy for inducing decision trees, the greedy top-down approach, has been continuously improved by researchers over the years. This work, following recent breakthroughs in the automatic design of machine learning algorithms, proposes a hyper-heuristic evolutionary algorithm for automatically generating decision-tree induction algorithms, named HEAD-DT. We perform extensive experiments in 20 public data sets to assess the performance of HEAD-DT, and we compare it to traditional decision-tree algorithms such as C4.5 and CART. Results show that HEAD-DT can generate algorithms that significantly outperform C4.5 and CART regarding predictive accuracy and F-Measure.
BMC Bioinformatics | 2012
Rodrigo C. Barros; Ana T. Winck; Karina S. Machado; Márcio P. Basgalupp; André Carlos Ponce Leon Ferreira de Carvalho; Duncan D. Ruiz; Osmar Norberto de Souza
Background: This paper addresses the prediction of the free energy of binding of a drug candidate with enzyme InhA associated with Mycobacterium tuberculosis. This problem is found within rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulations. In this application, it is important not only to correctly predict the free energy of binding, but also to provide a comprehensible model that could be validated by a domain specialist. Decision-tree induction algorithms have been successfully used in drug-design related applications, especially considering that decision trees are simple to understand, interpret, and validate. There are several decision-tree induction algorithms available for general use, but each one has a bias that makes it more suitable for a particular data distribution. In this article, we propose and investigate the automatic design of decision-tree induction algorithms tailored to particular drug-enzyme binding data sets. We investigate the performance of our new method for evaluating binding conformations of different drug candidates to InhA, and we analyze our findings with respect to decision tree accuracy, comprehensibility, and biological relevance.
Results: The empirical analysis indicates that our method is capable of automatically generating decision-tree induction algorithms that significantly outperform the traditional C4.5 algorithm with respect to both accuracy and comprehensibility. In addition, we provide the biological interpretation of the rules generated by our approach, reinforcing the importance of comprehensible predictive models in this particular bioinformatics application.
Conclusions: We conclude that automatically designing a decision-tree algorithm tailored to molecular docking data is a promising alternative for the prediction of the free energy from the binding of a drug candidate with a flexible receptor.
genetic and evolutionary computation conference | 2011
Rodrigo C. Barros; Márcio P. Basgalupp; André Carlos Ponce Leon Ferreira de Carvalho; Alex Alves Freitas
Decision tree induction is one of the most employed methods to extract knowledge from data, since the representation of knowledge is very intuitive and easily understandable by humans. The most successful strategy for inducing decision trees, the greedy top-down approach, has been continuously improved by researchers over the years. This work, following recent breakthroughs in the automatic design of machine learning algorithms, proposes two different approaches for automatically generating generic decision tree induction algorithms. Both approaches are based on the evolutionary algorithms paradigm, which improves solutions based on metaphors of biological processes. We also propose guidelines to design interesting fitness functions for these evolutionary algorithms, which take into account the requirements and needs of the end-user.