Luiz Otávio Vilas Boas Oliveira

Publication

Featured research published by Luiz Otávio Vilas Boas Oliveira.

genetic and evolutionary computation conference | 2016

A Dispersion Operator for Geometric Semantic Genetic Programming

Luiz Otávio Vilas Boas Oliveira; Fernando E. B. Otero; Gisele L. Pappa

Recent advances in geometric semantic genetic programming (GSGP) have shown that the results obtained by these methods can outperform those obtained by classical genetic programming algorithms, in particular in the context of symbolic regression. However, there are still many open issues on how to improve their search mechanism. One of these issues is how to get around the fact that the GSGP crossover operator cannot generate solutions that are placed outside the convex hull formed by the individuals of the current population. Although the mutation operator alleviates this problem, we cannot guarantee it will find promising regions of the search space within feasible computational time. In this direction, this paper proposes a new geometric dispersion operator that uses multiplicative factors to move individuals to less dense areas of the search space around the target solution before applying semantic genetic operators. Experiments in sixteen datasets show that the results obtained by the proposed operator are statistically significantly better than those produced by GSGP and that the operator does indeed spread the solutions around the target solution.
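The convex hull limitation mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes the common GSGP crossover form r * p1 + (1 - r) * p2 with a constant r in [0, 1], applied directly to semantics vectors. The population values, the target, and the scaling factor 3.0 are hypothetical; the factor only shows that a multiplicative step can leave the region crossover is confined to, and is not the paper's dispersion operator.

```python
import random

def gs_crossover(p1, p2, rng):
    """Convex combination of two semantics vectors with a constant r in [0, 1)."""
    r = rng.random()
    return [r * a + (1 - r) * b for a, b in zip(p1, p2)]

rng = random.Random(0)
# Hypothetical semantics (outputs on 3 training cases) of a small population.
population = [[1.0, 2.0, 0.5], [2.0, 1.0, 1.5], [1.5, 1.5, 1.0]]
target = [4.0, 4.0, 4.0]  # hypothetical target outside the population's hull

# Per-case bounds spanned by the population (a box containing the convex hull).
lo = [min(col) for col in zip(*population)]
hi = [max(col) for col in zip(*population)]

offspring = []
for _ in range(100):
    p1, p2 = rng.sample(population, 2)
    offspring.append(gs_crossover(p1, p2, rng))

# Crossover alone never leaves the bounds (small tolerance for rounding).
eps = 1e-9
inside = all(l - eps <= v <= h + eps
             for child in offspring for v, l, h in zip(child, lo, hi))

# Illustrative only: a multiplicative factor can move an individual outside.
scaled = [3.0 * v for v in population[0]]
escapes = any(v > h for v, h in zip(scaled, hi))
```

Since every offspring is a convex combination of two parents, no sequence of crossovers can reach the hypothetical target, whose coordinates all exceed the per-case maxima.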

european conference on genetic programming | 2017

RECIPE: A Grammar-Based Framework for Automatically Evolving Classification Pipelines

Alex Guimarães Cardoso de Sá; Walter José G. S. Pinto; Luiz Otávio Vilas Boas Oliveira; Gisele L. Pappa

Automatic Machine Learning is a growing area of machine learning that has a similar objective to the area of hyper-heuristics: to automatically recommend optimized pipelines, algorithms or appropriate parameters to specific tasks without much dependency on user knowledge. The background knowledge required to solve the task at hand is actually embedded into a search mechanism that builds personalized solutions to the task. Following this idea, this paper proposes RECIPE (REsilient ClassifIcation Pipeline Evolution), a framework based on grammar-based genetic programming that builds customized classification pipelines. The framework is flexible enough to receive different grammars and can be easily extended to other machine learning tasks. RECIPE overcomes the drawbacks of previous evolutionary-based frameworks, such as generating invalid individuals, and organizes a high number of possible suitable data pre-processing and classification methods into a grammar. The f-measure results obtained by RECIPE are compared to those of two state-of-the-art methods, and are shown to be as good as or better than those previously reported in the literature. RECIPE represents a first step towards a complete framework for dealing with different machine learning tasks with the minimum required human intervention.

european conference on genetic programming | 2015

The Effect of Distinct Geometric Semantic Crossover Operators in Regression Problems

Julio Albinati; Gisele L. Pappa; Fernando E. B. Otero; Luiz Otávio Vilas Boas Oliveira

This paper investigates the impact of geometric semantic crossover operators in a wide range of symbolic regression problems. First, it analyses the impact of using Manhattan and Euclidean distance geometric semantic crossovers in the learning process. Then, it proposes two strategies to numerically optimize the crossover mask based on mathematical properties of these operators, instead of simply generating them randomly. An experimental analysis comparing geometric semantic crossovers using Euclidean and Manhattan distances and the proposed strategies is performed in a test bed of twenty datasets. The results show that the use of different distance functions in the semantic geometric crossover has little impact on the test error, and that our optimized crossover masks yield slightly better results. For SGP practitioners, we suggest the use of the semantic crossover based on the Euclidean distance, as it achieved similar results to those obtained by more complex operators.
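As a rough illustration of the two crossover families compared in this abstract, the Python sketch below follows the usual GSGP formulation: the Euclidean variant uses a single random constant for all training cases, while the Manhattan variant uses a per-case mask with values in [0, 1]. Sampling the mask values directly is a simplifying assumption (in GSGP the mask is normally the output of a random function), and the parent semantics are hypothetical.

```python
import random

def euclidean_crossover(p1, p2, rng):
    """One constant r for every training case: offspring lies on the segment."""
    r = rng.random()
    return [r * a + (1 - r) * b for a, b in zip(p1, p2)]

def manhattan_crossover(p1, p2, rng):
    """A separate mask value per training case: offspring lies in the box."""
    mask = [rng.random() for _ in p1]  # assumption: mask sampled directly
    return [m * a + (1 - m) * b for m, a, b in zip(mask, p1, p2)]

rng = random.Random(0)
p1 = [0.0, 4.0, 2.0]  # hypothetical parent semantics
p2 = [2.0, 0.0, 6.0]

child_e = euclidean_crossover(p1, p2, rng)
child_m = manhattan_crossover(p1, p2, rng)

# For the Euclidean child the mixing ratio is the same in every coordinate.
ratios = [(o - b) / (a - b) for o, a, b in zip(child_e, p1, p2)]
```

The optimized masks proposed in the paper would replace the random draw of r or of the mask; that optimization is not shown here.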

Archive | 2015

Sequential Symbolic Regression with Genetic Programming

Luiz Otávio Vilas Boas Oliveira; Fernando E. B. Otero; Gisele L. Pappa; Julio Albinati

This chapter describes the Sequential Symbolic Regression (SSR) method, a new strategy for function approximation in symbolic regression. The SSR method is inspired by the sequential covering strategy from machine learning, but instead of sequentially reducing the size of the problem being solved, it sequentially transforms the original problem into potentially simpler problems. This transformation is performed according to the semantic distances between the desired and obtained outputs and a geometric semantic operator. The rationale behind SSR is that, after generating a suboptimal function f via symbolic regression, the output errors can be approximated by another function, in a subsequent iteration. The method was tested in eight polynomial functions, and compared with canonical genetic programming (GP) and geometric semantic genetic programming (SGP). Results showed that SSR significantly outperforms SGP and presents no statistical difference from GP. More importantly, they show the potential of the proposed approach: an effective way of applying geometric semantic operators to combine different (partial) solutions, and at the same time, avoiding the exponential growth problem arising from the use of semantic operators.
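The rationale stated in this abstract, fitting a function and then approximating its output errors in a subsequent iteration, can be sketched in Python as follows. This is a simplification under stated assumptions: `fit_term` is a hypothetical stand-in for a symbolic regression run (it picks the best single term c * x**k by least squares), and the sketch fits plain residuals rather than applying the geometric semantic operator the chapter uses to transform the problem.

```python
def fit_term(xs, residual):
    """Stand-in for a GP run: best single term c * x**k (k in 0..3)."""
    best = None
    for k in range(4):
        basis = [x ** k for x in xs]
        denom = sum(b * b for b in basis)
        c = sum(b * r for b, r in zip(basis, residual)) / denom
        err = sum((r - c * b) ** 2 for b, r in zip(basis, residual))
        if best is None or err < best[0]:
            best = (err, c, k)
    return best[1], best[2]

def ssr_sketch(xs, ys, iterations=10):
    """Fit a term, subtract it from the current target, and repeat."""
    model, residual, errors = [], list(ys), []
    for _ in range(iterations):
        c, k = fit_term(xs, residual)
        model.append((c, k))
        residual = [r - c * x ** k for x, r in zip(xs, residual)]
        errors.append(sum(r * r for r in residual))
    return model, errors

# Hypothetical polynomial target sampled on 21 points in [-1, 1].
xs = [i / 10 - 1 for i in range(21)]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]
model, errors = ssr_sketch(xs, ys)
```

Each iteration works on a transformed (residual) target, so the training error recorded in `errors` cannot increase from one iteration to the next.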

genetic and evolutionary computation conference | 2017

How noisy data affects geometric semantic genetic programming

Luis Fernando Miranda; Luiz Otávio Vilas Boas Oliveira; Joao Francisco B. S. Martins; Gisele L. Pappa

Noise is a consequence of acquiring and pre-processing data from the environment, and shows fluctuations from different sources, e.g., from sensors, signal processing technology or even human error. As a machine learning technique, Genetic Programming (GP) is not immune to this problem, which the field has frequently addressed. Recently, Geometric Semantic Genetic Programming (GSGP), a semantic-aware branch of GP, has shown robustness and high generalization capability. Researchers believe these characteristics may be associated with a lower sensitivity to noisy data. However, there is no systematic study on this matter. This paper performs a deep analysis of GSGP performance in the presence of noise. Using 15 synthetic datasets where noise can be controlled, we added different ratios of noise to the data and compared the results obtained with those of a canonical GP. The results show that, as we increase the percentage of noisy instances, the generalization performance degradation is more pronounced in GSGP than in GP. However, in general, GSGP is more robust to noise than GP in the presence of up to 10% of noise, and presents no statistical difference for values higher than that in the test bed.

genetic and evolutionary computation conference | 2018

Analysing symbolic regression benchmarks under a meta-learning approach

Luiz Otávio Vilas Boas Oliveira; Joao Francisco B. S. Martins; Luis Fernando Miranda; Gisele L. Pappa

The definition of a concise and effective testbed for Genetic Programming (GP) is a recurrent matter in the research community. This paper takes a new step in this direction, proposing a different approach to measure the quality of symbolic regression benchmarks quantitatively. The proposed approach is based on meta-learning and uses a set of dataset meta-features, such as the number of examples or output skewness, to describe the datasets. Our idea is to correlate these meta-features with the errors obtained by a GP method. These meta-features define a space of benchmarks that should ideally have datasets (points) covering different regions of the space. An initial analysis of 63 datasets showed that current benchmarks are concentrated in a small region of this benchmark space. We also found that the number of instances and output skewness are the meta-features most relevant to GP output error. Both conclusions can help define which datasets should compose an effective testbed for symbolic regression methods.
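The two meta-features named in this abstract can be computed with a few lines of Python. The sketch below is only an assumption about the definitions: it uses the population (Fisher-Pearson) skewness coefficient, and the paper may use a different estimator. The function names and example outputs are hypothetical.

```python
def output_skewness(ys):
    """Population skewness m3 / m2**1.5 of the dataset's output values."""
    n = len(ys)
    mean = sum(ys) / n
    m2 = sum((y - mean) ** 2 for y in ys) / n
    m3 = sum((y - mean) ** 3 for y in ys) / n
    return m3 / m2 ** 1.5

def meta_features(ys):
    """Describe a dataset by two of the meta-features named in the abstract."""
    return {"n_examples": len(ys), "output_skewness": output_skewness(ys)}

symmetric = meta_features([1.0, 2.0, 3.0, 4.0, 5.0])     # skewness of 0
right_skewed = meta_features([1.0, 1.0, 1.0, 1.0, 10.0])  # positive skewness
```

Each dataset then becomes a point in the meta-feature space, and coverage of that space can be inspected across a benchmark collection.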

genetic and evolutionary computation conference | 2018

Solving the exponential growth of symbolic regression trees in geometric semantic genetic programming

Joao Francisco B. S. Martins; Luiz Otávio Vilas Boas Oliveira; Luis Fernando Miranda; Felipe Casadei; Gisele L. Pappa

Advances in Geometric Semantic Genetic Programming (GSGP) have shown that this variant of Genetic Programming (GP) reaches better results than its predecessor for supervised machine learning problems, particularly in the task of symbolic regression. However, by construction, the geometric semantic crossover operator generates individuals that grow exponentially with the number of generations, resulting in solutions with limited use. This paper presents a new method for individual simplification named GSGP with Reduced trees (GSGP-Red). GSGP-Red works by expanding the functions generated by the geometric semantic operators. The resulting expanded function is guaranteed to be a linear combination that, in a second step, has its repeated structures and respective coefficients aggregated. Experiments in 12 real-world datasets show that it is not only possible to create smaller and completely equivalent individuals in competitive computational time, but also to reduce the number of nodes composing them by 58 orders of magnitude, on average.
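The idea of expanding crossover results into a linear combination and aggregating repeated terms can be sketched in Python as below. The sketch assumes a crossover of the form r * a + (1 - r) * b with a constant r, represents each individual as a dictionary from (hypothetical) base function names to coefficients, and tracks a lower bound on the size of the unsimplified tree. It is not the GSGP-Red implementation and omits mutation.

```python
import random

def crossover(a, b, r):
    """r * a + (1 - r) * b, with coefficients of shared terms aggregated."""
    keys = set(a) | set(b)
    return {k: r * a.get(k, 0.0) + (1 - r) * b.get(k, 0.0) for k in keys}

rng = random.Random(1)
# Each individual: (aggregated linear combination, naive tree size lower bound).
# The base functions f0..f3 are hypothetical initial individuals.
population = [({f"f{i}": 1.0}, 1) for i in range(4)]

for _generation in range(20):
    next_population = []
    for _slot in range(len(population)):
        (a, size_a), (b, size_b) = rng.sample(population, 2)
        child = crossover(a, b, rng.random())
        # The unsimplified tree contains both parents as subtrees.
        next_population.append((child, size_a + size_b))
    population = next_population

max_terms = max(len(ind) for ind, _ in population)
min_naive_size = min(size for _, size in population)
```

After 20 generations the aggregated form holds at most four coefficients, while the naive size bound has doubled every generation.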

european conference on genetic programming | 2017

Strategies for Improving the Distribution of Random Function Outputs in GSGP

Luiz Otávio Vilas Boas Oliveira; Felipe Casadei; Gisele L. Pappa

In recent years, different approaches have been proposed to introduce semantic information to genetic programming. In particular, geometric semantic genetic programming (GSGP) and the interesting properties of its evolutionary operators have gotten the attention of the community. This paper is interested in the use of GSGP to solve symbolic regression problems, where semantics is defined by the output set generated by a given individual when applied to the training cases. In this scenario, both mutation and crossover operators defined with a fitness function based on Manhattan distance use randomly built functions to generate offspring. However, the outputs of these random functions are not guaranteed to be uniformly distributed in the semantic space, as the functions are generated considering the syntactic space. We hypothesize that the non-uniformity of the semantics of these functions may bias the search, and propose three different standard normalization techniques to improve the distribution of the outputs of these random functions over the semantic space. The results are compared with a popular strategy that uses a logistic function as a wrapper to the outputs, and show that the strategies tested can improve the results of the previous method. The experimental analysis also indicates that a more uniform distribution of the semantics of these functions does not necessarily imply better results in terms of test error.
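To make the contrast in this abstract concrete, the Python sketch below compares a logistic wrapper with one standard normalization, min-max scaling. The abstract does not name the three techniques the paper proposes, so min-max is purely an illustrative assumption, and the raw outputs are hypothetical.

```python
import math

def logistic_wrapper(outputs):
    """Squash each output into (0, 1) with a numerically stable logistic."""
    wrapped = []
    for v in outputs:
        if v >= 0:
            wrapped.append(1.0 / (1.0 + math.exp(-v)))
        else:
            e = math.exp(v)
            wrapped.append(e / (1.0 + e))
    return wrapped

def min_max_normalize(outputs):
    """Rescale outputs linearly to [0, 1] (illustrative choice only)."""
    lo, hi = min(outputs), max(outputs)
    if hi == lo:
        return [0.5 for _ in outputs]  # constant function: no spread to keep
    return [(v - lo) / (hi - lo) for v in outputs]

# Hypothetical raw outputs of a random function on five training cases.
raw = [-30.0, -5.0, 0.0, 5.0, 30.0]
wrapped = logistic_wrapper(raw)
scaled = min_max_normalize(raw)
```

With these values the logistic wrapper pushes four of the five outputs close to 0 or 1, whereas min-max scaling preserves their relative spacing.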

brazilian conference on intelligent systems | 2016

Revisiting the Sequential Symbolic Regression Genetic Programming

Luiz Otávio Vilas Boas Oliveira; Fernando E. B. Otero; Luis Fernando Miranda; Gisele L. Pappa

Sequential Symbolic Regression (SSR) is a technique that recursively induces functions over the error of the current solution, concatenating them in an attempt to reduce the error of the resulting model. As proof of concept, the method was previously evaluated on one-dimensional problems and compared with canonical Genetic Programming (GP) and Geometric Semantic Genetic Programming (GSGP). In this paper we revisit SSR, exploring the method's behaviour on higher-dimensional, larger and more heterogeneous datasets. We discuss the difficulties arising from the application of the method to more complex problems, e.g., overfitting, along with suggestions to overcome them. An experimental analysis was conducted comparing SSR to GP and GSGP, showing that SSR solutions are smaller than those generated by GSGP with similar performance, and more accurate than those generated by canonical GP.

congress on evolutionary computation | 2013

A new representation for instance-based clonal selection algorithms

Luiz Otávio Vilas Boas Oliveira; Isabela Drummond; Gisele L. Pappa

This work borrows the traditional Pittsburgh-style representation from Genetic-Based Machine Learning and evaluates its performance in artificial immune systems (AIS) for classification. Our main goal is to select as few instances as possible to represent the data from the training set without losing accuracy. The new representation is tested in a modified version of a clonal selection algorithm, where the antibodies represent lists of prototypes instead of a single one. The resulting method, named Clonal Selection Prototypes Generator, was tested on 10 UCI datasets and compared to seven other methods that perform the same task. Results showed that the proposed method achieves a good trade-off between the number of prototypes generated and the accuracy of the system.
