William La Cava
University of Pennsylvania
Publications
Featured research published by William La Cava.
Genetic and Evolutionary Computation Conference | 2016
William La Cava; Lee Spector; Kourosh Danai
Lexicase selection is a parent selection method that considers test cases separately, rather than in aggregate, when choosing parents. It performs well in discrete error spaces but not on the continuous-valued problems that compose most system identification tasks. In this paper, we develop a new form of lexicase selection for symbolic regression, named ε-lexicase selection, that redefines the pass condition for individuals on each test case in a more effective way. We run a series of experiments on real-world and synthetic problems with several treatments of ε and quantify how ε affects parent selection and model performance. ε-lexicase selection is shown to be effective for regression, producing better-fit models than other techniques such as tournament selection and age-fitness Pareto optimization. We demonstrate that ε can be adapted automatically for individual test cases based on the population performance distribution. Our experiments show that ε-lexicase selection with automatic ε produces the most accurate models across the tested problems with negligible computational overhead. We show that behavioral diversity is exceptionally high in lexicase selection treatments, and that ε-lexicase selection makes use of more fitness cases when selecting parents than lexicase selection does, which helps explain the performance improvement.
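Below is a minimal Python sketch of ε-lexicase parent selection as the abstract describes it, with ε set automatically per test case from the spread of errors across the population (here the median absolute deviation, the adaptive choice discussed above). Function names and the data layout are illustrative assumptions, not the authors' implementation.

```python
import random
import numpy as np

def mad(x):
    """Median absolute deviation: the per-case spread used to set epsilon."""
    return np.median(np.abs(x - np.median(x)))

def epsilon_lexicase_select(errors, rng=random):
    """Pick one parent index from an (n_individuals, n_cases) array of
    absolute errors, where lower is better on every test case."""
    n_individuals, n_cases = errors.shape
    eps = np.array([mad(errors[:, j]) for j in range(n_cases)])  # automatic epsilon
    pool = list(range(n_individuals))
    cases = list(range(n_cases))
    rng.shuffle(cases)  # consider test cases in random order
    for j in cases:
        best = min(errors[i, j] for i in pool)
        # Pass condition: within epsilon of the elite error on this case.
        pool = [i for i in pool if errors[i, j] <= best + eps[j]]
        if len(pool) == 1:
            break
    return rng.choice(pool)
```

Repeated calls, each with a fresh shuffle of the test cases, fill out the parent pool for crossover and mutation.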
BioData Mining | 2017
Randal S. Olson; William La Cava; Patryk Orzechowski; Ryan J. Urbanowicz; Jason H. Moore
Background: The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists.
Results: The present study introduces an accessible, curated, and developing public benchmark resource to facilitate identification of the strengths and weaknesses of different machine learning methodologies. We compare meta-features among the current set of benchmark datasets in this resource to characterize the diversity of available data. Finally, we apply a number of established machine learning methods to the entire benchmark suite and analyze how datasets and algorithms cluster in terms of performance. From this study, we find that existing benchmarks lack the diversity to properly benchmark machine learning algorithms, and there are several gaps in benchmarking problems that still need to be considered.
Conclusions: This work represents another important step towards understanding the limitations of popular benchmarking suites and developing a resource that connects existing benchmarking standards to more diverse and efficient standards in the future.
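The benchmark resource this abstract describes was released as PMLB. As a rough illustration of the meta-feature comparison, the sketch below computes a few simple dataset statistics over some of its classification datasets via the pmlb Python package (fetch_data returns a DataFrame with a target column); the particular meta-features chosen are an illustrative subset, not the study's full set.

```python
from pmlb import fetch_data, classification_dataset_names

def meta_features(df):
    """A few simple meta-features for one benchmark dataset."""
    X, y = df.drop(columns='target'), df['target']
    counts = y.value_counts(normalize=True)
    return {
        'n_samples': len(X),
        'n_features': X.shape[1],
        'n_classes': y.nunique(),
        'class_imbalance': counts.max() - counts.min(),
    }

# Summarize a handful of datasets to compare their diversity.
for name in classification_dataset_names[:5]:
    print(name, meta_features(fetch_data(name)))
```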
Genetic and Evolutionary Computation Conference | 2015
William La Cava; Thomas Helmuth; Lee Spector; Kourosh Danai
We focus on improving genetic programming through local search of the space of program structures using an inheritable epigenetic layer that specifies active and inactive genes. We explore several genetic programming implementations that represent the different properties that epigenetics can provide, such as passive structure, phenotypic plasticity, and inheritable gene regulation. We apply these implementations to several symbolic regression and program synthesis problems. For the symbolic regression problems, the results indicate that epigenetic local search consistently improves genetic programming by producing smaller solution programs with better fitness. Furthermore, we find that incorporating epigenetic modification as a mutation step in program synthesis problems can improve the ability of genetic programming to find exact solutions. By analyzing population homology we show that the epigenetic implementations maintain diversity in silenced portions of programs, which may provide protection from premature convergence.
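A hedged sketch of the core representation described here: a genome paired with an inheritable activation mask, where only active genes are expressed into the phenotype and epigenetic mutation toggles marks without altering the genome. The class and function names are invented for illustration, not taken from the authors' code.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Individual:
    genome: list                                 # tokens of a linear GP program
    active: list = field(default_factory=list)   # inheritable on/off marks

    def __post_init__(self):
        if not self.active:                      # default: every gene expressed
            self.active = [True] * len(self.genome)

    def expressed(self):
        """Phenotype: only active genes contribute to the program."""
        return [g for g, on in zip(self.genome, self.active) if on]

def epigenetic_mutation(ind, rate=0.1, rng=random):
    """Toggle activation marks; the underlying genome is untouched."""
    ind.active = [(not on) if rng.random() < rate else on for on in ind.active]
```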
Archive | 2015
Karthik Kannappan; Lee Spector; Moshe Sipper; Thomas Helmuth; William La Cava; Jake Wisdom; Omri Bernstein
Techniques in evolutionary computation (EC) have improved significantly over the years, leading to a substantial increase in the complexity of problems that can be solved by EC-based approaches. The HUMIES awards at the Genetic and Evolutionary Computation Conference are designed to recognize work that has not just solved some problem via techniques from evolutionary computation, but has produced a solution that is demonstrably human-competitive. In this chapter, we take a look across the winners of the past 10 years of the HUMIES awards, and analyze them to determine whether there are specific approaches that consistently show up in the HUMIE winners. We believe that this analysis may lead to interesting insights regarding prospects and strategies for producing further human competitive results.
arXiv: Quantitative Methods | 2018
Randal S. Olson; William La Cava; Zairah Mustahsan; Akshay Varik; Jason H. Moore
As the bioinformatics field grows, it must keep pace not only with new data but with new algorithms. Here we contribute a thorough analysis of 13 state-of-the-art, commonly used machine learning algorithms on a set of 165 publicly available classification problems in order to provide data-driven algorithm recommendations to current researchers. We present a number of statistical and visual comparisons of algorithm performance and quantify the effect of model selection and algorithm tuning for each algorithm and dataset. The analysis culminates in the recommendation of five algorithms with hyperparameters that maximize classifier performance across the tested problems, as well as general guidelines for applying machine learning to supervised classification problems.
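The sketch below shows the shape of this experiment on a single stand-in dataset: several scikit-learn classifiers, each tuned by cross-validated grid search, so the contribution of hyperparameter tuning can be measured per algorithm. The grids are small illustrative choices, not the paper's search spaces.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)  # stand-in for one benchmark problem

# Three of the algorithm/hyperparameter sweeps one might run; the paper
# tunes 13 algorithms over 165 datasets, this is a single-dataset sketch.
searches = {
    'logreg': GridSearchCV(
        make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000)),
        {'logisticregression__C': [0.01, 0.1, 1, 10]}, cv=5),
    'rf': GridSearchCV(
        RandomForestClassifier(random_state=0),
        {'n_estimators': [100, 500], 'max_features': ['sqrt', 0.5]}, cv=5),
    'gbt': GridSearchCV(
        GradientBoostingClassifier(random_state=0),
        {'learning_rate': [0.01, 0.1], 'n_estimators': [100, 500]}, cv=5),
}

for name, search in searches.items():
    search.fit(X, y)
    print(name, round(search.best_score_, 3), search.best_params_)
```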
Genetic and Evolutionary Computation Conference | 2014
William La Cava; Lee Spector; Kourosh Danai; Matthew A. Lackner
This paper describes a method of solving the symbolic regression problem using developmental linear genetic programming (DLGP) with an epigenetic hill climber (EHC). We propose the EHC for optimizing the epigenetic properties of the genotype. The epigenetic characteristics are then inherited through coevolution with the population. Results reveal that the EHC improves performance through maintenance of smaller expressed program sizes. For some problems it produces more successful runs while remaining essentially cost-neutral with respect to number of fitness evaluations.
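Building on the Individual sketch above, a minimal epigenetic hill climber: it proposes random flips of the activation marks and keeps a proposal only when the expressed program's fitness does not worsen, leaving the genome untouched. The step count, flip rate, and minimization convention are assumptions for illustration.

```python
import random

def epigenetic_hill_climb(ind, fitness, steps=10, flip_rate=0.1, rng=random):
    """Greedy local search over the epigenome only: flip random activation
    marks and keep the change when the expressed program's fitness (lower
    is better) does not worsen; the genome itself is never modified."""
    best = fitness(ind.expressed())
    for _ in range(steps):
        trial = [(not on) if rng.random() < flip_rate else on
                 for on in ind.active]
        saved, ind.active = ind.active, trial
        score = fitness(ind.expressed())
        if score <= best:
            best = score         # accept improving or neutral epigenomes
        else:
            ind.active = saved   # revert the flip
    return best
```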
European Conference on Applications of Evolutionary Computation | 2017
William La Cava; Sara Silva; Leonardo Vanneschi; Lee Spector; Jason H. Moore
We present a new classification method that uses genetic programming (GP) to evolve feature transformations for a deterministic, distance-based classifier. This method, called M4GP, differs from common approaches to classifier representation in GP in that it does not enforce arbitrary decision boundaries and it allows individuals to produce multiple outputs via a stack-based GP system. In comparison to typical methods of classification, M4GP can be advantageous in its ability to produce readable models. We conduct a comprehensive study of M4GP, first in comparison to other GP classifiers, and then in comparison to six common machine learning classifiers. We conduct full hyper-parameter optimization for all of the methods on a suite of 16 biomedical data sets, ranging in size and difficulty. The results indicate that M4GP outperforms other GP methods for classification. M4GP performs competitively with other machine learning methods in terms of the accuracy of the produced models for most problems. M4GP also exhibits the ability to detect epistatic interactions better than the other methods.
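A rough sketch of the pipeline shape: a multi-output transformation (here a hand-written stand-in for an evolved stack-based program) followed by classification by distance to class centroids. Euclidean distance is used for brevity where the method itself uses a Mahalanobis-style distance; all names are illustrative.

```python
import numpy as np

def transform(X):
    """Stand-in for one evolved multi-output program: in M4GP the outputs
    are whatever the stack-based program leaves on its stack."""
    return np.column_stack([X[:, 0] * X[:, 1], np.sin(X[:, 2]), X[:, 3] ** 2])

def fit_centroids(Z, y):
    """One centroid per class in the transformed space."""
    return {c: Z[y == c].mean(axis=0) for c in np.unique(y)}

def predict(Z, centroids):
    """Assign each sample to its nearest class centroid (Euclidean here,
    in place of the Mahalanobis-style distance the method uses)."""
    classes = list(centroids)
    dists = np.stack([np.linalg.norm(Z - centroids[c], axis=1) for c in classes])
    return np.array(classes)[dists.argmin(axis=0)]
```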
Engineering Applications of Artificial Intelligence | 2016
William La Cava; Kourosh Danai; Lee Spector
We introduce a method to enhance the inference of meaningful dynamic models from observational data by genetic programming (GP). This method incorporates an inheritable epigenetic layer that specifies active and inactive genes for a more effective local search of the model structure space. We define several GP implementations using different features of epigenetics, such as passive structure, phenotypic plasticity, and inheritable gene regulation. To test these implementations, we use hundreds of data sets generated from nonlinear ordinary differential equations (ODEs) in several fields of engineering and from randomly constructed nonlinear ODE models. The results indicate that epigenetic hill climbing consistently produces more compact dynamic equations with better fitness values, and that it identifies the exact solution of the system more often, validating the categorical improvement of GP by epigenetic local search. The results further indicate that when faced with complex dynamics, epigenetic hill climbing reduces the computational effort required to infer the correct underlying dynamics. We then apply the method to the identification of three real-world systems: a cascaded tanks system, a chemical distillation tower, and an industrial wind turbine. We analyze its solutions in comparison to theoretical and black-box approaches in terms of accuracy and intelligibility. Finally, we analyze population homology to evaluate the efficiency of the method. The results indicate that the epigenetic implementations provide protection from premature convergence by maintaining diversity in silenced portions of programs.
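One way to see the fitness evaluation implied here: simulate a candidate model structure with an ODE solver and score it against the measured trajectory. The mean-squared-error criterion and the single-state example below are assumptions for illustration, not the paper's exact setup.

```python
import numpy as np
from scipy.integrate import solve_ivp

def model_error(candidate_rhs, t_obs, x_obs, x0):
    """Fitness for one candidate dynamic model: integrate it from the
    observed initial condition and score against the measured trajectory."""
    sol = solve_ivp(candidate_rhs, (t_obs[0], t_obs[-1]), x0, t_eval=t_obs)
    if not sol.success:
        return np.inf
    return np.mean((sol.y[0] - x_obs) ** 2)

# Example: data generated by dx/dt = -0.5 x, and two candidate structures.
t = np.linspace(0, 5, 50)
x = np.exp(-0.5 * t)
print(model_error(lambda t, x: -0.5 * x, t, x, [1.0]))  # correct structure
print(model_error(lambda t, x: -x, t, x, [1.0]))        # wrong coefficient
```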
Archive | 2015
William La Cava; Lee Spector
Classical genetic programming solves problems by applying the Darwinian concepts of selection, survival and reproduction to a population of computer programs. Here we extend the biological analogy to incorporate epigenetic regulation through both learning and evolution. We begin the chapter with a discussion of Darwinian, Lamarckian, and Baldwinian approaches to evolutionary computation and describe how recent findings in biology differ conceptually from the computational strategies that have been proposed. Using inheritable Lamarckian mechanisms as inspiration, we propose a system that allows for updating of individuals in the population during their lifetime while simultaneously preserving both genotypic and phenotypic traits during reproduction. The implementation is made simple through the use of syntax-free, developmental, linear genetic programming. The representation allows for arbitrarily-ordered genomes to be syntactically valid programs, thereby creating a genetic programming approach upon which quasi-uniform epigenetic updating and inheritance can easily be applied. Generational updates are made using an epigenetic hill climber (EHC), and the epigenetic properties of genes are inherited during crossover and mutation. The addition of epigenetics results in faster convergence, less bloat, and an improved ability to find exact solutions on a number of symbolic regression problems.
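Continuing the Individual sketch from the 2015 GECCO entry above, here is a one-point crossover that inherits each gene's epigenetic mark along with the gene, preserving both genotypic and phenotypic traits as the chapter describes; because the representation is syntax-free, any cut-and-splice of the genome remains a valid program. The operator below is an illustrative variant, not the authors' exact one.

```python
import random

def crossover(p1, p2, rng=random):
    """One-point crossover that keeps every gene paired with its epigenetic
    mark, so activation states learned during a parent's lifetime are
    inherited by the child."""
    cut1 = rng.randrange(1, len(p1.genome))
    cut2 = rng.randrange(1, len(p2.genome))
    return Individual(p1.genome[:cut1] + p2.genome[cut2:],
                      p1.active[:cut1] + p2.active[cut2:])
```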
European Conference on Genetic Programming | 2017
William La Cava; Jason H. Moore
We propose a general wrapper for feature learning that interfaces with other machine learning methods to compose effective data representations. The proposed feature engineering wrapper (FEW) uses genetic programming to represent and evolve individual features tailored to the machine learning method with which it is paired. To maintain feature diversity, we introduce ε-lexicase survival, a method based on ε-lexicase selection. This survival method preserves semantically unique individuals in the population based on their ability to solve difficult subsets of training cases, thereby yielding a population of uncorrelated features. We demonstrate FEW with five different off-the-shelf machine learning methods and test it on a set of real-world and synthetic regression problems with dimensions varying across three orders of magnitude. The results show that FEW is able to improve model test predictions across problems for several ML methods. We discuss and test the scalability of FEW in comparison to other feature composition strategies, most notably polynomial feature expansion.
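A compact sketch of the wrapper's evaluation loop: a population of feature programs (hand-written stand-ins for GP individuals here) transforms the data, and the paired ML method's cross-validated score becomes the signal that drives the evolution. The choice of LassoLarsCV as the paired estimator is an illustrative assumption.

```python
import numpy as np
from sklearn.linear_model import LassoLarsCV
from sklearn.model_selection import cross_val_score

def few_score(feature_programs, X, y, estimator=None):
    """Score a population of evolved features by the cross-validated
    performance of the paired ML method on the transformed data."""
    Z = np.column_stack([f(X) for f in feature_programs])
    est = estimator if estimator is not None else LassoLarsCV()
    return cross_val_score(est, Z, y, cv=5).mean()

# Hand-written stand-ins for evolved feature programs.
programs = [
    lambda X: X[:, 0] * X[:, 1],          # interaction feature
    lambda X: np.log1p(np.abs(X[:, 0])),  # nonlinear transform
]
```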