Archive | 2021

Improved evaluation of existing methods in landscape analysis and comparison of black box optimization problems using regression models

 

Abstract


Continuous black-box optimization problems arise in many real-world situations and scientific domains. The fitness landscape of an optimization problem represents the structural topology of the objective function over the space of decision variables. Optimization is a search over this landscape for a maximum or minimum value. The performance of an optimization algorithm depends on the landscape structure of the underlying problem, so characterizing the landscape is essential to understanding the algorithm-problem relationship. Most research in black-box optimization focuses on developing algorithms for solving problems rather than on problem characterization. Problem characterization helps identify the (dis)similarities that can assist in choosing the most promising or suitable algorithm for a problem. Existing methods for characterizing problem landscapes can be grouped under the term exploratory landscape analysis (ELA). These techniques are largely algorithm-independent and compute features from data sampled from problem instances. To capture different characteristics of a landscape, a set of heterogeneous problem features has been developed. For further analysis, a set of selected features is used to generate a feature space in which problems are positioned according to their characteristics (i.e. similar problems are located close to each other).

While the feature-based approach has shown some success, validating and evaluating the utility of problem features in practice presents significant challenges. Machine learning models have been employed as part of the analysis process, but they have their own hyperparameters, biases and experimental variability. As a result, extra layers of uncertainty and complexity are added to the experimental evaluation process, making it difficult to clearly assess the role of the problem features.
With regard to problem comparison using features, selecting the most informative feature set is another difficult task.

This thesis makes several contributions to exploratory landscape analysis. First, a novel method for evaluating problem features is proposed. Existing methods of feature evaluation are based on the collective performance of a set of features for classifying problems into pre-defined categories using machine learning techniques; however, the individual contribution or strength of a feature is not clear. In the proposed methodology, every feature is tested individually against different sets of problem transformations. In each set, a single problem is gradually transformed into another problem by altering one characteristic of the landscape (ruggedness, neutrality, ill-conditioning or linearity). This creates a ranking of similarity among problems. Analysis of variance (ANOVA) significance tests are then used to provide evidence about a feature's ability to detect differences among problems in each transformation.

The second major contribution of the thesis is a model-based framework for comparing problems in terms of landscape similarity. The general idea is to fit a regression model to a problem landscape using a sample of candidate solutions and their corresponding fitness values. The (dis)similarity between models, measured using the Kullback-Leibler (KL) divergence, is used as a measure of the (dis)similarity between problem instances. The regression model chosen here is the Gaussian process (GP). GPs are flexible regression models that can characterize a much wider range of problems than can be captured by individual features. A clear advantage of this approach is that the goodness of fit of the regression model can be used directly to validate how well the model approximates the original problem; in general, this is not possible with existing landscape features.
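The per-feature evaluation idea can be sketched as follows. This is a minimal illustration, not the thesis's actual experimental code: the sphere base function, the noise-based ruggedness transformation, and the dispersion-style feature are all hypothetical stand-ins chosen to show the mechanic of computing one feature over a graded transformation and testing it with a one-way ANOVA.

```python
# Hypothetical sketch of evaluating a single landscape feature against a
# graded problem transformation, then testing it with a one-way ANOVA.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)

def base_problem(x):
    # simple sphere function standing in for an untransformed problem
    return np.sum(x**2, axis=1)

def transformed_problem(x, level):
    # one landscape characteristic (here: ruggedness, via an additive
    # oscillatory term) is altered gradually by the transformation level
    return base_problem(x) + level * np.sin(50.0 * np.linalg.norm(x, axis=1))

def dispersion_feature(x, y, top_frac=0.1):
    # an example ELA-style feature: mean pairwise distance among the best
    # candidate solutions in the sample
    k = max(2, int(top_frac * len(y)))
    best = x[np.argsort(y)[:k]]
    dists = [np.linalg.norm(a - b)
             for i, a in enumerate(best) for b in best[i + 1:]]
    return float(np.mean(dists))

# compute the feature on several replicate samples at each transformation level
levels = [0.0, 0.5, 1.0, 2.0]
samples_per_level = []
for lvl in levels:
    vals = []
    for _ in range(10):
        x = rng.uniform(-5, 5, size=(200, 2))
        y = transformed_problem(x, lvl)
        vals.append(dispersion_feature(x, y))
    samples_per_level.append(vals)

# ANOVA across levels: a small p-value is evidence that the feature
# detects the differences induced by this transformation
f_stat, p_value = f_oneway(*samples_per_level)
print(f"F = {f_stat:.2f}, p = {p_value:.4g}")
```

Repeating this for each feature and each transformation type yields, per feature, a profile of which landscape characteristics it is actually sensitive to.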
In the experiments, different problem sets are compared using the model-based framework. The results show that the framework is very good at recovering induced similarities in artificial problem sets. The framework is then used to explore the black-box optimization benchmark (BBOB) problem set in high dimensions using different sample sizes. Finally, a set of clustering problems based on real-world datasets is compared using the model-based framework. The results show that the framework is very effective at identifying the differences between a diverse set of problems.

The work in this thesis provides practical techniques for characterizing and comparing black-box optimization problems. It contributes a new method for capturing problem structure as well as improvements to the experimental methodology for applying and using exploratory landscape analysis. These contributions can lead to a better understanding of the problem-algorithm performance mapping and provide benefits for automated algorithm selection and configuration.
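The model-based comparison can be sketched in a few lines. This is a hedged illustration under simplifying assumptions, not the thesis's implementation: it fits a GP to samples from each of two stand-in 1-D problems (sphere and Rastrigin, chosen here for illustration), evaluates both predictive distributions on a shared reference grid, and uses the closed-form KL divergence between the two multivariate Gaussians (symmetrized, since KL is asymmetric) as a dissimilarity score.

```python
# Hypothetical sketch of the model-based framework: fit a GP to samples
# from each problem, then compare the GPs' predictive distributions on a
# common grid via a symmetrized Kullback-Leibler divergence.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)

def kl_gaussians(mu0, cov0, mu1, cov1):
    # closed-form KL( N(mu0, cov0) || N(mu1, cov1) ), using slogdet
    # for numerical stability with near-singular covariances
    k = len(mu0)
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (np.trace(cov1_inv @ cov0) + diff @ cov1_inv @ diff
                  - k + logdet1 - logdet0)

def fit_gp(f, n=80):
    # sample candidate solutions and fitness values, then fit a GP surrogate
    x = rng.uniform(-5, 5, size=(n, 1))
    y = f(x)
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(),
                                  normalize_y=True)
    return gp.fit(x, y)

# two stand-in problems with clearly different landscape structure
sphere = lambda x: np.sum(x**2, axis=1)
rastrigin = lambda x: 10 * x.shape[1] + np.sum(
    x**2 - 10 * np.cos(2 * np.pi * x), axis=1)

gp_a, gp_b = fit_gp(sphere), fit_gp(rastrigin)

# evaluate both predictive distributions on a shared reference grid
grid = np.linspace(-5, 5, 25).reshape(-1, 1)
mu_a, cov_a = gp_a.predict(grid, return_cov=True)
mu_b, cov_b = gp_b.predict(grid, return_cov=True)

# symmetrized KL as a (dis)similarity score between the two problems;
# a small jitter keeps the covariance matrices well conditioned
jitter = 1e-6 * np.eye(len(grid))
d = 0.5 * (kl_gaussians(mu_a, cov_a + jitter, mu_b, cov_b + jitter)
           + kl_gaussians(mu_b, cov_b + jitter, mu_a, cov_a + jitter))
print(f"symmetrized KL dissimilarity: {d:.2f}")
```

A practical attraction noted in the abstract is that the GP's goodness of fit on held-out samples directly indicates how trustworthy such a dissimilarity score is, which individual landscape features cannot offer.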

DOI 10.14264/A664616
Language English
