Bas van Stein
Leiden University
Publication
Featured research published by Bas van Stein.
intelligent data analysis | 2015
Bas van Stein; Hao Wang; Wojtek Kowalczyk; Thomas Bäck; Michael Emmerich
In business and academia we continuously try to model and analyze complex processes in order to gain insight and optimize them. One of the most popular modeling algorithms is Kriging, or Gaussian Processes. A major bottleneck of Kriging is its processing time of at least \(O(n^3)\) and its memory requirement of \(O(n^2)\), where \(n\) is the number of data points, when it is applied to medium-sized and big data sets. For big data sets, which are increasingly available, Kriging is not computationally feasible. As a solution to this problem, we introduce a hybrid approach in which a number of Kriging models, built on disjoint subsets of the data, are properly weighted for the predictions. The proposed model is much more efficient than standard Global Kriging in both processing time and memory, and performs equally well in terms of accuracy. The proposed algorithm scales better and is well suited for parallelization.
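The idea of weighting several Kriging models built on disjoint subsets can be illustrated with a small pure-Python sketch. It is not the published method: the partitioning (contiguous chunks of sorted 1-D inputs instead of a clustering algorithm), the fixed kernel hyperparameters, and the inverse-variance weights are all assumptions made for illustration. With \(m\) equally sized subsets, each model costs \(O((n/m)^3)\) time and \(O((n/m)^2)\) memory, which is where the savings come from.

```python
import math

def rbf(a, b, length_scale):
    # Squared-exponential covariance with unit prior variance (fixed, not fitted)
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def cholesky(K):
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    # Forward substitution: L y = b
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper_t(L, y):
    # Back substitution: L^T x = y
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

class SimpleKriging:
    """Zero-mean GP on 1-D inputs; fitting costs O(m^3) for m points."""
    def __init__(self, xs, ys, length_scale=0.5, nugget=1e-6):
        self.xs, self.ls = xs, length_scale
        K = [[rbf(a, b, length_scale) + (nugget if i == j else 0.0)
              for j, b in enumerate(xs)] for i, a in enumerate(xs)]
        self.L = cholesky(K)
        self.alpha = solve_upper_t(self.L, solve_lower(self.L, ys))

    def predict(self, x):
        k = [rbf(x, a, self.ls) for a in self.xs]
        mean = sum(ki * ai for ki, ai in zip(k, self.alpha))
        v = solve_lower(self.L, k)
        var = max(1.0 - sum(vi * vi for vi in v), 1e-12)  # clamp rounding error
        return mean, var

def fit_cluster_kriging(xs, ys, n_clusters):
    # Hypothetical partitioning: contiguous chunks of the sorted inputs
    pairs = sorted(zip(xs, ys))
    size = math.ceil(len(pairs) / n_clusters)
    chunks = [pairs[i:i + size] for i in range(0, len(pairs), size)]
    return [SimpleKriging([p[0] for p in c], [p[1] for p in c]) for c in chunks]

def predict_cluster_kriging(models, x):
    # Assumed weighting: inverse predictive variance of each local model
    preds = [m.predict(x) for m in models]
    precision = [1.0 / var for _, var in preds]
    total = sum(precision)
    mean = sum(p * mu for p, (mu, _) in zip(precision, preds)) / total
    return mean, 1.0 / total
```

Inverse-variance weighting lets the model that has nearby training data dominate each prediction, while distant models contribute almost nothing.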
international conference information processing | 2016
Bas van Stein; Wojtek Kowalczyk
Real-life datasets from domains such as industrial process control, medical diagnosis, marketing, and risk management often contain missing values. This poses a challenge for many classification and regression algorithms that require complete training sets. In this paper we present a new approach for “repairing” such incomplete datasets by constructing a sequence of regression models that iteratively replace all missing values. Additionally, our approach uses the target attribute to estimate the values of missing data. The accuracy of our method, Incremental Attribute Regression Imputation (IARI), is compared with that of several popular and state-of-the-art imputation methods on five publicly available benchmark datasets. The results demonstrate the superiority of our approach.
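A sequence of regression models that repair one attribute at a time can be sketched as follows. This is a simplified illustration, not the published IARI: it assumes ordinary least-squares models, an ordering by number of missing values (fewest first), and a tiny ridge term for numerical safety. As described above, the target attribute is used as a predictor, and each repaired attribute becomes a predictor for the next one.

```python
def least_squares(X, y, ridge=1e-9):
    """Solve (X^T X + ridge*I) b = X^T y by Gauss-Jordan elimination."""
    p = len(X[0])
    M = [[sum(r[i] * r[j] for r in X) + (ridge if i == j else 0.0) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda row: abs(M[row][col]))
        M[col], M[piv] = M[piv], M[col]
        for row in range(p):
            if row != col:
                f = M[row][col] / M[col][col]
                M[row] = [a - f * b for a, b in zip(M[row], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]

def incremental_regression_impute(rows, target):
    """rows: lists with None for missing values; target: complete list of floats."""
    data = [list(r) for r in rows]
    n, d = len(data), len(data[0])
    missing = {j: [i for i in range(n) if data[i][j] is None] for j in range(d)}
    complete = [j for j in range(d) if not missing[j]]
    # Assumed order: repair attributes with the fewest missing values first
    for j in sorted((a for a in range(d) if missing[a]), key=lambda a: len(missing[a])):
        cols = list(complete)

        def features(i):
            # Intercept, the target attribute, and all already-complete attributes
            return [1.0, target[i]] + [data[i][c] for c in cols]

        train = [i for i in range(n) if data[i][j] is not None]
        coef = least_squares([features(i) for i in train], [data[i][j] for i in train])
        for i in missing[j]:
            data[i][j] = sum(c * f for c, f in zip(coef, features(i)))
        complete.append(j)  # the repaired attribute now serves as a predictor
    return data
```

Because the target is a predictor, this kind of repair applies to training data, where the target is known.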
ieee international conference on fuzzy systems | 2016
Bas van Stein; Hao Wang; Wojtek Kowalczyk; Michael Emmerich; Thomas Bäck
Kriging, or Gaussian Process Regression, has been successfully applied in many fields. One of the major bottlenecks of Kriging is its complexity in both processing time (cubic) and memory (quadratic) in the number of data points. To overcome these limitations, a variety of approximation algorithms have been proposed, one of which is Optimally Weighted Cluster Kriging (OWCK). In this paper, OWCK is extended with fuzzy clustering methods in order to increase its accuracy. Several options are proposed and evaluated against both the original OWCK and a variety of other Kriging approximation algorithms.
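One way fuzzy clustering can enter a cluster-based Kriging model is through soft memberships that weight the per-cluster predictions. The sketch below uses standard fuzzy c-means on 1-D data with fuzzifier m; the deterministic initialisation and the membership-weighted combination are assumptions for illustration, not necessarily any of the options evaluated in the paper.

```python
def memberships(x, centers, m=2.0):
    """Fuzzy c-means membership of a 1-D point x in each cluster (m > 1)."""
    d = [abs(x - c) for c in centers]
    if 0.0 in d:
        # x coincides with a centre: assign it crisply to that cluster
        hit = d.index(0.0)
        return [1.0 if k == hit else 0.0 for k in range(len(d))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** p for dj in d) for di in d]

def fuzzy_c_means(xs, n_clusters, m=2.0, iters=50):
    s = sorted(xs)
    # Assumed initialisation: evenly spaced order statistics of the data
    centers = [s[(2 * k + 1) * len(s) // (2 * n_clusters)] for k in range(n_clusters)]
    for _ in range(iters):
        U = [memberships(x, centers, m) for x in xs]
        # Centre update: mean of the points weighted by membership^m
        centers = [sum(u[k] ** m * x for u, x in zip(U, xs)) / sum(u[k] ** m for u in U)
                   for k in range(n_clusters)]
    return centers, [memberships(x, centers, m) for x in xs]

def fuzzy_combine(x, centers, cluster_predictions, m=2.0):
    # Membership-weighted average of the per-cluster model predictions at x
    u = memberships(x, centers, m)
    return sum(uk * yk for uk, yk in zip(u, cluster_predictions))
```

Memberships of each point sum to one, so a point halfway between two centres receives equal weight from both local models instead of being assigned to only one of them.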
Archive | 2013
Bas van Stein; Michael Emmerich; Zhiwei Yang
Combinatorial landscape analysis (CLA) is an essential tool for understanding problem difficulty in combinatorial optimization and for gaining a more fundamental understanding of the behavior of search heuristics. Within CLA, Barrier trees are an efficient tool to visualize essential topographical features of a landscape. They capture the fitness of local optima and how these are separated from other local optima by fitness barriers. The contribution of this study is two-fold. Firstly, the Barrier tree is extended by a visualization of the size of fitness basins (valleys below saddle points) using expandable node sizes for saddle points, and a graded dual-color scheme is used to distinguish between penalized infeasible and non-penalized feasible solutions of different fitness. Secondly, the fitness landscapes of two important NP-hard problems with practical relevance are studied: NK landscapes and Vehicle Routing Problems (with time window constraints). Here the goal is to use the extended Barrier tree (EBT) to study the influence of problem parameters on the landscape structure: for NK landscapes, the number of interacting genes K; for Vehicle Routing Problems, the number of vehicles, the capacity, and the time window constraints.
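The data behind a Barrier tree, including basin sizes at each saddle, can be collected with a flooding procedure: visit all configurations from best to worst, start a new basin at every local minimum, and record a saddle whenever a configuration connects several existing basins. The sketch below assumes minimisation, bit-flip neighbourhoods, and an NK landscape in which bit i interacts with bits i+1 to i+K (cyclically); it is an illustration, not the tool used in the study.

```python
import itertools
import random

def nk_landscape(n, k, seed=0):
    """Random NK landscape; bit i interacts with bits i+1..i+k (cyclic neighbours)."""
    rng = random.Random(seed)
    tables = [[rng.random() for _ in range(2 ** (k + 1))] for _ in range(n)]

    def fitness(bits):
        total = 0.0
        for i in range(n):
            idx = 0
            for j in range(k + 1):
                idx = (idx << 1) | bits[(i + j) % n]
            total += tables[i][idx]
        return total / n

    return fitness

def neighbours(c):
    # All configurations at Hamming distance 1
    return [c[:i] + (1 - c[i],) + c[i + 1:] for i in range(len(c))]

def flood(n, fitness):
    """Return local minima and saddle merges (with basin sizes) for minimisation."""
    order = sorted(itertools.product((0, 1), repeat=n), key=fitness)
    parent, size = {}, {}
    minima, saddles = [], []

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for c in order:
        # Basins already reached by better neighbours, lowest minimum first
        roots = sorted({find(nb) for nb in neighbours(c) if nb in parent}, key=fitness)
        if not roots:
            parent[c], size[c] = c, 1  # no better neighbour: new local minimum
            minima.append(c)
            continue
        if len(roots) > 1:
            # c joins several basins: record the saddle and basin sizes before merging
            saddles.append((c, fitness(c), [(r, size[r]) for r in roots]))
            for r in roots[1:]:
                parent[r] = roots[0]
                size[roots[0]] += size[r]
        parent[c] = roots[0]
        size[roots[0]] += 1
    return minima, saddles
```

Each saddle entry lists the merged basins with their sizes, which is the information an expandable node size would display. Exhaustive enumeration limits this sketch to small n.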
genetic and evolutionary computation conference | 2017
Sander van Rijn; Hao Wang; Bas van Stein; Thomas Bäck
In recent years, a considerable number of algorithmic extensions of the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) have been proposed. These extensions define a large algorithm design space, but relatively little is known about the performance of most of these variants and the interactions between them. In this paper we investigate how various algorithmic extensions interact and what their impact is on the objective functions of the Black Box Optimization Benchmark (BBOB). Based on the existing Estimated Running Time (ERT) and Fixed Cost Error (FCE) measures, a novel algorithm quality measure is proposed to quantify an impact score of the variants studied. Using performance data from running the 4,608 algorithmic variants available in the previously published configurable CMA-ES framework, decision trees and other data mining methods are used to analyze performance. The analysis identifies the algorithmic variations required for obtaining the best performance and reveals strong differences between objective functions, thereby helping to understand the interaction of algorithmic components for an objective function and, ultimately, for an objective function class. The results also quantitatively confirm that popular variants such as increasing the population size and elitism generally have a positive impact on algorithm performance.
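The two base measures can be computed from per-run trajectories of best-so-far error. The ERT below follows the common BBOB convention: all evaluations spent across runs (successful runs up to the first hit of the target, unsuccessful runs their full budget) divided by the number of successful runs. The FCE definition used here, the mean best error within a fixed budget, is an assumption; the proposed impact score is not reproduced.

```python
import math

def hitting_time(trajectory, target):
    """1-based evaluation at which the error first reaches target, else None."""
    for evals, err in enumerate(trajectory, start=1):
        if err <= target:
            return evals
    return None

def ert(trajectories, target):
    """Total evaluations over all runs divided by the number of successful runs."""
    total, successes = 0, 0
    for traj in trajectories:
        hit = hitting_time(traj, target)
        if hit is None:
            total += len(traj)  # an unsuccessful run contributes its whole budget
        else:
            total += hit
            successes += 1
    return total / successes if successes else math.inf

def fce(trajectories, budget):
    # Assumed definition: mean best error reached within `budget` evaluations
    return sum(min(traj[:budget]) for traj in trajectories) / len(trajectories)
```

For example, with three runs that reach an error of 1.0 after 3 evaluations, never within 4, and after 2 evaluations, the ERT is (3 + 4 + 2) / 2 = 4.5 evaluations.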
international conference on big data | 2016
Bas van Stein; Matthijs van Leeuwen; Thomas Bäck
Outlier detection in high-dimensional data is a challenging yet important task, as it has applications in, e.g., fraud detection and quality control. State-of-the-art density-based algorithms perform well because they 1) take the local neighbourhoods of data points into account and 2) consider feature subspaces. In highly complex and high-dimensional data, however, existing methods are likely to overlook important outliers because they do not explicitly take into account that the data is often a mixture distribution of multiple components. We therefore introduce GLOSS, an algorithm that performs local subspace outlier detection using global neighbourhoods. Experiments on synthetic data demonstrate that GLOSS more accurately detects local outliers in mixed data than its competitors. Moreover, experiments on real-world data show that our approach identifies relevant outliers overlooked by existing methods, confirming that one should keep an eye on the global perspective even when doing local outlier detection.
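The combination of global neighbourhoods with local subspace densities can be illustrated with a deliberately simplified, hypothetical scoring scheme; it is not the published GLOSS algorithm. Here the k nearest neighbours of each point are chosen with distances in the full feature space, while densities (inverse mean neighbour distance) are measured only in a chosen subspace. A score well above 1 means the point is much sparser than its neighbours in that subspace.

```python
import math

def subspace_distance(a, b, dims):
    return math.sqrt(sum((a[d] - b[d]) ** 2 for d in dims))

def global_neighbours(data, i, k):
    # Neighbours are selected using distances over ALL features
    dims = range(len(data[i]))
    others = [j for j in range(len(data)) if j != i]
    return sorted(others, key=lambda j: subspace_distance(data[i], data[j], dims))[:k]

def subspace_outlier_scores(data, k, subspace, eps=1e-12):
    """Neighbours' average subspace density divided by the point's own density."""
    neigh = [global_neighbours(data, i, k) for i in range(len(data))]
    dens = []
    for i, nb in enumerate(neigh):
        mean_dist = sum(subspace_distance(data[i], data[j], subspace) for j in nb) / len(nb)
        dens.append(1.0 / (mean_dist + eps))
    return [sum(dens[j] for j in nb) / (len(nb) * dens[i]) for i, nb in enumerate(neigh)]
```

In this toy setting a point that only deviates in one feature is flagged by scoring that feature's subspace, while its neighbourhood is still determined by all features.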
international conference information processing | 2018
Bas van Stein; Hao Wang; Wojtek Kowalczyk; Thomas Bäck
For most regression models, the overall accuracy can be estimated with the help of various error measures. However, in some applications it is important to provide not only point predictions, but also an estimate of the “uncertainty” of each prediction, e.g., in terms of confidence intervals, variances, or interquartile ranges. Very few statistical modeling techniques are able to do this; one exception is the Kriging/Gaussian Process method, which is equipped with a theoretical mean squared error. In this paper we address this problem by introducing a heuristic method to estimate the uncertainty of a prediction based on the error information from its k nearest neighbours. This heuristic, called the k-NN uncertainty measure, is computationally much cheaper than other approaches (e.g., bootstrapping) and can be applied regardless of the underlying regression model. To validate and demonstrate the usefulness of the proposed heuristic, it is combined with various models and plugged into the well-known Efficient Global Optimization (EGO) algorithm. The results demonstrate that using different models with the proposed heuristic can significantly improve the convergence of EGO.
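A minimal version of a k-NN-based uncertainty estimate is sketched below. It assumes that an error value is available for every training point (for example a cross-validated residual of whatever regression model is used) and takes the unweighted mean absolute error of the k nearest training points; the exact formula and any distance weighting in the paper may differ.

```python
import math

def knn_uncertainty(x, train_x, train_err, k=3):
    """Mean absolute error of the k training points closest to x (model-agnostic)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(x, train_x[i]))
    nearest = order[:k]
    return sum(abs(train_err[i]) for i in nearest) / len(nearest)
```

Only the training inputs and their errors are needed, so the same function can be paired with any regression model; the resulting value can stand in for a predictive standard deviation in an acquisition function.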
systems, man and cybernetics | 2017
Hao Wang; Bas van Stein; Michael Emmerich; Thomas Bäck
Bayesian Optimization, or Efficient Global Optimization (EGO), is a global search strategy designed for expensive black-box functions. In this algorithm, a statistical model (usually a Gaussian process model) is constructed on some initial data samples. The global optimum is approached by iteratively maximizing a so-called acquisition function, which balances exploration and exploitation in the search. The performance of such an algorithm is largely affected by the choice of the acquisition function. Inspired by the use of higher moments of the Gaussian process model, we propose a novel acquisition function based on the moment-generating function (MGF) of the improvement, which is the stochastic gain over the current best fitness value obtained by sampling at an unknown point. This MGF-based acquisition function takes all higher moments into account and introduces an additional real-valued parameter to control the trade-off between exploration and exploitation. The motivation, rationale, and closed-form expression of the proposed function are discussed in detail. In addition, we illustrate its advantages over other acquisition functions, especially the so-called generalized expected improvement.
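For minimisation with a Gaussian prediction \(Y \sim N(\mu, \sigma^2)\) and improvement \(I = \max(0, f_{min} - Y)\), completing the square gives \(E[e^{tI}] = 1 - \Phi(u) + \exp\big(t(f_{min} - \mu) + t^2\sigma^2/2\big)\,\Phi(u + t\sigma)\) with \(u = (f_{min} - \mu)/\sigma\). The sketch below evaluates this expression; treating the MGF argument t as the trade-off parameter is an assumption here, and the acquisition function in the paper may normalize or transform the MGF differently.

```python
import math

def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def mgf_improvement(t, mu, sigma, f_min):
    """E[exp(t*I)] with I = max(0, f_min - Y) and Y ~ N(mu, sigma^2), for minimisation."""
    u = (f_min - mu) / sigma
    no_gain = 1.0 - norm_cdf(u)  # P(Y >= f_min): I = 0, so exp(t*I) = 1
    gain = math.exp(t * (f_min - mu) + 0.5 * (t * sigma) ** 2) * norm_cdf(u + t * sigma)
    return no_gain + gain
```

At t = 0 the value is 1, and its derivative with respect to t at 0 equals the expected improvement; larger t gives more weight to large (higher-moment) improvements, which favours uncertain regions.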
genetic and evolutionary computation conference | 2017
Hao Wang; Bas van Stein; Michael Emmerich; Thomas Bäck
Efficient Global Optimization (EGO) is an effective method for optimizing expensive black-box functions; it uses Kriging models (or Gaussian process regression) trained on a relatively small design data set. In real-world applications such as experimental optimization, where a large data set is available, the EGO algorithm becomes computationally infeasible due to the time and space complexity of Kriging. Recently, Cluster Kriging methods have been proposed to reduce these complexities for big data: the data set is clustered, a Kriging model is built on each cluster, and the models are combined in an optimal way for prediction. In addition, we analyze the Cluster Kriging landscape in order to adopt existing infill criteria, e.g., the expected improvement. The approach is tested on selected global optimization problems. Empirical studies show that this approach significantly reduces the CPU time of the EGO algorithm while maintaining its convergence rate.
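To use an infill criterion such as expected improvement with Cluster Kriging, the combined model must supply a predictive mean and variance. The sketch below combines per-cluster predictions with given weights and then applies the standard expected improvement formula for minimisation. Combining variances as \(\sum_i w_i^2 \sigma_i^2\) assumes independent cluster errors; this is an illustrative choice, not necessarily the paper's analysis.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def combine(weights, means, variances):
    # Normalised weighted mean; variance assumes independent cluster errors
    total = sum(weights)
    w = [wi / total for wi in weights]
    mean = sum(wi * m for wi, m in zip(w, means))
    var = sum(wi * wi * v for wi, v in zip(w, variances))
    return mean, var

def expected_improvement(mean, var, f_min):
    """Standard EI for minimisation: E[max(0, f_min - Y)], Y ~ N(mean, var)."""
    sigma = math.sqrt(max(var, 0.0))
    if sigma == 0.0:
        return max(f_min - mean, 0.0)
    u = (f_min - mean) / sigma
    return (f_min - mean) * norm_cdf(u) + sigma * norm_pdf(u)
```

Only the small per-cluster models are evaluated at each candidate point, so maximising EI over many candidates stays cheap even when the full data set is large.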
international conference on computational science | 2016
Bas van Stein; Matthijs van Leeuwen; Hao Wang; Stephan Purr; Sebastian Kreissl; Josef Meinhardt; Thomas Bäck
The manufacturing process of car body parts is a complex industrial process in which many machine parameters and material measurements determine the quality of the final product. Data-driven models have shown great advantages in helping decision makers optimize such complex processes, for which good physical models are hard to build. In this paper a framework for on-line process monitoring and predictive modeling is proposed to optimize a car body part production process. Anomaly detection plays an important role in this framework, as it can give operators on the production line an early alert based on a complex set of machine parameters and material properties. This paper introduces an anomaly detection algorithm, Gloss, which has been successfully implemented as the first module in the process. Gloss finds local outliers in high-dimensional mixed data sets using a relative density measure that takes the global neighborhood into account while searching for outliers in subspaces of the data. An overview of the application and implementation of the algorithm in the car body part press shop is presented.