Publications


Featured research published by Guido Smits.


IEEE Transactions on Evolutionary Computation | 2009

Order of Nonlinearity as a Complexity Measure for Models Generated by Symbolic Regression via Pareto Genetic Programming

Ekaterina Vladislavleva; Guido Smits; Dick den Hertog

This paper presents a novel approach to generating data-driven regression models that not only give reliable predictions of the observed data but also have smoother response surfaces and better generalization with respect to extrapolation. These models are obtained as solutions of a genetic programming (GP) process in which selection is guided by a trade-off between two competing objectives: numerical accuracy and the order of nonlinearity. The latter is a novel complexity measure based on the minimal degree of the best-fit polynomial that approximates an analytical function to a given precision. Using nine regression problems, the paper presents and illustrates two strategies for using the order of nonlinearity in symbolic regression via GP. On all nine test problems, jointly optimizing the order of nonlinearity and numerical accuracy strongly outperforms "conventional" joint optimization of accuracy and a size-related expressional complexity with respect to the extrapolative capabilities of the solutions. In addition to exploiting the new complexity measure, the paper introduces a novel heuristic that alternates several optimization objectives within a 2-D optimization framework. Alternating the objectives at each generation in this way exploits the effectiveness of 2-D optimization when more than two objectives are of interest (here: accuracy, expressional complexity, and the order of nonlinearity). Results on all test problems suggest that alternating the order of nonlinearity of GP individuals with their structural complexity produces solutions that are both compact and have smoother response surfaces, and hence are easier to interpret and understand.
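The central idea of the order of nonlinearity, the minimal degree of a polynomial that approximates a function to a given precision, can be illustrated with a short sketch. It assumes NumPy, treats the model as a black-box univariate function sampled on a fixed grid, and uses Chebyshev least-squares fits with a maximum-absolute-error tolerance. The function name and its parameters are hypothetical, and the sketch does not reproduce the paper's exact procedure for GP expression trees.

```python
import numpy as np
from numpy.polynomial import Chebyshev


def order_of_nonlinearity(f, lo, hi, tol, max_degree=20, n_points=200):
    """Hypothetical helper: smallest degree d such that a least-squares
    Chebyshev fit of degree d matches f on [lo, hi] with maximum absolute
    error <= tol, checked on an evenly spaced sampling grid."""
    x = np.linspace(lo, hi, n_points)
    y = f(x)
    for degree in range(max_degree + 1):
        # Chebyshev.fit maps [lo, hi] onto [-1, 1], keeping the fit well conditioned.
        approx = Chebyshev.fit(x, y, degree)
        max_error = np.max(np.abs(approx(x) - y))
        if max_error <= tol:
            return degree
    # Signal "more nonlinear than max_degree allows".
    return max_degree + 1
```

For example, `order_of_nonlinearity(lambda x: 3 * x**2 + 1, -1.0, 1.0, 1e-8)` returns 2. In a GP run, a score of this kind would play the role of the second objective alongside prediction error, favoring models whose response surfaces bend less.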


Archive | 2005

Pareto-Front Exploitation in Symbolic Regression

Guido Smits; Mark Kotanchek

Symbolic regression via genetic programming (hereafter referred to simply as symbolic regression) has proven to be a very important tool for industrial empirical modeling (Kotanchek et al., 2003). Two of the primary problems with industrial use of symbolic regression are (1) its relatively large computational demands compared with other nonlinear empirical modeling techniques such as neural networks and (2) the difficulty of making the trade-off between expression accuracy and complexity. The latter issue is significant because, in general, we prefer parsimonious (simple) expressions, expecting them to be more robust with respect to changes over time in the underlying system or to extrapolation outside the range of the data used to evolve them.
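The accuracy/complexity trade-off is commonly handled by keeping only non-dominated models, that is, those for which no other model is both at least as accurate and at least as simple. The following is a minimal sketch of such a Pareto filter, assuming each candidate is a hypothetical `(name, error, complexity)` tuple with both objectives minimized; it is a generic illustration, not the chapter's implementation.

```python
from typing import List, Tuple

# (name, error, complexity); both error and complexity are minimized.
Model = Tuple[str, float, float]


def dominates(a: Model, b: Model) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one."""
    return a[1] <= b[1] and a[2] <= b[2] and (a[1] < b[1] or a[2] < b[2])


def pareto_front(models: List[Model]) -> List[Model]:
    """Return the non-dominated models, ordered from simplest to most complex."""
    front = [m for m in models if not any(dominates(o, m) for o in models)]
    return sorted(front, key=lambda m: m[2])
```

With candidates `a (0.10, 5)`, `b (0.05, 12)`, `c (0.12, 7)`, `d (0.02, 30)` and `e (0.05, 15)`, the front is `a`, `b`, `d`: `c` is beaten by `a` on both objectives and `e` is as accurate as `b` but more complex. The quadratic pairwise check is adequate for the modest archive sizes a user would inspect by hand.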


Archive | 2006

Variable Selection in Industrial Datasets Using Pareto Genetic Programming

Guido Smits; Arthur K. Kordon; Katherine Vladislavleva; Elsa M. Jordaan; Mark Kotanchek

This chapter gives an overview, based on the experience from the Dow Chemical Company, of the importance of variable selection to build robust models from industrial datasets. A quick review of variable selection schemes based on linear techniques is given. A relatively simple fitness inheritance scheme is proposed to do nonlinear sensitivity analysis that is especially effective when combined with Pareto GP. The method is applied to two industrial datasets with good results.


Archive | 2008

Trustable Symbolic Regression Models: Using Ensembles, Interval Arithmetic and Pareto Fronts to Develop Robust and Trust-Aware Models

Mark Kotanchek; Guido Smits; Ekaterina Vladislavleva

Trust is a major issue in deploying empirical models in the real world, since changes in the underlying system or use of the model in new regions of parameter space can produce (potentially dangerous) incorrect predictions. The trepidation involved in model usage can be mitigated by assembling ensembles of diverse models and using their consensus as a trust metric, since these models will be constrained to agree in the data region used for model development and also constrained to disagree outside that region. The problem is to define an appropriate model complexity (since the ensemble should consist of models of similar complexity), as well as to identify diverse models from the candidate model set. In this chapter we discuss strategies for the development and selection of robust models and model ensembles and demonstrate those strategies on industrial data sets. An important benefit of this approach is that all available data may be used in model development rather than being partitioned into training, test and validation subsets. As a result, the constituent models are more accurate without risk of over-fitting, the ensemble predictions are more accurate, and the ensemble predictions have a meaningful trust metric.
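Using ensemble consensus as a trust metric can be sketched as follows: evaluate every ensemble member at each input point, report a central prediction, and report how far the members disagree. This assumes NumPy and models given as vectorized callables; the choice of the median as the consensus and the max-minus-min range as the disagreement statistic is an assumption for illustration, not necessarily the statistic used in the chapter.

```python
from typing import Callable, Sequence, Tuple

import numpy as np


def ensemble_consensus(
    models: Sequence[Callable[[np.ndarray], np.ndarray]], x: np.ndarray
) -> Tuple[np.ndarray, np.ndarray]:
    """Return (median prediction, spread) for each input point.

    Spread is max minus min across ensemble members: small where the models
    agree (typically inside the training region), large where they diverge."""
    preds = np.stack([m(x) for m in models])  # shape: (n_models, n_points)
    return np.median(preds, axis=0), preds.max(axis=0) - preds.min(axis=0)
```

As a toy example, the models `x`, `x + (x - 0.5)**3` and `x - (x - 0.5)**3` nearly agree on [0, 1] but diverge beyond it: at `x = 0.5` the spread is 0, while at `x = 3.0` the median is 3.0 and the spread is 31.25, flagging that prediction as untrustworthy.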


Archive | 2006

Application Issues of Genetic Programming in Industry

Arthur K. Kordon; Flor A. Castillo; Guido Smits; Mark Kotanchek

This chapter gives a systematic view, based on the experience from The Dow Chemical Company, of the key issues for applying symbolic regression with Genetic Programming (GP) in industrial problems. The competitive advantages of GP are defined and several industrial problems appropriate for GP are recommended and referenced with specific applications in the chemical industry. A systematic method for selecting the key GP parameters, based on statistical design of experiments, is proposed. The most significant technical and non-technical issues for delivering a successful GP industrial application are discussed briefly.
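Screening GP parameters with statistical design of experiments typically starts from a two-level fractional factorial design. A minimal sketch of a half-fraction 2^(k-1) design follows: the base factors form a full factorial and one extra factor is set to the product of their coded levels, so four factors need eight runs instead of sixteen. The factor names used in the usage example are hypothetical, and the chapter's actual factors, levels and fractions may differ.

```python
import math
from itertools import product
from typing import Dict, List


def half_fraction_design(base: List[str], generated: str) -> List[Dict[str, int]]:
    """2^(k-1) two-level design in coded units (-1 = low, +1 = high).

    The base factors form a full factorial; the extra factor is aliased with
    their highest-order interaction (generator: generated = product of base levels)."""
    runs = []
    for levels in product((-1, 1), repeat=len(base)):
        run = dict(zip(base, levels))
        run[generated] = math.prod(levels)
        runs.append(run)
    return runs
```

For instance, `half_fraction_design(["population_size", "generations", "cascades"], "tournament_size")` yields eight runs in which every column is balanced between low and high settings; each coded run is then mapped to actual parameter values before launching the GP runs.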


Archive | 2003

Industrial Strength Genetic Programming

Mark Kotanchek; Guido Smits; Arthur K. Kordon

Since the mid-1990s, symbolic regression via genetic programming (GP) has become a core component of a multi-disciplinary approach to empirical modeling at Dow Chemical. Herein we review the role of symbolic regression within an integrated empirical modeling methodology, discuss symbolic regression system design issues, best practices and lessons learned from industrial application, and present future directions for research and application.


Archive | 2007

Pursuing the Pareto Paradigm: Tournaments, Algorithm Variations and Ordinal Optimization

Mark Kotanchek; Guido Smits; Ekaterina Vladislavleva

The ParetoGP algorithm, which adopts a multi-objective optimization approach to balancing expression complexity and accuracy, has proven to have significant impact on symbolic regression of industrial data due to its improvements in the speed and quality of model development as well as in user model selection (Smits and Kotanchek, 2004), (Smits et al., 2005), (Castillo et al., 2006). In this chapter, we explore a range of topics related to exploiting the Pareto paradigm. First, we describe and explore the strengths and weaknesses of the ClassicGP and Pareto-Front GP variants for symbolic regression and touch on related algorithms. Next, we derive the selection intensity of tournament selection with multiple winners (albeit in a single-objective case). We then extend classical tournament and elite selection strategies into a multi-objective framework, which allows classical GP schemes to be readily made Pareto-aware. Finally, we introduce the latest extension of the Pareto paradigm: its melding with ordinal optimization. It appears that ordinal optimization will provide a theoretical foundation to guide algorithm design. Applying these insights has already produced at least a four-fold improvement in ParetoGP performance on a suite of test problems.
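One way to make tournament selection Pareto-aware, sketched here as a plausible reading rather than the chapter's exact scheme, is to draw a random tournament and let every non-dominated member of that tournament win, which naturally gives multiple winners. The sketch assumes individuals are hypothetical `(error, complexity)` tuples with both objectives minimized and uses Python's standard `random` module.

```python
import random
from typing import List, Sequence, Tuple

# (error, complexity); both objectives are minimized.
Individual = Tuple[float, float]


def dominates(a: Individual, b: Individual) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_tournament(
    population: Sequence[Individual], k: int, rng: random.Random
) -> List[Individual]:
    """Sample k distinct individuals; every non-dominated member of the sample wins."""
    sample = rng.sample(list(population), k)
    return [a for a in sample if not any(dominates(b, a) for b in sample)]
```

The tournament size `k` controls selection pressure as in single-objective tournaments: with `k` equal to the population size the winners are exactly the population's Pareto front, while small `k` lets dominated individuals win often. A non-empty sample always has at least one winner.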


Congress on Evolutionary Computation | 2002

Robust soft sensors based on integration of genetic programming, analytical neural networks, and support vector machines

Arthur K. Kordon; Guido Smits; Elsa M. Jordaan; Ed Rightor

A novel approach for development of inferential sensors based on integration of three key computational intelligence approaches (genetic programming, analytical neural networks, and support vector machines) is proposed. The advantages of this type of soft sensors are their good generalization capabilities, increased robustness, explicit input/output relationships, self-assessment capabilities, and low implementation and maintenance cost.


Genetic and Evolutionary Computation Conference | 2006

Pareto front genetic programming parameter selection based on design of experiments and industrial data

Flor A. Castillo; Arthur K. Kordon; Guido Smits; Ben Christenson; Dee Dickerson

Symbolic regression based on Pareto Front GP is the key approach for generating high-performance, parsimonious empirical models acceptable for industrial applications. The paper addresses the issue of finding the optimal parameter settings of Pareto Front GP that direct the simulated evolution toward simple models with acceptable prediction error. A generic methodology based on statistical design of experiments is proposed. It includes statistical determination of the number of replicates by half-width confidence intervals, determination of the significant inputs by fractional factorial design of experiments, approaching the optimum by steepest ascent/descent, and local exploration around the optimum by Box-Behnken or central composite design of experiments. The results of applying the proposed methodology to a small industrial data set show that the statistically significant factors for symbolic regression based on Pareto Front GP are the number of cascades, the number of generations, and the population size. A second-order regression model with a high R² of 0.97 includes the three parameters, and their optimal values have been defined. The optimal parameter settings were validated on a separate small industrial data set. The optimal settings are recommended for symbolic regression applications using data sets with up to 5 inputs and up to 50 data points.
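Choosing the number of replicates from a half-width confidence interval amounts to finding the smallest n for which z·σ/√n ≤ h, where σ is the run-to-run standard deviation of the response (for example, the final model error, estimated from pilot runs) and h is the desired half-width. The sketch below uses the textbook normal approximation via Python's `statistics.NormalDist`; the paper may instead use a t-based or iterative variant, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def replicates_for_half_width(sigma: float, half_width: float, confidence: float = 0.95) -> int:
    """Smallest n with z * sigma / sqrt(n) <= half_width (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value, about 1.96 at 95%
    return math.ceil((z * sigma / half_width) ** 2)
```

For example, with σ = 1.0 and a target half-width of 0.5 at 95% confidence this gives 16 replicates; halving the half-width to 0.25 raises it to 62, reflecting the 1/√n shrinkage of the interval.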


IEEE International Conference on Evolutionary Computation | 2006

Ordinal Pareto Genetic Programming

Guido Smits; Ekaterina Vladislavleva

This paper introduces the first attempt to combine the theory of ordinal optimization with symbolic regression via genetic programming. A new approach called ordinal ParetoGP obtains considerably fitter solutions, with more consistency between independent runs, while spending less computational effort. The conclusions are supported by a number of experiments on three symbolic regression benchmark problems of various sizes.
