Mark Kotanchek
Dow Chemical Company
Publication
Featured research published by Mark Kotanchek.
Computers & Chemical Engineering | 2004
Leo H. Chiang; Mark Kotanchek; Arthur K. Kordon
The proficiencies of Fisher discriminant analysis (FDA), support vector machines (SVM), and proximal support vector machines (PSVM) for fault diagnosis (i.e. classification of multiple fault classes) are investigated. The Tennessee Eastman process (TEP) simulator was used to generate overlapping datasets to evaluate the classification performance. When all variables were used, the datasets were masked with irrelevant information, which resulted in poor classification. With key variables selected by genetic algorithms and contribution charts, SVM and PSVM outperformed FDA and demonstrated the advantage of using nonlinear techniques when the data overlap. The overall misclassification for the testing data using FDA dropped from 38% to 18%, while that using SVM and PSVM dropped from 44–45% to 6%. PSVM increases the effectiveness of the proposed approach by saving significant computation time and memory while obtaining comparable classification results. For auto-correlated data, the incorporation of time lags into SVM and PSVM improved classification results. The added dimensions decreased the degree to which the data overlap, and the overall misclassification for the testing set using SVM and PSVM decreased further to 3%.
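The time-lag step mentioned in this abstract can be illustrated with a minimal sketch. It assumes that "incorporating time lags" means appending the preceding observations to each sample before it is passed to a classifier; the function name `add_time_lags`, the lag count, and the toy values are hypothetical and not taken from the paper.

```python
def add_time_lags(rows, n_lags):
    """Augment each observation with the n_lags observations before it.

    rows: list of equal-length lists of floats, in time order.
    Returns len(rows) - n_lags samples; each sample is the current
    observation followed by the previous one, and so on back n_lags steps.
    """
    if n_lags < 0:
        raise ValueError("n_lags must be non-negative")
    lagged = []
    for t in range(n_lags, len(rows)):
        features = []
        for lag in range(n_lags + 1):
            features.extend(rows[t - lag])
        lagged.append(features)
    return lagged


# Toy series with two measured variables per time step (made-up values).
series = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
lagged = add_time_lags(series, n_lags=2)
```

Each lagged sample has `(n_lags + 1)` times as many features as a raw observation, which is the "added dimensions" effect the abstract refers to; the first `n_lags` time steps are dropped because they lack a full history.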
Archive | 2005
Guido Smits; Mark Kotanchek
Symbolic regression via genetic programming (hereafter, referred to simply as symbolic regression) has proven to be a very important tool for industrial empirical modeling (Kotanchek et al., 2003). Two of the primary problems with industrial use of symbolic regression are (1) the relatively large computational demands in comparison with other nonlinear empirical modeling techniques such as neural networks and (2) the difficulty in making the trade-off between expression accuracy and complexity. The latter issue is significant since, in general, we prefer parsimonious (simple) expressions with the expectation that they are more robust with respect to changes over time in the underlying system or extrapolation outside the range of the data used as the reference in evolving the symbolic regression.
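The accuracy versus complexity trade-off described here is commonly handled by keeping only the non-dominated candidates. The following is a minimal sketch of that filter, not the authors' implementation; it assumes each candidate expression is summarized by a hypothetical `(name, complexity, error)` tuple in which both numbers are to be minimized.

```python
def pareto_front(models):
    """Return the models that are not dominated in (complexity, error).

    models: list of (name, complexity, error) tuples; lower is better
    for both objectives. A model is dominated if some other model is
    no worse on both objectives and strictly better on at least one.
    """
    front = []
    for name, c, e in models:
        dominated = any(
            (c2 <= c and e2 <= e) and (c2 < c or e2 < e)
            for _, c2, e2 in models
        )
        if not dominated:
            front.append((name, c, e))
    # Sort by complexity so the front reads from simplest to most accurate.
    return sorted(front, key=lambda m: m[1])


# Made-up candidates for illustration only.
candidates = [
    ("a", 3, 0.40),
    ("b", 5, 0.25),
    ("c", 5, 0.30),   # dominated by "b": same complexity, higher error
    ("d", 9, 0.10),
    ("e", 12, 0.12),  # dominated by "d": more complex and less accurate
]
front = pareto_front(candidates)
```

The quadratic scan is adequate for small candidate sets; how complexity is measured for an expression is left open here because the abstract does not specify it.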
Archive | 2006
Guido Smits; Arthur K. Kordon; Katherine Vladislavleva; Elsa M. Jordaan; Mark Kotanchek
This chapter gives an overview, based on the experience from the Dow Chemical Company, of the importance of variable selection to build robust models from industrial datasets. A quick review of variable selection schemes based on linear techniques is given. A relatively simple fitness inheritance scheme is proposed to do nonlinear sensitivity analysis that is especially effective when combined with Pareto GP. The method is applied to two industrial datasets with good results.
Archive | 2008
Mark Kotanchek; Guido Smits; Ekaterina Vladislavleva
Trust is a major issue with deploying empirical models in the real world since changes in the underlying system or use of the model in new regions of parameter space can produce (potentially dangerous) incorrect predictions. The trepidation involved with model usage can be mitigated by assembling ensembles of diverse models and using their consensus as a trust metric, since these models will be constrained to agree in the data region used for model development and also constrained to disagree outside that region. The problem is to define an appropriate model complexity (since the ensemble should consist of models of similar complexity), as well as to identify diverse models from the candidate model set. In this chapter we discuss strategies for the development and selection of robust models and model ensembles and demonstrate those strategies against industrial data sets. An important benefit of this approach is that all available data may be used in the model development rather than a partition into training, test and validation subsets. The result is that the constituent models are more accurate without risk of over-fitting, and the ensemble predictions are both more accurate and accompanied by a meaningful trust metric.
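The idea of using ensemble consensus as a trust metric can be sketched as follows. This is an illustration under stated assumptions rather than the chapter's method: the ensemble prediction is taken as the median of the member predictions, disagreement is measured as the range (max minus min), and the three toy models are invented so that they agree at x = 2 and diverge elsewhere.

```python
from statistics import median


def ensemble_predict(models, x):
    """Return (prediction, disagreement) for an ensemble at input x.

    prediction:   median of the member predictions (an assumed choice).
    disagreement: spread of the member predictions (max - min); a large
                  value suggests x lies outside the region where the
                  members were fitted to agree.
    """
    preds = [m(x) for m in models]
    return median(preds), max(preds) - min(preds)


# Hypothetical ensemble: three functions that coincide at x = 2.
models = [
    lambda x: 2.0 * x,
    lambda x: x * x,
    lambda x: 3.0 * x - 0.5 * x * x,
]

inside = ensemble_predict(models, 2.0)   # members agree here
outside = ensemble_predict(models, 5.0)  # members diverge here
```

A deployment would compare the disagreement against a threshold chosen from the development data before acting on the prediction; that threshold is not specified in the abstract.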
Archive | 2006
Arthur K. Kordon; Flor A. Castillo; Guido Smits; Mark Kotanchek
This chapter gives a systematic view, based on the experience from The Dow Chemical Company, of the key issues for applying symbolic regression with Genetic Programming (GP) in industrial problems. The competitive advantages of GP are defined and several industrial problems appropriate for GP are recommended and referenced with specific applications in the chemical industry. A systematic method for selecting the key GP parameters, based on statistical design of experiments, is proposed. The most significant technical and non-technical issues for delivering a successful GP industrial application are discussed briefly.
Archive | 2003
Mark Kotanchek; Guido Smits; Arthur K. Kordon
Since the mid-1990s, symbolic regression via genetic programming (GP) has become a core component of a multi-disciplinary approach to empirical modeling at Dow Chemical. Herein we review the role of symbolic regression within an integrated empirical modeling methodology, discuss symbolic regression system design issues, best practices and lessons learned from industrial application, and present future directions for research and application.
Archive | 2007
Mark Kotanchek; Guido Smits; Ekaterina Vladislavleva
The ParetoGP algorithm, which adopts a multi-objective optimization approach to balancing expression complexity and accuracy, has proven to have significant impact on symbolic regression of industrial data due to its improvement in speed and quality of model development as well as user model selection (Smits and Kotanchek, 2004; Smits et al., 2005; Castillo et al., 2006). In this chapter, we explore a range of topics related to exploiting the Pareto paradigm. First we describe and explore the strengths and weaknesses of the ClassicGP and Pareto-Front GP variants for symbolic regression as well as touch on related algorithms. Next, we show a derivation for the selection intensity of tournament selection with multiple winners (albeit in a single-objective case). We then extend classical tournament and elite selection strategies into a multi-objective framework which allows classical GP schemes to be readily Pareto-aware. Finally, we introduce the latest extension of the Pareto paradigm, which is its melding with ordinal optimization. It appears that ordinal optimization will provide a theoretical foundation to guide algorithm design. Application of these insights has already produced at least a four-fold improvement in the ParetoGP performance for a suite of test problems.
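Tournament selection with multiple winners, in the single-objective case, can be sketched as below. This is a generic illustration and not the chapter's derivation or code: the function name, the parameter names, and the convention that a lower fitness value is better are all assumptions.

```python
import random


def tournament_select(population, fitness, tournament_size, n_winners, rng):
    """Draw a random tournament and return its n_winners best members.

    population:      list of candidate individuals.
    fitness:         function mapping an individual to a number (lower is better).
    tournament_size: number of individuals sampled without replacement.
    n_winners:       how many of the sampled individuals are selected.
    rng:             a random.Random instance, passed in for reproducibility.
    """
    if not 1 <= n_winners <= tournament_size <= len(population):
        raise ValueError("need 1 <= n_winners <= tournament_size <= len(population)")
    contestants = rng.sample(population, tournament_size)
    contestants.sort(key=fitness)
    return contestants[:n_winners]


# Toy population: each individual is just its own error value.
population = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2]
rng = random.Random(0)
winners = tournament_select(population, lambda ind: ind,
                            tournament_size=4, n_winners=2, rng=rng)
```

Raising `tournament_size` or lowering `n_winners` increases selection pressure, which is the quantity a selection-intensity derivation characterizes.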
Archive | 2010
Mark Kotanchek; Ekaterina Y. Vladislavleva; Guido Smits
In this chapter we illustrate a framework based on symbolic regression to generate and sharpen the questions about the nature of the underlying system and provide additional context and understanding based on multi-variate numeric data.
Archive | 2011
Guido Smits; Ekaterina Vladislavleva; Mark Kotanchek
The future of computing is one of massive parallelism. To exploit this and generate maximum performance it will be inevitable that more co-design between hardware and software takes place. Many software algorithms need rethinking to expose all the possible concurrency, increase locality and have built-in fault tolerance. Evolutionary algorithms are naturally parallel and should as such have an edge in exploiting these hardware features.
Archive | 2014
Rick L. Riolo; Jason H. Moore; Mark Kotanchek
Genetic Programming Theory and Practice VI was developed from the sixth workshop at the University of Michigan's Center for the Study of Complex Systems to facilitate the exchange of ideas and information related to the rapidly advancing field of Genetic Programming (GP). Contributions from the foremost international researchers and practitioners in the GP arena examine the similarities and differences between theoretical and empirical results on real-world problems. The text explores the synergy between theory and practice, producing a comprehensive view of the state of the art in GP application. These contributions address several significant interdependent themes which emerged from this year's workshop, including: (1) Making efficient and effective use of test data. (2) Sustaining the long-term evolvability of our GP systems. (3) Exploiting discovered subsolutions for reuse. (4) Increasing the role of a Domain Expert.