Publication


Featured research published by Ingo Mierswa.


Knowledge Discovery and Data Mining | 2006

YALE: rapid prototyping for complex data mining tasks

Ingo Mierswa; Michael Wurst; Ralf Klinkenberg; Martin Scholz; Timm Euler

KDD is a complex and demanding task. While a large number of methods have been established for numerous problems, many challenges remain to be solved. New tasks emerge that require the development of new methods or processing schemes. As in software development, building such solutions demands careful analysis, specification, implementation, and testing. Rapid prototyping is an approach which allows crucial design decisions to be made as early as possible. A rapid prototyping system should support maximal re-use and innovative combinations of existing methods, as well as simple and quick integration of new ones. This paper describes Yale, a free open-source environment for KDD and machine learning. Yale provides a rich variety of methods which allows rapid prototyping for new applications and makes costly re-implementations unnecessary. Additionally, Yale offers extensive functionality for process evaluation and optimization, which is a crucial property for any KDD rapid prototyping tool. Following the paradigm of visual programming eases the design of processing schemes. While the graphical user interface supports interactive design, the underlying XML representation enables automated applications after the prototyping phase. After a discussion of the key concepts of Yale, we illustrate the advantages of rapid prototyping for KDD on case studies ranging from data pre-processing to result visualization. These case studies cover tasks like feature engineering, text mining, data stream mining, tracking drifting concepts, ensemble methods, and distributed data mining. This variety of applications is also reflected in a broad user base: we counted more than 40,000 downloads during the last twelve months.
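The idea of designing an operator chain interactively and serializing it to XML for automated re-runs can be sketched in a few lines. This is an illustrative toy, not Yale's actual API: the `Operator` class, the table-of-dicts data model, and the XML element names are all assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical mini operator chain in the spirit of visual KDD prototyping:
# each operator transforms a table (a list of dicts) and carries its parameters.
class Operator:
    def __init__(self, name, fn, **params):
        self.name, self.fn, self.params = name, fn, params

    def apply(self, table):
        return self.fn(table, **self.params)

def run(chain, table):
    """Execute the operators in sequence, feeding each one's output to the next."""
    for op in chain:
        table = op.apply(table)
    return table

def chain_to_xml(chain):
    """Serialize the chain so a designed process can be re-run without the GUI."""
    root = ET.Element("process")
    for op in chain:
        ET.SubElement(root, "operator", name=op.name,
                      **{k: str(v) for k, v in op.params.items()})
    return ET.tostring(root, encoding="unicode")
```

A chain like `[filter, square]` can then be executed directly or exported as `<process><operator name="filter" .../>...</process>` for batch use, mirroring the GUI/XML split the abstract describes.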


Machine Learning | 2005

Automatic Feature Extraction for Classifying Audio Data

Ingo Mierswa; Katharina Morik

Today, many private households as well as broadcasting and film companies own large collections of digital music. These are time series that differ from, e.g., weather reports or stock market data. The task is normally that of classification, not prediction of the next value or recognition of a shape or motif. New methods for extracting features that allow classifying audio data have been developed. However, the development of appropriate feature extraction methods is a tedious effort, particularly because every new classification task requires tailoring the feature set anew. This paper presents a unifying framework for feature extraction from value series. Operators of this framework can be combined into feature extraction methods automatically, using a genetic programming approach. The construction of features is guided by the performance of the learning classifier which uses the features. Our approach to automatic feature extraction requires a balance between the completeness of the methods on one side and the tractability of searching for appropriate methods on the other side. In this paper, some theoretical considerations illustrate this trade-off. After the feature extraction, a second process learns a classifier from the transformed data. The practical use of the methods is shown by two types of experiments: classification of genres and classification according to user preferences.
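The core loop of the abstract, combining series operators into feature extraction methods and scoring each combination by the classifier that consumes the features, can be sketched as follows. The operator set, the threshold classifier, and the random search stand in for the paper's genetic programming and learners; they are illustrative assumptions, not the published method.

```python
import random
import statistics

# Hypothetical building blocks: each operator maps a value series to a value series.
OPERATORS = {
    "identity": lambda s: s,
    "diff": lambda s: [b - a for a, b in zip(s, s[1:])] or [0.0],
    "abs": lambda s: [abs(x) for x in s],
}

def extract_feature(series, chain):
    """Apply a chain of operators, then reduce the series to a single number."""
    for name in chain:
        series = OPERATORS[name](series)
    return statistics.mean(series)

def evaluate(chain, labelled_series):
    """Fitness = accuracy of a trivial threshold classifier on the extracted feature."""
    feats = [(extract_feature(s, chain), y) for s, y in labelled_series]
    thr = statistics.mean(f for f, _ in feats)
    return sum((f > thr) == y for f, y in feats) / len(feats)

def search(labelled_series, generations=30, seed=0):
    """Random search over operator chains, guided by classifier performance."""
    rng = random.Random(seed)
    best, best_fit = ["identity"], evaluate(["identity"], labelled_series)
    for _ in range(generations):
        cand = [rng.choice(list(OPERATORS)) for _ in range(rng.randint(1, 3))]
        fit = evaluate(cand, labelled_series)
        if fit > best_fit:
            best, best_fit = cand, fit
    return best, best_fit
```

On toy data where oscillating and slowly rising series share the same mean, the chain `diff` then `abs` separates the classes perfectly while the raw mean does not, which is exactly the kind of constructed feature the search is meant to discover.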


Archive | 2003

A Hybrid Approach to Feature Selection and Generation Using an Evolutionary Algorithm

Simon Fischer; Ralf Klinkenberg; Ingo Mierswa; Oliver Ritthoff

Genetic algorithms have proved to work well on feature selection problems where the search space produced by the initial feature set already contains the hypothesis to be learned. In cases where this premise is not fulfilled, one needs to find or generate new features to adequately extend the search space. As a solution to this representation problem we introduce a framework that combines feature selection and generation in a wrapper-based approach, using a modified genetic algorithm for the feature transformation and an inductive learner for the evaluation of the constructed feature set. The basic idea of this concept is to combine the positive search properties of conventional genetic algorithms with an incremental adaptation of the search space. To evaluate this hybrid feature selection and generation approach we compare it to several feature selection wrappers on both artificial and real-world data.
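The wrapper idea, evaluating each candidate feature subset by the accuracy of an inductive learner trained on it, can be sketched as below. The bitmask encoding, the nearest-centroid learner, and the truncation-plus-mutation loop are simplified placeholders for the paper's modified genetic algorithm.

```python
import random

def wrapper_fitness(mask, X, y):
    """Accuracy of a nearest-centroid learner using only the selected columns."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    def project(row):
        return [row[j] for j in cols]
    cents = {}
    for label in set(y):
        rows = [project(x) for x, t in zip(X, y) if t == label]
        cents[label] = [sum(c) / len(rows) for c in zip(*rows)]
    def predict(row):
        p = project(row)
        return min(cents, key=lambda l: sum((a - b) ** 2 for a, b in zip(p, cents[l])))
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)

def ga_select(X, y, pop=8, gens=20, seed=1):
    """Tiny GA: keep the better half of the population, mutate it to refill."""
    rng = random.Random(seed)
    n = len(X[0])
    population = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop)]
    for _ in range(gens):
        scored = sorted(population, key=lambda m: wrapper_fitness(m, X, y), reverse=True)
        parents = scored[: pop // 2]
        children = [[bit if rng.random() > 0.2 else not bit for bit in p] for p in parents]
        population = parents + children
    return max(population, key=lambda m: wrapper_fitness(m, X, y))
```

Because the better half always survives, the best fitness in the population never decreases; the paper's feature *generation* step, which would add newly constructed columns to the mask, is omitted here.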


Genetic and Evolutionary Computation Conference | 2006

Evolutionary learning with kernels: a generic solution for large margin problems

Ingo Mierswa

In this paper we embed evolutionary computation into statistical learning theory. First, we outline the connection between large margin optimization and statistical learning and see why this paradigm is successful for many pattern recognition problems. We then embed evolutionary computation into the most prominent representative of this class of learning methods, namely into Support Vector Machines (SVM). In contrast to former applications of evolutionary algorithms to SVMs we do not only optimize the method or kernel parameters. We rather use both evolution strategies and particle swarm optimization in order to directly solve the posed constrained optimization problem. Transforming the problem into the Wolfe dual reduces the total runtime and allows the usage of kernel functions. Exploiting the knowledge about this optimization problem leads to a hybrid mutation which further decreases convergence time while classification accuracy is preserved. We will show that evolutionary SVMs are at least as accurate as their quadratic programming counterparts on six real-world benchmark data sets. The evolutionary SVM variants frequently outperform their quadratic programming competitors. Additionally, the proposed algorithm is more generic than existing traditional solutions since it will also work for non-positive semidefinite kernel functions and for several, possibly competing, performance criteria.
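Directly attacking the dual optimization problem with an evolutionary method can be sketched with a (1+1) evolution strategy. This simplified version drops the bias term (so the equality constraint of the Wolfe dual disappears and only the box constraint remains) and uses plain Gaussian mutation rather than the paper's hybrid mutation or particle swarm variants.

```python
import random

def dual_objective(alpha, X, y, kernel):
    """SVM dual: W(a) = sum_i a_i - 1/2 sum_ij a_i a_j y_i y_j K(x_i, x_j)."""
    n = len(X)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * kernel(X[i], X[j])
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad

def es_optimize(X, y, kernel, C=1.0, steps=200, sigma=0.1, seed=0):
    """(1+1)-ES on the box-constrained dual of a bias-free SVM."""
    rng = random.Random(seed)
    alpha = [0.0] * len(X)
    best = dual_objective(alpha, X, y, kernel)
    for _ in range(steps):
        # Mutate, then clip back into the feasible box 0 <= a_i <= C.
        cand = [min(C, max(0.0, a + rng.gauss(0, sigma))) for a in alpha]
        f = dual_objective(cand, X, y, kernel)
        if f > best:  # keep the offspring only if it improves the dual value
            alpha, best = cand, f
    return alpha, best

def dot(a, b):
    return sum(x * z for x, z in zip(a, b))
```

Note that only kernel *evaluations* are needed, which is why the approach also runs unchanged with non-positive semidefinite kernels, where quadratic programming solvers lose their guarantees.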


Knowledge Discovery and Data Mining | 2006

Understandable models of music collections based on exhaustive feature generation with temporal statistics

Fabian Moerchen; Ingo Mierswa; Alfred Ultsch

Data mining in large collections of polyphonic music has recently received increasing interest from companies, along with the advent of commercial online distribution of music. Important applications include the categorization of songs into genres and the recommendation of songs according to musical similarity and the customers' musical preferences. Modeling genre or timbre of polyphonic music is at the core of these tasks and has been recognized as a difficult problem. Many audio features have been proposed, but they do not provide easily understandable descriptions of music. They do not explain why a genre was chosen or in which way one song is similar to another. We present an approach that combines large-scale feature generation with meta-learning techniques to obtain meaningful features for musical similarity. We perform exhaustive feature generation based on temporal statistics and train regression models to summarize a subset of these features into a single descriptor of a particular notion of music. Using several such models we produce a concise semantic description of each song. Genre classification models based on these semantic features are shown to be better understandable and almost as accurate as traditional methods.
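Exhaustive feature generation with temporal statistics typically means crossing per-window statistics with temporal aggregates over the sequence of windows. The sketch below shows that cross-product pattern on a plain value series; the particular statistics, window size, and feature names are illustrative choices, not the paper's feature set.

```python
import statistics

def windows(series, size, step):
    """Overlapping fixed-size windows over the series."""
    return [series[i:i + size] for i in range(0, len(series) - size + 1, step)]

# Statistics computed inside each window ...
WINDOW_STATS = {"mean": statistics.mean, "std": statistics.pstdev, "max": max}
# ... and aggregates computed over the sequence of per-window values.
AGGREGATES = {"mean": statistics.mean, "std": statistics.pstdev}

def temporal_features(series, size=4, step=2):
    """Cross-product of window statistics and temporal aggregates -> named features."""
    win = windows(series, size, step)
    feats = {}
    for sname, sfun in WINDOW_STATS.items():
        per_window = [sfun(w) for w in win]
        for aname, afun in AGGREGATES.items():
            feats[f"{aname}_of_{sname}"] = afun(per_window)
    return feats
```

Even this toy version yields 3 × 2 = 6 named features per series; with realistic operator sets the cross-product grows large, which is why the paper follows generation with regression models that compress feature subsets into single semantic descriptors.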


Genetic and Evolutionary Computation Conference | 2006

Information preserving multi-objective feature selection for unsupervised learning

Ingo Mierswa; Michael Wurst

In this work we propose a novel, sound framework for evolutionary feature selection in unsupervised machine learning problems. We show that unsupervised feature selection is inherently multi-objective and behaves differently from supervised feature selection in that the number of features must be maximized instead of being minimized. Although this might sound surprising from a supervised learning point of view, we exemplify this relationship on the problem of data clustering and show that existing approaches do not pose the optimization problem in an appropriate way. Another important consequence of this paradigm change is a method which segments the Pareto sets produced by our approach. Inspecting only prototypical points from these segments drastically reduces the amount of work for selecting a final solution. We compare our methods against existing approaches on eight data sets.
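The multi-objective view means candidate feature sets are compared by Pareto dominance rather than a single score. A minimal sketch of that comparison, assuming two objectives to maximize (for instance, the number of retained features and a cluster validity score, per the abstract); the segmentation of the resulting Pareto set into prototypical points is omitted here.

```python
def dominates(a, b):
    """a dominates b if it is >= in every objective and > in at least one (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    """Keep the non-dominated points; assumes the points are distinct tuples."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]
```

The front returned here is what a user would then inspect, or, as the paper proposes, segment first so that only a few prototypical trade-offs need to be examined.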


Genetic and Evolutionary Computation Conference | 2007

Controlling overfitting with multi-objective support vector machines

Ingo Mierswa

Recently, evolutionary computation has been successfully integrated into statistical learning methods. A Support Vector Machine (SVM) using evolution strategies for its optimization problem frequently delivers better results with respect to both the optimization criterion and the prediction accuracy. Moreover, evolutionary computation allows for the efficient large margin optimization of a huge family of new kernel functions, namely non-positive semidefinite kernels such as the Epanechnikov kernel. For these kernel functions, evolutionary SVMs even outperform other learning methods like the Relevance Vector Machine. In this paper, we will discuss another major advantage of evolutionary SVMs compared to traditional SVM solutions: we can explicitly optimize the inherent trade-off between training error and model complexity by embedding multi-objective optimization into the evolutionary SVM. This leads to three advantages: first, it is no longer necessary to tune the SVM parameter C which weights the two conflicting criteria; this is a very time-consuming task for traditional SVMs. Second, the shape and size of the Pareto front give interesting insights into the complexity of the learning task at hand. Finally, the user can actually see the point where overfitting occurs and can easily select a solution from the Pareto front best suiting his or her needs.


GfKl | 2005

Automatic Feature Extraction from Large Time Series

Ingo Mierswa

The classification of high dimensional data like time series requires the efficient extraction of meaningful features. The systematization of statistical methods allows automatic approaches to combine these methods and construct a method tree which delivers suitable features. It can be shown that the combination of efficient methods also works efficiently, which is especially necessary for the feature extraction from large value series. The transformation from raw series data to feature vectors is illustrated by different classification tasks in the domain of audio data.


European Conference on Machine Learning | 2006

Localized alternative cluster ensembles for collaborative structuring

Michael Wurst; Katharina Morik; Ingo Mierswa

Personal media collections are structured in very different ways by different users. Their support by standard clustering algorithms is not sufficient. First, users have personal preferences which they can hardly express by a formal objective function. Instead, they might want to select among a set of proposed clusterings. Second, users most often do not want hand-made partial structures to be overwritten by an automatic clustering. Third, given clusterings of others should not be ignored but used to enhance one's own structure. In contrast to other cluster ensemble methods or distributed clustering, a global model (consensus) is not the aim. Hence, we investigate a new learning task, namely learning localized alternative cluster ensembles, where a set of given clusterings is taken into account and a set of proposed clusterings is delivered. This paper proposes an algorithm for solving the new task, together with a method for evaluation.


Cancer Letters | 2009

Reanalysis of neuroblastoma expression profiling data using improved methodology and extended follow-up increases validity of outcome prediction

Alexander Schramm; Ingo Mierswa; Lars Kaderali; Katharina Morik; Angelika Eggert; Johannes H. Schulte

Neuroblastoma is the most common extracranial childhood tumor, comprising 15% of all childhood cancer deaths. In an initial study, we used Affymetrix oligonucleotide microarrays to analyse gene expression in 68 primary neuroblastomas and compared different data mining approaches for prediction of early relapse. Here, we performed re-analyses of the data including prolonged follow-up and applied support vector machine (SVM) algorithms and outer cross-validation strategies to improve reliability of expression profiling based predictors. Accuracy of outcome prediction was significantly improved by the use of innovative SVM algorithms on the updated data. In addition, CASPAR, a hierarchical Bayesian approach, was used to predict survival times for the individual patient based on expression profiling data. CASPAR reliably predicted event-free survival, given a cut-off time of three years. Differential expression of genes used by CASPAR to predict patient outcome was validated in an independent cohort of 117 neuroblastomas. In conclusion, we show here for the first time that reanalysis of microarray data using improved methodology, state-of-the-art performance tests and updated follow-up data improves prognosis prediction, and may further improve risk stratification of individual patients.
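The outer cross-validation strategy mentioned above, where every performance estimate comes from samples the fitted model has never seen, can be sketched generically. The fold construction and the learner interface (`train_fn` returns a predict function) are illustrative assumptions, not the paper's pipeline.

```python
def folds(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    size, rest = divmod(n, k)
    out, start = [], 0
    for i in range(k):
        stop = start + size + (1 if i < rest else 0)
        out.append(list(range(start, stop)))
        start = stop
    return out

def outer_cv(X, y, train_fn, k=3):
    """Outer cross-validation: fit (including any tuning inside train_fn) on the
    training folds only, then score on the held-out fold; average over folds."""
    accs = []
    for test_idx in folds(len(X), k):
        train_idx = [i for i in range(len(X)) if i not in test_idx]
        model = train_fn([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(model(X[i]) == y[i] for i in test_idx)
        accs.append(correct / len(test_idx))
    return sum(accs) / len(accs)
```

The key point, and the reason the reanalysis improves reliability, is that any model selection must happen inside `train_fn`; tuning on the full data and then cross-validating only the final model leaks information and inflates the accuracy estimate.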

Collaboration


Dive into Ingo Mierswa's collaborations.

Top Co-Authors

Katharina Morik (Technical University of Dortmund)
Ralf Klinkenberg (Technical University of Dortmund)
Christian Bockermann (Technical University of Dortmund)
Lars Kaderali (Dresden University of Technology)
Martin Scholz (Technical University of Dortmund)