
Stefan Rüping

Technical University of Dortmund

Publication

Featured research published by Stefan Rüping.

International Conference on Data Mining | 2001

Incremental learning with support vector machines

Stefan Rüping

Support vector machines (SVMs) have become a popular tool for machine learning with large amounts of high-dimensional data. This paper presents an approach for incremental learning with support vector machines that improves on the existing approach of Syed et al. (1999). An insight into the interpretability of support vectors is also given.

European Conference on Machine Learning | 2008

Tight Optimistic Estimates for Fast Subgroup Discovery

Henrik Grosskreutz; Stefan Rüping; Stefan Wrobel

Subgroup discovery is the task of finding subgroups of a population which exhibit both distributional unusualness and high generality. Due to the non-monotonicity of the corresponding evaluation functions, standard pruning techniques cannot be used for subgroup discovery, requiring the use of optimistic estimate techniques instead. So far, however, optimistic estimate pruning has only been considered for the extremely simple case of a binary target attribute, and no attempt has been made to move beyond suboptimal heuristic optimistic estimates. In this paper, we show that optimistic estimate pruning can be developed into a sound and highly effective pruning approach for subgroup discovery. Based on a precise definition of optimality, we show that previous estimates have been tight only in special cases. Thereafter, we present tight optimistic estimates for the most popular binary and multi-class quality functions, and present a family of increasingly efficient approximations to these optimal functions. As we show in empirical experiments, the use of our newly proposed optimistic estimates can lead to a speedup of an order of magnitude compared to previous approaches.
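To make the idea of an optimistic estimate concrete, the following minimal sketch uses weighted relative accuracy (WRAcc) with a binary target as the quality function. The function names and the toy counts are assumptions made for illustration, and the bound shown is the standard one obtained by keeping only the positive examples of a subgroup; it is not claimed to match the exact formulation in the paper. A search can skip all refinements of a subgroup whenever this bound falls below the best quality found so far.

```python
def wracc(n_sub, pos_sub, n_all, pos_all):
    """Weighted relative accuracy of a subgroup for a binary target."""
    if n_sub == 0:
        return 0.0
    p0 = pos_all / n_all  # share of positives in the whole population
    return (n_sub / n_all) * (pos_sub / n_sub - p0)


def wracc_optimistic(pos_sub, n_all, pos_all):
    """Upper bound on the WRAcc of any refinement (subset) of a subgroup.

    The bound is reached by the subset that keeps all positive examples
    of the subgroup and drops all negative ones.
    """
    p0 = pos_all / n_all
    return (pos_sub / n_all) * (1.0 - p0)


def best_refinement_by_enumeration(pos_sub, neg_sub, n_all, pos_all):
    """Brute-force check: best WRAcc over all (positives, negatives) subsets."""
    return max(
        wracc(k + j, k, n_all, pos_all)
        for k in range(pos_sub + 1)
        for j in range(neg_sub + 1)
    )


# Toy population (assumed): 10 examples, 4 positive.
# Subgroup under consideration: 3 positives and 2 negatives.
quality = wracc(5, 3, 10, 4)
bound = wracc_optimistic(3, 10, 4)
exhaustive = best_refinement_by_enumeration(3, 2, 10, 4)
```

In this toy case the bound coincides with the best quality found by enumeration, which is what "tight" means here: no smaller value would still be a valid upper bound over all subsets.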

Technical reports | 2001

SVM kernels for time series analysis

Stefan Rüping

Time series analysis is an important and complex problem in machine learning and statistics. Real-world applications can consist of very large and high dimensional time series data. Support Vector Machines (SVMs) are a popular tool for the analysis of such data sets. This paper presents some SVM kernel functions and discusses their relative merits, depending on the type of data that is used.

Data Mining and Knowledge Discovery | 2009

On subgroup discovery in numerical domains

Henrik Grosskreutz; Stefan Rüping

Subgroup discovery is a Knowledge Discovery task that aims at finding subgroups of a population with high generality and distributional unusualness. While several subgroup discovery algorithms have been presented in the past, they focus on databases with nominal attributes or make use of discretization to get rid of the numerical attributes. In this paper, we illustrate why the replacement of numerical attributes by nominal attributes can lead to suboptimal results. Thereafter, we present a new subgroup discovery algorithm that prunes large parts of the search space by exploiting bounds between related numerical subgroup descriptions. The same algorithm can also be applied to ordinal attributes. In an experimental section, we show that the use of our new pruning scheme results in a huge performance gain when more than just a few split-points are considered for the numerical attributes.

LWA | 2004

A Simple Method for Estimating Conditional Probabilities for SVMs

Stefan Rüping

Support Vector Machines (SVMs) have become a popular learning algorithm, in particular for large, high-dimensional classification problems. SVMs have been shown to give highly accurate classification results in a variety of applications. Several methods have been proposed to obtain not only a classification, but also an estimate of the SVM's confidence in the correctness of the predicted label. In this paper, several algorithms are compared which scale the SVM decision function to obtain an estimate of the conditional class probability. A new simple and fast method is derived from theoretical arguments and empirically compared to the existing approaches.
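As an illustration of what scaling a decision function to a probability can look like, the sketch below fits a sigmoid to decision values by gradient descent on the log loss, in the spirit of Platt-style scaling. This is a generic stand-in and not the new method derived in the paper; the function names, the learning rate, and the toy scores and labels are all assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_sigmoid(scores, labels, lr=0.1, steps=2000):
    """Fit P(y=1 | f) = sigmoid(a * f + b) by gradient descent on the log loss.

    scores: decision values f(x) of an already trained classifier
    labels: 0/1 class labels for the same examples
    """
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for f, y in zip(scores, labels):
            err = sigmoid(a * f + b) - y  # derivative of the log loss w.r.t. the logit
            grad_a += err * f
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Assumed toy decision values; the classes overlap so the fit stays finite.
scores = [-2.0, -1.5, -0.5, 0.3, -0.2, 0.4, 1.0, 1.8]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
a, b = fit_sigmoid(scores, labels)
p_far_positive = sigmoid(a * 2.0 + b)
p_far_negative = sigmoid(a * -2.0 + b)
```

In practice the sigmoid would be fitted on held-out decision values rather than on the training data, since decision values on training examples tend to be overconfident.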

IEEE International Conference on Information Technology and Applications in Biomedicine | 2009

Integrated web services platform for the facilitation of fraud detection in health care e-government services

A. Tagaris; G. Konnis; X. Benetou; T. Dimakopoulos; K. Kassis; N. Athanasiadis; Stefan Rüping; Henrik Grosskreutz; Dimitrios D. Koutsouris

Public healthcare is a basic service provided by governments to citizens which is increasingly coming under pressure as the European population ages and the ratio of working to elderly persons falls. A way to make public spending on healthcare more efficient is to ensure that the money is spent on legitimate causes. This paper presents the work of the iWebCare project, in which a flexible, online fraud detection Web services platform was designed and developed. It aims to help those in the healthcare business minimize the loss of funds to fraud. The platform is able to detect erroneous or suspicious records in submitted health care data sets, ensuring homogeneity and consistency and promoting awareness and harmonization of fraud detection practices across health care systems in the EU. Critical objectives included the development of an ontology of health care data associated with semantic rules, the implementation and initial population of an ontology and rules repository, the development of a fraud detection engine, and the implementation of a data mining module. The potential impact of this work can be substantial. More money for healthcare means better healthcare. Living conditions and the trust of citizens in public healthcare will be improved.

European Conference on Machine Learning | 2009

On Subgroup Discovery in Numerical Domains

Henrik Grosskreutz; Stefan Rüping

Technical reports | 2005

D-optimal plans in observational studies

Constanze Pumplün; Stefan Rüping; Katharina Morik; Claus Weihs

This paper investigates the use of Design of Experiments in observational studies in order to select informative observations and features for classification. D-optimal plans are searched for in existing data and based on these plans the variables most relevant for classification are determined. The adapted models are then compared with respect to their predictive accuracy on an independent test sample. Eight different data sets are investigated by this method.
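For readers unfamiliar with the criterion, a D-optimal plan maximizes the determinant of the information matrix X^T X over the chosen design points. The sketch below selects such a plan from existing observations by brute force for a model with an intercept and one feature. The search procedure, the function names, and the candidate rows are assumptions for illustration; the abstract does not describe how the plans were actually searched for.

```python
from itertools import combinations


def det_information_matrix(rows):
    """det(X^T X) for a design matrix with two columns (intercept, x)."""
    s11 = sum(a * a for a, _ in rows)
    s12 = sum(a * b for a, b in rows)
    s22 = sum(b * b for _, b in rows)
    return s11 * s22 - s12 * s12


def d_optimal_subset(candidates, k):
    """Indices of the k candidate rows that maximize det(X^T X).

    Exhaustive search, so only practical for small candidate sets.
    """
    return max(
        combinations(range(len(candidates)), k),
        key=lambda idx: det_information_matrix([candidates[i] for i in idx]),
    )


# Assumed observations: (intercept, feature value).
candidates = [(1.0, 0.0), (1.0, 0.1), (1.0, 1.0), (1.0, 0.5)]
chosen = d_optimal_subset(candidates, 2)
```

For two design points in this model the determinant equals the squared distance between the two feature values, so the selection favors the observations that lie furthest apart.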

Technical reports | 2005

Determination of hyper-parameters for kernel based classification and regression

Marcos Marin-Galiano; Karsten Luebke; Andreas Christmann; Stefan Rüping

The optimization of the hyper-parameters of a statistical procedure or machine learning task is a crucial step for obtaining a minimal error. Unfortunately, the optimization of hyper-parameters usually requires many runs of the procedure and hence is very costly. A more detailed knowledge of the dependency of the performance of a procedure on its hyper-parameters can help to speed up this process. In this paper, we investigate the case of kernel-based classifiers and regression estimators which belong to the class of convex risk minimization methods from machine learning. In an empirical investigation, the response surfaces of nonlinear support vector machines and kernel logistic regression are analyzed and the performance of several algorithms for determining hyper-parameters is investigated. The rest of the paper is organized as follows: Section 2 briefly outlines kernel based classification and regression methods. Section 3 gives details on several methods for optimizing the hyper-parameters of statistical procedures. Then, some numerical examples are presented in Section 4. Section 5 contains a discussion. Finally, all figures are given in the appendix.

Informatik Spektrum | 2010

Privacy Preserving Data Mining

Henrik Grosskreutz; Benedikt Lemmen; Stefan Rüping

Introduction. Data mining enables the automated search of data for patterns, models, or anomalies. This makes it possible, for example, to search medical data automatically for relationships between treatment methods, patient characteristics, and treatment outcomes. The results can yield valuable insights, but they can also raise data protection concerns. In the example above, it should be ensured that the analysis results do not allow conclusions to be drawn about the genetic traits, existing pregnancies, or pre-existing conditions of individual patients. Similar questions arise when several companies cooperate on data analysis, for instance to combat fraud, but want to be sure that their sensitive business data remain confidential. The systematic breakdown of the information disclosed by a data mining method, and the development of new data mining methods that guarantee the protection of sensitive information, are the subject of privacy-preserving data mining. This field thus provides answers to questions such as "Which data and patterns can be published without concern?" and "How can a given question be analyzed so that no sensitive information is disclosed?". These questions are highly topical, particularly in view of society's growing awareness of data protection issues, which is reflected in the large number of media reports on the subject. In the following, two main approaches to privacy-preserving data mining are presented: anonymization and secure distributed data mining. Anonymization attempts to suppress critical information as early as the point of data access, which makes the approach very general but also carries the risk of a significant loss of quality in the results.
Secure distributed data mining instead aims to avoid information leaks when data mining methods are executed on the complete data.
