Stefan Kramer | Researchain

Publication

Featured research published by Stefan Kramer.

Relational Data Mining | 2001

Propositionalization approaches to relational data mining

Stefan Kramer; Nada Lavrač; Peter A. Flach

This chapter surveys methods that transform a relational representation of a learning problem into a propositional (feature-based, attribute-value) representation. This kind of representation change is known as propositionalization. Taking such an approach, feature construction can be decoupled from model construction. It has been shown that in many relational data mining applications this can be done without loss of predictive performance. After reviewing both general-purpose and domain-dependent propositionalization approaches from the literature, an extension to the LINUS propositionalization method that overcomes the system's earlier inability to deal with non-determinate local variables is described.

knowledge discovery and data mining | 2001

Molecular feature mining in HIV data

Stefan Kramer; Luc De Raedt; Christoph Helma

We present the application of Feature Mining techniques to the Developmental Therapeutics Program's AIDS antiviral screen database. The database consists of 43576 compounds, which were measured for their capability to protect human cells from HIV-1 infection. According to these measurements, the compounds were classified as either active, moderately active or inactive. The distribution of classes is extremely skewed: only 1.3% of the molecules are known to be active, and 2.7% are known to be moderately active. Given this database, we were interested in molecular substructures (i.e., features) that are frequent in the active molecules and infrequent in the inactives. In data mining terms, we focused on features with a minimum support in active compounds and a maximum support in inactive compounds. We analyzed the database using the levelwise version space algorithm that forms the basis of the inductive query and database system MOLFEA (Molecular Feature Miner). Within this framework, it is possible to declaratively specify the features of interest, such as the frequency of features on (possibly different) datasets as well as their generality and syntax. Assuming that the detected substructures are causally related to biochemical mechanisms, it should be possible to facilitate the development of new pharmaceuticals with improved activities.
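
The two support constraints can be illustrated with a minimal sketch, assuming each molecule has already been reduced to a set of substructure strings. The fragment names, thresholds and function names below are made up for illustration. The sketch only filters a fixed candidate set; it does not implement the levelwise version space algorithm or MOLFEA's query language.

```python
from typing import FrozenSet, List

def support(feature: str, molecules: List[FrozenSet[str]]) -> float:
    """Fraction of molecules that contain the given feature."""
    if not molecules:
        return 0.0
    return sum(feature in m for m in molecules) / len(molecules)

def select_features(
    actives: List[FrozenSet[str]],
    inactives: List[FrozenSet[str]],
    min_active: float,
    max_inactive: float,
) -> List[str]:
    """Keep features that are frequent in actives and infrequent in inactives."""
    # Candidates are all features that occur in at least one active molecule.
    candidates = set().union(*actives) if actives else set()
    return sorted(
        f for f in candidates
        if support(f, actives) >= min_active
        and support(f, inactives) <= max_inactive
    )

# Hypothetical toy data: each molecule is a set of fragment strings.
actives = [
    frozenset({"N=N", "C-O"}),
    frozenset({"N=N", "C-S"}),
    frozenset({"N=N"}),
]
inactives = [
    frozenset({"C-O"}),
    frozenset({"C-O", "C-S"}),
    frozenset({"N=N", "C-O"}),
    frozenset({"C-O"}),
]

selected = select_features(actives, inactives, min_active=0.6, max_inactive=0.3)
print(selected)
```

In this toy data only the fragment "N=N" meets both thresholds: it occurs in all actives and in one of four inactives.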

Bioinformatics | 2001

The Predictive Toxicology Challenge 2000–2001

Christoph Helma; Ross D. King; Stefan Kramer; Ashwin Srinivasan

We initiated the Predictive Toxicology Challenge (PTC) to stimulate the development of advanced SAR techniques for predictive toxicology models. The goal of this challenge is to predict the rodent carcinogenicity of new compounds based on the experimental results of the US National Toxicology Program (NTP). Submissions will be evaluated on quantitative and qualitative scales to select the most predictive models and those with the highest toxicological relevance. Availability: http://www.informatik.uni-freiburg.de/∼ml/ptc/ Contact: [email protected].

international symposium on methodologies for intelligent systems | 2001

Prediction of Ordinal Classes Using Regression Trees

Stefan Kramer; Gerhard Widmer; Bernhard Pfahringer; Michael de Groeve

This paper is devoted to the problem of learning to predict ordinal (i.e., ordered discrete) classes using classification and regression trees. We start with S-CART, a tree induction algorithm, and study various ways of transforming it into a learner for ordinal classification tasks. These algorithm variants are compared on a number of benchmark data sets to verify the relative strengths and weaknesses of the strategies and to study the trade-off between optimal categorical classification accuracy (hit rate) and minimum distance-based error. Preliminary results indicate that this is a promising avenue towards algorithms that combine aspects of classification and regression.
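
As a rough illustration of the trade-off between hit rate and distance-based error, the sketch below maps ordered classes to integer indices, rounds a numeric prediction to the nearest valid class, and reports both measures. The rounding step and the toy predictions are assumptions for illustration only; the abstract does not specify the S-CART variants that were studied.

```python
import math
from typing import Sequence, Tuple

def to_class_index(raw: float, n_classes: int) -> int:
    """Round a numeric prediction to the nearest valid ordinal class index."""
    nearest = math.floor(raw + 0.5)  # round half up, avoids banker's rounding
    return min(max(nearest, 0), n_classes - 1)

def evaluate(
    raw_preds: Sequence[float],
    true_idx: Sequence[int],
    n_classes: int,
) -> Tuple[float, float]:
    """Return (hit rate, mean absolute distance between class indices)."""
    preds = [to_class_index(r, n_classes) for r in raw_preds]
    n = len(true_idx)
    hit_rate = sum(p == t for p, t in zip(preds, true_idx)) / n
    mean_distance = sum(abs(p - t) for p, t in zip(preds, true_idx)) / n
    return hit_rate, mean_distance

# Hypothetical regression outputs for four examples with classes 0..3.
raw_predictions = [0.2, 1.6, 2.9, 1.4]
true_classes = [0, 2, 2, 0]

hit_rate, mean_distance = evaluate(raw_predictions, true_classes, n_classes=4)
print(hit_rate, mean_distance)
```

Two of the four rounded predictions hit the true class, and the two misses are each off by one class, so both measures come out at 0.5 here.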

european conference on computational biology | 2005

Analyzing microarray data using quantitative association rules

Elisabeth Georgii; Lothar Richter; Ulrich Rückert; Stefan Kramer

MOTIVATION We tackle the problem of finding regularities in microarray data. Various data mining tools, such as clustering, classification, Bayesian networks and association rules, have been applied so far to gain insight into gene-expression data. Association rule mining techniques used so far work on discretizations of the data and cannot account for cumulative effects. In this paper, we investigate the use of quantitative association rules that can operate directly on numeric data and represent cumulative effects of variables. Technically speaking, this type of quantitative association rules based on half-spaces can find non-axis-parallel regularities. RESULTS We performed a variety of experiments testing the utility of quantitative association rules for microarray data. First of all, the results should be statistically significant and robust against fluctuations in the data. Next, the approach should be scalable in the number of variables, which is important for such high-dimensional data. Finally, the rules should make sense biologically and be sufficiently different from rules found in regular association rule mining working with discretizations. In all of these dimensions, the proposed approach performed satisfactorily. Therefore, quantitative association rules based on half-spaces should be considered as a tool for the analysis of microarray gene-expression data. AVAILABILITY The code is available from the authors on request.

Bioinformatics | 2010

Pitfalls of supervised feature selection

Pawel Smialowski; Dmitrij Frishman; Stefan Kramer

Pawel Smialowski (1, 2, *), Dmitrij Frishman (1, 2) and Stefan Kramer (3). (1) Department of Genome Oriented Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, Am Forum 1, 85350 Freising; (2) Helmholtz Zentrum Munich, National Research Center for Environment and Health, Institute for Bioinformatics, Ingolstädter Landstraße 1, 85764 Neuherberg; (3) Institut für Informatik/I12, Technische Universität München, Boltzmannstr. 3, 85748 Garching b. München, Germany.

Journal of Cheminformatics | 2010

Collaborative development of predictive toxicology applications

Barry Hardy; Nicki Douglas; Christoph Helma; Micha Rautenberg; Nina Jeliazkova; Vedrin Jeliazkov; Ivelina Nikolova; Romualdo Benigni; Olga Tcheremenskaia; Stefan Kramer; Tobias Girschick; Fabian Buchwald; Jörg Wicker; Andreas Karwath; Martin Gütlein; Andreas Maunz; Haralambos Sarimveis; Georgia Melagraki; Antreas Afantitis; Pantelis Sopasakis; David Gallagher; Vladimir Poroikov; Dmitry Filimonov; Alexey V. Zakharov; Alexey Lagunin; Tatyana A. Gloriozova; Sergey V. Novikov; Natalia Skvortsova; Dmitry Druzhilovsky; Sunil Chawla

OpenTox provides an interoperable, standards-based Framework for the support of predictive toxicology data management, algorithms, modelling, validation and reporting. It is relevant to satisfying the chemical safety assessment requirements of the REACH legislation as it supports access to experimental data, (Quantitative) Structure-Activity Relationship models, and toxicological information through an integrating platform that adheres to regulatory requirements and OECD validation principles. Initial research defined the essential components of the Framework including the approach to data access, schema and management, use of controlled vocabularies and ontologies, architecture, web service and communications protocols, and selection and integration of algorithms for predictive modelling. OpenTox provides end-user oriented tools to non-computational specialists, risk assessors, and toxicological experts in addition to Application Programming Interfaces (APIs) for developers of new applications. OpenTox actively supports public standards for data representation, interfaces, vocabularies and ontologies, Open Source approaches to core platform components, and community-based collaboration approaches, so as to progress system interoperability goals. The OpenTox Framework includes APIs and services for compounds, datasets, features, algorithms, models, ontologies, tasks, validation, and reporting which may be combined into multiple applications satisfying a variety of different user needs. OpenTox applications are based on a set of distributed, interoperable OpenTox API-compliant REST web services.
The OpenTox approach to ontology allows for efficient mapping of complementary data coming from different datasets into a unifying structure having a shared terminology and representation. Two initial OpenTox applications are presented as an illustration of the potential impact of OpenTox for high-quality and consistent structure-activity relationship modelling of REACH-relevant endpoints: ToxPredict which predicts and reports on toxicities for endpoints for an input chemical structure, and ToxCreate which builds and validates a predictive toxicity model based on an input toxicology dataset. Because of the extensible nature of the standardised Framework design, barriers of interoperability between applications and content are removed, as the user may combine data, models and validation from multiple sources in a dependable and time-effective way.

international conference on machine learning | 2004

Ensembles of nested dichotomies for multi-class problems

Eibe Frank; Stefan Kramer

Nested dichotomies are a standard statistical technique for tackling certain polytomous classification problems with logistic regression. They can be represented as binary trees that recursively split a multi-class classification task into a system of dichotomies and provide a statistically sound way of applying two-class learning algorithms to multi-class problems (assuming these algorithms generate class probability estimates). However, there are usually many candidate trees for a given problem and in the standard approach the choice of a particular tree is based on domain knowledge that may not be available in practice. An alternative is to treat every system of nested dichotomies as equally likely and to form an ensemble classifier based on this assumption. We show that this approach produces more accurate classifications than applying C4.5 and logistic regression directly to multi-class problems. Our results also show that ensembles of nested dichotomies produce more accurate classifiers than pairwise classification if both techniques are used with C4.5, and comparable results for logistic regression. Compared to error-correcting output codes, they are preferable if logistic regression is used, and comparable in the case of C4.5. An additional benefit is that they generate class probability estimates. Consequently they appear to be a good general-purpose method for applying binary classifiers to multi-class problems.
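
The way a system of nested dichotomies turns two-class probability estimates into multi-class probabilities can be sketched as follows. This is a minimal illustration under stated assumptions: the class names, the `oracle_p_left` stand-in for a trained two-class model, and the shuffle-and-cut tree generator are all hypothetical, and the generator is not claimed to sample every tree with equal probability. A real estimator would also depend on the instance being classified, which is omitted here.

```python
import random
from typing import Callable, Dict, FrozenSet, Optional, Sequence

class Node:
    """A node of a nested dichotomy: a class set split into two subsets."""
    def __init__(self, classes: Sequence[str],
                 left: Optional["Node"] = None,
                 right: Optional["Node"] = None) -> None:
        self.classes: FrozenSet[str] = frozenset(classes)
        self.left = left
        self.right = right

def random_dichotomy_tree(classes: Sequence[str], rng: random.Random) -> Node:
    """Recursively split the class set at a random cut (illustrative only)."""
    classes = list(classes)
    if len(classes) == 1:
        return Node(classes)
    rng.shuffle(classes)
    cut = rng.randint(1, len(classes) - 1)  # both sides stay non-empty
    return Node(classes,
                random_dichotomy_tree(classes[:cut], rng),
                random_dichotomy_tree(classes[cut:], rng))

# Signature of a two-class estimator: P(class in left | class in left or right).
PLeft = Callable[[FrozenSet[str], FrozenSet[str]], float]

def class_probs(node: Node, p_left: PLeft) -> Dict[str, float]:
    """Multiply branch probabilities along the path to each leaf class."""
    if node.left is None or node.right is None:
        (only_class,) = node.classes
        return {only_class: 1.0}
    pl = p_left(node.left.classes, node.right.classes)
    probs = {c: pl * p for c, p in class_probs(node.left, p_left).items()}
    probs.update({c: (1.0 - pl) * p
                  for c, p in class_probs(node.right, p_left).items()})
    return probs

def ensemble_probs(classes: Sequence[str], p_left: PLeft,
                   n_trees: int = 10, seed: int = 0) -> Dict[str, float]:
    """Average class probabilities over several random nested dichotomies."""
    rng = random.Random(seed)
    totals = {c: 0.0 for c in classes}
    for _ in range(n_trees):
        tree = random_dichotomy_tree(classes, rng)
        for c, p in class_probs(tree, p_left).items():
            totals[c] += p
    return {c: t / n_trees for c, t in totals.items()}

# Hypothetical stand-in for trained two-class models: it answers each
# dichotomy exactly from a known class distribution.
TRUE_DIST = {"a": 0.5, "b": 0.3, "c": 0.2}

def oracle_p_left(left: FrozenSet[str], right: FrozenSet[str]) -> float:
    l = sum(TRUE_DIST[c] for c in left)
    r = sum(TRUE_DIST[c] for c in right)
    return l / (l + r)

probs = ensemble_probs(list(TRUE_DIST), oracle_p_left)
print(probs)
```

With exact two-class estimates, every tree reproduces the underlying class distribution, because the conditional probabilities along each path multiply out to the class probability. With learned, imperfect estimators the trees disagree, which is where averaging over an ensemble becomes useful.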

Bioinformatics | 2003

Statistical Evaluation of the Predictive Toxicology Challenge 2000-2001

Hannu Toivonen; Ashwin Srinivasan; Ross D. King; Stefan Kramer; Christoph Helma

MOTIVATION The development of in silico models to predict chemical carcinogenesis from molecular structure would help greatly to prevent environmentally caused cancers. The Predictive Toxicology Challenge (PTC) competition was organized to test the state-of-the-art in applying machine learning to form such predictive models. RESULTS Fourteen machine learning groups generated 111 models. The use of Receiver Operating Characteristic (ROC) space allowed the models to be uniformly compared regardless of the error cost function. We developed a statistical method to test if a model performs significantly better than random in ROC space. Using this test as the criterion, five models performed better than random guessing at a significance level p of 0.05 (not corrected for multiple testing). Statistically the best predictor was the Viniti model for female mice, with p value below 0.002. The toxicologically most interesting models were Leuven2 for male mice, and Kwansei for female rats. These models performed well in the statistical analysis and they are in the middle of ROC space, i.e. distant from extreme cost assumptions. These predictive models were also independently judged by domain experts to be among the three most interesting, and are believed to include a small but significant amount of empirically learned toxicological knowledge. AVAILABILITY PTC details and data can be found at: http://www.predictive-toxicology.org/ptc/.

Bioinformatics | 2006

A new representation for protein secondary structure prediction based on frequent patterns

Fabian Birzele; Stefan Kramer

MOTIVATION A new representation for protein secondary structure prediction based on frequent amino acid patterns is described and evaluated. We discuss in detail how to identify frequent patterns in a protein sequence database using a level-wise search technique, how to define a set of features from those patterns and how to use those features in the prediction of the secondary structure of a protein sequence using support vector machines (SVMs). RESULTS Three different sets of features based on frequent patterns are evaluated in a blind testing setup using 150 targets from the EVA contest and compared to predictions of PSI-PRED, PHD and PROFsec. Despite being trained on only 940 proteins, a simple SVM classifier based on this new representation yields results comparable to PSI-PRED and PROFsec. Finally, we show that the method contributes significant information to consensus predictions. AVAILABILITY The method is available from the authors upon request.
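
A level-wise search for frequent patterns in a sequence database can be sketched as below. The sketch assumes patterns are contiguous substrings and that support is the number of sequences containing the pattern; the pattern language actually used in the paper may differ, and the toy sequences and function name are made up for illustration.

```python
from typing import Dict, List

def frequent_patterns(seqs: List[str], min_support: int, max_len: int) -> Dict[str, int]:
    """Level-wise search: length-k candidates are built from frequent (k-1)-patterns."""
    alphabet = sorted(set("".join(seqs)))

    def support(pattern: str) -> int:
        # Number of sequences that contain the pattern as a substring.
        return sum(pattern in s for s in seqs)

    frequent: Dict[str, int] = {}
    level = [a for a in alphabet if support(a) >= min_support]
    k = 1
    while level and k <= max_len:
        for p in level:
            frequent[p] = support(p)
        level_set = set(level)
        # Extend each frequent pattern by one symbol, then prune candidates
        # whose suffix is not frequent (any frequent pattern has frequent parts).
        candidates = {p + a for p in level for a in alphabet}
        level = sorted(
            c for c in candidates
            if c[1:] in level_set and support(c) >= min_support
        )
        k += 1
    return frequent

# Hypothetical toy "protein" sequences over a three-letter alphabet.
sequences = ["ACDA", "ACD", "CDA"]
patterns = frequent_patterns(sequences, min_support=2, max_len=3)
print(patterns)
```

Each frequent pattern could then serve as a binary feature (present or absent in a sequence window) for a downstream classifier such as an SVM.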
