
Publication


Featured research published by Steffen Bickel.


International Conference on Data Mining | 2004

Multi-view clustering

Steffen Bickel; Tobias Scheffer

We consider clustering problems in which the available attributes can be split into two independent subsets, such that either subset suffices for learning. Example applications of this multi-view setting include clustering of Web pages which have an intrinsic view (the pages themselves) and an extrinsic view (e.g., anchor texts of inbound hyperlinks); multi-view learning has so far been studied in the context of classification. We develop and study partitioning and agglomerative, hierarchical multi-view clustering algorithms for text data. We find empirically that the multi-view versions of k-means and EM greatly improve on their single-view counterparts. By contrast, we obtain negative results for agglomerative hierarchical multi-view clustering. Our analysis explains this surprising phenomenon.
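The alternating idea behind multi-view k-means can be sketched as follows: assignments computed in one view drive the centroid update and reassignment, and the algorithm switches views each iteration. This is a minimal illustrative sketch, not the paper's exact algorithm (which also covers multi-view EM and hierarchical variants):

```python
import numpy as np

def multi_view_kmeans(X1, X2, k, iters=20, seed=0):
    """Alternate k-means steps between two feature views of the same
    instances (a sketch of the alternating multi-view idea)."""
    rng = np.random.default_rng(seed)
    n = X1.shape[0]
    labels = rng.integers(0, k, size=n)   # random initial partition
    views = [X1, X2]
    for t in range(iters):
        X = views[t % 2]                  # switch view each iteration
        # M-step: centroids from the current assignments
        centroids = np.stack([
            X[labels == c].mean(axis=0) if np.any(labels == c)
            else X[rng.integers(0, n)]    # re-seed an empty cluster
            for c in range(k)
        ])
        # E-step: reassign points to the nearest centroid in this view
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
    return labels
```

Because both views describe the same instances, assignments learned in one view act as supervision for the other, which is the source of the improvement reported above.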


International Conference on Machine Learning | 2007

Discriminative learning for differing training and test distributions

Steffen Bickel; Michael Brückner; Tobias Scheffer

We address classification problems for which the training instances are governed by a distribution that is allowed to differ arbitrarily from the test distribution, a setting also referred to as classification under covariate shift. We derive a solution that is purely discriminative: neither training nor test distribution are modeled explicitly. We formulate the general problem of learning under covariate shift as an integrated optimization problem. We derive a kernel logistic regression classifier for differing training and test distributions.
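One standard discriminative route to covariate-shift correction is to train a probabilistic classifier that separates training from test inputs; its predicted odds estimate the density ratio used to reweight training examples. The sketch below shows this two-stage version for intuition only; the paper above integrates weight estimation and classifier training into one optimization problem:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def importance_weights(X_train, X_test):
    """Estimate w(x) = p_test(x) / p_train(x) via a classifier that
    discriminates training from test inputs (an illustrative sketch,
    not the paper's integrated formulation)."""
    X = np.vstack([X_train, X_test])
    s = np.r_[np.zeros(len(X_train)), np.ones(len(X_test))]  # 1 = test
    clf = LogisticRegression(max_iter=1000).fit(X, s)
    p = clf.predict_proba(X_train)[:, 1]       # P(test | x)
    prior = len(X_train) / len(X_test)         # correct for sample sizes
    return prior * p / (1.0 - p)               # density-ratio estimate
```

The resulting weights can be passed to any learner that accepts per-example weights (e.g. via `sample_weight` in scikit-learn).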


International Conference on Machine Learning | 2007

Unsupervised prediction of citation influences

Laura Dietz; Steffen Bickel; Tobias Scheffer

Publication repositories contain an abundance of information about the evolution of scientific research areas. We address the problem of creating a visualization of a research area that describes the flow of topics between papers, quantifies the impact that papers have on each other, and helps to identify key contributions. To this end, we devise a probabilistic topic model that explains the generation of documents; the model incorporates the aspects of topical innovation and topical inheritance via citations. We evaluate the model's ability to predict the strength of influence of citations against manually rated citations.


International Conference on Machine Learning | 2008

Multi-task learning for HIV therapy screening

Steffen Bickel; Jasmina Bogojeska; Thomas Lengauer; Tobias Scheffer

We address the problem of learning classifiers for a large number of tasks. We derive a solution that produces resampling weights which match the pool of all examples to the target distribution of any given task. Our work is motivated by the problem of predicting the outcome of a therapy attempt for a patient who carries an HIV virus with a set of observed genetic properties. Such predictions need to be made for hundreds of possible combinations of drugs, some of which use similar biochemical mechanisms. Multi-task learning enables us to make predictions even for drug combinations with few or no training examples and substantially improves the overall prediction accuracy.


Journal of Information Processing | 2009

Direct Density Ratio Estimation for Large-scale Covariate Shift Adaptation

Yuta Tsuboi; Hisashi Kashima; Shohei Hido; Steffen Bickel; Masashi Sugiyama

Covariate shift is a situation in supervised learning where training and test inputs follow different distributions even though the functional relation remains unchanged. A common approach to compensating for the bias caused by covariate shift is to reweight the loss function according to the importance, which is the ratio of test and training densities. We propose a novel method that allows us to directly estimate the importance from samples without going through the hard task of density estimation. An advantage of the proposed method is that the computation time is nearly independent of the number of test input samples, which is highly beneficial in recent applications with large numbers of unlabeled samples. We demonstrate through experiments that the proposed method is computationally more efficient than existing approaches with comparable accuracy. We also describe a promising result for large-scale covariate shift adaptation in a natural language processing task.
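The direct-estimation idea can be illustrated with a least-squares fit of the importance as a kernel expansion, solved in closed form without estimating either density. This is a uLSIF-style sketch under assumed hyperparameters (`sigma`, `lam`, `n_basis`), not the paper's exact procedure:

```python
import numpy as np

def gaussian_kernel(X, C, sigma):
    """Gaussian basis functions centered at rows of C."""
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def direct_ratio(X_train, X_test, sigma=1.0, lam=1e-3, n_basis=50, seed=0):
    """Least-squares fit of w(x) = p_test(x)/p_train(x) as a kernel
    expansion, bypassing density estimation (an illustrative sketch)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X_test), min(n_basis, len(X_test)), replace=False)
    C = X_test[idx]                              # kernel centers from test set
    Phi_tr = gaussian_kernel(X_train, C, sigma)  # basis on training inputs
    Phi_te = gaussian_kernel(X_test, C, sigma)   # basis on test inputs
    H = Phi_tr.T @ Phi_tr / len(X_train)         # second moment under p_train
    h = Phi_te.mean(axis=0)                      # first moment under p_test
    alpha = np.linalg.solve(H + lam * np.eye(len(C)), h)
    return np.maximum(Phi_tr @ alpha, 0.0)       # importance of train points
```

Note the closed-form solve touches the test inputs only through the mean vector `h`, which is one way to see why the cost can stay nearly independent of the number of test samples.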


SIGKDD Explorations | 2005

Classifying search engine queries using the web as background knowledge

David S. Vogel; Steffen Bickel; Peter Haider; Rolf Schimpfky; Peter Siemen; Steve Bridges; Tobias Scheffer

The performance of search engines crucially depends on their ability to capture the meaning of a query most likely intended by the user. We study the problem of mapping a search engine query to those nodes of a given subject taxonomy that characterize its most likely meanings. We describe the architecture of a classification system that uses a web directory to identify the subject context that the query terms are frequently used in. Based on its performance on the classification of 800,000 example queries recorded from MSN search, the system received the Runner-Up Award for Query Categorization Performance of the KDD Cup 2005.


BMC Bioinformatics | 2005

Systematic feature evaluation for gene name recognition

Jörg Hakenberg; Steffen Bickel; Conrad Plake; Ulf Brefeld; Hagen Zahn; Lukas C. Faulstich; Ulf Leser; Tobias Scheffer

In task 1A of the BioCreAtIvE evaluation, systems had to be devised that recognize words and phrases forming gene or protein names in natural language sentences. We approach this problem by building a word classification system based on a sliding window approach with a Support Vector Machine, combined with a pattern-based post-processing for the recognition of phrases. The performance of such a system crucially depends on the type of features chosen for consideration by the classification method, such as prefixes or suffixes, character n-grams, patterns of capitalization, or classification of preceding or following words. We present a systematic approach to evaluate the performance of different feature sets based on recursive feature elimination (RFE). Based on a systematic reduction of the number of features used by the system, we can quantify the impact of different feature sets on the results of the word classification problem. This helps us to identify descriptive features, to learn about the structure of the problem, and to design systems that are faster and easier to understand. We observe that the SVM is robust to redundant features. RFE improves the performance by 0.7%, compared to using the complete set of attributes. Moreover, a performance that is only 2.3% below this maximum can be obtained using fewer than 5% of the features.
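The RFE loop described above, repeatedly fitting a linear SVM and discarding the lowest-weight features, can be sketched with scikit-learn on synthetic data (the dataset and parameters here are illustrative stand-ins for the paper's token-level gene-name features):

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.feature_selection import RFE
from sklearn.datasets import make_classification

# Synthetic stand-in: 50 features, of which only 5 are informative.
X, y = make_classification(n_samples=200, n_features=50,
                           n_informative=5, random_state=0)

# Recursive feature elimination: fit a linear SVM, drop the 10% of
# features with the smallest weights, and refit until 5 remain.
selector = RFE(LinearSVC(max_iter=5000), n_features_to_select=5, step=0.1)
selector.fit(X, y)

kept = np.flatnonzero(selector.support_)  # indices of surviving features
```

`selector.ranking_` records the elimination order, which is exactly the quantity the paper uses to compare the descriptive power of different feature sets.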


European Conference on Machine Learning | 2005

Estimation of mixture models using Co-EM

Steffen Bickel; Tobias Scheffer

We study estimation of mixture models for problems in which multiple views of the instances are available. Examples of this setting include clustering web pages or research papers that have intrinsic (text) and extrinsic (references) attributes. Our optimization criterion quantifies the likelihood and the consensus among models in the individual views; maximizing this consensus minimizes a bound on the risk of assigning an instance to an incorrect mixture component. We derive an algorithm that maximizes this criterion. Empirically, we observe that the resulting clustering method incurs a lower cluster entropy than regular EM for web pages, research papers, and many text collections.


Empirical Methods in Natural Language Processing | 2005

Predicting Sentences using N-Gram Language Models

Steffen Bickel; Peter Haider; Tobias Scheffer

We explore the benefit that users in several application areas can experience from a tab-complete editing assistance function. We develop an evaluation metric and adapt N-gram language models to the problem of predicting the subsequent words, given an initial text fragment. Using an instance-based method as baseline, we empirically study the predictability of call-center emails, personal emails, weather reports, and cooking recipes.
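A minimal bigram version of such a predictor can be sketched as follows; the paper uses higher-order N-gram models and a purpose-built evaluation metric, so this is only the core counting idea:

```python
from collections import Counter, defaultdict

class BigramPredictor:
    """Predict the most likely next word from bigram counts
    (a minimal sketch of the N-gram completion idea)."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, corpus):
        # Count word-successor pairs over all training sentences.
        for sentence in corpus:
            words = sentence.lower().split()
            for prev, nxt in zip(words, words[1:]):
                self.counts[prev][nxt] += 1

    def predict(self, fragment):
        # Complete the fragment with the most frequent successor
        # of its last word; None if the word was never seen.
        words = fragment.lower().split()
        if not words or words[-1] not in self.counts:
            return None
        return self.counts[words[-1]].most_common(1)[0][0]
```

For example, a model trained on weather-report sentences such as "rain expected in the north" would complete the fragment "expected in" with "the".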


European Conference on Machine Learning | 2005

Learning to complete sentences

Steffen Bickel; Peter Haider; Tobias Scheffer

We consider the problem of predicting how a user will continue a given initial text fragment. Intuitively, our goal is to develop a “tab-complete” function for natural language, based on a model that is learned from text data. We consider two learning mechanisms that generate predictive models from application-specific document collections: we develop an N-gram based completion method and discuss the application of instance-based learning. After developing evaluation metrics for this task, we empirically compare the model-based to the instance-based method and assess the predictability of call-center emails, personal emails, and weather reports.

Collaboration

Steffen Bickel's top co-authors:

- Peter Haider (Humboldt University of Berlin)
- Conrad Plake (Dresden University of Technology)
- Isabel Drost (Humboldt University of Berlin)
- Lukas C. Faulstich (Humboldt University of Berlin)