Publication


Featured research published by Peter Haider.


Nature Methods | 2011

Taxonomic metagenome sequence assignment with structured output models

Kaustubh Patil; Peter Haider; Phillip B. Pope; Peter J. Turnbaugh; Mark Morrison; Tobias Scheffer; Alice C. McHardy

Supplementary Figure 2: Scaffold-contig visualization of different binning methods for the WG-2 population in the Tammar wallaby metagenome sample.
Supplementary Figure 3: Evaluation of different binning methods on short fragments of varying lengths.
Supplementary Figure 4: Overlap between predictions of different methods on the TW sample for the three uncultured populations.
Supplementary Figure 5: Overlap between predictions of different methods on the TW sample for dominant phyla.
Supplementary Table 1: Assignment accuracy of different binning methods on the simulated Acid Mine Drainage data set.
Supplementary Table 2: Performance of different binning methods for the abundant populations in the TW sample.
Supplementary Table 3: NUCmer analysis of the WG-1 assignments for the TW sample.
Supplementary Table 4: Modeled clades for the TW sample.
Supplementary Table 5: Taxonomic assignments for abundant genera in the human gut metagenome samples.
Supplementary Table 6: Bin validation for the human gut metagenome samples using marker genes.
Supplementary Table 7: Validation for the human gut metagenome samples using CD-HIT (fraction matched).
Supplementary Table 8: Modeled clades for PhyloPythiaS for the human gut metagenome samples (TS28 and TS29).
Supplementary Table 9: Statistical comparison of the assignments of different methods on the TW data set.
Supplementary Table 10: Number of contigs classified by different methods at different taxonomic ranks for the TW sample.
Supplementary Table 11: Effect of sample-specific data on the assignment of the TW sample for PhyloPythiaS and PhymmBL.
Supplementary Table 12: Genomes used for the simulated short fragment test data set.


SIGKDD Explorations | 2005

Classifying search engine queries using the web as background knowledge

David S. Vogel; Steffen Bickel; Peter Haider; Rolf Schimpfky; Peter Siemen; Steve Bridges; Tobias Scheffer

The performance of search engines crucially depends on their ability to capture the meaning of a query most likely intended by the user. We study the problem of mapping a search engine query to those nodes of a given subject taxonomy that characterize its most likely meanings. We describe the architecture of a classification system that uses a web directory to identify the subject context that the query terms are frequently used in. Based on its performance on the classification of 800,000 example queries recorded from MSN search, the system received the Runner-Up Award for Query Categorization Performance of the KDD Cup 2005.


International Conference on Machine Learning | 2007

Supervised clustering of streaming data for email batch detection

Peter Haider; Ulf Brefeld; Tobias Scheffer

We address the problem of detecting batches of emails that have been created according to the same template. This problem is motivated by the desire to filter spam more effectively by exploiting collective information about entire batches of jointly generated messages. The application matches the problem setting of supervised clustering, because examples of correct clusterings can be collected. Known decoding procedures for supervised clustering are cubic in the number of instances. When decisions cannot be reconsidered once they have been made, owing to the streaming nature of the data, the decoding problem can be solved in linear time. We devise a sequential decoding procedure and derive the corresponding optimization problem of supervised clustering. We study the impact of collective attributes of email batches on the effectiveness of recognizing spam emails.


International Conference on Machine Learning | 2008

Learning from incomplete data with infinite imputations

Uwe Dick; Peter Haider; Tobias Scheffer

We address the problem of learning decision functions from training data in which some attribute values are unobserved. This problem can arise, for instance, when training data is aggregated from multiple sources, and some sources record only a subset of attributes. We derive a generic joint optimization problem in which the distribution governing the missing values is a free parameter. We show that the optimal solution concentrates the density mass on finitely many imputations, and provide a corresponding algorithm for learning from incomplete data. We report on empirical results on benchmark data, and on the email spam application that motivates our work.


European Conference on Machine Learning | 2005

Learning to complete sentences

Steffen Bickel; Peter Haider; Tobias Scheffer

We consider the problem of predicting how a user will continue a given initial text fragment. Intuitively, our goal is to develop a “tab-complete” function for natural language, based on a model that is learned from text data. We consider two learning mechanisms that generate predictive models from application-specific document collections: we develop an N-gram based completion method and discuss the application of instance-based learning. After developing evaluation metrics for this task, we empirically compare the model-based to the instance-based method and assess the predictability of call-center emails, personal emails, and weather reports.


Knowledge Discovery and Data Mining | 2012

Discriminative clustering for market segmentation

Peter Haider; Luca Chiarandini; Ulf Brefeld

We study discriminative clustering for market segmentation tasks. The underlying problem setting resembles discriminative clustering; however, existing approaches focus on the prediction of univariate cluster labels. By contrast, market segments encode complex (future) behavior of the individuals, which cannot be represented by a single variable. In this paper, we generalize discriminative clustering to structured and complex output variables that can be represented as graphical models. We devise two novel methods to jointly learn the classifier and the clustering using alternating optimization and collapsed inference, respectively. The two approaches jointly learn a discriminative segmentation of the input space and a generative output prediction model for each segment. We evaluate our methods on segmenting user navigation sequences from Yahoo! News. The proposed collapsed algorithm is observed to outperform baseline approaches such as mixture of experts. We showcase exemplary projections of the resulting segments to display the interpretability of the solutions.


International Conference on Machine Learning | 2009

Bayesian clustering for email campaign detection

Peter Haider; Tobias Scheffer

We discuss the problem of clustering elements according to the sources that have generated them. For elements that are characterized by independent binary attributes, a closed-form Bayesian solution exists. We derive a solution for the case of dependent attributes that is based on a transformation of the instances into a space of independent feature functions. We derive an optimization problem that produces a mapping into a space of independent binary feature vectors; the features can reflect arbitrary dependencies in the input space. This problem setting is motivated by the application of spam filtering for email service providers. Spam traps deliver a real-time stream of messages known to be spam. If elements of the same campaign can be recognized reliably, entire spam and phishing campaigns can be contained. We present a case study that evaluates Bayesian clustering for this application.


Empirical Methods in Natural Language Processing | 2005

Predicting Sentences using N-Gram Language Models

Steffen Bickel; Peter Haider; Tobias Scheffer


International Conference on Machine Learning | 2012

Finding Botnets Using Minimal Graph Clusterings

Peter Haider; Tobias Scheffer


Text Retrieval Conference | 2006

Highly Scalable Discriminative Spam Filtering

Michael Brückner; Peter Haider; Tobias Scheffer

Collaboration


Dive into Peter Haider's collaborations.

Top Co-Authors

Uwe Dick

University of Potsdam

Arne Jansen

Humboldt University of Berlin

Peter Siemen

Humboldt University of Berlin
