Markus Goldstein
German Research Centre for Artificial Intelligence
Publication
Featured research published by Markus Goldstein.
knowledge discovery and data mining | 2013
Mennatallah Amer; Markus Goldstein; Slim Abdennadher
Support Vector Machines (SVMs) have been one of the most successful machine learning techniques of the past decade. For anomaly detection, a semi-supervised variant, the one-class SVM, also exists. Here, only normal data is required for training before anomalies can be detected. In theory, the one-class SVM could also be used in an unsupervised anomaly detection setup, where no prior training is conducted. Unfortunately, it turns out that a one-class SVM is sensitive to outliers in the data. In this work, we apply two modifications to make one-class SVMs more suitable for unsupervised anomaly detection: robust one-class SVMs and eta one-class SVMs. The key idea of both modifications is that outliers should contribute less to the decision boundary than normal instances. Experiments performed on datasets from the UCI machine learning repository show that our modifications are very promising: compared with other standard unsupervised anomaly detection algorithms, the enhanced one-class SVMs are superior on two out of four datasets. In particular, the proposed eta one-class SVM has shown the most promising results.
Pattern Analysis and Applications | 2014
Matthias Reif; Faisal Shafait; Markus Goldstein; Thomas M. Breuel; Andreas Dengel
Choosing a suitable classifier for a given dataset is an important part of developing a pattern recognition system. Since a large variety of classification algorithms have been proposed in the literature, non-experts do not know which method to use in order to obtain good classification results on their data. Meta-learning tries to address this problem by recommending promising classifiers based on meta-features computed from a given dataset. In this paper, we empirically evaluate five different categories of state-of-the-art meta-features for their suitability in predicting classification accuracies of several widely used classifiers (including Support Vector Machines, Neural Networks, Random Forests, Decision Trees, and Logistic Regression). Based on the evaluation results, we have developed the first open source meta-learning system that is capable of accurately predicting accuracies of target classifiers. The user provides a dataset as input and gets an automatically created, high-performance, ready-to-use pattern recognition system in a few simple steps. A user study of the system with non-experts showed that the users were able to develop more accurate pattern recognition systems in significantly less development time when using our system than when using state-of-the-art data mining software.
international conference on pattern recognition | 2008
Matthias Reif; Markus Goldstein; Armin Stahl; Thomas M. Breuel
In this paper, a modified decision tree algorithm for anomaly detection is presented. During the tree building process, densities for the outlier class are used directly in the split point determination algorithm. No artificial counter-examples have to be sampled from the unknown class, which leads to more precise decision boundaries and a deterministic classification result. Furthermore, the prior of the outlier class can be used to adjust the sensitivity of the anomaly detector. The proposed method combines the advantages of classification trees with the benefit of a more accurate representation of the outliers. For evaluation, we compare our approach with other state-of-the-art anomaly detection algorithms on four standard data sets including the KDD-Cup 99. The results show that the proposed method performs as well as more complex approaches and is even superior on three out of four data sets.
international conference on document analysis and recognition | 2013
Johann Gebhardt; Markus Goldstein; Faisal Shafait; Andreas Dengel
Automatically identifying that a certain page in a set of documents was printed with a different printer than the rest of the documents can give an important clue to a possible forgery attempt. Different printers vary in their printing quality, which is especially noticeable at the edges of printed characters. In this paper, a system using the difference in edge roughness to distinguish laser-printed pages from inkjet-printed pages is presented. Several feature extraction methods have been developed and evaluated for that purpose. In contrast to previous work, this system uses unsupervised anomaly detection to detect documents printed by a different printing technique than the majority of the documents in a set. This approach has the advantage that no prior training using genuine documents has to be done. Furthermore, we created a dataset featuring 1200 document images from different domains (invoices, contracts, scientific papers) printed by 7 different inkjet and 13 laser printers. Results show that the presented feature extraction method achieves the best outlier rank score in comparison to state-of-the-art features.
conference on emerging network experiment and technology | 2008
Markus Goldstein; Matthias Reif; Armin Stahl; Thomas M. Breuel
Distributed Denial of Service (DDoS) attack mitigation systems usually generate a list of filter rules in order to block malicious traffic. In contrast to this binary decision, we suggest using traffic shaping, where the bandwidth limit is defined by the probability that a source is a legitimate user. As a proof of concept, we implemented a simple high-performance Linux kernel module, nf-HiShape, which is able to shape thousands of source IP addresses at different bandwidth limits even under high packet rates. Our shaping algorithm is comparable to Random Early Detection (RED) applied to every single source IP range. The evaluation shows that our kernel module can handle up to 50,000 IP ranges at nearly constant throughput, whereas Linux tc already decreases throughput at about 200 ranges.
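The idea of tying a source's bandwidth limit to its probability of being a legitimate user can be illustrated with a minimal sketch. This is not the nf-HiShape algorithm, which the abstract compares to RED. It swaps in a plain per-source token bucket purely for illustration, and the class name `ProbabilityShaper`, its parameters, and the linear mapping from probability to rate are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    tokens: float     # bytes the source may still send right now
    last_seen: float  # timestamp of the previous packet, in seconds


class ProbabilityShaper:
    """Hypothetical per-source token bucket; rate scales with legitimacy."""

    def __init__(self, max_rate: float, burst_seconds: float = 1.0):
        self.max_rate = max_rate            # bytes/s for probability 1.0
        self.burst_seconds = burst_seconds  # bucket depth in seconds of traffic
        self.buckets: dict[str, Bucket] = {}

    def allow(self, source: str, probability: float, size: int, now: float) -> bool:
        # Assumed mapping: the limit grows linearly with the probability.
        rate = self.max_rate * max(0.0, min(1.0, probability))
        capacity = rate * self.burst_seconds
        bucket = self.buckets.get(source)
        if bucket is None:
            # A new source starts with a full bucket.
            bucket = Bucket(tokens=capacity, last_seen=now)
            self.buckets[source] = bucket
        else:
            # Refill according to the time elapsed since the last packet.
            elapsed = max(0.0, now - bucket.last_seen)
            bucket.tokens = min(capacity, bucket.tokens + elapsed * rate)
            bucket.last_seen = now
        if bucket.tokens >= size:
            bucket.tokens -= size
            return True   # forward the packet
        return False      # drop (or delay) the packet
```

Unlike a binary filter rule, a source with a low probability is still served in this sketch, only at a proportionally smaller rate.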
availability, reliability and security | 2009
Markus Goldstein; Matthias Reif; Armin Stahl; Thomas M. Breuel
Source IP addresses are often used as a major feature for user modeling in computer networks. Particularly in the field of Distributed Denial of Service (DDoS) attack detection and mitigation, traffic models make extensive use of source IP addresses for detecting anomalies. Typically, the real IP address distribution is strongly undersampled due to the small number of observations. Density estimation overcomes this shortage by taking advantage of IP neighborhood relations. In many cases, simple models are used implicitly or chosen intuitively as a network-based heuristic. In this paper, we first review and formalize existing models, including a hierarchical clustering approach. In addition, we present a modified k-means clustering algorithm for source IP density estimation as well as a statistically motivated smoothing approach using the Nadaraya-Watson kernel-weighted average. For performance evaluation, we apply all methods to a 90-day real-world dataset consisting of 1.3 million different source IP addresses and try to predict the users of the following 10 days. ROC curves and an example DDoS mitigation scenario show that there is no uniformly better approach: k-means performs best when a high detection rate is needed, whereas statistical smoothing works better for low false alarm rate requirements such as the DDoS mitigation scenario.
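The Nadaraya-Watson kernel-weighted average named in the abstract can be sketched in a few lines. The example below is only an illustration under stated assumptions: it treats IPv4 addresses as integers, uses a Gaussian kernel, and picks an arbitrary bandwidth. The paper's actual kernel, distance measure, and parameters are not given in the abstract, and the function name and sample data are hypothetical.

```python
import ipaddress
import math


def nadaraya_watson(x, xs, ys, bandwidth):
    """Kernel-weighted average of ys at query point x (Gaussian kernel)."""
    weights = [math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        # Query is so far from all observations that every weight underflows.
        return 0.0
    return sum(w * y for w, y in zip(weights, ys)) / total


def ip_to_int(address):
    """Map a dotted IPv4 string to its integer value."""
    return int(ipaddress.ip_address(address))


# Hypothetical observations: how often each source address was seen.
observed = {"192.0.2.10": 12.0, "192.0.2.11": 9.0, "192.0.2.40": 1.0}
xs = [ip_to_int(a) for a in observed]
ys = list(observed.values())

# Estimate a score for an unseen address from its observed neighbors.
score = nadaraya_watson(ip_to_int("192.0.2.12"), xs, ys, bandwidth=8.0)
print(score)
```

Because the weights decay with distance in address space, an unseen address close to frequently observed ones receives a high score, which is one way to exploit the IP neighborhood relations mentioned above.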
Archive | 2012
Markus Goldstein; Andreas Dengel
international conference on networking | 2008
Markus Goldstein; Christoph H. Lampert; Matthias Reif; Armin Stahl; Thomas M. Breuel
international conference on pattern recognition | 2012
Markus Goldstein
Archive | 2008
Mehran Roshandel; Markus Goldstein; Matthias Reif; Armin Stahl; Thomas Breuel