Jörg Sander | Researchain



Publication

Featured research published by Jörg Sander.

international conference on management of data | 2000

LOF: identifying density-based local outliers

Markus M. Breunig; Hans-Peter Kriegel; Raymond T. Ng; Jörg Sander

For many KDD applications, such as detecting criminal activities in E-commerce, finding the rare instances or the outliers can be more interesting than finding the common patterns. Existing work in outlier detection regards being an outlier as a binary property. In this paper, we contend that for many scenarios, it is more meaningful to assign to each object a degree of being an outlier. This degree is called the local outlier factor (LOF) of an object. It is local in that the degree depends on how isolated the object is with respect to the surrounding neighborhood. We give a detailed formal analysis showing that LOF enjoys many desirable properties. Using real-world datasets, we demonstrate that LOF can be used to find outliers which appear to be meaningful, but can otherwise not be identified with existing approaches. Finally, a careful performance evaluation of our algorithm confirms that our approach of finding local outliers can be practical.
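The abstract does not spell out how the degree is computed. As a rough illustration only, the sketch below follows the commonly cited LOF definition (k-distance, reachability distance, local reachability density) with a brute-force neighbor search. The function name `lof_scores`, the parameter `k`, and the toy data are assumptions made for this example, not taken from the paper's implementation.

```python
import math


def lof_scores(points, k):
    """Brute-force local outlier factor for each point (O(n^2) sketch).

    Assumes the points are distinct enough that no neighborhood has
    total reachability distance zero (i.e. no heavy duplicates).
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(points)")

    # Pairwise Euclidean distances.
    dist = [[math.dist(p, q) for q in points] for p in points]

    k_dist = []      # distance from each point to its k-th nearest neighbor
    neighbors = []   # indices within the k-distance (ties included)
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]
        k_dist.append(kd)
        neighbors.append([j for d, j in others if d <= kd])

    def reach_dist(i, j):
        # Reachability distance of point i with respect to point j.
        return max(k_dist[j], dist[i][j])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        total = sum(reach_dist(i, j) for j in neighbors[i])
        lrd.append(len(neighbors[i]) / total)

    # LOF: mean ratio of the neighbors' density to the point's own density.
    return [
        sum(lrd[j] / lrd[i] for j in neighbors[i]) / len(neighbors[i])
        for i in range(n)
    ]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
    for point, score in zip(data, lof_scores(data, k=2)):
        print(point, round(score, 3))
```

Scores close to 1 indicate a point whose density matches that of its neighbors; values well above 1 indicate a point that is isolated relative to its surroundings, which is the "degree" the abstract refers to.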

Data Mining and Knowledge Discovery | 1998

Density-Based Clustering in Spatial Databases: The Algorithm GDBSCAN and Its Applications

Jörg Sander; Martin Ester; Hans-Peter Kriegel; Xiaowei Xu

The clustering algorithm DBSCAN relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape as well as to distinguish noise. In this paper, we generalize this algorithm in two important directions. The generalized algorithm, called GDBSCAN, can cluster point objects as well as spatially extended objects according to both their spatial and their nonspatial attributes. In addition, four applications using 2D points (astronomy), 3D points (biology), 5D points (earth science) and 2D polygons (geography) are presented, demonstrating the applicability of GDBSCAN to real-world problems.
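One way to picture the generalization is to let the caller supply both the neighborhood test and the density test instead of fixing them to a distance radius and a point count. The sketch below is a minimal illustration under that assumption; the names `gdbscan`, `is_neighbor`, and `min_weight` are hypothetical, and the quadratic neighborhood scan stands in for whatever spatial index a real system would use.

```python
import math
from collections import deque

NOISE = -1


def gdbscan(objects, is_neighbor, min_weight):
    """Density-based clustering with pluggable predicates (sketch).

    is_neighbor(a, b): symmetric, reflexive neighborhood test.
    min_weight(neighborhood): True if the neighborhood is dense enough
    for its center to act as a core object.
    Returns one label per object; NOISE marks unclustered objects.
    """
    n = len(objects)
    labels = [None] * n

    def neighborhood(i):
        return [j for j in range(n) if is_neighbor(objects[i], objects[j])]

    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighborhood(i)
        if not min_weight([objects[j] for j in seeds]):
            labels[i] = NOISE  # may later be relabeled as a border object
            continue
        labels[i] = cluster_id
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border object: joins, does not expand
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            reachable = neighborhood(j)
            if min_weight([objects[m] for m in reachable]):
                queue.extend(reachable)  # core object: keep expanding
        cluster_id += 1
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(gdbscan(
        pts,
        is_neighbor=lambda a, b: math.dist(a, b) <= 1.5,
        min_weight=lambda nbrs: len(nbrs) >= 3,
    ))
```

With a distance threshold and a minimum count, as in the usage above, this reduces to ordinary DBSCAN behavior. Swapping in, say, a polygon-intersection test for `is_neighbor` or a sum over a nonspatial attribute for `min_weight` is how spatially extended objects and nonspatial attributes could enter the picture.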

Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery | 2011

Density-based clustering

Hans-Peter Kriegel; Peer Kröger; Jörg Sander; Arthur Zimek

Clustering refers to the task of identifying groups or clusters in a data set. In density‐based clustering, a cluster is a set of data objects spread in the data space over a contiguous region of high density of objects. Density‐based clusters are separated from each other by contiguous regions of low density of objects. Data objects located in low‐density regions are typically considered noise or outliers.

Lecture Notes in Computer Science | 1997

Spatial Data Mining: A Database Approach

Martin Ester; Hans-Peter Kriegel; Jörg Sander

Knowledge discovery in databases (KDD) is an important task in spatial databases since both the number and the size of such databases are rapidly growing. This paper introduces a set of basic operations which should be supported by a spatial database system (SDBS) to express algorithms for KDD in SDBS. For this purpose, we introduce the concepts of neighborhood graphs and paths and a small set of operations for their manipulation. We argue that these operations are sufficient for KDD algorithms considering spatial neighborhood relations by presenting the implementation of four typical spatial KDD algorithms based on the proposed operations. Furthermore, the efficient support of operations on large neighborhood graphs and on large sets of neighborhood paths by the SDBS is discussed. Neighborhood indices are introduced to materialize selected neighborhood graphs in order to speed up the processing of the proposed operations.

international conference on data engineering | 1998

A distribution-based clustering algorithm for mining in large spatial databases

Xiaowei Xu; Martin Ester; Hans-Peter Kriegel; Jörg Sander

The problem of detecting clusters of points belonging to a spatial point process arises in many applications. In this paper, we introduce the new clustering algorithm DBCLASD (Distribution-Based Clustering of LArge Spatial Databases) to discover clusters of this type. The results of experiments demonstrate that DBCLASD, contrary to partitioning algorithms such as CLARANS (Clustering Large Applications based on RANdomized Search), discovers clusters of arbitrary shape. Furthermore, DBCLASD does not require any input parameters, in contrast to the clustering algorithm DBSCAN (Density-Based Spatial Clustering of Applications with Noise) requiring two input parameters, which may be difficult to provide for large databases. In terms of efficiency, DBCLASD is between CLARANS and DBSCAN, close to DBSCAN. Thus, the efficiency of DBCLASD on large spatial databases is very attractive when considering its nonparametric nature and its good quality for clusters of arbitrary shape.

european conference on principles of data mining and knowledge discovery | 1999

OPTICS-OF: Identifying Local Outliers

Markus M. Breunig; Hans-Peter Kriegel; Raymond T. Ng; Jörg Sander

For many KDD applications finding the outliers, i.e. the rare events, is more interesting and useful than finding the common cases, e.g. detecting criminal activities in E-commerce. Being an outlier, however, is not just a binary property. Instead, it is a property that applies to a certain degree to each object in a data set, depending on how ‘isolated’ this object is, with respect to the surrounding clustering structure. In this paper, we formally introduce a new notion of outliers which bases outlier detection on the same theoretical foundation as density-based cluster analysis. Our notion of an outlier is ‘local’ in the sense that the outlier-degree of an object is determined by taking into account the clustering structure in a bounded neighborhood of the object. We demonstrate that this notion of an outlier is more appropriate for detecting different types of outliers than previous approaches, and we also present an algorithm for finding them. Furthermore, we show that by combining the outlier detection with a density-based method to analyze the clustering structure, we can get the outliers almost for free if we already want to perform a cluster analysis on a data set.

Nucleic Acids Research | 2006

cisRED: a database system for genome-scale computational discovery of regulatory elements.

Gordon Robertson; Misha Bilenky; Keven Lin; An He; W. Yuen; M. Dagpinar; Richard Varhol; Kevin Teague; Obi L. Griffith; Xuekui Zhang; Yinghong Pan; Maik Hassel; Monica C. Sleumer; Wenying Pan; Erin Pleasance; M. Chuang; H. Hao; Yvonne Y. Li; Neil A. Robertson; Christopher D. Fjell; Bernard Li; Stephen B. Montgomery; Tamara Astakhova; Jianjun Zhou; Jörg Sander; Asim Siddiqui; Steven J.M. Jones

We describe cisRED, a database for conserved regulatory elements that are identified and ranked by a genome-scale computational system. The database and high-throughput predictive pipeline are designed to address diverse target genomes in the context of rapidly evolving data resources and tools. Motifs are predicted in promoter regions using multiple discovery methods applied to sequence sets that include corresponding sequence regions from vertebrates. We estimate motif significance by applying discovery and post-processing methods to randomized sequence sets that are adaptively derived from target sequence sets, retain motifs with p-values below a threshold and identify groups of similar motifs and co-occurring motif patterns. The database offers information on atomic motifs, motif groups and patterns. It is web-accessible, and can be queried directly, downloaded or installed locally.

international conference on computer vision | 2005

Segmenting brain tumors with conditional random fields and support vector machines

Chi-Hoon Lee; Mark W. Schmidt; Albert Murtha; Jörg Sander; Russell Greiner

Markov Random Fields (MRFs) are a popular and well-motivated model for many medical image processing tasks such as segmentation. Discriminative Random Fields (DRFs), a discriminative alternative to the traditionally generative MRFs, allow tractable computation with less restrictive simplifying assumptions, and achieve better performance in many tasks. In this paper, we investigate the tumor segmentation performance of a recent variant of DRF models that takes advantage of the powerful Support Vector Machine (SVM) classification method. Combined with a powerful Magnetic Resonance (MR) preprocessing pipeline and a set of ‘alignment-based’ features, we evaluate the use of SVMs, MRFs, and two types of DRFs as classifiers for three segmentation tasks related to radiation therapy target planning for brain tumors, two of which do not rely on ‘contrast agent’ enhancement. Our results indicate that the SVM-based DRFs offer a significant advantage over the other approaches.

knowledge discovery and data mining | 2003

Automatic extraction of clusters from hierarchical clustering representations

Jörg Sander; Xuejie Qin; Zhiyong Lu; Nan Niu; Alex Kovarsky

Hierarchical clustering algorithms are typically more effective in detecting the true clustering structure of a data set than partitioning algorithms. However, hierarchical clustering algorithms do not actually create clusters, but compute only a hierarchical representation of the data set. This makes them unsuitable as an automatic pre-processing step for other algorithms that operate on detected clusters. This is true for both dendrograms and reachability plots, which have been proposed as hierarchical clustering representations, and which have different advantages and disadvantages. In this paper we first investigate the relation between dendrograms and reachability plots and introduce methods to convert them into each other showing that they essentially contain the same information. Based on reachability plots, we then introduce a technique that automatically determines the significant clusters in a hierarchical cluster representation. This makes it for the first time possible to use hierarchical clustering as an automatic pre-processing step that requires no user interaction to select clusters from a hierarchical cluster representation.
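To see why a reachability plot encodes clusters at all, it can help to look at the simplest possible extraction: a single horizontal cut. The sketch below is not the automatic technique this paper introduces; it is a deliberately simpler stand-in that treats each run of low reachability values (a "valley") as one cluster. The function name, the threshold parameter, and the rule that single-point runs are noise are assumptions of this example.

```python
import math

NOISE = -1


def cut_reachability_plot(reachability, threshold):
    """Label points by cutting a reachability plot at a fixed height.

    reachability: values in cluster-ordering sequence (the first entry
    is typically infinite because it has no predecessor).
    A value above the threshold starts a new segment; segments with a
    single point are treated as noise in this simplified version.
    """
    segments = []
    for index, value in enumerate(reachability):
        if value > threshold or not segments:
            segments.append([index])        # a peak: new segment begins here
        else:
            segments[-1].append(index)      # still inside the current valley

    labels = [NOISE] * len(reachability)
    cluster_id = 0
    for segment in segments:
        if len(segment) < 2:
            continue
        for index in segment:
            labels[index] = cluster_id
        cluster_id += 1
    return labels


if __name__ == "__main__":
    plot = [math.inf, 0.5, 0.4, 3.0, 0.6, 0.5, 4.0]
    print(cut_reachability_plot(plot, threshold=1.0))
```

A fixed cut like this needs a user-chosen threshold and can only find clusters at one density level, which is the kind of manual step the abstract says the proposed technique removes.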

Sigkdd Explorations | 2014

Ensembles for unsupervised outlier detection: challenges and research questions (a position paper)

Arthur Zimek; Ricardo J. G. B. Campello; Jörg Sander

Ensembles for unsupervised outlier detection is an emerging topic that has been neglected for a surprisingly long time (although there are reasons why this is more difficult than supervised ensembles or even clustering ensembles). Aggarwal recently discussed algorithmic patterns of outlier detection ensembles, identified traces of the idea in the literature, and remarked on potential as well as unlikely avenues for future transfer of concepts from supervised ensembles. Complementary to his points, here we focus on the core ingredients for building an outlier ensemble, discuss the first steps taken in the literature, and identify challenges for future research.
