Joerg Sander | Researchain

Publication

Featured research published by Joerg Sander.

knowledge discovery and data mining | 2008

Finding non-redundant, statistically significant regions in high dimensional data: a novel approach to projected and subspace clustering

Gabriela Moise; Joerg Sander

Projected and subspace clustering algorithms search for clusters of points in subsets of attributes. Projected clustering computes several disjoint clusters, plus outliers, so that each cluster exists in its own subset of attributes. Subspace clustering enumerates clusters of points in all subsets of attributes, typically producing many overlapping clusters. One problem of existing approaches is that their objectives are stated in a way that is not independent of the particular algorithm proposed to detect such clusters. A second problem is the definition of cluster density based on user-defined parameters, which makes it hard to assess whether the reported clusters are an artifact of the algorithm or whether they actually stand out in the data in a statistical sense. We propose a novel problem formulation that aims at extracting axis-parallel regions that stand out in the data in a statistical sense. The set of axis-parallel, statistically significant regions that exist in a given data set is typically highly redundant. Therefore, we formulate the problem of representing this set through a reduced, non-redundant set of axis-parallel, statistically significant regions as an optimization problem. Exhaustive search is not a viable solution due to computational infeasibility, and we propose the approximation algorithm STATPC. Our comprehensive experimental evaluation shows that STATPC significantly outperforms existing projected and subspace clustering algorithms in terms of accuracy.
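What it means for a region to stand out "in a statistical sense" can be illustrated with a simple test. The sketch below is not STATPC itself. It assumes, purely for illustration, that the data are scaled to the unit hypercube and that the null hypothesis is a uniform distribution, so the number of points falling into an axis-parallel region follows a binomial distribution whose success probability is the region's volume. The function names and the region representation (a dict mapping attribute index to an interval) are hypothetical.

```python
from math import comb, prod


def binomial_upper_tail(n, k, p):
    """Return P[X >= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def region_significance(points, region):
    """Count the points inside an axis-parallel region and return the p-value.

    points: sequence of tuples with coordinates in [0, 1] (assumed scaling).
    region: dict {attribute_index: (low, high)}; attributes not listed are
            unconstrained, so they contribute a factor of 1 to the volume.
    """
    n = len(points)
    # Under the assumed uniform null, the chance that one point lands in the
    # region equals the region's volume in the constrained attributes.
    volume = prod(high - low for low, high in region.values())
    count = sum(
        all(low <= pt[a] <= high for a, (low, high) in region.items())
        for pt in points
    )
    return count, binomial_upper_tail(n, count, volume)
```

A small p-value indicates that the region holds more points than the uniform null would explain. Many overlapping regions typically pass such a test at once, which is the redundancy the abstract refers to.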

pacific-asia conference on knowledge discovery and data mining | 2013

Density-Based Clustering Based on Hierarchical Density Estimates

Ricardo J. G. B. Campello; Davoud Moulavi; Joerg Sander

We propose a theoretically and practically improved density-based, hierarchical clustering method, providing a clustering hierarchy from which a simplified tree of significant clusters can be constructed. For obtaining a “flat” partition consisting of only the most significant clusters (possibly corresponding to different density thresholds), we propose a novel cluster stability measure, formalize the problem of maximizing the overall stability of selected clusters, and formulate an algorithm that computes an optimal solution to this problem. We demonstrate that our approach outperforms the current, state-of-the-art, density-based clustering methods on a wide variety of real world data.
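The selection of a flat partition from a cluster tree can be sketched as a bottom-up pass. The code below is a minimal illustration, not the authors' implementation. It assumes that a stability value has already been computed for every cluster in the tree and that the selected clusters must not be nested inside one another. The tree layout, the cluster names, and the stability values are hypothetical.

```python
def select_clusters(children, stability, node):
    """Return (best_total_stability, selected_clusters) for the subtree at node.

    children:  dict mapping a cluster to the list of its child clusters.
    stability: dict mapping every cluster to its (precomputed) stability.
    """
    kids = children.get(node, [])
    if not kids:
        # A leaf cluster can only be selected as itself.
        return stability[node], [node]

    # Best achievable total when the selection is made among descendants.
    total, chosen = 0.0, []
    for kid in kids:
        kid_total, kid_chosen = select_clusters(children, stability, kid)
        total += kid_total
        chosen.extend(kid_chosen)

    # Keep the parent if it is at least as stable as the best selection
    # below it; otherwise keep the descendants.
    if stability[node] >= total:
        return stability[node], [node]
    return total, chosen


# Hypothetical tree: cluster A splits into A1 and A2; B is a leaf.
children = {"root": ["A", "B"], "A": ["A1", "A2"]}
stability = {"root": 1.0, "A": 5.0, "A1": 3.0, "A2": 3.0, "B": 4.0}
best_total, selected = select_clusters(children, stability, "root")
```

Because each subtree is solved independently and the results are summed, the pass visits every cluster once, and the selected clusters may sit at different depths of the tree, that is, at different density thresholds.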

Hypertension | 2008

Mitochondrial dysfunction in the hypertensive rat brain: respiratory complexes exhibit assembly defects in hypertension.

Ana Lopez-Campistrous; Li Hao; Wang Xiang; Dong Ton; Paul D. Semchuk; Joerg Sander; Michael J. Ellison; Carlos Fernandez-Patron

The central nervous system plays a critical role in the normal control of arterial blood pressure and in its elevation in virtually all forms of hypertension. Mitochondrial dysfunction has been increasingly associated with the development of hypertension. Therefore, we examined whether mitochondrial dysfunction occurs in the brain in hypertension and characterized it at the molecular scale. Mitochondria from whole brain and brain stem from 12-week–old spontaneously hypertensive rats with elevated blood pressure (190±5 mm Hg) were compared against those from age-matched normotensive (134±7 mm Hg) Wistar Kyoto rats (n=4 in each group). Global differential analysis using 2D electrophoresis followed by tandem mass spectrometry–based protein identification suggested a downregulation of enzymes involved in cellular energetics in hypertension. Targeted differential analysis of mitochondrial respiratory complexes using the classical blue-native SDS-PAGE/Western method and a complementary combination of sucrose-gradient ultracentrifugation/tandem mass spectrometry revealed previously unknown assembly defects in complexes I, III, IV, and V in hypertension. Interestingly, targeted examination of the brain stem, a regulator of cardiovascular homeostasis and systemic blood pressure, further showed the occurrence of mitochondrial complex I dysfunction, elevated reactive oxygen species production, decreased ATP synthesis, and impaired respiration in hypertension. Our findings suggest that in already-hypertensive spontaneously hypertensive rats, the brain respiratory complexes exhibit previously unknown assembly defects. These defects impair the function of the mitochondrial respiratory chain. This mitochondrial dysfunction localizes to the brain stem and is, therefore, likely to contribute to the development, as well as to pathophysiological complications, of hypertension.

Distributed and Parallel Databases | 2007

Adaptive processing of historical spatial range queries in peer-to-peer sensor networks

Alexandru Coman; Joerg Sander; Mario A. Nascimento

We investigate the problem of processing historical queries on a sensor network. Since data is considered to have been already collected at the sensor nodes, the main issue is exploring the spatial component of the query in order to minimize its cost represented by the energy consumption. We assume queries can be issued at any network node, i.e., there is no central base station and all nodes have only local knowledge of the network. On the one hand, a globally optimum query processing plan is desirable but its construction is not possible due to the lack of global knowledge of the network. On the other hand, while a simple network flooding is feasible, it is not a practical choice from a cost perspective. To address this problem we propose a two-phase query processing strategy, where in the first phase a path from the query originator to the query region is found and in the second phase the query is processed within the query region itself. This strategy is supported by analytical models that are used to dynamically select the best processing strategy depending on the query specifics. Our extensive analytical and experimental results show that our analytical models are accurate and that the two-phase strategy is better suited for small to medium sized queries, being up to 10 times more cost effective than a typical network flooding. In addition, the dynamic selection of a query processing technique proved itself capable of always delivering at least as good performance as the most energy efficient strategy for all query sizes.

international conference on data mining | 2006

Speedup Clustering with Hierarchical Ranking

Jianjun Zhou; Joerg Sander

Many clustering algorithms, in particular hierarchical clustering algorithms, do not scale up well for large data sets, especially when using an expensive distance function. In this paper, we propose a novel approach to perform approximate clustering with high accuracy. We introduce the concept of a pairwise hierarchical ranking to efficiently determine close neighbors for every data object. Empirical results on synthetic and real-life data show a speedup of up to two orders of magnitude over OPTICS while maintaining a high accuracy and up to one order of magnitude over the previously proposed DATA BUBBLES method, which also tries to speed up OPTICS by trading accuracy for speed.

ieee international conference on data science and advanced analytics | 2016

Active Semi-Supervised Classification Based on Multiple Clustering Hierarchies

Antonio J.L. Batista; Ricardo J. G. B. Campello; Joerg Sander

Active semi-supervised learning can play an important role in classification scenarios in which labeled data are difficult to obtain, while unlabeled data can be easily acquired. This paper focuses on an active semi-supervised algorithm that can be driven by multiple clustering hierarchies. If one or more hierarchies can reasonably align clusters with class labels, then only a few queries are needed to label all the unlabeled data with high quality. We take as a starting point the well-known Hierarchical Sampling (HS) algorithm and change several aspects of the original algorithm in order to tackle its main drawbacks, including its sensitivity to the choice of a single particular hierarchy. Experimental results over many real datasets show that the proposed algorithm performs better than or competitively with a number of state-of-the-art algorithms for active semi-supervised classification.

international workshop computational transportation science | 2013

Discovering Spatial Co-Clustering Patterns in Traffic Collision Data

Dapeng Li; Joerg Sander; Mario A. Nascimento; Dae-Won Kwon

Identifying spatial patterns of traffic collisions is critical for improving the efficiency and effectiveness of the deployment of traffic enforcement resources as well as road safety. In recent years, many studies have focused on finding locations with high collision concentration, so-called hotspots, without integrating the likely available non-spatial attributes into the analysis. In this paper we propose a method for identifying the sets of non-spatial attribute-value pairs (AVPs) that together contribute significantly to the spatial clustering of the corresponding collisions. We call such a set of AVPs a Spatial Co-Clustering Pattern (SCCP). By applying our method to the city of Edmonton's historical collision data, we discovered a larger number of meaningful hotspot patterns than traditional hotspot analysis methods did, and revealed the relevant non-spatial indicators for explaining those hotspots.

siam international conference on data mining | 2016

Finding Surprisingly Frequent Patterns of Variable Lengths in Sequence Data

Reza Sadoddin; Joerg Sander; Davood Rafiei

We address the problem of finding ‘surprising’ patterns of variable length in sequence data, where a surprising pattern is defined as a subsequence of a longer sequence, whose observed frequency is statistically significant with respect to a given distribution. Finding statistically significant patterns in sequence data is the core task in some interesting applications such as biological motif discovery and anomaly detection. We show that the presence of a few ‘true’ surprising patterns in the data could cause a large number of highly correlated patterns to appear statistically significant just because of those few significant patterns. Our approach to solving the ‘redundant patterns’ problem is based on capturing the dependencies between patterns through an ‘explain’ relationship where a set of patterns can explain the statistical significance of another pattern. This allows us to address the problem of redundancy by choosing a few ‘core’ patterns which explain the significance of all other significant patterns. We propose a greedy algorithm for efficiently finding an approximate core pattern set of minimum size. Using both synthetic and real-world sequential data, chosen from different domains including Medicine and Bioinformatics, we show that the proposed notion of core patterns very closely matches the notion of ‘true’ surprising patterns in

ieee international conference on data science and advanced analytics | 2016

On the Evaluation of Outlier Detection and One-Class Classification Methods

Lorne Swersky; Henrique O. Marques; Joerg Sander; Ricardo J. G. B. Campello; Arthur Zimek

It has been shown that unsupervised outlier detection methods can be adapted to the one-class classification problem. In this paper, we focus on the comparison of one-class classification algorithms with such adapted unsupervised outlier detection methods, improving on previous comparison studies in several important aspects. We study a number of one-class classification and unsupervised outlier detection methods in a rigorous experimental setup, comparing them on a large number of datasets with different characteristics, using different performance measures. Our experiments led to conclusions that do not fully agree with those of previous work.

international conference on data mining | 2014

Heavyweight Pattern Mining in Attributed Flow Graphs

Carolina Simões Gomes; José Nelson Amaral; Joerg Sander; Joran Siu; Li Ding

This paper defines a new problem: heavyweight pattern mining in attributed flow graphs. The problem can be described as the discovery of patterns in flow graphs that have sets of attributes associated with their nodes. A connection between nodes is represented as a directed edge. The amount of load that goes through a path between nodes, or the frequency of transmission of such load between nodes, is represented as edge weights. A heavyweight pattern is a subset of attributes, found in a dataset of attributed flow graphs, that are connected by edges and have a computed weight higher than a user-defined threshold. A new algorithm called AFG Miner is introduced, the first one to our knowledge that finds heavyweight patterns in a dataset of attributed flow graphs and associates each pattern with its occurrences. The paper also describes a new tool for compiler engineers, HEP Miner, that applies the AFG Miner algorithm to Profile-based Program Analysis modeled as a heavyweight pattern mining problem.
