Thomas Seidl
Ludwig Maximilian University of Munich
Publications
Featured research published by Thomas Seidl.
Lecture Notes in Computer Science | 1999
Mihael Ankerst; Gabi Kastenmüller; Hans-Peter Kriegel; Thomas Seidl
Classification is one of the basic tasks of data mining in modern database applications including molecular biology, astronomy, mechanical engineering, medical imaging or meteorology. The underlying models have to consider spatial properties such as shape or extension as well as thematic attributes. We introduce 3D shape histograms as an intuitive and powerful similarity model for 3D objects. Particular flexibility is provided by using quadratic form distance functions in order to account for errors of measurement, sampling, and numerical rounding that all may result in small displacements and rotations of shapes. For query processing, a general filter-refinement architecture is employed that efficiently supports similarity search based on quadratic forms. An experimental evaluation in the context of molecular biology demonstrates both the high classification accuracy of more than 90% and the good performance of the approach.
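The role of the quadratic form distance in the abstract above can be illustrated with a minimal sketch. It computes d(x, y) = sqrt((x - y)^T A (x - y)) for two histograms of equal length. The function names and the exponential bin-similarity matrix are assumptions chosen for illustration; the paper derives its matrix from the geometry of the 3D histogram cells, which is not reproduced here.

```python
import math


def quadratic_form_distance(x, y, A):
    """Return sqrt((x - y)^T A (x - y)) for equal-length histograms x and y."""
    if len(x) != len(y):
        raise ValueError("histograms must have the same number of bins")
    diff = [xi - yi for xi, yi in zip(x, y)]
    total = 0.0
    for i, di in enumerate(diff):
        for j, dj in enumerate(diff):
            total += di * A[i][j] * dj
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(total, 0.0))


def bin_similarity_matrix(n_bins, sigma=1.0):
    """Hypothetical similarity matrix: a_ij = exp(-sigma * |i - j|).

    Neighbouring bins are treated as similar, so mass that shifts to an
    adjacent bin is penalised less than mass that shifts far away.
    """
    return [[math.exp(-sigma * abs(i - j)) for j in range(n_bins)]
            for i in range(n_bins)]


if __name__ == "__main__":
    A = bin_similarity_matrix(3)
    base = [1.0, 0.0, 0.0]
    print(quadratic_form_distance(base, [0.0, 1.0, 0.0], A))  # small shift
    print(quadratic_form_distance(base, [0.0, 0.0, 1.0], A))  # large shift
```

With the identity matrix in place of A, the function reduces to the Euclidean distance, which treats every bin shift as equally severe. The cross-bin entries are what make the measure tolerant of small displacements.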
International Conference on Management of Data | 1998
Thomas Seidl; Hans-Peter Kriegel
For an increasing number of modern database applications, efficient support of similarity search becomes an important task. Along with the complexity of the objects such as images, molecules and mechanical parts, the complexity of the similarity models also keeps increasing. Whereas algorithms that are directly based on indexes work well for simple medium-dimensional similarity distance functions, they do not meet the efficiency requirements of complex high-dimensional and adaptable distance functions. The use of a multi-step query processing strategy is recommended in these cases, and our investigations substantiate that the number of candidates which are produced in the filter step and exactly evaluated in the refinement step is a fundamental efficiency parameter. After revealing the strong performance shortcomings of the state-of-the-art algorithm for k-nearest neighbor search [Korn et al. 1996], we present a novel multi-step algorithm which is guaranteed to produce the minimum number of candidates. Experimental evaluations demonstrate the significant performance gain over the previous solution, and we observed average improvement factors of up to 120 for the number of candidates and up to 48 for the total runtime.
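The multi-step strategy described in the abstract above can be sketched as follows. This is an illustrative sketch in the spirit of the approach, not the authors' implementation. It assumes a cheap filter distance that never exceeds the exact distance (the lower-bounding property), and it uses a full sort as a stand-in for the incremental ranking that an index would deliver. The names `multistep_knn`, `filter_dist`, and `exact_dist` are hypothetical.

```python
import heapq


def multistep_knn(query, objects, k, filter_dist, exact_dist):
    """Return (k nearest neighbours as (distance, index) pairs, refinements).

    Assumes filter_dist(q, o) <= exact_dist(q, o) for all objects o.
    """
    # Stand-in for an incremental ranking query on an index: visit objects
    # in ascending order of their filter distance.
    ranking = sorted((filter_dist(query, o), i) for i, o in enumerate(objects))

    best = []      # max-heap of the k best exact distances, stored negated
    refined = 0    # number of exact distance evaluations (candidates)
    for lower_bound, i in ranking:
        # Once the lower bound exceeds the current k-th exact distance,
        # no remaining object can enter the result.
        if len(best) == k and lower_bound > -best[0][0]:
            break
        d = exact_dist(query, objects[i])
        refined += 1
        if len(best) < k:
            heapq.heappush(best, (-d, i))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, i))
    return sorted((-neg, i) for neg, i in best), refined


if __name__ == "__main__":
    def euclidean(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    def first_coordinate(p, q):
        # Lower-bounds the Euclidean distance by ignoring all but one axis.
        return abs(p[0] - q[0])

    points = [(1, 0), (2, 0), (0.5, 5), (3, 3), (10, 0)]
    print(multistep_knn((0, 0), points, 2, first_coordinate, euclidean))
```

The stopping test is what keeps the candidate set small: refinement ends as soon as the next filter distance is larger than the k-th best exact distance found so far.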
International Conference on Data Engineering | 1998
Stefan Berchtold; Bernhard Ertl; Daniel A. Keim; Hans-Peter Kriegel; Thomas Seidl
Similarity search in multimedia databases requires efficient support of nearest neighbor search on a large set of high-dimensional points as a basic operation for query processing. As recent theoretical results show, state-of-the-art approaches to nearest neighbor search are not efficient in higher dimensions. In our new approach, we therefore precompute the result of any nearest neighbor search, which corresponds to a computation of the Voronoi cell of each data point. In a second step, we store the Voronoi cells in an index structure that is efficient for high-dimensional data spaces. As a result, nearest neighbor search corresponds to a simple point query on the index structure. Although our technique is based on a precomputation of the solution space, it is dynamic, i.e. it supports insertions of new data points. An extensive experimental evaluation of our technique demonstrates the high efficiency for uniformly distributed as well as real data. We obtained a significant reduction of the search time compared to nearest neighbor search in the X-tree (up to a factor of 4).
Conference on Image and Video Retrieval | 2010
Christian Beecks; Merih Seran Uysal; Thomas Seidl
The Signature Quadratic Form Distance is an adaptive similarity measure for flexible content-based feature representations of multimedia data. In this paper, we present a deep survey of the mathematical foundation of this similarity measure, which encompasses the classic Quadratic Form Distance defined only for the comparison between two feature histograms of the same length and structure. Moreover, we describe the benefits of the Signature Quadratic Form Distance and present an experimental evaluation on numerous real-world databases.
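The following minimal sketch shows one common formulation of the Signature Quadratic Form Distance for two feature signatures of possibly different lengths. Each signature is assumed to be a list of (centroid, weight) pairs. The weights of the second signature are negated and concatenated with those of the first, and the similarity matrix is filled by a similarity function over all centroid pairs. The Gaussian similarity function, the parameter `alpha`, and all names are assumptions made for illustration.

```python
import math


def gaussian_similarity(a, b, alpha=1.0):
    """Hypothetical similarity function: exp(-alpha * squared L2 distance)."""
    squared = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-alpha * squared)


def sqfd(sig_q, sig_o, similarity=gaussian_similarity):
    """Signature Quadratic Form Distance between two feature signatures.

    Each signature is a list of (centroid, weight) pairs; the two
    signatures may contain different numbers of centroids.
    """
    centroids = [c for c, _ in sig_q] + [c for c, _ in sig_o]
    # Concatenated weight vector (w_q | -w_o).
    weights = [w for _, w in sig_q] + [-w for _, w in sig_o]
    total = 0.0
    for ci, wi in zip(centroids, weights):
        for cj, wj in zip(centroids, weights):
            total += wi * wj * similarity(ci, cj)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(total, 0.0))


if __name__ == "__main__":
    sig_a = [((0.0, 0.0), 0.5), ((1.0, 1.0), 0.5)]
    sig_b = [((0.0, 0.0), 1.0)]
    print(sqfd(sig_a, sig_b))
```

Because the similarity matrix is built per pair of signatures, no shared bin layout is needed, in contrast to the classic Quadratic Form Distance on fixed-length histograms.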
Data and Knowledge Engineering | 2007
Mohammed Javeed Zaki; Markus Peters; Ira Assent; Thomas Seidl
We present a novel algorithm called Clicks that finds clusters in categorical datasets based on a search for k-partite maximal cliques. Unlike previous methods, Clicks mines subspace clusters. It uses a selective vertical method to guarantee complete search. Clicks outperforms previous approaches by over an order of magnitude and scales better than any of the existing methods for high-dimensional datasets. These results are demonstrated in a comprehensive performance study on real and synthetic datasets.
Extending Database Technology | 2004
Karin Kailing; Hans-Peter Kriegel; Stefan Schönauer; Thomas Seidl
Structured and semi-structured object representations are becoming increasingly important for modern database applications. Examples of such data are hierarchical structures including chemical compounds, XML data or image data. As a key feature, database systems have to support the search for similar objects where it is important to take into account both the structure and the content features of the objects. A successful approach is to use the edit distance for tree-structured data. As the computation of this measure is NP-complete, constrained edit distances have been successfully applied to trees. While yielding good results, they are still computationally complex and, therefore, of limited benefit for searching in large databases. In this paper, we propose a filter and refinement architecture to overcome this problem. We present a set of new filter methods for structural and for content-based information in tree-structured data as well as ways to flexibly combine different filter criteria. The efficiency of our methods, resulting from the good selectivity of the filters, is demonstrated in extensive experiments with real-world applications.
Extending Database Technology | 2008
Ira Assent; Ralph Krieger; Farzad Afschari; Thomas Seidl
Continuous growth in sensor data and other temporal data increases the importance of retrieval and similarity search in time series data. Efficient time series query processing is crucial for interactive applications. Existing multidimensional indexes like the R-tree provide efficient querying for only relatively few dimensions. Time series are typically long, which corresponds to extremely high-dimensional data in multidimensional indexes. Due to massive overlap of index descriptors, multidimensional indexes degenerate for high dimensions and access the entire data by random I/O. Consequently, the efficiency benefits of indexing are lost. In this paper, we propose the TS-tree (time series tree), an index structure for efficient time series retrieval and similarity search. Exploiting inherent properties of time series quantization and dimensionality reduction, the TS-tree indexes high-dimensional data in an overlap-free manner. During query processing, powerful pruning via quantized separator and metadata information greatly reduces the number of pages which have to be accessed, resulting in substantial speed-up. In thorough experiments on synthetic and real-world time series data, we demonstrate that our TS-tree outperforms existing approaches like the R*-tree or the quantized A-tree.
International Conference on Data Engineering | 2011
Emmanuel Müller; Matthias Schiffer; Thomas Seidl
Outlier mining is an important data analysis task to distinguish exceptional outliers from regular objects. For outlier mining in the full data space, there are well-established methods which are successful in measuring the degree of deviation for outlier ranking. However, in recent applications traditional outlier mining approaches miss outliers as they are hidden in subspace projections. In particular, outlier ranking approaches measuring deviation on all available attributes miss outliers deviating from their local neighborhood only in subsets of the attributes. In this work, we propose a novel outlier ranking based on the object's deviation in a statistically selected set of relevant subspace projections. This ensures that objects deviating in multiple relevant subspaces are found, while it excludes irrelevant projections showing no clear contrast between outliers and the residual objects. Thus, we tackle the general challenges of detecting outliers hidden in subspaces of the data. We provide a selection of subspaces with high contrast and propose a novel ranking based on an adaptive degree of deviation in arbitrary subspaces. In thorough experiments on real and synthetic data we show that our approach outperforms competing outlier ranking approaches by detecting outliers in arbitrary subspace projections.
International Conference on Multimedia and Expo | 2010
Christian Beecks; Merih Seran Uysal; Thomas Seidl
Determining similarities among data objects is a core task of content-based multimedia retrieval systems. Approximating data object contents via flexible feature representations, such as feature signatures, multimedia retrieval systems frequently determine similarities among data objects by applying distance functions. In this paper, we compare major state-of-the-art similarity measures applicable to flexible feature signatures with respect to their qualities of effectiveness and efficiency. Furthermore, we study the behavior of the similarity measures by discussing their properties. Our findings can be used in guiding the development of content-based retrieval applications for numerous domains.
International Conference on Data Engineering | 1994
Daniel A. Keim; Hans-Peter Kriegel; Thomas Seidl
Describes a query system that provides visual relevance feedback in querying large databases. The goal is to support the process of data mining by representing as many data items as possible on the display. By arranging and coloring the data items as pixels according to their relevance for the query, the user gets a visual impression of the resulting data set. Using an interactive query interface, the user may change the query dynamically and receives immediate feedback by the visual representation of the resulting data set. Furthermore, by using multiple windows for different parts of a complex query, the user gets visual feedback for each part of the query and, therefore, may more easily understand the overall result. The system allows one to represent the largest amount of data that can be visualized on current display technology, provides valuable feedback in querying the database, and allows the user to find results which would otherwise remain hidden in the database.