Publication


Featured research published by Karin Kailing.


international conference on management of data | 2004

Computing Clusters of Correlation Connected Objects

Christian Böhm; Karin Kailing; Peer Kröger; Arthur Zimek

The detection of correlations between different features in a set of feature vectors is a very important data mining task because correlation indicates a dependency between the features or some association of cause and effect between them. This association can be arbitrarily complex, i.e. one or more features might be dependent on a combination of several other features. Well-known methods like principal component analysis (PCA) can perfectly find correlations which are global, linear, not hidden in a set of noise vectors, and uniform, i.e. the same type of correlation is exhibited in all feature vectors. In many applications such as medical diagnosis, molecular biology, time sequences, or electronic commerce, however, correlations are not global, since the dependency between features can differ between subgroups of the set. In this paper, we propose a method called 4C (Computing Correlation Connected Clusters) to identify local subgroups of the data objects sharing a uniform but arbitrarily complex correlation. Our algorithm is based on a combination of PCA and density-based clustering (DBSCAN). Our method has a deterministic result and is robust against noise. A broad comparative evaluation demonstrates the superior performance of 4C over competing methods such as DBSCAN, CLIQUE and ORCLUS.
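
The PCA half of 4C can be illustrated in isolation. The sketch below, a minimal one under stated assumptions, runs PCA on a given neighbourhood of points and counts how many principal axes carry a variance above a threshold (called delta here); this count is one way to express the "correlation dimension" of a local subgroup. The function names, the pure-Python Jacobi eigenvalue solver and the exact thresholding rule are my own choices, not taken from the paper, and the DBSCAN-style cluster expansion and locally adapted distance of the full method are not shown.

```python
import math

def covariance(points):
    """Covariance matrix (population form) of a list of d-dimensional points."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    return [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / n
             for j in range(d)] for i in range(d)]

def jacobi_eigenvalues(m, tol=1e-12, max_sweeps=50):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in m]
    d = len(a)
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(d) for j in range(d) if i != j) < tol:
            break
        for p in range(d):
            for q in range(p + 1, d):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(d):  # A <- A J (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(d):  # A <- J^T A (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(d)), reverse=True)

def correlation_dimension(neighbourhood, delta):
    """Number of principal axes whose variance (eigenvalue) exceeds delta."""
    return sum(1 for e in jacobi_eigenvalues(covariance(neighbourhood)) if e > delta)

line = [(t, 2 * t, 0) for t in range(5)]                  # points on a 1-D line in 3-D
plane = [(t, s, t + s) for t in range(3) for s in range(3)]  # points on a 2-D plane in 3-D
print(correlation_dimension(line, 0.1))   # 1
print(correlation_dimension(plane, 0.1))  # 2
```

Points on a line yield one strong axis and points on a plane yield two, which is the kind of local, possibly non-axis-parallel correlation that a global PCA over all objects would blur together.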


extending database technology | 2004

Efficient Similarity Search for Hierarchical Data in Large Databases

Karin Kailing; Hans-Peter Kriegel; Stefan Schönauer; Thomas Seidl

Structured and semi-structured object representations are becoming increasingly important for modern database applications. Examples of such data are hierarchical structures including chemical compounds, XML data or image data. As a key feature, database systems have to support the search for similar objects, where it is important to take into account both the structure and the content features of the objects. A successful approach is to use the edit distance for tree-structured data. As the computation of this measure is NP-complete, constrained edit distances have been successfully applied to trees. While yielding good results, they are still computationally complex and, therefore, of limited benefit for searching in large databases. In this paper, we propose a filter and refinement architecture to overcome this problem. We present a set of new filter methods for structural and for content-based information in tree-structured data as well as ways to flexibly combine different filter criteria. The efficiency of our methods, resulting from the good selectivity of the filters, is demonstrated in extensive experiments with real-world applications.
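
A filter and refinement architecture only needs two ingredients: a cheap filter that never overestimates the true distance, and an exact distance computed only for candidates that survive the filter. The sketch below assumes unit-cost ordered tree edit distance as the exact measure (not the constrained edit distance mentioned above) and combines two simple structural and content filters of my own choosing: the difference in node count and half the L1 distance between label histograms. Both are valid lower bounds because every insert or delete changes the size by one and every edit changes the label histogram by at most 2. All function names are hypothetical.

```python
import math
from collections import Counter
from functools import lru_cache

def node(label, *children):
    """Ordered labelled tree as a hashable (label, children) tuple."""
    return (label, tuple(children))

def size(tree):
    return 1 + sum(size(c) for c in tree[1])

def labels(tree):
    """Histogram of node labels in the tree."""
    return Counter([tree[0]]) + sum((labels(c) for c in tree[1]), Counter())

@lru_cache(maxsize=None)
def forest_dist(f1, f2):
    """Unit-cost edit distance between two ordered forests (tuples of trees)."""
    if not f1:
        return sum(size(t) for t in f2)
    if not f2:
        return sum(size(t) for t in f1)
    (l1, c1), (l2, c2) = f1[-1], f2[-1]
    return min(
        forest_dist(f1[:-1] + c1, f2) + 1,  # delete rightmost root of f1
        forest_dist(f1, f2[:-1] + c2) + 1,  # insert rightmost root of f2
        forest_dist(c1, c2) + forest_dist(f1[:-1], f2[:-1]) + (l1 != l2),  # map roots
    )

def tree_edit_distance(t1, t2):
    return forest_dist((t1,), (t2,))

def lower_bound(t1, t2):
    """Combined filter: maximum of a structural and a content-based lower bound."""
    h1, h2 = labels(t1), labels(t2)
    hist_l1 = sum(abs(h1[k] - h2[k]) for k in set(h1) | set(h2))
    return max(abs(size(t1) - size(t2)), math.ceil(hist_l1 / 2))

def range_query(query, database, eps):
    """Filter step prunes; refinement computes the exact distance for survivors only."""
    hits, refinements = [], 0
    for i, tree in enumerate(database):
        if lower_bound(query, tree) > eps:
            continue
        refinements += 1
        if tree_edit_distance(query, tree) <= eps:
            hits.append(i)
    return hits, refinements

q = node("a", node("b"), node("c"))
db = [node("a", node("b"), node("c")),
      node("a", node("b"), node("d")),
      node("x", node("y"), node("z"), node("w"), node("v"))]
print(range_query(q, db, 1))  # ([0, 1], 2): the third tree is pruned by the filter
```

Because the filter is a true lower bound, pruning never loses a correct answer; the gain comes entirely from how many expensive refinements the filter avoids.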


pacific-asia conference on knowledge discovery and data mining | 2004

Clustering Multi-represented Objects with Noise

Karin Kailing; Hans-Peter Kriegel; Alexey Pryakhin; Matthias Schubert

Traditional clustering algorithms are based on one representation space, usually a vector space. However, in a variety of modern applications, multiple representations exist for each object. Molecules, for example, are characterized by an amino acid sequence, a secondary structure and a 3D representation. In this paper, we present an efficient density-based approach to cluster such multi-represented data, taking all available representations into account. We propose two different techniques to combine the information of all available representations, depending on the application. The evaluation shows that our approach is superior to existing techniques.
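
One plausible reading of "two different techniques to combine the information", assumed here rather than taken from the paper, is to compute an epsilon-neighbourhood separately in each representation and then combine them either by union (an object counts as a neighbour if it is close in any representation) or by intersection (close in all representations) before a DBSCAN-style core-object test. The sketch uses Euclidean distance in every representation and per-representation radii; the names `multi_rep_dbscan`, `mode`, `eps` and `min_pts` are hypothetical.

```python
import math

def neighbourhood(objects, i, rep, eps):
    """Indices of objects within eps[rep] of object i in representation `rep` (Euclidean)."""
    centre = objects[i][rep]
    return {j for j, o in enumerate(objects) if math.dist(o[rep], centre) <= eps[rep]}

def combined_neighbourhood(objects, i, eps, mode):
    """Union or intersection of the per-representation neighbourhoods of object i."""
    sets = [neighbourhood(objects, i, r, eps) for r in range(len(eps))]
    return set.union(*sets) if mode == "union" else set.intersection(*sets)

def multi_rep_dbscan(objects, eps, min_pts, mode="union"):
    """DBSCAN-style expansion over combined neighbourhoods; label -1 marks noise."""
    labels = [None] * len(objects)
    cluster_id = 0
    for i in range(len(objects)):
        if labels[i] is not None:
            continue
        seeds = combined_neighbourhood(objects, i, eps, mode)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later be claimed as a border object
            continue
        labels[i] = cluster_id
        queue = list(seeds - {i})
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # former noise becomes a border object
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            neighbours = combined_neighbourhood(objects, j, eps, mode)
            if len(neighbours) >= min_pts:  # j is a core object: keep expanding
                queue.extend(neighbours)
        cluster_id += 1
    return labels

objects = [
    ((0.0, 0.0), (0.0,)), ((0.1, 0.0), (0.1,)), ((0.0, 0.1), (0.2,)),
    ((5.0, 5.0), (9.0,)), ((5.1, 5.0), (9.1,)), ((5.0, 5.1), (9.2,)),
    ((20.0, 20.0), (50.0,)),
    ((0.05, 0.05), (30.0,)),  # close to the first group only in representation 0
]
print(multi_rep_dbscan(objects, (0.5, 0.5), 3, "union"))         # [0, 0, 0, 1, 1, 1, -1, 0]
print(multi_rep_dbscan(objects, (0.5, 0.5), 3, "intersection"))  # [0, 0, 0, 1, 1, 1, -1, -1]
```

The last object shows the trade-off: the union rule lets evidence from a single representation pull it into a cluster, while the intersection rule demands agreement across representations and leaves it as noise.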


european conference on principles of data mining and knowledge discovery | 2003

Ranking Interesting Subspaces for Clustering High Dimensional Data

Karin Kailing; Hans-Peter Kriegel; Peer Kröger; Stefanie Wanka

Application domains such as the life sciences (e.g. molecular biology) produce a tremendous amount of data which can no longer be managed without the help of efficient and effective data mining methods. One of the primary data mining tasks is clustering. However, traditional clustering algorithms often fail to detect meaningful clusters because of the high-dimensional, inherently sparse feature space of most real-world data sets. Nevertheless, the data sets often contain clusters hidden in various subspaces of the original feature space. We present a pre-processing step for traditional clustering algorithms which detects all interesting subspaces of high-dimensional data containing clusters. For this purpose, we define a quality criterion for the interestingness of a subspace and propose an efficient algorithm called RIS (Ranking Interesting Subspaces) to examine all such subspaces. A broad evaluation based on synthetic and real-world data sets empirically shows that RIS is suitable for finding all relevant subspaces in large, high-dimensional, sparse data and for ranking them accordingly.
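
Examining "all such subspaces" is feasible because of a monotonicity property that follows directly from the distance definition: projecting onto fewer attributes can only shrink distances, so an object with at least `min_pts` neighbours within `eps` in a subspace is also such a core object in every lower-dimensional projection. The sketch below uses this for an Apriori-style bottom-up search that keeps only subspaces containing core objects. It is a simplified illustration under that assumption: it reports raw core-object counts and does not reproduce the paper's quality criterion or its ranking, and all names are hypothetical.

```python
import math
from itertools import combinations

def projected_dist(a, b, dims):
    """Euclidean distance restricted to the attributes in `dims`."""
    return math.sqrt(sum((a[d] - b[d]) ** 2 for d in dims))

def core_count(data, dims, eps, min_pts):
    """Objects with at least min_pts objects (itself included) within eps in subspace `dims`."""
    return sum(
        1 for p in data
        if sum(1 for q in data if projected_dist(p, q, dims) <= eps) >= min_pts
    )

def subspaces_with_core_objects(data, eps, min_pts):
    """Bottom-up, Apriori-style search; returns {subspace: number of core objects}."""
    found = {}
    level = [(d,) for d in range(len(data[0]))]
    while level:
        survivors = []
        for dims in level:
            count = core_count(data, dims, eps, min_pts)
            if count > 0:
                found[dims] = count
                survivors.append(dims)
        kept = set(survivors)
        next_level = []
        # Join two k-dim subspaces sharing their first k-1 attributes; keep the
        # candidate only if every k-dim projection of it still has core objects.
        for a, b in combinations(sorted(survivors), 2):
            if a[:-1] == b[:-1]:
                cand = a + (b[-1],)
                if all(sub in kept for sub in combinations(cand, len(cand) - 1)):
                    next_level.append(cand)
        level = next_level
    return found

data = [(0, 0, 0), (0.1, 0.1, 10), (0.2, 0, 20), (0, 0.2, 30), (5, 9, 40), (9, 5, 50)]
print(subspaces_with_core_objects(data, eps=0.5, min_pts=3))
# {(0,): 4, (1,): 4, (0, 1): 4}: attribute 2 is spread out and never becomes relevant
```

Raw counts can only decrease as dimensionality grows, which is why a real ranking needs a criterion that compensates for the subspace dimensionality before comparing subspaces.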


international conference on knowledge-based and intelligent information and engineering systems | 2004

Content-Based Image Retrieval Using Multiple Representations

Karin Kailing; Hans-Peter Kriegel; Stefan Schönauer

Many different approaches to content-based image retrieval have been proposed in the literature. Successful approaches consider not only simple features like color, but also take the structural relationships between objects into account. In this paper, we describe two models for image representation which integrate structural features and content features in a tree or a graph structure. The effectiveness of these two approaches is evaluated on real-world data, using clustering as a means of evaluation. Furthermore, we show that combining the two models can further enhance retrieval accuracy.


Knowledge and Information Systems | 2006

Extending metric index structures for efficient range query processing

Karin Kailing; Hans-Peter Kriegel; Martin Pfeifle; Stefan Schönauer

Databases are becoming increasingly important for storing complex objects from scientific, engineering, or multimedia applications. Examples of such data are chemical compounds, CAD drawings, or XML data. The efficient search for similar objects in such databases is a key feature. However, a general problem of many similarity measures for complex objects is their computational complexity, which makes them unusable for large databases. In this paper, we combine and extend two techniques, metric index structures and multi-step query processing, to improve the performance of range query processing. The efficiency of our methods is demonstrated in extensive experiments on real-world data including graphs, trees, and vector sets.
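
The combination of a metric index with multi-step query processing can be sketched as a cascade of increasingly expensive tests. In the sketch below, a flat pivot table (a stand-in I chose for a tree-structured metric index) first prunes objects using the triangle inequality, |d(q,p) - d(o,p)| <= d(q,o) for every pivot p; a cheap lower-bounding filter distance then prunes further; only the survivors are refined with the exact distance. Levenshtein distance on strings serves as a hypothetical stand-in for an expensive similarity measure on complex objects, and `PivotTable` and `length_lower_bound` are names of my own.

```python
def levenshtein(a, b):
    """Unit-cost edit distance between strings (a metric), computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def length_lower_bound(a, b):
    """Cheap filter distance: edit distance is never smaller than the length difference."""
    return abs(len(a) - len(b))

class PivotTable:
    """Precomputes distances from every object to a few pivot objects."""

    def __init__(self, objects, pivots, dist):
        self.objects, self.pivots, self.dist = objects, pivots, dist
        self.table = [[dist(o, p) for p in pivots] for o in objects]

    def range_query(self, q, eps, filter_dist):
        q_to_pivots = [self.dist(q, p) for p in self.pivots]
        hits, exact_calls = [], 0
        for obj, row in zip(self.objects, self.table):
            # Step 1: metric pruning via the triangle inequality over all pivots.
            if max(abs(qp - op) for qp, op in zip(q_to_pivots, row)) > eps:
                continue
            # Step 2: cheap lower-bounding filter distance.
            if filter_dist(q, obj) > eps:
                continue
            # Step 3: refinement with the exact (expensive) distance.
            exact_calls += 1
            if self.dist(q, obj) <= eps:
                hits.append(obj)
        return hits, exact_calls

words = ["kitten", "sitting", "mitten", "bitten", "kitchen", "dog", "doge"]
index = PivotTable(words, ["kitten", "dog"], levenshtein)
hits, calls = index.range_query("fitten", 1, length_lower_bound)
print(hits)  # ['kitten', 'mitten', 'bitten']
```

Both pruning steps are lower bounds of the exact distance, so the result equals a full scan; `exact_calls` counts only the refinement step, not the pivot distances computed for the query.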


international conference on data engineering | 2004

Efficient similarity search in large databases of tree structured objects

Karin Kailing; Hans-Peter Kriegel; Stefan Schönauer; Thomas Seidl

We implemented our new approach for efficient similarity search in large databases of tree structures. Our experiments show that filtering significantly accelerates the complex task of similarity search for tree-structured objects. Moreover, they show that no single feature of a tree is sufficient for effective filtering, but only the combination of structural and content-based filters yields good results.


siam international conference on data mining | 2004

Density-Connected Subspace Clustering for High-Dimensional Data

Karin Kailing; Hans-Peter Kriegel; Peer Kröger


Archive | 2004

Efficient Indexing of Complex Objects for Density-based Clustering

Karin Kailing; Hans-Peter Kriegel; Martin Pfeifle


Datenbank-Spektrum | 2004

Ever Larger and More Complex Data Sets: Challenges for Clustering Algorithms (German original: Immer größere und komplexere Datenmengen: Herausforderungen für Clustering-Algorithmen)

Christian Böhm; Karin Kailing; Peer Kröger; Hans-Peter Kriegel
