Publications


Featured research published by Klemens Böhm.


advances in databases and information systems | 2002

OLAP Query Evaluation in a Database Cluster: A Performance Study on Intra-Query Parallelism

Fuat Akal; Klemens Böhm; Hans-Jörg Schek

While cluster computing is well established, it is not clear how to coordinate clusters consisting of many database components in order to process high workloads. In this paper, we focus on Online Analytical Processing (OLAP) queries, i.e., relatively complex queries whose evaluation tends to be time-consuming, and we report on some observations and preliminary results of our PowerDB project in this context. We investigate how many cluster nodes should be used to evaluate an OLAP query in parallel. Moreover, we provide a classification of OLAP queries, which is used to decide whether and how a query should be parallelized. We run extensive experiments to evaluate these query classes in quantitative terms. Our results are an important step towards a two-phase query optimizer. In the first phase, the coordination infrastructure decomposes a query into subqueries and ships them to appropriate cluster nodes. In the second phase, each cluster node optimizes and evaluates its subquery locally.
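
A rough illustration of the two-phase idea follows; the query classes, the modulo-based decomposition, and all names are hypothetical stand-ins, not the PowerDB interfaces:

    def degree_of_parallelism(query_class, cluster_size):
        """Pick how many nodes should evaluate a query, based on its class."""
        if query_class == "short":          # cheap query: parallel overhead would dominate
            return 1
        if query_class == "partitionable":  # scan-heavy query: use the whole cluster
            return cluster_size
        return max(1, cluster_size // 2)    # mixed case: a compromise

    def decompose(sql, key, n_parts):
        """Phase one: the coordinator splits the query into partitioned subqueries."""
        return [f"{sql} AND {key} % {n_parts} = {i}" for i in range(n_parts)]

    # Phase two happens on the cluster nodes: each one optimizes and evaluates its
    # subquery locally; the coordinator only merges the partial aggregates.
    subqueries = decompose("SELECT SUM(amount) FROM sales WHERE year = 1999",
                           key="customer_id",
                           n_parts=degree_of_parallelism("partitionable", 8))
    print(subqueries[0])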


international conference on data engineering | 2001

Cache-aware query routing in a cluster of databases

Uwe Röhm; Klemens Böhm; Hans-Jörg Schek

We investigate query routing techniques in a cluster of databases for a query-dominant environment. The objective is to decrease query response time. Each component of the cluster runs an off-the-shelf DBMS and holds a copy of the whole database. The cluster has a coordinator that routes each query to an appropriate component. Considering queries of realistic complexity, e.g., TPC-R, this article addresses the following questions: Can routing benefit from caching effects due to previous queries? Since our components are black-boxes, how can we approximate their cache content? How to route a query, given such cache approximations? To answer these questions, we have developed a cache-aware query router that is based on signature approximations of queries. We report on experimental evaluations with the TPC-R benchmark using our PowerDB database cluster prototype. Our main result is that our approach of cache approximation routing is better than state-of-the-art strategies by a factor of two with regard to mean response time.
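
The routing idea can be sketched as follows; the set-based signatures and the router class are simplified illustrations, not the actual signature scheme of the paper:

    def signature(tables):
        """Approximate a query by the set of tables (or attributes) it touches."""
        return frozenset(tables)

    class CacheAwareRouter:
        """Route each query to the component whose approximated cache overlaps it most."""
        def __init__(self, n_components, history=20):
            self.history = history
            self.recent = [[] for _ in range(n_components)]   # recent signatures per component

        def route(self, tables):
            sig = signature(tables)
            best = max(range(len(self.recent)),
                       key=lambda i: sum(len(sig & s) for s in self.recent[i]))
            self.recent[best] = (self.recent[best] + [sig])[-self.history:]
            return best

    router = CacheAwareRouter(n_components=4)
    print(router.route({"orders", "lineitem"}))       # e.g. a TPC-R style join query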


Proceedings IEEE International Forum on Research and Technology Advances in Digital Libraries (ADL '98) | 1998

Query optimization for structured documents based on knowledge on the document type definition

Klemens Böhm; Karl Aberer; M.T. Ozsu; K. Gayer

Declarative access mechanisms for structured document collections and for semi-structured data are becoming increasingly important. Using a rule-based approach for query optimization and applying it to such queries, we deploy knowledge on the Document Type Definition (DTD) to formulate transformation rules for query-algebra terms. Specifically, we look at rules that serve navigation along paths by cutting off these paths or by replacing them with access operations to indices, i.e., materialized views on paths. We show for both cases that we correctly apply and completely exploit knowledge on the DTD, and we briefly discuss performance results.
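
One rule of the kind described might look like this in a sketch; the DTD summary, the path index, and the path encoding are made up for illustration:

    # The DTD summary records which parents an element may have; the path index set
    # lists materialized views on paths. Both are simplified illustrations.
    dtd_parents = {"section": {"chapter"}, "chapter": {"book"}}
    path_indexes = {("book", "chapter", "section")}

    def rewrite(path):
        """Cut a//b down to a/b where the DTD allows only that nesting, then try an index."""
        steps = []
        for elem, axis in path:                       # path = [(element, "child" | "desc"), ...]
            if axis == "desc" and steps and dtd_parents.get(elem) == {steps[-1][0]}:
                axis = "child"                        # the descendant navigation can be cut off
            steps.append((elem, axis))
        names = tuple(e for e, _ in steps)
        if names in path_indexes:
            return ("index-scan", names)              # replace the path by an index access
        return ("navigate", steps)

    print(rewrite([("book", "child"), ("chapter", "desc"), ("section", "desc")]))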


international conference on data engineering | 2001

High-level parallelisation in a database cluster: a feasibility study using document services

Torsten Grabs; Klemens Böhm; Hans-Jörg Schek

Our concern is the design of a scalable infrastructure for complex application services. We want to find out if a cluster of commodity database systems is well-suited as such an infrastructure. To this end, we have carried out a feasibility study based on document services, e.g. document insertion and retrieval. We decompose a service request into short parallel database transactions. Our system, implemented as an extension of a transaction processing monitor, routes the short transactions to the appropriate database systems in the cluster. Routing depends on the data distribution that we have chosen. To avoid bottlenecks, we distribute document functionality, such as term extraction, over the cluster. Extensive experiments show the following. (1) A relatively small number of components - for example eight components - already suffices to cope with high workloads of more than 100 concurrently active clients. (2) Speedup and throughput increase linearly for insertion operations when increasing the cluster size. These observations also hold when bundling service invocations into transactions at the semantic layer. A specialized coordinator component then implements semantic serializability and atomicity. Our experiments show that such a coordinator has minimal impact on CPU resource consumption and on response times.
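
A minimal sketch of the decomposition, assuming a hypothetical term-based data distribution; unlike the system described above, term extraction happens at the coordinator here for brevity:

    import re, zlib
    from concurrent.futures import ThreadPoolExecutor

    N_COMPONENTS = 8

    def component_for(term):
        """Data distribution: each term's posting list lives on exactly one component."""
        return zlib.crc32(term.encode()) % N_COMPONENTS

    def insert_document(doc_id, text, execute_on):
        """Decompose one insertion request into short per-component transactions."""
        terms = set(re.findall(r"\w+", text.lower()))   # term extraction (done centrally here)
        per_node = {}
        for t in terms:
            per_node.setdefault(component_for(t), set()).add(t)
        with ThreadPoolExecutor() as pool:              # the short transactions run in parallel
            for node, node_terms in per_node.items():
                pool.submit(execute_on, node, (doc_id, sorted(node_terms)))

    insert_document(42, "high level parallelisation in a database cluster",
                    execute_on=lambda node, tx: print(f"component {node}: {tx}"))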


Proceedings IEEE Advances in Digital Libraries 2000 | 2000

On extending the XML engine with query-processing capabilities

Klemens Böhm

We study how to efficiently evaluate queries over XML documents whose representation is according to the XML specification, i.e., XML files. The software architecture is as follows: the XML engine (i.e., XML parser) makes the structure of the documents explicit. The query processor operates directly on the output of the XML engine. We see two basic alternatives of how such a query processor operates: event-based and tree-based. In the first case, the query processor immediately checks for each event, e.g., begin of an element, if it contributes to a query result or if it invalidates current partial results. In the second case, the query processor generates an explicit transient representation of the document structure and evaluates the query set-at-a-time. This work evaluates these approaches and some optimizations in quantitative terms. Our main results are as follows. The event-based evaluation scheme is approximately 10% faster, even with all the optimizations from this article. The overhead of the query processors is small, compared to the running times of the XML engine. Finally, exploiting DTD information in this particular context does not lead to a better performance.
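
The event-based alternative can be sketched with Python's standard SAX parser; the fixed path query and the handler are illustrative, not the engine evaluated in the article:

    import xml.sax

    class PathMatcher(xml.sax.ContentHandler):
        """Collect the text of elements reached via the path /articles/article/title."""
        def __init__(self, path):
            super().__init__()
            self.path, self.stack, self.hits, self.buf = path, [], [], None

        def startElement(self, name, attrs):
            self.stack.append(name)                  # event: begin of an element
            if self.stack == self.path:
                self.buf = []                        # this element contributes to the result

        def characters(self, content):
            if self.buf is not None:
                self.buf.append(content)

        def endElement(self, name):
            if self.stack == self.path:
                self.hits.append("".join(self.buf))
                self.buf = None
            self.stack.pop()

    handler = PathMatcher(["articles", "article", "title"])
    xml.sax.parseString(b"<articles><article><title>XML query processing</title>"
                        b"</article></articles>", handler)
    print(handler.hits)                              # ['XML query processing']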


international conference on data engineering | 1999

Working together in Harmony-an implementation of the CORBA object query service and its evaluation

Uwe Röhm; Klemens Böhm

The CORBA standard, together with its service specifications, has gained considerable attention in recent years. The CORBA Object Query Service allows for declarative access to heterogeneous storage systems. We have come up with an implementation of this service called Harmony. The objective of the article is to provide a detailed description and quantitative assessment of Harmony. Its main technical characteristics are data-flow evaluation, bulk transfer and intra-query parallelism. To carry out the evaluation, we have classified data exchange between components of applications in several dimensions: one is to distinguish between point-, context- and bulk data access. We have compared Harmony with: (1) data access through application-specific CORBA objects, and (2) conventional client/server communication, i.e., Embedded SQL. Our results show that Harmony performs much better than Alternative 1 for bulk data access. Besides that, due to the features mentioned above, Harmony performs approximately as well as conventional client/server communication mechanisms.
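
The distinction between point and bulk data access can be made concrete with a toy stub (hypothetical classes, not the CORBA Object Query Service interfaces): fetching objects one by one costs one round trip each, whereas bulk transfer ships a whole batch per call.

    class RemoteCollection:
        """Counts round trips for two access styles against the same result set."""
        def __init__(self, rows):
            self.rows, self.round_trips = rows, 0

        def get(self, i):                     # point access: one call per object
            self.round_trips += 1
            return self.rows[i]

        def fetch_batch(self, start, size):   # bulk transfer: one call per batch
            self.round_trips += 1
            return self.rows[start:start + size]

    remote = RemoteCollection(list(range(1000)))
    _ = [remote.get(i) for i in range(1000)]                 # 1000 round trips
    point_trips, remote.round_trips = remote.round_trips, 0
    _ = [x for s in range(0, 1000, 100) for x in remote.fetch_batch(s, 100)]
    print(point_trips, remote.round_trips)                   # 1000 vs. 10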


Multimedia Tools and Applications | 1999

Building a Hybrid Database Application for Structured Documents

Klemens Böhm; Karl Aberer; Wolfgang Klas

In this article, we propose a database-internal representation for SGML-/HyTime-documents based on object-oriented database technology with the following features: documents of arbitrary type can be administered. The semantics of architectural forms is reflected by means of methods that are part of the database schema and by the database-internal representation of HyTime-specific characteristics. The framework includes mechanisms to ensure conformance of documents to the HyTime standard. Measures for improved performance of HyTime operations are also described. The database-internal representation of documents is a hybrid between a completely structured and a flat representation. Namely, the structured representation is better to support the HyTime semantics, and modifications of document components. On the other hand, most operations are faster for the flat representation, as will be shown.
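
A hybrid of the kind described can be sketched as follows, with structure kept for the coarse levels and flat text below a cut-off; this is a simplification for illustration, not the database schema of the article:

    from dataclasses import dataclass, field

    @dataclass
    class Element:
        tag: str
        children: list = field(default_factory=list)   # structured part: supports fine-grained updates
        flat_text: str = ""                            # flat part: a whole fragment read in one access

    # Structure is kept for the coarse levels (article, section); the content of a
    # section is stored flat, so retrieving a complete section is a single read.
    doc = Element("article", children=[
        Element("section", flat_text="<p>HyTime links and scheduling ...</p>"),
        Element("section", flat_text="<p>Architectural forms ...</p>"),
    ])
    print(len(doc.children), doc.children[0].flat_text)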


State-of-the-Art in Content-Based Image and Video Retrieval [Dagstuhl Seminar, 5-10 December 1999] | 2001

Parallel NN-search for large multimedia repositories

Roger Weber; Klemens Böhm; Hans-Jörg Schek

Nearest-neighbor search (NN-search) plays a key role for content-based retrieval over multimedia objects. However, performance of existing NN-search techniques is not satisfactory with large collections and with high-dimensional representations of the objects. To obtain response times that are interactive, our approach uses a linear algorithm, parallelizes it and works with approximations of the vectors. In more detail, we parallelize NN-search based on the VA-File in a Network of Workstations (NOW). This approach reduces search time to a reasonable level for relatively large collections. The best speedup we have observed is by almost 30 for a NOW with only three components with 900 MB of feature data. But this requires a number of design decisions, in particular when taking notions such as load dynamicity and heterogeneity of components into account. Our first contribution is to systematically describe and evaluate the various design alternatives, e.g., data placement or decomposing queries into subqueries. As another contribution, we predict the speedup and response times for a given setup.
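
The parallelization can be sketched, much simplified, as a partitioned scan with a merge step; the VA-File's quantized approximations and its filtering step are omitted, and all names are made up:

    import heapq, random
    from concurrent.futures import ThreadPoolExecutor

    def local_knn(partition, query, k):
        """Linear scan of one node's share of the feature data."""
        dist = lambda v: sum((a - b) ** 2 for a, b in zip(v, query))
        return heapq.nsmallest(k, ((dist(v), vid) for vid, v in partition))

    def parallel_knn(partitions, query, k):
        with ThreadPoolExecutor() as pool:             # one scan per node of the NOW
            candidates = pool.map(lambda p: local_knn(p, query, k), partitions)
        return heapq.nsmallest(k, (c for part in candidates for c in part))   # merge step

    vectors = [(i, [random.random() for _ in range(16)]) for i in range(3000)]
    partitions = [vectors[i::3] for i in range(3)]     # data placement over three components
    print(parallel_knn(partitions, query=[0.5] * 16, k=5))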


very large data bases | 2002

FAS: a freshness-sensitive coordination middleware for a cluster of OLAP components

Uwe Röhm; Klemens Böhm; Hans-Jörg Schek; Heiko Schuldt



very large data bases | 2001

Fast Evaluation Techniques for Complex Similarity Queries

Klemens Böhm; Michael Mlivoncic; Hans-Jörg Schek; Roger Weber

Collaboration


Dive into Klemens Böhm's collaborations.

Top Co-Authors

Karl Aberer

École Polytechnique Fédérale de Lausanne


Michael Mlivoncic

École Polytechnique Fédérale de Lausanne
