Karsten Schmidt
Kaiserslautern University of Technology
Publications
Featured research published by Karsten Schmidt.
International Conference on Data Engineering | 2007
M. Lühring; Kai-Uwe Sattler; Karsten Schmidt; Eike Schallehn
In recent years, the support for index tuning as part of physical database design has gained focus in research and product development, resulting in index and design advisors. Nevertheless, these tools provide a one-off solution for a continuous task and are not deeply integrated with the DBMS functionality: they only apply the query optimizer for index recommendation and profit estimation, and they decouple the decision about and execution of index configuration changes from the core system functionality. In this paper we propose an approach that continuously collects statistics for recommended indexes and, based on these statistics, repetitively solves the Index Selection Problem (ISP). A key novelty is the on-the-fly index generation during query processing, implemented by the new query plan operators IndexBuildScan and SwitchPlan. Finally, we present the implementation and evaluation of the introduced concepts as part of the PostgreSQL system.
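To illustrate the idea behind on-the-fly index generation, here is a minimal sketch of a scan operator that builds a candidate index as a side effect of answering a query. Only the name IndexBuildScan comes from the abstract; the class layout and data structures below are hypothetical simplifications, since the real operators live inside PostgreSQL's executor.

```python
# Minimal sketch of on-the-fly index generation during a scan.
# The name IndexBuildScan is from the abstract; everything else is a
# hypothetical simplification of an executor-level plan operator.

class IndexBuildScan:
    """Full-table scan that populates a candidate index as a side effect."""

    def __init__(self, table, key_column):
        self.table = table            # iterable of row dicts
        self.key_column = key_column
        self.index = {}               # key -> list of row ids

    def __iter__(self):
        for row_id, row in enumerate(self.table):
            # Build the index entry while the tuple flows through the plan.
            self.index.setdefault(row[self.key_column], []).append(row_id)
            yield row                 # pass the tuple on, unchanged


# Usage: the scan answers the current query and leaves a ready index behind.
table = [{"id": 1, "city": "KL"}, {"id": 2, "city": "B"}, {"id": 3, "city": "KL"}]
scan = IndexBuildScan(table, "city")
result = [row for row in scan if row["city"] == "KL"]
print(result)        # the query result
print(scan.index)    # the index built as a by-product: {'KL': [0, 2], 'B': [1]}
```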
International Database Engineering and Applications Symposium | 2007
Theo Härder; Christian Mathis; Karsten Schmidt
Because XML documents tend to be very large, are accessed by declarative and navigational languages, and are often processed in a collaborative way using read/write transactions, their fine-grained storage and management in XML DBMSs is a must, which, in turn, makes a flexible and space-economic tree representation mandatory. In this paper, we explore a variety of options to natively store, encode, and compress XML documents, thereby preserving the full DBMS processing flexibility on the documents required by the various language models and usage characteristics. Important issues of our empirical study are related to node labeling, document container layout, indexing, as well as structure and content compression. Encoding and compression of XML documents with their complete structure leads to a space consumption of ~40% to ~60% compared to their plain representation, whereas structure virtualization (elementless storage) saves, on average, more than 10% in addition.
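A central ingredient of such structure-aware storage is the path synopsis: the small set of distinct path classes in a document, as opposed to its many path instances. The following is a minimal sketch of extracting a path synopsis using only the Python standard library; the actual storage layout studied in the paper is far more involved.

```python
# A minimal sketch of building a path synopsis: the set of distinct
# root-to-node path classes of an XML document. Element names matter,
# not individual node instances, which is why the synopsis stays tiny
# even for large documents.

import xml.etree.ElementTree as ET

def path_synopsis(xml_text):
    root = ET.fromstring(xml_text)
    classes = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        classes.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return sorted(classes)

doc = "<lib><book><title>A</title></book><book><title>B</title></book></lib>"
print(path_synopsis(doc))
# ['/lib', '/lib/book', '/lib/book/title'] -- two book instances, one path class
```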
Computer Science - Research and Development | 2009
Christian Mathis; Theo Härder; Karsten Schmidt
XML documents contain substantial redundancy in their structure part, because each path from the root node to a leaf node is explicitly represented, and typically large sets of such path instances belong to one path class, i.e., the nodes of the path instances are labeled by the same sequence of element (or attribute) names. To save storage space and I/O cost, we want to get rid of this structural redundancy to the extent possible. While all known methods for the physical representation (storage) of XML documents proceed from the root via the element/attribute hierarchy (internal nodes) down to the leaves (values), we follow an upside-down approach which explicitly stores the values and reconstructs the internal nodes only when needed. The cornerstones for such a solution are suitable node labels and a path synopsis which efficiently represents all path classes of an XML document. As a solution, we propose a compact internal storage format for native XML database systems where the inner structure of the stored documents is virtualized. Because this elementless storage format provides an efficient reconstruction of a document using its path synopsis, all processing properties are preserved and the semantics of navigational and declarative operations of XML languages remains unchanged. Adjusted indexes support the full spectrum of so-called content-and-structure single-path queries. Apart from greatly reduced storage consumption, our approach demonstrates its superiority, compared to competing methods, not only for a substantial fraction of those queries, but also for storing, reconstructing, and navigating XML documents.
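To make the elementless idea concrete, here is a minimal sketch under an assumed layout: each stored leaf carries a node label and a path class reference (PCR) into the path synopsis, inner element nodes are never stored, and the element path of any value is reconstructed on demand. The label format and table names are hypothetical illustrations, not the paper's actual format.

```python
# A minimal sketch of elementless (virtualized-structure) storage.
# Inner nodes are not stored; each leaf keeps a node label and a PCR.

# Path synopsis: PCR -> sequence of element names from root to leaf parent.
SYNOPSIS = {
    1: ("lib", "book", "title"),
    2: ("lib", "book", "author"),
}

# The document store keeps only labeled values: (DeweyID-like label, PCR, text).
LEAVES = [
    ("1.3.3", 1, "XML Storage"),
    ("1.3.5", 2, "Schmidt"),
]

def element_path(pcr):
    """Rebuild the full root-to-leaf element path of a stored value."""
    return "/" + "/".join(SYNOPSIS[pcr])

for label, pcr, value in LEAVES:
    print(f"{element_path(pcr)} -> {value!r} (node {label})")
# /lib/book/title -> 'XML Storage' (node 1.3.3)
# /lib/book/author -> 'Schmidt' (node 1.3.5)
```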
Computer Science - Research and Development | 2015
Christian Mathis; Theo Härder; Karsten Schmidt; Sebastian Bächle
XML Indexing and Storage (XMIS) techniques are crucial for the functionality and the overall performance of an XML database management system (XDBMS). Because of the complexity of XQuery and the performance demands of XML query processing, efficient path processing operators, including those for tree-pattern queries (so-called twigs), are urgently needed, and tailor-made indexes and their flexible use are indispensable for them. Although XML indexing and storage are standard problems and, of course, manifold approaches have been proposed in the last decade, adaptive and sufficiently broad solutions for satisfactory query evaluation support of all path processing operators are missing in the XDBMS context. Therefore, we think that it is worthwhile to take a step back and look at the complete picture to derive a salient and holistic solution. To do so, we first compile an XMIS wish list containing what are, in our opinion, essential functional storage and indexing requirements in a modern XDBMS. With these desiderata in mind, we then develop a new XMIS scheme, which, by reconsidering previous work, can be seen as a practical and general approach to XML storage and indexing. Interestingly, by working on both problems at the same time, we can make the storage and index managers live in a kind of symbiotic partnership, because the document store re-uses ideas originally proposed by the indexing community and vice versa. The XMIS scheme is implemented in XTC, an XDBMS used for empirical tests.
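Path and twig operators ultimately rest on cheap label-based structural predicates. Below is a minimal sketch assuming DeweyID-style prefix labels (XTC uses a related prefix labeling scheme; this simplified string form is an assumption made for illustration).

```python
# A minimal sketch of the label-based structural predicates that path and
# twig operators build on, assuming DeweyID-style prefix labels.

def is_ancestor(a, d):
    """True iff node labeled `a` is a proper ancestor of node labeled `d`."""
    a_parts, d_parts = a.split("."), d.split(".")
    return len(a_parts) < len(d_parts) and d_parts[: len(a_parts)] == a_parts

# A tiny twig match: find (book, title) pairs in ancestor-descendant relation.
books = ["1.3", "1.5"]
titles = ["1.3.3", "1.5.1", "1.7.1"]
matches = [(b, t) for b in books for t in titles if is_ancestor(b, t)]
print(matches)  # [('1.3', '1.3.3'), ('1.5', '1.5.1')]
```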
International Database Engineering and Applications Symposium | 2008
Karsten Schmidt; Theo Härder
There are numerous and influential parameters associated with the selection of suitable native storage structures for XML documents within an XML DBMS. The most important parameters are related to node labeling, path synopsis, document container layout, indexing, and text compression. While node labeling and the presence of a path synopsis greatly determine the flexibility and efficiency of the entire DBMS-internal XML processing, the remaining issues primarily address I/O demand and space consumption. In particular, text compression, considered orthogonal to the design choices affected by the other parameters, implies additional algorithmic costs for encoding/decoding during document processing. Having in mind the vision that future storage managers can autonomously figure out optimal choices for all of these parameters, we discuss how various storage options favor different usage patterns and how they can be specified beforehand so that the anticipated usage influences the chosen native XML storage options.
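A minimal sketch of what "specifying storage options beforehand" could look like follows; the parameter names mirror the list in the abstract, but the concrete values, defaults, and the recommendation rule are hypothetical illustrations.

```python
# A minimal sketch of declaring storage options up front so anticipated
# usage can steer the storage manager. All values are hypothetical.

from dataclasses import dataclass

@dataclass
class XmlStorageOptions:
    node_labeling: str = "prefix"        # e.g., DeweyID-style prefix labels
    path_synopsis: bool = True           # keep a structure summary
    container_page_size: int = 8192      # document container layout, in bytes
    content_compression: str = "none"    # text compression is orthogonal

def recommend(workload):
    """Pick options from a coarse, hypothetical usage hint."""
    if workload == "read-mostly, scan-heavy":
        return XmlStorageOptions(content_compression="dictionary")
    return XmlStorageOptions()

print(recommend("read-mostly, scan-heavy"))
```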
Database Systems for Advanced Applications | 2009
Karsten Schmidt; Sebastian Bächle; Theo Härder
The rapidly increasing number of XML-related applications indicates a growing need for efficient, dynamic, and native XML support in database management systems (XDBMSs). So far, both industry and academia primarily focus on benchmarking high-level performance figures for a variety of applications, queries, or documents, frequently executed in artificial workload scenarios, and may therefore analyze and compare only specific or incidental behavior of the underlying systems. To cover the full XDBMS support, it is mandatory to benchmark performance-critical components bottom-up, thereby removing bottlenecks and optimizing component behavior. In this way, wrong conclusions are avoided when new techniques such as tailored XML operators, index types, or storage mappings with unfamiliar performance characteristics are used. As an experience report, we present what we have learned from benchmarking a native XDBMS and recommend certain setups to do it in a systematic and meaningful way.
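The bottom-up idea amounts to timing one component in isolation rather than only end-to-end workloads. A minimal sketch of such a micro-benchmark harness is shown below; the component under test (a dict-based index probe) is a stand-in, not taken from the paper.

```python
# A minimal sketch of bottom-up component benchmarking: time a single
# performance-critical component in isolation, with repetitions, instead
# of only measuring end-to-end query workloads.

import time

def benchmark(fn, repetitions=5):
    """Run fn several times and report min/median wall-clock time (seconds)."""
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    samples.sort()
    return {"min": samples[0], "median": samples[len(samples) // 2]}

index = {i: i for i in range(100_000)}
print(benchmark(lambda: [index.get(i) for i in range(100_000)]))
```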
Computer Science and Software Engineering | 2009
Karsten Schmidt; Yi Ou; Theo Härder
Most database systems (DBMSs) today operate on servers equipped with magnetic disks. In our contribution, we want to motivate the use of two emerging and striking technologies, namely native XML DBMSs (XDBMSs for short) and solid state disks. In this context, the traditional read/write model, optimized by a sophisticated cache hierarchy, and the I/O cost model for databases need adjustment. Such new devices, together with optimized storage mapping of XML documents, provide a number of challenging characteristics. We discuss how their optimization potential can be exploited to enhance transactional DB performance under reduced energy consumption, to match increasing application demands and, at the same time, to guarantee economic energy use in data centers.
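Why the classic I/O cost model needs adjusting can be seen from a back-of-the-envelope comparison; the latencies below are rough, hypothetical device figures, not measurements from the paper. Solid state disks remove the seek/rotation penalty, so random reads are no longer an order of magnitude costlier than sequential ones, which shifts many optimizer trade-offs.

```python
# A minimal sketch of how device characteristics change I/O cost estimates.
# Per-page latencies (ms) are hypothetical illustrations.

def scan_cost(pages, per_page_ms):
    return pages * per_page_ms

def index_cost(lookups, random_ms):
    return lookups * random_ms

HDD = {"seq": 0.1, "rand": 8.0}   # sequential vs. random page access
SSD = {"seq": 0.05, "rand": 0.1}

for name, dev in (("HDD", HDD), ("SSD", SSD)):
    full_scan = scan_cost(pages=10_000, per_page_ms=dev["seq"])
    index_probe = index_cost(lookups=500, random_ms=dev["rand"])
    print(f"{name}: scan={full_scan:.0f} ms, 500 index probes={index_probe:.0f} ms")
# On the HDD the probes cost 4x the scan; on the SSD they are nearly free.
```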
Business Intelligence for the Real-Time Enterprise | 2015
Karsten Schmidt; Sebastian Bächle; Philipp Scholl; Georg Nold
Identifying and exploring relevant content in growing document collections is a challenge for researchers, users, and system providers alike. Supporting this is crucial for companies offering knowledge in the form of documents as their core product. Our demo shows an intelligent way of doing guided research in big text collections, using the collection of the major scientific publisher Springer SBM as an example data set. We use the SAP HANA platform for flexible text analysis, ad-hoc calculations, and data linkage to enhance the experience of users navigating and exploring publications. We integrate unstructured data (textual documents) and structured data (document metadata and web server logs) and provide interactive filters to enable a responsive user experience while searching for relevant content. With HANA, we are able to implement this functionality over big data on a single machine by making use of HANA's SQL data store and the built-in application server.
Computer Science - Research and Development | 2012
Karsten Schmidt; Sebastian Bächle
Effective I/O buffering is a performance-critical task in database management systems. Accordingly, systems usually employ various special-purpose buffers to align, e.g., device speed, page size, and replacement policies with the actual data and workload. However, such partitioning of the available buffer memory results in complex optimization problems for database administrators and in fragile configurations which quickly deteriorate on workload shifts. Reliable forecasts of I/O costs enable a system to evaluate alternative configurations and to continuously optimize its buffer memory allocation at runtime. So far, all techniques proposed for the prediction of buffer performance focus solely on hit-ratio gains for increased buffer sizes to identify buffers which promise the greatest benefit. These approaches, however, assume that their forecast also allows extrapolating the effect of buffer downsizing. As we will show, this carries a severe risk of wrong tuning decisions, which may heavily impact system performance. Thus, we emphasize the importance of reliably forecasting the penalty to expect when shrinking buffers in favor of others. We explore the use of lightweight extensions for widely used buffer algorithms to simulate, on the fly, the buffer performance of smaller and larger buffer sizes simultaneously. Furthermore, we present a simple cost model and demonstrate how to compose these concepts into a self-tuning component for dynamic buffer reallocation.
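For LRU-like policies, simulating smaller and larger buffer sizes at once is possible in a single pass via stack distances: a reference that hits at stack depth d would hit in every LRU buffer of size at least d. The sketch below uses an unbounded stack, which plays the role of the lightweight "ghost" extension the abstract alludes to; it is an illustration of the general technique, not the paper's algorithm.

```python
# A minimal sketch of one-pass hit-ratio simulation for several LRU buffer
# sizes, using stack distances. Hits at depth d count for all sizes >= d.

def simulate_lru_sizes(trace, sizes):
    stack = []                      # most recently used page at index 0
    hits = {size: 0 for size in sizes}
    for page in trace:
        if page in stack:
            depth = stack.index(page) + 1
            for size in sizes:
                if depth <= size:   # would hit in any buffer this large
                    hits[size] += 1
            stack.remove(page)
        stack.insert(0, page)
    total = len(trace)
    return {size: hits[size] / total for size in sizes}

trace = [1, 2, 3, 1, 2, 4, 1, 5, 2, 1, 3, 2]
print(simulate_lru_sizes(trace, sizes=[2, 3, 4]))
# One pass over the trace yields hit ratios for all candidate sizes,
# exposing both the benefit of growing and the penalty of shrinking.
```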
International Conference on Knowledge-Based and Intelligent Information and Engineering Systems | 2011
Jean-Marie Gaillourdet; Thomas Grundmann; Martin Memmel; Karsten Schmidt; Arnd Poetzsch-Heffter; Stefan Deßloch
Mathematical models play an increasingly important role in science and engineering. In this paper, we present WoM, a platform for building up knowledge repositories for searching, exploring, combining, and sharing such models. In contrast to similar efforts, WoM supports a well-defined semi-structured representation for mathematical models, which acts as a solid foundation for intelligent web-based presentation, browsing, search, simulation/visualization, and Web community capabilities. We envision WoM to provide a foundation for future design flows and engineering processes using standardized, composable, and computer-supported models.