Marcus Paradies | Researchain

Publication

Featured research published by Marcus Paradies.

statistical and scientific database management | 2015

GRAPHITE: an extensible graph traversal framework for relational database management systems

Marcus Paradies; Wolfgang Lehner; Christof Bornhövd

Graph traversals are a basic but fundamental ingredient for a variety of graph algorithms and graph-oriented queries. To achieve the best possible query performance, they need to be implemented at the core of a database management system that aims at storing, manipulating, and querying graph data. Increasingly, modern business applications demand native graph query and processing capabilities for enterprise-critical operations on data stored in relational database management systems. In this paper we propose an extensible graph traversal framework (GRAPHITE) as a central graph processing component on a common storage engine inside a relational database management system. We study the influence of the graph topology on the execution time of graph traversals and derive two traversal algorithm implementations specialized for different graph topologies and traversal queries. We conduct extensive experiments on GRAPHITE for a large variety of real-world graph data sets and input configurations. Our experiments show that the execution times of the proposed traversal algorithms differ by up to two orders of magnitude for different input configurations, which demonstrates the need for a versatile framework that efficiently processes graph traversals on a wide range of graph topologies and query types. Finally, we highlight that the query performance of our traversal implementations is competitive with that of two native graph database management systems.
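A traversal query of this kind can be pictured as a bounded breadth-first search over an edge table. The sketch below is a generic, level-synchronous illustration, not GRAPHITE's implementation: it assumes the edge table is given as two Python lists `src` and `dst` (hypothetical column names) and that a query asks for all vertices first reached at a depth between `min_depth` and `max_depth` from a set of start vertices.

```python
from collections import defaultdict


def build_adjacency(src, dst):
    """Group the (src, dst) edge columns into outgoing adjacency lists."""
    adj = defaultdict(list)
    for s, d in zip(src, dst):
        adj[s].append(d)
    return adj


def traverse(adj, start, min_depth, max_depth):
    """Level-synchronous BFS returning the vertices first reached at a
    depth in [min_depth, max_depth] (depth 0 = the start vertices)."""
    visited = set(start)
    frontier = list(visited)
    result = set(visited) if min_depth == 0 else set()
    depth = 0
    while frontier and depth < max_depth:
        depth += 1
        next_frontier = []
        for v in frontier:
            for w in adj.get(v, ()):
                if w not in visited:  # expand each vertex at most once
                    visited.add(w)
                    next_frontier.append(w)
        if depth >= min_depth:
            result.update(next_frontier)
        frontier = next_frontier
    return result


# Example: edge table with columns src and dst.
src = [1, 1, 2, 3, 4]
dst = [2, 3, 4, 4, 5]
adj = build_adjacency(src, dst)
print(sorted(traverse(adj, [1], 1, 2)))  # [2, 3, 4]
print(sorted(traverse(adj, [1], 2, 3)))  # [4, 5]
```

On a deep, narrow graph the frontier stays small and many levels are visited; on a shallow, high-degree graph a few levels already touch most vertices. This is one simple way in which topology shapes traversal cost.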

international database engineering and applications symposium | 2010

How to juggle columns: an entropy-based approach for table compression

Marcus Paradies; Christian Lemke; Hasso Plattner; Wolfgang Lehner; Kai-Uwe Sattler; Alexander Zeier; Jens Krueger

Many relational databases exhibit complex dependencies between data attributes, caused either by the nature of the underlying data or by explicitly denormalized schemas. In data warehouse scenarios, calculated key figures may be materialized or hierarchy levels may be held within a single dimension table. Such column correlations and the resulting data redundancy may result in additional storage requirements. They may also result in bad query performance if inappropriate independence assumptions are made during query compilation. In this paper, we tackle the specific problem of detecting functional dependencies between columns to improve the compression rate for column-based database systems, which both reduces main memory consumption and improves query performance. Although a huge variety of algorithms have been proposed for detecting column dependencies in databases, we maintain that increased data volumes and recent developments in hardware architectures demand novel algorithms with much lower runtime overhead and smaller memory footprint. Our novel approach is based on entropy estimations and exploits a combination of sampling and multiple heuristics to render it applicable for a wide range of use cases. We demonstrate the quality of our approach by means of an implementation within the SAP NetWeaver Business Warehouse Accelerator. Our experiments indicate that our approach scales well with the number of columns and produces reliable dependence structure information. This both reduces memory consumption and improves performance for nontrivial queries.
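The link between entropy and functional dependencies can be made concrete: a dependency A → B holds on a table exactly when the conditional entropy H(B | A) = H(A, B) - H(A) is zero. The sketch below computes this on full columns; it deliberately omits the sampling and heuristics the abstract describes, and the column values are made-up examples.

```python
import math
from collections import Counter


def entropy(values):
    """Empirical Shannon entropy (in bits) of a column."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(col_a, col_b):
    """H(B | A) = H(A, B) - H(A); zero iff every A value maps to one B value."""
    return entropy(list(zip(col_a, col_b))) - entropy(col_a)


def determines(col_a, col_b, tol=1e-9):
    """True if the functional dependency A -> B holds on these rows."""
    return conditional_entropy(col_a, col_b) <= tol


city = ["Dresden", "Dresden", "Leipzig", "Berlin", "Walldorf"]
state = ["Saxony", "Saxony", "Saxony", "Berlin", "Baden-Wuerttemberg"]
print(determines(city, state))  # True: city -> state
print(determines(state, city))  # False: Saxony maps to two cities
```

A column that is fully determined by another carries no additional information given that column, which is the redundancy a column store could exploit for compression.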

First International Workshop on Graph Data Management Experiences and Systems | 2013

SynopSys: large graph analytics in the SAP HANA database through summarization

Michael Rudolf; Marcus Paradies; Christof Bornhövd; Wolfgang Lehner

Graph-structured data is ubiquitous and, with the advent of social networking platforms, has recently attracted significant attention from researchers. However, many business applications also deal with this kind of data and can therefore benefit greatly from graph processing functionality offered directly by the underlying database. This paper summarizes the current state of graph data processing capabilities in the SAP HANA database and describes our efforts to enable large graph analytics in the context of our research project SynopSys. With powerful graph pattern matching support at the core, we envision OLAP-like evaluation functionality exposed to the user in the form of easy-to-apply graph summarization templates. By combining them, the user is able to produce concise summaries of large graph-structured datasets. We also point out open questions and challenges that we plan to tackle in future development on our way towards large graph analytics.
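As a rough illustration of what an OLAP-like summarization could compute, the sketch below groups vertices by one attribute and counts vertices per group and edges between groups. It is a generic group-by aggregation written for this page; the function name `summarize` and the input format (a dict of vertex attributes plus a list of edge pairs) are assumptions, not SynopSys's template interface.

```python
from collections import Counter


def summarize(vertex_attr, edges):
    """Collapse all vertices sharing an attribute value into one group.

    vertex_attr: dict mapping vertex -> grouping attribute value
    edges: iterable of (src, dst) vertex pairs
    Returns (vertices per group, edges per ordered pair of groups).
    """
    group_sizes = Counter(vertex_attr.values())
    group_edges = Counter((vertex_attr[s], vertex_attr[d]) for s, d in edges)
    return group_sizes, group_edges


country = {"a": "DE", "b": "DE", "c": "US"}
edges = [("a", "b"), ("a", "c"), ("b", "c")]
sizes, links = summarize(country, edges)
print(dict(sizes))  # {'DE': 2, 'US': 1}
print(dict(links))  # {('DE', 'DE'): 1, ('DE', 'US'): 2}
```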

Proceedings of Workshop on GRAph Data management Experiences and Systems | 2014

GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

Marcus Paradies; Michael Rudolf; Christof Bornhövd; Wolfgang Lehner

Native graph query and processing capabilities have become indispensable for modern business applications in enterprise-critical operations on data that is stored in relational database management systems. Traversal operations are a basic ingredient of graph algorithms and graph queries. As a consequence, they are fundamental for querying graph data in a relational database management system. In this paper we present GRATIN, a concise secondary index structure to speed up graph traversals in main-memory column stores. Conventional approaches for graph traversals rely on repeated full column scans, which makes them inefficient for deep traversals on very large graphs. To tackle this challenge, we devise a novel and adaptive block-based index to handle graphs efficiently. Most importantly, GRATIN is updateable in constant time and thus supports evolving graphs with frequent updates to the graph topology. We conducted an extensive evaluation on real-world data sets from different domains for a large variety of traversal queries. Our experiments show improvements of up to an order of magnitude compared to a scan-based traversal algorithm.
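The contrast between repeated column scans and an adjacency index can be sketched as follows. `scan_neighbors` reads the entire edge table for every expansion step, while `BlockAdjacencyIndex` stores each vertex's neighbors in a chain of fixed-capacity blocks, so adding an edge only touches that vertex's last block. The block layout and all names here are hypothetical simplifications meant to illustrate the idea; they do not reproduce GRATIN's actual structure.

```python
class BlockAdjacencyIndex:
    """Hypothetical block-based adjacency index.

    Each vertex owns a chain of fixed-capacity blocks holding its neighbors,
    so inserting an edge appends to the vertex's last block (or opens a new
    one) without rebuilding anything: amortized constant time per insert.
    """

    def __init__(self, block_size=4):
        self.block_size = block_size
        self.blocks = {}  # vertex -> list of blocks, each block a list

    def add_edge(self, src, dst):
        chain = self.blocks.setdefault(src, [])
        if not chain or len(chain[-1]) == self.block_size:
            chain.append([])  # last block is full: start a new block
        chain[-1].append(dst)

    def neighbors(self, v):
        for block in self.blocks.get(v, ()):
            yield from block


def scan_neighbors(src_col, dst_col, frontier):
    """Scan-based expansion: one full pass over both edge columns."""
    return [d for s, d in zip(src_col, dst_col) if s in frontier]


src_col = [1, 1, 1, 1, 1, 2]
dst_col = [2, 3, 4, 5, 6, 6]
index = BlockAdjacencyIndex(block_size=4)
for s, d in zip(src_col, dst_col):
    index.add_edge(s, d)
print(list(index.neighbors(1)))               # [2, 3, 4, 5, 6]
print(len(index.blocks[1]))                   # 2 (4 + 1 neighbors)
print(scan_neighbors(src_col, dst_col, {1}))  # [2, 3, 4, 5, 6]
```

A traversal of depth d built on `scan_neighbors` reads the full edge table d times, whereas the index only touches the blocks of frontier vertices.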

acm symposium on applied computing | 2012

Entity matching for semistructured data in the Cloud

Marcus Paradies; Susan Malaika; Jérôme Siméon; Shahan Khatchadourian; Kai-Uwe Sattler

The amount of available information, on the Web or inside companies, is growing rapidly. With Cloud infrastructure maturing (including tools for parallel data processing, text analytics, clustering, etc.), there is more interest in integrating data to produce higher-value content. New challenges notably include entity matching over large volumes of heterogeneous data. In this paper, we describe an approach for entity matching over large amounts of semistructured data in the Cloud. The approach combines ChuQL [4], a recently proposed extension of XQuery with MapReduce, and a blocking technique for entity matching which can be efficiently executed on top of MapReduce. We illustrate the proposed approach by applying it to automatically extract and enrich references in Wikipedia and report on an experimental evaluation of the approach.
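Blocking keeps entity matching tractable by comparing only records that share a blocking key. The sketch below mimics the map and reduce phases in plain Python rather than ChuQL or a real MapReduce runtime; the three-letter title prefix used as the blocking key, the token-based Jaccard similarity, and the 0.5 threshold are all illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record):
    # Illustrative key: first three letters of the lower-cased title.
    return record["title"].lower()[:3]


def map_phase(records):
    """Emit (blocking key, record) pairs, as a MapReduce mapper would."""
    for r in records:
        yield blocking_key(r), r


def reduce_phase(keyed):
    """Group by key, then emit candidate pairs only within each block."""
    blocks = defaultdict(list)
    for key, r in keyed:
        blocks[key].append(r)
    for members in blocks.values():
        yield from combinations(members, 2)


def jaccard(a, b):
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def match(records, threshold=0.5):
    return [(a["id"], b["id"])
            for a, b in reduce_phase(map_phase(records))
            if jaccard(a["title"], b["title"]) >= threshold]


records = [
    {"id": 1, "title": "Graph traversal in HANA"},
    {"id": 2, "title": "Graph Traversal in SAP HANA"},
    {"id": 3, "title": "Entity matching in the Cloud"},
]
print(match(records))  # [(1, 2)]
```

Record 3 is never compared with the others because its key differs; on large inputs this pruning of the quadratic pair space is the point of blocking.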

international conference on management of data | 2018

G-CORE: A Core for Future Graph Query Languages

Renzo Angles; Marcelo Arenas; Pablo Barceló; Peter A. Boncz; George H. L. Fletcher; Claudio Gutierrez; Tobias Lindaaker; Marcus Paradies; Stefan Plantikow; Juan F. Sequeda; Oskar van Rest; Hannes Voigt

We report on a community effort between industry and academia to shape the future of graph query languages. We argue that existing graph database management systems should consider supporting a query language with two key characteristics. First, it should be composable, meaning that graphs are the input and the output of queries. Second, the graph query language should treat paths as first-class citizens. Our result is G-CORE, a powerful graph query language design that fulfills these goals, and strikes a careful balance between path query expressivity and evaluation complexity.

european conference on parallel processing | 2015

Highspeed Graph Processing Exploiting Main-Memory Column Stores

Matthias Hauck; Marcus Paradies; Holger Fröning; Wolfgang Lehner; Hannes Rauhe

A popular belief in the graph database community is that relational database management systems are generally ill-suited for efficient graph processing. This might apply to analytic graph queries performing iterative computations on the graph, but does not necessarily hold true for short-running, OLTP-style graph queries. In this paper we argue that, instead of extending a graph database management system with traditional relational operators (predicate evaluation, sorting, grouping, and aggregations, among others), one should consider adding a graph abstraction and graph-specific operations, such as graph traversals and pattern matching, to relational database management systems. We use an exemplary query from the interactive query workload of the LDBC Social Network Benchmark and run it against our enhanced in-memory, columnar relational database system to support our claims. Our performance measurements indicate that a columnar RDBMS, extended by graph-specific operators and data structures, can serve as a foundation for high-speed graph processing on big-memory machines with non-uniform memory access and a large number of available cores.

international conference on management of data | 2017

An Analysis of the Feasibility of Graph Compression Techniques for Indexing Regular Path Queries

Frank Tetzel; Hannes Voigt; Marcus Paradies; Wolfgang Lehner

Regular path queries (RPQs) are a fundamental part of recent graph query languages like SPARQL and PGQL. They allow the definition of recursive path structures through regular expressions in a declarative pattern matching environment. We study the use of the K2-tree graph compression technique to materialize RPQ results with low memory consumption for indexing. Compact index representations enable the efficient storage of multiple indexes for varying RPQs.
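The K2-tree idea can be sketched for k = 2: the adjacency matrix is recursively split into k × k submatrices, one bit per submatrix records whether it contains any edge, and only non-empty submatrices are refined further. Bits of internal levels go into a bitmap `T`, bits of the last level into `L`, and the children of the 1-bit at position x start at position rank1(T, x) · k² in the concatenation of `T` and `L`. The code below is a naive, unoptimized sketch (each cell test scans all edges, and rank is computed by summing instead of with a succinct rank structure); storing the (source, target) pairs that answer a fixed RPQ as `edges` would give the kind of compact result index discussed above.

```python
def build_k2tree(edges, n, k=2):
    """Build K2-tree bitmaps (T, L) for an n x n adjacency matrix.

    n must be a power of k with n >= k. Naive construction: each cell test
    scans the whole edge set, which is acceptable for a sketch only.
    """
    edges = set(edges)
    T, L = [], []
    queue = [(0, 0, n)]  # (top row, left column, size) of submatrices to split
    while queue:
        sub = queue[0][2] // k
        bits = T if sub > 1 else L  # last level goes to the leaf bitmap
        next_queue = []
        for r0, c0, _ in queue:
            for i in range(k):
                for j in range(k):
                    rr, cc = r0 + i * sub, c0 + j * sub
                    nonempty = any(rr <= r < rr + sub and cc <= c < cc + sub
                                   for r, c in edges)
                    bits.append(1 if nonempty else 0)
                    if nonempty and sub > 1:
                        next_queue.append((rr, cc, sub))
        queue = next_queue
    return T, L


def has_edge(T, L, n, r, c, k=2):
    """Descend from the root; children of the 1-bit at x start at rank1(T, x) * k^2."""
    size, offset = n, 0
    while True:
        size //= k
        pos = offset + (r // size) * k + (c // size)
        r, c = r % size, c % size
        if pos >= len(T):
            return L[pos - len(T)] == 1
        if T[pos] == 0:
            return False  # empty submatrix: no edge anywhere inside
        offset = sum(T[:pos + 1]) * k * k  # naive rank1


T, L = build_k2tree([(0, 1), (1, 0), (3, 3)], n=4)
print(T, L)  # [1, 0, 0, 1] [0, 1, 1, 0, 0, 0, 0, 1]
print(has_edge(T, L, 4, 3, 3), has_edge(T, L, 4, 0, 2))  # True False
```

Empty regions of the matrix cost a single 0 bit regardless of their size, which is why sparse, clustered result sets compress well.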

international conference on management of data | 2017

Can Modern Graph Processing Engines Run Concurrent Queries Efficiently?

Matthias Hauck; Marcus Paradies; Holger Fröning

Analytic graph processing has witnessed an ever-growing interest both in industry and academia with the focus on providing the most effective algorithm implementations to maximize single-query performance. In a complex application scenario, where multiple users issue concurrent queries to the analytic graph processing engine, the major performance metric is throughput rather than single-query elapsed time. As of today, there is no single-node graph engine that is designed for concurrent graph processing running multiple queries in parallel. In this work, we analyze the single-node graph engine Galois and extend it to run multiple graph queries concurrently. We perform an extensive evaluation of Galois for various graph algorithms and data sets to gain a fundamental understanding of the performance bottlenecks of existing graph engines. Finally, we derive important insights and conclude that modern graph engines cannot be easily adapted to handle concurrent graph queries efficiently.

international conference data science | 2018

Analysis of Data Structures Involved in RPQ Evaluation

Frank Tetzel; Hannes Voigt; Marcus Paradies; Romans Kasperovics; Wolfgang Lehner

Regular path queries (RPQs) are a fundamental ingredient of declarative graph query languages. They provide an expressive yet compact way to match long and complex paths in a data graph by utilizing regular expressions. In this paper, we systematically explore and analyze the design space for the data structures involved in automaton-based RPQ evaluation. We consider three fundamental data structures used during RPQ processing: adjacency lists for quick neighborhood exploration, a visited data structure for cycle detection, and the representation of intermediate results. We conduct an extensive experimental evaluation on realistic graph data sets and systematically investigate various alternative data structure representations and implementation variants. We show that carefully crafted data structures that exploit the access pattern of RPQs reduce peak memory consumption and evaluation time.
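Automaton-based RPQ evaluation can be sketched as a breadth-first search over the product of the data graph and an automaton for the regular expression. In the sketch below, `adj` plays the role of the adjacency lists, the `visited` set of (vertex, state) pairs provides cycle detection, and `results` holds the answers found so far. The data structures are deliberately the simplest Python ones (dicts, sets, a deque); the abstract's point is that more carefully crafted alternatives matter, so treat this only as a baseline sketch. The example automaton, for `knows+` followed by `worksAt`, is hand-built.

```python
from collections import deque


def eval_rpq(adj, delta, start_state, accepting, source):
    """Return all vertices reachable from `source` along a path whose label
    sequence is accepted by the automaton (delta, start_state, accepting).

    adj:   vertex -> list of (edge label, target vertex)   [adjacency lists]
    delta: (state, label) -> set of next states            [NFA transitions]
    """
    visited = {(source, start_state)}  # (vertex, state) pairs: cycle detection
    queue = deque(visited)
    results = set()  # vertices reached in an accepting state
    while queue:
        v, q = queue.popleft()
        if q in accepting:
            results.add(v)
        for label, w in adj.get(v, ()):
            for q2 in delta.get((q, label), ()):
                if (w, q2) not in visited:
                    visited.add((w, q2))
                    queue.append((w, q2))
    return results


# Hand-built automaton for: knows+ followed by worksAt
delta = {(0, "knows"): {1}, (1, "knows"): {1}, (1, "worksAt"): {2}}
adj = {
    "alice": [("knows", "bob")],
    "bob": [("knows", "carol"), ("worksAt", "SAP")],
    "carol": [("knows", "alice"), ("worksAt", "TUD")],
}
print(sorted(eval_rpq(adj, delta, 0, {2}, "alice")))  # ['SAP', 'TUD']
```

The `knows` cycle alice → bob → carol → alice terminates because a (vertex, state) pair is enqueued at most once.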
