André Petermann | Researchain

Publication

Featured research published by André Petermann.

Very Large Data Bases | 2014

Graph-based data integration and business intelligence with BIIIG

André Petermann; Martin Junghanns; Robert Müller; Erhard Rahm

We demonstrate BIIIG (Business Intelligence with Integrated Instance Graphs), a new system for graph-based data integration and analysis. It aims at improving business analytics compared to traditional OLAP approaches by comprehensively tracking relationships between entities and making them available for analysis. BIIIG supports a largely automatic data integration pipeline for metadata and instance data. Metadata from heterogeneous sources are integrated in a so-called Unified Metadata Graph (UMG) while instance data is combined in a single integrated instance graph (IIG). A unique feature of BIIIG is the concept of business transaction graphs, which are derived from the IIG and which reflect all steps involved in a specific business process. Queries and analysis tasks can refer to the entire instance graph or sets of business transaction graphs. In the demonstration, we perform all data integration steps and present analytic queries including pattern matching and graph-based aggregation of business measures.

International Conference on Data Engineering | 2014

BIIIG: Enabling business intelligence with integrated instance graphs

André Petermann; Martin Junghanns; Robert Müller; Erhard Rahm

We propose a new graph-based framework for business intelligence called BIIIG supporting the flexible evaluation of relationships between data instances. It builds on the broad availability of interconnected objects in existing business information systems. Our approach extracts such interconnected data from multiple sources and integrates them into an integrated instance graph. To support specific analytic goals, we extract subgraphs from this integrated instance graph representing executed business activities with all their data traces and involved master data. We provide an overview of the BIIIG approach and describe its main steps. We also present initial results from an evaluation with real ERP data.

Handbook of Big Data Technologies | 2017

Management and Analysis of Big Graph Data: Current Systems and Open Challenges

Martin Junghanns; André Petermann; Martin Neumann; Erhard Rahm

Many big data applications in business and science require the management and analysis of huge amounts of graph data. Suitable systems to manage and to analyze such graph data should meet a number of challenging requirements including support for an expressive graph data model with heterogeneous vertices and edges, powerful query and graph mining capabilities, ease of use as well as high performance and scalability. In this chapter, we survey current system approaches for management and analysis of “big graph data”. We discuss graph database systems, distributed graph processing systems such as Google Pregel and its variations, and graph dataflow approaches based on Apache Spark and Flink. We further outline a recent research framework called Gradoop that is built on the so-called Extended Property Graph Data Model with dedicated support for analyzing not only single graphs but also collections of graphs. Finally, we discuss current and future research challenges.

International Conference on Management of Data | 2016

Analyzing extended property graphs with Apache Flink

Martin Junghanns; André Petermann; Niklas Teichmann; Kevin Gómez; Erhard Rahm

Graphs are an intuitive way to model complex relationships between real-world data objects. Thus, graph analytics plays an important role in research and industry. As graphs often reflect heterogeneous domain data, their representation requires an expressive data model including the abstraction of graph collections, for example, to analyze communities inside a social network. Furthermore, answering complex analytical questions about such graphs entails combining multiple analytical operations. To satisfy these requirements, we propose the Extended Property Graph Model, which is semantically rich, schema-free and supports multiple distinct graphs. Based on this representation, it provides declarative and combinable operators to analyze both single graphs and graph collections. Our current implementation is based on the distributed dataflow framework Apache Flink. We present the results of a first experimental study showing the scalability of our implementation on social network data with up to 11 billion edges.

Workshop on Big Data Benchmarks | 2014

FoodBroker - Generating Synthetic Datasets for Graph-Based Business Analytics

André Petermann; Martin Junghanns; Robert Müller; Erhard Rahm

We present FoodBroker, a new data generator for benchmarking graph-based business intelligence systems and approaches. It covers two realistic business processes and their involved master and transactional data objects. The interactions are correlated in controlled ways to enable non-uniform distributions for data and relationships. For benchmarking data integration, the generated data is stored in two interrelated databases. The dataset can be arbitrarily scaled and allows comprehensive graph- and pattern-based analysis.

International Conference on Management of Data | 2017

Cypher-based Graph Pattern Matching in Gradoop

Martin Junghanns; Max Kießling; Alex Averbuch; André Petermann; Erhard Rahm

Graph pattern matching is an important and challenging operation on graph data. Typical use cases are related to graph analytics. Since analysts are often non-programmers, a graph system will only gain acceptance if there is a comprehensible way to declare pattern matching queries. However, respective query languages are currently only supported by graph databases but not by distributed graph processing systems. To enable pattern matching on a large scale, we implemented the declarative graph query language Cypher within the distributed graph analysis platform Gradoop. Using LDBC graph data, we show that our query engine is scalable for operational as well as analytical workloads. The implementation is open-source and easy to extend for further research.

International Conference on Data Mining | 2016

Graph Mining for Complex Data Analytics

André Petermann; Martin Junghanns; Stephan Kemper; Kevin Gómez; Niklas Teichmann; Erhard Rahm

Complex data analytics that involve data mining often comprise not only a single algorithm but also further data processing steps, for example, to restrict the search space or to filter the result. We demonstrate graph mining with Gradoop, the first scalable system supporting declarative analytical programs composed from multiple graph operations. We use a business intelligence example including frequent subgraph mining to highlight the analytical capabilities enabled by such programs. The results can be visualized and, to show its ease of use, the program can be modified at visitors' request. Gradoop is built on top of state-of-the-art big data technology and is horizontally scalable out of the box. Its source code is publicly available and designed for easy extensibility. We invite the graph mining community to apply Gradoop in large-scale use cases and to contribute further algorithms.

Information Technology | 2016

Scalable business intelligence with graph collections

André Petermann; Martin Junghanns

Using graph data models for business intelligence applications is a novel and promising approach. In contrast to traditional data warehouse models, graph models enable the mining of relationship patterns. In our prior work, we introduced an approach to graph-based data integration and analytics called BIIIG (Business Intelligence with Integrated Instance Graphs). In this work, we compare state-of-the-art systems for graph data management and analytics with regard to the support for our approach in Big Data scenarios. To exemplify the analytical value of graph models for business intelligence, we propose an analytical workflow to extract knowledge from graph-integrated business data. Finally, we show how we use Gradoop, a novel framework for distributed graph analytics, to implement our approach.

Very Large Data Bases | 2018

Declarative and distributed graph analytics with GRADOOP

Martin Junghanns; Max Kießling; Niklas Teichmann; Kevin Gómez; André Petermann; Erhard Rahm

We demonstrate Gradoop, an open-source framework that combines and extends features of graph database systems with the benefits of distributed graph processing. Using a rich graph data model and powerful graph operators, users can declaratively express graph analytical programs for distributed execution without needing advanced programming experience or a deeper understanding of the underlying system. Visitors to the demo can declare graph analytical programs using the Gradoop operators and also visually experience two of our advanced operators: graph pattern matching and graph grouping. We provide real-world and artificial social network data with up to 10 billion edges and allow running the programs either locally or on a remote research cluster to demonstrate scalability.

International Conference on Management of Data | 2018

THoSP: an algorithm for nesting property graphs

Giacomo Bergami; André Petermann; Danilo Montesi

Despite the growing popularity of techniques related to graph summarization, a general operator for the flexible nesting of graphs is still missing. We propose a novel nested graph data model and a powerful graph nesting operator. In contrast to existing approaches, our approach is able to summarize vertices and paths among vertex groups within a single query. Furthermore, our model supports partial nestings under the preservation of original graph elements as well as the full recovery of the original graph. We propose an efficient nesting algorithm (THoSP) that is able to perform vertex and path nestings in a single visit of the input graph. Results of an experimental evaluation show that THoSP outperforms equivalent implementations based on graph (Cypher, SPARQL), relational (SQL) and document-oriented (ArangoDB) databases.
