Eric Peukert
Leipzig University
Publication
Featured research published by Eric Peukert.
International Conference on Data Engineering | 2011
Eric Peukert; Julian Eberius; Erhard Rahm
We present the Auto Mapping Core (AMC), a new framework that supports fast construction and tuning of schema matching approaches for specific domains such as ontology alignment, model matching or database-schema matching. Distinctive features of our framework are new visualisation techniques for modelling matching processes, stepwise tuning of parameters, intermediate result analysis and performance-oriented rewrites. Furthermore, existing matchers can be plugged into the framework to evaluate them comparatively in a common environment. This allows a deeper analysis of the behaviour and shortcomings of existing complex matching systems.
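To make the pluggable-process idea concrete, here is a minimal sketch of a matching process built from exchangeable operators. All function names and the threshold are illustrative; AMC's actual API is not described in the abstract.

```python
# Minimal sketch of a pluggable matching process in the spirit of AMC
# (all names and the threshold are hypothetical, for illustration only).
from difflib import SequenceMatcher

def name_matcher(source, target):
    """One pluggable matcher: string similarity of element names."""
    return {(s, t): SequenceMatcher(None, s.lower(), t.lower()).ratio()
            for s in source for t in target}

def combine(matrices, weights):
    """Combination operator: weighted sum of several similarity matrices."""
    keys = matrices[0].keys()
    return {k: sum(w * m[k] for m, w in zip(matrices, weights)) for k in keys}

def select_threshold(matrix, threshold):
    """Selection operator: keep correspondences above a threshold."""
    return {k: v for k, v in matrix.items() if v >= threshold}

source = ["orderId", "customerName", "shipDate"]
target = ["order_id", "cust_name", "delivery_date"]

# A matching process = matchers -> combination -> selection; further
# matchers could be plugged into the list below for comparison.
sims = combine([name_matcher(source, target)], weights=[1.0])
print(select_threshold(sims, threshold=0.6))
```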
Extending Database Technology | 2010
Eric Peukert; Henrike Berthold; Erhard Rahm
A recurring manual task in data integration, ontology alignment or model management is finding mappings between complex metadata structures. In order to reduce this manual effort, many matching algorithms for semi-automatically computing mappings have been introduced. Unfortunately, current matching systems severely lack performance when matching large schemas. Recently, some systems have tried to tackle the performance problem within individual matching approaches. However, none of them developed solutions on the level of matching processes. In this paper, we introduce a novel rewrite-based optimization technique that is generally applicable to different types of matching processes. We introduce filter-based rewrite rules similar to predicate push-down in query optimization. In addition, we present a modeling tool and a recommendation system for rewriting matching processes. Our evaluation on matching large web service message types shows significant performance improvements without losing the quality of the automatically computed results.
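The following sketch illustrates the analogy to predicate push-down: a cheap filter prunes candidate pairs before an expensive matcher runs, so only surviving candidates are compared. The filter and matcher shown are illustrative stand-ins, not the paper's actual rewrite rules.

```python
# Illustrative filter-based rewrite: instead of running an expensive
# matcher on the full cross product, a cheap pre-filter prunes pairs
# first -- analogous to pushing a selection predicate below a costly
# operator in query optimization. Functions here are stand-ins.
from difflib import SequenceMatcher

def cheap_filter(s, t):
    # Cheap pre-filter: the element names share at least one token.
    return bool(set(s.lower().split("_")) & set(t.lower().split("_")))

def expensive_matcher(s, t):
    return SequenceMatcher(None, s.lower(), t.lower()).ratio()

source = ["order_id", "customer_name", "ship_date"]
target = ["id_of_order", "name", "date_of_delivery"]

all_pairs = [(s, t) for s in source for t in target]
# Rewritten process: filter first, match only the surviving candidates.
candidates = [(s, t) for s, t in all_pairs if cheap_filter(s, t)]
result = {(s, t): expensive_matcher(s, t) for s, t in candidates}

print(f"compared {len(candidates)} of {len(all_pairs)} pairs")
print(result)
```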
International Conference on Data Engineering | 2012
Eric Peukert; Julian Eberius; Erhard Rahm
Mapping complex metadata structures is crucial in a number of domains such as data integration, ontology alignment or model management. To speed up the generation of such mappings, automatic matching systems have been developed to compute mapping suggestions that can be corrected by a user. However, constructing and tuning match strategies still requires a high manual effort from matching experts, as well as correct reference mappings to evaluate the generated mappings. We therefore propose a self-configuring schema matching system that is able to automatically adapt to the mapping problem at hand. Our approach is based on analyzing the input schemas as well as intermediate matching results. A variety of matching rules use the analysis results to automatically construct and adapt an underlying matching process for a given match task. We comprehensively evaluate our approach on different mapping problems from the schema, ontology and model management domains. The evaluation shows that our system robustly returns good-quality mappings across different mapping problems and domains.
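As an illustration of the self-configuration idea, the sketch below derives simple features from the input schemas and applies rules that assemble a matcher set. The features, rules and thresholds are invented for illustration and are not the paper's actual rule set.

```python
# Hedged sketch of self-configuration: simple schema features drive
# rules that assemble a matcher set. Feature names, rules and
# thresholds are illustrative assumptions, not the paper's rules.
def schema_features(elements):
    toks = [e.lower().replace("_", " ").split() for e in elements]
    return {
        "avg_name_length": sum(len(e) for e in elements) / len(elements),
        "multi_token_ratio": sum(len(t) > 1 for t in toks) / len(elements),
    }

def configure_matchers(src_feat, tgt_feat):
    matchers = ["name_equality"]          # always-on baseline matcher
    # Rule: long, multi-token names favor token-based similarity.
    if min(src_feat["multi_token_ratio"], tgt_feat["multi_token_ratio"]) > 0.5:
        matchers.append("token_similarity")
    # Rule: short cryptic names favor a synonym/dictionary matcher.
    if max(src_feat["avg_name_length"], tgt_feat["avg_name_length"]) < 6:
        matchers.append("synonym_lookup")
    return matchers

src = ["customer_name", "order_date", "ship_address"]
tgt = ["custName", "orderedAt", "shippingAddress"]
print(configure_matchers(schema_features(src), schema_features(tgt)))
```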
Advances in Databases and Information Systems | 2017
Alieh Saeedi; Eric Peukert; Erhard Rahm
Entity resolution identifies semantically equivalent entities, e.g., those describing the same product or customer. It is especially challenging for big data applications where large volumes of data from many sources have to be matched and integrated. Entity resolution for multiple data sources is best addressed by clustering schemes that group all matching entities within clusters. While there are many possible clustering schemes for entity resolution, their relative suitability and scalability are still unclear. We therefore implemented and comparatively evaluated distributed versions of six clustering schemes based on Apache Flink within a new entity resolution framework called Famer. Our evaluation on different real-life and synthetically generated datasets considers both match quality and scalability for different numbers of machines and data sizes.
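As a point of reference, here is a serial sketch of one of the simplest clustering schemes of this kind, connected components over the similarity graph of matched entity pairs; the actual implementations in Famer run distributed on Apache Flink.

```python
# Serial sketch of connected-components clustering over a similarity
# graph of matched entity pairs; Famer's versions run on Apache Flink.
from collections import defaultdict

def connected_components(entities, links):
    """Group entities reachable from each other via match links."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    seen, clusters = set(), []
    for e in entities:
        if e in seen:
            continue
        stack, comp = [e], set()
        while stack:            # depth-first traversal of one component
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(graph[n] - comp)
        seen |= comp
        clusters.append(comp)
    return clusters

entities = ["a1", "b1", "c1", "a2", "b2"]
links = [("a1", "b1"), ("b1", "c1"), ("a2", "b2")]  # pairwise matches
print(connected_components(entities, links))
```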
Proceedings of the 1st Workshop on New Trends in Similarity Search | 2011
Eric Peukert; Erhard Rahm
Computing similarities between metadata elements is an essential process in schema and ontology matching systems. Such systems aim at reducing the manual effort of finding mappings for data integration or ontology alignment. Similarity measures compute syntactic, semantic or structural similarities of metadata elements. Typically, different similarities are combined and the most similar element pairs are selected to produce a best-1 mapping suggestion. Unfortunately, automatic schema matching systems are only rarely adopted commercially, since correcting the best-1 mapping suggestion is often harder than finding the mapping manually. To alleviate this, schema matching must be used incrementally by computing Top-N mapping suggestions that the user can select from. However, current similarity measures and selection operators suggest the same target elements for many different source elements. This effect, which we call overlap, significantly reduces the quality of schema matching. To address this problem, we first introduce a new weighted token similarity measure that implicitly decreases the overlap between Top-N sets. Second, we introduce a new Top-N selection operator that is able to increase recall by restricting overlap directly. We evaluate our approaches on large, real-world matching problems and show the positive effect on match quality.
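The sketch below illustrates both ideas in simplified form: a token similarity in which frequent tokens receive low weights, and a greedy selection that caps how often a single target element may be suggested. The weighting scheme and the overlap cap are illustrative, not the paper's exact definitions.

```python
# Simplified illustration of both ideas: inverse-frequency token
# weights (ubiquitous tokens like "name" count less) and a greedy
# selection with an overlap cap per target element. Both the weighting
# and the cap are assumptions, not the paper's exact definitions.
from collections import Counter

def tokenize(name):
    return name.lower().replace("_", " ").split()

def weighted_token_sim(a, b, weights):
    ta, tb = set(tokenize(a)), set(tokenize(b))
    shared = sum(weights[t] for t in ta & tb)
    total = sum(weights[t] for t in ta | tb)
    return shared / total if total else 0.0

source = ["customer_name", "company_name", "order_name"]
target = ["name", "customer", "company"]

# Inverse-frequency weights: frequent tokens are down-weighted.
freq = Counter(t for e in source + target for t in tokenize(e))
weights = {t: 1.0 / freq[t] for t in freq}

# Greedy selection: best pairs first, at most `cap` uses per target.
pairs = sorted(((weighted_token_sim(s, t, weights), s, t)
                for s in source for t in target), reverse=True)
cap, used, suggestions = 1, Counter(), {}
for sim, s, t in pairs:
    if sim > 0 and s not in suggestions and used[t] < cap:
        suggestions[s] = (t, round(sim, 2))
        used[t] += 1
print(suggestions)
```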
European Semantic Web Conference | 2018
Alieh Saeedi; Eric Peukert; Erhard Rahm
Knowledge graphs holistically integrate information about entities from multiple sources. A key step in the construction and maintenance of knowledge graphs is the clustering of equivalent entities from different sources. Previous approaches to such entity clustering suffer from several problems, e.g., the creation of overlapping clusters or the inclusion of several entities from the same source within a cluster. We therefore propose a new entity clustering algorithm, CLIP, that can be applied both to create entity clusters and to repair entity clusters determined with another clustering scheme. In contrast to previous approaches, CLIP uses not only the similarity between entities for clustering but also additional features of entity links, such as the so-called link strength. To achieve good scalability, we provide a parallel implementation of CLIP based on Apache Flink. Our evaluation on different datasets shows that the new approach achieves substantially higher cluster quality than previous approaches.
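A minimal sketch of the link-strength idea follows, with the classification rule simplified: a link is treated as strong when each entity is the other's best match from the respective source, and only strong links form clusters in this toy version.

```python
# Toy sketch of link strength (rule simplified from the abstract's
# description): a link is "strong" if each entity is the other's best
# match from the respective source; only strong links cluster here.
links = {  # (entity from source A, entity from source B): similarity
    ("a1", "b1"): 0.9, ("a1", "b2"): 0.6,
    ("a2", "b2"): 0.8, ("a3", "b2"): 0.7,
}

best_left, best_right = {}, {}
for (a, b), sim in links.items():
    if sim > best_left.get(a, (0, None))[0]:
        best_left[a] = (sim, b)
    if sim > best_right.get(b, (0, None))[0]:
        best_right[b] = (sim, a)

strong = [(a, b) for (a, b), sim in links.items()
          if best_left[a][1] == b and best_right[b][1] == a]
print("strong links:", strong)  # weaker links would need repair logic
```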
Conference on Information and Knowledge Management | 2011
Eric Peukert; Julian Eberius; Erhard Rahm
Semi-automatic schema matching systems have been developed to compute mapping suggestions that can be corrected by a user. However, constructing and tuning match strategies still requires a high manual effort. We therefore propose a self-configuring schema matching system that is able to automatically adapt to the mapping problem at hand. Our approach is based on analyzing the input schemas as well as intermediate match results. A variety of matching rules use the analysis results to automatically construct and adapt an underlying matching process for a given match task. The evaluation shows that our system robustly returns good-quality mappings across different mapping problems and domains.
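Complementing the configuration sketch given for the related ICDE 2012 entry above, here is one illustrative adaptation rule driven by intermediate match results: if too few source elements obtain a confident candidate, the process is extended with a further matcher. The rule, threshold and matcher names are hypothetical.

```python
# Hypothetical adaptation rule driven by intermediate match results:
# if the current matcher leaves too many source elements without a
# confident candidate, extend the process with another matcher.
def coverage(sim_matrix, source, threshold=0.7):
    """Fraction of source elements with at least one confident match."""
    covered = {s for (s, _), v in sim_matrix.items() if v >= threshold}
    return len(covered) / len(source)

def adapt_process(process, sim_matrix, source):
    if coverage(sim_matrix, source) < 0.5:
        process = process + ["structural_matcher"]  # hypothetical matcher
    return process

source = ["a", "b", "c", "d"]
sims = {("a", "x"): 0.9, ("b", "y"): 0.4, ("c", "z"): 0.3, ("d", "x"): 0.2}
print(adapt_process(["name_matcher"], sims, source))
# -> ['name_matcher', 'structural_matcher']  (coverage 0.25 < 0.5)
```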
Archive | 2018
Lars-Peter Meyer; Jan Frenzel; Eric Peukert; René Jäkel; Stefan Kühne
In many domains, ever-growing volumes of data are to be analyzed in a meaningful way, and the buzzword Big Data frequently comes up in this context. For potential users in science and industry, however, many questions often remain open; this is where Big Data competence centers can help. This contribution reports on the experiences of one Big Data competence center, with a focus on the service aspect. The service portfolio of the competence center is presented and illustrated with real cases from practice. As an example, the operation of the required Big Data clusters is discussed as an important service building block.
Datenbank-Spektrum | 2017
Pascal Hirmer; Tim Waizenegger; Ghareeb Falazi; Majd Abdo; Yuliya Volga; Alexander Askinadze; Matthias Liebeck; Stefan Conrad; Tobias Hildebrandt; Conrad Indiono; Stefanie Rinderle-Ma; Martin Grimmer; Matthias Kricke; Eric Peukert
The 17th Conference on Database Systems for Business, Technology, and Web (BTW 2017) of the German Informatics Society (GI) took place in March 2017 at the University of Stuttgart in Germany. A Data Science Challenge was organized for the first time at a BTW conference by the University of Stuttgart and the sponsor IBM. We challenged the participants to solve a data analysis task within one month and to present their results at the conference. In this article, we give an overview of the organizational process surrounding the Challenge and introduce the task that the participants had to solve. In the subsequent sections, the four finalist groups describe their approaches and results.
Taking the LEAP: The Methods and Tools of the Linked Engineering and Manufacturing Platform (LEAP) | 2016
Eric Peukert; C. Wartner
This chapter introduces the Data and Knowledge Integration infrastructure of LEAP. It consists of a flexible graph store, a linking framework, and analysis and integration tools. The presented approach combines schema-level integration techniques with instance-level matching as well as graph-based storage. Each component of the infrastructure is presented in detail and its usage is briefly sketched. In addition to the basic integration techniques, several useful analysis tools are introduced.
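As a toy illustration of combining graph-based storage with instance-level linking (the data model here is invented, not LEAP's actual schema), nodes from different engineering tools carry arbitrary properties, and typed edges record links produced by a matching step:

```python
# Invented mini data model, for illustration only: property-graph nodes
# from different engineering tools, with "sameAs" edges recording
# instance-level links produced by a matching step.
nodes = {
    "cad:part42": {"source": "CAD", "name": "Gear Housing"},
    "erp:item-9": {"source": "ERP", "name": "gear housing", "cost": 120},
}
edges = [("cad:part42", "sameAs", "erp:item-9", {"confidence": 0.93})]

def neighbors(node_id, edge_type):
    """Follow typed edges from one node to its linked counterparts."""
    return [(t, props) for s, e, t, props in edges
            if s == node_id and e == edge_type]

print(neighbors("cad:part42", "sameAs"))
```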