Bernd Amann | Researchain

Publication

Featured researches published by Bernd Amann.

international conference on big data | 2015

LiteMat: A scalable, cost-efficient inference encoding scheme for large RDF graphs

Olivier Curé; Hubert Naacke; Tendry Randriamalala; Bernd Amann

The number of linked data sources and the size of the linked open data graph keep growing every day. As a consequence, semantic RDF services are more and more confronted with various big data problems. Query processing in the presence of inferences is one of them. For instance, to complete the answer set of SPARQL queries, RDF database systems evaluate semantic RDFS relationships (subPropertyOf, subClassOf) through time-consuming query rewriting algorithms or space-consuming data materialization solutions. To reduce the memory footprint and ease the exchange of large datasets, these systems generally apply a dictionary approach for compressing triple data sizes by replacing resource identifiers (IRIs), blank nodes and literals with integer values. In this article, we present a structured resource identification scheme using a clever encoding of concept and property hierarchies for efficiently evaluating the main common RDFS entailment rules while minimizing triple materialization and query rewriting. We will show how this encoding can be computed by a scalable parallel algorithm and implemented directly over the Apache Spark framework. The efficiency of our encoding scheme is emphasized by an evaluation conducted over both synthetic and real-world datasets.
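The idea of encoding a class hierarchy into integer identifiers can be illustrated with a minimal Python sketch. It is not the LiteMat scheme itself: the abstract does not give the exact bit layout, so the prefix layout below, the function names `encode_hierarchy` and `is_subclass`, and the example hierarchy are all assumptions made for illustration. Each concept receives a bit string that extends its parent's bit string, so a subClassOf check becomes an integer range test instead of a query rewrite or a materialized triple.

```python
def encode_hierarchy(children, root):
    """Map each concept to an integer interval (hypothetical prefix layout).

    A child's bit string is its parent's bit string followed by a local
    identifier. Local identifiers start at 1, so a child never collides
    with its zero-padded parent.
    """
    codes = {root: "1"}
    stack = [root]
    while stack:
        parent = stack.pop()
        subs = children.get(parent, [])
        width = len(subs).bit_length()  # bits needed for local ids 1..len(subs)
        for local_id, child in enumerate(subs, start=1):
            codes[child] = codes[parent] + format(local_id, f"0{width}b")
            stack.append(child)
    total = max(len(code) for code in codes.values())
    # Pad with 0s for the concept id (lower bound) and with 1s for the upper bound.
    return {
        name: (int(code.ljust(total, "0"), 2), int(code.ljust(total, "1"), 2))
        for name, code in codes.items()
    }


def is_subclass(intervals, sub, sup):
    """True if `sub` equals `sup` or is one of its descendants."""
    low, high = intervals[sup]
    return low <= intervals[sub][0] <= high


# Illustrative hierarchy (assumed, not taken from the paper).
hierarchy = {
    "Thing": ["Agent", "Document"],
    "Agent": ["Person", "Organization"],
    "Person": ["Student"],
}
intervals = encode_hierarchy(hierarchy, "Thing")
print(is_subclass(intervals, "Student", "Agent"))
print(is_subclass(intervals, "Document", "Agent"))
```

With such identifiers, a query asking for all instances of Agent can filter on one integer range rather than enumerate every subclass.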

web information systems engineering | 2010

Best-effort refresh strategies for content-based RSS feed aggregation

Roxana Horincar; Bernd Amann; Thierry Artières

During the past several years, RSS-based content syndication has become a standard technique for disseminating information on the web efficiently and in a timely manner. From a data processing perspective, RSS feeds are standard XML resources which are periodically refreshed by feed aggregators to generate continuous streams of items. In this article, we study the problem of information loss in the context of a content-based feed aggregation system and we propose a new best-effort refresh strategy for RSS feeds under limited bandwidth. This strategy is evaluated experimentally and compared to other state-of-the-art crawling strategies for web pages.
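A refresh decision under a bandwidth budget can be sketched as follows. This is a generic greedy heuristic and not the strategy proposed in the article: the scoring rule (expected pending items divided by the number of items a feed document can hold), the field names `rate`, `window` and `last_refresh`, and the function name `pick_feeds_to_refresh` are assumptions. The intuition is that a feed whose document is close to overflowing is the most likely to lose items before the next refresh.

```python
import heapq


def pick_feeds_to_refresh(feeds, now, budget):
    """Return the names of the `budget` feeds with the highest saturation.

    Saturation is the expected number of items published since the last
    refresh, relative to the feed's window size (hypothetical score).
    """
    def saturation(name):
        feed = feeds[name]
        pending = feed["rate"] * (now - feed["last_refresh"])
        return pending / feed["window"]

    return heapq.nlargest(budget, feeds, key=saturation)


# Rates are in items per hour, times in hours (illustrative values).
feeds = {
    "A": {"rate": 10.0, "window": 20, "last_refresh": 0.0},
    "B": {"rate": 1.0, "window": 20, "last_refresh": 0.0},
    "C": {"rate": 5.0, "window": 5, "last_refresh": 0.0},
}
print(pick_feeds_to_refresh(feeds, now=2.0, budget=2))
```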

international conference on web engineering | 2012

Online change estimation models for dynamic web resources: a case-study of RSS feed refresh strategies

Roxana Horincar; Bernd Amann; Thierry Artières

Modern web 2.0 applications have transformed the Internet into an interactive, dynamic and alive information space. Personal weblogs, commercial web sites, news portals and social media applications generate highly dynamic information streams which have to be propagated to millions of users. This article focuses on the problem of estimating the publication frequency of highly dynamic web resources. We illustrate the importance of developing efficient online estimation techniques for improving the refresh strategies of RSS feed aggregators like Google Reader [8], Datasift [7] or Roses [11]. We study the temporal publication characteristics of a large collection of real-world RSS feeds and we define and evaluate several online estimation methods in combination with different refresh strategies. We show the benefit of using periodic source publication patterns for change estimation and we highlight the challenges imposed by the application context.
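An online estimator updates its estimate at each refresh instead of storing the full publication history. The sketch below uses an exponentially weighted moving average, a standard technique chosen here for illustration. The abstract does not name the estimation methods studied in the article, so the class name `OnlineRateEstimator` and the smoothing parameter `alpha` are assumptions.

```python
class OnlineRateEstimator:
    """Exponentially weighted estimate of a feed's publication rate."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha  # weight of the most recent observation (assumed value)
        self.rate = None    # items per hour, unknown before the first refresh

    def update(self, elapsed_hours, new_items):
        """Fold the outcome of one refresh into the estimate and return it."""
        if elapsed_hours <= 0:
            raise ValueError("elapsed_hours must be positive")
        observed = new_items / elapsed_hours
        if self.rate is None:
            self.rate = observed
        else:
            self.rate = self.alpha * observed + (1 - self.alpha) * self.rate
        return self.rate


estimator = OnlineRateEstimator(alpha=0.5)
estimator.update(elapsed_hours=2, new_items=4)
print(estimator.update(elapsed_hours=1, new_items=4))
```

Keeping one such estimator per hour of the day would be one way to capture periodic publication patterns, at the cost of slower convergence for each bucket.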

World Wide Web | 2015

Online refresh strategies for content based feed aggregation

Roxana Horincar; Bernd Amann; Thierry Artières

With the rapid growth of data sources, services and devices connected to the Internet, online available web content is getting more and more diverse and dynamic. In order to facilitate the efficient dissemination of evolving and temporary information, many web applications publish their new information as RSS and Atom documents which are then collected and transformed by RSS aggregators like Feedly or Yahoo! News. This article addresses the particular issue of large scale aggregation of highly dynamic information sources by focusing on the design of optimal refresh strategies for large collections of RSS feed documents. First, we introduce two quality measures specific to RSS aggregation which reflect the information completeness and average freshness of the result feeds. Then, we propose a best effort feed refresh strategy that achieves maximum aggregation quality compared with all other existing policies with the same average number of refreshes. This strategy is based on specific online change estimation models developed after a deep analysis of the temporal publication characteristics of a representative collection of real-world RSS feeds. The presented methods have been implemented and tested against synthetic and real-world RSS feed data sets.

international conference on management of data | 2017

SPARQL Graph Pattern Processing with Apache Spark

Hubert Naacke; Bernd Amann; Olivier Curé

A common way to achieve scalability for processing SPARQL queries is to choose MapReduce frameworks like Hadoop or Spark. Processing basic graph pattern (BGP) expressions generating large join plans over distributed data partitions is a major challenge in these frameworks. In this article, we study the use of two distributed join algorithms, partitioned join and broadcast join, for the evaluation of BGP expressions on top of Apache Spark. We compare five possible implementations and illustrate the importance of carefully choosing the physical data storage layer and of being able to use both join algorithms in order to take existing data partitioning schemes into account efficiently. Our experiments with different SPARQL benchmarks over real-world and synthetic workloads emphasize that hybrid join plans introduce more flexibility and often achieve better performance than single-kind join plans.
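The difference between the two join algorithms can be shown without a cluster. The following Python sketch simulates partitions as plain lists and is not the Spark implementation evaluated in the article; the function names and the sample data are assumptions. A broadcast join copies the whole (small) right-hand relation to every left partition, while a partitioned join redistributes both relations by join key so that matching rows meet in the same partition.

```python
from collections import defaultdict


def local_hash_join(left, right):
    """Join two lists of (key, value) pairs on their key."""
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    return [(key, lv, rv) for key, lv in left for rv in index.get(key, [])]


def broadcast_join(left_partitions, right_partitions):
    """Send a full copy of the right relation to every left partition."""
    broadcast = [row for part in right_partitions for row in part]
    return [local_hash_join(part, broadcast) for part in left_partitions]


def partitioned_join(left_partitions, right_partitions, n):
    """Repartition both relations by key hash, then join partition-wise."""
    def shuffle(partitions):
        out = [[] for _ in range(n)]
        for part in partitions:
            for key, value in part:
                out[hash(key) % n].append((key, value))
        return out

    pairs = zip(shuffle(left_partitions), shuffle(right_partitions))
    return [local_hash_join(left, right) for left, right in pairs]


def flatten(partitions):
    return sorted(row for part in partitions for row in part)


# Two triple patterns sharing a subject variable (illustrative data).
left = [[("s1", "a"), ("s2", "b")], [("s3", "c")]]
right = [[("s1", "x")], [("s3", "y"), ("s3", "z")]]
print(flatten(broadcast_join(left, right)))
print(flatten(partitioned_join(left, right, n=2)))
```

Both calls return the same rows. The trade-off lies in data movement: the broadcast variant leaves the left relation in place, which pays off when the right relation is small or the left one is already partitioned on another key.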

OTM Confederated International Conferences "On the Move to Meaningful Internet Systems" | 2014

Provenance-Based Quality Assessment and Inference in Data-Centric Workflow Executions

Clément Caron; Bernd Amann; Camelia Constantin; Patrick Giroux; André Santanchè

In this article we present a rule-based quality model for data centric workflows. The goal is to build a tool assisting workflow designers and users in annotating, exploring and improving the quality of data produced by complex media mining workflow executions. Our approach combines an existing fine-grained provenance generation approach [3] with a new quality assessment model for annotating XML fragments with data/application-specific quality values and inferring new values from existing annotations and provenance dependencies. We define the formal semantics using an appropriate fixpoint operator and illustrate how it can be implemented using standard Jena inference rules provided by current semantic web infrastructures.
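Inference by fixpoint means applying the rules repeatedly until no new quality value is derived. The Python sketch below illustrates this with a single hypothetical rule (a derived data item inherits the lowest quality value among the items it depends on) over an acyclic provenance graph. It does not reproduce the article's quality model or its Jena rules; the function name `infer_quality` and the sample values are assumptions.

```python
def infer_quality(annotations, depends_on):
    """Propagate quality values along provenance links until nothing changes.

    annotations: explicit quality values, kept unchanged.
    depends_on: maps a derived item to the items it was computed from
                (assumed to form an acyclic graph).
    """
    quality = dict(annotations)
    changed = True
    while changed:
        changed = False
        for item, inputs in depends_on.items():
            if item in annotations or not inputs:
                continue
            if all(source in quality for source in inputs):
                value = min(quality[source] for source in inputs)
                if quality.get(item) != value:
                    quality[item] = value
                    changed = True
    return quality


annotations = {"a": 0.9, "b": 0.6}
provenance = {"d": ["c"], "c": ["a", "b"]}
print(infer_quality(annotations, provenance))
```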

web information systems engineering | 2007

Collaborative cache based on path scores

Bernd Amann; Camelia Constantin

Large-scale distributed data integration systems have to deal with important query processing costs which are essentially due to the high communication overload between data peers. Caching techniques can drastically reduce processing and communication cost. We propose a new distributed caching strategy that reduces redundant caching decisions of individual peers. We estimate cache redundancy by a distributed algorithm without additional messages. Our simulation experiments show that considering redundancy scores can drastically reduce distributed query execution costs.

Archive | 2018

Distributed SPARQL Query Processing: a Case Study with Apache Spark

Bernd Amann; Olivier Curé; Hubert Naacke

This chapter focuses on the problem of evaluating SPARQL queries over large resource description framework (RDF) datasets. RDF data graphs can be produced without a predefined schema and SPARQL allows querying both schema and instance information simultaneously. The chapter presents the challenges and solutions for efficiently processing SPARQL queries and in particular basic graph pattern (BGP) expressions. The main challenge in processing complex graph pattern queries is to optimize the join operations which dominate the cost of all other operators. The chapter introduces the specific solution using the MapReduce framework for processing SPARQL graph patterns. It describes the use of Apache Spark and explains the importance of the physical data layers for the query performance. Spark SQL translates a SQL query into an algebraic expression composed of DataFrame (DF) operators such as selection, projection and join.

Archive | 2007

Web services: Technology issues and foundations

Bernd Amann; Salima Benbernou; Benjamin Nguyen

Unlike traditional applications, which depend upon a tight interconnection of all program elements, Web service applications are composed of loosely coupled, autonomous and independent services published on the Web. In this chapter, we first introduce the concept of service-oriented computing (SOC) on the Web and the current standards enabling the definition and publication of Web services. This technology’s next evolution is to facilitate the creation and maintenance of Web applications. This can be achieved by exploiting the self-descriptive nature of Web services combined with more powerful models and languages for composing Web services. A second objective of this chapter is to illustrate the complexity of the Web service composition problem and to provide a representative overview of the existing approaches. The chapter concludes with a short presentation of two research projects exploiting and extending the Web service paradigm.

arXiv: Databases | 2015