Julien Leblay | Researchain

Publication

Featured research published by Julien Leblay.

very large data bases | 2011

View selection in Semantic Web databases

François Goasdoué; Konstantinos Karanasos; Julien Leblay; Ioana Manolescu

We consider the setting of a Semantic Web database, containing both explicit data encoded in RDF triples, and implicit data, implied by the RDF semantics. Based on a query workload, we address the problem of selecting a set of views to be materialized in the database, minimizing a combination of query processing, view storage, and view maintenance costs. Starting from an existing relational view selection method, we devise new algorithms for recommending view sets, and show that they scale significantly beyond the existing relational ones when adapted to the RDF context. To account for implicit triples in query answers, we propose a novel RDF query reformulation algorithm and an innovative way of incorporating it into view selection in order to avoid a combinatorial explosion in the complexity of the selection process. The interest of our techniques is demonstrated through a set of experiments.

very large data bases | 2015

Querying with access patterns and integrity constraints

Michael Benedikt; Julien Leblay; Efthymia Tsamoura

Traditional query processing involves a search for plans formed by applying algebraic operators on top of primitives representing access to relations in the input query. But many querying scenarios involve two interacting issues that complicate the search. On the one hand, the search space may be limited by access restrictions associated with the interfaces to datasources, which require certain parameters to be given as inputs. On the other hand, the search space may be extended through the presence of integrity constraints that relate sources to each other, allowing for plans that do not match the structure of the user query. In this paper we present the first optimization approach that attacks both these difficulties within a single framework, presenting a system in which classical cost-based join optimization is extended to support both access-restrictions and constraints. Instead of iteratively exploring subqueries of the input query, our optimizer explores a space of proofs that witness the answering of the query, where each proof has a direct correspondence with a query plan.

international conference on management of data | 2013

Fact checking and analyzing the web

François Goasdoué; Konstantinos Karanasos; Yannis Katsis; Julien Leblay; Ioana Manolescu; Stamatis Zampetakis

Fact checking and data journalism are currently strong trends. The sheer amount of data at hand makes it difficult even for trained professionals to spot biased, outdated or simply incorrect information. We propose to demonstrate FactMinder, a fact checking and analysis assistance application. SIGMOD attendees will be able to analyze documents using FactMinder and experience how background knowledge and open data repositories help build insightful overviews of current topics.

very large data bases | 2013

Growing triples on trees: an XML-RDF hybrid model for annotated documents

François Goasdoué; Konstantinos Karanasos; Yannis Katsis; Julien Leblay; Ioana Manolescu; Stamatis Zampetakis

Since the beginning of the Semantic Web initiative, significant efforts have been invested in finding efficient ways to publish, store, and query metadata on the Web. RDF and SPARQL have become the standard data model and query language, respectively, to describe resources on the Web. Large amounts of RDF data are now available either as stand-alone datasets or as metadata over semi-structured (typically XML) documents. The ability to apply RDF annotations over XML data emphasizes the need to represent and query data and metadata simultaneously. We propose XR, a novel hybrid data model capturing the structural aspects of XML data and the semantics of RDF, also enabling us to reason about XML data. Our model is general enough to describe pure XML or RDF datasets, as well as RDF-annotated XML data, where any XML node can act as a resource. This data model comes with the XRQ query language that combines features of both XQuery and SPARQL. To demonstrate the feasibility of this hybrid XML-RDF data management setting, and to validate its interest, we have developed an XR platform on top of well-known data management systems for XML and RDF. In particular, the platform features several XRQ query processing algorithms, whose performance is experimentally compared.

conference on information and knowledge management | 2010

RDFViewS: a storage tuning wizard for RDF applications

François Goasdoué; Konstantinos Karanasos; Julien Leblay; Ioana Manolescu

In recent years, the significant growth of RDF data used in numerous applications has made its efficient and scalable manipulation an important issue. In this paper, we present RDFViewS, a system capable of choosing the most suitable views to materialize, in order to minimize the query response time for a specific SPARQL query workload, while taking into account the view maintenance cost and storage space constraints. Our system employs practical algorithms and heuristics to navigate through the search space of potential view configurations, and exploits the possibly available semantic information - expressed via an RDF Schema - to ensure the completeness of the query evaluation.

Archive | 2016

Generating Plans from Proofs: The Interpolation-based Approach to Query Reformulation

Michael Benedikt; Julien Leblay; Balder ten Cate; Efthymia Tsamoura

Query reformulation refers to a process of translating a source query (a request for information in some high-level logic-based language) into a target plan that abides by certain interface ...

very large data bases | 2014

PDQ: proof-driven query answering over web-based data

Michael Benedikt; Julien Leblay; Efthymia Tsamoura

The data needed to answer queries is often available through Web-based APIs. Indeed, for a given query there may be many Web-based sources which can be used to answer it, with the sources overlapping in their vocabularies, and differing in their access restrictions (required arguments) and cost. We introduce PDQ (Proof-Driven Query Answering), a system for determining a query plan in the presence of web-based sources. It is: (i) constraint-aware -- exploiting relationships between sources to rewrite an expensive query into a cheaper one, (ii) access-aware -- abiding by any access restrictions known in the sources, and (iii) cost-aware -- making use of any cost information that is available about services. PDQ takes the novel approach of generating query plans from proofs that a query is answerable. We demonstrate the use of PDQ and its effectiveness in generating low-cost plans.
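
To make the notion of access restrictions (required arguments) concrete, here is a minimal sketch. It is not PDQ's proof-based planner: it only orders the atoms of a conjunctive query so that each source is called after its required inputs are bound. The relations `Directory` and `Profile`, the `ACCESS` table, and the function `executable_order` are hypothetical names introduced for illustration.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Atom(NamedTuple):
    relation: str
    args: Tuple[str, ...]  # variable names only; constants are omitted for brevity


# Hypothetical access methods: relation -> argument positions that must be bound.
ACCESS: Dict[str, Tuple[int, ...]] = {
    "Directory": (),   # Directory(id) can be scanned freely
    "Profile": (0,),   # Profile(id, name) requires an id as input
}


def executable_order(atoms: List[Atom],
                     access: Dict[str, Tuple[int, ...]]) -> Optional[List[Atom]]:
    """Greedily order atoms so that each one's required inputs are already bound.

    Returns the ordered atoms, or None if no executable order exists.
    Greedy choice is safe here because the set of bound variables only grows.
    """
    bound = set()
    remaining = list(atoms)
    plan = []
    while remaining:
        for atom in remaining:
            required = access.get(atom.relation)
            if required is not None and all(atom.args[i] in bound for i in required):
                plan.append(atom)
                bound.update(atom.args)  # outputs of this call become available
                remaining.remove(atom)
                break
        else:
            return None  # no remaining atom can be called with current bindings
    return plan


query = [Atom("Profile", ("x", "n")), Atom("Directory", ("x",))]
plan = executable_order(query, ACCESS)
```

In this sketch `Directory` is called first because `Profile` cannot be accessed until `x` is bound. A constraint-aware and cost-aware planner, as the abstract describes, would additionally consider alternative sources and their costs.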

Proceedings of the 4th International Workshop on Semantic Web Information Management | 2012

SPARQL query answering with bitmap indexes

Julien Leblay

When querying RDF data, one may use reasoning to reach intensional data, i.e., data defined by sets of rules. This is usually achieved through forward chaining, with space and maintenance overheads, or backward chaining, with high query evaluation and optimization costs. Recent approaches rely on pre-computing the terminological closure of the data rather than the full saturation. In this setting, one can even query the data without resorting to backward chaining, using a so-called semantic index. However, these techniques are limited in the type of queries they can support. In this paper, we introduce a data storage technique which mitigates the space issues of forward chaining. We show that it can also be used with a semantic index. We propose a new structure for the index that relies on bitmaps, making it resilient to updates. Our experimental results demonstrate that our storage model significantly reduces the space required to store the data. We show that the indexes can be computed quickly and fit well in memory even for very large ontologies. Finally, we analyze how query answering is affected by the data layout.
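
As a rough illustration of answering a type query from a pre-computed terminological closure, the sketch below encodes the subclass hierarchy as bitmaps (plain Python integers). It is an assumption-laden toy, not the index structure proposed in the paper; `build_index`, `instances_of`, and the sample classes are hypothetical.

```python
def build_index(subclass_of):
    """Build bitmaps from a dict mapping each class to its direct superclasses.

    Returns (bit, covers): bit[c] is the single-bit mask of class c, and
    covers[c] is the mask of c together with all of its transitive subclasses.
    """
    classes = sorted(set(subclass_of) | {s for sups in subclass_of.values() for s in sups})
    bit = {c: 1 << i for i, c in enumerate(classes)}
    covers = dict(bit)
    changed = True
    while changed:  # propagate subclass bits upward until a fixpoint is reached
        changed = False
        for sub, sups in subclass_of.items():
            for sup in sups:
                merged = covers[sup] | covers[sub]
                if merged != covers[sup]:
                    covers[sup] = merged
                    changed = True
    return bit, covers


def instances_of(cls, typed, bit, covers):
    """Return resources whose explicit type is cls or one of its subclasses."""
    return sorted(r for r, t in typed.items() if bit[t] & covers[cls])


subclass_of = {"Student": {"Person"}, "PhDStudent": {"Student"}}
typed = {"alice": "PhDStudent", "bob": "Person", "carol": "Student"}  # explicit types only
bit, covers = build_index(subclass_of)
students = instances_of("Student", typed, bit, covers)  # ['alice', 'carol']
```

Only the explicit type triples are stored; the implicit ones are recovered at query time with a single bitwise AND per resource, which is the general idea behind avoiding both full saturation and backward chaining.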

very large data bases | 2018

ConnectionLens: Finding Connections Across Heterogeneous Data Sources

Camille Chanial; Rédouane Dziri; Helena Galhardas; Julien Leblay; Minh-Huong Le Nguyen; Ioana Manolescu

Nowadays, journalism is facilitated by the existence of large amounts of publicly available digital data sources. In particular, journalists can do investigative work, which typically consists of keyword-based searches over many heterogeneous, independently produced and dynamic data sources, to obtain useful, interconnecting and traceable information. We propose to demonstrate ConnectionLens, a system based on a novel algorithm for keyword search across heterogeneous data sources. Our demonstration scenarios are based on use cases suggested by journalists from the French newspaper Le Monde, with whom we collaborate.

very large data bases | 2018

Computational Fact Checking: A Content Management Perspective

Sylvie Cazalens; Julien Leblay; Philippe Lamarre; Ioana Manolescu; Xavier Tannier

Data journalism designates journalistic work inspired by digital data sources. A particularly popular and active area of data journalism is concerned with fact-checking. The term was born in the journalist community and referred to the process of verifying and ensuring the accuracy of published media content; since 2012, however, it has increasingly focused on the analysis of politics, economy, science, and news content shared in any form, but first and foremost on the Web (social and otherwise). These trends have been noticed by computer scientists working in industry and academia. Thus, a very lively area of digital content management research has taken up these problems and works to propose foundations (models) and algorithms, and to implement them through concrete tools. Our tutorial: (i) Outlines the current state of affairs in the area of digital (or computational) fact-checking in newsrooms, by journalists, NGO workers, scientists and IT companies; (ii) Shows which areas of digital content management research, in particular those relying on the Web, can be leveraged to help fact-checking, and gives a comprehensive survey of efforts in this area; (iii) Highlights ongoing trends, unsolved problems, and areas where we envision future scientific and practical advances.
