Publication


Featured research published by Maribel Acosta.


International Semantic Web Conference | 2011

ANAPSID: an adaptive query processing engine for SPARQL endpoints

Maribel Acosta; Maria-Esther Vidal; Tomas Lampo; Julio Castillo; Edna Ruckhaus

Following the design rules of Linked Data, the number of available SPARQL endpoints that support remote query processing is quickly growing; however, because of the lack of adaptivity, query executions may frequently be unsuccessful. First, fixed plans identified following the traditional optimize-then-execute paradigm may time out as a consequence of endpoint availability. Second, because blocking operators are usually implemented, endpoint query engines are not able to incrementally produce results and may become blocked if data sources stop sending data. We present ANAPSID, an adaptive query engine for SPARQL endpoints that adapts query execution schedulers to data availability and run-time conditions. ANAPSID provides physical SPARQL operators that detect when a source becomes blocked or data traffic is bursty, and opportunistically produce results as quickly as data arrives from the sources. Additionally, ANAPSID operators implement main-memory replacement policies to move previously computed matches to secondary memory while avoiding duplicates. We compared ANAPSID's performance with respect to RDF stores and endpoints, and observed that ANAPSID reduces execution time, in some cases by more than one order of magnitude.
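The non-blocking behavior described above, producing joined results as soon as bindings arrive from either endpoint, is commonly realized with a symmetric hash join. The sketch below is a minimal Python illustration of that operator style, not ANAPSID's actual implementation; the class name and the example bindings are hypothetical.

```python
from collections import defaultdict

class SymmetricHashJoin:
    """Minimal sketch of a non-blocking join: results are emitted as soon as a
    binding from either source finds matches, so neither input blocks the other."""

    def __init__(self, join_var):
        self.join_var = join_var
        self.tables = (defaultdict(list), defaultdict(list))  # one hash table per source

    def insert(self, source, binding):
        """Probe the opposite table first, then insert; yields joined bindings."""
        key = binding[self.join_var]
        own, other = self.tables[source], self.tables[1 - source]
        for match in other[key]:
            yield {**match, **binding}
        own[key].append(binding)

# Usage: interleave bindings as they arrive from two (hypothetical) endpoints.
join = SymmetricHashJoin("?drug")
arrivals = [
    (0, {"?drug": "aspirin", "?label": "Aspirin"}),
    (1, {"?drug": "aspirin", "?target": "COX-1"}),
    (0, {"?drug": "ibuprofen", "?label": "Ibuprofen"}),
]
for source, binding in arrivals:
    for result in join.insert(source, binding):
        print(result)
```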


International Semantic Web Conference | 2013

Crowdsourcing Linked Data Quality Assessment

Maribel Acosta; Amrapali Zaveri; Elena Simperl; Dimitris Kontokostas; Sören Auer; Jens Lehmann

In this paper we look into the use of crowdsourcing as a means to handle Linked Data quality problems that are challenging to solve automatically. We analyzed the most common errors encountered in Linked Data sources and classified them according to the extent to which they are likely to be amenable to a specific form of crowdsourcing. Based on this analysis, we implemented a quality assessment methodology for Linked Data that leverages the wisdom of the crowds in different ways: (i) a contest targeting an expert crowd of researchers and Linked Data enthusiasts, complemented by (ii) paid microtasks published on Amazon Mechanical Turk. We empirically evaluated how this methodology could efficiently spot quality issues in DBpedia. We also investigated how the contributions of the two types of crowds could be optimally integrated into Linked Data curation processes. The results show that the two styles of crowdsourcing are complementary and that crowdsourcing-enabled quality assessment is a promising and affordable way to enhance the quality of Linked Data.
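As a rough illustration of how flagged triples could be dispatched to the two crowdsourcing channels mentioned above, the following sketch batches quality issues by error type. The error categories, the routing policy, and the task format are illustrative assumptions for the sake of the example, not the paper's methodology.

```python
# Sketch: route flagged DBpedia triples to a crowdsourcing channel by error type.
# The categories and the policy below are assumptions made for illustration.

EXPERT_CONTEST = "expert_contest"
MICROTASK = "mturk_microtask"

# Assumed policy: datatype issues are mechanical enough for lay workers,
# everything else goes to the Linked Data expert contest.
ROUTING = {
    "incorrect_object": EXPERT_CONTEST,
    "incorrect_datatype": MICROTASK,
    "incorrect_link": EXPERT_CONTEST,
}

def build_tasks(flagged_triples):
    """Turn (triple, error_type) pairs into per-channel task batches."""
    batches = {EXPERT_CONTEST: [], MICROTASK: []}
    for (s, p, o), error_type in flagged_triples:
        channel = ROUTING.get(error_type, MICROTASK)
        batches[channel].append({
            "subject": s, "predicate": p, "object": o,
            "question": f"Is the value '{o}' correct for {p} of {s}?",
        })
    return batches

flagged = [
    (("dbr:Berlin", "dbo:populationTotal", "3.4"), "incorrect_datatype"),
    (("dbr:Berlin", "dbo:country", "dbr:France"), "incorrect_object"),
]
print(build_tasks(flagged))
```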


International Semantic Web Conference | 2015

Networks of Linked Data Eddies: An Adaptive Web Query Processing Engine for RDF Data

Maribel Acosta; Maria-Esther Vidal

Client-side query processing techniques that rely on the materialization of fragments of the original RDF dataset provide a promising solution for Web query processing. However, because of unexpected data transfers, the traditional optimize-then-execute paradigm used by existing approaches is not always applicable in this context, i.e., the performance of client-side execution plans can be negatively affected by live conditions in which the rate at which data arrives from the sources changes. We tackle adaptivity for client-side query processing and present a network of Linked Data Eddies that is able to adjust query execution schedulers to data availability and runtime conditions. Experimental studies suggest that the network of Linked Data Eddies outperforms static Web query schedulers in scenarios with unpredictable transfer delays and data distributions.
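Eddy-style adaptive routing, which the approach above builds on, sends each tuple through the operators in an order chosen at runtime from observed statistics rather than from a fixed plan. A minimal sketch of that routing loop follows; the operators and the selectivity-based policy are illustrative and much simpler than the network of Linked Data Eddies itself.

```python
# Sketch of eddy-style adaptive routing: each tuple must pass every operator,
# and the eddy picks the next operator per tuple based on observed selectivity,
# so the routing order adapts as runtime statistics change.

class Operator:
    def __init__(self, name, predicate):
        self.name, self.predicate = name, predicate
        self.seen, self.passed = 0, 0

    def apply(self, tuple_):
        self.seen += 1
        ok = self.predicate(tuple_)
        self.passed += ok
        return ok

    def pass_rate(self):
        # Optimistic prior of 1.0 before any tuple has been routed here.
        return self.passed / self.seen if self.seen else 1.0

def eddy(tuples, operators):
    for t in tuples:
        pending = list(operators)
        alive = True
        while pending and alive:
            # Route to the most selective operator observed so far.
            op = min(pending, key=lambda o: o.pass_rate())
            pending.remove(op)
            alive = op.apply(t)
        if alive:
            yield t

ops = [Operator("type=City", lambda t: t["type"] == "City"),
       Operator("pop>1M", lambda t: t["pop"] > 1_000_000)]
data = [{"type": "City", "pop": 3_600_000}, {"type": "River", "pop": 0}]
print(list(eddy(data, ops)))
```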


International World Wide Web Conference | 2014

WikiWho: precise and efficient attribution of authorship of revisioned content

Fabian Flöck; Maribel Acosta

Revisioned text content is present in numerous collaboration platforms on the Web, most notably wikis. Tracking the authorship of text tokens in such systems has many potential applications, such as identifying the main authors for licensing reasons or tracing collaborative writing patterns over time. In this context, two main challenges arise. First, an authorship tracking system must be precise in its attributions to be reliable for further processing. Second, it has to run efficiently even on very large datasets, such as Wikipedia. As a solution, we propose a graph-based model to represent revisioned content and an algorithm over this model that tackles both issues effectively. We describe the optimal implementation and design choices when tuning it to a wiki environment. We further present a gold standard of 240 tokens from English Wikipedia articles annotated with their origin. This gold standard was created manually and confirmed by multiple independent users of a crowdsourcing platform. It is the first gold standard of this kind and quality, and our solution achieves an average of 95% precision on this dataset. We also perform the first precision evaluation of the state-of-the-art algorithm for the task, exceeding it by over 10% on average. Our approach outperforms the execution time of the state of the art by one order of magnitude, as we demonstrate on a sample of over 240 English Wikipedia articles. We argue that the roughly 10% increase in the size of an optional materialization of our results compared to the baseline is a favorable trade-off, given the large advantage in runtime performance.
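A greatly simplified view of token-level authorship attribution is to diff consecutive revisions and let matching tokens inherit their origin revision. The sketch below shows only that idea; it is not WikiWho's graph-based model, and the example revisions are made up.

```python
# Simplified token-attribution sketch (not WikiWho's graph model): tokens of a
# new revision that match tokens of the previous revision keep their origin
# revision; newly inserted tokens are attributed to the new revision.
from difflib import SequenceMatcher

def attribute(revisions):
    """revisions: list of (rev_id, text). Returns list of (token, origin_rev)."""
    attributed = []  # tokens of the latest processed revision with their origins
    for rev_id, text in revisions:
        tokens = text.split()
        prev_tokens = [tok for tok, _ in attributed]
        matcher = SequenceMatcher(a=prev_tokens, b=tokens, autojunk=False)
        new_attr = [(tok, rev_id) for tok in tokens]  # default: new in this revision
        for block in matcher.get_matching_blocks():
            for offset in range(block.size):
                # Matched tokens inherit the origin recorded for the old revision.
                new_attr[block.b + offset] = attributed[block.a + offset]
        attributed = new_attr
    return attributed

history = [
    ("r1", "the quick brown fox"),
    ("r2", "the quick red fox jumps"),
]
print(attribute(history))
# [('the', 'r1'), ('quick', 'r1'), ('red', 'r2'), ('fox', 'r1'), ('jumps', 'r2')]
```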


Extended Semantic Web Conference | 2012

DEFENDER: A DEcomposer for quEries agaiNst feDERations of Endpoints

Gabriela Montoya; Maria-Esther Vidal; Maribel Acosta

We present DEFENDER and illustrate the benefits of identifying promising query decompositions and efficient plans that combine results from federations of SPARQL endpoints. DEFENDER is a query decomposer that implements a two-fold approach. First, triple patterns in a SPARQL query are decomposed into simple sub-queries that can be completely executed on one endpoint. Second, sub-queries are combined into a feasible bushy tree plan in which the number of joins is maximized and the height of the tree is minimized. We demonstrate DEFENDER and compare its performance with respect to state-of-the-art RDF engines for queries of diverse complexity, networks with different delays, and datasets distributed differently among a variety of endpoints.
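The two-fold approach can be sketched as follows: triple patterns are grouped into one subquery per endpoint able to answer them, and the subqueries are then paired bottom-up into a bushy join tree. The predicate-to-endpoint index and the query below are hypothetical, and the pairing heuristic is a simplification of DEFENDER's plan construction.

```python
# Sketch of the two steps under simplifying assumptions: (1) group triple patterns
# by the single endpoint able to answer them (via a hypothetical predicate->endpoint
# index), (2) pair the resulting subqueries level by level into a bushy join tree.

def decompose(triple_patterns, predicate_index):
    """Group triple patterns into one subquery per endpoint."""
    subqueries = {}
    for s, p, o in triple_patterns:
        endpoint = predicate_index[p]  # assumed: exactly one endpoint per predicate
        subqueries.setdefault(endpoint, []).append((s, p, o))
    return [(endpoint, patterns) for endpoint, patterns in subqueries.items()]

def bushy_plan(subqueries):
    """Pair subqueries level by level, producing a balanced (bushy) join tree."""
    level = subqueries
    while len(level) > 1:
        level = [("JOIN", level[i], level[i + 1]) if i + 1 < len(level) else level[i]
                 for i in range(0, len(level), 2)]
    return level[0]

index = {"dbo:director": "http://dbpedia.org/sparql",
         "foaf:name": "http://example.org/people/sparql",
         "dbo:birthPlace": "http://dbpedia.org/sparql"}
patterns = [("?film", "dbo:director", "?d"),
            ("?d", "foaf:name", "?name"),
            ("?d", "dbo:birthPlace", "?city")]
print(bushy_plan(decompose(patterns, index)))
```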


International Conference on Knowledge Capture | 2015

HARE: A Hybrid SPARQL Engine to Enhance Query Answers via Crowdsourcing

Maribel Acosta; Elena Simperl; Fabian Flöck; Maria-Esther Vidal

Due to the semi-structured nature of RDF data, missing values affect the answer completeness of queries posed against RDF. To overcome this limitation, we present HARE, a novel hybrid query processing engine that brings together machine and human computation to execute SPARQL queries. We propose a model that exploits the characteristics of RDF in order to estimate the completeness of portions of a dataset. The completeness model, complemented by crowd knowledge, is used by the HARE query engine to decide on the fly which parts of a query should be executed against the dataset or via crowd computing. To evaluate HARE, we created and executed a collection of 50 SPARQL queries against the DBpedia dataset. Experimental results clearly show that our solution accurately enhances answer completeness.
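The following sketch gives one plausible, simplified reading of a completeness-driven routing decision: a subject's completeness for a predicate is estimated from its value count relative to the dataset-wide average, and patterns falling below a threshold are sent to the crowd. The estimator, the threshold, and the data are assumptions for illustration, not HARE's actual model.

```python
# Simplified multiplicity-based completeness estimate: the ratio of a subject's
# value count for a predicate to the average count observed over the dataset.
# Triple patterns whose estimate falls below a threshold are routed to the crowd.
from collections import defaultdict
from statistics import mean

def completeness(triples):
    """Return {(subject, predicate): estimated completeness in [0, 1]}."""
    counts = defaultdict(int)
    per_predicate = defaultdict(list)
    for s, p, _ in triples:
        counts[(s, p)] += 1
    for (s, p), c in counts.items():
        per_predicate[p].append(c)
    return {(s, p): min(1.0, c / mean(per_predicate[p]))
            for (s, p), c in counts.items()}

def route(pattern, estimates, threshold=0.75):
    """Decide whether a (subject, predicate, ?var) pattern goes to the crowd."""
    s, p, _ = pattern
    return "crowd" if estimates.get((s, p), 0.0) < threshold else "dataset"

triples = [("dbr:Inception", "dbo:starring", "dbr:Leonardo_DiCaprio"),
           ("dbr:Inception", "dbo:starring", "dbr:Ellen_Page"),
           ("dbr:Memento", "dbo:starring", "dbr:Guy_Pearce")]
est = completeness(triples)
print(route(("dbr:Memento", "dbo:starring", "?actor"), est))    # "crowd"
print(route(("dbr:Inception", "dbo:starring", "?actor"), est))  # "dataset"
```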


Transactions on Large-Scale Data- and Knowledge-Centered Systems XXV - Volume 9620 | 2015

On the Selection of SPARQL Endpoints to Efficiently Execute Federated SPARQL Queries

Maria-Esther Vidal; Simón Castillo; Maribel Acosta; Gabriela Montoya; Guillermo Palma

We consider the problem of source selection and query decomposition in federations of SPARQL endpoints, where query decompositions of a SPARQL query should reduce execution time and maximize answer completeness. This problem is in general intractable, and the performance and answer completeness of SPARQL queries can be considerably affected as the number of SPARQL endpoints in a federation increases. We devise a formalization of this problem as the Vertex Coloring Problem and propose an approximate algorithm named Fed-DSATUR. We rely on existing results from graph theory to characterize the family of SPARQL queries for which Fed-DSATUR can produce optimal decompositions in polynomial time in the size of the query, i.e., in the number of SPARQL triple patterns in the query. Fed-DSATUR scales up much better to SPARQL queries with a large number of triple patterns, and may exhibit significant improvements in performance while answer completeness remains close to 100%. More importantly, we put our results in perspective and provide evidence of SPARQL queries that are hard to decompose and constitute new challenges for data management.
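Fed-DSATUR builds on the classic DSATUR greedy coloring heuristic. The sketch below shows plain DSATUR on a small conflict graph, where each color would correspond to one subquery of the decomposition; the edge semantics assumed here (pairs of patterns that must not share a subquery) is a simplification of the paper's graph construction.

```python
# Classic DSATUR greedy coloring. Vertices stand for triple patterns; an edge
# means two patterns must not be placed in the same subquery (simplified
# semantics). Each color then corresponds to one subquery.

def dsatur(vertices, edges):
    neighbors = {v: set() for v in vertices}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    colors = {}
    while len(colors) < len(vertices):
        uncolored = [v for v in vertices if v not in colors]
        # Saturation degree: number of distinct colors among already-colored neighbors.
        def saturation(v):
            return len({colors[n] for n in neighbors[v] if n in colors})
        v = max(uncolored, key=lambda u: (saturation(u), len(neighbors[u])))
        used = {colors[n] for n in neighbors[v] if n in colors}
        colors[v] = next(c for c in range(len(vertices)) if c not in used)
    return colors

# Four triple patterns t1..t4; edges mark pairs that cannot share a subquery.
patterns = ["t1", "t2", "t3", "t4"]
conflicts = [("t1", "t2"), ("t2", "t3"), ("t1", "t3")]
print(dsatur(patterns, conflicts))  # {'t1': 0, 't2': 1, 't3': 2, 't4': 0}
```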


Semantic Web Journal | 2016

Detecting Linked Data quality issues via crowdsourcing: A DBpedia study

Maribel Acosta; Amrapali Zaveri; Elena Simperl; Dimitris Kontokostas; Fabian Flöck; Jens Lehmann

In this paper we examine the use of crowdsourcing as a means to master Linked Data quality problems that are difficult to solve automatically. We base our approach on an analysis of the most common errors encountered in Linked Data sources and a classification of these errors according to the extent to which they are likely to be amenable to crowdsourcing. We then propose and compare different crowdsourcing approaches to identify these Linked Data quality issues, employing the DBpedia dataset as our use case: (i) a contest targeting the Linked Data expert community, and (ii) paid microtasks published on Amazon Mechanical Turk. Secondly, we focus on adapting the Find-Fix-Verify crowdsourcing pattern to exploit the strengths of experts and lay workers. By testing two distinct Find-Verify workflows (lay users only, and experts verified by lay users) we reveal how to best combine the complementary aptitudes of different crowds in quality issue detection. The results show that a combination of the two styles of crowdsourcing is likely to achieve more efficient results than either of them used in isolation, and that human computation is a promising and affordable way to enhance the quality of Linked Data.
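A Find-Verify workflow of the kind compared above can be sketched as candidate issues from a Find stage being confirmed by vote in a Verify stage. The data layout and the simple majority rule below are illustrative assumptions, not the exact aggregation used in the study.

```python
# Sketch of a Find-Verify stage: "Find" contributors flag candidate quality
# issues, and "Verify" workers vote on each candidate; issues confirmed by a
# majority of at least min_votes ballots survive.

def verify(candidates, votes, min_votes=3):
    """candidates: issue ids found in the Find stage.
    votes: {issue_id: [True/False, ...]} collected in the Verify stage."""
    confirmed = []
    for issue in candidates:
        ballot = votes.get(issue, [])
        if len(ballot) >= min_votes and sum(ballot) > len(ballot) / 2:
            confirmed.append(issue)
    return confirmed

found = ["dbr:Berlin#populationTotal", "dbr:Paris#country"]
ballots = {
    "dbr:Berlin#populationTotal": [True, True, False],
    "dbr:Paris#country": [False, False, True],
}
print(verify(found, ballots))  # ['dbr:Berlin#populationTotal']
```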


Handbook of Human Computation | 2013

Knowledge Engineering via Human Computation

Elena Simperl; Maribel Acosta; Fabian Flöck

In this chapter, we will analyze a number of essential knowledge engineering activities that, for technical or principled reasons, can hardly be optimally executed through automatic processing approaches, thus remaining heavily reliant on human intervention. Human computation methods can be applied to this field in order to overcome these limitations in terms of accuracy, while still being able to fully take advantage of the scalability and performance of machine-driven capabilities. For each activity, we will explain how this symbiosis can be achieved by giving a short overview of the state of the art and several examples of systems and applications such as games-with-a-purpose, microtask crowdsourcing projects, and community-driven collaborative initiatives that showcase the benefits of the general idea.


On the Move to Meaningful Internet Systems (OTM) | 2011

CAREY: ClimAtological contRol of EmergencY regions

Maribel Acosta; Marlene Goncalves; Maria-Esther Vidal

Nowadays, climate change is impacting life on Earth; ecological effects such as the warming of sea-surface temperatures, catastrophic events such as storms or mudslides, and the increase of infectious diseases are affecting life and development. Unfortunately, experts predict that global temperatures will increase even more during the next years; thus, to decide how to assist potentially affected people, experts require tools that help them discover potentially risky regions based on their weather conditions. We address this problem and propose a tool able to support experts in the discovery of these risky areas. We present CAREY, a federated tool built on top of a weather database that implements a semi-supervised data mining approach to discover regions with similar weather observations, which may characterize micro-climate zones. Additionally, Top-k Skyline techniques have been developed to rank micro-climate areas according to how close they are to a given weather condition of risk. We conducted an initial experimental study as a proof of concept, and the preliminary results suggest that CAREY may provide effective support for the visualization of potentially risky areas.
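The Top-k Skyline ranking mentioned above can be sketched as follows: regions are compared by how far each weather dimension deviates from a given risk condition, non-dominated regions form the skyline, and the k skyline members with the smallest total deviation are returned. The dimensions, risk condition, and regions below are hypothetical.

```python
# Sketch of a Top-k Skyline over weather deviations from a risk condition.

def deviations(region, risk):
    return tuple(abs(region[d] - risk[d]) for d in risk)

def dominates(a, b):
    """a dominates b if it is at least as close on every dimension and closer on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def top_k_skyline(regions, risk, k=2):
    devs = {name: deviations(vals, risk) for name, vals in regions.items()}
    skyline = [n for n in devs
               if not any(dominates(devs[m], devs[n]) for m in devs if m != n)]
    return sorted(skyline, key=lambda n: sum(devs[n]))[:k]

risk = {"temperature": 35.0, "rainfall": 200.0}   # hypothetical risk condition
regions = {
    "A": {"temperature": 34.0, "rainfall": 190.0},
    "B": {"temperature": 37.0, "rainfall": 195.0},
    "C": {"temperature": 20.0, "rainfall": 50.0},
}
print(top_k_skyline(regions, risk))  # ['B', 'A']
```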

Collaboration


Dive into Maribel Acosta's collaborations.

Top Co-Authors

Elena Simperl, University of Southampton
Fabian Flöck, Karlsruhe Institute of Technology
Simón Castillo, Simón Bolívar University
Marlene Goncalves, Simón Bolívar University
Maria Maleshkova, Karlsruhe Institute of Technology
York Sure-Vetter, Center for Information Technology
Guillermo Palma, Simón Bolívar University