Kemele M. Endris | Researchain

Publication

Featured research published by Kemele M. Endris.

web intelligence, mining and semantics | 2016

Are Linked Datasets fit for Open-domain Question Answering? A Quality Assessment

Harsh Thakkar; Kemele M. Endris; José M. Giménez-García; Jeremy Debattista; Christoph Lange; Sören Auer

The current decade is a witness to an enormous explosion of data being published on the Web as Linked Data to maximise its reusability. Answering questions that users speak or write in natural language is an increasingly popular application scenario for Web Data, especially when the domain of the questions is not limited to a domain where dedicated curated datasets exist, like in medicine. The increasing use of Web Data in this and other settings has highlighted the importance of assessing its quality. While considerable work has been done on assessing the quality of Linked Data, only a few efforts have been dedicated to quality assessment of Linked Data from the perspective of the question answering (QA) domain. From the Linked Data quality metrics that have so far been well documented in the literature, we have identified those that are most relevant for QA. We apply these quality metrics, implemented in the Luzzu framework, to subsets of two datasets of crucial importance to open domain QA -- DBpedia and Wikidata -- and thus present the first assessment of the quality of these datasets for QA. From these datasets, we assess slices covering the specific domains of restaurants, politicians, films and soccer players. The results of our experiments suggest that for most of these domains, the quality of Wikidata with regard to the majority of relevant metrics is higher than that of DBpedia.

international conference on web engineering | 2016

Co-evolution of RDF Datasets

Sidra Faisal; Kemele M. Endris; Saeedeh Shekarpour; Sören Auer; Maria-Esther Vidal

Linking Data initiatives have fostered the publication of a large number of RDF datasets in the Linked Open Data (LOD) cloud, as well as the development of query processing infrastructures to access these data in a federated fashion. However, different experimental studies have shown that the availability of LOD datasets cannot always be ensured, so RDF data replication is required for reliable federated query frameworks. Although it enhances data availability, RDF data replication requires synchronization and conflict resolution when replicas and source datasets are allowed to change data over time, i.e., co-evolution management needs to be provided to ensure consistency. In this paper, we tackle the problem of RDF data co-evolution and devise an approach for conflict resolution during co-evolution of RDF datasets. Our proposed approach is property-oriented and allows for exploiting semantics about RDF properties during co-evolution management. The quality of our approach is empirically evaluated in different scenarios on the DBpedia-live dataset. Experimental results suggest that the proposed techniques have a positive impact on the quality of data in source datasets and replicas.

international conference on semantic systems | 2017

SMJoin: A Multi-way Join Operator for SPARQL Queries

Mikhail Galkin; Kemele M. Endris; Maribel Acosta; Diego Collarana; Maria-Esther Vidal; Sören Auer

Join operators are particularly important in SPARQL query engines that collect RDF data using Web access interfaces. State-of-the-art SPARQL query engines rely on binary join operators tailored for merging results from SPARQL queries over Web access interfaces. However, in queries with a large number of triple patterns, binary joins constitute a significant burden on the query performance. Multi-way joins that handle more than two inputs are able to reduce the complexity of pre-processing stages and reduce the execution time. Whereas in the relational databases field multi-way joins have already received some attention, the applicability of multi-way joins in SPARQL query processing remains unexplored. We devise SMJoin, a multi-way non-blocking join operator tailored for independently merging results from more than two RDF data sources. SMJoin implements intra-operator adaptivity, i.e., it is able to adjust join execution schedulers to the conditions of Web access interfaces; thus, query answers are produced as soon as they are computed and can be continuously generated even if one of the sources becomes blocked. We empirically study the behavior of SMJoin in two benchmarks with queries of different selectivity; state-of-the-art SPARQL query engines are included in the study. Experimental results suggest that SMJoin outperforms existing approaches in very selective queries, and produces first answers as fast as the compared adaptive query engines in non-selective queries.
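To make the idea of a non-blocking multi-way join concrete, the following is a minimal sketch of a symmetric hash join generalised to more than two inputs. It is not SMJoin itself: the function name `multiway_symmetric_hash_join`, the round-robin simulation of interleaved arrival, and the restriction to a single shared join variable are all assumptions made for illustration, and the sketch has none of the adaptive scheduling the abstract describes.

```python
from collections import defaultdict
from itertools import product, zip_longest


def multiway_symmetric_hash_join(sources, join_var):
    """Yield merged bindings as soon as every source has a match for a key.

    `sources` is a list of iterables of dicts (variable -> value); all of
    them are assumed to bind the shared variable `join_var`.
    """
    # One hash table per input, keyed by the value of the join variable.
    tables = [defaultdict(list) for _ in sources]
    missing = object()

    # Round-robin over the inputs to simulate interleaved arrival (assumption).
    for batch in zip_longest(*sources, fillvalue=missing):
        for i, binding in enumerate(batch):
            if binding is missing:
                continue
            key = binding[join_var]
            tables[i][key].append(binding)
            # Probe the other tables; the new binding is fixed at position i,
            # so each combination is produced once, when its last part arrives.
            candidates = [
                [binding] if j == i else tables[j].get(key, [])
                for j in range(len(sources))
            ]
            if all(candidates):
                for combo in product(*candidates):
                    merged = {}
                    for part in combo:
                        merged.update(part)
                    yield merged


# Hypothetical toy bindings, for illustration only.
titles = [{"film": "f1", "title": "A"}, {"film": "f2", "title": "B"}]
directors = [{"film": "f2", "director": "D2"}, {"film": "f1", "director": "D1"}]
years = [{"film": "f1", "year": 2001}]

for row in multiway_symmetric_hash_join([titles, directors, years], "film"):
    print(row)
```

Because results are yielded as soon as the last matching binding arrives, a consumer can start processing answers before any input is exhausted, which is the non-blocking property the abstract refers to.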

international conference on knowledge capture | 2017

Dataset Reuse: An Analysis of References in Community Discussions, Publications and Data

Kemele M. Endris; José M. Giménez-García; Harsh Thakkar; Elena Demidova; Antoine Zimmermann; Christoph Lange; Elena Simperl

Following the Linked Data principles means maximising the reusability of data over the Web. Reuse of datasets can become apparent when datasets are linked to from other datasets, and referred to in scientific articles or community discussions. It can thus be measured, similarly to citations of papers. In this paper we propose dataset reuse metrics and use these metrics to analyse indications of dataset reuse in different communication channels within a scientific community. In particular we consider mailing lists and publications in the Semantic Web community and their correlation with data interlinking. Our results demonstrate that indications of dataset reuse across different communication channels and reuse in terms of data interlinking are positively correlated.

database and expert systems applications | 2017

MULDER: Querying the Linked Data Web by Bridging RDF Molecule Templates

Kemele M. Endris; Mikhail Galkin; Ioanna Lytra; Mohamed Nadjib Mami; Maria-Esther Vidal; Sören Auer

The increasing number of RDF data sources that allow for querying Linked Data via Web services forms the basis for federated SPARQL query processing. Federated SPARQL query engines provide a unified view of a federation of RDF data sources, and rely on source descriptions for selecting the data sources over which unified queries will be executed. Albeit efficient, existing federated SPARQL query engines usually ignore the meaning of data accessible from a data source, and describe sources only in terms of the vocabularies utilized in the data source. Lack of source description may lead to the erroneous selection of data sources for a query, thus affecting the performance of query processing over the federation. We tackle the problem of federated SPARQL query processing and devise MULDER, a query engine for federations of RDF data sources. MULDER describes data sources in terms of RDF molecule templates, i.e., abstract descriptions of entities belonging to the same RDF class. Moreover, MULDER utilizes RDF molecule templates for source selection, and query decomposition and optimization. We empirically study the performance of MULDER on existing benchmarks, and compare MULDER performance with state-of-the-art federated SPARQL query engines. Experimental results suggest that RDF molecule templates empower MULDER federated query processing, and allow for the selection of RDF data sources that not only reduce execution time, but also increase answer completeness.
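The role of source descriptions in source selection can be illustrated with a small sketch. The abstract does not specify how RDF molecule templates are represented or matched, so everything below is an assumption: the `MoleculeTemplate` structure, the rule that a template is relevant when its predicates cover all predicates of a subquery, and the prefixed names and endpoint URLs are hypothetical and do not reflect MULDER's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoleculeTemplate:
    """Hypothetical description of entities of one RDF class at some sources."""
    rdf_class: str
    predicates: frozenset  # predicates observed on instances of the class
    endpoints: frozenset   # SPARQL endpoints that serve such instances


def select_sources(templates, subquery_predicates):
    """Return the endpoints whose templates cover every requested predicate.

    Assumed rule: a template is relevant for a group of triple patterns that
    share a subject only if it describes all of their predicates.
    """
    needed = set(subquery_predicates)
    selected = set()
    for template in templates:
        if needed <= template.predicates:
            selected |= template.endpoints
    return selected


# Hypothetical source descriptions, for illustration only.
templates = [
    MoleculeTemplate(
        "dbo:Film",
        frozenset({"rdfs:label", "dbo:director", "dbo:starring"}),
        frozenset({"http://example.org/films/sparql"}),
    ),
    MoleculeTemplate(
        "dbo:Person",
        frozenset({"rdfs:label", "dbo:birthDate"}),
        frozenset({"http://example.org/people/sparql"}),
    ),
]

print(select_sources(templates, {"rdfs:label", "dbo:director"}))
```

Describing sources only by vocabulary would make both endpoints candidates for `rdfs:label`; matching against a class-level description narrows the selection to the endpoint that can answer the whole subquery.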

database and expert systems applications | 2018

BOUNCER: Privacy-aware Query Processing Over Federations of RDF Datasets

Kemele M. Endris; Zuhair Almhithawi; Ioanna Lytra; Maria-Esther Vidal; Sören Auer

Data provides the basis for emerging scientific and interdisciplinary data-centric applications with the potential of improving the quality of life for citizens. However, effective data-centric applications demand data management techniques able to process a large volume of data which may include sensitive data, e.g., financial transactions, medical procedures, or personal data. Managing sensitive data requires the enforcement of privacy and access control regulations, particularly during the execution of queries against datasets that include sensitive and non-sensitive data. In this paper, we tackle the problem of enforcing privacy regulations during query processing, and propose BOUNCER, a privacy-aware query engine over federations of RDF datasets. BOUNCER allows for the description of RDF datasets in terms of RDF molecule templates, i.e., abstract descriptions of the properties of the entities in an RDF dataset and their privacy regulations. Furthermore, BOUNCER implements query decomposition and optimization techniques able to identify query plans over RDF datasets that not only contain the relevant entities to answer a query, but that are also regulated by policies that allow for accessing these relevant entities. We empirically evaluate the effectiveness of the BOUNCER privacy-aware techniques over state-of-the-art benchmarks of RDF datasets. The observed results suggest that BOUNCER can effectively enforce access control regulations at different granularity without impacting the performance of query processing.

international semantic web conference | 2015