Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Jeremy Debattista is active.

Publication


Featured research published by Jeremy Debattista.


IEEE International Conference on Semantic Computing | 2016

Luzzu -- A Framework for Linked Data Quality Assessment

Jeremy Debattista; Sören Auer; Christoph Lange

The increasing variety of Linked Data on the Web makes it challenging to determine the quality of this data, and subsequently to make this information explicit to data consumers. Despite the availability of a number of tools and frameworks to assess Linked Data Quality, the output of such tools is not suitable for machine consumption, and thus consumers can hardly compare and rank datasets in the order of fitness for use. This paper describes Luzzu, a framework for Linked Data Quality Assessment. Luzzu is based on four major components: (1) an extensible interface for defining new quality metrics, (2) an interoperable, ontology-driven back-end for representing quality metadata and quality problems that can be reused within different semantic frameworks, (3) a scalable stream processor for data dumps and SPARQL endpoints, and (4) a customisable ranking algorithm taking into account user-defined weights. We show that Luzzu scales linearly against the number of triples in a dataset. We also demonstrate the applicability of the Luzzu framework by evaluating and analysing a number of statistical datasets with regard to relevant metrics.
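
To make the extensible metric interface and the single-pass stream processing concrete, here is a minimal illustrative sketch in Python. The class names, method signatures and the toy metric are assumptions made for illustration only; Luzzu's actual plug-in interface is a Java API and is not reproduced here.

    # Illustrative sketch of a pluggable, stream-oriented quality metric
    # (hypothetical names; not Luzzu's actual Java interface).
    from abc import ABC, abstractmethod

    class QualityMetric(ABC):
        """A metric observes triples one at a time and reports a value in [0, 1]."""

        @abstractmethod
        def compute(self, subject, predicate, obj): ...

        @abstractmethod
        def value(self): ...

    class HttpSubjectsMetric(QualityMetric):
        """Toy metric: fraction of subjects that are HTTP(S) IRIs."""

        def __init__(self):
            self.total = 0
            self.http_subjects = 0

        def compute(self, subject, predicate, obj):
            self.total += 1
            if subject.startswith(("http://", "https://")):
                self.http_subjects += 1

        def value(self):
            return self.http_subjects / self.total if self.total else 0.0

    def assess(triple_stream, metrics):
        """Single pass over a (possibly huge) triple stream, feeding every metric."""
        for s, p, o in triple_stream:
            for m in metrics:
                m.compute(s, p, o)
        return {type(m).__name__: m.value() for m in metrics}

Because each metric keeps only constant-size counters, one pass over the stream stays linear in the number of triples, which is consistent with the scalability result reported above.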


Journal of Data and Information Quality | 2016

Luzzu—A Methodology and Framework for Linked Data Quality Assessment

Jeremy Debattista; Sören Auer; Christoph Lange

The increasing variety of Linked Data on the Web makes it challenging to determine the quality of this data and, subsequently, to make this information explicit to data consumers. Despite the availability of a number of tools and frameworks to assess Linked Data Quality, the output of such tools is not suitable for machine consumption, and thus consumers can hardly compare and rank datasets in the order of fitness for use. This article describes a conceptual methodology for assessing Linked Datasets, and Luzzu, a framework for Linked Data Quality Assessment. Luzzu is based on four major components: (1) an extensible interface for defining new quality metrics; (2) an interoperable, ontology-driven back-end for representing quality metadata and quality problems that can be reused within different semantic frameworks; (3) scalable dataset processors for data dumps, SPARQL endpoints, and big data infrastructures; and (4) a customisable ranking algorithm taking into account user-defined weights. We show that Luzzu scales linearly against the number of triples in a dataset. We also demonstrate the applicability of the Luzzu framework by evaluating and analysing a number of statistical datasets against a variety of metrics. This article contributes towards the definition of a holistic data quality lifecycle, in terms of the co-evolution of linked datasets, with the final aim of improving their quality.
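
The user-weighted ranking component can be illustrated as a weighted mean over normalised metric scores. This is a sketch under the assumption that every metric value already lies in [0, 1]; it shows the general idea rather than the exact ranking algorithm implemented in Luzzu.

    # Sketch of ranking datasets by user-weighted quality scores
    # (assumes metric values are already normalised to [0, 1]).

    def weighted_score(metric_values, weights):
        """Weighted mean over the metrics the user has assigned a weight to."""
        total_weight = sum(weights.get(m, 0.0) for m in metric_values)
        if total_weight == 0:
            return 0.0
        return sum(v * weights.get(m, 0.0) for m, v in metric_values.items()) / total_weight

    def rank(datasets, weights):
        """datasets maps a dataset name to its {metric name: value} results."""
        return sorted(datasets, key=lambda d: weighted_score(datasets[d], weights), reverse=True)

    # Example: a consumer who values licensing twice as much as availability.
    datasets = {
        "statsA": {"availability": 0.9, "licensing": 0.2},
        "statsB": {"availability": 0.7, "licensing": 0.8},
    }
    print(rank(datasets, {"availability": 1.0, "licensing": 2.0}))  # ['statsB', 'statsA']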


International Conference on Semantic Systems | 2014

Representing dataset quality metadata using multi-dimensional views

Jeremy Debattista; Christoph Lange; Sören Auer

Data quality is commonly defined as fitness for use. Many data consumers face the problem of identifying the quality of data, while data publishers often do not have the means to identify quality problems in their data. To make the task easier for both stakeholders, we have developed the Dataset Quality Ontology (daQ). daQ is a core vocabulary for representing the results of quality benchmarking of a linked dataset. It represents quality metadata as multi-dimensional and statistical observations using the Data Cube vocabulary. Quality metadata are organised as a self-contained graph, which can, e.g., be embedded into linked open datasets. We discuss the design considerations, give examples of extending daQ with custom quality metrics, and present use cases such as analysing data versions, browsing datasets by quality, and link identification. We finally discuss how data cube visualisation tools enable data publishers and consumers to better analyse the quality of their data.
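
To sketch how a single metric result becomes a multi-dimensional observation, the rdflib snippet below builds one Data Cube style observation in the spirit of daQ. The property names and the example resources are indicative assumptions and should be checked against the published vocabulary.

    # Sketch: record one quality measurement as a Data Cube observation,
    # in the spirit of daQ (property names here are indicative only).
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF, XSD

    DAQ = Namespace("http://purl.org/eis/vocab/daq#")      # daQ vocabulary
    QB = Namespace("http://purl.org/linked-data/cube#")    # W3C Data Cube
    EX = Namespace("http://example.org/quality/")          # example resources

    g = Graph()
    obs = EX["obs/dereferenceability/2014-09-01"]
    g.add((obs, RDF.type, QB.Observation))
    g.add((obs, DAQ.computedOn, URIRef("http://example.org/dataset/statsA")))
    g.add((obs, DAQ.metric, EX.DereferenceabilityMetric))
    g.add((obs, DAQ.value, Literal(0.87, datatype=XSD.double)))
    g.add((obs, DAQ.isEstimate, Literal(False)))

    print(g.serialize(format="turtle"))

Because the observations live in a self-contained graph, they can be embedded alongside the dataset they describe or queried with the same Data Cube tooling used for statistical data.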


European Semantic Web Conference | 2015

Quality Assessment of Linked Datasets Using Probabilistic Approximation

Jeremy Debattista; Santiago Londoño; Christoph Lange; Sören Auer

With the increasing application of Linked Open Data, assessing the quality of datasets by computing quality metrics becomes an issue of crucial importance. For large and evolving datasets, an exact, deterministic computation of the quality metrics is too time consuming or expensive. We employ probabilistic techniques such as Reservoir Sampling, Bloom Filters and Clustering Coefficient estimation for implementing a broad set of data quality metrics in an approximate but sufficiently accurate way. Our implementation is integrated in the comprehensive data quality assessment framework Luzzu. We evaluated its performance and accuracy on Linked Open Datasets of broad relevance.
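
As an illustration of one of the named techniques, the sketch below shows Reservoir Sampling: a fixed-size uniform sample is kept while streaming the data once, and the metric is then approximated on the sample. This is a generic textbook version (Algorithm R), not the paper's exact implementation, and the toy metric is an assumption for demonstration.

    # Reservoir sampling (Algorithm R): keep a uniform sample of k items
    # from a stream of unknown length, then approximate a metric on it.
    import random

    def reservoir_sample(stream, k, rng=random):
        reservoir = []
        for i, item in enumerate(stream):
            if i < k:
                reservoir.append(item)
            else:
                j = rng.randint(0, i)   # inclusive bounds
                if j < k:
                    reservoir[j] = item
        return reservoir

    def approx_iri_object_ratio(triples, k=10_000):
        """Estimate the fraction of triples whose object is an HTTP(S) IRI."""
        sample = reservoir_sample(triples, k)
        hits = sum(1 for _, _, o in sample if str(o).startswith(("http://", "https://")))
        return hits / len(sample) if sample else 0.0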


International Conference on Human-Computer Interaction | 2013

Interacting with a Context-Aware Personal Information Sharing System

Simon Scerri; Andreas Schuller; Ismael Rivera; Judie Attard; Jeremy Debattista; Massimo Valla; Fabian Hermann; Siegfried Handschuh

The di.me userware is a decentralised personal information sharing system with a difference: extracted information and observed personal activities are exploited to automatically recognise personal situations, provide privacy-related warnings, and recommend and/or automate user actions. To enable reasoning, personal information from multiple devices and online sources is integrated and transformed to a machine-interpretable format. Aside from distributed personal information monitoring, an intuitive user interface also enables the i) manual customisation of advanced context-driven services and ii) their semi-automatic adaptation across interactive notifications. In this paper we outline how average users interact with the current user interface, and our plans to improve it.


Web Intelligence, Mining and Semantics | 2016

Are Linked Datasets fit for Open-domain Question Answering? A Quality Assessment

Harsh Thakkar; Kemele M. Endris; José M. Giménez-García; Jeremy Debattista; Christoph Lange; Sören Auer

The current decade has witnessed an enormous explosion of data being published on the Web as Linked Data to maximise its reusability. Answering questions that users speak or write in natural language is an increasingly popular application scenario for Web Data, especially when the domain of the questions is not limited to one where dedicated curated datasets exist, as in medicine. The increasing use of Web Data in this and other settings has highlighted the importance of assessing its quality. While considerable work has been done on assessing the quality of Linked Data, only a few efforts have been dedicated to quality assessment of Linked Data from the perspective of the question answering domain. From the Linked Data quality metrics that have so far been well documented in the literature, we have identified those that are most relevant for QA. We apply these quality metrics, implemented in the Luzzu framework, to subsets of two datasets of crucial importance to open domain QA -- DBpedia and Wikidata -- and thus present the first assessment of the quality of these datasets for QA. From these datasets, we assess slices covering the specific domains of restaurants, politicians, films and soccer players. The results of our experiments suggest that for most of these domains, the quality of Wikidata with regard to the majority of relevant metrics is higher than that of DBpedia.
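
As a toy illustration of assessing a domain slice, the snippet below estimates a single QA-relevant signal, how many DBpedia film resources carry a director statement, over the public SPARQL endpoint. The chosen class, property and completeness measure are examples picked for this sketch; the paper's actual metrics are computed with Luzzu.

    # Toy completeness probe for a domain slice (films) on DBpedia.
    from SPARQLWrapper import SPARQLWrapper, JSON

    ENDPOINT = "https://dbpedia.org/sparql"

    def count(query):
        sparql = SPARQLWrapper(ENDPOINT)
        sparql.setQuery(query)
        sparql.setReturnFormat(JSON)
        row = sparql.query().convert()["results"]["bindings"][0]
        return int(row["n"]["value"])

    films = count("""
        PREFIX dbo: <http://dbpedia.org/ontology/>
        SELECT (COUNT(DISTINCT ?f) AS ?n) WHERE { ?f a dbo:Film }
    """)
    films_with_director = count("""
        PREFIX dbo: <http://dbpedia.org/ontology/>
        SELECT (COUNT(DISTINCT ?f) AS ?n) WHERE { ?f a dbo:Film ; dbo:director ?d }
    """)
    print(f"director completeness for films: {films_with_director / max(films, 1):.2%}")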


Web Information Systems Engineering | 2013

Processing Ubiquitous Personal Event Streams to Provide User-Controlled Support

Jeremy Debattista; Simon Scerri; Ismael Rivera; Siegfried Handschuh

The increasing use of smart devices provides us with a wealth of personal data and context information. In this paper we describe an approach which allows users to define and register rules, based on their personal data activities, in an event processor that continuously listens to perceived context data and triggers any satisfied rules. We describe the Rule Management Ontology (DRMO) as a means to define rules in a standard format, whilst providing a scalable solution in the form of a Rule Network Event Processor which detects and analyses events, triggering the rules that are satisfied. Following an evaluation of the network vs. a simplistic sequential approach, we justify a trade-off between initialisation time and processing time.
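
A minimal sketch of the rule registration and triggering idea follows; it is not DRMO, and it evaluates rules sequentially, which is precisely the naive approach the paper's rule network improves upon by sharing condition checks between rules. The rule name and the example event fields are assumptions made for illustration.

    # Toy event processor: rules are (condition, action) pairs evaluated
    # against every incoming context event.
    from dataclasses import dataclass, field
    from typing import Callable, Dict, List

    Event = Dict[str, object]

    @dataclass
    class Rule:
        name: str
        condition: Callable[[Event], bool]
        action: Callable[[Event], None]

    @dataclass
    class EventProcessor:
        rules: List[Rule] = field(default_factory=list)

        def register(self, rule: Rule) -> None:
            self.rules.append(rule)

        def on_event(self, event: Event) -> None:
            for rule in self.rules:
                if rule.condition(event):
                    rule.action(event)

    # Example: react when the user is at the office during a meeting.
    processor = EventProcessor()
    processor.register(Rule(
        name="mute-during-meetings",
        condition=lambda e: e.get("place") == "office" and e.get("calendar") == "meeting",
        action=lambda e: print("Rule fired: muting notifications"),
    ))
    processor.on_event({"place": "office", "calendar": "meeting"})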


Sprachwissenschaft | 2017

Literally better: Analyzing and improving the quality of literals

Wouter Beek; Filip Ilievski; Jeremy Debattista; Stefan Schlobach; Jan Wielemaker

Quality is a complicated and multifarious topic in contemporary Linked Data research. The aspect of literal quality in particular has not yet been rigorously studied. Nevertheless, analyzing and improving the quality of literals is important since literals form a substantial (one in seven statements) and crucial part of the Semantic Web. Specifically, literals allow infinite value spaces to be expressed and they provide the linguistic entry point to the LOD Cloud. We present a toolchain that builds on the LOD Laundromat data cleaning and republishing infrastructure and that allows us to analyze the quality of literals on a very large scale, using a collection of quality criteria we specify in a systematic way. We illustrate the viability of our approach by highlighting two particular aspects in which the current LOD Cloud can be immediately improved by automated means: value canonization and language tagging. Since not all quality aspects can be addressed algorithmically, we also give an overview of other problems that can be used to guide future endeavors in tooling, training, and best practice formulation.
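
Two of the improvement steps named above, value canonization and language tagging, can be illustrated with a small rdflib sketch. The canonization shown (re-serialising a typed literal through its parsed value, e.g. "01" to "1" for xsd:integer) and the simple missing-tag check are simplified stand-ins for the toolchain's actual heuristics.

    # Simplified sketch of two literal-quality checks: canonising typed
    # literals and flagging plain literals that lack a language tag.
    from rdflib import Literal
    from rdflib.namespace import XSD

    def canonize(lit: Literal) -> Literal:
        """Re-serialise a typed literal through its parsed value, e.g. "01" -> "1"."""
        if lit.datatype is not None and lit.value is not None:
            return Literal(lit.value, datatype=lit.datatype)
        return lit

    def needs_language_tag(lit: Literal) -> bool:
        """Plain string literals without a tag are candidates for language tagging."""
        return lit.datatype is None and lit.language is None

    print(canonize(Literal("01", datatype=XSD.integer)).n3())   # "1"^^<...#integer>
    print(needs_language_tag(Literal("bonjour")))               # True
    print(needs_language_tag(Literal("bonjour", lang="fr")))    # False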


IEEE International Conference on Semantic Computing | 2016

Towards Cleaning-Up Open Data Portals: A Metadata Reconciliation Approach

Alan Freihof Tygel; Sören Auer; Jeremy Debattista; Fabrizio Orlandi; Maria Luiza Machado Campos

This paper presents an approach for metadata reconciliation, curation and linking for Open Governmental Data Portals (ODPs). ODPs have lately become the standard solution for governments willing to make their public data available to society. Portal managers use several types of metadata to organise the datasets, one of the most important being tags. However, the tagging process is subject to many problems, such as synonyms, ambiguity and incoherence, among others. As our empirical analysis of ODPs shows, these issues are currently prevalent in most ODPs and effectively hinder the reuse of Open Data. In order to address these problems, we develop and implement an approach for tag reconciliation in Open Data Portals, encompassing local actions related to individual portals, and global actions for adding a semantic metadata layer above individual portals. The local part aims to enhance the quality of tags in a single portal, and the global part is meant to interlink ODPs by establishing relations between tags.
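
A small sketch of the local reconciliation step: raw tags are normalised (case, accents, separators) and grouped by their normal form, so near-duplicates collapse into one candidate group. The normalisation rules shown are simple assumptions for illustration; the global step described above would then link such groups to shared concepts across portals.

    # Toy local tag reconciliation: normalise case, accents and separators,
    # then group raw tags that share the same normal form.
    import unicodedata
    from collections import defaultdict

    def normalise(tag: str) -> str:
        decomposed = unicodedata.normalize("NFKD", tag)
        no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
        return " ".join(no_accents.lower().replace("-", " ").replace("_", " ").split())

    def reconcile(tags):
        groups = defaultdict(set)
        for tag in tags:
            groups[normalise(tag)].add(tag)
        return dict(groups)

    print(reconcile(["Educação", "educacao", "EDUCACAO", "open-data", "Open Data"]))
    # {'educacao': {'Educação', 'educacao', 'EDUCACAO'}, 'open data': {'open-data', 'Open Data'}}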


European Semantic Web Conference | 2014

di.me: Ontologies for a Pervasive Information System

Simon Scerri; Ismael Rivera; Jeremy Debattista; Simon Thiel; Keith Cortis; Judie Attard; Christian Knecht; Andreas Schuller; Fabian Hermann

The di.me userware is a pervasive personal information management system that successfully adopted ontologies to provide various intelligent features. Supported by a suitable user interface, di.me provides ontology-driven support for the (i) integration of personal information from multiple personal sources, (ii) privacy-aware sharing of personal data, (iii) context-awareness and personal situation recognition, and (iv) creation of personalised rules that operate over live events to provide notifications, effect system changes or share data.

Collaboration


Dive into Jeremy Debattista's collaborations.

Top Co-Authors

Ismael Rivera
National University of Ireland

Siegfried Handschuh
National University of Ireland

Javier D. Fernández
Vienna University of Economics and Business

Jürgen Umbrich
Vienna University of Economics and Business