Anisa Rula
University of Milano-Bicocca
Publications
Featured research published by Anisa Rula.
Semantic Web | 2015
Amrapali Zaveri; Anisa Rula; Andrea Maurino; Ricardo Pietrobon; Jens Lehmann; Soeren Auer
The development and standardization of semantic web technologies has resulted in an unprecedented volume of data being published on the Web as Linked Data (LD). However, we observe widely varying data quality ranging from extensively curated datasets to crowdsourced and extracted data of relatively low quality. In this article, we present the results of a systematic review of approaches for assessing the quality of LD. We gather existing approaches and analyze them qualitatively. In particular, we unify and formalize commonly used terminologies across papers related to data quality and provide a comprehensive list of 18 quality dimensions and 69 metrics. Additionally, we qualitatively analyze the 30 core approaches and 12 tools using a set of attributes. The aim of this article is to provide researchers and data curators with a comprehensive understanding of existing work, thereby encouraging further experimentation and development of new approaches focused on data quality, specifically for LD.
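As an illustration of the kind of metric the survey catalogues, the sketch below computes a simple labelling-completeness score over an RDF graph with rdflib. The file name and the specific metric are illustrative assumptions, not taken from the article.

```python
# A minimal sketch of one possible quality metric: labelled-resource
# completeness, i.e. the share of subjects carrying an rdfs:label.
from rdflib import Graph, RDFS

g = Graph()
g.parse("dataset.ttl", format="turtle")  # hypothetical local dump

subjects = {s for s in g.subjects()}
labelled = {s for s in g.subjects(RDFS.label, None)}

# Metric value in [0, 1]: 1.0 means every subject is labelled.
completeness = len(labelled) / len(subjects) if subjects else 1.0
print(f"label completeness: {completeness:.2f}")
```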
international semantic web conference | 2012
Anisa Rula; Matteo Palmonari; Andreas Harth; Steffen Stadtmüller; Andrea Maurino
An increasing amount of data is published and consumed on the Web according to the Linked Data paradigm. For both publishers and consumers, the temporal dimension of data is important. In this paper we investigate the characterisation and availability of temporal information in Linked Data at large scale. Based on an abstract definition of temporal information, we conduct experiments to evaluate the availability of such information using the data from the 2011 Billion Triple Challenge (BTC) dataset. Focusing in particular on the representation of temporal meta-information, i.e., temporal information associated with RDF statements and graphs, we investigate the approaches proposed in the literature, performing both a quantitative and a qualitative analysis and proposing guidelines for data consumers and publishers. Our experiments show that the amount of temporal information available in the LOD cloud is still very small; several different models have been used on different datasets, with a prevalence of approaches based on the annotation of RDF documents.
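The sketch below illustrates, with rdflib, the two granularities of temporal meta-information the paper distinguishes: annotating a whole document versus annotating an individual statement (here via standard RDF reification). The namespace and dates are hypothetical placeholders.

```python
# A minimal sketch (not the paper's code) of document-level versus
# statement-level temporal annotation in RDF.
from rdflib import Graph, Literal, BNode, RDF, Namespace
from rdflib.namespace import DCTERMS, XSD

EX = Namespace("http://example.org/")  # hypothetical namespace
g = Graph()

# Document-level annotation: one timestamp for the whole document.
g.add((EX.document, DCTERMS.modified,
       Literal("2011-06-01", datatype=XSD.date)))

# Statement-level annotation: reify the triple, then timestamp it.
stmt = BNode()
g.add((stmt, RDF.type, RDF.Statement))
g.add((stmt, RDF.subject, EX.alice))
g.add((stmt, RDF.predicate, EX.worksFor))
g.add((stmt, RDF.object, EX.acme))
g.add((stmt, DCTERMS.valid, Literal("2011-06-01", datatype=XSD.date)))

print(g.serialize(format="turtle"))
```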
Journal of Database Management | 2015
Carlo Batini; Anisa Rula; Monica Scannapieco; Gianluigi Viscusi
This article investigates the evolution of data quality issues from traditional structured data managed in relational databases to Big Data. In particular, the paper examines the nature of the relationship between Data Quality and several research coordinates that are relevant in Big Data, such as the variety of data types, data sources, and application domains, focusing on maps, semi-structured texts, linked open data, sensors and sensor networks, and official statistics. Consequently, a set of structural characteristics is identified, and a systematization of the a posteriori correlation between them and quality dimensions is provided. Finally, Big Data quality issues are considered in a conceptual framework suitable to map the evolution of the quality paradigm according to three core coordinates that are significant in the context of the Big Data phenomenon: the data type considered, the source of data, and the application domain. The framework thus allows ascertaining the relevant changes in data quality emerging with the Big Data phenomenon, through an integrative and theoretical literature review.
european semantic web conference | 2015
Matteo Palmonari; Anisa Rula; Riccardo Porrini; Andrea Maurino; Blerina Spahiu; Vincenzo Ferme
While much work has focused on continuously publishing Linked Open Data, little work considers how to help consumers better understand existing datasets. The ABSTAT framework aims at providing a better understanding of big and complex datasets by extracting summaries of linked data sets based on an ontology-driven data abstraction model. It takes a data set and an ontology as input and returns an ontology-driven data summary as output. The summary is exported to RDF and then made accessible through a SPARQL endpoint and a web interface to support navigation.
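Since the summary is itself RDF behind a SPARQL endpoint, a consumer can explore it programmatically. The sketch below shows what such a query might look like; the endpoint URL and the summary vocabulary are hypothetical placeholders, not ABSTAT's actual schema.

```python
# A minimal sketch of querying an ABSTAT-style summary endpoint for the
# ten most frequent (subject type, property, object type) patterns.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://example.org/abstat/sparql")  # placeholder
sparql.setQuery("""
    PREFIX ex: <http://example.org/summary#>
    SELECT ?subjType ?prop ?objType ?freq WHERE {
        ?pattern ex:subjectType ?subjType ;
                 ex:property    ?prop ;
                 ex:objectType  ?objType ;
                 ex:frequency   ?freq .
    }
    ORDER BY DESC(?freq) LIMIT 10
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["subjType"]["value"], row["prop"]["value"],
          row["objType"]["value"], row["freq"]["value"])
```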
ieee international conference semantic computing | 2012
Anisa Rula; Matteo Palmonari; Andrea Maurino
An increasing amount of data is published and consumed on the Web according to the Linked Data paradigm. In such a scenario, understanding whether the data consumed are up to date is crucial. Outdated data are usually considered inappropriate for many crucial tasks, such as making the consumer confident that answers returned to a query are still valid at the time the query is formulated. In this paper we present a first dataset-independent framework for assessing the currency of Linked Open Data (LOD) graphs. Starting from the analysis of the 8,713,282 triples containing temporal metadata in the Billion Triple Challenge 2011 dataset, we investigate which vocabularies are used to represent versioning metadata and define Onto Currency, an ontology that integrates the most frequent properties used in this domain and supports the collection of metadata from datasets that use different vocabularies. The proposed framework uses this ontology to assess the currency of an RDF graph or statement by extrapolating it from the currency of the documents that describe the resources occurring in the graph (statement). The approach has been implemented and evaluated in two different scenarios.
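The sketch below shows one way such an extrapolation could work: score each document by its last-modification date and aggregate over the documents describing a statement's resources. The formula, the conservative minimum aggregation, and the dates are illustrative assumptions, not taken verbatim from the paper.

```python
# A minimal sketch of an age-based currency score derived from document
# modification metadata.
from datetime import date

def currency(last_modified: date, observed: date, start: date) -> float:
    """Currency in [0, 1]: 1.0 right after an update, decaying with age."""
    age = (observed - last_modified).days
    lifespan = (observed - start).days
    return max(0.0, 1.0 - age / lifespan) if lifespan > 0 else 1.0

# Currency of a statement, extrapolated conservatively as the minimum
# over the documents describing its resources (illustrative dates).
doc_dates = [date(2011, 5, 20), date(2011, 3, 2)]
print(min(currency(d, date(2011, 6, 1), date(2010, 1, 1)) for d in doc_dates))
```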
european semantic web conference | 2016
Blerina Spahiu; Riccardo Porrini; Matteo Palmonari; Anisa Rula; Andrea Maurino
An increasing number of research and industrial initiatives have focused on publishing Linked Open Data, but little attention has been paid to helping consumers better understand existing data sets. In this paper we discuss how an ontology-driven data abstraction model supports the extraction and representation of summaries of linked data sets. The proposed summarization model is the backbone of the ABSTAT framework, which aims at helping users understand big and complex linked data sets. The proposed model produces a summary that is correct and complete with respect to the assertions of the data set and whose size scales well with respect to the ontology and data size. Our framework is evaluated by showing that it is capable of unveiling information that is not explicitly represented in underspecified ontologies and that is valuable to users, e.g., helping them in the formulation of SPARQL queries.
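To make the abstraction model concrete, the sketch below derives (type, property, type) patterns with occurrence counts from a data set's assertions. It simplifies minimal-type selection against the ontology's subclass hierarchy to "first asserted type"; the real model is more refined, and the input file is a hypothetical dump.

```python
# A minimal sketch of ontology-driven pattern extraction: each assertion
# (s, p, o) contributes a (type of s, p, type of o) pattern.
from collections import Counter
from rdflib import Graph, RDF

g = Graph()
g.parse("dataset.ttl", format="turtle")  # hypothetical dump

def a_type(node):
    # Simplification: take any asserted type rather than the minimal one.
    return next(g.objects(node, RDF.type), None)

patterns = Counter()
for s, p, o in g:
    if p == RDF.type:
        continue
    ts, to = a_type(s), a_type(o)
    if ts is not None and to is not None:
        patterns[(ts, p, to)] += 1

for (ts, p, to), n in patterns.most_common(10):
    print(n, ts, p, to)
```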
european semantic web conference | 2014
Anisa Rula; Matteo Palmonari; Axel-Cyrille Ngonga Ngomo; Daniel Gerber; Jens Lehmann; Lorenz Bühmann
Information on the temporal interval of validity for facts described by RDF triples plays an important role in a large number of applications. Yet, most of the knowledge bases available on the Web of Data do not provide such information in an explicit manner. In this paper, we present a generic approach which addresses this drawback by inserting temporal information into knowledge bases. Our approach combines two types of information to associate RDF triples with time intervals. First, it relies on temporal information gathered from the document Web by an extension of the fact validation framework DeFacto. Second, it harnesses the time information contained in knowledge bases. This knowledge is combined within a three-step approach comprising matching, selection, and merging. We evaluate our approach against a corpus of facts gathered from Yago2 by using DBpedia and Freebase as input and different parameter settings for the underlying algorithms. Our results suggest that we can detect temporal information for facts from DBpedia with an F-measure of up to 70%.
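The sketch below illustrates the flavor of the merging step: candidate validity intervals for the same fact, gathered from different sources, are merged when they overlap. The matching and selection steps, and the DeFacto-based evidence gathering that precedes them, are out of scope here; the numbers are illustrative.

```python
# A minimal sketch of merging overlapping candidate validity intervals.
def merge_intervals(intervals):
    """Merge overlapping (start, end) year intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Candidate intervals for one fact, e.g. from two knowledge bases and
# the document Web (illustrative numbers).
print(merge_intervals([(1998, 2003), (2001, 2006), (2010, 2012)]))
# -> [(1998, 2006), (2010, 2012)]
```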
international semantic web conference | 2011
Anisa Rula
Since Linked Data on the Web is continuously growing, the overall quality of the data can rapidly degrade over time. The research proposed here deals with quality assessment in Linked Data and with temporal linking techniques. First, we conduct an in-depth study of appropriate dimensions and their respective metrics by defining a data quality framework that evaluates, along these dimensions, linked data published on the Web. Second, since the assessment and improvement of Linked Data quality, such as accuracy or the resolution of heterogeneities, is performed through record linkage techniques, we propose an extended technique that incorporates time into similarity computation and can improve over traditional linkage techniques. This paper describes the core problem, presents the proposed approach, reports on initial results, and lists planned future tasks.
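The sketch below shows one simple way time could be folded into similarity computation: agreements between records are attenuated by how far apart in time the values were observed, so stale agreements count less. The decay function, rate, and base similarity are illustrative assumptions, not the proposal's actual formulation.

```python
# A minimal sketch of a time-aware similarity for record linkage.
import math

def temporal_similarity(v1, t1, v2, t2, base_sim, decay=0.1):
    """Attenuate a base similarity by the records' time gap (in years)."""
    return base_sim(v1, v2) * math.exp(-decay * abs(t1 - t2))

# Character-set Jaccard as a stand-in base similarity.
jaccard = lambda a, b: len(set(a) & set(b)) / len(set(a) | set(b))

# The same name pair is a weaker match when observed 10 years apart.
print(temporal_similarity("Anisa Rula", 2011, "Anisa Rula", 2011, jaccard))
print(temporal_similarity("Anisa Rula", 2011, "Anisa Rula", 2001, jaccard))
```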
european semantic web conference | 2018
Renzo Arturo Alva Principe; Blerina Spahiu; Matteo Palmonari; Anisa Rula; Flavio De Paoli; Andrea Maurino
As the Linked Data available on the Web continue to grow, understanding their structure and content remains a challenging task, and this is a bottleneck for their reuse. ABSTAT is an online profiling tool which helps data consumers better understand the data by extracting ontology-driven patterns and statistics about the data. This demo paper presents the capabilities of the newly added features of ABSTAT.
Journal of Data and Information Quality | 2018
Diego Esteves; Anisa Rula; Aniketh Janardhan Reddy; Jens Lehmann
Among the different characteristics of knowledge bases, data quality is one of the most relevant to maximize the benefits of the provided information. Knowledge base quality assessment poses a number of big data challenges such as high volume, variety, velocity, and veracity. In this article, we focus on answering questions related to the assessment of the veracity of facts through Deep Fact Validation (DeFacto), a triple validation framework designed to assess facts in RDF knowledge bases. Despite current developments in the research area, the underlying framework faces many challenges. This article pinpoints and discusses these issues and conducts a thorough analysis of its pipeline, aiming at reducing the error propagation through its components. Furthermore, we discuss recent developments related to fact validation and describe the advantages and drawbacks of state-of-the-art models. As a result of this exploratory analysis, we give insights and directions toward a better architecture to tackle the complex task of fact-checking in knowledge bases.
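The sketch below gives a minimal picture of such a pipeline and of why error propagation matters: each component returns a confidence, the final veracity score combines them, and a starved early stage drags everything downstream to zero. The component logic and the noisy-or combination rule are illustrative assumptions, not DeFacto's actual implementation.

```python
# A minimal sketch of a fact-validation pipeline over an RDF triple.
from dataclasses import dataclass

@dataclass
class Triple:
    subject: str
    predicate: str
    object: str

def retrieve_evidence(t: Triple) -> list[tuple[str, float]]:
    # Stand-in for web search + proof extraction: (snippet, trust) pairs.
    return [("... Einstein was born in Ulm ...", 0.9),
            ("... born in Ulm, Germany ...", 0.7)]

def score(t: Triple) -> float:
    evidence = retrieve_evidence(t)
    if not evidence:
        return 0.0  # no proofs found: every downstream component is starved
    # Noisy-or combination of independent pieces of supporting evidence.
    p_false = 1.0
    for _, trust in evidence:
        p_false *= (1.0 - trust)
    return 1.0 - p_false

t = Triple("dbr:Albert_Einstein", "dbo:birthPlace", "dbr:Ulm")
print(f"veracity score: {score(t):.2f}")
```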