Publication


Featured research published by Amrapali Zaveri.


Semantic Web journal | 2015

Quality assessment for Linked Data: A Survey

Amrapali Zaveri; Anisa Rula; Andrea Maurino; Ricardo Pietrobon; Jens Lehmann; Soeren Auer

The development and standardization of semantic web technologies have resulted in an unprecedented volume of data being published on the Web as Linked Data (LD). However, we observe widely varying data quality, ranging from extensively curated datasets to crowdsourced and extracted data of relatively low quality. In this article, we present the results of a systematic review of approaches for assessing the quality of LD. We gather existing approaches and analyze them qualitatively. In particular, we unify and formalize commonly used terminologies across papers related to data quality and provide a comprehensive list of 18 quality dimensions and 69 metrics. Additionally, we qualitatively analyze the 30 core approaches and 12 tools using a set of attributes. The aim of this article is to provide researchers and data curators with a comprehensive understanding of existing work, thereby encouraging further experimentation and development of new approaches focused on data quality, specifically for LD.
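
The survey's notion of a quality metric can be made concrete with a small sketch. The Python snippet below is a hypothetical illustration, not code from the survey: it computes a simple completeness-style metric as a ratio over a toy set of triples (the `ex:` data and the metric definition are made up for this example).

```python
# Illustrative sketch: a quality metric in the style the survey formalizes,
# computed as a ratio over a small set of (subject, predicate, object) triples.

def property_completeness(triples, subjects, required_predicate):
    """Fraction of subjects with at least one value for required_predicate."""
    covered = {s for (s, p, o) in triples if p == required_predicate}
    present = sum(1 for s in subjects if s in covered)
    return present / len(subjects) if subjects else 1.0

# Toy dataset: Leipzig is missing a population value.
triples = [
    ("ex:Berlin", "rdfs:label", "Berlin"),
    ("ex:Berlin", "ex:population", "3645000"),
    ("ex:Leipzig", "rdfs:label", "Leipzig"),
]
subjects = ["ex:Berlin", "ex:Leipzig"]

score = property_completeness(triples, subjects, "ex:population")  # 0.5
```

Most metrics in such surveys reduce to ratios of this shape (violations or fulfilments over a reference set), which is what makes them comparable across datasets.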


international world wide web conferences | 2014

Test-driven evaluation of linked data quality

Dimitris Kontokostas; Patrick Westphal; Sören Auer; Sebastian Hellmann; Jens Lehmann; Roland Cornelissen; Amrapali Zaveri

Linked Open Data (LOD) comprises an unprecedented volume of structured data on the Web. However, these datasets are of varying quality ranging from extensively curated datasets to crowdsourced or extracted data of often relatively low quality. We present a methodology for test-driven quality assessment of Linked Data, which is inspired by test-driven software development. We argue that vocabularies, ontologies and knowledge bases should be accompanied by a number of test cases, which help to ensure a basic level of quality. We present a methodology for assessing the quality of linked data resources, based on a formalization of bad smells and data quality problems. Our formalization employs SPARQL query templates, which are instantiated into concrete quality test case queries. Based on an extensive survey, we compile a comprehensive library of data quality test case patterns. We perform automatic test case instantiation based on schema constraints or semi-automatically enriched schemata and allow the user to generate specific test case instantiations that are applicable to a schema or dataset. We provide an extensive evaluation of five LOD datasets, manual test case instantiation for five schemas and automatic test case instantiations for all available schemata registered with Linked Open Vocabularies (LOV). One of the main advantages of our approach is that domain specific semantics can be encoded in the data quality test cases, thus being able to discover data quality problems beyond conventional quality heuristics.
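
The core idea of instantiating query templates into concrete test cases can be sketched in a few lines of Python. The pattern and the height constraint below are illustrative assumptions, not entries from the paper's actual pattern library:

```python
from string import Template

# A data quality test case pattern: a SPARQL template with placeholders
# that gets instantiated into a concrete test query for one constraint.
RANGE_VIOLATION = Template("""
SELECT ?s WHERE {
  ?s <$property> ?value .
  FILTER (?value < $min || ?value > $max)
}""")

# Instantiate the pattern for a concrete (hypothetical) schema constraint:
# a person's height in centimetres should fall between 30 and 300.
query = RANGE_VIOLATION.substitute(
    property="http://dbpedia.org/ontology/height",
    min=30, max=300,
)
```

Running each instantiated query against a dataset and counting the returned resources gives the per-test-case violation counts described above.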


RW'13 Proceedings of the 9th international conference on Reasoning Web: semantic technologies for intelligent data access | 2013

Introduction to linked data and its lifecycle on the web

Sören Auer; Jens Lehmann; Axel-Cyrille Ngonga Ngomo; Amrapali Zaveri

With Linked Data, a very pragmatic approach towards achieving the vision of the Semantic Web has gained traction in recent years. The term Linked Data refers to a set of best practices for publishing and interlinking structured data on the Web. While many standards, methods and technologies developed by the Semantic Web community are applicable to Linked Data, there are also a number of specific characteristics of Linked Data which have to be considered. In this article we introduce the main concepts of Linked Data. We present an overview of the Linked Data lifecycle and discuss individual approaches as well as the state of the art with regard to extraction, authoring, linking, enrichment and quality of Linked Data. We conclude the chapter with a discussion of issues, limitations and further research and development challenges of Linked Data. This article is an updated version of a similar lecture given at the Reasoning Web Summer School 2011.


international conference on semantic systems | 2013

User-driven quality evaluation of DBpedia

Amrapali Zaveri; Dimitris Kontokostas; Mohamed Ahmed Sherif; Lorenz Bühmann; Mohamed Morsey; Sören Auer; Jens Lehmann

Linked Open Data (LOD) comprises an unprecedented volume of structured datasets on the Web. However, these datasets are of varying quality, ranging from extensively curated datasets to crowdsourced and even extracted data of relatively low quality. We present a methodology for assessing the quality of linked data resources, which comprises a manual and a semi-automatic process. The first phase includes the detection of common quality problems and their representation in a quality problem taxonomy. In the manual process, the second phase comprises the evaluation of a large number of individual resources, according to the quality problem taxonomy, via crowdsourcing. This process is accompanied by a tool wherein a user assesses an individual resource and evaluates each fact for correctness. The semi-automatic process involves the generation and verification of schema axioms. We report the results obtained by applying this methodology to DBpedia. We identified 17 data quality problem types, and 58 users assessed a total of 521 resources. Overall, 11.93% of the evaluated DBpedia triples were identified as having some quality issue. Applying the semi-automatic component yielded a total of 222,982 triples that have a high probability of being incorrect. In particular, we found that problems such as object values being incorrectly extracted, irrelevant extraction of information and broken links were the most recurring quality problems. With this study, we not only aim to assess the quality of this sample of DBpedia resources but also to adopt an agile methodology to improve the quality in future versions by regularly providing feedback to the DBpedia maintainers.


international semantic web conference | 2013

Crowdsourcing Linked Data Quality Assessment

Maribel Acosta; Amrapali Zaveri; Elena Simperl; Dimitris Kontokostas; Sören Auer; Jens Lehmann

In this paper we look into the use of crowdsourcing as a means to handle Linked Data quality problems that are challenging to solve automatically. We analyzed the most common errors encountered in Linked Data sources and classified them according to the extent to which they are likely to be amenable to a specific form of crowdsourcing. Based on this analysis, we implemented a quality assessment methodology for Linked Data that leverages the wisdom of the crowds in different ways: (i) a contest targeting an expert crowd of researchers and Linked Data enthusiasts, complemented by (ii) paid microtasks published on Amazon Mechanical Turk. We empirically evaluated how this methodology could efficiently spot quality issues in DBpedia. We also investigated how the contributions of the two types of crowds could be optimally integrated into Linked Data curation processes. The results show that the two styles of crowdsourcing are complementary and that crowdsourcing-enabled quality assessment is a promising and affordable way to enhance the quality of Linked Data.
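
The integration step, combining several worker judgments per triple into one verdict, can be sketched with a simple majority vote. The judgments below are made-up illustrative data, and the paper's actual aggregation strategy may differ:

```python
from collections import Counter

# Minimal sketch of aggregating microtask judgments: each triple receives
# several correct/incorrect votes from workers; majority vote decides.
def majority_vote(judgments):
    """Map each triple id to the label most workers assigned it."""
    return {
        triple_id: Counter(votes).most_common(1)[0][0]
        for triple_id, votes in judgments.items()
    }

judgments = {
    "t1": ["correct", "correct", "incorrect"],
    "t2": ["incorrect", "incorrect", "correct"],
}
labels = majority_vote(judgments)  # t1 -> "correct", t2 -> "incorrect"
```

In practice such schemes are usually refined with worker reliability weighting or gold-standard control questions, especially for paid crowds.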


Clinical Orthopaedics and Related Research | 2010

Electronic Data Capture for Registries and Clinical Trials in Orthopaedic Surgery: Open Source versus Commercial Systems

Jatin Shah; Dimple Rajgor; Shreyasee S. Pradhan; Mariana McCready; Amrapali Zaveri; Ricardo Pietrobon

Background: Collection and analysis of clinical data can help orthopaedic surgeons to practice evidence-based medicine. Spreadsheets and offline relational databases are prevalent but not flexible, secure, or workflow-friendly, and they do not support the generation of standardized and interoperable data. Additionally, these data collection applications usually do not follow a structured and planned approach, which may result in failure to achieve the intended goal.

Questions/purposes: Our purposes are (1) to provide a brief overview of electronic data capture (EDC) systems, their types, and related pros and cons, as well as to describe commonly used EDC platforms and their features; and (2) to describe the simple steps involved in designing a registry/clinical study in DADOS P, an open source EDC system.

Where are we now? Electronic data capture systems aimed at addressing these issues are widely being adopted at an institutional/national/international level but are lacking at an individual level. A wide array of features, relative pros and cons, and different business models cause confusion and indecision among orthopaedic surgeons interested in implementing EDC systems.

Where do we need to go? To answer clinical questions and actively participate in clinical studies, orthopaedic surgeons should collect data in parallel to their clinical activities. Adopting a simple, user-friendly, and robust EDC system can facilitate the data collection process.

How do we get there? Conducting a balanced evaluation of available options and comparing them with intended goals and requirements can help orthopaedic surgeons make an informed choice.


JAMA Dermatology | 2014

Global Burden of Skin Disease as Reflected in Cochrane Database of Systematic Reviews

Chante Karimkhani; Lindsay N. Boyers; Laura Prescott; Vivian Welch; Finola M. Delamere; Mona Nasser; Amrapali Zaveri; Roderick J. Hay; Theo Vos; Christopher J L Murray; David J. Margolis; John Hilton; Harriet MacLehose; Hywel C. Williams; Robert P. Dellavalle

IMPORTANCE: Research prioritization should be guided by impact of disease.

OBJECTIVE: To determine whether systematic reviews and protocol topics in Cochrane Database of Systematic Reviews (CDSR) reflect disease burden, measured by disability-adjusted life years (DALYs) from the Global Burden of Disease (GBD) 2010 project.

DESIGN, SETTING, AND PARTICIPANTS: Two investigators independently assessed 15 skin conditions in the CDSR for systematic review and protocol representation from November 1, 2013, to December 6, 2013. The 15 skin diseases were matched to their respective DALYs from GBD 2010. An official publication report of all reviews and protocols published by the Cochrane Skin Group (CSG) was also obtained to ensure that no titles were missed. There were no study participants other than the researchers, who worked with databases evaluating CDSR and GBD 2010 skin condition disability data.

MAIN OUTCOMES AND MEASURES: Relationship of CDSR topic coverage (systematic reviews and protocols) with percentage of total 2010 DALYs, 2010 DALY rank, and DALY percentage change from 1990 to 2010 for 15 skin conditions.

RESULTS: All 15 skin conditions were represented by at least 1 systematic review in CDSR; 69% of systematic reviews and 67% of protocols by the CSG covered the 15 skin conditions. Comparing the number of reviews/protocols and disability, dermatitis, melanoma, nonmelanoma skin cancer, viral skin diseases, and fungal skin diseases were well matched. Decubitus ulcer, psoriasis, and leprosy demonstrated review/protocol overrepresentation when matched with corresponding DALYs. In comparison, acne vulgaris, bacterial skin diseases, urticaria, pruritus, scabies, cellulitis, and alopecia areata were underrepresented in CDSR when matched with corresponding DALYs.

CONCLUSIONS AND RELEVANCE: Degree of representation in CDSR is partly correlated with DALY metrics. The number of published reviews/protocols was well matched with disability metrics for 5 of the 15 studied skin diseases, while 3 skin diseases were overrepresented, and 7 were underrepresented. Our results provide high-quality and transparent data to inform future prioritization decisions.


International Conference on Knowledge Engineering and the Semantic Web | 2013

TripleCheckMate: A Tool for Crowdsourcing the Quality Assessment of Linked Data

Dimitris Kontokostas; Amrapali Zaveri; Sören Auer; Jens Lehmann

Linked Open Data (LOD) comprises an unprecedented volume of structured datasets on the Web. However, these datasets are of varying quality, ranging from extensively curated datasets to crowdsourced and even extracted data of relatively low quality. We present a methodology for assessing the quality of linked data resources, which comprises a manual and a semi-automatic process. In this paper we focus on the manual process, where the first phase includes the detection of common quality problems and their representation in a quality problem taxonomy. The second phase comprises the evaluation of a large number of individual resources, according to the quality problem taxonomy, via crowdsourcing. This process is implemented by the tool TripleCheckMate, wherein a user assesses an individual resource and evaluates each fact for correctness. This paper focuses on describing the methodology, the quality taxonomy, and the tool's system architecture, user perspective and extensibility.


international semantic web conference | 2010

I18n of semantic web applications

Sören Auer; Matthias Weidl; Jens Lehmann; Amrapali Zaveri; Key-Sun Choi

Recently, the use of semantic technologies has gained considerable traction. With increased use of these technologies, their maturation not only in terms of performance and robustness but also with regard to support for non-Latin-based languages and regional differences is of paramount importance. In this paper, we provide a comprehensive review of the current state of the internationalization (I18n) of Semantic Web technologies. Since resource identifiers play a crucial role for the Semantic Web, the internationalization of resource identifiers is of high importance. It turns out that the prevalent resource identification mechanism on the Semantic Web, i.e. URIs, is not sufficient for an efficient internationalization of knowledge bases. Fortunately, with IRIs a standard for international resource identifiers is available, but its support needs much more penetration and homogenization in the various Semantic Web technology stacks. In addition, we review various RDF serializations with regard to their support for internationalized knowledge bases. The paper also contains an in-depth review of popular Semantic Web tools and APIs with regard to their support for internationalization.
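
The IRI-versus-URI distinction mentioned above can be illustrated with Python's standard library: an IRI may carry non-ASCII characters directly, while the corresponding URI must percent-encode their UTF-8 octets (RFC 3987 specifies the actual mapping; the example identifier is hypothetical):

```python
from urllib.parse import quote

# An IRI containing a non-ASCII character directly.
iri = "http://example.org/resource/Köln"

# Percent-encode the non-ASCII octets (UTF-8) to obtain a URI,
# leaving the identifier's structural characters intact.
uri = quote(iri, safe=":/#?&=")  # "ö" becomes "%C3%B6"
```

Knowledge bases that only support URIs force this encoded form onto users, which is one reason full IRI support matters for internationalized data.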


web intelligence | 2011

ReDD-Observatory: Using the Web of Data for Evaluating the Research-Disease Disparity

Amrapali Zaveri; Ricardo Pietrobon; Sören Auer; Jens Lehmann; Michael Martin; Timofey Ermilov

It is widely accepted that there is a large disparity between the availability of treatment options and the prevalence of diseases all over the world, thus placing individuals in danger. This disparity is partially caused by restricted access to information that would allow health care and research policy makers to formulate more appropriate measures to mitigate it. Specifically, this shortage of information is caused by the difficulty of reliably obtaining and integrating data regarding the disease burden and the respective research investments. In response to these challenges, the Linked Data paradigm provides a simple mechanism for publishing and interlinking structured information on the Web. In conjunction with the ever-increasing data on diseases and health care research available as Linked Data, an opportunity is created to reduce this information gap, which would allow for better policy in response to these disparities. In this paper, we present the ReDD-Observatory, an approach for evaluating the Research-Disease Disparity based on interlinking and integrating various biomedical data sources. Specifically, we devise a method for representing statistical information as Linked Data and adopt interlinking algorithms for integrating relevant datasets (mainly GHO, LinkedCT and PubMed). The assessment of the disparity is then performed with a number of parametrized SPARQL queries on the integrated data substrate. As a consequence, we are for the first time able to provide reliable indicators of the extent of the research-disease disparity in a semi-automated fashion, enabling health care professionals and policy makers to make more informed decisions.
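
A disparity indicator of this kind can be sketched very simply: compare each disease's share of total burden with its share of research output. The sketch below is an illustrative assumption about the indicator's shape, and all figures are invented:

```python
# Illustrative sketch of a research-disease disparity indicator:
# each disease's share of total burden (e.g. DALYs) divided by its
# share of research output (e.g. registered clinical trials).
def disparity(burden, trials):
    """Per-disease ratio of burden share to research share (higher = more neglected)."""
    total_burden, total_trials = sum(burden.values()), sum(trials.values())
    return {
        d: (burden[d] / total_burden) / (trials[d] / total_trials)
        for d in burden
    }

burden = {"malaria": 80, "diabetes": 20}   # made-up DALYs (thousands)
trials = {"malaria": 10, "diabetes": 90}   # made-up trial counts

scores = disparity(burden, trials)
```

Here the hypothetical "malaria" scores 8.0 (heavily under-researched relative to burden) while "diabetes" scores well below 1; the integrated Linked Data substrate supplies the real numerator and denominator via SPARQL.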

Collaboration


Top Co-Authors

Anisa Rula

University of Milano-Bicocca


Kathleen M. Jagodnik

Icahn School of Medicine at Mount Sinai
