Harsh Thakkar | Researchain

Publication

Featured research published by Harsh Thakkar.

european semantic web conference | 2017

Towards an Open Extensible Framework for Empirical Benchmarking of Data Management Solutions: LITMUS

Harsh Thakkar

Developments in the context of Open, Big, and Linked Data have led to an enormous growth of structured data on the Web. To keep up with the pace of efficient consumption and management of the data at this rate, many Data Management Solutions (DMSs) have been developed. Many efforts exist to benchmark these domain-specific DMSs; however, (i) reproducing these third-party benchmarks is an extremely tedious task, and (ii) there is no common framework that enables and advocates the extensibility and reusability of the benchmarks. We propose LITMUS, one such framework for benchmarking data management solutions. LITMUS will go beyond classical storage benchmarking frameworks by allowing the performance of DMSs to be analysed across query languages. In this early-stage doctoral work, we present the LITMUS concept, the considerations that led to its preliminary architecture, and the progress made so far in its realisation.

database and expert systems applications | 2017

Towards an Integrated Graph Algebra for Graph Pattern Matching with Gremlin

Harsh Thakkar; Dharmen Punjani; Sören Auer; Maria-Esther Vidal

Graph data management has revealed beneficial characteristics in terms of flexibility and scalability by balancing query expressivity and schema flexibility in different ways. This has resulted in the rapid development of new task-specific graph systems, query languages and data models, such as property graphs, key-value, wide column, Resource Description Framework (RDF), etc. Present-day graph query languages focus on flexible graph pattern matching (also known as sub-graph matching), whereas graph computing frameworks aim to provide fast parallel (distributed) execution of instructions. This rapid growth in the variety of graph-based data management systems has resulted in a lack of standardization. Gremlin, a graph traversal language and machine, provides a common platform for supporting any graph computing system (such as an OLTP graph database or OLAP graph processors). We present a formalization of graph pattern matching for Gremlin queries. We also study, discuss and consolidate various existing graph algebra operators into an integrated graph algebra.

web intelligence, mining and semantics | 2016

Are Linked Datasets fit for Open-domain Question Answering? A Quality Assessment

Harsh Thakkar; Kemele M. Endris; José M. Giménez-García; Jeremy Debattista; Christoph Lange; Sören Auer

The current decade has witnessed an enormous explosion of data being published on the Web as Linked Data to maximise its reusability. Answering questions that users speak or write in natural language is an increasingly popular application scenario for Web Data, especially when the questions are not limited to a domain where dedicated curated datasets exist, as in medicine. The increasing use of Web Data in this and other settings has highlighted the importance of assessing its quality. While considerable work has been done on assessing the quality of Linked Data, only a few efforts have been dedicated to quality assessment of Linked Data from the perspective of the question answering (QA) domain. From the Linked Data quality metrics that have so far been well documented in the literature, we have identified those that are most relevant for QA. We apply these quality metrics, implemented in the Luzzu framework, to subsets of two datasets of crucial importance to open-domain QA -- DBpedia and Wikidata -- and thus present the first assessment of the quality of these datasets for QA. From these datasets, we assess slices covering the specific domains of restaurants, politicians, films and soccer players. The results of our experiments suggest that for most of these domains, the quality of Wikidata with regard to the majority of relevant metrics is higher than that of DBpedia.

international conference on semantic systems | 2017

Trying Not to Die Benchmarking: Orchestrating RDF and Graph Data Management Solution Benchmarks Using LITMUS

Harsh Thakkar; Yashwant Keswani; Mohnish Dubey; Jens Lehmann; Sören Auer

Knowledge graphs, usually modelled via RDF or property graphs, have gained importance over the past decade. In order to decide which Data Management Solution (DMS) performs best for specific query loads over a knowledge graph, it is necessary to perform benchmarks. Benchmarking is an extremely tedious task demanding repetitive manual effort; it is therefore advantageous to automate the whole process. However, there is currently no benchmarking framework that supports benchmarking and comparing diverse DMSs for both RDF and property graphs. To this end, we introduce the first working prototype of LITMUS, which provides this functionality as well as fine-grained environment configuration options, a comprehensive set of DMS- and CPU-specific key performance indicators, and quick analytical support via custom visualizations (i.e. plots) for the benchmarked DMSs.

database and expert systems applications | 2017

QAestro – Semantic-Based Composition of Question Answering Pipelines

Kuldeep Singh; Ioanna Lytra; Maria-Esther Vidal; Dharmen Punjani; Harsh Thakkar; Christoph Lange; Sören Auer

The demand for interfaces that allow users to interact with computers in an intuitive, effective, and efficient way is increasing. Question Answering (QA) systems address this need by answering questions posed by humans using knowledge bases. In recent years, many QA systems and related components have been developed both by practitioners and the research community. Since QA involves a vast number of (partially overlapping) subtasks, existing QA components can be combined in various ways to build tailored QA systems that perform better in terms of scalability and accuracy in specific domains and use cases. However, to the best of our knowledge, no systematic way exists to formally describe and automatically compose such components. Thus, in this work, we introduce QAestro, a framework for semantically describing both QA components and developer requirements for QA component composition. QAestro relies on a controlled vocabulary and the Local-as-View (LAV) approach to model QA tasks and components, respectively. Furthermore, the problem of QA component composition is mapped to the problem of LAV query rewriting, and state-of-the-art SAT solvers are utilized to efficiently enumerate the solutions. We have formalized 51 existing QA components implemented in 20 QA systems using QAestro. Our empirical results suggest that QAestro enumerates the combinations of QA components that effectively implement QA developer requirements.

international semantic web conference | 2016

Assessing Trust with PageRank in the Web of Data

José M. Giménez-García; Harsh Thakkar; Antoine Zimmermann

While a number of quality metrics have been successfully proposed for datasets in the Web of Data, there is a lack of trust metrics that can be computed for any given dataset. We argue that reuse of data can be seen as an act of trust. In the Semantic Web environment, datasets regularly include terms from other sources, and each of these connections expresses a degree of trust in that source. However, determining what a dataset is in this context is not straightforward. We study the concepts of dataset and dataset link, and finally use the concept of Pay-Level Domain to differentiate datasets, treating the usage of external terms as connections among them. Using these connections, we compute the PageRank value for each dataset and examine the influence of ignoring predicates in the computation. This process has been performed for more than 300 datasets extracted from the LOD Laundromat. The results show that reuse of a dataset is not correlated with its size, and provide some insight into the limitations of the approach and ways to improve its efficacy.
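
The idea in this abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes each dataset is identified by a domain name, treats every use of an external term as a directed link between domains, and runs a plain power-iteration PageRank. The function names `naive_pld` and `pagerank`, the damping factor, and the sample links are all hypothetical; in particular, `naive_pld` just keeps the last two host labels, whereas a real Pay-Level Domain lookup would need the Public Suffix List.

```python
from collections import defaultdict
from urllib.parse import urlparse


def naive_pld(uri):
    """Rough stand-in for a Pay-Level Domain: keep the last two host labels.

    Assumption for illustration only; hosts such as example.co.uk are
    handled incorrectly without the Public Suffix List.
    """
    host = urlparse(uri).hostname or ""
    return ".".join(host.split(".")[-2:])


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over directed (source, target) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = defaultdict(set)
    for src, dst in edges:
        if src != dst:  # a dataset reusing its own terms is not a link
            out_links[src].add(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by datasets with no outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n
            for node in nodes
        }
        for src in nodes:
            targets = out_links[src]
            for dst in targets:
                new_rank[dst] += damping * rank[src] / len(targets)
        rank = new_rank
    return rank


# Hypothetical reuse links: (dataset using a term, dataset defining the term).
links = [
    ("a.org", "dbpedia.org"),
    ("b.org", "dbpedia.org"),
    ("b.org", "a.org"),
]
ranks = pagerank(links)
for dataset in sorted(ranks, key=ranks.get, reverse=True):
    print(dataset, round(ranks[dataset], 3))
```

In this toy graph the most reused dataset receives the highest score, which matches the reading of reuse as an act of trust.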

international conference on knowledge capture | 2017

Dataset Reuse: An Analysis of References in Community Discussions, Publications and Data

Kemele M. Endris; José M. Giménez-García; Harsh Thakkar; Elena Demidova; Antoine Zimmermann; Christoph Lange; Elena Simperl

Following the Linked Data principles means maximising the reusability of data over the Web. Reuse of datasets can become apparent when datasets are linked to from other datasets and referred to in scientific articles or community discussions. It can thus be measured, similarly to citations of papers. In this paper, we propose dataset reuse metrics and use them to analyse indications of dataset reuse in different communication channels within a scientific community. In particular, we consider mailing lists and publications in the Semantic Web community and their correlation with data interlinking. Our results demonstrate that indications of dataset reuse across different communication channels and reuse in terms of data interlinking are positively correlated.

international conference on management of data | 2018

Two for one: querying property graph databases using SPARQL via Gremlinator

Harsh Thakkar; Dharmen Punjani; Jens Lehmann; Sören Auer

In the past decade, knowledge graphs have become very popular and frequently rely on the Resource Description Framework (RDF) or Property Graphs (PG) as their data models. However, the query languages for these two data models, SPARQL for RDF and the PG traversal language Gremlin, lack basic interoperability. In this demonstration paper, we present Gremlinator, the first translator from SPARQL, the W3C standardized language for RDF, to Gremlin, a popular property graph traversal language. Gremlinator translates SPARQL queries to Gremlin path traversals for executing graph pattern matching queries over graph databases. This allows a user who is well versed in SPARQL to access and query a wide variety of graph databases, avoiding the steep learning curve of adapting to a new Graph Query Language (GQL). Gremlin is a graph computing system-agnostic traversal language (covering both OLTP graph databases and OLAP graph processors), making it a desirable choice for supporting interoperability when querying graph databases. Gremlinator is planned to be released as an Apache TinkerPop plugin in an upcoming release.
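
To give a feel for what translating a graph pattern into a traversal can look like, here is a toy sketch. It is not Gremlinator's algorithm or API: the function `translate_bgp` and its input convention (triples of subject variable, edge label, object variable) are assumptions made for illustration, and it only covers edge-to-edge patterns. It builds a Gremlin `match()` traversal as a string.

```python
def translate_bgp(triples, select_vars):
    """Turn (subject_var, edge_label, object_var) patterns into a Gremlin string.

    Toy convention (an assumption, not Gremlinator's mapping): every triple
    pattern becomes one outgoing-edge step inside a match() traversal.
    """
    patterns = [
        f"__.as('{s}').out('{label}').as('{o}')"
        for s, label, o in triples
    ]
    selected = ", ".join(f"'{v}'" for v in select_vars)
    return f"g.V().match({', '.join(patterns)}).select({selected})"


# SPARQL-like input: SELECT ?a ?c WHERE { ?a :knows ?b . ?b :created ?c }
query = translate_bgp(
    [("a", "knows", "b"), ("b", "created", "c")],
    ["a", "c"],
)
print(query)
```

Each triple pattern becomes one step inside `match()`, and the SPARQL projection becomes the final `select()`.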

international world wide web conferences | 2016