Qaiser Mehmood
National University of Ireland, Galway
Publications
Featured research published by Qaiser Mehmood.
International Semantic Web Conference | 2015
Muhammad Saleem; Muhammad Intizar Ali; Aidan Hogan; Qaiser Mehmood; Axel-Cyrille Ngonga Ngomo
We present LSQ: a Linked Dataset describing SPARQL queries extracted from the logs of public SPARQL endpoints. We argue that LSQ has a variety of uses for the SPARQL research community, be it for example to generate custom benchmarks or conduct analyses of SPARQL adoption. We introduce the LSQ data model used to describe SPARQL query executions as RDF. We then provide details on the four SPARQL endpoint logs that we have RDFised thus far. The resulting dataset contains 73 million triples describing 5.7 million query executions.
International Semantic Web Conference | 2015
Muhammad Saleem; Qaiser Mehmood; Axel-Cyrille Ngonga Ngomo
Benchmarking is indispensable when aiming to assess technologies with respect to their suitability for given tasks. While several benchmarks and benchmark generation frameworks have been developed to evaluate triple stores, they mostly provide a one-size-fits-all solution to the benchmarking problem. This approach to benchmarking is, however, unsuitable for evaluating the performance of a triple store for a given application with particular requirements. We address this drawback by presenting FEASIBLE, an automatic approach for the generation of benchmarks out of the query history of applications, i.e., query logs. The generation is achieved by selecting prototypical queries of a user-defined size from the input set of queries. We evaluate our approach on two query logs and show that the benchmarks it generates are accurate approximations of the input query logs. Moreover, we compare four different triple stores with benchmarks generated using our approach and show that they behave differently based on the data they contain and the types of queries posed. Our results suggest that FEASIBLE generates better sample queries than the state of the art. In addition, the better query selection and the larger set of query types used lead to triple store rankings that partly differ from the rankings generated by previous works.
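The idea of picking a user-defined number of prototypical queries from a log can be illustrated with a small Python sketch. It is not FEASIBLE's own selection procedure: it assumes each logged query has already been reduced to a hypothetical numeric feature vector (for example, counts of triple patterns and joins) and uses greedy farthest-point selection on normalised features, a swapped-in technique chosen for brevity. The function name `select_prototypes` is illustrative.

```python
import math


def select_prototypes(features, k):
    """Greedily pick up to k query ids whose feature vectors are spread out.

    `features` maps a query id to a tuple of numeric features (hypothetical
    examples: number of triple patterns, number of joins, result size).
    """
    ids = sorted(features)
    dims = len(features[ids[0]])
    # Normalise each feature to [0, 1] so no single feature dominates distances.
    lo = [min(features[q][d] for q in ids) for d in range(dims)]
    hi = [max(features[q][d] for q in ids) for d in range(dims)]
    norm = {
        q: tuple(
            (features[q][d] - lo[d]) / (hi[d] - lo[d]) if hi[d] > lo[d] else 0.0
            for d in range(dims)
        )
        for q in ids
    }
    # Start from the query closest to the mean vector (a "typical" query).
    mean = tuple(sum(norm[q][d] for q in ids) / len(ids) for d in range(dims))
    chosen = [min(ids, key=lambda q: math.dist(norm[q], mean))]
    # Repeatedly add the query farthest from everything already chosen.
    while len(chosen) < min(k, len(ids)):
        best = max(
            (q for q in ids if q not in chosen),
            key=lambda q: min(math.dist(norm[q], norm[c]) for c in chosen),
        )
        chosen.append(best)
    return chosen
```

Farthest-point selection favours coverage of unusual queries; a real benchmark generator would also weigh how frequent each query type is in the log.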
International Semantic Technology Conference | 2014
Ali Hasnain; Syeda Sana e Zainab; Maulik R. Kamdar; Qaiser Mehmood; Claude N. Warren; Qurratal Ain Fatimah; Helena F. Deus; Muntazir Mehdi; Stefan Decker
Multiple datasets that add high value to biomedical research have been exposed on the web as a part of the Life Sciences Linked Open Data (LSLOD) Cloud. The ability to easily navigate through these datasets is crucial for personalized medicine and the improvement of the drug discovery process. However, navigating these multiple datasets is not trivial, as most of them are only available as isolated SPARQL endpoints with very little vocabulary reuse. The content that is indexed through these endpoints is scarce, making the indexed dataset opaque for users. In this paper, we propose an approach for the creation of an active Linked Life Sciences Data Roadmap, a set of configurable rules that can be used to discover links (roads) between biological entities (cities) in the LSLOD cloud. We have catalogued and linked concepts and properties from 137 public SPARQL endpoints. Our Roadmap is primarily used to dynamically assemble queries retrieving data from multiple SPARQL endpoints simultaneously. We also demonstrate its use in conjunction with other tools for selective SPARQL querying, semantic annotation of experimental datasets and the visualization of the LSLOD cloud. We have evaluated the performance of our approach in terms of the time taken and entity capture. Our approach, if generalized to encompass other domains, can be used for road-mapping the entire LOD cloud.
Journal of Biomedical Semantics | 2017
Ali Hasnain; Qaiser Mehmood; Syeda Sana e Zainab; Muhammad Saleem; Claude N. Warren; Durre Zehra; Stefan Decker; Dietrich Rebholz-Schuhmann
Background
Biomedical data, e.g. from knowledge bases and ontologies, is increasingly made available following open linked data principles, at best as RDF triple data. This is a necessary step towards unified access to biological data sets, but it still requires solutions that query multiple endpoints for their heterogeneous data to eventually retrieve all the meaningful information. Suggested solutions are based on query federation approaches, which require the submission of SPARQL queries to endpoints. Due to the size and complexity of available data, these solutions have to be optimised for efficient retrieval times and for users in life sciences research. Last but not least, over time, the reliability of data resources in terms of access and quality has to be monitored. Our solution (BioFed) federates data over 130 SPARQL endpoints in life sciences and tailors query submission according to the provenance information. BioFed has been evaluated against the state-of-the-art solution FedX and forms an important benchmark for the life science domain.
Methods
The efficient cataloguing approach of the federated query processing system 'BioFed', the triple-pattern-wise source selection and the semantic source normalisation form the core of our solution. It gathers and integrates data from newly identified public endpoints for federated access. Basic provenance information is linked to the retrieved data. Last but not least, BioFed makes use of the latest SPARQL standard (i.e., 1.1) to leverage its full benefits for query federation. The evaluation is based on 10 simple and 10 complex queries, which address data in 10 major and very popular data sources (e.g., DrugBank, SIDER).
Results
BioFed is a solution for a single point of access to a large number of SPARQL endpoints providing life science data. It facilitates efficient query generation for data access and provides basic provenance information in combination with the retrieved data. BioFed fully supports SPARQL 1.1 and reports endpoint availability based on the EndpointData graph. Our evaluation of BioFed against FedX is based on 20 heterogeneous federated SPARQL queries and shows competitive execution performance in comparison to FedX, which can be attributed to the provision of provenance information for the source selection.
Conclusion
Developing and testing federated query engines for life sciences data is still a challenging task. According to our findings, it is advantageous to optimise the source selection. The cataloguing of SPARQL endpoints, including type and property indexing, leads to efficient querying of data resources over the Web of Data. This could be improved even further through the use of ontologies, e.g., for abstract normalisation of query terms.
International Journal on Semantic Web and Information Systems | 2016
Ali Hasnain; Qaiser Mehmood; Syeda Sana e Zainab; Aidan Hogan
This publication was supported in part by a research grant from Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289, by the Millennium Nucleus Center for Semantic Web Research under Grant NC120004, and by Fondecyt Grant No. 11140900.
Journal of Biomedical Semantics | 2017
Yasar Khan; Muhammad Saleem; Muntazir Mehdi; Aidan Hogan; Qaiser Mehmood; Dietrich Rebholz-Schuhmann; Ratnesh Sahay
Background
Several query federation engines have been proposed for accessing public Linked Open Data sources. However, in many domains, resources are sensitive and access to these resources is tightly controlled by stakeholders; consequently, privacy is a major concern when federating queries over such datasets. In the Healthcare and Life Sciences (HCLS) domain, real-world datasets contain sensitive statistical information: strict ownership is granted to individuals working in hospitals, research labs, clinical trial organisers, etc. Therefore, the legal and ethical concerns of (i) preserving the anonymity of patients (or clinical subjects) and (ii) respecting data ownership through access control are key challenges faced by the data analytics community working within the HCLS domain. Likewise, statistical data play a key role in the domain, where the RDF Data Cube Vocabulary has been proposed as a standard format to enable the exchange of such data. However, to the best of our knowledge, no existing approach has looked to optimise federated queries over such statistical data.
Results
We present SAFE: a query federation engine that enables policy-aware access to sensitive statistical datasets represented as RDF data cubes. SAFE is designed specifically to query statistical RDF data cubes in a distributed setting, where access control is coupled with source selection, user profiles and their access rights. SAFE proposes a join-aware source selection method that avoids wasteful requests to irrelevant and unauthorised data sources. In order to preserve anonymity and enforce stricter access control, SAFE's indexing system does not hold any data instances; it stores only predicates and endpoints. The resulting data summary has a significantly lower index generation time and size compared to existing engines, which allows for faster updates when sources change.
Conclusions
We validate the performance of the system with experiments over real-world datasets provided by three clinical organisations as well as legacy linked datasets. We show that SAFE enables granular graph-level access control over distributed clinical RDF data cubes and efficiently reduces the source selection and overall query execution time when compared with general-purpose SPARQL query federation engines in the targeted setting.
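Because SAFE's index stores only predicates and endpoints, source selection can be decided without touching any data instances. The Python sketch below illustrates that general idea combined with a per-user permission table. It is not SAFE's implementation: `select_sources`, the `index` and `permissions` structures, and the `ex:` predicates are hypothetical, and it does not reproduce SAFE's join-aware method or its graph-level access control. The only join-related pruning is the simple rule that a basic graph pattern containing an unanswerable triple pattern needs no requests at all.

```python
def select_sources(index, triple_patterns, user, permissions):
    """Pick authorised endpoints for each triple pattern of a basic graph pattern.

    index:       endpoint -> set of predicates it contains (no data instances)
    permissions: user -> set of endpoints that user may query
    Variables are strings starting with "?".
    """
    allowed = permissions.get(user, set())
    selection = {}
    for tp in triple_patterns:
        predicate = tp[1]
        if predicate.startswith("?"):
            # Unbound predicate: every authorised endpoint is a candidate.
            candidates = set(allowed)
        else:
            # Only authorised endpoints whose index lists this predicate.
            candidates = {
                ep for ep, preds in index.items()
                if predicate in preds and ep in allowed
            }
        selection[tp] = candidates
    # Every pattern in a basic graph pattern must match, so if one pattern has
    # no authorised source, no request needs to be sent for any of them.
    if any(not eps for eps in selection.values()):
        return {tp: set() for tp in selection}
    return selection
```

Checking permissions during source selection, rather than after results arrive, means unauthorised endpoints are never contacted.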
International Semantic Web Conference | 2016
Syeda Sana e Zainab; Qaiser Mehmood; Aidan Hogan; Ali Hasnain
There are hundreds of SPARQL endpoints on the Web, but finding an endpoint relevant to a client's needs is difficult: each endpoint acts like a black box, often without a description of its content. Herein we briefly describe Sportal: a system that gathers metadata about the content of endpoints and compiles it into a central catalogue over which clients can search. Sportal sends queries to individual endpoints offline to learn about their content, generating a best-effort VoID description for each endpoint. These descriptions can then be searched and queried by clients in the Sportal user interface, for example, to find endpoints that contain instances of a given class, or triples with a given predicate, or to answer more complex requests such as endpoints with at least 1,000 images of people. Herein we give a brief overview of Sportal, its design and functionality, and the features that shall be demoed at the conference.
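The kind of VoID-style statistics such a catalogue holds, and the class-based search it enables, can be sketched in Python. This is an assumption-laden illustration: Sportal obtains its statistics by sending SPARQL queries to remote endpoints, whereas here they are computed over an in-memory list of (subject, predicate, object) triples. The functions `void_summary` and `endpoints_with_class` are hypothetical, and the dictionary keys only loosely mirror VoID properties (the partitions are simplified to plain counts).

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def void_summary(triples):
    """Compute VoID-style statistics for one endpoint's (s, p, o) triples."""
    subjects, objects = set(), set()
    class_instances = {}  # class -> distinct instances (subjects of rdf:type)
    property_counts = {}  # predicate -> number of triples using it
    for s, p, o in triples:
        subjects.add(s)
        objects.add(o)
        property_counts[p] = property_counts.get(p, 0) + 1
        if p == RDF_TYPE:
            class_instances.setdefault(o, set()).add(s)
    return {
        "void:triples": sum(property_counts.values()),
        "void:distinctSubjects": len(subjects),
        "void:distinctObjects": len(objects),
        "void:properties": len(property_counts),
        "void:classes": len(class_instances),
        # Simplified partitions: one count per class and per property.
        "void:classPartition": {c: len(i) for c, i in class_instances.items()},
        "void:propertyPartition": property_counts,
    }


def endpoints_with_class(catalogue, cls, min_instances=1):
    """Return endpoints in {endpoint: summary} with at least min_instances of cls."""
    return sorted(
        ep for ep, summary in catalogue.items()
        if summary["void:classPartition"].get(cls, 0) >= min_instances
    )
```

A request such as "endpoints with at least 1,000 images" then becomes a lookup over precomputed summaries, for example `endpoints_with_class(catalogue, "foaf:Image", 1000)`, without contacting any endpoint at search time.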
International Conference on Knowledge Engineering and the Semantic Web | 2015
Ali Hasnain; Qaiser Mehmood; Syeda Sana e Zainab; Stefan Decker
A significant portion of the Web of Data is composed of multiple datasets that add high value to biomedical research. These datasets have been exposed on the web as part of the Life Sciences Linked Open Data (LSLOD) Cloud. Different initiatives have been proposed for navigating through these datasets with or without vocabulary reuse. Provenance information is especially significant for life sciences data compared with other domains. With provenance information, users become aware of the source, size and format of the data, along with the authorization and privileges associated with it. Previously, we proposed an approach for the creation of an active Linked Life Sciences Data Roadmap that catalogues and links concepts as well as properties from 137 public SPARQL endpoints. In this work, we extend the Roadmap with provenance information collected directly by querying the datasets. We designed a set of queries and catalogued the results. This extended Roadmap is useful for dynamically assembling queries that retrieve data, along with its provenance, from multiple SPARQL endpoints. We also demonstrate its use in conjunction with other tools for selective SPARQL querying and the visualization of the LSLOD cloud. We have evaluated the performance of our approach in terms of time taken and success rates of data retrieved.
International Conference on Knowledge Capture | 2017
Muhammad Saleem; Claus Stadler; Qaiser Mehmood; Jens Lehmann; Axel-Cyrille Ngonga Ngomo
Query containment is a fundamental problem in data management, with its main application being global query optimization. A number of SPARQL query containment solvers have recently been developed. To the best of our knowledge, the Query Containment Benchmark (QC-Bench) is the only benchmark for evaluating these containment solvers. However, this benchmark contains a fixed number of synthetic queries, which were handcrafted by its creators. We propose SQCFramework, a SPARQL query containment benchmark generation framework that is able to generate customized SPARQL containment benchmarks from real SPARQL query logs. The framework is flexible enough to generate benchmarks of varying sizes and according to user-defined criteria on the most important SPARQL features to be considered for query containment benchmarking. This is achieved using different clustering algorithms. We compare state-of-the-art SPARQL query containment solvers by using different query containment benchmarks generated from DBpedia and Semantic Web Dog Food query logs. In addition, we analyze the quality of the different benchmarks generated by SQCFramework.
International Conference on Semantic Systems | 2017
Vadim Savenkov; Qaiser Mehmood; Jürgen Umbrich; Axel Polleres
While the volume of graph data available on the Web in RDF is steadily growing, SPARQL, the standard query language for RDF, still remains effectively unusable for the basic task of finding paths through the graph between selected nodes. Property paths, as introduced in SPARQL 1.1, are unfit for this purpose, as they can only be used to test path existence. More expressive features, such as counting distinct paths between two nodes, have been shown to be highly intractable in the worst case, in particular in graphs with a high degree of cyclicity. Still, practical use cases demand a solution for path retrieval even when the total number of paths is prohibitively large. A common approach is to ask not for all, but only for the k shortest paths. In this paper, we extend SPARQL 1.1 property paths in a manner that allows one to compute and return the k shortest paths matching a property path expression between two nodes. For RDF graphs in the compact HDT format, we evaluate our algorithm for top-k shortest paths, showing that a relatively simple approach works in practical use cases (in fact, more efficiently than other, more complex algorithms in the literature).
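The top-k idea can be illustrated with a small Python sketch. It is not the paper's algorithm, which works over HDT-compressed graphs and full property path expressions; it assumes an in-memory list of (subject, predicate, object) triples, a single-predicate path comparable to `p+`, and enumerates only simple paths (no repeated nodes), which guarantees termination on cyclic graphs. Breadth-first expansion of partial paths dequeues shorter paths before longer ones, so the first k paths reaching the target are k shortest simple paths. The function name `k_shortest_paths` is hypothetical.

```python
from collections import deque


def k_shortest_paths(triples, source, target, predicate, k):
    """Return up to k shortest simple paths from source to target that
    follow only edges labelled `predicate` (similar to the path `predicate+`)."""
    # Adjacency list restricted to the chosen predicate.
    adj = {}
    for s, p, o in triples:
        if p == predicate:
            adj.setdefault(s, []).append(o)
    results = []
    # BFS over partial paths: all paths of length L are dequeued before L + 1.
    queue = deque([[source]])
    while queue and len(results) < k:
        path = queue.popleft()
        for nxt in adj.get(path[-1], []):
            if nxt in path:
                # Keep paths simple so cycles cannot cause endless expansion.
                continue
            new_path = path + [nxt]
            if nxt == target:
                results.append(new_path)
                if len(results) == k:
                    break
            else:
                queue.append(new_path)
    return results
```

Storing whole partial paths in the queue can grow exponentially on dense graphs; it is chosen here for readability rather than efficiency.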