
Publication


Featured research published by Khalid Belhajjame.


Nucleic Acids Research | 2013

The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud

Katherine Wolstencroft; Robert Haines; Donal Fellows; Alan R. Williams; David Withers; Stuart Owen; Stian Soiland-Reyes; Ian Dunlop; Aleksandra Nenadic; Paul Fisher; Jiten Bhagat; Khalid Belhajjame; Finn Bacall; Alex Hardisty; Abraham Nieva de la Hidalga; Maria Paula Balcazar Vargas; Shoaib Sufi; Carole A. Goble

The Taverna workflow tool suite (http://www.taverna.org.uk) is designed to combine distributed Web Services and/or local tools into complex analysis pipelines. These pipelines can be executed on local desktop machines or through larger infrastructure (such as supercomputers, Grids or cloud environments), using the Taverna Server. In bioinformatics, Taverna workflows are typically used in the areas of high-throughput omics analyses (for example, proteomics or transcriptomics), or for evidence gathering methods involving text mining or data mining. Through Taverna, scientists have access to several thousand different tools and resources that are freely available from a large range of life science institutions. Once constructed, the workflows are reusable, executable bioinformatics protocols that can be shared, reused and repurposed. A repository of public workflows is available at http://www.myexperiment.org. This article provides an update to the Taverna tool suite, highlighting new features and developments in the workbench and the Taverna Server.
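The core idea above, composing independent services into an executable pipeline, can be sketched in a few lines of plain Python. This is an illustrative toy, not Taverna's actual API: the function names and the toy codon table are invented stand-ins for remote Web Service operations.

```python
# Minimal sketch of service composition in the style of a workflow pipeline.
# The "services" here are plain Python callables standing in for remote
# Web Service operations; all names are illustrative, not Taverna's API.

def fetch_sequence(accession):
    # Stand-in for a remote sequence-retrieval service.
    return {"id": accession, "seq": "ATGGCGTAA"}

def translate(record):
    # Stand-in for a translation service (toy three-codon table).
    table = {"ATG": "M", "GCG": "A", "TAA": "*"}
    seq = record["seq"]
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return "".join(table.get(c, "X") for c in codons)

def run_pipeline(steps, data):
    # Execute the steps in order, feeding each output to the next step.
    for step in steps:
        data = step(data)
    return data

protein = run_pipeline([fetch_sequence, translate], "P12345")
```

Once the steps are first-class values like this, a pipeline is itself data that can be shared, reused and repurposed, which is the property the abstract highlights.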


International Provenance and Annotation Workshop (IPAW) | 2008

Data Lineage Model for Taverna Workflows with Lightweight Annotation Requirements

Paolo Missier; Khalid Belhajjame; Jun Zhao; Marco Roos; Carole A. Goble

The provenance, or lineage, of a workflow data product can be reconstructed by keeping a complete trace of workflow execution. This lineage information, however, is likely to be both imprecise, because of the black-box nature of the services that compose the workflow, and noisy, because of the many trivial data transformations that obscure the intended purpose of the workflow. In this paper we argue that these shortcomings can be alleviated by introducing a small set of optional lightweight annotations to the workflow, in a principled way. We begin by presenting a baseline, annotation-free lineage model for the Taverna workflow system, and then show how the proposed annotations improve the results of fundamental lineage queries.
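The baseline idea of reconstructing lineage from an execution trace can be sketched as follows. This is a toy model under my own assumptions, not the paper's lineage model: each processor invocation is logged as an (output, processor, inputs) record, and a lineage query walks the trace backwards.

```python
# Toy lineage reconstruction: each processor invocation is logged as
# (output, processor, inputs). lineage() walks the trace backwards to
# collect every data item the given item was derived from.
# Data item and processor names are illustrative.

trace = [
    ("d3", "concat", ["d1", "d2"]),
    ("d4", "uppercase", ["d3"]),
]

def lineage(data_item, trace):
    ancestors = set()
    pending = [data_item]
    while pending:
        current = pending.pop()
        for output, _processor, inputs in trace:
            if output == current:
                for item in inputs:
                    if item not in ancestors:
                        ancestors.add(item)
                        pending.append(item)
    return ancestors
```

The noise the abstract mentions is visible even here: the trivial "uppercase" step appears in every lineage answer, which is the kind of clutter the proposed annotations are meant to suppress.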


ACM Transactions on the Web | 2008

Automatic annotation of Web services based on workflow definitions

Khalid Belhajjame; Suzanne M. Embury; Norman W. Paton; Robert Stevens; Carole A. Goble

Semantic annotations of web services can support the effective and efficient discovery of services, and guide their composition into workflows. At present, however, the practical utility of such annotations is limited by the small number of service annotations available for general use. Manual annotation of services is a time-consuming and thus expensive task, so some means are required by which services can be automatically (or semi-automatically) annotated. In this paper, we show how information can be inferred about the semantics of operation parameters based on their connections to other (annotated) operation parameters within tried-and-tested workflows. Because the data links in the workflows do not necessarily contain every possible connection of compatible parameters, we can infer only constraints on the semantics of parameters. We show that, despite their imprecise nature, these so-called loose annotations are still of value in supporting the manual annotation task, inspecting workflows and discovering services. We also show that derived annotations for already annotated parameters are useful. By comparing existing and newly derived annotations of operation parameters, we can support the detection of errors in existing annotations, in the ontology used for annotation, and in workflows. The derivation mechanism has been implemented, and its practical applicability for inferring new annotations has been established through an experimental evaluation. The usefulness of the derived annotations is also demonstrated.
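The inference step described above can be sketched as a propagation over workflow data links. This is a simplified illustration under my own assumptions (flat annotation labels rather than ontology terms, and invented parameter names), not the paper's actual derivation mechanism.

```python
# Sketch of loose annotation inference: if an unannotated parameter is
# connected by a data link to an annotated parameter in a tried-and-tested
# workflow, we record a constraint on its semantics. Parameter names and
# the annotation label are illustrative.

known = {"blast.out": "ProteinSequence"}   # existing annotations
links = [("blast.out", "align.in"),        # (source param, sink param)
         ("blast.out", "report.in")]

def infer_loose_annotations(known, links):
    inferred = {}
    for src, sink in links:
        if src in known and sink not in known:
            # A data link only shows the connection was meaningful in some
            # workflow, so the result is a candidate set (a "loose"
            # annotation), not a firm annotation.
            inferred.setdefault(sink, set()).add(known[src])
    return inferred
```

Running the same propagation over a parameter that is already annotated, and comparing the derived candidates against the existing label, is how disagreements can flag errors in annotations, ontology or workflows.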


IEEE International Conference on e-Science | 2012

Why workflows break — Understanding and combating decay in Taverna workflows

Jun Zhao; José Manuél Gómez-Pérez; Khalid Belhajjame; Graham Klyne; Esteban García-Cuesta; Aleix Garrido; Kristina M. Hettne; Marco Roos; David De Roure; Carole A. Goble

Workflows provide a popular means for preserving scientific methods by explicitly encoding their process. However, workflows are subject to decay: over time they may lose the ability to be re-executed or to reproduce the same results, largely due to the volatility of the resources required for workflow execution. This paper provides an analysis of the root causes of workflow decay based on an empirical study of a collection of Taverna workflows from the myExperiment repository. Although our analysis was based on a specific type of workflow, the outcomes and methodology should be applicable to workflows from other systems, at least those whose executions also rely largely on accessing third-party resources. Based on our understanding of decay, we recommend a minimal set of auxiliary resources to be preserved together with the workflows as an aggregation object, and provide a software tool for end users to create such aggregations and to assess their completeness.
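The volatility problem described above suggests a simple decay check: probe each third-party resource a workflow depends on and report those no longer reachable. The sketch below is illustrative (invented URLs, and a probe injected as a function so the real version could issue an HTTP request); it is not the paper's tool.

```python
# Sketch of a workflow decay check. The liveness probe is injected so that
# in practice it could be an HTTP HEAD request against each endpoint;
# here it is a lookup in a set of known-live endpoints. URLs are invented.

def find_decayed(resources, is_alive):
    # Return the third-party resources that no longer respond.
    return [r for r in resources if not is_alive(r)]

workflow_resources = [
    "http://example.org/services/blast",
    "http://example.org/services/retired-service",
]
live_endpoints = {"http://example.org/services/blast"}

decayed = find_decayed(workflow_resources, lambda url: url in live_endpoints)
```

A report like this is exactly what motivates preserving auxiliary resources alongside the workflow: anything on the decayed list needs a preserved copy or a documented substitute for the workflow to remain re-executable.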


International Conference on Extending Database Technology (EDBT) | 2013

The W3C PROV family of specifications for modelling provenance metadata

Paolo Missier; Khalid Belhajjame; James Cheney

Provenance, a form of structured metadata designed to record the origin or source of information, can be instrumental in deciding whether information is to be trusted, how it can be integrated with other diverse information sources, and how to establish attribution of information to authors throughout its history. The PROV set of specifications, produced by the World Wide Web Consortium (W3C), is designed to promote the publication of provenance information on the Web, and offers a basis for interoperability across diverse provenance management systems. The PROV provenance model is deliberately generic and domain-agnostic, but extension mechanisms are available and can be exploited for modelling specific domains. This tutorial provides an account of these specifications. Starting from intuitive and informal examples that present idiomatic provenance patterns, it progressively introduces the relational model of provenance along with the constraints model for validation of provenance documents, and concludes with example applications that show the extension points in use.
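The core PROV pattern, entities, activities and agents linked by named relations, can be shown with a hand-rolled sketch. The relation names below (prov:wasGeneratedBy, prov:used, prov:wasAttributedTo) are real PROV terms, but the triple-list representation and the example resources are my own illustration, not a W3C serialization; a real system would use a PROV library or RDF store.

```python
# Hand-rolled sketch of the core PROV pattern: statements are
# (subject, relation, object) triples using PROV relation names.
# The resources (ex:chart, ex:alice, ...) are invented examples.

statements = [
    ("ex:chart",    "prov:wasGeneratedBy",  "ex:plotting"),
    ("ex:plotting", "prov:used",            "ex:dataset"),
    ("ex:chart",    "prov:wasAttributedTo", "ex:alice"),
    ("ex:dataset",  "prov:wasAttributedTo", "ex:bob"),
]

def objects(statements, subject, relation):
    # All objects o such that (subject, relation, o) is asserted --
    # e.g. "which activity generated this entity?"
    return {o for s, r, o in statements if s == subject and r == relation}
```

Queries over such statements are what make provenance actionable for trust decisions: attribution, for instance, is just `objects(statements, "ex:chart", "prov:wasAttributedTo")`. Domain-specific extensions add vocabulary, not a different structure.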


International Conference on Extending Database Technology (EDBT) | 2010

Fine-grained and efficient lineage querying of collection-based workflow provenance

Paolo Missier; Norman W. Paton; Khalid Belhajjame

The management and querying of workflow provenance data underpins a collection of activities, including the analysis of workflow results, and the debugging of workflows or services. Such activities require efficient evaluation of lineage queries over potentially complex and voluminous provenance logs. Naïve implementations of lineage queries navigate provenance logs by joining tables that represent the flow of data between connected processors invoked from workflows. In this paper we provide an approach to provenance querying that: (i) avoids joins over provenance logs by using information about the workflow definition to inform the construction of queries that directly target relevant lineage results; (ii) provides fine-grained provenance querying, even for workflows that create and consume collections; and (iii) scales effectively to address complex workflows, workflows with large intermediate data sets, and queries over multiple workflows.
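Point (i) above can be illustrated with a toy version of the trick: the static workflow definition tells us, once, which processors are upstream of a given output, so the provenance log can then be filtered directly instead of navigated join by join. The graph, processor names and log format below are invented for illustration; this is not the paper's query construction.

```python
# Sketch of using the static workflow definition to target lineage queries.
# upstream() is computed once from the workflow graph; the (voluminous)
# provenance log is then scanned with a direct membership test instead of
# a chain of joins. All names are illustrative.

workflow = {            # processor -> processors it reads from
    "render":  ["analyse"],
    "analyse": ["fetch", "clean"],
    "clean":   ["fetch"],
    "fetch":   [],
}

def upstream(processor, workflow):
    seen = set()
    stack = list(workflow[processor])
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(workflow[p])
    return seen

# Provenance log entries: (processor, data item it produced).
log = [("fetch", "raw.csv"), ("clean", "tidy.csv"), ("archive", "old.zip")]
relevant = [entry for entry in log if entry[0] in upstream("render", workflow)]
```

The membership test prunes "archive" without ever joining log tables, which is the scaling argument in miniature.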


Journal of Web Semantics | 2015

Using a suite of ontologies for preserving workflow-centric research objects

Khalid Belhajjame; Jun Zhao; Daniel Garijo; Matthew Gamble; Kristina M. Hettne; Raúl Palma; Eleni Mina; Oscar Corcho; José Manuél Gómez-Pérez; Sean Bechhofer; Graham Klyne; Carole A. Goble

Scientific workflows are a popular mechanism for specifying and automating data-driven in silico experiments. A significant aspect of their value lies in their potential to be reused. Once shared, workflows become useful building blocks that can be combined or modified for developing new experiments. However, previous studies have shown that storing workflow specifications alone is not sufficient to ensure that they can be successfully reused, without being able to understand what the workflows aim to achieve or to re-enact them. To gain an understanding of the workflow, and how it may be used and repurposed for their needs, scientists require access to additional resources such as annotations describing the workflow, datasets used and produced by the workflow, and provenance traces recording workflow executions.

In this article, we present a novel approach to the preservation of scientific workflows through the application of research objects: aggregations of data and metadata that enrich the workflow specifications. Our approach is realised as a suite of ontologies that support the creation of workflow-centric research objects. Their design was guided by requirements elicited from previous empirical analyses of workflow decay and repair. The ontologies developed make use of and extend existing well-known ontologies, namely the Object Reuse and Exchange (ORE) vocabulary, the Annotation Ontology (AO) and the W3C PROV ontology (PROV-O). We illustrate the application of the ontologies for building Workflow Research Objects with a case study that investigates Huntington's disease, performed in collaboration with a team from the Leiden University Medical Centre (HG-LUMC). Finally, we present a number of tools developed for creating and managing workflow-centric research objects.
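The aggregation idea can be sketched as a plain manifest bundling the workflow with the auxiliary resources needed to understand and re-run it. The field names, file names and resource types below are invented stand-ins, not the actual ontology terms or serialization.

```python
# Sketch of a workflow-centric research object as a plain manifest: an
# aggregation (in the spirit of ORE) of the workflow plus the resources
# needed to understand and re-enact it. All names are illustrative.

research_object = {
    "aggregates": [
        {"uri": "workflow.t2flow",         "type": "Workflow"},
        {"uri": "inputs/genes.csv",        "type": "Dataset"},
        {"uri": "runs/run1-prov.ttl",      "type": "ProvenanceTrace"},
        {"uri": "annotations/readme.txt",  "type": "Annotation"},
    ]
}

def missing_types(ro, recommended_types):
    # Report which recommended resource types the bundle still lacks.
    present = {resource["type"] for resource in ro["aggregates"]}
    return recommended_types - present

missing = missing_types(
    research_object,
    {"Workflow", "Dataset", "ProvenanceTrace", "Annotation", "Hypothesis"},
)
```

A completeness report of this kind is what connects the ontology suite back to the decay study: the bundle makes explicit what must travel with the workflow for reuse to remain possible.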


Journal of Biomedical Semantics | 2013

PAV ontology: provenance, authoring and versioning

Paolo Ciccarese; Stian Soiland-Reyes; Khalid Belhajjame; Alasdair J. G. Gray; Carole A. Goble; Timothy W.I. Clark

Background: Provenance is a critical ingredient for establishing trust in published scientific content. This is true whether we are considering a data set, a computational workflow, a peer-reviewed publication or a simple scientific claim with supportive evidence. Existing vocabularies such as Dublin Core Terms (DC Terms) and the W3C Provenance Ontology (PROV-O) are domain-independent and general-purpose, and they allow and encourage extensions to cover more specific needs. In particular, to track authoring and versioning information of web resources, PROV-O provides a basic methodology but no specific classes or properties for identifying or distinguishing between the various roles assumed by agents manipulating digital artifacts, such as author, contributor and curator.

Results: We present the Provenance, Authoring and Versioning ontology (PAV, namespace http://purl.org/pav/): a lightweight ontology for capturing “just enough” descriptions essential for tracking the provenance, authoring and versioning of web resources. We argue that such descriptions are essential for digital scientific content. PAV distinguishes between contributors, authors and curators of content and creators of representations, in addition to the provenance of originating resources that have been accessed, transformed and consumed. We explore five projects (and communities) that have adopted PAV, illustrating their usage through concrete examples. Moreover, we present mappings that show how PAV extends the W3C PROV-O ontology to support broader interoperability.

Method: The initial design of the PAV ontology was driven by requirements from the AlzSWAN project, with further requirements incorporated later from other projects detailed in this paper. The authors strove to keep PAV lightweight and compact by including only those terms that have proved pragmatically useful in existing applications, and by recommending terms from existing ontologies where plausible.

Discussion: We analyze and compare PAV with related approaches, namely the Provenance Vocabulary (PRV), DC Terms and BIBFRAME. We identify similarities and analyze differences between those vocabularies and PAV, outlining strengths and weaknesses of our proposed model. We specify SKOS mappings that align PAV with DC Terms. We conclude the paper with general remarks on the applicability of PAV.
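The role distinctions PAV introduces can be illustrated with a small metadata record. The property names pav:authoredBy, pav:contributedBy, pav:curatedBy and pav:previousVersion are real PAV terms; the resource, the people and the dictionary representation are my own illustration rather than an RDF serialization.

```python
# Sketch of PAV's role distinctions on one versioned web resource.
# Property names are genuine PAV terms; the resource and agents
# (ex:alice, ex:bob, ex:carol) are invented examples.

metadata = {
    "resource":            "http://example.org/dataset/v2",
    "pav:authoredBy":      ["ex:alice"],
    "pav:contributedBy":   ["ex:alice", "ex:bob"],
    "pav:curatedBy":       ["ex:carol"],
    "pav:previousVersion": "http://example.org/dataset/v1",
}

def pure_contributors(meta):
    # Agents credited as contributors but not as authors -- exactly the
    # distinction that generic vocabularies tend to blur.
    return set(meta["pav:contributedBy"]) - set(meta["pav:authoredBy"])
```

Being able to ask such role-specific questions, rather than seeing one undifferentiated "creator", is the practical payoff of the finer-grained terms.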


International Conference on Extending Database Technology (EDBT) | 2010

Feedback-based annotation, selection and refinement of schema mappings for dataspaces

Khalid Belhajjame; Norman W. Paton; Suzanne M. Embury; Alvaro A. A. Fernandes; Cornelia Hedeler

The specification of schema mappings has proved to be time- and resource-consuming, and has been recognized as a critical bottleneck to the large-scale deployment of data integration systems. In an attempt to address this issue, dataspaces have been proposed as a data management abstraction that aims to reduce the up-front cost required to set up a data integration system by gradually specifying schema mappings through interaction with end users in a pay-as-you-go fashion. As a step in this direction, we explore an approach for incrementally annotating schema mappings using feedback obtained from end users. In doing so, we do not expect users to examine mapping specifications; rather, they comment on the results of queries evaluated using the mappings. Using annotations computed on the basis of user feedback, we present a method for selecting, from the set of candidate mappings, those to be used for query evaluation, considering user requirements in terms of precision and recall. In doing so, we cast mapping selection as an optimization problem. Mapping annotations may reveal that the quality of schema mappings is poor. We also show how feedback can be used to support the derivation of better-quality mappings from existing mappings through refinement. An evolutionary algorithm is used to efficiently and effectively explore the large space of mappings that can be obtained through refinement. The results of evaluation exercises show the effectiveness of our solution for annotating, selecting and refining schema mappings.
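The annotation and selection steps can be sketched numerically: user verdicts on query results yield true/false positive and false negative counts per mapping, from which precision and recall are estimated, and mappings are then kept only if they meet the user's thresholds. This toy threshold filter stands in for, and is much simpler than, the optimization formulation in the paper; the mapping names and counts are invented.

```python
# Sketch of feedback-based mapping annotation and selection. Users mark
# returned tuples as expected (tp) or unexpected (fp) and flag expected
# tuples that a mapping missed (fn). Mapping names and counts are invented.

def annotate(tp, fp, fn):
    # Estimate (precision, recall) for one mapping from feedback counts.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def select(mappings, min_precision, min_recall):
    # Keep the mappings whose estimates meet the user's requirements.
    return [name for name, (p, r) in mappings.items()
            if p >= min_precision and r >= min_recall]

mappings = {
    "m1": annotate(tp=8, fp=2, fn=2),   # precision 0.8, recall 0.8
    "m2": annotate(tp=3, fp=7, fn=1),   # precision 0.3, recall 0.75
}
chosen = select(mappings, min_precision=0.7, min_recall=0.7)
```

In the paper, selection trades precision against recall as an optimization problem rather than a hard cutoff, but the inputs are the same feedback-derived annotations.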


Information Systems | 2013

Incrementally improving dataspaces based on user feedback

Khalid Belhajjame; Norman W. Paton; Suzanne M. Embury; Alvaro A. A. Fernandes; Cornelia Hedeler

One aspect of the vision of dataspaces has been articulated as providing various benefits of classical data integration with reduced up-front costs. In this paper, we present techniques that aim to support schema mapping specification through interaction with end users in a pay-as-you-go fashion. In particular, we show how schema mappings obtained automatically using existing matching and mapping generation techniques can be annotated with metrics estimating their fitness to user requirements, using feedback on query results obtained from end users. Using the annotations computed on the basis of user feedback, and given user requirements in terms of precision and recall, we present a method for selecting the set of mappings that produce results meeting the stated requirements. In doing so, we cast mapping selection as an optimization problem. Feedback may reveal that the quality of schema mappings is poor. We show how mapping annotations can be used to support the derivation of better-quality mappings from existing mappings through refinement. An evolutionary algorithm is used to efficiently and effectively explore the large space of mappings that can be obtained through refinement. User feedback can also be used to annotate the results of the queries that the user poses against an integration schema. We show how estimates for precision and recall can be computed for such queries. We also investigate the problem of propagating feedback about the results of (integration) queries down to the mappings used to populate the base relations in the integration schema.
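The propagation step mentioned at the end can be sketched as bookkeeping: if each result tuple carries a record of which mapping produced it, a user's verdict on the tuple updates that mapping's counts. This is a deliberately simplified illustration (one mapping per tuple, invented identifiers), not the paper's propagation model.

```python
# Sketch of propagating feedback on integration-query results down to the
# mappings that produced them. feedback maps tuple id -> True (expected)
# or False (unexpected); provenance maps tuple id -> producing mapping.
# All identifiers are invented.

from collections import defaultdict

def propagate(feedback, provenance):
    counts = defaultdict(lambda: {"tp": 0, "fp": 0})
    for tuple_id, expected in feedback.items():
        mapping = provenance[tuple_id]
        counts[mapping]["tp" if expected else "fp"] += 1
    return dict(counts)

counts = propagate(
    {"t1": True, "t2": False, "t3": True},
    {"t1": "m1", "t2": "m1", "t3": "m2"},
)
```

Counts accumulated this way feed straight back into the per-mapping precision and recall estimates, closing the pay-as-you-go loop: every answered query refines the annotations on the mappings beneath it.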

Collaboration


Dive into Khalid Belhajjame's collaboration.

Top Co-Authors

Pinar Alper

University of Manchester


Daniel Garijo

Technical University of Madrid


Jun Zhao

University of Oxford
