Daniel Garijo
Technical University of Madrid
Publications
Featured research published by Daniel Garijo.
PLOS ONE | 2013
Daniel Garijo; Sarah L. Kinnings; Li Xie; Lei Xie; Yinliang Zhang; Philip E. Bourne; Yolanda Gil
How easy is it to reproduce the results found in a typical computational biology paper? Either through experience or intuition, the reader will already know that the answer is “with difficulty” or “not at all”. In this paper we attempt to quantify this difficulty by reproducing a previously published paper for different classes of users (ranging from users with little expertise to domain experts) and suggest ways in which the situation might be improved. Quantification is achieved by estimating the time required to reproduce each of the steps in the method described in the original paper and to make them part of an explicit workflow that reproduces the original results. Reproducing the method took several months of effort, and required using new versions and new software that posed challenges to reconstructing and validating the results. The quantification leads to “reproducibility maps” that reveal that novice researchers would only be able to reproduce a few of the steps in the method, and that only expert researchers with advanced knowledge of the domain would be able to reproduce the method in its entirety. The workflow itself is published as an online resource together with supporting software and data. The paper concludes with a brief discussion of the complexities of requiring reproducibility in terms of cost versus benefit, and desiderata with our observations and guidelines for improving reproducibility. This has implications not only for reproducing the work of others from published papers, but also for reproducing work from one’s own laboratory.
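For illustration, a reproducibility map of the kind described can be thought of as a table of per-step effort estimates by user class. The Python sketch below is a minimal rendering of that idea; the step names and hour figures are invented and are not the values reported in the paper.

```python
# Hypothetical "reproducibility map": estimated effort in hours to reproduce
# each method step, per user class. None = out of reach for that class.
# Step names and figures are invented, not the paper's measurements.
STEP_EFFORT = {
    "data retrieval":     {"novice": 2,    "intermediate": 1,    "expert": 0.5},
    "model construction": {"novice": None, "intermediate": 40,   "expert": 8},
    "analysis":           {"novice": None, "intermediate": None, "expert": 24},
    "result validation":  {"novice": None, "intermediate": 16,   "expert": 4},
}

def reproducibility_map(effort, user_class):
    """Steps a given user class could reproduce, with the total hours needed."""
    feasible = {step: hours[user_class]
                for step, hours in effort.items()
                if hours[user_class] is not None}
    return feasible, sum(feasible.values())

for cls in ("novice", "intermediate", "expert"):
    steps, total = reproducibility_map(STEP_EFFORT, cls)
    print(f"{cls}: can reproduce {len(steps)}/{len(STEP_EFFORT)} steps (~{total}h)")
```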
Journal of Web Semantics | 2015
Khalid Belhajjame; Jun Zhao; Daniel Garijo; Matthew Gamble; Kristina M. Hettne; Raúl Palma; Eleni Mina; Oscar Corcho; José Manuél Gómez-Pérez; Sean Bechhofer; Graham Klyne; Carole A. Goble
Scientific workflows are a popular mechanism for specifying and automating data-driven in silico experiments. A significant aspect of their value lies in their potential to be reused. Once shared, workflows become useful building blocks that can be combined or modified for developing new experiments. However, previous studies have shown that storing workflow specifications alone is not sufficient to ensure that they can be successfully reused, without being able to understand what the workflows aim to achieve or to re-enact them. To gain an understanding of the workflow, and how it may be used and repurposed for their needs, scientists require access to additional resources such as annotations describing the workflow, datasets used and produced by the workflow, and provenance traces recording workflow executions. In this article, we present a novel approach to the preservation of scientific workflows through the application of research objects: aggregations of data and metadata that enrich the workflow specifications. Our approach is realised as a suite of ontologies that support the creation of workflow-centric research objects. Their design was guided by requirements elicited from previous empirical analyses of workflow decay and repair. The ontologies developed make use of and extend existing well-known ontologies, namely the Object Reuse and Exchange (ORE) vocabulary, the Annotation Ontology (AO) and the W3C PROV ontology (PROV-O). We illustrate the application of the ontologies for building Workflow Research Objects with a case study that investigates Huntington's disease, performed in collaboration with a team from the Leiden University Medical Centre (HG-LUMC). Finally, we present a number of tools developed for creating and managing workflow-centric research objects.
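As a rough illustration of the suite's approach, the sketch below uses Python and rdflib to aggregate a workflow into a research object via the ORE vocabulary and the wf4ever RO ontology. The IRIs are hypothetical, and the actual suite includes further ontologies (e.g., for workflow descriptions and provenance traces) not shown here.

```python
from rdflib import Graph, Namespace, URIRef, RDF

ORE = Namespace("http://www.openarchives.org/ore/terms/")  # ORE vocabulary
RO = Namespace("http://purl.org/wf4ever/ro#")              # wf4ever RO ontology

g = Graph()
g.bind("ore", ORE)
g.bind("ro", RO)

# Hypothetical IRIs for a research object and the workflow it aggregates.
ro = URIRef("http://example.org/ro/hg-study/")
wf = URIRef("http://example.org/ro/hg-study/workflow.t2flow")

g.add((ro, RDF.type, RO.ResearchObject))
g.add((wf, RDF.type, RO.Resource))
g.add((ro, ORE.aggregates, wf))

print(g.serialize(format="turtle"))
```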
Semantic Web | 2013
Ghislain Auguste Atemezing; Oscar Corcho; Daniel Garijo; Jose Mora; María Poveda-Villalón; Pablo Rozas; Daniel Vila-Suero; Boris Villazón-Terrazas
We describe the AEMET meteorological dataset, which makes available some data sources from the Agencia Estatal de Meteorología (AEMET, the Spanish Meteorological Office) as Linked Data. The data selected for publication are generated every ten minutes by approximately 250 automatic weather stations deployed across Spain and made available as CSV files on the AEMET FTP server. These files are retrieved from the server, processed with Python scripts, transformed to RDF according to an ontology network that reuses the W3C SSN Ontology, published in a triple store and visualized using Map4RDF.
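A minimal sketch of the CSV-to-RDF step, in Python with rdflib. The column names, station data, and property choices are illustrative; the namespace shown is that of the original W3C SSN incubator ontology, and the real pipeline models observations with intermediate sensor-output nodes rather than direct literals.

```python
import csv
import io
from rdflib import Graph, Namespace, Literal, RDF
from rdflib.namespace import XSD

SSN = Namespace("http://purl.org/net/ssnx/ssn#")  # W3C SSN incubator ontology
EX = Namespace("http://example.org/aemet/")       # hypothetical dataset namespace

# One invented CSV row standing in for the ten-minute station feeds.
SAMPLE = "station_id,timestamp,air_temperature\n3195,2013-05-01T10:00:00,17.4\n"

g = Graph()
g.bind("ssn", SSN)
for row in csv.DictReader(io.StringIO(SAMPLE)):
    obs = EX[f"observation/{row['station_id']}/{row['timestamp']}"]
    g.add((obs, RDF.type, SSN.Observation))
    g.add((obs, SSN.observedBy, EX[f"station/{row['station_id']}"]))
    # Simplified: SSN actually interposes a SensorOutput node before the value.
    g.add((obs, SSN.observationResult,
           Literal(row["air_temperature"], datatype=XSD.float)))

print(g.serialize(format="turtle"))
```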
international conference on knowledge capture | 2013
Daniel Garijo; Oscar Corcho; Yolanda Gil
Provenance plays a major role in understanding and reusing the methods applied in a scientific experiment, as it provides a record of the inputs, the processes carried out, and the use and generation of intermediate and final results. In the specific case of in-silico scientific experiments, a large variety of scientific workflow systems (e.g., Wings, Taverna, Galaxy, VisTrails) have been created to support scientists. All of these systems produce some sort of provenance about the executions of the workflows that encode scientific experiments. However, provenance is normally recorded at a very low level of detail, which complicates understanding what happened during execution. In this paper we propose an approach to automatically obtain abstractions from low-level provenance data by finding common workflow fragments in workflow execution provenance and relating them to templates. We have tested our approach with a dataset of workflows published by the Wings workflow system. Our results show that by using these kinds of abstractions we can highlight the most common abstract methods used in the executions of a repository, relating different runs and workflow templates with each other.
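The sketch below conveys the flavor of fragment mining in a deliberately simplified form: each run is reduced to an ordered list of abstract step types, and contiguous subsequences that recur across runs are reported. The paper's actual method operates on provenance graphs, not linear sequences, and the run data here is invented.

```python
from collections import Counter

# Each run reduced to an ordered list of abstract step types (invented data).
runs = [
    ["FormatData", "Align", "BuildTree", "Visualize"],
    ["FormatData", "Align", "BuildTree", "Score"],
    ["Clean", "FormatData", "Align", "BuildTree"],
]

def frequent_fragments(runs, size=3, min_support=2):
    """Contiguous step sequences of `size` occurring in >= min_support runs.
    A linear stand-in for the graph-based fragment mining in the paper."""
    counts = Counter()
    for run in runs:
        fragments = {tuple(run[i:i + size]) for i in range(len(run) - size + 1)}
        counts.update(fragments)
    return {frag: n for frag, n in counts.items() if n >= min_support}

print(frequent_fragments(runs))
# {('FormatData', 'Align', 'BuildTree'): 3}
```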
international conference on knowledge capture | 2015
Yolanda Gil; Varun Ratnakar; Daniel Garijo
This paper presents OntoSoft, an ontology to describe metadata for scientific software. The ontology is designed considering how scientists would approach the reuse and sharing of software. This includes supporting a scientist to: 1) identify software, 2) understand and assess software, 3) execute software, 4) get support for the software, 5) do research with the software, and 6) update the software. The ontology is available in OWL and contains more than fifty terms. We are using OntoSoft to structure a software registry for geosciences, and to develop user interfaces to capture its metadata.
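A minimal sketch of how a registry entry might be expressed against the ontology using rdflib. The OntoSoft namespace is the published one to the best of our knowledge, but the software entry is hypothetical and the property names below are illustrative stand-ins for the six dimensions rather than verbatim OntoSoft terms.

```python
from rdflib import Graph, Namespace, Literal, RDF

ONTOSOFT = Namespace("http://ontosoft.org/software#")  # assumed namespace
EX = Namespace("http://example.org/registry/")

g = Graph()
g.bind("ontosoft", ONTOSOFT)

sw = EX["HydroModel-2.4"]  # hypothetical geosciences registry entry
g.add((sw, RDF.type, ONTOSOFT.Software))
# Illustrative properties spanning the six dimensions (identify, understand,
# execute, get support, do research, update); not verbatim OntoSoft terms.
g.add((sw, ONTOSOFT.hasName, Literal("HydroModel")))
g.add((sw, ONTOSOFT.hasShortDescription, Literal("Distributed hydrology model")))
g.add((sw, ONTOSOFT.hasSoftwareVersion, Literal("2.4")))

print(g.serialize(format="turtle"))
```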
edbt icdt workshops | 2013
Khalid Belhajjame; Jun Zhao; Daniel Garijo; Aleix Garrido; Stian Soiland-Reyes; Pinar Alper; Oscar Corcho
We describe a corpus of provenance traces that we have collected by executing 120 real-world scientific workflows. The workflows come from two different workflow systems, Taverna [5] and Wings [3], and 12 different application domains (see Figure 1). Table 1 provides a summary of this PROV corpus.
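A small sketch of how one might inspect a trace from such a corpus with rdflib, counting PROV activities and entities. The file name is hypothetical.

```python
from rdflib import Graph, Namespace, RDF

PROV = Namespace("http://www.w3.org/ns/prov#")

# Hypothetical file name for one trace in the corpus.
g = Graph().parse("taverna_run_001.ttl", format="turtle")

activities = set(g.subjects(RDF.type, PROV.Activity))
entities = set(g.subjects(RDF.type, PROV.Entity))
print(f"{len(activities)} activities, {len(entities)} entities")
```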
international conference on semantic systems | 2011
Daniel Garijo; Boris Villazón-Terrazas; Oscar Corcho
We present El Viajero, an application for exploiting, managing and organizing Linked Data in the domain of news and blogs about travelling. El Viajero makes use of several heterogeneous datasets to help users plan future trips, and relies on the Open Provenance Model to model the provenance information of the resources.
workflows in support of large scale science | 2014
Daniel Garijo; Yolanda Gil; Oscar Corcho
Workflows are increasingly used to manage and share scientific computations and methods. Workflow tools can be used to design, validate, execute and visualize scientific workflows and their execution results. Other tools manage workflow libraries or mine their contents. There has been a lot of recent work on workflow system integration as well as on common workflow interlinguas, but interoperability among workflow systems remains a challenge. Ideally, these tools would form a workflow ecosystem in which it is possible to create a workflow with one tool, execute it with another, visualize it with a third, and use yet another to mine a repository of such workflows or their executions. In this paper, we describe our approach to creating a workflow ecosystem through the use of standard models for provenance (OPM and W3C PROV) and extensions (P-PLAN and OPMW) to represent workflows. The ecosystem integrates different workflow tools with diverse functions (workflow generation, execution, browsing, mining, and visualization) created by a variety of research groups. This is, to our knowledge, the first time that such a variety of workflow systems and functions have been integrated.
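To give a concrete sense of the representations involved, the sketch below uses rdflib to describe a workflow template and one of its steps with OPMW and P-PLAN terms. The template and step IRIs are hypothetical; the class and property names follow the published vocabularies but should be checked against the ontologies themselves.

```python
from rdflib import Graph, Namespace, RDF

OPMW = Namespace("http://www.opmw.org/ontology/")  # OPMW vocabulary
PPLAN = Namespace("http://purl.org/net/p-plan#")   # P-PLAN vocabulary
EX = Namespace("http://example.org/workflows/")

g = Graph()
g.bind("opmw", OPMW)
g.bind("p-plan", PPLAN)

template = EX["SortAndPlot"]       # hypothetical workflow template
step = EX["SortAndPlot/sortStep"]  # hypothetical step of that template

g.add((template, RDF.type, OPMW.WorkflowTemplate))
g.add((step, RDF.type, OPMW.WorkflowTemplateProcess))
g.add((step, PPLAN.isStepOfPlan, template))

print(g.serialize(format="turtle"))
```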
intelligent user interfaces | 2017
Yolanda Gil; Daniel Garijo
We propose a new area of research on automating data narratives. Data narratives are containers of information about computationally generated research findings. They have three major components: 1) a record of events that describes a new result through a workflow and/or the provenance of all the computations executed; 2) persistent entries for the key entities involved: data, software versions, and workflows; 3) a set of narrative accounts, automatically generated human-consumable renderings of the record and entities, which can be included in a paper. Different narrative accounts can be used for different audiences, varying in content and detail based on the level of interest or expertise of the reader. Data narratives can make science more transparent and reproducible, because they ensure that the text description of the computational experiment reflects with high fidelity what was actually done. Data narratives can be incorporated in papers, either in the methods section or as supplementary materials. We introduce DANA, a prototype that illustrates how to generate data narratives automatically, and describe the information it uses from the computational records. We also present a formative evaluation of our approach and discuss potential uses of automated data narratives.
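A toy sketch of the narrative-generation idea: render a human-readable account from a structured record of events, varying detail by audience. The record format and wording are invented for illustration and do not reflect DANA's internals.

```python
# Invented record-of-events structure; not DANA's actual format.
record = [
    {"step": "normalize", "software": "scikit-learn 1.3", "inputs": ["raw.csv"]},
    {"step": "cluster", "software": "scikit-learn 1.3", "inputs": ["normalized.csv"]},
]

def narrative_account(record, audience="methods"):
    """Render one narrative account; a summary audience elides software details."""
    sentences = []
    for event in record:
        detail = f" using {event['software']}" if audience == "methods" else ""
        sentences.append(f"The inputs ({', '.join(event['inputs'])}) were "
                         f"processed by the '{event['step']}' step{detail}.")
    return " ".join(sentences)

print(narrative_account(record))                      # detailed, for methods text
print(narrative_account(record, audience="summary"))  # terser rendering
```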
workflows in support of large scale science | 2013
Sonja Holl; Daniel Garijo; Khalid Belhajjame; Olav Zimmermann; Renato De Giovanni; Matthias Obst; Carole A. Goble
Reusing and repurposing scientific workflows for novel scientific experiments is nowadays facilitated by workflow repositories. Such repositories allow scientists to find existing workflows and re-execute them. However, workflow input parameters often need to be adjusted to the research problem at hand. Adapting these parameters may become a daunting task due to the vast number of possible combinations of their values across a wide range of applications. Thus, a scientist may prefer to use an automated optimization mechanism to adjust the workflow set-up and improve the result. Currently, automated optimizations must be started from scratch because optimization metadata are not stored together with workflow provenance data. This important metadata is lost and can neither be reused nor assessed by other researchers. In this paper we present a novel approach to capture optimization metadata by extending the Research Object model and reusing W3C standards. We validate our proposal through a real-world use case taken from the biodiversity domain, and discuss the exploitation of our solution in the context of existing e-Science infrastructures.
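A minimal sketch of attaching optimization metadata to a research object, using rdflib. The optimization vocabulary namespace and terms here are hypothetical placeholders, not the extension proposed in the paper; only the ORE aggregation is a standard term.

```python
from rdflib import Graph, Namespace, Literal, RDF
from rdflib.namespace import XSD

ORE = Namespace("http://www.openarchives.org/ore/terms/")
OPT = Namespace("http://example.org/opt#")  # hypothetical optimization vocabulary
EX = Namespace("http://example.org/ro/")

g = Graph()
g.bind("ore", ORE)
g.bind("opt", OPT)

run = EX["optimization-run-7"]  # hypothetical optimization record
g.add((run, RDF.type, OPT.OptimizationRun))
g.add((run, OPT.optimizedParameter, Literal("alignment e-value threshold")))
g.add((run, OPT.achievedFitness, Literal("0.87", datatype=XSD.double)))
# Aggregate the optimization record into the research object alongside the
# workflow and its provenance, so others can reuse or assess it.
g.add((EX["biodiversity-ro"], ORE.aggregates, run))

print(g.serialize(format="turtle"))
```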