Publication


Featured research published by Saumen C. Dey.


Workflows in Support of Large-Scale Science | 2010

Linking multiple workflow provenance traces for interoperable collaborative science

Paolo Missier; Bertram Ludäscher; Shawn Bowers; Saumen C. Dey; Anandarup Sarkar; Biva Shrestha; Ilkay Altintas; Manish Kumar Anand; Carole A. Goble

Scientific collaboration increasingly involves data sharing between separate groups. We consider a scenario where data products of scientific workflows are published and then used by other researchers as inputs to their workflows. For proper interpretation, shared data must be complemented by descriptive metadata. We focus on provenance traces, a prime example of such metadata which describes the genesis and processing history of data products in terms of the computational workflow steps. Through the reuse of published data, virtual, implicitly collaborative experiments emerge, making it desirable to compose the independently generated traces into global ones that describe the combined executions as single, seamless experiments. We present a model for provenance sharing that realizes this holistic view by overcoming the various interoperability problems that emerge from the heterogeneity of workflow systems, data formats, and provenance models. At the heart lie (i) an abstract workflow and provenance model in which (ii) data sharing becomes itself part of the combined workflow. We then describe an implementation of our model that we developed in the context of the Data Observation Network for Earth (DataONE) project and that can “stitch together” traces from different Kepler and Taverna workflow runs. It provides a prototypical framework for seamless cross-system, collaborative provenance management and can be easily extended to include other systems. Our approach also opens the door to new ways of workflow interoperability not only through often elusive workflow standards but through shared provenance information from public repositories.
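The idea of composing independently generated traces can be illustrated with a minimal sketch. It assumes each trace is a list of (derived, source) dependency edges and that a data-sharing event is recorded as a mapping from the copy used in the second run to the item published by the first run. The function names `stitch` and `lineage` and the identifiers in the usage note are hypothetical and are not taken from the DataONE implementation.

```python
from collections import defaultdict


def stitch(trace_a, trace_b, shared):
    """Combine two traces into one dependency graph.

    trace_a, trace_b: lists of (derived, source) edges.
    shared: maps a data id used in trace_b to the published id in trace_a
            that it is a copy of (the data-sharing step, assumed here).
    """
    graph = defaultdict(set)
    for derived, source in list(trace_a) + list(trace_b):
        graph[derived].add(source)
    # Data sharing becomes an ordinary dependency edge in the combined graph.
    for copy_id, published_id in shared.items():
        graph[copy_id].add(published_id)
    return graph


def lineage(graph, item):
    """Return every item that `item` transitively depends on."""
    seen = set()
    stack = [item]
    while stack:
        node = stack.pop()
        for parent in graph.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen
```

For example, with a first trace `[("k:out", "k:raw")]`, a second trace `[("t:result", "t:in")]`, and the sharing map `{"t:in": "k:out"}`, the lineage of `t:result` in the stitched graph reaches back to `k:raw` in the first run.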


IEEE International Conference on High Performance Computing Data and Analytics | 2012

Modeling and Querying Scientific Workflow Provenance in the D-OPM

Víctor Cuevas-Vicenttín; Saumen C. Dey; Michael Li Yuan Wang; Tianhong Song; Bertram Ludäscher

We present the D-OPM, a model that extends the Open Provenance Model (OPM) with workflow-specific aspects. In particular, our model captures aspects such as the workflow structure, traces, data structure, and workflow evolution. Thus, it enables scientists to obtain detailed information about the origin of data resulting from past experiments, as well as about the process itself and its possible future executions. A reference implementation of the D-OPM validates our model and opens the opportunity for interoperation with multiple workflow systems. Furthermore, to facilitate querying D-OPM data, we introduce a querying mechanism based on regular path queries (RPQs) on provenance graphs. Our RPQ evaluator is built on a relational DBMS, which makes it robust and extensible.
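A regular path query over a provenance graph can be sketched as a search over (node, automaton state) pairs. The version below is an in-memory illustration only; it is not the relational evaluator described in the abstract. The function name `rpq` and the hand-written automaton are assumptions, while the edge labels `used` and `wasGeneratedBy` are the standard OPM dependency names.

```python
from collections import deque


def rpq(edges, dfa, start_state, accepting, source):
    """Return nodes reachable from `source` along a path whose label
    sequence is accepted by the given deterministic automaton.

    edges: list of (src, label, dst) triples.
    dfa:   dict mapping (state, label) -> next state.
    """
    adj = {}
    for src, label, dst in edges:
        adj.setdefault(src, []).append((label, dst))

    seen = {(source, start_state)}
    queue = deque(seen)
    answers = set()
    while queue:
        node, state = queue.popleft()
        if state in accepting:
            answers.add(node)
        for label, nxt in adj.get(node, []):
            nstate = dfa.get((state, label))
            if nstate is not None and (nxt, nstate) not in seen:
                seen.add((nxt, nstate))
                queue.append((nxt, nstate))
    return answers


# Automaton for the expression (wasGeneratedBy . used)+,
# i.e. "all data items this item was derived from".
DERIVED_FROM = {
    (0, "wasGeneratedBy"): 1,
    (1, "used"): 2,
    (2, "wasGeneratedBy"): 1,
}
```

Calling `rpq(edges, DERIVED_FROM, 0, {2}, "d3")` on a chain `d3 -wasGeneratedBy-> p2 -used-> d2 -wasGeneratedBy-> p1 -used-> d1` yields the upstream data items `d2` and `d1` but not the intermediate processes.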


Datenbank-Spektrum | 2012

Scientific Workflows and Provenance: Introduction and Research Opportunities

Víctor Cuevas-Vicenttín; Saumen C. Dey; Sven Köhler; Sean Riddle; Bertram Ludäscher

Scientific workflows are becoming increasingly popular for compute-intensive and data-intensive scientific applications. The vision and promise of scientific workflows includes rapid, easy workflow design, reuse, scalable execution, and other advantages, e.g., to facilitate “reproducible science” through provenance (e.g., data lineage) support. However, as described in the paper, important research challenges remain. While the database community has studied (business) workflow technologies extensively in the past, most current work in scientific workflows seems to be done outside of the database community, e.g., by practitioners and researchers in the computational sciences and eScience. We provide a brief introduction to scientific workflows and provenance, and identify areas and problems that suggest new opportunities for database research.


International Provenance and Annotation Workshop | 2016

Yin & Yang: Demonstrating Complementary Provenance from noWorkflow & YesWorkflow

João Felipe Pimentel; Saumen C. Dey; Timothy M. McPhillips; Khalid Belhajjame; David Koop; Leonardo Murta; Vanessa Braganholo; Bertram Ludäscher

The noWorkflow and YesWorkflow toolkits both enable researchers to capture, store, query, and visualize the provenance of results produced by scripts that process scientific data. noWorkflow captures prospective provenance representing the program structure of Python scripts, and retrospective provenance representing key events observed during script execution. YesWorkflow captures prospective provenance declared through annotations in the comments of scripts, and supports key retrospective provenance queries by observing what files were used or produced by the script. We demonstrate how combining complementary information gathered by noWorkflow and YesWorkflow enables provenance queries and data lineage visualizations neither tool can provide on its own.
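To make the notion of prospective provenance declared in comments concrete, here is a small sketch that pulls `@begin`, `@in`, `@out`, and `@end` annotations out of a script's comments. It is not the YesWorkflow parser: the function name `extract_blocks` is hypothetical, and the sketch assumes one annotation per `#` comment line and properly nested blocks.

```python
import re

# Matches a single annotation inside a '#' comment (assumed comment style).
ANNOTATION = re.compile(r"#\s*@(begin|end|in|out)\s+(\S+)")


def extract_blocks(script):
    """Collect annotated blocks with their declared inputs and outputs."""
    blocks, stack = [], []
    for line in script.splitlines():
        match = ANNOTATION.search(line)
        if not match:
            continue
        tag, name = match.groups()
        if tag == "begin":
            stack.append({"name": name, "in": [], "out": []})
        elif tag == "end":
            if stack:
                blocks.append(stack.pop())
        elif stack:
            # '@in' / '@out' attach to the innermost open block.
            stack[-1][tag].append(name)
    return blocks
```

Because the annotations live in comments, the script itself runs unchanged; the extracted blocks only describe what the author declared, not what actually happened at run time.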


EDBT/ICDT Workshops | 2013

A declarative approach to customize workflow provenance

Saumen C. Dey; Bertram Ludäscher

Provenance describes the origin, context, derivation, and ownership of data products and is becoming increasingly important in scientific applications. This information can be used, e.g., to explain, debug, and reproduce the results of computational experiments, or to determine the validity and quality of data products. At the same time, it may be infeasible or undesirable to share the complete provenance of a scientific experiment. Towards finding a balance between these requirements, we develop a framework and a system that allow scientists to declaratively specify their provenance data publication and customization requirements. Using this system, scientists can specify which parts of the provenance data are to be included in the result and which parts should be hidden or anonymized. However, arbitrary application of these specifications may not maintain provenance data integrity. Thus, we allow scientists to specify provenance data integrity requirements, in the form of provenance policies, along with their provenance data publication and customization requirements. Our system then systematically applies all the publication and customization requirements to the provenance data and ensures that all provenance policies specified by the scientist are satisfied.


International Semantic Web Conference | 2011

Reconciling provenance policy conflicts by inventing anonymous nodes

Saumen C. Dey; Daniel Zinn; Bertram Ludäscher

In scientific collaborations, provenance is increasingly used to understand, debug, and explain the processing history of data, and to determine the validity and quality of data products. While provenance is easily recorded by scientific workflow systems, it can be infeasible or undesirable to publish provenance details for all data products of a workflow run. We have developed ProPub, a system that allows users to publish a customized version of their data provenance, based on a set of publication and customization requests, while observing certain provenance publication policies, expressed as logic integrity constraints. When user requests conflict with provenance policies, repair actions become necessary. In prior work, we removed additional parts of the provenance graph (i.e., not directly requested by the user) to repair constraint violations. In this paper, we present an alternative approach, which ensures that all relevant nodes are retained in the provenance graph. The key idea is to introduce new anonymous nodes to represent lineage dependencies, without revealing information that the user wants to protect. With this new approach, a user may now explore different provenance publication strategies, and choose the most appropriate one before publishing sensitive provenance data.
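The key idea of keeping lineage intact while hiding sensitive nodes can be sketched very simply. The version below is a strong simplification and not ProPub's actual repair procedure: it replaces each node the user wants to protect with a fresh placeholder, so every dependency edge survives. The function name `anonymize` and the `anonN` naming scheme are assumptions, and the sketch assumes no real node is already named `anonN`.

```python
def anonymize(edges, hidden):
    """Replace each hidden node with an anonymous placeholder.

    edges:  list of (derived, source) dependency edges.
    hidden: set of node ids that must not be revealed.
    """
    # Deterministic placeholder names: anon1, anon2, ... (assumed scheme).
    alias = {node: f"anon{i}" for i, node in enumerate(sorted(hidden), start=1)}
    return [(alias.get(a, a), alias.get(b, b)) for a, b in edges]
```

With edges `[("d3", "p2"), ("p2", "d2")]` and `hidden = {"p2"}`, the published graph still shows that `d3` depends on `d2` through some step, without naming that step.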


International Provenance and Annotation Workshop | 2014

Computing Location-Based Lineage from Workflow Specifications to Optimize Provenance Queries

Saumen C. Dey; Sven Köhler; Shawn Bowers; Bertram Ludäscher

We present a location-based approach for executing provenance lineage queries that significantly reduces query execution cost without incurring additional storage costs. The key idea of our approach is to exploit the fact that provenance graphs resemble the workflow graphs that generated them and that many workflow computation models assume workflow steps have statically defined consumption-production rates, i.e., data input-output rates. We describe a new lineage computation technique that uses the structure of workflow specifications together with consumption-production rates to pre-compute, i.e., to forecast, the access paths of all dependent data items prior to workflow execution. We also present experimental results showing that our approach can significantly outperform traditional data lineage query techniques.
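When every step has fixed consumption-production rates, the positions of the inputs that an output depends on follow from arithmetic alone. The sketch below assumes a linear pipeline in which each step consumes `consume` items and emits `produce` items per firing, with 0-based positions on each channel. The function names and this pipeline model are illustrative assumptions, not the paper's algorithm.

```python
def input_positions(out_pos, consume, produce):
    """Positions on the input channel that the given output position depends on."""
    firing = out_pos // produce  # which firing emitted this output
    return range(firing * consume, (firing + 1) * consume)


def lineage_positions(out_pos, pipeline):
    """Trace an output position back to positions on the first input channel.

    pipeline: list of (consume, produce) rates, first step to last.
    """
    positions = {out_pos}
    for consume, produce in reversed(pipeline):
        positions = {
            i
            for pos in positions
            for i in input_positions(pos, consume, produce)
        }
    return sorted(positions)
```

No trace needs to be consulted here: the answer is computed from the specification, which is the source of the claimed savings in query cost.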


International Provenance and Annotation Workshop | 2014

Provenance Storage, Querying, and Visualization in PBase

Víctor Cuevas-Vicenttín; Parisa Kianmajd; Bertram Ludäscher; Paolo Missier; Fernando Chirigati; Yaxing Wei; David Koop; Saumen C. Dey

We present PBase, a repository for scientific workflows and their corresponding provenance information that facilitates the sharing of experiments among the scientific community. PBase is interoperable since it uses ProvONE, a standard provenance model for scientific workflows. Workflows and traces are stored in RDF, and with the support of SPARQL and the tree cover encoding, the repository provides a scalable infrastructure for querying the provenance data. Furthermore, through its user interface, it is possible to: visualize workflows and execution traces; visualize reachability relations within these traces; issue SPARQL queries; and visualize query results.
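Reachability queries of the kind mentioned above can be answered from interval labels rather than graph traversal. The following is a simplified in-memory sketch of an interval-based tree cover labeling for an acyclic graph; how PBase stores and queries its encoding is not described here, and the names `build_tree_cover` and `reaches` are hypothetical. The sketch does not merge or prune redundant intervals.

```python
def build_tree_cover(succ):
    """Label each node of a DAG with postorder intervals.

    succ: dict mapping node -> list of successor nodes (must be acyclic).
    Returns (post, intervals): post[v] is v's postorder number, and
    intervals[u] covers post[v] for exactly the nodes v reachable from u.
    """
    nodes = set(succ) | {v for targets in succ.values() for v in targets}
    post, low, intervals = {}, {}, {}
    visited = set()
    counter = 0

    def dfs(u):
        nonlocal counter
        visited.add(u)
        lowest = None
        labels = set()
        for v in succ.get(u, ()):
            if v not in visited:  # tree edge: v joins u's subtree
                dfs(v)
                lowest = low[v] if lowest is None else min(lowest, low[v])
            labels |= intervals[v]  # inherit labels over every edge
        counter += 1
        post[u] = counter
        low[u] = counter if lowest is None else lowest
        intervals[u] = labels | {(low[u], post[u])}

    for node in sorted(nodes):
        if node not in visited:
            dfs(node)
    return post, intervals


def reaches(post, intervals, u, v):
    """True if v is reachable from u (every node reaches itself)."""
    return any(lo <= post[v] <= hi for lo, hi in intervals[u])
```

Once the labels are built, each reachability test is a containment check over a node's intervals, which is what makes this family of encodings attractive for lineage queries over large traces.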


International Journal of Digital Curation | 2015

YesWorkflow: A User-Oriented, Language-Independent Tool for Recovering Workflow Information from Scripts

Timothy M. McPhillips; Tianhong Song; Tyler Kolisnik; Steve Aulenbach; Khalid Belhajjame; R. Kyle Bocinsky; Yang Cao; James Cheney; Fernando Chirigati; Saumen C. Dey; Juliana Freire; Christopher Jones; James Hanken; Keith W. Kintigh; Timothy A. Kohler; David Koop; James A. Macklin; Paolo Missier; Mark Schildhauer; Christopher R. Schwalm; Yaxing Wei; Mark Bieda; Bertram Ludäscher


Statistical and Scientific Database Management | 2011

PROPUB: towards a declarative approach for publishing customized, policy-aware provenance

Saumen C. Dey; Daniel Zinn; Bertram Ludäscher

Collaboration


Dive into Saumen C. Dey's collaborations.

Top Co-Authors

Sven Köhler
University of California

Tianhong Song
University of California

Daniel Zinn
University of California

Yaxing Wei
Oak Ridge National Laboratory