Manish Kumar Anand | Researchain

Publication

Featured research published by Manish Kumar Anand.

extending database technology | 2010

Techniques for efficiently querying scientific workflow provenance graphs

Manish Kumar Anand; Shawn Bowers; Bertram Ludäscher

A key advantage of scientific workflow systems over traditional scripting approaches is their ability to automatically record data and process dependencies introduced during workflow runs. This information is often represented through provenance graphs, which can be used by scientists to better understand, reproduce, and verify scientific results. However, while most systems record and store data and process dependencies, few provide easy-to-use and efficient approaches for accessing and querying provenance information. Instead, users formulate provenance graph queries directly against physical data representations (e.g., relational, XML, or RDF), leading to queries that are difficult to express and expensive to evaluate. We address these problems through a high-level query language tailored for expressing provenance graph queries. The language is based on a general model of provenance supporting scientific workflows that process XML data and employ update semantics. Query constructs are provided for querying both structure and lineage information. Unlike other languages that return sets of nodes as answers, our query language is closed, i.e., answers to lineage queries are sets of lineage dependencies (edges), allowing answers to be further queried. We provide a formal semantics for the language and present novel techniques for efficiently evaluating lineage queries. Experimental results on real and synthetic provenance traces demonstrate that our lineage-based optimizations outperform in-memory and standard database implementations by orders of magnitude. We also show that our strategies are feasible and can significantly reduce both provenance storage size and query execution time when compared with standard approaches.
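The closure property described in this abstract (lineage answers are sets of edges, so an answer can itself be queried) can be illustrated with a minimal Python sketch. This is not the paper's query language or its optimized evaluation strategy: the function name `ancestor_edges`, the `(src, dst)` edge encoding (meaning `dst` was derived from `src`), and the sample trace are all assumptions made for illustration.

```python
from collections import defaultdict

def ancestor_edges(edges, target):
    """Return every dependency edge (src, dst) that lies on a path into `target`.

    `edges` is a set of (src, dst) pairs, read as "dst was derived from src".
    The answer is again a set of edges, so it can be passed back in as input.
    """
    incoming = defaultdict(set)
    for src, dst in edges:
        incoming[dst].add(src)

    result, stack, seen = set(), [target], {target}
    while stack:
        node = stack.pop()
        for src in incoming[node]:
            result.add((src, node))
            if src not in seen:
                seen.add(src)
                stack.append(src)
    return result

# Hypothetical trace: d1 -> d2 -> d4, d3 -> d4, and an unrelated d5 -> d6.
trace = {("d1", "d2"), ("d2", "d4"), ("d3", "d4"), ("d5", "d6")}

# First query: the lineage of d4, returned as edges rather than nodes.
lineage_d4 = ancestor_edges(trace, "d4")

# Second query runs directly on the first answer, which a node-set answer
# would not allow without going back to the full trace.
lineage_d2 = ancestor_edges(lineage_d4, "d2")
```

Because the result keeps the edges, the second call needs no access to the original trace, which is the practical benefit of a closed language in this sketch.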

international provenance and annotation workshop | 2008

Kepler/pPOD: Scientific Workflow and Provenance Support for Assembling the Tree of Life

Shawn Bowers; Timothy M. McPhillips; Sean Riddle; Manish Kumar Anand; Bertram Ludäscher

The complexity of scientific workflows for analyzing biological data creates a number of challenges for current workflow and provenance systems. This complexity is due in part to the nature of scientific data (e.g., heterogeneous, nested data collections) and the programming constructs required for automation (e.g., nested workflows, looping, pipeline parallelism). We present an extended version of the Kepler scientific workflow system to address these challenges, tailored for the systematics community. Our system combines novel approaches for representing scientific data, modeling and automating complex analyses, and recording and browsing associated provenance information.

statistical and scientific database management | 2009

Exploring Scientific Workflow Provenance Using Hybrid Queries over Nested Data and Lineage Graphs

Manish Kumar Anand; Shawn Bowers; Timothy M. McPhillips; Bertram Ludäscher

Existing approaches for representing the provenance of scientific workflow runs largely ignore computation models that work over structured data, including XML. Unlike models based on transformation semantics, these computation models often employ update semantics, in which only a portion of an incoming XML stream is modified by each workflow step. Applying conventional provenance approaches to such models results in provenance information that is either too coarse (e.g., stating that one version of an XML document depends entirely on a prior version) or potentially incorrect (e.g., stating that each element of an XML document depends on every element in a prior version). We describe a generic provenance model that naturally represents workflow runs involving processes that work over nested data collections and that employ update semantics. Moreover, we extend current query approaches to support our model, enabling queries to be posed not only over data lineage relationships, but also over versions of nested data structures produced during a workflow run. We show how hybrid queries can be expressed against our model using high-level query constructs and implemented efficiently over relational provenance storage schemes.

BMC Genomics | 2006

Cis-regulatory variations: A study of SNPs around genes showing cis-linkage in segregating mouse populations

Debraj GuhaThakurta; Tao Xie; Manish Kumar Anand; Stephen Edwards; Guoya Li; Susanna S. Wang; Eric E. Schadt

Background: Changes in gene expression are known to be responsible for phenotypic variation and susceptibility to diseases. Identification and annotation of the genomic sequence variants that cause gene expression changes is therefore likely to lead to a better understanding of the cause of disease at the molecular level. In this study we investigate the pattern of single nucleotide polymorphisms (SNPs) in genes for which the mRNA levels show cis-genetic linkage (gene expression quantitative trait loci mapping in cis, or cis-eQTLs) in segregating mouse populations. Such genes are expected to have polymorphisms near their physical location (cis-variations) that affect their mRNA levels by altering one or more of the cis-regulatory elements. This led us to characterize the SNPs in promoter (5 Kb upstream) and non-coding gene regions (introns and 5 Kb downstream) (cis-SNPs) and the effects they may have on putative transcription factor binding sites. Results: We demonstrate that the cis-eQTL genes (CEGs) have a significantly higher frequency of cis-SNPs compared to non-CEGs (when both sets are taken from the non-IBD regions, i.e., regions not identical by descent). Most CEGs having cis-SNPs do not contain these SNPs in the phylogenetically conserved regions. In those CEGs that contain cis-SNPs in the phylogenetically conserved regions, enrichment of cis-SNPs occurs both within and outside of the conserved sequences. A higher fraction of CEGs are also seen to harbor cis-SNPs that affect predicted transcription factor binding sites, a likely consequence of the higher cis-SNP density in these genes. Conclusion: This study provides the first genome-wide investigation of the putative cis-regulatory variations in a large set of genes whose levels of expression give rise to cis-linkage in segregating mammalian populations. Our results provide insights into the challenges that exist in identifying polymorphisms regulating gene expression using bioinformatic sequence analysis approaches. The data provided herein should benefit future investigations in this area.

international conference on data engineering | 2010

Provenance browser: Displaying and querying scientific workflow provenance graphs

Manish Kumar Anand; Shawn Bowers; Bertram Ludäscher

This demonstration presents an interactive provenance browser for visualizing and querying data dependency (lineage) graphs produced by scientific workflow runs. The browser allows users to explore different views of provenance as well as to express complex and recursive graph queries through a high-level query language (QLP). Answers to QLP queries are lineage preserving in that queries return sets of lineage dependencies (denoting provenance graphs), which can be further queried and visually displayed (as graphs) in the browser. By combining provenance visualization, navigation, and query, the provenance browser can enable scientists to more easily access and explore scientific workflow provenance information.

workflows in support of large-scale science | 2010

Linking multiple workflow provenance traces for interoperable collaborative science

Paolo Missier; Bertram Ludäscher; Shawn Bowers; Saumen C. Dey; Anandarup Sarkar; Biva Shrestha; Ilkay Altintas; Manish Kumar Anand; Carole A. Goble

Scientific collaboration increasingly involves data sharing between separate groups. We consider a scenario where data products of scientific workflows are published and then used by other researchers as inputs to their workflows. For proper interpretation, shared data must be complemented by descriptive metadata. We focus on provenance traces, a prime example of such metadata which describes the genesis and processing history of data products in terms of the computational workflow steps. Through the reuse of published data, virtual, implicitly collaborative experiments emerge, making it desirable to compose the independently generated traces into global ones that describe the combined executions as single, seamless experiments. We present a model for provenance sharing that realizes this holistic view by overcoming the various interoperability problems that emerge from the heterogeneity of workflow systems, data formats, and provenance models. At the heart lie (i) an abstract workflow and provenance model in which (ii) data sharing becomes itself part of the combined workflow. We then describe an implementation of our model that we developed in the context of the Data Observation Network for Earth (DataONE) project and that can “stitch together” traces from different Kepler and Taverna workflow runs. It provides a prototypical framework for seamless cross-system, collaborative provenance management and can be easily extended to include other systems. Our approach also opens the door to new ways of workflow interoperability not only through often elusive workflow standards but through shared provenance information from public repositories.

international provenance and annotation workshop | 2010

Understanding Collaborative Studies through Interoperable Workflow Provenance

Ilkay Altintas; Manish Kumar Anand; Daniel Crawl; Shawn Bowers; Adam Belloum; Paolo Missier; Bertram Ludäscher; Carole A. Goble; Peter M. A. Sloot

The provenance of a data product contains information about how the product was derived, and is crucial for enabling scientists to easily understand, reproduce, and verify scientific results. Currently, most provenance models are designed to capture the provenance of a single run, typically executed by a single user. However, a scientific discovery is often the result of methodical execution of many scientific workflows with many datasets produced at different times by one or more users. Further, to promote and facilitate exchange of information between multiple workflow systems supporting provenance, the Open Provenance Model (OPM) has been proposed by the scientific workflow community. In this paper, we describe a new query model that captures implicit user collaborations. We show how this model maps to OPM and helps to answer collaborative queries, e.g., identifying combined workflows and contributions of users collaborating on a project based on the records of previous workflow executions. We also adopt and extend the high-level Query Language for Provenance (QLP) with additional constructs, and show how these extensions allow non-expert users to express collaborative provenance queries against this model easily and concisely. Furthermore, we adopt the Provenance Challenge 3 (PC3) workflows as a collaborative and interoperable use case scenario, where different stages of the workflow are executed in three different workflow environments: Kepler, Taverna, and WSVLAM. Through this use case, we demonstrate how we can establish and understand collaborative studies through interoperable workflow provenance.

international provenance and annotation workshop | 2010

Approaches for Exploring and Querying Scientific Workflow Provenance Graphs

Manish Kumar Anand; Shawn Bowers; Ilkay Altintas; Bertram Ludäscher

While many scientific workflow systems track and record data provenance, few tools have been developed that provide convenient and effective ways to access and explore this information. Two important ways for provenance information to be accessed and explored are browsing (i.e., visualizing and navigating data and process dependencies) and querying (e.g., to select certain portions of provenance graphs or to determine if certain paths exist between items within a graph). We extend our prior work on representing and querying data provenance by showing how these can be effectively and efficiently combined into an interactive provenance browser. The browser allows different views of provenance to be explored and queried, where queries are expressed in a declarative graph-based provenance query language. Query results are expressed as provenance subgraphs, which can be further visualized and navigated through the browser. The browser supports a generic model of provenance that can be used with various workflow computation models, and has a direct translation to the Open Provenance Model. We present the provenance model, the query language, and describe the overall browser architecture and implementation.

statistical and scientific database management | 2012

Database support for exploring scientific workflow provenance graphs

Manish Kumar Anand; Shawn Bowers; Bertram Ludäscher

Provenance graphs generated from real-world scientific workflows often contain large numbers of nodes and edges denoting various types of provenance information. A standard approach used by workflow systems is to visually present provenance information by displaying an entire (static) provenance graph. This approach makes it difficult for users to find relevant information and to explore and analyze data and process dependencies. We address these issues through a set of abstractions that allow users to construct specialized views of provenance graphs. Our model provides operations that allow users to expand, collapse, filter, group, and summarize all or portions of provenance graphs to construct tailored provenance views. A unique feature of the model is that it can be implemented using standard relational database technology, which has a number of advantages in terms of supporting existing provenance frameworks and efficiency and scalability of the model. We present and formalize the operations within the model as a set of relational queries expressed against an underlying provenance schema. We also present a detailed experimental evaluation that demonstrates the feasibility and efficiency of our approach against provenance graphs generated from a number of scientific workflows.
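As a rough illustration of how a view operation such as collapse might be phrased as a relational query, the sketch below uses Python's built-in `sqlite3` module. The schema (`dep`, `member`), the group name `G`, and the query itself are assumptions for illustration and are not taken from the paper's provenance schema or its formal definitions.

```python
import sqlite3

# In-memory database holding a tiny, hypothetical provenance graph.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dep (src TEXT, dst TEXT);                  -- dst depends on src
    CREATE TABLE member (node TEXT PRIMARY KEY, grp TEXT);  -- nodes to collapse
    INSERT INTO dep VALUES ('a','b'), ('b','c'), ('c','d'), ('a','c');
    INSERT INTO member VALUES ('b','G'), ('c','G');         -- collapse b and c into G
""")

# Collapse: replace each grouped node by its group, drop edges that become
# internal to a group, and remove duplicate edges created by the merge.
COLLAPSE = """
    SELECT DISTINCT COALESCE(ms.grp, d.src) AS src,
                    COALESCE(md.grp, d.dst) AS dst
    FROM dep d
    LEFT JOIN member ms ON ms.node = d.src
    LEFT JOIN member md ON md.node = d.dst
    WHERE COALESCE(ms.grp, d.src) <> COALESCE(md.grp, d.dst)
"""
collapsed_view = conn.execute(COLLAPSE).fetchall()
```

In this sketch the four original edges reduce to two edges in the view, one into the group and one out of it, and the underlying `dep` table is left unchanged so that the group could later be expanded again.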

IEEE Data(base) Engineering Bulletin | 2007