Shawn Bowers | Researchain

Publication

Featured research published by Shawn Bowers.

Ecological Informatics | 2007

An ontology for describing and synthesizing ecological observation data

Joshua S. Madin; Shawn Bowers; Mark Schildhauer; Sergey Krivov; Deana D. Pennington; Ferdinando Villa

Research in ecology increasingly relies on the integration of small, focused studies, to produce larger datasets that allow for more powerful, synthetic analyses. The results of these synthetic analyses are critical in guiding decisions about how to sustainably manage our natural environment, so it is important for researchers to effectively discover relevant data, and appropriately integrate these within their analyses. However, ecological data encompasses an extremely broad range of data types, structures, and semantic concepts. Moreover, ecological data is widely distributed, with few well-established repositories or standard protocols for their archiving and retrieval. These factors make the discovery and integration of ecological data sets a highly labor-intensive task. Metadata standards such as the Ecological Metadata Language and Darwin Core are important steps for improving our ability to discover and access ecological data, but are limited to describing only a few, relatively specific aspects of data content (e.g., data owner and contact information, variable “names”, keyword descriptions, etc.). A more flexible and powerful way to capture the semantic subtleties of complex ecological data, its structure and contents, and the inter-relationships among data variables is needed. We present a formal ontology for capturing the semantics of generic scientific observation and measurement. The ontology provides a convenient basis for adding detailed semantic annotations to scientific data, which crystallize the inherent “meaning” of observational data. The ontology can be used to characterize the context of an observation (e.g., space and time), and clarify inter-observational relationships such as dependency hierarchies (e.g., nested experimental observations) and meaningful dimensions within the data (e.g., axes for cross-classified categorical summarization). It also enables the robust description of measurement units (e.g., grams of carbon per liter of seawater), and can facilitate automatic unit conversions (e.g., pounds to kilograms). The ontology can be easily extended with specialized domain vocabularies, making it both broadly applicable and highly customizable. Finally, we describe the utility of the ontology for enriching the capabilities of data discovery and integration processes.
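As a rough illustration of how unit annotations can support automatic conversion, the sketch below tags each measurement with an entity, a characteristic, and a unit, and converts between units of the same dimension. The `UNITS` table, the `Measurement` class, and the `convert` function are hypothetical names invented for this example; they are not the ontology's actual vocabulary or any published API.

```python
from dataclasses import dataclass

# Hypothetical unit table: unit name -> (dimension, factor to the base unit).
UNITS = {
    "kilogram": ("mass", 1.0),
    "gram": ("mass", 0.001),
    "pound": ("mass", 0.45359237),  # exact by international definition
    "liter": ("volume", 1.0),
    "milliliter": ("volume", 0.001),
}


@dataclass(frozen=True)
class Measurement:
    entity: str          # the thing observed, e.g. "fish"
    characteristic: str  # the property measured, e.g. "mass"
    value: float
    unit: str


def convert(m: Measurement, target_unit: str) -> Measurement:
    """Convert a measurement to another unit of the same dimension."""
    src_dim, src_factor = UNITS[m.unit]
    dst_dim, dst_factor = UNITS[target_unit]
    if src_dim != dst_dim:
        # Refuse conversions across dimensions, e.g. mass to volume.
        raise ValueError(f"cannot convert {m.unit} to {target_unit}")
    new_value = m.value * src_factor / dst_factor
    return Measurement(m.entity, m.characteristic, new_value, target_unit)
```

For instance, `convert(Measurement("fish", "mass", 10.0, "pound"), "kilogram")` yields a measurement in kilograms, while a pound-to-liter request raises `ValueError` because the dimensions differ.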

Trends in Ecology and Evolution | 2008

Advancing ecological research with ontologies

Joshua S. Madin; Shawn Bowers; Mark Schildhauer; Matthew Jones

Ecology is inherently cross-disciplinary, drawing together many types of information to address questions about the natural world. Finding and integrating relevant data to assist in these analyses is crucial, but is difficult owing to ambiguous terminology and the lack of sufficient information about datasets. Ontologies provide a formal mechanism for defining terms and their relationships, and can improve the location, interpretation and integration of data based on its inherent meaning. Ontologies have assisted other disciplines (e.g. molecular biology) in unifying and enriching descriptions of data, and ecology can benefit from similar approaches. We review ontology efforts in ecology, and describe how these can benefit research by enhancing the location and interpretation of relevant data for confronting crucial ecological questions.

International Conference on Conceptual Modeling | 2005

Actor-oriented design of scientific workflows

Shawn Bowers; Bertram Ludäscher

Scientific workflows are becoming increasingly important as a unifying mechanism for interlinking scientific data management, analysis, simulation, and visualization tasks. Scientific workflow systems are problem-solving environments, supporting scientists in the creation and execution of scientific workflows. While current systems permit the creation of executable workflows, conceptual modeling and design of scientific workflows have largely been neglected. Unlike business workflows, scientific workflows are typically highly data-centric, naturally leading to dataflow-oriented modeling approaches. We first develop a formal model for scientific workflows based on an actor-oriented modeling and design approach, originally developed for studying models of complex concurrent systems. Actor-oriented modeling separates two modeling concerns: component communication (dataflow) and overall workflow coordination (orchestration). We then extend our framework by introducing a novel hybrid type system, separating further the concerns of conventional data modeling (structural data type) and conceptual modeling (semantic type). In our approach, semantic and structural mismatches can be handled independently or simultaneously, and via different types of adapters, giving rise to new methods of scientific workflow design.

Data Integration in the Life Sciences | 2004

An Ontology-Driven Framework for Data Transformation in Scientific Workflows

Shawn Bowers; Bertram Ludäscher

Ecologists spend considerable effort integrating heterogeneous data for statistical analyses and simulations, for example, to run and test predictive models. Our research is focused on reducing this effort by providing data integration and transformation tools, allowing researchers to focus on “real science,” that is, discovering new knowledge through analysis and modeling. This paper defines a generic framework for transforming heterogeneous data within scientific workflows. Our approach relies on a formalized ontology, which serves as a simple, unstructured global schema. In the framework, inputs and outputs of services within scientific workflows can have structural types and separate semantic types (expressions of the target ontology). In addition, a registration mapping can be defined to relate input and output structural types to their corresponding semantic types. Using registration mappings, appropriate data transformations can then be generated for each desired service composition. Here, we describe our proposed framework and an initial implementation for services that consume and produce XML data.
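The idea of registration mappings can be sketched in a few lines. Here each service registers its structural fields against semantic types (ontology concepts), and a transformation between two services is derived by matching fields that share a concept. This is a simplified stand-in: plain dictionaries replace the XML data the paper targets, and `derive_transformation`, `apply_transformation`, and the field and concept names used with them are assumptions made for illustration, not the framework's actual interface.

```python
def derive_transformation(source_registration, target_registration):
    """Return {target_field: source_field} by matching shared semantic types.

    Each registration maps a structural field name to a semantic type
    (a concept name from the shared ontology).
    """
    concept_to_source = {
        concept: field for field, concept in source_registration.items()
    }
    mapping = {}
    for target_field, concept in target_registration.items():
        if concept not in concept_to_source:
            raise KeyError(f"no source field provides {concept!r}")
        mapping[target_field] = concept_to_source[concept]
    return mapping


def apply_transformation(mapping, record):
    """Restructure one source record into the target service's layout."""
    return {target: record[source] for target, source in mapping.items()}
```

With a source registered as `{"sp": "Species", "cnt": "Abundance"}` and a target registered as `{"taxon": "Species", "count": "Abundance"}`, the derived mapping renames `sp` to `taxon` and `cnt` to `count` without either service knowing the other's field names.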

International Provenance and Annotation Workshop | 2006

A model for user-oriented data provenance in pipelined scientific workflows

Shawn Bowers; Timothy M. McPhillips; Bertram Ludäscher; Shirley Cohen; Susan B. Davidson

Integrated provenance support promises to be a chief advantage of scientific workflow systems over script-based alternatives. While it is often recognized that information gathered during scientific workflow execution can be used automatically to increase fault tolerance (via checkpointing) and to optimize performance (by reusing intermediate data products in future runs), it is perhaps more significant that provenance information may also be used by scientists to reproduce results from earlier runs, to explain unexpected results, and to prepare results for publication. Current workflow systems offer little or no direct support for these “scientist-oriented” queries of provenance information. Indeed, the use of advanced execution models in scientific workflows (e.g., process networks, which exhibit pipeline parallelism over streaming data) and the failure to record certain fundamental events, such as state resets of processes, can render existing provenance schemas useless for scientific applications of provenance. We develop a simple provenance model that is capable of supporting a wide range of scientific use cases even for complex models of computation such as process networks. Our approach reduces these use cases to database queries over event logs, and is capable of reconstructing complete data and invocation dependency graphs for a workflow run.
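To make the role of state resets concrete, the following sketch reconstructs data dependencies from an ordered event log. It assumes a simplified log of `(actor, kind, data_id)` tuples with three event kinds (`read`, `write`, `reset`); this event vocabulary and the `data_dependencies` function are illustrative assumptions, not the schema defined in the paper.

```python
from collections import defaultdict


def data_dependencies(events):
    """Rebuild data dependency edges (source, derived) from an event log.

    `events` is an ordered list of (actor, kind, data_id) tuples, where
    kind is "read", "write", or "reset" (data_id is None for a reset).
    A written item is assumed to depend on everything its actor has read
    since that actor's most recent state reset.
    """
    reads_since_reset = defaultdict(list)
    edges = set()
    for actor, kind, data_id in events:
        if kind == "read":
            reads_since_reset[actor].append(data_id)
        elif kind == "write":
            for source in reads_since_reset[actor]:
                edges.add((source, data_id))
        elif kind == "reset":
            # Earlier reads no longer influence this actor's later outputs.
            reads_since_reset[actor].clear()
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return edges
```

If the `reset` events were missing from the log, every later output of a streaming actor would appear to depend on every earlier input, which is the kind of over-approximation that the abstract warns can make provenance records unusable.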

Business Process Management | 2009

Scientific Workflows: Business as Usual?

Bertram Ludäscher; Mathias Weske; Timothy M. McPhillips; Shawn Bowers

Business workflow management and business process modeling are mature research areas, whose roots go far back to the early days of office automation systems. Scientific workflow management, on the other hand, is a much more recent phenomenon, triggered by (i) a shift towards data-intensive and computational methods in the natural sciences, and (ii) the resulting need for tools that can simplify and automate recurring computational tasks. In this paper, we provide an introduction and overview of scientific workflows, highlighting features and important concepts commonly found in scientific workflow applications. We illustrate these using simple workflow examples from a bioinformatics domain. We then discuss similarities and, more importantly, differences between scientific workflows and business workflows. While some concepts and solutions developed in one domain may be readily applicable to the other, there remain sufficiently many differences that warrant a new research effort at the intersection of scientific and business workflows. We close by proposing a number of research opportunities for cross-fertilization between the scientific workflow and business workflow communities.

Extending Database Technology | 2010

Techniques for efficiently querying scientific workflow provenance graphs

Manish Kumar Anand; Shawn Bowers; Bertram Ludäscher

A key advantage of scientific workflow systems over traditional scripting approaches is their ability to automatically record data and process dependencies introduced during workflow runs. This information is often represented through provenance graphs, which can be used by scientists to better understand, reproduce, and verify scientific results. However, while most systems record and store data and process dependencies, few provide easy-to-use and efficient approaches for accessing and querying provenance information. Instead, users formulate provenance graph queries directly against physical data representations (e.g., relational, XML, or RDF), leading to queries that are difficult to express and expensive to evaluate. We address these problems through a high-level query language tailored for expressing provenance graph queries. The language is based on a general model of provenance supporting scientific workflows that process XML data and employ update semantics. Query constructs are provided for querying both structure and lineage information. Unlike other languages that return sets of nodes as answers, our query language is closed, i.e., answers to lineage queries are sets of lineage dependencies (edges), allowing answers to be further queried. We provide a formal semantics for the language and present novel techniques for efficiently evaluating lineage queries. Experimental results on real and synthetic provenance traces demonstrate that our lineage-based optimizations outperform in-memory and standard database implementations by orders of magnitude. We also show that our strategies are feasible and can significantly reduce both provenance storage size and query execution time when compared with standard approaches.
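The closure property can be illustrated with a small sketch: a lineage query takes a set of dependency edges and returns a set of dependency edges, so its answer can be fed straight into another query. The `lineage_edges` function below is a hypothetical, naive graph traversal written for this example; it is not the paper's query language or its optimized evaluation technique.

```python
def lineage_edges(edges, target):
    """Return every (source, derived) edge on a path leading to `target`.

    Input and output are both sets of edges, so the answer to one
    lineage query can itself be queried again.
    """
    parents = {}
    for source, derived in edges:
        parents.setdefault(derived, set()).add(source)

    result = set()
    seen = {target}
    stack = [target]
    while stack:
        node = stack.pop()
        for source in parents.get(node, ()):
            result.add((source, node))
            if source not in seen:  # guard against revisiting nodes
                seen.add(source)
                stack.append(source)
    return result
```

For a graph with edges `a -> b`, `b -> c`, and an unrelated `x -> y`, asking for the lineage of `c` returns the first two edges, and querying that answer for the lineage of `b` narrows it to `a -> b`.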

PLOS ONE | 2014

Semantics in Support of Biodiversity Knowledge Discovery: An Introduction to the Biological Collections Ontology and Related Ontologies

Ramona L. Walls; John Deck; Robert P. Guralnick; Steve Baskauf; Reed S. Beaman; Stanley Blum; Shawn Bowers; Pier Luigi Buttigieg; Neil Davies; Dag Terje Filip Endresen; Maria A. Gandolfo; Robert Hanner; Alyssa Janning; Leonard Krishtalka; Andréa M. Matsunaga; Peter E. Midford; Norman Morrison; Éamonn Ó Tuama; Mark Schildhauer; Barry Smith; Brian J. Stucky; Andrea K. Thomer; John Wieczorek; Jamie Whitacre; John Wooley

The study of biodiversity spans many disciplines and includes data pertaining to species distributions and abundances, genetic sequences, trait measurements, and ecological niches, complemented by information on collection and measurement protocols. A review of the current landscape of metadata standards and ontologies in biodiversity science suggests that existing standards such as the Darwin Core terminology are inadequate for describing biodiversity data in a semantically meaningful and computationally useful way. Existing ontologies, such as the Gene Ontology and others in the Open Biological and Biomedical Ontologies (OBO) Foundry library, provide a semantic structure but lack many of the necessary terms to describe biodiversity data in all its dimensions. In this paper, we describe the motivation for and ongoing development of a new Biological Collections Ontology, the Environment Ontology, and the Population and Community Ontology. These ontologies share the aim of improving data aggregation and integration across the biodiversity domain and can be used to describe physical samples and sampling processes (for example, collection, extraction, and preservation techniques), as well as biodiversity observations that involve no physical sampling. Together they encompass studies of: 1) individual organisms, including voucher specimens from ecological studies and museum specimens, 2) bulk or environmental samples (e.g., gut contents, soil, water) that include DNA, other molecules, and potentially many organisms, especially microbes, and 3) survey-based ecological observations. We discuss how these ontologies can be applied to biodiversity use cases that span genetic, organismal, and ecosystem levels of organization. We argue that if adopted as a standard and rigorously applied and enriched by the biodiversity community, these ontologies would significantly reduce barriers to data discovery, integration, and exchange among biodiversity resources and researchers.

Data Integration in the Life Sciences | 2006

Collection-oriented scientific workflows for integrating and analyzing biological data

Timothy M. McPhillips; Shawn Bowers; Bertram Ludäscher

Steps in scientific workflows often generate collections of results, causing the data flowing through workflows to become increasingly nested. Because conventional workflow components (or actors) typically operate on simple or application-specific data types, additional actors often are required to manage these nested data collections. As a result, conventional workflows become increasingly complex as data becomes more nested. This paper describes a new paradigm for developing scientific workflows that transparently manages nested data collections. Collection-oriented workflows have a number of advantages over conventional approaches including simpler workflow designs (e.g., requiring fewer actors and control-flow constructs) that are invariant under changes in data nesting. Our implementation within the Kepler scientific workflow system enables the explicit representation of collections and collection schemas, concurrent operation over collection contents via multi-level pipeline parallelism, and allows collection-aware actors to be composed readily from conventional actors.
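A minimal sketch of the invariance claim: if the framework, rather than each actor, handles the traversal of nested collections, the same conventional actor works unchanged at any nesting depth. The `map_collection` helper below is a hypothetical illustration using Python lists as collections; it is not Kepler's API and it ignores collection schemas and pipeline parallelism.

```python
def map_collection(actor, data):
    """Apply a conventional actor to every leaf of a nested collection.

    Lists stand in for collections; anything else is treated as a leaf
    item. The nesting structure of the input is preserved in the output.
    """
    if isinstance(data, list):
        return [map_collection(actor, item) for item in data]
    return actor(data)
```

The call `map_collection(len, ["ab", ["cde", ["f"]]])` measures each sequence while keeping the original nesting, and the same call works on a flat list or a single item, so the workflow design does not change when the data becomes more deeply nested.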

International Provenance and Annotation Workshop | 2008

Kepler/pPOD: Scientific Workflow and Provenance Support for Assembling the Tree of Life

Shawn Bowers; Timothy M. McPhillips; Sean Riddle; Manish Kumar Anand; Bertram Ludäscher

The complexity of scientific workflows for analyzing biological data creates a number of challenges for current workflow and provenance systems. This complexity is due in part to the nature of scientific data (e.g., heterogeneous, nested data collections) and the programming constructs required for automation (e.g., nested workflows, looping, pipeline parallelism). We present an extended version of the Kepler scientific workflow system to address these challenges, tailored for the systematics community. Our system combines novel approaches for representing scientific data, modeling and automating complex analyses, and recording and browsing associated provenance information.
