
Publication


Featured research published by Daniel Zinn.


IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing | 2011

Towards Reliable, Performant Workflows for Streaming-Applications on Cloud Platforms

Daniel Zinn; Quinn Hart; Timothy M. McPhillips; Bertram Ludäscher; Yogesh Simmhan; Michail Giakkoupis; Viktor K. Prasanna

Scientific workflows are commonplace in eScience applications. Yet the lack of integrated support for data models, including streaming data, structured collections, and files, limits the ability of workflows to support emerging, stream-oriented applications in energy informatics. This is compounded by the absence of Cloud data services that support reliable and performant streams. In this paper, we propose a scientific workflow framework that supports streams as first-class data and is optimized for performant and reliable execution across desktop and Cloud platforms. We present the framework's features and an empirical evaluation on a private Eucalyptus cloud.
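
As a rough sketch of what "streams as first-class data" can look like in a workflow pipeline (a hypothetical Python illustration; the names and API are invented, not the paper's framework): actors exchange stream tokens incrementally rather than staging files between steps.

```python
# Minimal sketch (not the paper's framework): a pipeline in which a stream is
# a first-class token passed between actors, never materialized as a file.
from typing import Iterator

def sensor_source(n: int) -> Iterator[float]:
    """Actor producing a data stream (e.g., smart-meter readings)."""
    for i in range(n):
        yield 0.5 * i  # stand-in for a real reading

def smooth(stream: Iterator[float], window: int = 3) -> Iterator[float]:
    """Actor consuming and producing streams: a moving average."""
    buf = []
    for x in stream:
        buf.append(x)
        if len(buf) > window:
            buf.pop(0)
        yield sum(buf) / len(buf)

def threshold_alerts(stream: Iterator[float], limit: float) -> Iterator[str]:
    """Actor turning a numeric stream into an alert stream."""
    for x in stream:
        if x > limit:
            yield f"ALERT: {x:.2f} exceeds {limit}"

# Composing actors: data flows through incrementally, item by item.
if __name__ == "__main__":
    for alert in threshold_alerts(smooth(sensor_source(10)), limit=2.0):
        print(alert)
```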


International Conference on Data Engineering | 2011

Scientific workflow design 2.0: Demonstrating streaming data collections in Kepler

Lei Dou; Daniel Zinn; Timothy M. McPhillips; Sven Köhler; Sean Riddle; Shawn Bowers; Bertram Ludäscher

Scientific workflow systems are used to integrate existing software components (actors) into larger analysis pipelines to perform in silico experiments. Current approaches for handling data in nested-collection structures, as required in many scientific domains, lead to many record-management actors (shims) that make the workflow structure overly complex and, as a consequence, hard to construct, evolve, and maintain. By constructing and executing workflows from bioinformatics and geosciences in the Kepler system, we demonstrate how COMAD (Collection-Oriented Modeling and Design), an extension of conventional workflow design, addresses these shortcomings. In particular, COMAD provides a hierarchical data stream model (as in XML) and a novel declarative configuration language for actors that functions as a middleware layer between the workflow's data model (streaming nested collections) and the actors' data model (base data and lists thereof). Our approach allows actor developers to focus on the internal actor processing logic while remaining oblivious to the workflow structure. Actors can then be reused in various workflows simply by adapting actor configurations. Due to streaming nested collections and declarative configurations, COMAD workflows can usually be realized as linear data processing pipelines, which often reflect the scientific data analysis intention better than conventional designs. This linear structure not only simplifies actor insertions and deletions (workflow evolution), but also decreases the overall complexity of the workflow, reducing future maintenance effort.
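
To make the COMAD idea concrete, here is a toy sketch (hypothetical encoding and API, not Kepler's): nested collections travel as a flat token stream, and an actor's declarative configuration names the scope it processes, so all other tokens pass through unchanged and no shims are needed.

```python
# Illustrative sketch of the COMAD idea (invented API, not Kepler's): nested
# collections are encoded as a token stream, and a configured actor fires
# only inside its declared scope, passing everything else through untouched.
from dataclasses import dataclass
from typing import Callable, Iterator, Union

@dataclass
class Open:            # opens a nested collection, like an XML start tag
    label: str

@dataclass
class Close:           # closes a collection, like an XML end tag
    label: str

@dataclass
class Data:            # a base data item flowing through the pipeline
    value: object

Token = Union[Open, Close, Data]

def configured_actor(scope: str, fn: Callable[[object], object]):
    """Wrap a plain function as an actor that fires only on data inside
    collections labeled `scope`; all other tokens pass through unchanged."""
    def actor(tokens: Iterator[Token]) -> Iterator[Token]:
        depth = 0
        for t in tokens:
            if isinstance(t, Open) and t.label == scope:
                depth += 1
            elif isinstance(t, Close) and t.label == scope:
                depth -= 1
            elif isinstance(t, Data) and depth > 0:
                t = Data(fn(t.value))     # inside scope: apply the function
            yield t
    return actor

# Token stream encoding <run><seq>acgt</seq><seq>ttga</seq></run>
stream = [Open("run"), Open("seq"), Data("acgt"), Close("seq"),
          Open("seq"), Data("ttga"), Close("seq"), Close("run")]
uppercase = configured_actor("seq", str.upper)   # declarative configuration
for token in uppercase(iter(stream)):
    print(token)
```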


arXiv: Databases | 2013

First-Order Provenance Games

Sven Köhler; Bertram Ludäscher; Daniel Zinn

We propose a new model of provenance, based on a game-theoretic approach to query evaluation. First, we study games G in their own right and ask how to explain that a position x in G is won, lost, or drawn. The resulting notion of game provenance is closely related to winning strategies and excludes from provenance all “bad moves”, i.e., those which unnecessarily allow the opponent to improve the outcome of a play. In this way, the value of a position is determined by its game provenance. We then define provenance games by viewing the evaluation of a first-order query as a game between two players who argue whether a tuple is in the query answer. For \(\mathcal{RA}^+\) queries, we show that game provenance is equivalent to the most general semiring of provenance polynomials \(\mathbb{N}[X]\). Variants of our game yield other known semirings. However, unlike semiring provenance, game provenance also provides a “built-in” way to handle negation and thus to answer why-not questions: in (provenance) games, the reason why x is not won is the same as why x is lost or drawn (the latter is possible for games with draws). Since first-order provenance games are draw-free, they yield a new provenance model that combines how- and why-not provenance.
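
To make the won/lost/drawn game values concrete, here is a minimal fixpoint solver (my illustration, not the paper's code): a position is won if some move reaches a lost position, lost if every move reaches a won position (so a position without moves is lost), and drawn otherwise.

```python
# Solve a game graph for the position values used above (a generic sketch).
def solve(moves: dict[str, list[str]]) -> dict[str, str]:
    positions = set(moves) | {y for ys in moves.values() for y in ys}
    value: dict[str, str] = {}
    changed = True
    while changed:                       # fixpoint iteration
        changed = False
        for x in positions:
            if x in value:
                continue
            succ = moves.get(x, [])
            if any(value.get(y) == "LOST" for y in succ):
                value[x] = "WON"         # some move puts the opponent in a lost spot
                changed = True
            elif all(value.get(y) == "WON" for y in succ):
                value[x] = "LOST"        # every move helps the opponent; no moves => lost
                changed = True
    # positions never resolved by the fixpoint are drawn (e.g., cycles)
    return {x: value.get(x, "DRAWN") for x in positions}

# a -> b -> c (dead end), plus a 2-cycle d <-> e that yields draws
print(solve({"a": ["b"], "b": ["c"], "c": [], "d": ["e"], "e": ["d"]}))
# maps a->LOST, b->WON, c->LOST, d->DRAWN, e->DRAWN
```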


Scientific and Statistical Database Management | 2011

Improving workflow fault tolerance through provenance-based recovery

Sven Köhler; Sean Riddle; Daniel Zinn; Timothy M. McPhillips; Bertram Ludäscher

Scientific workflow systems are frequently used to execute a variety of long-running computational pipelines prone to premature termination due to network failures, server outages, and other faults. Researchers have presented approaches for providing fault tolerance for portions of specific workflows, but no solution handles faults that terminate the workflow engine itself when executing a mix of stateless and stateful workflow components. Here we present a general framework for efficiently resuming workflow execution using information commonly captured by workflow systems to record data provenance. Our approach facilitates fast workflow replay using only such commonly recorded provenance data. We also propose a checkpoint extension to standard provenance models that significantly reduces the computation needed to reset the workflow to a consistent state, resulting in much shorter re-execution times. Our work generalizes the rescue-DAG approach used by DAGMan to richer workflow models that may contain stateless and stateful multi-invocation actors as well as workflow loops.
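
The replay idea can be sketched in a few lines (invented names, not the paper's framework): each actor invocation checks the provenance log first, so after a crash only the invocations without recorded outputs are recomputed.

```python
# Toy sketch of provenance-based recovery: logged invocations are replayed
# from their recorded outputs; only unfinished work is actually recomputed.
provenance_log: dict[tuple[str, int], object] = {}  # (actor, invocation) -> output

def invoke(actor: str, n: int, compute):
    key = (actor, n)
    if key in provenance_log:            # fast replay: reuse the logged result
        print(f"replay  {key} from provenance")
        return provenance_log[key]
    result = compute()                   # normal execution path
    provenance_log[key] = result         # record provenance as we go
    print(f"compute {key}")
    return result

def run_workflow(items):
    out = []
    for i, x in enumerate(items):
        y = invoke("square", i, lambda: x * x)
        out.append(invoke("negate", i, lambda: -y))
    return out

run_workflow([1, 2, 3])            # first run: everything computed and logged
del provenance_log[("negate", 2)]  # simulate a crash before the last step
run_workflow([1, 2, 3])            # recovery: only ('negate', 2) reruns
```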


Acta Crystallographica Section D: Biological Crystallography | 2013

AutoDrug: fully automated macromolecular crystallography workflows for fragment-based drug discovery.

Yingssu Tsai; Scott E. McPhillips; Ana Gonzalez; Timothy M. McPhillips; Daniel Zinn; Aina E. Cohen; Michael D. Feese; David Bushnell; Theresa Tiefenbrunn; C. David Stout; Bertram Ludaescher; Britt Hedman; Keith O. Hodgson; S. Michael Soltis

AutoDrug is software, based upon the scientific workflow paradigm, that integrates the Stanford Synchrotron Radiation Lightsource macromolecular crystallography beamlines and third-party processing software to automate the crystallography steps of the fragment-based drug-discovery process. AutoDrug screens a cassette of fragment-soaked crystals, selects crystals for data collection based on screening results and user-specified criteria, and determines optimal data-collection strategies. It then collects and processes diffraction data, performs molecular replacement using provided models, and detects electron density that is likely to arise from bound fragments. All processes are fully automated, i.e. they are performed without user interaction or supervision. Samples can be screened in groups corresponding to particular proteins, crystal forms and/or soaking conditions. A single AutoDrug run is limited only by the capacity of the sample-storage dewar at the beamline: currently 288 samples. AutoDrug was developed in conjunction with RestFlow, a new scientific workflow-automation framework. RestFlow simplifies the design of AutoDrug by managing the flow of data and the organization of results and by orchestrating the execution of computational pipeline steps. It also simplifies the execution of, and interaction with, third-party programs and the beamline-control system. Modeling AutoDrug as a scientific workflow enables multiple variants that meet the requirements of different user groups to be developed and supported. A workflow tailored to mimic the crystallography stages of the drug-discovery pipeline of CoCrystal Discovery Inc. has been deployed and successfully demonstrated: run once on the same 96 samples that the group had examined manually, the workflow cycled successfully through all of the samples, collected data from the same samples that were selected manually, and located the same peaks of unmodeled density in the resulting difference Fourier maps.


Symposium on Principles of Database Systems | 2014

Weaker forms of monotonicity for declarative networking: a more fine-grained answer to the CALM-conjecture

Tom J. Ameloot; Bas Ketsman; Frank Neven; Daniel Zinn

The CALM-conjecture, first stated by Hellerstein [23] and proved in its revised form by Ameloot et al. [13] within the framework of relational transducer networks, asserts that a query has a coordination-free execution strategy if and only if the query is monotone. Zinn et al. [32] extended the framework of relational transducer networks to allow for specific data distribution strategies and showed that the nonmonotone win-move query is coordination-free for domain-guided data distributions. In this paper, we complete the story by equating increasingly larger classes of coordination-free computations with increasingly weaker forms of monotonicity, and we make explicit Datalog variants that capture each of these classes. One such fragment is based on stratified Datalog where rules are required to be connected, with the exception of the last stratum. In addition, we characterize coordination-freeness as those computations that do not require knowledge about all other nodes in the network and therefore cannot globally coordinate. The results in this paper can be interpreted as a more fine-grained answer to the CALM-conjecture.
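
A small worked example of the monotonicity notion at the heart of CALM (my illustration, not from the paper): reachability is monotone, so its answers only grow as facts arrive, whereas its negation can retract answers and therefore, by CALM, needs coordination in general.

```python
# Monotone query: answers only grow as new facts arrive, so a node can emit
# answers early without waiting for (coordinating with) other nodes.
def reachable(edges, src):
    seen, frontier = {src}, [src]
    while frontier:
        x = frontier.pop()
        for (a, b) in edges:
            if a == x and b not in seen:
                seen.add(b)
                frontier.append(b)
    return seen

e1 = {("a", "b")}
e2 = e1 | {("b", "c")}
assert reachable(e1, "a") <= reachable(e2, "a")   # more input, more output

# Nonmonotone query (uses negation): new facts can *remove* answers.
def unreachable(edges, nodes, src):
    return nodes - reachable(edges, src)

# Adding the fact ("b", "c") retracts "c" from the answer, so the query is
# not monotone and, per CALM, not coordination-free in general.
assert "c" in unreachable(e1, {"a", "b", "c"}, "a")
assert "c" not in unreachable(e2, {"a", "b", "c"}, "a")
```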


Databases, Information Systems, and Peer-to-Peer Computing | 2005

Processing rank-aware queries in P2P systems

Katja Hose; Marcel Karnstedt; Anke Koch; Kai-Uwe Sattler; Daniel Zinn

Efficient query processing in P2P systems poses a variety of challenges. As a special problem in this context, we consider the evaluation of rank-aware queries, namely top-N and skyline queries, on structured data. Optimizing query processing in a distributed manner requires locally available statistics at each peer. In this paper, we address this problem by presenting approaches relying on R-tree and histogram-based index structures. We show how this allows rank-aware queries to be optimized even over multiple attributes, significantly enhancing the efficiency of query processing.
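
For readers unfamiliar with skyline queries, here is a minimal centralized computation (a generic sketch that deliberately ignores the paper's R-tree and histogram machinery): the skyline keeps exactly the points not dominated by any other point.

```python
# p dominates q if p is at least as good on every attribute and strictly
# better on at least one; here "better" means smaller.
def dominates(p, q):
    return all(a <= b for a, b in zip(p, q)) and \
           any(a < b for a, b in zip(p, q))

def skyline(points):
    result = []
    for p in points:
        if any(dominates(q, p) for q in points if q != p):
            continue                     # p is dominated: not in the skyline
        result.append(p)
    return result

# Hotels as (price, distance_to_beach): cheaper and closer is better.
hotels = [(50, 8), (80, 2), (60, 5), (90, 1), (70, 6)]
print(skyline(hotels))   # [(50, 8), (80, 2), (60, 5), (90, 1)]
```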


Workflows in Support of Large-Scale Science | 2010

Streaming satellite data to cloud workflows for on-demand computing of environmental data products

Daniel Zinn; Quinn Hart; Bertram Ludäscher; Yogesh Simmhan

Environmental data arriving constantly from satellites and weather stations are used to compute weather coefficients that are essential for agriculture and viticulture. For example, the reference evapotranspiration (ET0) coefficient, overlaid on regional maps, is provided each day by the California Department of Water Resources to local farmers and turf managers to plan daily water use. Scaling out single-processor, compute- and data-intensive applications operating on real-time data to support more users and higher-resolution data poses data engineering challenges. Cloud computing helps data providers expand resource capacity to meet growing needs, while also supporting scientific needs such as reprocessing historical data using new models. In this article, we examine the migration of a legacy script used for daily ET0 computation by CIMIS (the California Irrigation Management Information System) to a workflow model that eases deployment to, and scaling on, the Windows Azure Cloud. Our architecture incorporates a direct streaming model into Cloud virtual machines (VMs) that improves performance for our workflow by 130% to 160% over the commonly used approach of staging data in Cloud storage. The streaming workflows achieve runtimes comparable to desktop execution for single VMs and a linear speed-up when using multiple VMs, thus allowing computation of environmental coefficients at a much larger resolution than is done presently.
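
The staging-versus-streaming trade-off can be caricatured as follows (hypothetical code with made-up latencies, not the paper's Azure implementation): the staged path pays an extra storage round trip per record, while the streamed path computes directly on arriving data.

```python
# Schematic contrast between staging input through Cloud storage before
# computing, and streaming records directly into the computation.
import time
from typing import Iterator

def satellite_records(n: int) -> Iterator[bytes]:
    for i in range(n):
        time.sleep(0.01)                 # stand-in for network arrival delay
        yield f"record-{i}".encode()

def staged_run(n: int) -> int:
    staged = []
    for r in satellite_records(n):       # pass 1: push everything to storage
        time.sleep(0.005)                # stand-in for blob upload latency
        staged.append(r)
    total = 0
    for r in staged:                     # pass 2: pull it back and compute
        time.sleep(0.005)                # stand-in for blob download latency
        total += len(r)
    return total

def streamed_run(n: int) -> int:
    # Compute directly on the arriving stream: no staging round trip.
    return sum(len(r) for r in satellite_records(n))

if __name__ == "__main__":
    for run in (staged_run, streamed_run):
        t0 = time.perf_counter()
        run(50)
        print(f"{run.__name__}: {time.perf_counter() - t0:.2f}s")
```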


International Provenance and Annotation Workshop | 2010

Abstract Provenance Graphs: Anticipating and Exploiting Schema-Level Data Provenance

Daniel Zinn; Bertram Ludäscher

Provenance graphs capture flow and dependency information recorded during scientific workflow runs, which can be used subsequently to interpret, validate, and debug workflow results. In this paper, we propose the new concept of Abstract Provenance Graphs (APGs). APGs are created via static analysis of a configured workflow W and its input data schema, i.e., before W is actually executed. They summarize all possible provenance graphs the workflow W can create with input data of type τ; that is, for each input v ∈ τ there exists a graph homomorphism \(\mathcal H_v\) between the concrete and abstract provenance graphs. APGs are helpful during workflow construction since (1) they make certain workflow design bugs (e.g., selecting no input data, or the wrong input data, for the actors) easy to spot; and (2) they show the evolution of the overall data organization of a workflow. Moreover, after workflows have been run, APGs can be used to validate concrete provenance graphs. A more detailed version of this work is available as [14].
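
The homomorphism property can be illustrated with a tiny check (hypothetical encoding, not the paper's formalism): every concrete provenance node is mapped to an abstract node of its type, and the mapping must send concrete edges to abstract edges.

```python
# Check that a type map is a graph homomorphism from a concrete provenance
# graph into an abstract one: every concrete edge must map to an abstract edge.
def is_homomorphism(concrete_edges, abstract_edges, type_of):
    return all((type_of[u], type_of[v]) in abstract_edges
               for (u, v) in concrete_edges)

# Abstract provenance graph predicted by static analysis: File -> Actor -> Plot
abstract_edges = {("File", "AlignActor"), ("AlignActor", "Plot")}

# Concrete run: two input files, one actor invocation each, two plots
concrete_edges = {("f1", "align:1"), ("align:1", "p1"),
                  ("f2", "align:2"), ("align:2", "p2")}
type_of = {"f1": "File", "f2": "File",
           "align:1": "AlignActor", "align:2": "AlignActor",
           "p1": "Plot", "p2": "Plot"}

print(is_homomorphism(concrete_edges, abstract_edges, type_of))  # True
```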


International Conference on Data Engineering | 2010

XML-based computation for scientific workflows

Daniel Zinn; Shawn Bowers; Bertram Ludäscher

Scientific workflows are increasingly used to rapidly integrate existing algorithms to create larger and more complex programs. However, designing workflows using purely dataflow-oriented computation models introduces a number of challenges, including the need to use low-level components to mediate and transform data (so-called shims) and large numbers of additional “wires” for routing data to components within a workflow. To address these problems, we employ Virtual Data Assembly Lines (VDAL), a modeling paradigm that can eliminate most shims and reduce wiring complexity. We show how a VDAL design can be implemented using existing XML technologies and how static analysis can provide significant help to scientists during workflow design and evolution, e.g., by displaying actor dependencies or by detecting so-called unproductive actors.
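
As a sketch of the kind of static analysis mentioned above (invented configuration format, not the VDAL implementation): from each actor's declared read and write scopes one can derive actor-to-actor dependencies and flag unproductive actors whose outputs nothing downstream reads.

```python
# Each actor declares which paths in the nested data it reads and writes;
# dependencies and unproductive actors follow from these declarations alone.
actors = [  # a linear VDAL-style pipeline, plus one dead-end actor
    ("Fetch",  {"reads": set(),        "writes": {"/run/seq"}}),
    ("Align",  {"reads": {"/run/seq"}, "writes": {"/run/aln"}}),
    ("Stats",  {"reads": {"/run/aln"}, "writes": {"/run/stats"}}),
    ("Orphan", {"reads": {"/run/seq"}, "writes": {"/run/tmp"}}),
]

def dependencies(actors):
    deps = []
    for i, (a, conf_a) in enumerate(actors):
        for b, conf_b in actors[i + 1:]:
            if conf_a["writes"] & conf_b["reads"]:
                deps.append((a, b))      # b consumes what a produces
    return deps

def unproductive(actors, final_outputs={"/run/stats"}):
    consumed = set(final_outputs)
    for _, conf in actors:
        consumed |= conf["reads"]
    return [a for a, conf in actors
            if conf["writes"] and not (conf["writes"] & consumed)]

print(dependencies(actors))  # [('Fetch', 'Align'), ('Fetch', 'Orphan'), ('Align', 'Stats')]
print(unproductive(actors))  # ['Orphan']: nothing downstream reads /run/tmp
```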

Collaboration


Dive into Daniel Zinn's collaborations.

Top Co-Authors

Sven Köhler, University of California
Saumen C. Dey, University of California
Kai-Uwe Sattler, Technische Universität Ilmenau
Haicheng Wu, Georgia Institute of Technology