Daniel Deutch | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Daniel Deutch is active.

Explore More

Publication

Featured researches published by Daniel Deutch.

very large data bases | 2008

Type inference and type checking for queries on execution traces

Daniel Deutch; Tova Milo

This paper studies, for the first time, the management of type information for an important class of semi-structured data: nested DAGs (Directed Acyclic Graphs) that describe execution traces of business processes (BPs for short). Specifically, we consider here type inference and type checking for queries over BP execution traces. The queries that we consider select portions of the traces that are of interest to the user; the types describe the possible shape of the execution traces in the input/output of the query. We formally define and characterize here three common classes of BP execution traces and their respective notions of type inference and type checking. We study the complexity of the two problems for query languages of varying expressive power and present efficient type inference/checking algorithms whenever possible. Our analysis offers a nearly complete picture of which combinations of trace classes and query features lead to PTIME algorithms and which to NP-complete or undecidable problems.

database programming languages | 2007

Querying structural and behavioral properties of business processes

Daniel Deutch; Tova Milo

BPQL is a novel query language for querying business process specifications, introduced recently in [5,6]. It is based on an intuitive model of business processes as rewriting systems, an abstraction of the emerging BPEL (Business Process Execution Language) standard [7]. BPQL allows users to query business processes visually, in a manner very analogous to the language used to specify the processes. The goal of the present paper is to study the formal model underlying BPQL and investigate its properties as well as the complexity of query evaluation. We also study its relationship to previously suggested formalisms for process modeling and querying. In particular we propose a query evaluation algorithm of polynomial data complexity that can be applied uniformly to queries on the structure of the process specification as well as on the potential behavior of the defined process. We show that unless P=NP the efficiency of our algorithm is asymptotically optimal.

symposium on principles of database systems | 2010

On probabilistic fixpoint and Markov chain query languages

Daniel Deutch; Christoph Koch; Tova Milo

We study highly expressive query languages such as datalog, fixpoint, and while-languages on probabilistic databases. We generalize these languages such that computation steps (e.g. datalog rules) can fire probabilistically. We define two possible semantics for such query languages, namely inflationary semantics where the results of each computation step are added to the current database and noninflationary queries that induce a random walk in-between database instances. We then study the complexity of exact and approximate query evaluation under these semantics.

international conference on database theory | 2009

TOP-K projection queries for probabilistic business processes

Daniel Deutch; Tova Milo

A Business Process (BP) consists of some business activities undertaken by one or more organizations in pursuit of some business goal. Tools for querying and analyzing BP specifications are extremely valuable for companies. In particular, given a BP specification, identifying the top-k flows that are most likely to occur in practice, out of those satisfying a given query criteria, is crucial for various applications such as personalized advertizement and BP web-site design. This paper studies, for the first time, top-k query evaluation for queries with projection in this context. We analyze the complexity of the problem for different classes of distribution functions for the flows likelihood, and provide efficient (PTIME) algorithms whenever possible. Furthermore, we show an interesting application of our algorithms to the analysis of BP execution traces (logs), for recovering missing information about the run-time process behavior, that has not been recorded in the logs.

symposium on principles of database systems | 2011

On provenance minimization

Yael Amsterdamer; Daniel Deutch; Tova Milo; Val Tannen

Provenance information has been proved to be very effective in capturing the computational process performed by queries, and has been used extensively as the input to many advanced data management tools (e.g. view maintenance, trust assessment, or query answering in probabilistic databases). We study here the core of provenance information, namely the part of provenance that appears in the computation of every query equivalent to the given one. This provenance core is informative as it describes the part of the computational process that is inherent to the query. It is also useful as a compact input to the above mentioned data management tools. We study algorithms that, given a query, compute an equivalent query that realizes the core provenance for all tuples in its result. We study these algorithms for queries of varying expressive power. Finally, we observe that, in general, one would not want to require database systems to evaluate a specific query that realizes the core provenance, but instead to be able to find, possibly off-line, the core provenance of a given tuple in the output (computed by an arbitrary equivalent query), without rewriting the query. We provide algorithms for such direct computation of the core provenance.

international conference on data engineering | 2009

Evaluating TOP-K Queries over Business Processes

Daniel Deutch; Tova Milo

A Business Process (BP) consists of some business activities undertaken by one or more organizations in pursuit of some business goal. Tools for querying and analyzing BP specifications are extremely valuable for companies as they allow to optimize the BP, identify potential problems, and reduce operational costs. In particular, given a BP specification, identifying the top-k execution flows that are most likely to occur in practice out of those satisfying the query criteria, is crucial for various applications. To address this need, we introduce in this paper the notion of {\em likelihood} for BP execution flows, and study top-k query evaluation (finding the

symposium on principles of database systems | 2011

A quest for beauty and wealth (or, business processes for database researchers)

Daniel Deutch; Tova Milo

international conference on data engineering | 2011

Using Markov Chain Monte Carlo to play Trivia

Daniel Deutch; Ohad Greenshpan; Boris Kostenko; Tova Milo

most likely matches) for queries over BP specifications. We analyze the complexity of query evaluation in this context and present novel algorithms for computing top-k query results. To our knowledge, this is the first paper that studies such top-k query evaluation for BP specifications.

ACM Transactions on Database Systems | 2012

On Provenance Minimization

Yael Amsterdamer; Daniel Deutch; Tova Milo; Val Tannen

While classic data management focuses on the data itself, research on Business Processes considers also the context in which this data is generated and manipulated, namely the processes, the users, and the goals that this data serves. This allows the analysts a better perspective of the organizational needs centered around the data. As such, this research is of fundamental importance. Much of the success of database systems in the last decade is due to the beauty and elegance of the relational model and its declarative query languages, combined with a rich spectrum of underlying evaluation and optimization techniques, and efficient implementations. This, in turn, has lead to an economic wealth for both the users and vendors of database systems. Similar beauty and wealth are sought for in the context of Business Processes. Much like the case for traditional database research, elegant modeling and rich underlying technology are likely to bring economic wealth for the Business Process owners and their users; both can benefit from easy formulation and analysis of the processes. While there have been many important advances in this research in recent years, there is still much to be desired: specifically, there have been many works that focus on the processes behavior (flow), and many that focus on its data, but only very few works have dealt with both. We will discuss here the important advantages of a holistic flow-and-data framework for Business Processes, the progress towards such a framework, and highlight the current gaps and research directions.

very large data bases | 2015

Selective provenance for datalog programs using top-k queries

Daniel Deutch; Amir Gilad; Yuval Moskovitch

We introduce in this Demonstration a system called Trivia Masster that generates a very large Database of facts in a variety of topics, and uses it for question answering. The facts are collected from human users (the “crowd”); the system motivates users to contribute to the Database by using a Trivia Game, where users gain points based on their contribution. A key challenge here is to provide a suitable Data Cleaning mechanism that allows to identify which of the facts (answers to Trivia questions) submitted by users are indeed correct / reliable, and consequently how many points to grant users, how to answer questions based on the collected data, and which questions to present to the Trivia players, in order to improve the data quality. As no existing single Data Cleaning technique provides a satisfactory solution to this challenge, we propose here a novel approach, based on a declarative framework for defining recursive and probabilistic Data Cleaning rules. Our solution employs an algorithm that is based on Markov Chain Monte Carlo Algorithms.

Explore More