Abhay Kumar Jha | Researchain

Publication

Featured research published by Abhay Kumar Jha.

Very Large Data Bases | 2012

Probabilistic databases with MarkoViews

Abhay Kumar Jha; Dan Suciu

Most of the work on query evaluation in probabilistic databases has focused on the simple tuple-independent data model, where tuples are independent random events. Several efficient query evaluation techniques exist in this setting, such as safe plans, algorithms based on OBDDs, tree decomposition, and a variety of approximation algorithms. However, complex data analytics tasks often require complex correlations, and query evaluation then becomes significantly more expensive or more restrictive. In this paper, we propose MVDB as a framework both for representing complex correlations and for efficient query evaluation. An MVDB specifies correlations by views, called MarkoViews, on the probabilistic relations, and declares the weights of the views' outputs. An MVDB is a (very large) Markov Logic Network. We make two sets of contributions. First, we show that query evaluation on an MVDB is equivalent to evaluating a Union of Conjunctive Queries (UCQ) over a tuple-independent database. The translation is exact (thus allowing the techniques developed for tuple-independent databases to be carried over to MVDBs), yet it is novel and quite non-obvious (some resulting probabilities may be negative!). On its own, however, this translation may not yield much gain, since the translated query becomes more complicated as we try to capture more correlations. Our second contribution is a new query evaluation strategy that exploits offline compilation to speed up online query evaluation. Here we utilize and extend our prior work on compilation of UCQs. We experimentally validate our techniques on a large probabilistic database with MarkoViews inferred from the DBLP data.
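The claim that an MVDB reduces exactly to a tuple-independent database, possibly with negative probabilities, can be checked on a toy instance. The sketch below is a hedged illustration, not necessarily the paper's exact construction. It assumes a single view V with weight w that multiplies the weight of every world where V holds. It then adds a hypothetical fresh tuple N with probability 1 − w, sets W = V ∧ N, and computes (P(Q ∨ W) − P(W)) / (1 − P(W)) by brute-force enumeration. The base probabilities, view, and query are made up for the example.

```python
from itertools import product

# Hypothetical toy instance: three independent base tuples with these probabilities.
P_BASE = [0.5, 0.3, 0.8]

def view(x):
    """A MarkoView (assumed): holds when tuples 0 and 1 are both present."""
    return x[0] and x[1]

def query(x):
    """A Boolean query (assumed): holds when tuple 1 or tuple 2 is present."""
    return x[1] or x[2]

def world_prob(x, probs):
    """Probability of one world in a tuple-independent database (probs may be negative)."""
    p = 1.0
    for present, pi in zip(x, probs):
        p *= pi if present else 1 - pi
    return p

def mvdb_prob(q, v, w, probs):
    """Brute-force P(q) in the Markov network: worlds where v holds get extra factor w."""
    num = den = 0.0
    for x in product([False, True], repeat=len(probs)):
        weight = world_prob(x, probs) * (w if v(x) else 1.0)
        den += weight
        if q(x):
            num += weight
    return num / den

def indb_prob(f, probs):
    """Brute-force P(f) over a tuple-independent database."""
    return sum(world_prob(x, probs)
               for x in product([False, True], repeat=len(probs)) if f(x))

def translated_prob(q, v, w, probs):
    """Add hypothetical tuple N with probability 1 - w (negative when w > 1), W = V and N."""
    n = len(probs)
    ext = probs + [1 - w]
    W = lambda x: v(x[:n]) and x[n]
    p_w = indb_prob(W, ext)
    p_q_or_w = indb_prob(lambda x: q(x[:n]) or W(x), ext)
    # P(Q or W) - P(W) = P(Q and not W); dividing by P(not W) renormalizes.
    return (p_q_or_w - p_w) / (1 - p_w)
```

The identity holds because, summing out N, a world where V is false keeps weight 1 in ¬W, while a world where V is true keeps weight 1 − P(N) = w. For w = 2 or w = 3, N gets probability −1 or −2, and the two functions still agree.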

International Conference on Database Theory | 2011

Knowledge compilation meets database theory: compiling queries to decision diagrams

Abhay Kumar Jha; Dan Suciu

The goal of Knowledge Compilation is to represent a Boolean expression in a format in which it can answer a range of online queries in PTIME. The online query of main interest to us is model counting, because of its application to query evaluation on probabilistic databases, but other online queries can be supported as well, such as testing for equivalence, testing for implication, etc. In this paper we study the following problem: given a database query q, decide whether its lineage can be compiled efficiently into a given target language. We consider four target languages of strictly increasing expressive power (when the size of the compilation is constrained to be polynomial in the input size): Read-Once Boolean formulae, OBDD, FBDD, and d-DNNF. For each target, we study the class of database queries that admit a polynomial-size representation; these queries can also be evaluated in PTIME over probabilistic databases. When queries are restricted to conjunctive queries without self-joins, it was known that these four classes collapse to the class of hierarchical queries, which is also the class of PTIME queries over probabilistic databases. Our main result in this paper is that, in the case of Unions of Conjunctive Queries (UCQ), these classes form a strict hierarchy. Thus, unlike for conjunctive queries without self-joins, for UCQs these target compilation languages differ considerably in expressive power. Moreover, we give a complete characterization of the first two target languages, based on the query's syntax.
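The abstract relies on the notion of a hierarchical query. As a hedged illustration, the check below assumes the standard definition from the probabilistic-database literature: for every two variables x and y, the sets at(x) and at(y) of atoms containing them are either disjoint or one contains the other. It also assumes a Boolean conjunctive query without self-joins whose atom arguments are all variables. The function name and query encoding are hypothetical.

```python
from itertools import combinations

def is_hierarchical(atoms):
    """atoms: list of (relation_name, tuple_of_variables) for a Boolean CQ without self-joins."""
    # at[v] = indices of the atoms in which variable v occurs
    at = {}
    for i, (_rel, variables) in enumerate(atoms):
        for v in variables:
            at.setdefault(v, set()).add(i)
    # Every pair of atom sets must be disjoint or nested.
    for x, y in combinations(at, 2):
        ax, ay = at[x], at[y]
        if ax & ay and not (ax <= ay or ay <= ax):
            return False
    return True
```

For example, q() :- R(x), S(x, y) is hierarchical (at(y) ⊆ at(x)). By contrast, q() :- R(x), S(x, y), T(y) is not, because at(x) and at(y) overlap only in S.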

Extending Database Technology | 2010

Bridging the gap between intensional and extensional query evaluation in probabilistic databases

Abhay Kumar Jha; Dan Olteanu; Dan Suciu

There are two broad approaches to query evaluation over probabilistic databases: (1) Intensional methods proceed by manipulating expressions over symbolic events associated with uncertain tuples. This approach is very general and can be applied to any query, but requires an expensive post-processing phase, which involves some general-purpose probabilistic inference. (2) Extensional methods, on the other hand, evaluate the query by translating operations over symbolic events into a query plan; extensional methods scale well, but they are restricted to safe queries. In this paper, we bridge this gap by proposing an approach that can translate the evaluation of any query into extensional operators, followed by some post-processing that requires probabilistic inference. Our approach uses characteristics of the data to adapt smoothly between the two evaluation strategies. If the query is safe, or becomes safe because of the data instance, then the evaluation is completely extensional and inside the database. If the query/data combination departs from the ideal setting of a safe query, then some intensional processing is performed, whose complexity depends only on the distance from the ideal setting.
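The gap between the two approaches can be seen on a tiny instance. This sketch is not the paper's method. It only uses the textbook extensional operators: an independent join multiplies probabilities, and an independent project computes 1 − Π(1 − p). Brute-force world enumeration stands in for intensional evaluation. The relations R, S, T and their probabilities are hypothetical.

```python
from itertools import product

# Hypothetical tuple-independent instance: each tuple maps to its marginal probability.
R = {"a1": 0.5, "a2": 0.5}                  # R(x)
S = {("a1", "b"): 0.5, ("a2", "b"): 0.5}    # S(x, y)
T = {"b": 0.5}                              # T(y)

def indep_project(probs):
    """Extensional independent project: P(at least one event) = 1 - prod(1 - p)."""
    p_none = 1.0
    for p in probs:
        p_none *= 1 - p
    return 1 - p_none

def safe_plan_rs():
    """Safe query q() :- R(x), S(x, y), evaluated as pi_{} (R join pi_x S)."""
    s_by_x = {}
    for (x, _y), p in S.items():
        s_by_x.setdefault(x, []).append(p)
    return indep_project(R[x] * indep_project(ps) for x, ps in s_by_x.items() if x in R)

def naive_plan_rst():
    """Unsafe query q() :- R(x), S(x, y), T(y): join, then project as if answers were independent."""
    return indep_project(R[x] * p * T[y] for (x, y), p in S.items() if x in R and y in T)

# Intensional baseline: enumerate every possible world (exponential, but exact).
TUPLES = {("R", k): p for k, p in R.items()}
TUPLES.update({("S", k): p for k, p in S.items()})
TUPLES.update({("T", k): p for k, p in T.items()})

def brute_force(holds):
    keys = list(TUPLES)
    total = 0.0
    for bits in product([False, True], repeat=len(keys)):
        world = {k for k, b in zip(keys, bits) if b}
        if holds(world):
            weight = 1.0
            for k, b in zip(keys, bits):
                weight *= TUPLES[k] if b else 1 - TUPLES[k]
            total += weight
    return total

def q_rs(world):
    return any(("R", x) in world and ("S", (x, y)) in world for (x, y) in S)

def q_rst(world):
    return any(("R", x) in world and ("S", (x, y)) in world and ("T", y) in world
               for (x, y) in S)
```

On this data the safe plan matches enumeration (0.4375). The naive plan for the unsafe query returns 0.234375 instead of the correct 0.21875, because both join answers share the tuple T(b) and are therefore not independent.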

International Conference on Database Theory | 2012

On the tractability of query compilation and bounded treewidth

Abhay Kumar Jha; Dan Suciu

We consider the problem of computing the probability of a Boolean function, which generalizes the model counting problem. Given an OBDD for such a function, its probability can be computed in time linear in the size of the OBDD. In this paper we investigate the connection between treewidth and the size of the OBDD. Bounded treewidth has proven to be applicable to many graph problems that are NP-hard in general but become tractable on graphs with bounded treewidth. However, it is less well understood how bounded treewidth can be used for the probability computation problem of a Boolean function. We introduce a new notion of treewidth of a Boolean function, called the expression treewidth, as the smallest treewidth of any DAG-expression representing the function. Our new notion of bounded treewidth includes some previously known tractable cases: all read-once Boolean functions, and all functions having a bounded treewidth of the primal graph or of the incidence graph, also have a bounded expression treewidth. We show that bounded expression treewidth implies the existence of a polynomial-size OBDD, and that bounded expression pathwidth implies the existence of a constant-width OBDD. We also show a converse of the latter result: constant-width OBDDs imply bounded expression pathwidth. We then study the implications of these results for query compilation, where the Boolean function is the lineage of a fixed query on varying input databases. We give a syntactic characterization of all UCQ≠ queries that admit a polynomial-size OBDD, showing that these are precisely the inversion-free queries with unrestricted use of ≠. It was previously known that inversion-free queries characterize precisely those UCQ queries that have a polynomial-size OBDD, and that these also have a constant-width OBDD; in contrast, inversion-free queries with ≠ have polynomial-width OBDDs, thus using the full power of OBDDs.
Finally, we show that in the case of UCQ, the four classes studied in this paper collapse: bounded expression pathwidth, bounded expression treewidth, constant-width OBDD, and polynomial size OBDD.
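The linear-time probability computation over an OBDD mentioned at the start of this abstract can be sketched as a memoized pass. Each node u labelled with variable x gets P(u) = (1 − p_x)·P(low) + p_x·P(high), and every node is evaluated once. The OBDD below is a hypothetical hand-built diagram for (x1 ∧ x2) ∨ x3 with variable order x1 < x2 < x3, stored as a plain dict. A brute-force check is included for comparison.

```python
from itertools import product

# Hypothetical OBDD for (x1 and x2) or x3, order x1 < x2 < x3.
# node name -> (variable, low child, high child); terminals are the booleans False/True.
OBDD = {
    "n1": ("x1", "n3", "n2"),
    "n2": ("x2", "n3", True),
    "n3": ("x3", False, True),
}

def obdd_prob(root, nodes, probs):
    """P(f) for independent variables; memoization visits each node once (linear time)."""
    memo = {}
    def visit(u):
        if isinstance(u, bool):
            return 1.0 if u else 0.0
        if u not in memo:
            var, low, high = nodes[u]
            p = probs[var]
            memo[u] = (1 - p) * visit(low) + p * visit(high)
        return memo[u]
    return visit(root)

def evaluate(root, nodes, assignment):
    """Follow one root-to-terminal path for a full truth assignment."""
    u = root
    while not isinstance(u, bool):
        var, low, high = nodes[u]
        u = high if assignment[var] else low
    return u

def brute_force_prob(root, nodes, probs):
    """Exponential reference: sum the probabilities of satisfying assignments."""
    names = sorted(probs)
    total = 0.0
    for bits in product([False, True], repeat=len(names)):
        assignment = dict(zip(names, bits))
        if evaluate(root, nodes, assignment):
            weight = 1.0
            for name, b in assignment.items():
                weight *= probs[name] if b else 1 - probs[name]
            total += weight
    return total
```

With all three probabilities set to 0.5, obdd_prob("n1", OBDD, probs) gives 0.625, which equals 1 − (1 − 0.25)(1 − 0.5).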

Symposium on Principles of Database Systems | 2008

Query evaluation with soft-key constraints

Abhay Kumar Jha; Vibhor Rastogi; Dan Suciu

Key violations often occur in real-life datasets, especially in those integrated from different sources. Enforcing constraints strictly on these datasets is not feasible. In this paper we formalize the notion of soft-key constraints on probabilistic databases, which allow for violations of key constraints by penalizing every violating world by a quantity proportional to the violation. To represent our probabilistic database with constraints, we define a class of Markov networks in which query evaluation can be done in PTIME. We also study the evaluation of conjunctive queries on relations with soft keys and present a dichotomy that separates this set into queries that are in PTIME and the rest, which are #P-hard.
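The following brute-force sketch makes the soft-key idea concrete under one specific penalty, which is an assumption and not necessarily the paper's. Each candidate tuple has an independent weight (odds). A world's weight is then multiplied by exp(−λ·e), where e counts the extra tuples sharing a key value, so the log-weight penalty is proportional to the violation. The relation Person(name, city), its weights, and the penalty values are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical relation Person(name, city) with soft key `name`.
# Each entry is (name, city, weight), where weight is the tuple's odds of being present.
CANDIDATES = [("alice", "paris", 2.0), ("alice", "rome", 1.0), ("bob", "oslo", 3.0)]

def world_weight(present, penalty):
    """Unnormalized weight (assumed model): tuple weights times exp(-penalty * extra key tuples)."""
    weight = math.prod(t[2] for t in present)
    extra = sum(c - 1 for c in Counter(t[0] for t in present).values())
    return weight * math.exp(-penalty * extra)

def query_prob(query, penalty):
    """Brute-force P(query) over all 2^n subsets of the candidate tuples."""
    num = den = 0.0
    for mask in product([False, True], repeat=len(CANDIDATES)):
        present = [t for t, keep in zip(CANDIDATES, mask) if keep]
        w = world_weight(present, penalty)
        den += w
        if query(present):
            num += w
    return num / den

def both_alice(present):
    """True when both conflicting tuples for key 'alice' are present (a key violation)."""
    cities = {city for name, city, _ in present if name == "alice"}
    return {"paris", "rome"} <= cities
```

With penalty 0 the tuples are independent, and both alice tuples coexist with probability (2/3)·(1/2) = 1/3. Increasing the penalty makes that violating combination less likely without forbidding it outright, which is the "soft" part of the constraint.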

Neural Information Processing Systems | 2010