The Impact of Disjunction on Reasoning under Existential Rules: Research Summary
Under consideration for publication in Theory and Practice of Logic Programming
MICHAEL MORAK
University of Oxford, Department of Computer Science, OX1 3QD, United Kingdom (e-mail: [email protected])
submitted 1 January 2003; revised 1 January 2003; accepted 1 January 2003
Abstract
Datalog± is a Datalog-based language family enhanced with existential quantification in rule heads, equalities and negative constraints. Query answering over databases with respect to a Datalog± theory is generally undecidable, but several syntactic restrictions have been proposed to remedy this fact. A useful and natural feature is, however, still missing from Datalog±: the ability to express uncertain knowledge, or choices, using disjunction. The precise objective of the doctoral thesis discussed herein is to investigate the impact on the complexity of query answering of adding disjunction to well-known decidable Datalog± fragments, namely guarded, sticky and weakly-acyclic Datalog± theories. For guarded theories with disjunction, we obtain a strong 2EXP lower bound in the combined complexity, even for very restricted formalisms like fixed sets of (disjunctive) inclusion dependencies. For sticky theories, the query answering problem becomes undecidable, even in the data complexity, and for weakly-acyclic query answering we see a reasonable and expected increase in complexity.
A full version of a paper accepted to be presented at the Doctoral Consortium of the 30th International Conference on Logic Programming (ICLP 2014), July 19-22, Vienna, Austria.
KEYWORDS: Ontological Reasoning, Query Answering, Existential Rules, Logic, TGDs
For the last thirty years, Datalog (see e.g., (Abiteboul et al. 1995)) has played an important role as a conceptual query language. Whilst not directly implemented in mainstream database management systems (DBMS), it did heavily influence the design of the SQL standard, which now also allows for recursive statements, as can be expressed in Datalog.
However, in recent years it has become increasingly important to add ontological reasoning capabilities to the existing object-relational querying capabilities of traditional DBMS: a query is no longer just evaluated over the extensional relational database, but also over an ontological theory that, using rules and constraints, describes how to derive new (intensional) knowledge from the extensional data. This behaviour can be expressed by extending Datalog in such a way that existential quantification, the first-order logic constant false and equalities between variables are permitted in the rule heads. Recently, the Datalog± family of languages has been proposed in (Calì et al. 2011), which defines sensible restrictions on the structure of such an ontological theory. These restrictions are necessary as, depending on the structure of the ontological theory, an infinite amount of intensional knowledge might be derivable, raising the question of decidability of this type of reasoning. Also, as new values can be invented along the way, the domain can become infinite.
Despite these obstacles, commercial service providers have already started to integrate ontological reasoning engines into their database management systems (see e.g., (Oracle Inc. 2011; Microsoft Corp. 2011)), as there are several applications where such capabilities are desirable, such as data exchange, ontological reasoning (e.g., reasoning under description logics, or in the semantic web) and web data extraction.
Problem Statement.
Given the fact that ontological reasoning is gaining mainstream acceptance and the fact that, as for example Answer Set Programming has proven, rule-based languages are well suited for knowledge representation and reasoning tasks, it is natural to ask how to enrich the languages we currently have with new, useful constructs. The construct that we want to focus on here is disjunction. Until now, Datalog± rules only allow us to express deterministic knowledge. But what about natural statements like “every person has a parent that is either male or female” or “every student is either an undergraduate or a graduate student”? Such statements are not captured by existing Datalog± languages. Disjunctive knowledge is an important feature in other logical languages like Answer Set Programming or Description Logics, where it allows users to intuitively formulate problems by, e.g., applying a guess-and-check approach. Enriching Datalog± with disjunction is therefore a logical next step.
The objective of my doctoral studies is thus to introduce the language feature of disjunction to Datalog±, and to investigate in depth the impact of doing so w.r.t. decidability and complexity of reasoning, focussing on conjunctive query answering in particular.
In the following subsections, we give a few basic preliminaries describing Datalog±, as well as an overview of the known results in the area.
In this section the basic notions of conjunctive query evaluation under tuple generating dependencies (TGDs) are recalled, including a review of the chase procedure, an important algorithmic tool in the evaluation of queries under TGDs. Furthermore, we briefly introduce the concept of stable models in the logic programming perspective. We assume that the reader is familiar with first-order logic as well as basic complexity theory. Good introductions to the former can be found in e.g. (Barwise 1977) and (Andrews 2002); for the latter we recommend (Papadimitriou 1994).
In order to define the semantics of conjunctive queries, we first need to introduce the relational data model. In the relational data model, the structure or schema $S$ of a database and its contents or instance $D$ are distinct objects.
A schema $S$ consists of a finite number of relation symbols (also called predicates) $r_i$, that is, $S = \{r_1, \ldots, r_n\}$. Such a relation symbol $r_i \in S$ (for any $i$) consists of a finite number of attributes, such that each attribute has a domain of possible values. We consider here only the case that all predicates have a common domain $\Gamma \cup \Gamma_N$, where $\Gamma$ is a set of constants and $\Gamma_N$ is a set of labelled nulls (i.e., placeholders for unknown values). The number of attributes of $r_i$ is called its arity, denoted $arity(r_i)$.
A relation $R_i$ for predicate $r_i$ is a set of tuples and each tuple is a mapping of each attribute in $r_i$ to $\Gamma \cup \Gamma_N$. Such a tuple of $R_i$ is denoted by $r_i(x_1, \ldots, x_k)$ (also referred to as an atom), where $k = arity(r_i)$.
An instance $I$ for a schema $S$ consists of relations $R_i$ for each $r_i \in S$, that is, $I = \{R_1, \ldots, R_n\}$. An instance in which no null values from $\Gamma_N$ appear is referred to as a database, usually denoted $D$. Note that, when viewed as a first-order theory, we may simply interpret an instance as a conjunction of atoms.
A conjunctive query $q$ over a database schema $S$ is an assertion of the form $q(\mathbf{X}) \leftarrow \exists \mathbf{Y}\, \varphi(\mathbf{X}, \mathbf{Y})$, where $\mathbf{X}$ and $\mathbf{Y}$ are vectors of (first-order logic) variables, $q(\mathbf{X})$ is called the head, $dimension(\mathbf{X})$ is called the arity of $q$ and $\varphi(\mathbf{X}, \mathbf{Y})$ is called the body. Here $\varphi(\mathbf{X}, \mathbf{Y})$ is a first-order formula consisting of a conjunction of atoms of the form $r_i(t_1, \ldots, t_k)$ and equalities of the form $t_1 = t_2$, where $r_i$ is a predicate of $S$ with arity $k$ and each $t_i$ is either a constant from $\Gamma$ or a (first-order logic) variable. If the arity is 0 then $q$ is called a boolean conjunctive query.
With every database $D = \{R_1, \ldots, R_n\}$ over a schema $S$, we can now associate a finite first-order structure $M_D = (U, R_1, \ldots, R_n)$ with universe $U = \Gamma$. The evaluation of a conjunctive query $q$ then comes down to checking satisfiability in first-order logic as follows: $q$ has an answer over $D$, denoted $D \models q$, if and only if the set $\{\langle a_1, \ldots, a_k \rangle \mid M_D \models q(a_1, \ldots, a_k)\}$ is non-empty, with $a_i \in \Gamma$. This set is also called the set of answers to $q$ over $D$, where $k$ is the arity of $q$.
For reasoning tasks over databases, the need arises to express how new (intensional) knowledge can be derived from the data that is stored in the database (called the extensional data). An established way to do this is to introduce a set $\Sigma$ of rules that describe the relation between intensional and extensional data. In this case, for a database $D$, the logical theory $D \cup \Sigma$, i.e., the conjunction of the facts in the database with all the rules in $\Sigma$, is taken as a basis for conjunctive query evaluation.
Rules in $\Sigma$ over a schema $S$ are of one of the following two forms:
$\forall \mathbf{X}\, (\varphi(\mathbf{X}) \rightarrow \exists \mathbf{Y}\, \psi(\mathbf{X}, \mathbf{Y}))$ (1)
$\forall \mathbf{X}\, (\varphi(\mathbf{X}) \rightarrow X_i = X_j)$ (2)
where rules of the form (1) are referred to as tuple generating dependencies (TGDs) and rules of the form (2) as equality generating dependencies (EGDs), with $\varphi$ and $\psi$ being conjunctions of predicates from $S$ (also called atoms), and $X_i$ and $X_j$ being the $i$-th and $j$-th position in vector $\mathbf{X}$. $\varphi$ is also referred to as the body of the dependency and $\psi$ or $X_i = X_j$ as the head. TGDs where $\psi = \bot$ are called negative constraints. For brevity, we will omit the universal quantifiers in front of TGDs and EGDs, and replace conjunctions in the body by commas.
An instance $I$ is said to satisfy a dependency $\sigma \in \Sigma$, written $I \models \sigma$, if the first-order sentence formed by the conjunction of the facts in $I$ and $\sigma$ is satisfiable.
By extension, $I$ satisfies $\Sigma$ ($I \models \Sigma$) iff it satisfies every $\sigma \in \Sigma$.
The models of a database $D$ over a schema $S$ with respect to $\Sigma$, denoted $Mod(D, \Sigma)$, are all instances $M$ that satisfy $D \cup \Sigma$ (i.e., $M \supseteq D$ and $M \models \Sigma$). When answering conjunctive queries we use the certain answer semantics, i.e., we consider the query to be true if and only if it is true under every model. The set of answers for a conjunctive query $q$, denoted $ans(q, D, \Sigma)$, thus equals the set
$\{\langle a_1, \ldots, a_k \rangle \mid \forall M \in Mod(D, \Sigma) : M \models q(a_1, \ldots, a_k)\}$
For complexity analysis we focus on the decision version of this problem. This is the central problem when analyzing the complexity of databases with tuple and equality generating dependencies, and therefore of Datalog±. Below it is formulated for boolean conjunctive queries, which we will focus on in this work:
BCQ-ANSWERING
Instance: $\langle q, D, \Sigma \rangle$, where $q$ is a boolean conjunctive query, $D$ a database and $\Sigma$ a set of dependencies.
Question: $D \cup \Sigma \models q$?
Usually, when dealing with query evaluation over databases, the data complexity and the combined complexity are of interest. In this paper we follow the approach of (Vardi 1982), where for the former everything except the database $D$ is considered fixed, i.e., the only input is the database. For the latter, the database $D$, $\Sigma$ and the query itself form the input.
Unfortunately, in general BCQ-ANSWERING is undecidable under unrestricted sets of TGDs, as has been shown in (Beeri and Vardi 1981). In (Calì et al. 2013; Baget et al. 2009; Baget et al. 2011) it has further been shown that even singleton sets of TGDs cause query answering to become undecidable.
These results clearly show that restrictions must be placed on the structure of $\Sigma$ to ensure decidability. This is a non-trivial problem, as simple restrictions, like limiting the number of TGDs, are not enough.
One of the fundamental tools to algorithmically check implication of dependencies is the chase procedure, introduced in (Maier et al. 1979), which was later adapted for checking query containment in (Johnson and Klug 1984), in the setting of databases with tuple and equality generating dependencies, or, more specifically, in the setting of databases with inclusion and functional dependencies. The chase algorithm tries to extend a given database instance in such a way that every TGD and EGD becomes satisfied. This is done by exhaustively (i.e., until a fixpoint is reached) applying the chase step:
Definition 2.1
Let $D$ be a database and $\Sigma$ be a set of dependencies. A chase step is defined as follows:
TGDs. Let $\Sigma$ contain a TGD $\varphi(\mathbf{X}) \rightarrow \exists \mathbf{Y}\, \psi(\mathbf{X}, \mathbf{Y})$, such that
• $D \models \varphi(\mathbf{a})$ for some assignment $\mathbf{a}$ to $\mathbf{X}$, and
• $D \not\models \exists \mathbf{Y}\, \psi(\mathbf{a}, \mathbf{Y})$.
Then extend $D$ with facts $\psi(\mathbf{a}, \mathbf{y})$, where the elements of $\mathbf{y}$ are fresh labelled nulls (i.e., values from $\Gamma_N$ that have not been in use in $D$ up to that point).
EGDs. Let $\Sigma$ contain an EGD $\varphi(\mathbf{X}) \rightarrow X_i = X_j$, such that
• $D \models \varphi(\mathbf{a})$ for some assignment $\mathbf{a}$ to $\mathbf{X}$, and
• $a_i \neq a_j$.
If $a_i$ is a labelled null, then replace every occurrence of $a_i$ with $a_j$, or vice versa if $a_j$ is a labelled null. If $a_i$ and $a_j$ are distinct constants, end the chase with failure.
Definition 2.2
The chase expansion of a database instance $D$ with respect to a set of dependencies $\Sigma$ is a sequence $D_0, D_1, \ldots, D_m$, such that $D_0 = D$ and, for $i \geq 0$, $D_{i+1}$ is obtained from $D_i$ by applying a chase step. After exhaustively applying such chase steps, we obtain $D_m$, also denoted $chase(D, \Sigma)$.
The chase can have three different outcomes: failure, non-terminating success or terminating success. In case of success the resulting instance $D_m$ satisfies all dependencies in $\Sigma$. Note that if the chase does not terminate, $m = \infty$ and the size of $D_m$ is infinite.
We assume that the chase is fair, i.e., we exclude the possibility of a degenerate chase expansion by assuming that the chase expansion is constructed level by level, and that after each application of a TGD, all applicable EGDs are applied. This ensures that every TGD that can be applied is applied, and therefore we exclude the case that only a single infinite path in the chase expansion is ever expanded when the chase is infinite.
Query Answering and the Chase
In case the chase succeeds, it computes a universal solution for $\langle D, \Sigma \rangle$. Every model $M \in Mod(D, \Sigma)$ can then be obtained by appropriate instantiation of labelled nulls in $chase(D, \Sigma)$ (i.e., for every model $M$, there exists a homomorphism mapping the universal solution to $M$; cf. (Deutsch et al. 2008)). Using this property, the chase expansion of a database $D$, with respect to a set of dependencies $\Sigma$, can be used for answering conjunctive queries, as the following theorem shows:
Theorem 2.3 ((Deutsch et al. 2008))
Given a boolean conjunctive query $q$ over a schema $S$, a database $D$ of $S$ and a set of dependencies $\Sigma$ over $S$, then in cases where the chase does not fail, it holds that $D \cup \Sigma \models q$ if and only if $chase(D, \Sigma) \models q$.
In case the chase fails, query answering is trivial: as there is no model, every boolean conjunctive query clearly is entailed by $D \cup \Sigma$ (cf. the definition of certain answers in section 2.1.2).
In this section we discuss the different kinds of restrictions known to ensure decidability of query answering under sets of TGDs. The decidable classes of TGDs discussed below are defined by syntactic properties that either apply to single TGDs (local syntactic conditions) or to the set of all TGDs (global syntactic conditions). These properties can be checked in finite time using appropriate algorithms. Each subsection deals with a known syntactic condition that ensures decidability of query answering.
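As a concrete illustration of the TGD chase step of Definition 2.1 and of Theorem 2.3, the following is a minimal Python sketch, not an implementation taken from the cited works. It rests on several assumptions made purely for illustration: atoms are `(predicate, args)` tuples, strings starting with an uppercase letter are variables, fresh labelled nulls are strings of the form `_n1`, `_n2`, ..., EGDs are omitted, and the hypothetical parameter `max_steps` guards against non-terminating chases. All function names are hypothetical.

```python
from itertools import count

def is_var(term):
    # Assumption: variables are strings starting with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()

def match(atom, fact, h):
    """Extend substitution h so that atom maps onto fact, or return None."""
    (pred, args), (fpred, fargs) = atom, fact
    if pred != fpred or len(args) != len(fargs):
        return None
    g = dict(h)
    for a, f in zip(args, fargs):
        if is_var(a):
            if g.setdefault(a, f) != f:
                return None
        elif a != f:
            return None
    return g

def homomorphisms(atoms, facts, h=None):
    """Yield every extension of h mapping all atoms into the set of facts."""
    h = {} if h is None else h
    if not atoms:
        yield h
        return
    for fact in facts:
        g = match(atoms[0], fact, h)
        if g is not None:
            yield from homomorphisms(atoms[1:], facts, g)

def find_trigger(facts, tgds):
    """Return (head, h) for some TGD whose body holds but whose head does not."""
    for body, head in tgds:
        for h in homomorphisms(body, facts):
            if next(homomorphisms(head, facts, h), None) is None:
                return head, h
    return None

def chase(database, tgds, max_steps=1000):
    facts = set(database)
    fresh = count(1)
    for _ in range(max_steps):
        trigger = find_trigger(facts, tgds)
        if trigger is None:
            return facts  # fixpoint reached: every TGD is satisfied
        head, h = trigger
        h = dict(h)
        for _, args in head:
            for a in args:
                if is_var(a) and a not in h:
                    h[a] = f"_n{next(fresh)}"  # fresh labelled null
        facts |= {(p, tuple(h.get(a, a) for a in args)) for p, args in head}
    raise RuntimeError("step budget exhausted; the chase may be infinite")

def entails(database, tgds, query):
    """Boolean conjunctive query answering via the chase (cf. Theorem 2.3)."""
    return next(homomorphisms(query, chase(database, tgds)), None) is not None

db = {("student", ("alice", "cs")), ("firstsemester", ("alice",))}
rules = [([("student", ("X", "Y")), ("firstsemester", ("X",))],
          [("tutor", ("X", "Z"))])]
print(entails(db, rules, [("tutor", ("alice", "T"))]))
```

The sketch picks triggers in an arbitrary order rather than level by level, so it only approximates the fairness assumption above; for terminating chases this makes no difference to the result of query answering.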
Inclusion Dependencies
Inclusion dependencies (IDs), one of the simplest forms of dependencies, allow one to express that certain values occurring in a specific position in one relation must also occur at (or be included in) a specific position in another relation. This corresponds to TGDs that consist of exactly one body atom and one head atom, in which no variable occurs twice in the head or in the body. The following is an example of an inclusion dependency, expressing that every student is a person:
$student(X, Y) \rightarrow \exists Z\, person(X, Z)$
The query answering problem was shown to be decidable, and in fact in $\mathrm{AC}^0$ (resp. PSPACE) in the data (resp. combined) complexity.
Linear Tuple Generating Dependencies
This class is similar to IDs in that it allows for TGDs with only a single body atom, but generalizes them, because it allows repetition of variables in the body or head (e.g., the TGD $r(X, Y, X) \rightarrow s(X, Y)$ is a linear TGD but not an ID).
Sets of linear TGDs enjoy the so-called bounded derivation-depth property (BDDP), which roughly implies that only a finite initial part of the chase is required for query answering, thus ensuring decidability. As with inclusion dependencies, first-order rewritability (i.e., rewriting $q$ and $\Sigma$ into a first-order query $q_\Sigma$, such that $D \cup \Sigma \models q$ iff $D \models q_\Sigma$) is thus possible (cf. (Calì et al. 2009; Calì et al. 2010)). Therefore we get decidability, and query answering is in $\mathrm{AC}^0$ in the data complexity. Regarding combined complexity, results from inclusion dependencies carry over to linear TGDs, resulting in PSPACE-completeness for query answering in the general case and NP-completeness in case of a fixed set of TGDs.
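The two local conditions described so far are purely syntactic and easy to test. The following Python sketch checks them under an illustrative encoding that is an assumption of this example, not part of the original formalism: a TGD is a pair `(body, head)` of lists of `(predicate, args)` atoms, and variables are strings starting with an uppercase letter. The function names are hypothetical.

```python
def is_var(term):
    # Assumption: variables are strings starting with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()

def is_linear(tgd):
    """A linear TGD has exactly one body atom."""
    body, _ = tgd
    return len(body) == 1

def is_inclusion_dependency(tgd):
    """An ID has one body atom and one head atom, and no variable is
    repeated within the body atom or within the head atom."""
    body, head = tgd
    if len(body) != 1 or len(head) != 1:
        return False
    for _, args in (body[0], head[0]):
        variables = [a for a in args if is_var(a)]
        if len(set(variables)) != len(variables):
            return False
    return True

student_is_person = ([("student", ("X", "Y"))], [("person", ("X", "Z"))])
repeated_variable = ([("r", ("X", "Y", "X"))], [("s", ("X", "Y"))])
print(is_inclusion_dependency(student_is_person), is_linear(repeated_variable))
```

On the two examples from the text, the first TGD is recognised as an ID, while the second is linear but not an ID because `X` is repeated in its body atom.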
Guarded Tuple Generating Dependencies
In (Calì et al. 2013), linear TGDs are extended to so-called guarded TGDs, which have a body atom that contains all variables occurring in the body, i.e., all universally quantified variables. This atom is called the guard. If there are multiple such atoms, the leftmost is taken as the guard. An example of a guarded TGD, stating that students in their first semester have a tutor, is as follows. Note that it is not linear as it has multiple atoms in the body.
$student(X, Y), firstsemester(X) \rightarrow \exists Z\, tutor(X, Z)$
Linear TGDs and inclusion dependencies are trivially guarded, as they have exactly one body atom. However, guarded TGDs are not first-order rewritable. This is shown by creating a database, a query and a set of guarded TGDs in such a way that answering the query requires the computation of the transitive closure over a relation in the database. It is well known that this property cannot be expressed in a finite first-order query, so decidability cannot be obtained in this way. However, it can be shown that the universal model constructed by the chase, albeit possibly infinite, is of finite treewidth (i.e., it is tree-like and cannot be arbitrarily cyclic). From Courcelle’s famous Theorem (cf. (Courcelle 1990)), which states that evaluating first-order sentences over structures of finite treewidth is decidable, we derive decidability for query answering under sets of guarded TGDs.
The complexity of query answering under guarded TGDs was investigated in (Calì et al. 2009), where it was established that, whenever a query is actually entailed by a database and a set of guarded TGDs, all atoms needed to answer the query are derived in a finite, initial portion of the chase when restricted only to guards and atoms derived from them, whereby the size of this portion depends only on the query and the set of TGDs. Therefore, constructing this part of the chase and evaluating a boolean conjunctive query over it is enough to compute the answer. It is shown that this can be done in polynomial time in the data complexity, whereby P-membership follows. Hardness for P was shown in (Dantsin et al. 2001) by reduction to the implication problem for propositional logic programs. Regarding the combined complexity, query answering is 2EXP-complete in the general case and EXP-complete in case of fixed arity. Membership in NP was also shown for the case where the set of TGDs is fixed. NP-hardness follows from results in (Chandra and Merlin 1977), which show that NP-hardness holds even for the empty set of TGDs.
Weakly-Guarded Sets of TGDs
In (Calì et al. 2013), guarded TGDs were extended to weakly-guarded sets of TGDs. Every TGD in such sets must have an atom in its body that contains all the variables where a null value may appear during the chase. The leftmost such atom is called the weak-guard. This class is the first class discussed here that is based on a global property. It is easy to see that, as guarded TGDs contain a body atom with all universally quantified variables, they are trivially weakly-guarded, as the guard is also a weak-guard.
It is implicit in (Calì et al. 2013) that it can be verified in polynomial time whether a set of TGDs is weakly-guarded or not: for a schema $S$ we first need to compute all the positions for each predicate where a null value can occur during the chase with respect to a set of TGDs $\Sigma$. These positions are called affected, and computing them has been shown to be possible in polynomial time. Then we have to check for each TGD in $\Sigma$ whether it contains a weak-guard, which, knowing the affected positions, is also possible in polynomial time.
It is then shown that weakly-guarded sets of TGDs enjoy the same favorable property as guarded TGDs, namely, the chase has finite treewidth. Given this fact, decidability of query answering is established as before. Regarding the complexity, in general the problem is 2EXP-complete, EXP-complete if the arity is fixed or the set of TGDs is fixed, and it remains EXP-complete even if only the database is considered as input (data complexity).
Weakly-Acyclic Sets of Tuple Generating Dependencies
The notion of weak acyclicity was established in the landmark paper (Fagin et al. 2005) as a syntactic condition to guarantee termination of the chase procedure. For this we first need to define the notion of a dependency graph.
A dependency graph $G = (V, E)$ is constructed as follows: $V$ is the set of attributes of all the relations occurring in $\Sigma$. We will denote the $i$-th attribute of some relation $r$ by $r[i]$. For each TGD $\sigma = \varphi(\mathbf{X}) \rightarrow \exists \mathbf{Y}\, \psi(\mathbf{X}, \mathbf{Y})$ and each variable $X \in \mathbf{X}$ shared between the relation attributes $r[i]$ in $\varphi$ and $s[j]$ in $\psi$, we add an edge $(r[i], s[j])$ to $E$. We add a special edge $(r[i], p[k])$ to $E$ for each attribute $p[k]$ in $\psi$ occupied by a variable $Y \in \mathbf{Y}$, and each attribute $r[i]$ occurring in the body of $\sigma$.
A set $\Sigma$ of TGDs is called weakly-acyclic if its dependency graph contains no cycles through special edges. Weak acyclicity is a global property and can be decided in P, as the construction of the dependency graph and the check for a cycle through a special edge are both feasible in P.
In (Fagin et al. 2005) it was shown that for weakly-acyclic sets of TGDs the chase always terminates. This is ensured by the fact that when cycles through special edges in the dependency graph are forbidden, no new null values can be added in a later chase step because of a null value added in an earlier chase step. Therefore we trivially get decidability: simply compute the (finite) chase, and then answer the query on the obtained finite model.
Regarding complexity (cf. (Calì et al. 2013; Kolaitis et al. 2006)), in general the problem of BCQ-ANSWERING is 2EXP-complete for weakly-acyclic sets of TGDs. When the set of TGDs is fixed, the BCQ-ANSWERING problem is known to be NP-complete. P-completeness holds for the data complexity, following from the complexity of the fact inference problem for fixed Datalog programs (see (Dantsin et al. 2001)).
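The dependency graph construction and the test for cycles through special edges can be sketched in Python as follows. This is an illustrative sketch under assumptions that are not part of the original text: TGDs are `(body, head)` pairs of lists of `(predicate, args)` atoms, variables are strings starting with an uppercase letter, and positions are zero-based pairs `(predicate, index)`, so `r[1]` in the text corresponds to `("r", 0)`. Following the wording above, a special edge is added from every body position occupied by a variable. All names are hypothetical.

```python
from collections import defaultdict

def is_var(term):
    # Assumption: variables are strings starting with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()

def dependency_graph(tgds):
    """Return the sets of regular and special edges between positions."""
    regular, special = set(), set()
    for body, head in tgds:
        body_pos = defaultdict(set)  # variable -> body positions it occupies
        for pred, args in body:
            for i, a in enumerate(args):
                if is_var(a):
                    body_pos[a].add((pred, i))
        all_body_positions = set().union(*body_pos.values())
        for pred, args in head:
            for j, a in enumerate(args):
                if not is_var(a):
                    continue
                if a in body_pos:   # universally quantified variable
                    regular |= {(src, (pred, j)) for src in body_pos[a]}
                else:               # existentially quantified variable
                    special |= {(src, (pred, j)) for src in all_body_positions}
    return regular, special

def is_weakly_acyclic(tgds):
    regular, special = dependency_graph(tgds)
    successors = defaultdict(set)
    for u, v in regular | special:
        successors[u].add(v)

    def reaches(start, goal):
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(successors[node])
        return False

    # A special edge (u, v) lies on a cycle iff u is reachable from v.
    return not any(reaches(v, u) for u, v in special)

terminating = [([("student", ("X", "Y"))], [("person", ("X", "Z"))])]
looping = [([("person", ("X",))], [("parent", ("X", "Y"))]),
           ([("parent", ("X", "Y"))], [("person", ("Y",))])]
print(is_weakly_acyclic(terminating), is_weakly_acyclic(looping))
```

The second rule set (every person has a parent, and every parent is a person) has a cycle through a special edge, which matches the intuition that its chase keeps inventing new nulls.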
Sticky Sets of Tuple Generating Dependencies
A recent addition to the set of syntactic conditions that ensure decidability and favourable complexity of conjunctive query evaluation is the paradigm of stickiness, introduced in (Calì et al. 2012). A survey of sticky classes can be found in (Calì et al. 2010). The class of sticky sets of TGDs is defined as follows: in a first step, a variable marking of all TGDs in a set $\Sigma$ is computed by a procedure called SMarking. This is a two-step procedure:
1. Initial marking: For each $\sigma \in \Sigma$, if there exists a variable $V$ in the body of $\sigma$ and an atom without this variable exists in the head of $\sigma$, mark each occurrence of $V$ in the body.
2. Propagation step: Until a fixpoint is reached, consider any pair $\langle \sigma, \sigma' \rangle \in \Sigma \times \Sigma$. If a universally quantified variable $V$ occurs in $head(\sigma)$ at positions $\pi_1, \ldots, \pi_m$ for $m \geq 1$, and an atom in $body(\sigma')$ exists where at each of these same positions a marked variable occurs, then mark each occurrence of $V$ in $body(\sigma)$.
Definition 2.4 ((Calì et al. 2012))
A set $\Sigma$ of TGDs is called sticky if and only if there is no TGD in $SMarking(\Sigma)$ such that a marked variable occurs in its body more than once.
The property of stickiness is incomparable to guardedness and weak acyclicity but strictly generalizes inclusion dependencies. In comparison to other discussed syntactic classes of TGDs, sticky sets of TGDs allow for a mildly restricted way to express joins. The following is an example of a sticky (singleton) set of TGDs, expressing the join between two tables, department and employee, to get a combined table of departments and their heads:
$department(X, Y), employee(Y, Z) \rightarrow headofdept(X, Y, Z)$
Note that the above TGD is not weakly-guarded.
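The marking procedure and the stickiness test of Definition 2.4 can be sketched in Python as follows. The sketch makes several assumptions for illustration only: TGDs are `(body, head)` pairs of lists of `(predicate, args)` atoms, variables are strings starting with an uppercase letter, a marking is stored as one set of marked variables per TGD (marking a variable marks all of its body occurrences), and for heads with several atoms the propagation step is applied to each head atom separately. The function names are hypothetical.

```python
def is_var(term):
    # Assumption: variables are strings starting with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()

def body_variables(body):
    return {a for _, args in body for a in args if is_var(a)}

def smarking(tgds):
    """Return, for each TGD, the set of marked body variables."""
    marked = [set() for _ in tgds]
    # Initial marking: V is in the body but missing from some head atom.
    for k, (body, head) in enumerate(tgds):
        for v in body_variables(body):
            if any(v not in args for _, args in head):
                marked[k].add(v)
    # Propagation step, repeated until a fixpoint is reached.
    changed = True
    while changed:
        changed = False
        for k, (body, head) in enumerate(tgds):
            for hpred, hargs in head:
                for v in body_variables(body) - marked[k]:
                    idxs = [i for i, a in enumerate(hargs) if a == v]
                    if not idxs:
                        continue
                    for k2, (body2, _) in enumerate(tgds):
                        if any(bpred == hpred and len(bargs) == len(hargs)
                               and all(bargs[i] in marked[k2] for i in idxs)
                               for bpred, bargs in body2):
                            marked[k].add(v)
                            changed = True
                            break
    return marked

def is_sticky(tgds):
    """Sticky iff no marked variable occurs more than once in a body."""
    for (body, _), m in zip(tgds, smarking(tgds)):
        occurrences = [a for _, args in body for a in args]
        if any(occurrences.count(v) > 1 for v in m):
            return False
    return True

join_rule = [([("department", ("X", "Y")), ("employee", ("Y", "Z"))],
              [("headofdept", ("X", "Y", "Z"))])]
transitivity = [([("r", ("X", "Y")), ("r", ("Y", "Z"))], [("r", ("X", "Z"))])]
print(is_sticky(join_rule), is_sticky(transitivity))
```

For the join example from the text no variable is marked, since `X`, `Y` and `Z` all appear in the head. In the transitivity rule, by contrast, the join variable `Y` is dropped from the head, gets marked, and occurs twice in the body, so the set is not sticky.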
The goal, as already discussed in the introduction, is to introduce disjunction into Datalog± and investigate the impact of doing so on the decidability and complexity of query answering. We thus extend the definition of a TGD to allow for disjunction as follows:
A disjunctive tuple-generating dependency (DTGD) $\sigma$ is a first-order formula $\forall \mathbf{X}\, \varphi(\mathbf{X}) \rightarrow \bigvee_{i=1}^{n} \exists \mathbf{Y}\, \psi_i(\mathbf{X}, \mathbf{Y})$, where $n \geqslant 1$, $\mathbf{X} \cup \mathbf{Y} \subset \Gamma_V$, and $\varphi, \psi_1, \ldots, \psi_n$ are conjunctions of atoms; $\varphi$ is the body of $\sigma$, denoted $body(\sigma)$, while $\bigvee_{i=1}^{n} \psi_i$ is the head of $\sigma$, denoted $head(\sigma)$. If $n = 1$, then $\sigma$ is a tuple-generating dependency (TGD). Given a set $\Sigma$ of DTGDs, $schema(\Sigma)$ is the set of predicates occurring in $\Sigma$.
We employ the disjunctive chase introduced in (Deutsch and Tannen 2003) in order to answer queries. It is an extension of the chase procedure described in Section 2.1. Consider an instance $I$, and a DTGD $\sigma = \varphi(\mathbf{X}) \rightarrow \bigvee_{i=1}^{n} \exists \mathbf{Y}\, \psi_i(\mathbf{X}, \mathbf{Y})$. We say that $\sigma$ is applicable to $I$ if there exists a homomorphism $h$ (i.e., a substitution of labelled nulls to either constants or other labelled nulls) such that $h(\varphi(\mathbf{X})) \subseteq I$, but there is no $i \in \{1, \ldots, n\}$ and homomorphism $h' \supseteq h$ such that $h'(\psi_i(\mathbf{X}, \mathbf{Y})) \subseteq I$. The result of applying $\sigma$ to $I$ with $h$ is the set $\{I_1, \ldots, I_n\}$, where $I_i = I \cup h'(\psi_i(\mathbf{X}, \mathbf{Y}))$, for each $i \in \{1, \ldots, n\}$, and $h' \supseteq h$ is such that $h'(Y)$ is a “fresh” labelled null not occurring in $I$, for each $Y \in \mathbf{Y}$. For such an application of a DTGD, which defines a single DTGD chase step, we write $I \langle \sigma, h \rangle \{I_1, \ldots, I_n\}$.
A disjunctive chase tree of a database $D$ and a set $\Sigma$ of DTGDs is a (possibly infinite) tree such that the root is $D$, and for every node $I$, assuming that $\{I_1, \ldots, I_n\}$ are the children of $I$, there exists $\sigma \in \Sigma$ and a homomorphism $h$ such that $I \langle \sigma, h \rangle \{I_1, \ldots, I_n\}$. The disjunctive chase algorithm for $D$ and $\Sigma$ consists of an exhaustive application of DTGD chase steps in a fair fashion, which leads to a disjunctive chase tree $T$ of $D$ and $\Sigma$; we denote by $dchase(D, \Sigma)$ the set $\{I \mid I \text{ is a leaf of } T\}$. Note that each leaf of $T$ is well-defined as the least fixpoint of a monotonic operator. By construction, each instance of $dchase(D, \Sigma)$ is a model of $D$ and $\Sigma$. Interestingly, $dchase(D, \Sigma)$ is a universal set model of $D$ and $\Sigma$, i.e., for each $M \in Mod(D, \Sigma)$, there exists $I \in dchase(D, \Sigma)$ and a homomorphism $h_I$ such that $h_I(I) \subseteq M$ (Deutsch et al. 2008). This implies that w.r.t. certain answers, given a query $q$, $D \cup \Sigma \models q$ iff $I \models q$ for each $I \in dchase(D, \Sigma)$.
Current Status.
Currently we have investigated and obtained results for all the decidable classes of TGDs. For the guarded-based classes, adding disjunction does not make the problem of query answering undecidable. However, it does in certain cases increase the complexity of the problem by a significant amount. For the guarded-based classes of TGDs (i.e., IDs, linear, guarded and weakly-guarded), we have established all relevant complexity results when extending them to DTGDs.
In case of sticky TGDs, when adding disjunction the problem of query answering becomes undecidable. This was a very surprising result, given the fact that the complexity of query answering under sticky sets of TGDs is lower than under guarded TGDs.
In case of weakly-acyclic TGDs, data complexity results have been obtained, as well as certain lower bounds in the combined complexity; however, a matching upper bound is still missing here. Decidability is assured in any case, because the disjunctive chase terminates, which follows from the definition of weak acyclicity.
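To illustrate the disjunctive chase and the certain-answer criterion ($D \cup \Sigma \models q$ iff every instance in $dchase(D, \Sigma)$ satisfies $q$), the following Python sketch enumerates the leaves of a finite disjunctive chase tree. It is a simplified illustration under assumptions that are not part of the original text: a DTGD is a pair `(body, heads)` where `heads` is a list of alternative conjunctions, atoms are `(predicate, args)` tuples, variables start with an uppercase letter, fresh nulls are strings `_n1`, `_n2`, ..., and the hypothetical `max_steps` budget aborts chases that do not terminate. All names are hypothetical.

```python
from itertools import count

def is_var(term):
    # Assumption: variables are strings starting with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()

def homomorphisms(atoms, facts, h=None):
    """Yield every extension of h mapping all atoms into the set of facts."""
    h = {} if h is None else h
    if not atoms:
        yield h
        return
    pred, args = atoms[0]
    for fpred, fargs in facts:
        if fpred != pred or len(fargs) != len(args):
            continue
        g = dict(h)
        if all((g.setdefault(a, f) == f) if is_var(a) else (a == f)
               for a, f in zip(args, fargs)):
            yield from homomorphisms(atoms[1:], facts, g)

def find_trigger(inst, dtgds):
    """Return (heads, h) if some DTGD is applicable to inst, else None."""
    for body, heads in dtgds:
        for h in homomorphisms(body, inst):
            if not any(next(homomorphisms(head, inst, h), None) is not None
                       for head in heads):
                return heads, h
    return None

def dchase(database, dtgds, max_steps=1000):
    """Return the leaves of a (finite) disjunctive chase tree."""
    leaves, open_nodes, fresh = [], [frozenset(database)], count(1)
    for _ in range(max_steps):
        if not open_nodes:
            return leaves
        inst = open_nodes.pop()
        trigger = find_trigger(inst, dtgds)
        if trigger is None:
            leaves.append(inst)  # no DTGD applicable: inst is a leaf
            continue
        heads, h = trigger
        for head in heads:       # one child per disjunct
            g = dict(h)
            for _, args in head:
                for a in args:
                    if is_var(a) and a not in g:
                        g[a] = f"_n{next(fresh)}"  # fresh labelled null
            open_nodes.append(
                inst | {(p, tuple(g.get(a, a) for a in args)) for p, args in head})
    raise RuntimeError("step budget exhausted; the chase tree may be infinite")

def certain(database, dtgds, query):
    """A boolean query is certain iff it holds in every leaf."""
    return all(next(homomorphisms(query, leaf), None) is not None
               for leaf in dchase(database, dtgds))

db = {("student", ("alice",))}
rules = [([("student", ("X",))], [[("undergrad", ("X",))], [("grad", ("X",))]]),
         ([("undergrad", ("X",))], [[("enrolled", ("X",))]]),
         ([("grad", ("X",))], [[("enrolled", ("X",))]])]
print(certain(db, rules, [("enrolled", ("alice",))]),
      certain(db, rules, [("undergrad", ("alice",))]))
```

In this example, which encodes "every student is either an undergraduate or a graduate student", the chase tree has two leaves. The query `enrolled(alice)` holds in both and is therefore certain, whereas `undergrad(alice)` holds in only one leaf and is not.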
One classical work on disjunction in ontologies is (Calvanese et al. 2006), which immediately gives us coNP-hardness for conjunctive query answering over disjunctive ontologies, even if the query is fixed, and the ontology consists of a fixed, single rule of the form $a(X) \rightarrow b(X) \vee c(X)$. Without restricting the query language, there is thus no hope to get tractability results. However, for atomic queries, where the query consists only of a single atom, there are tractable data complexity cases to be found.
Arbitrary queries.
In (Bourhis et al. 2013), we have investigated the complexity picture for answering arbitrary queries. The main results are as follows:
• 2EXP-completeness whenever the query is not fixed. This is shown by simulating a Büchi tree automaton, and it even holds for fixed sets of Disjunctive Inclusion Dependencies (DIDs) of arity at most three, or for non-fixed sets of the same with arity at most two.
• coNP-completeness in the data complexity for query answering under DIDs up to guarded DTGDs.
• EXP-completeness in the data complexity for query answering under weakly-guarded sets of DTGDs.
In case of (non-disjunctive, classical) TGDs, complexity results coincide in the data complexity, but vary from coNP-completeness to 2EXP-completeness for fixed sets of IDs to weakly-guarded sets of TGDs. It is thus interesting to note that adding disjunction to expressive languages doesn’t change the complexity in this case, but there is a high cost to add it to less expressive languages.
Atomic queries.
In (Gottlob et al. 2012), we have investigated the complexity of answering single-atom queries. Here the complexity results vary considerably:
• 2EXP-completeness in the combined complexity for guarded DTGDs.
• EXP-completeness in the combined complexity for linear DTGDs.
• coNP-completeness in the data complexity for guarded DTGDs.
• Membership in $\mathrm{AC}^0$ in the data complexity for linear DTGDs.
In the case of atomic queries we do have a number of tractability results to offer, especially the highly parallelizable data complexity of $\mathrm{AC}^0$ in case of atomic query answering over sets of linear DTGDs (which captures the class of DIDs). For guarded DTGDs, most of the results follow directly from expressive fragments of first-order logic (the Guarded Fragment (Bárány et al. 2010; Grädel 1999), and Guarded-Negation First-Order Logic (Bárány et al. 2011; Bárány et al. 2012)). For linear DTGDs, we develop novel machinery to obtain our respective bounds.
In addition to the published results, we would like to find answers to the following questions: What is the complexity of query answering under sets of
1. guarded-based DTGDs in case where the query is acyclic or of bounded (hyper)treewidth?
2. weakly-acyclic sets of DTGDs?
3. sticky sets of DTGDs?
Regarding the first item, we have already managed to obtain all the relevant results. In fact, for bounded (hyper)treewidth, the complexity table coincides with that of arbitrary queries. For acyclic queries, there are drops in complexity corresponding to the expressivity of the language considered. Papers containing these results have been submitted to this year’s MFCS conference and DL workshop.
It is our plan to subsequently publish these results, in addition to some extended work on arbitrary and atomic queries, in a comprehensive journal paper treating all the guarded-based classes of DTGDs, in the course of 2014.
Regarding weakly-acyclic sets of DTGDs, we already have answers to the complexity questions for data complexity and the cases of fixed sets and fixed arities. However, we are still missing the combined complexity results. Before submission of my thesis, we plan to close these open complexity questions as well.
Lastly, for sticky DTGDs, we have an undecidability proof, which is somewhat surprising as query answering under sticky TGDs is easier in terms of complexity than it is for guarded TGDs, yet the addition of disjunction doesn’t cause a complexity increase in the latter. We have therefore focused on extending guarded DTGDs with cross-products (a form of join allowed in sticky TGDs). This again yields undecidability; however, the problem becomes decidable if restricted to arity at most two, where binary predicates can never participate in a disjunction. For this case we are working on obtaining the relevant complexity results.
References
Abiteboul, S., Hull, R., and Vianu, V. 1995. Foundations of Databases. Addison-Wesley.
Andrews, P. B. 2002. An Introduction to Mathematical Logic and Type Theory: To Truth Through Proof. Academic Press Professional, Inc., San Diego, CA, USA.
Baget, J.-F., Leclère, M., and Mugnier, M.-L. 2009. Walking the decidability line for rules with existential variables (long version). Tech. Rep. LIRMM RR-09030, Laboratoire d’Informatique et de Robotique et de Microélectronique de Montpellier, Université Montpellier, CNRS.
Baget, J.-F., Leclère, M., Mugnier, M.-L., and Salvat, E. 2011. On rules with existential variables: Walking the decidability line. Artif. Intell. 175.
Bárány, V., Gottlob, G., and Otto, M. 2010. Querying the guarded fragment. In Proc. of LICS. 1–10.
Bárány, V., ten Cate, B., and Otto, M. 2012. Queries with guarded negation. PVLDB 5, 11, 1328–1339.
Bárány, V., ten Cate, B., and Segoufin, L. 2011. Guarded negation. In Proc. of ICALP. 356–367.
Barwise, J. 1977. An introduction to first-order logic. In Handbook of Mathematical Logic, J. Barwise, Ed. North-Holland, Amsterdam, 5–46.
Beeri, C. and Vardi, M. Y. 1981. The implication problem for data dependencies. In Proc. ICALP. Lecture Notes in Computer Science, vol. 115. Springer, 73–85.
Bourhis, P., Morak, M., and Pieris, A. 2013. The impact of disjunction on query answering under guarded-based existential rules. In IJCAI, F. Rossi, Ed. IJCAI/AAAI.
Calì, A., Gottlob, G., and Kifer, M. 2013. Taming the infinite chase: Query answering under expressive relational constraints. J. Artif. Intell. Res. (JAIR) 48, 115–174.
Calì, A., Gottlob, G., and Lukasiewicz, T. 2009. A general datalog-based framework for tractable query answering over ontologies. In Proc. PODS. ACM, 77–86.
Calì, A., Gottlob, G., Lukasiewicz, T., Marnette, B., and Pieris, A. 2010. Datalog±: A family of logical knowledge representation and query languages for new applications. In Proc. LICS. IEEE Computer Society, 228–242.
Calì, A., Gottlob, G.,
AND P IERIS , A. 2010. Query rewriting under non-guarded rules. In
Proc. AMW .CEUR Workshop Proceedings, vol. 619. CEUR-WS.org.C AL ` I , A., G OTTLOB , G.,
AND P IERIS , A. 2011. New expressive languages for ontological query answer-ing. In
Proc. AAAI . AAAI Press.C AL ` I , A., G OTTLOB , G.,
AND P IERIS , A. 2012. Towards more expressive ontology languages: The queryanswering problem.
Artif. Intell. 193 , 87–128.C
ALVANESE , D., G
IACOMO , G. D., L
EMBO , D., L
ENZERINI , M.,
AND R OSATI , R. 2006. Data complex-ity of query answering in description logics. In
Proc. KR . AAAI Press, 260–270.C
HANDRA , A. K.
AND M ERLIN , P. M. 1977. Optimal implementation of conjunctive queries in relationaldata bases. In
Proc. STOC . ACM, 77–90.C
OURCELLE , B. 1990. The monadic second-order logic of graphs. i. recognizable sets of finite graphs.
Inf.Comput. 85,
1, 12–75.D
ANTSIN , E., E
ITER , T., G
OTTLOB , G.,
AND V ORONKOV , A. 2001. Complexity and expressive power oflogic programming.
ACM Comput. Surv. 33,
3, 374–425.D
EUTSCH , A., N
ASH , A.,
AND R EMMEL , J. B. 2008. The chase revisisted. In
Proc. of PODS . 149–158.D
EUTSCH , A.
AND T ANNEN , V. 2003. Reformulation of XML queries and constraints. In
Proc. of ICDT .225–241.F
AGIN , R., K
OLAITIS , P. G., M
ILLER , R. J.,
AND P OPA , L. 2005. Data exchange: Semantics and queryanswering.
Theor. Comput. Sci. 336,
1, 89–124.G
OTTLOB , G., M
ANNA , M., M
ORAK , M.,
AND P IERIS , A. 2012. On the complexity of ontologicalreasoning under disjunctive existential rules. In
MFCS , B. Rovan, V. Sassone, and P. Widmayer, Eds.Lecture Notes in Computer Science, vol. 7464. Springer, 1–18. M. Morak G R ¨ ADEL , E. 1999. On the restraining power of guards.
J. Symb. Log. 64,
4, 1719–1742.J
OHNSON , D. S.
AND K LUG , A. C. 1984. Testing containment of conjunctive queries under functionaland inclusion dependencies.
J. Comput. Syst. Sci. 28,
1, 167–189.K
OLAITIS , P. G., P
ANTTAJA , J.,
AND T AN , W. C. 2006. The complexity of data exchange. In Proc.PODS . ACM, 30–39.M
AIER , D., M
ENDELZON , A. O.,
AND S AGIV , Y. 1979. Testing implications of data dependencies.
ACMTrans. Database Syst. 4,
4, 455–469.M
ICROSOFT C ORP . 2011. The microsoft connected services framework.O
RACLE I NC . 2011. Oracle database semantic technologies.P APADIMITRIOU , C. M. 1994.
Computational Complexity . Addison-Wesley, Reading, Massachusetts.V
ARDI , M. Y. 1982. The complexity of relational query languages (extended abstract). In