Combining Existential Rules and Description Logics (Extended Version)

Abstract

Query answering under existential rules -- implications with existential quantifiers in the head -- is known to be decidable when imposing restrictions on the rule bodies such as frontier-guardedness [BLM10, BLMS11]. Query answering is also decidable for description logics [Baa03], which further allow disjunction and functionality constraints (asserting that certain relations are functions); however, they are focused on ER-type schemas, where relations have arity two. This work investigates how to get the best of both worlds: having decidable existential rules on arbitrary arity relations, while allowing rich description logics, including functionality constraints, on arity-two relations. We first show negative results on combining such decidable languages. Second, we introduce an expressive set of existential rules (frontier-one rules with a certain restriction) which can be combined with powerful constraints on arity-two relations (e.g. GC², ALCQIb) while retaining decidable query answering. Further, we provide conditions to add functionality constraints on the higher-arity relations.


Antoine Amarilli, Télécom ParisTech; Institut Mines–Télécom; CNRS [email protected] Benedikt, University of [email protected] 27, 2018


1. Introduction

Recent years have seen an explosion of techniques for solving the query answering problem: given a query q, a conjunction F of atoms, and a set of logical constraints Σ, determine whether q follows from F and Σ. In databases this is called querying under constraints or the certain answer problem, seeing F as an incomplete database, and Σ as restrictions on the possible completions. For researchers working on description logics, F is referred to as the A-box and Σ the T-box. In both communities q is usually a conjunctive query, an existential quantification of conjunctions of atoms, equivalent to a basic SQL SELECT. We will make this assumption throughout this work, referring for simplicity to the problem as just “query answering” (QA).

QA is undecidable when Σ ranges over arbitrary first-order logic constraints. This motivates the search for restricted constraint languages with decidable QA. Within the description logic community, powerful such languages were developed to express constraints on vocabularies of arity two. The unary relations are referred to as concepts while the binary ones are the roles. The languages can build new concepts and roles from basic ones via Boolean operations and (limited) quantification, and many of them, such as DL-Lite [CDGL+05] or ALCQIb [Tob01], may restrict the input roles R(x, y) to be functional: for all x there is at most one y such that R(x, y). Functionality constraints are crucial to faithfully model many real-world relationships: the relationship of a person to their birthdate, the relationship of an event to its starting time, etc. Hence, description logics are very powerful languages for arity-two vocabularies.

In parallel, the AI and database communities have developed rich constraint languages on arbitrary arity via existential rules or tuple-generating dependencies (TGDs). Existential rules are constraints of the form ∀x (φ(x) → ∃y ψ(x′, y)) where x′ ⊆ x and φ and ψ are conjunctions of atoms. They generalize the well-known inclusion dependencies or referential constraints in databases [AHV95], and can also express mapping relationships used in data exchange [FKMP05] and data integration [Len02]. Although QA over general rules is undecidable, important subclasses are decidable. First, decidability holds whenever the chase procedure [AHV95] is guaranteed to terminate, which is ensured by a number of conditions on the rules, e.g., weak acyclicity [FKMP05], joint acyclicity [KR11], or the very restricted class of source-to-target TGDs. See [GHK+13] for a survey and [BGMR14] for a recent study. A second class of tame constraints are those that admit bounded-treewidth models. There are several such classes, such as guarded TGDs [CGL12], frontier-guarded TGDs [BLM10], or the more general greedy bounded-treewidth sets [BMRT11]. However, many features of description logics, such as disjunction or functionality restrictions, cannot be expressed by existential rules.

Could we then enjoy the best of both worlds, by allowing both description logic constraints and existential rules, while maintaining the decidability of QA? This paper studies to what extent both paradigms can be combined, by looking for classes of constraints with decidable QA over relational schemas of arbitrary arity that can 1. express non-trivial existential rules over any relation in the schema and 2. assert expressive constraints (e.g., in ALCQIb) on the arity-two subschema, i.e., the subset of the relations of arity one and two within the schema.

Our first results (Section 3) are negative: we show that arity-two languages featuring functionality constraints on the arity-two subschema may lead to undecidable QA when combined with even very simple acyclic rules (source-to-target TGDs, S2T), or with the simplest existential rules that export two variables (frontier-two inclusion dependencies, ID[2]). More surprisingly, undecidability can occur with rules exporting only a single variable, the class of frontier-one dependencies FR[1] of [BLMS09]. We say the existential rule languages S2T, ID[2], FR[1] are destructive of arity-two QA.

We then show (Section 4) that by restricting FR[1] slightly, imposing that the head of the rules have a certain tree shape (denoted “non-looping”), we can obtain a class of existential rules that can be combined with expressive constraints on the arity-two schema while maintaining decidable QA (we call this not destructive). The reduction proceeds in two steps. We first handle rules with tree-shaped bodies, via a direct rewriting technique to constraints on an arity-two encoding of the schema. Second, we handle rules with non-tree-shaped bodies, showing that the bodies can be soundly replaced by a tree-shaped approximation. Soundness is proven by extending the technique of “treeification” used previously in many modal and guarded logics (e.g., [BGO14]), showing that models of the constraints can be “unraveled” to be tree-shaped.

We go on to study (Section 5) the addition of functional dependencies (FDs), a well-known generalization of description logic functionality constraints to arbitrary arity. QA with existential rules and FDs is generally undecidable unless their interaction with the existential rules is controlled, e.g., by imposing the non-conflicting condition [CGP12]. We show that FDs can be added to our existential rules while maintaining decidable QA with the arity-two constraints, as long as the non-conflicting condition is satisfied. As in the standard non-conflicting setting, we show that the FDs can always be satisfied unless the initial facts violate them. We prove this by modifying the unraveling argument.

Our results have the advantage that QA for our combined constraints reduces to QA on an arity-two schema; hence, existing QA algorithms for rich description logics could be extended to arbitrary arity signatures with expressive constraints.

Related work.

A great deal of research has centered around the integration of DLs with Datalog-style rules, including work as early as the 1990s, when the languages AL-Log [DLNS91] and CARIN [LR98] were introduced. AL-Log links Horn rules with concepts from a description logic terminology, while the later language CARIN provides a broader framework allowing both concepts and roles from a terminology to appear in rules. [LR98] provides both entailment algorithms for CARIN and undecidability results exploring the borderline for combining rules and DLs. Datalog rules, however, unlike the existential rules that we consider in this work, do not allow existential quantification in the head, so they cannot assert the existence of higher-arity facts on fresh elements.

Another approach to combination is to use description logics that support higher-arity relations directly. Languages such as DLR_reg [CGL08] give some support for higher arity while retaining a DL-style syntax. Unlike them, we support existential rules with cyclic bodies that cannot be encoded in DLR_reg, as well as arbitrary higher-arity functional dependencies that go beyond DL-expressible functionality assertions. On the other hand, we do not support some features of DLR_reg, such as regular expressions on role paths. Indeed, we do not consider the interaction of rules with DLs supporting transitivity and other recursion mechanisms [GLHS08], focusing instead only on first-order-expressible constraints given by decidable DLs and existential rules.

2. Preliminaries

Signatures, facts, queries. A signature σ consists of relation names (e.g. R) and an associated arity (e.g. |R|). We write σ as σ_{≤2} ⊔ σ_{>2}, containing respectively the relations of arity ≤ 2 and the higher-arity relations with arity > 2. An atom R(x) consists of a relation name R and an |R|-tuple x of variables. A σ-fact (or just fact when σ is clear from context) is a conjunction of atoms using relations in σ. A Boolean conjunctive query (or CQ) is an existentially quantified conjunction of atoms. In this paper we assume for simplicity that CQs are Boolean, i.e., have no free variables, and we disallow constants. This is without loss of generality: for non-Boolean queries we can enumerate all possible assignments, and constants can be encoded with fresh unary relations.

Constraints, QA. We consider constraints that are formulae in function-free and constant-free first-order logic (FO), on the signature σ. A σ-interpretation I (or just interpretation) consists of a domain dom(I) and an interpretation function ·^I mapping each relation R of σ to a set R^I of |R|-tuples of dom(I). The definition of I satisfying a FO formula φ, written I ⊨ φ, is standard. A witness W of F in I is an interpretation that maps each relation R to the tuples in R^I obtained by substituting the atoms of F using some variable binding w such that I ⊨ F(w).

We study the query answering problem (QA): given a fact F, a set of constraints Σ, and a CQ q, decide the validity of ∀x (F(x) ∧ Σ → q); that is, whether F and Σ entail q. In this case, we write F ∧ Σ ⊨ q. The combined complexity of QA, for a fixed class of constraints, is the complexity of deciding it when all of F, Σ (in the constraint class) and q are given as input. If we assume that Σ and q are fixed, and only F is given as input, then we define instead the data complexity.

The QA problem above allows arbitrary FO constraint classes. Below we present two kinds of integrity constraints that are known to enjoy decidable QA.

Existential rules. An existential rule (or tuple-generating dependency, or TGD) is a logical constraint of the form ∀x (φ(x) → ∃y ψ(x′, y)), with x′ ⊆ x, where the body φ and head ψ are conjunctions of atoms. Equality atoms and constants are disallowed. For brevity, in rules we often omit the quantification on x and write ‘∧’ as a comma. A rule is single-head if its head consists of only one atom.

QA is undecidable for general rules (following from [BV81]). One class of rules with decidable QA are those satisfying acyclicity conditions. We will show negative results for one of the most restrictive classes, the class S2T of source-to-target TGDs, where σ is partitioned as σ = σ_S ⊔ σ_T, the bodies of all rules only use relations in σ_S, and the heads only use relations in σ_T. Our results on S2T extend to more permissive acyclicity conditions, such as those mentioned in the introduction.

A second class of decidable rules guarantees that it suffices to consider bounded-treewidth interpretations, usually because of constraints on the rule bodies. We focus on the class FR[1] of frontier-one rules, following [BLMS09]: the frontier of a rule is the set x′ of variables that occur both in the body and the head, and a rule is frontier-one if |x′| = 1. The class of inclusion dependencies ID imposes that the head and body are single atoms where each variable is used only once and that the frontier is not empty, and we will focus on the class ID[2] of the inclusion dependencies with frontier size 2. QA is decidable for FR[1] [BLMS09]. For ID it is decidable and has PTIME data complexity [CLR03b].

Existential rules can be augmented with functional dependencies (FDs), which are variants of existential rules that impose equalities. Writing ∀x = ∀x_1 ··· ∀x_n and similarly for y, an FD on the relation R is of the form:

∀xy (R(x_1, ..., x_n) ∧ R(y_1, ..., y_n) ∧ ⋀_{l ∈ L} x_l = y_l) → x_r = y_r

for some 1 ≤ r ≤ |R| and some subset L ⊆ {1, ..., |R|} which we call the determiner of the FD. QA is undecidable when combining existential rules and arbitrary FDs; for instance, it is undecidable for ID[2] and FDs [CLR03a].

Arity-two constraints. The second kind of tame constraints are arity-two constraints, which are only defined on σ_{≤2}. The most general such language that we study is the two-variable guarded fragment with counting quantifiers, GC² [Kaz04]. GC² is the smallest class of constant-free FO formulae with at most two variables, containing all atoms for σ_{≤2} relations, closed under Boolean connectives, under guarded universal and existential quantification, and under number quantifications: if φ(x, y) is a GC² formula and A(x, y) is an arity-two atom with two free variables (the guard), then ∃≥n y A(x, y) ∧ φ(x, y) and ∃≤n y A(x, y) ∧ φ(x, y) are GC² formulae.

For any class CL of existential rules, we call CL non-destructive (of arity-two QA) if QA is decidable for the class CL ∧ GC² of conjunctions of constraints of CL (on σ) and of constraints of GC² (on σ_{≤2}). Otherwise, we call CL destructive.
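The syntactic notions of this section (frontier, frontier-one rules, inclusion dependencies, FDs) can be illustrated with a minimal Python sketch. The `Atom` and `Rule` types and all function names below are illustrative assumptions, not notation from the paper, and "each variable is used only once" is read here as "no variable is repeated within an atom".

```python
from typing import NamedTuple

class Atom(NamedTuple):
    rel: str      # relation name
    args: tuple   # tuple of variable names

class Rule(NamedTuple):
    body: tuple   # conjunction of atoms
    head: tuple   # conjunction of atoms

def variables(atoms):
    return {v for a in atoms for v in a.args}

def frontier(rule):
    # Variables occurring both in the body and in the head.
    return variables(rule.body) & variables(rule.head)

def is_frontier_one(rule):
    return len(frontier(rule)) == 1

def is_inclusion_dependency(rule):
    # Single body atom, single head atom, no repeated variable
    # inside an atom, and a non-empty frontier.
    if len(rule.body) != 1 or len(rule.head) != 1:
        return False
    no_repeats = all(len(set(a.args)) == len(a.args)
                     for a in rule.body + rule.head)
    return no_repeats and len(frontier(rule)) >= 1

def satisfies_fd(tuples, determiner, r):
    # FD on one relation: tuples agreeing on the determiner positions
    # (1-based) must agree on position r.
    seen = {}
    for t in tuples:
        key = tuple(t[l - 1] for l in sorted(determiner))
        if seen.setdefault(key, t[r - 1]) != t[r - 1]:
            return False
    return True

# U(u), T(u, x), S(x) -> exists y, z: T(x, y), U(y), R(x, x, z, z)
fr1 = Rule(
    body=(Atom("U", ("u",)), Atom("T", ("u", "x")), Atom("S", ("x",))),
    head=(Atom("T", ("x", "y")), Atom("U", ("y",)),
          Atom("R", ("x", "x", "z", "z"))),
)
# R(x, y, z) -> exists w: S(y, x, w), an inclusion dependency with frontier size 2
ind2 = Rule(body=(Atom("R", ("x", "y", "z")),),
            head=(Atom("S", ("y", "x", "w")),))
```

Under these assumptions, `fr1` is frontier-one (its frontier is `{"x"}`) and `ind2` belongs to ID[2].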

3. Negative Results for Combination

We now present classes of existential rules which have decidable QA but are destructive. First, we observe that even the simplest class of rules that ensures decidability based on chase termination, the class S2T of source-to-target TGDs, is destructive. This is not so surprising, since the arbitrary constraints on the arity-two signature may add dependencies that are not source-to-target.

Theorem 3.1. S2T is destructive of arity-two QA, even when the whole σ has arity two and there is no query (i.e., this is just the satisfiability problem asking whether the fact and constraints are satisfiable).

Thus we move on to classes of existential rules that are decidable because of guardedness assumptions.

We first observe that the class ID[2] of frontier-two inclusion dependencies is destructive of arity-two QA. In fact, functionality assertions on the binary relations are sufficient to get undecidability, because they can be lifted to functionality assertions on higher-arity relations using ID[2]. Thus, following a standard reduction from QA to entailment of dependencies as in [CLR03a], we can use the undecidability of entailment for ID[2] and FDs (Theorem 2 of [Mit83], which we adapt slightly) and prove the following:

Theorem 3.2. ID[2] is destructive of arity-two QA. In particular, QA is undecidable for ID[2] ∧ D, for any DL D (such as DL-Lite) featuring functionality assertions.

More surprisingly, frontier-one rules FR[1] are destructive of arity-two QA, even though they can only export a single variable, and this holds even when the whole σ has arity two. The reason is that FR[1] may be more expressive than GC² as it can disobey the two-variable restriction.

Theorem 3.3. FR[1] is destructive of arity-two QA, even when the whole σ has arity two and there is no query.

This motivates the search for more restricted existential rule classes which could be non-destructive of arity-two QA.

4. From Existential Rules to Arity-Two

We will focus on the subclass of frontier-one rules whose heads do not contain non-trivial Berge cycles [Fag83].

Definition 4.1. A Berge cycle in a conjunction of atoms Ψ is a sequence A_1, x_1, A_2, x_2, ..., A_n, x_n of length n > 1 where the x_i are pairwise distinct variables, the A_i are pairwise distinct atoms of Ψ, and every x_i occurs in atoms A_i and A_{i+1} (with addition modulo n, so x_n occurs in A_1). We say Ψ is non-looping if there is no Berge cycle of length above 2, and no Berge cycle that contains an atom of σ_{>2}. We define the head-non-looping subclass FR[1]_Hnl of FR[1] as the rules whose heads are non-looping. In particular, single-head FR[1] rules are always head-non-looping.

Example 4.2. Rules A(x) → ∃yz R(x, y), S(y, z), T(z, x) and B(x) → ∃yz R(x, y), U(x, y, z) are not in FR[1]_Hnl. However, A(x) → ∃y V(x, x, y, y) and B(x) → ∃y R(x, y), S(x, y), R(y, x) are in FR[1]_Hnl.

We claim that head-non-looping rules are non-destructive, in contrast with general frontier-one rules (Theorem 3.3):

Theorem 4.3. FR[1]_Hnl is not destructive of arity-two QA.
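The non-looping condition of Definition 4.1 can be tested mechanically on a rule head. The following Python sketch is a brute-force check intended only for small conjunctions; the atom encoding as `(relation, args)` pairs and the function names are assumptions made for illustration, and "atom of σ_{>2}" is read as "atom with more than two arguments".

```python
from itertools import permutations

def berge_cycles(atoms):
    """Yield the Berge cycles of a conjunction of atoms as (atoms, variables).

    Atoms are (relation, args) pairs with args a tuple of variable names.
    Exhaustive enumeration: exponential, for small rule heads only.
    """
    atoms = list(dict.fromkeys(atoms))   # keep pairwise distinct atoms
    variables = sorted({v for _, args in atoms for v in args})
    for n in range(2, min(len(atoms), len(variables)) + 1):
        for seq in permutations(atoms, n):
            for xs in permutations(variables, n):
                # x_i must occur in A_i and in A_{i+1} (indices modulo n)
                if all(xs[i] in seq[i][1] and xs[i] in seq[(i + 1) % n][1]
                       for i in range(n)):
                    yield seq, xs

def is_non_looping(atoms):
    """No Berge cycle of length above 2, and none through an atom of arity > 2."""
    for seq, _ in berge_cycles(atoms):
        if len(seq) > 2 or any(len(args) > 2 for _, args in seq):
            return False
    return True

# The four rule heads of Example 4.2
head1 = [("R", ("x", "y")), ("S", ("y", "z")), ("T", ("z", "x"))]
head2 = [("R", ("x", "y")), ("U", ("x", "y", "z"))]
head3 = [("V", ("x", "x", "y", "y"))]
head4 = [("R", ("x", "y")), ("S", ("x", "y")), ("R", ("y", "x"))]
```

On these inputs the check agrees with Example 4.2: `head1` has a Berge cycle of length 3, `head2` has a cycle of length 2 through the ternary atom, and `head3` and `head4` are non-looping.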

Of course, this means that QA is decidable for FR[1]_Hnl ∧ D, for any DL D expressible in GC², such as ALCQIb. The rest of this section proves the theorem and addresses complexity.

Shredding. Our proof of Theorem 4.3 translates the FR[1]_Hnl rules to arity-two constraints, using a common way to represent general relational databases in a binary relational store, which we call shredding: we represent an n-ary relation by a set of binary relations giving the link from each tuple (materialized as an element) to its attributes. We present first the translation of the signature σ to its shredded arity-two signature σ_S, and the constraints imposed on σ_S-interpretations to ensure that they can be decoded back to σ-interpretations. Second, we explain how to shred facts and CQs.

Definition 4.4. The shredded signature σ_S of a signature σ consists of σ_{≤2}, a unary relation Elt, and, for each R ∈ σ_{>2}, a unary relation A_R and binary relations R_i for 1 ≤ i ≤ |R|. The well-formedness constraints of σ_S, written wf(σ_S), are the following DL constraints (they are ALCQIb-expressible):

• C ⊑ Elt for every unary relation C of σ_{≤2}
• ∃R.⊤ ⊑ Elt and ∃R⁻.⊤ ⊑ Elt for all binary R of σ_{≤2}

and the following, where R ≠ S are in σ_{>2} and 1 ≤ i ≤ |R|:

• ∃R_i.⊤ ⊑ A_R and ∃R_i⁻.⊤ ⊑ Elt
• Elt ⊓ A_R ⊑ ⊥ and A_R ⊓ A_S ⊑ ⊥
• A_R ⊑ ∃R_i.⊤ and funct(R_i)

The shredding SHR(F) of a σ-fact F is the σ_S-fact obtained by adding the atom Elt(x) for each variable x of F and replacing each atom R(x) of F when R ∈ σ_{>2} by the atoms A_R(t) and R_i(t, x_i) for 1 ≤ i ≤ |R|, for t a fresh variable. The shredding SHR(q) of a CQ q is similarly defined.

Example 4.5. Considering CQ q: ∃xyz U(x), R(x, y), S(z, z, x), we define SHR(q) as: ∃xyzt Elt(x), Elt(y), Elt(z), U(x), R(x, y), A_S(t), S_1(t, z), S_2(t, z), S_3(t, x).

Fully-non-looping. The interesting part is to define the shredding of FR[1]_Hnl rules. We first restrict to the class of fully-non-looping rules, FR[1]_Fnl, whose head and body are non-looping. We show that FR[1]_Fnl can be directly shredded to GC². We will later move from FR[1]_Fnl to FR[1]_Hnl.

For any existential rule τ: ∀x φ(x) → ∃y ψ(x′, y) with x′ ⊆ x, we define its shredding SHR(τ) as the existential rule ∀xt (SHR(φ(x))) → ∃yt′ (SHR(ψ(x′, y))), where t and t′ are the fresh elements introduced in the shredding of φ and ψ respectively. We claim the following:

Lemma 4.6. For any FR[1]_Fnl rule τ, SHR(τ) can be translated in PTIME to a GC² sentence on σ_S.

Example 4.7. For brevity, this example ignores the Elt and A_R atoms when shredding. Consider the FR[1]_Fnl rule:

U(u), T(u, x), S(x) → ∃yz T(x, y), U(y), R(x, x, z, z)

Its shredding is expressible in GC² (and even in ALCQIb):

(∃T⁻.U) ⊓ S ⊑ (∃T.U) ⊓ (∃(R_1⁻ ⊓ R_2⁻).(∃(R_3 ⊓ R_4).⊤))

By contrast, consider the following rule in FR[1] \ FR[1]_Hnl:

U(x) → ∃yz R(x, y), S(x, y, z)

Its shredding is as follows; it is not GC²-expressible:

U(x) → ∃yzt R(x, y), S_1(t, x), S_2(t, y), S_3(t, z)

In the general case, the GC² rewriting of Lemma 4.6 is obtained in PTIME by seeing the body and head of SHR(τ) as a tree, which is possible because τ is fully-non-looping.

It is now easy to show the following general result:

Proposition 4.8 (Shredding). For any fact F, GC² constraints Σ, existential rules ∆ and CQ q, the following are equivalent:

• F ∧ Σ ∧ ∆ ⊨ q;
• SHR(F) ∧ Σ ∧ SHR(∆) ∧ wf(σ_S) ⊨ SHR(q).

Thus, from Lemma 4.6, as SHR(F), SHR(∆), σ_S, wf(σ_S), and SHR(q) can be computed in PTIME following their definition, we deduce the following, in the case of FR[1]_Fnl:

Corollary 4.9. QA for GC² and FR[1]_Fnl constraints can be reduced to QA for GC² in PTIME; further, when the constraints and query are fixed in the input, they also are in the output, so data complexity bounds for GC² QA are preserved.
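The shredding of facts and CQs from Definition 4.4 is simple enough to sketch in a few lines of Python. The encoding of atoms as `(relation, args)` pairs, the relation names `A_R` and `R_i` written as strings, and the fresh-variable naming scheme `t1, t2, ...` are assumptions of this sketch rather than part of the paper.

```python
from itertools import count

def shred(atoms):
    """Shred a fact or CQ given as a list of (relation, args) atoms.

    Adds Elt(x) for every variable x, keeps atoms of arity <= 2, and replaces
    each higher-arity atom R(x_1, ..., x_n) by A_R(t), R_1(t, x_1), ..., R_n(t, x_n)
    for a fresh variable t.
    """
    fresh = count(1)
    variables = []
    for _, args in atoms:
        for v in args:
            if v not in variables:
                variables.append(v)
    out = [("Elt", (v,)) for v in variables]
    for rel, args in atoms:
        if len(args) <= 2:
            out.append((rel, args))
        else:
            # Fresh variable; assumes no input variable is named t1, t2, ...
            t = f"t{next(fresh)}"
            out.append((f"A_{rel}", (t,)))
            out.extend((f"{rel}_{i}", (t, x))
                       for i, x in enumerate(args, start=1))
    return out

# The CQ q of Example 4.5: U(x), R(x, y), S(z, z, x)
q = [("U", ("x",)), ("R", ("x", "y")), ("S", ("z", "z", "x"))]
shredded_q = shred(q)
```

Applied to the query of Example 4.5, this produces the atoms Elt(x), Elt(y), Elt(z), U(x), R(x, y), A_S(t1), S_1(t1, z), S_2(t1, z), S_3(t1, x), matching the example up to the name of the fresh variable.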

This concludes the proof of Theorem 4.3 for FR[1]_Fnl constraints. It further implies that QA for GC² and FR[1]_Fnl has co-NP-complete data complexity, like GC² [PH09], and the combined complexity is the same as for GC².

Note that, although QA for GC² is decidable, we know of no realistic implementations. Our translation could however reduce instead to arity-two QA with constraints in DLs such as ALCQIb, if we impose additional minor restrictions on the FR[1]_Fnl rules (e.g., no atom of the form S(x, x)). For simplicity, however, we focus in the sequel on reductions to decidable QA on arity-two (i.e., translating to GC²) rather than investigating which restrictions would ensure that the output of our translations can be expressed in particular DLs.

Head-non-looping. We now extend the claim to FR[1]_Hnl rather than FR[1]_Fnl. The idea is that we rewrite FR[1]_Hnl rules to FR[1]_Fnl by treeifying them, considering all possible fully-non-looping rules that they imply, and all possible ways that they can match on the parts of the interpretations that satisfy the fact. To formalize this, we assume that we have added to the fact F one atom P_x(x) for each variable x of F, where each P_x is a fresh unary relation. We then define:

Definition 4.10. The treeification on fact F of a FR[1]_Hnl rule τ: ∀x (φ(x) → ∃y ψ(x_f, y)), where x_f ∈ x is the frontier variable, is the conjunction TR_F(τ) of FR[1]_Fnl rules defined as follows:

• consider every mapping f from x to itself, and let f(τ) be obtained from τ by renaming all variables in x with f;
• for every such f(τ), consider every x′ ⊆ x and every mapping g from x′ to the variables of F, and construct g(f(τ)) by replacing every occurrence of each x ∈ x′ in φ(x) by fresh variables x_1, ..., x_n, and adding the facts P_{g(x)}(x_i) for all x ∈ x′ and all i (if x_f ∈ x′, also replace x_f in ψ(x_f, y) by one of its copies);
• if g(f(τ)) is fully-non-looping, add it to TR_F(τ).

Example 4.11. Consider a fact F and the following rule τ:

R(x, y), S(y, z), T(z, w), U(w, x) → A(x)

The treeification TR_F(τ) contains the rule: R(x, y), S(y, z), T(z, y), U(y, x) → A(x). Consider the rule τ′: R(x, y), S(y, x, x) → A(x), and a fact F containing variable z. Then TR_F(τ′) contains:

R(x_1, y), S(y, x_2, x_3), P_z(x_1), P_z(x_2), P_z(x_3) → A(x_1)

We now claim:

Proposition 4.12. For any fact F, GC² constraints Σ, FR[1]_Hnl rules ∆ and CQ q, the following are equivalent:

• F ∧ Σ ∧ ∆ ⊨ q;
• F ∧ Σ ∧ TR_F(∆) ⊨ q.

This proposition implies that QA for FR[1]_Hnl and GC² can be reduced to QA for FR[1]_Fnl and GC², which is decidable by the Shredding Proposition, proving Theorem 4.3.

To prove Proposition 4.12, for the first direction, if F ∧ Σ ∧ ∆ ⊭ q, one can show that all of the fresh unary relations P_x in an interpretation of F ∧ Σ ∧ ∆ ∧ ¬q can be assumed to be interpreted by one tuple. One then shows that ∆ implies TR_F(∆) on such interpretations. For the other direction, assuming that F ∧ Σ ∧ TR_F(∆) ⊭ q, the Shredding Proposition implies that there is a σ_S-interpretation J of Θ := Σ ∧ SHR(TR_F(∆)) ∧ wf(σ_S), ¬q′ := ¬SHR(q), and the existential closure of F′ := SHR(F). We apply an unraveling argument to show that J can be made cycle-free:

Definition 4.13. The Gaifman graph G(I) of an interpretation I is the undirected graph on dom(I) connecting any two elements co-occurring in a tuple of I. Given a fact F, an interpretation I is cycle-free except for F if F has a witness W in I such that any cycle of G(I) is only on elements of dom(W).

Lemma 4.14 (Unraveling). For any σ_S-fact F′, GC² constraints Θ, and CQ q′, if (∃xt F′(x, t)) ∧ Θ ∧ ¬q′ is satisfiable then it has an interpretation which is cycle-free except for F′.

Letting J′ be the unraveling of our interpretation J (obtained by the Unraveling Lemma), we can then “unshred” J′ back to a σ-interpretation I:

Definition 4.15. The unshredding I of a σ_S-interpretation J ⊨ wf(σ_S) is obtained by setting R^I := R^J for R ∈ σ_{≤2}, and, for all R ∈ σ_{>2} and t ∈ A_R^J, creating the tuple a ∈ R^I such that (t, a_i) ∈ R_i^J for all 1 ≤ i ≤ |R|.

As in the proof of the Shredding Proposition, we can show that the unshredding I is well-defined and satisfies the unshredded constraints (∃x F(x)) ∧ Σ ∧ TR_F(∆) ∧ ¬q. Further, we show that it satisfies ∆ and not just TR_F(∆), because a match of a FR[1]_Hnl rule τ in I must be a match of TR_F(τ); otherwise the match would witness that J′ was not cycle-free:

Lemma 4.16 (Soundness). For a σ-fact F, FR[1]_Hnl rule τ and σ_S-interpretation J, if J satisfies SHR(TR_F(τ)) and is cycle-free except for SHR(F), then the unshredding I of J satisfies τ.

We conclude by sketching the proof of the Unraveling Lemma, which follows [Kaz04, PH09]. From an interpretation J of (∃xt F′(x, t)) ∧ Θ ∧ ¬q′, for all u ≠ v in dom(J) co-occurring in some tuple of J, we call a bag the interpretation with domain {u, v} consisting of the tuples of J mentioning only u, v. We build a graph G over the bags by connecting bags whose domain shares one element. We pick a witness W of F′ in J and merge in the fact bag all bags whose domain is included in dom(W).

An unraveling is a tree T of bags obtained by unfolding G starting at the fact bag, which is preserved as-is. Each bag b of T except the fact bag has a domain containing two elements: one of them occurs exactly in b, its siblings and its parent; the other occurs exactly in b and its children (it is introduced in b). We see T as an interpretation formed of the union of its bags.

We construct T from G inductively. For any bag b in T corresponding to a bag b′ in G, construct the children of b as follows. For each bag b″ adjacent to b′ in G, if b′ and b″ share the element corresponding to the element u introduced in b, create an isomorphic copy of b″ as a child of b in T, whose domain is u plus a fresh element, and perform the unraveling process recursively on the children.

It can be shown that the unraveling operation preserves GC² constraints, the fact F′, and the negated CQ ¬q′. As T is a tree, the interpretation it describes is cycle-free (except for the witness W, because we copied the fact bag as-is).

Complexity. Proposition 4.12 gives a reduction from FR[1]_Hnl and GC² QA to FR[1]_Fnl and GC² QA, but its output is of exponential size in the input, because of treeification. Hence, letting f(n) bound the size of the output of our reduction given an input of size n, and letting g(n) bound the combined complexity of GC² QA, we have shown an upper bound of g(f(n)) for QA for FR[1]_Hnl and GC².

Further, treeification rewrites the rules in a fact-dependent way, so, unlike the previous case of FR[1]_Fnl and GC² QA, data complexity bounds for GC² QA do not imply data complexity bounds for FR[1]_Hnl and GC² QA.
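For a finite interpretation, the property "cycle-free except for F" of Definition 4.13 can be checked directly on the Gaifman graph. The Python sketch below is one possible way to do so, under the assumptions that the interpretation is given as a plain collection of tuples of elements (relation names are irrelevant to the Gaifman graph) and that the domain of the witness W is supplied explicitly; the function names are hypothetical. It relies on the fact that an edge of an undirected graph lies on a cycle exactly when its endpoints remain connected after the edge is removed.

```python
from itertools import combinations

def gaifman_edges(tuples):
    """Edges of the Gaifman graph: pairs of distinct elements sharing a tuple."""
    return {frozenset(p) for t in tuples for p in combinations(set(t), 2)}

def connected(edges, a, b):
    """Is b reachable from a along the given undirected edges?"""
    seen, stack = {a}, [a]
    while stack:
        u = stack.pop()
        if u == b:
            return True
        for e in edges:
            if u in e:
                (v,) = e - {u}
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
    return False

def cycle_free_except(tuples, witness_domain):
    """True iff every cycle of the Gaifman graph only visits witness elements."""
    edges = gaifman_edges(tuples)
    allowed = set(witness_domain)
    for e in edges:
        a, b = tuple(e)
        # e lies on a cycle iff a and b stay connected without e
        on_cycle = connected(edges - {e}, a, b)
        if on_cycle and not e <= allowed:
            return False
    return True

# A triangle a-b-c with a pendant edge c-d
triangle = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
```

With the triangle entirely inside the witness domain the interpretation counts as cycle-free except for the witness; if only `a` and `b` belong to the witness, the cycle through `c` violates the condition.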

5. Adding Functional Dependencies

The previous section showed that the language of head-non-looping frontier-one rules is not destructive of GC² QA. However, another kind of rules that we would want to support on higher-arity relations are functional dependencies (FDs).

It is well-known that QA is undecidable for, e.g., ID[2] and arbitrary FDs [CLR03a], so such constraints are trivially destructive. As it turns out, undecidability also holds for FR[1]_Hnl rules and FDs; in fact, even for single-head FR[1] rules and FDs:

Theorem 5.1. QA is undecidable for FDs and single-head frontier-one rules, even if all FDs have a determiner of size .

However, for certain kinds of existential rules and FDs, QA is known to be decidable: this is in particular the case of non-conflicting rules and FDs [CGP12]:

Definition 5.2. We say that a single-head existential rule τ is non-conflicting with respect to a set of FDs Φ if, letting A = R(z) be the head atom of τ, letting S be the subset of {1, ..., |R|} such that z_i is a frontier variable iff i ∈ S:

• No strict subset of S is the determiner of an FD in Φ;
• If S is exactly the determiner of an FD of Φ, then all existentially quantified variables in A occur only once.

Note that this requires rules to be single-head, and thus head-non-looping. Our result with respect to adding FDs is:

Theorem 5.3. Non-conflicting frontier-one rules and FDs are non-destructive of arity-two QA.

In particular, single-head frontier-one rules and FDs are non-destructive of arity-two QA if all variables in the head atom of rules are assumed to have only one occurrence, as this simple sufficient condition implies the non-conflicting condition.

To prove the theorem, we assume without loss of generality that we only have FDs on higher-arity relations, as we can write them in GC² otherwise. We cannot shred the FDs, as they would translate to a functionality assertion for the path, e.g., R_i⁻ ∘ R_j, which is not expressible in GC² (and not even in expressive DLs such as SROIQ [HKS06]). However, we can show that, thanks to the non-conflicting requirement, FDs can always be made to hold on interpretations, as long as they hold on a witness of the fact.
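The non-conflicting condition of Definition 5.2 is purely syntactic and can be checked rule by rule. The Python sketch below is a minimal illustration; the encoding of the head atom as a `(relation, args)` pair, of FDs as `(relation, determiner, r)` triples with 1-based positions, and the function name are all assumptions of this sketch.

```python
def is_non_conflicting(head, frontier, fds):
    """Check Definition 5.2 for a single-head rule.

    head:     (relation, args), the head atom R(z)
    frontier: set of frontier variables of the rule
    fds:      iterable of (relation, determiner, r), positions being 1-based
    """
    rel, args = head
    # S: positions of the head atom holding a frontier variable
    S = {i for i, z in enumerate(args, start=1) if z in frontier}
    existential = [z for z in args if z not in frontier]
    repeated = any(existential.count(z) > 1 for z in existential)
    for fd_rel, determiner, _r in fds:
        if fd_rel != rel:
            continue
        L = set(determiner)
        if L < S:                 # a strict subset of S is a determiner
            return False
        if L == S and repeated:   # S is a determiner, existential var repeated
            return False
    return True

# One FD on the ternary relation R: position 1 determines position 2
fds = [("R", {1}, 2)]
```

For instance, with this FD a rule with head R(x, y, z) and frontier {x} is non-conflicting, whereas the heads R(x, y, y) (a repeated existential variable) and R(x, x, y) (the determiner is a strict subset of S = {1, 2}) are not.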

Proposition 5.4.

For any GC constraints Σ , non-conﬂicting frontier-one rules ∆ , FDs Φ on σ > , σ -fact F , and CQ q , if there is an interpretation I satisfying Θ ·· = ( ∃ x F ( x )) ∧ Σ ∧ ∆ ∧¬ q and there is a witness W of F in I satisfying Φ , then Θ ∧ Φ is satisﬁable. We ﬁrst prove Proposition 5.4. As in Section 4, consider the treeiﬁcation TR F (∆): it isstill non-conﬂicting as treeiﬁcation only aﬀects rule bodies. Use the Shredding Proposition toobtain an interpretation J of ¬ q ′ ·· = ¬ SHR( q ), Θ ·· = Σ ∧ SHR(TR F (∆)) ∧ wf ( σ S ), and theexistential closure of F ′ ·· = SHR( F ). By our hypothesis about the existence of a witness, wecan assume that J has a witness W of F ′ whose unshredding satisﬁes Φ.In the previous section, we used the Unraveling Lemma to show that J could be assumed tobe cycle-free. We now modify the lemma to additionally ensure the following property on J ,which will forbid FD violations in its unshredding: Deﬁnition 5.5.

Given a set of FDs Φ on σ_{>2}, a σ_S-interpretation J, and a witness W of a fact in J, we call J FD-safe except for W if for every a ∈ dom(J), for any R ∈ σ_{>2} and FD determiner P of R in Φ, considering each t ∈ dom(J) such that (t, a) ∈ R_i^J for every i ∈ P, either there is at most one such t or all are in dom(W).

FD-safety is useful for the following reason:

Lemma 5.6.

For any set of FDs Φ on σ_{>2}, for any σ_S-interpretation J which is cycle-free and FD-safe except for a witness W, if the unshredding of W satisfies Φ, then the unshredding of J satisfies Φ.

We now claim a variant of the Unraveling Lemma:

Lemma 5.7 (FD-aware unraveling).

Let Σ be a GC² constraint, F a σ-fact, q a CQ, ∆ non-conflicting frontier-one rules and Φ a set of FDs on σ_{>2}. Let J be an interpretation satisfying Θ := (∃x t SHR(F)(x, t)) ∧ Σ ∧ SHR(TR_F(∆)) ∧ wf(σ_S) ∧ ¬SHR(q), and W a witness of SHR(F) in J. Then there is an interpretation J′ satisfying Θ such that W is a witness of SHR(F) in J′, and J′ is cycle-free and FD-safe except for W.

We prove the lemma by tweaking the unraveling process to ensure FD-safety: when creating children of each bag b in the unraveling T for neighbors of its corresponding bag b′ in the bag graph G, omit some neighbors that contain shreddings of higher-arity tuples if the shared element u occurs in a strict superset of an FD determiner of Φ, and unravel differently the neighbors where u occurs exactly at a determiner. This unraveling still satisfies Σ, ¬q′, and the existential closure of F′, and satisfies SHR(TR_F(∆)): the non-conflicting condition ensures that the omitted facts were not required by a rule.

We then apply the FD-aware Unraveling Lemma to J and consider the unshredding I of the result; it satisfies all necessary constraints as in Section 4, including Φ by Lemma 5.6. This proves Proposition 5.4.

We conclude by proving Theorem 5.3. We first observe that the results of Section 4 extend to a more general notion of fact that allows inequality axioms (x ≠ y); indeed, inequalities in the fact are preserved by shredding and unshredding, and by unraveling. So Theorem 4.3 holds for such facts with inequalities, with the same complexity. Second, we enumerate all possible equalities between variables of the fact F, and for each possibility, consider the fact F_= where variables are merged following the equalities, and inequalities are asserted between the remaining variables. Proposition 5.4 implies that our original entailment holds iff all the derived entailments hold where F is replaced by some F_= whose canonical interpretation satisfies Φ (this can be tested in PTIME for each F_=). Thus we have reduced to QA for FR[1]_Fnl and GC².

In terms of complexity, as GC² QA is EXPTIME-hard in combined complexity (because satisfiability for the usual two-variable guarded fragment is EXPTIME-hard [Grä99]), the additional exponential factor (from all possible F_=) has no impact, so the bounds of Section 4 also apply to QA for GC² and non-conflicting frontier-one rules and FDs.
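The enumeration of equality patterns used in this proof amounts to enumerating the partitions of the variables of F. The following is a minimal Python sketch, assuming facts are encoded as lists of (relation name, variable tuple) pairs; `set_partitions` and `merged_facts` are hypothetical helper names, and the FD check on the canonical interpretation of each merged fact is left out.

```python
def set_partitions(items):
    """Yield every partition of `items` as a list of blocks (lists)."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn...
        for k in range(len(partition)):
            yield partition[:k] + [[first] + partition[k]] + partition[k + 1:]
        # ...or into a new block of its own.
        yield [[first]] + partition


def merged_facts(atoms):
    """Yield (merged atoms, inequalities) for each equality pattern.

    atoms: list of (relation, tuple of variables) describing the fact F.
    Variables in the same block are merged into one representative;
    inequalities are asserted between distinct representatives.
    """
    variables = sorted({v for _, args in atoms for v in args})
    for partition in set_partitions(variables):
        rep = {v: block[0] for block in partition for v in block}
        merged = sorted({(r, tuple(rep[v] for v in args)) for r, args in atoms})
        reps = sorted(block[0] for block in partition)
        inequalities = [(x, y) for k, x in enumerate(reps) for y in reps[k + 1:]]
        yield merged, inequalities
```

For the single-atom fact R(x, y), this yields the merged fact R(x, x) with no inequality, and the fact R(x, y) together with the inequality between x and y.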

6. Conclusion

In this paper, we have studied the impact of existential rules on the decidability of query answering for classes of arity-two constraints. We also explained (in proving Theorem 5.3) how the decidability extends when inequalities are allowed in facts.

We have limited our arbitrary arity constraints to rules, i.e., dependencies. In future work we will study how to extend our results to arbitrary arity constraint languages with more features, e.g., disjunction. We will also study what happens in the presence of constants (or nominals), which are disallowed in GC² (and in the rule languages we consider), but are known not to break decidability in arity-two contexts [RG10, CEO09]. This, however, would probably require different techniques, as unraveling may create multiple copies of constants. Another question that would probably require specific tools is the study of finite QA, i.e., QA restricted to finite interpretations.

Acknowledgements.

We are very grateful to Boris Motik and Pierre Senellart for their helpful feedback. This work was partly supported by the Télécom ParisTech Research Chair on Big Data and Market Insights and by the Engineering and Physical Sciences Research Council, UK, grants EP/G004021/1 and EP/M005852/1.

References

[AHV95] Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases. Addison-Wesley, 1995.
[Baa03] Franz Baader. The description logic handbook: theory, implementation, and applications. Cambridge University Press, 2003.
[BGMR14] Jean-François Baget, Fabien Garreau, Marie-Laure Mugnier, and Swan Rocher. Extending acyclicity notions for existential rules. In ECAI, 2014.
[BGO14] Vince Bárány, Georg Gottlob, and Martin Otto. Querying the guarded fragment. LMCS, 10(2), 2014.
[BLM10] Jean-François Baget, Michel Leclère, and Marie-Laure Mugnier. Walking the decidability line for rules with existential variables. In KR, 2010.
[BLMS09] Jean-François Baget, Michel Leclère, Marie-Laure Mugnier, and Eric Salvat. Extending decidable cases for rules with existential variables. In IJCAI, 2009.
[BLMS11] Jean-François Baget, Michel Leclère, Marie-Laure Mugnier, and Eric Salvat. On rules with existential variables: Walking the decidability line. Artif. Intell., 175(9-10):1620–1654, 2011.
[BMRT11] Jean-François Baget, Marie-Laure Mugnier, Sebastian Rudolph, and Michaël Thomazo. Walking the complexity lines for generalized guarded existential rules. In IJCAI, 2011.
[BV81] Catriel Beeri and Moshe Y. Vardi. The implication problem for data dependencies. In ICALP, 1981.
[CDGL+05] Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, and Riccardo Rosati. DL-Lite: Tractable description logics for ontologies. In AAAI, 2005.
[CEO09] Diego Calvanese, Thomas Eiter, and Magdalena Ortiz. Regular path queries in expressive description logics with nominals. In IJCAI, 2009.
[CGL08] Diego Calvanese, Giuseppe De Giacomo, and Maurizio Lenzerini. Conjunctive query containment and answering under description logic constraints. TOCL, 9(3), 2008.
[CGL12] Andrea Calì, Georg Gottlob, and Thomas Lukasiewicz. A general Datalog-based framework for tractable query answering over ontologies. J. Web Semantics, 14, 2012.
[CGP12] Andrea Calì, Georg Gottlob, and Andreas Pieris. Towards more expressive ontology languages: The query answering problem. Artif. Intell., 193, 2012.
[CLR03a] Andrea Calì, Domenico Lembo, and Riccardo Rosati. On the decidability and complexity of query answering over inconsistent and incomplete databases. In PODS, 2003.
[CLR03b] Andrea Calì, Domenico Lembo, and Riccardo Rosati. Query rewriting and answering under constraints in data integration systems. In IJCAI, 2003.
[DLNS91] Francesco M. Donini, Maurizio Lenzerini, Daniele Nardi, and Andrea Schaerf. A hybrid system with Datalog and concept languages. In AI*IA, 1991.
[Fag83] Ronald Fagin. Degrees of acyclicity for hypergraphs and relational database schemes. JACM, 30(3), 1983.
[FKMP05] Ronald Fagin, Phokion G. Kolaitis, Renee J. Miller, and Lucian Popa. Data exchange: Semantics and query answering. TCS, 336(1), 2005.
[GHK+13] Bernardo Cuenca Grau, Ian Horrocks, Markus Krötzsch, Clemens Kupke, Despoina Magka, Boris Motik, and Zhe Wang. Acyclicity notions for existential rules and their application to query answering in ontologies. JAIR, 47, 2013.
[GHO02] Erich Grädel, Colin Hirsch, and Martin Otto. Back and forth between guarded and modal logics. TOCL, 3(3), 2002.
[GLHS08] Birte Glimm, Carsten Lutz, Ian Horrocks, and Ulrike Sattler. Conjunctive query answering for the description logic SHIQ. JAIR, 31, 2008.
[Grä99] Erich Grädel. On the restraining power of guards. J. Symbolic Logic, 1999.
[HKS06] Ian Horrocks, Oliver Kutz, and Ulrike Sattler. The even more irresistible SROIQ. In KR, 2006.
[Kaz04] Yevgeny Kazakov. A polynomial translation from the two-variable guarded fragment with number restrictions to the guarded fragment. In JELIA, 2004.
[KR11] Markus Krötzsch and Sebastian Rudolph. Extending decidable existential rules by joining acyclicity and guardedness. In IJCAI, 2011.
[Len02] Maurizio Lenzerini. Data integration: A theoretical perspective. In PODS, 2002.
[LR98] Alon Y. Levy and Marie-Christine Rousset. Combining Horn rules and description logics in CARIN. Artif. Intell., 104(1-2):165–209, 1998.
[Mit83] John C. Mitchell. The implication problem for functional and inclusion dependencies. Information and Control, 56(3), 1983.
[PH09] Ian Pratt-Hartmann. Data-complexity of the two-variable fragment with counting quantifiers. Inf. Comput., 207(8), 2009.
[RG10] Sebastian Rudolph and Birte Glimm. Nominals, inverses, counting, and conjunctive queries or: Why infinity is your friend! JAIR, 39, 2010.
[Tob01] Stephan Tobies. Complexity results and practical algorithms for logics in knowledge representation. PhD thesis, 2001.

A. Proofs for Section 3: Negative Results for Combination

A.1. Proof of Theorem 3.1: S2T is destructive

Adapt the proof of Theorem 3.3 by rewriting τ to replace all S-atoms in the right-hand side by S′-atoms. The resulting rule is clearly source-to-target, with σ_S = {S} and σ_T = {D, R, S′}. Now impose the concept inclusion S′ ⊑ S. It is clear that the resulting rules are equivalent to those of Theorem 3.3, so the same proof applies.

A.2. Proof of Theorem 3.2: ID[2] is destructive

In this section, as in the rest of the appendix, we write the positions of any relation R as R^1, ..., R^{|R|}. We will show this undecidability result by considering the entailment problem.

Definition A.1.

The (unrestricted) entailment problem for two classes CL_1, CL_2 of constraints asks, given a set of constraints Σ of CL_1 and a constraint τ ∈ CL_2, whether Σ entails τ, written Σ ⊨ τ; that is, whether any interpretation of Σ is an interpretation of τ.

We show that entailment for a class of logical constraints and a class of existential rules reduces to QA for this class of constraints. The idea follows [CLR03b] (Theorem 3.4) but is slightly more complicated, to take care of a difficulty that was omitted there.

Lemma A.2.

For any class CL_1 of constraints and CL_2 of existential rules, there is a reduction from entailment for CL_1 and CL_2 to QA for CL_1.

Proof. Consider an instance of the entailment problem for CL_1 and CL_2: Σ is a set of constraints of CL_1, and τ: ∀x (φ(x) → ∃y ψ(x′, y)) with x′ ⊆ x is an existential rule of CL_2. Let us reduce this to an instance of the QA problem for CL_1.

Create fresh unary relations P_x for each x ∈ x. We consider the QA instance asking whether F ∧ Σ ⊨ q, where the fact F is φ(x) ∧ ⋀_{x ∈ x} P_x(x) and the query q is ∃x y φ(x) ∧ ψ(x′, y) ∧ ⋀_{x ∈ x} P_x(x). We claim that F ∧ Σ ⊨ q iff Σ ⊨ τ, which proves that the reduction is correct.

If Σ ⊨ τ, then consider an interpretation I satisfying Σ and the existential closure of F. As I ⊨ Σ and Σ ⊨ τ, we have I ⊨ τ; thus, applying τ to any witness of F in I, we deduce the existence of a match of q. This proves that F ∧ Σ ⊨ q.

Conversely, if Σ ⊭ τ, there exists an interpretation I of Σ that does not satisfy τ, meaning that there is a violation of τ in I: a set b of elements of dom(I) such that I ⊨ φ(b) but this match cannot be extended to a match of ψ. Let us modify I to I′ by setting, for each x ∈ x, P_x^{I′} := {(b)}, where b is the element of b corresponding to x, and setting R^{I′} := R^I for all other relations R. It is clear that I′ still satisfies Σ, as Σ does not mention the fresh unary relations P_x. Now, we also have I′ ⊨ φ(b), and by construction I′ ⊨ ⋀_{b ∈ b} P_x(b), so that I′ satisfies the existential closure of F. However, I′ does not satisfy q: the only possible match of q is on the elements that occur in the P_x^{I′}, and the impossibility of extending this match to a match of ψ follows from the definition of a violation of τ. Hence, I′ witnesses that F ∧ Σ ⊭ q.

Hence, the reduction is correct, which concludes the proof.

Thus, let D be a DL that can express the assertions funct(R) for any binary relation R. To show the undecidability of QA for D ∧ ID[2], by the above, it suffices to show the undecidability of entailment for D ∧ ID[2] and ID[2].

Definition A.3.

We call UFD the class of unary functional dependencies (UFDs), that is, functional dependencies (on arbitrary arity relations) whose determiner consists of a single attribute. We write UFDs as R^p → R^q, where R^p and R^q are positions of a higher-arity relation R.

We now claim that functionality assertions on binary relations can be bootstrapped to UFDs on arbitrary arity relations, using ID[2]:

Lemma A.4.

There is a reduction from entailment for UFD ∧ ID[2] and ID[2] to entailment for D ∧ ID[2] and ID[2].

Proof. Consider constraints Σ of UFD ∧ ID[2] and a rule τ ∈ ID[2]. Encode each UFD φ: R^p → R^q of Σ as an ID[2] rule τ_φ: ∀x R(x) → R_φ(x_p, x_q), where R_φ is a fresh binary relation for φ, and a functionality assertion funct(R_φ). Let the constraints Σ′ consist of the original ID[2] rules, the new ID[2] rules, and the functionality assertions. We claim that Σ ⊨ τ iff Σ′ ⊨ τ.

If Σ′ ⊭ τ, let I be a counterexample interpretation satisfying Σ′ but not τ. We claim that I also satisfies Σ. Indeed, the only thing to check is that UFDs are satisfied; but assume that there is a UFD φ: R^p → R^q of Σ that has a violation in I, namely, two tuples a, b ∈ R^I such that a_p = b_p but a_q ≠ b_q. As I satisfies the ID[2] rule τ_φ, we have (a_p, a_q) ∈ R_φ^I and (b_p, b_q) ∈ R_φ^I; this contradicts the assertion funct(R_φ) that I is supposed to respect. Hence I satisfies Σ, and as I does not satisfy τ, it witnesses that Σ ⊭ τ.

Conversely, if Σ ⊭ τ, let I be a counterexample interpretation satisfying Σ but not τ. Without loss of generality, we have R_φ^I = ∅ for all the fresh relations R_φ, as they are not mentioned in Σ. Now, extend I to an interpretation I′ that satisfies Σ′ by adding to R_φ^{I′}, for every FD φ: R^p → R^q, for every a ∈ R^I, the tuple (a_p, a_q). It is clear that the result I′ still satisfies the ID[2] rules of Σ, and that it satisfies the ID[2] rules of Σ′; and it is easily seen that it satisfies the functionality assertions as otherwise, as before, a violation of such an assertion in I′ witnesses a violation of the UFDs of Σ in I. Further, as τ does not mention the R_φ, I′ still does not satisfy τ, because I did not. Hence, I′ witnesses that Σ′ ⊭ τ.

This shows that the reduction is correct, concluding the proof.

Definition A.5.

The class of frontier-one inclusion dependencies (or unary inclusion dependencies), ID[1], is the class of inclusion dependencies with frontier of size 1. We write an ID[1] rule ∀x (R(x) → ∃y S(x′, y)) as R^p ⊆ S^q, where R^p and S^q are the positions at which the frontier variable occurs in the body and head atom respectively.

Following this convention, we write rules of ID[2] in the same way: R^a R^b ⊆ S^c S^d denotes the rule ∀x (R(x) → ∃y S(x′, y)) where the first frontier variable occurs at positions R^a and S^c in the body and head, and the second occurs at positions R^b and S^d in the body and head. (Remember that the definition of ID requires each variable to only occur once in the body atom and head atom.) Note that we must have R^a ≠ R^b and S^c ≠ S^d; but we may have R^a = S^c or R^a = S^d, and similarly for R^b.

We now explain that we can add frontier-one inclusion dependencies, ID[1], to the entailment problem without loss of generality, the reason being that ID[1] rules can be encoded in ID[2] up to adding additional attributes.

Lemma A.6.

There is a reduction from entailment for UFD ∧ ID[1] ∧ ID[2] and ID[2] to entailment for UFD ∧ ID[2] and ID[2].

Proof. Consider constraints Σ of UFD ∧ ID[1] ∧ ID[2] and τ ∈ ID[2]. Let σ_+ be the signature obtained from σ in the following way: for each relation R ∈ σ, we create a relation R_+ in σ_+ whose positions are those of R plus one position R^{δ,} for each ID[1] rule δ of the form R^p ⊆ S^q, and one position R^{δ,} for each ID[1] rule δ of the form S^q ⊆ R^p.

Now, encode each ID[1] rule δ: R^p ⊆ S^q of Σ as the following ID[2] rule on σ_+: R_+^p R_+^{δ,} ⊆ S_+^q S_+^{δ,}. We thus define the constraints Σ′ on σ_+ to consist of these additional ID[2] rules, and of the straightforward rewriting of the original ID[2] and UFD constraints of σ to σ_+, rewriting, e.g., R^a R^b ⊆ S^c S^d as R_+^a R_+^b ⊆ S_+^c S_+^d, and R^p → R^q as R_+^p → R_+^q. Once again we show that Σ ⊨ τ iff Σ′ ⊨ τ.

If Σ ⊭ τ, then we extend a counterexample σ-interpretation I to a σ_+-interpretation I′ satisfying Σ′ as follows: for all R ∈ σ, consider each tuple a ∈ R^I, and create in R_+^{I′} the tuple b defined by b_p := a_p for all positions R^p of R, and b_{δ,} := a_p such that δ is of the form R^p ⊆ S^q, and b_{δ,} := a_p such that δ is of the form S^q ⊆ R^p. It is clear that the result I′ still satisfies the UFD and ID[2] constraints of Σ′ and violates τ, because they do not mention the new attributes of σ_+. Further, I′ clearly satisfies the new ID[2] rules because the original interpretation I satisfied the ID[1] rules. Hence, I′ witnesses that Σ′ ⊭ τ.

Conversely, if Σ′ ⊭ τ, we rewrite a counterexample σ_+-interpretation I′ to a σ-interpretation of Σ by simply removing the additional attributes in all tuples, which clearly gives an interpretation satisfying Σ: it satisfies the ID[1] rules because I′ satisfied the new ID[2] rules of Σ′, and the other constraints are preserved. This concludes the proof.

We are now ready to conclude, because:

Theorem A.7 ([Mit83]). The entailment problem for

UFD ∧ ID[1] ∧ ID[2] and ID[2] is undecidable.

This is a slightly stronger result than what is claimed in [Mit83], because their definition of ID[2] does not forbid repetitions of positions (i.e., it allows ID[2] rules of the form R^p R^q ⊆ R^r R^r). We refer to Appendix C.1 for more details about how the stronger result is proved.

This concludes the proof of Theorem 3.2, because, if QA for ID[2] ∧ D were decidable, then we would have decidability of the entailment problem above, by reducing it successively through Lemma A.6, Lemma A.4, and Lemma A.2.

A.3. Proof of Theorem 3.3: FR[1] is destructive

Formally, we define the satisfiability problem of a fact F and constraints Θ as checking whether there is an interpretation of Θ and of the existential closure of F. We will show that the satisfiability problem is undecidable, not for FR[1] ∧ GC², but for the weaker FR[1] ∧ ALCF. The DL ALCF is GC²-expressible; in addition to the constructors of Section 2, it also allows disjunction of concepts: C_1 ⊔ ··· ⊔ C_n.

We use tiling systems, following the notations of [PH09]. Let T = (C, H, V) be a tiling system where C = {C_1, ..., C_N} is a non-empty finite set of tiles and H, V ⊆ C² are binary relations (intuitively standing for "horizontal" and "vertical").

Given a sequence c = c_0, c_1, ..., c_n, the infinite tiling problem for c is to determine whether there exists an infinite tiling, that is, a function f: N² → C such that f(i, 0) = c_i for 0 ≤ i ≤ n and for all i, j ∈ N, (f(i, j), f(i+1, j)) ∈ H and (f(i, j), f(i, j+1)) ∈ V. It is known that we can choose a fixed T such that the infinite tiling problem that has c as input is undecidable. Hence, fix such a T in what follows.

We consider the (single) FR[1] rule:

    τ: ∀u (S(u) → ∃x y z R(u, x) ∧ D(u, y) ∧ R(y, z) ∧ D(x, z) ∧ S(x) ∧ S(y) ∧ S(z))

We impose the functionality restrictions funct(R) and funct(D). Intuitively, R stands for "right" and D for "down".

We create one concept C_i for each tile in C. We impose the disjointness assertions C_i ⊓ C_j ⊑ ⊥ for all i ≠ j.

We impose the concept inclusion S ⊑ C_1 ⊔ ··· ⊔ C_N.

We impose the concept inclusions C_i ⊑ ∃R.C_{j_1} ⊔ ··· ⊔ ∃R.C_{j_l} where C_{j_1}, ..., C_{j_l} are all the tiles such that (C_i, C_{j_k}) ∈ H for 1 ≤ k ≤ l. Having done this for R and H, we do the same with D and V.

We are now ready to conclude the reduction. We claim that the infinite tiling problem for T and the input c reduces to the satisfiability of the fact F_c and the constraints that we have imposed, where we define: F_c(x_0, ..., x_n) := S(x_0) ∧ ⋀_{0 ≤ i ≤ n} C_{c_i}(x_i) ∧ ⋀ , ≤ i

B. Proofs for Section 4: From Existential Rules to Arity-Two

B.1. Proof of Lemma 4.6: Shreddings of FR[1]_Fnl are in GC²

Definition B.1.

Recall Definition 4.13: we call a σ_S-interpretation J cycle-free if the Gaifman graph G(J) of J is acyclic. We call a frontier-one existential rule on σ_S cycle-free if the conjunctions of atoms of its head and body are cycle-free. We call a CQ q cycle-free if its Gaifman graph is acyclic, defining the Gaifman graph G(q) to have the variables of q as vertices and an edge between any pair of variables that co-occur in an atom of q.

We first show the following:

Lemma B.2.

Cycle-free frontier-one existential rules on σ_S can be translated in PTIME to an equivalent GC² sentence.

The above claim is clearly implied by the following:

Lemma B.3.

For any cycle-free CQ q(x) on σ_S with one free variable, q(x) can be translated in quadratic time to an equivalent GC² formula with one free variable.

Indeed, we can then rewrite a cycle-free frontier-one rule ∀x_f x (φ(x_f, x) → ∃y ψ(x_f, y)) as the following, in GC²:

    ∀x_f (φ′(x_f) → ψ′(x_f))

where φ′ and ψ′ are the formulas obtained from φ and ψ. Let us then show Lemma B.3:

Proof.

We test in PTIME whether G(q) is connected. If it isn't, we can rewrite q(x) in PTIME as q′(x) ∧ ⋀_i ∃y q_i(y) where q′ and the q_i are CQs whose conjunction of atoms is connected, and translate q(x) in PTIME by translating each of the q_i. Hence, we assume without loss of generality that G(q) is connected.

We proceed by induction on |q|, the number of atoms of q. If |q| = 1 the result is trivial. Otherwise, let A be the set of atoms of q in which the free variable x occurs. Let X be the set of variables occurring in A except x. For any y ∈ X, let X_y be the set of variables z different from x and y such that there exists a path from z to y in G(q) which does not go through the vertex x. Let A_y be the set of the atoms of q which are not in A and contain a variable of {y} ∪ X_y. All of these sets can be computed in linear time as the answers to reachability questions on G(q), and the number of sets is linear, so the computation takes at most quadratic time.

We now claim that {x}, X, and the X_y for y ∈ X are a partition of the variables of q. Indeed, as G(q) is connected, any variable z different from x is either adjacent to x (and thus z ∈ X), or there is a path from x to it, and the first variable of that path after x must be some y ∈ X (so that z ∈ X_y); this justifies that these sets cover the variables of q. Further, these sets are pairwise disjoint. Indeed, first, x ∉ X and x ∉ X_y for all y ∈ X by construction. Second, if there is a variable z ∈ X ∩ X_y for some y ∈ X, we have y ≠ z as z ∈ X_y, and considering the edges in G(q) between x and y, x and z, and the path from z to y that does not go through x, we have a cycle in G(q), a contradiction. Third, for y, y′ ∈ X, y ≠ y′, if X_y and X_y′ are not disjoint, letting z ∈ X_y ∩ X_y′, as x and y, x and y′ are connected in G(q), and there is a path from z to y and z to y′ in G(q) not going through x, we have a cycle in G(q), a contradiction. For similar reasons, A and the A_y are a partition of the atoms of q.

Now observe that, for any y ∈ X, A_y is a conjunction of atoms with free variables y and X_y, and G(A_y) is acyclic and connected because G(q) is. Because we have shown disjointness, we can apply the induction hypothesis to justify that ∃X_y A_y(X_y, y) can be written in GC² as F_y(y), in time quadratic in the size of ∃X_y A_y(X_y, y). Hence, partitioning A as A′_x (the atoms where only x occurs) and A′_y for y ∈ X (the atoms of A where variable y occurs, and the other variable is necessarily x), we can express q(x) as follows in GC²:

    ⋀_{A ∈ A′_x} A(x) ∧ ⋀_{y ∈ X} ∃z (F_y(z) ∧ ⋀_{A ∈ A′_y} A(x, z))

Hence, the overall complexity of the rewriting is quadratic, as the induction hypothesis is applied to sets of atoms that are a partition of the atoms of the original input formula, so that the quadratic time spent rewriting each set of atoms is quadratic overall in the input formula. By induction, the proof is completed.

We then conclude the proof of Lemma 4.6 by observing that for any FR[1]_Fnl rule τ, SHR(τ) is indeed a cycle-free frontier-one existential rule on σ_S. Indeed, we show this for the head and body with the following lemma:

Lemma B.4.

For any non-looping conjunction of atoms Φ, SHR(Φ) is cycle-free.

Proof. Any cycle in G(SHR(Φ)) clearly translates to a Berge cycle in Φ that has length >

B.2. Proof of Proposition 4.8: QA through shredding

We start by defining shreddings of interpretations:

Definition B.5.

For any σ-interpretation I, the shredding SHR(I) of I is the σ_S-interpretation J such that R^J := R^I for all R ∈ σ_{≤2}, Elt^J = dom(I), and for every R ∈ σ_{>2}, for each tuple a ∈ R^I, we create a fresh element t ∈ dom(J), we add t to A_R^J, and we add (t, a_i) to R_i^J for all 1 ≤ i ≤ |R|.

It is immediate that for any σ-interpretation I, its shredding J satisfies wf(σ_S), and that the unshredding of J (in the sense of Definition 4.15) is I.

We first show the following lemma, which states that negations of CQs, facts and existential rules are preserved by shredding.

Lemma B.6.

For every fact F, CQ q and set ∆ of existential rules, for any interpretation I, I satisfies ∆, ¬q and the existential closure of F iff SHR(I) satisfies SHR(∆), ¬SHR(q), and the existential closure of SHR(F).

To show this, we define the notion of homomorphism:

Definition B.7.

For any interpretations I and I′, a mapping h: dom(I) → dom(I′) is a homomorphism from I to I′ if for every relation R ∈ σ, for any tuple a ∈ R^I, the tuple h(a) = (h(a_1), ..., h(a_{|R|})) is in R^{I′}. This notion extends to homomorphisms from queries to interpretations in the usual manner.

Lemma B.8.

For any two interpretations I and I′, any homomorphism from I to I′ can be extended to a homomorphism from SHR(I) to SHR(I′), and conversely any homomorphism from SHR(I) to SHR(I′) can be restricted to a homomorphism from I to I′.

Proof. This is immediate, paying attention to the fact that a homomorphism h from I to I′ defines a mapping from the tuples of I to the tuples of I′, which describes how to extend h to a homomorphism from SHR(I) to SHR(I′) by defining the image of h on dom(SHR(I)) \ dom(I). Conversely, given a homomorphism from SHR(I) to SHR(I′), its restriction to dom(I) is easily seen to be a homomorphism from I to I′.

We now prove Lemma B.6.

Proof.

We prove each part of the claim:

Query q. By Lemma B.8, there is a homomorphism from SHR(q) to SHR(I) iff there is a homomorphism from q to I.

Fact F. Similar to the case of the query.

Rules ∆. Consider any existential rule τ ∈ ∆.

Assume that I ⊨ τ. Consider a homomorphism h from the body of SHR(τ) (which is the shredding of the body of τ) to SHR(I), and show that the image of h is not a violation of SHR(τ) in SHR(I). By Lemma B.8, h can be restricted to a homomorphism h′ from the body of τ to I. Hence, because I ⊨ τ, h′ can be extended to a homomorphism h″ from the body and head of τ to I. By Lemma B.8, h″ can be extended to a homomorphism h‴ from the shredding of the head and body of τ to SHR(I) that matches h on the body of SHR(τ). So we conclude that h does not witness a violation of SHR(τ).

Conversely, assume that SHR(I) ⊨ SHR(τ). Consider a homomorphism h from the body of τ to I. As previously, h can be extended to a homomorphism h′ from the body of SHR(τ) to SHR(I), which can be extended to a homomorphism h″ from the body and head of SHR(τ) to SHR(I). Again we use Lemma B.8 to justify that this defines a homomorphism h‴ from the body and head of τ to I that matches h on the body of τ, and conclude that h does not witness a violation of τ.

Having proved Lemma B.6, we show the preservation of GC² constraints:

Lemma B.9.

For every interpretation I and GC² theory Σ, we have I ⊨ Σ iff SHR(I) ⊨ Σ.

Proof. The restrictions I|_{σ_{≤2}} and SHR(I)|_{σ_{≤2}} of I and SHR(I) to σ_{≤2} are identical (remember that the R_i in σ_S \ σ are fresh so they do not occur in Σ), hence I and SHR(I) satisfy the same GC² constraints.

We can now prove one direction of the result: if there is a counterexample interpretation of (∃x F(x)) ∧ Σ ∧ ∆ ∧ ¬q, its shredding is an interpretation of (∃x t SHR(F)(x, t)) ∧ SHR(∆) ∧ ¬SHR(q) (Lemma B.6) that satisfies Σ (Lemma B.9) and wf(σ_S) (by our initial immediate observation about the shredding of interpretations).

What remains is to prove the converse direction of decoding an interpretation J of Θ := (∃x t SHR(F)(x, t)) ∧ Σ ∧ SHR(∆) ∧ wf(σ_S) ∧ ¬SHR(q). This is harder, because we must argue that J can be understood as the shredding of a σ-interpretation for the above results to apply. This requires us to deal with the issue of redundant tuples:

Definition B.10.

A σ_S-interpretation J is redundancy-free if there is no R ∈ σ_{>2}, no t ≠ t′ in dom(J), and no |R|-tuple a such that (t, a_i) and (t′, a_i) belong to R_i^J for all 1 ≤ i ≤ |R|.

Redundant tuples are the only obstacle that prevents us from understanding any interpretation of wf(σ_S) as the shredding of some σ-interpretation. Indeed:

Lemma B.11.

SHR is a bijection from σ-interpretations to redundancy-free σ_S-interpretations satisfying wf(σ_S).

Proof. This is clear, as, writing SHR^{-1} for the unshredding operation of Definition 4.15, we have already observed that, for any σ-interpretation I, we have (SHR^{-1} ∘ SHR)(I) = I. Further, given a redundancy-free σ_S-interpretation J satisfying wf(σ_S), it is immediate that (SHR ∘ SHR^{-1})(J) = J. This concludes the proof.

As redundancy-freeness cannot be expressed in GC², our counterexample interpretation J may not satisfy it. But this does not matter. Recalling our definition of Θ above, we show:

Lemma B.12.

If Θ has an interpretation then it has a redundancy-free interpretation.

Proof. Let J be a σ_S-interpretation of Θ.

Define the equivalence relation ∼_J on dom(J) as follows: t ∼_J t′ if, for some R ∈ σ_{>2}, (t) and (t′) are both in A_R^J, and for every 1 ≤ i ≤ |R|, (t, z) ∈ R_i^J iff (t′, z) ∈ R_i^J. The conditions of wf(σ_S) ensure that this is an equivalence relation, because the A_R^J are pairwise disjoint. Define χ_J: dom(J) → dom(J)/∼_J the function mapping every element of dom(J) to its ∼_J-equivalence class, and let J′ be the image of J under χ := χ_J.

J′ is redundancy-free as any t, t′ ∈ dom(J′) witnessing redundancy in J′ would have as preimage by χ two elements of J that are ∼_J-equivalent. (This uses the fact that, by wf(σ_S), two elements of A_R^J and A_{R′}^J cannot be adjacent in G(J) for any R, R′ ∈ σ_{>2}.)

It is easily checked that J′ is still an interpretation of wf(σ_S). As χ is a homomorphism from J to J′, and J satisfies the existential closure of F, J′ also satisfies it. Further, because the restrictions of J′ and J to σ_{≤2} coincide, J′ is still an interpretation of Σ.

To show that J′ still satisfies ¬SHR(q), it suffices to show the existence of a homomorphism from J′ to J. We build such a homomorphism h by setting, for all a ∈ dom(J′), h(a) := a′ for any preimage a′ of a by χ. To see why h is a homomorphism, consider any tuple t ∈ R^{J′} for some R ∈ σ_S. Let t′ ∈ R^J be a preimage of the tuple t by χ. Clearly, by wf(σ_S), unless R is one of the fresh binary relations R_i, all elements of t′ are singletons in their ∼_J-class, so that necessarily t′ = h(t) and h(t) ∈ R^J. If R = R_i, write t = (u, a) and t′ = (u′, a′). We then have that h(t) is a pair (u″, a″), and necessarily a″ = a′ because by wf(σ_S) we know that (a′) ∈ Elt^J so a′ ∼_J a″ implies a′ = a″. Now, as u′ ∼_J u″ as u′, u″ ∈ A_R^J, and the A_R are pairwise disjoint by wf(σ_S), we have (u′, a′) ∈ R_i^J iff (u″, a″) ∈ R_i^J, so that indeed h(t) ∈ R_i^J. Hence, h is indeed a homomorphism from J′ to J. Thus J′ satisfies ¬SHR(q) because J does.

For any existential rule τ, to show that J′ still satisfies SHR(τ), it suffices to observe that χ ∘ h is the identity on dom(J′), so that any match m of the body of SHR(τ) in J′ gives such a match in J which, as J ⊨ SHR(τ), extends to a match of the body and head which is mapped back by χ to a match of the body and head of SHR(τ) in J′, so that m does not witness a violation of SHR(τ). Hence, J′ still satisfies SHR(∆).

We can now complete the proof of Proposition 4.8 with the backwards direction: given our interpretation J of Θ, make it redundancy-free by Lemma B.12, and now unshred it to an interpretation I′ such that, by Lemma B.11, SHR(I′) = J. We conclude by Lemma B.6 and Lemma B.9 that I′ satisfies ∆, Σ, ¬q, and the existential closure of F.

B.3. Proof of Lemma 4.14: Unraveling for GC²

We present the formal unraveling process. In all of this section, we work only on the signature σ_S.

Definition B.13.

For any interpretation $J$, the induced interpretation $J|_a$ of $J$ by $a \subseteq \mathrm{dom}(J)$ is the interpretation containing all the tuples of $J$ where only elements of $a$ occur. A guarded pair in $J$ is a pair $\{a, b\}$ of two distinct elements of $\mathrm{dom}(J)$ such that $a$ and $b$ co-occur in some tuple of $J$. The immediate neighborhood $\mathrm{IN}_J(a)$ of $a \in \mathrm{dom}(J)$ in $J$ is $\{b \in \mathrm{dom}(J) \setminus \{a\} \mid \{a, b\} \text{ guarded pair in } J\}$.

The bags of an interpretation $J$ are the interpretations induced by all guarded pairs of $J$.

The bag graph of a $\sigma_S$-interpretation $J$ is the undirected graph on the bags of $J$ (without self-loops) where two distinct bags are adjacent whenever their domains share one common element. (As the domains have size two, they must then share exactly one element.) Given a witness $W$ of a fact $F$ in $J$, we alter the definition of the bag graph of $J$ by adding one fact bag corresponding to the witness $W$; the fact bag is adjacent to all bags with which it shares one element (but not those with which it shares two elements).

Definition B.14. A tree-like interpretation is a tree $T = (W, E, b_r)$ where each $b \in W$ is a bag (that is, an interpretation), $b_r \in W$ is the root bag, and $E$ is the directed edge relation. We require that for all $(b, b') \in E$, the domains of $b$ and $b'$ share exactly one element $u$ such that $u$ occurs in $T$ at exactly the following places: in $b$ (we say it was introduced in $b$), and in all children of $b$ (including $b'$). Further, if two bags $b$ and $b'$ in $W$ share some element then either they are siblings in $T$ or one is a child of the other in $T$. We write $\mathrm{dom}(T) = \bigcup_{b \in W} \mathrm{dom}(b)$ and also see $T$ as the interpretation $\bigcup_{b \in W} b$.

Given a fact $F$ and a witness $W$ of $F$ in $J$, we say that $T$ is an unraveling of $J$ preserving $W$ if $b_r$ is the fact bag of $J$, all other bags of $T$ have domain of size $2$, and elements of $\mathrm{dom}(J)$ only occur in $b_r$ (we say they were introduced in $b_r$).

Our goal will be to construct an unraveling of the counterexample interpretation, because of the following:

Lemma B.15. If $T$ is an unraveling of an interpretation $J$ preserving a witness $W$ of a fact $F$, then $T$ is an interpretation where $W$ is also a witness of $F$ and which is cycle-free except for $W$ (recall Definition 4.13).

Proof. Except for $\mathrm{dom}(W)$, $G(T)$ is a tree which matches $T$: if any two elements $u, v$ of $\mathrm{dom}(T)$ are not both in $\mathrm{dom}(W)$ and co-occur in a tuple of $T$, this edge of $G(T)$ corresponds to the edge between the bag where $u$ was introduced and the bag where $v$ was introduced.

However, we also want the unraveling to be faithful, so that the constraints are preserved.

Definition B.16. $T = (W, E, b_r)$ is a faithful unraveling of an interpretation $J$ preserving $W$ (where $W$ is a witness of a fact $F$) if it is an unraveling of $J$ preserving $W$ such that there exists a homomorphism $\pi$ from $T$ to $J$, and a mapping $\phi$ from $W$ to the bags of $J$ that maps $b_r$ to the fact bag, and maps no other bag to the fact bag. We require that:

(Compat) $\phi$ is compatible with $\pi$: for any $b \in W$, $\pi|_{\mathrm{dom}(b)}$ is an isomorphism between $b$ and $\phi(b)$, and it is even the identity for $b = b_r$;

(IN) for every $a \in \mathrm{dom}(T)$, $\pi|_{\mathrm{IN}_T(a)}$ is an isomorphism between $\mathrm{IN}_T(a)$ and $\mathrm{IN}_J(\pi(a))$;

(Surj) $\phi$ is surjective except for $W$: for any bag $b$ of $J$ whose domain is not a subset of $\mathrm{dom}(W)$, $b$ has a preimage by $\phi$.

We say an interpretation is unravelable if all elements of the interpretation occur in at least one tuple for a binary relation, and if its bag graph is connected; we can assume without loss of generality that interpretations are unravelable by adding tuples for a fresh binary relation to satisfy these conditions. We now claim:

Proposition B.17. For any fact $F$, $\mathrm{GC}^2$ constraints $\Theta$, and CQ $q'$, if $J$ is an unravelable interpretation that satisfies $\Theta$ and $\neg q'$ and has a witness $W$ of $F$, and $T$ is a faithful unraveling of $J$ preserving $W$, then $T$ (seen as an interpretation) satisfies $\Theta$, $\neg q'$, and the existential closure of $F$ (in fact $W$ is still a witness of $F$ in $T$).

Proof. It is clear that $W$ is still a witness of $F$ in $T$. $T$ also satisfies $\neg q'$, since there exists a homomorphism from $T$ to $J$, so if $T$ satisfied $q'$ then so would $J$.

We must show that $T$ still satisfies $\Theta$. Up to expanding the original interpretation by interpreting new relation names, following [Kaz04] we can rewrite the $\mathrm{GC}^2$ constraints $\Theta$ as a conjunction of a formula of $\mathrm{GF}^2$ (the guarded fragment with two variables but no number restrictions) and number restrictions of the form $\forall x\, \exists^{\bowtie n} y\, R(x, y)$ where $n \in \mathbb{N}$, $\bowtie\, \in \{\geq, <\}$, and $R$ is a binary relation.

The fact that the number restrictions are preserved is immediate, since they only depend on the immediate neighborhoods of elements, which are isomorphically preserved by $\pi$ according to property (IN).

We show that $\mathrm{GF}^2$ is preserved by showing the existence of a guarded bisimulation from $T$ to $J$ [GHO02]. We define the guarded bisimulation as the set $I$ of all restrictions of $\pi$ to singletons and guarded pairs of $T$, which are indeed partial isomorphisms from $T$ to $J$. We show that the back and forth conditions are satisfied. For any $f : X \to Y$ in $I$:

Forth.

Consider a guarded set $Z$ of $T$. There is a partial isomorphism $f'$ in $I$ with domain $Z$, and it agrees with $f$ on $Z \cap X$ as they are both restrictions of $\pi$.

Back. Consider a guarded set $Z$ of $J$. As $J$ is unravelable, all singletons of $J$ occur in some guarded pair of $J$, so it suffices to consider the case where $|Z| = 2$. Let $b$ be the corresponding bag of $J$. We distinguish two cases, depending on whether or not $Z$ intersects $Y$.

If $|Z \cap Y| = 0$, then either $\mathrm{dom}(b) \subseteq \mathrm{dom}(W)$, so that we can find an isomorphism of $I$ with domain $\pi^{-1}(Z)$ because $\pi$ is the identity on $\mathrm{dom}(W)$; or, as $\phi$ is surjective (property (Surj)), there exists $b' \in W$ such that $\phi(b') = b$ and thus, because $\phi$ and $\pi$ are compatible by property (Compat), the image of $\pi|_{\mathrm{dom}(b')}$ is $Z$, so there is a corresponding partial isomorphism in $I$.

If $|Z \cap Y| \neq 0$, the only non-trivial case is $|Z \cap Y| = 1$. Let $a$ be the element of $Z \cap Y$. Because by property (IN) $\pi|_{\mathrm{IN}_T(a)}$ is an isomorphism from $\mathrm{IN}_T(a)$ to $\mathrm{IN}_J(\pi(a))$, there exists a guarded pair $X'$ of $T$ such that $\pi(X') = Z$; hence, there is a partial isomorphism $f'$ in $I$ from $X'$ to $Z$, and it agrees with $f$ as both are restrictions of $\pi$.

This concludes the proof.

We must now show that a faithful unraveling exists:

Proposition B.18.

For any fact $F$, for any unravelable interpretation $J$ and witness $W$ of $F$ in $J$, there is a faithful unraveling $T$ of $J$ preserving $W$.

Proof. To build $T$, define the root $b_r$ of $T$ as $W$, set $\phi(b_r) = W$, initialize $\pi$ as the identity on $W$, and define inductively $T = (W, E, b_r)$, the homomorphism $\pi$ and the mapping $\phi$, as follows. At every bag $b \in W$, consider the corresponding bag $\phi(b)$ of $J$. For every element $a$ introduced in $b$ (there is only one except for $b = b_r$), consider every bag $b''$ in the bag graph of $J$ that shares element $\pi(a)$ with $\phi(b)$ (so $b''$ is adjacent to $\phi(b)$ in the bag graph). Letting $\mathrm{dom}(b'') = \{\pi(a), a''\}$, create a bag $b'''$ in $T$ as a child of $b$, with domain $\{a, a'\}$ where $a'$ is fresh and where we set $\pi(a') := a''$ and $\phi(b''') := b''$, and make $b'''$ an isomorphic copy of $b''$ following the mapping $\pi$. Perform the same process inductively on all child bags.

It is clear that the result of this process is indeed an unraveling of $J$. It is also clear that $\pi$ thus defined is a homomorphism, as any created tuple in $T$ clearly has a homomorphic image via $\pi$. Last, it is clear that $\phi$ maps $b_r$, and only $b_r$, to the fact bag. For property (Compat), it is clear that the restriction of $\pi$ to any bag $b$ of $T$ is an isomorphism between $b$ and $\phi(b)$. For property (IN), for any element $a \in \mathrm{dom}(T)$ it is clear that $\pi|_{\mathrm{IN}_T(a)}$ is an isomorphism from $\mathrm{IN}_T(a)$ to $\mathrm{IN}_J(\pi(a))$: $\mathrm{IN}_T(a)$ consists of the union of the bag $b_a$ where $a$ was introduced and the children of $b_a$ with which $a$ is shared (i.e., all children, except at $b_r$), which corresponds exactly to the bags of $J$ where $\pi(a)$ occurs. For property (Surj), the surjectivity of $\phi$ holds because $J$ is unravelable, so all bags of $J$ are reachable from the fact bag.

This concludes the proof: we make the interpretation unravelable without loss of generality, unravel it with Proposition B.18, and Proposition B.17 and Lemma B.15 ensure that the result satisfies the required conditions.

B.4. Proof of Lemma 4.16: Treeification soundness

We call a bad cycle in a conjunction of $\sigma$-atoms $\Phi$ a Berge cycle of length > F be a $\sigma_S$-fact, $\tau$ be a $\mathrm{FR}[1]_{\mathrm{Hnl}}$ rule, and $J$ be a $\sigma_S$-interpretation. Assume that $J$ is cycle-free except for $\mathrm{SHR}(F)$, and let $W$ be the witness whose existence is guaranteed by this. Similarly to Lemma B.4, it is easily seen that this implies that $I$ is non-looping except within the domain of the unshredding $W'$ of $W$.

Now, assume that $J$ satisfies $\mathrm{SHR}(\mathrm{TR}_F(\tau))$, and assume that $I \not\models \tau$. Let $f$ be a mapping from the body of $\tau$ to $I$ that witnesses the violation. We consider the dependency $\tau'$ (implied by $\tau$) obtained by identifying all variables of the body of $\tau$ that are mapped to the same element by $f$. We can thus see $f$ as a match of $\tau'$ that maps all variables of the body of $\tau'$ to distinct elements. If $\tau'$ is a $\mathrm{FR}[1]_{\mathrm{Fnl}}$ rule, then it is in $\mathrm{TR}_F(\tau)$ (taking $\mathbf{x}' = \emptyset$), so that if $I$ violates $\tau$ then it violates $\mathrm{TR}_F(\tau)$, contradicting the fact that it is the unshredding of $J$, which satisfies $\mathrm{SHR}(\mathrm{TR}_F(\tau))$ (as in Proposition 4.8).

Hence, assume that $\tau'$ is not a $\mathrm{FR}[1]_{\mathrm{Fnl}}$ rule, so that its body has a bad cycle. Because $f$ maps all variables in the body of $\tau'$ to distinct elements of $I$, the image of any bad cycle of the body of $\tau'$ by $f$ is a bad cycle of $I$. Hence, as $I$ is non-looping except for $W'$, any bad cycle of $\tau'$ must be mapped by $f$ to elements of $\mathrm{dom}(W')$. Now consider $\tau''$ obtained from $\tau'$ by setting $\mathbf{x}'$ to be the variables mapped to elements of $\mathrm{dom}(W')$, setting $g$ to map each variable $x$ of $\mathbf{x}'$ to the variable $z$ of $F$ such that we have $f(x) \in P_z^I$ (there is precisely one, as $W$ is a witness of $\mathrm{SHR}(F)$, which we have defined to include the atoms $P_\bullet(\bullet)$), and performing the construction $g(\tau'')$ as in Definition 4.10. The result $\tau''$ is in $\mathrm{FR}[1]_{\mathrm{Fnl}}$, as otherwise a bad cycle in it translates to a bad cycle in $\tau'$ of elements not matched to $\mathrm{dom}(W')$, which, as we have seen, contradicts the fact that $I$ is non-looping except within $\mathrm{dom}(W')$.
So $\tau''$ is in $\mathrm{TR}_F(\tau)$, and $f$ is also a match of $\tau''$ that maps the frontier variable to the same element. Hence, as $I \models \mathrm{TR}_F(\tau)$, we have a contradiction of the fact that $f$ witnesses a violation. This concludes the proof.

C. Proofs for Section 5: Adding Functional Dependencies

C.1. Proof of Theorem 5.1: QA is undecidable for FDs and single-head $\mathrm{FR}[1]$ rules

Call $\mathrm{FR}[1]_{\mathrm{SH}}$ the class of single-head frontier-one rules. Recall the definition of the entailment problem (Definition A.1) and of $\mathrm{UFD}$ (Definition A.3). We will write rules of $\mathrm{ID}[1]$ and $\mathrm{ID}[2]$ as in Definition A.5.

By Lemma A.2, the entailment problem for $\mathrm{FR}[1]_{\mathrm{SH}} \wedge \mathrm{UFD}$ and $\mathrm{ID}[2]$ reduces to QA for $\mathrm{FR}[1]_{\mathrm{SH}} \wedge \mathrm{UFD}$, so it suffices to show the undecidability of the former to show undecidability of the latter. We will do so by adapting the result of [Mit83], who showed that implication of $\mathrm{ID}[2]$ rules by $\mathrm{UFD}$ and $\mathrm{ID}[2]$ constraints is undecidable. We will need to consider a special form of the problem studied in [Mit83]:

Definition C.1. The restricted $\mathrm{UFD}/\mathrm{ID}[2]$ entailment problem is the entailment problem for $\mathrm{UFD} \wedge \mathrm{ID}[2]$ and $\mathrm{ID}[2]$ where the input is restricted so that there is only one relation $R$, and, for any $\mathrm{ID}[2]$ rule $R_a R_b \subseteq R_c R_d$ in the input, the UFD $R_a \to R_b$ holds in the input.

We now state our variant of the undecidability result in [Mit83]:

Theorem C.2. The restricted $\mathrm{UFD}/\mathrm{ID}[2]$ entailment problem is undecidable.

Proof. We recall the proof technique of [Mit83]. The proof gives a reduction to the entailment problem from the following undecidable problem: given a system of equations of the form $x = y \circ z$ on functional monoids, decide if a certain equation $x = y \circ z$ is entailed by the system.

This problem is reduced to the entailment problem in the following way. Given such a system, we create a relation $R$ with one attribute $R_x$ per variable $x$, plus an extra attribute $R_a$. We impose the UFD $R_a \to R_x$ and the $\mathrm{ID}[1]$ rule $R_x \subseteq R_a$ for each position $R_x$ of $R$ (except $R_a$). This ensures that the projection of $R$ to $R_a R_x$ can be interpreted as the graph of a function. Now, equations of the form $x = y \circ z$ can be understood as the corresponding assertions on the functions represented by $R_a R_x$, $R_a R_y$ and $R_a R_z$, and Lemma 4 of [Mit83] shows that such an assertion can actually be enforced by an $\mathrm{ID}[2]$-like constraint: $R_y R_x \subseteq R_a R_z$. Those constraints are not necessarily $\mathrm{ID}[2]$ constraints because we may have $R_x = R_y$.

We observe that we can enforce that we always have $x \neq y$ in such constraints by adding more equations. For every variable $x$, we replace all its occurrences in the equations by fresh variables $x_1, \ldots, x_n$, and we add the equations $x_1 = x_2, \ldots, x_{n-1} = x_n$. Clearly the resulting problem is equivalent to the original one, and the encoding of each constraint $x = y \circ z$ is now an actual $\mathrm{ID}[2]$ rule. Similarly to Lemma 4 of [Mit83], we observe that the new equations of the form $x_i = x_{i+1}$ are equivalent to asserting $R_a R_{x_i} \subseteq R_a R_{x_{i+1}}$ and $R_a R_{x_{i+1}} \subseteq R_a R_{x_i}$ on the projections.

We now observe that the implication problem of [Mit83] with the above restriction can in fact be assumed to be in the form of the restricted $\mathrm{UFD}/\mathrm{ID}[2]$ problem, except that it features some $\mathrm{ID}[1]$ rules.
Indeed, each of the $\mathrm{ID}[2]$ rules in the encoding of the equations $x = y \circ z$ is of the form $\tau : R_y R_x \subseteq R_a R_z$, and the UFD constraint $\phi : R_a \to R_z$ holds. It is clear that $\tau \wedge \phi \models \phi'$, where $\phi' : R_y \to R_x$. Indeed, any violation of $\phi'$ in an interpretation satisfying $\tau$ implies by $\tau$ the existence of a violation of $\phi$. Hence, the problem is equivalent to the one where we add the UFDs $R_y \to R_x$ for every equation $x = y \circ z$. For the equations of the form $x_i = x_{i+1}$, as $R_a \to R_{x_i}$ and $R_a \to R_{x_{i+1}}$ hold, the condition of the restricted $\mathrm{UFD}/\mathrm{ID}[2]$ problem is also satisfied.

The last step to reduce to the restricted $\mathrm{UFD}/\mathrm{ID}[2]$ setting is to eliminate the $\mathrm{ID}[1]$ rules. We do this using a variant of Lemma A.6, where we encode each $\mathrm{ID}[1]$ rule $\tau : R_p \subseteq S_q$ as the $\mathrm{ID}[2]$ rule R p R τ, ⊆ R q R τ, , where R τ, and S τ, are fresh positions of $R$ and $S$ respectively, plus the UFD R p → R τ, so that the condition of the restricted $\mathrm{UFD}/\mathrm{ID}[2]$ problem is respected. It is easily seen that this does not affect the rest of the proof: projecting away the additional attributes or populating them with the same value as their determiner cannot violate any of these additional UFDs.

What remains now is to show the following:

Proposition C.3.

There is a reduction from the restricted $\mathrm{UFD}/\mathrm{ID}[2]$ entailment problem to entailment for $\mathrm{UFD} \wedge \mathrm{FR}[1]_{\mathrm{SH}}$ and $\mathrm{ID}[2]$.

Proof. Consider an instance of the restricted $\mathrm{UFD}/\mathrm{ID}[2]$ entailment problem: we are given a relation $R$, a set $\Phi$ of UFDs, a set $\Delta$ of $\mathrm{ID}[2]$ rules, and the $\mathrm{ID}[2]$ rule $\tau$, and we ask whether $\Phi \wedge \Delta \models \tau$.

Let $n$ be the number of positions of $R$. We construct the relation $S$ whose positions are $S_{i,1}$ and $S_{i,2}$ for every position $R_i$ of $R$. We translate each UFD $\phi : R_p \to R_q$ of $\Phi$ to the two UFDs $\phi_i : S_{p,i} \to S_{q,i}$ for $i \in \{1, 2\}$, letting $\Phi'$ be the resulting UFDs on $S$. We translate the $\mathrm{ID}[2]$ rule $\tau : R_a R_b \subseteq R_c R_d$ to the $\mathrm{ID}[2]$ rule $\tau' : S_{a,1} S_{b,1} \subseteq S_{c,1} S_{d,1}$. We now describe how each $\mathrm{ID}[2]$ rule of $\Delta$ is translated to $\mathrm{FR}[1]_{\mathrm{SH}}$.

Consider an $\mathrm{ID}[2]$ rule $\delta : R_a R_b \subseteq R_c R_d$. We create a first $\mathrm{FR}[1]_{\mathrm{SH}}$ rule

$$\delta_1 : \forall \mathbf{x}\, \big( S(x^1_1, \ldots, x^1_n, x^2_1, \ldots, x^2_n) \to \exists \mathbf{y}\, S(z^1_1, \ldots, z^1_n, z^2_1, \ldots, z^2_n) \big)$$

defined as follows:
• $z^1_a$ is $x^1_a$;
• $z^2_a$ is $x^1_a$;
• $z^2_b$ is $y^1_b$;
• otherwise, $z^j_i$ is $y^j_i$.

We create a second $\mathrm{FR}[1]_{\mathrm{SH}}$ rule

$$\delta_2 : \forall \mathbf{x}\, \big( S(x^1_1, \ldots, x^1_n, x^2_1, \ldots, x^2_n) \to \exists \mathbf{y}\, S(z^1_1, \ldots, z^1_n, z^2_1, \ldots, z^2_n) \big)$$

defined as follows:
• $z^2_a$ is $x^2_a$;
• $z^1_c$ is $x^2_a$;
• $z^1_d$ is $y^2_b$;
• otherwise, $z^j_i$ is $y^j_i$.

For instance, the $\mathrm{ID}[2]$ rule $\delta : R_1 R_2 \subseteq R_3 R_4$ would be encoded as:

$$\delta_1 : \forall \mathbf{x}\, \big( S(x^1_1, x^1_2, x^1_3, x^1_4, x^2_1, x^2_2, x^2_3, x^2_4) \to \exists \mathbf{y}\, S(x^1_1, y^1_2, y^1_3, y^1_4, x^1_1, y^1_2, y^2_3, y^2_4) \big)$$

$$\delta_2 : \forall \mathbf{x}\, \big( S(x^1_1, x^1_2, x^1_3, x^1_4, x^2_1, x^2_2, x^2_3, x^2_4) \to \exists \mathbf{y}\, S(y^1_1, y^1_2, x^2_1, y^2_2, x^2_1, y^2_2, y^2_3, y^2_4) \big)$$

Note that, by the condition of the restricted $\mathrm{UFD}/\mathrm{ID}[2]$ entailment problem, the UFD $R_1 \to R_2$ holds in $\Phi$.
Hence, $y^1_2$ in the head of the first rule must be matched to the same element as $x^1_2$, and likewise for $y^2_2$ and $x^2_2$ in the second rule.

We let $\Delta'$ be the result of this encoding of $\Delta$, and we claim that $\Phi \wedge \Delta \models \tau$ iff $\Phi' \wedge \Delta' \models \tau'$.

To this end, we first show the following. Take any $\mathrm{ID}[2]$ constraint $\delta : R_a R_b \subseteq R_c R_d$ of $\Delta$, with $\phi : R_a \to R_b$ in $\Phi$ by the assumption of the restricted $\mathrm{UFD}/\mathrm{ID}[2]$ entailment problem. Consider the translations $\phi_1, \phi_2 \in \Phi'$ of $\phi$, consider $\delta_1, \delta_2 \in \Delta'$ as defined above, and let $\delta' : S_{a,1} S_{b,1} \subseteq S_{c,1} S_{d,1}$ be the intuitive $\mathrm{ID}[2]$ translation of $\delta$ to $S$. Then the following entailment holds: $\delta_1 \wedge \delta_2 \wedge \phi_1 \wedge \phi_2 \models \delta'$. In other words, our rewriting $\delta_1$ and $\delta_2$ of $\delta$ implies the straightforward rewriting $\delta'$.

Indeed, consider an interpretation $I$ of $\delta_1 \wedge \delta_2 \wedge \phi_1 \wedge \phi_2$. Consider a tuple $t = (u^1_1, \ldots, u^1_n, u^2_1, \ldots, u^2_n) \in S^I$. We wish to show that it does not witness a violation of $\delta'$. By $\delta_1$, there exists a tuple $(v^1_1, \ldots, v^1_n, v^2_1, \ldots, v^2_n) \in S^I$ with $v^1_a = u^1_a$, $v^2_a = u^1_a$, and $v^2_b = v^1_b$. As $I$ satisfies $\phi_1$, as $v^1_a = u^1_a$, we must have $v^1_b = u^1_b$, so that $v^2_b = u^1_b$. Now, by $\delta_2$, there exists a tuple $t' = (w^1_1, \ldots, w^1_n, w^2_1, \ldots, w^2_n) \in S^I$ with $w^2_a = v^2_a$, $w^1_c = v^2_a$, and $w^1_d = w^2_b$. Now, as $I$ satisfies $\phi_2$, as $w^2_a = v^2_a$, we must have $w^2_b = v^2_b$. Putting it together, we have $w^1_c = v^2_a = u^1_a$, and $w^1_d = w^2_b = v^2_b = u^1_b$. Hence, $t'$ witnesses that $t$ is not a violation of $\delta'$. This proves that, indeed, $\delta_1 \wedge \delta_2 \wedge \phi_1 \wedge \phi_2 \models \delta'$.

Let us now proceed with the proof of the fact that $\Phi \wedge \Delta \models \tau$ iff $\Phi' \wedge \Delta' \models \tau'$, to show that the reduction is correct. Assume that $\Phi' \wedge \Delta' \not\models \tau'$. Let $I$ be an interpretation of $\Phi'$, $\Delta'$ that violates $\tau'$. Let $J$ be the projection of $I$ to the positions $S_{1,1}, \ldots, S_{n,1}$, formally:

$$R^J = \{ (a^1_1, \ldots, a^1_n) \mid (a^1_1, \ldots, a^1_n, a^2_1, \ldots, a^2_n) \in S^I \}$$

Because $I$ satisfies $\Phi'$, $J$ clearly satisfies $\Phi$. By our previous observation, it is clear that, because $I$ satisfies $\Delta'$ and $\Phi'$, $J$ satisfies $\Delta$.
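The projection step and the two kinds of constraints it must preserve can be sketched in Python as follows. This is a minimal illustration under assumptions that are not in the text: relations are encoded as sets of equal-length tuples, positions are counted from 0, and the helper names `satisfies_ufd`, `satisfies_id2` and `project_first_copy` are hypothetical.

```python
def satisfies_ufd(rel, p, q):
    """UFD p -> q: any two tuples that agree on position p agree on position q."""
    seen = {}
    for t in rel:
        # setdefault records the first q-value seen for this p-value.
        if seen.setdefault(t[p], t[q]) != t[q]:
            return False
    return True

def satisfies_id2(rel, a, b, c, d):
    """ID[2] rule R_a R_b ⊆ R_c R_d: each pair (t[a], t[b]) occurs as some (w[c], w[d])."""
    targets = {(w[c], w[d]) for w in rel}
    return all((t[a], t[b]) in targets for t in rel)

def project_first_copy(s_rel, n):
    """Projection of S to its first n positions, i.e. the copy S_{1,1}, ..., S_{n,1}."""
    return {t[:n] for t in s_rel}

# Toy instance with n = 2: S has four positions (two copies of R's positions).
S = {(0, 1, 0, 1), (0, 1, 1, 0), (1, 0, 0, 1), (1, 0, 1, 0)}
R = project_first_copy(S, 2)

ufd_on_S = satisfies_ufd(S, 0, 1)        # translated UFD on the first copy of S
ufd_on_R = satisfies_ufd(R, 0, 1)        # original UFD on the projection
id2_on_R = satisfies_id2(R, 0, 1, 1, 0)  # ID[2] rule R_0 R_1 ⊆ R_1 R_0 on R
```

On this toy instance the UFD holds on the first copy of `S` and therefore on the projection `R`, which is the easy half of the argument; the inclusion check only illustrates what satisfying an ID[2] rule means on `R`.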
It is also clear that, because $I$ violates $\tau'$, $J$ violates $\tau$. So $J$ witnesses that $\Phi \wedge \Delta \not\models \tau$.

Conversely, assume that $\Phi \wedge \Delta \not\models \tau$, and let $I$ be a counterexample interpretation. We create $J$ by constructing $S$ as the product of $R$ by itself: create the tuple $(\mathbf{a}, \mathbf{b}) \in S^J$ for every pair of tuples $\mathbf{a}, \mathbf{b} \in R^I$. It is clear that $J$ satisfies $\Phi'$ because $I$ satisfies $\Phi$ (as the FDs are either within the positions $S_{i,1}$ or within the positions $S_{i,2}$). For the same reason $J$ still violates $\tau'$ because $I$ violated $\tau$. We now check that $J$ satisfies $\Delta'$. Let $\delta : R_a R_b \subseteq R_c R_d$ be a rule of $\Delta$; we show that $J$ satisfies $\delta_1$ and $\delta_2$. For $\delta_1$, let $t = (\mathbf{u}, \mathbf{v})$ be a tuple of $S^J$. By construction of $J$ we have $(\mathbf{u}, \mathbf{u}) \in S^J$, which witnesses that $t$ is not a violation of $\delta_1$. For $\delta_2$, let $t = (\mathbf{u}, \mathbf{v})$ be a tuple of $S^J$. By construction of $J$ we have $\mathbf{v} \in R^I$. As $I$ satisfies $\delta$, there is a tuple $\mathbf{w} \in R^I$ such that $w_c = v_a$ and $w_d = v_b$. By construction of $J$, we have $(\mathbf{w}, \mathbf{v}) \in S^J$, which witnesses that $t$ is not a violation of $\delta_2$. Hence $J$ satisfies $\Delta'$, so it witnesses that $\Phi' \wedge \Delta' \not\models \tau'$.

This shows that our reduction is sound, and concludes the proof.

We conclude the proof of Theorem 5.1 by combining Lemma A.2, Proposition C.3 and Theorem C.2.

C.2. Proof of Lemma 5.6: FD-safety and cycle-freeness

Let $\Phi$ be a set of FDs on $\sigma_{>2}$, let $J$ be a $\sigma_S$-interpretation, and assume that it is cycle-free and FD-safe except for a witness $W$ (of some $\sigma_S$-fact $F$). Note that there is a slight abuse of terminology here relative to Definition 4.13: we mean that $J$ is cycle-free except for $F$, and that $W$ is a witness satisfying the conditions of the definition of being cycle-free.

Let $I$ be the unshredding of $J$, and consider two tuples $\mathbf{a}$ and $\mathbf{b}$ in $R^I$ that violate an FD $\phi$ of $\Phi$ (remember that this implies $|R|$ > W′ of W satisfies $\Phi$, it is not possible that both $\mathbf{a}$ and $\mathbf{b}$ are in $R^{W'}$. Let $P$ be the set of positions of $R$ forming the determiner of $\phi$, and $R_r$ be the position that $\phi$ determines, so that $a_i = b_i$ for all $R_i \in P$, but $a_r \neq b_r$.

Consider the set $S = \{a_i \mid R_i \in P\}$. If $S$ is not a singleton set, then, as $|R|$ > a = b, the image of the shredding of $\mathbf{a}$ and $\mathbf{b}$ creates a cycle in $G(\mathrm{SHR}(I))$, which does not consist only of elements of $W$ because $\mathbf{a}$ and $\mathbf{b}$ are not both in $R^{W'}$. This contradicts the fact that $J$ should be cycle-free except for $W$. Hence, $S$ is a singleton set.

Accordingly, let $a$ be the common element which is the $a_j$ for any $R_j \in P$. Now, considering the shredding of $\mathbf{a}$ and $\mathbf{b}$ in $J' := \mathrm{SHR}(I)$, $\mathrm{SHR}(I)$ is such that $(t, a)$ and $(t', a)$ are in $R_i^{J'}$ for all $R_i \in P$. As $P$ is the determiner of an FD of $\Phi$, $\mathrm{SHR}(I)$ is not FD-safe except for $W$, because $t$ and $t'$ cannot both be in $\mathrm{dom}(W)$, otherwise $\mathbf{a}$ and $\mathbf{b}$ would be in $R^{W'}$. This contradicts the fact that $J$ is FD-safe except for $W$.

C.3. Proof of Lemma 5.7: Unraveling with FDs

We first assume without loss of generality that the $\mathrm{FR}[1]_{\mathrm{Hnl}}$ constraints have only unary or higher-arity relations in their head. Indeed, for any $\mathrm{FR}[1]_{\mathrm{Hnl}}$ rule $\tau$ violating this condition, we can replace its head by a fresh unary atom $U(x)$, where $x$ is the frontier variable, and assert in the $\mathrm{GC}^2$ constraints that $U$ implies the head atom of $\tau$.

We first define:

Definition C.4. A proper guarded pair of a $\sigma_S$-interpretation $J$ is a pair $\{a, b\}$ of distinct elements of $\mathrm{dom}(J)$ such that $a$ and $b$ co-occur in a relation which is not in $\sigma_S \setminus \sigma$. Note that if $J$ satisfies $\mathrm{wf}(\sigma_S)$ then, for any guarded pair, either the pair only occurs in tuples for such relations, or the pair only occurs in tuples for relations of $\sigma_S \setminus \sigma$.

The proper bags of $J$ are the bags induced by proper guarded pairs.

Given a $\sigma_S$-interpretation $J$ and $(a) \in \mathrm{Elt}^J$, the arity-two immediate neighborhood $\mathrm{IN}^2_J(a)$ of $a$ in $J$ is the restriction of $\mathrm{IN}_J(a)$ to the proper guarded pairs.

We give a different name to the unravelings that we will create:

Definition C.5. $T = (W, E, b_r)$ is an FD-faithful unraveling of an interpretation $J$ preserving a witness $W$ given FDs $\Phi$ if it is an unraveling of $J$ preserving $W$ (recall Definition B.14) such that there exists a homomorphism $\pi$ from $\mathrm{dom}(T)$ to $\mathrm{dom}(J)$, and a mapping $\phi$ from $W$ to the bags of $J$ that maps $b_r$ to the fact bag, and maps no other bag to the fact bag. We require that:

(Compat-P) $\phi$ is compatible with $\pi$: for any $b \in W$ such that $\phi(b)$ is a proper bag, $\pi|_{\mathrm{dom}(b)}$ is an isomorphism between $b$ and $\phi(b)$, and it is even the identity for $b = b_r$;

(IN-2) for every $a \in \mathrm{dom}(T)$, $\pi|_{\mathrm{IN}^2_T(a)}$ is an isomorphism between $\mathrm{IN}^2_T(a)$ and $\mathrm{IN}^2_J(\pi(a))$;

(Surj-P) $\phi$ is surjective for proper bags except for $W$: for any proper bag $b$ of $J$ whose domain is not a subset of $W$, $b$ has a preimage by $\phi$;

(FD-S) $T$ (seen as an interpretation) is FD-safe except for $W$;

(Achieve) for any $a \in \mathrm{dom}(T)$, for any relation $R$ of $\sigma_{>2}$, for any subset $P$ of the positions of $R$ which is not a strict superset of an FD determiner of $\Phi$, if $\pi(a)$ is such that $(t', \pi(a)) \in R_i^J$ for some $t'$ for all $R_i \in P$, then the same is true of $a$ in $T$ (seen as an interpretation) for some $t$. Further, unless $P$ is exactly an FD determiner, letting $S := \mathrm{IN}_T(t) \setminus \{R_i(t, a) \mid R_i \in P\}$ and $S' := \mathrm{IN}_J(t') \setminus \{R_i(t', \pi(a)) \mid R_i \in P\}$, $\pi|_S$ is an isomorphism between $S$ and $S'$.

Intuitively, property (Achieve) is designed to preserve exactly what can be asserted by non-conflicting rules. Except in the case where the frontier variables are exactly a determiner, this includes the patterns of equalities between the "non-frontier" variables of the atom. We cannot preserve more, because we need to remain FD-safe.

We modify the definition of unravelable interpretations to require that all elements of the interpretation occur in at least one tuple for a binary relation not in $\sigma_S \setminus \sigma$, and that its bag graph is connected even when the non-proper bags are removed. This can be ensured without loss of generality as before, because the fresh binary relation used to ensure the condition is not in $\sigma_S \setminus \sigma$.

We must show the correctness of such unravelings:

Proposition C.6.

For any $\sigma_S$-fact $F$, $\mathrm{GC}^2$ constraints $\Sigma$, CQ $q'$, FDs $\Phi$, and non-conflicting $\mathrm{FR}[1]$ constraints $\Delta$, if $J$ is an unravelable interpretation that satisfies $\Sigma$, $\mathrm{SHR}(\Delta)$, $\mathrm{wf}(\sigma_S)$, $\neg q'$, and has a witness $W$ of $F$, and $T$ is an FD-faithful unraveling of $J$ preserving $W$, then $T$ is an interpretation which is FD-safe except for $W$, it satisfies $\Sigma$, $\mathrm{SHR}(\Delta)$, $\mathrm{wf}(\sigma_S)$, and $\neg q'$, and $W$ is still a witness of $F$ in $T$.

Proof. $T$ is clearly FD-safe except for $W$ by property (FD-S), and it satisfies $\neg q'$ (by the homomorphism $\pi$). It satisfies $\Sigma$ and $\mathrm{wf}(\sigma_S)$ by the same arguments as in the proof of Proposition B.17, noting that $\Sigma$ and $\mathrm{wf}(\sigma_S)$ do not refer to the fresh relations of $\sigma_S$, so it is sufficient to have isomorphisms between arity-two immediate neighborhoods, and to have surjectivity of $\phi$ for the proper bags only. The harder part is to show that $\mathrm{SHR}(\Delta)$ is satisfied.

Consider any $\tau \in \Delta$, consider a match $f$ of the body of $\mathrm{SHR}(\tau)$ in $T$, and let $a$ be the element of $\mathrm{dom}(T)$ to which the frontier variable of $\tau$ is mapped. Consider the image of $f$ by the homomorphism $\pi$ in $J$. As $J$ satisfies $\mathrm{SHR}(\tau)$, this implies that the element $a' := \pi(a)$ in $\mathrm{dom}(J)$ is such that the head of $\mathrm{SHR}(\tau)$ can be matched to $J$ with a homomorphism mapping the frontier variable to $a'$. Now $\tau$ is a single-head dependency, and we made the assumption that heads were either unary or higher-arity. If the head of $\tau$ is unary, then so is the head of $\mathrm{SHR}(\tau)$. Considering the restriction of $\pi$ to any proper bag containing $a$ in $T$ (such a bag exists as we assumed that the interpretation is unravelable), as this restriction is an isomorphism, we conclude that the unary head atom to which the head of $\mathrm{SHR}(\tau)$ is matched in $J$ also has a match in $T$, so that $f$ does not witness a violation of $\mathrm{SHR}(\tau)$.
Hence, let us assume that the head of $\tau$ is higher-arity, and let $R$ be the higher-arity relation.

This means that there is a subset $P$ of positions of $R$ (namely, the set of positions of the head of $\tau$ where the frontier variable occurs), and there is $t' \in \mathrm{dom}(J)$, such that $(t', a') \in R_i^J$ for all $R_i \in P$. We know by the non-conflicting condition that $P$ is not a strict superset of a determiner of an FD in $\Phi$. If $P$ is exactly a determiner of an FD in $\Phi$, property (Achieve) ensures $(t, a) \in R_i^T$ for all $R_i \in P$ for some $t \in \mathrm{dom}(T)$. Now, by the non-conflicting condition, all variables in the head of $\tau$ at positions not in $P$ are existential variables and it is their only occurrence. Hence, the fact that $T$ satisfies $\mathrm{wf}(\sigma_S)$ ensures that the head of $\mathrm{SHR}(\tau)$ has a match in $T$ mapping the frontier variable to $a$, so that $f$ does not witness a violation of $\mathrm{SHR}(\tau)$.

If $P$ is not a determiner of an FD, then property (Achieve) ensures that $(t, a) \in R_i^T$ for all $R_i \in P$ for some $t \in \mathrm{dom}(T)$, and that $\mathrm{IN}_T(t) \setminus \{R_i(t, a) \mid R_i \in P\}$ and $\mathrm{IN}_J(t') \setminus \{R_i(t', a') \mid R_i \in P\}$ are isomorphic. This implies that the head of $\mathrm{SHR}(\tau)$ has a suitable match in $T$, so that $f$ does not witness a violation of $\mathrm{SHR}(\tau)$. Indeed, see the tuples $(t, a'')$ and $(t', a''')$ in $R_i^T$ and $R_i^J$ as ground $R$-atoms $A_1$ and $A_2$. The head atom $A$ of $\tau$ has a homomorphism to $A_2$ mapping the frontier variable to $a'$, and we know that the elements at positions of $A_1$ and $A_2$ which are not in $P$ have the same equalities, and that $A$ contains the frontier variable at positions $P$ and other variables at the other positions; so $A$ also has a homomorphism to $A_1$ mapping the frontier variable to $a$. Hence, $T$ satisfies $\Delta$.

This justifies that $T$ satisfies all the required constraints, concluding the proof.

We now describe the FD-faithful unraveling process:

Proposition C.7.