Characterising Fixed Parameter Tractability of Query Evaluation over Guarded TGDs
Efficiency of Query Evaluation Under Guarded TGDs: The Unbounded Arity Case
Cristina Feier, University of Bremen, Germany
ABSTRACT
The paper analyzes the parameterized complexity of evaluating Ontology Mediated Queries (OMQs) based on Guarded TGDs (GTGDs) and Unions of Conjunctive Queries (UCQs), in the setting where relational symbols might have unbounded arity and where the parameter is the size of the OMQ. It establishes exact criteria for fixed-parameter tractable (fpt) evaluation of recursively enumerable classes of such OMQs (under the widely held Exponential Time Hypothesis). One of the main technical tools introduced in the paper is an fpt-reduction from deciding parameterized uniform CSPs to parameterized OMQ evaluation. A fundamental feature of the reduction is the preservation of measures which are known to be essential for classifying classes of parameterized uniform CSPs: submodular width (according to the well-known result of Marx for unbounded-arity schemas) and treewidth (according to the well-known result of Grohe for bounded-arity schemas). As such, the reduction can be employed to obtain hardness results for the evaluation of classes of parameterized OMQs both in the unbounded and in the bounded arity case. Previously, in the case of bounded arity schemas, this has been tackled using a technique requiring full introspection into the construction employed by Grohe.
KEYWORDS
ontology mediated querying, efficient evaluation, fixed-parameter tractability, unbounded arity, submodular width
Ontology mediated querying refers to the scenario where queries are posed to a database enhanced with a logical theory, commonly referred to as an ontology. The ontology refines the specific knowledge provided by the database by means of a logical theory which models general knowledge about the domain. Popular ontology languages are decidable fragments of first order logic (FOL) like description logics [1], guarded TGDs [12], and monadic disjunctive datalog [8]. Non-monotonic formalisms rooted in logic programming have also been considered, such as answer set programming [28] and its extension with open domains, open answer set programming [25], as well as combinations of languages from the former and the latter category, like r-hybrid knowledge bases [31] and g-hybrid knowledge bases [24]. As concerns query languages, atomic queries (AQs), conjunctive queries (CQs), and unions thereof (UCQs) are commonly used. A tuple $(\mathcal{L}, \mathcal{Q})$, where $\mathcal{L}$ is an ontology language and $\mathcal{Q}$ is a query language, is referred to as an OMQ language.

One thoroughly explored fragment of FOL as a basis for ontology specification languages is that of tuple generating dependencies (TGDs). A TGD is a rule (logical implication) having as body and head conjunctions of atoms, where some variables occurring in head atoms might be existentially quantified (all other variables are universally quantified). As such, it potentially allows the derivation of atoms over fresh individuals (individuals not mentioned in the database). Answering (even atomic) queries with respect to sets of TGDs is undecidable [12]. However, there has been a lot of work on identifying decidable fragments [2, 12, 14].
A prominent such fragment is that of Guarded TGDs (GTGD) [12]: a TGD is guarded if all universally quantified variables occur as terms of some body atom, called the guard.

While query answering with respect to GTGDs is decidable, the combined complexity of the problem is quite high: ExpTime-complete for bounded arity schemas, and 2ExpTime-complete in general. A natural question is: when can OMQs from (GTGD, UCQ) be evaluated efficiently? A first observation is that by fixing the set of TGDs, the complexity drops to NP for evaluating CQs, and to PTime for evaluating AQs and CQs of bounded treewidth [13]. These complexity results are similar to those concerning query evaluation over databases [32].

Efficiency of query evaluation over databases has been a long evolving topic in the database community: starting with results concerning tractability of acyclic CQ evaluation [32], extended to bounded treewidth CQs in [16], and culminating, in the case of bounded arity schemas, with a famous result of Grohe which characterizes the classes of CQs that can be efficiently evaluated in a parameterized complexity framework where the parameter is the query size. In this setting, under the assumption that
FPT ≠ W[1], Grohe [23] establishes that those and only those recursively enumerable classes of CQs which have bounded treewidth modulo homomorphic equivalence are fixed-parameter tractable (fpt). It is also shown that fpt coincides with polynomial-time evaluation in this case.

Returning to OMQs, for classes of OMQs from (GTGD, UCQ) with bounded arity schemas, a characterization similar to that of Grohe has been established in a parameterized setting where the parameter is the size of the OMQ [3]. The cut-off criterion for efficient evaluation is again bounded treewidth modulo equivalence, only this time equivalence also takes the ontology into account. An OMQ from (GTGD, UCQ) has semantic treewidth $k$ if there exists an equivalent OMQ from (GTGD, UCQ) whose UCQ has syntactic treewidth $k$ [5]. Then, under the assumption that FPT ≠ W[1], a recursively enumerable class of OMQs from (GTGD, UCQ) over bounded arity schemas can be evaluated in fpt iff it has bounded semantic treewidth. The similarity of the characterization with the database case is not coincidental: the results for OMQs build on the results of Grohe in a non-trivial way. In fact, the lower bound proof uses a central construction from [23], but has to employ very sophisticated techniques to adapt this to the case of ontologies.

The main open question which is addressed in this paper is: when is it possible to efficiently evaluate OMQs from (GTGD, UCQ) in the general case, i.e. when there is no restriction concerning schema arity? We again consider a parameterized setting, where the parameter is the size of the OMQ. The practical motivation for the choice of the parameter is that the size of the OMQ is usually considerably smaller than the size of the database.

Before giving an overview of the main results of the paper, we review some results concerning the efficiency of solving constraint satisfaction problems (CSPs).
Given two sets of relational structures $\mathbb{A}$ and $\mathbb{B}$, the CSP problem $(\mathbb{A}, \mathbb{B})$ asks, for a given structure from $\mathbb{A}$ and a given structure from $\mathbb{B}$, whether there exists a homomorphism from the former to the latter. The uniform case refers to the situation where $\mathbb{A}$ is fixed and $\mathbb{B}$ is the class of all relational structures; in this case, the problem is denoted as $(\mathbb{A}, \_)$. The parameterized version of the problem (where the parameter is $|A|$, the size of the structure $A$ taken from $\mathbb{A}$) is denoted as p-CSP$(\mathbb{A}, \_)$. When $\mathbb{A}$ is the class of all relational structures and $\mathbb{B}$ is fixed, one speaks of non-uniform CSPs. In the latter case, Feder and Vardi [21] posited the famous CSP dichotomy conjecture, that is, that every non-uniform CSP can either be solved in PTime or is NP-hard. After a period of intense research on the topic, the conjecture has been proved independently by Zhuk [33] and Bulatov [10].

Here we are primarily interested in uniform CSPs, as CQ evaluation is tightly linked to solving such CSPs and, as such, complexity results carry over. In fact, Grohe's characterization for fpt evaluation of recursively enumerable classes of CQs in the bounded arity case has been achieved via a uniform CSP detour [23].

As concerns the unrestricted arity case, a seminal result of Marx [30] provides an fpt characterization for solving parameterized uniform CSPs which are closed under underlying hypergraphs, i.e. problems of the form p-CSP$(\mathbb{A}, \_)$ where $\mathbb{A}$ is a recursively enumerable class of structures closed under underlying hypergraphs. The characterization has recently been lifted to arbitrary parameterized uniform CSPs [17]. Both results are based on a widely held conjecture, the Exponential Time Hypothesis [26]. It turns out that in this case a new structural measure is relevant, called submodular width: a recursively enumerable class of uniform CSPs can be evaluated in fpt iff it has bounded semantic submodular width, i.e.
it is equivalent to a recursively enumerable class of CSPs of bounded submodular width.

Submodular width will also play a role in our characterization. In the following, we denote with p-OMQ$(\mathbb{Q})$ the parameterized version of the problem of evaluating a class $\mathbb{Q}$ of OMQs from (GTGD, UCQ) (where the parameter is the size of the OMQ). We also say that $\mathbb{Q}$ has bounded semantic submodular width if there exists some $k > 0$ such that each OMQ in $\mathbb{Q}$ is equivalent to an OMQ in which the corresponding UCQ has submodular width less than $k$.

Main Result 1.
Let $\mathbb{Q}$ be a recursively enumerable class of OMQs from (GTGD, UCQ). Assuming the Exponential Time Hypothesis, p-OMQ$(\mathbb{Q})$ is fixed-parameter tractable iff $\mathbb{Q}$ has bounded semantic submodular width.

In order to prove the result above, we exploit the fact that every OMQ from (GTGD, UCQ) can be rewritten into an OMQ from (GDLog, UCQ) [3], where GDLog stands for Guarded Datalog, the restriction of GTGD to rules with only universally quantified variables. For OMQs from (GDLog, UCQ), we construct equivalent OMQs called covers, which are witnesses for bounded semantic submodular width. Covers are based on sets of characteristic databases for OMQs, which are databases that entail the OMQs and which are sufficiently minimal with respect to the homomorphism order, in a very specific sense. Typically, in a database setting, the database induced by a CQ (or its core) can be seen as a canonical database which entails the query and on which to base further constructions. However, in the case of OMQs which pose restrictions on the database schema this is no longer possible: a CQ might contain symbols which are not allowed to occur in a database. In fact, this is a typical usage of ontologies: to enrich the database schema with new terminology. Thus, other databases which are representative for a given OMQ need to be identified. Based on these new concepts, for OMQs from (GDLog, UCQ) it is possible to obtain an alternative characterization for fpt evaluation based on syntactic measures:

Main Result 2.
Let $\mathbb{Q}$ be a recursively enumerable class of OMQs from (GDLog, UCQ) and let $\mathbb{Q}_c$ and $\mathbb{D}_{\mathbb{Q}}$ be the class of covers and the class of characteristic databases for OMQs from $\mathbb{Q}$, respectively. Then, under the Exponential Time Hypothesis, the following statements are equivalent:
(1) p-OMQ$(\mathbb{Q})$ is fixed-parameter tractable;
(2) $\mathbb{Q}_c$ has bounded submodular width;
(3) $\mathbb{D}_{\mathbb{Q}}$ has bounded submodular width.

In order to obtain the hardness result for the above characterization, we employ an fpt-reduction from parameterized uniform CSP evaluation to parameterized OMQ evaluation.

Main Result 3.
For $Q$ an OMQ from (GDLog, UCQ), there exists an fpt-reduction from p-CSP$(\mathbb{D}_Q, \_)$ to p-OMQ$(Q)$.

The reduction is also important as a stand-alone result, as it can be used as a black-box tool to port results from the uniform CSP/database realm to the OMQ one. Actually, both Main Result 1 and Main Result 2 above can be cast into results characterizing fixed-parameter tractability for classes of OMQs in the bounded arity case, by replacing submodular width with treewidth and the Exponential Time Hypothesis with the assumption that FPT ≠ W[1]. As such, it is possible to retrieve the semantic characterization from [3] without going into the details of the construction employed by Grohe in [23], and also to provide an alternative syntactic characterization for fpt evaluation of OMQs from (GDLog, UCQ).

Structures, Databases. A schema $\mathbf{S}$ is a finite set of relation symbols with associated arities. An $\mathbf{S}$-fact has the form $r(\mathbf{a})$, where $r \in \mathbf{S}$ and $\mathbf{a}$ is a tuple of constants whose length is the arity of $r$. An $\mathbf{S}$-structure $A$ is a set of $\mathbf{S}$-facts. The domain of a structure $A$, $\mathrm{dom}(A)$, is the set of constants which occur in facts in $A$. Given a structure $A$ and a subset $C \subseteq \mathrm{dom}(A)$, the sub-structure of $A$ induced by $C$ is the structure containing all facts $r(\mathbf{b}) \in A$ such that $\mathbf{b} \subseteq C$.

Given two structures $A$ and $B$, a function $h : \mathrm{dom}(A) \to \mathrm{dom}(B)$ is said to be a homomorphism from $A$ to $B$ if for every fact $r(\mathbf{a}) \in A$, there exists a fact $r(\mathbf{b}) \in B$ such that $h(\mathbf{a}) = \mathbf{b}$. When such a homomorphism exists we say that $A$ maps into $B$, denoted $A \to B$. Two structures $A$ and $B$ are equivalent if $A \to B$ and $B \to A$. They are isomorphic if they are equivalent and every homomorphism from $A$ to $B$ is bijective. A structure $A$ is said to be a core if every homomorphism from $A$ to itself is injective. Every structure $A$ has an induced sub-structure $A'$ which is equivalent to $A$ and is a core. We say that $A'$ is the core of $A$.
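The notions of homomorphism, equivalence, and core can be made concrete with a small brute-force sketch. It assumes finite structures encoded as Python sets of `(relation, tuple_of_constants)` pairs; the function names `homomorphisms`, `maps_into`, and `is_core` are ours and purely illustrative, and the exhaustive search is exponential, so it is only meant for toy inputs.

```python
from itertools import product


def constants(structure):
    """Domain of a finite structure: every constant occurring in a fact."""
    return sorted({c for _, args in structure for c in args})


def homomorphisms(a, b):
    """Yield every homomorphism from structure a to structure b (brute force)."""
    dom_a, dom_b = constants(a), constants(b)
    for image in product(dom_b, repeat=len(dom_a)):
        h = dict(zip(dom_a, image))
        # h is a homomorphism iff the image of every fact of a is a fact of b.
        if all((r, tuple(h[c] for c in args)) in b for r, args in a):
            yield h


def maps_into(a, b):
    """True iff a -> b, i.e. some homomorphism from a to b exists."""
    return next(homomorphisms(a, b), None) is not None


def is_core(a):
    """True iff every homomorphism from a to itself is injective."""
    return all(len(set(h.values())) == len(h) for h in homomorphisms(a, a))


# Toy structures (an assumption for illustration, not taken from the paper).
A = {("E", ("a", "b")), ("E", ("a", "c"))}
A_core = {("E", ("a", "b"))}
print(maps_into(A, A_core) and maps_into(A_core, A), is_core(A), is_core(A_core))
# True False True
```

Here `A` and `A_core` are equivalent, `A` is not a core because `c` can be folded onto `b`, and the induced sub-structure `A_core` is a core.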
All cores of a structure are isomorphic. When restricted to structures which are cores, the homomorphism relation $\to$ is a partial order, which we will refer to as the homomorphism order.

An $\mathbf{S}$-database is a finite $\mathbf{S}$-structure. For a database $D$, a set of constants $\mathbf{a}$ is a guarded set in $D$ if there exists a fact $r(\mathbf{a}')$ in $D$ such that $\mathbf{a} \subseteq \mathbf{a}'$. Such a set is a maximal guarded set if it has no guarded strict superset. As in the previous definitions, we will sometimes abuse notation by using tuples of constants to refer to the underlying sets of constants instead.

Conjunctive Queries, Atomic Queries. A conjunctive query (CQ) is a formula of the form $q(\mathbf{x}) = \exists \mathbf{y}\, \phi(\mathbf{x}, \mathbf{y})$, where $\mathbf{x}$ and $\mathbf{y}$ are tuples of variables and $\phi(\mathbf{x}, \mathbf{y})$ is a conjunction of atoms having as terms only variables from $\mathbf{x} \cup \mathbf{y}$. The set $\mathbf{x}$ is the set of answer variables of $q$, while the set $\mathbf{y}$ is the set of existential variables of $q$. We will denote with $\mathrm{var}(q)$ the set of variables of $q$, and with $D[q]$ the database induced by $q$, i.e. the set of atoms which occur in $\phi$. When $\mathbf{x}$ is empty, the query is said to be Boolean. A union of conjunctive queries (UCQ) is a formula of the form $q(\mathbf{x}) = q_1(\mathbf{x}) \vee \ldots \vee q_n(\mathbf{x})$, where each $q_i(\mathbf{x})$ is a CQ, for $i \in [n]$. An atomic query (AQ) is a CQ in which $\phi$ contains a single atom. In the following, whenever we refer to CQs or UCQs we tacitly assume they are Boolean. As concerns AQs, unless stated otherwise, we assume they are of the form $r(\mathbf{x})$, i.e. they contain no existentially quantified variables.

For a structure $I$, a CQ $q(\mathbf{x})$, and a tuple of constants $\mathbf{a}$ from $\mathrm{dom}(I)$, we say that $\mathbf{a}$ is an answer to $q$ over $I$, or $I \models q(\mathbf{a})$, if there is a homomorphism $h$ from $D[q]$ to $I$ such that $h(\mathbf{x}) = \mathbf{a}$. If $q(\mathbf{x})$ is a UCQ of the form $q_1(\mathbf{x}) \vee \ldots \vee q_n(\mathbf{x})$, then $I \models q(\mathbf{a})$ if $I \models q_i(\mathbf{a})$ for some $i \in [n]$.

Ontology Mediated Queries.
An ontology mediated query (OMQ) $Q$ is a triple $(\mathcal{O}, \mathbf{S}, q(\mathbf{x}))$, where $\mathcal{O}$ is an ontology, $\mathbf{S}$ is a schema, and $q(\mathbf{x})$ is a query. We say that $Q$ is over schema $\mathbf{S}$. When $\mathcal{O}$ is specified using the ontology language $\mathcal{L}$, and $q$ using the query language $\mathcal{Q}$, we say that $Q$ belongs to the OMQ language $(\mathcal{L}, \mathcal{Q})$. The role of the schema $\mathbf{S}$ is to restrict the relational symbols which might occur in databases over which $Q$ is evaluated.

Given an $\mathbf{S}$-database $D$ and a tuple of constants $\mathbf{a}$, all of which are from $\mathrm{dom}(D)$, we say that $\mathbf{a}$ is an answer to $Q$ over $D$, or $D \models Q(\mathbf{a})$, if $\mathcal{O} \cup D \models q(\mathbf{a})$, where $\models$ is the entailment relation in $\mathcal{L}$. For $Q_1$ and $Q_2$ two OMQs over the same schema $\mathbf{S}$, we say that $Q_1$ is contained in $Q_2$, written $Q_1 \subseteq Q_2$, if for every $\mathbf{S}$-database $D$ and tuple of constants $\mathbf{a}$: $D \models Q_1(\mathbf{a})$ implies $D \models Q_2(\mathbf{a})$. We also say that $Q_1$ is equivalent to $Q_2$, written $Q_1 \equiv Q_2$, if $Q_1 \subseteq Q_2$ and $Q_2 \subseteq Q_1$.

TGDs, Guarded TGDs.
Tuple Generating Dependencies (TGDs) are first order sentences of the form $\forall \mathbf{x} \forall \mathbf{y}\, \phi(\mathbf{x}, \mathbf{y}) \to \exists \mathbf{z}\, \psi(\mathbf{x}, \mathbf{z})$, with $\phi$ and $\psi$ conjunctions of atoms having as terms only variables from $\mathbf{x} \cup \mathbf{y}$, and from $\mathbf{x} \cup \mathbf{z}$, respectively. Such a sentence will be abbreviated as $\phi(\mathbf{x}, \mathbf{y}) \to \exists \mathbf{z}\, \psi(\mathbf{x}, \mathbf{z})$. The problem of evaluating a UCQ $q(\mathbf{x})$ over a set of TGDs $\mathcal{O}$ w.r.t. an $\mathbf{S}$-database $D$ consists in checking whether for some tuple $\mathbf{a}$ over $\mathrm{dom}(D)$, it is the case that $D \models Q(\mathbf{a})$, where $Q$ is the OMQ $(\mathcal{O}, \mathbf{S}, q(\mathbf{x}))$. While the problem is undecidable [12], it can be characterized via a completion of the database $D$ to a structure which is a universal model of the set of TGDs $\mathcal{O}$ and $D$, called the chase [19, 27, 29]. We will denote with $\mathrm{chase}(\mathcal{O}, D)$ the chase of $\mathcal{O}$ w.r.t. $D$. Then, $D \models Q(\mathbf{a})$ iff $\mathrm{chase}(\mathcal{O}, D) \models q(\mathbf{a})$.

There are several variants of the chase; here we describe the standard one, called the oblivious chase. Let $(\mathrm{ch}_k(\mathcal{O}, D))_{k \geq 0}$ be a sequence of structures such that $\mathrm{ch}_0(\mathcal{O}, D) = D$. Then, for every $i > 0$, $\mathrm{ch}_i(\mathcal{O}, D)$ is obtained from $\mathrm{ch}_{i-1}(\mathcal{O}, D)$ by considering all homomorphisms $h$ from the body of some TGD $\phi(\mathbf{x}, \mathbf{y}) \to \exists \mathbf{z}\, \psi(\mathbf{x}, \mathbf{z})$ in $\mathcal{O}$ to $\mathrm{ch}_{i-1}(\mathcal{O}, D)$, and adding to $\mathrm{ch}_i(\mathcal{O}, D)$ all facts obtained from atoms in $\psi(\mathbf{x}, \mathbf{z})$ by replacing each $x \in \mathbf{x}$ with $h(x)$ and each $z \in \mathbf{z}$ with some fresh constant. Then, $\mathrm{chase}(\mathcal{O}, D) = \bigcup_{k \geq 0} \mathrm{ch}_k(\mathcal{O}, D)$. Note that $\mathrm{chase}(\mathcal{O}, D)$ might be infinite.

A TGD with a body atom which has as terms all its universally quantified variables is said to be guarded. The language of guarded TGDs will be denoted as GTGD. Unlike evaluation of OMQs based on unrestricted TGDs, evaluation of OMQs from (GTGD, UCQ) is decidable [9]. By further restricting guarded TGDs to rules with universally quantified variables only, one obtains the ontology language guarded Datalog (GDLog).
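For rules without existential variables the chase introduces no fresh constants and reduces to a fixpoint computation. The following is a minimal sketch of that special case only, not of the oblivious chase in general: it assumes rules encoded as `(body, head)` pairs of atom lists, facts encoded as `(relation, tuple_of_constants)` pairs, and it does not check guardedness. The names `matches` and `datalog_chase` are hypothetical. The rule and database used for illustration are the ones from Example 3.3.

```python
def matches(body, facts, binding=None):
    """Yield every assignment of constants to variables that maps all body atoms onto facts."""
    binding = binding or {}
    if not body:
        yield dict(binding)
        return
    (rel, variables), rest = body[0], body[1:]
    for fact_rel, args in facts:
        if fact_rel != rel or len(args) != len(variables):
            continue
        candidate = dict(binding)
        # setdefault binds a new variable, or returns its earlier value for comparison.
        if all(candidate.setdefault(v, c) == c for v, c in zip(variables, args)):
            yield from matches(rest, facts, candidate)


def datalog_chase(rules, database):
    """Apply all rules until no new fact can be derived (rules have no existential variables)."""
    facts = set(database)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            # Materialise the matches first, so the fact set is not mutated while iterating.
            for h in list(matches(body, facts)):
                for rel, variables in head:
                    fact = (rel, tuple(h[v] for v in variables))
                    if fact not in facts:
                        facts.add(fact)
                        changed = True
    return facts


# The single rule U(x, y, z) AND V(z, x) -> T(z, x) from Example 3.3.
RULES = [([("U", ("x", "y", "z")), ("V", ("z", "x"))], [("T", ("z", "x"))])]
D2 = {("R", ("a", "b")), ("S", ("b", "c")), ("U", ("a", "b", "c")), ("V", ("c", "a"))}
print(sorted(datalog_chase(RULES, D2) - D2))
# [('T', ('c', 'a'))]
```

The only derived fact is `T(c, a)`, which agrees with the chase stated for this database in Example 3.4.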
For every GDLog ontology $\mathcal{O}$ and every database $D$, $\mathrm{chase}(\mathcal{O}, D)$ is finite and, furthermore, for every fact $r(\mathbf{a}) \in \mathrm{chase}(\mathcal{O}, D)$, there exists a guarded set $\mathbf{a}'$ over $D$ such that $\mathbf{a} \subseteq \mathbf{a}'$.

Parameterized Complexity.
For $\Sigma$ some finite alphabet, a parameterized problem is a tuple $(P, \kappa)$, where $P \subseteq \Sigma^*$ is a problem and $\kappa : \Sigma^* \to \mathbb{N}$ is a PTime computable function called the parameterization of $P$. Such a parameterized problem is fixed-parameter tractable if there exists an algorithm for deciding $P$ for an input $x \in \Sigma^*$ in time $f(\kappa(x)) \cdot poly(|x|)$, where $f$ is a computable function and $poly$ is a polynomial. The class of all fixed-parameter tractable problems is denoted as FPT.

Given two parameterized problems $(P_1, \kappa_1)$ and $(P_2, \kappa_2)$ over alphabets $\Sigma_1$ and $\Sigma_2$, resp., an fpt-reduction from $(P_1, \kappa_1)$ to $(P_2, \kappa_2)$ is a function $R : \Sigma_1^* \to \Sigma_2^*$ with the following properties:
(1) $x \in P_1$ iff $R(x) \in P_2$, for every $x \in \Sigma_1^*$,
(2) there exists some computable function $f$ such that $R(x)$ is computable in time $f(\kappa_1(x)) \cdot poly(|x|)$, and
(3) there exists some computable function $g$ such that $\kappa_2(R(x)) \leq g(\kappa_1(x))$, for all $x \in \Sigma_1^*$.

Downey and Fellows [20] defined a hierarchy of parameterized complexity classes W[0] ⊆ W[1] ⊆ W[2] ⊆ ..., where W[0] = FPT and each inclusion is believed to be strict. Each class W[i], with $i \geq 0$, is closed under fpt-reductions.

A class of interest for us is W[1], as under the assumption that FPT ≠ W[1] it is possible to establish intractability results (non-membership in FPT) for parameterized problems. A well-known W[1]-complete problem is the parameterized $k$-clique problem, where the parameter is $k$: for an input $(G, k)$, with $G$ a graph and $k \in \mathbb{N}^*$, it asks whether $G$ has a $k$-clique. An even stronger assumption than FPT ≠ W[1] is the Exponential Time Hypothesis: it states that 3-SAT with $n$ variables cannot be decided in $2^{o(n)}$ time [26]. The assumption is standard in parameterized complexity theory and can be used as well to establish intractability results.

Structural Measures: Treewidth, Submodular Width. A hypergraph is a pair $H = (V, E)$ with $V$ a set of nodes and $E \subseteq 2^V \setminus \{\emptyset\}$ a set of edges.
Every relational structure $I$ has an associated hypergraph $H_I$ whose vertices are the constants of $I$ and whose edges are the sets of constants from tuples $\mathbf{a}$, for every atom $R(\mathbf{a})$ of $I$.

Let $H = (V, E)$ be a hypergraph. A tree decomposition of $H$ is a pair $\delta = (T_\delta, \chi)$, where $T_\delta = (V_\delta, E_\delta)$ is a tree, and $\chi$ is a labeling function $V_\delta \to 2^V$, i.e., $\chi$ assigns a subset of $V$ to each node of $T_\delta$ (informally called a 'bag'), such that:
(1) $\bigcup_{t \in V_\delta} \chi(t) = V$.
(2) If $e \in E$, then $e \subseteq \chi(t)$ for some $t \in V_\delta$.
(3) For each $v \in V$, the set of nodes $\{t \in V_\delta \mid v \in \chi(t)\}$ induces a connected subtree of $T_\delta$.

The treewidth of $H$, $\mathrm{TW}(H)$, is the smallest $k$ such that there exists a tree decomposition $(T_\delta, \chi)$ of $H$, with $T_\delta = (V_\delta, E_\delta)$, such that for every $t \in V_\delta$, $|\chi(t)| \leq k$. A function $f : 2^V \to \mathbb{R}$ is submodular if $f(X) + f(Y) \geq f(X \cap Y) + f(X \cup Y)$ for all $X, Y \subseteq V$. We only consider positive submodular functions $f$ that satisfy $f(\emptyset) = 0$ and are edge-dominated, that is, $f(e) \leq 1$ for all $e \in E$. The submodular width of $H$, $\mathrm{SMW}(H)$, is the smallest $k$ such that for every monotone submodular function $f$, there exists a tree decomposition $(T_\delta, \chi)$ of $H$, with $T_\delta = (V_\delta, E_\delta)$, such that $f(\chi(t)) \leq k$ for all $t \in V_\delta$.

The structural measures on hypergraphs can be lifted to relational structures, CQs, UCQs, and OMQs from (GTGD, UCQ). Let $X$ range over $\{\mathrm{TW}, \mathrm{SMW}\}$. Then, for a relational structure $I$, $X(I) = X(H_I)$. For a CQ $q$, $X(q) = X(H_{D[q]})$, while for a UCQ $q'$, $X(q') = \max \{ X(q) \mid q \text{ is a CQ in } q' \}$. Finally, for an OMQ $Q = (\mathcal{O}, \mathbf{S}, q)$, with $q$ a UCQ, $X(Q) = X(q)$.

As results concerning the complexity of evaluating classes of CSPs and OMQs from (GTGD, UCQ) over bounded arity schemas show, equivalence-based measures frequently play a role in characterizations of efficient (fpt) evaluation. Following [3, 4, 6], we will refer to such measures as semantic measures.
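The three tree decomposition conditions and the submodularity inequality can be checked mechanically on small hypergraphs. The sketch below assumes hyperedges and bags are given as Python sets, the tree is given as a list of node pairs that is trusted to form a tree, and all function names are ours. It verifies a given decomposition and evaluates a given function on its bags; it does not compute treewidth or submodular width, both of which require optimising over all decompositions (and, for submodular width, over all admissible functions).

```python
from itertools import combinations


def connected(nodes, tree_edges):
    """True iff the given non-empty node set induces a connected subgraph of the tree."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        t = stack.pop()
        for u, w in tree_edges:
            for source, target in ((u, w), (w, u)):
                if source == t and target in nodes and target not in seen:
                    seen.add(target)
                    stack.append(target)
    return seen == nodes


def is_tree_decomposition(vertices, edges, tree_edges, bags):
    """Check conditions (1)-(3); bags maps each tree node to a set of vertices."""
    if set().union(*bags.values()) != set(vertices):                      # condition (1)
        return False
    if not all(any(set(e) <= bag for bag in bags.values()) for e in edges):  # condition (2)
        return False
    return all(                                                           # condition (3)
        connected({t for t, bag in bags.items() if v in bag}, tree_edges)
        for v in vertices
    )


def is_submodular(f, vertices):
    """Brute-force check of f(X) + f(Y) >= f(X & Y) + f(X | Y) over all subset pairs."""
    vs = list(vertices)
    subsets = [frozenset(c) for r in range(len(vs) + 1) for c in combinations(vs, r)]
    return all(f(x) + f(y) >= f(x & y) + f(x | y) for x in subsets for y in subsets)


def max_bag_value(f, bags):
    """Largest value of f on a bag of one fixed decomposition."""
    return max(f(frozenset(bag)) for bag in bags.values())


# A path a - b - c as a hypergraph with two binary edges (toy input).
V = {"a", "b", "c"}
E = [{"a", "b"}, {"b", "c"}]
bags = {1: {"a", "b"}, 2: {"b", "c"}}
print(is_tree_decomposition(V, E, [(1, 2)], bags), max_bag_value(lambda x: len(x) / 2, bags))
# True 1.0
```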
In particular, for an OMQ $Q \in$ (GTGD, UCQ) and a structural measure $X \in \{\mathrm{TW}, \mathrm{SMW}\}$, we say that $Q$ has semantic $X$-width $k$ if there exists some OMQ $Q'$ from (GTGD, UCQ) such that $Q \equiv Q'$ and $X(Q') = k$. Bounded semantic $X$-width for a class of OMQs is then defined accordingly. However, in order to establish such characterizations, an important issue is finding witnesses for classes of problems of bounded semantic measures, i.e. classes of problems which are equivalent to the original ones and which have bounded syntactic measures.

In the case of CQs, their cores serve as witnesses for semantic treewidth [23] and also for semantic submodular width [17]. Thus, as concerns classes of CQs of bounded semantic $X$-width, with $X \in \{\mathrm{TW}, \mathrm{SMW}\}$, the class of cores of CQs from the original class serves as a witness, i.e. it has actual bounded $X$-width. In the case of OMQs, things become more complicated. It is no longer enough to consider the cores of CQs which occur in an OMQ in order to find witnesses of small width. The ontology also plays a role and can further lower the semantic width. For examples of this for OMQs from (GTGD, UCQ) as concerns semantic treewidth, see [3, 4]. Here, we show how the ontology influences semantic submodular width:

Example 3.1.
For $R$ a binary relational symbol, $i \in \mathbb{N}$ with $i > 1$, and $\mathbf{x}_i$ an $i$-tuple of variables, we denote with $\psi^R_i(\mathbf{x}_i)$ the formula $R(x_1, x_2) \wedge \cdots \wedge R(x_{i-1}, x_i)$. Let $\mathbb{Q}$ be the class of OMQs $(Q_i)_{i > 1}$, with $Q_i = (\mathcal{O}_i, \mathbf{S}_i, q_i)$, where:

$\mathcal{O}_i = \{ S_i(\mathbf{x}_i) \to \psi^R_i(\mathbf{x}_i) \}$
$\mathbf{S}_i = \{ S_i, T \}$
$q_i = \exists \mathbf{x}_i\, \psi^R_i(\mathbf{x}_i) \wedge \psi^T_i(\mathbf{x}_i)$

Then $\mathbb{Q}$ has unbounded submodular width: for every $i > 1$, $H_{D[q_i]}$ is the $i$-clique. Thus, for every tree decomposition $(T_\delta, \chi_\delta)$ of $H_{D[q_i]}$, with $T_\delta = (V_\delta, E_\delta)$, there must be some node $t \in V_\delta$ such that $\chi_\delta(t) = \mathbf{x}_i$. Let $f : 2^{\mathbf{x}_i} \to \mathbb{R}^+$ be the submodular function $f(X) = |X|/2$. Then $f$ is also edge-dominated with respect to $D[q_i]$, and the minimum, over all tree decompositions of $H_{D[q_i]}$, of the largest value of $f$ on a bag is $i/2$. Thus, $\mathrm{SMW}(Q_i) \geq i/2$ for every $i > 1$, and $\mathbb{Q}$ has unbounded submodular width. However, for every $i > 1$, it is the case that $R$ is not part of the schema $\mathbf{S}_i$. Thus, every database $D_i$ such that $D_i \models Q_i$ must contain an atom of the form $S_i(\mathbf{c}_i)$, where $\mathbf{c}_i$ is an $i$-tuple of constants. Then, for every $i > 1$, $Q_i$ is equivalent to an OMQ $Q'_i = (\mathcal{O}_i, \mathbf{S}_i, q'_i)$, with $q'_i = \exists \mathbf{x}_i\, S_i(\mathbf{x}_i) \wedge \psi^R_i(\mathbf{x}_i) \wedge \psi^T_i(\mathbf{x}_i)$.

Let $\mathbb{Q}'$ be the class of OMQs $(Q'_i)_{i > 1}$. As for every $i > 1$, $q'_i$ is guarded, i.e. it contains an atom $S_i(\mathbf{x}_i)$ which has as terms $\mathrm{var}(q'_i)$, it follows that $\mathrm{SMW}(q'_i) \leq 1$: this is due to the fact that only edge-dominated functions are considered when defining the submodular width, and thus the guard ensures that for every such function $f$, $f(\mathbf{x}_i) \leq 1$. Thus, $\mathbb{Q}'$ has bounded submodular width and $\mathbb{Q}$ has bounded semantic submodular width.

Example 3.1 shows that it is possible to find an equivalent OMQ of lower submodular width than the original one, by actually inflating the original CQ, i.e. adding extra atoms to it. This cannot be the case for treewidth, as it is a monotonic measure.
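The arithmetic behind Example 3.1 can be replayed numerically. The sketch below takes the example's statement that the hypergraph of $q_i$ is the $i$-clique as given, builds that clique on hypothetical vertex names `x1, ..., xi`, and checks that $f(X) = |X|/2$ is edge-dominated before the guard is added but no longer admissible once the guard hyperedge containing all variables is present. The function names are ours.

```python
from itertools import combinations


def half_size(x):
    """The function f(X) = |X| / 2 used in Example 3.1."""
    return len(x) / 2


def edge_dominated(f, edges):
    """True iff f(e) <= 1 for every hyperedge e."""
    return all(f(e) <= 1 for e in edges)


def example_3_1(i):
    """Return (f admissible without guard, f on the full bag, f admissible with guard)."""
    xs = [f"x{j}" for j in range(1, i + 1)]
    clique_edges = [frozenset(pair) for pair in combinations(xs, 2)]
    guard = frozenset(xs)  # hyperedge contributed by the added guard atom
    return (
        edge_dominated(half_size, clique_edges),
        half_size(guard),
        edge_dominated(half_size, clique_edges + [guard]),
    )


print(example_3_1(5))
# (True, 2.5, False)
```

Without the guard, $f$ is admissible and forces value $i/2$ on the bag that contains all variables. With the guard, $f$ is excluded for $i > 2$, and every remaining edge-dominated function takes value at most 1 on the single bag holding all variables, which is why the width drops to at most 1.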
The example also shows that, sometimes, in order to obtain an equivalent OMQ of lower submodular width, it is necessary to use relational symbols which do not occur in the original CQ.

In this section, we introduce a way to normalize OMQs from (GDLog, UCQ) to obtain equivalent OMQs which witness low submodular width. We do this by systematically adding, to the CQs in the original UCQ, atoms which occur in databases that entail the OMQ: the purpose of the added atoms is to provide guards for atoms in the query and to subsequently lower the submodular width, as in Example 3.1. Intuitively, by adding such guards, the submodular width is decreased, as the space of edge-dominated submodular functions is shrunk. We will consider only databases which are 'weak' with respect to the homomorphism order among the set of databases which entail the OMQ. As the homomorphism order is not well-founded, there are obviously no weakest, i.e. minimal, such databases. However, we will identify sets of databases which are weak enough for our purposes, which we will refer to as characteristic databases.

In our quest to find databases which are weak enough with respect to the homomorphism order, we start by considering databases which have a special property with respect to a given OMQ, called injectively only. This property was first introduced in [4]. A contraction of a CQ $q$ is any CQ obtained from $q$ by variable identification. If $q'$ and $q''$ are contractions of $q$, $q''$ is said to be a pre-contraction of $q'$ (with respect to $q$) if $D[q''] \to D[q']$ and $D[q'] \not\to D[q'']$. For a CQ $q$ and an instance $I$, we write $q \to_{io} I$ if $q \to I$ and every homomorphism from $q$ to $I$ is injective.

Definition 3.2.
For $Q = (\mathcal{O}, \mathbf{S}, q)$ from (GDLog, UCQ) and $D$ an $\mathbf{S}$-database such that $D \models Q$, we say that $D$ has the injectively only (io) property with respect to $Q$ if for every contraction $p'$ of some CQ $p$ in $q$ such that $D[p'] \to_{io} \mathrm{chase}(\mathcal{O}, D)$, there is no pre-contraction $p''$ of $p'$ with respect to $p$ and $\mathbf{S}$-database $D'$ such that:
(1) $D' \to D$, and
(2) $D[p''] \to_{io} \mathrm{chase}(\mathcal{O}, D')$.

Intuitively, io databases are databases which entail the OMQ and which are weak enough with respect to the homomorphism order such that any pre-database which still entails the OMQ behaves similarly to the original database as concerns injectively only mappings of contractions. When OMQs restrict database schemas, they are especially useful as they can be seen as a form of canonical databases which entail the OMQ (in the case of UCQ evaluation, where the database schema is unrestricted, it is common to consider as canonical databases the databases $D[q]$ induced by CQs $q$ in the UCQ). An important property of io databases is that for every OMQ $Q$ and database $D$ such that $D \models Q$, there exists a pre-database $D' \to D$ such that $D'$ has the io property with respect to $Q$ [4].

Example 3.3.
Let $Q = (\mathcal{O}, \mathbf{S}, q)$ be the OMQ with:

$\mathcal{O} = \{ U(x, y, z) \wedge V(z, x) \to T(z, x) \}$
$\mathbf{S} = \{ R, S, U, V \}$
$q = \exists x, y, z\; R(x, y) \wedge S(y, z) \wedge T(z, x)$

Also, let $D_1$ and $D_2$ be two $\mathbf{S}$-databases:

$D_1 = \{ R(a, b), S(b, a), U(a, b, a), V(a, a) \}$
$D_2 = \{ R(a, b), S(b, c), U(a, b, c), V(c, a) \}$

Then, both $D_1 \models Q$ and $D_2 \models Q$. Let $q'$ be the contraction of $q$ obtained by identification of $x$ and $z$: $q' = \exists x, y\; R(x, y) \wedge S(y, x) \wedge T(x, x)$. It can be verified that $D[q'] \to_{io} \mathrm{chase}(\mathcal{O}, D_1)$ and $D[q] \to_{io} \mathrm{chase}(\mathcal{O}, D_2)$. As $q$ is a pre-contraction of $q'$ with respect to itself, and $D_2 \to D_1$, it follows that $D_1$ does not have the io property with respect to $Q$. However, as $q$ admits no pre-contraction, and $D[q] \to_{io} \mathrm{chase}(\mathcal{O}, D_2)$, it follows that $D_2$ has the io property with respect to $Q$.

To further weaken databases which entail an OMQ, we exploit the fact that atomic queries with respect to GDLog programs are preserved under an operation called guarded unraveling. The operation was first introduced in [22]. A definition closer to our purposes is provided in [3]. For completeness, we repeat it here.

For an $\mathbf{S}$-database $D$ and a guarded set $\mathbf{a}$ over $D$, we construct a structure $I_{\mathbf{a}}$, the guarded unraveling of $D$ at $\mathbf{a}$, in parallel with a tree decomposition $(T, \chi)$, with $T = (V, E)$, as follows. $V$ is the set of all sequences of the form $\mathbf{a}_0 \ldots \mathbf{a}_n$, where $\mathbf{a}_0 = \mathbf{a}$, and $\mathbf{a}_1, \ldots, \mathbf{a}_n$ are maximal guarded sets in $D$ such that $\mathbf{a}_i \cap \mathbf{a}_{i+1} \neq \emptyset$ and $\mathbf{a}_i \neq \mathbf{a}_{i+1}$, for every $0 \leq i < n$. For such a sequence $s$, we denote with $\mathrm{tail}(s)$ its last component $\mathbf{a}_n$. For every $s_1, s_2 \in V$, it is the case that $(s_1, s_2) \in E$ iff $s_2 = s_1 \mathbf{b}$, for some maximal guarded set $\mathbf{b}$ in $D$. We further consider a set of constants $S$ such that $\mathbf{a} \subseteq S$ and such that $S$ contains infinitely many copies of every constant $a \in \mathrm{dom}(D)$. We define $\chi : V \to 2^S$ in parallel with $I_{\mathbf{a}}$ inductively. First, we set $\chi(\mathbf{a}) = \mathbf{a}$ and initialize $I_{\mathbf{a}}$ as $D|_{\mathbf{a}}$.
Then, for every node $t \in V$ such that $\chi(t)$ is defined, and for every $t' \in V$ such that $(t, t') \in E$, let $\mathrm{tail}(t) = \mathbf{b}_1$ and $\mathrm{tail}(t') = \mathbf{b}_2$. Then, $\mathbf{b}'$ is the set of constants obtained from $\mathbf{b}_2$ by replacing each constant $b \in \mathbf{b}_2 \setminus \mathbf{b}_1$ with a fresh copy from $S$. Let $\chi(t') = \mathbf{b}'$ and let $D'$ be a copy of the database $D|_{\mathbf{b}_2}$ in which the constants from $\mathbf{b}_2$ have been replaced with their counterparts from $\mathbf{b}'$. Then, $I_{\mathbf{a}} = I_{\mathbf{a}} \cup D'$.

The main property of the guarded unraveling construction is that for every OMQ $Q$ from (GDLog, AQ), every database $D$, every maximal guarded set $\mathbf{a}$ in $D$, and every tuple $\mathbf{a}' \subseteq \mathbf{a}$: $D \models Q(\mathbf{a}')$ implies $I_{\mathbf{a}} \models Q(\mathbf{a}')$. Furthermore, it follows directly from the construction above that $I_{\mathbf{a}}$ admits a tree decomposition in which each bag is guarded, i.e. the structure contains a fact which has as terms all constants from that bag. We call such a tree decomposition a guarded tree decomposition. Note that $I_{\mathbf{a}}$ might be infinite.

For an OMQ $Q = (\mathcal{O}, \mathbf{S}, q)$ from (GDLog, UCQ) and a database $D$ such that $D \models Q$, we will use guarded unravelings to disentangle parts of $D$ needed to satisfy different atoms of some CQ in $q$, while ensuring that the new database still entails the OMQ. However, we need finite versions of guarded unravelings in order to be able to construct such a new database. As such, for a guarded set $\mathbf{a}$ over $D$, we are interested in a database $D_{\mathbf{a}}$ which preserves query answers to OMQs of the form $(\mathcal{O}, \mathbf{S}, p(\mathbf{x}))$, where $p(\mathbf{x})$ is some atom occurring in a CQ in $q$. In this setting, it follows from Lemma 3 in [11] that it is possible to find a subset $D_{\mathbf{a}}$ of $I_{\mathbf{a}}$ of finite bounded size, where the bound is in the size of the original OMQ $Q$. Furthermore, it holds that $\mathbf{a} \subseteq \mathrm{dom}(D_{\mathbf{a}})$ and that $D_{\mathbf{a}}$ has a guarded tree decomposition $(T_\delta, \chi)$, with $T_\delta = (V_\delta, E_\delta)$, $\mathbf{a} \in V_\delta$, and $\chi(\mathbf{a}) = \mathbf{a}$.

Example 3.4.
Let $Q = (\mathcal{O}, \mathbf{S}, q)$ and $D_2$ be the OMQ and the database from Example 3.3, resp. We want to construct $D_{(a,c)}$ such that for every tuple $\mathbf{a}$ of constants from $(a, c)$ and every OMQ $Q'$ of the form $(\mathcal{O}, \mathbf{S}, p(\mathbf{x}))$ with $p(\mathbf{x})$ an atom from $q$, it is the case that $D_2 \models Q'(\mathbf{a})$ implies $D_{(a,c)} \models Q'(\mathbf{a})$.

We first notice that $\mathrm{chase}(\mathcal{O}, D_2) = D_2 \cup \{ T(c, a) \}$. The only new fact which can be inferred from $D_2$ using the unique TGD from $\mathcal{O}$ is $T(c, a)$. In particular, the facts $U(a, b, c)$ and $V(c, a)$ from $D_2$ have been used to infer this fact. The only OMQ $Q'$ which then needs to be preserved by $D_{(a,c)}$ is $Q' = (\mathcal{O}, \mathbf{S}, T(z, x))$. We have that $D_2 \models Q'(c, a)$ and want to ensure that $D_{(a,c)} \models Q'(c, a)$. Then, $D_{(a,c)}$ could be defined as $\{ U(a, b', c), V(c, a) \}$, where $b'$ is a fresh constant which is a copy of $b$ from $\mathrm{dom}(D_2)$. As concerns $D_{(a,b)}$ and $D_{(b,c)}$, they could be defined as $D_{(a,b)} = \{ R(a, b) \}$ and $D_{(b,c)} = \{ S(b, c) \}$. (We use the conditional 'could' as databases $D_{\mathbf{a}}$ need not be subset-minimal and as such there might be a choice when choosing a particular one which fits the constraints. It is enough for our purposes that they have finite bounded size.)

In this section we introduce the notion of diversification, which essentially is a pre-image of a database with special properties as concerns its mappings into the database. For a function $f$ and $A$ a subset of its domain, we denote with $f|_A$ the restriction of $f$ to $A$. For a database $D$, a constant $c \in \mathrm{dom}(D)$ is isolated in $D$ if it occurs in a single fact in $D$. The kernel of a database $D$, $\mathrm{ker}(D)$, is the set of non-isolated constants in $D$.

Definition 3.5.
A diversification of a database 𝐷 is a tuple ( 𝐷, ↑) ,where 𝐷 is a database which is a core, and which maps into 𝐷 viathe homomorphism ↑ which has the following properties:(1) ↑ | ker ( 𝐷 ) is injective, and(2) ↑ | a is injective, when a is a guarded set in 𝐷 (we say that ↑ is injective on guarded sets ).Whenever ( 𝐷, ↑) is a diversification of 𝐷 , we write 𝐷 (cid:22) 𝐷 .We observe that for every database 𝐷 ⊆ 𝐷 which is a core, ( 𝐷, ↑) is a diversification of 𝐷 , when ↑ is the identity function on dom ( 𝐷 ) . It can also be easily checked that (cid:22) is transitive. Example 3.6.
Let 𝑄 = (O , S , 𝑞 ) and 𝐷 be the OMQ and data-base from Example 3.3, respectively. Also, let 𝐷 be the S -database: { 𝑅 ( 𝑎, 𝑏 ) , 𝑆 ( 𝑏, 𝑐 ) , 𝑉 ( 𝑐, 𝑎 )} and ↑ be the identity mapping on dom ( 𝐷 ) .As 𝐷 is a core and 𝐷 ⊆ 𝐷 , it follows that ( 𝐷, ↑) is a diversificationof 𝐷 and thus 𝐷 (cid:22) 𝐷 .For 𝐷 and 𝐷 databases, and ↑ a mapping from dom ( 𝐷 ) to dom ( 𝐷 ) which is injective on guarded sets, we denote with 𝐷 + 𝐷 ↑ the data-base obtained from 𝐷 by adding for each maximal guarded set a in 𝐷 the database 𝐷 a obtained from 𝐷 ↑( a ) by renaming the constantsin ↑ ( a ) to those in a . When 𝐷 and ↑ are clear from the context,as in the case when ( 𝐷, ↑) is a diversification of 𝐷 , we write 𝐷 + instead 𝐷 + 𝐷 ↑ . Example 3.7.
Let 𝑄 and 𝐷 be the OMQ and the database fromExample 3.3, respectively. Also, let 𝐷 ( 𝑎,𝑏 ) , 𝐷 ( 𝑎,𝑐 ) , and 𝐷 ( 𝑏,𝑐 ) bethe guarded unravelings of 𝐷 from Example 3.4, and ( 𝐷, ↑) be thediversification of 𝐷 from Example 3.6. Then, the database 𝐷 + 𝐷 ↑ ,shortly 𝐷 + , is the union of 𝐷 , 𝐷 ( 𝑎,𝑏 ) , 𝐷 ( 𝑎,𝑐 ) , and 𝐷 ( 𝑏,𝑐 ) . Note thatno constants had to be replaced in the unravelled databases whenadding them to 𝐷 as ↑ is the identity mapping.We will now make operational what it means to disentangle adatabase as much as possible by using guarded unravelings. Definition 3.8.
Let 𝑄 be an OMQ from (GDLog, UCQ) over schema S and let 𝐷 be an S -database such that 𝐷 | = 𝑄 . Also, let ( 𝐷, ↑) be adiversification of 𝐷 . We say that ( 𝐷, ↑) is a minimal diversificationof 𝐷 with respect to 𝑄 if 𝐷 + | = 𝑄 and there exists no diversifica-tion ( 𝐷 ′ , ↓) of 𝐷 such that: ( 𝐷 ′ ) + | = 𝑄 , 𝐷 ′ (cid:22) 𝐷 , and 𝐷 (cid:14) 𝐷 ′ .Let div ( 𝐷 , 𝑄 ) be the set of all minimal diversifications of 𝐷 withrespect to 𝑄 .Intuitively, for such a minimal diversification ( 𝐷, ↑) of 𝐷 andthe ensuing database 𝐷 + , its 𝐷 -part preserves the part of 𝐷 whichis needed to satisfy the structure of a CQ, while the + -part, theguarded unravelings of 𝐷 , are needed to satisfy the actual atomswhich occur in a CQ. Example 3.9.
Let 𝑄 , 𝐷 , 𝐷 , ↑ , and 𝐷 + be as in Example 3.7. Thereexists no database 𝐷 ′ such that 𝐷 ′ (cid:22) 𝐷 , 𝐷 (cid:14) 𝐷 ′ , and function ↓ such that ( 𝐷 ′ , ↓) is a diversification of 𝐷 and ( 𝐷 ′ ) + 𝐷 ↓ | = 𝑄 . Tosee why this is the case, we observe that in order to obtain sucha database 𝐷 ′ , it is needed to either remove some fact from 𝐷 orto rename occurrences of some constant from dom ( 𝐷 ) in differ-ent facts in 𝐷 with fresh constants: e.g. 𝑅 ( 𝑎, 𝑏 ) and 𝑆 ( 𝑏, 𝑐 ) from 𝐷 become 𝑅 ( 𝑎, 𝑏 ) and 𝑆 ( 𝑏 , 𝑐 ) . But then, for every database 𝐷 ′ obtained by such operations it is no longer the case that there ex-ists some diversification ( 𝐷 ′ , ↓) of 𝐷 , for which ( 𝐷 ′ ) + | = 𝑄 . Thus, ( 𝐷, ↑) ∈ div ( 𝐷 , 𝑄 ) .We take a closer look at 𝐷 + . The hypergraph of 𝐷 , 𝐻 𝐷 , is iso-morphic to the hypergraph of 𝐷 [ 𝑞 ] , 𝐻 𝐷 [ 𝑞 ] , thus 𝐷 preserves thestructure of 𝑞 , while the added partial unravelings of 𝐷 satisfy theindividual atoms in 𝑞 : 𝐷 ( 𝑎,𝑏 ) ∪ O | = 𝑅 ( 𝑎, 𝑏 ) , 𝐷 ( 𝑎,𝑐 ) ∪ O | = 𝑇 ( 𝑐, 𝑎 ) , 𝐷 ( 𝑏,𝑐 ) ∪ O | = 𝑆 ( 𝑏, 𝑐 ) .We next show some properties of diversifications. For a database 𝐷 and a set 𝐴 ⊆ dom ( 𝐷 ) , we denote with 𝐷 ∩ 𝐴 the set of all facts 𝑅 ( a ) from 𝐷 such that a ∩ 𝐴 ≠ ∅ . Further on, we denote with 𝐷 𝑛 ∩ 𝐴 the database obtained from 𝐷 ∩ 𝐴 by replacing each constant 𝑎 in some fact 𝑅 ( a ) which is not from 𝐴 , with a fresh constant.For a function 𝑓 , we denote with ran ( 𝑓 ) the range of 𝑓 . If 𝑓 is ahomomorphism from a structure 𝐴 to a structure 𝐵 , 𝑓 ( 𝐴 ) denotesthe image of 𝐴 in 𝐵 under 𝑓 .The following technical lemma shows that given a homomor-phism ℎ from a CQ in an OMQ 𝑄 to the chase of a database ofthe form 𝐷 + 𝐷 ↑ , based on the target 𝐴 of ℎ in 𝐷 , it is possible tofind a pre-image of 𝐷 , 𝐷 𝑛 ∩ 𝐴 , which when extended with guardedunravelings of 𝐷 still entails the OMQ. 
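The operations 𝐷 ∩ 𝐴 and 𝐷 𝑛 ∩ 𝐴 defined above, together with the kernel from the previous subsection, can be illustrated with a minimal Python sketch. The encoding of a database as a set of `(relation, arguments)` pairs and the helper names `kernel`, `restrict`, and `restrict_fresh` are assumptions made for this illustration only. The sketch also assumes that a constant outside 𝐴 receives one fresh copy per fact in which it occurs, which is how we read the definition.

```python
from itertools import count

# Hypothetical encoding: a database is a set of facts,
# and a fact is a pair (relation_name, tuple_of_constants).

def kernel(db):
    """ker(D): constants that occur in more than one fact (non-isolated)."""
    occurrences = {}
    for _, args in db:
        for c in set(args):
            occurrences[c] = occurrences.get(c, 0) + 1
    return {c for c, n in occurrences.items() if n > 1}

def restrict(db, A):
    """D ∩ A: all facts R(a) of db whose arguments share a constant with A."""
    A = set(A)
    return {fact for fact in db if set(fact[1]) & A}

def restrict_fresh(db, A):
    """D^n ∩ A: as restrict, but every constant outside A is replaced,
    separately in each fact, by a fresh copy (assumed reading)."""
    A = set(A)
    fresh = count()
    result = set()
    for rel, args in sorted(restrict(db, A)):
        renaming = {}
        for c in args:
            if c not in A and c not in renaming:
                renaming[c] = f"{c}#{next(fresh)}"
        result.add((rel, tuple(renaming.get(c, c) for c in args)))
    return result
```

On the database {𝑅(𝑎, 𝑏), 𝑆(𝑏, 𝑐), 𝑉(𝑐, 𝑎)} with 𝐴 = {𝑏}, the sketch keeps the facts over 𝑅 and 𝑆 and renames 𝑎 and 𝑐 apart, so the kernel of the result is contained in 𝐴.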
Assuming that ( 𝐷, ↑) isnot a minimal diversification of 𝐷 with respect to 𝑄 , the construc-tion 𝐷 𝑛 ∩ 𝐴 is actually a way to find a weaker diversification of 𝐷 .This is a strategy which will be later used in the proof of Theo-rem 4.2 in Section 4. It also follows from this property that when ( 𝐷, ↑) is a minimal diversification of 𝐷 with respect to 𝑄 , ℎ alwayshits ker ( 𝐷 ) . This implies that 𝐷 , and subsequently 𝐷 + , have finitebounded size, with the bound in the size of 𝑄 . Lemma 3.10.
Let 𝑄 = (O , S , 𝑞 ) be an OMQ from ( GDLog , UCQ ) , 𝐷 and 𝐷 be S -databases, and ↑ be a homomorphism from 𝐷 to 𝐷 which is injective on guarded sets such that 𝐷 + 𝐷 ↑ | = 𝑄 . Further on,let ℎ be a homomorphism from a CQ 𝑝 in 𝑞 to chase (O , 𝐷 + 𝐷 ↑ ) andlet 𝐴 = ran ( ℎ ) ∩ dom ( 𝐷 ) . Then:(1) there exists a homomorphism ↓ from 𝐷 𝑛 ∩ 𝐴 to 𝐷 which is in-jective on guarded sets;(2) 𝑝 maps into chase (O , ( 𝐷 𝑛 ∩ 𝐴 ) + 𝐷 ↓ ) , where ↓ is from Point (1);(3) if ( 𝐷, ↑) ∈ div ( 𝐷 , 𝑄 ) , then ker ( 𝐷 ) ⊆ 𝐴 ;(4) if ( 𝐷, ↑) ∈ div ( 𝐷 , 𝑄 ) , then there exists some computablefunction 𝜌 , such that | 𝐷 + | ≤ 𝜌 (| 𝑄 |) . Proof. (1) Let 𝑔 be a homomorphism from 𝐷 𝑛 ∩ 𝐴 to 𝐷 ∩ 𝐴 whichis the identity on 𝐴 and which maps every fresh constantfrom dom ( 𝐷 𝑛 ∩ 𝐴 ) \ 𝐴 to its counterpart in 𝐷 ∩ 𝐴 . By construc-tion of 𝐷 𝑛 ∩ 𝐴 , 𝑔 is injective on guarded sets. Let ↓ be ↑ ◦ 𝑔 . Thecomposition of two homomorphisms which are injective on fficiency of Query Evaluation Under Guarded TGDs: The Unbounded Arity Case guarded sets is injective on guarded sets, thus ↓ is injectiveon guarded sets as well.(2) For every maximal guarded set a in dom ( 𝐷 𝑛 ∩ 𝐴 ) , a copy ofthe database 𝐷 ↓( a ) , has been added to 𝐷 𝑛 ∩ 𝐴 during the con-struction of ( 𝐷 𝑛 ∩ 𝐴 ) + 𝐷 ↓ by renaming the constants in ↓ ( a ) as a . Let 𝐷 a be this database. A copy of the database 𝐷 ↓( a ) has been added also to 𝐷 during the construction of 𝐷 + 𝐷 ↑ by renaming the constants in ↓ ( a ) as 𝑔 ( a ) (all such renam-ings are well defined as 𝑔 , ↑ , and ↓ are injective on guardedsets). Let 𝐷 𝑔 ( a ) be this latter database.For every maximal guarded set a ′ in dom ( 𝐷 ) , for whichthere exists some guarded set a in dom ( 𝐷 𝑛 ∩ 𝐴 ) s.t. 
𝑔 ( a ) = a ′ ,let 𝜅 a ′ be an isomorphism from 𝐷 a ′ to 𝐷 a which maps a ′ to a .Then, 𝜅 a ′ is an isomorphism also from chase (O , 𝐷 + 𝐷 ↑ ) | dom ( 𝐷 a ′ ) to chase (O , ( 𝐷 𝑛 ∩ 𝐴 ) + 𝐷 ↓ ) | dom ( 𝐷 a ) .Let ℎ ′ be the mapping from var ( 𝑝 ) to dom (( 𝐷 𝑛 ∩ 𝐴 ) + 𝐷 ↓ ) de-fined as follows: ℎ ′ ( 𝑥 ) = 𝜅 a ′ ( ℎ ( 𝑥 )) , when ℎ ( 𝑥 ) ∈ dom ( 𝐷 a ′ ) (note that for ℎ ( 𝑥 ) ∈ 𝐴 , this implies ℎ ′ ( 𝑥 ) = ℎ ( 𝑥 ) ). Asthe isomorphisms 𝜅 a ′ extend to the corresponding parts ofthe chases, it follows that ℎ ′ is a homomorphism from 𝑝 to chase (O , ( 𝐷 𝑛 ∩ 𝐴 ) + 𝐷 ↓ ) .(3) Assume the opposite: that ker ( 𝐷 ) * 𝐴 or, in other words, 𝐴 ∩ ker ( D ) ⊂ ker ( 𝐷 ) . From Point (2) above, we have that ( 𝐷 𝑛 ∩ 𝐴 ) + 𝐷 ↓ | = 𝑄 . Let 𝐷 − be the core of 𝐷 𝑛 ∩ 𝐴 . It is still thecase that ( 𝐷 − ) + 𝐷 ↓ | = 𝑄 . As ker ( 𝐷 𝑛 ∩ 𝐴 ) ⊆ 𝐴 ∩ ker ( 𝐷 ) (by con-struction), ker ( 𝐷 − ) ⊆ 𝐴 ∩ ker ( 𝐷 ) . Thus ker ( 𝐷 − ) ⊂ ker ( 𝐷 ) .Then, 𝐷 − ≺ 𝐷 , but 𝐷 ⊀ 𝐷 − . But then, ( 𝐷, ↑) ∉ div ( 𝐷 , 𝑄 ) –contradiction.(4) It is the case that | 𝐴 | is bounded by the maximal size of someCQ in 𝑞 . From Point (3) above, we know that ker ( 𝐷 ) ⊆ 𝐴 ,and thus | ker ( 𝐷 )| ≤ | 𝑄 | . Then, there exists a computablefunction 𝜏 such that | 𝐷 | ≤ 𝜏 (| 𝑄 |) . As for every database 𝐷 a with a a guarded set in 𝐷 , there exists a computable func-tion 𝜏 a such that | 𝐷 a | ≤ 𝜏 a (| 𝑄 |) , there exists a computablefunction 𝜌 such that | 𝐷 + | ≤ 𝜌 (| 𝑄 |) . (cid:3) In this section we put together the notions introduced in previoussubsections to define the set of characteristic databases of an OMQ.We start with some notations. For a set of databases D , we saythat D is reduced with respect to homomorphisms if it contains onlycores and for every two databases 𝐷 , 𝐷 ∈ D it is not the casethat 𝐷 → 𝐷 . Given two sets of databases D and D ′ , we say that D is equivalent to D ′ , D ≡ D ′ , if for every database 𝐷 ∈ D , thereexists a database 𝐷 ′ ∈ D ′ such that 𝐷 is equivalent with 𝐷 ′ , andvice versa. 
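For small databases, the notions of homomorphism, core, and being reduced with respect to homomorphisms can be checked by brute force. The following is a minimal sketch under the same hypothetical encoding of databases as sets of `(relation, arguments)` pairs; the function names are ours, and the exhaustive search is exponential and meant only to make the definitions concrete.

```python
from itertools import product

def domain(db):
    """Active domain of a database, in a fixed order."""
    return sorted({c for _, args in db for c in args})

def homomorphisms(src, dst):
    """Yield every map dom(src) -> dom(dst) that sends facts of src to facts of dst."""
    src_dom, dst_dom = domain(src), domain(dst)
    for image in product(dst_dom, repeat=len(src_dom)):
        h = dict(zip(src_dom, image))
        if all((rel, tuple(h[c] for c in args)) in dst for rel, args in src):
            yield h

def maps_into(src, dst):
    """True iff src -> dst, i.e. some homomorphism exists."""
    return next(homomorphisms(src, dst), None) is not None

def is_core(db):
    """A finite database is a core iff every endomorphism is injective."""
    return all(len(set(h.values())) == len(h) for h in homomorphisms(db, db))

def is_reduced(dbs):
    """Only cores, and no homomorphism between two distinct members of the list."""
    return all(is_core(d) for d in dbs) and not any(
        maps_into(d1, d2) for d1 in dbs for d2 in dbs if d1 is not d2
    )
```

For instance, {𝑅(𝑎, 𝑏)} and {𝑅(𝑎, 𝑎)} are both cores, but the set containing both is not reduced, since the first maps into the second.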
Note that for every set of databases D , there exists a setof databases D ′ which is reduced with respect to homomorphismsand which is equivalent to D and furthermore, each such set isunique up to isomorphism.For an OMQ 𝑄 from (GDLog, UCQ), we denote with D ++ 𝑄 theset of databases 𝐷 + for which there exists a database 𝐷 which has the io property with respect to 𝑄 and a function ↑ such that ( 𝐷, ↑) ∈ div ( 𝐷 , 𝑄 ) . We also denote with D + 𝑄 a set of databaseswhich is equivalent to D ++ 𝑄 and which is reduced with respect tohomomorphisms. We can finally provide our definition for charac-teristic databases. Definition 3.11.
Let 𝑄 be an OMQ from (GDLog, UCQ). The set ofcharacteristic databases for 𝑄 , denoted D 𝑄 , is the set of all databases 𝐷 for which there exists a database 𝐷 ′ from D + 𝑄 and a function ↑ such that ( 𝐷, ↑) ∈ div ( 𝐷 ′ , 𝑄 ) . Example 3.12.
Let Q be the class of OMQs from Example 3.1. Forevery 𝑖 > , let 𝐷 𝑖, be the S 𝑖 -database: { 𝑆 𝑖 ( a 𝑖 ) ,𝑇 ( 𝑎 , 𝑎 ) , . . .𝑇 ( 𝑎 𝑖 − , 𝑎 𝑖 )} We have that 𝐷 𝑖, | = 𝑄 𝑖 and furthermore, 𝐷 𝑖, has the io propertywith respect to 𝑄 𝑖 : there is exactly one injective homomorphism ℎ from 𝑞 𝑖 to chase (O 𝑖 , 𝐷 𝑖, ) with ℎ ( x 𝑖 ) = a 𝑖 .Let 𝐷 𝑖 be the S 𝑖 -database { 𝑆 ( a 𝑖 )} . It can be checked that ( 𝐷 𝑖 , ↑) ∈ div ( 𝐷 𝑖, , 𝑄 ) , where ↑ is the identity function on a 𝑖 . Thus, 𝐷 + 𝑖 ∈ D ++ 𝑄 𝑖 . At the same time, 𝐷 𝑖, ⊆ 𝐷 + 𝑖 (for any choice of guarded unrav-elings of 𝐷 𝑖, which are added to 𝐷 𝑖 as long as we still want for 𝐷 + 𝑖 to entail 𝑄 ) and 𝐷 + 𝑖 → 𝐷 𝑖, . Thus, 𝐷 𝑖, and 𝐷 + 𝑖 are equivalent andthere must be some structure 𝐷 ′ ∈ D + 𝑄 𝑖 such that 𝐷 ′ → 𝐷 𝑖, . At thesame time we observe that any database 𝐷 ′′ which entails 𝑄 𝑖 hasthe property that 𝐷 𝑖, → 𝐷 ′′ , and as 𝐷 𝑖, is a core, it must be iso-morphic to 𝐷 ′ . Thus, D + 𝑄 𝑖 = { 𝐷 𝑖, } . As ( 𝐷 𝑖 , ↑) ∈ div ( 𝐷 𝑖, , 𝑄 ) , it fol-lows that 𝐷 𝑖 ∈ D 𝑄 𝑖 . There are also other minimal diversificationsof 𝐷 𝑖, with respect to 𝑄 : actually each singleton set { 𝑇 ( 𝑥 𝑗 , 𝑥 𝑘 )} ,with 𝑗 ≠ 𝑘 , together with the identity mapping induces such adiversication. Thus, D 𝑞 𝑖 will contain these singleton sets as well.It might seem surprising that we define the set of characteristicdatabases by successive refinements, in which we use recurse twiceto minimal diversifications, first when defining D ++ 𝑄 , and secondwhen defining D 𝑄 . However, the minimization step incurring inthe definition of D + 𝑄 plays an important role in ‘standardizing’ ourdatabases. This will become more obvious in the following. Sup-pose we are interested to know whether the notion of characteris-tic databases is semantic, i.e. whether two equivalent OMQs haveisomorphic sets of characteristic databases. 
As initially in our con-structions we quantify over the set of all databases which have theio property, we first have a look at whether this notion is semantic.As the following example shows, the answer is negative: Example 3.13.
Let 𝑄₁ = (O₁, S, 𝑞₁) and 𝑄₂ = (O₂, S, 𝑞₂) be the following two equivalent OMQs: O₁ = {𝑅(𝑥, 𝑦) → 𝐴(𝑥)}, S = {𝑅}, 𝑞₁ = ∃𝑥 𝐴(𝑥); O₂ = ∅, S = {𝑅}, 𝑞₂ = ∃𝑥, 𝑦 𝑅(𝑥, 𝑦). Also, let 𝐷₁ = {𝑅(𝑎, 𝑏)} and 𝐷₂ = {𝑅(𝑎, 𝑎)} be two S-databases. It can be verified that 𝐷₁ has the io property with respect to both 𝑄₁ and 𝑄₂, but 𝐷₂ has the io property only with respect to 𝑄₁ and not with respect to 𝑄₂.

However, it turns out that the answer to our original question is positive, as shown in the following lemma. This is where the minimization step involved in the definition of D⁺_𝑄 comes into play. For two sets of databases D₁ and D₂, we say that they are equi-isomorphic, and we write D₁ ∼ D₂, if for every database 𝐷₁ ∈ D₁ there exists an isomorphic database 𝐷₂ ∈ D₂, and vice versa.

Lemma 3.14.
Let 𝑄₁ and 𝑄₂ be two OMQs from (GDLog, UCQ) such that 𝑄₁ ≡ 𝑄₂. Then, D⁺_{𝑄₁} ∼ D⁺_{𝑄₂}, and D_{𝑄₁} ∼ D_{𝑄₂}. Proof.
We first show that D + 𝑄 ∼ D + 𝑄 : it is enough to show thatfor every database 𝐷 ∈ D + 𝑄 , there exists some database 𝐷 ∈ D + 𝑄 such that 𝐷 and 𝐷 are isomorphic.As 𝐷 ∈ D + 𝑄 , 𝐷 must have the io property with respect to 𝑄 .However, while 𝐷 | = 𝑄 , according to Example 3.13, it is not guar-anteed that it also has the io property with respect to 𝑄 . Assum-ing this is not the case, there must be another database 𝐷 → 𝐷 which has the io property with respect to both 𝑄 and 𝑄 . Further-more, as 𝐷 | = 𝑄 , there must be a database 𝐷 → 𝐷 such that 𝐷 ∈ D + 𝑄 . Then, 𝐷 | = 𝑄 and there must be a database 𝐷 → 𝐷 such that 𝐷 ∈ D + 𝑄 . Thus, 𝐷 → 𝐷 and both are from D + 𝑄 . Then, 𝐷 and 𝐷 must be the same database. As 𝐷 → 𝐷 → 𝐷 and both 𝐷 and 𝐷 are cores, it follows that 𝐷 and 𝐷 are isomorphic. But 𝐷 ∈ D + 𝑄 , which is what we wanted to show.Thus, we have shown that D + 𝑄 ∼ D + 𝑄 . It follows then from thedefinition of sets of characteristic databases that D 𝑄 ∼ D 𝑄 . (cid:3) In the following, for an OMQ 𝑄 from (GDLog, UCQ), we will usethe set of characteristic databases for 𝑄 (in fact D + 𝑄 ) to constructa new OMQ 𝑄 𝑐 from (GDLog, UCQ) which is equivalent to 𝑄 andwhich is called the cover of 𝑄 . To construct the UCQ 𝑞 𝑐 from thecover we join CQs from the original UCQ 𝑞 with extensions ofdatabases from D 𝑄 . Definition 3.15.
Let 𝑄 = (O , S , 𝑞 ) be an OMQ from (GDLog,UCQ). For every database 𝐷 ∈ D + 𝑄 , every ( 𝐷, ↑) ∈ div ( 𝐷 , 𝑄 ) , ev-ery contraction 𝑝 of some CQ in 𝑞 , such that 𝐷 [ 𝑝 ] → 𝑖𝑜 chase (O , 𝐷 + ) ,and every (injective) homomorphism ℎ from 𝐷 [ 𝑝 ] to chase (O , 𝐷 + ) ,let 𝑝 ′ be a new Boolean CQ obtained from 𝑝 as follows:(1) let 𝐴 = ran ( ℎ ) ;(2) let 𝐷 ′ be a copy of ( 𝐷 + ) ∩ 𝐴 in which each constant 𝑎 ∈ 𝐴 isreplaced with its pre-image in 𝐷 [ 𝑝 ] , variable ℎ − ( 𝑎 ) ;(3) add to 𝑝 all atoms from 𝐷 ′ .Let 𝑞 𝑐 be the union of all CQs 𝑝 ′ as above and let 𝑄 𝑐 be the OMQ (O , S , 𝑞 𝑐 ) . We say that 𝑄 𝑐 is the cover of 𝑄 .Intuitively, when constructing the cover of an OMQ, one concen-trates on CQs which map i.o. into the chase (as these are the onlyones which are relevant from a complexity point of view) and fur-thermore, adds to each atom in such a CQ the potential guard froman extended characteristic database which allowed the atom to bededuced (for simplicity of exposition and proofs, we also add someextra atoms from this database, but these are not strictly needed). Example 3.16.
Let Q be again the class of OMQs introduced inExample 3.1. In Example 3.12 we have shown that for every 𝑖 > , D + 𝑄 𝑖 , is the singleton set { 𝐷 𝑖, } , with 𝐷 𝑖, being the S 𝑖 -database: { 𝑆 𝑖 ( a 𝑖 ) ,𝑇 ( 𝑎 , 𝑎 ) , . . . 𝑇 ( 𝑎 𝑖 − , 𝑎 𝑖 )} . It also follows from there, thatfor every diversification ( 𝐷, ↑) of 𝐷 𝑖, , 𝐷 𝑖, ⊆ 𝐷 + 𝑖 . For every such database 𝐷 + 𝑖 then, there exists some injective homomorphism ℎ from 𝑞 𝑖 to chase (O 𝑖 , 𝐷 + 𝑖 ) which maps x 𝑖 into a 𝑖 . We have that ( 𝐷 + 𝑖 ) ∩ a 𝑖 = 𝐷 𝑖, . Then, we construct a CQ 𝑞 ′ 𝑖 belonging to the coverof 𝑄 𝑖 by adding to 𝑞 𝑖 the atoms from 𝐷 𝑖, where each constantfrom a 𝑖 is replaced with the corresponding variable from x 𝑖 . Wecan show that all other CQs obtained via this procedure are sub-sumed by 𝑞 ′ 𝑖 . We define 𝑄 𝑖,𝑐 = (O 𝑖 , S 𝑖 , 𝑞 ′ 𝑖 ) as the cover of 𝑄 𝑖 andobserve that it is identical to the OMQ 𝑄 ′ 𝑖 from Example 3.1.As next lemma shows, covers are equivalent to original OMQs:. Lemma 3.17.
Let 𝑄 = (O , S , 𝑞 ) be an OMQ from ( GDLog , UCQ ) and let 𝑄 𝑐 = (O , S , 𝑞 𝑐 ) be its cover. Then, 𝑄 ≡ 𝑄 𝑐 . Proof.
It is clear that 𝑄 𝑐 ⊆ 𝑄 . We show that 𝑄 ⊆ 𝑄 𝑐 . Let 𝐷 besome S -database such that 𝐷 | = 𝑄 . Then, there exists a database 𝐷 such that 𝐷 → 𝐷 and 𝐷 has the io property with respect to 𝑄 . Furthermore, there exists a diversification ( 𝐷 , ↑) ∈ div ( 𝐷 , 𝑄 ) .Thus, 𝐷 + ∈ D ++ 𝑄 , and there exists a database 𝐷 ∈ D + 𝑄 with 𝐷 → 𝐷 + . Further on, there exists a diversification ( 𝐷 , ↓) ∈ div ( 𝐷 , 𝑄 ) .Then, there exists some contraction 𝑝 of some CQ in 𝑞 such that 𝐷 [ 𝑝 ] → 𝑖𝑜 chase (O , 𝐷 + ) . Let ℎ be an injective homomorphismfrom 𝐷 [ 𝑝 ] to chase (O , 𝐷 + ) and let 𝐴 = ran ( ℎ ) . By adding a copyof ( 𝐷 + ) ∩ 𝐴 to 𝑝 (where the constants from 𝐴 have been renamed asvariables from ℎ − ( 𝐴 ) ) we obtain a CQ 𝑝 𝑐 in 𝑞 𝑐 . It is easy to verifythat 𝑝 𝑐 maps into chase (O , 𝐷 + ) via the homomorphism ℎ . Thus, 𝐷 + | = 𝑄 𝑐 . As 𝐷 + → 𝐷 → 𝐷 + → 𝐷 → 𝐷 , it follows that 𝐷 | = 𝑄 𝑐 . (cid:3) Lemma 3.18.
For every OMQ 𝑄 ∈ (GDLog, UCQ), 𝑄_𝑐 is computable. Proof.
It is enough to show that D⁺_𝑄 is computable, or in other words, that it is decidable for a given database 𝐷 whether 𝐷 ∈ D⁺_𝑄 (up to isomorphism). Note that, from Lemma 3.10 Point (4), we know that the size of the domain of every database from D⁺⁺_𝑄, and subsequently the size of the domain of every database from D⁺_𝑄, is bounded in the size of 𝑄. Let 𝑘 be the bound in our particular case, let 𝑆 be a set of 𝑘 fresh distinct constants, and let D_{𝑄,𝑘} be the set of all S-databases 𝐷 with dom(𝐷) ⊆ 𝑆 for which 𝐷 ⊨ 𝑄. We show that the databases from D_{𝑄,𝑘} which are minimal with respect to the homomorphism order and which are cores are (up to isomorphism) exactly those from D⁺_𝑄. Let 𝐷 be a hom-minimal database from D_{𝑄,𝑘}, that is, for every database 𝐷′ ∈ D_{𝑄,𝑘}, 𝐷′ → 𝐷 implies 𝐷 → 𝐷′, which furthermore is a core. Then, there exists a database 𝐷₀ which has the io property with respect to 𝑄 such that 𝐷₀ → 𝐷. There must also be some core database 𝐷₁ such that 𝐷₁ → 𝐷₀ and 𝐷₁ ∈ D⁺_𝑄. But then, 𝐷₁ → 𝐷 and, as 𝐷 is hom-minimal, also 𝐷 → 𝐷₁. As both 𝐷 and 𝐷₁ are cores, they must be isomorphic, thus 𝐷 ∈ D⁺_𝑄 (up to isomorphism). □

We defer to Section 4 the proof that covers are syntactic witnesses for bounded semantic submodular width for classes of OMQs from (GDLog, UCQ).
In this section we establish the main results of our paper: a syntac-tic characterization for fixed-parameter tractability of OMQs from ( GDLog , UCQ ) , a reduction from parameterized uniform CSPs toparameterized OMQs from ( GDLog , UCQ ) , and a semantic charac-terization for fixed-parameter tractability of OMQs from ( GTGD , UCQ ) .We conclude the section with a discussion concerning covers ofOMQs from ( GDLog , UCQ ) : we show that they are indeed wit-nesses for bounded semantic submodular width as anticipated inthe previous section. For a class Q of OMQs from ( GDLog , UCQ ) , we denote with D Q the class of characteristic databases for OMQs from Q and with Q 𝑐 the class of covers for OMQs from Q . The first main result whichwe show in this section is as follows: Theorem 4.1 (Main Result 3).
Let Q be a recursively enumerable class of OMQs from (GDLog, UCQ). Under the Exponential Time Hypothesis, the following are equivalent:
(1) p-OMQ(Q) is fixed-parameter tractable;
(2) Q_𝑐 has bounded submodular width;
(3) D_Q has bounded submodular width.

Towards showing the above-mentioned result, the following fpt-reduction from the evaluation of parameterized CSPs to the evaluation of parameterized OMQs from (GDLog, UCQ) plays a very important role. In fact, as we will see in Section 5, the reduction also paves the way for other results, including retrieving results from [4] and [3].

Theorem 4.2 (Main Result 2).
For 𝑄 an OMQ from (GDLog,UCQ), there exists an fpt-reduction from p-CSP ( D 𝑄 , _ ) to p-OMQ ( 𝑄 ) . Proof.
Let ( 𝐷 𝑘 , 𝐵 ) be an instance of p-CSP ( D 𝑄 , _ ) . As 𝐷 𝑘 ∈ D 𝑄 , there exists a database 𝐷 ∈ D + 𝑄 which has the io propertywith respect to 𝑄 and some minimal diversification ( 𝐷 𝑘 , ↑) of 𝐷 with respect to 𝑄 . We construct a new database 𝐷 + as follows: first,let 𝐷 = 𝐷 𝑘 × 𝐵 and let 𝜋 be the projection on the first componentfrom 𝐷 to 𝐷 𝑘 . We drop from 𝐷 all atoms 𝑅 ( a ) for which 𝜋 | a is notinjective. At this stage, 𝐷 maps into 𝐷 via a homomorphism 𝜋 ◦ ↑ which is injective on guarded sets. Let 𝐷 + be a shortcut for 𝐷 + 𝐷 𝜋 ◦↑ and let 𝜋 + be the natural extension of 𝜋 as a homomorphism from 𝐷 + to 𝐷 + 𝑘 . Then, 𝜋 + is a homomorphism also from chase (O , 𝐷 + ) to chase (O , 𝐷 + 𝑘 ) . In the following, we show that 𝐷 𝑘 → 𝐵 iff 𝐷 + | = 𝑄 . Note that there exists some computable function 𝑓 such that 𝐷 + can be constructed from 𝐷 𝑘 and 𝐵 in time 𝑓 (| 𝐷 𝑘 |) 𝑝𝑜𝑙𝑦 (| 𝐵 |) .‘ ⇒ ’: Assume that 𝐷 + | = 𝑄 . Then, there exists a contraction 𝑝 of a CQ from 𝑞 such that 𝐷 [ 𝑝 ] → 𝑖𝑜 chase (O , 𝐷 + ) . Let ℎ be ahomomorphism from 𝐷 [ 𝑝 ] to chase (O , 𝐷 + ) and let 𝐴 = ran ( ℎ ) ∩ dom ( 𝐷 ) . Further on, we have that 𝜋 + ◦ ℎ is a homomorphism from 𝐷 [ 𝑝 ] to chase (O , 𝐷 + 𝑘 ) . As 𝐷 + 𝑘 → 𝐷 , 𝐷 + 𝑘 has the io property withrespect to 𝑄 , and thus 𝜋 + ◦ ℎ must be injective. To see why it isthe case, assume the opposite: then, there exists a contraction 𝑝 ′ of 𝑝 such that 𝐷 [ 𝑝 ′ ] → 𝑖𝑜 chase (O , 𝐷 + 𝑘 ) and 𝐷 [ 𝑝 ′ ] 6→ 𝐷 [ 𝑝 ] . Thus, 𝑝 is a pre-contraction of 𝑝 ′ and as 𝐷 + → 𝐷 + 𝑘 , the io condition for 𝐷 + 𝑘 is not fulfilled. As 𝐴 ⊆ ran ( ℎ ) , it follows that ( 𝜋 + )| 𝐴 is injective.Also, as 𝐴 ⊆ dom ( 𝐷 ) , it follows that 𝜋 | 𝐴 is injective. Let 𝐷 ′ = 𝐷 𝑛 ∩ 𝐴 . Then there exists a homomorphism 𝑔 from 𝐷 ′ to 𝐷 that is injective on guarded sets and that is the identity on 𝐴 . We have that 𝜋 ◦ 𝑔 is a homomorphism from 𝐷 ′ to 𝐷 𝑘 which isinjective on guarded sets and that is injective on 𝐴 . 
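The construction of the database 𝐷 at the beginning of this proof (the product 𝐷 𝑘 × 𝐵 with all facts dropped on which the projection 𝜋 is not injective) can be sketched as follows. This is a minimal illustration that assumes databases are encoded as sets of `(relation, arguments)` pairs and that product constants are encoded as pairs; the function names are hypothetical, and the subsequent step of adding guarded unravelings to obtain 𝐷 + is not covered.

```python
from itertools import product

def direct_product(dk, b):
    """Facts R((a1,b1),...,(an,bn)) for R(a1,...,an) in dk and R(b1,...,bn) in b."""
    return {
        (rel, tuple(zip(args_k, args_b)))
        for (rel, args_k), (rel_b, args_b) in product(dk, b)
        if rel == rel_b and len(args_k) == len(args_b)
    }

def projection_injective(args):
    """True iff projecting to the first component is injective
    on the set of product constants occurring in one fact."""
    constants = set(args)
    return len({first for first, _ in constants}) == len(constants)

def pruned_product(dk, b):
    """The product with every fact removed on which the projection is not injective."""
    return {fact for fact in direct_product(dk, b) if projection_injective(fact[1])}
```

The pruning is what makes the composition of the projection with ↑ injective on guarded sets, which the construction of 𝐷 + requires.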
As ker ( 𝐷 ′ ) ⊆ 𝐴 ,it follows that ( 𝐷 ′ , 𝜋 ◦ 𝑔 ) is a diversification of 𝐷 𝑘 (actually the coreof 𝐷 ′ ) or 𝐷 ′ (cid:22) 𝐷 𝑘 and by transitivity 𝐷 ′ (cid:22) 𝐷 . From Lemma 3.10Point (2) we have that ( 𝐷 𝑛 ∩ 𝐴 ) + 𝐷 ↑◦ 𝜋 ◦ 𝑔 | = 𝑄 . As 𝐷 ′ (cid:22) 𝐷 𝑘 and 𝐷 𝑘 ∈ div ( 𝐷 , 𝑄 ) , it must be the case that 𝐷 𝑘 (cid:22) 𝐷 ′ , thus 𝐷 𝑘 → 𝐷 ′ → 𝐷 → 𝐵 .‘ ⇐ ’: Assume that 𝐷 𝑘 → 𝐵 . Then, 𝐷 𝑘 maps into 𝐷 𝑘 × 𝐵 via somehomomorphism ℎ . At the same time 𝐷 𝑘 × 𝐵 maps into 𝐷 𝑘 via theprojection mapping 𝜋 . Thus, 𝜋 ◦ ℎ is a homomorphism from 𝐷 𝑘 to itself. As 𝐷 𝑘 is a core, 𝜋 ◦ ℎ must be injective and surjective.If 𝐴 = ran ( ℎ ) , it must be the case that 𝜋 | 𝐴 is injective. Thus, therestriction of 𝐷 𝑘 × 𝐵 to 𝐴 , ( 𝐷 𝑘 × 𝐵 )| 𝐴 is a sub-structure of 𝐷 , thestructure obtained from 𝐷 𝑘 × 𝐵 by removing all facts 𝑟 ( a ) such that 𝜋 | a is not injective. Thus, ℎ is a homomorphism from 𝐷 𝑘 to 𝐷 , or 𝐷 + 𝑘 → 𝐷 + . As 𝐷 + 𝑘 | = 𝑄 , it follows that 𝐷 + | = 𝑄 . (cid:3) As explained in the Introduction, the following is known aboutcomplexity of evaluating CSPs of unbounded arity:
Theorem 4.3 (Theorem 1, [17]).
Let C be a recursively enumerable class of structures. Assuming the Exponential Time Hypothesis, p-CSP(C, _) is fixed-parameter tractable if and only if C has bounded semantic submodular width.

Furthermore, it is also shown in [17] that the semantic submodular width of a structure is witnessed by its core. For a class of OMQs Q, all the structures in D_Q are cores. Thus, assuming that D_Q has unbounded submodular width, by [17] we know that it also has unbounded semantic submodular width, by Theorem 4.3 we know that p-CSP(D_Q, _) cannot be evaluated in FPT, and by our reduction from Theorem 4.2 we obtain that p-OMQ(Q) cannot be evaluated in FPT either. This is the contrapositive of direction ‘1 ⇒ 3’ of Theorem 4.1.

We next establish direction ‘3 ⇒ 2’ of Theorem 4.1, i.e. that bounded submodular width for D_Q implies bounded submodular width for Q_𝑐. We actually show a stronger result, namely that the submodular width of D_Q is equal to the submodular width of Q_𝑐:

Lemma 4.4.
Let 𝑄 = (O , S , 𝑞 ) be an OMQ from ( GDLog , 𝑈𝐶𝑄 ) and 𝑄 𝑐 = (O , S , 𝑞 𝑐 ) be its cover. Then, SMW ( 𝑄 𝑐 ) = SMW ( D 𝑄 ) . Proof.
We show that for every CQ in 𝑞 𝑐 there exists a databasein D 𝑄 with the same submodular width and vice versa. Let 𝑝 ′ besuch a CQ. Then 𝐷 [ 𝑝 ′ ] is of the form 𝐷 [ 𝑝 ] ∪ 𝐷 ′ , where 𝑝 is aCQ in 𝑞 and 𝐷 ′ is of the form ( 𝐷 + ) ∩ var ( 𝑝 ) , with ker ( D ) ⊆ var ( 𝑝 ) for some database 𝐷 ∈ D 𝑄 and extension 𝐷 + : this is due to thefact that 𝐷 [ 𝑝 ] → 𝑖𝑜 chase (O , 𝐷 ) and there exists a minimal diver-sification ( 𝐷, ↑) of some database 𝐷 ′′ from D + 𝑄 . Then by applyingLemma 3.10 Point (3) we obtain that ker ( 𝐷 ) ⊆ var ( 𝑝 ) . Conversely,for every 𝐷 ∈ D Q , there exists such a CQ 𝑝 ′ in 𝑞 𝑐 , and thus itenough to show that: SMW ( 𝐷 [ 𝑝 ′ ]) = SMW ( 𝐷 ) .By construction and guardedness, for every atom 𝑅 ( a ) in 𝐷 [ 𝑝 ] ,there exists an atom 𝑆 ( b ) in 𝐷 ′ such that a ⊆ b . Thus, 𝐷 [ 𝑝 ′ ] and 𝐷 ′ have exactly the same maximal guarded sets, and then it is easyto see that SMW ( 𝐷 [ 𝑝 ′ ]) = SMW ( 𝐷 ′ ) . Thus, in the following wewill show that SMW ( 𝐷 ) = SMW ( 𝐷 ′ ) . ristina Feier ‘ SMW ( 𝐷 ) ≤ SMW ( 𝐷 ′ ) ’: Assume SMW ( 𝐷 ′ ) = 𝑘 . Let 𝑓 be apositive monotone sub-modular function which is edge dominatedwith respect to 𝐷 . Also let 𝑓 ′ be the following extension of 𝑓 on dom ( 𝐷 ′ ) : 𝑓 ′ ( 𝑆 ) = 𝑓 ( 𝑆 ∩ dom ( 𝐷 )) , for every 𝑆 ⊆ dom ( 𝐷 ′ ) . That 𝑓 ′ is positive, monotone, and submodular follows immmediatelyfrom the fact that 𝑓 is so. To see that 𝑓 ′ is edge dominated, weobserve that for every maximal guarded set a ′ in 𝐷 ′ , the set b = a ′ ∩ dom ( 𝐷 ) can be extended to a maximal guarded set a over 𝐷 , andthus 𝑓 ′ ( a ′ ) = 𝑓 ( b ) , where by monotonicity 𝑓 ( b ) ≤ 𝑓 ( a ) and byedge-domination, 𝑓 ( a ) ≤
1, and thus 𝑓′(a′) ≤ 1. As SMW(𝐷′) = 𝑘, there must be some tree decomposition (𝑇′_𝛿, 𝜒′) of 𝐻_{𝐷′}, with 𝑇′_𝛿 = (𝑉′_𝛿, 𝐸′_𝛿), such that 𝑓′(𝜒′(𝑡)) ≤ 𝑘 for all 𝑡 ∈ 𝑉′_𝛿. Let (𝑇_𝛿, 𝜒), with 𝑇_𝛿 = (𝑉_𝛿, 𝐸_𝛿), be the projection of the tree decomposition (𝑇′_𝛿, 𝜒′) on dom(𝐷), i.e.:
(1) 𝑉_𝛿 contains all 𝑡 ∈ 𝑉′_𝛿 for which 𝜒′(𝑡) ∩ dom(𝐷) ≠ ∅;
(2) 𝐸_𝛿 contains all pairs (𝑡₁, 𝑡_𝑛) for which there exists a path (𝑡₁, . . . , 𝑡_𝑛) in 𝑇′_𝛿, i.e. (𝑡_𝑖, 𝑡_{𝑖+1}) ∈ 𝐸′_𝛿 for every 𝑖 ∈ [𝑛 − 1], such that 𝑡₁, 𝑡_𝑛 ∈ 𝑉_𝛿 and 𝑡_𝑗 ∉ 𝑉_𝛿 for 𝑗 ∉ {1, 𝑛};
(3) for all 𝑡 ∈ 𝑉_𝛿: 𝜒(𝑡) = 𝜒′(𝑡) ∩ dom(𝐷).

It can be verified that, as 𝐻_𝐷 is connected and is a sub-hypergraph of 𝐻_{𝐷′}, (𝑇_𝛿, 𝜒) is a tree decomposition of 𝐻_𝐷. Furthermore, for every 𝑡 ∈ 𝑉_𝛿, 𝑓(𝜒(𝑡)) = 𝑓(𝜒′(𝑡) ∩ dom(𝐷)) = 𝑓′(𝜒′(𝑡)). As 𝑓′(𝜒′(𝑡)) ≤ 𝑘, it follows that 𝑓(𝜒(𝑡)) ≤ 𝑘.

‘SMW(𝐷′) ≤ SMW(𝐷)’: Let SMW(𝐷) = 𝑘 and 𝑓 be a positive monotone submodular function on 2^dom(𝐷′) which is edge dominated with respect to 𝐷′. Its restriction 𝑓′ to 2^dom(𝐷) is positive, monotone, submodular, and edge dominated with respect to 𝐷. As SMW(𝐷) = 𝑘, there must be some tree decomposition (𝑇_𝛿, 𝜒) of 𝐻_𝐷, with 𝑇_𝛿 = (𝑉_𝛿, 𝐸_𝛿), such that 𝑓(𝜒(𝑡)) ≤ 𝑘 for all 𝑡 ∈ 𝑉_𝛿.

The tree decomposition (𝑇_𝛿, 𝜒) of 𝐻_𝐷 can be extended to a decomposition (𝑇⁺_𝛿, 𝜒⁺) of 𝐻_{𝐷⁺}, with 𝑇⁺_𝛿 = (𝑉⁺_𝛿, 𝐸⁺_𝛿), by exploiting the fact that 𝐷⁺ was constructed from 𝐷 by adding copies of unravelled databases of the form 𝐷_a to 𝐷. Note that each such unravelled database admits a tree decomposition (𝑇_a, 𝜒_a), with 𝑇_a = (𝑉_a, 𝐸_a) having as root a, such that 𝜒_a(a) = a and 𝜒_a(𝑡) is a guarded set in 𝐷⁺, for every 𝑡 ∈ 𝑉_a. To obtain 𝑇⁺_𝛿, we glue the trees 𝑇_a to 𝑇_𝛿, by considering some node 𝑡_a ∈ 𝑇_𝛿 such that a ⊆ 𝜒(𝑡_a) and identifying it with a.
Then 𝑉 + 𝛿 = 𝑉 𝛿 ∪ Ð 𝑉 a , and 𝐸 + 𝛿 = 𝐸 𝛿 ∪ Ð 𝐸 a and 𝜒 + is theextension of 𝜒 on V + 𝛿 that agrees with 𝜒 a on nodes from 𝑉 a .At the same time 𝐷 ′ ⊆ 𝐷 + , and thus it is possible to projectthe tree decomposition ( 𝑇 + 𝛿 , 𝜒 + ) on dom ( 𝐷 ′ ) and we obtain a treedecomposition ( 𝑇 ′ 𝛿 , 𝜒 ′ ) of 𝐷 ′ . As concerns the cost with respect to 𝑓 of the decomposition ( 𝑇 ′ 𝛿 , 𝜒 ′ ) , for every 𝑡 ∈ 𝑉 ′ 𝛿 \ 𝑉 𝛿 , 𝑓 ( 𝜒 ( 𝑡 )) = 𝑓 ( 𝜒 a ( 𝑡 )) , for some guarded set a over 𝐷 ′ and thus due to edgedomination: 𝑓 ( 𝜒 ( 𝑡 )) ≤
1. Thus, 𝑓(𝜒′(𝑡)) ≤ max(1, 𝑘) = 𝑘, for every 𝑡 ∈ 𝑉′_𝛿, and thus SMW(𝐷′) ≤ 𝑘. □

We next show the fpt upper bound, i.e. direction ‘2 ⇒ 1’ of Theorem 4.1: if Q_𝑐 has bounded submodular width, then p-OMQ(Q) is fixed-parameter tractable. The result follows from the fact that, for an OMQ 𝑄, 𝑄_𝑐 is computable from 𝑄 (Lemma 3.18), together with the following lemma (which also concerns OMQs based on GTGDs):

Lemma 4.5.
Let Q be a class of OMQs from (GTGD, UCQ) of bounded submodular width. Then, p-OMQ(Q) is fixed-parameter tractable. Proof.
We will use some results from [3]. Let L be the language of linear TGDs, i.e. TGDs which contain only one atom in the body. Given an OMQ 𝑄 = (O, S, 𝑞) from (GTGD, UCQ) and a database 𝐷, Lemma A.3 from [3] shows how to construct in FPT another OMQ 𝑄∗ = (O∗, S∗, 𝑞) from (L, UCQ) and a database 𝐷∗ such that 𝐷 ⊨ 𝑄 iff 𝐷∗ ⊨ 𝑄∗. Then, Lemma A.1 from [3] shows that deciding whether 𝐷∗ ⊨ 𝑄∗ can be done by considering a finite portion of chase(O∗, 𝐷∗), which again can be computed in FPT. Thus, 𝐷 ⊨ 𝑄 iff 𝐷′ ⊨ 𝑞, where 𝐷′ is a database which can be computed in FPT. Thus, assuming that 𝑞 has bounded submodular width, according to Theorem 4.3, deciding whether 𝐷′ ⊨ 𝑞 is in FPT as well. □

The main result which we will show in this section is as follows.
Theorem 4.6 (Main Result 1).
Let Q be a recursively enumerable class of OMQs from (GTGD, UCQ). Then, under the Exponential Time Hypothesis, Q has bounded semantic submodular width iff p-OMQ(Q) is fixed-parameter tractable. Proof.
For two classes of OMQs Q₁ and Q₂, we say that Q₁ ≡ Q₂ iff for every OMQ 𝑄₁ ∈ Q₁ there exists an OMQ 𝑄₂ ∈ Q₂ such that 𝑄₁ ≡ 𝑄₂, and vice versa. We will also use the fact that for every OMQ 𝑄 ∈ (GTGD, UCQ), there exists an OMQ 𝑄′ ∈ (GDLog, UCQ), the existential rewriting of 𝑄, such that 𝑄 ≡ 𝑄′ [3].

‘⇒’: Assume that Q has bounded semantic submodular width. Thus, there exists some class Q_𝑘 of OMQs of bounded submodular width such that Q ≡ Q_𝑘. Let Q′ and Q′_𝑘 be the classes of existential rewritings of OMQs from Q and Q_𝑘, respectively. Then, Q ≡ Q′ ≡ Q_𝑘 ≡ Q′_𝑘 and both Q′ and Q′_𝑘 are from (GDLog, UCQ). From Lemma 4.5, we know that Q_𝑘 can be evaluated in FPT, and thus Q′_𝑘 can be evaluated in FPT as well. From Theorem 4.1, D_{Q′_𝑘} has bounded submodular width. Also, from Lemma 3.14, given that Q′ ≡ Q′_𝑘, we know that D_{Q′_𝑘} ∼ D_{Q′}. Thus, D_{Q′} has bounded submodular width as well and, according to Theorem 4.1, Q′ can be evaluated in FPT. Thus, so can Q, as there exists an fpt-reduction from Q to Q′.

‘⇐’: If Q can be evaluated in FPT, then so can the class Q′ composed of the existential rewritings of OMQs from Q. But then, as Q′ ⊆ (GDLog, UCQ), according to Theorem 4.1, the class Q′_𝑐 of covers of OMQs from Q′ has bounded submodular width, and from the fact that Q ≡ Q′ and Q′ ≡ Q′_𝑐, it follows that Q has bounded semantic submodular width. □

In Section 3 we anticipated that covers of OMQs can serve as witnesses for semantic submodular width. In this section, we make this more precise. First, we obtain as a corollary of Theorem 4.1 and Theorem 4.6 that:
Corollary 4.7.
Let Q be a class of OMQs from (GDLog, UCQ). Then, Q has bounded semantic submodular width iff its class of covers Q_𝑐 has bounded submodular width.

We next show that covers indeed lower submodular width:
Lemma 4.8.
Let 𝑄 = (O, S, 𝑞) be an OMQ from (GDLog, UCQ) and 𝑄_𝑐 be the cover of 𝑄. Then, SMW(𝑄_𝑐) ≤ SMW(𝑄). Proof.
From Lemma 4.4 we know that SMW(𝑄_𝑐) = SMW(D_𝑄). We will show then that SMW(D_𝑄) ≤ SMW(𝑄). Let 𝐷 ∈ D_𝑄. Then, there exists a minimal diversification (𝐷, ↑) of some database 𝐷′ ∈ D⁺_𝑄 with respect to 𝑄 such that for some CQ 𝑝 in 𝑞: 𝐷[𝑝] maps into chase(O, 𝐷⁺) via some homomorphism ℎ. Let 𝐴 = ran(ℎ) ∩ dom(𝐷). From Lemma 3.10 Point (3), we know that ker(𝐷) ⊆ 𝐴.

Assume that SMW(𝐷[𝑝]) = 𝑘. Let 𝑓 be a positive monotone submodular function on 2^dom(𝐷) which is edge dominated with respect to 𝐷. Also let 𝑓′ be the function on 2^dom(𝐷[𝑝]) such that 𝑓′(𝑆) = 𝑓(ℎ(𝑆) ∩ dom(𝐷)), for every 𝑆 ⊆ var(𝑝). Then 𝑓′ is edge dominated as well: for every guarded set a in 𝐷[𝑝], ℎ(a) ∩ dom(𝐷) is a guarded set. As SMW(𝐷[𝑝]) = 𝑘, there must be some tree decomposition (𝑇′_𝛿, 𝜒′) of 𝐻_{𝐷[𝑝]}, with 𝑇′_𝛿 = (𝑉′_𝛿, 𝐸′_𝛿), such that 𝑓′(𝜒′(𝑡′)) ≤ 𝑘 for all 𝑡′ ∈ 𝑉′_𝛿. We define a tree decomposition (𝑇_𝛿, 𝜒) of 𝐻_𝐷, with 𝑇_𝛿 = (𝑉_𝛿, 𝐸_𝛿), as follows:
(1) 𝑉_𝛿 contains all 𝑡′ ∈ 𝑉′_𝛿 for which ℎ(𝜒′(𝑡′)) ∩ dom(𝐷) ≠ ∅; also, some fresh element 𝑡_a for each guarded set a in 𝐷 which contains an isolated constant;
(2) for all 𝑡 ∈ 𝑉′_𝛿 ∩ 𝑉_𝛿, let 𝜒(𝑡) = ℎ(𝜒′(𝑡)) ∩ dom(𝐷). Also, for each 𝑡_a ∈ 𝑉_𝛿, let 𝜒(𝑡_a) = a;
(3) 𝐸_𝛿 contains all pairs (𝑡₁, 𝑡_𝑛) for which there is a path (𝑡₁, . . . , 𝑡_𝑛) in 𝑇′_𝛿, i.e. (𝑡_𝑖, 𝑡_{𝑖+1}) ∈ 𝐸′_𝛿 for every 𝑖 ∈ [𝑛 − 1], such that 𝑡₁, 𝑡_𝑛 ∈ 𝑉_𝛿 and 𝑡_𝑗 ∉ 𝑉_𝛿 for 𝑗 ∉ {1, 𝑛}; also, for each 𝑡_a ∈ 𝑉_𝛿 and some 𝑡′ ∈ 𝑉_𝛿 such that 𝜒(𝑡′) ∩ a ≠ ∅, add (𝑡′, 𝑡_a) to 𝐸_𝛿.

It can be verified that (𝑇_𝛿, 𝜒) is a proper tree decomposition. Furthermore, for every 𝑡 ∈ 𝑉_𝛿 ∩ 𝑉′_𝛿, 𝑓(𝜒(𝑡)) = 𝑓′(𝜒′(𝑡)) ≤ 𝑘, and for every guarded set a in 𝐷 which contains an isolated constant, 𝑓(𝜒(𝑡_a)) ≤ 1 (as 𝜒(𝑡_a) contains only one fact). Thus, for every 𝑡 ∈ 𝑉_𝛿, 𝑓(𝜒(𝑡)) ≤ 𝑘, and thus SMW(𝐷) ≤ SMW(𝐷[𝑝]).
As for every database $D \in \mathcal{D}_Q$ there exists a CQ $p$ in $q$ with the properties from above, it follows that $\mathrm{SMW}(D) \le \mathrm{SMW}(D[p])$ for every such $D$, and hence $\mathrm{SMW}(\mathcal{D}_Q) \le \mathrm{SMW}(Q)$. □

Finally, we show that for OMQs from (GDLog, UCQ), covers can serve as witnesses for semantic submodular width. This holds under the assumption that we only consider OMQs from (GDLog, UCQ) as syntactic witnesses for semantic submodular width. It is subject to future work to see whether this extends to the case where OMQs from (GTGD, UCQ) are allowed as witnesses.

Theorem 4.9. Let $Q$ be an OMQ from (GDLog, UCQ) of semantic submodular width $k$. Also, let $Q_c$ be the cover of $Q$. Then, $Q_c$ has submodular width $k$.

Proof. As $Q$ has semantic submodular width $k$, there exists an OMQ $Q_k$ from (GDLog, UCQ) such that $Q \equiv Q_k$ and $Q_k$ has syntactic submodular width $k$. Let $Q_{k,c}$ be the cover of $Q_k$. Then, according to Lemma 4.8, $\mathrm{SMW}(Q_{k,c}) \le \mathrm{SMW}(Q_k)$, and as the submodular width of $Q_k$ is already optimal, it follows that $\mathrm{SMW}(Q_{k,c}) = k$. But then, from Lemma 4.4 we know that $\mathrm{SMW}(\mathcal{D}_{Q_{k,c}}) = k$. As $Q \equiv Q_k \equiv Q_{k,c}$, from Lemma 3.14 it follows that $\mathcal{D}_Q \sim \mathcal{D}_{Q_{k,c}}$, and thus $\mathrm{SMW}(\mathcal{D}_Q) = k$. By applying Lemma 4.4 once again, this time for $Q$, we obtain that $\mathrm{SMW}(Q_c) = \mathrm{SMW}(\mathcal{D}_Q) = k$. □

In this section, we revisit the characterization for fpt evaluation of OMQs from (GTGD, UCQ) over bounded arity schemas from [3]. We say that a class of OMQs $\mathcal{Q}$ is over bounded arity schemas if there exists a $k$ such that every schema in some OMQ in $\mathcal{Q}$ contains symbols of arity at most $k$. The characterization is as follows:

Theorem 5.1 (Theorem 5.3, [3]).
Let $\mathcal{Q}$ be a recursively enumerable class of OMQs from (GTGD, UCQ) over bounded arity schemas. Assuming $\mathrm{FPT} \neq \mathrm{W}[1]$, the following are equivalent:
(1) p-OMQ($\mathcal{Q}$) is fixed-parameter tractable.
(2) $\mathcal{Q}$ has bounded semantic treewidth.
If either statement is false, then p-OMQ($\mathcal{Q}$) is W[1]-hard.

The characterization generalizes a previous result concerning the complexity of evaluating OMQs from ($\mathcal{ELHI}_{\bot}$, UCQ) [4]. It can also be seen as a generalization of Grohe's results regarding the parameterized complexity of uniform CSPs over bounded arity schemas [23]:

Theorem 5.2 (Theorem 1, [23]). Assume that $\mathrm{FPT} \neq \mathrm{W}[1]$. Then for every r.e. class $\mathcal{C}$ of structures of bounded arity the following statements are equivalent:
(1) CSP($\mathcal{C}$, \_) is in polynomial time.
(2) p-CSP($\mathcal{C}$, \_) is fixed-parameter tractable.
(3) $\mathcal{C}$ has bounded treewidth modulo homomorphic equivalence.
If either statement is false, then p-CSP($\mathcal{C}$, \_) is W[1]-hard.

Direction ‘2 ⇒ 1’ of Theorem 5.1 is established in [3] using arguments regarding the chase construction and applying the result from [23] on a finite portion of the chase. As concerns the lower bound (direction ‘1 ⇒ 2’), W[1]-hardness of evaluation for classes of parameterized uniform CSPs of unbounded treewidth over bounded arity schemas is shown in [23] by an fpt reduction from the parameterized $k$-clique problem. A central point of that reduction is the construction, for every such CSP, of a special input structure/database, which we will refer to as the Grohe database.

Lifting the reduction from the parameterized $k$-clique problem to the OMQ case in [3] required introspection into the properties of the Grohe database and also a refinement of the construction. At the same time, special attention had to be given to the fact that OMQs might restrict the database schema.

Here we sketch how it is possible to establish the results from Theorem 5.1 using Grohe's results for uniform CSPs over bounded arity schemas as a black box, by employing our reduction from Theorem 4.2. This showcases the potential of the reduction for lifting results from the uniform CSP (query evaluation over databases) realm to the OMQ one. We start by establishing a counterpart of our Main Result 3 for the case of bounded arity OMQs.

Theorem 5.3 (GDLog Bounded Arity Characterization).
Let $\mathcal{Q}$ be an r.e. class of OMQs from (GDLog, UCQ) over bounded arity schemas. Assuming that $\mathrm{FPT} \neq \mathrm{W}[1]$, the following are equivalent:
(1) p-OMQ($\mathcal{Q}$) is fixed-parameter tractable;
(2) $\mathcal{Q}_c$ has bounded treewidth;
(3) $\mathcal{D}_{\mathcal{Q}}$ has bounded treewidth.
If either statement is false, then p-OMQ($\mathcal{Q}$) is W[1]-hard.

The strategy to prove Theorem 5.3 is similar to the one used to prove Theorem 4.1, except that this time we use the results from Theorem 5.2 as a black box, whereas in the unbounded arity case we used the results for uniform CSPs over unbounded arity schemas from Theorem 4.3. We use these results both for the upper bound, direction ‘2 ⇒ 1’ of the theorem, and for the lower bound, direction ‘1 ⇒ 3’ of the theorem. In the latter case we employ our reduction from parameterized uniform CSPs to parameterized OMQs, which still works over bounded arity schemas. What remains to be shown is the connection between the treewidth of the cover of an OMQ and the treewidth of the set of characteristic databases of the same OMQ. It follows directly from the proof of Lemma 4.4 that:
Lemma 5.4.
Let 𝑄 = (O , S , 𝑞 ) be an OMQ from ( GDLog , 𝑈𝐶𝑄 ) with schema arity at most 𝑟 , for some 𝑟 ≥ , and 𝑄 𝑐 = (O , S , 𝑞 𝑐 ) beits cover. Then, assuming TW ( D 𝑄 ) ≥ 𝑟 , TW ( 𝑄 𝑐 ) = TW ( D 𝑄 ) . Direction ‘3 ⇒
2’ of Theorem 5.3 follows then from Lemma 5.4.By using Theorem 5.3, we can retrieve the results from Theorem 5.1using the same strategy as the one used to establish Theorem 4.6.
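The tree decomposition conditions that the width measures in this and the previous section rely on can be made concrete with a small checker. The Python sketch below is an illustration only, not part of the constructions in the paper: the function names, the toy database and its decompositions are hypothetical. It checks the three standard conditions (the underlying graph is a tree, every hyperedge is covered by a bag, and the bags containing a given element form a connected subtree) and computes the width of a decomposition with respect to a bag measure, which defaults to bag size minus one, as for treewidth.

```python
from collections import defaultdict


def is_tree(nodes, edges):
    """A graph is a tree iff it is connected and has |nodes| - 1 edges."""
    nodes = set(nodes)
    if not nodes or len(edges) != len(nodes) - 1:
        return False
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        stack.extend(adj[u] - seen)
    return seen == nodes


def is_tree_decomposition(hyperedges, tree_edges, bags):
    """Check the three tree decomposition conditions for a hypergraph."""
    nodes = set(bags)
    if not is_tree(nodes, tree_edges):
        return False
    # Every hyperedge (guarded set) must be contained in some bag.
    if not all(any(set(e) <= bags[t] for t in nodes) for e in hyperedges):
        return False
    # The tree nodes whose bags contain a given element must be connected.
    for v in set().union(*bags.values()):
        holding = {t for t in nodes if v in bags[t]}
        inner = [(s, t) for s, t in tree_edges if s in holding and t in holding]
        if not is_tree(holding, inner):
            return False
    return True


def width(bags, f=None):
    """Maximum of f over all bags; f defaults to |bag| - 1 (treewidth)."""
    if f is None:
        f = lambda bag: len(bag) - 1
    return max(f(bag) for bag in bags.values())


# Hypothetical database with facts R(a, b, c), S(c, d), S(d, e).
EDGES = [{"a", "b", "c"}, {"c", "d"}, {"d", "e"}]
TREE = [(0, 1), (1, 2)]
BAGS = {0: {"a", "b", "c"}, 1: {"c", "d"}, 2: {"d", "e"}}
# Same bags in a different order: "c" occurs in bags 0 and 2 but not in 1.
BAD_BAGS = {0: {"a", "b", "c"}, 1: {"d", "e"}, 2: {"c", "d"}}

print(is_tree_decomposition(EDGES, TREE, BAGS), width(BAGS))
print(is_tree_decomposition(EDGES, TREE, BAD_BAGS))
```

In the valid decomposition the ternary fact forces a bag with three elements, so the width is 2. More generally, since every fact must be covered by a single bag, a fact of arity $r$ over pairwise distinct constants forces width at least $r - 1$.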
We analyzed the complexity of evaluating classes of parameterized OMQs based on guarded TGDs and UCQs in the unbounded arity case and established necessary and sufficient conditions for fpt evaluation of such classes of OMQs. For ontologies expressed in GDLog, the fragment of GTGDs without existentials, we provided a syntactic characterization based on new constructions, namely sets of characteristic databases and covers of an OMQ. On the way to establishing this result we introduced an fpt reduction from evaluating parameterized uniform CSPs to evaluating parameterized OMQs.

We also revisited the case of OMQs over bounded arity schemas, which was previously addressed in [3]. For classes of such OMQs from (GDLog, UCQ) we have established a new syntactic characterization for fixed-parameter tractability using the new notions of covers and characteristic databases, while for classes of OMQs from (GTGD, UCQ) we have provided an alternative strategy to prove the semantic characterization concerning fixed-parameter tractability from [3]. The new proof strategy is very similar to the one used to establish the results in the unbounded arity case. It is conceptually simpler than the original proof, as it uses results from the uniform CSP case as a black box. This has been enabled by the fpt reduction mentioned before.

In this work we only considered Boolean OMQs, as the corresponding results for uniform CSP/query evaluation in the unbounded arity case have also only been established in the Boolean case. As future work, we plan to extend our work to the non-Boolean case. Many results from the database world concerning the efficiency of tasks like counting [18] and enumeration [7, 15] use structural measures on the queries similar to the ones involved in the characterizations for evaluation. Thus, by generalizing our constructs to the non-Boolean case, and depending on which structural measures are preserved when transitioning from sets of characteristic databases to covers of OMQs, we might be able to lift such results to the OMQ world.
REFERENCES
[1] Franz Baader, Diego Calvanese, Deborah L. McGuinness, Daniele Nardi, and Peter F. Patel-Schneider (Eds.). 2003. The Description Logic Handbook: Theory, Implementation, and Applications.
[2] Jean-François Baget, Michel Leclère, Marie-Laure Mugnier, and Eric Salvat. 2011. On rules with existential variables: Walking the decidability line. Artif. Intell.
[3] […] PODS. 259–270.
[4] Pablo Barceló, Cristina Feier, Carsten Lutz, and Andreas Pieris. 2019. When is Ontology-Mediated Querying Efficient?. In LICS. 1–13.
[5] Pablo Barceló, Diego Figueira, Georg Gottlob, and Andreas Pieris. 2019. Semantic Optimization of Conjunctive Queries. Under submission, available at http://homepages.inf.ed.ac.uk/apieris/BFGP.pdf.
[6] Pablo Barceló, Andreas Pieris, and Miguel Romero. 2017. Semantic Optimization in Tractable Classes of Conjunctive Queries. SIGMOD Rec. 46, 2 (2017), 5–17.
[7] Christoph Berkholz and Nicole Schweikardt. 2019. Constant Delay Enumeration with FPT-Preprocessing for Conjunctive Queries of Bounded Submodular Width. In MFCS 2019, Vol. 138. 58:1–58:15.
[8] Meghyn Bienvenu, Balder ten Cate, Carsten Lutz, and Frank Wolter. 2014. Ontology-Based Data Access: A Study through Disjunctive Datalog, CSP, and MMSNP. ACM Trans. Database Syst. 39, 4 (2014), 33:1–33:44.
[9] Pierre Bourhis, Marco Manna, Michael Morak, and Andreas Pieris. 2016. Guarded-Based Disjunctive Tuple-Generating Dependencies. ACM Trans. Database Syst. 41, 4 (2016), 27:1–27:45.
[10] Andrei A. Bulatov. 2017. A Dichotomy Theorem for Nonuniform CSPs. In FOCS. 319–330.
[11] Andrea Calì, Georg Gottlob, and Thomas Lukasiewicz. 2012. A general Datalog-based framework for tractable query answering over ontologies. Journal of Web Semantics 14 (2012), 57–83.
[12] Andrea Calì, Georg Gottlob, and Michael Kifer. 2013. Taming the Infinite Chase: Query Answering under Expressive Relational Constraints. J. Artif. Intell. Res. 48 (2013), 115–174.
[13] Andrea Calì, Georg Gottlob, and Thomas Lukasiewicz. 2009. Datalog±: a unified approach to ontologies and integrity constraints. In ICDT. 14–30.
[14] Andrea Calì, Georg Gottlob, and Andreas Pieris. 2012. Towards more expressive ontology languages: The query answering problem. Artif. Intell. 193 (2012), 87–128.
[15] Nofar Carmeli and Markus Kröll. 2018. Enumeration Complexity of Conjunctive Queries with Functional Dependencies. In ICDT 2018, Vol. 98. 11:1–11:17.
[16] Chandra Chekuri and Anand Rajaraman. 2000. Conjunctive query containment revisited. Theor. Comput. Sci.
[17] […] IJCAI, Christian Bessiere (Ed.). 1726–1733.
[18] Hubie Chen and Stefan Mengel. 2015. A Trichotomy in the Complexity of Counting Answers to Conjunctive Queries. In ICDT 2015, Vol. 31. 110–126.
[19] Alin Deutsch, Alan Nash, and Jeff B. Remmel. 2008. The Chase Revisited. In PODS. 149–158.
[20] Rodney G. Downey and Michael R. Fellows. 1995. Fixed-Parameter Tractability and Completeness I: Basic Results. SIAM J. Comput. 24 (1995), 873–921.
[21] Tomás Feder and Moshe Y. Vardi. 1998. The Computational Structure of Monotone Monadic SNP and Constraint Satisfaction: A Study through Datalog and Group Theory. SIAM J. Comput. 28, 1 (1998), 57–104.
[22] Erich Grädel. 1999. Decision procedures for guarded logics. In CADE, Vol. 1632.
[23] Martin Grohe. 2007. The complexity of homomorphism and constraint satisfaction problems seen from the other side. J. ACM 54, 1 (2007), 1:1–1:24.
[24] Stijn Heymans, Jos de Bruijn, Livia Predoiu, Cristina Feier, and Davy Van Nieuwenborgh. 2008. Guarded hybrid knowledge bases. Theory Pract. Log. Program. 8, 3 (2008), 411–429.
[25] Stijn Heymans and Dirk Vermeir. 2003. Integrating Ontology Languages and Answer Set Programming. In DEXA. 584–588.
[26] Russell Impagliazzo, Ramamohan Paturi, and Francis Zane. 2001. Which Problems Have Strongly Exponential Complexity? J. Comput. Syst. Sci. 63, 4 (2001), 512–530.
[27] David S. Johnson and Anthony C. Klug. 1984. Testing Containment of Conjunctive Queries under Functional and Inclusion Dependencies. J. Comput. Syst. Sci. 28, 1 (1984), 167–189.
[28] Vladimir Lifschitz. 1999. Action Languages, Answer Sets, and Planning. In APT. 357–374.
[29] David Maier, Alberto O. Mendelzon, and Yehoshua Sagiv. 1979. Testing Implications of Data Dependencies. ACM Trans. Database Syst. 4, 4 (1979), 455–469.
[30] Dániel Marx. 2010. Tractable hypergraph properties for constraint satisfaction and conjunctive queries. In STOC. 735–744.
[31] Riccardo Rosati. 2005. On the decidability and complexity of integrating ontologies and rules. J. Web Semant. 3, 1 (2005), 41–60.
[32] Mihalis Yannakakis. 1981. Algorithms for Acyclic Database Schemes. In VLDB. 82–94.
[33] Dmitriy Zhuk. 2017. A Proof of CSP Dichotomy Conjecture. In