Separating Positive and Negative Data Examples by Concepts and Formulas: The Case of Restricted Signatures
Jean Christoph Jung, Carsten Lutz, Hadrien Pulcini, Frank Wolter
University of Bremen, Germany; University of Liverpool, UK
Abstract.
We study the separation of positive and negative data examples in terms of description logic (DL) concepts and formulas of decidable FO fragments, in the presence of an ontology. In contrast to previous work, we add a signature that specifies a subset of the symbols from the data and ontology that can be used for separation. We consider weak and strong versions of the resulting problem that differ in how the negative examples are treated. Our main results are that (a projective form of) the weak version is decidable in ALCI while it is undecidable in the guarded fragment GF, the guarded negation fragment GNF, and the DL ALCFIO, and that strong separability is decidable in ALCI, GF, and GNF. We also provide (mostly tight) complexity bounds.
There are several applications that fall under the broad term of supervised learning and seek to compute a logical expression that separates positive from negative examples given in the form of labeled data items in a knowledge base. A prominent example is concept learning for description logics (DLs) where the aim is to support a user in automatically constructing a concept description that can then be used, for instance, in ontology engineering [7,34,33,45,14,17,42]. A further example is reverse engineering of database queries (also called query by example, QBE), which has a long history in database research [43,44,48,47,29,2,9,30,38] and which has also been studied in the presence of a DL ontology [23,39]. Note that a closed world semantics is adopted for QBE in databases while an open world semantics is required when the data is assumed to be incomplete as in the presence of ontologies, but also, for example, in reverse engineering of SPARQL queries [3]. Another example is entity comparison in RDF graphs, where one aims to find meaningful descriptions that separate one entity from another [41,40]. A final example is generating referring expressions (GRE), where the aim is to describe a single data item by a logical expression such as a DL concept, separating it from all other data items. GRE originated in linguistics [32], but has recently received interest in DL-based ontology-mediated querying [11].

A fundamental problem common to all these applications is to decide whether a separating expression exists at all. There are several degrees of freedom in defining this problem. One concerns the negative examples: is it enough that they do not entail the separating formula (weak separability) or are they required to entail its negation (strong separability)? Another one concerns the question whether additional helper symbols are admitted in the separating formula (projective separability) or not (non-projective separability).
The emerging family of problems has recently been investigated in [18,26], concentrating on the case where the separating expression is a DL concept or formulated in a decidable fragment of first-order logic (FO) such as the guarded fragment (GF) and the guarded negation fragment (GNF).

In this paper, we add a signature Σ that is given as an additional input and require separating expressions to be formulated in Σ (in the non-projective case). This makes it possible to ‘direct’ separation towards expressions based on desired features and to exclude features that are not supposed to be used for separation such as gender and skin color. In the projective case, helper symbols from outside of Σ are also admitted, but must be ‘fresh’ in that they cannot occur in the given knowledge base. Arguably, such fresh symbols make the constructed separating expressions less intuitive from an application perspective and more difficult to understand. However, they sometimes increase the separating power and they emerge naturally from a technical perspective.

The signature Σ brings the separation problem closer to the problem of deciding whether an ontology is a conservative extension of another ontology [25], also a form of separation, and to deciding the existence of uniform interpolants [37]. It turns out, in fact, that lower bounds for these problems can often be adapted to weak separability with signature. In contrast, for strong separability we observe a close connection to Craig interpolation.

We consider both weak and strong separability, generally assuming that the ontology is formulated in the same logic that is used for separation. We concentrate on combined complexity, that is, the input to the decision problems consists of the knowledge base that comprises an ABox and an ontology, the positive and negative examples in the form of lists of individuals (for DLs) or lists of tuples of individuals (for FO fragments that support more than one free variable), and the signature.
In the following, we summarize our main results.

We start with weak projective separability in ALCI, present a characterization in terms of Σ-homomorphisms that generalizes characterizations from [18,26], and then give a decision procedure based on tree automata. This yields a 2ExpTime upper bound, and a matching lower bound is obtained by reduction from conservative extensions. In contrast, weak projective (and non-projective) separability in ALCI without a signature is only NExpTime-complete [18]. The non-projective case with signature remains open. We then show that weak separability is undecidable in any fragment of FO that extends GF (such as GNF) or ALCFIO (such as the two-variable fragment with counting, C²). In both cases, the proof is by adaptation of undecidability proofs for conservative extensions, from [25] and [19] respectively, and applies to both the projective and the non-projective case. This should be contrasted with the fact that weak separability is decidable and 2ExpTime-complete for GF and for GNF without a signature, both in the projective and in the non-projective case [26]. The decidability status of (any version of) separability in ALCFIO without a signature is open. It is known, however, that projective and non-projective weak separability without a signature are undecidable in the two-variable fragment FO² of FO [26].

We then turn to strong separability. Here, the projective and the non-projective case coincide and will thus not be distinguished in what follows. We again start with ALCI, for which we show 2ExpTime-completeness. The proofs, however, are rather different from those in the weak case. For the upper bound, we characterize non-separability in terms of the existence of a set of types that are amalgamable in the sense that they can be realized in a model of the ontology at elements that are all ALCI(Σ)-bisimilar, and that satisfy certain additional properties. To identify sets of amalgamable types, we use an approach that is loosely in the style of type elimination procedures. A matching lower bound is proved by a reduction from the word problem of exponentially space bounded ATMs. We remark that in the strong case, the increase in complexity that results from adding a signature is even more pronounced. In fact, strong separability without a signature is only ExpTime-complete in ALCI [26]. We then turn to GF and GNF and establish a close link between strong separability and interpolant existence, the problem to decide for formulas ϕ, ψ in a language L whether there exists a formula χ in L using only the shared symbols of ϕ and ψ such that both ϕ → χ and χ → ψ are valid. We show that strong separability with signature in GF and GNF are polynomial time reducible to interpolant existence in GF and GNF, respectively. GNF enjoys the Craig interpolation property (CIP), that is, there is such a formula χ whenever ϕ → ψ is valid. Thus, from the CIP of GNF and the fact that validity in GNF is 2ExpTime-complete [8], we obtain that strong separability with signature in GNF is 2ExpTime-complete. GF fails to have the CIP and 3ExpTime-completeness for interpolant existence has only recently been established [28]. We thus obtain a 3ExpTime upper bound for strong separability with signature in GF. A matching lower bound can be shown similar to the proof of 3ExpTime-hardness for interpolant existence. We note that strong separability without signature is 2ExpTime-complete in both GNF and GF [26].

Let Σ_full be a set of relation symbols that contains countably many symbols of every arity n ≥ 1 and let Const be a countably infinite set of constants. A signature is a set of relation symbols Σ ⊆ Σ_full. We write a for a tuple (a_1, ..., a_n) of constants. A database D is a finite set of ground atoms R(a), where R ∈ Σ_full has arity n and a is a tuple of constants from Const of length n. We use cons(D) to denote the set of constants that occur in D.

Denote by FO the set of first-order (FO) formulas constructed from constant-free atomic formulas x = y and R(x), R ∈ Σ_full, using conjunction, disjunction, negation, and existential and universal quantification. As usual, we write ϕ(x) to indicate that the free variables in FO-formula ϕ are all from x and call a formula open if it has at least one free variable and a sentence otherwise.
Note that we do not admit constants in FO-formulas.

An ontology O is a finite set of FO-sentences, and a knowledge base (KB) is a pair K = (O, D) of an ontology O and a database D. As usual, KBs K = (O, D) are interpreted in relational structures A = (dom(A), (R^A)_{R ∈ Σ_full}, (c^A)_{c ∈ Const}) where dom(A) is the non-empty domain of A, each R^A is a relation over dom(A) whose arity matches that of R, and c^A ∈ dom(A) for all c ∈ Const. Note that we do not make the unique name assumption (UNA), that is, c_1^A = c_2^A might hold even when c_1 ≠ c_2. This is in fact essential for several of our results. A structure A is a model of a KB K = (O, D) if it satisfies all sentences in O and all ground atoms in D. A KB K is satisfiable if there exists a model of K.

We introduce two fragments of FO, the guarded fragment and the description logic ALCI. In the guarded fragment (GF) of FO [1,21], formulas are built from atomic formulas R(x) and x = y by applying the Boolean connectives and guarded quantifiers of the form ∀y(α(x, y) → ϕ(x, y)) and ∃y(α(x, y) ∧ ϕ(x, y)) where ϕ(x, y) is a guarded formula and α(x, y) is an atomic formula or an equality x = y that contains all variables in [x] ∪ [y]. The formula α is called the guard of the quantifier. We say that an ontology O is a GF-ontology if all formulas in O are from GF, and likewise for knowledge bases.

We next introduce the DL ALCI. In this context, unary relation symbols are called concept names and binary relation symbols are called role names [5,6]. A role is a role name or an inverse role R⁻ with R a role name. For uniformity, we set (R⁻)⁻ = R. ALCI-concepts are defined by the grammar
C, D ::= A | ¬C | C ⊓ D | ∃R.C

where A ranges over concept names and R over roles. As usual, we write ⊥ to abbreviate A ⊓ ¬A for some fixed concept name A, ⊤ for ¬⊥, C ⊔ D for ¬(¬C ⊓ ¬D), C → D for ¬C ⊔ D, and ∀R.C for ¬∃R.¬C. An ALCI-concept inclusion (CI) takes the form C ⊑ D where C and D are ALCI-concepts. An ALCI-ontology is a finite set of ALCI-CIs. An ALCI-KB K = (O, D) consists of an ALCI-ontology O and a database D. Here and in general in the context of ALCI, we assume that databases use only unary and binary relation symbols. We sometimes also mention the fragment ALC of ALCI in which inverse roles are not available.

To obtain a semantics, every ALCI-concept C can be translated into an FO-formula C† with one free variable x:

A† = A(x)
(¬C)† = ¬C†
(C ⊓ D)† = C† ∧ D†
(∃R.C)† = ∃y (R(x, y) ∧ C†[y/x])
(∃R⁻.C)† = ∃y (R(y, x) ∧ C†[y/x]).

The extension C^A of a concept C in a structure A is defined as C^A = {a ∈ dom(A) | A ⊨ C†(a)}. A CI C ⊑ D is regarded as a shorthand for the FO-sentence ∀x (C†(x) → D†(x)). Thus, every ALCI-concept can be viewed as a GF-formula and every ALCI-ontology can be viewed as a GF-ontology. By economically reusing variables, we can even obtain formulas and ontologies from GF ∩ FO². We write O ⊨ C ⊑ D to say that the CI C ⊑ D is a consequence of the ontology O, that is, C^A ⊆ D^A holds in every model A of O. Concepts C and D are equivalent w.r.t. an ontology O if O ⊨ C ⊑ D and O ⊨ D ⊑ C.

The Gaifman graph G_A of a structure A is the undirected graph with set of vertices dom(A) and an edge {d, e} whenever there exists a ∈ R^A that contains d, e for some relation R. The distance dist_A(a, b) between a, b ∈ dom(A) is defined as the length of a shortest path from a to b, if such a path exists. Otherwise dist_A(a, b) = ∞. The maximal connected component (mcc) A_con(a) of a in A is the substructure of A induced by the set of all b such that dist_A(a, b) < ∞.

Let A be a structure such that R^A = ∅ for any relation symbol R of arity > 2. We say that A is tree-shaped if G_A is a tree without reflexive loops and R^A ∩ S^A = ∅ for all distinct roles R, S. We say that A has finite outdegree if G_A has finite outdegree. A structure A is a forest structure w.r.t. an ALCI-KB K = (O, D) if the undirected graph (V, E) with

V = dom(A)
E = {{d, e} | (d, e) ∈ R^A for some R} \ {{d, e} | d, e ∈ cons(D)}

is a tree. We drop ‘w.r.t. K’ if K is clear from the context and speak of a forest model of K when A is a model of K. The following result is well known.

Lemma 1.
Let K be an ALCI-KB and C an ALCI-concept. If K ⊭ C(a), then there exists a forest model A of K of finite outdegree with a ∉ C^A.

We close this section by introducing homomorphisms and bisimulations. Let Σ be a signature. A Σ-homomorphism h from a structure A to a structure B is a function h : dom(A) → dom(B) such that a ∈ R^A implies h(a) ∈ R^B for all relation symbols R ∈ Σ and tuples a, with h(a) being defined component-wise in the expected way. Note that homomorphisms need not preserve constant symbols. Every database D gives rise to the finite structure A_D with dom(A_D) = cons(D) and a ∈ R^{A_D} iff R(a) ∈ D. A Σ-homomorphism from database D to structure A is a Σ-homomorphism from A_D to A. A pointed structure takes the form A, a with A a structure and a a tuple of elements of dom(A). A homomorphism from pointed structure A, a to pointed structure B, b is a homomorphism h from A to B with h(a) = b. We write A, a → B, b to indicate the existence of such a homomorphism.

We introduce ALCI(Σ)-bisimulations between structures A and B that interpret relations of arity at most two. Let Σ be a signature. A relation S ⊆ dom(A) × dom(B) is an ALCI(Σ)-bisimulation between A and B if the following conditions hold:

1. for all (d, e) ∈ S: d ∈ A^A iff e ∈ A^B;
2. if (d, e) ∈ S and (d, d′) ∈ R^A, then there is an e′ with (e, e′) ∈ R^B and (d′, e′) ∈ S;
3. if (d, e) ∈ S and (e, e′) ∈ R^B, then there is a d′ with (d, d′) ∈ R^A and (d′, e′) ∈ S,

where A ranges over all concept names in Σ and R over all Σ-roles. We write A, d ∼_{ALCI,Σ} B, e and call A, d and B, e ALCI(Σ)-bisimilar if there exists an ALCI(Σ)-bisimulation S such that (d, e) ∈ S.

The next lemma explains why ALCI(Σ)-bisimulations are relevant [35,20]. We say that A, d and B, e are ALCI(Σ)-equivalent, in symbols A, d ≡_{ALCI,Σ} B, e, if d ∈ C^A iff e ∈ C^B for all C ∈ ALCI(Σ).

Lemma 2.
Let A, d and B, e be pointed structures of finite outdegree and Σ a signature. Then A, d ≡_{ALCI,Σ} B, e iff A, d ∼_{ALCI,Σ} B, e. For the “if”-direction, the condition on the outdegree can be dropped.
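On finite structures, the three bisimulation conditions can be checked mechanically. The following is a minimal sketch, not part of the paper's constructions: it starts from all pairs that agree on the concept names in Σ and repeatedly removes pairs that violate the back-and-forth conditions for some Σ-role (a role name or its inverse). The function name and the dictionary-based encoding of structures are assumptions made for illustration.

```python
from itertools import product


def largest_bisimulation(dom_a, dom_b, concepts_a, concepts_b,
                         roles_a, roles_b, sigma_concepts, sigma_roles):
    """Largest ALCI(Sigma)-bisimulation between two finite structures.

    concepts_x maps a concept name to a set of elements,
    roles_x maps a role name to a set of (source, target) pairs.
    """

    def neighbours(roles, r, inverse, d):
        # R-successors of d, or R-predecessors when the inverse role is used
        pairs = roles.get(r, set())
        if inverse:
            return {x for (x, y) in pairs if y == d}
        return {y for (x, y) in pairs if x == d}

    # Condition 1: agreement on all concept names in Sigma
    S = {(d, e) for d, e in product(dom_a, dom_b)
         if all((d in concepts_a.get(c, set())) == (e in concepts_b.get(c, set()))
                for c in sigma_concepts)}

    changed = True
    while changed:
        changed = False
        for (d, e) in list(S):
            ok = True
            for r in sigma_roles:
                for inverse in (False, True):
                    nd = neighbours(roles_a, r, inverse, d)
                    ne = neighbours(roles_b, r, inverse, e)
                    # Condition 2 (forth) and Condition 3 (back)
                    forth = all(any((d2, e2) in S for e2 in ne) for d2 in nd)
                    back = all(any((d2, e2) in S for d2 in nd) for e2 in ne)
                    if not (forth and back):
                        ok = False
            if not ok:
                S.discard((d, e))
                changed = True
    return S
```

Two pointed finite structures A, d and B, e are then ALCI(Σ)-bisimilar iff (d, e) is contained in the returned relation. For instance, with Σ = {R}, every element of an R-cycle of length two is bisimilar to an element with a reflexive R-loop, whereas an element without R-successors is not.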
For any syntactic object O such as a formula, an ontology, or a KB, we use sig(O) to denote the set of relation symbols that occur in O and ||O|| to denote the size of O, that is, the number of symbols needed to write it with names of relations, variables, and constants counting as a single symbol.

We start by introducing the problem of (weak) separability with signature, in its projective and non-projective version.
Definition 1.
Let L be a fragment of FO. A labeled L-KB takes the form (K, P, N) with K = (O, D) an L-KB and P, N ⊆ cons(D)^n non-empty sets of positive and negative examples, all of them tuples of the same length n.

Let Σ ⊆ sig(K) be a signature. An FO(Σ)-formula ϕ(x) with n free variables Σ-separates (K, P, N) if sig(K) ∩ sig(ϕ) ⊆ Σ and

1. K ⊨ ϕ(a) for all a ∈ P and
2. K ⊭ ϕ(a) for all a ∈ N.

Let L_S be a fragment of FO. We say that (K, P, N) is projectively L_S(Σ)-separable if there is an L_S-formula ϕ(x) that Σ-separates (K, P, N) and (non-projectively) L_S(Σ)-separable if there is such a ϕ(x) with sig(ϕ) ⊆ Σ.

Relation symbols in Σ-separating formulas that are not from Σ should be thought of as helper symbols. Their availability sometimes makes inseparable KBs separable; examples are provided below. We only consider FO-fragments L_S that are closed under conjunction. In this case, a labeled KB (K, P, N) is L_S(Σ)-separable if and only if all (K, P, {b}), b ∈ N, are L_S(Σ)-separable, and likewise for projective L_S(Σ)-separability, see [26]. In what follows, we thus mostly consider labeled KBs with singleton sets N of negative examples.

Each choice of an ontology language L and a separation language L_S gives rise to a projective and to a non-projective separability problem. In the current paper, we only consider cases where L = L_S (see below for a discussion).

PROBLEM: (Projective) L-separability with signature
INPUT: A labeled L-KB (K, P, N) and signature Σ ⊆ sig(K)
QUESTION: Is (K, P, N) (projectively) L(Σ)-separable?

We study the combined complexity of L-separability with signature where the ontology O, database D (both in K), and sets of examples P and N are all taken to be part of the input. One can also study data complexity where only D, P, and N are regarded as inputs while O is assumed to be fixed [26].
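The reduction to singleton sets of negative examples rests only on closure under conjunction and on N being finite. The following is a sketch of the argument, which the text attributes to [26]; the formulas φ_b are names introduced here for illustration and stand for formulas that Σ-separate (K, P, {b}). The snippet assumes the amsmath package for \text.

```latex
% Assume each (K, P, {b}) with b in N is Sigma-separated by some \varphi_b in L_S.
\[
  \varphi(\mathbf{x}) \;:=\; \bigwedge_{\mathbf{b} \in N} \varphi_{\mathbf{b}}(\mathbf{x})
\]
% Positive examples: every conjunct is entailed, hence so is the conjunction.
\[
  \mathcal{K} \models \varphi_{\mathbf{b}}(\mathbf{a}) \text{ for all } \mathbf{b} \in N
  \;\Longrightarrow\;
  \mathcal{K} \models \varphi(\mathbf{a})
  \qquad (\mathbf{a} \in P)
\]
% Negative examples: a model of K falsifying one conjunct falsifies the conjunction.
\[
  \mathcal{K} \not\models \varphi_{\mathbf{b}}(\mathbf{b})
  \;\Longrightarrow\;
  \mathcal{K} \not\models \varphi(\mathbf{b})
  \qquad (\mathbf{b} \in N)
\]
% Signature condition: it is preserved under conjunction.
\[
  \mathrm{sig}(\mathcal{K}) \cap \mathrm{sig}(\varphi)
  \;=\; \bigcup_{\mathbf{b} \in N} \bigl(\mathrm{sig}(\mathcal{K}) \cap \mathrm{sig}(\varphi_{\mathbf{b}})\bigr)
  \;\subseteq\; \Sigma
\]
```

Conversely, any formula that Σ-separates (K, P, N) also Σ-separates each (K, P, {b}) with b ∈ N, so the two conditions are equivalent.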
A special case of L-separability with signature is L-definability with signature where P and N partition the example space, that is, inputs are labeled L-KBs (K, P, N) such that N = cons(D)^n \ P, n the length of example tuples. All our results also hold for definability.

We now give some examples illustrating the central notions used in this paper. In [26], projective and non-projective separability are studied without signature restrictions. Thus, all symbols used in the KB can appear in separating formulas. Rather surprisingly, it turned out that in this case many different separation languages have exactly the same separating power. For example, a labeled FO-KB turned out to be FO-separable iff it is UCQ-separable (and projective and non-projective separability coincide) and a labeled ALCI-KB is projectively ALCI-separable iff it is (non-)projectively FO-separable. No such result can be expected for separability with signature restrictions, as illustrated by the following example.
Example 1.
Let O = {A ⊑ ∃R.B ⊓ ∃R.¬B} and D = {A(a), R(b, c)}. Let P = {a}, N = {b}, and Σ = {R}. Clearly, the formula ∃y∃y′(R(x, y) ∧ R(x, y′) ∧ ¬(y = y′)) Σ-separates (O, D, P, N), but (O, D, P, N) is neither UCQ(Σ)-separable nor ALCI(Σ)-separable.

However, the ability to restrict separating formulas to a given signature makes it possible to guide separation towards desired aspects.

Example 2.
Consider a KB about books that uses, say, the concept and role names provided by schema.org (called types and properties there). Schema.org offers dozens of such names related to books, ranging from editor, author, and illustrator to genre, date published, and character. Assume that a few books have been labeled as likes (added to P) and dislikes (added to N) and one would like to find a formula ϕ that separates P from N. Then it might be useful to restrict the signature of ϕ so as to concentrate on the aspects of books that one is most interested in. For example, one could select a signature that contains symbols related to genre such as graphic novel, adventure, classic, and drop all remaining symbols. If no separating formula exists, one can then iteratively extend the signature until separating formulas are found. We refer the reader to research on modules and modularity in ontologies, where signatures are also used to capture the topic of a module [22,31,12,13]. If one is not sure which aspects are most relevant for separation, one might of course also decide to work with a large signature. But also in such a case, it might be useful to exclude certain undesired symbols such as the author’s age and gender.

Footnote: We denote by UCQ the set of FO-formulas that are disjunctions of formulas constructed from atoms using conjunction and existential quantification.

The helper symbols that distinguish the projective from the non-projective case play a completely different role from the symbols in the signature Σ selected for separation, as discussed next.

Example 3.
Consider a database D in which an individual a is part of an R-cycle and b has both an R-reflexive successor and predecessor. Thus, D = {R(a_0, a_1), ..., R(a_{n−1}, a_n), R(b, b_1), R(b_1, b_1), R(b_2, b), R(b_2, b_2)}, where a_0 = a_n = a and n > 0. Let O be either empty or any ALCI-ontology such that K = (O, D) is satisfiable and ⊤ ⊑ ∃R.⊤ ⊓ ∃R⁻.⊤ ∈ O. Further let P = {a}, N = {b}, and Σ = {R}. If no helper symbols are allowed, then (K, P, N) is not ALCI(Σ)-separable (we show this in Example 4 below). If, however, a helper symbol A is allowed, then ¬A ⊔ ∃R^n.A is a separating ALCI-concept. Thus, (K, P, N) is projectively ALCI(Σ)-separable but not non-projectively ALCI(Σ)-separable. The use of a cycle in this example is no accident; we show in the next section that if there are no cycles in the database, then helper symbols do not add any separating power to ALCI-concepts.
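The effect of the helper symbol A in Example 3 can be checked by brute force on a small instance. The sketch below is only an illustration under stated assumptions: it fixes n = 3, uses the hypothetical constant names a0, a1, a2 for the cycle and b1, b2 for the reflexive successor and predecessor of b, and it inspects only the single structure given by the database itself (with every possible interpretation of A), not all models of K.

```python
from itertools import chain, combinations


def r_power_successors(edges, start, n):
    """Elements reachable from `start` by exactly n R-steps."""
    current = {start}
    for _ in range(n):
        current = {y for (x, y) in edges if x in current}
    return current


def satisfies(edges, A, d, n):
    """Does d satisfy (not A) or (exists R^n . A) when A is interpreted as the set A?"""
    return d not in A or bool(r_power_successors(edges, d, n) & A)


def all_subsets(elems):
    elems = list(elems)
    return chain.from_iterable(combinations(elems, k) for k in range(len(elems) + 1))


n = 3
# R-cycle through a = a0 of length n (assumed constant names)
cycle = {("a0", "a1"), ("a1", "a2"), ("a2", "a0")}
# b with a reflexive R-successor b1 and a reflexive R-predecessor b2 (assumed names)
b_part = {("b", "b1"), ("b1", "b1"), ("b2", "b"), ("b2", "b2")}
edges = cycle | b_part
dom = {x for edge in edges for x in edge}

# a0 satisfies the concept under every interpretation of the helper symbol A ...
a_always = all(satisfies(edges, set(A), "a0", n) for A in all_subsets(dom))
# ... while some interpretations of A make it fail at b
b_counterexamples = [set(A) for A in all_subsets(dom)
                     if not satisfies(edges, set(A), "b", n)]
```

The cycle guarantees that a is its own R^n-successor, so a ∈ A always yields a witness for ∃R^n.A. For b, interpreting A as {b} gives a counterexample, since the only R^n-successor of b is b1.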
ALCI
We give a model-theoretic characterization of projective weak ALCI-separability with signature and use it to prove decidability in 2ExpTime. A matching lower bound is obtained by reduction from conservative extensions. We also use the characterization to discuss the relationship between non-projective and projective separability, showing, in particular, that on tree-shaped databases the two notions coincide.

We start with a model-theoretic characterization of projective ALCI(Σ)-separability of labeled KBs. As ALCI-concepts talk about individual elements and not tuples, we assume in this section that examples in labeled KBs are constants from the database. In fact, we give three characterizations that are more and more refined. We use the second one to clarify the relationship between projective and non-projective separability and the third characterization for the decision procedure. The first characterization directly reflects that we are considering projective separability with signature as it is based on ALCI(Σ′)-bisimulations where Σ′ is a signature satisfying Σ′ ∩ sig(K) ⊆ Σ. The second characterization replaces ALCI(Σ′)-bisimulations by functional ALCI(Σ)-bisimulations. The final characterization replaces the functional bisimulations from the second characterization by a combination of Σ-homomorphisms and ALCI(Σ)-bisimulations. We first introduce some required notation.

An extended database is a database that additionally may contain ‘atoms’ of the form C(a) with C an ALCI-concept. The semantics of extended databases is defined in the expected way. We write A, a ∼^f_{ALCI,Σ} B, b if there exists an ALCI(Σ)-bisimulation S between A and B that contains (a, b) and is functional, that is, (d, d_1), (d, d_2) ∈ S imply d_1 = d_2. Let sub(K) denote the set of concepts that occur in K, closed under single negation and under subconcepts. The K-type realized in a pointed structure A, a is defined as

tp_K(A, a) = {C ∈ sub(K) | a ∈ C^A}.

A K-type is any set t ⊆ sub(K) of the form tp_K(A, a). For a pointed database D, a, we write D_con(a), a →^c_Σ A, b if there is a Σ-homomorphism h from the maximal connected component D_con(a) of a in D to A such that h(a) = b and there is a K-type t_d for each d ∈ cons(D_con(a)) such that:

1. there exists a model B_d of O with tp_K(B_d, d) = t_d and B_d, d ∼_{ALCI,Σ} A, h(d);
2. (O, D′) is satisfiable, for the extended database D′ = D ∪ {C(d) | C ∈ t_d, d ∈ cons(D_con(a))}.

Theorem 1.
Let (K, P, {b}) be a labeled ALCI-KB with K = (O, D) and Σ ⊆ sig(K). Then the following conditions are equivalent:

1. (K, P, {b}) is projectively ALCI(Σ)-separable;
2. there exists a forest model A of K of finite outdegree and a signature Σ′ such that Σ′ ∩ sig(K) ⊆ Σ and for all models B of K and all a ∈ P: B, a^B ≁_{ALCI,Σ′} A, b^A;
3. there exists a forest model A of K of finite outdegree such that for all models B of K and all a ∈ P: B, a^B ≁^f_{ALCI,Σ} A, b^A;
4. there exists a forest model A of K of finite outdegree such that for all a ∈ P: D_con(a), a ↛^c_Σ A, b^A.

The proof relies on Lemmas 1 and 2 and is given in the appendix.

We first use Theorem 1 to discuss the relationship between projective and non-projective separability. A basic model-theoretic characterization of non-projective ALCI(Σ)-separability is rather straightforward to obtain by just dropping the quantification over Σ′ in Condition 2 of Theorem 1 and demanding instead that B, a^B ≁_{ALCI,Σ} A, b^A, for all models B of K and a ∈ P. As a consequence, one can then also adapt Condition 3 of Theorem 1 to the non-projective case by simply dropping the functionality condition on the bisimulation. If the database D_con(a) is tree-shaped, then there is no difference between the two versions of Condition 2 as one can always introduce sufficiently many copies of nodes in models B of K to turn an unrestricted bisimulation into a functional one. We obtain the following result.

Theorem 2.
Let (K, P, N) be a labeled ALCI-KB with K = (O, D) such that D_con(a) is tree-shaped for all a ∈ P and Σ ⊆ sig(K). Then (K, P, N) is projectively ALCI(Σ)-separable iff it is non-projectively ALCI(Σ)-separable.

The following example illustrates Theorem 1.

Example 4. Consider the labeled KB (K, P, N) and signature Σ from Example 3. Then we find a model A of K of finite outdegree such that b^A does not participate in any R-cycle. Then, as a participates in an R-cycle, there does not exist any model B of K such that B, a^B ∼^f_{ALCI,Σ} A, b^A, and so (K, P, N) is projectively ALCI(Σ)-separable. However, (K, P, N) is not non-projectively ALCI(Σ)-separable: if O contains ⊤ ⊑ ∃R.⊤ ⊓ ∃R⁻.⊤, then S = dom(B) × dom(A) is an ALCI(Σ)-bisimulation between B and A, for any model B of K. If O is empty, then the construction of the required bisimulation is also straightforward.

We now come to the main result of this section.

Theorem 3.
Projective ALCI-separability with signature is 2ExpTime-complete.
The upper bound in Theorem 3 is obtained by using the characterization in Condition 4 of Theorem 1 to devise a decision procedure based on tree automata. Given a labeled ALCI-KB (K, P, {b}), we construct a tree automaton such that the language recognized by the automaton is non-empty if and only if there is a forest model A of K as described in Condition 4 of Theorem 1. We use a variant of two-way alternating parity tree automata over infinite trees [46]. In contrast to the standard model, our automata work on trees of finite, but not necessarily bounded outdegree. Such an automata model has been introduced in [25]. We recall the technical preliminaries.

A tree is a non-empty (and potentially infinite) set of words T ⊆ (ℕ \ {0})* closed under prefixes. We generally assume that trees are finitely branching, that is, for every w ∈ T, the set {i | w·i ∈ T} is finite. For any w ∈ (ℕ \ {0})*, as a convention we set w·0 = w. If w = n_0 n_1 ⋯ n_k, we additionally set w·(−1) = n_0 ⋯ n_{k−1}. For an alphabet Θ, a Θ-labeled tree is a pair (T, L) with T a tree and L : T → Θ a node labeling function.

A two-way alternating tree automaton (2ATA) is a tuple A = (Q, Θ, q_0, δ, Ω) where Q is a finite set of states, Θ is the finite input alphabet, q_0 ∈ Q is the initial state, δ is a transition function as specified below, and Ω : Q → ℕ is a priority function. The automaton runs on Θ-labeled trees. The transition function maps a state q and some input letter θ ∈ Θ to a transition condition δ(q, θ) which is a positive Boolean formula over the truth constants true and false and transitions of the form q, ⟨−⟩q, [−]q, ♦q, □q where q ∈ Q. Informally, the transition q expresses that a copy of the automaton is sent to the current node in state q, ⟨−⟩q means that a copy is sent in state q to the predecessor node, which is then required to exist, [−]q means the same except that the predecessor node is not required to exist, ♦q means that a copy is sent in state q to some successor, and □q that a copy is sent in state q to all successors. The semantics is defined in terms of runs in the usual way; we refer to [25] for details.
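The conventions for trees used above can be made concrete for finite trees. The following is a minimal sketch under the assumption that nodes are encoded as Python tuples of positive integers, with the empty tuple as the root; the function names are hypothetical and the infinite trees the automata actually run on are out of its scope.

```python
def is_tree(nodes):
    """Check that `nodes` is a non-empty, prefix-closed set of words over the positive integers."""
    nodes = set(nodes)
    if not nodes:
        return False
    # It suffices to check that every non-root word has its immediate prefix in the set.
    return all(all(i >= 1 for i in w) and (len(w) == 0 or w[:-1] in nodes)
               for w in nodes)


def move(w, i):
    """Compute w.i with the conventions w.0 = w and w.(-1) = predecessor of w.

    Returns None for the predecessor of the root, which does not exist.
    """
    if i == 0:
        return w
    if i == -1:
        return w[:-1] if w else None
    return w + (i,)


def successors(nodes, w):
    """All successors w.i of w (i >= 1) that belong to the tree, in sorted order."""
    return sorted(v for v in nodes if len(v) == len(w) + 1 and v[:len(w)] == w)
```

In this encoding, a transition ⟨−⟩q at a node w can only be satisfied when move(w, -1) is not None, whereas [−]q is vacuously satisfied at the root; ♦q and □q quantify over successors(nodes, w).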
We use L(A) to denote the set of all Θ-labeled trees accepted by A. 2ATAs are closed under complementation and intersection, and their emptiness problem, which asks whether L(A) = ∅ for a given 2ATA A, can be decided in time exponential in the number of states of A [25].

The input alphabet Θ consists of two types of symbols:

1. models A of D with dom(A) ⊆ Δ, Δ a fixed set of cardinality |cons(D)|;
2. triples (a, R, M) for a ∈ cons(D), R a role used in O, and M ⊆ sig(O) ∩ N_C.

A Θ-labeled tree is well-formed if it has a label of Type 1 at the root and labels of Type 2 everywhere else. A well-formed Θ-labeled tree τ encodes a structure A_τ that can be constructed as follows:

- start with A_τ = A, the structure from the root label;
- for every non-root v ∈ T with τ(v) = (a, R, M), extend A_τ as follows:
  - if the predecessor of v is the root, add an R-successor a_v of a^{A_τ};
  - if the predecessor v′ of v is not the root, add an R-successor a_v of a_{v′} (so a is ignored in this case and should be considered a dummy).
  In both cases, a_v makes true exactly the concept names in M.

It should be clear that every structure A_τ is a forest structure. Conversely, we can encode every forest structure A into a Θ-labeled tree τ such that A_τ = A.

Lemma 3. There are 2ATAs A_1, A_2, A_3 such that:

1. A_1 accepts precisely the well-formed Θ-labeled trees;
2. A_2 accepts a well-formed Θ-labeled tree τ iff A_τ is a model of K;
3. A_3 accepts a well-formed Θ-labeled tree τ if D_con(a), a →^c_Σ A_τ, b^{A_τ}.

The number of states of A_1 is constant; the number of states of A_2 is polynomial in ||O||; the number of states of A_3 is exponential in ||K||. Moreover, A_1, A_2, A_3 can be constructed in time double exponential in ||K||.

As the construction of A_1 and A_2 is rather standard, we only sketch the construction of the automaton A_3. See e.g. [25] for full details of a similar construction. As the first step, A_3 reads the symbol A at the root and non-deterministically guesses the following:

- types t_d, d ∈ cons(D_con(a)), such that (O, D′) is satisfiable where D′ = D ∪ {C(d) | C ∈ t_d, d ∈ cons(D_con(a))};
- a partition D_0, D_1, ..., D_m of D_con(a) such that cons(D_i) ∩ cons(D_j) = ∅ for 1 ≤ i < j ≤ m;
- a Σ-homomorphism h from D_0 to A such that h(a) = b^A and there are a_1, ..., a_m with h(c) = a_i for all c ∈ cons(D_0 ∩ D_i) and 1 ≤ i ≤ m.

The entire guess is stored in the state of the automaton. Note that the first item checks that Point 2 from the definition of D_con(a), a →^c_Σ A_τ, b^{A_τ} is satisfied for the guessed types t_d. After making its guess, the automaton verifies that the homomorphism h from the last item can be extended to a homomorphism from D_con(a) to A_τ that satisfies Point 1 from the definition of D_con(a), a →^c_Σ A_τ, b^{A_τ}. To this end, it does a top-down traversal of A_τ checking that each D_i can be homomorphically mapped to the subtree of A_τ below h(a_i). During the traversal, the automaton memorizes in its state the set of constants from D_i that are mapped to the currently visited element.

The automaton additionally makes sure that Point 1 from the definition of D_con(a), a →^c_Σ A_τ, b^{A_τ} is satisfied, in the following way. During the top-down traversal, it spawns copies of itself to verify that, whenever it has decided to map a d ∈ cons(D_con(a)) to the current element, then there is a tree-shaped model B_d of O with tp_K(B_d, d) = t_d and a bisimulation that witnesses B_d, d ∼_{ALCI,Σ} A_τ, c. This is done by ‘virtually’ traversing B_d element by element, storing at each moment only the type of the current element in a state. This is possible because B_d is tree-shaped. At the beginning, the automaton is at an element of B_d of type t_d and knows that the bisimulation maps this element to the node of A_τ currently visited by the automaton. It then does two things to verify the two main conditions of bisimulations. First, it transitions to every neighbor of the node of A_τ currently visited, both upwards and downwards, and carries out in its state the corresponding transition in B_d, in effect guessing a new type. Second, it considers the current type of B_d and guesses successor types that satisfy the existential restrictions in it. For every required successor type, it then guesses a neighbor of the currently visited node in A_τ to which the successor is mapped. The two steps are alternated, exploiting the alternation capabilities of the automaton. Some extra bookkeeping in states is needed for the root node of the input tree as it represents more than one element of A_τ.

It can be verified that only exponentially many states are required and that the transition function can be computed in double exponential time. This finishes the proof sketch of Lemma 3.

The upper bound in Theorem 3 is now obtained as follows. By Condition 4 of Theorem 1 and Lemma 3, (K, P, {b}) is projectively ALCI(Σ)-separable iff L(A_1) ∩ L(A_2) ∩ L(A_3)ᶜ is not empty, where L(A_3)ᶜ denotes the complement of L(A_3). By Lemma 3, all A_i can be constructed in double exponential time and their number of states is single exponential in ||K||.
As the complement and intersection of 2ATAs can be computed in polynomial time with only a polynomial increase in the number of states, it remains to recall that non-emptiness of 2ATAs can be decided in time exponential in the number of states.

For the lower bound, we reduce from conservative extensions in $\mathcal{ALCI}$. An
$\mathcal{ALCI}$-ontology $\mathcal{O}_2$ is a conservative extension of an $\mathcal{ALCI}$-ontology $\mathcal{O}_1 \subseteq \mathcal{O}_2$ if there is no $\mathcal{ALCI}(\mathrm{sig}(\mathcal{O}_1))$-concept $C$ that is satisfiable w.r.t. $\mathcal{O}_1$ but unsatisfiable w.r.t. $\mathcal{O}_2$. We define projective conservative extensions in the same way, except that $C$ is now an $\mathcal{ALCI}$-concept with $\mathrm{sig}(\mathcal{O}_2) \cap \mathrm{sig}(C) \subseteq \mathrm{sig}(\mathcal{O}_1)$. It was shown in [19] that it is 2ExpTime-hard to decide, given $\mathcal{ALCI}$-ontologies $\mathcal{O}_1$ and $\mathcal{O}_2$, whether $\mathcal{O}_2$ is a (non-projective) conservative extension of $\mathcal{O}_1$. It was further observed that conservative extensions and projective conservative extensions coincide in logics that enjoy Craig interpolation, which $\mathcal{ALCI}$ does [25]. Thus, deciding projective conservative extensions in $\mathcal{ALCI}$ is also 2ExpTime-hard. We give a polynomial time reduction from (the complement of) that problem to projective
ALCI -separability with signature.Thus let O , O be ALCI -ontologies with O ⊆ O . We can assume w.l.o.g.that O takes the form {⊤ ⊑ C } , O takes the form O ∪ {⊤ ⊑ C } , and that O is satisfiable. For a concept name A , the A -relativization C A of an ALCI -concept C is obtained by replacing every subconcept ∃ r.D in C with ∃ r. ( A ⊓ D ).efine O = {⊤ ⊑ C A , A ⊑ C A } where A is a concept name that does not occur in O . Then O is not a pro-jective conservative extension of O iff there is an ALCI (sig( O ))-concept C that is satisfiable w.r.t. O but unsatisfiable w.r.t. O iff the labeled ALCI -KB( K , { a } , { b } ) is projectively ALCI (sig( O ))-separable by ¬ C where K = ( O , D )and D = { A ( a ) , D ( b ) } , D a fresh (dummy) concept name. We have thus estab-lished the lower bound from Theorem 3.We leave open the decidability and exact complexity of non-projective ALCI -separability with signature. A lower bound can be established alongthe lines above.
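The relativization used in this reduction is a purely syntactic transformation. The following minimal sketch illustrates it under stated assumptions: concepts are represented by hypothetical Python classes (`Top`, `Name`, `Not`, `And`, `Exists`), inverse roles are written as role names with a trailing `-`, and universal restrictions and disjunctions are assumed to be abbreviations, so only existential restrictions need to be rewritten.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Top:
    pass


@dataclass(frozen=True)
class Name:
    name: str


@dataclass(frozen=True)
class Not:
    arg: "Concept"


@dataclass(frozen=True)
class And:
    left: "Concept"
    right: "Concept"


@dataclass(frozen=True)
class Exists:
    role: str  # a role name; "r-" stands for the inverse of r (assumed encoding)
    arg: "Concept"


Concept = Union[Top, Name, Not, And, Exists]


def relativize(c: Concept, a: str) -> Concept:
    """Replace every subconcept (exists r.D) of c by (exists r.(A and D))."""
    if isinstance(c, (Top, Name)):
        return c
    if isinstance(c, Not):
        return Not(relativize(c.arg, a))
    if isinstance(c, And):
        return And(relativize(c.left, a), relativize(c.right, a))
    if isinstance(c, Exists):
        # The filler is relativized recursively, then guarded by A.
        return Exists(c.role, And(Name(a), relativize(c.arg, a)))
    raise TypeError(f"unknown concept: {c!r}")


# Example: exists r. not exists s-. B
example = Exists("r", Not(Exists("s-", Name("B"))))
relativized = relativize(example, "A")
```

The transformation is linear in the size of the concept, which is consistent with the reduction being polynomial.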
Inspired by the close connection between conservative extensions and weak separability with signature that was established in the previous section, we investigate logics for which conservative extensions are undecidable: the guarded fragment GF and the expressive DL $\mathcal{ALCFIO}$. The latter logic, $\mathcal{ALCFIO}$, is the extension of $\mathcal{ALCI}$ with nominals and functionality assertions. Nominals are concepts of the form $\{o\}$ with $o$ a constant symbol, and the translation $\cdot^\dagger$ from $\mathcal{ALCI}$ into FO can be extended to nominals by setting $\{o\}^\dagger = (x = o)$. Functionality assertions are concept inclusions of the form $\top \sqsubseteq (\leq 1\; r)$, $r$ a role, which demand that the role $r$ is interpreted as a partial function. Thus, they are a weak form of counting, and indeed $\mathcal{ALCFIO}$ is a fragment of $\mathrm{C}^2$, the two-variable fragment of FO with counting.

It is known that conservative extensions and projective conservative extensions are undecidable in every extension of the three-variable fragment $\mathrm{GF}^3$ of GF [25] and in every extension of $\mathcal{ALCFIO}$ [36]. Unfortunately, it is not clear how to achieve a direct reduction of conservative extensions to separability for either GF or $\mathcal{ALCFIO}$. In both cases, the relativization that was used for $\mathcal{ALCI}$ does not work. For GF, this is the case because non-conservativity in GF is witnessed by sentences while separability is witnessed by formulas. For $\mathcal{ALCFIO}$, relativization cannot be applied due to the presence of nominals/constants. We instead directly use and adapt the strategies of the mentioned undecidability proofs, starting with GF.
Theorem 4.
Projective and non-projective L -separability with signature are un-decidable for every logic L that contains GF . This is even true when the lan-guage of the separating formula is ALC . The proof is by a reduction from the halting problem of two-register machines.A (deterministic) two-register machine (2RM) is a pair M = ( Q, P ) with Q = q , . . . , q ℓ a set of states and P = I , . . . , I ℓ − a sequence of instructions . Bydefinition, q is the initial state , and q ℓ the halting state . For all i < ℓ , either I i = +( p, q j ) is an incrementation instruction with p ∈ { , } a registerand q j the subsequent state; – or I i = − ( p, q j , q k ) is a decrementation instruction with p ∈ { , } a register, q j the subsequent state if register p contains 0, and q k the subsequent stateotherwise.A configuration of M is a triple ( q, m, n ), with q the current state and m, n ∈ N the register contents. We write ( q i , n , n ) ⇒ M ( q j , m , m ) if one of the followingholds: – I i = +( p, q j ), m p = n p + 1, and m − p = n − p ; – I i = − ( p, q j , q k ), n p = m p = 0, and m − p = n − p ; – I i = − ( p, q k , q j ), n p > m p = n p −
1, and m − p = n − p .The computation of M on input ( n, m ) ∈ N is the unique longest configurationsequence ( p , n , m ) ⇒ M ( p , n , m ) ⇒ M · · · such that p = q , n = n , and m = m . The halting problem for 2RMs is to decide, given a 2RM M , whetherits computation on input (0 ,
0) is finite (which implies that its last state is q ℓ ).We convert a given 2RM M into a labeled GF KB ( K , { a } , { b } ), K =( O , D ) and signature Σ such that M halts iff ( K , { a } , { b } ) is (non-)projectivelyGF( Σ )-separable iff ( K , { a } , { b } ) is (non-)projectively ALC ( Σ )-separable. Let M = ( Q, P ) with Q = q , . . . , q ℓ and P = I , . . . , I ℓ − . We assume w.l.o.g. that ℓ ≥ I i = − ( p, q j , q k ), then q j = q k . In K , we use the following setof relation symbols: – a binary symbol N connecting a configuration to its successor configuration; – binary symbols R and R that represent the register contents via the lengthof paths; – unary symbols q , . . . , q ℓ representing the states of M ; – a unary symbol S denoting points where a computation starts. – a unary symbol D used to represent that there is some defect; – binary symbols D + p , D − p , D = p used to describe defects in incrementing, decre-menting, and keeping register p ∈ { , } ; – ternary symbols H +1 , H +2 , H − , H − , H =1 , H =2 used as guards for existentialquantifiers.The signature Σ consists of the symbols from the first four points above.We define the ontology O as the set of several GF sentences. The firstsentence initializes the starting configuration: ∀ x ( Sx → ( q x ∧ ¬∃ y R xy ∧ ¬∃ y R xy ))Second, whenever M is not in the final state, there is a next configuration withthe correctly updated state. For 0 ≤ i < ℓ , we include: ∀ x ( q i x → ∃ y N xy ) ∀ x ( q i x → ∀ y ( N xy → q j y )) if I i = +( p, q j ) ∀ x (( q i x ∧ ¬∃ yR p xy ) → ∀ y ( N xy → q j y )) if I i = − ( p, q j , q k ) ∀ x (( q i x ∧ ∃ yR p xy ) → ∀ y ( N xy → q k y )) if I i = − ( p, q j , q k ) The formulas that are not syntactically guarded can easily be rewritten into suchformulas. oreover, if M is in the final state, there is no successor configuration: ∀ x ( q ℓ x → ¬∃ y N xy ) . 
The next conjunct expresses that either M does not halt or the representation ofthe computation of M contains a defect. It crucially uses non- Σ relation symbols.It takes the shape of ∀ x ( Dx → ∃ y ( N xy ∧ ψxy ))where ψxy is the following disjunction which ensures that there is a concretedefect ( D + p , D − p , D = p ) here or some defect ( D ) in some successor state: D ( y ) ∨ _ I i =+( p,q j ) ( q i x ∧ q j y ∧ ( D + p xy ∨ D =1 − p xy )) ∨ _ I i = − ( p,q j ,q k ) ( q i x ∧ q k y ∧ ( D − p xy ∨ D =1 − p xy )) ∨ _ I i = − ( p,q j ,q k ) ( q i x ∧ q j y ∧ ( D = p xy ∨ D =1 − p xy ))Finally, using the ternary symbols we make sure that the defects are realized,for example, by taking: ∀ x ∀ y (cid:0) D + p xy → ( ¬∃ z R p yz ∨ ( ¬∃ z R p xz ∧ ∃ z ( R p yz ∧ ∃ xR p zx )) ∨∃ z ( H +1 xyz ∧ R p xz ∧ ∃ x ( H +2 xzy ∧ R p yx ∧ D + p zx ))) (cid:1) . Similar conjuncts implement the desired behaviour of D = p and D − p ; since theyare constructed analogously to the last three lines above (but using guards H − j and H = j ), details are omitted.Finally, we define a database D by taking D = { S ( a ) , D ( a ) , S ( b ) } . Lemmas 4 and 5 below establish correctness of the reduction and thus Theorem 4.
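To make the machine model behind this reduction concrete, here is a small sketch of the step relation $\Rightarrow_M$ of a two-register machine as defined above. The encoding is an assumption made for illustration only: state $q_i$ is the integer `i`, an incrementation instruction $+(p, q_j)$ is the tuple `("+", p, j)`, and a decrementation instruction $-(p, q_j, q_k)$ is `("-", p, j, k)`. Since halting is undecidable, `run` takes a step bound, which is not part of the original definition.

```python
def step(program, config):
    """One step of the 2RM; returns None in the halting state q_l."""
    state, regs = config[0], [config[1], config[2]]
    if state == len(program):  # q_l has no instruction: the machine has halted
        return None
    instr = program[state]
    if instr[0] == "+":
        _, p, j = instr
        regs[p] += 1
        return (j, regs[0], regs[1])
    _, p, j, k = instr
    if regs[p] == 0:  # register p is empty: move to q_j, registers unchanged
        return (j, regs[0], regs[1])
    regs[p] -= 1      # otherwise decrement register p and move to q_k
    return (k, regs[0], regs[1])


def run(program, max_steps):
    """Computation on input (0, 0), cut off after max_steps steps.

    Returns (trace, halted)."""
    config = (0, 0, 0)
    trace = [config]
    for _ in range(max_steps):
        nxt = step(program, config)
        if nxt is None:
            return trace, True
        config = nxt
        trace.append(config)
    return trace, step(program, config) is None


# A halting machine with l = 3: add 2 to register 0, then count it down.
halting = [("+", 0, 1), ("+", 0, 2), ("-", 0, 3, 2)]
trace, halted = run(halting, 10)

# A non-halting machine with l = 1: increment register 0 forever.
looping = [("+", 0, 0)]
```

In the reduction, a finite trace such as the one produced for `halting` corresponds to the finite $N$-path described by the separating concept, while a machine like `looping` always admits a further $N$-successor.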
Lemma 4. If $M$ halts, then there is an $\mathcal{ALC}(\Sigma)$ concept that non-projectively separates $(\mathcal{K}, \{a\}, \{b\})$.

Proof.
The idea is that the separating
ALC ( Σ ) concept describes the haltingcomputation of M , up to ALC ( Σ )-bisimulations. More precisely, assume that M halts. We define an ALC ( Σ ) concept C such that K | = ¬ C ( a ), but K 6| = ¬ C ( b ).Intuitively, C represents the computation of M on input (0 , q , n , m ) , . . . , ( q k , n k , m k ), then there is an N -path of length k (but not longer) such that any object reachable in i ≤ k steps from the beginningf the path is labeled with q i , has an outgoing R -path of length n i and no longeroutgoing R -path, and likewise for R and m i . In more detail, consider the Σ -structure A withdom( A ) = { , . . . , k } ∪ { a ij | < i ≤ k, < j < n i } ∪{ b ij | < i ≤ k, < j < m i } in which N A = { ( i, i + 1) | i < k } R A = S i ≤ k { ( i, a i ) , ( a i , a i ) , . . . , ( a in i − , a in i − ) } R A = S i ≤ k { ( i, b i ) , ( b i , b i ) , . . . , ( b im i − , b im i − ) } S A = { } q A = { i | q i = q } for any q ∈ Q. Then let C be the ALC ( Σ ) concept that describes A from the point of 0 up to ALC ( Σ )-bisimulations. Clearly, K ∪ { C ( b ) } is satisfiable. However, K ∪ { C ( a ) } is unsatisfiable since the enforced computation does not contain a defect andcannot be extended to have one. In particular, there are no N -paths of length > k in any model of K ∪ { C ( a ) } and there are no defects in register updates inany model of K ∪ { C ( a ) } . ❏ The following lemma implies that if M does not halt, then ( K , { a } , { b } )is neither projectively L ( Σ )-separable nor non-projectively L ( Σ )-separable for L = GF and in fact for every logic L between GF and FO. Lemma 5. If M does not halt, then for every model A of K , there is a model B of K such that ( A , b A ) is Γ -ismorphic to ( B , a B ) where Γ consists of all symbolsexcept sig ( O ) \ Σ . Proof.
Let A be a model of K . We obtain B from A by re-interpreting a B = b A and inductively defining the extensions of the symbols fromsig( O ) \ Σ = { D, D + p , D − p , D = p , H +1 , H +2 , H − , H − , H =1 , H =2 } . We start with D B = { a B } and X B = ∅ for all other symbols X from sig( O ) \ Σ .Then, whenever d ∈ D B we distinguish two cases: – If there is an N -successor e of d such that the counters below d and e arenot correctly updated with respect to the states at d, e , set the extensionsof the symbols in D + p , D − p , D = p , H +1 , H +2 , H − , H − , H =1 , H =2 so as to representthe defect and finish the construction of B . – Otherwise, choose an N -successor e of d and add e to D B .Note that, since M does not halt, we can always find such an N -successor as inthe second item. ❏ et us now look at ALCFIO . Theorem 5.
Projective and non-projective $\mathcal{L}$-separability with signature are undecidable for every logic $\mathcal{L}$ that contains $\mathcal{ALCFIO}$.

The proof is by a reduction of the following undecidable tiling problem.
Definition 2. A tiling system S = ( T, H, V, R, L, T, B ) consists of a finite set T of tiles , horizontal and vertical matching relations H, V ⊆ T × T , and sets R, L, T, B ⊆ T of right tiles, left tiles, top tiles, and bottom tiles. A solution to S is a triple ( n, m, τ ) where n, m ∈ N and τ : { , . . . , n } × { , . . . , m } → T suchthat the following hold:1. ( τ ( i, j ) , τ ( i + 1 , j )) ∈ H , for all i < n and j ≤ m ;2. ( τ ( i, j ) , τ ( i, j + 1)) ∈ V , for all i ≤ n and j < m ;3. τ (0 , j ) ∈ L and τ ( n, j ) ∈ R , for all j ≤ m ;4. τ ( i, ∈ B and τ ( i, m ) ∈ T , for all i ≤ n . We show how to convert a tiling system S into a labeled ALCFIO -KB ( K , P, N )and signature Σ such that S has a solution iff ( K , P, N ) is ALCFIO ( Σ )-separableiff ( K , P, N ) is projectively ALCFIO ( Σ )-separable.Let S = ( T, H, V, R, L, T, B ) be a tiling system. Define an ontology O thatconsists of the following statements: – The roles r x , r y , and their inverses are functional: ⊤ ⊑ ( r ) , for r ∈ { r x , r y , r − x , r − y } – Every grid node is labeled with exactly one tile and the matching conditionsare satisfied: ⊤ ⊑ ⊔ t ∈ T ( t ⊓ ⊓ t ′ ∈ T, t ′ = t ¬ t ′ ) ⊤ ⊑ ⊓ t ∈ T ( t → ( ⊔ ( t,t ′ ) ∈ H ∀ r x .t ′ ⊓ ⊔ ( t,t ′ ) ∈ V ∀ r y .t ′ )) – The concepts left , right , top , bottom mark the borders of the grid in theexpected way: right ⊑ ¬∃ r x . ⊤ ⊓ ∀ r y . right ⊓ ∀ r − y . right ¬ right ⊑ ∃ r x . ⊤ and similarly for left , top , and bottom . – The individual name o marks the origin: { o } ⊑ left ⊓ bottom . – there is no infinite outgoing r x / r y -path starting at o and grid cells close inthe part of models reachable from o : Q ⊑ ∃ r x .Q ⊔ ∃ r y .Q ⊔ ( ∃ r x . ∃ r y .P ⊓ ∃ r y . ∃ r x . ¬ P ) A ⊓ A ⊑ ∃ u. ( { o } ⊓ Q )he final item deserves some further explanation. It is to be read as follows: thestated properties hold in a model A whenever A can not be extended to a modelof the upper CI that makes true Q at o . 
In conjunction with the database, thesecond CI is a switch that will allow us to sometimes require that Q is madetrue at o .Set Σ = T ∪ { r x , r y , left , right , top , bottom } and consider the labeled KB( K , { a } , { b } ) where K = ( O , D ) with D = { A ( a ) , Y ( b ) } with Y a fresh (dummy)concept name. Lemma 6. If S has a solution, then there is an ALCIO ( Σ ) concept that non-projectively separates ( K , { a } , { b } ) . Proof.
We design the
ALCI ( Σ ) concept C = ¬ D so that any model of D and O , even without the CIs from the last item, includes a properly tiled n × m -gridwith lower left corner o .For every word w ∈ { r x , r y } ∗ , denote by ←− w the word that is obtained byreversing w and then adding · − to each symbol. Let | w | r denote the number ofoccurrences of the symbol r in w . Now, D = A ⊓∃ u.E where E is the conjunctionof { o } ⊓ ∀ r nx . right ⊓ ∀ r my . top and for every w ∈ { r x , r y } ∗ such that | w | r x < n and | w | r y < m , the concept ∃ ( w · r x r y r − x r − y · ←− w ) . { o } , where ∃ w.F abbreviates ∃ r . · · · ∃ r k .F if w = r · · · r k . It is readily checked that E (and thus D ) indeed enforces a properly tiled grid as announced. Then, due tothe CIs in the last item, K ∪ { D ( a ) } is unsatisfiable: Any model A has to satisfy a A ∈ A A since A ( a ) ∈ D and a A ∈ A A , due to the assertion D ( a ). Hence thelast CIs become ‘active’, which is in conflict to the fact that the model enforcedby D contains neither an infinite r x / r y -path nor a non-closing grid-cell. Thus, K | = C ( a ) as required.Now for K 6| = C ( b ). We find a model A of K with b A ∈ D A since all CIs in O except this from the last item are satisfied by the grid enforced by D and theCIs in the last item can be made ‘inactive’ by making A false at b A . ❏ The following lemma implies that if S has no solution, then ( K , { a } , { b } ) isneither projectively L ( Σ )-separable nor non-projectively L ( Σ )-separable for L = ALCIO and in fact for every logic L between ALCIO and FO.
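The notion of a solution to a tiling system from Definition 2 can be checked mechanically for a given candidate. The sketch below is only an illustration of conditions 1 to 4. The representation is assumed: the system is a 7-tuple of Python sets, the set of top tiles is called `top` to avoid the name clash with the tile set $T$, and $\tau$ is a dictionary from grid positions to tiles.

```python
def is_solution(system, n, m, tau):
    """Check conditions 1-4 of Definition 2 for the candidate (n, m, tau)."""
    tiles, H, V, right, left, top, bottom = system
    cells = [(i, j) for i in range(n + 1) for j in range(m + 1)]
    # tau must assign a tile to every position of the (n+1) x (m+1) grid.
    if any(tau.get(cell) not in tiles for cell in cells):
        return False
    horizontal = all((tau[i, j], tau[i + 1, j]) in H
                     for i in range(n) for j in range(m + 1))
    vertical = all((tau[i, j], tau[i, j + 1]) in V
                   for i in range(n + 1) for j in range(m))
    sides = all(tau[0, j] in left and tau[n, j] in right
                for j in range(m + 1))
    ends = all(tau[i, 0] in bottom and tau[i, m] in top
               for i in range(n + 1))
    return horizontal and vertical and sides and ends


# A one-tile system in which "x" matches itself and may sit on every border.
one = {"x"}
system = (one, {("x", "x")}, {("x", "x")}, one, one, one, one)
tau_2x2 = {(i, j): "x" for i in range(2) for j in range(2)}

# The same tiles, but with an empty horizontal matching relation.
no_h = (one, set(), {("x", "x")}, one, one, one, one)
```

Checking a given candidate is easy. The undecidability of the tiling problem comes from the fact that the grid dimensions $n$ and $m$ are not bounded in advance.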
Lemma 7. If S has no solution, then for every model A of K , there is a model B of K such that ( A , b A ) is Γ -isomorphic to ( B , a A ) where Γ consists of allsymbols except { u, A , Q, P } . Proof. (sketch) If b I / ∈ A A , then we can simply obtain B from A by switching a A and b A and making A true at a A . If b A ∈ A A , then after switching weadditionally have to re-interpret Q , P , and u in a suitable way. But S hasno solution and thus when following r x / r y -paths from o in I , we must eitherencounter an infinite such path or a non-closing grid cell as otherwise we canextract from I a solution for S . Thus we can re-interpret Q , P , and u as required. ❏ Strong Separability with Signature
We introduce strong separability of labeled KBs. The crucial difference to weak separability is that the negation of the separating formula must be entailed at all negative examples.
Definition 3.
Let ( K , P, N ) be a labeled FO-KB and Σ ⊆ sig ( K ) a signature.An FO-formula ϕ ( x ) strongly Σ -separates ( K , P, N ) if sig ( K ) ∩ sig ( ϕ ) ⊆ Σ and1. K | = ϕ ( a ) for all a ∈ P and2. K | = ¬ ϕ ( a ) for all a ∈ N .Let L S be a fragment of FO. We say that ( K , P, N ) is strongly projectively L S ( Σ )-separable if there exists an L S ( Σ ) -formula ϕ ( x ) that strongly separates ( K , P, N ) and non-projectively strongly L S ( Σ )-separable if there is such a ϕ ( x ) with sig ( ϕ ) ⊆ Σ . In contrast to weak separability, any formula ϕ that strongly separates a labeledKB ( K , P, N ) and uses helper symbols R that are not in Σ can easily be trans-formed into a strongly separating formula that uses only symbols from Σ : simplyreplace any such R by a relation symbol R ′ of the same arity that is in Σ . Then,if ϕ strongly separates ( K , P, N ), so does the resulting formula ϕ ′ . If no relationsymbol of the same arity as R occurs in Σ one can alternatively replace relevantsubformulas by ⊤ or ⊥ . In what follows, we thus only consider non-projectivestrong separability and simply speak of strong separability.Note that for languages L S closed under conjunction and disjunction a la-beled KB ( K , P, N ) is strongly L S ( Σ )-separable iff every ( K , { a } , { b } ) with a ∈ P and b ∈ N is strongly L S ( Σ )-separable. In fact, if ϕ a , b strongly separates( K , { a } , { b } ) for a ∈ P and b ∈ N , then W a ∈ P V b ∈ N ϕ a , b strongly separates( K , P, N ). 
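The step from pairwise separators to a separator for $(\mathcal{K}, P, N)$ is a routine verification. It is spelled out here as a sketch, assuming that $P$ and $N$ are finite and that all $\varphi_{\mathbf{a},\mathbf{b}}$ share the same free variables $\mathbf{x}$:

```latex
% Assumption: for all a in P and b in N,
%   K |= phi_{a,b}(a)  and  K |= \neg phi_{a,b}(b).
\begin{align*}
\varphi(\mathbf{x}) \;&:=\; \bigvee_{\mathbf{a} \in P} \bigwedge_{\mathbf{b} \in N}
    \varphi_{\mathbf{a},\mathbf{b}}(\mathbf{x}) \\[4pt]
\text{For } \mathbf{a}_0 \in P:\quad
  & \mathcal{K} \models \varphi_{\mathbf{a}_0,\mathbf{b}}(\mathbf{a}_0)
    \text{ for all } \mathbf{b} \in N
  \;\Longrightarrow\;
    \mathcal{K} \models \bigwedge_{\mathbf{b} \in N}
      \varphi_{\mathbf{a}_0,\mathbf{b}}(\mathbf{a}_0)
  \;\Longrightarrow\;
    \mathcal{K} \models \varphi(\mathbf{a}_0) \\[4pt]
\text{For } \mathbf{b}_0 \in N:\quad
  & \mathcal{K} \models \neg\varphi_{\mathbf{a},\mathbf{b}_0}(\mathbf{b}_0)
    \text{ for all } \mathbf{a} \in P
  \;\Longrightarrow\;
    \mathcal{K} \models \neg\bigwedge_{\mathbf{b} \in N}
      \varphi_{\mathbf{a},\mathbf{b}}(\mathbf{b}_0)
    \text{ for all } \mathbf{a} \in P
  \;\Longrightarrow\;
    \mathcal{K} \models \neg\varphi(\mathbf{b}_0)
\end{align*}
```

Since $\mathrm{sig}(\varphi)$ is the union of the signatures of the $\varphi_{\mathbf{a},\mathbf{b}}$, the signature restriction is preserved as well.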
Without loss of generality, we may thus work with labeled KBs withsingleton sets of positive and negative examples.Each choice of an ontology language L and a separation language L S thusgives rise to a (single) strong separability problem that we refer to as strong ( L , L S ) -separability , defined in the expected way: PROBLEM : strong ( L , L S ) separability with signature INPUT : labeled L -KB ( K , P, N ) and signature Σ ⊆ sig( K ) QUESTION : Is ( K , P, N ) strongly L S ( Σ )-separable?If L = L S , then we simply speak of strong L -separability. The study of strongseparability is very closely linked to the study of interpolants and the Craiginterpolation property. Given formulas ϕ ( x ) , ψ ( x ) and a fragment L of FO, wesay that an L -formula χ ( x ) is an L -interpolant of ϕ, ψ if ϕ ( x ) | = χ ( x ), χ ( x ) | = ψ ( x ), and sig( χ ) ⊆ sig( ϕ ) ∩ sig( ψ ). We say that L has the CIP if for any L -formulas ϕ ( x ) , ψ ( x ) such that ϕ ( x ) | = ψ ( x ) there exists an L -interpolant of ϕ, ψ . FO has the CIP, and so does GNF [10,8], at least if one admits non-sharedconstants in the interpolant. On the other hand, GF does not have the CIP [24].he link between the interpolants, the CIP, and strong separability is easy tosee: assume a labeled FO-KB ( K , { a } , { b } ) with K = ( O , D ) and a signature Σ ⊆ sig( K ) are given. Obtain K Σ, a and K Σ, b from K by – replacing all non- Σ -relation symbols R in K by fresh symbols R a and R b ,respectively; – replacing all constant symbols c by fresh variables x c, a and x c, b which aredistinct, except that a and b are replaced by the same tuple x in K Σ, a and K Σ, b , respectively.Then let ϕ Σ, a ( x ) = ∃ z ( V K Σ, a ), where z is the sequence of free variables in K Σ, a without the variables in x and ( V K Σ, a ) is the conjunction of all formulasin K Σ, a . ϕ Σ, b ( x ) is defined in the same way, with a replaced by b . The followinglemma is a direct consequence of the construction. Lemma 8.
Let L be a fragment of FO. Then the following conditions are equiv-alent for any formula ϕ in L :1. ϕ strongly L ( Σ ) -separates ( K , { a } , { b } ) ;2. ϕ is an L -interpolant for ϕ Σ, a ( x ) , ¬ ϕ Σ, b ( x ) . Thus, the problem whether a labeled KB ( K , P, N ) is strongly L ( S )-separableand the computation of a strongly separating formula can be equivalently for-mulated as an interpolant existence problem. As FO has the CIP, we obtain thefollowing characterization of the existence of strongly FO( Σ )-separating formu-las. Theorem 6.
The following conditions are equivalent for any labeled FO-KB ( K , { a } , { b } ) and signature Σ ⊆ sig ( K ) :1. ( K , { a } , { b } ) is strongly FO ( Σ ) -separable;2. ϕ Σ, a ( x ) | = ¬ ϕ Σ, b ( x ) . For fragments L of FO such as ALCI , GF, and GNF, Lemma 8 has to be appliedwith some care, as one has to ensure that the formulas ϕ Σ, a ( x ) , ¬ ϕ Σ, b ( x ) arestill within L . This will be discussed in the next two sections. ALCI
We first compare the strong separating power of $\mathcal{ALCI}$ with signature restrictions to the strong separating power of FO with signature restrictions and show that they differ. This is in contrast to strong separability without signature restrictions. We then show that strong $\mathcal{ALCI}$-separability with signature restrictions is 2ExpTime-complete, thus one exponential harder than strong $\mathcal{ALCI}$-separability without signature restrictions. Observe that we cannot apply the CIP of $\mathcal{ALCI}$ [15] to investigate strong separability for $\mathcal{ALCI}$-KBs as one cannot encode the atomic formulas of the database in $\mathcal{ALCI}$. One could instead move to the extension $\mathcal{ALCIO}$ of $\mathcal{ALCI}$ with nominals. This language, however, does not have the CIP [15]. Recently, interpolant existence in $\mathcal{ALCIO}$ has been investigated in [4], and the results could be applied here. The following direct approach is of independent value, however.

In [26], strong separability is studied without signature restrictions. It turned out that a labeled $\mathcal{ALCI}$-KB is strongly $\mathcal{ALCI}$-separable without signature restrictions iff it is strongly FO-separable without signature restrictions. Unfortunately, this is not the case with signature restrictions, as the following simple example shows.
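Example 5. Let $\mathcal{D} = \{R(a,a), A(b)\}$ and $\mathcal{O} = \{A \sqsubseteq \forall R.\neg A\}$. Let $\mathcal{K} = (\mathcal{O}, \mathcal{D})$ and $\Sigma = \{R\}$. Then $R(x,x)$ strongly separates $(\mathcal{K}, \{a\}, \{b\})$: we have $\mathcal{K} \models R(a,a)$ since $R(a,a) \in \mathcal{D}$, and $\mathcal{K} \models \neg R(b,b)$ since $A(b) \in \mathcal{D}$ and $\mathcal{O}$ forbids $R$-edges between elements of $A$. Thus $(\mathcal{K}, \{a\}, \{b\})$ is strongly FO($\Sigma$)-separable. The characterization below immediately implies that $(\mathcal{K}, \{a\}, \{b\})$ is not strongly $\mathcal{ALCI}(\Sigma)$-separable.

We now show that strong $\mathcal{ALCI}$-separability with signature restrictions is 2ExpTime-complete. To this end we first give a model-theoretic characterization of strong $\mathcal{ALCI}$-separability using $\mathcal{ALCI}$-bisimulations.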
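The negative claim in Example 5 can be checked on a small model, using the bisimulation-based characterization of Theorem 7. The sketch below computes the greatest $\mathcal{ALCI}(\Sigma)$-bisimulation on a finite structure by iterated refinement, assuming the standard definition of such bisimulations (agreement on concept names in $\Sigma$, plus back and forth conditions for $\Sigma$-roles and their inverses). The three-element model and all function names are illustrative choices, not taken from the paper.

```python
def greatest_bisimulation(domain, concepts, roles, sigma_concepts, sigma_roles):
    """Greatest ALCI(Sigma)-bisimulation of a finite structure with itself.

    concepts maps concept names to sets of elements,
    roles maps role names to sets of pairs."""
    def neighbours(r, d, inverse):
        edges = roles.get(r, set())
        if inverse:
            return {x for (x, y) in edges if y == d}
        return {y for (x, y) in edges if x == d}

    # Start with all pairs that agree on the concept names in Sigma.
    rel = {(d, e) for d in domain for e in domain
           if all((d in concepts.get(c, set())) == (e in concepts.get(c, set()))
                  for c in sigma_concepts)}
    changed = True
    while changed:
        changed = False
        for (d, e) in list(rel):
            ok = True
            for r in sigma_roles:
                for inverse in (False, True):
                    ds = neighbours(r, d, inverse)
                    es = neighbours(r, e, inverse)
                    forth = all(any((d2, e2) in rel for e2 in es) for d2 in ds)
                    back = all(any((d2, e2) in rel for d2 in ds) for e2 in es)
                    ok = ok and forth and back
            if not ok:
                rel.discard((d, e))
                changed = True
    return rel


# A model of K from Example 5: R(a,a) and A(b) hold, and the only
# R-successor of the A-element b is c, which is not in A.
domain = {"a", "b", "c"}
concepts = {"A": {"b"}}
roles = {"R": {("a", "a"), ("b", "c"), ("c", "b")}}
satisfies_ontology = all(y not in concepts["A"]
                         for (x, y) in roles["R"] if x in concepts["A"])

rel = greatest_bisimulation(domain, concepts, roles, set(), {"R"})
a_bisimilar_to_b = ("a", "b") in rel
```

Every element of this model has exactly one $R$-successor and one $R$-predecessor, and $A \notin \Sigma$, so all elements are $\mathcal{ALCI}(\Sigma)$-bisimilar. In particular the interpretations of $a$ and $b$ are, which by Theorem 7 rules out a strongly separating $\mathcal{ALCI}(\Sigma)$-concept.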
Theorem 7. Let $(\mathcal{K}, \{a\}, \{b\})$ be a labeled $\mathcal{ALCI}$-KB and $\Sigma \subseteq \mathrm{sig}(\mathcal{K})$ a signature. Then the following conditions are equivalent:
1. $(\mathcal{K}, \{a\}, \{b\})$ is strongly $\mathcal{ALCI}(\Sigma)$-separable;
2. There are no models $\mathfrak{A}$ and $\mathfrak{B}$ of $\mathcal{K}$ such that $\mathfrak{A}, a^{\mathfrak{A}} \sim_{\mathcal{ALCI},\Sigma} \mathfrak{B}, b^{\mathfrak{B}}$.

The proof is straightforward using Lemma 2. By working with isomorphic copies of the database $\mathcal{D}$ it thus suffices to show the following result.

Lemma 9.
Let ( K , { a } , { b } ) be a labeled ALCI -KB with K = ( O , D ) such that a, b are in distinct maximal connected components of D . Then the problem todecide whether there exists a model A of K such that A , a A ∼ ALCI ,Σ A , b A is -complete. Let K = ( O , D ) and let Σ ⊆ sig( K ) be a signature. We start with proving theupper bound and use the notion of K -types as introduced in Section 4. Let R bea role. We say that K -types t and t are R -coherent if there exists a model A of O and nodes d and d realizing t and t , respectively, such that ( d , d ) ∈ R A .We write t R t in this case. Definition 4 ( ( O , Σ ) -amalgamable). A set Φ of K -types is ( O , Σ )-amalga-mable if there exist models A t of O for t ∈ Φ with elements d t realizing t in A t such that all A t , d t with t ∈ Φ are ALCI ( Σ ) -bisimilar. Lemma 10.
The set of all ( O , Σ ) -amalgamable sets of K -types can be computedin double exponential time. We devise an elimination procedure as follows. Start with M the set of allsets of K -types. Given a set M i of sets of types, we obtain M i +1 by eliminatingall Φ = { t , . . . , t n } from M i which do not satisfy the following conditions:1. for every A ∈ Σ , we have A ∈ t i iff A ∈ t j , for all t i , t j ∈ Φ ;2. for every t i , every Σ -role R , and every ∃ R.C ∈ t i there are K -types t ′ , . . . , t ′ n such that C ∈ t ′ i and t j R t ′ j , for all j , and { t ′ , . . . , t ′ n } ∈ M i .et M ∗ be where the sequence M , M , . . . stabilizes. Claim . Φ ∈ M ∗ iff Φ is ( O , Σ )-amalgamable. Proof of the Claim . For the “if”-direction, suppose that Φ = { t , . . . , t n } is( O , Σ )-amalgamable. We can fix (disjoint) models A t , . . . , A t n of O realizingtypes t i at d t i . Let A denote the union of A t , . . . , A t n and let S be the set of allpairs ( d, e ) which are Σ -bisimilar in A . Recall that S is an equivalence relation.By assumption, we have ( d t i , d t j ) ∈ S , for all i, j . It can be verified that the set N defined by N = {{ tp K ( A , d ) | ( d, e ) ∈ S } | e ∈ dom( A ) } is contained in all M i and thus in M ∗ .For “only if”, let Φ = { t , . . . , t n } ∈ M ∗ . We inductively construct a domain ∆ , a map π of the domain ∆ to K -types, and an equivalence relation S . Duringthe construction, we preserve the invariant( ∗ ) π ( D ) = { π ( d ) | d ∈ D } ∈ M ∗ , for every D ∈ S .For the construction, start with setting – ∆ = { d , . . . , d n } , π ( d i ) = t i , for all i , and S = { ∆ } .Obviously, the invariant is satisfied. To obtain ∆ i +1 from ∆ i , choose some D = { e , . . . , e m } ∈ S , some ∃ R.C ∈ π ( d i ) for some Σ -role R . By the invariant,we have { t , . . . , t k } = π ( D ) ∈ M ∗ . Let t ′ , . . . , t ′ k be the types that exist dueto ( E2 ). Now, add fresh elements e Re ′ , . . . 
, e m Re ′ m to ∆ i , set π ( e i Re ′ i ) = π ( e i ) ′ ,for all i , and add { e Re ′ , . . . , e m Re ′ m } to S . By construction, the invariant ( ∗ )is preserved.Now define a structure A by taking:dom( A ) = [ i ≥ ∆ i A A = { e | A ∈ π ( e ) } r A = { ( d, dRe ) | dRe ∈ dom( A ) } ∪{ ( dR − e, d ) | dR − e ∈ dom( A ) } By construction, S is an ALCI ( Σ )-bisimulation that contains ( d i , d j ) for all i, j .Since K -types are realizable by definition, we can extend A to a model A ∗ of O by adding non- Σ -subtrees whenever they are needed.It follows that Φ is ( O , Σ )-amalgamable. This finishes the proof of the Claim.It remains to discuss the running time of the algorithm. The initial set M contains at most double exponentially many elements. Since in every round someelement is removed from M i , the stabilization is reached after | M | rounds. Itremains to observe that the elimination conditions 1 and 2 can be checked indouble exponential time.It is important to note that the proof of Lemma 10 shows that, if a set Φ is ( O , Σ )-amalgamable, then this is witnessed by disjoint tree-shaped models A t ith root d t , for each t ∈ Φ , and ALCI ( Σ )-bisimulations S which “never visitthe roots again”, that is, if ( d t , e ) ∈ S or ( e, d t ) ∈ S , then e = d t ′ for some t ′ .This will be used in the proof of the characterization below.Let Ψ be a mapping associating with every c ∈ cons( D ) a K -type t c and aset Φ c of K -types. We say that Ψ is K , a, b -satisfiable if1. there exists a model A of K realizing t c in c A for c ∈ cons( D );2. Φ c ∪ { t c } is ( O , Σ )-amalgamable, for all c ∈ cons( D );3. Φ a ∪ Φ b ∪ { t a , t b } is ( O , Σ )-amalgamable;4. If R ( d, e ) ∈ O , for some Σ -role R , and t ∈ Φ d , then there exists t ′ ∈ Φ e suchthat t R t ′ ; Lemma 11.
The following conditions are equivalent: – There exists a model A of K such that A , a A ∼ ALCI ,Σ A , b A . – There exists Ψ that is K , a, b -satisfiable. For “only if”, let A be a model of K such that ( a A , b A ) ∈ S . Let t c be the typerealized in c A for c ∈ cons( D ) and let Φ c be the set of all K -types realized innodes that are ALCI ( Σ )-bisimilar in A to c A , that is, Φ c = { tp K ( A , d ) | A , c A ∼ ALCI ,Σ A , d A } . It is easy to see that the resulting Ψ is as required.Conversely, assume Ψ is given. Due to Condition 1, we can fix a model B of K realizing t c in c B for c ∈ cons( D ). Moreover, due to Condition 2, we can fixfor any c ∈ cons( D ) and t ∈ Φ c ∪ { t c } tree-shaped models A c,t of O with root d c,t such that all pointed structures in { A c,t , d c,t | t ∈ Φ c ∪ { t c }} are ALCI ( Σ )-bisimilar. By Condition 3, we can assume that A a,t a , d a,t a and A b,t b , d b,t b are ALCI ( Σ )-bisimilar.We inductively construct a model A and a ALCI ( Σ )-bisimulation S . Westart with the structure A which is obtained as follows. Let B ′ be B restrictedto the domain { d B | d ∈ cons( D ) } . Now, A is the union of B ′ and all structures A c,t c for all c ∈ cons( D ), always identifying the root of A c,t c with c B . Moreover,let S be the ALCI ( Σ )-bisimulation between A a,t a and A b,t b . By the commentafter the proof of Lemma 10, ( a A , b A ) is the only tuple in S that contains a A or b A .Note that A is in fact a model of K but S is not yet a bisimulation. Inorder to make it one, we “chase” the database in both connected componentspreserving the following invariant (which is obviously satisfied for A , S ):( ∗ ) If ( c, d A i ) ∈ S i or ( d A i , c ) ∈ S i for some d ∈ cons( D ), then the type t realized by c in A i satisfies t ∈ Φ d . 
Moreover, the trees below c and d A i are ALCI ( Σ )-bisimilar.n the inductive step, obtain A i +1 , S i +1 from A i , S i by applying one of the fol-lowing rules: – Choose ( c, d A i ) ∈ S i and e such that R ( d, e ) ∈ D , and let t = tp A i ( c ) be thetype of c realized in A i . By ( ∗ ), we know that t ∈ Φ d . By Condition 4, wecan choose t ′ ∈ Φ e with t R t ′ . Now, add a copy of A e,t ′ to A i and make itsroot an R -successor of c . By Condition 2, there is an ALCI ( Σ )-bisimulation S between A e,t ′ and the tree A e,t e below e A i . Set S i +1 = S i ∪ S . – Choose ( d A i , c ) ∈ S i and e such that R ( d, e ) ∈ D , and proceed analogouslyto the first rule.Let A = S A i and S = S S i . Claim. A is a model of K and S is ALCI ( Σ )-bisimulation with ( a A , b A ) ∈ S . Proof of the Claim.
We have $\mathfrak{A} \models \mathcal{K}$ since $\mathfrak{A}_i \models \mathcal{K}$ for all $i$. Moreover, $(a^{\mathfrak{A}}, b^{\mathfrak{A}}) \in S$ since $(a^{\mathfrak{A}}, b^{\mathfrak{A}}) \in S_0$, by definition of $S_0$. To see that $S$ is an $\mathcal{ALCI}(\Sigma)$-bisimulation, let $(d, e) \in S$. We distinguish two cases:
– None of $d, e$ is in $\{c^{\mathfrak{A}} \mid c \in \mathrm{cons}(\mathcal{D})\}$. Thus, $(d, e) \in S$ because $d$ and $e$ are inner nodes of some of the trees $\mathfrak{A}_{c,t}$ that were fixed in the beginning. By construction, $(d, e)$ is an element of an $\mathcal{ALCI}(\Sigma)$-bisimulation $S'$ between those trees. Thus, for every $R$-successor $d'$ of $d$ in $\mathfrak{A}$, $R$ a $\Sigma$-role, there is an $R$-successor $e'$ of $e$ with $(d', e') \in S'$ and thus $(d', e') \in S$. The converse condition is analogous.
– One of $d, e$ is in $\{c^{\mathfrak{A}} \mid c \in \mathrm{cons}(\mathcal{D})\}$, say $d = f^{\mathfrak{A}}$. Suppose $d'$ is an $R$-successor of $d$, for some $\Sigma$-role $R$. We distinguish two cases:
  • $d'$ is in the subtree $\mathfrak{A}_{d,t_d}$ below $d$. Then because of ($*$), there is an $R$-successor $e'$ of $e$ in the tree below $e$ such that $(d', e') \in S$.
  • $d' = g^{\mathfrak{A}}$ for some $R(f, g) \in \mathcal{D}$. Since the rules are applied exhaustively, there is an $R$-successor $e'$ of $e$ in $\mathfrak{A}$ such that $(d', e') \in S$.
  The converse condition is analogous.
This finishes the proof of the Claim and, in fact, of the Lemma.

We can thus use the following algorithm to decide strong $\mathcal{ALCI}(\Sigma)$-separability on input $(\mathcal{K}, P, N)$.
1. Compute the set of all $(\mathcal{O}, \Sigma)$-amalgamable sets.
2. For all $a \in P$ and $b \in N$:
   (a) enumerate all possible mappings $\Psi$ consisting of $\mathcal{K}$-types $t_c$ and sets of $\mathcal{K}$-types $\Phi_c$, for every $c \in \mathrm{cons}(\mathcal{D})$;
   (b) if $\Psi$ is $\mathcal{K}, a, b$-satisfiable, that is, satisfies Conditions 1–4 above, return "not separable".
3. Return "separable".
The algorithm is correct due to Theorem 7 and Lemma 11.
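Step 1 of this algorithm relies on the elimination procedure from the proof of Lemma 10. The following sketch shows the shape of that greatest-fixpoint computation under strong simplifying assumptions. Types are given as frozensets, `sigma_exists(t)` is a hypothetical helper returning the pairs $(R, C)$ with $\exists R.C \in t$ and $R$ a $\Sigma$-role, and `coherent(t, R, t2)` is a hypothetical oracle for $t \rightsquigarrow_R t'$. Computing the $\mathcal{K}$-types and the coherence relation themselves is not shown, and the brute-force enumeration is meant to be readable rather than efficient.

```python
from itertools import combinations, product


def amalgamable_sets(types, sigma_names, sigma_exists, coherent):
    """Eliminate sets of types violating conditions 1 and 2 until stable."""
    types = list(types)
    # M_0: all non-empty sets of types.
    current = {frozenset(c)
               for k in range(1, len(types) + 1)
               for c in combinations(types, k)}

    def keep(phi):
        members = list(phi)
        # Condition 1: all members agree on the concept names in Sigma.
        first = members[0] & sigma_names
        if any((t & sigma_names) != first for t in members):
            return False
        # Condition 2: every Sigma-existential of a member can be matched
        # simultaneously by R-coherent successors forming a surviving set.
        for i, t in enumerate(members):
            for role, filler in sigma_exists(t):
                choices = [
                    [u for u in types
                     if coherent(tj, role, u) and (j != i or filler in u)]
                    for j, tj in enumerate(members)
                ]
                if not any(frozenset(pick) in current
                           for pick in product(*choices)):
                    return False
        return True

    while True:
        survivors = {phi for phi in current if keep(phi)}
        if survivors == current:
            return current
        current = survivors


# Toy input (purely illustrative): t1 requires an R-successor satisfying B.
t1 = frozenset({"A", ("exists", "R", "B")})
t2 = frozenset({"B"})
t3 = frozenset({"A"})
edges = {(t1, "R", t2)}
result = amalgamable_sets(
    [t1, t2, t3],
    sigma_names={"A"},
    sigma_exists=lambda t: [(x[1], x[2]) for x in t if isinstance(x, tuple)],
    coherent=lambda s, r, u: (s, r, u) in edges,
)
```

In the toy input, the set `{t1, t3}` is eliminated because `t3` has no `R`-coherent successor that could match the one required by `t1`, and `{t1, t2}` is eliminated because the two types disagree on the $\Sigma$-concept name `A`.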
Moreover, it runs in double exponential time since Step 1 can be executed in double exponential time by Lemma 10, there are only double exponentially many possible mappings Ψ, and K,a,b-satisfiability can be checked in double exponential time: Condition 1 can be checked in exponential time, Conditions 2 and 3 are a mere lookup in the (precomputed) amalgamable sets, and Condition 4 can be tested in double exponential time.

For the lower bound, we reduce the word problem for exponentially space bounded alternating Turing machines (ATMs). We actually use a slightly unusual ATM model which is easily seen to be equivalent to the standard model.

An alternating Turing machine (ATM) is a tuple M = (Q, Θ, Γ, q_0, ∆) where Q = Q_∃ ⊎ Q_∀ is the set of states that consists of existential states in Q_∃ and universal states in Q_∀. Further, Θ is the input alphabet and Γ is the tape alphabet that contains a blank symbol □ ∉ Θ, q_0 ∈ Q_∃ is the starting state, and the transition relation ∆ is of the form ∆ ⊆ Q × Γ × Q × Γ × {L, R}. The set ∆(q, a) := {(q′, a′, M) | (q, a, q′, a′, M) ∈ ∆} must contain exactly two or zero elements for every q ∈ Q and a ∈ Γ. Moreover, the state q′ must be from Q_∀ if q ∈ Q_∃ and from Q_∃ otherwise, that is, existential and universal states alternate. Note that there is no accepting state. The ATM accepts if it runs forever and rejects otherwise. Starting from the standard ATM model, this can be achieved by assuming that exponentially space bounded ATMs terminate on any input and then modifying them to enter an infinite loop from the accepting state.

A configuration of an ATM is a word wqw′ with w, w′ ∈ Γ* and q ∈ Q. We say that wqw′ is existential if q is, and likewise for universal. Successor configurations are defined in the usual way.
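The syntactic side conditions of this ATM model can be made concrete with a small sketch. The class `ATM` and its field and method names are illustrative assumptions, not notation from the paper; the check only covers the constraints stated above (disjoint state sets, existential start state, blank symbol outside the input alphabet, two or zero transitions per pair (q, a), and strict alternation).

```python
from dataclasses import dataclass
from typing import FrozenSet, Set, Tuple

# A transition (q, a, q', a', move) with move in {"L", "R"}.
Transition = Tuple[str, str, str, str, str]


@dataclass(frozen=True)
class ATM:
    existential: FrozenSet[str]     # Q_exists
    universal: FrozenSet[str]       # Q_forall
    input_alphabet: FrozenSet[str]  # Theta
    tape_alphabet: FrozenSet[str]   # Gamma
    blank: str
    start: str                      # q_0
    delta: FrozenSet[Transition]

    def successors(self, q: str, a: str) -> Set[Tuple[str, str, str]]:
        """The set Delta(q, a) of triples (q', a', move)."""
        return {(q2, a2, mv) for (q1, a1, q2, a2, mv) in self.delta
                if (q1, a1) == (q, a)}

    def is_well_formed(self) -> bool:
        states = self.existential | self.universal
        if self.existential & self.universal:
            return False  # the two state sets must be disjoint
        if self.start not in self.existential:
            return False  # the starting state is existential
        if self.blank not in self.tape_alphabet or self.blank in self.input_alphabet:
            return False  # blank is a tape symbol but not an input symbol
        for (q, a, q2, a2, mv) in self.delta:
            if not {q, q2} <= states or not {a, a2} <= self.tape_alphabet:
                return False
            if mv not in ("L", "R"):
                return False
            if (q in self.existential) == (q2 in self.existential):
                return False  # existential and universal states must alternate
        # Delta(q, a) has exactly two or zero elements for every pair (q, a).
        return all(len(self.successors(q, a)) in (0, 2)
                   for q in states for a in self.tape_alphabet)
```

Acceptance itself (the existence of an infinite computation tree) is not decided by this sketch; it only validates that a given tuple fits the model.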
Note that every configuration has exactly two successor configurations.

A computation tree of an ATM M on input w is an infinite tree whose nodes are labeled with configurations of M such that

– the root is labeled with the initial configuration q_0 w;
– if a node is labeled with an existential configuration wqw′, then it has a single successor and this successor is labeled with a successor configuration of wqw′;
– if a node is labeled with a universal configuration wqw′, then it has two successors and these successors are labeled with the two successor configurations of wqw′.

An ATM M accepts an input w if there is a computation tree of M on w.

We reduce the word problem for 2^n-space bounded ATMs, which is known to be 2ExpTime-hard [16]. The idea of the reduction is as follows. We set

D = {A(a), r(b, b), B(b)},
Σ = {r, s, Z, B_∀, B_∃^1, B_∃^2} ∪ {A_σ | σ ∈ Γ ∪ (Q × Γ)}.

The ontology O enforces that in an A-node starts an infinite r-path ρ. Along ρ, a counter counts modulo 2^n using concept names not in Σ. In each point of ρ starts an infinite tree along role s that is supposed to mimic the computation tree of M. Along this tree, two counters are maintained:

– one counter starting at 0 and counting modulo 2^n to divide the tree into subpaths of length 2^n; each such path of length 2^n represents a configuration;
– another counter starting at the value of the counter along ρ and also counting modulo 2^n.

To link successive configurations we use that if (K, {a}, {b}) has no strong ALCI(Σ) solution, then there exist models A and B of K such that A, a^A ∼_Σ B, b^B: from r(b, b) ∈ D it follows that in A all nodes on the r-path ρ are Σ-bisimilar. Thus, each node on ρ is the starting point of s-trees with identical Σ-decorations. As on the m-th s-tree the second counter starts at all nodes at distances k × 2^n − m, for all k ≥
1, we are in the position to coordinate allpositions at all successive configurations.The ontology O is constructed as follows. We first enforce the infinite r -path ρ with the counter, which is realized using concept names A i , A i , i < n : A ⊑ I s ⊓ ⊓ i To coordinate consecutive configurations, we associate with M functions f i , i ∈{ , } that map the content of three consecutive cells of a configuration to thecontent of the middle cell in the i -the successor configuration (assuming anarbitrary order on the set ∆ ( q, a )). In what follows, we ignore the cornercaseesthat occur at the border of configurations; they can be treated in a similar way.Clearly, for each possible such triple ( σ , σ , σ ) ∈ Γ ∪ ( Q × Γ ), there is an ALC concept C σ ,σ ,σ which is true at an element a of the computation tree iff a is labeled with A σ , a ’s s -successor b is labeled with A σ , and b ’s s -successor c is labeled with A σ . Now, in each configuration, we synchronize elements with V -counter 0 by including for every σ = ( σ , σ , σ ) and i ∈ { , } the followingsentences: ( V = 2 n − ⊓ ( U < n − ⊓ C σ ,σ ,σ ⊑ ∀ s.A f ( σ ) ⊓ ∀ s.A f ( σ ) ( V = 2 n − ⊓ ( U < n − ⊓ C σ ,σ ,σ ⊓ B i ∃ ⊑ ∀ s.A if i ( σ ) The concept names A iσ are used as markers (not in Σ ) and are propagated along s for 2 n steps, exploiting the V -counter. The superscript i ∈ { , } determinesthe successor configuration that the symbol is referring to. After crossing theend of a configuration, the symbol σ is propagated using concept names A ′ σ (thesuperscript is not needed anymore because the branching happens at the end ofthe configuration, based on Z ).( U < n − ⊓ A iσ ⊑ ∀ s.A iσ ( U = 2 n − ⊓ B ∀ ⊓ A σ ⊑ ∀ s. ( ¬ Z ⊔ A ′ σ )( U = 2 n − ⊓ B ∀ ⊓ A σ ⊑ ∀ s. ( Z ⊔ A ′ σ )( U = 2 n − ⊓ B i ∃ ⊓ A iσ ⊑ ∀ s.A ′ σ i ∈ { , } ( V < n − ⊓ A ′ σ ⊑ ∀ s.A ′ σ ( V = 2 n − ⊓ A ′ σ ⊑ ∀ s.A σ For those ( q, a ) with ∆ ( q, a ) = ∅ , we add the concept inclusion A q,a ⊑ ⊥ . 
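As a small numeric illustration of the counter argument used to link configurations: if the second counter starts with value m at the root of an s-tree and is incremented modulo 2^n with every s-step, then it is reset to 0 exactly at the depths k × 2^n − m for k ≥ 1. The helper name `v_counter_zero_depths` is a hypothetical one chosen for this sketch.

```python
from typing import List


def v_counter_zero_depths(n: int, m: int, max_depth: int) -> List[int]:
    """Depths d in 1..max_depth at which a counter that starts at m at depth 0
    and counts modulo 2**n along an s-path has value 0."""
    modulus = 2 ** n
    if not 0 <= m < modulus:
        raise ValueError("start value must lie in the range 0 .. 2**n - 1")
    return [d for d in range(1, max_depth + 1) if (m + d) % modulus == 0]


# With n = 3 (configurations of length 8) and start value m = 5, the counter
# is reset at depths 1*8 - 5, 2*8 - 5 and 3*8 - 5.
print(v_counter_zero_depths(3, 5, 20))
```

Since the start value m ranges over all values below 2^n as one moves along ρ, every position inside a configuration is a reset point of the second counter on some s-tree, which is what allows all positions of successive configurations to be coordinated.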
The following claim establishes correctness of the reduction.

Claim. M accepts the input w iff there exist models A, B of K such that A, a^A ∼_Σ B, b^B.

Proof of the Claim. (⇒) If M accepts w, there is a computation tree of M on w. We construct a single interpretation A as follows. Let A* be the infinite tree-shaped structure that represents the computation tree of M on w as described above, that is, configurations are represented by sequences of 2^n elements linked by role s and labeled by B_∀, B_∃^1, B_∃^2 depending on whether the configuration is universal or existential, and in the latter case the superscript indicates which choice has been made for the existential state. Finally, the first element of the first successor configuration of a universal configuration is labeled with Z. Observe that A* interprets only the symbols in Σ as non-empty. Now, we obtain structures A_k, k < 2^n, from A* by interpreting non-Σ-symbols as follows:

– the root of A_k satisfies I_s;
– the U-counter starts at 0 at the root and counts modulo 2^n along each s-path;
– the V-counter starts at k at the root and counts modulo 2^n along each s-path;
– the auxiliary concept names of the shape A_σ^i and A′_σ are interpreted in a minimal way so as to satisfy the concept inclusions listed above. Note that the respective concept inclusions are Horn, hence there is no choice.

Now obtain A from A* and the A_k by creating a two-sided infinite r-path ρ through a^A = a (with the corresponding A-counter) and adding all A_k to every node on the r-path by identifying the roots of the A_k with the node on the path. Additionally, add A* to b^A = b by identifying b with the root of A*. It should be clear that A is as required.
In particular, A is a model of O and the reflexive and symmetric closure of

– all pairs (b, e), (e, e′), with e, e′ on ρ, and
– all pairs (e, e′), (e′, e′′), with e in A* and e′, e′′ copies of e in the trees A_k,

is an ALCI(Σ)-bisimulation S on A with (b, a) ∈ S.

(⇐) Let A, B be models of K such that A, a^A ∼_Σ B, b^B. As argued above, due to the r-self loop at b^B, from a^A there has to be an outgoing infinite r-path on which all s-trees are Σ-bisimilar. There is also an outgoing infinite r^−-path with this property, but it is not relevant for the proof. All those s-trees are additionally labeled with some auxiliary concept names not in Σ, depending on the distance from a^A. However, it can be shown using the CIs in O that all s-trees contain a computation tree of M on input w.

Note that we have to take care of inverses in the correctness proof since the characterization refers to ALCI-bisimulations. Since the ontology is actually an ALC-ontology (and there is a similar characterization), strong separability in ALC is also 2ExpTime-hard.

Strong Separability in GF and GNF

We show that strong separability with signature is decidable in the guarded fragment, GF, and the guarded negation fragment, GNF, of FO. We also obtain a 3ExpTime upper bound for GF and 2ExpTime-completeness for GNF. Finally, we show that strong GNF separability with signature restrictions coincides with strong FO separability with signature restrictions for labeled GNF-KBs. The analogous result does not hold for GF. The proofs are based on the link to interpolants and the CIP discussed in Section 6.

The formulas ϕ_{Σ,a}(x) and ¬ϕ_{Σ,b}(x) defined in Section 6 are neither in GF nor in GNF, even if K is a KB in GF or, respectively, GNF. To obtain formulas in GF and GNF, take fresh relation symbols R_{D,a} and R_{D,b} of arity n, where n is the number of constants in D. Then add R_{D,a}(y) to K_{Σ,a} when constructing ϕ_{Σ,a}(x), where y is an enumeration of the variables in K_{Σ,a}.
Denote the resulting formula by ϕ′_{Σ,a}(x). Do the same to construct ϕ′_{Σ,b}(x), using R_{D,b} instead of R_{D,a}. The formulas ϕ′_{Σ,a} and ¬ϕ′_{Σ,b} are in GF and GNF if the KB is given in GF and, respectively, GNF. By construction we obtain the following result.

Theorem 8. Let L ∈ {GF, GNF}. Then there is a polynomial time reduction of strong L-separability with signature to L-interpolant existence. Moreover, given a labeled L-KB (K, {a}, {b}) and Σ ⊆ sig(K), the following conditions are equivalent for any formula ϕ in L:
1. ϕ strongly L(Σ)-separates (K, {a}, {b});
2. ϕ is an L-interpolant for ϕ′_{Σ,a}(x), ¬ϕ′_{Σ,b}(x).

It has been proved in [10,8] that GNF has the CIP. Thus, we obtain the following result.

Theorem 9. Strong GNF-separability with signature is 2ExpTime-complete. Moreover, a GNF-KB (K, {a}, {b}) is strongly GNF(Σ)-separable iff it is strongly FO(Σ)-separable.

In contrast, GF does not enjoy the CIP [24] and so interpolant existence in GF does not reduce to validity. In fact, decidability and 3ExpTime-completeness of GF-interpolant existence have only recently been established [28]. From this result and the reduction in Theorem 8, we obtain a 3ExpTime upper bound for strong GF-separability with signature. A matching lower bound can be shown similarly to the lower bound for GF-interpolant existence.

Theorem 10. Strong GF-separability with signature is 3ExpTime-complete.

As GF does not enjoy the CIP, we also do not obtain that a GF-KB (K, {a}, {b}) is strongly GF(Σ)-separable iff it is strongly FO(Σ)-separable. In fact, the counterexample to CIP constructed in [8] is easily adapted to show the following.

Theorem 11. Strong FO(Σ)-separability of a GF-KB (K, {a}, {b}) does not imply strong GF(Σ)-separability of (K, {a}, {b}).
Conclusion

We have investigated the complexity of deciding weak and strong separability of labeled KBs with signature restrictions for ALCI and guarded fragments of FO, and observed a close link between weak separability and uniform interpolants on the one hand, and between strong separability and Craig interpolants on the other. Numerous questions remain to be explored: what is the size of separating formulas and how can they be computed efficiently, if they exist? What is the complexity of weak non-projective separability with signature restrictions for ALCI? We conjecture that this is still 2ExpTime-complete but lack a proof. What happens for DLs with number restrictions and/or nominals? We have shown that weak projective separability is undecidable for ALCFIO with signature restrictions, but it could well be decidable for ALCQO. For strong separability, there are many exciting open problems: is strong separability with signature restrictions decidable for ALCFIO? In this case, even the case without signature restrictions has not yet been investigated and could well already be tricky. Is it decidable for the two-variable fragment of FO? For the two-variable fragment, the case without signature restrictions has been investigated in [26], and NExpTime-completeness established. Attacking these problems is closely related to deciding the existence of Craig interpolants, and computing (good) separating formulas is closely related to computing (good) Craig interpolants. Also of interest are the same questions for Horn DLs. The situation for EL and ELI has been explored in [18,27], but more expressive ones have not yet been considered.

References

1. Andréka, H., Németi, I., van Benthem, J.: Modal languages and bounded fragments of predicate logic. J. Philosophical Logic 27(3), 217–274 (1998)
2. Arenas, M., Diaz, G.I.: The exact complexity of the first-order logic definability problem. ACM Trans. Database Syst. 41(2), 13:1–13:14 (2016)
3. Arenas, M., Diaz, G.I., Kostylev, E.V.: Reverse engineering SPARQL queries. In: Proc. of WWW. pp. 239–249 (2016)
4. Artale, A., Jung, J.C., Mazzullo, A., Ozaki, A., Wolter, F.: Living without Beth and Craig: Explicit definitions and interpolants in description logics with nominals (2020), submitted
5. Baader, F., Calvanese, D., McGuinness, D.L., Nardi, D., Patel-Schneider, P.F. (eds.): The Description Logic Handbook. Cambridge University Press (2003)
6. Baader, F., Horrocks, I., Lutz, C., Sattler, U.: An Introduction to Description Logics. Cambridge University Press (2017)
7. Badea, L., Nienhuys-Cheng, S.: A refinement operator for description logics. In: Proc. of ILP. pp. 40–59 (2000)
8. Bárány, V., Benedikt, M., ten Cate, B.: Some model theory of guarded negation. J. Symb. Log. 83(4), 1307–1344 (2018)
9. Barceló, P., Romero, M.: The complexity of reverse engineering problems for conjunctive queries. In: Proc. of ICDT. pp. 7:1–7:17 (2017)
10. Benedikt, M., ten Cate, B., Vanden Boom, M.: Effective interpolation and preservation in guarded logics. ACM Trans. Comput. Log. 17(2), 8:1–8:46 (2016)
11. Borgida, A., Toman, D., Weddell, G.E.: On referring expressions in query answering over first order knowledge bases. In: Proc. of KR. pp. 319–328 (2016)
12. Botoeva, E., Kontchakov, R., Ryzhikov, V., Wolter, F., Zakharyaschev, M.: Games for query inseparability of description logic knowledge bases. Artif. Intell. 234, 78–119 (2016)
13. Botoeva, E., Lutz, C., Ryzhikov, V., Wolter, F., Zakharyaschev, M.: Query inseparability for ALC ontologies. Artif. Intell. 272, 1–51 (2019)
14. Bühmann, L., Lehmann, J., Westphal, P., Bin, S.: DL-learner - structured machine learning on semantic web data. In: Proc. of WWW. pp. 467–471 (2018)
15. ten Cate, B., Franconi, E., Seylan, I.: Beth definability in expressive description logics. J. Artif. Intell. Res. 48, 347–414 (2013)
16. Chandra, A.K., Kozen, D.C., Stockmeyer, L.J.: Alternation. J. ACM 28, 114–133 (1981)
17. Fanizzi, N., Rizzo, G., d'Amato, C., Esposito, F.: DLFoil: Class expression learning revisited. In: Proc. of EKAW. pp. 98–113 (2018)
18. Funk, M., Jung, J.C., Lutz, C., Pulcini, H., Wolter, F.: Learning description logic concepts: When can positive and negative examples be separated? In: Proc. of IJCAI. pp. 1682–1688 (2019)
19. Ghilardi, S., Lutz, C., Wolter, F.: Did I damage my ontology? A case for conservative extensions in description logics. In: Proc. of KR. pp. 187–197. AAAI Press (2006)
20. Goranko, V., Otto, M.: Model theory of modal logic. In: Handbook of Modal Logic, pp. 249–329. Elsevier (2007)
21. Grädel, E.: On the restraining power of guards. J. Symb. Log. 64(4), 1719–1742 (1999)
22. Grau, B.C., Horrocks, I., Kazakov, Y., Sattler, U.: Modular reuse of ontologies: Theory and practice. J. of Artificial Intelligence Research 31, 273–318 (2008)
23. Gutiérrez-Basulto, V., Jung, J.C., Sabellek, L.: Reverse engineering queries in ontology-enriched systems: The case of expressive Horn description logic ontologies. In: Proc. of IJCAI-ECAI (2018)
24. Hoogland, E., Marx, M.: Interpolation and definability in guarded fragments. Studia Logica 70(3), 373–409 (2002)
25. Jung, J., Lutz, C., Martel, M., Schneider, T., Wolter, F.: Conservative extensions in guarded and two-variable fragments. In: Proc. of ICALP. pp. 108:1–108:14. Schloss Dagstuhl – LZI (2017)
26. Jung, J.C., Lutz, C., Pulcini, H., Wolter, F.: Logical separability of incomplete data under ontologies. In: Proc. of KR. IJCAI (2020)
27. Jung, J.C., Lutz, C., Wolter, F.: Least general generalizations in description logic: Verification and existence. In: Proc. of AAAI. pp. 2854–2861. AAAI Press (2020)
28. Jung, J.C., Wolter, F.: Living without Beth and Craig: Explicit definitions and interpolants in the guarded fragment (2020), available at http://arxiv.org/abs/2007.01597
29. Kalashnikov, D.V., Lakshmanan, L.V., Srivastava, D.: Fastqre: Fast query reverse engineering. In: Proc. of SIGMOD. pp. 337–350 (2018)
30. Kimelfeld, B., Ré, C.: A relational framework for classifier engineering. ACM Trans. Database Syst. 43(3), 11:1–11:36 (2018), https://doi.org/10.1145/3268931
31. Konev, B., Lutz, C., Walther, D., Wolter, F.: Formal properties of modularisation. In: Modular Ontologies, Lecture Notes in Computer Science, vol. 5445, pp. 25–66. Springer (2009)
32. Krahmer, E., van Deemter, K.: Computational generation of referring expressions: A survey. Computational Linguistics 38(1), 173–218 (2012)
33. Lehmann, J., Fanizzi, N., Bühmann, L., d'Amato, C.: Concept learning. In: Perspectives on Ontology Learning, pp. 71–91. AKA / IOS Press (2014)
34. Lehmann, J., Hitzler, P.: Concept learning in description logics using refinement operators. Machine Learning 78, 203–250 (2010)
35. Lutz, C., Piro, R., Wolter, F.: Description logic TBoxes: Model-theoretic characterizations and rewritability. In: Proc. of IJCAI (2011)
36. Lutz, C., Walther, D., Wolter, F.: Conservative extensions in expressive description logics. In: Proc. of IJCAI. pp. 453–458 (2007)
37. Lutz, C., Wolter, F.: Foundations for uniform interpolation and forgetting in expressive description logics. In: Proc. of IJCAI. pp. 989–995. IJCAI/AAAI (2011)
38. Martins, D.M.L.: Reverse engineering database queries from examples: State-of-the-art, challenges, and research opportunities. Information Systems (2019)
39. Ortiz, M.: Ontology-mediated queries from examples: a glimpse at the DL-Lite case. In: Proc. of GCAI. pp. 1–14 (2019)
40. Petrova, A., Kostylev, E.V., Grau, B.C., Horrocks, I.: Query-based entity comparison in knowledge graphs revisited. In: Proc. of ISWC. pp. 558–575. Springer (2019)
41. Petrova, A., Sherkhonov, E., Grau, B.C., Horrocks, I.: Entity comparison in RDF graphs. In: Proc. of ISWC. pp. 526–541 (2017)
42. Sarker, M.K., Hitzler, P.: Efficient concept induction for description logics. In: Proc. of AAAI. pp. 3036–3043 (2019)
43. Tran, Q.T., Chan, C., Parthasarathy, S.: Query by output. In: Proc. of PODS. pp. 535–548. ACM (2009)
44. Tran, Q.T., Chan, C.Y., Parthasarathy, S.: Query reverse engineering. VLDB J. 23(5), 721–746 (2014)
45. Tran, T., Ha, Q., Hoang, T., Nguyen, L.A., Nguyen, H.S.: Bisimulation-based concept learning in description logics. Fundam. Inform. 133(2-3), 287–303 (2014)
46. Vardi, M.Y.: Reasoning about the past with two-way automata. In: Proc. of ICALP'98. pp. 628–641 (1998)
47. Weiss, Y.Y., Cohen, S.: Reverse engineering spj-queries from examples. In: Proc. of PODS. pp. 151–166. ACM (2017)
48. Zhang, M., Elmeleegy, H., Procopiuc, C.M., Srivastava, D.: Reverse engineering complex join queries. In: Proc. of SIGMOD. pp. 809–820. ACM (2013)

Proof of Theorem 1

We formulate the result to be shown again.

Theorem 1. Assume a labeled ALCI-KB (K, P, {b}) and Σ ⊆ sig(K) are given. Then the following conditions are equivalent:
1. (K, P, {b}), with K = (O, D), is projectively ALCI(Σ)-separable.
2. There exists a forest model A of K of finite outdegree and a signature Σ′ such that Σ′ ∩ sig(K) ⊆ Σ and for all models B of K and all a ∈ P: B, a^B ≁_{ALCI,Σ′} A, b^A.
3. There exists a forest model A of K of finite outdegree such that for all models B of K and all a ∈ P: B, a^B ≁^f_{ALCI,Σ} A, b^A.
4. There exists a forest model A of K of finite outdegree such that for all a ∈ P: D_{con(a)}, a ↛^Σ_c A, b^A.

Proof. “1 ⇒ 2”. Take an ALCI-concept C with sig(C) ∩ sig(K) ⊆ Σ such that C separates (K, P, {b}). There exists a model A of K of finite outdegree such that b^A ∈ (¬C)^A. Let Σ′ = sig(C). Then A and Σ′ are as required for Condition 2.

“2 ⇒ 3”. Take a model A and Σ′ such that Condition 2 holds. We claim that Condition 3 holds for A as well.
Suppose that there exists a model B of K, a ∈ P, and a functional Σ-bisimulation f witnessing B, a^B ∼^f_{ALCI,Σ} A, b^A. Define B′ by expanding B as follows:

– for every concept name A ∈ Σ′ \ sig(K) and d ∈ dom(f), let d ∈ A^{B′} if f(d) ∈ A^A;
– for every role R over Σ′ \ sig(K) and d ∈ dom(f), if there exists e ∈ dom(A) with (f(d), e) ∈ R^A, then add a disjoint copy of A to B and add (d, e′) to R^{B′} for the copy e′ of e.

It is easy to see that B′, a^{B′} ∼_{ALCI,Σ′} A, b^A, and we have derived a contradiction.

“3 ⇒ 4”. Take a model A such that Condition 3 holds. We claim that Condition 4 holds for A as well. For a proof by contradiction let h be a Σ-homomorphism, t_d, d ∈ dom(D), be K-types, and a ∈ P such that h refutes Condition 4. Take models B_d of O such that B_d, d ∼_{ALCI,Σ} A, h(d). We may assume that the B_d are tree-shaped with root d and that the bisimulations are functions f_d. Now attach to every d ∈ dom(D) the model B_d and obtain B by adding (d, d′) to R^B if R(d, d′) ∈ D. Then f = ⋃_{d ∈ dom(D)} f_d is a functional ALCI(Σ)-bisimulation between B and A.

“4 ⇒ 3”. Take a model A such that Condition 4 holds. We claim that Condition 3 holds for A as well. For a proof by contradiction let f be a functional ALCI(Σ)-bisimulation witnessing B, a^B ∼^f_{ALCI,Σ} A, b^A for some model B of K. The restriction h of f to D is the Σ-homomorphism needed to refute Condition 4.

“3 ⇒ 2”. Take a model A such that Condition 3 holds. Define A′ by expanding A as follows. Take for any d ∈ dom(A) a fresh concept name A_d and set A_d^{A′} = {d}. Clearly Condition 2 holds for A′ and Σ′ = Σ ∪ {A_d | d ∈ dom(A)}.

“2 ⇒ 1”. ❏

A.1 Additional Definitions for 2ATAs

We make precise the semantics of 2ATAs. Let A = (Q, Θ, q_0, δ, Ω) be a 2ATA and (T, L) a Θ-labeled tree.
A run for A on (T, L) is a T × Q-labeled tree (T_r, r) such that:

– ε ∈ T_r and r(ε) = (ε, q_0);
– for all y ∈ T_r with r(y) = (x, q) and δ(q, L(x)) = ϕ, there is an assignment v of truth values to the transitions in ϕ such that v satisfies ϕ and:
  • if v(p) = 1, then r(y′) = (x, p) for some successor y′ of y in T_r;
  • if v(⟨−⟩p) = 1, then x ≠ ε and there is a successor y′ of y in T_r with r(y′) = (x · −1, p);
  • if v([−]p) = 1, then x = ε or there is a successor y′ of y in T_r such that r(y′) = (x · −1, p);
  • if v(♦p) = 1, then there is a successor x′ of x in T and a successor y′ of y in T_r such that r(y′) = (x′, p);
  • if v(□p) = 1, then for every successor x′ of x in T, there is a successor y′ of y in T_r such that r(y′) = (x′, p).

Let γ = i_0 i_1 ··· be an infinite path in T_r and denote, for all j ≥ 0, with q_j the state such that r(i_0 ··· i_j) = (x, q_j). The path γ is accepting if the largest number m such that Ω(q_j) = m for infinitely many j is even. A run (T_r, r) is accepting if all infinite paths in T rr