Probabilistic Reasoning in the Description Logic ALCP with the Principle of Maximum Entropy (Full Version)
Rafael Peñaloza (Free University of Bozen-Bolzano, Italy, [email protected]) and Nico Potyka (University of Osnabrück, Germany, [email protected])
Abstract.
A central question for knowledge representation is how to encode and handle uncertain knowledge adequately. We introduce the probabilistic description logic ALCP that is designed for representing context-dependent knowledge, where the actual context taking place is uncertain. ALCP allows the expression of logical dependencies on the domain and probabilistic dependencies on the possible contexts. In order to draw probabilistic conclusions, we employ the principle of maximum entropy. We provide reasoning algorithms for this logic, and show that it satisfies several desirable properties of probabilistic logics.
Introduction

A fundamental element of any intelligent application is storing and manipulating the knowledge from the application domain. Logic-based knowledge representation languages such as description logics (DLs) [1] provide a clear syntax and unambiguous semantics that guarantee the correctness of the results obtained. However, languages based on classical logic are ill-suited for handling the uncertainty inherent to many application domains. To overcome this limitation, various probabilistic logics have been investigated during the last three decades (e.g., [3, 15, 20]). In particular, several probabilistic DLs have been developed [18, 19]. To handle probabilistic knowledge, many approaches require a complete definition of joint probability distributions (JPDs) [5, 6, 8, 16, 25]. One approach to avoid a full JPD specification was proposed by Paris [22]: the user gives a partial specification through a set of probabilistic constraints, and the partial knowledge is completed by means of the principle of maximum entropy.

In this paper we consider a new probabilistic extension of description logics based on the principle of maximum entropy. In our approach we group different axioms from a knowledge base together into so-called contexts, which are identified by a propositional formula. Intuitively, each context corresponds to a possible situation, in which the associated sub-KB is guaranteed to hold. Uncertainty is associated to the contexts through a set of probabilistic constraints, which are interpreted under the principle of maximum entropy.

To facilitate the understanding of our approach, we focus on the DL
ALC [26] as a prototypical example of a knowledge representation language, and propositional probabilistic constraints as the framework for expressing uncertainty. As reasoning service we consider subsumption relations between concepts given some partial knowledge of the current context. Since the knowledge in a knowledge base is typically incomplete, one cannot expect to obtain a precise probability for a given consequence. Instead, we compute a belief interval that describes all the probability degrees that can be associated to the consequence without contradiction. The lowest bound of the interval corresponds to a sceptical view, considering only the most fundamental models of the knowledge base. The upper bound, in contrast, reflects the credulous belief in which every context that is not explicitly removed is considered. In the worst case, we get the trivial interval [0, 1], which provides no information about the consequence. In contrast, a narrow belief interval, e.g. for the probability that a treatment will be successful, allows for an informed decision.

Our main contributions are the following:

– we define the new probabilistic description logic ALCP that allows for a flexible description of axiomatic dependencies, and its reasoning problems (Section 3);
– we explain in detail how degrees of belief for the subsumption problem can be computed (Section 4); and
– we show that ALCP satisfies several desirable properties of probabilistic logics (Section 5).
Preliminaries

We start by recalling the basic notions of probabilistic propositional logic and the principle of maximum entropy. Let L be a propositional language constructed over a finite signature sig(L), i.e., a set of propositional variables, in the usual way. An L-interpretation v is a truth assignment of the propositional variables in sig(L). Int(L) denotes the set of all L-interpretations. Satisfaction of a formula φ ∈ L by an L-interpretation v ∈ Int(L) (denoted v |= φ) is defined as usual. A probability distribution over L is a function P : Int(L) → [0, 1] with Σ_{v ∈ Int(L)} P(v) = 1. Probability distributions are extended to arbitrary L-formulas φ by setting P(φ) = Σ_{v |= φ} P(v).

Definition 1 (probabilistic constraints, models).
Given the propositional language L, a probabilistic constraint (over L) is an expression of the form

c₀ + Σ_{i=1}^{k} c_i · p(φ_i) ≥ 0,                                  (1)

where c₀, c_i ∈ ℝ and φ_i ∈ L for 1 ≤ i ≤ k. A probability distribution P over L is a model of the probabilistic constraint (1) if and only if c₀ + Σ_{i=1}^{k} c_i · P(φ_i) ≥ 0. The distribution P is a model of the set of probabilistic constraints R (P |= R) iff it satisfies all the constraints in R. The set of all models of R is denoted by Mod(R). If Mod(R) ≠ ∅, we say that R is consistent.

Our probabilistic constraints can express the most common types of constraints considered in the literature of probabilistic logics. For instance, probabilistic conditionals (ψ | φ)[ℓ, u] are satisfied iff ℓ · P(φ) ≤ P(ψ ∧ φ) ≤ u · P(φ) [17]. That is, the conditional is satisfied iff the conditional probability of ψ given φ is between ℓ and u whenever P(φ) > 0. The conditional thus corresponds to the two probabilistic constraints p(ψ ∧ φ) − ℓ · p(φ) ≥ 0 and u · p(φ) − p(ψ ∧ φ) ≥ 0. Probabilistic constraints can also express more complex restrictions; for example, we can state that the probability that a bird cannot fly is at most one fourth of the probability that a bird flies through the constraint

(1/4) p(bird ∧ flies) − p(bird ∧ ¬flies) ≥ 0.                       (2)

To improve readability, we will often rewrite constraints in a more compact manner, using conditionals as in the first example, or e.g. rewriting (2) as p(bird ∧ flies) ≥ 4 · p(bird ∧ ¬flies).

In general, consistent sets of probabilistic constraints have infinitely many models, and there is no obvious way to distinguish between them. One well-studied approach for dealing with this diversity is to focus on the model that maximizes the entropy

H(P) = − Σ_{v ∈ Int(L)} P(v) · log P(v).

From an information-theoretic point of view, the maximum entropy (ME) distribution can be regarded as the most conservative one in the sense that it minimizes the information-theoretic distance (that is, the KL-divergence) to the uniform distribution among all probability distributions that satisfy our constraints. In particular, if there are no restrictions on the probability distributions considered, then the uniform distribution is the ME distribution; see, e.g., [27] for a more detailed discussion of these issues. A complete characterization of maximum entropy for the purpose of uncertain reasoning can be found in [22].
Definition 2 (ME-model).
Let R be a consistent set of probabilistic constraints. The ME-model P^ME_R of R is the unique solution of the maximization problem arg max_{P |= R} H(P).

Existence and uniqueness of P^ME_R follow from the fact that H is strictly concave and continuous, and that the probability distributions that satisfy R form a compact and convex set. P^ME_R is usually computed by deriving an unconstrained optimization problem by means of the Karush-Kuhn-Tucker conditions. The resulting problem can be solved, for instance, by (quasi-)Newton methods whose cost is proportional to |Int(L)|; see, e.g., [21] for more details on these techniques.
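The dual construction behind this computation can be sketched numerically (illustrative Python; the solver and all identifiers are ours, not from the paper). For equality constraints written as features with E_P[f] = 0, the ME-model has the exponential-family form P(v) ∝ exp(Σⱼ λⱼ fⱼ(v)), and gradient descent on the dual objective log Z(λ) drives each E_P[fⱼ] to zero. Since the uniform distribution violates constraint (2), we assume below that the constraint is active (holds with equality) at the optimum:

```python
import math
from itertools import product

SIG = ["bird", "flies"]
WORLDS = [dict(zip(SIG, bits)) for bits in product([True, False], repeat=len(SIG))]

def me_distribution(features, steps=20000, lr=0.5):
    """ME distribution subject to E_P[f] = 0 for each feature f.

    P_lam(v) ~ exp(sum_j lam_j * f_j(v)); the gradient of log Z(lam)
    w.r.t. lam_j is E_P[f_j], so descending it enforces the constraints.
    """
    lam = [0.0] * len(features)
    for _ in range(steps):
        w = [math.exp(sum(l * f(v) for l, f in zip(lam, features))) for v in WORLDS]
        z = sum(w)
        p = [wi / z for wi in w]
        for j, f in enumerate(features):
            lam[j] -= lr * sum(pi * f(v) for pi, v in zip(p, WORLDS))
    w = [math.exp(sum(l * f(v) for l, f in zip(lam, features))) for v in WORLDS]
    z = sum(w)
    return [wi / z for wi in w]

# Active form of constraint (2): (1/4) p(bird & flies) = p(bird & ~flies).
f2 = lambda v: 0.25 * (v["bird"] and v["flies"]) - (v["bird"] and not v["flies"])
P = me_distribution([f2])  # world order: (T,T), (T,F), (F,T), (F,F)
```

Under these assumptions the two unconstrained worlds keep equal probability, while the constrained pair is tilted just enough to satisfy (2) with equality.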
The Probabilistic Description Logic ALCP

ALCP is a probabilistic extension of the classical description logic
ALC capable of expressing complex logical and probabilistic relations. As with classical DLs, the main building blocks in ALCP are concepts. Syntactically, ALCP concepts are constructed exactly as ALC concepts. Given two disjoint sets N_C of concept names and N_R of role names, ALCP concepts are built using the grammar rule

C ::= A | ¬C | C ⊓ C | ∃r.C,

where A ∈ N_C and r ∈ N_R. Note that we can derive disjunction, universal quantification and subsumption from these rules by using logical equivalences like C₁ ⊔ C₂ ≡ ¬(¬C₁ ⊓ ¬C₂). The knowledge of the application domain is expressed through a finite set of axioms that restrict the way the different concepts and roles may be interpreted. To express both probabilistic and logical relationships, each axiom is annotated with a formula from L that intuitively expresses the context in which this axiom holds.

Definition 3 (KB). An L-restricted general concept inclusion (L-GCI) is of the form ⟨C ⊑ D : κ⟩ where C, D are ALCP concepts and κ is an L-formula. An L-TBox is a finite set of L-GCIs. An ALCP knowledge base (KB) over L is a pair K = (R, T) where R is a set of probabilistic constraints and T is an L-TBox.

Example 4. Consider an application modeling beliefs about bacterial and viral infections using the concept names strep (streptococcal infection), bac (bacterial infection), vir (viral infection), inf (infection), and ab (antibiotic); the role names sf (suffers from) and suc (successful treatment); and the propositional variables res (antibiotic resistance) and h (heavy use of antibiotics by patient). Define the L-TBox T_exa containing the L-GCIs

⟨∃sf.bac ⊑ ∃suc.ab : ¬res ∧ ¬h⟩,  ⟨∃sf.vir ⊑ ¬∃suc.ab : ⊤⟩,  ⟨strep ⊑ bac : ⊤⟩,
⟨∃sf.bac ⊑ ¬∃suc.ab : res⟩,  ⟨bac ⊑ inf : ⊤⟩,  ⟨vir ⊑ inf : ⊤⟩,

where ⊤ is any L-tautology. For example, the first L-GCI states that a bacterial infection can be treated successfully with antibiotics if no antibiotic resistance is present and there was no heavy use of antibiotics; the second one states that viral infections can never be treated with antibiotics successfully. Consider additionally the set R containing the probabilistic constraints

(res)[0.05, 0.05],  (res | h)[0.8, 0.8].

That is, the probability of an antibiotic resistance is 5% if no further information is given. However, if the patient used antibiotics heavily, the probability increases to 80%.

Notice that the probabilistic constraints, and hence the representation of the uncertainty in the knowledge, refer only to the propositional formulas that label the L-GCIs. In ALCP, the uncertainty of the knowledge is handled through these propositional formulas as explained next.

A possible world interprets both the axiom language (i.e., the concept and role names) and the context language (the propositional variables).
Intuitively, it describes a possible context (L-interpretation) together with the relationships between concepts in that situation (ALC-interpretation).
Definition 5 (possible world). A possible world is a triple I = (Δ^I, ·^I, v_I) where Δ^I is a non-empty set (called the domain), v_I is an L-interpretation, and ·^I is an interpretation function that maps every concept name A to a set A^I ⊆ Δ^I and every role name r to a binary relation r^I ⊆ Δ^I × Δ^I.

The interpretation function ·^I is extended to complex concepts as usual in DLs by letting (¬C)^I := Δ^I \ C^I; (∃r.C)^I := {d ∈ Δ^I | ∃e ∈ Δ^I. (d, e) ∈ r^I, e ∈ C^I}; and (C ⊓ D)^I := C^I ∩ D^I. A possible world is a model of an L-GCI iff it satisfies the description logic constraint of the axiom whenever it satisfies the context.

Definition 6 (model of TBox). The possible world I = (Δ^I, ·^I, v_I) is a model of the L-GCI ⟨C ⊑ D : κ⟩, denoted as I |= ⟨C ⊑ D : κ⟩, iff (i) v_I ̸|= κ, or (ii) C^I ⊆ D^I. It is a model of the L-TBox T iff it is a model of all the L-GCIs in T.

The classical DL
ALC is a special case of
ALCP where all the axioms are annotated with an L-tautology ⊤. To preserve the syntax of classical DLs, we denote such L-GCIs as C ⊑ D instead of ⟨C ⊑ D : ⊤⟩. In this case, condition (i) from Definition 6 cannot be satisfied, and hence a model is required to satisfy C^I ⊆ D^I for all L-GCIs C ⊑ D in the TBox. For a deeper introduction to classical ALC, see [1].

According to our semantics, we only demand that the L-GCIs are satisfied in some specific contexts. Thus, it is often useful to focus on the classical ALC
TBox that contains the knowledge that holds in a particular situation. For a KB K = (R, T) and v ∈ Int(L), the v-restricted TBox is the ALC TBox

T_v := { C ⊑ D | ⟨C ⊑ D : κ⟩ ∈ T, v |= κ }.

The possible world I satisfies T_v (I |= T_v) if for all GCIs C ⊑ D ∈ T_v it holds that C^I ⊆ D^I.

In the following, we will often consider subsumption and strong non-subsumption between concepts w.r.t. a restricted TBox. We say that C is subsumed by D w.r.t. T_v (T_v |= C ⊑ D) if for every I |= T_v it holds that C^I ⊆ D^I. Dually, C is strongly non-subsumed by D w.r.t. T_v (T_v |= C ̸⊑ D) if for every I |= T_v, C^I ⊈ D^I holds. Notice that strong non-subsumption requires that the inclusion between the concepts does not hold in any possible world satisfying T_v. Hence, this condition is more strict than just negating the subsumption relation.

We now describe how the probabilistic constraints are handled in our logic. An ALCP-interpretation consists of a finite set of possible worlds and a probability function over these worlds.

Definition 7 (
ALCP-interpretation). An ALCP-interpretation is a pair of the form P = (I, P_I), where I is a non-empty, finite set of possible worlds and P_I is a probability distribution over I.

Each
ALCP-interpretation induces a probability distribution over L. The probability of a context can be obtained by adding the probabilities of all possible worlds in which this context holds.

Definition 8 (distribution induced by P). Let P = (I, P_I) be an ALCP-interpretation. The probability distribution P_P : Int(L) → [0, 1] induced by P is defined by P_P(v) := Σ_{I ∈ I|_v} P_I(I), where I|_v = {(Δ^I, ·^I, v_I) ∈ I | v_I = v}.

As usual, reasoning is restricted to interpretations that satisfy the restrictions imposed by the knowledge base. In our case, we have to demand that the interpretation is consistent with both the classical and the probabilistic part of our knowledge base. That is, we consider only those possible worlds that satisfy both the terminological knowledge (T) and the probabilistic constraints (R).

Definition 9 (model).
Let P = (I, P_I) be an ALCP-interpretation. P is consistent with the TBox T if every I ∈ I is a model of T. P is consistent with the set of probabilistic constraints R iff P_P |= R. The ALCP-interpretation P is a model of the KB K = (R, T) iff it is consistent with both T and R. As usual, a KB is consistent iff it has a model.

Notice that
ALCP-KBs can express both logical and probabilistic dependencies between axioms. For instance, two L-GCIs ⟨C₁ ⊑ D₁ : κ₁⟩ and ⟨C₂ ⊑ D₂ : κ₂⟩ where κ₁ ⇒ κ₂ express that whenever the first L-GCI is enforced, the second one must also hold. Similarly, the probabilistic dependencies between axioms are expressed via the probabilistic constraints on the labeling formulas.

We are interested in computing degrees of belief for subsumption relations between concepts. We define the conditional probability of a subsumption relation given a context with respect to a given ALCP-interpretation following the usual notions of conditioning.
Definition 10 (probability of subsumption).
Let C, D be concepts, κ a context, and P = (I, P_I) an ALCP-interpretation. The conditional probability of C ⊑ D given κ with respect to P is

Pr_P(C ⊑ D | κ) := ( Σ_{I ∈ I, I |= κ, I |= C ⊑ D} P_I(I) ) / ( Σ_{I ∈ I, I |= κ} P_I(I) ).    (3)

Notice that the denominator in (3) can be rewritten as

Σ_{I ∈ I, I |= κ} P_I(I) = Σ_{v |= κ} Σ_{I ∈ I|_v} P_I(I) = Σ_{v |= κ} P_P(v) = P_P(κ).

As usual, the conditional probability is only well-defined when P_P(κ) > 0. Even then, R may be satisfied by an infinite class of probability distributions. In the spirit of maximum entropy reasoning, we consider only the most conservative ones in the sense that they induce the ME-model P^ME_R of R.

Definition 11 (ME-
ALCP-model of K iff P_P = P^ME_R. The set of all ME-ALCP-models of K is denoted by Mod_ME(K). K is called ME-consistent iff Mod_ME(K) ≠ ∅.

Note that ME-consistency is a strictly stronger notion of consistency: ME-consistent knowledge bases are always consistent, but the converse does not necessarily hold if the classical TBox obtained from T by restricting to a context is inconsistent, as we show in the following example.

Example 12.
Let sig(L) = {x} and K = (R, T) be the KB with R = ∅ and T = {⟨A ⊔ ¬A ⊑ A ⊓ ¬A : x⟩}. Since A ⊔ ¬A ⊑ A ⊓ ¬A is contradictory, each ALCP-model of K must satisfy ¬x. There certainly are such models, but in each such model P, P_P(x) = 0. However, since R = ∅, we have P^ME_R(x) = 0.5. Hence K has no ME-model.

ME-inconsistency rules out some undesired cases in which the whole knowledge base is consistent, but the TBox restricted to some context is inconsistent. The following theorem gives a simple characterization of ME-consistency: to verify ME-consistency of a KB, it suffices to check consistency of the TBoxes induced by the L-interpretations that have positive probability with respect to P^ME_R. By the properties of the ME distribution, these are the interpretations that are not explicitly restricted to have zero probability through R.

Theorem 13.
The KB K = (R, T) is ME-consistent iff for every v ∈ Int(L) such that P^ME_R(v) > 0, T_v is consistent.

For the rest of this paper we consider only ME-consistent KBs. Hence, whenever we speak of a KB K, we implicitly assume that K has at least one ME-model. We are interested in computing the probability of a subsumption relation w.r.t. a given KB K. Notice that, although we consider only one probability distribution P^ME_R, there can still exist many different ME-models of K, which yield different probabilities for the same subsumption relation. One way to handle this is to consider the smallest and largest probabilities that can be consistently associated to this relation. We call them the sceptical and the credulous degrees of belief, respectively.

Definition 14 (degree of belief).
Let
C, D be ALCP concepts, κ a context, and K = (R, T) an ALCP KB. The sceptical degree of belief of C ⊑ D given κ w.r.t. K is

B^s_K(C ⊑ D | κ) := inf_{P ∈ Mod_ME(K)} Pr_P(C ⊑ D | κ).

The credulous degree of belief of C ⊑ D given κ w.r.t. K is

B^c_K(C ⊑ D | κ) := sup_{P ∈ Mod_ME(K)} Pr_P(C ⊑ D | κ).

Example 15.
Consider K_exa from Example 4. If we ask for the degrees of belief that a patient who suffers from an infection can be successfully treated with antibiotics, we obtain

B^s_{K_exa}(∃sf.inf ⊑ ∃suc.ab | ⊤) = 0,    B^c_{K_exa}(∃sf.inf ⊑ ∃suc.ab | ⊤) = 1.

These bounds are not very informative, but they are perfectly justified by our knowledge base, since we do not know anything about the effectiveness of antibiotics with respect to infections in general. However, for a patient who suffers from a streptococcal infection we get

B^s_{K_exa}(∃sf.strep ⊑ ∃suc.ab | ⊤) = 0.94,    B^c_{K_exa}(∃sf.strep ⊑ ∃suc.ab | ⊤) = 0.95.

If we know that this patient used antibiotics heavily in the past, then there is nothing in our knowledge base that guarantees the existence of a successful treatment. Hence, the degrees of belief become

B^s_{K_exa}(∃sf.strep ⊑ ∃suc.ab | h) = 0,    B^c_{K_exa}(∃sf.strep ⊑ ∃suc.ab | h) = 0.2.

Our definition of the sceptical degree of belief raises a philosophical question: should there be no difference between the degree of belief 0 and an infinitely small degree of belief? A dual question arises for the credulous degree of belief and the probability 1. However, as we show in the next section, the sceptical and credulous degrees of belief actually correspond to minimum and maximum rather than to infimum and supremum (see Corollary 20), so that these questions become vacuous. From the following theorem we can conclude that every intermediate degree can also be obtained by some model of the KB.
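The context mechanism behind these numbers can be sketched directly (illustrative Python, our own encoding; concepts are kept as opaque strings and no actual ALC reasoning is performed): each L-GCI of Example 4 is a triple (C, D, κ) with κ a predicate on contexts, and the restricted TBox T_v collects the classical GCIs whose context formula holds in v.

```python
from itertools import product

# L-GCIs of Example 4 as (C, D, kappa) triples; "E" abbreviates the
# existential quantifier and "~" negation in the concept strings.
TBOX = [
    ("Esf.bac", "Esuc.ab",  lambda v: not v["res"] and not v["h"]),
    ("Esf.vir", "~Esuc.ab", lambda v: True),
    ("strep",   "bac",      lambda v: True),
    ("Esf.bac", "~Esuc.ab", lambda v: v["res"]),
    ("bac",     "inf",      lambda v: True),
    ("vir",     "inf",      lambda v: True),
]

def restricted_tbox(tbox, v):
    """T_v: the classical GCIs whose context formula is satisfied by v."""
    return [(c, d) for c, d, kappa in tbox if kappa(v)]

CONTEXTS = [dict(zip(["res", "h"], bits)) for bits in product([True, False], repeat=2)]
```

Only the context satisfying ¬res ∧ ¬h activates the first axiom, which is why a successful antibiotic treatment for a streptococcal infection is only entailed there.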
Theorem 16 (Intermediate Value Theorem).
Let p₁ < p₂ and let P₁ and P₂ be two ME-ALCP-models of the KB K = (R, T) such that Pr_{P₁}(C ⊑ D | κ) = p₁ and Pr_{P₂}(C ⊑ D | κ) = p₂. Then for each p between p₁ and p₂ there exists an ME-ALCP-model P of K such that Pr_P(C ⊑ D | κ) = p.

As we will show in Corollary 20, both the sceptical degree B^s_K(C ⊑ D | κ) and the credulous degree B^c_K(C ⊑ D | κ) are in fact witnessed by some ME-models. Therefore it is meaningful to consider the whole interval of beliefs between B^s_K(C ⊑ D | κ) and B^c_K(C ⊑ D | κ).

Definition 17 (belief interval).
Let
C, D be ALCP concepts, κ ∈ L a context, and K = (R, T) an ALCP KB. The belief interval for C ⊑ D w.r.t. K given κ is

B_K(C ⊑ D | κ) := [ B^s_K(C ⊑ D | κ), B^c_K(C ⊑ D | κ) ].

Computing Beliefs
In this section we show how to compute the belief interval. The first theorem states that the sceptical degree of belief for a subsumption relation can be computed by adding the probabilities of those L-interpretations w that entail this subsumption in the corresponding restricted TBox T_w.

Theorem 18.
Let K = (R, T) be a KB, C, D two concepts, and κ a context such that P^ME_R(κ) > 0. Then

B^s_K(C ⊑ D | κ) = ( Σ_{w ∈ Int(L), T_w |= C ⊑ D, w |= κ} P^ME_R(w) ) / P^ME_R(κ).

Dually, the credulous degree of belief for a subsumption relation can be computed by removing all the situations in which this relation cannot possibly hold.
Theorem 19.
Let K = (R, T) be a KB, C, D two concepts, and κ a context with P^ME_R(κ) > 0. Then

B^c_K(C ⊑ D | κ) = 1 − ( Σ_{w ∈ Int(L), T_w |= C ̸⊑ D, w |= κ} P^ME_R(w) ) / P^ME_R(κ).

To prove these theorems, one can build two models P and Q of the KB K such that Pr_P(C ⊑ D | κ) and Pr_Q(C ⊑ D | κ) are exactly the degrees expressed by Theorems 18 and 19, respectively. As a byproduct of these proofs, we obtain that the infimum and supremum that define the sceptical and the credulous degrees of belief actually correspond to a minimum and maximum taken by some ME-models, yielding the following corollary.

Corollary 20.
Let K be an ALCP
KB,
C, D be two concepts, and κ be a context. There exist two ME-models P, Q of K with B^s_K(C ⊑ D | κ) = Pr_P(C ⊑ D | κ) and B^c_K(C ⊑ D | κ) = Pr_Q(C ⊑ D | κ).

The direct consequence of Theorems 18 and 19 is that if we want to compute the belief interval for C ⊑ D given some context, it suffices to identify all L-interpretations whose induced (classical) TBoxes entail the subsumption relation C ⊑ D (for the sceptical belief) or the strong non-subsumption C ̸⊑ D (for the credulous belief). Recall that every set of propositional interpretations can be represented by a propositional formula. This motivates the following definition.

Definition 21 (consequence formula). An L-formula φ is a consequence formula for C ⊑ D (respectively C ̸⊑ D) w.r.t. the L-TBox T if for every w ∈ Int(L) it holds that w |= φ iff T_w |= C ⊑ D (respectively T_w |= C ̸⊑ D).

If we are able to compute these consequence formulas, then the computation of the belief interval can be reduced to the evaluation of the probability of these formulas w.r.t. the ME-distribution satisfying R.

Theorem 22.
Let K = ( R , T ) be an ALCP
KB, φ and ψ be consequence formulas for C ⊑ D and C ̸⊑ D w.r.t. T, respectively, and κ a context. Then B^s_K(C ⊑ D | κ) = P^ME_R(φ | κ) and B^c_K(C ⊑ D | κ) = 1 − P^ME_R(ψ | κ).

Algorithm 1 Computing degrees of belief
Input: KB K = (R, T), concepts C, D, context κ
Output: Belief degrees (B^s_K(C ⊑ D | κ), B^c_K(C ⊑ D | κ))

  ℓ_s ← ℓ_c ← 0
  for all v ∈ Int(L) do
    if v |= κ then
      if T_v |= C ⊑ D then ℓ_s ← ℓ_s + P^ME_R(v)
      else if T_v |= C ̸⊑ D then ℓ_c ← ℓ_c + P^ME_R(v)
  return (ℓ_s / P^ME_R(κ), 1 − ℓ_c / P^ME_R(κ))

Example 23.
In our running example, one can see that a consequence formula for ∃sf.strep ⊑ ∃suc.ab is ¬res ∧ ¬h. Indeed, in order to deduce this consequence it is necessary to satisfy the first axiom of T_exa, which is only guaranteed in the context ¬res ∧ ¬h. Similarly, res is a consequence formula for ∃sf.strep ̸⊑ ∃suc.ab. Knowing both the consequence formulas and the ME-model, we can deduce

B^s_{K_exa}(∃sf.strep ⊑ ∃suc.ab | ⊤) = P^ME_R(¬res ∧ ¬h) = 0.94,
B^c_{K_exa}(∃sf.strep ⊑ ∃suc.ab | h) = 1 − P^ME_R(res | h) = 0.2.

In particular, Theorem 22 implies that the belief interval can be computed in two phases. The first phase uses purely logical reasoning to compute the consequence formulas, while the second phase applies probabilistic inferences to compute the degrees of belief from these formulas. We now briefly explain how the consequence formulas can be computed.

Notice first that subsumption and non-subsumption are monotonic consequences in the sense of [2]; that is, if an
ALC
TBox T entails the subsumption C ⊑ D, then every superset of T also entails this consequence. Similarly, adding more axioms to a TBox entailing C ̸⊑ D does not remove this entailment. Moreover, the set of all L-formulas (modulo logical equivalence) forms a distributive lattice ordered by generality, in which the L-interpretations are all the join-prime elements. Thus, the consequence formulas from Definition 21 are in fact the so-called boundaries from [2]. Hence, they can be computed using any of the known boundary computation approaches.

Assuming that the number of contexts is small in comparison to the size of the TBox, it is better to compute the degrees of belief through a more direct approach following Theorems 18 and 19. In order to compute B^s_K(C ⊑ D | κ) and B^c_K(C ⊑ D | κ), it suffices to enumerate all interpretations v ∈ Int(L) and check whether v |= κ, and whether T_v |= C ⊑ D or T_v |= C ̸⊑ D (see Algorithm 1). This approach requires 2^|sig(L)| calls to a standard ALC reasoner, and each of these calls runs in exponential time on |T| [9]. Notice that this algorithm has an anytime behaviour: it is possible to stop its execution at any moment and obtain an approximation of the belief interval. Moreover, the longer the algorithm runs, the better this approximation becomes. Thus, this method is adequate for a system where finding good approximations efficiently may be more important than computing the precise answers.
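Putting the pieces together, Algorithm 1 can be sketched end-to-end for the running example (illustrative Python, our own encoding, not the authors' implementation). The ME-model is computed by dual gradient descent on the two constraints of Example 4, and the calls to an ALC reasoner are stubbed with the consequence formulas of Example 23 instead of real subsumption tests:

```python
import math
from itertools import product

SIG = ["res", "h"]
WORLDS = [dict(zip(SIG, bits)) for bits in product([True, False], repeat=2)]

def me_distribution(features, steps=30000, lr=0.5):
    """ME-model of equality constraints E_P[f] = 0 via dual gradient descent."""
    lam = [0.0] * len(features)
    for _ in range(steps):
        w = [math.exp(sum(l * f(v) for l, f in zip(lam, features))) for v in WORLDS]
        z = sum(w)
        p = [wi / z for wi in w]
        for j, f in enumerate(features):
            lam[j] -= lr * sum(pi * f(v) for pi, v in zip(p, WORLDS))
    w = [math.exp(sum(l * f(v) for l, f in zip(lam, features))) for v in WORLDS]
    z = sum(w)
    return {i: wi / z for i, wi in enumerate(w)}

# Constraints of Example 4: p(res) = 0.05 and p(res & h) = 0.8 * p(h).
P_ME = me_distribution([
    lambda v: v["res"] - 0.05,
    lambda v: (v["res"] and v["h"]) - 0.8 * v["h"],
])

# Stubbed ALC reasoner, using the consequence formulas of Example 23:
# T_v entails the subsumption iff v |= ~res & ~h, and the strong
# non-subsumption iff v |= res.
def entails_subsumption(v):     return not v["res"] and not v["h"]
def entails_non_subsumption(v): return v["res"]

def beliefs(kappa):
    """Algorithm 1: (B^s, B^c) for Esf.strep <= Esuc.ab given context kappa."""
    ls = lc = pk = 0.0
    for i, v in enumerate(WORLDS):
        if kappa(v):
            pk += P_ME[i]
            if entails_subsumption(v):
                ls += P_ME[i]
            elif entails_non_subsumption(v):
                lc += P_ME[i]
    return ls / pk, 1 - lc / pk
```

Under these assumptions the sketch reproduces the belief intervals of Example 15: `beliefs(lambda v: True)` is approximately (0.94, 0.95), and `beliefs(lambda v: v["h"])` is approximately (0, 0.2).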
Properties

We now investigate some properties of probabilistic logics [22]. First we show that
ALCP is language and representation invariant. Invariance is meant with respect to logical objects. Language invariance means that just extending the language without changing the knowledge base should not affect reasoning results. Representation invariance means that equivalent knowledge bases should yield equal inference results. Notice that different notions of representation dependence exist in the literature. For instance, in [11] a very different notion is considered, where the language and the knowledge base are changed simultaneously. This case is not covered by our notion of representation invariance. ALCP also satisfies an independence property; i.e., reasoning results about a part of the language are not changed when we add knowledge about an independent part of the language. Finally,
ALCP is continuous in the sense that minor changes in the probabilistic knowledge expressed by a knowledge base cannot induce major changes in the reasoning results.

Theorem 24 (Representation invariance).
Let K_i = (R_i, T_i), i ∈ {1, 2}, be two KBs such that Mod(R₁) = Mod(R₂) and Mod(T₁) = Mod(T₂). Then for all concepts C, D and contexts κ ∈ L, B_{K₁}(C ⊑ D | κ) = B_{K₂}(C ⊑ D | κ).

ALCP is not only representation invariant, but also language invariant. This property is of computational interest, in particular in combination with independence, which we investigate subsequently. To illustrate this, suppose that we added knowledge about bone fractures in our medical example, which is independent of the knowledge about infections. Independence guarantees that we can ignore the knowledge about infections when answering queries about bone fractures. In this way, we can decrease the size of the knowledge base. Language invariance guarantees that we can also ignore the concepts, roles and propositional variables related to the infection domain. Thus, we can decrease the size of the language. Exploiting both properties, the size of the computational problems can sometimes be decreased significantly.
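The independence behaviour of the ME distribution can be checked on a toy instance (illustrative Python, our own example, not from the paper). With the single constraint p(a) = 0.3 over sig(L) = {a, b}, a brute-force search over the satisfying distributions shows that the entropy maximizer treats the unconstrained variable b as an independent fair coin, so queries mentioning only a are unaffected by b's presence in the language:

```python
import math

def entropy(ps):
    return -sum(p * math.log(p) for p in ps if p > 0)

# Distributions over the worlds (a&b, a&~b, ~a&b, ~a&~b) satisfying
# p(a) = 0.3 are parametrized by t = P(a&b) and s = P(~a&b); maximize
# the entropy by exhaustive grid search.
best, best_h = None, -1.0
N = 300
for i in range(N + 1):
    for j in range(N + 1):
        t = 0.3 * i / N
        s = 0.7 * j / N
        h = entropy([t, 0.3 - t, s, 0.7 - s])
        if h > best_h:
            best, best_h = (t, 0.3 - t, s, 0.7 - s), h
```

The maximizer is (0.15, 0.15, 0.35, 0.35): P(a ∧ b) = P(a) · P(b) with P(b) = 0.5, illustrating why an independent part of the language can be ignored when reasoning about a.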
Theorem 25 (Language Invariance).
Let K₁, K₂ be KBs over the signatures (L₁, N_C¹, N_R¹) and (L₂, N_C², N_R²), respectively. If K₁ = K₂, L₁ ⊆ L₂, N_C¹ ⊆ N_C² and N_R¹ ⊆ N_R², then for all concepts C, D over N_C¹ and N_R¹ and contexts κ ∈ L₁, it holds that B_{K₁}(C ⊑ D | κ) = B_{K₂}(C ⊑ D | κ).

For an L-TBox T, we define the signature of T to be the set sig(T) of all concept names and role names appearing in T. Likewise, sig(R) is the set of all propositional variables appearing in R. The signature of a KB K = (R, T) is sig(K) := sig(R) ∪ sig(T).

Theorem 26 (Independence). Let K₁, K₂ be KBs such that sig(K₁) ∩ sig(K₂) = ∅, C, D be two concepts, and κ a context where (sig(C) ∪ sig(D) ∪ sig(κ)) ∩ sig(K₂) = ∅. Then B_{K₁}(C ⊑ D | κ) = B_{K₁ ∪ K₂}(C ⊑ D | κ).

The last property we consider is continuity. One important practical feature of continuous probabilistic logics is that they guarantee a numerically stable behaviour. That is, minor rounding errors due to floating-point arithmetic will not result in major errors in the computed probabilities. As demonstrated by Paris in [22], measuring the difference between probabilistic knowledge bases is subtle and is best addressed by comparing knowledge bases extensionally, i.e., with respect to their model sets. To this end, Paris considered the Blaschke metric. Formally, the Blaschke distance ‖S₁, S₂‖_B between two convex sets S₁, S₂ is defined by

inf { δ ∈ ℝ | ∀P₁ ∈ S₁ ∃P₂ ∈ S₂ : ‖P₁ − P₂‖ ≤ δ and ∀P₂ ∈ S₂ ∃P₁ ∈ S₁ : ‖P₁ − P₂‖ ≤ δ }.

Intuitively, ‖S₁, S₂‖_B is the smallest real number δ such that for each distribution in one of the sets, there is a probability distribution in the other that has distance at most δ to the former. We say that a sequence of knowledge bases (K_i) converges to a knowledge base K iff the classical part of each K_i is equivalent to the classical part of K and the probabilistic part converges to the probabilistic part of K. Our reasoning approach behaves indeed continuously with respect to this metric.

Theorem 27 (Continuity).
Let (K_i) be a convergent sequence of KBs with limit K and B_{K_i}(C ⊑ D | κ) = [ℓ_i, u_i]. If B_K(C ⊑ D | κ) = [ℓ, u], then (ℓ_i) converges to ℓ and (u_i) converges to u (with respect to the usual topology on ℝ).

Related Work

Relational probabilistic logical approaches can be roughly divided into those that consider probability distributions over the domain, those that consider probability distributions over possible worlds, and those that combine both ideas [10]. Our framework belongs to the second group. Maximum entropy reasoning in propositional probabilistic logics has been discussed extensively, e.g., in [13, 22], and various extensions to first-order languages have been considered in recent years [3, 4, 14, 15]. In these works, the domain is restricted to a finite number of constants or bounded in the limit. We circumvent the need to do so by combining a classical first-order logic with unbounded domain with a probabilistic logic with fixed domain.

Many probabilistic DLs have also been considered in the last decades [16, 18, 19]. Our approach is closest to Bayesian DLs [5, 6] and disponte [25]. The greatest difference with the former lies in the fact that
ALCP
KBs do not require a complete specification of the probability distribution, but only a set of probabilistic constraints. Moreover, the previous formalisms consider only the sceptical degree of belief, while we are interested in the full belief interval. In contrast to disponte, ALCP is capable of expressing both logical and probabilistic dependencies between the axioms in a KB; in addition, disponte requires all uncertainty degrees to be assigned as mutually independent point probabilities, while
ALCP allows for a more flexible specification.
We have introduced the probabilistic DL
ALCP , which extends the classical DL
ALC with the capability of expressing and reasoning about uncertain contextual knowledge defined through the principle of maximum entropy. Effective reasoning methods were developed using the decoupling between the logical and the probabilistic components of
ALCP
KBs. We also studied the properties of this logic in relation to other probabilistic logics.

We plan to extend this work in several directions. First, instead of considering the ME-model, we could reason over all probability distributions that satisfy our probabilistic constraints, similarly to [12, 17, 20]. This will in general result in larger belief intervals. A smaller interval is preferable, since it corresponds to a more precise degree of belief. However, when using all probability distributions, the size of the interval can be a good indicator for the variation of the possible beliefs in our query with respect to the knowledge base.

In some applications it is also useful to allow more expressive propositional or relational context languages like those proposed in [4, 7, 15, 23]. Similarly, we can consider other DLs for our concept language. Indeed,
ALC was chosen as a prototypical DL for studying the basic properties of our framework. Including additional constructors into the formalism should be relatively simple. In contrast, considering other reasoning problems beyond subsumption is less straightforward. Recall, for instance, that if an
ALCP KB K contains an inconsistent context with positive probability, then K has no models. It is thus unclear how to handle the probability of consistency of a KB.

Practical reasoning with ALCP can currently be performed by combining existing ME-reasoners with any ALC-reasoner according to Algorithm 1. Clearly, such an approach can still be further optimized. We are working on combining the classical and probabilistic reasoning parts in more sophisticated ways.

References
1. Baader, F., Calvanese, D., McGuinness, D.L., Nardi, D., Patel-Schneider, P.F. (eds.): The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press, 2nd edn. (2007)
2. Baader, F., Knechtel, M., Peñaloza, R.: Context-dependent views to axioms and consequences of semantic web ontologies. J. of Web Semantics 12–13, 22–40 (2012)
http://owl.cs.manchester.ac.uk/tools/list-of-reasoners/
3. Barnett, O., Paris, J.B.: Maximum entropy inference with quantified knowledge. Logic Journal of the IGPL 16(1), 85–98 (2008)
4. Beierle, C., Kern-Isberner, G., Finthammer, M., Potyka, N.: Extending and completing probabilistic knowledge and beliefs without bias. KI 29(3), 255–262 (2015)
5. Ceylan, I.I., Peñaloza, R.: The Bayesian description logic BEL. In: Proc. of IJCAR 2014. LNCS, vol. 8562, pp. 480–494. Springer (2014)
6. d'Amato, C., Fanizzi, N., Lukasiewicz, T.: Tractable reasoning with Bayesian description logics. In: Proc. SUM 2008. LNCS, vol. 5291, pp. 146–159. Springer (2008)
7. De Bona, G., Cozman, F.G., Finger, M.: Towards classifying propositional probabilistic logics. J. of Applied Logic 12(3), 349–368 (2014)
8. Domingos, P.M., Lowd, D.: Markov Logic: An Interface Layer for Artificial Intelligence. Synthesis Lectures on Artificial Intelligence and Machine Learning, Morgan & Claypool Publishers (2009)
9. Donini, F.M., Massacci, F.: ExpTime tableaux for ALC. Artificial Intelligence 124(1), 87–138 (2000)
10. Halpern, J.Y.: An analysis of first-order logics of probability. Artificial Intelligence 46, 311–350 (1990)
11. Halpern, J.Y., Koller, D.: Representation dependence in probabilistic inference. JAIR 21, 319–356 (2004)
12. Hansen, P., Perron, S.: Merging the local and global approaches to probabilistic satisfiability. Intern. J. of Approx. Reasoning 47(2), 125–140 (2008)
13. Kern-Isberner, G.: Conditionals in Nonmonotonic Reasoning and Belief Revision. Springer, LNAI 2087 (2001)
14. Kern-Isberner, G., Thimm, M.: Novel semantical approaches to relational probabilistic conditionals. In: Proc. KR 2010. pp. 382–391. AAAI Press (2010)
15. Kern-Isberner, G., Lukasiewicz, T.: Combining probabilistic logic programming with the power of maximum entropy. Artif. Intell. 157(1–2), 139–202 (2004)
16. Klinov, P., Parsia, B.: A hybrid method for probabilistic satisfiability. In: Proc. CADE 2011. LNCS, vol. 6803, pp. 354–368. Springer (2011)
17. Lukasiewicz, T.: Probabilistic deduction with conditional constraints over basic events. JAIR 10, 380–391 (1999)
18. Lukasiewicz, T., Straccia, U.: Managing uncertainty and vagueness in description logics for the semantic web. J. of Web Semantics 6(4), 291–308 (2008)
19. Lutz, C., Schröder, L.: Probabilistic description logics for subjective uncertainty. In: Proc. KR 2010. AAAI Press (2010)
20. Nilsson, N.J.: Probabilistic logic. Artificial Intelligence 28, 71–88 (1986)
21. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer, 2nd edn. (2006)
22. Paris, J.: The Uncertain Reasoner's Companion – A Mathematical Perspective. Cambridge University Press (1994)
23. Potyka, N.: Reasoning over linear probabilistic knowledge bases with priorities. In: Proc. SUM 2015. LNCS, vol. 9310, pp. 121–136. Springer (2015)
24. Potyka, N.: Relationships between semantics for relational probabilistic conditional logics. In: Computational Models of Rationality, Essays Dedicated to Gabriele Kern-Isberner. pp. 332–347. College Publications (2016)
25. Riguzzi, F., Bellodi, E., Lamma, E., Zese, R.: Epistemic and statistical probabilistic ontologies. In: Proc. URSW 2012. vol. 900, pp. 3–14. CEUR-WS (2012)
26. Schmidt-Schauß, M., Smolka, G.: Attributive concept descriptions with complements. Artif. Intell. 48(1), 1–26 (1991)
27. Yeung, R.W.: Information Theory and Network Coding. Springer Science & Business Media (2008)

Appendix: Proofs
Theorem 13.
The KB K = (R, T) is ME-consistent iff for every v ∈ Int(L) such that P^ME_R(v) > 0, T_v is consistent.

Proof. For the "if" direction, let v_1, ..., v_n ∈ Int(L) be all the L-interpretations such that P^ME_R(v_i) > 0, 1 ≤ i ≤ n. Then, for every i, 1 ≤ i ≤ n, the induced TBox T_{v_i} has a classical model I_i = (∆^{I_i}, ·^{I_i}), by assumption. It is easy to verify that the ALCP-interpretation P = (I, P_I) defined by I = { J_i = (∆^{I_i}, ·^{I_i}, v_i) | 1 ≤ i ≤ n } and P_I(J_i) = P^ME_R(v_i) for all i, 1 ≤ i ≤ n, is an ME-model of K. Thus, K is consistent.

Conversely, let P = (I, P_I) be an ME-model of K. Then, for every v ∈ Int(L) with P^ME_R(v) > 0 there is a possible world I = (∆^I, ·^I, w^I) ∈ I with w^I = v. Since I is a model of T and satisfies all contexts corresponding to the GCIs in T_v, it follows that (∆^I, ·^I) |= T_v; thus, T_v must be consistent. ⊓⊔

Theorem 16 (Intermediate Value Theorem).
Let p_1 < p_2 and let P_1 and P_2 be two ME-ALCP-models of the KB K = (R, T) such that Pr_{P_1}(C ⊑ D | κ) = p_1 and Pr_{P_2}(C ⊑ D | κ) = p_2. Then for each p between p_1 and p_2 there exists an ME-ALCP-model P of K such that Pr_P(C ⊑ D | κ) = p.

Proof.
Assume w.l.o.g. that P_i = (I_i, P_i), i = 1, 2, are such that I_1 ∩ I_2 = ∅: if there exists some I ∈ I_1 ∩ I_2, it suffices to rename the elements of ∆^I in one of the probabilistic interpretations. Given λ ∈ [0, 1], define P_λ = (I_1 ∪ I_2, P_λ), where for every I ∈ I_1 ∪ I_2,

P_λ(I) = (1 − λ) P_1(I) if I ∈ I_1, and P_λ(I) = λ P_2(I) otherwise.

P_λ is consistent with T since P_1 and P_2 are. We now show that P_λ induces the ME-model of R, which implies that P_λ is an ME-ALCP-model of K. For all v ∈ Int(L), we have

P_{P_λ}(v) = Σ_{I ∈ (I_1 ∪ I_2), I |= v} P_λ(I)
= Σ_{I ∈ I_1, I |= v} P_λ(I) + Σ_{I ∈ I_2, I |= v} P_λ(I)
= Σ_{I ∈ I_1, I |= v} (1 − λ) · P_1(I) + Σ_{I ∈ I_2, I |= v} λ · P_2(I)
= (1 − λ) · P_{P_1}(v) + λ · P_{P_2}(v) = P^ME_R(v),

where the last equation follows from P_{P_1} = P_{P_2} = P^ME_R. Hence, each P_λ is indeed an ME-ALCP-model of K. For the probability of our subsumption relation, we can derive similarly that P_{P_λ}(κ) · Pr_{P_λ}(C ⊑ D | κ) is equal to

Σ_{I ∈ I_1 ∪ I_2, I |= κ, I |= C ⊑ D} P_λ(I)
= Σ_{I ∈ I_1, I |= κ, I |= C ⊑ D} P_λ(I) + Σ_{I ∈ I_2, I |= κ, I |= C ⊑ D} P_λ(I)
= P_{P_λ}(κ) · ((1 − λ) p_1 + λ p_2).

For every p ∈ [p_1, p_2] there exists a λ_p ∈ [0, 1] such that p = (1 − λ_p) p_1 + λ_p p_2. Using this value λ_p we obtain that Pr_{P_{λ_p}}(C ⊑ D | κ) = p. ⊓⊔

In order to prove Theorems 18 and 19 it is useful to consider a restricted class of
ALCP-interpretations in which each context is represented by at most one possible world. We call these interpretations pithy.

Definition 28 (pithy).
The
ALCP-interpretation P = (I, P_I) is pithy if for every w ∈ Int(L) there is at most one possible world (∆^I, ·^I, v^I) ∈ I with v^I = w.

As the following lemma shows, pithy models are sufficient for computing the sceptical and credulous degrees of belief of a conditional subsumption relation and, by extension, the belief interval.
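The reduction to pithy models works by merging possible worlds that share a context valuation. A minimal Python sketch of this merging (the world representation, function names, and example data are ours, not from the paper): whenever one of the merged worlds violates the subsumption, the merged world does too, so the conditional probability can only decrease.

```python
def pithify_lower(worlds):
    """Merge possible worlds that share a context valuation into one world,
    keeping a violating world whenever one exists, so the conditional
    probability of C ⊑ D can only decrease.
    Each world is a tuple (valuation, satisfies_subsumption, probability)."""
    merged = {}  # valuation -> [satisfies_subsumption, probability]
    for val, sat, p in worlds:
        if val not in merged:
            merged[val] = [sat, p]
        else:
            merged[val][0] = merged[val][0] and sat  # prefer a violating world
            merged[val][1] += p  # mass is redistributed, so the marginal on
                                 # context valuations is unchanged
    return [(v, s, p) for v, (s, p) in merged.items()]

def cond_prob(worlds, kappa):
    """Pr(C ⊑ D | κ) in a probabilistic interpretation given as a world list."""
    den = sum(p for val, _, p in worlds if kappa(val))
    num = sum(p for val, sat, p in worlds if kappa(val) and sat)
    return num / den

# Example: two worlds share valuation "v1"; merging lowers the conditional probability.
worlds = [("v1", True, 0.3), ("v1", False, 0.2), ("v2", True, 0.5)]
kappa = lambda v: True  # a context satisfied by every valuation
assert cond_prob(pithify_lower(worlds), kappa) <= cond_prob(worlds, kappa)
```

The dual merge, which keeps a satisfying world whenever one exists, yields the upper-bounding model in the same way.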
Lemma 29.
Let K be an ALCP
KB,
C, D two concepts, and κ ∈ L such that P^ME_R(κ) > 0. For every ALCP-model P of K there exist pithy ALCP-models Q_1, Q_2 of K such that

Pr_{Q_1}(C ⊑ D | κ) ≤ Pr_P(C ⊑ D | κ) ≤ Pr_{Q_2}(C ⊑ D | κ)

and P_P = P_{Q_1} = P_{Q_2}.

Proof. Let P = (I, P_I). If P is already pithy, then the result holds trivially. Otherwise, there must exist two possible worlds I, J ∈ I such that v^I = v^J. There are four possible cases: (i) I |= C ⊑ D and J |= C ⊑ D; (ii) I ⊭ C ⊑ D and J ⊭ C ⊑ D; (iii) I |= C ⊑ D and J ⊭ C ⊑ D; and (iv) I ⊭ C ⊑ D and J |= C ⊑ D. We construct a new model P′ by removing one of the possible worlds I, J and redistributing the probability according to these cases, as described next. For the first three cases, define H := I \ {I} and the probability distribution

P_H(H) := P_I(H) if H ≠ J, and P_H(J) := P_I(I) + P_I(J),

for all H ∈ H. Then P_P = P_{P′} and P′ = (H, P_H) is still an ME-model of K. Since the denominator in (3) is P^ME_R(κ) independently of C ⊑ D, we have by construction that

Pr_{P′}(C ⊑ D | κ) ≤ Pr_P(C ⊑ D | κ).

Case (iv) is symmetric to (iii), where the possible world J is removed instead of I. Since H ⊂ I, we can iteratively repeat this process until a pithy model Q_1 is obtained. Q_2 can be constructed symmetrically. ⊓⊔

Notice that since P_P = P_{Q_1} = P_{Q_2}, this lemma in particular implies that for each ME-model there exists a pithy ME-model that yields a smaller or equal probability for the subsumption relation, and dually one that yields a larger or equal probability. Note also that in each pithy ME-model, for all w ∈ Int(L) with P^ME_R(w) >
0, there must be exactly one possible world I with v^I = w, because otherwise P_I could not be a probability distribution (the elementary events could not sum to 1). Moreover, since a pithy interpretation can contain at most |Int(L)| possible worlds and the world corresponding to some w ∈ Int(L) must have probability P^ME_R(w), there is only a finite number of probabilities that pithy models can assign to a given subsumption relation. Hence, the infimum and supremum that define the sceptical and the credulous degrees of belief are actually a minimum and a maximum attained by some pithy ME-models.

Corollary 30.
Given an
ALCP KB K, two concepts C, D, and a context κ, there exist two pithy ME-models P, Q of K such that B^s_K(C ⊑ D | κ) = Pr_P(C ⊑ D | κ) and B^c_K(C ⊑ D | κ) = Pr_Q(C ⊑ D | κ).

Theorem 18.
Let K = (R, T) be a KB, C, D two concepts, and κ a context such that P^ME_R(κ) > 0. Then

B^s_K(C ⊑ D | κ) = ( Σ_{w ∈ Int(L), T_w |= C ⊑ D, w |= κ} P^ME_R(w) ) / P^ME_R(κ).

Proof.
For every w ∈ Int(L), we construct an ALCP-interpretation I_w as follows. If T_w |= C ⊑ D, then I_w is any model (∆^{I_w}, ·^{I_w}, w) of T_w; otherwise, I_w is any model (∆^{I_w}, ·^{I_w}, w) of T_w that does not satisfy C ⊑ D, which must exist by definition. Let now P_K = (I, P_I) be the ALCP-interpretation such that I = { I_w | w ∈ Int(L) } and P_I(I_w) = P^ME_R(w) for all w. Then P_K is a model of K. Moreover, it holds that

Pr_{P_K}(C ⊑ D | κ) = Σ_{I_w |= C ⊑ D, w |= κ} P_I(I_w) / P^ME_R(κ) = Σ_{T_w |= C ⊑ D, w |= κ} P^ME_R(w) / P^ME_R(κ).

Thus, P^ME_R(κ) · B^s_K(C ⊑ D | κ) ≤ Σ_{T_w |= C ⊑ D, w |= κ} P^ME_R(w). If this inequality is strict, then w.l.o.g. there must exist a pithy probabilistic model P = (J, P_J) of K such that Pr_P(C ⊑ D | κ) < Pr_{P_K}(C ⊑ D | κ) (see Lemma 29). Hence, for every w ∈ Int(L) with P^ME_R(w) > 0 there is exactly one J_w ∈ J with v^{J_w} = w. We thus have

Σ_{J_w |= C ⊑ D, w |= κ} P_J(J_w) < Σ_{I_w |= C ⊑ D, w |= κ} P_I(I_w).

Since P_I(I_w) = P_J(J_w) for all w, there must exist a valuation v such that I_v |= C ⊑ D but J_v ⊭ C ⊑ D. Since J_v is a model of T_v, it follows that T_v ⊭ C ⊑ D. By construction, we then have that I_v ⊭ C ⊑ D, which is a contradiction. ⊓⊔

Theorem 19. Let K = (R, T) be a KB, C, D two concepts, and κ a context with P^ME_R(κ) > 0. Then

B^c_K(C ⊑ D | κ) = 1 − ( Σ_{w ∈ Int(L), T_w |= C ⋢ D, w |= κ} P^ME_R(w) ) / P^ME_R(κ).

Proof.
For every w ∈ Int(L), construct an ALCP-interpretation I_w as follows. If T_w |= C ⋢ D, then I_w is any model (∆^{I_w}, ·^{I_w}, w) of T_w; otherwise, I_w is any model (∆^{I_w}, ·^{I_w}, w) of T_w that satisfies C ⊑ D. Let P_K = (I, P_I) be the ALCP-interpretation with I = { I_w | w ∈ Int(L) } and P_I(I_w) = P^ME_R(w) for all w. Then P_K is a model of K. Moreover, it holds that

Pr_{P_K}(C ⊑ D | κ) = Σ_{I_w |= C ⊑ D, w |= κ} P_I(I_w) / P^ME_R(κ) = 1 − Σ_{T_w |= C ⋢ D, w |= κ} P^ME_R(w) / P^ME_R(κ).

That is, B^c_K(C ⊑ D | κ) ≥ 1 − Σ_{T_w |= C ⋢ D, w |= κ} P^ME_R(w) / P^ME_R(κ). If this inequality is strict, then there exists a probabilistic model P = (J, P_J) of K such that Pr_P(C ⊑ D | κ) > Pr_{P_K}(C ⊑ D | κ). By Lemma 29, we can assume w.l.o.g. that P is pithy. Hence, for every w ∈ Int(L) with P^ME_R(w) > 0 there is exactly one J_w ∈ J with v^{J_w} = w, and thus

Σ_{J_w |= C ⊑ D, w |= κ} P_J(J_w) > Σ_{I_w |= C ⊑ D, w |= κ} P_I(I_w).

Since P_I(I_w) = P_J(J_w) for all w, there must exist some v ∈ Int(L) such that I_v ⊭ C ⊑ D but J_v |= C ⊑ D. As J_v is a model of T_v, T_v ⊭ C ⋢ D. By construction, we then have that I_v |= C ⊑ D, which is a contradiction. ⊓⊔

Theorem 22.
Let K = (R, T) be an ALCP
KB, φ and ψ consequence formulas for C ⊑ D and C ⋢ D w.r.t. T, respectively, and κ a context. Then B^s_K(C ⊑ D | κ) = P^ME_R(φ | κ) and B^c_K(C ⊑ D | κ) = 1 − P^ME_R(ψ | κ).

Proof. The result is a direct consequence of Definition 21 and Theorems 18 and 19. Indeed,

B^s_K(C ⊑ D | κ) = Σ_{T_w |= C ⊑ D, w |= κ} P^ME_R(w) / P^ME_R(κ) = Σ_{w |= φ ∧ κ} P^ME_R(w) / P^ME_R(κ) = P^ME_R(φ | κ).

The case of the credulous degree of belief is analogous. ⊓⊔

Theorem 24 (Representation invariance).
Let K_i = (R_i, T_i), i ∈ {1, 2}, be two KBs such that Mod(R_1) = Mod(R_2) and Mod(T_1) = Mod(T_2). Then for all concepts C, D and contexts κ ∈ L, B_{K_1}(C ⊑ D | κ) = B_{K_2}(C ⊑ D | κ).

Proof. Let P = (I, P_I) be an ALCP-interpretation. Since
Mod(T_1) = Mod(T_2), P is consistent with T_1 iff P is consistent with T_2. Since Mod(R_1) = Mod(R_2), R_1 and R_2 induce the same ME-model, and P is (ME-)consistent with R_1 iff P is (ME-)consistent with R_2. Hence,

B^s_{K_1}(C ⊑ D | κ) = inf_{P ∈ Mod^ME(K_1)} Pr_P(C ⊑ D | κ) = inf_{P ∈ Mod^ME(K_2)} Pr_P(C ⊑ D | κ) = B^s_{K_2}(C ⊑ D | κ).

Analogously, we get that B^c_{K_1}(C ⊑ D | κ) = B^c_{K_2}(C ⊑ D | κ), and therefore B_{K_1}(C ⊑ D | κ) = B_{K_2}(C ⊑ D | κ). ⊓⊔

Theorem 25 (Language Invariance).
Let K_1, K_2 be KBs over L_1, N_C^1, N_R^1 and L_2, N_C^2, N_R^2, respectively. If K_1 = K_2, L_1 ⊆ L_2, N_C^1 ⊆ N_C^2 and N_R^1 ⊆ N_R^2, then for all concepts C, D over N_C^1 and N_R^1 and contexts κ ∈ L_1, it holds that B_{K_1}(C ⊑ D | κ) = B_{K_2}(C ⊑ D | κ).

Proof.
It suffices to show that for every
ALCP -model P of K there exists a ALCP -model P of K such that Pr P ( C ⊑ D | κ ) = Pr P ( C ⊑ D | κ ) and vice versa . Given an ALCP -model P = ( I , P I ) of K , we build P = ( I , P I )as follows. For each possible world I ∈ I with probability p , I contains apossible world I ′ with probability p that extends I assigning false to all newpropositional variables and the empty set to all new role names and conceptnames. Since C, D, κ and K depend only on sig ( L ) , N , N , P satisfies K and Pr P ( C ⊑ D | κ ) = Pr P ( C ⊑ D | κ ) holds. Conversely, consider an ALCP -model P = ( I , P I ) of K . We obtain P from P by restricting thepossible worlds in I to C, D, κ . As before, it follows that P satisfies K and Pr P ( C ⊑ D | κ ) = Pr P ( C ⊑ D | κ ). ⊓⊔ In order to prove Independence, we need the following lemma. It states an inde-pendence property of ME-distributions over our context language.
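The key facts behind this independence property — the product of two distributions over disjoint signatures has additive entropy and preserves its factors as marginals — can be checked numerically. A minimal Python sketch (the distributions and event names are illustrative assumptions, not taken from the paper):

```python
import math

def entropy(p):
    """Shannon entropy of a distribution given as {event: probability}."""
    return -sum(q * math.log(q) for q in p.values() if q > 0)

def product(p1, p2):
    """Joint distribution of two independent components with disjoint signatures."""
    return {(v1, v2): q1 * q2 for v1, q1 in p1.items() for v2, q2 in p2.items()}

# Two hypothetical ME-distributions over disjoint context languages {a} and {b}
p1 = {"a": 0.7, "~a": 0.3}
p2 = {"b": 0.4, "~b": 0.6}
joint = product(p1, p2)

# Additivity of entropy: H(P1 · P2) = H(P1) + H(P2)
assert abs(entropy(joint) - (entropy(p1) + entropy(p2))) < 1e-12
# The marginals of the product recover the original distributions
assert abs(sum(q for (v1, _), q in joint.items() if v1 == "a") - p1["a"]) < 1e-12
```

Combined with the independence bound for entropy ([27], Theorem 2.39), this is exactly why the product distribution maximizes entropy among the models of the union of the constraint sets.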
Lemma 31 (ME-independence).
Let R_1, R_2 be two finite sets of probability constraints such that sig(L_1) ∩ sig(L_2) = ∅, and let R := R_1 ∪ R_2. Then P^ME_R = P^ME_{R_1} · P^ME_{R_2}. In particular, for the marginal distributions of P^ME_R, we have P^ME_R(v_i) = P^ME_{R_i}(v_i) for all v_i ∈ Int(L_i), i ∈ {1, 2}.

Proof. Since the signatures of the L_i are disjoint, we can denote the valuations of the language over sig(L_1) ∪ sig(L_2) by (v_1, v_2), v_i ∈ Int(L_i). Let us first consider the marginals of P = P^ME_{R_1} · P^ME_{R_2}. For all v_1 ∈ Int(L_1), we have

P(v_1) = Σ_{v_2 ∈ Int(L_2)} P(v_1, v_2) = P^ME_{R_1}(v_1) Σ_{v_2 ∈ Int(L_2)} P^ME_{R_2}(v_2) = P^ME_{R_1}(v_1).

Symmetrically, we can show that P(v_2) = P^ME_{R_2}(v_2). This means, in particular, that the marginals of P coincide with the corresponding maximum entropy solutions. Therefore,

H(P) = − Σ_{v_1} Σ_{v_2} P^ME_{R_1}(v_1) P^ME_{R_2}(v_2) log( P^ME_{R_1}(v_1) · P^ME_{R_2}(v_2) )
= − Σ_{v_2} P^ME_{R_2}(v_2) Σ_{v_1} P^ME_{R_1}(v_1) log P^ME_{R_1}(v_1) − Σ_{v_1} P^ME_{R_1}(v_1) Σ_{v_2} P^ME_{R_2}(v_2) log P^ME_{R_2}(v_2)
= H(P^ME_{R_1}) + H(P^ME_{R_2}).

Using the independence bound for entropy (see, e.g., [27], Theorem 2.39), we have that

H(P^ME_R) ≤ H(P^ME_{R_1}) + H(P^ME_{R_2}) = H(P).

Hence, it suffices to show that P^ME_{R_1} · P^ME_{R_2} is indeed a model of R_1 ∪ R_2. But this follows immediately from the facts that each P^ME_{R_i} satisfies R_i and that the marginalization of P over one logic corresponds to the ME-distribution over the other. ⊓⊔

Theorem 26 (Independence).
Let K_1, K_2 be KBs s.t. sig(K_1) ∩ sig(K_2) = ∅, let C, D be two concepts, and κ a context such that (sig(C) ∪ sig(D) ∪ sig(κ)) ∩ sig(K_2) = ∅. Then B_{K_1}(C ⊑ D | κ) = B_{K_1 ∪ K_2}(C ⊑ D | κ).

Proof. Let K_i = (R_i, T_i) for i ∈ {1, 2}. Since the signatures of both KBs are disjoint, we will denote the valuations over the set of variables sig(R_1) ∪ sig(R_2) as pairs (w_1, w_2), where w_i is a valuation over sig(R_i). We know from Theorem 18 that

P^ME_R(κ) · B_{K_1 ∪ K_2}(C ⊑ D | κ) = Σ_{(w_1, w_2) |= κ, (T_1 ∪ T_2)_{(w_1, w_2)} |= C ⊑ D} P^ME_R((w_1, w_2))
= Σ_{(w_1, w_2) |= κ, (T_1)_{w_1} |= C ⊑ D} P^ME_R((w_1, w_2))    (4)
= Σ_{w_1 |= κ, (T_1)_{w_1} |= C ⊑ D} P^ME_{R_1}(w_1)    (5)
= P^ME_{R_1}(κ) · B_{K_1}(C ⊑ D | κ),

where (4) follows from the monotonicity of subsumption in ALC
TBoxes, and (5) is a consequence of Lemma 31. ⊓⊔

In order to prove Continuity, we start with a lemma that states continuity of ME-distributions over L. The proof is analogous to Paris' proof of continuity of maximum entropy reasoning in his probabilistic logic [22], which is a sub-logic of our probabilistic logic over L.

Lemma 32 (ME-continuity). Let R be a set of probabilistic constraints and let (R_i) be a sequence of sets of probabilistic constraints such that (Mod(R_i)) converges to Mod(R). Then the sequence (P^ME_i) of ME-models of the R_i converges to the ME-model P^ME_R of R.

Proof. For brevity, let M = Mod(R) and M_i = Mod(R_i). We show that for each ε >
0, there is a δ > 0 such that ‖M_i, M‖_B < δ implies ‖P^ME_i − P^ME_R‖ < ε. Consider the set S = { P ∈ M | ‖P − P^ME_R‖ ≥ ε } of models of R that have at least distance ε to P^ME_R. By continuity of the Euclidean distance and compactness of M, S must be compact. Since the entropy function H is continuous, the minimum ν = min { H(P^ME_R) − H(P) | P ∈ S } exists, and ν > 0 because P^ME_R is the unique entropy maximizer among the models of R and P^ME_R ∉ S. Since H is defined on a compact set (the set of probability distributions over L), H is uniformly continuous. Therefore there exists a δ > 0 such that for all distributions P_1, P_2 over L, ‖P_1 − P_2‖ < δ implies |H(P_1) − H(P_2)| < min { ε, ν/3 }. In particular, we can assume that δ < ε. Now, if ‖M_i, M‖_B < δ, there is a P_i ∈ M_i such that ‖P_i − P^ME_R‖ < δ and a P ∈ M such that ‖P^ME_i − P‖ < δ. Hence,

H(P^ME_R) < H(P_i) + ν/3 ≤ H(P^ME_i) + ν/3,
H(P^ME_i) < H(P) + ν/3 ≤ H(P^ME_R) + ν/3,

and therefore |H(P^ME_i) − H(P^ME_R)| < ν/3. In particular,

|H(P) − H(P^ME_R)| ≤ |H(P) − H(P^ME_i)| + |H(P^ME_i) − H(P^ME_R)| < ν/3 + ν/3 < ν.

By definition of ν, we can conclude that P ∈ M \ S, and therefore ‖P^ME_R − P‖ < ε. Hence,

‖P^ME_R − P^ME_i‖ ≤ ‖P^ME_R − P‖ + ‖P − P^ME_i‖ < ε + δ < 2ε. ⊓⊔

Theorem 27 (Continuity).
Let (K_i) be a convergent sequence of KBs with limit K and B_{K_i}(C ⊑ D | κ) = [ℓ_i, u_i]. If B_K(C ⊑ D | κ) = [ℓ, u], then (ℓ_i) converges to ℓ and (u_i) converges to u (with respect to the usual topology on R).

Proof. Let K = (R, T) and K_i = (R_i, T_i). By assumption, (Mod(R_i)) converges to Mod(R). Hence, Lemma 32 implies that the probability distributions induced by ME-models of the K_i converge to P^ME_R, which, in turn, is the probability distribution induced by all ME-models of K. Hence, the infimum and supremum of the conditional probability of C ⊑ D given κ with respect to K_i converge to the infimum and supremum with respect to K. ⊓⊔
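Lemma 32 and Theorem 27 can be illustrated on a toy instance where the ME-distribution has a closed form. Under the single constraint P(a) = q over two propositional variables a and b, entropy is maximized by making b uniform and independent of a; as the constraint values q_i converge, the induced ME-distributions converge as well. A hedged Python sketch (the closed form replaces a general ME solver; all names and numbers are ours):

```python
def me_distribution(q):
    """Closed-form ME-distribution over the four valuations of {a, b}
    under the single constraint P(a) = q: b is uniform and independent of a."""
    return {("a", "b"): q / 2, ("a", "~b"): q / 2,
            ("~a", "b"): (1 - q) / 2, ("~a", "~b"): (1 - q) / 2}

def max_dist(p1, p2):
    """Maximum (Chebyshev) distance between two distributions over the same events."""
    return max(abs(p1[v] - p2[v]) for v in p1)

# A sequence of constraint sets R_i with P(a) = 0.4 + 1/i converging to P(a) = 0.4
limit = me_distribution(0.4)
dists = [max_dist(me_distribution(0.4 + 1 / i), limit) for i in range(2, 100)]

# The induced ME-distributions converge to the ME-distribution of the limit
assert all(d1 > d2 for d1, d2 in zip(dists, dists[1:]))
assert dists[-1] < 0.01
```

This mirrors the statement of the lemma in miniature: as the model sets of the constraint sequence converge, so do the corresponding ME-models.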