Probabilistic Reasoning in the Description Logic ALCP with the Principle of Maximum Entropy (Full Version)
Rafael Peñaloza (Free University of Bozen-Bolzano, Italy, [email protected]) and Nico Potyka (University of Osnabrück, Germany, [email protected])
Abstract.
A central question for knowledge representation is how to encode and handle uncertain knowledge adequately. We introduce the probabilistic description logic ALCP that is designed for representing context-dependent knowledge, where the actual context taking place is uncertain. ALCP allows the expression of logical dependencies on the domain and probabilistic dependencies on the possible contexts. In order to draw probabilistic conclusions, we employ the principle of maximum entropy. We provide reasoning algorithms for this logic, and show that it satisfies several desirable properties of probabilistic logics.
Introduction

A fundamental element of any intelligent application is storing and manipulating the knowledge from the application domain. Logic-based knowledge representation languages such as description logics (DLs) [1] provide a clear syntax and unambiguous semantics that guarantee the correctness of the results obtained. However, languages based on classical logic are ill-suited for handling the uncertainty inherent to many application domains. To overcome this limitation, various probabilistic logics have been investigated during the last three decades (e.g., [3, 15, 20]). In particular, several probabilistic DLs have been developed [18, 19]. To handle probabilistic knowledge, many approaches require a complete definition of joint probability distributions (JPDs) [5, 6, 8, 16, 25]. One approach to avoid a full JPD specification was proposed by Paris [22]: the user gives a partial specification through a set of probabilistic constraints, and the partial knowledge is completed by means of the principle of maximum entropy.

In this paper we consider a new probabilistic extension of description logics based on the principle of maximum entropy. In our approach we group different axioms from a knowledge base together into so-called contexts, which are identified by a propositional formula. Intuitively, each context corresponds to a possible situation, in which the associated sub-KB is guaranteed to hold. Uncertainty is associated to the contexts through a set of probabilistic constraints, which are interpreted under the principle of maximum entropy.

To facilitate the understanding of our approach, we focus on the DL
ALC [26] as a prototypical example of a knowledge representation language, and propositional probabilistic constraints as the framework for expressing uncertainty. As reasoning service we consider subsumption relations between concepts given some partial knowledge of the current context. Since the knowledge in a knowledge base is typically incomplete, one cannot expect to obtain a precise probability for a given consequence. Instead, we compute a belief interval that describes all the probability degrees that can be associated to the consequence without contradiction. The lowest bound of the interval corresponds to a sceptical view, considering only the most fundamental models of the knowledge base. The upper bound, in contrast, reflects the credulous belief in which every context that is not explicitly removed is considered. In the worst case, we get the trivial interval [0, 1], which provides no information about the consequence. In contrast, a narrow belief interval, e.g. for the probability that a treatment will be successful, allows for an informed decision.

Our main contributions are the following:

– we define the new probabilistic description logic ALCP that allows for a flexible description of axiomatic dependencies, and its reasoning problems (Section 3);
– we explain in detail how degrees of belief for the subsumption problem can be computed (Section 4); and
– we show that ALCP satisfies several desirable properties of probabilistic logics (Section 5).
Preliminaries

We start by recalling the basic notions of probabilistic propositional logic and the principle of maximum entropy. Let L be a propositional language constructed over a finite signature sig(L), i.e., a set of propositional variables, in the usual way. An L-interpretation v is a truth assignment of the propositional variables in sig(L). Int(L) denotes the set of all L-interpretations. Satisfaction of a formula φ ∈ L by an L-interpretation v ∈ Int(L) (denoted v |= φ) is defined as usual. A probability distribution over L is a function P : Int(L) → [0, 1] with Σ_{v ∈ Int(L)} P(v) = 1. Probability distributions are extended to arbitrary L-formulas φ by setting P(φ) = Σ_{v |= φ} P(v).

Definition 1 (probabilistic constraints, models).
Given the propositional language L, a probabilistic constraint (over L) is an expression of the form

c₀ + Σ_{i=1}^{k} c_i · p(φ_i) ≥ 0,                                  (1)

where c₀, c_i ∈ ℝ and φ_i ∈ L for 1 ≤ i ≤ k. A probability distribution P over L is a model of the probabilistic constraint (1) if and only if c₀ + Σ_{i=1}^{k} c_i · P(φ_i) ≥ 0. The distribution P is a model of the set of probabilistic constraints R (P |= R) iff it satisfies all the constraints in R. The set of all models of R is denoted by Mod(R). If Mod(R) ≠ ∅, we say that R is consistent.

Our probabilistic constraints can express the most common types of constraints considered in the literature of probabilistic logics. For instance, probabilistic conditionals (ψ | φ)[ℓ, u] are satisfied iff ℓ · P(φ) ≤ P(ψ ∧ φ) ≤ u · P(φ) [17]. That is, the conditional is satisfied iff the conditional probability of ψ given φ is between ℓ and u whenever P(φ) > 0. The conditional thus corresponds to the two probabilistic constraints p(ψ ∧ φ) − ℓ · p(φ) ≥ 0 and u · p(φ) − p(ψ ∧ φ) ≥ 0. Probabilistic constraints can also express more complex restrictions; for example, we can state that the probability that a bird cannot fly is at most one fourth of the probability that a bird flies through the constraint

(1/4) p(bird ∧ flies) − p(bird ∧ ¬flies) ≥ 0.                       (2)

To improve readability, we will often rewrite constraints in a more compact manner, using conditionals as in the first example, or e.g. rewriting (2) as p(bird ∧ flies) ≥ 4 · p(bird ∧ ¬flies).

In general, consistent sets of probabilistic constraints have infinitely many models, and there is no obvious way to distinguish between them. One well-studied approach for dealing with this diversity is to focus on the model that maximizes the entropy

H(P) = − Σ_{v ∈ Int(L)} P(v) · log P(v).

From an information-theoretic point of view, the maximum entropy (ME) distribution can be regarded as the most conservative one in the sense that it minimizes the information-theoretic distance (that is, the KL-divergence) to the uniform distribution among all probability distributions that satisfy our constraints. In particular, if there are no restrictions on the probability distributions considered, then the uniform distribution is the ME distribution; see, e.g., [27] for a more detailed discussion of these issues. A complete characterization of maximum entropy for the purpose of uncertain reasoning can be found in [22].
Definition 2 (ME-model).
Let R be a consistent set of probabilistic constraints. The ME-model P^ME_R of R is the unique solution of the maximization problem arg max_{P |= R} H(P).

Existence and uniqueness of P^ME_R follow from the fact that H is strictly concave and continuous, and that the probability distributions that satisfy R form a compact and convex set. P^ME_R is usually computed by deriving an unconstrained optimization problem by means of the Karush-Kuhn-Tucker conditions. The resulting problem can be solved, for instance, by (quasi-)Newton methods whose cost is proportional to |Int(L)|; see, e.g., [21] for more details on these techniques.
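The dual construction behind this computation can be sketched numerically (illustrative Python; the solver and all identifiers are ours, not from the paper). For equality constraints written as features with E_P[f] = 0, the ME-model has the exponential-family form P(v) ∝ exp(Σⱼ λⱼ fⱼ(v)), and gradient descent on the dual objective log Z(λ) drives each E_P[fⱼ] to zero. Since the uniform distribution violates constraint (2), we assume below that the constraint is active (holds with equality) at the optimum:

```python
import math
from itertools import product

SIG = ["bird", "flies"]
WORLDS = [dict(zip(SIG, bits)) for bits in product([True, False], repeat=len(SIG))]

def me_distribution(features, steps=20000, lr=0.5):
    """ME distribution subject to E_P[f] = 0 for each feature f.

    P_lam(v) ~ exp(sum_j lam_j * f_j(v)); the gradient of log Z(lam)
    w.r.t. lam_j is E_P[f_j], so descending it enforces the constraints.
    """
    lam = [0.0] * len(features)
    for _ in range(steps):
        w = [math.exp(sum(l * f(v) for l, f in zip(lam, features))) for v in WORLDS]
        z = sum(w)
        p = [wi / z for wi in w]
        for j, f in enumerate(features):
            lam[j] -= lr * sum(pi * f(v) for pi, v in zip(p, WORLDS))
    w = [math.exp(sum(l * f(v) for l, f in zip(lam, features))) for v in WORLDS]
    z = sum(w)
    return [wi / z for wi in w]

# Active form of constraint (2): (1/4) p(bird & flies) = p(bird & ~flies).
f2 = lambda v: 0.25 * (v["bird"] and v["flies"]) - (v["bird"] and not v["flies"])
P = me_distribution([f2])  # world order: (T,T), (T,F), (F,T), (F,F)
```

Under these assumptions the two unconstrained worlds keep equal probability, while the constrained pair is tilted just enough to satisfy (2) with equality.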
The Probabilistic Description Logic ALCP

ALCP is a probabilistic extension of the classical description logic
ALC capable of expressing complex logical and probabilistic relations. As with classical DLs, the main building blocks in ALCP are concepts. Syntactically, ALCP concepts are constructed exactly as ALC concepts. Given two disjoint sets N_C of concept names and N_R of role names, ALCP concepts are built using the grammar rule

C ::= A | ¬C | C ⊓ C | ∃r.C,

where A ∈ N_C and r ∈ N_R. Note that we can derive disjunction, universal quantification and subsumption from these rules by using logical equivalences like C₁ ⊔ C₂ ≡ ¬(¬C₁ ⊓ ¬C₂). The knowledge of the application domain is expressed through a finite set of axioms that restrict the way the different concepts and roles may be interpreted. To express both probabilistic and logical relationships, each axiom is annotated with a formula from L that intuitively expresses the context in which this axiom holds.

Definition 3 (KB). An L-restricted general concept inclusion (L-GCI) is of the form ⟨C ⊑ D : κ⟩ where C, D are ALCP concepts and κ is an L-formula. An L-TBox is a finite set of L-GCIs. An ALCP knowledge base (KB) over L is a pair K = (R, T) where R is a set of probabilistic constraints and T is an L-TBox.

Example 4. Consider an application modeling beliefs about bacterial and viral infections using the concept names strep (streptococcal infection), bac (bacterial infection), vir (viral infection), inf (infection), and ab (antibiotic); the role names sf (suffers from) and suc (successful treatment); and the propositional variables res (antibiotic resistance) and h (heavy use of antibiotics by patient). Define the L-TBox T_exa containing the L-GCIs

⟨∃sf.bac ⊑ ∃suc.ab : ¬res ∧ ¬h⟩,  ⟨∃sf.vir ⊑ ¬∃suc.ab : ⊤⟩,  ⟨strep ⊑ bac : ⊤⟩,
⟨∃sf.bac ⊑ ¬∃suc.ab : res⟩,  ⟨bac ⊑ inf : ⊤⟩,  ⟨vir ⊑ inf : ⊤⟩,

where ⊤ is any L-tautology. For example, the first L-GCI states that a bacterial infection can be treated successfully with antibiotics if no antibiotic resistance is present and there was no heavy use of antibiotics; the second one states that viral infections can never be treated with antibiotics successfully. Consider additionally the set R containing the probabilistic constraints

(res)[0.05, 0.05],  (res | h)[0.8, 0.8].

That is, the probability of an antibiotic resistance is 5% if no further information is given. However, if the patient used antibiotics heavily, the probability increases to 80%.

Notice that the probabilistic constraints, and hence the representation of the uncertainty in the knowledge, refer only to the propositional formulas that label the L-GCIs. In ALCP, the uncertainty of the knowledge is handled through these propositional formulas as explained next.

A possible world interprets both the axiom language (i.e., the concept and role names) and the context language (the propositional variables).
Intuitively, it describes a possible context (L-interpretation) together with the relationships between concepts in that situation (ALC-interpretation).
Definition 5 (possible world). A possible world is a triple I = (Δ^I, ·^I, v_I) where Δ^I is a non-empty set (called the domain), v_I is an L-interpretation, and ·^I is an interpretation function that maps every concept name A to a set A^I ⊆ Δ^I and every role name r to a binary relation r^I ⊆ Δ^I × Δ^I.

The interpretation function ·^I is extended to complex concepts as usual in DLs by letting (¬C)^I := Δ^I \ C^I; (∃r.C)^I := {d ∈ Δ^I | ∃e ∈ Δ^I. (d, e) ∈ r^I, e ∈ C^I}; and (C ⊓ D)^I := C^I ∩ D^I. A possible world is a model of an L-GCI iff it satisfies the description logic constraint of the axiom whenever it satisfies the context.

Definition 6 (model of TBox). The possible world I = (Δ^I, ·^I, v_I) is a model of the L-GCI ⟨C ⊑ D : κ⟩, denoted as I |= ⟨C ⊑ D : κ⟩, iff (i) v_I ̸|= κ, or (ii) C^I ⊆ D^I. It is a model of the L-TBox T iff it is a model of all the L-GCIs in T.

The classical DL
ALC is a special case of
ALCP where all the axioms are annotated with an L-tautology ⊤. To preserve the syntax of classical DLs, we denote such L-GCIs as C ⊑ D instead of ⟨C ⊑ D : ⊤⟩. In this case, condition (i) from Definition 6 cannot be satisfied, and hence a model is required to satisfy C^I ⊆ D^I for all L-GCIs C ⊑ D in the TBox. For a deeper introduction to classical ALC, see [1].

According to our semantics, we only demand that the L-GCIs are satisfied in some specific contexts. Thus, it is often useful to focus on the classical ALC
TBox that contains the knowledge that holds in a particular situation. For a KB K = (R, T) and v ∈ Int(L), the v-restricted TBox is the ALC TBox

T_v := { C ⊑ D | ⟨C ⊑ D : κ⟩ ∈ T, v |= κ }.

The possible world I satisfies T_v (I |= T_v) if for all GCIs C ⊑ D ∈ T_v it holds that C^I ⊆ D^I.

In the following, we will often consider subsumption and strong non-subsumption between concepts w.r.t. a restricted TBox. We say that C is subsumed by D w.r.t. T_v (T_v |= C ⊑ D) if for every I |= T_v it holds that C^I ⊆ D^I. Dually, C is strongly non-subsumed by D w.r.t. T_v (T_v |= C ̸⊑ D) if for every I |= T_v, C^I ⊈ D^I holds. Notice that strong non-subsumption requires that the inclusion between the concepts does not hold in any possible world satisfying T_v. Hence, this condition is more strict than just negating the subsumption relation.

We now describe how the probabilistic constraints are handled in our logic. An ALCP-interpretation consists of a finite set of possible worlds and a probability function over these worlds.

Definition 7 (
ALCP-interpretation). An ALCP-interpretation is a pair of the form P = (I, P_I), where I is a non-empty, finite set of possible worlds and P_I is a probability distribution over I.

Each
ALCP-interpretation induces a probability distribution over L. The probability of a context can be obtained by adding the probabilities of all possible worlds in which this context holds.

Definition 8 (distribution induced by P). Let P = (I, P_I) be an ALCP-interpretation. The probability distribution P_P : Int(L) → [0, 1] induced by P is defined by P_P(v) := Σ_{I ∈ I|_v} P_I(I), where I|_v = {(Δ^I, ·^I, v_I) ∈ I | v_I = v}.

As usual, reasoning is restricted to interpretations that satisfy the restrictions imposed by the knowledge base. In our case, we have to demand that the interpretation is consistent with both the classical and the probabilistic part of our knowledge base. That is, we consider only those possible worlds that satisfy both the terminological knowledge (T) and the probabilistic constraints (R).

Definition 9 (model).
Let P = (I, P_I) be an ALCP-interpretation. P is consistent with the TBox T if every I ∈ I is a model of T. P is consistent with the set of probabilistic constraints R iff P_P |= R. The ALCP-interpretation P is a model of the KB K = (R, T) iff it is consistent with both T and R. As usual, a KB is consistent iff it has a model.

Notice that
ALCP-KBs can express both logical and probabilistic dependencies between axioms. For instance, two L-GCIs ⟨C₁ ⊑ D₁ : κ₁⟩ and ⟨C₂ ⊑ D₂ : κ₂⟩ where κ₁ ⇒ κ₂ express that whenever the first L-GCI is enforced, the second one must also hold. Similarly, the probabilistic dependencies between axioms are expressed via the probabilistic constraints on the labeling formulas.

We are interested in computing degrees of belief for subsumption relations between concepts. We define the conditional probability of a subsumption relation given a context with respect to a given ALCP-interpretation following the usual notions of conditioning.
Definition 10 (probability of subsumption).
Let C, D be concepts, κ a context, and P = (I, P_I) an ALCP-interpretation. The conditional probability of C ⊑ D given κ with respect to P is

Pr_P(C ⊑ D | κ) := ( Σ_{I ∈ I, I |= κ, I |= C ⊑ D} P_I(I) ) / ( Σ_{I ∈ I, I |= κ} P_I(I) ).    (3)

Notice that the denominator in (3) can be rewritten as

Σ_{I ∈ I, I |= κ} P_I(I) = Σ_{v |= κ} Σ_{I ∈ I|_v} P_I(I) = Σ_{v |= κ} P_P(v) = P_P(κ).

As usual, the conditional probability is only well-defined when P_P(κ) > 0. Even then, R may be satisfied by an infinite class of probability distributions. In the spirit of maximum entropy reasoning, we consider only the most conservative ones in the sense that they induce the ME-model P^ME_R of R.

Definition 11 (ME-
ALCP-model of K iff P_P = P^ME_R. The set of all ME-ALCP-models of K is denoted by Mod_ME(K). K is called ME-consistent iff Mod_ME(K) ≠ ∅.

Note that ME-consistency is a strictly stronger notion of consistency: ME-consistent knowledge bases are always consistent, but the converse does not necessarily hold if the classical TBox obtained from T by restricting to a context is inconsistent, as we show in the following example.

Example 12.
Let sig(L) = {x} and K = (R, T) be the KB with R = ∅ and T = {⟨A ⊔ ¬A ⊑ A ⊓ ¬A : x⟩}. Since A ⊔ ¬A ⊑ A ⊓ ¬A is contradictory, each ALCP-model of K must satisfy ¬x. There certainly are such models, but in each such model P, P_P(x) = 0. However, since R = ∅, we have P^ME_R(x) = 0.5. Hence K has no ME-model.

ME-inconsistency rules out some undesired cases in which the whole knowledge base is consistent, but the TBox restricted to some context is inconsistent. The following theorem gives a simple characterization of ME-consistency: to verify ME-consistency of a KB, it suffices to check consistency of the TBoxes induced by the L-interpretations that have positive probability with respect to P^ME_R. By the properties of the ME distribution, these are the interpretations that are not explicitly restricted to have zero probability through R.

Theorem 13.
The KB K = (R, T) is ME-consistent iff for every v ∈ Int(L) such that P^ME_R(v) > 0, T_v is consistent.

For the rest of this paper we consider only ME-consistent KBs. Hence, whenever we speak of a KB K, we implicitly assume that K has at least one ME-model. We are interested in computing the probability of a subsumption relation w.r.t. a given KB K. Notice that, although we consider only one probability distribution P^ME_R, there can still exist many different ME-models of K, which yield different probabilities for the same subsumption relation. One way to handle this is to consider the smallest and largest probabilities that can be consistently associated to this relation. We call them the sceptical and the credulous degrees of belief, respectively.

Definition 14 (degree of belief).
Let
C, D be ALCP concepts, κ a context, and K = (R, T) an ALCP KB. The sceptical degree of belief of C ⊑ D given κ w.r.t. K is

B^s_K(C ⊑ D | κ) := inf_{P ∈ Mod_ME(K)} Pr_P(C ⊑ D | κ).

The credulous degree of belief of C ⊑ D given κ w.r.t. K is

B^c_K(C ⊑ D | κ) := sup_{P ∈ Mod_ME(K)} Pr_P(C ⊑ D | κ).

Example 15.
Consider K_exa from Example 4. If we ask for the degrees of belief that a patient who suffers from an infection can be successfully treated with antibiotics, we obtain

B^s_{K_exa}(∃sf.inf ⊑ ∃suc.ab | ⊤) = 0,    B^c_{K_exa}(∃sf.inf ⊑ ∃suc.ab | ⊤) = 1.

These bounds are not very informative, but they are perfectly justified by our knowledge base, since we do not know anything about the effectiveness of antibiotics with respect to infections in general. However, for a patient who suffers from a streptococcal infection we get

B^s_{K_exa}(∃sf.strep ⊑ ∃suc.ab | ⊤) = 0.94,    B^c_{K_exa}(∃sf.strep ⊑ ∃suc.ab | ⊤) = 0.95.

If we know that this patient used antibiotics heavily in the past, then there is nothing in our knowledge base that guarantees the existence of a successful treatment. Hence, the degrees of belief become

B^s_{K_exa}(∃sf.strep ⊑ ∃suc.ab | h) = 0,    B^c_{K_exa}(∃sf.strep ⊑ ∃suc.ab | h) = 0.2.

Our definition of the sceptical degree of belief raises a philosophical question: should there be no difference between the degree of belief 0 and an infinitely small degree of belief? A dual question arises for the credulous degree of belief and the probability 1. However, as we show in the next section, the sceptical and credulous degrees of belief actually correspond to minimum and maximum rather than to infimum and supremum (see Corollary 20), so that these questions become vacuous. From the following theorem we can conclude that every intermediate degree can also be obtained by some model of the KB.
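The context mechanism behind these numbers can be sketched directly (illustrative Python, our own encoding; concepts are kept as opaque strings and no actual ALC reasoning is performed): each L-GCI of Example 4 is a triple (C, D, κ) with κ a predicate on contexts, and the restricted TBox T_v collects the classical GCIs whose context formula holds in v.

```python
from itertools import product

# L-GCIs of Example 4 as (C, D, kappa) triples; "E" abbreviates the
# existential quantifier and "~" negation in the concept strings.
TBOX = [
    ("Esf.bac", "Esuc.ab",  lambda v: not v["res"] and not v["h"]),
    ("Esf.vir", "~Esuc.ab", lambda v: True),
    ("strep",   "bac",      lambda v: True),
    ("Esf.bac", "~Esuc.ab", lambda v: v["res"]),
    ("bac",     "inf",      lambda v: True),
    ("vir",     "inf",      lambda v: True),
]

def restricted_tbox(tbox, v):
    """T_v: the classical GCIs whose context formula is satisfied by v."""
    return [(c, d) for c, d, kappa in tbox if kappa(v)]

CONTEXTS = [dict(zip(["res", "h"], bits)) for bits in product([True, False], repeat=2)]
```

Only the context satisfying ¬res ∧ ¬h activates the first axiom, which is why a successful antibiotic treatment for a streptococcal infection is only entailed there.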
Theorem 16 (Intermediate Value Theorem).
Let p₁ < p₂ and let P₁ and P₂ be two ME-ALCP-models of the KB K = (R, T) such that Pr_{P₁}(C ⊑ D | κ) = p₁ and Pr_{P₂}(C ⊑ D | κ) = p₂. Then for each p between p₁ and p₂ there exists an ME-ALCP-model P of K such that Pr_P(C ⊑ D | κ) = p.

As we will show in Corollary 20, both the sceptical degree B^s_K(C ⊑ D | κ) and the credulous degree B^c_K(C ⊑ D | κ) are in fact witnessed by some ME-models. Therefore it is meaningful to consider the whole interval of beliefs between B^s_K(C ⊑ D | κ) and B^c_K(C ⊑ D | κ).

Definition 17 (belief interval).
Let
C, D be ALCP concepts, κ ∈ L a context, and K = (R, T) an ALCP KB. The belief interval for C ⊑ D w.r.t. K given κ is

B_K(C ⊑ D | κ) := [ B^s_K(C ⊑ D | κ), B^c_K(C ⊑ D | κ) ].

Computing Beliefs
In this section we show how to compute the belief interval. The first theorem states that the sceptical degree of belief for a subsumption relation can be computed by adding the probabilities of those L-interpretations w that entail this subsumption in the corresponding restricted TBox T_w.

Theorem 18.
Let K = (R, T) be a KB, C, D two concepts, and κ a context such that P^ME_R(κ) > 0. Then

B^s_K(C ⊑ D | κ) = ( Σ_{w ∈ Int(L), T_w |= C ⊑ D, w |= κ} P^ME_R(w) ) / P^ME_R(κ).

Dually, the credulous degree of belief for a subsumption relation can be computed by removing all the situations in which this relation cannot possibly hold.
Theorem 19.
Let K = (R, T) be a KB, C, D two concepts, and κ a context with P^ME_R(κ) > 0. Then

B^c_K(C ⊑ D | κ) = 1 − ( Σ_{w ∈ Int(L), T_w |= C ̸⊑ D, w |= κ} P^ME_R(w) ) / P^ME_R(κ).

To prove these theorems, one can build two models P and Q of the KB K such that Pr_P(C ⊑ D | κ) and Pr_Q(C ⊑ D | κ) are exactly the degrees expressed by Theorems 18 and 19, respectively. As a byproduct of these proofs, we obtain that the infimum and supremum that define the sceptical and the credulous degrees of belief actually correspond to a minimum and maximum taken by some ME-models, yielding the following corollary.

Corollary 20.
Let K be an ALCP
KB,
C, D be two concepts, and κ be a context. There exist two ME-models P, Q of K with B^s_K(C ⊑ D | κ) = Pr_P(C ⊑ D | κ) and B^c_K(C ⊑ D | κ) = Pr_Q(C ⊑ D | κ).

The direct consequence of Theorems 18 and 19 is that if we want to compute the belief interval for C ⊑ D given some context, it suffices to identify all L-interpretations whose induced (classical) TBoxes entail the subsumption relation C ⊑ D (for the sceptical belief) or the strong non-subsumption C ̸⊑ D (for the credulous belief). Recall that every set of propositional interpretations can be represented by a propositional formula. This motivates the following definition.

Definition 21 (consequence formula). An L-formula φ is a consequence formula for C ⊑ D (respectively C ̸⊑ D) w.r.t. the L-TBox T if for every w ∈ Int(L) it holds that w |= φ iff T_w |= C ⊑ D (respectively T_w |= C ̸⊑ D).

If we are able to compute these consequence formulas, then the computation of the belief interval can be reduced to the evaluation of the probability of these formulas w.r.t. the ME-distribution satisfying R.

Theorem 22.
Let K = ( R , T ) be an ALCP
KB, φ and ψ be consequence formulas for C ⊑ D and C ̸⊑ D w.r.t. T, respectively, and κ a context. Then B^s_K(C ⊑ D | κ) = P^ME_R(φ | κ) and B^c_K(C ⊑ D | κ) = 1 − P^ME_R(ψ | κ).

Algorithm 1 Computing degrees of belief
Input: KB K = (R, T), concepts C, D, context κ
Output: Belief degrees (B^s_K(C ⊑ D | κ), B^c_K(C ⊑ D | κ))

  ℓ_s ← ℓ_c ← 0
  for all v ∈ Int(L) do
    if v |= κ then
      if T_v |= C ⊑ D then ℓ_s ← ℓ_s + P^ME_R(v)
      else if T_v |= C ̸⊑ D then ℓ_c ← ℓ_c + P^ME_R(v)
  return (ℓ_s / P^ME_R(κ), 1 − ℓ_c / P^ME_R(κ))

Example 23.
In our running example, one can see that a consequence formula for ∃sf.strep ⊑ ∃suc.ab is ¬res ∧ ¬h. Indeed, in order to deduce this consequence it is necessary to satisfy the first axiom of T_exa, which is only guaranteed in the context ¬res ∧ ¬h. Similarly, res is a consequence formula for ∃sf.strep ̸⊑ ∃suc.ab. Knowing both the consequence formulas and the ME-model, we can deduce

B^s_{K_exa}(∃sf.strep ⊑ ∃suc.ab | ⊤) = P^ME_R(¬res ∧ ¬h) = 0.94,
B^c_{K_exa}(∃sf.strep ⊑ ∃suc.ab | h) = 1 − P^ME_R(res | h) = 0.2.

In particular, Theorem 22 implies that the belief interval can be computed in two phases. The first phase uses purely logical reasoning to compute the consequence formulas, while the second phase applies probabilistic inferences to compute the degrees of belief from these formulas. We now briefly explain how the consequence formulas can be computed.

Notice first that subsumption and non-subsumption are monotonic consequences in the sense of [2]; that is, if an
ALC
TBox T entails the subsumption C ⊑ D, then every superset of T also entails this consequence. Similarly, adding more axioms to a TBox entailing C ̸⊑ D does not remove this entailment. Moreover, the set of all L-formulas (modulo logical equivalence) forms a distributive lattice ordered by generality, in which the L-interpretations are all the join-prime elements. Thus, the consequence formulas from Definition 21 are in fact the so-called boundaries from [2]. Hence, they can be computed using any of the known boundary computation approaches.

Assuming that the number of contexts is small in comparison to the size of the TBox, it is better to compute the degrees of belief through a more direct approach following Theorems 18 and 19. In order to compute B^s_K(C ⊑ D | κ) and B^c_K(C ⊑ D | κ), it suffices to enumerate all interpretations v ∈ Int(L) and check whether v |= κ, and whether T_v |= C ⊑ D or T_v |= C ̸⊑ D (see Algorithm 1). This approach requires 2^|sig(L)| calls to a standard ALC reasoner, and each of these calls runs in exponential time on |T| [9]. Notice that this algorithm has an anytime behaviour: it is possible to stop its execution at any moment and obtain an approximation of the belief interval. Moreover, the longer the algorithm runs, the better this approximation becomes. Thus, this method is adequate for a system where finding good approximations efficiently may be more important than computing the precise answers.
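Putting the pieces together, Algorithm 1 can be sketched end-to-end for the running example (illustrative Python, our own encoding, not the authors' implementation). The ME-model is computed by dual gradient descent on the two constraints of Example 4, and the calls to an ALC reasoner are stubbed with the consequence formulas of Example 23 instead of real subsumption tests:

```python
import math
from itertools import product

SIG = ["res", "h"]
WORLDS = [dict(zip(SIG, bits)) for bits in product([True, False], repeat=2)]

def me_distribution(features, steps=30000, lr=0.5):
    """ME-model of equality constraints E_P[f] = 0 via dual gradient descent."""
    lam = [0.0] * len(features)
    for _ in range(steps):
        w = [math.exp(sum(l * f(v) for l, f in zip(lam, features))) for v in WORLDS]
        z = sum(w)
        p = [wi / z for wi in w]
        for j, f in enumerate(features):
            lam[j] -= lr * sum(pi * f(v) for pi, v in zip(p, WORLDS))
    w = [math.exp(sum(l * f(v) for l, f in zip(lam, features))) for v in WORLDS]
    z = sum(w)
    return {i: wi / z for i, wi in enumerate(w)}

# Constraints of Example 4: p(res) = 0.05 and p(res & h) = 0.8 * p(h).
P_ME = me_distribution([
    lambda v: v["res"] - 0.05,
    lambda v: (v["res"] and v["h"]) - 0.8 * v["h"],
])

# Stubbed ALC reasoner, using the consequence formulas of Example 23:
# T_v entails the subsumption iff v |= ~res & ~h, and the strong
# non-subsumption iff v |= res.
def entails_subsumption(v):     return not v["res"] and not v["h"]
def entails_non_subsumption(v): return v["res"]

def beliefs(kappa):
    """Algorithm 1: (B^s, B^c) for Esf.strep <= Esuc.ab given context kappa."""
    ls = lc = pk = 0.0
    for i, v in enumerate(WORLDS):
        if kappa(v):
            pk += P_ME[i]
            if entails_subsumption(v):
                ls += P_ME[i]
            elif entails_non_subsumption(v):
                lc += P_ME[i]
    return ls / pk, 1 - lc / pk
```

Under these assumptions the sketch reproduces the belief intervals of Example 15: `beliefs(lambda v: True)` is approximately (0.94, 0.95), and `beliefs(lambda v: v["h"])` is approximately (0, 0.2).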
Properties

We now investigate some properties of probabilistic logics [22]. First we show that
ALCP is language and representation invariant. Invariance is meant with respect to logical objects. Language invariance means that just extending the language without changing the knowledge base should not affect reasoning results. Representation invariance means that equivalent knowledge bases should yield equal inference results. Notice that different notions of representation dependence exist in the literature. For instance, in [11] a very different notion is considered, where the language and the knowledge base are changed simultaneously. This case is not covered by our notion of representation invariance. ALCP also satisfies an independence property; i.e., reasoning results about a part of the language are not changed when we add knowledge about an independent part of the language. Finally,
ALCP is continuous in the sense that minor changes in the probabilistic knowledge expressed by a knowledge base cannot induce major changes in the reasoning results.

Theorem 24 (Representation invariance).
Let K_i = (R_i, T_i), i ∈ {1, 2}, be two KBs such that Mod(R₁) = Mod(R₂) and Mod(T₁) = Mod(T₂). Then for all concepts C, D and contexts κ ∈ L, B_{K₁}(C ⊑ D | κ) = B_{K₂}(C ⊑ D | κ).

ALCP is not only representation invariant, but also language invariant. This property is of computational interest, in particular in combination with independence, which we investigate subsequently. To illustrate this, suppose that we added knowledge about bone fractures in our medical example, which is independent of the knowledge about infections. Independence guarantees that we can ignore the knowledge about infections when answering queries about bone fractures. In this way, we can decrease the size of the knowledge base. Language invariance guarantees that we can also ignore the concepts, roles and propositional variables related to the infection domain. Thus, we can decrease the size of the language. Exploiting both properties, the size of the computational problems can sometimes be decreased significantly.
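The independence behaviour of the ME distribution can be checked on a toy instance (illustrative Python, our own example, not from the paper). With the single constraint p(a) = 0.3 over sig(L) = {a, b}, a brute-force search over the satisfying distributions shows that the entropy maximizer treats the unconstrained variable b as an independent fair coin, so queries mentioning only a are unaffected by b's presence in the language:

```python
import math

def entropy(ps):
    return -sum(p * math.log(p) for p in ps if p > 0)

# Distributions over the worlds (a&b, a&~b, ~a&b, ~a&~b) satisfying
# p(a) = 0.3 are parametrized by t = P(a&b) and s = P(~a&b); maximize
# the entropy by exhaustive grid search.
best, best_h = None, -1.0
N = 300
for i in range(N + 1):
    for j in range(N + 1):
        t = 0.3 * i / N
        s = 0.7 * j / N
        h = entropy([t, 0.3 - t, s, 0.7 - s])
        if h > best_h:
            best, best_h = (t, 0.3 - t, s, 0.7 - s), h
```

The maximizer is (0.15, 0.15, 0.35, 0.35): P(a ∧ b) = P(a) · P(b) with P(b) = 0.5, illustrating why an independent part of the language can be ignored when reasoning about a.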
Theorem 25 (Language Invariance).
Let K₁, K₂ be KBs over the signatures (L₁, N_C¹, N_R¹) and (L₂, N_C², N_R²), respectively. If K₁ = K₂, L₁ ⊆ L₂, N_C¹ ⊆ N_C² and N_R¹ ⊆ N_R², then for all concepts C, D over N_C¹ and N_R¹ and contexts κ ∈ L₁, it holds that B_{K₁}(C ⊑ D | κ) = B_{K₂}(C ⊑ D | κ).

For an L-TBox T, we define the signature of T to be the set sig(T) of all concept names and role names appearing in T. Likewise, sig(R) is the set of all propositional variables appearing in R. The signature of a KB K = (R, T) is sig(K) := sig(R) ∪ sig(T).

Theorem 26 (Independence). Let K₁, K₂ be KBs such that sig(K₁) ∩ sig(K₂) = ∅, C, D be two concepts, and κ a context where (sig(C) ∪ sig(D) ∪ sig(κ)) ∩ sig(K₂) = ∅. Then B_{K₁}(C ⊑ D | κ) = B_{K₁ ∪ K₂}(C ⊑ D | κ).

The last property we consider is continuity. One important practical feature of continuous probabilistic logics is that they guarantee a numerically stable behaviour. That is, minor rounding errors due to floating-point arithmetic will not result in major errors in the computed probabilities. As demonstrated by Paris in [22], measuring the difference between probabilistic knowledge bases is subtle and is best addressed by comparing knowledge bases extensionally, i.e., with respect to their model sets. To this end, Paris considered the Blaschke metric. Formally, the Blaschke distance ‖S₁, S₂‖_B between two convex sets S₁, S₂ is defined by

inf { δ ∈ ℝ | ∀P₁ ∈ S₁ ∃P₂ ∈ S₂ : ‖P₁ − P₂‖ ≤ δ and ∀P₂ ∈ S₂ ∃P₁ ∈ S₁ : ‖P₁ − P₂‖ ≤ δ }.

Intuitively, ‖S₁, S₂‖_B is the smallest real number δ such that for each distribution in one of the sets, there is a probability distribution in the other that has distance at most δ to the former. We say that a sequence of knowledge bases (K_i) converges to a knowledge base K iff the classical part of each K_i is equivalent to the classical part of K and the probabilistic part converges to the probabilistic part of K. Our reasoning approach behaves indeed continuously with respect to this metric.

Theorem 27 (Continuity).
Let (K_i) be a convergent sequence of KBs with limit K and B_{K_i}(C ⊑ D | κ) = [ℓ_i, u_i]. If B_K(C ⊑ D | κ) = [ℓ, u], then (ℓ_i) converges to ℓ and (u_i) converges to u (with respect to the usual topology on ℝ).

Related Work

Relational probabilistic logical approaches can be roughly divided into those that consider probability distributions over the domain, those that consider probability distributions over possible worlds, and those that combine both ideas [10]. Our framework belongs to the second group. Maximum entropy reasoning in propositional probabilistic logics has been discussed extensively, e.g., in [13, 22], and various extensions to first-order languages have been considered in recent years [3, 4, 14, 15]. In these works, the domain is restricted to a finite number of constants or bounded in the limit. We circumvent the need to do so by combining a classical first-order logic with unbounded domain with a probabilistic logic with fixed domain.

Many probabilistic DLs have also been considered in the last decades [16, 18, 19]. Our approach is closest to Bayesian DLs [5, 6] and disponte [25]. The greatest difference with the former lies in the fact that
ALCP
KBs do not require a complete specification of the probability distribution, but only a set of probabilistic constraints. Moreover, the previous formalisms consider only the sceptical degree of belief, while we are interested in the full belief interval. In contrast to disponte, ALCP is capable of expressing both logical and probabilistic dependencies between the axioms in a KB; in addition, disponte requires all uncertainty degrees to be assigned as mutually independent point probabilities, while
ALCP allows for a more flexible specification.
We have introduced the probabilistic DL
ALCP , which extends the classical DL
ALC with the capability of expressing and reasoning about uncertain contextual knowledge defined through the principle of maximum entropy. Effective reasoning methods were developed using the decoupling between the logical and the probabilistic components of
ALCP
KBs. We also studied the properties of this logic in relation to other probabilistic logics.

We plan to extend this work in several directions. First, instead of considering the ME-model, we could reason over all probability distributions that satisfy our probabilistic constraints, similarly to [12, 17, 20]. This will in general result in larger belief intervals. A smaller interval is preferable, since it corresponds to a more precise degree of belief. However, when using all probability distributions, the size of the interval can be a good indicator for the variation of the possible beliefs in our query with respect to the knowledge base.

In some applications it is also useful to allow more expressive propositional or relational context languages like those proposed in [4, 7, 15, 23]. Similarly, we can consider other DLs for our concept language. Indeed,
ALC was chosen as a prototypical DL for studying the basic properties of our framework. Including additional constructors into the formalism should be relatively simple. In contrast, considering other reasoning problems beyond subsumption is less straightforward. Recall, for instance, that if an
ALCP KB K contains an inconsistent context with positive probability, then K has no models. It is thus unclear how to handle the probability of consistency of a KB.

Practical reasoning with ALCP can currently be performed by combining existing ME-reasoners with any ALC-reasoner according to Algorithm 1. Clearly, such an approach can still be further optimized. We are working on combining the classical and probabilistic reasoning parts in more sophisticated ways.

References
1. Baader, F., Calvanese, D., McGuinness, D.L., Nardi, D., Patel-Schneider, P.F. (eds.): The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press, 2nd edn. (2007)
2. Baader, F., Knechtel, M., Peñaloza, R.: Context-dependent views to axioms and consequences of semantic web ontologies. J. of Web Semantics 12–13, 22–40 (2012)
http://owl.cs.manchester.ac.uk/tools/list-of-reasoners/
3. Barnett, O., Paris, J.B.: Maximum entropy inference with quantified knowledge. Logic Journal of the IGPL 16(1), 85–98 (2008)
4. Beierle, C., Kern-Isberner, G., Finthammer, M., Potyka, N.: Extending and completing probabilistic knowledge and beliefs without bias. KI 29(3), 255–262 (2015)
5. Ceylan, I.I., Peñaloza, R.: The Bayesian description logic BEL. In: Proc. of IJCAR 2014. LNCS, vol. 8562, pp. 480–494. Springer (2014)
6. d'Amato, C., Fanizzi, N., Lukasiewicz, T.: Tractable reasoning with Bayesian description logics. In: Proc. SUM 2008. LNCS, vol. 5291, pp. 146–159. Springer (2008)
7. De Bona, G., Cozman, F.G., Finger, M.: Towards classifying propositional probabilistic logics. J. of Applied Logic 12(3), 349–368 (2014)
8. Domingos, P.M., Lowd, D.: Markov Logic: An Interface Layer for Artificial Intelligence. Synthesis Lectures on Artificial Intelligence and Machine Learning, Morgan & Claypool Publishers (2009)
9. Donini, F.M., Massacci, F.: ExpTime tableaux for ALC. Artificial Intelligence 124(1), 87–138 (2000)
10. Halpern, J.Y.: An analysis of first-order logics of probability. Artificial Intelligence 46, 311–350 (1990)
11. Halpern, J.Y., Koller, D.: Representation dependence in probabilistic inference. JAIR 21, 319–356 (2004)
12. Hansen, P., Perron, S.: Merging the local and global approaches to probabilistic satisfiability. Intern. J. of Approx. Reasoning 47(2), 125–140 (2008)
13. Kern-Isberner, G.: Conditionals in Nonmonotonic Reasoning and Belief Revision. Springer, LNAI 2087 (2001)
14. Kern-Isberner, G., Thimm, M.: Novel semantical approaches to relational probabilistic conditionals. In: Proc. KR 2010. pp. 382–391. AAAI Press (2010)
15. Kern-Isberner, G., Lukasiewicz, T.: Combining probabilistic logic programming with the power of maximum entropy. Artif. Intell. 157(1–2), 139–202 (2004)
16. Klinov, P., Parsia, B.: A hybrid method for probabilistic satisfiability. In: Proc. CADE 2011. LNCS, vol. 6803, pp. 354–368. Springer (2011)
17. Lukasiewicz, T.: Probabilistic deduction with conditional constraints over basic events. JAIR 10, 380–391 (1999)
18. Lukasiewicz, T., Straccia, U.: Managing uncertainty and vagueness in description logics for the semantic web. J. of Web Semantics 6(4), 291–308 (2008)
19. Lutz, C., Schröder, L.: Probabilistic description logics for subjective uncertainty. In: Proc. KR 2010. AAAI Press (2010)
20. Nilsson, N.J.: Probabilistic logic. Artificial Intelligence 28, 71–88 (1986)
21. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer, 2nd edn. (2006)
22. Paris, J.: The Uncertain Reasoner's Companion – A Mathematical Perspective. Cambridge University Press (1994)
23. Potyka, N.: Reasoning over linear probabilistic knowledge bases with priorities. In: Proc. SUM 2015. LNCS, vol. 9310, pp. 121–136. Springer (2015)
24. Potyka, N.: Relationships between semantics for relational probabilistic conditional logics. In: Computational Models of Rationality, Essays Dedicated to Gabriele Kern-Isberner. pp. 332–347. College Publications (2016)
25. Riguzzi, F., Bellodi, E., Lamma, E., Zese, R.: Epistemic and statistical probabilistic ontologies. In: Proc. URSW 2012. vol. 900, pp. 3–14. CEUR-WS (2012)
26. Schmidt-Schauß, M., Smolka, G.: Attributive concept descriptions with complements. Artif. Intell. 48(1), 1–26 (1991)
27. Yeung, R.W.: Information Theory and Network Coding. Springer Science & Business Media (2008)

Appendix: Proofs
Theorem 13.
The KB K = (R, T) is ME-consistent iff for every v ∈ Int(L) such that P^ME_R(v) > 0, T_v is consistent.

Proof. For the "if" direction, let v_1, ..., v_n ∈ Int(L) be all the L-interpretations such that P^ME_R(v_i) > 0, 1 ≤ i ≤ n. Then, for every i, 1 ≤ i ≤ n, the induced TBox T_{v_i} has a classical model I_i = (∆^{I_i}, ·^{I_i}), by assumption. It is easy to verify that the ALCP-interpretation P = (I, P_I) defined by I = { J_i = (∆^{I_i}, ·^{I_i}, v_i) | 1 ≤ i ≤ n } and P_I(J_i) = P^ME_R(v_i) for all i, 1 ≤ i ≤ n, is an ME-model of K. Thus, K is consistent.

Conversely, let P = (I, P_I) be an ME-model of K. Then, for every v ∈ Int(L) with P^ME_R(v) > 0 there is a possible world I = (∆^I, ·^I, w^I) ∈ I with w^I = v. Since I is a model of T and satisfies all contexts corresponding to the GCIs in T_v, it follows that (∆^I, ·^I) |= T_v; thus, T_v must be consistent. ⊓⊔

Theorem 16 (Intermediate Value Theorem).
Let p_1 < p_2 and let P_1 and P_2 be two ME-ALCP-models of the KB K = (R, T) such that Pr_{P_1}(C ⊑ D | κ) = p_1 and Pr_{P_2}(C ⊑ D | κ) = p_2. Then for each p between p_1 and p_2 there exists an ME-ALCP-model P of K such that Pr_P(C ⊑ D | κ) = p.

Proof.
Assume w.l.o.g. that P_i = (I_i, P_i), i = 1, 2, are such that I_1 ∩ I_2 = ∅: if there exists some I ∈ I_1 ∩ I_2, it suffices to rename the elements of ∆^I in one of the probabilistic interpretations. Given λ ∈ [0, 1], define P_λ = (I_1 ∪ I_2, P_λ), where for every I ∈ I_1 ∪ I_2,

P_λ(I) = (1 − λ) P_1(I) if I ∈ I_1, and P_λ(I) = λ P_2(I) otherwise.

P_λ is consistent with T since P_1 and P_2 are. We now show that P_λ induces the ME-model of R, which implies that P_λ is an ME-ALCP-model of K. For all v ∈ Int(L), we have

P_{P_λ}(v) = Σ_{I ∈ (I_1 ∪ I_2), I |= v} P_λ(I)
= Σ_{I ∈ I_1, I |= v} P_λ(I) + Σ_{I ∈ I_2, I |= v} P_λ(I)
= Σ_{I ∈ I_1, I |= v} (1 − λ) · P_1(I) + Σ_{I ∈ I_2, I |= v} λ · P_2(I)
= (1 − λ) · P_{P_1}(v) + λ · P_{P_2}(v) = P^ME_R(v),

where the last equation follows from P_{P_1} = P_{P_2} = P^ME_R. Hence, each P_λ is indeed an ME-ALCP-model of K. For the probability of our subsumption relation, we can derive similarly that P_{P_λ}(κ) · Pr_{P_λ}(C ⊑ D | κ) is equal to

Σ_{I ∈ I_1 ∪ I_2, I |= κ, I |= C ⊑ D} P_λ(I)
= Σ_{I ∈ I_1, I |= κ, I |= C ⊑ D} P_λ(I) + Σ_{I ∈ I_2, I |= κ, I |= C ⊑ D} P_λ(I)
= P_{P_λ}(κ) · ((1 − λ) p_1 + λ p_2).

For every p ∈ [p_1, p_2] there exists a λ_p ∈ [0, 1] such that p = (1 − λ_p) p_1 + λ_p p_2. Using this value λ_p we obtain that Pr_{P_{λ_p}}(C ⊑ D | κ) = p. ⊓⊔

In order to prove Theorems 18 and 19 it is useful to consider a restricted class of
ALCP-interpretations in which each context is represented by at most one possible world. We call these interpretations pithy.

Definition 28 (pithy).
The
ALCP-interpretation P = (I, P_I) is pithy if for every w ∈ Int(L) there is at most one possible world (∆^I, ·^I, v^I) ∈ I with v^I = w.

As the following lemma shows, pithy models are sufficient for computing the sceptical and credulous degrees of belief of a conditional subsumption relation and, by extension, the belief interval.
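The reduction to pithy models works by merging possible worlds that share a context valuation. A minimal Python sketch of this merging (the world representation, function names, and example data are ours, not from the paper): whenever one of the merged worlds violates the subsumption, the merged world does too, so the conditional probability can only decrease.

```python
def pithify_lower(worlds):
    """Merge possible worlds that share a context valuation into one world,
    keeping a violating world whenever one exists, so the conditional
    probability of C ⊑ D can only decrease.
    Each world is a tuple (valuation, satisfies_subsumption, probability)."""
    merged = {}  # valuation -> [satisfies_subsumption, probability]
    for val, sat, p in worlds:
        if val not in merged:
            merged[val] = [sat, p]
        else:
            merged[val][0] = merged[val][0] and sat  # prefer a violating world
            merged[val][1] += p  # mass is redistributed, so the marginal on
                                 # context valuations is unchanged
    return [(v, s, p) for v, (s, p) in merged.items()]

def cond_prob(worlds, kappa):
    """Pr(C ⊑ D | κ) in a probabilistic interpretation given as a world list."""
    den = sum(p for val, _, p in worlds if kappa(val))
    num = sum(p for val, sat, p in worlds if kappa(val) and sat)
    return num / den

# Example: two worlds share valuation "v1"; merging lowers the conditional probability.
worlds = [("v1", True, 0.3), ("v1", False, 0.2), ("v2", True, 0.5)]
kappa = lambda v: True  # a context satisfied by every valuation
assert cond_prob(pithify_lower(worlds), kappa) <= cond_prob(worlds, kappa)
```

The dual merge, which keeps a satisfying world whenever one exists, yields the upper-bounding model in the same way.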
Lemma 29.
Let K be an ALCP
KB,
C, D two concepts, and κ ∈ L such that P^ME_R(κ) > 0. For every ALCP-model P of K there exist pithy ALCP-models Q_1, Q_2 of K such that

Pr_{Q_1}(C ⊑ D | κ) ≤ Pr_P(C ⊑ D | κ) ≤ Pr_{Q_2}(C ⊑ D | κ)

and P_P = P_{Q_1} = P_{Q_2}.

Proof. Let P = (I, P_I). If P is already pithy, then the result holds trivially. Otherwise, there must exist two possible worlds I, J ∈ I such that v^I = v^J. There are four possible cases: (i) I |= C ⊑ D and J |= C ⊑ D; (ii) I ⊭ C ⊑ D and J ⊭ C ⊑ D; (iii) I |= C ⊑ D and J ⊭ C ⊑ D; and (iv) I ⊭ C ⊑ D and J |= C ⊑ D. We construct a new model P′ by removing one of the possible worlds I, J and redistributing the probability according to these cases, as described next. For the first three cases, define H := I \ {I} and the probability distribution

P_H(H) := P_I(H) if H ≠ J, and P_H(J) := P_I(I) + P_I(J),

for all H ∈ H. Then P_P = P_{P′} and P′ = (H, P_H) is still an ME-model of K. Since the denominator in (3) is P^ME_R(κ) independently of C ⊑ D, we have by construction that

Pr_{P′}(C ⊑ D | κ) ≤ Pr_P(C ⊑ D | κ).

Case (iv) is symmetric to (iii), where the possible world J is removed instead of I. Since H ⊂ I, we can iteratively repeat this process until a pithy model Q_1 is obtained. Q_2 can be constructed symmetrically. ⊓⊔

Notice that since P_P = P_{Q_1} = P_{Q_2}, this lemma in particular implies that for each ME-model there exists a pithy ME-model that yields a smaller or equal probability for the subsumption relation, and dually one that yields a larger or equal probability. Note also that in each pithy ME-model, for all w ∈ Int(L) with P^ME_R(w) >
0, there must be exactly one possible world I with v^I = w, because otherwise P_I could not be a probability distribution (the elementary events could not sum to 1). Moreover, since a pithy interpretation can contain at most |Int(L)| possible worlds and the world corresponding to some w ∈ Int(L) must have probability P^ME_R(w), there is only a finite number of probabilities that pithy models can assign to a given subsumption relation. Hence, the infimum and supremum that define the sceptical and the credulous degrees of belief are actually a minimum and a maximum attained by some pithy ME-models.

Corollary 30.
Given an
ALCP KB K, two concepts C, D, and a context κ, there exist two pithy ME-models P, Q of K such that B^s_K(C ⊑ D | κ) = Pr_P(C ⊑ D | κ) and B^c_K(C ⊑ D | κ) = Pr_Q(C ⊑ D | κ).

Theorem 18.
Let K = (R, T) be a KB, C, D two concepts, and κ a context such that P^ME_R(κ) > 0. Then

B^s_K(C ⊑ D | κ) = ( Σ_{w ∈ Int(L), T_w |= C ⊑ D, w |= κ} P^ME_R(w) ) / P^ME_R(κ).

Proof.
For every w ∈ Int(L), we construct an ALCP-interpretation I_w as follows. If T_w |= C ⊑ D, then I_w is any model (∆^{I_w}, ·^{I_w}, w) of T_w; otherwise, I_w is any model (∆^{I_w}, ·^{I_w}, w) of T_w that does not satisfy C ⊑ D, which must exist by definition. Let now P_K = (I, P_I) be the ALCP-interpretation such that I = { I_w | w ∈ Int(L) } and P_I(I_w) = P^ME_R(w) for all w. Then P_K is a model of K. Moreover, it holds that

Pr_{P_K}(C ⊑ D | κ) = Σ_{I_w |= C ⊑ D, w |= κ} P_I(I_w) / P^ME_R(κ) = Σ_{T_w |= C ⊑ D, w |= κ} P^ME_R(w) / P^ME_R(κ).

Thus, P^ME_R(κ) · B^s_K(C ⊑ D | κ) ≤ Σ_{T_w |= C ⊑ D, w |= κ} P^ME_R(w). If this inequality is strict, then w.l.o.g. there must exist a pithy probabilistic model P = (J, P_J) of K such that Pr_P(C ⊑ D | κ) < Pr_{P_K}(C ⊑ D | κ) (see Lemma 29). Hence, for every w ∈ Int(L) with P^ME_R(w) > 0 there is exactly one J_w ∈ J with v^{J_w} = w. We thus have

Σ_{J_w |= C ⊑ D, w |= κ} P_J(J_w) < Σ_{I_w |= C ⊑ D, w |= κ} P_I(I_w).

Since P_I(I_w) = P_J(J_w) for all w, there must exist a valuation v such that I_v |= C ⊑ D but J_v ⊭ C ⊑ D. Since J_v is a model of T_v, it follows that T_v ⊭ C ⊑ D. By construction, we then have that I_v ⊭ C ⊑ D, which is a contradiction. ⊓⊔

Theorem 19. Let K = (R, T) be a KB, C, D two concepts, and κ a context with P^ME_R(κ) > 0. Then

B^c_K(C ⊑ D | κ) = 1 − ( Σ_{w ∈ Int(L), T_w |= C ⋢ D, w |= κ} P^ME_R(w) ) / P^ME_R(κ).

Proof.
For every w ∈ Int(L), construct an ALCP-interpretation I_w as follows. If T_w |= C ⋢ D, then I_w is any model (∆^{I_w}, ·^{I_w}, w) of T_w; otherwise, I_w is any model (∆^{I_w}, ·^{I_w}, w) of T_w that satisfies C ⊑ D. Let P_K = (I, P_I) be the ALCP-interpretation with I = { I_w | w ∈ Int(L) } and P_I(I_w) = P^ME_R(w) for all w. Then P_K is a model of K. Moreover, it holds that

Pr_{P_K}(C ⊑ D | κ) = Σ_{I_w |= C ⊑ D, w |= κ} P_I(I_w) / P^ME_R(κ) = 1 − Σ_{T_w |= C ⋢ D, w |= κ} P^ME_R(w) / P^ME_R(κ).

That is, B^c_K(C ⊑ D | κ) ≥ 1 − Σ_{T_w |= C ⋢ D, w |= κ} P^ME_R(w) / P^ME_R(κ). If this inequality is strict, then there exists a probabilistic model P = (J, P_J) of K such that Pr_P(C ⊑ D | κ) > Pr_{P_K}(C ⊑ D | κ). By Lemma 29, we can assume w.l.o.g. that P is pithy. Hence, for every w ∈ Int(L) with P^ME_R(w) > 0 there is exactly one J_w ∈ J with v^{J_w} = w, and thus

Σ_{J_w |= C ⊑ D, w |= κ} P_J(J_w) > Σ_{I_w |= C ⊑ D, w |= κ} P_I(I_w).

Since P_I(I_w) = P_J(J_w) for all w, there must exist some v ∈ Int(L) such that I_v ⊭ C ⊑ D but J_v |= C ⊑ D. As J_v is a model of T_v, T_v ⊭ C ⋢ D. By construction, we then have that I_v |= C ⊑ D, which is a contradiction. ⊓⊔

Theorem 22.
Let K = (R, T) be an ALCP
KB, φ and ψ consequence formulas for C ⊑ D and C ⋢ D w.r.t. T, respectively, and κ a context. Then B^s_K(C ⊑ D | κ) = P^ME_R(φ | κ) and B^c_K(C ⊑ D | κ) = 1 − P^ME_R(ψ | κ).

Proof. The result is a direct consequence of Definition 21 and Theorems 18 and 19. Indeed,

B^s_K(C ⊑ D | κ) = Σ_{T_w |= C ⊑ D, w |= κ} P^ME_R(w) / P^ME_R(κ) = Σ_{w |= φ ∧ κ} P^ME_R(w) / P^ME_R(κ) = P^ME_R(φ | κ).

The case of the credulous degree of belief is analogous. ⊓⊔

Theorem 24 (Representation invariance).
Let K_i = (R_i, T_i), i ∈ {1, 2}, be two KBs such that Mod(R_1) = Mod(R_2) and Mod(T_1) = Mod(T_2). Then for all concepts C, D and contexts κ ∈ L, B_{K_1}(C ⊑ D | κ) = B_{K_2}(C ⊑ D | κ).

Proof. Let P = (I, P_I) be an ALCP-interpretation. Since
Mod(T_1) = Mod(T_2), P is consistent with T_1 iff P is consistent with T_2. Since Mod(R_1) = Mod(R_2), R_1 and R_2 induce the same ME-model, and P is (ME-)consistent with R_1 iff P is (ME-)consistent with R_2. Hence,

B^s_{K_1}(C ⊑ D | κ) = inf_{P ∈ Mod^ME(K_1)} Pr_P(C ⊑ D | κ) = inf_{P ∈ Mod^ME(K_2)} Pr_P(C ⊑ D | κ) = B^s_{K_2}(C ⊑ D | κ).

Analogously, we get that B^c_{K_1}(C ⊑ D | κ) = B^c_{K_2}(C ⊑ D | κ), and therefore B_{K_1}(C ⊑ D | κ) = B_{K_2}(C ⊑ D | κ). ⊓⊔

Theorem 25 (Language Invariance).
Let K_1, K_2 be KBs over L_1, N_C^1, N_R^1 and L_2, N_C^2, N_R^2, respectively. If K_1 = K_2, L_1 ⊆ L_2, N_C^1 ⊆ N_C^2 and N_R^1 ⊆ N_R^2, then for all concepts C, D over N_C^1 and N_R^1 and contexts κ ∈ L_1, it holds that B_{K_1}(C ⊑ D | κ) = B_{K_2}(C ⊑ D | κ).

Proof.
It suffices to show that for every
ALCP -model P of K there exists a ALCP -model P of K such that Pr P ( C ⊑ D | κ ) = Pr P ( C ⊑ D | κ ) and vice versa . Given an ALCP -model P = ( I , P I ) of K , we build P = ( I , P I )as follows. For each possible world I ∈ I with probability p , I contains apossible world I ′ with probability p that extends I assigning false to all newpropositional variables and the empty set to all new role names and conceptnames. Since C, D, κ and K depend only on sig ( L ) , N , N , P satisfies K and Pr P ( C ⊑ D | κ ) = Pr P ( C ⊑ D | κ ) holds. Conversely, consider an ALCP -model P = ( I , P I ) of K . We obtain P from P by restricting thepossible worlds in I to C, D, κ . As before, it follows that P satisfies K and Pr P ( C ⊑ D | κ ) = Pr P ( C ⊑ D | κ ). ⊓⊔ In order to prove Independence, we need the following lemma. It states an inde-pendence property of ME-distributions over our context language.
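The key facts behind this independence property — the product of two distributions over disjoint signatures has additive entropy and preserves its factors as marginals — can be checked numerically. A minimal Python sketch (the distributions and event names are illustrative assumptions, not taken from the paper):

```python
import math

def entropy(p):
    """Shannon entropy of a distribution given as {event: probability}."""
    return -sum(q * math.log(q) for q in p.values() if q > 0)

def product(p1, p2):
    """Joint distribution of two independent components with disjoint signatures."""
    return {(v1, v2): q1 * q2 for v1, q1 in p1.items() for v2, q2 in p2.items()}

# Two hypothetical ME-distributions over disjoint context languages {a} and {b}
p1 = {"a": 0.7, "~a": 0.3}
p2 = {"b": 0.4, "~b": 0.6}
joint = product(p1, p2)

# Additivity of entropy: H(P1 · P2) = H(P1) + H(P2)
assert abs(entropy(joint) - (entropy(p1) + entropy(p2))) < 1e-12
# The marginals of the product recover the original distributions
assert abs(sum(q for (v1, _), q in joint.items() if v1 == "a") - p1["a"]) < 1e-12
```

Combined with the independence bound for entropy ([27], Theorem 2.39), this is exactly why the product distribution maximizes entropy among the models of the union of the constraint sets.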
Lemma 31 (ME-independence).
Let R_1, R_2 be two finite sets of probability constraints such that sig(L_1) ∩ sig(L_2) = ∅, and let R := R_1 ∪ R_2. Then P^ME_R = P^ME_{R_1} · P^ME_{R_2}. In particular, for the marginal distributions of P^ME_R, we have P^ME_R(v_i) = P^ME_{R_i}(v_i) for all v_i ∈ Int(L_i), i ∈ {1, 2}.

Proof. Since the signatures of the L_i are disjoint, we can denote the valuations of the language over sig(L_1) ∪ sig(L_2) by (v_1, v_2), v_i ∈ Int(L_i). Let us first consider the marginals of P = P^ME_{R_1} · P^ME_{R_2}. For all v_1 ∈ Int(L_1), we have

P(v_1) = Σ_{v_2 ∈ Int(L_2)} P(v_1, v_2) = P^ME_{R_1}(v_1) Σ_{v_2 ∈ Int(L_2)} P^ME_{R_2}(v_2) = P^ME_{R_1}(v_1).

Symmetrically, we can show that P(v_2) = P^ME_{R_2}(v_2). This means, in particular, that the marginals of P coincide with the corresponding maximum entropy solutions. Therefore,

H(P) = − Σ_{v_1} Σ_{v_2} P^ME_{R_1}(v_1) P^ME_{R_2}(v_2) log( P^ME_{R_1}(v_1) · P^ME_{R_2}(v_2) )
= − Σ_{v_2} P^ME_{R_2}(v_2) Σ_{v_1} P^ME_{R_1}(v_1) log P^ME_{R_1}(v_1) − Σ_{v_1} P^ME_{R_1}(v_1) Σ_{v_2} P^ME_{R_2}(v_2) log P^ME_{R_2}(v_2)
= H(P^ME_{R_1}) + H(P^ME_{R_2}).

Using the independence bound for entropy (see, e.g., [27], Theorem 2.39), we have that

H(P^ME_R) ≤ H(P^ME_{R_1}) + H(P^ME_{R_2}) = H(P).

Hence, it suffices to show that P^ME_{R_1} · P^ME_{R_2} is indeed a model of R_1 ∪ R_2. But this follows immediately from the facts that each P^ME_{R_i} satisfies R_i and that the marginalization of P over one logic corresponds to the ME-distribution over the other. ⊓⊔

Theorem 26 (Independence).
Let K_1, K_2 be KBs s.t. sig(K_1) ∩ sig(K_2) = ∅, let C, D be two concepts, and κ a context such that (sig(C) ∪ sig(D) ∪ sig(κ)) ∩ sig(K_2) = ∅. Then B_{K_1}(C ⊑ D | κ) = B_{K_1 ∪ K_2}(C ⊑ D | κ).

Proof. Let K_i = (R_i, T_i) for i ∈ {1, 2}. Since the signatures of both KBs are disjoint, we will denote the valuations over the set of variables sig(R_1) ∪ sig(R_2) as pairs (w_1, w_2), where w_i is a valuation over sig(R_i). We know from Theorem 18 that

P^ME_R(κ) · B_{K_1 ∪ K_2}(C ⊑ D | κ) = Σ_{(w_1, w_2) |= κ, (T_1 ∪ T_2)_{(w_1, w_2)} |= C ⊑ D} P^ME_R((w_1, w_2))
= Σ_{(w_1, w_2) |= κ, (T_1)_{w_1} |= C ⊑ D} P^ME_R((w_1, w_2))    (4)
= Σ_{w_1 |= κ, (T_1)_{w_1} |= C ⊑ D} P^ME_{R_1}(w_1)    (5)
= P^ME_{R_1}(κ) · B_{K_1}(C ⊑ D | κ),

where (4) follows from the monotonicity of subsumption in ALC
TBoxes, and (5) is a consequence of Lemma 31. ⊓⊔

In order to prove Continuity, we start with a lemma that states continuity of ME-distributions over L. The proof is analogous to Paris' proof of continuity of maximum entropy reasoning in his probabilistic logic [22], which is a sub-logic of our probabilistic logic over L.

Lemma 32 (ME-continuity). Let R be a set of probabilistic constraints and let (R_i) be a sequence of sets of probabilistic constraints such that (Mod(R_i)) converges to Mod(R). Then the sequence (P^ME_i) of ME-models of the R_i converges to the ME-model P^ME_R of R.

Proof. For brevity, let M = Mod(R) and M_i = Mod(R_i). We show that for each ε >
0, there is a δ > 0 such that ‖M_i, M‖_B < δ implies ‖P^ME_i − P^ME_R‖ < ε. Consider the set S = { P ∈ M | ‖P − P^ME_R‖ ≥ ε } of models of R that have at least distance ε to P^ME_R. By continuity of the Euclidean distance and compactness of M, S must be compact. Since the entropy function H is continuous, the minimum ν = min { H(P^ME_R) − H(P) | P ∈ S } exists, and ν > 0 because P^ME_R is the unique entropy maximizer among the models of R and P^ME_R ∉ S. Since H is defined on a compact set (the set of probability distributions over L), H is uniformly continuous. Therefore there exists a δ > 0 such that for all distributions P_1, P_2 over L, ‖P_1 − P_2‖ < δ implies |H(P_1) − H(P_2)| < min { ε, ν/3 }. In particular, we can assume that δ < ε. Now, if ‖M_i, M‖_B < δ, there is a P_i ∈ M_i such that ‖P_i − P^ME_R‖ < δ and a P ∈ M such that ‖P^ME_i − P‖ < δ. Hence,

H(P^ME_R) < H(P_i) + ν/3 ≤ H(P^ME_i) + ν/3,
H(P^ME_i) < H(P) + ν/3 ≤ H(P^ME_R) + ν/3,

and therefore |H(P^ME_i) − H(P^ME_R)| < ν/3. In particular,

|H(P) − H(P^ME_R)| ≤ |H(P) − H(P^ME_i)| + |H(P^ME_i) − H(P^ME_R)| < ν/3 + ν/3 < ν.

By definition of ν, we can conclude that P ∈ M \ S, and therefore ‖P^ME_R − P‖ < ε. Hence,

‖P^ME_R − P^ME_i‖ ≤ ‖P^ME_R − P‖ + ‖P − P^ME_i‖ < ε + δ < 2ε. ⊓⊔

Theorem 27 (Continuity).
Let (K_i) be a convergent sequence of KBs with limit K and B_{K_i}(C ⊑ D | κ) = [ℓ_i, u_i]. If B_K(C ⊑ D | κ) = [ℓ, u], then (ℓ_i) converges to ℓ and (u_i) converges to u (with respect to the usual topology on R).

Proof. Let K = (R, T) and K_i = (R_i, T_i). By assumption, (Mod(R_i)) converges to Mod(R). Hence, Lemma 32 implies that the probability distributions induced by ME-models of the K_i converge to P^ME_R, which, in turn, is the probability distribution induced by all ME-models of K. Hence, the infimum and supremum of the conditional probability of C ⊑ D given κ with respect to K_i converge to the infimum and supremum with respect to K. ⊓⊔
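Lemma 32 and Theorem 27 can be illustrated on a toy instance where the ME-distribution has a closed form. Under the single constraint P(a) = q over two propositional variables a and b, entropy is maximized by making b uniform and independent of a; as the constraint values q_i converge, the induced ME-distributions converge as well. A hedged Python sketch (the closed form replaces a general ME solver; all names and numbers are ours):

```python
def me_distribution(q):
    """Closed-form ME-distribution over the four valuations of {a, b}
    under the single constraint P(a) = q: b is uniform and independent of a."""
    return {("a", "b"): q / 2, ("a", "~b"): q / 2,
            ("~a", "b"): (1 - q) / 2, ("~a", "~b"): (1 - q) / 2}

def max_dist(p1, p2):
    """Maximum (Chebyshev) distance between two distributions over the same events."""
    return max(abs(p1[v] - p2[v]) for v in p1)

# A sequence of constraint sets R_i with P(a) = 0.4 + 1/i converging to P(a) = 0.4
limit = me_distribution(0.4)
dists = [max_dist(me_distribution(0.4 + 1 / i), limit) for i in range(2, 100)]

# The induced ME-distributions converge to the ME-distribution of the limit
assert all(d1 > d2 for d1, d2 in zip(dists, dists[1:]))
assert dists[-1] < 0.01
```

This mirrors the statement of the lemma in miniature: as the model sets of the constraint sequence converge, so do the corresponding ME-models.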