Beliefs and Probability in Bacchus' l.p. Logic: A~3-Valued Logic Solution to Apparent Counter-intuition
Mieczyslaw A. Klopotek
Institute of Computer Science Polish Academy of SciencesWarsaw, Poland
Abstract
A fundamental discrepancy between first order logic and statistical inference (global versus local properties of a universe) is shown to be the obstacle to the integration of logic and probability in the L.p. logic of Bacchus. To overcome the counter-intuitiveness of L.p. behaviour, a 3-valued logic is proposed.
The paper of Bacchus aims at a painless integration of two paradigms of human reasoning, that is 1) first order logic and 2) statistical inference, in such a way as to avoid the contradictions emerging in previous approaches. Nonetheless, the claim of the current paper is that the L.p. logic of Bacchus also fails to achieve its primary goal of becoming a tool for describing knowledge and reasoning in expert systems and other knowledge-based systems. In Section 2 we present several simple examples of basic flaws of this logic, exploiting the counter-intuitiveness of L.p. behaviour. Section 3 demonstrates a more elaborate example pointing at weaknesses of the L.p. logic. As a remedy we propose (in a sketchy way) a different deduction theory taking into account the gap between the first-order way of thinking (global treatment of domains) and that of the statistical (experimental) sciences (local treatment of domains).

A number of works were concerned with representational and inferential issues when probabilities of events were identified with degrees of belief. Bacchus criticized, among others, the following approaches:
Approach 1 (propositional logic): The probability of a sentence is the probability of selecting one of those possible worlds wherein this sentence holds. E.g. the 90% belief that the famous Tweety flies is stated as Prob(Flies(Tweety)) = x with x greater than 0.9. However, such an approach does not make it easy to state that "Most birds fly".

Approach 2 (first order logic): Let the probability of the expression ∀x.Bird(x) → Flies(x) be expressed as Prob(∀x.Bird(x) → Flies(x)). Following the principles of probability calculus we obtain:

Prob(∃x.Bird(x) ∧ ¬Flies(x)) = Prob(¬(∀x.Bird(x) → Flies(x))) = 1 − Prob(∀x.Bird(x) → Flies(x)).

Hence if Prob(∀x.Bird(x) → Flies(x)) > 0.9, then it should hold that Prob(∃x.Bird(x) ∧ ¬Flies(x)) < 0.1. However, one can imagine a set of possible worlds such that in most of those worlds most birds fly and, at the same time, in most of the worlds non-flying birds exist; intuitively one would then want both

Prob(∀x.Bird(x) → Flies(x)) > 0.9 and Prob(∃x.Bird(x) ∧ ¬Flies(x)) > 0.9,

which is impossible, since the two probabilities sum to 1.

Approach 3: Cheeseman proposed that the above statements be meta-expressions with conditional probability of the type ∀x.Prob[Flies(x) | Bird(x)] > 0.9 (see the paper of Cheeseman for details).

So both probability inside and probability outside the scope of quantifiers lead to contradictions. Hence Bacchus proposed the L.p. logic, where probability is itself a quantifier (the probability of a formula α(x) with the free variable x is expressed as [α(x)]_x). Let us cite:

INFERENCE RULE (modus ponens)
R1: From {α, α → β} infer β.

DEFINITION (conditional probability [β|α]_x, β conditioned on α):
([α]_x > 0 → [β ∧ α]_x = [β|α]_x · [α]_x) ∧ ([α]_x = 0 → [β|α]_x = 0)

Let us show now the major weaknesses of the L.p. logic. Let us notice the following:
1. many logic-based knowledge systems express general knowledge in terms of implications,
2. all the examples of statistical knowledge representation in the paper of Bacchus refer to conditional probabilities instead of probabilities of implications,
3. the concept of conditional probability in L.p. is not a primary one but a concept derived from "absolute" probability in a strange way (see below),
4. the strangeness of the conditional probability definition results from a missing logical construct corresponding to conditional probability (a construct of the form p ⇒ q with [p ⇒ q]_x = [q|p]_x),
5. conditional probability does not suffice to substitute for this missing logical construct, for how is one to express a statement like "in most cases whenever p implies q then also v implies z"?

Let us demonstrate the non-suitability of implication for expressing statistical knowledge.
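The divergence between implications and conditional probabilities can be made concrete. The following Python sketch (the ten-element toy domain of birds is my own hypothetical illustration, not from the paper) implements L.p.'s proportion-based probability quantifier and its derived conditional probability, and shows how a material implication can be highly probable merely through vacuous satisfaction while the corresponding conditional probability stays low:

```python
from fractions import Fraction

def prob(pred, domain):
    # [alpha]_x in L.p.: the proportion of domain elements satisfying alpha
    return Fraction(sum(1 for d in domain if pred(d)), len(domain))

def cond(beta, alpha, domain):
    # [beta|alpha]_x per the L.p. definition: forced to 0 when [alpha]_x = 0
    pa = prob(alpha, domain)
    if pa == 0:
        return Fraction(0)
    return prob(lambda d: alpha(d) and beta(d), domain) / pa

# Hypothetical domain: 10 individuals, 2 of them birds, only 1 bird flies.
domain = [{"bird": i < 2, "flies": i == 0} for i in range(10)]
bird = lambda d: d["bird"]
flies = lambda d: d["flies"]

# The implication bird -> flies holds vacuously for the 8 non-birds, so it
# is "highly probable" although only half of the birds fly.
impl = prob(lambda d: (not bird(d)) or flies(d), domain)
c = cond(flies, bird, domain)
print(impl, c)  # 9/10 versus 1/2
```

The gap between 9/10 and 1/2 is exactly the global-versus-local discrepancy the examples below exploit.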
Example 1: What is the sum of the conditional probabilities of an event and its counter-event, [α|β]_x + [¬α|β]_x? The answer is: either 1 or 0!! (depending on the probability of β, that is, on whether [β]_x > 0; for [β]_x = 0 both summands are 0 by definition).
Example 2: What is the conditional probability of an event conditioned on itself, [α|α]_x? The answer is: either 1 or 0!!! (depending on the probability of α, that is [α]_x).
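Both anomalies can be checked mechanically. The sketch below (a hypothetical ten-element domain of my choosing) evaluates conditional probabilities exactly as the L.p. definition dictates; whenever the conditioning event has probability zero, everything conditioned on it collapses to 0:

```python
from fractions import Fraction

def prob(pred, domain):
    return Fraction(sum(1 for d in domain if pred(d)), len(domain))

def cond(beta, alpha, domain):
    # The L.p. definition forces [beta|alpha]_x = 0 whenever [alpha]_x = 0
    pa = prob(alpha, domain)
    return prob(lambda d: alpha(d) and beta(d), domain) / pa if pa else Fraction(0)

domain = list(range(10))
alpha = lambda d: d < 5
beta = lambda d: d % 2 == 0      # [beta]_x > 0
never = lambda d: False          # [never]_x = 0

# Example 1: the complementary conditionals sum to 1 or to 0.
print(cond(alpha, beta, domain) + cond(lambda d: not alpha(d), beta, domain))    # 1
print(cond(alpha, never, domain) + cond(lambda d: not alpha(d), never, domain))  # 0

# Example 2: an event conditioned on itself is 1 or 0.
print(cond(alpha, alpha, domain))  # 1
print(cond(never, never, domain))  # 0
```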
Example 3: Let us consider the following facts:
"With a certainty of at most 90% if you are a man then you are fertile."
"With a certainty of at most 80% if you are a fertile man then you will become a father."
"If you are a father then you are a man."
What is the probability of being a woman? The answer is: at most 0.7. The proof is as follows. We obtain the translation of the facts:

[man(x) → fertile(x)]_x ≤ 0.9, [man(x) ∧ fertile(x) → father(x)]_x ≤ 0.8, ∀x.(father(x) → man(x)).

Hence:

[¬(man(x) → fertile(x))]_x ≥ 0.1, that is [man(x) ∧ ¬fertile(x)]_x ≥ 0.1,
[¬(man(x) ∧ fertile(x) → father(x))]_x ≥ 0.2, that is [man(x) ∧ fertile(x) ∧ ¬father(x)]_x ≥ 0.2.

Then:

[woman(x)]_x = 1 − [man(x)]_x
= 1 − [(man(x) ∧ ¬fertile(x)) ∨ (man(x) ∧ fertile(x) ∧ father(x)) ∨ (man(x) ∧ fertile(x) ∧ ¬father(x))]_x
= 1 − [man(x) ∧ ¬fertile(x)]_x − [man(x) ∧ fertile(x) ∧ father(x)]_x − [man(x) ∧ fertile(x) ∧ ¬father(x)]_x
≤ 1 − [man(x) ∧ ¬fertile(x)]_x − [man(x) ∧ fertile(x) ∧ ¬father(x)]_x
≤ 1 − 0.1 − 0.2 = 0.7.
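The bound is easy to check on a concrete model. Below, a hypothetical ten-person population (my own construction) is chosen to satisfy all three premises; its fraction of women indeed does not exceed 0.7 (father → man also holds, since the only father is a man):

```python
from fractions import Fraction

def prob(pred, pop):
    return Fraction(sum(1 for d in pop if pred(d)), len(pop))

# Hypothetical population: tuples (man, fertile, father)
pop = ([(True, False, False)] * 1 +   # a man who is not fertile
       [(True, True, False)] * 2 +    # fertile men who are not fathers
       [(True, True, True)] * 1 +     # a fertile man who is a father
       [(False, False, False)] * 6)   # women

man, fertile, father = (lambda d: d[0]), (lambda d: d[1]), (lambda d: d[2])

p1 = prob(lambda d: (not man(d)) or fertile(d), pop)                  # [man -> fertile]
p2 = prob(lambda d: (not (man(d) and fertile(d))) or father(d), pop)  # [man & fertile -> father]
woman = prob(lambda d: not man(d), pop)
print(p1, p2, woman)
assert p1 <= Fraction(9, 10) and p2 <= Fraction(8, 10)
assert woman <= Fraction(7, 10)
```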
Example 4: Let us consider the following facts:
"For all x, if x is a male then x is not pregnant" and
"For all x, it is not true that if x is a male then x is pregnant."
The question is: are there any females? Let us use the following predicates: m(x) – x is male, p(x) – x is pregnant. We obtain the translation: ∀x.(m(x) → ¬p(x)) and ∀x.¬(m(x) → p(x)). Hence:

[m(x) → ¬p(x)]_x = 1 and [¬(m(x) → p(x))]_x = 1,

hence:

[m(x) → ¬p(x)]_x = 1 and [m(x) → p(x)]_x = 0.

But ∀x.((m(x) → ¬p(x)) ∨ (m(x) → p(x))), hence

[(m(x) → ¬p(x)) ∨ (m(x) → p(x))]_x = 1,

but

[(m(x) → ¬p(x)) ∨ (m(x) → p(x))]_x = [m(x) → ¬p(x)]_x + [m(x) → p(x)]_x − [(m(x) → ¬p(x)) ∧ (m(x) → p(x))]_x.

Hence 1 = 1 + 0 − [(m(x) → ¬p(x)) ∧ (m(x) → p(x))]_x, so [(m(x) → ¬p(x)) ∧ (m(x) → p(x))]_x = 0. But (m(x) → ¬p(x)) ∧ (m(x) → p(x)) is logically equivalent to ¬m(x), so [¬m(x)]_x = 0. So being a female is improbable!!!!

Before proceeding with another example let us recall a basic fact about intuitive reasoning: whenever we consider a piece of knowledge to be nearly sure, we reason with it as if it were absolutely true, and when we obtain a result we believe it to be nearly sure if the reasoning chain is not too long. We also take our experience earned in one environment and expect it to hold in a different environment, if the first environment yielded significant results. When we apply a body of general knowledge to an individual case, we usually possess only partial knowledge of the case; we then reason as if we had a population of cases fitting our knowledge of the individual, and obtain statistical results covering this artificial population. This is how Bayesian networks are used for individual diagnosis, as in Example 8 of Bacchus' paper. This is also the very nature of Miller's Principle.

Let us state some claims about L.p. logic:
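The step from [(m → ¬p) ∧ (m → p)]_x = 0 to [¬m(x)]_x = 0 rests on a propositional equivalence, which a short exhaustive check confirms (a sketch in Python):

```python
from itertools import product

def implies(a, b):
    # material implication
    return (not a) or b

# (m -> ~p) & (m -> p) is logically equivalent to ~m: a male would have to
# be both pregnant and not pregnant, while a female satisfies both vacuously.
for m, p in product([False, True], repeat=2):
    assert (implies(m, not p) and implies(m, p)) == (not m)
print("equivalence verified on all four valuations")
```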
Theorem 1: L.p. logic is equivalent to a logic Lp' derived from L.p. by substitution of the inference rule R1 with R1' and R2':

R1': From {[α]_x = 1, [α → β]_x = 1} infer [β]_x = 1, with the vector x being the vector of all free variables in α and β.
R2': From {α → β} infer [α → β]_x = 1 (x as in R1').

PROOF: see ✷
Theorem 2: Lp' logic is equivalent to a logic Lp'' derived from Lp' by substitution of the inference rules Ri' with R1'', R2'', R3'':

R1'': From {[α]_x = 1, [β|α]_x = 1} infer [β]_x = 1, with the vector x being the vector of all free variables in α and β.
R2'' = R2'.
R3'': From {[α → β]_x = 1, [α]_x > 0} infer [β|α]_x = 1 (x as above).

PROOF: see ✷
Theorem 3: Given [α]_x > 0, always [β|α]_x ≤ [α → β]_x.

PROOF: easily seen ✷
Theorem 4: If within the proof system Lp' in a certain step of the proof the premise/conclusion is weakened to [α]_x = 1 − ε₁, [α → β]_x = 1 − ε₂ (εᵢ ≥ 0 and small), then in the equivalent proof in Lp'' we get [β|α]_x ≥ 1 − ε₃, with ε₃ = ε₂/(1 − ε₁) small.

PROOF:
1 = ε₂ + [α → β]_x = ε₂ + [¬α ∨ β]_x ≤ ε₂ + [¬α]_x + [β]_x = ε₂ + ε₁ + [β]_x, hence [β]_x ≥ 1 − ε₁ − ε₂.

[β|α]_x = [β ∧ α]_x / [α]_x = ([α]_x − [¬β ∧ α]_x) / [α]_x = ([α]_x − [¬(β ∨ ¬α)]_x) / [α]_x = ([α]_x − [¬(α → β)]_x) / [α]_x = ([α]_x − (1 − [α → β]_x)) / [α]_x = ((1 − ε₁) − ε₂) / (1 − ε₁) = (1 − ε₁)/(1 − ε₁) − ε₂/(1 − ε₁) = 1 − ε₂/(1 − ε₁) ≥ 1 − ε₃.

Q.e.d. ✷
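Theorems 3 and 4 can be sanity-checked numerically. The sketch below (the random finite models are an assumption of mine, not part of the L.p. formalism) samples events α, β with [α]_x close to 1 and verifies both the inequality [β|α]_x ≤ [α → β]_x and the bound with ε₃ = ε₂/(1 − ε₁):

```python
import random
from fractions import Fraction

def prob(event, n):
    # probability of an event (a subset of an n-element universe)
    return Fraction(len(event), n)

random.seed(0)
n = 1000
universe = set(range(n))
for _ in range(100):
    # Random events with [alpha]_x close to 1, so the epsilons stay small.
    alpha = set(random.sample(range(n), random.randint(950, 1000)))
    beta = set(random.sample(range(n), random.randint(900, 1000)))
    pa = prob(alpha, n)                        # [alpha]_x = 1 - eps1
    impl = prob((universe - alpha) | beta, n)  # [alpha -> beta]_x = 1 - eps2
    cnd = prob(alpha & beta, n) / pa           # [beta|alpha]_x
    assert cnd <= impl                         # Theorem 3
    eps1, eps2 = 1 - pa, 1 - impl
    assert cnd >= 1 - eps2 / (1 - eps1)        # Theorem 4 (in fact equality)
print("Theorems 3 and 4 verified on 100 random models")
```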
Example 5: Let us consider Example 8 from [1], page 227.

[Fig. 1: the rule graph of Example 8 from [1], page 227, with my interpretation of X1–X4: X1 – guilty, X2 – prison, X3 – financial punishment, X4 – punishment.]

Let us first consider the rules, interpreted as

¬X1(x) → X2(x) and X2(x) ∨ X3(x) → X4(x).

Hence if ¬X1(x) is valid, then in the logic Lp' we obtain the rules

[¬X1(x) → X2(x)]_x = 1 and [X2(x) ∨ X3(x) → X4(x)]_x = 1,

and then:

From ¬X1(x), [¬X1(x) → X2(x)]_x = 1 infer [X2(x)]_x = 1.
From [X2(x)]_x = 1 and the definition of '∨' infer [X2(x) ∨ X3(x)]_x = 1.
From [X2(x) ∨ X3(x)]_x = 1, [X2(x) ∨ X3(x) → X4(x)]_x = 1 infer [X4(x)]_x = 1.

Now let us imagine we verify our rules in a real-world environment. Let there be, among 100 persons appearing before the court, 5 innocent ones, none of whom was condemned, and 95 guilty ones, of whom 94 were imprisoned and one had to pay a fine. Then:

[¬X1(x) → X2(x)]_x = 0.95 and [X2(x) ∨ X3(x) → X4(x)]_x = 1.

So in fact our rules are highly probable. Now let us apply the rules learned previously to an individual of whom we know it is innocent. So we consider a population with [¬X1(x)]_x = 1. Following the spirit of the previous deduction we obtain:

From ¬X1(x), [¬X1(x) → X2(x)]_x = 0.95 infer [X2(x)]_x ≥ 0.95.
From [X2(x)]_x ≥ 0.95 and the definition of '∨' infer [X2(x) ∨ X3(x)]_x ≥ 0.95.
From [X2(x) ∨ X3(x)]_x ≥ 0.95, [X2(x) ∨ X3(x) → X4(x)]_x = 1 infer [X4(x)]_x ≥ 0.95.

However, if we considered conditional probabilities instead of probabilities of inference rules, we would obtain [X2(x)]_x = 0 (the innocent are not condemned). So apparently the validity of Theorem 4 is denied, and so also that of Bacchus' L.p. The reason for the flaw is obvious (inference rules are global in nature, while conditional probabilities cover local properties of a universe and hence are better suited for transfer to another universe), but the solution is not as easy.

To overcome the problems mentioned above it is necessary to find a logical construct corresponding to conditional probability. It is easily seen that enforcing the interpretation of the probability of ordinary implication as conditional probability would lead to serious problems, for then

[β|α]_x = [α → β]_x = [¬β → ¬α]_x = [¬α|¬β]_x,

which may easily lead to a contradiction. So we see that two-valued logics are not sufficient for our purposes. Hence let us introduce the logical construct |⊢ with the following three-valued semantics (T = true, F = false, U = uninteresting):

p |⊢ q   p=T  p=U  p=F
 q=T      T    U    U
 q=U      U    U    U
 q=F      F    U    U

We need also truth tables for the basic logical constructs ∧, ∨, ¬:

p ∧ q    p=T  p=U  p=F
 q=T      T    U    F
 q=U      U    U    F
 q=F      F    F    F

p ∨ q    p=T  p=U  p=F
 q=T      T    T    T
 q=U      T    U    U
 q=F      T    U    F

¬q:  ¬T = F,  ¬U = U,  ¬F = T

Let us define two probability quantifiers, P¹x.α and P²x.α, in such a way that P¹x.α relates the cases in which α takes the value T to the cases in which it takes the value T or F, while P²x.α relates the cases in which α takes the value T to the cases in which it takes any of the values T, F, U. P¹x.(χ |⊢ β) is then equivalent to the conditional probability [β|χ]_x. We have then the following properties of both:

1) ∀x₁ . . . ∀xₙ.α → P¹x.α = 1 ∧ P²x.α = 1
2) P¹x.α ≥ 0, P²x.α ≥ 0, P¹x.α ≥ P²x.α
3) P¹x.α + P¹x.¬α = 1, P²x.α + P²x.¬α ≤ 1
4) P²x.α + P²x.β ≥ P²x.(α ∨ β)
5) P²x.(α ∧ β) = 0 → P²x.α + P²x.β = P²x.(α ∨ β)

The quantifier P¹ captures conditional probability, and using P¹x instead of [·]_x in the previous examples would resolve all the problems encountered there. Beside this, the statement "Almost always whenever p implies q then also v implies z" may be properly expressed by P¹x.((p |⊢ q) |⊢ (v |⊢ z)) > 0.9. So, by proper axiomatization we will gain the following: if a proof is to be transferred from one universe to another, locally similar one, then all the steps engaging P¹ and P² remain valid.
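The three-valued connectives and both quantifiers can be prototyped directly from the tables above. The following sketch (the encoding of T/U/F as strings and the bird population are my own assumptions) shows in particular that P¹ applied to p |⊢ q recovers the conditional probability [q|p]_x:

```python
from fractions import Fraction

T, U, F = "T", "U", "F"

def neg(p):
    return {T: F, U: U, F: T}[p]

def conj(p, q):
    # conjunction per the table: F dominates, then U
    if F in (p, q): return F
    if U in (p, q): return U
    return T

def disj(p, q):
    # disjunction per the table: T dominates, then U
    if T in (p, q): return T
    if U in (p, q): return U
    return F

def cimp(p, q):
    # p |- q: the value of q when p is T, otherwise U ("uninteresting")
    return q if p == T else U

def P1(values):
    # P1: proportion of T among the cases valued T or F (U cases ignored)
    defined = [v for v in values if v != U]
    return Fraction(sum(1 for v in defined if v == T), len(defined)) if defined else Fraction(0)

def P2(values):
    # P2: proportion of T among all cases (U cases count in the denominator)
    return Fraction(sum(1 for v in values if v == T), len(values))

# Hypothetical domain: 2 birds (one flies), 8 non-birds.
# p = bird(x), q = flies(x); the value of p |- q per individual:
vals = [cimp(T, T), cimp(T, F)] + [cimp(F, F)] * 8
print(P1(vals), P2(vals))  # 1/2 1/10 -- P1 is exactly [flies|bird]_x
```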
References

1. F. Bacchus: "L.p., a logic for representing and reasoning with statistical knowledge", Computational Intelligence, 209-231, (1990).
2. P. Cheeseman: "An inquiry into computer understanding", Computational Intelligence, 58-66, (1988).
3. J.Y. Halpern: "An analysis of first-order logics of probability", Artificial Intelligence, 311-350, (1990).
4. T. Hrycej: "Gibbs Sampling in Bayesian Networks", Artificial Intelligence 46(3), 351-363, (1990).
5. M.A. Klopotek: "Bayesian Network and L.p. Logic for Statistical Inference", a talk at the National Workshop Cybernetics-Intelligence-Development CIR-91, Siedlce, Poland, Sept. (1991), to appear in the Proceedings.
6. C.G. Morgan: "Weak conditional comparative probability as a formal semantic theory", Zeit. fuer Math. Log., 199-212, (1984).
7. N.J. Nilsson: "Probabilistic logic", Artificial Intelligence, 71-87, (1986).
8. S. Watanabe: "Pattern Recognition, Human and Machine".