Bayes' Theorem under Conditional Independence
Jun Hu* and Xianggui Qu†

Abstract
In this article we provide a substantial discussion on the statistical concept of conditional independence, which is not routinely mentioned in most elementary statistics and mathematical statistics textbooks. Under the assumption of conditional independence, an extended version of Bayes' Theorem is then proposed, with illustrations from both hypothetical and real-world examples of disease diagnosis.
Keywords: Disease diagnosis; Extended Bayes' Theorem; HIV testing

*Jun Hu is Assistant Professor in the Department of Mathematics and Statistics, Oakland University, Rochester, MI 48309. Email address: [email protected].
†Xianggui Qu is Professor in the Department of Mathematics and Statistics, Oakland University, Rochester, MI 48309. Email address: [email protected].

1. Introduction

Inarguably, conditional probability and independence are two concepts that play an important role in statistical theory. Most elementary statistics and mathematical statistics textbooks discuss these two concepts in detail and then illustrate the well-known Bayes' Theorem, such as Wackerly, Mendenhall, and Scheaffer (2014) and Hogg, Tanis, and Zimmerman (2015). To our surprise, however, the concept of conditional independence has rarely been mentioned since its appearance in Dawid (1979) more than forty years ago, let alone given a systematic introduction.

In this article, therefore, we give a substantial discussion of conditional independence. We focus on conditional independence of events instead of random variables for illustrative purposes. This way, the concept is made as simple as possible for students to understand, but no simpler. In Section 2, a number of straightforward examples are provided to point out some basic properties of conditional independence, as well as a series of seemingly correct yet wrong arguments that students may make, to supplement the existing literature. In Section 3, we propose an extended version of Bayes' Theorem under the assumption of conditional independence to accommodate practical applicability, and use hypothetical and real-world examples to demonstrate its possible application in disease diagnosis. The materials will be helpful for motivating undergraduate students to explore the topic in greater depth and to gain confidence in applying this impressive result efficiently. We end with some concluding thoughts in Section 4.
2. Conditional Independence
In this section, we first revisit (statistical) independence between two events and thereby introduce the concept of conditional independence. After a sequence of preliminary results is presented, we extend the idea from the two-event case to the multiple-event case.

2.1 Basic concepts and preliminary results
Definition 1. (Independence) Two events A_1 and A_2 are said to be independent if and only if

P(A_1 | A_2) = P(A_1),  (1)

provided that P(A_2) > 0.

Definition 2. (Independence) Two events A_1 and A_2 are said to be independent if and only if

P(A_1 ∩ A_2) = P(A_1) · P(A_2).  (2)

The first definition conveys the meaning of independence in a straightforward way: if two events are independent, then knowledge that one of the events has occurred has no effect on the probability that the other will occur. Nevertheless, most students will find the second definition more favorable since it does not require the assumption that P(A_2) > 0.

Definition 3. (Conditional Independence) Two events A_1 and A_2 are said to be conditionally independent given event B with P(B) > 0, if and only if

P(A_1 ∩ A_2 | B) = P(A_1 | B) · P(A_2 | B).  (3)

Otherwise, we say events A_1 and A_2 are conditionally dependent given B.

Conditional independence of two events can be interpreted in view of Definition 1: under the condition that event B has occurred, event A_1 (or A_2) occurring does not affect the probability that event A_2 (or A_1) occurs. Naturally, students may ask how independence and conditional independence might be associated with each other. Here, we provide several crucial remarks with examples to answer this question, which also demonstrate that independence and conditional independence can behave quite differently. We believe this will help students avoid making misleading arguments that seem to make sense at first glance. Afterwards, students may like to further scrutinize those arguments and construct their own counter-examples for practice. Throughout the article, S is used to denote the sample space with equally likely outcomes unless otherwise specified, and the complement of an event A is represented by A′.
Remark 1. Independence does not necessarily imply conditional independence, and vice versa.
Example 1.
Let S = {1, 2, 3, 4, 5, 6}. Define three events A_1 = {1, 2, 3}, A_2 = {3, 4} and B = {1, 2, 4}. By the assumption of equally likely outcomes in S, it is trivial for students to obtain that

P(A_1) = 1/2, P(A_2) = 1/3, P(A_1 ∩ A_2) = 1/6,
P(A_1 | B) = 2/3, P(A_2 | B) = 1/3, P(A_1 ∩ A_2 | B) = 0.

Therefore, students immediately find that A_1 and A_2 are independent since P(A_1 ∩ A_2) = P(A_1) P(A_2). However, A_1 and A_2 are not conditionally independent given B due to the fact that P(A_1 | B) P(A_2 | B) ≠ P(A_1 ∩ A_2 | B).
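The computations in this and the following examples are also easy to verify by machine. Here is a minimal Python sketch for Example 1, assuming the concrete sets given above; it counts equally likely outcomes with exact fractions:

```python
from fractions import Fraction

S = set(range(1, 7))                      # sample space of six equally likely outcomes
A1, A2, B = {1, 2, 3}, {3, 4}, {1, 2, 4}  # events from Example 1

def prob(E, given=None):
    """P(E) or P(E | given) under equally likely outcomes."""
    cond = S if given is None else given
    return Fraction(len(E & cond), len(cond))

# Independence: P(A1 ∩ A2) = P(A1) P(A2)
print(prob(A1 & A2) == prob(A1) * prob(A2))           # True

# Conditional independence given B: P(A1 ∩ A2 | B) = P(A1 | B) P(A2 | B)
print(prob(A1 & A2, B) == prob(A1, B) * prob(A2, B))  # False
```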
Example 2. Let S = {1, 2, ..., 8}. Define three events A_1 = {1, 2, 3}, A_2 = {3, 4} and B = {1, 2, 3, 4, 5, 6}. Clearly,

P(A_1) = 3/8, P(A_2) = 1/4, P(A_1 ∩ A_2) = 1/8,
P(A_1 | B) = 1/2, P(A_2 | B) = 1/3, P(A_1 ∩ A_2 | B) = 1/6.

Hence, A_1 and A_2 are conditionally independent given B, but are not independent of each other.
Remark 2. That two events A_1 and A_2 are conditionally independent given event B does not necessarily imply that A_1 and A_2 are also conditionally independent given B′, the complement of B.
Example 3. Let S = {1, 2, ..., 8}. Define three events A_1 = {1, 2, 3, 7}, A_2 = {3, 4, 8} and B = {1, 2, 3, 4, 5, 6}. Then, by noting that P(A_1 | B) = 1/2, P(A_2 | B) = 1/3, and P(A_1 ∩ A_2 | B) = 1/6, one has that A_1 and A_2 are conditionally independent given B. However, A_1 and A_2 are not conditionally independent given B′ since P(A_1 | B′) = 1/2, P(A_2 | B′) = 1/2, but P(A_1 ∩ A_2 | B′) = 0.

Remark 3. That two events A_1 and A_2 are both independent and conditionally independent given event B does not imply that A_1 and A_2 are conditionally independent given B′.
Example 4. Let S = {1, 2, ..., 16}. Define three events as follows: A_1 = {1, 2, 3, 7, 8, 9, 10, 11, 12, 13, 14, 15}, A_2 = {3, 4, 7, 8, 9, 10, 11, 16} and B = {1, 2, 3, 4, 5, 6}, for which A_1 ∩ A_2 = {3, 7, 8, 9, 10, 11} and B′ = {7, 8, ..., 16}. It is not hard for students to work out the following quantities:

P(A_1) = 3/4, P(A_2) = 1/2, P(A_1 ∩ A_2) = 3/8 = P(A_1) P(A_2),
P(A_1 | B) = 1/2, P(A_2 | B) = 1/3, P(A_1 ∩ A_2 | B) = 1/6 = P(A_1 | B) P(A_2 | B),
P(A_1 | B′) = 9/10, P(A_2 | B′) = 3/5, P(A_1 ∩ A_2 | B′) = 1/2 ≠ P(A_1 | B′) P(A_2 | B′).

In this example, students will notice that A_1 and A_2 are independent and conditionally independent given B, but they are conditionally dependent given B′.
Remark 4. That two events A_1 and A_2 are both conditionally independent given B and conditionally independent given B′ does not necessarily imply that A_1 and A_2 are independent.
Example 5. Let S = {1, 2, ..., 14}. Define three events A_1 = {1, 2, 3, 7, 8, 9, 10, 11, 12}, A_2 = {3, 4, 7, 8, 9, 13} and B = {1, 2, 3, 4, 5, 6}. Then, we have

P(A_1 | B) = 1/2, P(A_2 | B) = 1/3, P(A_1 ∩ A_2 | B) = 1/6 = P(A_1 | B) P(A_2 | B),
P(A_1 | B′) = 3/4, P(A_2 | B′) = 1/2, P(A_1 ∩ A_2 | B′) = 3/8 = P(A_1 | B′) P(A_2 | B′),
P(A_1) = 9/14, P(A_2) = 3/7, P(A_1 ∩ A_2) = 2/7 ≠ P(A_1) P(A_2).

Thus, while A_1 and A_2 are conditionally independent given either B or B′, A_1 and A_2 are not independent.
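A similar numerical check confirms Example 5; the sketch below again assumes the specific sets listed above, and checks ordinary independence by conditioning on the whole sample space:

```python
from fractions import Fraction

S = set(range(1, 15))                     # 14 equally likely outcomes
A1 = {1, 2, 3, 7, 8, 9, 10, 11, 12}
A2 = {3, 4, 7, 8, 9, 13}
B = {1, 2, 3, 4, 5, 6}
Bc = S - B                                # complement of B

def prob(E, given):
    return Fraction(len(E & given), len(given))

def cond_indep(E1, E2, given):
    return prob(E1 & E2, given) == prob(E1, given) * prob(E2, given)

print(cond_indep(A1, A2, B))    # True: conditionally independent given B
print(cond_indep(A1, A2, Bc))   # True: conditionally independent given B'
print(cond_indep(A1, A2, S))    # False: A1 and A2 are not independent
```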
Next, the following theorem points out a possible association between independence and conditional independence.

Theorem 1. Let A_1, A_2 and B be three events with P(B) > 0. If A_1 is independent of B and A_1 is also independent of A_2 ∩ B, then A_1 and A_2 are conditionally independent given B.

Proof. By checking the definition of conditional independence between two events, students can establish the identity

P(A_1 ∩ A_2 | B) = P(A_1 ∩ A_2 ∩ B) / P(B) = P(A_1) · P(A_2 ∩ B) / P(B) = P(A_1 | B) · P(A_2 | B).  (4)

Hence, the statement holds.
Theorem 2. Given event B with P(B) > 0, the following four statements in terms of events A_1, A_2 and their complements are equivalent: (i) A_1 and A_2 are conditionally independent given B; (ii) A_1′ and A_2 are conditionally independent given B; (iii) A_1 and A_2′ are conditionally independent given B; (iv) A_1′ and A_2′ are conditionally independent given B.
Proof. We show that (i) ⇒ (ii) ⇒ (iv) ⇒ (iii) ⇒ (i). First, we show (i) ⇒ (ii). Note that when A_1 and A_2 are conditionally independent given B, one has

P(A_1′ ∩ A_2 | B) = P(A_1′ ∩ A_2 ∩ B) / P(B) = [P(A_2 ∩ B) − P(A_1 ∩ A_2 ∩ B)] / P(B)
                 = P(A_2 | B) − P(A_1 | B) P(A_2 | B) = P(A_1′ | B) P(A_2 | B),  (5)

which indicates that A_1′ and A_2 are conditionally independent given B. Note that students will need to recall the definition of conditional probability and the identity P(A_1 | B) + P(A_1′ | B) = 1 to claim (5). Following this result, (iv) holds immediately by retaining the first event A_1′ and substituting the second event A_2 with A_2′, just as we moved from (i) to (ii). In the same manner, (iv) ⇒ (iii) and (iii) ⇒ (i) can also be justified, together with the interchangeability of A_1 and A_2.
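Theorem 2 can also be checked numerically. A minimal sketch, reusing the sets of Example 2, verifies that replacing either event by its complement preserves conditional independence given B:

```python
from fractions import Fraction
from itertools import product

S = set(range(1, 9))                       # Example 2: 8 equally likely outcomes
A1, A2, B = {1, 2, 3}, {3, 4}, {1, 2, 3, 4, 5, 6}

def prob(E, given):
    return Fraction(len(E & given), len(given))

def cond_indep(E1, E2, given):
    return prob(E1 & E2, given) == prob(E1, given) * prob(E2, given)

# Try every combination of the events and their complements (Theorem 2).
for E1, E2 in product([A1, S - A1], [A2, S - A2]):
    print(cond_indep(E1, E2, B))           # all four print True
```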
In analogy to pairwise and mutual independence of multiple events, we are now in a position to generalize the notion of conditional independence to multiple events.

Definition 4. (Pairwise and Mutual Conditional Independence) A collection of events A_1, A_2, ..., A_n (n ≥ 3) is said to be pairwise conditionally independent given event B with P(B) > 0, if and only if for all i ≠ j,

P(A_i ∩ A_j | B) = P(A_i | B) · P(A_j | B).  (6)

A collection of events A_1, A_2, ..., A_n (n ≥ 3) is said to be mutually conditionally independent given another event B with P(B) > 0, if and only if for every subset of distinct indices i_1, i_2, ..., i_k (2 ≤ k ≤ n),

P(A_{i_1} ∩ A_{i_2} ∩ ··· ∩ A_{i_k} | B) = P(A_{i_1} | B) · P(A_{i_2} | B) ··· P(A_{i_k} | B).  (7)

For convenience, we drop the modifier "mutually" when talking about multiple mutually conditionally independent events in practice. Hence, whenever we say that A_1, ..., A_n are "conditionally independent", we mean "mutually conditionally independent." Students may take it as an exercise to give examples showing that Remarks 1-4 also hold for multiple conditionally independent events. In this case, Theorem 2 can also be modified accordingly.
Theorem 3. Given a collection of events A_1, A_2, ..., A_n (n ≥ 2), let A_i^* be either A_i or its complement A_i′, i = 1, 2, ..., n. Then all statements of the form "A_1^*, A_2^*, ..., A_n^* are conditionally independent given event B with P(B) > 0" are equivalent.
Proof. One may start with the assumption that A_1, ..., A_n are conditionally independent given B, and show that the collection of events stays conditionally independent if we substitute one of them with its complement, for instance, A_1′, A_2, ..., A_n. This can be done in a similar way as we proved (i) ⇒ (ii) in Theorem 2. Then, we use this result repeatedly, with one A_i^* replaced by its complement at a time, and a complete proof goes through. We leave out the details for brevity.
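A brute-force check of Theorem 3 is straightforward as well. The sketch below uses a small three-event example of our own construction (not from the text), in which A_1, A_2, A_3 are conditionally independent given B, and verifies all 2^3 complement patterns:

```python
from fractions import Fraction
from itertools import combinations, product

# A hypothetical example: given B = {1,...,8}, the three events behave like fair coins.
S = set(range(1, 11))
B = set(range(1, 9))
A = [{1, 2, 3, 4}, {1, 2, 5, 6}, {1, 3, 5, 7}]

def prob(E, given):
    return Fraction(len(E & given), len(given))

def mutually_cond_indep(events, given):
    """Check (7): the factorization for every sub-collection of two or more events."""
    for k in range(2, len(events) + 1):
        for sub in combinations(events, k):
            inter = set.intersection(*sub)
            rhs = Fraction(1)
            for E in sub:
                rhs *= prob(E, given)
            if prob(inter, given) != rhs:
                return False
    return True

# Replace each A_i by either A_i or its complement, in all 2**3 ways.
for flips in product([False, True], repeat=3):
    starred = [S - Ai if flip else Ai for Ai, flip in zip(A, flips)]
    print(flips, mutually_cond_indep(starred, B))   # every line prints True
```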
3. Extending Bayes’ Theorem
When it comes to conditional probability, Bayes' Theorem is helpful for reversing the roles of the event and the condition. Suppose A is an event with P(A) > 0, and B_1, B_2, ..., B_m (m ≥ 2) are mutually exclusive and exhaustive events, that is, a partition of the sample space S. Then,

P(B_k | A) = P(A | B_k) P(B_k) / ∑_{i=1}^m P(A | B_i) P(B_i),  k = 1, 2, ..., m.  (8)

Considering the set of events {B, B′} as a trivial partition of S, we have a simplified version of Bayes' Theorem as follows:

P(B | A) = P(A | B) P(B) / [P(A | B) P(B) + P(A | B′) P(B′)],  (9)

which is widely used in diagnostic testing for diseases. See the example below.
Example 6. Let D be the event that a (rare) disease is present, so D′ denotes the event that the disease is not present. Suppose there exists a diagnostic test for this disease, and let T^+ and T^- be the events that the test result is positive and negative, respectively. Here,

1. P(D), called the prevalence, is interpreted as the probability that a randomly selected person has the disease and is assumed known.

2. P(T^+ | D), called the test sensitivity, is interpreted as the probability that the test gives a "true positive" result. As a characteristic of the test, it is known to us.

3. P(T^- | D′), called the test specificity, is interpreted as the probability that the test gives a "true negative" result. As another characteristic of the test, it is also known to us.

4. P(D | T^+), called the positive predictive value (PPV), is the conditional probability that one has the disease given that the test result is positive. If the test is positive and the PPV is high enough, then it would be appropriate to initiate a treatment. On the other hand, if the PPV is low, then further testing might be appropriate.

5. P(D′ | T^-), called the negative predictive value (NPV), is the conditional probability that one does not have the disease given that the test result is negative. If the test is negative and the NPV is high enough, then one can conclude that no disease is present. On the other hand, if the NPV is low, then further testing might be appropriate.

One may refer to Altman and Bland (1994a,b) for more details on these notions.

Mostly, we are interested in the PPV. Based on Bayes' Theorem in (9), we substitute A with T^+ and B with D and obtain

P(D | T^+) = P(T^+ | D) P(D) / [P(T^+ | D) P(D) + P(T^+ | D′) P(D′)]
           = P(T^+ | D) P(D) / {P(T^+ | D) P(D) + [1 − P(T^- | D′)][1 − P(D)]}.  (10)

Again, students need to recall the fact that T^+ and T^- are complementary events and thus P(T^+ | D′) + P(T^- | D′) = 1.

Most textbook examples stop the discussion upon the derivation of the PPV, even when it is sufficiently small to indicate the necessity of further testing. However, students may be curious about the following questions: What if a second test is conducted and the test result is still positive, or negative? At that point, what is the probability that one has the disease, indeed?
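Before answering these questions, it may help to make formula (10) concrete. The following minimal Python sketch computes the PPV from prevalence, sensitivity and specificity; the function name is ours, and the numbers anticipate Example 7 below:

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """PPV from formula (10): P(D | T+)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# A 95% sensitive and 95% specific test with a prevalence of 1 in 1000 (see Example 7):
print(positive_predictive_value(0.001, 0.95, 0.95))   # about 0.0187
```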
In this section, we are ready to extend Bayes' Theorem under the assumption of conditional independence and answer the above questions. Provided a set of events {B_1, B_2, ..., B_m} (m ≥ 2) with all positive probabilities, which forms a partition of the sample space S, suppose events A_1, A_2, ..., A_n (n ≥ 2) are conditionally independent given B_k for each k = 1, 2, ..., m. Suppose also that we are interested in the conditional probability P(B_k | ∩_{i=1}^n A_i). For any k = 1, 2, ..., m and i = 1, 2, ..., n, if the quantities P(B_k) and P(A_i | B_k) are all known to us, we give the so-called extended Bayes' Theorem as follows.

Theorem 4 (Extended Bayes' Theorem).

P(B_k | ∩_{i=1}^n A_i) = P(B_k) ∏_{i=1}^n P(A_i | B_k) / ∑_{j=1}^m P(B_j) ∏_{i=1}^n P(A_i | B_j).  (11)

Proof. By the definition of conditional probability, students can easily obtain

P(B_k | ∩_{i=1}^n A_i) = P(∩_{i=1}^n A_i ∩ B_k) / P(∩_{i=1}^n A_i),  (12)

where the numerator

P(∩_{i=1}^n A_i ∩ B_k) = P(∩_{i=1}^n A_i | B_k) P(B_k) = P(B_k) ∏_{i=1}^n P(A_i | B_k),  (13)

and the denominator

P(∩_{i=1}^n A_i) = ∑_{j=1}^m P(∩_{i=1}^n A_i | B_j) P(B_j) = ∑_{j=1}^m P(B_j) ∏_{i=1}^n P(A_i | B_j).  (14)

The proof is now complete by combining (13) and (14).
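As a computational companion to Theorem 4, the following sketch implements formula (11) directly. The function name, argument layout and toy numbers are ours, chosen only for illustration:

```python
from math import prod

def extended_bayes(prior, cond_probs, k):
    """
    Formula (11): P(B_k | A_1 ∩ ... ∩ A_n).

    prior[j]      -- P(B_j) for a partition B_1, ..., B_m
    cond_probs[j] -- list [P(A_1 | B_j), ..., P(A_n | B_j)]
    k             -- index (0-based) of the event of interest B_k
    """
    joint = [p * prod(c) for p, c in zip(prior, cond_probs)]
    return joint[k] / sum(joint)

# A toy check with m = 2, n = 2 (hypothetical numbers):
prior = [0.3, 0.7]
cond_probs = [[0.9, 0.8], [0.2, 0.1]]
print(extended_bayes(prior, cond_probs, 0))   # about 0.939
```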
Remark 5. In terms of P(∩_{i=1}^n A_i) in (14), a possible error that some students may make is to treat the A_i's as independent events and thus write

P(∩_{i=1}^n A_i) = ∏_{i=1}^n P(A_i),

where each P(A_i), i = 1, 2, ..., n, is further computed by using the Law of Total Probability:

P(A_i) = ∑_{k=1}^m P(A_i | B_k) P(B_k).

As is pointed out in Remark 4, however, this is not necessarily true, and Example 5 provides a simple counter-example when m = n = 2. It emphasizes that one should not confuse independence with conditional independence.
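The error described in Remark 5 can also be seen numerically. The sketch below, which assumes the sets of Example 5 and takes B_1 = B and B_2 = B′, contrasts the correct factorization in (14) with the wrong one:

```python
from fractions import Fraction

S = set(range(1, 15))
A1 = {1, 2, 3, 7, 8, 9, 10, 11, 12}
A2 = {3, 4, 7, 8, 9, 13}
B = {1, 2, 3, 4, 5, 6}
partition = [B, S - B]                               # B_1 = B, B_2 = B'

def prob(E, given):
    return Fraction(len(E & given), len(given))

# Correct: law of total probability with the conditional-independence factorization (14)
correct = sum(prob(Bk, S) * prob(A1, Bk) * prob(A2, Bk) for Bk in partition)

# Wrong: pretending A1 and A2 are (unconditionally) independent
wrong = prob(A1, S) * prob(A2, S)

print(correct, prob(A1 & A2, S))   # 2/7 2/7  -- the factorized total probability is exact
print(wrong)                       # 27/98    -- not equal to P(A1 ∩ A2)
```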
The significance of Theorem 4 is immediately recognized in answering the questions raised in Example 6. Suppose a person whose first test for the disease is positive, denoted by T_1^+, goes for a second test separately and the test is still positive, denoted by T_2^+. Owing to the test sensitivity and specificity, it is reasonable to assume that T_1^+ and T_2^+ are conditionally independent given D as well as given D′. Then, according to the extended Bayes' Theorem in Theorem 4, the probability that the person actually has the disease can be updated as follows:

P(D | T_1^+ ∩ T_2^+) = [P(T^+ | D)]^2 P(D) / {[P(T^+ | D)]^2 P(D) + [1 − P(T^- | D′)]^2 [1 − P(D)]},  (15)

where P(D), P(T^+ | D) and P(T^- | D′) continue to denote the prevalence, sensitivity and specificity mentioned earlier, respectively. If P(D | T_1^+ ∩ T_2^+) is still low, then a third test might be appropriate. In general, we can obtain the probability that one has the disease given n conditionally independent positive test results:

P(D | ∩_{i=1}^n T_i^+) = [P(T^+ | D)]^n P(D) / {[P(T^+ | D)]^n P(D) + [1 − P(T^- | D′)]^n [1 − P(D)]}.  (16)

This can be left as an exercise for students to practice.

Remark 6. For an accurate diagnostic test, both sensitivity and specificity are close to one. Then, it is safe to assume that the quantity

P(T^+ | D) / [1 − P(T^- | D′)],

called the likelihood ratio (see Altman and Bland, 1994b), is larger than 1. As a result, it is not hard for students to observe that

lim_{n→∞} P(D | ∩_{i=1}^n T_i^+) = 1

by using some elementary calculus techniques, which implies that a sequence of positive tests can be a good indicator of the presence of the disease.

To illustrate the application of the extended Bayes' Theorem and Remark 6, we include a hypothetical example, borrowed with modifications from Utts and Heckard (2011, p. 220), that is appealing to students taking elementary statistics courses.
Example 7.
Last week, Alicia went to her physician for a routine medical exam and was told that one of her tests came back positive, indicating that she may have a disease D. It is known that the test is 95% accurate as to whether someone has this disease or not. In other words, the test sensitivity and specificity are both 95%. Suppose that only 1 out of 1000 women of Alicia's age indeed has D. With knowledge of Bayes' Theorem, Alicia then computed her actual chance of having the disease D given the positive test result by referring to (10):

P(D | T^+) = (0.95)(0.001) / [(0.95)(0.001) + (1 − 0.95)(1 − 0.001)] ≈ 0.0187.  (17)

The positive predictive value is so small that further testing for the disease D may be needed. Therefore, Alicia went for the same test for D a second time. Unfortunately, the test result turned out positive again. At this point, by using the extended Bayes' Theorem in (15), we have

P(D | T_1^+ ∩ T_2^+) = (0.95)^2 (0.001) / [(0.95)^2 (0.001) + (1 − 0.95)^2 (1 − 0.001)] ≈ 0.2654.  (18)

With a second positive test result, Alicia's chance of having the disease increased by a factor of almost 14. Suppose Alicia took a third and a fourth test and they were again positive. Referring to (16), we have

P(D | T_1^+ ∩ T_2^+ ∩ T_3^+) = (0.95)^3 (0.001) / [(0.95)^3 (0.001) + (1 − 0.95)^3 (1 − 0.001)] ≈ 0.8729,  (19)

and

P(D | T_1^+ ∩ T_2^+ ∩ T_3^+ ∩ T_4^+) = (0.95)^4 (0.001) / [(0.95)^4 (0.001) + (1 − 0.95)^4 (1 − 0.001)] ≈ 0.9924,  (20)

closer and closer to 1.
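The chain of updated probabilities (17)-(20) can be reproduced with a few lines of Python; the function name is ours, and formula (16) does all the work:

```python
def prob_disease_given_n_positives(n, prevalence=0.001, sensitivity=0.95, specificity=0.95):
    """Formula (16): P(D | n conditionally independent positive tests)."""
    num = sensitivity**n * prevalence
    den = num + (1 - specificity)**n * (1 - prevalence)
    return num / den

for n in range(1, 5):
    print(n, round(prob_disease_given_n_positives(n), 4))
# 1 0.0187
# 2 0.2654
# 3 0.8729
# 4 0.9924
```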
Bhatti and Wightman (2008) provided a real-world application of Bayes' Theorem. Table 2 in their paper gives the probabilities of being HIV positive for one and two positive tests, with sensitivity 0.99 and specificity 0.99, under various prevalences in ten geographic regions. In the spirit of their paper, we calculate the probabilities of an adult aged 15 to 49 being HIV positive for one, two, and three positive tests using our extended Bayes' Theorem, based on data from the Joint United Nations Programme on HIV/AIDS (2018). The results are presented in Table 1.

Table 1: Probability of an adult aged 15 to 49 being HIV positive by geographic region, given one positive test, and two and three conditionally independent positive tests, with sensitivity 0.99 and specificity 0.99.

Region                                          Adult Prevalence  One Positive  Two Positives  Three Positives
Asia and the Pacific                            0.002             0.1656        0.9516         0.9995
Caribbean                                       0.012             0.5460        0.9917         0.9999
Eastern and Southern Africa                     0.070             0.8817        0.9986         1.0000
Eastern Europe and Central Asia                 0.009             0.4734        0.9889         0.9999
Latin America                                   0.004             0.2845        0.9752         0.9997
Middle East and North Africa                    0.001             0.0902        0.9075         0.9990
Western and Central Africa                      0.015             0.6012        0.9933         0.9999
Western and Central Europe and North America    0.002             0.1656        0.9516         0.9995
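The entries of Table 1 can be recomputed from the prevalence column alone, using formula (16) with sensitivity and specificity 0.99. A short sketch (function name and print layout are ours):

```python
def prob_hiv_given_positives(prevalence, n, sensitivity=0.99, specificity=0.99):
    num = sensitivity**n * prevalence
    return num / (num + (1 - specificity)**n * (1 - prevalence))

regions = {
    "Asia and the Pacific": 0.002,
    "Caribbean": 0.012,
    "Eastern and Southern Africa": 0.070,
    "Eastern Europe and Central Asia": 0.009,
    "Latin America": 0.004,
    "Middle East and North Africa": 0.001,
    "Western and Central Africa": 0.015,
    "Western and Central Europe and North America": 0.002,
}

# Reproduces Table 1 up to rounding/display of the values.
for region, prev in regions.items():
    row = [round(prob_hiv_given_positives(prev, n), 4) for n in (1, 2, 3)]
    print(f"{region:<46}{prev:<8}{row[0]:<9}{row[1]:<9}{row[2]}")
```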
For small prevalence (e.g., 0.001), the PPV given one positive test may remain small (e.g., 0.0902) even if both sensitivity and specificity are large (e.g., 0.99). Given a second positive test, however, this conditional probability increases dramatically and approaches 1. All probabilities of an adult aged 15 to 49 being HIV positive given three positive tests are almost equal to 1. The real-data illustration thus corroborates Remark 6.

Furthermore, it will be a good idea for instructors to interpret the interesting phenomenon of a small PPV in detail: it is due to the low prevalence of the disease rather than an "inaccurate" diagnostic test, and it demonstrates the necessity of follow-up confirmatory tests. In fact, the probability P(D | ∩_{i=1}^n T_i^+) approaches 1 very fast when both sensitivity and specificity are large enough, showing the great significance of diagnostic test accuracy.
Finally, we propose a sequential testing scheme in which the extended Bayes' Theorem is applied for more efficient disease diagnosis. For n ≥ 1, define p_n = P(D | ∩_{i=1}^n T_i^*), where T_i^* = T_i^+ or T_i^-, meaning that the i-th test is positive or negative, i = 1, ..., n, so p_n can be interpreted as the conditional probability that one has the disease given the sequence of test results T_1^*, ..., T_n^*. Let {α_n} and {β_n} be two nondecreasing sequences of numbers, predetermined appropriately, such that

0 < α_1 ≤ ··· ≤ α_n ≤ ··· ≤ β_1 ≤ ··· ≤ β_n ≤ ··· < 1.

Then, we develop a stopping rule for diagnostic testing as follows:

N = inf{n ≥ 1 : p_n ≤ α_n or p_n ≥ β_n}.  (21)

That is, we conduct the test successively and terminate at the first time N = n such that either p_n ≤ α_n or p_n ≥ β_n happens. We conclude that the disease is present (or not present) if p_N ≥ β_N (or p_N ≤ α_N). Students from interdisciplinary programs may find it interesting to follow this direction and explore possibilities for future research work.
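To make the stopping rule (21) concrete, here is a small simulation sketch. It uses constant thresholds α_n = α and β_n = β as a special case, and the updating function, thresholds and all numbers are ours, chosen only for illustration:

```python
import random

def update(p, positive, sensitivity=0.95, specificity=0.95):
    """One Bayes update of P(D | results so far) after a new test result."""
    if positive:
        num, other = sensitivity * p, (1 - specificity) * (1 - p)
    else:
        num, other = (1 - sensitivity) * p, specificity * (1 - p)
    return num / (num + other)

def sequential_test(has_disease, prevalence=0.001, alpha=1e-6, beta=0.99,
                    sensitivity=0.95, specificity=0.95, max_n=50):
    """Stopping rule (21) with constant thresholds alpha_n = alpha, beta_n = beta."""
    p = prevalence
    for n in range(1, max_n + 1):
        # Simulate the n-th test result given the true disease status.
        positive = random.random() < (sensitivity if has_disease else 1 - specificity)
        p = update(p, positive, sensitivity, specificity)
        if p <= alpha or p >= beta:
            return n, ("disease present" if p >= beta else "no disease")
    return max_n, "undecided"

random.seed(1)
print(sequential_test(has_disease=True))    # typically stops after a handful of tests
print(sequential_test(has_disease=False))
```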
4. Overall Concluding Thoughts
In Section 2, we have discussed conditional independence of events alone. It is worth mentioning that the concept of conditional independence can also be generalized to random variables, which is of great importance in the area of Bayesian statistics. Many details are left out of this article for brevity, as it is prepared for the study of elementary statistics and mathematical statistics at the undergraduate level. One may see a batch of articles including Dawid (1979), Dawid (1998) and Basu and Pereira (2011) for reference.

Under the assumption of conditional independence, we have put forward the extended Bayes' Theorem and addressed its application in diagnostic testing with examples and real-data illustrations. A novel sequential testing idea is briefly proposed at the end of Section 3, and one may follow this direction to make it more substantial. Indeed, instructors are encouraged to introduce these materials, as appropriate, to those students who stand out in class.
References
Altman, D. G. and Bland, J. M. (1994a). Diagnostic tests 1: sensitivity and specificity. BMJ: British Medical Journal, 308, 1552.

Altman, D. G. and Bland, J. M. (1994b). Diagnostic tests 2: predictive values. BMJ: British Medical Journal, 309, 102.

Basu, D. and Pereira, C. A. (2011). Conditional independence in statistics. In Selected Works of Debabrata Basu, 371-384.

Bhatti, C. R. and Wightman, J. L. (2008). Conditional probability and HIV testing: A real-world example. The American Statistician, 62, 238-241.

Dawid, A. P. (1979). Conditional independence in statistical theory. Journal of the Royal Statistical Society, Series B, 41, 1-31.

Dawid, A. P. (1998). Conditional independence. Encyclopedia of Statistical Sciences, Update Volume 2, 146-153.

Hogg, R. V., Tanis, E. A., and Zimmerman, D. L. (2015). Probability and Statistical Inference (9th ed.). Pearson.

Joint United Nations Programme on HIV/AIDS (2018). "Factsheets." http://aidsinfo.unaids.org/

Utts, J. M. and Heckard, R. F. (2011). Mind on Statistics (4th ed.). Cengage Learning.

Wackerly, D., Mendenhall, W. and Scheaffer, R. L. (2014). Mathematical Statistics with Applications. Cengage Learning.