InSilicoVA: A Method to Automate Cause of Death Assignment for Verbal Autopsy
Samuel J. Clark, Tyler McCormick, Zehang Li, and Jon Wakefield
Department of Sociology, University of Washington
Department of Statistics, University of Washington
Department of Biostatistics, University of Washington
Institute of Behavioral Science (IBS), University of Colorado at Boulder
MRC/Wits Rural Public Health and Health Transitions Research Unit (Agincourt), School of Public Health, Faculty of Health Sciences, University of the Witwatersrand
ALPHA Network, London School of Hygiene and Tropical Medicine, London, UK
INDEPTH Network, Accra, Ghana
* Correspondence to: [email protected]
August 24, 2013

Abstract

Verbal autopsies (VA) are widely used to provide cause-specific mortality estimates in developing world settings where vital registration does not function well. VAs assign cause(s) to a death by using information describing the events leading up to the death, provided by caregivers. Typically physicians read VA interviews and assign causes using their expert knowledge. Physician coding is often slow, and individual physicians bring bias to the coding process that results in non-comparable cause assignments. These problems significantly limit the utility of physician-coded VAs. A solution to both is to use an algorithmic approach that formalizes the cause-assignment process. This ensures that assigned causes are comparable and requires many fewer person-hours, so that cause assignment can be conducted quickly without disrupting the normal work of physicians. Peter Byass’ InterVA method (Byass et al., 2012) is the most widely used algorithmic approach to VA coding and is aligned with the WHO 2012 standard VA questionnaire (Leitao et al., 2013).

The statistical model underpinning InterVA can be improved; uncertainty needs to be quantified, and the link between the population-level CSMFs and the individual-level cause assignments needs to be statistically rigorous. Addressing these theoretical concerns provides an opportunity to create new software using modern languages that can run on multiple platforms and will be widely shared. Building on the overall framework pioneered by InterVA, our work creates a statistical model for automated VA cause assignment.
Acknowledgments
Preparation of this manuscript was partially supported by the Bill and Melinda Gates Foundation. The authors are grateful to Peter Byass, Basia Zaba, Kathleen Kahn, Stephen Tollman, Adrian Raftery, Philip Setel and Osman Sankoh for helpful discussions.

1 Introduction
Verbal autopsy (VA) is a common approach for determining cause of death in regions where deaths are not recorded routinely. A VA is a standardized questionnaire administered to caregivers, family members, or others knowledgeable of the circumstances of a recent death, with the goal of using these data to infer the likely causes of death (Byass et al., 2012). The VA survey instrument asks questions related to the deceased individual’s medical condition (did the person have diarrhea, for example) and related to other factors surrounding the death (did the person die in an automotive accident, for example). VA has been widely used by researchers in Health and Demographic Surveillance Sites (HDSS), such as the INDEPTH Network (Sankoh and Byass, 2012), and has recently received renewed attention from the World Health Organization through the release of an update to the widely used standard VA questionnaire (see ). The main statistical challenge with VA data is to ascertain patterns in responses that correspond to a pre-defined set of causes of death. Typically the nature of such patterns is not known a priori and measurements are subject to various types of measurement error, which are discussed in further detail below.

There are two credible methods for automating the assignment of cause of death from VA data. InterVA (see for example: Byass et al., 2003, 2006, 2012) is a proprietary algorithm developed and maintained by Peter Byass, and it has been used extensively by both the ALPHA (Maher et al., 2010) and INDEPTH networks of HDSS sites and a wide variety of other research groups. At this time InterVA appears to be the de facto standard method. The Institute for Health Metrics and Evaluation (IHME) has proposed a number of additional methods (for example: Flaxman et al., 2011; James et al., 2011; Murray et al., 2011), some of which build on earlier work by King and Lu (King et al., 2010; King and Lu, 2008).
Among their methods, the Simplified Symptom Pattern Recognition (SSPR) method (Murray et al., 2011) is most directly comparable to InterVA and appears to have the best chance of becoming widely used. The SSPR and related methods require a so-called ‘gold standard’ database: a database consisting of a large number of deaths where the cause has been certified by medical professionals and is considered reliable, and further, where the symptoms for those deaths are also verifiable by medical professionals. Deaths recorded in a gold standard database are typically in-hospital deaths. Given regional variation in the prevalence of certain causes of death and in the familiarity of medical professionals with them, region-specific gold standard databases are also necessary. The majority of public health and epidemiology researchers and officials do not have access to such a gold standard database, motivating development of methods to infer cause of death using VA data without access to gold standard databases.

Aside from the challenges related to obtaining useful gold standard training data, using VA data to infer cause assignment is also statistically challenging because there are multiple sources of variation and error present in VA data. We identify three sources of variation and propose a novel method called
InSilicoVA to address these sources of variation. First, though responses on the VA instrument may be the respondent’s recollection or best guess about what has happened, they are not necessarily accurate. This type of error is present in all survey data but may be especially pronounced in VA data because respondents have varying levels of familiarity with the circumstances surrounding a death. Also, the definition of an event of interest may be different for each respondent. A question about diarrhea, for example, requires the respondent to be sufficiently involved in the decedent’s medical care to know this information and to have a definition of diarrhea that maps relatively well to accepted clinical standards. A second source of variability arises from individual variation in presentation of diseases. Statistical methods must be sufficiently robust to overfitting to appreciate that two individuals with the same cause of death may have slightly different presentations, and thus, different VA reports. Third, like InterVA, InSilicoVA will use physician-elicited conditional probabilities. These probabilities, representing the consensus estimate of the likelihood that a person will experience a given symptom if they died from a certain cause, are unknown. Simulation and results with data from the Agincourt MRC/Wits Rural Public Health and Health Transitions Research Unit (Agincourt) indicate that results obtained with conditional probabilities assumed fixed and known (as is done in both InterVA and the SSPR method) can underestimate uncertainty in population cause of death distributions. We evaluate the sensitivity to these prior probabilities.

InSilicoVA incorporates uncertainty from these sources and propagates the uncertainty between individual cause assignments and population distributions. Accounting for these sources of error produces confidence intervals for both individual cause assignments and population distributions.
These confidence intervals reflect more realistically the complexity of cause assignment with VA data. A unified framework for both individual cause and population distribution estimation also means that additional information about individual causes, such as physician-coded VAs, can easily be used in InSilicoVA, even if physician-coded records are only available for a subset of cases. Further, we can exactly quantify the contribution of each VA questionnaire item in classifying a case into a specific cause, affording the possibility for ‘item reduction’ by identifying which inputs are most useful for discriminating between causes. This feature could lead to a more efficient, streamlined survey mechanism and set of conditional probabilities for elicitation.

In Section 2 we describe InterVA and present open challenges which we address with InSilicoVA. In Section 3 we present the InSilicoVA model. Section 4 describes applications of both methods to simulated and real data, and Section 4.3 provides the results. We conclude with a very brief discussion in Section 5.
Below is Byass’ summary of the InterVA method presented in Byass et al. (2012).
Bayes theorem links the probability of an event happening given a particular circumstance with the unconditional probability of the same event and the conditional probability of the circumstance given the event. If the event of interest is a particular cause of death, and the circumstance is part of the events leading to death, then Bayes theorem can be applied in terms of circumstances and causes of death. Specifically, if there are a predetermined set of possible causes of death $C_1 \ldots C_m$ and another set of indicators $I_1 \ldots I_n$ representing various signs, symptoms and circumstances leading to death, then Bayes general theorem for any particular $C_i$ and $I_j$ can be stated as:

$$P(C_i \mid I_j) = \frac{P(I_j \mid C_i) \times P(C_i)}{P(I_j \mid C_i) \times P(C_i) + P(I_j \mid\, !C_i) \times P(!C_i)} \tag{1}$$

where $P(!C_i)$ is $(1 - P(C_i))$. Over the whole set of causes of death $C_1 \ldots C_m$ a set of probabilities for each $C_i$ can be calculated using a normalising assumption so that the total conditional probability over all causes totals unity:

$$P(C_i \mid I_j) = \frac{P(I_j \mid C_i) \times P(C_i)}{\sum_{i=1}^{m} P(C_i)} \tag{2}$$

Using an initial set of unconditional probabilities for causes of death $C_1 \ldots C_m$ (which can be thought of as $P(C_i \mid I_0)$) and a matrix of conditional probabilities $P(I_j \mid C_i)$ for indicators $I_1 \ldots I_n$ and causes $C_1 \ldots C_m$, it is possible to repeatedly apply the same calculation process for each $I_1 \ldots I_n$ that applies to a particular death:

$$P(C_i \mid I_{1 \ldots n}) = \frac{P(I_j \mid C_i) \times P(C_i \mid I_{0 \ldots n-1})}{\sum_{i=1}^{m} P(C_i \mid I_{0 \ldots n-1})} \tag{3}$$

This process typically results in the probabilities of most causes reducing, while a few likely causes are characterised by their increasing probabilities as successive indicators are processed.

In the same article Byass describes the process of defining the conditional probabilities $P(I_j \mid C_i)$.
Apart from the mathematics, the major challenge in building a probabilistic model covering all causes of death to a reasonable level of detail lies in populating the matrix of conditional probabilities $P(I_j \mid C_i)$. There is no overall source of data available which systematically quantifies probabilities of various signs, symptoms and circumstances leading to death in terms of their associations with particular causes. Therefore, these conditional probabilities have to be estimated from a diversity of incomplete sources (including previous InterVA models) and modulated by expert opinion. In the various versions of InterVA that have been developed, expert panels have been convened to capture clinical expertise on the relationships between indicators and causes. In this case, an expert panel convened in Geneva in December 2011 and continued to deliberate subsequently, particularly considering issues that built on previous InterVA versions. Experience has shown that gradations in levels of perceived probabilities correspond more to a logarithmic than linear scale, and in the expert consultation for InterVA-4, we used a perceived probability scale that was subsequently converted to numbers on a logarithmic scale as shown below.

Table 1: InterVA Conditional Probability Letter-Value Correspondences

Label   Value     Interpretation
I       1.0       Always
A+      0.8       Almost always
A       0.5       Common
A-      0.2
B+      0.1       Often
B       0.05
B-      0.02
C+      0.01      Unusual
C       0.005
C-      0.002
D+      0.001     Rare
D       0.0005
D-      0.0001
E       0.00001   Hardly ever
N       0.0       Never
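The letter-to-number conversion in Table 1 amounts to a lookup. The sketch below is illustrative only: the names `GRADE_TO_PROB` and `grades_to_matrix` are hypothetical, the "+" and "-" suffixes used to tell apart the three grades within each letter are an assumption, and the small grade table at the end is made up.

```python
# Letter grades and values as listed in Table 1. The "+"/"-" suffixes are
# an assumption about how the grades within each letter are distinguished.
GRADE_TO_PROB = {
    "I": 1.0,
    "A+": 0.8, "A": 0.5, "A-": 0.2,
    "B+": 0.1, "B": 0.05, "B-": 0.02,
    "C+": 0.01, "C": 0.005, "C-": 0.002,
    "D+": 0.001, "D": 0.0005, "D-": 0.0001,
    "E": 0.00001,
    "N": 0.0,
}


def grades_to_matrix(grade_rows):
    """Convert a K x N table of letter grades into numeric probabilities."""
    return [[GRADE_TO_PROB[grade] for grade in row] for row in grade_rows]


# Hypothetical elicited grades for 2 signs/symptoms (rows) and 3 causes.
example = grades_to_matrix([["A+", "C", "N"], ["B-", "I", "E"]])
```

Because each entry is converted on its own, nothing in this step forces the resulting numbers to be consistent with one another across related signs/symptoms.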
The physician-derived conditional probabilities that are supplied with the InterVA software (Byass, 2013) are coded using the letter codes in the leftmost column of Table 1.

We rewrite, interpret and discuss the InterVA model below.

• Deaths: $y_j$, $j \in \{1, \ldots, J\}$, $\vec{Y} = [y_1, \ldots, y_J]$
• Signs/symptoms: $s_k \in \{0, 1\}$, $k \in \{1, \ldots, K\}$, $\vec{S} = [s_1, \ldots, s_K]$
• Causes: $c_n$, $n \in \{1, \ldots, N\}$, $\sum_{n=1}^{N} c_{j,n} = 1$
• Fraction of all deaths that are cause $n$, the ‘cause-specific mortality fraction’ (CSMF): $f_n$, $n \in \{1, \ldots, N\}$, $\vec{F} = [f_1, \ldots, f_N]$, $\sum_{n=1}^{N} f_n = 1$
1. For each death $y_j$, a VA interview produces a binary-valued vector of signs/symptoms:

$$\vec{S}_j = \{s_{j,1}, s_{j,2}, \ldots, s_{j,K}\} \tag{4}$$

$S$ is the $J \times K$ matrix whose rows are the $\vec{S}_j$ for each death.

2. A $K \times N$ matrix of conditional probabilities reflecting physicians’ opinions about how likely a given sign/symptom is for a death resulting from a given cause:

$$P = \begin{bmatrix} \Pr(s_1 \mid c_1) & \Pr(s_1 \mid c_2) & \cdots & \Pr(s_1 \mid c_N) \\ \Pr(s_2 \mid c_1) & \Pr(s_2 \mid c_2) & \cdots & \Pr(s_2 \mid c_N) \\ \vdots & \vdots & \ddots & \vdots \\ \Pr(s_K \mid c_1) & \Pr(s_K \mid c_2) & \cdots & \Pr(s_K \mid c_N) \end{bmatrix} \tag{5}$$

As supplied with the InterVA software (Byass, 2013), $P$ does not contain internally consistent probabilities [1]. This is easy to understand by noting that these probabilities are not derived from a well-defined event space that would constrain them to be consistent with one another. As described by Byass above in Section 2.1, the physicians provide a ‘letter grade’ for each conditional probability, and these correspond to a ranking of perceived likelihood of a given sign/symptom if the death is due to a given cause. These letter grades are then turned into numbers in the range $[0, 1]$ (NB: 0.0 and 1.0 are included) using Table 1.

Consequently it is not possible to assume that the members of $P$ will behave as expected when one attempts to calculate complements and use more than one in an expression in a way that should be consistent.

[1] The $P$ supplied with InterVA has many logical inconsistencies, for example situations where conditional probabilities should add up to equal another: Pr(fast breathing for 2 weeks or longer | HIV) + Pr(fast breathing for less than 2 weeks | HIV) ≠ Pr(fast breathing | HIV), or where they just do not make sense: Pr(fast breathing for 2 weeks or longer | sepsis) > Pr(fast breathing | sepsis). The $P$ supplied with InterVA-4 (Byass, 2013) is a 254 × 69 matrix with 17,526 entries. We have investigated automated ways of correcting the inconsistencies, but with every attempt we discover more, so we have concluded that the entries in $P$ need to be re-elicited from physicians using an approach that ensures that they are consistent.

3. An initial guess of $\vec{F}$, $\vec{F}' = [f'_1, \ldots, f'_N]$

For a specific death $y_j$ we can imagine and examine the two-dimensional joint distribution $(c_{j,n}, s_{j,k})$:

$$\Pr(c_{j,n} \mid s_{j,k}) = \frac{\Pr(s_{j,k} \mid c_{j,n}) \cdot \Pr(c_{j,n})}{\Pr(s_{j,k})} = \frac{\Pr(s_{j,k} \mid c_{j,n}) \cdot \Pr(c_{j,n})}{\Pr(s_{j,k} \mid c_{j,n}) \cdot \Pr(c_{j,n}) + x} \tag{6}$$

where

$$x = \Pr(s_{j,k} \mid \neg c_{j,n}) \cdot \Pr(\neg c_{j,n}) \tag{7}$$

Looking on the RHS of (6), we have $\Pr(s_{j,k} \mid c_{j,n})$ from the conditional probabilities from physicians and $\Pr(c_{j,n}) \approx f'_n$. If the conditional probabilities $P$ were well-behaved, then

$$x = \sum_{n'=1,\ n' \neq n}^{N} \Pr(s_{j,k} \mid c_{j,n'}) \cdot \Pr(c_{j,n'}) \tag{8}$$

Because the conditional probabilities in $P$ supplied with the InterVA software (Byass, 2013) are not consistent with one another, this calculation does not produce useful results.

InterVA solves this with an arbitrary reformulation of the relationship. For each death $y_j$ and over all signs/symptoms $s_k$ associated with $y_j$:

$$\forall (j, n):\quad \mathrm{Propensity}(c_{j,n} \mid \vec{S}_j) = f'_n \cdot \prod_{k=1}^{K} \left[\Pr(s_{j,k} \mid c_n)\right]^{s_{j,k}} \tag{9}$$

For each death $y_j$ these ‘Propensities’ do not add to 1.0, so they need to be normalized to produce well-behaved probabilities:

$$\forall (j, n):\quad \Pr(c_{j,n}) = \frac{\mathrm{Propensity}(c_{j,n} \mid \vec{S}_j)}{\sum_{n'=1}^{N} \mathrm{Propensity}(c_{j,n'} \mid \vec{S}_j)} \tag{10}$$

The population-level CSMFs $\vec{F}$ are calculated by adding up the results of calculating (10) for all causes for all deaths:

$$\forall n:\quad f_n = \sum_{j=1}^{J} \Pr(c_{j,n}) \tag{11}$$

In effect what InterVA does is distribute a given death among a number of predefined causes. The cause with the largest fraction is assumed to be the primary cause, followed with decreasing significance by the remaining causes in order from largest to smallest. The conceptual construct of a ‘partial’ death is central to InterVA and is interchanged freely with the probability of dying from a given cause. This equivalence is not real and is at the heart of the theoretical problems with InterVA.

At a high level InterVA proposes a very useful solution to the fundamental challenge that all automated VA coding algorithms face: how to characterize and use the relationship that exists between signs/symptoms and causes of death.
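The calculation in (9) through (11) can be sketched in a few lines. This is a minimal illustration of the equations as written in this section, not Byass' implementation; the function names and the toy numbers are hypothetical.

```python
def interva_probabilities(symptoms, cond_prob, prior):
    """Equations (9) and (10) for a single death.

    symptoms  : list of K values in {0, 1}
    cond_prob : K x N nested list, cond_prob[k][n] = Pr(s_k | c_n)
    prior     : list of N initial guesses f'_n
    """
    propensity = list(prior)
    for k, s in enumerate(symptoms):
        if s == 1:  # an absent sign/symptom contributes a factor of 1 in (9)
            for n in range(len(prior)):
                propensity[n] *= cond_prob[k][n]
    total = sum(propensity)
    if total == 0.0:
        raise ValueError("all propensities are zero; cannot normalize")
    return [p / total for p in propensity]  # equation (10)


def interva_csmf(all_symptoms, cond_prob, prior):
    """Equation (11): add up the per-death probabilities for each cause."""
    totals = [0.0] * len(prior)
    for symptoms in all_symptoms:
        probs = interva_probabilities(symptoms, cond_prob, prior)
        totals = [t + p for t, p in zip(totals, probs)]
    return totals


# Toy inputs: 2 signs/symptoms, 2 causes, 3 deaths (all values made up).
cond_prob = [[0.8, 0.1], [0.2, 0.5]]
prior = [0.5, 0.5]
deaths = [[1, 0], [0, 1], [1, 1]]
per_death = [interva_probabilities(s, cond_prob, prior) for s in deaths]
totals = interva_csmf(deaths, cond_prob, prior)
```

Each death contributes a total of 1.0 spread across the causes, so the sums from (11) add up to the number of deaths; this is the ‘partial death’ bookkeeping described above. Dividing by the number of deaths would turn the sums into fractions.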
In a perfect world we would have medically certified patient records that include the results of ‘real’ autopsies, and we could use those to model this relationship and use the results of that model in our cause-assignment algorithms. But in that perfect world there is no use for VA at all. So by definition we live in the world where that type of ‘gold standard’ data do not and will not exist most of the time for most of the developing world where VAs are conducted. Byass’ solution to this is to accept the limitations on the expert knowledge that physicians can provide to substitute for gold standard data, and further, to elicit and then organize that information in a very useful format: the conditional probabilities matrix $P$ above in (5). In sum, Byass has sorted through a variety of possible general strategies and settled on one that is both doable and produces useful results.

Where we contribute is to help refine the statistical and computational methods used to conduct the cause assignments. In order to do that we have evaluated InterVA thoroughly, and we have identified a number of weaknesses that we feel can be addressed. The brief list below is by necessity a blunt description of those weaknesses, which nevertheless do not reduce the importance of InterVA as described just above.

There are several theoretical problems with the InterVA model:

1. The derivation presented in (1) through (3) is incorrect; in particular (2) does not follow from (1), and (3) does not follow from (2). As described just above, (11) requires that probabilities be equated with fractional deaths, which is conceptually difficult.

2. InterVA’s statistical model is not ‘probabilistic’ in the recognized sense because it does not include elements that can vary unpredictably, and hence there is no randomness.
Although $P$ contains ‘probabilities’ (see discussion with (5) above), these are not allowed to vary in the estimation procedure used to assign causes of death; $P$ is effectively a fixed input to the model.

3. Because the model does not contain features that are allowed to vary unpredictably, it is not possible to quantify uncertainty to produce probabilistic error bounds.

4. If we ignore the errors in the derivation and work with (9) - (11) as if they were correct, there are additional problems. Equation (9) is at the core of InterVA and demonstrates two undesirable features:

(a) For a specific individual the propensity for each cause is deterministically affected by $f'_n$, what Byass terms the ‘prior’ probability of cause $n$, effectively a user-defined parameter of the algorithm. This means that the final individual-level cause assignments are a deterministic transformation of the so-called ‘prior’, i.e. the results are not only sensitive to but depend directly on the ‘prior’.

(b) The expression in (9) captures only one valence in the relationship between signs/symptoms and a cause of death: it acknowledges and reacts only to the presence of a sign/symptom but not to its absence, effectively throwing away half of the information in the dataset. To include information conveyed by the absence of a sign/symptom, (9) needs a term that involves something like ‘$1 - \Pr(s_{j,k} \mid c_n)$’. [The components of InSilicoVA in (22) and (26) below include this term.] This is undesirable for two reasons: (1) signs/symptoms are not selecting causes that fit and de-selecting causes that don’t, but rather just de-selecting causes, and (2) the final probabilities are typically the product of a large number of very small numbers, and hence their numeric values can become extremely small, small enough to interact badly and unpredictably with the numerical storage/arithmetic capacity of the software and computers used to calculate them. A simple log transformation would solve this problem.

5. Finally, (11) is a deterministic transformation of the individual-level cause assignments to produce a population-level CSMF; it is simply another way of stating the individual-level cause assignments. Because the individual-level cause assignments are not probabilistic, neither are the resulting CSMFs.

In addition there are idiosyncrasies that affect the current implementation of InterVA (Byass, 2013) and some oddities having to do with the matrix of conditional probabilities provided with Byass’ InterVA. We will not describe those here.

InSilicoVA is designed to overcome these problems and provide a valid statistical framework on which further refinements can be built.

InSilicoVA is a statistical model and computational algorithm to automate assignment of cause of death from data obtained by VA interviews. Broadly the method aims to:

• Follow in the footsteps of InterVA, building on its strengths and addressing its weaknesses.
• Produce consistent, comparable cause assignments and CSMFs.
• Be statistically and computationally valid and extensible.
• Provide a means to quantify uncertainty.
• Be able to function to assign causes to a single death.

The name ‘InSilicoVA’ is inspired by ‘in-vitro’ studies that mimic real biology but in more controlled circumstances (often on ‘glass’ petri dishes). In this case we are assigning causes to deaths using a computer that performs the required calculations using a silicon chip. Further, we owe a great debt to the InterVA (interpret VA) method that provides useful philosophical and practical frameworks on which to build the new method, so we have stuck to the structure of InterVA’s name.

• Deaths: $y_j$, $j \in \{1, \ldots, J\}$, $\vec{Y} = [y_1, \ldots, y_J]$
• Signs/symptoms: $s_k \in \{0, 1\}$, $k \in \{1, \ldots, K\}$, $\vec{S} = [s_1, \ldots, s_K]$
• Causes of death: $c_n$, $n \in \{1, \ldots, N\}$, $\vec{C} = [c_1, \ldots, c_N]$
• For individual $j$, probability of cause $n$ given $\vec{S}_j$: $\ell_{j,n} = \Pr(y_j = c_n \mid \vec{S}_j)$, $j \in \{1, \ldots, J\}$, $n \in \{1, \ldots, N\}$, $\vec{L}_j = [\ell_{j,1}, \ldots, \ell_{j,N}]$, $\sum_{n=1}^{N} \ell_{j,n} = 1$
• Count of all deaths that are cause $n$, the ‘cause-specific death count’ (CSDC): $m_n$, $n \in \{1, \ldots, N\}$, $\vec{M} = [m_1, \ldots, m_N]$, $\sum_{n=1}^{N} m_n = J$
• Fraction of all deaths that are cause $n$, the ‘cause-specific mortality fraction’ (CSMF): $f_n$, $n \in \{1, \ldots, N\}$, $\vec{F} = [f_1, \ldots, f_N]$, $\sum_{n=1}^{N} f_n = 1$

3.2 InSilicoVA Data:
1. For each death $y_j$, the VA interview produces a binary-valued vector of signs/symptoms:

$$\vec{S}_j = \{s_{j,1}, s_{j,2}, \ldots, s_{j,K}\} \tag{12}$$

$S$ is the $J \times K$ matrix whose rows are the $\vec{S}_j$ for each death. The columns of $S$ are assumed to be independent given $\vec{C}$, i.e. there is no systematic relationship between the signs/symptoms for a given cause. This is very obviously not a justifiable assumption. Signs and symptoms come in characteristic sets depending on the cause of death, so there is some correlation between them, conditional on a given cause. Nonetheless we assume independence in order to facilitate initial construction and testing of our model, and most pragmatically, so that we can utilize the matrix of conditional probabilities supplied by Byass with the InterVA software (Byass, 2013); it is impossible to either regenerate or significantly improve upon these without significant resources with which to organize meetings of physicians with the relevant experience who can provide this information. This $\vec{S}_j$ is the same as (4) used by InterVA.

2. A $K \times N$ matrix of conditional probabilities reflecting physicians’ opinions about how likely a given sign/symptom is for a death resulting from a given cause:

$$P = \begin{bmatrix} \Pr(s_1 \mid c_1) & \Pr(s_1 \mid c_2) & \cdots & \Pr(s_1 \mid c_N) \\ \Pr(s_2 \mid c_1) & \Pr(s_2 \mid c_2) & \cdots & \Pr(s_2 \mid c_N) \\ \vdots & \vdots & \ddots & \vdots \\ \Pr(s_K \mid c_1) & \Pr(s_K \mid c_2) & \cdots & \Pr(s_K \mid c_N) \end{bmatrix} \tag{13}$$

InSilicoVA assumes that the components of $P$ are consistent with one another. In the simulation study described in Section 4.1, we construct consistent values for $P$, but when we test the model on real data in Section 4.2, we have no option other than using the inconsistent $P$ supplied with the InterVA software (Byass, 2013).

3. An initial guess of $\vec{F}$, $\vec{F}' = [f'_1, \ldots, f'_N]$

We are interested in the joint distribution $(\vec{F}, \vec{Y})$ given the set of observed signs/symptoms $S$.
The posterior distribution is:

$$\Pr(\vec{F}, \vec{Y} \mid S) = \frac{\Pr(S \mid \vec{Y}, \vec{F}) \Pr(\vec{Y} \mid \vec{F}) \Pr(\vec{F})}{\Pr(S)} \propto \Pr(S \mid \vec{Y}, \vec{F}) \Pr(\vec{Y} \mid \vec{F}) \Pr(\vec{F}) \tag{14}$$

$$= \prod_{j=1}^{J} \Pr(\vec{S}_j \mid y_j, \vec{F}) \Pr(y_j \mid \vec{F}) \Pr(\vec{F}) \tag{15}$$

Because individual cause assignments are independent, individual sign/symptom vectors $\vec{S}_j$ are independent from $\vec{F}$ (the CSMFs), and we have:

$$\Pr(\vec{F}, \vec{Y} \mid S) \propto \prod_{j=1}^{J} \Pr(\vec{S}_j \mid y_j) \Pr(y_j \mid \vec{F}) \Pr(\vec{F}) \tag{16}$$

We will use a Gibbs sampler to sample from this posterior as follows:

G.1 start with an initial guess of $\vec{F}$, $\vec{F}'$
G.2 sample $\vec{Y} \mid \vec{F}, S$
G.3 sample $\vec{F} \mid \vec{Y}, S$
G.4 repeat steps G.2 and G.3 until $\vec{F}$ and $\vec{Y}$ converge

This algorithm is generic and allows a rich range of models. For the moment the InSilicoVA model is:

$$s_{j,k} \mid c_n \sim \mathrm{Bernoulli}(\Pr(s_{j,k} \mid c_n)) \tag{17}$$

$$y_j = c_n \mid \vec{F} \sim \mathrm{Multinomial}_N(1, \vec{F}) \tag{18}$$

$$\vec{F} \sim \mathrm{Dirichlet}(\vec{\alpha}), \quad \vec{\alpha} \text{ is } N\text{-dimensional and constant} \tag{19}$$

Then the posterior in (16) is:

$$\Pr(\vec{F}, \vec{Y} \mid S) \propto \prod_{j=1}^{J} \prod_{k=1}^{K} \Pr(s_{j,k} \mid y_j = c_n) \Pr(y_j = c_n \mid \vec{F}) \Pr(\vec{F}) \tag{20}$$

This formulation is computationally efficient because of Multinomial/Dirichlet conjugacy, and because using Bayes rule we have, for step G.2:

$$y_j = c_n \mid \vec{F}, \vec{S}_j \sim \mathrm{Multinomial}_N(1, \vec{L}_j) \tag{21}$$

where the $\ell_{j,n}$ that compose $\vec{L}_j$ are:

$$\ell_{j,n} = \Pr(y_j = c_n \mid \vec{S}_j) = \frac{\Pr(y_j = c_n) \cdot \Pr(\vec{S}_j \mid y_j = c_n)}{\Pr(\vec{S}_j)}$$

Substituting $f_n = \Pr(y_j = c_n)$ and using the data $\vec{S}_j$ and $P$ to calculate the probability of a specific $\vec{S}_j$ given the cause assignment $y_j = c_n$:

$$\ell_{j,n} = \frac{f_n \cdot \prod_{k=1}^{K} \left( \Pr(s_{j,k} \mid y_j = c_n)^{s_{j,k}} \cdot \left[1 - \Pr(s_{j,k} \mid y_j = c_n)\right]^{(1 - s_{j,k})} \right)}{\sum_{n'=1}^{N} f_{n'} \cdot \prod_{k=1}^{K} \left( \Pr(s_{j,k} \mid y_j = c_{n'})^{s_{j,k}} \cdot \left[1 - \Pr(s_{j,k} \mid y_j = c_{n'})\right]^{(1 - s_{j,k})} \right)} \tag{22}$$

We can also derive from (20) the distribution of $\vec{F}$ conditional on $\vec{Y}$, for step G.3:

$$\vec{F} \mid \vec{Y}, S \sim \mathrm{Dirichlet}(\vec{M} + \vec{\alpha}) \tag{23}$$

where

$$m_n = \sum_{j=1}^{J} [y_j = c_n], \quad \text{using Iverson bracket notation: } [z] = \begin{cases} 1 & \text{if } z \text{ true} \\ 0 & \text{if } z \text{ false} \end{cases} \tag{24}$$

In summary, the Gibbs sampler proceeds given suitable initialization $\vec{F}'$ by:

G.2 sampling a cause for each death to generate a new $\vec{Y} \mid \vec{F}, S$:

$$y_j = c_n \mid \vec{F}, \vec{S}_j \sim \mathrm{Multinomial}_N(1, \vec{L}_j) \tag{25}$$

where

$$\ell_{j,n} = \frac{f_n \cdot \prod_{k=1}^{K} \left( \Pr(s_{j,k} \mid y_j = c_n)^{s_{j,k}} \cdot \left[1 - \Pr(s_{j,k} \mid y_j = c_n)\right]^{(1 - s_{j,k})} \right)}{\sum_{n'=1}^{N} f_{n'} \cdot \prod_{k=1}^{K} \left( \Pr(s_{j,k} \mid y_j = c_{n'})^{s_{j,k}} \cdot \left[1 - \Pr(s_{j,k} \mid y_j = c_{n'})\right]^{(1 - s_{j,k})} \right)} \tag{26}$$

G.3 sampling a new $\vec{F} \mid \vec{Y}, S$:

$$\vec{F} \mid \vec{Y}, S \sim \mathrm{Dirichlet}(\vec{M} + \vec{\alpha}) \tag{27}$$

The resulting sample of $(\vec{F}, \vec{Y})$ and the $\vec{L}_j$ that go with it form the output of the method. These are distributions of CSMFs at the population level and probabilities of dying from each cause at the individual level. These distributions can be summarized as required to produce point values and measures of uncertainty in $\vec{F}$ and $\vec{L}_j$.

Deaths often result from more than one cause. InSilicoVA accommodates this possibility by producing a separate distribution of the probabilities of being assigned to each cause; that is $N$ distributions, one for each cause.
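Steps G.1 through G.3 can be sketched with the Python standard library alone. This is an illustration under stated assumptions rather than the authors' R implementation: the function names are hypothetical, the uniform starting value for $\vec{F}'$ is an arbitrary choice, (26) is evaluated on the log scale as an implementation choice (which requires every entry of $P$ to lie strictly between 0 and 1), and the toy data are made up.

```python
import math
import random


def cause_probabilities(symptoms, cond_prob, csmf):
    """Equation (26) for one death, computed on the log scale.

    cond_prob[k][n] = Pr(s_k | c_n); assumes 0 < cond_prob[k][n] < 1.
    """
    log_w = []
    for n, f in enumerate(csmf):
        total = math.log(f)
        for k, s in enumerate(symptoms):
            p = cond_prob[k][n]
            total += math.log(p) if s == 1 else math.log(1.0 - p)
        log_w.append(total)
    top = max(log_w)  # subtract the maximum before exponentiating
    weights = [math.exp(v - top) for v in log_w]
    norm = sum(weights)
    return [w / norm for w in weights]


def gibbs(all_symptoms, cond_prob, alpha, n_iter, rng):
    """Alternate steps G.2 and G.3; returns the list of sampled CSMF vectors."""
    n_causes = len(alpha)
    csmf = [1.0 / n_causes] * n_causes  # G.1: initial guess (assumed uniform)
    draws = []
    for _ in range(n_iter):
        # G.2: draw a cause for every death, equation (25)
        counts = [0] * n_causes
        for symptoms in all_symptoms:
            probs = cause_probabilities(symptoms, cond_prob, csmf)
            cause = rng.choices(range(n_causes), weights=probs)[0]
            counts[cause] += 1
        # G.3: draw F from Dirichlet(M + alpha), equation (27), by
        # normalizing independent Gamma(m_n + alpha_n, 1) variates
        gammas = [rng.gammavariate(m + a, 1.0) for m, a in zip(counts, alpha)]
        total = sum(gammas)
        csmf = [g / total for g in gammas]
        draws.append(csmf)
    return draws


# Toy data: 2 signs/symptoms, 2 causes, 40 deaths (all values made up).
rng = random.Random(0)
cond_prob = [[0.9, 0.1], [0.2, 0.7]]
deaths = [[1, 0]] * 30 + [[0, 1]] * 10
draws = gibbs(deaths, cond_prob, alpha=[1.0, 1.0], n_iter=200, rng=rng)
kept = draws[100:]  # discard the first half as burn-in (arbitrary choice)
mean_f = [sum(d[n] for d in kept) / len(kept) for n in range(2)]
```

The retained draws in `kept` are a sample of CSMF vectors; summarizing them, for example by means and quantiles, yields the point values and uncertainty measures described above.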
Whereas InSilicoVA yields one distribution per cause, InterVA reports a single value for each cause, and those values sum to unity across causes for a single death.

Finally, with a suitable $\vec{F}$, InSilicoVA can be used to assign causes (and their associated $\ell_{j,n}$) to a single death by repeatedly drawing causes using (25). This requires no more information than InterVA to accomplish the same objective, and it produces uncertainty bounds around the probabilities of being assigned to each cause.

To evaluate both InSilicoVA and InterVA we fit them to simulated and real data. We have created R code that implements both methods. The R code for InterVA matches the results produced by Peter Byass’ implementation (Byass, 2013).

4.1 Simulation Study
Our simulated data are generated using this procedure:

1. Draw a set of $\Pr(s_k \mid c_n)$ so that they have the same distribution and range as those provided with Byass’ InterVA software (Byass, 2013).
2. Draw a set of simulated deaths from a made-up distribution of deaths by cause.
3. For each simulated death, assign a set of signs/symptoms by applying the conditional probabilities simulated in step 1.

These simulated data have the same overall features as the data required for either InterVA or InSilicoVA, and we know both the true population distribution of deaths by cause and the true individual cause assignments.

Our simulation study poses three questions:

1. Fair comparison of InSilicoVA and InterVA. To make this comparison we generate 100 simulated datasets and apply both methods to each dataset. We summarize the results with individual-level and population-level error measures. We refer to this as ‘fair’ because we apply both methods in their simplest form to data that fulfill all the requirements of both methods. Since the data are effectively ‘perfect’ we expect both methods to perform well.

2. Influence of numeric values of $\Pr(s_k \mid c_n)$. Given the structure of InterVA, we are concerned that the results of InterVA may be sensitive to the exact numerical values taken by the conditional probabilities supplied by physicians. To test this we rescale the simulated $\Pr(s_k \mid c_n)$ so that their range is restricted to a narrower sub-interval of $[0, 1]$.

3. Reporting errors. As we mentioned in the introduction, we are concerned about reporting errors for any algorithmic approach to VA cause assignment. To investigate this we randomly recode our simulated signs/symptoms so that a small fraction are ‘wrong’, i.e. coded 0 when the sign/symptom exists (15%) or 1 when there is no sign/symptom (10%). We do this for 100 simulated datasets and summarize errors resulting from application of both methods.
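The three-step data-generating procedure and the reporting-error recoding can be sketched as follows. Several choices here are assumptions made only for illustration: the function names are hypothetical, the conditional probabilities are drawn uniformly from the Table 1 values (excluding 0.0 and 1.0) as a stand-in for "the same distribution and range" as InterVA's, and the 15% and 10% figures are treated as independent per-entry recoding probabilities.

```python
import random

# Table 1 values without the endpoints. Sampling uniformly from this list
# is an assumption; the text does not specify the sampling scheme.
TABLE_VALUES = [0.8, 0.5, 0.2, 0.1, 0.05, 0.02, 0.01,
                0.005, 0.002, 0.001, 0.0005, 0.0001, 0.00001]


def simulate(n_deaths, n_symptoms, true_csmf, rng):
    """Generate conditional probabilities, true causes and signs/symptoms."""
    n_causes = len(true_csmf)
    # Step 1: draw Pr(s_k | c_n) for every sign/symptom and cause
    cond_prob = [[rng.choice(TABLE_VALUES) for _ in range(n_causes)]
                 for _ in range(n_symptoms)]
    # Step 2: draw the true cause of each death from the chosen CSMF
    causes = rng.choices(range(n_causes), weights=true_csmf, k=n_deaths)
    # Step 3: draw each sign/symptom as a Bernoulli given the true cause
    symptoms = [[1 if rng.random() < cond_prob[k][c] else 0
                 for k in range(n_symptoms)] for c in causes]
    return cond_prob, causes, symptoms


def add_reporting_errors(symptoms, rng, miss=0.15, false_pos=0.10):
    """Recode present signs/symptoms to 0 with probability `miss` and
    absent ones to 1 with probability `false_pos`."""
    noisy = []
    for row in symptoms:
        new_row = []
        for s in row:
            flip = rng.random() < (miss if s == 1 else false_pos)
            new_row.append(1 - s if flip else s)
        noisy.append(new_row)
    return noisy


rng = random.Random(1)
cond_prob, causes, symptoms = simulate(200, 20, [0.5, 0.3, 0.2], rng)
noisy_symptoms = add_reporting_errors(symptoms, rng)
```

Because `causes` holds the true assignments and the CSMF passed to `simulate` is known, both individual-level and population-level errors can be computed for any method applied to `symptoms` or `noisy_symptoms`.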
To investigate the behavior of InSilicoVA and InterVA on real data, we apply both methods to the VA data generated by the Agincourt health and demographic surveillance system (HDSS) in South Africa (Kahn et al., 2012) from roughly 1993 to the present. The Agincourt site continuously monitors the population of 21 villages located in the Bushbuckridge District of Mpumalanga Province in northeast South Africa. This is a rural population living in what was during Apartheid a black ‘homeland’. The Agincourt HDSS was established in the early 1990s with the purpose of guiding the reorganization of South Africa’s health system. Since then the goals of the HDSS have evolved and now it contributes to evaluation of national policy at population, household and individual levels. The population covered by the Agincourt site is approximately eighty thousand.

For this test we use the physician-generated conditional probabilities $P$ and initial guess of the CSMFs $\vec{F}'$ provided by Byass with the InterVA software (Byass, 2013).

The results of our simulation study are summarized graphically in Figures 1 – 3. The Agincourt results are presented in Figure 4.
InSilicoVA begins to solve most of the critical problems with InterVA. The results of applying both methods to simulated data indicate that InSilicoVA performs well under all circumstances except ‘reporting errors’, but even in this situation InSilicoVA performs far better than InterVA. InSilicoVA and InterVA both perform relatively well when the simulated data are perfect. InSilicoVA’s performance is not affected by changing the magnitudes and ranges of the conditional probability inputs, whereas InterVA’s performance suffers dramatically. With reporting errors both methods’ performance is negatively impacted, but InterVA becomes effectively useless.

Applied to one specific real dataset, both methods produce qualitatively similar results, but InSilicoVA is far more conservative and produces confidence bounds, whereas InterVA does not. For Agincourt, Figure 4.A shows the causes with the largest difference between the InSilicoVA and InterVA estimates of the CSMF. InSilicoVA classifies a larger portion of deaths as due to causes labeled as ‘other’. This indicates that these causes are related to either the communicable or non-communicable diseases, but there is not enough information to make a more specific classification. This feature of InSilicoVA identifies cases that are difficult to classify using available data and may, for example, be good candidates for physician review.

We view this behavior as a strength of InSilicoVA because it is consistent with the fundamental weakness of the VA approach, namely that both the information obtained from a VA interview and the expert knowledge and/or gold standard used to characterize the relationship between signs/symptoms and causes are inherently weak and incomplete, and consequently it is very difficult or impossible to make highly specific cause assignments using VA. Given this, we do not want a method that is artificially precise, i.e. one that forces fine-tuned classification when there is insufficient information. Hence we view InSilicoVA’s behavior as reasonable, ‘honest’ (in that it does not over-interpret the data) and useful, in the sense that it identifies where our information is particularly weak and therefore where we need to apply more effort either to data or to interpretation, such as additional physician reviews.
We plan a variety of additional work on InSilicoVA:

1. Explore the possibility of replacing the Dirichlet distribution in (19), (23) and (27) with a mixture of Normals on the baseline-logit-transformed set of $f_n$’s. This provides additional flexible parameters that allow each CSMF to have its own mean and variance.
2. Embed InSilicoVA in a spatio-temporal model that allows $\vec{F}$ to vary smoothly through space and time. This would provide a parsimonious way of exploring spatio-temporal variation in the CSMFs while using the data as efficiently as possible.
3. Create the ability to add physician cause assignments to (22) and (26) so that information in that form can be utilized when available. The physician codes will require pre-processing to remove physician-specific bias in cause assignment, perhaps using a ‘rater reliability method’ (for example: Salter-Townshend and Murphy, 2012).
4. Most importantly, address the obviously invalid assumption that the signs/symptoms are independent given a specific cause. This will require modeling of the signs/symptoms and the physician-provided conditional probabilities so that important dependencies can be accommodated. Further, this will require additional consultation with physicians and acquisition of new expert knowledge to characterize these dependencies. All of this will require a generous grant and the collaboration of a large number of experts. This will very likely greatly improve the performance and robustness of the method.
5. Critically, re-elicit the conditional probabilities P from physicians so that they are logically well-behaved, i.e. fully consistent with one another and their complements.
6. Focus and sharpen the VA questionnaire. Quantify the influence of each sign/symptom to: (1) potentially eliminate low-value signs/symptoms and thereby make the VA interview more efficient, and/or (2) suggest sign/symptom ‘types’ that appear particularly useful, and potentially suggest augmenting VA interviews based on that information.
7. Explore new possibilities for refining the conditional probabilities P and potentially for entirely new models.

[Figure 1 plot. Panel A: individual accuracy (vertical axis: accuracy) for InSilicoVA and InterVA. Panel B: CSMF estimation (vertical axis: mean absolute error) for InSilicoVA and InterVA.]

Figure 1: Simulation setup 1: ‘Fair Comparison’. A: InSilicoVA assigns the correct cause of death effectively 100% of the time. InterVA is less accurate in assigning individual causes of death. B: InSilicoVA’s errors in identifying CSMFs are consistently very small. InterVA’s errors are also generally small, but the distribution has a long tail in the direction of large errors; sometimes InterVA’s errors are large.

[Figure 2 plot. Panel A: individual accuracy (vertical axis: accuracy) for InSilicoVA and InterVA. Panel B: CSMF estimation (vertical axis: mean absolute error) for InSilicoVA and InterVA.]

Figure 2: Simulation setup 2: ‘Conditional Probabilities in the range [0 . − . ]’. A: InSilicoVA assigns the correct cause of death effectively 100% of the time. InterVA assigns the correct cause of death 80% of the time, with wide variation: as low as 60% and never above about 90%. B: InSilicoVA’s errors in identifying CSMFs are consistently very small. InterVA’s errors in identifying the CSMFs are larger and more variable.

[Figure 3 plot. Panel A: individual accuracy (vertical axis: accuracy) for InSilicoVA and InterVA. Panel B: CSMF estimation (vertical axis: mean absolute error) for InSilicoVA and InterVA.]

Figure 3: Simulation setup 3: ‘Reporting Errors’.
Both methods suffer, but InterVA suffers far more. A: InSilicoVA assigns the correct cause of death about 70% of the time. InterVA assigns the correct cause of death about 40% of the time. B: InSilicoVA’s errors in identifying CSMFs are still consistently small. InterVA’s errors in identifying the CSMFs are larger.

[Figure 4 plot. Panel A, ‘Most discrepancy’: InSilico − InterVA (horizontal axis from −20% to 20%) for Other and unspecified neoplasms; Diabetes mellitus; Diarrhoeal diseases; Stroke; Other and unspecified infect dis; Chronic obstructive pulmonary dis; Digestive neoplasms; Pulmonary tuberculosis; Sepsis (non-obstetric); Other and unspecified NCD. Panel B, ‘Within 4%’: InSilico − InterVA (horizontal axis from −4% to 4%) for Intentional self-harm; Road traffic accident; Asthma; Breast neoplasms; Assault; Meningitis and encephalitis; Other and unspecified cardiac dis; Reproductive neoplasms MF; Other and unspecified external CoD; Acute abdomen. Panel C, ‘Estimated cause-specific mortality fractions’: estimated CSMF (horizontal axis from 0.00 to 0.20) from InterVA and InSilicoVA for the same causes as panel A.]

Figure 4: Agincourt Application. A: Differences in CSMFs (InSilicoVA − InterVA), displaying the 10 specific causes that differ most. InSilicoVA is less willing to make highly specific classifications and thus produces larger CSMFs associated with less-specific causes and smaller CSMFs associated with more specific causes. B: Same as (A) but for the 10 largest differences in CSMFs that are still less than 4%, which as a group includes HIV even though it is not in the top 10 that are plotted here. C: The CSMFs produced by both models on their natural scale, for the same 10 causes as in A, for which the differences are greatest.

References

Byass, P. (2013). InterVA software.

Byass, P., D. Chandramohan, S. J. Clark, L. D’Ambruoso, E. Fottrell, W. J. Graham, A. J. Herbst, A. Hodgson, S. Hounton, K. Kahn, et al. (2012). Strengthening standardised interpretation of verbal autopsy data: the new InterVA-4 tool.
Global Health Action 5.

Byass, P., E. Fottrell, D. L. Huong, Y. Berhane, T. Corrah, K. Kahn, L. Muhe, et al. (2006). Refining a probabilistic model for interpreting verbal autopsy data. Scandinavian Journal of Public Health 34(1), 26–31.

Byass, P., D. L. Huong, and H. Van Minh (2003). A probabilistic approach to interpreting verbal autopsies: methodology and preliminary validation in Vietnam. Scandinavian Journal of Public Health 31(62 suppl), 32–37.

Flaxman, A. D., A. Vahdatpour, S. Green, S. L. James, C. J. Murray, and Population Health Metrics Research Consortium (2011). Random forests for verbal autopsy analysis: multisite validation study using clinical diagnostic gold standards. Popul Health Metr 9(29).

James, S. L., A. D. Flaxman, C. J. Murray, and Population Health Metrics Research Consortium (2011). Performance of the tariff method: validation of a simple additive algorithm for analysis of verbal autopsies. Popul Health Metr 9(31).

Kahn, K., M. A. Collinson, F. X. Gómez-Olivé, O. Mokoena, R. Twine, P. Mee, S. A. Afolabi, B. D. Clark, C. W. Kabudula, A. Khosa, et al. (2012). Profile: Agincourt health and socio-demographic surveillance system. International Journal of Epidemiology 41(4), 988–1001.

King, G. and Y. Lu (2008). Verbal autopsy methods with multiple causes of death. Statistical Science 100(469).

King, G., Y. Lu, and K. Shibuya (2010). Designing verbal autopsy studies. Popul Health Metr 8(19).

Leitao, J., D. Chandramohan, P. Byass, R. Jakob, K. Bundhamcharoen, and C. Choprapowan (2013). Revising the WHO verbal autopsy instrument to facilitate routine cause-of-death monitoring. Global Health Action 6(21518).

Maher, D., S. Biraro, V. Hosegood, R. Isingo, T. Lutalo, P. Mushati, B. Ngwira, M. Nyirenda, J. Todd, and B. Zaba (2010). Translating global health research aims into action: the example of the ALPHA network*. Tropical Medicine & International Health 15(3), 321–328.

Murray, C. J., S. L. James, J. K. Birnbaum, M. K. Freeman, R. Lozano, A. D. Lopez, and Population Health Metrics Research Consortium (2011). Simplified symptom pattern method for verbal autopsy analysis: multisite validation study using clinical diagnostic gold standards. Popul Health Metr 9(30).

Salter-Townshend, M. and T. B. Murphy (2012). Sentiment analysis of online media. In Lausen, B., Van den Poel, D. and Ultsch, A. (eds.), Algorithms from and for Nature and Life. Studies in Classification, Data Analysis, and Knowledge Organization.

Sankoh, O. and P. Byass (2012). The INDEPTH Network: filling vital gaps in global epidemiology.