Conflict diagnostics for evidence synthesis in a multiple testing framework
Anne M. Presanis, David Ohlssen, Kai Cui, Magdalena Rosinska, Daniela De Angelis
October 16, 2018
Medical Research Council Biostatistics Unit, University of Cambridge, U.K.
Novartis Pharmaceuticals Corporation, East Hanover, NJ, U.S.A.
Department of Epidemiology, National Institute of Public Health, National Institute of Hygiene, Warsaw, Poland
e-mail: [email protected]
Abstract
Evidence synthesis models that combine multiple datasets of varying design, to estimate quantities that cannot be directly observed, require the formulation of complex probabilistic models that can be expressed as graphical models. An assessment of whether the different datasets synthesised contribute information that is consistent with each other, and in a Bayesian context, with the prior distribution, is a crucial component of the model criticism process. However, a systematic assessment of conflict suffers from the multiple testing problem, through testing for conflict at multiple locations in a model. We demonstrate the systematic use of conflict diagnostics, while accounting for the multiple hypothesis tests of no conflict at each location in the graphical model. The method is illustrated by a network meta-analysis to estimate treatment effects in smoking cessation programs and an evidence synthesis to estimate HIV prevalence in Poland.
KEYWORDS: Conflict; evidence synthesis; graphical models; model criticism; multiple testing; network meta-analysis.
Evidence synthesis refers to the use of complex statistical models that combine multiple, disparate and imperfect sources of evidence to estimate quantities on which direct information is unavailable or inadequate (e.g. Ades and Sutton, 2006; Welton et al., 2012; De Angelis et al., 2014). Such evidence synthesis models are typically graphical models represented by a directed acyclic graph (DAG) $G(V, E)$, where $V$ and $E$ are sets of nodes and edges respectively, encoding conditional independence assumptions (Lauritzen, 1996). With increased computational power, models of the form of $G(V, E)$ have proliferated, requiring also the development of model criticism tools adapted to the challenges of evidence synthesis. In a Bayesian framework, any of the prior distribution, the assumed form of the likelihood and structural and functional assumptions may conflict with the observed data or with each other. To assess the consistency of each of these components, various mixed- or posterior-predictive checks have been proposed. In particular, the “conflict p-value” (Marshall and Spiegelhalter, 2007; Gåsemyr and Natvig, 2009; Presanis et al., 2013; Gåsemyr, 2016) is a diagnostic calculated by splitting $G(V, E)$ into two independent sub-graphs (“partitions”) at a particular “separator” node $\phi$, to measure the consistency of the information provided by each partition about the node (a “node-split”). Gåsemyr and Natvig (2009) and Presanis et al. (2013) demonstrate how the conflict p-value may be evaluated in different contexts, including both one- and two-sided hypothesis tests, and Gåsemyr (2016) demonstrates the uniformity of the conflict p-value in a wide range of models.

The conflict p-value may be used in a targeted manner, searching for conflict at particular nodes in a DAG. However, in complex evidence syntheses, often the location of potential conflict may be unclear.
A systematic assessment of conflict throughout a DAG is then required to locate problem areas (e.g. Krahn et al., 2013). Such systematic assessment, however, suffers from the multiple testing problem, either through testing for conflict at each node in $G(V, E)$ or through the separation of $G(V, E)$ into more than two partitions to simultaneously test for conflict between each pair-wise partition. Here we account for these multiple tests by adopting the general hypothesis testing framework of Hothorn et al. (2008) and Bretz et al. (2011), allowing for simultaneous multiple hypotheses in a parametric setting. They propose different possible tests to account for multiplicity: we concentrate here on maximum-T type tests.

In Section 2, we define evidence synthesis before introducing the particular models that motivate our work on systematic conflict assessment: a network meta-analysis and a model for estimating HIV prevalence. Section 3 describes the methods we use to test for conflict and account for the multiple tests we perform. We apply these methods to our examples in Section 4 and end with a discussion in Section 5.

Formally, our goal is to estimate $K$ basic parameters $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_K)$ given a collection of $N$ independent data sources $\boldsymbol{y} = (\boldsymbol{y}_1, \ldots, \boldsymbol{y}_N)$, where each $\boldsymbol{y}_i$, $i \in 1, \ldots, N$, may be a vector or array of data points. Each $\boldsymbol{y}_i$ provides information on a functional parameter $\psi_i$ (or potentially a vector of functions $\boldsymbol{\psi}_i$). When $\psi_i = \theta_k$ is the identity function, the data $\boldsymbol{y}_i$ are said to directly inform $\theta_k$. Otherwise, $\psi_i = \psi_i(\boldsymbol{\theta})$ is a function of multiple parameters in $\boldsymbol{\theta}$: the $\boldsymbol{y}_i$ therefore provide indirect information on these parameters. Given the conditional independence of the datasets $\boldsymbol{y}_i$, the likelihood is $L(\boldsymbol{\theta}; \boldsymbol{y}) = \prod_{i=1}^{N} L_i(\psi_i(\boldsymbol{\theta}); \boldsymbol{y}_i)$, where $L_i(\psi_i(\boldsymbol{\theta}); \boldsymbol{y}_i)$ is the likelihood contribution of $\boldsymbol{y}_i$ given the basic parameters $\boldsymbol{\theta}$.
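As a rough illustration of this factorisation, the following Python sketch evaluates the log-likelihood as a sum of independent contributions, one per data source. The three Binomial data sources, their counts and the functional forms are hypothetical and chosen only for illustration; they are not taken from either motivating example.

```python
import math

def binom_loglik(y, n, p):
    """Binomial log-likelihood of y successes out of n trials with probability p."""
    log_choose = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return log_choose + y * math.log(p) + (n - y) * math.log(1.0 - p)

# Hypothetical data sources: (successes, trials, functional parameter psi_i(theta)).
DATA = [
    (12, 100, lambda th: th[0]),         # y_1 directly informs theta_1 (identity)
    (30, 200, lambda th: th[1]),         # y_2 directly informs theta_2 (identity)
    (5, 250, lambda th: th[0] * th[1]),  # y_3 indirectly informs theta_1 and theta_2
]

def log_likelihood(theta):
    """log L(theta; y) = sum over i of log L_i(psi_i(theta); y_i)."""
    return sum(binom_loglik(y, n, psi(theta)) for y, n, psi in DATA)

total = log_likelihood((0.12, 0.15))
```

The third source shows how a dataset can carry information on several basic parameters at once, which is what makes conflict between sources possible.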
In a Bayesian context, for a prior distribution $p(\boldsymbol{\theta})$, the posterior distribution $p(\boldsymbol{\theta} \mid \boldsymbol{y}) \propto p(\boldsymbol{\theta}) L(\boldsymbol{\theta}; \boldsymbol{y})$ summarises all information, direct and indirect, on $\boldsymbol{\theta}$. Let $\boldsymbol{\psi} = (\psi_1, \ldots, \psi_N)$ be the set of functional parameters informed by data and $\boldsymbol{\phi} = \{\boldsymbol{\theta}, \boldsymbol{\psi}\}$ be the set of all unknown quantities, whether basic or functional. In this setup, the DAG $G(V, E)$ representing the evidence synthesis model has a set of nodes $V = \{\boldsymbol{\phi}, \boldsymbol{y}\}$ representing either known or unknown quantities; and the directed edges $E$ represent dependencies between nodes. Each ‘child’ node is independent of its ‘siblings’ conditional on their direct ‘parents’. The joint distribution of all nodes $V$ is the product of the conditional distributions of each node given its direct parents. An example DAG of an evidence synthesis model is given in Figure 1(i). Circles denote unknown quantities: either basic parameters $\boldsymbol{\theta}$ that are ‘founder’ nodes at the top of a DAG having a prior distribution (double circles); or functional parameters $\boldsymbol{\psi}$. Squares denote observed quantities, solid arrows represent stochastic distributional relationships, and dashed arrows represent deterministic functional relationships. This DAG could be extended to more complex hierarchical priors and models, where repetition over variables is represented by ‘plates’, rounded rectangles around the repeated nodes, labelled by the range of repetition. In general, the set $V$ may be larger than the set of basic and functional parameters, including also other intermediate nodes in the DAG, for example unit-level parameters in a hierarchical model. For brevity, from here on we will abbreviate any DAG to the notation $G(\boldsymbol{\phi}, \boldsymbol{y})$.

Network meta-analysis (NMA) is a specific type of evidence synthesis (Salanti, 2012) that generalises meta-analysis from the synthesis of studies measuring a treatment effect (e.g. of treatment B versus treatment A in a randomised clinical trial), to the synthesis of data on more than two treatment arms.
The studies included in the NMA may not all measure the same treatment effects, but each study provides data on at least two of the treatments. For example, considering a set of treatments $\{A, B, C, D\}$, the network of trials may consist of studies of different “designs”, i.e. with different subsets of the treatments included in each trial (Jackson et al., 2014), such as $\{ABC, ABD, BD, CD\}$. As with meta-analysis, NMA models can be implemented in either a two-stage or single-stage approach, as described more comprehensively elsewhere (Salanti, 2012; Jackson et al., 2014). Here we concentrate on a single-stage approach, where the original data $Y^{J}_{di}$ for each treatment $J$ of study $i$ of design $d$ are available. A full likelihood model specifies

$$Y^{J}_{di} \sim f(p^{J}_{di} \mid w^{J}_{di})$$

for some distribution $f(\cdot)$ and treatment outcome $p^{J}_{di}$ with associated information $w^{J}_{di}$. For example, if the data are numbers of events out of total numbers at risk of the event, then $w^{J}_{di}$ might be the denominator for treatment $J$. We might assume the data are realisations of a Binomial random variable, $Y^{J}_{di} \sim \mathrm{Bin}(w^{J}_{di}, p^{J}_{di})$, where the proportion $p^{J}_{di}$ is a function of a study-specific baseline $\alpha_{di}$ representing a design/study-specific baseline treatment $B_d$ and a study-specific treatment contrast (log odds ratio) $\mu^{B_d J}_{di}$, through a logistic model, $\mathrm{logit}(p^{J}_{di}) = \alpha_{di} + \mu^{B_d J}_{di}$. The intercept is $\alpha_{di} = \mathrm{logit}(p^{B_d}_{di})$. To complete the model specification requires parameterisation of the treatment effects $\mu^{AJ}_{di}$. A common effect model, for a network-wide reference treatment $A$, is given by

$$\mu^{AJ}_{di} = \eta^{AJ} \qquad (1)$$

for each $J \neq A$, i.e. assumes that all studies of all designs measure the same treatment effects. The $\eta^{AJ}$ are basic parameters, of which there are the number of treatments in the network minus 1, representing the relative effectiveness of treatment $J$ compared to the network baseline treatment $A$. All other contrasts $\eta^{JK}$, $J, K \neq A$, are functional parameters, defined by assuming a set of consistency equations $\eta^{JK} = \eta^{AK} - \eta^{AJ}$ for each $J, K \neq A$. These equations define a transitivity property of the treatment effects. The extension to a random-effects model, still under the consistency assumption, implies

$$\mu^{AJ}_{di} = \eta^{AJ} + \beta^{AJ}_{di} \qquad (2)$$

where usually the random effects $\beta^{AJ}_{di}$, reflecting between-study heterogeneity, are assumed normally distributed around 0, with a covariance structure defined as a square matrix $\Sigma_\beta$ such that all entries on the leading diagonal are $\sigma^2_\beta$ and all remaining entries are $\sigma^2_\beta / 2$. The set of basic parameters is denoted $\boldsymbol{\eta}_b = (\eta^{AJ})_{J \neq A}$ and the corresponding set of functional parameters is denoted $\boldsymbol{\eta}_f = (\eta^{JK} = \eta^{AK} - \eta^{AJ})_{J, K \neq A}$. Note that the common-effect model is a special case of the random-effects model. In the Bayesian paradigm, we specify prior distributions for the basic parameters $\boldsymbol{\eta}_b$, the (nuisance) study-specific baselines $\alpha_{di}$, and in the case of the random treatment effects model, the common standard deviation parameter $\sigma_\beta$ in terms of which the variance-covariance matrix $\Sigma_\beta$ is defined.
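The consistency equations and the logistic link can be sketched in a few lines of Python. The numerical values of the basic parameters and of the baseline below are hypothetical, and the helper names `eta` and `arm_probability` are ours, introduced only for illustration; the sketch covers the common-effect model (1), where the study-level contrast equals the network-level contrast.

```python
import math

def expit(x):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical basic parameters eta^{AJ}: log odds ratios versus reference treatment A.
eta_basic = {"B": 0.4, "C": 0.8, "D": 1.1}

def eta(j, k):
    """Log odds ratio of K versus J from the consistency equation eta^{JK} = eta^{AK} - eta^{AJ}."""
    e = dict(eta_basic, A=0.0)  # eta^{AA} = 0 by definition
    return e[k] - e[j]

def arm_probability(alpha, baseline, arm):
    """Common-effect model: logit(p) = alpha + eta^{baseline, arm}."""
    return expit(alpha + eta(baseline, arm))

eta_bc = eta("B", "C")                                    # functional parameter
p_c = arm_probability(alpha=-2.0, baseline="B", arm="C")  # event probability in arm C
```

Because every functional contrast is a difference of two basic parameters, transitivity holds automatically: `eta("B", "D")` equals `eta("B", "C") + eta("C", "D")` up to floating-point error.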
Note that any change in parameterisation of the model, for example changing treatment labels, will affect the joint prior distribution, making invariance challenging or even impossible in a Bayesian setting.

A smoking cessation example
Dias et al. (2010), amongst many others (Lu and Ades, 2006; Higgins et al., 2012; Jackson et al., 2015), considered an NMA of studies of smoking cessation. The network consists of 24 studies of 8 different designs, including 2 three-arm trials. Four smoking cessation counselling programs are compared (Figure 2): A no intervention; B self-help; C individual counselling; D group counselling. The data (Supplementary Material Table A.1) are the number of individuals out of those participating who have successfully ceased to smoke at 6-12 months after enrollment. Here we fit the common- and random-effect models under a consistency assumption and diffuse priors: Normal(0, ) on the log-odds scale for $\boldsymbol{\eta}_b$ and $\alpha_{di}$; and Uniform(0, 5) for $\sigma_\beta$. We find (Supplementary Material Table A.2) that the deviance information criterion (DIC; Spiegelhalter et al., 2002) prefers the random-effect model, suggesting it is necessary to explain the heterogeneity in the network. The estimates of the treatment effects from the random-effect model are both somewhat different and more uncertain than those from the common-effect model, agreeing with estimates found by others, including Dias et al. (2010). Moreover, the posterior expected deviance for the random-effect model, $E_{\theta \mid y}(D) = 54$, is slightly larger than the number of observations (50), suggesting still some lack of fit to the data.

A single node-split model
This residual lack of fit and the general potential in NMA for variability between groups of direct and indirect information from multiple studies that is in excess of between-study heterogeneity (“inconsistency”; Lu and Ades, 2006) has motivated various approaches to the detection and resolution of inconsistency (Lumley, 2002; Lu and Ades, 2006; Dias et al., 2010; Higgins et al., 2012; White et al., 2012; Jackson et al., 2014). Dias et al. (2010) apply the idea of node-splitting, based on Marshall and Spiegelhalter (2007), to the NMA context, splitting a single mean treatment effect $\eta^{JK}$ in the random effects consistency model (2). A DAG is partitioned into direct evidence from studies directly comparing $J$ and $K$ versus indirect evidence from all remaining studies. Specifically, for any study $i$ of design $d$ that directly compares $J$ and $K$, the study-specific treatment effect is expressed in terms of the direct treatment effect: $\mu^{JK}_{di} = \eta^{JK}_{Dir} + \beta^{JK}_{di}$; whereas the indirect version of the treatment effect is estimated from the remaining studies via the consistency equation: $\eta^{JK}_{Ind} = \eta^{AK} - \eta^{AJ}$. The posterior distribution of the contrast or inconsistency parameter $\delta^{JK} = \eta^{JK}_{Dir} - \eta^{JK}_{Ind}$ is then examined to check posterior support for the null hypothesis $\delta^{JK} = 0$.

Multiple node-splits
Although the single node-split approach in Dias et al. (2010) has been extended to automate the generation of different single node-splitting models for conflict assessment (van Valkenhoef et al., 2016), the simultaneous splitting of multiple nodes in an NMA has not yet been considered. In Section 4.1, we use multiple splits to investigate conflict in the smoking cessation network beyond heterogeneity, accounting for the multiplicity.
As further illustration of systematic conflict detection, we consider an evidence synthesis approach to estimating HIV prevalence in Poland, among the exposure group of men who have sex with men (MSM) (Rosinska et al., 2016). The data aggregated to the national level are given in Supplementary Material Table A.3. There are three basic parameters to be estimated: the proportion of the male population who are MSM, $\rho$; the prevalence of HIV infection in the MSM group, $\pi$; and the proportion of those infected who are diagnosed, $\kappa$ (Figure 3(a)).

Likelihood
The total population of Poland, N = 15 , , $y_1, \ldots, y_5$ directly inform, respectively: $\rho$; prevalence of diagnosed infection $\pi\kappa$; prevalence of undiagnosed infection $\pi(1 - \kappa)$; and lower ($D_L$) and upper ($D_U$) bounds for the number of diagnosed infections $D = N\rho\pi\kappa$ (Figure 3(a), Supplementary Material Table A.3). These data are modelled independently as either Binomial ($y_1, y_2, y_3$) or Poisson ($y_4, y_5$).

Priors
The number diagnosed $D$ is constrained a priori to lie between the stochastic bounds $D_L$ and $D_U$, which in turn are given vague log-normal priors. Since $D$ is already defined as a function of the basic parameters, the constraint is implemented via introduction of an auxiliary Bernoulli datum of observed value 1, with probability parameter given by a functional parameter $c = \Pr(D_L \leq D \leq D_U)$ (Figure 3(a)). The basic parameters $\rho$, $\pi$ and $\kappa$ are given independent uniform prior distributions on $[0, 1]$.

Exploratory model criticism
This initial analysis reveals a lack of fit to some of the data (Supplementary Material Table A.3), with particularly high posterior mean deviances for the data informing $\rho$ and $\pi\kappa$. This lack of fit in turn may suggest the existence of conflict in the DAG (Spiegelhalter et al., 2002). In Rosinska et al. (2016), conflict between evidence sources was not directly considered or formally measured, instead resolving the lack of fit by modelling potential biases in the data in a series of sensitivity analyses. By contrast, in Section 4.2 we systematically assess the consistency of evidence coming from the prior model and from each likelihood contribution, by splitting the DAG at each functional parameter (Figure 3(b)).

Briefly, as in Presanis et al. (2013), consider partitioning a DAG $G(\boldsymbol{\phi}, \boldsymbol{y})$ into two independent partitions, at a separator node $\phi$. The separator could either be a founder node, i.e. a basic parameter, or a node internal to the DAG, and is split into two copies $\phi_a$ and $\phi_b$, one in each partition (Figure 1(ii,iii)). Suppose that partition $G(\boldsymbol{\phi}_a, \boldsymbol{y}_a)$ contains the data vector $\boldsymbol{y}_a$ and provides inference resulting in a posterior distribution $p(\phi_a \mid \boldsymbol{y}_a)$, and that similarly partition $G(\boldsymbol{\phi}_b, \boldsymbol{y}_b)$ results in $p(\phi_b \mid \boldsymbol{y}_b)$. The aim is to assess the null hypothesis that $\phi_a = \phi_b$. For $\phi$ taking discrete values, we can directly evaluate $p(\phi_a = \phi_b \mid \boldsymbol{y}_a, \boldsymbol{y}_b)$. If the support of $\phi$ is continuous, we consider the posterior probability of $\delta = h(\phi_a) - h(\phi_b)$, where $h(\cdot)$ is a function that transforms $\phi$ to a scale for which a uniform prior is appropriate. The two-sided “conflict p-value” is defined as

$$c = 2 \times \min\left\{ \Pr\{p_\delta(\delta \mid \boldsymbol{y}_a, \boldsymbol{y}_b) < p_\delta(0 \mid \boldsymbol{y}_a, \boldsymbol{y}_b)\},\; 1 - \Pr\{p_\delta(\delta \mid \boldsymbol{y}_a, \boldsymbol{y}_b) < p_\delta(0 \mid \boldsymbol{y}_a, \boldsymbol{y}_b)\} \right\},$$

where $p_\delta$ is the posterior density of the difference $\delta$, so that the smaller $c$ is, the greater the conflict.
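To make the idea concrete, here is a minimal Python sketch of a sample-based conflict p-value. It does not evaluate the density-based definition directly: it uses the tail-area shortcut $2 \times \min\{\Pr(\delta < 0), \Pr(\delta > 0)\}$, which is an assumption on our part and is only appropriate when the posterior of $\delta$ is roughly symmetric and unimodal. The posterior draws are simulated and purely hypothetical, standing in for MCMC output from two independent partitions.

```python
import random

def conflict_p_value(delta_samples):
    """Two-sided tail-area conflict p-value from posterior draws of delta."""
    n = len(delta_samples)
    below = sum(1 for d in delta_samples if d < 0) / n  # estimate of Pr(delta < 0)
    return 2.0 * min(below, 1.0 - below)

rng = random.Random(1)
# Hypothetical posterior draws of h(phi) from three independent partitions.
phi_a = [rng.gauss(0.0, 1.0) for _ in range(20000)]
phi_b = [rng.gauss(0.2, 1.0) for _ in range(20000)]  # broadly agrees with phi_a
phi_c = [rng.gauss(5.0, 1.0) for _ in range(20000)]  # strongly disagrees with phi_a

c_agree = conflict_p_value([a - b for a, b in zip(phi_a, phi_b)])
c_conflict = conflict_p_value([a - c for a, c in zip(phi_a, phi_c)])
```

Pairing draws element-wise is valid here only because the two partitions are independent, so any pairing of their draws is a draw from the joint posterior of the two copies.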
Generalising now to multiple tests of conflict, suppose that $G(\boldsymbol{\phi}, \boldsymbol{y})$ is partitioned into $Q$ independent sub-graphs, $G_1(\boldsymbol{\phi}_1, \boldsymbol{y}_1), \ldots, G_Q(\boldsymbol{\phi}_Q, \boldsymbol{y}_Q)$, where each disjoint subset of the data $\boldsymbol{y}_q$, $q \in 1, \ldots, Q$, is chosen to identify part of the basic parameter space $\boldsymbol{\theta}_q = (\theta_{q1}, \ldots, \theta_{q b_q})$, where $b_q$ is the number of basic parameters in partition $q$. Note that $\boldsymbol{\theta}_q \subset \boldsymbol{\phi}_q$ for each $q \in 1, \ldots, Q$, whereas the complementary subset $\boldsymbol{\phi}_q \setminus \boldsymbol{\theta}_q$ consists of functional and other non-basic parameters. To test the consistency of information provided by each partition about a set of $J$ separator nodes $(\phi^{(s)}_1, \ldots, \phi^{(s)}_J) \subseteq \boldsymbol{\phi}$ from the original model, a set of contrasts $\boldsymbol{\delta}_j = (\delta_{j1}, \ldots, \delta_{j C_j})$ is formed for each $j \in 1, \ldots, J$, one contrast per pair of partitions in which $\phi_j$ appears. A maximum of $\binom{Q}{2}$ contrasts are possible for each separator, i.e. $C_j \leq \binom{Q}{2}$. Each contrast $\delta_{jc}$ is defined as

$$\delta_{jc} = h_j(\phi_{j q_A} \mid \boldsymbol{y}_A) - h_j(\phi_{j q_B} \mid \boldsymbol{y}_B)$$

for the pair of partitions $c = \{q_A, q_B\}$ and node-split copies $\{\phi_{j q_A}, \phi_{j q_B}\}$. The functions $h_j(\cdot)$ transform the separator nodes $\{\phi_{j q_A}, \phi_{j q_B}\}$ to an appropriate scale for a uniform (Jeffreys) prior to be applicable, if either is a founder node in either partition.

Denote the separator nodes in each partition by $\boldsymbol{\phi}^{(s)}_q = \{\phi_{jq},\; j \in 1, \ldots, m_q\}$, $q \in 1, \ldots, Q$, where $m_q \leq J$ is the number of separator nodes in partition $q$. Writing these nodes as a stacked vector $\boldsymbol{\phi}_S = (\boldsymbol{\phi}^{(s)}_1, \ldots, \boldsymbol{\phi}^{(s)}_Q)^T = (\phi_{11}, \ldots, \phi_{m_1 1}, \phi_{12}, \ldots, \phi_{m_2 2}, \ldots, \phi_{1Q}, \ldots, \phi_{m_Q Q})^T$, and the transformed version as $\boldsymbol{\phi}_H = h(\boldsymbol{\phi}_S)$, the total set of contrasts is $\boldsymbol{\Delta} = (\boldsymbol{\delta}_1, \ldots, \boldsymbol{\delta}_J)^T = C^T_\Delta \boldsymbol{\phi}_H$, where $C^T_\Delta$ is a contrast matrix. Note that not every separator node necessarily appears in every partition, so although $\boldsymbol{\phi}_H$ has maximum length $J \times Q$, in practice, its length $m = \sum_{q=1}^{Q} m_q \leq J \times Q$.
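A small Python sketch may help to see how the contrast matrix is assembled from the list of partitions in which each separator appears. The function name `contrast_matrix`, the separator labels and the numerical values are hypothetical; the sketch simply forms one row per pair of partitions sharing a separator, with entries $+1$ and $-1$ on the two node-split copies.

```python
from itertools import combinations

def contrast_matrix(appearances):
    """Build C^T: one row per pairwise contrast, one column per node-split copy.

    `appearances` maps each separator label to the partitions holding a copy of it.
    """
    # Stack the copies partition by partition, mirroring the stacked vector phi_S.
    columns = sorted((q, j) for j, parts in appearances.items() for q in parts)
    index = {col: i for i, col in enumerate(columns)}
    rows, labels = [], []
    for j, parts in appearances.items():
        for q_a, q_b in combinations(sorted(parts), 2):
            row = [0] * len(columns)
            row[index[(q_a, j)]] = 1    # copy in partition q_A
            row[index[(q_b, j)]] = -1   # copy in partition q_B
            rows.append(row)
            labels.append((j, q_a, q_b))
    return rows, labels, columns

def apply_contrasts(rows, phi_h):
    """Compute Delta = C^T phi_H for a stacked, transformed vector phi_H."""
    return [sum(c * x for c, x in zip(row, phi_h)) for row in rows]

# Hypothetical layout: phi1 appears in partitions 1-3, phi2 only in partitions 1-2.
rows, labels, columns = contrast_matrix({"phi1": [1, 2, 3], "phi2": [1, 2]})
delta = apply_contrasts(rows, [0.5, 1.0, 0.7, 1.0, 0.1])
```

In this layout there are five node-split copies and four contrasts (three for `phi1`, one for `phi2`), so the matrix has four rows and five columns.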
The contrast matrix $C^T_\Delta$ therefore has dimension $p \times m$, so that it maps from the space of the $m$ separator nodes (including node-split copies) to that of the $p = \sum_{j=1}^{J} C_j$ contrasts. A test for consistency of the information in each partition may be expressed as a test of the null hypothesis that

$$H_0 : \boldsymbol{\Delta} = C^T_\Delta \boldsymbol{\phi}_H = \boldsymbol{0} \qquad (3)$$

Using standard asymptotic theory (Bernardo and Smith, 1994; see also the derivation in Supplementary Material Appendix B), it can be shown that if the joint posterior distribution of all parameters $\boldsymbol{\phi}$ in all partitions is asymptotically multivariate normal (i.e. if the prior is flat enough relative to the likelihood), and if $\partial \boldsymbol{\Delta}(\boldsymbol{\phi}) / \partial \boldsymbol{\phi} = C^T_\Delta$ is non-singular with continuous entries, then the posterior mean of $\boldsymbol{\Delta}$ is $\bar{\boldsymbol{\Delta}} = C^T_\Delta \bar{\boldsymbol{\phi}}_H \overset{a}{\approx} C^T_\Delta \hat{\boldsymbol{\phi}}_H$ and the posterior variance-covariance matrix of $\boldsymbol{\Delta}$ is $S_\Delta \overset{a}{\approx} C^T_\Delta V_H C_\Delta$, where: $\hat{\boldsymbol{\phi}}_H$ is the maximum likelihood estimate of $\boldsymbol{\phi}_H$; the matrix $V_H = J_h(\hat{\boldsymbol{\phi}}_S)^T V_S J_h(\hat{\boldsymbol{\phi}}_S)$; $J_h(\hat{\boldsymbol{\phi}}_S)$ is the Jacobian of the transformation $h(\boldsymbol{\phi}_S)$; and $V_S$ is a block diagonal matrix consisting of the inverse observed information matrices for the separator nodes in each partition along the diagonal. The posterior summaries $\bar{\boldsymbol{\Delta}}$ and $S_\Delta$, i.e. the Bayes estimator under a mean-squared error Bayes risk function and corresponding variance-covariance matrix, may therefore be used under the general simultaneous inference framework of Hothorn et al. (2008) and Bretz et al. (2011) to construct a multiplicity-adjusted test that $\boldsymbol{\Delta} = \boldsymbol{0}$.

Given the estimator $\bar{\boldsymbol{\Delta}}$ and corresponding variance-covariance matrix $S_\Delta$, define a vector of test statistics $T_n = D_n^{-1/2}(\bar{\boldsymbol{\Delta}} - \boldsymbol{\Delta})$, where $n$ is the dimension of the data $\boldsymbol{y}$ and $D_n = \mathrm{diag}(S_\Delta)$. Then it can be shown (Hothorn et al., 2008; Bretz et al., 2011) that $T_n$ tends in distribution to a multivariate normal distribution, $T_n \overset{a}{\sim} N_m(\boldsymbol{0}, R)$, where $R := D_n^{-1/2} S_\Delta D_n^{-1/2} \in \mathbb{R}^{m,m}$ is the posterior correlation matrix for the vector (length $m$) of contrasts $\boldsymbol{\Delta}$.
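As a rough numerical illustration of how these summaries are used, the Python sketch below standardises a posterior mean vector into z-scores, forms the correlation matrix $R$, and then approximates multiplicity-adjusted p-values of the max-T type by simulating the null distribution of the largest absolute z-score (the max-T construction itself is described in the next paragraph). Several things here are assumptions on our part: the posterior mean and covariance values are hypothetical, $R$ is assumed positive definite so that a Cholesky factor exists, and plain Monte Carlo simulation is swapped in for the numerical integration of the multivariate normal distribution.

```python
import math
import random

def standardise(delta_bar, s_delta):
    """Return z-scores D^{-1/2} delta_bar and correlation R = D^{-1/2} S D^{-1/2}."""
    sd = [math.sqrt(s_delta[i][i]) for i in range(len(delta_bar))]
    z = [d / s for d, s in zip(delta_bar, sd)]
    r = [[s_delta[i][j] / (sd[i] * sd[j]) for j in range(len(sd))]
         for i in range(len(sd))]
    return z, r

def cholesky(a):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def max_t_p_values(z, r, n_sim=20000, seed=1):
    """Monte Carlo estimate of P(Z_max > |z_k|) when the z-scores are N(0, R)."""
    rng = random.Random(seed)
    low = cholesky(r)
    n = len(z)
    maxima = []
    for _ in range(n_sim):
        e = [rng.gauss(0.0, 1.0) for _ in range(n)]
        draw = [sum(low[i][k] * e[k] for k in range(i + 1)) for i in range(n)]
        maxima.append(max(abs(x) for x in draw))
    return [sum(1 for m in maxima if m > abs(zk)) / n_sim for zk in z]

# Hypothetical posterior mean and covariance of three contrasts.
delta_bar = [0.5, 2.4, -0.3]
s_delta = [[1.0, 0.3, 0.0], [0.3, 1.0, 0.2], [0.0, 0.2, 1.0]]
z, r = standardise(delta_bar, s_delta)
unadjusted = [math.erfc(abs(zk) / math.sqrt(2.0)) for zk in z]  # two-sided normal p-values
adjusted = max_t_p_values(z, r)
```

Each adjusted value is at least as large as its unadjusted counterpart, since the maximum of several absolute z-scores exceeds a threshold more often than any single one does.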
Under the null hypothesis (3), $T_n = D_n^{-1/2} \bar{\boldsymbol{\Delta}} \overset{a}{\sim} N_m(\boldsymbol{0}, R)$, and hence, assuming $S_\Delta$ is fixed and known, the authors show that a global $\chi^2$-test of conflict can be formulated:

$$X^2 = T_n^T R^{+} T_n \xrightarrow{d} \chi^2(\mathrm{Rank}(R))$$

where the superscript $+$ denotes the Moore-Penrose inverse of the corresponding matrix and $\mathrm{Rank}(R)$ is the degrees of freedom. Importantly, it is also possible to construct multiply-adjusted local (individual) conflict tests, based on the $m$ $z$-scores corresponding to $T_n$ and the null distribution of the maximum of these, $Z_{\max}$ (Hothorn et al., 2008; Bretz et al., 2011). This latter null distribution is obtained by integrating the limiting $m$-dimensional multivariate normal distribution over $[-z, z]$ to obtain the cumulative distribution function $P(Z_{\max} \leq z)$. The individual conflict p-values are then calculated as $P(|z_k| < Z_{\max})$, $k \in 1, \ldots, m$, with a corresponding global conflict p-value (an alternative to the $\chi^2$-test) given by $P(|z_{\max}| < Z_{\max})$.

Examples
We now illustrate the idea of systematic multiple node-splitting to assess conflict in our two motivating examples. All analyses were carried out in OpenBUGS 3.2.2 (Lunn et al., 2009) and R 3.2.3 (R Core Team, 2015). We use the R2OpenBUGS package (Sturtz et al., 2005) to run OpenBUGS from within R and the multcomp package (Bretz et al., 2011) to carry out the simultaneous local and global max-T tests.

Consider first a NMA in general, and for simplicity, assume there are no multi-arm trials and a common-effect model (equation (1)) for the data. The basic parameters $\boldsymbol{\eta}_b$ form a spanning tree of the network of evidence (Figure 2), i.e. a graph with no cycles, such that each node in the network can be reached from every other node, either directly or indirectly through other nodes (van Valkenhoef et al., 2012). Multiple possible partitionings of the evidence network exist, so a choice must be made (Figure 2). Suppose the spanning tree $\boldsymbol{\eta}_b$ is identifiable by a set of evidence $\boldsymbol{Y}_b$ containing outcomes from all trials designed to directly estimate the treatment effects in $\boldsymbol{\eta}_b$. Then every treatment effect is identifiable from $\boldsymbol{Y}_b$, by definition of a spanning tree and the fact that each treatment effect represented by edges outside the spanning tree is a functional parameter in the set $\boldsymbol{\eta}_f$, equal to a linear combination of the basic parameters. The data $\boldsymbol{Y}_b$ therefore indirectly inform the functional parameters $\boldsymbol{\eta}_f$, whereas the remaining data, $\boldsymbol{Y}_f = \boldsymbol{Y} \setminus \boldsymbol{Y}_b$, directly inform $\boldsymbol{\eta}_f$. A comparison between the direct and indirect evidence on $\boldsymbol{\eta}_f$ is therefore possible, to assess conflict between the two types of evidence. The network is split into two partitions, $\{\boldsymbol{\eta}^{Dir}_f, \boldsymbol{Y}_f\}$ (the “direct evidence partition”, DE) and $\{\boldsymbol{\eta}^{Ind}_f, \boldsymbol{Y}_b\}$ (the “spanning tree partition”, ST), and the direct and indirect versions of the functional parameters compared: $\boldsymbol{\Delta} = \boldsymbol{\eta}^{Dir}_f - \boldsymbol{\eta}^{Ind}_f$. Depending on the studies that are in the DE partition, the basic parameters $\boldsymbol{\eta}_b$ may also be weakly identifiable in the DE partition, due to prior information.
Since a NMA model may be formulated as a DAG, this Direct/Indirect partitioning is equivalent to a multi-node split in the DAG at the functional parameters (Supplementary Material Figure A.2).

Generalising now to more complex situations, if the direct data $\boldsymbol{Y}_f$ form a sub-network of evidence, the question arises of whether these data should be split into further partitions, by identifying a spanning tree for the sub-network. Then the vector $\boldsymbol{\Delta}$ of contrasts to test would involve comparisons between more than two partitions, e.g. for three partitions:

$$\boldsymbol{\Delta} = \left( \boldsymbol{\eta}^{(1)}_f - \boldsymbol{\eta}^{(2)}_f,\; \boldsymbol{\eta}^{(1)}_f - \boldsymbol{\eta}^{(3)}_f,\; \boldsymbol{\eta}^{(2)}_f - \boldsymbol{\eta}^{(3)}_f \right)^T$$

If we now consider a random-effects rather than common-effect model (equation (2)), a decision must be made on how to handle the variance components in $\Sigma_\beta$. One approach would be to split the variance components simultaneously with the means, so that $\boldsymbol{\Delta}$ also includes contrasts for the variances. Alternatively, if the variance components are not well identified by the evidence in a partition, a common variance component could be assumed. Such commonality could potentially allow for feedback between partitions, since they would not be fully independent (Marshall and Spiegelhalter, 2007; Presanis et al., 2013).

Finally, for multi-arm trials, the key consideration is that multi-arm studies should have internal consistency, and hence their observations should not be split between partitions. A choice must therefore be made whether to initially include multi-arm data in the ST data $\boldsymbol{Y}_b$, in the DE data $\boldsymbol{Y}_f$, or in a third partition of their own. In the latter case, any study-specific treatment effect $\mu^{JK}_{di}$, where $d$ is a multi-arm design, could be compared at least with the ST partition, where $\eta^{JK}$ is definitely identified. Potentially, it could also be compared simultaneously with the DE partition, if the edge $JK$ is identifiable in the DE partition. The comparison can be made even if $JK$ is not identifiable, or only weakly identifiable from the prior, but if the prior is diffuse, then no conflict will be detected due to the uncertainty. Such a comparison is not therefore particularly meaningful, unless we are interested in prior-data conflict.
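The choice of spanning tree determines which edges become functional parameters, and this bookkeeping is easy to check programmatically. The Python sketch below verifies that a candidate edge set is a spanning tree and lists the edges left outside it. The helper names are hypothetical and the check is a generic union-find test, not code from the analyses reported here.

```python
from itertools import combinations

def is_spanning_tree(treatments, edges):
    """True if `edges` connect all treatments without forming a cycle."""
    if len(edges) != len(treatments) - 1:
        return False
    parent = {t: t for t in treatments}

    def find(t):
        # Follow parent links up to the representative of t's component.
        while parent[t] != t:
            t = parent[t]
        return t

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:  # joining two already-connected nodes closes a cycle
            return False
        parent[root_a] = root_b
    return True

def functional_edges(treatments, tree):
    """Edges outside the spanning tree, i.e. contrasts that are functional parameters."""
    in_tree = {frozenset(e) for e in tree}
    return [a + b for a, b in combinations(sorted(treatments), 2)
            if frozenset((a, b)) not in in_tree]

treatments = ["A", "B", "C", "D"]
tree_1 = [("A", "B"), ("A", "C"), ("A", "D")]
tree_2 = [("A", "B"), ("A", "C"), ("B", "D")]
outside_1 = functional_edges(treatments, tree_1)
outside_2 = functional_edges(treatments, tree_2)
```

With four treatments, any spanning tree has three edges and leaves three of the six possible pairwise comparisons as functional parameters.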
Smoking cessation example

To illustrate concretely the above issues, we consider first the spanning tree $(AB, AC, AD)$ corresponding to the parameters $\boldsymbol{\eta}_b = \{\eta^{AB}, \eta^{AC}, \eta^{AD}\}$ for the smoking cessation example. Figures 2(b-d) demonstrate different ways of splitting the evidence based on this spanning tree, depending on how we treat the evidence from multi-arm trials. In Figures 2(b,c), we consider just two partitions, with the multi-arm evidence either left in the ST partition $\{\boldsymbol{\eta}^{Ind}_f, \boldsymbol{Y}_b\}$ or included in the DE partition $\{\boldsymbol{\eta}^{Dir}_f, \boldsymbol{Y}_f\}$, respectively. We compare the direct and indirect evidence on each of the edges or treatment comparisons $(BC, BD, CD)$. In Figure 2(d), we consider a series of spanning trees ($(AB, AC, AD)$, $(BC, BD)$ and $(CD)$), together with a final partition consisting of evidence from multi-arm trials, resulting in four partitions.

We also consider an alternative choice of spanning tree, $(AB, AC, BD)$, as in Figures 2(e,f). In these two models, we again make a choice between including the multi-arm evidence in either the ST or DE partitions and compare the evidence in each partition on edges $(AD, BC, CD)$. In all cases, we assume random heterogeneity effects and make the choice to assume common variance components across the partitions, splitting only the means.

Table 1 gives posterior mean (sd) estimates of the treatment effects (log odds ratios) for edges outside the spanning tree, from each partition, where the subscript 1 denotes the ST partition and 2 denotes the DE partition for the two-partition models (b,c,e,f). For the four-partition model (d), 1-3 denote the sequential spanning tree partitions and 4 the multi-arm trial partition. Also given, for each edge outside the original spanning tree, are the posterior mean (sd) differences between partitions and both the local and global posterior probabilities of no difference, adjusted for the multiple tests and their correlation. First, note that the global test of no conflict varies by model, and hence by what partitions of evidence are compared with each other: the posterior probability of no conflict in model (b) is 94 . . 4% and 27.4% for models (c) and (e). These latter two models appear to detect some mild evidence of conflict, despite the large uncertainty in many of the partition-specific treatment effect estimates, with several of the posterior standard deviations of the same order of magnitude as the corresponding posterior means, if not larger. The DIC is also slightly smaller for the two models (c) and (e) which detect potential conflict, compared to those that do not. This lack of invariance of the global test to the partitions employed suggests it is not enough to rely on a single node-splitting model to search for conflict in a DAG. Moreover, it motivates looking at local tests for conflict in different node-splitting models, to locate the specific items of evidence that may conflict with each other.

A closer look at the local posterior probabilities of no conflict for each edge outside the initial spanning tree reveals that the potential conflict detected by models (c) and (e) involves edges including treatment $D$ (posterior probabilities 17.8% and 18.6% for edges $BD$ and $CD$ in model (c), 12.4% and 10.5% for edges $AD$ and $CD$ in model (e)). Each of these four local tests involves a partition where the estimated treatment effect for the relevant edge is implausibly large ( > > 400 on the odds ratio scale) and where the sample sizes of the studies involved are small (e.g. studies 7, 20, 23 and 24 in Supplementary Material Table A.1).

Unlike models (c), (e) and (f), where in both partitions, each sub-network spans all 4 treatments, in models (b) and (d), the spanning tree chosen, $(AB, AC, AD)$, is such that for each sub-network outside the spanning tree, not all the treatments are included (Figure 2). This results in a lack of identifiability for the basic parameters $\boldsymbol{\eta}_b$ in partition 2 of model (b) and in partitions 2 and 3 of model (d) (Table 1), where their estimates are dominated by their diffuse prior distribution (Normal(0, ) on the log odds ratio scale). There is therefore no potential for detecting conflict about the basic parameters $\boldsymbol{\eta}_b$, only about the functional parameters $\boldsymbol{\eta}_f$.

The different results obtained from each of the five models are understandable, since each model partitions the evidence in a different way, and the detection of conflict relies on the conflicting evidence being in different rather than the same partitions. However, where the same evidence is in the same partition for different models (for example, the evidence directly informing the $AC$ edge in models (c) and (d)), approximately the same estimate is reached in each model, as expected (0 . . 26) in model (c), 0 . . 28) in model (d), Table 1).
Figure 3(b) demonstrates the multiple node-splits we make to systematically assess conflict in the original DAG of Figure 3(a), separating out the contributions of the prior model and each likelihood contribution. These node-splits result in 5 partitions, with 6 contrasts to test for equality to zero. Denoting the nodes in the “prior” partition (above the red arrows in Figure 3(b)) by the subscript $p$ and the nodes in each “likelihood” partition (below the red arrows in Figure 3(b)) by $d$, the vector of contrasts to test is then

\[
\Delta = \bigl( h(\rho_p) - h(\rho_d),\; h(\pi_p \kappa_p) - h([\pi\kappa]_d),\; h(\pi_p(1-\kappa_p)) - h([\pi(1-\kappa)]_d),\; g(D_{L_p}) - g(D_{L_d}),\; g(D_{U_p}) - g(D_{U_d}),\; g(D_p) - g(D_d) \bigr)^T
\]

where $h(\cdot)$ and $g(\cdot)$ denote the logit and log functions respectively. These contrasts are represented by the red dot-dashed arrows in Figure 3(b). In the “prior” partition, the priors given to the basic parameters are those of the original model (Section 2.2). In each “likelihood” partition, the basic parameters are given Jeffreys’ priors so that the posteriors represent only the likelihood. These priors are Beta($1/2$, $1/2$) for the proportions and $p(D_{B_d}) \propto 1/D_{B_d}^{1/2}$ for the lower and upper bounds ($B = L, U$) for $D$. $D_d$ is given a Uniform prior between $D_{L_d}$ and $D_{U_d}$.

Figure 4 shows the posterior distributions of the contrasts $\Delta$, where 0 lies in these distributions and the corresponding unadjusted ($p_U$) and multiply-adjusted ($p_A$) individual conflict p-values testing for equality to 0. A global $\chi$-squared (Wald) test gives a conflict p-value of 0 . $D_U$ (posterior probability of zero difference is $p_U$ = 0 . $D$ ($p_U$ = 0 . $\rho$ ($p_U$ = 0 . $D_U$ and $D$, at $p_A = 0.175$ and $p_A = 0.058$ respectively. Note that the posterior contrasts in Figure 4 are slightly non-normal, hence we interpret the adjusted posterior probabilities of no conflict as exploratory, rather than as absolute measures.

Examining more closely the posterior distributions of the “prior” and “likelihood” versions of the node $D$ (Supplementary Material Figure A.3, upper panel), we can better visualise the prior-data conflict: the “likelihood” version lies very much in the lower tail of the “prior” version. This is in spite of, or rather because of, the flat Uniform priors of the prior model, which translate into a non-Uniform implied prior for the function $D_p = N \rho_p \pi_p \kappa_p$.

The “saturated” model splitting apart each component of evidence in the DAG allows us to assess prior-data conflict in this model, but not conflict between different combinations of the likelihood evidence, due to lack of identifiability: in each likelihood partition in Figure 3(b), clearly only the parameter directly informed by the data, whether basic or functional, can be identified. To assess consistency of evidence between likelihood terms, we employ a cross-validatory “leave-n-out” approach, for $n = 1$ and $n = 2$, splitting in each case the relevant nodes directly informed by the left-out data items. Note that other possibilities exist, such as splitting at the basic parameters, depending on which data are left out. Table 2 gives unadjusted ($p_U$) and various multiply-adjusted ($p_{AW}$, $p_{AL}$, $p_{AA}$) individual posterior probabilities of no difference between nodes split between partitions 1 (the “left-out” evidence) and 2 (the remaining evidence). These posterior probabilities highlight inconsistency in the network of evidence { y , y , y , y }, i.e. informing the three nodes $\rho$, $\pi\kappa$ and $D = N\rho\pi\kappa$. Splits at these three nodes demonstrate low posterior probabilities of no difference in the “leave-1-out” models (A), (B) and (E), and in the “leave-2-out” models (B), (C), and (J) in particular.
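The set of “leave-n-out” node-split models can be enumerated mechanically. The sketch below assumes five data items, with hypothetical labels `y1` to `y5`, and lists every split into a “left-out” partition 1 and a remaining partition 2. It yields 5 models for $n = 1$ and 10 for $n = 2$, matching the ranges (A)-(E) and (A)-(J) of Table 2, although the ordering produced here is not claimed to match the lettering of the table.

```python
from itertools import combinations


def leave_n_out_partitions(items, n):
    """Return all (left_out, remaining) splits with exactly n items left out."""
    items = list(items)
    splits = []
    for left_out in combinations(items, n):
        remaining = [item for item in items if item not in left_out]
        splits.append((list(left_out), remaining))
    return splits


# Hypothetical labels for the five data items of the HIV example.
data_items = ["y1", "y2", "y3", "y4", "y5"]

leave_1 = leave_n_out_partitions(data_items, 1)
leave_2 = leave_n_out_partitions(data_items, 2)

for left_out, remaining in leave_2:
    print("partition 1:", left_out, "| partition 2:", remaining)
```

Each split then defines one node-split model, with the nodes directly informed by the left-out items duplicated across the two partitions.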
There is no potential for the evidence y on the prevalence of undiagnosed infection $\pi(1-\kappa)$ to conflict with any other evidence, since $\pi$ and $\kappa$ are not separately identifiable from the remaining evidence { y , y , y , y } alone. Hence all of the posterior probabilities of no difference concerning the node $\pi(1-\kappa)$ are high.

The conflict in the { y , y , y , y } network is well illustrated by the node-split model (J), where the count data on the lower and upper bounds for $D$ are “left out” in partition 1. Supplementary Material Figure A.3 (lower panel) shows the posterior distributions for each of $D_L$, $D_U$ and $D$ in both partitions. Since in partition 2 the data on the limits for $D$ have been excluded, the posterior distributions for the bounds (solid black and red lines) are flat and hugely variable. Despite this, the posterior distribution for $D$ is relatively tightly peaked, due to the indirect evidence on $D$ provided by the data informing $\rho$ and $\pi\kappa$. It is this indirect evidence that conflicts with the direct evidence informing $D$ via the data { y , y } on the bounds for $D$.

Discussion
We have proposed here the systematic assessment of conflict in an evidence synthesis, in particular accounting for the multiple tests for consistency entailed, through the simultaneous inference framework proposed by Hothorn et al. (2008) and Bretz et al. (2011). We have chosen the max-T tests, which allow for multiply-adjusted local and global testing simultaneously. Note that the use of this (typically classical) simultaneous inference framework relies on the asymptotic multivariate normality of the joint posterior distribution. In cases where the likelihood does not dominate the prior, resulting in a skewed or otherwise non-normal posterior, we treat the results of conflict analysis as exploratory, rather than absolute measures of conflict. If the posterior is skewed but still uni-modal, a global, implicitly multiply-adjusted, test for conflict can be formulated in terms of the Mahalanobis distance of each posterior sample from their mean, as we proposed in Presanis et al. (2013). This is a multivariate equivalent of calculating the tail area probability for regions further away from the posterior mean than the point 0. However, the Mahalanobis-based test does not allow us to obtain local tests for conflict, nor does it apply in the case of a multi-modal posterior. In the latter case, kernel density estimation could be used to obtain the multivariate tail area probability, although such estimation is computationally challenging for large posterior dimension.

Although generalised evidence syntheses have mostly been carried out in a Bayesian framework, there are examples (e.g. Commenges and Hejblum, 2013) that are either frequentist or not fully Bayesian. In the NMA field, maximum likelihood and Bayesian methods are both common (e.g. White et al., 2012; Jackson et al., 2014).
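The Mahalanobis-based global test mentioned above can be sketched as follows. This is a minimal illustration rather than the implementation of Presanis et al. (2013): it estimates the tail area as the proportion of posterior draws of the contrasts lying at least as far (in Mahalanobis distance) from the posterior mean as the point 0. The helper names and the simulated draws are hypothetical.

```python
import random


def mean_and_cov(samples):
    """Sample mean vector and covariance matrix of a list of equal-length draws."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mean, cov


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance of x from mean under covariance cov."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    return sum(d * s for d, s in zip(diff, solve(cov, diff)))


def global_conflict_p(samples):
    """Proportion of draws at least as far from the posterior mean as the origin."""
    mean, cov = mean_and_cov(samples)
    d_zero = mahalanobis_sq([0.0] * len(mean), mean, cov)
    exceed = sum(mahalanobis_sq(s, mean, cov) >= d_zero for s in samples)
    return exceed / len(samples)


# Simulated stand-ins for posterior draws of two contrasts (illustration only).
rng = random.Random(1)
no_conflict = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(2000)]
conflict = [[rng.gauss(4, 1), rng.gauss(0, 1)] for _ in range(2000)]

p_no_conflict = global_conflict_p(no_conflict)
p_conflict = global_conflict_p(conflict)
print(p_no_conflict, p_conflict)
```

Because the distance uses only the posterior mean and covariance, the sketch inherits the limitations noted above: it gives a single global answer and is not meaningful for a multi-modal posterior.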
An advantage of the simultaneous inference framework (Hothorn et al., 2008; Bretz et al., 2011) is that, given any estimator of a vector of differences $\Delta$ and its corresponding variance-covariance matrix $S_\Delta$, regardless of the method used to obtain the estimates, the global and local max-T tests can be formulated.

Conflict p-values can be seen as cross-validatory posterior predictive checks (Presanis et al., 2013). There is a large literature on various types of Bayesian predictive diagnostics, including prior-, posterior- and mixed-predictive checks (e.g. Box, 1980; Gelman et al., 1996; Marshall and Spiegelhalter, 2007). A key issue much discussed in this literature is the lack of uniformity of posterior predictive p-values under the null hypothesis (Gelman, 2013), with such p-values conservative due to the double use of data. Much work has therefore been devoted to either alternative p-values (e.g. Bayarri and Berger, 2000) or post-processing of p-values to calibrate them (e.g. Steinbakk and Storvik, 2009). Gelman (2013) argues that the importance of uniformity depends on the context in which the model checks are conducted: in general non-uniformity is not an issue, but if the posterior predictive tests rely on parameters or imputed latent data, then care should be taken. Since conflict p-values are cross-validatory, the issue of conservatism and the double use of data does not apply. In fact, for a wide class of standard hierarchical models, Gåsemyr (2016) has demonstrated the uniformity of the conflict p-value.

As illustrated by both applications, the choice of different ways of partitioning the evidence in a DAG can lead to different conclusions over the existence of conflict. This is to be expected when considering the local conflict p-values, since conflicting evidence may need to be in different partitions in order to be detectable.
This is analogous to the idea of “masking” in cross-validatory outlier detection, where outliers may not be detected if multiple outliers exist (Chaloner and Brant, 1988). In the case of the global tests for conflict, the NMA example showed that these are also not invariant to the choice of partition. In the NMA literature, alternative methods accounting for inconsistency include models that introduce “inconsistency parameters” that absorb any variability due to conflict beyond between-study heterogeneity (Lu and Ades, 2006; Higgins et al., 2012; Jackson et al., 2014). Higgins et al. (2012) and Jackson et al. (2014) have pointed out that the apparent algorithm that Lu and Ades (2006) follow for identifying inconsistency parameters does not guarantee that all such parameters are identified, nor that the Lu-Ades model is invariant to the choice of baseline treatment. The authors further posit, and more recently have proved (Jackson et al., 2015), that their “design-by-treatment interaction model”, which introduces an inconsistency parameter systematically for each non-baseline treatment within each design, contains each possible Lu-Ades model as a sub-model. In related ongoing work, we note that each Lu-Ades model corresponds to a particular choice of node-splitting model, one being a reparameterisation of the other. The lack of invariance of results of testing for inconsistency from one Lu-Ades model to another is therefore not surprising, since, as we illustrated here, different choices of node-splitting model correspond to different partitions of evidence being compared. The lack of invariance of a global test for conflict to the choice of node-splitting model, although unsurprising, is perhaps unsatisfactory. However, as we illustrated in this paper, this lack of invariance clearly emphasises the need for a more comprehensive and systematic assessment of conflict throughout a DAG, both at a local level and across different types of node-split model, than just a single global test can provide.
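To illustrate how a global test and multiply-adjusted local tests can be obtained from the same estimate, the following sketch computes two-sided max-T p-values from a vector of contrast estimates and its covariance matrix, assuming multivariate normality and a positive-definite covariance. It replaces the numerical integration of the multivariate normal distribution used in standard software with plain Monte Carlo simulation, and the function names and input values are hypothetical.

```python
import math
import random


def cholesky(a):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def max_t_test(delta, cov, n_sim=20000, seed=0):
    """Unadjusted, max-T adjusted and global two-sided p-values for H0: delta = 0."""
    m = len(delta)
    sd = [math.sqrt(cov[i][i]) for i in range(m)]
    z = [abs(delta[i]) / sd[i] for i in range(m)]
    corr = [[cov[i][j] / (sd[i] * sd[j]) for j in range(m)] for i in range(m)]
    chol = cholesky(corr)
    rng = random.Random(seed)
    # Simulate the null distribution of max_i |Z_i| with Z ~ N(0, corr).
    max_abs = []
    for _ in range(n_sim):
        e = [rng.gauss(0.0, 1.0) for _ in range(m)]
        draw = [sum(chol[i][k] * e[k] for k in range(i + 1)) for i in range(m)]
        max_abs.append(max(abs(v) for v in draw))
    p_unadjusted = [math.erfc(zi / math.sqrt(2.0)) for zi in z]
    p_adjusted = [sum(v >= zi for v in max_abs) / n_sim for zi in z]
    # The global test is based on the largest statistic, i.e. the smallest adjusted p.
    return p_unadjusted, p_adjusted, min(p_adjusted)


# Hypothetical contrast estimates and covariance matrix (illustration only).
delta_hat = [2.5, 0.5, -1.0]
s_delta = [[1.0, 0.3, 0.0],
           [0.3, 1.0, 0.2],
           [0.0, 0.2, 1.0]]

p_u, p_a, p_global = max_t_test(delta_hat, s_delta)
print(p_u, p_a, p_global)
```

The adjusted p-values account for the correlation between contrasts through the simulated joint distribution, which is what makes them less conservative than a Bonferroni correction when contrasts are strongly correlated.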
We therefore recommend that, although a global test may be an initial step in any conflict analysis, being sure of detecting any potential conflict requires testing for conflict throughout a DAG. One strategy is to start from splitting every possible node in the DAG, as we did in the HIV example, before looking at more targeted leave-n-out approaches. The design-by-treatment interaction model provides a way of doing so and we are further investigating the relationship of the (fixed inconsistency effects) design-by-treatment interaction model to such a “saturated” node-splitting model.

Note that in the NMA example considered here, we have concentrated on a “contrast-based” as opposed to “arm-based” parameterisation (Hong et al., 2016; Dias and Ades, 2016). Also, we have considered the case where each study has a study-specific baseline treatment $B_d$ and the network as a whole has a baseline treatment $A$. However, alternative parameterisations could be considered, such as using a two-way linear predictor with main effects for both treatment and study, treating the counter-factual or missing treatment designs as missing data (Jones et al., 2011; Piepho et al., 2012). Although we have not yet explored alternative parameterisations, we posit that systematic node-splitting could be equally well applied.

As with any cross-validatory work, the systematic assessment of conflict at every node in a DAG can quickly become computationally burdensome as a model grows in dimension. An area for future research is the systematic analysis of conflict using efficient algorithms (Lunn et al., 2013; Goudie et al., 2015) in a Markov melding framework (Goudie et al., 2016) which allows for an efficient modular approach to model building.

Acknowledgements
This work was supported by the Medical Research Council [Unit Programme number U105260566]; and the Polish National Science Centre [grant no. DEC-2012/05/E/ST1/02218]. The authors also thank Ian White and Dan Jackson for their very helpful comments.

References
Ades, A. E. and A. J. Sutton (2006). Multiparameter evidence synthesis in epidemiology and medical decision-making: current approaches. JRSS(A) 169(1), 5–35.
Bayarri, M. J. and J. O. Berger (2000). P-values for composite null models. JASA 95(452), 1127–1142.
Bernardo, J. M. and A. F. M. Smith (1994). Bayesian Theory. John Wiley & Sons, Inc.
Box, G. E. P. (1980). Sampling and Bayes’ inference in scientific modelling and robustness. JRSS(A) 143(4), 383–430.
Bretz, F., T. Hothorn, and P. Westfall (2011). Multiple Comparisons Using R (First ed.). Chapman and Hall/CRC.
Chaloner, K. and R. Brant (1988). A Bayesian approach to outlier detection and residual analysis. Biometrika 75(4), 651–659.
Commenges, D. and B. Hejblum (2013). Evidence synthesis through a degradation model applied to myocardial infarction. Lifetime Data Analysis 19(1), 1–18.
De Angelis, D., A. M. Presanis, P. J. Birrell, G. S. Tomba, and T. House (2014). Four key challenges in infectious disease modelling using data from multiple sources. Epidemics.
Dias, S. and A. E. Ades (2016). Absolute or relative effects? Arm-based synthesis of trial data. Res. Syn. Meth. 7(1), 23–28.
Dias, S., N. J. Welton, D. M. Caldwell, and A. E. Ades (2010). Checking consistency in mixed treatment comparison meta-analysis. Stat. Med. 29(7-8), 932–944.
Gelman, A. (2013). Two simple examples for understanding posterior p-values whose distributions are far from uniform. Electron. J. Stat. 7(0), 2595–2602.
Gelman, A., X.-L. Meng, and H. Stern (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica 6, 733–807.
Goudie, R. J. B., R. Hovorka, H. R. Murphy, and D. Lunn (2015). Rapid model exploration for complex hierarchical data: application to pharmacokinetics of insulin aspart. Stat. Med. 34(23), 3144–3158.
Goudie, R. J. B., A. M. Presanis, D. J. Lunn, D. De Angelis, and L. Wernisch (2016). Model surgery: joining and splitting models with Markov melding. https://arxiv.org/abs/1607.06779.
Gåsemyr, J. (2016). Uniformity of node level conflict measures in Bayesian hierarchical models based on directed acyclic graphs. Scand. J. Stat. 43(1), 20–34.
Gåsemyr, J. and B. Natvig (2009). Extensions of a conflict measure of inconsistencies in Bayesian hierarchical models. Scand. J. Stat. 36(4), 822–838.
Higgins, J. P. T., D. Jackson, J. K. Barrett, G. Lu, A. E. Ades, and I. R. White (2012). Consistency and inconsistency in network meta-analysis: concepts and models for multi-arm studies. Res. Syn. Meth. 3(2), 98–110.
Hong, H., H. Chu, J. Zhang, and B. P. Carlin (2016). A Bayesian missing data framework for generalized multiple outcome mixed treatment comparisons. Res. Syn. Meth. 7(1), 6–22.
Hothorn, T., F. Bretz, and P. Westfall (2008). Simultaneous inference in general parametric models. Biometrical J. 50(3), 346–363.
Jackson, D., J. K. Barrett, S. Rice, I. R. White, and J. P. T. Higgins (2014). A design-by-treatment interaction model for network meta-analysis with random inconsistency effects. Stat. Med. 33(21), 3639–3654.
Jackson, D., P. Boddington, and I. R. White (2015). The design-by-treatment interaction model: a unifying framework for modelling loop inconsistency in network meta-analysis. Res. Syn. Meth. 7(3), 329–32.
Jones, B., J. Roger, P. W. Lane, A. Lawton, C. Fletcher, J. C. Cappelleri, H. Tate, and P. Moneuse (2011). Statistical approaches for conducting network meta-analysis in drug development. Pharma. Stat. 10(6), 523–531.
Krahn, U., H. Binder, and J. König (2013). A graphical tool for locating inconsistency in network meta-analyses. BMC Med. Res. Method. 13(1), 35+.
Lauritzen, S. L. (1996). Graphical Models. Oxford Statistical Science Series. OUP.
Lu, G. and A. E. Ades (2006). Assessing evidence inconsistency in mixed treatment comparisons. JASA 101(474), 447–459.
Lumley, T. (2002). Network meta-analysis for indirect treatment comparisons. Stat. Med. 21(16), 2313–2324.
Lunn, D., J. K. Barrett, M. Sweeting, and S. Thompson (2013). Fully Bayesian hierarchical modelling in two stages, with application to meta-analysis. JRSS(C) 62(4), 551–572.
Lunn, D., D. J. Spiegelhalter, A. Thomas, and N. Best (2009). The BUGS project: Evolution, critique and future directions. Stat. Med. 28(25), 3049–3067.
Marshall, E. C. and D. J. Spiegelhalter (2007). Identifying outliers in Bayesian hierarchical models: a simulation-based approach. Bayesian Analysis 2, 409–444.
Piepho, H. P., E. R. Williams, and L. V. Madden (2012). The use of two-way linear mixed models in multi-treatment meta-analysis. Biometrics 68(4), 1269–1277.
Presanis, A. M., D. Ohlssen, D. J. Spiegelhalter, and D. De Angelis (2013). Conflict diagnostics in directed acyclic graphs, with applications in Bayesian evidence synthesis. Stat. Sci. 28(3), 376–397.
R Core Team (2015). R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
Rosinska, M., P. Gwiazda, D. De Angelis, and A. M. Presanis (2016). Bayesian evidence synthesis to estimate HIV prevalence in men who have sex with men in Poland at the end of 2009. Epidemiol. Infect. 144, 1175–1191.
Salanti, G. (2012). Indirect and mixed-treatment comparison, network, or multiple-treatments meta-analysis: many names, many benefits, many concerns for the next generation evidence synthesis tool. Res. Syn. Meth. 3(2), 80–97.
Spiegelhalter, D. J., N. G. Best, B. P. Carlin, and A. van der Linde (2002). Bayesian measures of model complexity and fit. JRSS(B) 64(4), 583–639.
Steinbakk, G. H. and G. O. Storvik (2009). Posterior predictive p-values in Bayesian hierarchical models. Scand. J. Stat. 36(2), 320–336.
Sturtz, S., U. Ligges, and A. Gelman (2005). R2WinBUGS: a package for running WinBUGS from R. J. Stat. Softw. 12(3), 1–16.
van Valkenhoef, G., S. Dias, A. E. Ades, and N. J. Welton (2016). Automated generation of node-splitting models for assessment of inconsistency in network meta-analysis. Res. Syn. Meth. 7(1), 80–93.
van Valkenhoef, G., T. Tervonen, B. de Brock, and H. Hillege (2012). Algorithmic parameterization of mixed treatment comparisons. Stat. Comp. 22(5), 1099–1111.
Welton, N. J., A. J. Sutton, N. J. Cooper, K. R. Abrams, and A. E. Ades (2012). Evidence Synthesis in a Decision Modelling Framework. John Wiley & Sons, Ltd.
White, I. R., J. K. Barrett, D. Jackson, and J. P. T. Higgins (2012). Consistency and inconsistency in network meta-analysis: model estimation using multivariate meta-regression.
Res. Syn. Meth. 3(2), 111–125.

Figure 1: (i) Example DAG $G(V, E)$ showing a generic evidence synthesis. (ii) & (iii) Example node-split at separator node $\phi$: (ii) original model $G(\phi, y)$; (iii) node-split model. In (ii): the data $y = \{y_\phi, y_{\setminus\phi}\}$ comprise data $y_\phi$ that are direct descendents of $\phi$; and the remaining data $y_{\setminus\phi}$. In (iii): when splitting $G(\phi, y)$ into partitions $a$ and $b$, the data vector $y_\phi$ is split into $y_{a,\phi}$ and $y_{b,\phi}$, whereas $y_{\setminus\phi}$ remains only in partition $a$. The partition $a$ data are therefore $y_a = \{y_{a,\phi}, y_{\setminus\phi}\}$ and the partition $b$ data are $y_b = y_{b,\phi}$.

Table 1: Multiply adjusted posterior mean (sd) estimates of conflict between partitions, for each model (b)-(f) respectively. In the two-partition models (b,c,e,f), partition 1 is the spanning tree (indirect) evidence partition and partition 2 is the direct data partition. In model (d), partitions 1-3 are the sequential spanning trees and partition 4 is the multi-arm study partition.
ST: AB,AC,AD AB,AC,BD
Model: (b) (c) (d) (e) (f)
Posterior: Mean SD Mean SD Mean SD Mean SD Mean SD
AB AB -0.415 (5.276) 0.319 (0.983) -0.230 (5.849) 6.261 (3.251) 1.513 (1.041) AB -0.044 (10.009) AB AC AC -0.165 (5.262) 0.615 (0.866) -0.379 (5.848) 6.173 (3.140) 1.496 (0.835) AC AC AD AD AD AD AD − -11.804 (6.268) -1.354 (2.279) p AD − BC BC BC BC BC − BC − BC − p BC − p BC − p BC − BD BD BD BD -0.017 (0.948) ∆ BD − -0.140 (1.067) 8.639 (5.069) 8.079 (5.444) ∆ BD − BD − p BD − p BD − p BD − CD CD CD CD -0.345 (0.714) ∆ CD − -0.294 (0.902) 8.459 (5.033) 7.430 (5.526) -6.443 (3.287) -0.682 (1.893) ∆ CD − CD − p CD − p CD − p CD −

Table 2: $p_U$ denotes the unadjusted conflict p-value; $p_{AW}$ is the p-value adjusted for the multiple tests carried out within each model (A)-(J) for the leave-2-out approach; $p_{AL}$ is the p-value adjusted for the 23 tests carried out in all models (A)-(J) for the leave-2-out approach; and $p_{AA}$ is the p-value adjusted for 28 tests carried out in all leave-1-out models (A)-(E) and all leave-2-out models (A)-(J).
Model Partition 1 Partition 2 Node split $p_U$ $p_{AW}$ $p_{AL}$ $p_{AA}$
Leave-1-out (A) y { y , y , y , y } ρ < . y { y , y , y , y } πκ < . y { y , y , y , y } π (1 − κ ) 0 . y { y , y , y , y } D L . y { y , y , y , y } D U < . { y , y } { y , y , y } ρ πκ { y , y } { y , y , y } ρ < . π (1 − κ ) 0.4906 0.7717 1.0000 1.0000 (C) { y , y } { y , y , y } πκ < . < . < . < . π (1 − κ ) 0.8322 0.9000 1.0000 1.0000 π κ { y , y } { y , y , y } ρ < . D L { y , y } { y , y , y } ρ D U { y , y } { y , y , y } πκ D L { y , y } { y , y , y } πκ D U { y , y } { y , y , y } π (1 − κ ) 0.1471 0.3330 0.9855 0.9944 D L { y , y } { y , y , y } π (1 − κ ) 0.5237 0.8100 1.0000 1.0000 D U < . { y , y } { y , y , y } D L D U < . D < .

Figure 2: Smoking cessation evidence network, under (a) a consistency assumption; (b)-(f) inconsistency assumptions, where the evidence is partitioned in different ways.
In (b), (c), (e) and (f), the direct evidence (dashed lines) is compared with the indirect evidence (solid lines) on each contrast where there is a dashed line. In (d), the evidence is separated into three spanning trees and a fourth partition for the multi-arm trial evidence.

Figure 3: (a) DAG of initial model for synthesising Polish HIV prevalence data. (b) DAG of multiple node-split model comparing priors to each likelihood contribution. Note that the square brackets are used in denoting the nodes in the likelihood partition ($[\pi\kappa]_d$, $[\pi(1-\kappa)]_d$) to emphasise the fact that these two nodes are independent parameters not functionally related to each other.

[Figure 4 panels, each plotting Density against Difference: logit($\rho_p$) − logit($\rho_d$); logit($\pi_p\kappa_p$) − logit($\pi_d\kappa_d$); logit($\pi_p(1-\kappa_p)$) − logit($\pi_d(1-\kappa_d)$); log($D_{L_p}$) − log($D_{L_d}$); log($D_{U_p}$) − log($D_{U_d}$); log($D_p$) − log($D_d$) = log($N\rho_p\pi_p\kappa_p$) − log($N\rho_d\pi_d\kappa_d$).]
Figure 4: Posterior distributions of the contrasts $\Delta$ for the HIV prevalence example. The red lines denote 0 difference, $p_U$ is the unadjusted and $p_A$ the multiply-adjusted individual conflict p-value respectively.

Supplementary Material

A Figures and Tables

Figure A.1: (a) DAG of NMA under assumptions of a common treatment effect $\eta_{JK}$ (no heterogeneity) and consistency $\eta_{JK} = \eta_{AK} - \eta_{AJ}$. (b) DAG of NMA under assumptions of random treatment effects, to account for heterogeneity, and consistency.

Table A.1: Smoking cessation data set
Study  Design  $y_A$  $n_A$  $y_A/n_A$  $y_B$  $n_B$  $y_B/n_B$  $y_C$  $n_C$  $y_C/n_C$  $y_D$  $n_D$  $y_D/n_D$

Table A.2: $E_{\theta|y}(D)$; deviance evaluated at posterior means $D(E_{\theta|y}\theta)$; effective number of parameters $p_D$; and deviance information criterion DIC.
Model:                     Common-effect                     Random-effect
$\mu_{JK}$:                Posterior mean   Posterior sd     Posterior mean   Posterior sd
AB
AC
AD
BC
BD
CD
$E_{\theta|y}(D)$          267                               54
$D(E_{\theta|y}\theta)$    240                               10
$p_D$                      27                                44
DIC                        294                               98

Table A.3: Results from initial HIV model: observations; posterior mean (sd) estimates; posterior mean deviance $E_{\theta|y}(D)$; deviance evaluated at posterior means $D(E_{\theta|y}\theta)$; effective number of parameters $p_D$; and deviance information criterion DIC.
Parameter          Data                   Estimates                          Deviance summaries
$\theta$           y     n      y/n       $\hat{y}$       $\hat{\theta}$     $E_{\theta|y}(D)$   $D(E_{\theta|y}\theta)$   $p_D$   DIC
$\rho$             35    1536   0.023     14.6 (1.5)      0.010 (0.001)      21.0                20.7                      0.4     21.4
$\pi\kappa$        113   2840   0.040     92.5 (8.9)      0.033 (0.003)      5.5                 4.4                       1.1     6.5
$\pi(1-\kappa)$    136   2725   0.050     136.7 (11.3)    0.050 (0.004)      1.0                 0.0                       1.0     2.0
$D_L$              836                    836.2 (28.9)    836.2 (28.9)       1.0                 0.0                       1.0     2.0
$D_U$

Figure A.2: DAG of common-effect network meta-analysis model, split into direct (DE) and indirect (ST) evidence informing the functional parameters $\eta_f$, i.e. those edges outside of the spanning tree formed by the basic parameters $\eta_b$.

[Figure A.3 panels, each a kernel density plot (N = 30000, bandwidth = 0.05) with Density on the vertical axis: upper panel, log($D_p$) vs log($D_d$); lower panel, log($D_{L2}$), log($D_{L1}$), log($D_{U2}$), log($D_{U1}$) and the two versions of log($D$).]
Figure A.3: Upper panel: Posterior distributions of the nodes $D_p$ and $D_d$ for the HIV prevalence example, on the log scale. The right-hand blue line denotes where the total population of Poland (N = 15 , , a priori for the number diagnosed. The left-hand blue line denotes the value log(N × . ), i.e. the prior mean of log($D_p$) = log($N\rho_p\pi_p\kappa_p$). Lower panel: Posterior distributions of the nodes $D_{L1}$, $D_{L2}$, $D_{U1}$, $D_{U2}$, $D_1$ and $D_2$ for the HIV prevalence “leave-2-out” node-split model (J), on the log scale. The dashed lines represent the nodes in partition 1, i.e. the “left-out” partition, where the posteriors are based only on the likelihood given by { y , y } and Jeffreys’ priors for $D_L$, $D_U$. The solid lines give the corresponding posteriors in partition 2, i.e. based on all the original model priors and on the dataset { y , y , y }.

Asymptotics
Let $p(\theta_1), \ldots, p(\theta_Q)$ denote the set of prior distributions for the basic parameters $\theta_q$ in each partition $q$. Then by the independence of each partition, the joint posterior distribution of all parameters $\phi$ in all partitions is

\[
p(\phi \mid y) \propto \prod_{q=1}^{Q} p(\theta_q)\, p(y_q \mid \theta_q).
\]

If the joint prior distribution is dominated by the likelihood, then asymptotically (Bernardo and Smith, 1994), the joint posterior distribution of all nodes is multivariate normal:

\[
\phi \mid y \overset{a}{\sim} N_{\sum_q n_q}\bigl( (\hat{\phi}_1, \ldots, \hat{\phi}_Q),\; V \bigr)
\]

where $n_q$ is the total number of parameters in partition $q$, whether basic or not, and $V$ is the inverse observed information matrix for the parameters $\phi$. Since the vector of separator nodes, $\phi_S = (\phi^{(s)}_1, \ldots, \phi^{(s)}_Q)$, is a subset of $\phi$, their joint posterior is also multivariate normal:

\[
\phi_S \mid y \overset{a}{\sim} N_m\bigl( (\hat{\phi}^{(s)}_1, \ldots, \hat{\phi}^{(s)}_Q),\; V_S \bigr) \qquad (4)
\]

where $m = \sum_q m_q$ is the total number of separator nodes, including node-split copies, and $V_S$ is the appropriate sub-matrix of $V$. Since the partitions are independent, $V_S$ is a block diagonal matrix consisting of the inverse observed information matrices for separator nodes in each partition along the diagonal.

By theorem 5.17 of Bernardo and Smith (1994), since (4) holds and if $J_h(\phi_S) = \partial h(\phi_S) / \partial \phi_S$ is non-singular with continuous entries, then the posterior distribution of the transformed separator nodes, $\phi_H = h(\phi_S)$, is also asymptotically normal:

\[
\phi_H \mid y \overset{a}{\sim} N_m\bigl( h(\hat{\phi}^{(s)}_1, \ldots, \hat{\phi}^{(s)}_Q),\; J_h(\hat{\phi}_S)^T V_S J_h(\hat{\phi}_S) \bigr)
\]

The Jacobian $J_h(\phi_S)$ exists and is non-singular for the sorts of transformations we use in practice, for example log and logit transformations.

A further application of theorem 5.17 of Bernardo and Smith (1994) results in a posterior distribution of the contrasts $\Delta$ that is also asymptotically multivariate normal, if $\partial \Delta(\phi) / \partial \phi = C_\Delta^T$ is non-singular with continuous entries, which as a contrast matrix it is:

\[
\Delta \mid y \overset{a}{\sim} N_p\bigl( C_\Delta^T h(\hat{\phi}^{(s)}_1, \ldots, \hat{\phi}^{(s)}_Q),\; C_\Delta^T J_h(\hat{\phi}_S)^T V_S J_h(\hat{\phi}_S) C_\Delta \bigr) \qquad (5)
\]
\[
= N_p\bigl( C_\Delta^T \hat{\phi}_H,\; C_\Delta^T V_H C_\Delta \bigr)
\]

for $V_H = J_h(\hat{\phi}_S)^T V_S J_h(\hat{\phi}_S)$. Asymptotically, therefore, the posterior mean $\bar{\Delta} = C_\Delta^T \bar{\phi}_H \overset{a}{\approx} C_\Delta^T \hat{\phi}_H$ and the posterior variance-covariance matrix of $\Delta$ is $S_\Delta \overset{a}{\approx} C_\Delta^T V_H C_\Delta$
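The chain of approximations above, $V_H = J_h^T V_S J_h$ followed by $S_\Delta \approx C_\Delta^T V_H C_\Delta$, can be traced numerically on a small case. The sketch below assumes a single separator node (a proportion) split into two independent copies, a logit transformation $h$, and one contrast. All numerical values and helper names are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


# Hypothetical estimates of the two node-split copies and their covariance V_S.
# V_S is block diagonal because the partitions are independent.
phi_hat = [0.20, 0.30]
v_s = [[0.02 ** 2, 0.0],
       [0.0, 0.03 ** 2]]

# Jacobian of the element-wise logit: d logit(p) / dp = 1 / (p (1 - p)).
j_h = [[1.0 / (phi_hat[i] * (1.0 - phi_hat[i])) if i == k else 0.0
        for k in range(2)] for i in range(2)]

# Covariance on the transformed scale: V_H = J_h^T V_S J_h.
v_h = matmul(matmul(transpose(j_h), v_s), j_h)

# A single contrast between the two copies: C^T = (1, -1).
c_delta = [[1.0], [-1.0]]
phi_h_hat = [[logit(p)] for p in phi_hat]

delta_hat = matmul(transpose(c_delta), phi_h_hat)[0][0]
s_delta = matmul(matmul(transpose(c_delta), v_h), c_delta)[0][0]

# Wald statistic for the hypothesis of no conflict (contrast equal to zero).
wald = delta_hat ** 2 / s_delta
print(delta_hat, s_delta, wald)
```

With several contrasts, `c_delta` would gain one column per contrast and `s_delta` would become the full matrix used by the global and local tests.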