A Note on Using Discretized Simulated Data to Estimate Implicit Likelihoods in Bayesian Analyses
M. S. Hamada, T. L. Graves, N. W. Hengartner, D. M. Higdon, A. V. Huzurbazar, E. C. Lawrence, C. D. Linkletter, C. S. Reese, D. W. Scott, R. R. Sitter, R. L. Warr, B. J. Williams
Los Alamos National Laboratory; Berry Consultants; Virginia Polytechnic Institute; The Lego Group; Brigham Young University; Rice University; Simon Fraser University

August 10, 2020
Abstract
This article presents a Bayesian inferential method where the likelihood for a model is unknown but where data can easily be simulated from the model. We discretize simulated (continuous) data to estimate the implicit likelihood in a Bayesian analysis employing a Markov chain Monte Carlo algorithm. Three examples are presented as well as a small study on some of the method's properties.
KEY WORDS: Gaussian process, interval-censored, Latin Hypercube Sample, Markov chain Monte Carlo (MCMC), probability, relative frequency

1 Introduction
This article considers the situation where the likelihood of the data is unknown, i.e., an implicit likelihood. However, we assume that it is easy to simulate data from the data model and that the simulated data can be used to estimate the likelihood, i.e., an estimated likelihood. We want to use the implicit likelihood estimate in a Bayesian analysis that incorporates prior knowledge.

Approximate Bayesian Computation, or ABC (Sisson et al., 2019), is the preeminent so-called "likelihood-free" method today for handling implicit or intractable likelihoods, and ABC also employs simulation. ABC compares simulated data summaries with observed data summaries to decide whether to accept the model parameter values used to generate the simulated data as samples from the posterior distribution. ABC relaxes the acceptance requirements, which makes the sampling more efficient but results in samples drawn from an approximate posterior distribution.

Lerman and Manski (1981) estimate probabilities of probability mass functions, and Diggle and Gratton (1984) use kernel density estimates (KDEs) of implicit likelihoods based on simulated data for maximum likelihood estimation. Drovandi et al. (2015) present Indirect Inference (II) methods that employ auxiliary models that are different from the data models, possibly even with different parameters, and make connections to ABC. Within II, Drovandi et al. (2015) view the use of KDEs as an Indirect Likelihood (IL) method with an auxiliary likelihood, i.e., the estimated likelihood in which the summary statistics are the data themselves. Sisson et al. (2007) and Flury and Shephard (2011) also use sequential Monte Carlo (SMC) to handle implicit likelihoods within particle filters that rely on simulation. Price et al. (2018) use a multivariate normal approximation of an implicit likelihood in a method referred to as synthetic likelihood.

Despite Drovandi et al. (2015) making the connection between KDEs and ABC mentioned above, the literature seems to be rather silent on their use in Bayesian analyses. Perhaps the silence is because the dimension of the response needs to be small, say, less than 10, whereas the early applications motivating ABC had large-dimensional responses. Still, in many engineering and physical science applications that we are involved with, the dimension of the response is small. However, when we used a KDE embedded in a Markov chain Monte Carlo (MCMC) algorithm, we experienced several problems. With different bandwidth choices, either the MCMC samples did not mix well or they were seriously biased. Here, we applied KDEs in a brute-force manner; that is, for every candidate parameter value, we simulated data, computed the KDE, and used the KDE to evaluate the likelihood, i.e., the estimated likelihood. In this article, we instead consider discretizing the simulated (continuous) data and estimating the interval-censored probabilities by the simulated proportions. We then use these estimated probabilities to estimate the discretized data likelihood. We refer to the proposed method as SimILE, shorthand for Simulated Implicit Likelihood Estimation; the long name is discretized simulated data for implicit discretized data likelihood estimation.

The article is organized as follows. Section 2 describes the SimILE method. Section 3 presents three examples. Section 4 explores some aspects of the SimILE method as well as using Gaussian process emulation as an alternative to the brute-force use of simulation mentioned above. Section 5 concludes this article with a discussion.
2 The SimILE Method

In this article, we propose discretizing continuous data so that the likelihood consists of probabilities of the discretized or interval-censored data. At some level, all continuous data are discrete because they are measured with error. Rounding is another example of discretization that can have little impact on inference, even for moderate amounts of rounding (Hamada, 1989, 1991).

An algorithm for the SimILE method is as follows (a minimal sketch in code follows the list):

I. Calculate data-based intervals.

  Ia. Let nInt be the target number of intervals.

  Ib. Divide the range of the observed data y into nInt + 1 evenly spaced intervals of width w, where w = [max(y) − min(y)]/nInt, the first interval starts at min(y) − w/2, and the last interval ends at max(y) + w/2.

  Ic. Add intervals before the first and after the last intervals if needed to cover the support of the distribution generating the data.

II. Calculate the discretized implicit likelihood.

  IIa. For a given set of parameter values, simulate nSim data, ỹ_1, . . . , ỹ_nSim.

  IIb. Use the simulated data to compute the relative frequencies of the data-based intervals.

  IIc. As in Flury and Shephard (2011), the discretized implicit likelihood estimate is the product of the relative frequencies for the data-based intervals into which the observed data y fall.

III. For multivariate data, apply the same discretization simultaneously to all the coordinates.
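To make the algorithm concrete, here is a minimal sketch in Python under our own naming; the function simile_loglik, the user-supplied simulate function, and the default nInt and nSim values are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def simile_loglik(y_obs, simulate, n_int=50, n_sim=100_000, rng=None):
    """Discretized implicit log-likelihood estimate (Steps I-II)."""
    rng = np.random.default_rng() if rng is None else rng
    # Step Ib: nInt + 1 evenly spaced intervals of width w, shifted by w/2
    w = (y_obs.max() - y_obs.min()) / n_int
    edges = np.linspace(y_obs.min() - w / 2, y_obs.max() + w / 2, n_int + 2)
    # Step Ic: extend the grid to cover the full support of the data model
    edges = np.concatenate(([-np.inf], edges, [np.inf]))
    # Steps IIa-IIb: simulate data and compute interval relative frequencies
    y_sim = simulate(n_sim, rng)
    rel_freq = np.histogram(y_sim, bins=edges)[0] / n_sim
    # Step IIc: sum the log relative frequencies of the intervals
    # containing the observed data (the log of the product in Step IIc)
    idx = np.searchsorted(edges, y_obs, side="right") - 1
    return np.sum(np.log(rel_freq[idx]))
```

An observed datum falling in an interval with zero relative frequency returns a log-likelihood of −∞, i.e., a rejected candidate in MCMC; Section 4 notes how choosing nInt too large makes this happen too often.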
To help understand this algorithm better, consider a simple one-parameter example of n = 25 observed data generated from a normal distribution with known variance and mean µ, where µ is the parameter of interest whose true value is µ = 0. We use a normal prior centered at 1 for µ, and nInt = 50 and nSim = 10⁷ in the algorithm. In Step Ib, the intervals are based on the range of the 25 observed data, and the first and last intervals are extended to −∞ and ∞ in Step Ic. Suppose that we evaluate the discretized implicit likelihood estimate at µ = 0.5. Then, from one realization of ỹ_1, . . . , ỹ_nSim in Step IIa, the relative frequencies in Step IIb are 0.0002348, 0.0000846, . . . , 0.0276046, 0.2611034; see the plot of the relative frequencies and the observed data (jiggled) in Figure 1. Based on Step IIc, the discretized implicit log-likelihood estimate is the sum of the logs of the relative frequencies of the data-based intervals into which the observed data y fall, and it is evaluated within an MCMC algorithm for µ. After dropping the first 1,000 draws as burn-in, Table 1 summarizes the subsequent 10,000 draws. We see that the exact likelihood and the discretized implicit likelihood estimate produce very similar results. We also see the similarity of their posterior distributions displayed in Figure 2.

[Figure 1: Plot of relative frequencies and observed data (jiggled) for the normal example.]

Table 1: Posterior Summaries for µ for the Normal Example

Method            Mean    Std. Dev.  2.5%    50%     97.5%
Exact Likelihood  -0.224  0.201      -0.612  -0.229  0.174
SimILE            -0.213  0.199      -0.607  -0.213  0.168

[Figure 2: Plot of posterior distributions for the normal example, exact likelihood as the solid line and SimILE method as the dotted line.]

As mentioned in Section 1, we tried to treat the data directly as continuous by simulating data and using kernel density estimates of probability density functions to obtain implicit likelihood estimates. However, in our experience with MCMC algorithms, recommended bandwidths did not work, yielding either biased results or MCMC algorithms that got stuck and did not mix properly. Moreover, when the bandwidths were shortened to reduce the bias, the MCMC algorithms also got stuck and did not mix properly. On the other hand, the SimILE method estimates the discretized (interval-censored) probabilities by unbiased simulated relative frequencies. Note that Flury and Shephard (2011) cite Andrieu et al. (2010), who show how an unbiased estimate of the likelihood embedded in an MCMC algorithm provides proper draws from the posterior distribution.
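As an illustration of how such an unbiased estimate can be embedded in MCMC, the following sketch runs a random-walk Metropolis sampler for the normal-mean example using simile_loglik from above; the step size, prior standard deviation, known data variance, and starting value are our own illustrative choices:

```python
def run_mh(y_obs, n_draws=11_000, step=0.3, seed=0):
    """Random-walk Metropolis with the SimILE likelihood estimate."""
    rng = np.random.default_rng(seed)
    def log_post(mu):
        loglik = simile_loglik(
            y_obs, simulate=lambda n, g: g.normal(mu, 1.0, size=n),
            n_int=50, rng=rng)
        logprior = -0.5 * (mu - 1.0) ** 2   # normal prior centered at 1
        return loglik + logprior
    mu, lp, draws = 1.0, -np.inf, []
    for _ in range(n_draws):
        cand = mu + step * rng.normal()
        lp_cand = log_post(cand)
        # the stored estimate for the current state is reused, so the
        # likelihood estimate stays unbiased (cf. Andrieu et al., 2010)
        if np.log(rng.uniform()) < lp_cand - lp:
            mu, lp = cand, lp_cand
        draws.append(mu)
    return np.array(draws[1_000:])          # drop 1,000 burn-in draws
```

Reusing the stored log-likelihood for the current state, rather than re-estimating it at every iteration, is what keeps the estimate unbiased in the pseudo-marginal sense.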
3 Examples
In this section, we demonstrate the proposed SimILE method with three examples. The first arises when indirect measurements are collected. The second illustrates a situation where there is a mixture of implicit and explicit likelihoods. The third involves bivariate data.
3.1 Indirect Measurements

Consider a situation in which a population of maximum distances x that follow a specified distribution is of interest. A measurement is taken at a randomly chosen angle θ of the part, so that the measured distance y is an attenuation of the maximum distance x, related by y = x|sin(θ)|; we refer to y as an indirect measurement. Here, we assume that the maximum distances x follow a Lognormal(0, 1) distribution with µ = 0 and σ = 1.

Such indirect measurements are easily simulated by draws of lognormally distributed maximum distances paired with draws of uniformly distributed angles; see the sketch at the end of this subsection. Thereby, the proposed SimILE method can be applied.

We simulated a data set of size N = 50, where the true parameter values are µ = 0 and σ = 1. The SimILE method was applied using nInt = 100 and nSim = 10 with the following priors: µ ∼ Normal(0, 10) and a lognormal prior centered at 0 for σ. Trace plots of the posterior draws of (µ, σ) in Figure 3 show that they are mixing well. The results for the SimILE method are given in Table 2.

[Figure 3: SimILE method posterior draws of µ and σ for the indirect measurements example.]

Table 2: Posterior Summaries for the Indirect Measurement Example

Method            Parameter  Mean    Std. Dev.  2.5%    50%     97.5%
SimILE            µ          -0.104  0.172      -0.449  -0.104  0.215
Exact Likelihood  µ          -0.110  0.178      -0.472  -0.108  0.223

For comparison, the exact likelihood is available because the probability density function of y can be expressed as an integral (Graves and Hamada, 2005). See the results using the exact likelihood in Table 2. The results for the SimILE method are nearly the same.
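A minimal sketch of this simulator in Python, under our own naming and assuming the angle is uniform over (0, 2π), which plugs directly into simile_loglik above:

```python
def simulate_indirect(n, rng, mu=0.0, sigma=1.0):
    """Simulate n indirect measurements y = x * |sin(theta)|."""
    x = rng.lognormal(mean=mu, sigma=sigma, size=n)   # maximum distances
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)     # random measurement angles
    return x * np.abs(np.sin(theta))

# e.g., the SimILE log-likelihood at candidate values (mu, sigma):
# simile_loglik(y_obs, lambda n, g: simulate_indirect(n, g, mu, sigma), n_int=100)
```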
3.2 A Flowgraph with a Mixture of Implicit and Explicit Likelihoods

Figure 4 displays a simple flowgraph of a process (Huzurbazar, 2005) that can be in one of three states (0, 1, 2), such as working, degraded, and failed. Starting in state 0, the transition time to state 1 is described by a cumulative distribution function (cdf) F₁(x). With probability p, there is a transition back to state 0 in time described by the cdf F₂(x). Finally, a transition from state 1 to state 2, the final state, occurs with probability 1 − p in time described by the cdf F₃(x).

[Figure 4: Flowgraph for transitions from state 0 to state 1, the return back to state 0, and to final state 2.]

Consider the situation where we have transition times for the three transitions, 0 to 1, 1 to 0, and 1 to 2. The transition times provide information about the three transition time distributions. Here, we assume that the transition time distributions are gamma distributions parameterized in terms of their means and variances, denoted generically by µ and σ², respectively. From the observed transitions, we can also estimate p with the proportion of transitions from state 1 to state 0. Independently, we have transition times from state 0 to state 2 in which the number of returns to state 0 is unknown; these data provide information about the model parameters from the three transition time distributions and p.

In this example, there are five components to the likelihood: the transition times for 0 to 1, 1 to 0, and 1 to 2; the binomial count x, the number of transitions from 1 to 0 out of n transitions from state 1; and the 0 to 2 transition times. The first three are products of gamma probability density functions (pdfs), and the fourth is a binomial probability. The fifth is a product of implicit pdfs. That is, this example illustrates a situation where both implicit and explicit pdfs contribute to the likelihood; a sketch in code follows this subsection.

We simulated data using µ₁ = 6, µ₂ = 4, µ₃ = 10, σ₃² = 100/12 (with corresponding variance values for the first two transitions), and p = 0.4. There were 28 state 0 to 1 transition times, 11 state 1 to 0 transition times, and 17 state 1 to 2 transition times, with x = 11 out of n = 28. Independent of these data, we also have 25 state 0 to state 2 transition times.

We can use the SimILE method for the state 0 to 2 transition time pdfs, in which we use nInt = 100 and nSim = 10. For the other data, we use the exact likelihoods. We used lognormal priors centered at the true values for µ₁, σ₁², µ₂, σ₂², µ₃, and σ₃², e.g., µ₁ ∼ Lognormal(log(6), ·), and a normal prior centered at zero for a transformation of p. Table 3 summarizes the posterior draws of these parameters under both the SimILE method and the exact likelihood; the results are nearly the same.

[Table 3: Posterior Summaries for the Flowgraph Example, giving the mean, standard deviation, and 2.5%, 50%, and 97.5% quantiles of µ₁, σ₁², µ₂, σ₂², µ₃, σ₃², and p for the SimILE and exact likelihood methods.]
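To illustrate the mixture of explicit and implicit components, here is a sketch under our own naming; the mean/variance-to-shape/scale conversion and the data layout (arrays t01, t10, t12, t02) are illustrative assumptions:

```python
from scipy.stats import gamma, binom

def gamma_logpdf(t, mu, var):
    """Gamma log-pdf in the mean/variance parameterization."""
    shape, scale = mu * mu / var, var / mu
    return gamma.logpdf(t, a=shape, scale=scale).sum()

def simulate_0_to_2(n, rng, mu, var, p):
    """Simulate n 0-to-2 passage times, looping back to state 0 w.p. p."""
    shape = [m * m / v for m, v in zip(mu, var)]
    scale = [v / m for m, v in zip(mu, var)]
    total = rng.gamma(shape[0], scale[0], size=n)          # 0 -> 1
    back = rng.uniform(size=n) < p
    while back.any():                                      # returns to state 0
        m = int(back.sum())
        total[back] += (rng.gamma(shape[1], scale[1], size=m)    # 1 -> 0
                        + rng.gamma(shape[0], scale[0], size=m)) # 0 -> 1 again
        back[back] = rng.uniform(size=m) < p
    return total + rng.gamma(shape[2], scale[2], size=n)   # 1 -> 2

def flowgraph_loglik(data, mu, var, p, rng):
    """Explicit gamma and binomial terms plus a SimILE term for t02."""
    return (gamma_logpdf(data["t01"], mu[0], var[0])
            + gamma_logpdf(data["t10"], mu[1], var[1])
            + gamma_logpdf(data["t12"], mu[2], var[2])
            + binom.logpmf(len(data["t10"]), len(data["t01"]), p)
            + simile_loglik(data["t02"],
                            lambda n, g: simulate_0_to_2(n, g, mu, var, p),
                            n_int=100, rng=rng))
```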
3.3 A Network with Bivariate Data

Consider the simple network displayed in Figure 5 with three links. Each link has a transition time Xᵢ distributed as Exponential(λᵢ) with mean 1/λᵢ. The observed data from the network consist of the pairs (Y₁, Y₂), where each Yⱼ is the sum of the transition times along its path through the network; because the two paths share a link, Y₁ and Y₂ are clearly dependent.

[Figure 5: Simple network consisting of three links with transition times Xᵢ, i = 1, 2, 3.]

We simulated N = 300 observed pairs (Y₁, Y₂) using λ₂ = 1/15 and λ₃ = 1/40 and applied the SimILE method with nInt = 50 and nSim = 10, discretizing both coordinates simultaneously as in Step III; a sketch of the bivariate discretization follows this subsection. We used lognormal priors centered at the true values, e.g., λ₂ ∼ Lognormal(log(1/15), ·). Table 4 summarizes the posterior draws of λ₁, λ₂, and λ₃.

[Table 4: Posterior Summaries for the Network Example, giving the mean, standard deviation, and 2.5%, 50%, and 97.5% quantiles of λ₁, λ₂, and λ₃ for the SimILE and exact likelihood methods.]

For comparison, the exact likelihood is available because the joint density of (Y₁, Y₂) has a closed-form expression (Lawrence et al., 2007). See the results using the exact likelihood in Table 4. The results for the SimILE method are nearly the same.
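A sketch of the bivariate version, assuming for illustration the shared-link structure Y₁ = X₁ + X₂ and Y₂ = X₁ + X₃ (the exact link assignments are our assumption); Step I is applied to each coordinate and the relative frequencies come from a two-dimensional histogram:

```python
def simulate_network(n, rng, lam):
    """Simulate n dependent pairs (Y1, Y2) from the three-link network."""
    x1 = rng.exponential(1.0 / lam[0], size=n)   # shared link
    x2 = rng.exponential(1.0 / lam[1], size=n)
    x3 = rng.exponential(1.0 / lam[2], size=n)
    return np.column_stack([x1 + x2, x1 + x3])

def simile_loglik_2d(y_obs, simulate, n_int=50, n_sim=100_000, rng=None):
    """Bivariate SimILE: relative frequencies over a 2-D interval grid."""
    rng = np.random.default_rng() if rng is None else rng
    edges = []
    for j in range(2):
        lo, hi = y_obs[:, j].min(), y_obs[:, j].max()
        w = (hi - lo) / n_int
        e = np.linspace(lo - w / 2, hi + w / 2, n_int + 2)
        edges.append(np.concatenate(([-np.inf], e, [np.inf])))
    y_sim = simulate(n_sim, rng)
    counts, _, _ = np.histogram2d(y_sim[:, 0], y_sim[:, 1], bins=edges)
    rel_freq = counts / n_sim
    i = np.searchsorted(edges[0], y_obs[:, 0], side="right") - 1
    j = np.searchsorted(edges[1], y_obs[:, 1], side="right") - 1
    return np.sum(np.log(rel_freq[i, j]))
```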
4 Briefly Exploring the SimILE Method

Here, we consider data distributed as Lognormal(0, 1) of size N = 50. We use the priors µ ∼ Normal(0, 10) and a lognormal prior centered at 0 for σ. In the preceding examples, nInt = 100 and nSim = 10 performed well. This choice also works well here, although a larger nSim performs somewhat better, whereas nInt = 50 performs somewhat worse, as does a smaller nSim. In applying the method, the practitioner can obtain ballpark estimates for the model parameters (say, using nInt = 100 and nSim = 10), simulate "observed" data at those estimates as we do here, and explore various nInt and nSim combinations to identify the values to be used in the actual analysis. For example, nInt can be chosen too large, so that some of the observed data fall into intervals with zero relative frequency too often; then the MCMC algorithm will get stuck and not mix well. Table 5 compares the exact likelihood with the SimILE method across several (nInt, nSim) combinations.

Table 5: Posterior Summaries for the SimILE Method Exploration

Method   Settings               Parameter  Mean    Std. Dev.  2.5%    50%     97.5%
Exact Likelihood                µ          -0.253  0.150      -0.544  -0.249  0.033
SimILE   nInt = 100, nSim = 10  µ          -0.254  0.147      -0.546  -0.253  0.029
SimILE   nInt = 100, nSim = 10  µ          -0.248  0.148      -0.546  -0.252  0.039
SimILE   nInt = 50, nSim = 10   µ          -0.251  0.141      -0.530  -0.253  0.033
SimILE   nInt = 100, nSim = 10  µ          -0.253  0.144      -0.536  -0.253  0.040

4.1 Reducing Computation Using a Gaussian Process Emulator

Rather than estimating the discretized implicit likelihood by fresh simulation each time a candidate is evaluated, i.e., the so-called brute-force approach, we might use a Gaussian process emulator, as is done with computer experiments (Santner et al., 2019). That is, using a Latin Hypercube Sample (LHS) of the parameter space, we estimate the discretized implicit likelihood by simulation at the design points specified by the LHS. Then, assuming a Gaussian process (GP) model, we predict the discretized implicit likelihood using GP emulation, a kriging estimate based on maximum likelihood estimation. Here, we predict the cumulative discretized probabilities and use the principal component analysis approach for handling multiple responses as implemented in the R package mlegp (Dancik and Dorman, 2008). A simplified sketch of the emulation idea appears at the end of this subsection.

We consider the same lognormal setup as discussed above. We used a 500-point maximin LHS design on a square for (µ, log(σ)) and estimated the discretized implicit likelihood with nInt = 100 by simulation using nSim = 10 simulations at each design point. We also used 20 principal components with the mlegp package. As before, we use 10,000 draws after 1,000 burn-in draws. Table 6 displays the posterior summaries using GP emulation, whose results are very similar to those reported in Table 5.

Table 6: Posterior Summaries for the SimILE Method Using Gaussian Process Emulation

Parameter  Mean    Std. Dev.  2.5%    50%     97.5%
µ          -0.250  0.146      -0.537  -0.250  0.031
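As a simplified illustration of the emulation idea, the sketch below emulates the log-likelihood surface directly with scikit-learn, rather than the cumulative discretized probabilities with mlegp's principal-component approach as in the paper; the design bounds, kernel, and length scale are all assumptions:

```python
from scipy.stats import qmc
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def build_emulator(y_obs, n_design=500, rng=None):
    """Fit a GP to SimILE log-likelihoods over an LHS design for (mu, log sigma)."""
    rng = np.random.default_rng() if rng is None else rng
    unit = qmc.LatinHypercube(d=2, seed=1).random(n_design)
    design = qmc.scale(unit, l_bounds=[-2.0, -2.0],
                       u_bounds=[2.0, 2.0])              # assumed bounds
    ll = np.array([
        simile_loglik(y_obs,
                      lambda n, g: g.lognormal(mean=m, sigma=np.exp(ls), size=n),
                      n_int=100, rng=rng)
        for m, ls in design])
    # design points with -inf log-likelihood (empty cells) must be dropped
    # here; emulating the cell probabilities instead avoids this issue
    keep = np.isfinite(ll)
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5),
                                  normalize_y=True).fit(design[keep], ll[keep])
    return gp   # gp.predict([[mu, log_sigma]]) replaces fresh simulation in MCMC
```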
5 Discussion

In this article, we have proposed discretizing continuous data into intervals and using simulated data to estimate the likelihood when the likelihood is implicit. The use of relative frequencies to estimate interval probabilities is easy to explain, and the method can be used when the practitioner does not know how to evaluate the likelihood exactly. We have seen through the examples and a limited study that the SimILE method results are similar to the exact likelihood results; the SimILE method results are slightly more uncertain, i.e., they have slightly wider credible intervals. However, a practitioner may have a short deadline to analyze data, and the SimILE method provides a viable and principled way to perform the analysis quickly when the likelihood is implicit. (Note that for the examples presented, the exact likelihoods were buried in our Ph.D. theses and publications, and therefore would be unknown to most practitioners.) The practitioner can also use the SimILE method to handle complex multiple-data-source problems that involve a mixture of implicit and explicit likelihood components. Also, while the SimILE method cannot handle a large-dimensional response as ABC can, it could be applied to the data summaries that ABC uses.
Acknowledgments
We thank C. C. Essix for her encouragement and support.
Author contributions
All the authors shared equally in the work for this article.
Conflict of interest
The authors declare no potential conflict of interest.
References
Andrieu, C., Doucet, A. and Holenstein, R. (2010). Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society, Series B: 269–342.

Chib, S. and Greenberg, E. (1995). Understanding the Metropolis-Hastings algorithm, The American Statistician: 327–335.

Dancik, G. M. and Dorman, K. S. (2008). mlegp: statistical analysis for computer models of biological systems using R, Bioinformatics (17): 1966–1967.

Diggle, P. J. and Gratton, R. J. (1984). Monte Carlo methods of inference for implicit statistical models, Journal of the Royal Statistical Society, Series B: 193–227.

Drovandi, C. C., Pettitt, A. N. and Lee, A. (2015). Bayesian indirect inference using a parametric auxiliary model, Statistical Science: 73–95.

Flury, T. and Shephard, N. (2011). Bayesian inference based on simulated likelihood: particle filter analysis of dynamic economic models, Econometric Theory: 933–956.

Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A. and Rubin, D. B. (2014). Bayesian Data Analysis, Third Edition, CRC Press, Boca Raton.

Graves, T. and Hamada, M. (2005). Making inferences with indirect measurements, Quality Engineering: 555–559.

Hamada, M. (1989). The costs of using incomplete response data for the exponential regression model, Communications in Statistics – Theory and Methods (5): 1691–1714.

Hamada, M. (1991). The costs of using incomplete exponential data, Journal of Statistical Planning and Inference: 317–324.

Huzurbazar, A. V. (2005). Flowgraph Models for Multistate Time-to-Event Data, John Wiley and Sons, Inc., New York.

Lawrence, E., Michailidis, G. and Nair, V. N. (2007). Statistical inverse problems in active network tomography, in R. Liu, W. Strawderman and C.-H. Zhang (eds), Complex Datasets and Inverse Problems: Tomography, Networks and Beyond, Lecture Notes–Monograph Series Volume 54, Institute of Mathematical Statistics, Beachwood, pp. 24–44.

Lerman, S. and Manski, C. (1981). On the use of simulated frequencies to approximate choice probabilities, in C. Manski and D. McFadden (eds), Structural Analysis of Discrete Data with Econometric Applications, MIT Press, Boston, pp. 305–319.

Price, L. F., Drovandi, C. C., Lee, A. and Nott, D. J. (2018). Bayesian synthetic likelihood, Journal of Computational and Graphical Statistics: 1–11.

Santner, T. J., Williams, B. J. and Notz, W. I. (2019). The Design and Analysis of Computer Experiments, Second Edition, Springer, New York.

Sisson, S. A., Fan, Y. and Beaumont, M. A. (2019). Overview of ABC, in S. Sisson, Y. Fan and M. Beaumont (eds), Handbook of Approximate Bayesian Computation, CRC Press, Boca Raton, pp. 3–54.

Sisson, S. A., Fan, Y. and Tanaka, M. M. (2007). Sequential Monte Carlo without likelihoods, Proceedings of the National Academy of Sciences: 1760–1765.

Warr, R. L. and Huzurbazar, A. V. (2010). Expanding the statistical flowgraph framework to use any transition distribution, Journal of Statistical Theory and Practice 4.