Integrated organic inference (IOI): A reconciliation of statistical paradigms
Russell J. Bowater
Independent researcher, Sartre 47, Acatlima, Huajuapan de León, Oaxaca, C.P. 69004, Mexico. Email address: as given on arXiv.org. Twitter profile: @naked_statist. Personal website: sites.google.com/site/bowaterfospage
Abstract:
It is recognised that the Bayesian approach to inference cannot adequately cope with all the types of pre-data beliefs about population quantities of interest that are commonly held in practice. In particular, it generally encounters difficulty when there is a lack of such beliefs over some or all the parameters of a model, or within certain partitions of the parameter space concerned. To address this issue, a fairly comprehensive theory of inference is put forward, called integrated organic inference, that is based on a fusion of Fisherian and Bayesian reasoning. Depending on the pre-data knowledge that is held about any given model parameter, inferences are made about the parameter conditional on all other parameters using one of three methods of inference, namely organic fiducial inference, bispatial inference and Bayesian inference. The full conditional post-data densities that result from doing this are then combined using a framework that allows a joint post-data density for all the parameters to be sensibly formed without requiring these full conditional densities to be compatible. Various examples of the application of this theory are presented. Finally, the theory is defended against possible criticisms, partially in terms of what was previously defined as generalised subjective probability.
Keywords:
Bayesian; Bispatial inference; Fisherian; Gibbs sampler; Incompatible conditional densities; Objective and subjective probability; Organic fiducial inference; P values.

1. Introduction

The general problem of making inferences about a population on the basis of a small random sample from that population has long been of great interest to scientific researchers. This problem is often addressed by making the assumption that, in the population, the distribution of the measurements being considered is a member of a given parametric family of distributions. Although this assumption can be criticised, we will choose in this paper to examine problems of inference that are constrained by this assumption. Our justification for this is that, first, this class of problems has substantial importance in its own right, and second, resolving such problems can be viewed as a convenient first step towards tackling cases in which making such an assumption is not appropriate. Therefore, let us suppose that the data set to be analysed x = {x_1, x_2, ..., x_n} was drawn from a joint density or mass function g(x | θ) that depends on a set of parameters θ = {θ_i : i = 1, 2, ..., k}, where each θ_i is a one-dimensional variable.

A way of classifying the nature of the problem that is encountered in trying to make inferences about the set of parameters θ is to do so on the basis of the type of knowledge that was held about these parameters before the data were observed. In this respect, it can be argued that the three most common types of pre-data opinion that, in practice, are naturally held about any given model parameter θ_j conditional on all other parameters θ_{-j} = {θ_1, ..., θ_{j-1}, θ_{j+1}, ..., θ_k} being known are as follows:

1) Nothing or very little is known about the parameter.

2) It is felt that the parameter may well be close to a specific value, which may for example indicate the absence of a treatment effect, or the lack of a correlation between variables, but apart from this nothing or very little is known about the parameter. Some examples of where it would be reasonable to hold this type of pre-data opinion were given in Bowater (2019b).

3) We know enough about the parameter for our opinion about it to be satisfactorily represented by a probability density or mass function over the parameter.

Each of these types of pre-data opinion about the parameter θ_j will therefore be treated as corresponding to a distinct problem of inference. Nevertheless, since our pre-data opinions about each of the parameters in any given set of parameters θ may well fall into different categories among the three being considered, it may be necessary to address two or all three of these types of problem in any particular scenario.

These three problems of inference will be of principal interest in what follows. More specifically, the aim of the present paper will be to show how these problems can be dealt with in a harmonious manner by using an approach to inference based on a fusion of Fisherian (as attributed to R. A. Fisher) and Bayesian reasoning. Of course, given the obvious incompatibilities that exist between, and to some extent even within, these two schools of reasoning, we will need to be given some liberty in how each of these approaches to inference is interpreted.

In this respect, although the theory that will be outlined is based on a type of probability that is inherently subjective, and therefore not frequentist as in the Fisherian paradigm, it is not the same type of probability that is commonly regarded as underlying subjective Bayesian theory.
Instead, it is a generalised form of subjective probability that effectively allows probability distributions to be distinguished according to where they are on a scale that goes from them being virtually objective to them being extremely subjective. This type of probability was referred to as generalised subjective probability in Bowater (2018b). Furthermore, the theory to be presented relies on various concepts that are heavily used by frequentist statisticians, e.g. sufficient and ancillary statistics, point estimators and their distributions, the classical notion of significance, and also one very important idea that during his own lifetime was chiefly advocated by Fisher himself, namely the fiducial argument. We are not suggesting, though, that the proposed methodology should be judged positively simply because it represents a compromise between competing schools of inference; rather we recommend, quite naturally, that it should be evaluated on the basis of its effectiveness in dealing with the particular inferential challenge that has been set out.

To give a little more detail, each of the three aforementioned problems of inference will be addressed using a method that is specific to the problem concerned, and although this results in the use of three methods that are of a clearly different nature, these methods are nevertheless compatible with the overall framework of inference that will be put forward. In particular, the first type of problem will be tackled using what, in Bowater (2019a), was called organic fiducial inference. On the other hand, the second problem will be addressed using what, in Bowater (2019b), was called bispatial inference. Finally, the third problem will be dealt with using Bayesian inference. The overall framework just referred to provides a way of coordinating these distinct methods of inference so that it is possible to simultaneously make inferences about all of the parameters in the model.

Let us now briefly describe the structure of the paper.
In the next five sections, we will present summaries of the fundamental concepts and methods that form the basis of the general theory in question, which will be called integrated organic inference (IOI). In particular, in the next two sections we will summarise the theory of generalised subjective probability and the overall framework of integrated organic inference. Furthermore, after clarifying in Section 2.3 the interpretation that will be adopted in this paper of the Bayesian approach to inference, concise accounts of the methods of organic fiducial inference and bispatial inference will be given in Sections 2.4 and 2.5. Various examples of the application of integrated organic inference will then be outlined in detail in Sections 3.1 to 3.5. In the final section of the paper (Section 4), a discussion of this theory of inference will be presented in the form of answers to questions that would be expected to naturally arise about the theory when it is first evaluated.

The theory will be referred to as integrated organic inference (IOI) because it integrates what are often considered to be conflicting approaches to inference into an overall framework that relies, in general, on what can be viewed as being an organic simulation algorithm. Furthermore, the type of inferences that this theory facilitates may, depending on the circumstances, be regarded as being objective or very subjective, but are nevertheless always organic, in the sense that they are intended to be only really understood by living subjects, e.g. humans, rather than primitive robots.
2. Fundamental concepts and methods

2.1. Generalised subjective probability

Overview
Under this definition of probability, a probability distribution is defined by its (cumulative) distribution function and the strength of this function relative to other distribution functions of interest. The distribution function is defined as having the standard mathematical properties of such a function. Let us now briefly outline the notion of the strength of a distribution function and some of the concepts that underlie this notion. Further details and examples of these concepts and of the notion of strength itself can be found in Bowater (2018b).
Similarity
As in the aforementioned paper, let S(A, B) denote the similarity that a given individual feels there is between his confidence (or conviction) that an event A will occur and his confidence (or conviction) that an event B will occur. For any three events A, B and C, it is assumed that an individual is capable of deciding whether or not the orderings S(A, B) > S(A, C) and S(A, B) < S(A, C) are applicable. The notation S(A, B) = S(A, C) is used to represent the case where neither of these orderings apply.
Reference set of events
Let O = {O_1, O_2, ..., O_m} be a finite ordered set of m events that are mutually exclusive and exhaustive. Also, let us assume that if O(1), O(2) and O(3) are three subsets of the set O that contain the same number of events, then the following is true:

S( ∪_{O_j ∈ O(1)} O_j , ∪_{O_j ∈ O(2)} O_j ) = S( ∪_{O_j ∈ O(1)} O_j , ∪_{O_j ∈ O(3)} O_j )

for all possible choices of the subsets O(1), O(2) and O(3). Under this assumption, a reference set of events R can be defined as follows:

R = {R(λ) : λ ∈ Λ}     (1)

where R(λ) = O_1 ∪ O_2 ∪ ··· ∪ O_{λm} and Λ = {1/m, 2/m, ..., (m−1)/m}. For example, it should be clear that any given individual could easily decide that the set of all the outcomes of randomly drawing a ball out of an urn containing m distinctly labelled balls could be the set O.

Equation (1) gives the definition of a reference set of events assuming that this set is discrete. For the definition of a continuous reference set of events, see Bowater (2018b).

External strength of a distribution function
Let two continuous random variables X and Y of possibly different dimensions have elicited or given distribution functions F_X(x) and G_Y(y) respectively. Also, we will specify the set of events F[a] as follows:

F[a] = { {X ∈ A} : ∫_A f_X(x) dx = a }

for a ∈ [0, 1], where {X ∈ A} is the event that X lies in the set A and f_X(x) is the density function corresponding to F_X(x), and we will specify the set G[a] in the same way but with respect to the variable Y instead of the variable X and the distribution function G_Y(y) instead of F_X(x).

For a given discrete or continuous reference set of events R, we will now define the function F_X(x) as being externally stronger than the function G_Y(y) at the resolution λ, where λ ∈ Λ, if

min_{A ∈ F[λ]} S(A, R(λ)) > max_{A ∈ G[λ]} S(A, R(λ))

An interpretation that could be given to this definition is that, if a particular individual judges a function F_X(x) as being externally stronger than a function G_Y(y) then, relative to the reference event R(λ), the function F_X(x) could be regarded as representing his uncertainty about the variable X better than G_Y(y) represents his uncertainty about the variable Y.

A definition of the internal rather than the external strength of a distribution function, and other definitions of the external strength of a distribution function that are applicable to discrete distribution functions and to distribution functions derived by formal systems of reasoning, e.g. derived by applying the standard rules of probability, can be found in Bowater (2018b).

2.2. The overall framework of inference

The general aim of the theory to be presented is to construct a joint density/mass function of all the model parameters θ that accurately represents what is known about these parameters after the data have been observed, i.e. what can be referred to as a post-data density function of these parameters. Let this density function be denoted as p(θ | x).
To be more specific, this will be done by first determining each of the density functions in the complete set of full conditional post-data density functions of the parameters θ, i.e. the set of density functions:

p(θ_j | θ_{−j}, x)   for j = 1, 2, ..., k     (2)

One of the key features of the approach that will be developed is that it allows any given one of these density functions to be constructed using whichever one of the three distinct methods of inference mentioned in the Introduction is regarded as being the most appropriate for the task.

In order to remove a potentially important source of conflict between the three methods of inference being referred to, the quite natural assumption will be made that, during the process of determining each of the full conditional densities in equation (2), the set of conditioning parameters θ_{−j} are always treated as being known constants. This means that usually it will not be permitted that any one of these conditional densities is determined by first constructing a joint post-data density of the parameter θ_j and some or all of the parameters in the set θ_{−j}, and then conditioning this joint density on the parameters θ_{−j}. However, making the assumption that has just been made does not generally eliminate the possibility that the set of full conditional densities in equation (2) may be determined using the methods in question in a way that implies that they are not consistent with any joint density function of the parameters concerned, i.e. these conditional densities may be incompatible among themselves. On the other hand, if the full conditional densities under discussion are indeed compatible then, since, under a mild requirement, a joint density function is uniquely defined by its full conditional densities, these densities will, in general, define a unique joint post-data density function for the parameters θ, i.e. a unique density p(θ | x).
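As a simple numerical illustration of the compatible case (the bivariate normal setting and the value ρ = 0.6 are our own illustrative assumptions, not an example taken from this paper), consider the pair of full conditional densities θ_1 | θ_2 ~ N(ρθ_2, 1 − ρ²) and θ_2 | θ_1 ~ N(ρθ_1, 1 − ρ²). These are compatible, being the full conditionals of a standard bivariate normal joint density with correlation ρ, and a Gibbs sampler based on them recovers that unique joint density:

```python
import math
import random
import statistics

# Illustrative full conditional densities (an assumption for this sketch):
#   theta1 | theta2 ~ N(rho * theta2, 1 - rho^2)
#   theta2 | theta1 ~ N(rho * theta1, 1 - rho^2)
# These are the full conditionals of a standard bivariate normal joint
# density with correlation rho, so they are compatible.
rho = 0.6

def gibbs(n_samples, burn_in=2000, seed=1):
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho ** 2)
    t1 = t2 = 0.0
    samples = []
    for i in range(burn_in + n_samples):
        t1 = rng.gauss(rho * t2, sd)  # draw from p(theta1 | theta2)
        t2 = rng.gauss(rho * t1, sd)  # draw from p(theta2 | theta1)
        if i >= burn_in:
            samples.append((t1, t2))
    return samples

samples = gibbs(20000)
xs = [t1 for t1, _ in samples]
ys = [t2 for _, t2 in samples]
mean1 = statistics.mean(xs)
var1 = statistics.variance(xs)
# Sample covariance; since both marginal variances are near 1, this also
# approximates the correlation between theta1 and theta2.
cov = statistics.mean(a * b for a, b in samples) - mean1 * statistics.mean(ys)
# The sampler's limiting density is the bivariate normal joint implied by
# the conditionals: marginal means near 0, marginal variances near 1, and
# a covariance near rho.
```

If, by contrast, the two conditionals were specified inconsistently (say, with a different value of ρ in each), they would be incompatible and the sampler's limiting density would in general depend on the scanning order; that is precisely the situation the framework described below is designed to handle sensibly.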
Addressing the issue of incompatible full conditional densities

As discussed in Bowater (2018a), to check whether full conditional densities of the overall type being considered are compatible, it may be possible to use a simple analytical method. In particular, we begin to implement this method by proposing an analytical expression for the joint density function of the set of parameters θ, then we determine the full conditional density functions for this joint density, and finally we see whether these conditional densities are equivalent to the full conditional densities in equation (2). If this equivalence is achieved, then these latter conditional densities clearly must be compatible. This method has the advantage that generally, in such circumstances, it directly gives us an analytical expression for the unique joint post-data density p(θ | x), i.e. under a mild condition, it will be the originally proposed joint density for the parameters θ.

By contrast, in situations that will undoubtedly often arise where it is not easy to establish whether or not the full conditional densities in equation (2) are compatible, let us imagine that we make the pessimistic assumption that they are in fact incompatible. Nevertheless, even though these full conditional densities could be incompatible, they could be reasonably assumed to represent the best information that is available for constructing a joint post-data density function of the parameters θ, or in other words, for constructing the most suitable density p(θ | x).
Therefore, it would seem appropriate to try to find the joint density of the parameters θ that has full conditional densities that most closely approximate those given in equation (2).

To achieve this objective, let us focus attention on the use of a method that was advocated in a similar context in Bowater (2018a), in particular the method that simply consists in making the assumption that the joint density of the parameters θ that most closely corresponds to the full conditional densities in equation (2) is equal to the limiting density function of a Gibbs sampling algorithm (Geman and Geman 1984, Gelfand and Smith 1990) that is based on these conditional densities with some given fixed or random scanning order of the parameters in question. Under a fixed scanning order of the model parameters, we will define a single transition of this type of algorithm as being one that results from randomly drawing a value (only once) from each of the full conditional densities in equation (2) according to some given fixed ordering of these densities, replacing each time the previous value of the parameter concerned by the value that is generated. Let us clarify that it is being assumed that only the set of values for the parameters θ that are obtained on completing a transition of this kind are recorded as being a newly generated sample, i.e. the intermediate sets of parameter values that are used in the process of making such a transition do not form part of the output of the algorithm.

To measure how close the full conditional densities of the limiting density function of the general type of Gibbs sampler being presently considered are to the full conditional densities in equation (2), we can make use of a method that, in relation to its use in a similar context, was discussed in Bowater (2018a).
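A single transition under a fixed scanning order, as just defined, can be sketched as follows (the function and argument names are our own illustrative choices; each full conditional density is represented abstractly as a routine that draws one value given the current values of the other parameters):

```python
import random

def fixed_scan_transition(theta, full_conditionals, order, rng):
    """Perform one transition of a fixed-scan Gibbs sampler.

    theta             : dict mapping parameter index j -> current value
    full_conditionals : dict mapping j -> a function that draws a single
                        value from p(theta_j | theta_-j, x), given a dict
                        of the other parameters' current values
    order             : the fixed ordering of the parameter indices

    Each parameter is updated exactly once, with every newly drawn value
    replacing the previous one before the next density is sampled. Only
    the state reached at the END of the transition is recorded as a new
    sample; the intermediate states are discarded.
    """
    for j in order:
        theta_minus_j = {i: v for i, v in theta.items() if i != j}
        theta[j] = full_conditionals[j](theta_minus_j, rng)
    return dict(theta)  # the recorded sample for this transition
```

Running this routine repeatedly and recording each returned state produces the chain whose limiting density function is of interest; a random scanning order would instead pick a single index at random on each transition.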
The reasoning that underlies this method can be easily appreciated by first assessing the practical viability of another specific procedure for verifying the compatibility of the conditional densities in equation (2). In particular, on the basis of the results in Chen and Ip (2015), it can be deduced that the conditional densities in this equation will be compatible if, under a fixed scanning order of the parameters θ that is implemented in the way that was just specified, a Gibbs sampling algorithm based on these full conditional densities satisfies the following three conditions:

A) It is positive recurrent for all possible fixed scanning orders. This condition ensures that the sampling algorithm has at least one stationary distribution for any given fixed scanning order.

B) It is irreducible and aperiodic for all possible fixed scanning orders. Together with condition A, this condition ensures that the sampling algorithm has a limiting distribution for any given fixed scanning order.

C) Given conditions A and B hold, the limiting density function of the sampling algorithm needs to be the same over all possible fixed scanning orders.

Moreover, when these conditions hold, the joint post-data density function of the parameters θ that is directly defined by the full conditional densities in equation (2) will be the unique limiting density function of these parameters referred to in condition C.
The sufficiency of the conditions A to C just listed for establishing the compatibility of any given set of full conditional densities was proved for a special case in Chen and Ip (2015), which is a proof that can be easily extended to the more general case that is currently of interest.

Nevertheless, even if, with respect to the specific type of full conditional densities referred to in equation (2), we can establish that condition A and condition B are satisfied, it will usually be impossible, in practice, to determine whether condition C is satisfied. From an alternative perspective, if we assume that the full conditional densities in this equation are in fact incompatible, then if conditions A and B are satisfied, it would appear to be useful (with reference to condition C) to analyse how the limiting density function of a Gibbs sampler based on these full conditional densities varies over a reasonable number of very distinct fixed scanning orders of the sampler. If within such an analysis, the variation of this limiting density with respect to the scanning order of the parameters θ can be classified as small, negligible or undetectable, then this should give us reassurance that the full conditional densities in equation (2) are, respectively according to such classifications, close, very close or at least very close, to the full conditional densities of the limiting density of a Gibbs sampler of the type that is of main interest, i.e. a Gibbs sampler that is based on any given fixed or random scanning order of the parameters concerned.

In trying to choose the scanning order of this type of Gibbs sampler such that it has a limiting density function that corresponds to a set of full conditional densities that most accurately approximate the density functions in equation (2), a good general choice would arguably be, what will be referred to as, a uniform random scanning order.
Under this type of scanning order, a transition of the Gibbs sampling algorithm in question will be defined as being one that results from generating a value from one of the full conditional densities in equation (2) that is chosen at random, with the same probability of 1/k being given to any one of these densities being selected, and then treating the generated value as the updated value of the parameter concerned.

However, it can be easily shown that, independent of whether or not the set of full conditional densities in equation (2) are compatible, the last full conditional density in this set that is sampled from in completing a given fixed scanning order will be one of the full conditional densities of the limiting density function of the type of Gibbs sampler being discussed that uses such a fixed scanning order. This therefore provides a reason for perhaps deciding, in certain applications, that the limiting density of a Gibbs sampler of the general type in question most satisfactorily corresponds to the full conditional densities in equation (2) when a given fixed rather than a uniform random scanning order of the parameters θ is used.

Conventional simulation issues
As with all Gibbs samplers, it is important to verify, in implementing strategies of the type just mentioned, that the sampler concerned has converged to its limiting density function within the restricted number of transitions of the sampler that can be observed in practice. To do this, we can make use of standard methods for analysing the convergence of Markov chain Monte Carlo algorithms described in, for example, Gelman and Rubin (1992), Cowles and Carlin (1996) and Brooks and Roberts (1998). However, the use of such convergence diagnostics may be considered to be slightly more important in the case of present interest in which the full conditional densities on which the Gibbs sampler is based could be incompatible, since, compared to the case where these densities are known to be compatible, there is likely to be, in practice, a little more concern that the Gibbs sampler may not actually have a limiting density function, even though in reality the genuine risk of this may still be extremely low.

A notable advantage of the general method for finding a suitable joint post-data density for the parameters θ that has just been outlined is that it can directly achieve what is often the main goal of a standard application of the Gibbs sampler, namely that of obtaining good approximations to the expected values of functions of the parameters of a model over the post-data or posterior density for these parameters that is of interest, i.e. expected values of the following type:

E[h(θ) | x] = ∫_{R^k} h(θ) p(θ | x) dθ

where p(θ | x) is a given post-data density function of the parameters θ, while h(θ) is any given function of these parameters. To be more specific, this kind of expected value may, of course, be approximated using the Monte Carlo estimator:

(1 / (N − b)) Σ_{i=b+1}^{N} h(θ_1^{(i)}, θ_2^{(i)}, ..., θ_k^{(i)})

where θ_1^{(i)}, θ_2^{(i)}, ..., θ_k^{(i)} is the i-th sample of parameter values among the N samples generated by the sampler in total, and b is the number of initial samples that are classified as belonging to the burn-in phase of the sampler.

2.3. The Bayesian approach to inference

As was in effect done by Bayes in his famous paper (Bayes 1763), it will be assumed that Bayesian inference depends on three key concepts. First, Bayes' theorem as a purely mathematical expression. Second, the justification of the application of this theorem to well-understood physical experiments, e.g. random spins of a wheel or random draws of a ball from an urn of balls. Finally, something which will be referred to as Bayes' analogy, which is the type of analogy that can be made between the uncertainty that surrounds the outcomes of the kind of physical experiments just mentioned to which Bayes' theorem can be very naturally applied, and the uncertainty that surrounds what are the true values of any unknown real-world quantities that are of interest.

By using this latter concept, we can justify the use of Bayesian inference in a much wider range of applications than is allowed by only using the first two concepts. However, depending on the type of application, the Bayes' analogy may be a good analogy or a poor analogy, which is something that needs to be taken into account when assessing the adequacy of any given application of the Bayesian method.

In keeping with the notation defined in the Introduction, the post-data or posterior density function of the parameter θ_j given all other model parameters θ_{−j} can be expressed according to Bayes' theorem as follows:

p(θ_j | θ_{−j}, x) = C g(x | θ) p(θ_j | θ_{−j})

where p(θ_j | θ_{−j}) is the pre-data or prior density function of the parameter θ_j given the parameters θ_{−j}, while C is a normalising constant.

In this paper, we will exclude from consideration two methods of inference that are often referred to as 'objective' forms of Bayesian inference.
The first of these methods consists in always specifying the prior density p(θ_j | θ_{−j}) as being a uniform or flat density function over all values of θ_j. This implies, though, that the Bayes' analogy must be broken due to this prior density being improper and/or due to the posterior density of any given population quantity of interest h(θ_j) conditional on the parameters θ_{−j} possessing, in general, the property of being dependent on the parameterisation of the sampling model, which of course is a very undesirable property for this posterior density to have. On the other hand, the second type of method entails specifying the prior density p(θ_j | θ_{−j}) such that it depends on the sampling model, i.e. allowing what is known about the parameter θ_j to depend on how we intend to collect more information about this parameter; however, doing this clearly again breaks the Bayes' analogy. A famous example of a type of prior density that is specified in this way is a prior density that is derived by applying Jeffreys' rule, see Jeffreys (1961), although many other prior densities of this kind have been proposed, see for example, Kass and Wasserman (1996). To conclude, it can be strongly argued that, due to the Bayes' analogy being clearly broken, the application of either of the two methods of inference that have just been mentioned should not really be regarded as being an application of the Bayesian approach to inference at all.

2.4. Organic fiducial inference

We will now outline some of the key concepts that underlie the theory of organic fiducial inference. Descriptions of other important concepts on which this theory is based, along with further details about the concepts that will be outlined here and about the overall theory itself, can be found in Bowater (2019a). Throughout this section, it will be assumed that the values of the parameters in the set θ_{−j} are known.

Fiducial statistics
A fiducial statistic Q(x) will be defined as being a univariate statistic of the sample x that can be regarded as efficiently summarising the information that is contained in this sample about the only unknown parameter θ_j, given the values of other statistics that do not provide any information about this parameter, i.e. ancillary statistics. If, in any given case, there exists a univariate sufficient statistic for θ_j, then this would naturally be chosen to be the fiducial statistic for that case. In other cases, it may well make good sense to choose this statistic Q(x) to be the maximum likelihood estimator of θ_j.

For ease of presentation, we will assume, in what follows, that the choice of the fiducial statistic can be justified without reference to any particular ancillary statistics.

Data generating algorithm
Independent of the way in which the data set x was actually generated, it will be assumed that this data set was generated by the following algorithm:

1) Generate a value γ for a continuous one-dimensional random variable Γ, which has a density function π(γ) that does not depend on the parameter θ_j.

2) Determine a value q(x) for the fiducial statistic Q(x) by setting Γ equal to γ and Q(x) equal to q(x) in the following expression for the statistic Q(x), which effectively should define the way in which this statistic is distributed:

Q(x) = ϕ(Γ, θ_j)     (3)

where the function ϕ(Γ, θ_j) is specified so that it satisfies the following conditions:

a) The density or mass function of Q(x) that is, in effect, defined by equation (3) is equal to what it would have been if Q(x) had been determined on the basis of the data set x.

b) The only random variable upon which ϕ(Γ, θ_j) depends is the variable Γ.

3) Generate the data set x from its sampling density or mass function g(x | θ_1, θ_2, ..., θ_k) conditioned on the statistic Q(x) being equal to its already generated value q(x).

In the context of this algorithm, the variable Γ is referred to as the primary random variable (primary r.v.).

Strong fiducial argument
This is the argument that the density function of the primary r.v. Γ after the data have been observed, i.e. the post-data density function of Γ, should be equal to the pre-data density function of Γ, i.e. the density function π(γ) as defined in step 1 of the data generating algorithm just presented.

Moderate fiducial argument
It will be assumed that this argument is only applicable if, on observing the data x, there exists some positive measure set of values of the primary r.v. Γ over which the pre-data density function π(γ) was positive, but over which the post-data density function of Γ, which will be denoted as the density function π*(γ), is necessarily zero. Under this condition, it is the argument that, over the set of values of Γ for which the density function π*(γ) is necessarily positive, the relative height of this function should be equal to the relative height of the density function π(γ), i.e. the heights of these two functions should be proportional.

Weak fiducial argument
This argument will be assumed to be only applicable if neither the strong nor the moderate fiducial argument is considered to be appropriate. It is the argument that, over the set of values of the primary r.v. Γ for which the post-data density function π*(γ) is necessarily positive, the relative height of this function should be equal to the relative height of the pre-data density function π(γ) multiplied by weights on the values of Γ determined by a given function over the parameter θ_j that was specified before the data were observed. This latter function is called the global pre-data function of θ_j. Let us now define this function.

Global pre-data (GPD) function
The global pre-data (GPD) function ω_G(θ_j) is used to express pre-data knowledge, or a lack of such knowledge, about the only unknown parameter θ_j. This function may be any given non-negative function of the parameter θ_j that is locally integrable over the space of this parameter. It only needs to be specified up to a proportionality constant, in the sense that, if it is multiplied by a positive constant, then the value of the constant is redundant. Unlike a Bayesian prior density, it is not controversial to use a GPD function that is not globally integrable.

A principle for defining the fiducial density f(θ_j | θ_{−j}, x)

Let us now consider a principle for defining the post-data density of θ_j conditional on the parameters θ_{−j}, which, given that it will be derived using a type of fiducial inference, will be called the fiducial density of θ_j conditional on θ_{−j}, and will be denoted as the density f(θ_j | θ_{−j}, x). To be able to use this principle, the following condition must be satisfied.

Condition 1
Let G_x and H_x be, respectively, the sets of all the values of the primary r.v. Γ and the parameter θ_j for which the density functions of these variables must necessarily be positive in light of having observed only the value of the fiducial statistic Q(x), i.e. the value q(x), and not any other information in the data set x. To clarify, any set of values of Γ, or any set of values of θ_j, that is regarded as being impossible after the statistic Q(x) has been observed can not be contained in the set G_x or the set H_x respectively. Given this notation, the present condition will be satisfied if, on substituting the variable Q(x) in equation (3) by its observed value q(x), this equation would define a bijective mapping between the set G_x and the set H_x.

Under this condition, the full conditional fiducial density f(θ_j | θ_{−j}, x) is defined by setting Q(x) equal to its observed value q(x) in equation (3), and then treating the value θ_j in this equation as being a realisation of the random variable Θ_j, to give the expression:

q(x) = ϕ(Γ, Θ_j)   (4)

except that, instead of the variable Γ necessarily having the density function π(γ) as defined in step 1 of the data generating algorithm, it will be assumed to have the post-data density function of this variable as defined by:

π*(γ) = { C ω_G(θ_j(γ)) π(γ)  if γ ∈ G_x ;  0  otherwise }

where θ_j(γ) is the value of the variable Θ_j that maps on to the value γ of the variable Γ according to equation (4), the function ω_G(θ_j(γ)) is the GPD function of θ_j defined earlier, and C is a normalising constant.

Notice that if, on substituting the variable Q(x) by the value q(x), equation (3) defines an injective mapping from the set of values {γ : π(γ) > 0} for the variable Γ to the space of the parameter θ_j, then the GPD function ω_G(θ_j) expresses in effect our pre-data beliefs about θ_j relative to what is implied by using the strong fiducial argument.
By doing so, it determines whether the strong, moderate or weak fiducial argument is used to make inferences about θ_j, and also the way in which the latter two arguments influence the inferential process.

In the case where nothing or very little was known about the parameter θ_j before the data were observed, it would generally seem reasonable to choose the GPD function of the parameter θ_j to be equal to a positive constant over the entire space of this parameter. Under the assumption that there exists an injective mapping from the space of Γ to the space of θ_j of the type just mentioned, choosing the GPD function ω_G(θ_j) in this way implies that the post-data density π*(γ) will be equal to the pre-data density π(γ), i.e. inferences will be made about θ_j by using the strong fiducial argument. The use of the theory of fiducial inference being presently considered in this special case is discussed to some extent in Bowater (2019a), but more extensively in Bowater (2018a), where in fact a specific version of organic fiducial inference, referred to as subjective fiducial inference, is applied to examples of this particular nature.

Other ways of defining the fiducial density f(θ_j | θ_{−j}, x)

In cases where the principle just described can not be applied, i.e. when Condition 1 does not hold, we may well be able to define the fiducial density f(θ_j | θ_{−j}, x) using the alternative principle for this purpose that was presented in Section 3.4 of Bowater (2019a) as Principle 2, or it may well be considered acceptable to define this fiducial density using the kind of variations on this latter principle that were discussed in Sections 7.2 and 8 of this earlier paper.
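To make the preceding definitions concrete, the data-generating algorithm and Principle 1 can be sketched in code for the simplest case treated later in the paper: Q(x) is the sample mean of a N(µ, σ²) sample of size n, Γ ~ N(0, 1), ϕ(Γ, µ) = µ + (σ/√n)Γ, and the GPD function is constant (so the strong fiducial argument applies). This is purely an illustrative sketch; the numerical values and function names are our own choices, not taken from the paper.

```python
import math
import random
import statistics

def generate_data(mu, sigma, n, rng):
    """Steps 1-3 of the data-generating algorithm for a N(mu, sigma^2) sample
    with fiducial statistic Q(x) = sample mean."""
    gamma = rng.gauss(0.0, 1.0)                  # step 1: primary r.v. Gamma ~ pi(gamma)
    q = mu + (sigma / math.sqrt(n)) * gamma      # step 2: q(x) via Q(x) = phi(Gamma, mu)
    x = [rng.gauss(mu, sigma) for _ in range(n)] # step 3: draw x conditioned on its mean
    shift = q - statistics.mean(x)               # equalling q (for a normal sample,
    return [xi + shift for xi in x]              # recentring achieves the conditioning)

def fiducial_density_mu(mu, xbar, sigma, n, gpd=lambda m: 1.0):
    """Fiducial density of mu in the spirit of Principle 1: invert
    q(x) = mu + (sigma/sqrt(n))*Gamma to get mu(gamma) = xbar - (sigma/sqrt(n))*gamma,
    then weight pi(gamma) by the GPD function (constant here)."""
    scale = sigma / math.sqrt(n)
    gamma = (xbar - mu) / scale                              # Gamma value mapped to mu
    pi_gamma = math.exp(-0.5 * gamma ** 2) / math.sqrt(2 * math.pi)
    return gpd(mu) * pi_gamma / scale                        # change of variables

rng = random.Random(1)
x = generate_data(mu=0.0, sigma=9.0, n=9, rng=rng)
# with a constant GPD, the fiducial density of mu is the N(xbar, sigma^2/n) density
print(len(x), round(fiducial_density_mu(2.0, xbar=2.0, sigma=9.0, n=9), 6))
```

With a non-constant GPD the returned density is unnormalised, which mirrors the fact that a GPD function is only specified up to proportionality.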
The alternative principle in question, i.e. Principle 2 of Bowater (2019a), which is particularly useful in cases where the data are discrete or categorical, relies on the concept of a local pre-data (LPD) function for expressing additional information concerning the pre-data beliefs that were held about the parameter θ_j beyond that which is expressed by the GPD function for θ_j. The concept of a LPD function is also detailed in Bowater (2019a).

The type of bispatial inference that will be incorporated into the theory being developed in the present paper will be the special form of bispatial inference that was laid out in Section 3 of Bowater (2019b). Let us now outline the key concepts on which this type of bispatial inference is based. Further details about these concepts and a broader discussion of the specific method of inference in question can be found in Bowater (2019b). As in the previous section, the values of the parameters in the set θ_{−j} will be assumed to be known.

Scenario of interest
This scenario is characterised by there having been a substantial degree of belief before the data were observed that the only unknown parameter θ_j lay in a narrow interval [θ_{j0}, θ_{j1}], but if, on the other hand, θ_j had been conditioned not to lie in this interval, then there would have been no or very little pre-data knowledge about θ_j over all of its allowable values outside of the interval in question. Among the three common types of pre-data opinion we may hold about the parameter θ_j that were highlighted in the Introduction, this scenario is clearly consistent with holding the second type of opinion.

Test statistics
In the context of bispatial inference, a test statistic T(x), whose observed value will also be denoted simply as t, is specified such that it satisfies two criteria. First, this statistic must fit within the broad definition of a fiducial statistic that was given in the previous section. This could mean that a particular choice of the statistic T(x) can only be justified with reference to given ancillary statistics; however, similar to how we proceeded in the previous section, we will assume here, for ease of presentation, that this is not the case.

The second criterion is that if F(t | θ_j) is the cumulative distribution function of the unobserved test statistic T(X) evaluated at its observed value t given a value for the parameter θ_j, i.e. F(t | θ_j) = P(T(X) ≤ t | θ_j), and if F′(t | θ_j) is equal to the probability P(T(X) ≥ t | θ_j), then it is necessary that, over the set of allowable values for θ_j, the probabilities F(t | θ_j) and 1 − F′(t | θ_j) strictly decrease as θ_j increases.

Parameter and sampling space hypotheses
Under this definition of a test statistic T(x), if the condition:

F(t | θ_j = θ_{j0}) ≤ F′(t | θ_j = θ_{j1})   (5)

holds, where the values θ_{j0} and θ_{j1} are as defined at the start of this section, then the parameter space hypothesis H_P and the sampling space hypothesis H_S will be defined as:

H_P : θ_j ≥ θ_{j0}   (6)

H_S : ρ(T(X*) ≤ t) ≤ F(t | θ_j = θ_{j0})   (7)

where X* is an as-yet-unobserved second sample of values drawn from the sampling density of interest, i.e. the density g(x | θ), that is the same size as the observed (first) sample x, i.e. it consists of n observations, and where ρ(A) is the unknown population proportion of times that condition A is satisfied. On the other hand, if the condition in equation (5) does not hold, then the hypotheses in question will be defined as:

H_P : θ_j ≤ θ_{j1}   (8)

H_S : ρ(T(X*) ≥ t) ≤ F′(t | θ_j = θ_{j1})   (9)

Given the way that the test statistic T(x) was just defined, it can be easily appreciated that the hypotheses H_P and H_S in equations (6) and (7) are equivalent, and also that these hypotheses as defined in equations (8) and (9) are equivalent. In addition, observe that the probabilities F(t | θ_j = θ_{j0}) and F′(t | θ_j = θ_{j1}) that appear in the definitions of the hypotheses H_S in equations (7) and (9) would be the standard one-sided P values that would be calculated on the basis of the data set x if the null hypotheses were regarded as being the hypotheses H_P that correspond to the two hypotheses H_S in question.

Inferential process
It will be assumed that inferences are made about the parameter θ_j by means of the following three-step process:

Step 1: Assessment of the likeliness of the hypothesis H_P being true using only pre-data knowledge about the parameter θ_j, with special attention being given to evaluating the likeliness of the hypothesis that θ_j lies in the interval [θ_{j0}, θ_{j1}], which is a hypothesis that is always included in the hypothesis H_P. It is not necessary that this assessment is expressed in terms of a formal measure of uncertainty, e.g. a probability does not need to be assigned to the hypothesis H_P.

Step 2: Assessment of the likeliness of the hypothesis H_S being true after the data x have been observed, leading to the assignment of a probability to this hypothesis, which will be denoted as the probability κ. In carrying out this assessment, all relevant factors ought to be taken into account including, in particular: (a) the size of the one-sided P value that appears in the definition of the hypothesis H_S, i.e. the value F(t | θ_j = θ_{j0}) or the value F′(t | θ_j = θ_{j1}), (b) the assessment made in Step 1, and (c) the known equivalency between the hypotheses H_P and H_S.

Step 3: Conclusion about the probability of the hypothesis H_P being true having taken into account the data x. This is directly implied by the assessment made in Step 2 due to the equivalence of the hypotheses H_P and H_S.

In combination with organic fiducial inference
It was described in Bowater (2019b) how the type of bispatial inference under discussion can be extended from allowing us simply to determine a post-data probability for the hypothesis H_P being true, i.e. the probability κ, to allowing us to determine an entire post-data density function for the parameter θ_j. As was the case in this earlier paper, we will again favour doing this in an indirect way by combining bispatial inference as has just been detailed with organic fiducial inference as was summarised in Section 2.4. In particular, the method that we will choose to adopt to achieve the goal in question will be essentially the method that was put forward in Section 4.2 of Bowater (2019b). Let us now briefly outline this method.

To begin with, in applying the method concerned, we assume that both the post-data density function of θ_j conditional on θ_j lying in the interval [θ_{j0}, θ_{j1}], and the post-data density function of θ_j conditional on θ_j not lying in this interval, are derived under the paradigm of organic fiducial inference, i.e. they are fiducial density functions, and let us therefore denote these density functions by f(θ_j | θ_j ∈ [θ_{j0}, θ_{j1}], x) and f(θ_j | θ_j ∉ [θ_{j0}, θ_{j1}], x) respectively. Since it has been assumed that, under the condition that θ_j does not lie in the interval [θ_{j0}, θ_{j1}], nothing or very little would have been known about θ_j before the data were observed, it would seem quite natural, in deriving the latter of these fiducial densities, to use a GPD function for θ_j that has the following form:

ω_G(θ_j) = { 0  if θ_j ∈ [θ_{j0}, θ_{j1}] ;  a  otherwise }

where a > 0, which would be classed as a neutral GPD function using the terminology of Bowater (2019a).

On the basis of this GPD function, the fiducial density f(θ_j | θ_j ∉ [θ_{j0}, θ_{j1}], x) can often be derived by applying the moderate fiducial argument under the principle that was outlined in Section 2.4, i.e. Principle 1 of Bowater (2019a). Alternatively, in accordance with what was also advocated in Bowater (2019a), this fiducial density can be more generally defined, with respect to the same GPD function for θ_j, by the following expression:

f(θ_j | θ_j ∉ [θ_{j0}, θ_{j1}], x) = C f_S(θ_j | x)   (10)

where C is a normalising constant, and f_S(θ_j | x) is a fiducial density for θ_j derived using either Principle 1 or Principle 2 of Bowater (2019a) that would be regarded as being a suitable fiducial density for θ_j in a general scenario where it is assumed that there was no or very little pre-data knowledge about θ_j over all possible values of θ_j.

To construct the fiducial density of θ_j conditional on θ_j lying in the interval [θ_{j0}, θ_{j1}], i.e. the density f(θ_j | θ_j ∈ [θ_{j0}, θ_{j1}], x), the method being considered relies on quite a general type of GPD function for θ_j. In particular, it is assumed that this GPD function has the following form:

ω_G(θ_j) = { ν h(θ_j)  if θ_j ∈ [θ_{j0}, θ_{j1}] ;  0  otherwise }   (11)

where ν ≥ 0 and h(θ_j) is a continuous unimodal density function on the interval [θ_{j0}, θ_{j1}] that is equal to zero at the limits of this interval. On the basis of this GPD function, the fiducial density f(θ_j | θ_j ∈ [θ_{j0}, θ_{j1}], x) can often be derived by again using the principle detailed in Section 2.4 (i.e. Principle 1 of Bowater 2019a), but this time by calling upon the weak fiducial argument.
Alternatively, in accordance with what was also advocated in Bowater (2019a), this fiducial density can be more generally defined, with respect to the same GPD function for θ_j, in the following way:

f(θ_j | θ_j ∈ [θ_{j0}, θ_{j1}], x) = C ω_G(θ_j) f_S(θ_j | x)   (12)

where the fiducial density f_S(θ_j | x) is specified as it was immediately after equation (10), and C is a normalising constant.

Now, if in using the method of bispatial inference outlined immediately before the current discussion, the hypothesis H_P, i.e. the hypothesis in equation (6) or equation (8), is assigned a sensible post-data probability κ, i.e. a probability above a very low limit that is defined in Bowater (2019b), then given the two conditional post-data densities for θ_j that have just been specified, i.e. the fiducial densities f(θ_j | θ_j ∈ [θ_{j0}, θ_{j1}], x) and f(θ_j | θ_j ∉ [θ_{j0}, θ_{j1}], x), we have sufficient information to determine a valid post-data density function of θ_j over all values of θ_j. Hopefully, it is fairly clear why this is the case; nevertheless, the reader is referred to Bowater (2019b) for a more detailed account of the derivation of this latter post-data density function. In the rest of this paper, we will denote this overall post-data density function of θ_j as the density b(θ_j | θ_{−j}, x) to indicate that it was derived using bispatial inference.

However, there is an important final issue that needs to be resolved, which is how the value of the constant ν in equation (11) is chosen. Using the method being discussed, this constant must in fact be chosen such that the overall post-data density b(θ_j | θ_{−j}, x) is made equivalent to a fiducial density function for θ_j that is based on a continuous GPD function for θ_j over all values of θ_j, but, except for the way in which this GPD function is specified, is based on the same assumptions as were used to derive the fiducial density f_S(θ_j | x).
In general, a value for ν will exist that satisfies this condition, and it will be a unique value. Placing this condition on the choice of ν can be viewed as not restricting excessively the way we are allowed to express our pre-data knowledge about the parameter θ_j, while it ensures that the density function b(θ_j | θ_{−j}, x) possesses, in general, the usually desirable property of being continuous over all values of θ_j.

Post-data opinion curve
Observe that in using the method of inference that has just been outlined, the assessment of the likeliness of the hypothesis H_S in either equation (7) or equation (9) will, in general, depend on the values of the parameters in the set θ_{−j}. This of course will be partially due to the effect that the values of these parameters can have on the one-sided P value that appears in the definition of this hypothesis, i.e. their effect on the value F(t | θ_j = θ_{j0}) or the value F′(t | θ_j = θ_{j1}). As a result, to implement the method of inference under discussion within the overall framework for determining a joint post-data density of all the model parameters θ that was put forward in Section 2.2, we will generally wish to assign not just one, but various probabilities to the hypothesis H_S conditional on the values of the parameters θ_{−j}.

It is possible, though, to simplify matters greatly by assuming that the probability that is assigned to any given hypothesis H_S, and therefore also to its corresponding hypothesis H_P, i.e. the probability κ, will be the same for any fixed value of the one-sided P value that appears in the definition of the hypothesis H_S, no matter what values are actually taken by the parameters in the set θ_{−j}. By making this assumption, which is arguably a reasonable assumption in many practical situations, the probability κ becomes a mathematical function of the one-sided P value that appears in the definition of the hypothesis H_S concerned. As was the case in Bowater (2019b), this function will be called the post-data opinion (PDO) curve for the parameter θ_j conditional on the parameters θ_{−j}.

3. Examples

We will now present various examples of the application of the overall theory that was outlined in previous sections, i.e. the theory of integrated organic inference.
Let us begin by considering what can be referred to as Student's problem, that is, the standard problem of making inferences about the mean µ of a normal density function, when its variance σ² is unknown, on the basis of a sample x of size n, i.e. x = {x₁, x₂, …, x_n}, drawn from the density function concerned.

If σ² was known, a sufficient statistic for µ would be the sample mean x̄, which therefore, in applying the theory of fiducial inference outlined in Section 2.4, can naturally be assumed to be the fiducial statistic Q(x) in this particular case. Based on this assumption and given a value for σ², equation (3) can be expressed as:

x̄ = ϕ(Γ, µ) = µ + (σ/√n)Γ   (13)

where the primary r.v. Γ ∼ N(0, 1). If there was no or very little pre-data knowledge about µ before the data x were observed, then it would be quite natural to specify the GPD function for µ as follows: ω_G(µ) = a for µ ∈ (−∞, ∞), where a > 0, which is indeed in keeping with how this function would be chosen using a criterion mentioned in Section 2.4. Using the principle outlined in this earlier section for deriving the fiducial density f(θ_j | θ_{−j}, x), and in particular using equation (4), this would imply that the fiducial density of µ given σ², i.e. the density f(µ | σ², x), is defined by:

µ | σ², x ∼ N(x̄, σ²/n)   (14)

On the other hand, if µ was known, a sufficient statistic for σ² would be σ̂² = (1/n) Σ_{i=1}^{n} (x_i − µ)², which therefore, in applying again the theory of Section 2.4, will be assumed to be the statistic Q(x) in this case. Based on this assumption and given a value for µ, equation (3) can be expressed as:

σ̂² = ϕ(Γ, σ²) = (σ²/n)Γ   (15)

where the primary r.v. Γ has a χ² distribution with n degrees of freedom. If there was no or very little pre-data knowledge about σ², it would be quite natural to specify the GPD function for σ² as follows:

ω_G(σ²) = b for σ² ∈ (0, ∞)   (16)

where b > 0. Again using the principle detailed in Section 2.4 for deriving the fiducial density f(θ_j | θ_{−j}, x), this would imply that the fiducial density f(σ² | µ, x) is defined by:

σ² | µ, x ∼ Inv-Gamma(α = n/2, β = nσ̂²/2)   (17)

i.e. it is an inverse gamma density function with shape parameter α equal to n/2 and scale parameter β equal to nσ̂²/2. It can be shown that the full conditional fiducial densities f(µ | σ², x) and f(σ² | µ, x) as they have just been specified are compatible, and that the joint density function of µ and σ² that they directly define is unique. This density function is therefore the joint fiducial density of µ and σ². In particular, the marginal density of µ over this joint fiducial density is given by:

µ | x ∼ Non-standardised t_{n−1}(x̄, s/√n)   (18)

where s is the sample standard deviation, i.e. it is a non-standardised Student t density function with n − 1 degrees of freedom, location parameter equal to x̄ and scaling parameter equal to s/√n (which are settings that of course make it a very familiar member of this particular family of density functions), while the marginal density of σ² over the joint fiducial density of µ and σ² in question is given by:

σ² | x ∼ Inv-Gamma((n − 1)/2, (n − 1)s²/2)   (19)

All the main results that have just been outlined were previously given with more explanation in Bowater (2019a), and indeed, a similar derivation of these results can be found in Bowater (2018a). By contrast, in what follows, the results that will be presented are generally original, i.e. results not discussed in earlier papers, although various references will be made to examples that have been detailed previously.

In the scenario currently being considered, let us now turn our attention to the case where we have important pre-data knowledge about either of the parameters µ or σ² that can be adequately represented by a probability density function over the parameter concerned conditional on the other parameter being known. To give an example, let us assume that our pre-data opinion about σ² conditional on µ being known can be adequately represented by the density function of σ² conditional on µ that is defined by:

σ² | µ ∼ Inv-Gamma(α₀, β₀)   (20)

where α₀ > 0 and β₀ > 0 are given constants. Treating this choice as a prior density under the Bayesian paradigm leads to a posterior density of σ² conditional on µ that is defined by:

σ² | µ, x ∼ Inv-Gamma(α₀ + (n/2), β₀ + (nσ̂²/2))   (21)

If now we assume that there was no or very little pre-data knowledge about µ, then it would be quite natural to let the full conditional fiducial density f(µ | σ², x) defined by equation (14), and the full conditional posterior density p(σ² | µ, x) defined in the equation just given, form the basis for using the framework described in Section 2.2 to determine the joint post-data density of µ and σ², i.e. the density p(µ, σ² | x). In fact, by using the simple analytical method outlined in the opening part of Section 2.2, it can be easily established that these full conditional densities are compatible, and it is clear that the joint density function for µ and σ² that they define must be unique. This joint density function is therefore the post-data density p(µ, σ² | x). Furthermore, the marginal density of µ over this joint post-data density is given by:

µ | x ∼ Non-standardised t_{2α₀+n−1}( x̄, ((2β₀ + (n − 1)s²) / ((2α₀ + n − 1)n))^{1/2} )   (22)

while the marginal density of σ² over the joint density in question is given by:

σ² | x ∼ Inv-Gamma(α₀ + ((n − 1)/2), β₀ + ((n − 1)/2)s²)   (23)

To illustrate this example, Figure 1 shows some results from using the calculations just described to perform an analysis of a data set x that is summarised by the values n = 9, x̄ = 2.… and s = 9. In particular, this figure shows a plot of the specific form of the conditional prior density p(σ² | µ) as defined by equation (20) that was used in this analysis, which is represented by the short-dashed curve in Figure 1(b), a plot of the marginal post-data density p(µ | x) as defined by equation (22), which is represented by the long-dashed (rather than the dot-dashed) curve in Figure 1(a), and a plot of the marginal post-data density p(σ² | x) as given by equation (23), which is represented by the long-dashed curve in Figure 1(b). To complete the specification of the prior density p(σ² | µ), the constants α₀ and β₀ in equation (20) were set equal to 4 and 64 respectively. These settings imply that this prior density would be equal to the marginal fiducial density of σ² defined by equation (19) if this latter density was based on having observed a variance of 16 in a preliminary sample of 9 observations drawn from a population having the same unknown variance σ² that is currently being considered. Notice that, from a practical viewpoint, this interpretation would be genuinely useful if the mean of this population was not only assumed to be unknown, but was assumed not to be the same as the mean µ of present interest.

On the basis of only the main data set being analysed, i.e.
the data set x, and for comparison with the plots being considered, the solid curves in Figures 1(a) and 1(b) represent, respectively, the marginal fiducial density f(µ | x) as defined by equation (18) and the marginal fiducial density f(σ² | x) as given by equation (19).

Let us now change the state of knowledge about both the parameters µ and σ² before the data were observed. In particular, let us begin by imagining that we have important pre-data knowledge about the mean µ that can be adequately represented by a probability density function over µ conditional on σ² being known, i.e. the density p(µ | σ²). To give an example, let this density function be defined by:

µ | σ² ∼ Non-standardised t_ν(µ₀, σ₀)   (24)

where ν > 0, σ₀ > 0 and µ₀ are given constants. Treating this choice of the density p(µ | σ²) as a prior density under the Bayesian paradigm leads to a posterior density of µ conditional on σ² that is defined by:

p(µ | σ², x) ∝ (1 + (1/(σ₀²ν))(µ − µ₀)²)^{−(ν+1)/2} exp(−n(x̄ − µ)²/(2σ²))

If now we assume that there was no or very little pre-data knowledge about σ², then it would be quite natural to use the full conditional fiducial density f(σ² | µ, x) given by equation (17), and the full conditional posterior density p(µ | σ², x) defined by the equation just presented, as the basis for determining the joint post-data density of µ and σ², i.e. the density p(µ, σ² | x). Similar to the previous example, it can easily be shown by using once again the simple analytical method outlined in the opening part of Section 2.2 that these full conditional densities are compatible, and it is again clear that the joint density function for µ and σ² that they define must be unique. This joint density function, which is therefore the post-data density p(µ, σ² | x), can in fact be expressed as follows:

p(µ, σ² | x) ∝ (1/σ²)^{(n/2)+1} (1 + (1/(σ₀²ν))(µ − µ₀)²)^{−(ν+1)/2} exp(−(1/(2σ²))nσ̂²)   (25)

where σ̂² = (1/n) Σ_{i=1}^{n} (x_i − µ)² as before.

Figure 1:
Conditional prior and marginal post-data densities of the mean µ and standard deviation σ of a normal distribution

To illustrate the use of the method being discussed, let us apply this method to the analysis of the same data set x as we were concerned with in the previous example. In particular, Figure 1 shows, along with the plots that were mentioned earlier, a plot of the specific form of the conditional prior density p(µ | σ²) as defined by equation (24) that was used in the present analysis, which is represented by the short-dashed curve in Figure 1(a), and plots of the marginal densities of µ and σ² over the joint post-data density p(µ, σ² | x) given in equation (25), which are represented by the dot-dashed curves in Figure 1(a) and Figure 1(b) respectively. These marginal densities of µ and σ² were obtained by numerical integration over the joint density p(µ, σ² | x). To complete the specification of the prior density p(µ | σ²), the constants in equation (24) were given the settings ν = 17, µ₀ = −0.… and σ₀ = 4/3. These settings imply that this prior density would be equal to the marginal fiducial density of µ given by equation (18) if this latter density was based on having observed a mean of −0.… in a preliminary sample of 18 observations drawn from a population having the same unknown mean µ that is currently being considered. Similar to a point made earlier, such an interpretation would be genuinely useful in a practical sense if the variance of this population was not only assumed to be unknown, but was assumed not to be the same as the variance σ² of present interest.

Finally, in the case where we have important pre-data knowledge about both µ and σ² that can be adequately represented by full conditional probability densities over each of these parameters, i.e. the densities p(µ | σ²) and p(σ² | µ), it would seem reasonable, assuming that these conditional densities are compatible, to treat these densities as being conditional prior densities, and to use exclusively the standard Bayesian approach to make inferences about µ and σ². Since Bayesian inference is a well-known form of inference, no further discussion of this particular case will be given here.

In the previous section, Student's problem was tackled by incorporating organic fiducial inference and Bayesian inference into the framework outlined in Section 2.2; now let us consider a case in which it would seem appropriate to address the same problem by also incorporating bispatial inference into this framework.

In particular, let us assume that, conditional on the variance σ² being known, the scenario of interest of Section 2.5 would apply if the general parameter θ_j was taken as being the mean µ, with the interval [θ_{j0}, θ_{j1}] in this scenario being denoted now as the interval [µ₀ − ε, µ₀ + ε], where ε ≥ 0 and µ₀ are given constants. We will therefore construct the post-data density of µ conditional on σ² using the type of bispatial inference described in Section 2.5.

To do this, the test statistic T(x) as defined in Section 2.5 will be quite reasonably assumed to be the sample mean x̄.
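Given this choice of T(x), condition (5) and the associated one-sided P value can be evaluated directly. The following sketch does this for a normal mean, together with a power-law PDO curve of the general kind used below; the values of x̄, σ and the PDO exponent are illustrative placeholders (only the interval endpoints ±0.2 come from the example), so this is a demonstration of the mechanics rather than a reproduction of the paper's analysis.

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def one_sided_p_value(xbar, sigma, n, lo, hi):
    """Evaluate condition (5) for T(x) = xbar and an interval [lo, hi] for mu, and
    return the relevant parameter space hypothesis with its one-sided P value."""
    z = math.sqrt(n) / sigma
    F_lo = norm_cdf((xbar - lo) * z)           # F(t | mu = lo) = P(T(X) <= t | lo)
    Fp_hi = 1.0 - norm_cdf((xbar - hi) * z)    # F'(t | mu = hi) = P(T(X) >= t | hi)
    if F_lo <= Fp_hi:
        return "H_P: mu >= lo", F_lo           # condition (5) holds: equations (6)-(7)
    return "H_P: mu <= hi", Fp_hi              # otherwise: equations (8)-(9)

def pdo_curve(J, a):
    """A power-law PDO curve mapping the one-sided P value J to the probability kappa."""
    return J ** a

# illustrative numbers: xbar and sigma are placeholders, the interval is [-0.2, 0.2]
hypothesis, J = one_sided_p_value(xbar=2.5, sigma=9.0, n=9, lo=-0.2, hi=0.2)
kappa = pdo_curve(J, a=0.5)
print(hypothesis, round(J, 4), round(kappa, 4))
```

A sample mean well above the interval makes condition (5) fail, so the hypothesis pair of equations (8) and (9) is the relevant one, matching the case treated next.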
Therefore, in the case where the mean x̄ is greater than zero, which will be assumed to be the case of particular interest, the hypotheses H_P and H_S will be as defined in equations (8) and (9), which implies that, for the present example, they can be more specifically expressed as:

H_P : µ ≤ µ₀ + ε

H_S : ρ(X̄* > x̄) ≤ 1 − Φ((x̄ − µ₀ − ε)√n/σ)  (= J)   (26)

where X̄* is the mean of an as-yet-unobserved sample of n additional values drawn from the density function g(x | µ, σ²), i.e. the normal density function being studied, and Φ(y) is the cumulative distribution function of a standard normal distribution evaluated at the value y. Also, it will be assumed, quite reasonably, that the fiducial density f_S(θ_j | x), which is required by equations (10) and (12), i.e. the density f_S(µ | σ², x) in the present case, is the fiducial density of µ given σ² that was defined in equation (14).

Figure 2: Histograms representing marginal post-data densities of the mean µ and standard deviation σ of a normal distribution

To complete the specification of the post-data density of µ given σ², i.e., in keeping with earlier notation, the density b(µ | σ², x), let us now make some more specific assumptions. In particular, let us assume that µ₀ = 0 and ε = 0.2, and that the density function h(θ_j) that appears in equation (11), i.e. the density h(µ) in the present case, is defined by:

µ ∼ Beta(4, 4, −0.2, 0.2)   (27)

i.e. it is a beta density function for µ on the interval [−0.2, 0.2] with both its shape parameters equal to 4. Furthermore, we will assume that the data set is summarised as it was in the previous section, i.e. by n = 9, x̄ = 2.… and s = 9. Finally, the probabilities κ that would be assigned to the hypothesis H_S in equation (26) for different values of σ² will be assumed to be given by the PDO curve for µ conditional on σ² that has the formula: κ = J^{0.…}, where, as indicated in equation (26), J is the one-sided P value in the definition of the hypothesis H_S concerned. These assumptions fully specify the post-data density b(µ | σ², x) according to the methodology outlined in Section 2.5.

In fact, in Bowater (2019b), this full conditional density of µ, precisely as this density has just been defined, and the full conditional fiducial density f(σ² | µ, x) given by equation (17), with the data set x assumed to be as currently specified, were used as the basis for determining the joint post-data density of µ and σ² within the same type of framework as described in Section 2.2. As mentioned earlier, the use of the full conditional fiducial density of σ² being referred to would be quite natural if it was assumed there was no or very little pre-data knowledge about the variance σ². However, this assumption will not be made here. Instead, let us assume that we have important pre-data knowledge about σ² that in fact is adequately represented by the density function for σ² conditional on µ that is defined by equation (20), with the same choices for the constants α₀ and β₀ as were used earlier to express pre-data knowledge about σ² conditional on µ, i.e. with α₀ = 4 and β₀ = 64. Treating this density function as a prior density function under the Bayesian paradigm therefore leads to the posterior density of σ² given µ, i.e.
the density p(σ | µ, x), being defined as it was in equation (21).

To illustrate this example, Figure 2 shows some results from running a Gibbs sampler on the basis of the full conditional post-data densities of µ and σ that have just been defined, i.e. the post-data density b(µ | σ, x) and the posterior density p(σ | µ, x), with a uniform random scanning order of the parameters µ and σ, as such a scanning order was defined in Section 2.2. In particular, the histograms in Figures 2(a) and 2(b) represent the distributions of the values of the mean µ and the standard deviation σ, respectively, over a single run of six million samples of these parameters generated by the Gibbs sampler after a preceding run of two thousand samples, which were classified as belonging to its burn-in phase, had been discarded. The sampling of the density b(µ | σ, x) was based on the Metropolis algorithm (Metropolis et al. 1953), while each value drawn from the density p(σ | µ, x) was independent of the preceding iterations.

In addition to this analysis, the Gibbs sampler was also run various times from different starting points, and a careful study of the output of these runs using appropriate diagnostics provided no evidence to suggest that the sampler does not have a limiting distribution, and showed, at the same time, that it would appear to generally converge quickly to this distribution. Furthermore, the Gibbs sampling algorithm was run separately with each of the two possible fixed scanning orders of the parameters, i.e. the one in which µ is updated first and then σ is updated, and the one with the reverse order, in accordance with how a single transition of such an algorithm was defined in Section 2.2, i.e. single transitions of the algorithm incorporated updates of both parameters.
In doing this, no statistically significant difference was found between the samples of parameter values aggregated over the runs of the sampler under each of these two scanning orders after excluding the burn-in phase of the sampler, e.g. between the two sample correlations of µ and σ, even when the runs concerned were long. Taking into account what was discussed in Section 2.2, this implies that the full conditional densities of the limiting distribution of the original Gibbs sampler, i.e. the one with a uniform random scanning order, should be, at the very least, close approximations to the full conditional densities on which the sampler is based, i.e. the post-data density b(µ | σ, x) and the posterior density p(σ | µ, x) defined earlier.

Each of the curves overlaid on the histograms in Figures 2(a) and 2(b), which are distinguished by being plotted with short-dashed, long-dashed and solid lines, is identical to the curve plotted using the same line type in Figures 1(a) and 1(b) respectively. By comparing the histograms in Figures 2(a) and 2(b) with the curves in question, it can be seen that the forms of the marginal post-data densities of µ and σ that are represented by these histograms are consistent with what we would have intuitively expected given the pre-data beliefs about µ and σ that have been taken into account. It may also be to some extent informative to compare Figures 2(a) and 2(b) with Figures 4(a) and 4(b) of Bowater (2019b), since these latter figures relate to the example from this earlier paper that was mentioned midway through the present section.
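To make the structure of this sampling scheme concrete, the following is a minimal sketch of a Metropolis-within-Gibbs sampler with a uniform random scanning order of the kind just described. The full conditional densities used here are simple stand-ins, not the densities b(µ | σ, x) and p(σ | µ, x) themselves; the summary values, the inverse-gamma constants and the proposal scale are all illustrative assumptions.

```python
import math
import random

random.seed(1)

# Illustrative stand-in conditionals (NOT the paper's exact densities):
# an unnormalised full conditional for mu given sigma2, sampled with a
# Metropolis step, and an inverse-gamma full conditional for sigma2
# given mu, sampled directly. All numerical values are assumptions.
n, xbar, s2 = 9, 2.0, 9.0

def log_cond_mu(mu, sigma2):
    # unnormalised log full conditional of mu
    return -0.5 * n * (xbar - mu) ** 2 / sigma2

def update_mu(mu, sigma2):
    # one Metropolis step targeting the full conditional of mu
    prop = mu + random.gauss(0.0, 1.0)
    if math.log(random.random()) < log_cond_mu(prop, sigma2) - log_cond_mu(mu, sigma2):
        return prop
    return mu

def update_sigma2(mu):
    # direct draw: an Inv-Gamma(a, b) variate is the reciprocal of a
    # Gamma(a, scale = 1/b) variate
    a = 4.0 + 0.5 * n
    b = 64.0 + 0.5 * (n - 1) * s2 + 0.5 * n * (xbar - mu) ** 2
    return 1.0 / random.gammavariate(a, 1.0 / b)

def gibbs(num_samples, burn_in=2000):
    mu, sigma2, out = 0.0, 1.0, []
    for t in range(num_samples + burn_in):
        # uniform random scanning order: each single transition updates
        # both parameters, in a randomly chosen order
        if random.random() < 0.5:
            mu = update_mu(mu, sigma2)
            sigma2 = update_sigma2(mu)
        else:
            sigma2 = update_sigma2(mu)
            mu = update_mu(mu, sigma2)
        if t >= burn_in:
            out.append((mu, sigma2))
    return out

samples = gibbs(20000)
mu_mean = sum(m for m, _ in samples) / len(samples)
```

The two fixed scanning orders discussed in the text correspond to always taking the first or always the second branch of the `if` statement above.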
We will now consider the problem of making inferences about the parameters π = (π₁, π₂, π₃)′ of a trinomial distribution, where πᵢ is the proportion of times that the ith of the three possible outcomes is generated in the long run, based on observing a sample of counts x = (x₁, x₂, x₃)′ from the distribution concerned, where xᵢ is the number of times that the ith outcome is observed. Since of course π₁ + π₂ + π₃ = 1, this model effectively has only two parameters, which we will assume to be the proportions π₁ and π₂. To clarify, the probability of observing the sample of counts x = (x₁, x₂, x₃)′ is specified by the trinomial mass function in this case, i.e. the function:

g(x | π₁, π₂) = (n! / (x₁! x₂! x₃!)) π₁^{x₁} π₂^{x₂} π₃^{x₃}   if x₁, x₂, x₃ ∈ Z≥0 and x₁ + x₂ + x₃ = n, and 0 otherwise

where n is fixed.

In particular, let us begin by applying organic fiducial inference as outlined in Section 2.4 to make inferences about π₂ conditional on π₁ being known. In this regard, observe that if π₁ was known, sufficient statistics for π₂ would be x₂ and x₂ + x₃. However, x₂ + x₃ is an ancillary complement of x₂, and therefore, according to the more general definition of the fiducial statistic Q(x) given in Bowater (2019a), the count x₂ can justifiably be assumed to be the statistic Q(x). Based on this assumption and given a value for π₁, equation (3) can naturally be redefined as:

x₂ = ϕ(Γ, π₂) = min{ y : Γ < Σ_{j=0}^{y} g(j | π₂) }    (28)

where the primary r.v. Γ has a uniform distribution over the interval (0, 1) and g(j | π₂) is given by:

g(j | π₂) = ((x₂ + x₃)! / ((x₂ + x₃ − j)! j!)) (π₂ / (1 − π₁))^{j} ((1 − π₁ − π₂) / (1 − π₁))^{x₂ + x₃ − j}

in which the statistic x₂ + x₃ is treated as having already been generated.

Given that it will be assumed that there was no or very little pre-data knowledge about the proportion π₂, the GPD function for π₂ will quite reasonably be specified as follows: ω_G(π₂) = a if 0 ≤ π₂ ≤ 1 − π₁ and 0 otherwise, where a > 0. However, since, for whatever choice is made for this GPD function and whatever turns out to be the sample x, equation (28) will never satisfy Condition 1 of Section 2.4, the principle outlined in this earlier section for deriving the fiducial density f(θ_j | θ_{−j}, x) can not be employed in the case of interest to determine the fiducial density of π₂ given π₁, i.e. the density f(π₂ | π₁, x). This density can instead, though, be determined by applying Principle 2 of Bowater (2019a), which, as mentioned in Section 2.4, is a principle that relies on the concept of a local pre-data (LPD) function. In particular, to make use of this principle in the present case, we need to specify a LPD function for π₂. Further details about how the principle in question is applied are given in Bowater (2019a).

As also discussed in this earlier paper, the type of method being considered could be used to obtain a complete set of full conditional fiducial densities for k of the population proportions of a multinomial distribution with k + 1 categories on the basis of a given sample from this distribution, which could then be used to determine a joint fiducial density of these k proportions (or equivalently of all k + 1 population proportions of the distribution) using the type of framework outlined in Section 2.2 of the current paper. In relation to this issue, a detailed example was presented in Bowater (2019a) of how a joint fiducial density of the five (or equivalently four of the five) population proportions of a multinomial distribution with five categories could be obtained using such an approach.

However, in the present case, it will be assumed that, unlike the post-data density of π₂ given π₁, the post-data density of π₁ given π₂ does not belong to the class of full conditional fiducial densities under discussion.
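As a concrete illustration of the structural relationship in equation (28), the following sketch draws values of x₂ by inverting the cumulative mass of g(j | π₂) at a uniformly distributed value of the primary r.v. Γ. The values of π₁, π₂ and x₂ + x₃ used here are purely illustrative assumptions.

```python
import math
import random

random.seed(7)

def g(j, pi2, pi1, m):
    # conditional mass of x2 given x2 + x3 = m: a binomial with
    # success probability pi2 / (1 - pi1), as in the expression above
    p = pi2 / (1.0 - pi1)
    return math.comb(m, j) * p ** j * (1.0 - p) ** (m - j)

def phi(gamma, pi2, pi1, m):
    # equation (28): the smallest y whose cumulative mass exceeds gamma
    total = 0.0
    for y in range(m + 1):
        total += g(y, pi2, pi1, m)
        if gamma < total:
            return y
    return m

# repeatedly drawing Gamma ~ Uniform(0, 1) reproduces the sampling
# distribution of x2 given x2 + x3 = m (illustrative parameter values)
pi1, pi2, m = 0.3, 0.2, 10
draws = [phi(random.random(), pi2, pi1, m) for _ in range(20000)]
mean_x2 = sum(draws) / len(draws)  # close to m * pi2 / (1 - pi1)
```

This inverse-CDF construction is only the forward (data-generating) direction of the fiducial argument; deriving the fiducial density of π₂ itself requires Principle 2 of Bowater (2019a), as explained in the text.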
This is because, in contrast to the kind of scenario in which the type of approach just mentioned is most applicable, it will be assumed that we have important pre-data knowledge about the proportion π₁, and that this pre-data knowledge can, in particular, be adequately represented by a probability density function over π₁ conditional on π₂ being known, i.e. the density p(π₁ | π₂). To give an example, let this density function be defined by:

p(π₁ | π₂) = C(π₂) π₁^{α−1} (1 − π₁)^{β−1}   if 0 ≤ π₁ ≤ 1 − π₂, and 0 otherwise

where α > 0, β > 0 and C(π₂) is a normalising constant. Treating this choice of the density p(π₁ | π₂) as a prior density and combining it with the likelihood function in this case, under the Bayesian paradigm, leads to a posterior density of π₁ given π₂ that is defined by:

p(π₁ | π₂, x) = C(π₂) π₁^{α+x₁−1} (1 − π₁ − π₂)^{n−x₁−x₂} (1 − π₁)^{β−1}   if 0 ≤ π₁ ≤ 1 − π₂, and 0 otherwise

where C(π₂) is a normalising constant.

To illustrate this example, Figure 3 shows some results from running a Gibbs sampler on the basis of the full conditional post-data densities of π₁ and π₂ that have just been referred to, i.e. the fiducial density f(π₂ | π₁, x) and the posterior density (derived using Bayesian inference) p(π₁ | π₂, x), with a uniform random scanning order of the parameters π₁ and π₂.

Figure 3: Unconditional prior density of one parameter, namely π₁, and marginal post-data densities of both parameters π₁ and π₂ of a trinomial distribution

In particular, the histograms in Figures 3(a) and 3(b) represent the distributions of the values of π₁ and π₂, respectively, over a single run of six million samples of these parameters generated by the Gibbs sampler after a preceding run of one thousand samples was discarded due to these samples being classified as belonging to its burn-in phase. The sampling of the density p(π₁ | π₂, x) was based on the Metropolis algorithm, while the sampling of the density f(π₂ | π₁, x) was independent of the preceding iterations.

Moreover, the observed counts on which the inferential process being described was based were set as follows: x₁ = 4, x₂ = 2 and x₃ = 6. Also, it was assumed that the LPD function for π₂ was given by:

ω_L(π₂) = b   if 0 ≤ π₂ ≤ 1 − π₁, and 0 otherwise

where b > 0, which is in keeping with the choices that were made for functions of this kind in the aforementioned example in Bowater (2019a) of the use of organic fiducial inference in this type of situation. Finally, the specification of the prior density p(π₁ | π₂) was completed by making the assignments α = 1.5 and β = 11.5. These choices of α and β imply that the prior density p(π₁ | π₂) is equal to the density function of π₁ that is defined by:

p(π₁) ∝ π₁^{0.5} (1 − π₁)^{10.5}   if 0 ≤ π₁ ≤ 1    (30)

truncated to values of π₁ that satisfy the condition π₁ ≤ 1 − π₂, which clearly must always hold, but is of course a condition that can only be applied if the proportion π₂ is known. Furthermore, this latter unconditioned density p(π₁) is equivalent to the (unconditional) posterior density of π₁ that would be formed after observing the counts x₁ = 1 and x₂ + x₃ = 11 (for which, we can see, membership of categories 2 and 3 is not distinguished) if the prior density of π₁ was the Jeffreys prior that corresponds to conducting the binomial experiment that produced these counts (see Jeffreys 1961). However, since, as mentioned in Section 2.3, posterior densities formed on the basis of prior densities that are dependent on the sampling model, such as the Jeffreys prior, are controversial, it is arguably of more interest to note that this posterior density of π₁ is a close approximation to forms of the (unconditional) fiducial density of π₁ that would naturally be constructed on the basis of the two counts in question, i.e. x₁ = 1 and x₂ + x₃ = 11, by applying the methodology in Bowater (2019a) if nothing or very little was known about the proportion π₁ before these counts were observed.
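The equivalence just described can be checked directly: under a Jeffreys Beta(1/2, 1/2) prior for a binomial proportion, observing 1 success and 11 failures yields a Beta(1.5, 11.5) posterior, which matches the assignments α = 1.5 and β = 11.5. The sketch below is a minimal numerical check; the helper names are not from the paper.

```python
import math

# Jeffreys prior for a binomial proportion is Beta(1/2, 1/2); after s
# successes and f failures the posterior is Beta(s + 1/2, f + 1/2).
def jeffreys_posterior(successes, failures):
    return successes + 0.5, failures + 0.5

a, b = jeffreys_posterior(1, 11)  # the hypothetical earlier counts

def beta_pdf(p, a, b):
    # Beta(a, b) density computed via log-gamma for numerical stability
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p))

# crude Riemann-sum check that the density integrates to one on (0, 1)
grid = [i / 10000 for i in range(1, 10000)]
area = sum(beta_pdf(p, a, b) for p in grid) / 10000
```

The exponents 0.5 and 10.5 in equation (30) are exactly a − 1 and b − 1 for this Beta(1.5, 11.5) density.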
This type of approximation was discussed both in this previous paper and in Bowater (2019b).

In addition to the analysis just described, the Gibbs sampler of present interest was also run various times from different starting points, and there was no suggestion from using appropriate diagnostics that the sampler does not have a limiting distribution. Furthermore, after excluding the burn-in phase of the sampler, no statistically significant difference was found between the samples of parameter values aggregated over the runs of the sampler under each of the two possible fixed scanning orders of the parameters π₁ and π₂, with a single transition of the sampler defined in the same way as in the example outlined in the previous section, even when the runs concerned were long. Therefore, taking into account what was discussed in Section 2.2, the full conditional densities of the limiting distribution of the original random-scan Gibbs sampler should be, at the very least, close approximations to the full conditional densities on which the sampler is based, i.e. the posterior density p(π₁ | π₂, x) and the fiducial density f(π₂ | π₁, x) defined earlier.

The solid curves overlaid on the histograms in Figures 3(a) and 3(b) are plots of the marginal densities of the parameters π₁ and π₂, respectively, over the joint posterior density of π₁ and π₂ that would be formed after having only observed the main data of interest, i.e. the counts x₁ = 4, x₂ = 2 and x₃ = 6, if the joint prior density of these parameters was the Jeffreys prior for this case. It can be shown that this joint posterior density, which is in fact defined by the expression:

p(π₁, π₂ | x) = C π₁^{x₁−0.5} π₂^{x₂−0.5} (1 − π₁ − π₂)^{x₃−0.5}   if π₁, π₂ ∈ [0, 1] and π₁ + π₂ ≤ 1, and 0 otherwise

where C is a normalising constant, is a close approximation to forms of the joint fiducial density of π₁ and π₂ that would naturally be constructed on the basis of these observed counts x₁, x₂ and x₃ by applying the methodology in Bowater (2019a) if there was no or very little pre-data knowledge about π₁ and π₂. The dashed curve overlaid on the histogram in Figure 3(a) is a plot of the density function of π₁ given in equation (30), i.e. the unconditioned prior density p(π₁).

By comparing the locations and degrees of dispersion of the histograms in Figures 3(a) and 3(b), it can be seen that it is beyond dispute that generally more precise conclusions can be drawn about the proportion π₁ than the proportion π₂ after the counts x₁, x₂ and x₃ in question have been observed, which, on the basis of comparing these histograms with the curves overlaid on them, can clearly be attributed to the incorporation, under the Bayesian paradigm, of substantial prior information about π₁ into the construction of the joint post-data density of π₁ and π₂.

Let us now turn our attention to the problem of making inferences about all the parameters β₀, β₁, β₂, β₃ and σ² of the normal linear regression model defined by:

Y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + ε   with ε ∼ N(0, σ²)    (31)

where Y is the response variable and x₁, x₂ and x₃ are three covariates, on the basis of a data set y⁺ = {(yᵢ, x₁ᵢ, x₂ᵢ, x₃ᵢ) : i = 1, 2, . . . , n}, where yᵢ is the value of Y generated by this model for the ith case in this data set given the values x₁ᵢ, x₂ᵢ and x₃ᵢ of the covariates x₁, x₂ and x₃ respectively.

Observe that sufficient statistics for each of the parameters β₀, β₁, β₂, β₃ and σ², conditional on all parameters except the parameter itself being known, are respectively:

Σᵢ yᵢ,  Σᵢ x₁ᵢyᵢ,  Σᵢ x₂ᵢyᵢ,  Σᵢ x₃ᵢyᵢ  and  Σᵢ (yᵢ − β₀ − β₁x₁ᵢ − β₂x₂ᵢ − β₃x₃ᵢ)²    (32)

where each sum runs over i = 1, 2, . . . , n. In Bowater (2018a), all except the fourth statistic here were used as fiducial statistics Q(y⁺) to derive, under the strong fiducial argument, a complete set of full conditional fiducial densities of the model parameters in the special case where the model in equation (31) is a quadratic regression model, i.e. where x₂ = (x₁)² and the coefficient β₃ is set to zero (hence the lack of a need for the fourth statistic). Also, it was shown in this earlier paper that, since these full conditional densities are compatible, they directly define a unique joint density for β₀, β₁, β₂ and σ², which is therefore a joint fiducial density for these parameters. Furthermore, it is fairly clear from this previous analysis how the particular method of inference that was employed can be extended to address the problem of making inferences about the parameters of the more general type of normal linear regression model that is defined by equation (31).

However, this specific type of method is not going to be directly applicable to the case that will be presently considered. This is because, although it will be assumed that nothing or very little was known about the parameters β₀, β₂ and σ² before the data were observed, by contrast it is going to be assumed that there was a substantial amount of pre-data knowledge about the parameters β₁ and β₃.
Let us begin, though, by clarifying how the full conditional post-data densities of β₀, β₂ and σ² will be constructed. With this aim in mind, notice that if the sufficient statistics for β₀ and β₂ presented in equation (32) are treated as the fiducial statistics Q(y⁺) in making inferences about these two parameters respectively, then, given that the sampling distributions of these statistics are normal, the functions ϕ(Γ, β₀) and ϕ(Γ, β₂), as generally defined by equation (3), can be expressed in a similar way to how the function ϕ(Γ, µ) was expressed in equation (13). Also if, under the condition that σ² is the only unknown parameter, the sufficient statistic for σ² presented in equation (32) is treated as the statistic Q(y⁺) in making inferences about this parameter, then, given that this statistic divided by σ² has a chi-squared sampling distribution with n degrees of freedom, the function ϕ(Γ, σ²) can be expressed in a similar way to how this type of function was expressed in equation (15), where it was also denoted as ϕ(Γ, σ²) but with of course a different meaning. Furthermore, given what has been assumed, it would be quite natural to specify the GPD function for σ² in the same way as the GPD function for a population variance (also denoted as σ²) was defined in equation (16), and to specify the GPD functions for β₀ and β₂ as follows: ω_G(βᵢ) = a for βᵢ ∈ (−∞, ∞), where a > 0. This leads to the full conditional fiducial densities for β₀, β₂ and σ² being defined as follows:

β₀ | β₋₀, σ², y⁺ ∼ N( Σᵢyᵢ/n − β₁Σᵢx₁ᵢ/n − β₂Σᵢx₂ᵢ/n − β₃Σᵢx₃ᵢ/n , σ²/n )    (33)

β₂ | β₋₂, σ², y⁺ ∼ N( (Σᵢx₂ᵢyᵢ − β₀Σᵢx₂ᵢ − β₁Σᵢx₁ᵢx₂ᵢ − β₃Σᵢx₂ᵢx₃ᵢ) / Σᵢx₂ᵢ² , σ² / Σᵢx₂ᵢ² )    (34)

σ² | β₀, ..., β₃, y⁺ ∼ Inv-Gamma( n/2 , Σᵢ(yᵢ − β₀ − β₁x₁ᵢ − β₂x₂ᵢ − β₃x₃ᵢ)² / 2 )    (35)

where β₋ⱼ denotes the set of all the regression coefficients except βⱼ.

Now let us provide more details with regard to what was known about the coefficient β₃ before the data were observed. In particular, let us assume that, conditional on all other parameters in the model being known, the scenario of interest of Section 2.5 would apply if the general parameter θⱼ was taken as being β₃, with the interval concerned in this scenario now being specified as simply the interval [−δ, δ], where δ ≥ 0. We will therefore construct the full conditional post-data density of β₃ using the type of bispatial inference outlined in Section 2.5, which implies that, from now on, this density will be denoted as b(β₃ | β₋₃, σ², y⁺).

In particular, to do this, the test statistic T(x) as defined in Section 2.5, which now needs to be denoted as T(y⁺), will be assumed to be the least squares estimator of β₃ under the condition that all other parameters are known, i.e. the estimator:

β̂₃ = (Σᵢx₃ᵢyᵢ − β₀Σᵢx₃ᵢ − β₁Σᵢx₁ᵢx₃ᵢ − β₂Σᵢx₂ᵢx₃ᵢ) / Σᵢx₃ᵢ²    (36)

which is a reasonable assumption to make since, under this condition, it is a sufficient statistic for β₃ that satisfies the second criterion given in Section 2.5 for being the statistic T(y⁺). Observe that this estimator has a sampling distribution that is defined by:

β̂₃ ∼ N( β₃ , σ² / Σᵢx₃ᵢ² )

Therefore, the hypotheses H_P and H_S defined in Section 2.5 that are applicable in the case where β̂₃ ≤ 0, i.e. the hypotheses in equations (6) and (7), can now be expressed as:

H_P : β₃ ≥ −δ
H_S : ρ(B̂₃* < β̂₃) ≤ Φ( (β̂₃ + δ)(1/σ)√(Σᵢx₃ᵢ²) )  (= J)    (37)

where Φ( ) again denotes the standard normal distribution function, while B̂₃* is the estimator β̂₃ calculated exclusively on the basis of an as-yet-unobserved sample of n additional data points Y*⁺ = {(Yᵢ*, x₁ᵢ, x₂ᵢ, x₃ᵢ) : i = 1, 2, . . . , n} generated according to the regression model in equation (31), where the values of the covariates x₁, x₂ and x₃ are assumed to be the same as in the original sample. On the other hand, the hypotheses H_P and H_S that apply if β̂₃ > 0, i.e. the hypotheses in equations (8) and (9), can now be expressed as:

H_P : β₃ ≤ δ
H_S : ρ(B̂₃* > β̂₃) ≤ 1 − Φ( (β̂₃ − δ)(1/σ)√(Σᵢx₃ᵢ²) )  (= J)    (38)

Also, let us assume, quite reasonably, that the fiducial density f_S(θⱼ | x) that is required by equations (10) and (12), i.e. the density f_S(β₃ | β₋₃, σ², y⁺) in the present case, is derived on the basis of the strong fiducial argument with the fiducial statistic Q(y⁺) specified as being a sufficient statistic for β₃, e.g. one of the sufficient statistics for β₃ given in equations (32) and (36). Under these assumptions, the fiducial density in question is determined in a similar way to how the fiducial densities in equations (33), (34) and (35) were determined, and in particular is given by the expression:

β₃ | β₋₃, σ², y⁺ ∼ N( β̂₃ , σ² / Σᵢx₃ᵢ² )    (39)

On the other hand, it will be assumed that we knew enough about the coefficient β₁ before the data were observed for it to be possible to adequately represent our pre-data knowledge about this coefficient by placing a probability density function over this coefficient conditional on all other parameters being known, i.e. the density p(β₁ | β₋₁, σ²). To give an example, let this density function be defined by:

β₁ | β₋₁, σ² ∼ N( µ₀ , σ₀² )    (40)

where µ₀ and σ₀² > 0 are given constants. Treating this choice of the density p(β₁ | β₋₁, σ²) as a prior density and combining it with the likelihood function in this case, under the Bayesian paradigm, leads to a full conditional posterior density of β₁, i.e.
the density p(β₁ | β₋₁, σ², y⁺), that can be expressed as:

β₁ | β₋₁, σ², y⁺ ∼ N( σ₁² [ β̂₁ Σᵢx₁ᵢ²/σ² + µ₀/σ₀² ] , σ₁² )

where

σ₁² = ( (Σᵢx₁ᵢ²/σ²) + (1/σ₀²) )⁻¹  and  β̂₁ = (Σᵢx₁ᵢyᵢ − β₀Σᵢx₁ᵢ − β₂Σᵢx₁ᵢx₂ᵢ − β₃Σᵢx₁ᵢx₃ᵢ) / Σᵢx₁ᵢ²

To illustrate this example, Figure 4 shows some results from running a Gibbs sampler with a uniform random scanning order of the parameters β₀, β₁, β₂, β₃ and σ² on the basis of the full conditional post-data densities of these parameters that have just been detailed, i.e. the fiducial densities f(β₀ | β₋₀, σ², y⁺), f(β₂ | β₋₂, σ², y⁺) and f(σ² | β₀, ..., β₃, y⁺) defined by equations (33), (34) and (35), the post-data density (derived using bispatial inference) b(β₃ | β₋₃, σ², y⁺) and the posterior density p(β₁ | β₋₁, σ², y⁺) defined by the equation just given. In particular, the histograms in Figures 4(a) to 4(d) represent the distributions of the values of the coefficients β₁, β₂, β₃ and the standard deviation σ, respectively, over a single run of ten million samples of all five model parameters generated by the Gibbs sampler after allowing for its burn-in phase by discarding a preceding run of five thousand samples. (For reasons of space, a histogram of the generated values of the intercept coefficient β₀ is not given.)

Figure 4: Conditional prior density of one parameter, namely β₁, and marginal post-data densities of four parameters β₁, β₂, β₃ and σ of a normal linear regression model

The sampling of the density b(β₃ | β₋₃, σ², y⁺) was based on the Metropolis algorithm, while the sampling of each of the other four full conditional post-data densities was independent of the preceding iterations.

Moreover, the values of the response variable Y in the observed data set y⁺ were a typical sample of n = 18 such values generated according to the regression model in equation (31) with β₀ = 0, β₁ = 5, β₂ = −, β₃ = 1 and σ = 1.5, and with the values of the covariates x₁, x₂ and x₃ in this data set chosen without replacement from the 27 combinations of values for these covariates that are possible if each covariate can only take the value −1, 0 or 1. In particular, the way these covariate values were selected resulted in: Σᵢx₁ᵢ = −, Σᵢx₂ᵢ = 2, Σᵢx₃ᵢ = 1, Σᵢx₁ᵢx₂ᵢ = 3, Σᵢx₁ᵢx₃ᵢ = 4 and Σᵢx₂ᵢx₃ᵢ = −3. In addition, the specification of the posterior density p(β₁ | β₋₁, σ², y⁺) was completed by setting the constants µ₀ and σ₀, i.e. the constants that control the choice of the prior density of β₁ in equation (40), to be 4.4 and 0.6 respectively. On the other hand, with regard to how the post-data density b(β₃ | β₋₃, σ², y⁺) was fully determined, the constant δ was assumed to be equal to 0.1, and the probabilities κ that would be assigned to the hypothesis H_S as defined by either equation (37) or equation (38) for different values of all the model parameters except β₃ were assumed to be given by the PDO curve with the formula κ = J^{0.}, where, as indicated in equations (37) and (38), J is the one-sided P value in whichever definition of the hypothesis H_S is applicable. Also, in determining the post-data density of β₃ in question, the density function h(θⱼ) that appears in equation (11), i.e. the density h(β₃) in the present case, was defined similarly to how a density function of this type was specified in Section 3.2, that is, by the expression β₃ ∼ Beta(4, 4, −0.2, 0.2).

In addition, the Gibbs sampling algorithm was run separately with each of the possible fixed scanning orders of the parameters β₀, β₁, β₂, β₃ and σ², in accordance with how a single transition of such an algorithm with a fixed scanning order was defined in Section 2.2. In doing this, no statistically significant difference was found between the samples of parameter values aggregated over the runs of the sampler, after excluding the burn-in phase of the sampler, under each of the scanning orders concerned, e.g. between the various correlation matrices of the parameters and between the various distributions of each individual parameter, even when the runs in question were long. Therefore, on the grounds of what was discussed in Section 2.2, it would be reasonable to conclude that the full conditional densities of the limiting distribution of the original random-scan Gibbs sampler should be, at the very least, close approximations to the full conditional densities on which the sampler is based, i.e.
the fiducial densities f(β₀ | β₋₀, σ², y⁺), f(β₂ | β₋₂, σ², y⁺) and f(σ² | β₀, ..., β₃, y⁺), the post-data density b(β₃ | β₋₃, σ², y⁺) and the posterior density p(β₁ | β₋₁, σ², y⁺).

The solid curves overlaid on the histograms in Figures 4(a) to 4(d) are plots of the marginal densities of the coefficients β₁, β₂, β₃ and the standard deviation σ, respectively, over the joint fiducial density of all the parameters in the model that is defined directly and uniquely by the set of compatible full conditional densities that consists of the fiducial densities f(β₀ | β₋₀, σ², y⁺), f(β₂ | β₋₂, σ², y⁺) and f(σ² | β₀, ..., β₃, y⁺) just referred to, which of course are given by equations (33), (34) and (35), the fiducial density f_S(β₃ | β₋₃, σ², y⁺) given by equation (39), and the fiducial density for β₁ conditional on β₀, β₂, β₃ and σ² that results from making assumptions that are analogous to those on which the aforementioned full conditional fiducial densities are based. On the other hand, the dashed curve overlaid on the histogram in Figure 4(a) is a plot of the conditional prior density of β₁ given in equation (40).

By comparing the histograms in Figures 4(a) to 4(d) with the curves overlaid on them, it can be seen that the forms of the marginal post-data densities of β₁, β₂, β₃ and σ that are represented by these histograms are consistent with what could have been intuitively expected given the pre-data beliefs about all of the model parameters that were taken into account as part of the method of inference that has been described in the present section.

3.5. Inference about a bivariate normal distribution

To give a final detailed example of the application of integrated organic inference, let us consider the problem of making inferences about all five parameters of a bivariate normal density function, i.e.
the means µx and µy and the variances σx² and σy², respectively, of the two random variables concerned, X and Y, and the correlation τ of X and Y, on the basis of a sample from this type of density function, i.e. the sample z = {(xᵢ, yᵢ) : i = 1, 2, . . . , n}, where xᵢ and yᵢ are the ith realisations of X and Y respectively.

In Bowater (2018a), as a way of addressing this problem, full conditional fiducial densities were derived either exactly or approximately for each of the parameters µx, µy, σx², σy² and τ by using appropriately chosen fiducial statistics under the strong fiducial argument, and it was then illustrated how, on the basis of these conditional densities, what can be regarded as a suitable joint fiducial density of these parameters can be obtained by using the Gibbs sampler within the type of framework outlined in Section 2.2 of the current paper. However, for the same kind of reason as was given in relation to the use of a similar method of inference in the previous section, this particular method is not going to be directly applicable to the case that will be presently considered. This is more specifically due to the fact that, although we will assume that nothing or very little was known about the means µx and µy before the data were observed, by contrast we are going to assume that there was a substantial amount of pre-data knowledge about the variances σx² and σy² and the correlation coefficient τ. To begin with, though, let us clarify how the full conditional post-data densities of µx and µy will be constructed.

In this regard, observe that sufficient statistics for the parameters µx and µy, conditional on all parameters except the parameter itself being known, are:

qx = x̄ − τ(σx/σy)ȳ  and  qy = ȳ − τ(σy/σx)x̄

respectively, where x̄ = (1/n)Σᵢ xᵢ and ȳ = (1/n)Σᵢ yᵢ.
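As a quick numerical check of the sampling behaviour of these statistics, the following sketch simulates bivariate normal samples and confirms that qx is centred at µx − τ(σx/σy)µy with variance σx²(1 − τ²)/n, which is the variance that appears in the structural equation for µx below. All parameter values here are illustrative assumptions.

```python
import math
import random

random.seed(3)

# Hypothetical parameter values for the check
mu_x, mu_y, sd_x, sd_y, tau, n = 1.0, -0.5, 2.0, 1.5, 0.6, 25

def draw_sample():
    # generate n correlated pairs via the standard Cholesky-style construction
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
        xs.append(mu_x + sd_x * z1)
        ys.append(mu_y + sd_y * (tau * z1 + math.sqrt(1 - tau**2) * z2))
    return xs, ys

def q_x(xs, ys):
    # the sufficient statistic q_x = xbar - tau * (sd_x / sd_y) * ybar
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    return xbar - tau * (sd_x / sd_y) * ybar

qs = [q_x(*draw_sample()) for _ in range(20000)]
mean_q = sum(qs) / len(qs)
var_q = sum((q - mean_q) ** 2 for q in qs) / len(qs)
# theory: E[q_x] = mu_x - tau*(sd_x/sd_y)*mu_y, Var(q_x) = sd_x^2*(1 - tau^2)/n
```

The cross term in Var(qx) cancels exactly because Cov(x̄, ȳ) = τσxσy/n, which is what produces the factor (1 − τ²).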
Therefore, these two statistics qx and qy will be assumed to be the fiducial statistics Q(z) that will be used in making inferences about µx and µy respectively. Under this assumption, if µx is the only unknown parameter in the model, then equation (3) will now have the form qx = ϕ(Γ, µx), and more specifically can be expressed as:

x̄ − τ(σx/σy)ȳ = µx − τ(σx/σy)µy + Γ( σx²(1 − τ²)/n )^{1/2}

where the primary r.v. Γ ∼ N(0, 1). Also, given that nothing or very little is assumed to have been known about µx, it would be quite natural to specify the GPD function for µx as follows: ω_G(µx) = a for µx ∈ (−∞, ∞), where a >
0. This implies that the full conditional fiducial density of µx is defined by:

µx | µy, σx², σy², τ, z ∼ N( x̄ + τ(σx/σy)(µy − ȳ) , σx²(1 − τ²)/n )    (41)

Furthermore, due to the symmetrical nature of the bivariate normal distribution, it should be clear that, using a GPD function for µy of the same type as just used for µx, the full conditional fiducial density of µy would be defined by:

µy | µx, σx², σy², τ, z ∼ N( ȳ + τ(σy/σx)(µx − x̄) , σy²(1 − τ²)/n )    (42)

With regard to what was known about the variances σx² and σy² before the data were observed, we will assume that it is possible to adequately represent such knowledge by placing a probability density function over each of these parameters conditional on all parameters except the parameter itself being known, i.e. the densities p(σx² | µx, µy, σy², τ) and p(σy² | µx, µy, σx², τ) respectively. To give an example, let these density functions for σx² and σy² be defined respectively by:

σx² ∼ Inv-Gamma(αx, βx)  and  σy² ∼ Inv-Gamma(αy, βy)    (43)

where αx, βx, αy and βy are given positive constants.

Notice that, for the case being considered, the likelihood functions that would be placed over each of the parameters σx² and σy², assuming that all parameters except the parameter itself are known, are given by the expressions:

L(σx² | µx, µy, σy², τ, z) = (1/σx²)^{n/2} exp( −(1/(2(1 − τ²))) (Σᵢ(x′ᵢ)²/σx²) + (τ/(1 − τ²)) (Σᵢx′ᵢy′ᵢ/(σxσy)) )    (44)

and

L(σy² | µx, µy, σx², τ, z) = (1/σy²)^{n/2} exp( −(1/(2(1 − τ²))) (Σᵢ(y′ᵢ)²/σy²) + (τ/(1 − τ²)) (Σᵢx′ᵢy′ᵢ/(σxσy)) )    (45)

respectively, where x′ᵢ = xᵢ − µx and y′ᵢ = yᵢ − µy.
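Because the two conditionals in equations (41) and (42) are simple normal densities, alternating draws from them is straightforward. The following sketch runs such a two-block Gibbs scheme with purely illustrative data summaries and parameter values; the full scheme described in this section of course also updates σx², σy² and τ.

```python
import math
import random

random.seed(11)

# Hypothetical data summaries and (assumed known) values of the other parameters
n, xbar, ybar = 20, 0.8, -0.3
sd_x, sd_y, tau = 1.0, 2.0, 0.5

def draw_mu_x(mu_y):
    # equation (41): mu_x | ... ~ N(xbar + tau*(sd_x/sd_y)*(mu_y - ybar),
    #                               sd_x^2 * (1 - tau^2) / n)
    m = xbar + tau * (sd_x / sd_y) * (mu_y - ybar)
    s = math.sqrt(sd_x**2 * (1 - tau**2) / n)
    return random.gauss(m, s)

def draw_mu_y(mu_x):
    # equation (42), by symmetry
    m = ybar + tau * (sd_y / sd_x) * (mu_x - xbar)
    s = math.sqrt(sd_y**2 * (1 - tau**2) / n)
    return random.gauss(m, s)

mu_x, mu_y, out = 0.0, 0.0, []
for t in range(30000):
    mu_x = draw_mu_x(mu_y)
    mu_y = draw_mu_y(mu_x)
    if t >= 1000:  # discard a short burn-in phase
        out.append((mu_x, mu_y))

avg_mu_x = sum(m for m, _ in out) / len(out)
```

These two conditionals are compatible: they are the conditionals of a bivariate normal density for (µx, µy) centred at (x̄, ȳ) with variances σx²/n and σy²/n and correlation τ, so the chain's limiting marginal mean of µx is x̄.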
Therefore, if the choices of the densities p(σx² | µx, µy, σy², τ) and p(σy² | µx, µy, σx², τ) in equation (43) are treated as prior densities, it can easily be seen how, by combining these prior densities with the likelihood functions in equations (44) and (45) under the Bayesian paradigm, the full conditional posterior densities of σx² and σy² can be numerically computed, i.e. the posterior densities p(σx² | µx, µy, σy², τ, z) and p(σy² | µx, µy, σx², τ, z).

On the other hand, with regard to the beliefs that were held about the correlation coefficient τ before the data were observed, let us assume that, conditional on all other parameters being known, the scenario of interest of Section 2.5 would apply if the general parameter θj was taken as being τ, with the interval [θj0, θj1] in this scenario now being specified as the interval [−ε, ε], where ε ≥
0. As a result, we will now discuss how the full conditional post-data density of τ will be constructed by using the type of bispatial inference outlined in Section 2.5, which implies that it will be denoted as the density b(τ | µx, µy, σx², σy², z).

In this respect, let us begin by pointing out that since, if all parameters except τ are known, there exists no sufficient set of univariate statistics for τ that contains only one statistic that is not an ancillary statistic, it would seem reasonable to assume that the test statistic T(z), as generally defined in Section 2.5, is the maximum likelihood estimator of τ given that all other parameters are known. It can be shown that this maximum likelihood estimator is the value τ̂ that solves the following cubic equation:

\[
- n \hat{\tau}^3 + \left( \frac{\sum_{i=1}^{n} x'_i y'_i}{\sigma_x \sigma_y} \right) \hat{\tau}^2 + \left( n - \frac{\sum_{i=1}^{n} (x'_i)^2}{\sigma_x^2} - \frac{\sum_{i=1}^{n} (y'_i)^2}{\sigma_y^2} \right) \hat{\tau} + \frac{\sum_{i=1}^{n} x'_i y'_i}{\sigma_x \sigma_y} = 0
\]

Now, it is well known that a maximum likelihood estimator of a parameter is usually asymptotically normally distributed with mean equal to the true value of the parameter, and variance equal to the inverse of the Fisher information with respect to that parameter. (To clarify, this is the Fisher information obtained via differentiating the logarithm of the likelihood function with respect to the parameter concerned.) For this reason, if n is large, the sampling density function of the maximum likelihood estimator τ̂ just defined can be approximately expressed as follows:

\[
\hat{\tau} \sim \mathrm{N}(\tau, 1/I(\tau)) \quad (46)
\]

where I(τ) is the Fisher information of the likelihood function in this example with respect to τ assuming all other parameters are known, which is in fact given by:

\[
I(\tau) = \frac{n(1 + \tau^2)}{(1 - \tau^2)^2}
\]

Using this approximation, the hypotheses H_P and H_S defined in Section 2.5 that are applicable in the case where τ̂ ≤
0, i.e. the hypotheses in equations (6) and (7), can now be expressed as:

\[
H_P : \tau \geq -\varepsilon \quad (47)
\]

\[
H_S : \rho(\widehat{T}^* < \hat{\tau}) \leq \Phi\!\left( (\hat{\tau} + \varepsilon) \sqrt{I(\varepsilon)} \right) \; (= J) \quad (48)
\]

where T̂* is the estimator τ̂ calculated exclusively on the basis of an as-yet-unobserved sample of n additional data points {(X*_i, Y*_i) : i = 1, 2, . . . , n} drawn from the bivariate normal density function being studied, and Φ(·) is again the standard normal distribution function. On the other hand, the hypotheses H_P and H_S that apply if τ̂ >
0, i.e. the hypotheses in equations (8) and (9), can now be expressed as:

\[
H_P : \tau \leq \varepsilon \quad (49)
\]

\[
H_S : \rho(\widehat{T}^* > \hat{\tau}) \leq 1 - \Phi\!\left( (\hat{\tau} - \varepsilon) \sqrt{I(\varepsilon)} \right) \; (= J) \quad (50)
\]

We should point out that if the estimator τ̂ did indeed have the normal distribution given in equation (46), then it can be easily shown that this estimator would satisfy the second criterion given in Section 2.5 for being a valid test statistic T(z), which would in turn imply that the hypotheses H_P and H_S as defined in equations (47) and (48) would be equivalent, and also that these hypotheses as defined in equations (49) and (50) would be equivalent.

To determine the fiducial density f_S(θj | x) that is required by equations (10) and (12), i.e. the density f_S(τ | µx, µy, σx², σy², z) in the present case, let us begin by assuming that the maximum likelihood estimator τ̂ is the fiducial statistic Q(z), which is actually the choice that was made for this statistic Q(z) in the aforementioned example in Bowater (2018a) when fiducial inference was used in this type of situation, i.e. in the situation where τ is the only unknown parameter. However, instead of assuming that the sampling density function of τ̂ is a normal density as has just been done, and as was done in the context of current interest in Bowater (2018a), let us assume that it is a transformation of τ̂ that is normally distributed, namely the function tanh⁻¹(τ̂). The reason for doing this is that it can be shown that, under this latter assumption, a generally better approximation to the sampling density of τ̂ can be obtained than under the former assumption, except, that is, when τ is close to zero. Notice that this exception is the reason why this alternative assumption was not the preferred assumption in the preceding discussion in order to derive approximate forms of the hypothesis H_S.
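The computations just described can be sketched in code: solving the cubic equation for τ̂, evaluating the Fisher information I(τ), and computing the one-sided P value J of hypothesis (48) via the normal approximation (46). The data-generating step and all function names are illustrative assumptions, not part of the paper.

```python
import math
import numpy as np

def tau_mle(x, y, mu_x, mu_y, var_x, var_y):
    """MLE of the correlation tau of a bivariate normal when all other
    parameters are known: a root in (-1, 1) of the cubic
    -n*t^3 + Sxy*t^2 + (n - Sxx - Syy)*t + Sxy = 0, where
    Sxx = sum(x'^2)/var_x, Syy = sum(y'^2)/var_y, Sxy = sum(x'y')/(sd_x*sd_y)."""
    n = len(x)
    xp, yp = np.asarray(x) - mu_x, np.asarray(y) - mu_y
    Sxx = np.sum(xp ** 2) / var_x
    Syy = np.sum(yp ** 2) / var_y
    Sxy = np.sum(xp * yp) / math.sqrt(var_x * var_y)

    def loglik(t):
        # log-likelihood in tau, up to an additive constant
        return -0.5 * n * math.log(1 - t ** 2) \
               - (Sxx - 2 * t * Sxy + Syy) / (2 * (1 - t ** 2))

    roots = np.roots([-n, Sxy, n - Sxx - Syy, Sxy])  # highest degree first
    real = [r.real for r in roots if abs(r.imag) < 1e-8 and -1 < r.real < 1]
    return max(real, key=loglik)  # pick the root that maximises the likelihood

def fisher_info(tau, n):
    """I(tau) = n*(1 + tau^2)/(1 - tau^2)^2, all other parameters known."""
    return n * (1 + tau ** 2) / (1 - tau ** 2) ** 2

def p_value_J(tau_hat, eps, n):
    """One-sided P value J in hypothesis (48) for the case tau_hat <= 0,
    via the normal approximation (46): J = Phi((tau_hat + eps)*sqrt(I(eps)))."""
    z = (tau_hat + eps) * math.sqrt(fisher_info(eps, n))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical data with a mildly negative correlation
rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
y = -0.2 * x + math.sqrt(1 - 0.2 ** 2) * rng.normal(size=n)
t_hat = tau_mle(x, y, 0.0, 0.0, 1.0, 1.0)
print(t_hat, p_value_J(t_hat, eps=0.02, n=n))
```

Since the likelihood in τ tends to −∞ at τ = ±1, the cubic always has at least one real root in (−1, 1), and picking the root with the highest log-likelihood guards against the (rare) case of multiple real roots.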
More specifically, it will be assumed that the density function of tanh⁻¹(τ̂) is directly specified (and the density function of τ̂
is therefore indirectly specified) by the expression:

\[
\tanh^{-1}(\hat{\tau}) \sim \mathrm{N}\!\left( \tanh^{-1}(\tau), \; 1/I(\tanh^{-1}\tau) \right)
\]

where I(tanh⁻¹τ) is the Fisher information with respect to the quantity tanh⁻¹(τ), assuming all parameters except τ are known, which is in fact given by:

\[
I(\tanh^{-1}\tau) = n(1 + \tau^2)
\]

Allowing tanh⁻¹(τ̂) to take the role of the statistic Q(z), and using the approximation to the density function of this statistic tanh⁻¹(τ̂) just given, we can therefore approximate equation (3) in the case where τ is the only unknown parameter as follows:

\[
\tanh^{-1}(\hat{\tau}) = \varphi(\Gamma, \tau) = \tanh^{-1}(\tau) + \frac{\Gamma}{\sqrt{n(1 + \tau^2)}} \quad (51)
\]

where the primary r.v. Γ ∼ N(0, 1). Observe that if, with the aim of satisfying Condition 1, the r.v. Γ is instead assumed to have a standard normal density truncated to lie in the interval (−v, v), where v >
0, then this condition will be satisfied for very large values of v under the restriction that n is not too small and τ̂ is not very close to −1 or 1, e.g. if n = 100, this condition will be satisfied even if v is chosen to be as high as 36 provided that |τ̂| is not too large, and will be satisfied for substantially larger values of v as |τ̂| becomes smaller.

We will therefore make use of equation (51) under the assumption that the primary r.v. Γ follows the truncated normal density function just mentioned with v chosen to be equal to or not far below the largest possible value of v that is consistent with equation (51) satisfying Condition 1. Also, since the fiducial density f_S(τ | µx, µy, σx², σy², z) needs to be derived under the assumption that, given the values of the conditioning parameters µx, µy, σx² and σy², there would have been no or very little pre-data knowledge about τ, it will be quite naturally assumed that the GPD function of τ is specified as follows: ω_G(τ) = b for −1 ≤ τ ≤ 1, where b >
0. Under the assumptions that have just been made, applying the principle outlined in Section 2.4 for deriving a fiducial density of the general type f(θj | θ−j, x), i.e. Principle 1 of Bowater (2019a), leads to an approximation to the full conditional fiducial density of τ that is given by:

\[
f_S(\tau \,|\, \mu_x, \mu_y, \sigma_x^2, \sigma_y^2, z) = \psi_t(\gamma) \left| \frac{d\gamma}{d\tau} \right| \quad \text{if } \tau \in (\tau_0, \tau_1)
\]

and is zero otherwise, where γ is the value of Γ that solves equation (51) for the given value of τ, i.e.

\[
\gamma = \left( \tanh^{-1}(\hat{\tau}) - \tanh^{-1}(\tau) \right) \sqrt{n(1 + \tau^2)}
\]

while ψ_t(γ) is the standard normal density function truncated to lie in the interval (−v, v) evaluated at γ, and finally (τ0, τ1) is the interval of values of τ that, according to equation (51), correspond to γ lying in the interval (−v, v). With the assumption having been made that the fiducial density f_S(τ | µx, µy, σx², σy², z) is approximately determined in this manner, it can be easily seen how the specification of the post-data density b(τ | µx, µy, σx², σy², z) can be completed by using the criteria of Section 2.5.

To illustrate this example, Figure 5 shows some results from running a Gibbs sampler with a uniform random scanning order of the parameters µx, µy, σx², σy² and τ on the basis of the full conditional post-data densities of these parameters that have just been detailed, i.e. the fiducial densities f(µx | µy, σx², σy², τ, z) and f(µy | µx, σx², σy², τ, z) defined by equations (41) and (42), the posterior densities (derived using Bayesian inference) p(σx² | µx, µy, σy², τ, z) and p(σy² | µx, µy, σx², τ, z) and the post-data density (derived using bispatial inference) b(τ | µx, µy, σx², σy², z).
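The random-scan Gibbs scheme just described can be sketched in generic form as follows. Each full conditional sampler is passed in as a function of the current state; a Metropolis step targeting a full conditional can be wrapped behind the same interface when no direct sampler is available. All names are illustrative, and the toy conditionals used in the demonstration are a stand-in for the paper's densities (41), (42) and the Metropolis steps, not a reproduction of them.

```python
import numpy as np

rng = np.random.default_rng(2)

def random_scan_gibbs(draws, init, n_iter, burn_in):
    """Random-scan Gibbs sampler: at each transition one parameter is chosen
    uniformly at random and redrawn from its full conditional density given
    the current values of all the other parameters.

    draws: dict mapping a parameter name to a function that takes the current
    state (a dict of parameter values) and returns a fresh value for that
    parameter."""
    state = dict(init)
    names = list(draws)
    kept = {name: [] for name in names}
    for it in range(n_iter):
        name = names[rng.integers(len(names))]  # uniform random scanning order
        state[name] = draws[name](state)
        if it >= burn_in:
            for k in names:
                kept[k].append(state[k])
    return {k: np.asarray(v) for k, v in kept.items()}

# Toy compatible pair of conditionals: the full conditionals of a standard
# bivariate normal with correlation 0.5 (conditional variance 1 - 0.5^2)
cond = {
    "mu_x": lambda s: rng.normal(0.5 * s["mu_y"], np.sqrt(0.75)),
    "mu_y": lambda s: rng.normal(0.5 * s["mu_x"], np.sqrt(0.75)),
}
out = random_scan_gibbs(cond, {"mu_x": 0.0, "mu_y": 0.0},
                        n_iter=60000, burn_in=5000)
print(out["mu_x"].mean(), out["mu_x"].var())
```

For this toy target the marginal of each parameter is standard normal, so the retained samples should have mean near 0 and variance near 1, which gives a simple sanity check on the sampler.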
In particular, the histograms in Figures 5(a) to 5(e) represent the distributions of the values of µx, µy, σx², σy² and τ, respectively, over a single run of ten million samples of these parameters generated by the Gibbs sampler after allowing for its burn-in phase by discarding a preceding run of five thousand samples. The sampling of each of the densities p(σx² | µx, µy, σy², τ, z), p(σy² | µx, µy, σx², τ, z) and b(τ | µx, µy, σx², σy², z) was based on the Metropolis algorithm, while the sampling of each of the densities f(µx | µy, σx², σy², τ, z) and f(µy | µx, σx², σy², τ, z) was independent from the preceding iterations.

Moreover, the observed data set z was a typical sample of n = 100 data points from a bivariate normal distribution with µx = 0, µy = 0, σx² = 1, σy² = 1 and τ = 0.
3. In addition, the specification of the posterior densities p(σx² | µx, µy, σy², τ, z) and p(σy² | µx, µy, σx², τ, z) was completed by assuming the values of the constants αx, βx, αy and βy, i.e. the constants that control the choice of the prior densities of σx² and σy² in equation (43), were set as follows: αx = 49, βx = 48, αy = 49 and βy = 34. On the other hand, with regard to how the post-data density b(τ | µx, µy, σx², σy², z) was fully determined, the constant ε was assumed to be equal to 0.02, and the probabilities κ that would be assigned to the hypotheses H_S in equations (48) and (50) for different values of all the parameters except τ were assumed to be given by the same power-function PDO curve as used in earlier examples, i.e. κ expressed as a fixed power of J, where, as indicated in these earlier equations, J is the one-sided P value in the definition of the hypothesis H_S that is applicable. Also, in determining the post-data density of τ in question, the density function h(θj) that appears in equation (11), i.e. the density h(τ) in the present case, was defined similar to how a density function of this type was specified in earlier examples, that is, by the expression τ ∼ Beta(4, 4) rescaled to lie over the interval [−0.02, 0.02]. No particular difficulties were identified with regard to the convergence of the Gibbs sampler over the parameters µx, µy, σx², σy² and τ, with a single transition of the sampler defined in the same way as in previous examples, even when the runs in question were long.

Figure 5:
Conditional prior densities of two parameters, namely σx² and σy², and marginal post-data densities of all five parameters of a bivariate normal distribution, formed on the basis of the full conditional densities f(µx | µy, σx², σy², τ, z) and f(µy | µx, σx², σy², τ, z), the posterior densities p(σx² | µx, µy, σy², τ, z) and p(σy² | µx, µy, σx², τ, z) and the post-data density b(τ | µx, µy, σx², σy², z).

The solid curves overlaid on the histograms in Figures 5(a) and 5(c) are plots of the marginal fiducial densities of the parameters µ and σ², respectively, as defined by equations (18) and (19) that would apply if the data set of interest only consisted of the observed values of the variable X, i.e. {x_i : i = 1, 2, . . . , n}, while in Figures 5(b) and 5(d), the solid curves represent, respectively, the marginal fiducial densities of µ and σ² defined in the same way except that these densities correspond to treating the observed values of the variable Y rather than the variable X, i.e. the set of values {y_i : i = 1, 2, . . . , n}, as being the data set x in the equations being discussed. On the other hand, the dashed curves overlaid on the histograms in Figures 5(c) and 5(d) are plots of the conditional prior densities for σx² and σy², respectively, as defined in equation (43). Finally, the solid curve overlaid on the histogram in Figure 5(e) is a plot of a confidence density function for the parameter τ. In general, a density function of this type corresponds to a set of confidence intervals that have a varying coverage probability for the parameter concerned, see for example Efron (1993) for further clarification. More specifically, for the plot being considered, these confidence intervals for τ were constructed on the basis of summarising the data set z by the sample correlation coefficient r, and then assuming that the Fisher transformation of this coefficient, i.e.
the transformation tanh⁻¹(r), has a normal sampling distribution with mean tanh⁻¹(τ) and variance 1/(n − 3).

Similar to earlier examples, it can be seen from comparing the histograms in Figures 5(a) to 5(d) with the curves overlaid on them that the forms of the marginal post-data densities of µx, µy, σx² and σy² that are represented by these histograms are consistent with what we would have intuitively expected given the pre-data beliefs about these parameters and the correlation τ that have been taken into account. Furthermore, we can observe that the marginal post-data density for τ represented by the histogram in Figure 5(e) differs substantially from the curve overlaid on this histogram, i.e. the aforementioned type of confidence density function for τ, particularly with regard to the amount of probability mass that these two density functions assign to values of τ close to zero. This arguably gives an indication of how inadequate it would be, in this example, to attempt to make inferences about the correlation τ using the standard type of confidence intervals for τ on which the overlaid curve in question is based.

As part of the discussion of the examples that were outlined in the preceding sections, reference was made to additional examples from Bowater (2018a), Bowater (2019a) and Bowater (2019b) that fit within the inferential framework that has been put forward in the present paper. Here the opportunity will be taken to highlight examples of a similar kind from these earlier papers that have not been mentioned up to this point.

To begin with, let us remark that in Bowater (2019a), organic fiducial inference was applied to the problem of making post-data inferences about discrete probability distributions that naturally only have one unknown parameter, in particular the binomial and Poisson distributions, and as a result, a fiducial density for the parameter concerned was determined.
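The confidence density for τ just described can be sketched as follows: treat tanh⁻¹(r) as normally distributed with mean tanh⁻¹(τ) and variance 1/(n − 3), and change variables from the Fisher z scale back to the τ scale. This is a textbook construction given here for illustration; the function name and the crude Riemann check are assumptions of the sketch.

```python
import math

def fisher_z_confidence_density(tau, r, n):
    """Confidence density for the correlation tau obtained by assuming
    tanh^{-1}(r) ~ N(tanh^{-1}(tau), 1/(n-3)) and transforming the standard
    normal density from zeta = tanh^{-1}(tau) back to tau
    (Jacobian d(zeta)/d(tau) = 1/(1 - tau^2))."""
    sd = 1.0 / math.sqrt(n - 3)
    z = (math.atanh(tau) - math.atanh(r)) / sd
    normal_pdf = math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))
    return normal_pdf / (1.0 - tau ** 2)

# The density should integrate to 1 over (-1, 1); a crude Riemann check:
step = 1.998 / 20000
grid = [-0.999 + step * k for k in range(20001)]
total = sum(fisher_z_confidence_density(t, r=0.3, n=100) for t in grid) * step
print(total)
```

Plotting this density over a fine grid of τ values reproduces the kind of overlaid curve shown in Figure 5(e).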
With regard to making inferences about a binomial proportion, the application of the method of inference in question represents, of course, a special case of the type of scenario discussed in Section 3.3, i.e. the case where the population proportion in this latter example is set to zero. Furthermore, the problem of making post-data inferences about a binomial proportion was addressed in Bowater (2019b) by using the type of bispatial inference that was described in Section 2.5.

On the other hand, in Bowater (2018a), joint post-data densities for the two parameters of the Pareto, gamma and beta distributions were determined by using the type of framework that was outlined in Section 2.2 on the basis of full conditional post-data densities of the parameters concerned that were formed by applying, in effect, organic fiducial inference, i.e. all these full conditional and joint post-data densities were, in fact, fiducial densities. In addition, the post-data density for a relative risk πt/πc was determined in Bowater (2019b) by using the kind of framework of Section 2.2 on the basis of full conditional post-data densities for the binomial proportions πt and πc that were formed by applying the type of bispatial inference detailed in Section 2.5 in a way that meant that dependence would, in general, exist between πt and πc in the joint post-data density of these parameters. Finally, in Bowater (2018a), a method that was, in effect, organic fiducial inference was applied to the problem of making post-data inferences about the difference between the means of two normal density functions that have unknown variances on the basis of independent samples from the two density functions concerned, i.e. the Behrens-Fisher problem.
4. Defence and discussion of the theory
There now follows a discussion of the theory put forward in the present paper, i.e. integrated organic inference, arranged as a series of questions that one might expect would be naturally raised as a reaction to first reading about this theory, and immediate responses to each of these questions.

Question 1.
Why not always use the Bayesian approach to inference?
As comments were already made in Section 2.3 regarding the flawed nature of two common 'objective' forms of Bayesian inference, let us consider the proposal of always making post-data inferences about model parameters using the standard or subjective Bayesian paradigm.

It is clearly arguable that the main difficulty with the Bayesian paradigm is in choosing a prior density function for the model parameters that adequately represents what was known about these parameters before the data were observed. According to the definition of probability being adopted in this paper, i.e. the definition outlined in detail in Bowater (2018b) that was summarised in Section 2.1, carrying out this task in an unsatisfactory manner (which can reasonably be regarded as often being unavoidable) is formally indicated by a low ranking being attached to the external strength of the prior distribution function, under the assumption, which will be made from now onwards, that the event R(λ) is a given outcome of a well-understood physical experiment (such as drawing a ball out of an urn of balls) and the resolution level λ is some value in a given interval.

Question 2. What about Lindley's criticism with regard to the incoherence of fiducial inference?
With reference to Fisher's fiducial argument, it was shown in Lindley (1958) that, if the fiducial density of a parameter θ that is formed on the basis of a data set x is treated as a prior density of θ in forming, in the usual Bayesian way, a posterior density of θ on the basis of a second data set y, then, in general, this posterior density will not be the same as the one that would be formed by repeating the same operation but with y as the first data set, and x as the second data set, i.e. fiducial inference generally fails to satisfy a seemingly reasonable coherency condition.

As a reaction to this, it can be remarked that fiducial inference, whether it is Fisher's version of this type of inference, or the version outlined in the present paper, relies on pre-data knowledge, or an expression of the lack of such knowledge, being incorporated into the inferential process within the context of the observed data. Therefore, while it may be loosely acceptable, in general, to apply a blanket rule such as the strong fiducial argument without concern for the data actually observed, it is perhaps unsurprising that doing this could sometimes lead to the type of phenomenon that has just been highlighted. Also, the act of expressing pre-data knowledge is rarely going to be a completely precise act no matter what paradigm of inference is adopted, and therefore the door is always open for inconsistencies in the inferential process such as the one identified in Lindley (1958) that is under discussion. Furthermore, if indeed we are in a scenario where the coherency condition being considered is not satisfied, then at least with respect to the type of fiducial inference outlined in the present paper, i.e. organic fiducial inference, it would be expected that good approximate adherence to this condition would usually be achieved providing that the data sets x and y referred to above are at least moderately sized.
In other words, it can be argued that the practical consequences of the anomaly in question should generally be regarded as being quite small.

Observe that the same kind of anomaly is clearly also going to apply when post-data densities of the parameters of a given model are constructed by relying in some way on the type of bispatial inference that was described in Section 2.5. Similar arguments can be made, though, in response to the criticism being discussed with regard to this type of situation as have just been presented.

Finally, we ought to mention an important issue that is related to this criticism. In particular, if it is considered as being appropriate in a particular context to form a post-data density function for the parameters of a given model by incorporating organic fiducial inference, and possibly also bispatial inference, into the framework that has been detailed in the present paper, then we may ask: would it not be best to use one or both of these methods of inference to construct such a density function on the basis of a minimal part of the data set that has actually been observed, and, as a next step, use this density function as a prior density in analysing the rest of the data under only the Bayesian paradigm? Although, at first sight, this strategy may appear to be a reasonable one, it has the drawback that post-data density functions constructed using organic fiducial inference on its own, or combined with bispatial inference, may well be regarded as being less adequate representations of the post-data uncertainty that is felt about the parameters concerned if they are based on a small rather than a large amount of data. For example, even if there was very little pre-data knowledge about a given parameter of interest and the fiducial statistic Q(x) is a sufficient statistic, it may be less appropriate to apply the strong fiducial argument to make inferences about this parameter if the data set is small rather than large.
Also, with regard to bispatial inference, there is of course generally less chance that the one-sided P value in the hypothesis H_S defined by equation (7) or (9), i.e. the value F(t | θj = θj0) or the value F′(t | θj = θj1), will be small if it is calculated on the basis of a small rather than a large data set, and as a result more chance perhaps that the interpretation of this P value will be a little complicated. We are therefore led again to an issue that was discussed in the answer to Question 1 of this section, in particular the question of whether we can justifiably attach a very high ranking to the external strength of the prior density that forms the basis for carrying out the second step of the type of strategy being considered and, if we can only apply Bayesian reasoning in this second stage, whether we can justifiably attach a very high ranking to the external strength of the posterior density that results from the whole analysis.

Question 3.
If the choice of the fiducial statistic is not obvious, how should this statistic be chosen?
The definition of a fiducial statistic Q(x) was given in Section 2.4. As alluded to in this earlier section, if there is not a sufficient statistic for the unknown parameter of interest that is a natural choice for the fiducial statistic, then a fairly general choice for this latter statistic, which has a good deal of intuitive appeal, is the maximum likelihood estimator of the parameter. Nevertheless, it would appear that more sophisticated criteria for choosing the fiducial statistic could easily be developed so that, in general, the effect of any arbitrariness in the choice of this statistic could be kept negligible. Such a development, though, will be left for future work.

Question 4.
Can the results obtained from applying integrated organic inference depend on the parameterisation of the sampling model?
There are two key reasons why the parameterisation of the sampling model may possibly affect the inferences made about population quantities of interest when applying integrated organic inference. First, related to a point made in the answer to Question 2 of this section, it may be possible to achieve a more representative expression of pre-data knowledge about the parameters of a model using one parameterisation of the model rather than another. In this case, it is fairly obvious that, ideally, out of all possible parameterisations of the model, the one should be chosen with regard to which the most representative expression of pre-data knowledge about the parameters can be achieved.

The second reason why inferences may possibly be affected by model parameterisation is related to the answer given to Question 3 of this section. In particular, it is that parameterisations may exist with regard to which fiducial statistics Q(x) or test statistics T(x) can be found that make more efficient use of the information contained in the data than those that can be found with regard to other parameterisations. However, it would be expected that, in general, this issue would not have more than a negligible effect on post-data inferences made about quantities of interest, and where the effect of this issue is more than negligible then, in the context of what was just discussed about the choice of model parameterisation, there clearly should be a preference for those parameterisations that allow fiducial statistics and test statistics to be chosen that make the best use of the information that is in the data.

Question 5.
In cases where the set of full conditional post-data densities referred to in equation (2) are incompatible, how often, in practice, could we expect them to be 'approximately compatible'?
Let us begin by clarifying that, in interpreting this question, it will be assumed that the full conditional densities referred to in equation (2) would be described as being 'approximately compatible' if they were incompatible, but nevertheless it was possible to find a joint density function of the parameters concerned such that these full conditional densities were closely approximated by the full conditional densities of the given joint density.

In replying to the question just raised, let us first remember that examples were discussed in Sections 3.2 to 3.5 of the present paper in which the Gibbs sampling method of Section 2.2 was applied to determine a joint post-data density of the parameters of each of the specific models of interest in these examples. Also, various other examples of this kind were outlined in Bowater (2018a, 2019a, 2019b). In all of these examples, a justification was given as to why it would be reasonable to conclude that if indeed the full conditional densities referred to in equation (2) are incompatible, then they nevertheless should be approximately compatible.

However, let us take the opportunity to highlight two examples where the approximate compatibility of the full conditional densities in equation (2) appeared to be less good than what was seen to be generally the case in the examples of the type in question. First, in an example in Bowater (2018a) where organic fiducial inference was applied to the problem of making post-data inferences about all the parameters of a bivariate normal distribution, a basic simulation study showed that the full conditional densities referred to in equation (2) were clearly incompatible.
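The phenomenon of incompatible full conditionals can be illustrated with a toy example in the spirit of Chen and Ip (2015): the two Gaussian conditionals below are not the full conditionals of any joint density, yet a random-scan Gibbs sampler still settles into a stable "pseudo-joint" distribution. All details here are illustrative assumptions and are unrelated to the paper's own simulation studies.

```python
import numpy as np

rng = np.random.default_rng(3)

# Incompatible pair: x | y ~ N(0.9*y, 1) but y | x ~ N(0.3*x, 1).
# No bivariate density has both of these as its conditionals, since the
# two regression slopes would have to be consistent with a single joint
# covariance structure, which these slopes contradict.
def gibbs(n_iter, burn_in):
    xs, ys, x, y = [], [], 0.0, 0.0
    for it in range(n_iter):
        if rng.random() < 0.5:              # uniform random scanning order
            x = rng.normal(0.9 * y, 1.0)
        else:
            y = rng.normal(0.3 * x, 1.0)
        if it >= burn_in:
            xs.append(x)
            ys.append(y)
    return np.asarray(xs), np.asarray(ys)

xs, ys = gibbs(200000, 5000)

# The chain converges to a stationary (Gaussian) distribution, but the
# slope of the conditional mean of x given y under that distribution is
# a compromise between the two incompatible specifications, not the 0.9
# that the first conditional demands.
slope_x_on_y = np.cov(xs, ys)[0, 1] / np.var(ys)
print(slope_x_on_y)
```

Because both updates are linear-Gaussian, the stationary moments can be found exactly by balancing the variance and covariance recursions, which makes this a convenient test case for studying the behaviour of the sampler under incompatibility.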
It could be argued, though, that the main reason for this was likely to be the fairly unsophisticated normality assumptions that were made as part of this application of the method of inference in question in order to approximate the full conditional fiducial densities for three of the five parameters concerned, these three parameters being, in particular, the two population variances and the correlation coefficient. Second, in an example in Bowater (2019a) where organic fiducial inference was used to make post-data inferences about all the parameters of a multinomial distribution, although a justification was given as to why the full conditional densities in equation (2) should be at least approximately compatible, an additional (unreported) simulation study showed that, in this example, the full conditional densities in question often may not have this desirable property if the number of trials (or in other words the number of observations) is very low and one or more of the categories over which the multinomial distribution is defined contain no observations.
Nevertheless, the problem of making inferences about the parameters of a multinomial distribution on the basis of limited data of this type when, as in the example being referred to, there is assumed to be no or very little pre-data knowledge about the parameters concerned is generally a difficult problem to solve using any paradigm of inference, see for example Berger, Bernardo and Sun (2015), and it is one that may well never have a completely satisfactory solution.

Finally, with regard to making inferences about the parameters θ of any given sampling model, it is important to bear in mind that, even if the full conditional densities referred to in equation (2) fail to be at least approximately compatible, then nevertheless, as alluded to in Section 2.2, they may well be considered as representing the best information that is available for constructing the most suitable post-data density function for the parameters concerned using the Gibbs sampling method outlined in this earlier section.

This concludes the discussion of the theory put forward in the present paper, i.e. integrated organic inference (IOI). It is hoped that it will be appreciated that this theory modifies, generalises and extends Fisherian inference, and naturally combines it with Bayesian inference in a way that constitutes a major advance on the level of sophistication of either of these two older schools of inference.

References
Bayes, T. (1763). An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society, 53, 370–418.

Berger, J. O., Bernardo, J. M. and Sun, D. (2015). Overall objective priors. Bayesian Analysis, 10, 189–221.

Bowater, R. J. (2017). A defence of subjective fiducial inference. AStA Advances in Statistical Analysis, 101, 177–197.

Bowater, R. J. (2018a). Multivariate subjective fiducial inference. arXiv.org (Cornell University), Statistics, arXiv:1804.09804.

Bowater, R. J. (2018b). On a generalised form of subjective probability. arXiv.org (Cornell University), Statistics, arXiv:1810.10972.

Bowater, R. J. (2019a). Organic fiducial inference. arXiv.org (Cornell University), Statistics, arXiv:1901.08589.

Bowater, R. J. (2019b). Sharp hypotheses and bispatial inference. arXiv.org (Cornell University), Statistics, arXiv:1911.09049.

Brooks, S. P. and Roberts, G. O. (1998). Convergence assessment techniques for Markov chain Monte Carlo. Statistics and Computing, 8, 319–335.

Chen, S-H. and Ip, E. H. (2015). Behaviour of the Gibbs sampler when conditional distributions are potentially incompatible. Journal of Statistical Computation and Simulation, 85, 3266–3275.

Cowles, M. K. and Carlin, B. P. (1996). Markov chain Monte Carlo convergence diagnostics: a comparative review. Journal of the American Statistical Association, 91, 883–904.

Efron, B. (1993). Bayes and likelihood calculations from confidence intervals. Biometrika, 80, 3–26.

Gelfand, A. E. and Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398–409.

Gelman, A. and Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7, 457–472.

Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721–741.

Jeffreys, H. (1961). Theory of Probability, 3rd edition, Oxford University Press, Oxford.

Kass, R. E. and Wasserman, L. (1996). The selection of prior distributions by formal rules. Journal of the American Statistical Association, 91, 1343–1370.

Lindley, D. V. (1958). Fiducial distributions and Bayes' theorem. Journal of the Royal Statistical Society, Series B, 20, 102–107.

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. and Teller, E. (1953). Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21, 1087–1092.