Organic fiducial inference
Russell J. Bowater
Independent researcher, Sartre 47, Acatlima, Huajuapan de León, Oaxaca, C.P. 69004, Mexico. Email address: as given on arXiv.org. Personal website: sites.google.com/site/bowaterfospage
Abstract:
A substantial generalisation is put forward of the theory of subjective fiducial inference as it was outlined in earlier papers. In particular, this theory is extended to deal with cases where the data are discrete or categorical rather than continuous, and cases where there was important pre-data knowledge about some or all of the model parameters. The system for directly expressing and then handling this pre-data knowledge, which is via what are referred to as global and local pre-data functions for the parameters concerned, is distinct from that which involves attempting to directly represent this knowledge in the form of a prior distribution function over these parameters, and then using Bayes' theorem. In this regard, the individual attributes of what are identified as three separate types of fiducial argument, namely the strong, moderate and weak fiducial arguments, form an integral part of the theory that is developed. Various practical examples of the application of this theory are presented, including examples involving binomial, Poisson and multinomial data. The fiducial distribution functions for the parameters of the models in these examples are interpreted in terms of a generalised definition of subjective probability that was set out previously.
Keywords:
Data generating algorithm; Fiducial statistic; Generalised subjective probability; Global and local pre-data functions; Primary random variable; Types of fiducial argument.

1. Introduction

The theory of subjective fiducial inference was first proposed in Bowater (2017b), and was then modified and extended to deal with more general inferential problems in which various parameters are unknown in Bowater (2018a). A further analysis that supports the adoption of this approach to inference is provided in Bowater and Guzmán (2018b). References to loosely related work in the general area of fiducial inference can be found in the first two of these three papers.

The aim of the present work is to substantially generalise this theory of inference as it was defined in Bowater (2018a). In particular, this theory will be extended to deal with cases where the data are discrete or categorical rather than continuous, and cases where there was important knowledge about some or all of the model parameters before the data were observed. Such knowledge, which will be termed 'pre-data knowledge', will be treated as being distinct from 'prior knowledge', since the use of this latter term usually exclusively implies that inferences will be made under the Bayesian paradigm.

The development of the earlier theory will be at a level that is sufficient to justify the theory being renamed as 'organic fiducial inference'. Also, the use of the word 'subjective' in the original name caused confusion, as for some this meant that the theory must substantially depend on personal beliefs, or in some other way must be far from being objective. As was explained in Bowater (2018a) and Bowater and Guzmán (2018b), this was not the case for the original theory, and is not generally the case for the theory that is about to be presented. The word 'organic' in the new name, however, still emphasizes that the theory is designed for living subjects, e.g. humans, and not for robots.

For cases in which nothing or very little was known about the model parameters before the data were observed, the motivation for this paper is similar to how the need for the work in Bowater (2018a) was justified, that is, it is motivated by the severe criticisms that generally can be made in these cases against the frequentist and Bayesian approaches to inference. These criticisms, some of which are well known, were set out in Section 4 of Bowater (2017b) and Sections 2 and 7 of Bowater (2018a), and to save space they will not be repeated here.

In other cases that will be of interest, i.e. where there is moderate to strong pre-data knowledge about some or all of the model parameters, conventional schools of inference can also be inadequate. In particular, frequentist theory is a generally inflexible framework for incorporating such knowledge into the inferential process. For example, it has proved, on the whole, very difficult to adapt the confidence interval approach to situations where we simply know, before the data were observed, that values in a given subset of the natural space of a parameter of interest are impossible, see for example Mandelkern (2002) and the references therein. On the other hand, while our pre-data knowledge about some or all of the model parameters may be substantial, it may not be comprehensive enough in many situations to be adequately incorporated into a Bayesian analysis by placing a prior density function over the parameters in question.

Let us now summarise the structure of the paper. Some brief comments about the concept of probability that will be used in the paper are made in the following section. Further concepts, principles and definitions that underlie the theory of organic fiducial inference in cases where only one model parameter is unknown are presented and discussed in Section 3. In relation to earlier work, an account is then given in Section 4 of how this methodology is extended to include cases where various parameters are unknown.

In the second half of the paper, the theory is applied to various examples. In particular, in Sections 5 and 6, problems of inference based on both continuous and discrete data are examined where nothing or very little was known about the model parameters before the data were observed. Examples are then discussed in Section 7 where the natural parameter space is restricted as a result of pre-data knowledge about the case in question, and finally in Section 8, the impact of more general forms of pre-data knowledge about the model parameters is illustrated.
2. Generalised subjective probability
The definition of probability upon which the theory of organic fiducial inference will be based is the definition of subjective probability that was presented in Bowater and Guzmán (2018b), although the key concept of similarity that this definition relies on was introduced in Bowater (2017a), and discussed in Bowater (2017b) and Bowater (2018a). For the sake of convenience, this definition of probability will be referred to as generalised subjective probability.

Under this definition, a probability distribution is defined by its distribution function, which has the usual mathematical properties of such a function, and the strength of this function relative to other distribution functions of interest. In very loose terms, the strength of a distribution function is essentially a measure of how well the distribution function represents a given individual's uncertainty about the random variable concerned. In this paper, we will be primarily interested in the external strength of a continuous distribution function as specified by Definitions 5 and 7 of Bowater and Guzmán (2018b). To avoid repeating all the technical details, the reader is invited to examine these definitions as well as the application of these definitions to a fundamental problem of statistical inference in Sections 3.6 and 3.7 of this earlier paper.

Although generalised subjective probability will be the adopted definition of probability, the concept of strength will not be explicitly discussed in the sections that immediately follow in order to give a more digestible introduction to the other main concepts that underlie organic fiducial inference. Instead, the role of this definition of probability in organic fiducial inference will be fully examined when this method of inference is applied to examples later in the paper.

3. Univariate organic fiducial inference

3.1. Sampling model and data generation
In general, it will be assumed that a sampling model that depends on one or various unknown parameters $\theta_1, \theta_2, \ldots, \theta_k$ generates the data $x$. Let the joint density or mass function of the data given the true values of $\theta_1, \theta_2, \ldots, \theta_k$ be denoted as $g(x \mid \theta_1, \theta_2, \ldots, \theta_k)$. For the moment, though, we will assume that the only unknown parameter in the model is $\theta_j$, either because there are no other parameters in the model, or because the true values of the parameters in the set $\theta_{-j} = \{\theta_1, \ldots, \theta_{j-1}, \theta_{j+1}, \ldots, \theta_k\}$ are known.

In a change from Bowater (2018a), the following more general definition of a fiducial statistic will be applied.

Definition 1: A fiducial statistic
A fiducial statistic $Q(x)$ will be defined as being the only statistic in a sufficient set of one-dimensional statistics that is not an ancillary statistic. Of course, given this requirement, there may not exist any possible choice for this kind of statistic. However, in this paper, we will only consider cases where this definition can be successfully applied. In other cases, a way of defining the fiducial statistic is to allow it to be any one-to-one function of a unique maximum likelihood estimator of $\theta_j$. This latter criterion was applied in Section 5.7 of Bowater (2018a).

We will also make a more general assumption about the way in which the data were generated than in this earlier paper.

Assumption 1: Data generating algorithm
Independent of the way in which the data were actually generated, it will be assumed that the data set $x$ was generated by the following algorithm:

1) Simulate the values of the ancillary complements, if any exist, of a given fiducial statistic $Q(x)$.

2) Generate a value $\gamma$ for a continuous one-dimensional random variable $\Gamma$, which has a density function $\pi_0(\gamma)$ that does not depend on the parameter $\theta_j$.

3) Determine a value $q(x)$ for the fiducial statistic $Q(x)$ by setting $\Gamma$ equal to $\gamma$ and $q(x)$ equal to $Q(x)$ in the following expression, which effectively defines the distribution function of $Q(x)$:

$$Q(x) = \varphi(\Gamma, \theta_j) \qquad (1)$$

where the function $\varphi(\Gamma, \theta_j)$ is defined so that it satisfies the following conditions:

Assumption 1.1: Conditions on the function $\varphi(\Gamma, \theta_j)$

a) The distribution function of $Q(x)$ as defined by the expression in equation (1) is equal to what it would have been if $Q(x)$ had been determined on the basis of the data set $x$.

b) The only random variable upon which $\varphi(\Gamma, \theta_j)$ depends is the variable $\Gamma$.

4) Generate the data set $x$ by conditioning the sampling density or mass function $g(x \mid \theta_1, \theta_2, \ldots, \theta_k)$ on the already generated value for $Q(x)$ and the values of any ancillary complements of $Q(x)$.

Observe that Assumption 1.1 differs from the corresponding assumption in Bowater (2018a) due to the absence of a condition that is similar to condition (c) of Assumption 1.1 in this earlier paper.

In the context of the above algorithm, the variable $\Gamma$ will be referred to as a primary random variable (primary r.v.), which is consistent with how this term was used in Bowater (2017b), Bowater (2018a) and Bowater and Guzmán (2018b). To clarify, if this algorithm was rewritten so that the value $\gamma$ of the variable $\Gamma$ was generated by setting it equal to a deterministic function of an already generated value for $Q(x)$ and the parameter $\theta_j$, then $\Gamma$ would not be a primary r.v.
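To make the steps of this algorithm concrete, the following is a minimal sketch for one illustrative model only: a normal sample with unknown mean and known standard deviation, for which the sample mean is the fiducial statistic and equation (1) takes the form given later in equation (8). The function name `generate_data` and its arguments are hypothetical, and the way step 4 is carried out (recentring an independent normal sample) is a standard device for the normal model that is assumed here rather than taken from the text.

```python
import math
import random
import statistics


def generate_data(mu, sigma, n, rng):
    """Sketch of Assumption 1 for normal data with known sigma (illustrative model)."""
    # Step 1: nothing is simulated here, since in this sketch the sample mean
    # alone is sufficient for mu and so has no ancillary complement.
    # Step 2: draw the primary r.v. from its pre-data density, here N(0, 1),
    # which does not depend on mu.
    gamma = rng.gauss(0.0, 1.0)
    # Step 3: equation (1) with phi(gamma, mu) = mu + (sigma / sqrt(n)) * gamma.
    q = mu + (sigma / math.sqrt(n)) * gamma
    # Step 4: draw x conditional on its sample mean being q. For normal data
    # the deviations from the mean are independent of the mean, so an
    # independent normal sample is recentred at q.
    z = [rng.gauss(0.0, sigma) for _ in range(n)]
    z_bar = statistics.fmean(z)
    x = [q + (z_i - z_bar) for z_i in z]
    return gamma, q, x


gamma, q, x = generate_data(mu=2.0, sigma=1.5, n=8, rng=random.Random(0))
```

In this sketch $\Gamma$ is a primary r.v. in the sense described above, because its value is drawn first and the value of the fiducial statistic is then computed from it, not the other way round.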
Although the fiducial argument is usually considered to be a single argument, in this section we will clarify and develop the argument by breaking it down into three separate but related sub-arguments.
Definition 2(a): Strong or standard fiducial argument
This is the argument that the density function of the primary r.v. $\Gamma$ after the data have been observed, i.e. the post-data density function of $\Gamma$, should be equal to the pre-data density function of $\Gamma$, i.e. the density function $\pi_0(\gamma)$ as defined in step 2 of the algorithm in Assumption 1. In the case where nothing or very little was known about the parameter $\theta_j$ before the data were observed, justifications for this argument, without using Bayesian reasoning, were outlined in Section 3.1 of Bowater (2017b), Section 6 of Bowater (2018a) and Section 3.6 of Bowater and Guzmán (2018b), and therefore will not be repeated here.

Definition 2(b): Moderate fiducial argument
This type of fiducial argument will be assumed to be only applicable to cases where values of the primary r.v. $\Gamma$ that were possible before the data were observed, i.e. values in the set $\{\gamma : \pi_0(\gamma) > 0\}$, are made impossible by the act of observing the data. Under this condition, it is the argument that, over the set of values of $\Gamma$ that are still possible given the data, the relative height of the post-data density function of $\Gamma$ should be equal to the relative height of the pre-data density function of $\Gamma$.

It is an argument that can certainly be viewed as being less attractive than the strong fiducial argument, as its use implies that our beliefs about $\Gamma$ will be modified by the data. Nevertheless, it will be made clear in Section 7.1 how this argument can be adequately justified without using Bayesian reasoning in an important class of cases.

Definition 2(c): Weak fiducial argument
This argument will be assumed to be only applicable to cases where the use of neither the strong nor the moderate fiducial argument is considered to be appropriate. It is the argument that, over the set of values of the primary r.v. $\Gamma$ that are possible given the data, the relative height of the post-data density function of $\Gamma$ should be equal to the relative height of the pre-data density function of $\Gamma$ multiplied by weights on the values of $\Gamma$ that are determined from a function over the parameter $\theta_j$ that was specified before the data were observed. The precise way in which these weights over the values of $\Gamma$ are formed will be defined in Section 3.4.

Similar to the strong and moderate fiducial arguments, this type of fiducial argument can be adequately justified without using Bayesian reasoning in many important cases. Such a justification and examples of the cases in question will be presented in Section 8.

In the theory of organic fiducial inference, it will be assumed that pre-data knowledge about the parameter $\theta_j$ is expressed through what will be called a global pre-data function and a local pre-data function for $\theta_j$, which have the following definitions.

Definition 3: Global pre-data (GPD) function
The global pre-data (GPD) function $\omega_G(\theta_j)$ is any given non-negative and locally integrable function over the space of the parameter $\theta_j$. It is a function that only needs to be specified up to a proportionality constant, in the sense that if it is multiplied by a positive constant, then the value of the constant is redundant. If $\omega_G(\theta_j) = 0$ for all $\theta_j \in A$, where $A$ is a given subset of the real line, then this implies that it was regarded as being impossible that $\theta_j \in A$ before the data $x$ were observed. Unlike a Bayesian prior density, it is not controversial to use a GPD function that is not globally integrable.

In many cases, the GPD function will have the following simple form:

$$\omega_G(\theta_j) = \begin{cases} 0 & \text{if } \theta_j \in A \\ b & \text{otherwise} \end{cases} \qquad (2)$$

where the set $A$ may be empty and $b > 0$.

Definition 4: Local pre-data (LPD) function
The local pre-data (LPD) function $\omega_L(\theta_j)$ is a function of the parameter $\theta_j$ that has the same mathematical properties as the GPD function, i.e. it is a non-negative and locally integrable function over the space of $\theta_j$ that only needs to be specified up to a proportionality constant. Its role is to complete the definition of the joint post-data density function of the primary r.v. $\Gamma$ and the parameter $\theta_j$ in cases where using either the strong or moderate fiducial argument alone is not sufficient to achieve this. For this reason, the LPD function is in fact redundant in many situations. We describe this function as being 'local' because it is only used in the inferential process under the condition that $\gamma$ equals a specific value, and with this condition in place and given the data $x$, the parameter $\theta_j$ usually must lie in a compact set that is contained in a very small region of the real line. It will be seen that because of this, even if the LPD function is not redundant, its influence on the inferential process will usually be relatively minor.

Given the data $x$, the fiducial density function of the parameter $\theta_j$ conditional on the parameters in the set $\theta_{-j}$ being known, i.e. the density function $f(\theta_j \mid \theta_{-j}, x)$, will be defined according to the following two compatible principles.

Principle 1 for defining the fiducial density $f(\theta_j \mid \theta_{-j}, x)$

This principle requires that the following condition is satisfied.

Condition 1
Let $G_x$ and $H_x$ be the sets of all values of $\Gamma$ and $\theta_j$ respectively that are possible given the value of the fiducial statistic $q(x)$ and its ancillary complement, if it exists, that are calculated on the basis of the data $x$. In defining these sets, it is assumed that values of $\theta_j$ that were regarded as being impossible before the data were observed cannot be made possible by observing the data. Given this notation, the present condition is satisfied if, on substituting the variable $Q(x)$ in equation (1) by the value $q(x)$, this equation would define a bijective mapping between the set $G_x$ and the set $H_x$.

Under Condition 1, the fiducial density function $f(\theta_j \mid \theta_{-j}, x)$ is defined by setting $Q(x)$ equal to $q(x)$ in equation (1), and then treating the value $\theta_j$ as being a realisation of the random variable $\Theta_j$, to give the expression:

$$q(x) = \varphi(\Gamma, \Theta_j)$$

except that, instead of $\Gamma$ necessarily having the density function $\pi_0(\gamma)$ as defined in step 2 of the algorithm in Assumption 1, it will be assumed to have the following density function:

$$\pi_1(\gamma) = \begin{cases} c_0\, \omega_G(\theta_j(\gamma))\, \pi_0(\gamma) & \text{if } \gamma \in G_x \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$

where $\theta_j(\gamma)$ is the value of $\theta_j$ that maps on to the value $\gamma$, the function $\omega_G(\theta_j(\gamma))$ is the GPD function as introduced by Definition 3, and $c_0$ is a normalising constant.

The function $\pi_1(\gamma)$ will be regarded as being the post-data density function of $\Gamma$. Also, in the definition of the weak fiducial argument, i.e. Definition 2(c), the function over $\theta_j$ that determines the weights on the $\gamma$ values in the construction of this density function for $\Gamma$ will now be identified as being the GPD function.

Observe that if the GPD function is neutral, i.e. it has the form given in equation (2), then over the set $G_x$, the density $\pi_1(\gamma)$ will be equal to the pre-data density $\pi_0(\gamma)$ conditioned to lie in this set.
For this type of GPD function, if

$$G_x = \{\gamma : \pi_0(\gamma) > 0\} \qquad (4)$$

then clearly the procedure for making inferences about $\theta_j$ will depend on the strong fiducial argument, otherwise it will depend on the moderate fiducial argument. Alternatively, if the GPD function is not equal to a positive constant over the set $H_x$, then we can see that inferences about $\theta_j$ will be made by using the weak fiducial argument.

Furthermore, notice that if, on substituting the variable $Q(x)$ by the value $q(x)$, equation (1) defines an injective mapping from the set $\{\gamma : \pi_0(\gamma) > 0\}$ to the space of the parameter $\theta_j$, then the GPD function $\omega_G(\theta_j)$ expresses in effect our pre-data beliefs about $\theta_j$ relative to what is implied by the strong fiducial argument. By doing so, it determines whether the strong, moderate or weak fiducial argument is used to make inferences about $\theta_j$, and also the way in which the latter two arguments influence the inferential process.

In this respect, under the same assumption concerning equation (1), it can be seen that if the pre-data density $\pi_0(\gamma)$ is a uniform density for $\Gamma$ over $(0, 1)$, and if we define

$$d = \int_{\gamma \in D} \omega_G(\theta_j(\gamma))\, d\gamma \quad \text{and} \quad e = \int_{\gamma \in E} \omega_G(\theta_j(\gamma))\, d\gamma$$

where $D$ and $E$ are non-empty subsets of the interval $(0, 1)$ such that the events $\{\Gamma \in D\}$ and $\{\Gamma \in E\}$ are assigned the same probability by the density $\pi_0(\gamma)$, then assuming that $e$ is not zero, the probability of the event $\{\Gamma \in D\}$ will be $d/e$ times the probability of the event $\{\Gamma \in E\}$ after the data have been observed.

Finally, it should be noted that in the theory of subjective fiducial inference as outlined in Bowater (2018a), the density $\pi_1(\gamma)$ is effectively always defined to be equal to the density $\pi_0(\gamma)$, i.e. the only type of fiducial argument that this earlier theory relies on is the strong fiducial argument.

Principle 2 for defining the fiducial density $f(\theta_j \mid \theta_{-j}, x)$

This principle requires that the following two conditions are satisfied.

Condition 2(a)
Given the value $q(x)$ for the variable $Q(x)$, it is required that

$$H_x = \{\theta_j : (\exists\, \gamma \in G_x)[\theta_j \in \theta_j(\gamma)]\}$$

where $G_x$ and $H_x$ are as defined in Condition 1, and $\theta_j(\gamma)$ is the set of values of $\theta_j$ that map on to the value $\gamma$ according to equation (1).

Condition 2(b)
The GPD function $\omega_G(\theta_j)$ must be equal to a positive constant over the set $H_x$.

Under Conditions 2(a) and 2(b), the fiducial density function $f(\theta_j \mid \theta_{-j}, x)$ is defined by

$$f(\theta_j \mid \theta_{-j}, x) = \int_{\gamma \in G_x} \omega^*(\theta_j \mid \gamma)\, \pi_1(\gamma)\, d\gamma \qquad (5)$$

where $\pi_1(\gamma)$ is as defined in equation (3), although $\omega_G(\theta_j(\gamma))$ will always be equal to a positive constant in this equation, and the conditional density function $\omega^*(\theta_j \mid \gamma)$ is defined by

$$\omega^*(\theta_j \mid \gamma) = \begin{cases} c_1(\gamma)\, \omega_L(\theta_j) & \text{if } \theta_j \in \theta_j(\gamma) \\ 0 & \text{otherwise} \end{cases} \qquad (6)$$

where $\omega_L(\theta_j)$ is the LPD function as introduced by Definition 4, and $c_1(\gamma)$ is a normalising constant, which clearly must depend on the value of $\gamma$.

It can be seen that the density function $f(\theta_j \mid \theta_{-j}, x)$ as defined by equation (5) is formed by marginalising, with respect to $\gamma$, a joint density of $\Gamma$ and $\theta_j$ that is based on $\omega^*(\theta_j \mid \gamma)$ being the conditional density of $\theta_j$ given $\gamma$, and on $\pi_1(\gamma)$ being the marginal density of $\Gamma$. Similar to what was the case under Principle 1, if the condition in equation (4) is satisfied, then the density $\pi_1(\gamma)$ will be equal to the density $\pi_0(\gamma)$, i.e. the density function $f(\theta_j \mid \theta_{-j}, x)$ is determined on the basis of the strong fiducial argument, otherwise it is determined on the basis of the moderate argument. However, in contrast to what was the case under Principle 1, the weak fiducial argument is never used to make inferences about $\theta_j$.

Also, we can observe that the density function $\omega^*(\theta_j \mid \gamma)$ defined in equation (6) is formed by normalising the LPD function after $\theta_j$ has been restricted to lie in the subset $\theta_j(\gamma)$. The role of the density $\omega^*(\theta_j \mid \gamma)$ is therefore to make use of the nature of the LPD function to distribute $\theta_j$ over those values of $\theta_j$ that are consistent with any given value of $\gamma$. For this reason, it is assumed that the LPD function $\omega_L(\theta_j)$ is chosen to reflect what we believe about $\theta_j$.
In particular, these beliefs are assumed to be our pre-data rather than post-data beliefs about $\theta_j$, as otherwise it is evident that, in general, we would be guilty of making inferences about $\theta_j$ by using the data twice. As alluded to in Definition 4, the sets $\theta_j(\gamma)$ will in general be compact sets that are usually wholly contained within very small regions of the real line.

Furthermore, it is worth noting that if Condition 2(b) is satisfied, then Principle 1 is essentially a special case of Principle 2. This is because to apply Principle 1 it is required that Condition 1 holds, and if it does then, first, Condition 2(a) must hold, second, the density $\omega^*(\theta_j \mid \gamma)$ could be regarded as converting itself into a point mass function at the value $\theta_j(\gamma)$, and third, as a result of this, the joint density function of $\Gamma$ and $\theta_j$ in equation (5) effectively becomes a univariate density function. Therefore, the integration of this latter function with respect to $\gamma$ would be naturally regarded as being redundant.

As a final point, we need to acknowledge the fact that important cases exist in which neither Condition 1 is satisfied nor Conditions 2(a) and 2(b) are both satisfied. If Condition 2(a) does not hold, then we have a problem that could be described as 'spillage' due to the fact that the set $H_x$ will be a proper subset of $\{\theta_j : (\exists\, \gamma \in G_x)[\theta_j \in \theta_j(\gamma)]\}$, and therefore this latter set 'spills out' of the set $H_x$. How to deal with this problem of spillage will be returned to in Section 7.2, and how to deal with cases where Condition 2(b) does not hold will be discussed in Section 8.
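As an illustration of Principle 1 with a GPD function of the form in equation (2), the sketch below draws values from a fiducial density when part of the parameter space was ruled out before the data were observed. It assumes the normal-mean model of equation (8), with known standard deviation, together with the pre-data knowledge that the mean cannot lie below a bound `lower`; the function name, its arguments and the numerical values are hypothetical. Since the set of possible values of the primary r.v. is then a proper subset of the real line, the condition in equation (4) fails, so the sketch is an instance of the moderate fiducial argument: the post-data density of the primary r.v. is its pre-data density conditioned to lie in that set. The sketch also assumes that this set has non-negligible pre-data probability.

```python
import math
import random
from statistics import NormalDist


def sample_fiducial_mu(x_bar, sigma, n, lower, size, rng):
    """Draw fiducial values of mu when mu < lower was ruled out pre-data (sketch)."""
    std_normal = NormalDist()          # pre-data density of the primary r.v.
    scale = sigma / math.sqrt(n)
    # Equation (8) gives mu = x_bar - scale * gamma, so mu >= lower corresponds
    # to gamma <= (x_bar - lower) / scale. This half-line is the set G_x.
    gamma_max = (x_bar - lower) / scale
    p_max = std_normal.cdf(gamma_max)  # pre-data probability of G_x
    draws = []
    for _ in range(size):
        # Post-data density: the pre-data density conditioned to lie in G_x,
        # sampled by inverting the normal distribution function.
        u = rng.random() * p_max
        u = min(max(u, 1e-15), 1.0 - 1e-15)   # keep inv_cdf inside (0, 1)
        gamma = std_normal.inv_cdf(u)
        # Map gamma back to mu; max() only guards against rounding error.
        draws.append(max(lower, x_bar - scale * gamma))
    return draws


draws = sample_fiducial_mu(x_bar=0.3, sigma=1.0, n=4, lower=0.0,
                           size=5000, rng=random.Random(1))
```

If `lower` is set far below the sample mean, the constraint becomes immaterial and the draws behave like draws from the density obtained under the strong fiducial argument.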
4. Multivariate organic fiducial inference
We will now consider the case where all the parameters $\theta_1, \theta_2, \ldots, \theta_k$ in the sampling model are unknown.

Definition 5: Joint fiducial density functions
Under the assumption that Principles 1 or 2, or any natural variations on these principles, can be used to define the full conditional fiducial densities

$$f(\theta_j \mid \theta_{-j}, x) \quad \text{for } j = 1, 2, \ldots, k \qquad (7)$$

and that this set of conditional densities determines a joint density function for the parameters $\theta_1, \theta_2, \ldots, \theta_k$, this latter density function will be defined as being the joint fiducial density function of these parameters, and will be denoted as $f(\theta_1, \theta_2, \ldots, \theta_k \mid x)$. It can be easily shown that this density function will always be unique.

To corroborate that the set of full conditional densities in equation (7) actually determines a joint density function for the parameters concerned, either the analytical or the computational method proposed for this purpose in Bowater (2018a) could be applied. These methods will now be briefly described.

An analytical method
Under the assumption that the set of full conditional densities in equation (7) can be expressed analytically, a way of establishing whether they determine a joint density function for $\theta_1, \theta_2, \ldots, \theta_k$ is simply to propose an analytic expression for such a density function, derive the full conditional densities of the proposed density function, and see if they match the full conditionals in equation (7).

Statement about incompatible full conditional densities
It is not acknowledged in the following subsection or in Section 6.3 that the stationary density of an ergodic Gibbs sampler is affected by the scanning order of the variables on which the sampler is based when the full conditional densities concerned are incompatible. These sections will be rewritten in due course to take this important issue into account. Nevertheless, doing so will not affect the relevance of the results that are currently presented in the example in the latter section.
A computational method
A more general method for establishing whether the full conditional densities in equation (7) determine a joint density function for the parameters concerned is based on attempting to generate random samples from this joint density by applying the Gibbs sampler (Geman and Geman 1984 and Gelfand and Smith 1990) to the full conditionals in question. Of course, the Gibbs sampler, assuming that it is irreducible and aperiodic, will only converge to a unique stationary density if the joint density $f(\theta_1, \theta_2, \ldots, \theta_k \mid x)$ actually exists (and the reverse is also true). For this reason, we now choose to redefine the problem as being one of trying to establish whether the Gibbs sampler converges to a unique stationary density on the basis of the observed behaviour of this sampler.

This may also seem to be a difficult problem to resolve. However, in a more conventional application of the Gibbs sampler, we are faced with the similar problem of whether the sampler converges to its unique stationary density in a reasonable amount of time, i.e. before a large pre-specified number of cycles of the sampler have been completed. This is the reason why a substantial number of techniques have been developed to assess whether Monte Carlo Markov chains, such as the Gibbs sampler, converge to their unique stationary densities within a given finite number of cycles, see for example Gelman and Rubin (1992) and Brooks and Roberts (1998).

Obviously, if there is the added complication that we are not completely sure that the Gibbs sampler has a unique stationary density, then it would seem appropriate that we use these convergence diagnostics more intensively.
On the whole though, if in the context of having already taken into account how the full conditional densities in equation (7) were formed, the use of such diagnostics can give us a high degree of confidence that the Gibbs sampler has converged to a unique stationary density, then of course we should have a high degree of confidence that the joint fiducial density $f(\theta_1, \theta_2, \ldots, \theta_k \mid x)$ does indeed exist.

An important benefit of using the Gibbs sampling method that has just been described is that to calculate expectations of interest with respect to this joint fiducial density, we will often need to rely on simulation methods such as the Gibbs sampler. Therefore by using this Gibbs sampling method, two goals can be achieved simultaneously.
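The computational method can be sketched as follows. The code runs a fixed-scan Gibbs sampler from several over-dispersed starting points and then computes the basic potential scale reduction factor in the spirit of Gelman and Rubin (1992), without their degrees-of-freedom correction. The function names `gibbs` and `gelman_rubin` are hypothetical, and the pair of full conditionals used to exercise them (those of a standard bivariate normal density with correlation 0.5) is a toy choice that is assumed here purely for illustration and does not come from the text.

```python
import math
import random
import statistics


def gibbs(conditionals, start, cycles, rng):
    """Fixed-scan Gibbs sampler; conditionals[j](state, rng) redraws coordinate j."""
    state = list(start)
    path = []
    for _ in range(cycles):
        for j, draw in enumerate(conditionals):
            state[j] = draw(state, rng)
        path.append(tuple(state))
    return path


def gelman_rubin(chains):
    """Basic potential scale reduction factor for one scalar quantity."""
    n = len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(means)
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


# Toy, compatible full conditionals (assumed for illustration only).
RHO = 0.5
SD = math.sqrt(1.0 - RHO ** 2)
conditionals = [
    lambda s, rng: rng.gauss(RHO * s[1], SD),
    lambda s, rng: rng.gauss(RHO * s[0], SD),
]

rng = random.Random(0)
starts = [(-10.0, -10.0), (-3.0, 3.0), (3.0, -3.0), (10.0, 10.0)]
paths = [gibbs(conditionals, s, 2000, rng) for s in starts]
kept = [[state[0] for state in p[1000:]] for p in paths]   # drop burn-in
r_hat = gelman_rubin(kept)
```

A value of `r_hat` close to one is consistent with the chains having reached a common stationary density, whereas a value well above one indicates that they have not. As noted in the statement above, a fixed scanning order is used here, which matters when the full conditionals are incompatible.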
5. An example with continuous data and little pre-data knowledge
We will now apply the methodology put forward in the previous sections to some examples. To begin with, let us consider the standard problem of making inferences about the mean $\mu$ of a normal density function, when its variance $\sigma^2$ is unknown, on the basis of a sample $x$ of size $n$, i.e. $x = (x_1, x_2, \ldots, x_n)$, drawn from the density function concerned. Although the way in which the theory of subjective fiducial inference can be used to solve this problem was detailed in Bowater (2018a), let us quickly place this problem in the context of the type of inference that is the subject of the present paper, i.e. organic fiducial inference.

If $\sigma^2$ is known, a sufficient statistic for $\mu$ is the sample mean $\bar{x}$, which therefore can be assumed to be the fiducial statistic $Q(x)$. Based on this assumption, equation (1) can be expressed as

$$\bar{x} = \varphi(\Gamma, \mu) = \mu + (\sigma/\sqrt{n})\,\Gamma \qquad (8)$$

where $\Gamma \sim N(0, 1)$. Assuming that nothing or very little was known about $\mu$ before the data $x$ were observed, it is quite natural to specify the GPD function for $\mu$ as follows: $\omega_G(\mu) = a$, $\mu \in (-\infty, \infty)$, where $a > 0$. Furthermore, since equation (8) will always satisfy Condition 1, the fiducial density $f(\mu \mid \sigma^2, x)$ can always be determined by Principle 1. In particular, as the GPD function is neutral, and the condition in equation (4) will be satisfied, the fiducial density in question is derived under this principle by applying the strong fiducial argument. As a result, it can be easily shown that this fiducial density is defined by

$$\mu \mid \sigma^2, x \sim N(\bar{x}, \sigma^2/n)$$

On the other hand, if $\mu$ is known, a sufficient statistic for $\sigma^2$ is $\hat{\sigma}^2 = (1/n) \sum_{i=1}^{n} (x_i - \mu)^2$, which will be assumed to be $Q(x)$. Based on this assumption, equation (1) can be expressed as

$$\hat{\sigma}^2 = \varphi(\Gamma, \sigma^2) = (\sigma^2/n)\,\Gamma$$

where $\Gamma \sim \chi^2_n$. Under the assumption of no or very little pre-data knowledge about $\sigma^2$, it is quite natural to specify the GPD function for $\sigma^2$ as follows: $\omega_G(\sigma^2) = b$ if $\sigma^2 \geq 0$ and $0$ otherwise, where $b > 0$. Furthermore, we can see that Principle 1 will again be always applicable, and as the GPD function is neutral and the condition in equation (4) will be satisfied, the fiducial density $f(\sigma^2 \mid \mu, x)$ is derived under this principle by again calling on the strong fiducial argument. As a result, it can be easily shown that this fiducial density is a scaled inverse $\chi^2$ density function with $n$ degrees of freedom and scaling parameter equal to $\hat{\sigma}^2$.

Finally, by using the analytical method outlined in Section 4, it can be easily established that the conditional density functions $f(\mu \mid \sigma^2, x)$ and $f(\sigma^2 \mid \mu, x)$ that have just been defined determine a joint fiducial density for $\mu$ and $\sigma^2$, and by integrating over this joint density function, it can be deduced that the marginal fiducial density for $\mu$ is defined by

$$\mu \mid x \sim t_{n-1}(\bar{x}, s/\sqrt{n}) \qquad (9)$$

where $s$ is the sample standard deviation, i.e. it is the well-used non-standardised Student $t$ density function with $n - 1$ degrees of freedom, location parameter equal to $\bar{x}$ and scaling parameter equal to $s/\sqrt{n}$.

The full conditional fiducial densities for many other problems of inference are naturally obtained in a similar way, i.e. under Principle 1, with a neutral GPD function and applying the strong fiducial argument. For example, the full conditional fiducial densities that were put forward in all the applications of subjective fiducial inference that were discussed in Bowater (2018a) can be derived either exactly or approximately under the same assumptions.

Let us now turn to the issue of how to interpret the joint fiducial density functions $f(\theta_1, \theta_2, \ldots, \theta_k \mid x)$ that can be derived under these assumptions in terms of the framework of generalised subjective probability, i.e. the definition of probability outlined in Bowater and Guzmán (2018b).
In accordance with what was explained back in Section 2, tocomplete the definition of any fiducial or posterior distribution, within this framework,we require both the distribution function of the variables concerned, and an assessment ofthe external strength of this function relative to other distribution functions of interest.With regard to the main example of the present section, a detailed evaluation of theexternal strength of the fiducial distribution function of µ given σ , i.e. F ( µ | σ , x ), wasprovided in Bowater and Guzm´an (2018b). In particular, it was shown how it can beargued that if the compound events R ( λ ) in the reference set R (using the notation of this18arlier paper) are made up of the outcomes of a well-understood physical experiment, e.g.the positions of a wheel after it has been spun, then, for any resolution λ ∈ [0 . , . F ( µ | σ , x ) should be judged asbeing at a level that is close to the highest attainable level. On the basis of the argumentspresented in Bowater (2018a) and Bowater and Guzm´an (2018b), the same conclusioncan be reached about the relative external strength of the fiducial distribution functionof σ given µ , i.e. F ( σ | µ, x ).Since the joint fiducial distribution function of µ and σ is fully defined by two distri-bution functions, namely F ( µ | σ , x ) and F ( σ | µ, x ), that, under the assumptions thathave been made about the reference set R and the resolution λ , can both be argued asbeing externally very strong, then under the same assumptions, it can be argued thatthis joint distribution function should also be regarded as being externally very strong.In loose terms, this means that the joint distribution of µ and σ in question shouldbe regarded as being close in nature to the kind of probability distribution that wouldbe placed over the outcomes of the physical experiment on which the reference set R isbased. 
By generalising the same line of reasoning (see Bowater 2018a for clarification), similar conclusions can be reached about the relative external strengths of the joint fiducial distribution functions $F(\theta_1, \theta_2, \ldots, \theta_k \mid x)$ that can be derived for other problems that satisfy the criteria of the cases that have been considered in the present section, e.g. the problems discussed in Bowater (2018a).
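As a numerical check on the normal example of this section, the following is a minimal sketch of a Gibbs sampler that alternates between the two full conditional fiducial densities given above, i.e. the normal density for $\mu$ given $\sigma^2$ and the scaled inverse $\chi^2$ density for $\sigma^2$ given $\mu$. The function name `gibbs_normal`, the starting value and the burn-in length are illustrative assumptions and are not taken from the paper. If the marginal density in equation (9) holds, the sampled values of $\mu$ should have mean close to $\bar{x}$ and variance close to $(s^2/n)(n-1)/(n-3)$.

```python
import math
import random

def gibbs_normal(x, n_iter=20000, burn_in=500, seed=0):
    """Alternate draws from mu | sigma^2, x and sigma^2 | mu, x (illustrative sketch)."""
    rng = random.Random(seed)
    n = len(x)
    xbar = sum(x) / n
    ss = sum((xi - xbar) ** 2 for xi in x)   # equals (n - 1) * s^2
    sigma2 = ss / (n - 1)                    # arbitrary starting value (assumption)
    mus = []
    for it in range(n_iter + burn_in):
        # mu | sigma^2, x ~ N(xbar, sigma^2 / n)
        mu = rng.gauss(xbar, math.sqrt(sigma2 / n))
        # sigma^2 | mu, x is scaled inverse chi-squared with n degrees of freedom
        # and scale sigma_hat^2, i.e. n * sigma_hat^2 / chi2_n, where
        # n * sigma_hat^2 = sum_i (x_i - mu)^2 = ss + n * (mu - xbar)^2
        n_sigma_hat2 = ss + n * (mu - xbar) ** 2
        chi2_n = rng.gammavariate(n / 2.0, 2.0)   # chi-squared_n as Gamma(n/2, scale 2)
        sigma2 = n_sigma_hat2 / chi2_n
        if it >= burn_in:
            mus.append(mu)
    return mus
```

Working with the identity $\sum_i (x_i - \mu)^2 = (n-1)s^2 + n(\mu - \bar{x})^2$ avoids looping over the data in every iteration.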
6. Examples with discrete data and little pre-data knowledge
In this section, organic fiducial inference will be applied to examples in which the data $x$ are discrete, and where nothing or very little was known about the model parameters before the data were observed.

6.1. Inference about a binomial proportion

First, let us consider the problem of making inferences about the population proportion of successes $p$ on the basis of observing $x$ successes in $n$ trials, where the probability of observing any given number of successes $y$ follows the usual definition of the binomial mass function as specified by:

$g(y \mid p) = \binom{n}{y} p^y (1 - p)^{n - y}$ for $y = 0, 1, \ldots, n$

As the value $x$ is clearly a sufficient statistic for $p$, it will be assumed to be the fiducial statistic $Q(x)$. Based on this assumption, equation (1) can be expressed as

$x = \varphi(\Gamma, p) = \min\{z : \Gamma < \sum_{y=0}^{z} g(y \mid p)\}$ (10)

where $\Gamma \sim U(0, 1)$. Under the assumption of no or very little pre-data knowledge about $p$, it is again quite natural that the GPD function has the following form: $\omega_G(p) = a$ if $0 \leq p \leq 1$, where a >
0. This time, though, since equation (10) will never satisfy Condition 1 for any choice of the GPD function and for any value of $x$, we can never apply Principle 1. On the other hand, this equation together with the specified GPD function will satisfy Condition 2(a) for all possible values of $x$, and since Condition 2(b) will also hold for all $x$, Principle 2 can always be applied. Furthermore, as the condition in equation (4) will also be satisfied, inferences will be made about $p$ under this principle by using the strong fiducial argument.

As a result, by placing the present case in the context of the general definition of the fiducial density $f(\theta_j \mid \theta_{-j}, x)$ given in equations (5) and (6), we obtain the following expression for the fiducial density $f(p \mid x)$:

$f(p \mid x) = \int_0^1 \omega^*(p \mid \gamma)\, d\gamma$ (11)

where

$\omega^*(p \mid \gamma) = c(\gamma)\,\omega_L(p)$ if $p \in p(\gamma)$, and $0$ otherwise (12)

Of course, to be able to complete this definition, an LPD function $\omega_L(p)$ needs to be specified. Observe that any choice for this function that satisfies the very loose requirements of Definition 4 will lead to a fiducial density $f(p \mid x)$ that is valid for any $n \geq 1$ and $x = 0, 1, \ldots, n$. Nevertheless, to provide two practical examples, we will choose to highlight the two LPD functions that are defined by

$\omega_L(p) = b$ if $0 \leq p \leq 1$ (13)

where $b > 0$, and by

$\omega_L(p) = 1/\sqrt{p(1 - p)}$ if $0 \leq p \leq 1$ (14)

In general, it is not straightforward to obtain a closed-form expression for the fiducial density $f(p \mid x)$ for any given value of $x$. However, drawing random values from this density function will generally be fairly straightforward.

In this respect, the histograms in Figures 1(a) and 1(b) were each formed on the basis of one million independent random values drawn from the density function $f(p \mid x)$, with $n$ being equal to 10 and the observed $x$ being equal to 1. The results in Figure 1(a) depend on choosing the LPD function to be the one given in equation (13), while the results in Figure 1(b) depend on this function being as defined in equation (14).
The dashed-line curves in these figures represent the posterior density for $p$ that corresponds to the prior density for $p$ being uniform on $(0, 1)$, while the solid-line curves represent the posterior density for $p$ that corresponds to the prior density for $p$ being the Jeffreys prior for the case in question, i.e. the density function that is proportional to the function for $p$ in equation (14).

It can be seen from these figures that, although the posterior density for $p$ is highly sensitive to which of the two prior densities is used, the fiducial density for $p$ barely moves depending on whether the LPD function is proportional to the uniform prior, or whether it is proportional to the Jeffreys prior for this case. Moreover, we can observe that the two fiducial densities for $p$ in question both closely approximate the posterior density for $p$ that is based on this Jeffreys prior.

[Figure 1: Samples from the organic fiducial density of a binomial proportion. Panels (a) and (b); horizontal axis $p$, vertical axis Density.]

Similar to the previous section, let us now turn to the issue of how to interpret the fiducial density $f(p \mid x)$ in terms of the framework of generalised subjective probability. It will be assumed that the reference set $R$ and the range of the resolution $\lambda$ are as defined in this earlier section.

To begin with, on the basis of the lines of reasoning presented in Bowater (2018a) and Bowater and Guzmán (2018b), it can be argued that the relative external strength of the distribution function that corresponds to the post-data density of the primary r.v. $\Gamma$, i.e. the uniform density function $\pi_1(\gamma)$, should be judged as being at a level that is close to the highest attainable level, which loosely means that arguably this density should be an extremely good representation of our post-data beliefs about $\Gamma$. On the other hand, given that it is being assumed that we have no or very little pre-data knowledge about $p$, it will not be easy to find an LPD function $\omega_L(p)$ that adequately represents our pre-data beliefs about $p$.
Therefore, it would be expected that similar to any prior distributionfunction that could be chosen for p in this type of situation, the distribution functionsthat correspond to the conditional densities ω ∗ ( p | γ ) defined in equation (12) would be22udged as being externally quite weak.Nevertheless, since these latter distribution functions are defined over intervals for p that will be generally much shorter than the interval for p over which the prior distributionfunction for p must be defined, i.e. the interval (0 , p . Moreover, since in cases where n is not very small and x is notequal to 0 or n , the role of the LPD function ω L ( p ) could be described as being heavilysubordinate to the role of the density π ( γ ) in the construction of the joint density of p and γ in equation (11), it can be argued that, in these cases, the distribution functionthat corresponds to the fiducial density f ( p | x ) should be regarded as being externallyvery strong. In loose terms, this means that the fiducial probability of p lying in anygiven interval of moderate width should be regarded as being close in nature to theprobabilities of the events contained in the reference set R .By contrast, since the posterior density for p is effectively obtained through Bayes’theorem by simply reweighting the prior density for p , that is, by normalising the den-sity function that results from multiplying this prior density function by the likelihoodfunction, it would seem difficult to use a form of a reasoning that is compatible with theBayesian paradigm, to argue that the relative external strength of the posterior distri-bution function for p should not be heavily dependent on the relative external strengthof the prior distribution function for p , which as already mentioned would be expectedto be externally quite weak. 
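To make the construction in equations (10) to (12) concrete, the following is a minimal sketch of one way of drawing random values from $f(p \mid x)$. It assumes that $\gamma$ is drawn from a uniform density on $(0, 1)$, that the set $p(\gamma)$ is the interval of values of $p$ satisfying $G(x - 1 \mid p) \leq \gamma < G(x \mid p)$, where $G$ denotes the binomial distribution function, and that $p$ is then drawn from the LPD function restricted to this interval. The function names and the use of bisection to find the end points are illustrative choices rather than the method used to produce Figure 1.

```python
import math
import random

def binom_cdf(z, n, p):
    """P(Y <= z) for Y ~ Binomial(n, p); zero when z < 0."""
    if z < 0:
        return 0.0
    return sum(math.comb(n, y) * p**y * (1 - p)**(n - y) for y in range(z + 1))

def solve_p(z, n, gamma, iters=50):
    """Bisection for the p at which binom_cdf(z, n, p) = gamma (CDF decreases in p)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if binom_cdf(z, n, mid) > gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def draw_p(x, n, rng, lpd="uniform"):
    """One draw of p: gamma ~ U(0, 1), then p from the LPD function on p(gamma)."""
    gamma = rng.random()
    p_lo = 0.0 if x == 0 else solve_p(x - 1, n, gamma)
    p_hi = 1.0 if x == n else solve_p(x, n, gamma)
    if lpd == "uniform":                 # LPD function of equation (13)
        return rng.uniform(p_lo, p_hi)
    # LPD function of equation (14): its integral is 2 * arcsin(sqrt(p)),
    # so p is sampled by drawing uniformly on the arcsin(sqrt(p)) scale
    a = math.asin(math.sqrt(p_lo))
    b = math.asin(math.sqrt(p_hi))
    return math.sin(rng.uniform(a, b)) ** 2
```

For example, `[draw_p(1, 10, random.Random(1)) for _ in range(5)]` is not what is wanted, since it recreates the generator on every call; a single `rng = random.Random(1)` should be created once and passed to each call of `draw_p`.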
We will now consider the problem of making inferences about an unknown event rate $\tau$ on the basis of observing $x$ events over a time period of length $t$, where the probability of observing any given number of events $y$ over a period of this length follows the usual definition of the Poisson mass function as specified by:

$g(y \mid \tau) = (\tau^y / y!) \exp(-\tau)$ for $y = 0, 1, 2, \ldots$

Again, since the data set to be analysed consists of a single value $x$, this value will be assumed to be the fiducial statistic $Q(x)$. Based on this assumption, equation (1) can be expressed in a way that is similar to equation (10), i.e.

$x = \varphi(\Gamma, \tau) = \min\{z : \Gamma < \sum_{y=0}^{z} g(y \mid \tau)\}$ (15)

where $\Gamma \sim U(0, 1)$. Under the assumption of no or very little pre-data knowledge about $\tau$, the GPD function will again be specified in the following way: $\omega_G(\tau) = a$ for $\tau > 0$, where a >
0. Similar also to the previous problem, the nature of equation (15) means that Principle 1 can never be applied for any choice of the GPD function, but the particular choice that has been made for this latter function means that Principle 2 can always be applied, and in particular, inferences will be made about $\tau$ under this principle by using the strong fiducial argument.

As a result, the expressions that define the fiducial density $f(\tau \mid x)$ are identical to the expressions in equations (11) and (12) except that the proportion $p$ is replaced by the event rate $\tau$. Although any choice for the LPD function $\omega_L(\tau)$ that conforms to Definition 4 will imply that this fiducial density is valid for any $x = 0, 1, 2, \ldots$, let us choose to highlight the consequences of using the two LPD functions that are defined by

$\omega_L(\tau) = b$ if $\tau > 0$ (16)

where $b > 0$, and by

$\omega_L(\tau) = 1/\sqrt{\tau}$ if $\tau > 0$ (17)

Figures 2(a) and 2(b) show histograms of random values drawn from the density function $f(\tau \mid x)$, under the assumption that two events were observed over a given period of length $t$, i.e. $x = 2$, with the LPD functions that underlie the results in these two figures being defined by equation (16) and by equation (17) respectively. In these figures, the dashed-line curves represent the posterior density for $\tau$ that corresponds to the prior density for $\tau$ being the function for $\tau$ in equation (16), while the solid-line curves represent this posterior density when the prior density for $\tau$ is the function for $\tau$ in equation (17), i.e. the Jeffreys prior for the case in question. Observe that the use of these two prior densities for $\tau$ is controversial as they are both improper.

[Figure 2: Samples from the organic fiducial density of a Poisson event rate. Panels (a) and (b); horizontal axis $\tau$, vertical axis Density.]

It is evident that there is almost no difference between the two histograms in Figures 2(a) and 2(b), and as was the case for the histograms in Figures 1(a) and 1(b), they are both closely approximated by the posterior density that is based on the Jeffreys prior for the problem of interest.
Furthermore, using a very similar line of reasoning to the one that in Section 6.1 was used to argue that, under certain assumptions, the distribution function that corresponds to the fiducial density $f(p \mid x)$ should be regarded as being externally very strong, it can also be argued, under the same assumptions about the set $R$ and the resolution $\lambda$, that if $x > 0$, the distribution function that corresponds to the fiducial density of current interest, i.e. $f(\tau \mid x)$, should also be regarded as being externally very strong.

To conclude this section, let us consider the problem of making inferences about the population proportions $p = (p_1, p_2, \ldots, p_{k+1})'$ of all the $k + 1$ outcomes of an experiment, where $p_i$ is the proportion of times outcome $i$ is generated by the experiment, based on observing any given sample of counts $x = (x_1, x_2, \ldots, x_{k+1})'$ of these outcomes, where $x_i$ is the number of times outcome $i$ is observed, and the probability of observing this sample follows the usual definition of the multinomial mass function as specified by:

$g(x \mid p) = \dfrac{n!}{x_1!\, x_2! \cdots x_{k+1}!} \prod_{i=1}^{k+1} p_i^{x_i}$ for $x_1, x_2, \ldots, x_{k+1} \in \mathbb{Z}_{\geq 0}$, where $n = \sum_{i=1}^{k+1} x_i$

Given that $p_{k+1} = 1 - \sum_{i=1}^{k} p_i$, let us define the complete set of model parameters as being the set $\{p_1, p_2, \ldots, p_k\}$. Now, if it is assumed that all the proportions in this set are known except $p_j$, a set of sufficient statistics for $p_j$ would be $\{x_j, x_j + x_{k+1}\}$. However, $x_j + x_{k+1}$ is an ancillary statistic, and therefore according to Definition 1, it can be assumed that $x_j$ is the fiducial statistic $Q(x)$. Under this assumption, and taking into account that the quantity $p_j + p_{k+1}$ is known, it is convenient to express the definition of the conditional fiducial density $f(p_j \mid p_{-j}, x)$, where $p_{-j} = \{p_1, \ldots, p_{j-1}, p_{j+1}, \ldots, p_k\}$, in terms of the fiducial density $f(r_j \mid p_{-j}, x)$, where $r_j = p_j / (p_j + p_{k+1})$.
This is becausethe definition of this latter fiducial density is equivalent to the definition of the fiducialdensity f ( p | x ) in equations (11) and (12) except that p , x and n in this earlier definitionare substituted by r j , x j and x j + x k +1 respectively.In this way, the set of full conditional fiducial densities for this problem can be deter-mined, i.e. the set f ( p j | p − j , x ) for j = 1 , , . . . , k (18)26n the basis of having done this, the histograms in Figures 3(a)-(d) summarise asample of three million realisations of all the parameters of a multinomial distributionfunction with k = 4 that was obtained by excluding an initial burn-in sample of 500 ofsuch random vectors from one run of a Gibbs sampler applied to this set of full conditionaldensities. The sample of counts x was (0 , , , , (cid:48) , and to complete the definition of theseconditional fiducial densities, the LPD functions concerned, i.e. { ω L ( p j ) : j = 1 , , , } were all chosen to have the form of the LPD function given in equation (13). The Gibbssampler in question was also run various times more from different starting points, andthe results provided no evidence to suggest that the sampler was failing to converge to aunique stationary density function. Therefore, it would seem reasonably safe to assumethat the full conditional densities in equation (18) determine a joint fiducial density forthe parameters concerned, and we have succeeded in generating a series of random vectorsfrom this density function.The solid-line curves in Figures 3(a)-(d) represent the marginal posterior densities foreach of the parameters p , p , p and p respectively when the joint prior density for theseparameters is the Jeffreys prior for the case in question, i.e. a symmetric Dirichlet densitywith concentration parameter α equal to 0.5. 
On the other hand, the long-dashed and short-dashed curves in these figures represent these marginal posterior densities when the joint prior density concerned is, respectively, a uniform density and the Perks prior density, i.e. a symmetric Dirichlet density with $\alpha$ equal to $1/(k + 1)$. For any given value of $k$, the use of the uniform prior density was advocated, for example, by Tuyl (2017), while the use of the Perks prior density was advocated, for example, by Berger et al. (2015).

It can be seen that the histograms for the proportions $p_2$, $p_3$ and $p_4$ in Figures 3(b)-(d) are closely approximated by the marginal posterior densities corresponding to each of these parameters when the joint prior density is the Jeffreys prior for this case, whereas the histogram for the proportion $p_1$ in Figure 3(a) is only loosely approximated by the marginal posterior density for $p_1$ derived on the basis of this prior density. Also, the covariances between all the proportions $p = (p_1, \ldots, p_5)'$, except those involving the parameter $p_1$, were found to be very similar between the joint fiducial density and the joint posterior density in question. Furthermore, additional simulations showed that the joint fiducial density in this example was not very sensitive to the choice of the LPD functions concerned, i.e. $\{\omega_L(p_j) : j = 1, 2, 3, 4\}$.

[Figure 3: A sample from a joint organic fiducial density of multinomial proportions obtained using the Gibbs sampler. Panels (a)-(d); horizontal axes $p_1$, $p_2$, $p_3$ and $p_4$ respectively, vertical axes Density.]

Before proceeding, let us assume that the reference set $R$ and the resolution $\lambda$ are defined as in previous sections.
Now, given the natural relationship that exists between any of the full conditional densities in equation (18) and the fiducial density for a binomial proportion defined in equations (11) and (12), a similar line of reasoning to the one outlined in Section 6.1 can be used to argue that the distribution functions that correspond to the densities in equation (18) should all be regarded as being externally very strong provided that $x_j > 0$, $x_{k+1} > 0$ and $x_j + x_{k+1}$ is not very small for all values of $j$. The first of these conditions of course does not hold for $j = 1$ in the example that has been highlighted, but this example was not chosen to represent the most ideal scenario. Furthermore, since the joint fiducial distribution function of all the proportions $p$ is fully defined by the full conditional densities in equation (18), a similar line of reasoning to the one mentioned in Section 5 can be used to argue that this joint distribution function should also be regarded as being externally very strong provided that the aforementioned conditions on the counts $x$ hold, and the total count $n$ is not very small relative to the number of proportions $k + 1$.

Finally, it needs to be taken into account that the joint fiducial distribution function in question is potentially sensitive to which of the population proportions is defined to be the proportion $p_{k+1}$. However, extensive simulations that were conducted showed that the effect of this choice of parameterisation was generally negligible, and was only found to be slightly more than negligible in certain cases where the total count $n$ was less than the number of proportions $k + 1$. Moreover, this issue can be easily resolved by always applying the criterion of designating the proportion $p_{k+1}$ so that its corresponding count $x_{k+1}$ is the highest or equal highest out of all the counts $x$.
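The Gibbs sampler over the full conditional densities in equation (18), combined with the criterion just stated of letting $p_{k+1}$ correspond to the largest count, can be sketched as follows. This is a minimal illustration that assumes the LPD function of equation (13) for every ratio $r_j$ and uses bisection to find the interval of values of $r_j$ that are compatible with a uniformly drawn $\gamma$. The function names, the starting point and the counts used in any call are hypothetical and are not the settings behind Figure 3.

```python
import math
import random

def binom_cdf(z, n, p):
    """P(Y <= z) for Y ~ Binomial(n, p); zero when z < 0."""
    if z < 0:
        return 0.0
    return sum(math.comb(n, y) * p**y * (1 - p)**(n - y) for y in range(z + 1))

def solve_p(z, n, gamma, iters=40):
    """Bisection for the p at which binom_cdf(z, n, p) = gamma (CDF decreases in p)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if binom_cdf(z, n, mid) > gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def draw_ratio(x, n, rng):
    """Draw r_j as a binomial proportion with x successes in n trials, flat LPD."""
    gamma = rng.random()
    lo = 0.0 if x == 0 else solve_p(x - 1, n, gamma)
    hi = 1.0 if x == n else solve_p(x, n, gamma)
    return rng.uniform(lo, hi)

def gibbs_multinomial(counts, n_iter=2000, burn_in=200, seed=0):
    """Gibbs sampler over p_1..p_k; counts[-1] must be the largest count."""
    assert counts[-1] == max(counts) and counts[-1] > 0
    rng = random.Random(seed)
    k = len(counts) - 1
    p = [1.0 / (k + 1)] * k                  # starting point (assumption)
    draws = []
    for it in range(n_iter + burn_in):
        for j in range(k):
            p_last = max(0.0, 1.0 - sum(p))  # current value of p_{k+1}
            total = p[j] + p_last            # known when the other p_i are fixed
            r = draw_ratio(counts[j], counts[j] + counts[k], rng)
            p[j] = r * total                 # p_j = r_j * (p_j + p_{k+1})
        if it >= burn_in:
            draws.append(p + [max(0.0, 1.0 - sum(p))])
    return draws
```

Requiring the last count to be the largest means that $x_j + x_{k+1}$ is positive for every $j$, so each binomial-type conditional density is defined.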
As the count $x_{k+1}$ is always one of the two counts that are used to form each of the full conditional fiducial densities in equation (18), this criterion is justifiable from a statistical viewpoint, and it also guarantees that the case is avoided where the count $x_{k+1} = 0$ and at least one of the remaining counts equals zero, which would imply that at least one of these conditional fiducial densities is undefined.

7. Examples with restricted parameter spaces

Let us now turn our attention to examples of the application of organic fiducial inference in which it was known, before the data were observed, that values in a given subset of the natural space of the model parameters were impossible, but apart from this, nothing or very little was known about these parameters. In relation to this issue, the importance of the need to make inferences about a normal mean $\mu$ when there is a lower bound on $\mu$, and about a Poisson rate parameter $\tau$ when there is a positive lower bound on $\tau$, has been underlined by practical examples from the field of quantum physics that are described, for example, in Mandelkern (2002). These examples motivate what will be examined in the present section.

With regard to the example considered in Section 5, let us change what is assumed to have been known about the mean $\mu$ before the data were observed to the assumption that, for any given value of the variance $\sigma^2$, it was known that $\mu > \mu_0$, where $\mu_0$ is a given finite constant, but apart from this, nothing or very little was known about $\mu$. In this situation, it is quite natural to specify the GPD function for $\mu$ as follows: $\omega_G(\mu) = a$ if $\mu > \mu_0$, and $\omega_G(\mu) = 0$ if $\mu \leq \mu_0$, where a >
0. Although, as was the case in Section 5, this GPD function is neutral, this time the condition in equation (4) will never hold, and therefore the fiducial density $f(\mu \mid \sigma^2, x)$ is derived under Principle 1 by using the moderate rather than the strong fiducial argument. The consequence of this in terms of the definition of the marginal fiducial density for $\mu$ is that this density function becomes simply the marginal density function for $\mu$ defined in equation (9) conditioned to lie in the interval $(\mu_0, \infty)$. However, it is of interest to examine the potential effect on the relative external strength of this marginal density function due to the use of the moderate rather than the strong fiducial argument in constructing the conditional density $f(\mu \mid \sigma^2, x)$.

In this regard, let us remember that in the definition of the function $\varphi(\Gamma, \mu)$ in equation (8) it was assumed that the pre-data density function of the primary r.v. $\Gamma$, i.e. the function $\pi_0(\gamma)$, is a standard normal density function. Now, on observing the sample mean $\bar{x}$, we immediately know that the value $\gamma$ generated in step 2 of the algorithm in Assumption 1 must be less than the value $\gamma_0 = (\sqrt{n}/\sigma)(\bar{x} - \mu_0)$.

The moderate fiducial argument in this situation, i.e. the argument that the relative height of the post-data density function of $\Gamma$, i.e. the function $\pi_1(\gamma)$, in the interval $(-\infty, \gamma_0)$ should be equal to the relative height of $\pi_0(\gamma)$ over this interval, is similar (but not identical) to the Bayesian argument that the relative height of a density function for a fixed parameter $\theta$ should not be affected by learning that a given subset of values for $\theta$ are impossible, apart from it of course becoming equal to zero over this subset.
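The effect of the moderate fiducial argument in this example can be illustrated by a short sketch: $\gamma$ is drawn from a standard normal density truncated to lie below the bound $(\sqrt{n}/\sigma)(\bar{x} - \mu_0)$, and is then mapped to $\mu = \bar{x} - (\sigma/\sqrt{n})\gamma$, which assumes that equation (8) has the form $\bar{x} = \mu + (\sigma/\sqrt{n})\Gamma$. The function name and the inverse-CDF sampling scheme are illustrative choices, and the sketch is not numerically robust when $\bar{x}$ lies many standard errors below $\mu_0$.

```python
import math
import random
from statistics import NormalDist

def draw_mu_restricted(xbar, sigma, n, mu0, rng):
    """Draw mu given sigma when it is known that mu > mu0 (illustrative sketch)."""
    std = NormalDist()                       # standard normal pre-data density of gamma
    scale = sigma / math.sqrt(n)
    gamma_bound = (xbar - mu0) / scale       # gamma must lie below this value
    # Inverse-CDF draw from the standard normal truncated to (-inf, gamma_bound)
    u = std.cdf(gamma_bound) * (1.0 - rng.random())
    u = min(max(u, 1e-300), 1.0 - 1e-16)     # keep u strictly inside (0, 1)
    gamma = std.inv_cdf(u)
    return xbar - scale * gamma              # maps gamma < gamma_bound to mu > mu0
```

Drawing $\mu$ from the unrestricted density $N(\bar{x}, \sigma^2/n)$ and discarding values that are not greater than $\mu_0$ would give the same distribution, at the cost of wasted draws when $\bar{x}$ is well below $\mu_0$.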
Although this type of Bayesian argument has been criticised as being overly simplified due to the fact that it does not take into account the manner in which we learn that values in the particular subset are impossible, see for example Shafer (1985), it is an argument that is considered as being almost universally acceptable. For this reason, under the same assumptions about the reference set $R$ and the resolution $\lambda$ as made in previous sections, it can be argued that the density function $\pi_1(\gamma)$, i.e. a standard normal density for $\gamma$ truncated to the interval $(-\infty, \gamma_0)$, in the context of being a representation of what is believed about $\gamma$ after the data are observed, should be regarded as being externally very strong. As a result, under the same assumptions, the case can be made that the joint fiducial density of $\mu$ and $\sigma^2$ in the present example, and the marginal densities that can be derived from this joint density, should also be regarded as being externally very strong.

Clearly the same type of reasoning can be applied to many other problems of inference over restricted parameter spaces that are similar to the problem that has just been discussed.

Returning to the problem of making inferences about a Poisson rate parameter $\tau$ that was discussed in Section 6.2, let us now assume that before the data were observed, it was known that $\tau > \tau_0$, where $\tau_0$ is a given positive constant, but apart from this, nothing or very little was known about $\tau$. Again, as was the case in Section 6.2, it is clear that Principle 1 cannot be applied to determine the fiducial density of $\tau$.

Observe that, in this new situation, the set $H_x$ as defined in Condition 1, where the parameter $\theta_j$ in this definition is $\tau$, is the set $\{\tau : \tau > \tau_0\}$, and that it is natural to specify the GPD function $\omega_G(\tau)$ so that Condition 2(b) is satisfied.
However, in contrast to the example outlined in Section 6.2, the definition of the function $\varphi(\Gamma, \tau)$ given in equation (15) implies that the set $G_x$ as defined in Condition 1 does not satisfy Condition 2(a), and therefore we have the problem of 'spillage' that was referred to at the end of Section 3.4.

The first step of a very straightforward way of trying to circumvent this difficulty is to make inferences about $\tau$ in an artificial scenario, namely the scenario considered in Section 6.2. In doing this, it will be assumed that the LPD function is chosen to represent as best as possible a general situation where nothing or very little was known about the parameter $\tau$ over the interval $(0, \infty)$ before the data were observed, e.g. the LPD function given in equation (16) or equation (17). Having determined a fiducial density for $\tau$ over the interval $(0, \infty)$ by using this method, we then simply condition this density to lie in the interval $(\tau_0, \infty)$ to thereby obtain a fiducial density for $\tau$ that corresponds to the problem at hand.

Although in applying this strategy we do not directly use any of the three types of fiducial argument outlined in Section 3.2, if the same strategy was applied to the example discussed in Section 7.1, which of course would not require the use of an LPD function, then the fiducial density of $\mu$ conditional on $\sigma^2$ being known, i.e. the density $f(\mu \mid \sigma^2, x)$, would be the same as is obtained by using the approach put forward in this previous section, i.e. an approach that is based on the moderate fiducial argument. On the other hand, the strategy has the clear disadvantage that it depends on expressing pre-data knowledge about a parameter of interest via the GPD function, and possibly also via the LPD function, with regard to an artificial scenario rather than the scenario that is actually under consideration.
Nevertheless, under the same assumptions about the reference set $R$ and the resolution $\lambda$ as made in previous sections, it can still be argued that if, in the present example, the observed count $x$ is greater than zero and is not very small relative to the threshold $\tau_0$, then the fiducial density for $\tau$ that results from using this strategy should be regarded as being externally quite strong.

To give a good practical example of the application of the strategy that has just been discussed, let us suppose that the threshold $\tau_0$, which will be regarded as the event rate for the background noise over a time length $t$, needs to be estimated on the basis of a Poisson count $x_0$ collected over a period of length $\alpha$ times $t$ when only background noise could be present, where $\alpha$ is a given value. Since it will be assumed that $\tau_0$ can take any positive value, the fiducial density of $\tau_0$ formed on the basis of the data $x_0$, i.e. the density $f(\tau_0 \mid x_0)$, is defined in the same way as the fiducial density $f(\tau \mid x)$ was defined in Section 6.2. Taking into account also a Poisson count $x$ collected over a period of length $t$ when a signal should be present, we will then be interested in making inferences about the event rate $\tau = \tau_0 + \tau_1$ over this time period, which will be regarded as the event rate for background noise plus the signal. Due to the fact that $\tau_1$ will be assumed to be a positive event rate, namely the event rate for the signal only, the parameter $\tau$ must be greater than $\tau_0$, and so it will be assumed that the fiducial density of $\tau$ formed on the basis of the data $x$ and conditioned on $\tau$ being greater than $\tau_0$, i.e. the density $f(\tau \mid \tau_0, x)$, is determined using the method described in the present section.
Given these definitions, the joint fiducial density of $\tau$ and $\tau_0$ can therefore be expressed as

$f(\tau, \tau_0 \mid x, x_0) = f(\tau \mid \tau_0, x)\, f(\tau_0 \mid x_0)$

To illustrate a specific case, Figures 4(a) and 4(b) show histograms of one million independent random values drawn from, respectively, the marginal density of $\tau = \tau_0 + \tau_1$ and the marginal density of $\tau_0$ over this joint fiducial density, assuming that the LPD function that was used to form both of the densities $f(\tau \mid \tau_0, x)$ and $f(\tau_0 \mid x_0)$ was the simple step function given in equation (16) and that $\alpha = 4$, $x_0 = 3$ and $x = 2$. The solid-line and dashed-line curves in Figure 4(a) represent the posterior density of $\tau$ that corresponds, respectively, to the use of the Jeffreys prior for the case when $\tau$ is unrestricted over the interval $(0, \infty)$ and to the use of this prior density with the condition that $\tau > 0.75$, where 0.75 ($= x_0/\alpha$) is clearly the maximum likelihood estimate of $\tau_0$. These curves have been added to this figure only because we know that, under the conditions in question, they closely approximate the fiducial densities for $\tau$ when the LPD function being considered is used. In particular, comparing the lower tails of the histogram and the dashed-line curve in Figure 4(a) highlights the extra uncertainty that is introduced by taking into account the statistical error in the estimation of $\tau_0$.
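The background-plus-signal example of this section can be sketched in code as follows. This is a minimal illustration that makes several assumptions: the count $x_0$ is treated as Poisson with rate $\alpha \tau_0$, so that a draw for $\alpha \tau_0$ is made as in Section 6.2 and then divided by $\alpha$; the flat LPD function of equation (16) is used throughout; and the conditioning of $\tau$ on being greater than $\tau_0$ is carried out by simple rejection. All function names are hypothetical.

```python
import math
import random

def pois_cdf(z, tau):
    """P(Y <= z) for Y ~ Poisson(tau); zero when z < 0."""
    if z < 0:
        return 0.0
    term = math.exp(-tau)
    total = term
    for y in range(1, z + 1):
        term *= tau / y
        total += term
    return total

def solve_tau(z, gamma, iters=60):
    """Bisection for the tau at which pois_cdf(z, tau) = gamma (CDF decreases in tau)."""
    lo, hi = 0.0, 1.0
    while pois_cdf(z, hi) > gamma:           # expand until the root is bracketed
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if pois_cdf(z, mid) > gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def draw_tau(x, rng):
    """Unrestricted draw of a Poisson rate given count x, flat LPD function."""
    gamma = rng.random()
    lo = 0.0 if x == 0 else solve_tau(x - 1, gamma)
    hi = solve_tau(x, gamma)
    return rng.uniform(lo, hi)

def draw_joint(x0, alpha, x, rng):
    """Draw (tau, tau0): tau0 from f(tau0 | x0), then tau conditioned on tau > tau0."""
    tau0 = draw_tau(x0, rng) / alpha         # rate over length t (assumption: x0 has rate alpha * tau0)
    while True:
        tau = draw_tau(x, rng)
        if tau > tau0:                       # rejection step implements the conditioning
            return tau, tau0
```

With $\alpha = 4$, $x_0 = 3$ and $x = 2$, repeated calls such as `draw_joint(3, 4.0, 2, rng)` give pairs from which histograms of $\tau$ and $\tau_0$ could be formed in the manner of Figure 4.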
8. An example with two different GPD functions that are non-neutral
To give a final example of the application of organic fiducial inference, let us again return to the problem of inference considered in Section 5, and let us assume that the GPD function $\omega_G(\mu)$ used to determine the fiducial density of the mean $\mu$ given the variance $\sigma^2$, i.e. the density $f(\mu \mid \sigma^2, x)$, is one of the two step functions defined by

$\omega_G(\mu) = a$ if $\mu > 0$, and $1$ otherwise (19)

and by

$\omega_G(\mu) = a$ if $-b < \mu < b$, and $1$ otherwise (20)

where $a$ is any given constant greater than one, and $b$ is any given positive constant.

[Figure 4: Samples from marginal organic fiducial densities of Poisson event rates. Panels (a) and (b); horizontal axes $\tau_0 + \tau_1$ and $\tau_0$ respectively, vertical axes Density.]

As a way of interpreting either of these two GPD functions, it can be observed that if there is an interval of values $(\gamma_0, \gamma_1)$ for the primary r.v. $\Gamma$ such that $\omega_G(\mu) = 1$ for all $\mu \in \{\mu(\gamma) : \gamma \in (\gamma_0, \gamma_1)\}$, where in keeping with earlier notation $\mu(\gamma)$ is the value of $\mu$ that maps on to the value $\gamma$ given the data $x$, and there is another interval $(\gamma_2, \gamma_3)$ for $\gamma$ such that $\omega_G(\mu) = a$ for all $\mu \in \{\mu(\gamma) : \gamma \in (\gamma_2, \gamma_3)\}$, then the probability of the event $\{\gamma \in (\gamma_2, \gamma_3)\}$ divided by the probability of the event $\{\gamma \in (\gamma_0, \gamma_1)\}$ will be regarded as being $a$ times larger after the data are observed than before step 2 of the algorithm in Assumption 1 was implemented.

Clearly the GPD function in equation (19) can be used to represent the scenario in which nothing or very little was known about $\mu$ before the data were observed, except that it was known that, when the data are observed, positive values of $\mu$ would be regarded as being more likely and negative values of $\mu$ less likely than as required to be able to accept the strong fiducial argument.
On the other hand, if for example $b$ is chosen to be small, the GPD function in equation (20) could be used to represent a scenario where there was little or no pre-data knowledge about $\mu$ except that it was known that, when the data are observed, values of $\mu$ lying in a narrow interval centred at zero, which could be the value of $\mu$ that corresponds to the null effect of a treatment compared to a control, would be regarded as being more likely and values of $\mu$ lying outside of this interval less likely than as assumed by the strong fiducial argument.

On the basis of either of the GPD functions in equations (19) and (20), the fiducial density $f(\mu \mid \sigma^2, x)$ is derived under Principle 1 by applying the weak fiducial argument. In particular, the two forms of this fiducial density that correspond to using these two GPD functions are the same as the two forms of the posterior density for $\mu$ given $\sigma^2$ that result from treating these GPD functions as prior densities for $\mu$ under the Bayesian paradigm. However, there are at least two good reasons why it is better to regard these densities as being fiducial densities backed by the methodology outlined in Section 3.4, rather than posterior densities backed by standard Bayesian theory.

First, if the GPD functions in equations (19) and (20) are treated as being prior densities then these density functions must be improper. This is also one of a number of criticisms that could be applied to the interpretation of the fiducial density $f(\mu \mid \sigma^2, x)$ derived in Section 5 as being a posterior density for $\mu$, as the required prior density for $\mu$ in this case would be a flat improper density for $\mu$ over the interval $(-\infty, \infty)$.
More specifically, though, it would seem particularly awkward to try to justify either of the improper prior densities for $\mu$ that correspond to the GPD functions being presently considered as being a natural approximation to a proper prior density, or some kind of natural limit of allowing a hyperparameter of a proper prior density to tend to infinity. This is due to the discontinuity that occurs at zero for the function in equation (19), and the discontinuities that occur at $-b$ and $b$ for the function in equation (20).

The second reason why it is better to use fiducial rather than Bayesian reasoning in the cases under consideration is that the fiducial densities $f(\mu \mid \sigma, x)$ that correspond to the GPD functions in equations (19) and (20) can be regarded as being based on a set of conditional versions of these densities derived using the moderate fiducial argument. In particular, under the GPD function in equation (19), the fiducial density for $\mu$ when $\mu$ is conditioned to lie in one of the intervals $(-\infty, 0)$ or $(0, \infty)$ would be derived using the moderate fiducial argument, while, under the GPD function in equation (20), the fiducial density for $\mu$ when $\mu$ is conditioned to lie in one of the subsets $(-b, b)$ or $(-\infty, -b) \cup (b, \infty)$ would also be derived using this type of fiducial argument. Taking into account the intuitive appeal of the moderate fiducial argument that was discussed in Section 7.1, the case can be made that the partial dependence on this argument that has been identified should mean that, under the same assumptions about the reference set $R$ and the resolution $\lambda$ as made in previous sections, the relative external strength of the fiducial density $f(\mu \mid \sigma, x)$, when $\mu$ is unrestricted over the whole of the real line, and when either of the GPD functions in question is used, should be regarded as being reasonably high in many situations where the use of the GPD function concerned is considered to be adequate.

The same line of reasoning can also be applied in assessing the relative external strength of the fiducial density of any given parameter $\theta_j$ of any given sampling model $g(x \mid \theta_1, \theta_2, \ldots, \theta_k)$ conditional on all other parameters, provided that such a density for $\theta_j$ can be derived under Principle 1, and the GPD function for $\theta_j$ is a step function with at least two steps that have distinct non-zero heights. Furthermore, if the GPD function for $\theta_j$ is allowed to take any form that simply satisfies the loose requirements of Definition 3, then despite this line of reasoning being in general no longer applicable, the capacity to express pre-data knowledge about $\theta_j$ in a way that is distinct from placing a prior density over $\theta_j$ under the Bayesian paradigm will be generally retained.

On the other hand, if Condition 1 is not satisfied then, since Conditions 2(a) and 2(b) can only be satisfied by special, albeit quite important, forms of the GPD function for $\theta_j$, e.g.
the simple choices made for this function in the cases considered in Section 6, it is clear that over all possible choices for this function, we will not be able in general to make inferences about $\theta_j$ by directly using the methodology outlined in Section 3.4. However, in this general case, we can use a similar strategy to the one outlined in Section 7.2 by first using Principle 2 to construct a fiducial density $f(\theta_j \mid \theta_{-j}, x)$ that would be appropriate in the artificial scenario in which it is assumed that there was little or no pre-data knowledge about $\theta_j$, and then normalising the density function that results from multiplying this preliminary fiducial density for $\theta_j$ by the GPD function for $\theta_j$ that corresponds to the actual scenario being considered. For a similar reason to that which has just been outlined combined with reasoning given in Section 7.2, this type of strategy would appear to be particularly attractive if this latter GPD function for $\theta_j$ is a step function, although it generally offers a useful alternative way of taking into account pre-data knowledge about $\theta_j$ over all choices for this function.
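The multiply-and-normalise step can be illustrated numerically on the normal-mean case discussed earlier in this section. The following is only a sketch under stated assumptions: it takes the preliminary fiducial density of µ to be N(x̄, σ²/n), which is consistent with the remark above that this density coincides with the posterior under a flat improper prior; it assumes the step in the GPD function of equation (19) occurs at zero with height a on the positive side; and all function names and the values of n, x̄, σ, a and b are hypothetical choices made for illustration.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal distribution function (also valid at +/- infinity)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def pieces_19(a):
    """Step GPD function in the spirit of equation (19): height a for mu > 0
    (assumed convention), height 1 otherwise. Each piece is (lo, hi, height)."""
    return [(-math.inf, 0.0, 1.0), (0.0, math.inf, a)]

def pieces_20(a, b):
    """Step GPD function of equation (20): height a on (-b, b), 1 otherwise."""
    return [(-math.inf, -b, 1.0), (-b, b, a), (b, math.inf, 1.0)]

def piece_weights(pieces, m, s):
    """Probability of each piece under the density proportional to
    GPD(mu) * N(mu; m, s^2), together with the normalising constant."""
    raw = [h * (norm_cdf((hi - m) / s) - norm_cdf((lo - m) / s))
           for lo, hi, h in pieces]
    total = sum(raw)
    return [r / total for r in raw], total

def fiducial_density(pieces, m, s, mu):
    """Normalised density at a finite mu. Values at the step points follow the
    (lo, hi] convention, which does not affect any probabilities."""
    _, total = piece_weights(pieces, m, s)
    height = next(h for lo, hi, h in pieces if lo < mu <= hi)
    return height * norm_pdf((mu - m) / s) / (s * total)

# Hypothetical data summary: n observations with mean xbar, known sigma.
n, xbar, sigma = 10, 0.3, 1.0
m, s = xbar, sigma / math.sqrt(n)   # assumed preliminary density N(m, s^2)
a, b = 3.0, 0.1

w19, _ = piece_weights(pieces_19(a), m, s)
w20, _ = piece_weights(pieces_20(a, b), m, s)
print("P(mu < 0), P(mu > 0) under (19):", w19)
print("P(-b < mu < b) under (20):", w20[1])
```

In this sketch the odds of {µ > 0} against {µ < 0} under the GPD function (19) are a times the corresponding odds under the preliminary density, and within each step the normalised density is just the preliminary density truncated to that step, which mirrors the conditional densities described in the text.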
9. Closing comment
Since the theory of organic fiducial inference is a generalisation of the theory of subjective fiducial inference, issues that were identified in the final section of Bowater (2018a) as being relevant to the further development of this latter theory, i.e. the coherence of inferences based on subsets of the data set of interest, alternative definitions of the fiducial statistic and computational issues, also apply to the theory that has been put forward in the present paper. To save space the reader is referred to this earlier paper for a discussion of these issues.

References
Berger, J. O., Bernardo, J. M. and Sun, D. (2015). Overall objective priors. Bayesian Analysis, pp. 189–221.
Bowater, R. J. (2017a). A formulation of the concept of probability based on the use of experimental devices. Communications in Statistics: Theory and Methods, pp. 4774–4790.
Bowater, R. J. (2017b). A defence of subjective fiducial inference. AStA Advances in Statistical Analysis, pp. 177–197.
Bowater, R. J. (2018a). Multivariate subjective fiducial inference. arXiv.org (Cornell University), Statistics, arXiv:1804.09804.
Bowater, R. J. and Guzmán, L. E. (2018b). On a generalized form of subjective probability. arXiv.org (Cornell University), Statistics, arXiv:1810.10972.
Brooks, S. P. and Roberts, G. O. (1998). Convergence assessment techniques for Markov chain Monte Carlo. Statistics and Computing, pp. 319–335.
Gelfand, A. E. and Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, pp. 398–409.
Gelman, A. and Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, pp. 457–472.
Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 721–741.
Mandelkern, M. (2002). Setting confidence intervals for bounded parameters (with discussion). Statistical Science, pp. 149–172.
Shafer, G. (1985). Conditional probability (with discussion). International Statistical Review, pp. 261–277.
Tuyl, F. (2017). A note on priors for the multinomial model. The American Statistician, 71