Conditional probability and improper priors
Gunnar Taraldsen and Bo H. Lindqvist (2016). Communications in Statistics - Theory and Methods, 45:17, 5007-5016. doi:10.1080/03610926.2014.935430
Abstract

According to Jeffreys, improper priors are needed to get the Bayesian machine up and running. This may be disputed, but usage of improper priors flourishes. Arguments based on symmetry or information theoretic reference analysis can be most convincing in concrete cases. The foundations of statistics as usually formulated rely on the axioms of a probability space, or on alternative information theoretic axioms that imply the axioms of a probability space. These axioms do not include improper laws, but this is typically ignored in papers that consider improper priors.

The purpose of this paper is to present a mathematical theory that can be used as a foundation for statistics that includes improper priors. This theory includes improper laws in the initial axioms and has in particular Bayes theorem as a consequence. Another consequence is that some of the usual calculation rules are modified. This is important in relation to common statistical practice, which usually includes improper priors but tends to use unaltered calculation rules. In some cases the results are valid, but in other cases inconsistencies may appear. The famous marginalization paradoxes exemplify the latter case.

An alternative mathematical theory for the foundations of statistics can be formulated in terms of conditional probability spaces. In this case the appearance of improper laws is a consequence of the theory. It is proved here that the resulting mathematical structures for the two theories are equivalent. The conclusion is that the choice of the first or the second formulation for the initial axioms can be considered a matter of personal preference. Readers who initially have concerns regarding improper priors can possibly be more open toward a formulation of the initial axioms in terms of conditional probabilities. The interpretation of an improper law is given by the corresponding conditional probabilities.
Keywords: Axioms of statistics, Conditional probability space, Improper prior, Projective space
Introduction
Statistical developments are driven by applications, theory, and most importantly the interplay between applications and theory. The following is intended for readers who can appreciate the importance of the theoretical foundation for probability theory as given by the axioms of Kolmogorov [1933]. According to the recipe of Kolmogorov, a random element in a set Ω_X is identified with a measurable function X : Ω → Ω_X, where (Ω, E, P) is the basic probability space that the whole theory is based upon.

The use of improper priors is common in the statistical literature, but most often without reference to a corresponding theoretical foundation. It will here be explained how the theory of conditional probability spaces as developed by Renyi [1970a] is related to a theory for statistics that includes improper priors. This theory has been presented in a simplified form with some elementary examples by Taraldsen and Lindqvist [2010]. The idea is to use the above recipe given by Kolmogorov, but generalized by assuming that (Ω, E, P) is defined by the use of a σ-finite measure. The underlying law P itself is then not a σ-finite measure, but an equivalence class of σ-finite measures. A more precise formulation is given by Definition 2 below.

In oral presentations of the theory related to improper priors it is quite common that someone in the audience makes the following claim: Improper priors are just limits of proper priors, so we need not consider improper priors.
We strongly disagree with this even though it is true that improper priors can be obtained as limits of proper priors. The reason is perhaps best explained by analogy with a more familiar example: It is true that the real number system is obtained as a limit of the rational numbers. A precise construction, and this is important, is found by the aid of equivalence classes of Cauchy sequences. Nonetheless, most people prefer to think of real numbers as such, without reference to rational numbers. Real numbers are just limits of rational numbers, but we use them with properties as given by the axioms of the real number system.

The reader will not find any new algorithms or methods for the solution of practical problems here. The presented theory can, however, be used to put many known solutions to practical problems on a more solid theoretical foundation. This additional effort is necessary to avoid and explain contradictory results as exemplified by, for instance, the famous marginalization paradoxes [Stone and Dawid, 1972, Dawid et al., 1973]. The theory also gives a natural frame for the proof of optimality of inference based on fiducial theory [Taraldsen and Lindqvist, 2013a] and a proof of coincidence of a fiducial distribution and certain Bayesian posteriors based on improper priors [Taraldsen and Lindqvist, 2013b]. The theory has also been used for a rigorous specification of intrinsic conditional autoregression models [Lavine and Hodges, 2012]. These models are widely used in spatial statistics, dynamic linear models, and elsewhere.
The aim in the following is to formulate a theory that can be used to provide a foundation for statistics that includes improper priors. A less technical presentation of this, with some elementary examples, has already been provided by Taraldsen and Lindqvist [2010]. They show in particular by examples that this theory is different from the alternative theory for improper priors provided by Hartigan [1983]. This section gives a condensed presentation of the mathematical ingredients. Most definitions are standard as presented by, for instance, Rudin [1987], but some are not standard and are emphasized in the text. An example is the concept of a
C-measure as introduced below.

Let X be a nonempty set, and F a family of subsets that includes the empty set ∅. The family F is a σ-field if it is closed under formation of countable unions and complements. A set is measurable if it belongs to F. A measurable space X is a set X equipped with a σ-field F. The same symbol X is here used to denote both the set and the space. The notation (X, F) is also used to denote the measurable space. The convention here is to use the term space to denote a set with some additional structure.

A measure space (X, F, µ) is a measurable space (X, F) equipped with a measure µ. A measure is a function µ : F → [0, ∞] that is countably additive: The equality µ(∪_i A_i) = Σ_i µ(A_i) holds for disjoint measurable A_1, A_2, ....

Definition 1.
Let (X, F, µ) be a measure space. An admissible condition A is a measurable set with 0 < µ(A) < ∞. The measure space (X, F, µ) is σ-finite if there exists a sequence A_1, A_2, ... of admissible conditions such that X = ∪_i A_i. A probability space is a measure space (X, F, µ) such that µ(X) = 1, and µ is then said to be a probability measure.

Consider the set M of all σ-finite measures on a fixed measurable space (X, F). The set M includes in particular all probability measures, and the following gives new concepts also when restricted to probability measures. Two elements µ and ν in M are defined to be equivalent if µ = αν for some positive number α. This defines an equivalence relation ∼, and the quotient space M/∼ defines the set of C-measures. It should in particular be observed that any topology on M induces a corresponding quotient topology on M/∼. Convergence of C-measures is an important topic, but the study of this is left for the future. Some further discussion will, however, be provided in Section 4. A C-measure space (X, F, γ) is a measurable space (X, F) equipped with a C-measure γ. This means that γ = [µ] = {ν | ν ∼ µ} is an equivalence class of σ-finite measures. The term conditional probability space will here be used as an equivalent term. This convention will be motivated next, and is further elaborated in Section 4.

Let (X, F, γ) be a conditional probability space, and let A be an admissible condition. The conditional law, or equivalently the conditional measure, γ(· | A) is then well defined by γ(B | A) = ν(B | A) = ν(B ∩ A)/ν(A) where ν ∈ γ. It is well defined since it does not depend on the choice of ν, and the resulting conditional law is a probability measure. This argument gives:

Proposition 1.
A conditional probability space (X, F, γ) defines a unique family of probability spaces indexed by the admissible sets: {(X, F, γ(· | A)) | A is admissible}. For ease of reference we restate the following definition.
Definition 2.
Let (X, F) be a measurable space. A conditional measure γ = [ν] = {αν | 0 < α < ∞} is a set of σ-finite measures defined by a σ-finite measure ν defined on (X, F). Let A be an admissible condition and let B be measurable. The formula γ(B | A) = ν(B ∩ A)/ν(A) defines the conditional probability of B given A. A conditional measure space (X, F, γ) is a measurable space (X, F) equipped with a conditional measure γ.

In general probability and statistics it is most useful to extend the definition of conditional probability and expectation to include conditioning on σ-fields and statistics. This can be done also in the more general context here. The main new ingredient is given by the definition of σ-finite σ-fields and σ-finite measurable functions.

Let (X, F, µ) be a measure space. Assume that F_0 ⊂ F is a σ-finite σ-field in the sense that (X, F_0, µ_0) is σ-finite, where µ_0 is the restriction of µ to F_0. This implies that (X, F, µ) is also σ-finite. Let A ∈ F. The conditional measure µ(· | F_0) is defined by the F_0 measurable function x ↦ µ(A | F_0)(x) uniquely determined by the relation

µ(A ∩ B) = ∫_B µ(A | F_0)(x) µ_0(dx)   (1)

which is required to hold for all measurable subsets B ∈ F_0. The existence and uniqueness proof follows by observing: (i) ν(B) = µ(A ∩ B) defines a measure on F_0. (ii) ν is dominated by the measure µ_0. (iii) The Radon-Nikodym theorem gives existence and uniqueness of the conditional µ(A | F_0) as the density of ν with respect to µ_0, so the claim ν(dx) = µ(A | F_0)(x) µ_0(dx) follows. The uniqueness is only as a measurable function defined on the measure space (X, F_0, µ_0), and the conditional probability is more properly identified with an equivalence class of measurable functions. The defining equation (1) shows that µ(A | F_0) = (αµ)(A | F_0) for all α >
0. It can be concluded that the conditional measure γ(A | F_0) is well defined if (X, F, γ) is a conditional probability space. An immediate consequence is γ(X | F_0) = 1, so the conditional measures are all normalized. The term conditional probability will, motivated by this, be used as equivalent to the term conditional measure. This is as above for the elementary conditional measure.

The following example demonstrates that this conditional probability generalizes the elementary conditional probabilities γ(A | B). Let F_0 be the σ-field generated by a countable partition of X into disjoint admissible sets A_1, A_2, .... It follows that F_0 is σ-finite. Assuming this, the conditional expectation is given by γ(A | F_0)(x) = γ(A | A_i) for x ∈ A_i.

The definitions presented so far lead naturally to the definition of the category of conditional probability spaces with a corresponding class of arrows. The study of this, and functors to related categories, will not be pursued here. This more general theory gives, however, alternative motivation for some of the concepts presented next.

A function φ : X → Y is measurable if the inverse image of every measurable set is measurable: (φ ∈ A) = φ^{-1}(A) = {x | φ(x) ∈ A} is measurable for all measurable A. Let µ be a measure on X. The image measure µ_φ is defined by

µ_φ(A) = µ(φ ∈ A)   (2)

A measurable function φ is by definition σ-finite if µ_φ is σ-finite. A direct verification shows that a σ-finite function φ : X → Y pushes a conditional probability space structure on X into a conditional probability space structure on Y. This follows from the above and the identity [µ]_φ = [µ_φ]. Consequently, if γ is a C-measure, then the C-measure γ_φ is well defined if φ is σ-finite.
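The well-definedness claims above can be checked in a small numerical sketch. The discrete weight vectors, the scaling factor, and the map φ below are hypothetical illustrations chosen for this note, not part of the formal development: the point is only that γ(B | A) and the image class [µ]_φ do not depend on the representative chosen from the equivalence class.

```python
import numpy as np

# A discrete stand-in for a sigma-finite measure: nonnegative weights on the
# points 0..9. Any positive multiple of nu represents the same C-measure [nu].
rng = np.random.default_rng(0)
nu = rng.uniform(0.1, 1.0, size=10)   # one representative of the class
mu = 7.3 * nu                          # an equivalent representative: mu ~ nu

def conditional(measure, B, A):
    """Elementary conditional probability: measure(B & A) / measure(A)."""
    return measure[sorted(B & A)].sum() / measure[sorted(A)].sum()

A = {1, 2, 3, 4}   # an admissible condition: 0 < measure(A) < infinity
B = {3, 4, 5}

# The conditional probability does not depend on the representative of [nu]:
assert np.isclose(conditional(nu, B, A), conditional(mu, B, A))

def push_forward(measure, phi, n_out):
    """Image measure: measure_phi(A) = measure(phi in A), on points 0..n_out-1."""
    out = np.zeros(n_out)
    for x, w in enumerate(measure):
        out[phi(x)] += w
    return out

# Scaling commutes with the push-forward, so [mu]_phi = [mu_phi]:
phi = lambda x: x % 3
assert np.allclose(push_forward(mu, phi, 3), 7.3 * push_forward(nu, phi, 3))
print("conditional laws and image measures agree across the equivalence class")
```

The same invariance is what makes γ(· | A) and γ_φ legitimate operations on equivalence classes rather than on individual measures.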
The definition given here of a σ-finite function is a generalization of the concept of a regular random variable as defined by Renyi [1970a, p.73]. The definition of σ-finite σ-fields and σ-finite measurable functions can be reformulated as follows.

Definition 3.
Let (X, F, µ) be a σ-finite measure space. A σ-field F_0 ⊂ F is σ-finite if µ restricted to F_0 is σ-finite. Let (Y, G) be a measurable space. A measurable φ : X → Y is σ-finite if the σ-field F_φ = {{x | φ(x) ∈ A} | A ∈ G} is σ-finite.

The previous arguments show that the σ-finite functions can play the role as arrows in the category of conditional probability spaces. The σ-finite functions can also be used to define conditional probabilities just as σ-finite σ-fields did in equation (1). It will be a generalization since the previous definition is obtained by consideration of the function x ↦ x as a function taking values in the space equipped with the σ-finite σ-field in the construction that follows.

Assume that δ : X → Z is σ-finite. The conditional probability µ^z(A) = µ(A | δ = z) is defined by the relation

µ(A ∩ [δ ∈ B]) = ∫ µ(A | δ = z)[z ∈ B] µ_δ(dz)   (3)

which is required to hold for all measurable subsets B ⊂ Z. The existence and uniqueness proof follows by an argument similar to the argument after equation (1). The identity [µ]^z = µ^z holds, and the conditional measure γ^z is well defined for a C-measure γ.

Composition of the functions x ↦ δ(x) = z and z ↦ µ^z(A) defines the conditional probability µ(A | δ) as a measurable function defined on X. This function is measurable with respect to the initial σ-field F_δ ⊂ F generated by δ. The σ-finiteness of δ is equivalent with the σ-finiteness of F_δ. Direct verification shows that the definitions of conditional probability given by equations (1) and (3) coincide in the sense that µ(A | F_δ) = µ(A | δ).

The conclusion is that a conditional probability space (X, F, γ) is not only equipped with the family of elementary conditional probabilities {γ(· | A) | A ∈ A}, but also a family {γ^z | z ∈ Z} of conditional probabilities for each σ-finite δ : X → Z.
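A minimal concrete illustration of equation (3) may help here. The ingredients below are assumptions made for the sketch (a probability measure p and Lebesgue measure m are not fixed by the text): take X = R², µ(dx dz) = p(dx) m(dz), and let δ(x, z) = z be the second coordinate.

```latex
% Compute mu(A \cap [\delta \in B]) directly by Fubini:
\mu\bigl(A \cap [\delta \in B]\bigr)
   = \int_B p(A_z)\, m(dz),
\qquad A_z = \{x \mid (x,z) \in A\}.
% Hence \mu_\delta = m is sigma-finite, so \delta is a sigma-finite
% statistic, and comparison with equation (3) identifies the conditional law
\mu^z(A) = p(A_z), \qquad \mu^z(\mathbf{R}^2) = p(\mathbf{R}) = 1.
% If p is replaced by an infinite measure such as m itself, then
% \mu(\delta \in B) takes only the values 0 and infinity, the sigma-field
% F_\delta is not sigma-finite, and the conditional law is not defined.
```

Note that µ itself is an infinite measure, yet every conditional law µ^z is a probability measure; this is exactly the pattern exploited for improper priors below.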
The conditional probability µ^z_φ on Y is defined by

µ^z_φ(A) = µ^z(φ ∈ A) = µ(φ ∈ A | δ = z)   (4)

The function δ must be σ-finite, but it is not required that φ : X → Y is σ-finite. It follows that

µ_{φ,δ}(dy, dz) = µ^z_φ(dy) µ_δ(dz)   (5)

The case X = Y and φ(x) = x gives the defining equation (3) as a special case of the more general factorization given by equation (5). The previous discussion can be summarized by:

Proposition 2.
A conditional measure space (X, F, γ), a σ-finite δ : X → Z, and a measurable φ : X → Y define a unique measurable family {γ^z_φ | z ∈ Z} of conditional measures defined on Y.

It does not follow in general that there exists a version of γ^z_φ such that this is a measure for almost all z. A sufficient condition for this is that Y is a Borel space [Schervish, 1995, p.618]. This is the case if Y is in one-one measurable correspondence with a measurable subset of an uncountable complete separable metric space [Royden, 1989, p.406]. The corresponding version of the conditional probability is then said to be a regular conditional probability. Integration with respect to γ^z_φ can nonetheless be defined without any regularity conditions, and the factorization given by equation (5) holds in the most general case as stated. The possibility of this more general integral with respect to conditional probabilities was indicated already by Kolmogorov [1933, eq.10 on p.54].

Let x be the observed result of an experiment. It will be assumed that x can be identified with an element of a mathematically well defined set Ω_X. The set should include all other possible outcomes that could have been observed as a result of the experiment. The observed result can be a number, a collection of numbers organized in some way, a continuous function, a self-adjoint linear operator, a closed subset of a topological space, or any other element of a well defined set corresponding to the experiment under consideration.

Assume that the sample space Ω_X is equipped with a σ-field E_X so that (Ω_X, E_X) is a measurable space. Assume furthermore that (Ω_X, E_X, P^θ_X) is a probability space for each θ in the model parameter space Ω_Θ.
The family {P^θ_X | θ ∈ Ω_Θ} specifies a statistical model for the experiment. A predominant family of examples in the applied statistical literature is given by letting P^θ_X be the multivariate Gaussian distribution on Ω_X = R^N with covariance matrix Σ(θ) and mean µ(θ), where Ω_Θ ⊂ R^K. The simplest special case is given by Σ = I and µ = (θ, ..., θ), which corresponds to independent sampling from the univariate Gaussian with unknown mean θ and variance equal to 1. Other examples included are given by ANOVA models with fixed and random effects, more general regression models, structured equations models with latent variables from the fields of psychology and economics [Jöreskog, 1970], and a variety of models from the statistical signal processing literature [Van Trees, 2002]. These models correspond to specific choices of the functional dependence on θ in Σ(θ) and µ(θ).

The contents so far coincide with the definition of a statistical model as found in standard statistical literature. One exception is the choice of the notation {P^θ_X | θ ∈ Ω_Θ} for the statistical model. This choice indicates the connection to the theory of conditional probability spaces, as will be explained by the introduction of further assumptions.

It is assumed that the statistical model is based upon an underlying abstract conditional probability space (Ω, E, P). This includes the case of an underlying abstract probability space as formulated by Kolmogorov [1933] as a special case, but seen as a conditional probability space. It is abstract in the sense that it is assumed to exist, but it is not specified.
It is assumed that the model parameter space is a measurable space (Ω_Θ, E_Θ), that there exists a σ-finite measurable Θ : Ω → Ω_Θ, and that there exists a measurable X : Ω → Ω_X so that the resulting conditional probability P^θ_X as defined in equation (4) coincides with the specified statistical model.

Existence of Ω, P, Θ, and X can be proved in many concrete cases by consideration of the product space Ω_X × Ω_Θ equipped with the σ-finite measure P^θ_X(dx) π(dθ) obtained from the choice of a σ-finite measure π. This includes in particular the multivariate Gaussian example indicated above. As soon as existence is established, it is assumed for the further theoretical development that Ω, P, Θ, and X are abstract unspecified objects with the required resulting statistical model as a consequence.

It can be observed that it is required that the mapping θ ↦ P^θ_X(A) is measurable for all measurable A for the above construction to be possible. This condition is trivially satisfied in most examples found in applications, and is furthermore a typical assumption in theoretical developments. A good example of the latter is given by the mathematical proof of the factorization theorem for sufficient statistics [Halmos and Savage, 1949].

The basis for frequentist inference is then the observation x and the specified statistical model P^θ_X based on the underlying abstract conditional probability space (Ω, E, P).

The basis for Bayesian inference is as for frequentist inference, but the prior distribution P_Θ is also specified. The basis for Bayesian inference is hence the observation x and the joint distribution P_{X,Θ}(dx, dθ) = P^θ_X(dx) P_Θ(dθ). The conclusions of Bayesian inference are derived from the posterior distribution P^x_Θ, which is well defined by equation (4) if X is σ-finite. This result can be considered to be a very general version of Bayes theorem as promised in the Abstract.
A discussion of a more elementary version involving densities is given by Taraldsen and Lindqvist [2010]. The importance of the σ-finiteness of X has also been observed by others, but then as a requirement on the prior. Berger et al. [2009, p.911] include this requirement as a part of the definition of a permissible prior. The definition as formulated in this section can be used as a generalization of this part to cases not defined by densities.

A summary of the contents in this section is given by:

Definition 4 (Statistical model). A statistical model {(Ω_X, E_X, P^θ_X) | θ ∈ Ω_Θ} is specified by a family of probability spaces indexed by the model parameter space (Ω_Θ, E_Θ) with the additional structure defined in the following. It is assumed that all objects are defined based on the underlying conditional probability space (Ω, E, P). The observation is given by a measurable function X : Ω → Ω_X and the model parameter is given by a σ-finite measurable function Θ : Ω → Ω_Θ. It is assumed that the family of probability measures is given by the conditional law, so P^θ_X(A) = P(X ∈ A | Θ = θ).

A Bayesian statistical model is specified by a statistical model together with a specification of the prior law P_Θ. It is assumed that X is σ-finite, and then the resulting marginal law P_X is a conditional measure and the resulting posterior law P^x_Θ(B) = P(Θ ∈ B | X = x) is well defined.

In the previous, the prior P_Θ, the marginal P_X, and the joint distribution P_{X,Θ} are C-measures with corresponding conditional probability spaces (Ω_Θ, E_Θ, P_Θ), (Ω_X, E_X, P_X), and (Ω_{X,Θ}, E_{X,Θ}, P_{X,Θ}). The interpretation of the prior is in terms of the corresponding elementary conditional laws P_Θ(· | A). The same holds for the other improper laws.

Bayesian inference is essentially unique. This is in contrast to frequentist inference, which most often offers many different possible inference procedures for a given problem.
An analogous situation occurs in applied metrology, where it is common to have many different measurement instruments available for the measurement of a physical quantity. The choice of instrument depends on the actual situation and purpose of the experiment at hand.

The previous gives a mathematical definition of a statistical model and a Bayesian statistical model based on the concept of a conditional measure. The concept of a fiducial statistical model can also be defined based on the same theory. The necessary ingredients and further discussion of this have been presented by Taraldsen and Lindqvist [2013a,b].
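The content of Definition 4 can be traced through the simplest Gaussian model mentioned earlier. The following sketch assumes the model P^θ_X = N(θ, 1) together with the improper uniform prior P_Θ = [m], with m Lebesgue measure on the real line; these choices are illustrative, and φ denotes the standard normal density.

```latex
% Joint distribution built on the product space \Omega_X \times \Omega_\Theta:
P_{X,\Theta}(dx, d\theta) = \varphi(x-\theta)\, dx \, d\theta .
% This measure is sigma-finite. The marginal law of the observation,
P_X(dx) = \Bigl( \int \varphi(x-\theta)\, d\theta \Bigr) dx = m(dx),
% is improper but sigma-finite, so X is a sigma-finite statistic and
% equation (4) gives a well defined probability measure as posterior:
P^x_\Theta(d\theta)
  = \frac{\varphi(x-\theta)\, d\theta}{\int \varphi(x-\theta')\, d\theta'}
  = \varphi(\theta - x)\, d\theta ,
% that is, \Theta \mid (X = x) \sim N(x, 1): an improper prior combined
% with a proper model yields a proper posterior.
```

The marginal P_X is itself only defined as the C-measure [m], and, as stated above for the prior, its interpretation is through the elementary conditional laws P_X(· | A).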
Renyi [1970a, p.38-] gives a definition of a conditional probability space based on a family of objects µ(A | B). A condensed summary of the initial ingredients in this theory is presented next, but with some extensions and minor changes in the choice of naming conventions. The purpose is to show the close connection to the concept of a conditional measure space as discussed in the previous two sections. The final words of Renyi on this subject are recommended for a more thorough [Renyi, 1970a] and pedagogical [Renyi, 1970b] presentation of the theory as formulated and motivated by Renyi.

Definition 5 (Bunch). Let (X, F) be a measurable space. A family B ⊂ F is a bunch in X if

1. B_1, B_2 ∈ B ⇒ B_1 ∪ B_2 ∈ B.
2. There exist B_1, B_2, ... ∈ B such that ∪_i B_i = X.
3. The empty set ∅ does not belong to B.

Example 1. Let (X, F) be the real line equipped with the Borel σ-field. Let B be the set of finite nonempty unions of open intervals of the form (m/2, (m+2)/
2) where m is an integer. The family B is then a countable bunch. ✷

Definition 6 (Renyi space). A Renyi space (X, F, ν) is a measurable space (X, F) equipped with a family {ν(· | B) | B ∈ B} of probability measures indexed by a bunch B which fulfills B_1, B_2 ∈ B and B_1 ⊂ B_2 ⇒ ν(B_1 | B_2) > 0, and the identity

ν(A | B_1) = ν(A ∩ B_1 | B_2) / ν(B_1 | B_2)   (6)

A Renyi space (X, F, ν_2) extends a Renyi space (X, F, ν_1) by definition if B_1 ⊂ B_2 and ν_2(· | B) = ν_1(· | B) for all B ∈ B_1. The extension is strict if B_1 ⊂ B_2 and B_1 ≠ B_2. A Renyi space is maximal if a strict extension does not exist.

Example 1 (continued)
Let ν(A | B) be the uniform probability law on B for each B ∈ B. This gives a Renyi space (X, F, ν) where (X, F) is the real line with the Borel σ-field. The family {ν(· | B)}_{B ∈ B} is in this case a countable family of probability measures.

Let µ = [m] be the C-measure given by Lebesgue measure m on the real line. The elementary conditional measures µ(· | A) for admissible A ∈ A define a Renyi space (X, F, µ) which contains the Renyi space (X, F, ν) in the sense that B ⊂ A and ν(· | B) = µ(· | B) for all B ∈ B. It follows from the results presented next that (X, F, µ) is a maximal extension of (X, F, ν). ✷

It follows generally that a C-measure space (X, F, µ) generates a unique Renyi space (X, F, µ) through the elementary conditional measures µ(· | A). The same symbol µ is here used for two different concepts. Further excuse for this abuse of notation is given by the following structure result:

Proposition 3.
A Renyi space generates a unique conditional measure space. The corresponding resulting Renyi space is a maximal extension of the initial Renyi space.

Proof.
Let (X, F, ν) be the Renyi space. It will be proved that there exists a σ-finite measure µ such that µ(· | B) = ν(· | B) for all B ∈ B, and that the C-measure [µ] is unique.

The first step in the proof is to pick an arbitrary B_0 ∈ B and define µ(B) = ν(B | B ∪ B_0)/ν(B_0 | B ∪ B_0) for B ∈ B. This choice gives the normalization µ(B_0) = 1. This definition is extended to measurable A ⊂ B ∈ B by µ(A) = µ(A ∩ B) = ν(A | B) µ(B). An arbitrary measurable A can be written as a disjoint union of measurable A_1, A_2, ... where each A_i
is contained in some set B from the bunch. The measure µ is then finally defined by µ(A) = Σ_i µ(A_i).

Equation (6) can be used to prove that the previous definition of µ(A) based on A ⊂ B for B ∈ B does not depend on the choice of B. This, and further proof of consistency and uniqueness of [µ], is left to the reader. An alternative is to consult the proof of a corresponding result given by Renyi [1970a, p.40-43].

Two different Renyi spaces can generate the same C-measure space. A concrete example is provided by consideration of the bunch generated by intervals as in Example 1, but with a different choice of endpoints: both bunches generate the C-measure space given by Lebesgue measure.

Corollary 1.
A Renyi space has a unique extension to a maximal Renyi space. The set of maximal Renyi spaces is in one-one correspondence with the set of C-measure spaces.

Proof.
Let (X, F, ν) be a Renyi space and let (X, F, [µ]) be the corresponding generated C-measure space. The Renyi space (X, F, [µ]) given by the set of admissible conditions A is then a unique maximal extension. Uniqueness and maximality follow since any Renyi space that contains (X, F, ν) will generate the C-measure space (X, F, [µ]) by the construction given in the proof of Proposition 3.

A more general concept of a conditional probability space was originally introduced by Renyi [1955], and a corresponding more general structure theorem was proved by Császár [1955]. Renyi [1970a, p.95] refers to these more general spaces as generalized conditional probability spaces. They are truly more general, and a generalized conditional probability space is not necessarily generated by a single σ-finite measure.

The distinction between a σ-finite measure space and the corresponding C-measure space could at first sight seem trivial. For a σ-finite measure µ the corresponding C-measure ν = [µ] is an equivalence class of σ-finite measures in the set of all σ-finite measures on the measurable space X. It follows, as stated earlier, that any topology on the set of σ-finite measures gives a corresponding quotient topology on the set of C-measures. Convergence of σ-finite measures is different from convergence of the corresponding conditional measures. This is also true if the initial σ-finite measure is a probability measure.

An alternative is to consider the C-measure space as a maximal Renyi space, and this is a concept more clearly distinct from that of a σ-finite measure space. Convergence concepts for Renyi spaces can be studied directly, and initial work on this has been done by Renyi [1970a].
He shows in particular that any conditional measure can be obtained as a limit of conditional measures corresponding to probability measures in a reasonable topology. The study of convergence concepts exemplifies an important difference between σ-finite measures and C-measures. This is left for the future.

The distinction between a σ-finite measure space and the corresponding C-measure space can also be seen by analogy with the construction of projective spaces. The projective space P^n(R) as a set is the set of lines through the origin 0 in R^{n+1}. It is hence equal to the set of equivalence classes [x] = {λx | λ ∈ R \ {0}} for x ∈ R^{n+1} \ {0}. The set of C-measures on a measurable space is hence different from the set of σ-finite measures, just as a projective space is different from the space on which it is constructed.

The presented theory is in line with the arguments given by Jeffreys [1939]. He argues that improper priors are necessary to get the Bayesian machine up and running. This point of view can be disputed, but it is indisputable that usage of improper priors flourishes in the statistical literature. There is hence a need for a theory that includes improper priors.

Lindley [1965], apparently in line with the view of the current authors, found that the theory of Renyi is a natural starting point for statistical theory. In the Preface to his classical text on probability he writes:

The axiomatic structure used here is not the usual one associated with the name of Kolmogorov. Instead one based on the ideas of Renyi has been used. The essential difference between the two approaches is that Renyi's is stated in terms of conditional probabilities, whereas Kolmogorov's is in terms of absolute probabilities, and conditional probabilities are defined in terms of them. Our treatment always refers to the probability of A, given B, and not simply to the probability of A.
In my experience students benefit from having to think of probability as a function of two arguments, A and B, right from the beginning. The conditioning event, B, is then not easily forgotten and misunderstandings are avoided. These ideas are particularly important in Bayesian inference where one's views are influenced by the changes in the conditioning event.

Lindley [1965] refers to an earlier German edition of the book cited here [Renyi, 1962]. The two books [Renyi, 1970b,a] represent the final view of Renyi regarding conditional probability spaces, but the basis for the theory development is found in earlier articles [Renyi and Turan, 1976, Renyi, 1955]. The extension given by conditioning on σ-finite statistics and σ-finite σ-fields is not treated by Renyi.

The structure theorem shows in general that a family of conditional probabilities that satisfies the axioms of a Renyi space given in Definition 6 can be extended so that it corresponds to a unique maximal Renyi space, which can be identified with a C-measure space. The family of conditional probabilities gives intuitive motivation and interpretation for usage of improper laws in probability and statistics. In this theory any marginal law corresponds to a conditional probability space. All probabilities are conditional probabilities.

References
James O. Berger, José M. Bernardo, and Dongchu Sun. The Formal Definition of Reference Priors. The Annals of Statistics, 37(2), 2009.

Ákos Császár. Sur la structure des espaces de probabilité conditionnelle. Acta Mathematica Academiae Scientiarum Hungarica, 6:337-361, 1955.

A. P. Dawid, M. Stone, and J. V. Zidek. Marginalization Paradoxes in Bayesian and Structural Inference. Journal of the Royal Statistical Society Series B-Statistical Methodology, 35(2):189-233, 1973.

P. Halmos and L. J. Savage. Application of the Radon-Nikodym theorem to the theory of sufficient statistics. Annals of Mathematical Statistics, 20:225-241, 1949.

J. Hartigan. Bayes theory. Springer, 1983.

H. Jeffreys. Theory of probability (1966 ed). Oxford, third edition, 1939.

K. G. Jöreskog. A general method for analysis of covariance structures. Biometrika, 57(2):239-251, August 1970.

A. Kolmogorov. Foundations of the theory of probability. Chelsea edition (1956), second edition, 1933.

Michael L. Lavine and James S. Hodges. On Rigorous Specification of ICAR Models. The American Statistician, 66(1):42-49, 2012.

D. V. Lindley. Introduction to Probability and Statistics from a Bayesian Viewpoint (Cambridge 2008 reprint), volume I-II. Cambridge University Press, 1965.

A. Renyi. Wahrscheinlichkeitsrechnung (Probability theory 1970). Deutscher Verlag der Wissenschaften, Berlin, 1962.

A. Renyi. Foundations of Probability. Holden-Day, 1970a.

A. Renyi and P. Turan. Selected papers of Alfred Renyi, volume I-III. Akademiai Kiado, Budapest, 1976.

Alfred Renyi. On a new axiomatic theory of probability. Acta Mathematica Hungarica, 6(3):285-335, September 1955.

Alfred Renyi. Probability Theory. Dover Publications, dover ed (2007) edition, May 1970b.

H. L. Royden. Real Analysis. Macmillan, third edition, 1989.

W. Rudin. Real and Complex Analysis. McGraw-Hill, 1987.

M. J. Schervish. Theory of Statistics. Springer, 1995.

M. Stone and A. P. Dawid. Un-Bayesian Implications of Improper Bayes Inference in Routine Statistical Problems. Biometrika, 59(2):369-375, 1972.

G. Taraldsen and B. H. Lindqvist. Improper Priors Are Not Improper. The American Statistician, 64(2):154-158, 2010.

G. Taraldsen and B. H. Lindqvist. Fiducial theory and optimal inference. Annals of Statistics, 41(1):323-341, 2013a.

G. Taraldsen and B. H. Lindqvist. Fiducial and Posterior Sampling. Communications in Statistics: Theory and Methods (accepted), 2013b.

H. L. Van Trees.