Conditional probability and improper priors
Gunnar Taraldsen and Bo H. Lindqvist (2016). Communications in Statistics - Theory and Methods, 45:17, 5007-5016. doi:10.1080/03610926.2014.935430
Abstract

According to Jeffreys, improper priors are needed to get the Bayesian machine up and running. This may be disputed, but usage of improper priors flourishes. Arguments based on symmetry or information theoretic reference analysis can be most convincing in concrete cases. The foundations of statistics as usually formulated rely on the axioms of a probability space, or on alternative information theoretic axioms that imply the axioms of a probability space. These axioms do not include improper laws, but this is typically ignored in papers that consider improper priors.

The purpose of this paper is to present a mathematical theory that can be used as a foundation for statistics that includes improper priors. This theory includes improper laws in the initial axioms and has in particular Bayes theorem as a consequence. Another consequence is that some of the usual calculation rules are modified. This is important in relation to common statistical practice, which usually includes improper priors but tends to use unaltered calculation rules. In some cases the results are valid, but in other cases inconsistencies may appear. The famous marginalization paradoxes exemplify the latter case.

An alternative mathematical theory for the foundations of statistics can be formulated in terms of conditional probability spaces. In this case the appearance of improper laws is a consequence of the theory. It is proved here that the resulting mathematical structures for the two theories are equivalent. The conclusion is that the choice of the first or the second formulation for the initial axioms can be considered a matter of personal preference. Readers who initially have concerns regarding improper priors can possibly be more open toward a formulation of the initial axioms in terms of conditional probabilities. The interpretation of an improper law is given by the corresponding conditional probabilities.
Keywords: Axioms of statistics, Conditional probability space, Improper prior, Projective space
Introduction
Statistical developments are driven by applications, theory, and most importantly the interplay between applications and theory. The following is intended for readers who can appreciate the importance of the theoretical foundation for probability theory as given by the axioms of Kolmogorov [1933]. According to the recipe of Kolmogorov, a random element in a set Ω_X is identified with a measurable function X : Ω → Ω_X, where (Ω, E, P) is the basic probability space that the whole theory is based upon.

The use of improper priors is common in the statistical literature, but most often without reference to a corresponding theoretical foundation. It will here be explained how the theory of conditional probability spaces as developed by Renyi [1970a] is related to a theory for statistics that includes improper priors. This theory has been presented in a simplified form with some elementary examples by Taraldsen and Lindqvist [2010]. The idea is to use the above recipe given by Kolmogorov, but generalized by assuming that (Ω, E, P) is defined by the use of a σ-finite measure. The underlying law P itself is then not a σ-finite measure, but an equivalence class of σ-finite measures. A more precise formulation is given by Definition 2 below.

In oral presentations of the theory related to improper priors it is quite common that someone in the audience makes the following claim: Improper priors are just limits of proper priors, so we need not consider improper priors.
We strongly disagree with this even though it is true that improper priors can be obtained as limits of proper priors. The reason is perhaps best explained by analogy with a more familiar example: It is true that the real number system is obtained as a limit of the rational numbers. A precise construction, and this is important, is found by the aid of equivalence classes of Cauchy sequences. Nonetheless, most people prefer to think of real numbers as such, without reference to rational numbers. Real numbers are just limits of rational numbers, but we use them with properties as given by the axioms of the real number system.

The reader will not find any new algorithms or methods for the solution of practical problems here. The presented theory can, however, be used to put many known solutions to practical problems on a more solid theoretical foundation. This additional effort is necessary to avoid and explain contradictory results as exemplified by, for instance, the famous marginalization paradoxes [Stone and Dawid, 1972, Dawid et al., 1973]. The theory also gives a natural frame for the proof of optimality of inference based on fiducial theory [Taraldsen and Lindqvist, 2013a] and a proof of coincidence of a fiducial distribution and certain Bayesian posteriors based on improper priors [Taraldsen and Lindqvist, 2013b]. The theory has also been used for a rigorous specification of intrinsic conditional autoregression models [Lavine and Hodges, 2012]. These models are widely used in spatial statistics, dynamic linear models, and elsewhere.
The aim in the following is to formulate a theory that can be used to provide a foundation for statistics that includes improper priors. A less technical presentation of this, with some elementary examples, has already been provided by Taraldsen and Lindqvist [2010]. They show in particular by examples that this theory is different from the alternative theory for improper priors provided by Hartigan [1983]. This section gives a condensed presentation of the mathematical ingredients. Most definitions are standard as presented by, for instance, Rudin [1987], but some are not standard and are emphasized in the text. An example is the concept of a
C-measure as introduced below.

Let X be a nonempty set, and F a family of subsets that includes the empty set ∅. The family F is a σ-field if it is closed under formation of countable unions and complements. A set is measurable if it belongs to F. A measurable space X is a set X equipped with a σ-field F. The same symbol X is here used to denote both the set and the space. The notation (X, F) is also used to denote the measurable space. The convention here is to use the term space to denote a set with some additional structure.

A measure space (X, F, µ) is a measurable space (X, F) equipped with a measure µ. A measure is a function µ : F → [0, ∞] that is countably additive: The equality µ(∪_i A_i) = Σ_i µ(A_i) holds for disjoint measurable A_1, A_2, ....

Definition 1.
Let (X, F, µ) be a measure space. An admissible condition A is a measurable set with 0 < µ(A) < ∞. The measure space (X, F, µ) is σ-finite if there exists a sequence A_1, A_2, ... of admissible conditions such that X = ∪_i A_i. A probability space is a measure space (X, F, µ) such that µ(X) = 1, and µ is then said to be a probability measure.

Consider the set M of all σ-finite measures on a fixed measurable space (X, F). The set M includes in particular all probability measures, and the following gives new concepts also when restricted to probability measures. Two elements µ and ν in M are defined to be equivalent if µ = αν for some positive number α. This defines an equivalence relation ∼, and the quotient space M/∼ defines the set of C-measures. It should in particular be observed that any topology on M induces a corresponding quotient topology on M/∼. Convergence of C-measures is an important topic, but the study of this is left for the future. Some further discussion will, however, be provided in Section 4. A C-measure space (X, F, γ) is a measurable space (X, F) equipped with a C-measure γ. This means that γ = [µ] = {ν | ν ∼ µ} is an equivalence class of σ-finite measures. The term conditional probability space will here be used as an equivalent term. This convention will be motivated next, and is further elaborated in Section 4.

Let (X, F, γ) be a conditional probability space, and let A be an admissible condition. The conditional law, or equivalently the conditional measure, γ(· | A) is then well defined by γ(B | A) = ν(B | A) = ν(B ∩ A)/ν(A) where ν ∈ γ. It is well defined since it does not depend on the choice of ν, and the resulting conditional law is a probability measure. This argument gives:

Proposition 1.
A conditional probability space (X, F, γ) defines a unique family of probability spaces indexed by the admissible sets: {(X, F, γ(· | A)) | A is admissible}. For ease of reference we restate the following definition.
Definition 2.
Let (X, F) be a measurable space. A conditional measure γ = [ν] = {αν | 0 < α < ∞} is a set of σ-finite measures defined by a σ-finite measure ν defined on (X, F). Let A be an admissible condition and let B be measurable. The formula γ(B | A) = ν(B ∩ A)/ν(A) defines the conditional probability of B given A. A conditional measure space (X, F, γ) is a measurable space (X, F) equipped with a conditional measure γ.

In general probability and statistics it is most useful to extend the definition of conditional probability and expectation to include conditioning on σ-fields and statistics. This can be done also in the more general context here. The main new ingredient is given by the definition of σ-finite σ-fields and σ-finite measurable functions.

Let (X, F, µ) be a measure space. Assume that F_0 ⊂ F is a σ-finite σ-field in the sense that (X, F_0, µ_0) is σ-finite, where µ_0 is the restriction of µ to F_0. This implies that (X, F, µ) is also σ-finite. Let A ∈ F. The conditional measure µ(· | F_0) is defined by the F_0 measurable function x ↦ µ(A | F_0)(x) uniquely determined by the relation

µ(A ∩ B) = ∫_B µ(A | F_0)(x) µ_0(dx)   (1)

which is required to hold for all measurable subsets B ∈ F_0. The existence and uniqueness proof follows by observing: (i) ν(B) = µ(A ∩ B) defines a measure on F_0. (ii) ν is dominated by the measure µ_0. (iii) The Radon-Nikodym theorem gives existence and uniqueness of the conditional µ(A | F_0) as the density of ν with respect to µ_0, so the claim ν(dx) = µ(A | F_0)(x) µ_0(dx) follows. The uniqueness is only as a measurable function defined on the measure space (X, F_0, µ_0), and the conditional probability is more properly identified with an equivalence class of measurable functions. The defining equation (1) shows that µ(A | F_0) = (αµ)(A | F_0) for all α >
0. It can be concluded that the conditional measure γ(A | F_0) is well defined if (X, F, γ) is a conditional probability space. An immediate consequence is γ(X | F_0) = 1, so the conditional measures are all normalized. The term conditional probability will, motivated by this, be used as equivalent to the term conditional measure. This is as above for the elementary conditional measure.

The following example demonstrates that this conditional probability generalizes the elementary conditional probabilities γ(A | B). Let F_0 be the σ-field generated by a countable partition of X into disjoint admissible sets A_1, A_2, .... It follows that F_0 is σ-finite. Assuming this, the conditional expectation is given by γ(A | F_0)(x) = γ(A | A_i) for x ∈ A_i.

The definitions presented so far lead naturally to the definition of the category of conditional probability spaces with a corresponding class of arrows. The study of this, and functors to related categories, will not be pursued here. This more general theory gives, however, alternative motivation for some of the concepts presented next.

A function φ : X → Y is measurable if the inverse image of every measurable set is measurable: (φ ∈ A) = φ^{-1}(A) = {x | φ(x) ∈ A} is measurable for all measurable A. Let µ be a measure on X. The image measure µ_φ is defined by

µ_φ(A) = µ(φ ∈ A)   (2)

A measurable function φ is by definition σ-finite if µ_φ is σ-finite. A direct verification shows that a σ-finite function φ : X → Y pushes a conditional probability space structure on X into a conditional probability space structure on Y. This follows from the above and the identity [µ]_φ = [µ_φ]. Consequently, if γ is a C-measure, then the C-measure γ_φ is well defined if φ is σ-finite.
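The well-definedness claims above can be checked in a small numerical sketch. The discrete weight vectors, the scaling factor, and the map φ below are hypothetical illustrations chosen for this note, not part of the formal development: the point is only that γ(B | A) and the image class [µ]_φ do not depend on the representative chosen from the equivalence class.

```python
import numpy as np

# A discrete stand-in for a sigma-finite measure: nonnegative weights on the
# points 0..9. Any positive multiple of nu represents the same C-measure [nu].
rng = np.random.default_rng(0)
nu = rng.uniform(0.1, 1.0, size=10)   # one representative of the class
mu = 7.3 * nu                          # an equivalent representative: mu ~ nu

def conditional(measure, B, A):
    """Elementary conditional probability: measure(B & A) / measure(A)."""
    return measure[sorted(B & A)].sum() / measure[sorted(A)].sum()

A = {1, 2, 3, 4}   # an admissible condition: 0 < measure(A) < infinity
B = {3, 4, 5}

# The conditional probability does not depend on the representative of [nu]:
assert np.isclose(conditional(nu, B, A), conditional(mu, B, A))

def push_forward(measure, phi, n_out):
    """Image measure: measure_phi(A) = measure(phi in A), on points 0..n_out-1."""
    out = np.zeros(n_out)
    for x, w in enumerate(measure):
        out[phi(x)] += w
    return out

# Scaling commutes with the push-forward, so [mu]_phi = [mu_phi]:
phi = lambda x: x % 3
assert np.allclose(push_forward(mu, phi, 3), 7.3 * push_forward(nu, phi, 3))
print("conditional laws and image measures agree across the equivalence class")
```

The same invariance is what makes γ(· | A) and γ_φ legitimate operations on equivalence classes rather than on individual measures.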
The definition given here of a σ-finite function is a generalization of the concept of a regular random variable as defined by Renyi [1970a, p.73]. The definition of σ-finite σ-fields and σ-finite measurable functions can be reformulated as follows.

Definition 3.
Let (X, F, µ) be a σ-finite measure space. A σ-field F_0 ⊂ F is σ-finite if µ restricted to F_0 is σ-finite. Let (Y, G) be a measurable space. A measurable φ : X → Y is σ-finite if the σ-field F_φ = {{x | φ(x) ∈ A} | A ∈ G} is σ-finite.

The previous arguments show that the σ-finite functions can play the role as arrows in the category of conditional probability spaces. The σ-finite functions can also be used to define conditional probabilities just as σ-finite σ-fields did in equation (1). It will be a generalization since the previous definition is obtained by consideration of the function x ↦ x as a function taking values in the space equipped with the σ-finite σ-field in the construction that follows.

Assume that δ : X → Z is σ-finite. The conditional probability µ^z(A) = µ(A | δ = z) is defined by the relation

µ(A ∩ [δ ∈ B]) = ∫ µ(A | δ = z)[z ∈ B] µ_δ(dz)   (3)

which is required to hold for all measurable subsets B ⊂ Z. The existence and uniqueness proof follows by an argument similar to the argument after equation (1). The identity [µ]^z = µ^z holds, and the conditional measure γ^z is well defined for a C-measure γ.

Composition of the functions x ↦ δ(x) = z and z ↦ µ^z(A) defines the conditional probability µ(A | δ) as a measurable function defined on X. This function is measurable with respect to the initial σ-field F_δ ⊂ F generated by δ. The σ-finiteness of δ is equivalent with the σ-finiteness of F_δ. Direct verification shows that the definitions of conditional probability given by equations (1) and (3) coincide in the sense that µ(A | F_δ) = µ(A | δ).

The conclusion is that a conditional probability space (X, F, γ) is not only equipped with the family of elementary conditional probabilities {γ(· | A) | A ∈ A}, but also a family {γ^z | z ∈ Z} of conditional probabilities for each σ-finite δ : X → Z.
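A minimal concrete illustration of equation (3) may help here. The ingredients below are assumptions made for the sketch (a probability measure p and Lebesgue measure m are not fixed by the text): take X = R², µ(dx dz) = p(dx) m(dz), and let δ(x, z) = z be the second coordinate.

```latex
% Compute mu(A \cap [\delta \in B]) directly by Fubini:
\mu\bigl(A \cap [\delta \in B]\bigr)
   = \int_B p(A_z)\, m(dz),
\qquad A_z = \{x \mid (x,z) \in A\}.
% Hence \mu_\delta = m is sigma-finite, so \delta is a sigma-finite
% statistic, and comparison with equation (3) identifies the conditional law
\mu^z(A) = p(A_z), \qquad \mu^z(\mathbf{R}^2) = p(\mathbf{R}) = 1.
% If p is replaced by an infinite measure such as m itself, then
% \mu(\delta \in B) takes only the values 0 and infinity, the sigma-field
% F_\delta is not sigma-finite, and the conditional law is not defined.
```

Note that µ itself is an infinite measure, yet every conditional law µ^z is a probability measure; this is exactly the pattern exploited for improper priors below.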
The conditional probability µ^z_φ on Y is defined by

µ^z_φ(A) = µ^z(φ ∈ A) = µ(φ ∈ A | δ = z)   (4)

The function δ must be σ-finite, but it is not required that φ : X → Y is σ-finite. It follows that

µ_{φ,δ}(dy, dz) = µ^z_φ(dy) µ_δ(dz)   (5)

The case X = Y and φ(x) = x gives the defining equation (3) as a special case of the more general factorization given by equation (5). The previous discussion can be summarized by:

Proposition 2.
A conditional measure space (X, F, γ), a σ-finite δ : X → Z, and a measurable φ : X → Y define a unique measurable family {γ^z_φ | z ∈ Z} of conditional measures defined on Y.

It does not follow in general that there exists a version of γ^z_φ such that this is a measure for almost all z. A sufficient condition for this is that Y is a Borel space [Schervish, 1995, p.618]. This is the case if Y is in one-one measurable correspondence with a measurable subset of an uncountable complete separable metric space [Royden, 1989, p.406]. The corresponding version of the conditional probability is then said to be a regular conditional probability. Integration with respect to γ^z_φ can nonetheless be defined without any regularity conditions, and the factorization given by equation (5) holds in the most general case as stated. The possibility of this more general integral with respect to conditional probabilities was indicated already by Kolmogorov [1933, eq.10 on p.54].

Let x be the observed result of an experiment. It will be assumed that x can be identified with an element of a mathematically well defined set Ω_X. The set should include all other possible outcomes that could have been observed as a result of the experiment. The observed result can be a number, a collection of numbers organized in some way, a continuous function, a self-adjoint linear operator, a closed subset of a topological space, or any other element of a well defined set corresponding to the experiment under consideration.

Assume that the sample space Ω_X is equipped with a σ-field E_X so that (Ω_X, E_X) is a measurable space. Assume furthermore that (Ω_X, E_X, P^θ_X) is a probability space for each θ in the model parameter space Ω_Θ.
The family {P^θ_X | θ ∈ Ω_Θ} specifies a statistical model for the experiment. A predominant family of examples in the applied statistical literature is given by letting P^θ_X be the multivariate Gaussian distribution on Ω_X = R^N with covariance matrix Σ(θ) and mean µ(θ), where Ω_Θ ⊂ R^K. The simplest special case is given by Σ = I and µ = (θ, ..., θ), which corresponds to independent sampling from the univariate Gaussian with unknown mean θ and variance equal to 1. Other examples included are given by ANOVA models with fixed and random effects, more general regression models, structured equations models with latent variables from the fields of psychology and economics [Jöreskog, 1970], and a variety of models from the statistical signal processing literature [Van Trees, 2002]. These models correspond to specific choices of the functional dependence on θ in Σ(θ) and µ(θ).

The contents so far coincide with the definition of a statistical model as found in standard statistical literature. One exception is the choice of the notation {P^θ_X | θ ∈ Ω_Θ} for the statistical model. This choice indicates the connection to the theory of conditional probability spaces, as will be explained by the introduction of further assumptions.

It is assumed that the statistical model is based upon an underlying abstract conditional probability space (Ω, E, P). This includes the case of an underlying abstract probability space as formulated by Kolmogorov [1933] as a special case, but seen as a conditional probability space. It is abstract in the sense that it is assumed to exist, but it is not specified.
It is assumed that the model parameter space is a measurable space (Ω_Θ, E_Θ), that there exists a σ-finite measurable Θ : Ω → Ω_Θ, and that there exists a measurable X : Ω → Ω_X so that the resulting conditional probability P^θ_X as defined in equation (4) coincides with the specified statistical model.

Existence of Ω, P, Θ, and X can be proved in many concrete cases by consideration of the product space Ω_X × Ω_Θ equipped with the σ-finite measure P^θ_X(dx) π(dθ) obtained from the choice of a σ-finite measure π. This includes in particular the multivariate Gaussian example indicated above. As soon as existence is established, it is assumed for the further theoretical development that Ω, P, Θ, and X are abstract unspecified objects with the required resulting statistical model as a consequence.

It can be observed that it is required that the mapping θ ↦ P^θ_X(A) is measurable for all measurable A for the above construction to be possible. This condition is trivially satisfied in most examples found in applications, and is furthermore a typical assumption in theoretical developments. A good example of the latter is given by the mathematical proof of the factorization theorem for sufficient statistics [Halmos and Savage, 1949].

The basis for frequentist inference is then the observation x and the specified statistical model P^θ_X based on the underlying abstract conditional probability space (Ω, E, P).

The basis for Bayesian inference is as for frequentist inference, but the prior distribution P_Θ is also specified. The basis for Bayesian inference is hence the observation x and the joint distribution P_{X,Θ}(dx, dθ) = P^θ_X(dx) P_Θ(dθ). The conclusions of Bayesian inference are derived from the posterior distribution P^x_Θ, which is well defined by equation (4) if X is σ-finite. This result can be considered to be a very general version of Bayes theorem as promised in the Abstract.
A discussion of a more elementary version involving densities is given by Taraldsen and Lindqvist [2010]. The importance of the σ-finiteness of X has also been observed by others, but then as a requirement on the prior. Berger et al. [2009, p.911] include this requirement as a part of the definition of a permissible prior. The definition as formulated in this section can be used as a generalization of this part to cases not defined by densities.

A summary of the contents in this section is given by:

Definition 4 (Statistical model). A statistical model {(Ω_X, E_X, P^θ_X) | θ ∈ Ω_Θ} is specified by a family of probability spaces indexed by the model parameter space (Ω_Θ, E_Θ) with the additional structure defined in the following. It is assumed that all objects are defined based on the underlying conditional probability space (Ω, E, P). The observation is given by a measurable function X : Ω → Ω_X and the model parameter is given by a σ-finite measurable function Θ : Ω → Ω_Θ. It is assumed that the family of probability measures is given by the conditional law, so P^θ_X(A) = P(X ∈ A | Θ = θ).

A Bayesian statistical model is specified by a statistical model together with a specification of the prior law P_Θ. It is assumed that X is σ-finite, and then the resulting marginal law P_X is a conditional measure and the resulting posterior law P^x_Θ(B) = P(Θ ∈ B | X = x) is well defined.

In the previous, the prior P_Θ, the marginal P_X, and the joint distribution P_{X,Θ} are C-measures with corresponding conditional probability spaces (Ω_Θ, E_Θ, P_Θ), (Ω_X, E_X, P_X), and (Ω_{X,Θ}, E_{X,Θ}, P_{X,Θ}). The interpretation of the prior is in terms of the corresponding elementary conditional laws P_Θ(· | A). The same holds for the other improper laws.

Bayesian inference is essentially unique. This is in contrast to frequentist inference, which most often offers many different possible inference procedures for a given problem.
An analogous situation occurs in applied metrology, where it is common to have many different measurement instruments available for the measurement of a physical quantity. The choice of instrument depends on the actual situation and purpose of the experiment at hand.

The previous gives a mathematical definition of a statistical model and a Bayesian statistical model based on the concept of a conditional measure. The concept of a fiducial statistical model can also be defined based on the same theory. The necessary ingredients and further discussion of this have been presented by Taraldsen and Lindqvist [2013a,b].
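The content of Definition 4 can be traced through the simplest Gaussian model mentioned earlier. The following sketch assumes the model P^θ_X = N(θ, 1) together with the improper uniform prior P_Θ = [m], with m Lebesgue measure on the real line; these choices are illustrative, and φ denotes the standard normal density.

```latex
% Joint distribution built on the product space \Omega_X \times \Omega_\Theta:
P_{X,\Theta}(dx, d\theta) = \varphi(x-\theta)\, dx \, d\theta .
% This measure is sigma-finite. The marginal law of the observation,
P_X(dx) = \Bigl( \int \varphi(x-\theta)\, d\theta \Bigr) dx = m(dx),
% is improper but sigma-finite, so X is a sigma-finite statistic and
% equation (4) gives a well defined probability measure as posterior:
P^x_\Theta(d\theta)
  = \frac{\varphi(x-\theta)\, d\theta}{\int \varphi(x-\theta')\, d\theta'}
  = \varphi(\theta - x)\, d\theta ,
% that is, \Theta \mid (X = x) \sim N(x, 1): an improper prior combined
% with a proper model yields a proper posterior.
```

The marginal P_X is itself only defined as the C-measure [m], and, as stated above for the prior, its interpretation is through the elementary conditional laws P_X(· | A).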
Renyi [1970a, p.38-] gives a definition of a conditional probability space based on a family of objects µ(A | B). A condensed summary of the initial ingredients in this theory is presented next, but with some extensions and minor changes in the choice of naming conventions. The purpose is to show the close connection to the concept of a conditional measure space as discussed in the previous two sections. The final words of Renyi on this subject are recommended for a more thorough [Renyi, 1970a] and pedagogical [Renyi, 1970b] presentation of the theory as formulated and motivated by Renyi.

Definition 5 (Bunch). Let (X, F) be a measurable space. A family B ⊂ F is a bunch in X if

1. B_1, B_2 ∈ B ⇒ B_1 ∪ B_2 ∈ B.
2. There exist B_1, B_2, ... ∈ B such that ∪_i B_i = X.
3. The empty set ∅ does not belong to B.

Example 1. Let (X, F) be the real line equipped with the Borel σ-field. Let B be the set of finite nonempty unions of open intervals of the form (m/2, (m+2)/
2) where m is an integer. The family B is then a countable bunch. ✷

Definition 6 (Renyi space). A Renyi space (X, F, ν) is a measurable space (X, F) equipped with a family {ν(· | B) | B ∈ B} of probability measures indexed by a bunch B which fulfills B_1, B_2 ∈ B and B_1 ⊂ B_2 ⇒ ν(B_1 | B_2) > 0, and the identity

ν(A | B_1) = ν(A ∩ B_1 | B_2) / ν(B_1 | B_2)   (6)

A Renyi space (X, F, ν_2) extends a Renyi space (X, F, ν_1) by definition if B_1 ⊂ B_2 and ν_2(· | B) = ν_1(· | B) for all B ∈ B_1. The extension is strict if B_1 ⊂ B_2 and B_1 ≠ B_2. A Renyi space is maximal if a strict extension does not exist.

Example 1 (continued)
Let ν(A | B) be the uniform probability law on B for each B ∈ B. This gives a Renyi space (X, F, ν) where (X, F) is the real line with the Borel σ-field. The family {ν(· | B)}_{B ∈ B} is in this case a countable family of probability measures.

Let µ = [m] be the C-measure given by Lebesgue measure m on the real line. The elementary conditional measures µ(· | A) for admissible A ∈ A define a Renyi space (X, F, µ) which contains the Renyi space (X, F, ν) in the sense that B ⊂ A and ν(· | B) = µ(· | B) for all B ∈ B. It follows from the results presented next that (X, F, µ) is a maximal extension of (X, F, ν). ✷

It follows generally that a C-measure space (X, F, µ) generates a unique Renyi space (X, F, µ) through the elementary conditional measures µ(· | A). The same symbol µ is here used for two different concepts. Further excuse for this abuse of notation is given by the following structure result:

Proposition 3.
A Renyi space generates a unique conditional measure space. The corresponding resulting Renyi space is a maximal extension of the initial Renyi space.

Proof.
Let (X, F, ν) be the Renyi space. It will be proved that there exists a σ-finite measure µ such that µ(· | B) = ν(· | B) for all B ∈ B, and that the C-measure [µ] is unique.

The first step in the proof is to pick an arbitrary B_0 ∈ B and define µ(B) = ν(B | B ∪ B_0)/ν(B_0 | B ∪ B_0) for B ∈ B. This choice gives the normalization µ(B_0) = 1. This definition is extended to measurable A ⊂ B ∈ B by µ(A) = µ(A ∩ B) = ν(A | B) µ(B). An arbitrary measurable A can be written as a disjoint union of measurable A_1, A_2, ... where each A_i
is contained in some set B from the bunch. The measure µ is then finally defined by µ(A) = Σ_i µ(A_i).

Equation (6) can be used to prove that the previous definition of µ(A) based on A ⊂ B for B ∈ B does not depend on the choice of B. This, and further proof of consistency and uniqueness of [µ], is left to the reader. An alternative is to consult the proof of a corresponding result given by Renyi [1970a, p.40-43].

Two different Renyi spaces can generate the same C-measure space. A concrete example is provided by consideration of the bunch generated by intervals as in Example 1, but with a different choice of endpoints: both bunches generate the C-measure space given by Lebesgue measure.

Corollary 1.
A Renyi space has a unique extension to a maximal Renyi space. The set of maximal Renyi spaces is in one-one correspondence with the set of C-measure spaces.

Proof.
Let (X, F, ν) be a Renyi space and let (X, F, [µ]) be the corresponding generated C-measure space. The Renyi space (X, F, [µ]) given by the set of admissible conditions A is then a unique maximal extension. Uniqueness and maximality follow since any Renyi space that contains (X, F, ν) will generate the C-measure space (X, F, [µ]) by the construction given in the proof of Proposition 3.

A more general concept of a conditional probability space was originally introduced by Renyi [1955], and a corresponding more general structure theorem was proved by Császár [1955]. Renyi [1970a, p.95] refers to these more general spaces as generalized conditional probability spaces. They are truly more general, and a generalized conditional probability space is not necessarily generated by a single σ-finite measure.

The distinction between a σ-finite measure space and the corresponding C-measure space could at first sight seem trivial. For a σ-finite measure µ the corresponding C-measure ν = [µ] is an equivalence class of σ-finite measures in the set of all σ-finite measures on the measurable space X. It follows, as stated earlier, that any topology on the set of σ-finite measures gives a corresponding quotient topology on the set of C-measures. Convergence of σ-finite measures is different from convergence of the corresponding conditional measures. This is also true if the initial σ-finite measure is a probability measure.

An alternative is to consider the C-measure space as a maximal Renyi space, and this is a concept more clearly distinct from that of a σ-finite measure space. Convergence concepts for Renyi spaces can be studied directly, and initial work on this has been done by Renyi [1970a].
He shows in particular that any conditional measure can be obtained as a limit of conditional measures corresponding to probability measures in a reasonable topology. The study of convergence concepts exemplifies an important difference between σ-finite measures and C-measures. This is left for the future.

The distinction between a σ-finite measure space and the corresponding C-measure space can also be seen by analogy with the construction of projective spaces. The projective space P^n(R) as a set is the set of lines through the origin 0 in R^{n+1}. It is hence equal to the set of equivalence classes [x] = {λx | λ ∈ R \ {0}} for x ∈ R^{n+1} \ {0}. The set of C-measures on a measurable space is hence different from the set of σ-finite measures, just as a projective space is different from the space on which it is constructed.

The presented theory is in line with the arguments given by Jeffreys [1939]. He argues that improper priors are necessary to get the Bayesian machine up and running. This point of view can be disputed, but it is indisputable that usage of improper priors flourishes in the statistical literature. There is hence a need for a theory that includes improper priors.

Lindley [1965], apparently in line with the view of the current authors, found that the theory of Renyi is a natural starting point for statistical theory. In the Preface to his classical text on probability he writes:

The axiomatic structure used here is not the usual one associated with the name of Kolmogorov. Instead one based on the ideas of Renyi has been used. The essential difference between the two approaches is that Renyi's is stated in terms of conditional probabilities, whereas Kolmogorov's is in terms of absolute probabilities, and conditional probabilities are defined in terms of them. Our treatment always refers to the probability of A, given B, and not simply to the probability of A.
In my experience students benefit from having to think of probability as a function of two arguments, A and B, right from the beginning. The conditioning event, B, is then not easily forgotten and misunderstandings are avoided. These ideas are particularly important in Bayesian inference where one's views are influenced by the changes in the conditioning event.

Lindley [1965] refers to an earlier German edition of the book cited here [Renyi, 1962]. The two books [Renyi, 1970b,a] represent the final view of Renyi regarding conditional probability spaces, but the basis for the theory development is found in earlier articles [Renyi and Turan, 1976, Renyi, 1955]. The extension given by conditioning on σ-finite statistics and σ-finite σ-fields is not treated by Renyi.

The structure theorem shows in general that a family of conditional probabilities that satisfies the axioms of a Renyi space given in Definition 6 can be extended so that it corresponds to a unique maximal Renyi space, which can be identified with a C-measure space. The family of conditional probabilities gives intuitive motivation and interpretation for usage of improper laws in probability and statistics. In this theory any marginal law corresponds to a conditional probability space. All probabilities are conditional probabilities.

References
James O. Berger, José M. Bernardo, and Dongchu Sun. The Formal Definition of Reference Priors. The Annals of Statistics, 37(2), 2009.

Ákos Császár. Sur la structure des espaces de probabilité conditionnelle. Acta Mathematica Academiae Scientiarum Hungarica, 6:337-361, 1955.

A. P. Dawid, M. Stone, and J. V. Zidek. Marginalization Paradoxes in Bayesian and Structural Inference. Journal of the Royal Statistical Society Series B-Statistical Methodology, 35(2):189-233, 1973.

P. Halmos and L. J. Savage. Application of the Radon-Nikodym theorem to the theory of sufficient statistics. Annals of Mathematical Statistics, 20:225-241, 1949.

J. Hartigan. Bayes theory. Springer, 1983.

H. Jeffreys. Theory of probability (1966 ed). Oxford, third edition, 1939.

K. G. Jöreskog. A general method for analysis of covariance structures. Biometrika, 57(2):239-251, August 1970.

A. Kolmogorov. Foundations of the theory of probability. Chelsea edition (1956), second edition, 1933.

Michael L. Lavine and James S. Hodges. On Rigorous Specification of ICAR Models. The American Statistician, 66(1):42-49, 2012.

D. V. Lindley. Introduction to Probability and Statistics from a Bayesian Viewpoint (Cambridge 2008 reprint), volume I-II. Cambridge University Press, 1965.

A. Renyi. Wahrscheinlichkeitsrechnung (Probability theory 1970). Deutscher Verlag der Wissenschaften, Berlin, 1962.

A. Renyi. Foundations of Probability. Holden-Day, 1970a.

A. Renyi and P. Turan. Selected papers of Alfred Renyi, volume I-III. Akademiai Kiado, Budapest, 1976.

Alfred Renyi. On a new axiomatic theory of probability. Acta Mathematica Hungarica, 6(3):285-335, September 1955.

Alfred Renyi. Probability Theory. Dover Publications, dover ed (2007) edition, May 1970b.

H. L. Royden. Real Analysis. Macmillan, third edition, 1989.

W. Rudin. Real and Complex Analysis. McGraw-Hill, 1987.

M. J. Schervish. Theory of Statistics. Springer, 1995.

M. Stone and A. P. Dawid. Un-Bayesian Implications of Improper Bayes Inference in Routine Statistical Problems. Biometrika, 59(2):369-375, 1972.

G. Taraldsen and B. H. Lindqvist. Improper Priors Are Not Improper. The American Statistician, 64(2):154-158, 2010.

G. Taraldsen and B. H. Lindqvist. Fiducial theory and optimal inference. Annals of Statistics, 41(1):323-341, 2013a.

G. Taraldsen and B. H. Lindqvist. Fiducial and Posterior Sampling. Communications in Statistics: Theory and Methods (accepted), 2013b.

H. L. Van Trees.