arXiv [math.LO]

Sets and Probability∗

Hazel Brickhill and Leon Horsten

March 21, 2019
Abstract
In this article the idea of random variables over the set theoretic universe is investigated. We explore what it can mean for a random set to have a specific probability of belonging to an antecedently given class of sets.

Probabilistic notions have been applied to mathematical objects and notions. For instance, probabilistic concepts have been applied in the theory of random graphs [Alon et al 2000]. The aim of this article is to apply a notion of probability to the mathematical universe as a whole. More specifically, we wish to explicate what it could mean for a property $A$ of sets to have a probability of being true of a set $y$ in the set theoretic universe $V$.
Properties are identified with their extensions, so that $A$ ranges over all proper and improper classes in $V$.

The aim is to develop a theory of the probability of events of the form $A(\tau)$, where $A$ is a class and the variable $\tau$ is a random variable. The state space of the random variables is of course $V$. The outcome space of the random variables has to be at least as large as $V$ because there must be enough states for a random variable to take each set as a possible value. On the other hand, there is no need for it to be larger than $V$. Therefore the outcome space is simply identified with $V$.

∗ Versions of this paper have been presented at a Bristol–Leuven workshop on Logic and Philosophy of Science (2015), at the Philosophy Department of the Universidade Federal do Rio Grande do Norte (2015), the Fourth Reasoning Conference in Manchester (2015), the Philosophy of Mathematics Seminar in Oxford (2014), and at the Philosophy Departmental Research Seminar in Aberdeen (2014). We are grateful to the audiences for helpful comments, questions, and suggestions. In this respect we are especially indebted to Philip Welch, George Wilmers, and Sylvia Wenmackers.

Without invoking a fixed set of postulates, intuitions about probability have occasionally been used in set theory, for instance to motivate new basic principles [Freiling 1986]. However, such attempts are mostly regarded as unsuccessful [Hamkins 2015]. In the light of this it is natural to wonder what we should require from probability functions associated with random variables on $V$.

Surely it would be unreasonable to insist on there being one unique correct probability function that yields the probability of a random variable taking a value in a given class of sets.
On the other hand, for our functions to have any hope of meriting the label probability function, they have to satisfy Kolmogorov's conditions for being a finitely additive probability function.

From the outset we impose additional constraints on the class of probability functions that we are interested in:

1. Totality. The probability functions are defined on all classes.
2. Uniformity. All singleton events are given the same probability.
3. Regularity. All singleton events are given non-zero probability.

(For a discussion of these constraints in the context of non-Archimedean probability theory, see [Benci et al 2018].)

All this means, for familiar reasons, that the sought-for probability functions cannot be Kolmogorov probability functions. Given our insistence on finite additivity, this means that the probability functions will be non-Archimedean. They will not satisfy $\sigma$-additivity, but they will instead satisfy a generalised infinite additivity rule.

In mathematics today, the term 'probability' has become virtually synonymous with 'function that satisfies the Kolmogorov axioms (including $\sigma$-additivity)'. If you see matters this way, then you will be loath to dignify the functions constructed in this paper by the term 'probability function'. Nonetheless, you may ask the question whether a fine-grained quantitative theory of possibility, with which the degree of possibility of […] quantitative theory of possibility. You are then advised to replace all occurrences of '(non-standard) probability function' by 'quantitative possibility function'.

The project in which we are engaging in this article is related to the work in [Benci et al 2007]. The aim of the latter article is to construct a theory of sizes for mathematical universes inspired by the Euclidean principle that the size of the whole is larger than the sizes of its proper parts. Now there is of course a familiar theory of size, Cantor's theory of cardinality, which does not satisfy this Euclidean principle. So Benci and his co-authors propose their Euclidean theory of size as a rival to Cantor's theory.

We, on the other hand, fully accept Cantor's theory of cardinality. Nonetheless, the probability functions that will be constructed satisfy the Euclidean principle that the probability of an event is strictly greater than the probability of each of its proper sub-events. Moreover, the mathematical techniques for generating them are closely related to the techniques that are used in [Benci et al 2007].

What we shall mean by 'mathematical universe' is not the same as what is meant in [Benci et al 2007] by the term. The authors of [Benci et al 2007] impose mainly algebraic constraints on what counts as a mathematical universe [Benci et al 2007, Introduction]. We, in contrast, take the term 'mathematical universe' in the set theoretical sense. Naively, you may take there to be one preferred set theoretic universe: $V$. But if you are uncomfortable with taking $V$ as given, then you might want to take a mathematical universe to be a rank $V_\alpha$ that constitutes a model of most or perhaps even all of the standard principles of set theory. Indeed, we will see that for random variables defined on any large set $S$, the general idea of equipping them with a probability function will be the same as that for random variables on $V$.

We will discuss two ways of generating non-Archimedean probability functions for random variables on $V$. In section 2 a simple way of generating such probability functions (the finite snapshot approach) will be described. In section 3 we go on to discuss how global properties of these probability functions can be made to hold by imposing constraints on the process of generating such functions.
In section 4, a theoretically more satisfying but also more complicated way of generating non-Archimedean probability functions for random variables on $V$ is discussed (the bootstrapping method).

A random variable $\tau$ on $V$ is a function from states to the outcome space, i.e., an element of $V^V$. So there are many random variables on $V$. The aim is to associate a notion of probability with elements of $V^V$ that meet the minimal constraints (totality, uniformity and regularity) that were described in section 1.

In fact, we want to give precise meaning to conditional probability statements of the form
$$\Pr(\sigma \in A \mid \tau \in B),$$
where $\sigma, \tau \in V^V$ and $A, B \subseteq V$. But we will see that it will be sufficient for our purposes to give meaning to unconditional probability statements of the form $\Pr(\sigma \in A)$. So our fundamental problem amounts to giving meaning to expressions of the form $\Pr(\sigma \in A)$. Such probability measures will be determined by a choice of a fine ultrafilter on the collection $[V]^{<\omega}$ of finite subsets of the state space.

The starting point is a fine ultrafilter $U$ on $[V]^{<\omega}$. This fine ultrafilter $U$ defines a non-Archimedean field $F_U$ in the following way. (What follows is an adaptation of the approach of [Brickhill et al 2018, section 2].)

For any two functions $f, g : [V]^{<\omega} \to \mathbb{Q}$ we define:

Definition 1 $f \approx_U g \equiv \{T \in [V]^{<\omega} : f(T) = g(T)\} \in U$.

In words: two functions are identified if they coincide on ultrafilter-many finite sets of states.

The relation $\approx_U$ is an equivalence relation, so we can take equivalence classes for which we then have
$$[f]_U = [g]_U \Leftrightarrow f \approx_U g.$$
Moreover, it is again a routine exercise to verify that the $[f]_U$'s form a hyper-rational field $F_U$.

Now suppose $A \subseteq V$ and $\theta \in V^V$. Then we define the function $f_{\theta \in A} : [V]^{<\omega} \to \mathbb{Q}$ as follows:

Definition 2 For every $T \in [V]^{<\omega}$:
$$f_{\theta \in A}(T) \equiv \frac{|\{s \in T : \theta(s) \in A\}|}{|T|}.$$

In words: for every finite set of states $T$, $f_{\theta \in A}(T)$ is the ratio between the number of states $s$ in $T$ for which $\theta(s) \in A$ and the number of states in $T$. In this sense, $f_{\theta \in A}(T)$ is the probability of $\theta \in A$ on a finite snapshot of states.

Similarly, we define the function $f_{\theta \in A \wedge \nu \in B}$ as follows:

Definition 3 For every $T \in [V]^{<\omega}$:
$$f_{\theta \in A \wedge \nu \in B}(T) \equiv \frac{|\{s \in T : \theta(s) \in A \text{ and } \nu(s) \in B\}|}{|T|}.$$

Now we are ready to define the probability of $\theta \in A$, relative to a fine (and therefore free) ultrafilter $U$ on $[V]^{<\omega}$:

Definition 4 $\Pr_U(\theta \in A) \equiv [f_{\theta \in A}]_U$.

Similarly, we define $\Pr_U(\theta \in A \wedge \nu \in B)$ as $[f_{\theta \in A \wedge \nu \in B}]_U$. Thus we have constructed a probability function $\Pr_U$ that takes its values in the hyper-rational field $F_U$. Such probability functions are sometimes called NAP functions.

Conditional probability can then be expressed in terms of unconditional probability:

Definition 5
$$\Pr_U(\theta \in A \mid \nu \in B) \equiv \frac{\Pr_U(\theta \in A \wedge \nu \in B)}{\Pr_U(\nu \in B)}.$$

From section 1 we know that the aim is not to arrive at a unique (correct) probability function on $V$. But we did insist from the outset on our probability functions satisfying three global constraints: totality, uniformity, and regularity. It will be shown that these properties are always guaranteed to hold.

There are further global conditions on probability functions on $V$ that seem reasonable to require, and that are not guaranteed to hold without further work. These global constraints will be explored. We will show that many of them can be forced to hold by imposing constraints on the ultrafilters from which the probability functions are generated.

The definition of $\Pr_U$ is relative to an initial choice of the fine ultrafilter $U$. The properties of $\Pr_U$ depend on $U$. Nonetheless, certain basic properties of $\Pr_U$ can be easily seen to hold regardless of which fine ultrafilter $U$ is chosen:

Proposition 1
1. $\Pr_U$ is a finitely additive probability function;
2. $\Pr_U$ is Euclidean.

Proof. Easy.
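The snapshot ratio of Definition 2 can be computed directly on a single finite set of states. The following is a minimal sketch under stated assumptions: states and outcomes are modelled as small natural numbers, the names `snapshot_probability`, `theta`, `is_even` and `is_small_odd` are hypothetical, and the empty snapshot (where the ratio is undefined) is rejected. It only illustrates the values $f_{\theta \in A}(T)$; the ultrafilter $U$ itself is non-constructive and is not modelled.

```python
from fractions import Fraction


def snapshot_probability(theta, in_A, T):
    """Return f_{theta in A}(T): the fraction of states s in T with theta(s) in A."""
    T = list(T)
    if not T:
        # Assumption of this sketch: the ratio is undefined on the empty snapshot.
        raise ValueError("a snapshot must be non-empty")
    hits = sum(1 for s in T if in_A(theta(s)))
    return Fraction(hits, len(T))


# Hypothetical toy setting: the identity map takes every value exactly once.
theta = lambda s: s
is_even = lambda x: x % 2 == 0
is_small_odd = lambda x: x % 2 == 1 and x < 5

T = {0, 1, 2, 3, 4, 7}
p_even = snapshot_probability(theta, is_even, T)
p_small_odd = snapshot_probability(theta, is_small_odd, T)
p_union = snapshot_probability(
    theta, lambda x: is_even(x) or is_small_odd(x), T
)
```

Because the two events are disjoint, the ratio of their union on this snapshot is the sum of the two ratios. This snapshot-wise identity is what finite additivity in Proposition 1 reduces to once one passes to equivalence classes modulo $U$.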
Now we define the notion of a diagonal random variable:

Definition 6 A random variable $\theta$ is said to be a diagonal random variable if for any set $x$, there is exactly one element $u$ of the state space such that $\theta(u) = x$.

In words: a diagonal random variable is a random variable that takes every value exactly once.

Using this notion, we define the notions of regularity and uniformity:

Definition 7 (regularity) A probability function $\Pr_U$ is regular if for every diagonal random variable $\theta$ and for every $x \in V$, $\Pr_U(\theta = x) > 0$.

Definition 8 (uniformity) A probability function $\Pr_U$ is uniform if for every diagonal random variable $\theta$ and for all $x, y \in V$: $\Pr_U(\theta = x) = \Pr_U(\theta = y)$.

Proposition 2 For every fine ultrafilter $U$:
1. $\Pr_U$ is regular;
2. $\Pr_U$ is uniform.

Proof. These properties are proved as propositions 2.5 and 2.6 in [Brickhill et al 2018, p. 525–526].

The Euclidean property is formally defined as follows:

Definition 9 (Euclidean) A probability function $\Pr_U$ is Euclidean if for every diagonal random variable $\theta$ and all $A, B \subseteq V$:
$$A \subsetneq B \Rightarrow \Pr_U(\theta \in A) < \Pr_U(\theta \in B).$$

Then we have:

Proposition 3 For every fine ultrafilter $U$, the probability function $\Pr_U$ is Euclidean.

Proof. By finite additivity and regularity.
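The Euclidean property can be seen at the level of snapshots. If $A \subsetneq B$ and $u$ is the state that a diagonal $\theta$ sends to some element of $B \setminus A$, then every snapshot containing $u$ gives $A$ a strictly smaller ratio than $B$, and the collection of snapshots containing $u$ belongs to every fine ultrafilter. The sketch below checks this exhaustively on a tiny universe. The six-element universe, the identity random variable and the helper names `ratio` and `snapshots_containing` are assumptions made for illustration only.

```python
from fractions import Fraction
from itertools import combinations


def ratio(event, T):
    """Snapshot ratio |event ∩ T| / |T| for the identity random variable."""
    return Fraction(len(event & T), len(T))


def snapshots_containing(u, universe):
    """Yield every finite snapshot of `universe` that contains the state u."""
    rest = [s for s in universe if s != u]
    for r in range(len(rest) + 1):
        for combo in combinations(rest, r):
            yield frozenset(combo) | {u}


universe = range(6)          # hypothetical stand-in for the state space
A = frozenset({0, 2})
B = frozenset({0, 2, 3})     # A is a proper subset of B
witness = 3                  # the element of B \ A

# On every snapshot containing the witness, A has a strictly smaller ratio.
strict_everywhere = all(
    ratio(A, T) < ratio(B, T) for T in snapshots_containing(witness, universe)
)

# On a snapshot missing the witness the two ratios may coincide.
T_without = frozenset({0, 1})
tie = ratio(A, T_without) == ratio(B, T_without)
```

Snapshots that miss the witness can produce ties, which is why fineness of $U$ matters: it guarantees that such snapshots do not form a set in $U$.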
Now we turn to infinite additivity. Countable additivity means that the probability of the union of a countable family of disjoint sets is the infinite sum of the probabilities of the elements of the family, where the notion of infinite sum is spelled out in terms of the classical notion of limit. In the present setting, the probability $\Pr_U$ of the union of any family of disjoint sets is also the infinite sum of the probabilities of the elements of the family [Benci et al 2013, section 3.4]. But now the notion of infinite sum is spelled out in terms of the generalised notion of limit based on the ultrafilter $U$. More precisely, the new notion of infinite sum is defined as follows. Suppose we are given a family $\{q_i : i \in \mathbb{N}\}$ of rational numbers, and $I \subseteq \mathbb{N}$. Then consider the function $f : [\mathbb{N}]^{<\omega} \to \mathbb{Q}$ given by
$$f(T) = \sum_{i \in I \cap T} q_i.$$
This function can be seen as giving the value of the infinite sum on all finite parts ("snapshots") of the index set. So we identify the infinite sum of the family $\{q_i : i \in I\}$ of rational numbers with the generalised limit of $f$ according to the ultrafilter $U$:

Definition 10 $\sum^{*}_{i \in I} q_i \equiv [f]_U$.

Using this notion of infinite sum, we can express the probability of the union of a disjoint family of sets as the sum of the probabilities of the members of that family:

Proposition 4 If $A = \bigcup_{i \in I} A_i$, with $A_i \cap A_j = \emptyset$ for all distinct $i, j \in I$, then for every random variable $\tau$:
$$\Pr_U(\tau \in A) = \sum^{*}_{i \in I} \Pr_U(\tau \in A_i).$$

In sum, $\Pr_U$ has a natural infinite additivity property that is sometimes called perfect additivity.

Proposition 5 For every fine ultrafilter $U$, the probability function $\Pr_U$ is perfectly additive.

Proof. This proposition is proved as proposition 8 in [Benci et al 2013, p. 132–133].
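The function $f$ behind Definition 10 is an ordinary finite sum on each snapshot of the index set, so its values are easy to tabulate. The sketch below evaluates $f(T) = \sum_{i \in I \cap T} q_i$ for a hypothetical family $q_i = 2^{-(i+1)}$. The family, the index predicates and the name `snapshot_sum` are illustrative assumptions, and the generalised limit $[f]_U$ itself is not computed, since it depends on a non-constructive choice of $U$.

```python
from fractions import Fraction


def snapshot_sum(q, in_I, T):
    """Return f(T): the sum of q(i) over those i in the snapshot T with i in I."""
    return sum((q(i) for i in T if in_I(i)), Fraction(0))


# Hypothetical family of rationals: q_i = 2^-(i+1).
q = lambda i: Fraction(1, 2 ** (i + 1))
all_indices = lambda i: True
even_indices = lambda i: i % 2 == 0

T = {0, 1, 2}
total_on_T = snapshot_sum(q, all_indices, T)     # q_0 + q_1 + q_2
evens_on_T = snapshot_sum(q, even_indices, T)    # q_0 + q_2
empty_case = snapshot_sum(q, all_indices, set())
```

On initial segments $T = \{0, \ldots, n-1\}$ with $I = \mathbb{N}$ this reproduces the classical partial sums $1 - 2^{-n}$. The generalised sum differs from the classical limit in that it also takes into account all other finite snapshots, weighted by $U$.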
From now on, the symbol $\theta$ will be used to refer to some arbitrary diagonal random variable. When it is not assumed that the random variable in question is diagonal, we will write $\tau$.

The Euclidean-ness of $\Pr_U$ has implications for symmetry principles. As a rule of thumb, one can say that symmetry principles fail (see [Benci et al 2007], [Benci et al 2013], [Benci et al 2018]).

Proposition 6 For every fine ultrafilter $U$, the probability function $\Pr_U$ is not invariant under all permutations of $V$.

Proof. We concentrate on $\mathbb{N}$ as it is canonically represented in $V$ (by means of the Zermelo ordinals, for instance). Define a permutation $\pi$ of $V$ as follows:
• $\pi(x) = x$ for $x \in V \setminus \mathbb{N}$;
Otherwise:
• $\pi(x) = x + 2$ for $x$ even;
• $\pi(1) = 0$;
• $\pi(x) = x - 2$ for $x$ odd and $> 1$.
Let $A \equiv \{0, 2, 4, \ldots\}$, and let $\theta$ be a diagonal random variable. Then $\pi(A) \subsetneq A$. Therefore, by the Euclidean principle,
$$\Pr_U(\theta \in \pi(A)) < \Pr_U(\theta \in A).$$

This of course entails that there are diagonal random variables $\theta, \theta'$ such that for some $A \subseteq V$, $\Pr_U(\theta \in A) \neq \Pr_U(\theta' \in A)$.

One popular global constraint on probability measures is translation-invariance. The Lebesgue measure has this property, and Banach limits seem to occupy a privileged position in the class of generalised limits at least in part because they are translation-invariant. In our context, translation-invariance does not make obvious sense. For a random class $A$, it is not clear what '$A + \alpha$' (where $\alpha$ is a number) means. But a clear interpretation of 'adding an ordinal number' can of course be given if $A$ is a collection of ordinals:

Definition 11 For $A$ any collection of ordinals:
$$A \oplus \alpha \equiv \{\beta : \exists \gamma \in A \text{ such that } \beta = \gamma + \alpha\}.$$

Then for $A$ to be translation-invariant means that for all ordinals $\alpha$ and for every $\theta$,
$$\Pr_U(\theta \in A) = \Pr_U(\theta \in A \oplus \alpha).$$

However, even if we consider non-Archimedean measures (of the kind that we have been describing) on ordinals, translation-invariance conflicts with the Euclidean Property of our generalised probability functions. In particular, there is no NAP probability function $\Pr_U$ on any infinite cardinal $\kappa$ such that there is even one ordinal $\alpha$ with $0 < \alpha < \kappa$ and
$$\Pr_U(\theta \in \kappa) = \Pr_U(\theta \in \kappa \oplus \alpha).$$
The reason is simple. We have $\kappa \oplus \alpha \subseteq \kappa \setminus \alpha \subsetneq \kappa$, so if we had $\Pr_U(\theta \in \kappa) = \Pr_U(\theta \in \kappa \oplus \alpha)$, then we would contradict the Euclidean principle.

As this example shows, such translations aren't necessarily one to one, so we may not want full invariance in general. In [Benci et al 2007, section 1.3], Benci, Forti, and Di Nasso explore a restricted notion of translation-invariance of NAP-like measures on ordinals. We do not pursue this theme further here, but only pause to note that there are other reasonable-looking principles that are hard to satisfy. In the context of their theory of numerosities, Benci, Forti, and Di Nasso consider a principle that in the present context would take the following form:

Definition 12 (Difference Principle)
$$\forall A, B \in V : \Pr_U(\theta \in A) < \Pr_U(\theta \in B) \Rightarrow \exists C \in V : \Pr_U(\theta \in B) = \Pr_U(\theta \in A) + \Pr_U(\theta \in C).$$

On countable sample spaces, the difference principle can be made to hold by building $\Pr_U$ from a selective ultrafilter [Benci et al 2003]. But the existence of selective ultrafilters is independent of ZFC. As far as we know, it is an open question whether the difference principle can be consistently made to hold for NAP probability functions on uncountable sample spaces.

In this (sub-)section we investigate the relation between our notion of generalised probability on the one hand, and the familiar notion of cardinality on the other hand.
One might naively wonder whether the following probabilistic analogue of Hume's Principle for cardinality can hold:

Definition 13 (Hume's principle for probability) For all $A, B \in V$:
$$|A| = |B| \Rightarrow \Pr_U(\tau \in A) = \Pr_U(\tau \in B).$$

But the probability functions $\Pr_U$ that we have been considering cannot satisfy Hume's principle for probability, as its failure is an immediate consequence of Proposition 6: invariance under permutations and Hume's principle for probability are mathematically equivalent. However, this was only to be expected. After all, we do not expect Kolmogorov probability (on infinite spaces) to satisfy any such principle.

The hyper-rational field $F_U$ in which the probability functions $\Pr_U$ take their values contains infinitesimal numbers; this is what makes it non-Archimedean. We will write $\Pr_U(\sigma \in A) \approx 0$ if $\Pr_U(\sigma \in A) < n^{-1}$ for each $n \in \mathbb{N}$. And we will write $\Pr_U(\sigma \in A) \ll \Pr_U(\tau \in B)$ if
$$\frac{\Pr_U(\sigma \in A)}{\Pr_U(\tau \in B)} \approx 0.$$

$\Pr_U$ cannot satisfy Hume's principle for probability. But, at least at first sight, it seems that it would be reasonable to demand:
$$|A| < |B| \Rightarrow \Pr_U(\delta \in A) < \Pr_U(\delta \in B).$$
Indeed, if in addition $|B| \geq \omega$, then we might even expect
$$|A| < |B| \Rightarrow \Pr_U(\sigma \in A) \ll \Pr_U(\sigma \in B).$$
Further, this may be expected to hold if $B$ is a proper class but $A$ is a set. The result is a size constraint which is a strengthening of the requirement of regularity:

Definition 14 (Superregularity)
$$\omega \leq |A| < |B| \leq |V| \Rightarrow \Pr_U(\theta \in A) \ll \Pr_U(\theta \in B).$$

Note that if $A$ is finite and $B$ is infinite then the consequent holds automatically.

By a suitable restriction on admissible ultrafilters $U$, superregularity can indeed be made to hold:

Theorem 1
There are fine ultrafilters $U$ such that $\Pr_U$ is superregular.

Proof. If $A, B \in V$ such that $\omega \leq |A| < |B|$ are given, then we have $\Pr_U(\theta \in A) \ll \Pr_U(\theta \in B)$ if and only if for each $n \in \mathbb{N}$,
$$\left\{ D \in [V]^{<\omega} : \frac{\Pr(\theta \in A \mid \theta \in D)}{\Pr(\theta \in B \mid \theta \in D)} \leq n^{-1} \right\} \in U.$$
The aim is to build an ultrafilter $U$ for which this holds.

For any $n \in \mathbb{N}$, define
$$C^{n}_{AB} \equiv \left\{ D \in [V]^{<\omega} : \frac{\Pr(\theta \in A \mid \theta \in D)}{\Pr(\theta \in B \mid \theta \in D)} \leq n^{-1} \right\}.$$
Moreover, let $A_x \equiv \{D \in [V]^{<\omega} : x \in D\}$. Define also
$$\mathcal{F} \equiv \{C^{n}_{AB} : n \in \mathbb{N}, |A| < |B|\} \cup \{A_x : x \in V\}.$$

We want to prove that $\mathcal{F}$ has the finite intersection property. Therefore take any $x_1, \ldots, x_k \in V$, and any $\langle A_1, B_1, n_1 \rangle, \ldots, \langle A_l, B_l, n_l \rangle$ such that $|A_j| < |B_j|$ and $n_j \in \mathbb{N}$ for $j \leq l$. Assume for the construction that $|A_1| \leq |A_2| \leq \cdots \leq |A_l|$.

For every finite $D$, if $\{x_1, \ldots, x_k\} \subseteq D$, then $D \in \bigcap_{i \leq k} A_{x_i}$. So setting $n = \max\{n_j : j \leq l\}$ we will extend $\{x_1, \ldots, x_k\}$ to a set in $C^{n}_{A_j B_j}$, and hence in $C^{n_j}_{A_j B_j}$, for each $j \leq l$. Set $F_0 = \{x_1, \ldots, x_k\}$ and $a_1 = |F_0 \cap A_1|$. As $B_1$ is infinite and of larger cardinality than $A_1$, we add $n \cdot a_1$ elements of $B_1 \setminus A_1$ to $F_0$, yielding a finite set $F_1$. Now set $a_2 = |F_1 \cap A_2|$, and add $n \cdot a_2$ elements of $B_2 \setminus (A_1 \cup A_2)$ to $F_1$ to give $F_2$. Note we can find these elements of $B_2$ as $|B_2| > |A_2| \geq |A_1|$. Continuing in this manner, set $F = F_l$. Then we have ensured that for all $j \leq l$
$$\frac{\Pr(\theta \in A_j \mid \theta \in F)}{\Pr(\theta \in B_j \mid \theta \in F)} \leq n^{-1},$$
and so we have $F \in C^{n}_{A_j B_j}$, and since $\{x_1, \ldots, x_k\} \subseteq F$, we also have $F \in \bigcap_{i \leq k} A_{x_i}$.

So $\mathcal{F}$ indeed has the finite intersection property, whereby it can be extended to a filter and then further to an ultrafilter $U$. By design, then, the resulting probability function $\Pr_U$ is super-regular.

Once again, Hume's Principle for probability cannot hold for the notion of probability that we are investigating. But this leaves open the question whether the converse of Hume's Principle for probability can be made to hold. This is called Cantor's Principle in [Benci et al 2007], where the authors investigate it in the context of their Euclidean theory of size:

Definition 15 (Cantor's Principle)
$$\Pr_U(\theta \in A) = \Pr_U(\theta \in B) \Rightarrow |A| = |B|.$$

Benci, Forti, and Di Nasso prove that 'Cantor's Principle' can be made to hold [Benci et al 2007, section 3.2]. It is also clear that Cantor's Principle follows from super-regularity.

3.3.3 The power set principle

The question whether
$$\forall A, B \in V : |A| < |B| \Rightarrow |\mathcal{P}(A)| < |\mathcal{P}(B)|$$
is true, is independent of the axioms of set theory. (Of course the principle is true if the Generalised Continuum Hypothesis holds.) Like the cardinality operator, our NAP probability functions are measures of some kind. One might wonder what should follow from $\Pr_U(\theta \in A) < \Pr_U(\theta \in B)$. In particular, given that $\Pr_U$ is intended to be a fine-grained quantitative possibility measure, perhaps probability should be expected to co-vary with the power set operation in some fairly direct manner. In other words, it is natural to ask if the following principle can be made to hold:

Definition 16 (Power Set Condition)
$$\forall A, B \in V : \Pr_U(\theta \in A) < \Pr_U(\theta \in B) \Leftrightarrow \Pr_U(\theta \in \mathcal{P}(A)) < \Pr_U(\theta \in \mathcal{P}(B)).$$

It turns out that the power set condition can indeed be satisfied:

Theorem 2
There are fine ultrafilters $U$ such that $\Pr_U$ satisfies the power set condition.

The argument for this is somewhat more involved.

We aim to prove Theorem 2 by building the probability function up from an ultrafilter $U$ which is based on a pre-filter $\mathcal{C} \subseteq \mathcal{P}([V]^{<\omega})$ that has the finite intersection property.

The class $\mathcal{C}$ is built up in stages, and in such a way that it eventually witnesses the truth of the power set condition for all $A, B \in V$.

Stage 0

The class $\mathcal{C}_0$ consists of all
$$A_x \equiv \{a \in [V]^{<\omega} : x \in a\},$$
for $x \in V$. This is to ensure that the ultrafilter that will be built from $\mathcal{C}$ is fine. We know that $\mathcal{C}_0$ has the finite intersection property.

Limit stages
For limit stages λ , we simply set C λ ≡ S β < λ C β .13 uccessor stages Given fine-ness, we may, and will, ignore the elements of V ω . At stage α > ω , where α is a successor ordinal, we consider the sets of V α \ V α − andensure that the power set condition eventually holds for all these sets andtheir power sets, by adding families of finite sets to C α − in such a waythat the finite intersection property is preserved.As an illustrative and indeed representative example we do the casewhere α = ω + { A , B } , . . . , { A β , B β } , . . . of thepairs of elements of V ω + \ V ω .For the induction, we assume that, by having added appropriate setsof finite sets to C , the power set condition holds for { A , B } , . . . , { A β , B β } and their power sets, and that in the process the finite intersection prop-erty has been preserved. The aim is now to extend this so that it also holdsfor { A β + , B β + } . In other words, we have constructed C β , and we wantto obtain C β + , where C ≡ C . Definition 17 C A < B ≡ { D ∈ [ V ] < ω : | A ∩ D || D | < | B ∩ D || D | } . Definition 18 C A ≥ B ≡ { D ∈ [ V ] < ω : | A ∩ D || D | ≥ | B ∩ D || D | } . Claim
Either C β ∪ { C A β < B β } has the finite intersection property, or C β ∪ { C A β ≥ B β } has the finite intersection property (or both). Proof
Suppose not. Then there is a finite intersection F of elements of C β suchthat F ∩ C A β < A β = ∅ , and there is a finite intersection F ′ of elements of C β such that F ′ ∩ C A β ≥ B β = ∅ . But then ( F ∩ F ′ ) ∩ C A β < B β = ∅ and ( F ∩ F ′ ) ∩ C A β ≥ B β = ∅ . But C A β < B β ∪ C A β ≥ B β = [ V ] < ω . So then ( F ∩ F ′ ) = ∅ . Butthis contradicts the inductive assumption that C β has the finite intersectionproperty. 14hus define C β + to be C β ∪ { C A β < B β } if this has the finite intersectionproperty, or C β ∪ { C A β ≥ B β } otherwise, and by the claim, C β + has the finiteintersection property. Now setting C − ≡ S β C β , we may conclude that C − has the finite intersection property.At this point we must extend C − by adding to C − : • every set of the form C P ( A ) < P ( B ) such that C A < B ∈ C − ; • every set of the form C P ( A ) ≥P ( B ) such that C A ≥ B ∈ C − .Call the resulting set C . Our aim is to prove that C has the finite intersec-tion property.Consider an arbitrary non-empty finite family F ⊆ C . Without loss ofgenerality we may assume that the ‘judgements’ in F of the form C P ( A ) < P ( B ) or C P ( A ) ≥P ( B ) , taken together, describe a finite total pre-ordering relation R on some set {P ( A ) , . . . , P ( A k ) } . Further, we may also assume that forand sets A and B from V ω + \ V ω , C P ( A ) < P ( B ) ∈ F if and only if C A < B ∈ F ,and C P ( A ) ≥P ( B ) iff C A ≥ B ∈ F . Thus F contains witnesses for all the rele-vant judgements we may be interested in.Let F − = F ∩ C − , so F − consists only of judgements about sets in V ω + \ V ω . Then we know from the foregoing that T F − = ∅ . So takesome F − ∈ T F − . Our plan is inductively to extend F − , using the pre-order R , to a finite set F ∈ T F .We will add to F − elements that ensure that the constraints of R aresatisfied. 
Moreover, by choosing the elements to be added to F − from V ω + \ V ω , we ensure that the constraints imposed by F − remain satisfied.As a result, F will satisfy all constraints from F , so T F 6 = ∅ and hence C has the finite intersection property.As an example, suppose that R = P ( A ) < P ( A ) < P ( A ) = P ( A ) .(1) We start by ensuring that P ( A ) < P ( A ) is satisfied.Suppose that F − already contains n elements of P ( A ) . Since C A < A ∈F , there must be an element x − ∈ A \ A . This implies that there areinfinitely many infinite sets x in P ( A ) \P ( A ) such that x − ∈ x : we add n + F − , and call the resulting finite set F − . For later stages we will take these sets from V α + \ V α , i.e. sets of rank α . P ( A ) < P ( A ) is satis-fied:Suppose that F − already contains m elements from P ( A ) , observingthat it may be the case that m > n +
1, for there may already be a finitenumber of elements of P ( A ) in F − . Since C A < A ∈ F , there must be anelement y − ∈ A \ A , and since C A < A ∈ F , there must be an element y − ∈ A \ A . So there are infinitely many infinite sets y in P ( A ) suchthat y − , y − ∈ y : add m + F − , and call the resulting set F − .(3) Now suppose that there are m elements of P ( A ) in F − , and m el-ements of P ( A ) in F − . Moreover, suppose that m < m . (The casewhere m < m is similar.) Since C A ≥ A , C A ≥ A ∈ F , but also A = A ,there must be some x ∈ A \ A and some x ∈ A \ A . Moreover, since C A < A , C A < A ∈ F , there are elements x ∈ A \ A , x ∈ A \ A . So P ( A ) contains infinitely many infinite sets x such that { x , x , x } ⊂ x .Similarly, P ( A ) contains infinitely many infinite sets x that are outside P ( A ) , P ( A ) , P ( A ) . So we add a sufficient number of such elements to F − so that there are an equal number p of “witnesses” for P ( A ) as for P ( A ) but where p is larger than the number of witnesses for P ( A ) . Callthe resulting set F − .(4) To conclude, we set F ≡ F − . It is clear that F ∈ T F .This procedure of extending F − easily generalises to any finite totalpre-ordering on {P ( A ) , . . . , P ( A k ) } . Thus we have shown that C hasthe finite intersection property.This procedure for extending C to C while preserving the finite in-tersection property also works for larger successor ordinals: at level V α + (stage β + α = ω + β ) we can extend the corresponding F − usingsubsets of rank α . As we have said above, at limit stages we can simplytake unions. Ultimately we set C ≡ S α ∈ On C α .The class C will then have the finite intersection property, so it can beextended to a filter and then to an ultrafilter U . 
The probability function based on $U$ will make the power set condition true for all $A, B \in V$, and this concludes the proof of Theorem 2.

Our proof actually shows something slightly stronger: for all $A, B$ with $|A|, |B| \geq \omega$, we have
$$\Pr_U(\theta \in A) < \Pr_U(\theta \in B) \Leftrightarrow \Pr_U(\theta \in \mathcal{P}(A)) \ll \Pr_U(\theta \in \mathcal{P}(B)).$$
The reason is that in enlarging the set $F^{-}$ we always have infinitely many elements to choose from.

For any probability measure $\Pr_U$ that satisfies the power set condition we also have that
$$\forall A, B \in V, \forall n \in \omega : \Pr_U(\theta \in A) < \Pr_U(\theta \in B) \Leftrightarrow \Pr_U(\theta \in \mathcal{P}^{n}(A)) < \Pr_U(\theta \in \mathcal{P}^{n}(B)),$$
where $\mathcal{P}^{n}(A) = \mathcal{P}(\mathcal{P}(\ldots \mathcal{P}(A) \ldots))$. An easy argument shows this cannot extend to infinite applications of the power set operation.

One might wonder whether the motivations behind the power set condition should not also support imposing the following restricted power set condition on $\Pr_U$:

Question 1 Are there probability measures such that
$$\forall A, B \in V : \Pr_U(\theta \in A) < \Pr_U(\theta \in B) \Leftrightarrow \Pr_U(\theta \in [A]^{<\omega}) < \Pr_U(\theta \in [B]^{<\omega})?$$
(Thanks to Philip Welch for this question.)

For $\alpha \geq \omega$, in each level $V_{\alpha+1} \setminus V_\alpha$ of the iterative hierarchy one finds only one ordinal, but infinitely many sets that are not ordinals. This might lead one to believe that a probability function on $V$ should satisfy
$$\Pr_U(\theta \in \mathrm{On}) \approx 0.$$
[…] (see [Wenmackers et al 2013, section 6.2]), it seems reasonable to require that
$$\Pr_U(\theta \in \mathrm{Even} \mid \theta \in \mathrm{On}) \approx \tfrac{1}{2},$$
where 'Even' is the class of even ordinals, which is defined in the obvious way. […]
$$\Pr_U(\theta \in \mathrm{Lim} \mid \theta \in \mathrm{On}) \approx 0.$$

Theorem 3 There are super-regular probability functions $\Pr_U$ such that:
1. $\Pr_U(\theta \in \mathrm{On}) \approx 0$;
2. $\Pr_U(\theta \in \mathrm{Even} \mid \theta \in \mathrm{On}) \approx 2^{-1}$;
3. $\Pr_U(\theta \in \mathrm{Lim} \mid \theta \in \mathrm{On}) \approx 0$.

Proof.
As before, the aim is wisely to choose the ultrafilter U on which Pr U isbased. We want U to be such that for all k , l , m ∈ N : • Pr U ( θ ∈ A ) Pr U ( θ ∈ B ) ≤ k − if ω ≤ | A | < | B | ; • Pr U ( θ ∈ Even | θ ∈ On ) − Pr U ( θ ∈ Odd | θ ∈ On ) ≤ l − and Pr U ( θ ∈ Lim | θ ∈ On ) ≤ l − ; • Pr U ( θ ∈ On ) ≤ m − . Now we define: • A x ≡ { D ∈ [ V ] < ω : x ∈ D } ; • C kAB ≡ { D ∈ [ V ] < ω : Pr [ A | D ] Pr [ B | D ] ≤ k − } ; • I l ≡ { D ∈ [ V ] < ω : ∀ α ∈ D ∃ β ∃ n ≥ l ( α ∈ [ β , β + n ] ⊆ D ) } ; • W m ≡ { D ∈ [ V ] < ω : Pr [ On | D ] ≤ m − } .18 nd now we set: F ≡ { A x , C kAB , I l , W m : x ∈ V , k , l , m ∈ N and ω ≤ | A | < | B |} Claim: F has the finite intersection property.Let some x , . . . , x n be given. Now T i ≤ n I l i = I l where l = max { l i : i < n } ,and similarly for T i ≤ n W m i , so as before in theorem 1, it suffices to concentrateon the highest values of k , l , m.(1) A ∈ T i ≤ n A x i ⇔ { x , . . . , x n } ⊆ A . So we start with the finite set A ≡{ x , . . . , x n } , and will extend it.(2) Again we concentrate on one pair h A , B i such that ω ≤ | A | < | B | ; we leaveout further cases as they are similar. There are arbitrarily large finite subsetsC ⊆ B that are l-isolated from elements of A , meaning that each ordinal in C ismore than l ordinals removed from any ordinal in A. We choose any such C ⊆ Bthat is of size at least k · n, and we set A ≡ A ∪ C.(3) Now we extend A to ensure that all ordinal intervals are of length ≥ l: foreach α ∈ A , we add α +
1, . . . , α + l. Call the resulting finite collection A .Note that by our choice of l-isolated elements in (2), none of α +
1, . . . , α + l areelements of A.(4) Let | A | = j. Then we add j · m elements of V \ ( A ∪ B ∪ On ) to A and callthe resulting set A .It is now routine to verify that A ∈ T i ≤ n A x i ∩ C kAB ∩ I l ∩ W m . The caseincluding further sets C kA ′ B ′ is similar, thus the claim is verified. So F indeedhas the finite intersection property, whereby it can be extended to a filter and thenfurther to an ultrafilter U . By design, the resulting probability function Pr U hasthe required properties. The probability Pr U ( θ ∈ A ) is obtained by ‘summing up’ the probabilities Pr ( θ ∈ A | θ ∈ S ) for all ‘small’ parts S of V ; such Pr ( θ ∈ A | θ ∈ S ) areseen as approximations of Pr U ( θ ∈ A ) .In the finite snapshot approach, ‘small’ in this context means ‘finite’.But from a conceptual point of view, ‘finite’ might be taken to be too smallas far as the test sets (or snapshots) are concerned. Compared to V , all sets—and not just the finite sets— are small. So to determine Pr U ( θ ∈ A ) , weshould take the ‘limit’ of the values Pr ( θ ∈ A | θ ∈ S ) , where S is a set of19ny size. Then if S is infinite, Pr ( θ ∈ A | θ ∈ S ) cannot just be taken to begiven by the ratio formula but needs to be defined .In the approach to which we now turn (the bootstrapping approach),a probability Pr U ( θ ∈ A ) is determined by the probabilities Pr U ( θ ∈ A | θ ∈ S ) , where Pr U ( θ ∈ A | θ ∈ S ) , for S a large set, is then in turndetermined by probabilities Pr U ( θ ∈ A | θ ∈ S ′ ) for S ′ being smaller‘snapshots’ than S , and so on, until we reach the finite snapshots and canappeal to the probability functions that were discussed in the previoussections. Thus the bootstrapping account can be seen as a generalisation ofthe finite snapshot approach. In general terms, this is how we will proceed:(1) By the construction from the previous section, a fine ultrafilter on [ S ] < ω yields a notion of probability on all sets S ∈ V with | S | < ω . 
In other words, this yields a suitable notion of probability, call it $\Pr_S$, for every countable set $S$.
(2) The notion of $\Pr_S$ for all $S \in V$ with $|S| < \omega_2$ is determined using the notion of probability on countable sets: the probability of $A$ on such an $S$ is determined by the class of probabilities of $A$ on the countable ‘snapshots’ of $S$. Using these countable probability functions, a fine ultrafilter on $[S]^{<\omega_1}$ gives us a notion of probability on sets $S$ with $|S| < \omega_2$.
Again the resulting functions $\Pr_S$ are essentially NAP-functions as defined in [Benci et al 2013]. They are total, regular, etc.
...
($\beta$) A fine ultrafilter on $[S]^{<\omega_\alpha}$, together with probability functions $\Pr_S$ for all $S$ such that $|S| < \omega_\alpha$, yields a notion of probability on all sets $S$ with $|S| < \omega_{\alpha+1}$.
...
Limit stages of course do not present a problem. So by transfinite recursion on cardinality this yields for every set $S$ a notion $\Pr_S$ of probability on $S$.
Then a fine ultrafilter $U$ on $V = [V]^{<\mathrm{Card}}$ yields, using the general notion $\Pr_S$ for $S \in V$, a notion $\Pr_V$ that is a total (class) function from properties $A$ and random variables $\theta$ to values $\Pr_V(\theta \in A)$ in a non-Archimedean class field. This probability function again satisfies the principles of the theory NAP in [Benci et al 2013].
For this construction, what we need is suitable (fine) ultrafilters on small, somewhat larger, and large, ... sets, and a fine ultrafilter $U$ on $[V]^{<\mathrm{Card}}$. But we will see that all the set ultrafilters used in the construction can be uniformly obtained as restrictions to sets $S$ of the given fine ultrafilter on $[V]^{<\mathrm{Card}}$. So $\Pr_V$ is determined by one initial choice of $U$, whereby $\Pr_V$ can be seen as the ‘limit’ of its set-restrictions $\Pr_S$, where the functions $\Pr_S$ can in turn be seen as ‘limits’ of restrictions to their small subsets.
This uniform construction has the advantage that the resulting probability functions are all coherent, in the sense that for a set $T$, $\Pr_S(A \mid T)$ is the same for all $S \supseteq T$ and hence also for $V$.
Now it is time to look at details of the construction.
Since our construction involves ultrafilters on sets $[S]^{<\kappa}$ with $\kappa > \omega$, we make the following definition, which accords with the usual definition of fineness on $[S]^{<\omega}$.
Definition 19
For any infinite cardinal $\kappa$, an ultrafilter $U$ on $[S]^{<\kappa}$ is fine iff for every $x \in S$: $\{T \in [S]^{<\kappa} : x \in T\} \in U$.
The notion of ‘set-fine’ ultrafilter on $V$ is defined in the obvious way.
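As a finite toy illustration (ours, not part of the construction), the sets $\{T \in [S]^{<\kappa} : x \in T\}$ that a fine ultrafilter must contain can be enumerated for a small $S$. The Python sketch below is only a picture of the mechanism: the names `subsets_smaller_than` and `fineness_generator` are hypothetical, and a finite size bound `k` stands in for the infinite cardinal $\kappa$. In the infinite setting any finitely many points lie in a common member of $[S]^{<\kappa}$; in the toy this holds only while the number of points stays below `k`.

```python
from itertools import combinations


def subsets_smaller_than(S, k):
    """All subsets of S with fewer than k elements (finite stand-in for [S]^{<kappa})."""
    elems = sorted(S)
    return [frozenset(c) for r in range(k) for c in combinations(elems, r)]


def fineness_generator(S, k, x):
    """The family { T in [S]^{<k} : x in T } that fineness requires to be in U."""
    return {T for T in subsets_smaller_than(S, k) if x in T}


S = {0, 1, 2, 3}
k = 3  # assumption: toy size bound, so only subsets with at most 2 elements

gens = {x: fineness_generator(S, k, x) for x in S}

# Any two generators meet, because {x, y} has fewer than k = 3 elements.
pairwise_meet = all(bool(gens[x] & gens[y]) for x in S for y in S)

# Three distinct generators do not meet in the toy: a common member would
# need 3 elements. This failure is an artefact of the finite bound.
triple = gens[0] & gens[1] & gens[2]

print(pairwise_meet, len(triple))
```

The contrast between the pairwise and the triple intersection shows why the definition is stated for infinite $\kappa$: there the size bound never obstructs a finite intersection.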
We first show that appropriate restrictions of ultrafilters to smaller sets can be obtained in a uniform fashion.
Definition 20
Suppose $S \in V$, $|S| = \kappa$, $U$ is a fine ultrafilter on $[S]^{<\kappa}$, and $S' \subseteq S$ with $|S'| = \alpha < \kappa$. Then we define the restriction $U_{S'}$ of $U$ to $S'$ as follows. For any $X \in \mathcal{P}([S]^{<\kappa})$, let $X_{S'} \equiv \{ y \mid \exists z \in X : y = z \cap S' \text{ and } |y| < \alpha \}$. Then $U_{S'} \equiv \{ X_{S'} \mid X \in U \}$.
Proposition 7 For any $S \in V$ with $|S| = \kappa$, there are fine ultrafilters $U$ on $[S]^{<\kappa}$ that restrict to a fine ultrafilter on every $S' \subseteq S$ with $|S'| = \alpha$ and $\omega \le \alpha < \kappa$. Further, such ultrafilters are coherent in that if $T \subset S'$ with $\omega \le |T| < |S'|$, then $(U_{S'})_T = U_T$.
Proof.
We build the ultrafilter from a pre-filter $F$ (i.e., a set with the finite intersection property), which can then be extended to a filter and then to an ultrafilter. For each $x \in S$, let $A_x \equiv \{X \in [S]^{<\kappa} : x \in X\}$. And for each $S' \subseteq S$ with $|S'| = \alpha < \kappa$, let $R_{S'} \equiv \{X \in [S]^{<\kappa} : X \cap S' \in [S']^{<\alpha}\}$. Now set $F \equiv \{A_x : x \in S\} \cup \{R_{S'} : S' \subseteq S \text{ and } |S'| < \kappa\}$.
It is easy to see that $F$ has the finite intersection property and so can be extended to an ultrafilter $U$. And by design, $U$ is fine.
Clearly $U_{S'} \subseteq \mathcal{P}([S']^{<\alpha})$. We must check the fine ultrafilter properties:
(1) Fine. This follows from the fact that $U$ is fine: for $x \in S'$ this is witnessed by $(A_x)_{S'}$.
(2) Finite intersection. Let $X, Y \in U_{S'}$. Then there are $\bar{X}, \bar{Y} \in U$ such that $X = \bar{X}_{S'}$ and $Y = \bar{Y}_{S'}$. By the finite intersection property of $U$, we know that $\bar{X} \cap \bar{Y} \in U$. But $X \cap Y \supseteq (\bar{X} \cap \bar{Y})_{S'}$. So $X \cap Y \in U_{S'}$.
(3) Ultra. Take any $X \subseteq [S']^{<\alpha}$, and let $X^c \equiv [S']^{<\alpha} \setminus X$. Let $\bar{X} \equiv \{x \in [S]^{<\kappa} \mid x \cap S' \in X\}$ and let $\bar{X}^c \equiv \{x \in [S]^{<\kappa} \mid x \cap S' \notin X\}$. Then $\bar{X}^c = [S]^{<\kappa} \setminus \bar{X}$. By the ultra property for $U$, we have $\bar{X} \in U$ or $\bar{X}^c \in U$. But $X = \bar{X}_{S'}$ and $X^c = (\bar{X}^c)_{S'}$. So $X \in U_{S'}$ or $X^c \in U_{S'}$.
(4) Non-principality. This is implied by fineness.
(5)
Empty set property: We have to show that $\emptyset \notin U_{S'}$. It suffices to show that for each $X \in U$, $X_{S'} \neq \emptyset$. Since $R_{S'} \in U$, $X \cap R_{S'} \neq \emptyset$. But for any set $x$ in this intersection, $x \cap S' \in [S']^{<\alpha}$. So $x \cap S' \in X_{S'} \neq \emptyset$.
For coherence, take $T \subset S' \subset S$ with $|T| < |S'| < |S|$ and let $X \in U$. As $R_{S'} \in U$ it is enough to show that $((X \cap R_{S'})_{S'})_T = (X \cap R_{S'})_T$. Now $((X \cap R_{S'})_{S'})_T = \{ y \mid \exists z \in X \cap R_{S'} : y = z \cap T,\ |y| < |T| \text{ and } |z \cap S'| < |S'| \}$, but by definition, for any $z \in R_{S'}$ we have $|z \cap S'| < |S'|$. Thus $((X \cap R_{S'})_{S'})_T = \{ y \mid \exists z \in X \cap R_{S'} : y = z \cap T \text{ and } |y| < |T| \} = (X \cap R_{S'})_T$.
$[V]^{<\mathrm{Card}}$:
Consequence 1
There are fine ultrafilters $U$ on $[V]^{<\mathrm{Card}}$ such that for every set $S$ with $|S| = \alpha$, $U_S$ is a fine ultrafilter on $[S]^{<\alpha}$ and the coherence property holds.
Proof.
By the same reasoning as in the previous proposition.
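The restriction operation of Definition 20 and the identity used in the coherence argument are purely set-theoretic, so they can be checked mechanically on a finite toy example. The following Python sketch is such a check under loudly stated assumptions: finite sets stand in for the infinite $S$, $S'$ and $T$, `len` plays the role of cardinality, and the helper names `small_subsets`, `restrict` and `R` are hypothetical. It illustrates only the identity $((X \cap R_{S'})_{S'})_T = (X \cap R_{S'})_T$ and says nothing about the existence of ultrafilters.

```python
from itertools import combinations


def small_subsets(S, bound):
    """Subsets of S with fewer than `bound` elements (toy version of [S]^{<kappa})."""
    elems = sorted(S)
    return {frozenset(c) for r in range(bound) for c in combinations(elems, r)}


def restrict(X, S_prime):
    """X_{S'} = { z & S' : z in X and |z & S'| < |S'| }, as in Definition 20."""
    S_prime = frozenset(S_prime)
    return {z & S_prime for z in X if len(z & S_prime) < len(S_prime)}


def R(S, S_prime):
    """R_{S'} = { z in [S]^{<|S|} : |z & S'| < |S'| }."""
    S_prime = frozenset(S_prime)
    return {z for z in small_subsets(S, len(S)) if len(z & S_prime) < len(S_prime)}


# Toy data (assumption: chosen only so that T is a proper subset of S' and S' of S).
S = frozenset(range(5))
S_prime = frozenset({0, 1, 2})
T = frozenset({0, 1})

# X plays the role of a member of U; here it is the generator A_0.
X = {z for z in small_subsets(S, len(S)) if 0 in z}
Y = X & R(S, S_prime)

lhs = restrict(restrict(Y, S_prime), T)  # ((X ∩ R_{S'})_{S'})_T
rhs = restrict(Y, T)                     # (X ∩ R_{S'})_T

print(lhs == rhs)
```

Restricting in two steps and restricting directly give the same family here because every member of `Y` already meets `S_prime` in a proper subset, which is exactly the role played by $R_{S'}$ in the proof.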
Now we show how for every set, a probability function on that set can be defined. The same procedure can then be used to define a probability function on $V$, and these probability functions are coherent.
The key is to spell out what is involved in the $\beta$-th step of the recursive procedure for defining probabilities on sets:
($\beta$) A fine ultrafilter $U$ on $[S]^{<\omega_\beta}$ (with $\omega_\beta = |S|$), together with probability functions $\Pr_T$ for all $T$ such that $|T| < \omega_\beta$, yields a notion of probability $\Pr_S$ on $S$.
As in section 2, we define a function $f_{\theta \in A}$ such that for all $T \in [S]^{<\omega_\beta}$: $f_{\theta \in A}(T) \equiv \Pr_T(\theta \in A \cap T)$.
Similarly, we define a function $f_{\theta \in A \wedge \nu \in B}$ such that for all $T \in [S]^{<\omega_\beta}$: $f_{\theta \in A \wedge \nu \in B}(T) \equiv \Pr_T(\theta \in A \cap T \wedge \nu \in B \cap T)$.
Then $\Pr_S(\theta \in A)$ is defined as $[f_{\theta \in A}]_U$, and $\Pr_S(\theta \in A \mid \nu \in B)$ is defined as $\frac{[f_{\theta \in A \wedge \nu \in B}]_U}{[f_{\nu \in B}]_U}$.
This function $\Pr_S$ will then be an NAP probability function in the sense of [Benci et al 2013].
Now in an exactly similar way, we define a class probability function $\Pr^+_U$ on $V$, using the probability functions on ‘small’ classes (i.e., sets) and ultrafilters on ‘small’ classes which (given proposition 7) we can now assume to have been defined on the basis of an ultrafilter $U$ on $[V]^{<\mathrm{Card}}$ with which we start. The function $\Pr^+_U$ is total, regular, and uniform for the same reasons that its ‘smaller cousin’ $\Pr_U$ has these properties.
We now check coherence. We will do this only for straight probabilities rather than random variables in general, as although coherence holds for random variables also, it is much more technical to state. Below we use $\Pr(A)$ to denote $\Pr(\iota \in A)$, where $\iota$ is the identity random variable.
Proposition 8
For any class $A$ and sets $T \subset S$ with $|T| < |S|$ we have $\Pr_T(A) = \Pr_S(A \mid T)$.
Proof.
We show by induction on $|T|$ that the above holds for all $S \supset T$ with $|S| > |T|$. Strictly speaking, the range of $\Pr_T$ may be a different non-Archimedean field from the range of $\Pr_S$, but there is a natural embedding of the former into the latter defined by $i([f]_{U_T}) = [\bar{f}]_{U_S}$ where for $X \in [S]^{<|S|}$, $\bar{f}(X) = f(X \cap T)$. This is well-defined as $\{X \in [S]^{<|S|} : |X \cap T| < |T|\} = (R_T)_S \in U_S$.
Using this embedding we have $i(\Pr_T(A)) = i([f_A]_{U_T}) = [\bar{f}_A]_{U_S}$. Now for $X \in (R_T)_S$ ($\in U_S$) we have: $\bar{f}_A(X) = f_A(X \cap T) = f_{A \cap T}(X \cap T) = \Pr_{X \cap T}(A \cap T)$. As $X \in (R_T)_S$ we have $|X \cap T| < |T|$, so by our inductive hypothesis $\Pr_{X \cap T}(A \cap T) = \Pr_X(A \cap T \mid T) = \frac{f_{A \cap T}(X)}{f_T(X)}$. But by definition, $\left[\frac{f_{A \cap T}}{f_T}\right]_{U_S} = \Pr_S(A \mid T)$, so $[\bar{f}_A]_{U_S} = \Pr_S(A \mid T)$ and we’re done.
In our definition of the probability of a set theoretic property, the probability $\Pr^+_U(\theta \in A)$ of a property $A$ is determined by the probabilities $\Pr_S(\theta \in A)$ of $A$ on large ‘snapshots’ $S$, where a probability $\Pr_S(\theta \in A)$ (for $S$ a large set) is then in turn determined by the probabilities $\Pr_{S'}(\theta \in A)$ for $S'$ being smaller ‘snapshots’ than $S$, and so on. Conceptually, the definition in section 4.3 is superior to the simpler definition suggested in section 2: we want to take the behaviour of the property on as many and as large ‘snapshots’ as possible into account.
It is not straightforward to compare the simple and the more involved definition: the simple method is based on an ultrafilter on $[V]^{<\omega}$ whereas the more involved method is based on an ultrafilter on $V = [V]^{<\mathrm{Card}}$.
The obvious suggestion is to base the comparison on the relation between a probability function determined by an ultrafilter $U$ on $[V]^{<\mathrm{Card}}$ and its restriction to $[V]^{<\omega}$ defined as $U{\restriction}\omega = \{X \cap [V]^{<\omega} \mid X \in U\}$. But:
Proposition 9
Not all ultrafilters on $[V]^{<\mathrm{Card}}$ restrict to ultrafilters on $[V]^{<\omega}$.
Proof.
Consider $\mathcal{A} \cup \{\overline{[V]^{<\omega}}\}$, where $\mathcal{A}$ is the set of atoms (guaranteeing fineness) and $\overline{[V]^{<\omega}}$ is the relative complement of $[V]^{<\omega}$ in $[V]^{<\mathrm{Card}}$. Then $\mathcal{A} \cup \{\overline{[V]^{<\omega}}\}$ has the finite intersection property and so can be extended to a fine ultrafilter $U$ on $[V]^{<\mathrm{Card}}$. But $\emptyset \in U{\restriction}\omega$. So $U$ does not restrict to an ultrafilter on $[V]^{<\omega}$.
On the other hand, every fine ultrafilter on $[V]^{<\mathrm{Card}}$ restricting to an ultrafilter on $[V]^{<\omega}$ essentially is an ultrafilter on $[V]^{<\omega}$:
Proposition 10
Suppose $U$ is a fine ultrafilter on $[V]^{<\mathrm{Card}}$ restricting to an ultrafilter $U{\restriction}\omega$ on $[V]^{<\omega}$. Then $[V]^{<\omega} \in U$.
Proof.
Since $U$ is ultra, we have either $[V]^{<\omega} \in U$ or $\overline{[V]^{<\omega}} \in U$, where the overline denotes the relative complement in $[V]^{<\mathrm{Card}}$. But if $\overline{[V]^{<\omega}} \in U$, then $\emptyset \in U{\restriction}\omega$, so that $U$ does not restrict, contradicting the assumption. So $[V]^{<\omega} \in U$.
This means that the essentially involved probability functions on $V$ cannot be reduced to ‘simple’ probability functions on $V$.
In this article we have explored two methods for modelling, by means of non-Archimedean probability functions, the properties of random variables ranging over the set theoretic universe: the finite snapshot method and the bootstrapping method. Concerning the finite snapshot method, we found that many of the probabilistic properties that seem intuitively plausible can be satisfied. The bootstrapping method is more satisfying from a conceptual point of view, but we have only been able to show that the resulting probability functions satisfy minimal requirements. So much work remains to be done.
Footnote: The restriction $U{\restriction}\omega$ is a different notion of restriction from that defined in the previous section, as here we are only restricting the index, while the underlying class remains the same ($V$).
References
[Alon et al 2000] Alon, N. & Spencer, J. The Probabilistic Method.
Second edition, Wiley, 2000.
[Benci et al 2003] Benci, V. & Di Nasso, M. Numerosities of labelled sets. A new way of counting. Adv. Math. (2003), p. 50–67.
[Benci et al 2007] Benci, V., Di Nasso, M., Forti, M. An Euclidean measure of size for mathematical universes. Logique et Analyse (2007), p. 43–62.
[Benci et al 2013] Benci, V., Horsten, L., Wenmackers, S. Non-Archimedean probability. Milan Journal of Mathematics (2013), p. 121–151.
[Benci et al 2018] Benci, V., Horsten, L., Wenmackers, S. Infinitesimal probabilities. British Journal for the Philosophy of Science (2018), p. 509–552.
[Brickhill et al 2018] Brickhill, H. & Horsten, L. Triangulating non-Archimedean probability. Review of Symbolic Logic (2018), p. 519–546.
[Freiling 1986] Freiling, C. Axioms of symmetry. Throwing darts at the real number line. Journal of Symbolic Logic (1986), p. 190–200.
[Hamkins 2015] Hamkins, J. Is the dream solution to the continuum hypothesis attainable? Notre Dame Journal of Formal Logic (2015), p. 135–145.
[Horsten 2019] Horsten, L. The Metaphysics and Mathematics of Arbitrary Objects. Cambridge University Press, forthcoming.
[Kremer 2004] Kremer, P. Indeterminacy of fair infinite lotteries. Synthese (2014), p. 1757–1760.
[Wenmackers et al 2013] Wenmackers, S., Horsten, L. Fair infinite lotteries. Synthese 190