arXiv [stat.OT]. Aust. N.Z. J. Stat. 2021. doi: 10.1111/j.1467-842X.XXX
Anna Karenina and The Two Envelopes Problem
R. D. Gill, Mathematical Institute, Leiden University
Summary
The Anna Karenina principle is named after the opening sentence in the eponymous novel: Happy families are all alike; every unhappy family is unhappy in its own way. The Two Envelopes Problem (TEP) is a much-studied paradox in probability theory, mathematical economics, logic, and philosophy. Time and again a new analysis is published in which an author claims finally to explain what actually goes wrong in this paradox. Each author (the present author included) emphasizes what is new in their approach and concludes that earlier approaches did not get to the root of the matter. We observe that though a logical argument is only correct if every step is correct, an apparently logical argument which goes astray can be thought of as going astray at different places. This leads to a comparison between the literature on TEP and a successful movie franchise: it generates a succession of sequels, and even prequels, each with a different director who approaches the same basic premise in a personal way. We survey resolutions in the literature with a view to synthesis, correct common errors, and give a new theorem on order properties of an exchangeable pair of random variables, at the heart of most TEP variants and interpretations. A theorem on asymptotic independence between the amount in your envelope and the question whether it is smaller or larger shows that the pathological situation of improper priors or infinite expectation values has consequences as we merely approach such a situation.
Keywords: Recreational mathematics, mathematical paradoxes, Monty Hall problem, Exchange paradox, Necktie problem, Saint Petersburg paradox
1. TEP-1

1.1. Introduction
Here is the (currently) standard form of the Two Envelopes Problem (TEP), as given by Falk (2008), who cites Wikipedia for the precise formulation. Wikipedia cites Falk (2008), so this is kind of frozen now. I will postpone remarks on the (pre-)history of TEP till near the end of the paper. Writing for probabilists and statisticians I shall move fast through (for us) easy developments. However, on the way I will discuss logicians', philosophers', and economists' approaches and thereby call into question the very assumptions that for "us" probabilists and statisticians are as natural as the air we breathe, hence taken for granted. Though Bayesians and frequentists may also live in different worlds.

You are given two indistinguishable envelopes, each of which contains a positive sum of money. One envelope contains twice as much as the other. You may pick one envelope and keep whatever amount it contains. You pick one envelope at random but before you open it you are offered the possibility to take the other envelope instead. Now consider the following reasoning:

Mathematical Institute, Niels Bohrweg 1, 2333 CA Leiden, The Netherlands. Email: [email protected]. Final version: 10 March 2020. This version may be thought of as "version 4.1". Acknowledgment: I'm very grateful to Rianne de Heide, who compiled the bibliography file for me as a first step in a new joint research project. Onwards! © 2021 Australian Statistical Publishing Association Inc.
1. I denote by A the amount in my selected envelope.
2. The probability that A is the smaller amount is 1/2, and that it is the larger amount is also 1/2.
3. The other envelope may contain either 2A or A/2.
4. If A is the smaller amount the other envelope contains 2A.
5. If A is the larger amount the other envelope contains A/2.
6. Thus the other envelope contains 2A with probability 1/2 and A/2 with probability 1/2.
7. So the expected value of the money in the other envelope is (1/2)2A + (1/2)A/2 = 5A/4.
8. This is greater than A, so I gain on average by swapping.
9. After the switch, I can denote that content by B and reason in exactly the same manner as above.
10. I will conclude that the most rational thing to do is to swap back again.
11. To be rational, I will thus end up swapping envelopes indefinitely.
12. As it seems more rational to open just any envelope than to swap indefinitely, we have a contradiction.

Notice that the problem is not to give a correct proof that there is no point in switching. The problem, which many authoritative writers admit still defeats them, is to explain what is wrong with the arguments given above.

For a mathematician it helps to introduce some more notation. I'll refer to the envelopes as Envelope A and Envelope B, and the amounts in them as A and B. Let me introduce X to stand for the smaller of the two amounts and Y to stand for the larger. I think of all four as being random variables; but this includes the situation that we think of X and Y as being two fixed though unknown amounts of money x and y = 2x: a degenerate probability distribution is also a probability distribution, a constant is also a random variable. It includes the model of a frequentist statistician who imagines (or has been reliably informed that) the organizer of this game repeatedly chooses, according to a fixed probability distribution, a new random amount X to be the smaller of the two; then the other amount is determined as Y = 2X, and finally by the toss of a fair coin (independent of the two amounts) one is put in Envelope A and the other in Envelope B, defining random variables A and B. On the other hand, it also includes the model of a true Bayesian statistician, which formally is identical to what I just described, but where the probability law of the random variable X is her subjective prior distribution of the unknown, smaller, amount of money in the two envelopes, in one specific realisation of the game. For her, x is a fixed but unknown positive quantity, and the law of the artificial random variable X encapsulates her prior beliefs about x. For the frequentist, x is the actually realised value of a physical random variable X. Both he and she know that Envelope A is filled by tossing a fair coin and then putting either x or y = 2x in it, and since the calculus of subjectivist probability is the same as the calculus of frequentist probability (Kolmogorov rules!), their mathematical models are identical: only their interpretation is different.

So we have four random variables X, Y, A and B, and it is given that Y = 2X > 0 and that (A, B) = (X, Y) or (Y, X). The assumption that the envelopes are indistinguishable and closed and one is picked at random translates into the assumption that the event {A = X} has probability 1/2, whatever the amount X; in other words, the random variable X and the event {A = X} are independent. And to repeat what I just stated: the notation does not prejudice the question whether probability is taken in its subjectivist or frequentist interpretation – do we use probability to represent our (lack of) knowledge, or do we use probability to represent chance mechanisms in the real world?

I consider the argument steps 1–12 together with the structural relationships and probabilistic properties of A, B, X and Y to be the definition of The Two Envelopes Problem (TEP), or more precisely, The Original Two Envelopes Problem (TEP-1). Just as a successful movie may spawn a series of sequels and occasionally even prequels, TEP has done the same. We must therefore be careful to distinguish between the entire franchise TEP and the original TEP. Moreover, the original TEP did not come out of thin air, but had a history. Think of old movies which the public might have forgotten, but the directors of new movies certainly hadn't.

The alert probabilist will notice that something is going wrong in steps 6 and 7. An expectation value is being computed, but how? Is it a conditional expectation or an unconditional expectation? These are the two main interpretations of the intention of the author of 1–12: the author meant to compute the unconditional expectation E(B), or the conditional expectation E(B | A). However, the author does not reveal his intention, so this is pure guesswork on our side. Curiously, probabilists tend to go for the conditional expectation, while philosophers think more often that an unconditional expectation was intended.
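Before examining either interpretation, the basic setup itself is easy to simulate. A minimal Monte Carlo sketch (the exponential prior for X is my own illustrative assumption; nothing in the problem fixes it):

```python
import random

# Sketch of the setup: X is the smaller amount, Y = 2X, and a fair coin
# (independent of X) decides which amount lands in Envelope A.
random.seed(1)
n = 200_000
sum_a = sum_b = sum_x = 0.0
for _ in range(n):
    x = random.expovariate(1.0)   # assumed proper prior for the smaller amount
    a, b = (x, 2 * x) if random.random() < 0.5 else (2 * x, x)
    sum_a += a
    sum_b += b
    sum_x += x

print(sum_a / n, sum_b / n, 1.5 * sum_x / n)
# E(A) and E(B) both come out near (3/2)E(X): on average, nothing is
# gained by swapping.
```

By symmetry the two empirical means agree, foreshadowing the calculation E(A) = E(B) = (3/2)E(X) below.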
I will describe the philosopher's choice (and many a layperson's choice) first. According to that interpretation we are aiming at computation of E(B) by conditioning on the two cases separately: X = A (Envelope A contains the smaller amount of money), X = B (Envelope B contains the smaller amount). If that is so, then the rule which we want to use is

E(B) = P(A = X) E(B | A = X) + P(B = X) E(B | B = X).

The two situations have equal probability 1/2, as mentioned in step 6, and those probabilities are then substituted, correctly, in step 7. However, according to this interpretation, the two conditional expectations are screwed up. A correct computation of E(B | A = X) is the following: conditional on A = X, B is identical to 2X, so we have to compute E(2X | A = X) = 2E(X | A = X). But we are told that whether or not Envelope A contains the smaller amount X is independent of the amounts X and 2X, so E(X | A = X) = E(X). Similarly we find E(B | B = X) = E(X | B = X) = E(X).

Thus the expected values of the amount of money in Envelope B are 2E(X) and E(X) in the two situations that it contains the larger and the smaller amount. The overall average is (1/2)2E(X) + (1/2)E(X) = (3/2)E(X). Similarly this is the expected amount in Envelope A.

The clearest exponents of the philosophers' diagnosis of the core of the problem are Schwitzgebel & Dever (2008) who write: "You would expect less in Envelope A if you knew that it was the envelope with less than you would if you knew it was the envelope with more". This is perfectly correct, and I think a very intuitive explanation. In fact, we can easily say something stronger: the expected amount in the second envelope given it's the larger of the two is twice the expected amount given it's the smaller!
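That stronger statement can be checked numerically. A sketch under the same illustrative assumption (exponential prior for X):

```python
import random

# Check: E(B | B is the larger amount) should be close to 2E(X), while
# E(B | B is the smaller amount) should be close to E(X).
random.seed(2)
n = 200_000
sums = {True: 0.0, False: 0.0}    # keyed on "Envelope B holds the larger amount"
counts = {True: 0, False: 0}
sum_x = 0.0
for _ in range(n):
    x = random.expovariate(1.0)
    b_larger = random.random() < 0.5   # fair coin, independent of X
    b = 2 * x if b_larger else x
    sums[b_larger] += b
    counts[b_larger] += 1
    sum_x += x

e_x = sum_x / n
print(sums[True] / counts[True], 2 * e_x)    # conditional mean, B larger
print(sums[False] / counts[False], e_x)      # conditional mean, B smaller
```

The two conditional means differ by a factor of two, while their equal-weight average is (3/2)E(X), exactly as in the computation above.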
As many philosophy authors repeat, the resolution of the paradox is that the writer has committed the sin of equivocation: using the same words to describe different things. However, this is equivocation of somewhat subtle concepts. Taking the subjective Bayesian interpretation of our model, we are confusing our beliefs about b, the amount in the second envelope, in the situation where we imagine being informed that it is the larger amount, with what we imagine our beliefs about it would be if we were to imagine being informed that it is the smaller amount. And at the same time we are making an even more serious equivocation, namely of levels: we are confusing expectation values with actual values.

In my opinion the philosopher's interpretation is very far-fetched. However, it seems to be a very common way in which ordinary laypersons, too, interpret the context and intent of the writer. There is a very different way to interpret the intention of the writer of steps 6 and 7 which is far more common in the probability literature. Apparently it comes completely naturally to "us" probabilists and statisticians, while it is far too sophisticated ever to occur to ordinary folk.

Since the answers are expressed in terms of the amount in Envelope A, it also seems reasonable to suppose that the writer intended to compute E(B | A). Contrary to what many writers imagine, this in no way implies that our player is actually looking in his envelope. The point is that he can imagine what his expectation value would be of the contents of Envelope B, for any particular amount a he might imagine seeing in his own Envelope A, if he were to take a peek. If it would appear favourable to switch whatever that imaginary amount might be, then he has no need to peek in his envelope at all: he can decide to switch anyway.

The conditional expectation E(B | A = a) can be computed just as the ordinary expectation, by averaging over two situations, but the mathematical rule which is being used is then

E(B | A) = P(A = X | A) E(B | A = X, A) + P(B = X | A) E(B | B = X, A).

If this was the writer's intention, then in step 7 he correctly substitutes E(B | A = X, A) = E(2X | A = X, A) = E(2A | A = X, A) = 2A and similarly E(B | B = X, A) = A/2. But he also takes P(A = X | A) = 1/2 and P(B = X | A) = 1/2; that is to say, the writer assumes that the probability that the first envelope is the smaller or the larger doesn't depend on how much is in it. But it obviously could do! For instance, if the amount of money is bounded then sometimes one can tell for sure whether Envelope A contains the larger or smaller amount from knowing how much is in it.

In probabilistic terms, under this interpretation, the writer has mistakenly taken independence of the event {X = A} from the amount A to be the same as the implicitly given assumption that the event {A = X} is independent of the random variable X.

In probability theory we know that (statistical) independence is symmetric. In particular, it is equivalent to say that A is statistically independent of {A = X} and to say that {A = X} is statistically independent of A. The probabilist's interpretation of the mess was that the writer incorrectly assumed {A = X} to be independent of A. The philosophers Schwitzgebel and Dever's interpretation was that the writer incorrectly assumed A to be independent of {A = X}.

One point I'm making is that we have no way of knowing what the original writer was meaning to do.
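The bounded case mentioned above can be made completely explicit. An exact toy example (the two-point distribution for X is my own construction, chosen only for illustration):

```python
from fractions import Fraction as F

# X is 1 or 2 with probability 1/2 each, Y = 2X, and a fair coin fills
# Envelope A. Enumerate all four equally likely outcomes exactly.
outcomes = []                       # (probability, a, b, "A holds the smaller")
for x in (1, 2):
    for a_is_smaller in (True, False):
        a, b = (x, 2 * x) if a_is_smaller else (2 * x, x)
        outcomes.append((F(1, 4), a, b, a_is_smaller))

for a0 in (1, 2, 4):
    mass = sum(p for p, a, b, s in outcomes if a == a0)
    p_small = sum(p for p, a, b, s in outcomes if a == a0 and s) / mass
    e_b = sum(p * b for p, a, b, s in outcomes if a == a0) / mass
    print(f"a={a0}: P(A=X | A=a) = {p_small}, E(B | A=a) = {e_b}")
# Only at a = 2 is P(A=X | A=a) equal to 1/2 (and there E(B | A=2) = 5·2/4);
# at the endpoints a = 1 and a = 4 the player knows which envelope is which,
# and switching is not favourable for every a.
```

So the "always 1/2" assumption in step 7 fails at the boundary of any bounded distribution, which is exactly the probabilist's diagnosis.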
One thing is clear: he is doing probability calculations in a sloppy way. He is computing an expectation by taking the weighted average of the expectations in two different situations. Either he gets the expectations right but the weights wrong, or the weights right but the expectations wrong (or is there a third possibility?). Is he confusing random variables and possible values they can take? Or conditional expectations and unconditional expectations? Conditional probabilities and unconditional probabilities? That simply cannot be decided. TEP-1 has many cores. And these many cores give some reason for the branching family of variant paradoxes which grew from it.

The analysis so far leads me to the interim conclusion that TEP-1 does not deserve to be called a paradox (and certainly not an unresolved paradox, as many writers in philosophy still insist on claiming): it is merely an example of a screwed-up probability calculation where the writer is not even clear what he is trying to calculate. The mathematics being used appears to be elementary probability theory, but whatever the writer is intending to do, he is breaking the standard, elementary rules. Steps 6 and 7 together are inconsistent. One cannot say that one of the steps is wrong and the other is right. One can offer as diagnosis that the inconsistency is caused by the author giving the same names, or the same symbols, to different things. We can't deduce what he is confusing with what. He probably is not even aware of the distinctions. (However ... in the next section I will show that this interim conclusion is hasty. Maybe the writer was smarter than we give him credit for.) But first of all I will present a little theorem which ought to be known in the literature, but which almost nobody seems to realize is true.

We saw that philosophers and probabilists both put their finger on essentially the same point: the random variable A need not be independent of the event {A = X}.
We can say something a whole lot stronger: the random variable A cannot be independent of the event {A = X}.

Let me make a side remark here, connected to the parenthetical "however" above. Suppose that the writer of TEP is a subjective Bayesian. The intended interpretation of the random variables X, Y, A and B is therefore that their joint probability distribution represents the writer's prior knowledge or uncertainty about the actual amounts involved. Denote the actual smaller and larger amounts as x > 0 and y = 2x, and denote by a and b the actual amounts in the first and second envelopes. These are fixed, unknown amounts of money. The probability distribution of X encapsulates the writer's prior knowledge about x. From this, his prior knowledge about all four amounts is defined by first defining Y = 2X and then defining A and B as follows: independently of X, with probability one half, A = X and B = Y; with the complementary probability one half, A = Y and B = X. Since the mathematics I am about to do assumes I am within conventional probability theory, it follows that I started with a proper probability distribution for X. Our Bayesian does not have an improper prior. We will return to the possibility of an improper prior in the next section.

Theorem 1. The random variable A cannot be independent of the event {A < B}.

Proof.
Suppose to start with that A and B have finite expectation values. Note that E(A − B | A − B > 0) > 0. That's the same, since all expectation values are finite, as

E(A | A > B) > E(B | A > B) = E(A | B > A).

In the last step we used the symmetry of the joint distribution of A and B. Now if the expectation of A depends on whether A > B or B > A, then the distribution of A depends on which is true; in other words, the random variable A is not stochastically independent of the event A > B. Equivalently, the event A > B is not independent of the random variable A.

For the general case, choose some strictly increasing map from the positive real line to a bounded interval, for instance, the arc tangent. Apply this transformation to both A and B and then apply the argument just given to the transformed variables. The ordering of the variables is unaffected by the transformation. So we find that the transformed variable A is not independent of the event {A < B}, and this implies the non-independence of A of this event.

Note that we only used the symmetry of the distribution of A and B, and the fact that these variables have positive probability to be different. We did not use their positivity. As we will see at the end of the paper, this little theorem lies at the heart not only of the two envelope paradox but also of a whole family of related exchange paradoxes. In every case, the originators of the paradoxes (or the first to "solve" them) have "explained" the paradox by doing explicit calculations in a particular case. This always leaves later writers with a feeling that the paradox has not really been solved. Indeed, just giving one example does not prove a general theorem. One swallow does not make a summer.

Samet, Samet & Schmeidler (2004) seem to be the only writers on TEP who know the general theorem. They prove a weaker result in a more general situation: they do not assume symmetry. Their proof is a little more tricky than ours, but still, not much more than a page and basically elementary too. When one adds the assumption of symmetry their result gives ours.
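The content of Theorem 1 is easy to see in simulation. A sketch, again under an assumed exchangeable pair built from an exponential prior (the prior is my choice, not part of the theorem, which holds for any symmetric pair):

```python
import random
from statistics import fmean

# The mean of A shifts according to whether A or B is the larger,
# so A cannot be independent of the event {A < B}.
random.seed(3)
n = 300_000
a_small, a_big, a_all = [], [], []
for _ in range(n):
    x = random.expovariate(1.0)
    a, b = random.choice([(x, 2 * x), (2 * x, x)])  # random permutation
    a_all.append(a)
    (a_small if a < b else a_big).append(a)

print(fmean(a_small), fmean(a_all), fmean(a_big))
# ≈ E(X) < (3/2)E(X) < 2E(X): E(A | A < B) < E(A) < E(A | A > B).
```

The strict gap between the three means is precisely the finite-expectation step of the proof above.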
Eckhardt (2013), Chapter 8, "The Two-Envelopes Problem", has some nice mathematical results (which I admit I have not yet digested), which seem to give the same global message as this paper.

Our proof showed that for any strictly monotone increasing function g such that E(g(A)) exists and is finite,

E(g(A) | A < B) < E(g(A)) < E(g(A) | A > B).

Approximating a not strictly monotone function by strictly increasing functions and going to the limit, we obtain the same inequalities, only possibly not strict, for all monotone increasing g such that E(g(A)) exists and is finite. This is the same as saying that the laws of A given A < B, of A itself, and of A given A > B, are strictly stochastically ordered: for all a,

P(A > a | A < B) ≤ P(A > a) ≤ P(A > a | A > B),

with strict inequality for some a. This observation gives us the following general theorem:

Theorem 2.
Suppose A and B are two random variables, unequal with probability 1, and whose joint distribution is symmetric under exchange of the two variables. Then P(A < B | A) ≠ P(B < A | A); in other words, for a set of values of A with positive probability, P(A < B | A = a) ≠ P(B < A | A = a). Also, the laws of A conditional on A < B, unconditional, and conditional on A > B are strictly stochastically ordered (from small to large); in other words,

P(A > a | A < B) ≤ P(A > a) ≤ P(A > a | A > B) for all a,

with strict inequality for a set of a with positive probability under the law of A.

Intuitively, P(A < B | A = a) ought to be decreasing in a. Simple examples show that this is not necessarily true. However, it is true in a certain average sense. For any a₀, the result when averaging over a < a₀ is never smaller than the result when averaging over a ≥ a₀, where the averaging is with respect to the appropriately normalized law of A. To be precise:

E(P(A < B | A) | A < a₀) ≥ P(A < B) = 1/2 ≥ E(P(A < B | A) | A ≥ a₀)

for all a₀, with both inequalities strict for some a₀.

The just mentioned average ordering of the conditional probabilities P(A < B | A = a) and the stochastic ordering of the conditional (given the ordering of A and B) and unconditional laws of A are exactly equivalent results, and both are forms of the statement that the random variable A and the indicator variable of the event {A > B} are strictly positive orthant dependent. Recall that X and Y are positive orthant dependent if for all x and y, P(X ≥ x, Y ≥ y) ≥ P(X ≥ x)P(Y ≥ y); I call the dependence strict if there exist x and y such that the inequality is strict.
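The "average ordering" claim can also be checked empirically: conditioning on A falling below or above a cutoff pushes P(A < B) above or below 1/2. A sketch under the same assumed model (exponential prior, envelopes shuffled fairly):

```python
import random

# For each cutoff t, compare P(A < B | A < t) and P(A < B | A >= t).
random.seed(4)
n = 200_000
pairs = []                          # (a, True if A < B)
for _ in range(n):
    x = random.expovariate(1.0)
    a, b = random.choice([(x, 2 * x), (2 * x, x)])
    pairs.append((a, a < b))

def p_smaller(pred):
    sub = [lt for a, lt in pairs if pred(a)]
    return sum(sub) / len(sub)

for t in (0.5, 1.0, 2.0):
    print(t, p_smaller(lambda a: a < t), p_smaller(lambda a: a >= t))
# For every cutoff t: P(A < B | A < t) ≥ 1/2 ≥ P(A < B | A ≥ t).
```

This is the positive orthant dependence of A and the indicator of {A > B}, seen from the averaged side.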
2. TEP-2
Just like a great movie, the success of TEP led to several sequels and to a prequel, so nowadays when we talk about TEP we have to make clear whether we mean the original movie TEP-1 or the whole franchise. However, before introducing TEP-2 proper, I'll present some intermediate material belonging formally in TEP-1.
Are steps 6 and 7 of the TEP argument really inconsistent? Suppose the author is actually a Bayesian and the probability distribution she is using for X summarizes her prior knowledge about this amount of money. Suppose she knows absolutely nothing about it, except that it is positive. In that case, if she knows nothing about X, she knows nothing about cX, for any positive c. In particular, if we know nothing about X then knowing A intuitively gives us no clue at all as to whether it is X or 2X.

Now, if knowledge (or lack thereof) can be expressed by probability measures, then the probability measure expressing total ignorance about X and that expressing total ignorance about cX must be the same, for any c > 0. The only locally bounded measures on the positive half line invariant under multiplication by just two constants c > 0 and c′ > 0, both different from 1 and such that the ratio of their logarithms is irrational, are those with Lebesgue density proportional to 1/x. For instance: c = 2 and c′ = e. The only locally bounded measures on the positive half line invariant under multiplication by any positive number are those with density proportional to 1/x.

Probability theorists will now retort that there is no proper probability distribution with density proportional to 1/x, end of story! However, I think that that is a cheap way out. That a certain formal mathematical framework for some real world domain (reasoning and decision making under uncertainty) does not hold a representative of a conceptual object belonging to that field could just as well be seen as a defect of standard probability theory. In any case, the standard framework of probability theory does contain arbitrarily close approximations to the improper prior.
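The scale invariance that singles out the 1/x density is a one-line change-of-variables check (a verification of the claim just made, nothing more): if X has improper density $f_X(x) = 1/x$ on $(0, \infty)$ and $c > 0$, then $cX$ has density

```latex
f_{cX}(y) \;=\; f_X\!\left(\frac{y}{c}\right)\cdot\frac{1}{c}
        \;=\; \frac{c}{y}\cdot\frac{1}{c}
        \;=\; \frac{1}{y},
```

so the measure with density 1/x is invariant under multiplication by every positive constant.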
Write ∆ for the indicator of the event {A = X}, the fair-coin variable deciding whether Envelope A holds the smaller amount. If the author only meant to write that since she knows almost nothing about X, it then follows that given A, ∆ is pretty certain to be very close to Bernoulli(1/2), we could not fault steps 6 and 7.

Let me make this reasoning firm and also show where it leads to, namely to a whole class of new TEP paradoxes which I'll call TEP-2. This is where TEP moves from probability theory to mathematical economics. But first we stick within (or very close to) probability theory.

Suppose X has the probability distribution with density c/x on the interval [ǫ, M], zero outside. An easy calculation shows that the proportionality constant is c = 1/log(M/ǫ). From this we find that the joint distribution of (A, ∆) has density c/(2a) on [ǫ, M] × {1} ∪ [2ǫ, 2M] × {0}, and hence the conditional distribution of ∆ given A is Bernoulli(1/2) for A = a ∈ [2ǫ, M], while it is degenerate for a ∈ [ǫ, 2ǫ) ∪ (M, 2M]. Note that the probability that the distribution of ∆ given A is not Bernoulli(1/2) converges to zero as ǫ → 0, M → ∞. Similarly, the discrete uniform distribution on 2^k, k = −M, ..., N, has this property as
M, N → ∞ , and can be seen as an approximation to the improper prior which is uniform on all integer powers (positive and negative) of 2.Let me give an elementary proof characterizing all probability distributions (properor improper) such that A and ∆ are independent. This seems to me to be much moreconstructive than giving a proof showing that no proper probability distribution exists withthis property (I found such a proof in the literature but have mislaid the reference). However,since I am working with improper as well as proper distributions I have to be a bit carefulwith probability theory: I move to measure theory, supposing X is “distributed” accordingto a measure on (0 , ∞ ) . We understand, I am sure, what I mean by supposing that ∆ isBernoulli(1/2), independently of X , and now I can define ( A, ∆) as function of ( X, ∆) and this generates an image measure on the range of ( A, ∆) which is simply a copy ofhalf of the original improper distribution of X on (0 , ∞ ) × { } together with half of theoriginal improper distribution of X on (0 , ∞ ) × { } . We assume that this measure exhibitsindependence between A and ∆ . But that simply means that the improper distributions of X and of X are identical. Taking logarithms to base 2 the improper distributions on thewhole real line of log X and of X are identical. The distribution of log X isinvariant under a shift of size +1 and hence under all integer shifts. Such measures are easyto characterize: place an arbitrary measure on the interval [0 , and glue together all integershifts of this measure to a measure on the real line. In semi-probabilistic terms, now using { . 
} to denote the fractional part of a real number, { log ( X ) } and ⌊ log ( X ) ⌋ are independent,with the integer part being uniformly distributed over all integers, and the fractional parthaving an arbitrary distribution.It would be nice to show that all probability distributions of X which have ∆ and A approximately independent, are approximately of this form. The crux of the matter istherefore to choose meaningful notions of both instances of “approximate”. Also, it wouldbe nice to get rid of the special dependence on the number 2. We could just as well haveformulated the two envelopes problem using any other factor, at least, large enough to makeexchange seem attractive. If a measure on the real line is invariant under all shifts then it hasto be uniform. If it is invariant under two relatively irrational shifts then it is uniform. If it islocally bounded and invariant under all rational shifts it is uniform.So far I only succeeded in deriving some partial results, and will stick with the originalproblem with the special role of 2. c (cid:13)2021 Australian Statistical Publishing Association Inc.Preparedusing
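The calculation for the density c/x can be checked by a quick Monte Carlo experiment (my own illustration, not from the paper): sampling X log-uniformly on [ǫ, M] and conditioning A on the overlap region [2ǫ, M], the conditional frequency of ∆ = 1 should be very close to 1/2.

```python
import random

random.seed(42)

# Monte Carlo check: X with density c/x on [eps, M] (log-uniform),
# Delta ~ Bernoulli(1/2) independent of X, A = X * 2**Delta.
# For A in the overlap region [2*eps, M] the conditional law of
# Delta given A should be Bernoulli(1/2).
eps, M = 1.0, 2.0 ** 20
n = 200_000

inside = 0     # samples with A in [2*eps, M]
inside_d1 = 0  # of those, how many had Delta = 1

for _ in range(n):
    u = random.random()
    x = eps * (M / eps) ** u   # log-uniform sample: density c/x
    d = random.randint(0, 1)   # Delta ~ Bernoulli(1/2)
    a = x * (2 ** d)
    if 2 * eps <= a <= M:
        inside += 1
        inside_d1 += d

p_hat = inside_d1 / inside
print(f"P(Delta = 1 | A in [2eps, M]) ~ {p_hat:.4f}")  # close to 0.5
```

With M = 2^20 about 95% of samples fall in the overlap region, so the estimate is accurate to a few parts in a thousand.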
Theorem 3.
Consider a sequence of probability measures for the random variable X such that A and ∆ are asymptotically independent in the sense that the conditional law of ∆ given A converges weakly to Bernoulli(1/2). Then the total variation distance between the laws of log₂(X) and 1 + log₂(X), which is of course equal to the total variation distance between the laws of X and 2X, converges to zero. Conversely, convergence of the total variation distance between the laws of X and 2X to zero implies the asymptotic independence of A and ∆.

Corollary 1. sup_k P(⌊log₂ X⌋ = k) → 0.

Corollary 2.
The distance between any two (different) quantiles of the law of log₂ X converges to infinity.

Corollary 3.
For all δ > 0, P(X < δE(X)) → 1.

Conjecture 1. A and ∆ are asymptotically independent if and only if the fractional and whole parts of log₂ X are asymptotically independent, with the whole part asymptotically uniformly distributed over all integers.

Examples. Suppose X is continuously uniformly distributed on the interval [1, N]. For a ∈ [2, N/2], the conditional probability that A < B given A = a is exactly equal to 1/2. Outside that interval it is equal to 0 or 1. As N increases, the probability of the event A ∈ [2, N/2] converges to 1/4. So A and ∆ are not asymptotically independent. The variation distance between the laws of X and 2X converges to 1/2. Theorem 3 does not apply, though the statement of the first corollary is true, and hence also of the next two. On the other hand, if we take log₂ X continuously uniformly distributed on [0, N], then the asymptotic independence does hold, and hence the theorem applies, and also its corollaries. If we replace the continuous uniform distributions by the discrete, the same things can be said. All this is consistent with Conjecture 1.

Remark 1. Corollary 3 is going to be used to resolve the (still to be introduced) TEP-2 paradox. As the proof will show, Corollary 3 is a corollary of Corollary 2, which follows from Corollary 1, which follows from the theorem (forward implication).
Remark 2. Conjecture 1 as it stands is ill-posed. Part of the problem is to extend probability theory and then weak convergence theory to include improper prior distributions and allow them to arise as "weak limits" in the new, appropriate sense. The first thing to do is to study more examples.
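The total variation claims in the two uniform examples above can be verified by a short deterministic computation (my own check, with hypothetical helper names, not from the paper): piecewise integration of |f − g| gives TV = N/(2(N − 1)) → 1/2 for X uniform on [1, N], and TV = 1/N → 0 when log₂ X is uniform on [0, N].

```python
# Exact total variation distances for the two examples.

def tv_uniform_x_vs_2x(N: float) -> float:
    """TV distance between Uniform[1, N] (law of X) and Uniform[2, 2N] (law of 2X)."""
    f = 1.0 / (N - 1)         # density of X on [1, N]
    g = 1.0 / (2 * (N - 1))   # density of 2X on [2, 2N]
    # three segments: [1,2] only f; [2,N] both, |f - g| = g; [N,2N] only g
    return 0.5 * (1 * f + (N - 2) * (f - g) + N * g)

def tv_loguniform(N: float) -> float:
    """TV distance between Uniform[0, N] (law of log2 X) and Uniform[1, N+1]."""
    return 1.0 / N

for N in (10, 1_000, 1_000_000):
    print(N, tv_uniform_x_vs_2x(N), tv_loguniform(N))
# tv_uniform_x_vs_2x -> 1/2 while tv_loguniform -> 0
```

The first distance stays bounded away from zero, so the theorem's hypothesis fails for the uniform example; the second vanishes, so the theorem and its corollaries apply to the log-uniform example.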
Proof of Theorem 3, forward implication. To say that the conditional law of ∆ given A converges weakly to the constant law Bernoulli(1/2) means precisely that for any ǫ > 0 and δ > 0 there exists an N₀(ǫ, δ) such that for all N ≥ N₀, P(|P(∆ = 1 | A) − 1/2| > ǫ) ≤ δ. Recall that everything is defined here through the law of X, which is supposed to depend on N. For all N, ∆ is independent of X and Bernoulli(1/2), and A = X if ∆ = 0, A = 2X if ∆ = 1. Now if |P(∆ = 1 | A) − 1/2| ≤ ǫ then P(∆ = 0 | A)/P(∆ = 1 | A) ≤ (1 + 2ǫ)/(1 − 2ǫ) = c, say. Define Z = log₂ X, and let 1{·} denote an indicator random variable. We have for all E,

P(Z ∈ E) = 2P(log₂ A ∈ E, ∆ = 0)
  ≤ 2δ + 2P(log₂ A ∈ E, ∆ = 0, P(∆ = 0 | A)/P(∆ = 1 | A) ≤ c)
  = 2δ + 2E[ P(∆ = 0 | A) 1{log₂ A ∈ E, P(∆ = 0 | A)/P(∆ = 1 | A) ≤ c} ]
  ≤ 2δ + 2cE[ P(∆ = 1 | A) 1{log₂ A ∈ E, P(∆ = 0 | A)/P(∆ = 1 | A) ≤ c} ]
  ≤ 2δ + 2cE[ P(∆ = 1 | A) 1{log₂ A ∈ E} ]
  = 2δ + 2cP(log₂ A ∈ E, ∆ = 1)
  = 2δ + 2cP(Z + 1 ∈ E, ∆ = 1)
  = 2δ + ((1 + 2ǫ)/(1 − 2ǫ)) P(Z + 1 ∈ E).

It follows that P(Z ∈ E) − P(Z + 1 ∈ E) ≤ 2δ + 4ǫ/(1 − 2ǫ). On the other hand, reversing the roles of the events {∆ = 0} and {∆ = 1}, and starting from the identity P(Z + 1 ∈ E) = 2P(log₂ A ∈ E, ∆ = 1), we obtain in exactly the same way P(Z + 1 ∈ E) − P(Z ∈ E) ≤ 2δ + 4ǫ/(1 − 2ǫ). Since E was arbitrary, this proves the claim that the total variation distance between the laws of Z and of Z + 1 converges to zero.

Proof of Theorem 3, reverse implication. This proof is left to the reader. It requires careful choice of two different sets E, for instance E₊ = {a : P(∆ = 1 | A = a) > 1/2 + ǫ} for some ǫ > 0, and E₋ = {a : P(∆ = 1 | A = a) < 1/2 − ǫ}.

Proof of Corollary 1.
If k maximizes P(⌊Z⌋ = k) then, applying the theorem m times, we have the asymptotic equality of P(⌊Z⌋ = k), P(⌊Z⌋ + 1 = k), ..., P(⌊Z⌋ + m = k). This implies that lim sup P(⌊Z⌋ = k) ≤ 1/(m + 1). Since m was arbitrary, it follows that max_k P(⌊Z⌋ = k) → 0.

Proof of Corollary 2. It is obvious from Corollary 1 that the distance between two fixed (distinct) quantiles of the distribution of Z must diverge as N → ∞.

Proof of Corollary 3. Let z_α denote the upper α-quantile of the law of Z = log₂ X, defined by P(Z ≥ z_α) ≥ α, P(Z > z_α) < α. Fix ǫ > 0. On the one hand, P(X ≤ 2^{z_ǫ}) > 1 − ǫ. On the other hand, E(X) = E(2^Z) ≥ (ǫ/2) 2^{z_{ǫ/2}} = (ǫ/2) 2^{z_{ǫ/2} − z_ǫ} 2^{z_ǫ}. Since z_{ǫ/2} − z_ǫ → ∞, it follows that for sufficiently large N, δE(X) > 2^{z_ǫ} and hence P(X < δE(X)) > 1 − ǫ.

Now for TEP-2 proper, and a shift to some issues much discussed in mathematical economics and decision theory. It was quickly observed that steps 6 and 7 can't both be correct if we restrict attention to X having a proper probability distribution. (As I just explained, I consider that observation to be a cheap way to resolve TEP-1.) However,
it also did not take long for many authors to discover probability distributions of X such that E(B | A = a) > a for all a, or more concisely, E(B | A) > A. Thus the paradox appears to be resurrected, since there are situations in which it appears rational to exchange envelopes without knowledge of the content of your envelope. Here is just one such example: let X be 2 to the power of a geometrically distributed random variable with parameter p = 1/3; to be precise, P(X = 2^n) = 2^n/3^{n+1}, n = 0, 1, 2, .... When A = 1, certainly A < B.
For any other possible value of A it turns out that P(A < B | A) = 2/5 and E(B | A) = 11A/10 > A, except when A = 1, when E(B | A) = 2 > A.

Equally quickly, it was noticed that such examples always had E(X) = ∞. This is necessary since, on taking expectation values again, it follows from E(B | A) > A that E(B) > E(A) ... or that E(B) = E(A) = ∞. But we know a priori (by symmetry) that E(B) = E(A), and indeed E(B) = E(A) = 3E(X)/2, since the expected amount in both envelopes together is 3E(X). Hence all such examples must indeed have E(X) = ∞.

Why does this observation resolve the paradox? Well, because if the expectation values of A and B are infinite, you will always be disappointed with what you get on choosing and opening either envelope. As Keynes famously said, in the long run we are all dead. Why are expectation values supposed to be interesting? Because they are supposed to approximate long run averages. But if the infinitely long run average is infinite, any finite average is disappointing. In the mathematical economics literature, as well as probability distributions expressing our beliefs, we have utilities expressing the value we assign to any outcome. Standard economic theory assumes that utilities are bounded. That is supposed to keep paradoxes from the door.

Well, that is the point of view in mathematical economics. Again, I think it is a too cheap way out. In mathematical models it is often perfectly justified to use probability distributions with infinite ranges, and even with infinite expectation values, as convenient, realistic, legitimate mathematical approximations to real life distributions, even though some would insist that all "real" distributions actually have bounded support and definitely finite expectation value. The point is that that objection is irrelevant. The fields of mathematical finance, climatology, meteorology and geophysics abound with examples.
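The conditional probabilities in the example above follow from Bayes' rule: given A = 2^k with k ≥ 1, the pair is (A, 2A) with weight P(X = 2^k) and (A/2, A) with weight P(X = 2^{k−1}). A quick exact-arithmetic check (my own, not from the paper):

```python
from fractions import Fraction

def p_x(n: int) -> Fraction:
    # P(X = 2^n) = 2^n / 3^(n+1), the geometric example from the text
    return Fraction(2 ** n, 3 ** (n + 1))

for k in range(1, 8):
    w_small = p_x(k)       # A = 2^k is the smaller amount, B = 2A
    w_large = p_x(k - 1)   # A = 2^k is the larger amount, B = A/2
    p_a_less_b = w_small / (w_small + w_large)
    gain = p_a_less_b * 2 + (1 - p_a_less_b) * Fraction(1, 2)  # E(B|A)/A
    print(k, p_a_less_b, gain)  # always 2/5 and 11/10

# E(X) = sum over n of 2^n * 2^n/3^(n+1) = (1/3) sum (4/3)^n diverges:
partial = [float(sum(Fraction(4, 3) ** n / 3 for n in range(m)))
           for m in (10, 20, 40)]
print(partial)  # partial sums grow without bound
```

Since the weight ratio P(X = 2^k)/P(X = 2^{k−1}) equals 2/3 for every k, the conditional probability 2/5 and the expected gain factor 11/10 do not depend on k, which is exactly what makes the example paradoxical.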
The important point is that in the real world it is quite possible for averages of a number of independent observations of X to be far less than the mathematical expectation value of X, with overwhelming probability. Take a distribution of X on the positive real line with infinite expectation and leading to E(B | A) > A, and truncate it so far to the right that even a million independent observations from X would hardly ever contain one observation exceeding the truncation value. Call the truncated distribution that of X′ and use it instead of X to set up TEP-2. You'll find E(B | A) > A with huge probability, so step 8 suggests you should switch envelopes. But the gain is illusory, since this is a situation where the average of a huge number of copies of X′ is still far smaller than their expectation value. Expectation value is no guide to decision, even though everything is as finite as you like.

Some philosophers working on the margins of the foundations of the theory of utility do write papers trying to set up a theory of utility which allows unbounded utilities, and use TEP-2 as a test case for such theories. For the reasons just expressed, I think they are barking up a completely wrong tree.

This is where I also return to my intermediate (between TEP-1 and TEP-2) resolution: the author was perhaps a Bayesian using a prior distribution perfectly appropriate to express almost complete lack of knowledge about X. Corollary 3 says that as she must admit to
having a tiny bit of information, steps 6 and 7 are only approximately correct, not exactly; but now the resolution of the paradox is that in this situation the expectation value of X is so far to the right of where the bulk of its probability distribution lies that expectation values are no guide to action. It is step 8 which fails.
This is a situation where Keynes has the last word.

Back to TEP-1: since the writer is not working explicitly in a particular formal framework, we do not know what he or she is trying to do. There is not a unique resolution to the paradox of the type "step so-and-so fails". There is not a unique explanation of "what went wrong". Looking for one is illusory. Unless we take the higher point of view and say: the writer was trying to do probability theory but without knowing its concepts, let alone its rules, and he or she screwed up big time by not making distinctions which in probability theory are crucial to make. TEP-1 is the kind of problem for which formal probability theory was invented. Philosophers who work on TEP-1 without knowing modern (elementary) probability are largely wasting their own time; at best they will reinvent the wheel.
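The truncation argument a few paragraphs back can be made concrete with exact arithmetic. In this sketch (the cutoff n = 50 and the factor 1000 are my choices, purely for illustration), the geometric example P(X = 2^n) = 2^n/3^{n+1} is truncated by lumping the remaining mass onto 2^50:

```python
from fractions import Fraction

CUT = 50
probs = [Fraction(2 ** n, 3 ** (n + 1)) for n in range(CUT)]
probs.append(1 - sum(probs))  # tail mass (2/3)^50 lumped onto 2^CUT

# A million draws from the *untruncated* X would hardly ever exceed 2^CUT:
p_tail = float(Fraction(2, 3) ** CUT)          # P(X > 2^CUT) for untruncated X
p_any_of_million = 1 - (1 - p_tail) ** 1_000_000
print(f"P(some draw in a million exceeds cutoff) ~ {p_any_of_million:.4f}")

# Yet the expectation of the truncated X' sits far beyond the bulk of its mass:
mean = sum(Fraction(2 ** n) * p for n, p in enumerate(probs))
bulk = float(sum(p for n, p in enumerate(probs) if 2 ** n < mean / 1000))
print(f"E(X') ~ {float(mean):.3g}")
print(f"P(X' < E(X')/1000) ~ {bulk:.4f}")  # above 0.99
```

Everything here is finite, yet over 99% of the truncated distribution's mass lies below a thousandth of its expectation: exactly the sense in which expectation value is no guide to decision.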
3. TEP-3
Next we start analysing the situation when we do look in Envelope A before deciding whether to switch or stay. If there is a given probability distribution of X, this just becomes an exercise in Bayesian probability calculations. Typically there is a threshold value above which we do not switch. But all kinds of strange things can happen. If a probability distribution of X is not given, we come to the randomized solution of Cover (1987), where we compare A to a random "probe" of our own choosing.

Here is the problem, in Cover's words: Player 1 writes down any two distinct numbers on separate slips of paper. Player 2 randomly chooses one of these slips of paper and looks at the number. Player 2 must decide whether the number in his hand is the larger of the two numbers. He can be right with probability one-half, by just guessing. It seems absurd that he can do better.

Spoiler alert. How can he do better? (Cover does not give the answer, but he does know that there is one.) Here it is. Player 2 picks a number with an everywhere-positive probability density with respect to Lebesgue measure on the real line. For any non-empty interval, there is positive probability that it lies in that interval. Hence there is positive probability that it lies between the two numbers written down by Player 1. Now Player 2 uses his random number as a surrogate for "the other number". He'll give the right answer when his own number is in between Player 1's numbers, but when his number is outside of the range of Player 1's two numbers, he guesses right with probability one half. His overall probability of getting it right is strictly larger than a half.
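Cover's randomized strategy is easy to simulate (the particular numbers 0.2 and 0.8 and the choice of a Gaussian probe are mine, for illustration only): say "larger" exactly when the observed number exceeds an independent probe whose density is positive on the whole real line.

```python
import random

random.seed(2021)

# Player 1's two fixed distinct numbers; Player 2 sees one at random
# and guesses "larger" iff it exceeds a standard Gaussian probe.
x_small, x_large = 0.2, 0.8
trials, correct = 200_000, 0

for _ in range(trials):
    seen, other = random.choice([(x_small, x_large), (x_large, x_small)])
    probe = random.gauss(0.0, 1.0)
    guess_larger = seen > probe
    if guess_larger == (seen > other):
        correct += 1

print(f"success rate ~ {correct / trials:.4f}")  # strictly above 1/2
```

Whenever the probe happens to fall strictly between the two numbers the guess is automatically correct, and otherwise it is a fair coin toss; since the probe lands in between with positive probability, the overall success rate exceeds one half.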
4. TEP-0
This is of course the “TEP without probability” of Smullyan (1992).
Let the amount in the envelope chosen by the player be A. By swapping, the player may gain A or lose A/2. So the potential gain is strictly greater than the potential loss. But let the amounts in the envelopes be X and 2X. Now by swapping, the player may gain X or lose X. So the potential gain is equal to the potential loss.

The short resolution is simply: the problem is using the same words (potential gain, loss) to describe different things. But different resolutions are possible, depending on what one thinks was the intention of the writer. One can try to embed the argument(s) into counterfactual reasoning. Or one can also point out that the key information that Envelope A is chosen at random is not being used in Smullyan's arguments. So this is a problem in logic, and this time an example of screwed up logic. Philosophers have lots of ways to clean up this particular mess.
5. History
So far I neglected to mention that TEP was a remake of the 1953 two-neckties problem, Kraitchik (1953), of Maurice Kraitchik (1882–1957), a Belgian mathematician and populariser of mathematics born in Minsk. An earlier (1943) edition of Kraitchik's book "Mathematical Recreations" exists; I do not know whether that one already contains the problem.
Two men are each given a necktie by their respective wives as a Christmas present. Over drinks they start arguing over who has the cheaper necktie. They agree to have a wager over it. They will consult their wives and find out which necktie is more expensive. The terms of the bet are that the man with the more expensive necktie has to give it to the other as the prize. The first man reasons as follows: winning and losing are equally likely. If I lose, then I lose the value of my necktie. But if I win, then I win more than the value of my necktie. Therefore, the wager is to my advantage. The second man can consider the wager in exactly the same way; thus, paradoxically, it seems both men have the advantage in the bet. This is obviously not possible (assuming both prefer the more expensive necktie).

Kraitchik's main interests were the theory of numbers and recreational mathematics. The two neckties became two wallets with Gardner (1982) and two envelopes with Zabell (1988a), Zabell (1988b), Nalebuff (1988), Nalebuff (1989) and Gardner (1989). Zabell gave the wide class of problems the name exchange paradox. He explains that he heard of the problem from Steve Budrys of the Odesta corporation, and also that he discussed it with lots of other people. Nalebuff tells that he got it from Hal Varian, who got it from Sandy Zabell. Zabell (a subjective Bayesian) starts with introducing a third player, Player C, who fills the two envelopes and gives one to Player A and one to Player B. We are not initially told that C does this "at random". Hence the other players' prior beliefs about Player C would certainly influence their own decisions. Zabell does go on to focus on the symmetric case that Player C is known to be a neutral referee. Nalebuff focussed on a non-symmetric version now called the Ali and Baba problem. Since my focus is on the symmetric case I do not write out the (simple) details here.
He neatly retains the paradox that both Ali and Baba, after imagining looking in their envelopes, seem to have a good reason to want to switch with the other. A possible ancestry goes back to a problem proposed by Schrödinger, quoted in Littlewood (1953). A highly disguised appearance of the paradox occurred in Blackwell (1951). So in the movie paradigm, TEP is actually a remake of an almost forgotten classic.

All the symmetric versions of the problem have exactly the same key feature and the same resolution: there is a pair of random variables A, B whose distribution is invariant under exchange. They have positive probability to be different; on conditioning on the event that they are different, we may pretend they are certainly different. Hence by our little Theorem 2 at the end of Section 1, the random variable A cannot be independent of the event {A < B}, or equivalently, the event {A < B} cannot be independent of the random variable A. Or ... there is an improper prior lurking behind the scenes, expectations are infinite, and exchange is futile.
6. Conclusions
Over the years, frequentist probabilists, Bayesian probabilists, logicians, philosophers, and mathematical economists have all taken too narrow a view of TEP, blind to the existence of other scientific communities. Obviously, the present author is the first to step outside of the narrow confines of their own discipline! Since probability calculus was invented so as to provide a decent language to enable the world to move on from problems like TEP, why do so many philosophers still insist on clumsy pre-probability "solutions" which are so vague as to be useless? But how come Martin Gardner couldn't solve TEP? And why did so many biggish names deduce that X must have a uniform distribution on (0, ∞), while in fact it's log X which must be uniform on (−∞, ∞), to preserve the validity of steps 6 and 7 (if the special number "2" is made arbitrary)? Why did so many authors take a cheap way out to resolve the paradox? It's clear that most people find TEP irritating. It is not a fun problem like MHP (the Monty Hall problem).

I hope this paper shows that there are both subtle and fascinating aspects to TEP and probably even some more interesting maths, if not philosophy, to be done. I did not succeed in showing that limiting independence of A and ∆ implies that ⌊log X⌋ is asymptotically uniform and asymptotically independent of {log X}. I could not do this because I don't yet have a way to express formally what I want to prove, since in the limit I am outside of conventional probability theory.

There are certainly some important lessons here for people who build probability models in the real world. One should be wary of infinities, but please let's be wary of them for the good reasons, not for non-reasons.

I think it helps a great deal to bear the Anna Karenina principle in mind when tackling a logical paradox like TEP. Note that the TEP argument is informal. Steps are partly justified, but not fully justified.
In order to "point a finger" at the mistake, the steps need to be amplified. But why should there be only one way to amplify the steps of the argument so as to fit it into some logical – but failing – argument? And why should the failed argument fail at only one step? The writer does not make explicit within which logical framework he is working. We know neither his assumptions nor his intention. Whatever they are, he must be making a mistake, since his conclusion is self-contradictory. But one cannot say that whatever the context and whatever the intention, the mistake is made at the same place. It is hard to be sure that there are no other reasonable contexts and intentions than those which have appeared so far in the literature. As the paradox evolved and migrated to new fields it mutated as well: from its humble origin in recreational mathematics (where it was invented by experts in number theory so as to confuse amateurs) it mutated and migrated to statistics, mathematical economics and philosophy.

I also found the Anna Karenina principle very useful when arguing with researchers in the foundations of quantum mechanics who believe that Bell's theorem is false. The theorem in question states that quantum mechanics is incompatible with "local realism" – the world view of Einstein. The theorem, or a formal mathematical version of it, is clearly correct, and it has stood up to more than fifty years of intense scrutiny and much opposition. Again and again, very smart people come up with counterexamples. There is always a mistake in their counterexample, but they will always deny that it is a mistake. Like a persistent student, they will rewrite their manuscript, adding new technical detail correcting the mistake they had made before, and hiding a new one buried deeper still in long computations.
For some nice open probability problems in this field, with a distinctly geometric flavour and not needing any knowledge of quantum mechanics, see Gill (2020).

I find the analogy with the Aliens movie franchise also useful. TEP tells us how important it is to make distinctions. People who write about TEP should be careful to distinguish TEP-1 from the whole franchise. We have this whole franchise precisely because of the Anna Karenina principle. Anna Karenina meets Aliens on the back of a few envelopes. I am looking forward to new papers on TEP, if necessary shredding my own. Arrogance deserves to be punished.

The bibliography to this paper contains a list of all the papers I have studied while writing this one. Many are not cited in the body of this paper, but they have all influenced the whole paper in one way or another. Many of the books are listed with their date of original publication, but with the publisher which presently provides a second (or later) edition.

I first started working on this topic through getting involved in Wikipedia discussions – or perhaps one could better say fights – which, somewhat like many court cases (especially in civil law, but also in criminal law), were typically resolved in favour of editors who could recite at length from the Wikipedia rule book while blind drunk if not asleep. Logic and truth are not criteria which a Wikipedia editor is allowed to use. Instead, the key notions used to justify inclusion are "reliable source", "notability", and "neutral point of view". Elementary arithmetic is allowed, but elementary logic is disqualified as being "own research".
Anyway, I'm especially indebted to the Wikipedia editor "iNic", who maintains an extensive bibliography on a Wikipedia talk page: https://en.wikipedia.org/wiki/Talk:Two_envelopes_problem/Literature. I particularly like the quote he gives from Syverson (2010): "Indeed if there is anything inherently unbounded about the two-envelope paradox, it is that each search will uncover at least one more reference".

The Wikipedia page on TEP is still (March 2020) problematic. Please cite my present paper in many future peer-reviewed publications by yourself, in order that it may become an authoritative source for future Wikipedia editors.

Almost absent are papers on the quantum two envelope problem. This is surprising in view of the rich literature on quantum versions of MHP (the Monty Hall or three doors problem), in particular D'Ariano et al. (2002). And what led Schrödinger to the problem? The interesting paper Ergodos (2014) at least mentions the possibility of a quantum TEP. A recent discovery which I have yet to digest is Cheong, Saakian & Zadourian (2017).
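The claim made earlier, that it is log X rather than X which must be uniform for steps 6 and 7 to be valid, can be probed numerically. The sketch below is my own toy discretization, not anything from the paper: take X = 2^k with k uniform on {−N, ..., N}, a proper stand-in for the improper log-uniform prior. Away from the edges of the support, the conditional probability that the other envelope is larger, given the amount in yours, is then exactly 1/2 – which is precisely what the TEP argument tacitly assumes.

```python
from fractions import Fraction

# Proper approximation of the log-uniform prior: X = 2**k,
# k uniform on {-N, ..., N} (my own toy choice).
N = 20
outcomes = {}  # (amount in your envelope, amount in the other) -> probability
for k in range(-N, N + 1):
    x = Fraction(2) ** k               # smaller amount
    for pair in [(x, 2 * x), (2 * x, x)]:
        outcomes[pair] = outcomes.get(pair, Fraction(0)) + Fraction(1, 2 * (2 * N + 1))

def p_other_larger(a):
    """P(other envelope is larger | your envelope contains a)."""
    num = sum(p for (aa, b), p in outcomes.items() if aa == a and b > aa)
    den = sum(p for (aa, b), p in outcomes.items() if aa == a)
    return num / den

print(p_other_larger(Fraction(1)))             # 1/2 in the interior
print(p_other_larger(Fraction(32)))            # 1/2
print(p_other_larger(Fraction(2) ** (-N)))     # 1: bottom value must be the smaller
print(p_other_larger(Fraction(2) ** (N + 1)))  # 0: top value must be the larger
```

For contrast, a short calculation with X itself uniform on a bounded interval gives conditional probability 2/3, not 1/2, that the other envelope is larger (for amounts well inside the support): the seemingly innocent "equally likely" step already smuggles in the log-uniform prior. The edge values above also illustrate the asymptotic-independence theme: the 1/2 answer holds everywhere except on a vanishing fraction of the support as N grows.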
References

Albers, C.J., Kooi, B.P. & Schaafsma, W. (2005). Trying to Resolve the Two-Envelope Problem. Synthese, 499–500.
Binder, D.A. (1993). Comment on Christensen and Utts (1992). The American Statistician, 160.
Blachman, N.M., Christensen, R. & Utts, J.M. (1996). Comment on Christensen and Utts (1992). The American Statistician, 99.
Blachman, N.M. & Kilgour, D.M. (2001). Elusive optimality in the box problem. Mathematics Magazine, 171–181.
Blackwell, D. (1951). On the translation parameter problem for discrete variables. The Annals of Mathematical Statistics, 393–399.
Brams, S.J. & Kilgour, D.M. (1995). The box problem: to switch or not to switch. Mathematics Magazine, 27–34.
Butler, S.F. & Nickerson, R.S. (2008). Keep or trade? An experimental study of the exchange paradox. Thinking & Reasoning, 365–394.
Chalmers, D.J. (1994). The two-envelope paradox: a complete analysis. URL http://consc.net/papers/envelope.html.
Chalmers, D.J. (2002). The St. Petersburg two-envelope paradox. Analysis, 155–157.
Chase, J. (2002). The non-probabilistic two envelope paradox. Analysis, 157–160.
Chen, G.J. (2007). The Puzzle of the Two-Envelope Puzzle. Available at SSRN 1132506.
Cheong, K.H., Saakian, D.B. & Zadourian, R. (2017). Allison mixture and the two-envelope problem. Phys. Rev. E, 062303.
Chihara, C.S. (1995). The mystery of Julius: A paradox in decision theory. Philosophical Studies, 1–16.
Christensen, R. & Utts, J. (1992). Bayesian resolution of the "exchange paradox". The American Statistician, 274–276.
Cover, T.M. (1987). Pick the largest number. In Open Problems in Communication and Computation. Springer, p. 152.
D'Ariano, G.M., Gill, R.D., Keyl, M., Kuemmerer, B., Maassen, H. & Werner, R.F. (2002). The Quantum Monty Hall Problem. Quant. Inf. Comput., 355–366.
Douven, I. (2007). A three-step solution to the two-envelope paradox. Logique et Analyse, 359–365.
Eckhardt, W. (2013). Paradoxes in Probability Theory. Springer.
Ergodos, N. (2014). The enigma of probability. Journal of Cognition and Neuroethics, 37–71.
Falk, R. (2008). The unrelenting exchange paradox. Teaching Statistics, 86–88.
Falk, R. & Konold, C. (1992). The psychology of learning probability. Statistics for the Twenty-First Century, 151–164.
Falk, R. & Nickerson, R.S. (2009). An inside look at the two envelopes paradox. Teaching Statistics, 39–41.
Fallis, D. (2009). Taking the Two Envelope Paradox to the Limit. Southwest Philosophy Review, 95–111.
Gardner, M. (1982). Aha! Gotcha: Paradoxes to Puzzle and Delight. WH Freeman, New York.
Gardner, M. (1989). Penrose Tiles to Trapdoor Ciphers: And the Return of Dr Matrix. Cambridge University Press.
Gill, R.D. (2011). The Monty Hall problem is not a probability puzzle (it's a challenge in mathematical modelling). Statistica Neerlandica, 58–71.
Gill, R.D. (2020). The triangle wave versus the cosine: How classical systems can optimally approximate EPR-B correlations. Entropy, 87.
Ishikawa, S. (2014). The two envelopes paradox in non-Bayesian and Bayesian statistics. arXiv preprint arXiv:1408.4916.
Katz, B.D. & Olin, D. (2007). A tale of two envelopes. Mind, 903–926.
Katz, B.D. & Olin, D. (2010). Conditionals, Probabilities, and Utilities: More on Two Envelopes. Mind, 171–183.
Kraitchik, M. (1953). Mathematical Recreations. Courier Corporation.
Langtry, B. (2004). The classical and maximin versions of the two-envelope paradox. The Australasian Journal of Logic.
Littlewood, J.E. (1953). Littlewood's Miscellany. Cambridge University Press.
Loredo, T. (2004). The two-envelope paradox. URL http://hosting.astro.cornell.edu/staff/loredo/bayes/two-envelope.pdf.
McDonnell, M.D. & Abbott, D. (2009). Randomized switching in the two-envelope problem. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 3309–3322.
McDonnell, M.D., Grant, A.J., Land, I., Vellambi, B.N., Abbott, D. & Lever, K. (2011). Gain from the two-envelope problem via information asymmetry: on the suboptimality of randomized switching. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 2825–2851.
Meacham, C.J.G. & Weisberg, J. (2003). Clark and Shackel on the two-envelope paradox. Mind, 685–689.
Nalebuff, B. (1988). Puzzles: Cider in Your Ear, Continuing Dilemma, The Last Shall be First, More. Journal of Economic Perspectives, 149–156.
Nalebuff, B. (1989). Puzzles: The other person's envelope is always greener. Journal of Economic Perspectives, 171–181.
Nickerson, R.S. & Falk, R. (2006). The exchange paradox: Probabilistic and cognitive analysis of a psychological conundrum. Thinking & Reasoning, 181–213.
Norton, J. (1998). Where the sum of our expectation fails us: The exchange paradox. Pacific Philosophical Quarterly, 34–58.
O'Reilly, F. (2010). Is there a two-envelope paradox? URL .
Priest, G. & Restall, G. (2008). Envelopes and indifference. In Dialogues, Logics and Other Strange Things: Essays in Honour of Shahid Rahman. London: College Publications, pp. xxx–xxx+10.
Rawling, P. (1994). A note on the two envelopes problem. Theory and Decision, 97–102.
Ridgway, T. (1993). Comment on Christensen and Utts (1992). The American Statistician, 311.
Roberts, M.A. (2009). The nonidentity problem and the two envelope problem: When is one act better for a person than another? In Future Persons: Ethics, Genetics and the Nonidentity Problem, eds. M.A. Roberts & D.T. Wasserman. Dordrecht: Springer Netherlands, pp. 201–228.
Rodriguez, C.C. (1988). Understanding ignorance. In Maximum-Entropy and Bayesian Methods in Science and Engineering. Springer, pp. 189–204.
Ross, S.M. (1994). Comment on Christensen and Utts (1992). The American Statistician, 267.
Samet, D., Samet, I. & Schmeidler, D. (2004). One observation behind two-envelope puzzles. The American Mathematical Monthly, 347–351.
Schwitzgebel, E. & Dever, J. (2008). The Two Envelope paradox and using variables within the expectation formula. Sorites, 135–140.
Smullyan, R.M. (1992). Satan, Cantor, and Infinity and Other Mind-boggling Puzzles. Knopf.
Sutton, P.A. (2010). The epoch of incredulity: a response to Katz and Olin's 'A tale of two envelopes'. Mind, 159–169.
Syverson, P. (2010). Opening Two Envelopes. Acta Analytica, 479–498.
Tsikogiannopoulos, P. (2014). Variations on the Two Envelopes Problem. arXiv:1411.2823.
Wagner, C.G. (1999). Misadventures in conditional expectation: The two-envelope problem. Erkenntnis, 233–241.
Yi, B.U. (2009). The two-envelope Paradox With No Probability. URL http://individual.utoronto.ca/byeonguk/Conditionals%20&%20a%20two%20envelope%20paradox.pdf.
Zabell, S.L. (1988a). Discussion of "De Finetti's theorem, induction, and A(n) or Bayesian nonparametric predictive inference" by B.M. Hill. In Bayesian Statistics 3, Proceedings of the Third Valencia International Meeting, eds. J. Bernardo, M. DeGroot, D.V. Lindley & A. Smith. Clarendon Press, Oxford, pp. 233–236.
Zabell, S.L. (1988b). Symmetry and its discontents. In Causation, Chance and Credence. Proceedings from the Irvine Conference on Probability and Causation, Volume 1. Kluwer, Dordrecht, pp. 155–190.