The Implications of the Early Formation of Life on Earth

Abstract

One of the most interesting unsolved questions in science today is the question of life on other planets. At present we have little idea whether life is common or exceedingly rare in the universe, and this question will probably not be settled unless definitive evidence of extraterrestrial life is found in the future. Our presence on Earth is just as consistent with the hypothesis that life is extremely rare as it is with the hypothesis that it is common, since if there was only one planet with intelligent life, we would find ourselves on it. However, we have more information than this, such as the surprisingly short length of time it took for life to arise on Earth. Previous authors have analysed this information, concluding that it is evidence that the probability of abiogenesis is moderate (> 13% with 95% probability) and cannot be extremely small. In this paper I use a simple probabilistic model to show that this conclusion was based more on an unintentional assumption than on the data. While the early formation of life on Earth provides some evidence in the direction of life being common, it is far from conclusive, and in particular does not rule out the possibility that abiogenesis has only occurred once in the history of the universe.


arXiv: [physics.pop-ph] Jul

Brendon James Brewer
School of Mathematics and Statistics
The University of New South Wales

October 25, 2018


Attempting to make predictions about life elsewhere based on observations about Earth is inherently difficult due to the sample size of 1. It is also fraught with controversial “anthropic” considerations (Smolin, 2004). However, there is no reason in principle why it cannot be done. If we use probability theory to model uncertainty (Jaynes, 2003), and data about life on Earth really is uninformative about extraterrestrial life, then probability theory will return wide probability distributions, indicating the large uncertainty.

The surprising fact that life arose on Earth very quickly after its formation (e.g. Mojzsis et al., 1996), and at the end of a likely phase of sterilisation due to frequent impacts, has been used to argue that abiogenesis must therefore be easy. Lineweaver & Davis (2002, 2004, hereafter L&D) have modelled this reasoning with probability theory and concluded with 95% confidence (Bayesian posterior probability) that the probability of abiogenesis on an Earth-like planet is greater than 13%. This was done by using a model with a constant hazard (chance of life arising per discrete time interval) $q$. The probability distribution for the time $t_L$ at which life arises (our $t_L$ corresponds to L&D’s $\Delta t_{\rm biogenesis}$) depends on $q$, and is also calculated conditional on the fact that $t_L$ must be less than the age of the Earth, to correct for the fact that we couldn’t have observed the Earth unless life began. This probability distribution is a likelihood function for $q$ when the actual observed $t_L$ is substituted into it. Combined with a prior distribution for $q$, we can then make inferences about its value.

Whilst it is possible and interesting to calculate such things, the model used by L&D contains a flaw that renders the conclusion invalid. Unfortunately, the conclusion quoted above depends on a choice of prior distribution over $q$ that is overconfident and unrealistic as a description of our state of knowledge about abiogenesis. While uniform priors representing “initial ignorance” are common in Bayesian analysis, a uniform prior for an unknown probability such as $q$ is actually quite informative (Jaynes, 2003, chapter 18), assigning most of its probability to moderate values of $q$ and ignoring the possibility of extreme values. That the uniform prior is inappropriate can be illustrated using the technique of elaboration: are we happy with all of the implied consequences of assuming this prior distribution? For instance, one implication is that $q \in [0.5, 0.51]$ is just as plausible as $q \in [0, 0.01]$. Extremely small values of $q$, many orders of magnitude below 1, should not be ignored as they almost are by the uniform prior. A more realistic representation of complete prior ignorance would be the improper Haldane prior $\propto [q(1 - q)]^{-1}$ (Jaynes, 1968), or a modification that removes the singularities at $q = 0$ and $q = 1$ and makes the prior proper. The Haldane prior corresponds to an improper uniform prior for the “logit” $\log[q/(1 - q)]$, representing uncertainty not just about the exact value of $q$ but also about its order of magnitude. This paper uses a model that bypasses direct use of $q$ and deals with expected waiting times instead, although its conclusions can be interpreted in the L&D framework as well.

The conclusions of Lineweaver & Davis (2002) have been criticised previously on the basis that no observer would ever see a recent abiogenesis due to the large number of intermediate steps required between abiogenesis and the development of intelligent life (Flambaum, 2003). However, it is still possible to imagine life arising after 5 Gyr on a planet and intelligent observers discovering this at, say, $t = 8$ Gyr. The fact that we are not in this situation could still be considered surprising, and therefore informative (Lineweaver & Davis, 2003).

Suppose that there existed a planet that is identical with the early Earth (when conditions have settled down to be suited for life; call this time $t = 0$) in terms of all of its macroscopic parameters: mass, temperature, chemical composition, distance from its Sun (which is identical with our Sun), etc. Of course, this model only applies to planets that are Earthlike in terms of their biological characteristics. While this may seem restrictive, observations about what actually occurred on Earth cannot be relevant to planets that do not have this property. Imagine we are given the value of a constant, $\mu$, which is the expected waiting time for the first abiogenesis on a planet with the above conditions.
From standard survival analysis, $1/\mu$ is proportional to the probability per unit time of the event happening, and plays the same role as $q$ in L&D’s work. We are then informed, to our great surprise, that the following events occurred on the planet:

- Proposition $S$: At time $t = t_0$ (the present time; henceforth, a value of 4.3 Gyr is adopted whenever a specific value is required), there exists a person called Brendon James Brewer, and the Prime Minister of Australia on the planet is Kevin Rudd.
- Life first arose on the planet at a time $t_L$. Obviously, $t_L < t_0$.

While proposition $S$ may seem overly specific, one is more likely to make correct inferences by conditioning on a statement that is more specific than, say, “intelligent life arises”. See Neal (2006) for a detailed discussion of this point and a principled framework for the treatment of anthropic selection effects in general.

Our predictions will be given in the form of probability distributions for all of these parameters. The probability distributions are chosen to represent our uncertain state of knowledge, as in the Bayesian framework (Jaynes, 2003). Throughout this paper, probabilities of propositions are denoted by an upper case $P()$ and probability density functions (PDFs) for variables by a lower case $p()$; this notation allows probability expressions to become very succinct, as the rules followed by probabilities and PDFs are written in the same way.

If we only knew the abiogenesis timescale $\mu$, our prediction for $t_L$ would be described by an exponential distribution:

$$p(t_L \mid \mu) = \frac{1}{\mu} \exp(-t_L/\mu), \qquad t_L > 0 \tag{1}$$

[Figure 1 (plot): probability density (Gyr$^{-1}$) against time (Gyr), with curves for $\mu$ = 0.3 Gyr, $\mu$ = 1 Gyr and $\mu$ = 10 Gyr.]

Figure 1: The probability density for the time at which abiogenesis occurred, given that we exist at $t = 4.3$ Gyr after the Earth was first suitable for life (defined as $t = 0$). Note that as the abiogenesis timescale becomes larger, this distribution becomes uniform.

Note that Equation 1 is not an assumption about any frequency distribution that would occur in a population of Earths; it is only the most conservative probability distribution that has the expectation value $\mu$ (Jaynes, 1979). When we find out that $S$ is true for the planet we are watching, the distribution is revised to be truncated to between $t = 0$ and $t = t_0$:

$$p(t_L \mid \mu, S) = \frac{\mu^{-1} \exp(-t_L/\mu)}{\int_0^{t_0} \mu^{-1} \exp(-t_L/\mu)\, dt_L}, \qquad t_L \in [0, t_0] \tag{2}$$

$$= \frac{\mu^{-1} \exp(-t_L/\mu)}{1 - \exp(-t_0/\mu)} \tag{3}$$

Technically, this should have been calculated from Bayes’ theorem:

$$p(t_L \mid \mu, S) = \frac{p(t_L \mid \mu)\, P(S \mid t_L, \mu)}{P(S \mid \mu)} \tag{4}$$

$$= \frac{p(t_L \mid \mu)\, P(S \mid t_L, \mu)}{\int_0^{\infty} p(t_L \mid \mu)\, P(S \mid t_L, \mu)\, dt_L} \tag{5}$$

where the first term in the numerator would come from Equation 1. The other term, $P(S \mid t_L, \mu)$, would be very difficult to quantify. However, any effects that it would include apart from the obvious truncation effect ($S$ cannot be true unless $t_L < t_0$) would likely be simply quantitative versions of the evidence and arguments discussed by Lineweaver & Davis (2003). For example, the fact that it is very unlikely for $S$ to be true if $t_L$ is close to $t_0$ corresponds to the “non-observability of recent abiogenesis” and would be modelled in the factor $P(S \mid t_L, \mu)$. Another possible effect is that there are various epochs in any Earth-like planet’s history, and conditions are suitable for life to arise in only one of those epochs. However, for the purposes of this paper, the simple truncation of Equation 2 is sufficient to repeat most of L&D’s argument, while highlighting our point of disagreement with it. Any attempt to increase the sophistication of the model will be deferred to future work.

The sampling distribution (Equation 2) for data given parameters is plotted in Figure 1 for three different values of the abiogenesis waiting timescale $\mu$: 0.3, 1 and 10 Gyr.
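The truncated sampling distribution can be checked numerically. The following is a minimal sketch, not the author's code: the function names `truncated_exp_pdf` and `integrate` are hypothetical, the closed form is taken from Equation 3, and $t_0 = 4.3$ Gyr and $t_L = 0.25$ Gyr are the values adopted in the text. It confirms that the density integrates to 1 over $[0, t_0]$ for each timescale and prints the density at $t_L$.

```python
import math

T0 = 4.3  # Gyr, the present time t_0 adopted in the text


def truncated_exp_pdf(t_l, mu, t0=T0):
    """Equation 3: density of t_L given mu and S, zero outside [0, t0]."""
    if not 0.0 <= t_l <= t0:
        return 0.0
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny
    denominator = -mu * math.expm1(-t0 / mu)
    return math.exp(-t_l / mu) / denominator


def integrate(f, a, b, n=20000):
    """Composite trapezoid rule on n equal sub-intervals (illustrative helper)."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


for mu in (0.3, 1.0, 10.0, 1e6):
    area = integrate(lambda t: truncated_exp_pdf(t, mu), 0.0, T0)
    print(f"mu = {mu:g} Gyr: area = {area:.6f}, "
          f"density at t_L = 0.25 Gyr: {truncated_exp_pdf(0.25, mu):.4f}")
```

For a very large timescale such as $\mu = 10^6$ Gyr (an arbitrary illustrative value), the density is very close to $1/t_0 \approx 0.233$ Gyr$^{-1}$, the uniform density on $[0, t_0]$.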
Any proposed value of $\mu$ is a distinct hypothesis that we wish to test, and this sampling distribution defines the predictions that each hypothesis makes about the observational data $t_L$, the actual time that abiogenesis occurred. Note that as $\mu$ increases, this tends to a uniform distribution, and hence moderately large values of $\mu$ ($\sim 10$ Gyr) and extremely large values of $\mu$ make essentially the same predictions about $t_L$.

Thus far, this model is virtually identical to that of L&D. The only difference is that L&D used a discretised time axis with $\Delta t = 200$ Myr and parameterised $\mu$ by $q \approx \mu^{-1} \Delta t$, the probability of life arising in a time $\Delta t$. This makes it difficult to see how they could have extracted such confident conclusions about $q$ based on $t_L$, in light of the above paragraph. This question will be explored in the next section.

We have a sampling distribution for some data given a parameter of interest, in Equation 2. To infer the parameter $\mu$ from the known value of $t_L$, we use Bayes’ theorem to get the posterior distribution for $\mu$, which is proportional to a prior distribution times the likelihood function from Equation 2:

$$p(\mu \mid t_L, S) \propto p(\mu \mid S)\, p(t_L \mid \mu, S) = p(\mu)\, p(t_L \mid \mu, S) \tag{6}$$

Since $S$ by itself hardly tells us anything about any abiogenesis except that it is possible, the dependence on $S$ was dropped from the prior. Now, before we can get probabilistic conclusions about $\mu$ or a related quantity such as $q$, a prior must be assigned. If we are initially ignorant of $\mu$, a suitable prior is the Jeffreys prior $\propto 1/\mu$. The reason for this is that it is equivalent to a uniform improper prior for $\log(\mu)$, and hence describes uncertainty about the order of magnitude of the parameter. Alternatively, it is the only prior that is invariant under a change of timescale: if we were to find that we are measuring $\mu$ in Terayears rather than Gigayears, the Jeffreys prior is the only choice that would not change in the newly rescaled problem. With this choice, the posterior distribution for $\mu$ cannot be normalised unless we obtain additional information about an upper limit to $\mu$. Hence, it is impossible to construct credible intervals from these data alone.
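The reason the posterior cannot be normalised under the Jeffreys prior can be made explicit. The short derivation below is a sketch built from Equation 3; the lower integration limit $M$ is an arbitrary illustrative cutoff, assumed to be much larger than $t_0$.

```latex
% Large-mu limit of the likelihood (Equation 3), using 1 - e^{-x} \approx x for small x:
\[
  \lim_{\mu \to \infty} p(t_L \mid \mu, S)
  = \lim_{\mu \to \infty} \frac{\mu^{-1} \exp(-t_L/\mu)}{1 - \exp(-t_0/\mu)}
  = \frac{1}{t_0} .
\]
% With the Jeffreys prior p(\mu) \propto 1/\mu, the upper tail of the
% normalising integral therefore diverges logarithmically:
\[
  \int_{M}^{\infty} \frac{1}{\mu}\, p(t_L \mid \mu, S)\, d\mu
  \;\approx\; \frac{1}{t_0} \int_{M}^{\infty} \frac{d\mu}{\mu}
  \;=\; \infty , \qquad M \gg t_0 .
\]
```

The divergence comes only from the upper tail: at small $\mu$ the factor $\exp(-t_L/\mu)$ suppresses the integrand, so an upper limit on $\mu$ is exactly the extra information needed.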
All we can do is plot the improper posterior for $\log(\mu)$ (which is basically the likelihood, since a Jeffreys prior is uniform for $\log(\mu)$), and this is displayed as the solid curve in Figure 2. There is a peak in the posterior, indicating that there is indeed evidence favouring a particular value for $\mu$ of about $t_L$. However, the likelihood flattens out at a nonzero (and non-negligible) value after about $t = 2$ Gyr. Thus, these data cannot rule out the hypothesis that $\mu$ is enormous and that Earth hosted the only abiogenesis event(s) in the universe.

This is essentially a quantitative version of an argument that has been put forward previously (e.g. by Hanson, 1998): “Since no one on Earth would be wondering about the origin of life if Earth did not contain creatures nearly as intelligent as ourselves, the fact that four billion years elapsed before high intelligence appeared on Earth seems compatible with any expected time longer than a few billion years”.

Now, what prior did L&D implicitly assume? To answer this, a uniform prior for $q$ must be translated to a prior for $\mu$ via the approximate relationship $q \approx \mu^{-1} \Delta t$. Since $q \sim \mathrm{Uniform}(0, 1)$, $q/\Delta t = \mu^{-1} \sim \mathrm{Uniform}(0, 1/\Delta t)$. By the usual rule for transforming probability distributions:

$$p(\mu)\, d\mu = p(\mu^{-1})\, d(\mu^{-1}) \tag{7}$$

$$= p(\mu^{-1})\, \frac{d(\mu^{-1})}{d\mu}\, d\mu \tag{8}$$

$$= (\Delta t) \times \left(-\mu^{-2}\right) d\mu, \qquad \mu \in [\Delta t, \infty] \tag{9}$$

The negative sign is irrelevant because only the absolute value of the Jacobian matters; the negative result simply measures the decrease in accumulated probability as one moves leftwards along the $\mu$ axis, so nothing is amiss. It is apparent that the choice of a uniform prior for $q$ is equivalent to a prior for $\mu$ that is proportional to $\mu^{-2}$, and truncated to values of $\mu$ greater than $\Delta t$.
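The change of variables in Equations 7 to 9 can be sanity-checked by simulation. This is a rough sketch under stated assumptions, not part of L&D's or the author's analysis: the helper names `sample_mu` and `tail_fraction` are hypothetical, $\Delta t = 0.2$ Gyr is the 200 Myr step quoted above, and the relation $\mu = \Delta t / q$ is treated as exact. A prior density $\Delta t\, \mu^{-2}$ on $\mu \ge \Delta t$ implies $P(\mu > m) = \Delta t / m$, which the sampled fractions should approximate.

```python
import random

DELTA_T = 0.2  # Gyr, the 200 Myr time step attributed to L&D in the text


def sample_mu(rng):
    """Draw q uniformly from (0, 1] and map it to mu = DELTA_T / q."""
    q = 1.0 - rng.random()  # random() lies in [0, 1), so q lies in (0, 1]
    return DELTA_T / q


def tail_fraction(samples, m):
    """Fraction of sampled mu values that exceed m."""
    return sum(1 for mu in samples if mu > m) / len(samples)


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [sample_mu(rng) for _ in range(200_000)]

for m in (1.0, 10.0, 100.0):
    # Integrating DELTA_T * mu**-2 from m to infinity gives DELTA_T / m
    print(f"P(mu > {m:g} Gyr): sampled {tail_fraction(samples, m):.4f}, "
          f"predicted {DELTA_T / m:.4f}")
```

Under this prior only about 2% of the probability lies above $\mu = 10$ Gyr, which shows how strongly a uniform prior on $q$ disfavours long timescales before any data are considered.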
This choice of prior may seem innocuous, but it is significant enough to make the posterior normalisable. In fact, whereas the likelihood function (and the posterior with respect to a Jeffreys prior) flattens out completely for high $\mu$, the posterior with respect to the L&D prior decays exponentially in that region. The major effect of this choice of prior on the posterior can be seen easily in Figure 2.

[Figure 2 (two panels): unnormalised posterior density against $\log(\mu/\mathrm{Gyr})$, with curves labelled for the Jeffreys prior and the L&D prior.]

Figure 2: The posterior probability density for the logarithm of the abiogenesis timescale, assuming a Jeffreys prior for the timescale (uniform prior for its logarithm), is plotted here as the solid curve in the left panel. The prior that is implied by a uniform prior for the quantity $q$ (the chance of life arising in a finite time interval) is shown as a dotted curve in the right hand panel, along with the resultant posterior. Note that the likelihood function (proportional to our posterior) becomes flat at a nonzero value towards the right of the curve. Hence, whilst the data do support the hypothesis that abiogenesis is likely on Earth-like planets (due to the likelihood peak), they are not a strong enough constraint to rule out more ‘pessimistic’ possibilities.

Footnote: $t_L$ is not known exactly, of course. A value of 250 Myr will be adopted whenever a definite value is required.

Unfortunately, unless definitive independent evidence can be found that puts an upper limit on possible values of $\mu$, meaningful credible intervals cannot be constructed. L&D appear to have unwittingly assumed that they did have that extra required information, or that the early formation of life on Earth could provide it, but this is not the case. Some information that could provide a likelihood function that allowed the posterior to be normalised would be the following:

- Detection of life elsewhere. Since it is possible to observe a lack of life elsewhere, the sampling distribution of Equation 2 would no longer integrate to 1, and would be truncated at the star’s lifetime rather than having anything to do with the age of the Earth. This models some (quite high, in the case of large $\mu$) probability that life will not arise at all on a given planet. It was anthropic considerations that led to the truncation and renormalisation, and these do not apply to the case of life on other planets.
- A very compelling and well understood theory of abiogenesis would enable the direct calculation of $\mu$ from first principles; in theory at least.

These two possibilities, while not exhaustive, would allow definite inferences about $\mu$ that the current data do not. This conclusion accords with the common sense attitude that prevails in the scientific community about what we know and don’t know about the probability of abiogenesis.
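The dependence of any credible statement on an assumed upper limit for $\mu$ can be illustrated numerically. The sketch below is an illustration under stated assumptions, not a reproduction of Figure 2 or of L&D's calculation: it uses the likelihood of Equation 3 with $t_L = 0.25$ Gyr and $t_0 = 4.3$ Gyr, a log-uniform (Jeffreys) prior truncated to $[\mu_{\min}, \mu_{\max}]$, and hypothetical names `likelihood`, `log_integral` and `prob_mu_exceeds`. The cutoffs and the 10 Gyr threshold are arbitrary choices made only for this example.

```python
import math

T0 = 4.3    # Gyr, present time t_0 adopted in the text
T_L = 0.25  # Gyr, abiogenesis time t_L adopted in the text


def likelihood(mu, t_l=T_L, t0=T0):
    """Equation 3 viewed as a function of mu for fixed t_L."""
    return math.exp(-t_l / mu) / (-mu * math.expm1(-t0 / mu))


def log_integral(lo, hi, n=20000):
    """Trapezoid-rule integral of likelihood(mu) d(ln mu) from mu=lo to mu=hi."""
    a, b = math.log(lo), math.log(hi)
    h = (b - a) / n
    total = 0.5 * (likelihood(lo) + likelihood(hi))
    for i in range(1, n):
        total += likelihood(math.exp(a + i * h))
    return total * h


def prob_mu_exceeds(threshold, mu_max, mu_min=1e-3):
    """Posterior P(mu > threshold) under a log-uniform prior on [mu_min, mu_max].

    mu_min and mu_max are assumed cutoffs (in Gyr), not values from the paper.
    """
    return log_integral(threshold, mu_max) / log_integral(mu_min, mu_max)


for mu_max in (1e2, 1e4, 1e6, 1e9, 1e12):
    p = prob_mu_exceeds(10.0, mu_max)
    print(f"mu_max = {mu_max:.0e} Gyr: P(mu > 10 Gyr | t_L, S) = {p:.3f}")
```

The printed probability keeps growing as the cutoff is raised instead of settling on a limit, which is the numerical counterpart of the statement that credible intervals are meaningless without independent information about an upper limit on $\mu$.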
The fact that life arose surprisingly early after the formation of the Earth can be used as evidence for the hypothesis that abiogenesis is easy, and hence supports the conclusion that life is common in the universe. However, the evidence is not as conclusive as has been claimed. Specifically, this study has highlighted the fact that knowledge of the early abiogenesis time on Earth is still compatible with the following hypothesis: that life is extraordinarily rare in the universe, perhaps even only on Earth, and we observe early abiogenesis due to chance (we’d have to be moderately lucky, but not obscenely so). This conclusion differs from that of Lineweaver & Davis (2002) because they unwittingly made overconfident prior assumptions. Hence, unless there is a direct detection, the answer to the perennial question “are we alone” remains “nobody knows”.

Acknowledgments

I am supported by an Australian Postgraduate Award and a Denison Merit Award from The School of Physics at The University of Sydney. I would like to thank the anonymous referees of an earlier version of this paper for identifying flaws in the previous version, which allowed the paper to be improved significantly.

References