Bioverse: a simulation framework to assess the statistical power of future biosignature surveys

Abstract

Next-generation space observatories will conduct the first systematic surveys of terrestrial exoplanet atmospheres and search for evidence of life beyond Earth. While in-depth observations of the nearest habitable worlds may yield enticing results, there are fundamental questions about planetary habitability and evolution which can only be answered through population-level studies of dozens to hundreds of terrestrial planets. To determine the requirements for next-generation observatories to address these questions, we have developed Bioverse. Bioverse combines existing knowledge of exoplanet statistics with a survey simulation and hypothesis testing framework to determine whether proposed space-based direct imaging and transit spectroscopy surveys will be capable of detecting various hypothetical statistical relationships between the properties of terrestrial exoplanets. Following a description of the code, we apply Bioverse to determine whether an ambitious direct imaging or transit survey would be able to determine the extent of the circumstellar habitable zone and study the evolution of Earth-like planets. Given recent evidence that Earth-sized habitable zone planets are likely much rarer than previously believed (Pascucci et al. 2019), we find that space missions with large search volumes will be necessary to study the population of terrestrial and habitable worlds. Moving forward, Bioverse provides a methodology for performing trade studies of future observatory concepts to maximize their ability to address population-level questions, including and beyond the specific examples explored here.


Draft version January 27, 2021

Typeset using LaTeX twocolumn style in AASTeX63


Alex Bixel (1, 2) and Dániel Apai (1, 2, 3)

1 Department of Astronomy/Steward Observatory, The University of Arizona, 933 N. Cherry Avenue, Tucson, AZ 85721, USA
2 Earths in Other Solar Systems Team, NASA Nexus for Exoplanet System Science
3 Lunar and Planetary Laboratory, The University of Arizona, 1629 E. University Blvd, Tucson, AZ 85721, USA

(Accepted January 22, 2021)

Submitted to AJ

1. INTRODUCTION

The field of exoplanet science stands at an exciting turning point. In the past, most exoplanet surveys aimed only to constrain bulk properties such as size, period, and mass. Moving forward, several groups are developing concepts for space telescopes which would enable the atmospheric characterization of temperate terrestrial planets. Such concepts include the Large UV/Optical/Infrared Surveyor (LUVOIR, The LUVOIR Team 2019), the Habitable Exoplanet Observatory (HabEx, The HabEx Team 2019), the Origins Space Telescope (Origins Space Telescope Study Team 2019), the Nautilus Space Observatory (Apai et al. 2019), the Large Interferometer for Exoplanets (LIFE, Quanz et al. 2018), and the Mid-Infrared Exoplanet Climate Explorer (MIRECLE, Staguhn et al. 2019). By looking for biosignatures in the atmospheres of temperate Earth-sized planets, these observatories would conduct the first systematic search for life beyond the Solar System.

Next-generation observatories will be able to study some of the closest terrestrial exoplanets in unprecedented detail, but this is only the start of their scientific capability: observatories which can study tens to hundreds of terrestrial planets will allow for the first statistical constraints on the atmospheric, geological, and biological properties of terrestrial planets. Some recent works have explored statistical trends and patterns which may only be evident at the population level. For example, habitable zone models predict patterns in atmospheric CO₂ and H₂O abundance (Bean et al. 2017; Lehmer et al. 2020) as well as in color and albedo across a range of stellar insolations (Checlair et al. 2019).
Venus analogs may have larger apparent radii than their temperate siblings due to their thick, post-runaway greenhouse atmospheres (Turbet et al. 2019). Earth's geological record suggests a possible relationship between the ages and oxygen content of Earth-like planets, assuming their atmospheres evolve similarly to Earth's (Bixel & Apai 2020a), and with a large enough sample of potentially habitable planets, next-generation surveys could place the first constraints on the frequency of life in the universe (Checlair et al. 2020). An understanding of population-level trends will provide context for the interpretation of possible biosignatures on individual worlds and could illuminate their potential false positive (i.e., non-biological) sources (Apai et al. 2017; Meadows et al. 2018). To avoid statistical false positive scenarios, efforts must also be made to understand which distinct mechanisms could produce the same apparent trends. For example, an increase in cloud deck altitude with insolation could masquerade as a signature of atmospheric erosion in a sample of transiting exoplanets (Lustig-Yaeger et al. 2019).

Recent research has identified key outstanding questions about terrestrial exoplanets, their planetary systems, and the processes which shape them, on which future observatories might provide insights (see the SAG15 report for an overview of several such questions in the context of direct imaging missions, Apai et al. 2017). For example: what are the processes which shape their atmospheric loss (e.g., Zahnle & Catling 2017)? Is the habitable zone wide (e.g., Kasting et al. 1993; Kopparapu et al. 2014) or narrow (e.g., Hart 1979)? What is the relationship between planet size and tectonic activity (e.g., Valencia et al. 2007; Dorn et al. 2018)? Are habitable planets equally common around stars of different mass and activity levels (e.g., Shields et al. 2016)?

Which, if any, of these questions could be answered with a next-generation observatory will depend on its technical design and observing strategy. One important metric is the number of terrestrial habitable zone planets which it could realistically detect, but only a subset of these will be habitable, and even inhabited worlds may vary substantially from Earth in their atmospheric composition and evolutionary history. Furthermore, deep spectroscopic characterization of individual planets will be time-consuming, so strategic choices must be made as to which planets to characterize and at what wavelengths. For these reasons, analyses based solely on the detection yield predictions of future space mission concepts will provide an optimistic assessment of their statistical power.

To evaluate which statistical hypotheses future observatories could meaningfully test, we have developed Bioverse. Bioverse estimates the statistical power of next-generation exoplanet surveys to detect and study population-level trends by simulating the underlying planet population, survey limitations, observing biases, and statistical analyses which a future observer would perform on a large set of observations of terrestrial planets. After the following brief description of the code structure, we describe its three main components in Sections 3 through 5. In Sections 6 and 7, we use Bioverse to determine the requirements for next-generation surveys to test the habitable zone concept and study the evolution of Earth-like planets.

2. CODE OUTLINE

Bioverse consists of three components, outlined in Figure 1. The first component generates planetary systems with bulk properties (e.g., size and period) drawn from statistical distributions, then applies theoretical models or parametric relationships to generate secondary properties of interest (e.g., atmospheric composition). The second component is a survey simulator which conducts observations of the simulated exoplanet population in direct imaging or transit spectroscopy mode. The survey simulator first determines which planets could be characterized within a finite allotted observing time, then generates a simulated data set representative of the telescope and instrument capabilities. The third component is a Bayesian framework which uses simulated data sets to test statistical hypotheses and estimate model parameters. By iterating through these components, we can use Bioverse to determine the statistical power of a proposed observatory to test different hypotheses.
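Iterating through the three components amounts to a Monte Carlo estimate of statistical power: repeat the full generate, observe, and test cycle many times and record how often the null hypothesis is rejected. The Python sketch below illustrates only that outer loop, under loud assumptions. The trial function is a toy, the one-sided exact binomial test is a frequentist stand-in swapped in for the Bayesian evidence comparison that Bioverse uses, and every name and number here (`estimate_power`, `make_toy_trial`, the 70% feature fraction) is hypothetical rather than part of the Bioverse code.

```python
import math
import random
from typing import Callable


def estimate_power(run_trial: Callable[[random.Random], bool],
                   n_trials: int = 1000, seed: int = 0) -> float:
    """Fraction of simulated surveys in which the null hypothesis is rejected.

    `run_trial` stands in for one pass through the three components
    (generate planets, simulate the survey, test the hypothesis) and
    must return True when the null hypothesis is rejected.
    """
    rng = random.Random(seed)
    rejections = sum(run_trial(rng) for _ in range(n_trials))
    return rejections / n_trials


def binomial_sf(k: int, n: int, p: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def make_toy_trial(n_planets: int, true_fraction: float,
                   null_fraction: float = 0.5, alpha: float = 0.05):
    """Hypothetical trial: each observed planet shows a feature with probability
    `true_fraction`; reject the null (fraction = `null_fraction`) with a
    one-sided exact binomial test at significance level `alpha`."""
    def run_trial(rng: random.Random) -> bool:
        k = sum(rng.random() < true_fraction for _ in range(n_planets))
        return binomial_sf(k, n_planets, null_fraction) < alpha
    return run_trial


if __name__ == "__main__":
    for n in (10, 30, 100):
        power = estimate_power(make_toy_trial(n, true_fraction=0.7), n_trials=2000)
        print(f"N = {n:3d} planets -> power ~ {power:.2f}")
```

Under these toy assumptions the estimated power rises with the number of planets observed, which is the kind of dependence on sample size that the survey trade studies are meant to quantify.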

Bioverse is written in Python and designed for flexibility, so that different statistical assumptions and testable hypotheses can be implemented in the future. The specific set of assumptions on which Bioverse is currently based is listed in Table 1. Given the large number of parameters involved in Bioverse, we provide a table of abbreviations and symbols used in the text in Appendix A.

3. PLANET GENERATION

The first component of Bioverse creates simulated planetary systems around host stars in the solar neighborhood with a period and radius distribution informed by Kepler statistics. Other planet properties (such as mass and geometric albedo) are derived from empirical relationships or best-guess prior distributions. Finally, the simulated planet properties reflect the effects of hypothetical population-level trends which could be uncovered by a future survey of terrestrial planets.

3.1. Stellar properties

We begin by considering which stars in the solar neighborhood would be targeted by future biosignature surveys. Our strategy for simulating stellar systems is mass-dependent, and therefore depends on the observing technique used by the simulated survey. Bioverse currently considers observations through coronagraphic direct imaging (in “imaging mode”) and transit spectroscopy (in “transit mode”).

A current version of the code can be found on GitHub, while the version used in this paper is archived on Zenodo (Bixel & Apai 2021).

[Figure 1: flowchart. Module 1, Planet Generation: create host stars, create planets, classify planets, apply population-level trends. Module 2, Survey Simulation: survey description; which planets can be observed?; how much time is required per target?; prioritize and observe targets. Module 3, Hypothesis Testing: define hypothesis, compute Bayesian evidence, sample parameter posterior distributions, reject the null hypothesis?, compute statistical power. Population-level trends: Example 1, habitable zone hypothesis; Example 2, age-oxygen correlation.]

Figure 1. A high-level outline of the Bioverse code. In this paper, we apply Bioverse to assess the detectability of two hypothetical population-level trends (green) with next-generation survey telescopes. These relationships are injected into the simulated planet population by the first module, then tested as statistical hypotheses by the third module.

Table 1. Summary of statistical assumptions and modeling choices in Bioverse, with associated references.

| Topic | Assumptions | References |
|---|---|---|
| Host star distribution and properties | (Imaging mode) LUVOIR-A optimized target catalog | The LUVOIR Team (2019) and C. Stark (private correspondence) |
| | (Transit mode) Stellar mass function | Chabrier (2003) |
| | Main sequence mass-radius-luminosity relations | Pecaut & Mamajek (2013) |
| Planet occurrence rates | SAG13 occurrence rates, with modifications: η⊕ ≈ 7.5% for G stars (down from ≈ 24%); … | … |
| Planet classification | Exo-Earth candidates: 0.8 S^0.25 < R < 1.4 R⊕, within the circumstellar habitable zone | various (see Section 3.4); K14 |
| Observatory templates | (Imaging mode) 15-meter LUVOIR-A observatory | The LUVOIR Team (2019) |
| | (Transit mode) 50-meter equivalent Nautilus Space Observatory | Apai et al. (2019a) |
| Target prioritization | Finite observing time with overheads; observe in order of required time; prioritize targets to reduce survey biases | |
| Measurement noise | Photon-noise limited observations with characteristic wavelength λ_eff; required exposure time scales with distance, stellar brightness, and signal strength | |
| Model comparison | Compare alternative to null hypothesis through Bayesian evidence Z; significant evidence when Δ(Z) > … | |

Direct imaging surveys will primarily target the habitable zones of higher-mass (FGK) stars within the nearest 30 pc, the majority of which have already been cataloged by space-based astrometry missions. Not all of these will be equally valid targets, due to the combined effects of distance and background noise sources such as zodiacal dust (Stark et al. 2019). Sophisticated simulations for the LUVOIR mission concept (The LUVOIR Team 2019) have produced an optimized list of targets whose habitable zones could feasibly be probed for Earth-like planets. In imaging mode, we use an optimized stellar target list for the 15-meter LUVOIR-A concept as the basis for simulating nearby planetary systems (C. Stark, private correspondence).

A survey of transiting habitable zone planets would be most sensitive to planets around low-mass (K and M) stars, as their habitable zone planets are more likely to transit, transit more frequently, and produce a deeper relative transit depth. However, the census of low-mass stars is not complete out to ∼100 pc. Therefore, in transit mode, we draw all stellar masses randomly from a present-day stellar mass function (Chabrier 2003) and distribute the stars isotropically in space. We do not include any known stars or transiting planets in the transit mode sample; as most nearby transiting planets remain undiscovered, this would have little effect on the overall statistical distribution of host star properties.

In both imaging and transit modes, we relate the stellar mass (M_*) to its radius, luminosity, and effective temperature (R_*, L_*, T_*) by interpolating a list of these properties as a function of spectral type (Pecaut & Mamajek 2013). Each star is assigned an age drawn uniformly from 0–10 Gyr, reflecting the (to first order) constant star formation rate in the Milky Way for the past 10 Gyr (e.g., Snaith et al. 2015; Fantin et al. 2019; Mor et al. 2019).
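The transit-mode host-star sampling described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Bioverse implementation: masses are rejection-sampled from a lognormal in log10(M) whose parameters (m_c = 0.079 M⊙, σ = 0.69) follow the commonly quoted Chabrier (2003) form and should be checked against that paper; the 0.08–1.0 M⊙ truncation is a hypothetical simplification; positions fill a sphere with uniform density (isotropic direction, radius drawn as d_max·u^(1/3)); and ages are uniform on 0–10 Gyr as in the text. The 150 pc default echoes the transit-mode simulation volume quoted in Section 4.1. All function names are illustrative.

```python
import math
import random

# Assumed lognormal parameters in the form given by Chabrier (2003);
# verify against the paper before relying on them.
M_C = 0.079               # characteristic mass [M_sun]
SIGMA = 0.69              # width in log10(M)
M_MIN, M_MAX = 0.08, 1.0  # hypothetical truncation range [M_sun]


def _lognormal_density(x):
    """Unnormalized dN/dlog10(M) evaluated at x = log10(M)."""
    return math.exp(-((x - math.log10(M_C)) ** 2) / (2 * SIGMA ** 2))


def sample_mass(rng):
    """Rejection-sample one stellar mass [M_sun] from the truncated lognormal."""
    lo, hi = math.log10(M_MIN), math.log10(M_MAX)
    peak = min(max(math.log10(M_C), lo), hi)  # density maximum on [lo, hi]
    f_max = _lognormal_density(peak)
    while True:
        x = rng.uniform(lo, hi)
        if rng.random() * f_max <= _lognormal_density(x):
            return 10 ** x


def sample_position(rng, d_max_pc):
    """Uniform density inside a sphere: isotropic direction, r = d_max * u^(1/3)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            break
    r = d_max_pc * rng.random() ** (1.0 / 3.0)
    return tuple(r * c / norm for c in v)


def sample_stars(n, d_max_pc=150.0, max_age_gyr=10.0, seed=0):
    """Draw n host stars: mass, 3-D position [pc], and an age uniform on [0, max_age] Gyr."""
    rng = random.Random(seed)
    return [
        {
            "mass": sample_mass(rng),
            "xyz_pc": sample_position(rng, d_max_pc),
            "age_gyr": rng.uniform(0.0, max_age_gyr),  # constant star-formation rate
        }
        for _ in range(n)
    ]


if __name__ == "__main__":
    for star in sample_stars(5):
        dist = math.dist((0.0, 0.0, 0.0), star["xyz_pc"])
        print(f"{star['mass']:.3f} Msun at {dist:6.1f} pc, age {star['age_gyr']:.1f} Gyr")
```

Sampling directions from a normalized 3-D Gaussian and radii as u^(1/3) is a standard way to obtain a uniform space density; a real catalog would also apply the mass-radius-luminosity interpolation described above.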

3.2. Period and radius occurrence rates

Kepler has provided excellent insights into the frequency of planets as a function of period and size for a wide range of host stars. However, these statistics are only complete to periods ≲100 days, and as such do not reach the habitable zone of Sun-like stars. As a result, estimates of η⊕ (the average number of habitable zone Earth-sized planets per star) have so far been based on extrapolation and are therefore model-dependent. NASA's Exoplanet Program Analysis Group chartered Science Analysis Group 13 (hereafter SAG13) to consolidate the results of several studies of Kepler occurrence rates into a single set of estimates for community use (see this URL as well as Kopparapu et al. 2018), resulting in an oft-cited value of η⊕ ≈ 24% for G stars. Here, and elsewhere in this paper, the value of η⊕ uses the habitable zone model of Kopparapu et al. (2014) (hereafter K14; 0.95–1.67 AU for an Earth twin). We use the SAG13 consensus occurrence rate power laws as the basis for determining the number, radii (R_p), periods (P), semi-major axes (a), and insolations (S) of planets in each system. However, the SAG13 metastudy was based largely on studies published before 2017, many of which did not assess planet occurrence as a function of stellar mass. We make the following two modifications to the SAG13 rates to reflect recent work.

[Figure 2: number of terrestrial planets per star (top) and the corresponding η⊕ (bottom) versus stellar mass (0.1–1.0 M⊙), comparing the SAG13 estimate with the modified rates used here; the Kepler and extrapolated regimes are labeled.]

Figure 2. (Top) The assumed number of approximately Earth-sized planets (0.… < R < … R⊕) with orbital periods shorter than 3 yr per star, as a function of stellar mass. We modify the SAG13 estimate (black) by decreasing the overall planet count by ∼3× and increasing the number of planets orbiting low-mass Kepler stars, as well as shortening their orbital periods (gray). We conservatively assume the occurrence rates to plateau for ultra-cool dwarfs (green). (Bottom) The corresponding value of η⊕ using the habitable zone model of K14.

First, we uniformly decrease the number of planets per star by a factor of 3.2, such that η⊕ ≈ 7.5% for G stars. This choice is motivated by Pascucci et al. (2019), who conclude that Earth-sized planets are more common on close-in orbits (P ≲ 25 d) than in the habitable zone, which they ascribe to the effects of photoevaporation. Specifically, they argue that a large fraction of Earth-sized planets on close-in orbits are the evaporated cores of ice giants: planets which would retain their envelopes (and therefore not be Earth-like) if they formed in the habitable zone. In another analysis, Neil & Rogers (2020) find evidence for two distinct populations of rocky planets, and as a result fewer Earth-sized planets in the habitable zone, for which they suggest a similar explanation. The chosen value of 7.5% is in the mid-range of values estimated by Pascucci et al. (2019) when they exclude the planets most affected by photoevaporation (see Table 1 and Figure 4 therein).

Second, we modulate the occurrence rates as a function of spectral type following Mulders et al. (2015a), who find that rocky planets are more common around lower-mass stars and tend to occupy shorter orbits. Specifically, we gradually increase the number of planets for stars less massive than the Sun and decrease their semi-major axes by interpolating between the scaling factors provided by Mulders et al. (2015a) (normalized to 1 for the typical Kepler host star). Later, Mulders et al. (2015b) found evidence that the number of rocky planets around the typical Kepler M dwarf (M0–M5) was ∼…× as high as for G dwarfs, so we further increase the number of planets around M dwarfs to reflect this result. Finally, since Kepler was not sensitive to late M dwarfs, we assume the number of planets per star to plateau for these stars (which we believe to be a conservative extrapolation given the general trend). We note that more recent studies of M dwarf planet occurrence rates reaffirm the finding that lower-mass stars have more Earth-sized planets, including estimates from Kepler data (e.g., Hardegree-Ullman et al. 2019; Hsu et al. 2020) and radial velocity detections (Tuomi et al. 2019).

The net impact of these two decisions on the number of Earth-sized planets per star, as a function of stellar mass, is shown in Figure 2. Our estimate of η⊕ may seem pessimistic when compared to higher values used in predicting the detection yield of mission concepts (e.g., The HabEx Team 2019; The LUVOIR Team 2019; Origins Space Telescope Study Team 2019; Apai et al. 2019), but we consider it a realistic estimate based on the most recent studies available. Our estimate is lower than those of Bryson et al. (2021), who avoid bias due to photoevaporation by excluding planets at high insolation. However, their resulting sample size is limited, and thus their confidence intervals are broad; indeed, our value of η⊕ = 7.5% is within the 95% confidence interval of some of their estimates (see Table 6 therein).

In general, all existing estimates of η⊕ for G stars, including our own, are based on extrapolation and are therefore uncertain. For example, existing data cannot rule out an increase in terrestrial planet occurrence rates at orbital periods beyond ∼100 d, which would enhance η⊕. To accommodate this uncertainty, we express our results in Sections 6 and 7 in terms of either η⊕ or the number of planets observed (which typically scales linearly with η⊕). As a result, the validity of our results is not tied to any specific value for η⊕.
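The SAG13 rates are usually expressed as a broken power law, d²N/(dlnR dlnP) = Γ R^α P^β, which can be integrated analytically over any radius-period box. The Python sketch below does that and applies the uniform factor-of-3.2 reduction described above. Assumptions to verify before use: the parameter values (Γ = 0.38, α = −0.19, β = 0.26 below 3.4 R⊕; Γ = 0.73, α = −1.18, β = 0.59 above) are those commonly quoted for SAG13 (e.g., by Kopparapu et al. 2018), R is in Earth radii and P in years, and the stellar-mass-dependent Mulders et al. (2015a,b) scalings are not included. Function names are hypothetical.

```python
import math

# Broken power law d^2N / (dlnR dlnP) = Gamma * R^alpha * P^beta,
# with R in Earth radii and P in years. Values as commonly quoted for SAG13
# (e.g., Kopparapu et al. 2018); treat them as assumptions to verify.
SAG13 = [
    # (R_min, R_max, Gamma, alpha, beta)
    (0.0, 3.4, 0.38, -0.19, 0.26),
    (3.4, math.inf, 0.73, -1.18, 0.59),
]
REDUCTION = 3.2  # uniform reduction factor adopted in the text


def _power_integral(x1, x2, k):
    """Integral of x^k dln(x) from x1 to x2."""
    if k == 0:
        return math.log(x2 / x1)
    return (x2 ** k - x1 ** k) / k


def expected_planets(r1, r2, p1_yr, p2_yr, reduction=REDUCTION):
    """Expected planets per star with r1 < R < r2 [R_earth] and p1 < P < p2 [yr]."""
    total = 0.0
    for r_lo, r_hi, gamma, alpha, beta in SAG13:
        lo, hi = max(r1, r_lo), min(r2, r_hi)
        if lo >= hi:
            continue  # requested radius range does not overlap this segment
        total += gamma * _power_integral(lo, hi, alpha) * _power_integral(p1_yr, p2_yr, beta)
    return total / reduction


if __name__ == "__main__":
    # Periods 0.926-2.158 yr correspond to the 0.95-1.67 AU K14 habitable zone
    # of a solar twin via Kepler's third law (P[yr] = a[AU]^1.5).
    print(expected_planets(0.8, 1.4, 0.926, 2.158, reduction=1.0))
    print(expected_planets(0.8, 1.4, 0.926, 2.158))
```

A fixed 0.8–1.4 R⊕ box is only a rough proxy for η⊕, since the lower radius limit for exo-Earth candidates depends on insolation (Section 3.4).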

3.3. Habitable zone boundaries

The circumstellar habitable zone refers to the theoretical region around a star in which a planet can sustain liquid surface water. Many formulations of the habitable zone exist, but the most commonly cited estimates are based on Kasting et al. (1993) and subsequent papers which expanded on their methodology (Kopparapu et al. 2013, 2014). In Bioverse we use the results of K14 to calculate the inner edge (a_inner, corresponding to the runaway greenhouse limit) and outer edge (a_outer, corresponding to the maximum greenhouse limit) of the habitable zone. To account for the dependence on planetary mass, we interpolate between the three planetary masses modeled therein.

3.4. Classification

Following Kopparapu et al. (2018), we classify planets as “hot”, “warm”, or “cold” depending on their insolation, and “rocky”, “super-Earth”, “sub-Neptune”, “sub-Jovian”, or “Jovian” depending on their size. Approximately Earth-sized planets within the habitable zone are of particular interest, as these are the most likely planets to have liquid water and habitable surface conditions. Following recent studies of detection yield estimates for direct imaging missions (Kopparapu et al. 2018; Stark et al. 2019; The HabEx Team 2019; The LUVOIR Team 2019), we classify as “exo-Earth candidates” (hereafter EECs) any planets within the habitable zone with radii 0.8 S^0.25 < R < 1.4 R⊕; planets larger than ∼… R⊕ tend to resemble mini-Neptunes in composition more than super-Earths (e.g., Weiss & Marcy 2014; Rogers 2015; Fulton et al. 2017).
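The habitable zone edges of Section 3.3 and the EEC cut above can be combined into a simple classifier. The Python sketch below is a simplification under explicit assumptions: it uses a = sqrt(L_*/S_eff) with constant effective-flux limits chosen so that a solar twin recovers the 0.95–1.67 AU K14 habitable zone quoted in Section 3.2, whereas K14 (and Bioverse) also let S_eff vary with stellar effective temperature and, through interpolation, with planet mass. The function names are hypothetical.

```python
import math

# Effective insolation limits (in units of modern Earth's) implied by the
# 0.95-1.67 AU habitable zone of an Earth twin around a Sun-like star.
# Assumption: K14's dependence on T_eff and planet mass is ignored here.
S_INNER = 1.0 / 0.95 ** 2   # runaway greenhouse limit
S_OUTER = 1.0 / 1.67 ** 2   # maximum greenhouse limit


def hz_edges(luminosity_lsun):
    """Return (a_inner, a_outer) in AU, using S = L / a^2 in Earth units."""
    return (math.sqrt(luminosity_lsun / S_INNER),
            math.sqrt(luminosity_lsun / S_OUTER))


def is_eec(radius_rearth, a_au, luminosity_lsun):
    """Exo-Earth candidate: inside the habitable zone with 0.8 S^0.25 < R < 1.4 R_earth."""
    a_inner, a_outer = hz_edges(luminosity_lsun)
    if not (a_inner <= a_au <= a_outer):
        return False
    insolation = luminosity_lsun / a_au ** 2
    return 0.8 * insolation ** 0.25 < radius_rearth < 1.4


if __name__ == "__main__":
    print(hz_edges(1.0))          # solar twin
    print(hz_edges(0.01))         # a star with 1% of the solar luminosity
    print(is_eec(1.0, 1.0, 1.0))  # Earth twin
    print(is_eec(1.6, 1.0, 1.0))  # too large for an EEC
```

Because the lower radius bound scales as S^0.25, a planet near the outer edge (S ≈ 0.36) qualifies down to roughly 0.62 R⊕, while the upper bound stays fixed at 1.4 R⊕.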

3.5. Albedo and contrast ratio

Imaging measurements will be able to use a planet's brightness as a rough proxy for its size, but its brightness also depends on its geometric albedo, orbital phase, and semi-major axis. The latter two of these can feasibly be constrained by revisiting the system over several months, but it will be difficult to precisely disentangle geometric albedo and planet size. Albedo is highly sensitive to surface and atmospheric composition and will likely be highly variable for directly imaged exoplanets, so estimates of a planet's size based on brightness alone will be highly uncertain (Guimond & Cowan 2018; Bixel & Apai 2020b; Carrión-González et al. 2020). To properly represent this source of uncertainty, we assign each planet a geometric albedo (A_g) drawn uniformly from 10–70% (approximately the range of values encountered at visible wavelengths for solar system planets, e.g., Madden & Kaltenegger 2018).

Next, we compute the planet-to-star brightness contrast ratio for each planet, modeling planets as Lambertian spheres observed at quadrature phase (Traub & Oppenheimer 2010):

$$\zeta = \frac{A_g}{\pi}\left(\frac{R_p}{a}\right)^2 \qquad (1)$$

Note that the determination of a planet's phase from imaging data is also not trivial, requiring multiple follow-up observations to establish the orbit. Nevertheless, such observations will likely be a component of any future imaging survey in order to distinguish temperate planets from their hotter and colder peers (The HabEx Team 2019; The LUVOIR Team 2019).

3.6. Surface gravity and scale height

To translate planet radii into masses, we use the probabilistic mass-radius relationship derived by Wolfgang et al. (2016), which separates terrestrial planets and ice giants. Given each planet's mass and surface gravity, we then estimate the atmospheric scale height (h), which is important for determining the relative spectroscopic signal due to atmospheric absorption (as described in Section 4.3.2). We assign an atmospheric mean molecular weight μ to each planet based on its size. For “sub-Neptune” planets and larger, we assume H₂-dominated atmospheres similar to Neptune or Uranus, with μ = 2.… m_H. For “rocky” and “super-Earth” planets, we calculate the ratio of N₂ to CO₂ based on their position relative to the habitable zone, as follows. For planets within a_inner, we assume CO₂-dominated atmospheres similar to Venus' (μ = 44 m_H). Within the habitable zone, we adopt a positive correlation between semi-major axis and CO₂ partial pressure, which climate models predict as a result of the carbonate-silicate negative feedback mechanism (e.g., Bean et al. 2017). Specifically, we follow the correlation derived by Lehmer et al. (2020) (we adopt the best-fit line in Figure 1 therein), add N₂ as necessary to reach a minimum total pressure of 1 bar, and calculate the mean molecular weight of the two-species mixture (28 < μ < 44 m_H). Finally, for planets beyond a_outer, we assume the CO₂ to condense, leaving behind a pure N₂ atmosphere (μ = 28 m_H). We set the atmospheric …
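Equation 1 and the scale-height bookkeeping of Section 3.6 can be sketched in Python as follows. This is an illustration under labeled assumptions, not Bioverse's code: A_g is drawn uniformly from 0.1–0.7 as in the text; the mean molecular weight of a rocky-planet atmosphere is the partial-pressure-weighted average of N₂ (28 m_H) and CO₂ (44 m_H); and the scale height uses the standard isothermal expression h = k_B T/(μ m_H g), with the temperature T left as a free input because the choice of atmospheric temperature is not specified above. Function names and constants are illustrative.

```python
import math
import random

K_B = 1.380649e-23        # Boltzmann constant [J/K]
M_H = 1.6735575e-27       # mass of a hydrogen atom [kg]
G = 6.67430e-11           # gravitational constant [m^3 kg^-1 s^-2]
R_EARTH_M = 6.371e6       # Earth radius [m]
M_EARTH_KG = 5.972e24     # Earth mass [kg]
AU_M = 1.495978707e11     # astronomical unit [m]


def contrast_ratio(albedo, radius_rearth, a_au):
    """Equation 1: Lambertian sphere at quadrature, zeta = (A_g / pi) (R_p / a)^2."""
    return albedo / math.pi * (radius_rearth * R_EARTH_M / (a_au * AU_M)) ** 2


def mean_molecular_weight(p_co2_bar, p_n2_bar):
    """Partial-pressure-weighted mean of CO2 (44 m_H) and N2 (28 m_H), in units of m_H."""
    return (44.0 * p_co2_bar + 28.0 * p_n2_bar) / (p_co2_bar + p_n2_bar)


def scale_height_m(temp_k, mu, mass_mearth, radius_rearth):
    """Isothermal scale height h = k_B T / (mu m_H g), with g = G M / R^2 [m]."""
    g = G * mass_mearth * M_EARTH_KG / (radius_rearth * R_EARTH_M) ** 2
    return K_B * temp_k / (mu * M_H * g)


# Usage: one planet with a random albedo, as in the text.
rng = random.Random(42)
albedo = rng.uniform(0.1, 0.7)           # A_g ~ U(0.1, 0.7)
zeta = contrast_ratio(albedo, 1.0, 1.0)  # Earth-sized planet at 1 AU
```

Doubling the semi-major axis lowers ζ by a factor of 4, so under the inverse-contrast exposure-time scaling of Equation 6 such a planet needs roughly four times the exposure time of an Earth twin around the same star.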

3.7. Inclination and transiting planets

Planets are assigned inclinations (i) from an isotropic distribution (i.e., a uniform distribution in cos(i) from −1 to 1), and we compute each planet's impact parameter as

$$b = \frac{a\cos i}{R_*} \qquad (2)$$

For transiting planets (with |b| < 1), we calculate the transit depth, δ = (R_p/R_*)², and duration:

$$T_{\mathrm{dur}} = \frac{R_* P}{\pi a}\sqrt{1 - b^2} \qquad (3)$$
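Equations 2 and 3 translate directly into code. The Python sketch below assumes circular orbits, flags transiting planets with |b| < 1, and returns the depth and duration; the constants, units, and function name are illustrative choices rather than anything taken from Bioverse.

```python
import math
import random

R_SUN_M = 6.957e8          # nominal solar radius [m]
R_EARTH_M = 6.371e6        # Earth radius [m]
AU_M = 1.495978707e11      # astronomical unit [m]


def transit_geometry(a_au, period_days, r_star_rsun, r_p_rearth, cos_i):
    """Return (b, depth, duration_days); depth and duration are None if |b| >= 1."""
    a = a_au * AU_M
    r_star = r_star_rsun * R_SUN_M
    b = a * cos_i / r_star                                   # Equation 2
    if abs(b) >= 1.0:
        return b, None, None                                 # not transiting
    depth = (r_p_rearth * R_EARTH_M / r_star) ** 2           # delta = (R_p / R_*)^2
    duration = r_star * period_days / (math.pi * a) * math.sqrt(1.0 - b * b)  # Equation 3
    return b, depth, duration


# Usage: an isotropic orientation means cos(i) ~ U(-1, 1).
rng = random.Random(0)
cos_i = rng.uniform(-1.0, 1.0)
b, depth, duration = transit_geometry(1.0, 365.25, 1.0, 1.0, cos_i)
```

For a central transit of an Earth twin this gives δ ≈ 84 ppm and T_dur ≈ 13 h. A random draw of cos(i) produces a transit with probability ≈ R_*/a (about 0.5% for an Earth twin), which is one reason transit surveys favor the more compact habitable zones of M dwarfs.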

3.8. Hypothetical population-level trends

The primary goal of our study is to understand which population-level trends may be detectable with a next-generation exoplanet survey. For example, could such a survey empirically determine the location of the habitable zone based on which planets have H₂O-rich atmospheres (Section 6), or study how oxygen evolves over time in the atmospheres of Earth-like planets (Section 7)?

To enable these inquiries, we apply hypothetical population-level trends to the simulated planet sample, which will later be studied by simulated direct imaging and transit surveys. Specifically, we determine which planets have atmospheric water vapor based on their size and semi-major axis (following Equation 13), and which Earth-like planets have atmospheric oxygen based on their age (following Equation 15). A more detailed description of these assumed trends, and an assessment of their detectability by future biosignature surveys, can be found in Sections 6 and 7.

4. SURVEY SIMULATION

The second component of Bioverse translates the simulated planet population from the previous section into a data set representing the result of a lengthy characterization effort with a next-generation observatory. There are a few methods by which future observatories could characterize statistically relevant samples of habitable planets, but in Bioverse we focus on space-based direct imaging and transit spectroscopy. The data sets produced by these next-generation surveys will be inherently biased by the observing approach. Most notably, an imaging survey is most efficient in targeting the habitable zones of nearby FGK stars, while a transit survey is optimized for M stars. Strategic decisions also bias the data set; for example, an imaging survey must dedicate ∼4× as much time to study a planet at 2 AU from its star as to study an Earth twin, so studying planets near the outer edge of the habitable zone will come at a steep cost.

4.1. Survey setup

As our template for a direct imaging survey we use LUVOIR (The LUVOIR Team 2019, hereafter L19), a proposed NASA Flagship-class mission which would use an 8–15 meter segmented mirror and a multi-channel coronagraphic instrument to study terrestrial planets around nearby stars. While the details of the LUVOIR concept have been studied in depth, our results are based only on its high-level characteristics: specifically, we adopt the 15-meter LUVOIR-A mirror diameter, the coronagraphic inner (IWA) and outer (OWA) working angles and noise floor, and the host star catalog used to simulate its detection yield estimates (C. Stark, private correspondence). Our results should be generally applicable to any imaging mission with a similar mirror size and coronagraph.

As our template for a transit survey, we use the Nautilus Space Observatory concept (Apai et al. 2019a,c), which aims to study transiting exoplanets with the equivalent light-collecting area of a single 50-meter diameter telescope. To achieve this light-collecting power, Nautilus would employ an array of large telescopes with ultralight diffractive-refractive optical elements (Milster et al. 2020); the launch of a single telescope of up to 8.5 m diameter has recently been proposed as a NASA Probe-class mission (Apai et al. 2019b). To generate the potential list of transiting planets, we simulate systems to a distance of 150 parsecs, as our simulated surveys tend not to observe targets beyond this distance even when they are available.

Our analyses are based on a 15-meter mirror diameter imaging survey and a 50-meter diameter (equivalent area) transit survey because, among all concepts currently under consideration by the community, these are the ones purporting to offer the largest EEC sample sizes for their respective techniques. It should be noted that a 15-meter imaging survey would also be capable of characterizing nearby transiting planets as a secondary science goal, but we do not model any dual-mode surveys here.

4.2. Which planets can be detected?

After simulating a catalog of nearby planetary systems, we discard any planets which cannot be detected by a given mission architecture. In transit mode, we exclude all non-transiting planets. In imaging mode, we exclude all planets whose maximum angular separation is less than the IWA, whose average angular separation is greater than the OWA, or whose planet-to-star contrast ratio (ζ) is below the instrument noise floor.

The remaining planets can, in principle, be detected by the survey, but actually detecting most of them will require preliminary observations, either using the same telescope architecture or a precursor survey. A dedicated imaging mission would likely be able to detect all of the EECs which it is capable of characterizing during preliminary observations (Stark et al. 2019), but the vast majority of transiting planets within the nearest ∼… […] Kepler. The cost and complexity of such a mission, though considerable, would likely be much less than that of a subsequent characterization effort requiring orders of magnitude greater light-collecting area.

4.3. Which planets can be characterized?

In-depth spectroscopic characterization is time-consuming, so the number of targets which can be characterized is a function of the total time budget allotted to the characterization effort (t_total). Note that t_total is not necessarily the same as the total survey lifetime (which might be, e.g., 5–10 yr). To determine which planets can be observed within t_total, we first determine the amount of time required to characterize each planet, including overheads, and prioritize targets based on both their required observing time and their relative importance to the survey's goals.
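The imaging-mode detection cuts of Section 4.2 and the time-budget selection described above can be sketched together in Python. Everything here is a hedged illustration rather than the Bioverse implementation: angular separations use the small-angle rule θ[arcsec] = a[AU]/d[pc]; the "average" separation is taken to be the time-averaged projected separation of a circular orbit at inclination i, which is one plausible reading of the text; any IWA, OWA, or noise-floor values passed in are placeholders, not LUVOIR-A specifications; and ordering targets by t_i/w_i with a hypothetical priority weight w_i is only one simple way to combine required time with relative importance.

```python
import math
from dataclasses import dataclass


def mean_projected_separation(a_au, inc_rad, n=720):
    """Time-averaged projected separation of a circular orbit (midpoint rule over phase)."""
    total = 0.0
    for k in range(n):
        phase = 2 * math.pi * (k + 0.5) / n
        total += math.hypot(math.cos(phase), math.sin(phase) * math.cos(inc_rad))
    return a_au * total / n


def detectable_imaging(a_au, d_pc, inc_rad, zeta, iwa_as, owa_as, noise_floor):
    """Imaging-mode cuts: max separation >= IWA, mean separation <= OWA, contrast >= floor."""
    max_sep = a_au / d_pc                                   # arcsec, small-angle approximation
    mean_sep = mean_projected_separation(a_au, inc_rad) / d_pc
    return max_sep >= iwa_as and mean_sep <= owa_as and zeta >= noise_floor


@dataclass
class Target:
    name: str
    t_required: float    # exposure time plus overheads (hypothetical units: days)
    weight: float = 1.0  # hypothetical priority weight (higher = more important)


def select_targets(targets, t_total):
    """Greedy selection: lowest t_required / weight first, while the budget allows."""
    chosen, used = [], 0.0
    for t in sorted(targets, key=lambda x: x.t_required / x.weight):
        if used + t.t_required <= t_total:
            chosen.append(t.name)
            used += t.t_required
    return chosen, used


if __name__ == "__main__":
    # Placeholder instrument values for illustration only.
    print(detectable_imaging(1.0, 10.0, 0.0, 1e-10, iwa_as=0.05, owa_as=1.0, noise_floor=1e-11))
    print(select_targets([Target("A", 1.0), Target("B", 3.0), Target("C", 5.0)], t_total=5.0))
```

With equal weights this reduces to "observe in order of required time" as listed in Table 1; raising a target's weight moves it earlier in the queue at the expense of cheaper but less important targets.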

To determine which planets can be characterizedwithin the time budget t total , we ﬁrst determine theamount of exposure time required to spectroscopicallycharacterize a reference planet whose host star proper-ties reﬂect the typical target for each survey mode. Forboth observing modes, the reference planet has exactlythe same bulk parameters and receives the same incidentﬂux as modern Earth. For direct imaging observations,its star is a nearby solar-type star ( T ∗ , ref = 5777 K, R ∗ , ref = R (cid:12) , d ref = 10 pc) while for transit observationsit is a more distant early M dwarf ( T ∗ , ref = 3300 K, R ∗ , ref = 0 . R (cid:12) , d ref = 50 pc). In the examples to fol-low, we only consider the detection or non-detection ofan absorption feature associated with a species, ratherthan constraints on the abundance. We use two general circulation models (GCMs) pub-lished by Komacek & Abbot (2019) to quantify thethree-dimensional atmospheric abundance proﬁles of ourreference planets. Both models are water-covered plan-ets around a Sun-like star (imaging mode) or early Mdwarf (transit mode) with the same size, mass, and inso-lation as Earth and 1 bar N /H O atmospheres. Thesemodels include a treatment of ice and liquid cloud cover,which is an important factor aﬀecting the detectabil-ity of molecular features through imaging and transitobservations. Notably, because the M dwarf planet istidally-locked, convection on its dayside is more eﬃcient,leading to strong, high-altitude cloud cover and greaterstratospheric H O abundance (T. Komacek, private cor-respondence). Finally, to enable the analysis in Section7, we inject Earth’s modern oxygen abundance (pO =20.7%) into the model atmospheres, reducing the back-ground N pressure accordingly.To simulate spectra for both models, we use the Plan-etary Spectrum Generator (hereafter PSG, Villanuevaet al. 2018), which accepts three-dimensional atmo-spheric proﬁles through its GlobES module . 
The PSG configuration files for this study can be found in the code repository. The directly imaged planet is observed at quadrature phase, while the transiting planet is observed with the nightside facing the observer. Both simulated spectra are shown in Figure 3 for atmospheres with and without cloud cover. Next, we use PSG to compute noise estimates for each survey architecture as a function of on-target exposure time. In imaging mode, we use the PSG template for the 15-meter LUVOIR-A observatory, including the projected throughput, spectral resolution, raw contrast, and detector noise for the visible and near-infrared imagers, as well as 4.5 zodis of background dust. In transit mode, we simulate observations for a 50-meter diameter aperture with 60% total throughput, ignoring detector and instrument noise. To determine whether a molecular feature can be detected, we simulate spectra with and without the target molecule and compute the detection signal-to-noise ratio (SNR) across the absorption band in a manner similar to Lustig-Yaeger et al. (2019; Equations 4–6 therein):

$$\mathrm{SNR} = \sqrt{\sum_i \left(\frac{\Delta y_i}{\sigma_{y_i}}\right)^2} \tag{4}$$

where $\Delta y_i$ is the difference between the two spectra in each spectral bin and $\sigma_{y_i}$ is the measurement uncertainty. Finally, we compute the exposure time required to achieve an SNR = 5 detection of the feature for the reference planet ($t_\mathrm{ref}$) in each survey mode, then scale this value to determine the exposure time required for each individual planet detected by the survey.

4.3.2. Exposure time scaling

We define $t_i$ as the amount of exposure time required to spectroscopically characterize a planet at wavelength $\lambda_\mathrm{eff}$. If we assume that $t_i$ depends primarily on the number of photons collected, then we can estimate it by scaling $t_\mathrm{ref}$ (as determined using PSG) as follows:

$$\frac{t_i}{t_\mathrm{ref}} = f_i \left(\frac{d_i}{d_\mathrm{ref}}\right)^{2} \left(\frac{R_{*,i}}{R_{*,\mathrm{ref}}}\right)^{-2} \left(\frac{B_{*,i}(\lambda_\mathrm{eff}, T_{*,i})}{B_\odot(\lambda_\mathrm{eff}, T_{*,\mathrm{ref}})}\right)^{-1} \tag{5}$$

where $f_i$ summarizes the factors affecting the signal strength unique to each observing mode. In imaging mode, the exposure time is inversely proportional to the planet-to-star contrast ratio $\zeta$ (assuming observations at quadrature phase):

$$f^\mathrm{im}_i = \left(\frac{\zeta_i}{\zeta_\oplus}\right)^{-1} \tag{6}$$

In transit mode, the transit depth signal induced by the atmosphere is (to first order) $\Delta\delta \sim (R_p/R_*)^2 (h/R_p)$ (Winn 2010), where $h$ is the atmospheric scale height, and the required exposure time is inversely proportional to its square:

$$f^\mathrm{tr}_i = \left(\frac{h_i}{h_\oplus}\right)^{-2} \left(\frac{R_{p,i}}{R_\oplus}\right)^{-2} \left(\frac{R_{*,i}}{R_{*,\mathrm{ref}}}\right)^{4} \tag{7}$$

We round $t_i$ up to the next integer multiple of the planet's transit duration, because a transit survey would likely observe complete transits to measure the baseline. Planets are considered to be invalid targets if the total number of required transit observations is greater than either the number of available transits within 10 years or $10^{\ldots}$.

These scaling relations are meant to capture the main factors affecting the relative exposure time required for each target, so as to provide an approximate mapping between the total amount of time dedicated to a survey and the number and distribution of targets it can observe. Ultimately, the primary metric affecting a survey's statistical power is usually the number of EECs characterized, and we translate $t_\mathrm{total}$ into the number of characterized EECs so the reader can interpret our results as a function of sample size.

4.3.3. Overheads

In imaging mode, following L19, we increment each planet's required exposure time by 2 hr to account for slew overheads and overheads associated with wavefront control. These overheads end up being relatively insignificant except for the closest targets. In transit mode, we assume 0.5 hr of slew overheads per observation, plus a total overhead equal to the transit duration for baseline observations before and after each transit event.

4.3.4. Target prioritization

Given a limited time budget, it seems reasonable to prioritize observations of planets in order of increasing $t_i$ so as to maximize the number of planets observed. However, prioritizing targets strictly by $t_i$ will lead to a biased sample, especially in the case of transit surveys, which are strongly biased towards the detection of close-in planets. To counteract these biases, we assign a weight $w_i$ to each planet and calculate its priority as follows:

$$p_i = w_i / t_i \tag{8}$$

The specific choice of $w_i$ depends on the hypothesis being tested and is discussed in Sections 6 and 7. To create the final simulated data set, we observe targets in order of decreasing $p_i$ until some predetermined time limit $t_\mathrm{total}$ is reached.

4.4. Comparison between survey modes

In the following sections, we use Bioverse to evaluate the statistical potential of direct imaging and transit spectroscopy surveys, but we avoid direct comparisons of their results for the following reasons. First, the technical requirements for and limitations of a direct imaging biosignature survey have been more thoroughly explored due to investments in the LUVOIR and HabEx mission concepts. As a result, our results for the transit survey are likely more optimistic. Second, we do not wish to imply that a survey's statistical power is the only or most important dimension for comparison, as each architecture enables unique capabilities which the other does not.

For the topics discussed here, the primary difference between the two surveys is the number of EECs each can characterize. For the 15-meter imaging survey, this number is 15–20, and is volume- rather than time-limited. This estimate is consistent with that of L19 when adjusted for our updated value of $\eta_\oplus$ (≈ …) … absorption or ∼200 for O₃ absorption given $t_\mathrm{total} = 2$ yr.

5. HYPOTHESIS TESTING

The third component of Bioverse assesses the information content contained within the simulated data sets from the previous section. This assessment focuses on two primary questions: first, how likely is it that the survey would be able to detect the effects of a statistical trend injected into the simulated planet population (Section 3.8)? Second, how precisely could the survey constrain the parameters of that trend? To answer these questions, we rely on a standard Bayesian hypothesis testing approach.

[Figure 3: model spectra vs. wavelength (µm); left panel: contrast ratio (ppt), right panel: transit depth (ppm); curves without and with clouds.]

Figure 3. Model spectra for the reference planet in imaging (left; contrast ratio in parts-per-trillion) and transit (right; transit depth in parts-per-million) mode. The spectra are based on GCM models published by Komacek & Abbot (2019), who investigate ice and liquid cloud cover on planets as a function of spectral type and tidal locking. We include the effects of clouds to determine our exposure time estimates (black), while clear-sky spectra are shown for reference (gray). Targeted absorption bands include H₂O (green) and O₂ or O₃ (blue).

5.1. Null and alternative hypotheses

Each simulated data set can be thought of as a set of independent variables $x$ and dependent variables $y$. For this section (and the examples to follow), we consider $x$ and $y$ to each represent measurements of a single variable, but this hypothesis testing framework can extend to multivariate measurements as well. The hypothesis $h(\vec\theta, x)$ describes the relationship between $x$ and $y$ in terms of a set of parameters $\vec\theta$. The simplest hypothesis is the null hypothesis, in which there is no relationship:

$$h_\mathrm{null}(\theta, x) = \theta$$

The null hypothesis is compared to an alternative hypothesis, which proposes a relationship between $x$ and $y$, using a Bayesian parameter estimation and hypothesis testing approach.

5.2. Likelihood function and prior distribution

Given a hypothesis $h$, the likelihood function takes one of two forms. In the case where $y$ is binary (e.g., the detection or non-detection of an atmospheric species), $h$ is the probability that $y = 1$, and the likelihood function is:

$$\mathcal{L}(y \mid \vec\theta) = \prod_i^N \left[ y_i\, h(\vec\theta, x_i) + (1 - y_i)\left(1 - h(\vec\theta, x_i)\right) \right] \tag{9}$$

Alternatively, if $y$ is a continuous variable measured with normal uncertainty $\sigma_y$, then $h$ predicts the expectation value of $y$, and the likelihood is described by the normal distribution:

$$\mathcal{L}(y \mid \vec\theta) = \prod_i^N \frac{1}{\sqrt{2\pi\sigma_{y,i}^2}} \exp\left(-\frac{\left(y_i - h(\vec\theta, x_i)\right)^2}{2\sigma_{y,i}^2}\right) \tag{10}$$

Note that in both example applications of Bioverse to follow, we consider a detection or non-detection as our dependent variable and use the likelihood function defined by Equation 9. (For a review of Bayesian parameter estimation and model selection in astronomy, we refer the reader to Trotta 2008.)

The parameter prior distribution is denoted by $\Pi(\vec\theta)$. Given limited prior information about the true values of parameters $\vec\theta$, we generally assume uniform or log-uniform distributions spanning the range of plausible values. Further justification for our choice of prior distributions can be found in the examples to follow.

5.3. Parameter estimation and Bayesian evidence

For each simulated data set, we sample the posterior distribution of the hypothesis parameters $\vec\theta$ using a Markov Chain Monte Carlo (MCMC) algorithm, implemented by emcee (Foreman-Mackey et al. 2013). This sampling yields measurement constraints on the parameters $\vec\theta$. We also use a nested sampling algorithm (Skilling 2006), implemented by dynesty (Speagle 2020), to estimate the Bayesian evidence:

$$Z = P(y \mid h) = \int \mathcal{L}(y \mid \vec\theta)\, \Pi(\vec\theta)\, d\vec\theta \tag{11}$$

To test a hypothesis, we can compare its evidence to that of the null hypothesis, finding evidence to reject the null hypothesis when:

$$\Delta\ln(Z) = \ln(Z) - \ln(Z_\mathrm{null}) > \ldots \tag{12}$$

This threshold is comparable to the $p < 0.05$ threshold for hypothesis testing with other frequentist tests (e.g., Student's t-test).

It should be noted that dynesty also samples the parameter posterior distributions, so why use emcee to do this separately? In short, nested sampling is optimized to measure $Z$, while MCMC is optimized to determine the posterior distribution. While dynesty can quickly compute the Bayesian evidence with sufficient accuracy ($\sigma_{\ln(Z)} \lesssim \ldots$), it samples the posterior distribution less efficiently than emcee. Since we repeat each simulated survey > …,000 times, we find this mixed approach to be necessary to achieve both accurate evidence and parameter estimates on a reasonable timescale.

5.4. Statistical power

Whether or not an individual simulated survey is able to reject the null hypothesis can often depend on stochastic error; one simulated survey may be able to reject the null hypothesis where another cannot. To summarize our results, we re-run each simulated survey many times under the same set of assumptions and calculate the fraction of survey realizations which achieve a positive result. This metric is also known as the statistical power, and it allows us to assess a survey's statistical potential as a function of both survey parameters (such as total survey duration) and as-yet unknown astrophysical parameters (such as the frequency of habitable planets).

This concludes the description of the three primary components of Bioverse. In the following two sections, we will demonstrate applications of Bioverse to its stated goal of assessing the statistical power of next-generation biosignature surveys.

6. EXAMPLE 1: EMPIRICAL DETERMINATION OF THE HABITABLE ZONE BOUNDARIES

Models of the habitable zone predict that planets with oceans can only exist within a finite, and perhaps very narrow, range of insolations. An associated prediction is that terrestrial planets in the habitable zone with water-rich atmospheres are the most likely candidates for ocean-bearing worlds. These models will play an important role in the design and target prioritization of next-generation observations; for example, preliminary search strategies for future biosignature surveys often dedicate intensive follow-up to water-bearing habitable zone planets (The LUVOIR Team 2019), while delegating non-habitable zone planets to a lower priority. However, models for the habitable zone have not been tested outside of the solar system, and estimates of its location and width have varied by a factor of several over the past few decades.

Could future observatories use data acquired from preliminary observations to test the "habitable zone hypothesis", i.e., the hypothesis that planets with water vapor should be more abundant within a narrow and finite range of orbital separations? Further, could these data be used to empirically determine the location and width of the habitable zone? The practical benefit of testing the habitable zone hypothesis would be to make the survey's target prioritization strategy more efficient and to better determine which of its targeted planets are most likely to be habitable. By measuring its boundaries, observers could test the predictions of various habitable zone models, and therefore the physical mechanisms on which they rely. Finally, empirical constraints on the width of the habitable zone will be important for determining the occurrence rate of habitable worlds. Here, we use Bioverse to explore how a survey of atmospheric water vapor could be used to test the habitable zone hypothesis.

6.1. Model predictions

Climate models predict a steep decline in the water vapor abundance of terrestrial planets outside of the habitable zone. Within the inner edge, an Earth-like planet may undergo a runaway greenhouse as on Venus, leaving behind only a tenuous amount of atmospheric water vapor. Beyond the outer edge, the oceans may freeze, and water vapor would not accumulate except in very low pressure atmospheres which permit its sublimation.

In Bioverse, we implement these predictions as follows. We assume that a fraction $f^{\mathrm{H_2O}}_\mathrm{EEC}$ of EECs are in fact habitable, meaning they bear surface water and atmospheric water vapor. We also allow a fraction $f^{\mathrm{H_2O}}_\text{non-EEC}$ of non-EECs to have atmospheric water vapor, serving as a source of noise and "false positives" for habitable planets. Then the fraction of planets with atmospheric water vapor can be described as:

$$f_{\mathrm{H_2O}} = \begin{cases} f^{\mathrm{H_2O}}_\mathrm{EEC} & \text{if } a_\mathrm{inner} < a < a_\mathrm{outer} \text{ and } 0.8\,S^{0.25} < R < 1.4\,R_\oplus \\ f^{\mathrm{H_2O}}_\text{non-EEC} & \text{otherwise} \end{cases} \tag{13}$$

where the habitable zone boundaries and planet size limits are those discussed in Sections 3.3 and 3.4.

6.2. Simulated survey

6.2.1. Measurements

The imaging and transit surveys perform a set of measurements outlined in Table 2 to determine the size and orbital separation of each potential target. In imaging mode, the planet's size cannot be determined without prior knowledge of the geometric albedo, so an estimated size ($R_\mathrm{est}$) which assumes Earth-like reflectivity is used as a proxy. In both modes, the orbital separation is converted to the "effective" semi-major axis ($a_\mathrm{eff}$) at which the planet would receive the same insolation around a Sun-like star.

These preliminary measurements are used to prioritize targets as discussed in the following section. Those targets of high enough priority are spectroscopically characterized to determine whether their atmospheres contain H₂O. The final output of each simulated survey is a data set consisting of ($a_\mathrm{eff}$, H₂O), where H₂O ∈ {0, 1} reflects the absence or presence of water absorption features in the planet's spectrum. One example of a simulated data set is shown in Figure 4.

6.2.2. Target prioritization

To test the habitable zone hypothesis, we must observe planets spanning a broad range of semi-major axes, but prioritizing targets solely based on required exposure time will bias observations towards close-in planets. Furthermore, planets much smaller or larger than Earth are not likely to be habitable regardless of insolation, and therefore serve as a source of noise. To counter these effects, we weight each target according to its size and orbital separation following Figure 5. We tuned this prioritization by trial and error to achieve the following goals:

1. Prioritize observations of more probable Earth analogs (planets receiving 50–150% of Earth's incident flux).
2. Balance observations of widely separated planets versus close-in planets.
3. Minimize observations of non-Earth-sized planets.

In transit mode, we additionally weight each target by $(a/R_*)$ to negate the bias due to close-in planets being more likely to transit. The resulting distribution of observed planets is also shown in Figure 5.

6.2.3. Time budget

Following the procedure in Section 4.3.1, we use PSG to determine the exposure time required for a 5σ detection of water vapor absorption through its near-infrared absorption bands. In transit mode, we combine the SNR from the 1.4 and 1.9 µm features. In imaging mode, we only target the 1.4 µm band, as LUVOIR will be unable to observe the full near-infrared spectrum simultaneously, and the 1.9 µm band is harder to observe, primarily due to lower stellar flux.

In imaging mode, we find $t_\mathrm{ref} = 0.9$ hr, while in transit mode we find $t_\mathrm{ref} = 181$ hr of in-transit exposure time; the details of these calculations are shown in Table 4. Next, we scale $t_\mathrm{ref}$ according to Equations 5 through 7 to determine the amount of exposure time required to characterize each planet, then add observing overheads. We weight the targets according to Figure 5, calculate each target's priority following Equation 8, then finally observe targets in order of decreasing priority until the total time budget $t_\mathrm{total}$ is reached.

The average number of EECs observed by each survey is displayed in Figures 6a and 7a as a function of either $\eta_\oplus$ or $t_\mathrm{total}$. While we use the number of EECs observed as our primary metric of sample size, note that most observed targets are non-habitable. Since the imaging survey yield quickly becomes volume-limited, we investigate the impact of varying $\eta_\oplus$ for a fixed time budget $t_\mathrm{total} = 120$ d (which is sufficient to characterize >90% of detectable EECs). For the transit survey, we fix $\eta_\oplus = 7.5\%$ and investigate the impact of varying $t_\mathrm{total}$.

6.3. Habitable zone hypothesis

Now, let us approach the simulated data from the view of an observer who has no prior knowledge of Equation 13, using the Bayesian hypothesis testing framework outlined in Section 5. The habitable zone hypothesis states that planets within the habitable zone are more likely to have water vapor than those outside of it:

$$h_\mathrm{HZ}(a_\mathrm{eff}) = \begin{cases} f_\mathrm{HZ} & \text{if } a_\mathrm{inner} < a_\mathrm{eff} < a_\mathrm{inner} + \Delta a \\ f_\mathrm{HZ}\,(f_\text{non-HZ}/f_\mathrm{HZ}) & \text{otherwise} \end{cases} \tag{14}$$

[Figure 4: simulated imaging data set; H₂O detected (H₂O) or not detected (no H₂O) vs. $a_\mathrm{eff}$ (AU).]

Figure 4. An example of a simulated direct imaging data set for Section 6. Planets are probed for the presence of atmospheric water vapor across a broad range of orbital separations. We assume the habitable zone (gray) to be marked by an abundance of water-rich atmospheres. The separation $a_\mathrm{eff} = a (L_*/L_\odot)^{-1/2}$ is the solar-equivalent semi-major axis.

Table 2.

Measurements made by the simulated surveys in Examples 1 and 2. Parameters marked by † are calculated from other measured values.

Parameter | Measurement uncertainty | Description / notes
--- | --- | ---
Example 1, imaging survey | |
$L_*$ | negligible | Host star luminosity
$\zeta$ | 15% | Planet-to-star contrast
$a$ | 10% | Semi-major axis
$a_\mathrm{eff}$ † | 10% | Solar-equivalent semi-major axis
$R_\mathrm{est}$ † | 10% | Estimated radius assuming Earth-like reflectivity
H₂O | Detected / not detected | Presence of 1.4 µm H₂O absorption
Example 1, transit survey | |
$M_*$, $R_*$ | 5% | Host star mass and radius
$P$ | negligible | Orbital period
$\delta$ | negligible | Baseline transit depth
$a_\mathrm{eff}$ † | … | Solar-equivalent semi-major axis
$R$ † | 5% | Planet radius
H₂O | Detected / not detected | Presence of 1.4 µm and 1.9 µm H₂O absorption
Example 2, imaging survey | |
$t_*$ | 10% | Age (as measured through asteroseismology)
O₂ | Detected / not detected | Presence of 0.7 µm O₂ absorption
Example 2, transit survey | |
$t_*$ | 30% | Age (model-based estimate)
O₃ | Detected / not detected | Presence of 0.6 µm O₃ absorption
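The measurements in Table 2 are linked by the solar-equivalent separation $a_\mathrm{eff} = a (L_*/L_\odot)^{-1/2}$ used in Figure 4. The sketch below computes it and draws noisy measurements with the fractional uncertainties listed in the table. It assumes Gaussian errors whose 1σ width equals the tabulated fraction; that error model and the function names `effective_semimajor_axis` and `measure` are illustrative assumptions, not the Bioverse API.

```python
import numpy as np

# Hypothetical helpers for illustration; not part of the Bioverse API.

def effective_semimajor_axis(a_au, l_star_lsun):
    """Solar-equivalent semi-major axis a_eff = a * (L*/Lsun)**(-1/2), in AU.

    A planet at a_eff around the Sun receives the same insolation as the
    planet at separation a around a star of luminosity L*.
    """
    a_au = np.asarray(a_au, dtype=float)
    l_star_lsun = np.asarray(l_star_lsun, dtype=float)
    return a_au / np.sqrt(l_star_lsun)


def measure(true_values, frac_uncertainty, rng):
    """Simulate measurements with Gaussian errors whose 1-sigma width is a
    fixed fraction of the true value (an assumed error model)."""
    true_values = np.asarray(true_values, dtype=float)
    sigma = frac_uncertainty * np.abs(true_values)
    return rng.normal(loc=true_values, scale=sigma)


rng = np.random.default_rng(42)
a_true = np.array([0.5, 1.0, 2.0])   # semi-major axes (AU)
l_star = np.array([0.25, 1.0, 4.0])  # host luminosities (Lsun), error negligible
a_obs = measure(a_true, 0.10, rng)   # 10% error on a (imaging survey row)
a_eff_obs = effective_semimajor_axis(a_obs, l_star)
```

All three example planets have a true $a_\mathrm{eff}$ of 1 AU. Because $a_\mathrm{eff}$ is linear in $a$, a 10% error on $a$ with negligible error on $L_*$ carries over as a 10% error on $a_\mathrm{eff}$, consistent with the imaging-survey rows of Table 2.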

[Figure 5: Example 1 target prioritization and distribution. Imaging survey (top) and transit survey (bottom); left panels: target priority as a function of planet size ($R_\oplus$) and $a_\mathrm{eff}$ (AU); right panels: resulting distribution of targets.]

Figure 5. Summary of target prioritization for the simulated imaging (top) and transit (bottom) surveys in Section 6. The left panel shows the relative weight assigned to each target as a function of size and orbital separation ($w_i$ in Equation 8). The right panel shows the resulting relative distribution of targets which can be probed for the presence of water vapor within the survey duration. In the case of the imaging survey, the planet size cannot be directly measured, so the "estimated" radius (assuming Earth-like reflectivity) is used as a proxy. In the case of the transit survey, an additional weight is applied to counteract the $R_*/a$ transit probability (not shown above).

Table 3.

Parameter prior distributions for Equations 14 and 16.

Parameter | Description | Prior limits (log-uniform distribution)
--- | --- | ---
Example 1 | |
$a_\mathrm{inner}$ | Inner edge of the habitable zone | 0.1–2.0 AU
$\Delta a$ | Width of the habitable zone | 0.01–10 AU
$f_\mathrm{HZ}$ | Fraction of habitable zone planets with H₂O | 0.001–1
$f_\text{non-HZ}/f_\mathrm{HZ}$ | Fraction of non-habitable zone planets with H₂O (relative to $f_\mathrm{HZ}$) | 0.001–1
Example 2 | |
$f_\mathrm{life}$ | Fraction of EECs with life | 0.001–1
$t_{1/2}$ | Oxygenation timescale of inhabited planets | 0.1–100 Gyr
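The habitable zone hypothesis (Equation 14), its binary-detection likelihood (Equation 9), and the log-uniform priors of Table 3 can be combined into the two ingredients a nested sampler works with: a log-likelihood and a transform from the unit hypercube to parameter space. The sketch below is a minimal illustration under those assumptions; the function and variable names are hypothetical and are not Bioverse's own implementation.

```python
import numpy as np

# Hypothetical sketch; names are illustrative, not the Bioverse API.
# Log-uniform prior limits for Example 1, taken from Table 3.
PRIOR_LIMITS = {
    "a_inner": (0.1, 2.0),    # inner edge of the HZ (AU)
    "delta_a": (0.01, 10.0),  # width of the HZ (AU)
    "f_hz": (0.001, 1.0),     # fraction of HZ planets with H2O
    "ratio": (0.001, 1.0),    # f_non-HZ / f_HZ
}
_LOG_LO = np.log10([lo for lo, _ in PRIOR_LIMITS.values()])
_LOG_HI = np.log10([hi for _, hi in PRIOR_LIMITS.values()])


def prior_transform(u):
    """Map a point u in the unit hypercube to log-uniform parameters."""
    return 10.0 ** (_LOG_LO + np.asarray(u, dtype=float) * (_LOG_HI - _LOG_LO))


def h_hz(theta, a_eff):
    """Equation 14: probability that a planet at a_eff shows H2O."""
    a_inner, delta_a, f_hz, ratio = theta
    a_eff = np.asarray(a_eff, dtype=float)
    inside = (a_eff > a_inner) & (a_eff < a_inner + delta_a)
    return np.where(inside, f_hz, f_hz * ratio)


def log_likelihood(theta, a_eff, y):
    """Logarithm of Equation 9 for binary outcomes y (1 = H2O detected)."""
    p = np.clip(h_hz(theta, a_eff), 1e-12, 1.0 - 1e-12)  # avoid log(0)
    y = np.asarray(y)
    return float(np.sum(np.where(y == 1, np.log(p), np.log1p(-p))))
```

Placing the prior on the ratio $f_\text{non-HZ}/f_\mathrm{HZ} \le 1$ rather than on $f_\text{non-HZ}$ itself means every sample automatically satisfies $f_\text{non-HZ} \le f_\mathrm{HZ}$, which is the motivation given in the text for this parameterization. Nested samplers such as dynesty accept a log-likelihood and a prior transform of this general form; the data arguments `a_eff` and `y` would need to be bound, for example with `functools.partial`.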

Table 4.

The predicted signal strengths of H₂O (Example 1) and O₂ or O₃ (Example 2) absorption for a representative target of each survey mode, expressed as the peak amplitude of the change in planet-to-star contrast ratio (in parts-per-trillion) or transit depth (in parts-per-million) within the absorption band. The exposure time required for a 5σ detection is determined using PSG, and scaled for each individual target according to Equation 5. In transit mode, the signals from two bands are combined to achieve the detection of H₂O. In imaging mode, we select the feature which requires the least exposure time to detect.

Survey mode | Feature | Wavelength | Signal strength (without clouds) | Signal strength (with clouds) | Time required (hr, with clouds)
--- | --- | --- | --- | --- | ---
Imaging | H₂O | 1.4 µm | 90 ppt | 55 ppt | 0.9
Imaging | O₂ | 0.7 µm | 90 ppt | 70 ppt | 2.6
Transit | H₂O | 1.4 µm | 3.5 ppm | 0.5 ppm | 181 (1.4 and 1.9 µm combined)
Transit | H₂O | 1.9 µm | 5 ppm | 0.7 ppm | (combined with 1.4 µm)
Transit | O₃ | 0.6 µm | 4 ppm | 2 ppm | 74

This is a four-parameter model with $\vec\theta = [a_\mathrm{inner}, \Delta a, f_\mathrm{HZ}, (f_\text{non-HZ}/f_\mathrm{HZ})]$. The choice of parameters was driven by two factors: first, the width of the habitable zone ($\Delta a$) is relevant for testing "rare Earth" models in which the habitable zone is very narrow. Second, we can use simple log-uniform prior distributions for these parameters without having to filter out parameter combinations which violate the assumptions of the habitable zone hypothesis (e.g., $f_\text{non-HZ} > f_\mathrm{HZ}$).

6.4. Prior assumptions

K14's model for the habitable zone, which we implement in the simulated planet population, assumes a carbon-silicate feedback cycle which enhances CO₂ concentrations for planets further from their host stars, and spans 0.95–1.67 AU for the Sun. While this estimate has strong heritage (Kasting et al. 1993; Kopparapu et al. 2013), it has also been preceded and succeeded by more conservative or generous estimates, which we use to set the prior distribution of values considered for $a_\mathrm{inner}$ and $\Delta a$.

Estimates of the inner edge range as far inward as 0.38 AU (for highly reflective desert worlds with a minimal greenhouse effect; Zsom et al. 2013), and we allow that the inner edge could be as far out as 2 AU, in which case Earth would be an unusually cool outlier. Estimates of the habitable zone width have varied as well; a classic estimate by Hart (1979) suggests a very narrow habitable zone ($\Delta a < 0.\ldots$ AU) … $a_\mathrm{inner}$ and $\Delta a$, we assume broad prior distributions for both, shown in Table 3.

6.5. Results

We repeat the simulated survey and Bayesian analysis > …,000 times over a grid of values for the astrophysical parameters in Equation 13 for both survey architectures. With each simulated survey, we use dynesty to calculate the Bayesian evidence in favor of the habitable zone hypothesis, and emcee to sample the posterior distributions of $a_\mathrm{inner}$ and $\Delta a$. The results are summarized by Figures 6 and 7 for the simulated imaging and transit surveys, respectively.

6.5.1. Imaging survey

An ambitious direct imaging survey with a 15-meter telescope could confidently detect the habitable zone with a 3-month observing campaign, provided most EECs are habitable. If habitable planets are less common, however, then more EECs must be observed. The EEC yield of an imaging survey is typically volume-limited, so higher values of $\eta_\oplus$ would be required to test this hypothesis for more pessimistic astrophysical parameters. In the best-case scenario ($\eta_\oplus \approx$ … $\eta_\oplus$ is likely too optimistic.

If ∼80% of EECs are habitable, the imaging survey would be able to measure the location of the habitable zone with sufficient accuracy to exclude some of the more extreme estimates of its boundaries with reasonable confidence. In particular, it would be able to place a confident lower bound on $\Delta a$, rejecting some "rare Earth" models which predict a very narrow habitable zone (e.g., Hart 1979).

Finally, it should be noted that imaging surveys will have access to planet brightness and color information which could be incorporated into this analysis; for example, albedo and photometric color may vary predictably across the habitable zone (Checlair et al. 2019). Hypotheses which include this information could be tested with better statistical power and parameter constraints than the one examined here.

6.5.2. Transit survey

The transit survey can confidently detect the habitable zone even in the case where most EECs are not habitable ($f^{\mathrm{H_2O}}_\mathrm{EEC} \approx$ … ± … ∼ … absorption features. Both of these effects increase the time required to characterize cold planets, making them low-priority targets.

6.6. Discussion

6.6.1. Impact of clouds

Clouds will have a major impact on the transit survey's ability to test the habitable zone hypothesis, as they dampen the absorption signal due to tropospheric water vapor and therefore increase the number of transit observations required to detect it. As shown in Figure 7a, this means that a much smaller number of targets can be observed within a fixed time budget, and many of the most distant targets become infeasible to characterize, as doing so would require combining decades' worth of transit observations. A possible mitigating strategy would be to expand the observatory's light-collecting area. The Nautilus Space Observatory, on which we base our transit survey results (Apai et al. 2019), would consist of 35 identically manufactured unit telescopes. As such, the cost would scale linearly with light-collecting area, and doubling the number of telescopes would nearly halve the number of transit observations required to characterize each planet.

Our cloud assumptions are based on the GCM models of Komacek & Abbot (2019), who show that tidally locked planets around M dwarfs have much higher dayside cloud covering fractions than Earth-like planets. If this holds true, it will likely prevent the characterization of such planets through transit spectroscopy by JWST (Fauchez et al. 2019; Komacek et al. 2020; Suissa et al. 2020; Pidhorodetska et al. 2020) and possibly even by larger observatories. In the pessimistic case, even a 50-meter equivalent area transit survey may be unable to detect atmospheric water vapor for all but a handful of nearby exo-Earths orbiting M dwarfs, so the survey must target more distant K and G dwarfs instead. This will come at the cost of sample size, as we estimate that the increased average distance, less frequent transits, and lower transit depths for habitable zone planets around these stars …

[Figure 6: Example 1 results for the 15-meter imaging survey. (a) How many exo-Earth candidates are probed for H₂O? Number of characterized EECs. (b) Can this survey test the habitable zone hypothesis? Statistical power (%) vs. fraction of exo-Earth candidates with H₂O and fraction of non-habitable planets with H₂O. (c) Required EEC yield over the same axes. (d) Can this survey determine the location of the habitable zone? P(in habitable zone) vs. $a_\mathrm{eff}$ (AU) for six random realizations, optimistic and pessimistic cases.]

Figure 6.

Results for the imaging survey in Section 6. (a) The number of EECs observed versus $\eta_\oplus$ (for G stars), assuming $t_\mathrm{total} = 120$ d. As our baseline case, we set $\eta_\oplus = 7.5\%$ … $\eta_\oplus$. (d) The posterior probability that a planet with effective separation $a_\mathrm{eff}$ is in the habitable zone, as estimated by six random realizations of the survey under an optimistic case (80% of EECs are habitable, left) and a pessimistic case (20% of EECs are habitable, right). The true habitable zone is highlighted in green, and in both cases 1% of non-habitable planets have H₂O.
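Section 5.4 defines statistical power as the fraction of survey realizations that reject the null hypothesis, and panel (c) of Figure 7 reports the minimum sample size needed to reach 80% power. A minimal sketch of both bookkeeping steps follows, assuming an array of $\Delta\ln(Z)$ values from repeated simulations and a grid of power estimates versus EEC yield. The evidence threshold of Equation 12 is left as a parameter because its value is not reproduced here; the binomial standard error is an added convenience not stated in the text, and all names are hypothetical.

```python
import numpy as np

# Hypothetical helpers for illustration; not part of the Bioverse API.

def statistical_power(delta_ln_z, threshold):
    """Fraction of realizations with Delta ln Z above the evidence threshold
    (Eq. 12), plus the binomial standard error of that Monte Carlo fraction."""
    delta_ln_z = np.asarray(delta_ln_z, dtype=float)
    n = delta_ln_z.size
    power = float(np.mean(delta_ln_z > threshold))
    stderr = float(np.sqrt(power * (1.0 - power) / n))
    return power, stderr


def minimum_yield(n_eec, powers, target=0.8):
    """Smallest EEC sample size on a grid whose estimated power reaches the
    target; returns None if no grid point reaches it."""
    n_eec = np.asarray(n_eec)
    powers = np.asarray(powers, dtype=float)
    order = np.argsort(n_eec)
    for n, p in zip(n_eec[order], powers[order]):
        if p >= target:
            return int(n)
    return None
```

For example, `statistical_power([4.1, 0.3, 6.0, 2.9], threshold=3.0)` gives a power of 0.5 with a standard error of 0.25; the threshold of 3.0 here is purely illustrative. The standard error shrinks as $1/\sqrt{N}$, which is one reason to repeat each simulated survey many times.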

[Figure 7: Example 1 results for the 50-meter (equivalent area) transit survey. (a) How many exo-Earth candidates are probed for H₂O? Number of characterized EECs vs. time budget (yr), without clouds, with clouds, and with the SAG13 $\eta_\oplus$. (b) Can this survey test the habitable zone hypothesis? Statistical power (%) vs. fraction of exo-Earth candidates with H₂O and fraction of non-habitable planets with H₂O. (c) Required EEC yield and time budget over the same axes. (d) Can this survey determine the location of the habitable zone? P(in habitable zone) vs. $a_\mathrm{eff}$ (AU) for six simulated surveys, optimistic and pessimistic cases.]

Figure 7.

Results for the transit survey in Section 6. (a) The number of EECs observed versus the observing time budget, assuming $\eta_\oplus = 7.5\%$ for G stars and cloudy atmospheres (solid). 4–10× as many planets could be observed if clouds were neglected (dashed), or 1.5–3× as many with clouds if assuming the higher SAG13 estimate of $\eta_\oplus = 24\%$ (dotted). As our baseline case, we set $t_\mathrm{total} = 2$ yr. (b) The statistical power to test the habitable zone hypothesis as a function of the astrophysical parameters in Equation 13. (c) The minimum number of EECs which must be characterized to achieve 80% statistical power, with the necessary observing time budget $t_\mathrm{total}$. (d) The posterior probability that a planet with effective separation $a_\mathrm{eff}$ is in the habitable zone, as estimated by six random realizations of the survey under an optimistic case (80% of EECs are habitable, left) and a pessimistic case (20% of EECs are habitable, right). The true habitable zone is highlighted in green, and in both cases 1% of non-habitable planets have H₂O.

6.6.2. Effect of non-habitable H₂O-rich atmospheres

Naturally, the habitable zone hypothesis is easier to test if more habitable planets are observed, and the number of EEC characterizations required to test it is approximately inversely proportional to the fraction of EECs which are habitable ($f^{\mathrm{H_2O}}_\mathrm{EEC}$). However, non-habitable planets are far more common than habitable planets, so if even a small fraction ($f^{\mathrm{H_2O}}_\text{non-EEC}$) of these have H₂O, the statistical excess of H₂O in the habitable zone will be muted. In general, we find the statistical power to be unaffected provided that $f^{\mathrm{H_2O}}_\text{non-EEC} \lesssim$ … $f^{\mathrm{H_2O}}_\text{non-EEC} \gtrsim$ … $f^{\mathrm{H_2O}}_\text{non-EEC} > 10\%$ would imply that H₂O-rich non-habitable planets are more common than habitable planets.

Our assumption in Equation 13 is that all EECs with water vapor are habitable, and that the fraction of non-EECs with water vapor is mostly independent of insolation. However, if such "false positives" exist, their abundance is likely a function of insolation. For example, consider a population of non-habitable planets whose surfaces have been desiccated by a runaway greenhouse effect but which still maintain thick, H₂O-rich atmospheres. Such planets should be clustered near the inner edge of the habitable zone (e.g., Turbet et al. 2019), appearing as an extension of the habitable planet population to high insolations rather than as a distinct planet population. Even planets defined as EECs may actually be non-habitable (due to differences in initial volatile content, plate tectonics, outgassing rates, etc.) yet still possess water vapor, making them statistically indistinguishable from habitable EECs. Again, the effect of these false positives will likely be negligible provided they are much less common than habitable planets, but indicators of planetary (non-)habitability other than H₂O may be necessary to filter them out.

7. EXAMPLE 2: EVOLUTION OF EARTH-LIKE PLANETS

By characterizing a sufficiently large sample of terrestrial worlds, a next-generation observatory could test hypotheses for how they evolve over time. One such hypothesis is that inhabited planets with oxygen-producing life, like Earth, evolve towards greater oxygen content over Gyr timescales due to long-term changes in global redox balance. As we propose in Bixel & Apai (2020a) (hereafter B20), the impact on a population level would be a positive "age-oxygen correlation", wherein older inhabited planets are more likely to have oxygenated atmospheres.

If inhabited planets do tend to evolve towards greater oxygen content over time, then what is the typical timescale for this evolution? Earth underwent major oxygenation events at 2–2.5 Gyr of age and again at ∼… ∼…

[Figure 8: O₃ detections and non-detections versus age (Gyr).]
Figure 8. An example of a simulated transit spectroscopy data set for Section 7. Earth-sized planets in the habitable zone are probed for the presence of O₃ (a tracer of O₂), which we assume becomes more common with age as more planets undergo global oxidation events. This “age-oxygen correlation” (Equation 16) is represented by the grey line, in this case where $f_{\rm life} = 80\%$ of observed planets are inhabited and the oxygenation timescale is 5 Gyr. Age estimates are uncertain to ± …

[Figure 9: target priority versus age (Gyr).]
Figure 9. Target prioritization for both surveys in Section 4.3, optimized to favor observations of younger and older planets to maximize the detectability of age-dependent trends. This also reflects the age distribution of characterized targets, because the simulated planet sample has a uniform age distribution.

… rest of that population. Here, we assess the ability of direct imaging and transit surveys to study the oxygenation history of Earth-like planets. This section follows a similar methodology to our previous analysis (B20), but expands upon it by incorporating a more thorough assessment of planet occurrence rates, detection sensitivity, and survey strategy, and by studying a broader range of evolutionary timescales.

7.1. Model predictions

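The half-life model stated in this subsection (Equation 15) can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions, not the Bioverse implementation: the function names `oxygenated_fraction` and `draw_oxygenation` are hypothetical, and the Monte Carlo helper treats each planet’s life and oxygenation flags as independent random draws, which reproduces Equation 15 on average.

```python
import numpy as np


def oxygenated_fraction(age_gyr, f_life, t_half_gyr):
    """Equation 15: expected fraction of EECs with O2 (and O3) at age t*."""
    age_gyr = np.asarray(age_gyr, dtype=float)
    return f_life * (1.0 - 0.5 ** (age_gyr / t_half_gyr))


def draw_oxygenation(age_gyr, f_life, t_half_gyr, rng):
    """Hypothetical helper: give each planet a life flag (probability f_life)
    and, if inhabited, an oxygenation flag (probability 1 - 0.5**(t*/t_half))."""
    age_gyr = np.asarray(age_gyr, dtype=float)
    inhabited = rng.random(age_gyr.size) < f_life
    p_ox = 1.0 - 0.5 ** (age_gyr / t_half_gyr)
    oxygenated = inhabited & (rng.random(age_gyr.size) < p_ox)
    return inhabited, oxygenated


rng = np.random.default_rng(42)
ages = np.full(100_000, 5.0)  # many planets, all 5 Gyr old
_, ox = draw_oxygenation(ages, f_life=0.8, t_half_gyr=5.0, rng=rng)
# The simulated fraction should be close to the analytic value of Equation 15
print(ox.mean(), oxygenated_fraction(5.0, f_life=0.8, t_half_gyr=5.0))
```

With $f_{\rm life} = 80\%$ and $t_{1/2} = 5$ Gyr (the case drawn in Figure 8), half of the inhabited planets are oxygenated at 5 Gyr, so both numbers printed should be close to 0.4.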
We assume a fraction $f_{\rm life}$ of EECs to be inhabited by life; note that this parameter absorbs factors affecting both the planet’s habitability and the likelihood of life originating. Over time, simulated inhabited planets transition from anoxic to oxygenated atmospheres at an average rate described by a half-life $t_{1/2}$. The resulting fraction of habitable planets which have oxygenated atmospheres as a function of age $t_*$ is:

$$f_{\rm O_2}(t_*) = f_{\rm O_3}(t_*) = f_{\rm life}\left(1 - 0.5^{\,t_*/t_{1/2}}\right) \tag{15}$$

Note that we assume oxygenated atmospheres to have both O₂ and its photochemical byproduct O₃. We run simulations for $f_{\rm life}$ ranging from 0–100% and for $t_{1/2}$ ranging from 500 Myr to 50 Gyr.

7.2. Simulated survey

7.2.1. Measurements

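A sketch of the two measurements described in this subsection, producing ($t_*$, O) pairs for each observed EEC. The error model is our assumption, not a detail from the paper: Gaussian fractional age errors (10% for the imaging survey, 30% for the transit survey) floored at 1 Myr, and error-free O₂/O₃ detections for every observed planet. The helper name `measure` is hypothetical.

```python
import numpy as np


def measure(true_age_gyr, has_oxygen, age_precision, rng):
    """Hypothetical helper returning (t*, O) measurements for observed EECs.

    Assumed error model: measured age = true age * (1 + age_precision * N(0, 1)),
    floored at 1 Myr so ages stay positive; O is 1 if the planet has O2/O3 and
    0 otherwise (no false positives or false negatives)."""
    true_age_gyr = np.asarray(true_age_gyr, dtype=float)
    noise = rng.standard_normal(true_age_gyr.size)
    measured_age = np.clip(true_age_gyr * (1.0 + age_precision * noise), 1e-3, None)
    o_flag = np.asarray(has_oxygen, dtype=int)
    return measured_age, o_flag


rng = np.random.default_rng(0)
ages = rng.uniform(0.5, 10.0, size=50)  # true ages in Gyr
has_o = rng.random(50) < 0.5            # true oxygenation state
t_img, o_img = measure(ages, has_o, age_precision=0.10, rng=rng)  # imaging: 10%
t_tr, o_tr = measure(ages, has_o, age_precision=0.30, rng=rng)    # transit: 30%
```

Because the O flag is binary, the survey output reduces to two arrays per survey, which is the form the correlation test later in this section operates on.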
The measurements performed by each simulated survey are summarized in Table 7. First, we measure the age ($t_*$) of every planet’s host star with 10% precision for the imaging survey and 30% precision for the transit survey. These estimates represent the state of the art for high- and low-mass stars, respectively. For high-mass stars, asteroseismology has yielded highly precise age constraints for Kepler targets (e.g., Creevey et al. 2017; Kayhan et al. 2019; Lund et al. 2019), and will likely be able to do so for most of the $\mathcal{O}(100)$ stellar targets probed by an imaging mission. For low-mass stars, asteroseismology has not been successful (e.g., Rodríguez-López et al. 2015; Rodríguez et al. 2016; Berdiñas et al. 2017), and age determination currently relies on a synthesis of model-based estimates. As an example, Burgasser & Mamajek (2017) use a combination of approaches to determine the age of the TRAPPIST-1 planetary system (Gillon et al. 2017) with ∼30% precision.

Next, each planet is observed to constrain the presence of oxygen. For an Earth-like planet, O₂ can be detected directly through its 0.76 µm (O₂-A) absorption feature or inferred through absorption by stratospheric ozone in the Chappuis (0.40–0.65 µm) or Hartley (0.2–0.3 µm) bands. It should be noted that our calculations assume modern Earth-like O₂ and O₃ abundances, an assumption which we revisit in Section 7.6.1.

For each survey mode, we determine which of these three features would be easiest to observe across the full range of detected EECs. In imaging mode we observe O₂-A absorption; while the Hartley band may be easier to detect for a solar-type star, it becomes more expensive to observe for lower-mass stars, and the Chappuis band signal is too shallow. Ultimately this consideration is unimportant for the volume-limited imaging survey, and it is likely that all three features will be searched for in the atmospheres of all detected EECs. In transit mode we observe the Chappuis band, as its signal is strong in transit observations. The Hartley band is inaccessible for the vast majority of (predominantly M dwarf) transit survey targets, and the O₂-A feature is too shallow and narrow to detect for distant targets.

In total, the simulated surveys produce measurements of ($t_*$, O) for each observed EEC, where $\mathrm{O} \in \{0, 1\}$ indicates the non-detection or detection of either O₂-A absorption (imaging mode) or O₃ Chappuis band absorption (transit mode).

7.2.2. Target prioritization

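The exact age-weighting curve of Figure 9 is not given in the text, so the sketch below uses a hypothetical V-shaped weight that is largest for the youngest and oldest planets, and combines it with each planet’s exposure time through the priority definition $p_i = w_i/t_i$ listed in Table 5. The 0–10 Gyr age range, the `floor` parameter, and the function names are illustrative assumptions.

```python
import numpy as np


def age_weight(age_gyr, age_min=0.0, age_max=10.0, floor=0.1):
    """Hypothetical weight favoring the youngest and oldest planets.

    V-shaped in age: equal to 1 at either end of [age_min, age_max] and to
    `floor` at the midpoint. The curve actually used for Figure 9 may differ."""
    age = np.clip(np.asarray(age_gyr, dtype=float), age_min, age_max)
    mid = 0.5 * (age_min + age_max)
    half_range = 0.5 * (age_max - age_min)
    return floor + (1.0 - floor) * np.abs(age - mid) / half_range


def priority(age_gyr, t_obs):
    """Observing priority p_i = w_i / t_i (Table 5)."""
    return age_weight(age_gyr) / np.asarray(t_obs, dtype=float)


ages = np.array([0.5, 5.0, 9.5])      # Gyr
t_obs = np.array([10.0, 10.0, 10.0])  # equal exposure times (arbitrary units)
print(priority(ages, t_obs))  # the 5 Gyr planet receives the lowest priority
```

Dividing by $t_i$ means a young or old planet that is expensive to observe can still rank below a middle-aged planet that is cheap to observe, so the weighting biases rather than dictates the target list.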
Unlike in the previous example, we do not prioritize targets by size or insolation, except that we assume all targets have previously been identified as EECs (perhaps with follow-up observations to confirm the presence of H₂O). This assumption is not trivial; imaging surveys cannot easily determine a planet’s size, and the true range of planet sizes and insolations which permit habitability is not yet known. In reality, it is likely that an actual biosignature survey will probe some planets which are not habitable for reasons yet unknown to the observer, which will serve as a source of noise (i.e., by reducing $f_{\rm life}$).

However, we do prioritize targets by age according to Figure 9, with observations of the youngest and oldest planets being preferred. This is not intended to counter any bias in the underlying sample, as there are no factors which bias the number of planets which can be characterized by our simulated surveys as a function of age. Rather, as we demonstrate in B20, a survey which prioritizes younger and older planets will be more sensitive to monotonic, age-dependent trends because of the larger contrast between those categories. While this prioritization strategy is optimal for studying the evolution of Earth-like planets, it must be balanced against the survey’s other goals. Notably, it de-prioritizes observations of modern Earth analogs, which may be the best planets to probe if the sole goal is to maximize the chance of detecting O₂.

7.2.3. Time budget

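The text does not spell out how targets are chosen once the time budget $t_{\rm total}$ is fixed. One simple possibility, assumed here, is a greedy pass through the targets in order of decreasing priority $p_i$, keeping each one whose exposure time $t_i$ still fits in the remaining budget. The exposure times below are invented for illustration (in the paper they are derived from the reference exposure time $t_{\rm ref}$), and `select_targets` is a hypothetical name.

```python
import numpy as np


def select_targets(priority, t_obs, t_total):
    """Greedy target selection under a fixed time budget (an assumption).

    Visits planets in order of decreasing priority and keeps any planet whose
    exposure time still fits in the remaining budget. Returns a boolean mask
    of selected planets and the unused time."""
    priority = np.asarray(priority, dtype=float)
    t_obs = np.asarray(t_obs, dtype=float)
    order = np.argsort(-priority, kind="stable")
    selected = np.zeros(priority.size, dtype=bool)
    remaining = float(t_total)
    for i in order:
        if t_obs[i] <= remaining:
            selected[i] = True
            remaining -= t_obs[i]
    return selected, remaining


# Hypothetical exposure times (days) and weights for five EECs
t_obs = np.array([3.1, 12.0, 7.5, 30.0, 5.0])
prio = np.array([0.9, 0.2, 0.7, 0.95, 0.5]) / t_obs  # p_i = w_i / t_i
chosen, left = select_targets(prio, t_obs, t_total=20.0)
print(chosen, left)
```

Here the three highest-priority planets use 15.6 of the 20 days, and the two remaining targets are each too expensive for the leftover time, so they are skipped.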
As discussed in Section 7.2.1, we consider the detection of O₂-A absorption in imaging mode and O₃ Chappuis band absorption in transit mode. The details of the exposure time calculations are shown in Table 4. Using PSG, we determine the exposure time required for the reference target to be $t_{\rm ref} = 2.$… for imaging mode and $t_{\rm ref} = 74$ hr for transit mode.

7.3. Hypothesis and prior assumptions

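The hypothesis and priors stated in this subsection can be sketched as a log-posterior over $(\log_{10} f_{\rm life}, \log_{10} t_{1/2})$; sampling in log space makes a log-uniform prior flat. The prior bounds below are placeholders for the values in Table 3, and the Bernoulli likelihood (each O flag equals 1 with probability $h(t_*)$) is our assumption; the paper’s likelihood is defined by its Equations 9 and 10 and may differ in detail.

```python
import numpy as np

# Placeholder prior bounds (hypothetical); the paper lists its own in Table 3.
F_LIFE_BOUNDS = (1e-3, 1.0)
T_HALF_BOUNDS = (0.1, 100.0)  # Gyr


def h(age_gyr, f_life, t_half):
    """Equation 16: expected fraction of EECs with oxygen at age t*."""
    return f_life * (1.0 - 0.5 ** (np.asarray(age_gyr, dtype=float) / t_half))


def log_prior(theta):
    """Log-uniform priors: flat in log10(f_life) and log10(t_half)."""
    log_f, log_t = theta
    lo_f, hi_f = np.log10(F_LIFE_BOUNDS)
    lo_t, hi_t = np.log10(T_HALF_BOUNDS)
    if lo_f <= log_f <= hi_f and lo_t <= log_t <= hi_t:
        return 0.0
    return -np.inf


def log_likelihood(theta, age_gyr, o_flag):
    """Assumed Bernoulli likelihood: each O flag is 1 with probability h(t*)."""
    f_life, t_half = 10.0 ** np.asarray(theta, dtype=float)
    p = np.clip(h(age_gyr, f_life, t_half), 1e-12, 1.0 - 1e-12)
    o_flag = np.asarray(o_flag)
    return np.sum(o_flag * np.log(p) + (1 - o_flag) * np.log1p(-p))


def log_posterior(theta, age_gyr, o_flag):
    lp = log_prior(theta)
    if not np.isfinite(lp):
        return -np.inf
    return lp + log_likelihood(theta, age_gyr, o_flag)


ages_demo = np.array([1.0, 3.0, 5.0, 7.0, 9.0])
o_demo = np.array([0, 0, 1, 1, 1])
print(log_posterior([np.log10(0.8), np.log10(5.0)], ages_demo, o_demo))
```

A function of this form could be passed to emcee’s `EnsembleSampler` (e.g., `emcee.EnsembleSampler(nwalkers, 2, log_posterior, args=(ages, o_flag))`) to sample $f_{\rm life}$ and $t_{1/2}$, as is done with emcee in Section 7.5.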
Once more, we take the role of an observer interpreting the results of each simulated survey. Our hypothesis is that inhabited planets tend to evolve towards greater oxygen content over time, which can be stated in terms similar to Equation 15:

$$h(t_*) = f_{\rm life}\left(1 - 0.5^{\,t_*/t_{1/2}}\right) \tag{16}$$

We adopt broad, log-uniform prior distributions for $f_{\rm life}$ and $t_{1/2}$, shown in Table 3, reflecting our significant prior uncertainty as to the frequency and evolutionary timescales of inhabited planets.

7.4. Correlation test

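The one-sided Mann-Whitney test described in this subsection can be run with SciPy’s `scipy.stats.mannwhitneyu`, comparing the ages of planets with and without oxygen detections. The helper name `age_oxygen_test`, the toy data, and the convention of returning a p-value of 1 when one group is empty are illustrative assumptions rather than details from B20 or Bioverse.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def age_oxygen_test(age_gyr, o_flag):
    """One-sided Mann-Whitney U test: are planets with O detections
    systematically older than planets without? A small p-value is evidence
    for a positive age-oxygen correlation."""
    age_gyr = np.asarray(age_gyr, dtype=float)
    o_flag = np.asarray(o_flag, dtype=bool)
    with_o, without_o = age_gyr[o_flag], age_gyr[~o_flag]
    if with_o.size == 0 or without_o.size == 0:
        return 1.0  # no contrast between groups, so no detection
    return mannwhitneyu(with_o, without_o, alternative="greater").pvalue


ages = np.array([0.5, 1.0, 2.0, 3.0, 6.0, 7.0, 8.0, 9.0])  # Gyr
o = np.array([0, 0, 0, 1, 0, 1, 1, 1])
print(age_oxygen_test(ages, o))
```

Because the test only uses the rank ordering of ages, it makes no assumption about the functional form of $h(t_*)$, which is why it cannot by itself estimate $t_{1/2}$.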
In lieu of the Bayesian evidence test used in the previous example, we employ the Mann-Whitney test (Mann & Whitney 1947) to determine whether $t_*$ correlates with the presence of oxygen, as we have previously done in B20. This model-independent test is more sensitive for detecting the correlation than the Bayesian evidence-based approach, especially in the limit of small sample sizes. However, it does not allow for the estimation of $t_{1/2}$, for which we rely on MCMC sampling.

7.5. Results

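Statistical power, as used in this subsection, is the fraction of simulated surveys in which the age-oxygen correlation is detected. A minimal Monte Carlo sketch follows; it makes simplifying assumptions that are not taken from the paper (uniform true ages from 0 to 10 Gyr, perfect age and O measurements, no target prioritization, and a hypothetical significance threshold `alpha = 0.05`).

```python
import numpy as np
from scipy.stats import mannwhitneyu


def statistical_power(n_planets, f_life, t_half, n_surveys=200, alpha=0.05,
                      age_max=10.0, seed=0):
    """Fraction of simulated surveys in which a one-sided Mann-Whitney test
    detects the age-oxygen correlation at significance `alpha`.

    Simplified sketch: uniform true ages on [0, age_max] Gyr and perfect
    measurements; the paper's simulations also include age errors, target
    prioritization, and survey-specific yields."""
    rng = np.random.default_rng(seed)
    detections = 0
    for _ in range(n_surveys):
        ages = rng.uniform(0.0, age_max, n_planets)
        p_ox = f_life * (1.0 - 0.5 ** (ages / t_half))  # Equation 15
        o_flag = rng.random(n_planets) < p_ox
        if o_flag.all() or not o_flag.any():
            continue  # one group is empty: counted as a non-detection
        p = mannwhitneyu(ages[o_flag], ages[~o_flag], alternative="greater").pvalue
        detections += int(p < alpha)
    return detections / n_surveys


for n in (20, 150):
    print(n, statistical_power(n, f_life=0.8, t_half=3.0))
```

Scanning `n_planets` for the smallest sample whose power exceeds 0.8 mirrors the “required EEC yield” panels of Figures 10 and 11, although this toy model will not reproduce the paper’s numbers.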
We assess the statistical power of each survey to test the age-oxygen correlation hypothesis, using the Mann-Whitney test to determine whether a positive correlation can be detected in each simulated data set, and emcee to sample the posterior distributions of $t_{1/2}$ and $f_{\rm life}$. Our results are summarized in Figure 10 for the imaging survey and Figure 11 for the transit survey.

7.5.1. Imaging survey

Assuming $\eta_\oplus \approx$ … ∼20 EECs only if most are inhabited. In order for an imaging survey to be reliably capable of studying the oxygen evolution of Earth-like planets under optimistic circumstances, a sample size of >50 EECs is necessary, requiring either $\eta_\oplus \gtrsim 20\%$ or a smaller inner working angle than assumed here (3.5 λ/D).

[Figure 10 panels, “Example 2: Results for 15-meter imaging survey”: (a) “How many exo-Earth candidates are probed for O₂?” (number of characterized EECs); (b) “Could this survey detect the age-oxygen correlation?” (statistical power, %, versus fraction of exo-Earth candidates with life $f_{\rm life}$ and oxygenation timescale $t_{1/2}$ in Gyr); (c) required EEC yield (number of EECs), same axes.]
Figure 10. Results for the imaging survey in Section 7. (a) The number of EECs observed versus $\eta_\oplus$ (for G stars), assuming $t_{\rm total} = 120$ d. For our baseline case, we set $\eta_\oplus = 7.5\%$. …

7.5.2. Transit survey

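The degeneracy between $t_{1/2}$ and $f_{\rm life}$ discussed in this subsection follows from Equation 16: for $t_* \ll t_{1/2}$, $1 - 0.5^{t_*/t_{1/2}} \approx \ln 2\, t_*/t_{1/2}$, so $h(t_*) \approx f_{\rm life} \ln 2\, t_*/t_{1/2}$ and, to first order, only the ratio $f_{\rm life}/t_{1/2}$ is constrained. The sketch below compares two parameter pairs with the same ratio over a 0–10 Gyr age range; the specific values are our illustrative choices, not results from the paper.

```python
import numpy as np


def h(age_gyr, f_life, t_half):
    """Equation 16: expected fraction of EECs with oxygen at age t*."""
    return f_life * (1.0 - 0.5 ** (np.asarray(age_gyr, dtype=float) / t_half))


ages = np.linspace(0.0, 10.0, 101)  # Gyr; no older planets in the sample
# Both pairs share f_life / t_half = 0.02 per Gyr, so their early slopes match
common_slow = h(ages, f_life=0.8, t_half=40.0)  # life common, oxygenation slow
rare_fast = h(ages, f_life=0.2, t_half=10.0)    # life rarer, oxygenation faster
print(np.max(np.abs(common_slow - rare_fast)))
```

The two curves differ by less than 0.03 at every age in this range, a difference that a sample of order 100 planets split across age bins would struggle to resolve.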
By probing 100–150 EECs for ozone, the transit survey is able to detect the age-oxygen correlation with high statistical power, assuming life to be somewhat common ($f_{\rm life} \gtrsim$ … $f_{\rm life} \gtrsim$ … ∼500 Myr or as long as ∼20 Gyr.

Under the case where life is very common, the transit survey could place meaningful constraints on the oxygenation timescale. As shown in Figure 11, the survey can distinguish between scenarios where global oxygenation occurs very quickly ($t_{1/2} \sim 0.\ldots$ … ∼ … ≳10 Gyr), since no planets of that age exist in the sample. This is due in part to the high degeneracy between $t_{1/2}$ and $f_{\rm life}$: if only a few oxygenated planets are found, it may be because life is uncommon, or because life is common but global oxygenation is very slow and has not yet had time to occur on most inhabited worlds.

7.6. Discussion

7.6.1. Detectability of oxygen through Earth’s history

In this section we consider all oxygenated planets to have the same O₂ and O₃ abundance as modern Earth. However, during the Proterozoic eon (approx. 2.2–0.6 Gya), Earth had a partially oxygenated atmosphere with $p_{\rm O_2} < 1\%$ (Lyons et al. 2014). If other inhabited planets do evolve like Earth, this suggests that many of them may have 1–3 orders of magnitude less O₂ than modern Earth.

In our analysis, $t_{1/2}$ is the typical timescale required for a planet to achieve a detectable amount of O₂ or O₃. Even if Proterozoic Earth analogs are common and their oxygen is undetectable, our results should not be affected provided that they eventually develop richly oxygenated atmospheres like modern Earth’s. In this case $t_{1/2}$ corresponds to the end of the Proterozoic (∼ … $p_{\rm O_2} = 0.\ldots$ … $t_{1/2}$ corresponds to the end of the Archean (∼ … Hartley band absorption which would have been detectable throughout the Proterozoic (Reinhard et al. 2017). For transit spectroscopy, ultraviolet sensitivity will be difficult to achieve in a sample of predominantly M stars, so detecting O₃ at Proterozoic-like levels will require the prioritization of G and K targets instead. A LUVOIR-like direct imaging survey targeting G and K dwarfs may be capable of detecting Proterozoic-like ozone levels for individual targets, but the sample size will still be too small unless both $\eta_\oplus$ and $f_{\rm life}$ are large (≳ …

[Figure 11 panels, “Example 2: Results for 50-meter (equivalent area) transit survey”: (a) “How many exo-Earth candidates are probed for O₃?” (number of characterized EECs versus time budget in yr, with and without clouds, and for SAG13); (b) “Could this survey detect the age-oxygen correlation?” (statistical power, %, versus $f_{\rm life}$ and $t_{1/2}$); (c) required EEC yield; (d) “Could this survey determine the oxygenation timescale?” ($dP/dt_{1/2}$ versus $t_{1/2}$ in Gyr for Surveys 1–6).]
Figure 11. Results for the transit survey in Section 7. (a) The number of EECs observed versus the observing time budget, assuming $\eta_\oplus = 7.5\%$ for G stars and cloudy atmospheres (solid). 2–4× as many planets could be observed if clouds were neglected (dashed), or 1.5–3× as many with clouds if assuming the higher SAG13 estimate of $\eta_\oplus = 24\%$ (dotted). As our baseline case, we set $t_{\rm total} = 730$ d. (b) The statistical power to detect the age-oxygen correlation as a function of the astrophysical parameters in Equation 15. (c) The minimum number of EECs which must be characterized to achieve 80% statistical power, with the necessary observing time budget $t_{\rm total}$. (d) Distribution of possible values for the oxygenation timescale as measured by six random realizations of the survey under the optimistic assumption that 80% of EECs are inhabited. Results are shown for fast (0.5 Gyr, left), Earth-like (3 Gyr, center), and slow (10 Gyr, right) evolutionary scenarios, with the truth values marked by a blue line.

7.6.2. Abiotic oxygen sources

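The two-population scenario described in this subsection can be written as the sum of a decaying abiotic term and the biogenic term of Equation 15. The sketch below assumes the two populations are disjoint and that abiotic oxygen is lost with its own half-life `tau_abio`; this functional form, the parameter names, and the values used are illustrative assumptions, not models from the paper or from Meadows et al. (2018).

```python
import numpy as np


def oxygen_fraction_two_pop(age_gyr, f_abio, tau_abio, f_life, t_half):
    """Fraction of EECs with detectable O2 when two disjoint populations coexist:
    abiotic O2 that decays with half-life tau_abio, and biogenic O2 that builds
    up following Equation 15 with half-life t_half. All times in Gyr."""
    t = np.asarray(age_gyr, dtype=float)
    abiotic = f_abio * 0.5 ** (t / tau_abio)
    biogenic = f_life * (1.0 - 0.5 ** (t / t_half))
    return abiotic + biogenic


ages = np.array([0.0, 0.5, 1.0, 3.0, 10.0])  # Gyr
print(oxygen_fraction_two_pop(ages, f_abio=0.3, tau_abio=0.2, f_life=0.5, t_half=2.0))
```

With abiotic oxygen mostly gone by ∼1 Gyr, the oxygenated fraction first falls and then rises with age, so the combined trend is non-monotonic, unlike Equation 15 alone.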
We only consider planets on which O₂ is biologically produced (as it was in Earth’s history), but others have considered scenarios through which an Earth-sized planet near or within the habitable zone could acquire detectable levels of oxygen through abiotic processes (for a review, see Meadows et al. 2018). The oxygen in these models tends to initially derive from H₂O- or CO₂-dominated atmospheres shortly after the planet’s formation and can linger in the atmosphere long enough to serve as a potential “false positive” biosignature for next-generation observatories. In B20, by assuming the fraction of planets with abiotically produced oxygen to be independent of age, we show that these false positives will have a small impact on the detectability of the age-oxygen correlation provided that they are less common than Earth-like planets with biogenic O₂.

In reality, atmospheres with abiotically produced oxygen will evolve over time. On Earth, oxygen is continually produced in large enough quantities to overcome its substantial geological sinks. On planets where oxygen is, e.g., a remnant of primordial ocean loss, it would be depleted over time. This suggests a statistical test to determine whether oxygen is a reliable biosignature: if the fraction of EECs with oxygen decreases with age, this would suggest much of the oxygen to be of a primordial, abiotic origin.

Finally, it is plausible that both populations of oxygen-rich worlds exist in comparable numbers: one with abiotically produced oxygen which diminishes over time, and another with biologically produced oxygen which increases over time. Whether the Earth-like age-oxygen correlation could be detected would depend on the timescales of the two processes. For example, if most planets with abiotically produced oxygen lose it before 1 Gyr, and most planets with biogenic oxygen acquire it by 10 Gyr, then it should be possible to distinguish the two populations.

8. SUMMARY

We have presented Bioverse, a simulation tool designed to gauge the potential of future observatories to test statistical hypotheses about the formation and evolution of planetary systems and habitable worlds. To achieve this, Bioverse leverages statistically realistic simulations of nearby planetary systems, a survey simulator designed to produce data sets representative of different observatory configurations and survey strategies, and a hypothesis testing module to assess the information content of the data. We demonstrated two applications of our code.

In the first example, we determined whether a future direct imaging (15-meter diameter) or transit spectroscopy (50-meter equivalent diameter) survey could empirically test the concept of a habitable zone as well as measure its location and width. With samples as small as 15–20 EECs, we found that both surveys will be capable of testing the habitable zone hypothesis if habitable planets are common (≳50% of EECs), and that they can constrain the habitable zone’s width well enough to rule out very wide (e.g., 1–10 AU) or very narrow (e.g., 1–1.1 AU) habitable zones. A survey which can characterize 60–70 EECs for atmospheric water vapor can test the habitable zone hypothesis even if habitable planets are less common (20–40% of EECs), but would be difficult to achieve with currently envisioned direct imaging observatories. Our estimates suggest that this would be feasible for a large-aperture transit survey, but the EEC sample size is sensitive to the impact of cloud cover (and other factors not considered here, such as stellar contamination; Rackham et al. 2018).

In the second example, we expanded upon the age-oxygen correlation proposed in B20, finding that future surveys which aim to study the oxygen evolution of Earth-like planets must expect to characterize at least ∼50 EECs by detecting the presence of modern Earth-like O₂ or O₃ absorption. With a sample size of 100–150 EECs, if most of them are inhabited, a survey could begin to constrain the evolutionary timescale with meaningful precision, and could determine whether the oxygenation of Earth-like planets proceeds at an Earth-like pace (2–5 Gyr timescale) or much faster (∼0.… absorption will be beneficial if Proterozoic Earth analogs are common, but may not be necessary provided they eventually evolve to a modern Earth-like state.

The statistical power of either survey to test these hypotheses depends critically on the number of EECs detected, but recent evidence suggests that existing estimates of $\eta_\oplus$ are too high (Pascucci et al. 2019; Neil & Rogers 2020). Assuming $\eta_\oplus = 7.5\%$ for Sun-like stars, we found that an ambitious 15-meter mirror diameter imaging survey would likely detect 15–20 EECs. Such a survey may have high statistical power for studies of terrestrial planets in general (including those outside the habitable zone), but will only be able to test the habitable zone concept if most EECs are habitable, or if tracers of habitability other than H₂O absorption are considered. Unless $\eta_\oplus >$ …

Bioverse can also combine multiple measurements for each planet which trace the same underlying physical conditions (such as habitability), allowing surveys to achieve greater statistical sensitivity with limited sample sizes. For example, by incorporating measurements of planetary brightness and color in addition to H₂O absorption, imaging surveys may be able to test the habitable zone concept with smaller sample sizes, provided a hypothesis exists for how these properties should vary with orbital separation (e.g., Checlair et al. 2019). Similarly, if clouds make the detection of H₂O difficult for a transit survey, then stratospheric O₃ may offer an alternative tracer of planetary habitability (provided O₂ is predominantly produced by life).

With Bioverse, we aim to enable future space-based exoplanet surveys to test hypotheses including and beyond the examples explored here, and to emphasize the importance of population-level studies for next-generation exoplanet surveys. While target-by-target analyses of the closest planets will be valuable, population-level studies will reveal fundamental truths about the laws governing non-habitable, habitable, and inhabited worlds.

ACKNOWLEDGMENTS

We thank Chris Stark for providing a possible realization of the LUVOIR-A target list and Tad Komacek for providing the GCM model outputs used in this work. A.B. acknowledges support from the NASA Earth and Space Science Fellowship Program under grant No. 80NSSC17K0470.
The results reported herein benefited from collaborations and/or information exchange within NASA’s Nexus for Exoplanet System Science (NExSS) research coordination network sponsored by NASA’s Science Mission Directorate. This research has made use of NASA’s Astrophysics Data System.

Software: dynesty (Speagle 2020), emcee (Foreman-Mackey et al. 2013), Matplotlib (Hunter 2007), NumPy (Oliphant 2006), PSG (Villanueva et al. 2018), SciPy (Virtanen et al. 2020)

REFERENCES

Apai, D., Milster, T. D., Kim, D. W., et al. 2019, AJ, 158, 83, doi: 10.3847/1538-3881/ab2631
Apai, D., Milster, T. D., Kim, D. W., et al. 2019a, in Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, Vol. 11116, Astronomical Optics: Design, Manufacture, and Test of Space and Ground Systems II, 1111608, doi: 10.1117/12.2529428
Apai, D., Cowan, N., Kopparapu, R., et al. 2017, arXiv e-prints, arXiv:1708.02821. https://arxiv.org/abs/1708.02821
Apai, D., Bixel, A., Rackham, B. V., et al. 2019b, in Bulletin of the American Astronomical Society, Vol. 51, 141
Bean, J. L., Abbot, D. S., & Kempton, E. M.-R. 2017, ApJL, 841, L24, doi: 10.3847/2041-8213/aa738a
Berdiñas, Z. M., Rodríguez-López, C., Amado, P. J., et al. 2017, MNRAS, 469, 4268, doi: 10.1093/mnras/stx1140
Bixel, A., & Apai, D. 2020a, arXiv e-prints, arXiv:2005.01587. https://arxiv.org/abs/2005.01587
Bixel, A., & Apai, D. 2020b, AJ, 159, 3, doi: 10.3847/1538-3881/ab5222
…
Pidhorodetska, D., Fauchez, T. J., Villanueva, G. L., Domagal-Goldman, S. D., & Kopparapu, R. K. 2020, ApJL, 898, L33, doi: 10.3847/2041-8213/aba4a1
Pierrehumbert, R., & Gaidos, E. 2011, ApJL, 734, L13, doi: 10.1088/2041-8205/734/1/L13
Quanz, S. P., Kammerer, J., Defrère, D., et al. 2018, in Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, Vol. 10701, Proc. SPIE, 107011I, doi: 10.1117/12.2312051
Rackham, B. V., Apai, D., & Giampapa, M. S. 2018, ApJ, 853, 122, doi: 10.3847/1538-4357/aaa08c
Ramirez, R. M., & Kaltenegger, L. 2017, ApJL, 837, L4, doi: 10.3847/2041-8213/aa60c8
Rauer, H., Catala, C., Aerts, C., et al. 2014, Experimental Astronomy, 38, 249, doi: 10.1007/s10686-014-9383-4
Reinhard, C. T., Olson, S. L., Schwieterman, E. W., & Lyons, T. W. 2017, Astrobiology, 17, 287, doi: 10.1089/AST.2016.1598
Ricker, G. R., Winn, J. N., Vanderspek, R., et al. 2015, Journal of Astronomical Telescopes, Instruments, and Systems, 1, 014003, doi: 10.1117/1.JATIS.1.1.014003
Rodríguez, E., Rodríguez-López, C., López-González, M. J., et al. 2016, MNRAS, 457, 1851, doi: 10.1093/mnras/stw033
Rodríguez-López, C., Gizis, J. E., MacDonald, J., Amado, P. J., & Carosso, A. 2015, MNRAS, 446, 2613, doi: 10.1093/mnras/stu2211
Rogers, L. A. 2015, ApJ, 801, 41, doi: 10.1088/0004-637X/801/1/41
Shields, A. L., Ballard, S., & Johnson, J. A. 2016, PhR, 663, 1, doi: 10.1016/j.physrep.2016.10.003
Skilling, J. 2006, Bayesian Anal., 1, 833, doi: 10.1214/06-BA127
Snaith, O., Haywood, M., Di Matteo, P., et al. 2015, A&A, 578, A87, doi: 10.1051/0004-6361/201424281
Speagle, J. S. 2020, MNRAS, 493, 3132, doi: 10.1093/mnras/staa278
Staguhn, J., Mandell, A., Stevenson, K., et al. 2019, arXiv e-prints, arXiv:1908.02356. https://arxiv.org/abs/1908.02356
Stark, C. C., Belikov, R., Bolcar, M. R., et al. 2019, Journal of Astronomical Telescopes, Instruments, and Systems, 5, 024009, doi: 10.1117/1.JATIS.5.2.024009
Suissa, G., Mandell, A. M., Wolf, E. T., et al. 2020, ApJ, 891, 58, doi: 10.3847/1538-4357/ab72f9
The HabEx Team. 2019, The HabEx Final Report, Tech. rep.
The LUVOIR Team. 2019, arXiv e-prints, arXiv:1912.06219. https://arxiv.org/abs/1912.06219
Traub, W. A., & Oppenheimer, B. R. 2010, Direct Imaging of Exoplanets, ed. S. Seager, 111–156
Trotta, R. 2008, Contemporary Physics, 49, 71, doi: 10.1080/00107510802066753
Tuomi, M., Jones, H. R. A., Butler, R. P., et al. 2019, arXiv e-prints, arXiv:1906.04644. https://arxiv.org/abs/1906.04644
Turbet, M., Ehrenreich, D., Lovis, C., Bolmont, E., & Fauchez, T. 2019, A&A, 628, A12, doi: 10.1051/0004-6361/201935585
Valencia, D., O’Connell, R. J., & Sasselov, D. D. 2007, ApJL, 670, L45, doi: 10.1086/524012
Villanueva, G. L., Smith, M. D., Protopapa, S., Faggi, S., & Mandell, A. M. 2018, JQSRT, 217, 86, doi: 10.1016/j.jqsrt.2018.05.023
Virtanen, P., Gommers, R., Oliphant, T. E., et al. 2020, Nature Methods, 17, 261, doi: 10.1038/s41592-019-0686-2
Weiss, L. M., & Marcy, G. W. 2014, ApJL, 783, L6, doi: 10.1088/2041-8205/783/1/L6
Winn, J. N. 2010, arXiv e-prints, arXiv:1001.2010. https://arxiv.org/abs/1001.2010
Wolfgang, A., Rogers, L. A., & Ford, E. B. 2016, ApJ, 825, 19, doi: 10.3847/0004-637X/825/1/19
Zahnle, K. J., & Catling, D. C. 2017, ApJ, 843, 122, doi: 10.3847/1538-4357/aa7846
Zsom, A., Seager, S., de Wit, J., & Stamenković, V. 2013, ApJ, 778, 109, doi: 10.1088/0004-637X/778/2/109

APPENDIX A. LIST OF SYMBOLS

Table 5. A list of common abbreviations and symbols used in this paper.

Symbol: Description

Abbreviations
EEC: “exo-Earth candidate” (or “potentially habitable planet”); planets in the radius range $0.\ldots\,(S/S_\oplus)^{\ldots} < R_p < \ldots\,R_\oplus$
LUVOIR: Large UV/Optical/Infrared Surveyor (The LUVOIR Team 2019)
SAG13: NASA’s Exoplanet Program Analysis Group Science Analysis Group 13
PSG: NASA/GSFC Planetary Spectrum Generator (Villanueva et al. 2018)
IWA, OWA: Inner, outer working angles of a coronagraphic instrument
MCMC: Markov Chain Monte Carlo

Stellar properties
$d$: Distance to star
$M_*$, $R_*$, $L_*$: Mass, radius, and luminosity
$T_*$: Effective temperature
$t_*$: Age of star and planetary system
$a_{\rm inner}$, $a_{\rm outer}$: Inner and outer edge of the star’s habitable zone

Planet properties
$M_p$, $R_p$, $g_p$: Mass, radius, and surface gravity
$h$: Atmospheric scale height
$P$: Orbital period
$a$: Semi-major axis
$a_{\rm eff}$: Semi-major axis scaled by the stellar luminosity, $a_{\rm eff} = a\,(L_*/L_\odot)^{-0.5}$
$\cos(i)$: (Cosine of) orbital inclination
$b$: Transit impact parameter, assuming a circular orbit
$\delta$: Planet transit depth, $\delta = (R_p/R_*)^2$
$\Delta\delta$: Approximate transit depth induced by the planet’s atmosphere, $\Delta\delta \sim \ldots\,h/R_p$
$\zeta$: Planet-to-star contrast ratio
$R_{\rm est}$: Estimated planet radius assuming Earth-like reflectivity (direct imaging only), $R_{\rm est}/R_\oplus = (\zeta/\zeta_\oplus)^{0.5}\,(a/\ldots)$

Simulated survey
$D_{\rm tel}$: Telescope diameter or effective diameter (based on total light-collecting area)
$\lambda_{\rm eff}$: Effective wavelength of a spectroscopic measurement
$R_{*,\rm ref}$, $T_{*,\rm ref}$: Radius and effective temperature of the reference star; $(T_{*,\rm ref}, R_{*,\rm ref})$ = (5777 K, 1 $R_\odot$) for the imaging survey, (3000 K, $0.\ldots\,R_\odot$) for the transit survey
$t_i$: Amount of time required to characterize the $i$-th planet in a sample
$t_{\rm ref}$: Amount of time required to characterize an Earth twin orbiting the reference star with $a_{\rm eff} = 1$ AU
$t_{\rm total}$: Time budget allocated to characterizing planets for a specific spectral feature (may overlap with observations at other wavelengths)
$\zeta_\oplus$: Contrast ratio of the Earth with respect to the Sun, $\zeta_\oplus \approx \ldots$
$p_i$, $w_i$: Observing priority and relative weight assigned to each planet, where $p_i = w_i/t_i$

Hypothesis testing
$x$, $y$: Independent and dependent variables in the simulated data sets
$h(\vec\theta, x)$: Alternative hypothesis describing the relationship between $x$ and $y$, to be compared to the null hypothesis
$\vec\theta$: Set of parameters which define $h$
$\mathcal{L}(y \mid \vec\theta)$: Likelihood function, described by Equation 9 or 10
$\Pi(\vec\theta)$: Prior probability distribution of $\vec\theta$, described for each example in Table 3
$Z$: Bayesian evidence in favor of the null or alternative hypothesis, computed by nested sampling

Habitable zone hypothesis
$a_{\rm inner}$, $a_{\rm outer}$: Inner and outer edges of the habitable zone in $a_{\rm eff}$ space (i.e., for a Sun-like star)
$f_{\rm H_2O}^{\rm EEC}$: Fraction of EECs with atmospheric water vapor (assumed habitable)
$f_{\rm H_2O}^{\rm non-EEC}$: Fraction of non-EECs with atmospheric water vapor

Age-oxygen correlation
$f_{\rm life}$: Fraction of EECs inhabited by life (regardless of O₂ content)
$t_{1/2}$