Mock Lightcones and Theory Friendly Catalogs for the CANDELS Survey

Abstract

We present mock catalogs created to support the interpretation of the CANDELS survey. We extract halos along past lightcones from the Bolshoi Planck dissipationless N-body simulations and populate these halos with galaxies using two different independently developed semi-analytic models of galaxy formation and the empirical model UniverseMachine. Our mock catalogs have geometries that encompass the footprints of observations associated with the five CANDELS fields. In order to allow field-to-field variance to be explored, we have created eight realizations of each field. In this paper, we present comparisons with observable global galaxy properties, including counts in observed frame bands, luminosity functions, color-magnitude distributions and color-color distributions. We additionally present comparisons with physical galaxy parameters derived from SED fitting for the CANDELS observations, such as stellar masses and star formation rates. We find relatively good agreement between the model predictions and CANDELS observations for luminosity and stellar mass functions. We find poorer agreement for colors and star formation rate distributions. All of the mock lightcones as well as curated "theory friendly" versions of the observational CANDELS catalogs are made available through a web-based data hub.


MNRAS 000, 000–000 (0000) Preprint 2 February 2021 Compiled using MNRAS LaTeX style file v3.0


Rachel S. Somerville*, Charlotte Olsen, L. Y. Aaron Yung, Camilla Pacifici, Henry C. Ferguson, Peter Behroozi, Shannon Osborne, Risa H. Wechsler, Viraj Pandya, Sandra M. Faber, Joel R. Primack, Avishai Dekel

Center for Computational Astrophysics, Flatiron Institute, 162 5th Avenue, New York, NY 10010
Department of Physics and Astronomy, Rutgers University, 136 Frelinghuysen Road, Piscataway, NJ 08854, USA
Space Telescope Science Institute, 3700 San Martin Drive, Baltimore, MD 21218, USA
Department of Astronomy and Steward Observatory, University of Arizona, Tucson, AZ 85721, USA
Kavli Institute for Particle Astrophysics and Cosmology & Physics Department, Stanford University, Stanford, CA 94305, USA; SLAC National Accelerator Laboratory, Menlo Park, CA 94025, USA
Department of Astronomy and Astrophysics, University of California, Santa Cruz, CA 95064, USA
Physics Department, University of California, Santa Cruz, CA 95064, USA
Racah Institute of Physics, The Hebrew University, Jerusalem 91904, Israel


Key words: galaxies: formation, evolution, stellar content, high-redshift – astronomical data base: surveys

The Cosmic Assembly Near-IR Deep Extragalactic Legacy Survey (CANDELS) is a multi-cycle treasury program on the Hubble Space Telescope (HST; Grogin et al. 2011; Koekemoer et al. 2011). The CANDELS project surveyed five widely separated fields, each ∼ . 25 square degrees, building on the legacy of previous surveys such as the Great Observatories Origins Deep Survey (GOODS; Giavalisco et al. 2004), the Hubble Ultra Deep Field (HUDF; Beckwith et al. 2006), COSMOS (AEGIS; Scoville et al. 2007), and the UKIDSS Ultra Deep Survey (UDS; Cirasuolo et al. 2007). The major new contribution from CANDELS is Near-IR imaging with the Wide Field Camera 3 (WFC3) in a “wedding cake” configuration, with deeper imaging over two smaller areas and shallower imaging over five wider areas. CANDELS-deep is sensitive enough to reveal galaxy candidates viewed during “cosmic dawn” at redshifts of z ∼ z ∼

* E-mail: rsomerville@flatironinstitute.org. CANDELS website: candels.ucolick.org

The CANDELS catalogs are documented in separate papers (Guo et al. 2013; Galametz et al. 2013; Stefanon et al. 2017; Nayyeri et al. 2017; Barro et al. 2019). Photometric redshift and stellar mass estimates have been presented in Dahlen et al. (2013), Mobasher et al. (2015), and Santini et al. (2015), and star formation rate estimates are presented in Barro et al. (2019).

One goal of this paper is to document a set of “theory friendly” CANDELS high-level science products, which we have curated in order to make it easier to compare the CANDELS results with the predictions of theoretical models (or with other surveys).
The “theory friendly” catalogs (hereafter TF-CANDELS catalogs) have a standard format, with the same set of observational and derived quantities included, and have had a fairly generic set of data quality cuts pre-applied. The quantities included in the TF-CANDELS catalogs have been selected to comprise those that we expect to be of the most interest for comparison with theoretical models.

Another important component of the CANDELS project has been the development of custom theoretical models and simulations to aid in the interpretation of CANDELS results. One major part of the theory effort has been the development of detailed “mock catalogs” tailored to the characteristics of the CANDELS survey. To build these mock catalogs, we have extracted “lightcones” from a large dissipationless N-body simulation, with geometries matched to the five CANDELS fields. The lightcones are lists of the masses, redshifts, and positions on the sky (right ascension and declination) of halos extracted along a past lightcone. We can then construct “merger trees” which describe the build-up of these halos over time via merging of smaller halos. The observable properties of the galaxies that form in these halos can then be computed using an approach known as semi-analytic modeling.

Semi-analytic models (SAMs) of galaxy formation are a widely used tool for studying the formation and evolution of galaxies in a cosmological context. In this approach, one tracks bulk quantities such as diffuse hot gas, cold star forming gas, stars, heavy elements, etc., using approximations and phenomenological recipes. These models are set within the backbone of the dark matter halo merger trees mentioned above, which track the build-up of gravitationally collapsed structures. They typically include modeling the shock heating and radiative cooling of gas, star formation and stellar feedback, chemical evolution, and morphological transformation via mergers.
Some recent models also include the formation and growth of supermassive black holes and feedback from Active Galactic Nuclei (AGN). The resulting star formation and chemical enrichment histories can then be combined with stellar population models (e.g. Bruzual & Charlot 2003) and a treatment of attenuation by dust in order to obtain estimates of luminosities at UV-NIR wavelengths.

Semi-analytic models adopt many simplifications and approximations, and do not provide information that is as detailed as the output from a numerical hydrodynamic simulation. But SAMs have the advantage of much greater computational efficiency, as well as flexibility. Moreover, numerical simulations must still adopt phenomenological treatments of “sub-grid physics” to describe physical processes that occur at scales smaller than the resolution of the simulation (Somerville & Davé 2015; Naab & Ostriker 2017). In many cases, these recipes are similar to those utilized in SAMs. Modern SAMs and cosmological numerical hydrodynamic simulations apparently yield very consistent results, at least for many key global quantities (Somerville & Davé 2015).

An alternative method of linking dark matter halo properties with observables is to use empirical models such as sub-halo abundance matching models (SHAMs) or their variants (see Wechsler & Tinker 2018, for a recent review). Rather than attempting to implement a priori all of the detailed physical processes associated with galaxy formation, these models derive mappings between dark matter halo properties and observationally derived quantities, such that observational constraints are satisfied. The UniverseMachine developed by Behroozi et al. (2019) is an example of such an approach.

We have created a set of mock catalogs based on the CANDELS lightcones using three different approaches: the Santa Cruz SAM developed by R. Somerville and collaborators (Somerville et al. 2015, and references therein), the SAM code of Y. Lu and collaborators (Lu et al. 2011), and the UniverseMachine (Behroozi et al. 2019). In Lu et al. (2014), we conducted an extensive comparison of the predictions of the SC and Lu SAMs, as well as a third SAM by Croton et al. (2006), for “intrinsic” galaxy properties over the redshift range z ∼ z = 0 stellar mass function. Overall, we found that the models produced fairly similar results, although with some significant differences particularly at the highest redshifts investigated. However, we did not compare the model predictions with actual CANDELS data in that work, as the high level data products were not yet available.

The goals of this paper are three-fold: first, we document the details of the construction and contents of the mock catalogs, which have already been used in a number of CANDELS papers, and which we now release to the community. Second, we present the predictions of the SC and Lu SAMs for standard quantities such as observed counts, rest-frame luminosity functions, color-magnitude relations, and color-color diagrams. We focus here on the redshift range 0 . < ∼ z < ∼ z > ∼ § § § 5, we present a comparison of the predictions of the models with observed and derived quantities from CANDELS. We discuss our results, including a comparison with previous work, in § § 7. Throughout, we adopt the cosmological parameters consistent with the recent analysis of the Planck survey (as given in §

Figure 1.

Approximate footprints of the five mock lightcones (left), and the observed CANDELS fields, for a slice 0 . < z < . < 26. The color scale shows the density of galaxies on the sky in arcmin⁻². The mock lightcones subtend a much larger area than the CANDELS HST footprint, by design.

The lightcones used to construct our mock catalogs are extracted from the Bolshoi Planck (hereafter BolshoiP) N-body simulations (Klypin et al. 2016; Rodríguez-Puebla et al. 2016). The cosmological parameters are: matter density Ω_m = 0 . b = 0 . H = 67 . − Mpc − , tilt n_s = 0.96, power spectrum normalization σ = 0 . h⁻¹) comoving Mpc on a side, with particle mass 2 . × M⊙ (1 . × h⁻¹ M⊙), and a force resolution of 1.5 kpc (1 h⁻¹ kpc) in physical units. Dark matter halos and subhalos were identified using the ROCKSTAR code (Behroozi et al. 2013a). The halo catalogs are complete above a mass of ≃ . × M⊙ (50 km s⁻¹). Merger trees have been constructed from these halo catalogs using the Consistent Trees code (Behroozi et al. 2013b). All results presented here make use of the halo virial mass definition of Bryan & Norman (1998), given in Eqn. 1 of Rodríguez-Puebla et al. (2016).

field      dimensions (arcmin)   area (arcmin²)
COSMOS     17 × 41               697
EGS        17 × 46               782
GOODS-N    32 × 32               1024
GOODS-S    39 × 41               1599
UDS        36 × 35               1260

Table 1. Dimensions and areas of the mock lightcones for the five CANDELS fields. Note that these dimensions are typically different from the HST footprint of the observed CANDELS fields.

Lightcones are extracted from 164 snapshots between redshifts 0 − 10. Lightcone origins and orientations are chosen randomly within the simulation volume. The simulation has been constructed with periodic boundary conditions, and the simulated volume is replicated and tiled in all directions. Halos are collected along each lightcone from the snapshot closest to their cosmological redshift. As CANDELS is composed of pencil-beam surveys, no restrictions on sampling overlapping regions of the simulation volume are applied, as overlaps typically occur at redshift spacings ∆z > 1. The source code for creating lightcones is available online in the lightcone package, and the full description of the algorithm is in Behroozi et al. (2020). For more information on how to use the lightcone package, please see the online documentation at https://bitbucket.org/pbehroozi/universemachine/src/master/README.md

UniverseMachine are different lines of sight and contain different halos, but comprise a statistical representation of the same halo population.
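As an illustration of two steps in the lightcone construction described above (assigning each halo to the snapshot closest to its cosmological redshift, and mapping positions in the tiled volume back into the periodic box), a minimal Python sketch follows. It is not the lightcone package itself: the function names, the toy snapshot list, and the box size of 100 are hypothetical and chosen only for illustration.

```python
import bisect
import math


def closest_snapshot(snapshot_redshifts, z):
    """Return the index of the snapshot whose redshift is closest to z.

    snapshot_redshifts must be sorted in ascending order.
    Ties are resolved in favour of the lower-redshift snapshot.
    """
    i = bisect.bisect_left(snapshot_redshifts, z)
    if i == 0:
        return 0
    if i == len(snapshot_redshifts):
        return len(snapshot_redshifts) - 1
    before = snapshot_redshifts[i - 1]
    after = snapshot_redshifts[i]
    return i - 1 if (z - before) <= (after - z) else i


def wrap_into_box(position, box_size):
    """Map a comoving position in the tiled volume into the periodic box."""
    return tuple(x % box_size for x in position)


def replica_index(position, box_size):
    """Integer offsets of the box replica that contains the position."""
    return tuple(math.floor(x / box_size) for x in position)


# Toy example: four snapshots and an illustrative box size (assumed values).
toy_snapshots = [0.0, 0.5, 1.0, 2.0]
print(closest_snapshot(toy_snapshots, 0.7))
print(wrap_into_box((250.0, -10.0, 30.0), 100.0))
print(replica_index((250.0, -10.0, 30.0), 100.0))
```

Because the box is periodic, a line of sight that leaves one face of the volume re-enters through the opposite face, which is why a simple modulo operation is sufficient in this sketch.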

(Lightcone package repository: https://bitbucket.org/pbehroozi/universemachine/src/master/)

The two semi-analytic models used in this work contain a similar suite of physical processes, but these processes are parameterized and implemented in different ways. Both models are based on merger trees, which describe how dark matter halos collapse and merge to form larger structures over time.

The models contain prescriptions describing the cosmological accretion of gas into halos, cooling of hot halo gas into the interstellar medium (ISM) of galaxies, and the formation of stars from cold ISM gas. In addition, the models track the return of mass and metals to the ISM from massive stars and supernovae, and contain a schematic treatment of “stellar feedback”, the ejection of mass and metals by stellar and supernova driven winds. The Santa Cruz model contains a prescription for the formation and growth of supermassive black holes, and associated “black hole feedback”, while the Lu models include a phenomenological halo-based quenching model. Both models track the stellar mass in a “disk” and “spheroid” component of each galaxy separately, allowing for simplified estimates of galaxy morphology to be made. Both models additionally contain estimates of the radial size of the disk component of each galaxy. The Santa Cruz model also includes estimates of the size of the spheroid component, based on the models developed by Porter et al. (2014). Note that the version of the Santa Cruz models used here contains tracking of multiphase gas and a molecular hydrogen based star formation recipe, as described in Somerville et al. (2015). The model parameters have also been updated relative to those presented in Somerville et al. (2015) to account for the BolshoiP cosmology (see Yung et al. 2019a, for details). In addition, as in Yung et al. (2019a), the filtering mass for photoionization squelching has been updated to the results from Okamoto et al. (2008).

An important difference between the models is that the Lu models utilize the merger trees extracted directly from the BolshoiP simulations, and therefore the mass resolution is limited to ∼ M⊙ for root halos. The Santa Cruz SAMs use the “root halos” along the lightcones from BolshoiP, but construct the halo merger histories using the Extended Press-Schechter formalism as presented in Somerville et al. (2008). Therefore, the halo merger histories depend only on halo mass and redshift, and do not carry a second-order dependence on the large scale environment. However, this means that the Santa Cruz mocks extend an order of magnitude further down in mass resolution, to root halos of ∼ M⊙.

Both SAMs carry out stellar population synthesis by combining the predicted star formation and chemical enrichment histories with simple stellar population models and analytic estimates of the effects of dust attenuation, to predict galaxy spectral energy distributions. The Santa Cruz models additionally utilize dust emission templates to extend the SED predictions to longer wavelengths, where the light is dominated by dust emission rather than starlight. More details are given in §

Sub-halos are halos that have become subsumed within another virialized halo. In the typical terminology of SAMs, sub-halos are said to host “satellite” galaxies. Sub-halos are tidally stripped as they orbit within their host halo. They may be tidally destroyed before they merge, or they may merge with the central galaxy or with another satellite. The SAMs used in this study treat sub-halos (which host satellite galaxies) in different ways. The ROCKSTAR catalogs provide merger trees for sub-halos as well as distinct halos. However, as with any simulation, the ability to explicitly track the evolution of sub-halos is limited by the mass and force resolution of the simulation (van den Bosch et al. 2018; van den Bosch & Ogiya 2018). Moreover, the presence of baryons can affect the timescale for tidal stripping and destruction of satellites (e.g. Garrison-Kimmel et al. 2017), yet these effects are not accounted for self-consistently as our merger trees are based on dark-matter only simulations. Sub-halos that can no longer be identified in the N-body outputs, but which may still have surviving satellite galaxies associated with them, are commonly referred to as “orphans”. Many SAMs utilize semi-analytic recipes to continue to track the evolution of orphans until they merge or are tidally destroyed. For a detailed discussion of these issues, and a state of the art semi-analytic treatment of sub-halo evolution, see Jiang et al. (2020).

The Santa Cruz SAM treats all satellite galaxies as “orphans” from the time that they enter the host halo. A modified version of the Chandrasekhar equation, which tracks the loss of orbital angular momentum due to dynamical friction against the dark matter halo, is used to estimate the radial distance of the satellite from the center of the host halo as a function of time (Boylan-Kolchin et al. 2008). As the satellite orbits, a fixed amount of its mass is stripped off in each orbit, following Taylor & Babul (2001). If the sub-halo’s mass drops below M(< f_strip r_s), where f_strip is an adjustable parameter and r_s is the Navarro-Frenk-White (Navarro et al. 1996) scale radius, then the sub-halo is considered tidally destroyed. Its stars are added to the “diffuse stellar halo” and its cold gas is added to the hot gas reservoir. If the satellite survives until it reaches the center of the halo, then the satellite is merged with the central galaxy (satellites are not allowed to merge with other satellites).
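The tidal destruction criterion described above can be illustrated with a short Python sketch. This is a toy version under stated assumptions, not the Santa Cruz SAM code: it assumes that the stripping removes a fixed fraction of the current mass per orbit, that the destruction threshold M(< f_strip r_s) is evaluated on the sub-halo's NFW profile at infall, and all function names and numerical values are hypothetical.

```python
import math


def nfw_mu(x):
    """Dimensionless NFW mass profile: ln(1 + x) - x / (1 + x)."""
    return math.log(1.0 + x) - x / (1.0 + x)


def nfw_enclosed_mass(r, m_vir, r_vir, concentration):
    """Mass enclosed within radius r for an NFW halo of virial mass m_vir."""
    r_s = r_vir / concentration  # NFW scale radius
    return m_vir * nfw_mu(r / r_s) / nfw_mu(concentration)


def orbits_until_destroyed(m_vir, r_vir, concentration, f_strip,
                           strip_fraction, max_orbits):
    """Count orbits until the sub-halo mass drops below M(< f_strip * r_s).

    Returns the orbit number at which the sub-halo is considered tidally
    destroyed, or None if it survives for max_orbits orbits.
    """
    r_s = r_vir / concentration
    m_destroy = nfw_enclosed_mass(f_strip * r_s, m_vir, r_vir, concentration)
    mass = m_vir
    for orbit in range(1, max_orbits + 1):
        mass *= 1.0 - strip_fraction  # assumed: fixed fraction lost per orbit
        if mass < m_destroy:
            return orbit
    return None


# Illustrative values only (mass and radius units are arbitrary here).
print(orbits_until_destroyed(m_vir=1e11, r_vir=100.0, concentration=10.0,
                             f_strip=1.0, strip_fraction=0.3, max_orbits=50))
```

In the model itself a satellite may instead reach the center of the host halo first, in which case it merges with the central galaxy; that branch is omitted from this sketch.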
The details of the treatment of mergers are described in S08 and L14.

The Lu SAM uses the sub-halo information from the N-body catalogs to follow the satellite population for as long as the sub-halo can be resolved. When the sub-halo disappears from the N-body merger tree catalog, its properties when it was last identified are used in a formula that computes the dynamical friction time using the Chandrasekhar formula as given in Binney & Tremaine (1987). The orphan satellite is assumed to merge with the central galaxy after this time has elapsed. Tidal stripping and destruction of orphan satellites is not accounted for. See L14 section A.7.3 for details.

Each semi-analytic model produces a prediction for the joint distribution of ages and metallicities in each galaxy along the lightcone at its observation time. These are obtained from the star formation and chemical enrichment histories of all progenitors that have merged into that galaxy by the output time. These age-metallicity distributions are then convolved with stellar population synthesis models to obtain intrinsic (non-dust-attenuated) spectral energy distributions (SED) which may be convolved with any desired filter response functions. Both SAMs use the stellar population synthesis models of Bruzual & Charlot (2003, BC03) with the Padova 1994 isochrones and a Chabrier IMF. Note that the synthetic SEDs do not currently include nebular emission.

If we write the mass of stars formed in all progenitors of a given galaxy with ages between $t$ and $t + dt$ and metallicities between $Z$ and $Z + dZ$ as $\Psi(t, Z)\, dt\, dZ$, then the SED of the galaxy is obtained by summing the “simple stellar population” components provided by BC03 over all ages and metallicities:

$$F_\lambda(t_{\rm obs}) = \int_{0}^{t_{\rm obs}} \int_{Z_{\rm min}}^{Z_{\rm max}} T_{\rm dust}(\lambda)\, \Psi(t, Z)\, S_\lambda(t, Z)\, dt\, dZ$$

where in practice, the SSPs are provided at a set of discrete ages and metallicities (196 ages and 6 metallicities, in the case of the BC03 models) so the integral is actually a sum. The timestep in the SAM is chosen such that the time binning is at least as fine as that in $S_\lambda(t, Z)$ at any point.

Dust attenuation is included through the term $T_{\rm dust}(\lambda)$, which is given by $T_{\rm dust}(\lambda) = 10^{-0.4 A_V k_\lambda}$, where $A_V$ is the attenuation in the rest V-band and $k_\lambda$ is the attenuation as a function of wavelength relative to the V-band.

We model the rest V-band optical depth using the expressions:

$$N_H = m_{\rm cold} / (r_{\rm gas})^2$$

$$\tau_{V,0} = f_{\rm dust}(z)\, \tau_{\rm dust,0}\, (Z_{\rm cold})^{\alpha_{\rm dust}}\, (N_H)^{\beta_{\rm dust}}$$

where $\tau_{\rm dust,0}$, $\alpha_{\rm dust}$, and $\beta_{\rm dust}$ are free parameters, $Z_{\rm cold}$ is the metallicity of the cold gas, $m_{\rm cold}$ is the mass of the cold gas in the disc, and $r_{\rm gas}$ is the radius of the cold gas disc, which is assumed to be a fixed multiple of the stellar scale length (see S08). We adopt τ_dust,0 = 0 . α_dust = 0.4, and β_dust = 1 . τ_V,0 results in attenuation that is too strong at high redshift. As a result, we adopt an empirical redshift dependent functional form for τ_V,0. For z < . 5, we adopt the redshift dependent correction factor

$$f_{\rm dust}(z) = (1 + z)^{\gamma_{\rm dust}} \qquad (1)$$

and for z > . 5, we adopt the expression given in Section 2.4 of Yung et al. (2019a). This empirical relation was adjusted by hand to achieve a reasonable “by-eye” match to the observed rest-frame UV, B and V-band luminosity function from z ∼ z > ∼ V-band for a galaxy with inclination $i$ is given by:

$$A_V = -2.5 \log_{10} \left[ \frac{1 - \exp[-\tau_{V,0} / \cos(i)]}{\tau_{V,0} / \cos(i)} \right]. \qquad (2)$$

For $k_\lambda$, we adopt the starburst attenuation curve of Calzetti et al. (2000). We have also experimented with a two-component (cirrus plus birthcloud) model for the attenuation, as presented in S12. However, we found that the simpler Calzetti attenuation curve does a better job of reproducing the colors of observed CANDELS galaxies over the whole redshift range that we study here.

Dust emission modeling is included in the SC SAMs using the same approach described in S12, but adopting the Chary & Elbaz (2001) emission templates.

Table 2. Summary of recalibrated SC SAM parameters.

Parameter   Description                        Value
ε_SN        SN feedback efficiency             1.7
α_rh        SN feedback slope                  3.0
V_eject     halo gas ejection scale            130 km/s
τ_*,0       SF timescale normalization         1.0
y           Chemical yield (in solar units)    2.0
κ_AGN       Radio mode AGN feedback            3 . × −

All cosmological models of galaxy formation contain parameterized recipes, which are typically adjusted by tuning them to match a selected subset of observations. For a detailed summary of the tunable parameters in the three SAMs presented here, and the approach used to tune them, please see L14. Some parameters were re-tuned relative to the values used in L14, due to the change in cosmological parameters from the original Bolshoi simulations (used in L14) to BolshoiP. Table 2 provides a summary of parameters for the Santa Cruz SAM that have different values from those specified in Somerville et al. (2015). Please see Somerville et al. (2015) for a full description of the parameters, and Table 1 in that work for a complete table of parameter values. The observations used for the calibration and the results of the calibration comparison are shown in Yung et al. (2019a) Appendix B. The calibration quantities include the stellar mass function, the stellar mass vs. cold gas fraction, stellar mass vs. metallicity relation, and the bulge mass vs. black hole mass relation.
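The dust attenuation recipe given earlier in this section (the column density, the face-on V-band optical depth with its redshift correction of Eq. 1, the inclination-dependent attenuation of Eq. 2, and the transmission factor T_dust) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the inputs are in arbitrary units, and the parameter values in the example call are placeholders rather than the calibrated values adopted for the mock catalogs.

```python
import math


def f_dust(z, gamma_dust):
    """Redshift-dependent correction factor, f_dust(z) = (1 + z)**gamma_dust."""
    return (1.0 + z) ** gamma_dust


def tau_v0(m_cold, r_gas, z_cold, z, tau_dust0, alpha_dust, beta_dust,
           gamma_dust):
    """Face-on V-band optical depth from the cold gas column density."""
    n_h = m_cold / r_gas ** 2  # cold gas column density
    return (f_dust(z, gamma_dust) * tau_dust0
            * z_cold ** alpha_dust * n_h ** beta_dust)


def attenuation_v(tau, inclination):
    """V-band attenuation in magnitudes for inclination in radians (Eq. 2)."""
    x = tau / math.cos(inclination)
    if x <= 0.0:
        return 0.0  # no dust along the line of sight
    # -expm1(-x) equals 1 - exp(-x) and stays accurate for small x.
    return -2.5 * math.log10(-math.expm1(-x) / x)


def transmission(a_v, k_lambda):
    """Fraction of light transmitted: T_dust = 10**(-0.4 * A_V * k_lambda)."""
    return 10.0 ** (-0.4 * a_v * k_lambda)


# Placeholder inputs (assumed, not the calibrated model values).
tau = tau_v0(m_cold=1.0, r_gas=1.0, z_cold=1.0, z=0.0,
             tau_dust0=0.2, alpha_dust=0.4, beta_dust=1.0, gamma_dust=-1.0)
a_v = attenuation_v(tau, inclination=math.radians(60.0))
print(tau, a_v, transmission(a_v, k_lambda=1.0))
```

With k_lambda = 1 the transmission applies to the V-band itself; in the full calculation k_lambda would be read from the adopted attenuation curve at each wavelength before the attenuated SSP components are summed.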

UniverseMachine

The UniverseMachine is an empirical model that connects galaxies’ star formation rates to their host haloes’ masses (M_h), accretion rates (Ṁ_h), and redshifts (Behroozi et al. 2019). Using an initial guess for the distribution of galaxy SFRs as a function of host halo properties (i.e., P(SFR | M_h, Ṁ_h, z)), it populates all haloes in a dark matter simulation with SFRs. These SFRs are then integrated along merger trees to obtain galaxy stellar masses and luminosities. The statistics of the resulting mock universe are compared to those from observations, including stellar mass functions ( z = 0 − z = 0 − z = 0 − z = 0 − z = 4 − z = 4 − z = 0 − z = 0). Comparing these observables results in a likelihood for the guess for P(SFR | M_h, Ṁ_h, z). This likelihood is given to a Markov Chain Monte Carlo algorithm to generate a new guess, and the process is repeated millions of times to obtain the posterior distribution of galaxy–halo connections that are consistent with all input observations.

The UniverseMachine attempts to forward-model to available observations as much as possible. This includes accounting for random and systematic errors in both stellar masses and SFRs, which can both rise to levels of ∼ . < z <

field     reference                 σ   aperture   depth (W/D/UD)     RA           DEC          effective area
                                        [arcsec]   [AB mag]           [degree]     [degree]     [arcmin²]
COSMOS    Nayyeri et al. (2017)     5   0.17       27.56              150.116321   +62.238572   216
EGS       Stefanon et al. (2017)    5   0.20       27.6               214.825000   +52.825000   198.6
GOODS-N   Barro et al. (2019)       5   0.17       27.8, 28.2, 28.7   189.228621   +62.238572   163.13
GOODS-S   Guo et al. (2013)         5   0.17       27.4, 28.2, 29.7   53.122751    -27.805089   159.36
UDS       Galametz et al. (2013)    1   1          27.9               34.406250    -5.2000000   195.58

Table 3. References and image characteristics for the published papers on the five observed CANDELS fields. The σ column indicates whether limiting magnitudes were computed at 5σ or 1σ, and aperture provides the aperture used to compute the limiting magnitude (see Equation 5). Depths are for the F160W image. The quoted effective areas are for the “wide” images.

other models in this paper, the UniverseMachine uses an orphan prescription to extend the lifetime of infalling satellites. Specifically, satellite lifetimes are extended until (or truncated after) their circular velocities reach ∼ . z = 0 −

The CANDELS survey is a 902-orbit legacy program which carried out imaging with the WFC3 camera on HST in five fields: COSMOS, EGS, GOODS-N, GOODS-S, and the UDS, over a combined area of about 0.22 deg². Each field has a different suite of ancillary imaging data from X-ray to radio from the ground and space, which have been incorporated into multi-wavelength catalogs and used to estimate photometric redshifts and physical properties such as stellar masses and star formation rates. Please see Grogin et al. (2011) and Koekemoer et al. (2011) for details of the survey design and basic image processing, and the five “field” papers (summarized in Table 3) for details on the catalog construction for each field. The CANDELS catalogs released by the team may be accessed at https://archive.stsci.edu/prepds/candels/, and an interactive web-based portal to some of the CANDELS catalog and image data is available at https://rainbowx.fis.ucm.es/Rainbow Database/Home.html.

We have created a curated version of the CANDELS high level science products, which have been designed to be easy to use for comparisons with theoretical models and simulations. The format and contents of the CANDELS “theory friendly catalogs” (TF-CANDELS) have been standardized and homogenized over all five fields, and the catalogs have had a standard set of flags and cuts applied.
Each theory friendly catalog contains a standardized set of observed frame and rest-frame photometry, along with redshifts, structural parameters (size and Sersic index), and multiple stellar mass and star formation rate estimates. In the original catalogs, the “value added” quantities such as photometric redshifts, structural parameters, and stellar masses are all in separate files which must be joined.

The photometric redshifts in the TF catalogs are the updated estimates from Kodra et al. (in prep), and these are used for all derived quantities in the TF-CANDELS catalogs that depend on redshift (e.g. absolute magnitudes, stellar masses, SFR). We have checked, however, that none of the results shown in this paper differ significantly from those that are obtained using the published team redshifts as documented in Dahlen et al. (2013).

Rest-frame absolute magnitudes were computed for the same filter response functions used to compute rest frame photometry in the mock catalogs. Rest-frame magnitudes were computed using the package EAZY (Brammer et al. 2010), with the details of the set-up and parameter file as specified in the TFCD (Appendix E). In addition, the TF-CANDELS catalogs provide alternate estimates of the absolute magnitudes computed using the zphot package (Fontana et al. 2000; Merlin et al. 2019), and of U − V and V − J colors computed using the SED-fitting method of Pacifici et al. (2012, hereafter P12). The TF-CANDELS catalogs also include stellar masses estimated using both the zphot and P12 approaches. For star formation rates (SFR), in addition to estimates based on zphot and P12, the catalogs also include the SFR estimates presented by Barro et al. (2019), which utilize either a combination of rest-UV and mid-IR photometry, for galaxies that are detected in the IR, or a dust-corrected estimate based on the rest-UV. A detailed comparison of how these derived quantities differ for the different methods is not in the scope of this paper; however, we do comment briefly on this issue in the discussion (Section 6).

(Footnote: Note that unlike all other magnitudes in the catalogs, these are in the Vega system.)

Figure 2. Total counts as a function of apparent magnitude in different observed frame filters as indicated by the labels in each panel, for the SC SAM mock catalogs compared with CANDELS. Solid dark blue lines show the SC SAM predictions with dust attenuation included, and dashed light blue lines show the SC SAM predictions without dust. The grey shaded region shows the range of values between the different CANDELS fields, and the black symbols show the median of the values in all four fields. No magnitude cuts or completeness corrections have been applied to either the models or observations, and the observed counts become incomplete at around magnitude 25.5 or 26. The SC SAM predictions match the observations well in the F435W, F606W, and F160W bands, but a bit less well in the redder K and IRAC bands.

The original files and catalog field names used to create each entry in the TF-CANDELS catalogs are specified in the TF-CANDELS Documentation (TFCD Appendix A; https://users.flatironinstitute.org/~rsomerville/Data Release/CANDELS/TFCD.pdf). We have selected a “representative” observed U-band and K-band filter for each field. The fields were observed with different telescopes and different instruments, so in practice the actual filters differ a bit from field to field. The details of the actual filters used for each field are provided in Appendix B of the TFCD.

Appendix C of the TFCD describes how we carried out the calculation of F160W limiting magnitude for each object in the TF-CANDELS catalogs. Using the F160W weight maps for each field, we computed the average RMS as $\langle {\rm RMS} \rangle = \sqrt{1 / \langle w_i \rangle}$, where $\langle w_i \rangle$ is the average weight over a 6x6 square of pixels surrounding the center of each galaxy. We then computed the limiting magnitude as

$$m_{\rm lim} = -2.5 \log_{10}\left( \sqrt{A}\, \langle {\rm RMS} \rangle \right) + z_p \qquad (3)$$

where A = 1 / (0 . / pixel) , and the zeropoint $z_p$ is given by

$$z_p = -2.5 \log_{10}({\rm PHOTFLAM}) - 5 \log_{10}({\rm PHOTPLAM}) - 2.408$$

CANDELS s − cm − − , and PHOTPLAM converts from ﬂux per unitwavelength f λ to ﬂux per unit frequency f ν . The values aretaken from the image headers. In the TF catalogs, the limitingmagnitude is deﬁned as the 1 σ limiting magnitude within anaperture with area 1 arcsec . Some of the CANDELS ﬁeldpapers have used other deﬁnitions, such as the 5 σ limitingmagnitude within a diﬀerent aperture (see Table 3). In orderto compute the limiting magnitude at 5 σ within some otheraperture, one can adopt: m lim = − . ( (cid:112) A (cid:104) RMS (cid:105) + z p (4)where A In this section we compare the predictions of the mockcatalogs with CANDELS observations in diﬀerent redshiftbins. We investigate quantities in ‘observational’ space suchas counts in the observed H (F160W) band as well asderived quantities such as rest-frame luminosity functionsand color-color diagrams. We further investigate compar-isons with physical properties derived from SED ﬁtting to theCANDELS data, such as stellar mass functions and SFR func-tions. For this analysis, we adopt a standard set of redshiftbins: 0 .

1, 0 .

5, 1 .

24, 1 .

72, 2 .

15, 2 .

57, 3 .

0. These have been cho-sen so that all except for the lowest redshift bin have roughlyequal comoving volume. In all of the results presented below,the model results are obtained by averaging over all eightrealizations of all ﬁve ﬁelds (40 lightcones in all, covering atotal area of 40 times 5362 arcmin , or (cid:39)

60 sq. deg.).The observational results shown in all ﬁgures to follow areobtained by averaging over the EGS, GOODS-S, GOODS-N, and UDS ﬁelds, and the shaded areas show the minimumand maximum values of the binned quantity in each bin fromﬁeld to ﬁeld. We omitted the COSMOS ﬁeld from this analy-sis because we found that the counts in redshift bins z > ∼ . < . Figure 2 shows the counts (number of objects per bin in ap-parent magnitude, per sq. arcminute on the sky) integratedover all redshifts, for a selection of the CANDELS observedframe ﬁlter bands. As the Lu mocks do not include IRACphotometry, we only show the comparison with the SC SAMshere — the Lu SAMs produce similar results in the F435W,F606W, F160W, and K bands. No magnitude cuts or com-pleteness corrections have been applied to either the mod-els or observations, and the observed counts become incom-plete at around magnitude 25.5 or 26. The SC SAM predic-tions match the observations well in the F435W, F606W, andF160W bands, but less well in the redder K and IRAC bands.This is in part because there is ﬂexibility to match the bluerbands by adjusting the dust correction. It may also reﬂect thepresence of an older stellar population in the model galaxiesthan is present in the real Universe, as we will discuss furtherbelow. This ﬁgure illustrates the potential to calibrate mod-els using multiband photometry instead of derived quantitiessuch as stellar masses.Figure 3 shows the counts in the HST F160W ﬁlter (whichis the detection ﬁlter for CANDELS) split into redshift bins.The model results are shown for the intrinsic ﬂuxes withoutaccounting for attenuation by dust, and including the modelfor dust attenuation described in Section 2.2.2. Overall, thereis very good agreement between the models and observationsonce dust attenuation is accounted for, for magnitudes wherethe CANDELS wide catalogs are highly complete (F160W < . 
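The statement that all but the lowest redshift bin enclose roughly equal comoving volume, and the quoted total mock area, can be checked with a short numerical integration. The sketch below is only illustrative: it assumes a flat ΛCDM cosmology with Ω_m = 0.307 and H_0 = 67.8 km/s/Mpc (Planck-like values chosen here, not quoted in this section), treats the listed redshifts as bin edges, and uses hypothetical function names.

```python
import math

# Assumed flat LCDM parameters (illustrative, Planck-like; not taken from this section).
H0 = 67.8            # Hubble constant in km/s/Mpc
OMEGA_M = 0.307      # matter density parameter
C_KMS = 299792.458   # speed of light in km/s


def comoving_distance(z, steps=2000):
    """Line-of-sight comoving distance in Mpc, integrated with Simpson's rule."""
    if z == 0:
        return 0.0
    h = z / steps  # steps must be even for Simpson's rule

    def inv_e(x):
        return 1.0 / math.sqrt(OMEGA_M * (1 + x) ** 3 + (1 - OMEGA_M))

    total = inv_e(0.0) + inv_e(z)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * inv_e(i * h)
    return (C_KMS / H0) * total * h / 3


def shell_volume(z_lo, z_hi, area_sq_deg):
    """Comoving volume (Mpc^3) between z_lo and z_hi over a given sky area."""
    full_sky_sq_deg = 4 * math.pi * (180 / math.pi) ** 2
    sky_fraction = area_sq_deg / full_sky_sq_deg
    d_lo, d_hi = comoving_distance(z_lo), comoving_distance(z_hi)
    return sky_fraction * 4 * math.pi / 3 * (d_hi ** 3 - d_lo ** 3)


EDGES = [0.1, 0.5, 1.24, 1.72, 2.15, 2.57, 3.0]
AREA_SQ_DEG = 40 * 5362 / 3600.0  # 40 lightcones of 5362 arcmin^2 each

volumes = [shell_volume(lo, hi, AREA_SQ_DEG)
           for lo, hi in zip(EDGES[:-1], EDGES[1:])]

if __name__ == "__main__":
    print(f"total area = {AREA_SQ_DEG:.1f} sq. deg.")
    for lo, hi, vol in zip(EDGES[:-1], EDGES[1:], volumes):
        print(f"{lo:4.2f} < z < {hi:4.2f}: {vol:.3e} Mpc^3")
```

Under these assumed parameters the five upper bins come out close to one another in volume, while the lowest bin is several times smaller, consistent with the description in the text.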
Figure 4 shows the binned histograms of rest-frame absolute magnitude in the rest-1500 Å band, for the CANDELS observations (using the EAZY-based estimate) and the SC SAM, in our standard redshift bins. We note that although we refer to these as “luminosity functions”, we have made no attempt to correct these for incompleteness as is generally done in the literature. We simply apply a cut of F160W < 25.5 […] expected to be highly complete at this magnitude limit. Similar comments apply to the V-band “luminosity functions”. Figure A1 shows the same comparison for the Lu SAMs. The SC SAMs show very good agreement with the bright end of the UV LF when dust attenuation modeling is included; and the models with an F160W < 25.5 […] are applied. In the SC SAMs, there is a small but significant excess of faint galaxies (fainter than $L_*$) relative to the CANDELS observations at redshifts $z \gtrsim$ […], but the agreement on the bright end is very good. The Lu SAMs show better agreement for faint galaxies, but show a possible deficit of bright galaxies at $z \gtrsim$ […]

Figure 3. Counts as a function of apparent observed frame F160W magnitude split into redshift bins, for the Lu (orange) and SC (blue) SAM mocks compared with CANDELS (black symbols show the mean over all four fields; dark gray shaded areas show the minimum and maximum value in each bin over the four fields). Light gray vertical shaded regions show approximately where the CANDELS Wide observations are expected to be incomplete. Dashed lines show the intrinsic counts before dust attenuation has been added to the model galaxies, and solid lines show the predictions including dust attenuation. The agreement between the predicted and observed counts is qualitatively good in all redshift bins.

Figure 4. Luminosity functions in the rest-frame UV (1500 Å) divided into redshift bins, for the SC SAM, compared with the corresponding distributions from CANDELS (black symbols show the mean over all fields; shaded areas show the minimum and maximum value in each bin over the four fields). Dashed lines show the intrinsic luminosity functions with no dust attenuation; solid lines show the model predictions with dust attenuation included. Dotted lines show the dust attenuated model predictions with a cut of F160W < 25.5.

For both results presented in this sub-section, a magnitude cut of F160W < 25.5 […] U − V and V − J colors, and the EAZY estimate for the rest-frame V-band magnitude.

Figure 6 shows the distribution of the number density of galaxies in the rest-V magnitude versus U − V color plane, in three redshift bins. The overall distributions in color-magnitude space appear similar between the models and observations, although some quantitative differences are apparent. The main population of galaxies in the models is shifted to the red relative to the observations, by as much as 0.5 dex. The discrepancy increases towards lower redshifts. The colors in the SC SAM are slightly redder than in the Lu SAM. The Lu SAM does not show as pronounced a trend between luminosity and color as the observations or the SC SAM. Both the Lu SAM and, to a lesser extent, the SC SAM, show a bimodal distribution of colors at faint luminosities. The population with redder colors is associated with satellite galaxies, and reflects a well-known tendency of SAMs to produce over-quenched low-mass satellite galaxies. However, even when comparing only central galaxies with the observed colors, there is a significant discrepancy in the sense of the SAMs producing galaxies with colors that are too red. This is due to the known tendency in these SAMs for low-mass galaxies to form too many stars at high redshift (and so to have too large an old stellar population) while being too inefficient at forming stars at low redshift (White et al. 2015). The effect of dust reddening on colors is also very uncertain.

Figure 7 shows the distribution in rest-frame U − V versus V − J color space for the CANDELS observational sample and the SC SAM and Lu SAM, in three redshift bins. This diagram is often used to identify and separate star forming galaxies and quiescent galaxies, where quiescent galaxies are expected to be located in the upper left-hand region of the plot, and a nominal dividing line is shown.
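As a rough illustration of how such a dividing line is applied, the sketch below classifies a galaxy as quiescent or star forming from its rest-frame U − V and V − J colors. The coefficients are the redshift-dependent cuts commonly attributed to Williams et al. (2009); they are assumed here for illustration, and the exact line drawn in the figure may differ. The function name is hypothetical, and the input colors are assumed to be in the same photometric system as the adopted cuts.

```python
def is_quiescent_uvj(u_minus_v, v_minus_j, z):
    """Return True if a galaxy lies in the nominal quiescent region of the UVJ plane.

    Assumed cuts (Williams et al. 2009 style): U-V > 1.3, V-J < 1.6, and a
    diagonal cut U-V > 0.88 * (V-J) + offset with a redshift-dependent offset.
    """
    if z < 0.5:
        offset = 0.69
    elif z < 1.0:
        offset = 0.59
    else:
        offset = 0.49  # quoted for 1 < z < 2; simply extrapolated beyond that here
    return (u_minus_v > 1.3
            and v_minus_j < 1.6
            and u_minus_v > 0.88 * v_minus_j + offset)
```

Applying the same function to observed and model colors makes it easy to see how many model galaxies move across the line when the colors are shifted, for example by a change in the dust treatment.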
Once again we can clearly see that the star forming population in the models is too red in U − V, but is in better agreement in V − J (which is less sensitive to stellar age). We also see that the standard dividing line between quiescent and star forming galaxies does not separate these populations very effectively in the models, perhaps reflecting shortcomings in the dust modeling, or differences in the ensemble of star formation histories. See Brennan et al. (2015) for a detailed analysis of the quiescent fraction in the Santa Cruz SAMs compared with CANDELS observations, and Pandya et al. (2017) for an analysis of the transition galaxy population and the quenching rate in the SC SAMs and CANDELS.

Figure 5. Luminosity functions in the rest-frame V-band divided into redshift bins for the SC SAM, compared with the corresponding distributions from CANDELS. Key is as in Fig. 4. The SC SAM predictions agree with the observed rest-frame V-band magnitude distributions fairly well in the regime where the observations are highly complete.

Figure 8 shows stellar mass functions for the CANDELS observations, using stellar mass estimates based on the P12 method, compared with stellar mass functions from the SC SAM, Lu SAM, and UniverseMachine. We note that UniverseMachine is an empirical model that was calibrated to match previous estimates of the galaxy stellar mass function over a wide range of redshifts (see Behroozi et al. 2019, for details), so this is an indirect way to compare the SAM predictions and the CANDELS measurements with previous estimates of the stellar mass function. It is important to note that the CANDELS observational “stellar mass functions” have not been corrected for incompleteness and only galaxies with F160W < 25.5 […] < 25.5, as in the observations. The turnover in the stellar mass function occurs at a similar mass in the SAMs and in the CANDELS observations. The UniverseMachine mass functions are in good agreement with the CANDELS estimates, except at the high mass end. This is due to errors on the stellar masses, which cause an Eddington bias that makes the high-mass end of the SMF shallower. This is illustrated in Figure A3, which shows the UniverseMachine predictions with stellar mass errors included as described in Behroozi et al. (2019). Here it can be seen that the stellar mass errors can have a significant effect on the high-mass end of the SMF, especially at high redshift. With the observational errors included, UniverseMachine is in near-perfect agreement with the CANDELS stellar mass distributions, as expected. However, estimating the error in stellar mass and how it depends on other galaxy properties in detail is highly non-trivial, so we do not include errors on the stellar masses in the SAM mock catalogs. The SC SAMs agree well with UniverseMachine intrinsic stellar masses at the high mass end, but systematically overproduce low-mass galaxies (below the knee in the SMF) at all redshifts, and to an increasing degree at high redshift. This is a well-known and widespread problem with many current models of galaxy formation, which is caused by too-early formation of stars in low-mass galaxies (see Somerville et al. 2015; White et al. 2015, for a detailed discussion). The Lu SAM shows better agreement with the abundance of low-mass galaxies, but still overproduces them in the highest redshift bins, and may underproduce massive galaxies at high redshift. However, it is impossible to rigorously assess the agreement at the high-mass end for both SAMs due to the uncertainty in the errors on the observational estimates of the stellar masses, as well as uncertainties due to field-to-field variance, which are only crudely indicated here.

Figure 6. The greyscale and overlaid contours show joint distributions of rest-frame U − V color versus rest V magnitude for the CANDELS observations (top) and the SC (middle) and Lu (bottom) SAMs, in three redshift bins. Both the observed and model galaxies are selected to have F160W < 25.5, where the CANDELS Wide samples are highly complete. Both SAMs predict colors for low-luminosity galaxies that are up to ∼ […]

Figure 9 shows star formation rate distribution functions for the CANDELS observations, using SFR estimates based on the method of P12, compared with SFR functions from the SC SAM, Lu SAM, and UniverseMachine (intrinsic values of SFR, without observational errors, are shown). The SC and Lu SAM SFR have been averaged over a timescale of 100 Myr, while the P12 SFR estimates are averaged over 10 Myr; however, we do not expect this to cause large differences. CANDELS observational SFR functions have not been corrected for incompleteness and only galaxies with F160W < 25.5 […] < 25.5, as in the observations. The amplitude and location of the “knee” of the SFR distribution function agree well between both SAMs, UniverseMachine, and the observations. The predictions of both SAMs and UniverseMachine are very similar for the high SFR part of the distribution. At $z \gtrsim$ […], both SAMs predict a higher amplitude and steeper distribution below the knee than UM and the CANDELS observations. However, the SFR distribution derived from CANDELS is significantly higher in amplitude above SFR values of ∼100 $M_\odot\,{\rm yr}^{-1}$ than any of the model predictions, by as much as several orders of magnitude. Fig. A4 again shows the SFR distribution for CANDELS and for UniverseMachine predictions with and without observational errors added. Again, the observational errors cause a small increase in the amplitude at the high SFR end, but based on the assumed magnitude of the errors on SFR from Behroozi et al. (2019), this cannot fully account for the discrepancy between the models and observations.

Fig. 10 shows the conditional distribution of SFR for a given stellar mass in several redshift bins, for CANDELS using stellar mass and SFR estimates from P12, and for both SAMs and for UniverseMachine without and with observational errors included. SFR-stellar mass sequence relations from the literature (Speagle et al. 2014; Iyer et al. 2018) are also overplotted. In the first two redshift bins shown, the CANDELS results are consistent with the literature sequence, while in the highest redshift bin, there is a population of massive galaxies that lies above the literature sequence. The median SFR at a given stellar mass is systematically lower than the literature relations and CANDELS in both SAMs, more so in the SC SAM, and has a steeper slope, such that low-mass galaxies lie below the observed SFR sequence. This is a further reflection of the same problem that caused the colors in the SAMs to be overly red, namely, SFR is too strongly suppressed in low-mass galaxies in the SAMs. Interestingly, UniverseMachine also shows a mild steepening in the SFR sequence at low mass, but to a much lesser extent than the SAMs. Observational estimates of complete samples of such low-mass galaxies are extremely challenging to obtain, but these results suggest it may be interesting to do so.

Figure 7. Distribution of galaxies in the U − V vs. V − J plane for the CANDELS observations (top row), the SC SAM (middle row), and the Lu SAM (bottom row). Both the observed and model galaxies are selected to have F160W < 25.5, where the CANDELS Wide samples are highly complete. The solid black line shows the region of this diagram that is typically associated with quiescent galaxies (Williams et al. 2009).

In order to try to interpret the very different conclusions we might reach from comparing the rest-UV luminosity function of galaxies (which shows excellent agreement between model predictions and observations) and the SFR function (which shows disagreement between the model predictions and observations at the level of multiple orders of magnitude), we examine the relationship between rest-UV luminosity and SFR in the CANDELS observations (using the P12 estimates of SFR) and in the semi-analytic models (where the rest-UV magnitude includes dust). Figure 11 shows this relationship, and reveals that the P12 SFR estimates in CANDELS are significantly higher for a given rest-UV magnitude than the predictions of the SAMs, especially at high redshift. This helps to reconcile the different conclusions that we might reach from comparing the observed and predicted rest-UV luminosity functions and SFR functions, but raises the question of why the relationship between rest-UV magnitude and SFR is so different. The main possibilities are the assumed/estimated dust attenuation and the star formation history. We investigate the former possibility in the next sub-section.

Dust attenuation is an important ingredient in forward modeling the semi-analytic models to the observational plane. By the same token, it is a critical ingredient in SED fitting methods used to estimate physical properties from galaxy photometry. As described in Section 2.2.2, the normalization of the relationship between dust optical depth in the V-band and galaxy properties such as gas surface density and metallicity in the SAMs has been adjusted empirically to match the observed UV, B, and V-band luminosity functions. Therefore, it is interesting to see how this quantity compares with the dust attenuation derived from SED fitting to the CANDELS observations using the method of P12. Figure 12 shows the attenuation in the rest V-band versus the (attenuated) rest V-band magnitude, in three redshift bins, for the SED-fitting derived results from CANDELS and for the SC and Lu SAMs. The medians of the distributions are quite similar. In the two higher redshift bins, the SAMs show a stronger trend between V-band magnitude and attenuation than the CANDELS estimates. Both SAMs assume a fixed dust attenuation curve, while the SED fitting procedure of P12 adopts a two component dust model.

Figure 8. Stellar mass functions divided into redshift bins, for the two SAMs and UniverseMachine, compared with stellar mass distribution functions derived from CANDELS. The CANDELS observations are shown for F160W < 25.5 […]

The mock lightcones that we have presented here enable a more detailed comparison with observations than has often been done in the past. We find extremely good agreement between the observed frame counts predicted by the SAM and observations for the F435W, F606W, F160W, and K bands, and less precise but still good ($\sim$ […]) […] $z \lesssim 1$. The agreement between the SAM predicted SMF and that derived from CANDELS via SED fitting is good but notably poorer quantitatively than for the luminosity function comparison. Even more dramatically, the SAM predictions for the SFR function show very large discrepancies with the CANDELS SFR functions derived from SED fitting using the method of P12 (up to several orders of magnitude), presenting a very different picture from that obtained through comparing the rest-UV luminosity functions, which are in excellent agreement as noted above. We show that, as expected, the SFR-$L_{\rm UV}$ relationships predicted by the SAM and derived in CANDELS via SED fitting are very different, likely reflecting different assumptions about dust attenuation and/or the galaxy star formation histories.

Although semi-analytic models of galaxy formation are known to reproduce many key observations, the current generation of models is also known to show some tensions with observations that have been discussed extensively in the literature. These tensions include 1) models tend to overproduce low-mass galaxies at intermediate redshifts $1 \lesssim z \lesssim$ […]; 2) […] underproduce massive galaxies at $z \gtrsim 1$; 3) model galaxies at $1 \lesssim z \lesssim$ […] lower than observational estimates. It can be seen in the compilation presented in figures 4 and 5 of Somerville & Davé (2015) that these discrepancies are common not only to most semi-analytic models but also to several large-volume hydrodynamic simulations. So far, these problems have been overcome only by explicitly tuning the models to match observational constraints at high redshift, as in Henriques et al. (2015). Taken together, these discrepancies suggest a picture in which star formation is not efficient enough in massive galaxies at intermediate redshifts ($1 \lesssim z \lesssim$ […]) […] too efficient in low-mass galaxies in this same redshift range. At the same time, star formation rates are too low in low mass galaxies at low redshift ($z \lesssim$ […]).

Figure 9. Star formation rate distribution functions divided into redshift bins, for the two SAMs and UniverseMachine, compared with SFR functions derived from CANDELS. The CANDELS observations are shown for F160W < 25.5 […] UniverseMachine predictions.

[…] estimates of these physical properties obtained from SED fitting to observations. This approach has many advantages, including greater ease of interpretation in terms of intuitive physical quantities, and greater ease in linking populations across different epochs. However, it is very important to keep in mind that these estimates still carry significant uncertainties (see e.g. Conroy 2013; Leja et al. 2017, 2019). Moreover, because of the complexity of the procedure used to obtain estimates of these derived quantities, it is very difficult to accurately and completely quantify their uncertainties, which are needed for a rigorous statistical assessment of the “goodness of fit” of any theoretical model. The error budget should include contributions from the systematic uncertainties in deriving physical quantities such as stellar mass or SFR from SED fitting, as well as errors due to photometric redshift errors, photometric noise, and field-to-field variance. This detailed error budget has not been computed for quantities commonly used to calibrate models, such as stellar mass functions and SFR densities.

Figure 10. Greyscale and overlaid contours show the conditional distribution of star formation rate for a given stellar mass, for CANDELS observations (top row), the two SAMs (SC; second row, and Lu; third row), and UniverseMachine with and without observational errors included (bottom two rows). Relations from the literature (Speagle et al. 2014; Iyer et al. 2018) are overplotted. In both SAMs, the predicted SFR are too low at fixed stellar mass, particularly for low-mass galaxies at $z \lesssim$ […]

We can get a first order sense for the possible systematic errors in the estimates of physical quantities by comparing the results from different methods. For stellar mass estimates using the zphot code compared with the P12 method, which incorporates a more sophisticated prior on star formation history, we find systematic differences between the two methods of […] systematic errors for the P12 estimates compared with the Barro et al. (2019) estimates of typically at least 0.3-0.5 dex up to 2 dex, and a scatter of 0.5-1 dex. Both the systematic and random differences show dependencies on stellar mass and redshift. The zphot-based SFR estimates show even larger differences compared with the Barro et al. (2019) SFR estimates, with different dependencies on stellar mass than the P12 SFR estimates. Clearly, SFR estimates from rest-UV and available IR photometry and from different SED fitting approaches still show very significant discrepancies, which must be better understood.

Figure 11. Greyscale and overlaid contours show the distribution of log SFR vs. rest-frame UV magnitude, for CANDELS observations (top row) and the two SAMs (SC; second row, and Lu; third row). The CANDELS SFR estimates at a given rest-UV magnitude are significantly higher than the predictions of the SAMs.

An alternative approach is to forward model the simulations into the observational plane. This of course requires additional modeling steps, and the inclusion of assumptions regarding additional ingredients such as stellar population models and dust. However, estimating physical properties from SED fitting also involves similar assumptions, and in some cases one has more information about the conditions in the simulated galaxies than one does for the real galaxies. An additional advantage to working in the observational plane is that it is much easier to include modeling of observational errors and selection effects in this plane. We advocate carrying out comparisons in both planes (theoretical and observational), as any differences in conclusions may illuminate problems.
One of the important results of this work is that a comparison between theoretical predictions and observations in the “theoretical plane” of stellar mass or SFR distribution functions versus the “observational plane” of rest-UV and V-band luminosity functions appears to yield quantitatively different assessments of the goodness of fit of models compared with observations.

Footnote: We note that it is currently impossible to make rigorous statements about model goodness of fit due to the unavailability of complete, accurate error budgets, as discussed above. It may be that if the uncertainties on the observational quantities being used as constraints were properly accounted for in both cases, this difference would not be present.

Figure 12. The greyscale and overlaid contours show the joint distribution of dust attenuation in the rest V-band and rest-V band magnitude, in different redshift bins, as estimated from SED fitting using the method of P12 (top row), and as added to the model galaxies in the SC (middle) and Lu (bottom row) SAMs. It is encouraging that the dust attenuation adopted in the SAMs is similar to the results from SED fitting.

One of our main long term goals is to work towards a full forward modeling pipeline for multi-wavelength galaxy surveys. Over the next decade, wide area surveys from DESI, VRO, Euclid, the Nancy Grace Roman Space Telescope, 4MOST, and other facilities will be carried out. We can use the legacy observations from surveys such as CANDELS to build a foundation for interpreting these new surveys. What we have shown here is that the current generation of semi-analytic models produces decent broad agreement with key properties of galaxy evolution as represented by CANDELS over the redshift range 0.[…] $\lesssim z \lesssim 3$. It has been shown elsewhere that these models produce similar results to those of numerical cosmological simulations and other semi-analytic models (Somerville & Davé 2015), and that they are also in agreement with higher redshift observations of galaxy populations (Yung et al. 2019a,b), the reionization history (Yung et al. 2020), and observational probes of the cold gas phase in galaxies (Popping et al. 2014, 2019). While there are certainly remaining tensions with observations, as seen here and also in e.g. Popping et al. (2019), there is promising ongoing work to continue to improve the realism of the treatment of physical processes in SAMs (e.g. Pandya et al. 2020). In work in progress, we are using this framework to create similar mock observations for future planned surveys with the James Webb Space Telescope and the Nancy Grace Roman Space Telescope (L. Y. A. Yung et al. in prep). SAMs coupled with lightcones extracted from large volume N-body simulations have recently been used to create a 2 sq. deg. lightcone from 0 < z < 10 (Yang et al. 2020, Yung et al. in prep). In order to create mock surveys for even larger areas (tens to hundreds of square degrees) that will be probed by the projects mentioned above, it is likely that new, even more computationally efficient techniques will need to be developed, perhaps enabled by machine learning based tools.

In this paper, we presented mock lightcones that were custom created to aid in the interpretation of observations from the CANDELS program. We populated these lightcones with galaxies using two different semi-analytic modeling codes, and the empirical model UniverseMachine. In addition, we presented specially curated “theory friendly” catalogs for the CANDELS observations, which include a selection of the observed and rest-frame photometry as well as estimates of physical galaxy properties such as redshift, stellar mass, and star formation rate. We make all data products available through a web-based data hub that allows users to preview and download the data.

We showed comparisons between the mock lightcones and the CANDELS observations for a selection of key quantities in the “observational plane”, including observed frame counts, rest-frame luminosity functions, color-magnitude and color-color distributions. We also compared our model predictions with physical quantities estimated via SED-fitting from the CANDELS photometry, such as stellar mass functions, SFR distribution functions, the stellar mass vs. SFR relation, and dust attenuation. Although there are some tensions between the theoretical predictions and the observations, we conclude that these mock catalogs reproduce the observational estimates accurately enough to be useful for interpreting current observations and making predictions for future ones.

ACKNOWLEDGEMENTS

We thank the anonymous referee for helpful comments that improved the manuscript. We thank the Flatiron Institute for providing computing resources and data access. We warmly thank Dylan Simon, Elizabeth Lovero, and Austen Gabrielpillai for building Flathub. We thank Yotam Cohen for useful comments on the Flathub datahub. RSS is supported by the Simons Foundation. CP is supported by the Canadian Space Agency under a contract with NRC Herzberg Astronomy and Astrophysics. This work makes use of observations taken by the CANDELS Multi-Cycle Treasury Program with the NASA/ESA HST, which is operated by the Association of Universities for Research in Astronomy, Inc., under NASA contract NAS5-26555. This work is based in part on observations made with the Spitzer Space Telescope, which was operated by the Jet Propulsion Laboratory, California Institute of Technology under a contract with NASA.

DATA AVAILABILITY

The data underlying this article are available from the Flatiron Institute Data Exploration and Comparison Hub (Flathub), at http://flathub.flatironinstitute.org/group/candels.

APPENDIX A: SUPPLEMENTARY FIGURES

In this appendix we show results that supplement those in the main text. Fig. A1 and A2 show the rest-UV and V-band luminosity functions for the Lu SAMs compared with the CANDELS observations. Fig. A3 and A4 show the stellar mass function and SFRF from UniverseMachine with observational errors included. For details please see the main text.
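The Eddington bias illustrated in Fig. A3 and A4 can be reproduced qualitatively with a toy calculation: convolving a steeply falling mass function with Gaussian errors in log mass scatters more low-mass objects upward than high-mass objects downward, which raises and flattens the high-mass end. The sketch below uses a Schechter function with arbitrary illustrative parameters and a constant 0.3 dex scatter. These values and the function names are assumptions made for this example; they are not the error model of Behroozi et al. (2019).

```python
import math


def schechter_logm(logm, log_mstar=10.8, alpha=-1.4, phi_star=1e-3):
    """Schechter function per dex in stellar mass (illustrative parameters)."""
    x = 10.0 ** (logm - log_mstar)
    return math.log(10.0) * phi_star * x ** (alpha + 1.0) * math.exp(-x)


def observed_smf(logm_obs, sigma_dex=0.3, n_steps=801, half_width=5.0):
    """Convolve the intrinsic SMF with a Gaussian error in log10(M).

    The integral over the true log mass runs over +/- half_width sigma
    around the observed value and uses the trapezoid rule.
    """
    lo = logm_obs - half_width * sigma_dex
    step = 2.0 * half_width * sigma_dex / (n_steps - 1)
    norm = 1.0 / (sigma_dex * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i in range(n_steps):
        logm_true = lo + i * step
        weight = 0.5 if i in (0, n_steps - 1) else 1.0
        kernel = norm * math.exp(-0.5 * ((logm_obs - logm_true) / sigma_dex) ** 2)
        total += weight * schechter_logm(logm_true) * kernel * step
    return total


if __name__ == "__main__":
    for logm in (9.5, 10.5, 11.5, 12.0):
        ratio = observed_smf(logm) / schechter_logm(logm)
        print(f"log M = {logm:4.1f}: observed / intrinsic = {ratio:.2f}")
```

In this toy model the ratio of observed to intrinsic abundance stays close to unity on the power-law part of the mass function and grows rapidly above the knee, which is the qualitative behavior described in the main text.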

REFERENCES

Barro G., et al., 2019, ApJS, 243, 22
Beckwith S. V. W., et al., 2006, AJ, 132, 1729
Behroozi P., Wechsler R. H., Hearin A. P., Conroy C., 2019, MNRAS, 488, 3143
Behroozi P., et al., 2020, arXiv e-prints, arXiv:2007.04988
Behroozi P. S., Wechsler R. H., Wu H.-Y., Busha M. T., Klypin A. A., Primack J. R., 2013a, ApJ, 763, 18
Behroozi P. S., Wechsler R. H., Wu H.-Y., Busha M. T., Klypin A. A., Primack J. R., 2013b, ApJ, 763, 18
Binney J., Tremaine S., 1987, Galactic Dynamics. Princeton University Press
Boylan-Kolchin M., Ma C.-P., Quataert E., 2008, MNRAS, 383, 93
Brammer G. B., van Dokkum P. G., Coppi P., 2010, EAZY: A Fast, Public Photometric Redshift Code
Brennan R., et al., 2015, MNRAS, 451, 2933
Bruzual G., Charlot S., 2003, MNRAS, 344, 1000
Bryan G. L., Norman M. L., 1998, ApJ, 495, 80
Calzetti D., Armus L., Bohlin R. C., Kinney A. L., Koornneef J., Storchi-Bergmann T., 2000, ApJ, 533, 682
Chabrier G., 2003, PASP, 115, 763
Chary R., Elbaz D., 2001, ApJ, 556, 562
Cirasuolo M., et al., 2007, MNRAS, 380, 585
Conroy C., 2013, ARA&A, 51, 393
Croton D. J., et al., 2006, MNRAS, 365, 11
Dahlen T., Mobasher B., Faber S. M., Ferguson H. C., Barro G., Finkelstein S. L., 2013, ApJ, 775, 93
Fontana A., D’Odorico S., Poli F., Giallongo E., Arnouts S., Cristiani S., Moorwood A., Saracco P., 2000, AJ, 120, 2206
Galametz A., et al., 2013, ApJS, 206, 10
Garrison-Kimmel S., et al., 2017, MNRAS, 471, 1709
Giavalisco M., et al., 2004, ApJ, 600, L93
Grogin N. A., et al., 2011, ApJS, 197, 35
Guo Y., et al., 2013, ApJS, 207, 24
Henriques B. M. B., White S. D. M., Thomas P. A., Angulo R., Guo Q., Lemson G., Springel V., Overzier R., 2015, MNRAS, 451, 2663
Iyer K., et al., 2018, ApJ, 866, 120
Jiang F., Dekel A., Freundlich J., van den Bosch F. C., Green S. B., Hopkins P. F., Benson A., Du X., 2020, arXiv e-prints, arXiv:2005.05974
Kim C.-G., Ostriker E. C., Fielding D. B., Smith M. C., Bryan G. L., Somerville R. S., Forbes J. C., Genel S., Hernquist L., 2020a, ApJ, 903, L34
Kim C.-G., Ostriker E. C., Somerville R. S., Bryan G. L., Fielding D. B., Forbes J. C., Hayward C. C., Hernquist L., Pandya V., 2020b, ApJ, 900, 61
Klypin A., Yepes G., Gottlöber S., Prada F., Hess S., 2016, MNRAS, 457, 4340
Koekemoer A. M., et al., 2011, ApJS, 197, 36
Leja J., Johnson B. D., Conroy C., van Dokkum P. G., Byler N., 2017, ApJ, 837, 170
Leja J., et al., 2019, ApJ, 877, 140
Lu Y., Mo H. J., Weinberg M. D., Katz N., 2011, MNRAS, 416, 1949
Lu Y., et al., 2014, ApJ, 795, 123
Merlin E., et al., 2019, MNRAS, 490, 3309
Mobasher B., et al., 2015, ApJ, 808, 101
Naab T., Ostriker J. P., 2017, ARA&A, 55, 59
Navarro J. F., Frenk C. S., White S. D. M., 1996, ApJ, 462, 563
Nayyeri H., et al., 2017, ApJS, 228, 7
Okamoto T., Gao L., Theuns T., 2008, MNRAS, 390, 920
Pacifici C., Charlot S., Blaizot J., Brinchmann J., 2012, MNRAS, 421, 2002
Pandya V., Somerville R. S., Anglés-Alcázar D., Hayward C. C., Bryan G. L., Fielding D. B., Forbes J. C., Burkhart B., Genel S., Hernquist L., Kim C.-G., Tonnesen S., Starkenburg T., 2020, arXiv e-prints, arXiv:2006.16317
Pandya V., et al., 2017, MNRAS, 472, 2054
Popping G., Somerville R. S., Trager S. C., 2014, MNRAS, 442, 2398
Popping G., et al., 2019, ApJ, 882, 137
Porter L. A., Somerville R. S., Primack J. R., Johansson P. H., 2014, MNRAS, 444, 942
Rodríguez-Puebla A., Behroozi P., Primack J., Klypin A., Lee C., Hellinger D., 2016, MNRAS, 462, 893
Santini P., et al., 2015, ApJ, 801, 97
Scoville N., et al., 2007, ApJS, 172, 1
Somerville R. S., Davé R., 2015, ARA&A, 53, 31
Somerville R. S., Gilmore R. C., Primack J. R., Domínguez A., 2012, MNRAS, 423, 1992

Figure A1.

Luminosity functions in the rest-frame UV (1500 Å) divided into redshift bins, for the Lu-SAM, compared with the corresponding distribution from CANDELS (black symbols show the mean over all fields; shaded areas show the minimum and maximum value in each bin over the four fields). Dashed lines show the intrinsic luminosity functions with no dust attenuation; solid lines show the model predictions with dust attenuation included. Dotted lines show the dust-attenuated model predictions with a cut in F160W applied.

Figure A2.

Luminosity functions in the rest-frame V-band divided into redshift bins, for the Lu-SAM and CANDELS observations. Key is as in Fig. A1. The Lu-SAM predictions agree fairly well with the observed rest-frame V-band magnitude distributions in the regime where the observations are highly complete.

Figure A3.

Stellar mass functions divided into redshift bins, for UniverseMachine, compared with stellar mass distribution functions derived from CANDELS. Solid lines show the results for the intrinsic (error-free) stellar mass predictions in UniverseMachine, while dot-dashed lines show the predictions after modeling the expected errors on the stellar masses. Errors in stellar mass estimates can be significant, especially at high redshift, and lead to an Eddington bias that impacts the high-mass end of the distribution.
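The Eddington bias mentioned in this caption can be illustrated with a toy calculation. The sketch below is not the error model used for the mock catalogs: the exponential form of the intrinsic distribution, the 0.3 dex scatter, the mass threshold, and the function name `eddington_bias_demo` are all assumptions chosen only to show the effect. Because there are many more low-mass than high-mass galaxies, symmetric errors in log stellar mass scatter more objects upward across a high-mass threshold than downward.

```python
import numpy as np


def eddington_bias_demo(n_gal=200_000, sigma_dex=0.3, log_m_cut=11.0, seed=0):
    """Count galaxies above a mass cut before and after adding mass errors.

    All parameters are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    # Toy intrinsic distribution: number density falls exponentially with
    # log10(M*) above 9.0 (hypothetical; not the UniverseMachine prediction).
    log_m_true = 9.0 + rng.exponential(scale=0.4, size=n_gal)
    # Model measurement errors as symmetric Gaussian scatter in log10(M*).
    log_m_obs = log_m_true + rng.normal(0.0, sigma_dex, size=n_gal)
    n_true = int(np.sum(log_m_true > log_m_cut))
    n_obs = int(np.sum(log_m_obs > log_m_cut))
    return n_true, n_obs


if __name__ == "__main__":
    n_true, n_obs = eddington_bias_demo()
    print(f"intrinsic count above cut: {n_true}")
    print(f"count above cut with mass errors: {n_obs}")
```

In this toy setup the count above the threshold is larger once errors are added, which is the sense in which the dot-dashed curves would be expected to lie above the solid curves at the high-mass end.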

Figure A4.

Star formation rate functions divided into redshift bins, for UniverseMachine, compared with SFR functions derived from CANDELS. Solid lines show the results for the intrinsic (error-free) SFR predictions in UniverseMachine, while dot-dashed lines show the predictions after modeling the expected errors on the SFR estimates.