A Composite Likelihood Approach for Inference under Photometric Redshift Uncertainty

M. M. Rau, C. B. Morrison, S. J. Schmidt, S. Wilson, R. Mandelbaum, Y.-Y. Mao, for the LSST Dark Energy Science Collaboration

McWilliams Center for Cosmology, Department of Physics, Carnegie Mellon University, Pittsburgh, PA 15213, USA
Department of Astronomy, University of Washington, Box 351580, Seattle, WA 98195, USA
School of Computer Science and Statistics, Lloyd Institute, Trinity College, Dublin, Ireland
Department of Physics, University of California, Davis, CA 95616, USA
Department of Physics and Astronomy, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA

MNRAS, 1-22 (2015). Preprint 6 January 2021. Compiled using MNRAS LaTeX style file v3.0.
Accepted XXX. Received YYY; in original form ZZZ
ABSTRACT
Obtaining accurately calibrated redshift distributions of photometric samples is one of the great challenges in photometric surveys like LSST, Euclid, HSC, KiDS, and DES. We combine the redshift information from the galaxy photometry with constraints from two-point functions, utilizing cross-correlations with spatially overlapping spectroscopic samples. Our likelihood framework is designed to integrate directly into a typical large-scale structure and weak lensing analysis based on two-point functions. We discuss efficient and accurate inference techniques that allow us to scale the method to the large samples of galaxies to be expected in LSST. We consider statistical challenges like the parametrization of redshift systematics, discuss and evaluate techniques to regularize the sample redshift distributions, and investigate techniques that can help to detect and calibrate sources of systematic error using posterior predictive checks. We evaluate and forecast photometric redshift performance using data from the CosmoDC2 simulations, within which we mimic a DESI-like spectroscopic calibration sample for cross-correlations. Using a combination of spatial cross-correlations and photometry, we show that we can provide calibration of the mean of the sample redshift distribution to an accuracy consistent with the LSST-Y1 science requirements for weak lensing and large-scale structure probes.

Key words: keyword1 - keyword2 - keyword3
With ongoing and future large area photometric surveys like the Dark Energy Survey (DES; e.g., Abbott et al. 2018b), the Kilo-Degree Survey (KiDS; e.g., Hildebrandt et al. 2017), the Hyper Suprime-Cam survey (HSC; e.g., Aihara et al. 2018), the Rubin Observatory Legacy Survey of Space and Time (LSST; e.g., Ivezić et al. 2019), the Roman Space Telescope (e.g., Spergel et al. 2015) and Euclid (e.g., Laureijs et al. 2011), modern cosmology has entered the era of precision cosmology, where it becomes increasingly important to accurately account for sources of systematic bias and uncertainty (e.g., Mandelbaum 2018). Large area photometric surveys constrain cosmological parameters and the growth of structure using two-point statistics of galaxy and shear fields (see e.g. Hildebrandt et al. 2017; Uitert et al. 2017; Abbott et al. 2018a; Joudaki et al. 2018; Hikage et al. 2019; Heymans et al. 2020). Using only the broadband photometry of galaxies allows for a limited accuracy in the estimated redshifts. In photometric surveys, we therefore typically consider two-point statistics of density fields that have been projected along the line-of-sight, i.e., in the redshift direction. They are then compared with the corresponding weak lensing (WL) and large scale structure (LSS) theory predictions in a likelihood framework. These theory predictions have to account for the line-of-sight projection, and therefore depend on the redshift distributions of the galaxies in the sample, which have to be accurately modelled and calibrated (see e.g. Huterer et al. 2006; Hoyle et al. 2018; Tanaka et al. 2018; Hildebrandt et al. 2020; Joudaki et al. 2020).

A primary goal of large area photometric survey programs is to map the growth of structure and expansion history of the Universe, and thereby constrain the dark energy equation of state via the distance-redshift and growth-redshift relations (see e.g., Albrecht et al. 2006, p.
31), which both enter the WL and LSS modelling. Note that these fundamental relationships within our cosmological model are redshift dependent, as are some key sources of theoretical uncertainty, such as the galaxy-dark matter bias model (see e.g. Matarrese et al. 1997; Clerkin et al. 2015; Chang et al. 2016; Simon & Hilbert 2018; Prat et al. 2018). The inferred ensemble redshift distributions for samples of galaxies can therefore exhibit a degeneracy with cosmological or astrophysical parameters. Inaccurate distance, or redshift, measurements based on the photometry of the galaxies are therefore important modelling systematics in these surveys (e.g. Ma et al. 2006; Bernstein & Huterer 2010). We therefore must exploit all data sources that have the potential to break these degeneracies to perform efficient and accurate inference.

The two methods available to constrain the redshifts of galaxies in the absence of accurate spectroscopic measurements are "template fitting" methods and empirical methods that "learn" the mapping between photometry and redshift (for a recent review, see Salvato et al. 2019). SED fitting methods fit the galaxy photometry with models of the galaxy spectral energy distribution (SED; e.g., Arnouts et al. 1999; Benítez 2000; Ilbert et al. 2006; Feldmann et al. 2006; Greisel et al. 2015; Leistedt et al. 2016; Malz & Hogg 2020). Machine learning-based methods infer photometric redshifts by constructing a density estimate for the conditional distribution of the galaxy redshifts given their photometry (Tagliaferri et al. 2003; Collister & Lahav 2004; Gerdes et al. 2010; Carrasco Kind & Brunner 2013; Bonnett 2015; Rau et al. 2015; Hoyle 2016). Combinations of both these techniques have also been investigated (Speagle & Eisenstein 2015; Hoyle et al. 2015). Unfortunately the accuracy of these techniques is limited, since they suffer from different sources of systematic error.
Template fitting approaches can be systematically biased if fits are constructed using sets of spectral energy distributions that are not representative of all galaxies in the sample. In contrast, photometric redshift techniques that require a training set can produce systematically biased results due to incomplete spectroscopic training samples. It is particularly difficult to obtain representative spectroscopic data due to the long exposure times that are necessary to obtain accurate spectroscopic redshifts for faint sources (see e.g. Huterer et al. 2014; Newman et al. 2015).

Instead of inferring photometric redshifts by fitting models for the spectral energy distribution, we can also infer photometric redshift information using spatial cross-correlations between photometric samples and spectroscopic samples (e.g. Newman 2008; Ménard et al. 2013; McQuinn & White 2013; Scottez et al. 2016; Raccanelli et al. 2017; Morrison et al. 2017; Davis et al. 2017; Gatti et al. 2018). Cross-correlation methods measure the spatial cross-correlation between a reference sample with accurate redshift information, typically a spectroscopic galaxy catalog, and photometric samples that do not have accurate redshift information. Ignoring cosmic magnification effects (see e.g. Scranton et al. 2005), the expected spatial cross-correlation is only nonzero for samples at the same redshift. By cross-correlating subsamples of the spectroscopic sample that are selected in thin redshift slices with these photometric catalogs and comparing the resulting signals, we can reconstruct the redshift distribution of the unknown photometric sample.

It is important to highlight the different sources of systematic uncertainty in these two approaches: the measurement of spatial cross-correlations requires that the sample with unknown redshift information and the reference sample overlap spatially and cover the same redshift range.
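The clustering-redshift idea sketched above can be illustrated with a deliberately idealized, noiseless toy in Python. All numbers below are invented, and the redshift-independent galaxy biases assumed here are exactly the quantities a real analysis must model with care:

```python
import numpy as np

# Idealized clustering-redshift toy: recover the redshift distribution
# n(z) of a photometric sample from its cross-correlation amplitudes
# with thin spectroscopic redshift slices, assuming constant bias.
z_grid = np.linspace(0.05, 1.95, 20)           # centers of thin spec-z slices
dz = z_grid[1] - z_grid[0]
true_nz = np.exp(-0.5 * ((z_grid - 0.8) / 0.3) ** 2)
true_nz /= true_nz.sum() * dz                  # normalize to unit integral

b_phot, b_spec, w_dm = 1.3, 1.8, 0.05          # constant biases, DM amplitude
w_cross = b_phot * b_spec * w_dm * true_nz     # noiseless signal per slice

# With redshift-independent prefactors, normalizing the cross-correlation
# amplitudes to unit integral recovers n(z) exactly.
nz_hat = w_cross / (w_cross.sum() * dz)
```

If either bias varies with redshift, the product of the two bias functions multiplies the estimate and is fully degenerate with n(z); this degeneracy is discussed in the text.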
However, the spectroscopic calibration sample does not have to cover the same color/magnitude space as the unknown photometric sample. It is, however, important to accurately model the redshift-dependent galaxy-dark matter bias of both the photometric sample and the spectroscopic calibration sample, since the redshift-dependent ratio between these two functions is completely degenerate with the photometric redshift distribution to be inferred. In contrast, template-based redshift inference requires a complete set of templates but no calibration sample. Checking a fitted model can also, in principle, use the color space alone, by comparing the photometry generated by the fitted templates with the measurements. In practice this approach has limitations. The generation of SED model templates is challenging and often requires spectroscopic reference data for some galaxies. Furthermore, degeneracies between galaxy type and galaxy redshift can make the aforementioned color-based approach ill-defined. Thus, while template fitting does not require spectroscopic data to infer redshifts of galaxies, in practice it is often necessary for building and evaluating models. Finally, empirical techniques that construct photometric redshift estimates by "learning" from a spectroscopic calibration dataset require reference data that does not have to spatially overlap, but needs to be representative in color-redshift space.

Besides spatial correlations of galaxy clustering, we can also use other two-point statistics from, e.g., weak gravitational lensing (e.g. Benjamin et al. 2013; Stölzner et al. 2020). There also exists a considerable literature on how photometric redshift uncertainty can be treated in the individual cosmological probes (McLeod et al. 2017; Hoyle & Rau 2019) or how one can combine template fitting and cross-correlation measurements (Alarcon et al. 2020b; Sánchez & Bernstein 2019; Jones & Heavens 2019; Rau et al. 2020). Shortly before this paper was submitted for publication, Myles et al.
(2020), Gatti et al. (2020) and Cawthon et al. (2020) presented the redshift inference scheme for the DES Y3 analyses, which combines a cross-correlation and shear-ratio data vector with redshift information derived using an empirical mapping of broad-band "Wide field" photometry to spatially smaller calibration fields with narrow-band photometric and spectroscopic redshift information.

This paper presents a composite likelihood approach to jointly constrain photometric redshift distributions using information from both the available photometry and the clustering of galaxies. We focus on statistical challenges in this inference. In particular, the parts of the model that utilize the photometry of galaxies can pose computational challenges, since the likelihood depends on measurements of all galaxies in the sample. We therefore derive an efficient methodology that facilitates inference of redshift distributions within this computationally expensive part of the model. Redshift inference based on noisy photometry is an inverse problem, and the inference scheme requires careful regularization to achieve good probability coverage. We therefore describe several regularization techniques and evaluate their respective merits in numerical experiments. Information from the spatial distribution of galaxies can then be incorporated within the composite likelihood framework by efficient MCMC sampling. We test our methodology using data from the CosmoDC2 (Korytov et al. 2019) simulated extragalactic catalog. While some of the inference techniques developed in this paper can also be used in the context of an empirical mapping to a small-area calibration field, our primary goal is to facilitate inference using physical SED modelling that utilizes a likelihood that jointly describes photometry and spatial information for all observed galaxies.
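Structurally, a composite likelihood of this kind multiplies (adds, in log space) a per-galaxy photometric term and a clustering term that share the same redshift-distribution parameters. The following sketch uses synthetic numbers and invented variable names; it illustrates only the structure, not the paper's actual likelihoods:

```python
import numpy as np

# Structural sketch of a composite log-likelihood: a histogram
# parametrization pi of n(z) enters both a photometric term (a mixture
# over redshift bins for each galaxy) and a clustering term (a Gaussian
# likelihood for a cross-correlation data vector). All inputs are mock.
rng = np.random.default_rng(0)
n_bins, n_gal = 5, 1000
# p_phot[i, k]: likelihood of galaxy i's photometry if it lies in bin k
p_phot = rng.random((n_gal, n_bins))
w_data = rng.random(n_bins)                 # mock cross-correlation vector
cov = np.eye(n_bins) * 0.01                 # mock measurement covariance

def log_like_photometric(pi):
    # marginalize each galaxy's bin membership over the mixture weights pi
    return np.sum(np.log(p_phot @ pi))

def log_like_clustering(pi, amp=1.0):
    # toy model: w(z_k) = amp * pi_k (constant-bias approximation)
    r = w_data - amp * pi
    return -0.5 * r @ np.linalg.solve(cov, r)

def composite_log_like(pi):
    # both terms constrain the same parameters pi
    return log_like_photometric(pi) + log_like_clustering(pi)

value = composite_log_like(np.full(n_bins, 1.0 / n_bins))
```

In an MCMC analysis, `composite_log_like` is the quantity that would be sampled over the (suitably regularized) weights `pi`.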
Inference under spatial variations in photometry or redshift information will be addressed in the course of the paper and in § 10.

The paper is structured as follows: § 2 describes the simulated galaxy samples used in this work, while § 3 gives a brief introduction into inverse problems and deconvolution by discussing a simple toy model for photometric redshift inference. The following sections describe our inference methodology in detail: § 4 starts with a description of the photometric likelihood, where we also discuss several regularization schemes, and § 5 formulates the cross-correlation likelihood. Both of these parts are then combined in a composite likelihood framework in § 6. § 7 discusses aspects of model evaluation and parametrization of systematics. We then apply our methodology to the simulated data in § 8. § 9 summarizes our findings. § 10 closes the paper with a discussion of future work.

We use data from the CosmoDC2 simulated extragalactic catalog (Korytov et al. 2019) in this work. CosmoDC2 is a mock extragalactic catalog based on a trillion-particle N-body simulation with a box size of 4.
225 Gpc, the "Outer Rim" run (Heitmann et al. 2019). The simulated catalog covers 440 deg² of sky area and spans a redshift range 0 < z ≤
3. Galaxies are assigned to the halo catalog and supplemented with additional galaxies based on the assumption of a power-law extrapolation of the sub-halo mass function at lower masses. The resulting catalog exhibits a number count slope consistent with that of the Hyper Suprime-Cam Deep survey (Aihara et al. 2018) down to an r-band magnitude of r ∼
28, well beyond the apparent magnitudes that will be utilized in this paper. The galaxy catalog uses a combination of empirical and semi-analytic modelling, utilizing the Galacticus (Benson 2012) and GalSampler codes (Hearin et al. 2020). For more details on the catalog generation and properties we refer the reader to Korytov et al. (2019).

In § 2.1 we will describe the particular selection of photometric data and the photometric redshift catalog used in this work. § 2.2 describes the generation of the reference spectroscopic sample.
The photometric sample consists of mock galaxies from the LSST-DESC "CosmoDC2" synthetic sky catalog (Korytov et al. 2019). The catalogs do not contain stars or AGN, so star-galaxy separation and non-thermal contamination are not an issue in this data set. Observations consist of magnitudes in the six ugrizy
Rubin Observatory filters. Simulated photometric errors were added to the six bands using a simple model designed to match the expected photometric S/N due to depth, seeing, airmass, and sky brightness at the completion of the full 10-year Wide Fast Deep survey (Ivezić et al. 2019). All galaxies are assumed to be isolated, i.e. blending effects are not modeled. We restrict the sample to galaxies brighter than an i_LSST-band magnitude limit corresponding to an i_LSST-band signal-to-noise ratio (S/N) of ∼20. We make this cut because redshift estimates for lower-S/N objects degrade rapidly below this level. We additionally reserve a small set of galaxies as redshift calibration data.

Template Fitting Redshifts
We use the publicly available Bayesian photometric redshift code BPZ (Benítez 2000) to compute redshift estimates for our simulated galaxies. BPZ is a template-based redshift estimation code that estimates redshift by computing model fluxes from a set of template SEDs and evaluating the resulting χ² when compared to observed fluxes. BPZ includes the optional application of a bivariate Bayesian prior over the joint distribution of type/SED and apparent magnitude in the redshift estimation, though we do not employ the prior in this investigation.

To construct a template set we begin with the empirical SED catalog of Brown et al. (2014). We then use the ESP software package (Kalmbach & Connolly 2017), which constructs a principal component basis set from the empirical SEDs and uses photometric training data to construct the final SED templates via Gaussian Processes. The final template set used by BPZ consists of the 129 empirical templates and 100 additional templates output from ESP. These templates roughly, but not perfectly, span the observed range of colors for the sample. We compute the likelihoods for all SEDs by comparing the observed fluxes to model fluxes evaluated on a grid of redshift spanning 0 < z <
3. The one-dimensional posterior distributions for each galaxy, marginalized over template type, comprise our final template fitting redshift estimates.
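The grid evaluation and marginalization described above can be sketched as follows. The model fluxes here are random placeholders standing in for redshifted template SEDs integrated through the filter curves, so this illustrates only the χ²-grid machinery, not BPZ itself:

```python
import numpy as np

# Sketch of grid-based template fitting: chi-squared of observed vs.
# model fluxes on a (template, redshift) grid, then marginalization
# over template type to get a 1D redshift posterior. Model fluxes are
# random placeholders; a real code redshifts template SEDs through the
# survey filter curves.
rng = np.random.default_rng(1)
n_templates, n_z, n_bands = 8, 300, 6
z_grid = np.linspace(0.0, 3.0, n_z)
model_flux = rng.random((n_templates, n_z, n_bands))     # placeholder grid
obs_flux = model_flux[3, 150] + rng.normal(0, 0.01, n_bands)
sigma = np.full(n_bands, 0.01)                           # flux uncertainties

chi2 = np.sum(((obs_flux - model_flux) / sigma) ** 2, axis=-1)
like = np.exp(-0.5 * (chi2 - chi2.min()))                # (template, z) grid
post_z = like.sum(axis=0)                                # flat prior over type
post_z /= post_z.sum()                                   # 1D posterior in z
```

Because the mock observation was generated from the grid point (template 3, z index 150), the marginalized posterior peaks there.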
Machine Learning-based Redshifts
We use the Python version of the publicly available FlexCode package (Izbicki & Lee 2017; available at https://github.com/tpospisi/flexcode), combined with the XGBoost algorithm (Chen & Guestrin 2016), to compute photometric redshifts; we will refer to this combination as FlexZBoost. FlexZBoost estimates the conditional density in redshift for each galaxy by fitting to an orthonormal set of basis functions (in this case cosines) via regression with XGBoost. To further refine the estimates, 25 per cent of the training data is reserved as a validation set to determine optimal values for trimming extraneous low-level peaks in the likelihood, and a "sharpening" parameter of the form p(z) ∝ p(z)^α that adjusts the overall width of the density estimates to best match the data. For this analysis we use 35 cosine basis functions, and a sharpening parameter, chosen via cross-validation, of 1.
4. Given the representative training data used in this experiment, we expect very accurate redshift estimates from the FlexZBoost algorithm.
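The peak-trimming and sharpening steps can be sketched on a gridded density estimate as follows. The function name is invented and the threshold value is illustrative, not the cross-validated one:

```python
import numpy as np

# Sketch of the two post-processing steps described above for a density
# estimate p(z) on a regular grid: trimming low-level spurious peaks
# below a relative threshold `delta`, and "sharpening" via
# p(z) -> p(z)**alpha (alpha > 1 narrows peaks), then renormalizing.
def trim_and_sharpen(pz, z_grid, delta=0.05, alpha=1.4):
    dz = z_grid[1] - z_grid[0]
    pz = np.where(pz < delta * pz.max(), 0.0, pz)   # drop low-level bumps
    pz = pz ** alpha                                # narrow the surviving peaks
    return pz / (pz.sum() * dz)                     # renormalize to a PDF

z = np.linspace(0.0, 3.0, 301)
# main peak at z = 1 plus a small spurious bump at z = 2.5
raw = np.exp(-0.5 * ((z - 1.0) / 0.1) ** 2) \
    + 0.02 * np.exp(-0.5 * ((z - 2.5) / 0.05) ** 2)
pz = trim_and_sharpen(raw, z)
```

With these settings the low-amplitude bump is removed entirely, while the dominant peak is narrowed and the result still integrates to one.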
The simulated reference spectroscopic sample is selected to mimic, in broad strokes, the sample selections of the Dark Energy Spectroscopic Instrument (DESI; DESI Collaboration et al. 2016, Zhou et al. 2020a, Zhou et al. 2020b). It consists of a set of four samples with increasing mean redshift: a magnitude-limited sample defined by an i_LSST-band cut; a Luminous Red Galaxy (LRG) sample; an Emission Line Galaxy (ELG) sample; and finally a high-redshift Quasar (QSO) sample. We show the redshift and i_LSST-band magnitude distributions of these subsamples in Fig. 1. The LRG, ELG, and QSO samples are selected such that their density per redshift matches that of the DESI samples (priv. comm. Rongpu Zhou and Jeffrey Newman). This sample is distinct from the redshift calibration data mentioned in the previous section.

We construct the magnitude-limited sample by imposing the i_LSST-band magnitude cut. To approximate the LRG, ELG and QSO galaxy samples, we use the values of the stellar mass, star formation rate, and black hole mass times Eddington ratio as proxies for objects that are LRG-, ELG-, and QSO-like respectively. Our goal with these samples is to select galaxies that will have differing bias properties and mimic the complexities of the DESI sample in this regard, while matching the density and signal-to-noise we would expect with a DESI-like sample. We thus use these simple truth quantities from the simulation rather than recreate the full color selection of a true, simulated DESI sample. The QSO, ELG, and
Figure 1. Left: Redshift probability density functions of the galaxy populations that constitute the DESI-like spectroscopic reference sample. Right: Corresponding i-band magnitude distributions.
LRG samples are all selected to be fainter than the i_LSST cut of the magnitude-limited sample, so that they are independent of it. Additionally, QSOs and ELGs are selected with upper limits in i_LSST-band magnitude and in redshift. We construct these selections in a ∼50 deg² test area in the CosmoDC2 simulations and apply them to the full 300 deg² area.

As we will see in detail in the following sections, the photometric redshift problem is a deconvolution problem, where the redshift distribution of a sample of photometrically observed galaxies is inferred from their noisy photometric measurements. To give the reader an intuitive understanding of deconvolution problems, we present a short introduction into the classical deconvolution problem. A similar description in the context of photometric redshift estimation can be found in Padmanabhan et al. (2005). We close this section by discussing the limitations of the toy model considered here and motivate the likelihood inference framework presented in the following.
Consider three vectors of random variables Z, Z_p and ε with dimension N_gal, which denotes the photometric sample size. Z and Z_p denote the true and photometric redshifts of the galaxies in the sample, and ε the residual error between both quantities. The additive noise model that connects these random variables is given as:

Z_p = Z + ε .    (1)

The probability densities associated with these random variables are:

z_i ∼ p_z    (2)
z_p,i ∼ p_pz    (3)
ε_i ∼ p_ε ,    (4)

where i ∈ {1, . . . , N_gal} and "∼" connects the realization of a random variable on the left-hand side with the probability density function (PDF) on the right-hand side from which this realization is drawn.

The random variable ε_i is assumed to be independently and identically distributed, as well as independent of z_i. These assumptions do not hold in the photometric redshift scenario, as the noise very clearly depends on the color, and therefore redshift, of the galaxy. However, in the following toy model, we adopt these assumptions for simplicity. The theory can be easily extended towards input-dependent noise (see e.g. Meister 2009) without changing the intuition presented in this section.

In order to derive an estimator for p_z, we use the convolution theorem that connects the PDF of the sum of independent random variables with the convolution of their densities. We can therefore write:

p_pz = p_z ∗ p_ε = ∫ p_z(z_p − z) p_ε(z) dz = ∫ p_ε(z_p − z) p_z(z) dz .    (5)

Note that the probability densities p_z and p_pz are both redshift distributions of samples of galaxies. They differ since (p_pz, p_z) denote the sample distributions of (photometric, true or spectroscopic) galaxy redshifts. Thus p_pz will be broader, since the error in the redshift, drawn from p_ε, is convolved with the true redshift distribution. A Fourier-based approach is not necessary; concretely, the likelihood framework presented in the following section works in real space.
A Fourier description for the classical deconvolution problem is, however, analytically tractable and provides a clear picture of the nature of the problem and the importance of regularization.
The Fourier transform of a probability distribution is the characteristic function. We will denote the characteristic functions of $(f_z, f_{p,z}, f_\epsilon)$ as $(f^{\rm ft}_z, f^{\rm ft}_{p,z}, f^{\rm ft}_\epsilon)$. Given a sample drawn from a PDF, e.g., the sample of photometric redshifts of $n_{\rm gal}$ galaxies, we can estimate $f^{\rm ft}_{p,z}$ as
$$\hat{f}^{\rm ft}_{p,z}(t) = \frac{1}{n_{\rm gal}} \sum_{j=1}^{n_{\rm gal}} \exp\left(i t z_{p,j}\right) \, . \quad (6)$$
The argument $t$ of the characteristic function could be interpreted as a kind of redshift-frequency if we treat the redshift of a galaxy as a 'time parameter'. Under the assumption of independence between $\mathbf{Z}$ and $\boldsymbol{\epsilon}$ we can write:
$$f^{\rm ft}_{p,z}(t) = f^{\rm ft}_z(t) \, f^{\rm ft}_\epsilon(t) \, . \quad (7)$$
Therefore an estimator for $f^{\rm ft}_z(t)$ is given as:
$$\hat{f}^{\rm ft}_z(t) = \frac{1}{f^{\rm ft}_\epsilon(t)} \frac{1}{n_{\rm gal}} \sum_{j=1}^{n_{\rm gal}} \exp\left(i t z_{p,j}\right) \, , \quad (8)$$
where we assume $f^{\rm ft}_\epsilon(t)$ is known and nonzero everywhere. We note that this estimator is consistent and unbiased (Meister 2009). The error term $f^{\rm ft}_\epsilon(t)$ here acts as a 'filter' to weight down small-scale modes in the distribution. However, we note that the term $1/f^{\rm ft}_\epsilon$ can become large when $f^{\rm ft}_\epsilon$ is small. As a consequence the inverse Fourier transform
$$\hat{f}_z(z) = \frac{1}{2\pi} \int \exp(-i t z) \, \hat{f}^{\rm ft}_z(t) \, \mathrm{d}t \, , \quad (9)$$
is neither integrable nor square integrable. Loosely speaking this implies that the parameter space that describes the shape of $f_z(z)$ does not have to be bounded. We will see this effect also for the more complex model considered in the later sections of this work.
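The estimator of Eq. (6) and the division by $f^{\rm ft}_\epsilon$ in Eq. (8) can be sketched numerically as follows; this is a toy example with an assumed Gaussian noise law and a hypothetical sample shape:

```python
import numpy as np

rng = np.random.default_rng(0)
n_gal, sigma_eps = 5000, 0.05
z_true = rng.gamma(4.0, 0.15, n_gal)
z_phot = z_true + rng.normal(0.0, sigma_eps, n_gal)

t = np.linspace(-40.0, 40.0, 401)

# Eq. (6): empirical characteristic function of the photometric redshifts.
ecf_phot = np.array([np.exp(1j * ti * z_phot).mean() for ti in t])

# Characteristic function of the (assumed known) Gaussian noise.
cf_eps = np.exp(-0.5 * sigma_eps**2 * t**2)

# Eq. (8): naive deconvolution estimator for f_z^ft. Dividing by cf_eps
# amplifies the estimator noise at large |t| -- the ill-posedness above.
cf_true_est = ecf_phot / cf_eps
```

At $t = 0$ any characteristic function equals one, while at large $|t|$ the amplification of noise by $1/f^{\rm ft}_\epsilon$ becomes visible in the magnitude of `cf_true_est`.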
We reiterate that while the estimator $\hat{f}^{\rm ft}_z$ has very desirable properties, the inverse transformation is not well defined, hence deconvolution problems are part of a larger class of 'inverse problems'. In order to obtain well-defined results, we therefore have to perform regularization, either by regulating the shape/parametrization of $f_{p,z}$ (e.g. using kernel methods), projecting $f_z$ onto a suitable basis like wavelet functions, or by directly restricting the $1/f^{\rm ft}_\epsilon$ term, as implemented in a Ridge method (e.g. Meister 2009, § 2.2.3). We will not discuss the details of these methods and refer to the literature for a more detailed explanation (e.g. Meister 2009). It is, however, instructive to study the functional form of one of these regularized estimators. Making the ansatz of a kernel density estimate for the photometric redshift PDF, one can show that the deconvolved density $\hat{f}_z$ can be estimated as (e.g. Meister 2009):
$$\hat{f}_z(z) = \frac{1}{2\pi} \int \exp(-i t z) \left( \frac{K^{\rm ft}(t b)}{f^{\rm ft}_\epsilon(t)} \right) \frac{1}{n_{\rm gal}} \sum_{j=1}^{n_{\rm gal}} \exp\left(i t z_{p,j}\right) \mathrm{d}t \, , \quad (10)$$
where $b$ denotes the bandwidth and $K^{\rm ft}$ the Fourier transform of the kernel function that enters the kernel density estimation ansatz for $f_{p,z}$. (Here, $\hat{\cdot}$ denotes an estimator for the respective function.) We see that by restricting the shape of the density $f_{p,z}$ to a kernel density estimate whose smoothness is governed by the parameter $b$, we regularize the $1/f^{\rm ft}_\epsilon(t)$ term by a multiplicative factor that renders the inverse Fourier transformation both integrable and square integrable, assuming a bounded, compactly supported and non-vanishing $K^{\rm ft}$. The bandwidth parameter governs the tradeoff between the bias, or 'smoothness', of the density estimate and its variance. Choosing a larger bandwidth washes out small-scale noise in the reconstructed density.
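A minimal numerical sketch of the regularized estimator in Eq. (10), assuming a sinc kernel whose Fourier transform is an indicator function on $|tb| \leq 1$ (one possible compactly supported choice, used here purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n_gal, sigma_eps = 5000, 0.05
z_true = rng.gamma(4.0, 0.15, n_gal)
z_phot = z_true + rng.normal(0.0, sigma_eps, n_gal)

b = 0.04                              # bandwidth: larger b -> smoother, more biased
t = np.linspace(-1.0 / b, 1.0 / b, 801)
dt = t[1] - t[0]

cf_eps = np.exp(-0.5 * sigma_eps**2 * t**2)
ecf_phot = np.array([np.exp(1j * ti * z_phot).mean() for ti in t])

# Eq. (10) with a sinc kernel: K^ft(tb) = 1 for |tb| <= 1, zero otherwise,
# so the kernel simply truncates the integral to |t| <= 1/b. The truncation
# can introduce mild ringing (small negative excursions) in f_hat.
z_grid = np.linspace(0.0, 2.0, 201)
f_hat = (np.exp(-1j * np.outer(z_grid, t)) @ (ecf_phot / cf_eps)).real * dt / (2.0 * np.pi)
```

Shrinking `b` widens the $t$ range and lets the $1/f^{\rm ft}_\epsilon$ amplification back in, reproducing the bias-variance tradeoff discussed above.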
In the limit of vanishing bandwidth, we would again obtain an ill-posed inverse Fourier transformation. In the following sections, we will apply regularization techniques that restrict the functional form of the redshift distribution, following a similar idea as presented in closed form in Eq. (10) for the classical deconvolution problem. We note that inverse problems like the classical deconvolution problem also appear in several other scenarios, like the measurement of shapes, where the point-spread function (PSF) convolves the galaxies' light profiles and leads to a loss of information. While the considered toy model of the photometric redshift problem is analytically tractable, it does not describe the realistic situation. Besides the relatively simple extension towards a galaxy-dependent photometric noise, the noise distribution $f_\epsilon$ is, in photometric redshift estimation, given as a joint likelihood between the photometry of all galaxies in the sample, which depends on the additional parameters that enter the SED modelling. The redshift of each galaxy is a parameter that enters its likelihood, and the sample redshift distribution is its prior. Furthermore, the model does not properly account for the spatial distribution of galaxies. The clustering of galaxies does not only constrain their redshift distribution, but connects to SED modelling, with nuisance parameters that describe, e.g., galaxy-dark matter bias. We structure the discussion of the likelihoods used in this work based on the following roadmap: we first describe our likelihood framework for the photometry of galaxies given a set of templates in § 4. We reiterate that, in contrast to the classical deconvolution problem, the 'error distribution' in the full photometric redshift problem is based on the joint photometric likelihood. We will therefore base our estimator and inference on this likelihood instead of the characteristic function.
Despite these methodological differences, we note that the necessity for regularization is the same as in the analytically tractable classical deconvolution problem. The considered regularization schemes are described in § 4.2. A particular challenge in the context of large-area photometric surveys is the necessity to scale the inference to a large number of galaxies. In Appendices § A and § B we derive an efficient inference framework based on the Laplace approximation that facilitates fast probabilistic deconvolution. We will use this deconvolution methodology in the following sections § 4 to § 8.

The spectral energy distributions of distant galaxies are a complex superposition of spectral components from their stellar populations. The SED of the galaxy can be uniquely mapped to a given redshift $z$, which allows us to predict the galaxy flux as a function of redshift in a given optical filter band $F(\lambda)$ by
$$f_i(z, \alpha) = \int F(\lambda) \, {\rm SED}_i(\lambda, z, \alpha) \, \mathrm{d}\lambda \, , \quad (11)$$
where ${\rm SED}_i(\lambda, z, \alpha)$ is the Spectral Energy Distribution template in units of ${\rm erg}/{\rm cm}^2/{\rm s}/{\rm \AA}$. The parameter $\alpha$ denotes additional free parameters in the SED template models, such as galaxy age, type, or red continuum slope. For a given set of photometric filters $F(\lambda)$ we obtain a mapping between the redshift $z$ of the galaxy and a vector of fluxes $\mathbf{f}$. We will denote this mapping as $\mathcal{T}(z, \alpha)$. Assuming that the measurements of photometry for different galaxies are independent, we can make the ansatz for the joint likelihood of fluxes of a galaxy sample $\hat{\mathbf{F}}$:
$$p(\hat{\mathbf{F}} \,|\, \mathbf{z}, \boldsymbol{\alpha}) = \prod_{i=1}^{n_{\rm gal}} \mathcal{N}\!\left(\hat{\mathbf{f}}_i \,|\, \mathcal{T}(z_i, \alpha_i), \boldsymbol{\Sigma}_i\right) \, . \quad (12)$$
Here, $\boldsymbol{\Sigma}_i$ denotes the measurement covariance matrix of the flux measurements $\hat{\mathbf{f}}_i$, and $\hat{\mathbf{F}}$ denotes the set of all flux measurements of the galaxies. We assume Gaussian uncertainties here, where $\mathcal{N}(x, \mu, \Sigma)$ denotes the normal distribution.
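The mapping of Eq. (11) and the Gaussian likelihood of Eq. (12) can be sketched with a toy filter curve and toy SED; all shapes and parameter values here are hypothetical illustrations, not the templates or filters used in this work:

```python
import numpy as np

# Toy setup: a Gaussian filter transmission curve and a simple rest-frame
# SED with one emission-line-like feature (both hypothetical).
lam = np.linspace(3000.0, 11000.0, 2000)            # wavelength grid [Angstrom]
filt = np.exp(-0.5 * ((lam - 6000.0) / 400.0)**2)   # filter F(lambda)

def sed_rest(lam_rest):
    continuum = np.exp(-lam_rest / 8000.0)
    line = 2.0 * np.exp(-0.5 * ((lam_rest - 4000.0) / 30.0)**2)
    return continuum + line

def model_flux(z):
    # Eq. (11): the redshifted SED integrated against the filter curve.
    sed_obs = sed_rest(lam / (1.0 + z))
    return np.sum(filt * sed_obs) * (lam[1] - lam[0])

def log_like(flux_obs, z, sigma):
    # Eq. (12): independent Gaussian flux likelihood, one band and one galaxy.
    r = flux_obs - model_flux(z)
    return -0.5 * (r / sigma)**2 - 0.5 * np.log(2.0 * np.pi * sigma**2)
```

In this toy, the rest-frame line at 4000 Å enters the filter near $z = 0.5$, so the predicted flux and the likelihood both peak there, which is exactly the redshift information the mapping $\mathcal{T}(z, \alpha)$ encodes.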
The parameter $\alpha$ can either be a galaxy-specific index that selects a certain template from a pre-specified number of models, or a physical property of galaxies. The prior on the parameters $z$ and $\alpha$ must account for their correlation. An example of a possible parametrization in the case of a galaxy-specific template index would be a two-dimensional histogram. However, other parametrizations are possible, especially if additional parameters that change the shape of the base templates are included in the template set. In this work, we will consider the simplest case, where we use a multidimensional histogram prior in which each histogram cell denotes a combination of redshift bin and discretized $\alpha$ parameter value, which for example could indicate a template selection. The histogram index $j$ runs over all histogram bins $\{j : 0 < j \leq N_{\rm tot}\}$, where $N_{\rm tot} = N_{\rm bins} \times N_{\rm parameters}$. The prior on the corresponding histogram heights, denoted as $n_{B,j}$ and corresponding to the interval $I_j$ in the $z$-$\alpha$ parameter space, reads:
$$p(z, \alpha) = \sum_{j=1}^{N_{\rm tot}} n_{B,j} \, [(z, \alpha) \in I_j] \, . \quad (13)$$
Here $[\gamma]$ denotes the Iverson bracket, which is (0, 1) if the proposition $\gamma$ is (false, true). We note that $\mathbf{n}_B$ parametrizes the joint distribution of redshift histograms and the $\alpha$ parameter. For simplicity we will in the following omit the marginalization over $\alpha$ and refer to $\mathbf{n}_B$ as the parameters of the sample redshift distribution. The reason is that in this paper we do not add additional parameters to parametrize the SEDs over which we need to marginalize. In applications like weak gravitational lensing and galaxy clustering we are mainly interested in estimating the redshift distribution of a sample of galaxies, here referred to as the base sample and parametrized by the vector $\mathbf{n}_B$. It is therefore useful to marginalize over the redshifts of individual galaxies. We note that if the posterior of individual galaxy redshifts is important, we can always post-sample using the final posterior on $\mathbf{n}_B$, based on Eq.
(34), which then also includes information from galaxy clustering. The posterior distribution of the sample redshift distribution given $\hat{\mathbf{F}}$ is then:
$$p(\mathbf{n}_B \,|\, \hat{\mathbf{F}}) \propto p(\mathbf{n}_B) \prod_{i=1}^{n_{\rm gal}} \int \mathrm{d}z_i \, p(z_i \,|\, \mathbf{n}_B) \, \mathcal{N}\!\left(\hat{\mathbf{f}}_i \,|\, \mathcal{T}(z_i), \boldsymbol{\Sigma}_i\right) \, . \quad (14)$$
Discretizing the integral and using Eq. (13) we obtain
$$p(\mathbf{n}_B \,|\, \hat{\mathbf{F}}) \propto p(\mathbf{n}_B) \prod_{i=1}^{n_{\rm gal}} \sum_{j=1}^{N_{\rm tot}} n_{B,j} \int_{z_{j,{\rm L}}}^{z_{j,{\rm U}}} \mathrm{d}z_i \, \mathcal{N}\!\left(\hat{\mathbf{f}}_i \,|\, \mathcal{T}(z_i), \boldsymbol{\Sigma}_i\right) \, . \quad (15)$$
The histogram heights $n_{B,j} = \rho_{B,j} / \Delta z$ can be expressed as the ratio between $\rho_{B,j}$ and the histogram width $\Delta z$, assuming equal-sized redshift bins. (The independence assumption can be violated due to effects such as blending of nearby galaxy light profiles or flux calibration errors.) The vector $\boldsymbol{\rho}_B$ has the properties $\sum_{j=1}^{N_{\rm tot}} \rho_{B,j} = 1$ and $0 \leq \rho_{B,j} \leq$
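In practice the per-galaxy bin integrals in Eq. (15) can be precomputed once, after which evaluating the log-posterior is a matrix-vector product. A sketch, with a synthetic likelihood table and a flat prior as stated assumptions:

```python
import numpy as np

def log_posterior(n_B, A, log_prior=lambda n: 0.0):
    # Eq. (15) in log form: the per-galaxy evidence is sum_j n_Bj * A_ij, where
    # A[i, j] is the integral of N(fhat_i | T(z), Sigma_i) over redshift bin j,
    # precomputed once per galaxy. A flat prior is assumed by default.
    per_gal = A @ n_B
    return log_prior(n_B) + np.sum(np.log(per_gal))

# Synthetic likelihood table: each galaxy's flux likelihood concentrates in
# one redshift bin, with a little leakage into the others.
rng = np.random.default_rng(3)
n_gal, n_bins = 2000, 3
rho_true = np.array([0.2, 0.5, 0.3])
true_bin = rng.choice(n_bins, size=n_gal, p=rho_true)
A = np.full((n_gal, n_bins), 0.05)
A[np.arange(n_gal), true_bin] = 0.9
```

The posterior then prefers histogram heights close to the proportions actually present in the sample, as expected from Eq. (15).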
1, and therefore lies on the simplex. Our first choice for a distribution on the simplex for $p(\boldsymbol{\rho}_B)$ (and therefore $p(\mathbf{n}_B)$) was the Dirichlet distribution. During the course of this project we have applied a mean-field variational inference scheme that uses the Dirichlet as the variational distribution, as well as a Gibbs sampling scheme based on the Dirichlet-Multinomial conjugacy for posterior inference. We found that the variational inference scheme yielded underestimated error bars, likely due to the restricted covariance structure of the Dirichlet. Moreover, the sampling approach did not scale well to the large galaxy samples expected for the first-year LSST observations. Specifically, the computational workload to update redshift variables for samples of this size is very large, and while subsampling techniques provide a possible mitigation, they can lead to biased inferences (Quiroz et al. 2018). Furthermore, the application of sampling techniques requires a sufficiently large trace to ensure convergence. This can be difficult to ensure in this case. In order to provide a more flexible distributional ansatz than the Dirichlet, while still maintaining the computational advantages of a mean-field variational inference scheme, we decided to develop a scheme that is based on the logit-normal distribution (Atchison & Shen 1980), as explained in the following section. While these considerations motivate our choice of method, we note that this should not discredit alternative approaches based on sampling or variational inference in general. We will perform a more detailed analysis of convergence and probability coverage of multiple inference techniques in future work.

The problem specified by Eq. (14) is a deconvolution problem that extends the simple toy model considered in § 3. The 'noise' PDF is now given by a joint likelihood that can depend on a complex set of parameters.
Furthermore, while the discussion in § 3 focused on deriving an estimator for the deconvolved density, the focus here is to infer posteriors using efficient inference techniques. We present the detailed description and derivation of the inference pipeline in Appendices A and B. The final form of Eq. (14) is then given in the form of a logit-normal posterior:
$$p(\mathbf{n}_B \,|\, \hat{\mathbf{F}}) \propto \frac{1}{\sqrt{|2\pi \boldsymbol{\Sigma}_y|} \, \Delta z^{N_{\rm bins}} \prod_{j=1}^{N_{\rm bins}} n_{B,j}} \exp\!\left( -\frac{1}{2} \left( \log\!\left( \frac{\mathbf{n}_{B,-N_{\rm bins}}}{n_{B,N_{\rm bins}}} \right) - \boldsymbol{\mu}_{y,{\rm ML}} \right)^{\!T} \boldsymbol{\Sigma}_y^{-1} \left( \log\!\left( \frac{\mathbf{n}_{B,-N_{\rm bins}}}{n_{B,N_{\rm bins}}} \right) - \boldsymbol{\mu}_{y,{\rm ML}} \right) \right) \, , \quad (16)$$
where $\Delta z$ denotes the histogram bin width. The estimation of the covariance $\boldsymbol{\Sigma}_y$ and mean vector $\boldsymbol{\mu}_{y,{\rm ML}}$ is detailed in Appendices A and B. However, we note that this formalism derives the Hessian $\mathbf{H} = \boldsymbol{\Sigma}_y^{-1}$, and obtaining the covariance matrix $\boldsymbol{\Sigma}_y$ requires matrix inversion. The subscript 'y' here refers to the variable transformation
$$\mathbf{y}(\boldsymbol{\rho}) = \left[ \log\left( \rho_1 / \rho_{N_{\rm bins}} \right), \dots, \log\left( \rho_{N_{\rm bins}-1} / \rho_{N_{\rm bins}} \right) \right] \, , \quad (17)$$
(The Dirichlet is the conjugate prior to the multinomial distribution, which can make sampling and inference easier. Concretely, if a Dirichlet prior is set on the probabilities of the multinomial likelihood (which are its parameters), the posterior over these probabilities is again a Dirichlet. However, conjugacy does not imply that the prior is ideal in all circumstances.)
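The transformation of Eq. (17) and its inverse, i.e. the additive log-ratio transform with the last bin as reference and its softmax-like inverse, can be written compactly; a minimal sketch:

```python
import numpy as np

def to_logit(rho):
    # Eq. (17): additive log-ratio transform, last bin as the reference.
    return np.log(rho[:-1] / rho[-1])

def from_logit(y):
    # Inverse transform: maps R^(N_bins - 1) back onto the simplex.
    e = np.concatenate([np.exp(y), [1.0]])
    return e / e.sum()
```

The round trip recovers the simplex vector exactly, and the origin of logit space maps to the uniform histogram.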
We discuss this transformation in more detail in Appendix B. The vector $\mathbf{n}_{B,-N_{\rm bins}}$ here denotes the vector $\mathbf{n}_B$ excluding the last entry $n_{B,N_{\rm bins}}$, where we assume equal-sized redshift histogram bins. Like the classical deconvolution problem, the inference of Eq. (14) is an inverse problem.
We can therefore expect that there exists a parameter vector $\boldsymbol{\rho} \in \Delta$, where $\Delta$ denotes the simplex space, that has a high likelihood (relative to the maximum-likelihood value), but a large distance from the true $\boldsymbol{\rho}_{\rm opt}$. Furthermore, as we have seen in § 3, the solutions of inverse problems do not have to be bounded (or even well defined), which implies that uncertainties can be arbitrarily large (see, e.g., Kuusela 2016). Regularization, detailed in the following section, is therefore a key aspect of our inference pipeline.

In this section we describe techniques that we employ to regularize the deconvolution problem. As instabilities arise when the histogram width is of the same order as the uncertainty in the redshift of the individual galaxies, picking broader bins reduces these artifacts (see e.g. Kuusela 2016). Considering our toy model in Eq. (10), we see that if $K^{\rm ft}(tb)$ is narrower than $f^{\rm ft}_\epsilon(t)$, there can be values of $t$ where their ratio, and therefore the integrand in Eq. (10), can become large or even unbounded. Subject to the aforementioned limitations, this behaviour generalizes to the deconvolution problem considered here. In the following, we will denote this scheme as the 'Wide Bin' method. As will be seen in the following section, this simple scheme can lead to posteriors that are biased and too narrow. We therefore consider alternative approaches. The 'Merging Bin Regularization' scheme (Kuusela 2016) uses a very thin initial histogram binning. This will likely result in the aforementioned typical instabilities of inverse problems, but can avoid biases of the Wide-Bin regularization (or other smoothing) schemes. However, we must ensure that the optimization of the maximum-likelihood solution converges to a global maximum. We therefore run multiple optimizations with different initial conditions and pick the best solution. Furthermore, the Hessian $\mathbf{H}$ can have a very high condition number.
Even though it is possible to sample from the resulting posteriors without the matrix inverse using MCMC sampling (only the inverse covariance enters the PDF), sampling is more efficient using the standard Box-Muller method (Box & Muller 1958), which requires an inverse.

Tikhonov Regularization
We perform the matrix inversion of the Hessian $\mathbf{H}$ using Tikhonov regularization (see e.g. Kress 1998, pp. 86-90). Here, we treat the matrix inversion as a system of linear equations constructed from the Hessian and the column-wise inverse Hessian/unit matrix, respectively. (In our case we note that all parameter values $\boldsymbol{\rho} \in \Delta$ are bounded, because $\Delta$ (with a chosen metric) is a bounded metric space. However, the solutions in logit space (see § B) do not have to be bounded. We will use the term 'regularization' not only in the context of Bayesian statistics, where it is often implemented in the form of a prior, but in general to describe methods that restrict the complexity of parameters or functions.) The instability of the problem can lead to very small entries in the Hessian that imply large entries (in
Figure 2.
Illustration of the impact of the 'Merging Bin' regularization on the posterior of the sample photometric redshift distribution. We show $1\sigma$ intervals. The x-axis shows the redshift value $z$, the y-axis the value of the $\mathbf{n}_B$ parameters. The error bars are the [16, 84] percentiles, which would correspond to $1\sigma$ intervals for a normal distribution. The black dashed curve shows the spectroscopic redshift distribution. The red contour shows the result of the 'Merging Bin' regularization with 30 bins applied to the 'Fiducial' contours that are binned using 50 bins. We refer for a detailed explanation to § 8 and Fig. 7, which shows and discusses the 'Fiducial' case as 'Small Sample (50k)'.

absolute value) in its inverse. As a regularization, one can add a penalty term and reformulate the problem as a minimization
$$\min\left\{ \left\| \mathbf{H} \mathbf{a}_i - \mathbf{e}_i \right\|^2 + \left\| \boldsymbol{\Gamma} \mathbf{a}_i \right\|^2 \right\} \, , \quad (18)$$
where $\mathbf{a}_i$ and $\mathbf{e}_i$ denote column $i$ of the inverse Hessian and the unit matrix, respectively. The matrix $\boldsymbol{\Gamma} = \alpha \mathbb{1}$ is the Tikhonov matrix, with $\alpha$ the regularization parameter and $\mathbb{1}$ the unit matrix. The regularization term penalizes large values of $\mathbf{a}_i$, which regularizes the inversion and reduces the condition number. The analytic solution to this minimization problem is given as:
$$\mathbf{c}_i = \left( \mathbf{H}^T \mathbf{H} + \boldsymbol{\Gamma}^T \boldsymbol{\Gamma} \right)^{-1} \mathbf{H}^T \mathbf{e}_i \, . \quad (19)$$
We note that we introduce the Tikhonov regularization here predominantly as a way to regularize the matrix inversion of the Hessian. We recommend selecting the parameter $\alpha$ to be just as large as necessary to perform this inversion accurately. Tikhonov regularization can be used as the main regularization in inverse problems; however, we find that the merging bin regularization scheme performs much better in terms of producing well-calibrated $\mathbf{n}_B$ posteriors. We reiterate that the idea of the merging bin regularization scheme proposed by Kuusela (2016) is to deliberately start with histogram bins that are too small and lead to a noisy deconvolved density.
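Eq. (19) can be implemented directly; the sketch below, with a hypothetical ill-conditioned diagonal Hessian, shows how the $\alpha$ term damps the exploding mode:

```python
import numpy as np

def tikhonov_inverse(H, alpha):
    # Column-wise regularized inverse, Eq. (19) with Gamma = alpha * identity:
    # each column c_i minimizes ||H a_i - e_i||^2 + alpha^2 ||a_i||^2.
    n = H.shape[0]
    lhs = H.T @ H + alpha**2 * np.eye(n)
    return np.linalg.solve(lhs, H.T)   # solves for all columns e_i at once

# Toy ill-conditioned Hessian (hypothetical): the tiny eigenvalue would give
# an exact-inverse entry of 1e8; the alpha term damps that mode.
H = np.diag([1.0, 1.0, 1e-8])
H_inv_reg = tikhonov_inverse(H, alpha=1e-4)
```

For the well-conditioned directions the regularized inverse agrees with the exact one to high accuracy, while the unstable direction is suppressed, which is exactly the reduction of the condition number described above.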
We then exploit the characteristic noise pattern in the deconvolved distribution, where bins that overshoot, i.e. are larger than the true value, are immediately followed by bins that undershoot. This results in an alternating or 'zig-zag' pattern of the deconvolved density. Merging these neighboring bins then helps to 'stabilize' the deconvolved distribution. We therefore sample from a posterior obtained assuming a finely binned histogram and merge neighboring bins, which compensates for the noise effect. We can then in principle directly use these samples in the cross-correlation likelihood. Nonetheless, it is computationally more efficient to remap these samples to a regular grid with the same or very similar resolution as the binning used for the cross-correlations, since treating the finely binned histogram heights as free parameters would not add more information due to the resolution loss. We found that the template fitting posterior after merging bin regularization can again be well described by a logit-normal distribution that we fit using Assumed Density Filtering.
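Merging neighboring bins of each posterior draw is a simple reshape-and-average; a sketch (averaging rather than summing keeps the height normalization, since the merged bins are correspondingly wider):

```python
import numpy as np

def merge_bins(samples, factor):
    # Average groups of `factor` neighboring histogram bins in each draw.
    # samples: (n_draws, n_bins) array of n_B draws; n_bins must be divisible
    # by factor. Averaging heights preserves sum(height * width) because the
    # merged bins are `factor` times wider.
    n_draws, n_bins = samples.shape
    return samples.reshape(n_draws, n_bins // factor, factor).mean(axis=2)
```

Applied to a 'zig-zag' draw such as alternating heights, pairwise merging flattens the oscillation while leaving the normalization untouched.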
Assumed Density Filtering
To fit the logit-normal distribution to the sampled and merged posterior samples, we work in logit space and make a Gaussian ansatz
$$q(\mathbf{y}) = \mathcal{N}(\mathbf{y} \,|\, \boldsymbol{\mu}, \boldsymbol{\Sigma}) \, . \quad (20)$$
We can directly generate samples from the true distribution by sampling from the original, finely binned, logit-normal distribution with subsequent merging, i.e. averaging, of neighboring bins, and then transforming back to logit space. We will denote this true distribution as $p_{\rm true}(\mathbf{y})$. Assumed density filtering then commences by minimizing the Kullback-Leibler divergence between our ansatz $q(\mathbf{y})$ and $p(\mathbf{y})$,
$$\mathrm{KL}(p \,||\, q) = \int \mathrm{d}\mathbf{y} \left( p(\mathbf{y}) \log p(\mathbf{y}) - p(\mathbf{y}) \log q(\mathbf{y}) \right) \, . \quad (21)$$
After optimizing $\mathrm{KL}(p \,||\, q)$ for $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$, we can show that the optimum is reached if
$$\int \mathrm{d}\mathbf{y} \, p(\mathbf{y}) \, \mathbf{y} = \boldsymbol{\mu} \, , \quad (22)$$
and
$$\boldsymbol{\Sigma} = \int \mathrm{d}\mathbf{y} \, p(\mathbf{y}) \, (\mathbf{y} - \boldsymbol{\mu})(\mathbf{y} - \boldsymbol{\mu})^T \, . \quad (23)$$
We see that assumed density filtering reduces to moment matching in logit space when we apply the sample mean and sample covariance estimators to samples from $p_{\rm true}(\mathbf{y})$. We note that this is in general true for distributions of the exponential family (see, e.g., Ranganathan 2004). To summarize, we perform the inference scheme described in the previous section using a fine histogram binning. Subsequently we sample from the posterior after regularizing the inverse Hessian using Tikhonov regularization. We merge neighboring bins from each posterior draw until the noise is reduced and we obtain a smooth probability distribution. We illustrate this process in Fig. 2, which shows the impact of the 'Merging Bin' regularization scheme on the posterior of the sample photometric redshift distribution. Comparing the grey contours ('Fiducial') that use 50 bins with the red contours ('Merging Bin') that merge neighboring bins to a binning of 30 bins, we see the much smoother shape and the elimination of the 'zig-zag' pattern present in the grey contours. We defer a more thorough explanation of the methodology and sample to § 8 and Fig.
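Since the KL optimum of Eqs. (22)-(23) reduces to moment matching, fitting the Gaussian ansatz to merged logit-space draws is only two lines; a sketch:

```python
import numpy as np

def adf_fit(y_samples):
    # Assumed density filtering for a Gaussian ansatz, Eqs. (22)-(23): the
    # KL(p||q) optimum is moment matching, i.e. the sample mean and sample
    # covariance of the (logit-space) merged posterior draws.
    mu = y_samples.mean(axis=0)
    cov = np.cov(y_samples, rowvar=False)
    return mu, cov
```

Applied to draws from a known Gaussian, the fit recovers the generating moments up to sampling noise, which is all the ansatz of Eq. (20) requires.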
7, which discusses the result shown in the grey contours under the abbreviation 'Small Sample (50k)'. Based on our experience, we propose to initially merge neighboring bins until we obtain a bin size of the order of the average $\pm 1\sigma$ range of the individual galaxy redshift distributions. Subsequently we merge fewer bins until the aforementioned characteristic 'zig-zag' noise pattern appears. This can be identified as the limiting resolution we can obtain. We note that it is important to distinguish patterns due to 'real' line-of-sight structure from those due to the aforementioned noise in the deconvolution. If the pattern appears gradually with increasing resolution (merging fewer bins), it is indicative of statistically significant line-of-sight structure. If the deconvolved density suddenly becomes unstable in a 'zig-zag' pattern when fewer bins are merged, we have reached a resolution limit. Using assumed density filtering under the ansatz of a logit-normal distribution, we finally reparametrize our model on the final redshift grid. The merging process described above is largely based on inspecting when the instabilities vanish. There are certainly more principled alternatives. In a classical deconvolution problem, like the one presented in § 3, one could use a bootstrap estimate of the bias and variance of the reconstruction with respect to the over-smoothed photometric redshift distribution. This is consistent with the approach taken by Padmanabhan et al. (2005) based on the recommendation in Craig & Brown (1986). We use a joint likelihood between the photometry and spatial information to produce posteriors for the photometric sample redshift distribution and not a 'point prediction'. Furthermore, our 'measured data' is the photometry and spatial information of galaxies. Accordingly, our model selection, of which regularization is a part, must reproduce the measured photometry and spatial distribution, e.g., as measured by the correlation functions of galaxies.
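The 'zig-zag' pattern can be quantified, for example, by the fraction of direction flips between neighboring bins; the helper below is our own illustrative heuristic for such an inspection, not part of the pipeline described above:

```python
import numpy as np

def zigzag_fraction(heights):
    # Fraction of interior bins where the deconvolved density alternates
    # direction (over-/undershoot); values near 1 indicate the noise pattern,
    # values near 0 a smooth reconstruction.
    d = np.diff(heights)
    flips = np.signbit(d[1:]) != np.signbit(d[:-1])
    return float(flips.mean())
```

A fully alternating histogram scores 1, a monotonic one scores 0, so tracking this fraction while merging fewer bins gives a rough handle on when the resolution limit is reached.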
In the Bayesian context, this would translate into the usage of posterior predictive checks (PPC), discussed in § 7.1. While the path to development of a more principled selection of the hyperparameters of regularization is known, it will require a thorough investigation. We defer this to future work, using the aforementioned 'rule-of-thumb' methodology as an interim solution.

In order to include information about the spatial distribution of galaxies in the likelihood, we consider spatial cross-correlations between photometric and spectroscopic samples. Spatial correlations measure the excess probability over random to find two galaxies separated by a certain distance. This can be exploited to extract redshift information for galaxy samples (see e.g. Newman 2008; Ménard et al. 2013; McQuinn & White 2013; Scottez et al. 2016; Raccanelli et al. 2017; Morrison et al. 2017; Davis et al. 2017; Gatti et al. 2018) for which we do not have accurate redshift information, i.e. photometric galaxy samples, using spatially overlapping spectroscopic catalogs. The idea is to select the spectroscopic samples in thin redshift slices and estimate the cross-correlation between these redshift-selected samples and the full photometric galaxy sample. As discussed in the following, the resulting signal will then be proportional to the photometric redshift distribution at that redshift. Fig. 3 illustrates the basic idea of cross-correlation redshift inference. We consider two galaxy samples: a reference sample 'R' and a base galaxy sample 'B'. The reference sample contains galaxies with accurate, often spectroscopic, redshift measurements; the base galaxy sample consists of galaxies observed in broad-band photometric filters. As the base/reference samples are typically photometric/spectroscopic samples, we use these terms interchangeably in the text.
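The proportionality between the slice-wise cross-correlation signal and the sample redshift distribution can be illustrated with a deliberately simplified toy: independent Gaussian overdensity fields per slice, unit galaxy-dark matter bias, and a noiseless reference sample (all assumptions of this sketch, not of the method itself):

```python
import numpy as np

rng = np.random.default_rng(7)
n_cells, n_slices = 20_000, 5
rho = np.array([0.10, 0.35, 0.25, 0.20, 0.10])  # toy base-sample redshift distribution

# One independent toy overdensity field per thin spectroscopic redshift slice.
delta = rng.normal(0.0, 0.3, size=(n_slices, n_cells))

# Projected base-sample surface density: every slice contributes, weighted by rho.
base = (rho[:, None] * (1.0 + delta)).sum(axis=0)

# Cross-correlating each slice's field with the base counts isolates the
# contribution of that slice, i.e. an amplitude proportional to rho there.
w_x = np.array([np.mean(delta[k] * (base - base.mean())) for k in range(n_slices)])
rho_est = w_x / w_x.sum()
```

Because the slice fields are independent here, only the matching term survives the average, and the normalized cross-correlation amplitudes recover the toy redshift distribution up to sampling noise.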
The redshift distribution of the base sample is illustrated by the red distribution, while the binned reference sample redshift distributions for simplicity are shown as tophat functions (unlike the simulated samples we use to test our methodology). A single cross-correlation is then obtained by cross-correlating a single tophat selection with the full base sample. Multiple measurements (We note, however, that the reference sample does not have to be a spectroscopic dataset, as multi-band, narrow-filter photometric observations (Alarcon et al. 2020a), or photometric redshifts of redMaGiC samples (see e.g. Gatti et al. 2018), also allow for reasonable redshift accuracy.)
To fit the logit-normal distribution to the sampled and merged posterior samples, we work in logit space and make a Gaussian ansatz

q(y) = N(y | \mu, \Sigma) . (20)

We can directly generate samples from the true distribution by sampling from the original, finely binned, logit-normal distribution with subsequent merging, i.e. averaging, of neighboring bins, and then transforming back to logit space. We will denote this true distribution as p_{true}(y). Assumed density filtering then commences by minimizing the Kullback-Leibler divergence between p_{true}(y) and our ansatz q(y),

KL(p_{true} || q) = \int dy \left[ p_{true}(y) \log p_{true}(y) - p_{true}(y) \log q(y) \right] . (21)

After optimizing KL(p_{true} || q) with respect to \mu and \Sigma, we can show that the optimum is reached if

\mu = \int dy \, p_{true}(y) \, y (22)

and

\Sigma = \int dy \, p_{true}(y) \, (y - \mu)(y - \mu)^{T} . (23)

We see that assumed density filtering reduces to moment matching in logit space when we apply the sample mean and sample covariance estimators to samples from p_{true}(y). We note that this is in general true for distributions of the exponential family (see, e.g., Ranganathan 2004).

To summarize, we perform the inference scheme described in the previous section using a fine histogram binning. Subsequently we sample from the posterior after regularizing the inverse Hessian using Tikhonov regularization. We merge neighboring bins from each posterior draw until the noise is reduced and we obtain a smooth probability distribution. We illustrate this process in Fig. 2, which shows the impact of the 'Merging Bin' regularization scheme on the posterior of the sample photometric redshift distribution. Comparing the grey contours ('Fiducial'), which use 50 bins, with the red contours ('Merging Bin'), which merge neighboring bins down to 30 bins, we see the much smoother shape and the elimination of the 'zig-zag' pattern present in the grey contours. We defer a more thorough explanation of the methodology and sample to § 8 and Fig. 7, which discusses the result shown in the grey contours under the abbreviation 'Small Sample (50k)'.

Based on our experience, we propose to initially merge neighboring bins until we obtain a bin size of the order of the average ±1σ range of the individual galaxy redshift distributions. Subsequently we merge fewer bins, until the aforementioned characteristic 'zig-zag' noise pattern appears. This can be identified as the limiting resolution we can obtain. We note that it is important to distinguish patterns due to 'real' line-of-sight structure from those due to the aforementioned noise in the deconvolution. If the pattern appears gradually with increasing resolution (merging fewer bins), it is indicative of statistically significant line-of-sight structure. If the deconvolved density suddenly becomes unstable in a 'zig-zag' pattern when fewer bins are merged, we have reached a resolution limit. Using assumed density filtering under the ansatz of a logit-normal distribution, we finally reparametrize our model on the final redshift grid.

The merging process described above is largely based on inspecting when the instabilities vanish. There are certainly more principled alternatives. In a classical deconvolution problem, like the one presented in § 3, one could use a bootstrap estimate of the bias and variance of the reconstruction with respect to the over-smoothed photometric redshift distribution. This is consistent with the approach taken by Padmanabhan et al. (2005), based on the recommendation in Craig & Brown (1986). We, however, use a joint likelihood between the photometry and spatial information to produce posteriors for the photometric sample redshift distribution, not a 'point prediction'. Furthermore, our 'measured data' are the photometry and spatial information of galaxies. Accordingly, our model selection, of which regularization is a part, must reproduce the measured photometry and spatial distribution, e.g., as measured by the correlation functions of galaxies.
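The 'merge, then refit by moment matching' procedure of Eqs (20)-(23) is straightforward to prototype. The following is a minimal sketch, not the paper's implementation: the fine-binned posterior, the pairing of neighboring bins, and all numerical values are hypothetical stand-ins, and the logit map used here fixes the last bin as the reference.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(y):
    """Map a logit-space vector y (length K-1, last bin as reference) to the simplex."""
    z = np.concatenate([y, [0.0]])
    e = np.exp(z - z.max())
    return e / e.sum()

def to_logit(p):
    """Inverse map: simplex vector (length K) -> logit space, last bin as reference."""
    return np.log(p[:-1]) - np.log(p[-1])

# Hypothetical fine-binned logit-normal posterior (stand-in for the sampled posterior).
K_fine = 50
mu_fine = rng.normal(0.0, 0.3, K_fine - 1)
y_samples = rng.multivariate_normal(mu_fine, 0.02 * np.eye(K_fine - 1), size=5000)
p_fine = np.apply_along_axis(softmax, 1, y_samples)

# 'Merging Bin' regularization: average pairs of neighboring bins, then renormalize.
p_merged = 0.5 * (p_fine[:, 0::2] + p_fine[:, 1::2])
p_merged /= p_merged.sum(axis=1, keepdims=True)

# Assumed density filtering reduces to moment matching in logit space (Eqs 22 and 23):
y_merged = np.apply_along_axis(to_logit, 1, p_merged)
mu_hat = y_merged.mean(axis=0)               # sample-mean estimator of Eq (22)
Sigma_hat = np.cov(y_merged, rowvar=False)   # sample-covariance estimator of Eq (23)
```

The refit logit-normal N(y | mu_hat, Sigma_hat) then parametrizes the posterior on the coarser grid.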
In the Bayesian context, this translates into the use of posterior predictive checks (PPC), discussed in § 7.1. While the path to a more principled selection of the regularization hyperparameters is known, it will require a thorough investigation. We defer this to future work, using the aforementioned 'rule-of-thumb' methodology as an interim solution.

To include information about the spatial distribution of galaxies in the likelihood, we consider spatial cross-correlations between photometric and spectroscopic samples. Spatial correlations measure the excess probability, over random, of finding two galaxies separated by a certain distance. This can be exploited to extract redshift information for galaxy samples (see e.g. Newman 2008; Ménard et al. 2013; McQuinn & White 2013; Scottez et al. 2016; Raccanelli et al. 2017; Morrison et al. 2017; Davis et al. 2017; Gatti et al. 2018) for which we do not have accurate redshift information, i.e. photometric galaxy samples, using spatially overlapping spectroscopic catalogs. The idea is to select the spectroscopic sample in thin redshift slices and estimate the cross-correlation between these redshift-selected samples and the full photometric galaxy sample. As discussed in the following, the resulting signal is proportional to the photometric redshift distribution at that redshift.

Fig. 3 illustrates the basic idea of cross-correlation redshift inference. We consider two galaxy samples: a reference sample 'R' and a base galaxy sample 'B'. The reference sample contains galaxies with accurate, often spectroscopic, redshift measurements; the base galaxy sample consists of galaxies observed in broad-band photometric filters. As the base/reference samples are typically photometric/spectroscopic samples, we use these terms interchangeably in the text.
The redshift distribution of the base sample is illustrated by the red distribution, while the binned reference sample redshift distributions are, for simplicity, shown as tophat functions (unlike the simulated samples we use to test our methodology). A single cross-correlation is then obtained by cross-correlating a single tophat selection with the full base sample. Multiple measurements therefore 'slice' through the redshift distribution of the base sample, illustrated in Fig. 3 by the arrows and the grey hatched tophat slices. We note, however, that the reference sample does not have to be a spectroscopic dataset; multi-band, narrow-filter photometric observations (Alarcon et al. 2020a) or photometric redshifts of redMaGiC samples (see e.g. Gatti et al. 2018) also allow for reasonable redshift accuracy.

Figure 3. Illustration of the construction of the cross-correlation data vector. A 'Reference Sample' that can be precisely selected in redshift is moved over a 'Base Sample', which can either be a sample without redshift information (photometric sample) or the 'Reference Sample' itself. In this illustration 10 cross-correlations would be estimated, constituting the cross-correlation data vector.

As described in detail in Schmidt et al. (2013), Morrison et al. (2017) and Ménard et al. (2013), we measure the over-density, compared with a spatially random distribution of points, of photometric galaxies around each galaxy in the spectroscopic sample, within an annulus of physical scale \Delta r = [r_{min}, r_{max}]. The theoretical model for the cross-correlation function between the spectroscopic reference sample in tophat bin i and the photometric base sample is given as

w_{RB,i} \propto b_{R,i} \, b_{B,i} \left( \frac{n_{B,i}}{z_{iH} - z_{iL}} \right) w_{DM,i} . (24)

Here, b_{R,i} and b_{B,i} denote the values of the redshift-dependent galaxy-dark matter bias of the reference (R) and base (B) samples. The normed histogram bin heights of the base, or photometric, sample redshift distribution are denoted as n_{B,i}, where \sum_i n_{B,i} = 1, and the size of a redshift bin is given as z_{iH} - z_{iL}. The term w_{DM,i} denotes the contribution of dark-matter clustering to the cross-correlation signal, which depends on the cosmological model.

We see that the modelling of the cross-correlation signal depends on the product of two redshift-dependent galaxy-dark matter bias functions that are completely degenerate with the set of parameters n_B that parametrizes the redshift distribution of the base sample. Furthermore, since w_{DM,i} depends on the cosmological model, it would be computationally expensive to sample over these parameters.

To reduce the impact of the cosmological model, we combine the cross-correlations w_RB with the correlations w_RR of the spectroscopic sample. We therefore correlate the spectroscopic sample with itself in a manner analogous to what was just described, i.e. by correlating tophat-selected spectroscopic samples with the full spectroscopic sample. The corresponding theory prediction then reads

w_{RR,i} \propto (b_{R,i})^2 \left( \frac{n_{R,i}}{z_{iH} - z_{iL}} \right) w_{DM,i} , (25)

where n_{R,i} is the normalized histogram height of the (full) spectroscopic sample redshift distribution. Both correlation function measurements (\hat{w}_RB and \hat{w}_RR) just described are assumed to individually follow a Gaussian likelihood.

Based on these definitions and approximations, and the considerations in the previous section, we construct a likelihood based on the ratio between \hat{w}_RB and \hat{w}_RR. Under the assumption of a diagonal covariance matrix for \hat{w}_RB and \hat{w}_RR, we can construct the random variable \Theta with components for bin i

\Theta^{meas}_i = \frac{\hat{w}_{RB,i}}{\hat{w}_{RR,i}} . (26)

We reiterate that both \hat{w}_{RB,i} and \hat{w}_{RR,i} are described by a Gaussian likelihood; their respective means and standard deviations are given as \mu_{RB,i}, \mu_{RR,i} and \sigma_{RB,i}, \sigma_{RR,i}, respectively.
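The cancellation of the dark-matter term in the ratio of Eqs (24) and (25) can be checked numerically. The sketch below uses toy bias values and histogram heights (all hypothetical); the function names are ours, and the proportionality constants are set to unity.

```python
import numpy as np

def w_cross(b_R, b_B, n_B, dz, w_DM):
    """Model for w_RB,i, Eq (24): b_R,i * b_B,i * (n_B,i / dz_i) * w_DM,i (up to a constant)."""
    return b_R * b_B * (n_B / dz) * w_DM

def w_auto(b_R, n_R, dz, w_DM):
    """Model for w_RR,i, Eq (25): b_R,i^2 * (n_R,i / dz_i) * w_DM,i (up to a constant)."""
    return b_R**2 * (n_R / dz) * w_DM

# Toy inputs (hypothetical values, 5 redshift bins).
dz = np.full(5, 0.1)
b_R = np.array([1.2, 1.3, 1.4, 1.5, 1.6])
b_B = np.array([1.0, 1.1, 1.2, 1.3, 1.4])
n_B = np.array([0.1, 0.3, 0.3, 0.2, 0.1])
n_R = np.array([0.2, 0.2, 0.2, 0.2, 0.2])

# Two different dark-matter clustering terms: the ratio is unchanged, because
# w_DM divides out and the ratio equals (b_B n_B)/(b_R n_R) in each bin.
w_DM_1 = np.linspace(1.0, 0.5, 5)
w_DM_2 = np.linspace(2.0, 0.1, 5)
theta_1 = w_cross(b_R, b_B, n_B, dz, w_DM_1) / w_auto(b_R, n_R, dz, w_DM_1)
theta_2 = w_cross(b_R, b_B, n_B, dz, w_DM_2) / w_auto(b_R, n_R, dz, w_DM_2)
```

This illustrates why the ratio construction suppresses the dependence on the cosmological model, provided the redshift bins are thin enough that w_DM is common to both measurements.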
The theoretical prediction for the transformed random variable \Theta^{meas}_i is then

\Theta^{theo}_i(b_{B,i}, b_{R,i}, n_{B,i}, n_{R,i}) = \frac{b_{B,i} \, n_{B,i}}{b_{R,i} \, n_{R,i}} , (27)

and its likelihood, the density of a ratio of two independent Gaussian random variables, is

p(\Theta^{meas}_i | \Theta^{theo}_i) = \frac{b(\Theta^{meas}_i) \, d(\Theta^{meas}_i)}{a^3(\Theta^{meas}_i)} \frac{1}{\sqrt{2\pi} \, \sigma_{RB,i} \, \sigma_{RR,i}} \left[ \Phi\left( \frac{b(\Theta^{meas}_i)}{a(\Theta^{meas}_i)} \right) - \Phi\left( -\frac{b(\Theta^{meas}_i)}{a(\Theta^{meas}_i)} \right) \right] + \frac{1}{a^2(\Theta^{meas}_i) \, \pi \, \sigma_{RB,i} \, \sigma_{RR,i}} \exp\left( -\frac{c}{2} \right) . (28)

Here \Phi(z) denotes the cumulative distribution function of the zero-mean, unit-variance normal distribution, and

a(\Theta) = \sqrt{ \Theta^2 / \sigma^2_{RB,i} + 1 / \sigma^2_{RR,i} } , (29)

b(\Theta) = \mu_{RB,i} \, \Theta / \sigma^2_{RB,i} + \mu_{RR,i} / \sigma^2_{RR,i} , (30)

d(\Theta) = \exp\left( \frac{b^2(\Theta) - c \, a^2(\Theta)}{2 \, a^2(\Theta)} \right) , (31)

c = \mu^2_{RB,i} / \sigma^2_{RB,i} + \mu^2_{RR,i} / \sigma^2_{RR,i} . (32)

For the following discussion we define n = p / (z_H - z_L), i.e. the variables p and n refer to histogram heights normalized to sum to unity and to unit area, respectively. The variables n_B and n_R refer to the histogram heights of the base and reference samples. This likelihood assumes independence between neighboring redshift bins; we can, however, expect a degree of correlation, especially for lower redshift bins, due to magnification effects. Eq. (28) is approximately independent of the modelling of the w_DM term, assuming that we pick sufficiently thin redshift bins to 'divide out' the redshift dependence of the dark matter clustering term w_DM. In reality, we expect that the likelihood will deviate from the Gaussian assumption (see e.g. Hahn et al. 2019). Furthermore, while the covariance matrix will be dominated by the diagonal, we can expect that the cross-correlation measurements in different bins will be correlated; the assumption of a diagonal covariance matrix is thus an approximation.

Based on the aforementioned independence assumption between redshift bins, the joint likelihood for all bins i now reads

p(\Theta | b_B, b_R, n_R, n_B) = \prod_{i=1}^{N_{bins}} p(\Theta_i | b_{B,i}, b_{R,i}, n_{R,i}, n_{B,i}) . (33)

The function that describes the set of ratios b_{B,i} / b_{R,i} will be denoted as C(z, \Delta r) and depends both on redshift and on the size of the annulus \Delta r. For a selected annulus size we will use the abbreviation C(z).

To formulate a joint likelihood for the data vector of both galaxy positions and photometry, we use the composite likelihood ansatz (e.g. Varin et al. 2011), i.e. the product of the marginal likelihoods of the photometry \hat{F} and of the vector of cross-correlation ratios \Theta:

p(\hat{F}, \Theta | n_{B,sys}, n_B, n_R, C(z)) = p(\hat{F} | n_B)^{w_1} \, p(\Theta | n_{B,sys}, C(z))^{w_2} , (34)

where the w_i are weights that can be selected to improve the efficiency of the estimation (see e.g. Varin et al. 2011) by increasing the influence of one part of the composite likelihood over the other. Furthermore, the composite likelihood can be conditioned on auxiliary parameters such as the field. For simplicity we consider here only the simple case of unit weights. A more complete treatment would use the joint data vectors of \hat{F} and \Theta for the selected galaxy samples, which would include covariances between the \Theta measurements for different tomographic bins. Furthermore, the quality and number of available photometric bands can change for different spatial areas. Similarly, we would need to construct a joint data vector of \hat{F} and \Theta that incorporates these covariances. This can be modelled either analytically (e.g. Stoyan & Stoyan 1994; Sánchez et al. 2020) or by using spatial resampling techniques. In this work we concentrate on the composite likelihood as given in Eq. (34) and defer extensions of the method to future work.

Parameter inference is only a single step in a full statistical analysis and needs to be combined with additional analysis steps.
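Stepping back to the ratio likelihood: Eqs (28)-(32) have the structure of the classical density for the ratio of two independent Gaussians (Hinkley 1969), which can be verified numerically. A minimal sketch with toy moments (function names and values are ours, not the paper's code):

```python
import math
import numpy as np

def norm_cdf(x):
    """Standard normal CDF, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ratio_pdf(theta, mu_x, sig_x, mu_y, sig_y):
    """Density of X/Y for independent X~N(mu_x, sig_x^2), Y~N(mu_y, sig_y^2),
    with the same a, b, c, d structure as Eqs (28)-(32)."""
    a = math.sqrt(theta**2 / sig_x**2 + 1.0 / sig_y**2)          # Eq (29)
    b = mu_x * theta / sig_x**2 + mu_y / sig_y**2                # Eq (30)
    c = mu_x**2 / sig_x**2 + mu_y**2 / sig_y**2                  # Eq (32)
    d = math.exp((b**2 - c * a**2) / (2.0 * a**2))               # Eq (31)
    term1 = (b * d) / (a**3 * math.sqrt(2.0 * math.pi) * sig_x * sig_y) \
        * (norm_cdf(b / a) - norm_cdf(-b / a))
    term2 = math.exp(-c / 2.0) / (a**2 * math.pi * sig_x * sig_y)
    return term1 + term2

# Monte Carlo cross-check with toy moments (hypothetical values).
rng = np.random.default_rng(1)
samples = rng.normal(1.0, 0.1, 500_000) / rng.normal(2.0, 0.1, 500_000)
grid = np.linspace(0.2, 0.9, 1001)
pdf = np.array([ratio_pdf(t, 1.0, 0.1, 2.0, 0.1) for t in grid])
```

With these toy moments the density integrates to unity over the bulk of its support and peaks near the ratio of the means, in agreement with the Monte Carlo draws.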
We need to ensure that parameters can be uniquely inferred and that the posterior does not exhibit flat regions or strong degeneracies, which can make the application of MCMC techniques difficult (see e.g. Rothenberg 1971; Raue et al. 2013). Furthermore, one needs to investigate the sensitivity of the results to changes in the prior and likelihood. Finally, one has to judge whether the inferred posteriors are sensible in the context of the cosmological/astrophysical model and evaluate whether the fitted model is a good representation of the observed data. This last step is the topic of this section. § 7.1 describes posterior predictive checks as a means to evaluate the goodness of fit of the model, and in § 7.2 we propose a method to parametrize systematics due to biased photometric likelihoods.
The idea of posterior predictive checks (PPC) is to simulate synthetic data from a fitted model, which are then compared with the original measurements as an internal consistency check. For example, several approaches exist that allow us to estimate the quality of probability calibration based on the distribution of posterior predictive p-values (e.g. Gelman et al. 1996). This paper provides only a short discussion of posterior predictive testing, which is still an area of active research. The basic idea of model checking is to investigate whether data predicted by the fitted model are representative of the observed data. The classical approach, for example developed for linear regression, uses an analytical probability distribution for the test statistic (for example the F distribution) and evaluates the tail probability to test how much of an 'outlier' the observed data is, given the fitted model. This approach can be problematic if the model is complex (an example being models that do not have the form of a generalized linear model; see, e.g., Gelman et al. 1996), which makes it difficult to derive an analytic sampling distribution for a suitable statistic given a fitted model. Further problems arise due to the significant influence of outliers, or if parameters are subject to boundary conditions. Posterior predictive checks extend this classical approach within the Bayesian framework.

Starting from the composite likelihood defined in Eq. (34), the posterior predictive distribution reads

p(D^{rep} | D) = \iint dn_B \, dC(z) \; p(D^{rep} | n_B, C(z)) \, p(n_B, C(z) | D) , (35)

where D = \{ \Theta(w_{BR}, w_{RR}), \hat{F} \} and D^{rep} denotes the replicated, or predicted, measurements sampled from the fitted model. We note that our model specifies only the ratio between w_RB and w_RR. However, we can always sample replications for one of these quantities using the measurement of the other, via the data transformation specified in Eq. (26).

Posterior predictive checking is particularly useful as it allows us to assess model calibration quality and predictive accuracy without spectroscopic (or accurate multiband photometric) validation data. This is a decisive advantage of specifying the photometric likelihood over empirical approaches based on machine learning or, more specifically, conditional density estimation. By modelling the data-generating process of SED evolution and redshifting, we can generate new photometry using a fitted physical model, giving us the opportunity to develop statistical tests based on the predictive distribution to probe the quality of the photometric likelihood calibration.

To avoid confusion, it is important to clarify that posterior predictive checking and model comparison, while often methodologically similar, have different goals. Posterior predictive checks aim to provide internal consistency tests for a given model and inference framework. Model comparison/combination has the goal of comparing/combining multiple models based on some measure of fitting accuracy. However, model comparison and combination can also be based on posterior predictive accuracy (see e.g. Gelman et al. 2013). In the development of new models and the evaluation of directions for improvement, both of these concepts work together.

An important prerequisite for the fitting of complex statistical models is the evaluation of model parameter degeneracies.
We have discussed the fact that the parameters that describe the redshift-dependent galaxy-dark matter bias and the sample redshift distribution enter the clustering likelihood in a completely degenerate way. Accordingly, completely different n_B-C(z) combinations will produce the same data distribution.
This limits the information that can be obtained on n_B, depending on the prior information imposed on C(z). A combination with the photometric likelihood can help to break these degeneracies, as long as the SED modelling itself does not exhibit strong degeneracies, e.g. between the SED type and the redshift of galaxies. A dedicated study of model checking in color space is left for future work.

In § 8.3 we will discuss how miscalibrated likelihoods can lead to systematic biases and uncertainties in the deconvolution operation. Our goal in this section is to include in the model a simple transformation that parametrizes these systematics. A simple choice is a Gaussian convolution kernel, which modifies the sample redshift distribution as

p(z | n_{B,sys}) = \int d\eta \, p(\eta) \int N(z | z', \eta) \, p(z' | n_B) \, dz' . (36)

Here n_{B,sys} denotes the parameters that describe the sample redshift distribution after the convolution is applied to the original sample redshift distribution p(z | n_B), where the convolution kernel is a Gaussian with a shift \Delta\mu in the mean and standard deviation \Delta\sigma, \eta = (\Delta\mu, \Delta\sigma):

N(z | z', \eta) = \frac{1}{\sqrt{2\pi \Delta\sigma^2}} \exp\left[ -\frac{1}{2} \left( \frac{z - z' - \Delta\mu}{\Delta\sigma} \right)^2 \right] . (37)

Assuming the same histogram parametrization for p(z | n_B) as for p(z | n_{B,sys}), we see that this implies an affine transformation n_B \rightarrow A(\eta) \cdot n_B, where the matrix A is given as

A_{ij} = \Phi\left( z_{iH} \,\Big|\, \frac{z_{jH} + z_{jL}}{2} + \Delta\mu, \Delta\sigma \right) - \Phi\left( z_{iL} \,\Big|\, \frac{z_{jH} + z_{jL}}{2} + \Delta\mu, \Delta\sigma \right)
= \frac{1}{2} \left[ \mathrm{erf}\left( \frac{z_{iH} - \frac{z_{jH} + z_{jL}}{2} - \Delta\mu}{\sqrt{2} \, \Delta\sigma} \right) - \mathrm{erf}\left( \frac{z_{iL} - \frac{z_{jH} + z_{jL}}{2} - \Delta\mu}{\sqrt{2} \, \Delta\sigma} \right) \right] , (38)

where \Phi(x | \mu, \sigma) denotes the cumulative distribution function of a normal distribution with mean \mu and standard deviation \sigma, and erf the error function. We choose a Gaussian smoothing kernel since it has been shown that unbiased cosmological inference from measurements of weak lensing and LSS critically depends on accurate recovery of the mean and standard deviation of the photometric sample redshift distribution; biases in both of these statistics can be parametrized using this kernel. In contrast to the normal distribution, which admits a closed-form solution under an affine transformation, the logit-normal does not have this property. However, we empirically find that we can approximate the shape of the distribution after an affine transformation by a logit-normal distribution to good accuracy. For a given set of \eta, we can find the updated parameter values by assumed density filtering.

While marginalization using sampling techniques is possible, we choose a computationally efficient approximation and marginalize over a discrete model set that consists of different smoothing sizes \Delta\sigma_i and \Delta\mu = 0:

p(n_B, C(z) | \Theta, \hat{F}) = \sum_{\Delta\sigma_i} p(\Delta\sigma_i | \Theta, \hat{F}) \, p(n_B, C(z) | \Theta, \hat{F}, \Delta\sigma_i) . (39)

This assumes that there is no systematic shift in the sample redshift distribution, and that deviations from the true underlying distribution are due to miscalibrated but, on average, unbiased individual galaxy likelihoods. Furthermore, we will assume that p(\Delta\sigma_i | \Theta, \hat{F}) = p(\Delta\sigma_i), i.e., the smoothing size is a prior choice, independent of the data, that can be calibrated on simulations. For simplicity we will use a flat prior p(\Delta\sigma) here.

Figure 4. Directed graphical representation of the statistical model described in this paper. Empty/filled circles denote random variables that are latent/observed. N(·) denotes the dimensionality of the random variable. Boxes encapsulate random variables with the same dimensionality. Solid/dotted lines indicate random/deterministic relationships between random variables.

Here we review and summarize our complete model. We review the structure of all components in § 7.3.1 and the inference strategy in § 7.3.2.
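The affine smoothing of Eq. (38) and the flat-prior marginalization of Eq. (39) can be sketched together. The bin edges, the populated bin, and the Δσ grid below are hypothetical choices for illustration; the source bins are evaluated at their midpoints, as in Eq. (38).

```python
import numpy as np
from math import erf, sqrt

def smoothing_matrix(z_edges, d_mu, d_sigma):
    """Affine representation of the Gaussian smoothing kernel, Eq (38):
    n_B -> A(eta) n_B, with source bins evaluated at their midpoints."""
    zL, zH = z_edges[:-1], z_edges[1:]
    mid = 0.5 * (zL + zH)
    A = np.empty((mid.size, mid.size))
    for i in range(mid.size):
        for j in range(mid.size):
            hi = erf((zH[i] - mid[j] - d_mu) / (sqrt(2.0) * d_sigma))
            lo = erf((zL[i] - mid[j] - d_mu) / (sqrt(2.0) * d_sigma))
            A[i, j] = 0.5 * (hi - lo)
    return A

z_edges = np.linspace(0.0, 2.0, 31)      # 30 histogram bins (toy grid)
n_B = np.zeros(30)
n_B[10] = 1.0                            # a single populated bin

# Eq (38): one smoothed histogram per kernel width (Delta mu = 0).
d_sigma_grid = [0.02, 0.05, 0.10]
smoothed = [smoothing_matrix(z_edges, 0.0, ds) @ n_B for ds in d_sigma_grid]

# Eq (39) with a flat prior p(Delta sigma_i): the marginal result is the
# equally weighted average of the conditional results.
n_marginal = np.mean(smoothed, axis=0)
```

With Δμ = 0 the transformation preserves the mean of the histogram while broadening it; a non-zero Δμ shifts the mean, which is exactly the pair of systematics the kernel is designed to parametrize.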
Fig. 4 summarizes the joint inference strategy presented in the previous sections in a directed graphical model. Each random variable is denoted by a circle; probabilistic/deterministic relationships are denoted by solid/dotted lines. Boxes around random variables denote the dimensionality of the random variable. For example, the color vector f_i is an N_Filter-dimensional random variable for N_Galaxy galaxies in N_tomo tomographic bins. Filled circles denote observed random variables, in our case the photometry \hat{F} and the cross-correlation ratios \Theta.

The graphical model is structured into three parts: the left part represents the photometric likelihood, the middle describes our treatment of systematics, and the right part describes the clustering redshift likelihood, which depends on the spectroscopic redshift distribution and the redshift-dependent galaxy-dark matter bias ratio.

The structure of the graph illustrates the construction of the model via the composite likelihood ansatz discussed in § 6. It separates the two data sources \hat{F} and \Theta into the left and right parts of the graph. As we are mainly interested in performing inferences on the n_B variables, we marginalize over the z variables, which provides significant computational advantages. The mapping between n_B and n_{B,sys} takes the form of a deterministic transformation and is therefore indicated by a dotted line.

While n_R is here treated as a random variable, its 'shot noise' uncertainties are very small for the considered sample sizes of the spectroscopic sample. We therefore decided to fix its value to the maximum likelihood value, i.e., the histogram height.

The presented model consists of two likelihood terms and a deterministic transformation n_B \rightarrow n_{B,sys}. Starting with the photometric likelihood, we employ the inference scheme detailed in Appendices A and B, which results in a posterior p(n_B | \hat{F}) defined in Eq. (16).
We then employ the transformation detailed in § 7.2 to parametrize systematics in the inferred posterior arising from biased photometric likelihoods. This yields a systematics-corrected posterior p(n_{B,sys} | \hat{F}). The final combination with the clustering likelihood term, p(n_{B,sys}, C(z) | \hat{F}, \Theta) \propto p(\Theta | n_{B,sys}, C(z)) \, p(n_{B,sys} | \hat{F}), is then performed using a Markov chain Monte Carlo (MCMC) sampling approach.

We update the parameters (n_{B,sys}, C(z)) in two sampling blocks: the set of parameters that describe the redshift distribution of the base sample, n_{B,sys}, and the parameters c that describe the evolution of the redshift-dependent galaxy-dark matter bias ratio C(z). Concretely, we iteratively sample from the conditional distributions p(n_{B,sys} | \hat{F}, \Theta, c) and then from p(c | \hat{F}, \Theta, n_{B,sys}). This means that we sample each parameter block in turn, while holding the other parameter block fixed. The sampling method that can be employed to update each parameter block is flexible. We use a Metropolis-Hastings sampling scheme to sample the c parameters. To sample from the conditional p(n_{B,sys} | \hat{F}, \Theta, c) we also employ a Metropolis scheme; however, we perform the sampling not in terms of the n_{B,sys} parameters, but in logit space, i.e. in terms of the y parameters that are connected with n_{B,sys}, or their normalized analog p_sys, via Eq. (B1). In this way we can utilize proposal distributions that are defined in real space to sample a distribution defined on the simplex. We reiterate that the posterior p(n_B | \hat{F}) has, in our framework, an analytical form, and sampling from it is therefore very efficient. However, if we include a treatment of systematics or a clustering redshift likelihood in the inference, we need to employ sampling approaches, because the posterior no longer has a closed-form solution.
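The two-block update scheme can be sketched on a toy target. Below, a correlated two-dimensional Gaussian stands in for the joint posterior, with scalars `y` and `c` playing the roles of the logit-space histogram parameters and the bias-ratio parameters; this is a generic illustration of Metropolis-within-Gibbs blocking, not the paper's sampler, and all numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)

def log_post(y, c):
    """Toy stand-in for log p(n_B,sys, C(z) | F, Theta): a correlated Gaussian,
    with y the logit-space histogram parameter and c the bias-ratio parameter."""
    return -0.5 * (y**2 + c**2 + y * c)

y, c, chain = 0.0, 0.0, []
for _ in range(30000):
    # Block 1: Metropolis update of y with c held fixed.
    y_prop = y + rng.normal(0.0, 1.5)
    if np.log(rng.random()) < log_post(y_prop, c) - log_post(y, c):
        y = y_prop
    # Block 2: Metropolis update of c with y held fixed.
    c_prop = c + rng.normal(0.0, 1.5)
    if np.log(rng.random()) < log_post(y, c_prop) - log_post(y, c):
        c = c_prop
    chain.append((y, c))
chain = np.asarray(chain[5000:])  # discard burn-in
```

For this target the chain recovers the expected anti-correlation between the two blocks, mirroring the n_B,sys-C(z) degeneracy discussed above.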
To demonstrate the eο¬ectiveness of our inference methodology, weconsider an idealized setup which allows us to forecast the con-straints on the sample redshift distribution that we can expect froma DESI-like spectroscopic survey overlapping with the LSST Y10footprint. We assume in this section that the composite likelihoodis well calibrated, both in the clustering and in the template ο¬t-ting part. This assumption will likely not hold in practice and we We tried several approaches like Hamiltonian MCMC or Elliptical Slicesampling. All approaches work well; we discuss here the structurally simplestscheme.
Figure 5.
The top three panels show cross-correlation measurements between the photometric base sample and the spectroscopic reference sample, w_RB, and correlation function measurements of the spectroscopic reference sample, w_RR, for three different annuli (see § 5) as a function of redshift. The errorbars correspond to the ±1σ measurement errors of the correlation function measurements. The lowest panel plots the true sample redshift probability density function of the spectroscopic reference sample ('Spec.') and the photometric base sample ('Phot.').

We therefore study the impact of likelihood mis-specification on the inference in § 8.3. In particular, we will evaluate the performance of our methodology by comparing with science requirements of the first year ('LSST Y1') and the 10th year ('LSST Y10') of the LSST data release defined in The LSST Dark Energy Science Collaboration et al. (2018). We note that we will utilize a mock simulation of galaxy photometry likelihoods in § 8.2 instead of SED likelihoods constructed on the simulated photometry, because the simulated photometry of the DC2 simulations showed a discontinuous color-redshift mapping, which induced an unrealistically large error in our template fitting results. In § 8.3, which will discuss aspects of model checking and will not interpret results in the context of LSST science requirements, we will use both the Machine Learning and Template Fitting methods described in § 2.1.

We use the software package 'the-wizz' (Morrison et al. 2017; https://github.com/morriscb/the-wizz) to measure cross-correlations between the reference (spectroscopic) and base (photometric) samples, w_RB, and correlations of the reference sample, w_RR, in 20 equally spaced redshift bins, corresponding to 3042 Mpc comoving distance at the mean redshift of ⟨z⟩ = 0.88. Fig. 5 shows these measurements in the three top panels for three annuli (see § 5) extending out to 10 Mpc. In the lowest panel we show the sample redshift probability density functions for the reference and base samples. The correlations w_RR are larger than the cross-correlations w_RB in all three panels, implying an on-average lower-than-unity ratio b_B(z)/b_R(z). The errorbars increase with redshift due to the decreasing number of galaxies, leading to a larger shot noise error. Consider the shape of w_RR and w_RB for the largest annulus in the low redshift range of z < 1.0.
We see that the measurements of w_RB and w_RR roughly resemble the shape of the reference and base sample redshift distributions (lowest panel), implying a roughly linear ratio b_B(z)/b_R(z) in this redshift range (compare with Fig. 6). For smaller annuli, there is a change in slope around the redshift at which w_RB and w_RR are approximately equal. Here, the sample redshift distribution of the base sample is larger than the one of the reference sample. Since the QSO sample will have a larger galaxy-dark matter bias than the base sample, we can expect b_B(z)/b_R(z) < 1. Since our measurements were obtained on the 300 deg² CosmoDC2 simulation, while the final analysis will use the much larger overlap between DESI and LSST Y10, we scale the error on these measurements by a factor of 4 in the following analyses.

We would like to generate a cross-correlation likelihood that allows us full control over the imposed redshift-dependent galaxy-dark matter bias ratio model and that has roughly the correct error width for the full DESI area. Furthermore, since the mean of the ratio distribution depends on both the mean and the variance of w_RB and w_RR, scaling the measurement error will induce biases in the mean of this ratio and therefore in the reconstructed sample photometric redshift distribution. To correct for possible biases that would occur if the measurement errors were naively scaled, and to allow for better control over the redshift-dependent galaxy-dark matter bias ratio, we first fit the galaxy-dark matter bias ratio C(z) to the original data within these 20 bins. We show the results of this fit in Fig. 6 for different ranges in physical distance and indicate the redshift range of the different spectroscopic samples by vertical lines, where these limits are meant to guide the eye and do not constitute sharp breaks (compare with Fig. 1).
We see that within these redshift ranges, C(z) is a smooth function and can be fitted by a 3rd degree Chebyshev polynomial, C(z) = Σ_{i=0}^{3} c_i T_i(z). Here, T_i(z) denote the Chebyshev polynomials and c_i the expansion coefficients. Within each of the three redshift ranges, we perform a regression fit to the median of the C(z) posterior due to its heavy tails. For the following analysis we select the median annulus. We use the measurements w_RR and w_RB defined in Eq. (26) and forecast a new data vector for w_RB, while holding the measurement of w_RR fixed. This amounts to multiplying the ratio w_RB/w_RR by a constant for each redshift bin that compensates for the difference in the mean of the reconstructed sample photometric redshift distribution before and after we impose the fitted redshift-dependent galaxy-dark matter bias model and perform the scaling. In this way we ensure that our ratio distribution is self-consistent with the photometric sample redshift distribution. We then use these adjusted measurements in the composite likelihood (Eq. 34). This correction is necessary because we would otherwise merely use noisy measurements with wrongly decreased errorbars, which would lead to biases in the probability calibration of any inference.

(Footnote: The uncertainty in the correlation function measurements will likely differ from this factor of four scaling in the real data. Our treatment of the cross-correlation data vector here is approximate and will be complemented in future work.)
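A fit of this form, a 3rd degree Chebyshev expansion of the bias ratio C(z) within one redshift range, can be sketched with NumPy's Chebyshev utilities. The bias-ratio values below are a synthetic stand-in for the posterior median, and the mapping of z onto the Chebyshev domain [-1, 1] is our own choice; the paper does not specify these details.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# synthetic stand-in for the posterior median of C(z) in one redshift range
z = np.linspace(0.6, 1.2, 20)
ratio_median = 0.9 - 0.15 * (z - 0.9) + 0.05 * (z - 0.9) ** 2  # assumed smooth trend

# map z onto the Chebyshev domain [-1, 1] before fitting
zlo, zhi = z.min(), z.max()
x = 2.0 * (z - zlo) / (zhi - zlo) - 1.0

coeffs = C.chebfit(x, ratio_median, deg=3)  # c_i in C(z) = sum_i c_i T_i(z)
fit = C.chebval(x, coeffs)

max_err = np.max(np.abs(fit - ratio_median))  # a cubic reproduces a quadratic exactly
```

Since the stand-in trend is quadratic, the cubic least-squares fit recovers it to machine precision; for the heavy-tailed posteriors in the paper, the fit targets the median rather than the mean.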
[Figure 6 here: galaxy-dark matter bias ratio b_B(z)/b_R(z) as a function of redshift z ('Galaxy-DM Bias Ratio Scale/z-Dependence, ±1σ'), with the Mag-Lim, LRG, ELG, and QSO sample ranges indicated.]
Figure 6.
Galaxy-dark matter bias ratio as a function of redshift for different scales. We show here the [16, 84] percentiles, which correspond to ±1σ for a Normal distribution. The redshift ranges of the different spectroscopic subsamples are indicated by vertical lines. Within these ranges, the bias ratio is a relatively smooth function of redshift, indicating a smooth redshift dependence of the galaxy-dark matter bias of the photometric sample. At the borders of these ranges, the bias ratio curves are discontinuous.

Furthermore, we want to have control over the underlying redshift-dependent galaxy-dark matter bias model to eliminate any additional specification error. Our setup should therefore merely be seen as an approximate forecast of the constraining power that cross-correlations will add to the composite likelihood and a demonstration of the inference methodology. It is not an accurate treatment of galaxy-dark matter bias or of the correlation function measurements expected in the final LSST measurement. For this we would require a more realistic simulation of the DESI-like spectroscopic sample, the final area, and a much better understanding of the galaxy-dark matter bias of each galaxy population, all of which are subjects of active investigation in the field.
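Schematically, the redshift information carried by these measurements follows the standard clustering-redshift relation, w_RB(z)/w_RR(z) ∝ n_B(z) b_B(z) / [n_R(z) b_R(z)]. The sketch below inverts this relation on self-consistent toy inputs; all arrays are assumed for illustration and this is a textbook-style schematic, not the paper's Eq. (26).

```python
import numpy as np

def clustering_z_estimate(w_rb, w_rr, n_r, bias_ratio):
    """Schematic clustering-z recovery: n_B(z) proportional to
    (w_RB/w_RR) * n_R(z) / [b_B(z)/b_R(z)], normalized per bin."""
    raw = (w_rb / w_rr) * n_r / bias_ratio
    return raw / raw.sum()

# toy self-consistent inputs on a 20-bin grid (all assumed, for illustration)
z = np.linspace(0.05, 1.95, 20)
n_b = np.exp(-0.5 * ((z - 0.8) / 0.3) ** 2); n_b /= n_b.sum()
n_r = np.exp(-0.5 * ((z - 1.0) / 0.4) ** 2); n_r /= n_r.sum()
bias_ratio = 0.9 - 0.1 * z           # b_B/b_R < 1, as expected for a QSO reference
w_rr = 0.2 * np.exp(-z)              # arbitrary positive reference autocorrelation
w_rb = bias_ratio * (n_b / n_r) * w_rr

n_b_hat = clustering_z_estimate(w_rb, w_rr, n_r, bias_ratio)  # recovers n_b
```

The sketch makes explicit why the bias ratio C(z) = b_B/b_R must be modelled: any unmodelled redshift dependence in it is absorbed directly into the recovered n_B(z).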
For the photometric part of the composite likelihood we assume a redshift scaling of σ(z) = 0.02(1 + z), where z denotes the true, or spectroscopic, redshift. This scaling is a photometric redshift performance benchmark for LSST frequently adopted in the literature (e.g. Graham et al. 2020) and defined in the LSST science requirements document (https://docushare.lsst.org/docushare/dsweb/Get/LPM-17, page 4). We then generate a mock catalog by sampling values from the true sample redshift distribution and generate a catalog of mock likelihoods by scattering these values within this redshift error model. We reiterate that we assume here that the redshift likelihood constructed from the galaxies' photometry, mimicked here by the aforementioned redshift error model, is perfectly known. We note that this is an idealized assumption that we impose to demonstrate the methodology described in the previous sections.

Tab. 1 summarizes the different configurations we use in this work. In particular, we investigate posteriors obtained using several different sample sizes and regularization techniques. The first column shows the abbreviation used in the text, the second the generated sample size of the mock catalog, and the third the corresponding figure.

Figure 7.
Left panel:
Posteriors of the sample redshift probability density function p(z) of the photometric sample (short: photometric redshift distribution), parametrized by the parameters n_B, for the different setups listed in Tab. 1. The x-axis shows the redshift value z, the y-axis the value of the n_B parameters. The errorbars are the [16, 84] percentiles, which would correspond to 1σ intervals for a normal distribution. The black dashed curve shows the spectroscopic redshift distribution in the binning used by the cross-correlation measurements. We consider four cases, and refer to Tab. 1 and § 8 for details on the experimental setup. We highlight a variance-dominated posterior, 'Small Sample (50k)', which shows a characteristic alternating, or 'zig-zag', pattern, as well as a comparison between the cyan 'Medium Sample (500k)' and 'Medium Sample (500k), Fid. Setup & WX' posteriors. Here, the latter includes a cross-correlation ('WX') data vector in its likelihood. This reduces the error especially in the high-redshift tail of the distributions. Right panel:
The y-axis shows the relative difference between the posterior of the photometric redshift distribution, parametrized by the n_B^post parameters, and the spectroscopic redshift distribution n_B^true (black dashed curve in the left panel).

Abbreviation                            Sample Size   Figure   Tikhonov Reg. α   Initial Binning   Effective Binning   WX
Small Sample (50k)                      50k           Fig. 7   0.1               50                50                  -
Medium Sample (500k)                    500k          Fig. 7   0.08              50                31                  -
Large Sample (5000k) Oversmoothing      5000k         Fig. 7   0.0001            25                25                  -
Medium Sample (500k), Fid. Setup & WX   500k          Fig. 7   0.08              50                31                  ✓
Tik. Regul. Low                         5000k         Fig. 8   0.0001            50                25                  -
Tik. Regul. Medium                      5000k         Fig. 8   1                 50                25                  -
Tik. Regul. High                        5000k         Fig. 8   10                50                25                  -
Oversmoothing                           5000k         Fig. 8   0.0001            25                25                  -
Tik. Regul. Low + WX                    5000k         Fig. 8   0.0001            50                25                  ✓
Table 1.
Summary of the different configurations that we test in this work. The first column lists the abbreviations, the second the generated sample size, and the third refers to the Figure where the setup is analysed. The next columns list the value of the Tikhonov regularization parameter α (see § 4.2.1), the number of initial bins, the effective bin number after (potentially) applying merging bin regularization (see § 4.2.1), and an indicator of whether the composite likelihood includes the cross-correlation data (see § 5).

The columns 'Tikhonov Reg. α', 'Initial Binning' and 'Effective Binning' list the value of the Tikhonov regularization parameter α (see § 4.2.1), as well as the initial and effective bin numbers, i.e., the histogram bin number after the merging bin regularization scheme. The final column indicates whether the cross-correlation data vector is included in the composite likelihood.

(Footnote: As an approximate rule, one can expect a noisy deconvolution, if no prior is applied, when the size of the bins is smaller than the ±1σ range of the individual galaxy redshift likelihoods for moderate sample sizes. For our redshift range and photometric redshift scatter this would imply an effective number of bins of around 30.)

Fig. 7 shows a selection of posteriors using setups from Tab. 1. The left-hand panel shows the obtained probability density function, the right panel the relative difference between these posteriors and the spectroscopic redshift distribution that is shown as the black dashed line in both panels. The errorbars are the [16, 84] percentiles, corresponding to a Gaussian ±1σ interval. The 'Small Sample (50k)' setup highlights a noisy, i.e., variance-dominated, deconvolution, where we clearly see the comparatively large and fluctuating errorbars in Fig. 7. We note that imposing a smoothing method will reduce these features.
However, this can come at the expense of additional biases, as discussed later, and the characteristic covariance structure in the posterior is not a priori problematic, as long as draws from the posteriors are bounded and well-defined. Merging bin regularization exploits this anti-correlation structure to provide an 'objective' regularization without the need to carefully motivate an external prior or smoothing model choice.

An alternative that provides additional physical motivation is the inclusion of clustering redshift measurements in the composite likelihood. This can be seen by comparing the posteriors from the 'Medium Sample (500k)' and the 'Medium Sample (500k), Fid. Setup & WX' cases. The corresponding results in Fig. 7 show that for the same regularization, the inclusion of the cross-correlation data in the likelihood decreases the uncertainties, which is especially visible in the high-redshift tail. We note that these results are dependent on the chosen galaxy-dark matter bias model. As discussed at the beginning of this section, the parametrization used here is very flexible, and the effective number of parameters can likely be reduced if a more physical model is chosen. In this regard, we can view the presented reduction in the statistical error due to clustering redshifts as conservative. As mentioned previously, biases due to ill-motivated regularization choices play an important role, especially for large sample sizes, where the statistical error is small. We illustrate this here in the 'Large Sample (5000k) Oversmoothing' case, by deliberately choosing a coarser binning without merging bin regularization. We clearly see that the statistical error is quite small, with the bias dominating.

In order to investigate the quality of probability calibration, we consider the posterior distribution over the mean values of photometric sample redshift distributions drawn from the posterior of n_B.
This is a reasonable choice, as it has been shown that accurate modelling of weak lensing and LSS critically depends on accurate recovery of the posterior mean.

(Footnote: Concretely, we draw a number of n_B realizations that each parametrize a photometric sample redshift distribution and evaluate the mean on each of these distributions.)

Fig. 8 shows five boxplots that each visualize the distribution of the posterior mean for a different setup under consideration. The box edges denote the [16, 84] percentiles, and the whiskers, i.e., the thin vertical lines with short horizontal edges, represent the [2.5, 97.5] percentiles. The horizontal line within the box represents the median. The x-axis shows the different scenarios, as listed in Tab. 1; the y-axis shows the value of the posterior mean. The middle solid black line corresponds to the mean of the true redshift distribution, shown as the dashed black line in the left panel of Fig. 7. We reiterate that all results have been obtained using a mock catalog containing 5000k galaxies. The (dashed/dotted), (grey/magenta) horizontal lines represent the requirement values for (Y1/Y10), (LSS/WL) measurements as given in the LSST DESC Science Requirements Document (DESC SRD, The LSST Dark Energy Science Collaboration et al. 2018). We note that the DESC SRD considers a tomographic analysis and not a single bin, as we do here. We therefore restrict ourselves to a qualitative comparison. Furthermore, it should be noted that higher order moments of the photometric sample redshift distribution will also correlate with cosmological parameters, especially for a clustering likelihood (see e.g. Nicola et al. 2020; Hadzhiyska et al. 2020), and our metric is therefore bound to be incomplete. Redefining these metrics and requirements is the subject of ongoing work.

(Footnote: We note that the original definition of the boxplot uses a different definition of the box size and the whiskers. We refer to Wickham & Stryjewski (2012) for more details.)

We consider three scenarios: 'Tik. Regul. Low', 'Tik. Regul. Medium' and 'Tik. Regul. High'. As can be seen from Tab. 1, these scenarios differ by their value of the Tikhonov regularization parameter α. With increasing α, the error bars decrease and the bias in the results increases. When comparing this with the 'Oversmoothing' results, we see the same pattern. This similarity in behavior arises because both a large α and a choice of large bins reduce the variance of each bin.

Finally, we show the impact of including the cross-correlation measurements in the data vector in the 'Tik. Regul. Low + WX' scenario, which adds clustering information to the 'Tik. Regul. Low' scenario. When comparing these two cases, we see that the distribution of the posterior mean is now symmetric and reasonably centered within the science requirements. We note that the uncertainties are still much larger compared with the previously considered, strongly regularized cases. This shows that while the effect of reducing the variance of the posteriors is similar when using regularization or when including cross-correlation data in the composite likelihood, the posteriors can be much better calibrated in the latter case. Using a smoothing, or regularization, method essentially makes assumptions about the true shape of the distribution without strict data evidence. In contrast, adding cross-correlations to the composite likelihood adds this information in a physical, data-driven way.

Another effect that needs consideration is the increase in the intrinsic estimator bias due to the 'downsampling' of the probability density function to a lower resolution, e.g., by picking a larger bin width or by imposing a different regularization or smoothing scheme. This loss in resolution implies that we inadvertently limit the accuracy with which small scale structure can be reconstructed in the density field along the line-of-sight.
As demonstrated and studied in detail in Rau et al. (2017), this effect can lead to biases in the cosmological parameter inference that are often small, but that would need scrutiny for upcoming data analyses. Since we gave a detailed description of this effect in Rau et al. (2017), including schemes to detect and mitigate it, we do not focus on it in detail here. However, this effect can be illustrated for the current setup, since the merging bin regularization downsamples the resolution to a relatively coarse grid of 31 bins. Furthermore, consider the distribution of true redshifts discretized using the 20 bin grid used to obtain the cross-correlation measurements. Since we use this distribution, i.e. the black curve in Fig. 7, as a reference, we also have to consider its intrinsic discretization error. Concretely, when comparing the mean estimated from this curve with the sample mean, we obtain a difference of 0.0079. While this is of the same order as the Y1 science requirements in Fig. 8, Y10 requirements will necessitate an increase in sample size or the inclusion of cross-correlation constraints that will allow us to perform inference at a higher resolution. Due to the slow expected convergence of deconvolution estimators with sample size (see e.g. Carroll & Hall 1988), it is likely that an increase in sample size of several orders of magnitude will be necessary. This is attainable given the large number of observed galaxies in LSST Y10, and our methodology can scale to large sample sizes. However, in order to reach the sample sizes that are expected for LSST observations, we need to develop an implementation that optimizes storage space and uses an efficient parallelization strategy, which is beyond the scope of this work. Alternatively, we could use a different scheme that employs a continuous model like logistic Gaussian processes (Rau et al. 2020) or Dirichlet processes.
The convergence of these density estimators will likely be better; however, they will also require additional computational overhead in the inference. A detailed study of estimator convergence is needed to settle on a recommendation and prove significant improvement over the simple histogram scheme employed here; we will leave this for future work.

Most importantly, however, it is likely that systematic errors due to the miscalibration and mis-specification of the composite likelihood, either by a suboptimal galaxy-dark matter bias model or due to miscalibrated SED likelihoods, will lead to an error budget that dominates the aforementioned errors. If the mis-specification can be parametrized and marginalized over, the variance of the parameter posteriors will be increased; otherwise it will lead to biases in the resulting parameter posteriors.

Figure 8. Boxplot illustration of the mean of the posterior sample redshift probability density function of the photometric sample (short: posterior mean) for the different experimental setups listed in Tab. 1 and detailed in § 8. The x-axis lists the different scenarios, the y-axis the value of the posterior mean. The box shows the [16, 84] percentiles; the vertical lines with horizontal edges (whiskers) show the [2.5, 97.5] percentiles, corresponding to the 1σ and 2σ intervals for the normal distribution. The horizontal line in the box is the median. The (dashed/dotted), (grey/magenta) lines correspond to the requirement on the uncertainty of the posterior mean as quoted in the LSST DESC Science Requirements Document (DESC SRD, The LSST Dark Energy Science Collaboration et al. 2018) for (Y1/Y10) (LSS/WL) measurements. The central, solid grey line is the mean of the true redshift distribution, shown as the dashed black line in the left panel of Fig. 7. All results have been obtained using a mock catalog of 5000k galaxies with photometric redshift scatter that is perfectly calibrated. We highlight the decrease in the statistical error and potential increase in systematic bias for larger regularization, going from the leftmost to the fourth case. The rightmost boxplot shows the impact of including clustering redshift information in the likelihood.
In the following section we will discuss these sources of error. We will showcase the usage of posterior predictive checks as a way to detect miscalibration and suggest procedures for consistent model checking and refinement.

We discussed and showcased our inference methodology in the previous section using idealized data. To complement the discussion, this section highlights how miscalibrated likelihoods can lead to biases in the inferred sample redshift distribution and how posterior predictive checks can be used to detect these issues. To mimic well-calibrated photometric likelihoods we use conditional density estimates from the FlexZboost package (Dalmasso et al. 2020). We note that these conditional densities are not photometric likelihoods in the sense of Eq. (34). The free parameters in the photometric likelihood are the redshifts of the galaxies and parameters that describe properties of the Spectral Energy Distribution (SED). In conditional density estimation, non-physical parameters describe a flexible model that provides a mapping between photometry and redshift. This flexible model is then fitted to known calibration data. Thus, while in SED fitting the distribution of redshift constitutes a posterior distribution, conditional density estimation treats it as a predictive distribution (often without marginalizing over the modelling uncertainty). However, since the goal of this subsection is to demonstrate potential systematic biases and uncertainties in the deconvolution operation, this difference is not of great importance here.

Figure 9. We plot the residual between the replicated and true scale-averaged cross-correlation measurement w_RB between a DESI-like spectroscopic reference ('R') and photometric base ('B') sample as a function of redshift. The horizontal line with errorbars shows the uncertainties in the original measurement. The green/magenta contours show the scenario where we marginalize over all parameters c that parametrize the redshift-dependent galaxy-dark matter bias ratio function C(z) (see § 8.1), without the systematics kernel, for the unbiased/biased (f = 0.8) cases. The yellow/blue contours consider the biased scenario (f = 0.8) with a fixed C(z), and do/do not marginalize over the systematics kernel. The error bars and contours show the [5, 95] percentiles. We see that the replicated measurements do not show significant tension with the original measurements if we either marginalize over the systematic ('Biased f=0.8 Sys. Marg.') or if we use a flexible redshift-dependent galaxy-dark matter bias model ('Biased Marg. Bias f=0.8'). Only if the form of the galaxy-dark matter bias is known to good precision (in this case we hold its values fixed) are PPCs using cross-correlations sensitive in detecting tensions.

In order to simulate the impact that a population of galaxies with inaccurately calibrated photometric redshift likelihoods has on the deconvolved redshift distribution, we consider a redshift range within which a fraction (f = 0.8) of the galaxies have a likelihood from the BPZ code, and only 20% retain their conditional density predictions from the FlexZboost code. We picked this setup because the BPZ predictions within this redshift range, while being inferior to the FlexZboost predictions, still have an acceptable quality. We perform this experiment by selecting a range in redshift because we will perform posterior predictive checks (PPC) using cross-correlations and need to control where we would expect systematics. This will allow us to disentangle model misspecification
issues from the systematics, e.g. from the 'noisy' deconvolution described previously. We note that the quality of photometric redshift likelihoods does not sharply change with redshift in this way in real photometric samples. Instead, photometric redshift quality is a complex function in color space that strongly depends, e.g., on the quality of the photometry, the number of available bands, the amount of calibration data, and the template set. Modelling this accurately is beyond the scope of this work and would require the measurement of cross-correlations in color cells and the extension of PPC to the full composite likelihood, which includes sampling the photometry of galaxies in these color cells. Using the mean of the FlexZboost conditional densities for the selection instead of the true redshifts would 'smooth out' the quality of the likelihoods as a function of redshift at the boundaries of the selected interval.

We infer the histogram parameters n_B and the parameters that govern the galaxy-dark matter bias ratio evolution, c, following the Chebyshev basis expansion described in § 8.1. For simplicity we deconvolve the redshift distribution on the same 20 bin redshift grid used in the cross-correlation data vector. For 500k galaxies, this leads to very small statistical errorbars in the deconvolution. As a simplification we can then fix the n_B posterior to its maximum likelihood value. As mentioned in the previous section, this oversmoothing will lead to biases in the n_B posteriors. However, since we only perform a posterior predictive analysis with respect to the clustering likelihood, which is less constraining than the photometric likelihood, the systematics incurred by these simplifications, and the underestimation of statistical error, are sub-dominant compared with the overall statistical error budget from the correlation function measurements.

Fig. 9 shows the sampled cross-correlation measurements from the fitted joint model, as residuals with respect to the original measurements. We showcase four scenarios. In the unbiased case we use FlexZBoost PDFs and marginalize over all c parameters. Due to the good calibration of these Machine Learning-produced conditional distributions, we obtain very similar results compared with the previous section. The reason for this success is, of course, the representative training set that would not be available in a practical application. Furthermore, we consider three scenarios with f = 0.8. The scenario shown in yellow fixes the galaxy-dark matter bias parameters (c), but marginalizes over the parameters of the systematics kernel. The blue/magenta lines fix/marginalize over the c parameters, but do not include the systematics kernel correction.

As can be seen in Fig. 9, the treatment of galaxy-dark matter bias has a profound impact on the consistency between the replicated and original cross-correlation measurements. Within the errors, the results are consistent for all scenarios except the one without systematics kernel correction that uses a fixed C(z) model. As shown by the yellow line, these biases can be corrected by the systematics kernel marginalization. This illustrates that if sufficient information about C(z) is available, the clustering likelihood alone can allow for powerful posterior predictive checks. If this is not the case, consistency tests of redshift distributions with respect to clustering redshift measurements can be misleading. Provided sufficient information on the galaxy-dark matter bias, we can parametrize the biases in the deconvolved density estimate using, e.g., a convolution with a kernel function as described in § 7.2. We show these results as the yellow lines ('Biased f = 0.8 Sys. Marg.'), where we convolve the n_B vector with a Gaussian kernel function whose width we vary over a grid of 40 steps. We see that this correction can compensate for the misspecified likelihoods even in the case of a fixed C(z) model. The degeneracy between the redshift-dependent galaxy-dark matter bias ratio model and the n_B parameters highlights the importance of performing posterior predictive checks in color space to provide additional information on the redshift distribution. However, this requires careful modelling of SEDs and the development of a transparent, reproducible analysis framework that additionally includes tests for parameter degeneracies and a model comparison framework. This is beyond the scope of this work, but will be addressed in a future paper.
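The PPC logic used in this section can be sketched generically: replicate the data vector from the fitted model many times and flag bins where the observed measurement falls far in the tails of the replicated distribution. The Gaussian noise model, the 20 per cent bias, and the 1 per cent tail threshold below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def ppc_tail_prob(w_meas, w_rep):
    """Per-bin posterior predictive tail probability: the fraction of
    replicated measurements exceeding the observed value. Values close
    to 0 or 1 indicate tension between model and data."""
    return (w_rep > w_meas[None, :]).mean(axis=0)

rng = np.random.default_rng(1)
z = np.linspace(0.05, 1.95, 20)
w_true = 0.1 * np.exp(-z)        # toy 'observed' cross-correlation amplitude
sigma = 0.005                    # assumed per-bin measurement error

# replicated data vectors: a well-specified and a 20%-biased model
w_rep_ok = w_true + rng.normal(0.0, sigma, (5000, z.size))
w_rep_bad = 0.8 * w_true + rng.normal(0.0, sigma, (5000, z.size))

p_ok = ppc_tail_prob(w_true, w_rep_ok)    # near 0.5 everywhere: no tension
p_bad = ppc_tail_prob(w_true, w_rep_bad)  # tiny in the low-z bins: tension
n_tense_ok = int(np.sum((p_ok < 0.01) | (p_ok > 0.99)))
n_tense_bad = int(np.sum((p_bad < 0.01) | (p_bad > 0.99)))
```

As in Fig. 9, the biased model is only flagged where the bias is large compared to the measurement error, which is why a fixed, well-known C(z) is needed for the check to be sensitive.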
Accurate photometric redshift inference is one of the most important challenges in large area photometric surveys like LSST, DES, HSC, or KiDS. As discussed in detail in § 3, photometric redshift inference is, from a statistical point of view, a deconvolution problem, where an underlying true redshift distribution is convolved with an SED model-dependent error distribution given by the photometric likelihood. The deconvolution inference of sample redshift distributions is not new (e.g. Padmanabhan et al. 2005; Leistedt et al. 2016; Malz & Hogg 2020), and spatial information has also been incorporated into the inference (e.g. Alarcon et al. 2020b; Sánchez & Bernstein 2019; Jones & Heavens 2019; Rau et al. 2020). We extended these prior works by developing a fast approximate inference scheme for deconvolution that combines redshift information from both the photometry and the spatial distribution of galaxies in terms of a composite likelihood ansatz. We particularly provided a discussion of regularization techniques and the tradeoff between bias and variance in the Bayesian context for medium to large sample sizes. In particular, our goal is to include the treatment of photometric redshift via the likelihood of the galaxies' photometry in the current cosmological inference framework, which is based on correlation functions. The main reason for our likelihood choice is to allow easy integration into the likelihood inference framework based on two-point statistics of galaxy density and shear fields. This is more difficult for the other approaches presented in Alarcon et al. (2020b) and Sánchez & Bernstein (2019) since, in the currently demonstrated form, the redshift information from cross-correlating the overlapping spectroscopic sample is included via an estimator and not via a likelihood (that would depend on cosmological parameters). While the works of Padmanabhan et al. (2005), Leistedt et al.
(2016); Malz & Hogg (2020) are structurally similar interms of the treatment of the photometric likelihood, they do notdiscuss the eο¬ect of including clustering information. It is notewor-thy that the early work by Padmanabhan et al. (2005) provides anexcellent, explicit discussion of regularization, which is the maindiο¬culty in the photometric redshift problem, although not in thecontext of probability calibration and the aforementioned joint in-ference framework. We summarized our approach to the challengesthat arise in the estimation of redshift distributions for samples ofgalaxies in the context of photometric surveys. Concretely, we con-sidered the combination of photometric information with two-pointstatistics, the scalability and regularization of the deconvolution in-ference in the large sample scenario, and investigate the impact of
systematics from misspecified individual galaxy photometric likelihoods, proposing parametrizations for these systematics. These achievements lay the foundations for future extensions that we discuss in the next section. In § 4.1 we described our inference methodology, designed to facilitate inference on the large galaxy catalogs expected in LSST. The scheme uses a Laplace approximation in logit space and performs inference using an iterative scheme of expectation-maximization update equations. This provides computational advantages over sampling approaches. Additionally, this methodology facilitates fast joint inference with a cross-correlation data vector (see § 5) that we included in a composite likelihood ansatz. As highlighted in § 6, this provides the possibility of additional extensions that include two-point statistics from cosmological weak lensing and galaxy-galaxy lensing measurements. As we discussed in § 3, ensemble redshift distribution inference based on a photometric likelihood is a deconvolution problem, which requires regularization to yield bounded and well-defined results. In this context, we discussed a regularization scheme that consists of a combination of Tikhonov regularization with (more importantly) a scheme that merges neighboring bins to exploit the characteristic covariance structure in the deconvolved densities. In agreement with the findings of the original paper by Kuusela (2016) that proposed and applied this scheme to the Poisson inverse problem, we find that the 'Merging Bin' scheme leads to better calibrated results compared with Tikhonov regularization and with an oversmoothing scheme that selects a coarser redshift binning for the sample redshift histograms. In order to test and discuss the quality of our posterior inference, we used data from the CosmoDC2 simulations to generate a spectroscopic DESI-like sample and a photometric mock catalog that uses an LSST-like photometric error model.
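The Tikhonov component of such a regularization scheme can be illustrated with a short sketch. This is a generic penalized least-squares deconvolution, not the paper's implementation; the second-difference smoothness operator `L` and the strength parameter `tau` are our illustrative choices:

```python
import numpy as np

def tikhonov_deconvolve(A, d, tau):
    """Solve min_n ||A n - d||^2 + tau * ||L n||^2 via the normal
    equations, where A maps the true bin heights n to the observed
    data vector d and L is a second-difference smoothness penalty."""
    n_bins = A.shape[1]
    L = (np.diag(-2.0 * np.ones(n_bins))
         + np.diag(np.ones(n_bins - 1), 1)
         + np.diag(np.ones(n_bins - 1), -1))
    return np.linalg.solve(A.T @ A + tau * L.T @ L, A.T @ d)
```

In contrast, the merging-bin scheme coarsens the binning adaptively rather than penalizing curvature, which is why it can avoid the oversmoothing biases an overly large `tau` would induce.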
These simulations allowed us to test the impact of a spectroscopic calibration sample with an inhomogeneous galaxy population as a function of redshift. We found that the ratio between the redshift-dependent galaxy-dark matter bias of the photometric and the spectroscopic sample is a smooth function of redshift if the spectroscopic calibration sample consists of a single galaxy population, and is discontinuous if the galaxy population strongly changes. We therefore employed a step-wise smooth function based on a Chebyshev polynomial expansion to parametrize this ratio. In § 8 we performed a forecast of redshift inference performance on ideal data, assuming perfectly calibrated individual galaxy redshift likelihoods. We found that, using the aforementioned merging bin regularization, we were able to produce accurate posteriors of ensemble redshift distributions. We reiterate that using other regularization schemes, like an overly large Tikhonov regularization parameter or an oversmoothing approach that picks overly wide histogram bins, can lead to significant biases in the recovered posterior mean. When compared with the DESC science requirements for WL and large scale structure measurements in terms of the mean of the photometric sample redshift distribution, we found that we can meet the DESC SRD Y1 goals and remain consistent with the DESC SRD Y10 goals with 5000k galaxies, if cross-correlations are included in the joint composite likelihood. In practical applications, however, Spectral Energy Distribution (SED) templates for galaxies will be subject to modelling biases that cannot be well calibrated using spectroscopic data (see e.g. Hartley et al. 2020). We therefore proposed to use posterior predictive checks (PPC) as a means to evaluate the quality of our inference. Here, we compared replications of the data sampled from the fitted model with the original measurement to evaluate model goodness-of-fit.
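The step-wise smooth parametrization of the bias ratio described above can be sketched as an independent Chebyshev expansion on each redshift segment, with discontinuities allowed at the segment boundaries. The helper name, segment boundaries, and coefficient values below are illustrative placeholders, not the paper's fit:

```python
import numpy as np
from numpy.polynomial import chebyshev

def bias_ratio(z, coeff_segments, z_breaks, z_range=(0.0, 2.0)):
    """Step-wise smooth bias-ratio model: one Chebyshev expansion per
    redshift segment, allowing a jump where the spectroscopic galaxy
    population changes (at each entry of z_breaks)."""
    z = np.asarray(z, dtype=float)
    edges = np.concatenate(([z_range[0]], np.asarray(z_breaks, dtype=float),
                            [z_range[1]]))
    out = np.empty_like(z)
    for i, c in enumerate(coeff_segments):
        lo, hi = edges[i], edges[i + 1]
        mask = (z >= lo) & (z <= hi)
        # map each segment to [-1, 1], the natural Chebyshev domain
        x = 2.0 * (z[mask] - lo) / (hi - lo) - 1.0
        out[mask] = chebyshev.chebval(x, c)
    return out
```

For example, `bias_ratio(z, [c_lowz, c_highz], [1.0])` would model a population change at $z = 1$ with separate low- and high-redshift expansions.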
Specifically, cross-correlation redshift inference is often used to calibrate photometric redshifts obtained using photometry (Newman 2008; Johnson et al. 2016; Davis et al. 2017). In § 8.3 we demonstrated that PPC of cross-correlation measurements can detect systematic biases in the recovered sample redshift distribution if the galaxy-dark matter bias of the photometric and spectroscopic samples is known to sufficient accuracy. In order to parametrize potential biases in the sample redshift distribution posteriors caused by misspecified photometric likelihoods, particularly over-deconvolution effects that lead to overly narrow redshift distributions, we proposed a simple Gaussian filter that, as demonstrated in § 8.3, was able to correct these biases.
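Schematically, a posterior predictive check of this kind compares a test statistic of replicated data vectors, simulated from posterior draws, against the statistic of the observed measurement. The following generic sketch is ours; the function names and choice of statistic are hypothetical, not the paper's pipeline:

```python
import numpy as np

def ppc_pvalue(posterior_samples, simulate_data, observed, stat, seed=0):
    """Posterior predictive check: for each posterior draw theta,
    simulate a replicated data vector and compare stat(replication)
    against stat(observed). Returns the posterior predictive p-value,
    the fraction of replications at least as extreme as the data."""
    rng = np.random.default_rng(seed)
    t_obs = stat(observed)
    t_rep = np.array([stat(simulate_data(theta, rng))
                      for theta in posterior_samples])
    return float(np.mean(t_rep >= t_obs))
```

A p-value near 0 or 1 flags that the fitted model fails to replicate the measured cross-correlation data vector, which is how the PPCs in § 8.3 expose misspecified likelihoods.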
10 FUTURE WORK
In future work, it will be important to extend the inference scheme developed in this paper. We plan to consider a range of extensions, e.g., iterated nested Laplace approximations (Bornkamp 2011) in logit space, the usage of more flexible distributions that can be fitted using variational inference schemes, as well as the development of specialized subsampling MCMC schemes. The different techniques will be evaluated in combination with regularization approaches based on the quality of their probability coverage. Another extension, particularly to reduce the bias in the density estimation, is to consider other parametrizations for the deconvolved density, either by employing density estimators with better mean squared error scaling, like kernel density estimators and basis function expansions, or by using methods such as logistic Gaussian processes (e.g. Rau et al. 2020). The combination of photometric and clustering information can be extended by connecting the modelling of SEDs and redshift-dependent galaxy-dark matter bias modelling via the luminosity function, as shown in van Daalen & White (2018). This also has the potential to reduce the degeneracy between SED and redshift-dependent galaxy-dark matter bias systematics. Finally, we note that the data quality of the photometry will not be the same in all areas on the sky. In order to include these field-to-field variations in the composite likelihood framework, we can, for example, condition the likelihood on the field and include a corresponding data covariance in the likelihood by employing either resampling techniques (see e.g. Davison & Hinkley 2013) or theoretical modelling (e.g. Stoyan & Stoyan 1994; Sánchez et al. 2020). To conclude, we have presented an efficient photometric redshift inference framework that combines information from both the photometry and the spatial distribution of galaxies. The methodology is designed to scale well to large samples.
We complement this framework with methods for regularization, model checking and redshift systematics parametrization. The forecasts we performed using CosmoDC2 data give us confidence that, with the additional improvements described here, the methodology presented will enable accurate and well-calibrated redshift inference for LSST and other ongoing and future large area photometric surveys.
SOFTWARE
Besides software referenced directly in the text, we performed the analyses in this work using the following software packages: the python language (van Rossum 1995), scipy (Virtanen et al. 2020), numpy (Harris et al. 2020), jupyter notebook (Kluyver et al. 2016),
ipython (Perez & Granger 2007), matplotlib (Hunter 2007) and pandas (Wes McKinney 2010).

DATA AVAILABILITY STATEMENT
The cosmoDC2 extragalactic catalog is publicly available at https://portal.nersc.gov/project/lsst/cosmoDC2/_README.html. The ancillary catalogs (photo-z and DESI-like selection for cosmoDC2) and other derived data underlying this article will be shared on reasonable request to the corresponding author. The source code that implements the algorithms presented in this article will be made available via Zenodo.
ACKNOWLEDGEMENTS
This paper has undergone internal review in the LSST Dark Energy Science Collaboration. We would like to thank the internal reviewers David Alonso, Will Hartley and David Kirkby for their insightful comments.

MMR led and planned the project from the initial idea and motivation to the experimental design. He performed the analyses in interaction with the coauthors, and prepared the paper. CBM contributed the development of clustering redshifts software, the creation of clustering redshifts photometric and spectroscopic samples, and the creation of clustering redshift data products. SJS constructed the photometric redshift catalogs and contributed to writing the corresponding portions of the paper. SW provided feedback on the method development and manuscript. RM advised on motivation, scope, experimental design, and analysis, and contributed to the editing of the paper draft. YYM developed the access tools for cosmoDC2 and photo-z data products.

MMR and RM are supported by DOE grant DE-SC0010118 and NSF grant IIS-1563887. SJS acknowledges support from DOE grant DE-SC0009999 and NSF/AURA grant N56981C. MMR was supported in part by the Simons Foundation (Simons Investigator Award, PI: Mandelbaum). This material is based upon work supported in part by the National Science Foundation through Cooperative Agreement 1258333 managed by the Association of Universities for Research in Astronomy (AURA), and the Department of Energy under Contract No. DE-AC02-76SF00515 with the SLAC National Accelerator Laboratory. Additional LSST funding comes from private donations, grants to universities, and in-kind support from LSSTC Institutional Members. Support for YYM was provided by NASA through the NASA Hubble Fellowship grant no. HST-HF2-51441.001 awarded by the Space Telescope Science Institute, which is operated by the Association of Universities for Research in Astronomy, Incorporated, under NASA contract NAS5-26555.

The DESC acknowledges ongoing support from the Institut National de Physique Nucléaire et de Physique des Particules in France; the Science & Technology Facilities Council in the United Kingdom; and the Department of Energy, the National Science Foundation, and the LSST Corporation in the United States. DESC uses resources of the IN2P3 Computing Center (CC-IN2P3, Lyon/Villeurbanne, France) funded by the Centre National de la Recherche Scientifique; the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231; STFC DiRAC HPC Facilities, funded by UK BIS National E-infrastructure capital grants; and the UK particle physics grid, supported by the GridPP Collaboration. This work was performed in part under DOE Contract DE-AC02-76SF00515.
REFERENCES
Abbott T. M. C., et al., 2018a, Phys. Rev. D, 98, 043526
Abbott T. M. C., et al., 2018b, ApJS, 239, 18
Aihara H., et al., 2018, PASJ, 70, S4
Alarcon A., et al., 2020a, arXiv e-prints, p. arXiv:2007.11132
Alarcon A., Sánchez C., Bernstein G. M., Gaztañaga E., 2020b, MNRAS, 498, 2614
Albrecht A., et al., 2006, arXiv e-prints, pp astro-ph/0609591
Arnouts S., Cristiani S., Moscardini L., Matarrese S., Lucchin F., Fontana A., Giallongo E., 1999, MNRAS, 310, 540
Atchison J., Shen S., 1980, Biometrika, 67, 261
Benítez N., 2000, ApJ, 536, 571
Benjamin J., et al., 2013, MNRAS, 431, 1547
Benson A. J., 2012, New Astron., 17, 175
Bernstein G., Huterer D., 2010, MNRAS, 401, 1399
Bishop C. M., 2006, Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag, Berlin, Heidelberg
Bonnett C., 2015, MNRAS, 449, 1043
Bornkamp B., 2011, Journal of Computational and Graphical Statistics, 20, 656
Box G. E. P., Muller M. E., 1958, Ann. Math. Statist., 29, 610
Brown M. J. I., et al., 2014, ApJS, 212, 18
Carrasco Kind M., Brunner R. J., 2013, MNRAS, 432, 1483
Carroll R. J., Hall P., 1988, Journal of the American Statistical Association, 83, 1184
Cawthon R., et al., 2020, arXiv e-prints, p. arXiv:2012.12826
Chang C., et al., 2016, MNRAS, 459, 3203
Chen T., Guestrin C., 2016, in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD '16. ACM, New York, NY, USA, pp 785–794, doi:10.1145/2939672.2939785, http://doi.acm.org/10.1145/2939672.2939785
Clerkin L., Kirk D., Lahav O., Abdalla F. B., Gaztañaga E., 2015, MNRAS, 448, 1389
Collister A. A., Lahav O., 2004, Publications of the Astronomical Society of the Pacific, 116, 345
Craig I. J. D., Brown J. C., 1986, Inverse problems in astronomy. A guide to inversion strategies for remotely sensed data
DESI Collaboration et al., 2016, arXiv e-prints, p. arXiv:1611.00036
Dalmasso N., Pospisil T., Lee A. B., Izbicki R., Freeman P. E., Malz A. I., 2020, Astronomy and Computing, 30, 100362
Davis C., et al., 2017, arXiv e-prints, p. arXiv:1710.02517
Davison A. C., Hinkley D. V., 2013, Bootstrap Methods and Their Application. Cambridge University Press, USA
Feldmann R., et al., 2006, MNRAS, 372, 565
Gatti M., et al., 2018, MNRAS, 477, 1664
Gatti M., et al., 2020, arXiv e-prints, p. arXiv:2012.08569
Gelman A., Meng X.-L., Stern H., 1996, Statistica Sinica, pp 733–807
Gelman A., Hwang J., Vehtari A., 2013, arXiv e-prints, p. arXiv:1307.5928
Gerdes D. W., Sypniewski A. J., McKay T. A., Hao J., Weis M. R., Wechsler R. H., Busha M. T., 2010, ApJ, 715, 823
Graham M. L., et al., 2020, AJ, 159, 258
Greisel N., Seitz S., Drory N., Bender R., Saglia R. P., Snigula J., 2015, MNRAS, 451, 1848
Hadzhiyska B., Alonso D., Nicola A., Slosar A., 2020, arXiv e-prints, p. arXiv:2007.14989
Hahn C., Beutler F., Sinha M., Berlind A., Ho S., Hogg D. W., 2019, MNRAS, 485, 2956
Harris C. R., et al., 2020, Nature, 585, 357–362
Hartley W. G., et al., 2020, MNRAS, 496, 4769
Hearin A., Korytov D., Kovacs E., Benson A., Aung H., Bradshaw C., Campbell D., LSST Dark Energy Science Collaboration 2020, MNRAS, 495, 5040
Heitmann K., et al., 2019, ApJS, 245, 16
Heymans C., et al., 2020, arXiv e-prints, p. arXiv:2007.15632
Hikage C., et al., 2019, PASJ, 71, 43
Hildebrandt H., et al., 2017, MNRAS, 465, 1454
Hildebrandt H., et al., 2020, arXiv e-prints, p. arXiv:2007.15635
Hoyle B., 2016, Astronomy and Computing, 16, 34
Hoyle B., Rau M. M., 2019, MNRAS, 485, 3642
Hoyle B., Rau M. M., Bonnett C., Seitz S., Weller J., 2015, MNRAS, 450, 305
Hoyle B., et al., 2018, MNRAS, 478, 592–610
Hunter J. D., 2007, Computing in Science Engineering, 9, 90
Huterer D., Takada M., Bernstein G., Jain B., 2006, MNRAS, 366, 101
Huterer D., Lin H., Busha M. T., Wechsler R. H., Cunha C. E., 2014, MNRAS, 444, 129
Ilbert O., et al., 2006, A&A, 457, 841
Ivezić Ž., et al., 2019, ApJ, 873, 111
Izbicki R., Lee A. B., 2017, Electron. J. Statist., 11, 2800
Johnson A., et al., 2016, MNRAS, 465, 4118
Jones D. M., Heavens A. F., 2019, MNRAS, 483, 2487
Joudaki S., et al., 2018, MNRAS, 474, 4894
Joudaki S., et al., 2020, A&A, 638, L1
Kalmbach J. B., Connolly A. J., 2017, AJ, 154, 277
Kluyver T., et al., 2016, in Loizides F., Schmidt B., eds, Positioning and Power in Academic Publishing: Players, Agents and Agendas. pp 87–90
Korytov D., et al., 2019, ApJS, 245, 26
Kress R., 1998, Numerical Analysis. Graduate Texts in Mathematics, Springer New York, https://books.google.com.na/books?id=R6182rh0tKEC
Kuusela M. J., 2016, PhD thesis, Lausanne, EPFL, doi:10.5075/epfl-thesis-7118, http://infoscience.epfl.ch/record/220015
Laureijs R., et al., 2011, arXiv e-prints, p. arXiv:1110.3193
Leistedt B., Mortlock D. J., Peiris H. V., 2016, MNRAS, 460, 4258
Ma Z., Hu W., Huterer D., 2006, ApJ, 636, 21
Malz A. I., Hogg D. W., 2020, arXiv e-prints, p. arXiv:2007.12178
Mandelbaum R., 2018, ARA&A, 56, 393
Matarrese S., Coles P., Lucchin F., Moscardini L., 1997, MNRAS, 286, 115
McLeod M., Balan S. T., Abdalla F. B., 2017, MNRAS, 466, 3558
McQuinn M., White M., 2013, MNRAS, 433, 2857
Meister A., 2009, Deconvolution Problems in Nonparametric Statistics. Lecture Notes in Statistics, Springer Berlin Heidelberg, https://books.google.de/books?id=ItGkJPZQj-MC
Ménard B., Scranton R., Schmidt S., Morrison C., Jeong D., Budavari T., Rahman M., 2013, arXiv e-prints, p. arXiv:1303.4722
Morrison C. B., Hildebrandt H., Schmidt S. J., Baldry I. K., Bilicki M., Choi A., Erben T., Schneider P., 2017, MNRAS, 467, 3576
Myles J., et al., 2020, arXiv e-prints, p. arXiv:2012.08566
Neal R. M., Hinton G. E., 1993, in Learning in Graphical Models. Kluwer Academic Publishers, pp 355–368
Newman J. A., 2008, ApJ, 684, 88
Newman J. A., et al., 2015, Astroparticle Physics, 63, 81
Nicola A., et al., 2020, J. Cosmology Astropart. Phys., 2020, 044
Padmanabhan N., et al., 2005, MNRAS, 359, 237
Pawitan Y., 2001, In All Likelihood: Statistical Modelling and Inference Using Likelihood. Oxford science publications, OUP Oxford, https://books.google.com/books?id=M-3pSCVxV5oC
Perez F., Granger B. E., 2007, Computing in Science Engineering, 9, 21
Prat J., et al., 2018, MNRAS, 473, 1667
Quiroz M., Villani M., Kohn R., Tran M.-N., Dang K.-D., 2018, arXiv e-prints, p. arXiv:1807.08409
Raccanelli A., Rahman M., Kovetz E. D., 2017, MNRAS, 468, 3650
Ranganathan A., 2004, Assumed Density Filtering,
Rau M. M., Seitz S., Brimioulle F., Frank E., Friedrich O., Gruen D., Hoyle B., 2015, MNRAS, 452, 3710
Rau M. M., Hoyle B., Paech K., Seitz S., 2017, MNRAS, 466, 2927
Rau M. M., Wilson S., Mandelbaum R., 2020, MNRAS, 491, 4768
Raue A., Kreutz C., Theis F., Timmer J., 2013, Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences, 371, 20110544
Rothenberg T. J., 1971, Econometrica, 39, 577
Salvato M., Ilbert O., Hoyle B., 2019, Nature Astronomy, 3, 212
Sánchez C., Bernstein G. M., 2019, MNRAS, 483, 2801
Sánchez C., Raveri M., Alarcon A., Bernstein G. M., 2020, MNRAS
Schmidt S. J., Ménard B., Scranton R., Morrison C., McBride C. K., 2013, MNRAS, 431, 3307
Scottez V., et al., 2016, MNRAS, 462, 1683
Scranton R., et al., 2005, ApJ, 633, 589
Simon P., Hilbert S., 2018, A&A, 613, A15
Speagle J. S., Eisenstein D. J., 2015, arXiv e-prints, p. arXiv:1510.08073
Spergel D., et al., 2015, arXiv e-prints, p. arXiv:1503.03757
Stölzner B., Joachimi B., Korn A., Hildebrandt H., Wright A. H., 2020, arXiv e-prints, p. arXiv:2012.07707
Stoyan D., Stoyan H., 1994, Fractals, Random Shapes and Point Fields: Methods of Geometrical Statistics. Wiley Series in Probability and Statistics, Wiley, https://books.google.com/books?id=Dw3vAAAAMAAJ
Tagliaferri R., Longo G., Andreon S., Capozziello S., Donalek C., Giordano G., 2003, Neural Networks for Photometric Redshifts Evaluation. pp 226–234, doi:10.1007/978-3-540-45216-4_26
Tanaka M., et al., 2018, PASJ, 70, S9
The LSST Dark Energy Science Collaboration et al., 2018, arXiv e-prints, p. arXiv:1809.01669
Uitert E., et al., 2017, MNRAS, 476
Varin C., Reid N., Firth D., 2011, Statist. Sinica, pp 5–42
Virtanen P., et al., 2020, Nature Methods, 17, 261
Wes McKinney 2010, in Stéfan van der Walt, Jarrod Millman, eds, Proceedings of the 9th Python in Science Conference. pp 56–61, doi:10.25080/Majora-92bf1922-00a
Wickham H., Stryjewski L., 2012, Technical report, 40 years of boxplots. had.co.nz
Zhou R., et al., 2020a, arXiv e-prints, p. arXiv:2001.06018
Zhou R., et al., 2020b, Research Notes of the American Astronomical Society, 4, 181
van Daalen M. P., White M., 2018, MNRAS, 476, 4649
van Rossum G., 1995, Technical Report CS-R9526, Python tutorial. Centrum voor Wiskunde en Informatica (CWI), Amsterdam
APPENDIX A: DERIVING THE E-M UPDATE EQUATIONS
We start the discussion with an intuitive motivation for the theoretical foundation of the E-M algorithm. Assume a 'system' that consists of hidden variables $Y$ and observed variables $Z$. We wish to find a set of parameters $\theta$ that maximize the joint distribution of both variables given $\theta$. We know from statistical physics that the free energy of this system, $F(q, \theta)$, should be minimized; it depends on the distribution of hidden variables, or states, $q(y)$, and the parameters of the conditional $p(y \,|\, z, \theta)$. The E-M algorithm performs this minimization iteratively, where we assume an initial choice for $\theta$. In the E-step we choose a distribution $q(y)$, while holding $\theta$ fixed, so that $F(q, \theta^{\rm old})$ is minimized. In the subsequent M-step we hold $q$ fixed, but choose $\theta$ in a way that $F(q^{\rm old}, \theta)$ is minimized. This procedure is iterated until the free energy does not change much with additional iterations, i.e., the scheme converges. In practical calculations the connection with the variational free energy is not often used, but it is a useful concept to build up an intuitive understanding of the method. We refer the interested reader to Neal & Hinton (1993) for a more detailed explanation. In the following we describe the derivation of the E-M algorithm in the concrete context of finding the maximum likelihood solution of our photometric likelihood. For that we will use a different notation; however, the intuition remains unchanged if we associate the free energy (up to a sign) with the term $\mathcal{L}(q, \boldsymbol{\rho})$ in Eq. (A3). To derive the Expectation-Maximization algorithm, we first introduce the parameter vector $\boldsymbol{\zeta}_i$ for each galaxy $i$: an $N_{\rm tot}$-dimensional vector that indicates bin membership in the '1-hot encoding' scheme.
This means that for each galaxy we have an $N_{\rm tot}$-dimensional binary vector, where a '1' ('0') indicates that the galaxy resides (does not reside) in the respective redshift bin. For the remainder of this section we will work with the normed histogram bin heights $\boldsymbol{\rho} = \mathbf{n}_B \, \Delta z$ that parametrize the prior distribution over $z$ and $t$ as defined in Eq. (13). Denoting by $\mathbf{Z}$ the collection of $\boldsymbol{\zeta}$ vectors of all galaxies, the prior distribution over $\mathbf{Z}$ is then given as
$$ p(\mathbf{Z} \,|\, \boldsymbol{\rho}) \propto \prod_{k=1}^{N_{\rm tot}} \prod_{i=1}^{N_{\rm gal}} \rho_k^{\zeta_{i,k}} \, , \quad {\rm (A1)} $$
where $\sum_{k=1}^{N_{\rm tot}} \rho_k = 1$. Using $\mathbf{Z}$ we can write the conditional distribution of the measured photometry given $\mathbf{Z}$ as
$$ p(\tilde{\mathbf{F}} \,|\, \mathbf{Z}, \boldsymbol{\rho}) \propto \prod_{j=1}^{N_{\rm gal}} \prod_{k=1}^{N_{\rm tot}} \left( \int_{z_{k,{\rm lo}}}^{z_{k,{\rm hi}}} \mathrm{d}z_j \int_{t_{k,{\rm lo}}}^{t_{k,{\rm hi}}} \mathrm{d}t_j \; p\big(\mathbf{f}_j \,\big|\, T(z_j, t_j), \Sigma_j\big) \right)^{\zeta_{j,k}} \, . \quad {\rm (A2)} $$
It is important at this point to note that a marginalization over the parameter vectors $\boldsymbol{\zeta}_j$ for all galaxies will yield the second, i.e. the likelihood, term in Eq. (15). To derive the iterative optimization scheme we first consider the decomposition of the posterior as
$$ \log p(\boldsymbol{\rho} \,|\, \tilde{\mathbf{F}}) = \mathcal{L}(q, \boldsymbol{\rho}) + \mathrm{KL}(q \,\|\, p) + \log p(\boldsymbol{\rho}) - \log p(\tilde{\mathbf{F}}) \, , \quad {\rm (A3)} $$
where
$$ \mathcal{L}(q, \boldsymbol{\rho}) = \sum_{\mathbf{Z}} q(\mathbf{Z}) \log\!\left( \frac{p(\tilde{\mathbf{F}}, \mathbf{Z} \,|\, \boldsymbol{\rho})}{q(\mathbf{Z})} \right) \quad {\rm (A4)} $$
$$ \mathrm{KL}(q \,\|\, p) = - \sum_{\mathbf{Z}} q(\mathbf{Z}) \log\!\left( \frac{p(\mathbf{Z} \,|\, \tilde{\mathbf{F}}, \boldsymbol{\rho})}{q(\mathbf{Z})} \right) \, . \quad {\rm (A5)} $$
This decomposition implies an iterative scheme to maximize $\log p(\boldsymbol{\rho} \,|\, \tilde{\mathbf{F}})$. Given an initial parameter vector $\boldsymbol{\rho}^{\rm old}$, we first minimize $\mathrm{KL}(q \,\|\, p)$ in the 'E-step', which directly implies $q(\mathbf{Z}) = p(\mathbf{Z} \,|\, \tilde{\mathbf{F}}, \boldsymbol{\rho}^{\rm old})$. In the 'M'-step we fix the distribution $q(\mathbf{Z})$ and maximize $\mathcal{L}(q, \boldsymbol{\rho})$. This maximization directive is then given as
$$ \mathcal{L}(q, \boldsymbol{\rho}) = Q\big(\boldsymbol{\rho}, \boldsymbol{\rho}^{\rm old}\big) = \sum_{\mathbf{Z}} p(\mathbf{Z} \,|\, \tilde{\mathbf{F}}, \boldsymbol{\rho}^{\rm old}) \log p(\tilde{\mathbf{F}}, \mathbf{Z} \,|\, \boldsymbol{\rho}) + {\rm const.} \, , \quad {\rm (A6)} $$
which is the expectation of the data log-likelihood with respect to $\mathbf{Z}$. After a new parameter vector $\boldsymbol{\rho}^{\rm new}$ is obtained, we continue with the 'E'-step holding $\boldsymbol{\rho}^{\rm new}$ fixed. This process is continued until convergence. The following derivation proceeds in analogy to the derivation of the E-M update equations for the Gaussian mixture model (see Bishop 2006); we refer to bins as defined in Eq. (13). E-step:
Given an old parameter vector $\boldsymbol{\rho}^{\rm old}$ we evaluate
$$ q(\mathbf{Z} \,|\, \tilde{\mathbf{F}}, \boldsymbol{\rho}^{\rm old}) \propto \prod_{j=1}^{N_{\rm gal}} \prod_{k=1}^{N_{\rm tot}} \left( \rho_k^{\rm old} \int_{z_{k,{\rm lo}}}^{z_{k,{\rm hi}}} \mathrm{d}z_j \int_{t_{k,{\rm lo}}}^{t_{k,{\rm hi}}} \mathrm{d}t_j \; p\big(\mathbf{f}_j \,\big|\, T(z_j, t_j), \Sigma_j\big) \right)^{\zeta_{j,k}} \, . \quad {\rm (A7, A8)} $$
M-step:
In the maximization step of the algorithm we want to maximize the expected data log-likelihood with respect to the parameter $\mathbf{Z}$. In the following we abbreviate the bin-integrated photometric likelihood of galaxy $j$ as $I_{jk} = \int_{z_{k,{\rm lo}}}^{z_{k,{\rm hi}}} \mathrm{d}z_j \int_{t_{k,{\rm lo}}}^{t_{k,{\rm hi}}} \mathrm{d}t_j \; p(\mathbf{f}_j \,|\, T(z_j, t_j), \Sigma_j)$. Given the updated posterior $q(\mathbf{Z} \,|\, \tilde{\mathbf{F}}, \boldsymbol{\rho}^{\rm old})$, this expectation is given as
$$ Q\big(\boldsymbol{\rho}^{\rm new}, \boldsymbol{\rho}^{\rm old}\big) = \sum_{j=1}^{N_{\rm gal}} \sum_{k=1}^{N_{\rm tot}} E\big[\zeta_{jk}\big] \Big( \log \rho_k + \log I_{jk} \Big) \, , \quad {\rm (A9)} $$
where
$$ E\big[\zeta_{jk}\big] = \frac{\rho_k^{\rm old} \, I_{jk}}{\sum_{l=1}^{N_{\rm tot}} \rho_l^{\rm old} \, I_{jl}} \, . \quad {\rm (A10)} $$
We optimize $Q(\boldsymbol{\rho}^{\rm new}, \boldsymbol{\rho}^{\rm old})$ under the constraint $\sum_k \rho_k = 1$ using a Lagrange multiplier $\lambda$:
$$ \tilde{Q}\big(\boldsymbol{\rho}^{\rm new}, \boldsymbol{\rho}^{\rm old}\big) = Q\big(\boldsymbol{\rho}^{\rm new}, \boldsymbol{\rho}^{\rm old}\big) + \lambda \left( \sum_k \rho_k - 1 \right) \, . \quad {\rm (A11)} $$
Equating $\partial_{\rho_k} \tilde{Q}(\boldsymbol{\rho}^{\rm new}, \boldsymbol{\rho}^{\rm old}) = 0$, performing a summation over $k$, and using the summation constraint on $\boldsymbol{\rho}$, we obtain
$$ \lambda = - N_{\rm gal} \, . \quad {\rm (A12)} $$
This leads to the update equations for the E-M scheme that are iterated until we reach convergence in $\boldsymbol{\rho}$ (in practice, until the log-likelihood changes only by an extremely small amount):
$$ \nu_k^{t} = \rho_k^{t-1} \sum_{i=1}^{N_{\rm gal}} \frac{I_{ik}}{\sum_{l=1}^{N_{\rm tot}} \rho_l^{t-1} \, I_{il}} \quad {\rm (A13)} $$
$$ \rho_k^{t} = \frac{\nu_k^{t}}{\sum_{l} \nu_l^{t}} \, . \quad {\rm (A14)} $$
In Appendix B we derive a Laplace approximation to the posterior based on this optimization scheme. We apply and discuss this scheme in § 4 and § 8.

APPENDIX B: DERIVING THE LAPLACE APPROXIMATION
In the previous appendix we derived an iterative scheme to obtain maximum likelihood estimates of the vector of normed histogram heights $\boldsymbol{\rho}_{\rm ML}$ based on the E-M algorithm. We note that the E-M algorithm is guaranteed to produce a maximum likelihood estimate $\boldsymbol{\rho}_{\rm ML}$ that lies on the simplex. The direct application of the Laplace approximation will effectively estimate Gaussian errors on these values. Applying this approximation around $\boldsymbol{\rho}_{\rm ML}$ will lead to posteriors that reach to negative values, i.e. the posterior draws are not guaranteed to lie on the simplex. To extend the Laplace approximation to random variables that lie on the simplex, we first consider a mapping from simplex space to $\mathbb{R}^{N_{\rm bins}-1}$. This mapping is realized by the additive logistic transformation. For $\mathbf{y} \in \mathbb{R}^{N_{\rm bins}-1}$ we define the function
$$ \sigma(\mathbf{y}) = \left( \frac{e^{y_1}}{1 + \sum_{i=1}^{N_{\rm bins}-1} e^{y_i}}, \; \ldots, \; \frac{e^{y_{N_{\rm bins}-1}}}{1 + \sum_{i=1}^{N_{\rm bins}-1} e^{y_i}}, \; \frac{1}{1 + \sum_{i=1}^{N_{\rm bins}-1} e^{y_i}} \right)^{T} , \quad {\rm (B1)} $$
with its inverse
$$ \mathbf{y}(\boldsymbol{\rho}) = \big[ \log(\rho_1/\rho_{N_{\rm bins}}), \; \ldots, \; \log(\rho_{N_{\rm bins}-1}/\rho_{N_{\rm bins}}) \big] . \quad {\rm (B2)} $$
We see that the transformed variables $\mathbf{y}$ are now defined in real space, and we can perform the Laplace approximation as usual. Assuming a flat prior in logistic space, we can directly utilize the invariance of the maximum likelihood estimate under variable transformations (see e.g. Pawitan 2001) and approximate the posterior as
$$ p(\mathbf{y} \,|\, \tilde{\mathbf{F}}) = \mathcal{N}\big(\mathbf{y} \,\big|\, \boldsymbol{\mu}^{\mathbf{y}}_{\rm ML}, \, \boldsymbol{\Sigma}_{\mathbf{y}}\big) , \quad {\rm (B3)} $$
where
$$ \boldsymbol{\mu}^{\mathbf{y}}_{\rm ML} = \mathbf{y}(\boldsymbol{\rho}_{\rm ML}) \quad {\rm (B4)} $$
and
$$ \boldsymbol{\Sigma}_{\mathbf{y}} = - \mathbf{H}^{-1} \Big|_{\mathbf{y} = \mathbf{y}_{\rm ML}} . \quad {\rm (B5)} $$
Here $\mathbf{H}$ is the Hessian of the log-likelihood (the second term in Eq. 15) as a function of $\mathbf{y}$, evaluated at $\mathbf{y}(\boldsymbol{\rho}_{\rm ML})$. Writing $\boldsymbol{\rho} = \sigma(\mathbf{y})$, the components of the Hessian are given as
$$ H_{ab} = - \sum_{i=1}^{N_{\rm gal}} \frac{\Big( \sum_{k=1}^{N_{\rm tot}} \frac{\partial \rho_k}{\partial y_a} I_{ik} \Big) \Big( \sum_{k=1}^{N_{\rm tot}} \frac{\partial \rho_k}{\partial y_b} I_{ik} \Big)}{\Big( \sum_{k=1}^{N_{\rm tot}} \rho_k I_{ik} \Big)^{2}} + \sum_{i=1}^{N_{\rm gal}} \frac{\sum_{k=1}^{N_{\rm tot}} \frac{\partial^{2} \rho_k}{\partial y_a \partial y_b} I_{ik}}{\sum_{k=1}^{N_{\rm tot}} \rho_k I_{ik}} , \quad {\rm (B6, B7)} $$
where
$$ I_{ik} = \int_{z_{k,{\rm lo}}}^{z_{k,{\rm hi}}} \mathrm{d}z_i \int_{t_{k,{\rm lo}}}^{t_{k,{\rm hi}}} \mathrm{d}t_i \; p\big(\tilde{\mathbf{f}}_i \,\big|\, T(z_i, t_i), \Sigma_i\big) . \quad {\rm (B8)} $$
The first and second order derivatives evaluate to
$$ \frac{\partial \rho_k}{\partial y_a} = \begin{cases} \rho_a (1 - \rho_a) , & k = a \;\wedge\; k < N_{\rm bins} \\ - \rho_k \rho_a , & k \neq a \;\wedge\; k < N_{\rm bins} \\ - \rho_a \big/ \big( 1 + \sum_{z=1}^{N_{\rm bins}-1} e^{y_z} \big) , & k = N_{\rm bins} \end{cases} \quad {\rm (B9)} $$
and
$$ \frac{\partial^{2} \rho_k}{\partial y_b \, \partial y_a} = \begin{cases} \frac{\partial \rho_a}{\partial y_b} - 2 \rho_a \frac{\partial \rho_a}{\partial y_b} , & k = a \;\wedge\; k < N_{\rm bins} \\ - \frac{\partial \rho_k}{\partial y_b} \rho_a - \rho_k \frac{\partial \rho_a}{\partial y_b} , & k \neq a \;\wedge\; k < N_{\rm bins} \\ - \Big( \frac{\partial \rho_a}{\partial y_b} - \rho_a \rho_b \Big) \big/ \big( 1 + \sum_{z=1}^{N_{\rm bins}-1} e^{y_z} \big) , & k = N_{\rm bins} . \end{cases} \quad {\rm (B10)} $$
Transformed into probability, or simplex, space, this posterior is then identified as a logit-normal distribution,
$$ p(\boldsymbol{\rho} \,|\, \tilde{\mathbf{F}}) \propto \frac{1}{\sqrt{|2\pi \boldsymbol{\Sigma}_{\mathbf{y}}|}} \, \frac{1}{\prod_{i=1}^{N_{\rm bins}} \rho_i} \exp\left( - \frac{1}{2} \left( \log\Big( \frac{\boldsymbol{\rho}_{-N_{\rm bins}}}{\rho_{N_{\rm bins}}} \Big) - \boldsymbol{\mu}^{\mathbf{y}}_{\rm ML} \right)^{T} \boldsymbol{\Sigma}^{-1}_{\mathbf{y}} \left( \log\Big( \frac{\boldsymbol{\rho}_{-N_{\rm bins}}}{\rho_{N_{\rm bins}}} \Big) - \boldsymbol{\mu}^{\mathbf{y}}_{\rm ML} \right) \right) , \quad {\rm (B11, B12)} $$
where $\boldsymbol{\rho}_{-N_{\rm bins}}$ denotes $\boldsymbol{\rho}$ without its last component and the logarithm acts element-wise. We note that the logit-normal is a probability distribution on the simplex, just as the Dirichlet. In fact, the Dirichlet can be approximated well by a logit-normal (Atchison & Shen 1980); however, the logit-normal allows for a more complex covariance structure. The scheme developed in this appendix is applied and analysed in § 4 and § 8.

This paper has been typeset from a TeX/LaTeX file prepared by the author.
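As a closing illustration, the two appendix schemes combine into a short numerical pipeline: E-M maximization of the histogram heights (Eqs. A13, A14) followed by sampling the logit-space Laplace approximation (Eqs. B1-B3). This is a minimal numpy sketch under the assumption that the bin-integrated likelihood matrix `I` and the logit-space covariance `cov_y` have already been computed; the function names are ours:

```python
import numpy as np

def em_histogram_heights(I, n_iter=500, tol=1e-10):
    """E-M updates of Eqs. (A13)-(A14) for the normed histogram heights.
    I[i, k] is the photometric likelihood of galaxy i integrated over
    redshift bin k, precomputed once outside the loop."""
    n_gal, n_bins = I.shape
    rho = np.full(n_bins, 1.0 / n_bins)       # uniform start on the simplex
    for _ in range(n_iter):
        resp = rho * I                        # E-step responsibilities, Eq. (A10)
        resp /= resp.sum(axis=1, keepdims=True)
        rho_new = resp.sum(axis=0) / n_gal    # M-step, Eqs. (A13)-(A14)
        if np.max(np.abs(rho_new - rho)) < tol:
            return rho_new
        rho = rho_new
    return rho

def to_logit(rho):
    """Additive logistic transform y(rho), Eq. (B2)."""
    return np.log(rho[:-1] / rho[-1])

def to_simplex(y):
    """Inverse transform sigma(y), Eq. (B1): back onto the simplex."""
    e = np.exp(y)
    return np.concatenate((e, [1.0])) / (1.0 + e.sum())

def sample_laplace_posterior(rho_ml, cov_y, n_samples, seed=0):
    """Draw posterior samples: Gaussian in logit space (Eq. B3),
    mapped back to the simplex, i.e. a logit-normal (Eq. B11)."""
    rng = np.random.default_rng(seed)
    ys = rng.multivariate_normal(to_logit(rho_ml), cov_y, size=n_samples)
    return np.array([to_simplex(y) for y in ys])
```

Every sample returned by `sample_laplace_posterior` lies on the simplex by construction, in contrast to a Laplace approximation applied directly to the histogram heights.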