Weak lensing for precision cosmology
Rachel Mandelbaum
McWilliams Center for Cosmology, Department of Physics, Carnegie Mellon University, Pittsburgh, PA 15213, USA; email: [email protected]
Keywords: gravitational lensing, methods: data analysis, methods: statistical, techniques: image processing, cosmological parameters, cosmology: observations
Abstract
Weak gravitational lensing, the deflection of light by mass, is one of the best tools to constrain the growth of cosmic structure with time and reveal the nature of dark energy. I discuss the sources of systematic uncertainty in weak lensing measurements and their theoretical interpretation, including our current understanding and options for future improvement. These include long-standing concerns such as the estimation of coherent shears from galaxy images or redshift distributions of galaxies selected based on photometric redshifts, along with systematic uncertainties that have received less attention to date because they are subdominant contributors to the error budget in current surveys. I also discuss methods for automated systematics detection using survey data of the 2020s. The goal of this review is to describe the current state of the field and what must be done so that if weak lensing measurements lead toward surprising conclusions about key questions such as the nature of dark energy, those conclusions will be credible.
Contents
1. INTRODUCTION
2. FROM IMAGES TO CATALOGS
   2.1. PSF modeling
   2.2. Detector systematics
   2.3. Detection and deblending
   2.4. Image combination
   2.5. Selection bias
   2.6. Other aspects of the image processing
   2.7. Shear estimation
   2.8. Photometric redshifts
   2.9. Masks and survey geometry
3. FROM CATALOGS TO SCIENCE
   3.1. Estimators
   3.2. Redshift distributions and bins
   3.3. Theoretical predictions
   3.4. Intrinsic alignments
   3.5. Baryonic effects
   3.6. Covariances
   3.7. Inference
4. DETECTING AND MODELING OBSERVATIONAL SYSTEMATICS
5. SUMMARY
1. INTRODUCTION
Gravitational lensing is the deflection of light rays from distant objects by the matter – including dark matter – along their path to us. In the limit that the deflections cause small modifications of the object properties (position, size, brightness, and shape) but not visually striking phenomena such as multiple images or arcs, lensing is referred to as "weak lensing" (for recent reviews, see Kilbinger 2015; Dodelson 2017). Since light from distant sources that are near each other on the sky must pass by nearby structures in the cosmic web (see Figure 1), their shapes are correlated by lensing. This correlation drops with separation on the sky; its amplitude and scale-dependence can be used to infer the underlying statistical distribution of matter and hence the growth of cosmic structure with time. This in turn allows us to infer the properties of dark energy (for early work along these lines, see Hu 2002; Huterer 2002), because the accelerated expansion of the Universe that it causes suppresses the clustering of matter driven by gravity.

The recognition that weak lensing provides the power to constrain the cause of the accelerated expansion rate of the Universe has driven the development of ever-larger weak lensing surveys; see, e.g., the Dark Energy Task Force report (Albrecht et al. 2006) and the 2010 decadal survey (Decadal Survey Committee 2010). However, weak lensing can also be used to study the galaxy-dark matter halo connection (e.g., Coupon et al. 2015; Hudson et al. 2015; van Uitert et al. 2015; Mandelbaum et al. 2016) and constrain neutrino masses (e.g., Abazajian & Dodelson 2003; Abazajian et al. 2011; DES Collaboration et al. 2017).

The correlation function of galaxy shapes (shear-shear correlations) is often referred to as 'cosmic shear'. Making these measurements requires (1) the analysis of images to infer the weak lensing distortions ('shear'), where deviations from purely random galaxy orientations are assumed to arise due to lensing; (2) estimates of distances to the galaxies involved, in order to interpret the shape distortions in terms of cosmological parameters; and (3) a host of supporting data, e.g., to confirm the calibration of the redshift estimates and inferred shear. Interpreting them requires the ability to make predictions about the growth of cosmic structure at late times, well into the nonlinear regime.

Figure 1: The left panel (from LSST Science Collaboration et al. 2009, used with permission from image creator, Tony Tyson) illustrates the basic shear distortion caused by weak gravitational lensing, for a single lens and source. The right panel (image credit: Canada-France-Hawaii Telescope) shows the coherent patterns induced in source shapes (blue ellipses) due to large-scale structure; the color scale indicates the density in the simulation.

Weak lensing can be described as a linear transformation between unlensed $(x_u, y_u)$ and lensed coordinates $(x_l, y_l)$, where the origins of the coordinate systems are at the unlensed and lensed positions of the galaxy:
\[
\begin{pmatrix} x_u \\ y_u \end{pmatrix} =
\begin{pmatrix} 1-\kappa-\gamma_1 & -\gamma_2 \\ -\gamma_2 & 1-\kappa+\gamma_1 \end{pmatrix}
\begin{pmatrix} x_l \\ y_l \end{pmatrix}. \tag{1}
\]
There are two components of the complex-valued lensing shear $\gamma = \gamma_1 + i\gamma_2$, which describes the stretching of galaxy images due to lensing, and the convergence $\kappa$, which describes a change in size and brightness of lensed objects. The shear has elliptical symmetry, and hence transforms like a spin-2 quantity.
Since we do not know the unlensed distribution of galaxy sizes very precisely, it is common to write this as
\[
\begin{pmatrix} x_u \\ y_u \end{pmatrix} = (1-\kappa)
\begin{pmatrix} 1-g_1 & -g_2 \\ -g_2 & 1+g_1 \end{pmatrix}
\begin{pmatrix} x_l \\ y_l \end{pmatrix}, \tag{2}
\]
in terms of the reduced shear, $g_i = \gamma_i/(1-\kappa)$.
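To make the action of Equation 2 concrete, the following minimal Python sketch (my own illustration, not from any survey pipeline; sign and orientation conventions vary in the literature) applies a reduced shear to points on a circular source and verifies that a small $g_1$ produces an ellipse with axis ratio $(1-g_1)/(1+g_1)$. Note that Equation 2 maps lensed to unlensed coordinates, so the observed image is obtained by applying the inverse.

```python
import numpy as np

def lensed_from_unlensed(x_u, y_u, g1, g2, kappa=0.0):
    """Map unlensed coordinates to lensed (observed) ones by inverting
    Eq. 2: (x_u, y_u)^T = (1 - kappa) * A @ (x_l, y_l)^T, where
    A = [[1 - g1, -g2], [-g2, 1 + g1]]."""
    A = (1.0 - kappa) * np.array([[1.0 - g1, -g2],
                                  [-g2, 1.0 + g1]])
    xl, yl = np.linalg.inv(A) @ np.vstack([x_u, y_u])
    return xl, yl

# A circular source of unit radius, sheared with g1 = 0.05:
phi = np.linspace(0.0, 2.0 * np.pi, 500)
xl, yl = lensed_from_unlensed(np.cos(phi), np.sin(phi), g1=0.05, g2=0.0)

# The image is an ellipse elongated along x, with axis ratio
# (1 - g1)/(1 + g1) ~ 0.905 for g1 = 0.05:
print(yl.max() / xl.max())
```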
Since the lensing shear causes a change in the observed galaxy ellipticities, inference of the shear typically depends on measurements of the second moments of galaxies:
\[
Q_{ij} = \frac{\int \mathrm{d}^2x\, I(\boldsymbol{x})\, W(\boldsymbol{x})\, x_i x_j}{\int \mathrm{d}^2x\, I(\boldsymbol{x})\, W(\boldsymbol{x})}, \tag{3}
\]
where $x_1$ and $x_2$ correspond to the $x$ and $y$ directions, $I(\boldsymbol{x})$ denotes the galaxy image light profile, and $W(\boldsymbol{x})$ is a weighting function. One common definition of ellipticity relates to the moments as
\[
e = e_1 + i e_2 = \frac{Q_{11} - Q_{22} + 2i\,Q_{12}}{Q_{11} + Q_{22}}. \tag{4}
\]
Another definition of ellipticity replaces the denominator in Eq. 4 with $Q_{11} + Q_{22} + 2[\det(Q)]^{1/2}$. Both ellipticity definitions have a well-defined response to a lensing shear, and hence can be averaged across ensembles of galaxies. A variety of methods exist for estimating these ellipticities while removing the effect of the point-spread function (PSF) from the atmosphere and telescope.

The convergence can be thought of as the projected matter overdensity, defined for a given point on the sky $\boldsymbol{\theta}$ as
\[
\kappa(\boldsymbol{\theta}) = \frac{3 H_0^2 \Omega_m}{2 c^2} \int_0^{\chi_{\rm max}} \mathrm{d}\chi\, \frac{\chi\, q(\chi)}{a(\chi)}\, \delta(\chi\boldsymbol{\theta}, \chi) \tag{5}
\]
in a flat Universe. Here $H_0$ is the current Hubble parameter, $\Omega_m$ is the current matter density in units of the critical density, $a$ is the scale factor, $\delta$ is the matter overdensity, $\chi$ is the comoving distance, and the lens efficiency $q$ is defined as
\[
q(\chi) = \int_\chi^\infty \mathrm{d}\chi'\, n(\chi')\, \frac{\chi' - \chi}{\chi'} \tag{6}
\]
in terms of the source distribution $n(\chi')$. The line-of-sight projection in the expression for $\kappa$ indicates why the most interesting weak lensing measurements involve binning by source redshift ("tomography"): instead of averaging over all line-of-sight structure, using a set of distinct redshift bins enables measurement of how cosmic structure has grown with time.
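As an illustration of Equations 5-6 and the value of tomography, the sketch below (my own toy example; the Gaussian $n(\chi)$ and all numbers are assumptions for illustration only) computes the lensing efficiency $q(\chi)$ for two source redshift bins and shows that each bin weights structure along a different portion of the line of sight.

```python
import numpy as np

def lens_efficiency(chi, n_of_chi, chi_grid):
    """Eq. 6: q(chi) = int_chi^inf dchi' n(chi') (chi' - chi)/chi',
    evaluated by trapezoidal integration on a finite grid."""
    q = np.zeros_like(chi)
    for i, c in enumerate(chi):
        mask = chi_grid >= c
        q[i] = np.trapz(n_of_chi[mask] * (chi_grid[mask] - c)
                        / chi_grid[mask], chi_grid[mask])
    return q

# Toy normalized source distributions for two tomographic bins
# (Gaussians in comoving distance; purely illustrative numbers):
chi_grid = np.linspace(1.0, 7000.0, 2000)   # comoving distance [Mpc]
for chi_mean, width in [(1500.0, 300.0), (3500.0, 500.0)]:
    n = np.exp(-0.5 * ((chi_grid - chi_mean) / width) ** 2)
    n /= np.trapz(n, chi_grid)              # normalize to unity
    q = lens_efficiency(chi_grid, n, chi_grid)
    kernel = chi_grid * q   # the chi*q(chi)/a(chi) weight in Eq. 5, with a ~ 1
    print(f"sources at chi ~ {chi_mean:.0f} Mpc: "
          f"kernel peaks at chi = {chi_grid[np.argmax(kernel)]:.0f} Mpc")
```

With the $1/a(\chi)$ factor neglected for simplicity, the kernel for each bin peaks roughly halfway to its sources, which is why distinct bins trace structure growth at distinct epochs.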
From the initial detections of cosmic shear (Bacon, Refregier & Ellis 2000; Van Waerbeke et al. 2000; Wittman et al. 2000; Rhodes, Refregier & Groth 2001) to recent measurements (e.g., Becker et al. 2016; Jee et al. 2016; Hildebrandt et al. 2017; Troxel et al. 2017), there has been substantial evolution of methodology to ensure that neither observational nor astrophysical systematic errors dominate the measurements. Currently, three weak lensing surveys are ongoing: the Kilo-Degree Survey (KiDS; de Jong et al. 2013), the Dark Energy Survey (DES; Dark Energy Survey Collaboration et al. 2016), and the Hyper Suprime-Cam survey (HSC; Aihara et al. 2017). In the 2020s, several "Stage-IV" (Albrecht et al. 2006) surveys will further increase the precision of these measurements: Euclid (Laureijs et al. 2011), LSST (LSST Science Collaboration et al. 2009), and WFIRST (Spergel et al. 2015).

The focus of this review is on weak lensing method development and systematics mitigation in preparation for the surveys that will happen in the 2020s, which will have such small statistical errors that serious discrimination among dark energy models will be possible, the era of "precision cosmology" (Figure 2, left panel). Since the weak lensing shear is so small compared to the intrinsic, randomly-oriented galaxy ellipticities (often called "shape noise"), averaging over very large ensembles of galaxies is the key to achieving small statistical errors. Indeed, this shape noise dominates over the impact of pixel noise in galaxy shape estimation for nearly all galaxies above a detection significance of ∼5. Hence weak lensing measurements generally use galaxies that are as faint and small as possible, down to the limit imposed by the need to control systematic uncertainties. I describe the obstacles in the path towards a statistical error-dominated analysis, the status of existing analysis methods, and areas where more work is needed. How do we do weak lensing correctly when we need to trust it for potentially novel results, such as unusual findings about dark energy? The goal of the review is to explain the technical state of the art and the challenges in moving forward towards this goal.

Figure 2: Left: Forecast of dark energy constraining power in Stage-IV space-based surveys of the 2020s, normalized with respect to the Stage-II surveys that existed in ∼2006 (from Albrecht et al. 2006). The original caption describes the potential improvement in the DETF figure of merit, the reciprocal of the area enclosed by the 95% C.L. contours in the w_a–w_p plane, with bars extending from the pessimistic to the optimistic projections in each case. Right: Basic outline of the weak lensing analysis process (images → catalogs of objects and their properties → summary statistics such as power spectra or correlation functions → likelihood analysis with theory inputs, i.e., predictions and covariance matrices, marginalizing over nuisance parameters → cosmological parameters), where the blue and dark purple respectively indicate the parts of the analysis covered in Sections 2 and 3 of this review.

This review will cover both observational and theoretical systematics in weak lensing two-point correlations, both shear-shear (cosmic shear) and shear-galaxy ("galaxy-galaxy lensing"). The canonical cosmological weak lensing analysis from Stage IV surveys will include joint analysis of shear-shear, shear-galaxy, and galaxy-galaxy correlations; however, I refer the reader to other works for thorough discussion of systematics in galaxy-galaxy correlations (e.g., Morrison & Hildebrandt 2015). Throughout this work, I refer to two-point correlations generically; in practice, they may be estimated in configuration space ("correlation functions") or Fourier space ("power spectra"). The review will cover the entire path from raw images to science; see the analysis flowchart in the right panel of Figure 2. It will focus on cosmological distance scales, and will not cover the possibility of using small-scale lensing and clustering (e.g., More et al. 2015), which brings in a host of additional theoretical and observational issues, or cluster number counts (e.g., Hoekstra et al. 2015).

To enable thorough discussion of the above topics, several other approaches to weak lensing analysis will be neglected. These include shear beyond two-point correlations; flexion; lensing magnification; lensing cosmography (constraints on distance ratios rather than structure growth); and weak lensing outside of the optical or NIR wavelength range, such as radio lensing. Lensing of the Cosmic Microwave Background (CMB; e.g., Planck Collaboration et al. 2014) will be discussed as a consistency check on optical lensing, without coverage of its systematic uncertainties.

The structure of this review is as follows. I divide the weak lensing analysis process into two major steps: from images to catalogs with measured object properties (Section 2), and from catalogs to cosmological parameters (Section 3). The additional step of detecting and controlling for observational systematics more generally is described in Section 4. I summarize the future prospects for the field in Section 5.
Figure 3: An illustration of the processes that affect the galaxy image (from Mandelbaum et al. 2014), including lensing followed by other effects that also cause coherent shape distortions, such as convolution with the point-spread function (or PSF) due to the atmosphere (for ground-based telescopes) and telescope optics.
2. FROM IMAGES TO CATALOGS
The full weak lensing analysis process goes all the way from the raw pixel data to cosmological parameter constraints. While somewhat artificial, it is common to separately consider the analysis steps from images to catalogs, followed by catalogs to science. This division partly reflects where certain systematics can be mitigated. Systematics related to the measurement process have a first-pass correction during the "images to catalogs" pipeline, and any mitigation for insufficiency of those corrections occurs in the "catalogs to science" pipeline. Theoretical systematics, i.e., those related to our insufficient knowledge of astrophysics, can only be mitigated in that second step. Hence we can think of the "images to catalogs" pipeline as the place where the weak lensing community attempts to estimate the properties of astronomical objects as accurately as possible, and the "catalogs to science" pipeline as the place where all residual systematics must get mitigated.

This section focuses on the first part of that pipeline. The basic challenge of inferring the lensing shear from galaxy images is illustrated in Figure 3. I will describe the nature of the challenges that arise at each step, the state of the art, and directions for future work.
2.1. PSF modeling

In this review, I consider the atmospheric PSF, optical PSF, pixel response, and charge diffusion together as the effective PSF. In other words, the model is that the pixel response (ideally a top-hat function, though reality can be more complex, resulting in higher-order corrections) is convolved with the other PSF components, and then the image is sampled at pixel centers. (This model is violated if the pixels do not form a regular grid; current estimates suggest that the impact of deviations from a regular grid in real sensors is sufficiently small to ignore even for LSST; Baumer, Davis & Roodman 2017.)

Figure 4: An illustration of the PSF interpolation problem (credit: Paulin-Henriksson et al. 2008, A&A, 484, 67, reproduced with permission © ESO). From a sparse sampling of stars in a given field, the PSF must be interpolated to the positions of galaxies so their properties can be measured.

Inferring coherent weak lensing distortions requires correction for the effect of the PSF, which if insufficiently corrected (a) dilutes the shear estimates, causing a multiplicative bias that is worse for small galaxies, and (b) imprints coherent additive contributions to the galaxy ellipticity values, due to the PSF anisotropies. Methods for removing the impact of the PSF from shear estimates all come with the assumption that the PSF is known. Hence, modeling the PSF correctly is an important challenge for weak lensing; errors in PSF model size and shape result in multiplicative and additive shear biases, respectively. The exact impact on the ensemble weak lensing shear observables depends not just on mean PSF size or shape errors, but rather on their spatial correlations and the distribution of galaxy properties. The formalism for understanding these effects, either through simulations or through a moments-based formalism for propagating PSF modeling errors into shear biases, was developed over the past decade (Hoekstra 2004; Hirata et al. 2004; Jain, Jarvis & Bernstein 2006; Paulin-Henriksson et al. 2008; Rowe 2010). While accurate determination of the PSF model is critical for weak lensing cosmology, it is fortunately a systematic uncertainty that comes with null tests that can be used to empirically identify problems, drive algorithmic development, and derive bias corrections.

In principle, the PSF modeling process may be thought of as having two components (see Figure 4). The first is using bright star images to model the PSF. The second is interpolating to other positions so the PSF model can be used to measure galaxy photometry and shapes (for discussion of several PSF interpolation methods, see e.g. Bergé et al. 2012; Gentile, Courbin & Meylan 2013; Kitching et al. 2013; Lu et al. 2017).
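As a concrete (and deliberately simplified) illustration of the empirical interpolation step, the sketch below fits a second-order polynomial in focal plane position to star ellipticities and evaluates it at galaxy positions. This is my own toy example of the general idea behind empirical approaches such as PSFEx, not their actual implementation; real codes interpolate full PSF models rather than a single ellipticity component.

```python
import numpy as np

def poly_design(x, y, order=2):
    """Design matrix for a 2D polynomial of the given order."""
    return np.column_stack([x**i * y**j
                            for i in range(order + 1)
                            for j in range(order + 1 - i)])

rng = np.random.default_rng(1)
# Stars sparsely sample the field; the true PSF e1 varies smoothly:
xs, ys = rng.uniform(-1, 1, (2, 200))
true_e1 = 0.02 * xs**2 - 0.01 * xs * ys            # smooth spatial variation
e1_star = true_e1 + rng.normal(0, 0.003, xs.size)  # measurement noise

# Least-squares fit of the polynomial surface to the star measurements:
coeffs, *_ = np.linalg.lstsq(poly_design(xs, ys), e1_star, rcond=None)

# Interpolate the model to (many more) galaxy positions:
xg, yg = rng.uniform(-1, 1, (2, 5000))
e1_model = poly_design(xg, yg) @ coeffs
print(e1_model[:5])
```

A failure mode like the one in Figure 5 arises when the true spatial variation contains features (e.g., sharp rings near the field edge) that such smooth interpolation functions cannot represent.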
Figure 5: The color scale shows the PSF model radial ellipticity residual (∆e_+) averaged over many HSC survey exposures, as a function of focal plane X and Y (in mm). Here 'radial' refers to the ellipticity component defined with respect to the focal plane center. The rings of nonzero values indicate a coherent misestimation of the radial ellipticity of the PSF near the focal plane edge. Figure provided by Bob Armstrong, based on figures and data from Bosch et al. 2017.

Challenges in PSF modeling and interpolation differ for ground- and space-based imaging. The optical PSF can be thought of as varying slowly over the field-of-view and exhibiting a limited set of predictable patterns, which is one of the appeals of space-based weak lensing measurements. In contrast, the atmospheric PSF exhibits stochastic behavior (for which the power spectrum can be measured, with each exposure being a different realization of that power spectrum).

Some PSF modeling and interpolation methods are purely empirical. These involve choosing a set of basis functions to describe the bright star images, and some functions for interpolating between those images within a single CCD chip (e.g., regular or Chebyshev polynomials, though more sophisticated options exist). The PSF tends to exhibit discontinuities at chip boundaries due to slight inconsistencies in chip heights, which makes modeling purely within chips a common practice. An example of an empirical PSF modeling algorithm is PSFEx (Bertin 2011), which was used for both DES and HSC. Figure 5 shows a typical failure mode for empirical approaches: failure to properly describe PSF variations in parts of the focal plane with the adopted interpolation functions.

One method that has the potential to address both the PSF modeling and interpolation problems is Principal Component Analysis, or PCA (Jee et al. 2007; Schrabback et al. 2010). The PCA method considers all of the survey data, and identifies the most important patterns in PSF model variation across that data. PCA analysis can be done at the level of PSF images or any compact representation of the PSF, such as its second moments. Due to its use of all survey data, with stars in different exposures sampling different locations in the focal plane, the method can determine PSF model variation as a function of focal plane position at higher spatial frequency than is possible using only the stars observed on a single exposure. Naively, this is a more promising method for space-based data (which only has an optical PSF determined by a relatively limited set of physical parameters).

Another approach to PSF modeling is a physics-based forward-modeling approach. One past example of this (for the Hubble Space Telescope, or HST) is ray-tracing through a physical model of the telescope optics using Tiny Tim. This was used by the COSMOS team (e.g., Leauthaud et al. 2007) for several of its analyses. The idea is to forward-simulate the PSF as a function of position in the focal plane in each band given a limited set of physical parameters such as variation in focus position, then match the stars in each exposure against those models to identify the best model for each exposure. The PSF model interpolation then uses the finely-spaced grid of models rather than the more widely-spaced stars. The most obvious failure mode for forward-modeling of the PSF is if some relevant physics determining the PSF is not included in the model (for an example of this in practice, see Sirianni et al. 1998).

While physical modeling seems most appropriate for space-based PSF modeling and interpolation, in principle one approach for LSST is a combination of an optical model (perhaps with additional empirical constraints from wavefront data, e.g. Roodman, Reil & Davis 2014 and Xin et al. 2016) and a stochastic atmospheric PSF model using, for example, Gaussian processes or a maximum entropy algorithm (Chang et al. 2012). Empirical modeling of the optical PSF using wavefront measurements from out-of-focus exposures at least partially mitigates the concern about missing physics in the pure forward-modeling approach. One important advantage of combined optical plus atmospheric PSF modeling is that the optical component can potentially include the chip discontinuities, enabling the atmospheric model to use the entire focal plane rather than modeling each chip separately.

Recently, work on PSF modeling systematics has gone beyond second moments-based size and shape estimates. Getting higher-order moments of the PSF model wrong may be problematic; such errors can be identified most easily by comparing stacked star images and PSF models to identify differences that are not easily described using the second moments. Quantifying their impact on weak lensing is most easily done with simulations; no simple analytic formalism has been worked out in this case. Also, failure of the PSF and galaxy profiles to be well-approximated by a Gaussian (e.g., space-based PSFs, since the Airy function has a formally infinite variance) causes the simple analytic formalism for second moments to fail, rendering simulations necessary.

Another effect that has received more attention in recent years is the chromatic PSF. Both the atmospheric and optical PSFs depend on wavelength; even the sensor contribution to the PSF can exhibit slight wavelength dependence as well (Meyers & Burchat 2015a). Within a single broad photometric band, the effective PSF for any given object must depend on its spectral energy distribution (SED). Since stars and galaxies tend to have different SEDs, they will have different effective PSFs, which is a problem when using star images to infer the PSF for galaxies. Even worse, galaxy color gradients cause a violation of the assumption that there is a single well-defined PSF for the galaxy. Substantial work has been done on the chromatic PSF effect on weak lensing measurements (Cypriano et al. 2010; Plazas & Bernstein 2012; Voigt et al. 2012; Meyers & Burchat 2015b; Er et al. 2017). While the magnitude of the effect tends to be larger for space-based PSFs, for which the relevant physics scales like $\lambda$ rather than $\sim\lambda^{-1/5}$ for atmospheric PSFs, the actual importance of the effect for science depends on the requirements on how well the PSF size is known.
These requirements may be stricter for ground-based surveys given their larger PSF size. Broader bands, such as those planned for the Euclid survey, are more problematic in this regard. The above references include work on mitigation schemes that approach the level of systematics control needed for future surveys.

As mentioned above, there are well-defined null tests that can directly reveal PSF modeling errors, unlike some of the other systematic uncertainties described in this review. These null tests typically involve sets of stars (high-significance detections) that were not used for PSF modeling. A comparison of their sizes and shapes based on second moments with the PSF model sizes and shapes at the positions of those stars can be quite revealing. While the most obvious thing to do is to make a histogram of those differences and look for systematic biases, it is the spatial correlation function of these errors that determines how weak lensing observables will be biased due to PSF modeling errors. For PSF shape errors, there are five relevant correlation functions (called ρ statistics), two introduced by Rowe (2010) and three by Jarvis et al. (2016); these include factors of the PSF shape residuals, the PSF shape itself, and the PSF size residuals, and directly correspond to additive terms in the shear-shear correlation function generated by PSF modeling errors. For examples of their use in real survey data, see Jarvis et al. (2016); Mandelbaum et al. (2017b).

An additional diagnostic is to compare the distribution of PSF shape and size errors for the non-PSF star sample with the distribution for those stars used to estimate the PSF. If the two samples have the same detection significance, then the widths of the distributions can reveal whether there are overall PSF modeling issues (similar breadth of the distributions) or whether there may be issues with overfitting or interpolation (broader distribution for non-PSF stars). Comparison of the ρ statistics computed with PSF and non-PSF stars can also be revealing. Finally, stacking the PSF size or shape residuals in the focal plane coordinate system, across multiple exposures, can reveal systematic failure to model recurring optical features in the PSF; see Figure 5 for an example.

Aside from the obvious approach of developing more sophisticated PSF modeling algorithms, survey strategy may mitigate the impact of single-exposure PSF modeling errors on the final multi-exposure shear estimate. For example, consider the not atypical case that PSF modeling errors systematically correlate with distance from the center of the focal plane (e.g., Figure 5). If all exposures in that region have very small dithers, then each galaxy will be observed at nearly the same focal plane position in all exposures, and their PSF modeling errors will be coherent. If there is substantial dithering compared to the size of the focal plane, then the galaxy will be observed at many different focal plane positions, and the systematics due to PSF modeling errors will average down. Depending on the coherent structure of PSF modeling errors, rotational dithering may also be beneficial. Using survey strategy to reduce systematics in large-scale structure statistics was considered by Awan et al. (2016) for LSST, where hundreds of exposures make this approach to systematics mitigation possible (LSST Science Collaboration et al. 2017), but a similar study in the weak lensing context has not yet been performed.
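To illustrate the null-test machinery, the following sketch computes a ρ1-style statistic, the autocorrelation of PSF ellipticity residuals at reserved (non-PSF) star positions, using the public treecorr package. The catalog arrays here are randomly generated stand-ins for real measured-minus-model residuals, and the binning choices are arbitrary for illustration.

```python
import numpy as np
import treecorr  # public two-point correlation code (Jarvis et al.)

# Stand-in inputs: reserved-star positions (deg) and PSF ellipticity
# residuals, de = e_star - e_model, for each component:
rng = np.random.default_rng(0)
ra, dec = rng.uniform(0, 1, (2, 10000))
de1, de2 = rng.normal(0, 0.01, (2, 10000))

cat = treecorr.Catalog(ra=ra, dec=dec, g1=de1, g2=de2,
                       ra_units='deg', dec_units='deg')
rho1 = treecorr.GGCorrelation(min_sep=0.5, max_sep=60.0, nbins=15,
                              sep_units='arcmin')
rho1.process(cat)   # autocorrelation of the residual "shear" field

# rho1.xip is the xi_+ correlation of the residuals; it maps directly
# onto an additive contamination of the shear-shear correlation function.
for r, xi in zip(rho1.meanr, rho1.xip):
    print(f"theta = {r:6.2f} arcmin   rho1 = {xi:+.2e}")
```

For pure noise residuals, as in this fake input, the statistic scatters around zero; in real data, a coherent nonzero signal flags PSF modeling errors that will leak into cosmic shear.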
2.2. Detector systematics

For the purpose of weak lensing, detector non-idealities can cause two problematic types of systematics that cannot be treated as a simple convolution (and hence as part of the PSF). First, there are flux-dependent effects that predominantly affect bright objects, such as nonlinearity or the brighter-fatter effect discussed below. Since weak lensing measurements are dominated by faint galaxies, but the PSF for those faint galaxies is estimated via interpolation between the PSF modeled from bright stars, detector non-idealities affecting bright objects result in the wrong PSF being used when estimating shear from the faint galaxies. The second type of detector non-idealities affects all objects, due to defects that correlate with position or galaxy orientation on the detector. They can induce spurious coherent shear signals or photometry errors, and/or cause selection biases due to coherent masking patterns. While correction for some detector non-idealities such as nonlinear response has long been taken for granted as happening before the stage that weak lensers care about, the field's approach to detector systematics has otherwise been varied.

For example, the detectors on the HST are known to suffer from charge transfer inefficiency (CTI) due to radiation damage. CTI imparts a preferential direction in the images, which is a problem for weak lensing measurements, the goal of which is to identify coherent smearing in galaxy shapes. A physically-motivated pixel-level correction scheme was pioneered primarily by and for weak lensers, resulting in a 97% correction for this effect (Massey et al. 2010; Rhodes et al. 2010).

In the past few years, there have been many more studies on the detailed impact of detector non-idealities on weak lensing. One example is the so-called "brighter-fatter" effect (Antilogus et al. 2014; Guyonnet et al. 2015), wherein brighter objects spuriously appear slightly broader than fainter ones due to the electric field sourced by charges accumulated within a pixel deflecting later light-induced charges away from that pixel. Conceptually, one can think of this effect as a dynamic change in pixel boundaries. While early work proposed methods for estimating the effect using flat fields, later work has focused on detailed simulation and measurement methods (e.g., Lage, Bradshaw & Tyson 2017). This effect is quite problematic for weak lensing because, if left uncorrected, the PSF inferred from bright stars is not the relevant one to use when removing the impact of the PSF on the faint galaxies that dominate weak lensing measurements. (Technically, without correction for the brighter-fatter effect, the PSF estimated from the bright stars is not even the right PSF to use for the bright galaxies.) Fortunately, empirical tests of PSF model fidelity can be carried out as a function of magnitude to confirm that the brighter-fatter effect has been corrected at the necessary level. These were used in the HSC survey to identify the impact of the brighter-fatter effect, and showed that the corrections were sufficient for weak lensing science in HSC (Mandelbaum et al. 2017b).

The conceptual framework mentioned above, wherein some detector non-idealities are thought of as dynamically adjusting pixel boundaries and therefore pixel sizes (resulting in astrometric and photometric errors), applies to several other detector effects. For example, the concentric rings known as "tree rings" and the bright stripes near detector edges known as "edge distortions" in DES can be modeled this way. Plazas, Bernstein & Sheldon (2014) proposed that the templates for these effects derived using flat-field images can be used in the derivation of photometric and astrometric solutions. In other words, the WCS (world coordinate system) that maps from image to world coordinates can include these (admittedly rather complex) effects (Rasmussen et al. 2014; Baumer, Davis & Roodman 2017; Bernstein et al. 2017a). Note that modeling the effect as part of the WCS is a distinct solution from pixel-level correction, such as was used for CTI; the WCS-based correction kicks in when measuring positions, photometric quantities, and galaxy shapes from the images. It is a valid approach in the limit that the detector effect can be described with a WCS that varies slowly compared to the size of a pixel, though since it is common practice to take the local affine approximation of the WCS when measuring individual objects, this imposes the more stringent constraint that the WCS can be considered locally affine over the scale of individual objects.
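A toy model helps make the CTI discussion concrete: during readout, each pixel-to-pixel transfer traps a small fraction of the charge and releases it into trailing pixels, imprinting a preferred direction on the image. The sketch below is my own simplified illustration of this trailing in 1D, not the Massey et al. correction algorithm; the trap fraction and profile parameters are arbitrary.

```python
import numpy as np

def apply_cti(column, trap_fraction=0.002):
    """Toy 1D CTI model: reading out toward index 0, a pixel at index i
    undergoes i transfers; each transfer traps a fraction of the charge,
    which is deferred into the trailing (higher-index) pixel."""
    img = column.astype(float).copy()
    for i in range(len(img)):
        loss = img[i] * (1.0 - (1.0 - trap_fraction) ** i)
        img[i] -= loss
        if i + 1 < len(img):
            img[i + 1] += loss   # deferred charge trails behind
    return img

# A Gaussian star profile along the readout column:
x = np.arange(200)
star = 1000.0 * np.exp(-0.5 * ((x - 120.0) / 2.0) ** 2)
observed = apply_cti(star)

centroid = lambda im: np.sum(x * im) / np.sum(im)
print(f"centroid shift: {centroid(observed) - centroid(star):+.4f} pixels")
# The shift is systematically away from the readout register: a coherent,
# direction-dependent distortion of exactly the kind lensing must avoid.
```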
The details of detector non-idealities depend on the detectors used for each survey (though common mitigation schemes can be used for conceptually similar systematics in multiple surveys). The general framework for how detector effects impact weak lensing developed in Massey et al. (2013) for Euclid would be relevant for other surveys, with the exact effects to be considered varying. A new complication is the fact that WFIRST will use near-infrared (NIR) detectors, which operate differently from CCDs. CMOS devices have one readout path per pixel, whereas CCDs have one readout path per channel; calibrating all pixels to within requirements, including the effect of cross-talk, is more challenging for CMOS devices. The use of different types of detectors necessitates studies of the impact of various NIR detector effects, some of which are present in CCDs (e.g., nonlinearity and brighter-fatter: Plazas et al. 2016, 2017), and others that are not, such as interpixel capacitance (IPC; Kannawadi et al. 2016), persistence, and correlated read noise. Further work on characterizing the impact of NIR detector systematics for weak lensing is underway, with an eye towards placing requirements on hardware and survey strategy to ensure that residual systematics can be mitigated at the level needed for weak lensing with WFIRST.

A range of correction schemes have been discussed for various detector effects, including pixel-level correction, inclusion in the WCS, and catalog- or higher-level mitigation schemes such as template marginalization. Understanding the spatial- and time-dependence of detector effects is also quite important, and can be a challenge especially for CMOS detectors. In principle there may also be the option of indirect mitigation through survey strategy for effects that correlate with location on the focal plane (e.g., following the approach of Awan et al. 2016). Additional work is needed to quantify the impact of various low-level detector systematics for upcoming surveys, including lab measurements and simulations of their impact on weak lensing. Detectors are sufficiently complex, and requirements on systematics sufficiently strict for upcoming surveys, that analysis of realistic lab data is a necessity to avoid unpleasant surprises during commissioning – with ongoing efforts in both WFIRST and LSST (e.g., Seshadri et al. 2013; Tyson et al. 2014).
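The "locally affine WCS" idea can be written down compactly: near an object, the mapping from pixel to world coordinates is approximated by its Jacobian, which then carries measured quantities such as second moments from pixel space to sky coordinates. The sketch below is a generic illustration under assumed inputs (the WCS function, with its tree-ring-like term, is hypothetical), not any survey's actual pipeline code.

```python
import numpy as np

def wcs(xy):
    """Hypothetical smooth pixel -> world mapping, including a small
    tree-ring-like radial distortion term for illustration."""
    x, y = xy
    r = np.hypot(x - 2000.0, y - 2000.0)
    wobble = 1.0 + 1e-4 * np.sin(r / 50.0)   # static sensor distortion
    return np.array([0.2 * x * wobble, 0.2 * y * wobble])  # arcsec

def local_jacobian(f, xy, h=0.5):
    """Finite-difference Jacobian of the WCS at a pixel position."""
    J = np.empty((2, 2))
    for k in range(2):
        step = np.zeros(2); step[k] = h
        J[:, k] = (f(xy + step) - f(xy - step)) / (2.0 * h)
    return J

# Second-moment matrix Q measured in pixel coordinates:
Q_pix = np.array([[4.1, 0.3], [0.3, 3.8]])
J = local_jacobian(wcs, np.array([1500.0, 1800.0]))
Q_sky = J @ Q_pix @ J.T   # moments transformed to world coordinates
# Any distortion encoded in the WCS (tree rings, edge effects) is thereby
# removed from the sky-coordinate shape, provided the affine approximation
# holds over the scale of the object.
print(Q_sky)
```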
2.3. Detection and deblending

Traditionally, object detection is carried out by detecting peaks above some detection threshold. For weak lensing, additional cuts are typically placed to identify objects that can be well-measured; these cuts can be a source of "selection bias" (see Section 2.5). "Deblending", the process of removing the influence of light from other objects above that same detection threshold, requires the identification of detections that have multiple peaks. This naturally leads to two regimes: recognized blending, wherein the multiple peaks are recognizable, and unrecognized blending, wherein the deblending algorithm is not triggered because multiple peaks are not identified within the detection (see Figure 6). (Terminology for these varies; e.g., unrecognized blends are called "ambiguous" blends in Dawson et al. 2016.) The same system could switch between these categories depending on the PSF size. In the case of mild blending, one can ask whether the deblending algorithm results in unbiased measurements of object properties, or whether there are coherent systematics requiring mitigation and/or removal of mildly blended objects. For unrecognized blends, the only possibility is to quantify their rate of occurrence and apply analysis-level mitigation strategies; a toy example of how an unrecognized blend perturbs a shape measurement follows below.
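The toy example below (my own illustration) builds an "unrecognized blend" from two circular Gaussian profiles and measures the second moments of Equations 3-4 on the composite image: even though each component is round, the blend acquires a spurious ellipticity oriented along the separation vector.

```python
import numpy as np

def ellipticity(image, x, y):
    """Unweighted second moments (Eq. 3 with W = 1) and the
    corresponding ellipticity e = (Q11 - Q22 + 2i Q12)/(Q11 + Q22)."""
    flux = image.sum()
    xc, yc = (image * x).sum() / flux, (image * y).sum() / flux
    q11 = (image * (x - xc) ** 2).sum() / flux
    q22 = (image * (y - yc) ** 2).sum() / flux
    q12 = (image * (x - xc) * (y - yc)).sum() / flux
    return complex(q11 - q22, 2.0 * q12) / (q11 + q22)

def round_gaussian(x, y, x0, y0, sigma, flux):
    return flux * np.exp(-((x - x0)**2 + (y - y0)**2) / (2.0 * sigma**2))

y, x = np.mgrid[0:64, 0:64].astype(float)
primary = round_gaussian(x, y, 32.0, 32.0, sigma=3.0, flux=1.0)
neighbor = round_gaussian(x, y, 38.0, 32.0, sigma=3.0, flux=0.3)

print(abs(ellipticity(primary, x, y)))             # ~0: a round object
print(abs(ellipticity(primary + neighbor, x, y)))  # nonzero: blend bias
```

In real data, pixel noise forces the use of a nontrivial weight function W(x) in Eq. 3, which modifies but does not remove this effect.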
Figure 6: An illustration of the issues with unrecognized blends (© AAS, reproduced with permission, from Dawson et al. 2016; panels are labeled by the object IDs from that work). Each pair of images shows a ground-based (left) and space-based (right) image of the same system, with the shapes of the galaxy detection in the ground-based and space-based images shown as red and green ellipses.

For weak lensing, the primary concerns are the impact of blending on shear and photometric redshifts. If we consider unrecognized blends, then two objects at the same redshift should have the same shear, and therefore it should be possible to properly calibrate shear estimates for the combined (non-deblended) object. However, for photometric redshifts of unrecognized blends at the same redshift, the situation is only simple if the two objects have the same spectral energy distribution (SED). If they do not, then the composite object will correspond to some possibly strange SED, which may not give a correct photo-z. If the objects are at different redshifts, it is unclear how the shear estimate should be interpreted (though the case of large flux ratios or small redshift differences is simpler than the completely general case). The photometric redshift estimation is also complicated by the superposition of SEDs at different redshifts, even for reasonably large flux ratios. Unfortunately, the majority of unrecognized blends will be at different redshifts (Kirkby et al. in prep.), except perhaps in the centers of galaxy clusters.

The weak lensing community has recently come to confront the issue of blends more directly; this area requires more work, both on the deblending algorithms and on post-deblender systematics quantification and mitigation. In the past, it was common practice to
eliminate galaxies recognized as having nearby neighbors; e.g., this was done in CFHTLenS (Miller et al. 2013). (More specifically, galaxies with overlapping isophotes were rejected, while those that did not overlap very strongly, and for which one galaxy could be masked without too much influence on the fit results for the other galaxy, were retained.) That approach does not help with unrecognized blends, and can give a few-percent scale-dependent bias in shear-shear correlations due to the fact that close pairs are more prevalent in high-density regions (Hartlap et al. 2011), unless a weighting scheme is used to mitigate that effect. The early Dark Energy Survey Science Verification results imposed cuts on recognized blends (Jarvis et al. 2016) and ignored the issue of unrecognized blends (while leaving a 5% uncertainty on the multiplicative bias to cover this or other uncorrected issues). Their Year 1 (Y1) results included a more sophisticated multi-object fitting strategy, with a careful study of the impact of blending using simulations (Samuroff et al. 2017). In the HSC survey, which is deeper, Bosch et al. (2017) note that 58% of the objects detected in the HSC Wide survey are recognized blends. As a result, rejecting blended objects is not a viable strategy, and estimation and removal of blend-related systematics is necessary from the outset. (A cut on the "blendedness" parameter in Mandelbaum et al. (2017b) removes a very small fraction of objects, of order 1%, that were dominated by spurious detections near very bright objects; it does not affect the much larger fraction of genuinely blended real galaxy detections.) This was done using simulations that included a realistic level of nearby structure around galaxies (Mandelbaum et al. 2017a).

For future surveys such as LSST, removing blended objects will not be a viable strategy due to both their prevalence and the fact that a truly non-negligible fraction of them are unrecognized. Dawson et al. (2016) quantified the unrecognized blend population as exceeding 10% for a survey like LSST, and investigated differences in its intrinsic ellipticity distribution, which is relevant for weak lensing cosmology.
(2013) included blending when quantifying the expected galaxy source number density and redshift distribution from LSST, and estimated the impact of rejecting those blend systems recognized as seriously blended. They note that the need to reject these objects depends on our ability to quantify and remove blending systematics (and that the threshold for "seriously" blended depends on the deblending and measurement algorithms).

The combination of space-based imaging from Euclid and/or WFIRST with LSST ground-based images has the potential to benefit LSST on this issue (e.g., Jain et al. 2015 and Rhodes et al. in prep.). One could imagine a separate round of joint pixel-level analysis resulting in forced deblending for LSST based on higher-resolution space-based imaging. In the area of overlap between those surveys, the space-based data can substantially aid in deblending mildly blended cases and in detecting a larger number of otherwise unrecognized blends. It can also be used to learn about the impact of blending for LSST alone, and to develop systematic error budgets for the entire LSST survey region.

More work is clearly needed to develop a framework for quantifying systematics in shear inference and photometric redshifts due to both recognized and unrecognized blends, for both ground- and space-based surveys. The impact of blending on photometric redshift estimation and on inference of the correct redshift distribution for photometric redshift-selected samples is particularly tricky, since the spectroscopic redshift selection and failure rate for samples used for training and calibrating photometric redshifts may result in them having different unrecognized blend rates than the general galaxy population. Both improved deblenders and systematics mitigation schemes would be beneficial. For all surveys, algorithms that include color and other information in new ways will likely be explored in the coming years (e.g., Joseph, Courbin & Starck 2016); another example is the secondary deblender used in crowded cluster regions for DES (Zhang et al. 2015). Unfortunately, well-established algorithms for crowded stellar fields (e.g., Stetson 1987) are not relevant in this situation, and more algorithm development is needed for the complex multi-galaxy and star-galaxy blend systems that will dominate surveys like LSST.
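To make the post-deblender systematics discussion concrete, here is a minimal numpy sketch of a blendedness-style statistic for a pair of overlapping objects: the fraction of the target-weighted light that is contributed by a neighbor. The toy Gaussian profiles and the exact weighting are illustrative assumptions only; the actual "blendedness" parameter used in HSC is defined within the pipeline's deblender and differs in detail.

import numpy as np

def gaussian_image(nx, x0, y0, sigma, flux):
    """Render a circular Gaussian profile on an nx x nx pixel grid."""
    y, x = np.mgrid[0:nx, 0:nx]
    img = np.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))
    return flux * img / img.sum()

nx = 64
target = gaussian_image(nx, 28.0, 32.0, sigma=3.0, flux=1.0)
neighbor = gaussian_image(nx, 40.0, 32.0, sigma=3.0, flux=2.0)

# Toy blendedness: 1 minus the fraction of the target-weighted flux that
# comes from the target itself; 0 for an isolated object, approaching 1
# when the neighbor dominates the light in the target's footprint.
b = 1.0 - np.sum(target * target) / np.sum(target * (target + neighbor))
print(f"blendedness = {b:.3f}")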
2.4. Image combination

Nearly all surveys used for weak lensing science have multiple images at each point that must be combined to reach the full survey depth. Taking multiple exposures can be helpful to prevent excessive build-up of artifacts like cosmic rays or saturated stars on any one image, to fill in the gaps between CCD boundaries or artifacts, and to build a Nyquist-sampled image out of multiple undersampled images. The manner in which image combination is done is important for weak lensing science. The primary consideration that determines the available algorithms is whether the imaging is Nyquist sampled or not, which is primarily a difference between ground- and space-based imaging. For space-based imaging, the primary concern during the image combination stage is how to properly reconstruct a Nyquist-sampled image, while for ground-based imaging, the individual images are Nyquist sampled, and the main question is how to optimally combine information across exposures.

Most space-based instruments are (by design) not Nyquist sampled, but a wise choice of dithering strategies enables reconstruction of a Nyquist-sampled image. The need to reconstruct a Nyquist-sampled image from multiple undersampled images should factor into survey strategy for space-based weak lensing surveys; e.g., the expected rate of cosmic rays should be factored into the calculation of how many exposures are needed at each point (to allow for periodic losses and still obtain a Nyquist-sampled image). The first principled method for combination of space-based telescope images was presented in Lauer (1999). This method used linear algebra to solve out the aliased Fourier modes given some sub-pixel dither pattern, and reconstruct a Nyquist-sampled image. Rowe, Hirata & Rhodes (2011) generalized this approach to address several challenges: the fact that when dither patterns are comparable to the side of individual chips, the different exposures at each point will experience different field distortions; the ability to handle holes due to, e.g., cosmic rays; and the fact that the input PSFs for the different exposures may differ. Their method,
IMCOM (https://github.com/barnabytprowe/imcom), has not yet been used for survey data, but it may be the best approach to use for surveys like WFIRST. The commonly used MultiDrizzle method (Fruchter & Hook 2002) carries out interpolation on the individual non-Nyquist-sampled exposures. This problematic method necessarily causes stochastic aliasing of the PSF (Rhodes et al. 2007), which means that aliased modes are not fully removed, and the PSF of the resulting coadded image may not even be constant across a galaxy image. Later updates to the MultiDrizzle method were designed to eliminate these high-frequency artifacts and the convolution with an interpolant kernel (Fruchter 2011). Finally, multi-epoch model fitting is a valid approach to image combination for undersampled data, and in some sense may be the most optimal approach, provided that a good per-image PSF model is known. Further investigation is needed to weigh the tradeoffs between these options.

For ground-based imaging, there are multiple image combination options. The first is to use a coadded image for all science, including PSF estimation. Several challenges make this approach sub-optimal. These include the fact that, depending on how coaddition is carried out, the coadd may not even have a well-defined PSF at each point (for example, inverse variance weighting that depends on the total flux, or use of a median for the coadd). The answer to this challenge is to generate the coadd in a principled way that results in a single well-defined PSF at each point (e.g., Bosch et al. 2017). The primary remaining systematic challenge for this method is that the PSF changes discontinuously wherever there is a chip edge in any exposure contributing to the coadd at a given point. Given the typical stellar density in images used for weak lensing, it is very difficult to model these small-scale changes in the PSF. There is an additional concern with respect to statistical errors, since coaddition may effectively discard information that is present in the best-seeing images.

The second option is to use a coadded image for measurements of object properties, but produce the coadded PSF based on the appropriate weighted combination of the single-exposure PSF models. This "stack-fit" approach was used for the DLS and HSC surveys (Jee et al. 2013; Bosch et al. 2017). It eliminates the primary systematics concern with the first approach mentioned above (estimation of the coadd PSF), while retaining concerns such as information loss from the best-seeing exposures. There are also lower-level concerns about relative astrometric errors, which can behave like a blurring kernel in the coadded image. If the relative astrometric errors are well-characterized, they can be accounted for by including this blurring in the coadded PSF, though in the HSC survey this was not necessary because the astrometric errors were sufficiently small as to have a negligible effect on weak lensing (Mandelbaum et al. 2017b). While modeling the PSF discontinuities is not a problem with this approach, it is still the case that some fraction of objects will be lost due to their being so close to the edge of a sensor in some exposure that the PSF cannot be modeled as effectively constant across them. This issue becomes more important as the number of contributing exposures in the coadd increases.
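As a concrete illustration of the "stack-fit" idea, here is a schematic numpy sketch: if the coadd is a weighted sum of exposures with position-independent weights, the coadd PSF at a point is the identically weighted sum of the per-exposure PSF models. The Gaussian PSFs and weights below are toy stand-ins for the outputs of single-epoch PSF modeling and the actual coaddition weights.

import numpy as np

rng = np.random.default_rng(0)
n_exp, nx = 5, 25
# Toy per-exposure PSFs (Gaussians of varying seeing) and coadd weights;
# in a real pipeline these come from single-epoch PSF models and from
# the weights used to build the coadd.
sigmas = rng.uniform(1.5, 3.0, n_exp)
weights = rng.uniform(0.5, 1.5, n_exp)

y, x = np.mgrid[0:nx, 0:nx] - nx // 2
psfs = np.array([np.exp(-(x**2 + y**2) / (2 * s**2)) for s in sigmas])
psfs /= psfs.sum(axis=(1, 2), keepdims=True)  # unit-flux PSF images

# If the coadd is sum_i w_i I_i / sum_i w_i with position-independent
# weights, its PSF is the identically weighted sum of per-exposure PSFs.
coadd_psf = np.tensordot(weights, psfs, axes=1) / weights.sum()
print(f"coadd PSF flux = {coadd_psf.sum():.6f}")  # still normalized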
Finally, it is possible to use a coadd for object detection and deblending, while measuring shear and (optionally) galaxy photometry through simultaneous fitting to the individual exposures. This approach was proposed for LSST (Tyson et al. 2008), and used for DES Y1 (Zuntz et al. 2017), CFHTLenS (Miller et al. 2013), and KiDS. In principle, this allows for marginalization over the centroid positions in the individual exposures, which was used in CFHTLenS to marginalize over relative astrometric errors. Additional benefits include the fact that the information in the best-seeing images can be preserved. There are several limitations to this approach. First, some measurement methods (such as those based on measurement of moments) do not map to a combined likelihood framework and therefore cannot be used in a simultaneous fitting approach. Second, this approach is computationally intensive. When producing a best-fitting model with M iterations, fitting to a coadd requires M model convolutions with the PSF, while fitting to N individual exposures with different PSFs requires N × M model convolutions. For LSST, which will have hundreds of exposures per object in the final dataset, this will require clever optimization to reduce computational expense. Alternatively, use of a fully Bayesian shear estimation method (e.g., Bernstein et al. 2016) that includes computation only of moments on each exposure would be a less expensive way to combine information from all exposures; in this case, the complexity is moved into the later shear inference step, and will not scale linearly with N.

Several ideas have been proposed to optimize the simultaneous fitting approach so that it is more feasible for future surveys. First, as in one of the methods used in DES (Sheldon 2015), representing the galaxy and PSF models as sums of Gaussians with different scale radii can drastically speed up the calculations. This could be done to produce an initial guess
at the model parameters before switching to a model with full complexity. Or it could be done in combination with a technique like metacalibration, which will soak up the error induced by model simplification in its estimate of the shear response (see Section 2.7 for a discussion of metacalibration). Second, it would be possible to fit for object properties using a coadd to get an initial guess, and then tweak that model using fits to individual exposures. Third, it may be possible to use some hybrid of the simultaneous fitting and coadd approaches. For example, if the LSST exposures were split into 10 sets based on percentiles in the PSF size, and each of those sets were coadded, then they could be used for simultaneous fitting with very little loss of information in the best-seeing exposures (a simple sketch of such a percentile split appears at the end of this subsection). This would also alleviate the concern raised above about loss of objects falling on PSF discontinuities, since fewer exposures would contribute to each coadd. Further investigation into the interplay between information gain/loss and systematics for these approaches is needed in order to define a path forward for the field (Sheldon et al. in prep.).

Finally, as suggested above, there is some connection between the methods used for image combination, deblending, photometry, PSF estimation, and shear estimation; the best solution for image combination may depend on what is being done for the other steps of the analysis process.
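As a toy illustration of the percentile-based grouping idea mentioned above, the following sketch assigns simulated exposures to 10 coadd groups by PSF-size percentile; the log-normal FWHM values are invented for illustration.

import numpy as np

rng = np.random.default_rng(1)
# Invented per-exposure PSF FWHM values (arcsec) for ~200 visits:
psf_fwhm = rng.lognormal(mean=np.log(0.7), sigma=0.15, size=200)

# Assign each exposure to one of 10 groups by PSF-size percentile; each
# group would then be coadded separately, and the 10 coadds fit jointly.
edges = np.percentile(psf_fwhm, np.linspace(0, 100, 11))
group = np.digitize(psf_fwhm, edges[1:-1])  # group indices 0..9
for g in range(10):
    sel = group == g
    print(g, sel.sum(), f"mean FWHM = {psf_fwhm[sel].mean():.3f}")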
2.5. Selection bias

Selection bias arises when quantities used to select or weight the galaxies entering the lensing analysis depend on the galaxy shape. Usually this dependence is implicit rather than explicit, due to the details of image analysis algorithms or lensing magnification (which modifies sizes and brightnesses in a way that correlates with the shear). As a result, the probability of a galaxy entering the sample (or its assigned weight) depends on its alignment with respect to the shear or PSF anisotropy direction. This violates the assumption that galaxy intrinsic shapes are randomly oriented. If the selection probability depends on the shear, there will be a multiplicative bias, whereas if it depends on the PSF shape, there will be an additive bias.

When continuous quantities used for selection (such as galaxy size) cause the bias, the bias is present only for galaxies near the boundary of the sample in the quantity used for selection. In contrast, while weights used to construct weighted averages are also continuous variables, biases related to the weighting scheme used to take ensemble averages (e.g., Fenech Conti et al. 2017) may be present throughout the sample. Finally, there are selections such as avoiding elongated structures (bad CCD columns) that can lead to selection bias depending on how they are imposed (Huff et al. 2014). Cuts on continuous quantities and on regions like bad columns are appropriate to avoid certain systematic errors, but if not applied with care, they can cause a selection bias.

There are several approaches to selection bias. One is to simply avoid it: define detection significances and apparent sizes compared to the PSF using round kernels, so that the results are insensitive to galaxy shapes (Jarvis et al. 2016). Another way is to estimate its magnitude through an analytic formalism based on moments (Hirata et al. 2004; Mandelbaum et al. 2005), and remove it. Its magnitude can also be estimated using simulations and then removed, if the simulations include all physical effects that induce selection biases. For example, if the photometric redshifts are coupled to the galaxy shape, then there could be a selection bias that can only be estimated using realistic multi-band simulations. Moreover, these estimation methods must take into account how selection bias varies across the full range of observing conditions in the survey. Finally, self-calibration approaches to shear estimation such as metacalibration (discussed in Sec. 2.7) offer the opportunity to estimate selection bias directly using only the real data, and remove it (a sketch of such a selection-response estimator appears at the end of this subsection).

Selection bias plays an important role in defining useful null tests (e.g., Mandelbaum et al. 2005; Jarvis et al. 2016). For example, it is commonly suggested that subsets of the galaxy sample should be used to carry out the lensing measurement, with the subsets giving consistent cosmology results in the absence of a systematic that depends on the quantity being used to divide the sample. However, the process of dividing up the sample can itself induce selection biases of order 5-10%, depending on how this is done.

Until recently, selection biases have not attracted nearly the attention given to other shear biases (Sec. 2.7). In future, more work is needed to avoid selection biases from complex cuts related to galaxy blends, bad pixels, and other selection criteria that cannot be mitigated as straightforwardly as cuts on S/N or size.
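To illustrate how a self-calibration approach can quantify selection bias, here is a minimal numpy sketch in the spirit of the metacalibration selection response of Sheldon & Huff (2017): the selection is re-evaluated on catalogs remeasured from counterfactually sheared images, and the difference in the mean of the unsheared measurements gives the selection response. The toy catalog below replaces actual image remeasurement with an assumed unit shear response, so this is a sketch of the estimator, not of a pipeline.

import numpy as np

def selection_response(e_noshear, sel_plus, sel_minus, dgamma):
    """Selection response in the spirit of Sheldon & Huff (2017): the
    selections are evaluated on catalogs remeasured from images sheared
    by +/- dgamma, but the mean is always taken over the measurements
    made on the unsheared images."""
    return (e_noshear[sel_plus].mean()
            - e_noshear[sel_minus].mean()) / (2 * dgamma)

rng = np.random.default_rng(2)
e = rng.normal(0.0, 0.25, 500_000)  # intrinsic e1 values, zero mean
dgamma = 0.01
# Emulate remeasurement on counterfactually sheared images with a unit
# shear response, then apply a typical |e| quality cut to each version:
sel_p = np.abs(e + dgamma) < 0.3
sel_m = np.abs(e - dgamma) < 0.3
print(f"R_sel = {selection_response(e, sel_p, sel_m, dgamma):+.3f}")
# A nonzero R_sel means the cut alone biases the ensemble shear unless
# the selection response is included in the calibration.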
2.6. Other aspects of the image processing

Several other steps in the image processing can affect weak lensing besides those explicitly called out in previous subsections. First, in order to detect and measure galaxy properties, the sky level must be estimated and subtracted. Errors in sky subtraction can cause coherent problems with object detection, photometry, and shear estimation near very bright objects (bright stars or collections of bright galaxies, e.g., in galaxy clusters). A spurious sky gradient can induce a spurious shear with respect to the location of the bright object causing the sky misestimation. This effect was identified, and its impact on object detection, photometry, and shapes quantified, in the SDSS (Adelman-McCarthy et al. 2006; Aihara et al. 2011).

Another relevant issue is star-galaxy separation. There are two issues: the bright star sample used to estimate the PSF can be contaminated by galaxies, and the faint galaxy sample used to estimate lensing shear can be contaminated by (unsheared) stars. For current surveys, we have no evidence that star-galaxy separation algorithms are failing at problematic levels (e.g., in HSC, Bosch et al. 2017; Mandelbaum et al. 2017b). Given that more sophisticated methods for star/galaxy classification have been proposed, for example using machine learning, there is clearly room to improve to the level needed for upcoming surveys. A slightly more interesting issue with star-galaxy separation is binary stars contaminating the galaxy sample (Hildebrandt et al. 2017). In principle, these can be identified by looking for centroid offsets between different filters for highly elongated objects, for those binaries in which the stars have different SEDs.

The primary astrometric concern for weak lensing is the accuracy of the relative astrometry between different exposures for individual objects. The relative astrometry must be well understood in order to fully understand the object measurements from simultaneous fitting and/or coaddition. Systematics due to errors in relative astrometry depend on exactly how the image combination is carried out; see Section 2.4. For an example of how astrometric calibration was carried out for the Dark Energy Survey, including correction for certain detector non-idealities (see Section 2.2), see Bernstein et al. (2017a) and Bernstein et al. (2017b). The astrometric calibration must include color terms to account for centroid shifts from differential chromatic refraction (DCR) and other low-level effects.

Finally, modeling of the noise in images is relevant to weak lensing measurements. Correlated pixel noise can arise due to low-level unresolved galaxies just below the detection threshold, methods used to combine multiple exposures into a coadd (Section 2.4), and pixel-level correction for effects such as CTI (Section 2.2). Since correlated noise means that
detection significances differ from the values one would naively assume given uncorrelated noise with the same variance, and shear biases depend on the detection significance (Gurvich & Mandelbaum 2016), it is important to understand the noise correlations.
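One simple way to quantify such correlations in practice is to measure the autocorrelation of blank-sky pixels. The following toy sketch generates white noise, correlates it with a small kernel (a stand-in for coaddition or interpolation kernels), and measures the neighbor-pixel correlation via the Wiener-Khinchin theorem; all quantities are invented for illustration.

import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(3)
# Toy "blank sky": white noise smoothed by a small kernel, standing in
# for correlations introduced by coaddition or interpolation.
white = rng.normal(0.0, 1.0, (512, 512))
sky = fftconvolve(white, np.ones((2, 2)) / 4.0, mode="same")

# Autocorrelation via the power spectrum (Wiener-Khinchin theorem):
f = np.fft.fft2(sky - sky.mean())
acf = np.fft.ifft2(np.abs(f) ** 2).real / sky.size
acf /= acf[0, 0]
print(f"neighbor-pixel correlations: {acf[0, 1]:.3f}, {acf[1, 0]:.3f}")
# ~0.5 for this kernel; for truly white noise these would be ~0.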
2.7. Shear estimation

Since the initial detections of weak lensing shears around galaxy clusters in the 1990s, a large fraction of the weak lensing community's technical effort has focused on the challenge of correcting galaxy shapes for the impact of the PSF so they can be averaged to infer the lensing shear. In the past two decades, the field has moved from simple methods based on correcting second moments of galaxy images for the moments of the PSF (e.g., Kaiser, Squires & Broadhurst 1995), to a broader set of methods that include fitting parametric models (see methods described in Massey et al. 2007a), to greater conceptual sophistication in how shear should be inferred (see methods described in Mandelbaum et al. 2015). The community has set itself a series of blind challenges (Heymans et al. 2006a; Massey et al. 2007a; Bridle et al. 2009, 2010; Kitching et al. 2010, 2012; Mandelbaum et al. 2014, 2015) aimed at benchmarking the performance of shear estimation methods in a common setting and understanding the main challenges, and in the process has developed an open-source, well-validated image simulation software package (
GalSim; Rowe et al. 2015; code at https://github.com/GalSim-developers/GalSim).

It is important to note that, fundamentally, we do not care about galaxy shapes. Indeed, the concept of a single number characterizing the galaxy shape is not well-defined in the presence of ellipticity gradients and irregular galaxy morphology. Given those real physical effects, the measured shape will depend on the radial weight function. Moreover, even for a galaxy with elliptical isophotes, shapes must be measured with weighted moments to reduce noise, and the measured shape will depend on the shape of the weight function. The assignment of "shapes" to individual galaxies is effectively the assignment of a single estimate of the local shear from each galaxy image, along with the assumption that the best ensemble estimator is the weighted mean of the point shear estimators. For this reason, comparison of galaxy shapes measured with different algorithms or in different surveys is rarely useful, and ensemble shear statistics provide the only meaningful comparison.

Shear systematics are often categorized as "multiplicative" or "additive" (in terms of what they do to ensemble shear statistics). Additive systematics can have quite different scale dependence from lensing shear correlations, depending on their physical origin. In Section 4 I discuss methods for empirically identifying additive bias. In contrast, multiplicative bias cannot be easily identified through null tests; typically simulations are required. While exact requirements vary depending on the details of the survey and the assumptions made about the weak lensing analysis, the upcoming Stage IV surveys typically require understanding of the shear calibration at the level of a few × 10⁻³ in order to avoid this systematic uncertainty dominating over the statistical uncertainties in the measurement. This is a factor of several smaller than the requirements for the measurements with the full areas of ongoing surveys, and includes all sources of multiplicative calibration uncertainties (e.g., PSF modeling errors), not just those due to the insufficiency of the PSF correction method.

The past five years have seen a shift in how the field approaches shear estimation. From the mid-1990s until the past few years, the emphasis was on estimating per-galaxy shapes and averaging them to obtain the ensemble shear. During that time period, the typical magnitude of shear calibration biases decreased by a factor of a few. However, by that point it was becoming increasingly obvious that the "measure galaxy shapes and average them to get the shear" approach has fundamental flaws from a mathematical perspective. An example is noise bias, wherein the maximum-likelihood estimate of per-galaxy shapes at finite signal-to-noise is biased because noise changes the shape of the likelihood surface (Bernstein & Jarvis 2002; Hirata et al. 2004; Kacprzak et al. 2012; Melchior & Viola 2012; Refregier et al. 2012). Another example is model bias, which arises from the failure of model assumptions to describe real galaxy light profiles (e.g., Voigt & Bridle 2010; Melchior et al. 2010). Selection bias (Sec. 2.5) is another limitation of this approach.

Some proposed solutions for model and noise bias compete with each other: increasing model complexity may decrease model bias, while increasing noise bias due to the need to constrain additional degrees of freedom. Shear estimation methods based on measurements of per-galaxy shapes must balance these two considerations, with a finite amount of both biases in the ensemble shear estimates.
Any method based on the use of second moments to estimate shears cannot be completely independent of the details of the galaxy light profiles, such as the overall galaxy morphology and the presence of detailed substructure (Massey et al. 2007b; Bernstein 2010; Zhang & Komatsu 2011). Nor is noise bias avoidable: given the large intrinsic galaxy shape dispersion, lensing measurements must include galaxies down to relatively low signal-to-noise detections to achieve a reasonable overall signal-to-noise in the ensemble shear statistics.

Given the recent understanding of this situation, the community has sought other approaches to reliable ensemble shear estimation. There are four general classes of approach, some of which are compatible with each other.
Image simulations to estimate and remove calibration biases:
Image simulations enable the estimation of biases in the shear signal due to the intrinsic limitations of the adopted shear estimation method. Given that shear biases depend on detailed galaxy morphologies (beyond second moments) and on the PSFs, there has been a move towards ever greater realism in the image simulations used by ongoing surveys (e.g., Zuntz et al. 2017). For example, several works have argued that one must include nearby structure around the galaxies in order to accurately predict shear biases due to nearby objects and unrecognized blends (Hoekstra et al. 2015; Hoekstra, Viola & Herbonnet 2017; Mandelbaum et al. 2017a), must account for variation of these biases with observing conditions across the survey, and have identified other key factors in image simulations for shear calibration. This approach will be challenging to take to the limit of future surveys, given our limited knowledge of galaxies, although the survey data itself provides a form of sanity check on the accuracy of the simulations and perhaps could enable an iterative process to improve the simulations (see, e.g., the sfit method: Mandelbaum et al. 2015). To ensure that the statistical error on the derived bias corrections is a subdominant part of the overall error budget, it is necessary to simulate many more galaxies than exist in the survey itself. Moreover, use of calibrations as the sole way of estimating and removing shear biases does not provide an independent cross-check on the results (unlike, e.g., using one of the methods of calibrating the shear below, and then using simulations to validate that method as a cross-check).
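The output of such a simulation campaign is typically summarized with the standard linear bias model ĝ = (1 + m) g_true + c, fit across simulations with known input shears. A minimal sketch follows, with the "pipeline" replaced by an assumed noisy linear response whose m and c values are invented:

import numpy as np

rng = np.random.default_rng(4)
g_true = np.repeat(np.linspace(-0.05, 0.05, 11), 20_000)
# Stand-in for "run the shear pipeline on simulations with known shear";
# the m, c, and shape-noise values below are invented:
g_obs = (1 + 0.013) * g_true + 2e-4 + rng.normal(0.0, 0.25, g_true.size)

# Average the recovered shear at each input shear, then fit the model.
gvals = np.unique(g_true)
gmeans = np.array([g_obs[g_true == g].mean() for g in gvals])
slope, intercept = np.polyfit(gvals, gmeans, 1)
print(f"m = {slope - 1:+.4f}, c = {intercept:+.1e}")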
Self-calibration:
Recently, methods have been devised to calibrate ensemble shear statistics based on manipulations of the real images ("metacalibration": Huff & Mandelbaum 2017; Sheldon & Huff 2017). Metacalibration provides a way to determine the
response of an ensemble shear estimator for the real galaxy population in the data. This potentially enables direct removal of selection biases, depending on what stage of the image processing metacalibration is inserted into. The fact that it does not require assumptions about galaxy morphology is a clear virtue of this approach over image simulations.
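The core of the metacalibration idea is a finite-difference measurement of the shear response R = ∂ê/∂γ obtained by remeasuring artificially sheared versions of the data. The sketch below uses GalSim and its adaptive-moments (HSM) shape code on a noiseless toy galaxy; real metacalibration deconvolves the PSF before shearing, reconvolves with a slightly enlarged PSF, and propagates noise, none of which is done here.

import galsim

def measure_e1(gal, psf, scale=0.2):
    """Adaptive-moments e1 of a PSF-convolved galaxy image."""
    img = galsim.Convolve(gal, psf).drawImage(nx=48, ny=48, scale=scale)
    return galsim.hsm.FindAdaptiveMom(img).observed_shape.g1

psf = galsim.Gaussian(fwhm=0.8)
gal = galsim.Exponential(half_light_radius=0.5).shear(g1=0.02, g2=0.0)

dg = 0.01  # magnitude of the counterfactual shears
e_plus = measure_e1(gal.shear(g1=+dg, g2=0.0), psf)
e_minus = measure_e1(gal.shear(g1=-dg, g2=0.0), psf)
R11 = (e_plus - e_minus) / (2 * dg)  # finite-difference shear response
e0 = measure_e1(gal, psf)
print(f"R11 = {R11:.3f}; calibrated shear estimate = {e0 / R11:.4f}")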
CMB lensing:
Since CMB lensing has very different observational systematics and a perfectly known source redshift, it is an attractive method for testing galaxy lensing (Vallinotto 2012; Das, Errard & Spergel 2013). Comparison of galaxy lensing with CMB lensing should not be thought of as a test of galaxy shear estimation, but rather of a combination of shear estimation and photometric redshift biases that both modify the lensing signal amplitude. When measuring cross-correlations between CMB and galaxy lensing, intrinsic alignments (Section 3.4) are a contaminant that must be modeled (Hall & Taylor 2014; Troxel & Ishak 2014; Chisari et al. 2015), unlike for correlations between the galaxy positions and CMB lensing. While current imaging and CMB surveys can only provide a relatively coarse test of this kind, the precision of this cross-check will improve with future surveys.
Paradigm shift:
The final approach described here, and perhaps the most principled one, is that since the meaning of per-galaxy shapes is questionable (given ellipticity gradients and other effects) and the approach of averaging them is fundamentally mathematically flawed, we should stop doing this. Instead, we should directly infer ensemble shear statistics in a way that avoids these assumptions, using the actual posterior shear estimate from each galaxy without assuming that an ellipticity is an unbiased proxy for it. Schneider et al. (2015) explored hierarchical inference of the shear, which involves parametric model fits that are then used to infer the ensemble shear given a prior; this appeared promising, but requires further development due to the computational expense of the approach. Bernstein & Armstrong (2014) presented a Fourier-space Bayesian shear estimation method that does not involve averaging galaxy shapes, and should not be susceptible to either model or noise bias. This method involves measurements of moments in Fourier space for the galaxy sample to be used, and the construction of a prior for what the unlensed distribution of moments looks like using a deep subset of the same survey. Together, these can be used to infer the ensemble shears. The method was developed further in subsequent work (Bernstein et al. 2016) to bring it closer to a practical shear estimator for use in real data, and the self-consistent modeling of photometric redshifts, selection biases, and measurements in multi-epoch data seems possible in principle. While work is needed to fully demonstrate the utility of these methods that overthrow the traditional paradigm in real data, particularly the extension to unrecognized blends that are not at the same redshift, their mathematical justification is unquestionable. A first application of the method from Bernstein et al. (2016) to the HSC survey will be presented in Armstrong et al. in prep.
It is becoming increasingly common for weak lensing surveys to use two shear estimation methods with different assumptions (e.g., DES Y1 results in Zuntz et al. 2017), relying on the comparison to provide some support for the reliability of survey results. A combination of a "traditional method" calibrated using method (1) or (2), compared with a method in class (4), and an external calibrator like CMB lensing (3), may be necessary to fully justify a belief in the results of Stage IV lensing surveys at the level of their statistical errors (i.e., without addition of a substantial systematic error budget).
2.8. Photometric redshifts

In this section, I discuss the calculation of photometric redshifts, or photo-z's. (Many methods produce a photometric redshift posterior probability p(z) rather than a single point estimate; I will nonetheless refer to these indiscriminately as photometric redshifts or photo-z's.) For weak lensing, what primarily matters is the ability to infer the true redshift distribution for a photo-z-selected sample of galaxies. In other words, there are strong requirements on our knowledge of the photometric redshift errors. I will discuss methods to accurately calibrate the redshift distributions in Section 3.2, while in this section I focus on the photometric redshift estimation itself.

The first step in the calculation of photometric redshifts is to measure the input data, which most commonly consist of flux measurements in several photometric bands. (Several papers have suggested using additional morphological information, such as sizes and shapes; e.g., most recently, Soo et al. 2017. Given that these correlate with lensing shear, it is unclear what the impact of this would be for cosmology analyses; for example, if the photo-z errors become systematically correlated with the lensing shear, this could be problematic to correct.) For example, consider a galaxy without any color gradients. If the PSF were the same in all bands, aperture photometry might be a perfectly reasonable way to get stable color estimates. Given that the PSF typically differs between the bands, aperture photometry will not give stable color estimates unless the aperture size is large compared to the PSF in the band with the worst seeing, which would result in quite low S/N. As a result, some form of PSF-matched aperture photometry (e.g., Hildebrandt et al. 2012) or forced model photometry (with the same model used in each band; e.g., Tanaka et al. 2017) typically gives better results. More generally, the multi-band photometry must measure light from the same physical area of a galaxy to properly estimate the SED, even if that light comes from a subset of the galaxy (chosen consistently across the bands).

Ideally, these measurements should be made in a way that reduces sensitivity to systematics such as Galactic extinction, seeing, and other observational or astrophysical effects with coherent patterns on the sky. There are some low-level systematics to consider in the calculation of the photometry, e.g., variation of the bandpasses across the field of view and photometric calibration across the survey, including color effects (Li et al. 2016; Burke et al. 2017). For these and other spatially varying issues such as Galactic extinction, it is not generally the RMS error that is relevant, but rather the spatial correlation function of the errors, which determines the scales on which the measured two-point correlations will show signatures of these systematics. See Ilbert et al. (2009) for a discussion of technical considerations such as uncertainty in photometry/filter curves.

There are several classes of photometric redshift methods; see Hildebrandt et al. (2010) for a summary of many methods, and Tanaka et al. (2017) and Sánchez et al. (2014) for the methods used for HSC and DES Science Verification, respectively. The two main classes of methods are (1) template-fitting methods, which rely on a set of templates for galaxy SEDs that are used to predict the galaxy photometry as a function of redshift, and can be
compared with the observed photometry (for a summary, see Ilbert et al. 2006); and (2) machine learning methods, which empirically learn the relationship between photometry and redshift based on a training sample. The key issue for template-fitting methods is the insufficiency of the templates to accurately describe the full span of the real data, while the key issue for machine learning methods is the difficulty in generalizing to samples that do not look like the training data. Both of these limitations would be eliminated if we had a very large, perfectly representative spectroscopic training sample, which highlights the fact that the primary limitation for modern photometric redshift methods is the insufficiency and/or non-representativeness of spectroscopic redshift samples at the depth of the lensing surveys (Newman et al. 2015).

There are multiple problems with existing spectroscopic samples. Some regions of color and magnitude space are not well covered by spectroscopic redshift samples, particularly at the faint end. In principle, reweighting schemes (Lima et al. 2008; Cunha et al. 2009) could mitigate this limitation when training and/or calibrating photometric redshifts, as long as all regions of color and magnitude space have some objects. Unfortunately, this solution may not work because it is not obvious that spectroscopic redshift successes and failures at fixed color and magnitude have the same redshift distribution. This problem is much harder to detect without, e.g., obtaining spectra from a different spectrograph that has a different range of wavelengths and sensitivity. Also, since the galaxy samples used for weak lensing have additional selection criteria imposed besides cuts on color and magnitude, it may be necessary to consider this higher-dimensional space when training and calibrating photometric redshifts for lensing (e.g., Hoyle et al. 2017; Medezinski et al. 2017). Trying to match this higher-dimensional space is challenging given the limited size of current deep spectroscopic samples. Finally, it is possible that some selection criteria used for targeting galaxies for spectra can induce additional non-negligible biases in the redshift distribution, which is problematic when using those redshifts for spectroscopic training and calibration (e.g., Gruen & Brimioulle 2017).

One outstanding problem in the field is photometric redshift training in the presence of unrecognized blends (see Section 2.3). This is a non-trivial problem that requires additional attention from the field as we move towards deeper surveys. One approach may be to ignore this issue in training, and fold it into the catastrophic failure rate when calibrating the photo-z; this places greater demands on the calibration strategy. In addition, the existence of shear selection biases induced by photo-z cuts complicates the analysis of tomographic shear correlations; see Troxel et al. (2017) for a recent example with mitigation schemes.

2.9. Masks and survey geometry

Describing the survey coverage requires a way to describe the exact location of its boundaries (not just outer edges but also internal boundaries due to, e.g., masking around bright stars) and the spatial dependence of quantities that determine systematic errors and/or galaxy number densities, such as the depth, PSF size, etc. Several software frameworks have been developed to describe survey geometry, typically with some flexible hierarchical description of the geometry. These include Healpix (Górski et al. 2002; see https://github.com/healpy/healpy for the healpy Python bindings), Mangle (Swanson et al.
2012), and STOMP (Scranton, Krughoff & Connolly 2007).

There are several places where these descriptions are needed. First, maps of the spatial dependence of systematics can be correlated against quantities of scientific interest (e.g., photo-z's or shear estimates) to identify which systematics are most relevant and need further improvement. An example of map-level systematics investigation in the HSC survey was carried out by Oguri et al. (2017). Second, coverage maps can be used to generate mock observations that have the same coverage as the real data. Since survey boundaries can lead to selection biases and to leakage between E- and B-mode power, mock catalogs with the same boundaries can be valuable for systematics investigations. Finally, the optimal estimator for galaxy-galaxy correlations (Landy & Szalay 1993) and galaxy-shear correlations (Singh et al. 2017) requires random points with the same spatial coverage as the real galaxies (but with correlation function equal to zero). This need arises because the optimal estimator involves correlation of the overdensity rather than the density itself, so in each case where the galaxy field is used, a random field is needed as well. Moreover, for galaxy-shear correlations, the subtraction of the shear around random points not only produces a more optimal estimator, but is also useful for subtraction of systematics (Mandelbaum et al. 2005), provided the dependence of the number density on systematics-generating quantities is faithfully reproduced in the random sample (Mandelbaum et al. 2013). Morrison & Hildebrandt (2015) have demonstrated the impact of systematic variation of galaxy number densities with observational parameters such as depth, extinction, and so on, and the need to model these dependencies beyond linear order to accurately estimate angular correlations from large imaging surveys. This is relevant both for the galaxy-galaxy and the galaxy-shear correlations that go into a cosmological weak lensing analysis.

The above statements about the need to faithfully reproduce survey boundaries and the dependence of galaxy density on observational conditions are related to arguments made in the literature about the so-called "boost factor" that accounts for the contamination due to (unlensed) physically-associated galaxies used as sources in galaxy-galaxy or cluster-galaxy lensing measurements. This idea was introduced by Sheldon et al. (2004). The difficulty in using this formalism for small-area surveys in practice was presented by Medezinski et al. (2017) and Melchior et al. (2017), with an alternative formalism involving explicit modeling of the smooth redshift distribution and the contribution from physically-associated galaxies given by Gruen et al. (2014). An additional complication is the need to trace the possible difficulties in detecting source galaxies in high-density regions (e.g., in galaxy clusters, due to the obscuration of background galaxies by foregrounds; Simet & Mandelbaum 2015).
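As a small illustration of how such geometry descriptions are used, the following healpy sketch builds a toy footprint/depth map and draws random points with the same angular selection as the data. The footprint definition and depth values are invented; a realistic random catalog would additionally reproduce the density dependence on depth, extinction, and other observing conditions, as discussed above.

import healpy as hp
import numpy as np

rng = np.random.default_rng(5)
nside = 256
npix = hp.nside2npix(nside)

# Toy footprint/depth map; real property maps come from the survey.
depth = np.full(npix, hp.UNSEEN)
theta, _ = hp.pix2ang(nside, np.arange(npix))
in_footprint = (theta > np.radians(60)) & (theta < np.radians(80))
depth[in_footprint] = rng.normal(26.0, 0.1, in_footprint.sum())

# Randoms with the data's angular selection: sample the sphere uniformly
# and keep points whose pixel lies inside the footprint.
n = 1_000_000
ra = rng.uniform(0.0, 360.0, n)
dec = np.degrees(np.arcsin(rng.uniform(-1.0, 1.0, n)))
keep = depth[hp.ang2pix(nside, ra, dec, lonlat=True)] != hp.UNSEEN
print(f"kept {keep.sum()} of {n} randoms inside the footprint")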
3. FROM CATALOGS TO SCIENCE
This section covers the steps in a weak lensing analysis from catalogs to cosmological parameters. It is in this phase of the analysis that we must include steps for mitigation of astrophysical uncertainties and any residual observational systematics.
3.1. Estimators

Here I assume the availability of a set of galaxy positions on the sky, per-object shear estimates defined as in Sec. 1, and photometric redshifts. The estimator for the reduced shear will be denoted ĝ. Typically the coordinate system for ĝ is defined such that positive and negative ĝ₁ correspond to East-West and North-South elongations, respectively, while ĝ₂ is defined at 45° with respect to those axes. The focus of this section is how to combine these quantities and measure statistics that are cleanly related to the matter distribution.
For shear-shear correlations, galaxies are divided into tomographic bins based on the photometric redshifts. Pairs of galaxies are identified, and their separation on the sky is calculated, including the angle with respect to the sky coordinate axes: separation on the sky |θ| and polar angle φ. For each pair, the relevant shear components are the tangential ĝ₊ and cross ĝ× components, with the convention that tangential alignment around overdensities results in ⟨ĝ₊⟩ > 0, while radial alignment results in ⟨ĝ₊⟩ <
0. For one of the galaxies with shape ĝ, we obtain

\hat{g}_+ = -\mathrm{Re}\left[\hat{g}\,e^{-2i\phi}\right], \qquad (7)

\hat{g}_\times = -\mathrm{Im}\left[\hat{g}\,e^{-2i\phi}\right]. \qquad (8)

The estimator for the shear correlation functions ξ± in that tomographic bin is (Schneider et al. 2002)

\hat{\xi}_\pm(\theta) = \langle \hat{g}_+ \hat{g}_+ \rangle(\theta) \pm \langle \hat{g}_\times \hat{g}_\times \rangle(\theta), \qquad (9)

with ⟨ĝ₊ĝ×⟩ = 0 due to parity symmetry, and the averages being weighted averages (typically inverse-variance weighting, including the intrinsic shape noise and measurement error). This estimator is insensitive to survey masks and boundaries. The theoretical prediction for ξ±(θ) can be derived as Hankel transforms of the convergence power spectrum,

\xi_\pm(\theta) = \int_0^\infty \frac{\ell\,\mathrm{d}\ell}{2\pi}\, J_{0/4}(\ell\theta) \left[ P_\kappa^{(\mathrm{E})}(\ell) \pm P_\kappa^{(\mathrm{B})}(\ell) \right], \qquad (10)

where the Bessel function J₀ is used for ξ₊ and J₄ for ξ₋. To lowest order, lensing produces only E-mode power (a pure gradient field), but there are low-level physical effects that cause B modes (corresponding to a curl component; see Section 3.3). Certain systematics can manifest as mixes of E and B modes, and detection of B-mode power is one way to identify those systematics; however, not all systematics produce B modes.

Since lensing produces primarily E-mode power, and the power estimated in each ℓ bin should be roughly independent, there is interest in directly estimating the power spectrum. However, the most naive way of doing so involves measuring shear correlations over all scales, and in practice, the lack of pairs on small scales and the finite sizes of lensing surveys lead to a mixing of E and B modes (Kilbinger, Schneider & Eifler 2006). There are so-called pseudo-power spectrum estimators (e.g., Hikage et al. 2011) that aim to mitigate this effect and enable direct estimation of the power spectrum. There are additional configuration-space estimators, and estimators that combine the estimated ξ̂±(θ) with various filters in ways that are meant to be more optimal (e.g., Asgari, Schneider & Simon 2012). No matter what estimator is used, models for systematic uncertainties must be re-expressed in terms of those estimators in order to marginalize over and remove the uncertainties.

The above discussion was focused on shear-shear correlation functions. However, as mentioned previously, the canonical weak lensing analysis for future surveys will include galaxy-shear and galaxy-galaxy correlations, defined within tomographic bins in analogous ways. When constructing these estimators using the galaxy overdensity field, they have contributions from both clustering and magnification. Bernstein (2009) presents the relationship between empirical estimators of these three two-point correlation functions and the underlying theoretical quantities: lensing shear, magnification, and galaxy overdensity.
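In practice the ξ̂± estimator of Equation 9 is computed with pair-counting codes; the sketch below uses the public TreeCorr package on an invented catalog of pure shape noise, for which the measured ξ̂± should be consistent with zero.

import numpy as np
import treecorr

rng = np.random.default_rng(6)
n = 50_000
ra = rng.uniform(0.0, 10.0, n)   # degrees
dec = rng.uniform(-5.0, 5.0, n)
g1 = rng.normal(0.0, 0.25, n)    # pure shape noise, no lensing signal
g2 = rng.normal(0.0, 0.25, n)

cat = treecorr.Catalog(ra=ra, dec=dec, g1=g1, g2=g2,
                       ra_units='deg', dec_units='deg')
gg = treecorr.GGCorrelation(min_sep=1.0, max_sep=100.0, nbins=10,
                            sep_units='arcmin')
gg.process(cat)  # pair counting with the tangential/cross projection
# gg.xip and gg.xim estimate xi_+ and xi_- in each angular bin; for
# noise-only shapes they should be consistent with zero.
print(np.c_[np.exp(gg.meanlogr), gg.xip, gg.xim])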
When choosing estimators for the galaxy-shear and galaxy-galaxy correlations, there are different philosophical approaches. On small scales, these correlations depend on how galaxies populate dark matter halos. One family of estimators removes the small-scale information to avoid systematic uncertainty in cosmological constraints due to astrophysical details (e.g., Baldauf et al. 2010). Other approaches are to include the small scales, build models with nuisance astrophysical parameters, and marginalize over them (e.g., Yoo et al. 2006; van den Bosch et al. 2013). The choice of which type of estimator to use depends on the user's optimism about their ability to describe these astrophysical uncertainties with sufficient realism to avoid substantial systematic errors while using a simple model.

Data compression applied to shear correlations or power spectra may be possible and even desirable. The number of data points in the estimator places serious demands on covariance matrix estimation (Section 3.6) and the cosmological parameter inference method (Section 3.7). For that reason, investigation of data compression methods such as those recently proposed for galaxy power spectra may be beneficial (Gualdi et al. 2017).

Finally, for the case of shear-shear correlations, a 3D lensing approach that avoids the need for tomographic binning has been proposed and used in real data (Simon, Taylor & Hartlap 2009; Kitching, Heavens & Miller 2011; Kitching et al. 2014). However, future work is needed on how to use this in a joint analysis with galaxy-shear and galaxy-galaxy correlations, and how to properly marginalize over systematics.
3.2. Redshift distributions and bins

Section 2.8 described photometric redshifts, defined either as point estimates or posterior probability distributions p(z). This section explains how they are used for science. A variety of schemes exist for dividing galaxies into tomographic bins, e.g., based on division of the sample using the point photo-z estimates. Determination of the true ensemble redshift distribution, or N(z), is critical for cosmological analyses. (While it is tempting to stack the per-object p(z), stacking violates the definitions of probability, in contrast with direct use of spectroscopic redshifts, which is a mathematically acceptable approach; Malz et al., in prep. It is nonetheless often done.) To lowest order, weak lensing is primarily sensitive to the mean redshift and the width of the redshift distribution of each tomographic bin (Amara & Réfrégier 2007); this fact is often used to motivate how nuisance parameters for redshift uncertainty are included in the cosmological analysis (e.g., DES Collaboration et al. 2017). The inclusion of catastrophic photometric redshift errors complicates this issue (Hearin et al. 2010).

In general, spectroscopic redshifts are needed for photo-z training (Sec. 2.8) and calibration, where the type of redshift samples needed for these purposes differs (Newman et al. 2015). The typical required redshift sample size is of order 10^4 in order to reduce the systematic uncertainty on mean redshifts in tomographic bins to the ∼10^{-3} level that is needed to avoid Stage IV surveys being systematically biased at a level exceeding the statistical uncertainties. There are two methods for using spectroscopic redshifts to calibrate the N(z) of photometric redshift samples. The first is to reweight the spectroscopic redshifts to match the observed properties of the photometric sample (Lima et al. 2008) and directly infer the N(z), though there is some debate as to which sample properties should be used for that reweighting (see, e.g., Medezinski et al. 2017). Previous studies have explored the spectroscopic redshift sample size needed for direct calibration of N(z), without (Ma & Bernstein 2008) and with catastrophic errors (Sun et al. 2009; Bernstein & Huterer 2010). Generically, this method requires a spectroscopic redshift sample that covers all of the photometric color and magnitude space (not necessarily evenly, with reweighting accounting for
the non-representativeness of the spectroscopic redshift sample). An additional assumption is that the N(z) at fixed color and magnitude is the same for spectroscopic successes and failures, which is likely incorrect at some level. The resulting systematic uncertainty is difficult to estimate and is often ignored for current datasets. Beck et al. (2017) proposes a framework for exploring this assumption for shallow surveys. The extension of this test to deeper surveys (where degeneracies between high and low redshift may be more important in determining spectroscopic success) is of critical importance for future surveys that wish to rely on spectroscopic reweighting to determine the N(z) of photometric redshift samples.

The second method is to use the cross-correlation between the photometric redshift sample and some non-representative spectroscopic redshift sample covering the full redshift range of the photometric sample, with large enough area and sampling rate to allow the clustering cross-correlation to be well-determined. Several variations on the cross-correlation or clustering redshift method have been proposed (Newman 2008; Benjamin et al. 2010; McQuinn & White 2013; Ménard et al. 2013; Schmidt et al. 2013). Differences between them include the choice of scales to use (purely linear bias scales, or small scales as well); the method of modeling the galaxy bias for the photometric sample; and the corrections for magnification bias, which induces nonzero correlations between galaxies in bins that truly are separated in redshift. Recent results using this approach include Choi et al. (2016); Morrison et al. (2017); Johnson et al. (2017).

Because of the different assumptions behind these two methods, DES and KiDS used both to calibrate their N(z) (Hoyle et al. 2017; Hildebrandt et al. 2017), though for DES they used subsets of luminous red galaxies with high-quality photo-z's rather than spectroscopic redshifts when carrying out the cross-correlation analysis, and also used high-quality 30-band COSMOS photo-z's for the direct N(z) calibration. It seems likely that in future, both methods will continue to be used so as to have a cross-check on the resulting calibrated N(z).

The needs for additional spectroscopic redshift samples for photo-z training and calibration for future surveys are summarized in Newman et al. (2015). Techniques have been proposed for identifying the regions of color/magnitude space that should be targeted to fill in missing regions of parameter space (including self-organizing maps, Masters et al. 2015, which were used for targeting a new spectroscopic survey in Masters et al. 2017).

Finally, the difficulty in calibrating N(z) is connected to the exact analysis being done. Use of galaxy-shear, galaxy-galaxy, and shear-shear correlations together may result in less stringent needs for spectroscopic redshift calibration samples, while the need to jointly model intrinsic alignments may result in more stringent requirements for how well we understand photometric redshift uncertainties (Joachimi & Schneider 2009). An additional issue of relevance especially for deep ground-based surveys is the role of blending systematics (Section 2.3), which have the potential to increase the catastrophic photometric redshift error rate. Since small-area spectroscopic redshift samples may have targeting criteria that avoid obvious blends, the impact of blending on photometric redshift errors may need to be assessed primarily through the cross-correlation method.
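As a concrete illustration of the first (direct calibration) method, here is a minimal sketch of reweighting in the spirit of Lima et al. (2008). The kth-nearest-neighbor density-ratio estimate, the input arrays, and the choice of k are illustrative assumptions, not the implementation used by any particular survey.

```python
import numpy as np
from scipy.spatial import cKDTree

def direct_calibration_nz(spec_colors, spec_z, phot_colors, z_edges, k=20):
    """Estimate N(z) of a photometric sample by reweighting a
    spectroscopic sample to match it in color/magnitude space.

    Each spectroscopic galaxy is weighted by the local ratio of
    photometric to spectroscopic number density, estimated within the
    radius enclosing its k nearest spectroscopic neighbors.
    """
    spec_tree = cKDTree(spec_colors)
    phot_tree = cKDTree(phot_colors)

    # Radius to the kth spectroscopic neighbor (index 0 is the point
    # itself, so query k+1 neighbors).
    dist, _ = spec_tree.query(spec_colors, k=k + 1)
    radius = dist[:, -1]

    # Photometric counts within the same radius give the density ratio.
    n_phot = np.array([len(phot_tree.query_ball_point(c, r))
                       for c, r in zip(spec_colors, radius)])
    weights = n_phot / float(k)

    nz, _ = np.histogram(spec_z, bins=z_edges, weights=weights,
                         density=True)
    return nz
```

The key assumption flagged above, that N(z) at fixed color and magnitude is the same for spectroscopic successes and failures, enters here through the use of the spectroscopic redshifts as ground truth in each color-space neighborhood.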
3.3. Theoretical predictions

To constrain cosmology with weak lensing measurements, theoretical predictions with an accuracy of ∼
1% over a wide range of scales and cosmological parameters are needed (e.g., Huterer & Takada 2005). To interpret the shear-shear correlations alone, predictions for the distribution of dark matter are needed, while joint interpretation with galaxy-shear and galaxy-galaxy correlations requires a way of describing the distribution of galaxies.

Weak lensing measurements typically go quite far into the nonlinear regime, so an accurate description of the nonlinear matter power spectrum is required. This description can come from large suites of N-body simulations with many values of cosmological parameters, and some manner of interpolating between the values of parameters for which simulations were generated. One option is to use simulations to calibrate a fitting formula (e.g., halofit: Takahashi et al. 2012). Another approach is to use an emulator, such as Heitmann et al. (2014), which interpolates over cosmological parameter space using Gaussian processes. While fitting formulae and emulators have tremendous value in enabling fast, accurate calculations of the matter power spectrum, using simulations directly can help (a) enable inclusion of physical effects that might be difficult to incorporate through an analytic approach and which depend on cosmology, such as density-dependent selection effects (e.g., Hartlap et al. 2011); (b) allow for joint modeling with galaxy correlations; and (c) include higher-order theoretical nuances.

When modeling the galaxy-shear and galaxy-galaxy correlations, the simplest assumption to make is that the galaxy bias is linear (galaxy and matter overdensities are related as δ_g = bδ) and that the galaxy and matter overdensities are perfectly correlated, r_cc = P_gm/√(P_gg P_mm) = 1. These simple assumptions are valid at large separations, and fail for a variety of reasons on small scales. They were used in the joint analyses of the three galaxy and shear auto- and cross-correlations from DES and KiDS (DES Collaboration et al. 2017; van Uitert et al. 2017). In DES, to avoid sensitivity to systematics from the linear bias assumption, the choice was made to limit the analysis to relatively large scales, > 8 and > 12 h^{-1} Mpc for galaxy-galaxy and galaxy-shear correlations, respectively. In both cases, tests were carried out to assess the sensitivity of the results to this assumption.

For future surveys, the measurements will have sufficient signal-to-noise that it will be necessary to adopt more realistic models. One option is a perturbation theory-based model for b(k) and r_cc(k) (e.g., Baldauf et al. 2010), which has been used for a galaxy-shear and galaxy-galaxy joint analysis in SDSS (Mandelbaum et al. 2013). Another option is a halo model approach, which provides a numerical description for the galaxy-matter and galaxy-galaxy correlations based on how galaxies populate dark matter halos (e.g., Yoo et al. 2006; van den Bosch et al. 2013), and which has been used in practice for interpretation of BOSS galaxy lensing and clustering (More et al. 2015). As mentioned in Section 3.1, the choice of the estimator to use for the measurement is related to the question of how the modeling is to be done, because some model descriptions can go to smaller scales than others.
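To illustrate how a model power spectrum feeds into the predictions, the sketch below numerically evaluates the Hankel transforms of Equation 10 by direct quadrature. The tabulated convergence spectrum is assumed to come from a Limber integral over a halofit or emulator matter power spectrum; production codes use FFTLog-style algorithms rather than this brute-force integral.

```python
import numpy as np
from scipy.integrate import simpson
from scipy.special import jv

def xi_pm_from_pkappa(ell, p_E, theta, p_B=None):
    """Eq. 10: xi_+(theta) and xi_-(theta) as Hankel transforms (J_0 and
    J_4, respectively) of the E/B convergence power spectra.

    ell, p_E, p_B: tabulated spectra (p_B defaults to zero, the
    lowest-order lensing expectation); theta: angles in radians.
    """
    p_B = np.zeros_like(ell) if p_B is None else p_B
    xip = [simpson(ell * jv(0, ell * t) * (p_E + p_B), x=ell)
           for t in theta]
    xim = [simpson(ell * jv(4, ell * t) * (p_E - p_B), x=ell)
           for t in theta]
    return np.array(xip) / (2 * np.pi), np.array(xim) / (2 * np.pi)

# Under the linear-bias assumptions above, the inputs for galaxy-galaxy
# and galaxy-shear predictions would follow from the matter spectrum as
#   P_gg(k) = b**2 * P_mm(k),   P_gm(k) = b * r_cc * P_mm(k).
```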
In addition, it may be necessary to consider how higher-order complexities like assembly bias (wherein the galaxy bias depends on more than just the halo mass) complicate the joint analysis of galaxy and shear correlations, specifically assumptions about r_cc and its scale dependence. Preliminary steps towards understanding this issue have already been made (e.g., McEwen & Weinberg 2016). For future surveys, calibration of how b(k) and r_cc(k) are modeled against realistic mock galaxy catalogs will be crucial for choosing what range of scales can be used and ensuring accurate cosmological constraints.

For the prediction of projected lensing statistics, there are a number of low-level theoretical issues that have not been relevant for past and ongoing lensing surveys, but which may require attention in upcoming surveys. These include the distinction between shear and reduced shear, the impact of several approximations (flat sky, Born, Limber, linearized gravity, and Hankel transform) and higher order lensing terms, lens-lens coupling, and
source clustering-induced B modes (Bernardeau 1998; Schneider, van Waerbeke & Mellier 2002; Dodelson, Shapiro & White 2006; Hilbert et al. 2009; Bernardeau, Bonvin & Vernizzi 2010; Krause & Hirata 2010; Giblin et al. 2017; Kilbinger et al. 2017; Kitching et al. 2017; Lemos, Challinor & Efstathiou 2017; Petri, Haiman & May 2017). Fast methods have been developed for ray-tracing through N-body simulations (Barreira et al. 2016) to avoid some of these approximations, and to incorporate some of the second-order effects (Becker 2013). These effects can enter in different ways into the galaxy-shear correlations, e.g., because lensing deflections modify observed positions.

Figure 7: An illustration of the impact of intrinsic alignments on cosmological parameter constraints with weak lensing in LSST, using shear-shear correlations only (from Krause, Eifler & Blazek 2016). The bottom left triangle shows the 2D contours for cosmological parameters (Ω_m, σ_8, h, w_0, w_a), where the black curve shows the case of no intrinsic alignments; red, green, and blue curves show intrinsic alignments predictions with different ways of modeling the alignments of blue and red galaxy populations and their luminosity evolution; and the orange curve shows how the constraints become less tight when marginalizing over the intrinsic alignments. The top row shows the posterior probabilities for each of the cosmological parameters. Clearly the biases without marginalization are unacceptably large.

3.4. Intrinsic alignments

Since recent reviews have covered the physics of intrinsic alignments, their impact on weak lensing cosmology, theoretical models, and observations (Joachimi et al. 2015; Kirk et al. 2015), here I focus on aspects most relevant to future lensing surveys. Early predictions of the level of intrinsic alignment contamination were based on N-body simulations. Depending on assumptions made about the galaxy population and the alignments of its baryonic components with the underlying matter field, the predicted level of alignments can vary by orders of magnitude (e.g., Heymans et al. 2006b). However, observations have substantially narrowed this wide variation by placing constraints on the large-scale alignment model for red galaxies, and (so far) null detections of large-scale shape alignments for blue galaxies. Since red galaxies exhibit alignments consistent with their shapes being aligned with the shapes of the inner regions of their halos, high-resolution N-body simulations may indeed be populated with red galaxies that have realistic alignments with the underlying matter density field (e.g., Schneider, Frenk & Cole 2012). Recent work has also included comparison of measured galaxy alignments with high-resolution, large-volume hydrodynamic simulations that include the physics of galaxy formation (Velliscig et al. 2015; Tenneti, Mandelbaum & Di Matteo 2016; Chisari et al. 2017; Hilbert et al. 2017); the simulations broadly reproduce many of the alignment trends seen in real data, but not all of them.

Initial efforts to remove intrinsic alignments from lensing measurements focused on the removal of close galaxy pairs (in 3D: King & Schneider 2002; Heymans & Heavens 2003). However, intrinsic alignments can coherently anti-align galaxies that are well-separated along the line-of-sight.
Hirata & Seljak (2004) highlighted the importance of this effect, and subsequent observational work confirmed that it is the dominant impact of intrinsic alignments on weak lensing measurements; it cannot be eliminated by removing galaxy pairs at the same redshift from the lensing sample. Based on recent observational constraints (e.g., Singh & Mandelbaum 2016), intrinsic alignments will be an important systematic that surveys like LSST must mitigate (Figure 7). Efforts to remove this systematic typically include joint modeling or self-calibration using joint analysis of galaxy-galaxy, galaxy-shear, and shear-shear correlations (Joachimi & Bridle 2010; Yao et al. 2017). These approaches rely on the fact that the various contributing terms have different redshift dependencies, spending some of the statistical constraining power of the data to marginalize over the intrinsic alignments terms.

Current work on intrinsic alignments includes attempts at better observational constraints (requiring redshift estimates and shape measurements), model building (e.g., Blazek, Vlah & Seljak 2015; Blazek et al. 2017), and tests of mitigation methods. Of particular value would be large-area spectroscopic samples that would enable better priors to be placed on the parameters of intrinsic alignment models at redshift z ≳ 1.
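Schematically, the joint-modeling approaches exploit the fact that the observed tomographic spectra contain lensing (GG), shear-intrinsic (GI), and intrinsic-intrinsic (II) terms with different redshift dependence. The toy parameterization below, with a single alignment amplitude scaling GI linearly and II quadratically, is loosely patterned on linear-alignment-style models; it is an illustrative assumption, not any specific published model.

```python
def observed_cl(cl_GG, cl_GI, cl_II, A_ia):
    """Toy intrinsic-alignment contamination model for the observed
    'shear' spectrum in one tomographic bin pair. cl_GI and cl_II are
    assumed to be templates computed at unit alignment amplitude, so
    GI scales linearly and II quadratically with A_ia."""
    return cl_GG + A_ia * cl_GI + A_ia**2 * cl_II
```

In a likelihood analysis, A_ia (plus any redshift- and luminosity-dependence parameters) would be sampled as nuisance parameters, which is how the statistical constraining power of the data is spent.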
3.5. Baryonic effects
The impact of the physics of galaxy formation on weak lensing observables has been a subject of study for more than a decade. Unfortunately, thorough investigation of this topic requires hydrodynamic simulations that have realistically complicated models of galaxy formation (without the over-cooling problem), high enough resolution to ensure their convergence for typical galaxy masses, and large enough volume to study the impact on the matter power spectrum on cosmological distance scales. This combination of requirements has only recently become possible to meet, in families of very expensive high-resolution simulations with box lengths of order 100 Mpc, including the EAGLE simulations (Schaye et al. 2015), Illustris (Vogelsberger et al. 2014), and MassiveBlack-II (Khandai et al. 2015).

One approach to account for the impact of baryons on the matter power spectrum is to include them in a perturbation theory-based model for the power spectrum, with baryonic physics producing higher-order terms that can be marginalized over (Mohammed & Seljak 2014). There is also a halo model approach, which has nuisance parameters describing the change in internal structure of dark matter halos, specifically their concentration, due to baryonic physics (Semboloni et al. 2011; Zentner et al. 2013). The extension of a halo model to quite small scales is simpler in practice than the extension of the perturbation theory-based model mentioned above, which requires the inclusion of many additional terms. Zentner et al. (2013) found that for future lensing surveys, additional mitigation may be needed, possibly reflecting the fact that a change in halo concentration is not the only impact of baryonic physics. A halo model approach with changes in halo concentrations and a mass-dependent 'halo bloating' parameter (Mead et al. 2015) has been quite successful in describing the matter power spectrum to small scales; the parameters of that model were calibrated to maximize the fidelity of reproduction of the matter power spectrum, rather than to accurately describe dark matter halo profiles. This approach was adopted by the KiDS survey (Hildebrandt et al. 2017) to model shear-shear correlation functions.

Rather than adopting a physically-motivated approach, Eifler et al. (2015) used an empirical PCA approach. Using a set of cosmological hydrodynamic simulations to construct PCA components that describe the impact of baryonic physics on the dark matter power spectrum, they showed that excluding the first four PCA components is sufficient to mitigate the impact of baryonic physics on a shear-shear correlation function measurement, even going to relatively small scales. An open question for any of these mitigation schemes is the maximum usable ℓ or minimum usable θ. Additional questions for investigation include the interaction between marginalization over baryonic effects and other systematics. For example, if intrinsic alignment models are constructed separately for red and blue galaxies, such that theoretical models separately predict the signal for the two populations and then take the appropriate weighted averages, then does the baryonic physics model for galaxy-shear and galaxy-galaxy terms also need to differ for the two populations? This and other interactions (e.g., photometric redshift errors and their uncertainties also depend on the galaxy type) remain to be explored.
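A minimal sketch of the PCA idea follows. Building difference vectors relative to a dark-matter-only prediction and projecting out the leading modes is one simple variant of this approach, not a reproduction of the Eifler et al. (2015) implementation, and all inputs are hypothetical.

```python
import numpy as np

def baryon_pca_cleaner(hydro_vectors, dmo_vector, n_modes=4):
    """Build a function that removes the leading baryonic modes from a
    data vector, given predictions from several hydro simulations
    (hydro_vectors, shape (n_sims, n_data)) and a dark-matter-only
    prediction (dmo_vector, shape (n_data,))."""
    diffs = hydro_vectors - dmo_vector          # baryonic residuals
    # Rows of vt are orthonormal principal components of the residuals.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    pcs = vt[:n_modes]

    def clean(data_vector):
        resid = data_vector - dmo_vector
        # Subtract the part of the residual lying in the span of the
        # leading baryonic modes.
        return data_vector - pcs.T @ (pcs @ resid)
    return clean
```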
3.6. Covariances

The process of inferring cosmological parameters given a set of measurements typically involves knowing the covariance matrix of those measurements, under the assumption that the likelihood function of the observable quantities is Gaussian. Computing the covariance matrix for weak lensing measurements is a task for which multiple approaches exist in the literature (some developed for shear-shear correlations only, such that the extension to galaxy-shear and galaxy-galaxy correlations requires additional work), and additional development will be needed for upcoming surveys. In principle, future surveys may have data vectors with of order 1000 points, considering some number of tomographic bins, bins in angular scale or wavenumber, and the three different types of correlations to be measured. Several studies have argued that the number of simulation realizations of upcoming surveys needed to estimate the covariance matrix with sufficient accuracy through brute-force methods is prohibitively large (Dodelson & Schneider 2013; Taylor, Joachimi & Kitching 2013), though see recent work by Sellentin & Heavens (2017) arguing that those requirements were significantly overestimated in the case that one can parameterize the covariance in some compact way and use the simulations to constrain that parameterization (see discussion below).

The covariance matrix in general has shot noise terms and cosmic variance terms, including contributions from connected four-point functions and supersample covariance (Li, Hu & Takada 2014; Mohammed & Seljak 2014). See Singh et al. (2017) for a recent derivation of the generic covariance expression for two-point correlations of densities or overdensities and quantities such as shear. Because some of these terms are cosmology-dependent, in principle the covariance matrix itself should be re-estimated at each step of a likelihood analysis to constrain cosmology (Eifler, Schneider & Hartlap 2009).

Numerical estimation of the covariance matrix using theoretical expressions is a natural way to incorporate the cosmology-dependence of the covariance. However, ensuring the numerical stability of all terms in the covariance matrix estimation can be quite expensive (e.g., Krause et al. 2017). Hence, building an emulation tool that would enable fast estimation of these covariances would be highly valuable. Most lensing analyses to date have not incorporated a full cosmology-dependent covariance, with the notable exception of Jee et al. (2013) for non-tomographic shear-shear correlations only. In the KiDS analysis that included shear-shear, shear-galaxy, and galaxy-galaxy correlations (van Uitert et al. 2017), the cosmology-dependence of the covariance was partially accounted for through an iterative procedure. While they did not vary the covariance at each step of their MCMC, they did use the best-fitting cosmology from their first MCMC to regenerate the covariance and then rerun the fitting procedure.

Another approach that has seen popularity with past surveys is direct empirical estimation of the covariances, such as using the jackknife or bootstrap method, with the subsamples consisting of large contiguous regions within the survey (e.g., Mandelbaum et al. 2013). This approach has been rigorously compared with both numerical estimates of covariances and with realistically complex mock catalogs for galaxy-shear and galaxy-galaxy correlations (Shirasaki et al. 2017; Singh et al. 2017), and has been found to be quite accurate for
scales up to the size of the jackknife regions. A natural tension for this method is that the need for the number of regions to significantly exceed the number of data points motivates the use of many smaller regions, but use of smaller regions reduces the range of scales on which the jackknife can be accurately used, and causes a violation of the assumption of region independence. However, if a given survey configuration allows the jackknife method to be used, it can be useful in avoiding the need for many realizations of mock catalogs. The covariance matrix estimated in this way will be noisy, and since the inverse covariance used for likelihood analysis is then biased, the sizes of cosmological parameter constraints must be corrected (e.g., Hirata et al. 2004; Hartlap, Simon & Schneider 2007).

In principle, using many simulation realizations of the survey provides a way to estimate covariance matrices. Similarly to the above empirical methods, each element of the covariance matrix must be independently constrained, and hence longer data vectors pose a greater challenge. For expected data vectors in the surveys of the 2020s, the number of realizations needed to do this as a function of cosmology is likely prohibitive, even assuming the expected increase in computing power available in the 2020s. The noise resulting from the limited number of realizations compared to the number of data points (Blot et al. 2016) must also be taken into account, e.g., using the methods mentioned above for empirical covariances. However, it is more likely that a hybrid approach will be used, adopting some method for modeling the covariance and then constraining its (much smaller number of) parameters using the simulations. For example, methods for modeling the precision matrix (inverse covariance matrix; Padmanabhan et al. 2016; Friedrich & Eifler 2017) or perturbation theory approaches to the covariance (Mohammed, Seljak & Vlah 2017) may be useful, in addition to simulation-calibrated versions of the numerical models mentioned above. Techniques for fast mode resampling may also be useful in reducing the number of simulation realizations needed (Schneider et al. 2011). Finally, data compression methods introduced in Sec. 3.1 are also relevant here. In the coming years, it will be important that a way forward that works for the full tomographic shear-shear, galaxy-shear, and galaxy-galaxy analysis be validated and implemented; see Marian, Smith & Angulo (2015) for a demonstration of some issues that arise when combining galaxy-shear and galaxy-galaxy correlations and estimating covariances. Ideally, a fast emulator (e.g., building on ideas from Morrison & Schneider 2013) would be constructed, to avoid cosmology-dependent covariance matrix estimation being a primary limiting step in the final likelihood analysis.
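To make the jackknife and noise-debiasing discussion concrete, here is a minimal sketch. The delete-one region estimates are assumed inputs; the second function applies the standard Hartlap, Simon & Schneider (2007) rescaling of the inverse of a noisy covariance before its use in a Gaussian likelihood.

```python
import numpy as np

def jackknife_covariance(loo_estimates):
    """Delete-one jackknife covariance. loo_estimates has shape
    (n_regions, n_data): the data vector re-estimated with each large
    contiguous survey region removed in turn."""
    est = np.asarray(loo_estimates)
    n = est.shape[0]
    d = est - est.mean(axis=0)
    return (n - 1.0) / n * d.T @ d

def debiased_precision(cov, n_real, n_data):
    """Correct the bias of the inverse of a covariance estimated from
    n_real realizations (or regions) by the Hartlap factor; valid for
    Gaussian-distributed estimates with n_real > n_data + 2."""
    hartlap = (n_real - n_data - 2.0) / (n_real - 1.0)
    return hartlap * np.linalg.inv(cov)
```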
3.7. Inference

Cosmological inference from weak lensing with past and current datasets has typically involved the assumption that the likelihood function of the observables is Gaussian, and use of some form of Markov Chain Monte Carlo (Lewis & Bridle 2002; Feroz, Hobson & Bridges 2009; Foreman-Mackey et al. 2013) to sample parameter space and identify best-fitting parameters and confidence intervals. It may also be important for future surveys to account for non-Gaussianity in the likelihood (e.g., Sato et al. 2009, 2011).

One of the key challenges facing future surveys in the cosmological parameter inference step is the high dimensionality of the problem. The multiple tomographic bins and three correlations to jointly model produce of order 1000 data points. The number of systematics parameters that must be marginalized over will be of order 100, in addition to of order 10 cosmological parameters. While alternative inference methods have been proposed (e.g., Jasche & Wandelt 2013; Lin & Kilbinger 2015; Alsing et al. 2016), substantially more research must be done to ascertain the feasibility of adopting them for future cosmological lensing surveys. Some of these methods come with the substantial benefit that one can avoid the Gaussian likelihood assumption and covariance matrix estimation, typically at the cost of requiring fairly realistic forward simulation techniques for the observable quantities.

One issue that has received significant attention in the weak lensing community is confirmation bias (e.g., Croft & Dailey 2011), the solution for which is to carry out a blinded analysis until null tests are passed and decisions have been made about what range of scales and models to use. All three ongoing weak lensing surveys have adopted a blind cosmological analysis strategy (DES Collaboration et al. 2017; Hildebrandt et al. 2017; Mandelbaum et al. 2017b). The details depend on the survey and the analysis being carried out, and most surveys adopt a combination of the following: (a) applying a randomly selected calibration factor to the shears, (b) having multiple catalogs with randomly selected calibrations and only one person able to reveal which of those catalogs is the true one (zero additional calibration factor), (c) applying random calibration factors to the measured two-point correlations, (d) avoiding plotting the data against predictions from any cosmological model, (e) looking at MCMC results only after subtracting off the best-fitting cosmological parameters, i.e., ∆Ω_m rather than Ω_m itself. Some of these strategies may only work with current datasets (where cosmological parameter changes look sufficiently similar to calibration offsets), but not future data (where changes in the shape of the two-point correlations with cosmology will be evident due to the smaller errors). The common adoption of blind analysis methods is a positive step forward for the field of weak lensing, and current surveys should be quite informative as to which methods are likely to work for future surveys.
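For reference, the standard setup that these alternatives would replace is compact: a Gaussian log-likelihood with a fixed, pre-estimated precision matrix, sampled with MCMC over the joint space of cosmological and nuisance parameters. The sketch below uses hypothetical data and model vectors.

```python
import numpy as np

def log_likelihood(data, model, precision):
    """Gaussian log-likelihood (up to an additive constant). In an
    MCMC, `model` would be recomputed at each proposed point in the
    space of ~10 cosmological plus ~100 nuisance parameters, while
    `precision` is usually held fixed (see Section 3.6)."""
    r = data - model
    return -0.5 * r @ precision @ r
```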
4. DETECTING AND MODELING OBSERVATIONAL SYSTEMATICS
In this section, I discuss the classes of systematic errors, and the tests that can help reveal them using the data itself. For survey papers that use many systematics tests to reveal observational systematics, including 'null tests' (which should be zero in the absence of systematics), see Hildebrandt et al. (2017); Mandelbaum et al. (2017b); Zuntz et al. (2017). Here I will not discuss null tests for PSF modeling errors, which were thoroughly discussed in Sec. 2.1 of this review. Before considering specific null tests, it is worth noting a general rule: null tests are often most informative when carried out after binning samples based on any independent quantity that could be related to a potential systematic.

As mentioned in Sec. 3.3, cosmological B modes are expected to be quite small, and hence a detection of non-zero B-mode power is typically interpreted as arising from systematics. Unfortunately, B modes can have many origins, including PSF modeling errors, PSF correction errors, astrometric errors, and intrinsic alignments. Uncovering which of these is responsible can be difficult, and the correct mitigation scheme to use depends on the origin of the effect (Hildebrandt et al. 2017). Also, many systematics do not generate B modes, and hence a lack of B modes does not guarantee a systematics-free measurement.

Another common diagnostic for additive biases, the star-galaxy correlation function, can be nonzero for a number of reasons. This test involves correlating the shapes of stars with the PSF-corrected galaxy shear estimates. Thus, both PSF modeling errors and insufficient PSF correction of galaxy shapes can contribute. This test has been used in different ways in previous lensing measurements. The zero-lag star-galaxy correlation can be estimated using the PSF model shape for the "star" shape, and averaged within small regions (Heymans et al. 2012; Hildebrandt et al. 2017). Assigning an uncertainty on this quantity generally
requires using mock catalogs that have a realistic level of cosmic variance and PSF model variation across the fields. This test was used in CFHTLenS to eliminate outlier fields that (for undetermined reasons) were too systematics-dominated to use for science. An alternative approach (Mandelbaum et al. 2017b) is to measure the full star-galaxy shape correlation as a function of separation, averaged over the entire survey. In principle, this should be the sum of terms from PSF modeling errors (related to the ρ statistics) and from uncorrected PSF anisotropy. It provides a template for marginalization over additive errors due to these systematics; however, leakage across the star-galaxy boundary can result in this correlation including cosmic shear as well.

Correlation with systematics maps is another method that can enable the detection of observational systematics (Chang et al. 2017; Oguri et al. 2017). This method involves producing lensing mass maps from the shear catalog, and maps corresponding to the values of any quantity that may be considered as a possible cause of weak lensing systematics (e.g., stellar density, PSF FWHM, PSF shape). The cross-correlation between the lensing and systematics maps should be zero in the absence of systematics. Map-level correlations can be a more compact way to detect certain systematics, rather than re-computing all two-point correlation null tests after dividing the sample into bins in seeing and other quantities (as was done in, e.g., Becker et al. 2016).

Calculating average shears with respect to arbitrary locations that should not generate lensing shear is another common null test. For example, the average shapes of galaxies with respect to the CCD coordinate system or the positions of stars should be zero (modulo noise and the contamination of the star sample with galaxies). The caveat in the parenthesis highlights another important point: the origins of deviations from zero in null tests should be carefully considered. Sometimes the source of the signal observed is completely different from what was originally intended.

There are few null tests that are meant to specifically identify residual detector effects. A recent example is the computation of PSF model size residuals as a function of stellar magnitude (Mandelbaum et al. 2017b; Zuntz et al. 2017). Computing the mean shear in CCD coordinates for galaxies binned based on their CCD row/column can also be useful for identifying detector systematics (e.g., Huff et al. 2014; Zuntz et al. 2017). The development of more tests that can identify failure to correct for detector effects or chromatic PSF effects would be useful for the next generation of surveys, which require a greater level of control over those effects.

One useful tool to detect systematics due to nearby galaxies and/or due to failures in the analysis pipeline is to inject fake galaxies into real data and rerun the analysis pipeline (Suchyta et al. 2016; Huang et al. 2017). Comparison of the measured properties of the fake galaxies with the input ones can help diagnose problems with many steps of the analysis pipeline (detection, deblending, photometry, shear estimation). The impact of the injected galaxies on the real ones can also be measured; while we do not know ground truth for the real galaxies, the difference between the originally-measured properties and those measured after injection of the fake objects can be revealing (Samuroff et al.
2017).

Unfortunately, there is no observational test to identify failures in the absolute multiplicative calibration of the ensemble shear signal, which is why the problem described in Section 2.7 has attracted so much attention. However, comparing subsamples of galaxies can reveal relative calibration biases between subsamples, modulo selection bias (Section 2.5), which makes the division into subsamples a potentially problematic null test. In other words, for this to be a useful test, the standard sources of bias such as noise bias, model
bias, and selection bias must be separately calibrated out for the subsamples in order to use this as a test for unrecognized/unknown systematics. One important aspect of shear comparisons (whether between subsamples within a given survey, or between the same sample of galaxies measured in two surveys) is that they should always happen at the level of ensemble shears, not per-galaxy shapes, for the reasons explained in Section 2.7. See Amon et al. (2017) for one example of a recent shear comparison, with methodology that should be applicable elsewhere. This comparison can be done at the level of ensemble shear estimates for matched samples with a common set of photo-z's, to identify shear-related calibration offsets, or at a higher level that includes both photo-z and shear-related calibration offsets.

To identify and remove additive systematics due to physical effects associated with specific exposures or surveys (e.g., atmospheric PSFs, or a detector effect), one possible way forward is to cross-correlate shear maps from different surveys or subsets of exposures within a single survey. In principle, this test could be extended not just to consistency tests (i.e., split the LSST exposures into two sets, and do a separate analysis in each one), but to detect and exclude data with potentially unknown systematic errors (i.e., through a jackknife process that involves sequentially excluding small portions of data and testing their statistical consistency with the rest).

The above tests (and the ρ statistics described earlier in this review) and analysis of survey simulations can be used to identify the presence of systematics that can contaminate cosmological weak lensing analysis. They will also provide templates for residual systematics that can be marginalized over when constraining cosmological parameters. It is important that these templates not only be scale-dependent, but also galaxy property-dependent and/or redshift-dependent, since most shear systematics will depend on the galaxy properties at some level, and hence on the redshift. Indeed, template marginalization is a popular method for removing theoretical systematics, but unlike for observational systematics, there are fewer null tests that can be done for theoretical systematics (with typical tests being eliminating data in regions where the systematics should be worse, and testing for consistency of results).
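As one concrete example of the null tests described above, here is a sketch of a mean-shear test binned by CCD column position. The arrays and binning are hypothetical, and the error bars are naive weighted standard errors that ignore correlations; a real analysis would assign uncertainties from mocks, as noted above.

```python
import numpy as np

def mean_shear_vs_ccd(ccd_col, g, w, edges):
    """Weighted mean shear component in bins of CCD column; consistency
    with zero in every bin is the null hypothesis (no detector or
    uncorrected PSF systematics aligned with the CCD grid)."""
    idx = np.digitize(ccd_col, edges) - 1
    nbin = len(edges) - 1
    mean = np.full(nbin, np.nan)
    err = np.full(nbin, np.nan)
    for b in range(nbin):
        sel = idx == b
        if sel.sum() < 2:
            continue
        norm = w[sel].sum()
        mean[b] = np.sum(w[sel] * g[sel]) / norm
        # Naive weighted standard error of the mean.
        err[b] = np.sqrt(np.sum(w[sel]**2 * (g[sel] - mean[b])**2)) / norm
    return mean, err
```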
5. SUMMARY
The high-level goal of this review is to answer the following question: "What does the weak lensing community need to do in order to get to the point where surprising claims that are made about dark energy with LSST, Euclid, or WFIRST will be believed?"

The fact that LSST, Euclid, and WFIRST are designed in ways that result in different dominant systematics is an important aspect of the landscape of the 2020s. For example, Euclid's broad RIZ filter means that it is far more susceptible to chromatic effects (Section 2.1) than LSST or WFIRST. The WFIRST survey design is more conservative than Euclid in terms of the number of samples at each position, making it less likely to suffer from undersampling (due to, e.g., cosmic rays that result in an exposure being excluded). WFIRST's NIR detectors will have different pixel-level systematics than the Euclid or LSST CCDs, and greater calibration challenges due to the CMOS architecture. LSST will suffer from blending far more than the two space-based surveys. The fact that Euclid is shallower than WFIRST or LSST means that it can more easily gather representative spectroscopic samples for photometric redshift training and calibration. However, the fact that WFIRST will be completely within the LSST footprint (and will use it for photometric redshift determination) results in greater survey homogeneity than Euclid, which will rely
on several ground-based datasets for photometric redshifts. Relying on the various survey cross-comparisons, plus the fact that they suffer from systematics at different levels, will be highly scientifically valuable, and the combination of the surveys has the potential to be even more powerful than one would expect by naively combining statistical errors (Jain et al. 2015, Rhodes et al. in prep.).

Below are a number of key take-aways synthesizing the material in the sections above:

1. There are low-level issues such as detector systematics, chromatic effects, astrometry, and survey geometry representation for which work is clearly needed to get where we need to be for surveys in the 2020s, but there are promising avenues for investigation.

2. An area where genuinely new ideas are needed is blending systematics, both in how to quantify and mitigate the impact of low-level blends on shear and photo-z, and the impact of unrecognized blends. The field has only recently started to confront this issue, and more work is needed.

3. Several issues fall into the category of "promising ideas exist but more exploration is needed to determine which will work and how exactly to use them at the level of precision needed for future surveys": calibration of N(z) for photometric redshift samples, shear calibration, optimal image combination, PSF modeling, mitigation of theory systematics, and covariance matrix estimation. Serious work must be done by the community, but all of these issues are more advanced than blending systematics. The calibration of N(z) for photometric redshift samples has gotten less attention than shear calibration until recently, and therefore there is some catching up to do in this area. Indeed, the weak lensing community's unfortunate habit of outsourcing photo-z production and calibration without considering the cross-talk between shear-related selection effects and photo-z's must end: we must interface with the photo-z community at an earlier phase of the analysis.

4. Decisions to be made about image combination must factor in the connection between image combination, PSF modeling, shear estimation, and deblending.

5. Choice of data compression methods will have an impact on the best way to handle covariance matrix estimation and cosmological parameter inference.

6. The field views shear estimation quite differently from how it did from the mid-1990s until around 2012: it is now well-understood that estimation of per-galaxy shapes will not result in an unbiased estimate of the ensemble shear, so the focus is on either calibration strategies or methods of inferring shear without per-galaxy shapes. Several highly promising options currently exist.

7. Regarding the overall cosmological inference problem, more work is needed on blinding strategies for weak lensing analysis by upcoming surveys. In addition, there is still room to draw the field away from the standard method of likelihood analysis (see alternatives discussed in Sec. 3.7), but it will take substantial development for those methods to be viable.

8. Having (at least) two methods with different assumptions for any complex analysis step is highly valuable. This was highlighted in the DES year 1 cosmology analysis (DES Collaboration et al. 2017). Even having two independent pipelines that share assumptions can be useful for identifying bugs, hidden assumptions, and numerical issues.
Pipeline redundancy will likely remain an important element of cosmology analysis in the 2020s, and hence it is really valuable that for most of the key issues discussed here (e.g., N(z) and shear calibration) there are multiple viable approaches.
9. Null tests are valuable, but it is important to understand what really is a null test, and which "null tests" could be defeated by faulty assumptions.

10. In many of the above sections on theoretical systematics, papers that are referenced show methods for marginalizing over that systematic. Most of those papers considered individual systematics in isolation. The full problem with all of these theoretical and observational systematics is likely more complex, with degeneracies between some systematics. It will be important for the field to confront the multiple-systematics-mitigation problem sooner rather than later, in order to identify obstacles early on and develop strategies to overcome them.

The issue of multiple independent approaches raises the question of how independent the different surveys should try to be. For example, if they all rely on the representativeness of a given spectroscopic training sample for photo-z, systematic uncertainties could become correlated across the surveys. Cross-survey comparison, when carried out properly, can be a powerful tool for identifying inconsistencies that may be due to systematic errors. But it is important to consider the ingredients of the analysis (e.g., commonality in algorithms for shear calibration or photo-z calibration, same vs. different implementations of common algorithms) before deciding what parts truly are independent. In that sense, comparison against CMB lensing is "safer" as a cross-check, because it is difficult to correlate systematics between galaxy and CMB lensing. It is clear, however, that a broad set of internal cross-checks (Section 4) and external ones will be necessary for the surveys of the 2020s to produce credible weak lensing cosmology results. This work should begin before the 2020s: existing surveys (KiDS, HSC, and DES) will play a crucial role in this path towards believable precision cosmology with the surveys of the 2020s. The community must demonstrate an ability to self-consistently constrain cosmology with these datasets.

While Figure 2 (left panel) provided an initial motivation for why weak lensing is so valuable as a cosmological probe, the sections above may raise the question of why try to do it at all, given the complexity of the problems involved. The community has made tremendous strides in how to address the key problems facing the field, and most outstanding issues now have multiple paths to a resolution. To distinguish between general dark energy and modified gravity models as the cause of the observed accelerated expansion rate of the Universe, we generally require a probe of the distance-redshift relation (e.g., baryon acoustic oscillations, supernovae, time-delay strong lenses) and structure growth (weak lensing, galaxy clustering, redshift-space distortions, galaxy cluster counts). While all come with challenges, weak lensing is currently the most promising of the "structure growth" probes. Use of galaxy clustering or redshift-space distortions alone requires highly precise determination of the galaxy bias or marginalization over its value (which weakens constraints). In contrast, weak lensing allows direct determination of the galaxy bias from the shear-shear, galaxy-shear, and galaxy-galaxy correlations.
Competitive galaxy cluster constraints on cosmology themselves require weak lensing measurements, with special care needed for the systematics and theoretical uncertainties arising in crowded cluster regions. All probes of structure growth besides shear-shear correlations suffer more severely from baryonic effects, since weak lensing signals are dominated by collisionless matter in the translinear regime. In short, these factors, plus the tremendous development in the field of weak lensing over the past decade, lead to the conclusion that weak lensing provides the cosmology community's best hope for competitive and believable constraints on cosmic structure growth, and hence on dark energy.
ACKNOWLEDGMENTS
I would like to thank Mike Jarvis, Joe Zuntz, and Daniel Gruen for providing helpful feedback on the structure of this review based on early outlines. I also thank Gary Bernstein, Jim Bosch, Scott Dodelson, Mike Jarvis, Benjamin Joachimi, Tod Lauer, Peter Melchior, Josh Meyers, Jeff Newman, Sam Schmidt, Michael Schneider, Chaz Shapiro, Erin Sheldon, and Michael Troxel for reading portions of this review and providing thoughtful feedback on relatively short notice.
LITERATURE CITED
Abazajian K, Dodelson S. 2003. Physical Review Letters preprint (arXiv:1610.02743)
Abazajian KN, Calabrese E, Cooray A, De Bernardis F, Dodelson S, et al. 2011. Astroparticle Physics
ApJS preprint (arXiv:astro-ph/0609591)
Alsing J, Heavens A, Jaffe AH, Kiessling A, Wandelt B, Hoffmann T. 2016. MNRAS
MNRAS preprint (arXiv:1707.04105)
Antilogus P, Astier P, Doherty P, Guyonnet A, Regnault N. 2014. Journal of Instrumentation
MNRAS preprint (arXiv:1701.08748)
Becker MR. 2013. MNRAS
MNRAS preprint (arXiv:1706.09928)
Bernstein GM, Armstrong R. 2014. MNRAS
Automated Morphometry with SExtractor and PSFEx. In Astronomical Data Analysis Software and Systems XX, eds. IN Evans, A Accomazzi, DJ Mink, AH Rots, vol. 442 of Astronomical Society of the Pacific Conference Series
Blazek J, MacCrann N, Troxel MA, Fang X. 2017. ArXiv e-prints
Blazek J, Vlah Z, Seljak U. 2015. J. Cosmology Astropart. Phys.
MNRAS preprint (arXiv:1705.06766)
Bridle S, Balan ST, Bethge M, Gentile M, Harmeling S, et al. 2010. MNRAS
Annals of Applied Statistics preprint (arXiv:1706.01542)
Catelan P, Kamionkowski M, Blandford RD. 2001. MNRAS
Chisari NE, Dunkley J, Miller L, Allison R. 2015. MNRAS
MNRAS preprint (arXiv:1112.3108)
Croft RAC, Metzler CA. 2000. ApJ
MNRAS preprint (arXiv:1311.2338)
Dawson WA, Schneider MD, Tyson JA, Jee MJ. 2016. ApJ
New Worlds, New Horizons in Astronomy and Astrophysics. The National Academies Press
DES Collaboration, Abbott TMC, Abdalla FB, Alarcon A, Aleksić J, et al. 2017. preprint (arXiv:1708.01530)
Dodelson S. 2017. Gravitational Lensing. Cambridge University Press
Dodelson S, Schneider MD. 2013. Phys. Rev. D
A&A preprint (arXiv:1708.06085)
Fenech Conti I, Herbonnet R, Hoekstra H, Merten J, Miller L, Viola M. 2017. MNRAS
PASP preprint (arXiv:1703.07786)
Fruchter AS. 2011. PASP
A&A preprint (arXiv:1707.06640)
Górski KM, Banday AJ, Hivon E, Wandelt BD. 2002. HEALPix: a Framework for High Resolution, Fast Analysis on the Sphere. In Astronomical Data Analysis Software and Systems XI, eds. DA Bohlender, D Durand, TH Handley, vol. 281 of Astronomical Society of the Pacific Conference Series
Gruen D, Brimioulle F. 2017. MNRAS
MNRAS preprint (arXiv:1709.03600)
Gurvich A, Mandelbaum R. 2016. MNRAS
MNRAS preprint (arXiv:1708.01532)
Hu W. 2002. Phys. Rev. D preprint (arXiv:1705.01599)
Hudson MJ, Gillis BR, Coupon J, Hildebrandt H, Erben T, et al. 2015. MNRAS preprint (arXiv:1702.02600)
Huff EM, Hirata CM, Mandelbaum R, Schlegel D, Seljak U, Lupton RH. 2014. MNRAS
J. Cosmology Astropart. Phys. preprint (arXiv:1501.07897)
Jarvis M, Sheldon E, Zuntz J, Kacprzak T, Bridle SL, et al. 2016. MNRAS
Reports on Progress in Physics preprint (arXiv:1702.05301)
Kilbinger M, Schneider P, Eifler T. 2006. A&A
MNRAS preprint (arXiv:1706.09359)
Krause E, Hirata CM. 2010. A&A
PASP preprint (arXiv:1110.3193)
Leauthaud A, Massey R, Kneib J, Rhodes J, Johnston DE, et al. 2007. ApJS
A&A preprint (arXiv:0912.0201)
LSST Science Collaboration, Marshall P, Anguita T, Bianco FB, Bellm EC, et al. 2017. ArXiv e-prints
Lu T, Zhang J, Dong F, Li Y, Liu D, et al. 2017. AJ
MNRAS preprint (arXiv:1710.00885)
Mandelbaum R, Miyatake H, Hamana T, Oguri M, Simet M, et al. 2017b. preprint (arXiv:1705.06745)
Mandelbaum R, Rowe B, Armstrong R, Bard D, Bertin E, et al. 2015. MNRAS
ApJ preprint (arXiv:1601.02693)
McQuinn M, White M. 2013. MNRAS
MNRAS preprint (arXiv:1706.00427)
Melchior P, Böhnert A, Lombardi M, Bartelmann M. 2010. A&A
MNRAS preprint (arXiv:1303.4722)
Meyers JE, Burchat PR. 2015a. Journal of Instrumentation
Astroparticle Physics preprint (arXiv:1705.06792)
Padmanabhan N, White M, Zhou HH, O'Connell R. 2016. MNRAS
A framework for modeling the detailed optical response of thick, multiple segment, large format sensors for precision astronomy applications. In Modeling, Systems Engineering, and Project Management for Astronomy VI, vol. 9150 of Proc. SPIE
Refregier A, Kacprzak T, Amara A, Bridle S, Rowe B. 2012. MNRAS
Wavefront sensing and the active optics system of the dark energy camera. In Ground-based and Airborne Telescopes V, vol. 9145 of Proc. SPIE
Rowe B. 2010. MNRAS
Astronomy and Computing preprint (arXiv:1708.01534)
Sánchez C, Carrasco Kind M, Lin H, Miquel R, Abdalla FB, et al. 2014. MNRAS
Chapter 10: Web-based Tools - STOMP Footprint Service. In Astronomical Society of the Pacific Conference Series, eds. MJ Graham, MJ Fitzpatrick, TA McGlynn, vol. 382 of Astronomical Society of the Pacific Conference Series
Sellentin E, Heavens AF. 2017. MNRAS
Long-wavelength scattered-light halos in ASC CCDs. In Optical Astronomical Instrumentation, ed. S D'Odorico, vol. 3355 of Proc. SPIE
Soo JYH, Moraes B, Joachimi B, Hartley W, Lahav O, et al. 2017. preprint (arXiv:1707.03169)
Spergel D, Gehrels N, Baltay C, Bennett D, Breckinridge J, et al. 2015. preprint (arXiv:1503.03757)
Stetson PB. 1987. PASP
ApJ preprint (arXiv:1704.05988)
Taylor A, Joachimi B, Kitching T. 2013. MNRAS
Phys. Rep. preprint (arXiv:1708.01538)
Tyson JA, Roat C, Bosch J, Wittman D. 2008. LSST and the Dark Sector: Image Processing Challenges. In Astronomical Data Analysis Software and Systems XVII, eds. RW Argyle, PS Bunclark, JR Lewis, vol. 394 of Astronomical Society of the Pacific Conference Series
Tyson JA, Sasian J, Gilmore K, Bradshaw A, Claver C, et al. 2014. LSST optical beam simulator. In High Energy, Optical, and Infrared Detectors for Astronomy VI, vol. 9154 of Proc. SPIE
Vallinotto A. 2012. ApJ
A&A preprint (arXiv:1706.05004)
Van Waerbeke L, Mellier Y, Erben T, Cuillandre JC, Bernardeau F, et al. 2000. A&A
Comparison of LSST and DECam wavefront recovery algorithms. In Ground-based and Airborne Telescopes VI, vol. 9906 of Proc. SPIE
Yao J, Ishak M, Lin W, Troxel MA. 2017. preprint (arXiv:1707.01072)
Yoo J, Tinker JL, Weinberg DH, Zheng Z, Katz N, Davé R. 2006. ApJ
PASP preprint (arXiv:1708.01533)