Statistical and systematic uncertainties in pixel-based source reconstruction algorithms for gravitational lensing
Mon. Not. R. Astron. Soc. 000, 000–000 (0000) Printed 10 October 2018 (MN LaTeX style file v2.2)
Amitpal S. Tagore and Charles R. Keeton
Department of Physics & Astronomy, Rutgers University, 136 Frelinghuysen Road, Piscataway, NJ 08854, USA
10 October 2018
ABSTRACT
Gravitational lens modeling of spatially resolved sources is a challenging inverse problem with many observational constraints and model parameters. We examine established pixel-based source reconstruction algorithms for de-lensing the source and constraining lens model parameters. Using test data for four canonical lens configurations, we explore statistical and systematic uncertainties associated with gridding, source regularisation, interpolation errors, noise, and telescope pointing. Specifically, we compare two gridding schemes in the source plane: a fully adaptive grid that follows the lens mapping but is irregular, and an adaptive Cartesian grid. We also consider regularisation schemes that minimise derivatives of the source (using two finite difference methods) and introduce a scheme that minimises deviations from an analytic source profile. Careful choice of gridding and regularisation can reduce “discreteness noise” in the χ² surface that is inherent in the pixel-based methodology. With a gridded source, some degree of interpolation is unavoidable, and errors due to interpolation need to be taken into account (especially for high signal-to-noise data). Different realisations of the noise and telescope pointing lead to slightly different values for lens model parameters, and the scatter between different “observations” can be comparable to or larger than the model uncertainties themselves. The same effects create scatter in the lensing magnification at the level of a few percent for a peak signal-to-noise ratio of 10, which decreases as the data quality improves.

Key words: gravitational lensing: strong – methods: numerical
1 INTRODUCTION

The gravitational deflection of light produces a variety of observable effects that can be used to study the mass distributions of deflectors (e.g., galaxies and clusters of galaxies) and the structure of light sources (e.g., distant quasars and star-forming galaxies), and to constrain cosmological parameters (see the review by Schneider, Kochanek & Wambsganss 2006). In this paper, we focus on strong gravitational lensing, in which light bending creates multiple images of the source. If the source is compact and unresolved, the images and source are each characterised by just three numbers: two position coordinates and a flux. If the source is extended, the resolved images provide many constraints, but the structure of the source must be included in the modeling. One approach is to assume the source has elliptical symmetry and analyse isophotal shapes (e.g., Blandford, Surpi & Kundić 2001) or peak surface brightness curves (e.g., Kochanek, Keeton & McLeod 2001) in Einstein rings. A more general approach is to reconstruct the source on a grid in order to permit complex structure and reproduce the data pixel by pixel. Pixel-based source reconstruction (PBSR) algorithms take full advantage of the information in the lensed images, but the large numbers of constraints (image pixels) and free parameters (source pixels) demand advanced techniques and more computational effort.

The history of extended image lens modeling is rich. Early implementations of PBSR algorithms (e.g., Wallington, Kochanek & Narayan 1996; Koopmans 2005) used a two-loop method in which an outer loop varied the lens model parameters, while an inner loop varied source parameters to find the best fit given a lens model. The lens was described parametrically, typically using standard galaxy and halo mass profiles. The source, by contrast, was constructed on a Cartesian grid, and a penalty function was used to disfavor source models that seemed too unphysical.
Varying all of the source pixels independently was a costly step. Warren & Dye (2003) simplified the inner loop by showing that the lensing equation can be written as a matrix equation, allowing the optimal source to be found in a single, analytic step (see §). Dye & Warren (2005) and Vegetti & Koopmans (2009) introduced irregular source grids while keeping the inner loop linear (see §).

The CLEAN algorithm (Högbom 1974) fits the “dirty” map of observed surface brightnesses with point sources. It finds the brightest region in the map and subtracts a point source convolved with the instrumental beam, and then iterates until a stopping criterion is met. LensClean (Kochanek & Narayan 1992) adds a step in which the point source is gravitationally lensed before the images are subtracted, allowing the lens model and source to be fit simultaneously.

In this paper, we present a new software package called pixsrc that performs PBSR in conjunction with the established lensmodel software (Keeton 2001) for exploring the lens model parameter space. We present the methodology behind pixsrc and then discuss issues that arise during the lens modeling process. In particular, we investigate statistical uncertainties and systematic biases inherent in PBSR methods by analysing representative galaxy-galaxy strong lensing events. We examine the effects of noise and telescope pointing on the lens model analysis, as well as the effects of different choices of gridding and priors.
For a given dataset, there may be lens models that fit the data well but require a source that seems unrealistic. (A model with no mass can fit the data perfectly if the source looks exactly like the image.) There may also be models for which the source fits the noise in addition to the lensed data. Using Bayesian inference, priors can be used to reject models that are unphysical or overfit the noise. This section reviews the formal framework for PBSR, which has been discussed in detail by Suyu et al. (2006) and Vegetti & Koopmans (2009). We reproduce only the key aspects here.
In the absence of dust or other attenuation, lensing conserves surface brightness. The mapping between the source plane and image plane can therefore be written as
\[
\mathbf{d} = \mathbf{L}\mathbf{s} + \mathbf{n}, \tag{1}
\]
where L is a linear “lensing operator” that acts on surface brightness values. This operator can encode not only the gravitational deflections of the lens but also effects from the atmosphere and telescope. For example, if G characterises the lens while B is a “blurring operator” that characterises the point spread function (PSF) of the observations, we can define L ≡ BG to capture the combined effects. Here s and d are vectors containing the surface brightness values in the source plane and image plane, respectively, and n is the noise present in the data. If the source and data are two-dimensional images with surface brightness values specified on a Cartesian grid, the one-dimensional vectors s and d can be constructed by column- or row-stacking the two-dimensional images. If the source grid is irregular, the structure of s can be more complicated, but the formal framework still applies. For reference, we note that the numbers of pixels in the source and image plane maps are N_s and N_d, respectively.

If the noise is Gaussian, we can write the likelihood of observing data d given a lensing operator L and source s as
\[
P(\mathbf{d}\,|\,\mathbf{L},\mathbf{s}) \propto \exp\left(-E_d(\mathbf{d}\,|\,\mathbf{L},\mathbf{s})\right), \tag{2}
\]
where
\[
E_d(\mathbf{d}\,|\,\mathbf{L},\mathbf{s}) = \tfrac{1}{2}\chi^2(\mathbf{s}) = \tfrac{1}{2}\left(\mathbf{L}\mathbf{s}-\mathbf{d}\right)^{\top}\mathbf{C}_d^{-1}\left(\mathbf{L}\mathbf{s}-\mathbf{d}\right), \tag{3}
\]
and C_d is the symmetric noise covariance matrix, which contains pixel-to-pixel noise correlations. For the case of uniform, pixel-independent noise, C_d is diagonal with entries equal to σ², where σ is the standard deviation of the noise.

Suyu et al. (2006) define the most likely solution, s_ml, as the source model that maximises the likelihood and thus minimises E_d. Setting ∇E_d(s) = 0, we find that s_ml satisfies
\[
\mathbf{F}\mathbf{s} = \mathbf{f}, \tag{4}
\]
where F = L^T C_d^{-1} L and f = L^T C_d^{-1} d.
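As a toy illustration of Eqs. 1–3, the sketch below builds L = BG, generates mock data, and evaluates χ². The operator sizes and structure are invented for illustration; they are crude stand-ins for ray-tracing and PSF convolution, not the actual pixsrc machinery.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes: 6 source pixels, 12 image pixels.
n_s, n_d = 6, 12

# G: crude stand-in for the lensing operator (each image pixel
# reads the surface brightness of one source pixel).
G = np.zeros((n_d, n_s))
G[np.arange(n_d), rng.integers(0, n_s, size=n_d)] = 1.0

# B: crude stand-in for PSF blurring (mix each pixel with a neighbour).
B = np.eye(n_d)
for i in range(n_d - 1):
    B[i, i], B[i, i + 1] = 0.8, 0.2

L = B @ G                              # combined operator, L = BG
s_true = rng.uniform(1.0, 5.0, n_s)    # source-plane surface brightness

sigma = 0.05                           # uniform pixel noise rms
n = rng.normal(0.0, sigma, n_d)
d = L @ s_true + n                     # Eq. 1: d = Ls + n

# Eq. 3 with C_d = sigma^2 I; evaluated at the true source,
# chi^2 reduces to the summed squared noise in units of sigma.
C_inv = np.eye(n_d) / sigma**2
r = L @ s_true - d
chi2 = r @ C_inv @ r
```

Evaluated at the true source, the residual is pure noise, so χ² here is simply the sum of (n/σ)² over image pixels.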
Because F is square and invertible by construction, s_ml is given by
\[
\mathbf{s}_{\rm ml} = \mathbf{F}^{-1}\mathbf{f}. \tag{5}
\]
If the left-inverse, L^-1, of L exists, then Eq. 5 reduces to what one might naïvely expect:
\[
\mathbf{s}_{\rm ml} = \mathbf{L}^{-1}\mathbf{d}. \tag{6}
\]
(We adopt the following conventions: one-dimensional vectors are denoted by bold lower-case letters, two-dimensional matrices are denoted by bold capital letters, and scalars are unbolded. F^-1 will fail to exist if there are source pixels that cannot be constrained by the image pixels, i.e., there are too many source pixels overall, or source pixels that do not map to regions of the image plane with useful data, or if there are image pixels that lack corresponding source pixels. Those situations can generally be avoided with reasonable choices of grids. It is conceivable that certain grid configurations could also create problems for F^-1, but those should be rare. L^-1 will exist and be unique if L is square and non-singular. If L is rectangular, L^-1 will exist if there are more image pixels than source pixels and L has full column rank. These conditions may not be satisfied if two or more image pixels map to the same point (within machine precision), or if other similar coincidences occur.)

Unfortunately, s_ml will fit the noise in the data in addition to the lensed images. There are several different ways to avoid such overfitting. Maximum entropy methods (MEMs; Wallington, Narayan & Kochanek 1994) favor sources whose pixel values follow broad distributions expected from information theory, as opposed to sources with some pixels that are very different from the rest. MEMs also prohibit negative surface brightness values. They do not constrain surface brightness variations between adjacent pixels, however, and can lead to large fluctuations over small scales. To favor sources that are smooth, we might introduce a function that penalises large values of the first or second derivative (the particular choice depends on the data and underlying source; see Brewer & Lewis 2006; Suyu et al. 2006). If the penalty function is quadratic in the source surface brightness, the source that maximises the likelihood while minimising the penalty is still given by a linear equation.

Suppose, for example, that we want to introduce a function E_s(s) that penalises large surface brightness gradients. We can define a derivative operator H that acts on a source vector s to produce a vector Hs containing the gradient of the surface brightness at each pixel. Then we put
\[
E_s(\mathbf{s}) = \tfrac{1}{2}\left(\mathbf{H}\mathbf{s}\right)^{\top}\mathbf{H}\mathbf{s} = \tfrac{1}{2}\mathbf{s}^{\top}\left(\mathbf{H}^{\top}\mathbf{H}\right)\mathbf{s} = \tfrac{1}{2}\mathbf{s}^{\top}\mathbf{R}\mathbf{s}, \tag{7}
\]
where R ≡ H^T H. In other words, when sandwiched between two source vectors, R returns the square of the gradient summed over source pixels. A similar construction can return the sum of the squares of the curvature (see §). The posterior for the source is then
\[
P(\mathbf{s}\,|\,\mathbf{L},\mathbf{R},\mathbf{d},\lambda) \propto \exp\left(-M(\mathbf{s})\right), \tag{8}
\]
where
\[
M(\mathbf{s}) \equiv E_d(\mathbf{s}) + \lambda E_s(\mathbf{s}), \tag{9}
\]
and λ is a dimensionless parameter that determines which term in Eq. 9 dominates. When the “regularisation strength” λ is small, the Bayesian framework will primarily fit the data; when λ is large, the framework will enforce strong priors on the source.

Suyu et al. (2006) define the most probable solution s_mp as the source model that maximises the posterior and thus minimises M(s). To find this model, we Taylor expand E_d to second order about its minimum,
\[
E_d(\mathbf{s}) = E_d(\mathbf{s}_{\rm ml}) + \tfrac{1}{2}\left(\mathbf{s}-\mathbf{s}_{\rm ml}\right)^{\top}\mathbf{F}\left(\mathbf{s}-\mathbf{s}_{\rm ml}\right). \tag{10}
\]
(The matrix F that appears here is the Hessian of E_d, but from Eq. 3 this is the same as F defined in Eq. 4.) Setting ∇M(s) = 0, we find that s_mp satisfies
\[
\mathbf{A}\mathbf{s} = \mathbf{F}\mathbf{s}_{\rm ml}, \tag{11}
\]
where A = F + λR is the Hessian of M from Eq. 9. Because A is square and invertible by construction, s_mp is given by
\[
\mathbf{s}_{\rm mp} = \mathbf{A}^{-1}\mathbf{F}\mathbf{s}_{\rm ml} = \mathbf{A}^{-1}\mathbf{f}. \tag{12}
\]
It remains to determine the regularisation strength λ seen in Eq. 9.
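To make the chain from Eq. 4 to Eq. 12 concrete, the following numpy sketch (the toy operator, sizes, and λ value are invented for illustration) computes the most likely source s_ml and the gradient-regularised most probable source s_mp:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy problem: 8 source pixels, 16 image pixels; each image
# pixel averages two adjacent source pixels (stand-in for lensing + PSF).
n_s, n_d = 8, 16
L = np.zeros((n_d, n_s))
for i in range(n_d):
    j = i % n_s
    L[i, j] += 0.5
    L[i, min(j + 1, n_s - 1)] += 0.5

s_true = np.sin(np.linspace(0.0, np.pi, n_s))       # smooth source
sigma = 0.05
d = L @ s_true + rng.normal(0.0, sigma, n_d)

C_inv = np.eye(n_d) / sigma**2
F = L.T @ C_inv @ L                                 # Eq. 4
f = L.T @ C_inv @ d

s_ml = np.linalg.solve(F, f)                        # Eq. 5: fits noise too

H = np.diff(np.eye(n_s), axis=0)                    # first-difference operator
R = H.T @ H                                         # Eq. 7: R = H^T H

lam = 10.0                                          # regularisation strength
A = F + lam * R                                     # Hessian of M (Eq. 11)
s_mp = np.linalg.solve(A, f)                        # Eq. 12

# Achieved roughness penalty for each solution.
rough_ml = float((H @ s_ml) @ (H @ s_ml))
rough_mp = float((H @ s_mp) @ (H @ s_mp))
```

Because s_mp minimises E_d + λE_s while s_ml minimises E_d alone, the achieved penalty satisfies E_s(s_mp) ≤ E_s(s_ml) for any λ > 0; the last two lines let one check this.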
In the Bayesian framework, the optimal value of λ is found by maximising (see Suyu et al. 2006 for a full discussion)
\[
P(\lambda\,|\,\mathbf{d},\mathbf{L},\mathbf{R}) \propto P(\mathbf{d}\,|\,\mathbf{L},\lambda,\mathbf{R})\,P(\lambda). \tag{13}
\]
We assume a uniform logarithmic prior, P(λ) ∝ λ^-1, because we do not know the scale of λ a priori. The optimal regularisation strength, λ̂, can then be found numerically.

Formally, s_mp is a biased estimator of the true source surface brightness s_true. Suyu et al. (2006) show that averaging over many realisations yields
\[
\langle \mathbf{s}_{\rm mp} \rangle = \mathbf{A}^{-1}\mathbf{F}\,\mathbf{s}_{\rm true}, \tag{14}
\]
which differs from s_true to the extent that A^-1 = (F + λR)^-1 differs from F^-1. The simulations presented in § let us assess the practical impact of this bias.

Once we solve for the source at a fixed lens model, we must rank different models by evaluating the posterior probability
\[
P(\mathbf{L},\mathbf{R}\,|\,\mathbf{d}) \propto P(\mathbf{d}\,|\,\mathbf{L},\mathbf{R}) \times \text{priors on } \mathbf{L} \text{ and } \mathbf{R}. \tag{15}
\]
If the priors on the lens models and regularisation scheme are flat, then we can just evaluate the Bayesian evidence P(d | L, R) = ∫ P(d | L, λ, R) P(λ) dλ. (Strictly speaking, this is not the full evidence because the lens model parameters are not marginalised, but the terminology is standard.) Suyu et al. (2006) suggest that the distribution for λ can be expected to have a sharp peak, so instead of computing the full integral we can just evaluate the integrand at its peak.

Examining Eqs. 2 and 15, we can infer that
\[
-2\ln E = \chi^2 + V, \tag{16}
\]
where E is shorthand for the evidence and V is a constant that depends on the available prior volume of the parameter space. Thus, we will use χ² and −2 ln E interchangeably. For a more detailed discussion of the connection between evidence and χ², see Jenkins & Peacock (2011).

(The Hessian of a function is a matrix that contains the second-order partial derivatives of the function. In this case, the derivatives are taken with respect to the source vector. For example, the (i, j) entry of F would hold the second-order derivative of E_d with respect to the i-th and j-th source pixels.)
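One way to carry out the maximisation in Eq. 13 numerically is with the closed-form evidence of a Gaussian linear model (the form used by Suyu et al. 2006, written here up to additive constants). The sketch below is ours: the toy operator is invented, and a tiny ridge is added to R so that the Gaussian prior is proper.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical toy linear model (same style as the earlier sketches).
n_s, n_d = 8, 16
L = np.zeros((n_d, n_s))
for i in range(n_d):
    j = i % n_s
    L[i, j] += 0.5
    L[i, min(j + 1, n_s - 1)] += 0.5

s_true = np.sin(np.linspace(0.0, np.pi, n_s))
sigma = 0.05
d = L @ s_true + rng.normal(0.0, sigma, n_d)

C_inv = np.eye(n_d) / sigma**2
F = L.T @ C_inv @ L
f = L.T @ C_inv @ d

H = np.diff(np.eye(n_s), axis=0)
R = H.T @ H + 1e-8 * np.eye(n_s)   # tiny ridge keeps the prior proper

def log_evidence(lam):
    """log P(d | L, lam, R) for a Gaussian linear model."""
    A = F + lam * R
    s_mp = np.linalg.solve(A, f)
    r = L @ s_mp - d
    M = 0.5 * r @ C_inv @ r + 0.5 * lam * s_mp @ R @ s_mp   # Eq. 9 at s_mp
    return (-M
            + 0.5 * np.linalg.slogdet(lam * R)[1]   # prior normalisation
            - 0.5 * np.linalg.slogdet(A)[1]         # Occam (volume) factor
            + 0.5 * n_d * np.log(C_inv[0, 0])
            - 0.5 * n_d * np.log(2.0 * np.pi))

# Uniform logarithmic prior P(lam) ∝ 1/lam (Eq. 13): maximise on a log grid.
lams = np.logspace(-4.0, 6.0, 201)
log_post = np.array([log_evidence(l) - np.log(l) for l in lams])
lam_hat = lams[np.argmax(log_post)]
```

A useful consistency check is the identity M(s_mp) = ½ d^T C^-1 d − ½ f^T A^-1 f, which follows from completing the square in Eq. 9.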
Figure 1.
Test data in four canonical configurations. Clockwise from top left: cusp, fold, cross, and 2-image lens configurations. For each panel, the diamond-shaped curve (the caustic) and object (source galaxy) inside the red square are in the source plane, while the elliptical curve (the critical curve) and the other features (the arcs) outside the red square are in the image plane. The lensing galaxy used to create the data is the same in all cases: an SIE with Einstein radius of 3″, ellipticity of 0.3, and position angle of the semi-major axis 60° east of north. The colour scale is linear and identical in all panels. The source galaxies used to create the data share the same size and luminosity profile; only the positions and orientations differ.

To explore possible uncertainties and biases in PBSR algorithms, we construct test data using a simple but realistic lens and source. The lens is a singular isothermal ellipsoid (SIE), which is a popular choice for modeling elliptical galaxies. Although the dark and luminous mass profiles are not simple power laws individually, the total density profile appears to be close to isothermal (Kronawitter et al. 2000; Koopmans et al. 2009; Treu 2010). The SIE is placed at the origin and fixed with an Einstein radius of 3″, ellipticity of 0.3, and position angle of 60° east of north. The source luminosity profile is an elliptical Gaussian with a half-light radius of 0.″ and peak surface brightness of 5 (in arbitrary units). The position and orientation of the source are varied to create four canonical lens configurations that let us assess whether uncertainties in PBSR algorithms are sensitive to the image morphology. Fig. 1 shows a 2-image configuration along with three configurations that nominally have four images: a source near the center of the caustic produces a “cross” configuration with four distinct images; a source just inside the caustic curve produces two short arcs and a long arc from two merging images (a “fold” configuration); and a source inside a cusp in the caustic produces one long arc from three merging images along with an isolated image on the other side (a “cusp” configuration). In the following sections we vary the amount of noise in the mock data (Fig. 2) and the resolution (i.e., the pixel scale; Fig. 3).

Figure 2.
Test data for the cusp configuration shown with varying noise levels. Peak S/N clockwise from top left: 1, 10, 25, 500. The pixel scale is 0.1 arcsec/pixel. The colour scale is linear and consistent except for the S/N = 1 case.
Figure 3.
Test data for the cusp configuration shown with varying pixel scales. Only a subsection of the long arc is shown so that the differences in resolution are visible. Clockwise from top left: 0.1, 0.05, 0.03, and 0.02 arcsec/pixel.
Some of the practical challenges in PBSR are inherent to the algorithm itself. We have already mentioned the need for regularisation. Dealing with gridded data makes some degree of interpolation unavoidable. Also, different parts of the image plane probe different spatial scales in the source plane, depending on the lensing magnification. Using an adaptive source plane grid helps take full advantage of the information contained in a lensed image, but leads to challenges with interpolating and calculating derivatives on an irregular grid. In this section we examine how these issues affect the source reconstruction and lens model ranking.

In PBSR, the image and source grids do not have to be the same. Image pixels have definite dimensions set by the instrument and data processing. But source “pixels” are more general; they refer loosely to positions (and small regions around them) where one chooses to reconstruct the surface brightness of the source. The shape and density of source pixels are arbitrary, and they can vary across the source plane. The source pixel density directly limits the resolution of the reconstruction.

If source pixels outnumber image pixels, the reconstruction problem will be underconstrained. The regularisation strength will be driven to high values, effectively decreasing the number of independent source pixels. In each of the gridding schemes discussed below, the grid is constructed so the number of source pixels is approximately half the number of image pixels.

The size and shape of the grid can be limited to specific regions on the sky. Using all image pixels may be computationally expensive, and it can make the regularisation less effective (because most of the source pixels would just contain noise). Therefore it may be useful to construct masks around regions that contain lensed images. Wayth et al. (2005) comment on the importance of careful pixel masking, because pixels that do not contain flux can be as important as those that do. If a model fits the observed surface brightness but also puts flux where no light is observed, the model should be penalised, but overly aggressive masking might cause the faulty pixels to be ignored. As a precaution, pixsrc can find and include all pixels that are “sisters” to the pixels in the masked region(s). (Heuristically, image pixels are sisters if they come from the same source pixel.) Doing so requires some care because the number of image pixels that get used can vary with the lens model.

We describe three different schemes for gridding the source plane: one Cartesian and two adaptive. Fig. 4 shows examples of the two adaptive grids. We compare the performance of the two adaptive gridding schemes in §.

We begin with a simple Cartesian grid. The pixel density and resolution in the source plane are uniform. The grid dimensions and pixel scale can be set manually or chosen to achieve N_s ≈ N_d/2, as this seems to adequately reconstruct the source without being underconstrained. Benefits of the Cartesian grid lie in its simplicity: the grid, lensing operator, and regularisation operator are easily and quickly constructed. However, the uniform resolution means that small scales cannot be probed without incurring a large number of source pixels and a correspondingly large regularisation strength. (Strong regularisation introduces correlations between nearby pixels, smoothing the surface brightness and decreasing the effective resolution in the source plane.)
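With a square source-plane field of view, the N_s ≈ N_d/2 rule fixes the Cartesian pixel scale directly; a trivial sketch with invented numbers:

```python
import math

# Hypothetical inputs: n_d image pixels inside the mask, and a square
# source-plane field of view, fov arcsec on a side.
n_d = 5000
fov = 2.0                                 # arcsec

n_s_target = n_d // 2                     # aim for N_s ≈ N_d / 2
side = round(math.sqrt(n_s_target))       # source pixels per side
pixel_scale = fov / side                  # arcsec per source pixel
n_s = side * side
```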
Figure 4.
Triangulation of a fully adaptive grid (top) and an adaptive Cartesian grid (bottom). The lens model used is specified in §: an SIE with Einstein radius 3″, ellipticity 0.3, and position angle 60° east of north.

Vegetti & Koopmans (2009) introduced a gridding scheme in which some of the image plane pixels are mapped to the source plane and used to construct the source grid (see Fig. 5). By default we choose to use every other pixel to construct the grid, which helps to ensure that N_s ≈ N_d/2 (see §).
Figure 5.
A demonstration of the fully adaptive grid construction. The left and right panels show the image and source grids, respectively. Filled circles in the image plane are mapped to the source plane, and a Delaunay triangulation (Shewchuk 1996) is used to construct the source grid. Open circles in the image plane are then mapped to the source plane and set to values interpolated from the surrounding source pixels. Each filled circle has a row in the unblurred lensing operator with a single entry of 1, while each open circle has a row with three non-negative entries that sum to 1. This figure is inspired by Fig. 1 in Vegetti & Koopmans (2009).
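The three non-negative entries in an interpolated pixel's row are barycentric weights within its Delaunay triangle. A minimal numpy sketch (the helper name is ours):

```python
import numpy as np

def barycentric_weights(tri, p):
    """Weights (w0, w1, w2), summing to 1, such that p = w0*v0 + w1*v1 + w2*v2.

    tri: (3, 2) array of triangle vertices (source pixels) in the source plane.
    p:   (2,) position of a ray-traced "open circle" image pixel.
    """
    v0, v1, v2 = tri
    T = np.column_stack([v1 - v0, v2 - v0])   # 2x2 edge matrix
    w1, w2 = np.linalg.solve(T, p - v0)
    return np.array([1.0 - w1 - w2, w1, w2])

# A "filled circle" pixel sits on a vertex and would get a single weight of 1;
# an interior point gets three weights that interpolate linearly.
tri = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
w = barycentric_weights(tri, np.array([0.25, 0.25]))   # -> [0.5, 0.25, 0.25]
```

Because the weights reproduce any linear surface brightness distribution exactly, this construction is what makes the interpolation "treat the source as a collection of small planes."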
The adaptive Cartesian grid builds from the Cartesian grid. An initial two-dimensional grid is refined, adding or removing pixels, so the pixel density varies according to some criterion. Such adaptive mesh refinement algorithms have been used in many fields of research, including star formation modeling, radiative transfer codes, and magnetohydrodynamic simulations. For PBSR, we implement an adaptive Cartesian grid similar to that used by Dye & Warren (2005), which is designed to place more source pixels in regions of higher magnification. We first give a heuristic description of the gridding scheme, and then provide more details.

An initial, zeroth level grid is constructed as a box just large enough to contain all of the ray-traced image pixels, with five grid points (at the corners and center). A zeroth level magnification, μ_0, is ascribed to this grid. Then each quadrant is examined, and if the magnification in this quadrant, μ_1, is larger than four times the magnification of the parent grid (μ_1 ⩾ 4μ_0), the quadrant is split into a (first level) subgrid, itself consisting of four quadrants. The factor of four here is necessary because as we add a subgrid, we split a quadrant into four more quadrants, increasing the spatial resolution by a factor of four (in area). Then, for each of these first level quadrants, we add a second level of subgridding if μ_2 ⩾ 4μ_1 = 16μ_0. This process is repeated for every quadrant and subquadrant.

In practice, it would be computationally expensive to examine every quadrant and subquadrant, and it would be undesirable to do so since many source pixels would be unused in the lensing operator.
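The splitting criterion above (subdivide to level k while μ ⩾ 4^k μ_0) reduces to a short level calculation. In this sketch the function name, epsilon guard, and level cap are our own assumptions:

```python
import math

def subgrid_level(mu, mu0, max_level=10):
    """Deepest refinement level k satisfying mu >= 4**k * mu0.

    A small epsilon guards against floating-point log round-off;
    max_level caps the recursion depth (an assumed safety limit).
    """
    if mu <= mu0:
        return 0
    level = int(math.log(mu / mu0, 4) + 1e-9)
    return min(level, max_level)
```

For example, a local magnification 16 times μ_0 corresponds to two factor-of-four splits, so the pixel lands on a second-level subgrid.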
Instead, every image pixel is ray-traced back to the source plane, the local magnification at that location is computed, the appropriate level of subgridding is determined based on the ratio of the local magnification to the zeroth level magnification, and only the minimum number of source pixels (three or fewer) needed are created.

It still remains to determine μ_0. Because the size of the zeroth level grid is arbitrary, μ_0 is also arbitrary. This freedom is what Dye & Warren (2005) encapsulate in their “splitting factor.” We note that Dye & Warren (2005) allow their splitting factor to vary in the source reconstruction. We have not explored this additional freedom. Instead, we fix μ_0 so that N_s ≈ N_d/2. In §, we find that the discreteness noise in the χ² surface is larger using the adaptive Cartesian grid. The higher noise is thought to be due to the discrete change in magnification required to trigger the subgridding (however, see §).

The surface brightness of an image pixel is calculated by ray-tracing the pixel to the source plane and linearly interpolating over up to three adjacent source pixels. Such interpolation amounts to treating the source as a collection of small planes, which may or may not provide an accurate approximation to the true surface brightness distribution (depending on the pixel scale). Errors from the interpolation can be important if they are large compared with the random noise in the data.

As an illustration, Fig. 6 shows data, model, and residuals for a high-quality image of a source in the cusp configuration. The peak S/N is 500, and the image resolution is 0.03 arcsec/pixel. The lens model was fixed at the correct model, and the fully adaptive grid was constructed as usual, but the source surface brightness was fixed at the known value for the cusp source (rather than being reconstructed). By visual inspection, the data and model seem to agree well, but the residuals show clear structure. Also, the χ² value is 13,236, which corresponds to a reduced χ² of 1.60.
This is troubling, since the lens and source models were fixed at the true values. The residuals, and hence the large χ² value, arise from interpolation errors. To see this, Fig. 6 shows the difference between an image constructed directly from the analytic source and an image constructed from the interpolated version. The structure of the interpolation errors clearly explains the structure of the model residuals.

We need to find a way to account for these errors. Strictly speaking, we would have to know the true surface brightness of the source in order to determine interpolation errors in the first place. As an approximation, we fit an analytic model (comprising one or more Sérsic profiles) to the pixelated source. (If the fit to the pixelated source is poor, we do not account for interpolation errors.) We use this analytic model to compute a map of interpolation errors, as shown in the top right panel of Fig. 6. (The error map can be blurred by the PSF as needed.) We then modify the noise covariance matrix (C_d in Eq. 3) with the substitution
\[
\mathbf{C}_d \rightarrow \mathbf{C}_d + \mathbf{C}_{\rm interp}. \tag{17}
\]
We make C_interp a diagonal matrix containing the squares of the interpolation errors (which omits any correlations in errors among pixels but is a simple and effective approach). This modification lowers the χ² value for the case shown in Fig. 6 to 8275, which corresponds to a reduced χ² of 0.999. Accounting for interpolation errors in this way is a conservative practice, as the effect is to broaden the χ² surface.

Figure 6.
Visualisation of interpolation errors for a source in the cusp configuration with a peak S/N of 500 and a resolution of 0.03 arcsec/pixel. From top left, clockwise: data, interpolation errors, model residuals, model. The lens and source models were fixed at their correct values, but there are significant residuals with the same structure as the interpolation errors. Accounting for interpolation errors lowers the χ² from 13,236 to 8275, corresponding to a change in reduced χ² from 1.60 to 0.999.

As an example, Fig. 7 shows a one-dimensional cut of the Bayesian evidence for a cusp configuration with a pixel scale of 0.05 arcsec/pixel and a peak S/N of 100. All lens model parameters except the Einstein radius are fixed at their true values. The curves show the Bayesian evidence as a function of R_E for two forms of regularisation (discussed in §). The χ² curve becomes shallower when interpolation errors are addressed, reflecting a larger uncertainty in the Einstein radius.

The scale of interpolation errors depends on the lens configuration and image resolution. Fig. 8 shows the minimum and maximum interpolation errors as a function of the pixel scale for all four lens configurations, using both the fully adaptive and adaptive Cartesian grids. The lens model and source brightness are again fixed at their true values. As the image resolution improves, the interpolation errors decrease. The doubly-imaged source is not as highly magnified as the other cases, so the effective resolution in the source plane is lower and the interpolation errors are larger (reaching about 20% of the peak flux). The quad configurations show interpolation errors up to about 7%. We return to these errors in §.

Figure 7.
Effect of interpolation errors, when the errors are comparable to the noise level. The test data have a source in the cusp configuration with a peak S/N of 100 and a resolution of 0.05 arcsec/pixel. Red, solid lines and blue, dashed lines correspond to curvature regularisation and ASR, respectively (see §). For clarity of the χ² curves, vertical offsets have been applied, but differences between same-colour curves are meaningful. Qualitatively, we see that accounting for interpolation errors broadens the χ² surface. Quantitatively, the ranges of χ² change by factors of 1.8 and 1.2 for curvature regularisation and ASR, respectively.

Figure 8.
Minimum and maximum interpolation errors are shown as a function of the pixel scale for both gridding schemes. (The pixel scale is quoted for the image plane because that is the quantity known from data, but bear in mind that interpolation occurs in the source plane.) From top left, clockwise: cusp, fold, cross, and 2-image configurations. The lens and source models were fixed at their correct values. The percent error in any given image pixel is calculated by taking the ratio of the interpolation error in that pixel to the peak signal in the image. The doubly-imaged configuration shows the largest errors, because the magnification is lower and hence the number of source pixels that cover the source is smaller.
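When both C_d and C_interp in Eq. 17 are diagonal, the modification is just a per-pixel variance update. A sketch with invented per-pixel values:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical per-pixel quantities for n_d image pixels.
n_d = 100
residual = rng.normal(0.0, 0.05, n_d)            # model minus data
sigma_noise = np.full(n_d, 0.05)                 # random-noise rms per pixel
interp_err = np.abs(rng.normal(0.0, 0.03, n_d))  # interpolation-error map

# Diagonal covariances: C_d -> C_d + C_interp (Eq. 17).
var_total = sigma_noise**2 + interp_err**2

chi2_naive = np.sum(residual**2 / sigma_noise**2)
chi2_corr = np.sum(residual**2 / var_total)      # broadened chi^2
```

Inflating the variances can only lower χ², which is why this correction is conservative: it broadens the χ² surface rather than sharpening it.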
Figure 9.
Comparison of FDM and DTM on both adaptive grids. Top left: fixed source model (an elliptical Gaussian with ellipticity 0.3), placed in the cusp configuration. Top right: exact magnitude of the gradient of the source. The remaining panels show the magnitude of the gradient computed with various grids and derivative schemes. The middle row corresponds to the fully adaptive (FA) grid, while the bottom row corresponds to the adaptive Cartesian (AC) grid. The left column corresponds to FDM, while the right column corresponds to DTM. The colour scale is linear. Only relative changes within a panel are important, because multiplicative constants can be absorbed into the regularisation strength. The spurious peaks sometimes seen when using the FDM are likely due to unfortuitous alignment of “virtual pixels” with the pixel at which the derivative is evaluated. For a more detailed discussion, see §.

We regularise directly on the adaptive grid, since lensing effectively gives higher resolution in regions that are more highly magnified (see Vegetti & Koopmans 2009 for more discussion). This choice makes the regularisation sensitive to the lens model through the density of source pixels, so in principle it might introduce model-dependent biases into the regularisation. The simulations presented in § let us test for such biases.

Using Taylor's theorem, we can calculate derivatives on a grid using the finite difference method (FDM). For a simple Cartesian grid, the gradient at a particular pixel m can be approximated by taking directional finite differences of the surface brightness along the grid axes at the pixel m.

Figure 10.

Diagram illustrating the derivative calculation using the FDM on an irregular grid. The blue point indicates the pixel m where we seek to compute the derivative. The grid is the same as that shown in Fig. 5. The black, solid lines form a quadrilateral Q connecting the surrounding pixels (labeled n_p, where p = {1, 2, 3, 4}). The points where Q intersects the horizontal and vertical lines through m are called virtual pixels (labeled v_p, where p = {1, 2, 3, 4}). The flux at each virtual pixel is a linear combination of the fluxes at the two surrounding pixels that are colinear with that virtual pixel. The virtual pixels are used to compute the derivative at m.

This gradient can be written as
\[
\vec g\,[m] = \frac{1}{2}\sum_n \left(s[n]-s[m]\right)\frac{\vec r\,[n]-\vec r\,[m]}{\left|\vec r\,[n]-\vec r\,[m]\right|^2} \propto \sum_n \left(s[n]-s[m]\right)\hat r_{nm}, \tag{18}
\]
where g⃗ is a vector containing the derivatives at each source pixel, r⃗ is a vector containing the position vectors of each source pixel, r̂_nm is a unit vector pointing from n to m, and the sums are over the four nearest pixels. The last proportionality holds because, for a Cartesian grid, the distances between adjacent pixels are identical and can be absorbed into the regularisation strength. To approximate the second derivative across the source plane, we write down the Laplacian as
\[
h[m] = \vec\nabla\cdot\vec g\,[m] \propto \vec\nabla\cdot\sum_n \left(s[n]-s[m]\right)\hat r_{nm} \propto \left(\sum_n s[n]\right) - N\,s[m], \tag{19}
\]
where h is a vector containing the second derivative at each source pixel and the sum is again over the N = 4 nearest pixels. In both cases, if the pixel is not on the edge of the grid then the sums include the four pixels to the immediate left, right, top, and bottom of the pixel in question. If the pixel is on the edge, the “missing” pixels are assumed to contain zero flux.
This effectively assumes the surface brightness outside the source grid is zero, which can lead to ineffective regularisation if the grid is small enough that the edges are close to the region of interest. Suyu et al. (2006) note that the derivative calculations can be modified to avoid assuming zero surface brightness outside the grid, but that can lead to problems for ranking lens models. (As Suyu et al. (2006) explain, dropping the zero surface brightness assumption causes the Hessian of R to become singular. The singularity can be removed by introducing a renormalisation constant, but the constant will vary with the lens model, complicating the model comparison.)

The preceding discussion can be extended to adaptive grids, although some care is needed because there may be more than four pixels nearby and it may not be immediately obvious which ones should be used. Vegetti & Koopmans (2009) compute the derivative for a particular pixel m using the triangles that surround m in the Delaunay triangulation of the grid. In pixsrc, we instead identify four pixels (hereafter referred to as surrounding pixels) as follows. Transforming to a coordinate system centered on m, we select the surrounding pixels so that each pixel lies in a different quadrant, each pixel is near m, and the quadrilateral Q formed by the pixels deviates the least from a square. We use the surrounding pixels to calculate the surface brightness at the intersections of the x and y axes with Q, which we refer to as virtual pixels. (A schematic diagram of the surrounding and virtual pixels is shown in Fig. 10.) From the surface brightnesses at the virtual pixels, we can compute the derivatives using Eq. 18 and a modified version of Eq. 19.
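The flux at a virtual pixel is a linear combination of the fluxes at the two surrounding pixels colinear with it. A minimal sketch of that interpolation (illustrative Python; the helper name, coordinate convention, and test values are ours, not pixsrc's):

```python
import numpy as np

def virtual_pixel_flux(p1, f1, p2, f2, axis=0):
    """Flux at the 'virtual pixel' where the segment between two
    surrounding pixels p1 and p2 crosses a coordinate axis through the
    pixel m (taken here as the origin).  The flux is a linear
    interpolation of the fluxes f1 and f2 at the endpoints."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    other = 1 - axis
    # Parameter t in [0, 1] where the segment crosses coordinate `other` = 0.
    t = p1[other] / (p1[other] - p2[other])
    position = p1 + t * (p2 - p1)       # location of the virtual pixel
    flux = (1.0 - t) * f1 + t * f2      # linearly interpolated flux
    return position, flux

# The segment from (1, 1) to (1, -1) crosses the x axis halfway along,
# at (1, 0), so the virtual-pixel flux is the average of the endpoints.
pos, f = virtual_pixel_flux((1.0, 1.0), 2.0, (1.0, -1.0), 4.0, axis=0)
```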
After some algebra, the first derivative at m can be expressed as
\[ \vec{g}[m] = \sum_{n=1}^{4} \left( \frac{D[v_{n-1}, s_{n-1}]}{D[v_{n-1}, m]\, D[s_{n-1}, s_n]}\, \hat{r}_{v_{n-1}} + \frac{D[v_n, s_{n+1}]}{D[v_n, m]\, D[s_{n+1}, s_n]}\, \hat{r}_{v_n} \right) s[s_n] - \left( \sum_{n=1}^{4} \frac{\hat{r}_{v_n}}{D[m, v_n]} \right) s[m], \tag{20} \]
and the second derivative at m is given by
\[ h[m] = \sum_{n=1}^{4} \left( \frac{D[v_{n-1}, s_{n-1}]}{D[v_{n-1}, m]\, D[s_{n-1}, s_n]} + \frac{D[v_n, s_{n+1}]}{D[v_n, m]\, D[s_{n+1}, s_n]} \right) s[s_n] - \left( \sum_{n=1}^{4} \frac{1}{D[m, v_n]} \right) s[m], \tag{21} \]
where the sums run from n = 1 to n = 4, D[r, s] is a functional that returns the distance between points r and s, v_p is the p-th virtual pixel, and \(\hat{r}_{v_p}\) is a unit vector pointing toward the p-th virtual pixel. For simplicity of notation, we let the indices wrap around (e.g., v_0 = v_4 and v_5 = v_1).

Fig. 9 suggests that derivatives calculated with the FDM can be inaccurate. Certain configurations of points on the fully adaptive grid can cause virtual pixels to lie very close to m, leading to an anomalously high estimate for the derivative. Such events are rare and do not have a dramatic effect on the source reconstruction. The adaptive Cartesian grid is less susceptible to such gridding issues.

The method described here is developed in Xu & Liu (2006); we reproduce some of the key elements. It is called by the authors an irregular grid finite difference method based on the Green–Gauss theorem (as Green's theorem reduces to Gauss' theorem in two dimensions). The theorem states that for a scalar function F defined on \(\mathbb{R}^2\) with continuous partial derivatives, we can relate the surface integral over some region Ω to a line integral along the boundary of Ω:
\[ \iint_{\Omega} \vec{\nabla} F \, d\Omega = \oint_{\partial\Omega} F\, \hat{n} \, ds, \tag{22} \]

Figure 11.
Diagram illustrating the derivative calculation using the DTM. The blue point again indicates the pixel m where we seek to compute the derivative. The black, solid lines form a polygon surrounding m. The square points are placed at the midpoints between m and the surrounding pixels. The triangle points are placed at the centroids of the triangles. The stencil Ω is formed by connecting midpoints to adjacent centroids. The unit vector \(\hat{n}\), denoted by the magenta arrows, is orthogonal to the edges of the stencil, points outwards, and changes direction as the stencil is traced along its edges. This figure is inspired by Fig. 1 in Xu & Liu (2006).

where \(\hat{n}\) is a unit normal vector on the boundary, pointing outwards. Suppose Ω is a region, called a stencil, small enough that \(\vec{\nabla} F\) is approximately constant across the region. Then we can pull the gradient out of the integral and write
\[ \vec{\nabla} F = \frac{1}{\Omega} \oint_{\partial\Omega} F\, \hat{n} \, ds, \tag{23} \]
from which it follows that
\[ \vec{\nabla} \cdot (\vec{\nabla} F) = \frac{1}{\Omega} \oint_{\partial\Omega} \hat{n} \cdot \vec{\nabla} F \, ds. \tag{24} \]
For implementation, the integrals are converted to sums, and Ω is defined by connecting centroids of Delaunay triangles to adjacent midpoints of the sides of the triangles (see Fig. 11). Depending on the density of source pixels, the stencil Ω may not be small enough for \(\vec{\nabla} F\) to be constant. We nevertheless take Eq. 23 to define an effective gradient for each pixel.

Unlike the FDM, the DTM does not assume the flux vanishes outside the grid. Eqs. 23 and 24 can be applied to the grid edges, as long as care is taken in closing the line integrals. Thus, edge effects in the regularisation are minimal. Fig. 9 suggests that derivatives computed with the DTM are more accurate than those computed with the FDM, because the DTM uses all nearby pixels.
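Discretising Eq. 23 as a sum over the edges of the stencil polygon gives the effective gradient. A minimal sketch (illustrative Python; the polygon, edge fluxes, and names are ours, and in pixsrc the stencil would come from the Delaunay triangulation):

```python
import numpy as np

def green_gauss_gradient(vertices, f_edge):
    """Effective gradient over a polygonal stencil (Eq. 23), discretised
    as grad F ~ (1/Area) * sum_e F_e * n_e * |e|, where F_e is the
    surface brightness at the midpoint of edge e and n_e its outward
    unit normal.  `vertices` traces the stencil anticlockwise."""
    v = np.asarray(vertices, dtype=float)
    nv = len(v)
    area = 0.0
    integral = np.zeros(2)
    for i in range(nv):
        a, b = v[i], v[(i + 1) % nv]
        edge = b - a
        # Outward, length-weighted normal for an anticlockwise polygon:
        # rotate the edge vector by -90 degrees.
        normal = np.array([edge[1], -edge[0]])
        integral += f_edge[i] * normal
        area += 0.5 * (a[0] * b[1] - a[1] * b[0])   # shoelace formula
    return integral / area

# Sanity check with F(x, y) = x on the unit square: the gradient is (1, 0).
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
# F evaluated at the edge midpoints (0.5,0), (1,0.5), (0.5,1), (0,0.5):
f_mid = [0.5, 1.0, 0.5, 0.0]
grad = green_gauss_gradient(square, f_mid)
```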
As an alternative to derivative-based regularisation, we have developed a quadratic form of regularisation that penalises the source for deviations in surface brightness from one or more analytic profiles. We find that analytic source regularisation (ASR) is especially useful in recovering the surface brightness of the source in noisy data. Currently, the reference surface brightness distribution has a Sérsic profile,
\[ I(\vec{r}) = I_0 \exp\left[ -\left( \frac{|\vec{r}|}{r_s} \right)^{1/n} \right], \tag{25} \]
where \(I_0\) is the normalisation, \(r_s\) is the scale radius, and n is the Sérsic index. Elliptical models are created from a linear transformation of coordinates. More complicated sources can be built from a combination of Sérsic profiles that represent multiple, blended sources (such as "knots" in star-forming galaxies).

(Implementation of the second derivative requires additional correction terms found in Xu & Liu (2006); it is still under refinement in pixsrc.)

To implement ASR, we first find the analytic source \(s_a\) that best fits the data. We vary the position, normalisation, scale radius, Sérsic index, ellipticity, and position angle of the analytic source using a downhill simplex optimisation routine (Press et al. 2002). We then use the best-fit analytic source to construct a regularisation matrix, H, that acts on a source vector, s, to produce a deviation vector, Δ = Hs, whose value at pixel m is given by
\[ \delta[m] = \left( \sum_n \frac{s[n]}{s_a[n]} \right) - N\, \frac{s[m]}{s_a[m]}, \tag{26} \]
where \(s_a[p]\) is the flux at p from the analytic source, and the sum is over the N pixels that share a Delaunay triangle with pixel m. The deviation vector vanishes if the source vector agrees completely with the analytic profile, or indeed if s is any real multiple of \(s_a\). More generally, Δ quantifies the degree to which s does not match a multiple of \(s_a\). Because the inverse brightness values in Eq.
26 can become large toward the outer regions of the analytic profile, we set the analytic source flux to 10% of the noise level once it falls below this value. Fig. 12 suggests that ASR is more effective than derivative-based regularisation at recovering the source from noisy data. This result is perhaps not surprising; because ASR assumes a functional form for the source, it is a stronger prior than derivative-based regularisation. It is important to note that ASR will yield accurate source reconstructions only if the true source is well described by the assumed functional form.

At this point we should consider whether regularisation introduces any biases in the values or uncertainties for recovered lens model parameters. Because the noise is Gaussian and centered on zero, we conjecture that analysing many different realisations of the noise can uncover the true underlying likelihood function (as an alternative to explicitly regularising the source surface brightness). We construct thousands of "observations" of a cusp lens with a peak S/N of unity and a resolution of 0.1 arcsec/pixel. We vary the ellipticity of the lens while holding other parameters fixed at their true values. The χ² curves from individual runs vary significantly, but stacking the results washes away the fluctuations from noise (see the red curve in Fig. 13). The stacked curve from ASR (shown in blue) matches the underlying χ² curve well. The results from curvature regularisation, by contrast, show a small bias toward lower ellipticity and underestimate the uncertainties for this parameter. This is yet another indication that ASR can outperform derivative-based regularisation when the data are noisy and the true source follows an analytic profile. At higher S/N (not shown), there is less difference between the regularisation schemes.

Because ASR can obtain a minimum for s ≠ 0, some of the algebra in §

Figure 12.
Comparison of sources reconstructed from noisy data. The test data have a source in the cusp configuration with a peak S/N of 1 (see the top left panel of Fig. 2) and a resolution of 0.05 arcsec/pixel. Clockwise from top left: the true source surface brightness followed by sources reconstructed from gradient-based regularisation, curvature-based regularisation, and analytic source regularisation. The source recovered using ASR best matches the true source brightness. In the case of blended sources (not shown), ASR also outperforms derivative-based regularisation.
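The Sérsic profile of Eq. 25 and the ASR deviation vector of Eq. 26 can be sketched as follows (illustrative Python; the neighbour lists here are toy input, whereas in pixsrc they would come from the Delaunay triangulation of the source grid):

```python
import numpy as np

def sersic(r, I0, r_s, n):
    """Sersic profile of Eq. 25: I(r) = I0 * exp(-(r / r_s)**(1/n))."""
    return I0 * np.exp(-(np.asarray(r, dtype=float) / r_s) ** (1.0 / n))

def asr_deviation(s, s_a, neighbours):
    """Deviation vector of Eq. 26.  For each pixel m,
        delta[m] = sum_n s[n]/s_a[n]  -  N * s[m]/s_a[m],
    where the sum runs over the N pixels sharing a Delaunay triangle
    with m (passed in here as an explicit neighbour list)."""
    ratio = np.asarray(s, dtype=float) / np.asarray(s_a, dtype=float)
    delta = np.empty_like(ratio)
    for m, nbrs in enumerate(neighbours):
        delta[m] = ratio[list(nbrs)].sum() - len(nbrs) * ratio[m]
    return delta

# If s is any real multiple of the analytic source, the deviation vanishes.
s_a = sersic([0.0, 0.5, 1.0, 1.5], I0=1.0, r_s=1.0, n=4.0)
s = 2.5 * s_a
nbrs = [(1,), (0, 2), (1, 3), (2,)]
d = asr_deviation(s, s_a, nbrs)
```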
Figure 13.
Effects of regularisation on parameter estimation. The y-axis label "statistic" denotes χ² and −E for cases without and with regularisation, respectively, but for simplicity we refer to both as χ². (We have applied a vertical offset to facilitate comparing the curves.) We construct many realisations of a cusp lens with a peak S/N of unity and a resolution of 0.1 arcsec/pixel. After stacking the results, we expect the χ² curve without regularisation (shown in red) to represent the true errors on the ellipticity. ASR (shown in blue) seems to agree well with the reference case. By contrast, curvature regularisation (shown in green) has a minimum that is shifted away from the true value e = 0.
3. Also, the χ² curve rises rapidly away from the minimum, causing the parameter uncertainties to be underestimated. For each case, the dark, thick line corresponds to the median value, and the bands are 68% confidence intervals, estimated from bootstrapping.

When exploring the lens model parameter space, we find that the likelihood surface can be jagged even for our clean test data. Wallington, Narayan & Kochanek (1994) remarked on "glitches" in χ² for their maximum entropy analysis, but noted that the glitches disappeared as the PSF and noise vanished. In our analysis, the jaggedness is reduced but not eliminated in that limit. It arises, we suspect, from the discrete nature of PBSR itself. A small, continuous change in the lens model parameters can shift the source pixels in a way that causes the Delaunay algorithm to connect the pixels in a different way, leading to abrupt changes in the lensing operator and regularisation matrix.

To probe these issues, we examine one-dimensional cuts of the χ² surface for various gridding and regularisation schemes. We focus on test data for the cusp configuration with a peak S/N of 25 and pixel scale of 0.
05 arcsec/pixel. We fix all lens model parameters at their correct values and vary only the ellipticity of the lensing galaxy. We consider different combinations of grids (fully adaptive or adaptive Cartesian) and priors (gradient regularisation with FDM or DTM, or ASR). The results are shown in Fig. 14.

Qualitatively, we find that the fully adaptive grid shows less small-scale fluctuation in χ² than the adaptive Cartesian grid. Since the fully adaptive grid is constructed by ray tracing image pixels to the source plane, it more naturally accommodates small changes in the lens model. The adaptive Cartesian grid, by contrast, either remains fixed or changes discretely (if the magnification crosses the criterion for subgridding; see §), which produces abrupt jumps in χ². It may be possible to improve the performance of the adaptive Cartesian grid by developing a different criterion for subgridding, but such modifications have not yet been explored.

Turning to regularisation, the DTM yields somewhat smaller fluctuations than the FDM, at least for the fully adaptive grid (with the adaptive Cartesian grid, the noise is dominated by the gridding anyway). ASR leads to the smoothest χ² curves for both types of grids. As the lens model parameters vary, the best-fit analytic source and the corresponding weights in the regularisation matrix can vary smoothly as well. It is interesting that the fully adaptive grid with the DTM does not show a similar level of smoothness, because that method also changes continuously with lens model parameters. The difference may occur because the deviation vector δ in Eq. 26 is a dimensionless ratio of surface brightnesses, whereas the derivatives used for gradient or curvature regularisation have units of surface brightness divided by distance or squared distance.
Using a dimensionless measure of deviation allows each pixel to have equal weight in the regularisation matrix, a quality that the derivative-based methods do not necessarily have.

Finally, we note that the χ² curve is flatter near the minimum for ASR than it is for gradient regularisation (focusing now on the fully adaptive grid). This causes ASR to yield larger uncertainties in the ellipticity of the lensing galaxy, as we saw already in Fig. 13. If the ASR is taken to represent the true posterior probability distribution, then the errors reported using the fully adaptive grid with gradient regularisation are being underestimated.

In summary, we find that the gridding and regularisation schemes both affect the level of noise in the Bayesian evidence. These two algorithmic issues need to be considered carefully in applications of PBSR.

In real data, the image resolution is typically fixed by the observational equipment, but the telescope pointing and the noise in the data are particular realisations; on a different day, the same observation would not actually be identical. We now consider whether such chance events introduce any statistical or systematic uncertainties into conclusions derived from lens modeling. We examine noise and pointing both separately and jointly, with and without a PSF, sometimes just optimising the parameters and sometimes performing a full parameter space exploration. We assume Gaussian noise with zero mean, which can be considered to represent electron read-out noise, Poisson noise (in the large mean limit), or sky noise. As a fiducial case, we use a lens in the cusp configuration with a pixel scale of 0.05 arcsec/pixel and a peak S/N of 10, but we examine different choices as discussed below.
Since ASR is computationally expensive, and curvature regularisation is well suited for initial parameter space explorations (see §

While the noise level will affect the uncertainty in lens model parameters, the particular noise realisation will also affect the best-fit values of the parameters. To explore this possibility, we create 100 "observations" with the same data but different realisations of the noise, for the various noise levels shown in Fig. 2. We optimise the parameters and examine the scatter among best-fit values (at this point we are not fully quantifying the parameter uncertainties). The pixel scale is fixed at the high resolution of 0.02 arcsec/pixel (see Fig. 3) so that effects due to pixel size are minimised.

Fig. 15 shows the results in terms of different two-dimensional parameter projections, along with the median and 68% confidence intervals for individual parameters. There is no significant bias in the parameter values. Empirically, the scatter among best-fit values appears to have a power-law dependence on S/N with a slope of ∼ −0.8.

Telescope pointing affects how photons are collected into pixels, so small shifts may influence the data and hence the

(Observations often include multiple exposures to handle cosmic rays, bad pixels, dithering, and subsampling the PSF. We imagine our analysis being applied to the final image after data reduction.)

(The PSF is used both in creating the mock data and in modeling the lens.)
Figure 14.
Effective χ² as a function of ellipticity for various gridding and regularisation schemes. The test data have a source in the cusp configuration with a peak S/N of 25 and a resolution of 0.05 arcsec/pixel. The columns correspond to different grids (left is fully adaptive, right is adaptive Cartesian). The rows correspond to different regularisation schemes (top is gradient regularisation with FDM, middle is gradient regularisation with DTM, bottom is analytic source regularisation [ASR]). The different panels have the same vertical range (and the vertical offsets are not meaningful). In general, the fully adaptive grid leads to less noise in χ² than the adaptive Cartesian grid. ASR produces the smoothest curve over large scales, presumably because the regularisation matrix does not change discretely and the deviation from an analytic profile is measured in a dimensionless way.
Figure 15.
Best-fit lens model parameters for different realisations of noise in the data. The source is in the cusp configuration. Red, green, and blue points correspond to peak S/N levels of 1, 10, and 25, respectively. The marked points correspond to optimal lens model parameters; this analysis does not include full parameter uncertainties. The quoted uncertainties indicate the ranges that enclose 68% of the best-fit values. The pixel scale is fixed at 0.02 arcsec/pixel so the effects of pixel size are minimal. Dashed, yellow lines mark the true values.

recovered model parameters. To explore this issue, we again create 100 "observations" in which the pointing is shifted randomly. The shifts are drawn from a uniform distribution that is one pixel in each direction, for image resolutions of 0.03, 0.05, and 0.1 arcsec/pixel. pixsrc requires some amount of noise, but the noise map is kept identical and the noise level is minimal (the peak S/N is 5 ×) so the effects are negligible.

Fig. 16 shows two-dimensional projections of the best-fit parameter values. The median values reveal biases that are small (a fraction of a pixel for the Einstein radius and position of the lens galaxy) but statistically significant. The biases become less significant, however, when a PSF is included (see Fig. 17). For the case with no PSF, the scatter in the best-fit parameter values follows a power law in the pixel scale with a slope of ∼ 0.3.

Now we consider noise and telescope pointing together, and we extend the analysis to all four test image configurations. We again create multiple "observations" but now each contains both a different realisation of the noise and a different random pointing. Table 1 quantifies the spread in best-fit

(Ignoring edge effects, shifts of N + Δx are equivalent to shifts of Δx, where N is an integer.)
parameter values for all four image configurations and peak S/N values of 1, 10, and 500.

In general, the scatter decreases as the S/N increases. The 2-image case tends to have more scatter than the other cases because a 2-image configuration provides weaker constraints than configurations that have additional images and/or long arcs. The high-S/N cases show some small formal biases in the parameters, but we expect those would be reduced if a PSF were included.

For some applications we are interested in the intrinsic properties of the source galaxy (e.g., Sharon et al. 2012; Dye et al. 2014). Depending on the information available, it may be possible to estimate the luminosity, dynamical mass, mass-to-light ratio, gas mass fraction, and star formation rate for the source. Such applications require knowledge of the lensing magnification, so we examine uncertainties in the magnification associated with noise and pointing. Specifically, for each lens model in Table 1 we compute the total magnification of the source. We quantify the scatter using the 68% confidence interval, and then divide by the true magnification to obtain the fractional uncertainty for each lens configuration. (We are still just examining the scatter among best-fit models for different realisations of noise and pointing; we are not yet characterising the full uncertainties in individual lens models.)

Fig. 18 shows the results. At low S/N, the cusp configuration has the largest uncertainties, presumably because the source lies in a region where small changes in the model can lead to large changes in the magnification. At higher S/N,
Figure 16.
Best-fit lens model parameters for different realisations of the telescope pointing. The source is in the cusp configuration. The shifts are drawn from a uniform distribution that is one pixel in each direction. Red, green, and blue points correspond to image resolutions of 0.1, 0.05, and 0.03 arcsec/pixel, respectively. The peak S/N is 5 × so that effects related to noise are negligible. Dashed, yellow lines mark the true values.

Figure 17.
Similar to Fig. 16 but including a PSF. The source is placed in the cusp configuration, and the image resolution is fixed at 0.05 arcsec/pixel. The red points have no PSF, while the green and blue points have circular Gaussian PSFs with FWHM equal to 0.059 and 0.12 arcsec, respectively.
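The random pointing shifts used in these tests can be sketched as follows (illustrative Python; the seed, helper name, and pixel scale are ours, and in practice the shifts are applied when rendering the mock images). Ignoring edge effects, a shift of N + Δx pixels is equivalent to a shift of Δx pixels for integer N, so only the sub-pixel part matters:

```python
import numpy as np

rng = np.random.default_rng(7)

def random_pointing_shifts(n_obs, pixel_scale):
    """Draw random pointing offsets for n_obs mock observations,
    uniform within one pixel in each direction.  Returns offsets in
    arcsec; only the sub-pixel part of any shift is relevant."""
    shifts = rng.uniform(-0.5, 0.5, size=(n_obs, 2))  # in pixels
    return shifts * pixel_scale                        # in arcsec

# 100 "observations" at a pixel scale of 0.05 arcsec/pixel.
shifts = random_pointing_shifts(100, pixel_scale=0.05)
```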
(Table columns: the lower, median, and upper values of the 68% CI at peak S/N = 1, 10, and 500; rows: R_E and e for the cusp, fold, cross, and 2-image configurations.)

Table 1.
Best-fit lens model parameters when we consider different noise realizations and telescope pointings simultaneously. The observations correspond to peak S/N values of 1, 10, and 500, and the resolution is 0.05 arcsec/pixel. For each set of observations, the middle column corresponds to the median value recovered, and the lower and upper bounds of the 68% CI are shown in the first and third columns, respectively.

the 2-image case fares worst because the lens model is not highly constrained. At all S/N values, the cross case has the smallest fractional uncertainties because the source is in a region where the magnification gradient is small. All told, for S/N ≳
10 the scatter in magnification associated with noise and pointing is ≲ 10% for all lens configurations.
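The fractional magnification scatter can be sketched as follows (illustrative Python with toy numbers; in the actual analysis the magnifications come from the best-fit lens models for each realisation):

```python
import numpy as np

def fractional_magnification_scatter(mu_fits, mu_true):
    """Scatter in the recovered total magnification across many
    noise/pointing realisations: the 68% confidence interval of the
    best-fit values divided by the true magnification."""
    lo, hi = np.percentile(mu_fits, [16.0, 84.0])
    return (hi - lo) / mu_true

# Toy example: best-fit magnifications scattered about a true value of 12.
rng = np.random.default_rng(0)
mu_fits = rng.normal(12.0, 0.3, size=1000)
frac = fractional_magnification_scatter(mu_fits, 12.0)
```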
To this point we have only examined how the best-fit lens model parameters change with different realisations of the noise and telescope pointing. Now for each "observation" we use an adaptive Markov Chain Monte Carlo (MCMC) algorithm to explore the full parameter space and characterise the posterior distribution of parameters. The width of the posterior depends on the noise level, while the peak location depends on the particular realisation of the noise and pointing. By comparing the width of each posterior to the scatter across realisations, we can investigate how the scatter from pointing compares to the scatter from noise. Note that noise contributes to this analysis twice: to the width of each posterior, and to the scatter between them. We consider this "double counting" when interpreting the results, as discussed below.

Fig. 19 shows the 68% and 95% confidence intervals for lens model parameters when we combine all of the realisations. The noise level is fixed so the peak S/N is 10, and the image resolution is fixed at 0.05 arcsec/pixel. We analyse the cusp configuration, both without a PSF and with a PSF that has a FWHM of 0.12 or 0.24 arcsec. Adding a PSF causes the distributions to shift and broaden to some degree, but the true values always lie within the 95% confidence interval. Parameter inference, in other words, is robust.

Let S_tot be the width of the posterior from the combined analysis. For comparison, let S_i be the width from an individual "observation". Since S_i only accounts for noise

(We quantify the width in terms of the 68% confidence interval, and use the symbol S to distinguish this scatter from the standard deviation.)
Figure 19.
Marginal posterior probability distributions for lens model parameters, using a source in the cusp configuration with an image resolution of 0.05 arcsec/pixel and a peak S/N of 10. We use a Markov Chain Monte Carlo analysis to explore the parameter space and combine 100 realisations of the noise and telescope pointing. The contour plots show 68% and 95% confidence intervals for the various 2-dimensional projections. The top plots show individual probability distributions (normalised to the same peak), with the 95% confidence interval marked by points (defined so 2.5% of the integrated probability is in each of the left and right tails). Solid blue, dashed green, and dot-dashed red curves correspond to data created with circular Gaussian PSFs having FWHMs of 0.0, 0.12, and 0.24 arcsec, respectively.
Figure 18.
Scatter in the lensing magnification (shown as a fractional uncertainty, quoted as a percentage) for different realisations of the noise and telescope pointing. (This analysis does not take full lens model parameter uncertainties into account.) Statistical errorbars on the scatter are computed with a bootstrap analysis.

while S_tot accounts for both noise and pointing, we generally expect S_tot > S_i. Indeed, Fig. 20 shows that this ratio typically has values between 1 and 2. To understand what we might expect, consider that if the distributions were Gaussian then the total scatter would be the quadrature sum of the width of each run and the scatter between runs:
\[ \sigma_{\rm tot} \approx \left( \sigma_{\rm width}^2 + \sigma_{\rm scatter}^2 \right)^{1/2} \approx \left( 2\sigma_{\rm noise}^2 + \sigma_{\rm pointing}^2 \right)^{1/2}, \tag{27} \]
where \(\sigma_{\rm width} \approx \sigma_{\rm noise}\) while \(\sigma_{\rm scatter} \approx (\sigma_{\rm noise}^2 + \sigma_{\rm pointing}^2)^{1/2}\). In other words, we might naïvely predict that the ratio in Fig. 20 has the form
\[ \frac{S_{\rm tot}}{S_i} \approx \left( \frac{2\sigma_{\rm noise}^2 + \sigma_{\rm pointing}^2}{\sigma_{\rm noise}^2} \right)^{1/2}. \tag{28} \]
If the scatter from pointing is negligible compared with the scatter from noise, the ratio S_tot/S_i would have a value near √2 ≈ 1.
4. As the scatter from pointing increases, the ratio would likewise increase. We could therefore interpret scatter ratios above √2 as evidence of additional scatter from pointing; most parameters in Fig. 20 have ratios consistent with √
2. The 2-image configuration has values that are nominally higher but still consistent with √2.

Figure 20.
Ratio of the overall scatter, S_tot, to the scatter for individual runs, S_i, for the various lens model parameters. The point corresponds to the median value of the ratio, while the errorbars are computed with a bootstrap analysis. A value of S_tot/S_i ≈ √2 ≈ 1.4 corresponds to negligible scatter from the telescope pointing.

Although this reasoning relies on Gaussianity, which may not strictly apply to our distributions, the results suggest that the statistical properties of our runs are sensible. We note that these conclusions may depend on the pixel scale and noise level, which we have not explored in detail.

We have introduced a new pixel-based source reconstruction (PBSR) software called pixsrc and applied it to mock data in order to investigate statistical and systematic uncertainties in modeling lenses with extended sources. We have examined several issues that are intrinsic to the pixel-based approach:

• The χ² surface contains "discreteness noise" that is influenced by the gridding and regularisation schemes.
• Errors associated with interpolating surface brightness values in the source plane need to be taken into account, especially for high-S/N data.
• Adaptive grids are often used to achieve good resolution in the source plane, but they require some care when computing numerical derivatives.
• A new regularisation scheme called analytic source regularisation (ASR) reconstructs a source with more fidelity than derivative-based regularisation when the data are noisy.
• Compared to ASR, curvature regularisation may underestimate parameter uncertainties for noisy data.

We have applied ASR to sources that are fairly regular, but it could be extended to blended sources or galaxies with star-forming regions by writing the analytic source as a collection of Sérsic profiles. Differences between ASR and derivative-based regularisation are smaller when the S/N ratio is higher.

We have also examined statistical issues that arise because any given data set has a particular realisation of the noise and telescope pointing.
For the cusp configuration, we find that different realisations of the noise lead to scatter in the best-fit model parameters that scales as a power law in S/N with a slope of ∼ −0.8. Different realisations of the pointing lead to scatter that scales as a power law in the pixel scale with a slope of ∼ 0.3. Some parameters show small but statistically significant biases, but those can be washed out with the inclusion of a PSF. When we fully characterise the model uncertainties, the 95% confidence intervals always include the true parameter values, with or without a PSF. These results are not highly sensitive to the image configuration, except that our 2-image lens has more scatter than our 4-image lenses because the constraints on the lens model are weaker.

The scatter in noise and pointing leads to scatter in the lensing magnification, which is important for determining the intrinsic properties of the source. The magnification scatter does vary with the lens configuration because it is sensitive to how rapidly the magnification changes at the location of the source. This scatter decreases with increasing S/N, but more slowly for the 2-image configuration than for the 4-image cases. For S/N ≳ 10 the scatter in magnification associated with noise and pointing is ≲ 10% for all lens configurations.

We note that real data may have complications beyond the issues we have addressed. Examples include irregular structure in the source, differential extinction by dust in the lens galaxy, departures from a smooth lensing potential, incomplete knowledge of the PSF, and intricate aspects of image reduction. Such issues will be specific to particular data sets and need to be examined in conjunction with the algorithmic issues presented here.

We have discussed a number of different approaches to PBSR, so let us summarise our suggestions for modeling that is both efficient and effective. If the images can be separated, it may be useful to take their positions and fluxes and perform an initial parameter search assuming point-like images. Then using an analytic source characterised by a small number of free parameters can help identify the appropriate region of parameter space. When undertaking full PBSR, derivative-based regularisation is good for computational efficiency, but analytic source regularisation is a valuable step if the source is reasonably well described by an analytic profile or a collection of such profiles. Finally, it is a good idea to find the best-fit lens and source models and create many realisations of similar observations (as in §

ACKNOWLEDGEMENTS
We thank Andrew Baker, Simon Dye, Ross Fadely, Curtis McCully, and Brandon Patel for helpful comments. We also thank the referee, Olaf Wucknitz, for thorough and helpful comments on the manuscript. We acknowledge the following computational tools. For performing Delaunay triangulations, we use the triangle code by Shewchuk (1996). For handling world coordinate system information, we use the wcslib library (Calabretta & Greisen 2002) compiled by WCSTOOLS (Mink 2011). For solving linear matrix equations and computing matrix factorizations, we use UMFPACK (Davis 2004a,b; Davis & Duff 1999, 2006). For enabling GPU computing, we use CUDA (NVIDIA Corporation 2007) and Thrust (Hoberock & Bell 2010). We acknowledge funding from grants AST-0747311 and AST-1211385 from the U.S. National Science Foundation.
REFERENCES
Blandford R., Surpi G., Kundić T., 2001, in Astronomical Society of the Pacific Conference Series, Vol. 237, Gravitational Lensing: Recent Progress and Future Goals, Brainerd T. G., Kochanek C. S., eds., p. 65
Brewer B. J., Lewis G. F., 2006, ApJ, 637, 608
Calabretta M. R., Greisen E. W., 2002, A&A, 395, 1077
Davis T. A., 2004a, ACM Trans. Math. Softw., 30, 196
Davis T. A., 2004b, ACM Trans. Math. Softw., 30, 165
Davis T. A., Duff I. S., 1999, ACM Trans. Math. Softw., 25, 1
Davis T. A., Duff I. S., 2006, SIAM J. Matrix Anal. Appl., 18, 140
Dye S. et al., 2014, MNRAS, 440, 2013
Dye S., Warren S. J., 2005, ApJ, 623, 31
Hoberock J., Bell N., 2010, Thrust: A Parallel Template Library. Version 1.7.0
Högbom J. A., 1974, A&AS, 15, 417
Jenkins C. R., Peacock J. A., 2011, MNRAS, 413, 2895
Keeton C. R., 2001, arXiv:astro-ph/0102340
Kochanek C. S., Keeton C. R., McLeod B. A., 2001, ApJ, 547, 50
Kochanek C. S., Narayan R., 1992, ApJ, 401, 461
Koopmans L. V. E., 2005, MNRAS, 363, 1136
Koopmans L. V. E. et al., 2009, ApJ, 703, L51
Kronawitter A., Saglia R. P., Gerhard O., Bender R., 2000, A&AS, 144, 53
Mink D. J., 2011, Astrophysics Source Code Library, 9015
NVIDIA Corporation, 2007, NVIDIA CUDA Compute Unified Device Architecture Programming Guide. NVIDIA Corporation
Press W. H., Teukolsky S. A., Vetterling W. T., Flannery B. P., 2002, Numerical Recipes in C++: The Art of Scientific Computing
Schneider P., Kochanek C. S., Wambsganss J., 2006, Gravitational Lensing: Strong, Weak and Micro. Springer
Sharon K., Gladders M. D., Rigby J. R., Wuyts E., Koester B. P., Bayliss M. B., Barrientos L. F., 2012, ApJ, 746, 161
Shewchuk J. R., 1996, in Lecture Notes in Computer Science, Vol. 1148, Applied Computational Geometry: Towards Geometric Engineering, Lin M. C., Manocha D., eds., Springer-Verlag, pp. 203–222, from the First ACM Workshop on Applied Computational Geometry
Suyu S. H., Halkola A., 2010, A&A, 524, A94
Suyu S. H. et al., 2012, ApJ, 750, 10
Suyu S. H., Marshall P. J., Auger M. W., Hilbert S., Blandford R. D., Koopmans L. V. E., Fassnacht C. D., Treu T., 2010, ApJ, 711, 201
Suyu S. H., Marshall P. J., Blandford R. D., Fassnacht C. D., Koopmans L. V. E., McKean J. P., Treu T., 2009, ApJ, 691, 277
Suyu S. H., Marshall P. J., Hobson M. P., Blandford R. D., 2006, MNRAS, 371, 983
Treu T., 2010, ARA&A, 48, 87
Vegetti S., Czoske O., Koopmans L. V. E., 2010, MNRAS, 407, 225
Vegetti S., Koopmans L. V. E., 2009, MNRAS, 392, 945
Vegetti S., Koopmans L. V. E., Bolton A., Treu T., Gavazzi R., 2010, MNRAS, 408, 1969
Vegetti S., Lagattuta D. J., McKean J. P., Auger M. W., Fassnacht C. D., Koopmans L. V. E., 2012, Nature, 481, 341
Wallington S., Kochanek C. S., Narayan R., 1996, ApJ, 465, 64
Wallington S., Narayan R., Kochanek C. S., 1994, ApJ, 426, 60
Warren S. J., Dye S., 2003, ApJ, 590, 673
Wayth R. B., Warren S. J., Lewis G. F., Hewett P. C., 2005, MNRAS, 360, 1333
Xu G., Liu G. R., 2006, WSEAS Trans. Math., 5