Statistical and systematic uncertainties in pixel-based source reconstruction algorithms for gravitational lensing
Mon. Not. R. Astron. Soc. 000, 000–000 (0000) Printed 10 October 2018 (MN LaTeX style file v2.2)
Amitpal S. Tagore and Charles R. Keeton
Department of Physics & Astronomy, Rutgers University, 136 Frelinghuysen Road, Piscataway, NJ 08854, USA
10 October 2018
ABSTRACT
Gravitational lens modeling of spatially resolved sources is a challenging inverse problem with many observational constraints and model parameters. We examine established pixel-based source reconstruction algorithms for de-lensing the source and constraining lens model parameters. Using test data for four canonical lens configurations, we explore statistical and systematic uncertainties associated with gridding, source regularisation, interpolation errors, noise, and telescope pointing. Specifically, we compare two gridding schemes in the source plane: a fully adaptive grid that follows the lens mapping but is irregular, and an adaptive Cartesian grid. We also consider regularisation schemes that minimise derivatives of the source (using two finite difference methods) and introduce a scheme that minimises deviations from an analytic source profile. Careful choice of gridding and regularisation can reduce “discreteness noise” in the χ² surface that is inherent in the pixel-based methodology. With a gridded source, some degree of interpolation is unavoidable, and errors due to interpolation need to be taken into account (especially for high signal-to-noise data). Different realisations of the noise and telescope pointing lead to slightly different values for lens model parameters, and the scatter between different “observations” can be comparable to or larger than the model uncertainties themselves. The same effects create scatter in the lensing magnification at the level of a few percent for a peak signal-to-noise ratio of 10, which decreases as the data quality improves.

Key words: gravitational lensing: strong – methods: numerical
1 INTRODUCTION

The gravitational deflection of light produces a variety of observable effects that can be used to study the mass distributions of deflectors (e.g., galaxies and clusters of galaxies) and the structure of light sources (e.g., distant quasars and star-forming galaxies), and to constrain cosmological parameters (see the review by Schneider, Kochanek & Wambsganss 2006). In this paper, we focus on strong gravitational lensing, in which light bending creates multiple images of the source. If the source is compact and unresolved, the images and source are each characterised by just three numbers: two position coordinates and a flux. If the source is extended, the resolved images provide many constraints, but the structure of the source must be included in the modeling. One approach is to assume the source has elliptical symmetry and analyse isophotal shapes (e.g., Blandford, Surpi & Kundić 2001) or peak surface brightness curves (e.g., Kochanek, Keeton & McLeod 2001) in Einstein rings. A more general approach is to reconstruct the source on a grid in order to permit complex structure and reproduce the data pixel by pixel. Pixel-based source reconstruction (PBSR) algorithms take full advantage of the information in the lensed images, but the large numbers of constraints (image pixels) and free parameters (source pixels) demand advanced techniques and more computational effort.

The history of extended image lens modeling is rich. Early implementations of PBSR algorithms (e.g., Wallington, Kochanek & Narayan 1996; Koopmans 2005) used a two-loop method in which an outer loop varied the lens model parameters, while an inner loop varied source parameters to find the best fit given a lens model. The lens was described parametrically, typically using standard galaxy and halo mass profiles. The source, by contrast, was constructed on a Cartesian grid, and a penalty function was used to disfavor source models that seemed too unphysical.
Varying all of the source pixels independently was a costly step. Warren & Dye (2003) simplified the inner loop by showing that the lensing equation can be written as a matrix equation, allowing the optimal source to be found in a single, analytic step (see §). Dye & Warren (2005) and Vegetti & Koopmans (2009) introduced irregular source grids while keeping the inner loop linear (see §).

The CLEAN algorithm (Högbom 1974) fits the “dirty” map of observed surface brightnesses with point sources. It finds the brightest region in the map and subtracts a point source convolved with the instrumental beam, and then iterates until a stopping criterion is met. LensClean (Kochanek & Narayan 1992) adds a step in which the point source is gravitationally lensed before the images are subtracted, allowing the lens model and source to be fit simultaneously.

In this paper, we present a new software package called pixsrc that performs PBSR in conjunction with the established lensmodel software (Keeton 2001) for exploring the lens model parameter space. We present the methodology behind pixsrc and then discuss issues that arise during the lens modeling process. In particular, we investigate statistical uncertainties and systematic biases inherent in PBSR methods by analysing representative galaxy-galaxy strong lensing events. We examine the effects of noise and telescope pointing on the lens model analysis, as well as the effects of different choices of gridding and priors.
For a given dataset, there may be lens models that fit the data well but require a source that seems unrealistic. (A model with no mass can fit the data perfectly if the source looks exactly like the image.) There may also be models for which the source fits the noise in addition to the lensed data. Using Bayesian inference, priors can be used to reject models that are unphysical or overfit the noise. This section reviews the formal framework for PBSR, which has been discussed in detail by Suyu et al. (2006) and Vegetti & Koopmans (2009). We reproduce only the key aspects here.
In the absence of dust or other attenuation, lensing conserves surface brightness. The mapping between the source plane and image plane can therefore be written as
\[
\mathbf{d} = \mathbf{L}\mathbf{s} + \mathbf{n}, \tag{1}
\]
where L is a linear “lensing operator” that acts on surface brightness values. This operator can encode not only the gravitational deflections of the lens but also effects from the atmosphere and telescope. For example, if G characterises the lens while B is a “blurring operator” that characterises the point spread function (PSF) of the observations, we can define L ≡ BG to capture the combined effects. Here s and d are vectors containing the surface brightness values in the source plane and image plane, respectively, and n is the noise present in the data. If the source and data are two-dimensional images with surface brightness values specified on a Cartesian grid, the one-dimensional vectors s and d can be constructed by column- or row-stacking the two-dimensional images. If the source grid is irregular, the structure of s can be more complicated, but the formal framework still applies. For reference, we note that the numbers of pixels in the source and image plane maps are N_s and N_d, respectively.

If the noise is Gaussian, we can write the likelihood of observing data d given a lensing operator L and source s as
\[
P(\mathbf{d}\,|\,\mathbf{L},\mathbf{s}) \propto \exp\left(-E_d(\mathbf{d}\,|\,\mathbf{L},\mathbf{s})\right), \tag{2}
\]
where
\[
E_d(\mathbf{d}\,|\,\mathbf{L},\mathbf{s}) = \tfrac{1}{2}\chi^2(\mathbf{s}) = \tfrac{1}{2}\left(\mathbf{L}\mathbf{s}-\mathbf{d}\right)^{\top}\mathbf{C}_d^{-1}\left(\mathbf{L}\mathbf{s}-\mathbf{d}\right), \tag{3}
\]
and C_d is the symmetric noise covariance matrix, which contains pixel-to-pixel noise correlations. For the case of uniform, pixel-independent noise, C_d is diagonal with entries equal to σ², where σ is the standard deviation of the noise.

Suyu et al. (2006) define the most likely solution, s_ml, as the source model that maximises the likelihood and thus minimises E_d. Setting ∇E_d(s) = 0, we find that s_ml satisfies
\[
\mathbf{F}\mathbf{s} = \mathbf{f}, \tag{4}
\]
where F = L^T C_d^{-1} L and f = L^T C_d^{-1} d.
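As a toy illustration of Eqs. 1–3, the sketch below builds L = BG, generates mock data, and evaluates χ². The operator sizes and structure are invented for illustration; they are crude stand-ins for ray-tracing and PSF convolution, not the actual pixsrc machinery.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes: 6 source pixels, 12 image pixels.
n_s, n_d = 6, 12

# G: crude stand-in for the lensing operator (each image pixel
# reads the surface brightness of one source pixel).
G = np.zeros((n_d, n_s))
G[np.arange(n_d), rng.integers(0, n_s, size=n_d)] = 1.0

# B: crude stand-in for PSF blurring (mix each pixel with a neighbour).
B = np.eye(n_d)
for i in range(n_d - 1):
    B[i, i], B[i, i + 1] = 0.8, 0.2

L = B @ G                              # combined operator, L = BG
s_true = rng.uniform(1.0, 5.0, n_s)    # source-plane surface brightness

sigma = 0.05                           # uniform pixel noise rms
n = rng.normal(0.0, sigma, n_d)
d = L @ s_true + n                     # Eq. 1: d = Ls + n

# Eq. 3 with C_d = sigma^2 I; evaluated at the true source,
# chi^2 reduces to the summed squared noise in units of sigma.
C_inv = np.eye(n_d) / sigma**2
r = L @ s_true - d
chi2 = r @ C_inv @ r
```

Evaluated at the true source, the residual is pure noise, so χ² here is simply the sum of (n/σ)² over image pixels.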
Because F is square and invertible by construction, s_ml is given by
\[
\mathbf{s}_{\rm ml} = \mathbf{F}^{-1}\mathbf{f}. \tag{5}
\]
If the left-inverse, L^-1, of L exists, then Eq. 5 reduces to what one might naïvely expect:
\[
\mathbf{s}_{\rm ml} = \mathbf{L}^{-1}\mathbf{d}. \tag{6}
\]
(We adopt the following conventions: one-dimensional vectors are denoted by bold lower-case letters, two-dimensional matrices are denoted by bold capital letters, and scalars are unbolded. F^-1 will fail to exist if there are source pixels that cannot be constrained by the image pixels, i.e., there are too many source pixels overall, or source pixels that do not map to regions of the image plane with useful data, or if there are image pixels that lack corresponding source pixels. Those situations can generally be avoided with reasonable choices of grids. It is conceivable that certain grid configurations could also create problems for F^-1, but those should be rare. L^-1 will exist and be unique if L is square and non-singular. If L is rectangular, L^-1 will exist if there are more image pixels than source pixels and L has full column rank. These conditions may not be satisfied if two or more image pixels map to the same point (within machine precision), or if other similar coincidences occur.)

Unfortunately, s_ml will fit the noise in the data in addition to the lensed images. There are several different ways to avoid such overfitting. Maximum entropy methods (MEMs; Wallington, Narayan & Kochanek 1994) favor sources whose pixel values follow broad distributions expected from information theory, as opposed to sources with some pixels that are very different from the rest. MEMs also prohibit negative surface brightness values. They do not constrain surface brightness variations between adjacent pixels, however, and can lead to large fluctuations over small scales. To favor sources that are smooth, we might introduce a function that penalises large values of the first or second derivative (the particular choice depends on the data and underlying source; see Brewer & Lewis 2006; Suyu et al. 2006). If the penalty function is quadratic in the source surface brightness, the source that maximises the likelihood while minimising the penalty is still given by a linear equation.

Suppose, for example, that we want to introduce a function E_s(s) that penalises large surface brightness gradients. We can define a derivative operator H that acts on a source vector s to produce a vector Hs containing the gradient of the surface brightness at each pixel. Then we put
\[
E_s(\mathbf{s}) = \tfrac{1}{2}\left(\mathbf{H}\mathbf{s}\right)^{\top}\mathbf{H}\mathbf{s} = \tfrac{1}{2}\mathbf{s}^{\top}\left(\mathbf{H}^{\top}\mathbf{H}\right)\mathbf{s} = \tfrac{1}{2}\mathbf{s}^{\top}\mathbf{R}\mathbf{s}, \tag{7}
\]
where R ≡ H^T H. In other words, when sandwiched between two source vectors, R returns the square of the gradient summed over source pixels. A similar construction can return the sum of the squares of the curvature (see §). The posterior for the source is then
\[
P(\mathbf{s}\,|\,\mathbf{L},\mathbf{R},\mathbf{d},\lambda) \propto \exp\left(-M(\mathbf{s})\right), \tag{8}
\]
where
\[
M(\mathbf{s}) \equiv E_d(\mathbf{s}) + \lambda E_s(\mathbf{s}), \tag{9}
\]
and λ is a dimensionless parameter that determines which term in Eq. 9 dominates. When the “regularisation strength” λ is small, the Bayesian framework will primarily fit the data; when λ is large, the framework will enforce strong priors on the source.

Suyu et al. (2006) define the most probable solution s_mp as the source model that maximises the posterior and thus minimises M(s). To find this model, we Taylor expand E_d to second order about its minimum,
\[
E_d(\mathbf{s}) = E_d(\mathbf{s}_{\rm ml}) + \tfrac{1}{2}\left(\mathbf{s}-\mathbf{s}_{\rm ml}\right)^{\top}\mathbf{F}\left(\mathbf{s}-\mathbf{s}_{\rm ml}\right). \tag{10}
\]
(The matrix F that appears here is the Hessian of E_d, but from Eq. 3 this is the same as F defined in Eq. 4.) Setting ∇M(s) = 0, we find that s_mp satisfies
\[
\mathbf{A}\mathbf{s} = \mathbf{F}\mathbf{s}_{\rm ml}, \tag{11}
\]
where A = F + λR is the Hessian of M from Eq. 9. Because A is square and invertible by construction, s_mp is given by
\[
\mathbf{s}_{\rm mp} = \mathbf{A}^{-1}\mathbf{F}\mathbf{s}_{\rm ml} = \mathbf{A}^{-1}\mathbf{f}. \tag{12}
\]
It remains to determine the regularisation strength λ seen in Eq. 9.
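To make the chain from Eq. 4 to Eq. 12 concrete, the following numpy sketch (the toy operator, sizes, and λ value are invented for illustration) computes the most likely source s_ml and the gradient-regularised most probable source s_mp:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy problem: 8 source pixels, 16 image pixels; each image
# pixel averages two adjacent source pixels (stand-in for lensing + PSF).
n_s, n_d = 8, 16
L = np.zeros((n_d, n_s))
for i in range(n_d):
    j = i % n_s
    L[i, j] += 0.5
    L[i, min(j + 1, n_s - 1)] += 0.5

s_true = np.sin(np.linspace(0.0, np.pi, n_s))       # smooth source
sigma = 0.05
d = L @ s_true + rng.normal(0.0, sigma, n_d)

C_inv = np.eye(n_d) / sigma**2
F = L.T @ C_inv @ L                                 # Eq. 4
f = L.T @ C_inv @ d

s_ml = np.linalg.solve(F, f)                        # Eq. 5: fits noise too

H = np.diff(np.eye(n_s), axis=0)                    # first-difference operator
R = H.T @ H                                         # Eq. 7: R = H^T H

lam = 10.0                                          # regularisation strength
A = F + lam * R                                     # Hessian of M (Eq. 11)
s_mp = np.linalg.solve(A, f)                        # Eq. 12

# Achieved roughness penalty for each solution.
rough_ml = float((H @ s_ml) @ (H @ s_ml))
rough_mp = float((H @ s_mp) @ (H @ s_mp))
```

Because s_mp minimises E_d + λE_s while s_ml minimises E_d alone, the achieved penalty satisfies E_s(s_mp) ≤ E_s(s_ml) for any λ > 0; the last two lines let one check this.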
In the Bayesian framework, the optimal value of λ is found by maximising (see Suyu et al. 2006 for a full discussion)
\[
P(\lambda\,|\,\mathbf{d},\mathbf{L},\mathbf{R}) \propto P(\mathbf{d}\,|\,\mathbf{L},\lambda,\mathbf{R})\,P(\lambda). \tag{13}
\]
We assume a uniform logarithmic prior, P(λ) ∝ λ^-1, because we do not know the scale of λ a priori. The optimal regularisation strength, λ̂, can then be found numerically.

Formally, s_mp is a biased estimator of the true source surface brightness s_true. Suyu et al. (2006) show that averaging over many realisations yields
\[
\langle \mathbf{s}_{\rm mp} \rangle = \mathbf{A}^{-1}\mathbf{F}\,\mathbf{s}_{\rm true}, \tag{14}
\]
which differs from s_true to the extent that A^-1 = (F + λR)^-1 differs from F^-1. The simulations presented in § let us assess the practical impact of this bias.

Once we solve for the source at a fixed lens model, we must rank different models by evaluating the posterior probability
\[
P(\mathbf{L},\mathbf{R}\,|\,\mathbf{d}) \propto P(\mathbf{d}\,|\,\mathbf{L},\mathbf{R}) \times \text{priors on } \mathbf{L} \text{ and } \mathbf{R}. \tag{15}
\]
If the priors on the lens models and regularisation scheme are flat, then we can just evaluate the Bayesian evidence P(d | L, R) = ∫ P(d | L, λ, R) P(λ) dλ. (Strictly speaking, this is not the full evidence because the lens model parameters are not marginalised, but the terminology is standard.) Suyu et al. (2006) suggest that the distribution for λ can be expected to have a sharp peak, so instead of computing the full integral we can just evaluate the integrand at its peak.

Examining Eqs. 2 and 15, we can infer that
\[
-2\ln E = \chi^2 + V, \tag{16}
\]
where E is shorthand for the evidence and V is a constant that depends on the available prior volume of the parameter space. Thus, we will use χ² and −2 ln E interchangeably. For a more detailed discussion of the connection between evidence and χ², see Jenkins & Peacock (2011).

(The Hessian of a function is a matrix that contains the second-order partial derivatives of the function. In this case, the derivatives are taken with respect to the source vector. For example, the (i, j) entry of F would hold the second-order derivative of E_d with respect to the i-th and j-th source pixels.)
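One way to carry out the maximisation in Eq. 13 numerically is with the closed-form evidence of a Gaussian linear model (the form used by Suyu et al. 2006, written here up to additive constants). The sketch below is ours: the toy operator is invented, and a tiny ridge is added to R so that the Gaussian prior is proper.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical toy linear model (same style as the earlier sketches).
n_s, n_d = 8, 16
L = np.zeros((n_d, n_s))
for i in range(n_d):
    j = i % n_s
    L[i, j] += 0.5
    L[i, min(j + 1, n_s - 1)] += 0.5

s_true = np.sin(np.linspace(0.0, np.pi, n_s))
sigma = 0.05
d = L @ s_true + rng.normal(0.0, sigma, n_d)

C_inv = np.eye(n_d) / sigma**2
F = L.T @ C_inv @ L
f = L.T @ C_inv @ d

H = np.diff(np.eye(n_s), axis=0)
R = H.T @ H + 1e-8 * np.eye(n_s)   # tiny ridge keeps the prior proper

def log_evidence(lam):
    """log P(d | L, lam, R) for a Gaussian linear model."""
    A = F + lam * R
    s_mp = np.linalg.solve(A, f)
    r = L @ s_mp - d
    M = 0.5 * r @ C_inv @ r + 0.5 * lam * s_mp @ R @ s_mp   # Eq. 9 at s_mp
    return (-M
            + 0.5 * np.linalg.slogdet(lam * R)[1]   # prior normalisation
            - 0.5 * np.linalg.slogdet(A)[1]         # Occam (volume) factor
            + 0.5 * n_d * np.log(C_inv[0, 0])
            - 0.5 * n_d * np.log(2.0 * np.pi))

# Uniform logarithmic prior P(lam) ∝ 1/lam (Eq. 13): maximise on a log grid.
lams = np.logspace(-4.0, 6.0, 201)
log_post = np.array([log_evidence(l) - np.log(l) for l in lams])
lam_hat = lams[np.argmax(log_post)]
```

A useful consistency check is the identity M(s_mp) = ½ d^T C^-1 d − ½ f^T A^-1 f, which follows from completing the square in Eq. 9.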
Figure 1.
Test data in four canonical configurations. Clockwise from top left: cusp, fold, cross, and 2-image lens configurations. For each panel, the diamond-shaped curve (the caustic) and object (source galaxy) inside the red square are in the source plane, while the elliptical curve (the critical curve) and the other features (the arcs) outside the red square are in the image plane. The lensing galaxy used to create the data is the same in all cases: an SIE with Einstein radius of 3″, ellipticity of 0.3, and position angle of the semi-major axis 60° east of north. The colour scale is linear and identical in all panels. The source galaxies used to create the data share the same size and luminosity profile; only the positions and orientations differ.

To explore possible uncertainties and biases in PBSR algorithms, we construct test data using a simple but realistic lens and source. The lens is a singular isothermal ellipsoid (SIE), which is a popular choice for modeling elliptical galaxies. Although the dark and luminous mass profiles are not simple power laws individually, the total density profile appears to be close to isothermal (Kronawitter et al. 2000; Koopmans et al. 2009; Treu 2010). The SIE is placed at the origin and fixed with an Einstein radius of 3″, ellipticity of 0.3, and position angle of 60° east of north. The source luminosity profile is an elliptical Gaussian with a half-light radius of 0.″ and peak surface brightness of 5 (in arbitrary units). The position and orientation of the source are varied to create four canonical lens configurations that let us assess whether uncertainties in PBSR algorithms are sensitive to the image morphology. Fig. 1 shows a 2-image configuration along with three configurations that nominally have four images: a source near the center of the caustic produces a “cross” configuration with four distinct images; a source just inside the caustic curve produces two short arcs and a long arc from two merging images (a “fold” configuration); and a source inside a cusp in the caustic produces one long arc from three merging images along with an isolated image on the other side (a “cusp” configuration). In the following sections we vary the amount of noise in the mock data (Fig. 2) and the resolution (i.e., the pixel scale; Fig. 3).

Figure 2.
Test data for the cusp configuration shown with varying noise levels. Peak S/N clockwise from top left: 1, 10, 25, 500. The pixel scale is 0.1 arcsec/pixel. The colour scale is linear and consistent except for the S/N = 1 case.
Figure 3.
Test data for the cusp configuration shown with varying pixel scales. Only a subsection of the long arc is shown so that the differences in resolution are visible. Clockwise from top left: 0.1, 0.05, 0.03, and 0.02 arcsec/pixel.
Some of the practical challenges in PBSR are inherent to the algorithm itself. We have already mentioned the need for regularisation. Dealing with gridded data makes some degree of interpolation unavoidable. Also, different parts of the image plane probe different spatial scales in the source plane, depending on the lensing magnification. Using an adaptive source plane grid helps take full advantage of the information contained in a lensed image, but leads to challenges with interpolating and calculating derivatives on an irregular grid. In this section we examine how these issues affect the source reconstruction and lens model ranking.

In PBSR, the image and source grids do not have to be the same. Image pixels have definite dimensions set by the instrument and data processing. But source “pixels” are more general; they refer loosely to positions (and small regions around them) where one chooses to reconstruct the surface brightness of the source. The shape and density of source pixels are arbitrary, and they can vary across the source plane. The source pixel density directly limits the resolution of the reconstruction.

If source pixels outnumber image pixels, the reconstruction problem will be underconstrained. The regularisation strength will be driven to high values, effectively decreasing the number of independent source pixels. In each of the gridding schemes discussed below, the grid is constructed so the number of source pixels is approximately half the number of image pixels.

The size and shape of the grid can be limited to specific regions on the sky. Using all image pixels may be computationally expensive, and it can make the regularisation less effective (because most of the source pixels would just contain noise). Therefore it may be useful to construct masks around regions that contain lensed images. Wayth et al. (2005) comment on the importance of careful pixel masking, because pixels that do not contain flux can be as important as those that do. If a model fits the observed surface brightness but also puts flux where no light is observed, the model should be penalised, but overly aggressive masking might cause the faulty pixels to be ignored. As a precaution, pixsrc can find and include all pixels that are “sisters” to the pixels in the masked region(s). (Heuristically, image pixels are sisters if they come from the same source pixel.) Doing so requires some care because the number of image pixels that get used can vary with the lens model.

We describe three different schemes for gridding the source plane: one Cartesian and two adaptive. Fig. 4 shows examples of the two adaptive grids. We compare the performance of the two adaptive gridding schemes in §.

We begin with a simple Cartesian grid. The pixel density and resolution in the source plane are uniform. The grid dimensions and pixel scale can be set manually or chosen to achieve N_s ≈ N_d/2, as this seems to adequately reconstruct the source without being underconstrained. Benefits of the Cartesian grid lie in its simplicity: the grid, lensing operator, and regularisation operator are easily and quickly constructed. However, the uniform resolution means that small scales cannot be probed without incurring a large number of source pixels and a correspondingly large regularisation strength. (Strong regularisation introduces correlations between nearby pixels, smoothing the surface brightness and decreasing the effective resolution in the source plane.)
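With a square source-plane field of view, the N_s ≈ N_d/2 rule fixes the Cartesian pixel scale directly; a trivial sketch with invented numbers:

```python
import math

# Hypothetical inputs: n_d image pixels inside the mask, and a square
# source-plane field of view, fov arcsec on a side.
n_d = 5000
fov = 2.0                                 # arcsec

n_s_target = n_d // 2                     # aim for N_s ≈ N_d / 2
side = round(math.sqrt(n_s_target))       # source pixels per side
pixel_scale = fov / side                  # arcsec per source pixel
n_s = side * side
```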
Figure 4.
Triangulation of a fully adaptive grid (top) and an adaptive Cartesian grid (bottom). The lens model used is specified in §: an SIE with Einstein radius 3″, ellipticity 0.3, and position angle 60° east of north.

Vegetti & Koopmans (2009) introduced a gridding scheme in which some of the image plane pixels are mapped to the source plane and used to construct the source grid (see Fig. 5). By default we choose to use every other pixel to construct the grid, which helps to ensure that N_s ≈ N_d/2 (see §).
Figure 5.
A demonstration of the fully adaptive grid construction. The left and right panels show the image and source grids, respectively. Filled circles in the image plane are mapped to the source plane, and a Delaunay triangulation (Shewchuk 1996) is used to construct the source grid. Open circles in the image plane are then mapped to the source plane and set to values interpolated from the surrounding source pixels. Each filled circle has a row in the unblurred lensing operator with a single entry of 1, while each open circle has a row with three non-negative entries that sum to 1. This figure is inspired by Fig. 1 in Vegetti & Koopmans (2009).
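The three non-negative entries in an interpolated pixel's row are barycentric weights within its Delaunay triangle. A minimal numpy sketch (the helper name is ours):

```python
import numpy as np

def barycentric_weights(tri, p):
    """Weights (w0, w1, w2), summing to 1, such that p = w0*v0 + w1*v1 + w2*v2.

    tri: (3, 2) array of triangle vertices (source pixels) in the source plane.
    p:   (2,) position of a ray-traced "open circle" image pixel.
    """
    v0, v1, v2 = tri
    T = np.column_stack([v1 - v0, v2 - v0])   # 2x2 edge matrix
    w1, w2 = np.linalg.solve(T, p - v0)
    return np.array([1.0 - w1 - w2, w1, w2])

# A "filled circle" pixel sits on a vertex and would get a single weight of 1;
# an interior point gets three weights that interpolate linearly.
tri = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
w = barycentric_weights(tri, np.array([0.25, 0.25]))   # -> [0.5, 0.25, 0.25]
```

Because the weights reproduce any linear surface brightness distribution exactly, this construction is what makes the interpolation "treat the source as a collection of small planes."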
The adaptive Cartesian grid builds from the Cartesian grid. An initial two-dimensional grid is refined, adding or removing pixels, so the pixel density varies according to some criterion. Such adaptive mesh refinement algorithms have been used in many fields of research, including star formation modeling, radiative transfer codes, and magnetohydrodynamic simulations. For PBSR, we implement an adaptive Cartesian grid similar to that used by Dye & Warren (2005), which is designed to place more source pixels in regions of higher magnification. We first give a heuristic description of the gridding scheme, and then provide more details.

An initial, zeroth level grid is constructed as a box just large enough to contain all of the ray-traced image pixels, with five grid points (at the corners and center). A zeroth level magnification, μ_0, is ascribed to this grid. Then each quadrant is examined, and if the magnification in this quadrant, μ_1, is larger than four times the magnification of the parent grid (μ_1 ⩾ 4μ_0), the quadrant is split into a (first level) subgrid, itself consisting of four quadrants. The factor of four here is necessary because as we add a subgrid, we split a quadrant into four more quadrants, increasing the spatial resolution by a factor of four (in area). Then, for each of these first level quadrants, we add a second level of subgridding if μ_2 ⩾ 4μ_1 = 16μ_0. This process is repeated for every quadrant and subquadrant.

In practice, it would be computationally expensive to examine every quadrant and subquadrant, and it would be undesirable to do so since many source pixels would be unused in the lensing operator.
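The splitting criterion above (subdivide to level k while μ ⩾ 4^k μ_0) reduces to a short level calculation. In this sketch the function name, epsilon guard, and level cap are our own assumptions:

```python
import math

def subgrid_level(mu, mu0, max_level=10):
    """Deepest refinement level k satisfying mu >= 4**k * mu0.

    A small epsilon guards against floating-point log round-off;
    max_level caps the recursion depth (an assumed safety limit).
    """
    if mu <= mu0:
        return 0
    level = int(math.log(mu / mu0, 4) + 1e-9)
    return min(level, max_level)
```

For example, a local magnification 16 times μ_0 corresponds to two factor-of-four splits, so the pixel lands on a second-level subgrid.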
Instead, every image pixel is ray-traced back to the source plane, the local magnification at that location is computed, the appropriate level of subgridding is determined based on the ratio of the local magnification to the zeroth level magnification, and only the minimum number of source pixels (three or fewer) needed are created.

It still remains to determine μ_0. Because the size of the zeroth level grid is arbitrary, μ_0 is also arbitrary. This freedom is what Dye & Warren (2005) encapsulate in their “splitting factor.” We note that Dye & Warren (2005) allow their splitting factor to vary in the source reconstruction. We have not explored this additional freedom. Instead, we fix μ_0 so that N_s ≈ N_d/2. In §, we find that the discreteness noise in the χ² surface is larger using the adaptive Cartesian grid. The higher noise is thought to be due to the discrete change in magnification required to trigger the subgridding (however, see §).

The surface brightness of an image pixel is calculated by ray-tracing the pixel to the source plane and linearly interpolating over up to three adjacent source pixels. Such interpolation amounts to treating the source as a collection of small planes, which may or may not provide an accurate approximation to the true surface brightness distribution (depending on the pixel scale). Errors from the interpolation can be important if they are large compared with the random noise in the data.

As an illustration, Fig. 6 shows data, model, and residuals for a high-quality image of a source in the cusp configuration. The peak S/N is 500, and the image resolution is 0.03 arcsec/pixel. The lens model was fixed at the correct model, and the fully adaptive grid was constructed as usual, but the source surface brightness was fixed at the known value for the cusp source (rather than being reconstructed). By visual inspection, the data and model seem to agree well, but the residuals show clear structure. Also, the χ² value is 13,236, which corresponds to a reduced χ² of 1.60.
This is troubling, since the lens and source models were fixed at the true values. The residuals, and hence the large χ² value, arise from interpolation errors. To see this, Fig. 6 shows the difference between an image constructed directly from the analytic source and an image constructed from the interpolated version. The structure of the interpolation errors clearly explains the structure of the model residuals.

We need to find a way to account for these errors. Strictly speaking, we would have to know the true surface brightness of the source in order to determine interpolation errors in the first place. As an approximation, we fit an analytic model (comprising one or more Sérsic profiles) to the pixelated source. (If the fit to the pixelated source is poor, we do not account for interpolation errors.) We use this analytic model to compute a map of interpolation errors, as shown in the top right panel of Fig. 6. (The error map can be blurred by the PSF as needed.) We then modify the noise covariance matrix (C_d in Eq. 3) with the substitution
\[
\mathbf{C}_d \rightarrow \mathbf{C}_d + \mathbf{C}_{\rm interp}. \tag{17}
\]
We make C_interp a diagonal matrix containing the squares of the interpolation errors (which omits any correlations in errors among pixels but is a simple and effective approach). This modification lowers the χ² value for the case shown in Fig. 6 to 8275, which corresponds to a reduced χ² of 0.999. Accounting for interpolation errors in this way is a conservative practice, as the effect is to broaden the χ² surface.

Figure 6.
Visualisation of interpolation errors for a source in the cusp configuration with a peak S/N of 500 and a resolution of 0.03 arcsec/pixel. From top left, clockwise: data, interpolation errors, model residuals, model. The lens and source models were fixed at their correct values, but there are significant residuals with the same structure as the interpolation errors. Accounting for interpolation errors lowers the χ² from 13,236 to 8275, corresponding to a change in reduced χ² from 1.60 to 0.999.

As an example, Fig. 7 shows a one-dimensional cut of the Bayesian evidence for a cusp configuration with a pixel scale of 0.05 arcsec/pixel and a peak S/N of 100. All lens model parameters except the Einstein radius are fixed at their true values. The curves show the Bayesian evidence as a function of R_E for two forms of regularisation (discussed in §). The χ² curve becomes shallower when interpolation errors are addressed, reflecting a larger uncertainty in the Einstein radius.

The scale of interpolation errors depends on the lens configuration and image resolution. Fig. 8 shows the minimum and maximum interpolation errors as a function of the pixel scale for all four lens configurations, using both the fully adaptive and adaptive Cartesian grids. The lens model and source brightness are again fixed at their true values. As the image resolution improves, the interpolation errors decrease. The doubly-imaged source is not as highly magnified as the other cases, so the effective resolution in the source plane is lower and the interpolation errors are larger (reaching about 20% of the peak flux). The quad configurations show interpolation errors up to about 7%. We return to these errors in §.

Figure 7.
Effect of interpolation errors, when the errors are comparable to the noise level. The test data have a source in the cusp configuration with a peak S/N of 100 and a resolution of 0.05 arcsec/pixel. Red, solid lines and blue, dashed lines correspond to curvature regularisation and ASR, respectively (see §). For clarity of the χ² curves, vertical offsets have been applied, but differences between same-colour curves are meaningful. Qualitatively, we see that accounting for interpolation errors broadens the χ² surface. Quantitatively, the ranges of χ² change by factors of 1.8 and 1.2 for curvature regularisation and ASR, respectively.

Figure 8.
Minimum and maximum interpolation errors are shown as a function of the pixel scale for both gridding schemes. (The pixel scale is quoted for the image plane because that is the quantity known from data, but bear in mind that interpolation occurs in the source plane.) From top left, clockwise: cusp, fold, cross, and 2-image configurations. The lens and source models were fixed at their correct values. The percent error in any given image pixel is calculated by taking the ratio of the interpolation error in that pixel to the peak signal in the image. The doubly-imaged configuration shows the largest errors, because the magnification is lower and hence the number of source pixels that cover the source is smaller.
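When both C_d and C_interp in Eq. 17 are diagonal, the modification is just a per-pixel variance update. A sketch with invented per-pixel values:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical per-pixel quantities for n_d image pixels.
n_d = 100
residual = rng.normal(0.0, 0.05, n_d)            # model minus data
sigma_noise = np.full(n_d, 0.05)                 # random-noise rms per pixel
interp_err = np.abs(rng.normal(0.0, 0.03, n_d))  # interpolation-error map

# Diagonal covariances: C_d -> C_d + C_interp (Eq. 17).
var_total = sigma_noise**2 + interp_err**2

chi2_naive = np.sum(residual**2 / sigma_noise**2)
chi2_corr = np.sum(residual**2 / var_total)      # broadened chi^2
```

Inflating the variances can only lower χ², which is why this correction is conservative: it broadens the χ² surface rather than sharpening it.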
Figure 9.
Comparison of FDM and DTM on both adaptive grids. Top left: fixed source model (an elliptical Gaussian with ellipticity 0.3), placed in the cusp configuration. Top right: exact magnitude of the gradient of the source. The remaining panels show the magnitude of the gradient computed with various grids and derivative schemes. The middle row corresponds to the fully adaptive (FA) grid, while the bottom row corresponds to the adaptive Cartesian (AC) grid. The left column corresponds to FDM, while the right column corresponds to DTM. The colour scale is linear. Only relative changes within a panel are important, because multiplicative constants can be absorbed into the regularisation strength. The spurious peaks sometimes seen when using the FDM are likely due to unfortuitous alignment of “virtual pixels” with the pixel at which the derivative is evaluated. For a more detailed discussion, see §.

We regularise directly on the adaptive grid, since lensing effectively gives higher resolution in regions that are more highly magnified (see Vegetti & Koopmans 2009 for more discussion). This choice makes the regularisation sensitive to the lens model through the density of source pixels, so in principle it might introduce model-dependent biases into the regularisation. The simulations presented in § let us test for such biases.

Using Taylor's theorem, we can calculate derivatives on a grid using the finite difference method (FDM). For a simple Cartesian grid, the gradient at a particular pixel m can be approximated by taking directional finite differences of the surface brightness along the grid axes at the pixel m.

Figure 10.

Diagram illustrating the derivative calculation using the FDM on an irregular grid. The blue point indicates the pixel m where we seek to compute the derivative. The grid is the same as that shown in Fig. 5. The black, solid lines form a quadrilateral Q connecting the surrounding pixels (labeled n_p, where p = {1, 2, 3, 4}). The points where Q intersects the horizontal and vertical lines through m are called virtual pixels (labeled v_p, where p = {1, 2, 3, 4}). The flux at each virtual pixel is a linear combination of the fluxes at the two surrounding pixels that are colinear with that virtual pixel. The virtual pixels are used to compute the derivative at m.

This gradient can be written as
\[
\vec g\,[m] = \frac{1}{2}\sum_n \left(s[n]-s[m]\right)\frac{\vec r\,[n]-\vec r\,[m]}{\left|\vec r\,[n]-\vec r\,[m]\right|^2} \propto \sum_n \left(s[n]-s[m]\right)\hat r_{nm}, \tag{18}
\]
where g⃗ is a vector containing the derivatives at each source pixel, r⃗ is a vector containing the position vectors of each source pixel, r̂_nm is a unit vector pointing from n to m, and the sums are over the four nearest pixels. The last proportionality holds because, for a Cartesian grid, the distances between adjacent pixels are identical and can be absorbed into the regularisation strength. To approximate the second derivative across the source plane, we write down the Laplacian as
\[
h[m] = \vec\nabla\cdot\vec g\,[m] \propto \vec\nabla\cdot\sum_n \left(s[n]-s[m]\right)\hat r_{nm} \propto \left(\sum_n s[n]\right) - N\,s[m], \tag{19}
\]
where h is a vector containing the second derivative at each source pixel and the sum is again over the N = 4 nearest pixels. In both cases, if the pixel is not on the edge of the grid then the sums include the four pixels to the immediate left, right, top, and bottom of the pixel in question. If the pixel is on the edge, the “missing” pixels are assumed to contain zero flux.
This effectively assumes the surface brightness outside the source grid is zero, which can lead to ineffective regularisation if the grid is small enough that the edges are close to the region of interest. Suyu et al. (2006) note that the derivative calculations can be modified to avoid assuming zero surface brightness outside the grid, but that can lead to problems for ranking lens models. (As Suyu et al. (2006) explain, dropping the zero surface brightness assumption causes the Hessian of R to become singular. The singularity can be removed by introducing a renormalisation constant, but the constant will vary with the lens model, complicating the model comparison.)

The preceding discussion can be extended to adaptive grids, although some care is needed because there may be more than four pixels nearby and it may not be immediately obvious which ones should be used. Vegetti & Koopmans (2009) compute the derivative for a particular pixel m using the triangles that surround m in the Delaunay triangulation of the grid. In pixsrc, we instead identify four pixels (hereafter referred to as surrounding pixels) as follows. Transforming to a coordinate system centered on m, we select the surrounding pixels so that each pixel lies in a different quadrant, each pixel is near m, and the quadrilateral Q formed by the pixels deviates the least from a square. We use the surrounding pixels to calculate the surface brightness at the intersections of the x and y axes with Q, which we refer to as virtual pixels. (A schematic diagram of the surrounding and virtual pixels is shown in Fig. 10.) From the surface brightnesses at the virtual pixels, we can compute the derivatives using Eq. 18 and a modified version of Eq. 19.
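The flux at a virtual pixel is a linear combination of the fluxes at the two surrounding pixels colinear with it. A minimal sketch of that interpolation (illustrative Python; the helper name, coordinate convention, and test values are ours, not pixsrc's):

```python
import numpy as np

def virtual_pixel_flux(p1, f1, p2, f2, axis=0):
    """Flux at the 'virtual pixel' where the segment between two
    surrounding pixels p1 and p2 crosses a coordinate axis through the
    pixel m (taken here as the origin).  The flux is a linear
    interpolation of the fluxes f1 and f2 at the endpoints."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    other = 1 - axis
    # Parameter t in [0, 1] where the segment crosses coordinate `other` = 0.
    t = p1[other] / (p1[other] - p2[other])
    position = p1 + t * (p2 - p1)       # location of the virtual pixel
    flux = (1.0 - t) * f1 + t * f2      # linearly interpolated flux
    return position, flux

# The segment from (1, 1) to (1, -1) crosses the x axis halfway along,
# at (1, 0), so the virtual-pixel flux is the average of the endpoints.
pos, f = virtual_pixel_flux((1.0, 1.0), 2.0, (1.0, -1.0), 4.0, axis=0)
```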
After some algebra, the first derivative at m can be expressed as
\[ \vec{g}[m] = \sum_{n=1}^{4} \left( \frac{D[v_{n-1}, s_{n-1}]}{D[v_{n-1}, m]\, D[s_{n-1}, s_n]}\, \hat{r}_{v_{n-1}} + \frac{D[v_n, s_{n+1}]}{D[v_n, m]\, D[s_{n+1}, s_n]}\, \hat{r}_{v_n} \right) s[s_n] - \left( \sum_{n=1}^{4} \frac{\hat{r}_{v_n}}{D[m, v_n]} \right) s[m], \tag{20} \]
and the second derivative at m is given by
\[ h[m] = \sum_{n=1}^{4} \left( \frac{D[v_{n-1}, s_{n-1}]}{D[v_{n-1}, m]\, D[s_{n-1}, s_n]} + \frac{D[v_n, s_{n+1}]}{D[v_n, m]\, D[s_{n+1}, s_n]} \right) s[s_n] - \left( \sum_{n=1}^{4} \frac{1}{D[m, v_n]} \right) s[m], \tag{21} \]
where the sums run from n = 1 to n = 4, D[r, s] is a functional that returns the distance between points r and s, v_p is the p-th virtual pixel, and \(\hat{r}_{v_p}\) is a unit vector pointing toward the p-th virtual pixel. For simplicity of notation, we let the indices wrap around (e.g., v_0 = v_4 and v_5 = v_1).

Fig. 9 suggests that derivatives calculated with the FDM can be inaccurate. Certain configurations of points on the fully adaptive grid can cause virtual pixels to lie very close to m, leading to an anomalously high estimate for the derivative. Such events are rare and do not have a dramatic effect on the source reconstruction. The adaptive Cartesian grid is less susceptible to such gridding issues.

The method described here is developed in Xu & Liu (2006); we reproduce some of the key elements. It is called by the authors an irregular grid finite difference method based on the Green–Gauss theorem (as Green's theorem reduces to Gauss' theorem in two dimensions). The theorem states that for a scalar function F defined on \(\mathbb{R}^2\) with continuous partial derivatives, we can relate the surface integral over some region Ω to a line integral along the boundary of Ω:
\[ \iint_{\Omega} \vec{\nabla} F \, d\Omega = \oint_{\partial\Omega} F\, \hat{n} \, ds, \tag{22} \]

Figure 11.
Diagram illustrating the derivative calculation using the DTM. The blue point again indicates the pixel m where we seek to compute the derivative. The black, solid lines form a polygon surrounding m. The square points are placed at the midpoints between m and the surrounding pixels. The triangle points are placed at the centroids of the triangles. The stencil Ω is formed by connecting midpoints to adjacent centroids. The unit vector \(\hat{n}\), denoted by the magenta arrows, is orthogonal to the edges of the stencil, points outwards, and changes direction as the stencil is traced along its edges. This figure is inspired by Fig. 1 in Xu & Liu (2006).

where \(\hat{n}\) is a unit normal vector on the boundary, pointing outwards. Suppose Ω is a region, called a stencil, small enough that \(\vec{\nabla} F\) is approximately constant across the region. Then we can pull the gradient out of the integral and write
\[ \vec{\nabla} F = \frac{1}{\Omega} \oint_{\partial\Omega} F\, \hat{n} \, ds, \tag{23} \]
from which it follows that
\[ \vec{\nabla} \cdot (\vec{\nabla} F) = \frac{1}{\Omega} \oint_{\partial\Omega} \hat{n} \cdot \vec{\nabla} F \, ds. \tag{24} \]
For implementation, the integrals are converted to sums, and Ω is defined by connecting centroids of Delaunay triangles to adjacent midpoints of the sides of the triangles (see Fig. 11). Depending on the density of source pixels, the stencil Ω may not be small enough for \(\vec{\nabla} F\) to be constant. We nevertheless take Eq. 23 to define an effective gradient for each pixel.

Unlike the FDM, the DTM does not assume the flux vanishes outside the grid. Eqs. 23 and 24 can be applied to the grid edges, as long as care is taken in closing the line integrals. Thus, edge effects in the regularisation are minimal. Fig. 9 suggests that derivatives computed with the DTM are more accurate than those computed with the FDM, because the DTM uses all nearby pixels.
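Discretising Eq. 23 as a sum over the edges of the stencil polygon gives the effective gradient. A minimal sketch (illustrative Python; the polygon, edge fluxes, and names are ours, and in pixsrc the stencil would come from the Delaunay triangulation):

```python
import numpy as np

def green_gauss_gradient(vertices, f_edge):
    """Effective gradient over a polygonal stencil (Eq. 23), discretised
    as grad F ~ (1/Area) * sum_e F_e * n_e * |e|, where F_e is the
    surface brightness at the midpoint of edge e and n_e its outward
    unit normal.  `vertices` traces the stencil anticlockwise."""
    v = np.asarray(vertices, dtype=float)
    nv = len(v)
    area = 0.0
    integral = np.zeros(2)
    for i in range(nv):
        a, b = v[i], v[(i + 1) % nv]
        edge = b - a
        # Outward, length-weighted normal for an anticlockwise polygon:
        # rotate the edge vector by -90 degrees.
        normal = np.array([edge[1], -edge[0]])
        integral += f_edge[i] * normal
        area += 0.5 * (a[0] * b[1] - a[1] * b[0])   # shoelace formula
    return integral / area

# Sanity check with F(x, y) = x on the unit square: the gradient is (1, 0).
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
# F evaluated at the edge midpoints (0.5,0), (1,0.5), (0.5,1), (0,0.5):
f_mid = [0.5, 1.0, 0.5, 0.0]
grad = green_gauss_gradient(square, f_mid)
```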
As an alternative to derivative-based regularisation, we have developed a quadratic form of regularisation that penalises the source for deviations in surface brightness from one or more analytic profiles. We find that analytic source regularisation (ASR) is especially useful in recovering the surface brightness of the source in noisy data. Currently, the reference surface brightness distribution has a Sérsic profile,
\[ I(\vec{r}) = I_0 \exp\left[ -\left( \frac{|\vec{r}|}{r_s} \right)^{1/n} \right], \tag{25} \]
where \(I_0\) is the normalisation, \(r_s\) is the scale radius, and n is the Sérsic index. Elliptical models are created from a linear transformation of coordinates. More complicated sources can be built from a combination of Sérsic profiles that represent multiple, blended sources (such as "knots" in star-forming galaxies).

(Implementation of the second derivative requires additional correction terms found in Xu & Liu (2006); it is still under refinement in pixsrc.)

To implement ASR, we first find the analytic source \(s_a\) that best fits the data. We vary the position, normalisation, scale radius, Sérsic index, ellipticity, and position angle of the analytic source using a downhill simplex optimisation routine (Press et al. 2002). We then use the best-fit analytic source to construct a regularisation matrix, H, that acts on a source vector, s, to produce a deviation vector, Δ = Hs, whose value at pixel m is given by
\[ \delta[m] = \left( \sum_n \frac{s[n]}{s_a[n]} \right) - N\, \frac{s[m]}{s_a[m]}, \tag{26} \]
where \(s_a[p]\) is the flux at p from the analytic source, and the sum is over the N pixels that share a Delaunay triangle with pixel m. The deviation vector vanishes if the source vector agrees completely with the analytic profile, or indeed if s is any real multiple of \(s_a\). More generally, Δ quantifies the degree to which s does not match a multiple of \(s_a\). Because the inverse brightness values in Eq.
26 can become large toward the outer regions of the analytic profile, we set the analytic source flux to 10% of the noise level once it falls below this value. Fig. 12 suggests that ASR is more effective than derivative-based regularisation at recovering the source from noisy data. This result is perhaps not surprising; because ASR assumes a functional form for the source, it is a stronger prior than derivative-based regularisation. It is important to note that ASR will yield accurate source reconstructions only if the true source is well described by the assumed functional form.

At this point we should consider whether regularisation introduces any biases in the values or uncertainties for recovered lens model parameters. Because the noise is Gaussian and centered on zero, we conjecture that analysing many different realisations of the noise can uncover the true underlying likelihood function (as an alternative to explicitly regularising the source surface brightness). We construct thousands of "observations" of a cusp lens with a peak S/N of unity and a resolution of 0.1 arcsec/pixel. We vary the ellipticity of the lens while holding other parameters fixed at their true values. The χ² curves from individual runs vary significantly, but stacking the results washes away the fluctuations from noise (see the red curve in Fig. 13). The stacked curve from ASR (shown in blue) matches the underlying χ² curve well. The results from curvature regularisation, by contrast, show a small bias toward lower ellipticity and underestimate the uncertainties for this parameter. This is yet another indication that ASR can outperform derivative-based regularisation when the data are noisy and the true source follows an analytic profile. At higher S/N (not shown), there is less difference between the regularisation schemes.

Because ASR can obtain a minimum for s ≠ 0, some of the algebra in §

Figure 12.
Comparison of sources reconstructed from noisy data. The test data have a source in the cusp configuration with a peak S/N of 1 (see the top left panel of Fig. 2) and a resolution of 0.05 arcsec/pixel. Clockwise from top left: the true source surface brightness followed by sources reconstructed from gradient-based regularisation, curvature-based regularisation, and analytic source regularisation. The source recovered using ASR best matches the true source brightness. In the case of blended sources (not shown), ASR also outperforms derivative-based regularisation.
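The Sérsic profile of Eq. 25 and the ASR deviation vector of Eq. 26 can be sketched as follows (illustrative Python; the neighbour lists here are toy input, whereas in pixsrc they would come from the Delaunay triangulation of the source grid):

```python
import numpy as np

def sersic(r, I0, r_s, n):
    """Sersic profile of Eq. 25: I(r) = I0 * exp(-(r / r_s)**(1/n))."""
    return I0 * np.exp(-(np.asarray(r, dtype=float) / r_s) ** (1.0 / n))

def asr_deviation(s, s_a, neighbours):
    """Deviation vector of Eq. 26.  For each pixel m,
        delta[m] = sum_n s[n]/s_a[n]  -  N * s[m]/s_a[m],
    where the sum runs over the N pixels sharing a Delaunay triangle
    with m (passed in here as an explicit neighbour list)."""
    ratio = np.asarray(s, dtype=float) / np.asarray(s_a, dtype=float)
    delta = np.empty_like(ratio)
    for m, nbrs in enumerate(neighbours):
        delta[m] = ratio[list(nbrs)].sum() - len(nbrs) * ratio[m]
    return delta

# If s is any real multiple of the analytic source, the deviation vanishes.
s_a = sersic([0.0, 0.5, 1.0, 1.5], I0=1.0, r_s=1.0, n=4.0)
s = 2.5 * s_a
nbrs = [(1,), (0, 2), (1, 3), (2,)]
d = asr_deviation(s, s_a, nbrs)
```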
Figure 13.
Effects of regularisation on parameter estimation. The y-axis label "statistic" denotes χ² and −E for cases without and with regularisation, respectively, but for simplicity we refer to both as χ². (We have applied a vertical offset to facilitate comparing the curves.) We construct many realisations of a cusp lens with a peak S/N of unity and a resolution of 0.1 arcsec/pixel. After stacking the results, we expect the χ² curve without regularisation (shown in red) to represent the true errors on the ellipticity. ASR (shown in blue) seems to agree well with the reference case. By contrast, curvature regularisation (shown in green) has a minimum that is shifted away from the true value e = 0.
3. Also, the χ² curve rises rapidly away from the minimum, causing the parameter uncertainties to be underestimated. For each case, the dark, thick line corresponds to the median value, and the bands are 68% confidence intervals, estimated from bootstrapping.

When exploring the lens model parameter space, we find that the likelihood surface can be jagged even for our clean test data. Wallington, Narayan & Kochanek (1994) remarked on "glitches" in χ² for their maximum entropy analysis, but noted that the glitches disappeared as the PSF and noise vanished. In our analysis, the jaggedness is reduced but not eliminated in that limit. It arises, we suspect, from the discrete nature of PBSR itself. A small, continuous change in the lens model parameters can shift the source pixels in a way that causes the Delaunay algorithm to connect the pixels in a different way, leading to abrupt changes in the lensing operator and regularisation matrix.

To probe these issues, we examine one-dimensional cuts of the χ² surface for various gridding and regularisation schemes. We focus on test data for the cusp configuration with a peak S/N of 25 and pixel scale of 0.
05 arcsec/pixel. We fix all lens model parameters at their correct values and vary only the ellipticity of the lensing galaxy. We consider different combinations of grids (fully adaptive or adaptive Cartesian) and priors (gradient regularisation with FDM or DTM, or ASR). The results are shown in Fig. 14.

Qualitatively, we find that the fully adaptive grid shows less small-scale fluctuation in χ² than the adaptive Cartesian grid. Since the fully adaptive grid is constructed by ray tracing image pixels to the source plane, it more naturally accommodates small changes in the lens model. The adaptive Cartesian grid, by contrast, either remains fixed or changes discretely (if the magnification crosses the criterion for subgridding; see §), which produces abrupt jumps in χ². It may be possible to improve the performance of the adaptive Cartesian grid by developing a different criterion for subgridding, but such modifications have not yet been explored.

Turning to regularisation, the DTM yields somewhat smaller fluctuations than the FDM, at least for the fully adaptive grid (with the adaptive Cartesian grid, the noise is dominated by the gridding anyway). ASR leads to the smoothest χ² curves for both types of grids. As the lens model parameters vary, the best-fit analytic source and the corresponding weights in the regularisation matrix can vary smoothly as well. It is interesting that the fully adaptive grid with the DTM does not show a similar level of smoothness, because that method also changes continuously with lens model parameters. The difference may occur because the deviation vector δ in Eq. 26 is a dimensionless ratio of surface brightnesses, whereas the derivatives used for gradient or curvature regularisation have units of surface brightness divided by distance or squared distance.
Using a dimensionless measure of deviation allows each pixel to have equal weight in the regularisation matrix, a quality that the derivative-based methods do not necessarily have.

Finally, we note that the χ² curve is flatter near the minimum for ASR than it is for gradient regularisation (focusing now on the fully adaptive grid). This causes ASR to yield larger uncertainties in the ellipticity of the lensing galaxy, as we saw already in Fig. 13. If the ASR is taken to represent the true posterior probability distribution, then the errors reported using the fully adaptive grid with gradient regularisation are being underestimated.

In summary, we find that the gridding and regularisation schemes both affect the level of noise in the Bayesian evidence. These two algorithmic issues need to be considered carefully in applications of PBSR.

In real data, the image resolution is typically fixed by the observational equipment, but the telescope pointing and the noise in the data are particular realisations; on a different day, the same observation would not actually be identical. We now consider whether such chance events introduce any statistical or systematic uncertainties into conclusions derived from lens modeling. We examine noise and pointing both separately and jointly, with and without a PSF, sometimes just optimising the parameters and sometimes performing a full parameter space exploration. We assume Gaussian noise with zero mean, which can be considered to represent electron read-out noise, Poisson noise (in the large mean limit), or sky noise. As a fiducial case, we use a lens in the cusp configuration with a pixel scale of 0.05 arcsec/pixel and a peak S/N of 10, but we examine different choices as discussed below.
Since ASR is computationally expensive, and curvature regularisation is well suited for initial parameter space explorations (see §

While the noise level will affect the uncertainty in lens model parameters, the particular noise realisation will also affect the best-fit values of the parameters. To explore this possibility, we create 100 "observations" with the same data but different realisations of the noise, for the various noise levels shown in Fig. 2. We optimise the parameters and examine the scatter among best-fit values (at this point we are not fully quantifying the parameter uncertainties). The pixel scale is fixed at the high resolution of 0.02 arcsec/pixel (see Fig. 3) so that effects due to pixel size are minimised.

Fig. 15 shows the results in terms of different two-dimensional parameter projections, along with the median and 68% confidence intervals for individual parameters. There is no significant bias in the parameter values. Empirically, the scatter among best-fit values appears to have a power-law dependence on S/N with a slope of ∼ −0.8.

Telescope pointing affects how photons are collected into pixels, so small shifts may influence the data and hence the

(Observations often include multiple exposures to handle cosmic rays, bad pixels, dithering, and subsampling the PSF. We imagine our analysis being applied to the final image after data reduction.)

(The PSF is used both in creating the mock data and in modeling the lens.)
Figure 14.
Effective χ² as a function of ellipticity for various gridding and regularisation schemes. The test data have a source in the cusp configuration with a peak S/N of 25 and a resolution of 0.05 arcsec/pixel. The columns correspond to different grids (left is fully adaptive, right is adaptive Cartesian). The rows correspond to different regularisation schemes (top is gradient regularisation with FDM, middle is gradient regularisation with DTM, bottom is analytic source regularisation [ASR]). The different panels have the same vertical range (and the vertical offsets are not meaningful). In general, the fully adaptive grid leads to less noise in χ² than the adaptive Cartesian grid. ASR produces the smoothest curve over large scales, presumably because the regularisation matrix does not change discretely and the deviation from an analytic profile is measured in a dimensionless way.
Figure 15.
Best-fit lens model parameters for different realisations of noise in the data. The source is in the cusp configuration. Red, green, and blue points correspond to peak S/N levels of 1, 10, and 25, respectively. The marked points correspond to optimal lens model parameters; this analysis does not include full parameter uncertainties. The quoted uncertainties indicate the ranges that enclose 68% of the best-fit values. The pixel scale is fixed at 0.02 arcsec/pixel so the effects of pixel size are minimal. Dashed, yellow lines mark the true values.

recovered model parameters. To explore this issue, we again create 100 "observations" in which the pointing is shifted randomly. The shifts are drawn from a uniform distribution that is one pixel in each direction, for image resolutions of 0.03, 0.05, and 0.1 arcsec/pixel. pixsrc requires some amount of noise, but the noise map is kept identical and the noise level is minimal (the peak S/N is 5 ×) so the effects are negligible.

Fig. 16 shows two-dimensional projections of the best-fit parameter values. The median values reveal biases that are small (a fraction of a pixel for the Einstein radius and position of the lens galaxy) but statistically significant. The biases become less significant, however, when a PSF is included (see Fig. 17). For the case with no PSF, the scatter in the best-fit parameter values follows a power law in the pixel scale with a slope of ∼ 0.3.

Now we consider noise and telescope pointing together, and we extend the analysis to all four test image configurations. We again create multiple "observations" but now each contains both a different realisation of the noise and a different random pointing. Table 1 quantifies the spread in best-fit

(Ignoring edge effects, shifts of N + Δx are equivalent to shifts of Δx, where N is an integer.)
parameter values for all four image configurations and peak S/N values of 1, 10, and 500.

In general, the scatter decreases as the S/N increases. The 2-image case tends to have more scatter than the other cases because a 2-image configuration provides weaker constraints than configurations that have additional images and/or long arcs. The high-S/N cases show some small formal biases in the parameters, but we expect those would be reduced if a PSF were included.

For some applications we are interested in the intrinsic properties of the source galaxy (e.g., Sharon et al. 2012; Dye et al. 2014). Depending on the information available, it may be possible to estimate the luminosity, dynamical mass, mass-to-light ratio, gas mass fraction, and star formation rate for the source. Such applications require knowledge of the lensing magnification, so we examine uncertainties in the magnification associated with noise and pointing. Specifically, for each lens model in Table 1 we compute the total magnification of the source. We quantify the scatter using the 68% confidence interval, and then divide by the true magnification to obtain the fractional uncertainty for each lens configuration. (We are still just examining the scatter among best-fit models for different realisations of noise and pointing; we are not yet characterising the full uncertainties in individual lens models.)

Fig. 18 shows the results. At low S/N, the cusp configuration has the largest uncertainties, presumably because the source lies in a region where small changes in the model can lead to large changes in the magnification. At higher S/N,
Figure 16.
Best-fit lens model parameters for different realisations of the telescope pointing. The source is in the cusp configuration. The shifts are drawn from a uniform distribution that is one pixel in each direction. Red, green, and blue points correspond to image resolutions of 0.1, 0.05, and 0.03 arcsec/pixel, respectively. The peak S/N is 5 × so that effects related to noise are negligible. Dashed, yellow lines mark the true values.

Figure 17.
Similar to Fig. 16 but including a PSF. The source is placed in the cusp configuration, and the image resolution is fixed at 0.05 arcsec/pixel. The red points have no PSF, while the green and blue points have circular Gaussian PSFs with FWHM equal to 0.059 and 0.12 arcsec, respectively.
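The random pointing shifts used in these tests can be sketched as follows (illustrative Python; the seed, helper name, and pixel scale are ours, and in practice the shifts are applied when rendering the mock images). Ignoring edge effects, a shift of N + Δx pixels is equivalent to a shift of Δx pixels for integer N, so only the sub-pixel part matters:

```python
import numpy as np

rng = np.random.default_rng(7)

def random_pointing_shifts(n_obs, pixel_scale):
    """Draw random pointing offsets for n_obs mock observations,
    uniform within one pixel in each direction.  Returns offsets in
    arcsec; only the sub-pixel part of any shift is relevant."""
    shifts = rng.uniform(-0.5, 0.5, size=(n_obs, 2))  # in pixels
    return shifts * pixel_scale                        # in arcsec

# 100 "observations" at a pixel scale of 0.05 arcsec/pixel.
shifts = random_pointing_shifts(100, pixel_scale=0.05)
```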
(Table columns: the lower, median, and upper values of the 68% CI at peak S/N = 1, 10, and 500; rows: R_E and e for the cusp, fold, cross, and 2-image configurations.)

Table 1.
Best-fit lens model parameters when we consider different noise realizations and telescope pointings simultaneously. The observations correspond to peak S/N values of 1, 10, and 500, and the resolution is 0.05 arcsec/pixel. For each set of observations, the middle column corresponds to the median value recovered, and the lower and upper bounds of the 68% CI are shown in the first and third columns, respectively.

the 2-image case fares worst because the lens model is not highly constrained. At all S/N values, the cross case has the smallest fractional uncertainties because the source is in a region where the magnification gradient is small. All told, for S/N ≳
10 the scatter in magnification associated with noise and pointing is ≲ 10% for all lens configurations.
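The fractional magnification scatter can be sketched as follows (illustrative Python with toy numbers; in the actual analysis the magnifications come from the best-fit lens models for each realisation):

```python
import numpy as np

def fractional_magnification_scatter(mu_fits, mu_true):
    """Scatter in the recovered total magnification across many
    noise/pointing realisations: the 68% confidence interval of the
    best-fit values divided by the true magnification."""
    lo, hi = np.percentile(mu_fits, [16.0, 84.0])
    return (hi - lo) / mu_true

# Toy example: best-fit magnifications scattered about a true value of 12.
rng = np.random.default_rng(0)
mu_fits = rng.normal(12.0, 0.3, size=1000)
frac = fractional_magnification_scatter(mu_fits, 12.0)
```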
To this point we have only examined how the best-fit lens model parameters change with different realisations of the noise and telescope pointing. Now for each "observation" we use an adaptive Markov Chain Monte Carlo (MCMC) algorithm to explore the full parameter space and characterise the posterior distribution of parameters. The width of the posterior depends on the noise level, while the peak location depends on the particular realisation of the noise and pointing. By comparing the width of each posterior to the scatter across realisations, we can investigate how the scatter from pointing compares to the scatter from noise. Note that noise contributes to this analysis twice: to the width of each posterior, and to the scatter between them. We consider this "double counting" when interpreting the results, as discussed below.

Fig. 19 shows the 68% and 95% confidence intervals for lens model parameters when we combine all of the realisations. The noise level is fixed so the peak S/N is 10, and the image resolution is fixed at 0.05 arcsec/pixel. We analyse the cusp configuration, both without a PSF and with a PSF that has a FWHM of 0.12 or 0.24 arcsec. Adding a PSF causes the distributions to shift and broaden to some degree, but the true values always lie within the 95% confidence interval. Parameter inference, in other words, is robust.

Let S_tot be the width of the posterior from the combined analysis. For comparison, let S_i be the width from an individual "observation". Since S_i only accounts for noise

(We quantify the width in terms of the 68% confidence interval, and use the symbol S to distinguish this scatter from the standard deviation.)
Figure 19.
Marginal posterior probability distributions for lens model parameters, using a source in the cusp configuration with an image resolution of 0.05 arcsec/pixel and a peak S/N of 10. We use a Markov Chain Monte Carlo analysis to explore the parameter space and combine 100 realisations of the noise and telescope pointing. The contour plots show 68% and 95% confidence intervals for the various 2-dimensional projections. The top plots show individual probability distributions (normalised to the same peak), with the 95% confidence interval marked by points (defined so 2.5% of the integrated probability is in each of the left and right tails). Solid blue, dashed green, and dot-dashed red curves correspond to data created with circular Gaussian PSFs having FWHMs of 0.0, 0.12, and 0.24 arcsec, respectively.
Figure 18.
Scatter in the lensing magnification (shown as a fractional uncertainty, quoted as a percentage) for different realisations of the noise and telescope pointing. (This analysis does not take full lens model parameter uncertainties into account.) Statistical errorbars on the scatter are computed with a bootstrap analysis.

while S_tot accounts for both noise and pointing, we generally expect S_tot > S_i. Indeed, Fig. 20 shows that this ratio typically has values between 1 and 2. To understand what we might expect, consider that if the distributions were Gaussian then the total scatter would be the quadrature sum of the width of each run and the scatter between runs:
\[ \sigma_{\rm tot} \approx \left( \sigma_{\rm width}^2 + \sigma_{\rm scatter}^2 \right)^{1/2} \approx \left( 2\sigma_{\rm noise}^2 + \sigma_{\rm pointing}^2 \right)^{1/2}, \tag{27} \]
where \(\sigma_{\rm width} \approx \sigma_{\rm noise}\) while \(\sigma_{\rm scatter} \approx (\sigma_{\rm noise}^2 + \sigma_{\rm pointing}^2)^{1/2}\). In other words, we might naïvely predict that the ratio in Fig. 20 has the form
\[ \frac{S_{\rm tot}}{S_i} \approx \left( \frac{2\sigma_{\rm noise}^2 + \sigma_{\rm pointing}^2}{\sigma_{\rm noise}^2} \right)^{1/2}. \tag{28} \]
If the scatter from pointing is negligible compared with the scatter from noise, the ratio S_tot/S_i would have a value near √2 ≈ 1.
4. As the scatter from pointing increases, the ratio would likewise increase. We could therefore interpret scatter ratios above √2 as evidence of additional scatter from pointing; most parameters in Fig. 20 have ratios consistent with √
2. The 2-image configuration has values that are nominally higher but still consistent with √2.

Figure 20.
Ratio of the overall scatter, S_tot, to the scatter for individual runs, S_i, for the various lens model parameters. The point corresponds to the median value of the ratio, while the errorbars are computed with a bootstrap analysis. A value of S_tot/S_i ≈ √2 ≈ 1.4 corresponds to negligible scatter from the telescope pointing.

Although this reasoning relies on Gaussianity, which may not strictly apply to our distributions, the results suggest that the statistical properties of our runs are sensible. We note that these conclusions may depend on the pixel scale and noise level, which we have not explored in detail.

We have introduced a new pixel-based source reconstruction (PBSR) software called pixsrc and applied it to mock data in order to investigate statistical and systematic uncertainties in modeling lenses with extended sources. We have examined several issues that are intrinsic to the pixel-based approach:

• The χ² surface contains "discreteness noise" that is influenced by the gridding and regularisation schemes.
• Errors associated with interpolating surface brightness values in the source plane need to be taken into account, especially for high-S/N data.
• Adaptive grids are often used to achieve good resolution in the source plane, but they require some care when computing numerical derivatives.
• A new regularisation scheme called analytic source regularisation (ASR) reconstructs a source with more fidelity than derivative-based regularisation when the data are noisy.
• Compared to ASR, curvature regularisation may underestimate parameter uncertainties for noisy data.

We have applied ASR to sources that are fairly regular, but it could be extended to blended sources or galaxies with star-forming regions by writing the analytic source as a collection of Sérsic profiles. Differences between ASR and derivative-based regularisation are smaller when the S/N ratio is higher.

We have also examined statistical issues that arise because any given data set has a particular realisation of the noise and telescope pointing.
For the cusp configuration, we find that different realisations of the noise lead to scatter in the best-fit model parameters that scales as a power law in S/N with a slope of ∼ −0.8. Different realisations of the pointing lead to scatter that scales as a power law in the pixel scale with a slope of ∼ 0.3. Some parameters show small but statistically significant biases, but those can be washed out with the inclusion of a PSF. When we fully characterise the model uncertainties, the 95% confidence intervals always include the true parameter values, with or without a PSF. These results are not highly sensitive to the image configuration, except that our 2-image lens has more scatter than our 4-image lenses because the constraints on the lens model are weaker.

The scatter in noise and pointing leads to scatter in the lensing magnification, which is important for determining the intrinsic properties of the source. The magnification scatter does vary with the lens configuration because it is sensitive to how rapidly the magnification changes at the location of the source. This scatter decreases with increasing S/N, but more slowly for the 2-image configuration than for the 4-image cases. For S/N ≳ 10 the scatter in magnification associated with noise and pointing is ≲ 10% for all lens configurations.

We note that real data may have complications beyond the issues we have addressed. Examples include irregular structure in the source, differential extinction by dust in the lens galaxy, departures from a smooth lensing potential, incomplete knowledge of the PSF, and intricate aspects of image reduction. Such issues will be specific to particular data sets and need to be examined in conjunction with the algorithmic issues presented here.

We have discussed a number of different approaches to PBSR, so let us summarise our suggestions for modeling that is both efficient and effective. If the images can be separated, it may be useful to take their positions and fluxes and perform an initial parameter search assuming point-like images. Then using an analytic source characterised by a small number of free parameters can help identify the appropriate region of parameter space. When undertaking full PBSR, derivative-based regularisation is good for computational efficiency, but analytic source regularisation is a valuable step if the source is reasonably well described by an analytic profile or a collection of such profiles. Finally, it is a good idea to find the best-fit lens and source models and create many realisations of similar observations (as in §

ACKNOWLEDGEMENTS
We thank Andrew Baker, Simon Dye, Ross Fadely, Curtis McCully, and Brandon Patel for helpful comments. We also thank the referee, Olaf Wucknitz, for thorough and helpful comments on the manuscript. We acknowledge the following computational tools. For performing Delaunay triangulations, we use the triangle code by Shewchuk (1996). For handling world coordinate system information, we use the wcslib library (Calabretta & Greisen 2002) compiled by WCSTOOLS (Mink 2011). For solving linear matrix equations and computing matrix factorizations, we use UMFPACK (Davis 2004a,b; Davis & Duff 1999, 2006). For enabling GPU computing, we use CUDA (NVIDIA Corporation 2007) and Thrust (Hoberock & Bell 2010). We acknowledge funding from grants AST-0747311 and AST-1211385 from the U.S. National Science Foundation.
REFERENCES
Blandford R., Surpi G., Kundić T., 2001, in Astronomical Society of the Pacific Conference Series, Vol. 237, Gravitational Lensing: Recent Progress and Future Goals, Brainerd T. G., Kochanek C. S., eds., p. 65
Brewer B. J., Lewis G. F., 2006, ApJ, 637, 608
Calabretta M. R., Greisen E. W., 2002, A&A, 395, 1077
Davis T. A., 2004a, ACM Trans. Math. Softw., 30, 196
Davis T. A., 2004b, ACM Trans. Math. Softw., 30, 165
Davis T. A., Duff I. S., 1999, ACM Trans. Math. Softw., 25, 1
Davis T. A., Duff I. S., 2006, SIAM J. Matrix Anal. Appl., 18, 140
Dye S. et al., 2014, MNRAS, 440, 2013
Dye S., Warren S. J., 2005, ApJ, 623, 31
Hoberock J., Bell N., 2010, Thrust: A Parallel Template Library. Version 1.7.0
Högbom J. A., 1974, A&AS, 15, 417
Jenkins C. R., Peacock J. A., 2011, MNRAS, 413, 2895
Keeton C. R., 2001, arXiv:astro-ph/0102340
Kochanek C. S., Keeton C. R., McLeod B. A., 2001, ApJ, 547, 50
Kochanek C. S., Narayan R., 1992, ApJ, 401, 461
Koopmans L. V. E., 2005, MNRAS, 363, 1136
Koopmans L. V. E. et al., 2009, ApJ, 703, L51
Kronawitter A., Saglia R. P., Gerhard O., Bender R., 2000, A&AS, 144, 53
Mink D. J., 2011, Astrophysics Source Code Library, 9015
NVIDIA Corporation, 2007, NVIDIA CUDA Compute Unified Device Architecture Programming Guide. NVIDIA Corporation
Press W. H., Teukolsky S. A., Vetterling W. T., Flannery B. P., 2002, Numerical Recipes in C++: The Art of Scientific Computing
Schneider P., Kochanek C. S., Wambsganss J., 2006, Gravitational Lensing: Strong, Weak and Micro. Springer
Sharon K., Gladders M. D., Rigby J. R., Wuyts E., Koester B. P., Bayliss M. B., Barrientos L. F., 2012, ApJ, 746, 161
Shewchuk J. R., 1996, in Lecture Notes in Computer Science, Vol. 1148, Applied Computational Geometry: Towards Geometric Engineering, Lin M. C., Manocha D., eds., Springer-Verlag, pp. 203–222, from the First ACM Workshop on Applied Computational Geometry
Suyu S. H., Halkola A., 2010, A&A, 524, A94
Suyu S. H. et al., 2012, ApJ, 750, 10
Suyu S. H., Marshall P. J., Auger M. W., Hilbert S., Blandford R. D., Koopmans L. V. E., Fassnacht C. D., Treu T., 2010, ApJ, 711, 201
Suyu S. H., Marshall P. J., Blandford R. D., Fassnacht C. D., Koopmans L. V. E., McKean J. P., Treu T., 2009, ApJ, 691, 277
Suyu S. H., Marshall P. J., Hobson M. P., Blandford R. D., 2006, MNRAS, 371, 983
Treu T., 2010, ARA&A, 48, 87
Vegetti S., Czoske O., Koopmans L. V. E., 2010, MNRAS, 407, 225
Vegetti S., Koopmans L. V. E., 2009, MNRAS, 392, 945
Vegetti S., Koopmans L. V. E., Bolton A., Treu T., Gavazzi R., 2010, MNRAS, 408, 1969
Vegetti S., Lagattuta D. J., McKean J. P., Auger M. W., Fassnacht C. D., Koopmans L. V. E., 2012, Nature, 481, 341
Wallington S., Kochanek C. S., Narayan R., 1996, ApJ, 465, 64
Wallington S., Narayan R., Kochanek C. S., 1994, ApJ, 426, 60
Warren S. J., Dye S., 2003, ApJ, 590, 673
Wayth R. B., Warren S. J., Lewis G. F., Hewett P. C., 2005, MNRAS, 360, 1333
Xu G., Liu G. R., 2006, WSEAS Trans. Math., 5