Global Optimization methods for Gravitational Lens Systems with Regularized Sources
To appear in ApJ, 759, 27, Nov. 1, 2012
Preprint typeset using LaTeX style emulateapj v. 8/13/10
Adam Rogers and Jason D. Fiege
Department of Physics and Astronomy, The University of Manitoba, Winnipeg, Manitoba, R3T-2N2, Canada
ABSTRACT

Several approaches exist to model gravitational lens systems. In this study, we apply global optimization methods to find the optimal set of lens parameters using a genetic algorithm. We treat the full optimization procedure as a two-step process: an analytical description of the source plane intensity distribution is used to find an initial approximation to the optimal lens parameters. The second stage of the optimization uses a pixelated source plane with the semilinear method to determine an optimal source. Regularization is handled by means of an iterative method and the generalized cross validation (GCV) and unbiased predictive risk estimator (UPRE) functions that are commonly used in standard image deconvolution problems. This approach simultaneously estimates the optimal regularization parameter and the number of degrees of freedom in the source. Using the GCV and UPRE functions, we are able to justify an estimate of the number of source degrees of freedom found in previous work. We test our approach by applying our code to a subset of the lens systems included in the SLACS survey.
Subject headings: gravitational lensing: strong — methods: numerical

1. INTRODUCTION
Methods for modeling gravitational lens systems are divided into a broad dichotomy between schemes that require a parameterized analytical model for the source intensity distribution, and schemes that assume only a pixelated source with no underlying model. Methods that parameterize the source intensity distribution are often quite easy to implement, but assume a priori knowledge of the source structure. Schemes that make use of a pixelated source are generally more complex, but offer greater flexibility since no parametric form is assumed for the source. This paper makes use of both parameterized and pixelated source models, exploiting the benefits provided by each.

Lens inversion schemes based on analytical source models assume an intensity distribution I_s(β) in the source plane β. A model of the lens density is then used to calculate a ray-tracing from the image plane θ to the source plane using the thin lens equation

β(θ) = θ − α(θ), (1)

where α(θ) is the deflection angle field calculated from the projected lens potential (Schneider 1985; Schneider et al. 1992; Petters et al. 2001). Since gravitational lensing conserves surface brightness (Kayser & Schramm 1988), the lensed image intensity is easily found by

I(θ) = I_s(β(θ)) (2)

for an assumed parametric source intensity function I_s. The resulting lensed image I(θ) is then convolved with a point spread function (PSF) and compared with the data. The χ^2 statistic is minimized over the combined set of lens and source parameters using non-linear methods for parameter search and global optimization.

Sérsic profiles (Sérsic 1968) are widely used for galaxy scale sources, as defined by

I_s(r) = I_0 exp{−k(n) [(r/r_0)^(1/n) − 1]}, (3)

which assumes intensity I_0 at the scale length r_0 and shape index n. The shape index n controls the curvature of the profile.
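As a concrete sketch of Equations (1)-(3), the fragment below evaluates a Sérsic source through a toy lens. The point-mass deflection and the approximation k(n) ≈ 2n − 1/3 are illustrative assumptions, not models used in this paper.

```python
import numpy as np

def sersic(r, I0, r0, n):
    """Sersic profile I_s(r) = I0 * exp(-k(n) * ((r/r0)**(1/n) - 1)).
    k(n) ~ 2n - 1/3 is a common approximation so that r0 encloses
    half of the total luminosity."""
    k = 2.0 * n - 1.0 / 3.0
    return I0 * np.exp(-k * ((r / r0) ** (1.0 / n) - 1.0))

def lensed_image(theta_x, theta_y, alpha, source):
    """Thin-lens ray tracing: beta = theta - alpha(theta), then
    surface-brightness conservation I(theta) = I_s(beta(theta))."""
    ax, ay = alpha(theta_x, theta_y)
    return source(theta_x - ax, theta_y - ay)

def alpha_point(tx, ty, theta_E=1.0):
    """Toy point-mass deflection, alpha = theta_E^2 * theta / |theta|^2."""
    r2 = tx**2 + ty**2 + 1e-12
    return theta_E**2 * tx / r2, theta_E**2 * ty / r2

x = np.linspace(-2, 2, 64)
tx, ty = np.meshgrid(x, x)
src = lambda bx, by: sersic(np.hypot(bx - 0.1, by), I0=1.0, r0=0.3, n=4.0)
img = lensed_image(tx, ty, alpha_point, src)  # a ring-like lensed image
```

A de Vaucouleurs source corresponds to n = 4, as used in the usage line above.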
The de Vaucouleurs (1948) profile is recovered for n = 4, and the exponential disk is found by setting n = 1. The scaling factor k(n) is used to normalize the distribution such that half the total luminosity is contained within r_0.

Due to their flexibility and simple physical interpretation, Sérsic functions are commonly used to model lensed sources (Marshall et al. 2007; Bolton et al. 2008; Brewer and Lewis 2011). However, more complicated analytical source functions have also been used to approximate the varied and complex morphologies of galaxies and can include hundreds of parameters in extreme cases (Tyson et al. 1998). In general, analytical models are used because they are typically fast to evaluate and provide an intuitive understanding of the resulting source.

As useful as analytical models are, they may not be flexible enough to describe complex sources and may bias the lens parameters during χ^2 minimization to compensate for the artificial constraints imposed by their assumed analytical form. Pixelated source models were introduced to move past this limitation. This approach represents the source plane intensity as a set of basis functions, each having an adjustable parameter that represents the surface brightness of the source plane at a given pixel. The semilinear method treats each pixel as a basis function and minimizes the mismatch between model and data by manipulating the brightness of each source pixel s_j independently (Warren & Dye 2003; Treu & Koopmans 2004; Suyu et al. 2006).

The semilinear method divides the lens modeling problem into a non-linear "outer loop" problem that solves for lens parameters, and an "inner loop" problem that solves for the pixelated source, assuming a fixed set of lens parameters. An important benefit of this approach is that the inner loop problem is linear and therefore does not require complicated nonlinear optimization routines. The blurring and lensing effects are expressed by the matrix f = BL.
The lensing matrix L encodes the ray tracing operation from the image plane to the source plane and forms the lensed image of a given source brightness distribution. In this work we make use of a bilinear interpolation scheme, where the center of each image pixel is traced to a position on the source plane using the lens equation. The brightness of an image pixel is then found by a weighted average of the four source pixels that enclose each back-traced ray (Treu & Koopmans 2004; Koopmans 2005). This choice is not unique, and many different kinds of interpolation schemes have been studied in the literature, including nearest neighbor (Warren & Dye 2003), adaptive source pixel tilings (Dye & Warren 2005) and Delaunay triangulations (Vegetti & Koopmans 2009). We plan on studying the effects of a variety of such interpolation schemes in future work.

The blurring matrix B describes the effect of the PSF on the resulting lensed image. By minimizing the χ^2 statistic with respect to the source plane intensities s_j, the least-squares form of the problem is exposed:

F^T F s = F^T d̂, (4)

where F is the lens matrix divided by the errors in the data, F_ij = f_ij/σ_i, and s is a "flattened" image vector containing the intensities of the source plane pixels (Warren & Dye 2003; Koopmans 2005). The vector d̂_i = d_i/σ_i is the data vector d normalized by the noise σ_i. This type of problem has been well studied in the context of the standard image deconvolution problem (Golub, Heath & Wahba 1979; Vogel 1987; Hansen 1994; Nagy et al. 2002), which seeks to remove the distortion introduced by a blurring function (PSF).

In general, the solution of Equation (4) requires regularization to stabilize the inversion of the system matrix F^T F (Koopmans 2005). The modified matrix is then given by

M = F^T F + λ H^T H, (5)

where H is a regularization matrix and λ a multiplier that controls the amount of regularization added to the problem.
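A minimal sketch of the bilinear lensing matrix L described above; the grid conventions and helper names are our own, and a production code would use sparse storage and compose f = BL with a PSF matrix B:

```python
import numpy as np

def lensing_matrix(beta_x, beta_y, x0, y0, dpix, nsx, nsy):
    """Dense bilinear lensing matrix L (n_rays x n_source_pixels):
    each back-traced ray (beta_x, beta_y) is spread over the four
    source-plane pixels that enclose it, with weights summing to 1.
    (x0, y0) is the source grid origin and dpix the pixel scale."""
    bx = (np.ravel(beta_x) - x0) / dpix       # fractional pixel coordinates
    by = (np.ravel(beta_y) - y0) / dpix
    i0, j0 = np.floor(bx).astype(int), np.floor(by).astype(int)
    fx, fy = bx - i0, by - j0
    L = np.zeros((bx.size, nsx * nsy))
    for di, dj, w in [(0, 0, (1 - fx) * (1 - fy)), (1, 0, fx * (1 - fy)),
                      (0, 1, (1 - fx) * fy), (1, 1, fx * fy)]:
        ii, jj = i0 + di, j0 + dj
        ok = (ii >= 0) & (ii < nsx) & (jj >= 0) & (jj < nsy)
        L[np.nonzero(ok)[0], (jj * nsx + ii)[ok]] += w[ok]
    return L
```

A ray that lands exactly on a source pixel center puts unit weight on that single pixel; every in-bounds row of L sums to 1, reflecting surface brightness conservation.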
The simplest case, zeroth order regularization, assumes that H = I. This scheme regularizes the problem by seeking the solution s that has minimal intensity over the source plane. Higher order regularization schemes are also commonly used, such as curvature regularization, which uses the second order derivatives of s to smooth the solution by minimizing the curvature over the source plane. Regularization schemes seek to impose physicality constraints on the source intensity, selecting a smoothly varying and physically realistic solution from the many alternatives that exist to solve the ill-posed system. Linear regularization schemes were studied in depth by Suyu et al. (2006).

Following our previous work (Rogers & Fiege 2011a), we use the Qubist Optimization Toolbox (Fiege 2010) to find the nonlinear lens parameters varied in the outer loop of the lens inversion problem. The Qubist Toolbox contains several non-linear global optimization routines including Ferret, an advanced genetic algorithm (GA), and Locust, a particle swarm optimizer (PSO). In the inner loop, we solve the least squares problem of the semilinear method using Krylov subspace methods (Björck 1996). Krylov subspace methods are well known in the image deblurring community and have been studied in the context of deconvolution problems at length (Hansen 1994; Nagy et al. 2002). This class of optimization routines includes the conjugate gradient method for least squares problems (CGLS) and the steepest descent (SD) method. Krylov methods are attractive because they naturally regularize ill-posed problems and are efficient at solving large scale problems. We previously studied the performance of the GA and PSO methods on test problems using simulated lens data (Rogers & Fiege 2011a).
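The inner-loop linear solve of Equations (4)-(5) can be sketched directly; here we use dense algebra with zeroth order regularization (H = I) rather than the Krylov methods used in the paper:

```python
import numpy as np

def solve_regularized(F, d_hat, lam, H=None):
    """Solve the regularized normal equations
    (F^T F + lam * H^T H) s = F^T d_hat.
    Zeroth-order regularization corresponds to H = I."""
    if H is None:
        H = np.eye(F.shape[1])
    return np.linalg.solve(F.T @ F + lam * (H.T @ H), F.T @ d_hat)

# Increasing lam biases the solution toward minimal source intensity.
rng = np.random.default_rng(1)
F = rng.standard_normal((30, 8))
s_true = rng.standard_normal(8)
d_hat = F @ s_true
s0 = solve_regularized(F, d_hat, 0.0)      # unregularized least squares
s1 = solve_regularized(F, d_hat, 100.0)    # heavily regularized, smaller norm
```

For curvature regularization, H would instead hold a discrete second-derivative operator over the source grid.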
In that work we found that the GA explored the parameter space more thoroughly than the PSO, although the PSO was slightly faster to converge.

In this work, we explore parameter selection methods to determine an appropriate value for the regularization constant in the semilinear method, and use the Ferret GA with our lens code to model data from the SLACS survey. We use a two stage approach to the lens modeling problem: we begin the optimization with analytical sources to estimate the approximate position of the globally optimal lens parameters, and switch to a pixelated source for further model refinement once the global optimizer has converged.

2. GRAVITATIONAL LENS SOURCE DECONVOLUTION

The semilinear method with regularization describes gravitational lens modeling in the context of a least squares problem, where we seek a vector s that minimizes

g = ||Fs − d̂||^2 + λ ||Hs||^2. (6)

The first term in this sum is the χ^2 between the model and observed images, while the second term quantifies the strength of the regularization. In general, gravitational lensing produces multiple images, so F is a rectangular N × M matrix (N > M), where N is the number of image pixels involved in the inversion and M the number of source pixels.

The most direct method to solve the least squares problem is to decompose F using the singular value decomposition (SVD; Golub & Reinsch 1970),

F = U Σ V^T, (7)

where Σ is an N × M diagonal matrix composed of a set of non-zero, non-increasing elements Σ_jj = ν_j such that ν_1 ≥ ν_2 ≥ ... ≥ ν_M. These diagonal elements are the singular values of F, defined as the square roots of the eigenvalues of F^T F and F F^T, both of which produce identical sets of non-vanishing eigenvalues. The U and V matrices are orthogonal (N × N) and (M × M) matrices respectively. We denote the columns of these matrices as u_i and v_j, the left and right singular basis vectors.
These vectors are the sets of eigenvectors of the square matrices F F^T (N × N) and F^T F (M × M).

It is straightforward to write the solution to the system defined by Equation (6) using the SVD in the absence of regularization (λ = 0), expressing the solution as a sum over the basis vectors v_j:

s = (F^T F)^(−1) F^T d̂ = V Σ^(−1) U^T d̂ = Σ_j (u_j^T d̂ / ν_j) v_j. (8)

In this equation, we have written the SVD in terms of sums over the orthogonal columns of U and V, and the entries of Σ^(−1), which is simply defined as an M × N diagonal matrix with non-zero elements Σ^(−1)_jj = 1/ν_j. The SVD allows us to express s as an expansion over the orthogonal basis v_j.

The matrix F will have small singular values such that ν_j → 0, and s may then become corrupted by the noise contained in the data vector d̂. This amplification of noise due to small singular values is the reason why regularization is required in Equation (6).

The simplest regularization scheme simply truncates the terms that arise from small singular values from the sum in Equation (8). Since the singular values form a non-increasing set, this corresponds to discarding all terms j ≥ k, where k is the truncation threshold. Early termination of the sum removes the high frequency components of the basis vectors v_j. This is known as the truncated singular value decomposition, or TSVD:

s_φ = Σ_j φ_j (u_j^T d̂ / ν_j) v_j, (9)

where the φ_j are a set of constants called the filter factors, equal to 1 for terms j < k and 0 for all terms above this threshold. However, terminating the summation abruptly may discard too much high frequency information. A more general choice is to gradually decrease the contribution of small singular value terms to the sum. This approach is called Tikhonov regularization, which amounts to a modification of the filter factors (Tikhonov 1963):

φ_j = ν_j^2 / (ν_j^2 + λ), (10)

where λ is the regularization constant. Note that φ_j ≈ 1 when ν_j^2 ≫ λ, which occurs for small j.
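The filtered solutions of Equations (8)-(10) can be sketched directly with an SVD; the filter convention φ_j = ν_j^2/(ν_j^2 + λ) below follows Equation (10):

```python
import numpy as np

def filtered_solution(F, d_hat, filters):
    """Filtered SVD solution s_phi = sum_j phi_j (u_j^T d_hat / nu_j) v_j,
    where `filters` maps the singular values nu to filter factors phi."""
    U, nu, Vt = np.linalg.svd(F, full_matrices=False)
    phi = filters(nu)
    return Vt.T @ (phi * (U.T @ d_hat) / nu)

tsvd = lambda k: (lambda nu: (np.arange(nu.size) < k).astype(float))  # Eq. (9)
tikhonov = lambda lam: (lambda nu: nu**2 / (nu**2 + lam))             # Eq. (10)
```

With all filter factors equal to 1 (k = M, or λ = 0) this reproduces the unregularized least-squares solution of Equation (8).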
When ν_j^2 is smaller than the regularization constant (large j), the filter factors damp the corresponding terms of Equation (8) as φ_j ≈ ν_j^2/λ. Thus, λ must be assigned a value between the maximum and minimum squared singular values, ν_1^2 and ν_M^2. This regularization scheme corresponds to setting the matrix H = I in Equation (6) (Twomey 1963; Tikhonov 1963). Regularization modifies the system that we are attempting to solve, so that the inverse of Equation (7) becomes

F_φ^− = (F^T F + λ I)^(−1) F^T = V Σ^(−1) Φ U^T, (11)

where Φ is the N × N diagonal matrix of filter factors, with diagonal elements φ_i = 0 for i > M.

Note that these schemes do not specify how much regularization should be included for a given problem. The strength of the regularizing effect is controlled by the value of the regularization constant λ in Tikhonov regularization and by the truncation index k in the TSVD scheme. The regularization constant is a "hyperparameter" which must be selected a priori. Fortunately, several methods exist to estimate the optimal regularization parameter for a given problem (Hansen 2010).

2.1. Regularization Parameter Selection Methods

A widely used technique to select a regularization parameter is the L-curve criterion (Hansen 1992), which we used in Rogers & Fiege (2011a). The L-curve is a plot of the residual versus the regularization term that appears in Equation (6), and is named for the characteristic shape of the resulting curve. The L-curve is parameterized by the regularization constant λ, and the position on the plot with the largest curvature represents a balance between the image χ^2 and the regularization term (Press et al. 2007). This does not imply that an optimally regularized solution has a reduced χ^2 exactly equal to 1, but it should trade off between the amount of source structure and the quality of the fit.

As an alternative to the L-curve, another well-known regularization selection method is generalized cross validation (GCV; Golub, Heath & Wahba 1979).
This is a statistical method that aims to minimize the mean square error, ||F s_φ − d̂||^2, where s_φ is the optimally regularized solution. We now define the GCV function:

G(λ) = ||d̂ − F s_φ||^2 / [trace(I_N − F F_φ^−)]^2, (12)

where I_N is the N × N identity matrix. This equation is based on statistical arguments that consider a solution to be properly regularized when it can predict elements of the data vector that have been omitted (Hansen 1997). The trace term in the denominator can be dramatically simplified given the definition of F_φ^− in terms of the SVD (Equation (11)). The denominator of the GCV function becomes

trace(I_N − F V Σ^(−1) Φ U^T) = trace(I_N − U Φ U^T) (13)

using the SVD expansion of F (Equation (7)). With the orthogonality of U and the diagonality of Φ, the trace term simplifies dramatically. We are left with trace(I_N − Φ), such that

trace(I_N − F F_φ^−) = N − Σ_i φ_i. (14)

This sum represents the number of degrees of freedom in the problem. Putting these arguments together, the GCV function becomes

G(λ) = ||F s_φ − d̂||^2 / (N − Σ_i φ_i)^2. (15)

Wahba (1977) showed that when the errors in the data vector are unbiased white noise with covariance matrix C = σ^2 I_N, and satisfy the discrete Picard condition (Kress 1989; Engl et al. 1996), the minimum of the GCV function corresponds to a regularization parameter that is a good estimator of the optimal λ, and approaches this value asymptotically as N → ∞. The convergence between the true solution of a test problem and the GCV-regularized solution has also been thoroughly explored when these conditions are not satisfied (Vogel 1987; Lukas 1993).

The denominator of the GCV function has special significance for gravitational lens modeling. Lens modeling schemes that pixelate the source plane have been criticized for relying on regularization, since smoothing causes the number of degrees of freedom in the source to become undetermined (Kochanek et al. 2004).
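An SVD-based evaluation of the GCV function of Equation (15) can be sketched as follows; the synthetic ill-conditioned test problem is our own:

```python
import numpy as np

def gcv(F, d_hat, lam):
    """G(lam) = ||F s_phi - d_hat||^2 / (N - sum_i phi_i)^2, with
    Tikhonov filter factors phi_j = nu_j^2 / (nu_j^2 + lam)
    (Equations (10) and (15))."""
    U, nu, Vt = np.linalg.svd(F, full_matrices=False)
    phi = nu**2 / (nu**2 + lam)
    s_phi = Vt.T @ (phi * (U.T @ d_hat) / nu)
    return np.sum((F @ s_phi - d_hat)**2) / (d_hat.size - phi.sum())**2

# Choose lambda by minimizing G on a logarithmic grid.
rng = np.random.default_rng(3)
F = rng.standard_normal((40, 10)) @ np.diag(np.logspace(0, -3, 10))
d_hat = F @ rng.standard_normal(10) + 0.01 * rng.standard_normal(40)
lams = np.logspace(-8, 1, 50)
lam_gcv = lams[np.argmin([gcv(F, d_hat, l) for l in lams])]
```

The denominator N − Σφ is exactly the trace term of Equation (14), so the same routine also reports the effective number of degrees of freedom.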
Suyu et al. (2006) give an estimate for the number of effective degrees of freedom based on Bayesian arguments. In that work the authors construct a variety of possible expressions for the number of degrees of freedom (NDF), and chose NDF = N − γ, with N the number of image pixels and

γ = Σ_{i=1}^{M} ν_i^2 / (ν_i^2 + λ), (16)

which corresponds to Tikhonov (zeroth order) regularization when H = I in Equation (6). This expression was selected as the correct number of degrees of freedom based on an empirical test that produced a reduced χ^2 nearest to 1 for a simulated test problem (see Table 1, Suyu et al. 2006). In fact, γ is simply the sum of the filter factors from Tikhonov regularization. The GCV function gives a statistical argument for choosing this value based on the nature of an optimally regularized source inversion.

In addition to the GCV method, the unbiased predictive risk estimator (UPRE; Mallows 1973) has also been used to select the regularization parameter in deconvolution problems (Vogel 2002; Bardsley 2006; Lin & Wohlberg 2008). The UPRE method was initially developed for model selection in linear regression, though variations of this approach have been subsequently applied to the solution of inverse problems. A concise derivation of the method can be found in Vogel (2002); here we simply define the UPRE function

U(λ) = ||F s_φ − d̂||^2 + 2 trace(F F_φ^−) − N. (17)

In analogy with the denominator of the GCV function, we identify the trace term with the sum of the filter factors, γ. The optimal regularization parameter is chosen as the value of λ that minimizes U(λ).

Iterative methods complicate the calculation of the GCV and UPRE functions, since we do not know the filter factors a priori, nor do we have the decomposition of F, which can be expensive to compute due to the sparsity and size of the matrix. In this case, we estimate the denominator by a Monte Carlo method (Girard 1989).
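The UPRE function of Equation (17) admits the same SVD shortcut, with trace(F F_φ^−) = Σ_i φ_i = γ; this sketch assumes noise-whitened data (unit variance):

```python
import numpy as np

def upre(F, d_hat, lam):
    """U(lam) = ||F s_phi - d_hat||^2 + 2 * trace(F F_phi^-) - N,
    where trace(F F_phi^-) = sum_i phi_i = gamma (Eqs. (16)-(17))."""
    U, nu, Vt = np.linalg.svd(F, full_matrices=False)
    phi = nu**2 / (nu**2 + lam)
    s_phi = Vt.T @ (phi * (U.T @ d_hat) / nu)
    return np.sum((F @ s_phi - d_hat)**2) + 2.0 * phi.sum() - d_hat.size
```

As with the GCV sketch, the optimal λ is taken as the minimizer of U over a grid of candidate values.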
This approach provides two advantages: we approximate the number of source degrees of freedom while simultaneously finding an approximation to the optimal regularization parameter. Using an iterative method, we find these quantities as we solve for the source intensity distribution. This is accomplished by running iterations on both d̂ and d̃, where the vector d̃ is a discrete white-noise probe vector composed of elements that are ±1, and computing d̃^T r̃, where r̃ = d̃ − F s̃_φ. This quantity approximates the denominator of the GCV function and therefore the number of degrees of freedom in the iterative problem (Girard 1989; Hansen 1997). This calculation requires twice the work during the iterative process and therefore effectively doubles the execution time of the code to solve for the source intensity function. However, since we generally need only a small number of iterations to solve a gravitational lens system, this extra work is acceptable given the amount of information the calculation provides. By using this Monte Carlo estimate, we find the number of effective degrees of freedom and evaluate Equation (12) at each iteration. Once we have evaluated an arbitrary number of iterations, we find the minimum of the GCV function and select the critical number of iterations necessary to produce an optimally regularized source. The estimate of the number of source degrees of freedom and the residual at this iteration are used to evaluate the reduced χ^2 of the lens model. A similar procedure can be used to evaluate the UPRE function, using d̃^T F s̃_φ to approximate γ, the trace term in Equation (17).

Rogers & Fiege (2011a) explored the L-curve method for the selection of regularization parameters in gravitational lens modeling, arguing that the L-curve provides a useful parameter selection method that yields results which are easy to interpret.
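The Monte Carlo degrees-of-freedom estimate can be sketched as follows; here `solve` stands for whatever (possibly iterative) regularized solver maps data to a source, and the ±1 probe vector follows the Girard (1989)-style trace estimation described above:

```python
import numpy as np

def mc_dof(solve, F, N, rng, n_probe=1):
    """Estimate trace(I - F F^-) = N - sum_i phi_i: for a random +/-1
    probe vector d_tilde, E[d_tilde^T (d_tilde - F s_tilde)] equals the
    trace, where s_tilde = solve(d_tilde) is the regularized solution
    computed for the probe."""
    total = 0.0
    for _ in range(n_probe):
        d_tilde = rng.choice([-1.0, 1.0], size=N)
        r_tilde = d_tilde - F @ solve(d_tilde)
        total += d_tilde @ r_tilde
    return total / n_probe
```

A single probe already gives an unbiased estimate, which is why the paper's per-iteration cost is only a factor of two; averaging over probes merely reduces the variance.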
However, using this selection criterion can be difficult due to the curvature calculation, which requires spline fitting of the points on the L-curve and computing the curvature of the resulting smoothed curve. This calculation is non-trivial, and the results can be somewhat sensitive to the details of the fitting procedure. The GCV and UPRE functions require more involved statistical arguments, but provide more robust and reliable selection methods, since the functions are calculated at each iteration simultaneously with the linear optimization. We find that the GCV and UPRE methods produce results that are consistent with one another, indicating that both can be used effectively to determine the optimal termination condition for the iterative solver. We prefer evaluating the GCV and UPRE functions to the L-curve method for the reasons outlined above, and focus on these parameter selection routines in this study.

3. THE SLACS SURVEY

The Sloan Lens ACS Survey (SLACS) was conducted using the Hubble Space Telescope ACS instrument (Bolton et al. 2006). The survey has detected 70 early type galaxies with definite lensed sources, with lens redshifts starting at z = 0.06, observed in I-band images. See Bandara et al. (2009) for more details on the reduction procedure.

4. RESULTS

Bolton et al. (2008) modeled the SLACS gravitational lens systems using analytical Sérsic and Gaussian source models to describe the intensity distribution in the source plane. A subset of 15 of these systems was further investigated using the semilinear method (Koopmans et al. 2006). We focus on six of the SLACS lens systems in this paper, and plan to model more of them in the future. Since they have been well studied using several established methods, the SLACS galaxies provide a useful consistency check for verifying the results of our lens modeling code.

The SLACS systems are modeled using a singular isothermal elliptical mass density (SIE).
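The SIE deflection field used for these models can be sketched as follows; we assume the intermediate-axis conventions ψ = sqrt(q x^2 + y^2/q) and q_f = sqrt(1/q − q) of Kormann et al. (1994), matching Equations (18)-(19) below:

```python
import numpy as np

def sie_deflection(x, y, b, q):
    """SIE deflection angles, intermediate-axis normalization:
    psi = sqrt(q x^2 + y^2 / q),  q_f = sqrt(1/q - q),
    alpha_x = (b / q_f) * arctan(q_f x / psi),
    alpha_y = (b / q_f) * arctanh(q_f y / psi).
    In the limit q -> 1 this reduces to the singular isothermal
    sphere, alpha = b * theta / |theta|."""
    psi = np.sqrt(q * x**2 + y**2 / q)
    if np.isclose(q, 1.0):
        return b * x / psi, b * y / psi
    qf = np.sqrt(1.0 / q - q)
    return (b / qf) * np.arctan(qf * x / psi), (b / qf) * np.arctanh(qf * y / psi)
```

For the SIS limit the deflection magnitude is b everywhere, which provides a quick sanity check on the implementation.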
We define a distance ψ = sqrt(q x^2 + y^2/q), such that the deflection angle α = (α_x, α_y) is given by

α_x = (b/q_f) tan^(−1)(q_f x / ψ), (18)

α_y = (b/q_f) tanh^(−1)(q_f y / ψ), (19)

with q_f = sqrt(1/q − q) and Einstein radius b. In the limit q → 1, the model corresponds to a singular isothermal sphere with Einstein radius

b = 4π (σ_v^2/c^2) (D_ds/D_s), (20)

where σ_v is the velocity dispersion, c the speed of light, D_ds the distance between the deflector and the source, and D_s the distance between the observer and the source. These distances depend on the corresponding redshifts z_d and z_s, and determine angular diameter distances that depend on the cosmological model used. We assume a standard cosmology with Hubble constant H_0 = 70 km s^(−1) Mpc^(−1), matter density Ω_m = 0.3 and Ω_Λ = 0.7. Following Bolton et al. (2008), we adopt the intermediate-axis normalization of the SIE (Kormann, Schneider and Bartelmann 1994). This normalization fixes the mass within given isodensity contours for constant b, and is implemented in the deflection angles above. Koopmans et al. (2006) showed that the SIE is a useful model of early type isolated galaxies because the lens density ellipticity and orientation were found to align well with the surface brightness of the SLACS lens galaxies, indicating that light closely traces mass for these systems. No significant external shear was found to improve the fits. We therefore follow Koopmans et al. (2006) and adopt the SIE as a good model to represent the isolated early type E/S0 SLACS lens galaxies.

We cropped out the residuals left over from the surface brightness subtraction of the lens in the F814W SLACS data, and cropped the field of view to the region of interest, but performed no rebinning or other manipulation of the data in any way. Our lens models use the same ACS PSF that was used for the lens galaxy subtraction. Although it is known that the ACS PSF is position dependent (Bandara et al.
2009), we simplify our treatment by assuming a constant PSF over the region of interest, though we have previously developed methods to include spatially variant PSFs in the gravitational lens problem (Rogers & Fiege 2011b). We output the sigma image from the GALFIT code (Peng et al. 2010) that corresponds to the region of interest to estimate the errors on the image plane. We emphasize that the main focus of this work is to study the regularizing properties of the CGLS method on the derived solutions, with the GCV and UPRE schemes used to select the optimal level of regularization.

Our analysis initially solves for the parameters of an analytical source model, which we use as an approximate solution for a more refined model that uses a pixelated source. We start by treating the source plane intensity distribution as a sum of Sérsic profiles, using the same number of analytical source components to model each system as in Bolton et al. (2008). The SIE lens is used to find the lensed image of the source plane, which is convolved with the appropriate ACS PSF. We search for the global minimum of χ^2, using the Ferret GA (Fiege 2010) to fit both the lens density and source parameters. Once we find an approximation to the global minimum, we select a volume of lens parameter space in the neighborhood around the best fit lens model. Noting that Ferret is used predominantly as a bounded optimizer, this neighborhood becomes the search volume in the next step of our method, which replaces our analytical source model with a pixelated source. The optimization of a pixelated model requires a new Ferret run, which begins with the search volume found in the previous step populated initially by random lens models. Normally, we expect the lowest χ^2 model to reside within this volume; however, we configure the optimizer using "soft" boundaries, which allows the GA to move outside of the predefined search volume if the initial approximation is bounded too tightly.
This option allows Ferret to expand the search space if a large fraction of the GA population occupies positions close to the boundaries of the parameter space. In general, the lens parameters of our pixelated sources were found to reside within these search volumes and agree well with the analytical approximations.

TABLE 1
Lens model parameters for a subset of the SLACS systems found by the Ferret GA with source reconstruction by the CGLS routine. Columns list the SDSS system, z_d, z_s, σ_v (km s^−1), q, and b (arcsec). (The numerical entries of this table are not recoverable in this version of the text.)

We compute our best refined model by optimizing the lens and source plane parameters using a pixelated source, with the regularizing iteration selected by the GCV and UPRE functions.

In addition to the regularizing effect of truncated iteration, we have found that enforcing non-negativity in the source solutions dramatically improves the quality of the reconstruction and tends to further reduce remaining structure in the image residuals. As a final step, we have modeled the set of best-fit lens models with the modified residual norm steepest descent algorithm (MRNSD; Kaufman 1993; Nagy & Strakoš 2000; Bardsley 2006). This algorithm is a bounded SD optimization routine that seeks sources with s_j ≥ 0. In practice we have found that the MRNSD method can produce residuals which decrease in a step-like manner, making the determination of the minimum difficult for the GCV and UPRE functions. This zig-zag behavior has also been noted by Favati et al. (2010) in the context of the standard deconvolution problem.

Combining analytical and pixelated sources greatly improves the efficiency of the search, since analytical models can be evaluated very quickly. Searching using pixelated sources is a more intensive process, and time can be saved by adopting the semilinear method only once we have a good approximation to the lens parameters corresponding to the minimum χ^2.
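The effect of a non-negativity constraint can be illustrated with a much simpler stand-in for MRNSD: projected steepest descent on ||Fs − d||^2 with clipping to s ≥ 0 (our simplification; MRNSD itself uses a different scaling of the descent direction):

```python
import numpy as np

def nonneg_sd(F, d, n_iter=500):
    """Projected steepest descent for min ||F s - d||^2 subject to s >= 0.
    Each step takes an exact-line-search steepest descent update, then
    clips the iterate to the non-negative orthant."""
    s = np.zeros(F.shape[1])
    for _ in range(n_iter):
        g = F.T @ (F @ s - d)        # gradient of 0.5 * ||F s - d||^2
        Fg = F @ g
        denom = Fg @ Fg
        if denom == 0.0:             # zero gradient: stationary point
            break
        s = np.maximum(s - (g @ g) / denom * g, 0.0)
    return s
```

When the unconstrained least-squares solution has negative pixels, the constrained solver pins them to zero instead of reconstructing unphysical negative flux.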
Rogers & Fiege (2011a) noted that a set of trivial pixelated solutions exists when global optimization methods are used to model lensed systems. These trivial solutions are found when the effect of the lens is reduced, resulting in sources that closely resemble the data. The two-stage optimization process is useful since the initial analytical sources are generally not as flexible as pixelated sources, and thus provide a natural way of avoiding exploration of the trivial regions of the parameter space. The analytical stage of the algorithm terminates once the GA has converged and we no longer see improvement in the population. Typically, convergence requires only 50-100 generations using a population of 300 individuals for the analytical portion of the optimization, and approximately 100 iterations for the second, semilinear optimization stage.

The final velocity dispersion σ_v, axis ratio q, and Einstein radius b of our models are shown in Table 1. The reduced data, model image, recovered non-negative source and residuals are shown in Figures 1 and 2. Our results agree with the SLACS lens models for each system to within 3% in velocity dispersion σ_v. The pixelated and analytical source plane intensity distributions agree with one another in all cases. Our lens modeling results agree with the parameters in Bolton et al. (2008) very well. The reduced χ^2 statistic for all systems is very close to unity.

The Einstein radius (velocity dispersion) and ellipticity of SDSS J0737+3216 are similar to the model from Bolton et al. (2008), and share a velocity dispersion similar to Marshall et al. (2007). However, the recovered ellipticity is smaller than both Koopmans et al. (2006) and Marshall et al. (2007). We found a lower ellipticity from both the initial analytical source fit and from pixelated source modeling. To illustrate the difference between analytical and pixelated source lens models, we show the lens parameter space in Figure 5.
Points in this figure are shaded according to confidence interval, and demonstrate that the 1σ, 2σ and 3σ confidence regions are larger for the pixelated source than for the analytical source. This shows that a more flexible pixelated source can broaden the error bars on the lens parameters when compared to an analytical description. We have observed this result for all the lens systems that we studied. The SDSS J0912+0029 data are heavily contaminated with noise, although they are adequately fit by our GCV and UPRE regularized solution, and our analytical and pixelated sources agree. Of all of the systems, SDSS J0956+5100 and SDSS J0737+3216 show the most structure in the residuals, although the magnitude of these residuals is small.

We have used the GCV and UPRE approaches with both CGLS and SD, and find similar results for both of these algorithms. The SD routine takes much longer than the CGLS method to converge, although it is in general a more stable approach to regularization and has been suggested as a superior routine for image deblurring problems due to its reduced sensitivity to the stopping criterion (Nagy & Palmer 2003). The best fit MRNSD solutions are found by comparing the solution at each iteration to the optimally regularized CGLS solution. This comparison minimizes

z = ||x_CGLS − x_MRNSD,k|| / ||x_CGLS||, (21)

where x_CGLS is the optimally regularized CGLS solution and x_MRNSD,k the non-negative solution at the k-th iteration of the MRNSD algorithm. In general, the MRNSD residuals appear smoother than the residuals of the CGLS models. This is due to the reconstruction of back-traced noise present in the CGLS solutions. The filter factors of the CGLS method are given by a recursion relation that depends on all of the singular values (Hansen 2010).
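Equation (21) and the associated iterate selection can be sketched as follows (the variable names are hypothetical):

```python
import numpy as np

def rel_diff(x_cgls, x_k):
    """z = ||x_CGLS - x_MRNSD,k|| / ||x_CGLS|| (Equation (21))."""
    return np.linalg.norm(x_cgls - x_k) / np.linalg.norm(x_cgls)

def select_iterate(x_cgls, iterates):
    """Index of the stored non-negative iterate closest to the
    optimally regularized CGLS solution."""
    return int(np.argmin([rel_diff(x_cgls, x_k) for x_k in iterates]))
```

This sidesteps the step-like residual behavior noted above, since the selection compares solutions rather than residual norms.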
Even though CGLS tends to suppress high-frequency noise at the beginning of the optimization process, the high-frequency components are not completely damped out at any given iteration and build up over the course of a run. Hence, even the optimally regularized solution still contains some high-frequency components that correspond to back-traced noise. The MRNSD algorithm appears to be more robust to the propagation of high-frequency noise in the recovered non-negative solutions, thus producing images that are naturally smoother than the corresponding CGLS sources.

Regularization by truncated iteration in the context of Krylov optimization is the simplest of many regularization methods that can be used. Truncated-iteration regularization produces solutions (Figures 1 and 2) that are less smooth than the second-order (curvature) regularization used in Koopmans et al. (2006). It has been suggested that the LSQR algorithm (Björck 1996) can generally accommodate more complicated regularization schemes with increased numerical stability when the system is poorly conditioned. The LSQR routine is an iterative Krylov subspace method that solves least-squares problems using QR decomposition (Paige & Saunders 1982). We previously tested LSQR in the context of gravitational lens modeling using simulated data with the L-curve (Rogers & Fiege 2011a), and suggest that this scheme may provide a higher level of control over regularizing effects than the truncated-iteration scheme used in this study.

Figure 3 illustrates the regularizing behavior of the CGLS routine as a function of iteration k using SDSS J0216-0813.

TABLE 2
A comparison of Bayesian-selected regularization using the semilinear method with the GCV and UPRE functions. N_img and N_src are the number of pixels in the image plane and source plane. The column marked "Reg." gives the regularization type for each system (I, intensity; G, gradient; C, curvature; Iter., truncated iteration), N_eff is the effective number of degrees of freedom for the corresponding lens model in Table 1, and γ gives the effective number of source degrees of freedom. The reduced χ² is given in the rightmost column for each method. The GCV and UPRE functions gave identical results for each of the systems using the CGLS method. For these calculations we estimated N_eff and γ using the Monte Carlo approach described in Section 2.1.

System       N_img   N_src   Reg.    N_eff     γ        χ²
J0037-0942   1072    895     I       708.90    363.10   0.94
                             G       799.35    272.65   1.03
                             C       845.29    226.71   …
[The remaining entries, including all rows for the other five systems, are garbled in the extracted text.]

The solutions selected by the GCV and UPRE functions, which correspond with one another in all of our test cases in Table 1, are marked with circles. These selection schemes both favor solutions from early iterations. The lower right panel plots the L-curves using all three types of regularization. Note that in this case the L-curves from gradient and curvature regularization match the GCV and UPRE solutions. However, we often observe that the L-curve can show false curvature maxima when iterative optimizers make rapid progress early in the run, leading to dramatically over-regularized solutions. The GCV and UPRE functions avoid this problem. Combined with the statistical arguments used to derive the GCV and UPRE functions and the more robust behavior of these regularization parameter selection methods, we conclude that the GCV and UPRE approaches are more useful than the L-curve methodology for solving the least-squares source deconvolution problem for gravitational lens systems.
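As an illustration of this style of parameter selection, the sketch below applies the GCV function to a synthetic ill-posed problem (our own example, not the paper's code). Truncated SVD is used in place of truncated CGLS so that the effective number of degrees of freedom is exactly the truncation level k; the GCV minimum then selects k near the point where the signal drops below the noise:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic ill-posed problem: orthogonal factors with a sharply
# decaying singular spectrum, plus additive Gaussian noise.
m, n = 60, 40
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = np.exp(-0.5 * np.arange(n))              # singular values
A = (U[:, :n] * s) @ V.T
x_true = V @ (1.0 / (1.0 + np.arange(n)))    # smooth true solution
b = A @ x_true + 1e-3 * rng.standard_normal(m)

Ub = U[:, :n].T @ b

def x_tsvd(k):
    """Truncated-SVD solution keeping the k largest singular values."""
    return V[:, :k] @ (Ub[:k] / s[:k])

def gcv(k):
    """GCV function; for TSVD the effective degrees of freedom equal k."""
    r = b - A @ x_tsvd(k)
    return (r @ r) / m / (1.0 - k / m) ** 2

ks = np.arange(1, n)
k_gcv = int(ks[np.argmin([gcv(k) for k in ks])])
err_gcv = np.linalg.norm(x_tsvd(k_gcv) - x_true)
err_full = np.linalg.norm(x_tsvd(n - 1) - x_true)
```

In this example the GCV-selected solution is far more accurate than the nearly unregularized k = n − 1 solution, whose small singular values amplify the noise enormously; the UPRE function can be minimized over k in exactly the same way when the noise variance is known.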
The comparison of our results with the optimally regularized Bayesian solutions is shown in Table 2 for each of the systems that we modeled.

We have marked two additional points on the GCV curve in Figure 3. These points signify over-regularized and under-regularized solutions. The sources corresponding to these solutions for SDSS J0216-0813 are shown in Figure 4.

Iterative Krylov methods permit algorithmic substitutions for the explicit blurring matrix B and lensing matrix L while preserving the least-squares form of the problem. Therefore, even large-scale systems that would prohibit the direct application of the semilinear method in matrix form can be practically solved and optimally regularized by an iterative Krylov method (Rogers & Fiege 2011a,b). For example, Alard (2009) has modeled the cluster lens SL2SJ021408-053532, which comprises a small group of 6 galaxies and produces a set of large lensed arcs. As noted in that work, the large size of the system prevents direct application of the semilinear method. However, iterative approaches with algorithmic substitutions for the explicit representations of the blurring and lensing matrices can accommodate large-scale lensing problems that are realistic for a number of practical modeling situations.

CONCLUSIONS

We have used iterative Krylov methods to model a subset of the SLACS lenses, using the GCV and UPRE functions to select the optimal regularizing iteration. We addressed the problem of the number of effective degrees of freedom in the source by making use of parameter-choice methods that are commonly used in standard image deconvolution problems. This approach reproduces a key result from Suyu et al. (2006) that was originally derived using Bayesian methods. The GCV and UPRE functions shed light on the concept of optimally regularized sources and provide an efficient method to select regularization parameters for iterative schemes. A non-negative bounded iterative algorithm is found to significantly improve the quality of the reconstructed sources.
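A minimal sketch of one such non-negative iteration, patterned on the MRNSD scheme of Nagy & Strakoš (2000) (a generic illustration with an assumed toy matrix A, not the authors' lensing implementation): the steepest-descent direction is scaled elementwise by the current iterate, and the line search is capped at the non-negativity boundary, so a strictly positive starting guess can never go negative.

```python
import numpy as np

def mrnsd(A, b, x0, n_iter=500):
    """MRNSD-type iteration: steepest descent on ||A x - b||^2 with
    the search direction scaled elementwise by the current iterate
    and the step capped at the non-negativity boundary."""
    x = x0.astype(float)
    for _ in range(n_iter):
        g = A.T @ (A @ x - b)    # gradient of 0.5 ||A x - b||^2
        d = -x * g               # scaled descent direction
        Ad = A @ d
        denom = Ad @ Ad
        if denom <= 1e-30:       # direction vanished: stationary point
            break
        alpha = (g @ (x * g)) / denom      # unconstrained line minimum
        neg = d < 0
        if neg.any():                      # step at most to the boundary
            alpha = min(alpha, np.min(-x[neg] / d[neg]))
        x = x + alpha * d
    return x

# Toy problem whose unconstrained least-squares solution has a
# negative component; MRNSD returns the non-negative optimum instead.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, -1.0, 0.0])
x_nn = mrnsd(A, b, x0=np.ones(2))   # x_nn == [0.5, 0.0]
```

Because every operation is a matrix-vector product, the explicit A can be replaced by black-box routines that apply the lensing and blurring operators, in keeping with the algorithmic substitutions discussed above.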
This approach provides non-negative solutions through linear optimization, which is significantly simpler to implement than other constrained optimization techniques, such as the maximum entropy method (Skilling & Bryan 1984; Wayth & Webster 2006), that require the use of more complicated non-linear optimization schemes.

The lens parameters recovered by the Ferret GA are similar to previously published results found by Bolton et al. (2008), and we find consistency between the pixelated source reconstructions and analytical approximations to the source plane intensity based on a sum of Sérsic profiles. We plan to investigate a larger sample of the SLACS lenses in the future and to explore other local optimization methods to solve the least-squares problem with a variety of regularization schemes.

ACKNOWLEDGEMENTS

The authors are grateful to Kaushala Bandara, who supplied the reduced SLACS observations used in this study. We would also like to thank the anonymous referee, whose thoughtful comments and suggestions significantly expanded the scope of this work and improved the overall flow of the paper. A.R. acknowledges NSERC for funding this research, and J.F. acknowledges funding from an NSERC Discovery Grant.

REFERENCES

Alard, C. 2009, A&A, 506, 2, 609
Bandara, K., Crampton, D., & Simard, L. 2009, ApJ, 704, 1135
Bardsley, J. M. 2006, BIT Numer. Math., 48, 4, 651
Björck, Å. 1996, Numerical Methods for Least Squares Problems (Philadelphia, PA: SIAM)
Bolton, A. S., Burles, S., Koopmans, L. V. E., Treu, T., & Moustakas, L. A. 2006, ApJ, 638, 2, 703
Bolton, A. S., Burles, S., Koopmans, L. V. E., Treu, T., Gavazzi, R., Moustakas, L. A., Wayth, R., & Schlegel, D. J. 2008, ApJ, 682, 946
Brewer, B. J., Lewis, G. F., Belokurov, V., Irwin, M. J., Bridges, T. J., & Evans, N. W. 2011, MNRAS, 412, 4, 2521
de Vaucouleurs, G. 1948, Ann. d'Astrophys., 11, 247
Dye, S., & Warren, S. J. 2005, ApJ, 623, 1
Engl, H. W., Hanke, M., & Neubauer, A. 1996, Regularization of Inverse Problems (Dordrecht, Netherlands: Kluwer)
Favati, P., Lotti, G., Menchi, O., & Romani, F. 2010, Inverse Problems, 26, 8, 085013
Fiege, J. D. 2010, Qubist Users Guide: Optimization, Data Modeling, and Visualization with the Qubist Optimization Toolbox for MATLAB (Winnipeg, Canada: nQube Technical Computing)
Girard, D. 1989, Numer. Math., 56, 1
Golub, G. H., & Reinsch, C. 1970, Numer. Math., 14, 403
Golub, G. H., Heath, M., & Wahba, G. 1979, Technometrics, 21, 215
Hansen, P. C. 1992, SIAM Rev., 34, 561
Hansen, P. C. 1994, Numer. Algorithms, 6, 1
Hansen, P. C. 1997, Rank-Deficient and Discrete Ill-Posed Problems (Philadelphia, PA: SIAM)
Hansen, P. C. 2010, Discrete Inverse Problems: Insight and Algorithms (Philadelphia, PA: SIAM)
Hutchinson, M. F. 1990, Commun. Stat. Simul. Comput., 19, 2, 433
Kaufman, L. 1993, IEEE Trans. Med. Imag., 12, 2, 200
Kayser, R., & Schramm, T. 1988, A&A, 191, 39
Kochanek, C. S., Schneider, P., & Wambsganss, J. 2004, Proc. 33rd Saas-Fee Adv. Course, Part 2, ed. G. Meylan, P. Jetzer, & P. North (Berlin: Springer)
Koopmans, L. V. E. 2005, MNRAS, 363, 1136
Koopmans, L. V. E., Treu, T., Bolton, A. S., Burles, S., & Moustakas, L. A. 2006, ApJ, 649, 2, 599
Kormann, R., Schneider, P., & Bartelmann, M. 1994, A&A, 284, 285
Kress, R. 1989, Linear Integral Equations (Berlin: Springer)
Lin, Y., & Wohlberg, B. 2008, Proc. IEEE Southwest Symposium on Image Analysis and Interpretation (Santa Fe, NM: IEEE Computer Society), 89
Lukas, M. A. 1993, Numer. Math., 66, 41
Mallows, C. L. 1973, Technometrics, 15, 661
Marshall, P., Treu, T., Melbourne, J., et al. 2007, ApJ, 671, 2, 1192
Nagy, J., & Strakoš, Z. 2000, in Mathematical Modeling, Estimation, and Imaging, ed. D. C. Wilson, Proc. SPIE, 4121 (Bellingham, WA: SPIE), 182
Nagy, J. G., & Palmer, K. M. 2003, BIT Numer. Math., 43, 1003
Nagy, J. G., Palmer, K. M., & Perrone, L. 2002, Numer. Alg., 36, 73
Paige, C. C., & Saunders, M. A. 1982, ACM Trans. Math. Softw., 8, 2
Peng, C. Y., Ho, L. C., Impey, C. D., & Rix, H.-W. 2010, AJ, 139, 2097
Petters, A. O., Levine, H., & Wambsganss, J. 2001, Singularity Theory and Gravitational Lensing (Boston, MA: Birkhäuser)
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. 2007, Numerical Recipes: The Art of Scientific Computing (3rd ed.; New York: Cambridge Univ. Press)
Rogers, A., & Fiege, J. D. 2011a, ApJ, 727, 2, 80
Rogers, A., & Fiege, J. D. 2011b, ApJ, 743, 1, 68
Schneider, P. 1985, A&A, 143, 413
Schneider, P., Ehlers, J., & Falco, E. E. 1992, Gravitational Lenses (Berlin: Springer)
Sérsic, J. L. 1968, Atlas de Galaxias Australes (Córdoba: Observatorio Astronómico)
Simard, L., et al. 2002, ApJS, 142, 1
Skilling, J., & Bryan, R. K. 1984, MNRAS, 211, 111
Suyu, S. H., Marshall, P. J., Hobson, M. P., & Blandford, R. D. 2006, MNRAS, 371, 983
Tikhonov, A. N. 1963, Sov. Math., 4, 1035
Treu, T., & Koopmans, L. V. E. 2004, ApJ, 611, 739
Twomey, S. 1963, J. Assoc. Comput. Mach., 10, 97
Tyson, J. A., Kochanski, G. P., & dell'Antonio, I. P. 1998, ApJ, 498, 107
Vegetti, S., & Koopmans, L. V. E. 2009, MNRAS, 392, 3
Vogel, C. R. 1987, Report, Dept. of Mathematical Sciences, Montana State University, Bozeman
Vogel, C. R. 2002, Computational Methods for Inverse Problems (Frontiers in Applied Mathematics Series, 23; Philadelphia, PA: SIAM)
Wahba, G. 1977, SIAM J. Numer. Anal., 14, 651
Warren, S. J., & Dye, S. 2003, ApJ, 590, 2, 673
Wayth, R. B., & Webster, R. L. 2006, MNRAS, 372, 3, 1187

Fig. 1.— A selection of SLACS gravitational lenses. The sources are non-negative and found using the MRNSD algorithm as the final polishing step. The columns show the data d, image model, source model s, and residual r, respectively.
The model parameters are given in Table 1. Top row: SDSS J0037-0942; second row: SDSS J0216-0813; bottom row: SDSS J0737+3216. Model images and sources in this figure were produced using an image pixel subsampling factor of 3.

Fig. 2.— Top row: SDSS J0912+0029; second row: SDSS J0956+5100; bottom row: SDSS J1402+6321. Model images and sources in this figure were produced using an image pixel subsampling factor of 3.

Fig. 3.— Top row: comparison of the difference between CGLS at each iteration k and Bayesian-selected solutions using intensity (top left panel), gradient (top center), and curvature regularization (top right). Early iterations correspond more closely to gradient and curvature regularized solutions, while later iterations more closely approximate intensity regularization. The GCV and UPRE functions select the iteration marked by circles, and the intensity L-curve picks the under-regularized solution marked with a square. Bottom row: selection methods as a function of iteration, including the GCV function (bottom left panel), the UPRE function (bottom center), and a set of L-curves found by using the intensity, gradient, and curvature norms of the CGLS solutions (bottom right). In this case the curvature and gradient L-curves select the same solution as the GCV and UPRE functions.
The iterations marked with triangles on the GCV plot represent over-regularized and under-regularized solutions, respectively.

Fig. 4.— Three solutions for SDSS J0216-0813 marked in the lower left-hand panel of Figure 3. These solutions correspond to over-regularized (left), critically-regularized (middle), and under-regularized (right) solutions as selected by the GCV function. Note the emphasis on back-traced noise in the under-regularized CGLS solution and the excessive smoothing of the over-regularized solution. Non-negative MRNSD solutions are shown on the second row.

Fig. 5.— An example of the lens parameter space mapping for J0737+3216 using the GA. Solutions within the 1σ contour are black, 2σ mid gray, and 3σ light gray. The top row shows the lens parameter space with a pixelated source (lens velocity dispersion σ_v, ellipticity q, and lens orientation angle θ