Using Convolutional Neural Networks to identify Gravitational Lenses in Astronomical images
Andrew Davies, Stephen Serjeant, Jane M. Bromley
School of Physical Sciences, The Open University, Walton Hall, Milton Keynes, MK7 6AA, UK
School of Computing & Communications, The Open University, Walton Hall, Milton Keynes, MK7 6AA, UK
Accepted XXX. Received YYY; in original form ZZZ
ABSTRACT
The Euclid telescope, due for launch in 2021, will perform an imaging and slitless spectroscopy survey over half the sky, to map baryon wiggles and weak lensing. During the survey Euclid is expected to resolve 100,000 strong gravitational lens systems. This is ideal for finding rare lens configurations, provided they can be identified reliably and on a reasonable timescale. For this reason we have developed a Convolutional Neural Network (CNN) that can be used to identify images containing lensing systems. CNNs have already been used for image and digit classification, as well as for star-galaxy classification in astronomy. Here our CNN is trained and tested on Euclid-like and KiDS-like simulations from the Euclid Strong Lensing Group, successfully classifying 77% of lenses, with an area under the ROC curve of up to 0.96. Our CNN also attempts to classify the lenses in COSMOS HST F814W-band images. After convolution to the Euclid resolution, we find we can recover most systems that are identifiable by eye. The Python code is available on GitHub.
Key words:
Gravitational Lensing – Classification – Neural Networks
1 INTRODUCTION

Gravitational lensing is caused by the mass of a foreground object, such as a galaxy or galaxy cluster, deflecting light from a more distant source object, such as a galaxy or quasar. Strong gravitational lensing is rare, with only a few systems expected from surveying thousands of objects (Blain 1996). The first strong gravitational lens system, QSO 0957+561, was recognised as such in 1979, when the spectra of two objects were compared and confirmed to be from the same source (Walsh et al. 1979).

The Jodrell Bank-Very Large Array (VLA) Astrometric Survey (JVAS) (Patnaik et al. 1992; Browne et al. 1997) and the Cosmic Lens All-Sky Survey (CLASS) (Browne et al. 2003; Myers et al. 2003) have detected 22 radio-loud lensed active galactic nuclei (Chae 2003). Currently the Sloan Lens ACS Survey (SLACS) has provided the most strongly lensed systems from a single survey, with nearly 100 observed (Bolton et al. 2008). Other sources of strong lenses include the Kilo-Degree Survey (KiDS) (de Jong et al. 2015), which uses the VLT Survey Telescope at the Paranal Observatory in Chile, the Dark Energy Survey (The Dark Energy Survey Collaboration 2005), and the Subaru Hyper Suprime-Cam Survey (Miyazaki et al. 2012), which is expected to observe thousands of lenses (Collett 2015).
E-mail: [email protected]
The BOSS Emission-Line Lens Survey (BELLS) has discovered at least 25 strong galaxy-galaxy gravitational lens systems, found spectroscopically by the presence of higher-redshift emission lines within the Baryon Oscillation Spectroscopic Survey (BOSS) of luminous galaxies, and confirmed with high-resolution Hubble Space Telescope (HST) images (Brownstein et al. 2012).

Lensing systems are extremely useful cosmological tools. Lensed systems can be used to constrain the value of the Hubble constant, H_0, by measuring the time delay (Refsdal 1964; Kochanek & Schechter 2004), which occurs because the light from the multiple images has taken different paths to reach the observer. Cosmological distances are proportional to c/H_0, meaning the time delay can be written Δt = k/H_0, where k is related to the lens mass model. If a lens model can be found then we can predict ΔtH_0, and a measured delay then yields H_0. Gravitational lensing is independent of the lensing object's luminosity and depends only on the mass of the lens and the geometry of the source and the lens relative to the observer. This makes lensing a unique tool for analysing the mass distribution in the foreground lens (Treu & Koopmans 2002). Using this dependence on mass alone, and combining mass models from mass-luminosity analysis, the baryonic and dark matter mass of the galaxy can be mapped to find dark matter substructure (Vegetti et al. 2012; Metcalf & Madau 2001). Gravitational lensing conserves surface brightness (a consequence of Liouville's theorem) but not the angular size of the source object (Marchetti et al. 2017), causing a magnification of the source object's flux if the image of the source is enlarged. This enables the observation of fainter galaxies which would otherwise be missed, including galaxies at high redshifts (Claeskens et al. 2006; Jackson 2011; Wyithe et al. 2011; Marchetti et al. 2017).

Future telescopes are expected to observe many more strongly lensed systems. The Euclid telescope (Laureijs et al. 2011) and the Large Synoptic Survey Telescope (LSST Science Collaboration et al. 2009) will greatly increase the total number of known systems (Oguri & Marshall 2010; Collett 2015). Euclid will map three-quarters of the extragalactic sky with 0.2 arcsecond resolution to 24 AB magnitude (Amendola et al. 2018). Another project, the Square Kilometre Array (SKA) (Rawlings & Schilizzi 2011), will take observations at a resolution of 2 milliarcseconds at 10 GHz, and 20 milliarcseconds at 1 GHz (Perley et al. 2009). The lensing surveys using SKA are expected to find large numbers of new radio-loud gravitational lenses (McKean et al. 2015; Serjeant 2014).

There is currently a shift in the methodology for detecting strongly lensed systems in astronomical images, as the number of lens candidates becomes much larger. Traditionally most images were found by eye. 112 lens candidates, and at least 2 certain lenses, were found in an HST legacy programme looking at the COSMOS field (Faure et al. 2008; Jackson 2008). But not all searching by eye has been carried out by people working in strong gravitational lensing. The public have been tasked with finding new lens candidates through the Space Warps citizen science project (Marshall et al. 2016). Space Warps made use of volunteers analysing 430,000 images by eye to look for lensing features, via an online webpage using the Zooniverse platform.
Tens of new lens candidates have been identified with the help of these volunteers, using large ground-based surveys, e.g. the Canada-France-Hawaii Telescope Legacy Survey (More et al. 2016). But with the growth of survey size there will be too many candidates to be examined by eye.

There have been several successful methods of computational searches for lenses. Arcfinder (Seidel & Bartelmann 2007) uses pixel-grouping methods to attempt to find cluster-scale lens systems. Ringfinder (Gavazzi et al. 2014) searches for blue residuals surrounding early-type galaxies using multi-band data, and there are several programs to find arc-like shapes (Lenzen et al. 2004; More et al. 2012).

In recent years there has been a rise in machine-learning methods to detect lenses. 56 lens candidates were found in the KiDS dataset using a convolutional neural network (Petrillo et al. 2017). Machine learning methods rely on large datasets in order to train and learn, something which has become available in recent years. Once a machine has been trained, thousands of images can be classified in seconds. Speed is an important factor given the number of images expected from Euclid (Collett 2015). Citizen science could be used to create a dataset of images for machine learning techniques to train on; once trained, citizen science can then be applied again to examine the output images from the machine learning technique.

In section 2 we discuss neural networks and why we use them for this problem. In section 3 we discuss the simulated data we have used. In section 4 we discuss our convolutional neural network and how we trained it, and the results are discussed in section 5. The Python code is available at GitHub (https://github.com/A-Davies/LensCNN).
Figure 1.
A perceptron with inputs x_1, x_2, x_3, x_4, weights w_1, w_2, w_3, w_4, bias b, and an output calculated by applying the activation function f to the sum of the weighted inputs and the bias.

2 NEURAL NETWORKS

Computers are very effective at tasks with a limited set of rules, such as chess. However, humans are still often better at real-world tasks which cannot easily be described by a set of rules, e.g. recognising objects. Artificial Neural Networks (NNs) are loosely inspired by how the brain works. They are made from simple computing elements with multiple inputs and one output, analogous to a brain made up from neurons, with dendrites and cell body receiving the inputs and the axon outputting a signal. Like the brain, a NN can modify the strengths of connections learnt from examples. Humans have evolved to be very fast and accurate at recognising objects, of order 100 ms (Thorpe et al. 1996). NNs, particularly Convolutional Neural Networks (CNNs), are the best available techniques in some tasks, e.g. translation and visual object recognition (LeCun et al. 2015). With current technology, a trained CNN can also perform these tasks faster than humans.

Neural Networks are built from individual artificial neurons called perceptrons. A perceptron is designed to simulate the role of a biological neuron, but with the advantage that a mathematical function can be used for the activation of the neuron (Aggarwal 2014). A perceptron takes a number of inputs, applies a weight to each, sums these products, applies a bias, and the result is used as input to an activation function, which then gives the perceptron's output. Perceptrons can accept multiple inputs and apply different weights to each individual input. A perceptron with example inputs and outputs can be seen in Fig. 1. Individual scalar inputs are grouped together as a 1-d vector. If we let x be the 1-d vector of inputs, w be the 1-d vector of weights associated with this perceptron, b be the bias, and f be the activation function, then the output z is calculated using:

z = f( \sum_{i=1}^{n} w_i x_i + b )    (1)

where n is the number of inputs to the perceptron. In a NN, many perceptrons are grouped together to form a layer containing from a few perceptrons to thousands. Layers use the outputs from previous layers as their inputs.
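As a concrete illustration of equation (1), the short NumPy sketch below evaluates a single perceptron's output; the input values, weights, bias and choice of sigmoid activation are arbitrary illustrative numbers rather than anything used by our networks.

```python
import numpy as np

def perceptron(x, w, b, f):
    # Equation (1): apply the activation f to the weighted sum of the inputs plus the bias.
    return f(np.dot(w, x) + b)

# Arbitrary illustrative values: four inputs, four weights, a bias and a sigmoid activation.
x = np.array([0.2, 0.5, 0.1, 0.9])
w = np.array([0.4, -0.3, 0.8, 0.1])
b = 0.05
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

print(perceptron(x, w, b, sigmoid))  # a single scalar output between 0 and 1
```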
Figure 2.
Diagram showing the architecture of the network and where the convolution layers sit with respect to the fully-connected neural network. Not all layers are shown.

Neural Networks have to be trained before they can be used. For classification problems this involves passing many images of known classification through the network, in a process called supervised learning. The network classifies each image, and this output is combined with the true classification in a function called the loss function. A high value for the loss function means many images were incorrectly labelled by the network. The goal of training the network is to minimise the loss function over a number of passes of the training data. Each pass of the data is known as an epoch. After each epoch, the weights and biases are changed using Stochastic Gradient Descent (SGD), in order to minimise the loss function and thereby increase the rate of correct classifications from the network. The rate at which SGD changes the weights and biases is controlled by a variable called the learning rate. This is a linear parameter which controls by how much the weights and biases are changed. Using Adam (Kingma & Ba 2014) instead of plain SGD allows the learning rate to change as the network learns: initially the learning rate starts off high, and it decreases as the network becomes more accurate and the loss function value decreases. A small subset of images is used for data validation. After every epoch the validation images are classified and the validation loss is recorded, calculated in the same way as the training loss. The network is not trained on the validation data, so no changes are made to the weights and biases. Validation is done to prevent over-training the network. Training is stopped once the validation loss has reached a minimum; the validation loss will increase after this point as the network becomes over-trained, and this extra training is detrimental to classifying new datasets. After training has been completed, new images can be classified by passing them through the network and obtaining a classification; classifying an image makes no changes to the network parameters. Batch training means that the weights and biases are updated after seeing only a fraction of the training set. The batch size is typically small compared to the number of training images. Batch training is used to speed up training, since the weights and biases are changed after each batch instead of at the end of each epoch.

CNNs are a subset of NNs which use convolutional layers in the network for feature recognition or classification. An example architecture of a CNN can be seen in Fig. 2. A convolutional layer involves a kernel being convolved with an input image in order to make a feature map. Often the convolution layer has several different kernels for the same image, meaning several different feature maps are given as output. The image kernel can either be pre-determined, or can be another parameter that the network trains and optimises. Different layers may have different image kernel sizes, as well as different sized image outputs. Between convolution layers, pooling layers are often inserted. A pooling layer is designed to greatly reduce the number of pixels in an input image, to speed up training and reduce the number of parameters. Pooling is generally done in one of two ways, max pooling or mean pooling. Both methods look at a small square section of the image and reduce it to one pixel value, by either finding the maximum value in that section or by finding its mean. The output is then a reduced image with fewer pixels than the input. Pooling is done to reduce the number of variables whilst trying to keep as much spatial information as needed (Mallat 2016).
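To make the pooling operation concrete, the following minimal NumPy sketch applies max and mean pooling over non-overlapping blocks; the 2 x 2 block size is an illustrative choice, since the text above does not fix the pooling window.

```python
import numpy as np

def pool2d(image, block=2, mode="max"):
    # Reduce a 2-D image by taking the max (or mean) of each non-overlapping block x block patch.
    h, w = image.shape
    trimmed = image[:h - h % block, :w - w % block]  # drop edge pixels that do not fill a block
    patches = trimmed.reshape(trimmed.shape[0] // block, block,
                              trimmed.shape[1] // block, block)
    return patches.max(axis=(1, 3)) if mode == "max" else patches.mean(axis=(1, 3))

image = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(image, mode="max"))   # [[ 5.  7.] [13. 15.]]
print(pool2d(image, mode="mean"))  # [[ 2.5  4.5] [10.5 12.5]]
```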
The convolution layers in a CNN are designed to process visual information hierarchically, with earlier layers finding more basic features, and later layers building on what the layers before have found to create more complicated features within the image. This is how a network can go from seeing individual pixel values to finding complicated features, such as a face. After the convolutional layers, the network will have a layer to flatten the output from the final layer into a 1-d vector, to be used as input for a layer composed of perceptrons. A layer in a network made from perceptrons that each use every output from the previous layer as input is known as a fully-connected or dense layer. All layers in a NN apart from the input and output layers are known as hidden layers.

CNNs have been used to solve classification problems such as digit recognition with the MNIST database, a collection of 70,000 grey-scale images of handwritten digits; the problem is to classify these as the digits 0 through 9. Using CNNs gives a solution with an error rate of 0.23% (Ciresan et al. 2013). CNNs have also been successful in object recognition in images. The CIFAR-10 database consists of 60,000 colour images in 10 classes, such as truck, bird and dog, with 6,000 images per class. The error rate in this classification problem when using CNNs is lower than 4% (http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html). An astronomy classification problem where CNNs have been used is classifying images as a star or a galaxy; here the best network had an error rate of only 0.29% for galaxies (??).

3 SIMULATED DATA

The task of finding strong gravitational lenses in large datasets is a problem within Euclid. Feature recognition by eye will not be fast enough to cope with the amount of data received. The Strong Lensing group within the Euclid consortium set up the Euclid Strong Lensing challenge. This was a challenge aimed at developing machine learning techniques to classify images according to whether or not they contain a lens. The simulated data were provided by the Bologna Lens Factory (https://bolognalensfactory.wordpress.com/). The images are produced using the GLAMER code (Metcalf & Petkova 2014), which uses galaxies from the Millennium Simulation and real galaxies from KiDS (Kilo Degree Survey) as foreground lenses and background sources to produce the simulated images. More details of the lensing process can be found in Metcalf et al. (2018). The simulated images are provided as 101 x 101 pixel images, centred on the foreground galaxy, either containing a lensed source or not. In total 200,000 images were provided: 100,000 Euclid VIS-like images and 100,000 KiDS-like images (the numbers containing lenses are listed in Table 1). The Euclid-like space images are single-band images, with a very broad band (r+i+z), whereas the KiDS-like images have four bands: u, g, r, i. The images have a resolution of 0.2 arcseconds, meaning each image covers roughly 20 x 20 arcseconds.
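As an illustration of how single-band and multi-band stamps might be arranged as CNN inputs, a minimal sketch is given below; the file paths and the assumption that each stamp is stored as a simple FITS image are hypothetical, since the challenge data are not necessarily distributed in this form.

```python
import numpy as np
from astropy.io import fits

def load_single_band(path):
    # Hypothetical loader: one 101x101 VIS-like stamp -> array of shape (101, 101, 1).
    image = fits.getdata(path).astype(np.float32)
    return image[..., np.newaxis]

def load_multi_band(paths_ugri):
    # Hypothetical loader: four 101x101 KiDS-like stamps (u, g, r, i) -> (101, 101, 4).
    return np.stack([fits.getdata(p).astype(np.float32) for p in paths_ugri], axis=-1)

# vis_stamp = load_single_band("vis_stamp_000001.fits")                   # hypothetical file name
# kids_stamp = load_multi_band(["u.fits", "g.fits", "r.fits", "i.fits"])  # hypothetical file names
```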
Examples of the Euclid VIS-like images can be seen in Fig. 3, and an example of the four KiDS-like bands can be seen in Fig. 4. However, our close scrutiny of the simulated images uncovered some unphysical examples. The COSMOS lenses were used as a comparison against which to test the simulations. Examples of the COSMOS lenses can be seen in the Appendix. The band used for the COSMOS images is the HST F814W wide band; F814W covers the longer-wavelength half of the VIS throughput. Visual inspection of the VIS and smoothed COSMOS images shows qualitatively similar features. Therefore, we argue that the COSMOS data set is an appropriate and interesting test of our VIS-trained network.

By comparing histograms of the Einstein radius and lens magnitudes of both the simulated Euclid VIS-like images and the real COSMOS lens images, it was found that many of the simulations had unrealistically large Einstein rings, and that the Euclid VIS-like and KiDS-like simulations were fainter than the COSMOS lenses. Because of this, images with Einstein radii greater than 4 arcseconds, and lenses towards the faint end of the Euclid VIS-like and KiDS-like datasets, have been removed in order to create a more representative subset. Histograms showing the removal of some of the Euclid VIS-like images to make the dataset more COSMOS-like can be seen in Fig. 5. A similar histogram also shows that by removing the larger Einstein radii, the Euclid VIS-like images have an Einstein radius distribution similar to that of COSMOS. The same has been done for the KiDS-like dataset. In total we now have four datasets for training and five for testing, which can be seen in Table 1.
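The following sketch illustrates the kind of catalogue cut used to build the COSMOS-like subsets; only the 4 arcsecond Einstein-radius cut is taken from the text, while the catalogue values and the faint-end magnitude limit shown here are hypothetical placeholders.

```python
import numpy as np

def cosmos_like_mask(einstein_radius_arcsec, lens_ab_magnitude, faint_limit=23.0):
    # Keep stamps whose Einstein radius is below 4 arcseconds (the cut quoted in the text)
    # and whose lens is brighter than an illustrative faint-end magnitude limit.
    keep_radius = einstein_radius_arcsec < 4.0
    keep_magnitude = lens_ab_magnitude < faint_limit
    return keep_radius & keep_magnitude

# Purely illustrative catalogue values.
radii = np.random.uniform(0.0, 8.0, size=1000)
magnitudes = np.random.uniform(18.0, 26.0, size=1000)
mask = cosmos_like_mask(radii, magnitudes)
print(mask.sum(), "of", mask.size, "stamps retained")
```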
Figure 3. Samples from the 100,000 Euclid VIS instrument simulated images. The top row of images does not contain lenses, while the bottom row contains lenses.
Figure 4.
A sample from the 100,000 KiDS-like simulated images. The images are labelled above by their corresponding wavelength band. This example does contain lensing.
Figure 5.
Histograms showing the AB magnitude across three datasets. Top: the original 100,000 Euclid VIS-like images. Middle: the subset of Euclid VIS-like images designed to have the same distribution as the COSMOS lenses. Bottom: the COSMOS lenses.
Table 1.
Table describing the contents of each dataset.

Type | Description | Number of lenses
Euclid VIS-like simulations | 100,000 single-band 101 x 101 pixel simulated images | 39,975
KiDS-like simulations | 100,000 multi-band 101 x 101 pixel simulated images | 49,862
Euclid VIS-like simulations with COSMOS distribution | 68,923 single-band 101 x 101 pixel simulated images | 24,029
KiDS-like simulations with COSMOS distribution | 60,144 multi-band 101 x 101 pixel simulated images | 29,960
COSMOS lenses | 65 single-band real lenses cropped to 101 x 101 pixel images | 65

4 THE CONVOLUTIONAL NEURAL NETWORK

Four networks have been built: two designed to work with KiDS-like images with four filter bands as input, and two to work with Euclid VIS-like data with a single filter band as input. They have the same architecture, but have been trained on different data, using datasets 1 to 4 from Table 1. The networks have been built and trained in Python 2.7 using the neural network library Keras (https://keras.io/). Keras runs on top of either the Theano (http://deeplearning.net/software/theano/), TensorFlow or CNTK (https://github.com/Microsoft/cntk) back-ends; we used Theano. The CNN architecture used here has been inspired by the work of Petrillo et al. (2017). The network architecture can be seen in Table 2. Robust initialisation (HeNormal) is used to initialise the weights, as this speeds up network convergence (He et al. 2015). The networks have four convolutional layers, with max-pooling incorporated after each of the first two convolutional layers. After the convolutional layers, the 2D feature maps made in the final convolutional layer are flattened into a 1D vector, to be used as input to the dense layer of fully connected neurons. The final layer is a classification layer, where the network gives each image a classification between 0 and 1; this number can be seen as the probability that the image contains a lens. The CNN is trained on a set of 75% of the labelled images from the dataset, using a process of batch training with a batch size of 500. 5% of the dataset is used for data validation, to avoid over-training. Throughout our networks we used ReLU (Rectified Linear Unit) activation functions, and binary cross-entropy was used as the loss function.
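As a concrete sketch of a network with the layer ordering of Table 2 and the training choices described above, a minimal Keras example is given below. The numbers of kernels, the 512-neuron dense layer, the single sigmoid output, the HeNormal weight initialisation, the Adam optimiser, the binary cross-entropy loss, the batch size of 500 and early stopping on the validation loss all follow the text; the kernel sizes, pooling windows, padding and strides are placeholders where the text leaves them unspecified, so this is not a literal reproduction of our networks.

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping

def build_lens_classifier(n_bands):
    # Layer ordering as in Table 2: conv(8), pool, conv(8), pool, conv(16), conv(16),
    # flatten, dense(512), single sigmoid output.  Kernel sizes, pooling windows and
    # padding are illustrative placeholders; biases default to zero, as in Table 2.
    model = Sequential([
        Conv2D(8, (5, 5), activation="relu", padding="same",
               kernel_initializer="he_normal", input_shape=(101, 101, n_bands)),
        MaxPooling2D(pool_size=(2, 2)),
        Conv2D(8, (5, 5), activation="relu", padding="same", kernel_initializer="he_normal"),
        MaxPooling2D(pool_size=(2, 2)),
        Conv2D(16, (3, 3), activation="relu", padding="same", kernel_initializer="he_normal"),
        Conv2D(16, (3, 3), activation="relu", padding="same", kernel_initializer="he_normal"),
        Flatten(),
        Dense(512, activation="relu", kernel_initializer="he_normal"),
        Dense(1, activation="sigmoid", kernel_initializer="he_normal"),
    ])
    model.compile(optimizer=Adam(), loss="binary_crossentropy", metrics=["accuracy"])
    return model

# model = build_lens_classifier(n_bands=1)   # 1 for VIS-like input, 4 for KiDS-like input
# model.fit(x_train, y_train, batch_size=500, epochs=50,
#           validation_data=(x_val, y_val),
#           callbacks=[EarlyStopping(monitor="val_loss")])  # stop once validation loss stops improving
```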
5 RESULTS

An image is judged to contain a lens if the classification from the network is greater than or equal to 0.5; conversely, if the classification is below 0.5 the image is judged not to contain a lens. This threshold can be increased to give a more accurate classification, causing the number of false positives to decrease; however, it also means that more lenses are misclassified. 20% of each dataset is passed through the appropriate network to be classified; this is the test set. The scores for each network can be seen in Table 3. Looking at the percentage of images classified correctly, and the percentages of lenses and non-lenses classified correctly, the KiDS-like networks are more successful than the Euclid VIS-like networks. This is not surprising, since the KiDS-like images have four image bands compared to the single band of the Euclid VIS-like images. This implies that colour information from the multiple bands of the KiDS-like images has been helpful in classifying the lenses correctly, as one would expect, since it helps greatly when classifying by eye. A test of this (which we will conduct in future work) would be to compare single-band KiDS-like images with multi-band KiDS-like images.

In both the COSMOS-like datasets, the percentage of lenses correctly identified is lower than in the original datasets. This is probably because the images with the largest Einstein rings (greater than 4 arcseconds), which are the easiest to classify, have been removed from these subsets. A true positive is a lens correctly classified as a lens, and a false positive is a non-lens incorrectly classified as a lens; true negatives and false negatives are defined similarly. The receiver operating characteristic (ROC) curves for the networks are shown in Fig. 6. The area under the ROC curve summarises the results, 1 being the score when all classifications are correct, 0 the score when all classifications are incorrect, and 0.5 the result of random selection. Fig. 6 confirms that the KiDS-like dataset performed best, although that network only improved slightly when using the subset of the images, unlike the Euclid network, which improved significantly by removing images from the dataset.
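For reference, the ROC curve and the area under it can be computed directly from the network's continuous outputs; a minimal scikit-learn sketch is shown below, in which the label and score arrays are small illustrative stand-ins for a real test set.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# y_true: 1 for images containing a lens, 0 otherwise.
# y_score: the network's output between 0 and 1 for each image.
y_true = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.90])

false_positive_rate, true_positive_rate, thresholds = roc_curve(y_true, y_score)
print("Area under the ROC curve:", roc_auc_score(y_true, y_score))
# 1.0 means the scores rank every lens above every non-lens;
# 0.5 is the expectation for random guessing.
```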
As well as using simulated data, we tested on 65 real images from COSMOS. These are single-band images made from larger image cutouts and modified to have the same PSF as the Euclid images by convolving with a suitable kernel. The full width at half maximum (FWHM) of the Euclid PSF squared is equal to the FWHM of COSMOS squared plus the FWHM of the kernel squared:

FWHM(Euclid)^2 = FWHM(COSMOS)^2 + FWHM(kernel)^2.

After convolution to match the Euclid PSF, the COSMOS images also had their pixels resampled to match the Euclid pixel scale. Images of the COSMOS lenses before and after applying the convolution can be seen in the Appendix. The resulting images were also 101 x 101 pixels. Having only 65 available images meant that training on these images was not a possibility, but testing them with the trained Euclid-like networks was. The results can be seen in Table 4. The scores for both networks were very low, as can be expected after training on a different type of image. Every COSMOS lens that has been classified incorrectly is a false negative. All of the images from the COSMOS survey can be seen in the Appendix.
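A minimal sketch of this PSF matching is given below, adding the missing blur in quadrature and approximating the kernel as a Gaussian; the FWHM and pixel-scale values in the usage comment are illustrative placeholders rather than the measured Euclid and COSMOS values.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def match_psf(image, fwhm_target_arcsec, fwhm_native_arcsec, pixel_scale_arcsec):
    # Kernel FWHM from the quadrature relation FWHM_target^2 = FWHM_native^2 + FWHM_kernel^2,
    # then convolve with a Gaussian of that width (sigma = FWHM / (2 sqrt(2 ln 2))).
    fwhm_kernel = np.sqrt(fwhm_target_arcsec**2 - fwhm_native_arcsec**2)
    sigma_pixels = fwhm_kernel / (2.0 * np.sqrt(2.0 * np.log(2.0))) / pixel_scale_arcsec
    return gaussian_filter(image, sigma=sigma_pixels)

# Illustrative placeholder values only (not the measured PSF widths):
# smoothed = match_psf(cosmos_stamp, fwhm_target_arcsec=0.2,
#                      fwhm_native_arcsec=0.1, pixel_scale_arcsec=0.03)
```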
Table 2.
The architecture of the four networks (Euclid VIS-like and KiDS-like): what each layer contains, and each layer's initial weights and biases. The final sigmoid layer gives an output between 0 and 1. For the convolutional and max-pooling layers a stride length of 2 was used, meaning each pixel was only used once in pooling, padded with the same edge pixels where required.
Type of layer | Layer contains | Initial weights | Initial bias
Convolutional | 8 image kernels | HeNormal | Zeroes
Max-pooling | Pooled over each small square of pixels | - | -
Convolutional | 8 image kernels | HeNormal | Zeroes
Max-pooling | Pooled over each small square of pixels | - | -
Convolutional | 16 image kernels | HeNormal | Zeroes
Convolutional | 16 image kernels | HeNormal | Zeroes
Flatten | Convert image maps into 1-d vector | - | -
Dense | 512 fully-connected neurons | HeNormal | Zeroes
Dense | 1 sigmoid output neuron | HeNormal | Zeroes

Nevertheless, our network recovers 16/31 of the lenses identifiable by eye at the Euclid resolution (see Appendix), and 8/34 of the lenses that cannot be identified by eye. Although our network at Euclid resolution only recovers a fraction of the lenses known to exist at HST resolution, this is in itself quite interesting: it implies that the detectability of lensing is a very strong function of angular resolution. Roughly doubling the angular resolution (from Euclid to HST resolution) results in roughly a five-fold increase in the number of detectable lensing systems.

Even though the KiDS-like subset showed a great deal of success, there are still things to be wary of: all the images used in this work for training and testing are simulated images, but real images may not be classified as accurately, which can be seen by looking at the results from the COSMOS images. The percentage of images containing a lens is much higher in these simulated cases than for real data.

Once implemented with real data, the number of lenses observed will increase. This in turn will increase the number of rare lens systems found, such as double-source-plane lens systems (Collett & Auger 2014). These rare systems can be used to constrain the dark-energy equation of state parameter w (Gavazzi et al. 2008). Lens models can be made from the systems observed, and when coupled with visible images can be used to infer dark matter substructure within the lensing galaxy.

Future work will include training and testing the networks on updated simulations incorporating cluster lenses and Euclid's grism data. The problems from these first simulations have been noted, so that the next training set will not include unrealistically large Einstein rings (greater than 4 arcseconds).

Figure 6.
Receiver operating characteristic curves for the Euclid network (top) and the KiDS network (bottom), with the area under the curve shown. The black dashed line indicates the curve for random choice.
6 CONCLUSIONS

A well-designed CNN can be used with future observations from Euclid and other similar surveys, as CNNs are demonstrably successful on simulated data. Making more realistic simulations, with more accurate distributions of Einstein radii and faint lens galaxies, will give a more accurate account of how CNNs will perform with real data. Machine learning techniques will provide a subset of ostensibly reliable lens systems for which verification by visual inspection can be achieved on a realistic timescale, and which can then be used to refine the CNN.

ACKNOWLEDGEMENTS
We thank the anonymous referee for many helpful and constructive comments. We thank the Science and Technology Facilities Council for financial support under grants ST/N50421X/1 and ST/P000584/1. We acknowledge support during the preparation of this work from the International Space Science Institute (ISSI), Berne, Switzerland, in the form of support for meetings of the collaboration 'Strong Gravitational Lensing with Current and Future Space Observations', P.I. J-P. Kneib.
Table 3.
The percentage of images classified correctly, for lenses, non-lenses and total correct.

Images | Lenses correct (%) | Non-lenses correct (%) | Total correct (%)
Euclid VIS-like simulations | 60.32 | 93.26 | 80.13
KiDS-like simulations | 88.17 | 86.82 | 87.49
Euclid VIS-like simulations with COSMOS distribution | 56.14 | 98.86 | 93.33
KiDS-like simulations with COSMOS distribution | 76.83 | 97.46 | 93.62
Table 4.
Classification results of the COSMOS lenses on two differently trained networks. Note that all 65 images were lenses.

Trained dataset | Lenses correct
Euclid VIS-like simulations | 18 (27.69%)
Euclid VIS-like simulations with COSMOS distribution | 15 (20.00%)
REFERENCES
Aggarwal C. C., 2014, Data Classification: Algorithms and Applications. Chapman & Hall/CRC Press
Amendola L., et al., 2018, Living Reviews in Relativity, 21, 2
Blain A., 1996, MNRAS, 283, 1340
Bolton A. S., Burles S., Koopmans L. V. E., Treu T., Gavazzi R., Moustakas L. A., Wayth R., Schlegel D. J., 2008, ApJ, 682, 964
Browne I., Jackson N., Augusto P., Henstock D., Marlow D., Nair S., Wilkinson P., 1997
Browne I. W. A., et al., 2003, MNRAS, 341, 13
Brownstein J. R., et al., 2012, ApJ, 744, 41
Chae K.-H., 2003, MNRAS, 346, 746
Ciresan D. C., Giusti A., Gambardella L. M., Schmidhuber J., 2013, in MICCAI. pp 411-418
Claeskens J.-F., Sluse D., Riaud P., Surdej J., 2006, A&A, 451, 865
Collett T. E., 2015, ApJ, 811, 20
Collett T. E., Auger M. W., 2014, MNRAS, 443, 969
Domínguez Sánchez H., et al., 2018, preprint (arXiv:1807.00807)
Faure C., et al., 2008, ApJS, 178, 382
Gavazzi R., Treu T., Koopmans L. V. E., Bolton A. S., Moustakas L. A., Burles S., Marshall P. J., 2008, ApJ, 677, 1046
Gavazzi R., Marshall P. J., Treu T., Sonnenfeld A., 2014, ApJ, 785, 144
He K., Zhang X., Ren S., Sun J., 2015, in The IEEE International Conference on Computer Vision (ICCV)
Jackson N., 2008, MNRAS, 389, 1311
Jackson N., 2011, ApJ, 739, L28
Kim J., 2007, MNRAS, 375, 625
Kingma D. P., Ba J., 2014, CoRR, abs/1412.6980
Kochanek C. S., Schechter P. L., 2004, Measuring and Modeling the Universe, p. 117
LSST Science Collaboration et al., 2009, preprint (arXiv:0912.0201)
Laureijs R., et al., 2011, preprint (arXiv:1110.3193)
LeCun Y., Bengio Y., Hinton G., 2015, Nature, 521, 436
Lenzen F., Schindler S., Scherzer O., 2004, A&A, 416, 391
Mallat S., 2016, Philosophical Transactions of the Royal Society of London Series A, 374, 20150203
Marchetti L., Serjeant S., Vaccari M., 2017, MNRAS, 470, 5007
Marshall P. J., et al., 2016, MNRAS, 455, 1171
McKean J., et al., 2015, Advancing Astrophysics with the Square Kilometre Array (AASKA14), p. 84
Metcalf R. B., Madau P., 2001, ApJ, 563, 9
Metcalf R. B., Petkova M., 2014, MNRAS, 445, 1942
Metcalf R. B., et al., 2018, preprint (arXiv:1802.03609)
Miyazaki S., et al., 2012, Hyper Suprime-Cam, doi:10.1117/12.926844, https://doi.org/10.1117/12.926844
More A., Cabanac R., More S., Alard C., Limousin M., Kneib J.-P., Gavazzi R., Motta V., 2012, ApJ, 749, 38
More A., et al., 2016, MNRAS, 455, 1191
Myers S. T., et al., 2003, MNRAS, 341, 1
Oguri M., Marshall P. J., 2010, MNRAS, 405, 2579
Patnaik A. R., Browne I. W. A., Walsh D., Chaffee F. H., Foltz C. B., 1992, MNRAS, 259, 1P
Perley R., et al., 2009, IEEE Proceedings, 97, 1448
Petrillo C. E., et al., 2017, MNRAS, 472, 1129
Rawlings S., Schilizzi R., 2011, preprint (arXiv:1105.5953)
Refsdal S., 1964, MNRAS, 128, 295
Seidel G., Bartelmann M., 2007, A&A, 472, 341
Serjeant S., 2014, ApJ, 793, L10
The Dark Energy Survey Collaboration, 2005, ArXiv Astrophysics e-prints
Thorpe S., Fize D., Marlot C., 1996, Nature, 381, 520
Treu T., Koopmans L. V. E., 2002, MNRAS, 337, L6
Vegetti S., Lagattuta D. J., McKean J. P., Auger M. W., Fassnacht C. D., Koopmans L. V. E., 2012, Nature, 481, 341
Walsh D., Carswell R. F., Weymann R. J., 1979, Nature, 279, 381
Wyithe J. S. B., Yan H., Windhorst R. A., Mao S., 2011, Nature, 469, 181
de Jong J. T. A., et al., 2015, A&A, 582, A62
APPENDIX A: COSMOS LENSES
The following figures show the 65 lenses from COSMOS that our CNNs were tested with. Each lens has its ID number below. Images with an asterisk after the ID number were the ones that were correctly identified as a lens. Each image has the usual North up, East left configuration, and is 10 x 10 arcseconds.
Lens IDs shown (asterisks mark lenses correctly identified by the network): 0047+5023, 0049+5128*, 0050+0357*, 0050+4901, 0055+3821*, 0056+1226*, 0056+2106, 0104+2046, 0104+2501*, 0105+4531*, 0107+0533, 0108+5606, 0120+4551, 0124+5121*, 0148+2325, 0208+1422*

This paper has been typeset from a TEX/LATEX file prepared by the author.