Alert Classification for the ALeRCE Broker System: The Real-time Stamp Classifier
Rodrigo Carrasco-Davis, Esteban Reyes, Camilo Valenzuela, Francisco Förster, Pablo A. Estévez, Giuliano Pignata, Franz E. Bauer, Ignacio Reyes, Paula Sánchez-Sáez, Guillermo Cabrera-Vives, Susana Eyheramendy, Márcio Catelan, Javier Arredondo, Ernesto Castillo-Navarrete, Diego Rodríguez-Mancini, Daniela Ruz-Mieres, Alberto Moya, Luis Sabatini-Gacitúa, Cristóbal Sepúlveda-Cobo, Ashish A. Mahabal, Javier Silva-Farfán, Ernesto Camacho-Iñiquez, Lluís Galbany
Draft version August 11, 2020
Typeset using LaTeX twocolumn style in AASTeX63
R. Carrasco-Davis (1,2,*), E. Reyes (2,1,*), C. Valenzuela (3,1), F. Förster (3,1,4), P. A. Estévez (2,1,3), G. Pignata (5,1), F. E. Bauer (6,1,7), I. Reyes (1,3,2), P. Sánchez-Sáez (1,6,8), G. Cabrera-Vives (9,1), S. Eyheramendy (8,1), M. Catelan (6,1), J. Arredondo, E. Castillo-Navarrete (3,1), D. Rodríguez-Mancini (1,9), D. Ruz-Mieres (3,1,10), A. Moya (1,3), L. Sabatini-Gacitúa (1,3), C. Sepúlveda-Cobo (1,3), A. A. Mahabal (11,12), J. Silva-Farfán, E. Camacho-Iñiquez, and L. Galbany

1. Millennium Institute of Astrophysics (MAS), Nuncio Monseñor Sótero Sanz 100, Providencia, Santiago, Chile
2. Department of Electrical Engineering, Universidad de Chile, Av. Tupper 2007, Santiago 8320000, Chile
3. Center for Mathematical Modeling, Universidad de Chile, Beauchef 851, North building, 7th floor, Santiago 8320000, Chile
4. Departamento de Astronomía, Universidad de Chile, Casilla 36D, Santiago, Chile
5. Department of Physical Science, Universidad Andres Bello, Av. Republica 230, Santiago 8370146, Chile
6. Instituto de Astrofísica and Centro de Astroingeniería, Facultad de Física, Pontificia Universidad Católica de Chile, Casilla 306, Santiago 22, Chile
7. Space Science Institute, 4750 Walnut Street, Suite 205, Boulder, Colorado 80301, USA
8. Faculty of Engineering and Sciences, Universidad Adolfo Ibáñez, Diagonal Las Torres 2700, Peñalolén, Santiago, Chile
9. Department of Computer Science, Universidad de Concepción, Edmundo Larenas 219, Concepción, Chile
10. Institute of Astronomy, University of Cambridge, Madingley Road, Cambridge CB3 0HA, UK
11. Cahill Center for Astrophysics, California Institute of Technology, 1200 E. California Boulevard, Pasadena, CA 91125, USA
12. Center for Data Driven Discovery, California Institute of Technology, Pasadena, CA 91125, USA
13. Departamento de Física Teórica y del Cosmos, Universidad de Granada, E-18071 Granada, Spain
ABSTRACT

We present a real-time stamp classifier of astronomical events for the ALeRCE (Automatic Learning for the Rapid Classification of Events) broker. The classifier is based on a convolutional neural network with an architecture designed to exploit rotational invariance of the images, and trained on alerts ingested from the Zwicky Transient Facility (ZTF). Using only the science, reference and difference images of the first detection as inputs, along with the metadata of the alert as features, the classifier is able to correctly classify alerts from active galactic nuclei, supernovae (SNe), variable stars, asteroids and bogus classes, with high accuracy.

Keywords:
Supernovae — Alert Broker Visualization Tools — Deep Learning
Corresponding author: Rodrigo Carrasco-Davis

* These authors contributed equally to this work.

1. INTRODUCTION

The amount of data generated by modern survey telescopes cannot be directly handled by humans. Therefore, automatic data analysis methods are necessary to fully exploit their scientific return. A particularly challenging problem is the real-time classification of transient events. Nevertheless, the possibility to generate a quick probabilistic evaluation of which type of transient has been discovered is crucial to perform the most suitable follow-up observations, and by extension obtain the best constraints on its physics. In this work we focus on the early detection of supernovae (SNe) by quickly discerning between SNe and various other confusing classes of astronomical objects. Photometric and spectroscopic observations carried out soon after the explosion are fundamental to put constraints on the progenitor systems and explosion physics.

In the case of thermonuclear explosions (Type Ia SNe), early observations probe the outermost part of the ejecta, where it is possible to detect the material present at the surface of the progenitor (e.g., Nugent et al. 2011), evaluate the degree of mixing induced by different explosion models (e.g., Piro & Morozova 2016; Jiang et al. 2017; Noebauer et al. 2017), and estimate the size of the companion star (e.g., Kasen 2010).

For core collapse (CC) SNe, observations carried out soon after the explosion allow us to constrain the radius of the progenitor star, its outer structure and the degree of Ni mixing (e.g., Tominaga et al. 2011; Piro & Nakar 2013), but also the immediate SN environment, providing a critical diagnostic for the elusive final evolutionary history of the progenitor and/or the progenitor system configuration (e.g., Moriya et al. 2011; Gal-Yam et al. 2014; Groh 2014; Khazov et al. 2016; Tanaka et al. 2016; Yaron et al. 2017; Morozova et al. 2017; Förster et al. 2018).

We propose a method to quickly classify alerts among five different classes, four of which are astrophysical, and then use the predictions to find and report SNe. This work has been developed in the framework of ALeRCE (Automatic Learning for the Rapid Classification of Events; Förster et al. 2020). The ALeRCE system is able to read, annotate, classify and redistribute the data from large survey telescopes. Such efforts are commonly called astronomical broker systems (other examples include, e.g., ANTARES, Narayan et al. 2018; Lasair, Smith et al. 2019). Currently, ALeRCE is processing the alert stream generated by the Zwicky Transient Facility (ZTF; Bellm et al. 2018), and its main goal is to reliably classify data of non-moving objects and make these classifications available to the scientific community (https://alerce.online/).

For the purpose of classifying astronomical objects or transients, one way to discriminate among them is by computing features from the light curve of each object (e.g., Richards et al. 2011; Pichara et al. 2016; Martínez-Palomera et al. 2018; Boone 2019; Sánchez-Sáez et al. 2020), or by using the light curve directly as input to a classifier (e.g., Mahabal et al. 2017; Naul et al. 2018; Muthukrishna et al. 2019; Becker et al. 2020). In the case of an alert stream scenario such as for ZTF (whereby no forced photometry of past images is as yet provided), the light curve is built by estimating the flux from the difference image for all alerts triggered at the same coordinates.

Our model is dubbed the "stamp classifier", since it only uses the first alert of an astronomical object. ALeRCE also developed a light curve classifier (Sánchez-Sáez et al. 2020), based on light curves with a minimum number of detections in the g or r ZTF bands. The light curve classifier is able to discriminate among a richer taxonomy of astronomical objects. Both the stamp and light curve classifiers are currently running through the ALeRCE frontend (Förster et al. 2020).

Our proposed stamp classifier is based on a convolutional neural network (CNN) architecture that uses only the information available in the first alert of an astronomical object, which includes the images of the object plus metadata regarding some of the object properties, observation conditions and information from other catalogs. The stamp classifier uses the first alert to discriminate between active galactic nuclei (AGN), supernovae (SNe), variable stars (VS), asteroids and bogus alerts. The architecture was designed to exploit the rotational invariance of astronomical images. The classifier was trained using an entropy regularizer that avoids the assignment of high probability to a single class, yielding softer output probabilities that give extra information to experts, useful for further analysis of candidates.

To the best of our knowledge, this is the first classifier that discriminates among five classes using a single alert, allowing a rapid, reliable characterization of the data stream to trigger immediate follow-up. Previous work on stamp classification has focused instead on the classification of real objects vs. bogus detections (e.g., Goldstein et al. 2015; Cabrera-Vives et al. 2017; Reyes et al. 2018; Duev et al. 2019; Turpin et al. 2020), galaxy morphologies (e.g., Dieleman et al. 2015; Pérez-Carrasco et al. 2019; Barchi et al. 2020), or time domain classification (Carrasco-Davis et al. 2019; Gómez et al. 2020).

An associated contribution to the stamp classifier is the implementation of a visualization tool called SN Hunter, which allows experts to explore SN candidates to further filter alerts, and choose objects to request follow-up. This visualization tool is deployed online (https://snhunter.alerce.online/). An analysis of reported and confirmed SNe by ALeRCE using the proposed methodology since June 2019 is also presented, and we finally draw our conclusions and describe future work in Section 6.

2. DATA

An alert within the ZTF stream is defined as a source in the sky that produces a signal five standard deviations higher than the background noise (a five-σ magnitude limit; Masci et al. 2018), and which passes a real-bogus filter designed by the ZTF collaboration (Mahabal et al. 2019). When an alert is triggered, an alert packet is generated with all the relevant information about the source that triggered the alert (Bellm et al. 2018). The alert packet contains three images called stamps, which are cropped at 63 pixels on a side (1 pixel = 1 arcsec) from the original image and centered on the position of the source. In addition, the alert packet contains metadata related to the source, the observation conditions of the exposure and other useful information (Masci et al. 2018). An example of the three stamps within an alert packet is shown in Figure 1. The stamp in Figure 1a is called the science image and corresponds to the most recent measurement of the source. The stamp depicted in Figure 1b is the reference image, which is fixed for a given region and bandpass. It is usually based on images taken at the beginning of the survey, and it is built by averaging multiple images to improve its signal-to-noise ratio. The stamp shown in Figure 1c is the difference image between the science and reference images (Masci et al. 2018), which shows the change in flux between those frames, removing other sources with constant brightness.
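In an idealized, noise-limited case, the role of the difference image can be illustrated with a few lines of NumPy. This is a sketch of ours, not the actual ZTF pipeline, which also performs PSF matching between frames (Masci et al. 2018):

```python
import numpy as np

rng = np.random.default_rng(0)

# Idealized 63x63 stamps: a constant star plus Gaussian background noise.
reference = rng.normal(100.0, 1.0, size=(63, 63))
reference[20, 20] += 500.0                # a constant, non-variable star

science = reference + rng.normal(0.0, 1.0, size=(63, 63))
science[40, 31] += 300.0                  # a new transient source

# The difference image removes constant sources, keeping only the change in flux.
difference = science - reference

# The brightest difference pixel is the transient, not the constant star.
peak = np.unravel_index(np.argmax(difference), difference.shape)
print(peak)  # (40, 31)
```

The constant star cancels exactly in the subtraction, while the transient survives, which is why flux in the difference stamp is such a strong cue for the transient classes discussed below.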
Each alert packet represents only 2 samples in time, the reference and science image exposures, and is often insufficient to correctly classify objects over the full taxonomy of different variable stars, transients or stochastic sources as in Sánchez-Sáez et al. (2020). However, our hypothesis is that it is feasible to use the information included in a single alert packet to separate objects into several broad classes, namely AGN, SNe, VS, asteroids and bogus alerts. Each class presents distinctive characteristics within the image triplet of the first detection alert (see Figure 2), which could be automatically learned by a CNN. In addition to the images, the information in the metadata of the alert packet, along with some features derived from the metadata, is important to discriminate among the mentioned classes. The metadata used for the classification task are listed in Table 1, and the distribution of values for each feature per class is shown in Figure A1 in Appendix A. Some of the distinctive characteristics for each class are the following:

• AGN: Being stochastically variable objects, an alert generated by an AGN should have flux from the source in both the reference and science stamps. Considering this feature alone, it is difficult to discriminate AGNs from other variable sources. Nevertheless, AGNs should lie at the centers of their host galaxies (based on dynamical friction arguments), or appear as (quasi-)stellar objects, in relatively lower stellar density fields. Thus, a change in flux will appear as a variable source, which may lie at the center of a galaxy or, even when the galaxy is not visible, in a lower stellar density field. In these cases, the alert is likely to be triggered by an AGN. In addition, AGNs are commonly found outside the Galactic plane, as shown in Appendix A. Other important features are the sgscores, which tend to have values closer to 0 since AGNs occur in extragalactic sources, and the distpsnr values: distpsnr1 should be low since the nearest source should be the AGN itself, combined with large distpsnr2 and distpsnr3 values due to the lower source density outside of the Galactic plane. The classtar feature is also useful, as more weakly accreting AGN candidates tend to be classified as galaxy-shaped sources by the SExtractor classifier (Bertin & Arnouts 1996).

• Supernovae (SNe): An alert generated by a SN should appear as a change in flux where no unresolved sources were present. These transients tend to appear near their host galaxies, and their location should be consistent with the underlying host stellar population distribution (e.g., a SN will have a higher probability
Figure 1. Example g-band images from a ZTF alert packet, in this case from a Type Ia SN (ZTF19abmolyr) classified by our method. (a) The science image is the latest measurement of a source. (b) The reference image is usually a higher signal-to-noise measurement taken from an earlier epoch. (c) The third stamp is the difference between the science and reference images. (d) For context, we also show the gri color image from PanSTARRS, which is not part of the alert packet nor used in the current stamp classifier. Each image stamp is 63 pixels × 63 pixels, where 1 pixel = 1 arcsec.

of arising from a location aligned with the disk than perpendicular to it). As such, most SN detections exhibit a visible host galaxy in both the science and reference stamps, with the flux from the SN arising only in the science and difference images. SN candidates tend to appear outside the Galactic plane, and so the sgscores, distpsnr, and Galactic latitude features have similar distributions to AGN candidates. However, there are other features that might help to classify SN candidates correctly. For instance, chinr and sharpnr present a different distribution for the SN class compared to the other classes (see Appendix A). Furthermore, the isdiffpos value should always be 1 for new SN candidates.

• Variable Stars (VS): The flux coming from variable stars usually appears in both the reference and science stamps. With ZTF's sensitivity, variable stars can be detected within the Milky Way or the Local Group, and thus the alert will typically not be associated with a visible host galaxy in the stamp, but rather with other point-like sources. In addition, such alerts will have a higher probability of residing at lower Galactic latitudes and in crowded fields with multiple point sources within the stamps, given the high concentration of stars in the disk and bulge of our Galaxy. Therefore, VS candidates present a distribution of higher sgscores, lower distpsnr and Galactic latitude closer to 0 compared to AGN and SN candidates (see Figure A1).

• Asteroids: Alerts from moving Solar-system objects will appear only one time at a given position, and thus will show flux only in the science and difference images. Depending on their distance and speed, they may appear elongated in the direction of motion. In addition, such alerts should have a higher probability of residing at lower ecliptic latitudes, as shown in Figure A1. Also, new asteroid candidates should always have an isdiffpos feature equal to 1.
• Bogus alerts: Camera and telescope optics effects, such as saturated pixels at the centers of bright sources, bad columns, hot pixels, astrometric misalignment in the subtraction to compute the difference image, unbaffled internal reflections, etc., can produce bogus alerts with no interesting real source. Bogus alerts are characterized by the presence of NaN pixels due to saturation, single or multiple bright pixels with little or no spatial extension (i.e., smaller than the telescope point spread function (PSF) and nightly seeing), or lines with high or low pixel values that extend over a large portion of the stamp (hot or cold columns/rows). We are currently working to include satellites in this class; they share some image traits with asteroids, but are not confined to the ecliptic.

We built a training set of ZTF alerts using the labeled set from Sánchez-Sáez et al. (2020), which is the result of cross-matching with other catalogs, such as the ASAS-SN catalog of variable stars (Jayasinghe et al. 2018, 2019a,b, 2020), the Roma-BZCAT Multi-Frequency Catalog of Blazars (Massaro et al. 2015), the Million Quasars Catalog (version June 2019; Flesch 2015, 2019), the New Catalog of Type 1 AGNs (Oh 2015; Oh et al. 2015), the Catalina Surveys Variable Star Catalogs (Drake et al. 2014, 2017), the LINEAR catalog of periodic light curves (Palaversa et al. 2013), Gaia Data Release 2 (Mowlavi et al. 2018; Rimoldini et al. 2019), the SIMBAD database (Wenger et al. 2000), and spectroscopically classified SNe from the TNS database (https://wis-tns.weizmann.ac.il/). The
Figure 2. Examples of the five classes that are to be discriminated by using only the first detection (columns from left to right: active galactic nuclei, supernovae, variable stars, asteroids and bogus). For each class, the triplet of images in each row shows the science, reference and difference images, from left to right. Each row corresponds to a different candidate.

asteroid subset was built by selecting the alerts that were near a Solar-system object, requiring that the ssdistnr field in the alert metadata exists. Each sample corresponds to the triplet of science, reference, and difference images of the first detection. The numbers of samples of AGN, SNe, VS, asteroids, and bogus alerts are 14,966 (29%), 1,620 (3%), 14,996 (29%), 9,899 (19%), and 10,763 (20%), respectively, with a total of 52,244 examples (undersampling the labeled set from Sánchez-Sáez et al. 2020 for a better balance between classes, although at 3% the SN class is still under-represented compared to the rest). The bogus class was built in two steps: we first used 1,980 bogus examples reported by ZTF (based on human inspection) and ran an initial iteration of the proposed classifier detailed in Section 3.2. Then, another 8,783 bogus samples were labeled by our team of experts using the SN Hunter, by manually inspecting the samples predicted as SNe by an early version of the model, and were added to the training set. The main aim of the stamp classifier is the fast detection of SNe; therefore the training set consists only of first alerts, which allows us to estimate probabilities of objects as soon as we receive the alert.

3. METHODOLOGY

3.1.
Data Pre-Processing
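The pre-processing described in this subsection (NaN replacement, central cropping, per-stamp min-max normalization and channel stacking) reduces to a few array operations. A minimal NumPy sketch, with function names of our own:

```python
import numpy as np

def preprocess_stamp(stamp, crop=21):
    # Replace NaN pixels (saturation, bad columns, camera edges) by 0.
    stamp = np.nan_to_num(stamp, nan=0.0)
    # Central crop from 63x63 down to crop x crop pixels.
    cy, cx = stamp.shape[0] // 2, stamp.shape[1] // 2
    h = crop // 2
    stamp = stamp[cy - h:cy + h + 1, cx - h:cx + h + 1]
    # Min-max normalize each stamp independently to [0, 1].
    stamp = stamp - stamp.min()
    peak = stamp.max()
    return stamp / peak if peak > 0 else stamp

def make_input_cube(science, reference, difference):
    # Stack the three processed stamps as channels: shape (21, 21, 3).
    return np.stack(
        [preprocess_stamp(s) for s in (science, reference, difference)],
        axis=-1,
    )
```

Because each stamp is normalized independently, the classifier sees relative morphology rather than absolute flux; the absolute brightness information enters instead through metadata such as magpsf.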
The standard shape for each stamp within an alert is 63 × 63 pixels; 650 non-square stamps were removed from the dataset. After removing misshaped stamps, we obtained 14,742 (29%) AGN, 1,596 (3%) SNe, 14,723 (29%) VS, 9,799 (19%) asteroids and 10,734 (20%) bogus alerts, with a total of 51,594 examples. Some pixels have NaN values due to pixel saturation, bad columns or stamps from the edges of the camera; all NaN pixels were replaced by a value of 0, giving information about the NaN content within the stamp to the classifier. Preliminary tests showed that smaller images for training lead to better results, therefore we cropped all the stamps at the center, obtaining 21 × 21 pixel images; this size was selected by the hyperparameter random search discussed in Section 3.5. Better results with a small stamp size may be explained by the fact that smaller stamps imply a dimensionality reduction with respect to the original image size at the input of the CNN, which may be easier for the model to handle. Further analysis of the optimal stamp size for the classification task at hand must be carried out, since it might be important for the design of future alert-stream based surveys. Each stamp was normalized independently between 0 and 1 by subtracting the minimum pixel value in the image and dividing by the maximum of the result. Finally, a 3-channel cube is assembled as input to the classifier, built by stacking the resulting science, reference and difference images as separate channels, resulting in a 21 × 21 × 3 input cube.

3.2. Classifier Architecture
The classification model is a CNN based on the real-bogus classifier proposed by Reyes et al. (2018), which is an improvement over Deep-HiTS (Cabrera-Vives et al. 2017), adding rotational invariance to the CNN and analyzing the predictions of the model using Layer-wise Relevance Propagation (LRP; Bach et al. 2015). The specific CNN architecture used in this work is shown in Table 2. In these previous works, metadata were not included for classification.
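The rotational-invariance mechanism referenced here (detailed in Section 3.3) can be sketched independently of the network weights: rotate the batch, apply a feature extractor, and average. In the NumPy sketch below, `features` stands in for the convolutional trunk plus first dense layer; the function name is ours:

```python
import numpy as np

def cyclic_pool(x, features):
    """Average `features` over the four 90-degree rotations of a batch.

    x has shape (N, H, W, C); `features` maps (N, H, W, C) -> (N, D).
    The extended batch is B(x) = [x, rx, r^2 x, r^3 x], with r a
    90-degree rotation of the image axes.
    """
    rotations = [np.rot90(x, k, axes=(1, 2)) for k in range(4)]
    # Cyclic pooling: average the representations of the four rotated copies.
    return np.mean([features(r) for r in rotations], axis=0)
```

By construction, the averaged output is exactly invariant under 90° rotations of the input, regardless of what `features` computes, since rotating the input only permutes the four copies being averaged.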
Table 1. List of the alert metadata used as features by the classifier. The definitions are from the ZTF avro schemas (https://zwickytransientfacility.github.io/ztf-avro-alert/).

sgscore{1,2,3}: Star/Galaxy score of the {first, second, third} closest source from the PanSTARRS1 catalog (0 ≤ sgscore ≤ 1).

distpsnr{1,2,3}: Distance to the {first, second, third} closest source from the PanSTARRS1 catalog, if one exists within 30 arcsec; -999 if there is no source [arcsec].

isdiffpos: t (converted to 1) if the candidate is from a positive (science minus reference) subtraction; f (converted to 0) if the candidate is from a negative (reference minus science) subtraction.

fwhm: Full width at half maximum, assuming a Gaussian core, of the alert candidate in the science image, from SExtractor (Bertin & Arnouts 1996) [pixels].

magpsf: Magnitude from PSF-fit photometry of the alert candidate in the difference image [mag].

sigmapsf: 1σ uncertainty in magpsf [mag].

ra, dec: Right ascension and declination of the candidate; J2000 [deg].

diffmaglim: Expected 5σ magnitude limit in the difference image [mag].

classtar: Star/Galaxy classification score of the alert candidate in the science image, from SExtractor.

ndethist: Number of spatially-coincident detections falling within 1.5 arcsec, going back to the beginning of the survey; only detections that fell on the same field and readout-channel ID where the input candidate was observed are counted. All raw detections down to a photometric signal-to-noise threshold are counted.

ncovhist: Number of times the input candidate position fell on any field and readout-channel, going back to the beginning of the survey.

chinr, sharpnr: DAOPhot (Stetson 1987) chi and sharp parameters of the nearest source in the reference image PSF catalog, within 30 arcsec.

Ecliptic coordinates: Ecliptic latitude and longitude computed from the ra, dec coordinates of the candidate [deg].

Galactic coordinates: Galactic latitude and longitude computed from the ra, dec coordinates of the candidate [deg].

approx. non-detections: ncovhist minus ndethist; approximate number of observations at the position of the candidate with a signal below the detection threshold.

The input of the neural network has a shape of 21 × 21 × 3 for the image stamps, together with the 26 metadata features listed in Table 1.

3.3. Rotational Invariance
Astronomical objects present within a stamp usually have a random orientation. It has been shown that imposing rotational invariance on a classifier improves its accuracy for some classification problems (e.g., Dieleman et al. 2015, 2016; Cabrera-Vives et al. 2017; Reyes et al. 2018). In this work, rotational invariance is achieved by feeding the neural network with 90°, 180° and 270° rotated versions of the original input batch x. Defining r as a 90° rotation operation, the samples within the extended batch after applying the rotations are B(x) = [x, rx, r²x, r³x]. At the last step of the architecture, before the softmax output layer, a cyclic pooling operation is performed, which is basically an average pooling over the representations of the dense layer for each rotated example. A scheme of the procedure described in this section is shown in Figure 3.

3.4. Entropy Regularization
When the CNN model is trained using cross-entropy as the loss function to be minimized, the classification confidence of the model is very high, resulting in a distribution of output probabilities with saturated values of 0s and 1s without populating the values in between, even for wrong classifications. In this case there is no insight into the certainty (relative probabilities between classes)
Figure 3.
CNN enhanced with rotational invariance. The box Convolutional Layers refers to the layers described in Table 2, from the first convolutional layer to the last pooling layer. For each sample, the science, reference and difference images are concatenated in the channel dimension, obtaining an image input of dimension 21 × 21 × 3. For each sample within the sampled batch, rotated versions are generated as described in Section 3.3 and fed to the CNN. After the first dense layer, the cyclic pooling is performed. The metadata features pass through a batch normalization layer, the output of which is concatenated with the cyclic pooling output. Then, the concatenation goes through 2 fully connected layers, and finally a softmax function is applied to estimate the output probabilities.

of the prediction, because most estimated probabilities for each class were either 0 or 1. In order to provide more granularity to the astronomers who revise SN candidates reported by the model to later request follow-up observing time, we added the entropy of the predicted probabilities of the model as a regularization term, to be maximized during training (Pereyra et al. 2017). By maximizing the entropy of the output probabilities, we penalize predictions with high confidence, in order to get better insight in cases where the stamps seem equally likely to belong to more than one class. The loss function L per sample is as follows:

L = −Σ_{c=1}^{N} y_c log(ŷ_c) + β Σ_{c=1}^{N} ŷ_c log(ŷ_c),   (1)

where the first term is the cross-entropy, the second term is the entropy regularization, N is the number of classes, y_c is the one-hot encoded label (a value of 1 at the corresponding index of the class, and 0 for the rest) indexed by c, ŷ_c is the model prediction for class c, and β controls the regularization term in the loss function. Further explanation of the role of the loss function in the training process of a neural network is given in Appendix B.

3.5. Experiments
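As a concrete reference for the training objective, the per-sample loss of eq. (1) can be written in a few lines of NumPy (a sketch; the function name is ours):

```python
import numpy as np

def stamp_loss(y_true, y_pred, beta=0.5, eps=1e-12):
    """Cross-entropy plus entropy regularization, as in eq. (1).

    y_true is a one-hot label and y_pred a softmax output, both of
    length N classes. The second term equals minus the prediction
    entropy, so minimizing the total loss maximizes the entropy,
    penalizing saturated 0/1 outputs.
    """
    y_pred = np.clip(y_pred, eps, 1.0)           # numerical safety for log
    cross_entropy = -np.sum(y_true * np.log(y_pred))
    neg_entropy = np.sum(y_pred * np.log(y_pred))
    return cross_entropy + beta * neg_entropy
```

For a fixed prediction, increasing β lowers the loss of soft (high-entropy) outputs relative to saturated ones, which is exactly the behavior exploited in Section 4 when comparing predicted probability distributions for different β.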
A hyperparameter search was done by randomly sampling 133 combinations of the parameters shown in Table 3. For each combination of hyperparameters, we trained 5 networks with different initial random weights. The initial maximum number of iterations (presenting a single batch per iteration) was 30,000, evaluating the loss on the validation set every 10 iterations to save the best model thus far. After the first 20,000 iterations, if a lower loss is found on the validation set, 10,000 more iterations are performed. The validation and testing subsets were sampled randomly only once, taking 100 samples per class from the whole dataset, obtaining 500 samples for each of the mentioned subsets. The remaining samples were used as the training set. For each training iteration, the batch was built to contain roughly the same number of samples per class. We used Adam (Kingma & Ba 2017) as the updating rule for the network parameters during training, with moment-decay parameters β1 = 0.9 and β2 = 0.999. Further details on the updating rules of a neural network and the Adam optimizer are described in Appendix B.

Table 2. Convolutional neural network architecture.

Layer | Layer Parameters | Output Size
Input | - | 21 × 21 × 3
Zero padding | - | 28 × 28 × 3
Conv + BN(a) | 5 × 5, 32 | 24 × 24 × 32
Conv + BN | 5 × 5, 32 | 24 × 24 × 32
Max pooling | 2 × 2, stride 2 | 12 × 12 × 32
Conv + BN | 5 × 5, 64 | 12 × 12 × 64
Conv + BN | 5 × 5, 64 | 12 × 12 × 64
Conv + BN | 5 × 5, 64 | 12 × 12 × 64
Max pooling | 2 × 2, stride 2 | 6 × 6 × 64
Fully connected | 6 × 6 × 64 → 64 | 64
Rotation concatenation | - | 4 × 64
Cyclic pooling | - | 64
Metadata features | - | 64 + 26
Fully connected with dropout | 90 × 64 | 64
Fully connected | 64 × 64 | 64
Output softmax | 64 × 5 | 5 (N° of classes)

(a) BN stands for batch normalization.

Table 3. Hyperparameter random search values, covering among others the learning rate and the regularization parameter β.

4. RESULTS

In this section, we first describe our results in terms of the classification task for the five classes. Then, we change our focus to the detection of SN candidates, since our main interest in this work is to discover extremely young transient candidates to be observed with follow-up resources. Further applications of this early classification system might include the rapid detection of extreme variability in AGN or the tracking of solar system objects.

The following results correspond to the best model in the hyperparameter search, which adopts a batch size of 64 samples, a learning rate of 1e-3, a dropout rate of 0.5, a CNN kernel size of 5, an image size of 21 × 21 pixels, and a regularization parameter of β = 0.5. Appendix C contains the details of how this model was selected. We use accuracy to compare models since the validation and test sets are balanced; over realizations of the selected model, the standard deviation of the accuracy is 0.005 in the validation set and 0.004 in the test set. Figure 4 shows the confusion matrix for the test set, averaged over five realizations of the mentioned model. With our five-class model, we recover 87 ± 1% of the SNe, with only 5 ± 2% of false positives. For completeness, we also report the confusion matrix of the stamp classifier when no metadata features are included in the fully connected layers (see Figure A3); without the metadata, the model recovers a smaller fraction of the SNe in the test set, with 10 ± 4% of false positives.
Figure 4. Average confusion matrix for the test set, using 5 different realizations of the stamp classifier. Rows show the true label and columns the predicted label, over the classes AGN, SN, VS, asteroid and bogus.
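The quantities quoted in this section follow directly from a confusion matrix. A small sketch, using a hypothetical 2-class matrix rather than the paper's values:

```python
import numpy as np

def accuracy(cm):
    # accuracy = number of correct classifications / total number of samples
    return np.trace(cm) / cm.sum()

def recall(cm, c):
    # Fraction of true members of class c that are recovered
    # (e.g., the quoted fraction of SNe recovered by the model).
    return cm[c, c] / cm[c].sum()

# Hypothetical confusion matrix, rows = true label, columns = predicted label.
cm = np.array([[90, 10],
               [ 5, 95]])
print(accuracy(cm))   # 0.925
print(recall(cm, 0))  # 0.9
```

Because the test set here is balanced by construction (100 samples per class), the overall accuracy is simply the average of the per-class recalls.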
By inspecting the predictions made by our model for each SN sample in the test set, we found that the results are in agreement with our initial expectations regarding the class discrimination described in Section 2 and the characteristics presented within the three stamps for each sample. Figure 5 shows SN examples from TNS that have been correctly classified by our model, where in most cases a host galaxy is present, which is a good indicator of an alert triggered by a SN. In the examples shown in Figures 5c and 5d, the second most likely class is AGN, due to the spatial coincidence of the transient with the center of the host galaxy.

In Figure 6, incorrectly classified examples are shown. The examples in Figures 6a, 6b and 6c are SNe from TNS classified as asteroids by our model.

According to the confusion matrix, the most probable misclassifications for SN candidates are the asteroid and bogus classes. This confusion between SNe, asteroids, and bogus alerts could be fixed by looking at the second alert of the same object. If the second alert exists, it is safe to discard the bogus and asteroid classes, since it is extremely unlikely that the same bogus error or a moving object will appear in the exact same location in consecutive images, unless the alert is near a bright star that produces pixel errors due to saturation.

An example of the effect of the regularization term is depicted in Figure 7. Considerable differences in the distribution of the predicted probability for each class can be observed by varying β between 0 and 1, since both terms in eq. 1 are expected values of log probabilities. In the case of β = 0, the predictions are mostly saturated around 0 or 1 for the SN, VS, asteroid and bogus alert classes, creating difficulties to identify stamps that seem equally likely to belong to more than one class, because every sample is mapped to similar levels of high certainty. As the value of β increases, the saturation of predicted values decreases, spreading the predicted probability distributions and emphasizing the different levels of certainty between predictions of different samples. The order of predicted probabilities for each sample does not change significantly by varying β, achieving 99% accuracy in the test set when checking whether the correct label lies within the two highest predicted probabilities, for different β. The use of regularization to find noticeable differences in the predicted probabilities could be helpful to an expert for evaluating the output of the classifier, gaining better insight into how reliable the classifications are.

As a consistency check, we predicted the classes of unlabeled candidates using the stamp classifier, in order to compare their spatial distribution to the expected spatial locations for each class as mentioned in Section 2. To gather the unlabeled candidates, we queried objects using the ALeRCE API (https://alerceapi.readthedocs.io/en/latest/) by selecting 390,498 first alerts of different objects, chosen to be uniformly distributed over the full sky coverage of ZTF, where 325,582 of the alerts come from objects with
Figure 5. Correctly classified SN examples, with their respective predicted probabilities according to the proposed model. Panels (a) and (b) show typical examples of well-classified SNe, where the presence of a host galaxy within the stamps increases the chances of a SN alert being triggered. Panels (c) and (d) show small confusions between SN and AGN, due to the spatial coincidence of the transient with the center of the host galaxy.
(a) ZTF19abfdsbu: AGN 0.074, SN 0.700, VS 0.089, asteroid 0.082, bogus 0.055
(b) ZTF18acrdwcf: AGN 0.084, SN 0.648, VS 0.106, asteroid 0.078, bogus 0.085
(c) ZTF18abuhzfc: AGN 0.196, SN 0.568, VS 0.062, asteroid 0.097, bogus 0.077
(d) ZTF19abrirdm: AGN 0.324, SN 0.370, VS 0.072, asteroid 0.137, bogus 0.096
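The discussion of panels (c) and (d) ranks classes by predicted probability. A tiny helper (hypothetical, not part of the ALeRCE codebase) makes that ranking explicit for a per-class probability dictionary like the ones listed above:

```python
def top_two_classes(probs):
    """Return the two most likely classes from a dict mapping
    class name -> predicted probability."""
    ranked = sorted(probs, key=probs.get, reverse=True)
    return ranked[0], ranked[1]
```

Applied to panel (d) of Figure 5, it returns ("SN", "AGN"), i.e., the AGN class is the second most likely, as noted in the text.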
Figure 6. Incorrectly classified SN examples, with their respective predicted probabilities by the proposed model. In panels (a), (b) and (c), the SNe are classified as asteroids. The SN in panel (d) is classified as a bogus alert, which might be caused by the small size of the PSF, confusing the classifier with a hot pixel or a cosmic ray, which usually occupies a very narrow portion of the stamp at the center. In all cases, the absence of a clear host galaxy within the stamps reduces the probability of a SN alert being triggered.
(a) ZTF18absoomk: AGN 0.085, SN 0.105, VS 0.096, asteroid 0.635, bogus 0.078
(b) ZTF19abmqasg: AGN 0.097, SN 0.344, VS 0.082, asteroid 0.351, bogus 0.127
(c) ZTF19aazlsfj: AGN 0.083, SN 0.208, VS 0.101, asteroid 0.535, bogus 0.074
(d) ZTF19abpbvsk: AGN 0.114, SN 0.250, VS 0.095, asteroid 0.086, bogus 0.456
Figure 7. Probability distribution for each of the classes in the training set, for different values of the regularization constant β = {0, 0.5, 1.0}. For the model without regularization (β = 0, shown in the top plot), the probability distribution saturates to 1 or 0. Increasing β to 0.5 or 1.0 decreases the saturation and spreads the distribution of predictions made by the model (middle and bottom plots).

Figure 8.
Spatial distribution of the unlabeled data, and distribution of predictions per class. The colorbar indicates the density of points. The ecliptic is shown with a yellow line with black edges. The distributions are shown as a 2D histogram of alert density. Extragalactic sources (SNe and AGNs) are found outside the Galactic plane. On the contrary, VS are concentrated in the Galactic plane. Asteroids are near the ecliptic.

By setting a threshold on the predicted SN probability, the number of candidates passed on for inspection by experts can be reduced while keeping a high true positive ratio. For instance, for SN probability thresholds of 0.1, 0.2 and 0.5, the false positive ratio is 0.87, 0.03 and 0.01, respectively, while the true positive ratio is 0.94, 0.92 and 0.83, respectively. Our model is thus suitable for processing large volumes of alerts when only limited resources for manual inspection and confirmation by means of follow-up observations are available.

Figure 9.
ROC curve with SN detection threshold. The colorbar shows the threshold that a sample's predicted SN probability must surpass in order to be assigned as a member of the SN class. For SN probability thresholds of 0.1, 0.2 and 0.5, the false positive ratio is 0.87, 0.03 and 0.01, respectively, while the true positive ratio is 0.94, 0.92 and 0.83, respectively.
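The threshold trade-off quoted in the caption can be reproduced for any labeled set with a few lines. This is an illustrative NumPy sketch, not the evaluation code used in the paper:

```python
import numpy as np

def tpr_fpr_at_threshold(y_true, sn_prob, threshold):
    """True/false positive ratios when alerts whose SN probability
    reaches `threshold` are flagged as SN candidates.

    y_true: 1 for real SNe, 0 otherwise; sn_prob: predicted SN probabilities.
    """
    y_true = np.asarray(y_true, bool)
    flagged = np.asarray(sn_prob) >= threshold
    tpr = np.mean(flagged[y_true])      # fraction of real SNe recovered
    fpr = np.mean(flagged[~y_true])     # fraction of non-SNe flagged
    return tpr, fpr
```

Sweeping `threshold` from 0 to 1 and plotting the resulting (fpr, tpr) pairs traces exactly the ROC curve of Figure 9.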
5. MODEL DEPLOYMENT AND SN HUNTER

The SN Hunter (https://snhunter.alerce.online) is a visualization tool that allows the user to inspect SN candidates classified by the model in real time, in order to select good targets for follow-up observations. The interface of the SN Hunter is shown in Fig. 10. On the left of the interface, a celestial map shows the position of each candidate with a circle, where the size of the circle is proportional to the class probability assigned by our model, with the map centered on the right ascension (ra) and declination (dec) coordinates of the alert. The Milky Way plane is highlighted by the regions with lighter shades of purple. The green curve in the map represents the ecliptic, where SN candidate alerts are more likely to be triggered by asteroids instead of real SNe. The right side of the interface provides a table where the highest probability SN candidates are listed. The table shows the ZTF object ID, which uniquely identifies each astronomical alert; the discovery date, specifying the day, month, year and time when the first alert was triggered; the corresponding SN probability (score) from the stamp classifier; and the number of available alerts in the r and g bands. Further details for each candidate are displayed as in Figure 11: the ra and dec coordinates of the alert, the filter in which the first detection was made, the PSF magnitude and the observation date. Below comes the PanSTARRS cross-match information, containing the object ID, the distance to the first closest known object, and the classtar score of the first closest known object, where a score closer to 1 implies a higher likelihood of it being a star. The buttons below this information, from left to right, correspond to queries to the ALeRCE frontend, the NASA Extragalactic Database (NED, https://ned.ipac.caltech.edu/), TNS, and the Simbad Astronomical Database (Wenger et al. 2000) around the position of the candidate. Finally, the full metadata associated with the first alert of the SN candidate is linked below these buttons.
The middle panel of Figure 11 contains an interactive color image from PanSTARRS DR1 (Chambers et al. 2019), centered around the source using Aladin (Bonnarel et al. 2000; Boch & Fernique 2014); this image is also available in the main frontend of ALeRCE. The right panel of Figure 11 provides the science, reference and difference stamps of the first detection. It is also possible to sign in with a user account and label candidates as either a possible SN or bogus by clicking the corresponding buttons below the image stamps. These labels can be used to build up larger training sets, as well as to select candidates for the Target and Observation Managers (TOMs).

We implemented the CNN stamp classifier using TensorFlow 1.14 (Martín Abadi et al. 2015) and deployed it to classify the streaming alerts from ZTF's Kafka server (https://kafka.apache.org/). The timespan between a ZTF exposure and the first arrival of its alert from the stream is 14.6 ± …

Figure 10.
SN Hunter, a tool for the visualization of SN candidates. On the left side, the location of each candidate in sky coordinates with respect to the Galactic plane and the ecliptic is depicted. On the right side, a selection of the top candidates is listed, initially ordered by SN probability score from the stamp classifier. The list of candidates can be sorted by other parameters, and updated/refreshed to include newly ingested alerts.

The best candidates are sent to the Supernova Hunter tool for expert inspection. Further details about the complete processing pipeline are described in Förster et al. (2020).
5.1. Additional Visual Selection Criteria
We note that the SN candidate sample presented in this and the following subsections resulted from an older version of the stamp classifier, which relied only on the three images within the first alert and did not use features for SN classification. Moreover, some of the filtering steps we applied manually are no longer necessary now that features are included (we note these below). As shown in Appendix D, even without the metadata features, the classifier provides reasonably high accuracy (only 6% worse than the model with features). Regardless of whether features are included or not, we found it critical to visually inspect the predicted SN candidates in order to weed out misclassifications and submit more reliable candidates to TNS.

There are some common characteristics among the higher confidence SN candidates. As mentioned in Section 2, most confirmed SNe are located on top of or near an extended galaxy. If there is no galaxy within the stamps, then it is more likely that the candidate is a variable star or an asteroid, when the candidate is located near the Milky Way or the ecliptic, respectively, or a bogus alert. In some cases, it is difficult to tell whether the nearest source to the alert in the science image is an extended galaxy or a star; for these, a search of archival catalogs and/or an assessment of the spectral energy distribution can further aid classification. For a likely SN, the star-galaxy score from PanSTARRS shown in Figure 11 should be closer to 0, indicating that the nearby source is more likely to be an extended galaxy. A real SN should have positive flux in the difference image, so we removed candidates with negative difference flux by checking whether the isdiffpos field in the metadata is false; this check is done automatically in the current pipeline. It is also important to check that the object is visible in the difference image.

Another relevant feature is the shape of the candidate, which should be similar to that of stars, with fuzzy edges and a generally symmetric profile.
If the shape of the candidate is sharp (pixelized) or very localized, it might be a cosmic ray or a defect of the CCD camera. Alternatively, if it is elongated, it could be an asteroid or a satellite (often seen as a streak, or as multiple small dashes due to rotational reflections). After doing all of these checks, if the candidate is still not convincing, it is helpful to look at the next detections, when available, and search for the characteristics mentioned above; this can be done by clicking the ALeRCE button in the SN Hunter tool, which queries that specific candidate's data in the ALeRCE frontend. The 100 highest probability SN candidates each day are manually inspected by astronomers of the ALeRCE team, and all of them must agree before a candidate is reported to TNS.
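Some of the manual checks above can be encoded as an automatic pre-filter. The sketch below is purely illustrative: the field names (`isdiffpos`, simplified here to a boolean; `sgscore1`, the star/galaxy score of the nearest PanSTARRS source; `elong`, the elongation of the detection) follow the ZTF/PanSTARRS metadata discussed in the text, but the thresholds are assumptions, not the values used by the ALeRCE pipeline.

```python
def passes_visual_checks(candidate):
    """Hypothetical pre-filter mirroring the manual vetting steps:
    positive difference flux, galaxy-like nearest PS1 source, and a
    roughly symmetric (non-elongated) shape. Thresholds are illustrative."""
    if not candidate.get("isdiffpos", False):   # negative difference flux
        return False
    if candidate.get("sgscore1", 1.0) > 0.5:    # nearest source looks star-like
        return False
    if candidate.get("elong", 1.0) > 1.6:       # elongated: asteroid/satellite?
        return False
    return True
```

A filter like this could shorten the daily list of 100 candidates before the final human inspection, but it would not replace it.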
5.2. Reported and Confirmed Supernovae
From June 26th 2019 to June 21st 2020, we reported 3060 new SN candidates to TNS, an average of 9.2 SNe per day, of which 399 have been observed spectroscopically. Table 4 shows the number of candidates for each confirmed class, of which 394 were confirmed as SNe.

In Figure 12, we show the cumulative distribution of candidates reported to TNS from June 26th 2019 to June 21st 2020.
Figure 11.
Candidate visualization in the Supernova Hunter tool. On the left side, the SN candidate ID is shown as a clickable link to the ALeRCE frontend, with relevant metadata such as ra, dec, magnitude, date, etc. At the bottom there are links to other sources of information, including ALeRCE, NED, TNS, and the Simbad Astronomical Database. In the middle of the figure there is a color image from Aladin. On the right side, the stamps of the first detection are shown, along with buttons for reporting the candidate as possible bogus or as a possible SN.

Table 4.
Spectroscopically observed candidates discovered by ALeRCE: a total of 394 SNe and 5 non-SN objects (2 galaxies, 1 TDE, 1 variable star and 1 AGN).

Confirmed class      Spectroscopically observed candidates
SN Ia                265
SN II                63
SN IIn               13
SN Ic                10
SN Ib                9
SN IIP               9
SN Ia-91T-like       7
SN Ic-BL             6
SN IIb               4
SN                   2
Galaxy               2
SN Ia-91bg-like      2
SN Iax[02cx-like]    1
SN Ia-pec            1
SN Ib/c              1
SN Ibn               1
TDE                  1
Variable star        1
AGN                  1

The cumulative distribution is separated into two parts, namely the alerts with more than one detection to date (orange) and the alerts with a single detection to date (blue). We can consider the percentage of candidates with more than one detection to be a lower bound on the fraction of real non-moving astronomical objects; we define this as "purity", since multiple associated detections are a clear sign of a real non-moving astronomical object rather than a moving object or a bogus alert. Candidates with only single detections to date could be due to several reasons: moving objects, bogus alerts, relatively short transients that were only above the detection threshold for a short period of time, and objects in locations that have not been visited again by the public ZTF survey since the object detection.

We find that ≈70% of our reported candidates are detected with multiple alerts, while ≈30% have only one detection. Currently, we are achieving ≈80% purity, estimated with a moving average. Among the 2187 SN candidates with multiple alerts, 399 have been observed spectroscopically. Table 4 lists the number of candidates for each confirmed class, of which 394 were confirmed as SNe. Four of the remaining five are misclassifications: one variable star, one tidal disruption event (TDE), one AGN, and one galaxy. The final object, ZTF19aciiuta, is classified by TNS as a galaxy, but is likely to be a SN (the object has more than one detection, with a SN-shaped light curve, and was reported to TNS by other groups). Even though TDEs are not SNe, follow-up of these events is still of significant interest, due to their relative scarcity (van Velzen et al. 2020). In summary, taking into consideration the conservative final candidate selection done by the team of astronomers to perform spectroscopic confirmation, our reported and confirmed candidates have around 1% contamination by non-SN objects.

For comparison purposes, we gathered the objects reported to TNS by both ALeRCE and AMPEL (Alert Management, Photometry and Evaluation of Lightcurves, an internal ZTF classification effort; Nordin et al. 2019), and compared the reporting times within 3 days after the first detection. Figure 13 shows the cumulative histogram of reporting times to TNS for
Figure 12. Cumulative number of reported SNe from when we started reporting on June 26th 2019 to June 21st 2020, separated into alerts with a single detection and alerts with more than one detection. The average rate of reporting is 9.2 candidates per day. Roughly ≈70% of our reported candidates are detected with multiple alerts (purity), implying they are true SNe, while ≈30% have only one detection and thus less certainty. Currently, we are closer to ≈80% purity.
ALeRCE and AMPEL, along with the cases where reports were made by ALeRCE before having the second detection in the public stream (one detection). Approximately 64% of the candidates reported by ALeRCE were based on a single detection. An important difference between the two systems is the visual inspection by experts in the reporting process to TNS. According to Nordin et al. (2019), AMPEL reports candidates automatically using their "TNS channel", which produces more reported candidates than our system, within 12 hours after the first detection. As described in this work, our system's final stage so far relies on human inspection, checking and reporting, which occurs within 10 to 24 hours after the first detection, without reporting transients already reported by AMPEL (only two cases were reported after AMPEL). Therefore, we report new candidates to TNS within a day after the first detection. Moreover, since ALeRCE mostly reports candidates with a single detection, 92% (2804) of our reports were sent within one day after the first detection, compared to 27% (1860) for AMPEL.

Figure 14 shows the distribution of times between the first or second detection and the last non-detection for candidates reported by ALeRCE. Based on these data, the average time between the last non-detection and the first detection is 3.8 days, and 8.5 days for the second detection. Reporting candidates only after the second detection would therefore introduce a delay of 4.7 days on average, which represents a potentially critical timespan for measuring spectra at early stages of the
Figure 13.
Cumulative distribution of the time between the first detection and the reporting time to TNS, for candidates reported by ALeRCE and AMPEL. The full distributions are shown with solid lines, and the distributions of reporting time for candidates with a single detection are shown with dashed lines.

transient event, as required in order to achieve the science goals described in Section 1. As mentioned before, ALeRCE currently does not report candidates that were previously reported by other groups; therefore, our candidates reported using a single detection increase the bulk of objects available for early follow-up of transients that were not found by other groups. We will report already-reported candidates in the near future, since this adds the information that the candidates passed our visual inspection test. In addition, the work presented here is a starting point towards our goal of developing an automatic reporting system for the most highly confident subset of SN candidates.

6. CONCLUSION AND FUTURE WORK

As part of the ALeRCE broker processing pipeline, we identified characteristics of the images and metadata within the ZTF alert stream that allow us to discriminate among SN, AGN, VS, asteroid and bogus alerts. In order to solve this classification problem automatically and quickly identify the best SN candidates for follow-up, we trained a CNN. This classifier uses the first detection only, i.e., its inputs are the science, reference and difference images, and part of the metadata of the first detection alert. In addition, our CNN architecture is invariant to rotations within the stamps, and was trained using an entropy-regularized loss function. The latter is useful to improve the human readability of the predicted probabilities per class, in terms of the certainty assigned to each sample, so an expert can gain better
Figure 14.
Histogram of the time between the first (second) detection and the last non-detection of the ALeRCE reported candidates.

insights into the actual nature of the transients when inspecting SN candidates. Among the five classes that our CNN can classify, it achieves an accuracy of 0.… ± 0.004 on a balanced test set, while for the SN class it reaches a true positive rate of 87 ± 1% and a false positive rate of 5 ± …%.

ACKNOWLEDGMENTS

The authors acknowledge support from the Chilean Ministry of Economy, Development, and Tourism's Millennium Science Initiative through grant IC12009, awarded to the Millennium Institute of Astrophysics (RC, ER, CV, FF, PE, GP, FEB, IR, PSS, GC, SE, JA, EC, DR, DRM, MC), and from the National Agency for Research and Development (ANID) grants: BASAL Center of Mathematical Modelling AFB-170001 (FF) and Centro de Astrofísica y Tecnologías Afines AFB-170002 (FEB, PSS, MC); FONDECYT Regular …
Software:
Aladin (Bonnarel et al. 2000), Apache ECharts (https://echarts.apache.org), Apache Kafka (https://kafka.apache.org/), Apache Spark, ASTROIDE (Brahem et al. 2018), Astropy (Astropy Collaboration et al. 2013), catsHTM (Soumagnac & Ofek 2018), Dask (Rocklin 2015), Jupyter (https://jupyter.org/), Keras (Chollet et al. 2018), Matplotlib (Hunter 2007), NED (Steer et al. 2016), Pandas (McKinney 2010), Prometheus (https://prometheus.io/), Python, scikit-learn (Pedregosa et al. 2011), Simbad-CDS (Wenger et al. 2000), TensorFlow (Martín Abadi et al. 2015), Vue (https://vuejs.org/), Vuetify (https://vuetifyjs.com/), PostgreSQL.

APPENDIX

A. EXPLORING THE RELATIONSHIP BETWEEN FEATURES AND CLASSES
Table A1.
Clipping values for each feature. "max" or "min" in the clipping range means that the maximum or minimum value of that feature is preserved, respectively.

Feature           [min value, max value]
sgscore1          [-1, max]
distpsnr1         [-1, max]
sgscore2          [-1, max]
distpsnr2         [-1, max]
sgscore3          [-1, max]
distpsnr3         [-1, max]
fwhm              [min, 10]
ndethist          [min, 20]
ncovhist          [min, 3000]
chinr             [-1, 15]
sharpnr           [-1, 1.5]
non detections    [min, 2000]
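The clipping of Table A1 can be applied with `np.clip`, which accepts `None` for an open bound (the side of the range where the feature's own min/max is preserved). A sketch of this preprocessing step, where the dictionary transcribes the table (`non detections` is written `non_detections` to make a valid key, and `fwhm` is assumed for the PSF-width entry):

```python
import numpy as np

# Clipping ranges from Table A1; None means "no bound on that side".
CLIP_RANGES = {
    "sgscore1": (-1, None), "distpsnr1": (-1, None),
    "sgscore2": (-1, None), "distpsnr2": (-1, None),
    "sgscore3": (-1, None), "distpsnr3": (-1, None),
    "fwhm": (None, 10), "ndethist": (None, 20),
    "ncovhist": (None, 3000), "chinr": (-1, 15),
    "sharpnr": (-1, 1.5), "non_detections": (None, 2000),
}

def clip_feature(name, values):
    """Clip one feature column to its Table A1 range."""
    lo, hi = CLIP_RANGES[name]
    return np.clip(np.asarray(values, dtype=float), lo, hi)
```

Clipping tames the long tails visible in Figure A1 without discarding any samples.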
Figure A1. Feature distribution per class (AGN, SN, VS, asteroid, bogus) of the labeled dataset. Each feature was clipped to the values given in Table A1.

B. CNN GLOSSARY AND TRAINING

B.1.
CNN architecture

• Fully connected layer: Artificial neural networks (ANNs) are mathematical models mostly used for classification or regression. ANNs make use of basic processing units called neurons, which receive vectors x of data as input and apply a linear function to them, followed by a non-linear activation function. These neurons are grouped in layers, which are called fully connected layers. The output produced by a set of neurons of a specific fully connected layer is calculated as:

y = φ(Wx + b),   (B1)

where x ∈ R^n is the input of the layer, y ∈ R^m is the output of the layer, W ∈ R^(m×n) is a matrix of parameters called weights, b ∈ R^m is a vector containing the so-called biases of the layer, and φ(·) is a non-linear activation function applied element-wise. There are all sorts of non-linear activation functions; the most commonly used are:

sigmoid(x) = 1 / (1 + e^(−x)),   (B2)
tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)),   (B3)
ReLU(x) = max{0, x}.   (B4)

To be precise, W and b are referred to as the parameters of a fully connected layer, and they are modified during training to be optimized for the task at hand. Fully connected layers can be sequentially stacked one after the other to form an ANN model. For instance, an ANN of two layers is defined as:

z = φ(W^(1) x + b^(1)),   (B5)
y = φ(W^(2) z + b^(2)).   (B6)

The parameters of the ANN are θ = (W^(1), b^(1), W^(2), b^(2)). The way of grouping neurons and layers in an ANN is called the architecture of the ANN.
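Eqs. B5-B6 can be checked numerically; a minimal NumPy forward pass of the two-layer ANN, with φ = ReLU (eq. B4):

```python
import numpy as np

def relu(x):
    """Eq. B4: element-wise max{0, x}."""
    return np.maximum(0.0, x)

def two_layer_forward(x, W1, b1, W2, b2):
    """Forward pass of eqs. B5-B6: z = phi(W1 x + b1), y = phi(W2 z + b2)."""
    z = relu(W1 @ x + b1)
    return relu(W2 @ z + b2)
```

With W1 the 2×2 identity, b1 = 0, W2 = [[1, 1]] and b2 = [−1], the input x = (2, 3) gives z = (2, 3) and y = ReLU(5 − 1) = 4.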
• Softmax output layer: A commonly used activation function at the output of ANN models is sigmoid(x), whose output is bounded by (0, 1). For multi-class problems, the softmax activation function is used instead, usually referred to as a softmax output layer: there are K output neurons x_i, i ∈ {1, …, K}, and it is desired to assign a probability to each one, hence requiring that they add up to one. This is done by the softmax activation function, defined as:

softmax(x_i) = e^(x_i) / Σ_{j=1}^{K} e^(x_j),  i ∈ {1, …, K}.   (B7)
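Eq. B7 in NumPy, with the standard max-shift for numerical stability (the shift cancels in the ratio, so the result is unchanged):

```python
import numpy as np

def softmax(x):
    """Eq. B7: map K logits to K probabilities summing to one."""
    e = np.exp(x - np.max(x))   # shift prevents overflow for large logits
    return e / e.sum()
```

The shift makes the function safe even for logits of the order of 1000, where a direct exponentiation would overflow.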
• Convolutional layer: ANNs with fully connected layers are limited to vector-like inputs, and they do not take into account the correlation between adjacent features. To overcome this limitation, and preserve a degree of spatial or temporal correlation in the input, CNNs were proposed. The main components of CNNs are convolutional layers, which apply a filter or kernel to the input of the layer through a convolution operation. Similar to a fully connected layer, convolutional layer outputs are calculated as follows:

y = φ(W ∗ x + b),   (B8)

where x stands for the input of the layer, y is the output of the layer, W is a set of filters applied by convolution to the input, b is the vector of biases for each filter, and φ(·) is the activation function. In this case the ∗ operation between x and W is a convolution. The model used in this work applies convolutions to images, so x ∈ R^(e×f×g) and y ∈ R^(u×v×l) are 3D tensors, while W ∈ R^(d×d×t×l) and b ∈ R^l. Every element y_{i,j,k} of y is computed as:

y_{i,j,k} = Σ_{m,n,p} x_{i−m, j−n, p} W_{m,n,p,k} + b_k,   (B9)

where every element i, j, k of the tensor y is calculated by moving the filters of W over the tensor x and applying eq. B9. Each time W moves over the first two dimensions of x, it skips S pixels, where S is called the stride. After applying the convolutional layer, the first two dimensions of y are smaller in size than those of x. The spatial dimensions (first and second dimensions) of the tensors x and y relate to each other as follows:

U = (E − D)/S + 1,   (B10)

where U is the size of either spatial dimension of y, E is the size of the respective spatial dimension of x, D is the respective spatial dimension of W, and S is the stride used in the convolution operation.
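Eq. B10 as a small helper, extended with an optional padding term P (P = 0 recovers the unpadded case; P = ⌊D/2⌋ with S = 1 reproduces the size-preserving zero-padding discussed next):

```python
def conv_output_size(E, D, S=1, P=0):
    """Spatial output size of a convolution: U = (E + 2P - D)/S + 1 (eq. B10
    with optional symmetric zero-padding P). Raises if the filter does not
    tile the padded input evenly with stride S."""
    span = E + 2 * P - D
    if span % S != 0:
        raise ValueError("stride does not evenly tile the input")
    return span // S + 1
```

For example, a 21×21 stamp convolved with a 5×5 kernel at stride 1 and no padding yields a 17×17 output, while padding a 7×7 kernel with P = 3 keeps the 21×21 size.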
• Zero-padding: A commonly used technique to preserve the spatial dimensions of the input x ∈ R^(e×f×g) at the output y ∈ R^(u×v×l) of a convolutional layer. Zero-padding consists of adding 0's to the edges of the spatial dimensions of x. For a convolutional layer of stride S = 1 and odd kernel size D, the zero-padded input to the convolutional layer must have dimensions x ∈ R^((e+2⌊D/2⌋)×(f+2⌊D/2⌋)×g), so that e = u and f = v, i.e., the layer's input x and output y have the same spatial dimensions.
• Max pooling: Pooling layers are used in CNNs to reduce the spatial dimensionality of their inputs. The max pooling used in the model of Fig. 3 returns the maximum value within a window of its input x; in the same way as a convolutional filter, this maximum-value extraction window is rolled across the spatial dimensions of the input. In the case of the architecture of Fig. 3, the pooling window is of dimension 2 × 2.
• Batch normalization layer: It works as a trainable normalization layer that behaves differently during training and evaluation of the model. During training, the batch normalization layer calculates the mean and variance of each feature over the batch to normalize them, and maintains an exponential moving average of the mean and variance over the training set. After training, at evaluation time, the population statistics accumulated during training are used to normalize the inputs. Batch normalization not only normalizes input values to have mean near 0 and variance near 1; it also contains a linear transformation of these inputs that allows their scaling and shifting. This layer allows the model to emphasize or ignore specific inputs, acting as a regularizer and speeding up training.
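A sketch of the training-mode computation described above: normalize with the batch statistics, update the moving averages later used at evaluation, then apply the learnable scale and shift (`gamma`, `beta`, the standard batch-normalization parameters; the momentum value here is illustrative):

```python
import numpy as np

def batchnorm_train_step(x, running_mean, running_var, gamma, beta,
                         momentum=0.9, eps=1e-5):
    """Training-mode batch normalization for a batch x of shape (N, F):
    normalize each feature with the batch mean/variance, update the
    exponential moving averages, and apply the learnable scale/shift."""
    mu, var = x.mean(axis=0), x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    running_mean = momentum * running_mean + (1 - momentum) * mu
    running_var = momentum * running_var + (1 - momentum) * var
    return gamma * x_hat + beta, running_mean, running_var
```

At evaluation, `running_mean` and `running_var` replace the batch statistics, which is exactly the train/eval asymmetry the text describes.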
• Dropout: It is an operation usually applied at the output of fully connected layers, used as a regularizer to avoid overfitting of layers with a large number of neurons. Similar to a batch normalization layer, dropout behaves differently during training and evaluation. The dropout operation is defined by the dropout rate DR ∈ [0, 1]: at each training step, a fraction DR of the outputs of a fully connected layer is set to zero, reducing the effective size of that layer, and the remaining outputs are scaled by 1/(1 − DR), such that the expected sum over all the output values remains the same. When using the model after training, dropout is deactivated. The desired effect of dropout is to force the model not to depend on specific units of any layer.

The model described in Fig. 3 is based on Enhanced Deep-HiTS (Reyes et al. 2018), a state-of-the-art classifier for the binary classification of real astronomical objects and artifacts. The architecture of this model introduced total rotational invariance, which empirically proved to enhance performance on the classification of astronomical images.

B.2. Neural network training
• Procedure to train a neural network: The objective of using a neural network f_θ with parameters θ ∈ Θ is to approximate a function y = f(x), with x ∈ X. In practice there is no access to the whole data distribution X, but to a subset of N data samples of the function to approximate, {(x^(i), y^(i))}_{i=1}^{N}, called the training set. Finding the best parameters θ* for the neural network f_θ(x) requires solving the optimization problem:

θ* = argmin_{θ∈Θ} C(θ) = argmin_{θ∈Θ} (1/N) Σ_{i=1}^{N} L(y^(i), f_θ(x^(i))),   (B11)

where C is an error functional defined by the function L, called the loss function. The optimization depicted in eq. B11 is achieved by training the model with optimization techniques based on gradient descent, when L is chosen as a differentiable function (e.g., cross-entropy). The parameters θ are iteratively adjusted by the following rule, until convergence:

θ_k = θ_{k−1} − µ ∇_θ C(θ).   (B12)

Because neural networks are composed of many consecutive layers, the direct calculation of ∇_θ C(θ) could be computationally expensive; however, it can be performed efficiently by back-propagation, an algorithm that propagates the gradients from the output layer back through the network. When the dataset is large, evaluating ∇_θ C(θ) over all samples also becomes computationally expensive. As a solution, an unbiased estimate of the gradient is used, computed over a small random fraction of the data; this fraction is called a batch, and the number of data samples in the batch is the batch size. The optimization rule for a batch B ⊂ X is:

θ_k = θ_{k−1} − (µ/|B|) Σ_{i∈B} ∇_θ L(y^(i), f_θ(x^(i))),   (B13)

where µ is a constant called the learning rate, which establishes how large the training step is. This technique of training by batches is a form of stochastic gradient descent, and it guarantees convergence when µ_k is a well-chosen sequence in k satisfying Σ_k µ_k = ∞ and Σ_k µ_k² < ∞.
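The batch update of eq. B13 as a runnable sketch, applied to a toy problem: fitting the mean of a dataset, whose per-sample loss (θ − x)²/2 has gradient θ − x. This illustrates the procedure only; the actual stamp classifier is trained with the Adam variant described next.

```python
import numpy as np

def sgd(grad_fn, theta0, data, batch_size, lr, epochs, seed=0):
    """Mini-batch stochastic gradient descent (eq. B13): each update uses
    the gradient averaged over a random batch B instead of the full set."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    n = len(data)
    for _ in range(epochs):
        order = rng.permutation(n)              # reshuffle every epoch
        for start in range(0, n, batch_size):
            batch = data[order[start:start + batch_size]]
            grad = np.mean([grad_fn(theta, x) for x in batch], axis=0)
            theta = theta - lr * grad           # eq. B13 update
    return theta
```

Running it with `grad_fn = lambda t, x: t - x` on the numbers 0…9 drives θ toward their mean, 4.5, up to the residual noise of the batch estimates.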
• Adam optimizer: An alternative to the optimization rule of eq. B13 is Adam (Kingma & Ba 2017), an adaptive learning-rate optimization algorithm that automatically adjusts µ_k. It uses the squared gradients to scale the learning rate, and includes a moving average of the gradients in its formulation, a strategy known as momentum, used to avoid convergence to poor local minima. The main hyperparameters of Adam are β₁ and β₂, which relate to the moving averages of the gradients and the squared gradients, respectively, and regulate the rate at which the learning rate µ_k is adjusted.

C. HYPERPARAMETER RANDOM SEARCH RESULTS

For the hyperparameter random search, 133 different combinations of hyperparameter values were sampled from Table 3; we trained 5 models for each hyperparameter set and then evaluated every model on the validation and test sets. In addition, for each model the inference time over a single sample was measured. The training procedure took ≈ … We chose the model with β = 0.5, because it shows the most interpretable range of prediction probabilities, according to astronomers. The model chosen as the best has a validation accuracy of 0.950 ± … Each model is denoted M_i, where i corresponds to the rank of its validation accuracy with respect to all 133 models: the lower its validation accuracy, the higher is i. Table A2 shows the validation accuracy, test accuracy and inference time for the models with the top-5 validation accuracies of Figure A2b, from which M2 is selected as the best model and used for the experiments shown in the previous sections. In Table A2, metrics of the best model M2 are underlined, whereas bold metrics correspond to the highest of their respective column. Coincidentally, model M2, chosen as the best, has the highest test accuracy among the top-5 models, and when compared to the model with the worst test accuracy among them, the Welch's hypothesis test p-value is 0.364, meaning that the difference is not statistically significant. Figure A2b also shows that the test accuracy and inference time error bars of the top-5 models overlap.

D. RESULTS OF CLASSIFIER USING IMAGES ONLY
[Figure A2 appears here: two panels of test accuracy versus inference time in ms.]

Figure A2. Accuracy of 133 models from the hyperparameter random search. For each model, results consider 5 trainings and respective evaluations on the test set. (a) Test accuracy versus inference time, where each dot is a model with different hyperparameters; the closer a model lies to the top left corner, the better its performance. Models represented with diamonds correspond to the 5 models with the best validation accuracy. (b) Mean test accuracy versus inference time for the models represented as diamonds in (a): M1 (β: 0.0, BS: 32, LR: 1e-03, DR: 0.2, IS: 21, KS: 7), M2 (β: 0.5, BS: 64, LR: 1e-03, DR: 0.5, IS: 21, KS: 5), M3 (β: 1.0, BS: 32, LR: 1e-03, DR: 0.5, IS: 21, KS: 7), M4 (β: 1.0, BS: 32, LR: 5e-04, DR: 0.2, IS: 21, KS: 3), and M5 (β: 0.8, BS: 32, LR: 1e-03, DR: 0.8, IS: 21, KS: 3). Each model has error bars corresponding to one standard deviation, and every model is denoted as M_i, where i corresponds to the ranking of its validation accuracy among all 133 models of (a). Table A2.
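As an aside before the table: the pairwise comparisons in Table A2 use Welch's t-test, which allows unequal variances between the two samples of accuracies. The sketch below shows the statistic behind that test; the accuracy values are made up for illustration (the paper trains 5 models per hyperparameter set), and a two-sided p-value would in practice be obtained from the Student's t distribution with the Welch–Satterthwaite degrees of freedom, e.g. via `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two samples
    with possibly unequal variances."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)   # sample variances (n-1 denominator)
    se2 = va / na + vb / nb             # squared standard error of the mean difference
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Hypothetical test accuracies of two models over 5 training runs each.
acc_best = [0.951, 0.949, 0.952, 0.950, 0.948]
acc_worst = [0.947, 0.950, 0.946, 0.949, 0.948]
t, df = welch_t(acc_best, acc_worst)  # t = 2.0, df = 8 for these values
```

A p-value well above the usual 0.05 threshold, such as the 0.364 reported in the text, indicates that the accuracy difference between the compared models is not statistically significant.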
Top-5 models with the highest validation accuracy from the hyperparameter random search, ranked from M1 to M5. There is no statistically significant difference between the accuracies and inference times of the displayed models. M2 is chosen as the best model because it has β = 0.5, which shows the most interpretable range of prediction probabilities, according to astronomers. Metrics of the best model M2 are underlined, whereas bold metrics are the best of their respective column.

Model Name   Model's Hyperparameters                             Validation Accuracy   Test Accuracy   Inference Time [ms]
M1           β: 0.0, BS: 32, LR: 1e-03, DR: 0.2, IS: 21, KS: 7
M2           β: 0.5, BS: 64, LR: 1e-03, DR: 0.5, IS: 21, KS: 5   0.950
M3           β: 1.0, BS: 32, LR: 1e-03, DR: 0.5, IS: 21, KS: 7   0.949
M4           β: 1.0, BS: 32, LR: 5e-04, DR: 0.2, IS: 21, KS: 3   0.948
M5           β: 0.8, BS: 32, LR: 1e-03, DR: 0.8, IS: 21, KS: 3   0.949
Welch's t-test p-value    M v/s M —    M v/s M —    M v/s M —

REFERENCES
Astropy Collaboration, Robitaille, T. P., Tollerud, E. J., et al. 2013, Astronomy & Astrophysics, 558, A33, doi: 10.1051/0004-6361/201322068
Bach, S., Binder, A., Montavon, G., et al. 2015, PLoS ONE, 10, doi: 10.1371/journal.pone.0130140
Barchi, P. H., de Carvalho, R. R., Rosa, R. R., et al. 2020, Astronomy and Computing, 30, 100334, doi: 10.1016/j.ascom.2019.100334
Becker, I., Pichara, K., Catelan, M., et al. 2020, Monthly Notices of the Royal Astronomical Society, 493, 2981, doi: 10.1093/mnras/staa350
Bellm, E. C., Kulkarni, S. R., Graham, M. J., et al. 2018, Publications of the Astronomical Society of the Pacific, 131, 018002, doi: 10.1088/1538-3873/aaecbe
Bertin, E., & Arnouts, S. 1996, Astronomy and Astrophysics Supplement Series, 117, 393, doi: 10.1051/aas:1996164
Boch, T., & Fernique, P. 2014, ASP Conf. Ser., 485, 277. http://adsabs.harvard.edu/abs/2014ASPC..485..277B
Bonnarel, F., Fernique, P., Bienaymé, O., et al. 2000, Astronomy and Astrophysics Supplement Series, 143, 33, doi: 10.1051/aas:2000331

[Figure A3 appears here: confusion matrix, predicted label versus true label, over the classes AGN, SN, VS, asteroid, and bogus.]

Figure A3.
Confusion matrix for the test set when using the stamp classifier without alert metadata features.
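For readers reproducing this evaluation: a confusion matrix of this kind is tallied by counting, for each true class, how the predictions distribute over the five taxonomy classes. A minimal sketch, with made-up label lists rather than the paper's actual test set:

```python
# Taxonomy classes of the stamp classifier (rows: true class, columns: predicted class).
CLASSES = ["AGN", "SN", "VS", "asteroid", "bogus"]
IDX = {c: i for i, c in enumerate(CLASSES)}

def confusion_matrix(y_true, y_pred):
    """Count how predictions distribute over classes for each true class."""
    n = len(CLASSES)
    cm = [[0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        cm[IDX[t]][IDX[p]] += 1
    return cm

# Hypothetical labels: one SN is misclassified as a variable star (VS).
y_true = ["AGN", "SN", "SN", "VS", "asteroid", "bogus", "AGN"]
y_pred = ["AGN", "SN", "VS", "VS", "asteroid", "bogus", "AGN"]
cm = confusion_matrix(y_true, y_pred)
accuracy = sum(cm[i][i] for i in range(len(CLASSES))) / len(y_true)  # 6 of 7 correct
```

Normalizing each row by its sum yields the per-class fractions typically displayed in figures such as A3.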
Figure A4.
Spatial distribution of the unlabeled data, and distribution of predictions per class using the stamp classifier without features. The ecliptic is shown as a yellow line with black edges. Extragalactic classes (SNe and AGNs) are found with higher density near the Galactic plane, compared to the predictions of the stamp classifier with features, shown in Figure 8. Also, the asteroids predicted by the classifier without alert metadata features have higher density far from the ecliptic compared to the classifier that uses features.

Boone, K. 2019, The Astronomical Journal, 158, 257, doi: 10.3847/1538-3881/ab5182
Brahem, M., Zeitouni, K., & Yeh, L. 2018, IEEE Transactions on Big Data, 1, doi: 10.1109/TBDATA.2018.2873749
Cabrera-Vives, G., Reyes, I., Förster, F., Estévez, P. A., & Maureira, J.-C. 2017, The Astrophysical Journal, 836, 97, doi: 10.3847/1538-4357/836/1/97
Carrasco-Davis, R., Cabrera-Vives, G., Förster, F., et al. 2019, Publications of the Astronomical Society of the Pacific, 131, 108006, doi: 10.1088/1538-3873/aaef12
Chambers, K. C., Magnier, E. A., Metcalfe, N., et al. 2019, arXiv:1612.05560 [astro-ph]. http://arxiv.org/abs/1612.05560
Chollet, F., et al. 2018, Astrophysics Source Code Library, ascl:1806.022. http://adsabs.harvard.edu/abs/2018ascl.soft06022C
Dieleman, S., De Fauw, J., & Kavukcuoglu, K. 2016, arXiv:1602.02660 [cs]. http://arxiv.org/abs/1602.02660
Dieleman, S., Willett, K. W., & Dambre, J. 2015, Monthly Notices of the Royal Astronomical Society, 450, 1441, doi: 10.1093/mnras/stv632
Drake, A. J., Graham, M. J., Djorgovski, S. G., et al. 2014, The Astrophysical Journal Supplement Series, 213, 9, doi: 10.1088/0067-0049/213/1/9
Drake, A. J., Djorgovski, S. G., Catelan, M., et al. 2017, Monthly Notices of the Royal Astronomical Society, 469, 3688, doi: 10.1093/mnras/stx1085
Duev, D. A., Mahabal, A., Masci, F. J., et al. 2019, Monthly Notices of the Royal Astronomical Society, 489, 3582, doi: 10.1093/mnras/stz2357
Flesch, E. W. 2015, Publications of the Astronomical Society of Australia, 32, doi: 10.1017/pasa.2015.10
—. 2019, arXiv:1912.05614 [astro-ph]. http://arxiv.org/abs/1912.05614
Förster, F., Moriya, T. J., Maureira, J. C., et al. 2018, Nature Astronomy, 2, 808, doi: 10.1038/s41550-018-0563-4
Förster, F., et al. 2020, The Astrophysical Journal
Gal-Yam, A., Arcavi, I., Ofek, E. O., et al. 2014, Nature, 509, 471, doi: 10.1038/nature13304
Goldstein, D. A., D'Andrea, C. B., Fischer, J. A., et al. 2015, The Astronomical Journal, 150, 82, doi: 10.1088/0004-6256/150/3/82
Groh, J. H. 2014, Astronomy & Astrophysics, 572, L11, doi: 10.1051/0004-6361/201424852
Gómez, C., Neira, M., Hoyos, M. H., Arbeláez, P., & Forero-Romero, J. E. 2020, arXiv:2004.13877 [astro-ph]. http://arxiv.org/abs/2004.13877
Hunter, J. D. 2007, Computing in Science & Engineering, 9, 90, doi: 10.1109/MCSE.2007.55
Ivezić, Ž., Kahn, S. M., Tyson, J. A., et al. 2019, The Astrophysical Journal, 873, 111, doi: 10.3847/1538-4357/ab042c
Jayasinghe, T., Kochanek, C. S., Stanek, K. Z., et al. 2018, Monthly Notices of the Royal Astronomical Society, 477, 3145, doi: 10.1093/mnras/sty838
Jayasinghe, T., Stanek, K. Z., Kochanek, C. S., et al. 2019a, Monthly Notices of the Royal Astronomical Society, 486, 1907, doi: 10.1093/mnras/stz844
—. 2019b, Monthly Notices of the Royal Astronomical Society, 485, 961, doi: 10.1093/mnras/stz444
—. 2020, Monthly Notices of the Royal Astronomical Society, 491, 13, doi: 10.1093/mnras/stz2711
Jiang, J.-A., Doi, M., Maeda, K., et al. 2017, Nature, 550, 80, doi: 10.1038/nature23908
Kasen, D. 2010, The Astrophysical Journal, 708, 1025, doi: 10.1088/0004-637X/708/2/1025
Khazov, D., Yaron, O., Gal-Yam, A., et al. 2016, Monthly Notices of the Royal Astronomical Society, 415, 199, doi: 10.1111/j.1365-2966.2011.18689.x
Morozova, V., Piro, A. L., & Valenti, S. 2017, The Astrophysical Journal, 838, 28, doi: 10.3847/1538-4357/aa6251
Mowlavi, N., Lecoeur-Taïbi, I., Lebzelter, T., et al. 2018, Astronomy & Astrophysics, 618, A58, doi: 10.1051/0004-6361/201833366
Muthukrishna, D., Parkinson, D., & Tucker, B. E. 2019, The Astrophysical Journal, 885, 85, doi: 10.3847/1538-4357/ab48f4
Nair, V., & Hinton, G. E. 2010, in Proceedings of the 27th International Conference on International Conference on Machine Learning, ICML'10 (Haifa, Israel: Omnipress), 807–814
Narayan, G., Zaidi, T., Soraisam, M. D., et al. 2018, The Astrophysical Journal Supplement Series, 236, 9, doi: 10.3847/1538-4365/aab781
Naul, B., Bloom, J. S., Pérez, F., & van der Walt, S. 2018, Nature Astronomy, 2, 151, doi: 10.1038/s41550-017-0321-z
Noebauer, U. M., Kromer, M., Taubenberger, S., et al. 2017, Monthly Notices of the Royal Astronomical Society, 472, 2787, doi: 10.1093/mnras/stx2093
Nordin, J., Brinnel, V., van Santen, J., et al. 2019, Astronomy & Astrophysics, 631, A147, doi: 10.1051/0004-6361/201935634
Nugent, P. E., Sullivan, M., Cenko, S. B., et al. 2011, Nature, 480, 344, doi: 10.1038/nature10644
Oh, K., Yi, S. K., Schawinski, K., et al. 2015, The Astrophysical Journal Supplement Series, 219, 1, doi: 10.1088/0067-0049/219/1/1
Palaversa, L., Ivezić, Ž., Eyer, L., et al. 2013, The Astronomical Journal, 146, 101, doi: 10.1088/0004-6256/146/4/101
Pedregosa, F., Varoquaux, G., Gramfort, A., et al. 2011, Journal of Machine Learning Research, 12, 2825. http://jmlr.org/papers/v12/pedregosa11a.html
Pereyra, G., Tucker, G., Chorowski, J., Kaiser, L., & Hinton, G. 2017, arXiv:1701.06548 [cs]. http://arxiv.org/abs/1701.06548
Pichara, K., Protopapas, P., & León, D. 2016, The Astrophysical Journal, 819, 18, doi: 10.3847/0004-637X/819/1/18
Piro, A. L., & Morozova, V. S. 2016, The Astrophysical Journal, 826, 96, doi: 10.3847/0004-637X/826/1/96
Piro, A. L., & Nakar, E. 2013, The Astrophysical Journal, 769, 67, doi: 10.1088/0004-637X/769/1/67
Pérez-Carrasco, M., Cabrera-Vives, G., Martinez-Marin, M., et al. 2019, Publications of the Astronomical Society of the Pacific, 131, 108002, doi: 10.1088/1538-3873/aaeeb4
Reyes, E., Estévez, P. A., Reyes, I., et al. 2018, in 2018 International Joint Conference on Neural Networks (IJCNN), 1–8, doi: 10.1109/IJCNN.2018.8489627
Richards, J. W., Starr, D. L., Butler, N. R., et al. 2011, The Astrophysical Journal, 733, 10, doi: 10.1088/0004-637X/733/1/10
Rimoldini, L., Holl, B., Audard, M., et al. 2019, Astronomy & Astrophysics, 625, A97, doi: 10.1051/0004-6361/201834616
Rocklin, M. 2015, Proceedings of the 14th Python in Science Conference, 126, doi: 10.25080/Majora-7b98e3ed-013
Smith, K. W., Williams, R. D., Young, D. R., et al. 2019, Research Notes of the AAS, 3, 26, doi: 10.3847/2515-5172/ab020f
Soumagnac, M. T., & Ofek, E. O. 2018, Publications of the Astronomical Society of the Pacific, 130, 075002, doi: 10.1088/1538-3873/aac410
Steer, I., Madore, B. F., Mazzarella, J. M., et al. 2016, The Astronomical Journal, 153, 37, doi: 10.3847/1538-3881/153/1/37
Stetson, P. B. 1987, Publications of the Astronomical Society of the Pacific, 99, 191, doi: 10.1086/131977
Sánchez-Sáez, P., et al. 2020, The Astrophysical Journal
Tanaka, M., Tominaga, N., Morokuma, T., et al. 2016, The Astrophysical Journal, 819, 5, doi: 10.3847/0004-637X/819/1/5
Tominaga, N., Morokuma, T., Blinnikov, S. I., et al. 2011,