Alert Classification for the ALeRCE Broker System: The Light Curve Classifier
P. Sánchez-Sáez, I. Reyes, C. Valenzuela, F. Förster, S. Eyheramendy, F. Elorrieta, F. E. Bauer, G. Cabrera-Vives, P. A. Estévez, M. Catelan, G. Pignata, P. Huijse, D. De Cicco, P. Arévalo, R. Carrasco-Davis, J. Abril, R. Kurtev, J. Borissova, J. Arredondo, E. Castillo-Navarrete, D. Rodriguez, D. Ruz-Mieres, A. Moya, L. Sabatini-Gacitúa, C. Sepúlveda-Cobo, E. Camacho-Iñiguez
Draft version August 11, 2020
Typeset using LaTeX twocolumn style in AASTeX63
Affiliations:
Millennium Institute of Astrophysics (MAS), Nuncio Monseñor Sótero Sanz 100, Providencia, Santiago, Chile
Instituto de Astrofísica, Facultad de Física, Pontificia Universidad Católica de Chile, Casilla 306, Santiago 22, Chile
Faculty of Engineering and Sciences, Universidad Adolfo Ibáñez, Diagonal Las Torres 2700, Peñalolén, Santiago, Chile
Center for Mathematical Modeling, Universidad de Chile, Beauchef 851, North building, 7th floor, Santiago 8320000, Chile
Department of Electrical Engineering, Universidad de Chile, Av. Tupper 2007, Santiago 8320000, Chile
Departamento de Matemáticas, Facultad de Ciencia, Universidad de Santiago de Chile, Av. Libertador Bernardo O'Higgins 3663, Estación Central, Santiago, Chile
Departamento de Astronomía, Universidad de Chile, Casilla 36D, Santiago, Chile
Centro de Astroingeniería, Pontificia Universidad Católica de Chile, Av. Vicuña Mackenna 4860, 7820436 Macul, Santiago, Chile
Space Science Institute, 4750 Walnut Street, Suite 205, Boulder, Colorado 80301
Department of Computer Science, University of Concepción, Edmundo Larenas 219, Concepción, Chile
Departamento de Ciencias Físicas, Universidad Andres Bello, Avda. Republica 252, Santiago, Chile
Informatics Institute, Universidad Austral de Chile, General Lagos 2086, Valdivia, Chile
Instituto de Física y Astronomía, Facultad de Ciencias, Universidad de Valparaíso, Gran Bretaña No. 1111, Playa Ancha, Valparaíso, Chile
European Southern Observatory (ESO), Alonso de Córdova 3107, Vitacura, Santiago, Chile
Centro de Estudios de Física del Cosmos de Aragón (CEFCA) - Unidad Asociada al CSIC, Plaza San Juan, 1, E-44001, Teruel, Spain

(Received; Revised; Accepted)
Submitted to AJ

ABSTRACT

We present the first version of the ALeRCE (Automatic Learning for the Rapid Classification of Events) broker light curve classifier. ALeRCE is currently processing the Zwicky Transient Facility (ZTF) alert stream, in preparation for the Vera C. Rubin Observatory. The ALeRCE light curve classifier uses variability features computed from the ZTF alert stream, and colors obtained from AllWISE and ZTF photometry. We apply a Balanced Hierarchical Random Forest algorithm with a two-level scheme, where the top level classifies each source as periodic, stochastic, or transient, and the bottom level further resolves each hierarchical class, yielding a total of 15 classes. This classifier corresponds to the first attempt to classify multiple classes of stochastic variables (including nucleus- and host-dominated active galactic nuclei, blazars, young stellar objects, and cataclysmic variables), in addition to different classes of periodic and transient sources, using real data. We created a labeled set using various public catalogs (such as the Catalina Surveys and Gaia DR2 variable stars catalogs, and the Million Quasars catalog), and we classify all objects with ≥ 6 g-band or ≥ 6 r-band detections in ZTF (868,371 sources as of 2020/06/09), providing updated classifications for sources with new alerts every day. For the top level we obtain macro-averaged precision and recall scores of 0.96 and 0.99, respectively, and for the bottom level we obtain macro-averaged precision and recall scores of 0.57 and 0.76, respectively. Updated classifications from the light curve classifier can be found at the ALeRCE Explorer website.

Corresponding author: P. Sánchez-Sáez, [email protected]
Keywords: galaxies: active – methods: data analysis – stars: variables: general – supernovae: general – surveys

1. INTRODUCTION

Brightness variations of astrophysical objects offer key insights into their physical emission mechanisms and related phenomena. In stars, pulsations, both radial and non-radial, can result from a thermodynamic engine operating in their partial ionization layers, when stars are located inside one of the several so-called instability strips that are found in the Hertzsprung-Russell diagram. Eruptive events can be generated by material being lost from a star, or occasionally accreted onto it, as is typical in protostars and young stellar objects (YSOs). Explosive events can occur when material is accreted onto compact objects, such as white dwarfs in the case of cataclysmic variables (CVs) or neutron stars in the case of X-ray binaries, or in stellar mergers. Brightness changes can also originate from the rotation of stars, caused by surface features such as starspots, and/or by stars' ellipsoidal shapes. Finally, eclipses can occur, depending on the observer's line of sight, due to the presence of binary companions, planets, and/or other circumstellar material. These and other classes of stellar variability are reviewed and summarized, for instance, in Catelan & Smith (2015), where extensive additional references can be found. In addition, there is a wide array of transients, such as kilonovae (Metzger et al. 2010), supernovae (SNe; Woosley et al. 2002), and tidal disruption events, which are beacons of destructive episodes in the life of a star (Komossa 2015). Galaxies, in turn, can also present a wide array of variability phenomena. In those hosting strongly accreting massive black holes (BHs), for instance, variations develop due to the stochastic nature of the accretion disk, corona, and jet emission, potentially related to both the BH properties and the structure of the material in the immediate vicinity (e.g., MacLeod et al. 2010; Caplar et al.
2017; Sánchez-Sáez et al. 2018).

To study the variability of individual variable and transient objects in detail, and to use this information to probe different variability and physical models, observations over a wide range of timescales are required. Hence, long and intensive campaigns targeting a large number of objects are crucial. In recent years, surveys covering a significant part of the sky, revisiting the same regions on timescales from days to years, and containing a large sample of serendipitous objects, have become available as predecessors of the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST; Ivezić et al. 2019). Among these is the Zwicky Transient Facility (ZTF; Bellm 2014; Bellm et al. 2019), which had first light in 2017 and employs a powerful 47 deg² field-of-view camera mounted on the Samuel Oschin 48-inch Schmidt telescope. ZTF is designed to image the entire northern sky every three nights and scan the plane of the Milky Way twice each night to a limiting magnitude of 20.5 in gri, thus enabling a wide variety of novel multiband time series studies, in preparation for the LSST. LSST, which aims for first light in 2022, will revolutionize time domain astronomy, enabling for the first time the study of transient and variable objects over long periods of time (∼
10 years), to deep single-epoch limiting magnitudes (at the 5σ level), over a large sky area. Given the large number of sources that ZTF and LSST will observe, automated classification tools are required to exploit these alert streams.

2. DATA

2.1. Reference Data
ALeRCE has been processing the public ZTF alert stream, which includes g and r photometry, since May 2019. The ALeRCE pipeline is described in detail by Förster (2020); for clarity, we provide a brief description of the light curve construction process. The ALeRCE pipeline processes the ZTF Avro alert files. These files contain metadata and contextual information for a single event, which is defined as a flux-transient, a reoccurring flux-variable, or a moving object (Masci et al. 2019; for details, see https://zwickytransientfacility.github.io/ztf-avro-alert/). To construct light curves, the ALeRCE pipeline uses: the photometry of the difference image and reference image (detections); possible non-detections associated with the target during the 30 days preceding the event (5σ magnitude limit in the difference image based on PSF-fit photometry, called diffmaglim by ZTF); the real-bogus quality score reported by ZTF (rb, which ranges from 0 to 1, with values closer to 1 implying more reliable detections); and the morphological classification of the closest object obtained from PanSTARRS1 (Tachibana & Miller 2018). An overview of the pipeline is presented in Figure 1. In addition, Figure 2 shows an example of a SN light curve obtained using ZTF data. In summary, the different stages of the pipeline are:

1) Ingestion: the ZTF public stream is ingested using Kafka.
2) S3 upload: the alert Avro packets are stored in AWS S3 for later access.
3) Crossmatch: the position of the alert is used to query external catalogs.
4) Stamp classifier: alerts from new objects are classified using their image cutouts (stamps).
5) Preprocessing: the photometry associated with a given alert is corrected to take into account the use of difference image fluxes (see details below), and simple statistics associated with the aggregated light curve are computed.
6) Light curve features: advanced light curve statistics (features) are computed when there are at least six detections in a given band.
7) Light curve classifier: the light curve classifier described in this work is applied.
8) Outliers: an outlier detection algorithm is applied.
9) ALeRCE stream: the aggregated, annotated, and classified light curves are reported in a Kafka stream.

In step 3) we are experimenting with several catalogs, but for this work we use the AllWISE public Source Catalog (Wright et al. 2010; Mainzer et al. 2011; http://wise2.ipac.caltech.edu/docs/release/allwise/), invoking a match radius of 2 arcseconds, to obtain W1, W2, and W3 photometry (using magnitudes measured with profile-fitting photometry, e.g., w1mpro).

The preprocessing procedure (step 5) is described in detail in Förster (2020) (see Section A of their appendix). In particular, for the light curve classifier we use the corrected light curves (lc_corr; m̂_sci in Förster 2020) for sources whose closest counterpart in the reference image coincides with the location of the alert (within a radius of 1.4 arcseconds). It is important to use the corrected light curves for persistent variable sources, in order to take into account changes in the sign of the difference between the reference and the science images, or possible changes of the reference image. For the rest of the sources, which in general correspond to transients, we use the light curves obtained from the difference images (lc_diff; m_diff in Förster 2020).

2.2. Classification Taxonomy
The first version of the ALeRCE light curve classifier considers 15 subclasses of variable and transient objects, presented as a taxonomy tree, defined by the ALeRCE collaboration, in Figure 3. The taxonomy is subdivided in a hierarchical fashion according to both the physical properties of each class and the empirical variability properties of the light curves, as follows (in parentheses we indicate the class name used by the classifier):

• Transient: Type Ia supernova (SNIa), Type Ibc supernova (SNIbc), Type II supernova (SNII), and Super Luminous Supernova (SLSN);

• Stochastic: Type 1 Seyfert galaxy (AGN; i.e., host-dominated active galactic nuclei), Type 1 Quasar (QSO; i.e., nucleus-dominated active galactic nuclei), blazar (Blazar; i.e., beamed jet-dominated active galactic nuclei), Young Stellar Object (YSO), and Cataclysmic Variable/Nova (CV/Nova);

• Periodic: Long-Period Variable (LPV; includes regular, semi-regular, and irregular variable stars), RR Lyrae (RRL), Cepheid (CEP), eclipsing binary (E), δ Scuti (DSCT), and other periodic variable stars (Periodic-Other; this includes classes of variable stars that are not well represented in the labeled set, e.g., sources classified as Hump, Misc, Rotational, or RS CVn in CRTS).

It is important to note that there are a number of less common classes which have not been separated out yet in the ALeRCE taxonomy tree, because the number of cross-matched objects in these classes is too low to train a good classification model (e.g., SNe IIb, TDEs, and KNe, among others). There is a catch-all "Periodic-Other" class for periodic classes excluded from the taxonomy tree, but not for transient or stochastic classes; thus, for the moment, these missing classes are being grouped into one or more of the existing ones.
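The two-level scheme can be illustrated with a minimal sketch. This is not the actual ALeRCE implementation (which uses a Balanced Hierarchical Random Forest); it only shows one natural way to combine a top-level branch probability with a bottom-level conditional probability into a final subclass score, using the class lists of the taxonomy above and made-up probabilities:

```python
# Toy sketch of the two-level hierarchical scheme: the final score of a
# subclass is P(branch) * P(subclass | branch). Probabilities are invented.

TOP = ["transient", "stochastic", "periodic"]
BOTTOM = {
    "transient": ["SNIa", "SNIbc", "SNII", "SLSN"],
    "stochastic": ["AGN", "QSO", "Blazar", "YSO", "CV/Nova"],
    "periodic": ["LPV", "RRL", "CEP", "E", "DSCT", "Periodic-Other"],
}

def combine(top_probs, bottom_probs):
    """top_probs: branch -> prob; bottom_probs: branch -> {subclass: prob}."""
    scores = {}
    for branch, p_branch in top_probs.items():
        for sub, p_sub in bottom_probs[branch].items():
            scores[sub] = p_branch * p_sub
    return scores

# Hypothetical output for one object:
top = {"transient": 0.1, "stochastic": 0.2, "periodic": 0.7}
bottom = {
    "transient": {"SNIa": 0.7, "SNIbc": 0.1, "SNII": 0.15, "SLSN": 0.05},
    "stochastic": {"AGN": 0.2, "QSO": 0.5, "Blazar": 0.1, "YSO": 0.1,
                   "CV/Nova": 0.1},
    "periodic": {"LPV": 0.05, "RRL": 0.6, "CEP": 0.05, "E": 0.2,
                 "DSCT": 0.05, "Periodic-Other": 0.05},
}
scores = combine(top, bottom)
best = max(scores, key=scores.get)  # most probable of the 15 subclasses
```

Because each bottom-level distribution sums to one within its branch, the combined scores over all 15 subclasses also sum to one.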
Figure 1. A scheme of the ALeRCE pipeline. ZTF alerts are ingested using Kafka and a series of sequential and parallel steps are initiated. Alerts are stored in AWS S3, classified based on their image stamps, crossmatched with other catalogs, and their photometry corrected to take into account difference fluxes. Aggregated light curves are used to compute basic statistics (for internal use) and, if enough data points exist, features are computed, and the light curve and outlier classifiers are applied before sending an output stream. A PostgreSQL database is populated along the way, which can then be queried.
Figure 2. A SN light curve queried from the previously populated database (after stage 5 of the pipeline is performed). Green circles and red squares indicate the g and r bands, respectively. Error bars indicate photometry associated with detections, with consecutive measurements connected by straight solid line segments. Triangles denote limiting magnitudes.

2.3. Labeled Set
The labeled set (i.e., the set of sources used to define the training and testing sets) for the light curve classifier was built using sources observed by ZTF with known labels obtained via spectroscopic and/or photometric analysis by previous works. Further description of the labeled set construction strategy can be found in Förster (2020). We obtained labels from the following catalogs: the ASAS-SN catalog of variable stars (ASASSN; Jayasinghe et al. 2018, 2019a,b, 2020), the Catalina Surveys Variable Star Catalogs (CRTS; Drake et al. 2014; Drake et al. 2017), the LINEAR catalog of periodic light curves (LINEAR; Palaversa et al. 2013), Gaia Data Release 2 (Gaia DR2; Mowlavi et al. 2018; Rimoldini et al. 2019), the Transient Name Server database (TNS; https://wis-tns.weizmann.ac.il/), the Roma-BZCAT Multi-Frequency Catalog of Blazars (ROMABZCAT; Massaro et al. 2015), the Million Quasars Catalog (MILLIQUAS, version 6.4c, December 2019; Flesch 2015, 2019), the New Catalog of Type 1 AGNs (Oh2015; Oh et al. 2015), and the SIMBAD database (Wenger et al. 2000). Some additional CV labels were obtained from different catalogs (including Ritter & Kolb 2003), compiled by Abril et al. (2020) (JAbril).
Figure 3. Taxonomy tree used in the current version of the ALeRCE light curve classifier. (Illustrations: @wandering_astro)
Table 1 lists the number of sources in the labeled set belonging to each class (with their corresponding percentages within their hierarchical group), and the catalogs from which the classifications were obtained. Only sources with ≥ 6 detections in g or ≥ 6 detections in r were included (considering data obtained until 2020/06/09). It is clear from the table that there is a high imbalance in the labeled set, with some classes representing less than 5% of their respective hierarchical group. Figure 4 shows the (ordered) number of sources per class for the labeled set, and Figure 5 shows the fraction of sources in each class with photometry only in the g band, only in the r band, or in both bands.

3. FEATURES USED BY THE CLASSIFIER

The light curve classifier uses a total of 152 features. We avoid including features that require a long time to compute, for example features that require the use of Markov chain Monte Carlo techniques, since one of the goals of the light curve classifier is to provide a fast and highly scalable classification. 146 of these features are computed using the public ZTF g and r data. We excluded the mean magnitude as a feature, to avoid any bias in the labeled set magnitude distribution affecting the classification of sources that are fainter (or brighter). Features obtained using the ZTF observed magnitudes are called detection features (124 in total), and features computed using the ZTF non-detection 5σ magnitude limits (diffmaglim) are called non-detection features (18 in total). These features are described in the following sections (3.1 and 3.2), as well as in Appendix A. Considering the LSST Data Products Definition Document (Jurić et al. 2019), we expect that all of these features will be measurable using LSST data.

Figure 4. Number of sources per class for the labeled set, as reported in Table 1.
Table 1. Labeled set definition.

Hierarchical Class | Class | Number (†) | Source Catalogs
Transient | SNIa | 1272 (74.0%) | TNS
Transient | SNIbc | 94 (5.5%) | TNS
Transient | SNII | 328 (19.1%) | TNS
Transient | SLSN | 24 (1.4%) | TNS
Transient | Total | 1718 |
Stochastic | QSO | 26168 (75.4%) | MILLIQUAS (sources with class "Q")
Stochastic | AGN | 4667 (13.4%) | Oh2015, MILLIQUAS (sources with class "A")
Stochastic | Blazar | 1267 (3.6%) | ROMABZCAT, MILLIQUAS (sources with class "B")
Stochastic | YSO | 1740 (5.0%) | SIMBAD
Stochastic | CV/Nova | 871 (2.5%) | TNS, ASASSN, JAbril
Stochastic | Total | 34713 |
Periodic | LPV | 14076 (16.2%) | CRTS, ASASSN, Gaia DR2
Periodic | E | 37901 (43.5%) | CRTS, ASASSN, LINEAR
Periodic | DSCT | 732 (0.8%) | CRTS, ASASSN, LINEAR, Gaia DR2
Periodic | RRL | 32482 (37.3%) | CRTS, ASASSN, LINEAR, Gaia DR2
Periodic | CEP | 618 (0.7%) | CRTS, ASASSN
Periodic | Periodic-Other | 1256 (1.4%) | CRTS, LINEAR
Periodic | Total | 87065 |

† Values in parentheses correspond to the fraction of sources of a given class (second column) within its corresponding hierarchical class (first column).
Figure 5. For the sources in the labeled set, this figure shows the fraction of sources in each class with photometry: only in the g band (green); only in the r band (red); or in both bands (grey). The reasons for the non-uniformity of coverage may be physical (strongly red or blue source) or organizational (survey focused on one band only). For most classes, the vast majority of the sources have photometry in both g and r; the exceptions are the YSO, LPV, and CEP classes, where only 76% of the sources have photometry in both bands.

We also included as features the galactic coordinates of each target (gal_b and gal_l), the W1−W2 and W2−W3 AllWISE colors, and the g−W2, g−W3, r−W2, and r−W3 colors, where g and r are computed as the mean magnitudes of the g-band and r-band light curves of a given source. In addition, we use information included in the Avro file metadata: the sgscore1 parameter, which corresponds to a morphological star/galaxy score of the closest source from PanSTARRS1 (Tachibana & Miller 2018) reported in the ZTF Avro files, with 0 ≤ sgscore1 ≤ 1, where values closer to 1 imply a higher likelihood of the source being a star; and the median rb (real-bogus) parameter for each band. As we mentioned in Section 2.1, in this work we only consider light curves with ≥ 6 detections in g or ≥ 6 detections in r. If a given source has ≥ 6 detections in only one band, the features are computed for that band alone.
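The color features described above are simple differences of mean magnitudes. A minimal sketch (the helper name and inputs are hypothetical, not the pipeline's API) is:

```python
import numpy as np

# Sketch of the color features: mean g and r magnitudes from the light
# curves combined with AllWISE W1, W2, W3 photometry.
def color_features(g_mags, r_mags, w1, w2, w3):
    g_mean = float(np.mean(g_mags))
    r_mean = float(np.mean(r_mags))
    return {
        "g-r_mean": g_mean - r_mean,
        "W1-W2": w1 - w2,
        "W2-W3": w2 - w3,
        "g-W2": g_mean - w2,
        "g-W3": g_mean - w3,
        "r-W2": r_mean - w2,
        "r-W3": r_mean - w3,
    }

# Invented magnitudes for one source:
feats = color_features([18.2, 18.4, 18.3], [17.9, 18.1, 18.0],
                       w1=15.0, w2=14.5, w3=12.0)
```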
3.1. Detection Features
Most of the features used by the light curve classifier are computed using the observed magnitudes in the g and r bands (i.e., the detections). There are 56 different features computed for each band, and 12 features computed using a combination of both bands, yielding a total of 124 detection features. The definitions of all these features can be found in Table 2. We split the table into three blocks. The first block contains new features defined in this work (i.e., novel features); some of these are further described in Section 3.1.1. The second block contains features that correspond to new variants of descriptors included in other works; some of them are further described in Section 3.1.1 and in Appendix A. Finally, the third block includes 22 features that come from the Feature Analysis for Time Series (FATS; Nun et al. 2015) Python package. Hereafter, features ending with "_1" are computed for the g band, and features ending with "_2" are computed for the r band, following the notation used in the ZTF Avro files.

Table 2. List of detection features used by the light curve classifier. Features marked with ⋄ are computed using both g and r bands at the same time. Features marked with * and ** are further described in Section 3.1.1 and Appendix A, respectively.

Feature | Description | Reference
delta_period | Absolute value of the difference between the Multiband_period and the MHAOV period obtained using a single band | This work
IAR_phi* | Level of autocorrelation using a discrete-time representation of a DRW model | Eyheramendy et al. (2018)
MHPS parameters* | Obtained from a MHPS analysis (three in total) | Arévalo et al. (2012)
positive_fraction | Fraction of detections in the difference images of a given band which are brighter than the template image | This work
Power_rate* ⋄ | Ratio between the power of the multiband periodogram obtained for the best period candidate (P) and that obtained at 2P, 3P, 4P, P/2, P/3, and P/4 | This work
PPE* ⋄ | Multiband Periodogram Pseudo Entropy | This work
(g-r)_max ⋄ | g−r color obtained using the brightest lc_diff magnitude in each band | This work
(g-r)_max_corr ⋄ | g−r color obtained using the brightest lc_corr magnitude in each band | This work
(g-r)_mean ⋄ | g−r color obtained using the mean lc_diff magnitude of each band | This work
(g-r)_mean_corr ⋄ | g−r color obtained using the mean lc_corr magnitude of each band | This work
delta_mag_fid | Difference between maximum and minimum observed magnitude in a given band | This work
ExcessVar** | Measure of the intrinsic variability amplitude | Allevato et al. (2013)
GP_DRW_tau** | Relaxation time τ from DRW modeling | Graham et al. (2017)
GP_DRW_sigma** | Amplitude of the variability at short timescales (t ≪ τ), from DRW modeling | Graham et al. (2017)
Harmonics parameters* | Obtained by fitting a harmonic series up to the seventh harmonic (14 in total) | Stellingwerf & Donohoe (1986)
Multiband_period* ⋄ | Period obtained using the multiband MHAOV periodogram | Mondrik et al. (2015)
Pvar** | Probability that the source is intrinsically variable | McLaughlin et al. (1996)
SF_ML_amplitude** | rms magnitude difference of the SF, computed over a 1 yr timescale | Schmidt et al. (2010)
SF_ML_gamma** | Logarithmic gradient of the mean change in magnitude | Schmidt et al. (2010)
SPM features* | Supernova parametric model features (seven in total) | Villar et al. (2019b)
Amplitude | Half of the difference between the median of the maximum 5% and of the minimum 5% magnitudes | Richards et al. (2011)
AndersonDarling | Test of whether a sample of data comes from a population with a specific distribution | Nun et al. (2015)
Autocor_length | Lag value where the auto-correlation function becomes smaller than Eta_e | Kim et al. (2011)
Beyond1Std | Percentage of points with photometric mag that lie beyond 1σ from the mean | Richards et al. (2011)
Con | Number of three consecutive data points brighter/fainter than 2σ of the light curve | Kim et al. (2011)
Eta_e | Ratio of the mean of the squares of successive mag differences to the variance of the light curve | Kim et al. (2014)
Gskew | Median-based measure of the skew | Nun et al. (2015)
LinearTrend | Slope of a linear fit to the light curve | Richards et al. (2011)
MaxSlope | Maximum absolute magnitude slope between two consecutive observations | Richards et al. (2011)
Meanvariance | Ratio of the standard deviation to the mean magnitude | Nun et al. (2015)
MedianAbsDev | Median discrepancy of the data from the median data | Richards et al. (2011)
MedianBRP | Fraction of photometric points within amplitude/10 of the median mag | Richards et al. (2011)
PairSlopeTrend | Fraction of increasing first differences minus fraction of decreasing first differences over the last 30 time-sorted mag measures | Richards et al. (2011)
PercentAmplitude | Largest percentage difference between either the max or min mag and the median mag | Richards et al. (2011)
Psi_CS | Range of a cumulative sum applied to the phase-folded light curve | Kim et al. (2011)
Psi_eta | Eta_e index calculated from the folded light curve | Kim et al. (2014)
Q31 | Difference between the 3rd and the 1st quartile of the light curve | Kim et al. (2014)
Rcs | Range of a cumulative sum | Kim et al. (2011)
Skew | Skewness measure | Richards et al. (2011)
SmallKurtosis | Small sample kurtosis of the magnitudes | Richards et al. (2011)
Std | Standard deviation of the light curve | Nun et al. (2015)
StetsonK | Robust kurtosis measure | Kim et al. (2011)
3.1.1. Further description of some relevant detection features

Table 2 summarizes the definitions of the detection features used by the light curve classifier. However, some of these features are worth describing in more detail, either because they are new features, or because the changes we made to them are sufficiently relevant that further explanation is in order (see also Appendix A):

• Multiband period: The period is estimated using the Multi Harmonic Analysis of Variance (MHAOV) periodogram (Schwarzenberg-Czerny 1996), which is based on fitting periodic orthogonal polynomials to the data. A single period estimate per light curve is computed by fitting both bands using the MHAOV multiband extension proposed by Mondrik et al. (2015). We denote this period as Multiband_period. For sources with detections only in g or only in r, the Multiband_period reports the single-band period. To avoid overfitting the data when few samples are available, we set the number of harmonics to one. This might not capture the best period for non-sinusoidal light curves, e.g., detached and semi-detached eclipsing binaries, returning a harmonic instead. We found that having a harmonic of the true period is in general sufficient to classify non-sinusoidal light curves correctly, given that other features such as the power rate are included. We chose MHAOV for this analysis as it provides a good trade-off between performance and computational complexity.

• Periodogram Pseudo Entropy: To have an estimate of the confidence of the candidate period (obtained with the multiband MHAOV method), we developed a heuristic based on the entropy of the normalized periodogram peaks, which we denote the periodogram pseudo entropy (PPE). This value is computed by recovering the 100 largest values of the periodogram, normalizing them so that they add to 1, and computing the entropy of that vector:

PPE = 1 + \frac{1}{\log(100)} \sum_{i=1}^{100} \frac{p_i}{Z} \log\left(\frac{p_i}{Z}\right),   (1)

where p_i is the value of the i-th largest peak of the periodogram and Z = \sum_{i=1}^{100} p_i. This feature takes values between zero (no clear period stands out) and one (the periodogram has a single large peak).

• Power Rate: Ratio between the power of the multiband periodogram obtained for the best period candidate (P) and the power of the multiband periodogram obtained for n times this period [Power_rate_n = Power(P)/Power(n × P)], where n can take values of 2, 3, 4, 1/2, 1/3, and 1/4. We computed these ratios in order to detect cases where we measure an aliased multiple of the period instead of the true period, which is a particularly common issue for some classes of eclipsing binaries (e.g., Catelan et al. 2013; Graham et al. 2013; McWhirter et al. 2018; VanderPlas 2018, and references therein).

• Harmonics parameters: Harmonic series (Stellingwerf & Donohoe 1986) are commonly used to model and classify periodic light curves (Debosscher et al. 2007; Sarro et al. 2009; Richards et al. 2011; Elorrieta et al. 2016). In this work we fit a harmonic series up to the seventh harmonic, according to the expression

y(t_j) = \sum_{k=1}^{7} \left[ A_k \cos\left(\frac{2\pi k t_j}{P}\right) + B_k \sin\left(\frac{2\pi k t_j}{P}\right) \right] + C,   (2)

where t_j corresponds to the observational time of the j-th detection, P is the best candidate period computed from the multiband periodogram as above, and y(t_j) is the magnitude estimated by the harmonic model. Even though we use the Multiband_period, the harmonic model is computed using the detections in each band independently. The parameters A_k, B_k, and C for k = 1, ..., 7 are found by minimizing the weighted squared error

\min_{A_k, B_k, C} \sum_{j=1}^{J} \frac{\left[\mathrm{mag}(t_j) - y(t_j)\right]^2}{\mathrm{sigma}(t_j)^2},   (3)

where J is the number of observations, mag(t_j) is the observed magnitude at time t_j, and sigma(t_j) is the observational error at time t_j. The solution to the weighted least squares optimization problem is found using the Moore-Penrose pseudoinverse. This solution has the additional property of having the minimum Euclidean norm when the problem is underdetermined (Ben-Israel & Greville 2003), which in this case corresponds to having fewer than 15 observations. Once the parameters are learnt, Equation 2 can be rewritten as

y(t_j) = \sum_{k=1}^{7} M_k \cos\left(\frac{2\pi k t_j}{P} - \phi_k\right) + C,   (4)

with M_k = \sqrt{A_k^2 + B_k^2} and \phi_k = \arctan(B_k/A_k). In this way, the harmonics are now described by the amplitude and phase of each component.
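The PPE and Power rate features both derive from the periodogram values. A minimal numpy sketch (toy periodograms, not the MHAOV implementation) of Eq. 1 and of the Power_rate_n ratio is:

```python
import numpy as np

def ppe(power, n_top=100):
    """Periodogram Pseudo Entropy (Eq. 1): close to 1 for a single
    dominant peak, close to 0 when no period stands out."""
    p = np.sort(np.asarray(power, dtype=float))[-n_top:]  # 100 largest values
    q = p / p.sum()                                       # normalize to sum 1
    return 1.0 + float(np.sum(q * np.log(q))) / np.log(n_top)

def power_rate(power_best, power_at_multiple):
    """Power_rate_n = Power(P) / Power(n x P), n in {2, 3, 4, 1/2, 1/3, 1/4}."""
    return power_best / power_at_multiple

flat = np.ones(200)           # no period stands out -> PPE ~ 0
peaked = np.full(200, 1e-6)
peaked[50] = 1.0              # one dominant peak   -> PPE ~ 1
```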
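The harmonic fit of Eqs. 2-4 can be sketched as a weighted least-squares problem solved with the Moore-Penrose pseudoinverse, as described above. The function name and the synthetic light curve are illustrative only:

```python
import numpy as np

def fit_harmonics(t, mag, sigma, P, n_harm=7):
    """Weighted least-squares harmonic fit (Eqs. 2-4) via np.linalg.pinv.
    Returns amplitudes M_k, phases phi'_k shifted to zero phase in the
    first harmonic, and the constant C."""
    t = np.asarray(t, float)
    mag = np.asarray(mag, float)
    w = 1.0 / np.asarray(sigma, float)            # row weights 1/sigma(t_j)
    omega = 2.0 * np.pi * np.arange(1, n_harm + 1) / P
    X = np.hstack([np.cos(np.outer(t, omega)),
                   np.sin(np.outer(t, omega)),
                   np.ones((len(t), 1))])
    coef = np.linalg.pinv(X * w[:, None]) @ (mag * w)
    A, B, C = coef[:n_harm], coef[n_harm:2 * n_harm], float(coef[-1])
    M = np.hypot(A, B)                            # M_k = sqrt(A_k^2 + B_k^2)
    phi = np.arctan2(B, A)                        # phi_k
    phi_shifted = phi - np.arange(1, n_harm + 1) * phi[0]  # phi'_k = phi_k - k*phi_1
    return M, phi_shifted, C

# Recover a pure sinusoid: mag = 15 + 0.5 cos(2 pi t / P)
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 30.0, 80))
P = 1.7
mag = 15.0 + 0.5 * np.cos(2.0 * np.pi * t / P)
M, phi, C = fit_harmonics(t, mag, np.full_like(t, 0.01), P)
```

With fewer than 15 observations the system is underdetermined, and the pseudoinverse returns the minimum-norm solution mentioned in the text.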
Themodel is shifted in time in order to have zero phasein the first harmonic, which is done following theexpression φ (cid:48) k = φ k − kφ , replacing φ k by φ (cid:48) k inEq. 4.Finally, the parameters M k for k = 1 , . . . , φ (cid:48) k for i = 2 , . . . , Harmonics mag 1 ,...,
Harmonics mag 7 , Harmonics phase 2 , ...,
Harmonics phase 7 , Harmonics mse , respectively. • Supernova parametric model (SPM): Villar et al.(2019b) introduced an analytic model describingSN light curves as a six parameter function, whichthey used to characterize and classify SN lightcurves from the Pan-STARRS1 Medium-deep Sur-vey (Chambers et al. 2016). This model is anextension of previous empirical efforts to analyti-cally describe supernova light curves, including theeffects of different explosion times, normalizationfactors, initial rise timescales, rate of decline afterpeak, plateau lengths, or tail decay timescales. Weintroduce two modifications to this model. Firstwe reparametrize the function to always remainpositive in a simple validity range, i.e., one definedby a set of inequalities involving at most one vari-able per inequality, changing the validity range ac-cordingly. After the first modification, the modelis the following: F = A (cid:16) − β (cid:48) t − t t − t (cid:17) (cid:16) − t − t τ rise (cid:17) if t < t A (1 − β (cid:48) ) exp (cid:16) − t − t τ fall (cid:17) (cid:16) − t − t τ rise (cid:17) if t ≥ t , (5) he ALeRCE Light Curve Classifier γ ≡ t − t as a parameter in-stead of t . This function is positive valued when A > γ > τ rise > τ fall > < β (cid:48) < σ ( t ) = 1 / (1 + exp( − t )), which allows a soft tran-sition between zero and one. As the parameter t defines the transition between the two pieces of themodel in Eq. 5, it cannot be optimized properlyusing first-order methods. Our proposed modelallows using this technique effectively to learn theparameters of the model, which is given by thefollowing equation: F = A (cid:16) − β (cid:48) t − t t − t (cid:17) (cid:16) − t − t τ rise (cid:17) · (cid:20) − σ (cid:18) t − t (cid:19)(cid:21) + A (1 − β (cid:48) ) exp (cid:16) − t − t τ fall (cid:17) (cid:16) − t − t τ rise (cid:17) · (cid:20) σ (cid:18) t − t (cid:19)(cid:21) . 
In this particular model, for all the sources we use the light curves based on the difference images (lc_diff). This is done to avoid contamination from unrelated host-galaxy emission, which can distort the real shape of the SN light curves. We also subtract from t the MJD value of the first detection observed for a given source. We computed A (SPM_A), β′ (SPM_beta), t_0 (SPM_t0), γ (SPM_gamma), τ_rise (SPM_tau_rise), and τ_fall (SPM_tau_fall) for each band independently. In addition, we computed the reduced χ² of the fit for the light curve, denoted as SPM_chi. The parameters are found using the function curve_fit provided by the Scipy library (Virtanen et al. 2020).

• Irregular autoregressive (IAR) model: Eyheramendy et al. (2018) introduced this model. It is a discrete-time representation of the continuous autoregressive model of order 1 [CAR(1) or DRW], which has desirable statistical properties such as strict stationarity and ergodicity without a distributional assumption. The IAR model is defined by

y_{t_j} = φ^{t_j − t_{j−1}} y_{t_{j−1}} + σ sqrt(1 − φ^{2(t_j − t_{j−1})}) ε_{t_j},   (7)

where ε_{t_j} is a white noise sequence with zero mean and unit variance, σ is the standard deviation of y_{t_j}, and {t_j} are the observational times for j = 1, ..., n. We used a modified version of the IAR model, which considers the estimated variance of the measurement errors δ²_{t_j} in the likelihood of the model. Thus, by assuming a Gaussian distribution, the negative log-likelihood of the process is given by

ℓ(θ) = (n/2) log(2π) + (1/2) Σ_{j=1}^{n} log ν_{t_j} + (1/2) Σ_{j=1}^{n} e²_{t_j}/ν_{t_j},   (8)

where e_{t_1} = y_{t_1}, ν_{t_1} = σ² + δ²_{t_1}, and ŷ_{t_1} = 0 are the initial values, while ŷ_{t_j} = φ^{t_j − t_{j−1}} y_{t_{j−1}}, e_{t_j} = y_{t_j} − ŷ_{t_j}, and ν_{t_j} = σ²(1 − φ^{2(t_j − t_{j−1})}) + δ²_{t_j} for j = 2, ..., n. In particular, φ describes the autocorrelation function of order 1 for a given light curve.
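A minimal numerical sketch of Eqs. 7 and 8 (the observation times, φ, and noise levels below are illustrative): simulate an irregularly sampled IAR series, then recover φ by minimizing the negative log-likelihood, which is in essence how the IAR_phi feature is obtained.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(42)

# Irregular observation times and (illustrative) true parameters.
t = np.sort(rng.uniform(0.0, 200.0, 150))
phi_true, sigma, delta = 0.9, 1.0, 0.1      # delta: measurement-error std

# Simulate the IAR recursion of Eq. 7, then add measurement noise.
y = np.empty(t.size)
y[0] = sigma * rng.normal()
for j in range(1, t.size):
    a = phi_true ** (t[j] - t[j - 1])
    y[j] = a * y[j - 1] + sigma * np.sqrt(1.0 - a**2) * rng.normal()
y_obs = y + delta * rng.normal(size=t.size)

def neg_log_like(phi, t, y, sigma, delta):
    """Negative log-likelihood of Eq. 8, with measurement variance delta^2."""
    a = phi ** np.diff(t)                    # phi^(t_j - t_{j-1})
    e = np.concatenate(([y[0]], y[1:] - a * y[:-1]))
    nu = np.concatenate(([sigma**2 + delta**2],
                         sigma**2 * (1.0 - a**2) + delta**2))
    n = y.size
    return 0.5 * (n * np.log(2.0 * np.pi) + np.sum(np.log(nu)) + np.sum(e**2 / nu))

res = minimize_scalar(neg_log_like, bounds=(1e-4, 0.9999), method="bounded",
                      args=(t, y_obs, sigma, delta))
iar_phi = res.x    # analogue of the IAR_phi feature
```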
We computed the maximum likelihood estimation of the parameter φ (obtained directly from the light curves), and we used this as a feature for our classifier. We denote this parameter as IAR_phi.

• Mexican hat power spectrum (MHPS): Arévalo et al. (2012) proposed a method to compute low-resolution power spectra from data with gaps, where the light curves are convolved with a Mexican hat filter: F(x) ∝ [1 − (x/σ)²] e^{−x²/(2σ²)}. Gaps, or generally uneven sampling, are corrected for by convolving a unit-valued mask with the same sampling as the light curve and dividing the convolved light curve by it. This method can be used to isolate structures with a characteristic timescale (t ∼ √2 πσ) in a given light curve, in order to estimate the light curve variance associated with that timescale. We compute the light curve variance at two different timescales of 10 and 100 days. The variance associated with the 10 day timescale ("high" frequency) is denoted MHPS_high, while the variance associated with the 100 day timescale ("low" frequency) is denoted MHPS_low. We also compute the ratio between the low and high frequency variances for a given band, denoted as MHPS_ratio. The logarithm of MHPS_ratio is therefore an estimate of the power-law slope of the power spectrum of the light curve.
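The filtering-plus-mask correction described above can be sketched as follows. This is a simplified, regular-grid illustration of the Arévalo et al. (2012) approach, in which the Mexican hat is approximated as a difference of two Gaussian smoothings; the grid step, the width ratio of the two Gaussians, and the toy light curve are our assumptions, not values from the paper.

```python
import numpy as np

def _gauss_smooth(vals, mask, step, width):
    """Gaussian smoothing with the gap correction of Arevalo et al. (2012):
    smooth the masked data and the unit mask identically, then divide."""
    x = np.arange(-4.0 * width, 4.0 * width + step, step)
    g = np.exp(-0.5 * (x / width) ** 2)
    g /= g.sum()
    num = np.convolve(vals * mask, g, mode="same")
    den = np.convolve(mask, g, mode="same")
    out = np.full_like(num, np.nan)
    ok = den > 0
    out[ok] = num[ok] / den[ok]
    return out

def mhps_variance(t, y, timescale, step=1.0):
    """Light-curve variance at `timescale` (days); the difference of a
    narrow and a wide Gaussian smoothing approximates the Mexican hat."""
    grid = np.arange(t.min(), t.max() + step, step)
    vals = np.zeros(grid.size)
    mask = np.zeros(grid.size)
    i = np.clip(np.searchsorted(grid, t), 0, grid.size - 1)
    vals[i] = y - np.mean(y)
    mask[i] = 1.0
    sigma = timescale / (np.sqrt(2.0) * np.pi)  # filter width for this timescale
    eps = 0.25                                   # Gaussian width ratio (assumed)
    wide = _gauss_smooth(vals, mask, step, sigma * (1.0 + eps))
    narrow = _gauss_smooth(vals, mask, step, sigma / (1.0 + eps))
    return np.nanvar((narrow - wide)[mask > 0])

# Toy irregular light curve: a slow sinusoid sampled with gaps.
rng = np.random.default_rng(1)
t = np.sort(rng.choice(np.arange(500.0), 200, replace=False))
y = np.sin(2.0 * np.pi * t / 100.0) + 0.05 * rng.normal(size=t.size)

mhps_low = mhps_variance(t, y, 100.0)    # MHPS_low analogue
mhps_high = mhps_variance(t, y, 10.0)    # MHPS_high analogue
mhps_ratio = mhps_low / mhps_high
```

For this 100 day sinusoid most of the variance is captured at the 100 day timescale, so the ratio comes out well above one, as expected for a red power spectrum.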
Non-detection Features
For each detection, the ZTF alert stream includes 5σ magnitude limits (diffmaglim), which are computed from the g and r difference images of the same area of the sky obtained in the previous 30 days, where the target associated with the alert was not detected (non-detections). These non-detections are very informative, since they can, for instance, inform us whether a transient has not been detected before; whether a non-variable source has begun to exhibit variable behavior; or which range of observed magnitudes we should expect to measure when there are no significant differences between the science and template images and an alert is not generated. The light curve classifier uses nine different features defined using all the non-detections associated with a given source, computed for both the g and r bands, yielding a total of 18 non-detection features. Note that all of these features are new, and have not been used before for classification. Table 3 lists the non-detection features used by the light curve classifier. As before, non-detection features ending with "_1" are computed for the g band, and features ending with "_2" are computed for the r band.

4. CLASSIFICATION ALGORITHMS

The labeled set used in this work presents a very high imbalance (see Table 1). For instance, QSOs represent 75.4% of the stochastic sources, while CV/Novae represent just 2.5%. To deal with this issue, we looked for ML algorithms available in the literature that are designed to mitigate the imbalance problem. In particular, we worked with the imbalanced-learn Python package (Lemaître et al. 2017). Imbalanced-learn includes implementations of several re-sampling algorithms that are commonly used to handle data sets with strong between-class imbalance. The algorithms available in this package are fully compatible with scikit-learn methods. In the following sections we describe the Random Forest (RF) algorithm used by the light curve classifier, as well as the other ML algorithms we tested. To train each classifier we randomly split the labeled set into a training set (80%) and a testing set (20%) in a stratified fashion, preserving the percentage of samples for each class.
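The stratified 80/20 split described above can be sketched in a few lines of NumPy (the label names and counts below are illustrative; in practice this is scikit-learn's train_test_split with stratify=y):

```python
import numpy as np

def stratified_split(labels, test_frac=0.2, seed=0):
    """Split indices so that each class keeps the same proportion
    in the training and testing sets."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        n_test = max(1, int(round(test_frac * idx.size)))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)

# Imbalanced toy label set: e.g. many QSOs, very few CV/Novae.
labels = np.array(["QSO"] * 754 + ["CV/Nova"] * 25 + ["Blazar"] * 221)
train, test = stratified_split(labels)
```

Per-class splitting guarantees that even the rarest classes appear in both sets, which a plain random split cannot promise for a 2.5% class.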
Balanced Hierarchical Random Forest
A Decision Tree (Rokach & Maimon 2008) is a predictive algorithm that uses a tree structure to perform successive partitions of the data according to a certain criterion (e.g., a cut-off value in one of the descriptors or features), producing possible decision paths and a final outcome for each path (the leaves of the tree). Decision Trees are commonly used for classification, where each final leaf is associated with a given class. RFs (Breiman 2001) are algorithms that build multiple Decision Trees, where each tree is trained using a random sub-sample of elements from a given training set, selected allowing repetition (a bootstrap sample of the training set), and using a random selection of features. The final classification is obtained by averaging the classifications provided by each single tree. This average score can be interpreted as the probability (P_RF) that the input element belongs to a given class. One of the main advantages of RF is that it naturally provides a ranking of the features used for classification, obtained by counting the number of times each feature is selected to split the data.

Chen et al. (2004) proposed a modified RF that can deal with imbalanced data. In their model, each individual tree is trained using a sub-sample of the training set that is defined by generating a bootstrap sample from the minority class, and then randomly selecting the same number of cases, with replacement, from the majority classes. Imbalanced-learn implements the balanced RF classifier proposed by Chen et al. (2004). For the ALeRCE light curve classifier we use their BalancedRandomForestClassifier method, selecting the hyper-parameters (number of trees, maximum number of features per tree, and maximum depth of each tree) with the K-fold cross-validation procedure available in scikit-learn, with k = 5 folds and using the "macro-recall" as the target score (see its definition in Section 5.1).

4.1.1. The hierarchical classifier approach
Considering the hierarchical structure of the taxonomy (see Section 2.2), we decided to construct a balanced hierarchical RF (BHRF) classifier. There are various approaches to performing a hierarchical classification (e.g., Silla & Freitas 2011; Naik & Rangwala 2018). In the particular case of the light curve classifier, we use a two-level scheme. The first level consists of a single classifier that separates the sources into three broad classes. The second level consists of three distinct classifiers, which further resolve each class in the first level into subclasses. We then use the probabilities obtained for each independent classifier to obtain the final classification.

In more detail, the first level (top level hereafter) consists of a single classifier which classifies every source as periodic, stochastic, or transient. The second level (bottom level hereafter) consists of three distinct classifiers: Transient, Stochastic, and Periodic. The classes considered by each of these three classifiers are the ones shown in Table 1 and Figure 3. Each classifier in the bottom level is trained using a training subset having only those classes included in the primary top class (for instance, the Transient classifier only includes sources classified as SNIa, SNIbc, SNII, and SLSN).

Table 3. List of non-detection features used by the light curve classifier. Note that "x" stands for either the g or r band.

dmag_first_det_fid: Difference between the last non-detection diffmaglim in band "x" before the first detection in any band and the first detected magnitude in band "x"
dmag_non_det_fid: Difference between the median non-detection diffmaglim in the "x" band before the first detection in any band and the minimum detected magnitude (peak) in the "x" band
last_diffmaglim_before_fid: Last non-detection diffmaglim in the "x" band before the first detection in any band
max_diffmaglim_before_fid: Maximum non-detection diffmaglim in the "x" band before the first detection in any band
max_diffmaglim_after_fid: Maximum non-detection diffmaglim in the "x" band after the first detection in any band
median_diffmaglim_before_fid: Median non-detection diffmaglim in the "x" band before the first detection in any band
median_diffmaglim_after_fid: Median non-detection diffmaglim in the "x" band after the first detection in any band
n_non_det_before_fid: Number of non-detections in the "x" band before the first detection in any band
n_non_det_after_fid: Number of non-detections in the "x" band after the first detection in any band

It is important to note that these four classifiers are independent and process the same input feature set described in Section 3. The final classification is constructed by multiplying the probabilities obtained for each class of the top level [P_top(transient), P_top(stochastic), and P_top(periodic)] with the individual probabilities obtained by their corresponding classifier in the bottom level. Namely, the probabilities of the Transient classifier (P_T) are multiplied by P_top(transient), the probabilities of the Stochastic classifier (P_S) are multiplied by P_top(stochastic), and the probabilities of the Periodic classifier (P_P) are multiplied by P_top(periodic). We denote the product of these probabilities as P. For instance, the probability of a given source being an RRL corresponds to the product of its probability of being periodic (according to the top level) and its probability of being an RRL (according to the Periodic classifier):

P(RRL) = P_top(periodic) · P_P(RRL),

while the probability of being a Blazar is computed as:

P(Blazar) = P_top(stochastic) · P_S(Blazar).

Following this, the sum of the probabilities of the 15 classes for a given source adds up to one. Finally, the class of a given object is determined by selecting the class with the maximum P.

The best cross-validation performance was obtained with the following hyper-parameter settings: 500 trees in each classifier; no limit on the maximum depth of the trees (the nodes are expanded until all leaves are pure); and a maximum number of features equal to the square root of the total number of features, except for the Stochastic classifier, where we used 20% of the features. In Section 5 we present the results obtained when applying the BHRF classifier to the ZTF data.

4.2. Additional ML algorithms tested
In addition to RF, we also tested two other supervised classification algorithms: Gradient Boosting and Multilayer Perceptron. These tests were done as a complementary analysis, with the purpose of guiding future efforts to improve the light curve classifier. Neither of these methods has a Python implementation specifically designed to handle imbalanced data sets; however, using imbalanced-learn we can generate balanced training sets. We present the results obtained using both classifiers in Section 5.

4.2.1. Gradient Boosting
Gradient Boosting (GBoost; Friedman 2001) is an ML algorithm that uses an ensemble of weak prediction models (e.g., Decision Trees) to produce a more robust classification. The method implements a boosting algorithm (using Gradient Descent) that trains a sequence of weak models, each compensating for the weaknesses of its predecessors. eXtreme Gradient Boosting (XGBoost; Chen & Guestrin 2016) is a package available in several computing languages (including Python) that implements GBoost algorithms for classification and regression in an efficient and scalable way. It has become one of the most widely used packages for regression and classification in recent years.

For the case of GBoost we followed the same two-level hierarchical strategy described in Section 4.1.1. However, since the current version of the XGBoost multi-class classifier was not designed to deal with highly imbalanced data sets (e.g., Wang et al. 2019), we tested a model that uses XGBoost and is trained with a balanced training set. We constructed this balanced training set using the RandomUnderSampler and RandomOverSampler methods available in imbalanced-learn. For the top level, Periodic, and Stochastic classifiers, we constructed a balanced training set by generating 10 random samples using the RandomUnderSampler method, resampling all classes ('all' sampling strategy), and concatenating them, in order to obtain a training set with more than 10,000 objects in total for each classifier. For the Transient classifier we used the RandomOverSampler method, resampling all classes, to generate one random sample with ∼600 objects. Each classifier uses the default hyper-parameters defined by the XGBoost Python package, with the exception of the number of boosting rounds, where we used 500, and the objective function, which was set up to do multi-class classification using the softmax function ('multi:softmax'). As in the case of the BHRF model, the class of a given object is determined by selecting the class with the maximum probability (obtained by multiplying the probabilities of the top and bottom levels).
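The balanced-set construction described above reduces to simple index resampling. A NumPy sketch of both strategies (the class names and sizes are illustrative; imbalanced-learn's RandomUnderSampler and RandomOverSampler implement the same ideas):

```python
import numpy as np

def undersample_indices(labels, rng):
    """One balanced draw: take min-class-size objects from every class
    without replacement (the undersampling 'all' strategy)."""
    classes, counts = np.unique(labels, return_counts=True)
    n = counts.min()
    return np.concatenate([rng.choice(np.flatnonzero(labels == c), n, replace=False)
                           for c in classes])

def oversample_indices(labels, rng):
    """One balanced draw: take max-class-size objects from every class
    with replacement (the oversampling strategy)."""
    classes, counts = np.unique(labels, return_counts=True)
    n = counts.max()
    return np.concatenate([rng.choice(np.flatnonzero(labels == c), n, replace=True)
                           for c in classes])

rng = np.random.default_rng(0)
labels = np.array(["QSO"] * 900 + ["AGN"] * 80 + ["Blazar"] * 20)  # illustrative

# Stochastic-style classifier: concatenate 10 undersampled draws.
train_idx = np.concatenate([undersample_indices(labels, rng) for _ in range(10)])

# Transient-style classifier: a single oversampled draw.
over_idx = oversample_indices(labels, rng)
```

Concatenating several undersampled draws keeps the classes balanced while letting more of the majority-class objects be seen across the ensemble than a single draw would allow.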
Multilayer Perceptron
Artificial neural networks (ANNs) are mathematical models inspired by the human brain. ANNs are composed of elemental computational units called neurons (Haykin 1994), and can be used to perform complex tasks such as classification or regression. A Multilayer Perceptron (MLP) corresponds to an ANN whose neurons are ordered in layers, where all neurons belonging to a given layer receive the same input vector and each unit processes this vector independently according to its own parameters. The outputs of all neurons in a layer are grouped and form the input vector for the next layer. For classification, when using the softmax activation function, the final layer provides the probabilities that a given element belongs to each class. One way of obtaining the final class is to assign the label with the maximum probability in the output layer.

For this model, we also followed the two-level hierarchical strategy described in Section 4.1.1. We tested different MLP architectures, changing the number of layers and the number of neurons per layer. We used the Keras API provided by the Python version of TensorFlow 2.0 (Abadi et al. 2016). We split the original training set defined above into a new training set (80%) and a validation set (20%). In order to deal with the high imbalance of the training set we used the balanced mini-batch generator for Keras provided by the imbalanced-learn Python package. The best performance (considering the categorical cross-entropy loss and accuracy curves for the training and validation sets) was obtained using MLPs with two hidden layers with 256 and 128 neurons for all the classifiers (top level, Transient, Stochastic, and Periodic). Regularization via the dropout method (Srivastava et al. 2014) is used to prevent overfitting, with the dropout fraction set to 0.50.

5. RESULTS

5.1. Results for the BHRF classifier
In order to test the performance of our BHRF classifier, we generated 20 different training and testing sets using the ShuffleSplit iterator provided by scikit-learn, which uses random permutations to split the data, using each time 80% of the labeled set as the training set and 20% as the testing set, preserving the percentage of samples for each class in the original labeled set. We then trained 20 different BHRF models using each training set, and tested their performance using the corresponding testing sets. We emphasize that the testing sets are never used in training their respective models.

Table 4 lists three different scores: precision, recall, and F1-score. Despite the high imbalance present in the labeled set, all classes are equally important; thus we computed macro-averaged scores:

Precision = (1/n_cl) Σ_{i=1}^{n_cl} TP_i / (TP_i + FP_i),   (9)

Recall = (1/n_cl) Σ_{i=1}^{n_cl} TP_i / (TP_i + FN_i),   (10)

F1-score = 2 × (Precision × Recall) / (Precision + Recall),   (11)

where n_cl is the total number of classes, and TP_i, FP_i, and FN_i are the numbers of true positives, false positives, and false negatives, respectively, for a given class i. For the particular case of the BHRF classifier, Table 4 reports the mean and the standard deviation of the macro-averaged scores obtained by the 20 models when applied to their respective testing sets.

In addition, Figures 6 and 7 show the confusion matrices obtained for the top and bottom levels, respectively. To generate these confusion matrices we used the results obtained when applying each of the 20 BHRF models to their corresponding testing sets, providing for each level the median and the 5th and 95th percentiles of the confusion matrices obtained by the 20 models. The confusion matrix of the top level shows that the classifier can recover more than 97% of the true labels, and that the contamination between classes is below 3%. The scores obtained reflect the good performance of the top level classifier.

Table 4. Macro-averaged scores (precision, recall, and F1-score) obtained for each classifier on the testing set.

Figure 6. Confusion matrix for the top level of the BHRF classifier. The confusion matrix was obtained by generating 20 different training and testing sets, and by training 20 independent models using each training set separately. After training, each model is applied to its respective testing set. We provide the median and the 5th and 95th percentiles of the confusion matrices obtained for the 20 testing sets. The reported values correspond to the percentages obtained by normalizing the matrices with the true labels. This level shows a high degree of accuracy, with a low percentage of misclassifications.
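Eqs. 9 to 11 follow directly from the confusion matrix; a small self-contained check (the 2×2 matrix is a made-up example, with true classes in rows and predictions in columns):

```python
import numpy as np

def macro_scores(cm):
    """Macro-averaged precision, recall, and F1 (Eqs. 9-11) from a
    confusion matrix with true classes in rows and predictions in columns."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp   # predicted as class i but truly another class
    fn = cm.sum(axis=1) - tp   # truly class i but predicted as another class
    precision = np.mean(tp / (tp + fp))
    recall = np.mean(tp / (tp + fn))
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f1 = macro_scores([[8, 2], [3, 7]])
```

Note that, as written in Eq. 11, the F1-score is computed from the macro-averaged precision and recall, not by averaging per-class F1-scores; the two conventions differ in general.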
For the case of the bottom level we obtained an F1-score of 0.59, implying significant confusion between classes. From Figure 7 we can see that the fraction of true positives in the confusion matrix of the bottom level has values between 50% and 100%, with a mean, median, and standard deviation of 78%, 76%, and 14%, respectively. In addition, the figure shows that the confusion is most often observed among classes with similar characteristics, such as among the SN classes (particularly SNIa versus SNIbc, and SNII versus SLSN); among the Blazar, AGN, and QSO classes; and among various periodic classes. The highest standard deviation of the predictions is observed for SLSN, a result of the low number of SLSN in the labeled set.

To complement our analysis, in Appendix B we provide the results obtained by a one-level multi-class RF model. From its results we can conclude that the light curve classifier improves considerably when a hierarchical strategy is followed.
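The median and 5th/95th percentile matrices reported in Figures 6 and 7 can be aggregated in one step; in the sketch below, random diagonally dominant matrices stand in for the 20 per-split results:

```python
import numpy as np

rng = np.random.default_rng(7)

# Stand-ins for the row-normalized confusion matrices of the 20 models.
cms = []
for _ in range(20):
    cm = np.eye(3) * 80.0 + rng.uniform(0.0, 10.0, (3, 3))  # diagonally dominant
    cms.append(cm / cm.sum(axis=1, keepdims=True))          # normalize by true label
cms = np.stack(cms)

# Element-wise statistics across the 20 runs.
cm_median = np.median(cms, axis=0)
cm_p05, cm_p95 = np.percentile(cms, [5, 95], axis=0)
```

Reporting the element-wise spread across resampled training sets shows which confusion-matrix entries are stable and which depend strongly on the particular split.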
Comparison with the GBoost and MLP classifiers
In this work we tested two other classifiers: GBoost and MLP. For these models we present the results obtained by using 80% of the labeled set as a training set and the remaining 20% as a testing set, preserving the percentage of samples for each class in the original labeled set. The scores obtained by these classifiers on the testing set are shown in Table 4. The confusion matrices obtained for the bottom level of the GBoost and MLP classifiers on the testing set are presented on the left and right sides of Figure 8, respectively.

The precision and F1-score obtained by GBoost are in general better than those obtained by the BHRF classifier, with the exception of the recall score of the bottom level. However, the fractions of true positives in the confusion matrix of the bottom level range between 5% and 100%, with a mean, median, and standard deviation of 72%, 83%, and 29%, respectively, which explains the lower recall obtained by GBoost compared to the BHRF classifier. In addition, the classes with the largest fraction of true positives in the confusion matrix of GBoost correspond to the most populated classes in the labeled set, like QSOs, which represent 75.4% of the stochastic sources; SNIa, which represent 74.0% of the transients; or LPV, E, and RRL, which represent 16.2%, 43.5%, and 37.3% of the periodic sources, respectively. This is not observed in the results obtained by BHRF, where there is no evidence of correlation between the representation of a given class and its fraction of true positives in the confusion matrix shown in Figure 7.

Figure 7. As in Figure 6, but for the bottom level of the BHRF classifier. The black squares highlight the three classes of the top level (from top to bottom: transient, stochastic, and periodic). This matrix is quite diagonal, but shows more misclassification among related subtypes compared to the matrix obtained for the top level.

The results obtained using GBoost are promising. We obtained good scores even though the current versions of GBoost available in the literature have not been designed to deal with high imbalance in multi-class classification. Therefore, further efforts should be made to implement GBoost in future versions of the light curve classifier. In particular, new implementations of GBoost for multi-class classification that follow approaches similar to those proposed by Chen et al. (2004) or Wang et al. (2019) should be tested, as should combinations of GBoost with data augmentation techniques (i.e., generating synthetic light curves of less populated classes using physical and/or statistical models).

For the case of the MLP classifier, the scores obtained are in general lower compared to BHRF and GBoost. Its confusion matrix for the bottom level (see Figure 8) presents the same issues already discussed for GBoost. The fraction of true positives in the confusion matrix ranges between 11% and 98%, with a mean, median, and standard deviation of 69%, 74%, and 22%, respectively. Therefore, we conclude that more work is needed before an MLP-based classifier can match the performance of the BHRF model.
Results for the BHRF classifier excluding AllWISEdata
We tested a version of the BHRF classifier that excludes the features computed using AllWISE data. The macro-averaged precision, recall, and F1-score of the top level are 0.93, 0.97, and 0.95, respectively. For the bottom level, the macro-averaged precision, recall, and F1-score are 0.53, 0.72, and 0.55, respectively. These scores are slightly smaller than the ones obtained using the original version of the BHRF classifier. As can be observed in the confusion matrix shown in Figure 9, the stochastic classes are the most affected by the lack of AllWISE data, particularly YSOs and Blazars, whose fractions of true positives decreased by 10% and 16%, respectively. This happens because of the similarities observed between the light curves of these and other classes, which are not easily separated using variability features alone. However, the results obtained by this version of the classifier are still good enough to be used when AllWISE data are not available, as occurs, for instance, for faint objects.

5.2. Performance of the BHRF classifier as a function of magnitude and number of detections
As can be seen in Figure 5, for some classes, like YSOs, CEPs, and LPVs, a non-negligible fraction of sources in the labeled set have photometry available in only one band. It is therefore important to know how well the classifier behaves when a single band is available for a given source.

To evaluate this, we created 20 new testing sets, defined by considering only those sources with ≥6 detections, drawn from the testing sets generated with the ShuffleSplit iterator (see Section 5.1). We then classified each new testing set with its respective model, considering: a) the features available for the g and r bands; b) the features available only for the g band; and c) the features available only for the r band. Figures 10 and 11 illustrate the results of this analysis. Figure 10 shows the recall as a function of the average magnitude for each subclass, with the exception of the SN classes, which are grouped into one class called SN, while Figure 11 shows the recall as a function of the number of detections. From both figures we can infer that in general the best results are obtained when photometry from both g and r is available, with the exception of the QSO, CEP, and Periodic-Other classes.

From Figure 10 we can also conclude that the reliability of the classification versus the average magnitude is different for each class. These distributions in general follow the magnitude distribution in the labeled set of each class considered in this model (with the exception of RRL). For instance, in the labeled set the CEP class corresponds to one of the brightest classes, having in general r < 16, while the LPV class covers a broader range of magnitudes. On the other hand, from Figure 11 we can infer that in general the classification improves when more detections are available in both bands, with the exceptions of the QSO and Periodic-Other classes.

The results obtained for Periodic-Other are not surprising, since this class includes all the periodic classes not considered in the classifier (including several different types of pulsating stars, as well as rotational variables). The results observed for the CEP class are probably due to the large fraction of CEPs in the labeled set with photometry only in the g band, which is produced by the saturation limit of the ZTF survey (12.5 to 13.2 magnitudes) and the fact that CEPs tend to be very bright, particularly in the r band; thus for bright CEPs the r-band light curve is not available. For SNe, there is a decrease in the recall curve in the g band, presumably because the g-band light curves of SNe tend to decay faster than the r-band light curves, producing shorter g-band light curves with fewer detections. This trend can be seen in the SN shown in Figure 2, as well as more generally in the light curve statistics for SNe: the average number of detections of SN light curves is 12 and 16 in the g and r bands, respectively, and the total time length of SN light curves is 53 and 64 days in the g and r bands, respectively. The low recall obtained for bright RRL when only the g band is available may be produced by differences in the variability features measured for different RRL sub-types. This issue is further discussed in Appendix C. The results obtained for AGNs and QSOs are likely related to incorrect labels, which we discuss further in Appendix D.

5.3. The deployed BHRF classifier
In order to use the BHRF classifier to classify the ZTF alert stream, we need to train a single BHRF model. We call this model "the deployed BHRF classifier". As in the previous sections, we trained the deployed BHRF model using 80% of the labeled set as the training set and the remaining 20% as the testing set. Table 5 shows the classification report obtained for the bottom level of this classifier. From the table we can see that in general the precision scores are lower than the recall scores. The results obtained for SNIa, SNII, QSO, AGN, YSO, CV/Nova, LPV, E, and RRL are good, with F1-score values higher than or equal to 0.60. The table also shows that some classes, such as SNIbc, Periodic-Other, DSCT, and CEP, require further work in order to improve the results obtained by the classifier; this may involve data augmentation procedures and better period estimations, as discussed in the following sections.

Figure 8. Confusion matrices of the bottom level obtained when applying the GBoost (left) and the MLP (right) classifiers to the testing set. The black squares highlight the three hierarchical classes (from top to bottom: transient, stochastic, and periodic). The reported values correspond to the percentages obtained by normalizing the matrix with the true labels. These matrices present a high percentage of misclassifications compared to the bottom level of the BHRF model (Figure 7).

Figure 9. As in Figure 7, but for a model that excludes AllWISE data. We drop the errors, which are comparable to those listed in Figure 7, for simplicity. The results obtained for this model are reasonable, although the classification of some stochastic classes is affected by the lack of AllWISE data.
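The final probabilities served by the deployed classifier follow the top×bottom product of Section 4.1.1; a sketch with made-up probability vectors for a single source:

```python
import numpy as np

# Illustrative outputs for one source: the top level and the three
# bottom-level classifiers (each vector sums to one).
p_top = {"transient": 0.05, "stochastic": 0.15, "periodic": 0.80}
p_transient = {"SNIa": 0.7, "SNIbc": 0.1, "SNII": 0.15, "SLSN": 0.05}
p_stochastic = {"QSO": 0.5, "AGN": 0.2, "Blazar": 0.1, "YSO": 0.1, "CV/Nova": 0.1}
p_periodic = {"LPV": 0.05, "E": 0.1, "DSCT": 0.05, "RRL": 0.6, "CEP": 0.1,
              "Periodic-Other": 0.1}

# P(class) = P_top(branch) * P_branch(class); the 15 products sum to one
# because each classifier's probabilities already sum to one.
final = {}
for branch, probs in [("transient", p_transient), ("stochastic", p_stochastic),
                      ("periodic", p_periodic)]:
    for cls, p in probs.items():
        final[cls] = p_top[branch] * p

best = max(final, key=final.get)   # here P(RRL) = 0.80 * 0.6 = 0.48
```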
Feature ranking of the deployed BHRF classifier
In Table 6 we list the feature ranking (top 30) foreach classifier within the two-level BHRF classifier (toplevel, Transient, Stochastic, and Periodic). The featureranking is computed considering which features separatebetter the subclasses within each classifier, with moreinformative features having higher ranks (for more de-tails see Hastie et al. 2009). From the table we can seethat for all the classifiers, a considerable fraction of thetop 30 features correspond to colors computed using the he ALeRCE Light Curve Classifier
13 14 15 16 17 18 19 20 210.00.20.40.60.81.0 R e c a ll SN
13 14 15 16 17 18 19 20 210.00.20.40.60.81.0
QSO
13 14 15 16 17 18 19 20 210.00.20.40.60.81.0
AGN
13 14 15 16 17 18 19 20 210.00.20.40.60.81.0 R e c a ll Blazar
13 14 15 16 17 18 19 20 210.00.20.40.60.81.0
YSO
13 14 15 16 17 18 19 20 210.00.20.40.60.81.0
CV/Nova
13 14 15 16 17 18 19 20 210.00.20.40.60.81.0 R e c a ll LPV
13 14 15 16 17 18 19 20 210.00.20.40.60.81.0 E
13 14 15 16 17 18 19 20 210.00.20.40.60.81.0
DSCT
13 14 15 16 17 18 19 20 21 average magnitude R e c a ll RRL
13 14 15 16 17 18 19 20 21 average magnitude
CEP
13 14 15 16 17 18 19 20 21 average magnitude
Periodic-Other g+rgr
Figure 10.
Recall for each stochastic and periodic subclass, as well as all transients (grouped as SN), as a function of theaverage magnitude. The x-axis ranges from 13 to 21 magnitudes, this range includes ∼
90% of the sources. In black triangles weshow the recall curves obtained when g and r photometries are available (considering the average magnitude in the g band), ingreen circles when only the g band is available, and in red squares when only the r band is available. The shaded regions wereobtained by generating 20 different training and testing sets, and training 20 independent models using each of these sets. Wereport the median and 5 and 95 percentile values obtained from the 20 models. There is a truly wide variety of behaviors (seediscussion in the text). S´anchez-S´aez et al. R e c a ll SNe QSO AGN R e c a ll Blazar YSO CV/Nova R e c a ll LPV E DSCT R e c a ll RRL CEP Periodic-Other g+rgr
Figure 11.
As in Figure 10, but plotting the recall as a function of the number of detections in the light curve (on a logarithmic scale). The x-axis ranges from 6 to 150 detections; this range includes ∼90% of the sources. Again, there is a wide variety of behaviors (see discussion in the text).

Table 5.
Classification report of the bottom level of the deployed BHRF classifier. The last row shows the macro average of each score.

Class            precision  recall  f1-score
SNIa                  0.92    0.76      0.83
SNIbc                 0.14    0.58      0.23
SNII                  0.69    0.55      0.61
SLSN                  0.28    1.00      0.43
QSO                   0.97    0.87      0.92
AGN                   0.65    0.85      0.73
Blazar                0.40    0.75      0.53
YSO                   0.73    0.78      0.75
CV/Nova               0.52    0.71      0.60
LPV                   0.99    0.98      0.98
E                     0.95    0.74      0.83
DSCT                  0.15    0.89      0.26
RRL                   0.95    0.88      0.91
CEP                   0.18    0.75      0.29
Periodic-Other        0.14    0.73      0.24
macro avg             0.58    0.79      0.61
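Scores like those reported in Table 5, including the macro averages in the last row, follow the standard definitions and can be obtained in one call with scikit-learn (labels below are invented for illustration):

```python
from sklearn.metrics import precision_recall_fscore_support

y_true = ["SNIa", "QSO", "QSO", "RRL", "RRL", "RRL"]
y_pred = ["SNIa", "QSO", "RRL", "RRL", "RRL", "QSO"]

# Macro averaging gives every class the same weight regardless of how many
# sources it has, which is why it is the summary of choice for highly
# imbalanced labeled sets.
p, r, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=["SNIa", "QSO", "RRL"],
    average="macro", zero_division=0,
)
```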
AllWISE and ZTF photometry, as well as new detection features (i.e., features not included in the FATS package) and non-detection features. Moreover, it can be observed that the ranking of features changes for each classifier.

The top-level classifier is dominated by different types of features: ZTF and AllWISE colors, morphological properties of the images (sgscore1), variability features related with the amplitude of the variability at short and long timescales (MHPS_low, GP_DRW_sigma, Meanvariance, ExcessVar, and SPM_A), variability features that detect a smooth decrease or increase of the luminosity (LinearTrend, SPM_tau_rise, SPM_tau_fall), features related with the quality of a supernova parametric model fit (SPM_chi), and features related with transient appearance or disappearance (positive_fraction and n_non_det_after_fid).

On the other hand, the Transient classifier is dominated by the SPM features (e.g., SPM_beta, SPM_t0, SPM_tau_rise, and SPM_tau_fall). Other relevant features are the optical colors at the peak and the mean of the light curve, measured from the difference-image light curves; features that detect a smooth increase or decrease of the observed flux (LinearTrend); features that measure the level of correlation in the light curve (IAR_phi); features related with the amplitude of the variability (MHPS_low); and features related with the appearance of a transient source (dmag_first_det_fid, last_diffmaglim_before_fid_1). Note that SN rise-related features, such as SPM_t0 and SPM_tau_rise, are some of the most relevant features for the classification of transients, and are crucial for the early classification of SNe. Also note that SPM_t0 is not the explosion time, but a characteristic time at which the SN has risen significantly.

For the Stochastic classifier, 12 of the top 30 features are related with color, morphology, and distance to the Galactic plane, and the rest correspond to features related with the amplitude of the variability observed at different timescales (e.g., ExcessVar, SPM_A, Meanvariance, GP_DRW_sigma, and Amplitude), and features related with the timescale of the variability (IAR_phi, GP_DRW_tau).

Finally, the Periodic classifier is clearly dominated by the Multiband_period feature, but also by different colors, by features related with the amplitude of the variability (e.g., delta_mag_fid, Amplitude, ExcessVar, Meanvariance, and GP_DRW_sigma), and by features related with the timescale of the variations (e.g., GP_DRW_tau and IAR_phi).

5.3.2. Results for the unlabeled ZTF set
We now turn to discuss the results obtained when applying the final BHRF classifier to all the sources in ZTF with ≥ 6 detections in g or ≥ 6 detections in r. Considering the data obtained by ZTF until 2020/06/09, there are 868,371 sources that satisfy this condition, hereafter defined as the "unlabeled ZTF set". We define the class of a given object in the unlabeled ZTF set by selecting the class with the maximum probability obtained by the deployed BHRF classifier. However, users of this classifier can use the obtained probabilities to make their own probability cuts and select samples for their science. The features, classifications, and probabilities obtained for the unlabeled ZTF set with data until 2020/06/09 are provided in a catalog that can be downloaded at http://ml.alerce.online/lc_pred.

It is important to note that the classifications obtained by the light curve classifier are updated every day, as new alerts are received. Whenever a new alert is received for a given source, the ALeRCE pipeline recomputes its variability features and provides an updated classification. These updated classifications can be found at the ALeRCE Explorer website, using the "light curve classifier" tab and specifying the desired class. Considering the results shown in Figure 11, we expect that the quality of the classification for a given source will improve as more detections are added to the light curve. In addition, with new alerts, more objects will satisfy the condition of having ≥ 6 detections in g or ≥ 6 detections in r, and thus, the size of the unlabeled ZTF set
Table 6.
Feature ranking (top 30) for each layer of the deployed BHRF classifier.
(Suffixes _1 and _2 denote features measured in the g and r bands, respectively.)

Rank  Top level              Transient                     Stochastic          Periodic
1     W1-W2                  SPM_t0_1                      W1-W2               Multiband_period
2     sgscore1               SPM_beta_2                    sgscore1            g-W2
3     positive_fraction_2    SPM_tau_rise_2                r-W2                r-W2
4     positive_fraction_1    SPM_tau_rise_1                (g-r)_mean_corr     (g-r)_max_corr
5     SPM_tau_rise_1         (g-r)_max                     g-W2                g-W3
6     LinearTrend_2          SPM_t0_2                      gal_b               (g-r)_mean
7     SPM_chi_1              LinearTrend_2                 g-W3                (g-r)_max
8     g-W2                   AndersonDarling_2             (g-r)_max_corr      GP_DRW_tau_1
9     g-W3                   SPM_beta_1                    ExcessVar_2         (g-r)_mean_corr
10    n_non_det_after_fid_2  SPM_tau_fall_2                Meanvariance_2      IAR_phi_1
11    W2-W3                  dmag_first_det_fid_1          (g-r)_mean          Amplitude_1
12    SPM_beta_1             MHPS_low_1                    delta_mag_fid_2     ExcessVar_1
13    SPM_tau_rise_2         LinearTrend_1                 r-W3                delta_mag_fid_1
14    SPM_A_2                (g-r)_mean                    W2-W3               Meanvariance_1
15    SPM_chi_2              MHPS_ratio_1                  Amplitude_2         r-W3
16    SPM_A_1                SPM_tau_fall_1                Std_2               Std_1
17    r-W2                   SPM_gamma_2                   (g-r)_max           GP_DRW_sigma_1
18    ExcessVar_1            MHPS_ratio_2                  SPM_A_2             GP_DRW_tau_2
19    ExcessVar_2            Skew_2                        PercentAmplitude_2  PercentAmplitude_1
20    r-W3                   sgscore1                      SPM_A_1             W1-W2
21    Rcs_2                  SPM_gamma_1                   ExcessVar_1         W2-W3
22    GP_DRW_sigma_2         Power_rate_2                  IAR_phi_1           SF_ML_amplitude_1
23    SPM_tau_fall_1         IAR_phi_2                     GP_DRW_sigma_2      SPM_A_1
24    Meanvariance_2         dmag_first_det_fid_2          delta_mag_fid_1     Gskew_1
25    LinearTrend_1          IAR_phi_1                     Pvar_2              Q31_1
26    SPM_beta_2             last_diffmaglim_before_fid_1  IAR_phi_2           Autocor_length_1
27    Pvar_2                 PPE                           GP_DRW_tau_2        IAR_phi_2
28    MHPS_low_2             Harmonics_mag_6_1             GP_DRW_tau_1        SF_ML_gamma_1
29    SF_ML_amplitude_2      MHPS_low_2                    MHPS_high_1         Amplitude_2
30    (g-r)_max_corr         Gskew_2                       MHPS_low_1          delta_mag_fid_2
increases every day. Moreover, with new detections the size of the labeled set will increase, allowing the training of new BHRF models. Updates regarding any possible modification to the light curve classifier (e.g., labeled set and models) will be published on the ALeRCE Science website.

Figure 12 shows the number of candidates per class obtained for the 868,371 sources with enough alerts until 2020/06/09. Compared to Figure 4, it can be noticed that there is no correlation between the number of sources per class for the unlabeled set and the number of sources per class for the labeled set. The Periodic-Other, E, and LPV classes have the highest numbers of candidates, while the SN classes have the lowest. The distribution of candidates per class is consistent with the astrophysical number densities (i.e., we are likely not misclassifying large numbers of sources). For instance, Blazars are relativistically beamed (and thus visible to larger distances), but only over very small viewing angles, and hence are expected to be less common than QSOs and AGNs. In the case of SNe, not factoring in the amount of time a particular SN is above the magnitude limits of the search (the "control time"), we find ratios of SNe II/SNe Ia and SNe Ibc/SNe Ia of 0.21 and 0.41, respectively. Computing these ratios using the numbers of such classes reported from ASAS-SN discoveries in Holoien et al. (2019) yields 0.36 and 0.09, respectively. The significant difference between the SNe Ibc/SNe Ia ratios implies that we are strongly overestimating the number of SNe Ibc; given the similarities between SN Ibc and SN Ia light curves, we are likely classifying a non-negligible fraction of SNe Ia as SNe Ibc. This highlights the importance of including distance estimations to improve the classification of transients.
We are currently working to include distance-based features in future versions of the classifier.

Figure 12. Number of candidates per class for all the sources in the unlabeled ZTF set. It can be seen that the number of sources per class for the unlabeled ZTF set does not correlate with the number of sources per class for the labeled set (see Figure 4).

To investigate the quality of the predictions, we plotted the probability distributions of the top and bottom levels on the left and right sides of Figure 13. The red lines denote the position of the median probability for each class, and the green lines denote the 5th and 95th percentiles. It is clear from the figure that the distributions of probabilities for the top level are higher compared to those of the bottom level. For the top level, the classes with the lowest probabilities are CV/Nova and YSO. For the bottom level, the classes with the highest probabilities are LPV, QSO, and AGN, while the lowest probabilities are found for the different classes of SNe, YSO, CV/Nova, and some periodic variables.

The low probabilities obtained for some classes are related with the confusion between classes observed in Figure 7. For instance, in Figure 7 we can see that the SN classes present a high confusion among them. Moreover, the SNIa, SNIbc, SNII, and SLSN median probabilities of sources classified as SNII are 0.16, 0.19, 0.28, and 0.19, respectively. The high confusion among the SN classes may be due to the low number of sources in the labeled set, but also to the intrinsic similarities among these classes. For example, the physical mechanism responsible for the main peak of the light curves of SNe Ia and SNe Ib/c is the same: the diffusion of energy deposited by radioactive Ni (Arnett 2008). Indeed, Villar et al. (2019a) report that 15% of their Type Ibc SNe are classified as Ia. This might be improved by performing data augmentation using, for example, Gaussian process modeling (e.g., Boone 2019).
On the other hand, the low probabilities observed for CV/Novae and YSOs can be produced by the similarities between their colors and the colors of some periodic sources, and by the fact that some CV/Novae and YSOs present very rapid variability compared to the ZTF cadence, which produces light curves with low auto-correlation, and thus low values of the IAR_phi parameter, as is normally observed for periodic sources (excluding LPVs). These similarities can be seen in Figure 14, where we show the distributions of IAR_phi_1 and g−W3 for YSOs and CV/Novae (grouped), the rest of the stochastic classes, and periodic sources from the labeled set.

Figure 15 shows the normalized r-band magnitude distribution of the different classes, considering sources present in the labeled set and candidates from the unlabeled ZTF set. In general, the distributions of magnitudes of the candidates are similar to or fainter than those found among sources from the labeled set. These results show that the classifier is able to detect faint and bright candidates, regardless of the luminosity biases present in the labeled set, which can be dominated by the brightest tail of the true magnitude distribution of each class.

A simple way to test the performance of the BHRF classifier is to verify whether the results obtained when the model is applied to the unlabeled ZTF set are in agreement with what is astrophysically expected from previous works. For instance, younger Galactic targets like YSOs, Classical Cepheids, and LPVs should reside near the Galactic plane (e.g., Catelan & Smith 2015; Mowlavi et al. 2018), while extragalactic sources like AGNs, QSOs, Blazars, and SNe should have roughly isotropic distributions, perhaps with fewer sources near the Galactic plane due to attenuation/reddening by gas and dust (e.g., Calzetti et al. 2000; Padovani et al. 2017). On the left side of Figure 16 the sky distribution, in Galactic coordinates, of LPV, CEP, and YSO candidates is shown. It is clear from the figure that most of them are located in the Galactic plane, and that sources located outside the plane have a low BHRF probability. This is consistent with the results obtained by previous works (e.g., Mowlavi et al. 2018; Rimoldini et al. 2019). The right panel of Figure 16 shows the Galactic latitude versus the g − r color, obtained using the mean magnitude of the light curves in each band, for extragalactic candidates (QSO, AGN, Blazar, SNIa, SNIbc, SNII, and SLSN).
From the figure we can see that the fraction of extragalactic candidates observed around the Galactic plane is low, and that most of the candidates located in the plane have low probabilities. Moreover, the g − r colors of the extragalactic candidates are consistent with what is expected for these classes, with clear evidence of reddening for the candidates located around the Galactic plane. The sky distribution, in Galactic coordinates, of the extragalactic candidates can be found in Figure 22 of the appendix.

Figure 13.
Left: normalized probability distributions of the top level of the deployed BHRF classifier, split by subclass, for candidates from the unlabeled ZTF set. The reported values correspond to the probabilities obtained for each class of the top level, as indicated below the class name. Right: normalized probability distributions of the bottom level of the deployed BHRF classifier, split by subclass, for candidates from the unlabeled ZTF set. The red lines show the median probability for each class. The green lines show the 5th and 95th percentiles of the probabilities. Some subclasses show broad distributions extending to low values, implying that they are not so well represented or characterized by the highest-ranked features within the hierarchical class.

Figure 14.
Normalized IAR φ distribution (top) in the g band (IAR_phi_1; cut in the y-axis at 0.4), and normalized g−W3 distribution (bottom), for YSOs and CV/Novae (blue), stochastic sources (red; excluding CV/Novae and YSOs), and periodic sources (yellow). It can be seen that some CV/Novae and YSOs have IAR_phi_1 and g−W3 values similar to those of periodic sources.

6. CONCLUSIONS

6.1. Summary
In this paper we presented the first version of the ALeRCE light curve classifier. This classifier uses a total of 152 features, including variability features computed from ZTF light curves with ≥ 6 detections in the g or r bands, and colors computed using ZTF and AllWISE photometry (see Section 3), to classify each source into 15 subclasses, including periodic, transient, and stochastic variable sources (see Section 2.2). The light curve classifier uses a balanced hierarchical RF classifier (see
Figure 15.
Normalized magnitude distributions in the r band for sources in the labeled set (LS; red histograms) and candidates from the unlabeled ZTF set (cand.; blue histograms).

Section 4.1), constructed with a two-level scheme. The first level (top level) classifies each source as periodic, stochastic, or transient. The second level (bottom level) consists of three classifiers that further resolve each hierarchical class into subclasses.
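The two-level scheme can be sketched as follows; multiplying branch and within-branch probabilities is one generic way to combine the levels into 15 final scores, shown here with toy numbers as an illustration rather than the deployed model's exact outputs:

```python
# Top-level probabilities over the three hierarchical classes, and
# bottom-level probabilities within each branch (toy numbers only).
top = {"transient": 0.2, "stochastic": 0.1, "periodic": 0.7}
bottom = {
    "transient": {"SNIa": 0.5, "SNIbc": 0.2, "SNII": 0.2, "SLSN": 0.1},
    "stochastic": {"QSO": 0.4, "AGN": 0.3, "Blazar": 0.1,
                   "YSO": 0.1, "CV/Nova": 0.1},
    "periodic": {"LPV": 0.1, "E": 0.2, "DSCT": 0.05, "RRL": 0.5,
                 "CEP": 0.05, "Periodic-Other": 0.1},
}

# Combine the levels: P(subclass) = P(branch) * P(subclass | branch).
final = {sub: top[branch] * p
         for branch, subs in bottom.items() for sub, p in subs.items()}
best = max(final, key=final.get)  # here 'RRL', with 0.7 * 0.5 = 0.35
```

Because each level's probabilities sum to one, the 15 combined scores also sum to one.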
Figure 16.
Left: Galactic latitude (gal_b, in degrees) versus Galactic longitude (gal_l, in degrees) for candidates expected to be mostly observed around the Galactic plane (LPV, CEP, and YSO classes). Right: Galactic latitude (gal_b, in degrees) versus g − r max for extragalactic candidates (QSO, AGN, Blazar, SNIa, SNIbc, SNII, and SLSN classes). The bottom-level probabilities computed by the deployed BHRF classifier are color-coded according to the colorbar to the right. The red contours show the density of points in each plot. It can be seen that most of the extragalactic candidates are located outside the Galactic plane, while most of the YSO, LPV, and CEP candidates are located in the Galactic plane.

We trained and tested the BHRF classifier using a labeled set obtained by cross-matching the ZTF database with different catalogs of transient, stochastic, and periodic sources (see Section 2.2.1). For the top level we obtained macro-averaged precision, recall, and F1-score values of 0.96, 0.99, and 0.97, respectively, while for the bottom level we obtained macro-averaged precision, recall, and F1-score values of 0.57, 0.76, and 0.59, respectively.

We used the BHRF classifier to classify 868,371 sources from ZTF (the unlabeled ZTF set), obtaining results that are in agreement with what we expect astrophysically. For instance, most of the high-probability extragalactic candidates are located outside the Galactic plane, and most of the high-probability YSO, LPV, and CEP candidates are located in the Galactic plane (see Figure 16).

The condition of ≥ 6 detections in g or r normally equates to a timespan between 3 and 30 days since the first detection. Whenever a new detection is received for an object, the ALeRCE pipeline processes it and provides an updated classification.

6.2. Final remarks and perspectives
One of the main challenges found during the development of the ALeRCE light curve classifier was the high imbalance present in the labeled set. For instance, transient sources represent 1.4% of the labeled set, while periodic sources represent 70.5%. Each hierarchical class also suffers from high imbalance among its subclasses; for example, in the case of the transient class, SNIa comprise 74.0% of the sample, while SLSN correspond to only 1.4%. We addressed this problem by using the balanced RF implementation of the imbalanced-learn Python package, which follows the procedure proposed by Chen et al. (2004). This method uses a majority-class downsampling technique to train each tree with a balanced sub-sample. We also tested two other algorithms, GBoost and MLP, but concluded that more work is needed to obtain better results from those.
IAR_phi parameter, the MHPS features, and the non-detection features. In addition, during the development of the light curve classifier we realized that some stochastic classes are hard to separate using variability features alone. In particular, the separation of YSOs from the other stochastic classes improved significantly once we included AllWISE colors in the set of features (as can be seen in Figures 7 and 9).

Furthermore, the computation of reliable periods was quite challenging, particularly considering that the ALeRCE pipeline requires fast computation of features. Huijse et al. (2018) demonstrated that very good results can be achieved with quadratic mutual information (QMI) estimators; however, these techniques are computationally expensive [O(n²)]. We solved this issue by using the MHAOV periodogram, which provides less reliable periods (see Figure 10 in Huijse et al. 2018) but is much faster to compute [O(n)]. Periods become increasingly unreliable as the number of data points decreases, but the classification of periodic variables can still be accurate (see the RRL, E, or LPV panels in Figure 11), as other features can compensate for the decreasing quality of the periodogram (e.g., features related with the amplitude and the timescale of the variability). ALeRCE is currently working to implement methods for period estimation that are both accurate and fast. Computing the periodogram is expensive for sources with a large number of detections. We are currently exploring so-called "online" periodograms, which are updated as new samples arrive at a fraction of the computational cost of recomputing the period each time from scratch (Zorich et al. 2020), as well as other techniques that might work better with eclipsing binary light curves (e.g., Kovács et al. 2002; Mighell & Plavchan 2013).

Moreover, the classification of the different SN classes was particularly challenging.
First, the number of SNe in the labeled set was very small compared to other classes; and second, the light curves of the SN classes can present similarities, which makes their separation difficult, as discussed in Section 5.3.2. We solved this issue by using the BalancedRandomForestClassifier method from imbalanced-learn, and by including the SPM features, whose definition is a modification of the work of Villar et al. (2019a). In the future we plan to test other techniques to improve the separation of the SN classes. Previous works have performed Gaussian process regression to model SN light curves and generate new light curves with different cadences therefrom (e.g., Boone 2019). Moreover, better results can be obtained if we use information regarding the SN host galaxy (e.g., Foley & Mandel 2013; Baldeschi et al. 2020).

In the future we also plan to perform data augmentation to improve the classification of persistent variable objects. For the case of variable stars, light curves can be modeled with Gaussian processes or with a combination of harmonics, and basic transformations can then be applied to these models to obtain light curves with different periods and amplitudes (e.g., Elorrieta et al. 2016; Martínez-Palomera et al. 2018; Castro et al. 2018; Aguirre et al. 2019). To the best of our knowledge, for the case of AGNs, QSOs, and Blazars no previous attempts to perform data augmentation have been made. A promising option is to use synthetic light curve generators that consider the physical processes behind the variability (e.g., Sartori et al. 2019).

Most of the features used by this classifier can be implemented and used to classify light curves from other data sets. In particular, for the case of LSST, the non-detection features can be adapted to work with the forced photometry that will be provided for each alert (DIAForcedSources in the Data Products Definition Document; Jurić et al. 2019). LSST will also benefit from the multiband ugrizy light curves. As we demonstrated in this work, in general the light curve classification improves when both ZTF g and r data are available. For the case of LSST this would be the same, and probably we should even be able to further resolve some of the subclasses presented in this work. For instance, using the zy light curves we should be able to separate local type 1 and type 2 active galactic nuclei, since for low-redshift sources we can detect variability from the dusty torus at these wavelengths (see Sánchez et al. 2017 and references therein), or identify high-redshift QSOs, whose emission is expected to be absorbed in the bluer bands. We encourage researchers interested in classifying stochastic and transient sources in particular to use the novel (or modified) features presented in this work, like the IAR_phi parameter, the MHPS features, the SPM features, and the non-detection features.

It is worth noting that the ALeRCE light curve classifier is being constantly improved, and this work describes its Version 1.0. Future versions of this classifier may include new classes of variable and transient objects, as well as sub-classes of sources already present in the taxonomy (e.g., RRL types ab and c; classical
S´anchez-S´aez et al. and type II CEPs; contact, detached, and semi-detachedeclipsing binaries; among others). We are also workingto find new features and techniques that can improvethe performance of the classifier. Future work will re-port any changes included in the classifier model, likethe inclusion of data augmentation, or the use of otherclassification strategies (e.g., semi supervised training).We recommend the users of this classifier to check theALeRCE Science website to get updates related with thedifferent classifiers and the data processing. We are ex-ploring different classification algorithms which are notbased on manually designed features, but on automat-ically derived, recurrent, implicitly extracted features,via deep learning (e.g. Naul et al. 2018; Muthukrishnaet al. 2019; Becker et al. 2020). However, up to thispoint we have found that the former produce better re-sults when applied to real data. Most likely, a combina-tion of simulated and real data will be required to trainreliable deep learning classification models in the future,as found by Carrasco-Davis et al. (2019).ACKNOWLEDGMENTSThe authors acknowledge support from: ANID Mil-lennium Science Initiative ICN12 009; ANID grantsPIA AFB-170001 (F.F., I.R., C.V., E.C., D.R., L.S.G.,C.S.C., D.R.U.), PIA ACT172033 (P.A.), Basal-CATAAFB-170002 (P.S.S., F.E.B., M.C., D.D.), FONDECYTRegular N 1200710 (F.F.), 1190818 (F.E.B.), 1200495(F.E.B.), 1171678 (P.E.) and 1171273 (M.C.), FONDE-CYT Initiation N 11191130 (G.C.V.), and FONDECYTPostdoctorado N 3200250 (P.S.S.), and 3200222 (D.D.),the Max-Planck Society, Germany, through a PartnerGroup grant (P.A.). Powered@NLHPC: This researchwas partially supported by the supercomputing infras-tructure of the NLHPC (ECM-02).Based on observations obtained with the SamuelOschin 48-inch Telescope at the Palomar Observatory as part of the Zwicky Transient Facility project. ZTFis supported by the National Science Foundation underGrant No. 
AST-1440341 and a collaboration including Caltech, IPAC, the Weizmann Institute for Science, the Oskar Klein Center at Stockholm University, the University of Maryland, the University of Washington, Deutsches Elektronen-Synchrotron and Humboldt University, Los Alamos National Laboratories, the TANGO Consortium of Taiwan, the University of Wisconsin at Milwaukee, and Lawrence Berkeley National Laboratories. Operations are conducted by COO, IPAC, and UW. AllWISE makes use of data from WISE, which is a joint project of the University of California, Los Angeles, and the Jet Propulsion Laboratory/California Institute of Technology, and NEOWISE, which is a project of the Jet Propulsion Laboratory/California Institute of Technology. WISE and NEOWISE are funded by the National Aeronautics and Space Administration.
Software:
Apache Kafka (https://kafka.apache.org/), Apache Spark (Zaharia et al. 2016), ASTROIDE (Brahem et al. 2020), Astropy (Astropy Collaboration et al. 2013, 2018), Dask (Rocklin 2015), FATS (Nun et al. 2015), IAR (Eyheramendy et al. 2018; Python and R implementations are available at https://github.com/felipeelorrieta/IAR Model), Imbalanced-learn (Lemaître et al. 2017), Jupyter (Kluyver et al. 2016), Keras (Chollet et al. 2015), Matplotlib (Hunter 2007), Numpy & Scipy (van der Walt et al. 2011), P4J (Huijse et al. 2018), Pandas (McKinney et al. 2010), PostgreSQL, Python (Van Rossum & Drake Jr 1995), Scikit-learn (Pedregosa et al. 2012), Seaborn (Waskom et al. 2017), Tensorflow (Abadi et al. 2016), TOPCAT (Taylor 2005), XGBoost (Chen & Guestrin 2016).

APPENDIX

A. FURTHER DESCRIPTION OF SOME VARIABILITY FEATURES

In this section we provide additional descriptions of some of the features listed in Table 2 (those marked with **). These features correspond to new variants of features included in the FATS package and other works:

• Damped Random Walk (DRW) parameters: a DRW model is defined by a stochastic differential equation which includes a damping term that pushes the signal back to its mean: dX(t) = −(1/τ_DRW) X(t) dt + σ_DRW √dt ε(t) + b dt, with τ_DRW, σ_DRW, t > 0, where τ_DRW corresponds to the characteristic time for the time series to become roughly uncorrelated, σ_DRW corresponds to the amplitude of the variability at short timescales (t ≪ τ_DRW), and ε(t) is a white noise process with zero mean and variance equal to 1. DRW modeling is typically used to describe light curves of active galactic nuclei (Kelly et al. 2009). In this case we obtained the σ_DRW and τ_DRW parameters using Gaussian process regression, with an Ornstein-Uhlenbeck kernel, as in Graham et al. (2017). GP_DRW_sigma denotes σ_DRW, while GP_DRW_tau denotes τ_DRW.

• Excess Variance (σ_rms²): a measure of the intrinsic variability amplitude in a given band (see Sánchez et al. 2017, and references therein): σ_rms² = (σ_LC² − σ_m²)/m̄², where σ_LC is the standard deviation of the light curve, σ_m is the average photometric error, and m̄ is the average magnitude. We denote σ_rms² as ExcessVar.

• P_var: the probability that the source is intrinsically variable in a given band (see Paolillo et al. 2004, and references therein). It considers the χ² of the light curve with respect to its mean, and calculates the probability P_var = P(χ²) that a χ² lower than or equal to the observed value could occur by chance for an intrinsically non-variable source, assuming that for each light curve the χ² follows a probability distribution described by an incomplete gamma function Γ(ν/2, χ²/2), where ν corresponds to the degrees of freedom. We denote P_var as Pvar.

• Structure Function (SF) parameters: the SF quantifies the amplitude of the variability as a function of the time difference between pairs of detections (τ). In this work we consider the definition provided by Caplar et al. (2017). We model the SF as a power law: SF(τ) = A_SF (τ / 1 yr)^γ_SF, where γ_SF corresponds to the logarithmic gradient of the change in magnitude and A_SF corresponds to the amplitude of the variability at 1 yr. SF_ML_amplitude denotes A_SF, while SF_ML_gamma denotes γ_SF.

B. ONE-LEVEL MULTI-CLASS RF MODEL

The first model tested for the ALeRCE light curve classifier was a simple one-level RF model with 15 classes, implemented using the imbalanced-learn
Python package. This model uses 500 trees, grown to maximum depth, and a maximum number of features equal to the square root of the total number of features. Figure 17 shows the confusion matrix obtained by this model. The precision, recall, and F1-score obtained are 0.49, 0.68, and 0.50, respectively. Clearly, this model has a lower performance compared to the BHRF classifier.

C. THE PARTICULAR CASE OF RRL

In Section 5.2 we claimed that the low recall values obtained in the g band for bright RRL can be explained by the differences in the variability features of the RRL sub-types. Figure 18 shows the Multiband_period versus the
Meanvariance measured in the g and r bands, for RRL split into their sub-classes 'ab' and 'c', and for E, CEP, and DSCT (grouped as a single class). From the figure we can notice that ab-type RRL tend to have larger Meanvariance in the g band, which helps to distinguish them from E/CEP/DSCT, while c-type RRL have values of Meanvariance in both bands similar to those of E/CEP/DSCT, which makes it difficult to tell them apart. The RRL class in our labeled set is dominated by ab-type RRL ( ∼ g ≤

D. THE PARTICULAR CASE OF AGN, QSO AND BLAZAR

In this work we present the first attempt to separate different types of active galactic nuclei according to their variability properties. As we mentioned in Section 2.2, we separate active galactic nuclei in the following way:

• AGN: type 1 Seyfert galaxies (i.e., active galactic nuclei whose emission is dominated by the host galaxy), selected from MILLIQUAS (broad type "A") and from Oh et al. (2015).

• QSO: type 1 nucleus-dominated active galactic nuclei (i.e., active galactic nuclei whose emission is dominated by their active nuclei), selected from MILLIQUAS (broad type "Q").

• Blazar: BL Lac objects and Flat Spectrum Radio Quasars (FSRQ), selected from ROMABZCAT and MILLIQUAS.
Figure 17.
Confusion matrix for the one-level multi-class RF model. The black squares highlight the three hierarchical classes (from top to bottom: transient, stochastic, and periodic). The reported values correspond to the percentages obtained by normalizing the matrix over the true labels. The performance of this model is poorer than that of the BHRF classifier (see Figure 7).
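As an outline of the one-level baseline described above, the sketch below trains a multi-class RF with the quoted hyperparameters (500 trees, max_features equal to the square root of the number of features). The synthetic three-class feature table is purely illustrative, and scikit-learn's RandomForestClassifier with balanced subsampling stands in for the imbalanced-learn balanced RF used in the paper:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_fscore_support

rng = np.random.default_rng(0)
# Hypothetical stand-in for the real feature table (here: 3 classes, 10 features).
X = rng.normal(size=(600, 10))
y = rng.integers(0, 3, size=600)
X[y == 1] += 2.0  # shift class means so the classes are partly separable
X[y == 2] -= 2.0

# One-level multi-class RF with the hyperparameters quoted in the text.
clf = RandomForestClassifier(
    n_estimators=500,
    max_features="sqrt",
    class_weight="balanced_subsample",  # approximates a balanced RF
    random_state=0,
)
clf.fit(X[:400], y[:400])

# Macro-averaged precision, recall, and F1-score, as reported for Figure 17.
prec, rec, f1, _ = precision_recall_fscore_support(
    y[400:], clf.predict(X[400:]), average="macro"
)
```

The macro average weights every class equally, which is the appropriate summary when, as here, the labeled set is strongly imbalanced across classes.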
Figure 18.
Logarithm of the Multiband period versus the
Meanvariance_1 (g band, left) and the Meanvariance_2 (r band, right), for RRL split according to their sub-classes 'ab' (green) and 'c' (red), and for E, CEP, and DSCT (grey). We show a zoom into the area where most of the RRL lie. The contours show the density of points of each RRL sub-class.
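To make the variability features used throughout these appendices concrete, the following sketch implements Meanvariance, the normalized excess variance, Pvar, and a structure-function power-law fit under common conventions. The function names, the ν = N − 1 choice, and the lag-binning scheme are assumptions for illustration, not the ALeRCE implementation:

```python
import numpy as np
from scipy import stats

def meanvariance(mag):
    # Meanvariance: standard deviation of the light curve over its mean magnitude.
    return np.std(mag, ddof=1) / np.mean(mag)

def excess_variance(mag, err):
    # Normalized excess variance: light-curve variance minus the mean squared
    # photometric error, normalized by the squared mean magnitude.
    return (np.var(mag, ddof=1) - np.mean(err ** 2)) / np.mean(mag) ** 2

def pvar(mag, err):
    # Pvar: probability that a chi^2 lower than or equal to the observed one
    # occurs by chance for a non-variable source, i.e. the chi^2 CDF with
    # nu = N - 1 degrees of freedom (an incomplete gamma Gamma(nu/2, chi^2/2)).
    chi2 = np.sum((mag - np.mean(mag)) ** 2 / err ** 2)
    return stats.chi2.cdf(chi2, df=len(mag) - 1)

def structure_function_fit(t, mag, n_bins=10):
    # SF(tau) = A_SF * (tau / 1 yr)^gamma_SF, fit by least squares in log
    # space after binning the magnitude differences of all detection pairs
    # by their time lag (a rough sketch of the Caplar et al. 2017 approach).
    i, j = np.triu_indices(len(t), k=1)
    tau = t[j] - t[i]                       # time lags in days
    dmag = np.abs(mag[j] - mag[i])
    edges = np.logspace(np.log10(tau.min()), np.log10(tau.max()), n_bins + 1)
    idx = np.clip(np.digitize(tau, edges) - 1, 0, n_bins - 1)
    lag = [tau[idx == b].mean() for b in range(n_bins) if np.any(idx == b)]
    amp = [dmag[idx == b].mean() for b in range(n_bins) if np.any(idx == b)]
    gamma_sf, log_a = np.polyfit(np.log10(np.array(lag) / 365.25),
                                 np.log10(amp), 1)
    return 10 ** log_a, gamma_sf            # (A_SF, gamma_SF)
```

A strongly variable light curve yields a Pvar close to 1 and a clearly positive excess variance, while pure photometric noise yields an excess variance consistent with zero.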
Figure 19.
Similar to Figures 10 and 11, but treating AGNs, QSOs, and Blazars as a single class. When grouped together, the recall is much better behaved than for any individual active galactic nucleus class, highlighting the remaining difficulty of distinguishing between these classes.
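The grouped-class recall curves of Figure 19 can be reproduced in outline by merging the three labels into one class before computing recall in each magnitude bin. The helper below is a sketch; its name, signature, and binning are illustrative, not the ALeRCE code:

```python
import numpy as np

def recall_per_mag_bin(y_true, y_pred, mag, bin_edges, group):
    """Recall of a merged class (`group`) in bins of average magnitude."""
    recalls = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        # True members of the merged class falling in this magnitude bin.
        sel = (mag >= lo) & (mag < hi) & np.isin(y_true, group)
        if not sel.any():
            recalls.append(np.nan)  # no true members in this bin
        else:
            # A prediction counts as correct if it lands anywhere in the group.
            recalls.append(float(np.mean(np.isin(y_pred[sel], group))))
    return np.array(recalls)
```

With this definition, a QSO predicted as AGN still counts as a correct grouped-class prediction, which is exactly why the merged recall in Figure 19 is better behaved than the per-class curves of Figures 10 and 11.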
In Figure 11 we showed the recall curves for each class as a function of the number of detections, and we observed that these curves decrease for QSOs and AGNs when more detections are available, particularly in the r band. These results are puzzling, considering that we would normally expect the classification of a given class to improve as more detections are included in the light curves. In order to better understand the origin of these results, we show in Figure 19 the recall curves as a function of average magnitude and number of detections for AGNs, QSOs, and Blazars grouped as a single class. In this case, the recall curves lie between 0.8 and 1.0 for every bin of magnitude (especially for g > 16 or r > 16) and number of detections. From this we can infer that the light curve classifier performs very well when selecting active galactic nuclei as a single class, but some issues remain regarding the separation of AGNs, QSOs, and Blazars. There are two main explanations for these results: a) the method cannot properly separate QSOs from AGNs and Blazars, or b) there are sources in the labeled set with incorrect labels.

A possible way to explore how well the light curve classifier can discriminate among AGNs, QSOs, and Blazars is to check whether the features available in the classifier can separate these three populations. Figure 20 shows six different features used by the classifier, (g-r) mean_corr, Meanvariance_1, ExcessVar_1, sgscore1, Meanvariance_2, and
ExcessVar_2, for the QSOs, AGNs, and Blazars from the labeled set (most of these features are in the top 30-ranked features shown in Table 6). From the figure we can see that these three classes have different color distributions, different morphologies, and different variability properties. AGNs and Blazars tend to be redder than QSOs (see (g-r) mean_corr), Blazars and AGNs tend to have larger amplitudes (see Meanvariance_1 and ExcessVar_1), and AGNs tend to have more extended morphologies compared to QSOs and Blazars. These are just some examples of features that can be used to separate the three classes mentioned above. After a visual inspection of the feature distributions of AGNs, QSOs, and Blazars, we found that more than 30 features can be used to separate them, including for instance
PercentileAmplitude, Q31, GP_DRW_sigma, GP_DRW_tau, MHPS_low, MHPS_high, and SF_ML_amplitude, among others. From this we can infer that the light curve classifier should be able to separate these three populations properly.

In addition, from Figure 20 we can notice that, in the case of AGNs and QSOs, the distribution of features obtained from the g band light curves presents larger differences compared to the features obtained from the r band. This behavior is also seen in other features, such as PercentileAmplitude, GP_DRW_sigma, and Std, among other features related to the amplitude of the variability. These differences might be produced by the combined effect of higher contamination from the host galaxy in the r band and an intrinsically lower variability amplitude in the r band, due to the well known anti-correlation between variability amplitude and emission wavelength (see Sánchez et al. 2017 and references therein). From this, we can understand the differences observed in the g and r band recall curves of AGNs and QSOs shown in Figure 11, which should be produced by these differences in the feature distributions.

On the other hand, MILLIQUAS uses a luminosity-based divider to separate AGNs and QSOs (see Section 5 of Flesch 2015), and therefore we expect that a fraction of sources could have an incorrect classification if the host galaxy component is not corrected properly. In order to see whether there are sources in the labeled set with an incorrect QSO or AGN classification, we crossmatched our labeled set with the SIMBAD database. There are 26168 QSOs in the labeled set (all obtained from MILLIQUAS), and 1590 of them (6%) are classified as Seyfert or AGN by SIMBAD, with 1580 having a reported redshift < 1.

Figure 20. Distribution of (g-r) mean_corr, Meanvariance_1, ExcessVar_1, sgscore1, Meanvariance_2, and ExcessVar_2, for the QSO (blue), AGN (red), and Blazar (yellow) classes from the labeled set. These features (among others) can be used to separate these classes of active galactic nuclei.

On the other hand, there are 4667 AGNs in the labeled set, and 830 (17%) of them are classified as QSO in SIMBAD. Therefore, there are 2420 sources in the labeled set with inconsistent classifications in MILLIQUAS and SIMBAD. The light curve classifier classifies 920 of these sources as QSO and 1319 as AGN. To understand more properly the differences between these AGN and QSO candidates, we plot in Figure 21 the same features plotted in Figure 20 for the sources with inconsistent classifications in MILLIQUAS and SIMBAD, splitting them according to their predicted class from the BHRF model. The distribution of features of these AGN and QSO candidates is similar to the general distributions observed for AGNs and QSOs from the labeled set. Again, AGNs tend to have higher variability amplitudes, redder colors, and extended morphologies.

From these results, we can infer that the decrease in the recall as a function of the number of detections observed for QSOs and AGNs is produced by the discrepancy in the QSO/AGN classification obtained from different catalogs. When more epochs are available, it is easier for the light curve classifier to perform a correct variability-based classification, and therefore to identify an originally incorrect classification provided by a given catalog. We propose that by using the variability properties of active galactic nuclei we can more easily separate them as nucleus-dominated or host-dominated, or in other words, as regular QSOs or low-luminosity AGNs.

E. SKY DISTRIBUTION OF THE EXTRAGALACTIC CANDIDATES

Figure 22 shows the sky distribution (in Galactic coordinates) of the extragalactic candidates (QSO, AGN, Blazar, SNIa, SNIbc, SNII, and SLSN). As expected, only a few extragalactic candidates are observed in the Galactic plane, which is confirmed in Figure 22.

REFERENCES
Abadi, M., Barham, P., Chen, J., et al. 2016, in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI)

Figure 21. Distribution of (g-r) mean_corr, Meanvariance_1, ExcessVar_1, sgscore1, Meanvariance_2, and ExcessVar_2, for sources from the labeled set with discrepancies in their classification according to MILLIQUAS and SIMBAD, classified as AGN (red) and QSO (blue) by the light curve classifier. The distribution of features is similar to that presented in Figure 20 for QSOs and AGNs.

Aguirre, C., Pichara, K., & Becker, I. 2019, MNRAS, 482, 5078, doi: 10.1093/mnras/sty2836
Allevato, V., Paolillo, M., Papadakis, I., & Pinto, C. 2013, ApJ, 771, 9, doi: 10.1088/0004-637X/771/1/9
Arévalo, P., Churazov, E., Zhuravleva, I., Hernández-Monteagudo, C., & Revnivtsev, M. 2012, MNRAS, 426, 1793, doi: 10.1111/j.1365-2966.2012.21789.x
Arnett, D. 2008, in American Institute of Physics Conference Series, Vol. 1053, ed. S. K. Chakrabarti & A. S. Majumdar, 237–242, doi: 10.1063/1.3009489
Astropy Collaboration, Robitaille, T. P., Tollerud, E. J., et al. 2013, A&A, 558, A33, doi: 10.1051/0004-6361/201322068
Astropy Collaboration, Price-Whelan, A. M., Sipőcz, B. M., et al. 2018, AJ, 156, 123, doi: 10.3847/1538-3881/aabc4f
Baldeschi, A., Miller, A., Stroh, M., Margutti, R., & Coppejans, D. L. 2020, arXiv e-prints, arXiv:2005.00155
Becker, I., Pichara, K., Catelan, M., et al. 2020, MNRAS, 493, 2981, doi: 10.1093/mnras/staa350
Bellm, E. 2014, in The Third Hot-wiring the Transient Universe Workshop, ed. P. R. Wozniak, M. J. Graham, A. A. Mahabal, & R. Seaman, 27–33, arXiv:1410.8185
Bellm, E. C., Kulkarni, S. R., Graham, M. J., et al. 2019, PASP, 131, 018002, doi: 10.1088/1538-3873/aaecbe
Ben-Israel, A., & Greville, T. N. E. 2003, Generalized Inverses: Theory and Applications, ISBN 0-387-00293-6, doi: 10.1007/b97366
Boone, K. 2019, AJ, 158, 257, doi: 10.3847/1538-3881/ab5182
Brahem, M., Zeitouni, K., & L., Y. 2020, ApJ
Breiman, L. 2001, Machine Learning, 45, 5, doi: 10.1023/A:1010933404324
Butler, N. R., & Bloom, J. S.
2011, AJ, 141, 93, doi: 10.1088/0004-6256/141/3/93
Calzetti, D., Armus, L., Bohlin, R. C., et al. 2000, ApJ, 533, 682, doi: 10.1086/308692
Figure 22. Sky distribution, in Galactic coordinates, of the extragalactic candidates (QSO, AGN, Blazar, SNIa, SNIbc, SNII, and SLSN).