A Two-Stage Deep Learning Detection Classifier for the ATLAS Asteroid Survey
Amandin Chyba Rabeendran (Applied Mathematics, Colorado School of Mines, 1500 Illinois St, Golden, CO 80401; [email protected]) and Larry Denneau (Institute for Astronomy, University of Hawai‘i, 2680 Woodlawn Drive, Honolulu, HI 96822; [email protected])
Keywords:
Convolutional neural networks, Asteroids, Sky surveys
Abstract
In this paper we present a two-step neural network model to separate detections of solar system objects from optical and electronic artifacts in data obtained with the "Asteroid Terrestrial-impact Last Alert System" (ATLAS), a near-Earth asteroid sky survey system [Tonry et al., 2018]. A convolutional neural network [Lieu et al., 2019] is used to classify small "postage-stamp" images of candidate detections of astronomical sources into eight classes, followed by a multi-layered perceptron that provides a probability that a temporal sequence of four candidate detections represents a real astronomical source. The goal of this work is to reduce the time delay between Near-Earth Object (NEO) detections and submission to the Minor Planet Center. Due to the rare and hazardous nature of NEOs [Harris and D'Abramo, 2015], a low false negative rate is a priority for the model. We show that the model reaches 99.6% accuracy on real asteroids in ATLAS data with a 0.4% false negative rate. Deployment of this model on ATLAS has reduced the number of NEO candidates that astronomers must screen by 90%, thereby bringing ATLAS one step closer to full autonomy.
From the Tunguska impactor in 1908 [Foschini et al., 2018] to the realization by Alvarez et al. that the [...]ost of the need for human screening.

Deep learning has provided astronomers with new tools to autonomously analyze astronomical sources through predictions based on a trained neural network [Baron, 2019]. Convolutional Neural Networks (CNNs) represent state-of-the-art accuracy when it comes to image classification [Khan et al., 2020]. We decided to employ a CNN over other deep learning architectures based on its excellent feature extraction capability. Additionally, most other deep learning models that work with 2D imagery, such as GANs, R-CNNs, and YOLO, use CNNs as a classifier (e.g. the discriminator in GANs) [Schawinski et al., 2017, Ren et al., 2015, Redmon et al., 2016]. Therefore a CNN provides a flexible backbone that we can use as a baseline or starting point for future image classification work.

We tested the model on a month of ATLAS tracklets from June 5 to July 5, 2020. We have deployed the model on ATLAS and it is currently being used by astronomers to screen NEO candidates on a daily basis. The deployed model is showing positive results while being improved upon based on astronomer feedback.

ATLAS telescopes have been in operation since 2017 on two mountaintops in Hawai‘i (Haleakalā and Maunaloa). Two additional ATLAS telescopes are under construction in the southern hemisphere, in South Africa and Chile. The current two-telescope ATLAS system gathers new data every night, on the order of 10^3 images resulting in over 0.5 TB of raw uncompressed data [Tonry et al., 2018]. Approximately 10^3 full-sized images are classified by the ATLAS processing system every night.

Every night when permitted by weather, each ATLAS telescope surveys a set of ∼
200 pre-defined locations that raster the night sky (each called a "footprint") and takes four 30-second exposures at each footprint over a span of ∼
30 minutes. The ATLAS image reduction pipeline [Tonry et al., 2018] subtracts a static-sky "template" image from each reduced image. The template image is created from thousands of historical ATLAS observations and represents the non-varying sky. When the static sky is subtracted, what remains are transient phenomena such as variable stars, supernovae, and, most importantly for ATLAS, moving objects: asteroids and comets.

After subtraction of the static sky template image, the ATLAS pipeline searches the subtracted image for astronomical sources with signal-to-noise ratio above a detection threshold and extracts 100×100-pixel "postage stamp" images of source detections from the subtracted images. The Moving Object Processing System (MOPS) pipeline works in celestial coordinate space and normally does not examine image data due to the computational cost of retrieving pixels for the large number of nightly detections. To integrate our model, we retrieve postage stamps for all tracklets that have been created; this produces a much smaller number of postage stamps that must be retrieved and classified, since most detections in an image do not form a tracklet. Most asteroids are faint and span a small portion of the 16-bit range of allowed pixel values; the postage stamps are therefore image-equalized for maximum contrast so that relevant features are amplified for the classifier.

Example tracklets produced by MOPS can be seen in Figure 1.

Figure 1: The first row shows streaked NEO 2020 NK1, representing a real tracklet. The second row shows another real tracklet containing asteroid (4700) Carusi. The third and fourth rows are false tracklets caused by bright pixels from optical diffraction spikes and readout optical artifacts, respectively. Black pixels represent saturated pixels caused by bright objects.
Figure 2: High-level diagram of ATLAS processing. For each image, the reference sky is registered and subtracted from the source image, leaving moving objects and other transient features. Then the pipeline identifies sources in each subtracted image and saves a catalog of candidate moving object locations. MOPS then links sources from exposure catalogs together to form tracklets. Image subtraction allows ATLAS to detect asteroids very close to bright stars and in the galactic plane.
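The contrast equalization applied to each postage stamp can be sketched as a plain histogram equalization. This is an illustrative reconstruction under stated assumptions: the paper does not give the exact stretch used by ATLAS, and the function name and 8-bit output depth are ours.

```python
import numpy as np

def equalize_stamp(stamp: np.ndarray, out_levels: int = 256) -> np.ndarray:
    """Histogram-equalize a 16-bit postage stamp so a faint source that
    spans only a narrow slice of the 65536-level dynamic range is
    stretched to maximum contrast for the classifier."""
    hist, _ = np.histogram(stamp.ravel(), bins=65536, range=(0, 65536))
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0].min()
    # Map each pixel value through the normalized cumulative histogram.
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * (out_levels - 1))
    lut = np.clip(lut, 0, out_levels - 1).astype(np.uint8)
    return lut[stamp]
```

The effect is that the faintest occupied level maps to 0 and the brightest to 255, regardless of how compressed the raw pixel values were.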
Real astronomical objects are either a solar system object, a variable star, or a point source that is indistinguishable from a slow-moving object. Tracklets can be categorized as real or bogus depending on whether or not the tracklet corresponds to a moving real astronomical object in the solar system. The bogus category can be divided into sub-categories which represent the majority of sources that cause a bogus tracklet. ATLAS produces its own metrics for every detection in a subtracted image; these metrics are used to cull the set of detections into probable real astronomical objects. This culling process still leaks many false detections into the MOPS asteroid processing, which can then turn into bogus tracklets.

To achieve the highest classification accuracy with a neural network, we used data spanning a wide variety of image types that are captured by the asteroid processing system every night. Types of images that contribute to a real or bogus category, and are used to train the neural network, are called classes, shown in Figure 3 and in Table 2. There are additional classes that contribute to bogus tracklets, such as optical ghosts (reflected bright sources), glints (reflections from internal optical elements), and effects from clouds; these have been excluded due to their rarity and the difficulty of establishing training data based on their non-uniform appearances.

Table 1 summarizes each of the data sets used to train and test our two-step deep learning detection classifier.

Data Set     Images   Tracklets  Training
Curated      3,500    No         Yes
Evaluation   250,000  Yes        Yes

Table 1:
Two different data sets consisting of ATLAS images were used to train and evaluate the network. Both data sets were used for training; however, the main goal of the evaluation data set was to test the model's robustness and consistency. The curated data set did not have access to each postage stamp's parent tracklet label, as listed in the Tracklets column; therefore the curated data set cannot be used to train a network to classify tracklets.
The ATLAS database containing images from real and bogus asteroid tracklets was used to generate a curated data set for neural network training. Images were selected based on an internal non-machine-learning software package called vartest05 [A. et al., 2020, in prep] that pre-classifies detected sources as one of the classes from Figure 3. Even though large numbers of bogus tracklets are parsed every day, the vartest05 classifications are not preserved in the processing, and therefore it is not trivial to separate detections into their subclasses for training. Additionally, the various subclasses do not occur equally (STREAK is much less common than SPIKE), so building a clean data set for these classes with uniform representation has resulted in a possibly smaller data set than ideal (see work in [Duev et al., 2019] for a CNN trained on a similar task with over 35,000 images).

The complete curated data set consisted of 3500 images, each associated with one of seven classes. Furthermore, we balanced the data set so that 500 images were associated with each class. The training set consisted of 470 of the 500 images associated with each class, while the rest of the images were used for validation.

The goal of this data set is to identify the class causing the ATLAS detection rather than the largest or most commonly appearing class. Images containing multiple different classes were manually removed from the training data to see if the network could learn the unique features of each class.

We separated the curated data set into a training set (94% of the data set) and a validation set (6% of the data set). The order of each set was randomly shuffled and, due to the small size of this curated data set, some minor data augmentation was used to increase class generalization. However, since object shape, sharpness, and magnitude are such a key part of class separation, image augmentations such as random blurring, resizing, and brightness jitter were not used.
Instead we used random horizon-tal/vertical image flipping, rotations, and cropping.
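These shape-preserving augmentations can be sketched in a few lines of NumPy. The crop size and the flip probabilities here are illustrative assumptions, not values taken from the ATLAS training code:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_stamp(stamp: np.ndarray, crop: int = 96) -> np.ndarray:
    """Randomly flip, rotate (by multiples of 90 degrees), and crop a
    square postage stamp. Blurring, resizing, and brightness jitter are
    deliberately absent: shape, sharpness, and magnitude carry class
    information and must be preserved."""
    if rng.random() < 0.5:
        stamp = np.flipud(stamp)               # vertical flip
    if rng.random() < 0.5:
        stamp = np.fliplr(stamp)               # horizontal flip
    stamp = np.rot90(stamp, k=int(rng.integers(0, 4)))  # 0/90/180/270 deg
    # Random crop to a crop x crop window.
    y = int(rng.integers(0, stamp.shape[0] - crop + 1))
    x = int(rng.integers(0, stamp.shape[1] - crop + 1))
    return stamp[y:y + crop, x:x + crop]
```

Because every operation is a flip, right-angle rotation, or crop, pixel values are never interpolated, so the photometric content of the stamp is untouched.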
Multiple completed ATLAS nights form an evaluation data set of real known, real unknown, and bogus tracklets. Known tracklets contain a pre-identified object that ATLAS has been following with Inter-Night linking [Denneau et al., 2013]. Unknown tracklets are made of objects that ATLAS cannot match with a known object. Real unknown and bogus tracklets were labeled by human experts, while the known tracklets are classified by Inter-Night linking. Known tracklets do not require human confirmation, but they are still useful as a reference for comparing the performance of the non-machine-learning pipeline to the deep learning approach.

Even though real and known tracklets share similar features, known tracklets have been successfully classified by the current non-machine-learning ATLAS pipeline while unknowns have not. However, they are not necessarily visually different, since known tracklets are identified by more than just their postage stamps (they are tracked from previous nights based on position, size, and rate of motion).

This data set consisted of two lunations of ATLAS observations from May 7 to July 5, 2020. The first lunation was used for training, and the second for validation. The training and validation data consisted of pre-labeled real and bogus tracklets. The real tracklets were further labeled as "known", meaning that a tracklet matched the position of an asteroid in the MPC catalog, or "unknown", meaning that the tracklet could not be matched to a catalogued object at the time of detection. ATLAS detects about 1000 times more known objects than unknown objects. The orbits for known objects are accurate enough that tracklets can be automatically assigned to a known object. Each lunation contained on the order of 200,000 tracklets (over 800,000 images), nearly all of which are real known tracklets. No image augmentations were applied to the tracklet postage stamps during training due to the larger size of the data set.
Since each tracklet can contain detections of different classes, we designed a two-step model consisting of an Image Classification Network (ICN) and a Tracklet Classification Network (TCN) to tackle image and tracklet classification respectively. Due to the high visual similarity between some class variations and faint asteroids (see Figure 5), this two-step model was prioritized over a single fully connected neural network. By having the network classify in terms of all eight classes, we can better understand how the network separates each class and which classes it commonly misclassifies as another. Unknown tracklets removed before human screening are essentially unobserved by ATLAS and could remain undetected by other asteroid surveys. For this reason, we prioritized the preservation of real tracklets over a lower false positive rate while designing and training the model.

- Burn (label BURN, bogus): vertical electronic readout artifacts caused by bright sources on the detector. Appearance: long vertical lines.
- Cosmic Ray (label CR, bogus): bright spots in the charge-coupled device (CCD) of the ATLAS camera due to electrons released in the detector silicon by a collision with a high-energy particle. CR artifacts are often very sharp because they only affect a region on the CCD much smaller than a point source imaged through the optical system. Appearance: small, sharp, and often non-circular shapes.
- Noise (label NOISE, bogus): detected as a source due to Poisson and readout noise distributions. Appearance: no distinguishable object at the position of detection (center of the image) compared to the area around it.
- Scar (label SCAR, bogus): residual bright pixels left over from an imperfect subtraction of a bright star from the sky template image. Appearance: clearly identifiable spots with highly contrasting color.
- Diffraction Spike (label SPIKE, bogus): caused by bright stars diffracted through the optical path. Appearance: dispersing lines appearing near extremely bright sources at predetermined angles (usually 45 degrees).
- Astronomical Object (label AST, real): a real astronomical object; mainly asteroids, variable stars, and satellites, but can also contain comets. Appearance: brighter than NOISE but usually fainter than CRs, and circularly shaped.
- Streak (label STREAK, real): real objects (asteroids or artificial satellites) that leave a linear trail on the image due to their motion during a ≈30-second exposure. Appearance: an elongated linear trail.

Table 2: Each class has a corresponding label and category which represents what kind of tracklet, real or bogus, the class contributes to. A short description of each class's origin and its visual appearance in postage stamps is given for each entry.
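The class-to-category mapping of Table 2 (together with the COMET class that appears later in the tracklet experiments) can be captured as a small lookup. The constant names and the majority-vote helper are our own illustrative sketch, not the actual ATLAS or TCN logic:

```python
# Eight detection classes and the real/bogus category each contributes to.
CLASSES = ["BURN", "CR", "NOISE", "SCAR", "SPIKE", "AST", "STREAK", "COMET"]

CATEGORY = {
    "BURN": "bogus", "CR": "bogus", "NOISE": "bogus",
    "SCAR": "bogus", "SPIKE": "bogus",
    "AST": "real", "STREAK": "real", "COMET": "real",
}

def tracklet_category(labels):
    """Naive stand-in for the TCN: call a tracklet real when a majority
    of its detections belong to a real class."""
    real = sum(CATEGORY[label] == "real" for label in labels)
    return "real" if real > len(labels) / 2 else "bogus"
```

A majority vote like this ignores the per-class confidence scores; the learned TCN described below replaces it with a trainable decision over all 32 scores.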
Figure 3: After considerable historical analysis of ATLAS detections, seven classes were chosen to represent all possible types of postage stamps in the curated data set. Each class has unique visual cues, but some variations make it indistinguishable from another class. For example, some faint AST can be impossible to distinguish from background noise without additional information.
Figure 4: In a convolutional neural network such as resnet-18, filters are applied to the original image (convolution layers) and resized (pooling layers) multiple times. The feature maps produced by the final pooling layer are flattened into a one-dimensional tensor and passed into a dense layer (linear layer), which produces an output for each class. The output for each class is a single number between zero and one which represents a confidence measurement of the detection being that class (the sum of all the output values is equal to one).
Figure 5: Images A and B show an example of two different classes that are visually indistinguishable from one another. Image A shows a faint main belt asteroid (117708) and image B shows a false detection caused by noise. By classifying each postage stamp in a tracklet with eight different classes, the network can associate error margins for each class based on the results of all the other classes.
The PyTorch library [Paszke et al., 2019] was used to generate the neural networks, and training was done with a single CUDA-enabled GPU to decrease training times. Each network's training parameters are summarized in Table 3.

Parameter      ICN      TCN
Optimizer      Adam     SGD
Criterion      CE       MSE
Learning rate  0.01     0.1
Data set       Curated  Evaluation
Batch size     15       1

Table 3:
The Data set row lists the inputs used to train each model, and the Criterion row lists the loss function used: Cross Entropy Loss (CE) or Mean Squared Error Loss (MSE). A decaying learning rate was used alongside early stopping to reduce over-fitting and training runtime.
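In PyTorch, the Table 3 settings can be sketched as follows. The two `nn.Linear` stand-ins, and the step size and decay factor of the learning-rate schedule, are assumptions (the paper specifies only "a decaying learning rate"):

```python
import torch
from torch import nn

# Stand-ins for the real networks; only the hyper-parameters below
# come from Table 3.
icn = nn.Linear(8, 8)    # placeholder for the resnet-18 ICN
tcn = nn.Linear(32, 1)   # placeholder for the MLP TCN

icn_optimizer = torch.optim.Adam(icn.parameters(), lr=0.01)
icn_criterion = nn.CrossEntropyLoss()   # CE; batch size 15

tcn_optimizer = torch.optim.SGD(tcn.parameters(), lr=0.1)
tcn_criterion = nn.MSELoss()            # MSE; batch size 1

# Decaying learning rate (halve every 10 epochs; values assumed).
icn_scheduler = torch.optim.lr_scheduler.StepLR(
    icn_optimizer, step_size=10, gamma=0.5)
```

Early stopping, mentioned in the caption, would sit in the training loop itself: track validation loss each epoch and halt when it stops improving.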
For the image classifier, we created a convolutional neural network (CNN) that takes an individual image and returns a set of eight confidence scores, one for each class. At first, we attempted to train a custom CNN, but simple tests with the curated data set showed limited performance when compared to a pre-trained resnet-18 network [He et al., 2015]. It is likely that the filters and convolutional layers (Figure 4) pre-trained on ImageNet contributed to the higher class accuracy [Russakovsky et al., 2014]. We fine-tuned the pre-trained resnet-18 by training it on the curated data set.

It is important that the classification runtime be kept as low as possible so that we can minimize processing latency. Therefore, we decided to employ a shallower network with fewer layers over a deeper network to help decrease computation time. We also theorized that the seemingly low visual complexity of the classes and the small number of classes would not reap the benefits of deeper networks. One reason for this assumption is a deep network's tendency to over-fit much faster during training than a simpler neural network. This approach has been used with success in recent research similar to ours [Duev et al., 2019]. Furthermore, training on the curated data set with multiple deep network architectures such as vgg-19, resnet-101, and densenet-121, among others, showed no accuracy gains [Simonyan and Zisserman, 2014, He et al., 2015, Huang et al., 2016].
To classify tracklets, we used a multi-layered perceptron (MLP) that takes the outputs of the ICN as inputs. MLPs are well known among feed-forward neural network architectures and represent one of the simplest deep learning architectures in common use. By stacking deeper layers of perceptrons (each of which takes multiple inputs and produces one output), more complex details and features can be extracted from the input. The MLP returns a new set of confidence scores for the tracklet as bogus or real; see Figure 6. Since an MLP takes an input of fixed size, only the class confidence scores of the first four postage stamps in a tracklet are provided to the TCN as inputs. Approximately 90% of tracklets contain at least four detections.

The MLP cannot be trained with the curated data set, since the postage stamps in that data set are not linked to a tracklet label. Instead, we used the evaluation data set, which only provides real and bogus as ground truth labels for each tracklet. The use of the evaluation data set for training removes the MLP's potential to classify tracklets by class rather than by category.

The confidence scores of some classes for a postage stamp will be more competitive than others (when the standard deviation of all the confidence scores is small). This is simply due to the fact that some class cases are practically indistinguishable from another class, and an image can contain several different classes. Additionally, we expect most postage stamps in a tracklet to return similar confidence outputs when passed through the ICN. However, it is possible for tracklets to contain radically different postage stamps, each with unique features. This could cause different confidence outputs for postage stamps in the same tracklet.
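A minimal PyTorch sketch of the TCN, together with a weighted MSE loss of the kind described next (the paper's chosen α values weight real tracklets 4x). The hidden width of 64 is our assumption; the paper does not give the MLP's layer sizes:

```python
import torch
from torch import nn

# TCN sketch: 4 postage stamps x 8 class scores = 32 inputs,
# one real/bogus confidence out (sigmoid keeps it in [0, 1]).
tcn = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Sigmoid(),
)

def weighted_mse(pred: torch.Tensor, truth: torch.Tensor,
                 alpha_bogus: float = 1.0, alpha_real: float = 4.0):
    """Category-weighted MSE: errors on real tracklets (truth == 1)
    are penalized 4x, biasing training against false negatives."""
    alpha = torch.where(truth == 1.0,
                        torch.full_like(truth, alpha_real),
                        torch.full_like(truth, alpha_bogus))
    return (alpha * (pred - truth) ** 2).mean()
```

With these weights, misclassifying a real tracklet as bogus contributes four times the loss of the mirror-image mistake, which is the stated design priority.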
Therefore, the goal of the TCN is to determine a set of rules over the confidence scores that maximizes correct tracklet classification.

To ensure a low false negative rate, we designed the neural network with a bias toward real tracklets using a simple weighted loss scheme. By assigning weights (α_0, α_1) to the bogus and real categories in the MSE loss function L̃, which takes the network's predictions x and the ground truth y, we can generate an imbalance/bias toward real tracklets:

    α(y) = α_0 if y is bogus, α_1 if y is real    (1)

    L(x, y) = α(y) · L̃(x, y)    (2)

with α_0 < α_1, where L is the weighted loss function. These adjustments come at the cost of lower accuracy on bogus tracklet classification. Multiple tests were required to identify an optimal balance minimizing both the false negative and false positive rates. We found that α_0 = 1 and α_1 = 4 provided the best results for increasing real tracklet classification accuracy.

Figure 6: Four postage stamps from a tracklet are passed into a CNN one by one and a total of 32 confidence scores are generated. These 32 values are passed into the Multi-Layered Perceptron (MLP) in the same order for each tracklet so that the network can learn the optimal weights for each input. A single value between zero and one is produced by the MLP, where 1 represents a real tracklet and 0 represents a bogus tracklet. This type of model structure allows us to train an MLP that takes more or fewer than four postage stamps for uncommon tracklets, as long as a training data set is provided.

We will first address the capability of the ICN by generating a confusion matrix based on its performance on the validation images in the curated data set; see Figure 7. It can be observed that the ICN best separates SCAR and SPIKE. The primary issue with our ICN, highlighted by the confusion matrix, is that about 13% of images in the STREAK class are misclassified as AST.
Given that STREAKs consist of NEOs, and small STREAKs are barely distinguishable from comets and other asteroids, it is expected that the ICN has difficulty differentiating between the two. However, since both the STREAK and AST classes represent image types from a real tracklet, this should not affect the overall accuracy of the model.

We trained the TCN on a month of ATLAS tracklets from May 5 to June 4, 2020. From Figure 8 it can be deduced that about 99.6% of real tracklets were correctly classified, while 90.8% of bogus tracklets were correctly classified. The bogus tracklet accuracy is lower than the real tracklet accuracy due to the weighted loss scheme highlighted in Section 3.2.2. Additionally, the ICN was trained on the curated data set, which contained more distinguishable artifacts and image types than the images in the evaluation data set. This issue is not as impactful for postage stamps from real tracklets, since the evaluation data set is heavily represented by tracklets the ATLAS detection pipeline classified as real, which are therefore similar to the AST, STREAK, and COMET type images in the curated data set. Also, there is generally more variation in the postage stamps from bogus tracklets than in the postage stamps from real tracklets containing asteroids and comets. All of these effects result in a lower bogus tracklet accuracy compared to the real tracklets.

Figure 7: Each class was tested on 30 standalone validation images from the curated data set. A confusion matrix using the validation images was generated at every epoch. The matrix seen above corresponds to the last epoch used to train the official model.
Figure 8: The TCN was tested on 30 nights of ATLAS data from June 5 to July 5, 2020. A total of 212,686 tracklets were classified from the evaluation data set. Our priority was to minimize the false negative rate, i.e. tracklets the model deems bogus but that are in reality real NEOs.
The Receiver Operating Characteristic (ROC) curve shown in Figure 9 helps us understand the cost of a higher true positive rate versus an increase in false positive rate. The ROC curve shows that at a false positive rate of 0.15 we can reach a 0.97-0.99 true positive rate. The ROC curve also shows that pushing the false positive rate above 0.1 gives diminishing returns in true positive rate. On the other hand, a threshold yielding a false positive rate of 0.02 or smaller will drastically reduce the true positive rate of the model, which is a higher priority for ATLAS than a slightly lower false positive rate. Finally, the 91% fraction of correctly predicted bogus tracklets leads to a significant reduction in the human workload of tracklet review, at a cost of 0.4% of incorrectly classified real objects.

Figure 9: The Receiver Operating Characteristic (ROC) curve of the TCN's performance on the evaluation data. When the model clearly distinguishes between true positives and false positives, the curve converges to 1 faster, which means the Area Under the Curve (AUC) approaches 1. The AUC for real tracklets is 0.992, which indicates that the model does a good job of separating real tracklets from bogus tracklets. The AUC for bogus tracklets is equivalent to the AUC for real tracklets because this is a binary classification problem.
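Each point on an ROC curve like this comes from sweeping a score threshold; a minimal NumPy helper (names ours) computes one operating point per threshold:

```python
import numpy as np

def roc_point(scores, labels, threshold):
    """True- and false-positive rates of the real/bogus decision at one
    threshold; sweeping the threshold traces out the ROC curve."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pred_real = scores >= threshold
    tpr = float(np.mean(pred_real[labels == 1]))  # real tracklets kept
    fpr = float(np.mean(pred_real[labels == 0]))  # bogus tracklets leaked
    return tpr, fpr
```

Lowering the threshold keeps more real tracklets (higher TPR) at the cost of leaking more bogus ones (higher FPR), which is exactly the trade-off discussed above.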
The high accuracy of the trained ICN and TCN was reproduced multiple times on the curated and evaluation data sets respectively, so that the model can be recreated in case any part of it is permanently lost.

A qualitative analysis of the TCN's predictions shows that although it consistently identifies the more prominent artifacts, it has some trouble with subtle cases involving noise, as seen in Figure 10 and Table 4.

Figure 10: Several example tracklets are shown above and their corresponding model predictions are listed in Table 4. The first four tracklets are the same as the ones from Figure 1. Tracklet 5 is main belt asteroid (17710) and tracklet 6 consists of false detections caused by noise. Both of these tracklets were hand-picked from the evaluation data set to show example tracklets that the deep learning model struggles to correctly classify.

In the unknown evaluation data set results, the two-step model never failed to classify tracklets such as tracklets 1 and 2 as real. Additionally, the model consistently classified prominent SPIKEs and BURNs, as displayed in tracklets 3 and 4. Most of the false negative results came from the ICN's inability to consistently differentiate NOISE from AST. In tracklet 5, an extremely faint astronomical object is positioned at the center of each image while surrounded by subtraction artifacts (SCAR-type artifacts). The ICN classifies most of the images as NOISE with high confidence scores, which results in the TCN classifying the tracklet as bogus. In tracklet 6, the ICN classifies most of the postage stamps as AST even though the detections are indistinguishable from Poisson noise.

The two-stage deep learning model is currently deployed on ATLAS as a primary filter on NEO candidates prior to human screening and submission to the MPC.
The model is currently undergoing preliminary inspection as it is compared to the NEO candidate list generated by ATLAS before the deep learning model removes bogus tracklets.

Tracklet  I        II       III      IV       Pred   Score  Truth
1         ST 0.99  ST 1.0   ST 1.0   ST 1.0   real   0.904  real
2         AS 0.99  AS 0.90  AS 0.99  AS 0.99  real   0.999  real
3         SP 0.66  SP 0.80  SP 0.71  SP 0.94  bogus  0.058  bogus
4         BU 0.78  BU 0.91  BU 0.48  BU 0.63  bogus  0.129  bogus
5         NO 0.63  AS 0.82  NO 0.92  AS 0.98  bogus  0.740  real
6         AS 0.96  NO 0.99  AS 0.85  AS 0.98  real   0.982  bogus

Table 4: Each row shows the ICN's highest scoring class for each individual image (columns I-IV) and the TCN's prediction for the tracklet. The output score for each classification is shown to the right of its label, and the ground truth for each tracklet is shown in the last column. Note that the prediction score differs from the other scores: it indicates a bogus tracklet when closer to zero and a real tracklet when closer to one. A threshold tracklet score of 0.8 is used to determine real versus bogus classification (e.g. a score of 0.6 is classified as bogus). The class labels were simplified to BU, CR, NO, SC, SP, ST, AS, CO.

Several weeks of deployment have shown encouraging results, with the model reducing the NEO candidate list by ≈
95% without losing any real tracklets.

Table 5 shows the real-world performance of the TCN against unknown asteroid candidates that must be reviewed prior to submission to the MPC. The false positives (column FP) are dominated by a single type of artifact (a bright horizontal optical effect caused by visible planets close to the field) that was unknown to the ICN for this work. Training the ICN on this additional image type would effectively remove this type of false tracklet, resulting in FP rates near 1% or less.

Comparison of the TCN with human screening is somewhat delicate. Because of the comprehensive job the human must perform, the TCN cannot screen as accurately as its human counterpart. By definition, human-screened tracklets are 100% correct, since they form the basis for the training sets. But for visual screening only, we find that the TCN performs very accurately for image types on which the ICN has been trained.

The human screening process also evaluates criteria beyond what the TCN was designed for. The TCN filters tracklets purely on visual appearance, i.e. imagery only, while humans screen both on visual appearance and on the dynamical parameters of a tracklet. Typical scenarios are a tracklet composed of detections from multiple variable stars, or two different real asteroids mis-linked into a single tracklet. In these cases a "fit" to the motion may produce plausible values, but further inspection of positional residuals against the fitted motion would show that at least one of the sources shows no actual motion and is consistent with a variable star. To a human this is a bogus but "real-looking" tracklet, while the TCN simply (and correctly) considers it "good", since the images show detections that are indistinguishable from asteroids.

A natural extension of our model would be the inclusion of these additional non-image parameters, or metadata, into an additional machine-learning network that complements the ICN and TCN.
ATLAS provides numerous parameters about every detection that would be suitable: X, Y location on the detector, proximity to bright stars and planets, proximity to known electronic artifacts, and so on. The flexibility of the model could allow us to feed image-specific metadata alongside the images in the ICN, while tracklet-specific metadata would be fed alongside the outputs of the ICN into the TCN. However, we have chosen not to explore this avenue in this work for two reasons: a) the number of available metadata parameters is large (on the order of dozens) and the effort to train and understand a metadata model would distract from the core of this work; and b) ATLAS already performs non-machine-learning classifications of detections with the metadata parameters using a code called vartest05, and it is our opinion that vartest05 is effective enough at pre-classifying detections that the available performance gains are minimal. vartest05 employs its own internal model of how a bright source can create a burn or diffraction spike, or what a cosmic ray might look like, and provides a score for the likelihood of being one of its known detection types. The set of detections seen by a human reviewer is thus pre-screened by vartest05; this screened set still contains many false detections, and that is our motivation for the ICN and TCN created for this work.

Night (MJD)  Unknown  Real  FP (%)  FN (%)
59090        188      26    6.4     0
59091        11       7     9.1     0
59092        79       8     1.3     0
59093        637      8     4.1     0
59094        185      11    3.3     0
59095        971      22    4.1     0
59096        47       16    2.1     0
59097        413      12    0.2     0
59098        29       16    3.4     0
59099        67       25    1.5     0
59100        134      63    2.2     0
59101        314      75    2.5     0

Table 5:
The deployed model's results were recorded from August 29, 2020 (MJD 59090) to September 9, 2020 (MJD 59101) using production ATLAS data to assess its performance. "Unknown" is the number of tracklets that could not be automatically matched with a known object, and "Real" is the number of these that were real astronomical objects based on visual inspection. "FP" is the false positive rate (bogus scored as real) and "FN" the rate of real tracklets scored as bogus. The variation in the number of tracklets each night is due to sky coverage, weather conditions, and the presence of sky features that can produce false tracklets (bright stars, planets). The false negative rate over this interval was zero, meaning no real tracklets were lost to misclassification.
We have designed and deployed a lightweight machine classification model that can accurately discriminate between real and bogus tracklets in the ATLAS asteroid detection pipeline. This model achieves a ∼
90% reduction in the workload of falsetracklets to review each night at a cost of 0.4% in realobjects. This improved accuracy is an essential step toward immediate, automatic submission of danger-ous asteroids to the MPC after they are detected byATLAS. The reduced latency between detection andreporting increases the ability of follow-up telescopesto track inbound asteroids because their positionswill be closer to their discovery positions and there-fore easier to observe. Perhaps more importantly, re-duced latency provides greater warning time for civildefense purposes in the case of an actual impact.The model’s performance is still being monitoredwith the goal of identifying specific areas (types ofincoming ATLAS data) in which the deep learn-ing model fails to be advantageous over full hu-man screening. We identified eight dominant imagetypes that appear in ATLAS tracklets, and unsur-prisingly the model performs poorly classifying track-lets with image types that the ICN was not trainedon. Since the model was deployed, we have identi-fied and started creating training sets for new classesto improve the model’s accuracy. One of these newclasses will address horizontal line artifacts generatedby very-bright astronomical sources such as the vis-ible planets. Early training results on these brighthorizontal features show that they can be effectivelyremoved completely, but this capability was not inte-grated in time for this work. Future user feedback onthe screening lists will allow us to iteratively improvethe accuracy and overall robustness of the model.
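The operating-point trade-off above (accepting a small false-negative rate in exchange for a large reduction in bogus tracklets to review) amounts to choosing a threshold on the real-bogus score. The sketch below uses synthetic scores as stand-ins for TCN outputs; only the 0.4% false-negative budget comes from the text, and none of the resulting numbers describe the actual ATLAS model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical real-bogus scores: real tracklets cluster near 1,
# bogus tracklets near 0. These distributions are illustrative only.
real_scores = rng.beta(8, 2, size=1000)   # stand-in for real tracklets
bogus_scores = rng.beta(2, 8, size=9000)  # stand-in for bogus tracklets

def pick_threshold(real, bogus, max_fnr=0.004):
    """Largest threshold whose false-negative rate stays within max_fnr,
    reporting the fraction of bogus tracklets screened out at that point."""
    best = None
    for t in np.linspace(0.0, 1.0, 1001):
        fnr = np.mean(real < t)            # real tracklets scored below t
        if fnr <= max_fnr:
            screened = np.mean(bogus < t)  # bogus removed from human review
            best = (t, fnr, screened)
    return best

t, fnr, screened = pick_threshold(real_scores, bogus_scores)
print(f"threshold={t:.3f}  FNR={fnr:.3%}  bogus screened={screened:.1%}")
```

Raising the threshold screens out more bogus tracklets but eventually starts discarding real ones; capping the false-negative rate first, then maximizing the screened fraction, mirrors the priority stated in the abstract.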
This work has made use of data from the Asteroid Terrestrial-impact Last Alert System (ATLAS) project. LD and ATLAS are primarily funded by NASA grants NN12AR55G, 80NSSC18K0284, and 80NSSC18K1575; byproducts of the ATLAS NEO search include images and catalogs from the survey area. The ATLAS science products have been made possible through the contributions of the University of Hawaii Institute for Astronomy, the Queen’s University Belfast, the Space Telescope Science Institute, and the South African Astronomical Observatory. This work was supported in part by a National Science Foundation Research Experiences for Undergraduates grant (6104374) to the Institute for Astronomy at the University of Hawaii at Manoa. We would like to thank Dr. Michael Bottom, Dr. Robert Jedicke, and Dr. Ben Shappee for their insightful comments and feedback.

References
A. Heinze, L. Denneau, and J. L. Tonry. Algorithms for real/bogus filtering of ATLAS asteroid and transient detections. 2020, in prep.

Luis W. Alvarez, Walter Alvarez, Frank Asaro, and Helen V. Michel. Extraterrestrial cause for the Cretaceous–Tertiary extinction. Science, 208:1095–1108, 1980.

Dalya Baron. Machine learning in astronomy: a practical overview, 2019.

Minor Planet Center. https://minorplanetcenter.net/mpec/K19/K19O56.html, 2019 OK, MPEC 2019-O56. Online; accessed 29 January 2014.

Larry Denneau, Robert Jedicke, Tommy Grav, Mikael Granvik, Jeremy Kubica, Andrea Milani, Peter Vereš, Richard Wainscoat, Daniel Chang, Francesco Pierfederici, et al. The Pan-STARRS Moving Object Processing System. Publications of the Astronomical Society of the Pacific, 125(926):357–395, April 2013. ISSN 1538-3873. doi: 10.1086/670337. URL http://dx.doi.org/10.1086/670337.

Dmitry A. Duev, Ashish Mahabal, Quanzhi Ye, Kushal Tirumala, Justin Belicki, Richard Dekany, Sara Frederick, Matthew J. Graham, Russ R. Laher, Frank J. Masci, Thomas A. Prince, Reed Riddle, Philippe Rosnet, and Maayane T. Soumagnac. DeepStreaks: identifying fast-moving objects in the Zwicky Transient Facility data with deep learning. Monthly Notices of the Royal Astronomical Society, 486(3):4158–4165, April 2019. ISSN 0035-8711. doi: 10.1093/mnras/stz1096. URL https://doi.org/10.1093/mnras/stz1096.

L. Foschini, L. Gasperini, C. Stanghellini, R. Serra, A. Polonia, and G. Stanghellini. The atmospheric fragmentation of the 1908 Tunguska cosmic body: reconsidering the possibility of a ground impact, 2018.

Alan W. Harris and Germano D’Abramo. The population of near-Earth asteroids. Icarus, 257:302–312, September 2015. URL https://ui.adsabs.harvard.edu/abs/2015Icar..257..302H.

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition, 2015.

A. N. Heinze, L. Denneau, J. L. Tonry, H. Weiland, B. Stalder, A. Rest, K. W. Smith, and S. J. Smartt. NEO population, velocity bias, and impact risk from an ATLAS analysis. 2020, in prep.

Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger. Densely connected convolutional networks, 2016.

Asifullah Khan, Anabia Sohail, Umme Zahoora, and Aqsa Saeed Qureshi. A survey of the recent architectures of deep convolutional neural networks. Artificial Intelligence Review, April 2020. ISSN 1573-7462. doi: 10.1007/s10462-020-09825-6. URL http://dx.doi.org/10.1007/s10462-020-09825-6.

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1097–1105. Curran Associates, Inc., 2012. URL http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf.

Maggie Lieu, Luca Conversi, Bruno Altieri, and Benoît Carry. Detecting solar system objects with convolutional neural networks. Monthly Notices of the Royal Astronomical Society, 485(4):5831–5842, March 2019. ISSN 1365-2966. doi: 10.1093/mnras/stz761. URL http://dx.doi.org/10.1093/mnras/stz761.

Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. PyTorch: an imperative style, high-performance deep learning library. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32, pages 8024–8035. Curran Associates, Inc., 2019. URL http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf.

Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You Only Look Once: unified, real-time object detection, 2016.

Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: towards real-time object detection with region proposal networks. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 91–99. Curran Associates, Inc., 2015. URL http://papers.nips.cc/paper/5638-faster-r-cnn-towards-real-time-object-detection-with-region-proposal-networks.pdf.

Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge, 2014.

Kevin Schawinski, Ce Zhang, Hantian Zhang, Lucas Fowler, and Gokula Krishnan Santhanam. Generative adversarial networks recover features in astrophysical images of galaxies beyond the deconvolution limit. Monthly Notices of the Royal Astronomical Society: Letters, page slx008, January 2017. ISSN 1745-3933. doi: 10.1093/mnrasl/slx008. URL http://dx.doi.org/10.1093/mnrasl/slx008.

Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition, 2014.

J. L. Tonry, L. Denneau, A. N. Heinze, B. Stalder, K. W. Smith, S. J. Smartt, C. W. Stubbs, H. J. Weiland, and A. Rest. ATLAS: a high-cadence all-sky survey system. Publications of the Astronomical Society of the Pacific, 130(988):064505, May 2018. ISSN 1538-3873. doi: 10.1088/1538-3873/aabadf. URL http://dx.doi.org/10.1088/1538-3873/aabadf.