Automated identification of transiting exoplanet candidates in NASA Transiting Exoplanet Survey Satellite (TESS) data with machine learning methods
Leon Ofman a,b,∗, Amir Averbuch c,d, Adi Shliselberg d, Idan Benaun d, David Segev d, Aron Rissman d

a Dept. of Physics, Catholic University of America, Washington, DC 20064, USA
b NASA GSFC, Code 671, Greenbelt, Maryland 20771, USA
c School of Computer Science, Tel Aviv University, Tel Aviv, Israel
d ThetaRay, 8 Hanagar Street, Hod HaSharon, Israel
Abstract
A novel artificial intelligence (AI) technique that uses machine learning (ML) methodologies and combines several algorithms developed by ThetaRay, Inc. is applied to NASA's Transiting Exoplanet Survey Satellite (TESS) dataset to identify exoplanetary candidates. The AI/ML ThetaRay system is trained initially with Kepler exoplanetary data and validated with confirmed exoplanets before its application to TESS data. Existing and new features of the data, based on various observational parameters, are constructed and used in the AI/ML analysis by employing semi-supervised and unsupervised machine learning techniques. By the application of the ThetaRay system to 10,803 light curves of threshold crossing events (TCEs) produced by the TESS mission, obtained from the Mikulski Archive for Space Telescopes, we uncover 39 new exoplanetary candidate (EPC) targets. This study demonstrates for the first time the successful application of combined multiple AI/ML-based methodologies to a large astrophysical dataset for rapid automated classification of EPCs.

Keywords: Exoplanet detection methods - Transit photometry - Computational Methods - Machine Learning

∗ Corresponding author

Preprint submitted to New Astronomy, February 23, 2021
1. Introduction
The Transiting Exoplanet Survey Satellite (TESS) (Ricker et al., 2014) was launched by NASA on April 18, 2018 with the primary objective of an all-sky survey of more than 200,000 nearby stars in search of transiting exoplanets using high-precision photometry, producing light curves with a 2-minute cadence. The TESS Objects of Interest (TOI) have been released periodically and archived at the Mikulski Archive for Space Telescopes (MAST, https://archive.stsci.edu/). The TOI list includes planetary candidates, as well as potential planetary candidates and other astrophysical targets, including false positives, comprising the database used for searching for confirmed exoplanets. As of March 23, 2020, TESS has released 1766 TOIs with 43 confirmed planets and 412 false positives (see https://tess.mit.edu/publications/).

Previously, the Kepler Space Telescope, launched by NASA in 2009, was designed to determine the occurrence frequency of Earth-sized planets. Towards this objective, Kepler observed about 200,000 stars with high photometric precision, discovering thousands of transiting exoplanets and exoplanetary candidates (Borucki et al., 2010; Jenkins et al., 2010a; Koch et al., 2010; Christiansen et al., 2012). During the prime mission (2009 May 2 - 2013 May 21), Kepler pointed at a single field of view of about 115 square degrees in the constellations of Cygnus and Lyra. The many periodic signals detected by Kepler were processed using the Kepler Science Processing Pipeline (Jenkins et al., 2010b) and assembled into a database of threshold crossing events (TCEs). Direct human input was required to remove false positives and instrumental effects from this database. However, the resulting TCE database contains data produced by many possible sources, such as eclipsing binaries, background eclipsing binaries and many other possible false alarm sources, in addition to a small fraction of exoplanetary candidates (EPCs), and still requires considerable analysis for confirmed identification of exoplanets.

Recently, Shallue and Vanderburg (2018) identified transiting exoplanets in Kepler satellite data using a Deep Learning (DL) algorithm based on training convolutional neural networks with the Google-Vizier system (Golovin et al., 2017). Shallue and Vanderburg (2018) trained the neural networks to classify whether a given light curve signal is a signature of a transiting exoplanet, with a low false positive rate. Using their algorithm, they identified multi-planet resonant chains around Kepler-80 and Kepler-90. Later, the extended Kepler K2 mission, which started in Nov. 2013, was designed to use the remaining Kepler capabilities after the completion of the prime mission, following the technical failures of the reaction wheels. During this observation phase the photometric accuracy was reduced, and the pointing varied in different regions of the sky. Nevertheless, Dattilo et al. (2019) used a similar automated technique, based on the Shallue and Vanderburg (2018) study, applied to K2 mission data, identifying two previously unknown exoplanets.

Automated classification methods for transiting exoplanets in TESS data have been developed using machine learning (ML) techniques in several studies (e.g., Ansdell et al., 2018; Zucker and Giryes, 2018; Yu et al., 2019; Osborn et al., 2020) that demonstrate the usefulness and feasibility of this approach with various degrees of improved classification performance.
In this paper, we describe an application of novel algorithms that combine several ML approaches and low rank matrix decomposition, including algorithms that identify anomalies in high dimensional big data using an augmentation approach. This method, which utilizes semi-supervised and unsupervised learning, was developed by ThetaRay, Inc. (https://thetaray.com/) for uncovering financial crimes and for cyber and Internet of Things (IoT) security, and was applied here to the search for transiting EPCs reported in this study. Using Kepler data with confirmed exoplanets for the algorithm training phase and validation, the ThetaRay platform was applied to TESS data, yielding 39 new EPCs out of nearly 11,000 TCEs and demonstrating the feasibility and utility of this new platform.

The paper is organized as follows: Section 2 discusses the ML methods, Section 3 presents the resulting exoplanet classification in TESS data, and Section 4 contains the discussion and conclusions. Details of the ThetaRay algorithms are described in the Appendix.
2. Machine Learning Methods
2.1. ThetaRay Algorithm
In the present study we utilize ThetaRay AI-based fintech algorithms, commercially developed for anomaly detection (financial crimes) in financial institutions, cyber security, and IoT for the smooth operation of critical infrastructure installations. Since transiting-exoplanet light curves are rare and appear in only a small fraction of all observed Kepler or TESS stellar light curves, they are classified as 'anomalies' in our analysis, and the ThetaRay system utilizes the strengths of its algorithms to identify transiting EPCs in the large number of TCEs. To identify these 'anomalies', or exoplanet light curves, ThetaRay's algorithms generate a data-driven 'normal' profile of the ingested data and simultaneously identify anomalies, also called abnormal events, providing forensics that categorize each event based on its features. This is done autonomously by the algorithm, without the need for rules or signatures. ThetaRay's algorithmic engine utilizes techniques drawn from a wide variety of mathematical disciplines, such as harmonic analysis, diffusion geometry and stochastic processing, low rank matrix decomposition, randomized algorithms in general and randomized linear algebra in particular, geometric measure theory, manifold learning, neural networks/deep learning, and compact representation by dictionaries. One approach models the data as a diffusion process, using the Brownian motion of a random walk process to geometrize the data. There is no need for any semantic understanding of the processed data, nor are there any predefined rules, heuristics or weights in the system. The diffused collected dataset is then converted into a Markov matrix through a normalized graph-Laplacian and modeled as a stochastic process that is applied in many dimensions (possibly thousands) - see the Appendix for additional details of the algorithms.

2.2. Kepler Satellite Data ML Training
We have focused on light curves produced by the Kepler space telescope, which collected the light curves of ∼200,000 stars; we used the Kepler TCE catalog from the NASA Exoplanet Archive (https://exoplanetarchive.ipac.caltech.edu/). We obtained the TCE labels from the catalog's "av_training_set" column, which has three possible values: planet candidate (PC), astrophysical false positive (AFP) and non-transiting phenomenon (NTP). We ignored TCEs with the "unknown" label (UNK). These labels were produced by manual vetting and other diagnostics. We obtained additional data on the TCEs, such as the planet number, the radius of the planet, and the interval between consecutive planetary transits, from the MAST TESS archive (https://archive.stsci.edu/missions-and-data/transiting-exoplanet-survey-satellite-tess) for data labeling and use in our analysis.

2.2.1. Features

Feature engineering is the process of using data domain knowledge to create features by manipulating the data through mathematical and statistical relations (for examples, see section 2.2.4) of the various components, in order to improve the performance of the AI/ML algorithms. The feature engineering process includes deciding which features to develop, creating the features, checking how the features work with the model, improving the features as needed, and going back to deciding on or creating additional data features until the ML/AI algorithm results are optimized. We applied the feature engineering process to our dataset and created new features, in addition to the existing features available in MAST, in order to provide more information quantifying various aspects of the data used by the AI/ML algorithm in the present analysis. We produced a total of 424 features that were used for the analysis. We chose the combination of features that provided the best results under the capabilities of ThetaRay's system, validated in the training step. In the feature engineering process, we tested the effectiveness of different combinations of features under the limits of ThetaRay's system.
2.2.2. Description of the Variables and Labels

Additional TCE data were downloaded from MAST. We narrowed the data down to only the fields required for the present task, such as the planet number, the radius of the planet, and the interval between consecutive planetary transits, and selected the relevant data from all the fields in "Data Columns in the Kepler TCE Table" (https://exoplanetarchive.ipac.caltech.edu/docs/API_tce_columns.html) using visualization of the variables (especially KDE plots, see below). Below is the description of the variables and labels used in our analysis.

• Unique key - concatenation of Kepler ID and Planet Number. Kepler ID is a target identification number, as listed in the Kepler Input Catalog (KIC). The KIC was derived from a ground-based imaging survey of the Kepler field conducted prior to launch. The survey's purpose was to identify stars for the Kepler exoplanet survey by magnitude and color. The full catalog of 13 million sources can be searched at the MAST archive. The subset of 4 million targets found on the Kepler CCDs can be searched via the Kepler Target Search form.
  – Kepler Input Catalog (KIC) (Brown et al., 2011).
  – MAST archive - http://archive.stsci.edu/kepler/kic10/search.php.
  – Kepler Target Search form - http://archive.stsci.edu/kepler/kepler_fov/search.php.

• av_training_set - Autovetter Training Set Label. If the TCE was included in the training set, the training label encodes what is believed to be the "true" classification, and takes a value of either PC, AFP or NTP. The TCEs in the UNKNOWN class sample are marked UNK. Training labels are given a value of NULL for TCEs not included in the training set. For more detail about how the training set is constructed, see the Autovetter Planet Candidate Catalog for Q1-Q17 Data Release 24 (KSCI-19091): https://exoplanetarchive.ipac.caltech.edu/docs/KSCI-19091-001.pdf.

• tce_prad - Planetary Radius (Earth radii). The radius of the planet, obtained from the product of the planet-to-stellar radius ratio and the stellar radius.

• tce_max_mult_ev - Multiple Event Statistic (MES). The maximum calculated value of the MES. TCEs that meet the maximum MES threshold criterion and other criteria listed in the TCE release notes are delivered to the Data Validation (DV) module of the data analysis pipeline for transit characterization and the calculation of the statistics required for disposition. A TCE exceeding the maximum MES threshold is removed from the time-series data and the SES and MES statistics are recalculated. If a second TCE exceeds the maximum MES threshold, it is also propagated through the DV module, and the cycle is iterated until no more events exceed the criteria. Candidate multi-planet systems are found this way. Users of the TCE table can exploit the maximum MES statistic to help filter and sort samples of TCEs for the purposes of discerning the event quality, determining the likelihood of planet candidacy, or assessing the risks of observational follow-up.
  – DV module - http://archive.stsci.edu/kepler/manuals/KSCI-19081-001_Data_Processing_Handbook.pdf

• tce_period - Orbital Period (days). The interval between consecutive planetary transits.

• tce_time0bk - Transit Epoch (BJD) - 2,454,833.0. The time corresponding to the center of the first detected transit in Barycentric Julian Day (BJD), minus a constant offset of 2,454,833.0 days. The offset corresponds to 12:00 on Jan 1, 2009 UTC.

• tce_duration - Transit Duration (hrs). The duration of the observed transits.
Duration is measured from first contact between the planet and star until last contact. Contact times are typically computed from a best-fit model produced by a Mandel and Agol (2002) model fit to a multi-quarter Kepler light curve, assuming a linear orbital ephemeris.

• tce_model_snr - Transit Signal-to-Noise (SNR). Transit depth normalized by the mean uncertainty in the flux during the transits.

• av_pred_class - Autovetter Predicted Classification. Predicted classifications, which are the 'optimum MAP classifications.' Values are either PC, AFP, or NTP.

• tce_depth - Transit Depth (ppm). The fraction of stellar flux lost at the minimum of the planetary transit. Transit depths are typically computed from a best-fit model produced by the Mandel and Agol (2002) model fit to a multi-quarter Kepler light curve, assuming a linear orbital ephemeris.

• tce_impact - Impact Parameter. The sky-projected distance between the center of the stellar disc and the center of the planet disc at conjunction, normalized by the stellar radius.

• local_view - vector of length 201: a 'local view' of the TCE. It shows the shape of the transit in detail (close-up of the transit event).

2.2.3. Visualization of Kepler Data

We investigated the Kepler data and visualized the variables with the Pandas package in Python. For example, we visualized the distributions of the numerical variables per class using KDE (Kernel Density Estimation) plots. In Figure 1 we show several interesting examples with a gap between the curves labeled 'Planets' and 'Not planets', as identified by the ThetaRay system and validated against the Kepler training set. It can be concluded that these features are significant for candidate exoplanet identification, and we therefore included them in the model. If both curves coincide, it can be concluded that the behavior is the same for the labels 'planets' and 'not planets', and we therefore chose not to include such features in the model. A sketch of these visual checks is given after Figure 1.

Another example of our analysis is demonstrated by the 'heat map', which is basically a color-coded matrix, where the correlation value between a pair of features is used to color each cell of the matrix, representing the relative value of that cell. If there is a high correlation between any variables, the dimension of the data can be reduced. The various features are labeled on the axes. Naturally, the cells on the main diagonal, which indicate identity correlation, are light colored. It is evident from the 'heat map' shown in Figure 2 that most off-diagonal features are weakly correlated. The only significant off-diagonal correlation is between av_training_set - the training labels, i.e., the label encoding what is believed to be the "true" classification if the TCE was included in the training set - and av_pred_class - the predicted classifications, which are the optimum MAP (maximum a posteriori) classifications. In fact, this field does not provide analysis information for the data but is used as a forensic feature. The forensic features are not used in the detection itself.
Figure 1: The distributions of the numerical variables using KDE (Kernel Density Estimation) plots, where the blue curves are labeled 'Planet' and the orange curves are labeled 'Not a planet', from Kepler data. When there is a significant difference between the curves, it can be concluded that these features are more significant for planet identification, and we therefore included them in the model. If both curves coincide, it can be concluded that the behavior is not statistically different between the two populations. The plotted variables are (a) tce_period, (b) tce_duration, (c) tce_time0bk, (d) tce_model_snr (see text for their definitions).
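For concreteness, the following is a minimal sketch (not the authors' production code) of the two visual checks described above, assuming a pandas DataFrame loaded from a hypothetical kepler_tce.csv with the column names used in this section:

```python
import pandas as pd
import matplotlib.pyplot as plt

tce = pd.read_csv("kepler_tce.csv")                  # hypothetical file name
tce["is_planet"] = tce["av_training_set"].eq("PC")   # PC vs. AFP/NTP

# Per-class KDE plots (as in Figure 1): a visible gap between the two curves
# suggests the feature separates planets from non-planets and is worth keeping.
for col in ["tce_period", "tce_duration", "tce_time0bk", "tce_model_snr"]:
    fig, ax = plt.subplots()
    tce.loc[tce["is_planet"], col].plot.kde(ax=ax, label="Planet")
    tce.loc[~tce["is_planet"], col].plot.kde(ax=ax, label="Not a planet")
    ax.set_title(col)
    ax.legend()

# Correlation 'heat map' (as in Figure 2): strongly correlated off-diagonal
# pairs are candidates for dimension reduction.
corr = tce.select_dtypes("number").corr()
plt.matshow(corr.to_numpy())
plt.colorbar()
plt.show()
```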
2.2.4. New Features

New features were developed, based on the original Kepler dataset obtained from MAST, to optimize the analysis with the ThetaRay algorithm. These features were constructed from the original dataset as described below, using the phase-folded "Local View" and "Global View" light curves (see, e.g., Shallue and Vanderburg, 2018).

• global_view - the original vector of length 2001, a 'global view' of the TCE that shows the characteristics of the light curve over an entire orbital period. Because of the size limitations of ThetaRay's system, we performed dimension reduction: we represented groups of 20 columns of the 'global view' by computing the average and the standard deviation of those columns, for a total of 200 new "global view" features.

• spline_bkspace - the break-point spacing in time units used for the best-fit spline. We chose the optimal spacing of spline breakpoints for each light curve by fitting splines with different breakpoint spacings, calculating the Bayesian Information Criterion (BIC, Schwarz (1978)) for each spline, and choosing the breakpoint spacing that minimized the BIC.

Figure 2: The 'heat map' of some of the features (or parameters) used in the ThetaRay algorithm. The intensity scale indicates the magnitude of the correlation between the features, which facilitates determining the dimensionality of the dataset (see text).

Below is a brief description of the new features that were computed for each TCE "Global View" and "Local View" light curve (a sketch of these computations follows the list):

• loc_mean - average of the "Local View" light curve.
• loc_std - standard deviation of the "Local View" light curve.
• loc_25% - 25th percentile of the "Local View" light curve.
• loc_75% - 75th percentile of the "Local View" light curve.
• loc_max - maximum value of the "Local View" light curve.
• glob_mean - average of the original "Global View" light curve.
• glob_std - standard deviation of the original "Global View" light curve.
• glob_25% - 25th percentile of the original "Global View" light curve.
• glob_75% - 75th percentile of the original "Global View" light curve.
• glob_max - maximum value of the original "Global View" light curve.
• zScore_loc_min - minimum value of the Z-score of the "Local View" light curve with a window of 10.
• zScore_loc_max - maximum value of the Z-score of the "Local View" light curve with a window of 10.
• zScore_glob_min - minimum value of the Z-score of the "Global View" light curve with a window of 100.
• zScore_glob_max - maximum value of the Z-score of the "Global View" light curve with a window of 100.
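The following is a minimal numpy sketch of these computations, under stated assumptions: the exact grouping of the 2001 'global view' points into 100 groups of 20 and the precise rolling Z-score convention used in ThetaRay's pipeline are not public, so both are illustrative.

```python
import numpy as np

def reduce_global_view(global_view, group=20):
    """Mean and std over consecutive groups of `group` columns.

    2000 of the 2001 'global view' points -> 100 groups -> 200 features
    (dropping the last point to get an even grouping is an assumption)."""
    v = np.asarray(global_view, dtype=float)[:2000]
    blocks = v.reshape(-1, group)                    # shape (100, 20)
    return np.concatenate([blocks.mean(axis=1), blocks.std(axis=1)])

def windowed_zscore_extrema(curve, window):
    """Min/max of a rolling Z-score (zScore_loc_* with window=10,
    zScore_glob_* with window=100); the exact convention is assumed."""
    c = np.asarray(curve, dtype=float)
    scores = []
    for i in range(len(c) - window + 1):
        w = c[i:i + window]
        if w.std() > 0:
            scores.append((w[-1] - w.mean()) / w.std())
    scores = np.array(scores)
    return scores.min(), scores.max()

local_view = np.random.randn(201)                    # stand-in for a real TCE curve
features = {
    "loc_mean": local_view.mean(),
    "loc_std": local_view.std(),
    "loc_25%": np.percentile(local_view, 25),
    "loc_75%": np.percentile(local_view, 75),
    "loc_max": local_view.max(),
}
features["zScore_loc_min"], features["zScore_loc_max"] = \
    windowed_zscore_extrema(local_view, window=10)
```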
2.2.5. Working on ThetaRay's System

We built in the ThetaRay platform an "analysis chain", a multi-staged flowchart composed of three main stages: Data Source, Data Frame and Analysis. The data is organized into data sources that are uploaded to ThetaRay's platform. We created data frames in the system with a wrangling method (data wrangling is the process of cleaning, structuring and enriching raw data into a desired format, with the intent of making it more appropriate and valuable for modeling) and split the data randomly in the ThetaRay system such that 80% is allocated for training and 20% for testing. The training procedure generates a profile that is fed into different types of analyses using the ThetaRay augmented and unsupervised algorithms, to find the best parameters that maximize the Area Under the ROC Curve (AUC) in each chain, where the ROC (Receiver Operating Characteristic) curve is a standard evaluation metric for testing a classification model's performance. After the analysis and review of these results were completed, the data was processed again after modification and fine tuning of the internal parameters of the system, to improve the results, and the identification was then executed again. A sketch of this split-and-score step is given below.
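The following minimal sketch uses scikit-learn's IsolationForest as a stand-in for ThetaRay's proprietary detection chain; the 80/20 split and the AUC metric follow the text, while everything else (data, model choice) is illustrative:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 424))                 # stand-in for the 424 features
y = (rng.random(1000) < 0.05).astype(int)        # rare 'anomalies' (planet candidates)

# 80% train / 20% test, as in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0)

model = IsolationForest(random_state=0).fit(X_train)
scores = -model.score_samples(X_test)            # higher score = more anomalous
print("AUC:", roc_auc_score(y_test, scores))     # the quantity maximized per chain
```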
We obtained 10,803 light curves of TCEs produced by the TESS mission from MAST (http://archive.stsci.edu/). We wanted to use the same model we built based on Kepler's data in order to find potential exoplanets (anomalies) in the new data from TESS. To use the same models for the two different satellites, we had to convert the TESS data to the same structure as the Kepler data. Therefore, we performed additional steps to prepare the light curves to be used as inputs to our system. We generated a set of TFRecord files for the TCEs. Each file contains global_view, local_view and spline_bkspace representations, as in Kepler. We also created in Python the following data files:

• global_view - vector of length 2001 that shows the characteristics of the light curve over an entire orbital period.

• local_view - vector of length 201 that shows the shape of the transit in detail (phase-folded close-up of the transit event).

• more_features - includes
  – ticid - TESS ID of the target star.
  – planetNumber - TCE number within the target star.
  – planetRadiusEarthRadii - same meaning as tce_prad in Kepler data.
  – spline_bkspace.
  – mes - same meaning as tce_max_mult_ev in Kepler data.
  – orbitalPeriodDays - same meaning as tce_period in Kepler data.
  – transitEpochBtjd - same meaning as tce_time0bk in Kepler data.
  – transitDurationHours - same meaning as tce_duration in Kepler data.
  – transitDepthPpm - same meaning as tce_depth in Kepler data.
  – minImpactParameter - same meaning as tce_impact in Kepler data.

TESS data is unlabeled, so the av_training_set and av_pred_class fields do not exist in the TESS data; we therefore filled these fields with zeros. The tce_model_snr feature exists in the Kepler data but not in the TESS data, so we calculated its value as the ratio of transitDepthPpm to transitDepthPpm_err.

• Describe files - include count, mean, std, min, max, 25th percentile, median (50%), and 75th percentile. These quantities were computed on each original data row of the global_view and local_view files, and on each scaled row of these files.

Following the generation of the dataset in the form of Comma Separated Values (CSV) files, we applied the same manipulation to global_view as for the Kepler data in order to reduce the dimensions, and used the analogous 424 features produced from the TESS data, as for the Kepler data, for the analysis on
ThetaRay's system. Following this step, we applied the Detection algorithm to the TESS data according to the saved model from Kepler, and used the results for the classification and mapping of the TESS light-curve TCE data.
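The TESS-to-Kepler field conversion described above can be sketched as follows; the DataFrame, file name and the spelling of the error column are assumptions, while the mapping itself follows the list in the text:

```python
import pandas as pd

tess = pd.read_csv("tess_tce.csv")               # hypothetical file name

# TESS -> Kepler field names, following the list above.
rename_map = {
    "planetRadiusEarthRadii": "tce_prad",
    "mes": "tce_max_mult_ev",
    "orbitalPeriodDays": "tce_period",
    "transitEpochBtjd": "tce_time0bk",
    "transitDurationHours": "tce_duration",
    "transitDepthPpm": "tce_depth",
    "minImpactParameter": "tce_impact",
}
kepler_like = tess.rename(columns=rename_map)

# TESS data are unlabeled: the label fields are filled with zeros.
kepler_like["av_training_set"] = 0
kepler_like["av_pred_class"] = 0

# tce_model_snr is absent from TESS data; it is approximated by the ratio of
# the transit depth to its uncertainty (error-column spelling is assumed).
kepler_like["tce_model_snr"] = (
    tess["transitDepthPpm"] / tess["transitDepthPpm_err"])
```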
3. Results: Transiting Exoplanet Detection
The first results of the ThetaRay algorithm produced around 90 preliminary identifications of EPCs that were further manually vetted, reducing the number of confirmed EPCs by about a factor of two. Local view light curves were used together with the planetary candidate parameters to vet the algorithm's output. In the manual vetting, physical indicators such as non-typical 'local view' light curves (i.e., v-shapes and other non-planetary periodic features), extremely large planetary radius, and very low signal-to-noise were used. The parameters of the remaining 39 EPCs identified by the ThetaRay system from the TESS database of 10,803 TCEs are given in Table 1. In Figure 3 we show the "local view" light curves of eight selected exoplanetary candidates identified using the
ThetaRay algorithm. The TESS Input Catalog ID number (TIC ID), along with several parameters (tce_prad, tce_period, tce_depth, defined in section 2.2.2), are indicated on each panel for the identified EPCs. Of the 39 validated cases, only two have planetary radius (tce_prad) of r_p < R_Earth (TIC IDs 307210830 and 259377017), and a total of eight EPCs were identified with r_p < 2 R_Earth. Another 15 identified EPCs were similar in size to or larger than Jupiter, with r_p ≥ 11 R_Earth. We find the following properties of the 39 cases:

• The orbital periods (tce_period) of the identified EPCs range from 0.38 d to just under 23 d.

• The transit depth (tce_depth) varied by about an order of magnitude, from ∼986 ppm to ∼10,000 ppm.

• The impact parameter (tce_impact) and the transit durations (tce_duration) covered the ranges listed in Table 1.

• In four cases the identified EPCs suggest multiple planetary systems with 2 and 3 planets.
4. Discussion and Conclusions
The TESS satellite provides observations of a large number (200,000) of stellar light curves with high photometric precision over the whole sky, divided into observing sectors, with the aim of detecting transiting Earth-sized planets. The stellar objects were selected to be among the brightest and closest to our solar system. The large dataset of nearly 27 gigabytes per day is then processed in the science data pipeline, providing nearly 11,000 TCEs as of the time of writing this paper. Further analysis of the TCEs is required to find confirmed exoplanets, or exoplanetary candidates for more in-depth processing. Evidently, this formidable data analysis task is difficult, if not impossible, to carry out manually. A feasible approach to the TESS data analysis is based on recently developed automated identification techniques, customized for transiting exoplanetary candidate identification, utilizing AI/ML methods based on DL neural networks combined with the anomaly identification methods reported in the present study. These EPCs could then be vetted further with targeted observations and data analysis.

In this study we apply a novel algorithm developed by ThetaRay, Inc. for cybersecurity and anomaly identification in financial systems. The advantage of this AI/ML system over other machine learning methods is the combination of semi-supervised and unsupervised anomaly-detection methodologies that do not rely on predefined rules or signatures.

Table 1: Some of the parameters (see text) of identified exoplanetary candidates (EPCs) from the TESS mission data archive at http://archive.stsci.edu/ using the ThetaRay system.
[Figure 3: eight panels of "local view" light curves (normalized flux vs. time bin, 0-250) for TIC IDs 101948569, 422655579, 178155732, 219403686, 270677759, 453767182, 308994098, and 423275733, each annotated with its tce_prad, tce_period, and tce_depth values.]
Figure 3: "Local view" normalized phase-folded light curves of selected exoplanetary candidates from Table 1, with the parameters tce_prad (the radius in terms of R_Earth), tce_period (in days), and tce_depth (in ppm) indicated on the corresponding panels. The typical eclipsing exoplanetary light-curve temporal shape is evident.

By the application of the ThetaRay algorithm to the TESS TCEs, we report 39 new planetary candidates in a wide range of sizes, from below Earth's radius to super-Jupiter radii, and planetary periods ranging from 0.38 d to just under 23 d. We demonstrate that the combination of DL neural networks with anomaly-identification mathematical techniques provides an efficient AI/ML algorithm for the rapid automated search of transiting exoplanet candidate light curves. Although we find that we need to apply manual vetting to reduce the number of false positives, the total number of EPC identifications is manageable for secondary manual vetting of the relatively small number of light curves, and this approach provides the desired identification results. In future applications, the ThetaRay algorithm could be further optimized for transiting exoplanet identification, for example by including informed ML steps, potentially reducing further the false-positive rate in this application and providing a new tool for analyzing TESS TCE data.
Acknowledgment
The resources for this research were provided by
ThetaRay, Inc.
LO would like to acknowledge the hospitality of the Department of Geosciences, Tel Aviv University.

Appendix
The classification of light curves as exoplanetary candidates in this paper is achieved by using the analytic platform of ThetaRay that is described in this appendix. This platform processes high dimensional big data to identify anomalous behavior in comparison to a normal profile. This anomaly detection tool is used in the present application for the classification of EPCs in the TESS TCE database. The normal profile is training-data driven, and its generation is explained below. In the present study we used the Kepler TCE data as the training dataset, as described in section 2.2. This appendix describes some of the algorithms that were utilized in this study for identifying anomalies in big data using augmentation, semi-supervised and unsupervised algorithms. The same core algorithms for anomaly identification are capable of identifying anomalies in cyber (malware), industrial malfunction (IoT) and financial (crimes) data. The algorithms were applied for the first time to astrophysical data in this study. These algorithms are part of the ThetaRay (https://thetaray.com/) core technology portfolio to fight financial crimes (Shabat et al., 2018a). The algorithms are housed in the ThetaRay Computational Platform, which enables efficient data manipulation and processing. The reported results were obtained by executing these algorithms on the ThetaRay platform.
Appendix A. Semi-supervised processing via augmentation: Introduction
For background and context, we briefly describe the ThetaRay system's current commercial applications, which have now been expanded and applied to an astrophysical dataset. The ThetaRay system is designed to provide fast and accurate analytic solutions for identifying emerging risk/crime (classified as anomalies) in financial data, discovering new opportunities, and exposing blind spots within these large, complex, high dimensional datasets. These AI-based algorithms radically reduce false positives and are uniquely able to uncover "unknown unknowns" (threats that one is not aware of, without even knowing that one is not aware of them). ThetaRay provides constructive solutions to anomaly detection challenges via its analytic platform designed for big data, uncovering previously unknown risks with industry-low false positive rates, in real time, enabling fast forensics.

In this project, we assume that some labels are given for the Kepler TCE data, which is a dataset related to the TESS TCEs, but not for the TESS data. An augmentation algorithm, which is considered a learning method, generates a new data frame based on the provided labels. The new data frame then serves as input to unsupervised algorithms. We apply four unsupervised algorithms to the augmented data: a geometric-based algorithm denoted by NY (see section Appendix C.1), an algebraic-based algorithm denoted by LU (see section Appendix C.2), a hybrid of LU and NY denoted by DK, and a neural network denoted by AE.

The augmentation method is based on a neural network. The default network (which can be user-adjusted) consists of one input layer (the analysis data frame), three hidden layers and one output layer. All the layers are connected through "weights" that are automatically tuned during the learning (optimization) process until the network output layer values are close to the values of the provided labels. After optimization, the third hidden layer becomes the new data frame as well as the input to the unsupervised algorithms that are outlined in section Appendix B, some of which are described in detail in section Appendix C (a code sketch of this augmentation step is given at the end of this appendix).

ThetaRay's platform covers detection and monitoring of several verticals, with current emphasis on financial crimes, by supplying an end-to-end solution. ThetaRay provides an un- and semi-supervised, real-time, agnostic, AI-based financial crimes detection platform based on anomaly detection algorithms for "unknown unknowns". Rule-based technology, which is very popular among anomaly detection tools, is intended for what is known, when one knows what to look for. ThetaRay's detection is achieved by un- and semi-supervised automatic methods that are not based on rules, patterns, signatures, heuristics, data semantics of the features or any prior domain expertise, and it provides a high detection rate with very low false positives. ThetaRay's methodologies within its Analytics Platform are based on unbiased detection through a series of randomized, advanced AI-based algorithms that can process any number of data features; the results can be explained and justified, and anomalies can be traced back to the features that triggered them, so the system is not a black box. Thus, the platform enables tracking past events and the features that triggered the occurrence of anomalies. ThetaRay's system operates under the assumption that one does not know what to look for or what to ask. This allows the technology to potentially detect every type of anomaly before the rules are discovered automatically. For efficient processing of the algorithms, the system uses off-the-shelf hardware components; the inherent parallelism in the algorithms is exploited through GPU utilization. The platform contains advanced and interactive visualization of the input and output phases of the data analysis. The detection approach is data driven; thus, no pre-existing models are assumed to exist. This makes the approach universal and generic and opens the way for different applications without introducing bias, limitations, or unfounded preconceptions into the processing, a property well suited for large astrophysical datasets. Mathematical and physical justifications for most of the available algorithms in the system are given below.

The input training data can be enriched by a given limited set of labels. This increases the detection rate and reduces the false alarm rate, and is part of the semi-supervised algorithms. Both semi- and unsupervised algorithms are used: currently, the platform contains eight different unsupervised algorithms for data without labels and three different semi-supervised algorithms for data with partial labels within the detection engine. The results are fused to produce one solution. ThetaRay combines the strengths of unsupervised and semi-supervised techniques to identify anomalies in the data. Unsupervised learning assumes that there are no labels for the various data components. Semi-supervised learning frameworks have made significant progress in training machine learning models with limited labeled data in the image domain. Augmented unsupervised learning can be used side by side with semi-supervised learning. The augmentation algorithms generate a new data frame based on the analysis data frame and the provided labels; the new data frame is then the input for all the selected unsupervised algorithms. Labels are binary, with the minority of the labels (known anomalies) marked as "1" and the remainder, the majority of unknown cases, assigned "0". The augmentation process enables covering both the known and the unknown with a relative balance between them. The ThetaRay system allows for configuration of the underlying input features, algorithms and detection logic for each application. Technically, augmentation is a neural network-based process which generates a new data frame based on the input data frame and the binary labels provided by the application (in the present case, stellar light-curve data).
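A minimal PyTorch sketch of the augmentation step follows. The layer widths, optimizer and training schedule are illustrative assumptions; the structure (one input layer, three hidden layers, one output layer trained against the binary labels, with the third hidden layer exported as the new data frame) follows the description above.

```python
import torch
import torch.nn as nn

class Augmenter(nn.Module):
    def __init__(self, n_features, widths=(128, 64, 32)):   # widths are assumptions
        super().__init__()
        self.h1 = nn.Sequential(nn.Linear(n_features, widths[0]), nn.ReLU())
        self.h2 = nn.Sequential(nn.Linear(widths[0], widths[1]), nn.ReLU())
        self.h3 = nn.Sequential(nn.Linear(widths[1], widths[2]), nn.ReLU())
        self.out = nn.Linear(widths[2], 1)       # output layer compared to labels

    def forward(self, x):
        z = self.h3(self.h2(self.h1(x)))         # third hidden layer activations
        return self.out(z), z

X = torch.randn(1000, 424)                       # stand-in analysis data frame
y = (torch.rand(1000, 1) < 0.05).float()         # binary labels: rare anomalies = 1

model = Augmenter(X.shape[1])
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()
for _ in range(200):                             # weights tuned until outputs match labels
    opt.zero_grad()
    logits, _ = model(X)
    loss_fn(logits, y).backward()
    opt.step()

with torch.no_grad():
    _, new_frame = model(X)                      # new data frame for the unsupervised stage
```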
Appendix B. Unsupervised algorithms: General description

NY:
This algorithm (see Figure B.4) is based on the diffusion maps (DM) methodology (Coifman and Lafon, 2006a) and is primarily a non-linear dimension reduction process. The anomaly identification procedure takes place inside the lower dimensional space (manifold) that is determined automatically during the training phase. An out-of-sample extension procedure (Coifman and Lafon, 2006b) is applied in the identification phase to each multidimensional data point that did not participate in the training phase, to determine whether it belongs to the manifold (low dimensional space, classified as normal) or deviates from it (classified as anomalous).

The NY algorithm, which is based on DM, geometrizes the input training data. DM analyzes the ambient space (training data) and determines automatically where the data actually resides in the embedded space. We can visualize the input training data (ambient space) as a matrix of size m × n, where m is the number of multidimensional data points (the number of rows in the matrix) and each row is of dimension n, the number of columns in the matrix. The input data is assumed to be sampled from a low dimensional manifold (embedded space) that captures the dependencies between the observable parameters. DM reduces in a non-linear way the dimension of the ambient space, which is the training data. The dimensionality reduction by DM is based on local affinities between multidimensional data points and on a non-linear embedding of the ambient space into a lower dimensional space, described as a manifold, using a low rank matrix decomposition. The non-parametric nature of this analysis uncovers the important underlying factors of the input data and reveals the intrinsic geometry of the data represented by the embedded manifold. This manifold describes geometrically what we classify as the normal profile of the ambient data. Newly arrived multidimensional data points, which did not participate in the training procedure, are embedded into the lower dimensional space by the application of an out-of-sample extension algorithm. If the embedded multidimensional data point falls into the manifold, it is classified as normal; otherwise it is classified as abnormal (anomalous). See section Appendix C.1 for more details.

[Figure B.4: NY algorithm flow chart. Training: input high dimensional data → normalization to obtain a Markov matrix → extraction of the eigenvalues and eigenvectors of the Markov matrix → generation of the embedded manifold from the eigenvalues and eigenvectors. Detection: receive a newly arrived data point → if it belongs to the embedded manifold it is classified as normal, otherwise as abnormal (anomalous).]

LU:
Based on a randomized low-rank matrix decomposition (Shabat et al., 2018b). This algorithm builds a dictionary from the training data. Then, each newly arrived multidimensional data point that is not well described (not well spanned) by the dictionary is classified as an anomalous data point.

The randomized LU (RLU) algorithm is an algebraic approach applied to an input matrix A of size m × n with an intrinsic dimension k smaller than n; k can be computed automatically or given. RLU is a low rank matrix decomposition which enables the identification of anomalies using a dictionary constructed from the training data. RLU forms a low rank matrix approximation of A such that PAQ ≈ LU, where P and Q are orthogonal permutation matrices, and L and U are lower and upper triangular matrices, respectively. A dictionary is then constructed according to D = Pᵀ L (ᵀ denotes the matrix transpose). Thus, D is a linear combination of the input matrix and a representation of the normal data. It is also used in the identification step to classify newly arrived multidimensional data points that did not participate in the training phase: a new incoming multidimensional data point x which satisfies ‖DD†x − x‖ < ε is classified as normal; otherwise, it is classified as anomalous. Here, D† is the pseudo-inverse of D and ε is a quantity defined in the training phase. When applied to a matrix A of size m × n, the RLU decomposition reduces the number m of multidimensional data points, resulting in a reduced-measurements matrix of size k × n, where k < n < m. Although the algorithm is randomized, it has been proven in Shabat et al. (2018b) that the probability of the RLU approximation producing a large error is very small. See section Appendix C.2 for more details.

DK:
The DK algorithm relies on successive applications of LU and NY. Assume the size of a given training matrix is m data points (rows) by n features (columns). RLU (described in section Appendix C.2) is applied to reduce the number of features n substantially through the application of a random projection (Johnson and Lindenstrauss, 1984). Then NY (described in section Appendix C.1) is applied: the matrix is embedded into a lower dimensional space and the NY anomaly identification procedure is invoked in this embedded space.

AE:
This is a variational autoencoder (AE) algorithm. An AE is a machine learning tool designed to generate complex models of data after careful distribution modeling of example data. In neural network language, an AE consists of an encoder component and a decoder component. We assume that the input dataset is generated from an underlying unobserved (latent) representation. Given an input dataset, the encoder part of the AE approximates the distribution of the latent variables. The algorithm then sets the distribution parameters of the latent layers in a manner that maximizes the likelihood of generating or reconstructing the input data in the decoder section. As soon as the distribution of the latent variables is approximated, we can sample from this distribution to generate an approximate representation of the input data. Since normality consists of and is defined by most of the data points, those will be well approximated by the AE, while anomalies will be poorly modeled. Therefore, by comparing the original sample with the reconstructed (generated) data, we can calculate a similarity score that enables us to detect anomalies. The goal is to use the AE as a denoising autoencoder: it allows us to encode a sample into the latent space and then reconstruct it. By comparing the original sample to the reconstruction, we calculate a score that enables us to classify a data point as anomalous. Since we use the AE for anomaly detection, we calculate these scores for both the input and the output.
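To make the LU dictionary idea above concrete, the following is a minimal numpy/scipy sketch. It uses scipy's ordinary LU factorization as a stand-in for the randomized rank-revealing LU of Shabat et al. (2018b), and treats the columns of the training matrix as the data points so that the dimensions of the classification test work out; both choices are simplifying assumptions.

```python
import numpy as np
from scipy.linalg import lu

# Training matrix: here columns are taken to be the data points (an assumed
# convention so that the dimensions of the spanning test below match).
A = np.random.randn(30, 500)
p, l, u = lu(A)                                  # scipy convention: A = p @ l @ u
k = 10                                           # assumed intrinsic rank
D = (p @ l)[:, :k]                               # dictionary D = P^T L, rank-k truncation
D_pinv = np.linalg.pinv(D)

def is_anomalous(x, eps=1.0):
    """Normal if ||D D^+ x - x|| < eps (x is well spanned by the dictionary);
    eps would be set in the training phase and is arbitrary here."""
    return np.linalg.norm(D @ (D_pinv @ x) - x) >= eps

print(is_anomalous(np.random.randn(30)))
```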
Appendix C. Unsupervised algorithms: Mathematical description
Appendix C.1. Diffusion geometry: Background
DM is a kernel-based method for manifold learning that can reveal the intrinsic structures in data and embed them in a low dimensional space. The DM-based approach computes the diffusion geometry: a spectral embedding of the data points provides coordinates that are used to interpolate and approximate the pointwise diffusion-map embedding of the data.

Manifold learning approaches are often used for modeling and uncovering intrinsic low dimensional structure in high dimensional data. DM is a method that captures data manifolds with random walks that propagate through non-linear pathways in the data. Transition probabilities of a Markovian diffusion process (how to compute them is explained later) define an intrinsic diffusion distance metric that is amenable to a low dimensional embedding. By arranging the transition probabilities in a row-stochastic diffusion operator, and taking its leading eigenvalues and eigenvectors, one can derive a small set of coordinates in which diffusion distances are approximated by Euclidean distances and intrinsic manifold structures are revealed.

In more detail, the NY algorithm uncovers the internal geometry of the input training data, denoted A. The use of geometric considerations speeds up the anomaly detection computation significantly. The supporting theory is as follows. The goal is to detect anomalies in A and in newly arrived n-dimensional data points that did not participate in the training data A. During the training procedure, the size of n, which is also called the dimension of A, is automatically reduced; this procedure is called dimensionality reduction. Dimensionality reduction, as explained later, is achieved without damaging the quality and coherency of the data in A; moreover, there is no loss of data. Dimensionality reduction is just a different representation of the training data that, automatically and without any human intervention, reduces the dimension according to the data and uncovers the real dimension where the training data actually resides.

In general, anomaly detection is based on the notion of similarities (or affinities) between the m high dimensional data points (the rows of the matrix A). How do we detect anomalies in this big data efficiently, without introducing bias and without damaging the data? Dimensionality reduction of n is needed. How is this reduction achieved? The following provides the rationale for why geometrization of the training data A, and tracking the movement of newly arrived data points, identify a low dimensional manifold for learning. It is founded mathematically on the preservation of the quality and the integrity (completeness) of the data in A.

The assumption is that the processed data is imbalanced: high densities of n-dimensional samples (rows in the matrix A) represent normal data; otherwise the data is classified as anomalous (abnormal), since the majority of the data is normal and is thus classified as having high density.

Theory: how to find the low dimensional space (manifold)? It is proved that if A is sampled from a manifold of low intrinsic dimension then, as n (the dimension) tends to infinity, the defined random walk, which travels between all the data samples, converges to a diffusion process over the manifold. This is the key to processing A as a diffusion process, which guarantees an efficient scan of the data through randomization without the introduction of bias.
Three complementary approaches for dimensionality reduction - diffusion distances between n-dimensional samples, randomization, and manifold learning - emerge from this observation (theorem): 1. A (huge) kernel matrix B of size m × m is constructed from the distances among all the n-dimensional samples (rows); the distances are diffusion distances. 2. A random walk is applied to the entries of B; this random walk guarantees that there is no bias in the utilization of the distances in B. 3. Diffusion maps (DM) link the matrix B to a lower dimensional space (manifold) via diffusion processing; the dimension of the embedded manifold represents the reduction of n.

Geometrization of the training data - outline of the approach: The NY algorithm is based on geometric uncovering of a low dimensional manifold in the ambient space (the original space represented by A) by the application of DM to that ambient space. The input data is assumed to be sampled from a manifold of low intrinsic dimension that captures the dependencies between the observable parameters (n-dimensional features). DM reduces the dimension n of the training data. It is based on local affinities between multidimensional data points and on a non-linear embedding of the ambient space into a lower dimensional space, described as a manifold, using a low rank matrix decomposition. The non-parametric nature of this analysis uncovers the important underlying factors of the input data and reveals the intrinsic geometry of the data represented by the embedded manifold. This manifold describes geometrically what we classify as the normal profile in the ambient data. Newly arrived n-dimensional data points, which did not participate in the training procedure, are embedded into the lower dimensional space by the application of an out-of-sample extension algorithm. If the embedded n-dimensional data point falls into the manifold, where most of the normal data reside, it is classified as normal; otherwise it is classified as abnormal (anomalous). The exchange of data between the ambient space and the manifold, where the detection takes place, does not degrade the coherency and the completeness of the data, and preserves the geometrical relations (affinities) between the two spaces - ambient and embedded (manifold).

Appendix C.1.1. Diffusion geometry: outline

Let $X = \{x_1, \ldots, x_n\}$ be a dataset and let $k : X \times X \to \mathbb{R}$ be a symmetric point-wise positive kernel that defines a connected, undirected and weighted graph over $X$. Then a random walk over $X$ is defined by the $n \times n$ row-stochastic transition probability matrix $P = D^{-1}K$, where $K$ is the $n \times n$ matrix whose entries are $K_{ij} := k(x_i, x_j)$, $i, j = 1, \ldots, n$, and $D$ is the $n \times n$ diagonal degrees matrix whose $i$-th element is $d(i) := \sum_{j=1}^{n} k(x_i, x_j)$, $i = 1, \ldots, n$. The vector $d \in \mathbb{R}^n$ is referred to as the degrees vector of the graph defined by $k$.

The associated time-homogeneous random walk $X(t)$ is defined via the conditional probabilities on its state space $X$: assuming that the process starts at time $t = 0$, then for any time point $t \in \mathbb{N}$, $P(X(t) = x_j \mid X(0) = x_i) = P^t_{ij}$, where $P^t_{ij}$ is the $(i,j)$-th entry of the $t$-th power of the matrix $P$. As long as the process is aperiodic, it has a unique stationary distribution $\hat{d} \in \mathbb{R}^n$, which is the steady state of the process, i.e., $\hat{d}(j) = \lim_{t \to \infty} P^t_{ij}$, regardless of the initial state $X(0)$.
This steady state is the probability distribution resulting from the $\ell^1$ normalization of the degrees vector $d$, i.e.,
$$\hat{d} = \frac{d}{\|d\|_1} \in \mathbb{R}^n, \qquad \mathrm{(C.1)}$$
where $\|d\|_1 := \sum_{i=1}^{n} d(i)$. The diffusion distances at time $t$ are defined by the metric $D^{(t)} : X \times X \to \mathbb{R}$,
$$D^{(t)}(x_i, x_j) := \left\| P^t(i,:) - P^t(j,:) \right\|_{\ell^2(\hat{d}^{-1})} = \sqrt{\sum_{k=1}^{n} \left( P^t_{ik} - P^t_{jk} \right)^2 / \hat{d}(k)}, \quad i, j = 1, \ldots, n. \qquad \mathrm{(C.2)}$$
By definition, $P^t(i,:)$, the $i$-th row of $P^t$, is the probability distribution over $X$ after $t$ time steps, given that the initial state is $X(0) = x_i$. Therefore, the diffusion distance $D^{(t)}(x_i, x_j)$ from Eq. C.2 measures the difference between two propagations along $t$ time steps: the first originates in $x_i$ and the second in $x_j$. Weighting the metric by the inverse of the steady state ascribes high weight to similar probabilities on rare states, and vice versa. Thus, a family of diffusion geometries is defined by Eq. C.2, each corresponding to a single time step $t$.

Due to the above interpretation, the diffusion distances are naturally utilized for multiscale clustering, since they uncover the connectivity properties of the graph across time. In Bérard et al. (1994); Coifman and Lafon (2006a) it has been proven that, under some conditions, if $X$ is sampled from a manifold of low intrinsic dimension then, as $n$ tends to infinity, the defined random walk converges to a diffusion process over the manifold.
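A minimal numpy sketch of the quantities just defined follows: the Markov matrix $P = D^{-1}K$, the steady state of Eq. C.1, and the diffusion distance of Eq. C.2. The Gaussian kernel and its width are arbitrary assumptions for illustration.

```python
import numpy as np

X = np.random.randn(200, 5)                      # n data points in R^5 (stand-in)
sigma = 1.0                                      # assumed kernel width

# Symmetric positive kernel K_ij = k(x_i, x_j) and degrees d(i) = sum_j K_ij.
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / (2 * sigma ** 2))
d = K.sum(axis=1)
P = K / d[:, None]                               # row-stochastic: P = D^{-1} K

d_hat = d / d.sum()                              # steady state, Eq. (C.1)

def diffusion_distance(i, j, t=2):
    """D^(t)(x_i, x_j) of Eq. (C.2): weighted l2 distance between rows of P^t."""
    Pt = np.linalg.matrix_power(P, t)
    return np.sqrt((((Pt[i] - Pt[j]) ** 2) / d_hat).sum())

print(diffusion_distance(0, 1))
```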
Appendix C.2. Randomized LU decomposition: An algorithm for dictionary construction

A dictionary construction algorithm is presented. It is based on a low-rank matrix factorization achieved by the application of the randomized LU decomposition (Shabat et al., 2018b) to training data. This method is fast, scalable, parallelizable, consumes low memory, outperforms the SVD in these categories, and also works extremely well on large sparse matrices. In contrast to existing methods, the randomized LU decomposition constructs an under-complete dictionary, which simplifies both the construction and the classification of newly arrived multidimensional data points. The dictionary construction is generic and fits different applications.

The randomized LU algorithm, which is applied to a given training data matrix $A \in \mathbb{R}^{m \times n}$ of $m$ multidimensional data points and $n$ features, decomposes $A$ into two matrices $L$ and $U$. The size of $L$ is determined by the decaying spectrum of the singular values of $A$ and is bounded by $\min\{n, m\}$; both $L$ and $U$ are of full rank.

The randomized LU decomposition algorithm (see Figure C.5) computes the rank-$k$ LU approximation of a full matrix (Algorithm 1). The main building blocks of the algorithm are random projections and the Rank Revealing LU decomposition (RRLU) (Pan, 2000), used to obtain a stable low-rank approximation of an input matrix $A$ that serves as training data. In Figure C.5, part II describes the generation of the dictionary by calling part I, which describes the flow of the randomized LU decomposition. The end of the execution of I means that the training is completed. The dictionaries are the input of II, which performs the identification: a newly arrived data point that did not participate in the training is either spanned by the dictionary (classified as normal) or not (classified as anomalous).

[Figure C.5: II calls the construction of a dictionary D via the randomized LU decomposition described in I. The inputs to the algorithm are a matrix A and its rank k (see I). They are submitted to the randomized LU, which generates the following outputs: permutation matrices P and Q, and lower and upper triangular matrices L and U, respectively. The dictionary D and its pseudo-inverse are constructed from the lower matrix and the permutation matrix. Then a newly arrived data point that did not participate in the training is either spanned by D, and therefore classified as normal, or otherwise classified as abnormal (anomalous).]

The RRLU algorithm, used in Algorithm 1, reveals the connection between the LU decomposition of a matrix and its singular values. Similar algorithms exist for rank revealing QR decompositions (see, for example, Gu and Eisenstat (1996)).

Theorem Appendix C.1 (Pan (2000)). Let $A$ be an $m \times n$ matrix ($m \gg n$). Given an integer $1 \leq k < n$, the following factorization
$$PAQ = \begin{pmatrix} L_{11} & 0 \\ L_{21} & I_{n-k} \end{pmatrix} \begin{pmatrix} U_{11} & U_{12} \\ 0 & U_{22} \end{pmatrix} \qquad \mathrm{(C.3)}$$
holds, where $L_{11}$ is lower triangular with ones on the diagonal, $U_{11}$ is upper triangular, and $P$ and $Q$ are orthogonal permutation matrices. Let $\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_n \geq 0$ be the singular values of $A$; then
$$\sigma_k \geq \sigma_{\min}(L_{11}U_{11}) \geq \frac{\sigma_k}{k(n-k)+1}, \qquad \mathrm{(C.4)}$$
and
$$\sigma_{k+1} \leq \|U_{22}\|_2 \leq (k(n-k)+1)\,\sigma_{k+1}. \qquad \mathrm{(C.5)}$$

Based on Theorem Appendix C.1, we have the following definition:

Definition Appendix C.1 (RRLU rank-$k$ approximation, denoted $\mathrm{RRLU}_k$). Given an RRLU decomposition (Theorem Appendix C.1) of a matrix $A$ with an integer $k$ (as in Eq. C.3) such that $PAQ = LU$, the RRLU rank-$k$ approximation is defined by taking $k$ columns from $L$ and $k$ rows from $U$ such that
$$\mathrm{RRLU}_k(PAQ) = \begin{pmatrix} L_{11} \\ L_{21} \end{pmatrix} \begin{pmatrix} U_{11} & U_{12} \end{pmatrix}, \qquad \mathrm{(C.6)}$$
where $L_{11}$, $L_{21}$, $U_{11}$, $U_{12}$, $P$ and $Q$ are defined in Theorem Appendix C.1.

Lemma Appendix C.2 (Shabat et al. (2018b), RRLU approximation error). The error of the $\mathrm{RRLU}_k$ approximation of $A$ is
$$\|PAQ - \mathrm{RRLU}_k(PAQ)\| \leq (k(n-k)+1)\,\sigma_{k+1}. \qquad \mathrm{(C.7)}$$

Algorithm 1 describes the flow of the RLU decomposition algorithm.

Appendix C.2.1. Randomized LU Based Classification Algorithm
Algorithm 1 describes the flow of the randomized LU (RLU) decomposition algorithm.

Appendix C.2.1. Randomized LU Based Classification Algorithm

Based on Section Appendix C.2, we apply the randomized LU decomposition (Algorithm 1) to the matrix A, yielding PAQ ≈ LU. The outputs P and Q are orthogonal permutation matrices.

Algorithm 1: Randomized LU Decomposition
Input: matrix A of size m × n to decompose; k, the rank of A; l, the number of columns to use (for example, l = k + 5).
Output: matrices P, Q, L, U such that ‖PAQ − LU‖ ≤ O(σ_{k+1}(A)), where P and Q are orthogonal permutation matrices, L and U are the lower and upper triangular matrices, respectively, and σ_{k+1}(A) is the (k + 1)-th singular value of A.
1. Create a matrix G of size n × l whose entries are i.i.d. Gaussian random variables with zero mean and unit standard deviation.
2. Y ← AG.
3. Apply the RRLU decomposition (see Pan (2000)) to Y such that PYQ_y = L_yU_y.
4. Truncate L_y and U_y by choosing the first k columns and the first k rows, respectively: L_y ← L_y(:, 1:k) and U_y ← U_y(1:k, :).
5. B ← L_y† PA (L_y† is the pseudo-inverse of L_y).
6. Apply an LU decomposition to B with column pivoting: BQ = L_bU_b.
7. L ← L_yL_b.
8. U ← U_b.
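A minimal sketch of Algorithm 1 follows, reusing lu_complete_pivoting() from the previous sketch in place of the RRLU step; the column-pivoted LU of step 6 is obtained from a row-pivoted LU of B^T via scipy.linalg.lu. This is our illustration of the published algorithm (Shabat et al., 2018b), not the ThetaRay implementation:

import numpy as np
from scipy.linalg import lu

def randomized_lu(A, k, l=None, rng=None):
    """Sketch of Algorithm 1. Returns index arrays p, q and factors L, U
    with A[p][:, q] ~ L @ U up to O(sigma_{k+1}(A)). Relies on
    lu_complete_pivoting() from the previous sketch as the RRLU step."""
    m, n = A.shape
    l = k + 5 if l is None else l                  # oversampling, e.g. l = k + 5
    rng = np.random.default_rng() if rng is None else rng
    G = rng.standard_normal((n, l))                # step 1: Gaussian test matrix
    Y = A @ G                                      # step 2: random projection, m x l
    p, _, L_y, _ = lu_complete_pivoting(Y)         # step 3: P Y Q_y = L_y U_y
    L_y = L_y[:, :k]                               # step 4: first k columns of L_y
    B = np.linalg.pinv(L_y) @ A[p, :]              # step 5: B = L_y^+ P A  (k x n)
    # step 6: column-pivoted LU of B via row-pivoted LU of B.T:
    # B.T = P_b @ L_t @ U_t  =>  B @ P_b = U_t.T @ L_t.T = L_b @ U_b
    P_b, L_t, U_t = lu(B.T)
    q = np.argmax(P_b, axis=0)                     # column permutation as indices
    L = L_y @ U_t.T                                # step 7: L = L_y L_b
    U = L_t.T                                      # step 8: U = U_b
    return p, q, L, U

# usage on a synthetic matrix with a decaying spectrum
rng = np.random.default_rng(1)
m, n, k = 500, 30, 10
Q1, _ = np.linalg.qr(rng.standard_normal((m, n)))
Q2, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (Q1 * 2.0 ** -np.arange(n)) @ Q2.T
p, q, L, U = randomized_lu(A, k, rng=rng)
print(np.linalg.norm(A[p][:, q] - L @ U, 2))       # ~ sigma_{k+1} = 2**-k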
Theorem Appendix C.3 shows that P^T L forms (up to a certain accuracy) a basis for A. This is the key property of the classification algorithm.

Theorem Appendix C.3 (Shabat et al. (2018b)). Given a matrix A, let its randomized LU decomposition be PAQ ≈ LU. Then the error of representing A by P^T L satisfies

\|(P^TL)(P^TL)^\dagger A - A\| \le \left(\sqrt{nl\beta^2\gamma^2 + 1} + 2\sqrt{nl}\,\beta\gamma\,(k(n-k)+1)\right)\sigma_{k+1}(A). \tag{C.8}

Let x be a multidimensional data point and let D = P^T L be a dictionary. The distance between x and the dictionary D is defined by dist(x, D) ≜ ‖DD†x − x‖, where D† is the pseudo-inverse of the matrix D. If dist(x, D) ≤ ε, then x is classified as normal; otherwise, it is classified as anomalous.
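The classification rule can be sketched as follows. One caveat: the text stores the m data points as rows of A, while D = P^T L spans a subspace of R^m; for a newly arrived feature vector x to be comparable with span(D), the sketch below arranges training points as columns of X (i.e., it works with the transpose). That arrangement, the threshold eps, and all names are our assumptions, not the ThetaRay implementation:

import numpy as np

def build_dictionary(X, k, rng=None):
    """Training: dictionary D = P^T L from the randomized LU of X, where
    columns of X are training points (features x samples)."""
    p, q, L, U = randomized_lu(X, k, rng=rng)   # sketch of Algorithm 1 above
    D = L[np.argsort(p), :]                     # P^T L: undo the row permutation
    return D, np.linalg.pinv(D)

def dist_to_dictionary(x, D, D_pinv):
    """dist(x, D) = ||D D^+ x - x||: residual of projecting x onto span(D)."""
    return np.linalg.norm(D @ (D_pinv @ x) - x)

def classify(x, D, D_pinv, eps):
    """Normal if x is (approximately) spanned by the dictionary, else anomalous."""
    return "normal" if dist_to_dictionary(x, D, D_pinv) <= eps else "anomalous"

# usage on synthetic data: normal points lie in a k-dimensional subspace
rng = np.random.default_rng(2)
n_features, n_samples, k = 20, 500, 5
basis = rng.standard_normal((n_features, k))
X = basis @ rng.standard_normal((k, n_samples))     # training matrix
D, D_pinv = build_dictionary(X, k, rng=rng)
x_in = basis @ rng.standard_normal(k)               # lies in the training subspace
x_out = rng.standard_normal(n_features)             # generic off-subspace point
print(classify(x_in, D, D_pinv, eps=1e-6))          # -> normal
print(classify(x_out, D, D_pinv, eps=1e-6))         # -> anomalous (w.h.p.)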
References

Ansdell, M., Ioannou, Y., Osborn, H.P., Sasdelli, M., 2018 NASA Frontier Development Lab Exoplanet Team, Smith, J.C., Caldwell, D., Jenkins, J.M., Räissi, C., Angerhausen, D., 2018 NASA Frontier Development Lab Exoplanet Mentors, 2018. Scientific Domain Knowledge Improves Exoplanet Transit Classification with Deep Learning. Astrophys. J. Lett. 869, L7. arXiv:1810.13434.

Bérard, P., Besson, G., Gallot, S., 1994. Embedding Riemannian manifolds by their heat kernel. Geometric and Functional Analysis GAFA 4, 373–398.

Borucki, W.J., Koch, D., Basri, G., Batalha, N., Brown, T., Caldwell, D., Caldwell, J., Christensen-Dalsgaard, J., Cochran, W.D., DeVore, E., Dunham, E.W., Dupree, A.K., Gautier, T.N., Geary, J.C., Gilliland, R., Gould, A., Howell, S.B., Jenkins, J.M., Kondo, Y., Latham, D.W., Marcy, G.W., Meibom, S., Kjeldsen, H., Lissauer, J.J., Monet, D.G., Morrison, D., Sasselov, D., Tarter, J., Boss, A., Brownlee, D., Owen, T., Buzasi, D., Charbonneau, D., Doyle, L., Fortney, J., Ford, E.B., Holman, M.J., Seager, S., Steffen, J.H., Welsh, W.F., Rowe, J., Anderson, H., Buchhave, L., Ciardi, D., Walkowicz, L., Sherry, W., Horch, E., Isaacson, H., Everett, M.E., Fischer, D., Torres, G., Johnson, J.A., Endl, M., MacQueen, P., Bryson, S.T., Dotson, J., Haas, M., Kolodziejczak, J., Van Cleve, J., Chandrasekaran, H., Twicken, J.D., Quintana, E.V., Clarke, B.D., Allen, C., Li, J., Wu, H., Tenenbaum, P., Verner, E., Bruhweiler, F., Barnes, J., Prsa, A., 2010. Kepler Planet-Detection Mission: Introduction and First Results. Science 327, 977.

Brown, T.M., Latham, D.W., Everett, M.E., Esquerdo, G.A., 2011. Kepler Input Catalog: Photometric Calibration and Stellar Classification. Astron. J. 142, 112. arXiv:1102.0342.

Catanzarite, J.H., 2015. Autovetter Planet Candidate Catalog for Q1–Q17 Data Release 24. KSCI-19091-001, NASA Ames Research Center, Moffett Field, CA.

Christiansen, J.L., Jenkins, J.M., Caldwell, D.A., Burke, C.J., Tenenbaum, P., Seader, S., Thompson, S.E., Barclay, T.S., Clarke, B.D., Li, J., Smith, J.C., Stumpe, M.C., Twicken, J.D., Cleve, J.V., 2012. The derivation, properties, and value of Kepler's combined differential photometric precision. Publications of the Astronomical Society of the Pacific 124, 1279–1287. URL: https://doi.org/10.1086/668847.

Coifman, R.R., Lafon, S., 2006a. Diffusion maps. Applied and Computational Harmonic Analysis 21, 5–30.

Coifman, R.R., Lafon, S., 2006b. Geometric harmonics: a novel tool for multiscale out-of-sample extension of empirical functions. Applied and Computational Harmonic Analysis 21, 31–52.

Coughlin, J.L., Mullally, F., Thompson, S.E., Rowe, J.F., Burke, C.J., Latham, D.W., Batalha, N.M., Ofir, A., Quarles, B.L., Henze, C.E., Wolfgang, A., Caldwell, D.A., Bryson, S.T., Shporer, A., Catanzarite, J., Akeson, R., Barclay, T., Borucki, W.J., Boyajian, T.S., Campbell, J.R., Christiansen, J.L., Girouard, F.R., Haas, M.R., Howell, S.B., Huber, D., Jenkins, J.M., Li, J., Patil-Sabale, A., Quintana, E.V., Ramirez, S., Seader, S., Smith, J.C., Tenenbaum, P., Twicken, J.D., Zamudio, K.A., 2016. Planetary Candidates Observed by Kepler. VII. The First Fully Uniform Catalog Based on the Entire 48-month Data Set (Q1–Q17 DR24). Astrophys. J. Supp. 224, 12. arXiv:1512.06149.

Dattilo, A., Vanderburg, A., Shallue, C.J., Mayo, A.W., Berlind, P., Bieryla, A., Calkins, M.L., Esquerdo, G.A., Everett, M.E., Howell, S.B., Latham, D.W., Scott, N.J., Yu, L., 2019. Identifying Exoplanets with Deep Learning. II.
Two New Super-Earths Uncovered by a Neural Network in K2 Data. Astron. J. 157, 169. arXiv:1903.10507.

Golovin, D., Solnik, B., Moitra, S., Kochanski, G., Karro, J., Sculley, D., 2017. Google Vizier: A Service for Black-Box Optimization. ACM, ISBN 978-1-4503-4887-4/17/08, 1487.

Gu, M., Eisenstat, S.C., 1996. Efficient algorithms for computing a strong rank-revealing QR factorization. SIAM Journal on Scientific Computing 17, 848–869.

Jenkins, J.M., Caldwell, D.A., Chandrasekaran, H., Twicken, J.D., Bryson, S.T., Quintana, E.V., Clarke, B.D., Li, J., Allen, C., Tenenbaum, P., Wu, H., Klaus, T.C., Cleve, J.V., Dotson, J.A., Haas, M.R., Gilliland, R.L., Koch, D.G., Borucki, W.J., 2010a. Initial Characteristics of Kepler Long Cadence Data for Detecting Transiting Planets. Astrophys. J. Lett. 713, L120–L125. URL: https://doi.org/10.1088/2041-8205/713/2/L120.

Jenkins, J.M., Caldwell, D.A., Chandrasekaran, H., Twicken, J.D., Bryson, S.T., Quintana, E.V., Clarke, B.D., Li, J., Allen, C., Tenenbaum, P., Wu, H., Klaus, T.C., Middour, C.K., Cote, M.T., McCauliff, S., Girouard, F.R., Gunter, J.P., Wohler, B., Sommers, J., Hall, J.R., Uddin, A.K., Wu, M.S., Bhavsar, P.A., Cleve, J.V., Pletcher, D.L., Dotson, J.A., Haas, M.R., Gilliland, R.L., Koch, D.G., Borucki, W.J., 2010b. Overview of the Kepler Science Processing Pipeline. Astrophys. J. Lett. 713, L87–L91. URL: https://doi.org/10.1088/2041-8205/713/2/L87.

Johnson, W.B., Lindenstrauss, J., 1984. Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics 26, 1.

Koch, D.G., Borucki, W.J., Basri, G., Batalha, N.M., Brown, T.M., Caldwell, D., Christensen-Dalsgaard, J., Cochran, W.D., DeVore, E., Dunham, E.W., Gautier, T.N., Geary, J.C., Gilliland, R.L., Gould, A., Jenkins, J., Kondo, Y., Latham, D.W., Lissauer, J.J., Marcy, G., Monet, D., Sasselov, D., Boss, A., Brownlee, D., Caldwell, J., Dupree, A.K., Howell, S.B., Kjeldsen, H., Meibom, S., Morrison, D., Owen, T., Reitsema, H., Tarter, J., Bryson, S.T., Dotson, J.L., Gazis, P., Haas, M.R., Kolodziejczak, J., Rowe, J.F., Cleve, J.E.V., Allen, C., Chandrasekaran, H., Clarke, B.D., Li, J., Quintana, E.V., Tenenbaum, P., Twicken, J.D., Wu, H., 2010. Kepler Mission Design, Realized Photometric Performance, and Early Science. Astrophys. J. Lett. 713, L79–L86. URL: https://doi.org/10.1088/2041-8205/713/2/L79.

Mandel, K., Agol, E., 2002. Analytic Light Curves for Planetary Transit Searches. Astrophys. J. Lett. 580, L171–L175. arXiv:astro-ph/0210099.

Osborn, H.P., Ansdell, M., Ioannou, Y., Sasdelli, M., Angerhausen, D., Caldwell, D., Jenkins, J.M., Räissi, C., Smith, J.C., 2020. Rapid classification of TESS planet candidates with convolutional neural networks. Astron. Astrophys. 633, A53. arXiv:1902.08544.

Pan, C.T., 2000. On the existence and computation of rank-revealing LU factorizations. Linear Algebra and its Applications 316, 199–222.

Ricker, G.R., Winn, J.N., Vanderspek, R., Latham, D.W., Bakos, G.Á.,
Bean, J.L., Berta-Thompson, Z.K., Brown, T.M., Buchhave, L., Butler, N.R., Butler, R.P., Chaplin, W.J., Charbonneau, D., Christensen-Dalsgaard, J., Clampin, M., Deming, D., Doty, J., De Lee, N., Dressing, C., Dunham, E.W., Endl, M., Fressin, F., Ge, J., Henning, T., Holman, M.J., Howard, A.W., Ida, S., Jenkins, J., Jernigan, G., Johnson, J.A., Kaltenegger, L., Kawai, N., Kjeldsen, H., Laughlin, G., Levine, A.M., Lin, D., Lissauer, J.J., MacQueen, P., Marcy, G., McCullough, P.R., Morton, T.D., Narita, N., Paegert, M., Palle, E., Pepe, F., Pepper, J., Quirrenbach, A., Rinehart, S.A., Sasselov, D., Sato, B., Seager, S., Sozzetti, A., Stassun, K.G., Sullivan, P., Szentgyorgyi, A., Torres, G., Udry, S., Villasenor, J., 2014. Transiting Exoplanet Survey Satellite (TESS). Volume 9143 of Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, p. 914320.

Schwarz, G., 1978. Estimating the dimension of a model. Ann. Statist. 6, 461–464. URL: https://doi.org/10.1214/aos/1176344136.

Shabat, G., Segev, D., Averbuch, A., 2018a. Uncovering unknown unknowns in financial services big data by unsupervised methodologies: Present and future trends, in: Proceedings of Machine Learning Research, KDD 2017 Workshop on Anomaly Detection in Finance, pp. 8–19.

Shabat, G., Shmueli, Y., Aizenbud, Y., Averbuch, A., 2018b. Randomized LU decomposition. Applied and Computational Harmonic Analysis 44, 246–272.

Shallue, C.J., Vanderburg, A., 2018. Identifying Exoplanets with Deep Learning: A Five-planet Resonant Chain around Kepler-80 and an Eighth Planet around Kepler-90. Astron. J. 155, 94. arXiv:1712.05044.

Yu, L., Vanderburg, A., Huang, C., Shallue, C.J., Crossfield, I.J.M., Gaudi, B.S., Daylan, T., Dattilo, A., Armstrong, D.J., Ricker, G.R., Vanderspek, R.K., Latham, D.W., Seager, S., Dittmann, J., Doty, J.P., Glidden, A., Quinn, S.N., 2019. Identifying Exoplanets with Deep Learning. III. Automated Triage and Vetting of TESS Candidates. Astron. J. 158, 25. arXiv:1904.02726.

Zucker, S., Giryes, R., 2018. Shallow Transits—Deep Learning. I. Feasibility Study of Deep Learning to Detect Periodic Transits of Exoplanets. Astron. J. 155, 147. arXiv:1711.03163.