Finding the origin of noise transients in LIGO data with machine learning

Abstract

Quality improvement of interferometric data collected by gravitational-wave detectors such as Advanced LIGO and Virgo is mission critical for the success of gravitational-wave astrophysics. Gravitational-wave detectors are sensitive to a variety of disturbances of non-astrophysical origin with characteristic frequencies in the instrument band of sensitivity. Removing non-astrophysical artifacts that corrupt the data stream is crucial for increasing the number and statistical significance of gravitational-wave detections and enabling refined astrophysical interpretations of the data. Machine learning has proved to be a powerful tool for analysis of massive quantities of complex data in astronomy and related fields of study. We present two machine learning methods, based on random forest and genetic programming algorithms, that can be used to determine the origin of non-astrophysical transients in the LIGO detectors. We use two classes of transients with known instrumental origin that were identified during the first observing run of Advanced LIGO to show that the algorithms can successfully identify the origin of non-astrophysical transients in real interferometric data and thus assist in the mitigation of instrumental and environmental disturbances in gravitational-wave searches. While the data sets described in this paper are specific to LIGO, and the exact procedures employed were unique to the same, the random forest and genetic programming code bases and means by which they were applied as a dual machine learning approach are completely portable to any number of instruments in which noise is believed to be generated through mechanical couplings, the source of which is not yet discovered.



Marco Cavaglià, Kai Staats, and Teerth Gill
Department of Physics and Astronomy, The University of Mississippi, University MS 38677-1848, USA
Department of Physics and Astronomy, Embry-Riddle University, Prescott AZ 86301, USA
December 14, 2018


On February 11th, 2016, scientists from the Laser Interferometer Gravitational-wave Observatory (LIGO) [1] Scientific Collaboration (LSC) and the European Virgo Collaboration [2] announced the first direct detection of gravitational waves from a coalescing pair of stellar-mass black holes [3]. Detection of the GW150914 gravitational-wave signal, recorded at the LIGO sites in the morning of September 14th [...] The next decade will see this new branch of scientific research expand to a mature field [4].

Since GW150914, another four gravitational-wave detections from binary black hole systems [5–8] and a detection from a binary neutron star system [9] were recorded in the data stream of the Advanced LIGO and Virgo interferometers. More varied detections are anticipated in future LIGO and Virgo observation runs [10–12], spurring a plethora of astrophysical and theoretical investigations. KAGRA [13] and LIGO-India [14] will join the international network, enormously improving localization of astrophysical sources and the network duty cycle. Commissioning activities will strive to bring the instruments to design sensitivity. Instrumental R&D will focus on the design and realization of the next generation of gravitational-wave interferometric detectors on Earth [15] and in space [16]. All these activities will be crucial for the growth of gravitational-wave astrophysics from a sensational news item to a full-grown scientific method to explore our universe.

The measured rate of gravitational-wave detections in the first observing run (O1) and second observing run (O2) of Advanced LIGO and Virgo implies that the international network of interferometers is poised to detect a significant number of gravitational-wave events in the coming years. The third Advanced LIGO-Virgo observing run (O3) is scheduled for early 2019.
As the gravitational-wave detector network reaches a stage that supports rates of detections of astrophysical gravitational-wave sources as high as ∼ [...], a fast and accurate assessment of data quality will be critical.

The Advanced LIGO and Virgo detectors are sensitive to a variety of disturbances of non-astrophysical origin with characteristic frequencies in the instrument band of sensitivity [17, 18]. Noise transients of instrumental or environmental origin increase the false alarm rate of searches for gravitational-wave bursts and compact binary coalescences and affect measurements of these signals. The most remarkable example of the effect of a noise transient on a gravitational-wave signal is undoubtedly the glitch that occurred in the LIGO-Livingston detector in coincidence with the binary neutron star merger detection [9] and had to be carefully modeled and subtracted from the data to accurately determine the properties of the signal. Noise in the frequency domain affects searches for long-lived transients, continuous waves and stochastic background. Removing non-astrophysical artifacts from the data and improving the background of LIGO’s searches is crucial for reducing non-stationarity in the detectors, extending the network duty cycle, and increasing the statistical significance of gravitational-wave candidate events. Improvements in these areas, in turn, boost parameter estimation of the signals and enable refined astrophysical interpretations of the data. For all of these reasons, the understanding and mitigation of non-astrophysical disturbances in the detectors is one of the top priorities of the LSC and the Virgo Collaboration.

In recent years, a significant part of LSC and Virgo activities has been devoted to investigations aimed at characterizing non-astrophysical noise, improving data quality of gravitational-wave searches and detector commissioning.
Examples of these activities include investigations of noise transients and spectral features of known and unknown origin, studies of correlations between environmental and instrumental channels, detector performance assessment, and generation of data quality flags and vetoes for LIGO and Virgo’s searches. Many of these activities are conducted by instrument specialists, commissioners and data analysts working together to identify, categorize and mitigate undesirable noise transients and spectral features that corrupt LIGO-Virgo gravitational-wave searches. These tasks are generally performed by mining the data of the gravitational-wave strain channel and a large number (∼ [...] online to provide a fast assessment of the status of the interferometers for low-latency searches. A different set of tools is used offline for deeper searches and follow-up of gravitational-wave candidates. With the LIGO detectors striving to reach design sensitivity and an anticipated large number of gravitational-wave detections in the upcoming observing runs, understanding and mitigating instrumental and environmental noise sources will become increasingly important.

The expected increase in detections from compact binary systems and the discovery of gravitational waves from other kinds of astrophysical sources will likely render current methods for data quality assessment inadequate for the tasks ahead. For this reason, the development of improved methods to investigate noise and the exploration of new approaches to data quality issues are recognized priorities of LIGO and Virgo researchers.

Ground-based interferometric gravitational-wave detectors are complex devices that exhibit non-linear couplings across instrumental subsystems and the environment. As a consequence, the LIGO-Virgo detector noise is non-Gaussian, variable across many parameters, and cannot be fully analytically modelled.
Machine Learning (ML) customarily denotes the science of design, development, and applications of computer algorithms that “learn” to perform specific tasks and automatically improve their performance through the use of adaptive techniques and iterative procedures. Methods based on computational learning theory are powerful tools to analyse complex system data and may prove valuable for improving the manner in which LIGO and Virgo operate in the area of data quality.

Recently, several groups in the LSC and Virgo collaboration have investigated the use of ML techniques for data analysis and detector characterization. The “Gravity Spy” project, for example, aims at using citizen science and ML for classification of LIGO noise transients over the next observing runs [19]. Supervised deep learning algorithms have been proposed for glitch classification [20–23] as well as real-time gravitational-wave detection [24], parameter estimation [25] and signal classification [26–29]. Multivariate random forest classifiers [30, 31] and unsupervised ML algorithms based on Principal Component Analysis [32, 33] have been used on interferometric data over the years. ML methods can provide complementary approaches to existing detector characterization techniques as they are computationally inexpensive, able to deliver results in low latency, and derive predictive models of the system producing the data.

In this paper we present a new application of ML to an old problem in experimental detection physics: the identification of instrumental mechanical couplings leading to excess noise in the detector. Our analysis focuses on ground-based gravitational-wave interferometric detectors, in particular the LIGO-Virgo instruments.
However, the methods presented here are general and can be applied to any complex physical device with a main output channel and a set of auxiliary channels that monitor the status of the instrument and its environment.

The ultimate goal of detector characterization is not only to flag and veto noisy times in the instrument main output data stream, but also to identify the instrumental or environmental source of the noise and, if possible, adjust the detector such that the disturbance can be removed permanently. Whereas different ML techniques have been developed by various LSC groups to classify noise transients, or glitches, ML has not yet been applied to the problem of identifying the cause of non-astrophysical noise in LIGO-Virgo detectors. Herein lies the novelty of our approach: we introduce two ML algorithms that provide simple yet robust methods to mine the data of auxiliary channels and infer the origin of noise transients in the main detector output. The codes developed for this task are based on two widely used flavors of ML known as Random Forest (RF) and Genetic Programming (GP). This choice is motivated by our final goal of providing fast and effective tools that commissioners and data analysts can use with little tuning. In contrast to more black-box approaches to ML such as deep learning and neural network-based algorithms, RF and GP methods are interpretable, easy to use and tune, and can work with relatively small datasets without the inherent risk of overfitting. The methods that we illustrate below only require an input list of times when a specific class of noise transients occurs. Rather than generating ML features in the form of time-frequency images for deep learning image-based classification (a time-consuming process), the required features are drawn directly from numerical metadata that are generated by real-time data quality pipelines for generic detector characterization investigations and are already readily available on the LIGO computing clusters.
This approach minimizes the feature generation step of the process, which is the typical bottleneck for low-latency investigations. The ML dataset is readily assembled by putting together the features of the noise transients and additional, randomly-selected background triggers. The RF and GP codes that we developed can be trained and run in minutes on LIGO computing clusters and allow the user to complete a typical analysis with a number of noise transients of the order of a few thousand in low latency and with minimal input.

In the following sections we illustrate this approach by applying the RF and GP methods to two sets of glitches in LIGO data from the LIGO-Virgo O1 and O2 observing runs. The origin of these glitches was identified by LIGO-Virgo commissioners and scientists, eventually leading to the successful mitigation of these disturbances and their removal from later data. While our present work does not have a direct impact on the searches for transient gravitational waves with O1 and O2 data, it presents a proof of concept that our method can be implemented within the current LIGO-Virgo data quality infrastructure, which will be in place in the upcoming observing runs, and used to infer instrumental and environmental mechanical couplings affecting the detectors in O3 and beyond.

In this section we briefly introduce the basics of the RF and GP algorithms. A full introduction to these methods is beyond the scope of this paper and we only present information which is essential for the understanding of our analysis. For a deeper discussion of RF and GP, the reader is referred to Refs. [34, 35].

RF denotes a popular supervised ML technique for data classification and regression which operates without human intervention, employing off-line training (learning) and test (validation) to produce the outcome. The basic principle of the RF method is the construction of a number of decision trees at training time and then averaging over these trees to improve their performance on the testing set. A decision tree can be represented by a graph with three types of elements representing tests on input features (internal nodes), test outcomes (branches) and labels that are used to make predictions (leaves). The topmost node in a tree is called the root node. Paths from root to end leaves represent classification or regression rules. An example of a decision tree is shown in Fig. 1.

Single decision trees may have high bias and overfit the training set, as well as be very sensitive to outlier data. In order to avoid this, the RF method calculates a combination of independently-sampled tree predictors. This averaging procedure avoids both false minima and over-training, leading to a single, more complex, final tree which is less likely to overfit. One of the main advantages of RF is the transparent nature of its computational algorithm. Ensemble algorithms such as RF are typically more powerful than other ML techniques. They require little data preparation and are easy to interpret. As they have few hyperparameters to tune, they can produce accurate results even with default settings. Feature importance and algorithm accuracy are generated automatically.

Figure 1: Graphical representation of a decision tree. The root node is represented by the red element, internal nodes are represented by green elements, leaves are represented by blue elements and branches are represented by black connector lines which denote success or failure of the stated condition.

In our analysis we used the standard scikit-learn [34] implementation of the RF algorithm with default settings with the exception of the number of estimators (trees in the forest) that we chose in the range of 300-500. The RF method allows for a straightforward computation of the feature importance, i.e., how much each feature contributes to the model’s predictive performance. In the scikit-learn implementation of the RF algorithm, the feature importance is defined as the “gini importance” [36], i.e., the average over all trees of the total decrease in node impurity, weighted by the probability of reaching that node.

In order to increase the signal-to-noise ratio of the RF algorithm, we refit multiple times with each iteration removing features below a user-defined threshold until no features below the chosen cutoff value are left. This process allows us to denoise the output by eliminating features that are marginal in the determination of the source of the glitches. The features belonging to the same auxiliary channel are then grouped together to build the channel importance, which measures the relevance of the given channel in discriminating generic background noise triggers (label 0) from the glitches under investigation (label 1).
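
The iterative refit and the grouping of features into channels can be sketched with scikit-learn as follows. This is a minimal illustration on synthetic data, not the code used in the analysis: the function name `iterative_channel_importance`, the `CHANNEL:feature` naming convention, the toy channel names and the threshold chosen for the example are all assumptions made for the sketch.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def iterative_channel_importance(X, y, feature_names, threshold=0.005,
                                 n_estimators=300, seed=0):
    """Refit a random forest, dropping features whose gini importance is
    below `threshold`, then sum the surviving importances per channel.

    Feature names are assumed (hypothetically) to look like 'CHANNEL:feature'.
    """
    X = np.asarray(X, dtype=float)
    names = list(feature_names)
    while True:
        forest = RandomForestClassifier(n_estimators=n_estimators,
                                        random_state=seed)
        forest.fit(X, y)
        importances = forest.feature_importances_
        keep = importances >= threshold
        # Stop when no feature is below the cutoff (or none would survive).
        if keep.all() or not keep.any():
            break
        X = X[:, keep]
        names = [name for name, kept in zip(names, keep) if kept]

    # Channel importance = sum of the feature importances of that channel.
    channel_importance = {}
    for name, value in zip(names, importances):
        channel = name.split(":", 1)[0]
        channel_importance[channel] = (
            channel_importance.get(channel, 0.0) + float(value))
    return channel_importance


# Synthetic example: three hypothetical channels with two features each.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 150)           # 0 = background, 1 = glitch
features = rng.normal(size=(300, 6))
features[labels == 1, 0] += 4.0           # only CHAN_A:snr separates classes
feature_names = [f"{chan}:{feat}"
                 for chan in ("CHAN_A", "CHAN_B", "CHAN_C")
                 for feat in ("snr", "peak_frequency")]

channel_importance = iterative_channel_importance(
    features, labels, feature_names, threshold=0.05)
top_channel = max(channel_importance, key=channel_importance.get)
print(top_channel, channel_importance)
```

Because scikit-learn normalizes `feature_importances_` to sum to one at every fit, the channel importances returned after the last refit also sum to one, which makes them directly comparable across channels.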

GP is a supervised machine learning algorithm, an analog to biological natural selection, that evolves a population of programs to solve a particular problem [37]. An individual GP program is a hypothesis which, when executed, takes the form of a mathematical, multivariate expression.

In training, GP compares the output of each executed hypothesis (predict) to an associated, qualified label (truth). This comparison is quantified as a fitness score. GP programs that demonstrate a higher fitness score are more likely (but not guaranteed) to be selected for the next generation. Thus, each subsequent generation of programs is more likely (but not guaranteed) to solve the given problem than the prior [38].

As shown in Figure 2, GP multivariate expressions are often represented as a syntax tree, where the trees have a root (top center), nodes (mathematical operators), and leaves (operands). Operators can be arithmetic, trigonometric, and boolean, for example. Operands are variable placeholders for the real-world data. When evaluated, the real-world values of the data are substituted for the variables in the multivariate expressions, data point by data point. The depth of a tree determines the complexity of the resulting, evolved multivariate expression.

Figure 2: Graphical representation of a GP syntax tree. The depth of a tree is defined as the number of rows in the tree, i.e., two, three and four in the diagrams from left to right, respectively.

User-defined parameters affect the quality and speed of the evolutionary process, including the size and type of GP trees in the initial population, the number of GP programs selected for each fitness score comparison and the type of comparison applied, and the termination criterion (e.g., number of generations).

The workflow of a generational GP run incorporates three basic steps: a) generation of an initial, stochastic population; b) iterative selection, evaluation, and application of genetic operations (reproduction, mutation and crossover; see Fig. 3); c) transfer of the evolved copy into the subsequent generation. Steps b) and c) are repeated until the user-defined termination criteria are met [38].

Key to the acceptance of GP across many fields of research is the transparent nature of its computational engine. At any stage of the evolutionary process, the internal workings of GP can be readily exposed and reviewed, and the populations archived. As compared to other, more black-box machine learning algorithms, GP provides insight into how it arrives at its evolved solution. Moreover, as the GP model is a stand-alone mathematical expression whose variables call upon data features, it can be readily employed as a portable model for online data classification or regression analysis. The algorithm can be tuned by choosing a number of user-defined hyperparameters, which include base, maximum and minimum tree depth, the size of the program population, the number of generations and the tournament size.

In our analysis we used a tree-based open source Python code, Karoo GP [39], that was originally written by one of the authors (KS) for the mitigation of RFI in radio astronomy at the Square Kilometre Array [40]. Karoo GP is scalable, with multicore and GPU support enabled by the library TensorFlow, with capacity to work with very large datasets [41].
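
To make the syntax-tree representation, the fitness score and the tournament selection concrete, here is a small pure-Python sketch. It does not use or reproduce the Karoo GP interface: the tuple encoding of trees, the rule "predict label 1 when the expression is positive", and all function and variable names are assumptions chosen for illustration. Genetic operations and the generational loop are left out.

```python
import operator
import random

OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tree, row):
    """Evaluate a syntax tree on one data point (a dict of feature values)."""
    if isinstance(tree, str):          # leaf: variable placeholder
        return row[tree]
    symbol, left, right = tree         # node: (operator, subtree, subtree)
    return OPERATORS[symbol](evaluate(left, row), evaluate(right, row))


def depth(tree):
    """Number of rows in the tree, following the Figure 2 convention."""
    if isinstance(tree, str):
        return 1
    return 1 + max(depth(tree[1]), depth(tree[2]))


def fitness(tree, rows, labels):
    """Count data points where the prediction matches the label.

    Assumed rule: predict label 1 when the expression value is positive.
    """
    return sum(int(evaluate(tree, row) > 0) == label
               for row, label in zip(rows, labels))


def tournament(population, rows, labels, size, rng):
    """Pick `size` programs at random and return the fittest of them."""
    contenders = rng.sample(population, size)
    return max(contenders, key=lambda tree: fitness(tree, rows, labels))


# Toy data: label 1 whenever feature "a" exceeds feature "b".
rows = [{"a": 1.0, "b": 3.0}, {"a": 4.0, "b": 1.0},
        {"a": 2.0, "b": 5.0}, {"a": 6.0, "b": 2.0}]
labels = [0, 1, 0, 1]
population = [("-", "a", "b"),
              ("+", "a", "b"),
              ("*", ("-", "b", "a"), "a")]

scores = [fitness(tree, rows, labels) for tree in population]
winner = tournament(population, rows, labels, size=3, rng=random.Random(0))
print(scores, winner, depth(population[2]))
```

With a tournament size equal to the population size the selection is deterministic; smaller tournaments give weaker programs a chance to be selected, which is why higher fitness makes selection more likely but not guaranteed.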

Figure 3: Genetic operators for evolutionary computation: point mutation (top left), branch mutation (bottom left), and crossover or sexual reproduction (right). Reproduction is defined as no change in a tree when copied from the current to the next generation.

In order to illustrate how ML can be used to identify non-astrophysical transients in LIGO data and recover mechanical couplings in the detector subsystems, we consider two sets of glitches with known origin in analysis-ready data from Advanced LIGO’s first Observing Run (O1: September 12th, 2015, to January 19th, 2016) and second Observing Run (O2: November 30th, 2016, to August 25th, 2017). Here “analysis ready” denotes data taken with the interferometers in a nominal observing state.

During observing runs the status of the LIGO interferometers is continuously monitored through a number of physical sensors that probe the detector subsystems and their environment. The digital output of these sensors is recorded in thousands of auxiliary channels as raw time series. Dedicated detector characterization pipelines use these raw data to identify non-astrophysical noise transients and spectral features that may affect the instrument main output, or gravitational-wave strain channel. Auxiliary channels are typically separated into “safe” and “unsafe” channels. Safe auxiliary channels are not expected to show any excess noise when an astrophysical signal is present in the gravitational strain channel. For example, physical environmental monitor (PEM) channels are considered safe, as a gravitational wave is not supposed to generate any signal in environmental sensors. Thus excess noise in these channels generally denotes a non-astrophysical disturbance. Data quality flags and ultimately vetoes can be created from safe channels to reduce the instrumental background and improve the gravitational-wave searches. Unsafe auxiliary channels are known to couple to astrophysical signals.
If a gravitational wave is present in the data, its signal is expected to couple to some of the detector subsystems, for example the interferometer output mode cleaner. Unsafe channels cannot be used to create vetoes and are not considered in detector characterization investigations. Standard lists of safe and unsafe channels are made for each observing run and periodically tested through insertions of simulated signals in the instrument (hardware injections). In our analysis we consider the standard O1 and O2 lists of safe auxiliary channels as determined by the detector characterization working group and used by the hveto pipeline [42], comprising 840 and 919 channels, respectively.

The first set that we consider in our study contains 2049 glitches that were identified by the hveto pipeline in a magnetometer located at one of the end-test stations of the LIGO-Livingston interferometer between February 9th, 2017 and April 10th, 2017. LIGO’s equipment hosted in electronics racks generates magnetic fields that may couple to other components of the detector, such as cables, connectors and actuators. In order to measure these spurious magnetic fields, several magnetometers are deployed in the main interferometer stations. These magnetometers are also used to monitor DC power supply glitches and currents that may produce artifacts in the gravitational-wave channel [43].

After a power outage on February 7th, 2017 the hveto pipeline identified a series of new glitches of electromagnetic origin in the LIGO-Livingston interferometer within the detector search frequency band, around ∼ [...] LIGO commissioners successfully mitigated the glitches by modifying the grounding system of the electronics bay with the installation of a ground rod which eliminated a spurious current at the origin of the electromagnetic disturbance.

The EX magnetometer glitches provide a good playground for ML testing because of their distinct spatial and temporal localization, as well as their understood origin and successful mitigation. Moreover, due to their electromagnetic nature they appear only in a well-defined subset of auxiliary channels. This allows for a clear-cut test of the algorithm’s ability to identify the correct auxiliary channels related to the noise and infer the source of the instrumental coupling.

The second set contains 42 short-lived noise transients that were caused by an air compressor seismically coupling to the LIGO-Hanford interferometer between September 18th, 2015 and December 28th, 2015. In September 2015 detector characterization investigations indicated the presence of some transient excess noise of unknown origin at a frequency around 50 Hz in the gravitational strain output of the LIGO-Hanford interferometer. The noise was recognized to originate in the EX station. The hveto pipeline indicated a correlation in time with glitches in the EX PEM seismic (SEIS) and accelerometer (ACC) auxiliary channels, as well as the Streckeisen STS-2 (STS) inertial sensors monitoring the ground motion (GND) of the active seismic isolation internal to the vacuum system (ISI).
The effect of the excess noise due to the seismic coupling on the transmitted light (TR) along the direction of the interferometer X arm caused some excess pitch and yaw motion of the EX test mass that was recorded in the Alignment Sensing and Control (ASC) channels. Dedicated investigations pointed to a mechanical coupling as the origin of the transients, such as a motor or a transformer core pulse inducing the 50 Hz characteristic. A time delay between the occurrence of the noise in the accelerometers and the voltage monitors indicated that the origin of the transients was not located in the immediate vicinity of the EX optics table. The culprit was eventually identified as an air compressor turning on in the EX station which was seismically coupling to the detector via the optics table. Follow-up investigations with Gravity Spy [19] led to the identification of all the 42 glitches in our dataset. Although the overall number of these glitches is quite low for ML training purposes, their physical properties are well characterized and their instrumental mechanical couplings are well understood. Similarly to the magnetometer set, the air compressor glitches provide a good playground for testing the efficiency of the ML algorithms in determining the origin of noise disturbances.

These diverse sets allow us to test the effectiveness of the ML algorithms in two extreme cases that are typical of LIGO noise investigations, where thousands or just a handful of glitches can be identified by detector characterization pipelines or manual data mining techniques.

Short-lived noise transients are generally identified by a trigger in the form of a single GPS time or a time interval where the disturbance has its peak.
The peak of the glitch may be computed by simply recording the time where the value of a given auxiliary channel time series passes a pre-defined threshold or through more refined methods, for example by defining a Signal-to-Noise Ratio (SNR) or looking at correlations between channels and/or the main interferometer output.

Our analysis is based on Omicron triggers. Omicron is a widely used LIGO pipeline to identify glitches in instrumental and environmental auxiliary channels [44]. The algorithm is based on a C++ burst-type Event Trigger Generator (ETG) which is itself based on the Q-transform [45], a modification of the standard short-time Fourier transform similar in construction to the continuous wavelet transform. The channel time series is projected onto a parameter space tiled in time, frequency, and Q planes. An Omicron trigger is identified when the SNR of a time-frequency tile is above a user pre-defined threshold. The characteristics of each trigger are recorded in a user-configurable data vector describing the physical parameters of the trigger. Elements of this data vector can be used as raw features for the ML algorithm. In our analysis, we use a six-dimensional Omicron data vector with peak frequency, central frequency, bandwidth, amplitude, SNR and phase elements.

The Omicron pipeline ran daily in O1 and O2 on the standard list of auxiliary channels, recording noise triggers with SNR > [...]5. The Omicron features corresponding to the magnetometer and air compressor glitches are obtained by selecting for each glitch time and auxiliary channel the online Omicron trigger with the highest SNR within a coincidence window of ± [...] > [...] ± 10 seconds) and then obtaining the features for each auxiliary channel as done for the glitch set. Finally, label 0 and label 1 triggers in the datasets are randomized with 2/3 of the entries being used for ML training and internal validation, and the remaining 1/3 being reserved for testing.
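
The selection of the loudest coincident trigger per channel and the 2/3 to 1/3 split can be illustrated with the following pure-Python sketch. It is not the pipeline used in the analysis: the dictionary layout of a trigger, the key names, the `CHANNEL:feature` naming, the zero fill for channels without a coincident trigger, and the 0.1 s window used in the example are all assumptions (the window is a placeholder, not a value taken from the text).

```python
import random

# Assumed trigger layout: one dict per Omicron-like trigger holding a GPS
# time plus the six features named in the text. Key names are hypothetical.
FEATURES = ("peak_frequency", "central_frequency", "bandwidth",
            "amplitude", "snr", "phase")


def loudest_coincident_trigger(event_time, triggers, window):
    """Highest-SNR trigger within +/- `window` seconds, or None if absent."""
    nearby = [t for t in triggers if abs(t["time"] - event_time) <= window]
    return max(nearby, key=lambda t: t["snr"]) if nearby else None


def build_row(event_time, triggers_by_channel, window, label):
    """Collect the features of the loudest coincident trigger per channel."""
    row = {"label": label}
    for channel, triggers in triggers_by_channel.items():
        best = loudest_coincident_trigger(event_time, triggers, window)
        for feature in FEATURES:
            # Channels without a coincident trigger get 0.0 (an assumption).
            row[f"{channel}:{feature}"] = best[feature] if best else 0.0
    return row


def split_dataset(rows, train_fraction=2 / 3, seed=0):
    """Shuffle the rows and split them into training and testing subsets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def make_trigger(time, snr):
    """Build a toy trigger; only time and SNR vary in this example."""
    return {"time": time, "peak_frequency": 60.0, "central_frequency": 58.0,
            "bandwidth": 8.0, "amplitude": 1e-3, "snr": snr, "phase": 0.0}


triggers_by_channel = {
    "CHAN_A": [make_trigger(100.02, 8.0), make_trigger(100.05, 12.0),
               make_trigger(103.0, 30.0)],
    "CHAN_B": [],
}

# One glitch time (label 1) and five toy background times (label 0).
glitch_row = build_row(100.0, triggers_by_channel, window=0.1, label=1)
background_rows = [build_row(t, triggers_by_channel, window=0.1, label=0)
                   for t in (10.0, 20.0, 30.0, 40.0, 50.0)]
train, test = split_dataset([glitch_row] + background_rows)
print(glitch_row["CHAN_A:snr"], len(train), len(test))
```

The louder trigger at 103.0 s is ignored because it lies outside the coincidence window, so the row for the glitch carries the SNR 12 trigger for `CHAN_A`.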

In order to test the ML algorithms and illustrate how to infer the glitch mechanical couplings, we first ran the RF algorithm on the training sets to compute the channel importance (defined as the sum of the feature importances for the given channel). We tested the procedure by varying the number of estimators and the iteration threshold, noticing no significant change in the results. For illustration purposes, here we present results for 500 estimators and iteration thresholds of 0.005 and 0 for the magnetometer and air compressor sets, respectively. The RF results were successfully validated with Karoo GP, which was run multiple times with different hyperparameter configurations to test the robustness of the procedure. Below we present results averaged on all Karoo GP runs, as well as some of the results from the best performing runs.

Auxiliary channel                          RF Importance
ISI-ETMX_ST1_BLND_Z_T240_CUR_IN1_DQ        0.041
PEM-EX_MAG_EBAY_SUSRACK_QUAD_SUM_DQ        0.071
PEM-EX_MAG_EBAY_SUSRACK_X_DQ               0.155
PEM-EX_MAG_EBAY_SUSRACK_Z_DQ               0.041
PEM-EX_MAG_VEA_FLOOR_QUAD_SUM_DQ           0.108
PEM-EX_MAG_VEA_FLOOR_X_DQ                  0.174
PEM-EX_MAINSMON_EBAY_1_DQ                  0.075
PEM-EX_MAINSMON_EBAY_3_DQ                  0.026
PEM-EX_MAINSMON_EBAY_QUAD_SUM_DQ           0.298
PEM-EY_MAINSMON_EBAY_1_DQ                  0.011

Table 1: Auxiliary channels with nonzero RF importance for the magnetometer set. Different colors denote instrumental and environmental auxiliary channels corresponding to different detector subsystems: olive green = Internal Seismic Isolation (ISI), sienna = Physical and Environmental Monitor (PEM).
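
As a quick arithmetic check, the importances of Table 1 can be summed by sensor family to recover the approximate shares quoted in the text (about 40%, 27% and 28%). The values are those of Table 1. The channel names are written with underscore separators, which is an assumed spelling, and the group labels and name prefixes are introduced here only for illustration.

```python
# RF importances from Table 1 (magnetometer set).
table1 = {
    "ISI-ETMX_ST1_BLND_Z_T240_CUR_IN1_DQ": 0.041,
    "PEM-EX_MAG_EBAY_SUSRACK_QUAD_SUM_DQ": 0.071,
    "PEM-EX_MAG_EBAY_SUSRACK_X_DQ": 0.155,
    "PEM-EX_MAG_EBAY_SUSRACK_Z_DQ": 0.041,
    "PEM-EX_MAG_VEA_FLOOR_QUAD_SUM_DQ": 0.108,
    "PEM-EX_MAG_VEA_FLOOR_X_DQ": 0.174,
    "PEM-EX_MAINSMON_EBAY_1_DQ": 0.075,
    "PEM-EX_MAINSMON_EBAY_3_DQ": 0.026,
    "PEM-EX_MAINSMON_EBAY_QUAD_SUM_DQ": 0.298,
    "PEM-EY_MAINSMON_EBAY_1_DQ": 0.011,
}

# Hypothetical group labels mapped to channel-name prefixes.
groups = {
    "EX MAINSMON": "PEM-EX_MAINSMON_",
    "EX SUSRACK magnetometer": "PEM-EX_MAG_EBAY_SUSRACK_",
    "EX VEA magnetometer": "PEM-EX_MAG_VEA_FLOOR_",
}

shares = {
    label: sum(value for name, value in table1.items()
               if name.startswith(prefix))
    for label, prefix in groups.items()
}
total = sum(table1.values())

for label, share in shares.items():
    print(f"{label}: {share:.3f}")
print(f"total: {total:.3f}")
# EX MAINSMON: 0.399
# EX SUSRACK magnetometer: 0.267
# EX VEA magnetometer: 0.282
# total: 1.000
```

The ten importances sum to one, so each share can be read directly as a fraction of the total importance retained after the iterative refit.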

The auxiliary channels with nonzero importance obtained with an RF iteration threshold of 0.005 are listed in Table 1. Out of the ten auxiliary channels, nine are related to the EX detector subsystem and one, PEM-EY_MAINSMON_EBAY_1_DQ, is the MAINSMON channel of the end-Y (EY) station's EBAY. As the latter has the lowest importance and can be removed by choosing a higher RF iteration threshold while preserving most of the EX channels, we can safely conclude that it is not related to the actual mechanical coupling originating the glitches. Eight of the auxiliary channels related to the EX subsystems are PEM channels for the EBAY and the Vacuum Equipment Area (VEA) magnetometers and for the MAINSMON. The additional channel monitors the Nanometrics Trillium 240 (T240) inertial broadband sensor picked off at the input to the BLEND filter bank of the active internal seismic isolation of the end test mass. The data in Table 1 are shown as a histogram in Fig. 4.

The RF algorithm correctly identifies the voltage monitor of the EX electronics bay as the origin of the noise transients. The winning channel is PEM-EX_MAINSMON_EBAY_QUAD_SUM_DQ, i.e., the quadrature sum of the raw EBAY MAINSMON output recorded by the Data Acquisition System (DQ). The three EX MAINSMON channels account for ∼40% of the features used in the RF classification. The PEM-EX_MAG_EBAY_SUSRACK channels recording the raw output of the EX rack magnetometer and the PEM-EX_MAG_VEA_FLOOR channels recording the output of the VEA magnetometer account for ∼27% and ∼28% of the features, respectively. This result clearly points to an electromagnetic origin of the glitches in the EX station, in agreement with the results first obtained with the hveto pipeline and later confirmed by LIGO commissioners.

The identification of ISI-ETMX_ST1_BLND_Z_T240_CUR_IN1_DQ by the RF code is interesting, as it may seem puzzling that electromagnetic glitches show in the output of an accelerometer. A possible explanation for the inclusion of this channel could simply be algorithm noise, as is the case with the PEM-EY_MAINSMON_EBAY_1_DQ channel. However, the fact that this auxiliary channel is one of the accelerometer channels in the EX station and cannot be eliminated by increasing the iteration threshold without also eliminating most of the other EX channels suggests that the coupling may be real. Indeed, broadband seismometers may couple to environmental magnetic fields [46, 47]. In particular, the coherence of the T240 has been studied in detail in Ref. [48], where it is shown that strong magnetic fields may even dominate the seismometer signal. Thus the inclusion of ISI-ETMX_ST1_BLND_Z_T240_CUR_IN1_DQ indicates that the electromagnetic disturbance is sufficiently strong to couple to the ETMX seismometer, a fact that had not been recognized during standard detector characterization investigations leading to the mitigation of the glitches.

Figure 4 (horizontal axis: auxiliary channel; vertical axis: RF channel importance): Histogram of RF channel importance for the magnetometer set from the data in Table 1. Different colors denote different detector subsystems and auxiliary channels: Sea green = Arm-length Stabilization (ALS), orchid = Alignment Sensing and Control (ASC), goldenrod = Photon Calibrator (CAL), royal blue = Hydraulic External Pre-Isolator (HPI), lime green = Input Mode Cleaner (IMC), olive green = Internal Seismic Isolation (ISI), violet = Length Sensing and Control (LSC), gray = Output Mode Cleaner (OMC), sienna = Physical and Environmental Monitor (PEM), orange = Pre-Stabilized Laser (PSL), turquoise = Suspension (SUS), magenta = Thermal Compensation (TCS). The plot clearly shows how the glitches arise from an environmental electromagnetic disturbance in the EX station.

The above results can be validated with Karoo GP. As the code does not include a standard function to compute the feature importance, we build a GP analog of this quantity as follows. We run the code multiple times with varying hyperparameters and select the runs that produce a classification of the testing set with recall and specificity both above a given threshold. Then we define the channel importance by counting how many times the features of each channel are used in the winning GP multivariate expression of the selected runs and normalize to the total number of features used. Although this is a rough method of defining the feature importance, it is sufficient for our simple RF validation task. Figure 6 shows the results from eight runs passing a 92% threshold (out of a total of 160 runs). The confusion matrix, precision and recall for these runs are shown in Table 2. Histograms of precision and recall for all 160 runs are shown in Fig. 5.

Run   TN    FP   FN   TP    RC      PR
11    634   42   44   623   0.934   0.937
82    659   17   50   617   0.925   0.973
91    644   32   50   617   0.925   0.951
126   634   42   50   617   0.925   0.936
134   638   38   49   618   0.927   0.942
146   625   51   45   622   0.933   0.924
148   629   47   50   617   0.925   0.929
153   644   32   53   614   0.921   0.950

Table 2: Confusion matrix for the eight best Karoo GP runs on the magnetometer set (out of 160 total runs) that are used to determine the GP channel importance. TN = true negatives (background noise correctly identified), FP = false positives (background noise misidentified as magnetometer glitch), FN = false negatives (glitch misidentified as background noise), TP = true positives (glitch correctly identified), RC = recall, PR = precision. Run 11 has the best recall. The run with the best precision (PR=0.998) fails to pass the recall cut-off (RC=0.864) and is not included in the runs used for the computation of the channel importance.

Because of the limited number of runs used to compute the channel importance, the GP results are generally noisier than the RF results. However, the GP feature importance is in excellent agreement with the RF importance calculated earlier. Table 3 and Fig. 6 show the channels with GP importance larger than 0.012. The auxiliary channels most used by Karoo GP to separate the magnetometer glitches from the background are two channels of the T240 accelerometer in the EX station (ISI-ETMX_ST1_BLND_Z_T240_CUR_IN1_DQ and ISI-ETMX_ST1_BLND_RY_T240_CUR_IN1_DQ) and the PEM-EX_MAINSMON_EBAY_QUAD_SUM_DQ channel. The first of the ISI channels and the PEM channel are the channels with the highest RF importance for the ISI and PEM subsystems, respectively. As remarked above, the presence of additional, unrelated channels such as ISI-HAM2_BLND_GS13RZ_IN1_DQ, LSC-POP_A_RF9_I_ERR_DQ and SUS-MC1_M2_NOISEMON_LR_OUT_DQ shows that the GP results are noisier than the RF results. As the GP process is stochastic, cleaner results may be obtained by increasing the number of runs and/or the efficiency threshold used for the selection of the winning multivariate expressions. Another way to improve on these results would be to replace the rough count of the channel features with a better definition of channel importance.
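The GP channel importance described in this section (select the runs whose recall and specificity both exceed a threshold, count how often each channel's features appear in the winning expressions, normalize by the total count) can be sketched as below. This is a rough illustration, not Karoo GP code. The first two confusion matrices are runs 11 and 82 of Table 2; the third run and all feature lists are hypothetical, and counting every occurrence of a feature (rather than each distinct feature once per run) is an assumption.

```python
from collections import Counter


def recall(tn, fp, fn, tp):
    return tp / (tp + fn)


def specificity(tn, fp, fn, tp):
    return tn / (tn + fp)


def gp_channel_importance(runs, threshold):
    """Normalized channel counts over runs passing the recall/specificity cut.

    Each run is a dict with 'confusion' = (TN, FP, FN, TP) and 'features' =
    list of (channel, feature) pairs used in its winning expression.
    """
    counts = Counter()
    for run in runs:
        confusion = run["confusion"]
        if recall(*confusion) > threshold and specificity(*confusion) > threshold:
            counts.update(channel for channel, _feature in run["features"])
    total = sum(counts.values())
    return {channel: n / total for channel, n in counts.items()} if total else {}


runs = [
    {   # Table 2, run 11 (feature list is hypothetical)
        "confusion": (634, 42, 44, 623),
        "features": [
            ("ISI-ETMX_ST1_BLND_Z_T240_CUR_IN1_DQ", "snr"),
            ("PEM-EX_MAINSMON_EBAY_QUAD_SUM_DQ", "snr"),
            ("PEM-EX_MAINSMON_EBAY_QUAD_SUM_DQ", "frequency"),
        ],
    },
    {   # Table 2, run 82 (feature list is hypothetical)
        "confusion": (659, 17, 50, 617),
        "features": [("PEM-EX_MAINSMON_EBAY_QUAD_SUM_DQ", "snr")],
    },
    {   # hypothetical high-precision run that fails the recall cut
        "confusion": (675, 1, 91, 576),
        "features": [("SUS-MC1_M2_NOISEMON_LR_OUT_DQ", "snr")],
    },
]

importance = gp_channel_importance(runs, threshold=0.92)
for channel, value in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{channel}: {value:.2f}")
```

Because the third run is rejected by the recall cut, its channel never enters the count, which is how a higher threshold can suppress spurious channels at the cost of fewer runs.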

Auxiliary channel                          GP importance
ISI-ETMX_ST1_BLND_Z_T240_CUR_IN1_DQ (*)    0.049
ISI-ETMX_ST1_BLND_RY_T240_CUR_IN1_DQ       0.042
ISI-HAM2_BLND_GS13RZ_IN1_DQ                0.027
LSC-POP_A_RF9_I_ERR_DQ                     0.015
PEM-EX_MAINSMON_EBAY_QUAD_SUM_DQ (*)       0.042
PEM-EX_MAINSMON_EBAY_1_DQ (*)              0.020
PEM-EY_MAINSMON_EBAY_3_DQ                  0.015
PEM-EX_MAG_VEA_FLOOR_Z_DQ                  0.013
PEM-EX_MAINSMON_EBAY_3_DQ (*)              0.013
SUS-MC1_M2_NOISEMON_LR_OUT_DQ              0.027

Table 3: Auxiliary channels with GP importance larger than 0.012 for the magnetometer set. As in Fig. 4, different colors denote instrumental and environmental auxiliary channels corresponding to different detector subsystems. Channels marked with an asterisk (set in italic in the typeset version) are those selected also by the RF algorithm (see Table 1).

We repeat the investigation of the previous section for the air compressor set. As this dataset is reduced to only 16 air compressor noise transients after data preparation, and training occurs only on 2/3 of the glitches (plus an equal number of background triggers), we do not expect the air compressor results to be as clear-cut as the results for the magnetometer set. However, even if the identification of the mechanical couplings were to fail, this test would provide important information about the effectiveness of the method for small datasets. Prompt identification of the origin of mechanical couplings in the detector is of paramount importance for gravitational-wave searches during observing runs. If the origin of mechanical couplings can be inferred with just a limited number of recorded glitches as soon as they appear in the detector, the method may prove very useful for commissioning purposes.

Auxiliary channel                  RF importance
ASC-X_TR_B_PIT_OUT_DQ              0.079
ASC-X_TR_B_YAW_OUT_DQ              0.169
HPI-ETMX_BLND_L4C_RX_IN1_DQ        0.052
HPI-ETMX_BLND_L4C_RY_IN1_DQ        0.008
HPI-ETMX_BLND_L4C_RZ_IN1_DQ        0.010
HPI-ETMX_BLND_L4C_Y_IN1_DQ         0.038
ISI-GND_STS_ETMX_X_DQ              0.228
ISI-GND_STS_ETMX_Y_DQ              0.012
PEM-EX_ACC_BSC9_ETMX_Z_DQ          0.055
PEM-EX_ACC_EBAY_FLOOR_Z_DQ         0.203
PEM-EX_ACC_OPLEV_ETMX_Y_DQ         0.008
PEM-EX_ACC_VEA_FLOOR_Z_DQ          0.045
PEM-EX_SEIS_VEA_FLOOR_Y_DQ         0.024
SUS-ETMX_L3_OPLEV_YAW_OUT_DQ       0.069

Table 4: Auxiliary channels with nonzero RF importance for the air compressor set. Different colors denote instrumental and environmental auxiliary channels corresponding to different detector subsystems: Orchid = Alignment Sensing and Control (ASC), royal blue = Hydraulic External Pre-Isolator (HPI), olive green = Internal Seismic Isolation (ISI), sienna = Physical and Environmental Monitor (PEM), turquoise = Suspension (SUS).

Figure 5 (vertical axis: percentage of runs): Karoo GP precision (left) and recall (right) for the magnetometer testing set from 160 runs with varying hyperparameters (tree base depth = 10, maximum tree depth = 10, minimum tree depth = 3, population = 300, generations = 100 or 150, tournament size = 10 or 20). The black curves are Gaussian fits to the data. Mean and standard deviation of the Gaussian fits are shown in the legend.

As before, we run the RF code to identify the most relevant auxiliary channels related to the air compressor set. The auxiliary channels with nonzero importance obtained with an RF iteration threshold of 0 are listed in Table 4 and shown graphically in Fig. 7. The code identifies five detector subsystems related to the glitch class. Summing the channel importances in Table 4 by subsystem, the Alignment Sensing and Control (ASC) subsystem with two auxiliary channels accounts for a total feature importance percentage of ∼25%, the Hydraulic External Pre-Isolator (HPI) subsystem for ∼11%, the Internal Seismic Isolation (ISI) subsystem for ∼24%, the Physical and Environmental Monitor (PEM) subsystem for ∼34% and the Suspension (SUS) subsystem for ∼7%. [...]

Figure 6 (horizontal axis: auxiliary channel; vertical axis: GP channel importance): Channel importance for the magnetometer set from Karoo GP. Only auxiliary channels with GP importance > 0.012 are shown.

Auxiliary channels with GP importance larger than 0.01 are shown in Table 5 and Fig. 8 for 34 runs that pass an efficiency threshold on true positives and true negatives of 66%. This relatively low threshold is required to build enough statistics for the channel count. As the dataset is limited, very few runs produce multivariate expressions with high true positive/true negative efficiency. Although noisier than the RF results, the Karoo GP results agree well with the RF results. The exceptions are the ISI-HAM4_BLND_GS13RZ_IN1_DQ and SUS-MC3_M1_DAMP_T_IN1_DQ channels. These channels are clearly not related to the disturbance and are a noise artifact of the GP algorithm, likely to go away with more runs and a higher threshold on the efficiency for the feature computation.
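The subsystem-level shares for the air compressor set follow from summing the RF channel importances of Table 4 by subsystem prefix. The short sketch below reproduces that arithmetic; the values are those of Table 4, while the underscore-separated spelling of the channel names is assumed from the axis labels of Fig. 7.

```python
from collections import defaultdict

# RF channel importance for the air compressor set (Table 4).
TABLE_4 = {
    "ASC-X_TR_B_PIT_OUT_DQ": 0.079,
    "ASC-X_TR_B_YAW_OUT_DQ": 0.169,
    "HPI-ETMX_BLND_L4C_RX_IN1_DQ": 0.052,
    "HPI-ETMX_BLND_L4C_RY_IN1_DQ": 0.008,
    "HPI-ETMX_BLND_L4C_RZ_IN1_DQ": 0.010,
    "HPI-ETMX_BLND_L4C_Y_IN1_DQ": 0.038,
    "ISI-GND_STS_ETMX_X_DQ": 0.228,
    "ISI-GND_STS_ETMX_Y_DQ": 0.012,
    "PEM-EX_ACC_BSC9_ETMX_Z_DQ": 0.055,
    "PEM-EX_ACC_EBAY_FLOOR_Z_DQ": 0.203,
    "PEM-EX_ACC_OPLEV_ETMX_Y_DQ": 0.008,
    "PEM-EX_ACC_VEA_FLOOR_Z_DQ": 0.045,
    "PEM-EX_SEIS_VEA_FLOOR_Y_DQ": 0.024,
    "SUS-ETMX_L3_OPLEV_YAW_OUT_DQ": 0.069,
}


def subsystem_importance(channel_importance):
    """Sum channel importances by the subsystem prefix before the first '-'."""
    totals = defaultdict(float)
    for channel, value in channel_importance.items():
        totals[channel.split("-", 1)[0]] += value
    return dict(totals)


shares = subsystem_importance(TABLE_4)
for name, value in sorted(shares.items()):
    print(f"{name}: {100 * value:.1f}%")
```

The same grouping applied to Table 1 (by sensor rather than by subsystem) gives the MAINSMON, SUSRACK and VEA magnetometer shares quoted for the magnetometer set.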

We have seen in the previous section that even with a handful of glitch times both the RF method and the GP method provide a very good characterization of the air compressor glitches. In this section we provide some evidence that small datasets with a dimension of a few tens of glitches can indeed be used to effectively infer the mechanical systems at their origin. In order to do this, we consider again the magnetometer set and reduce its dimension by selecting a (smaller) number n of magnetometer glitches and building a reduced training dataset of dimension 2n by adding an equal number of randomly selected background triggers. We train the ML algorithms on this 2n-dimensional set with fixed identical hyperparameters for all runs and test the result on all the remaining glitches and background triggers that were not used for training. Figure 9 shows the RF results for a number of estimators equal to 500, an iteration threshold equal to 0.005 and four different dataset sizes: n = 10, n = 20, n = 100 and n = 500.
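The construction of the reduced datasets can be sketched as follows: draw n glitches and n background triggers at random for training, and keep everything else for testing. This is a minimal sketch under stated assumptions; the function name, the fixed seed and the integer placeholder samples are hypothetical, and the sample counts in the example are not those of the magnetometer set.

```python
import random


def reduced_split(glitches, background, n, seed=0):
    """Return (train, test) lists of (sample, label) pairs.

    train holds n glitches (label 1) plus n background triggers (label 0);
    test holds every sample that was not drawn for training.
    """
    rng = random.Random(seed)
    glitch_pick = set(rng.sample(range(len(glitches)), n))
    background_pick = set(rng.sample(range(len(background)), n))
    train, test = [], []
    for label, samples, picked in ((1, glitches, glitch_pick),
                                   (0, background, background_pick)):
        for index, sample in enumerate(samples):
            (train if index in picked else test).append((sample, label))
    return train, test


# Placeholder samples: integer IDs standing in for trigger feature vectors.
glitches = list(range(0, 50))
background = list(range(100, 160))
train, test = reduced_split(glitches, background, n=10)
print(len(train), len(test))
```

Sampling indices rather than values keeps the split well defined even if two triggers happen to carry identical feature vectors.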
Figure 7 (horizontal axis: auxiliary channel; vertical axis: RF channel importance): Histogram of RF channel importance for the air compressor set from the data in Table 4. Different colors denote different detector subsystems and auxiliary channels: Orchid = Alignment Sensing and Control (ASC), royal blue = Hydraulic External Pre-Isolator (HPI), lime green = Input Mode Cleaner (IMC), olive green = Internal Seismic Isolation (ISI), violet = Length Sensing and Control (LSC), sienna = Physical and Environmental Monitor (PEM), orange = Pre-Stabilized Laser (PSL), turquoise = Suspension (SUS).

The results for n = 10 successfully single out the magnetic origin of the glitches in the EX station by identifying the correct PEM EX channels, although the results are contaminated by the appearance of a few PEM channels of magnetometers located in the interferometer Corner Station (CS). The importance of these channels stands out only when the dimension of the dataset is strongly reduced (n = 10, n = 20, n = 100) and may be due to algorithm noise or some contamination in the dataset. For example, some of the magnetometer glitches may be caused by environmental electromagnetic disturbances which spread out across the interferometer (e.g., lightning or power glitches) and thus be erroneously included in the dataset. Likewise, the identification of a Length Sensing and Control (LSC) channel for the n = 10 dataset and a couple of Output Mode Cleaner (OMC) channels for the n = 20 dataset may be due to algorithm noise or indicate that these auxiliary channels are not safe. Similarly to the CS channels, the importance of these channels stands out only when the dimension of the dataset is strongly reduced. The results for n = 500 are basically consistent with the results of the full dataset discussed in Sect. 4.1.

We test the accuracy of the GP algorithm as the dataset size is varied by running the code with fixed hyperparameters and then comparing the confusion matrix of the various datasets. We run Karoo GP 80 times per dataset with tree base depth = 5, maximum tree depth = 5, minimum tree depth = 3, population = 300, generations = 100 and tournament size = 20. Figure 10 shows the ROC space (recall vs. fall-out) of the runs with datasets of size n = 10 to n = 1300, the latter essentially corresponding to the full dataset. Each point in the scatterplot represents the result of a Karoo GP run. Average values and standard deviations for each dataset sample size are also plotted. Clearly, increasing the dataset sample size improves the binary classification of glitches vs. background. However, even for small dataset sizes the algorithm performs fairly well, with average recall above ∼85% and fall-out below ∼[...].

Auxiliary channel                    GP importance
ASC-X_TR_B_PIT_OUT_DQ (*)            0.011
ASC-X_TR_B_YAW_OUT_DQ (*)            0.022
HPI-ETMX_BLND_L4C_RX_IN1_DQ (*)      0.013
ISI-GND_STS_ETMX_X_DQ (*)            0.024
ISI-GND_STS_ETMX_Y_DQ (*)            0.014
ISI-HAM4_BLND_GS13RZ_IN1_DQ          0.010
PEM-EX_ACC_BSC9_ETMX_Z_DQ (*)        0.010
PEM-EX_ACC_EBAY_FLOOR_Z_DQ (*)       0.025
PEM-EX_ACC_OPLEV_ETMX_Y_DQ (*)       0.010
PEM-EX_SEIS_VEA_FLOOR_Y_DQ (*)       0.011
SUS-ETMX_L3_OPLEV_YAW_OUT_DQ (*)     0.011
SUS-MC3_M1_DAMP_T_IN1_DQ             0.010

Table 5: Auxiliary channels with Karoo GP importance above 0.01 for the air compressor set. Channels marked with an asterisk (set in italic in the typeset version) are those selected also by the RF algorithm (see Table 4).

Figure 11 shows the GP channel importance for n = 10, n = 20, n = 100 and n = 500. The channel importance is extracted from the Karoo GP runs with an efficiency threshold of 88% for true positives and true negatives. This lower threshold compared to the threshold used for the full dataset in Sec. 4.1 is required to obtain enough statistics for the smaller datasets, where Karoo GP performance is worse (9, 15, 32 and 66 runs pass this threshold for the n = 10, n = 20, n = 100 and n = 500 datasets, respectively). As a consequence of the lower threshold, the results are noisier than for the full dataset. However, the trend is clear, as the identification of the correct auxiliary channels improves with the dataset dimension. The n = 500 dataset produces a GP channel importance ranking that would undoubtedly allow commissioners to identify the disturbance as originating in the EX station. Datasets with n ≤ 100 seem to provide a less clear-cut answer. However, as the GP process is stochastic, the low statistics of these datasets may affect the results. Performing more runs with different hyperparameters is likely to improve the channel selection. Although less strong than the RF method, the GP method seems to be able to provide useful information on the origin of mechanical couplings also for datasets with minimal dimensionality.
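Each point in the ROC space is a (fall-out, recall) pair computed from the confusion matrix of one run, and the per-dataset summary is the mean and standard deviation of those pairs. A minimal sketch of that computation is given below. The per-run confusion matrices of the reduced datasets are not listed in the text, so runs 11, 82 and 91 of Table 2 are used here purely as stand-in input.

```python
from statistics import mean, stdev


def roc_point(tn, fp, fn, tp):
    """Return (fall-out, recall) = (FP / (FP + TN), TP / (TP + FN))."""
    return fp / (fp + tn), tp / (tp + fn)


def summarize(confusions):
    """Mean and sample standard deviation of fall-out and recall over runs."""
    points = [roc_point(*confusion) for confusion in confusions]
    fall_out = [point[0] for point in points]
    recall = [point[1] for point in points]
    return {
        "fall_out": (mean(fall_out), stdev(fall_out)),
        "recall": (mean(recall), stdev(recall)),
    }


# (TN, FP, FN, TP) for runs 11, 82 and 91 of Table 2, used as stand-in data.
runs = [(634, 42, 44, 623), (659, 17, 50, 617), (644, 32, 50, 617)]
summary = summarize(runs)
print(summary)
```

Repeating `summarize` for the runs of each dataset size n would give the average points and error bars of the kind plotted in the ROC space.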

Figure 8 (horizontal axis: auxiliary channel; vertical axis: GP channel importance): Channel importance for the air compressor set from Karoo GP. Only auxiliary channels with importance > 0.01 are shown.

Figure 9 (panels: dataset dimension n=10, n=20, n=100 and n=500; horizontal axis: auxiliary channel; vertical axis: RF channel importance): RF channel importance for dimensionally-reduced magnetometer datasets. From top left, clockwise: n = 10, n = 20, n = 500 and n = 100.

Figure 10 (horizontal axis: FPR (fall-out); vertical axis: TPR (recall); legend: dataset size (n)): ROC space for the magnetometer set as the size of the training sample is varied. Each point represents the result of a Karoo GP run with fixed hyperparameters (tree base depth = 5, maximum tree depth = 5, minimum tree depth = 3, population = 300, generations = 100, tournament size = 20).

Mitigation of non-astrophysical, instrumental or environmental noise in ground-based interferometric detectors is critical for improving data quality, reducing the background of gravitational-wave searches and increasing the amount of physical information that can be extracted from detected signals. Identification of the unwarranted mechanical couplings that cause instrumental noise may greatly help scientists in performing this effort.

In this paper we have shown that ML methods may be used to infer the possible locations and coupling mechanisms of noise artifacts in LIGO data. We focused our study on RF and GP algorithms, testing these methods on two sets of non-astrophysical noise transients with known origin from the first and second observing runs of Advanced LIGO. The magnetometer dataset contains over 2000 noise artifacts of electromagnetic nature. The air compressor dataset contains a few tens of noise artifacts due to environmental seismic coupling. Due to their well-defined spatial and temporal localization, and their well-understood origin, these datasets provide useful testbeds for the RF and GP algorithms in two extreme cases of large and small samples.

In order to test the applicability of our methods in real-world situations, we generated ML features derived from the Omicron pipeline, the standard LIGO-Virgo event trigger generator that is used to identify glitches in LIGO's auxiliary channels during observing runs. The mechanical couplings at the origin of the disturbances are inferred by computing and then ranking the importance of the Omicron auxiliary channel features as employed by an ML binary classification scheme of glitches versus background. The RF channel importance is evaluated by running the standard scikit's RF classifier with fixed number of estimators, and then iterating on the results to remove the features with importance below a pre-defined threshold. The GP channel importance is calculated by counting the number of channel occurrences in a subset of Karoo GP multivariate expressions over a fixed number of runs with varying hyperparameters.

Both the RF method and the GP method are able to identify the origin of the glitches and infer the relevant mechanical couplings in the detector. Although the ranking of the auxiliary channels becomes noisier as the size of the training dataset decreases, the algorithms allow for a successful identification of the relevant channels even in the case of small datasets with few tens of glitches, such as the air compressor set. The ability to work with just a handful of triggers is relevant for prompt mitigation of new noise artifacts appearing suddenly in the data, as it would allow detector commissioners to quickly determine the origin of these artifacts without the need to collect a large glitch sample.

As our methods rely on standard Omicron triggers which are generated in low-latency at the LIGO and Virgo sites, once a list of GPS times for a class of (unknown) noise transients with common characteristics in the gravitational-wave strain channel or a given auxiliary channel is provided, mechanical couplings can be quickly determined. Preparation of a training dataset with a few tens of noise triggers requires a few minutes on the LIGO computing clusters and channel rankings can be obtained within minutes by either of the two codes. Thus we envision the RF and GP codes as quick tools for on-demand data quality investigations during observing runs and commissioning periods.

In addition to providing a new commissioning tool, RF and GP techniques provide the proof of concept that ML can be successfully applied to the problem of inferring instrumental mechanical couplings. The ML landscape is very rich, with many different methods for scientific data analysis. It is not unlikely that the results presented in this paper may be further improved by considering other ML approaches, as well as different methods of feature generation that exploit all aspects of the detector data. The methods presented above could also be easily adapted to investigations beyond detector characterization. For example, when applied to astrophysical signals, the feature importance ranking could be used to extract information about the most relevant physical properties of the gravitational-wave sources. Along this direction, several investigations are currently being pursued in the LIGO and Virgo collaborations to apply ML techniques to different problems from identification and parameter estimation of gravitational-wave signals [24, 25, 28, 49] to detector instrumentation [50]. The future of ML applied to the analysis of gravitational-wave detector data is certainly bright.

This work has been supported by NSF grants PHY-1707668 and PHY-1404139. The authors would like to thank colleagues of the LIGO Scientific Collaboration and the Virgo Collaboration for their help and useful comments, in particular Dripta Bhattacharjee, Scott Coughlin, Elena Cuoco, Kate Dooley, Luciano Errico, Hunter Gabbard, Hartmut Grote, Sumeet Kulkarni, Shrobona Loveall, Lorena Magaña Zertuche, Kentaro Mogushi, and Jade Powell.

Figure 11 (panels: dataset dimension n=10, n=20, n=100 and n=500; horizontal axis: auxiliary channel; vertical axis: GP channel importance): GP channel importance for dimensionally-reduced magnetometer datasets. From top left, clockwise: n = 10, n = 20, n = 500 and n = 100. Only channels with importance larger than 0.02 are shown.

References

[1] J. Aasi et al. [LIGO Scientific Collaboration], "Advanced LIGO," Class. Quant. Grav. 32, 074001 (2015) doi:10.1088/0264-9381/32/7/074001 [arXiv:1411.4547 [gr-qc]].
[2] F. Acernese [Virgo Collaboration], "The Advanced Virgo detector," J. Phys. Conf. Ser. 610, no. 1, 012014 (2015) doi:10.1088/1742-6596/610/1/012014
[3] B. P. Abbott et al. [LIGO Scientific and Virgo Collaborations], "Observation of Gravitational Waves from a Binary Black Hole Merger," Phys. Rev. Lett. 116, no. 6, 061102 (2016) doi:10.1103/PhysRevLett.116.061102 [arXiv:1602.03837 [gr-qc]].
[4] "New Worlds, New Horizons: A Midterm Assessment (2016)," The National Academies Press, ISBN: 978-0-309-44510-8, DOI: 10.17226/23560.
[5] B. P. Abbott et al. [LIGO Scientific and Virgo Collaborations], "GW151226: Observation of Gravitational Waves from a 22-Solar-Mass Binary Black Hole Coalescence," Phys. Rev. Lett. 116, no. 24, 241103 (2016) doi:10.1103/PhysRevLett.116.241103 [arXiv:1606.04855 [gr-qc]].
[6] B. P. Abbott et al. [LIGO Scientific and VIRGO Collaborations], Phys. Rev. Lett. 118, no. 22, 221101 (2017) doi:10.1103/PhysRevLett.118.221101 [arXiv:1706.01812 [gr-qc]].
[7] B. P. Abbott et al. [LIGO Scientific and Virgo Collaborations], Phys. Rev. Lett. 119, no. 14, 141101 (2017) doi:10.1103/PhysRevLett.119.141101 [arXiv:1709.09660 [gr-qc]].
[8] B. P. Abbott et al. [LIGO Scientific and Virgo Collaborations], arXiv:1711.05578 [astro-ph.HE].
[9] B. P. Abbott et al. [LIGO Scientific and Virgo Collaborations], Phys. Rev. Lett. 119, no. 16, 161101 (2017) doi:10.1103/PhysRevLett.119.161101 [arXiv:1710.05832 [gr-qc]].
[10] B. P. Abbott et al. [LIGO Scientific and Virgo Collaborations], "Upper limits on the rates of binary neutron star and neutron-star–black-hole mergers from Advanced LIGO's first observing run," ApJ Letters, in press, arXiv:1607.07456 [astro-ph.HE].
[11] B. P. Abbott et al. [LIGO Scientific and Virgo Collaborations], "The Rate of Binary Black Hole Mergers Inferred from Advanced LIGO Observations Surrounding GW150914," arXiv:1602.03842 [astro-ph.HE].
[12] J. Aasi et al. [LIGO Scientific and VIRGO Collaborations], "Prospects for Observing and Localizing Gravitational-Wave Transients with Advanced LIGO and Advanced Virgo," Living Rev. Rel., 1 (2016) doi:10.1007/lrr-2016-1 [arXiv:1304.0670 [gr-qc]].
[13] T. Akutsu [KAGRA Collaboration], "Large-scale cryogenic gravitational-wave telescope in Japan: KAGRA," J. Phys. Conf. Ser. 610, no. 1, 012016 (2015) doi:10.1088/1742-6596/610/1/012016
[14] C. S. Unnikrishnan, "IndIGO and LIGO-India: Scope and plans for gravitational wave research and precision metrology in India," Int. J. Mod. Phys. D, 1341010 (2013) doi:10.1142/S0218271813410101 [arXiv:1510.06059 [physics.ins-det]].
[15] B. P. Abbott et al. [...]
[...] et al. [LIGO Scientific and Virgo Collaborations], "Characterization of transient noise in Advanced LIGO relevant to gravitational wave signal GW150914," Class. Quant. Grav. 33, no. 13, 134001 (2016) doi:10.1088/0264-9381/33/13/134001 [arXiv:1602.03844 [gr-qc]].
[18] J. Slutsky et al., Class. Quant. Grav. 27, 165023 (2010) doi:10.1088/0264-9381/27/16/165023 [arXiv:1004.0998 [gr-qc]].
[19] M. Zevin et al., Class. Quant. Grav., no. 6, 064003 (2017) doi:10.1088/1361-6382/aa5cea [arXiv:1611.04596 [gr-qc]].
[20] N. Mukund, S. Abraham, S. Kandhasamy, S. Mitra and N. S. Philip, Phys. Rev. D 95, no. 10, 104059 (2017) doi:10.1103/PhysRevD.95.104059 [arXiv:1609.07259 [astro-ph.IM]].
[21] E. Cuoco and M. Razzano, "Image-based deep learning for classification of noise transients in gravitational-wave detectors," LIGO Document P1700254-v3.
[22] D. George, H. Shen and E. A. Huerta, arXiv:1711.07468 [astro-ph.IM].
[23] D. George, H. Shen and E. A. Huerta, Phys. Rev. D 97, no. 10, 101501 (2018) doi:10.1103/PhysRevD.97.101501
[24] D. George and E. A. Huerta, Phys. Rev. D 97, no. 4, 044039 (2018) doi:10.1103/PhysRevD.97.044039 [arXiv:1701.00008 [astro-ph.IM]].
[25] D. George and E. A. Huerta, Phys. Lett. B, 64 (2018) doi:10.1016/j.physletb.2017.12.053 [arXiv:1711.03121 [gr-qc]].
[26] S. J. Kapadia, T. Dent and T. Dal Canton, Phys. Rev. D 96, no. 10, 104015 (2017) doi:10.1103/PhysRevD.96.104015 [arXiv:1709.02421 [astro-ph.IM]].
[27] S. Vinciguerra et al., Class. Quant. Grav., no. 9, 094003 (2017) doi:10.1088/1361-6382/aa6654 [arXiv:1702.03208 [astro-ph.IM]].
[28] H. Gabbard, M. Williams, F. Hayes and C. Messenger, arXiv:1712.06041 [astro-ph.IM].
[29] Timothy Gebhard, Niki Kilbertus, Giambattista Parascandolo, Ian Harry and Bernhard Schölkopf, Deep Learning for Physical Sciences (DLPS) 2017 workshop, 31st Annual Conference on Neural Information Processing Systems (NIPS), Long Beach CA, December 8, 2017, https://dl4physicalsciences.github.io/files/nips_dlps_2017_13.pdf
[30] R. Vaulin, L. Blackburn, R. Essick and E. Katsavounidis, "iDQ: The Real-Time Pipeline for Glitch Identification," LIGO Document G1300253-v1.
[31] R. Biswas et al., Phys. Rev. D 88, no. 6, 062003 (2013) doi:10.1103/PhysRevD.88.062003 [arXiv:1303.6984 [astro-ph.IM]].
[32] J. Powell, D. Trifirò, E. Cuoco, I. S. Heng and M. Cavaglià, Class. Quant. Grav. 32, no. 21, 215012 (2015) doi:10.1088/0264-9381/32/21/215012 [arXiv:1505.01299 [astro-ph.IM]].
[33] J. Powell, A. Torres-Forné, R. Lynch, D. Trifirò, E. Cuoco, M. Cavaglià, I. S. Heng and J. A. Font, Class. Quant. Grav. [...], 235005 (2011) doi:10.1088/0264-9381/28/23/235005 [arXiv:1107.2948 [gr-qc]].
[...]
[43] "aLIGO PEM System Upgrade," LIGO Document T1200221-v5.
[44] F. Robinet, "Omicron: an algorithm to detect and characterize transient events in gravitational-wave detectors," Virgo Technical Note VIR-0545B-14, 2016, https://tds.ego-gw.it/ql/?c=10651
[45] S. K. Chatterji, "The search for gravitational wave bursts in data from the second LIGO science run," Ph.D. Thesis, Massachusetts Institute of Technology, 2005, http://hdl.handle.net/1721.1/34388
[46] T. Forbriger et al., Geophysical Journal International, Volume 183, Issue 1, 303-312 (2010) doi:10.1111/j.1365-246X.2010.04719.x
[47] T. Forbriger et al. [...]