A Machine Learning Approach to Integral Field Unit Spectroscopy Observations: II. HII Region Line Ratios
Carter Rhea, Laurie Rousseau-Nepton, Simon Prunet, Myriam Prasow-Émond, Julie Hlavacek-Larrondo, Natalia Vale Asari, Kathryn Grasha, Laurence Perreault-Levasseur
Draft version February 15, 2021
Typeset using LaTeX twocolumn style in AASTeX63
Affiliations:
Département de Physique, Université de Montréal, Succ. Centre-Ville, Montréal, Québec, H3C 3J7, Canada
Canada-France-Hawaii Telescope, Kamuela, HI, United States
Laboratoire Lagrange, Université Côte d'Azur, Observatoire de la Côte d'Azur, CNRS, Parc Valrose, 06104 Nice Cedex 2, France
Departamento de Física–CFM, Universidade Federal de Santa Catarina, C.P. 476, 88040-900, Florianópolis, SC, Brazil
School of Physics and Astronomy, University of St Andrews, North Haugh, St Andrews KY16 9SS, UK
Royal Society–Newton Advanced Fellowship
Research School of Astronomy and Astrophysics, Australian National University, Weston Creek, ACT 2611, Australia
Mila - Quebec Artificial Intelligence Institute, Montreal, Québec, Canada
Center for Computational Astrophysics, Flatiron Institute, New York, USA
(Received; Revised; Accepted February 15, 2021)
Submitted to ApJ

ABSTRACT

In the first paper of this series (Rhea et al. 2020), we demonstrated that neural networks can robustly and efficiently estimate kinematic parameters for optical emission-line spectra taken by SITELLE at the Canada-France-Hawaii Telescope. This paper expands upon this notion by developing an artificial neural network to estimate the line ratios of the strong emission lines present in the SN1, SN2, and SN3 filters of SITELLE. We construct a set of 50,000 synthetic spectra using line ratios taken from the Mexican Million Models database, replicating H II regions. Residual analysis of the network on the test set reveals the network's ability to apply tight constraints to the line ratios. We verified the network's efficacy by constructing an activation map, checking the [N II] doublet fixed ratio, and applying a standard k-fold cross-validation. Additionally, we apply the network to SITELLE observations of M33; the residuals between the algorithm's estimates and the values calculated using standard fitting methods show general agreement. Moreover, the neural network reduces the computational costs by two orders of magnitude. Although standard fitting routines perform consistently well depending on the signal-to-noise ratio of the spectral features, the neural network also excels at predictions in the low signal-to-noise regime, within the controlled environment of the training set as well as on observed data when the source spectral properties are well constrained by models. These results reinforce the power of machine learning in spectral analysis.
Keywords:
Machine Learning; ISM; Galaxies

Corresponding author: Carter Rhea
[email protected]

INTRODUCTION

Emission-line nebulae are a critical part of our understanding of galactic evolution and radiative processes; thus, they are primary targets of observation in extragalactic astronomy (Kennicutt 1984; Veilleux & Osterbrock 1987; Kewley et al. 2019). H II regions form from clumps of gas in the interstellar medium (ISM) when young O/B stars irradiate the surrounding environment (e.g. Franco et al. 2000; Osterbrock & Ferland 1989; Shields 1990). The region becomes either partially or fully ionized depending on the budget and hardness of the ionizing photons, and on the morphology and total mass of the mother cloud.
H II regions are primarily composed of hydrogen and helium; however, they contain non-negligible amounts of metals (e.g. Shields & Tinsley 1976; Garnett & Shields 1987; Kennicutt & Oey 1993; Oey & Kennicutt 1993). Through recombination and collisional processes between the ionized atoms and the free electrons, the nebula emits characteristic strong emission lines which indicate its underlying chemical structure (e.g. Baldwin et al. 1981; Crawford et al. 1999; Kewley et al. 2001; Kewley et al. 2006; Kewley et al. 2019). In the optical, the primary emission lines include, but are not limited to, the Balmer series (i.e. Hα λ6563 and Hβ λ4861), [O II] λλ3726,3729, [O III] λλ4959,5007, [N II] λλ6548,6583, and [S II] λλ6716,6731. These lines are seen in emission nebulae such as H II regions (e.g. Viallefond 1985; Melnick et al. 1987), shock-induced regions such as supernova remnants (e.g. Fesen et al. 1985; Danziger & Dennefeld 1976), and planetary nebulae (e.g. Miller 1974; Osterbrock 1964). Several diagnostics exist to categorize emission nebulae based on their line ratios (e.g. Baldwin et al. 1981; Kewley et al. 2019; Constantin & Vogeley 2006; D'Agostino et al. 2019). Nevertheless, these diagnostics have limitations due to the multi-parameter physical nature of these objects, which is not exempt from degeneracies. With that in mind, we want to test the hypothesis that fitting multiple strong lines directly using model predictions that cover typical physical conditions in the gas (e.g. CLOUDY, MAPPINGS; Ferland et al. 2017; Allen et al. 2008) could help minimize the errors associated with the line parameters (intensity, broadening, and velocity), provide a pathway to the classification of the nebulae directly from the fit, and potentially significantly increase the computational efficiency of the procedure, as well as provide a new angle to estimate uncertainties.

The recent advent of integral field spectroscopy is expanding our knowledge of emission-line nebulae through increased spatial and spectral resolution; these instruments are pushing the ability of existing analysis tools (e.g. Sánchez et al. 2012; Leroy et al. 2016; Bundy et al. 2014; Martins et al. 2010). SITELLE, an Imaging Fourier Transform Spectrograph (IFTS) located at the Canada-France-Hawaii Telescope, is one such instrument. SITELLE has an unrivaled field of view of 11′ × 11′ and produces data cubes containing over 4 million pixels, each of which contains a spectrum (e.g. Baril et al. 2016; Drissen et al. 2019). SITELLE has a spectral resolution between 1 and 20,000. Its instrumental line shape (ILS) is described by a cardinal sine (sinc) function, which can be convolved with a Gaussian (Martin et al. 2016) to account for the natural broadening of the observed lines. Spectral fits must be done cautiously to accurately model the line shape and capture the effects of the sidelobes of the sinc, which can greatly influence the estimated flux of an emission line (e.g. Martin et al. 2012). In many fields of astrophysics, it is necessary to model resolved and unresolved (blended) lines accurately. Emission-line parameters are the root information used to infer the physical properties of locally resolved nebulae, as well as to classify them, but they are also crucial for the proper characterization of galaxies at different redshifts. For the latter, the line parameters are essential to fully understand how internal and external processes affect galaxies, as one cannot otherwise disentangle the multiple feedback processes (active galactic nuclei, stellar winds, supernovae, etc.) that affect the whole galaxy in different ways and at different scales.

In this paper, we apply an artificial neural network to SITELLE spectra in order to obtain the strong emission-line ratios, and we validate its results by comparing them with the line ratios derived from standard line-fitting techniques. We demonstrate the capability of the machine learning algorithm in low signal-to-noise regimes. In §2, we describe how the network was trained, and in §3, how it compares with the standard fitting procedure. In §4, we explore the impact of the signal-to-noise on the algorithm, in addition to applying it to a real observation of M33, a well-resolved local galaxy. We conclude in §5.

METHODOLOGY

In the first article of the series, Rhea et al. (2020), we explored the application of a convolutional neural network to calculate the kinematic parameters from SITELLE spectra. In this paper, we expand the use of machine learning to calculate another critical set of physical parameters: the line ratios of strong emission lines. As in the previous paper, the initial step in any machine learning application is to assemble the appropriate training set.

2.1. Synthetic Data
In order to facilitate the training of the network used to estimate strong-line ratios, we rely on carefully constructed synthetic spectral data. Synthetic spectra contain the following strong emission lines: Hα (6563 Å), [O II] λ3726, [O II] λ3729, Hβ (4861 Å), [O III] λ4959, [O III] λ5007, [N II] λ6548, [N II] λ6583, [S II] λ6716, and [S II] λ6731. The spectra are built with the function orb.fit.create_cm1_lines_model (Martin et al. 2012); ORB is the software kernel written for SITELLE, the imaging Fourier Transform Spectrometer at the Canada-France-Hawaii Telescope (Martin et al. 2016; Martin & Drissen 2017; Baril et al. 2016).
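ORB's create_cm1_lines_model is not reproduced here, but the line shape being modeled can be sketched numerically: a Gaussian emission line convolved with a sinc instrumental line shape, whose sidelobes carry a non-negligible fraction of the flux. Everything below (the grid, widths, FWHM constant, and discrete convolution) is an illustrative assumption, not ORB's implementation.

```python
import numpy as np

def sinc_ils(x, fwhm):
    """Cardinal sine instrumental line shape (illustrative FWHM parametrization)."""
    # 1.20671 maps the desired FWHM onto np.sinc's argument (np.sinc(x) = sin(pi x)/(pi x))
    return np.sinc(1.20671 * x / fwhm)

def sincgauss_line(axis, center, sigma, ils_fwhm, amplitude=1.0):
    """Gaussian emission line numerically convolved with the sinc ILS."""
    gauss = amplitude * np.exp(-0.5 * ((axis - center) / sigma) ** 2)
    kernel = sinc_ils(axis - axis.mean(), ils_fwhm)
    kernel /= kernel.sum()  # approximately preserve total flux under convolution
    return np.convolve(gauss, kernel, mode="same")

# Illustrative: one line centered on a 1000-channel axis
axis = np.linspace(-50.0, 50.0, 1000)
line = sincgauss_line(axis, center=0.0, sigma=2.0, ils_fwhm=3.0)
```

Because the sinc sidelobes redistribute flux away from the line core, a fit that ignores them underestimates the line flux, which is the motivation for the sincgauss model used throughout the paper.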
ORBS and ORCS build upon the ORB kernel and are used for data reduction and data analysis, respectively.

Following the SIGNALS large program instrumental configuration for the observations, we set the maximal spectral resolution in the SN1, SN2, and SN3 filters; for each synthetic spectrum, the resolution is sampled between the maximal value and 200 less than the maximal value (e.g. Martin & Drissen 2017). Previous work revealed that this is a key parameter for an accurate training of the algorithm (Rhea et al. 2020). To fully sample the velocity and broadening space expected in the SIGNALS catalog (and in keeping with our previous study), we randomly select the velocity parameter from a uniform distribution between -200 and 200 km s⁻¹ and the broadening parameter between 10 and 50 km s⁻¹. These values represent the expected values for typical H II regions (e.g. Epinat et al. 2008; Rousseau-Nepton et al. 2019). The resolution, broadening, and velocity were randomly selected with replacement for each synthetic spectrum so that we sample the entire parameter space (e.g., James et al. 2013). Furthermore, the signal-to-noise ratio varied randomly between 5 and 30 with respect to the Hα emission, meaning that the Hα emission is at least 5 times above the noise. However, this does not ensure that all lines are above the noise. The noise was assigned to each spectral channel individually and was randomly sampled from a normal distribution centered around the chosen signal-to-noise ratio with a sigma of 1.

The last element required to create synthetic spectra is the relative amplitude of the strong emission lines. For each type of nebula, we used different databases. Following the methodology described in detail in Rhea et al. (2020), the relative line amplitudes of the H II regions were sampled from the Mexican Million Models database (3MdB; Morisset et al. 2015) BOND simulations (Vale Asari et al. 2016). Following standard procedure, we use 70% of the synthetic data for the training set, 20% for the validation set, and 10% for the test set (e.g., Breiman 2001). In contrast with our previous study (Rhea et al. 2020), we require all strong lines sampled in the SIGNALS observations to be present.
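The sampling scheme described above — uniform velocity, broadening, and target signal-to-noise, with per-channel Gaussian noise — can be sketched as follows. The array sizes and the exact noise parametrization are illustrative assumptions rather than the actual ORBS-based pipeline.

```python
import numpy as np

rng = np.random.default_rng(42)
n_spectra, n_channels = 1000, 840  # illustrative sizes

# Velocity in [-200, 200] km/s and broadening in [10, 50] km/s, sampled uniformly
velocity = rng.uniform(-200.0, 200.0, n_spectra)
broadening = rng.uniform(10.0, 50.0, n_spectra)

# Target signal-to-noise (with respect to Halpha) between 5 and 30
target_snr = rng.uniform(5.0, 30.0, n_spectra)

def add_noise(spectrum, halpha_amplitude, snr, rng):
    """Assign noise per channel, with the per-channel SNR jittered with a
    sigma of 1 around the target, mirroring the description in the text."""
    per_channel_snr = rng.normal(snr, 1.0, spectrum.size).clip(min=1e-3)
    sigma = halpha_amplitude / per_channel_snr
    return spectrum + rng.normal(0.0, sigma)

noisy = add_noise(np.zeros(n_channels), halpha_amplitude=1.0, snr=20.0, rng=rng)
```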
We also add the restriction that all strong emission lines must have an amplitude equal to or greater than 12% of the Hα amplitude. This threshold corresponds to a signal-to-noise ratio of 3 for the faintest targets in the SIGNALS sample, whose Hα surface brightness is approximately 8 × 10⁻¹⁷ erg s⁻¹ cm⁻² arcsec⁻² (Rousseau-Nepton et al. 2019). The impact of this threshold is discussed in more depth in §3.2.

2.2. Artificial Neural Networks
In this paper, we study the application of an artificial neural network to the problem of strong emission-line ratio estimation, using the synthetic spectra described in §2.1.

The Algorithm
Feed-forward artificial neural networks (ANNs) contain three principal layers: the input layer, the hidden layer(s), and the output layer (e.g. Hansen & Salamon 1990). The input layer consists of the preprocessed data that will be used to train the network and eventually be fed unseen inputs to make predictions. In this case, the input layer is the combined SN1, SN2, and SN3 SITELLE observations described previously. The output layer consists of the line ratio estimates. The hidden layers contain an ensemble of nodes, each of which takes the input, x, multiplies it by a weight matrix, w, and adds a bias, b.

Table 1: Ranges of line ratio parameters from the synthetic spectra. [S II] λ6731/[S II] λ6716: 0.73-0.79; [S II]/Hα: 0.27-1.18; [N II]/Hα: 0.27-2.03; [N II]/[S II]: 0.14-6.97; [O III]/Hβ: 0.33-2.47; [O II]/Hβ: 0.24-6.97; [O II]/[O III]: 0.60-30.89; Hα/Hβ: 1.93-5.99*. The distributions of the parameters are all skewed heavily toward the lower values (with the exception of the [S II] doublet ratio, which is evenly distributed over a small range), which represents the likelihood of finding those parameters in the simulations from which these values are obtained. After applying an arcsinh transformation, the variables are normally distributed. *The wide range is due to artificially injected dust reddening following the prescription described in §2.5.

The node is then activated by a predetermined function. A common activation function, the Rectified Linear Unit (ReLU), is employed in every layer of the network used in this paper except for the final layer (e.g. Chen et al. 1990). The ReLU function, g(w·x + b) = max(0, w·x + b), takes the value of the node unless that value is negative; in that case, it takes the value 0. The result is a non-negative value for each node in a layer, where the vector-valued function of each layer, l, is denoted h_l(x, b, w). In traditional neural networks, such as the network applied in this work, the layers are fully connected; thus, each node in a layer is connected to all nodes in the previous and subsequent layers.

After calculating h_l(x, b, w) sequentially for each layer, we have a vector-valued output f(x, b, w). We can then calculate the loss between the final predictions, f, and the correct outputs, y. We adopt the Huber loss function, which is defined as

k_δ(f, y; w, b) = (1/2)(y − f(w, b))²  if |y − f(w, b)| ≤ δ,  and  δ|y − f(w, b)| − (1/2)δ²  otherwise,  (1)

where δ is a tune-able parameter initialized as 1. The Huber loss function is often employed since it reduces the effects of outliers on the final cost calculation (e.g. Huber 1964).

We then minimize the loss function through backpropagation, in which we alter the weights and biases (Hecht-Nielsen 1989):

arg min_{w,b} k(f, y; w, b).  (2)

We apply the Adam implementation of the stochastic gradient descent algorithm in order to minimize the loss function (Lechevallier & Saporta 2010; Kingma & Ba 2017). In this manner, we are able to train the network by updating the weights and biases until the loss function is minimized on the training set.

2.3. The Network and Hyper-Parameters
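As a concrete numeric sketch of the pieces defined above — the ReLU node g(w·x + b) and the Huber loss of Equation 1 with δ = 1 — consider the following (NumPy stand-ins, not the tensorflow implementation used in the paper):

```python
import numpy as np

def relu_node(x, w, b):
    """A single fully connected node: g(w.x + b) = max(0, w.x + b)."""
    return np.maximum(0.0, np.dot(w, x) + b)

def huber(y, f, delta=1.0):
    """Huber loss (Eq. 1): quadratic within delta of the target, linear outside."""
    a = np.abs(y - f)
    return np.where(a <= delta, 0.5 * (y - f) ** 2, delta * a - 0.5 * delta ** 2)

# Small residuals are penalized quadratically, large ones only linearly
print(huber(0.0, 0.5))  # 0.125
print(huber(0.0, 2.0))  # 1.5
```

The linear branch is what damps the influence of outliers on the total cost relative to a pure mean-squared-error loss.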
In order to determine the structural parameters of the network, we first constructed four networks: the first network had two hidden layers, the second network three hidden layers, and so on. Since the problem is nonlinear, we did not test a single-hidden-layer network. We note that the training, validation, and test sets used to determine the structural parameters contain only 10,000 synthetic spectra; these spectra were not re-used in the final training, validation, and testing ensemble. Initially, each hidden layer contained 1000 nodes. This number is sufficient to test the optimal number of layers (e.g. Sheela & Deepa 2013). Figure 1 demonstrates that the mean absolute percentage error (MAPE) plateaus for the training set at four layers. However, since the mean squared error (MSE) increases in both the validation and test sets between three and four layers, we chose three layers as the optimal value. Although the figure is not included, the same patterns hold for the mean absolute error. Next, we applied a grid search on the number of hidden neurons, allowing each layer to vary over 64, 128, 256, 512, and 1024 nodes. The MSE was minimized for all sets when the first layer has 1024 nodes, the second layer has 1024 nodes, and the third layer has 512 nodes. Figure 2 graphically depicts the entire network. The structure of the algorithm is as follows:

1. Concatenate spectral vectors from SN1, SN2, and SN3 as input
2. Fully Connected Layer with ReLU Activation (1x1024)
3. Dropout (25%)
4. Fully Connected Layer with ReLU Activation (1x1024)
5. Dropout (15%)
6. Fully Connected Layer with ReLU Activation (1x512)
7. Fully Connected Output Layer with Linear Activation (1x8)

Figure 1: Final mean absolute percentage error after five epochs of training as a function of the number of layers in the neural network. Each layer contains 1000 nodes and is activated by a ReLU function. The graphic shows the curves for the mock training, validation, and test sets. We emphasize that the synthetic data used in the structural parameter tuning were not used again.

Figure 2: Graphical depiction of the artificial neural network employed in this paper. The spectrum vectors for SN1, SN2, and SN3 are combined to form a single input vector which is then passed through a series of three fully connected layers. A linear activation function takes the final layer and compresses it into a final prediction for each line ratio.

We determined the optimal hyperparameter values using the grid search and random search cross-validation techniques implemented in sklearn. The optimal batch size was determined to be four. Additionally, we adopt a normal random distribution for the initialization of the hidden-layer neurons (Thimm & Fiesler 1995; de Castro et al. 1998). The data are normalized to the maximum value in the SN3 filter, which corresponds to the maximum value in the combined filter; we note that this is not necessarily the amplitude of the Hα line. We use the tensorflow implementation of the ADAM optimization algorithm.

Initial testing of the algorithm revealed a systematic bias of approximately 10% in the residuals (see §3.1). We therefore experimented with several transformations of the target variables (log10, ln, arcsinh, etc.); applying an arcsinh transformation resulted in normally distributed target variables and reduced the systematic bias considerably (e.g. Zheng & Casari 2008; Kuhn & Johnson 2019). Additionally, we applied a simple l2 regularization with λ = 5 × 10⁻⁴ (e.g. Phaisangittisagul 2016; van Laarhoven 2017).

In order to increase the accuracy of the method and provide error estimates, we employ a technique known as deep ensembling (Lakshminarayanan et al. 2017). This method leverages the fact that each neural network is independent of the others: we train ten individual networks with the same architecture but with different weight initializations, apply each network to the test data individually, average the predictions, and allow the standard deviation to act as an uncertainty estimate.

2.4. SITELLE Data
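The deep-ensembling estimate of §2.3 reduces to averaging the member predictions and using their spread as the uncertainty. In this sketch, ten toy linear "members" stand in for the ten independently initialized trained networks:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for ten trained networks: each maps a spectrum to eight line ratios.
# In practice each member is the full ANN trained from a different initialization.
def make_member(rng):
    w = rng.normal(0.0, 0.05, (8, 100))
    return lambda x: w @ x

ensemble = [make_member(rng) for _ in range(10)]

def predict_with_uncertainty(ensemble, x):
    """Average the member predictions; the spread acts as the uncertainty."""
    preds = np.stack([member(x) for member in ensemble])
    return preds.mean(axis=0), preds.std(axis=0)

mean, std = predict_with_uncertainty(ensemble, np.ones(100))
```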
SITELLE observations of the Southwest field of M33 led by P.I. Laurie Rousseau-Nepton were taken during the Queued Service Observation Period 18B (Program 18BP41). The galaxy was imaged in the three primary filters: SN1, SN2, and SN3. Both the SN1 and SN2 observations were taken at a spectral resolution lower than that of the SN3 observations, following the SIGNALS instrumental configuration.

2.5. Dereddening
Since we are using emission-line ratios spanning a large range of wavelengths, we must include the effects of dust attenuation by reddening the spectra (Calzetti et al. 1994; Buat & Xu 1996; Pettini et al. 2001). We calculate dereddening following the standard procedure of postulating an effective dust-screen attenuation to obtain the intrinsic emission-line fluxes (F₀,λ),

F₀,λ = F_obs,λ e^(τ_λ) = F_obs,λ e^(τ_V q_λ),  (3)

where F_obs,λ is the observed flux, τ_λ is the optical depth at a given wavelength, τ_V is the optical depth in the V band, and the shape of the dust attenuation curve is parametrized by q_λ ≡ τ_λ/τ_V. We adopt the Cardelli et al. (1989) attenuation law with a total-to-selective extinction R_V = 3.1. We use the Balmer decrement, defined as B_d = F_obs,Hα/F_obs,Hβ, to calculate τ_V:

τ_V = (1/(q_Hβ − q_Hα)) ln(B_d/B_d,in),  (4)

where B_d,in is the intrinsic Balmer decrement. We assume this value to be 2.87, which is appropriate for Case B recombination, an electron temperature of 10,000 K, and low density (e.g. Osterbrock & Ferland 1989). Dereddening is applied to the line ratios after calculation by the neural network.

RESULTS

3.1. Optimal Synthetic Data
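The dereddening of §2.5 (Equations 3 and 4) can be sketched numerically as follows; the q_λ values below are illustrative placeholders rather than the actual Cardelli et al. (1989) curve:

```python
import numpy as np

def tau_v(balmer_obs, q_hbeta, q_halpha, balmer_intrinsic=2.87):
    """Eq. 4: V-band optical depth from the observed Balmer decrement."""
    return np.log(balmer_obs / balmer_intrinsic) / (q_hbeta - q_halpha)

def deredden(flux_obs, q_lambda, tau_v_value):
    """Eq. 3: intrinsic flux F_0 = F_obs * exp(tau_V * q_lambda)."""
    return flux_obs * np.exp(tau_v_value * q_lambda)

# Illustrative q values (q = tau_lambda / tau_V); not the actual attenuation law
q_ha, q_hb = 0.82, 1.16
tv = tau_v(3.5, q_hb, q_ha)  # observed decrement 3.5 vs intrinsic 2.87

# Consistency check: dereddening Halpha and Hbeta restores the intrinsic decrement
ha, hb = 3.5, 1.0
ratio = deredden(ha, q_ha, tv) / deredden(hb, q_hb, tv)
print(round(ratio, 2))  # 2.87
```

By construction, applying Equation 3 with the τ_V of Equation 4 maps the observed Balmer decrement back onto its intrinsic value, which is the check performed above.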
In this section, we discuss the primary results of the paper and address the efficacy of the ANN when applied to optimal SITELLE H II region synthetic spectra in order to predict the following strong emission-line ratios: [N II] λ6548/[N II] λ6583, [S II] λ6731/[S II] λ6716, ([S II] λ6716 + [S II] λ6731)/Hα, [N II] λ6583/Hα, Hα/Hβ, [N II] λ6583/([S II] λ6716 + [S II] λ6731), [O III] λ5007/Hβ, ([O II] λ3726 + [O II] λ3729)/[O III] λ5007, and ([O II] λ3726 + [O II] λ3729)/Hβ.

We train and validate the network using the synthetic data described in §2.1. The final mean absolute errors (MAE = (1/N) Σᵢ |y_i^true − y_i^pred|, where N is the number of training or validation inputs) on the training and validation sets are 0.0555 and 0.1207, respectively. The algorithm is then applied to the test set; Figure 3 shows the relative errors achieved by the network when recovering each emission-line ratio. We note that all line ratios recovered from SN3 have a low residual standard deviation; this is attributed to the higher spectral resolution of SN3 compared to SN1 and SN2. All error plots reveal approximately Gaussian error distributions with a positive skew. In order to validate these results against the standard fitting techniques, we use the ORB routine fit.fit_lines_in_spectrum. We fit each filter individually; however, within a given filter, we fit all emission lines simultaneously, and we initially supply the routine with the correct velocity and broadening parameters in order to retrieve the best possible fits. The lines were fit with a sincgauss function (Martin et al. 2016). Table 2 displays the comparison between the relative errors obtained using the standard fitting routines and our ANN on the training set. In these conditions, the network outperforms the standard method for line ratios in all three filters. It is important to note that the relatively high errors in the fits are largely due to signal-to-noise effects (see §4.2). The Hα/Hβ errors have the potential to be reduced significantly, which will lead to higher-fidelity dereddening estimates.

3.2. Noise Classification
In this section, we evaluate the network used to classify a spectrum as noisy or clean; we define a spectrum as clean if the SNR of all strong emission lines is above a certain pre-determined threshold, in this case 5% that of Hα. This value was determined by running our network on SN3 spectra with varying thresholds (from 1%-20%); the results indicated that below 5%, the ability of the network to recover the line ratios becomes inhibited. Synthetic spectra for all filters were created in a method identical to that described in §2.1. Spectra were classified as noisy if any emission line had an amplitude less than 5% that of Hα; otherwise, the spectrum was classified as noiseless. In spectra where a single strong emission-line amplitude was below the chosen threshold, several other lines were also beneath the threshold; thus, this constraint accurately categorizes data into noisy and noiseless. We created 1,000 noisy and 1,000 noiseless spectra with signal-to-noise ratios of Hα varying between 5 and 30.

We use a Decision Tree Classifier, but in order to reduce bias and probabilistic errors, we aggregate several trees into a Random Forest. The data were randomly shuffled; 90% were set aside for training, and 10% for testing. Using 10 estimators (i.e., 10 decision trees), the Random Forest Classifier reports 100% accuracy in classifying the two spectral types, resulting in a diagonal confusion matrix.
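A minimal sketch of this noisy/clean split, assuming scikit-learn's RandomForestClassifier and using toy relative-amplitude vectors as features (the real classifier is trained on the spectra themselves):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Toy features: relative line amplitudes (fraction of Halpha) for 8 lines.
# A spectrum is labeled "noisy" if any line falls below 5% of Halpha.
amplitudes = rng.uniform(0.0, 0.5, (2000, 8))
labels = (amplitudes.min(axis=1) < 0.05).astype(int)  # 1 = noisy

# 90/10 train/test split and 10 estimators, as in the text
split = int(0.9 * len(labels))
clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(amplitudes[:split], labels[:split])
accuracy = clf.score(amplitudes[split:], labels[split:])
```

On these toy features the forest recovers the thresholding rule well, though not necessarily perfectly; the 100% accuracy quoted above is a property of the full spectral training set.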
Table 2: Standard deviations of the relative errors on the strong emission-line ratios ([S II] λ6731/[S II] λ6716, [S II]/Hα, [N II]/Hα, [N II]/[S II], [O III]/Hβ, [O II]/Hβ, [O II]/[O III], and Hα/Hβ) for each fitting procedure: ORB, ORB (SNR > 10), and ANN. The top row reports values calculated using the standard ORB routine, while the second row reports the same values for test-set spectra with a signal-to-noise ratio greater than ten; comparatively, the bottom row contains values obtained using our ANN. [S II] refers to the sum of the [S II] doublet lines, [S II] λ6716 + [S II] λ6731; [O II] refers to the sum of the two primary [O II] lines, [O II] λ3726 + [O II] λ3729; [O III] refers only to a single line, [O III] λ5007.
Figure 3: Density plots of line-ratio relative errors calculated using the ANN. Each panel shows density versus relative error (%) for one ratio: [S II] λ6731/[S II] λ6716; [N II] λ6584/Hα (6563 Å); [O II] λ3726/Hβ; [O II] λ3726/[O III] λ5007; ([S II] λ6717 + [S II] λ6731)/Hα (6563 Å); [N II] λ6584/([S II] λ6717 + [S II] λ6731); [O III] λ5007/Hβ; and Hα/Hβ.

DISCUSSION

4.1. Verification of the Network
Although neural networks trained on synthetic data are notoriously difficult to verify, we explore several methods to test whether the network is accurately learning the line ratios and is portable to real data (e.g. Bishop 1994; Krogh & Vedelsby 1995). We apply the following three techniques: k-fold cross-validation, tracking the static [N II] λ6583/[N II] λ6548 doublet ratio, and calculating the saliency map of the network.

Figure 4: Relative error as a function of signal-to-noise for the [N II] doublet ratio, [N II] λ6583/[N II] λ6548. Error bars represent the 1σ errors associated with a given bin.

The [N II] doublet ratio is constant throughout all emission-line nebulae and is frequently set to 3 in fitting codes (e.g. Schirmer et al. 2013); this is reflected in the synthetic data set. Figure 4 reveals that the calculated [N II] doublet ratio relative error between the true and network-estimated values is between 0 and -1%, while the standard deviation is approximately 0.27%. Therefore, the network accurately replicates the static relation between the [N II] lines.

Subsequently, we calculate the saliency map of the network: the derivative of the network output with respect to its input, ∂[output]/∂[input], obtained by multiplying the layer-wise gradients together via the chain rule (e.g. Simonyan et al. 2014). We normalize the values to unity in order to compare their relative importance with ease. In this manner, input nodes with a saliency value of zero do not affect the neural network's estimation, while input nodes with a saliency value of 1 affect it the most. In Figure 5, we show the saliency values plotted over a reference spectrum with a signal-to-noise of 20; we note that saliency values less than 0.05 are not included in the figure. The saliency map clearly shows that the network prioritizes the amplitudes of the emission lines. However, the map also reveals the importance of the regions adjacent to the emission lines and of the areas of the spectrum between emission lines. This reflects the importance of the sidelobes of the ILS on the flux ratio estimates.

4.2. Effects of Varying Signal-to-Noise
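The saliency computation of §4.1 — the gradient of the network output with respect to its input — can be approximated model-agnostically with finite differences; the toy "network" below stands in for the trained model:

```python
import numpy as np

def saliency(model, x, eps=1e-4):
    """Finite-difference |d output / d input|, normalized to a maximum of 1."""
    base = model(x)
    grads = np.empty_like(x)
    for i in range(x.size):
        xp = x.copy()
        xp[i] += eps
        grads[i] = (model(xp) - base) / eps
    grads = np.abs(grads)
    return grads / grads.max()

# Toy "network": responds only to channels 10-14 (an emission line, say)
model = lambda x: float(x[10:15].sum())
sal = saliency(model, np.zeros(100))
```

Channels the model ignores receive a saliency of exactly zero, while the channels it depends on saturate at 1, mirroring the normalization used in Figure 5. (For a differentiable network, automatic differentiation would replace the finite-difference loop.)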
Signal-to-noise constrains the efficacy of traditional fitting methods (e.g. Campbell et al. 1986; Endl & Cochran 2016). In this section, we compare the effects of signal-to-noise on the standard fitting techniques implemented in ORB with its effects on the artificial neural network described in this paper.

In order to study the effects of the signal-to-noise on the efficacy of the line ratio estimates, we bin the line ratio residuals as a function of signal-to-noise. A signal-to-noise bin is created at each integer value of the sampled signal-to-noise used to create the synthetic data (5-30). Residuals were calculated by taking the estimated value, subtracting the ground-truth value, and dividing by the ground-truth value; the result was then multiplied by 100 to express it as a percentage. We then removed all outliers, defined as residual values more than 3σ off the median value, and calculated the median value of the remaining residuals in each signal-to-noise bin. Errors were calculated as the 1σ deviations from the median. The plots are shown in Appendix A. Figure 7 demonstrates that the residuals do not change as a function of the signal-to-noise when calculated by the neural network. Conversely, the residuals and their associated errors are greatly reduced in high signal-to-noise regimes (SNR > 20) when calculated using the standard fitting techniques.

4.3. Application to M33
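The binning and clipping procedure of §4.2 can be sketched as follows (synthetic residuals are used here purely for illustration):

```python
import numpy as np

def binned_medians(snr, residuals, bins):
    """Median residual and 1-sigma spread per unit SNR bin, after 3-sigma clipping."""
    medians, spreads = [], []
    for lo in bins:
        sel = residuals[(snr >= lo) & (snr < lo + 1)]
        # Remove outliers more than 3 sigma from the bin median
        sel = sel[np.abs(sel - np.median(sel)) <= 3.0 * sel.std()]
        medians.append(np.median(sel))
        spreads.append(sel.std())
    return np.array(medians), np.array(spreads)

rng = np.random.default_rng(3)
snr = rng.uniform(5.0, 30.0, 5000)
residuals = rng.normal(0.0, 10.0, 5000)  # percent residuals
med, spread = binned_medians(snr, residuals, bins=range(5, 30))
```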
Having demonstrated the feasibility of using a neural network to estimate strong emission-line ratios, we apply our methodology to the Southwest field of M33 studied in our previous article (Rhea et al. 2020). This field contains several previously identified emission region types (classic H II regions, supernova remnants, and planetary nebulae; e.g. Zaritsky et al. 1989; Hodge et al. 1999; Viallefond et al. 1986); additionally, this field is a SIGNALS target. All fits (both from the algorithm developed here and from ORCS) were run using a computing server named iolani located at the CFHT headquarters in Waimea, Hawaii. The server has 2 Intel XEON E5-2630 v3 CPUs operating at 2.40 GHz with 8 cores each; the configuration also has 64 GB of RAM available for computing purposes.

Figure 5: Activation map of the SN1, SN2, and SN3 filters using the reference spectrum. The relative weights are centered on emission-line peaks and surrounding regions, reflecting the importance of the amplitudes and sidelobes on flux ratio estimations. The signal-to-noise of this sample spectrum is approximately 20.

To compare our results with those from the ORCS fitting pipeline, we fit each cube separately using ORCS. We use the fit_lines_in_region function to fit the strong emission lines present in a given filter. Within a filter, the lines are fit simultaneously with a single sinc function convolved with a Gaussian, which returns each line's flux, velocity, and broadening (Martin & Drissen 2017); ORCS performs the fit using the Levenberg-Marquardt least-squares optimization algorithm. Velocity and broadening priors were determined by fitting a binned (8 × 8) data cube. The final unbinned fits were compared with the ORCS fits in order to quantify the accuracy of the algorithm. Normalized residual plots for each line ratio can be found in Appendix B. A visual inspection of the residual plots reveals that the machine learning algorithm returns similar values to those calculated by
ORCS . Resultsdeviate most strongly in regions which we show in Rheaet al. (2020) to be best described by multiple emissionprofiles, regions identified as non-
H ii regions, and thosewith a low signal-to-noise ratio. We note, however, thatthe normalized residual plots (Figures 9 and 10) showgeneral agreement between the standard
ORCS fits andthose calculated by the neural network. Furthermore,discrepancies between the
ORCS and neural network es-timates illustrate the limitations of such an approachwhen used as a replacement to global fitting algorithmssuch as
ORCS . These results further indicate the impor-tance of taking multiple emission profiles into account
Figure 6 : Coadded H α and [ N ii ] λ α fluxless than 2 × − erg s − cm − are masked out whichcorresponds to a signal-to-noise ratio of approximately5.when modeling – this is the topic of the following paperin the series. CONCLUSIONSApplications of machine learning in astronomy arebroad: from the estimation of stellar spectral param-eters (e.g., Fabbro et al. 2018) to the discovery of ob-jects of interest in extensive astronomical surveys (e.g.,ˇSkoda et al. 2020). In this work, we apply an artificialneural network to combined-filter (SN1, SN2, and SN3)SITELLE data representing typical SIGNALS large pro-gram observations. The network is designed to calcu-late important emission-line ratios for
H ii-like regions which are present in the primary SITELLE filters. We train, validate, and test the algorithm using synthetic data created with the
ORBS software package. We adopt physically derived line amplitudes from the Million Mexican Model Database (Morisset et al. 2015). Our results indicate that the network can potentially constrain the line ratios with greater precision than the standard line-fitting technique implemented in
ORCS if the source spectral properties are well represented in the training set. To demonstrate the applicability of the method beyond synthetic data, we apply the network to the Southwest field of M33. Timing analysis indicates that the network can analyze the entire cube approximately 100 times faster than the standard methods.

These results not only have an impact on the computational aspects of line-ratio calculations, but they also carry scientific implications. Although our knowledge of galactic dynamics has expanded considerably over the past several decades, spectroscopic conclusions are restricted by the rigor of the fitting schemes employed and the precision of the results. In this paper, we have demonstrated that machine learning algorithms can considerably increase the precision of emission-line ratios in both low and high signal-to-noise regimes. This has profound implications for the study of these regions, since it will allow stricter categorization using methods such as line-ratio diagnostics in conjunction with BPT diagrams (e.g., Baldwin et al. 1981; Kewley et al. 2006, 2019). These methods require precise measurements in order to accurately categorize the emission-region type and break any model degeneracies.

Following up on the success of our first report, the work presented here represents the second article in a series covering the application of machine learning algorithms to SITELLE data cubes. Our success in mapping out emission-line ratios in pixels dominated by
H ii region emission serves as a proof of concept that using machine learning to identify line fluxes is a viable methodology. We note that this work is not meant to be a replacement for global line-fitting algorithms. Identifying regions containing multiple, blended emission components, as well as multiple sources of emission with spectral features not represented in a training set, remains to be explored. Additionally, example code can be found at https://github.com/sitelle-signals/Pamplemousse.

ACKNOWLEDGMENTS

The authors would like to thank the Canada-France-Hawaii Telescope (CFHT), which is operated by the National Research Council (NRC) of Canada, the Institut National des Sciences de l'Univers of the Centre National de la Recherche Scientifique (CNRS) of France, and the University of Hawaii. The observations at the CFHT were performed with care and respect from the summit of Maunakea, which is a significant cultural and historic site. C. R. acknowledges financial support from the physics department of the Université de Montréal, the MITACS summer scholarship program, and the IVADO doctoral excellence scholarship. J. H.-L. acknowledges support from NSERC via the Discovery grant program, as well as the Canada Research Chair program. N. V. A. acknowledges support of the Royal Society and the Newton Fund via the award of a Royal Society-Newton Advanced Fellowship (grant NAF\R1\).

A. SIGNAL-TO-NOISE AND RESIDUALS

In this section we display the signal-to-noise vs. residual plots: Figure 7 shows the relative errors for the artificial neural network, and Figure 8 shows those for the standard fits obtained with ORBS.

B. LINE RATIO RESIDUAL PLOTS

This section contains the line-ratio residual plots (ORCS fits − ANN estimates; Figures 9 and 10). The two methods are in agreement in regions found to be best described by a single emission profile for each strong line (see Rhea et al. 2020 for details); in regions best described by two emission profiles, the results differ significantly.
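A minimal sketch of how such a normalized residual map can be computed is shown below. The function name and toy arrays are illustrative only; this is not code from ORCS or Pamplemousse.

```python
import numpy as np

def normalized_residual(orcs_ratio, ann_ratio):
    """Normalized residual between ORCS-fitted and ANN-estimated
    line-ratio maps: (ORCS - ANN) / ORCS, with invalid pixels set to NaN."""
    orcs = np.asarray(orcs_ratio, dtype=float)
    ann = np.asarray(ann_ratio, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        res = (orcs - ann) / orcs
    # Mask pixels where the ORCS value is zero or non-finite
    return np.where(np.isfinite(res), res, np.nan)

# Example on a toy 2x2 "map"; the zero-valued pixel is masked to NaN
orcs = np.array([[1.0, 2.0], [0.0, 4.0]])
ann = np.array([[0.9, 2.2], [1.0, 4.0]])
print(normalized_residual(orcs, ann))
```

Plotting such a map per line ratio yields figures analogous to Figures 9 and 10.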
In the subsequent paper, we will explore machine learning techniques to determine whether emission regions are best described by a single or a double emission profile.

REFERENCES
Allen, M. G., Groves, B. A., Dopita, M. A., Sutherland, R. S., & Kewley, L. J. 2008, The Astrophysical Journal Supplement Series, 178, 20, doi: 10.1086/589652
Baldwin, J. A., Phillips, M. M., & Terlevich, R. 1981, Publications of the Astronomical Society of the Pacific, 93, 5, doi: 10.1086/130766

Figure 7: Signal-to-noise ratio vs. relative errors for the estimations obtained on the test set using the artificial neural network described in this paper. The signal-to-noise bins were taken at integer intervals. The black dots are the mean residuals. The grey bars represent the 1-sigma errors in a given signal-to-noise bin.
Figure 8: Signal-to-noise ratio vs. relative errors for the estimations obtained on the test set using the software package ORBS. The black dots are the mean residuals. The signal-to-noise bins were taken at integer intervals. The grey bars represent the 1-sigma errors in a given signal-to-noise bin.
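The binned statistics plotted in Figures 7 and 8 (mean relative error and 1-sigma spread per integer signal-to-noise bin) can be sketched as follows. The function name and binning details are assumptions for illustration, not the code used to produce the figures.

```python
import numpy as np

def binned_residual_stats(snr, residuals):
    """Mean and 1-sigma spread of relative errors in integer
    signal-to-noise bins. Inputs are 1D arrays of per-spectrum
    signal-to-noise values and relative errors."""
    snr = np.asarray(snr, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    # Integer bin edges spanning the observed signal-to-noise range
    edges = np.arange(np.floor(snr.min()), np.ceil(snr.max()) + 1)
    idx = np.digitize(snr, edges)
    centers, means, sigmas = [], [], []
    for i in range(1, len(edges)):
        in_bin = residuals[idx == i]
        if in_bin.size:
            centers.append(0.5 * (edges[i - 1] + edges[i]))
            means.append(in_bin.mean())
            sigmas.append(in_bin.std())
    return np.array(centers), np.array(means), np.array(sigmas)
```

Plotting `means` with `sigmas` as error bars against `centers` reproduces the style of Figures 7 and 8.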
Baril, M., Grandmont, F., Mandar, J., et al. 2016, International Society for Optics and Photonics, 9908
Bengio, Y., & Grandvalet, Y. 2004, Journal of Machine Learning Research, 1089
Bishop, C. M. 1994, IEE Proceedings - Vision, Image and Signal Processing, 141, 217, doi: 10.1049/ip-vis:19941330
Breiman, L. 2001, Machine Learning, 45, 5
Buat, V., & Xu, C. 1996, Astronomy and Astrophysics, 306, 61. http://adsabs.harvard.edu/abs/1996A%26A...306...61B
Bundy, K., Bershady, M. A., Law, D. R., et al. 2014, The Astrophysical Journal, 798, 7, doi: 10.1088/0004-637X/798/1/7
Calzetti, D., Kinney, A. L., & Storchi-Bergmann, T. 1994, The Astrophysical Journal, 429, 582, doi: 10.1086/174346
Campbell, A., Terlevich, R., & Melnick, J. 1986, Monthly Notices of the Royal Astronomical Society, 223, 811, doi: 10.1093/mnras/223.4.811
Cardelli, J. A., Clayton, G. C., & Mathis, J. S. 1989, The Astrophysical Journal, 345, 245, doi: 10.1086/167900

(a) [S ii]/[S ii] doublet ratio; (b) ([S ii]+[S ii])/Hα; (c) [N ii]/([S ii]+[S ii]); (d) [N ii]/Hα; (e) ([O ii]+[O ii])/Hβ; (f) [O iii]/Hβ

Figure 9: Residual plots created by taking the difference between the
ORCS fits and the values calculated by the artificial neural network for the Southwest field of M33, normalized by the ORCS fit values. As discussed in the text, regions with large discrepancies between the ORCS and ANN fits are generally either not classic H ii regions or are best described by multiple components.

(g) ([O ii]+[O ii])/[O iii]; (h) ([S ii]+[S ii])/Hα; (i) Hα/Hβ

Figure 10: Extension of Figure 9.
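As an illustration of the line-ratio diagnostics mentioned in the conclusions, a point on the [N ii]/Hα BPT diagram can be classified against the Kewley et al. (2001) maximum-starburst curve. The sketch below is a generic example of this technique, not code from this paper.

```python
import numpy as np

def kewley01_max_starburst(log_nii_ha):
    """Kewley et al. (2001) theoretical maximum-starburst curve on the
    [N II]/Halpha BPT diagram: returns log([O III]/Hbeta) as a function
    of log([N II]/Halpha), valid for log([N II]/Halpha) < 0.47."""
    return 0.61 / (log_nii_ha - 0.47) + 1.19

def is_star_forming(log_nii_ha, log_oiii_hb):
    """True where a point lies below the maximum-starburst curve,
    i.e. is consistent with pure star-forming (H II region) emission."""
    x = np.asarray(log_nii_ha, dtype=float)
    y = np.asarray(log_oiii_hb, dtype=float)
    return (x < 0.47) & (y < kewley01_max_starburst(x))

# A typical H II region sits well below the curve
print(is_star_forming(-0.5, 0.0))  # prints True
```

Applied pixel-by-pixel to the ratio maps estimated by the network, such a cut separates H ii-like pixels from those with harder ionizing sources.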