A Machine Learning Approach to Integral Field Unit Spectroscopy Observations: II. HII Region Line Ratios
Carter Rhea, Laurie Rousseau-Nepton, Simon Prunet, Myriam Prasow-Émond, Julie Hlavacek-Larrondo, Natalia Vale Asari, Kathryn Grasha, Laurence Perreault-Levasseur
Draft version February 15, 2021
Typeset using LaTeX twocolumn style in AASTeX63
Affiliations:
Département de Physique, Université de Montréal, Succ. Centre-Ville, Montréal, Québec, H3C 3J7, Canada
Canada-France-Hawaii Telescope, Kamuela, HI, United States
Laboratoire Lagrange, Université Côte d'Azur, Observatoire de la Côte d'Azur, CNRS, Parc Valrose, 06104 Nice Cedex 2, France
Departamento de Física–CFM, Universidade Federal de Santa Catarina, C.P. 476, 88040-900, Florianópolis, SC, Brazil
School of Physics and Astronomy, University of St Andrews, North Haugh, St Andrews KY16 9SS, UK
Royal Society–Newton Advanced Fellowship
Research School of Astronomy and Astrophysics, Australian National University, Weston Creek, ACT 2611, Australia
Mila - Quebec Artificial Intelligence Institute, Montreal, Québec, Canada
Center for Computational Astrophysics, Flatiron Institute, New York, USA
(Received; Revised; Accepted February 15, 2021)
Submitted to ApJ

ABSTRACT

In the first paper of this series (Rhea et al. 2020), we demonstrated that neural networks can robustly and efficiently estimate kinematic parameters for optical emission-line spectra taken by SITELLE at the Canada-France-Hawaii Telescope. This paper expands upon this notion by developing an artificial neural network to estimate the line ratios of the strong emission lines present in the SN1, SN2, and SN3 filters of SITELLE. We construct a set of 50,000 synthetic spectra using line ratios taken from the Mexican Million Models database, replicating H II regions. Residual analysis of the network on the test set reveals the network's ability to apply tight constraints to the line ratios. We verified the network's efficacy by constructing an activation map, checking the [N II] doublet fixed ratio, and applying a standard k-fold cross-validation. Additionally, we apply the network to SITELLE observations of M33; the residuals between the algorithm's estimates and the values calculated using standard fitting methods show general agreement. Moreover, the neural network reduces the computational costs by two orders of magnitude. Although standard fitting routines perform consistently well depending on the signal-to-noise ratio of the spectral features, the neural network also excels at predictions in the low signal-to-noise regime, within the controlled environment of the training set as well as on observed data when the source spectral properties are well constrained by models. These results reinforce the power of machine learning in spectral analysis.
Keywords:
Machine Learning; ISM; Galaxies

Corresponding author: Carter Rhea
[email protected]

INTRODUCTION

Emission-line nebulae are a critical part of our understanding of galactic evolution and radiative processes; thus, they are primary targets of observation in extragalactic astronomy (Kennicutt 1984; Veilleux & Osterbrock 1987; Kewley et al. 2019). H II regions form from clumps of gas in the interstellar medium (ISM) when young O/B stars irradiate the surrounding environment (e.g. Franco et al. 2000; Osterbrock & Ferland 1989; Shields 1990). The region becomes either partially or fully ionized depending on the budget and hardness of the ionizing photons, and on the morphology and total mass of the mother cloud.
H II regions are primarily composed of hydrogen and helium; however, they contain non-negligible amounts of metals (e.g. Shields & Tinsley 1976; Garnett & Shields 1987; Kennicutt & Oey 1993; Oey & Kennicutt 1993). Through recombination and collisional processes between the ionized atoms and the free electrons, the nebula emits characteristic strong emission lines which indicate its underlying chemical structure (e.g. Baldwin et al. 1981; Crawford et al. 1999; Kewley et al. 2001; Kewley et al. 2006; Kewley et al. 2019). In the optical, the primary emission lines include, but are not limited to, the Balmer series (i.e. Hα λ6563 and Hβ λ4861), [O II] λλ3726,3729, [O III] λλ4959,5007, [N II] λλ6548,6583, and [S II] λλ6716,6731. These lines are seen in emission nebulae such as H II regions (e.g. Viallefond 1985; Melnick et al. 1987), shock-induced regions such as supernova remnants (e.g. Fesen et al. 1985; Danziger & Dennefeld 1976), and planetary nebulae (e.g. Miller 1974; Osterbrock 1964). Several diagnostics exist to categorize emission nebulae based on their line ratios (e.g. Baldwin et al. 1981; Kewley et al. 2019; Constantin & Vogeley 2006; D'Agostino et al. 2019). Nevertheless, these diagnostics have limitations due to the multi-parameter physical nature of these objects, which is not exempt from degeneracies. With that in mind, we want to test the hypothesis that fitting multiple strong lines directly using model predictions that cover typical physical conditions in the gas (e.g. CLOUDY, MAPPINGS; Ferland et al. 2017; Allen et al. 2008) could help minimize the errors associated with the line parameters (intensity, broadening, and velocity), provide a pathway to the classification of the nebulae directly from the fit, and potentially significantly increase the computational efficiency of the procedure, as well as provide a new angle to estimate uncertainties.

The recent advent of integral field spectroscopy is expanding our knowledge of emission-line nebulae through increased spatial and spectral resolution; these instruments are pushing the ability of existing analysis tools (e.g. Sánchez et al. 2012; Leroy et al. 2016; Bundy et al. 2014; Martins et al. 2010). SITELLE, an Imaging Fourier Transform Spectrograph (IFTS) located at the Canada-France-Hawaii Telescope, is one such instrument. SITELLE has an unrivaled field of view of 11′ × 11′ and produces data cubes containing over 4 million pixels, each of which contains a spectrum (e.g. Baril et al. 2016; Drissen et al. 2019). SITELLE has a spectral resolution between 1 and 20,000. Its instrumental line shape (ILS) is described by a cardinal sine (sinc) function, which can be convolved with a Gaussian (Martin et al. 2016) to account for the natural broadening of the observed lines. Spectral fits must be done cautiously to accurately model the line shape and capture the effects of the sidelobes of the sinc, which can greatly influence the estimated flux of an emission line (e.g. Martin et al. 2012). In many fields of astrophysics, it is necessary to model resolved and unresolved (blended) lines accurately. Emission-line parameters are the root information used to infer the physical properties of locally resolved nebulae, as well as to classify them, but they are also crucial for the proper characterization of galaxies at different redshifts. For the latter, the line parameters are essential to fully understand how internal and external processes affect galaxies, as one cannot otherwise disentangle the multiple feedback processes (active galactic nuclei, stellar winds, supernovae, etc.) that affect the whole galaxy in different ways and at different scales.

In this paper, we apply an artificial neural network to SITELLE spectra in order to obtain the strong emission-line ratios, and we validate its results by comparing them with the line ratios derived from standard line-fitting techniques. We demonstrate the capability of the machine learning algorithm in low signal-to-noise regimes. In §2, we describe how the network was trained, and in §3, how it compares with the standard fitting procedure. In §4, we explore the impact of the signal-to-noise on the algorithm, in addition to applying it to a real observation of M33, a well-resolved local galaxy. We conclude in §5.

METHODOLOGY

In the first article of the series, Rhea et al. (2020), we explored the application of a convolutional neural network to calculate the kinematic parameters from SITELLE spectra. In this paper, we expand the use of machine learning to calculate another critical set of physical parameters: the line ratios of strong emission lines. As in the previous paper, the initial step in any machine learning application is to assemble the appropriate training set.

2.1. Synthetic Data
In order to facilitate the training of the network used to estimate strong-line ratios, we rely on carefully constructed synthetic spectral data. Synthetic spectra contain the following strong emission lines: Hα (6563 Å), [O II] λ3726, [O II] λ3729, Hβ (4861 Å), [O III] λ4959, [O III] λ5007, [N II] λ6548, [N II] λ6583, [S II] λ6716, and [S II] λ6731. The spectra are built with the function orb.fit.create_cm1_lines_model (Martin et al. 2012); ORB is the software kernel written for SITELLE, the imaging Fourier Transform Spectrometer at the Canada-France-Hawaii Telescope (Martin et al. 2016; Martin & Drissen 2017; Baril et al. 2016).
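ORB's create_cm1_lines_model is not reproduced here, but the line shape being modeled can be sketched numerically: a Gaussian emission line convolved with a sinc instrumental line shape, whose sidelobes carry a non-negligible fraction of the flux. Everything below (the grid, widths, FWHM constant, and discrete convolution) is an illustrative assumption, not ORB's implementation.

```python
import numpy as np

def sinc_ils(x, fwhm):
    """Cardinal sine instrumental line shape (illustrative FWHM parametrization)."""
    # 1.20671 maps the desired FWHM onto np.sinc's argument (np.sinc(x) = sin(pi x)/(pi x))
    return np.sinc(1.20671 * x / fwhm)

def sincgauss_line(axis, center, sigma, ils_fwhm, amplitude=1.0):
    """Gaussian emission line numerically convolved with the sinc ILS."""
    gauss = amplitude * np.exp(-0.5 * ((axis - center) / sigma) ** 2)
    kernel = sinc_ils(axis - axis.mean(), ils_fwhm)
    kernel /= kernel.sum()  # approximately preserve total flux under convolution
    return np.convolve(gauss, kernel, mode="same")

# Illustrative: one line centered on a 1000-channel axis
axis = np.linspace(-50.0, 50.0, 1000)
line = sincgauss_line(axis, center=0.0, sigma=2.0, ils_fwhm=3.0)
```

Because the sinc sidelobes redistribute flux away from the line core, a fit that ignores them underestimates the line flux, which is the motivation for the sincgauss model used throughout the paper.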
ORBS and ORCS build upon the ORB kernel and are used for data reduction and data analysis, respectively.

Following the SIGNALS large program instrumental configuration for the observations, we set the maximal spectral resolution in the SN1, SN2, and SN3 filters; for each synthetic spectrum, the resolution is sampled between the maximal value and 200 less than the maximal value (e.g. Martin & Drissen 2017). Previous work revealed that this is a key parameter for an accurate training of the algorithm (Rhea et al. 2020). To fully sample the velocity and broadening space expected in the SIGNALS catalog (and in keeping with our previous study), we randomly select the velocity parameter from a uniform distribution between -200 and 200 km s⁻¹ and the broadening parameter between 10 and 50 km s⁻¹. These values represent the expected values for typical H II regions (e.g. Epinat et al. 2008; Rousseau-Nepton et al. 2019). The resolution, broadening, and velocity were randomly selected with replacement for each synthetic spectrum so that we sample the entire parameter space (e.g., James et al. 2013). Furthermore, the signal-to-noise ratio varied randomly between 5 and 30 with respect to the Hα emission, meaning that the Hα emission is at least 5 times above the noise. However, this does not ensure that all lines are above the noise. The noise was assigned to each spectral channel individually and was randomly sampled from a normal distribution centered around the chosen signal-to-noise ratio with a sigma of 1.

The last element required to create synthetic spectra is the relative amplitude of the strong emission lines. For each type of nebula, we used different databases. Following the methodology described in detail in Rhea et al. (2020), the relative line amplitudes of the H II regions were sampled from the Mexican Million Models database (3MdB; Morisset et al. 2015) BOND simulations (Vale Asari et al. 2016). Following standard procedure, we use 70% of the synthetic data for the training set, 20% for the validation set, and 10% for the test set (e.g., Breiman 2001). In contrast with our previous study (Rhea et al. 2020), we require all strong lines sampled in the SIGNALS observations to be present.
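The sampling scheme described above — uniform velocity, broadening, and target signal-to-noise, with per-channel Gaussian noise — can be sketched as follows. The array sizes and the exact noise parametrization are illustrative assumptions rather than the actual ORBS-based pipeline.

```python
import numpy as np

rng = np.random.default_rng(42)
n_spectra, n_channels = 1000, 840  # illustrative sizes

# Velocity in [-200, 200] km/s and broadening in [10, 50] km/s, sampled uniformly
velocity = rng.uniform(-200.0, 200.0, n_spectra)
broadening = rng.uniform(10.0, 50.0, n_spectra)

# Target signal-to-noise (with respect to Halpha) between 5 and 30
target_snr = rng.uniform(5.0, 30.0, n_spectra)

def add_noise(spectrum, halpha_amplitude, snr, rng):
    """Assign noise per channel, with the per-channel SNR jittered with a
    sigma of 1 around the target, mirroring the description in the text."""
    per_channel_snr = rng.normal(snr, 1.0, spectrum.size).clip(min=1e-3)
    sigma = halpha_amplitude / per_channel_snr
    return spectrum + rng.normal(0.0, sigma)

noisy = add_noise(np.zeros(n_channels), halpha_amplitude=1.0, snr=20.0, rng=rng)
```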
We also add the restriction that all strong emission lines must have an amplitude equal to or greater than 12% of the Hα amplitude. This threshold corresponds to a signal-to-noise ratio of 3 for the faintest targets in the SIGNALS sample, whose Hα surface brightness is approximately 8 × 10⁻¹⁷ erg s⁻¹ cm⁻² arcsec⁻² (Rousseau-Nepton et al. 2019). The impact of this threshold is discussed in more depth in §3.2.

2.2. Artificial Neural Networks
In this paper, we study the application of an artificial neural network to the problem of strong emission-line ratio estimation, using the synthetic spectra described in §2.1.

The Algorithm
Feed-forward artificial neural networks (ANNs) contain three principal layers: the input layer, the hidden layer(s), and the output layer (e.g. Hansen & Salamon 1990). The input layer consists of the preprocessed data that will be used to train the network and eventually be fed unseen inputs to make predictions. In this case, the input layer is the combined SN1, SN2, and SN3 SITELLE observations described previously. The output layer consists of the line ratio estimates. The hidden layers contain an ensemble of nodes, each of which takes the input, x, multiplies it by a weight matrix, w, and adds a bias, b.

Table 1: Ranges of line ratio parameters from the synthetic spectra. [S II] λ6731/[S II] λ6716: 0.73-0.79; [S II]/Hα: 0.27-1.18; [N II]/Hα: 0.27-2.03; [N II]/[S II]: 0.14-6.97; [O III]/Hβ: 0.33-2.47; [O II]/Hβ: 0.24-6.97; [O II]/[O III]: 0.60-30.89; Hα/Hβ: 1.93-5.99*. The distributions of the parameters are all skewed heavily toward the lower values (with the exception of the [S II] doublet ratio, which is evenly distributed over a small range), which represents the likelihood of finding those parameters in the simulations from which these values are obtained. After applying an arcsinh transformation, the variables are normally distributed. *The wide range is due to artificially injected dust reddening following the prescription described in §2.5.

The node is then activated by a predetermined function. A common activation function, the Rectified Linear Unit (ReLU), is employed in every layer of the network used in this paper except for the final layer (e.g. Chen et al. 1990). The ReLU function, g(w·x + b) = max(0, w·x + b), takes the value of the node unless that value is negative; in that case, it takes the value 0. The result is a non-negative value for each node in a layer, where the vector-valued function of each layer, l, is denoted h_l(x, b, w). In traditional neural networks, such as the network applied in this work, the layers are fully connected; thus, each node in a layer is connected to all nodes in the previous and subsequent layers.

After calculating h_l(x, b, w) sequentially for each layer, we have a vector-valued output f(x, b, w). We can then calculate the loss between the final predictions, f, and the correct outputs, y. We adopt the Huber loss function, which is defined as

k_δ(f, y; w, b) = (1/2)(y − f(w, b))²  if |y − f(w, b)| ≤ δ,  and  δ|y − f(w, b)| − (1/2)δ²  otherwise,  (1)

where δ is a tune-able parameter initialized as 1. The Huber loss function is often employed since it reduces the effects of outliers on the final cost calculation (e.g. Huber 1964).

We then minimize the loss function through backpropagation, in which we alter the weights and biases (Hecht-Nielsen 1989):

arg min_{w,b} k(f, y; w, b).  (2)

We apply the Adam implementation of the stochastic gradient descent algorithm in order to minimize the loss function (Lechevallier & Saporta 2010; Kingma & Ba 2017). In this manner, we are able to train the network by updating the weights and biases until the loss function is minimized on the training set.

2.3. The Network and Hyper-Parameters
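As a concrete numeric sketch of the pieces defined above — the ReLU node g(w·x + b) and the Huber loss of Equation 1 with δ = 1 — consider the following (NumPy stand-ins, not the tensorflow implementation used in the paper):

```python
import numpy as np

def relu_node(x, w, b):
    """A single fully connected node: g(w.x + b) = max(0, w.x + b)."""
    return np.maximum(0.0, np.dot(w, x) + b)

def huber(y, f, delta=1.0):
    """Huber loss (Eq. 1): quadratic within delta of the target, linear outside."""
    a = np.abs(y - f)
    return np.where(a <= delta, 0.5 * (y - f) ** 2, delta * a - 0.5 * delta ** 2)

# Small residuals are penalized quadratically, large ones only linearly
print(huber(0.0, 0.5))  # 0.125
print(huber(0.0, 2.0))  # 1.5
```

The linear branch is what damps the influence of outliers on the total cost relative to a pure mean-squared-error loss.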
In order to determine the structural parameters of the network, we first constructed four networks: the first network had two hidden layers, the second network three hidden layers, and so on. Since the problem is nonlinear, we did not test a single-hidden-layer network. We note that the training, validation, and test sets used to determine the structural parameters contain only 10,000 synthetic spectra; these spectra were not re-used in the final training, validation, and testing ensemble. Initially, each hidden layer contained 1000 nodes. This number is sufficient to test the optimal number of layers (e.g. Sheela & Deepa 2013). Figure 1 demonstrates that the mean absolute percentage error (MAPE) plateaus for the training set at four layers. However, since the mean squared error (MSE) increases in both the validation and test sets between three and four layers, we chose three layers as the optimal value. Although the figure is not included, the same patterns hold for the mean absolute error. Next, we applied a grid search on the number of hidden neurons, allowing each layer to vary over 64, 128, 256, 512, and 1024 nodes. The MSE was minimized for all sets when the first layer has 1024 nodes, the second layer has 1024 nodes, and the third layer has 512 nodes. Figure 2 graphically depicts the entire network. The structure of the algorithm is as follows:

1. Concatenate spectral vectors from SN1, SN2, and SN3 as input
2. Fully Connected Layer with ReLU Activation (1x1024)
3. Dropout (25%)
4. Fully Connected Layer with ReLU Activation (1x1024)
5. Dropout (15%)
6. Fully Connected Layer with ReLU Activation (1x512)
7. Fully Connected Output Layer with Linear Activation (1x8)

Figure 1: Final mean absolute percentage error after five epochs of training as a function of the number of layers in the neural network. Each layer contains 1000 nodes and is activated by a ReLU function. The graphic shows the curves for the mock training, validation, and test sets. We emphasize that the synthetic data used in the structural parameter tuning were not used again.

Figure 2: Graphical depiction of the artificial neural network employed in this paper. The spectrum vectors for SN1, SN2, and SN3 are combined to form a single input vector which is then passed through a series of three fully connected layers. A linear activation function takes the final layer and compresses it into a final prediction for each line ratio.

We determined the optimal hyperparameter values using the grid search and random search cross-validation techniques implemented in sklearn. The optimal batch size was determined to be four. Additionally, we adopt a normal random distribution for the initialization of the hidden-layer neurons (Thimm & Fiesler 1995; de Castro et al. 1998). The data are normalized to the maximum value in the SN3 filter, which corresponds to the maximum value in the combined filter; we note that this is not necessarily the amplitude of the Hα line. We use the tensorflow implementation of the ADAM optimization algorithm.

Initial testing of the algorithm revealed a systematic bias of approximately 10% in the residuals (see §3.1). We therefore experimented with several transformations of the target variables (log10, ln, arcsinh, etc.); applying an arcsinh transformation resulted in normally distributed target variables and reduced the systematic bias considerably (e.g. Zheng & Casari 2008; Kuhn & Johnson 2019). Additionally, we applied a simple l2 regularization with λ = 5 × 10⁻⁴ (e.g. Phaisangittisagul 2016; van Laarhoven 2017).

In order to increase the accuracy of the method and provide error estimates, we employ a technique known as deep ensembling (Lakshminarayanan et al. 2017). This method leverages the fact that each neural network is independent of the others: we train ten individual networks with the same architecture but with different weight initializations, apply each network to the test data individually, average the predictions, and allow the standard deviation to act as an uncertainty estimate.

2.4. SITELLE Data
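The deep-ensembling estimate of §2.3 reduces to averaging the member predictions and using their spread as the uncertainty. In this sketch, ten toy linear "members" stand in for the ten independently initialized trained networks:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for ten trained networks: each maps a spectrum to eight line ratios.
# In practice each member is the full ANN trained from a different initialization.
def make_member(rng):
    w = rng.normal(0.0, 0.05, (8, 100))
    return lambda x: w @ x

ensemble = [make_member(rng) for _ in range(10)]

def predict_with_uncertainty(ensemble, x):
    """Average the member predictions; the spread acts as the uncertainty."""
    preds = np.stack([member(x) for member in ensemble])
    return preds.mean(axis=0), preds.std(axis=0)

mean, std = predict_with_uncertainty(ensemble, np.ones(100))
```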
SITELLE observations of the Southwest field of M33 led by P.I. Laurie Rousseau-Nepton were taken during the Queued Service Observation Period 18B (Program 18BP41). The galaxy was imaged in the three primary filters: SN1, SN2, and SN3. Both the SN1 and SN2 observations were taken at a spectral resolution lower than that of the SN3 observations, following the SIGNALS instrumental configuration.

2.5. Dereddening
Since we are using emission-line ratios spanning a large range of wavelengths, we must include the effects of dust attenuation by reddening the spectra (Calzetti et al. 1994; Buat & Xu 1996; Pettini et al. 2001). We calculate dereddening following the standard procedure of postulating an effective dust-screen attenuation to obtain the intrinsic emission-line fluxes (F₀,λ),

F₀,λ = F_obs,λ e^(τ_λ) = F_obs,λ e^(τ_V q_λ),  (3)

where F_obs,λ is the observed flux, τ_λ is the optical depth at a given wavelength, τ_V is the optical depth in the V band, and the shape of the dust attenuation curve is parametrized by q_λ ≡ τ_λ/τ_V. We adopt the Cardelli et al. (1989) attenuation law with a total-to-selective extinction R_V = 3.1. We use the Balmer decrement, defined as B_d = F_obs,Hα/F_obs,Hβ, to calculate τ_V:

τ_V = (1/(q_Hβ − q_Hα)) ln(B_d/B_d,in),  (4)

where B_d,in is the intrinsic Balmer decrement. We assume this value to be 2.87, which is appropriate for Case B recombination, an electron temperature of 10,000 K, and low density (e.g. Osterbrock & Ferland 1989). Dereddening is applied to the line ratios after calculation by the neural network.

RESULTS

3.1. Optimal Synthetic Data
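The dereddening of §2.5 (Equations 3 and 4) can be sketched numerically as follows; the q_λ values below are illustrative placeholders rather than the actual Cardelli et al. (1989) curve:

```python
import numpy as np

def tau_v(balmer_obs, q_hbeta, q_halpha, balmer_intrinsic=2.87):
    """Eq. 4: V-band optical depth from the observed Balmer decrement."""
    return np.log(balmer_obs / balmer_intrinsic) / (q_hbeta - q_halpha)

def deredden(flux_obs, q_lambda, tau_v_value):
    """Eq. 3: intrinsic flux F_0 = F_obs * exp(tau_V * q_lambda)."""
    return flux_obs * np.exp(tau_v_value * q_lambda)

# Illustrative q values (q = tau_lambda / tau_V); not the actual attenuation law
q_ha, q_hb = 0.82, 1.16
tv = tau_v(3.5, q_hb, q_ha)  # observed decrement 3.5 vs intrinsic 2.87

# Consistency check: dereddening Halpha and Hbeta restores the intrinsic decrement
ha, hb = 3.5, 1.0
ratio = deredden(ha, q_ha, tv) / deredden(hb, q_hb, tv)
print(round(ratio, 2))  # 2.87
```

By construction, applying Equation 3 with the τ_V of Equation 4 maps the observed Balmer decrement back onto its intrinsic value, which is the check performed above.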
In this section, we discuss the primary results of the paper and address the efficacy of the ANN when applied to optimal SITELLE H II region synthetic spectra in order to predict the following strong emission-line ratios: [N II] λ6548/[N II] λ6583, [S II] λ6731/[S II] λ6716, ([S II] λ6716 + [S II] λ6731)/Hα, [N II] λ6583/Hα, Hα/Hβ, [N II] λ6583/([S II] λ6716 + [S II] λ6731), [O III] λ5007/Hβ, ([O II] λ3726 + [O II] λ3729)/[O III] λ5007, and ([O II] λ3726 + [O II] λ3729)/Hβ.

We train and validate the network using the synthetic data described in §2.1. The final mean absolute errors (MAE = (1/N) Σᵢ |y_i^true − y_i^pred|, where N is the number of training or validation inputs) on the training and validation sets are 0.0555 and 0.1207, respectively. The algorithm is then applied to the test set; Figure 3 shows the relative errors achieved by the network when recovering each emission-line ratio. We note that all line ratios recovered from SN3 have a low residual standard deviation; this is attributed to the higher spectral resolution of SN3 compared to SN1 and SN2. All error plots reveal approximately Gaussian error distributions with a positive skew. In order to validate these results against the standard fitting techniques, we use the ORB routine fit.fit_lines_in_spectrum. We fit each filter individually; however, within a given filter, we fit all emission lines simultaneously, and we initially supply the routine with the correct velocity and broadening parameters in order to retrieve the best possible fits. The lines were fit with a sincgauss function (Martin et al. 2016). Table 2 displays the comparison between the relative errors obtained using the standard fitting routines and our ANN on the training set. In these conditions, the network outperforms the standard method for line ratios in all three filters. It is important to note that the relatively high errors in the fits are largely due to signal-to-noise effects (see §4.2). The Hα/Hβ errors have the potential to be reduced significantly, which will lead to higher-fidelity dereddening estimates.

3.2. Noise Classification
In this section, we evaluate the network used to classify a spectrum as noisy or clean; we define a spectrum as clean if the SNR of all strong emission lines is above a certain pre-determined threshold, in this case 5% that of Hα. This value was determined by running our network on SN3 spectra with varying thresholds (from 1%-20%); the results indicated that below 5%, the ability of the network to recover the line ratios becomes inhibited. Synthetic spectra for all filters were created in a method identical to that described in §2.1. Spectra were classified as noisy if any emission line had an amplitude less than 5% that of Hα; otherwise, the spectrum was classified as noiseless. In spectra where a single strong emission-line amplitude was below the chosen threshold, several other lines were also beneath the threshold; thus, this constraint accurately categorizes data into noisy and noiseless. We created 1,000 noisy and 1,000 noiseless spectra with signal-to-noise ratios of Hα varying between 5 and 30.

We use a Decision Tree Classifier, but in order to reduce bias and probabilistic errors, we aggregate several trees into a Random Forest. The data were randomly shuffled; 90% were set aside for training, and 10% for testing. Using 10 estimators (i.e., 10 decision trees), the Random Forest Classifier reports 100% accuracy in classifying the two spectral types, resulting in a diagonal confusion matrix.
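A minimal sketch of this noisy/clean split, assuming scikit-learn's RandomForestClassifier and using toy relative-amplitude vectors as features (the real classifier is trained on the spectra themselves):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Toy features: relative line amplitudes (fraction of Halpha) for 8 lines.
# A spectrum is labeled "noisy" if any line falls below 5% of Halpha.
amplitudes = rng.uniform(0.0, 0.5, (2000, 8))
labels = (amplitudes.min(axis=1) < 0.05).astype(int)  # 1 = noisy

# 90/10 train/test split and 10 estimators, as in the text
split = int(0.9 * len(labels))
clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(amplitudes[:split], labels[:split])
accuracy = clf.score(amplitudes[split:], labels[split:])
```

On these toy features the forest recovers the thresholding rule well, though not necessarily perfectly; the 100% accuracy quoted above is a property of the full spectral training set.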
Table 2: Standard deviations of the relative errors on the strong emission-line ratios ([S II] λ6731/[S II] λ6716, [S II]/Hα, [N II]/Hα, [N II]/[S II], [O III]/Hβ, [O II]/Hβ, [O II]/[O III], and Hα/Hβ) for each fitting procedure: ORB, ORB (SNR > 10), and ANN. The top row reports values calculated using the standard ORB routine, while the second row reports the same values for test-set spectra with a signal-to-noise ratio greater than ten; comparatively, the bottom row contains values obtained using our ANN. [S II] refers to the sum of the [S II] doublet lines, [S II] λ6716 + [S II] λ6731; [O II] refers to the sum of the two primary [O II] lines, [O II] λ3726 + [O II] λ3729; [O III] refers only to a single line, [O III] λ5007.
Figure 3: Density plots of line-ratio relative errors calculated using the ANN. Each panel shows density versus relative error (%) for one ratio: [S II] λ6731/[S II] λ6716; [N II] λ6584/Hα (6563 Å); [O II] λ3726/Hβ; [O II] λ3726/[O III] λ5007; ([S II] λ6717 + [S II] λ6731)/Hα (6563 Å); [N II] λ6584/([S II] λ6717 + [S II] λ6731); [O III] λ5007/Hβ; and Hα/Hβ.

DISCUSSION

4.1. Verification of the Network
Although neural networks trained on synthetic data are notoriously difficult to verify, we explore several methods to test whether the network is accurately learning the line ratios and is portable to real data (e.g. Bishop 1994; Krogh & Vedelsby 1995). We apply the following three techniques: k-fold cross-validation, tracking the static [N II] λ6583/[N II] λ6548 doublet ratio, and calculating the saliency map of the network.

Figure 4: Relative error as a function of signal-to-noise for the [N II] doublet ratio, [N II] λ6583/[N II] λ6548. Error bars represent the 1σ errors associated with a given bin.

The [N II] doublet ratio is constant throughout all emission-line nebulae and is frequently set to 3 in fitting codes (e.g. Schirmer et al. 2013); this is reflected in the synthetic data set. Figure 4 reveals that the calculated [N II] doublet ratio relative error between the true and network-estimated values is between 0 and -1%, while the standard deviation is approximately 0.27%. Therefore, the network accurately replicates the static relation between the [N II] lines.

Subsequently, we calculate the saliency map of the network: the derivative of the network output with respect to its input, ∂[output]/∂[input], obtained by multiplying the layer-wise gradients together via the chain rule (e.g. Simonyan et al. 2014). We normalize the values to unity in order to compare their relative importance with ease. In this manner, input nodes with a saliency value of zero do not affect the neural network's estimation, while input nodes with a saliency value of 1 affect it the most. In Figure 5, we show the saliency values plotted over a reference spectrum with a signal-to-noise of 20; we note that saliency values less than 0.05 are not included in the figure. The saliency map clearly shows that the network prioritizes the amplitudes of the emission lines. However, the map also reveals the importance of the regions adjacent to the emission lines and of the areas of the spectrum between emission lines. This reflects the importance of the sidelobes of the ILS on the flux ratio estimates.

4.2. Effects of Varying Signal-to-Noise
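The saliency computation of §4.1 — the gradient of the network output with respect to its input — can be approximated model-agnostically with finite differences; the toy "network" below stands in for the trained model:

```python
import numpy as np

def saliency(model, x, eps=1e-4):
    """Finite-difference |d output / d input|, normalized to a maximum of 1."""
    base = model(x)
    grads = np.empty_like(x)
    for i in range(x.size):
        xp = x.copy()
        xp[i] += eps
        grads[i] = (model(xp) - base) / eps
    grads = np.abs(grads)
    return grads / grads.max()

# Toy "network": responds only to channels 10-14 (an emission line, say)
model = lambda x: float(x[10:15].sum())
sal = saliency(model, np.zeros(100))
```

Channels the model ignores receive a saliency of exactly zero, while the channels it depends on saturate at 1, mirroring the normalization used in Figure 5. (For a differentiable network, automatic differentiation would replace the finite-difference loop.)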
Signal-to-noise constrains the efficacy of traditional fitting methods (e.g. Campbell et al. 1986; Endl & Cochran 2016). In this section, we compare the effects of signal-to-noise on the standard fitting techniques implemented in ORB with its effects on the artificial neural network described in this paper.

In order to study the effects of the signal-to-noise on the efficacy of the line ratio estimates, we bin the line ratio residuals as a function of signal-to-noise. A signal-to-noise bin is created at each integer value of the sampled signal-to-noise used to create the synthetic data (5-30). Residuals were calculated by taking the estimated value, subtracting the ground-truth value, and dividing by the ground-truth value; the result was then multiplied by 100 to express it as a percentage. We then removed all outliers, defined as residual values more than 3σ off the median value, and calculated the median value of the remaining residuals in each signal-to-noise bin. Errors were calculated as the 1σ deviations from the median. The plots are shown in Appendix A. Figure 7 demonstrates that the residuals do not change as a function of the signal-to-noise when calculated by the neural network. Conversely, the residuals and their associated errors are greatly reduced in high signal-to-noise regimes (SNR > 20) when calculated using the standard fitting techniques.

4.3. Application to M33
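The binning and clipping procedure of §4.2 can be sketched as follows (synthetic residuals are used here purely for illustration):

```python
import numpy as np

def binned_medians(snr, residuals, bins):
    """Median residual and 1-sigma spread per unit SNR bin, after 3-sigma clipping."""
    medians, spreads = [], []
    for lo in bins:
        sel = residuals[(snr >= lo) & (snr < lo + 1)]
        # Remove outliers more than 3 sigma from the bin median
        sel = sel[np.abs(sel - np.median(sel)) <= 3.0 * sel.std()]
        medians.append(np.median(sel))
        spreads.append(sel.std())
    return np.array(medians), np.array(spreads)

rng = np.random.default_rng(3)
snr = rng.uniform(5.0, 30.0, 5000)
residuals = rng.normal(0.0, 10.0, 5000)  # percent residuals
med, spread = binned_medians(snr, residuals, bins=range(5, 30))
```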
Having demonstrated the feasibility of using a neural network to estimate strong emission-line ratios, we apply our methodology to the Southwest field of M33 studied in our previous article (Rhea et al. 2020). This field contains several previously identified emission region types (classic H II regions, supernova remnants, and planetary nebulae; e.g. Zaritsky et al. 1989; Hodge et al. 1999; Viallefond et al. 1986); additionally, this field is a SIGNALS target. All fits (both from the algorithm developed here and from ORCS) were run using a computing server named iolani located at the CFHT headquarters in Waimea, Hawaii. The server has 2 Intel XEON E5-2630 v3 CPUs operating at 2.40 GHz with 8 cores each; the configuration also has 64 GB of RAM available for computing purposes.

Figure 5: Activation map of the SN1, SN2, and SN3 filters using the reference spectrum. The relative weights are centered on emission-line peaks and surrounding regions, reflecting the importance of the amplitudes and sidelobes on flux ratio estimations. The signal-to-noise of this sample spectrum is approximately 20.

To compare our results with those from the ORCS fitting pipeline, we fit each cube separately using ORCS. We use the fit_lines_in_region function to fit the strong emission lines present in a given filter. Within a filter, the lines are fit simultaneously with a single sinc function convolved with a Gaussian, which returns each line's flux, velocity, and broadening (Martin & Drissen 2017); ORCS performs the fit using the Levenberg-Marquardt least-squares optimization algorithm. Velocity and broadening priors were determined by fitting a binned (8 × 8) data cube. The final unbinned fits were compared with the ORCS fits in order to quantify the accuracy of the algorithm. Normalized residual plots for each line ratio can be found in Appendix B. A visual inspection of the residual plots reveals that the machine learning algorithm returns similar values to those calculated by
ORCS . Resultsdeviate most strongly in regions which we show in Rheaet al. (2020) to be best described by multiple emissionprofiles, regions identified as non-
H ii regions, and thosewith a low signal-to-noise ratio. We note, however, thatthe normalized residual plots (Figures 9 and 10) showgeneral agreement between the standard
ORCS fits andthose calculated by the neural network. Furthermore,discrepancies between the
ORCS and neural network es-timates illustrate the limitations of such an approachwhen used as a replacement to global fitting algorithmssuch as
ORCS . These results further indicate the impor-tance of taking multiple emission profiles into account
Figure 6 : Coadded H α and [ N ii ] λ α fluxless than 2 × − erg s − cm − are masked out whichcorresponds to a signal-to-noise ratio of approximately5.when modeling – this is the topic of the following paperin the series. CONCLUSIONSApplications of machine learning in astronomy arebroad: from the estimation of stellar spectral param-eters (e.g., Fabbro et al. 2018) to the discovery of ob-jects of interest in extensive astronomical surveys (e.g.,ˇSkoda et al. 2020). In this work, we apply an artificialneural network to combined-filter (SN1, SN2, and SN3)SITELLE data representing typical SIGNALS large pro-gram observations. The network is designed to calcu-late important emission-line ratios for
H ii-like regions which are present in the primary SITELLE filters. We train, validate, and test the algorithm using synthetic data created with the
ORBS software package. We adopt physically derived line amplitudes from the Million Mexican Model Database (Morisset et al. 2015). Our results indicate that the network can potentially constrain the line ratios with greater precision than the standard line-fitting technique implemented in
ORCS if the source spectral properties are well represented in the training set. To demonstrate the applicability of the method beyond synthetic data, we apply the network to the Southwest field of M33. Timing analysis indicates that the network can analyze the entire cube approximately 100 times faster than the standard methods.

These results not only have an impact on the computational aspects of line-ratio calculations, but they also carry scientific implications. Although our knowledge of galactic dynamics has expanded considerably over the past several decades, spectroscopic conclusions are restricted by the rigor of the fitting schemes employed and the precision of the results. In this paper, we have demonstrated that machine learning algorithms can considerably increase the precision of emission-line ratios in both low and high signal-to-noise regimes. This has profound implications for the study of these regions, since it will allow stricter categorization using methods such as line-ratio diagnostics in conjunction with BPT diagrams (e.g., Baldwin et al. 1981; Kewley et al. 2006, 2019). These methods require precise measurements in order to accurately categorize the emission-region type and break any model degeneracies.

Following up on the success of our first report, the work presented here represents the second article in a series covering the application of machine learning algorithms to SITELLE data cubes. Our success in mapping out emission-line ratios in pixels dominated by
H ii region emission serves as a proof of concept that using machine learning to identify line fluxes is a viable methodology. We note that this work is not meant to be a replacement for global line-fitting algorithms. Identifying regions containing multiple, blended emission components, as well as multiple sources of emission with spectral features not represented in a training set, remains to be explored. Additionally, example code can be found at https://github.com/sitelle-signals/Pamplemousse.

ACKNOWLEDGMENTS

The authors would like to thank the Canada-France-Hawaii Telescope (CFHT), which is operated by the National Research Council (NRC) of Canada, the Institut National des Sciences de l'Univers of the Centre National de la Recherche Scientifique (CNRS) of France, and the University of Hawaii. The observations at the CFHT were performed with care and respect from the summit of Maunakea, which is a significant cultural and historic site. C. R. acknowledges financial support from the physics department of the Université de Montréal, the MITACS summer scholarship program, and the IVADO doctoral excellence scholarship. J. H.-L. acknowledges support from NSERC via the Discovery grant program, as well as the Canada Research Chair program. N. V. A. acknowledges support of the Royal Society and the Newton Fund via the award of a Royal Society-Newton Advanced Fellowship (grant NAF\R1\).

A. SIGNAL-TO-NOISE AND RESIDUALS

In this section we display the signal-to-noise vs. residual plots: Figure 7 shows the relative errors for the artificial neural network, and Figure 8 shows those for the standard fits obtained with ORBS.

B. LINE RATIO RESIDUAL PLOTS

This section contains the line-ratio residual plots (ORCS fits − ANN estimates; Figures 9 and 10). The two methods are in agreement in regions found to be best described by a single emission profile for each strong line (see Rhea et al. 2020 for details); in regions best described by two emission profiles, the results differ significantly.
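A minimal sketch of how such a normalized residual map can be computed is shown below. The function name and toy arrays are illustrative only; this is not code from ORCS or Pamplemousse.

```python
import numpy as np

def normalized_residual(orcs_ratio, ann_ratio):
    """Normalized residual between ORCS-fitted and ANN-estimated
    line-ratio maps: (ORCS - ANN) / ORCS, with invalid pixels set to NaN."""
    orcs = np.asarray(orcs_ratio, dtype=float)
    ann = np.asarray(ann_ratio, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        res = (orcs - ann) / orcs
    # Mask pixels where the ORCS value is zero or non-finite
    return np.where(np.isfinite(res), res, np.nan)

# Example on a toy 2x2 "map"; the zero-valued pixel is masked to NaN
orcs = np.array([[1.0, 2.0], [0.0, 4.0]])
ann = np.array([[0.9, 2.2], [1.0, 4.0]])
print(normalized_residual(orcs, ann))
```

Plotting such a map per line ratio yields figures analogous to Figures 9 and 10.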
In the subsequent paper, we will explore machine learning techniques to determine whether emission regions are best described by a single or a double emission profile.

REFERENCES
Allen, M. G., Groves, B. A., Dopita, M. A., Sutherland, R. S., & Kewley, L. J. 2008, The Astrophysical Journal Supplement Series, 178, 20, doi: 10.1086/589652
Baldwin, J. A., Phillips, M. M., & Terlevich, R. 1981, Publications of the Astronomical Society of the Pacific, 93, 5, doi: 10.1086/130766

Figure 7: Signal-to-noise ratio vs. relative errors for the estimations obtained on the test set using the artificial neural network described in this paper. The signal-to-noise bins were taken at integer intervals. The black dots are the mean residuals. The grey bars represent the 1-sigma errors in a given signal-to-noise bin.
Figure 8: Signal-to-noise ratio vs. relative errors for the estimations obtained on the test set using the software package ORBS. The black dots are the mean residuals. The signal-to-noise bins were taken at integer intervals. The grey bars represent the 1-sigma errors in a given signal-to-noise bin.
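The binned statistics plotted in Figures 7 and 8 (mean relative error and 1-sigma spread per integer signal-to-noise bin) can be sketched as follows. The function name and binning details are assumptions for illustration, not the code used to produce the figures.

```python
import numpy as np

def binned_residual_stats(snr, residuals):
    """Mean and 1-sigma spread of relative errors in integer
    signal-to-noise bins. Inputs are 1D arrays of per-spectrum
    signal-to-noise values and relative errors."""
    snr = np.asarray(snr, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    # Integer bin edges spanning the observed signal-to-noise range
    edges = np.arange(np.floor(snr.min()), np.ceil(snr.max()) + 1)
    idx = np.digitize(snr, edges)
    centers, means, sigmas = [], [], []
    for i in range(1, len(edges)):
        in_bin = residuals[idx == i]
        if in_bin.size:
            centers.append(0.5 * (edges[i - 1] + edges[i]))
            means.append(in_bin.mean())
            sigmas.append(in_bin.std())
    return np.array(centers), np.array(means), np.array(sigmas)
```

Plotting `means` with `sigmas` as error bars against `centers` reproduces the style of Figures 7 and 8.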
Baril, M., Grandmont, F., Mandar, J., et al. 2016, International Society for Optics and Photonics, 9908
Bengio, Y., & Grandvalet, Y. 2004, Journal of Machine Learning Research, 1089
Bishop, C. M. 1994, IEE Proceedings - Vision, Image and Signal Processing, 141, 217, doi: 10.1049/ip-vis:19941330
Breiman, L. 2001, Machine Learning, 45, 5
Buat, V., & Xu, C. 1996, Astronomy and Astrophysics, 306, 61. http://adsabs.harvard.edu/abs/1996A%26A...306...61B
Bundy, K., Bershady, M. A., Law, D. R., et al. 2014, The Astrophysical Journal, 798, 7, doi: 10.1088/0004-637X/798/1/7
Calzetti, D., Kinney, A. L., & Storchi-Bergmann, T. 1994, The Astrophysical Journal, 429, 582, doi: 10.1086/174346
Campbell, A., Terlevich, R., & Melnick, J. 1986, Monthly Notices of the Royal Astronomical Society, 223, 811, doi: 10.1093/mnras/223.4.811
Cardelli, J. A., Clayton, G. C., & Mathis, J. S. 1989, The Astrophysical Journal, 345, 245, doi: 10.1086/167900

(a) [S ii]/[S ii] doublet ratio; (b) ([S ii]+[S ii])/Hα; (c) [N ii]/([S ii]+[S ii]); (d) [N ii]/Hα; (e) ([O ii]+[O ii])/Hβ; (f) [O iii]/Hβ

Figure 9: Residual plots created by taking the difference between the
ORCS fits and the values calculated by the artificial neural network for the Southwest field of M33, normalized by the ORCS fit values. As discussed in the text, regions with large discrepancies between the ORCS and ANN fits are generally either not classic H ii regions or are best described by multiple components.

(g) ([O ii]+[O ii])/[O iii]; (h) ([S ii]+[S ii])/Hα; (i) Hα/Hβ

Figure 10: Extension of Figure 9.
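As an illustration of the line-ratio diagnostics mentioned in the conclusions, a point on the [N ii]/Hα BPT diagram can be classified against the Kewley et al. (2001) maximum-starburst curve. The sketch below is a generic example of this technique, not code from this paper.

```python
import numpy as np

def kewley01_max_starburst(log_nii_ha):
    """Kewley et al. (2001) theoretical maximum-starburst curve on the
    [N II]/Halpha BPT diagram: returns log([O III]/Hbeta) as a function
    of log([N II]/Halpha), valid for log([N II]/Halpha) < 0.47."""
    return 0.61 / (log_nii_ha - 0.47) + 1.19

def is_star_forming(log_nii_ha, log_oiii_hb):
    """True where a point lies below the maximum-starburst curve,
    i.e. is consistent with pure star-forming (H II region) emission."""
    x = np.asarray(log_nii_ha, dtype=float)
    y = np.asarray(log_oiii_hb, dtype=float)
    return (x < 0.47) & (y < kewley01_max_starburst(x))

# A typical H II region sits well below the curve
print(is_star_forming(-0.5, 0.0))  # prints True
```

Applied pixel-by-pixel to the ratio maps estimated by the network, such a cut separates H ii-like pixels from those with harder ionizing sources.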