Understanding the input-output relationship of neural networks in the time series forecasting radon levels at Canfranc Underground Laboratory
Iñaki Rodríguez-García, Miguel Cárdenas-Montes

CIEMAT, Department of Fundamental Research. Avda. Complutense 40, 28040 Madrid, Spain.
Abstract
Underground physics experiments such as dark matter direct detection need to keep control of the background contribution. Hosting these experiments in underground facilities helps to minimize certain background sources such as the cosmic rays. One of the largest remaining background sources is the radon emanated from the rocks enclosing the research facility. The radon particles could be deposited inside the detectors when they are opened to perform maintenance operations. Therefore, forecasting the radon levels is a crucial task in an attempt to schedule the maintenance operations when the radon level is at its minimum. In the past, deep learning models have been implemented to forecast the radon time series at the Canfranc Underground Laboratory (LSC), in Spain, with satisfactory results. When forecasting time series, the past values of the time series are taken as input variables. The present work focuses on understanding the relative contribution of these input variables to the predictions generated by the neural networks. The results allow us to understand how the predictions of the time series depend on the input variables. These results may be used to build better predictors in the future.
1. Introduction
Nowadays, underground physics experiments are focused on searching for particles with low-energy signals, such as dark matter direct detection.
Email address: [email protected], [email protected] (Iñaki Rodríguez-García, Miguel Cárdenas-Montes)
The searched signals are usually very weak and have a low expected frequency. For these reasons, it is fundamental to control and understand as much as possible the sources of background. For example, to minimize the contribution of the cosmic rays, these experiments are carried out in subterranean facilities such as mines and caves under mountains.

These underground facilities are enclosed by the rocks of the mountains. The rocks emit radon towards the caves, which is a potential source of background. In the underground laboratories, the radon is produced by the decay of the radium-226 present in the rocks.
With an activity ranging from tens to hundreds of Bq/m³, the radon-222 decays into polonium-218 and emits an α particle. Due to the high rate of emission, the α particles can hide the searched signal.

The Canfranc Underground Laboratory (LSC) is a laboratory located on the Spanish side of the Pyrenees. Over the laboratory there are about 800 meters of rock, which gives natural protection against cosmic rays. For this reason, the LSC is a good host for low-energy event search experiments like ANAIS [1], ArDM [2] and DArT [3]. For the international collaborations whose experiments are located at the LSC, it is useful to have high-quality predictions of the radon levels in order to schedule the maintenance operations during the low-level periods. Minimizing the exposure of the materials is therefore an objective.

The present paper continues the forecasting efforts of the radon levels at the LSC. In the past, classical time series algorithms such as Holt-Winters, ARIMA and STL decomposition were compared to deep models in [4]. An ensemble of STL decomposition joined to convolutional networks, called STL+CNN, was presented in [5] to improve the forecasts for various datasets, and in [6] for the radon series. Also, in [7] the authors proposed a population-based incremental learning (PBIL) algorithm to optimize the hyperparameters of STL+CNN models. Other novel research lines have been tested: in [9] the authors proposed to use the weather variables from four cities around the LSC to improve the predictions of the radon levels. The analysis of the uncertainty in deep learning models and Gaussian processes was done in [8]. The main goal of the present paper is to understand how the input influences the predictions of the neural networks which forecast the radon-222 levels at the LSC. For this task, the algorithms of Garson [10] and Olden [11] are applied.

The rest of the paper is organized as follows: in Section 2, the radon time series, the neural networks and the importance algorithms are described. The results and their posterior analysis are shown in Section 3. Finally, Section 4 presents the conclusions of this work.
2. Methods and Materials
The LSC is located between a road tunnel and an old train tunnel that unite Spain and France across the Pyrenees. The laboratory is composed of three sites (LAB780, LAB2400 and LAB2500) under 850 meters of rock, which gives a cosmic-ray protection of 2450 m.w.e. The main site is also divided into three halls (A, B and C) for hosting the diverse experiments.

In Hall A, the radon level has been recorded by an Alphaguard P30 every ten minutes since July 2013. The raw measurements are very noisy, as can be seen in Figure 1. A few missing values are present in the data set. The large gaps appear in July 2014 with 913 missing values (about a week), in June 2015 with 1053, and in January 2016 with 585 (about four days).

Following previous work [4], the levels of radon exhibit a certain seasonality in the medians, as Figure 2 shows, where the weekly medians are lower in winter than in summer. For this reason, the work data are the weekly medians of the raw values, which reduce the noisy contributions. After resampling the data into weekly values, there are only two missing values. These are filled by sampling a Gaussian distribution fitted over the time series.
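As an illustration of this preprocessing, a minimal pandas sketch follows; it assumes the raw measurements are available as a Series indexed by timestamp (the file name and column names are hypothetical), and it reads "filled after a Gaussian distribution" as sampling from a normal fitted to the weekly series:

    import numpy as np
    import pandas as pd

    # Hypothetical file and column names for the ten-minute measurements.
    raw = pd.read_csv("radon_hall_a.csv", index_col="timestamp",
                      parse_dates=True)["rn222"]

    # Weekly medians reduce the noise of the raw measurements.
    weekly = raw.resample("W").median()

    # Fill the two remaining gaps with draws from a Gaussian fitted to the
    # weekly series (one plausible reading of the procedure in the text).
    rng = np.random.default_rng(seed=0)
    gaps = weekly.isna()
    weekly[gaps] = rng.normal(weekly.mean(), weekly.std(), size=gaps.sum())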
Figure 1: Example of raw measurements of the radon-222 concentration in Hall A of the LSC collected in 2017.
Figure 2: Weekly box-plots of the radon-222 level at Hall A of the LSC, (a) by week over the range of years, and (b) by week of the year independently of the year.

The final time series is formed by the 383 weekly median values (orange points in Figure 2(a)) from July 21st, 2013 to November 8th, 2020. The last two years (about 30%) are reserved for the test set. This implies that the last date of the train set is November 4th, 2018, and the test set begins the next week, on November 11th, 2018. Both data sets are plotted in Figure 3.
Figure 3: Final dataset composed of the weekly medians of the radon-222 measured in the LSC from July 2013 to November 2020. The blue line corresponds to the values of the train set, and the dashed green line to the test set.
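The chronological split described above can be sketched in a few lines, reusing the weekly series from the previous sketch:

    # The last two years (about 30% of the 383 weekly medians) form the test set.
    split_date = "2018-11-04"            # last date of the train set
    train = weekly[:split_date]          # up to and including the split date
    test = weekly[split_date:].iloc[1:]  # the test set begins the next week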
In machine learning, the dimensions of the input data depend on the nature of the problem being analyzed. When the input data is a univariate time series, the input is a 1D vector, but for a multivariate time series the input is a 2D vector. When the problem deals with images, the input is a 2D matrix or a 3D tensor, depending on whether the images have one or three channels. The input can have even more dimensions, for example when several images are concatenated in a sequence.

A time series appears as a single vector, or example, whose size is as large as the number of records. The most frequent way of creating the independent and dependent variables in machine learning is to divide the series into several frames, or sub-series, of a fixed size L. The look-back L indicates how many past values are incorporated in the input frames. As an example, with the weekly data, L = 4 indicates that the frames are formed by the information of the previous month, whereas L = 12 means the frames are formed by the previous quarter.

To show how the time series is divided into frames, the following time series X is used. It is composed of eleven values and is divided into sub-series of seven values:

X = (x_0, x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8, x_9, x_{10})    (1)

The sub-series are:

X_1 = (x_0, x_1, x_2, x_3, x_4, x_5, x_6)
X_2 = (x_1, x_2, x_3, x_4, x_5, x_6, x_7)
X_3 = (x_2, x_3, x_4, x_5, x_6, x_7, x_8)
X_4 = (x_3, x_4, x_5, x_6, x_7, x_8, x_9)    (2)

Now the sub-series X_1, X_2, X_3 and X_4 can be used to predict the values x_7, x_8, x_9 and x_{10} respectively, forming a dataset with four events. The number of events in the dataset is the length of the original time series minus the input window length. Each variable of a sub-series is named lag_i, where i counts positions backwards from the end of the sub-series, and the value to be predicted is called actual. In the example above, x_7, x_8, x_9 and x_{10} are the values of the variable actual for each event. The variable lag_1 refers to the value immediately preceding the actual value: if actual = x_7, then lag_1 = x_6, and lag_L is the oldest value of the sub-series; in the same example, lag_7 = x_0.

Artificial Neural Networks (ANN) are biologically-inspired algorithms based on the relations of brain networks. The fundamental unit of an ANN, called a neuron, is a nonlinear combination of input values and weights, as shown below (Eq. 3):

a_n = f_h(b_n + w_{1n} x_1 + w_{2n} x_2 + \cdots + w_{in} x_i + \cdots + w_{In} x_I)    (3)

where a_n is the value of the neuron n, x_i is the i-th input, w_{in} is the weight of the neuron n with respect to the input i, b_n is the bias unit of the neuron n, and f_h(·) is the activation function of the hidden layer, which has to be a nonlinear function. The neurons are grouped in layers, where the outputs of one layer are the inputs of the next layer. The layers are divided into three classes: the input layer is the first layer of the network; the final layer is called the output layer, whose values are the network results (or predictions); the rest of the layers are hidden layers. In the following, only ANNs with a single hidden layer are considered.

The equation for the output neuron is similar to Eq. 3, where the inputs x are replaced by the hidden neurons (Eq. 4):

ŷ = f_o(v_0 + v_1 a_1 + v_2 a_2 + \cdots + v_n a_n + \cdots + v_N a_N)    (4)

where ŷ is the prediction of the neural network, v_n is the weight of the output neuron with respect to the hidden neuron n, and f_o(·) is the activation function of the output layer.
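The framing procedure can be illustrated with a short numpy sketch (the function name is ours):

    import numpy as np

    def make_frames(series, look_back):
        """Split a 1D series into sub-series of length look_back (the lags)
        and the value following each one (the actual)."""
        X = np.stack([series[i:i + look_back]
                      for i in range(len(series) - look_back)])
        y = series[look_back:]
        return X, y

    # Eleven values with a look-back of seven give 11 - 7 = 4 events.
    x = np.arange(11)
    X, y = make_frames(x, look_back=7)
    # X[0] = [0 1 2 3 4 5 6] predicts y[0] = 7;
    # lag_1 is X[0][-1] and lag_7 is X[0][0].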
2.4. Evaluating the Input Importance

This work follows the algorithms developed by Garson [10] and Olden [12] in order to know how the input influences the predictions of the models.

Garson's algorithm was presented in 1991. It calculates the importance of each input using the absolute value of the products between the weights of the neurons of the hidden layer, w_{in}, and the weights of the neurons of the output layer, v_n (Eq. 5):

c_i = \sum_{j=1}^{m} \frac{|w_{ij} \cdot v_j|}{\sum_{k=1}^{I} |w_{kj} \cdot v_j|}    (5)

where w are the weights of the neurons in the hidden layer, v are the weights of the neurons in the output layer, m is the number of neurons in the hidden layer, and I is the number of dimensions of the input variables. Variables with low importance in the prediction obtain coefficients c_i close to zero. (The original paper where Garson's algorithm was presented is not available; the details can be found in [12, 13].)

Olden's algorithm has a similar structure to Garson's one but skips the normalization, which allows both positive and negative coefficients (Eq. 6):

c_i = \sum_{j=1}^{m} w_{ij} \cdot v_j    (6)

where w are the weights of the neurons in the hidden layer, v are the weights of the neurons in the output layer, and m is the number of neurons in the hidden layer. Again, variables with low relevance produce coefficients c_i close to zero. In the comparison of [12], the authors claim a higher performance of Olden's algorithm with respect to Garson's one.
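Both measures can be written in a few lines of numpy, assuming W is the input-to-hidden weight matrix with shape (I, m) and v the vector of hidden-to-output weights with shape (m,); the function names are ours:

    import numpy as np

    def olden_importance(W, v):
        """Eq. 6: c_i = sum_j w_ij * v_j (signed, no normalization)."""
        return W @ v

    def garson_importance(W, v):
        """Eq. 5: normalize |w_ij * v_j| over the inputs for each hidden
        neuron, then sum the contributions over the hidden neurons."""
        contrib = np.abs(W * v)         # |w_ij * v_j|, shape (I, m)
        contrib /= contrib.sum(axis=0)  # normalize over the inputs k
        return contrib.sum(axis=1)      # sum over the hidden neurons j

    # With a trained Keras model (see the sketch further below), the
    # matrices would be obtained as:
    # W = model.layers[0].get_weights()[0]          # shape (I, m)
    # v = model.layers[1].get_weights()[0].ravel()  # shape (m,)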
Before applying the algorithms previously described, a grid search is implemented to find which hyperparameter configuration produces the best prediction. The architecture of the neural networks is one input layer with as many neurons as the input window L, followed by one hidden layer of N neurons with relu as activation function, and then an output layer with linear activation. The size of the input and the number of hidden neurons are the hyperparameters evaluated in the grid search. The size of the input ranges between 10 and 60 lag values with a step of one, and the number of neurons from 10 to 100 with a step of 5.

During the training sessions the EarlyStopping callback is used. It stops the training after a number of epochs (set to 20) in which the mean squared error does not improve, and it restores the weights of the best epoch. The maximum number of epochs was set to 2000. The rest of the hyperparameters are:

• A batch size of 32 events.

• The optimizer is Adam with the default parameters.

• The loss function is the mean squared error (MSE); the root mean squared error (RMSE) and the mean absolute error (MAE) are also recorded as metrics.

During the grid search, ten independent executions of each network are performed, and the RMSE is the figure of merit used to discriminate the best model.
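A sketch of this setup with TensorFlow/Keras follows; the monitored quantity for the early stopping is an assumption, since the text only states that the mean squared error is watched:

    import tensorflow as tf

    L, N = 12, 55  # best configuration found by the grid search

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(L,)),
        tf.keras.layers.Dense(N, activation="relu"),    # single hidden layer
        tf.keras.layers.Dense(1, activation="linear"),  # output neuron
    ])
    model.compile(optimizer="adam", loss="mse",
                  metrics=[tf.keras.metrics.RootMeanSquaredError(), "mae"])

    early_stopping = tf.keras.callbacks.EarlyStopping(
        monitor="loss", patience=20, restore_best_weights=True)

    # X_train, y_train would come from the framing step:
    # model.fit(X_train, y_train, epochs=2000, batch_size=32,
    #           callbacks=[early_stopping])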
The neural networks were implemented on an NVIDIA Pascal GPU using the TensorFlow library [14]. The libraries scikit-learn [15], pandas, numpy, matplotlib and seaborn are also used.
3. Results and Discussion
The best one-layer architecture to forecast the radon-222 level in the LSC is chosen after the grid search process. It searches over two hyperparameters: the input window size (the number of lags used for forming the input) and the number of neurons in the single hidden layer. Since the weights of the neurons are used for evaluating the relative importance of the variables, they must be as close as possible to the optimal set of weights. The results of the grid search are summarized in Figure 4.

The metrics for the ten best architectures from the grid search are detailed in Table 1. The lowest error corresponds to a model with 12 past values as input and 55 hidden neurons. It is also notable that the rest of the high-quality configurations use small inputs, on the order of a dozen lagged values.
Figure 4: Results from the grid search process after 10 independent runs. The axes correspond to the number of hidden neurons and the lag window used as input.
Input   Neurons   RMSE           MAE
12      55        10.… ± …       8.… ± …
11      …         … ± …          8.… ± …
12      …         … ± …          8.… ± …
14      …         … ± …          8.… ± …
14      …         … ± …          8.… ± …
14      …         … ± …          8.… ± …
15      …         … ± …          8.… ± …
16      …         … ± …          8.… ± …
16      …         … ± …          8.… ± …
16      …         … ± …          8.… ± …

Table 1: Results for the ten best models according to the Root Mean Squared Error (RMSE) and the Mean Absolute Error (MAE) on the test set for 10 independent runs.
An intuitive initial configuration corresponds to an input of 52 lags, which corresponds to a window of one year, so a priori good results are expected. However, it is not pointed to as a high-quality configuration by this grid search.

The radon-222 values predicted by the best model (input of 12 and 55 hidden neurons) are shown in Figure 5 together with the real values.
Figure 5: Comparison between the real values (green) and the average prediction (red) over the test set for the ten independent executions of the best model, composed of 55 hidden neurons and an input of 12 past weeks. The filled area represents the maximum and minimum values achieved over the 10 independent runs.
It is notable that the forecasts seem to be shifted one value ahead. One might think that the network only repeats the last value of the input window and that the rest of the inputs tend to confuse the network. To check this, the neural network performance is compared with the persistence model ŷ_t = y_{t-1}, which simply repeats the past value as the forecast. The persistence model achieves an RMSE of 11.42 and an MAE of 8.40, both larger than those associated with the best neural network. Hence the neural network is better than forecasting the past value, and the rest of the input variables tend to improve the prediction. Indeed, as many as 67.10% of the networks analyzed had a smaller error.
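The persistence baseline is straightforward to reproduce (a sketch; test is the held-out weekly series from the earlier split):

    import numpy as np

    # Persistence model y_hat_t = y_{t-1}: repeat last week's value.
    y_true = test.values[1:]
    y_pred = test.values[:-1]
    rmse = float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
    mae = float(np.mean(np.abs(y_true - y_pred)))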
Figures 6 and 7 show the coefficients of the importance of the input variables. Garson's algorithm gives a relatively high importance to the first five lags, being the highest for lag_1. For more delayed lags, the importance reduces progressively.

Figure 6: Garson's algorithm coefficients for the best model, composed of 55 hidden neurons and an input of 12 past weeks. The filled area represents the maximum and minimum values achieved over the 10 independent runs.
In comparison with Garson's algorithm, Olden's one also gives a relatively high importance to lag_1, while exhibiting a sharper reduction of the importance of the variables as the lags grow.
Figure 7: Olden's algorithm coefficients for the best model, composed of 55 hidden neurons and an input of 12 past weeks. The filled area represents the maximum and minimum values achieved over the 10 independent runs.

4. Conclusions

In this paper, a first attempt to understand the input importance in the deep models which forecast the time series of radon-222 at the Canfranc Underground Laboratory is presented. A grid search was performed to find the best neural network with a single hidden layer to forecast the radon-222 levels. The result is a neural network with an input of 12 past values and 55 hidden neurons.

In order to know how the input influences the predictions, the algorithms of Olden and Garson were applied to the weights of the best model previously obtained. Both algorithms show that the lag_1 value has the greatest importance for the prediction. For larger lags, and depending on the analysis, the relative importance goes down. Olden's algorithm still considers relevant some lags to which Garson's algorithm gives a much lower importance. Furthermore, Olden's algorithm establishes a stronger reduction of the importance as the lags increase than Garson's one.

Acknowledgment
Iñaki Rodríguez-García is funded through the PEJ2018-003089-P project to promote Young Employment and Implementation of the Youth Guarantee in R+D+i, within the framework of the State Subprogram for the Incorporation of the State Program for the Promotion of Talent and Employability in R+D+i, within the framework of the State Plan for Scientific and Technical Research and Innovation 2017-2020. This contract is co-funded by the European Social Fund and the Youth Employment Initiative through the Youth Employment Operational Program. The authors wish to express their thanks to the Canfranc Underground Laboratory, and particularly to Iulian C. Bandac, for the data support and sharing.
References

[1] J. Amaré et al., Update on the ANAIS experiment. ANAIS-0 prototype results at the new Canfranc Underground Laboratory, J. Phys. Conf. Ser. 375 (2012) 012026. doi:10.1088/1742-6596/375/1/012026

[2] J. Calvo et al., Commissioning of the ArDM experiment at the Canfranc underground laboratory: first steps towards a tonne-scale liquid argon time projection chamber for Dark Matter searches, JCAP 03 (2017) 003. doi:10.1088/1475-7516/2017/03/003

[3] E. Sánchez García, DArT, a detector for measuring the ³⁹Ar depletion factor, JINST 15 (2020) no.02, C02044. doi:10.1088/1748-0221/15/02/C02044

[4] I. Méndez-Jiménez, M. Cárdenas-Montes, Modelling and Forecasting of the ²²²Rn Radiation Level Time Series at the Canfranc Underground Laboratory. In: Hybrid Artificial Intelligent Systems, 13th International Conference, HAIS. Volume 10870 of Lecture Notes in Computer Science, Springer (2018), pp. 158–170.

[5] I. Méndez-Jiménez, M. Cárdenas-Montes, Time Series Decomposition for Improving the Forecasting Performance of Convolutional Neural Networks. In: Advances in Artificial Intelligence, 18th Conference of the Spanish Association for Artificial Intelligence, CAEPIA. Volume 11160 of Lecture Notes in Computer Science, Springer (2018), pp. 87–97.

[6] M. Cárdenas-Montes, I. Méndez-Jiménez, Ensemble Deep Learning for Forecasting ²²²Rn Radiation Level at Canfranc Underground Laboratory. In: 14th International Conference on Soft Computing Models in Industrial and Environmental Applications, SOCO. Volume 950 of Advances in Intelligent Systems and Computing, Springer (2019), pp. 157–167.

[7] R.A. Vasco-Carofilis, M.A. Gutiérrez-Naranjo, M. Cárdenas-Montes, PBIL for Optimizing Hyperparameters of Convolutional Neural Networks and STL Decomposition. In: HAIS. Volume 12344 of Lecture Notes in Computer Science, Springer (2020), pp. 147–159.

[8] M. Cárdenas-Montes, Uncertainty estimation in the forecasting of the ²²²Rn Radiation Level Time Series at the Canfranc Underground Laboratory, Logic Journal of the IGPL (2020), Oxford University Press. doi:10.1093/jigpal/jzaa057

[9] T. Sánchez-Pastor, M. Cárdenas-Montes, R. Santorelli, Nowcasting for improving radon forecasting at Canfranc Underground Laboratory, Submitted (2020).

[10] G. David Garson, Interpreting neural-network connection weights, AI Expert 6(4) (1991), pp. 46–51.

[11] J.D. Olden, D.A. Jackson, Illuminating the "black box": a randomization approach for understanding variable contributions in artificial neural networks, Ecological Modelling 154(1-2) (2002).

[12] J.D. Olden, M.K. Joy, R.G. Death, An accurate comparison of methods for quantifying variable importance in artificial neural networks using simulated data, Ecological Modelling 178(3-4) (2004), pp. 389–397.

[13] A.T.C. Goh, Back-propagation neural networks for modeling complex systems, Artificial Intelligence in Engineering 9(3) (1995), pp. 143–151.

[14] M. Abadi et al., TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.

[15] F. Pedregosa et al., Scikit-learn: Machine Learning in Python, Journal of Machine Learning Research 12 (2011), pp. 2825–2830.