A Wavelet-CNN-LSTM Model for Tailings Pond Risk Prediction
Jun Yang, Qing Li, and Yixuan Sun

Abstract
Tailings ponds are places for storing industrial waste. Once a tailings pond collapses, nearby villages will be destroyed and the harmful chemicals will cause serious environmental pollution. There is an urgent need for a reliable forecast model that can investigate the variation trend of the stability coefficient of the tailings dam and issue early warnings. In order to fill this gap, this work presents a hybrid network combining a wavelet-based Long Short-Term Memory (LSTM) and a Convolutional Neural Network (CNN), namely the Wavelet-CNN-LSTM network, for predicting tailings pond risk. Firstly, we construct a special nonlinear data processing method to impute missing values with a numerical inversion (NI) method, which combines correlation analysis, sensitivity analysis, and Random Forest (RF) algorithms. Secondly, a new forecasting model is proposed to monitor the saturation line, which is the lifeline of the tailings pond and directly reflects its stability. After using the discrete wavelet transform (DWT) to decompose the original saturation line data into 4-layer wavelets and de-noise the data, a CNN is used to identify and learn the spatial structures in the time series, followed by LSTM cells for detecting long- and short-term dependence. Finally, different experiments are conducted to evaluate the effectiveness of our model by comparing it with other state-of-the-art algorithms. The results show that Wavelet-CNN-LSTM achieves the best score in mean absolute percentage error (MAPE), root-mean-square error (RMSE), and R².

Introduction
Tailings ponds are places for storing industrial waste. Tailings pond failure is ranked 18th in the world's risk assessment [8]. Worldwide, at least 84 major tailings dam accidents causing significant damage were reported from 1960 to 2020 [1]. Nowadays, the safety performance of tailings ponds can only be obtained by manual observation or by measurement analysis from specific sensors. The measurements include the saturation line, displacement and deformation of the dam body, seepage flow, and dry beach length. In practice, considering the variability of topography, weather, and mine construction conditions, the stability of the tailings dam is very complicated and changeable. At present, a large number of researchers are devoted to tailings pond monitoring [2, 14, 26, 41, 42]. They mainly focus on the stability status revealed by sensor monitoring data and issue early warnings in time through mathematical modeling, image recognition, and data analysis. Huang et al. [3] developed a tailings pond monitoring and early-warning system based on three-dimensional GIS; the response time of the safety monitoring and early-warning system is less than 5 seconds. Li et al. [5] proposed GPS means to monitor the displacement of tailings dams online. Gao et al. [6] established remote sensing interpretation using high-resolution remote sensing images. M. Necsoiu [7] used satellite radar interferometry to monitor tailings sedimentation. D. F. Che et al. [8] assessed the risk of tailings ponds by runoff coefficient, which can simultaneously determine the safety performance of multiple tailings dams. Dong et al. [10] set up an alarm system based on a cloud platform, showing good performance in real-time monitoring. Qiu et al.
[11] designed a monitoring system for the saturation line based on mixed programming. Tailings dams are usually located in remote mountainous areas, their structure is very complicated, and dam-break problems are almost always nonlinear. As a result, the stability of the tailings pond cannot be directly observed. Recently, with the advantage of handling almost any nonlinear or linear problem in both low and high dimensions, neural network and machine learning methods have been effectively applied in real-time risk analysis and evaluation [4, 9, 12, 19, 43–46]. However, real-time monitoring cannot be equated with early warning and forecasting. In other words, risk prediction methods could help people perceive risk before it happens. With their excellent ability to process time series, classic prediction models such as the Auto-Regressive Integrated Moving Average (ARIMA) and LSTM have been used in prediction problems [21, 28, 47–49]. They analyze and identify the time-series information in training data and give prediction values a few days in advance. Nevertheless, unlike LSTM, the ARIMA model only achieves high scores when the data are linearly correlated or without obvious fluctuation. With the rapid development of deep learning, CNN and LSTM have become the most popular networks. The CNN can filter out noisy data and extract important features, achieving good performance on images, speech, and time series [50, 51], while the LSTM network has the ability to find linear or nonlinear time-series information from shallow and deep layers and combine it with current memory [52].
In light of this, combining LSTM with CNN may achieve better prediction performance to a large extent. The saturation line is the most important factor in the stability of tailings dams: for every 1-meter drop in the saturation line, the safety factor of static stability increases by 0.05 or more [11]. A high saturation line will decrease dam stability and even potentially cause leakage, landslide, and dam break [32–34]. Therefore, the saturation line is called the lifeline of tailings dams [13], and the stability of a tailings dam can be determined accurately from its saturation line position. It is imperative to establish accurate models to predict the height of the saturation line and the security situation of tailings ponds. However, prediction research on tailings ponds is almost nonexistent. For this purpose, our goal is to propose a new model that can make full use of the strengths of deep learning. In more detail, utilizing the hidden information of the previous saturation line, the model will predict the value and tendency over the next few days. Meanwhile, our proposed model is evaluated by comparison with state-of-the-art models, which shows that our two kinds of CNN-LSTM models are the most effective choice, especially CNN-LSTM2, where convolutional layers play an important role in grabbing more complex information and passing it on to the LSTM layers. It should be mentioned that, because of the complex situation of the tailings dam, the data sequence of the saturation line is unstable and contains a lot of noise.
Whether in terms of time or frequency, the noise usually contains useless information, which not only takes up a lot of space and memory but also prevents the analyst from drawing accurate conclusions. To overcome the drawback that simple networks cannot de-noise the raw data, we applied the discrete wavelet transform (DWT) to decompose the saturation line into different time-frequency sequences; the rigrsure strategy is used to calculate the threshold that removes the noise in the decomposed data, and finally the de-noised wavelets are reconstructed to obtain new integrated data for further study. Combining the DWT method, our CNN-LSTM2 can be improved into Wavelet-CNN-LSTM to achieve better performance. In this work, taking the Jiande tailings pond, China, as the study area, four main contributions of our study are presented:

(1) Proposing an NI method using the RF algorithm to fill missing values, which preserves the time-series information as much as possible.

(2) Proposing a new CNN-LSTM network to solve the tailings pond risk prediction problem, which achieves great performance in MAPE, RMSE, and R².

(3) Comparing our CNN-LSTM model with different hyperparameters and with other state-of-the-art algorithms.

(4) Combining the DWT method with the CNN-LSTM network to achieve better performance, especially on data with a lot of noise.

In total, in this work, the Pearson correlation coefficient, the feature importance of the RF model, and sensitivity analysis techniques have been employed for saturation line prediction, especially serving as tools for dimensionality reduction. After dimensionality reduction, only two kinds of monitoring data are needed to reconstruct the saturation line data. After de-noising the data using DWT, our Wavelet-CNN-LSTM model was established for further tailings pond risk forecasting.

Materials and Methods
Numerical Inversion Method
In the monitoring data, a small part of the data is missing or abnormal. It should be noticed that, for a time-series prediction problem, missing values cause a loss of time dependence, which restricts the performance of the prediction model [17, 18, 29, 30]. Hence, we hope to keep our data with favourable long-term and short-term continuity information. Similarly, instead of deleting abnormal data directly, abnormal saturation line values can be reconstructed by our NI system. The key to the solution is to find the relationship between the missing value and the other normal values; according to this relationship, the missing value can be reconstructed from the normal values. However, it is hard to find a precise computational relationship between the saturation line values and the other features. The method in this study is to create a direct mapping from the inputs to the outputs using machine learning, which has the ability to find the relationship between inputs and outputs [16]. In other words, we composed our NI method to reconstruct the data by building an RF model; by doing so, more data become available. In more detail, this NI system includes three steps. First, a large number of parameters may be strongly correlated, and if the correlation among features is strong, it is difficult to evaluate the importance of a single feature. Taking into account the possibility of missing values for each parameter, we should choose as few parameters as possible as the input of the NI method. To solve this problem, Pearson correlation coefficients [38, 39] are calculated and a heat map is drawn, which helps eliminate characteristics with a strong correlation (correlation coefficient greater than 0.8).
Figure 1. The Pearson correlation heat maps. The left side shows the correlations among the original data; the right side shows the correlations among the remaining features.

The Pearson correlation coefficient is defined as follows:

P_{m_i, n_i} = \frac{k \sum m_i n_i - \sum m_i \sum n_i}{\sqrt{\left[k \sum m_i^2 - \left(\sum m_i\right)^2\right]\left[k \sum n_i^2 - \left(\sum n_i\right)^2\right]}} \qquad (1)

where m_i and n_i are two different variables and k is the number of observations. Figure 1 shows the correlations among the original data (left) and among the remaining features after Pearson filtering (right). Second, an RF model was built, and we explored the feature importance ranking generated by RF [22, 31], sorting the features according to how much they contributed to the model during the building process. Third, a posterior judgment is also required: we are interested in which features have a great impact on the output of the trained RF model. Sobol sensitivity analysis is adopted to explore the contribution of each individual feature and to determine which parameters are influential and drive the model outputs [23–25, 39]. The feature importance ranking according to the RF model and the sensitivity analysis are shown in Figure 2. They jointly select the inputs (x1–x4) and the output (the saturation line) used to predict the saturation line [16]. Moreover, abnormal data are deleted and replaced with data predicted by the NI method. It should be noticed that rainfall and water level are factors that directly affect the height of the saturation line, and they have a similar time-series relationship. Therefore, the NI method in this study greatly preserves the time-series information of the saturation line and generates more usable values for further deep learning prediction.
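As a concrete illustration, the three-step NI pipeline above (Pearson filtering with the 0.8 cutoff, then an RF regressor mapping the retained features to the saturation line) might be sketched as follows. This is a minimal sketch, not the authors' implementation: the column names, helper names, and RF settings are our assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def drop_correlated(df, target, threshold=0.8):
    """Drop one feature from each pair whose |Pearson r| exceeds the cutoff."""
    corr = df.drop(columns=[target]).corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > threshold).any()]
    return df.drop(columns=to_drop)

def ni_impute(df, target):
    """Fit an RF on rows where the target is observed; predict it where missing."""
    kept = drop_correlated(df, target)
    features = [c for c in kept.columns if c != target]
    known = kept[kept[target].notna()]
    missing = kept[kept[target].isna()]
    rf = RandomForestRegressor(n_estimators=100, random_state=0)
    rf.fit(known[features], known[target])
    out = df.copy()
    if len(missing):
        out.loc[missing.index, target] = rf.predict(missing[features])
    return out
```

The fitted model's `feature_importances_` attribute then provides the importance ranking used in the second step.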
Discrete Wavelet Transform
Although the windowed Fourier transform (short-time Fourier transform) can partially localize time, its window size is fixed, so it is only suitable for stationary signals with small frequency fluctuations and not for non-stationary signals with large frequency fluctuations. As a time-frequency signal analysis method, the wavelet transform (WT) can automatically adjust the window size according to the frequency. What has greatly contributed to the effectiveness of the WT is that it is an adaptive time-frequency analysis method capable of multi-resolution analysis. Being very well suited to non-stationary signals, the WT can extract local features of signals; as a result, the wavelet transform is known as a microscope for analyzing and processing signals. In our study, we apply the discrete wavelet transform (DWT) [26] to decompose the collected saturation line data of the tailings pond into 4 frequency sequences. After removing the noise in the decomposed data, the wavelets are reconstructed to obtain new integrated data for further multi-resolution study. The WT shifts a basic wavelet function by \omega units and takes the inner product with the analyzed signal p(t) at different scales:

WT_s(\epsilon, \omega) = \frac{1}{\sqrt{\epsilon}} \int_{-\infty}^{+\infty} p(t)\, \varphi\!\left(\frac{t - \omega}{\epsilon}\right) dt \qquad (2)

where \epsilon is the scale factor (larger than 0), used to stretch the basic wavelet \varphi(t), and \omega is the displacement. The Mallat algorithm [23] provides an effective way to implement the DWT using low- and high-pass filters:

oL = \sum_{i=-\infty}^{\infty} T(i)\, \psi_l(2n - i) \qquad (3)

oH = \sum_{i=-\infty}^{\infty} T(i)\, \psi_h(2n - i) \qquad (4)

where T(i) is the signal, and \psi_l, \psi_h, oL, oH are the low-pass filter, high-pass filter, output of the low-pass filter, and output of the high-pass filter, respectively.

Figure 2. The feature importance ranking according to the RF model (left) and the sensitivity analysis (right).
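To make the decomposition and reconstruction concrete, here is a minimal sketch using PyWavelets (the library choice, the 'db4' mother wavelet, and soft thresholding are our assumptions, not details given in the paper); the threshold function implements a rigrsure-style risk minimisation of the kind described next:

```python
import numpy as np
import pywt  # PyWavelets

def rigrsure_threshold(coeffs):
    """SURE (rigrsure) style threshold: minimise the risk over sorted squared coefficients."""
    n = len(coeffs)
    g = np.sort(np.abs(coeffs)) ** 2                 # sorted, squared coefficients
    cumsum = np.cumsum(g)
    k = np.arange(n)
    risk = (n - 2 * (k + 1) + cumsum + (n - k - 1) * g[k]) / n
    return np.sqrt(g[np.argmin(risk)])               # threshold from the minimum-risk index

def wavelet_denoise(signal, wavelet="db4", level=4):
    """Decompose, soft-threshold the detail coefficients, and reconstruct."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    out = [coeffs[0]]                                # keep the approximation untouched
    for detail in coeffs[1:]:
        thr = rigrsure_threshold(detail)
        out.append(pywt.threshold(detail, thr, mode="soft"))
    return pywt.waverec(out, wavelet)[: len(signal)]
```

On a noisy sine wave this reduces the mean squared error against the clean signal while preserving the overall trend, which is the effect the de-noised saturation line series should show.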
Notably, in the wavelet domain the coefficients corresponding to the effective signal are large, while the coefficients corresponding to the noise are small; the rigrsure threshold is an effective selection rule in DWT:

g(k) = [\mathrm{sort}(|t|)]^2, \quad k = 0, 1, \ldots, N-1 \qquad (5)

that is, the absolute values of the signal coefficients are sorted, and each value is squared to obtain a new sequence.

\gamma_t = \sqrt{g(k)}, \quad t = 0, 1, \ldots, N-1 \qquad (6)

\mathrm{Risk}(t) = \frac{N - t + \sum_{j=1}^{t} g(j) + (N - t)\, g(N - t)}{N} \qquad (7)

\gamma_t = \sqrt{g(t_{\min})} \qquad (8)

Here t denotes the signal coefficients, \gamma_t is the threshold, and \mathrm{Risk}(t) is the generated risk. Taking the index t_min that minimizes \mathrm{Risk}(t) over all t gives the final threshold \gamma_t.

Figure 3. The Discrete Wavelet Transform process.

The 3-level decomposition and reconstruction processes of the DWT using the Mallat algorithm are shown in Figure 3(a) and Figure 3(b), respectively. From Figure 3(a) we can see the signal decomposed into three different levels: at the first level, the original signal T is decomposed into the coefficients oL and oH; the resulting oL is then decomposed into another pair of coefficients oH and oL at the second level, and the decomposition continues until the set number of n levels is reached. Figure 3(b) illustrates the de-noising and reconstruction process. The noise appears as small wavelet coefficients, while the useful signal appears as large wavelet coefficients [35, 36]. The time-series signal T passes through the low-pass filter oL and the high-pass filter oH; the wavelet coefficients of lower amplitude are removed and the wavelet coefficients of higher amplitude are restored, achieving the effect of noise reduction. Subsequently, the wavelet reconstruction and integration process is applied to all of these coefficients. Employing the coefficient oL, the low-frequency, high-amplitude component rL is reconstructed. As shown by rL in Figure 3(b), the sequences become smooth, showing the tendency change patterns more clearly.

CNN-LSTM Prediction Model
Our study aims to construct a prediction system for forecasting the saturation line utilizing state-of-the-art LSTM and CNN networks. What accounts for the popularity of the convolutional layer is that it is good at extracting, recognizing, and identifying the structures of the time series in the monitoring data, while LSTM networks achieve good performance in detecting long- and short-term dependence. In light of this, the principal idea of our study is to combine the advantages of CNN and LSTM. The proposed model, named the CNN-LSTM model, comes in two versions, each consisting of two parts of layers: the first part is convolutional and max-pooling layers, while the second part is LSTM layers. The convolutional layers encode the time-series information, while the LSTM layers decode the encoded information from the convolutional layers; the result is then flattened and pushed into a fully-connected layer. The CNN-LSTM auto-encoder model is shown in Figure 4.

Figure 4. The CNN-LSTM auto-encoder model.
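One possible Keras sketch of such an encoder-decoder stack (the layer sizes follow the CNN-LSTM2 variant defined later; the kernel size, activation, and loss are our assumptions rather than settings reported by the paper):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_lstm(seq_len=10, n_features=1):
    """Conv1D encoder -> max-pooling -> stacked LSTM decoder -> dense output."""
    model = models.Sequential([
        layers.Input(shape=(seq_len, n_features)),
        layers.Conv1D(32, kernel_size=3, padding="same", activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.LSTM(25, return_sequences=True),
        layers.LSTM(50),
        layers.Dense(1),  # next saturation line value
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss="mse")
    return model
```

`padding="same"` mirrors the same-padding the paper reports using to limit feature loss in the convolutional stage.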
Convolutional and Pooling Layers
The convolutional layers and the max-pooling layer respectively detect the spatial structures and features of the saturation line values and reduce the redundant characteristics. More importantly, the convolutional layer can extract hidden information in the time dimension and usually passes higher-quality, denser features to subsequent layers. More specifically, numerous useful convolved features are generated by the convolution kernels, and these are often more informative than the original features. As a subsampling method, the max-pooling layer retains certain information from the convolved features and reduces the original data dimension. Specifically, the pooling layer helps to collect and summarize the features from the convolutional layer.
Long-Short-Term Memory (LSTM) network
As a popular type of recurrent neural network (RNN), LSTM achieves good performance in detecting long-term dependencies. LSTM solves the "lack of memory" problem of plain RNNs, in which time-series information cannot be effectively retained; moreover, the "vanishing gradient problem" prevents RNNs from detecting long-time dependencies. The LSTM model is composed of a memory cell and three interactive gates: the input gate, forget gate, and output gate. The memory cell memorizes the state from the previous step. The input gate determines how much of the network input needs to be saved to the cell state at the current moment t. The forget gate controls whether the information from time t−1 is discarded or enters the input gate as reserved information. The output gate determines what information will be utilized as the output. Eqs. (9)–(14) briefly describe the update in the LSTM layers:

i_t = \sigma(V_i x_t + W_i h_{t-1} + b_i) \qquad (9)

f_t = \sigma(V_f x_t + W_f h_{t-1} + b_f) \qquad (10)

\tilde{c}_t = \tanh(V_c x_t + W_c h_{t-1} + b_c) \qquad (11)

c_t = f_t \otimes c_{t-1} + i_t \otimes \tilde{c}_t \qquad (12)

o_t = \sigma(V_o x_t + W_o h_{t-1} + b_o) \qquad (13)

h_t = o_t \otimes \tanh(c_t) \qquad (14)

where x_t is the input data at time t, V_* and W_* denote weight matrices, h_* is the hidden state, and b_* is the bias. \sigma and tanh are the sigmoid and tanh activation functions, respectively. i_t, f_t, c_t, and o_t stand for the input gate, forget gate, memory cell, and output gate, respectively. \otimes denotes the component-wise operation. Finally, the output h_t is calculated from the output gate and the information in the memory cell.

Figure 5. The architecture of the proposed CNN-LSTM1 and CNN-LSTM2.

CNN-LSTM Model For Prediction
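Before detailing the two variants, note that the LSTM update of Eqs. (9)–(14) maps directly onto code; the following single-step NumPy sketch is purely illustrative (the weight shapes, dictionary layout, and initialisation are our assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM update following Eqs. (9)-(14); p holds V_*, W_*, b_* per gate."""
    i = sigmoid(p["Vi"] @ x_t + p["Wi"] @ h_prev + p["bi"])        # input gate,   Eq. (9)
    f = sigmoid(p["Vf"] @ x_t + p["Wf"] @ h_prev + p["bf"])        # forget gate,  Eq. (10)
    c_tilde = np.tanh(p["Vc"] @ x_t + p["Wc"] @ h_prev + p["bc"])  # candidate,    Eq. (11)
    c = f * c_prev + i * c_tilde                                   # memory cell,  Eq. (12)
    o = sigmoid(p["Vo"] @ x_t + p["Wo"] @ h_prev + p["bo"])        # output gate,  Eq. (13)
    h = o * np.tanh(c)                                             # hidden state, Eq. (14)
    return h, c
```

Because the output gate is in (0, 1) and tanh is bounded by 1, the hidden state h_t always lies strictly inside (−1, 1).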
In our study, two different CNN-LSTM structures are utilized. The first version, named CNN-LSTM1, consists of two convolutional layers with 16 and 32 filters, a max-pooling layer of size 2, an LSTM layer of 50 units, a flatten layer, and a fully-connected layer, in that order. The second version, named CNN-LSTM2, includes one convolutional layer with 32 filters, a max-pooling layer of size 2, a flatten layer, two LSTM layers of 25 and 50 units, a flatten layer, and a fully-connected layer, in that order. Different parameters are compared for further study. The two CNN-LSTM structures are shown in Figure 5(a) and Figure 5(b).

Data Preparation
The study site is the Jiande copper mine tailings pond, Hangzhou, Zhejiang Province, China, where the amount of mineral copper metal accounts for about 60% of the province's total output. The main mineral products are copper concentrate, zinc concentrate, sulfur concentrate, and by-product gold and silver. The tailings pond level is III. Different geological hazard sensors are installed to monitor the surface displacement, dam body internal displacement, saturation line height, water level, rainfall, and seepage flow [26, 27, 37]. The research data for this work were collected from the sensors mentioned above from 2018-03-18 to 2019-04-29.

Table 1.
Description of the datasets used for saturation line prediction.

          NO.8    NO.13   NO.17   NO.21   NO.28   NO.33   Status
Base      —       —       —       —       —       —       —
mean      4.57    7.73    11.32   10.67   14.52   15.03   Normal
min       4.08    6.86    11.14   10.08   14.04   14.70   Normal
25%       4.51    7.68    11.28   11.28   14.23   14.96   Normal
50%       4.56    7.80    11.36   10.76   14.61   15.01   Normal
75%       4.61    7.84    11.37   10.80   14.68   15.04   Normal
max       4.95    8.18    11.40   10.88   15.40   15.41   Normal
Figure 6. The monitored saturation line data at different positions.

For this study, our data come from six different positions, specifically stages 8, 13, 17, 21, 28, and 33 of the tailings dam, and the time interval between data points is two hours. After collecting the data, we used our proposed NI system to fill in missing values; abnormal values were deleted and replaced with values predicted by the NI system. Finally, 8365 records were collected for our further study. The continuous monitoring values ensure a wide range of time-series information. It should be noted that our CNN-LSTM model was trained and validated on these 8365 records: 70% of the data were chosen as the training set and 10% as the validation set, and the performance of the models was evaluated on the remaining 20% of the data, which was unseen during the model building process. To keep the long- and short-term dependence in the data, these data cannot be shuffled as is usual in traditional deep learning studies. Table 1 describes the collected data; the first three rows are historical monitoring data. The distribution of the monitoring data is shown in Figure 6, which exhibits a wide range of variation. These changes are largely caused by tailings pond operations and weather, such as the discharge of a large amount of wastewater and waste residue on a certain day or heavy rain. In order to eliminate the impact of different data dimensions on the calculation, we applied Z-score normalization to the data:

\dot{x} = \frac{x_t - \mu_t}{\sigma_t} \qquad (15)

where x_t is the input data, and \mu_t and \sigma_t are the average and standard deviation of the data.

Experiment and Results
CNN-LSTM1 and CNN-LSTM2 are evaluated and compared to show the prediction performance. The hardware environment for this experiment is an Intel Core i7-8750 CPU and an NVIDIA GTX 1060 GPU with 6 GB of memory. The algorithm is implemented in Python with the TensorFlow framework.

Experiment and Results using the CNN-LSTM model
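The data handling used throughout these experiments — the unshuffled 70/10/20 split and the Z-score normalisation of Eq. (15) described in Data Preparation — can be sketched as follows (the function names and NumPy representation are our assumptions):

```python
import numpy as np

def chronological_split(data, train=0.7, val=0.1):
    """Split a time series into train/validation/test segments without shuffling."""
    n = len(data)
    i, j = int(n * train), int(n * (train + val))
    return data[:i], data[i:j], data[j:]

def zscore(x, mu=None, sigma=None):
    """Eq. (15); pass the training-set statistics to normalise validation/test data."""
    mu = x.mean() if mu is None else mu
    sigma = x.std() if sigma is None else sigma
    return (x - mu) / sigma, mu, sigma
```

Reusing the training-set mean and standard deviation on the validation and test segments avoids leaking information from the unseen data.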
The prediction performance of our proposed model is evaluated by the root mean square error (RMSE). However, RMSE has an important problem: even if a model has an error of less than 0.5% on 98% of the dataset, a very large error on the remaining 2% will still make the overall RMSE very high, causing the model to be considered poor. To address this, the mean absolute percentage error (MAPE) is also utilized in the evaluation process. Furthermore, the coefficient of determination, denoted R², is used in our evaluation methodology; it is the proportion of the total variation of the dependent variable explained by the model.

RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(y_t - \hat{y}_p\right)^2} \qquad (16)

MAPE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_t - \hat{y}_p}{y_t} \right| \times 100\% \qquad (17)

R^2 = 1 - \frac{\sum_{i=1}^{n} \left(y_t - \hat{y}_p\right)^2}{\sum_{i=1}^{n} \left(y_t - \bar{y}_t\right)^2} \qquad (18)

where y_t represents the true value, \hat{y}_p the predicted saturation line value, \bar{y}_t the average of the true values, and n the number of data points. Figure 7 shows the prediction results of CNN-LSTM1 and CNN-LSTM2 at five different monitoring sites over about 1750 test samples. In this study, we trained our model for 120 epochs with a batch size of 64, RMSE as the loss function, and Adam as the optimizer; Adam is an improved RMSProp optimizer combined with the momentum trick. It is worth noticing that, in order to reduce feature loss in the convolutional layers, same-padding was used. Last but not least, the forecasting sequence length must be set properly to ensure model performance; specifically, we set the sequence length to 10. On the one hand, a longer sequence length occupies a huge amount of memory; on the other hand, we found through experiments that a sequence length of 10 achieves better performance than, for example, 4, 7, or 20.
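The three metrics of Eqs. (16)–(18) translate directly into code; a minimal NumPy version:

```python
import numpy as np

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))           # Eq. (16)

def mape(y_true, y_pred):
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100  # Eq. (17), in percent

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot                              # Eq. (18)
```

For a constant offset of 1 on every prediction, RMSE is exactly 1 regardless of scale, while MAPE depends on the magnitude of the true values — which is why the paper reports both.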
The most important hyperparameter of the model is the learning rate of the network, which has a significant influence on the time to convergence. If the learning rate is set too large, the loss function will be difficult to converge, resulting in lower final accuracy; on the contrary, a small learning rate leads to slow convergence and increased training time. This paper uses cross-validation to determine the optimal learning rate for each partial network; the most appropriate learning rate is 0.001 with a weight decay of 0.005. As for the number of training iterations, the training process is stopped when the model no longer converges. The prediction performance of our proposed CNN-LSTM1 and CNN-LSTM2 is shown in Table 2 and Table 3, respectively, where NO.8, NO.13, NO.17, NO.21, NO.28, and NO.33 denote the different saturation line stations mentioned above.

Figure 7. The prediction results of the saturation line at different positions. The green, red, and orange lines represent the prediction from CNN-LSTM1, the prediction from CNN-LSTM2, and the raw data, respectively. From the prediction results, we can see that CNN-LSTM2 outperforms CNN-LSTM1.

Table 2.
Prediction performance of the proposed CNN-LSTM1 model using MAPE, RMSE, and R².

Metrics   NO.8     NO.13    NO.17    NO.21    NO.28    NO.33
RMSE      0.0214   0.0344   0.0491   0.0313   0.0134   0.0342
MAPE      3.814    4.411    3.731    3.242    3.712    3.621
R²

CNN-LSTM1 consists of two convolutional layers with 16 and 32 filters, a max-pooling layer of size 2, an LSTM layer of 50 units, a flatten layer, and a fully-connected layer, while CNN-LSTM2 includes one convolutional layer with 32 filters, a max-pooling layer of size 2, a flatten layer, two LSTM layers of 25 and 50 units, a flatten layer, and a fully-connected layer. From Table 2 and Table 3, combined with Figure 7, we can conclude that in terms of RMSE, MAPE, and R², our proposed CNN-LSTM2 outperforms CNN-LSTM1; more specifically, the model with one convolutional layer, one max-pooling layer, a flatten layer, two LSTM layers, a flatten layer, and a fully-connected layer is more accurate. Although the convolutional layer is good at extraction and recognition and can detect the spatial features of the saturation line values well, the deep and abstract features it learns may differ from the ordinary time-series information in the raw data; this is clearly a disadvantage when the monitoring data contain only simple information. Using one convolutional layer is therefore more suitable, and the results show that two LSTM layers can capture the long- and short-term data dependencies to a significant degree. The scatter plots of the raw data and the predicted saturation line are illustrated in Figure 8, which shows the prediction performance more intuitively. To show the superiority of our proposed CNN-LSTM2 model, we carried out comparative studies with other state-of-the-art machine learning and deep learning models, including support vector regression (SVR), decision tree regression (DTR), random forest regression (RFR), multilayer perceptron (MLP), a single GRU, a simple RNN, and LSTM models.
Figure 8. The prediction scatters of the saturation line at different positions using CNN-LSTM2.

Table 3. Prediction performance of the proposed CNN-LSTM2 model using MAPE, RMSE, and R².

Metrics   NO.8     NO.13    NO.17    NO.21    NO.28    NO.33
RMSE      0.0209   0.030    0.0336   0.0170   0.0123   0.0366
MAPE      3.346    3.316    3.207    1.589    3.221    3.432
R²

Table 4 presents the RMSE, MAPE, and R² scores of these models in our experiments, which demonstrates that our CNN-LSTM2 method significantly outperforms the others in R²; besides, its runtime for 120 epochs is much less than that of the other deep learning models. In order to build the complete saturation line prediction model and show the reliability of our CNN-LSTM model together with its parameter settings, we compared different hyperparameters in our experiments, such as the batch size, the number of filters in the convolutional layers, the max-pooling size, and the number of LSTM cells. Table 5 lists the different cases combining multiple hyperparameters. In terms of the evaluation metrics used in this task, although Case 2 and Case 5 achieve slightly better performance than our model with the ordinary hyperparameters in Case 9, their runtime is almost twice that of the CNN-LSTM model. Excessive running time reduces the real-time performance of prediction, especially when the amount of data is very large.

Table 4.
Performance comparison of different machine learning and deep learning models.

Model   RMSE     MAPE    R²      Runtime (seconds)
SVR     —        —       —       —
DTR     0.141    4.312   0.489   —
RFR     0.251    4.186   0.839   —
MLP     0.0504   3.744   0.798   44.08
RNN     0.0308   3.645   0.864   47.54
GRU     0.0221   3.602   0.879   63.55
LSTM    0.0214   3.596   0.887   77.08

Table 5. Prediction cases using different hyperparameters in the CNN-LSTM model.

Case   Batch   Conv   Pool   LSTM       RMSE     MAPE    R²      Runtime (seconds)
9*     64      32     2      [25, 50]   0.0209   3.346   0.969   25.49

The disadvantage is more pronounced for large amounts of data, and this incurs no loss of generality. Case 3 needs the least runtime but achieves low accuracy. As a result, the CNN-LSTM with one convolutional layer and two LSTM layers is the best performer, which fully complies with deep learning logic. Although the padding method restricts the feature loss of the time-series data to some extent, the pooling layer inevitably loses part of the data information. Considering the accuracy and running time of the model, we keep the model parameters the same as those of the ordinary model: the batch size is 64, with one convolutional layer of 32 filters, a max-pooling layer of size 2, and two LSTM layers of 25 and 50 units. When tailings ponds encounter more complex situations, the data become very complicated and the internal time-series information is harder to extract; in that case, a single shallow layer lacks the capability to capture complex information, and deeper stacks of LSTM cells will be more suitable.

Experiment and Results using the Wavelet-CNN-LSTM model
After the above analysis and experiments, our CNN-LSTM model achieves effective predictions of the saturation line. It can be seen from Figure 8 that the original data and the predicted data still do not satisfy the relationship y = x, especially in the predictions for NO.17, NO.28, and NO.33. This is because, although the long- and short-term dependence and hidden time-series information can be discovered from the data, the prediction accuracy is greatly affected by the noise present in the data. To overcome the drawback that the CNN-LSTM model cannot de-noise the raw data, we applied the DWT to decompose the saturation line into different time-frequency sequences and remove the random noise. Subsequently, the de-noised data are trained with our CNN-LSTM model. Our Wavelet-CNN-LSTM model is shown in Figure 10. Part A is the de-noising process using the discrete wavelet transform, which removes the noise with small wavelet coefficients while retaining the wavelet coefficients belonging to the useful signal, making local features easier to extract. In part B, the de-noised data are fed to the CNN-LSTM model to predict the future tendency. The decomposition-denoising-reconstruction process of the saturation line data is illustrated in Figure 9(a), and the original data (green) and de-noised data (orange) at different positions are illustrated in Figure 9(b). The results for all positions are shown in Figure 11. This once again proves that the DWT method can remove a large amount of useless information, thereby helping our CNN-LSTM model explore the time-series information hidden in the data more accurately. Table 6 presents the RMSE, MAPE, and R² comparing the CNN-LSTM model with the Wavelet-CNN-LSTM model. Table 6 shows that the Wavelet-CNN-LSTM model achieves better performance than the CNN-LSTM model at all saturation line stations. After overcoming the shortcoming that the original CNN-LSTM model cannot de-noise the data, our Wavelet-CNN-LSTM becomes the better choice for the prediction purpose.

Figure 9. The decomposition-denoising-reconstruction process of the saturation line data. It is obvious that after the DWT process the random noise of the original data (green) is removed and the data become smooth (orange).

Figure 10. The Wavelet-CNN-LSTM model. Part A is the de-noising process. In part B, the de-noised data are fed to the CNN-LSTM model for prediction.

Discussion and Conclusion
In this work, we applied a new method to predict the safety of tailings ponds according to the saturation line using the Wavelet-CNN-LSTM model, which is also the first used in tailings pond risk prediction. Compared with traditional methods, this risk evaluation method for tailings ponds has the characteristics of high accuracy and high real-time performance. The contributions of this work are fourfold. Firstly, an NI system (including Pearson correlation coefficients, sensitivity analysis, and Random Forest algorithms) was applied to reconstruct missing and abnormal values of the saturation line from the water level of the tailings pond and rainfall. It should be observed that the water level and rainfall carry the same time-series information as the saturation line over the same period. Secondly, two CNN-LSTM models, especially the CNN-LSTM2 model

Figure 11. The prediction scatters of the saturation line at different positions using the Wavelet-CNN-LSTM model. The prediction results for the NO.17, NO.28, and NO.33 positions are marked in yellow, red, and blue respectively, and obviously outperform the results (NO.17*, NO.28*, and NO.33*) using the CNN-LSTM model.

Table 6.
Performance comparison of the CNN-LSTM and Wavelet-CNN-LSTM models.

Station   RMSE   MAPE   R²   Model Type