Genetically Optimized Prediction of Remaining Useful Life
Shaashwat Agrawal, Sagnik Sarkar, Gautam Srivastava, Praveen Kumar Reddy Maddikunta, Thippa Reddy Gadekallu
Shaashwat Agrawal∗, Sagnik Sarkar∗, Gautam Srivastava†, Praveen Kumar Reddy Maddikunta‡, Thippa Reddy Gadekallu‡
∗School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India. E-mail: [email protected], [email protected]
†Department of Mathematics and Computer Science, Brandon University, Manitoba, Canada. E-mail: [email protected]
‡School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India. E-mail: [email protected], [email protected]
Abstract—The application of remaining useful life (RUL) prediction has taken on great importance in terms of energy optimization, cost-effectiveness, and risk mitigation. Existing RUL prediction algorithms mostly constitute deep learning frameworks. In this paper, we implement LSTM and GRU models and compare the obtained results with a proposed genetically trained neural network. Current models depend solely on Adam and SGD for optimization and learning. Although the models have worked well with these optimizers, even small uncertainties in prognostics prediction can result in huge losses. We aim to improve the consistency of the predictions by adding another layer of optimization using Genetic Algorithms, in which the hyper-parameters learning rate and batch size are optimized beyond manual capacity. These models and the proposed architecture are tested on the NASA Turbofan Jet Engine dataset. The optimized architecture can tune the given hyper-parameters autonomously and provide superior results.
Index Terms—LSTM, GRU, genetically trained neural network, prognostics, hyper-parameters, learning rate, batch size, Remaining Useful Life.
I. INTRODUCTION
Remaining useful life is an important characteristic of any machinery or battery. Engines, lithium-ion batteries, and water pumps are examples of machinery that require constant maintenance as their efficiency decreases with time. Prediction of their useful life helps industries achieve easy replacement, cost-effectiveness, and production efficiency by moving from systematic to condition-based maintenance [1]. Various datasets, such as the NASA turbofan jet engine and lithium-ion battery datasets, have been used in research in this field [2], [3].

The NASA turbofan jet engine dataset is a sequential dataset that covers a variety of aspects required for prediction. It records 3 operational parameters and 21 sensor values. Various model-based and data-driven approaches have been applied to study and understand this data. Most research falls under statistical methods such as Dynamic Bayesian Networks [4] and machine learning algorithms. Deep learning models such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks have been used to study this dataset extensively. Given its time-series nature, these models have shown significant results in this field. Various hybrid layers have also been added to compensate for the limitations of the basic recurrent models and build on them, mainly 1-D and 2-D convolution layers and bi-directional and multi-directional LSTM layers [5]–[7].

In any RUL application, precision takes huge priority to avoid accidents and heavy losses. Even if composite algorithms and models can provide accurate results, consistency is equally important. In any particular unit time cycle, there can be various possible outcomes depending on even a slight change in operational settings and other factors. Research on optimization has increased to cover this necessity. All deep learning architectures utilize a certain optimizer for their learning, but these alone might not be enough. In general, Stochastic Gradient Descent (SGD) and Adam are used due to their efficiency and consistency.

Other optimization methodologies are used on top of the default ones to get better results; Adaboost and Recursive Levenberg-Marquardt (RLM) are some such algorithms. Evolutionary algorithms have also become known for their optimization success. Nature-inspired algorithms are being used extensively in various ways to minimize energy consumption and optimize the training of neural networks [8]. Their flexibility in terms of application opens up many possibilities for optimization. They can be used in deep learning to optimize hyper-parameters, activation functions, model architectures, and so on.

In this paper, the NASA dataset is trained with tuned 2-layer LSTM and GRU models. The results are compared to existing outputs and inferences are shown. After consulting the literature, we introduce a semi-novel optimization algorithm using Genetic Algorithms. The hyper-parameters of the model, learning rate and batch size, are self-tuned by a Genetic Algorithm for every generation. The ∆ validation loss of every individual in a generation is taken and evaluated, and the top individuals pass on their genes to the next generation. This methodology of optimization works hand in hand with Adam to prevent over-fitting and under-fitting of data. In Section II, the domains under study are defined. Section III introduces the existing LSTM and GRU model architectures in use and the proposed training methodology using Genetic Algorithms. The results of the implemented architectures and their comparisons are shown in Section IV. Section V deals with conclusion and future work.

II. LITERATURE SURVEY
A. Remaining Useful Life Prediction
Degradation and rusting occur in every component and element of our environment. As research progresses, we tend to move from the discipline of diagnosis to one of prognosis. Similarly, instead of systematic and continuous maintenance, current research focuses on predictive and condition-based maintenance. The remaining useful life of any machinery can be predicted from a recorded history of operational conditions, various sensor values, and other evaluation metrics. Instead of focusing on physical models that prevent failure [3], data-driven machine learning and other algorithmic models are preferred [4], [9], because the ease and efficiency of data-driven models have attracted more attention than other methodologies.
B. Machine Learning and Recurrent Neural Networks
Recurrent Neural Networks derive their success from the sequential nature of data. Simple RNNs, LSTMs [10], [11], bi-directional LSTMs, and GRUs are the most utilized recurrent deep learning networks. The nature of remaining useful life data makes them the perfect statistical tool: they can analyze minute trends in sensor values and still remember long-term details. LSTMs occupy the center stage of these recurrent networks because of their structure. Convolutional layers and CNN-LSTM layers have also been stacked on existing LSTM layers for RUL prediction to increase accuracy. Depending on data pre-processing and the dimension of convolution, the data attributes can be treated as features of an image and extracted similarly [12]. On close observation of the data, it can be inferred that the value ranges provided by each sensor do not amount to much, especially considering the mode. Feature extraction in these cases has to be tuned carefully or it could result in the exploding gradient problem. To overcome similar issues, complex training architectures like auto-encoders have also been proposed [13].
C. Evolutionary Algorithms and Optimization
Adam and SGD optimizers are commonly used for training any deep learning architecture. They enable the model to learn through a loss function using back-propagation. At times these optimizers alone are not enough. Adaboost, Recursive Levenberg-Marquardt, and evolutionary algorithms can be used together with the default ones to improve the training process. The flexibility and compatibility of Genetic Algorithms with deep learning frameworks have incited huge amounts of research. The traditional Genetic Algorithm, the Artificial Bee Colony Algorithm [14], and the NEAT Algorithm [15] are some extensively used optimizers for neural networks. Genetically optimized LSTM models have also been used for similar tasks such as stock prediction [16] and water temperature prediction [17].

III. PRELIMINARIES AND PROPOSED ARCHITECTURE
In this section, we discuss three different architectures: LSTM, GRU, and the genetically optimized LSTM. Subsections III-A and III-B discuss the detailed architecture of the generic LSTM and GRU models and their training parameters. Subsection III-C discusses in detail the methodology followed in the proposed training model and the specific parameters of the fitness criteria. The turbofan jet engine dataset [18] is processed, filtered, and normalized. The models are trained on input data of shape (timeSteps, features) and are later validated. A set of 10 well-defined individuals is trained in each generation to tune the learning rate and batch size. Figure 1 shows the methodology adopted in this paper.
Fig. 1. Structure Diagram of Implementation Models.
A. LSTM Architecture
The implementation is a 3-layer model with two LSTM layers and a single dense layer. Each LSTM layer consists of 64 neurons, and the layers are followed by a 1-neuron output layer. The first LSTM layer returns sequences to the successive LSTM layer. The output layer passes through a ReLU activation function to eliminate negative life predictions while leaving positive outputs unrectified. The cost functions Mean Squared Error (MSE) and Mean Absolute Error (MAE) are used to evaluate the model for both training and validation; we have chosen these cost functions due to their performance on regression models. As shown in Figure 2, each LSTM cell consists of a forget gate, input gate, and output gate, which give it long-term memory by default [19].
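To make the description concrete, the following is a minimal sketch of this architecture in TensorFlow/Keras, the framework used in Section IV. The (20, 24) window shape is taken from the preprocessing in Section IV-A; the builder function name is illustrative.

import tensorflow as tf

TIME_STEPS, FEATURES = 20, 24  # window shape from Section IV-A

def build_lstm_model():
    return tf.keras.Sequential([
        # First LSTM layer returns the full sequence for the next layer.
        tf.keras.layers.LSTM(64, return_sequences=True,
                             input_shape=(TIME_STEPS, FEATURES)),
        # Second LSTM layer emits only its final hidden state.
        tf.keras.layers.LSTM(64),
        # 1-neuron output; ReLU clips negative life predictions to zero.
        tf.keras.layers.Dense(1, activation="relu"),
    ])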
B. GRU Architecture
The implementation of the GRU model is very similar to the LSTM model. The GRU model has 3 layers - 2 GRU layers, each consisting of 64 parallel GRU cells, followed by a single 1-neuron output layer. The first GRU layer returns sequences to the second GRU layer, which, unlike the previous layer, does not return a sequence. The output of this second GRU layer is connected to the output layer, which uses a ReLU activation to rectify the output and prevent negative values for the predicted life. The same cost functions, Mean Squared Error (MSE) and Mean Absolute Error (MAE), are used to evaluate the model on training and validation data due to their high performance on regression models.
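Under the same assumptions as the LSTM sketch above, the GRU counterpart differs only in the recurrent cell type:

def build_gru_model():
    return tf.keras.Sequential([
        tf.keras.layers.GRU(64, return_sequences=True,
                            input_shape=(TIME_STEPS, FEATURES)),
        tf.keras.layers.GRU(64),  # returns only the last hidden state
        tf.keras.layers.Dense(1, activation="relu"),
    ])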
Fig. 2. Long Short Term Memory Cell.
C. Proposed Architecture
We propose a genetic optimization methodology to tune the learning rates and batch sizes of the LSTM and GRU architectures. Fine-tuning any model requires a great deal of time, and it is still almost impossible to find the perfect learning rate manually. To overcome such difficulties, we apply the Genetic Algorithm at each epoch of training to tune these hyper-parameters. The methodology can be divided into major subsections: initialization, training, cross-over, and mutation. Figure 3 shows the complete flow of the optimization process.
Fig. 3. Proposed Architecture Model for Optimization Process.
A generation is initialized at the beginning of training. A random set of commonly used learning rate and batch size values is assigned as parameters to each model. For generation 1, a new set of individuals is formed. Later generations consist of individuals and children formed by the mating of parents present in the previous generation. Each generation carries the genes of its direct predecessor and hence tends to perform better.

Every individual in a generation is trained for an epoch. An individual is an LSTM/GRU model with randomly initialized weights. After training, the losses are stored and evaluated. The currently available batch sizes are allotted to each model in order of succession. Every model shows some decrease in its loss after training. This loss (loss_current) is stored, and the previous loss (loss_prev) is subtracted from it to evaluate the learning that took place in that epoch. Individuals with a poor ∆loss, Eqn. (1), indicate the presence of improper hyper-parameters.

∆loss = loss_current − loss_prev    (1)

The individuals are sorted with respect to ∆loss. Top individuals are directly promoted to the next generation. The rest of the new generation is formed by cross-over of individuals from the previous generation. Considering any two parent individuals at random, one of the four combinations of learning rate and batch size is selected and assigned to a cloned model as a new child. The core strength of any Genetic Algorithm comes from cross-over: it introduces randomness and yet completeness in the new generation. After every cross-over, a possibility of mutation is explored. A random factor, Eqn. (2), is chosen from (−1, 0, 1). A non-zero value results in a 10% mutation of the learning rate, Eqn. (3).

factor ∈ {−1, 0, 1}    (2)

lr = lr + 0.1 × factor × lr    (3)

Ideally, 66.7% of each generation undergoes mutation. This results in variable learning rates which would not have been possible manually. Mutation can be performed in various ways depending on the application and dataset.
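The selection, cross-over, and mutation steps of Eqns. (1)-(3) can be sketched as follows. The paper does not publish reference code, so representing an individual as a (learning rate, batch size) pair and all function names are our assumptions; the elite count of 2 follows Section IV-D, and build_lstm_model() reuses the sketch from Section III-A.

import random

def delta_loss(ind):
    # Eqn. (1): more negative means the loss decreased more (better learning).
    return ind["loss_current"] - ind["loss_prev"]

def crossover(a, b):
    # One of four (learning rate, batch size) combinations of two parents,
    # assigned to a freshly cloned model as the new child.
    return {"lr": random.choice([a["lr"], b["lr"]]),
            "batch": random.choice([a["batch"], b["batch"]]),
            "loss_prev": 0.0, "loss_current": 0.0,
            "model": build_lstm_model()}

def mutate(ind):
    factor = random.choice((-1, 0, 1))       # Eqn. (2)
    ind["lr"] += 0.1 * factor * ind["lr"]    # Eqn. (3): 10% mutation
    return ind

def next_generation(population, elite=2):
    ranked = sorted(population, key=delta_loss)  # best improvement first
    children = ranked[:elite]                    # elites pass on unchanged
    while len(children) < len(population):
        a, b = random.sample(ranked, 2)          # two random parents
        children.append(mutate(crossover(a, b)))
    return children

With factor drawn uniformly from three values, two of which are non-zero, the expected mutated fraction is 2/3, matching the 66.7% figure above.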
IV. RESULTS AND DISCUSSION

The advantage of using LSTM and GRU models for useful life prediction lies in the sequential nature of the data: they can identify minute changes in sensor values and capture the trend in the data. Mean Squared Error and Mean Absolute Error with cross-validation are used to measure the performance of these models. All models have been implemented and tested with TensorFlow [20]. The tests are carried out on the NASA Turbofan Jet Engine dataset containing 3 operational setting values and 21 sensor values. It is divided into train and test cases of 250 engines each.
A. Data Preprocessing
The NASA dataset supports temporal analysis for remaining useful life prediction. It contains a total of 26 attributes categorized into engine number, cycle number, 3 operational settings, and 21 sensor values. Figure 4 shows the time series of a few sensors for three engines; each column in Figure 4 represents an engine and the rows represent particular sensors. The first 2 attributes as well as the operational settings do not contribute to the predictions. Likewise, some sensors show constant values across changing cycles and hence hold no significant value. Each value in the dataset is normalized using Eqn. (4) so that it ranges between [0, 1], avoiding high variation in the data. Figure 5 shows the standardized data points for the sensor values of a single engine.
Normalize(x_i) = (x_i − min(x)) / (max(x) − min(x))    (4)

Fig. 4. Plot of the time series of a few sensors belonging to an engine.

All the values of the sensors and operating modes are grouped in terms of engines. The total number of time cycles is calculated from the number of entries for each engine in the dataset. Unlike usual regression problems, the data needs to be grouped sequentially and broken into a constant number of time steps to perform trend analysis; this data is then ready to be input into the model. We have chosen a time series of 20 historical time steps and predict the remaining number of time steps, obtaining 146179 sequences of shape (20, 24). The expected output value is the normalized remaining life of the jet engine at any given time cycle of the engine.
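A sketch of this preprocessing, assuming the data has been loaded into a pandas DataFrame; the column names (engine_id, RUL) are illustrative, since the raw files ship without headers.

import numpy as np
import pandas as pd

TIME_STEPS = 20

def normalize(col):
    # Eqn. (4): min-max scaling of one attribute into [0, 1].
    return (col - col.min()) / (col.max() - col.min())

def make_sequences(df, feature_cols, target_col="RUL"):
    xs, ys = [], []
    for _, engine in df.groupby("engine_id"):     # one history per engine
        feats = engine[feature_cols].to_numpy()
        rul = engine[target_col].to_numpy()
        for i in range(len(engine) - TIME_STEPS):
            xs.append(feats[i:i + TIME_STEPS])    # window of shape (20, 24)
            ys.append(rul[i + TIME_STEPS])        # remaining life after window
    return np.array(xs), np.array(ys)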
Fig. 5. Standardized sensor values.
B. Training of LSTM model
LSTM (Long Short-Term Memory) models are modified RNNs that can store long-term memory of data. They provide consistent results and understand the flow of data well. We have built a 3-layer model with 2 LSTM layers and 1 dense layer. Each LSTM layer in the model contains 64 LSTM cells which mediate the flow of information. This LSTM network is trained with the Adam optimizer and mean absolute error as the cost function for 10 epochs. We obtained a falling curve for the training loss metric, as shown in Figure 6. The model performance metrics are tabulated in Table I.
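The training setup just described corresponds to a call of roughly this form; the validation split and batch size shown are assumptions, as the paper reports only the optimizer, loss, and epoch count.

model = build_lstm_model()
model.compile(optimizer="adam", loss="mae", metrics=["mse"])
# X_train, y_train come from make_sequences() in Section IV-A.
history = model.fit(X_train, y_train, epochs=10,
                    validation_split=0.2,  # assumed split
                    batch_size=64)         # assumed batch size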
C. Training of GRU model
TABLE I. The cost function (MSE - Mean Squared Error and MAE - Mean Absolute Error) for both training and validation for the LSTM model. [Only the header row - Epochs, MSE, MAE, val. MSE, val. MAE - survives the extraction; the per-epoch values are not recoverable.]

Fig. 6. Performance metrics of the LSTM model.

GRU (Gated Recurrent Unit) models are a modified version of the LSTM model. The GRU cell simplifies the LSTM cell by merging the forget and input gates into a single update gate. We have constructed a GRU model similar to the LSTM model: it has 3 layers - 2 GRU layers and 1 dense layer. Like the LSTM model, each GRU layer has 64 GRU cells. The GRU model is also trained with the Adam optimizer and mean absolute error as the loss function, again for 10 epochs. The performance of the model has been tabulated in
Table II and visualized in Figure 7.
TABLE II. The cost function (MSE - Mean Squared Error and MAE - Mean Absolute Error) for both training and validation for the GRU model. [Only the header row - Epochs, MSE, MAE, val. MSE, val. MAE - survives the extraction; the per-epoch values are not recoverable.]

Fig. 7. Performance metrics of the GRU model.
D. Optimized training of model
The same LSTM model is used for optimized training with a Genetic Algorithm. 10 individuals are made by cloning the same model so that there is no favoritism in the training process. These 10 individuals are then trained for one epoch each with a randomly assigned batch size. The loss obtained for each individual is then subtracted from that individual's previous loss; for the first generation, the previous loss is taken as zero. The individuals are sorted and passed to the crossover phase. In the crossover phase, the two individuals with the best ∆loss are passed forward without any change. The remaining 8 individuals are formed by mating between individuals of the previous generation. These individuals go through the process of mutation based on a generated factor. Once the next generation is completely formed, it goes through the same process of training, crossover, and mutation. This process goes on for several generations; in our case, we have used 10 generations. Table III shows the metrics of the last generation of individuals that we obtained.
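Putting the pieces together, one version of the optimized training loop might look as follows; it reuses build_lstm_model() and next_generation() from the earlier sketches, and the candidate pools of learning rates and batch sizes are assumptions.

POP_SIZE, GENERATIONS = 10, 10

def new_individual():
    return {"lr": random.choice([1e-2, 1e-3, 1e-4]),   # assumed pool
            "batch": random.choice([32, 64, 128]),     # assumed pool
            "loss_prev": 0.0, "loss_current": 0.0,     # first ∆ is vs. zero
            "model": build_lstm_model()}               # cloned architecture

population = [new_individual() for _ in range(POP_SIZE)]

for gen in range(GENERATIONS):
    for ind in population:
        # Re-compile so a possibly mutated learning rate takes effect.
        ind["model"].compile(
            optimizer=tf.keras.optimizers.Adam(ind["lr"]), loss="mae")
        hist = ind["model"].fit(X_train, y_train, epochs=1,
                                batch_size=ind["batch"], verbose=0)
        ind["loss_prev"] = ind["loss_current"]
        ind["loss_current"] = hist.history["loss"][-1]
    population = next_generation(population)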
E. Performance Analysis
Once the final generation of the optimized training model is obtained, the best individual is selected from the lot (this is done based on the least loss obtained by the model). This individual competes against the LSTM and GRU models that we trained earlier. The LSTM and GRU models show high variation and deviation from the expected values. Their performance is not too far off from the expected values, but both are overtaken by the genetically trained model. The genetically trained model correctly predicts the end of life and converges with the expected values, in contrast to the LSTM and GRU models, as shown in Figure 8. This model therefore serves as a better model for prediction and deployment.

TABLE III. The cost function (MSE - Mean Squared Error and MAE - Mean Absolute Error) for both training and validation for the last generation of genetically optimized individuals. [Only the header row - Individual, MSE, MAE, val. MSE, val. MAE - survives the extraction.]

Fig. 8. Predictions of the models against the actual remaining life of the jet engine.
V. CONCLUSION AND FUTURE WORK
The main objective of remaining useful life prediction is to obtain low-cost maintenance and conserve capital. State-of-the-art implementations and models have shown great results in the domain of prognostics. Similarly, the proposed solution aims to optimize the training process of recurrent neural networks for better predictions. The results obtained display the superiority of the optimized architecture over the existing LSTM and GRU models; it can tune the existing models beyond manual limits with the help of cross-over and mutation. The current optimization is limited by the architecture and the degree of randomization, and the ∆loss metric used can also be improved upon. Further research would include greater tuning of hyper-parameters. Initially, all individuals share the same architecture, but bringing randomness to this could improve the efficiency of the algorithm in the later stages of training. Other deep learning architectures like CNNs and auto-encoders could be explored, and ensemble learning techniques could be used on the final generation to obtain averaged predictions.

REFERENCES

[1] P. K. R. Maddikunta, G. Srivastava, T. R. Gadekallu, N. Deepa, and P. Boopathy, "Predictive model for battery life in IoT networks," IET Intelligent Transport Systems, vol. 14, no. 11, pp. 1388–1395, 2020.
[2] Y. Zhang, R. Xiong, H. He, and M. G. Pecht, "Long short-term memory recurrent neural network for remaining useful life prediction of lithium-ion batteries," IEEE Transactions on Vehicular Technology, vol. 67, no. 7, pp. 5695–5705, 2018.
[3] S. Ghorbani and K. Salahshoor, "Estimating remaining useful life of turbofan engine using data-level fusion and feature-level fusion," Journal of Failure Analysis and Prevention, vol. 20, no. 1, pp. 323–332, 2020.
[4] K. Medjaher, D. A. Tobon-Mejia, and N. Zerhouni, "Remaining useful life estimation of critical components with application to bearings," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 292–302, 2012.
[5] X. Li, Q. Ding, and J.-Q. Sun, "Remaining useful life estimation in prognostics using deep convolution neural networks," Reliability Engineering & System Safety, vol. 172, pp. 1–11, 2018.
[6] J. Li, X. Li, and D. He, "A directed acyclic graph network combined with CNN and LSTM for remaining useful life prediction," IEEE Access, vol. 7, pp. 75464–75475, 2019.
[7] M. Alazab, S. Khan, S. S. R. Krishnan, Q. Pham, M. P. K. Reddy, and T. R. Gadekallu, "A multidirectional LSTM model for predicting the stability of a smart grid," IEEE Access, vol. 8, pp. 85454–85463, 2020.
[8] P. K. R. Maddikunta, T. R. Gadekallu, R. Kaluri, G. Srivastava, R. M. Parizi, and M. S. Khan, "Green communication in IoT networks using a hybrid optimization algorithm," Computer Communications, 2020.
[9] X.-S. Si, W. Wang, C.-H. Hu, and D.-H. Zhou, "Remaining useful life estimation: a review on the statistical data driven approaches," European Journal of Operational Research, vol. 213, no. 1, pp. 1–14, 2011.
[10] K. Deng, X. Zhang, Y. Cheng, Z. Zheng, F. Jiang, W. Liu, and J. Peng, "A remaining useful life prediction method with long-short term feature processing for aircraft engines," Applied Soft Computing, p. 106344, 2020.
[11] Y. Wu, M. Yuan, S. Dong, L. Lin, and Y. Liu, "Remaining useful life estimation of engineered systems using vanilla LSTM neural networks," Neurocomputing, vol. 275, pp. 167–179, 2018.
[12] T. R. Gadekallu, N. Khare, S. Bhattacharya, S. Singh, P. K. R. Maddikunta, and G. Srivastava, "Deep neural networks to predict diabetic retinopathy," Journal of Ambient Intelligence and Humanized Computing, 2020.
[13] G. T. Reddy, M. P. K. Reddy, K. Lakshmanna, R. Kaluri, D. S. Rajput, G. Srivastava, and T. Baker, "Analysis of dimensionality reduction techniques on big data," IEEE Access, vol. 8, pp. 54776–54788, 2020.
[14] S. Kumar, V. K. Sharma, and R. Kumari, "A novel hybrid crossover based artificial bee colony algorithm for optimization problem," arXiv preprint arXiv:1407.5574, 2014.
[15] A. A. ElSaid, A. G. Ororbia, and T. J. Desell, "The ant swarm neuro-evolution procedure for optimizing recurrent networks," arXiv preprint arXiv:1909.11849, 2019.
[16] H. Chung and K.-s. Shin, "Genetic algorithm-optimized long short-term memory network for stock market prediction," Sustainability, vol. 10, no. 10, p. 3765, 2018.
[17] S. Stajkowski, D. Kumar, P. Samui, H. Bonakdari, and B. Gharabaghi, "Genetic-algorithm-optimized sequential model for water temperature prediction," Sustainability, vol. 12, no. 13, p. 5374, 2020.
[18] A. Saxena and K. Goebel, "Turbofan Engine Degradation Simulation Data Set," NASA Ames Research Center, Moffett Field, CA, 2008. https://ti.arc.nasa.gov/c/13/
[19] M. Parimala, R. Swarna Priya, M. Praveen Kumar Reddy, C. Lal Chowdhary, R. Kumar Poluru, and S. Khan, "Spatiotemporal-based sentiment analysis on tweets for risk assessment of event using deep learning approach," Software: Practice and Experience.
[20] TensorFlow.org, https://www.tensorflow.org/.