Simple statistical models and sequential deep learning for Lithium-ion batteries degradation under dynamic conditions: Fractional Polynomials vs Neural Networks

Abstract

Longevity and safety of Lithium-ion batteries are facilitated by efficient monitoring and adjustment of the battery operating conditions: hence, it is crucial to implement fast and accurate algorithms for State of Health (SoH) monitoring on the Battery Management System. The task is challenging due to the complexity and multitude of the factors contributing to the battery capacity degradation, especially because the different degradation processes occur at various timescales and their interactions play an important role. This paper proposes and compares two data-driven approaches: a Long Short-Term Memory neural network, from the field of deep learning, and a Multivariable Fractional Polynomial regression, from classical statistics. Models from both classes are trained from historical data of one exhausted cell and used to predict the SoH of other cells. This work uses data provided by the NASA Ames Prognostics Center of Excellence, characterised by varying loads which simulate dynamic operating conditions. Two hypothetical scenarios are considered: one assumes that a recent true capacity measurement is known, the other relies solely on the cell nominal capacity. Both methods are effective, with low prediction errors, and the advantages of one over the other in terms of interpretability and complexity are discussed in a critical way.


Clara B. Salucci, Azzeddine Bakdi, Ingrid K. Glad, Erik Vanem, Riccardo De Bin

February 12, 2021

Affiliations

Corresponding author. Department of Mathematics, University of Oslo, 0851 Oslo, Norway
Department of Mathematics, University of Oslo, 0851 Oslo, Norway
Department of Mathematics, University of Oslo, 0851 Oslo, Norway
Department of Mathematics, University of Oslo, 0851 Oslo, Norway; DNV GL Group Technology and Research, 1322 Høvik, Norway
Department of Mathematics, University of Oslo, 0851 Oslo, Norway

arXiv preprint, stat.AP

Keywords

Lithium-ion batteries; State of Health prediction; Long Short-Term Memory; Multivariable Fractional Polynomial; Linear regression; NASA Randomized Battery Usage Data Set

Introduction

Transport accounts for the largest share of greenhouse gas emissions, hence it is fundamental to optimise sustainable and low-emission solutions such as electric batteries. Lithium-ion batteries (LIBs) are the most popular battery technology, as they offer important advantages compared to other battery types such as lead-acid, nickel-cadmium or nickel-metal-hydride [1, 2]. The performance of LIBs is nevertheless destined to deteriorate over time (calendar ageing) and usage (cycle ageing), as they are complex electrochemical systems sensitive to operating conditions and their nonlinear characteristics are time-varying due to ageing. To cope with the increasing demand for high performance and durability of rechargeable batteries, Prognostics and Health Management (PHM) has received tremendous attention in recent years. PHM plays a crucial role: it enables the operators to monitor the State of Health (SoH) of the battery, defined in Section 2.2, and take actions to maintain availability and reliability. The LIB prognostics phase includes five steps [3]: measurement, feature extraction, SoH estimation, SoH prediction, and Remaining Useful Life (RUL) estimation. Various approaches are proposed in the literature for SoH estimation of LIBs, which may be categorised as: (i) experimental approaches, including direct SoH testing and experiment-based static models; and (ii) adaptive data-based methods, including filters, observers, expert systems, statistical and machine learning (ML) methods. This article focuses on statistical and ML methods.

Experimental direct testing methods [4] include internal resistance and impedance measurement, battery energy level, and incremental capacity analysis methods [5]. Common measurement-based models for SoH estimation include coulomb counting, destructive tests [3], and other data-fitting models obtained only from test measurements.
The experimental methods can only be conducted offline and are highly time-consuming: hence they are inappropriate for the Battery Management System (BMS). Adaptive models are based on online parameter estimation using: physical models such as equivalent circuits [6], and electrochemical models [7]; purely data-driven models; domain knowledge; hybrid models. Among the approximate physical models, the widest classes of SoH estimation methods are filters and state observers: especially, Kalman filter and its unscented and extended versions are widely adopted for the estimation of State of Charge (SoC), defined e.g. in [8], and have their applications successfully extended to SoH estimation [9] and sliding-mode observers [10]. Pure knowledge-based SoH estimation approaches comprise expert systems such as fuzzy logic [11] and Bayesian networks [12] with structures designed by experts: these are however limited, though widely integrated with the other approaches [13].

Data-driven SoH estimation models are abundant in the literature: they are usually based on partial segments of charging/discharging curves, and mainly rely on ML methods. For example, [14] developed an energy-based segmentation called "energy of equal distance voltage difference" to estimate SoH using a deep neural network (NN). A gate recurrent unit-convolutional NN was developed in [15], and a deep convolutional NN (CNN) in [16], to estimate SoH from full trajectories of fixed charge curves; whereas [17] extended the idea to multiple cells SoH estimation from their (voltage, current, temperature) trajectories over a sliding window using Long Short-Term Memory (LSTM).
A more comprehensive overview on data-driven models can be found in [8].

While exhibiting good prediction abilities, ML approaches lack interpretability and require large amounts of data: the complexity of ML models, which are difficult to handle and check, risks replacing the complexity of the physical problem that was to be avoided by considering a data-driven approach. For instance, a NN acts as a black box: it provides results based on transformations unavailable to the user. The example of [18], in which a transformation of the response is among the inputs, shows that the NN may be forced to find spurious relationships: in fact, the prediction error is not 0 as it should have been, had the right relationship been identified. Furthermore, this example points out that recognizing potential issues can be difficult when dealing with black-box methods: in contrast, a linear regression with the same response transformation among the inputs would have provided a clear indication of the one-to-one relationship between the input and the outcome. Motivated by this case, we aim at promoting the use of simple statistical methods to model LIB degradation, recognising that complex models have the additional downside of not being implementable in the BMS. We first design a ML approach based on a Deep LSTM Regression Network (D–LSTM–RN), then present an alternative perspective in which a statistical model is identified via the Multivariable Fractional Polynomial (MFP) approach. We discuss the advantages in terms of interpretability (e.g. explanation of the individual feature contribution to ageing acceleration), generalisability (ML methods are generally prone to overfitting, which is more easily handled for simpler models), and portability (e.g. computational efficiency and implementability on the BMS), without substantial loss of performance in terms of prediction ability.
Both our ML and statistical approaches are non-destructive; they make use of measured variables only, without the need to compute SoC, open circuit voltage (OCV), or any derived quantity; and they are based on LIB historical data, without using inputs from specific reference cycles or particular segments.

Other works considering linear regression in connection to LIB capacity estimation or prediction are [19, 20, 21, 22, 23]. Different statistical methods have also been proposed, for example: [24] used support vector machines to develop a curve-similarity factor to estimate SoH from charging voltage segments; [25] estimated SoH using Gaussian process regression based on another form of partial segments designed as n equispaced voltage points. Further methods require the segment to be a full cycle, such as approximate weighted total least squares [26] to estimate the rated discharge capacity from arbitrary total capacity. Due to their common idea of partial segments, the accuracy of these methods is subject to the availability of long deep monotonic segments, and they cannot be applied to SoH estimation for LIBs under dynamic conditions and calendar ageing.

Conversely, history-based SoH prediction approaches can explain the degradation of battery health as influenced by the whole LIB history, and they are crucial in PHM for their simultaneous advantages of predicting the future performance and optimising the present operating conditions. Different operating conditions have different effects on the LIB ageing behaviour; a SoH prediction model with capabilities of predictive and prescriptive analytics, such as those considered in this article, enables the BMS to adjust temperature and charge/discharge currents to increase longevity and facilitate safe, high-performance operation.

The LIB degradation emerges from a complex interplay between many influencing elements, degradation mechanisms (internal side reactions), degradation modes, and observed degradation effects [27].
The influencing elements include: cell and pack design factors; production factors; and application (stress) factors; they influence internal side reactions through complex irreversible physical and chemical processes, which in turn lead to the various degradation modes of Lithium depletion, active material loss, electrolyte decomposition, and increase of internal resistance [27]. As a result, the observed LIB degradation effects are capacity fade and power fade. While the design and production factors of influence are fixed and depend on the monitored LIB, the stress factors are dynamic features that may accelerate the LIB degradation behaviour. They can be extracted from measured variables or estimated states and they include: exposure to elevated and low voltages; Depth of Discharge (DoD); cycle bandwidth; cycling frequency; high and low temperatures; high discharge rates (Section 2.1).

A critical review [3] emphasises that the degradation effects originate from various processes and their interactions: studying the ageing mechanism is challenging as these processes occur simultaneously and have different time scales, so they should not be analysed independently. However, very few multi-factor SoH degradation analyses are reported in the literature: [28] studied the effects of current, cycling limits, and temperature on battery ageing using four dependent models of these factors; [29] conducted a weight analysis to study the influence of voltage, capacity and internal resistance inconsistency on module capacity; [30] conducted orthogonal experiments to study the impact degree of single and multiple stress factors on capacity loss. Unfortunately, these factors are considered independently or in subgroups, as pointed out in Section 2.1.
Thus, this paper also aims at exploring how various combinations of stress factors affect the LIB degradation, and the degree of impact of different factors on ageing acceleration.

The remainder of this article is organised as follows: Section 2 overviews the stress factors for LIBs, introduces the problem of SoH degradation in relation to the battery capacity fade, and presents the experimental data used for the analysis; Section 3 introduces the LSTM neural network, and presents our deep LSTM regression network model and its results; Section 4 describes the MFP algorithm in the context of linear regression, and presents the MFP models and their results; Section 5 compares the two methodologies with each other and with recently published methodologies applied on the same dataset; Section 6 summarises the main results and provides concluding remarks.

LIB capacity decrease is attributed to multiple factors and processes and their interactions over various timescales, including: exposure to elevated voltages; Depth of Discharge (DoD); cycle bandwidth; cycling frequency; elevated temperatures; and high discharge rates [31, 32].

Temperature has a significant impact on performance, safety, and ageing of LIBs. The temperature range (− °C, +60 °C) is considered acceptable [33], whereas the desired temperature range is (+15 °C, +35 °C) [34]. The effects of temperature on LIBs can be classified into low-temperature and high-temperature effects: in both cases, extreme temperatures affect both calendar and cycle ageing. Low temperatures may cause slow chemical reactions and charge transfer, decreased ionic conductivity, and Lithium plating [35]. Discharging at low temperatures results in power limitation, while low-temperature charge produces a reduced power capability and cold cranking [36]. High temperatures may cause loss of Lithium and increase of internal resistance, which in turn produce loss of capacity and of power, respectively.
Furthermore, the operating temperature affects the State of Charge (SoC) of the battery, and extreme temperatures may accelerate Lithium plating on the anode. Temperature-based features will be designed to account for the influence of temperature on the battery ageing behaviour.

Fast charging is a desired aspect in batteries; however, high charge-discharge rates may cause mechanical-induced damage of active particles in LIBs and accelerate the capacity fade [37]. Lithium plating is also associated with fast charging, particularly in combination with low temperatures. The effects of different current rates on coulomb efficiency and capacity loss are studied in [38], and [39] explores the impact on capacity estimation, confirming that the C-rate (or, equivalently, the current intensity) is an important feature for capacity fade prediction. In this work, we consider the discharge current $I$.

The charge–discharge cycling frequency is related to mechanical stress on LIBs: it affects the degradation behaviour, especially when it is extreme [40], but high frequency may also come as a consequence of the battery ageing. The data at hand are characterised by a varying cycling frequency, which we consider in the analysis through the cycle duration, the proportion of cycles lasting less than the default duration, and the rest time. The rest time also affects the recovery effects and the charge balancing of the battery, and influences its lifetime [41]. Besides temperature, both cycle current and rest time affect the SoC estimation, a complex process which often leads to uncertain results: this work advantageously avoids relying on SoC estimation.

High voltages and overcharge contribute to Lithium plating and electrolyte decomposition, which accelerate the battery ageing [42].
The Depth of Discharge $\mathrm{DoD} = \mathrm{SoC}_{\max} - \mathrm{SoC}_{\min}$ is also usually considered as a stress factor; however, [43] showed that the adopted SoC range $(\mathrm{SoC}_{\max}, \mathrm{SoC}_{\min})$ plays a bigger role: in fact, though batteries cycled with range (100%, 25%) degraded faster than those with (100%, 40%), it was also observed that (100%, 40%) degraded much faster than (85%, 25%), despite the same DoD. Furthermore, (100%, 50%) showed a faster degradation than (85%, 25%), despite the lower DoD. In our work, the initial voltage and the voltage difference of the cycle account for the stress factors of high voltages, DoD, and SoC range.

All these stress factors are accounted for, directly or indirectly, in our model. However, there are redundancies in the input feature space, to account for the presence of nonlinear relationships between the features. These factors are analysed separately in previous works; however, they should not be analysed independently as their interactions play an important role in LIB ageing. This analysis models the battery capacity loss as a process of all factors simultaneously, and investigates which combinations of features contribute most to explaining the capacity fade, measuring their degree of importance.

The dataset we use is part of the “Randomized Battery Usage Data Set” by the NASA Ames Prognostics Center of Excellence (PCoE) [44], which comprises ageing data for 18650 LIBs cycled under randomly generated current profiles. The aim of the experiment was to mimic the dynamic operating conditions of batteries used in real life. In fact, though laboratory data have enabled important progress in the study of LIB deterioration, they are typically gathered under particular and unrealistic conditions, for example with small temperature variation and constant current.
This randomised dataset constitutes an effort towards a better approximation of real-life conditions; it is a well-known benchmark that is widely used for training, testing, and comparing various methods in the literature, hence we adopt it to validate our work. The data considered in this study pertain to four battery cells, RW9, RW10, RW11 and RW12, that were operated in a controlled environment under different modes; the two modes relevant for this analysis are:

• Reference discharge: a controlled full discharge cycle, occurring immediately after a controlled full charge cycle, allows the cell capacity to be computed periodically. Thus, the capacity predicted by the models can be compared to the true capacity. During a reference charge cycle, the cell is initially charged at a constant current of 2 A, until the battery reaches the maximum voltage of 4.2 V; then, the voltage is kept constant until the charging current drops to 0.01 A. During the subsequent reference discharge cycle, the cell is discharged at 1 A until the voltage reaches the threshold of 3.2 V.

• Random walk (RW) steps: the current load is selected at random from the set {−4.5 A, −3.75 A, −3 A, −2.25 A, −1.5 A, −0.75 A, 0.75 A, 1.5 A, 2.25 A, 3 A, 3.75 A, 4.5 A}, with negative values implying charging and positive values implying discharging. The randomly selected charge or discharge lasts for 5 minutes, unless the voltage reaches the range boundaries [3.2 V, 4.2 V]: in this case, the RW step is immediately stopped and a new current is selected from the set to proceed with another step. Between every two charge/discharge steps, a short resting time (< [...]

$$ C_d = \int_0^{t_{\mathrm{cutoff}}} I_d \, dt; \tag{1} $$

in particular, since the discharging current for the reference cycles throughout the experiment is always $I_d = 1$ A, the magnitude of the discharging capacity corresponds to the discharging time, expressed in hours.
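Equation (1) can be evaluated numerically from a sampled current trace. The following is a minimal Python sketch, assuming trapezoidal integration over hypothetical time stamps in seconds; the function name and the sample values are illustrative and are not taken from the dataset.

```python
def discharge_capacity_ah(times_s, currents_a):
    """Trapezoidal approximation of C_d = integral of I_d dt, returned in Ah."""
    if len(times_s) != len(currents_a):
        raise ValueError("times and currents must have the same length")
    coulombs = 0.0
    for k in range(1, len(times_s)):
        dt = times_s[k] - times_s[k - 1]
        coulombs += 0.5 * (currents_a[k] + currents_a[k - 1]) * dt
    return coulombs / 3600.0  # 1 Ah = 3600 A*s


# Hypothetical reference discharge: constant 1 A sampled every 10 s for 2 hours.
times = [10.0 * k for k in range(721)]  # 0 s ... 7200 s
currents = [1.0] * len(times)
capacity = discharge_capacity_ah(times, currents)
print(f"{capacity:.3f} Ah")
```

With a constant current of 1 A, the result in Ah equals the discharge duration in hours, which is the property noted in the text.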
It should be noted, however, that the batteries are not likely to be at equilibrium at the beginning of each reference cycle, since the resting periods were too short (or absent altogether) to allow the steady state to be reached. Consequently, the initial voltage of the cycles is uncertain, and generally different from the desired value of 4.2 V. This translates into an uncertainty on the benchmarked capacity [47], the determination of which would involve an accurate study of the battery transient dynamics, possibly complemented by gathering of data from cycles interspersed with longer resting periods of the cell. An adjustment for the transient effects is introduced in Section 3.2.2; the adjusted capacities for the four cells considered in the study are shown in Figure 1.

Figure 1: True capacity values for cells RW9 (blue), RW10 (green), RW11 (red), RW12 (brown) adjusted as explained in Section 3.2.2.

Figure 2 shows the RW9 data for voltage, current intensity and temperature for the first and the last 50 RW steps respectively. There is a consistent difference between the two voltage curves: in the first steps, the voltage rarely hits the range boundaries of 3.2 and 4.2 V, while in the last steps this happens more frequently. As a result, the last steps are also shorter, as many of them last for less than 5 minutes: therefore, the total time is only 1 hour compared to the 3.5 hours of the first RW steps. There is also a significant difference in the temperature, which is higher in the second sub-figure. All these effects are a clear indication of the cell health degradation.

Figure 2: Measurements of voltage (red), current intensity (blue) and temperature (green) during the first 50 RW steps (left sub-figure) and the last 50 RW steps (right sub-figure) of cell RW9.

Long short-term memory (LSTM) is a type of Recurrent Neural Network (RNN), i.e. a multi-layer NN.
The LSTM architecture was originally introduced by Hochreiter and Schmidhuber [48] with the purpose of overcoming the vanishing or exploding gradients problem [49], by allowing constant error flow through self-connected units embedded in the LSTM cell. This key feature of LSTM makes it capable of learning long-term dependencies, as opposed to vanilla RNN.

The computational unit of a NN is the neuron, often called node or cell. The LSTM-NN has a particular neuron, called LSTM cell or memory cell, which will be explained in the following based on the description in [50] and [51]. The state of the network at the $k$-th LSTM cell, $(c_k, H_k)$, is composed of: the cell state $c_k$, which encloses the learnt information up to step $k$; and the hidden state $H_k$, which is the output of the cell. The network state $(c_k, H_k)$ is fed back as input by the cell at the next step, $k+1$, which can optionally modify the state by adding or removing information. The learnable weights of an LSTM layer are: the input weights $W^I$, the recurrent weights $W^R$, and the bias $B$. Inside the LSTM cell, the network state is modified through three main steps that are controlled by gates:

• Forget: a “forget gate” decides how much information to keep from the previous cell state, through a sigmoid activation function $\sigma(x) = [1 + \exp(-x)]^{-1}$,

$$ f_{k+1} = \sigma\left(W^I_f x_{k+1} + W^R_f H_k + B_f\right). \tag{2} $$

• Update: an “input gate” decides which values shall be updated, again through a sigmoid function,

$$ i_{k+1} = \sigma\left(W^I_i x_{k+1} + W^R_i H_k + B_i\right); \tag{3} $$

in addition, another sigmoid or hyperbolic tangent layer produces new candidate values $\tilde{c}_{k+1}$ that could be used to update the cell state,

$$ \tilde{c}_{k+1} = \sigma\left(W^I_{\tilde{c}} x_{k+1} + W^R_{\tilde{c}} H_k + B_{\tilde{c}}\right) \quad \text{or} \quad \tilde{c}_{k+1} = \tanh\left(W^I_{\tilde{c}} x_{k+1} + W^R_{\tilde{c}} H_k + B_{\tilde{c}}\right). \tag{4} $$

The combination of these two procedures, added to the product of the previous cell state by the forget gate, creates an update to the cell state,

$$ c_{k+1} = f_{k+1} \odot c_k + i_{k+1} \odot \tilde{c}_{k+1}, \tag{5} $$

where $\odot$ indicates the element-wise multiplication of vectors.

• Output: Lastly, the sigmoidal “output gate”

$$ o_{k+1} = \sigma\left(W^I_o x_{k+1} + W^R_o H_k + B_o\right) \tag{6} $$

decides how much of the information carried by the newly updated cell state ought to be added to the hidden state,

$$ H_{k+1} = o_{k+1} \odot \sigma(c_{k+1}) \quad \text{or} \quad H_{k+1} = o_{k+1} \odot \tanh(c_{k+1}), \tag{7} $$

where a sigmoid or tanh function has been applied to the updated cell state $c_{k+1}$.

At the end of the process, the weights $W^I$, $W^R$ and the biases $B$ of a LSTM cell are concatenations of each gate’s weights and biases:

$$ W^I = \begin{bmatrix} W^I_i \\ W^I_f \\ W^I_{\tilde{c}} \\ W^I_o \end{bmatrix}, \qquad W^R = \begin{bmatrix} W^R_i \\ W^R_f \\ W^R_{\tilde{c}} \\ W^R_o \end{bmatrix}, \qquad B = \begin{bmatrix} B_i \\ B_f \\ B_{\tilde{c}} \\ B_o \end{bmatrix}. \tag{8} $$

Using exclusively sensor data, a minimum and sufficient set of input features for the Deep LSTM regression network (D–LSTM–RN) is designed to account for all the degradation acceleration factors, without redundancy and without the need for State of Charge (SoC) estimation.
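The gate updates in Equations (2) to (7) can be traced step by step in code. The sketch below is a plain-Python illustration of a single LSTM cell update using the tanh variants of Equations (4) and (7); the dictionary layout of the parameters, the helper names and the toy values are assumptions made for this example, not the authors' implementation.

```python
import math


def sigmoid(x):
    """Logistic function sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def affine(w_in, x, w_rec, h, b):
    """Compute W^I x + W^R H + B row by row, using plain lists."""
    return [
        sum(wi * xi for wi, xi in zip(row_in, x))
        + sum(wr * hi for wr, hi in zip(row_rec, h))
        + bias
        for row_in, row_rec, bias in zip(w_in, w_rec, b)
    ]


def lstm_step(x, h, c, params):
    """Map (c_k, H_k) and the input x_{k+1} to (c_{k+1}, H_{k+1})."""
    pre = {gate: affine(w_in, x, w_rec, h, b)
           for gate, (w_in, w_rec, b) in params.items()}
    f = [sigmoid(v) for v in pre["f"]]          # forget gate, Eq. (2)
    i = [sigmoid(v) for v in pre["i"]]          # input gate, Eq. (3)
    c_tilde = [math.tanh(v) for v in pre["c"]]  # candidates, Eq. (4), tanh variant
    o = [sigmoid(v) for v in pre["o"]]          # output gate, Eq. (6)
    c_new = [fk * ck + ik * ct
             for fk, ck, ik, ct in zip(f, c, i, c_tilde)]     # Eq. (5)
    h_new = [ok * math.tanh(cn) for ok, cn in zip(o, c_new)]  # Eq. (7), tanh variant
    return c_new, h_new


# Hypothetical toy cell: one hidden unit, two inputs, all parameters zero.
zero_gate = ([[0.0, 0.0]], [[0.0]], [0.0])
params = {gate: zero_gate for gate in ("f", "i", "c", "o")}
c_next, h_next = lstm_step(x=[0.3, -1.2], h=[0.0], c=[1.0], params=params)
print(c_next, h_next)
```

With all parameters set to zero every gate equals 0.5 and the candidate value is 0, so the cell state is simply halved; this makes the role of the forget gate in Equation (5) easy to check by hand.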
Stress factors during the $j$-th RW step are therefore represented by six features:

· $V_j$, initial voltage at the beginning of the cycle;
· $\Delta V_j$, cycle signed difference between initial and final voltage values;
· $\Delta t_j$, duration of the cycle;
· $T_{\min,j}$, minimum temperature during the cycle;
· $T_{\max,j}$, maximum temperature during the cycle;
· $\bar{I}_j$, mean current during the cycle.

Temperature and current features are directly associated to the temperature- and high-discharge-rate-related battery degradation factors mentioned in Section 2.1; $\Delta t_j$ accounts for the cycling frequency in these experiments; while voltage features are sufficient to express the exposure to elevated voltages, DoD and cycle bandwidth. The need for manual feature engineering is thus removed, as this process is achieved in the deep hidden layers of the D–LSTM–RN structure as part of the learning problem.

The capacity-fade-prediction problem is here considered as a multivariate-sequence-to-one-scalar regression problem using D–LSTM–RN. The target variable may be the capacity drop between any pair of reference discharge cycles $p$ and $n$, occurring at time points $t_p < t_n$.
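The six per-step features introduced above can be computed directly from the sensor traces of one RW step. The following Python sketch is illustrative only: the function name, the dictionary keys and the sample values are hypothetical, the sign of the voltage difference is assumed to be initial minus final, and the mean current is a plain sample average, which presumes uniform sampling.

```python
def rw_step_features(voltage, temperature, current, time_s):
    """Return the six features of one RW step from its sampled sensor traces."""
    n = len(voltage)
    if n == 0 or not (len(temperature) == len(current) == len(time_s) == n):
        raise ValueError("all traces must be non-empty and of equal length")
    return {
        "V": voltage[0],                   # initial voltage
        "dV": voltage[0] - voltage[-1],    # signed difference (assumed initial - final)
        "dt": time_s[-1] - time_s[0],      # step duration in seconds
        "T_min": min(temperature),         # minimum temperature
        "T_max": max(temperature),         # maximum temperature
        "I_mean": sum(current) / n,        # mean current (uniform sampling assumed)
    }


# Hypothetical 5-minute discharge step sampled three times.
features = rw_step_features(
    voltage=[4.0, 3.9, 3.8],
    temperature=[25.0, 26.0, 25.5],
    current=[1.5, 1.5, 1.5],
    time_s=[0.0, 150.0, 300.0],
)
print(features)
```

Stacking these dictionaries for all RW steps between two reference cycles gives the multivariate input sequence of the regression problem.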
The capacity fade in the period $[t_p, t_n]$ is modelled as the cumulative effect of all RW charge and discharge steps that the cell experienced between $t_p + 1$ and $t_n - 1$ (excluding reference charge/discharge cycles); the sequence length varies and it is in the order of $S \approx 1500$. Two scenarios are considered:

· Short-term prediction: we assume that the capacity $C(t_p)$ at cycle $p$ is known, and the target variable is

$$ \Delta C(t_p, t_n) = C(t_p) - C(t_n), \tag{9} $$

where $p$ and $n$ are two consecutive reference cycles, and $C(t_p)$ is the true capacity at $t_p$;

· Long-term prediction: we assume that the only known capacity is the cell nominal capacity, and the target variable is

$$ \Delta C(t_p, t_n) = \hat{C}(t_p) - C(t_n), \tag{10} $$

where $p$ and $n$ are two consecutive reference cycles, and $\hat{C}(t_p)$ is the capacity estimated by the D–LSTM–RN at cycle $p$.

The true capacity used for short-term prediction is not directly computed from Equation 1. In fact, in order to account for the batteries not having reached equilibrium (Section 2.2), a correction has been introduced: the voltage curves of each reference discharge have been interpolated with a monotone Hermite spline, which allowed us to extrapolate how much longer the discharge would have taken if the cycles had started from the threshold voltage value of 4.2 V, instead of the observed ones. This is only a first step towards a more accurate capacity estimation, but it allows a small correction to be introduced (Figure S2 in the Supplementary Material). Note that, in the long-term case, data pertaining to the reference cycles are not used at all, making this approach more useful in practice; however, due to the use of the estimated capacity at cycle $p$, the long-term model suffers from cumulative prediction error.

Given the aforementioned sufficient set of features, the battery degradation model can be approximated effectively by D–LSTM–RN in spite of the non-linearities, dynamics, and time-variant characteristics of the true ageing process.
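The difference between the two scenarios is easiest to see when predicted capacity drops are turned back into capacities. The Python sketch below is a simplified illustration under stated assumptions: the predicted drops are hypothetical numbers rather than network outputs, the short-term case restarts from the true capacity at every reference cycle, and the long-term case chains its own estimates starting from the nominal capacity.

```python
def short_term_capacities(true_caps, predicted_drops):
    """C_hat(t_n) = C(t_p) - dC_hat(t_p, t_n), using the true capacity at t_p."""
    return [c_prev - d for c_prev, d in zip(true_caps[:-1], predicted_drops)]


def long_term_capacities(nominal_cap, predicted_drops):
    """C_hat(t_n) = C_hat(t_p) - dC_hat(t_p, t_n), chained from the nominal capacity."""
    estimates = []
    current = nominal_cap
    for d in predicted_drops:
        current -= d
        estimates.append(current)
    return estimates


# Hypothetical example: the true drop is 0.10 Ah per interval, while the
# (made-up) model predicts 0.12 Ah each time.
true_caps = [2.0, 1.9, 1.8, 1.7]
drops_hat = [0.12, 0.12, 0.12]

short = short_term_capacities(true_caps, drops_hat)
long_ = long_term_capacities(2.0, drops_hat)
short_err = [abs(c - s) for c, s in zip(true_caps[1:], short)]
long_err = [abs(c - l) for c, l in zip(true_caps[1:], long_)]
print(short_err, long_err)
```

In this toy case the short-term error stays at about 0.02 Ah, whereas the long-term error grows to about 0.02, 0.04 and 0.06 Ah, which is the cumulative effect described in the text.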
The structure of the deep learning model is illustrated in Figure 3 with one training example. The model consists of: an input layer through which the six features are normalised and the multivariate sequence of $S \approx 1500$ RW steps is fed to the network; a first LSTM layer $L_1$ of $N_1 = 200$ LSTM cells that outputs the whole sequence of hidden states $H_1(t)$ ($t = 1, \dots, S$); a second LSTM layer $L_2$ of $N_2 = 200$ LSTM cells that outputs the last hidden state $H_2(S)$; a fully connected layer $L_3$; and a regression layer that predicts the capacity fade.

Figure 3: Structure of the deep LSTM regression network with one RW example sequence of cycles. (Diagram: the six inputs $V(t)$, $\Delta V(t)$, $\Delta t(t)$, $T_{\min}(t)$, $T_{\max}(t)$, $\bar{I}(t)$ for $t = 1, \dots, S$, with $S \approx 1500$ cycles between $T_p + 1$ and $T_n - 1$, pass through the sequence input layer, the LSTM layers $L_1$ and $L_2$, the fully connected layer $L_3$, and the regression output layer, which gives the capacity fade $\Delta C(T_p, T_n)$.)

Training the network consists of optimizing the network parameters to minimise the loss function, here the mean squared error between the true and predicted target values. The network parameters are the input weights $W^I_1 \in \mathbb{R}^{4N_1 \times 6}$, $W^I_2 \in \mathbb{R}^{4N_2 \times N_1}$, $W^I_3 \in \mathbb{R}^{1 \times N_2}$ and biases $B_1 \in \mathbb{R}^{4N_1 \times 1}$, $B_2 \in \mathbb{R}^{4N_2 \times 1}$, $B_3 \in \mathbb{R}$ for $L_1$, $L_2$, and $L_3$, respectively; and the recurrent weights $W^R_1 \in \mathbb{R}^{4N_1 \times N_1}$ and $W^R_2 \in \mathbb{R}^{4N_2 \times N_2}$ of LSTM layers $L_1$ and $L_2$, respectively. The model is trained using back propagation through time and the stochastic gradient descent method with batch size $b = 10$, gradient threshold $\tau = 10$, and constant learning rate $\gamma = 0.03$. Cell RW9 is used for training the D–LSTM–RN model; RW10, RW11, RW12 are used for testing. The computational time in the training phase is in the range of 1 hour for 600 iterations, and it is negligible in the testing phase.
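To give a sense of the model size implied by this architecture, the number of learnable parameters can be counted layer by layer. The Python sketch below assumes that each LSTM layer stacks the weights and biases of its four gates, as in the concatenation of Equation (8), that the first layer receives the six input features, and that the fully connected layer maps $N_2$ hidden units to one output; the function and variable names are illustrative.

```python
def lstm_layer_params(n_hidden, n_inputs):
    """Learnable parameters of one LSTM layer with four stacked gates."""
    input_weights = 4 * n_hidden * n_inputs      # W^I
    recurrent_weights = 4 * n_hidden * n_hidden  # W^R
    biases = 4 * n_hidden                        # B
    return input_weights + recurrent_weights + biases


N_FEATURES = 6   # six features per RW step
N1 = N2 = 200    # LSTM cells in layers L1 and L2

layer1 = lstm_layer_params(N1, N_FEATURES)
layer2 = lstm_layer_params(N2, N1)
dense = N2 + 1   # fully connected layer: N2 weights plus one bias
total = layer1 + layer2 + dense
print(layer1, layer2, dense, total)
```

Under these assumptions the network has 165600 + 320800 + 201 = 486601 parameters, a figure that is useful to keep in mind when the model is compared with a regression that has only a handful of coefficients.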

To evaluate accuracy in prediction, we consider the Root Mean Squared Error:

$$\mathrm{RMSE}(\hat{C}, C) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( C_i - \hat{C}_i \right)^2}. \tag{11}$$

In particular, note that we compare RMSEs on the capacity scale, despite the target variable being the capacity drop between two consecutive reference cycles. This is done to ease the comparison between the different methods we used, and with other works in the literature using the same data. A normalised version where the deviation is divided by the true capacity is also provided:

$$\mathrm{RMSE}_{\mathrm{norm}}(\hat{C}, C) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \frac{C_i - \hat{C}_i}{C_i} \right)^2}. \tag{12}$$

The data-gathering for the cells under study extended well beyond their End of Life (EoL): this, in fact, is defined to occur when the present capacity of the battery reaches 70% or 80% of its nominal capacity [52, 53]. SoH prediction after EoL is of less practical importance and is ignored in several works in the literature: for this reason, along with the two metrics $\mathrm{RMSE}$ and $\mathrm{RMSE}_{\mathrm{norm}}$ computed on the whole test set, we report their $\mathrm{RMSE}^{\mathrm{EoL}}$ and $\mathrm{RMSE}^{\mathrm{EoL}}_{\mathrm{norm}}$ counterparts computed on data up to the cell EoL, defined to occur when the capacity is 80% of the nominal capacity.

Figure 4 depicts capacity fade prediction results using the D-LSTM-RN models described in Section 3.2.2. Both short-term and long-term predictions for the four batteries RW9 (training), RW10, RW11, and RW12 are shown. Predictions are compared to the true capacity values, and the normalised error metrics are displayed for each cell. It emerges that the short-term predictions are in all cases very close to the true values, with $\mathrm{RMSE}_{\mathrm{norm}}$ ranging from 1.8% to 7.25% and $\mathrm{RMSE}^{\mathrm{EoL}}_{\mathrm{norm}}$ from 1.15% to 1.80%. The predictions for the long-term case suffer, as expected, from cumulative error, which is reflected in higher $\mathrm{RMSE}_{\mathrm{norm}}$ values (from 6.8% to 13.45%). However, when considering the capacity fade only up to the cell EoL we get much smaller errors: the $\mathrm{RMSE}^{\mathrm{EoL}}_{\mathrm{norm}}$ is less than 1% for cell RW10 and about 2% for cell RW11, while it increases to almost 9% for RW12.

[Figure 4 panels: columns "Short-term" and "Long-term", one row per cell (RW9, RW10, RW11, RW12); axes: Time vs. Capacity (Ah); legend: Predicted, True; each panel annotated with its normalised error metrics.]

Figure 4: Short-term (left column) and long-term (right column) capacity fade prediction results using D-LSTM-RN, together with the considered error metrics. Predictions are shown as red crosses; true capacity values are shown as green stars. The first row (cell RW9) reports the training error of the models.
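The error metrics of Eqs. (11) and (12), together with their EoL-restricted counterparts, can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the choice to keep the first observation at or below the 80% threshold in the EoL subset is an assumption, since the text only says "up to the cell EoL".

```python
import math

def rmse(true, pred):
    """Root mean squared error on the capacity scale, as in Eq. (11)."""
    n = len(true)
    return math.sqrt(sum((c - c_hat) ** 2 for c, c_hat in zip(true, pred)) / n)

def rmse_norm(true, pred):
    """Normalised RMSE, as in Eq. (12): each deviation is divided by the true capacity."""
    n = len(true)
    return math.sqrt(sum(((c - c_hat) / c) ** 2 for c, c_hat in zip(true, pred)) / n)

def up_to_eol(true, pred, nominal_capacity, threshold=0.8):
    """Truncate both series at the End of Life.

    Assumption: EoL is the first observation whose true capacity is at or
    below threshold * nominal_capacity, and that observation is kept.
    """
    cutoff = threshold * nominal_capacity
    kept_true, kept_pred = [], []
    for c, c_hat in zip(true, pred):
        kept_true.append(c)
        kept_pred.append(c_hat)
        if c <= cutoff:
            break
    return kept_true, kept_pred

# Illustrative (made-up) capacities in Ah for a cell with 2.0 Ah nominal capacity.
true_capacity = [2.0, 1.8, 1.5, 1.2]
predicted = [2.0, 1.9, 1.4, 0.9]
eol_true, eol_pred = up_to_eol(true_capacity, predicted, nominal_capacity=2.0)
print(rmse_norm(true_capacity, predicted), rmse_norm(eol_true, eol_pred))
```

Restricting the series before computing the metric mirrors the way the EoL variants are described in the text: the same formula is applied to a shorter set of reference cycles.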

Multivariable Fractional Polynomials

The Multivariable Fractional Polynomial (MFP) approach of Sauerbrei and Royston [54] consists in using an algorithm to find the best input transformations in a multivariable linear regression. A multivariable linear regression assumes a linear relationship between the response, or target variable, $y$ and a set of inputs (also called features or covariates) $x_1, \dots, x_p$,

$$y = E[y \mid x_1, \dots, x_p] + \varepsilon = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + \varepsilon \tag{13}$$

where $\beta_0, \dots, \beta_p$ are the regression coefficients, $\varepsilon$ is an error term, and $E[y \mid x_1, \dots, x_p] = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p$ is the expected value of $y$ conditioned on $x_1, \dots, x_p$. Note that linearity is assumed with respect to the regression coefficients $\beta_0, \dots, \beta_p$, not necessarily to $x_1, \dots, x_p$: on the contrary, it is important to consider possible nonlinear contributions from the inputs, which if not accounted for may lead to misspecified final models [54]. The MFP method, implemented in R with the mfp package [55], is chosen to this end, as the inputs are transformed by using the most suitable FP functions.

Given a unidimensional input $x$, a FP function of first degree is defined as $x^l$, where the power $l$ can be either integer or fraction, positive or negative, from the predefined set $A = \{-2, -1, -0.5, 0, 0.5, 1, 2, 3\}$, where $x^0 \equiv \log x$. The best $l$ in the set $A$ is considered to be that yielding the lowest deviance $d = -2\,\ell(\hat{\beta}_{ML})$, where $\ell(\hat{\beta}_{ML})$ is the maximised log-likelihood function. In a multivariate setting with $p > 1$ [...] mfp package allows for FPs of degree $m > 1$: the second-degree polynomial is preferred over the first-degree on the basis of a $\chi^2$ test, and so on. Further details about the MFP method with FPs of a generic degree $m$ can be found in [54, 56, 57]. For the analysis presented in this article, FPs with maximum permitted degree $m = 2$ were considered, but only first-degree models were eventually selected by the algorithm.

Considering a training set of $n$ labelled pairs $(y, \mathbf{x}^{\mathbf{l}})$, where $\mathbf{x}^{\mathbf{l}}$ is the $q$-dimensional vector of inputs $x_1, \dots, x_p$ transformed according to the MFP algorithm and $\mathbf{l} = (l_1, \dots, l_q)$ the set of selected powers, the linear model can be written as

$$\mathbf{y} = E[\mathbf{y} \mid \tilde{X}] + \boldsymbol{\varepsilon} = \tilde{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon}, \tag{14}$$

where: $\mathbf{y}$ is a vector of $n$ observations of the target variable; $\tilde{X}$ is a $n \times (q + 1)$ matrix assumed to have full rank $q + 1$, containing a column of 1's in the first position to account for the intercept term $\beta_0$, and the transformed inputs in the remaining columns; and $\boldsymbol{\varepsilon}$ is a vector of $n$ independent error terms that are here assumed to be $N(0, \sigma^2)$-distributed. In general, $q \neq p$ due to the possibility of introducing an $m$-degree polynomial for each input $x_1, \dots, x_p$; however, $q = p$ in our case where only first-degree polynomials were selected. Given the training set, one can learn the linear relationship through the least squares criterion, i.e. by solving

$$\hat{\boldsymbol{\beta}} = \operatorname*{argmin}_{\boldsymbol{\beta} \in \mathbb{R}^{q+1}} \left\| \mathbf{y} - \tilde{X} \boldsymbol{\beta} \right\|_2^2. \tag{15}$$

Despite constituting such a simple setup, linear regression proved to work extremely well in many non-trivial contexts and different applications, e.g. [58, 59, 60]. An attractive characteristic of multiple linear regression is the interpretation of the regression coefficients $\beta_j$ for $j = 1, \dots, q$, which may be read as the change in $E[y \mid x_1^{l_1}, \dots, x_q^{l_q}]$ for an increase of one unit in $x_j^{l_j}$, holding all other features constant.
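The first-degree FP search over the set $A$ can be illustrated for a single positive input. The sketch below is not the mfp package's algorithm (which also handles several inputs and higher-degree FPs through a testing procedure); it only shows the "lowest deviance" rule under the Gaussian error assumption stated above, with hypothetical function names and made-up data.

```python
import math

POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)  # the set A; power 0 stands for log(x)

def fp_transform(x, power):
    """First-degree FP transform of a positive input x."""
    return math.log(x) if power == 0 else x ** power

def fit_line(z, y):
    """Least squares fit of y = b0 + b1 * z; returns (b0, b1, rss)."""
    n = len(z)
    z_bar = sum(z) / n
    y_bar = sum(y) / n
    szz = sum((zi - z_bar) ** 2 for zi in z)
    szy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    b1 = szy / szz
    b0 = y_bar - b1 * z_bar
    rss = sum((yi - b0 - b1 * zi) ** 2 for zi, yi in zip(z, y))
    return b0, b1, rss

def gaussian_deviance(rss, n):
    """-2 * maximised Gaussian log-likelihood, with sigma^2 estimated by rss / n."""
    rss = max(rss, 1e-300)  # guard against log(0) for a numerically perfect fit
    return n * (math.log(2 * math.pi * rss / n) + 1)

def best_fp1_power(x, y, powers=POWERS):
    """Return (power, deviance) for the first-degree FP with the lowest deviance."""
    n = len(x)
    candidates = []
    for power in powers:
        z = [fp_transform(xi, power) for xi in x]
        _, _, rss = fit_line(z, y)
        candidates.append((gaussian_deviance(rss, n), power))
    deviance, power = min(candidates)
    return power, deviance

# Made-up data generated from a square-root relationship plus a small perturbation.
x = [float(i) for i in range(1, 21)]
y = [1 + 2 * math.sqrt(xi) + 0.01 * (-1) ** i for i, xi in enumerate(x)]
print(best_fp1_power(x, y))
```

Because the Gaussian deviance is a monotone function of the residual sum of squares for fixed $n$, ranking the candidate powers by deviance is equivalent here to ranking them by RSS.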
Importantly, this interpretation enables immediate identification of the most relevant features, and allows one to study the effect of one single feature while adjusting for the effects of the others. In addition, the coefficient of determination $R^2$ can be interpreted, in the context of linear regression, as the proportion of total variability in the outcomes that is explained by the model:

$$R^2 = 1 - \frac{RSS}{TSS} \quad \text{with} \quad
\begin{aligned}
RSS &= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 && \text{residual sum of squares} \\
\hat{y}_i &= \hat{\beta}_0 + \sum_{j=1}^{q} \hat{\beta}_j x_{ij}^{l_j} \\
TSS &= \sum_{i=1}^{n} (y_i - \bar{y})^2 && \text{total sum of squares} \\
\bar{y} &= \frac{1}{n} \sum_{i=1}^{n} y_i
\end{aligned} \tag{16}$$

with $0 \leq R^2 \leq 1$. For this reason, $R^2$ provides an interesting diagnostic measure. However, since $R^2$ never decreases when new features are added to the regression equation, one should preferably consider its penalised version

$$R^2_{\mathrm{adj}} = 1 - \frac{RSS / (n - p - 1)}{TSS / (n - 1)}, \tag{17}$$

called adjusted coefficient of determination, which increases only when newly added features sufficiently increase the variability explained by the model.

To avoid including superfluous features that add no valuable information, the regression models considered in this work underwent variable selection: a stepback (backward) procedure based on the AIC. As a consequence, only the subset of the most significant variables to predict the outcome is kept in the model. Variable selection is important for both theoretical and practical reasons, such as: achieving a reduction of the model variance, thus improving the prediction accuracy; facilitating the model interpretation and providing a cleaner view of the data-generating process; and reducing the computational and usage time of the model, which makes it more portable. Selecting variables is an important and delicate point in particular for explanatory models such as linear regression, since including or excluding highly correlated features may lead to significantly different interpretations of their effects [61].

4.2 Implementation

The structure of the dataset is such that capacity measurements are only available at the $n$ reference cycles performed throughout the cell life: as a consequence, inputs from the RW steps occurring before each reference cycle need to be summarised, to construct a $n \times (p + 1)$ input matrix. Regarding the number and choice of features, quantities reflecting the well-known stress factors described in Section 2.1 were chosen, along with covariates describing characteristics of this particular experiment.
The following inputs have been considered for each reference cycle $i$ occurring after $m$ RW discharge steps and $l$ RW charge steps:

· $C_{\mathrm{prev}} = C_{i-1}$, true capacity at the previous reference discharge cycle;
· $\Delta t = \sum_{k=1}^{m+l} (t^{\mathrm{end}}_{ik} - t^{\mathrm{start}}_{ik})$, total duration of the cycle;
· $\bar{V}_{\mathrm{in}} = \frac{1}{m} \sum_{k=1}^{m} v^{\mathrm{in}}_{ik}$, average of the initial voltages of each $k$-th RW discharge step;
· $\overline{\Delta V} = \frac{1}{m} \sum_{k=1}^{m} (v^{\mathrm{end}}_{ik} - v^{\mathrm{start}}_{ik})$, average of the voltage drops of the RW discharge steps;
· $\bar{I} = \frac{1}{m} \sum_{k=1}^{m} I_{ik}$, average of the current intensities of the RW discharge steps;
· $\bar{T}_{\min} = \frac{1}{m} \sum_{k=1}^{m} T^{\min}_{ik}$, average of the minimum temperatures of each RW discharge step;
· $\bar{T}_{\max} = \frac{1}{m} \sum_{k=1}^{m} T^{\max}_{ik}$, average of the maximum temperatures of each RW discharge step;
· $\lambda = \frac{\text{short steps}}{m}$, proportion of steps in which the voltage reaches the boundaries before the default duration (300 s);
· $\Delta t_{\mathrm{rest}}$, a nonlinear smooth saturation function of the time elapsed between the last RW step of the cycle and the beginning of the reference cycle:

$$\Delta t_{\mathrm{rest}} = \frac{1}{1 + \exp\left[ -\left( \frac{\tilde{t}}{\sigma_{t<20}} \right) \right]} \quad \text{with} \quad \tilde{t} = t^{\mathrm{start,ref}}_{i} - t^{\mathrm{end},(m+l)}_{i},$$

where $\sigma_{t<20}$ is the variance of the observations such that $\tilde{t} < 20$ h;
· $C_{\mathrm{approx}} = \frac{1}{m} \sum_{k=1}^{m} \hat{C}_{ik}$, average of rough capacity estimates at each RW discharge step, obtained through a preliminary linear regression model described in the Supplementary Material. The purpose of $C_{\mathrm{approx}}$ is to compensate for the potential information loss that comes as a consequence of summarising a large number of RW steps in each cycle.

Three regression models were considered:
a) a model including all features presented in Section 4.2.1, except $C_{\mathrm{prev}}$ and $C_{\mathrm{approx}}$;
b) a model including all features presented in Section 4.2.1, except $C_{\mathrm{prev}}$;
c) a model including all features presented in Section 4.2.1, with no exceptions.

Model c assumes that $C_{\mathrm{prev}}$, the true capacity measured at the most recent reference cycle, is known and can be used to add important information to the regression model. Model b reflects the more realistic scenario in which $C_{\mathrm{prev}}$ is undetermined, but it includes $C_{\mathrm{approx}}$ as an approximate surrogate with the purpose of maximising the information extracted from the data sensed during the RW discharges; model a relies solely on the data directly measured by sensors, without additional true or estimated capacity values. In each case, the target variable for the analysis is the change in the battery capacity at time $t$ compared to its nominal capacity, where capacity values have been adjusted as described in Section 3.2.2:

$$y = \Delta C(t) = C(t) - C(t_0). \tag{18}$$

After having undergone the MFP and variable selection procedures, the three final models are:

MFPa: $\Delta C_{a,i} = \alpha_0 + \alpha_1 \cdot \bar{T}_{\min,i} + \alpha_2 \cdot \bar{T}_{\max,i} + \alpha_3 \cdot \lambda_i^{3} + \varepsilon_i$;
MFPb: $\Delta C_{b,i} = \beta_0 + \beta_1 \cdot C_{\mathrm{approx},i}^{0.5} + \beta_2 \cdot \Delta t_{\mathrm{rest},i} + \beta_3 \cdot \bar{T}_{\min,i} + \beta_4 \cdot \lambda_i + \beta_5 \cdot \bar{V}_{\mathrm{in},i} + \varepsilon_i$;
MFPc: $\Delta C_{c,i} = \gamma_0 + \gamma_1 \cdot C_{\mathrm{prev},i}^{0.5} + \gamma_2 \cdot \Delta t_{\mathrm{rest},i} + \gamma_3 \cdot C_{\mathrm{approx},i} + \gamma_4 \cdot \bar{T}_{\min,i} + \varepsilon_i$.

Cell RW9 is used for training each model, while RW10, RW11 and RW12 are used for testing.
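As an illustration of how the RW discharge steps preceding one reference cycle are condensed into a single row of the input matrix, the sketch below computes a few of the per-cycle summaries from Section 4.2.1. The record layout and all names are hypothetical, the sample values are made up, and a step is counted as "short" when it lasts less than the 300 s default duration, which is an assumed proxy for the voltage reaching its boundaries early. The features $C_{\mathrm{prev}}$, $\Delta t$, $\Delta t_{\mathrm{rest}}$ and $C_{\mathrm{approx}}$ are left out.

```python
from dataclasses import dataclass

@dataclass
class DischargeStep:
    """One RW discharge step (hypothetical record layout)."""
    t_start: float   # step start time, s
    t_end: float     # step end time, s
    v_start: float   # voltage at the start of the step, V
    v_end: float     # voltage at the end of the step, V
    current: float   # current intensity, A
    temp_min: float  # minimum temperature during the step, deg C
    temp_max: float  # maximum temperature during the step, deg C

def summarise_cycle(steps, default_duration=300.0):
    """Average the m discharge steps that precede one reference cycle."""
    m = len(steps)
    # Assumed proxy: a step ending before the default duration hit a voltage boundary.
    short_steps = sum(1 for s in steps if (s.t_end - s.t_start) < default_duration)
    return {
        "V_in": sum(s.v_start for s in steps) / m,
        "dV": sum(s.v_end - s.v_start for s in steps) / m,
        "I": sum(s.current for s in steps) / m,
        "T_min": sum(s.temp_min for s in steps) / m,
        "T_max": sum(s.temp_max for s in steps) / m,
        "lambda": short_steps / m,
    }

steps = [
    DischargeStep(0.0, 300.0, 4.0, 3.5, 1.0, 20.0, 25.0),
    DischargeStep(300.0, 450.0, 3.8, 3.2, 2.0, 22.0, 27.0),
]
print(summarise_cycle(steps))
```

Stacking one such dictionary per reference cycle, together with the remaining features, gives the rows of the $n \times (p + 1)$ input matrix once the intercept column is added.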
The trained models are reported in Table 1: the estimated regression coefficients are shown together with their standard errors, the corresponding p-values, and the $R^2$ and $R^2_{\mathrm{adj}}$ coefficients.

| Model | Covariate | Est | Std err | p-value | $R^2$ | $R^2_{\mathrm{adj}}$ |
|---|---|---|---|---|---|---|
| MFPa | intercept | 0.43 | 0.17 | 0.0142 | 0.985 | 0.984 |
| | $\bar{T}_{\min}$ | […] | […] | […] | | |
| | $\bar{T}_{\max}$ | -0.53 | 0.08 | 9.13e-08 | | |
| | $\lambda^{3}$ | […] | […] | […] | | |
| MFPb | intercept | […] | […] | […] | […] | […] |
| | $C_{\mathrm{approx}}^{0.5}$ | -3.99 | 0.40 | 1.83e-11 | | |
| | $\Delta t_{\mathrm{rest}}$ | -0.26 | 0.07 | 0.60e-03 | | |
| | $\bar{T}_{\min}$ | -0.06 | 0.01 | 8.39e-09 | | |
| | $\lambda$ | […] | […] | […] | | |
| | $\bar{V}_{\mathrm{in}}$ | -0.10 | 0.04 | 0.0230 | | |
| MFPc | intercept | 4.63 | 0.21 | < […] | […] | […] |
| | $C_{\mathrm{prev}}^{0.5}$ | -1.36 | 0.13 | 4.87e-12 | | |
| | $\Delta t_{\mathrm{rest}}$ | -0.34 | 0.04 | 5.23e-10 | | |
| | $C_{\mathrm{approx}}$ | -0.87 | 0.13 | 7.83e-08 | | |
| | $\bar{T}_{\min}$ | -0.03 | 0.005 | 1.05e-05 | | |

Table 1: Results for the three models MFPa, MFPb and MFPc, reporting the estimated regression coefficients (Est), standard errors (Std err) and corresponding p-values, together with the $R^2$ and the adjusted $R^2$ coefficients.

MFPa includes only $\bar{T}_{\min}$ and $\bar{T}_{\max}$, together with the proportion $\lambda$ of steps interrupted due to the voltage reaching the boundaries before the default duration. The temperatures are the most significant variables. Increasing $\bar{T}_{\max}$ by one degree while holding all other features constant implies an increase in the capacity drop of 0.54; this effect is counterbalanced by $\bar{T}_{\min}$ having an opposite coefficient. When they have close values, the effect of the two variables is close to 0, meaning that a narrow temperature range, hence a controlled temperature variation, does not affect the battery health seriously. Concerning $\lambda$, it also appears strongly significant and it is included as a cubic effect with a positive regression coefficient.

With MFPb we included the average of the approximate capacity for each RW step, $C_{\mathrm{approx}}$. This led to increased values of both $R^2$ and $R^2_{\mathrm{adj}}$: the square root of $C_{\mathrm{approx}}$ is in fact the most significant feature of the model, correctly associated with a negative coefficient: a higher value of the square root of $C_{\mathrm{approx}}$ involves a smaller capacity gap.
A mild beneficial effect is attributed also to $\bar{V}_{\mathrm{in}}$ and $\bar{T}_{\min}$: the initial voltage is not extremely significant, while the minimum temperature continues to have a very low p-value, which makes it the second most important feature in the model. $\bar{T}_{\min}$ is the only temperature variable selected for MFPb, and its effect is smaller than and opposite to that of MFPa: without $\bar{T}_{\max}$ in the model, $\bar{T}_{\min}$ is left alone to account for the effect of extreme temperatures. $\Delta t_{\mathrm{rest}}$ is the third most important covariate and it produces a reduction in the capacity variation: in fact, as confirmed by studies such as [62, 63], an apparent increase in the capacity of LIBs can be achieved by allowing the battery to rest for some time. The resting time was not selected in MFPa: this might be ascribed to the correlation that exists between $\Delta t_{\mathrm{rest}}$ and $\lambda$, as $\lambda$ has a reduced effect in MFPb: its p-value changed from 0.0007 to 0.0146 and the feature is now included as a linear effect with a smaller coefficient estimate. Concurrently, since $\lambda$ is both a cause and a consequence of capacity fade in the experimental settings of the NASA datasets, it is likely that the reduction in its significance compared to MFPa is strongly connected to the presence of $C_{\mathrm{approx}}$.

MFPc is the result of variable selection starting from the full set of inputs described in Section 4.2.1, including the most recent true capacity value, $C_{\mathrm{prev}}$. The inclusion of $C_{\mathrm{prev}}$ adds a great deal of exact information to the model, which unsurprisingly results in a further increase of both $R^2$ and $R^2_{\mathrm{adj}}$, almost reaching their maximum value of 1. The results of MFPc seem consistent with those of MFPb: the most important feature is now the square root of $C_{\mathrm{prev}}$, which also has the largest (in magnitude) estimated coefficient. Interestingly, the second most significant covariate is now $\Delta t_{\mathrm{rest}}$, with a stronger effect also in its coefficient: this could again be explained by its relation to $\lambda$, not present in this model.
However, once again, the absence of $\lambda$ should also be related to its connection with $C_{\mathrm{prev}}$ and $C_{\mathrm{approx}}$: this emerges clearly as $\lambda$ was strongly significant in MFPa, where neither $C_{\mathrm{prev}}$ nor $C_{\mathrm{approx}}$ were considered; less important in MFPb, where $C_{\mathrm{approx}}$ was included; and not present at all in MFPc, where both capacity measures are part of the model. $C_{\mathrm{approx}}$ continues to be extremely significant, but it is now included linearly and it is less important than in MFPb. Finally, $\bar{T}_{\min}$ has an even smaller effect than in MFPb, but persists in being an important input to predict the change in capacity.

For the sake of interpretability and explanation, MFPc seems the best choice: it has the highest coefficients of determination, it is reasonably sparse and all the included variables are extremely significant. However, it assumes that the true capacity is known at every previous cycle, which is hardly the case: MFPb therefore constitutes a valid alternative, as it takes advantage of approximate capacity estimates derived directly from the steps sensor data. It is nonetheless notable that MFPa also reaches high $R^2$ and $R^2_{\mathrm{adj}}$ while comprising only three features which can be very easily obtained in practice.

When it comes to accuracy in prediction, the normalised RMSEs are presented in Figure 5, where the predicted capacity fade according to each model is compared to the true values. The grey area represents the 90% prediction interval, which has been computed using basic asymptotic results at almost no additional computational cost. The first row (cell RW9) reports the training error of each model. The results show that the difference in the performances of the three considered models is not huge: all of them have good predictive accuracy with $\mathrm{RMSE}_{\mathrm{norm}}$ and $\mathrm{RMSE}^{\mathrm{EoL}}_{\mathrm{norm}}$ ranging respectively from 2.22% to 11.69% and from 3.21% to 7.18%. The errors reflect the similarities and dissimilarities in the production phase and operational history of the four cells, which also emerge in Figure 1 in Section 2.2.
Considering $\mathrm{RMSE}_{\mathrm{norm}}$, there is a consistent improvement going from model MFPa to MFPb and MFPc for batteries RW10 and RW11, while the minimum $\mathrm{RMSE}^{\mathrm{EoL}}_{\mathrm{norm}}$ is obtained in model MFPa; for cell RW12, remarkably, we obtain better results with model MFPa according to both the error metrics.
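The text does not spell out how the 90% prediction intervals are obtained beyond "basic asymptotic results". One textbook construction that fits this description, shown here for a single input rather than the multivariable models of this section, is the normal-quantile interval around a least squares prediction. The function name and the data are hypothetical, and the use of the normal rather than the Student t quantile is an assumption.

```python
import math
from statistics import NormalDist

def prediction_interval(x, y, x_new, level=0.90):
    """Normal-approximation prediction interval for y at x_new under y = b0 + b1 * x.

    Returns (lower, point_prediction, upper).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(rss / (n - 2))  # residual standard error
    y_hat = b0 + b1 * x_new
    # Standard error of a new observation: noise plus uncertainty of the fitted line.
    se_pred = s * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided normal quantile
    return y_hat - z * se_pred, y_hat, y_hat + z * se_pred

x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 0.9, 2.1, 2.9, 4.1]
print(prediction_interval(x, y, x_new=2.0))
print(prediction_interval(x, y, x_new=10.0))  # wider: far from the training inputs
```

The interval only requires quantities already available from the fit, which is consistent with the remark that it comes at almost no additional computational cost.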
