State-of-Charge Estimation of a Li-Ion Battery using Deep Forward Neural Networks
Alexandre Barbosa de Lima, Maurício B. C. Salles, José Roberto Cardoso
A PREPRINT
Alexandre Barbosa de Lima
Polytechnic School of the University of São Paulo, Department of Energy and Automation
Maurício B. C. Salles
Polytechnic School of the University of São Paulo, Department of Energy and Automation
José Roberto Cardoso
Polytechnic School of the University of São Paulo, Department of Energy and Automation
September 22, 2020
ABSTRACT
This article presents two Deep Forward Networks with two and four hidden layers, respectively, that model the drive cycle of a Panasonic 18650PF lithium-ion (Li-ion) battery at a given temperature using the K-fold cross-validation method, in order to estimate the State of Charge (SOC) of the cell. The drive cycle power profile is calculated for an electric truck with a 35 kWh battery pack scaled for a single 18650PF cell. We propose a machine learning workflow which is able to fight overfitting when developing deep learning models for SOC estimation. The contribution of this work is to present a methodology of building a Deep Forward Network for a lithium-ion battery and its performance assessment, which follows the best practices in machine learning.
Keywords
Electrical Energy Storage · Li-ion battery · State-of-Charge · Deep Learning · Artificial Intelligence
Energy storage acts as a mediator between variable loads and variable sources. Electricity storage is not new. Volta invented the modern battery in 1799. Batteries were implemented in telegraph networks in 1836 [1]. The Rocky River Hydroelectric Power Plant in New Milford, Connecticut, was the first major electrical energy storage (EES) system project in the United States. The plant used Pumped Hydroelectric Storage (PHS) technology.

This research is motivated by the study of the application of EES systems in the area of sustainable energy sources.

Hannan et al. [2] present a detailed taxonomy of the types of energy storage systems taking into account the form of energy storage and construction materials: mechanical, electrochemical (rechargeable and flow batteries), chemical, electrical (ultracapacitor or superconducting magnetic coil), thermal and hybrid.

Recently, industry and academia have given great importance to the electrification of the transport system, given the need to reduce the emission of greenhouse gases. Hybrid electric vehicles, such as the Toyota Prius, or fully electric vehicles, such as the various Tesla models, the Nissan Leaf and the GM Volt, are successful cases in the United States [3].

The advancement of EES technologies enabled the emergence of the iPod, smartphones and tablets with lithium-ion (Li-ion) batteries. If renewable sources, such as solar and wind, become prevalent, the EES will be one of the critical components of the electricity grid, given the intermittent nature of these energy sources [3, 4]. EES systems are necessary even when renewable sources are connected to the grid, because it is necessary to smooth the energy supply. For example, the EES of a building or factory can be charged during hours of reduced demand and supply/supplement energy demand during peak hours.

EES technology consists of the process of converting a form of energy (almost always electrical) to a form of storable energy, which can be converted into electrical energy when necessary. EES has the following functions: to assist in meeting the maximum electrical load demands, to provide time-varying energy management, to relieve the intermittency of renewable energy generation, to improve energy quality/reliability, to serve remote loads and vehicles, to support the realization of smart grids, to improve the management of distributed/standby power generation and to reduce the import of electricity during peak demand periods [4, 5].

An EES (which can connect to the network or operate in stand-alone mode) consists of two main subsystems: i) storage and ii) power electronics. Such subsystems are complemented by other components that include monitoring and control systems [1].

Lithium-ion battery technology has attracted the attention of industry and academia for the past decade. This is mainly due to the fact that lithium-ion batteries offer more energy, higher power density, higher efficiency and lower self-discharge rate than other battery technologies such as NiCd, NiMH, etc.
[6].

The efficient use of the lithium-ion battery requires the supervision of a Battery Management System (BMS), as it is necessary that the battery operates under appropriate conditions of temperature and charge (State-of-Charge (SOC)) [7]. The cell temperature produces deleterious effects on the open circuit voltage, internal resistance and available capacity and can also lead to a rapid degradation of the battery if it operates above a given temperature threshold. Therefore, the modeling of the battery is of paramount importance, since it will be used by the BMS to manage the operation of the battery [6].

There are two methods of battery modeling: i) model-driven and ii) data-driven (based on data that is collected from the device) [8].

Electrothermal models, which belong to the category of model-driven methods, are commonly classified as: i) electrochemical or ii) based on Equivalent Circuit Models (ECM) [6, 7].

Electrochemical models are based on partial differential equations [9] and are able to represent thermal effects more accurately than ECM [10]. However, the first class of models requires detailed knowledge of proprietary parameters of the battery manufacturer: cell area, electrode porosity, material density, electrolyte characteristics, thermal conductivity, etc. This difficulty can be eliminated by characterizing the battery using a thermal camera and thermocouples. But this solution is expensive, time consuming and introduces other challenges such as the implementation of dry air purge systems, ventilation, security, air and water supply, etc. Electrochemical models demand the use of intensive computing systems [7].

On the other hand, the ECM-based approach has been used for computational/numerical analysis of batteries [7]. In this case, the objective is to develop an electrical model that represents the electrochemical phenomenon existing in the cell.
The level of complexity of the model is the result of a compromise between precision and computational effort. Note that an extremely complex and accurate ECM may be unsuitable for application in embedded systems.

The most recent literature shows that the machine learning approach, based on deep learning algorithms, is the state of the art in the area [8, 11–20]. Machine learning is a branch of AI, as will be seen in section 2.

Chemali et al. [12] compared the performance of Deep Neural Networks (DNN) with those of other relevant algorithms that have been proposed since the second half of the 2000s. The article shows that the SOC estimation error obtained with deep learning is lower than that of the following methods:
• Model Adaptive-Improved Extended Kalman Filter (EKF) [21];
• Adaptive EKF with Neural Networks [22];
• Adaptive Unscented Kalman Filter (AUKF) with Extreme Learning Machine [23];
• Fuzzy NN with Genetic Algorithm [24]; and
• Radial Basis Function Neural Network [25].

Estimating the SOC of lithium-ion cells in a BMS by means of deep learning offers at least two significant advantages over model-driven approaches, namely: i) neural networks are able to estimate the nonlinear functional dependence that exists between voltage, current and temperature (observable quantities) and unobservable quantities, such as SOC, with great precision and ii) the problem of identifying ECM parameters is avoided.
In relation to the literature already mentioned, this paper takes a step back and focuses on the methodology adopted for the construction and performance measurement of a deep learning model.

First, we work with Deep Feedforward Networks (DFN) as a baseline family model, as they form the basis of many important commercial applications [14]. Our preliminary results show that a simple architecture with four hidden layers is already quite interesting. We do not work with Recurrent Neural Networks (RNN) because they are suitable for sequential data processing problems such as machine translation.

Second, the central challenge in machine learning is that our model has to perform well on new, unseen inputs, not just those on which our algorithm was trained. This ability is called generalization. What separates machine learning from traditional optimization is that we want the generalization error, or test error, to be as low as possible [14]. To do this, we need a test set. In this work we followed the best practice of breaking the held-out data into two separate sets: validation and test sets [26]. That is, the generalization power of the models was measured against validation and test sets. The validation set is used to fine tune the network hyperparameters.

Third, as a corollary of generalization, the central problem in machine learning, namely overfitting, has not been properly addressed, to the best of our knowledge, in the recent mainstream literature of SOC estimation of Li-ion batteries using deep learning. Thus, we have to apply concepts from statistical learning theory [14]. Overfitting occurs when the gap between the training error and generalization error is too large. The process of mitigating overfitting is called regularization, which can be defined as any modification we make to a learning algorithm with the goal of reducing its generalization error but not its training error.
To see this phenomenon, one has to plot the training and generalization learning curves; see Fig. 1 for instance.

Figure 1: Training and generalization errors behave differently. The horizontal axis represents the number of epochs. The vertical axis is the loss function. Note that the model starts to overfit around the fifth epoch.

Note that the literature of machine learning presents a solid framework for solving deep learning problems, such as model evaluation and the attack on overfitting, among others [14, 26, 27]. The satisfactory result obtained in this article in terms of a low generalization error takes overfitting into account.

Fourth, as mentioned before, we have to consider the optimization problem in the context of deep learning, which differs from traditional optimization in several ways. In this work, we use algorithms with adaptive learning rates, such as RMSProp and Adam, which can incorporate momentum, allowing faster convergence than Stochastic Gradient Descent (SGD), at the cost of more computation. The learning rate is one of the most difficult hyperparameters of an artificial neural network (ANN) to configure, as it significantly affects the model performance. Note that small values of the learning rate result in slow convergence. On the other hand, if the learning rate is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.

Fifth, the validation error is estimated by taking the average validation error across K trials. We use a simple but popular solution called K-fold cross-validation (Fig. 2), which consists of splitting the available training data into K partitions, instantiating K identical models, one for each fold k ∈ {1, 2, ..., K}, and training each one on K − 1 partitions while evaluating on the remaining (validation) partition. The validation score for the model used is then the average of the K validation scores obtained.
This procedure allows network hyperparameters to be adjusted so that overfitting is mitigated [27, p. 23]. It is usual to use about of the data for the training set, and for the validation set. Note that the validation scores may have a high variance with regard to the validation split. Therefore, K-fold cross-validation helps us improve the reliability when evaluating the generalization power of the model.
Figure 2: K-fold cross-validation.
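The K-fold procedure described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the code used in the paper: the helper names `k_fold_indices` and `cross_validate`, the contiguous (unshuffled) folds, and the toy "model" that predicts the mean of the training targets are all assumptions made for the example.

```python
from statistics import fmean


def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) index lists for K contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds receive one additional example.
        stop = start + base + (1 if fold < extra else 0)
        val_idx = list(range(start, stop))
        train_idx = list(range(start)) + list(range(stop, n_samples))
        yield train_idx, val_idx
        start = stop


def cross_validate(xs, ys, k, fit, score):
    """Average the validation score over K folds.

    `fit(train_x, train_y)` returns a model and
    `score(model, val_x, val_y)` returns a float; both are
    hypothetical callables supplied by the caller.
    """
    scores = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(
            score(model, [xs[i] for i in val_idx], [ys[i] for i in val_idx])
        )
    return fmean(scores)


# Toy usage (assumed data): the "model" is the mean of the training targets,
# scored on each validation fold by the mean absolute error.
def fit_mean(train_x, train_y):
    return fmean(train_y)


def mae_of_constant(model, val_x, val_y):
    return fmean([abs(y - model) for y in val_y])


toy_x = list(range(8))
toy_y = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0, 15.0]
print(cross_validate(toy_x, toy_y, k=4, fit=fit_mean, score=mae_of_constant))
```

In practice the examples are usually shuffled before the folds are formed; for time-ordered battery data that choice deserves care, since neighbouring samples are strongly correlated.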
The remainder of the paper is organized as follows. Section 2 presents an overview of AI, machine learning and deep learning for the reader who is not familiar with the subject. Section 3 presents our experimental results. Finally, section 4 presents our conclusions.
The Handbook of Artificial Intelligence presents the following operational definition for Artificial Intelligence (AI) [28]:

“Artificial Intelligence (AI) is the part of computer science concerned with designing intelligent computer systems, that is, systems that exhibit the characteristics we associate with intelligence in human behavior - understanding language, learning, reasoning, solving problems, and so on”.

Note that there is no consensus on the concept of intelligence.

There are two main lines of research in AI: the connectionist and the symbolic. According to Boden [29], both were inspired by the seminal article entitled A Logical Calculus of the Ideas Immanent in Nervous Activity (1943), by Warren Sturgis McCulloch and Walter Pitts [30], the first modern computational theory (the theory is considered modern because it employs the mathematical notion of computation established by Turing in 1936 [31]). The literature recognizes that the research carried out by McCulloch and Pitts is the pioneering work in AI [32].

Fig. 3 illustrates the McCulloch and Pitts artificial neuron model.

Figure 3: McCulloch and Pitts model.
The artificial neuron (or unit) of Fig. 3 has the following characteristics:
• input signals (activations) $(a_0, a_1, \ldots, a_n)$ are 0 or 1 bits. The external signal $a_0 = 1$ is known as the bias (the bias plays the role of the intercept $b$ in the simple linear regression model $y = wx + b$);
• synaptic weights of the j-th neuron: $w_{0,j}, w_{1,j}, \ldots, w_{n,j}$; and
• $a_j$ denotes the output signal (or output activation) of the j-th neuron, given by

$$z_j = \sum_{i=0}^{n} w_{i,j}\, a_i \tag{1}$$

$$a_j = g(z_j) \in \{0, 1\} \tag{2}$$

where $z = f(a)$ is the input function and $g(z)$ is a nonlinear activation function. The output is binary (bit 0 or bit 1); therefore, the McCulloch and Pitts model is said to have the “all or nothing” property.

The activation function of the McCulloch and Pitts model is the Heaviside function (unit step function)

$$g(z) = \begin{cases} 1, & z \ge 0 \\ 0, & z < 0 \end{cases} \tag{3}$$

that is, $g(z) = 1$ if $z \ge 0$ and $g(z) = 0$ for $z < 0$.

We take the opportunity to make a necessary digression on the neuron model of Fig. 3, in order to present the intuition behind the fact that modern ANN (see Fig. 4) are able to approximate nonlinear functions with arbitrary precision, at least in theory (universal approximation theorem [33]). The explanation below considers a network with only one layer of $N + 1 = M$ neurons in parallel, where each neuron is excited by the input signal $a = \{a_0, a_1, \ldots, a_N\}$, not necessarily binary. This layer is known as the hidden layer.

Figure 4: Example of a densely connected neural net with one hidden layer. The hidden layer has three units ($a_0^{(2)}$, $a_1^{(2)}$, and $a_2^{(2)}$). The input features are the input signals $x$. The function $y = h_w(x)$ is called the model or hypothesis.

Let's rewrite (1) in a vectorized form

$$z_i = W^T a \tag{4}$$

where we adopt the notation $W$ for the vector of synaptic weights, $a$ for the vector of entries and $T$ denotes the transposition of matrices (this article assumes that vectors are always column vectors, as is usual in the signal processing area).

Consider the Discrete Fourier Transform (DFT) of an input signal $b = \{b_0, b_1, \ldots, b_N\}$ given by

$$B[k] = \begin{cases} \sum_{i=0}^{N} e^{-j \frac{2\pi k}{M} i}\, b_i, & 0 \le k \le N \\ 0, & \text{otherwise} \end{cases} \tag{5}$$

where the imaginary unit is represented by $j$, and the corresponding inverse transformation, called Inverse Discrete Fourier Transform (IDFT) [34]

$$z_i = \begin{cases} \sum_{k=0}^{N} \left( \frac{1}{M} e^{j \frac{2\pi k}{M} i} \right) B[k], & 0 \le i \le N \\ 0, & \text{otherwise} \end{cases} \tag{6}$$

Rewriting (6) in vectorized form, we obtain

$$z_i = W^T B \tag{7}$$

where $W = \{ \frac{1}{M} e^{j \frac{2\pi k}{M} i} \}$, $0 \le k \le N$, and $B = \{B_0, B_1, \ldots, B_N\}$.

Compare (7) and (4). Note that these equations are equal if $a = B$. Therefore, Eq. (7) suggests that it would be possible to represent the function $z = f(a)$ through a neural network that uses the weights $\{ \frac{1}{M} e^{j \frac{2\pi k}{M} i} \}$, $0 \le k \le N$. Eq. (7) looks like a Fourier series.

Remember that Fourier showed in 1807 that an arbitrary and aperiodic function $f(t)$ defined in a finite interval $T$ can be reconstructed from a trigonometric series called the Fourier series [35]. So “there is nothing new under the sun”. Electrical engineers are well familiarized with this notion.

As mentioned before, the universal approximation theorem states that a feedforward network with a linear output layer and at least one hidden layer of artificial units with a nonlinear activation function can approximate any “function” from one finite-dimensional space to another with any desired nonzero amount of error, provided that the network is given enough units [14, 33]. However, the theorem does not say what the number of units in the hidden layer should be. We also have no guarantees that the training algorithm will be able to learn that function. This may be due to the existence of local minima in the cost function to be optimized.

Nowadays, the activation function called Rectified Linear Unit (ReLU) (see Fig. 5), given by

$$g(z) = \max\{0, z\} \tag{8}$$

is commonly used in ANN [14].

Figure 5: ReLU activation function.

An ANN is a distributed parallel processing system, inspired by the processing structure of the human brain. The ANN technique is a form of non-algorithmic computation of functions.

The connectionist approach uses ANN.
The symbolic line, sometimes called symbolic AI (“Good Old Fashioned AI” (GOFAI) [29]), follows the logical tradition and had John McCarthy and Allen Newell, among others, as some of its great exponents [36].

Since 2006, the connectionist line has gained prominence, due in large part to the fundamental contributions of the researchers Yoshua Bengio, Geoffrey Hinton, and Yann LeCun, who were awarded the 2018 A. M. Turing Award [37].

Nowadays AI research is dominated by systems that use ANN, also known as Deep Learning [14]. Fig. 7 shows a deep learning model for handwritten digit recognition (a classical problem in computer vision), first solved by LeCun et al. [38]. The model has one input layer, four hidden layers, and one output layer.

Note on the universal approximation theorem: in fact, the network can approximate any Borel function. This concept is beyond the scope of this paper and will not be discussed.
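As a minimal sketch of the neuron model of Eqs. (1)-(3) and of the ReLU activation of Eq. (8), the following plain Python code evaluates a single unit. The function names and the AND/OR weight values are illustrative assumptions chosen for the example; they do not come from the paper.

```python
def heaviside(z):
    """Unit step of Eq. (3): returns 1 if z >= 0, otherwise 0."""
    return 1 if z >= 0 else 0


def relu(z):
    """Rectified linear unit of Eq. (8)."""
    return max(0.0, z)


def neuron(weights, inputs, activation):
    """Single unit: z = sum_i w_i * a_i with a_0 = 1 (bias), output g(z).

    weights[0] is the bias weight; weights[1:] pair with `inputs`.
    """
    z = weights[0] + sum(w * a for w, a in zip(weights[1:], inputs))
    return activation(z)


# Assumed weights that make a McCulloch-Pitts unit behave as logic gates.
AND_WEIGHTS = [-2.0, 1.0, 1.0]  # fires only when both inputs are 1
OR_WEIGHTS = [-1.0, 1.0, 1.0]   # fires when at least one input is 1

for a1 in (0, 1):
    for a2 in (0, 1):
        print(a1, a2,
              neuron(AND_WEIGHTS, [a1, a2], heaviside),
              neuron(OR_WEIGHTS, [a1, a2], heaviside))
```

Swapping `heaviside` for `relu` in the call changes the unit from the binary "all or nothing" output to a continuous, non-negative one, which is the form used in the hidden layers of modern networks.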
Figure 6: Relationship between AI, machine learning and deep learning.
Figure 7: Handwritten Digit Recognition with a 6-layer ANN.
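The Fourier digression of Eqs. (5)-(7) can be checked numerically. The sketch below, which uses only the standard library, computes the DFT of a short signal and then reconstructs each sample with a linear unit whose weights are the IDFT coefficients of Eq. (7). The signal values and the helper names (`dft`, `idft_weights`, `linear_unit`) are assumptions made for illustration.

```python
import cmath


def dft(b):
    """DFT of Eq. (5) for a signal with M = len(b) samples."""
    M = len(b)
    return [
        sum(cmath.exp(-2j * cmath.pi * k * i / M) * b[i] for i in range(M))
        for k in range(M)
    ]


def idft_weights(i, M):
    """Weight vector W of Eq. (7) for output index i."""
    return [cmath.exp(2j * cmath.pi * k * i / M) / M for k in range(M)]


def linear_unit(weights, inputs):
    """z = W^T a, the linear part of a neuron as in Eq. (4)."""
    return sum(w * a for w, a in zip(weights, inputs))


signal = [0.0, 1.0, 0.5, -0.25]  # assumed toy signal
M = len(signal)
B = dft(signal)

# Each output sample is a weighted sum of the spectrum B, as in Eq. (7).
reconstructed = [linear_unit(idft_weights(i, M), B) for i in range(M)]
max_error = max(abs(r - s) for r, s in zip(reconstructed, signal))
print(max_error)
```

The reconstruction error is at the level of floating-point round-off, which illustrates the point of the digression: a layer of linear units with fixed complex weights already implements an exact change of representation.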
Starting in 1952, Arthur Samuel, the pioneer of machine learning in the era of digital computers [39], wrote a series of programs for IBM computers that learned to play checkers through a learning process that was not based on neural networks (the work was developed during Samuel's free time). The program was shown on TV in 1956, making a strong impression on public opinion [40, 41].

Tom Mitchell gives a definition of machine learning [42, p. 2]:

“A computer is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E”.

As Samuel has shown, checkers is a problem that can be solved using machine learning algorithms. According to Mitchell's definition, we have that:
• E = “training” of the program, which consists of the program playing a sufficiently large number of checkers games (for example, hundreds of thousands of games) against itself.
• T = playing games.
• P = percent of games won against opponents.

Machine learning algorithms can be divided into four groups [26]:
• Supervised Learning: a human provides the input data (training set), as well as the expected responses from the data (a label is attached to each example). The output of the processing is a set of rules, which will be applied on new input data, in order to produce original responses (inferences of the classification or regression type).
• Unsupervised Learning: for training a descriptive model capable of recognizing patterns. There is no teacher. Applications: discovering shopping patterns in supermarkets and data clustering tasks (e.g., identifying groups of people who share common interests such as sports, religion or music).
• Self-supervised Learning: a type of supervised learning; there are still labels involved (because learning needs to be supervised by something), but they are generated from the input data, usually using a heuristic algorithm.
• Reinforcement Learning: a program is trained to interact in an environment through actions to achieve a goal. Learning uses a reward and/or punishment mechanism. The AlphaZero algorithm, developed by Google DeepMind, uses this technique [43].

Supervised learning, which is used in this work, involves the training of a model or hypothesis. To this end, given a set of training data (or training set), the learning algorithm aims to fit a model that minimizes a metric of choice.

In 1959, Bernard Widrow and Marcian E. Hoff discovered the famous machine learning algorithm called Least Mean Square (LMS) [44]. The LMS algorithm belongs to the family of stochastic gradient algorithms and uses the gradient descent optimization method, which is also used in deep learning, whose objective is to find the values of the parameters that minimize a given function, called objective function or cost function.

Adaline (the adaptive linear neuron trained with the LMS algorithm) implements supervised machine learning, since the desired response for each input pattern is provided.

In 1958, Frank Rosenblatt presented the theory of a hypothetical nervous system called the Perceptron [45]. The Perceptron is a single-layer ANN with a learning rule capable of implementing a linear classifier.

A Perceptron follows the feed-forward processing model, from left (inputs) to right (output(s)).

The 1980s were marked by the discovery of the backpropagation algorithm, which computes the gradient for multilayer networks [46].
The wide dissemination of the results in the collection Parallel Distributed Processing [46], edited by Rumelhart and McClelland, caused great excitement in the areas of Computer Science and Psychology.

In fact, the backpropagation algorithm was discovered independently several times [47–50].
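To make the LMS idea concrete, the sketch below applies the Widrow-Hoff update to a single linear unit: after each example, the weights move in the direction that reduces the squared error, scaled by the learning rate. It is a simplified illustration under stated assumptions (noise-free toy data generated from y = 2x + 1, a fixed learning rate of 0.1, and the hypothetical helper name `lms_fit`), not the procedure used in the paper.

```python
def lms_fit(samples, targets, learning_rate, epochs):
    """Widrow-Hoff (LMS) training of a linear unit y = w . x + b."""
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for x, desired in zip(samples, targets):
            output = bias + sum(w * xi for w, xi in zip(weights, x))
            error = desired - output
            # Stochastic gradient step on the squared error of this example.
            weights = [w + learning_rate * error * xi
                       for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


# Assumed toy data: the desired response follows y = 2x + 1 exactly.
samples = [[0.0], [0.25], [0.5], [0.75], [1.0]]
targets = [2.0 * x[0] + 1.0 for x in samples]

weights, bias = lms_fit(samples, targets, learning_rate=0.1, epochs=500)
print(weights, bias)
```

With a much smaller learning rate the same loop needs many more epochs to approach the solution, and with a much larger one each update can overshoot the minimum, which is the trade-off behind the remark that the learning rate is a delicate hyperparameter.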
This section introduces a very short review of the main concepts about stochastic processes (also called data-generating processes by the literature of deep learning [14]) and statistical learning that are used in this paper. The purpose is just to indicate what is important to be known without entering into the details. The interested reader should refer to the appropriate literature [27, 51].
Definition 2.3.1 (Stochastic Process). Let $T$ be an arbitrary set. A stochastic process is a family $\{x_t, t \in T\}$, such that, for each $t \in T$, $x_t$ is a random variable. ■

When the set $T$ is the set of integer numbers $\mathbb{Z}$, then $\{x_t\}$ is a discrete time stochastic process (or random sequence); $\{x_t\}$ is a continuous time stochastic process if $T$ is taken as the set of real numbers $\mathbb{R}$.

The random variable $x_t$ is, in fact, a function of two arguments $x(t, \zeta)$, $t \in T$, $\zeta \in \Omega$, given that it is defined over the sample space $\Omega$. For each $\zeta \in \Omega$ we have a realization, trajectory or time series $x_t$. The set of all realizations is called ensemble. Each trajectory is a function or a non-random sequence and, for each fixed $t$, its value $x_t$ is a number.

A process $x_t$ is completely specified by its finite-dimensional distributions or $n$-order probability distribution functions, as:

$$F_x(x_1, x_2, \ldots, x_n; t_1, t_2, \ldots, t_n) = P\{x(t_1) \le x_1, x(t_2) \le x_2, \ldots, x(t_n) \le x_n\} \tag{9}$$

in which $t_1, t_2, \ldots, t_n$ are any elements of $T$ and $n \ge 1$.

The first order probability distribution function is also known as Cumulative Distribution Function (CDF).

The probability density function (PDF) is given by:

$$f_x(x_1, x_2, \ldots, x_n; t_1, t_2, \ldots, t_n) = \frac{\partial^n F_x(x_1, x_2, \ldots, x_n; t_1, t_2, \ldots, t_n)}{\partial x_1\, \partial x_2 \ldots \partial x_n}. \tag{10}$$

Note on AlphaZero [43]: AlphaGo, the predecessor of AlphaZero, defeated the world champion of Go, Lee Sedol, in a challenge sponsored by Google in 2016 in South Korea.
Applying the conditional probability density formula,

$$f_x(x_k \mid x_{k-1}, \ldots, x_1) = \frac{f_x(x_1, \ldots, x_{k-1}, x_k)}{f_x(x_1, \ldots, x_{k-1})}, \tag{11}$$

in which $f_x(x_1, \ldots, x_{k-1}, x_k)$ denotes $f_x(x_1, \ldots, x_{k-1}, x_k; t_1, \ldots, t_{k-1}, t_k)$, repeatedly over $f_x(x_1, \ldots, x_{n-1}, x_n)$ we get the probability chain rule

$$f_x(x_1, x_2, \ldots, x_n) = f_x(x_1)\, f_x(x_2 \mid x_1)\, f_x(x_3 \mid x_2, x_1) \cdots f_x(x_n \mid x_{n-1}, \ldots, x_1). \tag{12}$$

When $x_t$ is a sequence of mutually independent random variables, (12) can be rewritten as

$$f_x(x_1, x_2, \ldots, x_n) = f_x(x_1)\, f_x(x_2) \cdots f_x(x_n). \tag{13}$$

Definition 2.3.2 (Purely Stochastic Process). A purely stochastic process $\{x_t, t \in \mathbb{Z}\}$ is a sequence of mutually independent random variables. ■

Definition 2.3.3 (IID Process). An Independent and Identically Distributed (IID) process $\{x_t, t \in \mathbb{Z}\}$, denoted by $x_t \sim$ IID, is a purely stochastic and identically distributed process. ■
As mentioned earlier, the central challenge in machine learning is that the algorithm performs well on the test set. The training and test sets are generated by the same data-generating process. Typically, it is assumed that the examples in each set are independent and that the training and test sets are identically distributed. These assumptions are collectively known as IID.

The no free lunch theorem [52] says that, considering the average over all the possible data-generating processes, any machine learning algorithm has the same error rate when evaluated on previously unobserved examples. That is, there is not, at least in theory, a machine learning algorithm that is better than all others for all cases.

However, the no free lunch theorem is valid only when working with the average over all the possible data-generating processes. Fortunately, that does not happen in real life, as the physical data-generating process of a Li-ion cell is a result of the restrictions imposed by the real world over all the possible data-generating processes. Thus, in practical applications, it is realistic to think about designing algorithms that have a good performance for a given Li-ion cell.

Going further, the goal of research in machine learning is not to look for a universal learning algorithm. Instead, we have to understand what the stochastic characteristics of our dataset are, in order to design, validate and test an algorithm that is efficient for that specific dataset.
Interest in ANNs resurfaced with the advent of Deep Belief Nets in 2006 [53]. The work of Hinton, Osindero and Teh demonstrated that a type of DNN could be trained with high efficiency. Their research triggered the current wave of research in ANN, which popularized the term deep learning.

Deep learning denotes the idea of a neural network with multiple hidden layers that has the ability to partition the representation of an entry into multiple layers (see Fig. 7). The number of layers in a model corresponds to the depth of the network [26].

The success of deep learning today is due to: i) the emergence of Big Data, which made it possible to store data for training in databases with tens of millions of examples, ii) the advent of the Graphics Processing Unit (GPU), and iii) advances in algorithms.

Fig. 8 illustrates how deep learning works [26]. The variables $x$ and $y$ denote the input signal (training example) and the desired signal (target), respectively. The function $h_w(x)$ is the mathematical model or hypothesis. The estimation error (residual or loss score) is given by $e = y - h_w(x)$. At startup, random values are assigned to the weights $w$ of the network, so the value of the initial residual is high. However, in the course of processing the training examples, the weights are adjusted incrementally in the correct direction; at the same time, the value of the loss function decreases. This is the training loop, which, being repeated enough times, typically dozens of iterations over thousands of examples, produces weights that minimize the cost function. A network is considered trained when the minimum of the cost function is reached.

Figure 8: How deep learning works.

Deep learning has achieved the following advances, all in historically difficult areas of machine learning [18, 26]:
• nonlinear regression;
• superhuman image classification;
• voice recognition at an almost human level;
• handwriting transcription at an almost human level;
• automatic translation;
• text to speech conversion;
• autonomous vehicle driving at an almost human level; and
• superhuman performance in games like chess, shogi and Go.

However, deep learning has limitations. Current ANN architectures do not have the power to track statistical changes of the training data in real time. In other words, deep learning is not yet adaptive. Note that training a DNN at a dedicated workstation can take days or weeks. This is due to the computational complexity of deep learning.

Furthermore, the science of deep learning is not like mathematics or physics, in which theoretical advances can be achieved with a chalk and a blackboard. Deep learning is an engineering science [54], as it does not yet have a mathematical formalism like that of the area of adaptive filtering. For example, as we have stated before, there is no design criterion for the number of layers in the network, much less for the number of neurons in a hidden layer. The field is driven by experimental discoveries. But of course, there are best practices to be followed [26].
In this paper, we follow an adapted version of the supervised machine learning workflow proposed by Chollet (see Fig. 9) [26]:
1. Choose a reliable dataset. If you do not find a dataset, collect the data of interest, and annotate it with labels.
2. Choose how you will measure success on your problem. Which metrics will you monitor on your validation data?
3. Determine your evaluation protocol: K-fold cross-validation? Which portion of the data should you use for validation?
4. Develop a baseline model with statistical power.
5. Develop a model that overfits.
6. Regularize your model and tune its hyperparameters, based on performance on the validation data.
Figure 9: Deep learning workflow.
We selected the 2.9 Ah Panasonic 18650PF Li-ion Battery Data provided by Dr. Phillip Kollmeyer, University of Wisconsin-Madison [55]. Note that this dataset has been used by some of the top works in the area [11, 12, 19, 20]. A series of nine drive cycles was made public, among them a Neural Network (NN) cycle, which is the cycle of our interest. More specifically, the simulations in this section present the results for data collected at a temperature of °C. The NN drive cycle was designed to have some additional dynamics which are useful for training neural networks. The drive cycle power profile is calculated for an electric Ford F150 truck with a 35 kWh battery pack scaled for a single 18650PF cell.

Fig. 10 shows the following 2.9 Ah Panasonic 18650PF Li-ion cell characteristic curves:
• temperature (°C) vs SOC (%);
• amp-hours discharged vs time (minutes);
• voltage (V) vs time (minutes);
• current (A) vs time (minutes);
• temperature (°C) vs time (minutes); and
• voltage (V) vs SOC (%).

The input data ($x$) or features are: $x_1 = v(t)$ (voltage in V), $x_2 = i(t)$ (current in A), and $x_3 = T(t)$ (temperature in °C), where $t$ denotes time in seconds. We see no reason for the inclusion of extra features in the hypothesis space, as the other data collected in [55] are: Wh (measured watt-hours, with Wh counter reset after each charge, test, or drive cycle), Power (measured power in watts), and Chamber-Temp-degC (measured chamber temperature in degrees Celsius).

The output variable or target ($y$) is the SOC (%). The dataset has , examples, which were divided examples for training, validation, and testing, respectively.

We applied feature normalization on the input data using the formula

$$x_{\text{normalized}} = \frac{x - \mu_x}{\sigma_x} \tag{14}$$

where $\mu_x$ and $\sigma_x$ denote the mean and standard deviation of $x$.
Figure 10: In (a), (b), (c), (d), (e), and (f) we have: temperature (°C) vs SOC (%), amp-hours discharged vs time (minutes), voltage (V) vs time (minutes), current (A) vs time (minutes), temperature (°C) vs time (minutes), and voltage (V) vs SOC (%), respectively.
We chose Mean Absolute Error (MAE) as the performance metric [12] and K-fold cross-validation as the basic method to fight overfitting. We also used weight regularizers and dropout layers with the same goal.

The literature review indicated that the Python language is the most suitable for this research, for it has several free and open source frameworks for deep learning. In addition, Python is the language most used by the machine learning community [26, 56, 57]. Other options like R also have great machine learning libraries [54].

Open source deep learning frameworks such as TensorFlow [58], PyTorch [59], MXNet [60] and Microsoft Cognitive Toolkit (CNTK) [61] have stood out in recent years. However, TensorFlow 2 and the Keras API offer some advantages, such as the possibility of running code in Google Colaboratory, or "Colab" [63], using just a browser and with free access to CPU/GPU (and even to Google’s Tensor Processing Unit (TPU)).

In addition, TensorFlow offers a browser-based visualization tool called TensorBoard, whose main objective is to help the user visually monitor everything that happens inside their model during training [26]. TensorBoard automates some features such as visualizing the learning curve of neural networks.

Thus, we decided to use the Python language and the TensorFlow 2 framework in conjunction with the Keras API. We coded/prototyped the deep learning model in the Spyder IDE (Python
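The evaluation protocol chosen in this section, K-fold cross-validation scored with MAE, can be sketched as follows. To keep the example runnable with NumPy alone, a least-squares linear model stands in for the DFN. That substitution, the contiguous (unshuffled) folds, the value k = 4, and the synthetic data are all assumptions of this sketch and are not taken from the experiments.

```python
import numpy as np

def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous validation folds."""
    return np.array_split(np.arange(n_samples), k)

def k_fold_mae(x, y, k):
    """Average validation MAE of a least-squares linear model over k folds."""
    scores = []
    for val_idx in k_fold_indices(len(x), k):
        train_mask = np.ones(len(x), dtype=bool)
        train_mask[val_idx] = False
        # Append a bias column and fit on the training folds only.
        a_train = np.column_stack([x[train_mask], np.ones(train_mask.sum())])
        w, *_ = np.linalg.lstsq(a_train, y[train_mask], rcond=None)
        # Score on the held-out fold.
        a_val = np.column_stack([x[val_idx], np.ones(len(val_idx))])
        scores.append(float(np.mean(np.abs(a_val @ w - y[val_idx]))))
    return float(np.mean(scores)), scores

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))               # synthetic stand-in for (v, i, T)
y = x @ np.array([2.0, -1.0, 0.5]) + 50.0   # synthetic, noiseless target
mean_mae, fold_maes = k_fold_mae(x, y, k=4)
```

Each example is used for validation exactly once, so the averaged score depends less on any single train/validation split than a simple hold-out estimate does.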
Fig. 11 shows the architecture of a densely connected DFN with four (4) layers (two hidden layers). Fig. 12 shows the learning curves obtained with the Adam optimizer and K-fold cross-validation. Note that overfitting manifests itself as a gap between the validation MAE (red plot) and the training MAE (blue plot) in those figures.

Figure 11: (a) and (b): densely connected 4-layer DFN and its schematic version, respectively.

We tuned the hyperparameters of the deep learning model using L1/L2 parameter norm penalties and adding two extra dropout layers.

In L2 regularization, the cost added to the objective function is proportional to the sum of the squared values of the weight coefficients (squared L2 norm, $\|w\|_2^2$), whereas in L1 regularization the cost added to the objective function is proportional to the sum of the absolute values of the weight coefficients (L1 norm, $\|w\|_1$).

Fig. 13 shows the architecture of a DFN with six (6) layers, where we have, after the input layer (layer 1), in sequence, two pairs of a units/hidden layer followed by a dropout layer, then the final layer. The neural net uses K-fold cross-validation. In Fig. 14, note that overfitting occurs only around epochs. Thus, this model has a greater power of generalization than the previous ones, as expected.

The DFN model with two hidden layers, units/hidden layer, batch size of and without regularization achieves the best SOC estimate on the test set, with an MAE of approximately . .

The Caffe2 [62] library has been absorbed by PyTorch.
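The two regularization mechanisms discussed in this section, parameter norm penalties and dropout, can be illustrated with a few lines of NumPy. This is a hedged sketch rather than the training code of this work. The L2 penalty is taken here as the sum of squared weights and dropout is written in its "inverted" form (surviving activations are rescaled by 1/(1 - rate)), both of which are assumptions about the convention in use. The function names, the penalty factor `lam`, and the example weights are hypothetical.

```python
import numpy as np

def l1_penalty(w, lam):
    """L1 cost: lam times the sum of absolute weight values."""
    return lam * float(np.sum(np.abs(w)))

def l2_penalty(w, lam):
    """L2 cost: lam times the sum of squared weight values."""
    return lam * float(np.sum(np.square(w)))

def dropout(a, rate, rng):
    """Inverted dropout: zero a random fraction `rate` of activations
    and rescale the survivors so the expected activation is unchanged."""
    keep = rng.random(a.shape) >= rate
    return a * keep / (1.0 - rate)

# Illustrative weight matrix of a small dense layer.
w = np.array([[0.5, -1.0],
              [2.0, 0.0]])
print(l1_penalty(w, 0.01))  # 0.01 * (0.5 + 1.0 + 2.0 + 0.0)
print(l2_penalty(w, 0.01))  # 0.01 * (0.25 + 1.0 + 4.0 + 0.0)

rng = np.random.default_rng(0)
activations = np.ones(8)
print(dropout(activations, 0.5, rng))  # entries are either 0.0 or 2.0
```

The penalties are added to the training loss only, and dropout is active only during training; at evaluation time the network is used unchanged.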
Figure 12: In (a), (b), and (c), we have learning curves for: units/hidden layer, batch size of and epochs; units/hidden layer, batch size of and epochs; and units/hidden layer, batch size of , and epochs, respectively.

Figure 13: DFN with six (6) layers.

However, as indicated by the learning curves in Fig. 14, the DFN model with four hidden layers (two pairs of a units/hidden layer followed by a dropout layer with a dropout rate of . ) has a greater power of generalization than the previous model. The MAE obtained on the test set was approximately . in this case.

Figure 14: Learning curves for a model with units/hidden layer, batch size of , dropout rate of . .

This paper presents two simple DFN models with two and four hidden layers, respectively, using an optimizer with adaptive learning rates and the ReLU activation function, in order to estimate the State of Charge (SOC) of a Panasonic 18650PF lithium-ion battery for the Neural Network (NN) drive cycle of dataset [55] using the K-fold cross-validation method.

The DFN model with four hidden layers presents a better power of generalization, not only because it has a greater capacity in terms of more layers, but also due to the application of additional regularization techniques such as dropout layers and parameter norm penalties. The contribution of this work is to present a methodology of building a DFN for a lithium-ion battery and its performance assessment, which follows the best practices in machine learning.

References
[1] Sandia National Laboratories, “DOE/EPRI 2013 electricity storage handbook in collaboration with NRECA,” Sandia National Laboratories, USA, Tech. Rep., 2013.
[2] M. A. Hannan et al., “Review of Energy Storage Systems for Electric Storage Vehicle Applications: Issues and Challenges,”
Renewable and Sustainable Energy Reviews, vol. 69, pp. 771–789, 2016.
[3] M. S. Whittingham, “History, Evolution, and Future Status of Energy Storage,” Proceedings of the IEEE, vol. 100, pp. 1518–1534, 2012.
[4] X. Luo et al., “Overview of Current Development in Electrical Energy Storage Technologies and the Application Potential in Power System Operation,” Applied Energy, vol. 137, pp. 511–536, 2015.
[5] R. H. Byrne et al., “Energy Management and Optimization Methods for Grid Energy Storage Systems,” IEEE Access, 2017.
[6] S. N. Motapon et al., “A Generic Electrothermal Li-ion Battery Model for Rapid Evaluation of Cell Temperature Temporal Evolution,” IEEE Transactions on Industrial Electronics, vol. 64, no. 2, pp. 998–1007, February 2017.
[7] T. Huria et al., “High fidelity electrical model with thermal dependence for characterization and simulation of high power lithium battery cells,” in IEEE International Electric Vehicle Conference. IEEE, 2012.
[8] L. Ren et al., “Remaining useful life prediction for lithium-ion battery: A deep learning approach,” IEEE Access, vol. 6, pp. 50587–50598, 2018.
[9] D. H. Jeon, “Numerical Modeling of Lithium Ion Battery for Predicting Thermal Behavior in a Cylindrical Cell,” Current Appl. Phys., vol. 14, no. 2, pp. 196–205, Feb. 2014.
[10] J. Li et al., “An Electrochemical-Thermal Model Based on Dynamics Responses for Lithium Iron Phosphate Battery,” J. Power Sources, vol. 255, pp. 130–143, June 2014.
[11] R. Zhao, P. J. Kollmeyer, R. D. Lorenz, and T. M. Jahns, “A compact unified methodology via a recurrent neural network for accurate modeling of lithium-ion battery voltage and state-of-charge,” in , 2017, pp. 5234–5241.
[12] E. Chemali et al., “State-of-charge estimation of li-ion batteries using deep neural networks: a machine learning approach,” Journal of Power Sources, vol. 400, pp. 242–255, 2018.
[13] B. Chen, T. Medini, and A. Shrivastava, “SLIDE: In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems,” arXiv, March 2019.
[14] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, 1st ed. MIT Press, 2016.
[15] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, pp. 436–444, 2015.
[16] S. Mallat, “Understanding deep convolutional networks,” Phil. Trans. R. Soc. A, 2016. [Online]. Available: http://rsta.royalsocietypublishing.org
[17] S. Minaee and A. A. Abdolrashidi, “Deep-emotion: Facial expression recognition using attentional convolutional network,” arXiv, February 2019. [Online]. Available: http://arxiv.org/abs/1902.01019
[18] D. Silver et al., “Mastering chess and shogi by self-play with a general reinforcement learning algorithm,” Science, vol. 362, pp. 1140–1144, Dec. 2018.
[19] E. Chemali, P. J. Kollmeyer, M. Preindl, R. Ahmed, and A. Emadi, “Long short-term memory networks for accurate state-of-charge estimation of li-ion batteries,” IEEE Transactions on Industrial Electronics, vol. 65, no. 8, pp. 6730–6739, 2018.
[20] P. Kollmeyer, A. Hackl, and A. Emadi, “Li-ion battery model performance for automotive drive cycles with current pulse and eis parameterization,” in , 2017, pp. 486–492.
[21] S. Sepasi, R. Ghorbani, and B. Y. Liaw, “Improved extended kalman filter for state of charge estimation of battery pack,” Journal of Power Sources, vol. 255, pp. 368–376, 2014.
[22] M. Charkhgard and M. Farrokhi, “State-of-charge estimation for lithium-ion batteries using neural networks and ekf,” IEEE Transactions on Industrial Electronics, vol. 57, Dec. 2010.
[23] J. Du, Z. Liu, and Y. Wang, “State of charge estimation for li-ion battery based on model from extreme learning machine,” Control Engineering Practice, vol. 26, pp. 11–19, 2014.
[24] Y.-S. Lee, W.-Y. Wang, and T.-Y. Kuo, “Soft computing for battery state-of-charge (bsoc) estimation in battery string systems,” IEEE Transactions on Industrial Electronics, vol. 55, no. 1, Jan. 2014.
[25] W.-Y. Chang, “Estimation of the state of charge for a lfp battery using a hybrid method that combines a rff neural network, an ols algorithm and aga,” Electrical Power and Energy Systems, vol. 53, pp. 603–611, 2013.
[26] F. Chollet, Deep Learning with Python, 1st ed. Manning Publications, 2018.
[27] K. P. Murphy, Machine Learning: A Probabilistic Perspective. The MIT Press, 2012.
[28] R. Anderson et al., “Chapter I – Introduction,” in The Handbook of Artificial Intelligence, A. Barr and E. A. Feigenbaum, Eds., vol. 1. Elsevier, 1981.
[29] M. A. Boden, AI: Its Nature and Future, 1st ed. Oxford University Press, 2016.
[30] W. S. McCulloch and W. Pitts, “A logical calculus of the ideas immanent in nervous activity,” Bulletin of Mathematical Biophysics
[31] Proceedings of the London Mathematical Society, vol. 42, pp. 230–265, 1936.
[32] S. Russell and P. Norvig, Inteligência Artificial, 3rd ed. Rio de Janeiro: Elsevier, 2013. [Online]. Available: http://aima.cs.berkeley.edu/
[33] G. Cybenko, “Approximation by superpositions of a sigmoidal function,” Math. Control Signal Systems, vol. 2, pp. 303–314, 1989. [Online]. Available: https://doi.org/10.1007/BF02551274
[34] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing, 3rd ed. Pearson, 2009.
[35] B. P. Lathi, Signal Processing and Linear Systems. Oxford University Press, 1998.
[36] A. Newell, “Physical symbol systems,” Cognitive Science, vol. 4, pp. 135–183, April–June 1980.
[37] ACM, “A. M. Turing Award 2018,” https://awards.acm.org/about/2018-turing, 2020.
[38] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, “Handwritten Digit Recognition with a Back-Propagation Network,” in Advances in Neural Information Processing Systems. Morgan Kaufmann, 1990, pp. 396–404.
[39] J. McCarthy and E. A. Feigenbaum, “In Memoriam: Arthur Samuel: Pioneer in Machine Learning,” AI Magazine
[40] IBM Journal of Research and Development, vol. 3, no. 3, pp. 210–229, 1959.
[41] W. Ertel, Introduction to Artificial Intelligence, 2nd ed. Springer, 2017.
[42] T. M. Mitchell, Machine Learning. McGraw-Hill Science/Engineering/Math, 1997.
[43] D. Silver et al., “General Reinforcement Learning Algorithm,” arXiv, 2017. [Online]. Available: https://arxiv.org/pdf/1712.01815.pdf?utm_campaign=nathan.ai%20newsletter&utm_medium=email&utm_source=Revue%20newsletter
[44] B. Widrow and M. E. Hoff, “Adaptive switching circuits,” IRE WESCON Convention Record, pp. 96–104, 1960.
[45] F. Rosenblatt, “The Perceptron: a probabilistic model for information storage and organization in the brain,” Psychological Review, vol. 65, pp. 386–408, 1958.
[46] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning Internal Representations by Error Propagation,” in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, D. E. Rumelhart and J. L. McClelland, Eds., vol. 1. Foundations, Cambridge, MA: Bradford Books/MIT Press, 1986.
[47] A. E. Bryson Jr. and Y. C. Ho, Applied Optimal Control. Blaisdell, 1969.
[48] P. J. Werbos, “Beyond Regression: New tools for prediction and analysis in the behavioral sciences,” 1974.
[49] D. B. Parker, “Learning Logic,” Center for Computational Research in Economics and Management Science, MIT, 1985.
[50] Y. LeCun, “A Learning Scheme for Asymmetric Threshold Network,” in Disordered Systems and Biological Organization, E. Bienenstock, F. Fogelman, and G. Weisbuch, Eds. Springer Verlag, 1986.
[51] A. Papoulis, Probability, Random Variables, and Stochastic Processes, 3rd ed. McGraw-Hill, 1996.
[52] D. H. Wolpert, “The Lack of A Priori Distinctions Between Learning Algorithms,” Neural Computation, vol. 8, pp. 1341–1390, 1996.
[53] G. E. Hinton, S. Osindero, and Y. Teh, “A fast learning algorithm for deep belief nets,” Neural Computation, vol. 18, pp. 1527–1554, 2006.
[54] F. Chollet and J. J. Allaire, Deep Learning with R, 1st ed. Manning Publications, 2017.
[55] P. Kollmeyer, “Panasonic 18650PF Li-ion Battery Data,” Mendeley Data, vol. 1, 2018. [Online]. Available: https://data.mendeley.com/datasets/wykht8y7tg/1
[56] A. Gulli, A. Kapoor, and S. Pal, Deep Learning with TensorFlow 2 and Keras, 2nd ed. Packt, 2019.
[57] S. Skansi,