Neuroscience-Inspired Algorithms for the Predictive Maintenance of Manufacturing Systems
IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. X, NO. X, MONTH 2021
Arnav V. Malawade, Student Member, IEEE, Nathan D. Costa, Member, IEEE, Deepan Muthirayan, Member, IEEE, Pramod P. Khargonekar, Fellow, IEEE, Mohammad A. Al Faruque, Senior Member, IEEE
Abstract—If machine failures can be detected preemptively, then maintenance and repairs can be performed more efficiently, reducing production costs. Many machine learning techniques for performing early failure detection using vibration data have been proposed; however, these methods are often power- and data-hungry, susceptible to noise, and require large amounts of data preprocessing. Also, training is usually only performed once before inference, so they do not learn and adapt as the machine ages. Thus, we propose a method of performing online, real-time anomaly detection for predictive maintenance using Hierarchical Temporal Memory (HTM). Inspired by the human neocortex, HTMs learn and adapt continuously and are robust to noise. Using the Numenta Anomaly Benchmark, we empirically demonstrate that our approach outperforms state-of-the-art algorithms at preemptively detecting real-world cases of bearing failures and simulated 3D printer failures. Our approach achieves an average score of 64.71, surpassing state-of-the-art deep-learning (49.38) and statistical (61.06) methods.
Index Terms—Predictive Maintenance, Prognostics, Anomaly Detection, Hierarchical Temporal Memory
I. INTRODUCTION

PREDICTIVE Maintenance (PM) is an emerging paradigm in manufacturing where symptoms of machine degradation are detected before failures occur. It is a major part of the Industry 4.0 and smart manufacturing vision. Using sensor readings, process parameters, and other operational characteristics, PM can help maximize tool life by reducing the number of unnecessary repairs performed while also reducing the likelihood of unexpected failures [1]. In the United States alone, improper maintenance and the resulting outages cost more than 60 billion dollars per year [2]. Thus, smart data-driven paradigms such as PM have the potential to reduce industrial production costs significantly.

Recently, many statistical, machine learning (ML), and deep learning (DL) techniques for PM have been proposed. However, these methods are not without their shortcomings: statistical methods require extensive domain knowledge and often do not generalize well to more complex use cases, while DL and ML techniques often require large amounts of training data and are susceptible to increased error as machines age over time. Furthermore, ML and DL algorithms are highly susceptible to noise, making them insufficiently robust for industrial settings without data preprocessing.

©2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. A. Malawade, N. Costa, D. Muthirayan, P. Khargonekar, and M. Al Faruque are with the Department of Electrical Engineering and Computer Science, University of California - Irvine, Irvine, CA, 92697 USA (e-mail: {malawada, ndcosta, dmuthira, pramod.khargonekar, alfaruqu}@uci.edu). Manuscript received June 11, 2020; revised January 4, 2021.
Due to the high noise level and diversity among industrial systems, PM models that do not require significant preprocessing or domain knowledge are considered more practical [3].

To overcome these issues, we propose the use of a learning algorithm inspired by neuroscience called Hierarchical Temporal Memory (HTM), pioneered by Hawkins and Blakeslee [4]. Using binary sparse distributed representations (SDRs) to represent data and an architecture incorporating feed-forward, lateral, and feedback connections, HTMs emulate the interactions between pyramidal neurons in the neocortex. HTMs are online learning algorithms that require less application-specific tuning, are robust to noise, and adapt to variations in the data as they continuously learn. In practice, this means HTMs can efficiently learn from a single training pass over small training datasets with little to no hyperparameter tuning. These characteristics also enable HTMs to learn in near real-time. For these reasons, they are suitable for practical applications such as detecting early symptoms of failure in manufacturing equipment. In this work, we demonstrate the effectiveness of an HTM-based anomaly detection methodology at detecting these symptoms in roller-element bearings and 3D printers.
A. Related Work
We focus on the specific task of PM on roller-element bearings due to their broad application and utility in manufacturing. We also evaluate Additive Manufacturing (AM), as it is a modern technique that presents unique challenges due to the dynamics of 3D printers. Here, we briefly discuss works related to PM for roller bearings and additive manufacturing.

Many PM methods use statistical models due to their simplicity and explainability. These approaches rely on extracted time- and frequency-domain features. For example, the energy entropy mean and root mean squared (RMS) values of wavelets were used to diagnose ball bearing faults in [5]. In another example, the spectral kurtosis (SK) of vibration and current signals was used to detect and classify the surface roughness of ball bearings in [6]. Using a particle filter method, Zhang et al. performed fault detection on bearings similar to those found in helicopter oil cooler fans [7].

In addition to statistical methods, ML techniques have been applied to a wide array of industrial prognosis tasks.
One such method, AutoRegressive Integrated Moving Average (ARIMA), is one of the most popular techniques for time-series forecasting and was used to predict failures and identify quality defects in a slitting machine in [8]. In another approach, Tobon-Mejia et al. used Mixture of Gaussians HMMs and Wavelet Packet Decomposition to estimate the Remaining Useful Life (RUL) of roller-element bearings [9].

DL methods such as Long Short-Term Memory (LSTM) Networks and Convolutional Neural Networks (CNNs) have also been used extensively for PM. In one example, Feng et al. used an LSTM for detecting anomalies in industrial control systems [10]. Additionally, an RNN-LSTM was used to perform PM on an air booster compressor motor used in oil and gas equipment in [11].

Due to the increased complexity and relatively late adoption of AM systems, PM techniques for AM have not been studied in great detail. Proposed approaches often draw from research in related applications, such as PM for bearings. For example, Yoon et al. evaluated the feasibility of AM equipment fault diagnosis using a piezoelectric strain sensor and an acoustic sensor. In this work, features such as RMS value, kurtosis, skewness, and crest factor were used to detect faults [12]. Deep learning has also been used for AM anomaly detection, such as in [13], where a neural network was used to classify faults in 3D printer vibration data.

Despite the proliferation of statistical, ML, and DL approaches to PM for manufacturing, to the best of our knowledge, no HTM-based solutions have been proposed. However, the structural and temporal properties of HTM algorithms allow them to excel at cross-domain tasks that apply to manufacturing, such as anomaly detection [14].
Since the core objective of PM in manufacturing is detecting early symptoms of part failure, HTMs are a natural candidate for this task. HTMs were shown to match or surpass neural networks at detecting and classifying foreign materials on a conveyor belt in a cigarette manufacturing plant [15]. HTMs have also proven effective at detecting anomalies in crowd movements [16], traffic patterns [17], human vital signs [18], electrical grids [19], and computer hardware [20].
B. Research Challenges
Overall, PM for manufacturing presents the following key research challenges:
1) Identifying time-series anomalies in near real-time despite ambient noise.
2) Learning efficiently from small training datasets to improve applicability to practical use cases.
3) Developing a solution that can be generalized to many heterogeneous manufacturing systems without requiring extensive domain-specific tuning.
4) Adapting to changes in data statistics (i.e., machine aging).
Despite the successes achieved by existing methods in the aforementioned applications, industrial manufacturing systems are diverse and complex, making it difficult to find solutions that generalize across applications. Consequently, PM systems require specialization, which necessitates specialized knowledge and cross-domain skills. This is especially true in the case of bearing-failure prognosis, as bearing design and lifetime management lie squarely in the mechanical and materials engineering domains.

It is difficult for any single technique to address all of these research challenges effectively. For example, statistical methods such as thresholding based on kurtosis or spectral analysis are highly efficient and real-time capable but require explicitly defined health indicators and thresholds, which are machine- and application-specific. Also, stationary methods including RMS, kurtosis, and crest factor are only effective for stationary signals (signals with time-invariant statistical properties), but bearing vibration signals are generally cyclostationary (statistical properties vary cyclically) or non-stationary (statistical properties change depending on speed and load conditions) [21]. Spectral kurtosis is applicable to non-stationary and non-periodic signals but is sensitive to noise and outliers [22].

Classical ML algorithms such as AR Models, Support Vector Machines, Hidden Markov Models (HMM), Random Forests, and k-Nearest Neighbors have been demonstrated for PM in existing work, but require the extraction of explicit health indicators (features) from data [23]. These algorithms also require application-specific hyperparameter tuning, data preprocessing (as they have poor noise robustness [3]), and regular updates of model settings, as they do not adapt to account for machine aging [23].
Moreover, both HMM and AR methods are ineffective on non-stationary signals [21].

In DL algorithms such as neural networks and LSTMs, health indicators can be learned implicitly by the network. However, a network trained for one machine cannot generalize to a new machine without retraining with a large amount of data for hundreds or thousands of epochs. Larger models may be able to generalize better, but the complexity of training and optimizing these models increases drastically with size [23]. This domain-specific training and tuning process can be expensive, time-consuming, and impractical for real-world use cases. Like the ML methods, DL algorithms also have poor noise robustness [24] and require high-quality data, or else performance can suffer significantly [3]. To address this, significant preprocessing steps are often needed to generate clean data for these models [3].

As stated in Section I-A, HTM-based anomaly detection methods have demonstrated success in several distinct fields. However, to the best of our knowledge, no prior work has comprehensively explored HTM's ability to model vibration data or demonstrated its practical value for PM. Overall, all of these existing methods fall short of addressing one or more research challenges.
C. Our Novel Contributions
To address these key research challenges and improve on the PM performance demonstrated by previous works, our paper presents the following contributions:
1) We demonstrate the ability of HTM-based anomaly detectors to detect early symptoms of bearing failure in several months' worth of real-world vibration data. We show that HTMs can efficiently learn with only a single training pass.
2) We demonstrate the ability of HTMs to generalize across applications without much fine-tuning and their ability to continuously learn and adapt by evaluating their anomaly detection performance on a second, highly dynamic application: 3D printer vibration data. These characteristics of HTMs make them more practical for real-world use cases.
3) We compare the performance of HTM anomaly detection methods against state-of-the-art anomaly detection techniques and traditional machine prognosis methods such as condition-based maintenance. Specifically, we evaluate each algorithm's anomaly detection accuracy and robustness to noise.
4) We demonstrate the efficiency and real-time capability of HTM-based prognosis by comparing its execution time with that of the other techniques.
II. BACKGROUND THEORY
A. Hierarchical Temporal Memory
Hierarchical temporal memory is a sequence learning framework modeled after the structure of the neocortex in the human brain [4]. The basic unit of HTM is a neuron modeled after those present in the neocortex (Fig. 1(b)). These neurons are stacked on top of one another to form a column like the 'cortical column' of the neocortex. The final HTM is a composition of many such columns. A single HTM neuron (Fig. 1(c)) is connected to two types of segments: (i) proximal segments (aggregations of feed-forward connections from the input) and (ii) distal segments (aggregations of lateral connections from neurons of the other columns). Each HTM neuron can be in one of three states: (i) inactive (the default state), (ii) predictive, and (iii) active. The predictive state of a neuron is determined by the activity of the distal segments, which in turn is determined by the activation state of the other neurons. A neuron becomes active at any time only if it was in the predictive state at the previous instant, with an exception that will be described in Section III-A. When the sequences of activations are viewed temporally, it is easy to see that the distal segments provide the temporal context for activation and thus capture the temporal relations. The column structure augments this capability of HTM by enabling it to store multiple such overlapping temporal sequences. Further details on the HTM-based anomaly detection methodology are discussed in Section III-A.
B. PM of Roller-Element Bearings
Roller-element bearings perform the critical task of reducing friction between rotating parts in machinery. Generally, catastrophic bearing failures present warning signs such as anomalous vibrations and/or noise. These anomalies can occur due to environmental factors (moisture or debris entering the bearing) as well as installation errors (misalignment, excessive loads, or poor/improper lubrication) [25]. Recently, sensor-based techniques that leverage vibration and temperature data to monitor bearing health have been proposed. For example, the NASA Bearing Dataset and the Pronostia Bearing Dataset contain vibration and temperature data for several bearings which were run until failure [26], [27]. In both datasets, anomalies in the vibration and temperature signals increase in size and frequency as the bearings approach failure, showing a strong correlation between the sensors' readings and system state.
C. PM of 3D Printers
3D printing is a manufacturing process where a physical object is constructed from layers of material in an iterative process. Fused Deposition Modeling (FDM) is a standard technique where melted thermoplastic is extruded through a moving print head nozzle to build each layer. To ensure precision, stepper motors control the extrusion rate of the nozzle as well as the X, Y, and Z-axis movement of the print head. Since the motors, bearings, and belts are moving parts, they are prone to wear and must be regularly maintained to prevent component failures. As shown in [28], these components leak vibration information that can be used by PM systems. However, this leaked information is non-stationary since 3D printers move on multiple axes and change direction and speed often, presenting a challenge for conventional PM methods.

III. METHODOLOGY
A. Anomaly Detection using Hierarchical Temporal Memory
The end-to-end framework for the HTM-based detector is shown in Figure 2. Our methodology for anomaly detection consists of the following steps. First, the time-series vibration data X(t) is taken as input and encoded into a Sparse Distributed Representation (SDR). Next, the SDR is passed through the spatial pooler. The spatial pooler's output is fed into the temporal pooler, which then outputs a prediction for the next activation Π(t_{n+1}). Simultaneously, the prediction from the previous time step Π(t_n) is compared with the column activations in the current time step A(t_n) to give a prediction error value: a high error value indicates that this activation was not expected and may be anomalous. Finally, the anomaly detector uses the historical distribution of anomaly scores to calculate the anomaly likelihood L(t_n) for the current data point based on the prediction error value; if L(t_n) exceeds a set threshold, then X(t_n) is flagged as an anomaly. In the following paragraphs, we describe each of these components in detail.
1) Encoder:
The first stage in processing the input data X(t) is the encoder. The encoder converts the incoming data point X(t) into a sparse distributed representation (SDR). This representation is a vector of binary values, and it is sparse because only a small fraction of the bits are activated for any input. This contrasts with deep learning methods that store and learn a dense, distributed representation. Later, we shall describe the advantages of using a sparse representation. We denote the output of the encoder by x, a 1 × n vector.
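A minimal scalar encoder can illustrate the idea: a value is mapped to a contiguous run of w active bits out of n, so similar inputs produce overlapping SDRs. The sizes n = 400 and w = 21 below are illustrative choices, not the parameters used in our experiments.

```python
import numpy as np

def scalar_encode(value, v_min=0.0, v_max=1.0, n=400, w=21):
    """Encode a scalar into an n-bit SDR with w contiguous active bits.
    Nearby values share active bits, so similarity is preserved."""
    value = min(max(value, v_min), v_max)
    start = int((value - v_min) / (v_max - v_min) * (n - w))  # first active bit
    sdr = np.zeros(n, dtype=np.uint8)
    sdr[start:start + w] = 1
    return sdr

a, b = scalar_encode(0.50), scalar_encode(0.52)
```

Here `a` and `b` overlap in most of their active bits, while encodings of distant values share none; this overlap structure is what the spatial pooler exploits.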
Fig. 1. How neocortical structures are modeled by Hierarchical Temporal Memory. Panels: (a) Human Brain (neocortex, limbic system, reptilian brain); (b) Pyramidal Neuron (apical, proximal, and distal dendrites); (c) HTM Neuron (feedforward input, lateral context, and feedback connections). The neocortex is composed of a large number of interconnected pyramidal neurons, each with proximal (feed-forward), apical (feedback), and distal (lateral) dendrites to connect to other neurons. These relations are modeled in HTM neurons as feed-forward, feedback, and lateral connections.
Fig. 2. HTM Anomaly Detection Framework. The time-series input X(t) is encoded into a Sparse Distributed Representation (SDR). This information is passed through a spatial pooler and a temporal pooler before outputting a prediction Π(t_{n+1}) for the next set of column activations. The prediction error between Π(t_n) and A(t_n) and the historical distribution of anomaly scores are used to determine the anomaly likelihood L(t_n).
2) Spatial Pooling:
The second stage is spatial pooling. The spatial pooler identifies spatial relations between different regions of the encoder's output through the proximal connections. Spatial poolers can also be stacked to identify more complex relations. The proximal segment of each neuron in a column is initialized such that each neuron, where the neurons of the same column share the same proximal segment, is connected to a large fraction of the inputs. The output of this stage is also an SDR representing the columns of the HTM that will be activated in the final output. We denote the spatial pooling operation mathematically by I_k(.), where the input is the list of columns ordered in decreasing order of their proximal segment values, and k indicates the number of columns to be picked for activation from the top of this list. The number k is typically a small fraction of the columns, so the output representation is sparse. Let y_c denote the activation of the columns and P denote the proximal connections, where P is a binary matrix of size n × N. Then

y_c = I_k(xP)    (1)
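Equation (1) can be sketched directly: compute each column's proximal overlap with the input SDR, then keep the top-k columns. The sizes, random connection density, and k below are illustrative assumptions (learning of the proximal connections is omitted).

```python
import numpy as np

rng = np.random.default_rng(0)
n, N, k = 400, 128, 5          # input bits, columns, winning columns (illustrative)

# P: binary proximal connections; each column samples a large fraction of inputs
P = (rng.random((n, N)) < 0.5).astype(np.int32)

def spatial_pool(x, P, k):
    """y_c = I_k(xP): activate the k columns with the largest proximal overlap."""
    overlap = x.astype(np.int32) @ P       # proximal segment value per column
    winners = np.argsort(overlap)[-k:]     # top-k columns
    y_c = np.zeros(P.shape[1], dtype=np.uint8)
    y_c[winners] = 1
    return y_c

x = np.zeros(n, dtype=np.uint8)
x[100:121] = 1                             # a w = 21 input SDR
y = spatial_pool(x, P, k)
```

Because only k of N columns activate, the output stays sparse regardless of the input.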
3) Prediction:
The next stage is prediction. The prediction for the next time step is the predictive state of the HTM at the end of the current time step. Let the weights of the lateral connections of the d-th distal segment of the i-th neuron of the j-th column be D^d_{i,j}. We note that only those connection weights which are above a certain threshold are considered to be established; the rest are set to zero. A neuron (i, j) enters the predictive state provided the sum of activations of at least one of its distal segments exceeds a certain threshold θ_d. Denote the predictive state of a neuron at time t_n by π_{i,j}(t_n). We denote the current activation state of all neurons at time t_n by A(t_n). We denote the total predictive state by the matrix Π(t_n), whose elements are therefore π_{i,j}(t_n). Mathematically, π_{i,j}(t_n) is given by:

π_{i,j}(t_n) = 1 if ∃d s.t. ||D^d_{i,j} ⊙ A(t_n)|| > θ_d, and 0 otherwise,    (2)

where ⊙ denotes the element-wise multiplication operation.
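Equation (2) can be sketched with each distal segment stored as a binary synapse matrix over the layer's M × N neurons; a neuron turns predictive if any of its segments overlaps the current activation A(t_n) by more than θ_d. The dictionary layout and the sizes are illustrative simplifications.

```python
import numpy as np

M, N = 32, 128          # neurons per column, columns (illustrative sizes)
theta_d = 3             # distal activation threshold

def predictive_state(D, A, theta_d, M, N):
    """Pi(t_n) per Eq. (2): neuron (i, j) is predictive if any of its distal
    segments has more than theta_d active synapses onto active neurons.
    D maps (i, j) -> list of binary (M, N) synapse matrices."""
    Pi = np.zeros((M, N), dtype=np.uint8)
    for (i, j), segments in D.items():
        for seg in segments:
            if (seg * A).sum() > theta_d:   # ||D ⊙ A(t_n)|| for one segment
                Pi[i, j] = 1
                break
    return Pi

A = np.zeros((M, N), dtype=np.uint8)
A[0, :5] = 1                                # five currently active neurons
seg = np.zeros((M, N), dtype=np.uint8)
seg[0, :5] = 1                              # synapses onto exactly those neurons
D = {(3, 7): [seg]}                         # one neuron with one distal segment
Pi = predictive_state(D, A, theta_d, M, N)
```

Neuron (3, 7) becomes predictive because its segment's overlap (5) exceeds θ_d = 3; all other neurons stay inactive.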
4) Temporal Pooling:
The final stage is temporal pooling. Temporal pooling computes the activation state A(t_n) (an M × N matrix, where M is the number of neurons per mini-column and N is the number of mini-columns in the layer) of the HTM, which is also the output of the HTM based on the temporal context. A neuron i in column j is activated provided its column is activated, i.e., y_c(j) = 1, and it is in the predictive state, i.e., π_{i,j}(t_{n−1}) = 1. The other neurons in this column are inhibited. If none of the neurons in an active column are in the predictive state, then all the neurons of this column are activated. Here, the predictive state π_{i,j}(t_{n−1}) from the previous time step is the temporal context. This temporal context is updated at the end of this time step as described in the prediction step above. Let a_{i,j}(t_n) be the (i, j)-th element of A(t_n), denoting the activation state of neuron i in column j. Then, the temporal pooling operation can be mathematically described as:

a_{i,j}(t_n) = 1 if y_c(j) = 1 and π_{i,j}(t_{n−1}) = 1; 1 if y_c(j) = 1 and Σ_i π_{i,j}(t_{n−1}) = 0; 0 otherwise.    (3)

Figure 2 shows the different stages of HTM processing in the context of anomaly detection. After activation, the prediction error between the prediction from the previous time step Π(t_n) and the current activation state A(t_n) is computed and passed to the anomaly likelihood block, which uses the historical distribution of anomaly scores to determine if X(t_n) is a true anomaly.
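Equation (3) reduces to a short per-column rule: in each active column, fire the predicted neurons; if the column had no prediction, fire ("burst") every neuron in it. A small sketch with illustrative sizes:

```python
import numpy as np

def temporal_pool(y_c, Pi_prev):
    """A(t_n) per Eq. (3): in each active column, activate the predicted
    neurons; if none were predicted, burst the whole column."""
    M, N = Pi_prev.shape
    A = np.zeros((M, N), dtype=np.uint8)
    for j in range(N):
        if y_c[j] == 1:
            if Pi_prev[:, j].sum() > 0:
                A[:, j] = Pi_prev[:, j]   # predicted neurons win, rest inhibited
            else:
                A[:, j] = 1               # bursting: no prediction matched
    return A

M, N = 4, 6
y_c = np.array([1, 0, 1, 0, 0, 0], dtype=np.uint8)
Pi_prev = np.zeros((M, N), dtype=np.uint8)
Pi_prev[2, 0] = 1                         # column 0 had one predicted neuron
A = temporal_pool(y_c, Pi_prev)
```

Column 0 activates only its predicted neuron, column 2 bursts (it was unpredicted), and inactive columns stay silent, which is exactly the exception to the "predictive first" rule mentioned in Section III-A.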
5) Learning:
HTMs use a Hebbian-type learning algorithm that reinforces the connection weights of the segments that correctly predict the activation at the next time step. Each time step, the weights are re-evaluated as follows. The connection weights of an activated neuron's segments that originated from previously active neurons are increased. The connection weights from neurons that were not active in the previous time step are decreased. Additionally, weights of connections that are wrongly predicted are also decreased but at a lesser rate, i.e., forgetting happens at a slower rate than updating. It is this type of learning that allows HTMs to learn continuously and adapt to changes over the long term. The learning algorithm is discussed in much greater detail in [29].
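The update rule for a single segment can be sketched as below. The permanence increments and the smaller punishment rate are illustrative placeholder values (the actual rates are set by the HTM implementation, see [29]); the key property shown is that wrong predictions are forgotten more slowly than correct ones are reinforced.

```python
import numpy as np

P_INC, P_DEC, P_PUNISH = 0.10, 0.02, 0.004   # illustrative rates, not the paper's

def hebbian_update(perm, pre_active, segment_correct):
    """Permanence update for one distal segment. On a correct prediction,
    synapses from previously active neurons are reinforced and the rest
    decay; on a wrong prediction, active synapses are punished at a much
    slower rate (forgetting is slower than updating)."""
    if segment_correct:
        perm = perm + np.where(pre_active, P_INC, -P_DEC)
    else:
        perm = perm - np.where(pre_active, P_PUNISH, 0.0)
    return np.clip(perm, 0.0, 1.0)

perm = np.full(8, 0.30)                       # 8 synapses at permanence 0.30
pre = np.array([1, 1, 1, 0, 0, 0, 0, 0], dtype=bool)
reinforced = hebbian_update(perm, pre, segment_correct=True)
punished = hebbian_update(reinforced, pre, segment_correct=False)
```

Because P_PUNISH is much smaller than P_INC, a segment that usually predicts correctly survives occasional misses, which is what lets the model adapt gradually as the machine ages.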
6) On Capacity, Robustness, and Efficiency:
Here, we illustrate why HTMs are efficient and robust to noise. Let us consider an HTM with a large n, where n denotes the size of the encoder's output x, a binary vector. Denote by w the maximum number of bits that can be one. Typically, w is small relative to n. Given this, let us define α := w/n. Here, α is a measure of sparsity and denotes the fraction of the bits that can be active in the SDR of size n. An example would be n = 2048 and w = 4, so α ≈ 0.002.

The number of possible unique encodings N_e that can be stored in vector x, given n and w, is given by:

N_e = C(n, w) = n! / (w!(n − w)!)    (4)

For example, if n = 2048 and w = 20, then N_e is on the order of 10^47. Given N_e, the probability that one SDR x will match another SDR y, which is randomly picked, is trivially computable:

P(x = y) = 1/N_e    (5)

Thus, the probability of a false match is, for all practical purposes, zero. This shows that SDRs can store and recall reliably an astronomically large number of vectors. Consequently, it follows that HTMs can store and recall reliably an astronomically large number of sequences.

We can now relax the requirement and say that two SDRs are equivalent if θ (< w) or more bits match. In this case, the matching is allowed an error of up to w − θ bits. Denote by Ω_x(b) the set of sparse vectors (of size n and sparsity α) that have an overlap of b bits with x. Then, the probability that a false match will be generated, P_fm, is given by:

P_fm = (Σ_{b ≥ θ} |Ω_x(b)|) / N_e, where |Ω_x(b)| = C(w, b) × C(n − w, w − b)    (6)

Clearly, the probability of a false match has increased by allowing an error of up to w − θ bits. In the same example as above, if θ = 10, then w − θ = 10; that is, an error of up to 10 bits is allowed. We find that the probability of a false match is still vanishingly small, which for all practical purposes is zero.
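These combinatorial claims are easy to check numerically; a few lines of Python using the sizes from the example above (Eqs. (4)-(6)):

```python
from math import comb

n, w = 2048, 20
N_e = comb(n, w)                      # number of unique SDR encodings, Eq. (4)
p_exact = 1 / N_e                     # exact false-match probability, Eq. (5)

theta = 10                            # match if >= theta of the w bits agree

def omega(b):
    """|Omega_x(b)|: count of SDRs of the same sparsity overlapping x in
    exactly b bits."""
    return comb(w, b) * comb(n - w, w - b)

# Eq. (6): false-match probability with up to w - theta bits of error
p_fm = sum(omega(b) for b in range(theta, w + 1)) / N_e
```

Even after allowing 10 of the 20 bits to differ, `p_fm` remains many orders of magnitude below any practically relevant probability.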
This is what gives SDRs, and thereby HTMs, robustness to noise.

The sparsity of x allows for sparse computation, which makes computations with SDRs very efficient. For a representation x of size n and sparsity α, one does not need to store information on all the bits. Instead, one can just store the addresses of the locations of bits of value one. Then, for an operation like matching, one just needs to check the value of the bits of the vector y at the corresponding locations; this is doable almost in constant time. We can trivially extend this argument to show that the spatial pooling, prediction, and temporal pooling operations described above can also be performed very efficiently in HTMs, thus giving HTMs their computational efficiency. Next, we discuss our experimental setup for demonstrating the performance of the HTM-based anomaly detector.

B. Experimental Setup
We evaluate our proposed methodology on real-world bearing failure and simulated 3D printer failure datasets. Here, we discuss details about these datasets and the scoring system used for evaluation.
1) Bearing Dataset:
We used the NASA Bearing Dataset and the Pronostia Bearing Dataset [26], [27]. The NASA Bearing Dataset contains three tests of bearings run to failure. The Pronostia Bearing Dataset contains vibration snapshots recorded with three different radial load and RPM settings. The accelerometer data for Test 2 of the NASA Dataset is shown in Figure 3. In total, our testing set consists of 40 vibration data files and 191 labeled anomalies.
2) 3D Printer Dataset:
Our experimental testbed for collecting vibration data from a 3D printer is shown in Figure 4. The 3D printer uses one stepper motor to control each movement axis (X, Y, and Z). We placed one accelerometer directly behind each stepper motor to capture vibration data from prints of various 3D objects. To the best of our knowledge, no publicly available 3D printer component-failure datasets exist, and generating real-world failures would risk damaging our equipment. Thus, we instead opted to generate synthetic anomalies in the 3D printer vibration data.

3D printer vibration signals are inherently non-stationary, meaning that their statistical properties vary with time. However, since printers contain bearings and rotating components with similar dynamics, they share the same time-series and frequency-domain features as those correlated with bearing health, such as power spectral density (PSD) [21], [22]. For example, in Figure 3 it is clear that the overall power of the vibration signal increases as the bearing nears failure. Intuitively, this same phenomenon will occur in a 3D printer as components wear out. Thus, we synthesized anomalies in the 3D printer vibration data by mapping the PSD from our bearing failure data to the 3D printer data. This composition enabled us to simulate the magnitude changes characteristic of bearing and component failures in the 3D printer while preserving the frequency components unique to the 3D printer.

Our PSD mapping algorithm, shown in Algorithm 1, operates on a sliding window over one bearing vibration file and one 3D printer vibration file. For each window t, the following steps are performed: First, the Fast Fourier Transform (FFT) X_b[t] of the bearing time-series data b[n] is calculated for a pre-set frequency bin size. Next, the power in each frequency bin is calculated. Then, we calculate the ratio C between the previous window's power value and the current power value in each bin.
This ratio is used to scale the corresponding frequency bin in the FFT of the 3D printer data FFT(p[t]),

Fig. 3. Accelerometer Data from Test 2 of the NASA Dataset [26] (annotations: start of test, anomalies start appearing, bearing failure). Symptoms of bearing failure can be seen on 2/17 and 2/18 before the bearing's outer race failed on 2/19.
Fig. 4. Experimental testbed used to collect vibration data from our 3D printer (vibration sensor placement and data-collection computer shown). Three accelerometers were placed on the printer in total; one sensor was placed directly behind each of the printer's three stepper motors.

yielding an FFT with synthesized anomalies X_s[t]. Finally, the Inverse FFT (IFFT) of X_s[t] is taken and added to the output at location s[t]. The result after all iterations is a 3D printer vibration signal with synthesized anomalies s[n]. Using this mapping algorithm, we produced a simulated 3D printer failure dataset containing 15 test cases and 57 hand-labeled anomalies.

Algorithm 1: PSD Mapping Algorithm
Result: 3D printer data with anomalies: s[n]
Initialize bearing and 3D printer data: b[n], p[n];
Initialize output signal: s[n] ← [0, ..., 0];
t ← 1;
while t < length(p) do
    X_b[t] ← FFT(b[t]);
    P_b[t] ← |X_b[t]|²;
    C ← sqrt(P_b[t] / P_b[t − 1]);
    X_s[t] ← FFT(p[t]) ⊙ C;
    s[t] ← s[t] + IFFT(X_s[t]);
    t ← t + 1;
end
3) Anomaly Detectors:
To evaluate the performance of HTMs at PM, we use two HTM-based anomaly detectors with slightly different temporal memory implementations, which we denote as HTM [14] and TM-HTM [30]. To explore the effectiveness of anomaly likelihood for HTM-based detectors, we evaluated HTM and TM-HTM with three different anomaly likelihood configurations:
1) No anomaly likelihood: the prediction error of the HTM was directly used as the anomaly score.
2) Historical distribution (HD): the implementation described in Section III-A.
3) LSTM-based predictor (LP): the HD anomaly likelihood block was replaced with a 2-layer LSTM predictor trained to predict normal HTM prediction error values in order to filter out false positives/noise. The prediction error of the LSTM was used as the final anomaly score.

We also evaluated baseline and state-of-the-art anomaly detectors, including an RNN-based detector configured to use LSTM cells (denoted as LSTM) [31] (similar to [10], [11]), Windowed Gaussian (based on the tail probability of the distribution over a sliding window), a threshold-based detector (similar to condition-based maintenance and [5]), EXPoSE [32], Contextual Anomaly Detector (CAD-OSE) [33], Relative Entropy [34], Etsy Skyline [35], KNN Conformal Anomaly Detector (KNN-CAD) [36], Bayesian Changepoint (BC) [37], Random (random anomaly score), and Null (constant anomaly score). All of the listed algorithms except LSTM were exposed to the training data once before testing and updated their models as they were exposed to unseen test data. LSTM was trained for over 1000 epochs on the training data and was tested with the model settings that resulted in the lowest validation loss. LSTM was tested offline, meaning that it did not update its model weights during testing. The LP anomaly likelihood configuration was also trained in this manner but used the HTM output as its input data instead.
4) Scoring:
To score each algorithm fairly, we rely on the Numenta Anomaly Benchmark (NAB) [14], which was designed to benchmark anomaly detection algorithms against one another. It contains a built-in anomaly scoring algorithm, score normalization, and three threshold optimization settings: standard, low false positives (Low FP), and low false negatives (Low FN). NAB takes in datasets with labeled anomalies and produces anomaly windows. These windows are used to score anomaly detectors on how precisely they can pinpoint anomalies; early and on-time detections are rewarded, while very early and late detections are penalized.

The NAB scoring function is as follows: given an application profile A = [A_TP, A_FP, A_TN, A_FN] specifying the weights for each kind of detection, and the position y of a detection relative to the anomaly window, the scoring function for each detection is:

    σ_A(y) = (A_TP − A_FP) (2 / (1 + e^{5y}) − 1)    (7)

These scores are summed over all the detections in a file, and the weighted penalty A_FN · f_d is deducted for the f_d missed detections. The summed score is then normalized to a 0-100 scale, where 0 represents performance equivalent to (or worse than) the Null detector and 100 represents a perfect anomaly detector. An example of the scoring functionality is shown in Figure 5. To provide ground-truth anomaly locations, we followed the official NAB anomaly labeling guide and manually labeled the anomalies in each dataset. The first 15% of each vibration data file was used for training, with the remaining 85% used for testing and scoring.
Fig. 5. NAB scoring functionality: detection scores are assigned according to the scoring function. The anomaly detected in this example is given a score of 0.65.
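A minimal sketch of a NAB-style per-detection score using the scaled-sigmoid form from NAB's reference implementation. The profile weights below are illustrative assumptions loosely based on NAB's standard profile, not the benchmark's exact values:

```python
import math

def detection_score(y, a_tp=1.0, a_fp=0.11):
    """Scaled-sigmoid score for one detection, in the spirit of Eq. (7).

    y is the detection's position relative to the anomaly window (negative
    inside the window, positive after it ends). Early/on-time detections
    score positively; late detections score negatively.
    """
    return (a_tp - a_fp) * (2.0 / (1.0 + math.exp(5.0 * y)) - 1.0)
```

For example, a detection at the start of the window (y = −1) scores near the full true-positive credit, a detection at the window's end (y = 0) scores zero, and a detection well after the window (y = 1) incurs a nearly symmetric penalty.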
IV. RESULTS
A. Roller Bearing Anomaly Detection
Table I shows the NAB results for the selected algorithms on the labeled bearing failure dataset, as well as the total running time of each algorithm. The runtime was recorded over the complete dataset using a PC with an Intel Core i7-7700K processor. As shown in Table I, TM-HTM+HD achieved the highest anomaly detection scores for the Standard and Low FN profiles (67.05 and 73.33, respectively), while HTM+LP achieved the highest score for the Low FP profile. The approach that scored closest to the HTMs was Windowed Gaussian. HTM and HTM+LP performed better than TM-HTM and TM-HTM+LP, indicating that TM-HTM's implementation only works well with the HD anomaly likelihood block.

As expected, the statistical methods (Windowed Gaussian, Threshold-Based, Relative Entropy) processed the dataset faster than the DL, ML, and HTM-based methods, albeit with lower detection performance. On average, the HTMs using HD were 1.41x slower than the HTMs with no anomaly likelihood and 3.76x faster than the HTMs using LP. TM-HTM+HD processed the dataset faster than LSTM.

TABLE I
NORMALIZED NAB SCORES FOR ANOMALY DETECTION ON THE BEARING FAILURE DATASET.
(Columns: Anomaly Detector; Standard, Low FN, and Low FP profile scores; Runtime (s). TM-HTM+HD (Ours): 67.05 (Standard), 73.33 (Low FN).)

To evaluate the qualitative performance of each anomaly detector, we plotted the anomaly scores over time for each detector for Test 1 of the Pronostia Bearing dataset and compared them to the labeled ground-truth anomaly windows in Figure 6.
Fig. 6. Anomaly scores for each detector in comparison to the ground-truth anomaly windows for Test 1 of the Pronostia Bearing Dataset.
B. 3D-Printer Anomaly Detection
Table II shows our experimental results for the 3D printer dataset. HTM+HD achieved the highest score on the Low FN profile, while LSTM achieved the highest scores on the Standard and Low FP profiles. On both applications, the HTM, TM-HTM, and TM-HTM+LP detectors performed worse than the HTM+HD, HTM+LP, and TM-HTM+HD detectors. Overall, the use of HD anomaly likelihood yielded the best HTM performance across applications. Each algorithm's execution time is consistent with the results shown in Table I.

V. DISCUSSION
A. Overall Performance and Adaptability
TABLE II
NORMALIZED NAB SCORES FOR ANOMALY DETECTION ON THE PRINTER DATASET.
(Columns: Anomaly Detector; Standard, Low FN, and Low FP profile scores; Runtime (s).)

Interestingly, algorithms that performed well on the bearing dataset, such as EXPoSE and Etsy Skyline, performed worse on the 3D printer dataset. Additionally, algorithms that performed worse on the bearing dataset, such as LSTM and BC, performed much better on the 3D printer dataset. Our HTM-based methodology using HD anomaly likelihood achieved consistently high performance on both applications without any hyperparameter tuning, demonstrating that this configuration can generalize and adapt to different applications without domain-specific tuning. This result also suggests that HTMs benefit significantly from the inclusion of an HD anomaly likelihood block.

Also, HTM+LP was the best-performing model on the Low FP profile for the bearing dataset; however, this performance was not replicated on the 3D printer dataset. Similarly, LSTM beat HTM on the Standard and Low FP profiles for the 3D printer dataset while performing worse than HTM on the bearing dataset. Hence, our results suggest that LSTMs are highly data-dependent and need to be re-tuned for every machine and/or application, which makes the LSTM approach time-consuming, expensive, and impractical for real-world applications.

The benefits of HTM's continuous learning capability are clearly shown in Figure 6: after identifying earlier anomalies, the HTM-based approaches learn the new baseline for the signal and can still pinpoint future anomalies despite higher signal amplitudes. CAD-OSE also appears to learn continuously, but not as well as the HTMs.

B. Real-Time Detection Capability
In addition to detection accuracy and precision, an optimal PM system should be able to detect failure symptoms in real time to allow adequate time for repairs to be scheduled and performed. However, part failures are infrequent and generally present progressive symptoms before failure, so a hard real-time requirement for processing raw sensor data may unnecessarily limit the complexity (and, subsequently, the performance) of anomaly detection methods. Thus, we evaluate the anomaly detectors in the context of "soft real-time," where we determine whether each detector can process a subsampled data segment before the next subsampled data segment arrives. For example, 1 second of data can be recorded each minute as a data segment to reduce data size while still ensuring that a wide range of vibration frequencies is captured at frequent intervals.

Both HTM+HD and TM-HTM+HD were able to process the complete bearing failure dataset in under 100 minutes; since the bearing dataset contains several months' worth of vibration data and minimal data preprocessing was performed (subsampling and timestamping), this demonstrates that HTMs can accurately detect failure symptoms in real time, meaning that machine operators can be notified of degradation promptly. Other complex algorithms, such as CAD-OSE, KNN-CAD, and EXPoSE, had execution times on the same order of magnitude as the HTMs and are thus also capable of real-time anomaly detection. Although HTM+LP, TM-HTM+LP, and LSTM took longer to process the dataset than HTM+HD and TM-HTM+HD, they can still be considered real-time due to the aforementioned dataset characteristics. However, the significant training time associated with the LSTM (over 12 hours on our hardware platform) and the need for application-specific hyperparameter tuning put LSTM at a disadvantage in terms of applicability to practical use cases.
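The soft real-time criterion above reduces to a simple throughput check: the average time to process one subsampled segment must be shorter than the interval at which segments arrive. The function and parameter names below are ours, for illustration:

```python
def is_soft_realtime(runtime_s, n_segments, segment_period_s=60.0):
    """Check the soft real-time criterion described above (sketch).

    A detector qualifies if its average per-segment processing time is less
    than the arrival interval of subsampled segments (e.g., 1 s of vibration
    data recorded every 60 s, so segments arrive once per minute).
    """
    per_segment_s = runtime_s / n_segments
    return per_segment_s < segment_period_s
```

For instance, a detector that needs 100 minutes (6000 s) for a dataset of 10,000 one-minute segments averages 0.6 s per segment, comfortably within the 60 s budget.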
C. Tunability and Robustness to Noise
Figure 6 clearly shows HTM's ability to pinpoint anomalies while remaining robust to noise in the input. This robustness is likely due to HTM's use of sparse encodings, which make it unlikely that noise-induced bit errors in the input will affect the bits corresponding to the input pattern. From the figure, it is also clear that the HTM implementations using anomaly likelihood blocks were more robust to noise outside of the anomaly windows than HTM or TM-HTM alone, likely because the anomaly likelihood components filter out smaller detections to isolate only the most plausible anomalies. HTM+HD and TM-HTM+HD detected anomalies earlier than the other configurations, albeit with slightly more false positives. The outputs of the different HTMs stand in stark contrast to the highly variable anomaly score outputs of Windowed Gaussian, EXPoSE, KNN-CAD, and BC, among others. These detectors record high anomaly scores even when there is relatively little noise in the input, meaning that they will likely suffer from false positives at higher noise levels.

A detector's threshold can be tuned to account for higher noise levels; however, for detectors such as Windowed Gaussian, which used the maximum detection threshold of 1.0, the threshold cannot be increased further to reduce sensitivity. In contrast, TM-HTM+HD used a threshold of 0.5497 on the Standard profile. Thus, although Windowed Gaussian outperformed TM-HTM+HD on the Low FP scoring profile, it lacks tunability and will likely perform much worse than this HTM configuration in noisier environments.

LSTM appears to be robust to noise, as shown in Figure 6. However, it is clear from the figure that it missed some of the earlier anomaly windows completely. In the context of PM, this can mean that an observer will only be warned of degradation later and will not have much time to organize repairs. Overall, our methodology demonstrates
significant noise robustness, better tunability, and the ability to detect early anomalies as well as larger, late-stage anomalies.
D. Limitations and Future Work
A related PM problem is Remaining Useful Life (RUL) estimation. In many cases, RUL estimation and anomaly detection go hand in hand as part of a comprehensive PM system. Although we did not evaluate the performance of HTM at RUL estimation, the core HTM architecture is well suited to sequence prediction and could likely be applied to this problem; we leave this for future work.

Another limitation of our work is the use of synthesized 3D printer anomalies instead of real-world examples of 3D printer failures. Due to resource constraints, we opted not to perform these experiments and used synthetic failure data instead. Whether HTM's performance on synthetic anomalies translates to real-world PM remains an open research problem.
E. Feasibility
The idea of predicting machine failures in advance is not new; many variants of PM systems have already been implemented in real-world manufacturing applications. However, based on our results, we believe that HTM is a better solution than current state-of-the-art methods. Our results demonstrate that HTMs are efficient enough to run on consumer-grade processors while learning and adapting continuously. Additionally, HTMs can easily be added to existing PM systems, as they only require time-series sensor inputs, which likely already exist in the system. As shown by our results, the industry-standard LSTM requires a significant amount of training time (over 1000 epochs) as well as application-specific tuning. In contrast, HTMs do not require any application-specific parameter tuning and are essentially plug-and-play, since they only need to be trained with a single pass over normal sensor data. These characteristics make HTMs an extremely viable, out-of-the-box solution for industrial PM.

VI. CONCLUSION
Existing methods for predicting machine failures from sensor data are limited in practicality by shortcomings in noise resistance, efficiency, and adaptability. Our experiments demonstrated that our methodology outperforms state-of-the-art approaches at detecting anomalies in both bearing and 3D printer failure data with minimal to no preprocessing or application-specific tuning. On the Standard scoring profile, our methodology using HD anomaly likelihood achieved an average NAB score of 64.71. In comparison, the other top algorithms, LSTM and Windowed Gaussian, achieved average scores of 49.38 and 61.06, respectively. Furthermore, our qualitative results show that our methodology is significantly more noise-resistant than the Windowed Gaussian, KNN-CAD, EXPoSE, and BC detectors, which we attribute to the use of SDRs and an anomaly likelihood component. We also demonstrated that our methodology is real-time capable, with an execution time on the same order of magnitude as state-of-the-art methods. Consequently, we conclude that HTM-based anomaly detection is a novel, practical solution for a wide range of industrial PM applications.

ACKNOWLEDGMENT
This work was partially supported by the National Science Foundation (NSF) under awards CMMI-1739503 and ECCS-1839429, as well as by Graduate Assistance in Areas of National Need (GAANN) under award P200A180052. Any opinions, findings, conclusions, or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.

REFERENCES
[1] C. Scheffer and P. Girdhar, Practical Machinery Vibration Analysis and Predictive Maintenance. Elsevier, 2004.
[2] R. K. Mobley, An Introduction to Predictive Maintenance. Elsevier, 2002.
[3] J. Fausing Olesen and H. R. Shaker, "Predictive maintenance for pump systems and thermal power plants: State-of-the-art review, trends and challenges," Sensors, vol. 20, no. 8, p. 2425, 2020.
[4] J. Hawkins and S. Blakeslee, On Intelligence: How a New Understanding of the Brain Will Lead to the Creation of Truly Intelligent Machines. Macmillan, 2007.
[5] O. Seryasat, F. Honarvar, A. Rahmani et al., "Multi-fault diagnosis of ball bearing using FFT, wavelet energy entropy mean and root mean square (RMS)," IEEE, 2010, pp. 4295-4299.
[6] F. Immovilli, M. Cocconcelli, A. Bellini, and R. Rubini, "Detection of generalized-roughness bearing fault by spectral-kurtosis energy of vibration or current signals," IEEE Transactions on Industrial Electronics, vol. 56, no. 11, pp. 4710-4717, 2009.
[7] B. Zhang, C. Sconyers, C. Byington, R. Patrick, M. E. Orchard, and G. Vachtsevanos, "A probabilistic fault detection approach: Application to bearing fault detection," IEEE Transactions on Industrial Electronics, vol. 58, no. 5, pp. 2011-2018, 2010.
[8] A. Kanawaday and A. Sane, "Machine learning for predictive maintenance of industrial machines using IoT sensor data," IEEE, 2017, pp. 87-90.
[9] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A data-driven failure prognostics method based on mixture of Gaussians hidden Markov models," IEEE Transactions on Reliability, vol. 61, no. 2, pp. 491-503, 2012.
[10] C. Feng, T. Li, and D. Chana, "Multi-level anomaly detection in industrial control systems via package signatures and LSTM networks," IEEE, 2017, pp. 261-272.
[11] T. Abbasi, K. H. Lim, and K. San Yam, "Predictive maintenance of oil and gas equipment using recurrent neural network," in IOP Conference Series: Materials Science and Engineering, vol. 495, no. 1. IOP Publishing, 2019, p. 012067.
[12] J. Yoon, D. He, and B. Van Hecke, "A PHM approach to additive manufacturing equipment health monitoring, fault diagnosis, and quality control," in Proceedings of the Prognostics and Health Management Society Conference, vol. 29. Citeseer, 2014, pp. 1-9.
[13] C.-T. Yen and P.-C. Chuang, "Application of a neural network integrated with the internet of things sensing technology for 3D printer fault diagnosis," Microsystem Technologies, pp. 1-11, 2019.
[14] S. Ahmad, A. Lavin, S. Purdy, and Z. Agha, "Unsupervised real-time anomaly detection for streaming data," Neurocomputing, vol. 262, pp. 134-147, 2017.
[15] L. Rodriguez-Cobo, P. B. Garcia-Allende, A. Cobo, J. M. Lopez-Higuera, and O. M. Conde, "Raw material classification by means of hyperspectral imaging and hierarchical temporal memories," IEEE Sensors Journal, vol. 12, no. 9, pp. 2767-2775, 2012.
[16] A. Bamaqa, M. Sedky, T. Bosakowski, and B. B. Bastaki, "Anomaly detection using hierarchical temporal memory (HTM) in crowd management," in Proceedings of the 2020 4th International Conference on Cloud and Big Data Computing, 2020, pp. 37-42.
[17] A. Almehmadi, T. Bosakowski, M. Sedky, and B. B. Bastaki, "HTM based anomaly detecting model for traffic congestion," in Proceedings of the 2020 4th International Conference on Cloud and Big Data Computing, 2020, pp. 97-101.
[18] B. B. Bastaki, "Application of hierarchical temporal memory to anomaly detection of vital signs for ambient assisted living," Ph.D. dissertation, Staffordshire University, 2019.
[19] A. Barua, D. Muthirayan, P. P. Khargonekar, and M. A. Al Faruque, "Hierarchical temporal memory based one-pass learning for real-time anomaly detection and simultaneous data prediction in smart grids," IEEE Transactions on Dependable and Secure Computing, 2020.
[20] S. Faezi, R. Yasaei, A. Barua, and M. A. Al Faruque, "Brain-inspired golden chip free hardware trojan detection," IEEE Transactions on Information Forensics & Security, 2021.
[21] W. Yan, H. Qiu, and N. Iyer, "Feature extraction for bearing prognostics and health management (PHM) - a survey (preprint)," Air Force Research Lab, Wright-Patterson AFB, OH, Materials and Manufacturing ..., Tech. Rep., 2008.
[22] D. Wang, K.-L. Tsui, and Q. Miao, "Prognostics and health management: A review of vibration based bearing and gear health indicators," IEEE Access, vol. 6, pp. 665-676, 2017.
[23] J. Wang, Y. Ma, L. Zhang, R. X. Gao, and D. Wu, "Deep learning for smart manufacturing: Methods and applications," Journal of Manufacturing Systems, vol. 48, pp. 144-156, 2018.
[24] M. Kordos and A. Rusiecki, "Reducing noise impact on MLP training," Soft Computing.
[26] et al., "Bearing data set," IMS, University of Cincinnati, NASA Ames Prognostics Data Repository, Rexnord Technical Services, 2007.
[27] P. Nectoux, R. Gouriveau, K. Medjaher, E. Ramasso, B. Chebel-Morello, N. Zerhouni, and C. Varnier, "Pronostia: An experimental platform for bearings accelerated degradation tests," in IEEE International Conference on Prognostics and Health Management, PHM'12. IEEE Catalog Number: CPF12PHM-CDR, 2012, pp. 1-8.
[28] S. R. Chhetri and M. A. Al Faruque, "Side channels of cyber-physical systems: Case study in additive manufacturing," IEEE Design & Test, vol. 34, no. 4, pp. 18-25, 2017.
[29] J. Hawkins and S. Ahmad, "Why neurons have thousands of synapses, a theory of sequence memory in neocortex," Frontiers in Neural Circuits, vol. 10, p. 23, 2016.
[30] Numenta, "Numenta temporal memory implementation," Feb. 2020, [Online; accessed 10 Feb. 2020]. Available: https://github.com/numenta/nupic.core/blob/master/src/nupic/algorithms/TemporalMemory.hpp
[31] J. Park, "RNN based time-series anomaly detector model implemented in PyTorch," 2018, [Online code repository]. Available: https://github.com/chickenbestlover/RNN-Time-series-Anomaly-Detection
[32] M. Schneider, W. Ertel, and F. Ramos, "Expected similarity estimation for large-scale batch and streaming anomaly detection," Machine Learning, vol. 105, no. 3, pp. 305-333, 2016.
[33] M. Smirnov, "Contextual anomaly detector," Aug. 2016, [Online code repository]. Available: https://github.com/smirmik/CAD
[34] C. Wang, K. Viswanathan, L. Choudur, V. Talwar, W. Satterfield, and K. Schwan, "Statistical techniques for online anomaly detection in data centers," IEEE, 2011, pp. 385-392.
[35] A. Stanway, "Etsy Skyline," Oct. 2015, [Online code repository]. Available: https://github.com/etsy/skyline
[36] E. Burnaev and V. Ishimtsev, "Conformalized density- and distance-based anomaly detection in time-series data," arXiv preprint arXiv:1608.04585, 2016.
[37] R. P. Adams and D. J. MacKay, "Bayesian online changepoint detection," arXiv preprint arXiv:0710.3742, 2007.
Arnav V. Malawade received a B.S. in Computer Science and Engineering from the University of California, Irvine (UCI) in 2018. He is currently an M.S./Ph.D. student studying Computer Engineering at UCI under the supervision of Professor Mohammad Al Faruque. His research interests include the design and security of cyber-physical systems in connected/autonomous vehicles, manufacturing, IoT, and healthcare.
Nathan D. Costa received a B.S. in Computer Science and Engineering from the University of California, Irvine (UCI) in 2020. He is currently applying to industries relevant to his interests, namely embedded software development and embedded system design.
Deepan Muthirayan is currently a Postdoctoral Researcher in the Department of Electrical Engineering and Computer Science at the University of California, Irvine. He obtained his Ph.D. from the University of California, Berkeley (2016) and his B.Tech./M.Tech. degree from the Indian Institute of Technology Madras (2010). His doctoral thesis focused on market mechanisms for integrating demand flexibility in energy systems. Before his term at UC Irvine, he was a postdoctoral associate at Cornell University, where his work focused on online scheduling algorithms for managing demand flexibility. His current research interests include control theory, machine learning, topics at the intersection of learning and control, online learning, online algorithms, game theory, and their application to smart systems.
Pramod P. Khargonekar received the B.Tech. degree in electrical engineering from the Indian Institute of Technology, Bombay, India, in 1977, and the M.S. degree in mathematics (1980) and the Ph.D. degree in electrical engineering (1981) from the University of Florida. He was Chairman of the Department of Electrical Engineering and Computer Science from 1997 to 2001 and also held the position of Claude E. Shannon Professor of Engineering Science at the University of Michigan. From 2001 to 2009, he was Dean of the College of Engineering, and he was Eckis Professor of Electrical and Computer Engineering at the University of Florida until 2016. After serving briefly as Deputy Director of Technology at ARPA-E in 2012-13, he was appointed by the National Science Foundation (NSF) to serve as Assistant Director for the Directorate of Engineering (ENG) in March 2013, a position he held until June 2016. Currently, he is Vice Chancellor for Research and Distinguished Professor of Electrical Engineering and Computer Science at the University of California, Irvine. His research and teaching interests are centered on theory and applications of systems and control. He has received numerous honors and awards, including the IEEE Control Systems Award, the IEEE Baker Prize, the IEEE CSS Axelby Award, the NSF Presidential Young Investigator Award, and the AACC Eckman Award, and he is a Fellow of IEEE, IFAC, and AAAS.