A Novel Algorithm for Optimized Real Time Anomaly Detection in Timeseries
Krishnam Kapoor
Indian Institute of Technology, Kharagpur, India
[email protected]
Abstract. Observations in data which are significantly different from their neighbouring points but cannot be classified as noise are known as anomalies or outliers. These anomalies are a cause of concern, and a timely warning about their presence could be valuable. In this paper, we have evaluated and compared the performance of popular algorithms from the domains of Machine Learning and Statistics in detecting anomalies on both offline and real time data. Our aim is to come up with an algorithm which can handle all types of seasonal and non-seasonal data effectively and is fast enough to be of practical utility in real time. It is important to detect not only global anomalies but also the ones which are anomalous owing to their local surroundings. Such outliers can be termed contextual anomalies, as they derive their context from the neighbouring observations. We also require a methodology to automatically determine the presence of seasonality in the given data. For detecting the seasonality, the proposed algorithm takes up a curve fitting approach rather than model based anomaly detection. The proposed model also introduces a unique filter which assesses the relative significance of local outliers and removes the ones deemed insignificant. Since the proposed model fits polynomials in buckets of the time series data, it does not suffer from problems such as heteroskedasticity and breakout, unlike its statistical alternatives such as ARIMA, SARIMA and Holt Winter. Experimental results show that the proposed algorithm performs better on both real as well as artificially generated datasets.

Keywords: Outliers, Timeseries, Anomalies, Real Time, Curve-Fitting, Periodicity Detection, Non Linear Regression.
1 Introduction

Anomalies can be defined as observations that deviate so much from other observations as to arouse suspicion that they were caused by a different mechanism (Hawkins, 1980 [1]). Anomaly detection in time series is an important area with respect to business. Timely alerts about anomalous behaviour of data can save millions of dollars worth of losses. Identification and subsequent treatment of anomalous data is also required for developing a reliable forecasting model. Although an extensive amount of work has been done in this field [2], none of the existing algorithms has the desired accuracy, performance or auto tuning capacity to be used for real time anomaly detection on business data. So, we present a novel algorithm which has a high degree of accuracy in identifying local anomalies and is optimised enough to be of practical utility for real time data. We adopted a local polynomial based approach which fits local regression models to small segments of our data. These local models are subsequently used to compute the residual time series, which is then compared with an adaptive threshold to identify the anomalies. This adaptive threshold adjusts itself to detect anomalies which occur owing to the local context of the neighbourhood. We then employ a classifier in the form of a filter which determines whether a detected anomaly is a false alarm. Any anomaly which does not satisfy the threshold criteria of this filter is discarded as a false positive. This enables the algorithm to effectively detect anomalies even in the presence of high noise.

Employing such a method to detect anomalies requires two very important input parameters: the size of the local window in which the polynomial is fitted and the order of the polynomial to be fitted in this window. The first parameter can be determined by detecting the periodicity of the data. If the data under consideration turns out to be periodic, it is naturally a good idea to take the window size equal to the periodicity of the data.
We employ a non-parametric method to detect the periodicity accurately. Our algorithm also handles problems such as loss of periodicity* and the presence of multiple periodicities in the data. The second parameter can then be estimated from the calculated window size.

In this paper, we have evaluated the performance of 5 different outlier detection methods against our algorithm on various real as well as artificial data sets. The algorithms compared are based on different statistical and machine learning models such as SARIMA, Holt Winter, One Class Support Vector Machines and FB Prophet. Since we did not come across any standard dataset tagged for outliers, we resort to comparison based analysis on different datasets which were manually annotated by industry experts.

* The data under consideration might become aperiodic for a short interval due to noise.
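The local-regression idea outlined above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `local_fit_residuals` and the default window size and polynomial order are our own choices for the example.

```python
import numpy as np

def local_fit_residuals(series, window=10, order=1):
    """Fit a low-order polynomial to each contiguous bucket of the
    series and return the residuals (observed minus local fit)."""
    series = np.asarray(series, dtype=float)
    fitted = np.empty_like(series)
    for start in range(0, len(series), window):
        end = min(start + window, len(series))
        x = np.arange(start, end)
        # Fit a local polynomial to this bucket only.
        coeffs = np.polyfit(x, series[start:end], deg=order)
        fitted[start:end] = np.polyval(coeffs, x)
    return series - fitted

# A noisy line with one injected spike: the spike dominates the residuals.
rng = np.random.default_rng(0)
y = np.arange(50) * 0.5 + rng.normal(0, 0.1, 50)
y[25] += 5.0
resid = local_fit_residuals(y)
anomalies = np.where(np.abs(resid) > 2 * resid.std())[0]
```

Because each bucket is fitted independently, a level shift or variance change in one part of the series does not contaminate the fit elsewhere, which is the property the paper relies on.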
2 Related Work
We have divided the various outlier detection algorithms present in the literature into four major categories.

2.1 Statistical Models

Business and economic time series are often complex and exhibit difficult periodic patterns. A large number of model based approaches (S. C. Hillmer and G. C. Tiao [3]) have been proposed which assume the time series to be a multiplicative or additive combination of components such as trend, seasonality and residue. Such temporal decomposition methods go all the way back to the Holt-Winters methodology (Chatfield 1978 [4]). These approaches try to model each of these components separately. However, these decompositions are often suited to the problem they are designed for and thus fail to generalize to all cases.

ARIMA based models (Chatfield 2016 [5]) have also been used heavily to address the problem of outlier detection. In the case of seasonal data, more generalized SARIMA models have been employed. The problem with these models is that they are linear combinations of Autoregressive and Moving Average terms, whereas many business time series are non-linear in nature. Although the problem of automatic determination of order has been addressed using auto-arima [6], these models still suffer from problems such as heteroskedasticity and automatic periodicity determination. Apart from this, these statistical models assume that the time series is ergodic with a Gaussian noise model, which might not hold in every case.

2.2 Machine Learning Models

Fig. 1. Automated determination of periodicity (Source: Periodicity Paper [7])

Automated determination of periodicity [7] is an interesting approach. Many neural network [8] based approaches have been tried, and even combinations of various machine learning and statistical models, for instance neural networks and SARIMA [9], have been studied for the purpose of forecasting. This approach can as well be employed for the problem of outlier detection, but the problem lies in the fact that time series data is usually short and hence neural networks are often not trained properly.
Anomaly classifiers based on One-class SVMs [10] are an attractive alternative because they can naturally detect outliers in a set of vectors. However, these models do not generalize to all scenarios and require the proportion of outliers in the dataset as a hyper-parameter, hence limiting their usage for real time data. Netflix's anomaly detection based on Principal Component Analysis [11] and Twitter's S-H-ESD [12] are also promising alternatives. However, drawbacks such as the requirement of high cardinality of data for PCA to work limit the usage of these algorithms for every time series.

2.3 Deep Learning Models

LSTM based models [8] can also be employed for anomaly detection, but they too suffer from the same problems as neural networks, such as lack of data for proper training and extremely slow operation. Hence, deep learning based models cannot be used in our case.

2.4 Curve Fitting Models

FB Prophet [13] proposes a non linear regression model with three model components: trend, seasonality and holiday. Prophet fits a linear or logistic function to model the trend and depends on a Fourier series for seasonality detection. It also takes into account the effect of holidays to account for irregular shocks. Subsequently, Prophet uses LOESS regression for modelling the error terms. This model can be used for anomaly detection as well, using the calculated bounds. Prophet was made keeping business time series in mind, hence it was expected to perform well. However, our experiments show that it does not possess a reliable periodicity detection capacity for real time anomaly detection.
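The One-class SVM approach discussed above can be sketched with scikit-learn. This is an illustrative sketch, not any published implementation: the window size, the ν value and the synthetic series are our own assumptions.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def ocsvm_outliers(series, window=5, nu=0.05):
    """Embed the series into overlapping windows (a phase-space style
    projection) and label windows that the OC-SVM deems outlying.
    `nu` is the expected proportion of outliers -- the hyper-parameter
    that must be guessed in advance."""
    series = np.asarray(series, dtype=float)
    X = np.lib.stride_tricks.sliding_window_view(series, window)
    model = OneClassSVM(nu=nu, kernel="rbf", gamma="scale")
    labels = model.fit_predict(X)        # -1 marks an outlying window
    return np.where(labels == -1)[0]     # indices of flagged window starts

rng = np.random.default_rng(1)
y = np.sin(np.linspace(0, 8 * np.pi, 200)) + rng.normal(0, 0.05, 200)
y[120] += 4.0                            # injected anomaly
flagged = ocsvm_outliers(y, nu=0.05)
```

Note that even on clean data roughly a fraction ν of the windows is flagged, which illustrates why a wrongly guessed ν produces false alarms.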
3 Outlier Detection Methods

3.1 Holt Winter [4] and SARIMA [9]

The Holt Winter model [4], also known as the triple smoothing approach, is a combination of the Holt model, which was formulated in 1957, and Winter's model of 1960. There are two versions of this model, the additive version and the multiplicative version. Both these versions comprise three components: the level, the growth and the seasonal factor. We have used the additive version for this paper. On the other hand, the SARIMA model is basically a seasonal generalization of the ARIMA model with 3 additional hyperparameters. It is denoted as SARIMA(p,d,q)(P,D,Q)m, where p stands for the trend autoregression order, d for the trend difference order, q for the trend moving average order, P for the seasonal autoregressive order, D for the seasonal difference order, Q for the seasonal moving average order and m for the number of time steps in a single seasonal period.

In both cases, we calculate the value of an observation using the specified model and subsequently calculate the residual errors by subtracting this calculated value from the true value. If the deviation of any point in the residual errors is more than a set threshold*, we mark that point as an anomaly. All the models considered in this paper are tested in both online** as well as offline*** environments.

3.2 One Class Support Vector Machines

One Class Support Vector Machines [10], abbreviated as OC SVM, convert a time series into a set of vectors in projected phase spaces and subsequently classify outliers based on these vectors. The proportion of outliers in the data, given by the hyperparameter ν****, must be specified to the model. One of the obvious drawbacks of this model is that the value of this hyperparameter is not known in advance.

3.3 FB Prophet

FB Prophet fits a number of linear or logistic trends to the given time series based on its detected changepoints.
It is basically a non-linear additive model capable of handling yearly, weekly and daily periodicity along with integrating the effect of calendar holidays. One of its key features is that it is robust to missing data and possesses automatic seasonality detection. The upper and lower confidence bands are calculated for the given data automatically, and any point violating these bands is marked as an anomaly. However, one of its major drawbacks is that it does not automatically detect any periodicity other than the predefined yearly, weekly and daily seasonalities.
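The residual-thresholding step shared by the model based methods above can be sketched as below. This is a minimal illustration: `fitted` stands in for the in-sample predictions of whichever model (Holt Winter, SARIMA, ...) is being evaluated, and the 2-standard-deviation rule matches the threshold footnoted in this section.

```python
import numpy as np

def residual_anomalies(observed, fitted, k=2.0):
    """Mark points whose residual deviates more than k standard
    deviations from the mean residual (the 2-sigma rule used here)."""
    resid = np.asarray(observed, float) - np.asarray(fitted, float)
    return np.where(np.abs(resid - resid.mean()) > k * resid.std())[0]

# Offline mode: the whole series is scored at once. In online mode the
# same call is repeated each time a new point arrives, and only the
# latest point is checked against the returned indices.
obs = np.array([1.0, 1.1, 0.9, 5.0, 1.0, 1.05, 0.95, 1.0])
fit = np.ones_like(obs)
print(residual_anomalies(obs, fit))  # index 3 is anomalous
```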
4 The TOAD Model
The acronym TOAD stands for "Time Optimised Anomaly Detection". In our algorithm, we have broken down the problem of outlier detection into four sub-problems, namely periodicity detection, outlier identification, removal of false alarms and optimisation. Periodicity detection can be handled accurately by the use of a hybrid algorithm given by Vlachos, Yu and Castelli (2005) [13]. According to this method, a time series has a true periodicity only if we get a peak in the PSD and a local maximum in the ACF corresponding to that periodicity.

* Here, we have taken the threshold to be 2 standard deviations from the mean of the residual.
** Here, the algorithm is run multiple times with a new data point being introduced each time to check whether the latest point is anomalous in real time.
*** Here, the entire time series is fed to the algorithm at once to determine the anomalies.
**** For this paper, we have used the value of ν which gives the best results.
Fig. 2. Automated determination of periodicity (Source: [13])
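The hybrid PSD + ACF test can be sketched as follows. This is an illustrative reimplementation of the idea, not the original code of [13]; the helper name `detect_period` is our own.

```python
import numpy as np

def detect_period(series):
    """Hybrid periodicity test: a candidate period counts as a true
    periodicity only if it shows a peak in the power spectral density
    AND a local maximum in the ACF at the corresponding lag."""
    x = np.asarray(series, float)
    x = x - x.mean()
    n = len(x)
    # Power spectral density via the periodogram.
    psd = np.abs(np.fft.rfft(x)) ** 2
    freq = np.fft.rfftfreq(n)
    k = np.argmax(psd[1:]) + 1            # strongest non-DC frequency
    cand = int(round(1.0 / freq[k]))      # candidate period in samples
    # Autocorrelation function, normalised to acf[0] = 1.
    acf = np.correlate(x, x, mode="full")[n - 1:]
    acf = acf / acf[0]
    # Accept only if the ACF has a local maximum at the candidate lag.
    if 0 < cand < n - 1 and acf[cand] > acf[cand - 1] and acf[cand] > acf[cand + 1]:
        return cand
    return None

t = np.arange(280)
period = detect_period(np.sin(2 * np.pi * t / 28))
```

Requiring agreement between the two statistics is what rejects the spurious ACF "hills" discussed next, since a hill with no matching PSD peak fails the test.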
We have observed that in some cases, periodicity detection becomes extremely hard for some part of the data due to the presence of extreme noise. In order to overcome this problem, we save the periodicity of the data once it is detected and adjust the value in case it changes. During our experimentation, we encountered another problem: in the case of small time series data, we get hills towards the end of the ACF. However, these candidates for periodicity can be removed by setting a confidence window to make sure that the detected periodicity is truly present. If multiple true periodicities are detected, we consider the smallest one. The reason for this will be clear once we discuss how the curve fitted model works.

4.1 Outlier Detection Methodology
Fig. 4. The illustration shows the concept of bucketing and subsequently fitting local models to a time series
Once the periodicity is detected, we bucket the time series into a number of contiguous windows to fit local models. In the case of a periodic series, it is a good idea to take the periodicity of the data as the window size. This is also verified experimentally, as shown in fig. 4. Our experiments show that increasing the polynomial order by 1 for every five data points results in a good fit of the local models. In the case of non-periodic data, we experimentally conclude that a window size of 10 with a linear regression model will result in properly fitting local models. The process of bucketization is initiated from the first data point of the time series. If there are not sufficient data points in the last window, we merge that window with the previous one. All the local piece-wise models are then stacked together, and the window breaks between them are smoothened by using polynomial regression with the same window size around the point of the window break. Hence, this results in a smooth trend replica of our time series data.

Fig. 3. Process to update the periodicity

We obtain the residual errors by subtracting this replica from the original time series. The residual errors are then bucketed in the same fashion as the original data, which subsequently helps to calculate the local standard deviations. Bounds for each of the buckets are calculated by taking a linear combination of these local deviations with the global standard deviation, given by the equation
Residual Bounds = α (SD Local) + (1 − α) (SD Global)

where α is a constant between 0 and 1. We set the threshold as two times the calculated bounds for a given bucket, in both directions with respect to the mean of each bucket. The data points which reside outside these bounds are deemed outliers.

4.2 False Alert Removal (FAS) Filter

We calculate the bounds for all the local models of the bucketed signal in the same way as we calculated the bounds on the residual errors:

Signal Bounds = α (Signal SD Local) + (1 − α) (Signal SD Global)

We now define a filter known as FAS, whose value is calculated for each of these buckets by the formula
Filter = log (Signal Bounds / Residual Bounds)

This value essentially measures the strength of the residual errors with respect to the strength of the original signal. The idea behind designing such a filter is that many local anomalies become insignificant in the presence of a very strong trend or seasonality. We remove the outliers of a window if the FAS value of that window is greater than a threshold. A threshold of 1 implies that the variance of the residual errors is 100 times weaker than the variance of the signal. Hence, a threshold value of 1 is reasonable for the FAS filter.
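Putting the bound and filter formulas together, a per-bucket FAS decision can be sketched as below. The α value is illustrative, and we assume base-10 logarithms, as implied by the variance-ratio interpretation of the threshold of 1.

```python
import numpy as np

def fas_keep_window(signal_bucket, resid_bucket,
                    signal_global_sd, resid_global_sd,
                    alpha=0.5, threshold=1.0):
    """Blend local and global standard deviations into per-bucket
    bounds, then keep the bucket's outliers only if the FAS value
    log10(Signal Bounds / Residual Bounds) stays within the threshold."""
    resid_bounds = alpha * np.std(resid_bucket) + (1 - alpha) * resid_global_sd
    signal_bounds = alpha * np.std(signal_bucket) + (1 - alpha) * signal_global_sd
    fas = np.log10(signal_bounds / resid_bounds)
    return fas <= threshold   # True: the bucket's outliers are significant

# A bucket with a strong trend but negligible residual variation: the
# residual "anomalies" there are insignificant and get filtered out.
sig = np.linspace(0, 100, 20)              # strong local signal
res = np.full(20, 0.01); res[10] = 0.05    # tiny residual wiggle
keep = fas_keep_window(sig, res,
                       signal_global_sd=np.std(sig),
                       resid_global_sd=np.std(res))
```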
Fig. 5. The illustration on the left shows the results without employing the FAS filter, whereas the image on the right shows the filtered output.
5 Results
For analysing the relative performance of our algorithm, we compare it with 4 other anomaly detection methodologies. These include Holt Winter, SARIMA, One Class SVM and Prophet based anomaly detection algorithms. To get a better understanding of the performance of these algorithms, we resort to both online and offline comparison. We start by comparing these algorithms on 10 artificially produced datasets, of which 2 of the toughest cases are highlighted below. The first dataset is an artificial sine wave with a periodicity of 28 and heteroskedastic amplitude, as shown in fig. 6. This dataset does not contain any anomaly. The second dataset has 500 data points with an artificial breakout at the 250th point, as shown in fig. 6.

Fig. 6. The figure on the left shows the second dataset and the figure on the right shows the first dataset.

Table 1. Performance analysis of various algorithms in both online and offline environments on the first dataset.
Type | Algorithm | True Positives | True Negatives | False Positives | False Negatives
Offline | TOAD | | | |
Offline | Holt Winter | | | |
Offline | SARIMA** | | | |
Offline | OC SVM* | | | |
Offline | Prophet | | | |
Online | TOAD | | | |
Online | Holt Winter | | | |
Online | SARIMA** | | | |
Online | OC SVM* | | | |
Online | Prophet | | | |

* We have used the value ν = 0.01 for the OC SVM model.
** Here, we have employed the SARIMA(1,0,0)(1,0,1) model.

Table 2. Performance analysis of various algorithms in both online and offline environments on the second dataset.
Type | Algorithm | True Positives | True Negatives | False Positives | False Negatives
Offline | TOAD | | | |
Offline | Holt Winter | | | |
Offline | SARIMA* | | | |
Offline | OC SVM** | | | |
Offline | Prophet | | | |
Online | TOAD | | | |
Online | Holt Winter | | | |
Online | SARIMA* | | | |
Online | OC SVM** | | | |
Online | Prophet | | | |
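For reference, the two artificial stress cases used above can be reproduced along these lines. This is an illustrative reconstruction: the noise level, amplitude schedule and breakout magnitude are our own assumptions, as the paper does not specify them.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
t = np.arange(n)

# Dataset 1: a sine wave with periodicity 28 whose amplitude grows
# over time (heteroskedastic), containing no true anomaly.
amplitude = 1.0 + t / n                 # slowly increasing amplitude
dataset1 = amplitude * np.sin(2 * np.pi * t / 28)

# Dataset 2: 500 data points with an artificial breakout (level
# shift) at the 250th point.
dataset2 = rng.normal(0, 1, n)
dataset2[250:] += 10.0                  # the breakout
```

A model that fits one global trend to dataset 2 spreads the level shift across all residuals, whereas the bucketed local models confine it to the windows after the 250th point.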
Table 1 clearly shows the effectiveness of our algorithm against false alarms, while Table 2 indicates that our algorithm also works effectively under extreme conditions such as a breakout. Similar results were obtained for the remaining 8 datasets. We now compare the performance of these algorithms on real datasets. Both the third and fourth datasets are aperiodic and are shown in Fig. 7.
Fig. 7. The illustration on the left shows the third dataset and the illustration on the right shows the fourth dataset.

Table 3. Performance analysis of various algorithms in both online and offline environments by F1 score comparison on the third dataset.
Type | Algorithm | True Positives | True Negatives | False Positives | False Negatives | F1 Score
Offline | TOAD | 13 | 273 | 1 | 0 | 0.96
Offline | Holt Winter | 12 | 274 | 0 | 1 | 0.96
Offline | SARIMA*** | | | | |
Offline | OC SVM**** | 12 | 272 | 2 | 1 | 0.89
Offline | Prophet | | | | |
Online | TOAD | 13 | 271 | 1 | 2 | 0.90
Online | Holt Winter | 12 | 268 | 5 | 2 | 0.77
Online | SARIMA*** | 13 | 267 | 5 | 2 | 0.79
Online | OC SVM**** | 14 | 269 | 3 | 1 | 0.87
Online | Prophet | | | | |

* Here, we have employed the SARIMA(1,1,1)(0,0,0) model.
** We have used ν = 0.01 for the OC SVM model.
*** Here, we have employed the SARIMA(1,1,1)(0,0,0) model.
**** We have used ν = 0.05 for the OC SVM model.

Table 4. Performance analysis of various algorithms in both online and offline environments by F1 score comparison on the fourth dataset.
Type | Algorithm | True Positives | True Negatives | False Positives | False Negatives | F1 Score
Offline | TOAD | | | | |
Offline | Holt Winter | | | | |
Offline | SARIMA* | | | | |
Offline | OC SVM** | | | | |
Offline | Prophet | | | | |
Online | TOAD | | | | |
Online | Holt Winter | | | | |
Online | SARIMA* | | | | |
Online | OC SVM** | | | | |
Online | Prophet | | | | |
Table 5. This table shows the reduction in time complexity achieved by optimising the number of runs of the algorithm on various datasets.

Type | Dataset | Total number of possible runs | Number of runs before optimisation | Number of runs after optimisation | Total time taken before optimisation | Total time taken after optimisation
Online | Dataset 1 | 265 | 265 | 237 | 35.7 s | 31.9 s
Online | Dataset 2 | 480 | 480 | 261 | 119 s | 41 s
Online | Dataset 3 | 267 | 267 | 221 | 34.7 s | 26.6 s
Online | Dataset 4 | 101 | 101 | 42 | 7.41 s | 2.72 s
6 Conclusion
This paper proposes a novel algorithm for real time anomaly detection using a curve fitting approach and demonstrates its superiority with respect to its existing counterparts. There are many interesting directions for future work. For instance, the problem of automated determination of hyperparameters can be investigated, the effect of public holidays to give context to the anomalies can be studied, or criteria for classification based on the severity of an anomaly can be considered.

* We have employed the SARIMA(1,0,1)(0,0,0) model.
** Here, we have taken ν = 0.05 for the OC SVM model.
*** This algorithm was run on an Intel® Core™ i3-7100U CPU @ 2.40GHz × 4 with 8 GB RAM.

References