Exploring Data and Knowledge combined Anomaly Explanation of Multivariate Industrial Data
Xiaoou Ding∗, Hongzhi Wang†, Chen Wang§, Zijue Li†, Zheng Liang†
∗† School of Computer Science and Technology, Harbin Institute of Technology
§ National Engineering Laboratory for Big Data Software, EIRI, Tsinghua University
Email: ∗[email protected], †{wangzh, lizijue, lz20}@hit.edu.cn, §wang [email protected]

Abstract—The demand for high-performance anomaly detection techniques for IoT data is becoming urgent, especially in the industrial field. Anomaly identification and explanation in time series data is an essential task in IoT data mining. Since the existing anomaly detection techniques focus on the identification of anomalies, the explanation of anomalies is not well solved. We address the anomaly explanation problem for multivariate IoT data and propose a 3-step self-contained method in this paper. We formalize and utilize domain knowledge in our method, and identify the anomalies by the violation of constraints. We propose set-cover-based anomaly explanation algorithms to discover the anomaly events reflected by violation features, and further develop knowledge update algorithms to improve the original knowledge set. Experimental results on real datasets from large-scale IoT systems verify that our method computes high-quality explanation solutions of anomalies. Our work provides a guide to navigating explicable anomaly detection in both IoT fault diagnosis and temporal data cleaning.
Index Terms—Anomaly explanation, time series data cleaning, rule-based violation detection, temporal data mining
I. INTRODUCTION
An anomaly is summarized as any unusual change in a value or a pattern which does not conform to specific expectations [1], [2]. Identifying anomalies (roughly regarded as outliers, errors, and glitches) is one of the most challenging and exciting topics in the data mining and data cleaning community [1], [3]. Researchers have gone a long way in anomaly detection studies in various fields. [4] introduces anomaly detection techniques for temporal data, covering several kinds of temporal data including time series. Anomaly detection techniques based on different methods, such as density-, window-, and constraint-based approaches, have been developed and applied in various real scenarios (see [4], [5] for surveys).

The rapid development of sensor technologies and the widespread use of sensor devices have witnessed the flowering of data management and data mining technologies for sensor data. The demand for high-performance mining techniques for Internet of Things (IoT) data also becomes urgent, especially in the industrial field. Time series is one of the most important types of IoT data [5], [6]. Thus, the anomaly identification and explanation tasks for time series data are essential. Despite the advanced anomaly detection techniques, existing studies pay much attention to the identification of anomalies and errors, leaving anomaly reasoning and explanation not well solved. The characteristics of multivariate time series data in IoT applications urgently require detection methods to go further to explain the occurrence of anomalies, and thus to achieve higher dependability and interpretability in anomaly studies.

Anomaly explanation, also called explainable anomaly identification, will promote the involvement of domain knowledge in the techniques and assist users in understanding the anomalies well, which in turn improves the identification performance.
It also complements existing data cleaning techniques by considering explanation discovery for errors and violations. Considering the limitations of the current state-of-the-art anomaly explanation approaches, the challenges of the problem are as follows.

(1) Less attention paid to the dependency of data. Since multivariate time series are collected with each sequence (i.e., attribute) coming from one sensor, the abnormal data in sequences may not be completely independent. Anomalies may result in glitches existing across multiple attributes with complex interactions. Treating all sequences independently may fail to correctly identify the actual errors in data.

(2) Under-utilized domain knowledge. Discovering the interactions and causes of anomalies requires the involvement of knowledge which comes from domain experts or professional rules, especially for IoT data. Though knowledge-driven methods, such as fault tree analysis (FTA), expert systems (ES) [7], and Dynamic Bayesian Networks (DBN) [8], have been developed for anomaly and fault diagnosis tasks, they still have limitations in knowledge modeling. Moreover, incompleteness and fuzziness add to the difficulty of utilizing the knowledge.

(3) Lack of scalability in (IoT) big data. Since current knowledge-based methods usually focus on small-scale specific scenarios, neither the knowledge nor the methods can be easily adapted to other scenarios.

Referring to the desirable properties of causality analysis in violation detection [9], [10], we summarize that a high-quality explicable anomaly detection approach in industry applications always focuses on the following objectives.

• Coverage. The solution of the method is expected to comprehensively cover the anomaly instances existing in data.

• Conciseness. The method needs to provide a concise solution rather than a redundant one, because both time and human resources are costly in the response procedure to the anomalies. In addition, the consequences of unexpected anomalies are unpredictable. That requires the method to provide a small-scale solution for decision making as much as possible.

• Self-update. The method is expected to deal with new anomaly instances whose patterns are unknown in the knowledge base or have not occurred (been detected) in the historical data.

• Less tolerance of False Negatives (FN). Though it is difficult to entirely avoid False Negatives and False Positives in practical detection tasks, high performance is demanded of the method in the industrial field, especially in electricity and manufacturing scenarios. An FN means one fails to identify an untraceable anomaly in data, which is likely to result in more serious effects compared with an FP.
Contributions. Motivated by the above, we explore the anomaly explanation problem in multivariate time series under industrial scenarios with a data and knowledge combined method in this paper. Our contributions are summarized as follows.

(1) We formalize the anomaly explanation problem in multivariate temporal data, and design a self-contained 3-step anomaly explanation method framework for multivariate data (see Figure 2), according to the aforesaid four objectives. The proposed framework provides a guide to navigating explicable anomaly detection, especially in temporal data cleaning and IoT fault diagnosis techniques.

(2) We apply the four types of constraints proposed in [6], which formalize the dependence on attributes (columns) and entities (rows), to accurately uncover the anomalies hidden in multivariate data in the violation detection step (see Section III). We devise a set-cover-based algorithm AEC to address the anomaly explanation, and provide concise and reliable explanation solutions covering all the anomaly representations (see Section IV).

(3) We formalize and utilize the domain knowledge to achieve the description and the explanation of the anomalies in data. We also provide knowledge update procedures and algorithms during the iteration of detection and explanation, which allow both manual intervention and automatic update (see Section V).

(4) We conduct thorough experiments on real-life datasets from large-scale IoT systems. Results of the comparison experiments verify that our method provides high-quality explanation solutions of anomalies.

II. FRAMEWORK OVERVIEW
A. Problem Statement
We outline the multivariate time series in Figure 1. S = ⟨s_1, ..., s_N⟩ is a sequence on sensor S, where N = |S| is the length of S, i.e., the total number of elements in S. s_n = ⟨x_n, t_n⟩ (n ∈ [1, N]), where x_n is a real-valued number with a time point t_n, and for ∀n, k ∈ [1, N], it holds that (n < k) ⇔ (t_n < t_k). Let Eq be an equipment sensor group. S_Eq = {S_1, ..., S_M} ∈ R^{N×M} is an M-dimensional time series, where M is the total number of equipment sensors, i.e., the number of dimensions. T = {t_1, ..., t_N} is the set of time points of time series S_Eq.

In this paper, we use rule-based techniques to detect anomalies from violations of the given constraints. We introduce the constraint set for one sequence in Definition 1. Accordingly, given constraint c for sequence S, S is identified to violate c if the data in S do not satisfy the condition described by c. We denote such a violation by S ⊭ c.

Fig. 1. Multivariate IoT time series. [figure omitted]

Definition 1: (Constraint set). C is the set of all constraints defined on sequence S, denoted by C(S) = {c_1, ..., c_n}, where c_i is a formulated or learned constraint or rule the data need to meet.

Definition 2: (Violation feature). Given sequence S and the constraint set of S, i.e., C(S), we maintain a 2-tuple v = ⟨S, F(c)⟩ of S w.r.t. constraint c (c ∈ C(S)), where F(c) is a degree function computed by a specified violation measurement F on c, which has two formats:
(1) If c is a qualitative constraint, F(c) = 1 if S ⊭ c, and F(c) = 0 if S ⊨ c.
(2) If c is a quantitative constraint, F(c) = [d, u], where d and u are the lower bound and the upper bound computed by the measurement F, respectively.
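To make Definitions 1-2 concrete, the following sketch evaluates a simple quantitative (value-domain) constraint and emits a violation feature. The class and function names here are illustrative assumptions of ours, not part of the paper's formalism:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class ViolationFeature:
    """v = <S, F(c)> from Definition 2."""
    seq_id: str                   # the sequence S
    constraint_id: str            # the violated constraint c
    degree: Tuple[float, float]   # F(c) = [d, u] for a quantitative constraint

def check_value_domain(values: List[float], lo: float, hi: float,
                       seq_id: str, c_id: str) -> Optional[ViolationFeature]:
    """Return a violation feature if some value falls outside [lo, hi]."""
    out = [x for x in values if x < lo or x > hi]
    if not out:
        return None  # S satisfies c: no feature is recorded
    return ViolationFeature(seq_id, c_id, (min(out), max(out)))

# A 95.0 reading outside the expected domain [0, 50] yields F(c) = [95.0, 95.0]:
v = check_value_domain([20.1, 19.8, 95.0, 20.3], 0.0, 50.0, "S1", "c_domain")
```

The returned tuple plays the role of F(c) when the distance to a knowledge representation is computed later.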
v is regarded as the violation feature of S on c when S ⊭ c. V(S) = {v_1, v_2, ...} is the set of all violation features of S, and V(S_Eq) = {V(S_1), ..., V(S_M)} is the total set of all violation features in the sequences of data S.

It is acknowledged that the anomaly explanation discovery problem needs the assistance of knowledge provided by domain experts who have accumulated countless practical experiences. The knowledge supplied by industry applications has various forms, including 1) a fault ID number which is acknowledged in the diagnostic system and can be retrieved in the users' manual, 2) some general descriptions of abnormal patterns, 3) the empirical causal inference of anomaly instances, etc. In this section, we formalize several concepts w.r.t. the provided knowledge, which is the critical input of the studied problem, besides data S and the constraint set C.

In general, we aim to explain the detected anomalies by finding out the corresponding fault reasons. We consider an (acknowledged) anomaly event E to be one reason for the occurrence of anomalies. Due to the fact that each fault event will lead to a series of unexpected changes in sensor data, we consider one change in data as one anomaly representation, denoted by r in Definition 3. r is regarded as the smallest unit of the given knowledge in our problem. We briefly present examples of a knowledge set of a sensor group in a power plant in Table I.

TABLE I
EXAMPLE OF A KNOWLEDGE SET

Event E                        | Explanation R(E)         | Representation
Id-1 Sensor break              | temperature decline, ... | ⟨S, [−∞, · ]⟩, ...
Id-2 Sensor break              | pressure drop, ...       | ⟨S, F(c)⟩, ⟨S, F(c)⟩, ...
Id-1 Engine off                | zero in power, ...       | ...
Id-1 Boiler state instability  | temperature shock, ...   | ...

Definition 3: (Anomaly representation). r = ⟨S, F_r(c)⟩ is the anomaly representation of constraint c on sequence S, where F_r(c) is a formal description depicted from domain experts or professional knowledge.

F_r(c) has the same structure as F(c), referring to Definition 2. For quantitative constraints, F_r(c) = [d_r, u_r], where d_r (resp. u_r) is the lower (resp. upper) value of the description of the knowledge.

According to Definition 3, the unexpected changes caused by an event E are formally presented as a set of anomaly representations. Such a set of representations is considered as an explanation of anomaly instances in data. That is, R(E) = {r_1, ..., r_n} is a set of anomaly representations describing one event E. R(E) is the maximum set of representations r which can be provided by domain experts. The set of all explanations, denoted by R = {R(E_1), ..., R(E_N)}, is the formal description of the domain knowledge provided for the equipment sensor group S_Eq. Formally, the problem studied in this paper is stated in Definition 4.

Definition 4: (Problem description). Given the multivariate time series S of equipment Eq, the constraint set C, and the knowledge set R, the anomaly explanation problem includes the two tasks below, with the four objectives proposed in Section I: (1) to detect the violations in S according to C, locate the violated sequence IDs with time interval T, and record the violations in the set V(S), and (2) to discover an explanation set R′ ⊆ R w.r.t. V(S).

B. Overview of the Approach
Figure 2 outlines our method, which contains three phases: violation detection, anomaly explanation, and knowledge update. We will discuss the violation detection with types of constraints in temporal data in Section III, and introduce our anomaly explanation algorithms in detail in Section IV. The knowledge update step will be discussed in Section V, with the procedure of Algorithm 3 and the function in Algorithm 4 to find the candidate update explanations.

III. DISCOVERY OF ANOMALY INSTANCES
A. Constraint-based anomaly detection
Since dependence and relevance do exist among multivariate time series, we apply the four types of constraints discussed in [6] in our violation detection process. As shown in Table II, the four types of constraints embody the dependence on attributes (columns) and entities (rows) for temporal data.
TABLE II
TYPES OF CONSTRAINTS

           | Single column | Multi-column
Single row | Type 1        | Type 2
Multi-row  | Type 3        | Type 4
Accordingly, we summarize some instances of the four types of constraints in Table III.

TABLE III
EXAMPLES OF CONSTRAINT TYPES

                  | Single sequence                              | Multi-sequence
Single time point | T-1: Value domain                            | T-2: CFDs from documents; Physical Mechanism
Time interval     | T-3: SD, SC [11]; Variance Constraints [12]  | T-4: Similarity Constraints

We consider the value domain of data points in a sequence as the simple instance of Type-1 constraints. CFDs for relational data and Physical Mechanism constraints for industrial data are concluded as multi-sequence (Type-2) constraints. Constraints such as SD, SC, and VC, which formalize the dependence of data points along the time dimension in one sequence, belong to Type-3 constraints. Rules describing the similarity located in multiple sequences can be classified as Type-4 constraints. The various types of constraints assist in precisely locating the anomalies, and they have the potential to uncover the anomalies as early as possible.
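As one concrete instance, a Type-3 constraint bounds how fast consecutive values in a single sequence may change over time. The sketch below is our own illustration of the idea behind SD/SC-style constraints, with an assumed speed bound `smax`:

```python
from typing import List, Tuple

def speed_violations(points: List[Tuple[float, float]], smax: float) -> List[int]:
    """Return indices i where |x_i - x_{i-1}| exceeds smax * (t_i - t_{i-1}),
    i.e., a Type-3 (single-sequence, time-interval) constraint is violated.
    points: list of (t_n, x_n) pairs in time order."""
    bad = []
    for i in range(1, len(points)):
        t0, x0 = points[i - 1]
        t1, x1 = points[i]
        if abs(x1 - x0) > smax * (t1 - t0):
            bad.append(i)
    return bad

# A jump from 20.5 to 90.0 within one time unit violates smax = 5.0:
speed_violations([(0, 20.0), (1, 20.5), (2, 90.0)], smax=5.0)
```

Each reported index can then be turned into a violation feature for the explanation phase.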
B. Anomaly distance measurement
With the types of constraints, we can detect the violations hidden in sequences. After we obtain violation features, we need to compare them with the given anomaly representations to determine the anomaly events that really happened. We first propose the concept of "Explicable" in Definition 5, and then propose the anomaly distance function in Definition 6, according to which we are able to quantify the closeness of violations to acknowledged anomaly events.
Definition 5: (Explicable feature). Given a detected violation feature v: ⟨S, F(c)⟩, v is explicable by R, iff ∃r: ⟨S′, F_r(c′)⟩ ∈ R, R ⊆ R, S = S′ and c = c′.

Definition 6: (Anomaly distance). Given feature v: ⟨S, F(c)⟩ and the (corresponding) representation r: ⟨S, F_r(c)⟩, the distance between v and r w.r.t. constraint c is computed as follows.
If c is a quantitative constraint, then

  dist(v, r) = dist(F(c), F_r(c)) = 1 − |[d, u] ∩ [d_r, u_r]| / |[d, u] ∪ [d_r, u_r]|.   (1)

If c is a qualitative constraint, then

  dist(v, r) = dist(F(c), F_r(c)) = |F_r(c) − F(c)|.   (2)

The proposed distance function dist(·, ·) is only measurable with regard to the same constraint. It does not make sense to compute the distance between a v and an r which hold different constraints. Obviously, the anomaly distance dist(v, r) coincides with the properties of a distance function: it lies in [0, 1], and the value of dist(v, r) is lower as feature v is closer to representation r. More specifically, for qualitative constraints, dist(v, r) only has two values, where dist(v, r) = 0 shows the detected v is consistent with the representation r, and dist(v, r) = 1 otherwise. For quantitative constraints, dist(v, r) ∈ (0, 1) shows v is partially consistent with r, while dist(v, r) = 1 indicates that feature v is completely different from the representation r, with [d, u] ∩ [d_r, u_r] = ∅. Thus, we describe how to determine whether the feature is consistent with the representation in Definition 7.
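The quantitative case of Definition 6 (Eq. (1)) can be sketched as an interval overlap ratio. Measuring the union by the covering hull is an implementation choice of ours: when the intervals overlap, the hull equals the union exactly, and when they are disjoint, the overlap is 0 and the distance is 1 regardless:

```python
def anomaly_distance(obs, rep):
    """dist(v, r) = 1 - |[d,u] ∩ [d_r,u_r]| / |[d,u] ∪ [d_r,u_r]|  (Eq. (1)).
    obs and rep are (lower, upper) pairs for F(c) and F_r(c)."""
    (d, u), (dr, ur) = obs, rep
    overlap = max(0.0, min(u, ur) - max(d, dr))
    hull = max(u, ur) - min(d, dr)   # equals the union length when intervals overlap
    if hull <= 0:
        return 0.0                   # degenerate identical point-intervals
    return 1.0 - overlap / hull

# [0,10] vs [5,15]: overlap 5, union 15, so dist = 1 - 5/15
```

A feature is then judged consistent with a representation by comparing this value against the threshold θ introduced in Definition 7.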
Fig. 2. Method framework overview
Definition 7: Given feature v and the corresponding representation r, v is identified to be consistent with r if

  dist(v, r) = 0 when c is qualitative, and dist(v, r) < 1 when c is quantitative.   (3)

More generally, dist(v, r) < 1 is too loose to estimate whether the feature matches a representation. Thus, we introduce a threshold θ ∈ (0, 1) and consider v to be consistent with r when dist(v, r) < θ. θ can be learned from a number of experiments, or set manually as required.

We highlight that one may fail to find a one-to-one match between each violation feature and each representation in real industry scenarios. That is, a violation feature v obtained from the violation detection method is not necessarily identified to be explicable (by R). This results from both the precision limitations of violation detection techniques and the incompleteness of the knowledge representations provided by experts. In the next section, we introduce our anomaly explanation approach, in which we take into consideration in detail the multiple conditions between the detected features and the given representations.

IV. IDENTIFYING ANOMALY EXPLANATIONS
A. Candidate Explanations Discovery
In anomaly explanation and analysis research, especially knowledge-based study, it is acknowledged that both incompleteness and ambiguity always exist in experts' knowledge. For the former, domain experts may not provide all descriptions about one anomaly instance reflected in the sensor data. The main reasons include 1) experts' limitations in professional degree and humans' understanding of anomaly problems, and 2) deficiently deployed sensors, which may fail to record some parameters that are critical to the explanation of abnormal data. For the latter, since some knowledge is accumulated from practical experience, one cannot expect industry experts to always be definite about their explanations. Part of these explanations may hold only with some probability.

According to our investigation of the manufacturing and electricity industries, both definite and presumable knowledge are applied in anomaly explanation and fault diagnosis tasks. In this case, we divide the explanations of anomalies into two categories: exact explanations and possible explanations. Given a fault event E, the exact explanation of E is the maximum set of anomaly representations, denoted as R*(E), in which the fault event E leads to the existence of the representations {r_1, ..., r_x} in data. ∀i ∈ [1, x], r_i is called an exact representation for E. The possible explanation of E is a set of anomaly representations, denoted by R+(E) = {r_1, ..., r_y}, in which E leads to the existence of anomaly representation r_j, j ∈ [1, y], with probability Pr (Pr ∈ (0, 1)). ∀j ∈ [1, y], r_j is called a possible representation for E.

With the two categories, we denote the explanation R by the combination of R* and R+, as shown in Proposition 1.

Proposition 1: The explanation R w.r.t. event E is the union of the exact explanation and the possible explanation of E, denoted by R = R* ∪ R+, where R* ∩ R+ = ∅ and R* ≠ ∅.

In general, the exact R* is considered the key factor in the identification of E, while the possible R+ helps to describe the event in a rough way. In industry scenarios, the occurrence of a fault, especially a known one, will certainly give rise to the violation of a series of constraints, as described by R*. But on the contrary, one cannot be sure whether the event really happens when only some representations in R* have been detected. We formally describe the relationship between a fault E and its explanation R*(E) in Proposition 2. Accordingly, we are able to make further analysis in order to obtain the fault set and provide a reliable and high-quality explanation of the anomalies.

Proposition 2: Given a fault event E with R(E), E is a sufficient but unnecessary condition of its exact explanation R*(E), (R*(E) ⊆ R(E)).

From the above, we are able to narrow the computation on R by finding a subset of R in which all exact representations have appeared in data. Such a subset of R is denoted as G = {R_1, R_2, ...}, as shown in Definition 8. Thus, we first find the candidate explanation set G from R by verifying the appearance of exact explanations against all violation features detected in the previous steps, and then precisely compute the explanation set from the candidate result G.

Definition 8:
Given the violation feature set V of S w.r.t. T, and the set R, G is identified as a candidate explanation set if it satisfies: 1) G ⊆ R, and 2) ∀R ∈ G, R* ⊆ R, ∀r: ⟨S, F_r(c)⟩ ∈ R*, ∃v ∈ V such that v is consistent with r w.r.t. c.

The candidate explanation set discovery process is shown in Algorithm 1.

Algorithm 1: Compute Candidate Explanations
Input: V = {V(S_1), ..., V(S_M)}: the set of violation features in S w.r.t. T; the explanation set R
Output: the set of candidate explanations G
  initialize G ← ∅;
  foreach R ∈ R do
    cand ← 1;
    foreach r ∈ R.R* do
      if v.F(c) = 0 or dist(v, r) = 1 then
        cand ← 0; break;
    if cand = 1 then
      G ← G ∪ {R};
  return G;

After initializing an empty set G, we enumerate each explanation set R from R in the outer loop (Lines 2-9), and maintain a label cand to record whether all elements in R exist among the detected violation features. Within the outer loop, we enumerate each representation r from the exact explanation set of R, i.e., R.R*, and identify the occurrence of r in data. We let the label cand = 0 when the sequence S does not violate constraint c w.r.t. r, or the violation in S is different from r, i.e., dist(v, r) = 1 (Lines 5-7). After all exact representations in R.R* are visited, we add all Rs with label cand = 1 into G, and finally obtain the objective subset G from R.

B. Cost-based Explanation determination
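Before turning to cost-based selection, the candidate filtering of Algorithm 1 can be sketched in Python. The dictionary layout for features and knowledge, and the consistency test `dist(...) < theta`, are our own assumptions for illustration:

```python
def candidate_explanations(features, knowledge, dist, theta=0.5):
    """Algorithm 1 sketch: keep an explanation only if every exact
    representation matches some detected violation feature.
    features:  dict mapping (seq_id, constraint_id) -> F(c)
    knowledge: list of (event_id, exact_reps), where exact_reps is a list
               of (seq_id, constraint_id, F_r(c)) tuples."""
    G = []
    for event_id, exact_reps in knowledge:
        ok = all((s, c) in features and dist(features[(s, c)], fr) < theta
                 for s, c, fr in exact_reps)
        if ok:
            G.append(event_id)
    return G
```

With an exact-match distance (0 on equality, 1 otherwise), an event survives the filter exactly when all of its exact representations appear among the detected features.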
According to Algorithm 1, we can qualitatively find the candidate set which contains an overall explanation of the faults that really happen in data. However, the set G is far from a high-quality solution for industry applications. In the next step, we aim to make a further discovery of anomaly explanations considering the four objectives discussed in Section I.

Note again that an explanation includes a few representations, either exact or possible ones. We identify the explanations of anomalies by measuring the distance between the violation feature and the representation w.r.t. the same c. Intuitively, we should choose those explanations with close distance values between the detected features and the given representations. In order to model the distance degree between the vs and rs, we introduce a cost-based principle to quantitatively compute how well the violation features match an anomaly explanation in Definition 9.

Definition 9: (Explanation Cost). The cost of applying event E as an explanation of the anomalies in S is

  Cost(E) = Σ_{r_i ∈ R*} dist(v_i, r_i) / |r_i.S| + Σ_{r_j ∈ R+} ω · dist(v_j, r_j) / |r_j.S|,

where R*(E) (resp. R+(E)) is the exact (resp. possible) explanation of E, v is the detected violation feature of sequence S w.r.t. constraint c, dist(v, r) is the anomaly distance between v and r w.r.t. c, |r.S| is the number of sequences involved in r, and ω ∈ (0, 1) is the probability value of a possible representation in R+.

TABLE IV
CONFUSION MATRIX IN THE EXPLANATION PHASE

Knowledge \ Detected   | Abnormal data  | Normal data
Exist representations  | Set A: V*      | Set B: (R \ R_o)
No representation      | Set C: V \ V*  | Set D
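The explanation cost of Definition 9 can be sketched as a weighted sum over matched representations. The input layout (per-representation distance and sequence count) and the default ω are our assumptions:

```python
def explanation_cost(exact, possible, w=0.5):
    """Cost(E) = sum over exact reps of dist/|r.S|
               + sum over possible reps of w * dist/|r.S|   (Definition 9).
    exact, possible: lists of (dist, n_sequences) pairs; w plays the role
    of ω ∈ (0, 1)."""
    return (sum(d / n for d, n in exact)
            + sum(w * d / n for d, n in possible))

# One perfect exact match, one partial exact match spanning two sequences,
# and one possible representation weighted by w = 0.5:
explanation_cost(exact=[(0.0, 1), (0.4, 2)], possible=[(0.2, 1)], w=0.5)
```

Events whose representations match the detected features closely accumulate a low cost and are preferred in the set-cover step.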
According to Definition 9, a higher Cost(E) value shows that the violation features in data match the fault representations of E less well, and intuitively, fault E is less likely to be a reason for the detected anomalies. It is acknowledged that there probably exists more than one fault in the equipment system within the same time interval, which urges us to explore the multiple reasons which can cover all the anomalies detected by constraints. Here, we introduce the minimum weight set covering problem (MWSCP) into our method to derive the optimal explanations of anomalies.

Note that we are faced with different cases in the comparison of the observed (a.k.a. detected) real data and the knowledge representations. The constraint-based anomaly detection phase identifies the input data as two categories: 1) abnormal data which violate at least one constraint, and 2) normal data which do not have violations. Considering the given knowledge, there are also two cases: 1) there exists a representation which describes the violation of a certain constraint, or 2) no representation is given at all. Accordingly, there are in total four kinds of conditions to be considered in the explanation analysis phase, as shown in Table IV. From the point of view of the monitoring data, we summarize the four kinds of data as Sets A, B, C, and D, respectively.

Set D in Table IV is beyond our research, for data in Set D are neither detected to be abnormal nor described by knowledge. We focus on the analysis of Sets A, B, and C. Specifically, Set A contains the violated data which are explicable by representations from the knowledge set R. Set B contains the data which have representations in R but are identified as normal. Set C contains the data detected to have violations while there are no representations in R to explain these violations.
This mainly has two reasons: i) the constraint instances are set too strictly, so that the method falsely identifies some normal data as abnormal, or ii) some new anomaly patterns are discovered which are unknown in the present knowledge set. Thus, the cases in Set C also matter in our solution, because they can assist in updating the knowledge set.

Below we first introduce how to precisely explain Set A, and we will propose the updating method according to the detected Set C in the next section. Considering our four objectives, our solution needs to cover all the detected anomalies with a small-size result, that is, to find a concise set of explanations from R which can explain all the violations. Moreover, our method is expected to provide explanations that describe the anomalies well, with close distance values.

C. Set-Cover-based anomaly explanation
As discussed above, we aim to find a subset of R as the problem solution which covers all the violated instances in data. We denote the data in Set A as V* = {v: ⟨S, F(c)⟩ | v ∈ V(S), and v is explicable by R}. We apply the minimum weight set covering problem (MWSCP) [13], [14] to solve our AE problem. Definition 10 formalizes our problem, where the set V* is the target to be covered, and the explanation cost Cost(R) is regarded as the weight.

Definition 10: (Anomaly Explanation Problem). Given the violation set V(S) w.r.t. S, the knowledge set R, and the candidate explanation set G, our anomaly explanation problem is to find an explanation set H which satisfies

  min Σ_{R(E) ∈ H} Cost(E)
  s.t. H ⊆ G,  ∪_{R(E) ∈ H} R_o.cover = V*,  R_o(E) ⊆ R(E),

where R_o(E) = {r_1, ..., r_m} is a subset of R(E) w.r.t. anomaly event E such that ∀r_i ∈ R_o(E), i ∈ [1, m], ∃ an explicable feature v ∈ V* with dist(v, r_i) < 1, and R_o.cover represents the total set of the violation features which are consistent with the rs in R_o.

The greedy-based heuristic algorithm. Since set cover computing is NP-hard [10], [14], we introduce greedy-based algorithms for our AE problem. Considering the coverage issues, the solution of our anomaly explanation problem proposed above is first a covering of the set V*, and then it satisfies the minimum cost principle. In this case, we are able to find out whether an explanation R ∈ R is certainly contained in the solution H or certainly does not exist in H. The two cases are concluded in Proposition 3 and Proposition 4, respectively. Proposition 3 shows that if the explanation R_j covers more violations than R_i.cover while R_j has a smaller cost than R_i, then R_i will not be selected into the solution H. Proposition 4 shows that an explanation R must be selected into the solution if it is the only explanation that covers some violation v.

Proposition 3:
Given R, the explanation R_i is not valid and does not exist in the solution H, if ∃R_i, R_j ∈ R, R_i.cover ⊆ R_j.cover, and Cost(R_i) > Cost(R_j).

Proposition 4: R from R is a valid explanation and does exist in the solution H, if ∃v ∈ V* such that there is only one explanation R in R which satisfies v ∈ R.cover.

Considering both effectiveness and efficiency, we propose a greedy-based heuristic algorithm to obtain H. The general principle is to give priority to choosing the explanation i) which has a smaller cost value, and ii) which covers the violations of constraints defined on multiple sequences, specifically the Physical Mechanism constraints in this paper. For the former, it is obvious that a fault event with a smaller cost value is more reliable for explaining part of the anomalies. For the latter, we consider covering the violations existing in multiple sequences prior to the ones in a single sequence, for three main reasons: 1) violations between sequences are more likely to involve fault event(s) than violations happening in a single sequence, because the detected single-sequence violations may just occur sporadically for some reason, which does not require an explanation; 2) the faults which happen in several sensors are always more serious than the ones which only happen in one sensor; and 3) multi-sequence violations always contain many more features than single-sequence violations, so processing multi-sequence violations contributes to increasing the coverage of the solution.

Algorithm 2: Compute Explanation Set
Input: the set V* of violation features in S w.r.t. T; the candidate explanation set G
Output: the set of final explanations H
  initialize H ← ∅;
  delete Rs from G according to Proposition 3;
  insert R′s from G into H according to Proposition 4;
  G ← G \ R′s;
  V* ← V* \ R′.cover;
  select the subset V*_M from V* where V*_M = {v | v.K > 1 and v ∈ V*};
  sort all vs in V*_M in the descending order of the size of v.K;
  foreach v ∈ V*_M do
    R(E) ← argmin_{v ∈ R(E).cover} Cost(E);
    H ← H ∪ {R(E)};
  V*_un ← V* \ H.cover;
  while V*_un ≠ ∅ do
    R(E) ← argmax_{R(E)} |R.cover ∩ V*_un| / Cost(E);
    H ← H ∪ {R(E)};
    V*_un ← V*_un \ R.cover;
  return H;

Algorithm 2 outlines our heuristic algorithm, which mainly consists of three steps, as discussed below.

Global optimization (Lines 1-5). After initializing an empty set H, we first execute the global optimization according to Proposition 3 and Proposition 4. Thus, we narrow the size of the input G by deleting the invalid explanations Rs, while we insert the valid explanations R′s into set H. After that, we delete the R′s from G and correspondingly delete R′.cover from the violation set V*.

Covering multi-sequence violations (Lines 6-11). After we deal with all the valid and invalid explanations, we begin to select explanations from the present set G to cover the multi-sequence violations. We sort all multi-sequence features in the descending order of the number of sequences involved in each feature v. We then enumerate each feature v from the sorted set V*_M, and greedily find a fault E whose explanation R(E) can cover v with the minimum Cost(E) value. We put such an R into the solution set H, and finish the iteration when all the features in V*_M have been visited. We then delete all the violation features covered by H from V* and let the remaining set be V*_un, which needs to be covered in the following step.

Covering single-sequence violations (Lines 12-15). Faced with V*_un, we compute the total number of vs in V*_un covered by the same explanation R, denoted by |R.cover ∩ V*_un|, and we iteratively choose the explanation R which has the maximum ratio of the above number to Cost(E). Correspondingly, we add R into H and then delete R.cover from the present set V*_un. The iteration finishes when there are no features left in V*_un, and we finally obtain the solution H.
We put such R into the solution set H, and finish the iteration when all the features in V*_M have been visited. We then delete all the violation features covered by H from V*, and let the remaining set be V*_un, which needs to be covered in the following step.

Covering single-sequence violations (Lines 12-15). Faced with V*_un, we compute the total number of v's in V*_un covered by the same explanation R, denoted by |R.cover ∩ V*_un|, and we iteratively choose the explanation R which has the maximum ratio of this number to Cost(E). Correspondingly, we add R into H and then delete R.cover from the present set V*_un. This iteration finishes when there is no feature left in V*_un, and we finally obtain the solution H.

Complexity. The modification process in Algorithm 2 Lines 2-5 costs O(|V|·|R|) time, and the sorting in Line 6 costs O(|V|·log|V|). Generally, the loops in Lines 8-11 and Lines 12-15 both cost O(|V|·|R|) at worst. Putting it together, Algorithm 2 spends O(|V|·|R|·max{|V|, |R|}).

V. KNOWLEDGE UPDATE
Though we find a solution that explains the detected anomalies with the existing reasons in Algorithm 2, there remain some anomalies which are inexplicable by the knowledge set, i.e., the Set C in Table IV. There are mainly two reasons for the occurrence of the violated data in Set C: (1) some of the explanations w.r.t. a fault are not reliable enough to conclude the corresponding fault event, that is, the representations in such an explanation are far from precise, which fails to identify the fault from the violation features; (2) new fault events are discovered by the constraint set C which are not known in the present knowledge set R. In both cases, we aim to update and improve the present knowledge set according to the detected results. The updated knowledge set will provide more precise anomaly explanations in return. In this section, we propose our update strategies for the inexplicable violations. We first discuss the update of anomaly representations, especially the possible representations, utilizing the relevance between the detected violation features in Section V-A, and then introduce a knowledge set modification strategy in Section V-B, which assists in improving the quality of the knowledge set through the iterations of anomaly explanation and knowledge update.

A. Update of Anomaly Representations
As discussed above, the imperfect descriptions concluded in R are one of the most serious reasons for the remaining inexplicable anomalies. Faced with Set C in Table IV, we consider updating the explanation of fault events by either adding new representations to an explanation or directly adding an explanation w.r.t. a new fault event. We first consider updating the existing representations within a fault's explanation with a relevance analysis of the violation features, and then consider creating a new record of the unknown anomalies in R.

When we try to find the faults to explain the anomalies, there probably remain some violation instances for which we fail to find fault reasons to cover them. We need to update and modify the representations of faults in order to improve the description of faults. Faced with the update task, we generally consider inserting some new violation features into the explanation of a fault, where these new features are regarded as supplementary (possible) representations of the existing explanation. Thus, we are able to find a more precise solution H' to cover the detected violations.

When real faults happen, the violations w.r.t. one fault probably do not occur individually. On one hand, it is possible that the anomaly in one sequence S brings about multiple violations of different constraints on S. On the other hand, some violations w.r.t. a multi-sequence constraint c would occur at the same time in the involved sequences, i.e., c.domain. To achieve the interpretability and the dependability of the representation update, we introduce a relevance analysis between the existing knowledge and the learnt violation features. We discuss the relevance in Section V-A1, and then propose our update algorithm in Section V-A2.
1) Relevance in anomaly representations:
Suppose r_1^+ and r_2^+ are two anomaly representations learnt from violation features after many iterations of detection and update. The relevance between them is formalized in Definition 11. We consider different representations in the same sequence to be directly related to each other. Besides, representations in different sequences that come from one multi-sequence constraint are also directly related to each other.

Definition 11 (Relevance between r's): Given two anomaly representations r_1: ⟨S_i, F(c_m)⟩ and r_2: ⟨S_j, F(c_n)⟩, r_1 is related to r_2, denoted by r_1 ↔ r_2, if either
(1) S_i and S_j are the same sequence, while c_m and c_n are different constraints, i.e., i = j and m ≠ n, or
(2) S_i and S_j are different sequences, while c_m and c_n are the same constraint; specifically, c_m (a.k.a. c_n) is a multi-sequence constraint whose domain contains S_i and S_j.
Otherwise, r_1 and r_2 are not related to each other, denoted by r_1 ↮ r_2.

The relevance between r's is a symmetric relation, while it does not have transitivity, because there are two factors in the identification of the relation "↔", i.e., r's w.r.t. the same sequences and r's w.r.t. the same constraints. With the relation "↔", we are able to update R according to the relevance between a learnt representation r+ and an existing representation in R. Such learnt r+'s come from the detected violation features, as formalized in Proposition 5.

Proposition 5:
Given a feature v, let CoverAE(v) = {R | R ∈ R, v ∈ R.cover} be the set of all explanations which cover the occurrence of v. A feature v* is identified as a learnt representation of the anomaly event E described by R, if it satisfies (1) v* is uncovered by the solution H, and (2) v* is related to v.
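Definition 11 boils down to a small predicate over ⟨sequence, constraint⟩ pairs. A minimal Python sketch follows; the Repr class, the related function, and the domain map of multi-sequence constraints are illustrative names, not from the paper's implementation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Repr:
    """An anomaly representation r = <S, F(c)>: a sequence id plus the violated constraint."""
    seq: str         # sequence S the representation lives in
    constraint: str  # constraint c whose degree function F(c) is violated

def related(r1: Repr, r2: Repr, domain: dict) -> bool:
    """Definition 11: r1 <-> r2 iff
    (1) same sequence, different constraints, or
    (2) different sequences covered by the same multi-sequence constraint."""
    if r1.seq == r2.seq and r1.constraint != r2.constraint:
        return True
    if r1.seq != r2.seq and r1.constraint == r2.constraint:
        dom = domain.get(r1.constraint, set())   # c.domain of a multi-sequence constraint
        return r1.seq in dom and r2.seq in dom
    return False
```

The predicate is symmetric by construction but, as noted above, not transitive: two representations can each relate to a third through different factors (same sequence vs. same constraint) without relating to each other.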
2) Update Algorithm:
We propose the update process of the knowledge set R in Algorithm 3. We first construct a graph G where each vertex denotes a violation feature, and there exists an edge between two vertices v_i and v_j if they are related to each other according to Definition 11. We maintain a set of explanations CoverAE(v) for each feature v, in which each element (i.e., explanation) is able to cover v (Lines 2-3). After initializing the visit flags of features, we begin to iteratively visit uncovered features and either update them into the existing explanations or create a new anomaly explanation (Lines 5-13). Concretely, for an uncovered feature v, we compute all the features which are related to v (denoted by Candr), and obtain all the explanations which should be updated w.r.t. v (denoted by UpSet) in Line 6. This process is implemented by a function FindUp, as discussed in Algorithm 4 below. If the set UpSet is empty, which means there does not exist any anomaly event that can potentially explain the occurrence of v, we consider creating a new anomaly event described by v and its related features (Lines 7-8).

Algorithm 3: Explanation Update
Input: the knowledge set R, the set V of all detected violations, and the set V_uncover of violation features uncovered by H
Output: the updated knowledge set R
 1: construct graph G = (V, E), where each identified feature v denotes a vertex in G, and the edge (v_i, v_j) ∈ E exists if v_i is related to v_j;
 2: foreach v ∈ V do
 3:     CoverAE(v) ← {R | R ∈ R, v ∈ R.cover};
 4: initialize Flag[v] ← False for all v ∈ V;
 5: foreach v ∈ V_uncover do
 6:     Candr, UpSet ← FindUp(v);
 7:     if UpSet = ∅ then
 8:         create a new anomaly event and insert its explanation R_up into UpSet;
 9:     foreach v ∈ Candr do
10:         CoverAE(v) ← UpSet;
11:         foreach R_up ∈ UpSet do
12:             insert v into R_up as a new possible representation r+ with the initial weight w;
13:     V_uncover ← V_uncover \ Candr;
14: return R;

We enumerate each violation feature v in the set Candr, update the set CoverAE(v) with the new explanation, and insert v into R_up as a new representation r+ = ⟨S, F(c)⟩ with an initial weight w (Lines 10-12). After finishing the update process of Candr, we delete all the elements in Candr from V_uncover, and continue to process the next feature in V_uncover. We obtain the updated knowledge set R when all uncovered features have been visited.

We then introduce the proposed function FindUp in Algorithm 4, where we find (1) all uncovered features related to the input feature v, denoted by the set Candr, and (2) all the anomaly explanations which need to be updated w.r.t. v, denoted by UpSet. Given a feature v, we first mark that v has been visited, and then determine whether v has already been covered by acknowledged explanations. If so, v will not be considered as a candidate new representation; ∅ and the present set CoverAE(v) will be returned (Lines 2-3). Otherwise, when the present feature is not covered by any anomaly event, we initialize the set Candr with {v} and the set UpSet with ∅, and begin to add elements into both sets. We iteratively visit the uncovered features related to v, completing the set Candr as well as finding all the existing explanations to be updated w.r.t. v. After the loop in Lines 5-9 finishes, both sets Candr and UpSet are returned to Algorithm 3.
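Since the relation "↔" is symmetric, FindUp amounts to a depth-first traversal of the connected component of v in the relevance graph, cut short at covered features. A minimal recursive sketch, assuming adjacency lists and a CoverAE map keyed by feature (all names illustrative):

```python
def find_up(v, neighbours, cover_ae, flag):
    """Sketch of FindUp: collect Candr (uncovered features related to v) and
    UpSet (explanations that should be updated w.r.t. v)."""
    flag[v] = True
    if cover_ae.get(v):                      # v already covered: contribute its explanations only
        return set(), set(cover_ae[v])
    candr, upset = {v}, set()
    for u in neighbours.get(v, ()):          # edges of the relevance graph
        if not flag.get(u, False):
            sub_candr, sub_upset = find_up(u, neighbours, cover_ae, flag)
            candr |= sub_candr
            upset |= sub_upset
    return candr, upset
```

For example, if an uncovered feature is related to one covered feature, the covered feature's explanations land in UpSet while only the uncovered features land in Candr, matching Lines 2-3 and 5-9 of Algorithm 4.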
Algorithm 4: FindUp(v)
Input: the present violation feature v
Output: Candr, UpSet
 1: Flag[v] ← True;
 2: if CoverAE(v) ≠ ∅ then
 3:     return ∅, CoverAE(v);
 4: initialize Candr ← {v}, UpSet ← ∅;
 5: foreach v* ∈ v.neighbours do
 6:     if Flag[v*] = False then
 7:         temp_Candr, temp_UpSet ← FindUp(v*);
 8:         Candr ← Candr ∪ temp_Candr;
 9:         UpSet ← UpSet ∪ temp_UpSet;
10: return Candr, UpSet;

Complexity. In Algorithm 3, Lines 1-3 cost O(|V|) to construct the graph and O(|V|·|R|) to find the explanation sets CoverAE of all the features, where |V| and |R| denote the number of violation features and anomaly explanations, respectively. The outer loop (Lines 5-13) runs O(|V_uncover|) times, while the inner loop (Lines 9-12) spends O(|V_cover|·|UpSet| + |Candr|·|UpSet|). In practice, the size of V_uncover becomes smaller within the outer loop; thus, the outer loop is executed O(|V_uncover|/|Candr|) times on average. Together, the whole loop costs O(|V_uncover|/|Candr| · |UpSet| · max{|V_cover|, |Candr|}). Whether |V_uncover| is larger than |V_cover| or not, it always holds that |UpSet| ≤ |R|. To put it together, Algorithm 3 totally costs O(|V|·max{|V|, |R|}) to update the whole knowledge set.

We highlight that r+ is considered to be inserted into the possible explanation set of a fault with a probability w, for the reason that the updated anomaly representations are derived from multiple real detections, whose dependability is less than that of the existing knowledge. We discuss our blueprint of how to modify the knowledge set in the iteration of detections and updates in Section V-B, including the consideration of the weight w and the violation features of r.

B. Modification on the Knowledge Set
Note again that R(E) is divided into R*(E) and R+(E), where we highly trust the representations in R*(E) while we consider the representations in R+(E) with uncertainty. Actually, the possible explanation set R+(E) likely comes from the learning results of real detections, i.e., the update process in Algorithm 3. The violation features, especially the ones appearing frequently in detection phases, can be applied in the knowledge modification process.

It is practicable to return anomaly explanation results to domain experts, who are able to adjust and improve the existing knowledge system. The improved knowledge would in turn contribute to the accuracy and the reliability of the following detections. Besides the supplementary possible representations r+'s, we can also make a representation r more accurate by modifying the degree function value F(c) of r. In this section, we further discuss two kinds of potential knowledge modification strategies.

Type-1 Modification: degree function values. As proposed in Definition 2, the function value F(c) of a violation feature v measures to which degree a sequence violates the constraint c. For a violation feature of quantitative constraints denoted by v: ⟨S, F(c) = [d, u]⟩, let F'(c) = [d', u'] be the present degree function obtained from k rounds of updates. We modify the function F(c) w.r.t. v as follows:

F(c) = [d'', u''] = [(k·d' + d)/(k+1), (k·u' + u)/(k+1)],   if dist(F, F') < 1;
F(c) will be returned to manual processing,                  if dist(F, F') = 1.

Type-2 Modification: weights.
The weight w of a possible representation r+ in an anomaly explanation R is estimated by the conditional probability Pr(r+ | R) in Equation (4):

ŵ(r+) = Pr(r+ | R) = Pr(R, r+) / Pr(R) = N_positive(R, r+) / N_positive(R),   (4)

where N_positive(R) denotes the number of occurrences of the anomaly explanation R in the solution H during the learning processes, while N_positive(R, r+) denotes the number of occurrences of the condition that R exists in H and ∃v consistent with r+.

VI. EXPERIMENTAL STUDY
We now present the experimental study of the proposed methods. All experiments run on a computer with a 3.40 GHz Core i7 CPU and 32GB RAM.
A. Experimental Settings
Data source. We conduct our experiments on real-life industrial equipment data, named FPP, which has 80 sensors recording the working conditions of a fan-machine group from a large-scale fossil-fuel power plant. We have analyzed data of more than 1,620K historical time points over 5 consecutive months, together with log files and functional documents. We report our experimental results on 64 sensors after preprocessing.

Implementation. We have developed Cleanits, a data cleaning system for industrial time series, in our previous work [15], which reads and writes data from Apache IoTDB [16]. The anomaly explanation method proposed in this paper is applied as one main function of Cleanits. We implement all algorithms proposed in this paper, with the constraint-based detection method VioDetect, the explanation algorithm AEC, and the update algorithm Update. The constraints used in the experiments contain half real constraints provided by domain knowledge and half synthetic ones concluded from a long-term study of both the historical data and log files. Since not only is the industrial knowledge set far from comprehensive, but the labelled anomalies as well as explanations are also limited, we extend the existing knowledge (mostly from documents and domain experts), and manually regulate some synthetic explanations with the corresponding representations based on the acknowledged documents and fault logs.

We consider the original clean time series data as ground truth, and inject anomalies w.r.t. constraints into sequences in different time intervals. Without loss of generality, we introduce anomaly instances according to the error types in [17], [18], and carefully consider the anomaly patterns simultaneously located in multiple sequences, referring to the acknowledged real anomaly events. We totally apply 210 constraints and 60 anomaly events as the given knowledge set.

Besides the proposed method, we also implement five algorithms for comparative evaluation:
• greedyC: uses the greedy strategy for the set cover problem on the candidate explanation set G to iteratively select the event E satisfying R(E) = argmin Cost(E)/|R.cover ∩ V*_uncover|, insensitive to constraint types.
• greedynC: describes the violation of a multi-sequence constraint c with n violation features in the involved n sequences w.r.t. c, rather than applying only one feature to denote such violations, with others the same as greedyC.
• MFnC: treats the multi-sequence-constraint violation with n features in the involved sequences w.r.t. c, with others the same as the proposed AEC.
• TopK: sorts the explanations R(E)'s in the ascending order of Cost(E), and chooses the top K explanations as the result. K is determined by K = |R|·|C_vio|/|C|, where |C| is the number of applied constraints and |C_vio| is the number of detected violated constraints.
• AE: outputs all explanations satisfying Cost(E)/|R(E).cover| ≤ λ, where λ ∈ (0, 1] is a set threshold. We report results with λ = 0.4, for it provides the best results among possible threshold values.
We note that the first three algorithms are cover-based, while TopK and AE are not.

Measure. We apply the Precision (P) and Recall (R) metrics in Equation (5) to evaluate the performance of the algorithms. P measures the ratio between the number of correctly-identified anomaly events and the number of reported events, while R is the ratio between the number of correctly-identified anomaly events and the number of real anomaly events:

P = |E_report ∩ E_true| / |E_report|,   R = |E_report ∩ E_true| / |E_true|.   (5)

B. General Performance
With the condition that Constraints = 210, we perform all comparison algorithms on 4 datasets which have 10.8K time points on 64 sensors, recording data for one day. About 20 anomaly events occur in each dataset. As shown in Figure 3, VioDetect reaches high performance on both P and R on all datasets. This is the foundation for high-quality explanation computing. The proposed AEC has the best Precision scores on average, while MFnC comes second. It reveals that it is better to treat the violation of a multi-sequence constraint as only one violation feature than to maintain n features in each involved sequence. The gap in P between greedyC and greedynC also confirms this. However, both algorithms fail to provide precise explanations. This is because they treat all types of constraints equally, and fail to give priority to multi-sequence constraints, whose violations are more likely to show major features to be identified as an anomaly event.

Figure 3 shows that the four cover-based algorithms have similar Recall scores on the four datasets. It reveals that the covering solutions can capture and recall at least 85% of the anomaly events. In addition, there is almost no difference in Recall whether the constraint types are sensitive or not. As for the algorithms TopK and AE, the performance of both is not steady on different datasets. This shows that simply choosing explanations w.r.t. Cost cannot well identify the occurred anomaly events.

Fig. 3. General performance comparison on 4 datasets in FPP.
Fig. 4. Performance comparison vs. the number of constraints.
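The cover-based algorithms compared above (AEC, MFnC, greedyC, greedynC) all share the same greedy weighted set-cover core: repeatedly pick the explanation with the highest ratio of newly covered violations to cost. A minimal sketch under assumed data shapes, with Proposition-3 pruning included but the Proposition-4 step and AEC's multi-sequence priority omitted for brevity (the explain name and the tuple layout are illustrative, not from the paper's code):

```python
def explain(violations, candidates):
    """Greedy weighted set cover over candidate explanations.
    candidates: list of (name, cost, cover) tuples, cover being a set of violation features."""
    # Proposition-3 pruning: drop any candidate whose cover is contained in
    # another candidate's cover at a strictly lower cost
    pruned = [e for e in candidates
              if not any(e is not f and e[2] <= f[2] and e[1] > f[1]
                         for f in candidates)]
    uncovered, solution = set(violations), []
    while uncovered:
        # pick the explanation maximizing newly covered violations per unit cost
        name, cost, cover = max(pruned, key=lambda e: len(e[2] & uncovered) / e[1])
        if not cover & uncovered:            # remaining violations are uncoverable
            break
        solution.append(name)
        uncovered -= cover
    return solution
```

In this sketch a dominated explanation (same cover, higher cost) is never selected, and the cheapest explanation per covered violation is chosen first, mirroring the behavior that separates the cover-based methods from TopK and AE.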
C. Evaluation on Explanation Performance
We next report the performance under three vital parameters: the constraint amount (Constraints), the data amount (Time points), and the number of occurred anomaly events (AE).

Varying Constraints. Figure 4 shows the performance on the condition that Time points = 20K. It shows that VioDetect and the four cover-based algorithms achieve higher scores with the increase of Constraints, which indicates that the completeness of the constraint set affects the explaining quality. Both P and R of the cover-based algorithms tend to be stable after Constraints reaches 120. Our proposed AEC shows the best scores on both P and R; MFnC comes second, and greedyC third. It also shows that the Precision difference between AEC and MFnC becomes larger with the increasing constraint amount. This verifies again the advantage of describing the violation w.r.t. one constraint with only one feature. As for Recall, AEC and MFnC (resp. greedyC and greedynC) present quite close scores, which shows that constraint-type sensitivity does not obviously affect the Recall level.
Varying Time points. Figure 5 shows results with Constraints = 210 and varying Time points. Anomaly events are evenly located in the data. The detection performance of VioDetect is stable and only has a little drop when the data amount becomes larger, and it achieves P = 0.92 and R = 0.91 in detecting almost 2-day data (i.e., Time points = 20K). While all the comparison algorithms drop to different degrees with the increasing data amount, AEC beats the rest of the comparison algorithms in both metrics, while the drop of greedyC and greedynC is faster. It shows that the simple greedy strategy cannot provide good solutions when the data amount gets larger, where more anomaly instances exist.

Varying AE. Figure 6 reports the results with Constraints = 210 and a varying number of happened anomaly events.
Fig. 5. Performance comparison vs. the number of time points.
Fig. 6. Performance comparison vs. the number of anomaly events in Time points = 20K data.
VioDetect has stable Recall scores, while Precision has a little drop from 0.942 to 0.904 with AE varying from 5 to 30. It shows that the anomaly event amount does not obviously affect the violation detection result. AEC has the least drop in both metrics among the five comparison methods, and keeps high P and R scores even when AE reaches 30. This confirms the effectiveness of our method when faced with quite a few anomaly events. The fast decline in the performance of greedyC and greedynC shows again that the naive greedy-based algorithms compute less reliable solutions as the number of anomalies grows. The baseline algorithm TopK also drops with increasing AE, and it always provides poor results compared with the others. Note that the other baseline algorithm, AE, has closer gaps with the cover-based algorithms in P; however, it has poor Recall. We highlight that AE > 20 is a strict condition in real scenarios, even in 2-day time series data. Thus, the higher performance of our proposed AEC as well as the large gap between AEC and the other algorithms indicates that our method has the potential for effectiveness and robustness in solving real anomaly explanation problems.
D. Evaluation on Update Performance
We then introduce the performance of our knowledge update phase with three parameters: Constraints, AE, and the incomplete rate of explanations in the knowledge set R (inr%). We treat the performance of AEC with the original R as the baseline. We randomly select the explanation sets of some anomaly events in R, where we delete a percentage of possible representations. We denote the knowledge set after deletion as R−, and report the explanation results with R− by rRemove. We execute the update algorithms proposed in Section V on R−, denote the updated knowledge set as R+, and report the explanation results with R+ by Update.

Varying Constraints. Figure 7 shows the performance under R, R−, and R+ on the condition that Time points = 20K and inr% = 15%. rRemove has the worst scores on both P and R, and the score gap between AEC and rRemove tends to be larger with the growth of Constraints. It shows that the incomplete knowledge set would reduce the quality of anomaly explanation results. When it comes to Update with R+, it shows a significant improvement on both metrics. Our update method assists in recovering 93% of the original performance with R on average. Moreover, the update result becomes better with increasing constraint numbers. This is because the proposed Algorithm 4 effectively computes the relevance between uncovered violation features and the existing representations, which helps the recovery and update of the incomplete knowledge set, and further contributes to the improvement of explaining the anomalies.

Varying AE. Figure 8 presents the performance on the condition Constraints = 210, Time points = 20K, and inr% = 15%. The Precision scores of rRemove and Update are stable against the growth in AE, while the Recall scores have an obvious drop. However, the performance difference between rRemove and Update in P is much larger than the difference in R. While rRemove only reaches P = 0.7 on average, Update is able to provide more precise results and reaches 0.8 in P when faced with 30 anomaly events. Update recovers 94% of R on average. As for Recall, results on the three knowledge sets have similar scores; this indicates, to some degree, that our anomaly explanation method is robust in Recall against an incomplete knowledge set.

Varying the incomplete rate in R. Figure 9 shows the results on the condition Time points = 20K and Constraints = 210 with a varying incomplete rate of R. We put P = 0.89 and R = 0.88 of AEC with the original R along the X-axis, and focus on the method performance on R− and R+. It is obvious that the scores of both metrics drop with increasing inr%. rRemove presents only 0.69 in P and 0.73 in R with the 20%-incomplete knowledge set. Our update method is able to correctly recover the missing representations by computing and analyzing the uncovered violation features. Update recovers 96.8% in P and 98.81% in R of AEC with inr% = 4%, and 92.1% in P and 95.4% in R with inr% = 20%.

Though Update achieves an effective improvement in explaining anomalies with an incomplete knowledge set, it does not perfectly recover to the same level of AEC with the original set R. It is believed that the relation among anomaly representations within or between anomaly events is quite complex. When some representations are missing in the given knowledge, it is challenging for them to be automatically well-identified and well-recovered. Closing the performance gap between AEC and Update, and the gap between R and R+, will be addressed in our future work.

Fig. 7. Update performance vs. the number of constraints.
Fig. 8. Update performance vs. the number of anomaly events.
Fig. 9. Update performance vs. the incomplete rate of R.

TABLE V
TIME COSTS (Time points = 20K, inr% = 15%)
|C|   AE   AE Time   AE F1   In Round   UP Time   UP F1
 60   20   <1s       0.715     3K        4.69s    0.656
150   20   <1s       0.876     3K        8.41s    0.812
                               5K       10.21s    0.821
                     0.876     8K       16.39s    0.824
                     0.876    10K       20.16s    0.830
210   20   <2s       0.865     3K        9.49s    0.832
      30   <2s       0.82      3K        9.62s    0.790

E. Efficiency Results
We report the time costs of the methods in Table V, which presents results with several typical parameter values due to limited space. Here, In Round denotes the number of iterations of explanation and update. It is worth noting that the first two steps of our method (i.e., violation detection and anomaly explanation) spend little time (denoted as AE Time), and finish computing the anomalies within 2 seconds with 210 constraints on 20K data. The update time cost increases with the growth of Constraints and In Round, respectively. It shows that AE has a slight effect on the update time, for the reason that the limited AE in real scenarios will not lead to large computation in either the covering or the update algorithms. Though more iterations lead to time growth, the proposed method can present almost the same performance with fewer In Rounds. Such efficiency results show that our method has the potential to process large-scale IoT data.

VII. RELATED WORK
We summarize a few works related to the issues addressed in this paper on time series anomaly explanation.
Anomaly detection in temporal data. Anomaly detection (see [19] for a survey) is an important step in the time series management process [20], which aims to discover unexpected changes in patterns or data values in time series. Gupta et al. [4] summarize anomaly detection tasks on various kinds of temporal data and provide an overview of detection techniques (e.g., statistical techniques, distance-based approaches, classification-based approaches). Autoregression and moving-average window models (e.g., EWMA, ARIMA [21]) are widely used in outlier point detection [3]. On the other hand, anomalous subsequences are more challenging to detect, because abnormal behaviors within subsequences are difficult to distinguish from normal behaviors [1]. Sequence pattern discovery in time series is also continuously studied [22]-[24].
Rule-based temporal data cleaning. Data cleaning and repairing is of great importance in data preprocessing. With the rise of temporal data mining, effective cleaning of temporal data is gaining attention due to its valuable temporal information. Ihab F. Ilyas and Xu Chu give an overview of the end-to-end data cleaning process, including error detection and repair methods, in [10]. Both statistics-based [27], [28] and constraint-based [11], [29] cleaning are widely applied in temporal data quality improvement. [29] extends the idea of constraints from dependencies defined on relational databases (e.g., FD, CFD in [30]), and proposes sequential dependencies (SD) to describe the semantics of temporal data. Accordingly, speed constraints have been developed for sequential data and applied to time series cleaning solutions [11], [28]. Causality analysis tries to reason about the responsibility of a source in causing erroneous results. Systems like Scorpion [31] and DBRx [9] have been developed to compute the causality and responsibility of violations. DBRx discovers explanations of erroneous tuples with desirable properties, namely coverage, preciseness, and conciseness. Since the existing techniques mostly focus on relational data, we move a step further in anomaly explanation for temporal data in this paper. Our work can also complement the state-of-the-art data cleaning techniques.

VIII. CONCLUSION
We formalized the anomaly explanation problem in multivariate temporal data and constructed a self-contained 3-step method to solve it. We identified anomalies as violations of several types of constraints, and devised set-cover-based algorithms to reason about the anomaly events with the given knowledge set. Further, we proposed knowledge update methods to improve the knowledge quality, which in turn adds to the effectiveness of our method. Experiments on real IoT data showed that the proposed method computes high-quality explanation solutions of anomalies.
REFERENCES
[1] M. Toledano, I. Cohen, Y. Ben-Simhon, and I. Tadeski, "Real-time anomaly detection system for time series at scale," in Proceedings of the KDD Workshop on Anomaly Detection, 2017, pp. 56-65.
[2] T. Dasu, J. M. Loh, and D. Srivastava, "Empirical glitch explanations," in The 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '14, New York, NY, USA, August 24-27, 2014, pp. 572-581.
[3] J. Takeuchi and K. Yamanishi, "A unifying framework for detecting outliers and change points from time series," IEEE Trans. Knowl. Data Eng., vol. 18, no. 4, pp. 482-492, 2006.
[4] M. Gupta, J. Gao, C. C. Aggarwal, and J. Han, Outlier Detection for Temporal Data, ser. Synthesis Lectures on Data Mining and Knowledge Discovery. Morgan & Claypool Publishers, 2014.
[5] X. Wang and C. Wang, "Time series data cleaning: A survey," IEEE Access, vol. 8, pp. 1866-1881, 2020.
[6] T. Dasu, R. Duan, and D. Srivastava, "Data quality for temporal streams," IEEE Data Eng. Bull., vol. 39, no. 2, pp. 78-92, 2016.
[7] L. H. Chiang, E. L. Russell, and R. D. Braatz, "Fault detection and diagnosis in industrial systems," 2001.
[8] R. Fujimaki et al., "Mining abnormal patterns from heterogeneous time-series with irrelevant features for fault event detection," Stat. Anal. Data Min., vol. 2, no. 1, pp. 1-17, 2009.
[9] A. Chalamalla, I. F. Ilyas, M. Ouzzani, and P. Papotti, "Descriptive and prescriptive data cleaning," in International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, June 22-27, 2014. ACM, 2014, pp. 445-456.
[10] I. F. Ilyas and X. Chu, Data Cleaning. ACM, 2019. [Online]. Available: https://doi.org/10.1145/3310205
[11] S. Song, A. Zhang, J. Wang, and P. S. Yu, "SCREEN: Stream data cleaning under speed constraints," in Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, May 31-June 4, 2015, pp. 827-841.
[12] W. Yin, T. Yue, H. Wang, Y. Huang, and Y. Li, "Time series cleaning under variance constraints," in Database Systems for Advanced Applications, DASFAA 2018 International Workshops, ser. Lecture Notes in Computer Science, vol. 10829. Springer, 2018, pp. 108-113.
[13] R. M. Karp, "Reducibility among combinatorial problems," in 50 Years of Integer Programming 1958-2008: From the Early Years to the State-of-the-Art. Springer, 2010, pp. 219-241.
[14] J. E. Beasley and K. Jornsten, "Enhancing an algorithm for set covering problems," European Journal of Operational Research, vol. 58, no. 2, pp. 293-300, 1992.
[15] X. Ding, H. Wang, J. Su, Z. Li, J. Li, and H. Gao, "Cleanits: A data cleaning system for industrial time series," PVLDB, vol. 12, no. 12, pp. 1786-1789, 2019.
[16] C. Wang, X. Huang, J. Qiao, et al., "Apache IoTDB: Time-series database for Internet of Things," Proc. VLDB Endow., vol. 13, no. 12, pp. 2901-2904, 2020.
[17] R. S. Tsay, "Outliers, level shifts, and variance changes in time series," Journal of Forecasting, vol. 7, no. 1, pp. 1-20, 1988.
[18] R. S. Tsay, D. Pena, and A. E. Pankratz, "Outliers in multivariate time series," DES Working Papers, Statistics and Econometrics, WS, 1998.
[19] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: A survey," ACM Comput. Surv., vol. 41, no. 3, pp. 15:1-15:58, 2009.
[20] S. K. Jensen, T. B. Pedersen, and C. Thomsen, "Time series management systems: A survey," IEEE Trans. Knowl. Data Eng., vol. 29, no. 11, pp. 2581-2600, 2017.
[21] W. W. S. Wei, Time Series Analysis: Univariate and Multivariate Methods. Addison-Wesley, 1989.
[22] S. Papadimitriou, J. Sun, and C. Faloutsos, "Streaming pattern discovery in multiple time-series," in PVLDB.
[23] F. Mörchen, "Algorithms for time series knowledge mining," in Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, August 20-23, 2006, pp. 668-673.
[24] U. Rebbapragada, P. Protopapas, C. E. Brodley, and C. R. Alcock, "Finding anomalous periodic time series," Machine Learning, vol. 74, no. 3, pp. 281-313, 2009.
[25] S. M. Erfani, S. Rajasegarar, S. Karunasekera, and C. Leckie, "High-dimensional and large-scale anomaly detection using a linear one-class SVM with deep learning,"
Pattern Recognition , vol. 58, pp. 121–134,2016.[26] H. Liu, X. Li, J. Li, and S. Zhang, “Efficient outlier detection forhigh-dimensional data,”
IEEE Trans. Systems, Man, and Cybernetics:Systems , vol. 48, no. 12, pp. 2451–2461, 2018.[27] M. Yakout, L. Berti- ´Equille, and A. K. Elmagarmid, “Don’t be scared:use scalable automatic repairing with maximal likelihood and boundedchanges,” in
Proceedings of the ACM SIGMOD International Conferenceon Management of Data, SIGMOD 2013 , pp. 553–564.[28] A. Zhang, S. Song, and J. Wang, “Sequential data cleaning: A statisticalapproach,” in
Proceedings of the International Conference on Manage-ment of Data, SIGMOD Conference , 2016, pp. 909–924.[29] L. Golab, H. J. Karloff, F. Korn, A. Saha, and D. Srivastava, “Sequentialdependencies,”
PVLDB , vol. 2, no. 1, pp. 574–585, 2009.[30] W. Fan and F. Geerts,
Foundations of Data Quality Management ,ser. Synthesis Lectures on Data Management. Morgan & ClaypoolPublishers, 2012.[31] E. Wu and S. Madden, “Scorpion: Explaining away outliers in aggregatequeries,”