ABACUS: Unsupervised Multivariate Change Detection via Bayesian Source Separation
Wenyu Zhang∗ ([email protected])  Daniel Gilbert∗ ([email protected])  David S. Matteson∗ ([email protected])

Abstract
Change detection involves segmenting sequential data such that observations in the same segment share some desired properties. Multivariate change detection continues to be a challenging problem due to the variety of ways change points can be correlated across channels and the potentially poor signal-to-noise ratio on individual channels. In this paper, we are interested in locating additive outliers (AO) and level shifts (LS) in the unsupervised setting. We propose ABACUS, Automatic BAyesian Changepoints Under Sparsity, a Bayesian source separation technique to recover latent signals while also detecting changes in model parameters. Multi-level sparsity achieves both dimension reduction and modeling of signal changes. We show ABACUS has competitive or superior performance in simulation studies against state-of-the-art change detection methods and established latent variable models. We also illustrate ABACUS on two real applications: modeling genomic profiles and analyzing household electricity consumption.
Keywords: blind source separation; dimension reduction; latent factor model; multivariate change points; sparse signal extraction; unsupervised learning
Change detection segments sequential data such that observations in each segment share the same characteristics. We can view it as a specific form of clustering where sequential data points tend to cluster together. Two common sequential orderings are time and physical location. Offline change detection segments the data retrospectively and is useful for uncovering events and systematic behaviors in data analysis tasks. It is applied in a variety of fields including energy consumption [13], genomics [22] and finance [10]. Furthermore, in the potential presence of change points, utilizing change detection prior to data modeling can help prevent building inappropriate models under the assumption of data homogeneity, and consequently supports improved prediction and statistical inference.

∗ Cornell University

In this paper, we study offline multiple change detection in multivariate data, specifically where the data exhibit mean changes that can occur simultaneously in several channels. The direction and magnitude of change can be different across channels. Here, we refer to mean changes lasting a single time unit with an immediate return as additive outliers (AO), and mean changes with duration two or greater as level shifts (LS). We assume that the multivariate data are generated by low-dimensional latent source signals through linear mixing according to the model Y = M S + E, shown in Figure 1, similar to the general linear setting used in the blind source separation literature [15, 21]. Notation-wise, M is the mixing matrix, and Y, S and E are the observations, source signals and noise, respectively. Observed mean changes manifest from the latent space, and we detect changes by estimating these latent source signals, which possess 'semantic' meaning of the underlying states and are free of noise.

Figure 1: Given observations generated by the linear mixing of signals contaminated by noise, ABACUS estimates the source signals and detects additive outliers (AO, red) and level shifts (LS, blue). In M, darker and lighter cells represent negative and positive values respectively, and medium gray cells represent zero.

Multivariate data are readily observed in many applications in today's world, and mean changes are of particular interest since the mean is often a salient aspect of the system state. Multivariate data can be observations from multiple channels monitoring a single system, or a collection of univariate data streams from multiple related systems. Examples of the first scenario include household power consumption measured with sub-meters [13], and wine quality based on physicochemical test variables [1]. Examples of the second scenario include array comparative genomic hybridization measurements from several patients with the same medical condition [20].
In these and other examples, change points in multivariate data sometimes occur simultaneously in multiple channels because the signals may be driven by the same underlying processes. It is of interest to identify these shared change points to further analyze the relationship between channels. Running univariate change detection on each channel does not encourage identification of such shared changes.

Finding changes in multidimensional data is known to be a difficult problem. If the magnitude of change as measured by symmetric Kullback-Leibler divergence is kept constant, detectability of the change worsens when the data dimension P increases. This can hinder detection even at dimensions as low as P = 10 [1]. Another issue arises when the data dimension P exceeds the sample size N. If one wishes to use hypothesis testing to test for homogeneity, naive calculations of familiar test statistics such as Hotelling's t-squared statistic are prohibitive. Several approaches tackle multivariate data by incorporating a dimensionality reduction step [25, 26], but these either project the data onto a single dimension or require the user to select the reduced dimensionality.

Our main contribution is to successfully integrate sparse Bayesian blind source separation with a change detection framework. No previous work on latent variable modeling explicitly considered source signals with unconstrained mean changes. Bayesian variations of principal component analysis (PCA) are capable of automatic dimensionality selection [4, 28], and shrinkage priors also achieve desirable properties in trend filtering [19]. In our Bayesian latent model, we use horseshoe priors to recover the lower-dimensional source signals and to simultaneously model the change points. The two tasks complement each other since the source signals exhibit changes.
We propose ABACUS, Automatic BAyesian Changepoints Under Sparsity, an automatic procedure that simultaneously detects additive outliers and level shifts via estimating components from the source separation problem. Figure 1 gives an example where ABACUS recovers the true latent change space of size three by estimating values in the appropriate dimensions of M and S to zero, and ABACUS also locates relevant change points. We show through simulations and real data applications that ABACUS achieves better performance in both change detection and source recovery.

The authors of [6] formulated multivariate change detection as a group fused Lasso, and showed empirically that detection probability approaches one with increasing P when noise is small. Variants of binary segmentation produce approximately optimal segmentations by iteratively detecting single change points [20, 22]. Dynamic programming with a suitable multivariate goodness-of-fit metric can recursively segment the data [27]. The above methods directly segment the observations, and some assume independence across channels [11, 25]. We recover the latent change space with the prior belief that only the latent signals are independent given model parameters.

Some works use a two-step procedure with data compression onto a low dimension K ≪ P followed by change detection. Projection onto a single dimension enables univariate change detection [11]. For K > 1, change detection can then be run on the compressed data, but performance depends on the choice of K. Our proposed method ABACUS is more robust to the specification of K due to automatic dimensionality selection by our sparsity assumptions. In contrast to the latent variable model that we employ, these methods also ignore estimating the mixing matrix.

Bayesian approaches in change detection typically rely on using indicator variables to denote the presence of change points.
The BCP method [3, 10] assumes that observations in each segment are independent and identically distributed as Gaussian, and updates posterior segment means conditional on the segmentation at each iteration of an MCMC scheme. A uniform prior U(0, q) is put on the change point probabilities, and the user tunes the chances of discovering shorter or longer segments through q. In [13], given the segmentation informed by the indicator variables, a Wilcoxon rank sum test is performed at each index of the data and the resulting p-values are modeled as a Beta-Uniform mixture. The data likelihood is written as a composite marginal likelihood of the p-values. The formulation makes no assumption on the distributional form of the data. ABACUS similarly utilizes the sparsity of changes by applying horseshoe priors, modeling not only the presence and absence of changes, but also the change directions and magnitudes. We utilize the horseshoe prior as it is known for robustness and superior shrinkage properties [7]. Empirically, differences in neighboring non-change location means are effectively shrunk to zero.

We observe Y ∈ R^{P×N}, a P-dimensional data stream of length N. Each column takes the form

Y_{·n} = M S_{·n} + E_{·n},

where M ∈ R^{P×r} is the mixing matrix, S_{·n} is the r-dimensional source signal, and E_{·n} is the P-dimensional noise vector, at index n. This is the general formulation of the cocktail party problem with P microphones and r conversations observed for N time points. Here, Y is not necessarily a time series, but data which are indexed sequentially. S is assumed to have full row rank.

We assume that the source signals are piecewise-constant. Each segment can be of any length, and adjacent segments have different means. Latent variables are driven by the same underlying system state, and hence may share change locations, but change directions and magnitudes are not necessarily the same.
We assume that the linearly-mixed signals are corrupted by independent Gaussian noise, but noise variances are not necessarily the same across channels. In the cocktail party analogy, this means that each microphone is subject to a different amount of noise due to the environment and microphone quality. The Gaussian assumption is standard in parametric change detection models [3, 22, 26]. We aim to decompose Y into its components without further information. Although the decomposition solution is not unique, [12] reports that sparsity formulations in their Bayesian latent variable model helped to stabilize fitting. We similarly apply multiple levels of sparsity in our model, as described in the next section.

We introduce our Bayesian data model and estimation method, as well as our change detection approach, which makes use of MCMC posterior samples.
We decompose source signals further into components consisting of either additive outliers (AO) or level shifts (LS). Additive outliers are abrupt mean changes lasting for only one index, while level shifts persist for two or more indices. This decomposition allows us to naturally distinguish between the two types of changes, such that they can be studied separately; e.g., a user may remove additive outliers and retain level shifts for analysis. Let K be a user-specified upper bound for rank(S) = r such that r ≤ K < P. Then our modified formulation is

Y_{·n} = M S_{·n} + E_{·n}
S_{·n} = S^{(0)}_{·n} + S^{(1)}_{·n}
S^{(0)}_{·n} = V^{(0)}_{·n}  and  ΔS^{(1)}_{·n} = V^{(1)}_{·n},

where M is the P × K mixing matrix, S is the K × N source signal matrix, E is the P × N error matrix, S^{(0)} and S^{(1)} are the K × N component matrices of S, V^{(0)} and V^{(1)} are K × N 'sparse' matrices, and Δ is the differencing operator. The diagonal covariance matrix of E_{·n} is denoted by Ψ = diag(ψ), so E_{·n} ∼ N(0, Ψ).

We place sparse group priors on the columns of M and rows of V^{(0)} and V^{(1)} for dimensionality reduction of the latent space. Furthermore, we place sparse group priors on the columns of V^{(0)} and V^{(1)} to select a subset of indices as change locations. We also use elementwise sparsity on V^{(0)} and V^{(1)} to allow sparse changes for each latent variable. We choose to use horseshoe priors because the horseshoe-shaped shrinkage profile discovers null values without diminishing strong signals [7].
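To make the decomposition concrete, the generative model above can be sketched with a small simulation; all dimensions, change locations and distributions here are illustrative and are not the paper's experimental settings:

```python
import numpy as np

rng = np.random.default_rng(0)
P, K, N = 10, 3, 200   # observed channels, latent dimension, series length

# Sparse change matrices: V0 carries additive-outlier (AO) effects directly,
# V1 carries the *differences* of the level-shift (LS) component.
V0 = np.zeros((K, N))
V1 = np.zeros((K, N))
V0[0, 50] = 4.0        # AO on latent signal 0 at index 50
V1[1, 120] = -3.0      # LS on latent signal 1 starting at index 120

S0 = V0                        # S(0) = V(0)
S1 = np.cumsum(V1, axis=1)     # delta S(1) = V(1)  =>  S(1) = cumulative sum
S = S0 + S1                    # piecewise-constant sources

M = rng.uniform(-1.0, 1.0, size=(P, K))   # mixing matrix
psi = rng.uniform(0.1, 0.5, size=P)       # per-channel noise variances
E = rng.normal(0.0, np.sqrt(psi)[:, None], size=(P, N))
Y = M @ S + E                             # observations, shape (P, N)
```

Note how the cumulative sum of the sparse V1 produces a persistent level shift, while V0 enters the sources directly and produces a one-index spike.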
We extend the global-local shrinkage hierarchy to impose sparsity in the model at the element and group level. For 1 ≤ i ≤ P, 1 ≤ h ≤ K, 1 ≤ n ≤ N and d ∈ {0, 1}, we set priors as

M_{·h} | λ^{(0)}_h, λ^{(1)}_h, τ^{(0)}, τ^{(1)}, Ψ ∼ N(0, λ^{(0)}_h λ^{(1)}_h τ^{(0)} τ^{(1)} Ψ)
V^{(d)}_{hn} | φ^{(d)}_n, λ^{(d)}_h, γ^{(d)}_{hn}, τ^{(d)} ∼ N(0, φ^{(d)}_n λ^{(d)}_h γ^{(d)}_{hn} τ^{(d)})
ψ_i ∼ Γ^{-1}(1, 1)
τ^{(d)} | ξ^{(d)} ∼ Γ^{-1}(1/2, 1/ξ^{(d)})
λ^{(d)}_h | η^{(d)}_h ∼ Γ^{-1}(1/2, 1/η^{(d)}_h)
φ^{(d)}_n | ω^{(d)}_n ∼ Γ^{-1}(1/2, 1/ω^{(d)}_n)
γ^{(d)}_{hn} | ζ^{(d)}_{hn} ∼ Γ^{-1}(1/2, 1/ζ^{(d)}_{hn})
ξ^{(d)}, η^{(d)}_h, ω^{(d)}_n, ζ^{(d)}_{hn} ∼ Γ^{-1}(1/2, 1),

where N() denotes the Gaussian distribution and Γ^{-1}() denotes the Inverse Gamma distribution. Marginally, the shrinkage parameters τ^{(d)}, λ^{(d)}_h, φ^{(d)}_n and γ^{(d)}_{hn} are half-Cauchy, as in the horseshoe setup. Given the shrinkage parameters, we impose the prior belief that the source signals are independent, but the posterior is not necessarily so.

Let D^{(1)} be the matrix representation of Δ such that S^{(1)} [D^{(1)}]^T = V^{(1)}, and let D^{(0)} = I such that S^{(0)} = V^{(0)}. Now define the expression

F = S S^T + diag(τ^{(0)} τ^{(1)} λ^{(0)} λ^{(1)})^{-1},

which appears below. For 1 ≤ i ≤ P, 1 ≤ n ≤ N, and d ∈ {0, 1}, we derive the full conditionals for the posterior distribution of the main model components below. Distributions of all additional parameters are provided in the Supplementary Materials.
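The half-Cauchy marginals implied by this inverse-gamma hierarchy can be checked numerically; a sketch using SciPy, with sample size and seed chosen arbitrarily:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 200_000

# Auxiliary variable: xi ~ InvGamma(1/2, 1), then tau^2 | xi ~ InvGamma(1/2, 1/xi)
xi = stats.invgamma.rvs(0.5, scale=1.0, size=n, random_state=rng)
tau2 = stats.invgamma.rvs(0.5, scale=1.0 / xi, random_state=rng)
tau = np.sqrt(tau2)

# Marginally tau is half-Cauchy(0, 1): its median is tan(pi/4) = 1
print(np.median(tau))            # ≈ 1.0
# Compare the empirical CDF with the half-Cauchy CDF (2/pi) * arctan(t)
for t in (0.5, 1.0, 2.0):
    print(t, np.mean(tau <= t), 2 / np.pi * np.arctan(t))
```

The same construction, applied with conditionally conjugate inverse-gamma updates, is what makes the horseshoe hierarchy convenient inside a Gibbs sampler.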
First,

M_{i·} | · ∼ N(F^{-1} S Y_{i·}^T, ψ_i F^{-1})
ψ_i | · ∼ Γ^{-1}(1 + N/2, 1 + (Y_{i·} − M_{i·} S)(Y_{i·} − M_{i·} S)^T / 2),

and for V^{(d)}_{·n}, the full conditional distribution is

N([B^{(n)}]^{-1} M^T Ψ^{-1} C^{(n)} [D^{(d)}]^{-1}_{·n}, [B^{(n)}]^{-1}),

where

B^{(n)} = M^T Ψ^{-1} M ([D^{(d)}]^{-T}_{n·} [D^{(d)}]^{-1}_{·n}) + diag(φ^{(d)}_n λ^{(d)} γ^{(d)}_{·n} τ^{(d)})^{-1}
C^{(n)} = Y − M S + M V^{(d)}_{·n} [D^{(d)}]^{-T}_{n·}.

We use Gibbs sampling to approximate the posterior. The procedure is easily parallelized. Furthermore, the number of model components and parameters depends on K, and correctly setting a small K can significantly reduce computational time.

In our modified Y = M S + E model, multiple levels of sparsity regulate the transformations each solution pair M and S can take to reach a different solution pair, but we cannot identify the sign and scaling of M and S. To recover the components and parameters empirically, we use the median of the posterior samples to provide robustness against possible movements of the sampling path between different solutions.

In our data model, V^{(0)} and V^{(1)} contain the changes for each latent variable at each index. The matrices are sparse since only entries which correspond to changes are nonzero. Let f^{(d)}_n be the element with the largest magnitude in V^{(d)}_{·n}. At any index n, f^{(d)}_n is nonzero if and only if there is a change of type d in at least one latent variable. Finding all such indices is equivalent to finding the change locations. For robustness with empirical samples, we use the posterior median

ĝ^{(d)}_n = median(f̂^{(d)}_n).

Since we impose horseshoe priors on V^{(d)}, the entries are shrunk to approximately zero but not exactly zero.
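A sketch of computing the change summary ĝ^{(d)} from MCMC draws of V^{(d)}; the array layout (draws × K × N) and the function name are our own illustrative choices, not part of the paper:

```python
import numpy as np

def change_summary(V_draws):
    """g_hat[n]: posterior median of the largest-magnitude element of V[:, n].

    V_draws: MCMC draws of a sparse change matrix, shape (n_draws, K, N).
    """
    idx = np.argmax(np.abs(V_draws), axis=1)                        # (n_draws, N)
    f = np.take_along_axis(V_draws, idx[:, None, :], axis=1)[:, 0, :]
    return np.median(f, axis=0)                                     # (N,)

# Toy check: a persistent nonzero entry at index 3 on latent signal 1
rng = np.random.default_rng(2)
draws = rng.normal(0.0, 1e-3, size=(500, 4, 8))  # near-zero "shrunk" draws
draws[:, 1, 3] += 2.5                            # one genuine change
g = change_summary(draws)
print(np.argmax(np.abs(g)))   # → 3
```

Taking the signed largest-magnitude element (rather than its absolute value) preserves the change direction, which Algorithm 1 later uses to pair up additive outliers.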
To identify the approximately zero values in the estimated ĝ^{(d)}, we apply kernel density estimation on |ĝ^{(d)}| with a rectangular kernel, and set the cutoff at the first minimum in the density function such that the minimum value is below a threshold δ. The threshold ensures that the approximately zero and non-zero values are sufficiently different. We fix δ to the same small value for all our experiments.

We fit the full Bayesian latent variable model in Section 4.1 by first fitting a partial model. The partial model differs only in that it does not include S^{(0)} or V^{(0)} and their associated parameters, and hence we drop the superscripts when referring to its components and parameters. Change points cpt detected by the partial model are a mix of additive outliers (AO) and level shifts (LS), with the former being detected as two consecutive mean changes of opposite signs in ĝ. We distinguish between the two types of changes according to this observation with Algorithm 1, and produce additive outliers cpt_0 and level shifts cpt_1. We decompose the estimated components and parameters from the partial model according to cpt_0 and cpt_1, and pass them to the full model as initialization. For example, V^{(0)} is initialized with values from V̂ at cpt_0, and V^{(1)} is initialized with values from V̂ at cpt_1.

Algorithm 1: Separating AO and LS changes
Data: estimated ĝ, ordered change points cpt
Result: additive outliers cpt_0, level shifts cpt_1
cpt_0 ← {}; cpt_1 ← {}; i ← 1
while not at end of cpt do
    condition 1: cpt[i + 1] − cpt[i] = 1
    condition 2: ĝ corresponding to cpt[i] and cpt[i + 1] are of opposite signs
    if condition 1 and condition 2 are True then
        add cpt[i] to cpt_0; i ← i + 2
    else
        add cpt[i] to cpt_1; i ← i + 1
    end
end

The partial model is smaller and hence can quickly estimate components and parameters for initialization. This step stabilizes fitting the full model and helps to achieve better distinction between the two types of changes. The entire procedure is shown in Figure 2. The final two boxes in green indicate the final outputs for change detection and source recovery.

Figure 2: Implementation procedure. From observations Y, a partial model is first fit and its estimates initialize the full Bayesian model. Final estimates of source signals and change points are obtained from the median of MCMC samples.
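Algorithm 1 can be implemented directly; a sketch assuming 0-based indexing and a signed change summary g (function and variable names are illustrative; the two returned lists correspond to the AO and LS sets):

```python
import numpy as np

def separate_changes(g, cpt):
    """Split ordered change points into additive outliers (AO) and level
    shifts (LS): two consecutive indices whose estimated changes in g have
    opposite signs form a single AO (an up-then-down or down-then-up pair)."""
    ao, ls = [], []
    i = 0
    while i < len(cpt):
        adjacent = i + 1 < len(cpt) and cpt[i + 1] - cpt[i] == 1
        opposite = adjacent and g[cpt[i]] * g[cpt[i + 1]] < 0
        if adjacent and opposite:
            ao.append(cpt[i])       # the pair collapses into one AO
            i += 2
        else:
            ls.append(cpt[i])       # an isolated mean change is an LS
            i += 1
    return ao, ls

# Toy example: an AO at index 10 (changes at 10 and 11 with opposite signs)
# and an LS starting at index 40
g = np.zeros(50)
g[10], g[11], g[40] = 3.0, -3.0, 2.0
ao, ls = separate_changes(g, [10, 11, 40])
print(ao, ls)   # → [10] [40]
```

Note that two adjacent changes with the *same* sign are kept as two level shifts, since they cannot be a spike-and-return pattern.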
We conduct several experiments according to the model Y = M S + E described in Section 3. We fix the latent space dimensionality r = 3, and vary N and P. Some methods require a user-specified K as an estimate for r, and we test their robustness to the selection of K. Entries of M are drawn independently from a symmetric uniform distribution, noise variances ψ_i from a uniform distribution bounded away from zero, and change locations from alternating indices in {1, ..., N}. This ensures that level shifts are at least of length two and that we do not unintentionally construct level shifts through consecutive additive outliers. To construct sparse changes, at each change location, the number of latent signals experiencing change is selected uniformly at random. Change magnitudes are drawn from Unif(1, 5), with the sign being equally likely to be positive or negative.

We compare ABACUS against state-of-the-art change detection techniques and popular latent variable models, which are marked by × and ◦, respectively, in plots in this section. We use default parameters in software packages unless otherwise specified. To find additive outliers, we set the minimum segment length parameter to one where possible in competing change detection implementations. The detected changes are categorized into additive outliers and level shifts using Algorithm 1 without Condition 2, except for TSO mentioned below, which automatically outputs different types of changes. For all MCMC procedures, the number of iterations is 3000 and the burn-in is 500. Each simulation is run 100 times, and we report the average performance according to the evaluation metrics in Section 6.1.

Amongst competing multivariate change detection methods, GFLseg [6] finds candidate mean changes by group fused Lasso followed by selection via dynamic programming. E-divisive [20] uses binary segmentation to iteratively locate each single change point through measuring between-segment distance by the energy statistic. We specify its moment index parameter α = 2 to find level shifts, and min.size = 2 as the smallest segment length allowed, which implies E-divisive is unable to find additive outliers. BCP [10] is a Bayesian method which models the presence of mean change at each location through an indicator variable and uses MCMC sampling to infer the posterior probability of change. BCP outputs a set of change points corresponding to each posterior sample, hence for evaluation we compute the average metric across all these sets. We also combine BPCA [4] and BCP to obtain a two-step Bayesian approach that first compresses and then detects. Inspect [25] transforms observations into a univariate series through a cumulative sum transformation before applying wild binary segmentation.
We also test three univariate methods by first applying PCA to the observations. PELT [18] is a popular parametric approach that uses dynamic programming to efficiently find the segmentation that minimizes the negative log-likelihood plus a penalty. We refer to the nonparametric version, which uses the empirical distribution instead, as np-PELT [14]. A third method, TSO, jointly estimates ARIMA model parameters and change effects due to additive outliers and level shifts [8].

For the latent variable model, we test against well-established methods including Independent Component Analysis (ICA), Factor Analysis (FA) and Bayesian Principal Component Analysis (BPCA). Note that ICA and FA do not impose sparsity assumptions, whereas BPCA imposes sparsity on the columns of M. For ICA, we use the FastICA implementation, which measures non-Gaussianity using negentropy [16]. For FA, we use the factanal function in R [24], which automatically checks for identifiability given K and does not fit a model if K is too large to fit a unique model.

We evaluate the detection of additive outliers and level shifts separately, since some competing methods [20] detect one but not the other. We report precision and recall, and treat an estimate as accurate if it is within w of a true change location. We set w = 1 for the small sample experiment in Section 6.2, and w = 3 for the larger sample experiments in Sections 6.3 and 6.4.

We evaluate the quality of model recovery through components M and S, and the noise variance parameter ψ. Given the true mixing matrix M and estimate M̂, we center and scale each row of the matrices and measure their dissimilarity using the squared trace metric in [12],

ε_M = (1/P) Tr[(M M^T − M̂ M̂^T)^2].

The metric ε_M is invariant to orthogonal rotation and allows cases where either M M^T or M̂ M̂^T is singular. Next, given true source signals S and estimate Ŝ, we measure their dissimilarity using a variation of averaged squared Euclidean distance,

ε_S = (1/r) Σ_{i=1}^{r} (1 − |ρ_i|),

where ρ_i is the Pearson correlation coefficient between S_{i·} and some Ŝ_{j·}, and each pair is found greedily by descending magnitude of correlation. This measure is invariant to sign and label switching. Finally, given true noise variance ψ and estimate ψ̂, the difference is measured by their scaled squared norm,

ε_E = (1/P) ||ψ − ψ̂||^2.

Varying P. We test the case of small sample size N = 100 and varying P, with the largest setting at P = 110. Each sample has two additive outliers and two level shifts, and K is set to 5. As seen from Figure 3, competing methods have high precision but low recall on additive outliers. As P increases, ABACUS can locate most of the additive outliers, and is one of the best-performing methods for level shifts. Both precision and recall on level shifts decrease as P increases for BCP, possibly because parameters such as the prior on change probabilities need to be adjusted. BPCA + BCP has more consistent performance, indicating the advantage of detecting changes on latent signals. In terms of model recovery, our method also gives the lowest errors for M, S and ψ; see Figure 4.

Varying N. We fix P = 10 and vary N. Each sample has a number of additive outliers and level shifts scaling with N, and K is set to 5. Performance of all methods is consistent across N, as shown in Figures 5 and 6. BCP shows deteriorating performance in detecting level shifts just as it did in Section 6.2, again possibly because model parameters need to be adjusted according to the sample size.
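The three recovery metrics ε_M, ε_S and ε_E can be sketched in code; the squared form inside the trace of ε_M and the structure of the greedy matching loop are our reading of the definitions in Section 6.1, and the function names are illustrative:

```python
import numpy as np

def eps_M(M, M_hat):
    """Squared trace dissimilarity between row-standardized mixing matrices;
    invariant to orthogonal rotation of the latent space."""
    A = (M - M.mean(axis=1, keepdims=True)) / M.std(axis=1, keepdims=True)
    B = (M_hat - M_hat.mean(axis=1, keepdims=True)) / M_hat.std(axis=1, keepdims=True)
    D = A @ A.T - B @ B.T
    return np.trace(D @ D) / M.shape[0]

def eps_S(S, S_hat):
    """Average of (1 - |Pearson correlation|) over greedily matched rows;
    invariant to sign and label switching."""
    r = S.shape[0]
    C = np.abs(np.corrcoef(S, S_hat)[:r, r:])      # r x K cross-correlations
    rows, cols, total = set(range(r)), set(range(S_hat.shape[0])), 0.0
    for _ in range(r):                             # match by descending |corr|
        i, j = max(((i, j) for i in rows for j in cols), key=lambda t: C[t])
        total += 1.0 - C[i, j]
        rows.discard(i)
        cols.discard(j)
    return total / r

def eps_E(psi, psi_hat):
    """Scaled squared norm between true and estimated noise variances."""
    return np.sum((psi - psi_hat) ** 2) / len(psi)
```

As a sanity check, a perfect recovery (or a sign-flipped one, for ε_S) should score approximately zero under each metric.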
Overall, ABACUS offers the best balance of precision and recall on additive outliers, while all other competing change detection methods tend to miss them. ABACUS has the highest recall for level shifts, and almost always has the lowest errors for model recovery.

Varying K. We fix P = 10 and N = 1000. Each sample has ten additive outliers and ten level shifts. We vary the user-specified estimate of the latent space dimensionality K between 2 and 9. The true dimensionality r is 3. The horizontal lines in Figures 7 and 8 correspond to results of methods which do not have the parameter K. According to Figure 7, the change detection results of ABACUS are consistent across K. From Figure 8, ABACUS has a much more consistent error ε_S in S compared to competing latent variable models, whose ε_S increases sharply for K ≥ r.

In both data applications below we set K = 5 and also study the robustness of ABACUS to different K values.

Array-based comparative genomic hybridization (aCGH) is a technique for studying copy number alterations in the event of disease. We obtain the dataset from the R package ecp [17], which has already removed sequences with more than 7% missing values, leaving 43 samples from different individuals with bladder tumors. Each sample has 2215 probes measuring the log2 ratio between the number of transcribed DNA copies from tumorous cells and from a healthy reference [13]. A negative ratio indicates deletion, a positive ratio indicates amplification, and zero indicates an unaltered segment. We expect shared change locations for individuals with the same medical condition.

To reduce computation and ease visualization, we thin the samples by taking every 20th value. We arrive at a dataset with P = 43 and N = 111. ABACUS takes approximately one minute to run on a standard desktop computer, and finds three additive outliers and seven level shifts.
An additive outlier here indicates a shorter segment of genetic aberration compared to a level shift. A plot of all 43 samples with the estimated change points overlaid is in the Supplementary Materials. At least 99% of the variance of our estimated latent signals can be explained by four principal components, while those found by ICA and FA require all five. As observed in Figure 9, the third latent source signal recovered exhibits no evident changes. We map the four other signals to unique sets of genetic aberrations in different stages of bladder tumor in Table 1. For instance, patients with genetic aberrations on chromosome arms 2q, 3q and 20p/q simultaneously tend to be in a common tumor stage, hence the changes detected can be indicative of disease for new patients. The mapping is established based on a bladder tumor research article [5] which lists the frequent genomic alterations by chromosome arm in tumor stages pTa, pT1 and pT2−4. Each stage is determined pathologically depending on the size and location of the tumor.

Figure 9: aCGH: Latent source signals (1-5) recovered (black), and additive outliers (red) and level shifts (blue) detected. Gray lines indicate the boundaries between chromosome pairs.

Table 1: aCGH: Genetic aberrations corresponding to changes detected on latent source signals; each row maps a latent source signal to the chromosome arms with changes (e.g., signal 1 to 2q, 3q, 20p/q) and the associated tumor stage. To read the table, 20p is the short arm of chromosome 20, and 20q is the long arm. Tumor stages range from a, 1 to 4 in order of severity.

ABACUS performs consistently across different K. Figure 10 shows that for other choices of K, the change points and latent source signals recovered are very similar to those found with K = 5.

Figure 10: aCGH: Changes and latent source signals recovered by ABACUS are similar regardless of the specification of K. (a) Additive outliers (red) and level shifts (blue). (b) Average correlation to latent signals at K = 5.

Figure 3: Average errors in change detection as data dimensionality P is varied; N = 100 and K = 5 are fixed. (a) Additive outliers. (b) Level shifts.

Figure 4: Average errors in model recovery as data dimensionality P is varied; N = 100 and K = 5 are fixed. FA does not support computations for P = 110 due to non-identifiability. (a) Error ε_M for M. (b) Error ε_S for S. (c) Error ε_E for ψ.

Figure 5: Average errors in change detection as sample size N is varied; P = 10 and K = 5 are fixed. (a) Additive outliers. (b) Level shifts.

Figure 6: Average errors in model recovery as sample size N is varied; P = 10 and K = 5 are fixed. (a) Error ε_M for M. (b) Error ε_S for S. (c) Error ε_E for ψ.

Figure 7: Average errors in change detection as estimated latent space dimensionality K is varied; N = 1000 and P = 10 are fixed. (a) Additive outliers. (b) Level shifts.

Figure 8: Average errors in model recovery as latent space dimensionality parameter K is varied; N = 1000 and P = 10 are fixed. FA does not support computations for large K due to non-identifiability. (a) Error ε_M for M. (b) Error ε_S for S. (c) Error ε_E for ψ.

This dataset contains per-minute measurements of electric power consumption in one household and is available on the UCI Machine Learning Repository [9]. The data has seven dimensions including global active power (GAP), global reactive power (GRP), voltage (V), global intensity (GI), and three sub-meterings corresponding to the kitchen (S1), laundry room (S2) and heating system (S3). We expect shared change points since the seven dimensions are related arithmetically, and some electrical appliances tend to be used simultaneously. For instance, GAP − S1 − S2 − S3 is the power consumed by appliances outside of the kitchen, laundry room and heating system. We analyze a full day's worth of data; that is, the observation matrix has P = 7 and N = 1440. ABACUS takes approximately fifteen minutes to run on a standard desktop computer.

The Supplementary Materials contain a plot of the standardized data with estimated changes overlaid. Although the data do not follow our model assumptions exactly, since the amount of fluctuation or noise is more significant in the first half of the day and there are minor trend changes in the second half of the day, ABACUS is robust, and with post-processing it finds one additive outlier and sixteen level shifts. We post-process by dynamic programming to prune the initially estimated level shifts. This is similar to GFLseg [6], except that we apply the procedure on the latent source signals, which are less contaminated by noise.

The change points are indicative of the household's pattern of electricity usage, which concentrates in the first half of the day, as illustrated in Figure 11. The fourth latent signal reflects the usage fluctuations and trends which differ across the two halves of the day, as measured by GAP and GI. ABACUS performs consistently across different specifications of K. Figure 12 shows that for other choices of K, the estimated change points and latent source signals recovered are similar to those found at K = 5.

Figure 11: Power: Latent source signals (1-5) recovered (black), and additive outliers (red) and level shifts (blue) detected.

Figure 12: Power: Changes and latent source signals recovered by ABACUS are similar regardless of the specification of K. (a) Additive outliers (red) and level shifts (blue). (b) Average correlation to latent signals at K = 5.

Since the sub-meterings S1, S2 and S3 demonstrate distinct level shifts when the respective appliances are utilized, we extract ground truths for level shifts by finding positions where these signals deviate from their base levels. Compared to the other change detection methods in Figures 13 and 14, ABACUS has the best overall performance, with a precision of 1 and the highest recall.

Figure 13: Power: Additive outliers (red) and level shifts (blue) estimated vs ground truth level shifts (green).

Figure 14: Power: Performance in estimating level shifts. (a) Precision. (b) Recall.
In this paper, we propose ABACUS, an automatic change detection procedure which makes use of Bayesian latent variable modeling. Due to the separation of additive outlier and level shift effects in the model, ABACUS naturally identifies these two types of changes separately, unlike many competing approaches. In simulations, ABACUS shows competitive or superior performance in both change detection and model recovery. In two real data applications, ABACUS found relevant change points and source signals. It is robust to over-specification of K, an important property since the true value is rarely known to the user in practice.

NSF grant DMS-1455172, a Xerox PARC Faculty Research Award, and Cornell University Atkinson Center for a Sustainable Future AVF-2017 is gratefully acknowledged.
References

[1] C. Alippi, G. Boracchi, D. Carrera, and M. Roveri, Change detection in multivariate data streams: Likelihood and detectability loss, in IJCAI, 2016.
[2] G. K. Atia, Change detection with compressive measurements, Signal Processing Letters, 22 (2015), pp. 182–186.
[3] D. Barry and J. A. Hartigan, A Bayesian analysis for change point problems, JASA, 88 (1993), pp. 309–319.
[4] C. M. Bishop, Bayesian PCA, in NIPS, Cambridge, MA, USA, 1999, MIT Press, pp. 382–388.
[5] E. Blaveri, J. L. Brewer, R. Roydasgupta, J. Fridlyand, S. DeVries, T. Koppie, S. Pejavar, K. Mehta, P. Carroll, J. P. Simko, and F. M. Waldman, Bladder cancer stage and outcome by array-based comparative genomic hybridization, Clinical Cancer Research, 11 (2005), pp. 7012–7022.
[6] K. Bleakley and J.-P. Vert, The group fused lasso for multiple change-point detection, (2011).
[7] C. M. Carvalho, N. G. Polson, and J. G. Scott, Handling sparsity via the horseshoe, in AISTATS, vol. 5 of Proceedings of Machine Learning Research, PMLR, 16–18 Apr 2009, pp. 73–80.
[8] J. L. de Lacalle, tsoutliers: Detection of Outliers in Time Series, 2017. R package version 0.6-6.
[9] D. Dheeru and E. Karra Taniskidou, UCI machine learning repository, 2017.
[10] C. Erdman and J. Emerson, bcp: An R package for performing a Bayesian analysis of change point problems, JSS, 23 (2007), pp. 1–13.
[11] P. Fryzlewicz, Wild binary segmentation for multiple change-point detection, The Annals of Statistics, 42 (2014), pp. 2243–2281.
[12] C. Gao, C. Brown, and B. Engelhardt, A latent factor model with a mixture of sparse and dense factors to model gene expression data with confounding effects, (2013).
[13] F. Harlé, F. Chatelain, C. Gouy-Pailler, and S. Achard, Bayesian model for multiple change-points detection in multivariate time series, IEEE Transactions on Signal Processing, 64 (2016), pp. 4351–4362.
[14] K. Haynes, R. Killick, P. Fearnhead, and I. Eckley, changepoint.np: Methods for Nonparametric Changepoint Detection, 2016. R package version 0.0.2.
[15] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis, Wiley Interscience, 2001.
[16] A. Hyvärinen and E. Oja, Independent component analysis: algorithms and applications, Neural Networks, 13 (2000), pp. 411–430.
[17] N. A. James and D. S. Matteson, ecp: An R package for nonparametric multiple change point analysis of multivariate data, JSS, 62 (2014), pp. 1–25.
[18] R. Killick, P. Fearnhead, and I. Eckley, Optimal detection of changepoints with a linear computational cost, JASA, 107 (2012), pp. 1590–1598.
[19] D. R. Kowal, D. S. Matteson, and D. Ruppert, Dynamic shrinkage processes, (2018).
[20] D. S. Matteson and N. A. James, A nonparametric approach for multiple change point analysis of multivariate data, JASA, 109 (2014), pp. 334–345.
[21] D. S. Matteson and R. S. Tsay, Independent component analysis via distance covariance, JASA, 112 (2017), pp. 623–637.
[22] A. B. Olshen and E. Venkatraman, Circular binary segmentation for the analysis of array-based DNA copy number data, Biostatistics, 5 (2004), pp. 557–572.
[23] T. D. Popescu, Blind separation of vibration signals and source change detection application to machine monitoring, Applied Mathematical Modelling, 34 (2010), pp. 3408–3421.
[24] R Foundation for Statistical Computing, R: A Language and Environment for Statistical Computing, Vienna, Austria, 2018.
[25] T. Wang and R. J. Samworth, High dimensional change point estimation via sparse projection, JRSS: Series B, 80 (2018), pp. 57–83.
[26] Y. Xie, M. Wang, and A. Thompson, Sketching for sequential change-point detection, in GlobalSIP, Dec 2015, pp. 78–82.
[27] W. Zhang, N. A. James, and D. S. Matteson, Pruning and nonparametric multiple change point detection, in 2017 IEEE International Conference on Data Mining Workshops (ICDMW), Nov 2017, pp. 288–295.
[28] Q. Zhao, D. Meng, Z. Xu, W. Zuo, and L. Zhang, Robust principal component analysis with complex noise, in ICML - Volume 32, JMLR.org, 2014, pp. II-55–II-63.
A Supplementary Materials

A.1 Posterior Distributions
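All full conditionals in this section are conjugate (normal or inverse-gamma), so posterior simulation reduces to a Gibbs sampler. As a minimal illustration of how the inverse-gamma conditionals below translate into code, the following sketch updates the local scales λ_h^{(0)} together with their auxiliaries η_h^{(0)}; the array layout and function names are our own assumptions, not the authors' implementation.

```python
import numpy as np

def inv_gamma(shape, rate, rng):
    """Draw from an Inverse-Gamma(shape, rate) distribution via a Gamma draw."""
    return 1.0 / rng.gamma(shape, 1.0 / rate)

def update_lambda0(M, psi, V0, phi0, gamma0, tau0, tau1, lam1, eta0, rng):
    """One Gibbs sweep over lambda_h^{(0)} and eta_h^{(0)}, h = 1..K.

    Assumed shapes: M is p x K, psi length p, V0 is K x N, phi0 length N,
    gamma0 is K x N, lam1 and eta0 length K; tau0 and tau1 are scalars.
    """
    p, K = M.shape
    N = V0.shape[1]
    lam0 = np.empty(K)
    for h in range(K):
        # H_h^{(0)}: sum of scaled squared loadings and level-shift sources
        H0_h = (np.sum(M[:, h] ** 2 / (tau0 * tau1 * lam1[h] * psi))
                + np.sum(V0[h] ** 2 / (phi0 * gamma0[h] * tau0)))
        lam0[h] = inv_gamma((p + N) / 2.0, 1.0 / eta0[h] + H0_h / 2.0, rng)
        # Auxiliary update (parameter expansion for the half-Cauchy prior)
        eta0[h] = inv_gamma(1.0, 1.0 + 1.0 / lam0[h], rng)
    return lam0, eta0
```

The remaining scale and auxiliary parameters follow the same pattern, differing only in which sums enter the rate.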
Let $D^{(1)}$ be the matrix representation of $\triangle$ such that $S^{(1)}\left[D^{(1)}\right]^T = V^{(1)}$. Also $D^{(0)} = I$ so that $S^{(0)} = V^{(0)}$. We define the following expressions for the full conditionals of the posterior distributions:

$$F = SS^T + \mathrm{diag}\left(\tau^{(0)}\tau^{(1)}\lambda^{(0)}\lambda^{(1)}\right)^{-1}$$

$$G^{(0)} = \sum_{i=1}^{p}\sum_{h=1}^{K} \frac{M_{ih}^2}{\lambda_h^{(0)}\lambda_h^{(1)}\tau^{(1)}\psi_i} + \sum_{n=1}^{N}\sum_{h=1}^{K} \frac{\left[V^{(0)}_{hn}\right]^2}{\phi_n^{(0)}\lambda_h^{(0)}\gamma_{hn}^{(0)}}$$

$$H_h^{(0)} = \sum_{i=1}^{p} \frac{M_{ih}^2}{\tau^{(0)}\tau^{(1)}\lambda_h^{(1)}\psi_i} + \sum_{n=1}^{N} \frac{\left[V^{(0)}_{hn}\right]^2}{\phi_n^{(0)}\gamma_{hn}^{(0)}\tau^{(0)}}$$

For $1 \le i \le P$, $1 \le h \le K$, $1 \le n \le N$ and $d \in \{0, 1\}$, we derive the full conditionals for the posterior distributions below. We leave out $\tau^{(1)}$ and $\lambda_h^{(1)}$ since their full conditional distributions are similar in form to those of $\tau^{(0)}$ and $\lambda_h^{(0)}$ respectively.

$$M_{i\cdot}\mid\cdot \sim \mathcal{N}\left(F^{-1}SY_{i\cdot},\; \psi_i F^{-1}\right)$$

$$\psi_i\mid\cdot \sim \Gamma^{-1}\left(\frac{N}{2},\; \frac{(Y_{i\cdot}-M_{i\cdot}S)^T(Y_{i\cdot}-M_{i\cdot}S)}{2}\right)$$

$$\tau^{(0)}\mid\cdot \sim \Gamma^{-1}\left(\frac{K(p+N)}{2},\; \frac{1}{\xi^{(0)}} + \frac{G^{(0)}}{2}\right)$$

$$\xi^{(d)}\mid\cdot \sim \Gamma^{-1}\left(1,\; 1 + \frac{1}{\tau^{(d)}}\right)$$

$$\lambda_h^{(0)}\mid\cdot \sim \Gamma^{-1}\left(\frac{p+N}{2},\; \frac{1}{\eta_h^{(0)}} + \frac{H_h^{(0)}}{2}\right)$$

$$\eta_h^{(d)}\mid\cdot \sim \Gamma^{-1}\left(1,\; 1 + \frac{1}{\lambda_h^{(d)}}\right)$$

$$\phi_n^{(d)}\mid\cdot \sim \Gamma^{-1}\left(\frac{K}{2},\; \frac{1}{\omega_n^{(d)}} + \frac{1}{2}\sum_{h=1}^{K}\frac{\left[V^{(d)}_{hn}\right]^2}{\lambda_h^{(d)}\gamma_{hn}^{(d)}\tau^{(d)}}\right)$$

$$\omega_n^{(d)}\mid\cdot \sim \Gamma^{-1}\left(1,\; 1 + \frac{1}{\phi_n^{(d)}}\right)$$

$$\gamma_{hn}^{(d)}\mid\cdot \sim \Gamma^{-1}\left(1,\; \frac{1}{\zeta_{hn}^{(d)}} + \frac{\left[V^{(d)}_{hn}\right]^2}{2\,\lambda_h^{(d)}\phi_n^{(d)}\tau^{(d)}}\right)$$

$$\zeta_{hn}^{(d)}\mid\cdot \sim \Gamma^{-1}\left(1,\; 1 + \frac{1}{\gamma_{hn}^{(d)}}\right)$$

For $V^{(d)}_{\cdot n}$, the full conditional distribution is

$$\mathcal{N}\left(\left[B^{(n)}\right]^{-1} M^T\Psi^{-1}C^{(n)}\left[D^{(d)}\right]^{-1}_{\cdot n},\; \left[B^{(n)}\right]^{-1}\right)$$

where

$$B^{(n)} = M^T\Psi^{-1}M\left(\left[D^{(d)}\right]^{-T}_{n\cdot}\left[D^{(d)}\right]^{-1}_{\cdot n}\right) + \mathrm{diag}\left(\phi_n^{(d)}\lambda^{(d)}\gamma^{(d)}_{\cdot n}\tau^{(d)}\right)^{-1}$$

$$C^{(n)} = Y - MS + MV^{(d)}_{\cdot n}\left[D^{(d)}\right]^{-T}_{n\cdot}$$

A.2 Additional Plots for aCGH Data
Figure 15 plots all 43 samples with the estimated change points overlaid. For comparison, we include again the recovered latent source signals with the estimated change points overlaid in Figure 16.
A.3 Additional Plots for Electric Power Consumption Data
Figure 17 plots the standardized data from the electric power consumption dataset with estimated changes overlaid. ABACUS is run on the standardized dataset. For comparison, we include again the recovered latent source signals with the estimated change points overlaid in Figure 18.

Figure 15: aCGH: Additive outliers (red) and level shifts (blue) detected by ABACUS. Gray lines indicate the boundaries between chromosome pairs. Additive outliers correspond to shorter segments of genetic aberrations and level shifts correspond to longer segments.
Figure 16: aCGH: Latent source signals (1–5) recovered (black), and additive outliers (red) and level shifts (blue) detected. Gray lines indicate the boundaries between chromosome pairs.

Figure 17: Power: Additive outliers (red) and level shifts (blue) detected by ABACUS. The level shifts detected correspond well with appliance usages in sub-meterings S1, S2 and S3.