# Changepoint detection on a graph of time series

CChangepoint detection on a graph of time series

Karl L. Hallgren ∗ , Nicholas A. Heard and Melissa J. Turcotte Department of Mathematics, Imperial College London Microsoft Corporation

Abstract

When analysing multiple time series that may be subject to changepoints, it is sometimespossible to specify a priori, by means of a graph G , which pairs of time series are likely to beimpacted by simultaneous changepoints. This article proposes a novel Bayesian changepointmodel for multiple time series that borrows strength across clusters of connected time seriesin G to detect weak signals for synchronous changepoints. The graphical changepoint modelis further extended to allow dependence between nearby but not necessarily synchronouschangepoints across neighbour time series in G . A novel reversible jump MCMC algorithmmaking use of auxiliary variables is proposed to sample from the graphical changepoint model.The merit of the proposed model is demonstrated via a changepoint analysis of real networkauthentication data from Los Alamos National Laboratory (LANL), with some success atdetecting weak signals for network intrusions across users that are linked by network connec-tivity, whilst limiting the number of false alerts. Keywords : changepoint detection; graphical model; informative prior; auxiliary variable MCMC;cyber-security.

1. Introduction

This article considers the problem of jointly analysing multiple time series that may be subject tochangepoints. For each time series, it is of interest to identify the times when the distribution ofobservations undergoes a change. The focus is on settings for which there are particular clustersof time series more likely to be simultaneously aﬀected by changepoints, where the structure ofthose clusters is described by a graph.To motivate our work, we consider an application of changepoint detection in cyber-security.A cyber attack typically changes the behaviour of the target network. Therefore, to detect thepresence of a network intrusion, it is informative to monitor for changes in the high-volumedata sources that are collected inside an enterprise computer network. However, cyber data areoften complex and heterogeneous, and apparent changes are not guaranteed to correspond to anattack. As a result, to limit the number of false alerts and yet not overlook weak signals fromgenuine, small footprint attacks, it is key to incorporate expert knowledge. A commonly heldbelief of security experts is that cyber attacks tend to correspond to coordinated sequences ofevents across multiple connected endpoints on the network (Sexton et al., 2015). For example,some attacks are most likely to be identiﬁed through a chain of behavioural changes acrossrelated activity types on the same machine, or across machines linked by network connectivity.Hence, there is a requirement for methods that combine evidence from multiple sources to detectchangepoints, whilst taking into account which structures of changepoints are a priori likely tobe of interest. ∗ Email: [email protected] a r X i v : . [ s t a t . M E ] F e b here already exist changepoint detection methods that can borrow strength across multipletime series to detect synchronous changepoints across a subset of the time series; however,limited attention has been given to structures formed by changepoints across multiple timeseries. For instance, Jeng et al. (2012) uses higher criticism in conjuction with circular binarysegmentation to detect synchronous changepoints that aﬀect a proportion of the time series.Bardwell and Fearnhead (2017) gives a Bayesian approach to analysing multiple time serieswith the aim of detecting regions where the distribution of the data deviates from some normalbehaviour, allowing that changepoints may only be present in a subset of time series. Bolton andHeard (2018) presents a Bayesian changepoint model for multivariate processes that are subjectto regime switching. Wang and Samworth (2018) proposes a method via projection to studymultivariate time series in which, at certain time points, the mean structure changes in a sparsesubset of the components. Grundy et al. (2020) presents a method that maps a high-dimensionaltime series to a two dimensional space in order to detect changes in the mean and variance of theoriginal time series. Fisch et al. (2019) gives an approach to detect collective anomalies withinmultivariate data streams such that anomalies are not necessarily perfectly aligned. Bardwellet al. (2019) addresses the problem of pooling information across multiple time series to detectthe most recent changepoints.Graphical models provide a useful framework for characterising joint distributions for randomvariables by means of a graph: the nodes identify the random variables and the edges characterisedependencies among these variables (Lauritzen, 1996). Areas of application are diverse and in-clude real-life network modelling where edges are natural representations of network connections.In particular, graphical models have been employed to encode prior beliefs on the entities of agraph: for example, in the context of Bayesian variable selection for regression models, Li andZhang (2010) assumes that covariates lie on an undirected graph and formulates an Ising modelprior on the covariate space to incorporate structural information.This article proposes a novel Bayesian changepoint model for graphs of time series that takesinto account structures of changepoints across time series. The model relies on the construction ofa class of undirected graphical model-based priors for changepoints that can be used to encodeprior beliefs on the dependence structure of changepoints across time series. Our approachassumes that time series indices lie in an undirected graph G that describes which pairs of timeseries are likely to be simultaneous aﬀected by changepoints. For our purposes, changepointsare represented in discrete time by a binary matrix S = ( S i,t ) , such that S i,t indicates whetherthe time point t is a changepoint for the time series with index i . Then, independent andidentical Markov random ﬁelds (Lauritzen, 1996) with respect to G are assumed a priori for thecolumns of S . As a result, the model assumes that clusters of time series, according to G , arelikely to be simultaneously aﬀected by changepoints. A key consequence of the model is thatstronger evidence from data is required to infer scattered synchronous changepoints than clusteredsynchronous changepoints according to G . Furthermore, a more general model is proposed thatadmits changepoints not occuring at the exact same time to form a priori likely structures acrosstime series; the extended model supposes that changepoints may cluster according to G withinsome time windows of possibly unknown lengths, which are speciﬁc to each series.A common approach to sampling changepoints for a single time series is that of Green(1995) and consists of using a reversible jump MCMC algorithm to explore the state spaceof the changepoints: at each iteration of the algorithm, a new changepoint is proposed, orelse an existing changepoint is either deleted or shifted to a new position. Specifying a jointmodel for dependent changepoints across multiple time series introduces additional computationalchallenges that are not present when changepoints are inferred for each time series independently.It will be demonstrated via a simulation study that it can be impractical to simply propose updatesto the changepoints of a randomly chosen time series via one of the moves of Green (1995). Toeﬃciently explore the state space of dependent changepoints, it is necessary to consider joint2roposals for changepoints across multiple time series.We propose an MCMC algorithm making use of auxiliary variables (Besag and Green, 1993)to sample from the posterior distribution. Swendsen and Wang (1987) and Higdon (1998)provide notable examples of use of auxiliary variables in MCMC schemes that improve mixingand convergence for undirected graphical models. In brief, our sampling strategy is the following.The changepoint parameter space is augmented with auxiliary variables that induce clustersof time series indices according to the dependence graph G . Then, the MCMC algorithm ofGreen (1995) is extended to sample from the augmented parameter space, such that, at eachiteration of the algorithm, a new cluster of changepoints may be proposed or an existing clusterof changepoints may be deleted or shifted.Bayesian inference for changepoints quantiﬁes uncertainty about the number and the positionsof changepoints. However, in some applications such as cyber-security, it will also be necessary toreport a point estimate for changepoint parameters. Yet no existing loss function in the literatureseems suitable for taking into account both the number and the positions of changepoints. Toaddress this gap, we propose using matchings in graphs (Bondy and Murty, 1976) to deﬁnea novel loss function for changepoints, which can be used to obtain a point estimate from aposterior sample of candidate changepoints.The practical beneﬁts of the proposed graphical model are demonstrated via a changepointanalysis of real network authentication data from Los Alamos National Laboratory (LANL), wherea subset of the data relating to a ‘red team’ exercise provide a proxy for intruder behaviour (Kent,2015). The challenge consists of monitoring for temporal changes in the authentication activityof network users to detect the presence of red team actors. The graphical changepoint model canencode beliefs that signals for network intrusions are a priori likely to occur at nearby times forusers historically linked by previous network connectivity. We show that, as a consequence, theproposed model can detect weak signals for red team activity in the network, whilst limiting thenumber of false alerts, in contrast with a standard model assuming independence of behaviouralchanges across users.The remainder of the article is organised as follows. Section 2 presents Bayesian changepointmodelling for multiple time series, proposing a novel loss function for assessing changepoints.Section 3 motivates a graph-dependent changepoint model with an application in cyber-security.Section 4 introduces a class of changepoint models that take into account graph-dependentstructures of changepoints across time series. Section 5 proposes an auxiliary variable MCMCsampling strategy. Section 6 presents a simulation study demonstrating the merit of the proposedgraphical changepoint model and sampling strategy. Section 7 presents results of a changepointanalysis of network authentication data, illustrating the practical beneﬁts of the proposed model.Section 8 brieﬂy discusses some potential extensions.

2. Changepoint analysis for multiple time series

This section introduces the problem of Bayesian changepoint analysis for multiple time series,proposing a novel loss function for assessing changepoints.

Suppose we observe L time series of length T . For notational simplicity it will be assumedthere are no missing data, but these would present no methodological complication. For all i = 1 , . . . , L , let x i = ( x i, , . . . , x i,T ) denote the data for the time series with index i , and let x = ( x , . . . , x L ) .For each time series i = 1 , . . . , L , suppose there are k i (cid:62) changepoints that partition thepassage of time into k i + 1 segments. The ordered locations of the changepoints, denoted by3 i = ( τ i, , . . . , τ i,k i ) , belong to the set T k i , where T k = (cid:110) ( τ , . . . , τ k ) ∈ N k ; 1 ≡ τ < τ < · · · < τ k < τ k +1 ≡ T + 1 (cid:111) . (1)In parametric modelling, the j th segment for the i th time series, which consists of the time points τ i,j − , . . . , τ i,j − , is assigned a segment parameter θ i,j . The segment parameters for the i th timeseries are assumed to be independent with identical prior π i ( θ ) . Conditional on the changepointsand the segment parameters, the data are assumed to be independent and identically distributedsuch that x i,t ∼ f i ( · | θ i,j ) , τ i,j − (cid:54) t < τ i,j , (2)for an assumed parametric density f i ( · | θ ) which is not necessarily identical for all time series.The parameters of interest are the changepoint parameters ( k , τ ) , where k = ( k , . . . , k L ) and τ = ( τ , . . . , τ L ) . Hence, motivated by computational considerations, it is assumed thatsegment parameters may be marginalised so that the likelihood of the data conditional on thechangepoints can be computed. Deﬁne L i ( t , t ) = (cid:90) (cid:32) t − (cid:89) t = t f i ( x i,t | θ ) (cid:33) π i ( θ ) d θ (3)for all i and t < t . Then, conditional on the changepoints, the likelihood of the data withinthe j th segment of the i th time series is L i ( τ i,j − , τ i,j ) , and the likelihood of the data x is (cid:81) Li =1 (cid:81) k i +1 j =1 L i ( τ i,j − , τ i,j ) . The integral (3) can be calculated analytically when π i ( θ ) is chosento be conjugate to f i ( · | θ ) ; and for non-conjugate cases, (3) may be calculated numerically forlow-dimensional segment parameters. Given a prior for the changepoint parameters, π ( k , τ ) , onecan consequently compute the posterior density function for the changepoint parameters, up toa normalising constant.When changepoints are assumed to be independent across time series, the posterior distri-bution of changepoints can be estimated for each time series separately. In this setting, it isstandard to assume a priori that, for all time series, discrete time changepoints follow a Bernoulliprocess (Fearnhead, 2006) such that, for all changepoint parameters ( k , τ ) , π ( k , τ | p ) = L (cid:89) i =1 p k i i (1 − p i ) T − − k i (4)for some Bernoulli parameters p = ( p , . . . , p L ) with < p i < for all i . For each time series i , the hyperparameter p i encodes prior belief on the expected number of changepoints. Fornotational convenience, we assume that, for all i , p i = p for some < p < . Note that itwill be useful to consider the probability p in terms of the logarithm of the corresponding oddslogit ( p ) = log { p/ (1 − p ) } . To give an account of the posterior distribution of changepoint parameters for multiple time series,for each time series i , following Green (1995), one may consider the posterior marginal distributionof the number of changepoints k i , and the posterior distribution of the changepoint positions τ i conditional on k i . However, in practice, for each time series i , it will also be of interest to reporta point estimate (ˆ k i , ˆ τ i ) for the changepoint parameters ( k i , τ i ) . Following normative Bayesiantheory, to deﬁne an optimal Bayes estimate for changepoints, we propose a loss function thatevaluates the quality of estimated changepoints. When assessing the cost associated with theestimate (ˆ k i , ˆ τ i ) of ( k i , τ i ) , both the number and the positions of changepoints must be takeninto account. To address this challenge, we use matchings in graphs, as deﬁned in Deﬁnition 1and Deﬁnition 2, to deﬁne a loss function L for changepoint estimates in Deﬁnition 3.4 eﬁnition 1. (Maximum matching in a graph). Let B = ( V, E ) be a graph where V is a vertexset and E ⊆ V × V is an edge set. A matching M in B is a subset of E such that no two edgesin M share a common vertex. A maximum matching in B is a matching that is not a subset ofa larger matching in B . Deﬁnition 2. (Minimum weight maximum matching in a graph).

Let B = ( V, E ) be a graphwith weights w i,j (cid:62) for all ( i, j ) ∈ E . A minimum weight maximum matching in B is amaximum matching in B for which the sum of weights of the edges is minimised. When B is a weighted bipartite graph, the Kuhn–Munkres algorithm, also known as theHungarian algorithm, (Bondy and Murty, 1976) ﬁnds a minimum weight maximum matching in B ; the time complexity of the algorithm is O ( | E || V | + | V | log log | V | ) , where V and E denotethe vertex set and the edge set of B , respectively. Deﬁnition 3. (Loss function L for changepoint estimates). Let γ (cid:62) . For all k, ˆ k (cid:62) , τ = ( τ , . . . , τ k ) ∈ T k and ˆ τ = (ˆ τ , . . . , ˆ τ ˆ k ) ∈ T ˆ k , let B be the weighted complete bipartite graphwith vertex sets V = { , . . . , k } and ˆ V = { , . . . , ˆ k } , and weights w i,j = min { γ, | τ i − ˆ τ j |} (5) for all i ∈ V and j ∈ ˆ V . Given a minimum weight maximum matching M in B , for all i ∈ V and j ∈ ˆ V , let m i,j = 1 if i and j are matched, that is ( i, j ) ∈ M , and m i,j = 0 otherwise.Then, deﬁne the loss to be L (cid:104) (ˆ k, ˆ τ ) , ( k, τ ) (cid:105) = γ | ˆ k − k | + 12 (cid:88) i ∈ V (cid:88) j ∈ V (cid:48) m i,j w i,j . (6)Consider the complete bipartite graph B with independent vertex sets V = { , . . . , k } , ˆ V = { , . . . , ˆ k } and weights (5). A minimum weight maximum matching M in B gives amatching of the elements of ˆ τ i and τ i that minimises the sum of distances (5) between matchedchangepoints. Given M , according to the loss function (6), the cost associated with the estimate (ˆ k i , ˆ τ i ) of ( k i , τ i ) is then obtained by adding the cost γ for each unmatched changepoint and thetotal distance between matched changepoints. Note that according to (5), the cost of matchingtwo changepoint positions, that are separated by more than γ time units, is equal to the cost ofan unmatched changepoint, namely γ . Therefore, the loss function L takes into account boththe number and the positions of changepoints, and the cost γ is chosen to be the maximumacceptable distance between a changepoint position and its estimated position. The optimalBayes estimate (ˆ k i , ˆ τ i ) is the changepoint parameters that minimise the expected posterior losswith respect to the posterior marginal distribution of the changepoints ( k i , τ i ) .Partially as a consequence of the intractability of the normalising constant of the posteriordensity of changepoint parameters, inference via simulations is required; Section 5 discusses algo-rithms to obtain a sample of approximate draws from the posterior distribution of changepointsfor all time series. Given a sample from the posterior distribution, the optimal Bayes estimate (ˆ k i , ˆ τ i ) for each series is approximately identiﬁed by numerically ﬁnding within the sample thechangepoint parameters that minimise the estimated posterior expected loss.

3. Motivational example: changepoint detection in cyber-security

To motivate the graphical changepoint model introduced in this article, we consider an applicationof changepoint detection in cyber-security. A cyber-attack typically changes the behaviour of5onnected endpoints on the target network. Therefore, to detect the presence of a networkintrusion, it is informative to monitor for changes in the behaviour of network entities.Kent (2015) presents a comprehensive data set summarising days of traﬃc on the enterprisenetwork of Los Alamos National Laboratory (LANL), which is available online at https://csr.lanl.gov/data/cyber1 . The network authentication data consist of records describingauthentication activity of users connecting from source computers to destination computers.The occurrence of a ‘red team’ penetration testing operation during the data collection periodmakes these data suitable for testing network intrusion detection methods. Further details onthe data are given in Appendix A.1.To detect occurrences of malicious activity in the network, the authentication activity of eachuser is monitored via hourly counts of network logons per source computer. Let M denote thenumber of distinct source computers in the network. For each user i , let x i,t = ( x i,t, , . . . , x i,t,M ) , (7)where x i,t,(cid:96) denotes the number of network logons initiated by user i from source computer (cid:96) during the t -th hour of the day data collection period.For each user i , it is of interest to detect temporal changes in the distribution of networklogons across source computers as possible evidence for malicious activity, and therefore the data x i, , . . . , x i,T are assumed to follow the changepoint model speciﬁed in Section 2.1, with f i ( · | θ ) = f ( · | θ ) , (8)where f ( · | θ ) denotes the density of the multinomial distribution with unknown probability pa-rameter θ ∼ Dirichlet ( M ) , where M denotes the M dimensional vector of ones, for each timeseries i . The data exhibit much variability, and some changes can correspond to legitimate ac-tivity, but chains of quasi-synchronous changes across multiple users that are connected on thenetwork are a priori more likely to be attack related (Sexton et al., 2015). This follows becauseattackers must often take control of multiple users to navigate in the network to achieve theirobjectives.The network can be represented by a graph G = ( V, E ) , (9)where the set of nodes V denotes the users and an edge ( i, i (cid:48) ) ∈ E ⊆ V × V indicates bothuser i and user i (cid:48) successfully initiated a network logon from the same source computer on thesame day. The edge set E , which indicates which pairs of users are linked on the network, isassumed to be readily available prior to running network intrusion detection methods; note that,in practice, the edge set could be derived from historic data.In this setting, the standard prior in (4) is not suﬃciently ﬂexible to encode prior beliefs onchangepoints. Appendix A.2 exposes limitations resulting from the assumption of changepointindependence across time series through a comparative study. No choice of p seems satisfactory:choosing a small value for p will limit the number of false alerts due to noise and user speciﬁclegitimate activity; yet it will also prevent the detection of weak signals for changes shared bydiﬀerent users which are linked in the network, that may be of great interest. It would bepreferable to specify a priori that changepoints are more likely to occur simultaneously acrosstime series that are linked in G , in order to require strong evidence from the data for changesimpacting a single user, yet possibly weak signals for changes that impact multiple users linkedin the network.This application of changepoint detection in cyber-security provides an example of a settingwith multiple time series where it is desirable to specify a priori which pairs of time series arelikely to be aﬀected by quasi-synchronous changepoints. For these settings, graphical modelsprovide a framework to describe dependency structures for these pairs of time series.6 . Graphical models for dependent changepoints across multipletime series This section proposes a ﬂexible class of models for dependent changepoints across time series.Given a graph describing which pairs of time series are a priori likely to be simultaneously aﬀectedby changepoints, changepoints are modelled by means of an undirected graphical model. Thechangepoint model is further extended by relaxing the assumption that dependent changepointsacross time series are synchronous; the extended model assumes dependent changepoints acrosstime series correspond to nearby but not necessarily identical time points.

In Section 2.1, changepoints were most simply deﬁned in terms of changepoint parameters ( k , τ ) .Subsequently, it will be useful to represent changepoints by means of a binary matrix. Forchangepoint parameters ( k , τ ) , let S = ( S i,t ) be the binary matrix such that, for all i = 1 , . . . , L and t = 2 , . . . , T , S i,t = (cid:26) if ∃ j ∈ { , . . . , k i } s.t. t = τ i,j otherwise , (10)so that ( k , τ ) and S are equivalent representations of the changepoints. Moreover, let S i, = S i,T +1 = 1 for all i .To encode the dependence structure of synchronous changepoints across time series, supposethe time series indices , . . . , L are the nodes of an undirected graph G with edge set E , andlet λ = ( λ i,i (cid:48) ) be a symmetric matrix of non-negative edge weights for the graph satisfying λ i,i (cid:48) > if and only if ( i, i (cid:48) ) ∈ E for all i, i (cid:48) = 1 , . . . , L . Then, conditional on λ , changepointsare assumed to have a prior distribution described by the weighted, undirected graph G suchthat, for all ( k , τ ) , π ( k , τ | p, λ ) = 1 Z ( p, λ ) T (cid:89) t =2 exp (cid:40) ¯ p L (cid:88) i =1 S i,t + (cid:88) i* ,the higher the probability for time series i and i (cid:48) to be simultaneously aﬀected by changepoints.Hence, with the prior in (11) it is possible to specify that changepoints are likely to simultaneouslyoccur across clusters of time series according to G .7o understand how to choose the changepoint hyperparameters ¯ p and λ in practice, it isinstructive to consider the conditional prior distribution of the components of the binary matrix S . Under (11), the conditional prior distribution of S i,t given S − ( i,t ) = { S i (cid:48) ,t (cid:48) : ( i (cid:48) , t (cid:48) ) (cid:54) = ( i, t ) } is π ( S i,t | S − ( i,t ) , p, λ ) ∝ exp S i,t ¯ p + (cid:88) i (cid:48) : ( i,i (cid:48) ) ∈ E λ i,i (cid:48) S i (cid:48) ,t , S i,t ∈ { , } . (13)Therefore, for all i and t , the hyperparameter p = 1 / (1 + exp {− ¯ p } ) corresponds to the priorprobability that t is a changepoint for the i th time series given that no changepoints occur attime t for the graph neighbour time series of i ; and, for all i (cid:48) such that ( i, i (cid:48) ) ∈ E , the interactionparameter λ i,i (cid:48) governs how much the conditional prior probability increases if the neighbourtime series i (cid:48) is impacted by a changepoint at time t . Moreover, to perceive the inﬂuence ofthe changepoint hyperparameters on the posterior distribution of changepoints, it is helpful toconsider the full conditional distribution of S i,t given S − ( i,t ) = { S i (cid:48) ,t (cid:48) ; ( i (cid:48) , t (cid:48) ) (cid:54) = ( i, t ) } , π ( S i,t | S − ( i,t ) , p, λ , x ) ∝ (cid:18) L i ( τ (cid:48) t , t ) L i ( t, τ (cid:48)(cid:48) t ) L i ( τ (cid:48) t , τ (cid:48)(cid:48) t ) (cid:19) S i,t π ( S i,t | S − ( i,t ) , p, λ ) ∝ exp S i,t log (cid:26) L i ( τ (cid:48) t , t ) L i ( t, τ (cid:48)(cid:48) t ) L i ( τ (cid:48) t , τ (cid:48)(cid:48) t ) (cid:27) + ¯ p + (cid:88) i (cid:48) : ( i,i (cid:48) ) ∈ E λ i,i (cid:48) S i (cid:48) ,t , (14)where L i is deﬁned in (3), τ (cid:48) t = max { t (cid:48) : t (cid:48) < t, S i,t (cid:48) = 1 } and τ (cid:48)(cid:48) t = min { t (cid:48) : t (cid:48) > t, S i,t (cid:48) = 1 } .In essence, ¯ p determines which level of evidence from the data can be considered strong, andthe edge weight parameters control how weak signals for synchronous changepoints, relative to ¯ p , can be combined across time series. In practice, it will often be natural to assume that, for all ( i, i (cid:48) ) ∈ E , λ i,i (cid:48) = λ for some ﬁxedvalue λ > . For all i and t , let n i,t = (cid:88) i (cid:48) :( i,i (cid:48) ) ∈ E S i (cid:48) ,t (15)be the number of neighbour time series of i that are aﬀected by a changepoint at time t . Then,under (11), the conditional prior distribution of S i,t given S − ( i,t ) = { S i (cid:48) ,t (cid:48) : ( i (cid:48) , t (cid:48) ) (cid:54) = ( i, t ) } is π ( S i,t | S − ( i,t ) , p, λ ) ∝ exp { S i,t (¯ p + λn i,t ) } , S i,t ∈ { , } . (16)Moreover, λ will typically be chosen relative to ¯ p and the degree distribution of the nodes in G .For example, it can be convenient to assume λ = λ s | ¯ p | /n , where n denotes the maximum degreeof G , for some λ s > . The prior distribution discussed in Section 4.1 is suitable for a wide variety of settings. Thissection provides graph motifs that are building blocks to encode the dependence structure ofchangepoints across multiple time series. For these examples, we consider the parametrisationintroduced in Section 4.1.2 and we provide some insight on how to choose the changepointhyperparameters p and λ , hoping it gives intuition on how to set hyperparameters for morecomplicated cases. These exemplar dependence structures for changepoints are explored via asimulation study presented in Section 6. 8 (a) (b) (c) Figure 1: Graphs of time series indices i ∈ { , . . . , } corresponding to diﬀerent dependencestructures for changepoints; panels (a), (b) and (c) correspond to a star, a × lattice and a -chain, respectively. Suppose the ultimate goal is to detect changepoints for one particular time series i = 1 , referredto as the central time series, and the remaining time series i = 2 , . . . , L , which are also susceptibleto changepoints, are called the peripheral time series. Moreover, for all t , it is known a priorithat the central time series is more likely to be aﬀected by a changepoint at time t as the numberof peripheral time series aﬀected by a changepoint at time t increases. Furthermore, supposethe peripheral time series changepoints are conditionally independent given the changepoints ofthe central time series. For the prior for dependent changepoints discussed in Section 4.1, itis then appropriate to consider a star graph for the indices of the time series such that E = { (1 , , . . . , (1 , L ) } . Panel (a) of Figure 1 gives a cartoon representation of a star graph for time series. Under the prior distribution (11), p corresponds to the prior probability that thecentral time series is aﬀected by a changepoint when no peripheral time series are aﬀected by achangepoint, and λ > is chosen such that, for all t , the prior conditional probability that t is achangepoint for the central time series is / (1 + exp {− ¯ p − n ,t λ } ) where n ,t ∈ { , . . . , L − } is the number of peripheral time series i > aﬀected by a changepoint at time t . It might be natural to choose the edge set E to induce a (cid:96) × (cid:96) lattice graph when the number oftime series is L = (cid:96) (cid:96) for some (cid:96) , (cid:96) > ; that is, ( i, i (cid:48) ) ∈ E if and only if | i − i (cid:48) | + | i − i (cid:48) | = 1 where, for all i , i − i (cid:96) + i with (cid:54) i (cid:54) (cid:96) . Panel (b) of Figure 1 gives a cartoonrepresentation of a × lattice graph. For example, suppose the data x are recorded for theanalysis of some spatio-temporal phenomenon such that x i,t denotes the observation at time t and at the coordinate i of some (cid:96) × (cid:96) grid over a map of the region of interest, and it is ofinterest to detect the times and the coordinates at which the distribution of the data changes.With the prior for the dependent changepoints given in (11), one can specify that changepointsare likely to occur as connected clusters of simultaneous changepoints on the lattice. For example,set p to be the prior probability that a changepoint occurs in isolation on the lattice and set λ > such that, for each i and t , the conditional probability that t is a changepoint for the i th timeseries is / (1 + exp {− ¯ p − n i,t λ } ) where n i,t ∈ { , . . . , } is deﬁned in (15).9 .2.3. r -chains Another dependence structure of interest arises when there is a natural ordering of the time series,which is encoded by the time series indices < · · · < L , and changepoints are a priori likelyto occur as chains of simultaneous changepoints across consecutive time series. For instance,suppose the data consist of multiple time series that are recorded to monitor various aspects ofa system; and, it is of interest to detect some event which evolves through multiple phases, suchthat each phase is likely to manifest through the pertubation of one aspect of the system. Insuch a setting, it is appropriate to consider the following graph for the time series indices, whichwe call an r -chain graph: let ( i, i (cid:48) ) ∈ E if and only if (cid:54) | i − i (cid:48) | (cid:54) r , for some r > chosen toallow gaps of length r − within chains of changepoints. Panel (c) of Figure 1 gives a cartoonrepresentation of a -chain graph. For the prior given in (11), ¯ p and λ are chosen such that,for all i and t , the prior conditional probability that t is a changepoint for the i th time series is / (1 + exp {− ¯ p − n i,t λ } ) where n i,t ∈ { , . . . , r } . The model in Section 4.1 assumes changepoints are likely to simultaneously aﬀect clusters oftime series according to the dependence graph G . In this section, we relax this model to allowdependence between changepoints in diﬀerent time series at nearby points in time. The extendedmodel relies on representing changepoints as latent changepoints that may be lagged. The latentchangepoints are distributed according to the model introduced in Section 4.1 and, conditionalon these latent changepoints, time series-speciﬁc lags are assumed to be uniformly distributedover some small time window.Let ( k , τ ) be changepoint parameters for multiple time series as deﬁned in Section 2.1,where it is assumed that the i th time series is subject to k i changepoints whose positions aredenoted τ i = ( τ i, , . . . , τ i,k i ) ∈ T k i as deﬁned in (1). The asynchronous model further assumesthat, for all time series i = 1 , . . . , L , there exist latent changepoint positions ˜ τ = ( ˜ τ , . . . , ˜ τ L ) , ˜ τ i = (˜ τ i, , . . . , ˜ τ i,k i ) ∈ T k i , and lags d = ( d , . . . , d L ) , d i = ( d i, , . . . , d i,k i ) ∈ { , . . . , w i } k i , for w = ( w , . . . , w L ) , where w i (cid:62) is an upper bound for the lags, such that, for all j = 1 , . . . , k i ,the j th changepoint for time series i is τ i,j = ˜ τ i,j + d i,j . (17)Let ˜ τ i, = τ i, = 1 and ˜ τ i,k i +1 = τ i,k i +1 = T + 1 . For all ( k i , τ i ) and w i (cid:62) , if ˜ τ i = τ i and d i is the zero vector then (17) holds, and therefore the existence of a corresponding pair ( ˜ τ i , d i ) is guaranteed. If w i = 0 then the latent changepoints and the changepoints must be identical;yet, in general, given some changepoints ( k i , τ i ) and w i > , there are multiple distinct pairs oflatent changepoints ( k i , ˜ τ i ) and lags d i satisfying (17).Suppose the latent changepoints ( k , ˜ τ ) are distributed according to the prior distribution (11)given some < p < and graph edge weight parameters λ . Then, independently for all timeseries i , conditional on w i and ( k i , ˜ τ i ) , the lags d i = ( d i, , . . . , d i,k i ) are assumed to be uniformlydistributed on the set D ( w i , k i , ˜ τ i ) = (cid:110) ( d i, , . . . , d i,k i ) ∈ { , . . . , w i } k i : (˜ τ i, + d i, , . . . , ˜ τ i,k i + d i,k i ) ∈ T k i (cid:111) , (18)such that, for all d = ( d , . . . , d L ) , π ( d | k , ˜ τ , w ) = L (cid:89) i =1 D ( w i ,k i , ˜ τ i ) ( d i ) |D ( w i , k i , ˜ τ i ) | . (19)Proposition 1 gives a recursion to derive the cardinality of (18).10 roposition 1. (Cardinality of D ). Let w (cid:62) , k (cid:62) , τ = ( τ , . . . , τ k ) ∈ T k with τ = 1 and τ k +1 = T + 1 , and let D ( w, k, τ ) be the set deﬁned in (18).(i) For all j (cid:62) and l (cid:62) , with ρ ( j, l ) = min { w + 1 , T + 1 − τ j } − ( τ j + l − τ j ) , let Q ( j, l ) = Γ( ρ ( j, l ) + l + 1)Γ( ρ ( j, l ))Γ( l + 2) { ,...,w } ( τ j + l − τ j ) . (20) Moreover, let Z (0) = 1 , Z (1) = Q (1 , and, for all k > , Z ( k ) = k (cid:88) j =1 ( − k − j Z ( j − Q ( j, k − j ) . (21) Then |D ( w, k, τ ) | = Z ( k ) . (22) In particular, if k = 0 then τ is the empty sequence and D ( w, k, τ ) contains a uniqueelement, namely the empty sequence.(ii) |D ( w, k, τ ) | (cid:54) ( w + 1) k and the equality holds if and only if τ j +1 − τ j > w for all j .Proof. See Appendix B.Consequently, the joint prior density for ( k , ˜ τ , d ) is π ( k , ˜ τ , d | p, λ , w ) = π ( k , ˜ τ | p, λ ) (cid:81) Li =1 |D ( w i , k i , τ i ) | (23)and the induced changepoint prior distribution for ( k , τ ) is π ( k , τ | p, λ , w ) = (cid:88) ( ˜ τ , d ) ∈ Υ( k , τ , w ) π ( k , ˜ τ , d | p, λ , w ) , (24)where Υ( k , τ , w ) denotes the set of pairs of latent changepoints and lags, ( k , ˜ τ , d ) , that identifythe changepoints ( k , τ ) according to (17). If no lags are allowed, that is w i = 0 for all i , then Υ( k , τ , w ) = { ( k , τ , ) } and the prior in (24) is identical to the prior discussed in Section 4.1.For some applications the upper bounds for the lags, w , may be known, but this might notbe the case. A possible choice of prior distribution for w would assume that, independently for alltime series i , w i ∼ Geometric ( η i ) for some series-speciﬁc parameter η i , admitting the possibilitythe upper bounds for the lags are likely to be larger for some time series than others.*

5. Markov chain Monte Carlo inference

Joint sampling of changepoints across time series is required when the assumption of indepen-dence for changepoints is relaxed. In this section, we propose a reversible jump MCMC algorithm(Green, 1995) to sample changepoints in multiple time series, ( k , τ ) . The changepoint parameterspace is augmented with auxiliary variables (Besag and Green, 1993; Higdon, 1998) that induceclusters of time series indices according to G . Then, the reversible jump MCMC algorithm ofDenison et al. (2002) is extended to sample from the augmented parameter space, thereby pro-viding a means to eﬃciently explore the changepoint parameter space. At each iteration of thealgorithm, a new cluster of changepoints may be proposed or an existing cluster of changepointsmay be deleted or shifted. The validity of the proposed MCMC algorithm follows immediatelyfrom the reversibility of the proposed moves. 11 .1. Sampler for synchronous dependent changepoints We begin by proposing an MCMC algorithm to sample from the posterior distribution of change-points when changepoint parameters are a priori distributed according to the prior introduced inSection 4.1, π ( k , τ | p, λ ) , given < p < and some interaction parameters λ .To sample changepoints for multiple time series, consider the following adaptation of thestandard MCMC algorithm to sample changepoints for a unique time series (Denison et al.,2002), which will be called the “single site updating” MCMC algorithm thereafter. At eachiteration of the algorithm, with ( k , τ ) denoting the latest particle of the sample chain, proposeone of the following two moves: for a uniformly chosen index ( i, t ) , propose S i,t to be updatedto − S i,t , thereby allowing birth or death of a changepoint; alternatively, the position of arandomly chosen changepoint, τ i,j , is sampled uniformly from { τ i,j − + 1 , . . . , τ i,j +1 − } .As a result of the graph interaction parameters λ , synchronous changepoints can be correlatedacross time series, and the single site updating MCMC algorithm can become impractical, asillustrated in Section 6.2 within a simulation study. Instead, it will be necessary to proposemoves that allow birth, death or shift of clusters of synchronous changepoints according to thegraph induced by λ . To provide a means of moving eﬃciently through the state space of the changepoint parameters,the parameter space is augmented with binary auxiliary variables u = ( u , . . . , u T ) such that, forall t , u t is deﬁned with components u t ( i, i (cid:48) ) ∈ { , } for each pair of time series indices i < i (cid:48) .The augmented prior density is assumed to take the conditionally independent form π ( u | λ , δ, k , τ ) = T (cid:89) t =2 (cid:89) i* , so that clusters induced by u t will tend to be clusters on the graph induced by the edge weight parameters. To generate realisations from the posterior distribution of the changepoints, we consider a “clusterupdating” MCMC algorithm that samples from the joint posterior density (27). By inducing12lusters of time series determined by the edge weight parameters for each time point t , theauxiliary variables u provide a means to eﬃcient exploration of the state space of ( k , τ ) .The parameter δ is a tuning parameter for the cluster updating MCMC algorithm. Thesize of clusters will tend to increase with δ ; in particular, if δ = 0 then, for all t , each clustercorresponds to a unique time series index, even if the edge weights of the dependence graphare large, so that the “cluster updating” MCMC algorithm reduces to the “single site updating”algorithm. Typically δ is ﬁxed (Higdon, 1998) to control the probabilities in (25) and thereforethe expected size of clusters. However, we propose to treat δ (cid:62) as an unknown parameter withprior distribution π ( δ ) , so that expected sizes of cluster may vary in the sample; speciﬁcally, weassume that δ = 0 with probability δ and that δ is drawn from Beta ( δ , δ ) otherwise.For the cluster updating MCMC algorithm, at each iteration of the algorithm, with ( k , τ , u , δ ) denoting the latest particle of the sample chain, one of the following moves is proposed. Birth/death move*

Conditional on the auxiliary variables, the birth/death move proposes the birth or death of acluster of synchronous changepoints. Sample t (cid:48) uniformly from { , . . . , T } . A cluster γ of timeseries indices is randomly chosen from C t (cid:48) , the set of clusters of time series induced by the auxiliaryvariables u t (cid:48) = ( u t (cid:48) ( i, i (cid:48) )) . Then, leaving the auxiliary variables unchanged, propose changepointparameters ( k (cid:48) , τ (cid:48) ) such that, for all i = 1 , . . . , L and t = 2 , . . . , T , S (cid:48) i,t = (cid:26) − S i,t if i ∈ γ and t = t (cid:48) S i,t otherwise , (28)where S and S (cid:48) are binary matrix representations of ( k , τ ) and ( k (cid:48) , τ (cid:48) ) according to (10),respectively. Shift move

The shift move proposes to shift the position of a cluster of synchronous changepoints. First, atime unit t is uniformly chosen from { t : (cid:80) Li =1 S i,t > } . Let C ∗ t ⊆ C t denote the set of clustersof time series indices γ induced by u t such that, for all i ∈ γ , S i,t = 1 . A cluster γ is uniformlychosen from C ∗ t . For all i ∈ γ , let j i be the index of the changepoint with position t for the i thtime series, that is τ i,j i = t . Then, sample uniformly t (cid:48) from (cid:84) i ∈ γ { τ i,j i − + 1 , . . . , τ i,j i +1 − } and propose changepoint parameters ( k , τ (cid:48) ) that are identical to ( k , τ ) but with τ (cid:48) i,j i = t (cid:48) , forall i ∈ γ .In parallel, it is required to propose auxiliary variables u (cid:48) which are adapted to ( k , τ (cid:48) ) .The updated auxiliary variables diﬀer from u as follows. For all i, i (cid:48) ∈ γ , u (cid:48) t (cid:48) ( i, i (cid:48) ) = u t ( i, i (cid:48) ) and u (cid:48) t ( i, i (cid:48) ) = u t (cid:48) ( i, i (cid:48) ) ; for all i ∈ γ and i (cid:48) / ∈ γ such that t (cid:48) / ∈ τ i (cid:48) , u (cid:48) t (cid:48) ( i, i (cid:48) ) = 0 ; and,ensuring reversibility of the move, for all i ∈ γ and i (cid:48) / ∈ γ such that t / ∈ τ i (cid:48) , u (cid:48) t ( i, i (cid:48) ) is sampledconditionally on ( k , τ (cid:48) ) according to the Bernoulli (cid:0) − exp {− δλ i,i (cid:48) (1 − | S i,t − S i (cid:48) ,t | ) } (cid:1) targetdistribution implied by (25). Update of auxiliary variables

Changepoints are left unchanged, δ is sampled from its prior distribution, and auxiliary variablesare sampled from the full conditional distribution given in (25), thereby proposing an updatedclustering of time series indices for all t . According to the changepoint model (24) introduced in Section 4.3, changepoints do not need tooccur at the same time to be related. Consequently, to explore the changepoint parameter space it13ill be required to propose the birth, death or shift of clusters of asynchronous changepoints. Thissection extends the MCMC algorithm from Section 5.1 to sample from the posterior distributionof changepoints when changepoint parameters are a priori distributed according to π ( k , τ | p, λ , w ) from (24).Recall that under the asynchronous model, changepoints ( k , τ ) are deterministically speciﬁedby latent changepoints ( k , ˜ τ ) and unknown lags d according to (17). Therefore, a sample from ( k , τ ) can be obtained from a sample from ( k , ˜ τ , d ) . Next, we propose a sampler from the jointposterior distribution of ( k , ˜ τ , d ) , updated from the prior density (23) by the observed data x ,thereby providing a means to obtain a sample from the posterior distribution of ( k , τ ) .As in Section 5.1, the parameter space is augmented with auxiliary variables u = ( u , . . . , u T ) to facilitate the exploration of the state space of the parameters of interest ( k , ˜ τ , d ) . Con-ditionally on latent changepoints ( k , ˜ τ ) and independently of the lags and the data, for all t and i < i (cid:48) , u t ( i, i (cid:48) ) is assumed to be distributed according to (25), such that u t ( i, i (cid:48) ) ∼ Bernoulli (cid:16) − exp {− δλ i,i (cid:48) (1 − | ˜ S i,t − ˜ S i (cid:48) ,t | ) } (cid:17) , where ˜ S is the binary matrix representation of ( k , ˜ τ ) according to (10). As described in Section 5.1.1, it follows that the auxiliary variables u t = ( u t ( i, i (cid:48) )) induce a partition of the time series, C t , such that, for each cluster γ ∈ C t , eitherall time series or no time series in γ are aﬀected by a latent changepoint at time t .The joint posterior density of the augmented parameters ( k , ˜ τ , d , u ) is π ( k , ˜ τ , d , u | p, λ , w , x , δ ) = π ( u | λ , δ, k , ˜ τ ) π ( k , ˜ τ , d | p, λ , w , x ) . (29)To sample from the posterior distribution of ( k , ˜ τ , d , u ) , or ( k , ˜ τ , d , u , w ) if upper bounds forthe lags are a priori unknown, the MCMC algorithm discussed in Section 5.1 is extended asfollows: the birth/death and shift moves are adapted to pairs of latent changepoints and lags;and additional moves are introduced for updating the lags. For the lags, note that according to(18), for all i = 1 , . . . , L and j = 1 , . . . , k i , to maintain monotonicity in the changepoints, thelag associated to the j th changepoint of the i th time series must satisfy d i,j ∈ D j ( w i , k i , ˜ τ i ) = { (cid:96) ∈ N ; d − (cid:54) (cid:96) (cid:54) d + } , (30)where d − = max(0 , d i,j − + τ i,j − i − τ i,j + 1) and d + = min( w i , d i,j +1 + τ i,j +1 − τ i,j − ; andthe full conditional probability distribution of d i,j is such that, for all d i,j ∈ D j ( w i , k i , ˜ τ i ) , π (cid:0) d i,j | d − ( i,j ) , k i , ˜ τ i , w i , x i (cid:1) ∝ L i (˜ τ i,j − + d i,j − , ˜ τ i,j + d i,j ) L i (˜ τ i,j + d i,j , ˜ τ i,j +1 + d i,j +1 ) (31)where d − ( i,j ) = { d i (cid:48) ,j (cid:48) ; ( i (cid:48) , j (cid:48) ) (cid:54) = ( i, j ) } and L i is deﬁned in (3).For the extended cluster updating MCMC algorithm, at each iteration of the algorithm, with ( k , ˜ τ , d , u , w ) denoting the latest particle of the sample chain, one of the following moves isproposed. Extended birth/death move

The extended birth/death move proposes the birth or death of a cluster of asynchronous change-points. First, conditionally on the auxiliary variables, latent changepoints ( k (cid:48) , ˜ τ (cid:48) ) are proposedaccording to the birth/death move detailed in Section 5.1: for all time series i ∈ γ ⊆ { , . . . , L } ,the birth or death of latent changepoint with position t is proposed. Then, updated lags areproposed conditional on ( k (cid:48) , ˜ τ (cid:48) ) : If the birth of changepoints is proposed, then, for all time series i ∈ γ , there is j i such that ˜ τ (cid:48) i,j i = t , and the lags d (cid:48) i = ( d i, , . . . , d i,j i − , d (cid:48) i,j i , d i,j i . . . , d i,k i ) are proposed for the i th time series, where d (cid:48) i,j i is sampled from the full conditional distribution(31); otherwise, if the death of changepoints is proposed, then, for all i ∈ γ , there is j i suchthat ˜ τ i,j i = t , and the lags d (cid:48) i = ( d i, , . . . , d i,j i − , d i,j i +1 . . . , d i,k i ) are proposed for the i th timeseries. 14 xtended shift move The extended shift move proposes to shift the positions of a cluster of asynchronous changepoints.First, latent changepoints ( k (cid:48) , ˜ τ (cid:48) ) and auxiliary variables u (cid:48) are proposed according to the shiftmove discussed in Section 5.1: for all time series with index i ∈ γ , the position t (cid:48) is proposedfor latent changepoint with position t . Then, for all i ∈ γ , letting j i denote the index such that ˜ τ (cid:48) i,j i = t (cid:48) , propose d (cid:48) i,j i from the full conditional distribution (31). Update of auxiliary variables δ is sampled from its prior distribution and, conditional on latent changepoints ( k , ˜ τ ) , auxiliaryvariables are sampled from their full conditional distribution (25). Update of lags

A pair ( i, j ) is uniformly chosen from { ( i, j ); i = 1 , . . . , L and j = 1 , . . . , k i } , and the lag d i,j issampled from the full conditional distribution given in (31). Update of upper bounds for lags

If the maximal lags w are a priori unknown, for a randomly chosen time series with index i it isproposed to update w i to w (cid:48) i = w i + σ with probability / , and to update w i to w (cid:48) i = | w i − σ | otherwise, where σ is drawn from Geometric ( ρ ) for some < ρ < . Proposing to update w i requires proposing updated lags d (cid:48) i = ( d (cid:48) i, , . . . , d (cid:48) i,k i ) ∈ D ( w (cid:48) i , k i , ˜ τ i ) for the i th time series. For j = 1 , . . . , k i , given w (cid:48) i and ( d (cid:48) i, , . . . , d (cid:48) i,j − , d i,j +1 . . . d i,k i ) , the lag d (cid:48) i,j is sampled from the fullconditional distribution (31). Specifying that changepoints are dependent across time series necessitates joint sampling ofchangepoint parameters ( k , τ ) , what will require a greater running time than sampling inde-pendent changepoint parameters ( k i , τ i ) for each time series i in parallel. In particular, therunning time of the proposed sampler will tend to increase with the number of edges | E | in thegraph G describing which pairs of time series are a priori likely to be impacted by simultaneouschangepoints.To speed-up the convergence of the sampler, we may consider the following initialisation of thesample chain: latent changepoints are set to be the Bayes changepoint estimates, according theloss function deﬁned in Deﬁnion 3, that correspond to the standard model assuming independentchangepoints across time series with prior π ( k , τ | p ) , and that are obtained in parallel for eachtime series i ; initial lags, upper bounds for lags and auxiliary variables are set to . As a result,the burn-in for the sampler begins with a sensible positioning of changepoints at a limited cost:no iterations for the burn-in of the joint sampling of changepoints across time series are wastedidentifying strong signals for changepoints, relative to ¯ p .

6. Simulation study

A simulation study is presented to demonstrate the model for dependent changepoints proposedin Section 4 and the sampling strategy discussed in Section 5. In particular, the graphicalchangepoint model is compared with the standard model for independent changepoints, whichassumes all edge weight parameters are null (12).Consider L = 30 time series each of length T = 300 , which are assumed to follow thechangepoint model discussed in Section 2.1, with f i ( · | θ ) corresponding to Poisson ( θ ) and θ ∼ (a) (b) Figure 2: Graphs of time series indices i ∈ { , . . . , } representing diﬀerent dependence struc-tures of changepoints discussed in Section 4.2; panels (a) and (b) correspond to a × latticeand a -chain, respectively. For all i , circles indicate i ∈ N and squares indicate i ∈ C ; coloursblue, green and red indicate that i ∈ C , C and C , respectively. Γ(100 , . for each time series i . To sample data for the simulation study, the set of time seriesindices { , . . . , L } is partitioned into two sets C and N = { , . . . , L } \ C such that k i = 1 if i ∈ C and k i = 0 if i ∈ N , so that each time series is either impacted by a single changepointor not impacted by changepoints. In the following experiments, diﬀerent graphical models ofdependence structures of changepoints across time series are considered by means of furtherdivisions of the set C into subsets C , C , C , as illustrated in Figure 2. Moreover, we ﬁx thesegment parameters θ i, = α/β = 1000 for all time series i ∈ { , . . . , L } , and θ i, > for all i ∈ C . For all i ∈ C , conditional on the position of the changepoint τ i, , the larger θ i, is, thestronger the signal for a changepoint is, and the level of evidence may be quantiﬁed by MonteCarlo estimation of the expected Bayes factor comparing a model with a changepoint τ i, and amodel with no changepoints.Next, we consider diﬀerent graphical models of dependent changepoints, in comparison withthe standard model for independent changepoints (12), to illustrate how the posterior distributionof changepoints varies with the partition C , the level of evidence for changepoints and the hyper-parameters of the changepoint prior. A supplementary experiment, related to star dependencestructures, is discussed in Appendix C.1. r -chain of time series By considering dependence structures for changepoints that are induced by the lattice and the r -chain graphs discussed in Section 4.2, we demonstrate the merit of the graphical changepointmodel to combine weak signals for changepoints across clustered time series, in comparison withthe standard model for independent changepoints.The set C is partitioned into three sets C = { , , . . . , } , C = { , , , , , } and C = { } . As illustrated in Figure 2, for both the × lattice graph and the -chain graph, theelements of C and C cluster according to the graph, yet the element of C has no neighboursin C so that it is isolated on the graph. For all i ∈ C = C ∪ C ∪ C , we set τ i, = 200 , sothat changepoints are synchronous, and we let θ i, = θ ∈ { , , } , corresponding toincreasing levels of evidence for changepoints. We performed ten simulations from each model.The prior for synchronous dependent changepoints deﬁned in (11) is assumed, and we considertwo diﬀerent dependence structures: the dependence structure corresponding to a × latticegraph for time series indices, given in Section 4.2.2; and the dependence structure corresponding16o the r -chain graph for time series indices, with r = 2 , given in Section 4.2.3. For eachsimulation from the model, for each type of dependence structure for the changepoints, andfor diﬀerent changepoint hyperparameters ¯ p ∈ {− , − , . . . , − } and λ = λ s | ¯ p | / with λ s ∈ { , . , . . . , . } , a sample of size

50 000 was obtained from the posterior distributionof changepoints, via the MCMC algorithm proposed in Section 5.1, with a burn-in of

10 000 iterations. We set δ = 0 . , δ = 1 , δ = 30 for the prior of the parameter δ , so that relatedtime series indices are expected to bond with probability . when δ > , according to (25).For all samples, the Monte Carlo estimates of the probabilities that k i = 0 for all i ∈ N andthat k i < for all i ∈ C are greater than . . Figure 3 displays Monte Carlo estimates of theposterior probability that k i = 1 , for i ∈ C , C and C , as a function of the hyperparametersof the prior for dependent changepoints, for each dependence structure considered, and given θ = 1050 . As ¯ p increases, the probability that k i = 1 decreases for i ∈ C = C ∪ C ∪ C .For i ∈ C , the simulated changepoint is isolated on the graph for each dependence structureconsidered; therefore, the interaction parameter λ has no impact on the posterior probabilitythat k i = 1 . However, for i ∈ C ∪ C , as λ increases, it becomes more likely a posteriorithat k i = 1 , because weak signals for changepoints, relative to ¯ p , are combined across timeseries which cluster according to the two graphs considered. Moreover, the posterior distributionadapts to the dependence structure for changepoints speciﬁed a priori: as λ increases, for thedependence structure induced by the -chain, the probability that k i = 1 is greater for i ∈ C than for i ∈ C , since the subgraph induced by C has more edges than the subgraph induced by C ; for the dependence structure induced by the lattice, however, the subgraph induced by C has fewer edges than the subgraph induced by C , so that the probability that k i = 1 is smallerfor i ∈ C than for i ∈ C .Results for other levels of evidence considered in the experiment, corresponding to θ ∈{ , } , are discussed in Appendix C.2. C l og i t ( p ) , w i t h r c ha i n g r aph − − − − − − C − − − − − − C − − − − − − l s l og i t ( p ) , w i t h l a tt i c e g r aph − − − − − − l s − − − − − − l s − − − − − − Figure 3: Monte Carlo estimates of the posterior probability that k i = 1 , given θ = 1050 , for i ∈ C , C , and C , as a function of ¯ p = logit ( p ) and λ s , for two types of assumed dependencestructure for changepoints: -chain graph based dependence; lattice graph based dependence.17 .2. Importance of auxiliary variables to sample dependent changepoints We demonstrate that it is pertinent to use auxiliary variables to sample from the posteriordistribution of dependent changepoints, as discussed in Section 5. For the data generated forthe experiment discussed in Section 6.1, which corresponds only to scenarios with θ = 1050 ,changepoints were sampled via MCMC as described in Section 6.1, but now with δ = 0 , meaningthe parameter space was not augmented with auxiliary variables. For all samples, the Monte Carloestimates of the probabilities that k i = 0 for all i ∈ N , and that k i < for all i ∈ C = C ∪ C ∪ C is greater than . . Figure 4 compares Monte Carlo estimates of the posterior probability of k i = 1 from samples obtained with and without auxiliary variables, for i ∈ C , C , and for the -chain and the lattice graph based dependence structures for changepoints. Without auxiliaryvariables, the MCMC algorithm fails to explore the state space of changepoints so that theprobabilities that k i = 1 tend to be underestimated; in other words, clusters of weak signalsfor changepoints across time series tend to be overlooked. The diﬀerence in performance, infavour of the MCMC algorithm making use of auxiliary variables, is particularly important wheninteraction parameters are large and weak signals for changepoints correspond to time serieswhose indices induce subgraphs with a large number of edges, as for i ∈ C given the -chaindependence structure in Figure 4. . . . . . . logit(p)=−80 p r ob . f o r C , r − c ha i n . . . . . . logit(p)=−90 . . . . . . logit(p)=−100 . . . . . . logit(p)=−110 . . . . . . logit(p)=−120 . . . . . . p r ob . f o r C , r − c ha i n . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . p r ob . f o r C , l a tt i c e . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . l s p r ob . f o r C , l a tt i c e . . . . . . l s . . . . . . l s . . . . . . l s . . . . . . l s Figure 4: Monte Carlo estimates of the posterior probability that k i = 1 , obtained via simulationswith auxiliary variables (blue squares) and without auxiliary variables (red triangles), for i ∈ C , C , for the -chain and the lattice graph based dependence structures, and for a collectionof changepoint hyperparameters p and λ s . 18 .0 0.2 0.4 0.6 0.8 logit(p)=−70 v = l l l l l logit(p)=−80 l l l l l logit(p)=−90 l l l l l logit(p)=−100 l l l l l logit(p)=−110 l l l l l v = l l l l l l l l l l l l l l l l l l l l l l l l l l s v = l l l l l l s l l l l l l s l l l l l l s l l l l l l s l l l l l Figure 5: Monte Carlo estimates of the expected loss with respect to the simulated changepointparameters for time series i ∈ C , given the -chain dependence structure for changepoints,corresponding to diﬀerent assumptions for the upper bounds for the lags - ﬁxed window (bluesquares), variable window (black circles), and zero window (red triangles) - for increasing levelsof asynchrony of the changepoints v , and for a collection of changepoint hyperparameters p , λ s . To illustrate the model for asynchronous dependent changepoints proposed in Section 4.3, datawere simulated according to the set-up of the experiment discussed in Section 6.1, with the onlydiﬀerence that changepoint positions were not forced to be synchronous: for some v > , weset τ i, = 200 + v for i ∈ { , , , } ; τ i, = 200 for i ∈ { , , , } ; and τ i, = 200 − v for i ∈ { , , , , } , so that no two neighbouring time series, according to both the latticegraph and the -chain graph, are simultaneously impacted by a changepoint. For increasing levelsof asynchrony for the changepoints, v ∈ { , , } , ten simulations were performed from themodel with θ = 1050 .The prior for asynchronous dependent changepoints deﬁned in (24) is assumed. Three dif-ferent scenarios are considered for the upper bounds w for the lags: for the ﬁxed window sce-nario, we ﬁx w i = 30 for all i ; for the variable window scenario, it is assumed a priori that w i ∼ Geometric (0 . for all i ; for the zero window scenario, we ﬁx w i = 0 for all i , whichis equivalent to assuming the prior for synchronous dependent changepoints deﬁned in (11).For each simulation from the model, for each scenario for the maximal lags, given the -chaindependence structure for changepoints, and for the collection of changepoint hyperparametersconsidered in Section 5.1, a sample of size

50 000 was obtained from the posterior distributionof changepoints, via the MCMC algorithm proposed in Section 5.2, with a burn-in of

10 000 iterations. Note that the prior for δ was speciﬁed as described in Section 6.1.To take into account both the number and the positions of changepoints, for all scenariosconsidered, estimations of the posterior distribution of changepoints are compared in terms of theexpected loss with respect to the changepoints that were used to sample data for the experiment,given the loss function L deﬁned in Deﬁnition 3 with γ = 40 . Figure 5 displays Monte Carlo19stimates of the expected loss for the time series in C , which correspond to diﬀerent assumptionsfor the upper bounds for the lags. As the level of asynchrony of the changepoints v increases,the expected loss tends to be greater for the zero window scenario than for the ﬁxed window orthe variable window scenarios. Hence, both in terms of detecting signals for changepoints andcorrectly estimating the positions of these signals, there is merit in relaxing the assumption thatsignals for changepoints must be synchronous to be combined. Moreover, the expected loss tendsbe greater for the variable window than for the ﬁxed window scenarios, therefore encouragingpractioners to specify ﬁxed time windows when possible.

7. Red team detection in network authentication data from LANL

This section presents results of an analysis of the LANL network authentication data presented inSection 3 that demonstrates the utility of the graphical changepoint model proposed in Section 4,through a comparison with the standard model for independent changepoints across time series(4).

The occurrence of a red team exercise during the ﬁrst month of the data collection providessurrogate intruder behaviour in the authentication data (Kent, 2015). In particular, userIDs are known to have been used by the red team. We show the graphical model for dependentchangepoints can combine evidence from multiple users which are linked in the network, to detectchains of quasi-synchronous weak signals for changes in the authentication activity of red teamusers, whilst limiting the number of false alerts.For our demonstration purposes, it suﬃces to examine a subset of the full LANL network ofusers, which is represented by the graph G = ( V, E ) deﬁned in (9). Let R denote the set of redteam users and let N ⊆ { u ∈ V \ R : ∃ r ∈ R s.t. ( u, r ) ∈ E } denote | R | randomly selectedusers that are not labelled as red team users in the data but are linked to red team users on thenetwork. The focus is on the network corresponding to the subgraph G (cid:48) induced in G by the setof users V (cid:48) = R ∪ N . Figure 6 shows the degree distribution of the users in G (cid:48) . Red teamusers tend to have a greater degree in G (cid:48) than legitimate users; to traverse the network towardshigh value targets, intruders tend to take control of users that are highly linked on the network. Recall from Section 3 that, for each user i ∈ V (cid:48) , the data x i, , . . . , x i,T consist of hourly counts ofnetwork logons per source computer for the ﬁrst month of data collection as deﬁned in (7), whichare now assumed to follow the changepoint model speciﬁed in (8). To encode prior belief thatsignals for changes resulting from an attack are likely to occur at similar times across users thatare linked in the network G (cid:48) , the prior for dependent changepoints speciﬁed in (24) is assumedwith an identical edge weight parameter λ > , as deﬁned in Section 4.1.2, for all pairs of timeseries corresponding to users that are linked in G (cid:48) . Recall that the same set-up with λ = 0 corresponds to the standard changepoint model assuming independence of changepoints acrosstime series (12).For comparison purposes and to illustrate the ﬂexibility of the proposed model, a collectionof changepoint hyperparameters are considered: ¯ p ∈ {− , − , − } and λ = λ s | ¯ p | /n with λ s ∈ { , . , . , . , . } , where n denotes the average node degree in G (cid:48) . Moreover, diﬀerentassumptions for the upper bounds w for the lags are compared: the zero window assumptionwith w i = 0 for all i , implying signals for attacks are assumed to be synchronous across users;and, the variable window assumption with w i ∼ Geometric (0 . for all i , admitting signals forattacks may be asynchronous across users. 20 egree F r equen cy Degree

Figure 6: Degree distribution of users in the LANL network represented by the graph G (cid:48) . Leftpanel: counts for legitimate users. Right panel: counts for red team users.For each simulation, a sample of size was obtained from the posterior distributionof the changepoints via the MCMC algorithm proposed in Section 5.2, with a burn-in of

300 000 iterations; the Bayes estimate for changepoints corresponding to the loss function (6) with γ = 48 was then derived from the sample. For each user in the network, each estimated changepoint represents a piece of evidence forpossible malicious behaviour that might require further investigation by cyber analysts. Identifyinginferred changepoints for time series corresponding to legitimate users i ∈ N as false alerts, itis meaningful to compare models in terms of the estimated number of changepoints for redteam users and for legitimate users. Moreover, since cyber-attacks tend to be identiﬁed throughclusters of behavioural changes across machines that are linked on the network, it is of interest toprioritise for investigation the estimated changepoints that belong to clusters of quasi-synchronouschangepoints on the network. Given some time window (cid:36) (cid:62) , for each changepoint τ i,j , let n (cid:36)i,j = |{ ( i, i (cid:48) ) ∈ E : ∃ j (cid:48) ∈ { , . . . , k i (cid:48) } s.t. | τ i,j − τ i (cid:48) ,j (cid:48) | (cid:54) (cid:36) }| (32)denote the number of users linked to user i in G (cid:48) that are impacted by a changepoint within (cid:36) hours of τ i,j , and let c (cid:36)i,j = n (cid:36)i,j + 1 n i + 1 , (33)where n i = |{ ( i, i (cid:48) ) ∈ E }| is the degree of node i in G (cid:48) , such that ( n i + 1) − (cid:54) c i,j (cid:54) .The larger c i,j , the more connected τ i,j to other changepoints across the network. To take intoaccount both the number of changepoints and the connectedness of the changepoints on thenetwork, for each user i , changepoint estimates are also compared via the statistic m i ≡ m (cid:36)i = k i (cid:88) j =1 c (cid:36)i,j , (34)21 l s k i l s Figure 7: Estimated average number of changepoints k i for legitimate users i ∈ N (left panel) andfor red team users i ∈ R (right panel), for diﬀerent assumptions for the upper bounds for the lags- zero window (circles) and variable window (triangles), and for a collection of hyperparameters ¯ p (identiﬁed by distinct colours) and λ s . . . . . . . l s m i . . . . . . l s Figure 8: Estimated average m i , as deﬁned in (34), for legitimate users i ∈ N (left panel) andfor red team users i ∈ R (right panel), for diﬀerent assumptions for the upper bounds for the lags- zero window (circles) and variable window (triangles), and for a collection of hyperparameters ¯ p (identiﬁed by distinct colours) and λ s . 22uch that (cid:54) m i (cid:54) k i . Note that for each user i , m i increases with both the number ofchangepoints and the connectedness of those changepoints.For each choice of changepoint hyperparameters, Figure 7 and Figure 8 display the estimatedaverage number of changepoints k i and the estimated average m i (34) with (cid:36) = 24 , respectively,for red team users i ∈ R and for legitimate users i ∈ N . As ¯ p increases, weaker evidence isrequired to infer changepoints, and therefore the estimated values of k i and m i increase for bothred team users and legitimate users. However, as λ increases, weaker signals are required toinfer changepoints impacting multiple users that are linked on the network, and consequently theestimates for k i and m i tend to increase for red team users i ∈ R only. The graphical changepointmodel detects weak signals for behavioural changes that correspond to red team activity, whilstcrucially limiting the number of false alerts, because it successfully encodes prior knowledge thatcyber attacks tend to correspond to coordinated activity across multiple users linked by networkconnectivity.Moreover, as λ increases, the increase of the estimates for k i and m i for red team userstends to be greater for the variable window scenario than for the zero window scenario. Incontrast with the zero window scenario, the variable window scenario admits attacks may resultin asynchronous behavioural changes across the network, and consequently clusters of nearby butnot necessarily synchronous weak signals for changepoints are detected across red team users.The results show that, in comparison with the standard model for independent changepointsacross time series, the proposed graphical changepoint model provides a ﬂexible tool for cyber-analysts to incorporate expert knowledge in changepoint analysis for network monitoring, therebyfacilitating network intrusion detection.

8. Conclusion

This article proposes a ﬂexible class of Bayesian changepoint models for multiple time series thatis adapted to a wide range of graphical model dependence structures for changepoints acrosstime series. A possible extension of the model is to relax the assumption that the graph G on time series indices is known a priori. It might be computationally unrealistic to specify anunconstrained prior for G . However, in some settings it might be appropriate to consider a classof possible graphs G ; for example, it may be assumed a priori that G is an r -chain, as deﬁned in4.2.3, with r (cid:62) unknown. Supplementary material

The python code and the data are available at https://github.com/karl-hallgren/cp_on_graph_of_timeseries/

References

Bardwell, L. and Fearnhead, P. (2017). Bayesian detection of abnormal segments in multipletime series.

Bayesian Analysis , 12(1):193–218.Bardwell, L., Fearnhead, P., Eckley, I. A., Smith, S., and Spott, M. (2019). Most recent change-point detection in panel data.

Technometrics , 61(1):88–98.Besag, J. and Green, P. J. (1993). Spatial statistics and Bayesian computation.

Journal of theRoyal Statistical Society. Series B (Methodological) , 55(1):25–37.Bolton, A. D. and Heard, N. A. (2018). Malware family discovery using reversible jump MCMCsampling of regimes.

Journal of the American Statistical Association , 113(524):1490–1502.23ondy, J. A. and Murty, U. S. R. (1976).

Graph Theory with Applications . Elsevier, New York.Denison, D., Holmes, C., Bani, M., and Smith, A. (2002).

Bayesian Methods for NonlinearClassiﬁcation and Regression . Wiley Series in Probability and Statistics, Chichester: Wiley.Fearnhead, P. (2006). Exact and eﬃcient bayesian inference for multiple changepoint.

Statisticsand Computing , 16:203–213.Fisch, A., Eckley, I., and Fearnhead, P. (2019). Subset multivariate collective and point anomalydetection. arxiv.org .Green, P. J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian modeldetermination.

Biometrika , 82(4):711–732.Grundy, T. J., Killick, R., and Mihaylov, G. (2020). High-dimensional changepoint detection viaa geometrically inspired mapping.

Statistics and Computing , 30:1155–1166.Higdon, D. M. (1998). Auxiliary variable methods for Markov chain Monte Carlo with applica-tions.

Journal of the American Statistical Association , 93(442):585–595.Jeng, X. J., Cai, T. T., and Li, H. (2012). Simultaneous discovery of rare and common segmentvariants.

Biometrika , 100(1):157–172.Kent, A. D. (2015). Cybersecurity data sources for dynamic network research. In

DynamicNetworks in Cybersecurity . Imperial College Press.Lauritzen, S. L. (1996).

Graphical Models . Oxford University Press, Oxford.Li, F. and Zhang, N. R. (2010). Bayesian variable selection in structured high-dimensionalcovariate spaces with applications in genomics.

Journal of the American Statistical Association ,105(491):1202–1214.Sexton, J. O., Storlie, C., and Neil, J. (2015). Attack chain detection.

Statistical Analysis andData Mining , 8:353–363.Swendsen, R. H. and Wang, J.-S. (1987). Nonuniversal critical dynamics in Monte Carlo simu-lations.

Phys. Rev. Lett. , 58:86–88.Wang, T. and Samworth, R. J. (2018). High dimensional change point estimation via sparse pro-jection.

Journal of the Royal Statistical Society: Series B (Statistical Methodology) , 80(1):57–83.

Acknowledgements

The authors thank Niall Adams for stimulating discussions about this work. The authors acknowl-edge funding from EPSRC. Research presented in this article was supported by the LaboratoryDirected Research and Development program of Los Alamos National Laboratory (New Mexico,USA) under project number 20180607ECR and Los Alamos National Laboratory.24 ppendicesA. Motivational example: changepoint detection in cyber-security

This section gives further details on the motivational example discussed in Section 3.

A.1. Data processing

With the aim of modelling network activity that is human driven rather than high frequency andautomated, the network authentication data, which are available online at https://csr.lanl.gov/data/cyber1 , were ﬁltered as follows: events involving the same pair of user and sourcecomputer more than times were removed; computers that act as source computers formore than distinct users were discarded. The resulting data involves

11 985 distinct usersand

12 947 distinct source computers.

A.2. Limitations of changepoint independence across time series

To detect occurrences of malicious activity in the network, the authentication activity of eachuser is monitored via hourly counts of network logons per source computer. Figure 9 displays theﬁrst month of data for two distinct users that are linked on the network. The data are assumedto follow the changepoint model speciﬁed in Section 3. Assuming further that changepointsare independent with prior distribution given in (4), for a selection of Bernoulli parameters p ,a sample from the posterior distribution of changepoints was obtained via the reversible jumpMCMC algorithm proposed in Green (1995), adapted to discrete time changepoints. In Figure10, for diﬀerent Bernoulli parameters, the crosses and the red lines indicate the positions of Bayeschangepoint estimates corresponding to the loss function L given in Deﬁnition 3 with γ = 40 ;it is noticeable that p controls a priori the level of granularity of the segmentation of the data.The smaller p , the stronger the evidence required from the data to suggest a posteriori inferredchanges. No choice of p seems satisfactory: choosing a small value for p will limit the number offalse alerts due to noise and user speciﬁc legitimate activity; yet it will also prevent the detectionof weak signals for changes shared by diﬀerent users which are linked in the network, that maybe of great interest. 25 i ,t t x i ,t Figure 9: Hourly counts of network logons per source computer for two users over the ﬁrst monthof data collection. Source computers are identiﬁed by distinct colours. Top panel corresponds touser U342@DOM1, and bottom panel to user U86@DOM1.26 i ,t l

100 200 300 400 500 600 700 l og i t ( p ) − − − t x i ,t l

100 200 300 400 500 600 700 t l og i t ( p ) − − Figure 10: Hourly counts of network logons per source computer for two users over the ﬁrstmonth of data collection; top panel corresponds to user U342@DOM1 and bottom panel to userU86@DOM1. Crosses indicate positions of Bayes changepoint estimates, according to the lossfunction L with γ = 40 , corresponding to three diﬀerent Bernoulli parameters p .27 . Proof of Proposition 1 (i) Let w (cid:62) , k (cid:62) and τ k = ( τ , . . . , τ k ) ∈ T k . We introduce some notations to faciliatethe proof. For all (cid:54) j (cid:54) l (cid:54) k , let Ω j : l = Ω j : l ( w, k, τ k ) = { d j : l = ( d j , . . . , d l ) : ∀ i, (cid:54) d i < ρ ( i,

0) = min { w + 1 , T + 1 − τ i }} and Q j : l ≡ Q j : l ( w, k, τ k ) = { d j : l = ( d j , . . . , d l ) ∈ Ω j : l : τ j + d j (cid:62) τ j +1 + d j +1 (cid:62) · · · (cid:62) τ l + d l } . Note that | Ω j : l | = (cid:81) li = j ρ ( i, and |Q j : l | = Q ( j, l − j ) where Q is deﬁned in (20).It is immediate that if k = 0 then (22) holds. If k = 1 then |D ( w, k, τ k ) | = min { w +1 , T + 1 − τ } = Z (1) . We show by induction that (22) holds for k > . If k = 2 then |D ( w, k, τ k ) | = | Ω | − |Q | = ρ (1 , ρ (2 , − Q (1 , Z (1) Q (2 , − Z (0) Q (1 , Z (2) . Assume that (22) holds for k = 1 , . . . , k (cid:48) − for some k (cid:48) > . We show that (22) holdsfor k = k (cid:48) . We have |D ( w, k, τ k ) | = k (cid:88) j =1 ( − k − j |{ d k ∈ Ω k : d j − ∈ D ( w, j − , τ j − ) , d j : k ∈ Q j : k }| = k (cid:88) j =1 ( − k − j |D ( w, k − j, τ k − j ) ) | |Q j : k | = k (cid:88) j =1 ( − k − j Z ( j − Q ( j, k − j )= Z ( k ) . (ii) Follows immediately from (18). 28 . Simulation study C.1. Star of changepoints with varying number of peripheral time series aﬀectedby changepoints

Consider the dependence structure for changepoints induced by the star graph for time seriesindices discussed in Section 4.2.1. We demonstrate that as the number of peripheral time seriesaﬀected by a changepoint at some time t increases, weaker evidence is required for the detectionof a changepoint impacting the central time series i = 1 at time t .As illustrated in Figure 11, the set C is partitioned into two sets C and C such that C = { } contains the central time series and C comprises a random subset of { , . . . , L } corresponding to the peripheral time series that are impacted by a changepoint. For all i ∈ C = C ∪ C , we set τ i, = 200 , so that changepoints are synchronous. For all peripheral time series i ∈ C , we ﬁx θ i, = 1100 so that the evidence for a changepoint is overwhelming relative to thehyperparameters p that will be considered for the changepoint prior. Ten simulation repetitionswere performed for diﬀerent numbers of peripheral time series aﬀected by a changepoint | C | ∈{ , , , } and for diﬀerent θ , ∈ { , , } corresponding to increasing levels ofevidence for a changepoint for the central time series.The prior for synchronous dependent changepoints deﬁned in (11) is assumed, with a parametri-sation corresponding to a star graph for time series indices as discussed in Section 4.2.1. Foreach simulation and for diﬀerent changepoint hyperparameters ¯ p ∈ {− , − , . . . , − } and λ = | ¯ p | λ s / ( L − with λ s ∈ { , . , . . . , . } , a sample of size

50 000 was obtained from theposterior distribution of changepoints, via the MCMC algorithm proposed in Section 5.1, with aburn-in of

10 000 iterations. Recall that set-ups with λ = 0 correspond to the standard modelfor independent changepoints (12).For all samples, the Monte Carlo estimates of the probabilities that k i = 0 for all i ∈ N , that k i = 1 with τ i, = 200 for all peripheral time series i ∈ C , and that k i < for the central timeseries i = 1 are greater than . . Figure 12 displays Monte Carlo estimates of the posteriorprobability that k = 1 for the central time series as a function of the hyperparameters of theprior for dependent changepoints, for the diﬀerent scenarios considered. It is apparent that as ¯ p increases, stronger evidence for a changepoint is needed to infer a posteriori that the centraltime series is impacted by a changepoint. Moreover, if λ > , as the number of peripheral timeseries impacted by a changepoint increases, it is more likely a posteriori that the central timeseries is aﬀected by a changepoint, suggesting that weaker evidence is required for the detectionof a changepoint. Similary, if | C | > , the posterior probability that k = 1 increases with λ ,conﬁrming the role of the graph edge interaction parameter. C.2. Clusters of changepoints on a lattice and on an r -chain of time series For other levels of evidence considered in the experiment discussed in Section 6.1, correspondingto θ ∈ { , } , results are similar to results for the scenario where θ = 1050 . Yet, byconsidering results for the dependence structure corresponding to the -chain displayed in Figure13, we note that, as the level of evidence increases, the region of high posterior probability that k i = 1 for i ∈ C translates along the ¯ p axis on the ¯ p, λ -plane.29 Figure 11: Examplar graph of time series indices i ∈ { , . . . , } representing a star dependencestructure discussed in Section 4.2. For all i , circles indicate i ∈ N and squares indicate i ∈ C ;moreover, colours blue and green indicate that i ∈ C and C , respectively. |C | = l og i t ( p ) , w i t h q = − − − − |C | = − − − − |C | = − − − − |C | = − − − − l og i t ( p ) , w i t h q = − − − − − − − − − − − − − − − − l s l og i t ( p ) , w i t h q = − − − − l s − − − − l s − − − − l s − − − − Figure 12: Monte Carlo estimates of the posterior probability that k i = 1 for the central timeseries i = 1 as a function of ¯ p = logit ( p ) and λ s , for diﬀerent levels of evidence for a changepoint θ , = θ in the central series (rows) and diﬀerent numbers of peripheral times series aﬀected bya changepoint | C | (columns). 30 =1040 l og i t ( p ) , f o r i ˛ C − − − − − − q =1050 − − − − − − q =1060 − − − − − − l s l og i t ( p ) , f o r i ˛ C − − − − − − l s − − − − − − l s − − − − − − Figure 13: Monte Carlo estimates of the posterior probability that k i = 1 , given the -chaingraph based dependence structure for changepoints, as a function of ¯ p = logit ( p ) and λ s , for i ∈ C , C , and for diﬀerent level of evidence for changepoints θ i, = θ ∈ { , , }}

*
*