An Asymptotic Theory of Joint Sequential Changepoint Detection and Identification for General Stochastic Models
IEEE TRANSACTIONS ON INFORMATION THEORY, 2021
Alexander G. Tartakovsky, Senior Member, IEEE
Abstract—The paper addresses a joint sequential changepoint detection and identification/isolation problem for a general stochastic model, assuming that the observed data may be dependent and non-identically distributed, the prior distribution of the change point is arbitrary, and the post-change hypotheses are composite. The developed detection–identification theory generalizes the changepoint detection theory developed by Tartakovsky (2019) to the case of multiple composite post-change hypotheses, when one has not only to detect a change as quickly as possible but also to identify (or isolate) the true post-change distribution. We propose a multi-hypothesis change detection–identification rule and show that it is nearly optimal, minimizing moments of the delay to detection as the probability of a false alarm and the probabilities of misidentification go to zero.
Index Terms—Asymptotic Optimality; Changepoint Detection–Identification Problems; Expected Detection Delay; General Stochastic Models; Moments of the Delay to Detection.
I. INTRODUCTION

In many applications, one needs not only to detect an abrupt change as quickly as possible but also to provide a detailed diagnosis of the occurred change, that is, to determine which type of change is in effect. For example, the problem of detection and diagnosis is important for rapid detection and isolation of intrusions in large-scale distributed computer networks, target detection with radar, sonar and optical sensors in a cluttered environment, detecting terrorists' malicious activity, fault detection and isolation in dynamic systems and networks, and integrity monitoring of navigation systems, to name a few (see [20, Ch. 10] for an overview and references). In other words, there are several kinds of changes that can be associated with several different post-change distributions, and the goal is to detect the change and to identify which distribution corresponds to the change. As a result, the problem of changepoint detection and diagnosis is a generalization of the quickest change detection problem [6], [13], [15], [16], [20] to the case of $N > 1$ post-change hypotheses, and it can be formulated as a joint change detection and identification problem. In the literature, this problem is usually called change detection and isolation. The detection–isolation problem has been considered in both Bayesian and minimax settings.

The work was supported in part by the Russian Science Foundation under grant 18-19-00452 at the Moscow Institute of Physics and Technology. A. G. Tartakovsky is a Deputy Head of the Space Informatics Laboratory at the Moscow Institute of Physics and Technology, Russia, and President of AGT StatConsult, Los Angeles, California, USA; e-mail: [email protected]. Manuscript received March 23, 2020; revised October 26, 2020; accepted February 22, 2021. Copyright (c) 2020 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected].
In 1995, Nikiforov [8] suggested a minimax approach to the change detection–isolation problem and showed that the multihypothesis version of the CUSUM rule is asymptotically optimal when the average run length (ARL) to a false alarm and the mean time to false isolation become large. Several versions of the multihypothesis CUSUM-type and SR-type procedures, which have minimax optimality properties in the classes of rules with constraints imposed on the ARL to a false alarm and conditional probabilities of false isolation, were proposed by Nikiforov [9], [10] and Tartakovsky [22]. These rules asymptotically minimize maximal expected delays to detection and isolation as the ARL to a false alarm is large and the probabilities of wrong isolation are small. Dayanik et al. [2] proposed an asymptotically optimal Bayesian detection–isolation rule assuming that the prior distribution of the change point is geometric. In all these papers, the optimality results were restricted to the case of independent and identically distributed (i.i.d.) observations (in pre- and post-change modes with different distributions) and simple post-change hypotheses. In many practical applications, the i.i.d. assumption is too restrictive. The observations may be either non-identically distributed or dependent or both, i.e., non-i.i.d. Also, in a variety of applications, the pre-change distribution is known, but the post-change distribution is rarely known completely. A more realistic situation is parametric uncertainty, when the parameter of the post-change distribution is unknown, since a putative parameter value is rarely representative. Lai [5] provided a certain generalization for the non-i.i.d. case and composite hypotheses for a specific loss function. See Chapter 10 in Tartakovsky et al. [20] for a detailed overview.

One of the most challenging and important versions of the change detection–isolation problem is the multidecision and multistream detection problem, when it is necessary not only to detect a change as soon as possible but also to identify the streams where the change happens with given probabilities of misidentification. Specifically, there are $N$ data streams, and the change occurs in some of them at an unknown point in time. It is necessary to detect the change in distribution as soon as possible and indicate which streams are "corrupted." Both the rates of false alarms and misidentification should be controlled by given (usually low) levels. In the following, we will refer to this problem as the Multistream Sequential Change Detection–Identification problem.

In this paper, we address a simplified multistream detection–identification scenario where the change can occur only in a single stream and we need to determine in which stream. We assume that the observations in streams can have a very general structure, i.e., they can be dependent and non-identically distributed. We focus on a semi-Bayesian setting, assuming that the change point is random with a given (prior) distribution. However, we do not suppose that there is a prior distribution on the post-change hypotheses. We generalize the asymptotic Bayesian theory developed by Tartakovsky [23] for a single post-change hypothesis (for a single stream). Specifically, we show that under certain conditions (related to the law of large numbers for the log-likelihood processes) the proposed multihypothesis detection–identification rule asymptotically minimizes the trade-off between positive moments of the detection delay and the false alarm/misclassification rates expressed via the weighted probabilities of false alarm and false identification. The key assumption in the general asymptotic theory is the stability property of the log-likelihood ratio processes in streams between the "change" and "no-change" hypotheses, which can be formulated in terms of the law of large numbers and rates of convergence of the properly normalized log-likelihood ratios and their adaptive versions in the vicinity of the true parameter values.

The rest of the paper is organized as follows. In Section II, we describe the general stochastic model treated in the paper. In Section III, we introduce the mixture-based change detection–identification rule. In Section IV, we formulate the asymptotic optimization problems in the class of changepoint detection–identification rules with constraints imposed on the probabilities of false alarm and wrong identification. In Section V, we obtain upper bounds on the probabilities of false alarm and misidentification as functions of thresholds.
In Section VI, we derive asymptotic lower bounds for moments of the detection delay in the class of rules with given probabilities of false alarm and misidentification, and in Section VII, we prove asymptotic optimality of the proposed mixture detection–identification rule as the probabilities of false alarm and misidentification go to zero. In Section VIII, we consider an example that illustrates the general results. Section IX concludes.

II. THE GENERAL STOCHASTIC MODEL
Suppose there are $N$ independent data streams $\{X_n(i)\}_{n \ge 1}$, $i = 1, \dots, N$, observed sequentially in time subject to a change at an unknown time $\nu \in \{0, 1, 2, \dots\}$, so that $X_1(i), \dots, X_\nu(i)$ are generated by one stochastic model and $X_{\nu+1}(i), X_{\nu+2}(i), \dots$ by another model when the change occurs in the $i$-th stream. We will assume that the change in distributions may happen only in one stream, and it is not known which stream is affected, i.e., we are interested in a "multisample slippage" changepoint model (given $\nu$ and that the $i$-th stream is affected with the parameter $\theta_i$) for which the joint density $p(\mathbf{X}^n \mid H_{\nu,i}, \theta_i)$ of the data $\mathbf{X}^n = (\mathbf{X}^n(1), \dots, \mathbf{X}^n(N))$, $\mathbf{X}^n(i) = (X_1(i), \dots, X_n(i))$, observed up to time $n$ is of the form
\[
p(\mathbf{X}^n \mid H_{\nu,i}, \theta_i) = p(\mathbf{X}^n \mid H_\infty) = \prod_{t=1}^{n} \prod_{\ell=1}^{N} g_\ell(X_t(\ell) \mid \mathbf{X}^{t-1}(\ell)) \quad \text{for } \nu \ge n,
\]
\[
p(\mathbf{X}^n \mid H_{\nu,i}, \theta_i) = \prod_{t=1}^{\nu} g_i(X_t(i) \mid \mathbf{X}^{t-1}(i)) \times \prod_{t=\nu+1}^{n} f_{i,\theta_i}(X_t(i) \mid \mathbf{X}^{t-1}(i)) \times \prod_{j \in \mathcal{N} \setminus \{i\}} \prod_{t=1}^{n} g_j(X_t(j) \mid \mathbf{X}^{t-1}(j)) \quad \text{for } \nu < n, \tag{1}
\]
where $H_{\nu,i}$ denotes the hypothesis that the change occurs at time $\nu$ in stream $i$, $g_i(X_t(i) \mid \mathbf{X}^{t-1}(i))$ and $f_{i,\theta_i}(X_t(i) \mid \mathbf{X}^{t-1}(i))$ are the conditional pre- and post-change densities in the $i$-th data stream, respectively (with respect to some sigma-finite measure), and $\mathcal{N} = \{1, 2, \dots, N\}$. In other words, all components $X_t(\ell)$, $\ell \in \mathcal{N}$, have conditional densities $g_\ell(X_t(\ell) \mid \mathbf{X}^{t-1}(\ell))$ before the change occurs, and $X_t(i)$ has conditional density $f_{i,\theta_i}(X_t(i) \mid \mathbf{X}^{t-1}(i))$ after the change occurs in the $i$-th stream, while the rest of the components $X_t(j)$, $j \in \mathcal{N} \setminus \{i\}$, retain the conditional densities $g_j(X_t(j) \mid \mathbf{X}^{t-1}(j))$. The parameters $\theta_i \in \Theta_i$, $i = 1, \dots, N$, of the post-change distributions are unknown. The event $\nu = \infty$ and the corresponding hypothesis $H_\infty : \nu = \infty$ mean that there never is a change.
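For concreteness, here is a minimal simulation sketch of model (1) in its simplest special case: independent i.i.d. Gaussian streams with a mean shift in a single stream. The Gaussian choice, the parameter values, and all names are ours; the theory itself covers dependent, non-identically distributed observations.

```python
import numpy as np

def simulate_streams(n, N, nu, affected, theta, rng):
    """Simulate model (1) in the i.i.d. Gaussian special case:
    every stream is N(0,1) before the change (playing the role of g_l),
    and from observation nu+1 onward the affected stream is N(theta,1)
    (playing the role of f_{i,theta_i})."""
    X = rng.standard_normal((n, N))      # pre-change observations in all streams
    if nu < n:
        X[nu:, affected] += theta        # X_{nu+1}, X_{nu+2}, ... get the mean shift
    return X

rng = np.random.default_rng(0)
X = simulate_streams(n=200, N=3, nu=100, affected=1, theta=0.7, rng=rng)
```

Row `t` of `X` (0-based) holds the observation vector $X_{t+1} = (X_{t+1}(1), \dots, X_{t+1}(N))$, so the first post-change observation is row `nu`, matching the convention that $X_{\nu+1}(i)$ is the first post-change observation.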
Notice that model (1) implies that $X_{\nu+1}(i)$ is the first post-change observation under hypothesis $H_{\nu,i}$.

Regarding the change point $\nu$, we assume that it is a random variable independent of the observations with prior distribution $\pi_k = P(\nu = k)$, $k = 0, 1, 2, \dots$, with $\pi_k > 0$ for $k \in \{0, 1, 2, \dots\} = \mathbb{Z}_+$. We will also assume that the change point may take negative values, which means that the change has occurred by the time the observations became available. However, the detailed structure of the distribution $P(\nu = k)$ for $k = -1, -2, \dots$ is not important. The only value which matters is the total probability $q = P(\nu \le -1)$ of the change being in effect before the observations become available, so we set $P(\nu \le -1) = P(\nu = -1) = \pi_{-1}$, $\pi_{-1} \in [0, 1)$. Therefore, in what follows we assume that $\nu \in \{-1, 0, 1, \dots\} = \mathbb{Z}$ and the prior distribution of the change point is defined on $\mathbb{Z}$.

III. THE DETECTION–IDENTIFICATION RULE
A changepoint detection–identification rule is a pair $\delta = (d, T)$, where $T$ is a stopping time (with respect to the filtration $\{\mathcal{F}_n = \sigma(\mathbf{X}^n)\}_{n \in \mathbb{Z}_+}$) associated with the time of alarm on the change, and $d = d_T \in \mathcal{N}$ is a decision on which stream is affected (or which post-change distribution is true), made at time $T$.

It follows from (1) that for an assumed value of the change point $\nu = k$, stream $i \in \mathcal{N}$, and post-change parameter value $\theta_i \in \Theta_i$ in the $i$-th stream, the likelihood ratio (LR) $\mathrm{LR}_{i,\theta_i}(k,n) = p(\mathbf{X}^n \mid H_{k,i}, \theta_i)/p(\mathbf{X}^n \mid H_\infty)$ between the hypotheses $H_{k,i}$ and $H_\infty$ for the observations accumulated by time $n$ has the form
\[
\mathrm{LR}_{i,\theta_i}(k,n) = \prod_{t=k+1}^{n} L_{i,\theta_i}(t), \quad i \in \mathcal{N}, \ n > k \tag{2}
\]
($k = -1, 0, 1, \dots$), where $L_{i,\theta_i}(t) = f_{i,\theta_i}(X_t(i) \mid \mathbf{X}^{t-1}(i))/g_i(X_t(i) \mid \mathbf{X}^{t-1}(i))$. We suppose that $L_{i,\theta_i}(0) = 1$, so that $\mathrm{LR}_{i,\theta_i}(-1,n) = \mathrm{LR}_{i,\theta_i}(0,n)$. Define the average (over the prior $\pi_k$) LR statistics
\[
\Lambda^{\pi}_{i,\theta_i}(n) = \sum_{k=-1}^{n-1} \pi_k \, \mathrm{LR}_{i,\theta_i}(k,n), \quad i \in \mathcal{N}. \tag{3}
\]
Let $W_i$, $\int_{\Theta_i} \mathrm{d}W_i(\theta_i) = 1$, $i \in \mathcal{N}$, be mixing measures and, for $k < n$ and $i \in \mathcal{N}$, define the LR mixtures
\[
\mathrm{LR}_{i,W}(k,n) = \int_{\Theta_i} \mathrm{LR}_{i,\theta_i}(k,n) \, \mathrm{d}W_i(\theta_i) \tag{4}
\]
and the statistics
\[
\Lambda^{\pi}_{i,W}(n) =
\begin{cases}
\sum_{k=-1}^{n-1} \pi_k \, \mathrm{LR}_{i,W}(k,n), & i \in \mathcal{N}, \\
P(\nu \ge n), & i = 0;
\end{cases} \tag{5}
\]
\[
\bar\Lambda^{\pi,W}_{ij}(n) = \frac{\Lambda^{\pi}_{i,W}(n)}{\sum_{k=-1}^{n-1} \pi_k \sup_{\theta_j \in \Theta_j} \mathrm{LR}_{j,\theta_j}(k,n)}, \quad i,j \in \mathcal{N},\ i \ne j,\ n \ge 1; \qquad
\bar\Lambda^{\pi,W}_{i0}(n) = \frac{\Lambda^{\pi}_{i,W}(n)}{P(\nu \ge n)}, \quad i \in \mathcal{N},\ n \ge 1, \tag{6}
\]
where in the statistic $\Lambda^{\pi}_{i,W}(n)$ defined in (5) the value $i = 0$ corresponds to the hypothesis $H_0$ that there is no change (in the first $n$ observations). Write $\mathcal{N}_0 = \{0, 1, \dots, N\}$.
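For intuition, the statistics (5)–(6) can be computed by brute force in the i.i.d. Gaussian special case with point mixing measures $W_i$ (a point mass at a known post-change mean, so the supremum in (6) is trivial) and a geometric prior $\pi_k = \rho(1-\rho)^k$ on $\{0, 1, \dots\}$ with $\pi_{-1} = 0$, whose tail is $P(\nu \ge n) = (1-\rho)^n$. All names and numeric choices below are ours, not the paper's.

```python
import numpy as np

def mixture_stats(X, theta, rho):
    """Compute Lambda_i(n) of (5) and the ratio against the 'no change'
    hypothesis (j = 0 in (6)) for N independent Gaussian streams:
    N(0,1) pre-change, N(theta[i],1) post-change in stream i; W_i is a
    point mass at theta[i] and pi_k = rho*(1-rho)^k, k >= 0.
    Returns (Lam, Lam0) with Lam[n-1, i] = Lambda_i(n) and
    Lam0[n-1, i] = Lambda_i(n) / P(nu >= n)."""
    n_max, N = X.shape
    theta = np.asarray(theta, dtype=float)
    logL = X * theta - 0.5 * theta ** 2            # log L_{i,theta_i}(t)
    cum = np.vstack([np.zeros(N), np.cumsum(logL, axis=0)])
    Lam = np.empty((n_max, N))
    for n in range(1, n_max + 1):
        k = np.arange(n)                           # candidate change points 0..n-1
        pi_k = rho * (1.0 - rho) ** k
        Lam[n - 1] = pi_k @ np.exp(cum[n] - cum[k])  # sum_k pi_k LR_i(k, n)
    tail = (1.0 - rho) ** np.arange(1, n_max + 1)    # P(nu >= n)
    return Lam, Lam / tail[:, None]

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 2))
X[50:, 0] += 1.0                                   # change at nu = 50 in stream 0
Lam, Lam0 = mixture_stats(X, theta=[1.0, 1.0], rho=0.1)
```

Thresholding the ratios `Lam[n-1, i] / Lam[n-1, j]` and `Lam0[n-1, i]` as in the rule (7)–(8) defined next yields the stopping time and the identification decision; after the change, the statistic of the affected stream grows exponentially while the others stay small.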
For the set of positive thresholds $A = (A_{ij})$, $j \in \mathcal{N}_0 \setminus \{i\}$, $i \in \mathcal{N}$ (writing $A_i = A_{i0}$ for brevity), the change detection–identification rule $\delta_A = (d_A, T_A)$ is defined as follows:
\[
T_A = \min_{\ell \in \mathcal{N}} T_A^{(\ell)}, \qquad d_A = i \ \text{ if } \ T_A = T_A^{(i)}, \tag{7}
\]
where the Markov times $T_A^{(i)}$, $i \in \mathcal{N}$, are given by
\[
T_A^{(i)} = \inf\bigl\{ n \ge 1 : \bar\Lambda^{\pi,W}_{ij}(n) \ge A_{ij} \ \ \forall\, j \in \mathcal{N}_0 \setminus \{i\} \bigr\}. \tag{8}
\]
In definitions of stopping times we always set $\inf\{\varnothing\} = \infty$, i.e., $T_A^{(i)} = \infty$ if there is no such $n$. If $T_A = T_A^{(i)}$ for several values of $i$, then any of them can be taken.

IV. OPTIMIZATION PROBLEMS AND ASSUMPTIONS
Let $E_{k,i,\theta_i}$ and $E_\infty$ denote expectations under the probability measures $P_{k,i,\theta_i}$ and $P_\infty$, respectively, where $P_{k,i,\theta_i}$ corresponds to model (1) with an assumed value of the parameter $\theta_i \in \Theta_i$, change point $\nu = k$, and affected stream $i \in \mathcal{N}$. Define the probability measure $P^{\pi}_{i,\theta_i}(\mathcal{A} \times \mathcal{K}) = \sum_{k \in \mathcal{K}} \pi_k P_{k,i,\theta_i}(\mathcal{A})$, under which the change point $\nu$ has distribution $\pi = \{\pi_k\}_{k \in \mathbb{Z}}$ and the model for the observations is of the form (1), and let $E^{\pi}_{i,\theta_i}$ denote the corresponding expectation.

For $r \ge 1$, $\nu = k \in \mathbb{Z}$, $\theta_i \in \Theta_i$, and $i \in \mathcal{N}$, introduce the risk associated with the conditional $r$-th moment of the detection delay
\[
R^{r}_{k,i,\theta_i}(\delta) = E_{k,i,\theta_i}\bigl[(T - k)^r;\ d = i \mid T > k\bigr], \tag{9}
\]
where for $k = -1$ we set $T - k = T$, but not $T + 1$, as well as the integrated (over the prior $\pi$) risk associated with the moments of the delay to detection
\[
\bar R^{r}_{i,\theta_i}(\delta) := E^{\pi}_{i,\theta_i}\bigl[(T - \nu)^r;\ d = i \mid T > \nu\bigr]
= \frac{E^{\pi}_{i,\theta_i}[(T - \nu)^r;\ d = i,\ T > \nu]}{P^{\pi}_{i,\theta_i}(T > \nu)}
= \frac{\sum_{k=-1}^{\infty} \pi_k E_{k,i,\theta_i}[(T - k)^r;\ d = i,\ T > k]}{\sum_{k=-1}^{\infty} \pi_k P_{k,i,\theta_i}(T > k)}
\]
\[
= \frac{\sum_{k=-1}^{\infty} \pi_k R^{r}_{k,i,\theta_i}(\delta)\, P_{k,i,\theta_i}(T > k)}{P(\nu \le 0) + \sum_{k=1}^{\infty} \pi_k P_\infty(T > k)}
= \frac{\sum_{k=-1}^{\infty} \pi_k R^{r}_{k,i,\theta_i}(\delta)\, P_\infty(T > k)}{1 - \mathrm{PFA}_\pi(\delta)}, \tag{10}
\]
where
\[
\mathrm{PFA}_\pi(\delta) = P^{\pi}_{i,\theta_i}(T \le \nu) = \sum_{k=0}^{\infty} \pi_k P_{k,i,\theta_i}(T \le k) = \sum_{k=0}^{\infty} \pi_k P_\infty(T \le k) \tag{11}
\]
is the weighted probability of false alarm. Note that in (10) and (11) we used the equality $P_{k,i,\theta_i}(T \le k) = P_\infty(T \le k)$, which holds since the event $\{T \le k\}$ belongs to the sigma-algebra $\mathcal{F}_k = \sigma(\mathbf{X}^k)$ and, hence, depends only on the first $k$ observations, whose distribution corresponds to the measure $P_\infty$. This implies, in particular, that
\[
P^{\pi}_{i,\theta_i}(T > \nu) = 1 - \mathrm{PFA}_\pi(\delta) = P(\nu \le 0) + \sum_{k=1}^{\infty} \pi_k P_\infty(T > k).
\]
Also, introduce
\[
\mathrm{PFA}^{\pi}_i(\delta) = P^{\pi}_{i,\theta_i}(T \le \nu;\ d = i) = \sum_{k=0}^{\infty} \pi_k P_{k,i,\theta_i}(T \le k;\ d = i) = \sum_{k=0}^{\infty} \pi_k P_\infty(T \le k;\ d = i), \tag{12}
\]
the weighted probability of false alarm on the event $\{d = i\}$, i.e., the probability of raising the alarm with the decision $d = i$ that there is a change in the $i$-th stream when there is no change. The loss associated with wrong identification is reasonable to measure by the maximal probabilities of wrong decisions (misidentification)
\[
\mathrm{PMI}^{\pi}_{ij}(\delta) = \sup_{\theta_i \in \Theta_i} P^{\pi}_{i,\theta_i}(d = j;\ T < \infty \mid T > \nu), \tag{13}
\]
$i, j = 1, \dots, N$, $i \ne j$. Note that
\[
P^{\pi}_{i,\theta_i}(d = j;\ T < \infty \mid T > \nu) = \frac{P^{\pi}_{i,\theta_i}(d = j;\ \nu < T < \infty)}{P^{\pi}_{i,\theta_i}(T > \nu)}
= \frac{\sum_{k=-1}^{\infty} \pi_k P_{k,i,\theta_i}(d = j;\ k < T < \infty)}{1 - \mathrm{PFA}_\pi(\delta)}.
\]
Define the class of change detection–identification rules $\delta$ with constraints on the probabilities of false alarm $\mathrm{PFA}^{\pi}_i(\delta)$ and the probabilities of misidentification $\mathrm{PMI}^{\pi}_{ij}(\delta)$:
\[
\mathcal{C}_\pi(\boldsymbol{\alpha}, \boldsymbol{\beta}) = \bigl\{\delta : \mathrm{PFA}^{\pi}_i(\delta) \le \alpha_i,\ i \in \mathcal{N},\ \text{and}\ \mathrm{PMI}^{\pi}_{ij}(\delta) \le \beta_{ij},\ i, j \in \mathcal{N},\ i \ne j\bigr\}, \tag{14}
\]
where $\boldsymbol{\alpha} = (\alpha_1, \dots, \alpha_N)$ and $\boldsymbol{\beta} = (\beta_{ij})_{i,j \in \mathcal{N}, i \ne j}$ are the sets of prescribed probabilities $\alpha_i \in (0,1)$ and $\beta_{ij} \in (0,1)$. Ideally, we would be interested in finding an optimal rule $\delta_{\mathrm{opt}} = (d_{\mathrm{opt}}, T_{\mathrm{opt}})$ that solves the optimization problem
\[
\bar R^{r}_{i,\theta_i}(\delta_{\mathrm{opt}}) = \inf_{\delta \in \mathcal{C}_\pi(\boldsymbol{\alpha}, \boldsymbol{\beta})} \bar R^{r}_{i,\theta_i}(\delta) \quad \forall\, \theta_i \in \Theta_i,\ i \in \mathcal{N}.
\]
However, this problem is intractable for arbitrary values of $\alpha_i \in (0,1)$ and $\beta_{ij} \in (0,1)$. For this reason, we will focus on the asymptotic problem, assuming that the given probabilities $\alpha_i$ and $\beta_{ij}$ approach zero.
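As a computational aside on the weighted PFA (11): for a geometric prior $\pi_k = \rho(1-\rho)^k$, $k \ge 0$, one has $\sum_{k \ge T} \pi_k = (1-\rho)^T$, so $\mathrm{PFA}_\pi(\delta) = \sum_k \pi_k P_\infty(T \le k) = E_\infty[(1-\rho)^T]$. The sketch below (ours; the stopping time is an arbitrary toy first-passage time, not the rule $\delta_A$) checks the two expressions against each other on the empirical distribution of simulated stopping times.

```python
import numpy as np

rng = np.random.default_rng(5)
rho, reps, horizon = 0.05, 10000, 400

# Toy stopping time: first time a Gaussian random walk exceeds 5.
# Any integer-valued stopping time works for this identity.
walks = rng.standard_normal((reps, horizon)).cumsum(axis=1)
crossed = walks >= 5.0
T = np.where(crossed.any(axis=1), crossed.argmax(axis=1) + 1, horizon + 1)

# Direct form of (11): sum_k pi_k P(T <= k), truncated at the horizon.
ks = np.arange(horizon + 1)
pi = rho * (1.0 - rho) ** ks
p_le = (T[:, None] <= ks).mean(axis=0)        # empirical P(T <= k)
pfa_direct = (pi * p_le).sum()

pfa_closed = ((1.0 - rho) ** T).mean()        # E[(1-rho)^T]
```

On the same sample the two estimates agree up to the negligible truncated tail mass $(1-\rho)^{\,\mathrm{horizon}+1}$, which is why such closed forms are convenient for calibrating thresholds by simulation.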
To be more specific, we will be interested in proving that the proposed detection–identification rule $\delta_A = (d_A, T_A)$ defined in (7)–(8) is first-order uniformly asymptotically optimal in the following sense:
\[
\lim_{\alpha_{\max} \to 0,\, \beta_{\max} \to 0} \frac{\inf_{\delta \in \mathcal{C}_\pi(\boldsymbol{\alpha}, \boldsymbol{\beta})} \bar R^{r}_{i,\theta_i}(\delta)}{\bar R^{r}_{i,\theta_i}(\delta_A)} = 1 \quad \text{for all } \theta_i \in \Theta_i \text{ and } i \in \mathcal{N}, \tag{15}
\]
where $A = A(\boldsymbol{\alpha}, \boldsymbol{\beta})$ is the set of suitably selected thresholds such that $\delta_A \in \mathcal{C}_\pi(\boldsymbol{\alpha}, \boldsymbol{\beta})$. Hereafter $\alpha_{\max} = \max_{i \in \mathcal{N}} \alpha_i$ and $\beta_{\max} = \max_{i,j \in \mathcal{N}, i \ne j} \beta_{ij}$.

In addition, we will prove that the rule $\delta_A = (d_A, T_A)$ is uniformly pointwise first-order asymptotically optimal in the sense of minimizing the conditional risk (9) for all change point values $\nu = k \in \mathbb{Z}$, i.e.,
\[
\lim_{\alpha_{\max} \to 0,\, \beta_{\max} \to 0} \frac{\inf_{\delta \in \mathcal{C}_\pi(\boldsymbol{\alpha}, \boldsymbol{\beta})} R^{r}_{k,i,\theta_i}(\delta)}{R^{r}_{k,i,\theta_i}(\delta_A)} = 1 \quad \text{for all } \theta_i \in \Theta_i,\ k \in \mathbb{Z},\ i \in \mathcal{N}. \tag{16}
\]
It is also of interest to consider the class of detection–identification rules
\[
\mathcal{C}^{\star}_\pi(\alpha, \bar{\boldsymbol{\beta}}) = \bigl\{\delta : \mathrm{PFA}_\pi(\delta) \le \alpha,\ \mathrm{PMI}^{\pi}_i(\delta) \le \bar\beta_i,\ i \in \mathcal{N}\bigr\} \tag{17}
\]
($\bar{\boldsymbol{\beta}} = (\bar\beta_1, \dots, \bar\beta_N)$) with constraints on the total probability of false alarm $\mathrm{PFA}_\pi(\delta)$ (defined in (11)), regardless of which decision $d = i$ is made under hypothesis $H_\infty$, and on the misidentification probabilities
\[
\mathrm{PMI}^{\pi}_i(\delta) = \sup_{\theta_i \in \Theta_i} P^{\pi}_{i,\theta_i}(d \ne i;\ T < \infty \mid T > \nu), \quad i \in \mathcal{N}.
\]
Obviously,
$\mathrm{PFA}_\pi(\delta) = \sum_{i=1}^{N} \mathrm{PFA}^{\pi}_i(\delta)$ and $\mathrm{PMI}^{\pi}_i(\delta) = \sum_{j \in \mathcal{N} \setminus \{i\}} \mathrm{PMI}^{\pi}_{ij}(\delta)$.

In this paper, we consider only a fixed number of hypotheses $N$. The large-scale (Big Data) case where $N \to \infty$ at a certain rate (which requires a different definition of the false alarm and misidentification rates) will be considered elsewhere.

In the following, we assume that the mixing measures $W_i$, $i = 1, \dots, N$, satisfy the condition
\[
W_i\{\vartheta \in \Theta_i : |\vartheta - \theta_i| < \varkappa\} > 0 \quad \text{for any } \varkappa > 0 \text{ and any } \theta_i \in \Theta_i. \tag{18}
\]
By (2), for the assumed values of $\nu = k$, $i \in \mathcal{N}$, and $\theta_i \in \Theta_i$, the log-likelihood ratio (LLR) $\lambda_{i,\theta_i}(k, k+n) = \log \mathrm{LR}_{i,\theta_i}(k, k+n)$ of the observations accumulated by time $k + n$ is
\[
\lambda_{i,\theta_i}(k, k+n) = \sum_{t=k+1}^{k+n} \log L_{i,\theta_i}(t), \quad n \ge 1,
\]
and the LLR between the hypotheses $H_{k,i}$ and $H_{k,j}$ of the observations accumulated by time $k + n$ is
\[
\lambda_{i,\theta_i;\, j,\theta_j}(k, k+n) = \log \frac{p(\mathbf{X}^{k+n} \mid H_{k,i}, \theta_i)}{p(\mathbf{X}^{k+n} \mid H_{k,j}, \theta_j)} \equiv \lambda_{i,\theta_i}(k, k+n) - \lambda_{j,\theta_j}(k, k+n), \quad n \ge 1.
\]
For $j = 0$, we set $\lambda_{0,\theta_0}(k, k+n) = 0$, so that $\lambda_{i,\theta_i;\, 0,\theta_0}(k, k+n) = \lambda_{i,\theta_i}(k, k+n)$.

To study asymptotic optimality, we need certain constraints imposed on the prior distribution $\pi = \{\pi_k\}$ and on the asymptotic behavior of the decision statistics as the sample size increases (i.e., on the general stochastic model). For $\varkappa > 0$, let $\Gamma_{\varkappa,\theta_i} = \{\vartheta \in \Theta_i : |\vartheta - \theta_i| < \varkappa\}$ and, for $0 < I_{ij}(\theta_i, \theta_j) < \infty$, $j \in \mathcal{N}_0 \setminus \{i\}$, $i \in \mathcal{N}$, define
\[
p_{M,k}(\varepsilon; i, \theta_i; j, \theta_j) = P_{k,i,\theta_i}\left\{ \frac{1}{M} \max_{1 \le n \le M} \lambda_{i,\theta_i;\, j,\theta_j}(k, k+n) \ge (1 + \varepsilon)\, I_{ij}(\theta_i, \theta_j) \right\},
\]
\[
\Upsilon_r(\varkappa, \varepsilon; i, \theta_i) = \sum_{n=1}^{\infty} n^{r-1} \sup_{k \in \mathbb{Z}_+} P_{k,i,\theta_i}\left\{ \frac{1}{n} \inf_{\vartheta \in \Gamma_{\varkappa,\theta_i}} \lambda_{i,\vartheta}(k, k+n) < I_i(\theta_i) - \varepsilon \right\}, \tag{19}
\]
where $I_{i0}(\theta_i, \theta_0) = I_i(\theta_i)$, so that
\[
p_{M,k}(\varepsilon; i, \theta_i; 0, \theta_0) = p_{M,k}(\varepsilon; i, \theta_i) = P_{k,i,\theta_i}\left\{ \frac{1}{M} \max_{1 \le n \le M} \lambda_{i,\theta_i}(k, k+n) \ge (1 + \varepsilon)\, I_i(\theta_i) \right\}.
\]
Regarding the model (1) for the observations, we assume that the following two conditions are satisfied (for the local LLRs in the data streams):

$C_1$. There exist positive and finite numbers $I_i(\theta_i)$, $\theta_i \in \Theta_i$, $i \in \mathcal{N}$, and $I_{ij}(\theta_i, \theta_j)$, $\theta_j \in \Theta_j$, $j \in \mathcal{N} \setminus \{i\}$, $\theta_i \in \Theta_i$, $i \in \mathcal{N}$, such that for any $\varepsilon > 0$
\[
\lim_{M \to \infty} p_{M,k}(\varepsilon; i, \theta_i; j, \theta_j) = 0 \quad \text{for all } k \in \mathbb{Z}_+,\ \theta_i \in \Theta_i,\ \theta_j \in \Theta_j,\ j \in \mathcal{N} \setminus \{i\},\ i \in \mathcal{N}. \tag{20}
\]

$C_2$. For any $\varepsilon > 0$ and some $r \ge 1$,
\[
\lim_{\varkappa \to 0} \Upsilon_r(\varkappa, \varepsilon; i, \theta_i) < \infty \quad \text{for all } \theta_i \in \Theta_i,\ i \in \mathcal{N}. \tag{21}
\]
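Conditions $C_1$ and $C_2$ are, at heart, law-of-large-numbers requirements on the normalized LLRs. A quick numerical illustration (ours) for the i.i.d. Gaussian case $g = \mathcal{N}(0,1)$, $f = \mathcal{N}(\theta, 1)$, where $\lambda_{i,\theta}(k, k+n)/n$ converges to $I_i(\theta) = \theta^2/2$ under the post-change measure:

```python
import numpy as np

rng = np.random.default_rng(4)
theta, n = 1.0, 100000
x = theta + rng.standard_normal(n)         # post-change N(theta, 1) sample
llr_terms = theta * x - 0.5 * theta ** 2   # log f(x)/g(x) for a Gaussian mean shift
# By the SLLN, (1/n) * lambda(k, k+n) = llr_terms.mean() should be close
# to I(theta) = theta^2 / 2 = 0.5, which is what C1-C2 formalize.
running_mean = llr_terms.mean()
```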
Note that condition $C_1$ holds whenever $\lambda_{i,\theta_i;\, j,\theta_j}(k, k+n)/n$ converges almost surely (a.s.) to $I_{ij}(\theta_i, \theta_j)$ under $P_{k,i,\theta_i}$, i.e., for all $\theta_i \in \Theta_i$,
\[
P_{k,i,\theta_i}\left\{ \lim_{n \to \infty} \frac{1}{n}\, \lambda_{i,\theta_i;\, j,\theta_j}(k, k+n) = I_{ij}(\theta_i, \theta_j) \right\} = 1. \tag{22}
\]
Regarding the prior distribution $\pi_k = P(\nu = k)$, we assume that it is fully supported (i.e., $\pi_k > 0$ for all $k \in \mathbb{Z}_+$, $\pi_{-1} < 1$, and $\pi_\infty = 0$) and that the following two conditions are satisfied:

$CP_1$. For some $0 \le \mu < \infty$,
\[
\lim_{n \to \infty} \frac{1}{n} \left| \log \sum_{k=n+1}^{\infty} \pi_k \right| = \mu. \tag{23}
\]

$CP_2$. If $\mu = 0$, then in addition
\[
\sum_{k=0}^{\infty} \pi_k |\log \pi_k|^r < \infty \quad \text{for some } r \ge 1. \tag{24}
\]

The class of prior distributions satisfying conditions $CP_1$ and $CP_2$ will be denoted by $\mathbf{C}(\mu)$. Note that if $\mu > 0$, then the prior distribution has an exponential right tail; in this case, condition (24) holds automatically. For example, the geometric prior $\pi_k = \rho(1-\rho)^k$, $k \in \mathbb{Z}_+$, has the tail $\sum_{k > n} \pi_k = (1-\rho)^{n+1}$ and hence satisfies $CP_1$ with $\mu = |\log(1-\rho)|$. If $\mu = 0$, the distribution has a heavy tail, i.e., belongs to the model with a vanishing hazard rate. However, we cannot allow this distribution to have a too heavy tail, which is guaranteed by condition $CP_2$. A typical heavy-tailed prior distribution that satisfies both condition $CP_1$ with $\mu = 0$ and condition $CP_2$ for all $r \ge 1$ is a discrete Weibull-type distribution with shape parameter $0 < \varkappa < 1$. Constraint (24) is often guaranteed by finiteness of the $r$-th moment, $E[\nu^r] < \infty$.

To obtain the lower bounds for the moments of the detection delay, we need only the right-tail condition (20). However, to establish the asymptotic optimality of the rule $\delta_A$, both the right-tail and left-tail conditions (20) and (21) are needed.

V. UPPER BOUNDS ON THE PROBABILITIES OF FALSE ALARM AND MISIDENTIFICATION OF THE DETECTION–IDENTIFICATION RULE $\delta_A$

Let $\widetilde P^{\pi,n}_{i,\theta_i}(\mathcal{A}) = P^{\pi}_{i,\theta_i}(\mathcal{A},\, \nu < n)$ denote the measure $P^{\pi}_{i,\theta_i}$ restricted to the event $\{\nu < n\}$.
The distribution $\widetilde P^{\pi,n}_{i,\theta_i}(\mathbf{X}^n \in \cdot\,)$ has density
\[
f^{\pi,n}_{i,\theta_i}(\mathbf{X}^n) = \sum_{k=-1}^{n-1} \left[ \pi_k \prod_{t=1}^{k} g_i(X_t(i) \mid \mathbf{X}^{t-1}(i)) \prod_{t=k+1}^{n} f_{i,\theta_i}(X_t(i) \mid \mathbf{X}^{t-1}(i)) \right] \times \prod_{j \in \mathcal{N} \setminus \{i\}} \prod_{t=1}^{n} g_j(X_t(j) \mid \mathbf{X}^{t-1}(j)),
\]
where $\prod_{t=1}^{0} g_i(X_t(i) \mid \mathbf{X}^{t-1}(i)) = 1$. Write
\[
f^{\pi,n}_{i,W}(\mathbf{X}^n) = \int_{\Theta_i} f^{\pi,n}_{i,\theta_i}(\mathbf{X}^n) \, \mathrm{d}W_i(\theta_i).
\]
Next, define the statistic $\widetilde\Lambda^{\pi,W}_{i,j,\theta_j}(n) = \Lambda^{\pi}_{i,W}(n)/\Lambda^{\pi}_{j,\theta_j}(n)$ and the measure
\[
\widetilde P^{\pi,n}_{\ell,W}(\mathcal{A}) = \int_{\Theta_\ell} \widetilde P^{\pi,n}_{\ell,\theta_\ell}(\mathcal{A}) \, \mathrm{d}W_\ell(\theta_\ell).
\]
Denote by $P|_{\mathcal{F}_n}$ the restriction of a measure $P$ to the sigma-algebra $\mathcal{F}_n$. Obviously,
\[
\widetilde\Lambda^{\pi,W}_{i,j,\theta_j}(n) = \left.\frac{\mathrm{d}\widetilde P^{\pi,n}_{i,W}}{\mathrm{d}\widetilde P^{\pi,n}_{j,\theta_j}}\right|_{\mathcal{F}_n}, \quad i \ne j,
\]
and hence the statistic $\widetilde\Lambda^{\pi,W}_{i,j,\theta_j}(n)$ is a $(\widetilde P^{\pi,n}_{j,\theta_j}, \mathcal{F}_n)$-martingale with unit expectation for all $\theta_j \in \Theta_j$. Therefore, by the Wald–Doob identity, for any stopping time $T$ and all $\theta_j \in \Theta_j$,
\[
\widetilde E^{\pi}_{i,\theta_i}\bigl[\widetilde\Lambda^{\pi,W}_{j,i,\theta_i}(T)\, 1\!\mathrm{l}\{\mathcal{A}, T < \infty\}\bigr] = \widetilde E^{\pi}_{j,W}\bigl[1\!\mathrm{l}\{\mathcal{A}, T < \infty\}\bigr] = \widetilde P^{\pi,T}_{j,W}(\mathcal{A} \cap \{T < \infty\}), \tag{25}
\]
where $\widetilde E^{\pi}_{j,W}$ and $\widetilde E^{\pi}_{j,\theta_j}$ stand for the operators of expectation under $\widetilde P^{\pi,T}_{j,W}$ and $\widetilde P^{\pi,T}_{j,\theta_j}$, respectively.

The following theorem establishes the upper bounds for the PFA and PMI of the proposed detection–identification rule $\delta_A$. Note that these bounds are valid in the most general case: neither the conditions on the model ($C_1$, $C_2$) nor those on the prior distribution ($CP_1$, $CP_2$) are required.

Theorem 1.
Let $\delta_A$ be the changepoint detection–identification rule defined in (7)–(8). The following upper bounds for the PFA and PMI of the rule $\delta_A$ hold:
\[
\mathrm{PFA}^{\pi}_i(\delta_A) \le (1 + A_i)^{-1}, \quad i \in \mathcal{N}, \tag{26}
\]
\[
\mathrm{PFA}_\pi(\delta_A) \le \sum_{i=1}^{N} (1 + A_i)^{-1}, \tag{27}
\]
and
\[
\mathrm{PMI}^{\pi}_{ij}(\delta_A) \le \frac{1 + A_i}{A_i A_{ji}}, \quad i, j \in \mathcal{N},\ i \ne j, \tag{28}
\]
\[
\mathrm{PMI}^{\pi}_i(\delta_A) \le \frac{1 + A_i}{A_i} \sum_{j \in \mathcal{N} \setminus \{i\}} A_{ji}^{-1}, \quad i \in \mathcal{N}. \tag{29}
\]
Thus, if $\alpha_{\max} < 1 - \pi_{-1}$, then
\[
A_i = \frac{1 - \alpha_i}{\alpha_i} \ \text{ and } \ A_{ij} = \frac{1}{(1 - \alpha_j)\beta_{ji}} \ \text{ imply } \ \delta_A \in \mathcal{C}_\pi(\boldsymbol{\alpha}, \boldsymbol{\beta}), \tag{30}
\]
and if $A_i = A$ for $i \in \mathcal{N}$ and $A_{ij} = A_j$ for $j \in \mathcal{N} \setminus \{i\}$, then
\[
A = \frac{N(1 - \alpha/N)}{\alpha} \ \text{ and } \ A_j = \frac{N - 1}{(1 - \alpha/N)\bar\beta_j} \ \text{ imply } \ \delta_A \in \mathcal{C}^{\star}_\pi(\alpha, \bar{\boldsymbol{\beta}}). \tag{31}
\]
Proof:
Using the Bayes rule, the notation (2)–(6), and the fact that $\mathrm{LR}_{i,W}(k, n) = 1$ for $k \ge n$, we obtain
\[
P(\nu = k \mid \mathcal{F}_n) = \frac{\pi_k \mathrm{LR}_{i,W}(k, n)}{\sum_{j=-1}^{\infty} \pi_j \mathrm{LR}_{i,W}(j, n)} = \frac{\pi_k \mathrm{LR}_{i,W}(k, n)}{\sum_{j=-1}^{n-1} \pi_j \mathrm{LR}_{i,W}(j, n) + P(\nu \ge n)},
\]
so that
\[
P(\nu \ge n \mid \mathcal{F}_n) = \sum_{k=n}^{\infty} P(\nu = k \mid \mathcal{F}_n) = \frac{P(\nu \ge n)}{\Lambda^{\pi}_{i,W}(n) + P(\nu \ge n)} = \frac{1}{\bar\Lambda^{\pi,W}_{i0}(n) + 1}.
\]
Next, obviously,
\[
\mathrm{PFA}^{\pi}_i(\delta_A) = P^{\pi}_{i,\theta_i}\bigl(T^{(i)}_A \le \nu,\ T_A = T^{(i)}_A\bigr) \le P^{\pi}_{i,\theta_i}\bigl(T^{(i)}_A \le \nu\bigr).
\]
Therefore, taking into account that $P^{\pi}_{i,\theta_i}(T^{(i)}_A \le \nu) = E^{\pi}_{i,\theta_i}[P(T^{(i)}_A \le \nu \mid \mathcal{F}_{T^{(i)}_A})]$ and that $\bar\Lambda^{\pi,W}_{i0}(T^{(i)}_A) \ge A_i$ on $\{T^{(i)}_A < \infty\}$, we obtain
\[
\mathrm{PFA}^{\pi}_i(\delta_A) \le E^{\pi}_{i,\theta_i}\bigl[(1 + \bar\Lambda^{\pi,W}_{i0}(T^{(i)}_A))^{-1};\ T^{(i)}_A < \infty\bigr] \le \frac{1}{1 + A_i},
\]
and the inequalities (26) follow. Inequality (27) follows immediately from the fact that $\mathrm{PFA}_\pi(\delta) = \sum_{i=1}^{N} \mathrm{PFA}^{\pi}_i(\delta)$.

To prove the upper bound (28), note that $\widetilde\Lambda^{\pi,W}_{j,i,\theta_i}(n) \ge \bar\Lambda^{\pi,W}_{ji}(n)$ for all $n \ge 1$ and $\theta_i \in \Theta_i$, and that $\bar\Lambda^{\pi,W}_{ji}(T^{(j)}_A) \ge A_{ji}$ on $\{T^{(j)}_A < \infty\}$, so that
\[
P^{\pi}_{i,\theta_i}(d_A = j,\ \nu < T_A < \infty) = P^{\pi}_{i,\theta_i}\bigl(T_A = T^{(j)}_A,\ \nu < T^{(j)}_A < \infty\bigr) = \widetilde P^{\pi,T_A}_{i,\theta_i}\bigl(T_A = T^{(j)}_A,\ T^{(j)}_A < \infty\bigr)
\]
\[
= \widetilde E^{\pi}_{i,\theta_i}\left[ \frac{\bar\Lambda^{\pi,W}_{ji}(T^{(j)}_A)}{\bar\Lambda^{\pi,W}_{ji}(T^{(j)}_A)}\, 1\!\mathrm{l}\{T_A = T^{(j)}_A,\ T^{(j)}_A < \infty\} \right]
\le A_{ji}^{-1}\, \widetilde E^{\pi}_{i,\theta_i}\bigl[\bar\Lambda^{\pi,W}_{ji}(T^{(j)}_A)\, 1\!\mathrm{l}\{T_A = T^{(j)}_A,\ T^{(j)}_A < \infty\}\bigr]
\]
\[
\le A_{ji}^{-1}\, \widetilde E^{\pi}_{i,\theta_i}\bigl[\widetilde\Lambda^{\pi,W}_{j,i,\theta_i}(T^{(j)}_A)\, 1\!\mathrm{l}\{T_A = T^{(j)}_A,\ T^{(j)}_A < \infty\}\bigr] \quad \forall\, \theta_i \in \Theta_i,
\]
where, by equality (25), the last term is equal to $A_{ji}^{-1} \widetilde P^{\pi,T_A}_{j,W}(\{T_A = T^{(j)}_A\} \cap \{T^{(j)}_A < \infty\})$. This yields
\[
P^{\pi}_{i,\theta_i}(d_A = j,\ \nu < T_A < \infty) \le A_{ji}^{-1}\, \widetilde P^{\pi,T_A}_{j,W}\bigl(\{T_A = T^{(j)}_A\} \cap \{T^{(j)}_A < \infty\}\bigr) \le A_{ji}^{-1} \quad \text{for all } \theta_i \in \Theta_i.
\]
Since $P^{\pi}_{i,\theta_i}(d_A = j \mid T_A > \nu) = P^{\pi}_{i,\theta_i}(d_A = j,\ \nu < T_A < \infty)/P^{\pi}_{i,\theta_i}(T_A > \nu)$, the bound (28) follows, and summing (28) over $j \in \mathcal{N} \setminus \{i\}$ yields (29).
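The change-of-measure step behind (25) rests on the elementary fact that a likelihood ratio has unit expectation under the null measure. A Monte Carlo sanity check (ours; a single-stream Gaussian LR with parameters chosen to keep the LR variance moderate, so the sample mean is stable):

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 0.5, 10, 200000
x = rng.standard_normal((reps, n))      # data generated under P_infty (no change)
# LR(0, n) = exp(sum_t (theta*x_t - theta^2/2)); by change of measure,
# E_infty[LR(0, n)] = 1 for every n.
lr = np.exp((theta * x - 0.5 * theta ** 2).sum(axis=1))
lr_mean = lr.mean()
```

Note that for larger `theta` or `n` the LR variance explodes and a naive Monte Carlo mean becomes unreliable, which is one reason the paper works with bounds rather than simulation in the general case.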
Remark 1. Typically, the upper bounds (26)–(29) for the PFA and PMI are not tight but rather conservative, especially when the overshoots over the thresholds are large (i.e., when the hypotheses $H_i$ and $H_\infty$ are not close). Unfortunately, in the general non-i.i.d. case, an improvement of these bounds is not possible. In the i.i.d. case, where the observations are independent and identically distributed with common pre-change density $g_i(x)$ and common post-change density $f_i(x)$ in the $i$-th stream (i.e., when the post-change hypotheses are simple), it is possible to obtain asymptotically accurate approximations using renewal theory, similarly to how it was done in [20, Th. 7.1.5, p. 327] for the PFA in the single-stream case.

VI. LOWER BOUNDS ON THE MOMENTS OF THE DETECTION DELAY IN THE CLASSES $\mathcal{C}_\pi(\boldsymbol{\alpha}, \boldsymbol{\beta})$ AND $\mathcal{C}^{\star}_\pi(\alpha, \bar{\boldsymbol{\beta}})$

For $i \in \mathcal{N}$, define
\[
\Psi_i(\boldsymbol{\alpha}, \boldsymbol{\beta}) = \max\left\{ \frac{|\log \alpha_i|}{I_i(\theta_i) + \mu},\ \max_{j \in \mathcal{N} \setminus \{i\}} \frac{|\log \beta_{ji}|}{\inf_{\theta_j \in \Theta_j} I_{ij}(\theta_i, \theta_j)} \right\} \tag{32}
\]
and
\[
\Psi^{\star}_i(\alpha, \bar{\boldsymbol{\beta}}) = \max\left\{ \frac{|\log \alpha|}{I_i(\theta_i) + \mu},\ \max_{j \in \mathcal{N} \setminus \{i\}} \frac{|\log \bar\beta_j|}{\inf_{\theta_j \in \Theta_j} I_{ij}(\theta_i, \theta_j)} \right\}. \tag{33}
\]
The following theorem establishes asymptotic lower bounds on the moments of the detection delay $R^{r}_{k,i,\theta_i}(\delta)$ and $\bar R^{r}_{i,\theta_i}(\delta)$ ($r \ge 1$) in the classes of detection–identification rules $\mathcal{C}_\pi(\boldsymbol{\alpha}, \boldsymbol{\beta})$ and $\mathcal{C}^{\star}_\pi(\alpha, \bar{\boldsymbol{\beta}})$ defined in (14) and (17), respectively. These bounds will be used in the next section for proving the asymptotic optimality of the detection–identification rule $\delta_A$ with suitable thresholds.

Theorem 2.
Let, for some $\mu \ge 0$, the prior distribution belong to the class $\mathbf{C}(\mu)$. Assume that for some positive and finite numbers $I_i(\theta_i)$ ($\theta_i \in \Theta_i$, $i \in \mathcal{N}$) and $I_{ij}(\theta_i, \theta_j)$ ($\theta_i \in \Theta_i$, $\theta_j \in \Theta_j$, $i \in \mathcal{N}$, $j \in \mathcal{N} \setminus \{i\}$) condition $C_1$ holds. If $\inf_{\theta_j \in \Theta_j} I_{ij}(\theta_i, \theta_j) > 0$ for all $j \in \mathcal{N} \setminus \{i\}$, then for all $r \ge 1$, $\theta_i \in \Theta_i$, and $i \in \mathcal{N}$,
\[
\liminf_{\alpha_{\max}, \beta_{\max} \to 0}\ \inf_{\delta \in \mathcal{C}_\pi(\boldsymbol{\alpha}, \boldsymbol{\beta})} \frac{R^{r}_{k,i,\theta_i}(\delta)}{[\Psi_i(\boldsymbol{\alpha}, \boldsymbol{\beta})]^r} \ge 1 \quad \text{for all } k \in \mathbb{Z}, \tag{34}
\]
\[
\liminf_{\alpha_{\max}, \beta_{\max} \to 0}\ \inf_{\delta \in \mathcal{C}_\pi(\boldsymbol{\alpha}, \boldsymbol{\beta})} \frac{\bar R^{r}_{i,\theta_i}(\delta)}{[\Psi_i(\boldsymbol{\alpha}, \boldsymbol{\beta})]^r} \ge 1, \tag{35}
\]
and
\[
\liminf_{\alpha_{\max}, \beta_{\max} \to 0}\ \inf_{\delta \in \mathcal{C}^{\star}_\pi(\alpha, \bar{\boldsymbol{\beta}})} \frac{R^{r}_{k,i,\theta_i}(\delta)}{[\Psi^{\star}_i(\alpha, \bar{\boldsymbol{\beta}})]^r} \ge 1 \quad \text{for all } k \in \mathbb{Z}, \tag{36}
\]
\[
\liminf_{\alpha_{\max}, \beta_{\max} \to 0}\ \inf_{\delta \in \mathcal{C}^{\star}_\pi(\alpha, \bar{\boldsymbol{\beta}})} \frac{\bar R^{r}_{i,\theta_i}(\delta)}{[\Psi^{\star}_i(\alpha, \bar{\boldsymbol{\beta}})]^r} \ge 1, \tag{37}
\]
where $\Psi_i(\boldsymbol{\alpha}, \boldsymbol{\beta})$ and $\Psi^{\star}_i(\alpha, \bar{\boldsymbol{\beta}})$ are defined in (32) and (33), respectively.

Proof: We only provide the proof of the asymptotic lower bounds (34) and (35). The proof of (36) and (37) is essentially similar.
Notice that the proof can be split into two parts: if we show that, on the one hand, for any rule $\delta \in \mathcal{C}_\pi(\boldsymbol{\alpha}, \boldsymbol{\beta})$, as $\alpha_{\max} \to 0$ and $\beta_{\max} \to 0$,
\[
R^{r}_{k,i,\theta_i}(\delta) \ge \left[ \max_{j \in \mathcal{N} \setminus \{i\}} \frac{|\log \beta_{ji}|}{\inf_{\theta_j \in \Theta_j} I_{ij}(\theta_i, \theta_j)} \right]^r (1 + o(1)) \quad \forall\, k \in \mathbb{Z} \tag{38}
\]
and
\[
\bar R^{r}_{i,\theta_i}(\delta) \ge \left[ \max_{j \in \mathcal{N} \setminus \{i\}} \frac{|\log \beta_{ji}|}{\inf_{\theta_j \in \Theta_j} I_{ij}(\theta_i, \theta_j)} \right]^r (1 + o(1)), \tag{39}
\]
and, on the other hand,
\[
R^{r}_{k,i,\theta_i}(\delta) \ge \left( \frac{|\log \alpha_i|}{I_i(\theta_i) + \mu} \right)^r (1 + o(1)) \quad \forall\, k \in \mathbb{Z} \tag{40}
\]
and
\[
\bar R^{r}_{i,\theta_i}(\delta) \ge \left( \frac{|\log \alpha_i|}{I_i(\theta_i) + \mu} \right)^r (1 + o(1)), \tag{41}
\]
where $o(1) \to 0$, then, obviously, combining inequalities (38) and (40) yields (34), and combining (39) and (41) yields (35). The detailed proof of inequalities (38)–(41) is postponed to the Appendix.

VII. ASYMPTOTIC OPTIMALITY
The following proposition, whose proof is given in the Appendix, establishes first-order asymptotic approximations to the moments of the detection delay of the detection–identification rule $\delta_A$ when the thresholds $A_{ij}$ go to infinity, regardless of the PFA and PMI constraints. Write $A_{\min} = \min_{i \in \mathcal{N},\, j \in \mathcal{N}_0 \setminus \{i\}} A_{ij}$.

Proposition 1.
Let $r \ge 1$ and let the prior distribution of the change point belong to the class $\mathbf{C}(\mu)$. Assume that for some $0 < I_i(\theta_i) < \infty$, $\theta_i \in \Theta_i$, $i \in \mathcal{N}$, and $0 < I_{ij}(\theta_i, \theta_j) < \infty$, $\theta_i \in \Theta_i$, $\theta_j \in \Theta_j$, $i \in \mathcal{N}$, $j \in \mathcal{N} \setminus \{i\}$, the right-tail and left-tail conditions $C_1$ and $C_2$ are satisfied, and that $\inf_{\theta_j \in \Theta_j} I_{ij}(\theta_i, \theta_j) > 0$ for all $j \in \mathcal{N} \setminus \{i\}$, $i \in \mathcal{N}$. Then, for all $0 < m \le r$, $\theta_i \in \Theta_i$, and $i \in \mathcal{N}$, as $A_{\min} \to \infty$,
\[
R^{m}_{k,i,\theta_i}(\delta_A) \sim [\Psi_i(A, \theta_i, \mu)]^m \quad \text{for all } k \in \mathbb{Z} \tag{42}
\]
and
\[
\bar R^{m}_{i,\theta_i}(\delta_A) \sim [\Psi_i(A, \theta_i, \mu)]^m, \tag{43}
\]
where
\[
\Psi_i(A, \theta_i, \mu) = \max\left\{ \frac{\log A_i}{I_i(\theta_i) + \mu},\ \max_{j \in \mathcal{N} \setminus \{i\}} \frac{\log A_{ij}}{\inf_{\theta_j \in \Theta_j} I_{ij}(\theta_i, \theta_j)} \right\}. \tag{44}
\]
Hereafter we use the standard notation $x_a \sim y_a$ as $a \to a_0$ if $\lim_{a \to a_0}(x_a/y_a) = 1$.

In order to prove this proposition we need the following lemma, whose proof is given in the Appendix. For $i = 1, \dots, N$, define
\[
\lambda_{i,W}(k, k+n) = \log \mathrm{LR}_{i,W}(k, k+n), \qquad \lambda^{\pi}_i(n) = \log\left[ \sum_{k=-1}^{n-1} \pi_k \sup_{\theta_i \in \Theta_i} \mathrm{LR}_{i,\theta_i}(k, n) \right],
\]
\[
\widetilde\Psi_i(A, \pi_k, \theta_i, \mu, \varepsilon) = \max\left\{ \frac{\log(A_i/\pi_k)}{I_i(\theta_i) + \mu - \varepsilon},\ \max_{j \in \mathcal{N} \setminus \{i\}} \frac{\log(A_{ij}/\pi_k)}{\inf_{\theta_j \in \Theta_j} I_{ij}(\theta_i, \theta_j) - \varepsilon} \right\}, \tag{45}
\]
\[
M_i(A) = M_i(A, \pi_k, \theta_i, \mu, \varepsilon) = 1 + \bigl\lfloor \widetilde\Psi_i(A, \pi_k, \theta_i, \mu, \varepsilon) \bigr\rfloor,
\]
where $\lfloor y \rfloor$ is the greatest integer less than or equal to $y$.

Lemma 1.
Let $r \ge 1$ and let the prior distribution of the change point satisfy condition (23). Then, for a sufficiently large $A_{\min}$, any $0 < \varepsilon < J_{ij}(\theta_i, \mu)$, and all $k \in \mathbb{Z}$,
\[
E_{k,i,\theta_i}\bigl[(T_A - k)^+\bigr]^r \le \bigl[\widetilde\Psi_i(A, \pi_k, \theta_i, \mu, \varepsilon)\bigr]^r + r\, 2^{r-1} \sum_{n=M_i(A)}^{\infty} n^{r-1}\, P_{k,i,\theta_i}\left\{ \frac{1}{n} \inf_{\vartheta \in \Gamma_{\varkappa,\theta_i}} \lambda_{i,\vartheta}(k, k+n) < I_i(\theta_i) - \varepsilon \right\}, \tag{46}
\]
where $T_A - k = T_A$ for $k = -1$, $x^+ = \max(0, x)$, and
\[
J_{ij}(\theta_i, \mu) = \min\Bigl\{ I_i(\theta_i) + \mu,\ \min_{j \in \mathcal{N} \setminus \{i\}} \inf_{\theta_j \in \Theta_j} I_{ij}(\theta_i, \theta_j) \Bigr\}.
\]
Theorems 1 and 2 and Proposition 1 allow us to conclude that the detection–identification rule $\delta_A$ is asymptotically first-order optimal in the classes $\mathcal{C}_\pi(\boldsymbol{\alpha}, \boldsymbol{\beta})$ and $\mathcal{C}^{\star}_\pi(\alpha, \bar{\boldsymbol{\beta}})$ as $\alpha_{\max}, \beta_{\max} \to 0$.

Theorem 3.
Let $r \ge 1$ and let the prior distribution of the change point belong to the class $\mathbf{C}(\mu)$. Assume that for some $0 < I_i(\theta_i) < \infty$, $\theta_i \in \Theta_i$, $i \in \mathcal{N}$, and $0 < I_{ij}(\theta_i, \theta_j) < \infty$, $\theta_i \in \Theta_i$, $\theta_j \in \Theta_j$, $i \in \mathcal{N}$, $j \in \mathcal{N} \setminus \{i\}$, the right-tail and left-tail conditions $C_1$ and $C_2$ are satisfied, and that $\inf_{\theta_j \in \Theta_j} I_{ij}(\theta_i, \theta_j) > 0$ for all $j \in \mathcal{N} \setminus \{i\}$, $i \in \mathcal{N}$.

(i) If the thresholds $A_i$, $i \in \mathcal{N}$, and $A_{ij}$, $j \in \mathcal{N} \setminus \{i\}$, $i \in \mathcal{N}$, are selected so that $\mathrm{PFA}^{\pi}_i(\delta_A) \le \alpha_i$, $\mathrm{PMI}^{\pi}_{ij}(\delta_A) \le \beta_{ij}$, and $\log A_i \sim |\log \alpha_i|$, $\log A_{ij} \sim |\log \beta_{ji}|$ as $\alpha_{\max}, \beta_{\max} \to 0$, in particular as $A_i = (1 - \alpha_i)/\alpha_i$ and $A_{ij} = [(1 - \alpha_j)\beta_{ji}]^{-1}$, then $\delta_A$ is first-order asymptotically optimal as $\alpha_{\max}, \beta_{\max} \to 0$ in the class $\mathcal{C}_\pi(\boldsymbol{\alpha}, \boldsymbol{\beta})$, minimizing the moments of the detection delay up to order $r$: for all $0 < m \le r$, $\theta_i \in \Theta_i$, and $i \in \mathcal{N}$,
\[
\inf_{\delta \in \mathcal{C}_\pi(\boldsymbol{\alpha}, \boldsymbol{\beta})} R^{m}_{k,i,\theta_i}(\delta) \sim R^{m}_{k,i,\theta_i}(\delta_A) \sim \max\left\{ \frac{|\log \alpha_i|}{I_i(\theta_i) + \mu},\ \max_{j \in \mathcal{N} \setminus \{i\}} \frac{|\log \beta_{ji}|}{\inf_{\theta_j \in \Theta_j} I_{ij}(\theta_i, \theta_j)} \right\}^m
\]
as $\alpha_{\max}, \beta_{\max} \to 0$ for all $k \in \mathbb{Z}$ (47), and
\[
\inf_{\delta \in \mathcal{C}_\pi(\boldsymbol{\alpha}, \boldsymbol{\beta})} \bar R^{m}_{i,\theta_i}(\delta) \sim \bar R^{m}_{i,\theta_i}(\delta_A) \sim \max\left\{ \frac{|\log \alpha_i|}{I_i(\theta_i) + \mu},\ \max_{j \in \mathcal{N} \setminus \{i\}} \frac{|\log \beta_{ji}|}{\inf_{\theta_j \in \Theta_j} I_{ij}(\theta_i, \theta_j)} \right\}^m
\]
as $\alpha_{\max}, \beta_{\max} \to 0$ (48).

(ii) If the thresholds $A_i = A$ and $A_{ij} = A_j$, $j \in \mathcal{N} \setminus \{i\}$, $i \in \mathcal{N}$, are selected so that $\mathrm{PFA}_\pi(\delta_A) \le \alpha$, $\mathrm{PMI}^{\pi}_i(\delta_A) \le \bar\beta_i$, and $\log A \sim |\log \alpha|$, $\log A_j \sim |\log \bar\beta_j|$ as $\alpha, \bar\beta_{\max} \to 0$, in particular as $A = N(1 - \alpha/N)/\alpha$ and $A_j = [(N-1)^{-1}(1 - \alpha/N)\bar\beta_j]^{-1}$, then $\delta_A$ is first-order asymptotically optimal as $\alpha, \bar\beta_{\max} \to 0$ in the class $\mathcal{C}^{\star}_\pi(\alpha, \bar{\boldsymbol{\beta}})$, minimizing the moments of the detection delay up to order $r$: for all $0 < m \le r$, $\theta_i \in \Theta_i$, and $i \in \mathcal{N}$,
\[
\inf_{\delta \in \mathcal{C}^{\star}_\pi(\alpha, \bar{\boldsymbol{\beta}})} R^{m}_{k,i,\theta_i}(\delta) \sim R^{m}_{k,i,\theta_i}(\delta_A) \sim \max\left\{ \frac{|\log \alpha|}{I_i(\theta_i) + \mu},\ \max_{j \in \mathcal{N} \setminus \{i\}} \frac{|\log \bar\beta_j|}{\inf_{\theta_j \in \Theta_j} I_{ij}(\theta_i, \theta_j)} \right\}^m
\]
as $\alpha, \bar\beta_{\max} \to 0$ for all $k \in \mathbb{Z}$ (49), and
\[
\inf_{\delta \in \mathcal{C}^{\star}_\pi(\alpha, \bar{\boldsymbol{\beta}})} \bar R^{m}_{i,\theta_i}(\delta) \sim \bar R^{m}_{i,\theta_i}(\delta_A) \sim \max\left\{ \frac{|\log \alpha|}{I_i(\theta_i) + \mu},\ \max_{j \in \mathcal{N} \setminus \{i\}} \frac{|\log \bar\beta_j|}{\inf_{\theta_j \in \Theta_j} I_{ij}(\theta_i, \theta_j)} \right\}^m
\]
as $\alpha, \bar\beta_{\max} \to 0$ (50).

Proof:
Proof of (i). Setting $\log A_i\sim|\log\alpha_i|$ and $\log A_{ij}\sim|\log\beta_{ji}|$ in (42) yields, as $\alpha_{\max},\beta_{\max}\to0$,
\[
\mathcal{R}^m_{k,i,\theta_i}(\delta_A) \sim \max\left\{\frac{|\log\alpha_i|}{I_i(\theta_i)+\mu},\ \max_{j\in\mathcal{N}\setminus\{i\}}\frac{|\log\beta_{ji}|}{\inf_{\theta_j\in\Theta_j}I_{ij}(\theta_i,\theta_j)}\right\}^m, \quad i\in\mathcal{N}. \quad (51)
\]
In particular, $\log A_i\sim|\log\alpha_i|$ and $\log A_{ij}\sim|\log\beta_{ji}|$ if $A_i=(1-\alpha_i)/\alpha_i$ and $A_{ij}=[(1-\alpha_j)\beta_{ji}]^{-1}$, and, by Theorem 1, $\mathrm{PFA}^\pi_i(\delta_A)\le\alpha_i$ and $\mathrm{PMI}_{ij}(\delta_A)\le\beta_{ij}$ with this choice of thresholds (see (30)). Comparing the asymptotic approximations (51) with the lower bounds (34) in Theorem 2 completes the proof of (47). The proof of (48) is similar.

Proof of (ii). Setting $\log A_i=\log A\sim|\log\alpha|$ and $\log A_{ij}=\log A_j\sim|\log\bar\beta_j|$ in (42) yields, as $\alpha_{\max},\bar\beta_{\max}\to0$,
\[
\mathcal{R}^m_{k,i,\theta_i}(\delta_A) \sim \max\left\{\frac{|\log\alpha|}{I_i(\theta_i)+\mu},\ \max_{j\in\mathcal{N}\setminus\{i\}}\frac{|\log\bar\beta_j|}{\inf_{\theta_j\in\Theta_j}I_{ij}(\theta_i,\theta_j)}\right\}^m, \quad i\in\mathcal{N}. \quad (52)
\]
In particular, $\log A\sim|\log\alpha|$ and $\log A_j\sim|\log\bar\beta_j|$ if $A=N(1-\alpha/N)/\alpha$ and $A_j=[(N-1)(1-\alpha/N)\bar\beta_j]^{-1}$, and, by Theorem 1, $\mathrm{PFA}^\pi(\delta_A)\le\alpha$ and $\mathrm{PMI}_i(\delta_A)\le\bar\beta_i$ with this choice of thresholds (see (31)). Comparing the asymptotic approximations (52) with the lower bounds (36) in Theorem 2 completes the proof of (49). The proof of (50) is similar.

Remark 2.
If the prior distribution $\pi = \pi^{\alpha_{\max},\beta_{\max}}$ depends on the PFA constraint $\alpha_{\max}$ and the PMI constraint $\beta_{\max}$ and $\mu_{\alpha_{\max},\beta_{\max}}\to0$ as $\alpha_{\max},\beta_{\max}\to0$, then a modification of the preceding argument can be used to show that the assertions of Theorem 3 hold with $\mu=0$.

Note that conditions (20) are satisfied if
\[
\frac{1}{n}\lambda_{i,\theta_i;j,\theta_j}(k,k+n) \xrightarrow[n\to\infty]{\mathsf{P}_{k,i,\theta_i}\text{-a.s.}} I_{ij}(\theta_i,\theta_j)
\]
(see Lemma B.1 in [18, p. 243]). Assume also that for some positive and finite numbers $I_{0,i}(\theta_i)$, $i\in\mathcal{N}$,
\[
-\frac{1}{n}\lambda_{i,\theta_i}(k,k+n) \xrightarrow[n\to\infty]{\mathsf{P}_\infty\text{-a.s.}} I_{0,i}(\theta_i).
\]
In particular, in the i.i.d. case, these conditions hold with
\[
I_{ij}(\theta_i,\theta_j) \equiv K_{ij}(\theta_i,\theta_j) = \int \left(\log\frac{f_{i,\theta_i}(x)}{f_{j,\theta_j}(x)}\right) f_{i,\theta_i}(x)\,\mathrm{d}x, \qquad
I_{0,i}(\theta_i) \equiv K_{0,i}(\theta_i) = \int \left(\log\frac{g_i(x)}{f_{i,\theta_i}(x)}\right) g_i(x)\,\mathrm{d}x
\]
being the Kullback–Leibler information numbers. Then $I_{ij}(\theta_i,\theta_j) = I_i(\theta_i) + I_{0,j}(\theta_j) > I_i(\theta_i)$. Therefore, if the prior distribution of the change point is heavy-tailed (i.e., $\mu=0$) and the PFA constraints are smaller than the PMI constraints, $\alpha_i\le\beta_{ji}$, $\alpha\le\bar\beta_j$, which is typical in many applications, then the asymptotics (48) and (50) reduce to
\[
\inf_{\delta\in\mathbb{C}_\pi(\boldsymbol\alpha,\boldsymbol\beta)}\bar{\mathcal{R}}^m_{i,\theta_i}(\delta) \sim \left(\frac{|\log\alpha_i|}{I_i(\theta_i)}\right)^m \sim \bar{\mathcal{R}}^m_{i,\theta_i}(\delta_A) \quad (53)
\]
(as $\alpha_{\max},\beta_{\max}\to0$) and
\[
\inf_{\delta\in\mathbb{C}^\star_\pi(\alpha,\bar{\boldsymbol\beta})}\bar{\mathcal{R}}^m_{i,\theta_i}(\delta) \sim \left(\frac{|\log\alpha|}{I_i(\theta_i)}\right)^m \sim \bar{\mathcal{R}}^m_{i,\theta_i}(\delta_A) \quad (54)
\]
(as $\alpha,\bar\beta_{\max}\to0$).

Consider now the fully Bayesian setting where not only the prior distribution $\pi=\{\pi_k\}_{k\in\mathbb{Z}}$ of the change point $\nu$ is given, but also the prior distribution $p=\{p_i\}_{i\in\mathcal{N}}$ of the hypotheses, $\mathsf{P}(H_i)=p_i$, $i\in\mathcal{N}$, is specified.
Then, in place of the maximal probabilities of misidentification (13), one can consider the following average probabilities of misidentification:
\[
\mathrm{PMI}^{\pi,W}_i(\delta) = \mathsf{P}^{\pi,W}_i(d\ne i, T<\infty\,|\,T>\nu) = \int_{\Theta_i}\mathsf{P}^\pi_{i,\theta_i}(d\ne i, T<\infty\,|\,T>\nu)\,\mathrm{d}W_i(\theta_i), \qquad
\mathrm{PMI}^{\pi,W,p}(\delta) = \sum_{i=1}^N p_i\,\mathrm{PMI}^{\pi,W}_i(\delta),
\]
and the risk associated with the detection delay is measured by $\bar{\mathcal{R}}^r_{\pi,W,p}(\delta) = \mathsf{E}^{\pi,W,p}[(T-\nu)^r\,|\,T>\nu]$ (in place of (10)). Here
\[
\mathsf{P}^{\pi,W}_i(\mathcal{A}\times\mathcal{K}) = \sum_{k\in\mathcal{K}}\pi_k\int_{\Theta_i}\mathsf{P}_{k,i,\theta_i}(\mathcal{A})\,\mathrm{d}W_i(\theta_i), \qquad
\mathsf{P}^{\pi,W,p}(\mathcal{A}\times\mathcal{K}) = \sum_{i=1}^N p_i\,\mathsf{P}^{\pi,W}_i(\mathcal{A}\times\mathcal{K}),
\]
and $\mathsf{E}^{\pi,W,p}$ is the expectation under the measure $\mathsf{P}^{\pi,W,p}$. It follows from Theorem 1 that for the rule $\delta_A$ with $A_i=A$, $A_{ij}=A_j$, $i\in\mathcal{N}$, $j\in\mathcal{N}$, we have
\[
\mathrm{PMI}^{\pi,W}_i(\delta_A) \le \frac{(1+A)(N-1)}{A\,A_i}, \quad i\in\mathcal{N}, \qquad
\mathrm{PMI}^{\pi,W,p}(\delta_A) \le \frac{(1+A)(N-1)}{A}\sum_{i=1}^N\frac{p_i}{A_i}.
\]
Introduce the class of detection–identification rules
\[
\bar{\mathbb{C}}_{\pi,W,p}(\alpha,\beta) = \Big\{\delta: \mathrm{PFA}^\pi(\delta)\le\alpha \ \text{and}\ \mathrm{PMI}^{\pi,W,p}(\delta)\le\beta\Big\}
\]
for which the weighted probability of false alarm does not exceed $\alpha\in(0,1)$ and the average probability of misidentification does not exceed $\beta\in(0,1)$. Note that $\delta_A\in\bar{\mathbb{C}}_{\pi,W,p}(\alpha,\beta)$ whenever $A=N(1-\alpha/N)/\alpha$ and $A_{ij}=A_0=[(N-1)(1-\alpha/N)\beta]^{-1}$. Using Theorem 3, it is easy to prove that the rule $\delta_A$ is first-order asymptotically optimal in the fully Bayesian setting in class $\bar{\mathbb{C}}_{\pi,W,p}(\alpha,\beta)$. Specifically, the following theorem holds.

Theorem 4.
Let $r\ge1$, let the prior distribution of the change point belong to class $\mathbf{C}(\mu)$, and let $p=\{p_i\}_{i\in\mathcal{N}}$ be the prior distribution of the hypotheses that the change occurs in the $i$th data stream. Assume that for some $0<I_i(\theta_i)<\infty$, $\theta_i\in\Theta_i$, $i\in\mathcal{N}$, and $0<I_{ij}(\theta_i,\theta_j)<\infty$, $\theta_i\in\Theta_i$, $\theta_j\in\Theta_j$, $i\in\mathcal{N}$, $j\in\mathcal{N}\setminus\{i\}$, the right-tail and left-tail conditions $\mathbf{C}_1$ and $\mathbf{C}_2$ are satisfied, and that $\inf_{\theta_j\in\Theta_j}I_{ij}(\theta_i,\theta_j)>0$ for all $j\in\mathcal{N}\setminus\{i\}$, $i\in\mathcal{N}$. If the thresholds $A_i=A$ and $A_{ij}=A_0$, $j\in\mathcal{N}\setminus\{i\}$, $i\in\mathcal{N}$, in rule $\delta_A$ are selected so that $\mathrm{PFA}^\pi(\delta_A)\le\alpha$, $\mathrm{PMI}^{\pi,W,p}(\delta_A)\le\beta$ and $\log A\sim|\log\alpha|$, $\log A_0\sim|\log\beta|$ as $\alpha,\beta\to0$, in particular as $A=N(1-\alpha/N)/\alpha$ and $A_0=[(N-1)(1-\alpha/N)\beta]^{-1}$, then $\delta_A$ is first-order asymptotically optimal as $\alpha,\beta\to0$ in class $\bar{\mathbb{C}}_{\pi,W,p}(\alpha,\beta)$, minimizing moments of the detection delay up to order $r$: for all $0<m\le r$,
\[
\inf_{\delta\in\bar{\mathbb{C}}_{\pi,W,p}(\alpha,\beta)}\bar{\mathcal{R}}^m_{\pi,W,p}(\delta) \sim \bar{\mathcal{R}}^m_{\pi,W,p}(\delta_A) \sim \max\{\gamma(p,W,\mu)|\log\alpha|,\ \gamma(p,W)|\log\beta|\}^m \quad \text{as } \alpha,\beta\to0, \quad (55)
\]
where
\[
\gamma(p,W,\mu) = \sum_{i=1}^N p_i\int_{\Theta_i}\frac{\mathrm{d}W_i(\theta_i)}{I_i(\theta_i)+\mu}, \qquad
\gamma(p,W) = \sum_{i=1}^N p_i\int_{\Theta_i}\frac{\mathrm{d}W_i(\theta_i)}{\min_{j\in\mathcal{N}\setminus\{i\}}\inf_{\theta_j\in\Theta_j}I_{ij}(\theta_i,\theta_j)}.
\]

Remark 3.
First-order approximations (47)–(50) and (55) for the moments of the detection delay are usually not accurate. In the general non-i.i.d. case it is difficult, if at all possible, to obtain more accurate higher-order approximations. Higher-order approximations for the expected detection delays ($m=1$) can be obtained in the i.i.d. case using nonlinear renewal theory and the techniques developed in [3, Th 3.3] and [20, Th 4.3.4, Th 7.1.5].

VIII. AN EXAMPLE: DETECTION OF SIGNALS WITH UNKNOWN AMPLITUDES
Suppose there is an $N$-channel sensor system and we are able to observe the output vector $X_n=(X_n(1),\dots,X_n(N))$, $n=1,2,\dots$. The observations $X_n(i)$ in the $i$th channel have the form
\[
X_n(i) = \theta_i S_n(i)\,\mathbb{1}_{\{n>\nu\}} + \xi_n(i), \quad n\ge1,\ i=1,\dots,N,
\]
where $\theta_i$ is an unknown intensity or amplitude ($\theta_i>0$) of a deterministic signal $S_n(i)$ (e.g., the signal $S_n(i)=\cos(\omega_i n)$) and $\{\xi_n(i)\}_{n\in\mathbb{Z}_+}$, $i\in\mathcal{N}$, are mutually independent noises, which are stable Gaussian AR($p$) processes obeying the recursions
\[
\xi_n(i) = \sum_{t=1}^{p}\varrho_{i,t}\,\xi_{n-t}(i) + w_n(i), \quad n\ge1. \quad (56)
\]
Here $\{w_n(i)\}_{n\ge1}$, $i\in\mathcal{N}$, are mutually independent i.i.d. Gaussian sequences with mean zero and standard deviation $\sigma>0$. The coefficients $\varrho_{i,1},\dots,\varrho_{i,p}$ and the variance $\sigma^2$ are known.

A signal may appear in only one channel and should be detected and isolated quickly, i.e., the number of the channel where the signal appears should be identified along with detection.

Define $\widetilde S_{i,n} = S_n(i) - \sum_{t=1}^{p_n}\varrho_{i,t}S_{n-t}(i)$ and $\widetilde X_{i,n} = X_n(i) - \sum_{t=1}^{p_n}\varrho_{i,t}X_{n-t}(i)$, where $p_n=p$ if $n>p$ and $p_n=n$ if $n\le p$. The LLRs have the form
\[
\lambda_{i,\theta_i}(k,k+n) = \frac{\theta_i}{\sigma^2}\sum_{t=k+1}^{k+n}\widetilde S_{i,t}\widetilde X_{i,t} - \frac{\theta_i^2}{2\sigma^2}\sum_{t=k+1}^{k+n}\widetilde S_{i,t}^2, \qquad
\lambda_{i,\theta_i;j,\theta_j}(k,k+n) = \lambda_{i,\theta_i}(k,k+n) - \lambda_{j,\theta_j}(k,k+n).
\]
Under the measure $\mathsf{P}_{k,i,\vartheta}$, $\vartheta\in\Theta_i$, the LLR $\lambda_{i,\theta_i;j,\theta_j}(k,k+n)$ is a Gaussian process (with independent, non-identically distributed increments) with mean and variance
\[
\mathsf{E}_{k,i,\vartheta}[\lambda_{i,\theta_i;j,\theta_j}(k,k+n)] = \frac{1}{2\sigma^2}\left[(2\theta_i\vartheta-\theta_i^2)\sum_{t=k+1}^{k+n}\widetilde S_{i,t}^2 + \theta_j^2\sum_{t=k+1}^{k+n}\widetilde S_{j,t}^2\right],
\]
\[
\mathrm{Var}_{k,i,\vartheta}[\lambda_{i,\theta_i;j,\theta_j}(k,k+n)] = \frac{1}{\sigma^2}\left[\theta_i^2\sum_{t=k+1}^{k+n}\widetilde S_{i,t}^2 + \theta_j^2\sum_{t=k+1}^{k+n}\widetilde S_{j,t}^2\right]. \quad (57)
\]
Let $\Theta_i=(0,\infty)$, $i\in\mathcal{N}$, and assume that
\[
\lim_{n\to\infty}\frac{1}{n}\sup_{k\in\mathbb{Z}_+}\sum_{t=k+1}^{k+n}\widetilde S_{i,t}^2 = Q_i,
\]
where $0<Q_i<\infty$.
This is typically the case in most signal processing applications, e.g., for the sequence of sine pulses $S_n(i)=\sin(\omega_i n+\phi_i)$ with frequency $\omega_i$ and phase $\phi_i$. Then for all $k\in\mathbb{Z}_+$ and $\theta_i,\theta_j\in(0,\infty)$,
\[
\frac{1}{n}\lambda_{i,\theta_i;j,\theta_j}(k,k+n) \xrightarrow[n\to\infty]{\mathsf{P}_{k,i,\theta_i}\text{-a.s.}} \frac{\theta_i^2Q_i+\theta_j^2Q_j}{2\sigma^2} = I_{ij}(\theta_i,\theta_j), \quad j\in\mathcal{N}\setminus\{i\},\ i\in\mathcal{N},
\]
\[
\frac{1}{n}\lambda_{i,\theta_i}(k,k+n) \xrightarrow[n\to\infty]{\mathsf{P}_{k,i,\theta_i}\text{-a.s.}} \frac{\theta_i^2Q_i}{2\sigma^2} = I_i(\theta_i), \quad i\in\mathcal{N},
\]
so that condition $\mathbf{C}_1$ holds. Furthermore, since all moments of the LLR are finite, condition $\mathbf{C}_2$ holds for all $r>0$. Indeed, using (57), we obtain
\[
I_i(\vartheta,\theta_i) := \lim_{n\to\infty}\frac{1}{n}\mathsf{E}_{k,i,\vartheta}[\lambda_{i,\theta_i}(k,k+n)] = (\vartheta\theta_i-\theta_i^2/2)\,Q_i/\sigma^2,
\]
and for any $\kappa>0$,
\[
\mathsf{P}_{k,i,\theta_i}\left(\frac{1}{n}\inf_{|\vartheta-\theta_i|<\kappa}\lambda_{i,\vartheta}(k,k+n) < I_i(\theta_i)-\varepsilon\right) \le \mathsf{P}_{k,i,\theta_i}\left(\sup_{\vartheta\in[\theta_i-\kappa,\theta_i+\kappa]}\left|\frac{1}{n}\lambda_{i,\vartheta}(k,k+n)-I_i(\vartheta,\theta_i)\right|>\varepsilon\right) = \mathsf{P}_{k,i,\theta_i}\big(|Y_{k,n}(\theta_i)|>\varepsilon\sqrt{n}\big),
\]
where
\[
Y_{k,n}(\theta_i) = \frac{\theta_i}{\sigma\sqrt{n}}\sum_{t=k+1}^{k+n}\widetilde S_{i,t}\,\eta_t(i), \quad n\ge1,
\]
and $\{\eta_t(i)\}_{t\ge1}$ is a sequence of standard zero-mean normal random variables. Hence $\{Y_{k,n}(\theta_i)\}_{n\ge1}$ is a sequence of normal random variables with mean zero and variance $\sigma^2_{i,n}=n^{-1}\sigma^{-2}\theta_i^2\sum_{t=k+1}^{k+n}\widetilde S_{i,t}^2$, which is asymptotic to $\theta_i^2Q_i/\sigma^2$. Thus, for a sufficiently large $n$ there exists $\delta_0>0$ such that $\sigma^2_{i,n}\le\delta_0+\theta_i^2Q_i/\sigma^2$, and we obtain that for all large $n$,
\[
\mathsf{P}_{k,i,\theta_i}\left(\frac{1}{n}\inf_{|\vartheta-\theta_i|<\kappa}\lambda_{i,\vartheta}(k,k+n) < I_i(\theta_i)-\varepsilon\right) \le \mathsf{P}\left(|\hat\eta|>\frac{\varepsilon\sqrt{n}}{\sqrt{\delta_0+\theta_i^2Q_i/\sigma^2}}\right),
\]
where $\hat\eta$ is a standard normal random variable.
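The whitening transformation $\widetilde S_{i,n}$, $\widetilde X_{i,n}$ and the LLR $\lambda_{i,\theta_i}(k,k+n)$ above can be sketched numerically. The following is a minimal illustration only, not the paper's implementation: the function names and the AR(1) example values are ours, and we assume the AR coefficients $\varrho_{i,t}$ and $\sigma$ are known, as in (56).

```python
import numpy as np

def whiten(z, rho):
    """AR(p) whitening: z~_n = z_n - sum_{t=1}^{p_n} rho_t z_{n-t},
    where only p_n = min(p, n) past values are available (0-based arrays)."""
    p = len(rho)
    zt = np.empty(len(z))
    for n in range(len(z)):
        pn = min(p, n)
        zt[n] = z[n] - sum(rho[t] * z[n - 1 - t] for t in range(pn))
    return zt

def llr(theta, S_t, X_t, sigma, k, n):
    """lambda_{i,theta}(k, k+n) for the Gaussian AR model:
    (theta/sigma^2) sum S~ X~  -  (theta^2/(2 sigma^2)) sum S~^2,
    summing over t = k+1, ..., k+n (0-based slice k:k+n)."""
    s, x = S_t[k:k + n], X_t[k:k + n]
    return (theta / sigma**2) * np.dot(s, x) - (theta**2 / (2 * sigma**2)) * np.dot(s, s)

# Illustrative check: with noiseless post-change data X_n = theta * S_n
# (change at k = 0), linearity gives X~ = theta * S~, so the LLR equals
# (theta^2 / (2 sigma^2)) * sum S~^2, i.e., roughly n * I_i(theta).
t = np.arange(1, 201)
S = np.cos(0.3 * t)
theta, sigma, rho = 0.8, 1.0, [0.5]
St, Xt = whiten(S, rho), whiten(theta * S, rho)
print(llr(theta, St, Xt, sigma, 0, 200))
```

In practice, the mixture statistics $\bar\Lambda^{\pi,W}_{ij}(n)$ would average $e^{\lambda_{i,\vartheta}}$ over the mixing distribution $W_i$ and over the change points $k$; the sketch only exhibits the per-$(k,\theta)$ ingredient.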
Therefore,
\[
\Upsilon_r(\kappa,\varepsilon;i,\theta_i) = \sum_{n=1}^{\infty} n^{r-1}\sup_{k\in\mathbb{Z}_+}\mathsf{P}_{k,i,\theta_i}\left(\frac{1}{n}\inf_{|\vartheta-\theta_i|<\kappa}\lambda_{i,\vartheta}(k,k+n) < I_i(\theta_i)-\varepsilon\right) \le \sum_{n=1}^{\infty} n^{r-1}\,\mathsf{P}\left(|\hat\eta|>\frac{\varepsilon\sqrt{n}}{\sqrt{\delta_0+\theta_i^2Q_i/\sigma^2}}\right),
\]
where the right-hand side is finite for all $r>0$ due to the finiteness of all moments of the normal distribution, so that condition $\mathbf{C}_2$ holds for all $r>0$.

Obviously, $\inf_{\theta_j\in(0,\infty)}I_{ij}(\theta_i,\theta_j) = \theta_i^2Q_i/(2\sigma^2) = I_i(\theta_i) > 0$. Therefore, by Theorem 3, the detection–identification rule $\delta_A$ is asymptotically first-order optimal with respect to all positive moments of the detection delay, and the asymptotic formulas (48) and (50) hold with
\[
\inf_{\theta_j\in(0,\infty)}I_{ij}(\theta_i,\theta_j) = I_i(\theta_i) = \frac{\theta_i^2Q_i}{2\sigma^2}.
\]
If $\max_{j\ne i}\beta_{ji}\ge\alpha_i$, $\max_{j\ne i}\bar\beta_j\ge\alpha$, and $\mu=0$, then the asymptotic formulas (53) and (54) hold.

Note that by condition $\mathbf{C}_2$ the rule $\delta_A$ is asymptotically optimal for almost arbitrary mixing distributions $W_i(\theta_i)$. In this example, it is most convenient to select the conjugate prior $W_i(\theta_i)=F(\theta_i/v_i)$, where $F(y)$ is the standard normal distribution function and $v_i>0$, in which case the decision statistics can be computed explicitly.

It is worth noting that this example arises in certain interesting practical applications, e.g., in multichannel/multisensor surveillance systems such as radars, sonars, and electro-optic/infrared sensor systems, which deal with detecting moving and maneuvering targets that appear at unknown times; it is then necessary to detect a signal from a randomly appearing target in clutter and noise with minimal average detection delay, as well as to identify the channel where it appears. See [1], [7], [14], [19]. Another challenging application area where the multichannel model is useful is cyber-security [17], [21], [24]. Malicious intrusion attempts in computer networks (spam campaigns, personal data theft, worms, distributed denial-of-service (DDoS) attacks, etc.)
incur significant financial damage and severely harm the integrity of personal information. It is therefore essential to devise automated techniques for detecting computer network intrusions as quickly as possible so that an appropriate response can be provided and the negative consequences for the users are eliminated. In particular, DDoS attacks typically involve many traffic streams resulting in a large number of packets aimed at congesting the target's server or network.

IX. CONCLUDING REMARKS
1. Since we do not specify a class of models for the observations, such as Gaussian, Markov, or HMM, and build the decision statistics on the LLR processes, we restrict the behavior of the LLRs, which is expressed by conditions $\mathbf{C}_1$ and $\mathbf{C}_2$ related to the law of large numbers for the LLR and to rates of convergence in the law of large numbers. As the example in Section VIII shows, these conditions hold for additive changes (in the mean) of an AR($p$) process driven by Gaussian noise. These conditions also hold in a variety of non-additive examples (detection of changes in the spectrum of time series such as AR($p$) and ARCH($p$) processes) as well as for a large class of homogeneous Markov processes [11], [12], [18, Sec 3.1, Ch 4] and for hidden Markov models with finite hidden state space [4].

2. While we focused on the multistream detection–identification problem (1), it should be noted that similar results also hold in the "scalar" detection–isolation problem, where the observations $\{X_n\}_{n\ge1}$ represent either a scalar process or a vector process all of whose components change at time $\nu$. Specifically, let $\{f_\theta(X_t|\mathbf{X}^{t-1}),\ \theta\in\Theta\}$ be a parametric family of densities and, for $i=1,\dots,N$ and $\Theta_i\subset\Theta$, consider the model
\[
p(\mathbf{X}^n|H_{\nu,i},\theta) = p(\mathbf{X}^n|H_\infty) = \prod_{t=1}^{n} f_{\theta_0}(X_t|\mathbf{X}^{t-1}) \quad\text{for } \nu\ge n,
\]
\[
p(\mathbf{X}^n|H_{\nu,i},\theta) = \prod_{t=1}^{\nu} f_{\theta_0}(X_t|\mathbf{X}^{t-1})\prod_{t=\nu+1}^{n} f_{\theta}(X_t|\mathbf{X}^{t-1}) \quad\text{for } \nu<n,\ \theta\in\Theta_i,
\]
where $\theta_0$ is the known pre-change parameter and $\theta$ is the unknown post-change parameter. In other words, there are $N$ types of change, and for the $i$th type of change the value of the post-change parameter $\theta$ belongs to a subset $\Theta_i$ of the parameter space $\Theta$. It is necessary to detect and isolate a change as rapidly as possible, i.e., to identify which type of change has occurred.
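For the scalar model just described, the basic building block is the likelihood ratio $LR_\theta(k,n)$, a simple product of density ratios. The sketch below computes it for an illustrative i.i.d. Gaussian mean-shift instance $f_\theta=\mathcal{N}(\theta,\sigma^2)$; this instance is our choice for concreteness, since the general model allows arbitrary conditional densities $f_\theta(X_t|\mathbf{X}^{t-1})$.

```python
import math

def lr_theta(x, k, n, theta, theta0=0.0, sigma=1.0):
    """LR_theta(k, n) = prod_{t=k+1}^{n} f_theta(X_t) / f_theta0(X_t)
    for the illustrative i.i.d. Gaussian case f_theta = N(theta, sigma^2).
    x[0] corresponds to X_1, so the product runs over indices k, ..., n-1."""
    log_lr = 0.0
    for t in range(k, n):
        log_lr += ((x[t] - theta0) ** 2 - (x[t] - theta) ** 2) / (2 * sigma ** 2)
    return math.exp(log_lr)

# Example: four observations exactly at the post-change mean theta = 1
# give log LR = n * theta^2 / 2 = 2, so LR = e^2.
print(lr_theta([1.0, 1.0, 1.0, 1.0], 0, 4, theta=1.0))
```

These per-$(k,\theta)$ values of $LR_\theta(k,n)$ are exactly what the statistics $\bar\Lambda^{\pi,W}_{ij}(n)$ defined next average over the change point $k$ and the mixing distributions $W_i$.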
The change detection–identification rule $\delta_A=(d_A,T_A)$ is defined as in (7), where the statistics $\bar\Lambda^{\pi,W}_{ij}(n)$ are modified as follows:
\[
\bar\Lambda^{\pi,W}_{ij}(n) = \frac{\sum_{k=-1}^{n-1}\pi_k\int_{\Theta_i}LR_\theta(k,n)\,\mathrm{d}W_i(\theta)}{\sum_{k=-1}^{n-1}\pi_k\sup_{\theta\in\Theta_j}LR_\theta(k,n)}, \quad i,j\in\mathcal{N}; \qquad
\bar\Lambda^{\pi,W}_{i}(n) = \frac{\sum_{k=-1}^{n-1}\pi_k\int_{\Theta_i}LR_\theta(k,n)\,\mathrm{d}W_i(\theta)}{\mathsf{P}(\nu\ge n)}, \quad i\in\mathcal{N},
\]
with the likelihood ratio
\[
LR_\theta(k,n) = \prod_{t=k+1}^{n}\frac{f_\theta(X_t|\mathbf{X}^{t-1})}{f_{\theta_0}(X_t|\mathbf{X}^{t-1})}.
\]
Write $\lambda_\theta(k,k+n)=\log LR_\theta(k,k+n)$ and $\lambda_{\theta,\theta^*}(k,k+n)=\lambda_\theta(k,k+n)-\lambda_{\theta^*}(k,k+n)$, where $\lambda_{\theta^*}(k,k+n)=0$ for $\theta^*=\theta_0$. Conditions $\mathbf{C}_1$ and $\mathbf{C}_2$ are also modified as follows.

$\mathbf{C}_1$. There exist positive and finite numbers $I(\theta,\theta_0)=I(\theta)$, $\theta\in\Theta_i$, $i\in\mathcal{N}$, and $I(\theta,\theta^*)$, $\theta^*\in\Theta_j$, $j\in\mathcal{N}\setminus\{i\}$, $\theta\in\Theta_i$, $i\in\mathcal{N}$, such that for any $\varepsilon>0$ and all $k\in\mathbb{Z}_+$, $\theta\in\Theta_i$, $\theta^*\in\Theta_j$, $j\in\mathcal{N}\setminus\{i\}$, $i\in\mathcal{N}$,
\[
\lim_{M\to\infty} p_{M,k}(\varepsilon;\theta,\theta^*) = 0, \quad\text{where}\quad
p_{M,k}(\varepsilon;\theta,\theta^*) = \mathsf{P}_{k,\theta}\left\{\frac{1}{M}\max_{1\le n\le M}\lambda_{\theta,\theta^*}(k,k+n) \ge (1+\varepsilon)I(\theta,\theta^*)\right\}.
\]
$\mathbf{C}_2$. For any $\varepsilon>0$ and some $r\ge1$, $\Upsilon_r(\varepsilon;\theta)<\infty$ for all $\theta\in\Theta_i$, $i\in\mathcal{N}$, where
\[
\Upsilon_r(\varepsilon;\theta) = \lim_{\kappa\to0}\sum_{n=1}^{\infty} n^{r-1}\sup_{k\in\mathbb{Z}_+}\mathsf{P}_{k,\theta}\left(\frac{1}{n}\inf_{\{\vartheta\in\Theta:\,|\vartheta-\theta|<\kappa\}}\lambda_\vartheta(k,k+n) < I(\theta)-\varepsilon\right).
\]
Essentially the same argument shows that all the previous results hold in this case too. In particular, the assertions of Theorem 3 remain correct: as $\alpha_{\max},\beta_{\max}\to0$, for all $\theta\in\Theta_i$ and all $i\in\mathcal{N}$,
\[
\inf_{\delta\in\mathbb{C}_\pi(\boldsymbol\alpha,\boldsymbol\beta)}\bar{\mathcal{R}}^m_{\theta}(\delta) \sim \bar{\mathcal{R}}^m_{\theta}(\delta_A) \sim \max\left\{\frac{|\log\alpha_i|}{I(\theta)+\mu},\ \max_{j\in\mathcal{N}\setminus\{i\}}\frac{|\log\beta_{ji}|}{\inf_{\theta^*\in\Theta_j}I(\theta,\theta^*)}\right\}^m,
\]
i.e., the detection–identification rule $\delta_A$ is asymptotically optimal to first order. Note also that, in general, these asymptotics do not reduce to (53) even when $\alpha_i=\beta_{ji}$; everything depends on the configuration of the hypotheses.

3.
All the previous results are easily generalized to the case where the change points differ across streams, i.e., $\nu=\nu_i$ with prior distributions $\pi^{(i)}_k=\mathsf{P}(\nu_i=k)$, assuming that condition (23) for $\pi_k=\pi^{(i)}_k$ holds with $\mu=\mu_i$, $i\in\mathcal{N}$. Then, in relations (32), (33), (44), (47), (48), (49), (50) and in the other relations where $\mu$ is present, the value of $\mu$ should simply be replaced with $\mu_i$.

4. For independent observations, as well as for many Markov and certain hidden Markov models, the decision statistics $\bar\Lambda^{\pi,W}_{ij}(n)$ can be computed effectively, so implementation of the proposed detection–identification rule is not an issue. Still, in general, the computational complexity and memory requirements of rule $\delta_A$ are high. To avoid this complication, rule $\delta_A$ can be modified into a window-limited version in which the summation in the statistics $\bar\Lambda^{\pi,W}_{ij}(n)$ over the potential change points $k$ is restricted to a sliding window of size $\ell$. Following the guidelines of [18, Ch 3, Sec 3.10] (where asymptotic optimality of mixture window-limited rules was established in the single-stream case), it can be shown that the window-limited version also has first-order asymptotic optimality properties as long as the size of the window $\ell(A)$ goes to infinity as $A\to\infty$ at such a rate that $\ell(A)/\log A\to\infty$ but $\log\ell(A)/\log A\to0$. The details are omitted.

5. If $\pi\in\mathbf{C}(\mu=0)$ or $\pi^{\alpha,\beta}$ depends on $\alpha,\beta$ and $\mu_{\alpha,\beta}\to0$ as $\alpha_{\max},\beta_{\max}\to0$, then an alternative detection–identification rule $\delta^*_A=(d^*,T^*_A)$, defined as in (7)–(8) but with the statistics $\bar\Lambda^{\pi,W}_{ij}(n)$ in the definition of $T^{(i)}_A$ replaced by the statistics
\[
R_{ij}(n) = \frac{\sum_{k=0}^{n-1}\int_{\Theta_i}LR_{i,\theta_i}(k,n)\,\mathrm{d}W_i(\theta_i)}{\sum_{k=0}^{n-1}\sup_{\theta_j\in\Theta_j}LR_{j,\theta_j}(k,n)}, \quad i,j\in\mathcal{N}; \qquad
R_i(n) = \sum_{k=0}^{n-1}\int_{\Theta_i}LR_{i,\theta_i}(k,n)\,\mathrm{d}W_i(\theta_i), \quad i\in\mathcal{N},
\]
is also asymptotically optimal to first order. Specifically, with a suitable selection of thresholds, the asymptotic approximations (53) and (54) hold for $\delta^*_A$.

6.
For practical purposes, it is more reasonable to consider a "frequentist" problem setup that does not use the prior distributions of the change point $\pi$ and of the hypotheses $p$. We believe that the most reasonable performance metric for false alarms is the maximal conditional local probability of a false alarm in a prespecified time window $\ell$, $\sup_{0\le k<\infty}\mathsf{P}_\infty(k\le T<k+\ell\,|\,T\ge k)$ (see, e.g., [18], [20] for a detailed discussion). The optimality results in the Bayesian problem obtained in this paper are of importance in the frequentist (minimax and pointwise) problem, which can be embedded into the Bayesian criterion with an asymptotically improper uniform distribution of the change point. See Pergamenchtchikov and Tartakovsky [11], [12] and Tartakovsky [18, Ch 4] for the single population.

ACKNOWLEDGEMENT
The author would like to thank the referees, whose comments improved the article.

APPENDIX: PROOFS
Proof of Theorem 2:
The proof is split into two parts.
Part 1: Proof of asymptotic inequalities (38) and (39). To prove (38) and (39), define
\[
M_{\beta_{ji}} = M_{\beta_{ji}}(\varepsilon,\theta_i,\theta_j) = (1-\varepsilon)\frac{|\log\beta_{ji}|}{I_{ij}(\theta_i,\theta_j)}
\]
and note first that, by the Chebyshev inequality, for every $\varepsilon\in(0,1)$ and $r>0$,
\[
\mathsf{E}_{k,i,\theta_i}[(T-k)^r; d=i; T>k] \ge M^r_{\beta_{ji}}\,\mathsf{P}_{k,i,\theta_i}\{T-k>M_{\beta_{ji}}, d=i, T>k\} = M^r_{\beta_{ji}}\,\mathsf{P}_{k,i,\theta_i}\{T-k>M_{\beta_{ji}}, d=i\}.
\]
Therefore, for all $\theta_j\in\Theta_j$ and $j\in\mathcal{N}\setminus\{i\}$,
\[
\mathsf{E}_{k,i,\theta_i}[(T-k)^r; d=i; T>k] \ge \left[\frac{(1-\varepsilon)|\log\beta_{ji}|}{I_{ij}(\theta_i,\theta_j)}\right]^r(1+o(1))
\]
whenever for all $\varepsilon\in(0,1)$ and all fixed $k\in\mathbb{Z}$,
\[
\lim_{\alpha_{\max},\beta_{\max}\to0}\ \inf_{\delta\in\mathbb{C}_\pi(\boldsymbol\alpha,\boldsymbol\beta)}\mathsf{P}_{k,i,\theta_i}\{T-k>M_{\beta_{ji}}, d=i\} = 1, \quad\text{(A.1)}
\]
and inequality (38) follows since $\varepsilon$ can be arbitrarily small and $\mathcal{R}^r_{k,i,\theta_i}(\delta)\ge\mathsf{E}_{k,i,\theta_i}[(T-k)^r; d=i; T>k]$. Recall that for $k=-1$ we set $T-k=T$ rather than $T+1$ everywhere. Note that $\mathcal{R}^r_{0,i,\theta_i}(\delta)\equiv\mathcal{R}^r_{-1,i,\theta_i}(\delta)$.

Analogously,
\[
\bar{\mathcal{R}}^r_{i,\theta_i}(\delta) \ge \mathsf{E}^\pi_{i,\theta_i}[(T-\nu)^r; d=i; T>\nu] \ge M^r_{\beta_{ji}}\,\mathsf{P}^\pi_{i,\theta_i}\{T-\nu>M_{\beta_{ji}}, d=i\},
\]
so that inequality (39) holds whenever
\[
\lim_{\alpha_{\max},\beta_{\max}\to0}\ \inf_{\delta\in\mathbb{C}_\pi(\boldsymbol\alpha,\boldsymbol\beta)}\mathsf{P}^\pi_{i,\theta_i}\{T-\nu>M_{\beta_{ji}}, d=i\} = 1. \quad\text{(A.2)}
\]
Hence, we now focus on proving equalities (A.1) and (A.2). Obviously,
\[
\mathsf{P}_{k,i,\theta_i}(T-k>M_{\beta_{ji}}, d=i) = \mathsf{P}_{k,i,\theta_i}(d=i) - \mathsf{P}_{k,i,\theta_i}(T-k\le M_{\beta_{ji}}, d=i) = 1 - \mathsf{P}_{k,i,\theta_i}(d\ne i) - \mathsf{P}_\infty(T\le k, d=i) - \mathsf{P}_{k,i,\theta_i}(k<T\le M_{\beta_{ji}}+k, d=i),
\]
where we used the fact that $\mathsf{P}_{k,i,\theta_i}(T\le k, d=i)=\mathsf{P}_\infty(T\le k, d=i)$. Write $\Pi_k=\mathsf{P}(\nu>k)$.
For any $\delta\in\mathbb{C}_\pi(\boldsymbol\alpha,\boldsymbol\beta)$ and $k\ge0$, we have
\[
\alpha_i \ge \mathrm{PFA}_i(\delta) = \sum_{t=0}^{\infty}\pi_t\,\mathsf{P}_\infty(T\le t, d=i) \ge \sum_{t=k}^{\infty}\pi_t\,\mathsf{P}_\infty(T\le t, d=i) \ge \mathsf{P}_\infty(T\le k, d=i)\,\Pi_{k-1}
\]
and
\[
\mathrm{PMI}_{ij}(\delta) = \sup_{\theta_i\in\Theta_i}\sum_{s=-1}^{\infty}\pi_s\,\mathsf{P}_{s,i,\theta_i}(d=j, T<\infty) \le \beta_{ij},
\]
so that, for any $\delta\in\mathbb{C}_\pi(\boldsymbol\alpha,\boldsymbol\beta)$,
\[
\mathsf{P}_\infty(T\le k, d=i) \le \alpha_i/\Pi_{k-1}, \quad k\in\mathbb{Z}_+, \quad\text{(A.3)}
\]
\[
\sup_{\theta_i\in\Theta_i}\mathsf{P}_{k,i,\theta_i}(d=j, T<\infty) \le \beta_{ij}/\pi_k, \quad k\in\mathbb{Z}, \quad\text{(A.4)}
\]
\[
\sup_{\theta_i\in\Theta_i}\mathsf{P}_{k,i,\theta_i}(d\ne i, T<\infty) \le \pi_k^{-1}\sum_{j\in\mathcal{N}\setminus\{i\}}\beta_{ij}. \quad\text{(A.5)}
\]
Therefore,
\[
\mathsf{P}_{k,i,\theta_i}(T-k>M_{\beta_{ji}}, d=i) \ge 1 - \alpha_i/\Pi_{k-1} - \pi_k^{-1}\sum_{j\in\mathcal{N}\setminus\{i\}}\beta_{ij} - \mathsf{P}_{k,i,\theta_i}(k<T\le M_{\beta_{ji}}+k, d=i).
\]
This inequality implies that to prove (A.1) we have to show that, as $\alpha_{\max},\beta_{\max}\to0$,
\[
\sup_{\delta\in\mathbb{C}_\pi(\boldsymbol\alpha,\boldsymbol\beta)}\mathsf{P}_{k,i,\theta_i}\{0<T-k\le M_{\beta_{ji}}, d=i\} \to 0. \quad\text{(A.6)}
\]
For the sake of brevity, we will write $\lambda_{i,j}(k,k+n)$ for the LLR $\lambda_{i,\theta_i;j,\theta_j}(k,k+n)$. Let $\mathcal{A}_{k,\beta}=\{k<T\le k+M_{\beta_{ji}}\}$ and, for $C>0$,
\[
\mathcal{B}_{k,\beta,i,j} = \{d=i,\ \mathcal{A}_{k,\beta}\}\setminus\Big\{\max_{k<n\le k+M_{\beta_{ji}}}\lambda_{i,j}(k,n)\ge C\Big\}.
\]
Next, multiplying both sides of inequality (A.7) by $\pi_k$ and summing over $k\in\mathbb{Z}$, we obtain
\[
\mathsf{P}^\pi_{i,\theta_i}\{0<T-\nu\le M_{\beta_{ji}}, d=i\} \le \beta_{ji}e^{(1-\varepsilon)|\log\beta_{ji}|} + \sum_{k=-1}^{\infty}\pi_k\,p_{M_{\beta_{ji}},k}(\varepsilon;i,\theta_i;j,\theta_j) \le \beta^{\varepsilon}_{ji} + \mathsf{P}(\nu>K_\beta) + \sum_{k=-1}^{K_\beta}\pi_k\,p_{M_{\beta_{ji}},k}(\varepsilon;i,\theta_i;j,\theta_j),
\]
where $K_\beta$ is an arbitrary integer that goes to infinity as $\beta_{\max}\to0$. Obviously, the first term goes to $0$ as $\beta_{\max}\to0$. The second term, $\mathsf{P}(\nu>K_\beta)$, goes to $0$ by conditions (23) and (24). The third term also goes to $0$ due to condition $\mathbf{C}_1$ and Lebesgue's dominated convergence theorem. Hence, for any $\delta\in\mathbb{C}_\pi(\boldsymbol\alpha,\boldsymbol\beta)$,
\[
\mathsf{P}^\pi_{i,\theta_i}\{0<T-\nu\le M_{\beta_{ji}}, d=i\} \to 0 \quad\text{as } \alpha_{\max},\beta_{\max}\to0.
\]
Finally, we have
\[
\mathsf{P}^\pi_{i,\theta_i}\{T-\nu>M_{\beta_{ji}}, d=i\} = \mathsf{P}^\pi_{i,\theta_i}(T>\nu, d=i) - \mathsf{P}^\pi_{i,\theta_i}\{0<T-\nu\le M_{\beta_{ji}}, d=i\},
\]
where
\[
\mathsf{P}^\pi_{i,\theta_i}(T>\nu, d=i) = \mathsf{P}^\pi_{i,\theta_i}(d=i\,|\,T>\nu)[1-\mathrm{PFA}^\pi(\delta)] \ge \left(1-\sum_{j\in\mathcal{N}\setminus\{i\}}\beta_{ij}\right)\left(1-\sum_{\ell=1}^{N}\alpha_\ell\right) \to 1 \quad\text{(A.8)}
\]
as $\alpha_{\max},\beta_{\max}\to0$ for any $\delta\in\mathbb{C}_\pi(\boldsymbol\alpha,\boldsymbol\beta)$. This yields (A.2), and therefore inequalities (39).

Part 2: Proof of asymptotic inequalities (40) and (41). Changing the measure $\mathsf{P}_\infty\to\mathsf{P}_{k,i,\theta_i}$ and using an argument similar to that used in Part 1 to obtain (A.7), with $M_{\beta_{ji}}$ replaced by
\[
N_{\alpha_i} = (1-\varepsilon)\frac{|\log\alpha_i|}{I_i(\theta_i)+\mu+\varepsilon},
\]
we obtain
\[
\mathsf{P}_{k,i,\theta_i}\{0<T-k\le N_{\alpha_i}, d=i\} \le e^{(1+\varepsilon)I_i(\theta_i)N_{\alpha_i}}\,\mathsf{P}_\infty\{0<T-k\le N_{\alpha_i}, d=i\} + \mathsf{P}_{k,i,\theta_i}\left\{\frac{1}{N_{\alpha_i}}\max_{1\le n\le N_{\alpha_i}}\lambda_{i,\theta_i}(k,k+n) \ge (1+\varepsilon)I_i(\theta_i)\right\}, \quad\text{(A.9)}
\]
where for all $\varepsilon,\varepsilon_1\in(0,1)$,
\[
e^{(1+\varepsilon)I_i(\theta_i)N_{\alpha_i}}\,\mathsf{P}_\infty\{0<T-k\le N_{\alpha_i}, d=i\} \le \exp\big\{-2\varepsilon|\log\alpha_i| + (\mu+\varepsilon_1)(k-1)\big\} := U_{\alpha_i,k}(\varepsilon,\varepsilon_1). \quad\text{(A.10)}
\]
Using (A.9) and (A.10), we obtain
\[
\sup_{\delta\in\mathbb{C}_\pi(\boldsymbol\alpha,\boldsymbol\beta)}\mathsf{P}_{k,i,\theta_i}\{0<T-k\le N_{\alpha_i}, d=i\} \le U_{\alpha_i,k}(\varepsilon,\varepsilon_1) + p_{N_{\alpha_i},k}(\varepsilon;i,\theta_i),
\]
where for every fixed $k\in\mathbb{Z}_+$ the value of $U_{\alpha_i,k}(\varepsilon,\varepsilon_1)$ tends to zero and also $p_{N_{\alpha_i},k}(\varepsilon;i,\theta_i)\to0$ as $\alpha_{\max}\to0$ by condition $\mathbf{C}_1$.
Hence, it follows that for every fixed $k\in\mathbb{Z}$,
\[
\lim_{\alpha_{\max}\to0}\ \sup_{\delta\in\mathbb{C}_\pi(\boldsymbol\alpha,\boldsymbol\beta)}\mathsf{P}_{k,i,\theta_i}\{0<T-k\le N_{\alpha_i}, d=i\} = 0. \quad\text{(A.11)}
\]
Next, we have
\[
\mathcal{R}^r_{k,i,\theta_i}(\delta) \ge \mathsf{E}_{k,i,\theta_i}[(T-k)^r, d=i, T>k] \ge N^r_{\alpha_i}\,\mathsf{P}_{k,i,\theta_i}(T-k>N_{\alpha_i}, d=i, T>k) = N^r_{\alpha_i}\,\mathsf{P}_{k,i,\theta_i}(T-k>N_{\alpha_i}, d=i) \ge N^r_{\alpha_i}\big[\mathsf{P}_{k,i,\theta_i}(T>k, d=i) - \mathsf{P}_{k,i,\theta_i}(0<T-k\le N_{\alpha_i}, d=i)\big],
\]
where the second inequality follows from the Chebyshev inequality, and
\[
\mathsf{P}_{k,i,\theta_i}(T>k, d=i) = 1 - \mathsf{P}_{k,i,\theta_i}(d\ne i) - \mathsf{P}_\infty(T\le k, d=i) \ge 1 - \alpha_i/\Pi_{k-1} - \pi_k^{-1}\sum_{j\in\mathcal{N}\setminus\{i\}}\beta_{ij}
\]
(see (A.3)–(A.5)). Therefore,
\[
\inf_{\delta\in\mathbb{C}_\pi(\boldsymbol\alpha,\boldsymbol\beta)}\mathcal{R}^r_{k,i,\theta_i}(\delta) \ge N^r_{\alpha_i}\Big[\inf_{\delta\in\mathbb{C}_\pi(\boldsymbol\alpha,\boldsymbol\beta)}\mathsf{P}_{k,i,\theta_i}(T>k, d=i) - \sup_{\delta\in\mathbb{C}_\pi(\boldsymbol\alpha,\boldsymbol\beta)}\mathsf{P}_{k,i,\theta_i}(0<T-k\le N_{\alpha_i}, d=i)\Big],
\]
where
\[
\lim_{\alpha_{\max}\to0}\ \inf_{\delta\in\mathbb{C}_\pi(\boldsymbol\alpha,\boldsymbol\beta)}\mathsf{P}_{k,i,\theta_i}(T>k, d=i) = 1, \quad\text{(A.12)}
\]
and, by (A.11), the second term on the right-hand side goes to $0$ for any fixed $k\in\mathbb{Z}$. It follows that for all fixed $k\in\mathbb{Z}$,
\[
\inf_{\delta\in\mathbb{C}_\pi(\boldsymbol\alpha,\boldsymbol\beta)}\mathcal{R}^r_{k,i,\theta_i}(\delta) \ge \left[\frac{(1-\varepsilon)|\log\alpha_i|}{I_i(\theta_i)+\mu+\varepsilon}\right]^r(1+o(1)),
\]
where $\varepsilon$ and $\varepsilon_1$ can be arbitrarily small, which implies inequality (40).

Next, define
\[
K_{\alpha_i} = K_{\alpha_i}(\varepsilon,\mu,\varepsilon_1) = \left\lfloor\frac{\varepsilon|\log\alpha_i|}{\mu+\varepsilon_1}\right\rfloor.
\]
Using inequalities (A.9) and (A.10), we obtain
\[
\mathsf{P}^\pi_{i,\theta_i}(0<T-\nu\le N_{\alpha_i}, d=i) = \sum_{k=-1}^{\infty}\pi_k\,\mathsf{P}_{k,i,\theta_i}(0<T-k\le N_{\alpha_i}, d=i) = \sum_{k=-1}^{K_{\alpha_i}}\pi_k\,\mathsf{P}_{k,i,\theta_i}(0<T-k\le N_{\alpha_i}, d=i) + \sum_{k=K_{\alpha_i}+1}^{\infty}\pi_k\,\mathsf{P}_{k,i,\theta_i}(0<T-k\le N_{\alpha_i}, d=i)
\]
\[
\le \sum_{k=-1}^{K_{\alpha_i}}\pi_k\,U_{\alpha_i,k}(\varepsilon,\varepsilon_1) + \sum_{k=-1}^{K_{\alpha_i}}\pi_k\,p_{N_{\alpha_i},k}(\varepsilon;i,\theta_i) + \sum_{k=K_{\alpha_i}+1}^{\infty}\pi_k \le \Pi_{K_{\alpha_i}} + \max_{-1\le k\le K_{\alpha_i}}U_{\alpha_i,k}(\varepsilon,\varepsilon_1) + \sum_{k=-1}^{K_{\alpha_i}}\pi_k\,p_{N_{\alpha_i},k}(\varepsilon;i,\theta_i) = \Pi_{K_{\alpha_i}} + U_{\alpha_i,K_{\alpha_i}}(\varepsilon,\varepsilon_1) + \sum_{k=-1}^{K_{\alpha_i}}\pi_k\,p_{N_{\alpha_i},k}(\varepsilon;i,\theta_i),
\]
where $T-k=T$ for $k=-1$. If $\mu>0$ then, by condition (23), $\log\Pi_{K_{\alpha_i}}\sim-\mu K_{\alpha_i}$ as $\alpha_{\max}\to0$, so $\Pi_{K_{\alpha_i}}\to0$. If $\mu=0$, this probability goes to $0$ as $\alpha_{\max}\to0$ as well since, by condition (24),
\[
\Pi_{K_{\alpha_i}} < \sum_{k=K_{\alpha_i}}^{\infty}\pi_k|\log\pi_k| \xrightarrow[\alpha_{\max}\to0]{} 0.
\]
Obviously, the second term $U_{\alpha_i,K_{\alpha_i}}(\varepsilon,\varepsilon_1)\to0$ as $\alpha_{\max}\to0$. By condition $\mathbf{C}_1$ and Lebesgue's dominated convergence theorem, the third term goes to $0$; therefore, all three terms go to zero as $\alpha_{\max},\beta_{\max}\to0$ for all $\varepsilon,\varepsilon_1>0$, so that $\mathsf{P}^\pi_{i,\theta_i}(0<T-\nu\le N_{\alpha_i}, d=i)\to0$ as $\alpha_{\max},\beta_{\max}\to0$. Since
\[
\mathsf{P}^\pi_{i,\theta_i}(T-\nu>N_{\alpha_i}, d=i) = \mathsf{P}^\pi_{i,\theta_i}(T>\nu, d=i) - \mathsf{P}^\pi_{i,\theta_i}(0<T-\nu\le N_{\alpha_i}, d=i)
\]
and, by (A.8), $\mathsf{P}^\pi_{i,\theta_i}(T>\nu, d=i)\to1$ as $\alpha_{\max},\beta_{\max}\to0$ for any $\delta\in\mathbb{C}_\pi(\boldsymbol\alpha,\boldsymbol\beta)$, it follows that $\mathsf{P}^\pi_{i,\theta_i}(T-\nu>N_{\alpha_i}, d=i)\to1$ as $\alpha_{\max},\beta_{\max}\to0$. Finally, by the Chebyshev inequality,
\[
\bar{\mathcal{R}}^r_{i,\theta_i}(\delta) \ge \mathsf{E}^\pi_{i,\theta_i}[(T-\nu)^r, d=i, T>\nu] \ge N^r_{\alpha_i}\,\mathsf{P}^\pi_{i,\theta_i}(T-\nu>N_{\alpha_i}, d=i),
\]
which implies that for any $\delta\in\mathbb{C}_\pi(\boldsymbol\alpha,\boldsymbol\beta)$, as $\alpha_{\max},\beta_{\max}\to0$,
\[
\bar{\mathcal{R}}^r_{i,\theta_i}(\delta) \ge \left[\frac{(1-\varepsilon)|\log\alpha_i|}{I_i(\theta_i)+\mu+\varepsilon}\right]^r(1+o(1)).
\]
Owing to the fact that $\varepsilon$ and $\varepsilon_1$ can be arbitrarily small, inequality (41) follows.

Proof of Lemma 1:
For $k\in\mathbb{Z}_+$, define the exit times
\[
\tau^{(k)}_i(A) = \inf\big\{n\ge1:\ \lambda_{i,W}(k,k+n)-\lambda^\pi_j(k+n)\ge\log(A_{ij}/\pi_k)\ \ \forall j\in\mathcal{N}\setminus\{i\}\big\}, \quad i\in\mathcal{N},
\]
where $\lambda_{i,W}(k,k+n)=\log\Lambda_{i,W}(k,k+n)$ and $\lambda^\pi(k+n)=\log\mathsf{P}(\nu\ge k+n)=\log\Pi_{k+n-1}$.

Obviously, for any $n>k$ and $k\in\mathbb{Z}_+$,
\[
\log\bar\Lambda^{\pi,W}_{ij}(n) \ge \log\left(\frac{\pi_k\,LR_{i,W}(k,n)}{\sum_{\ell=-1}^{n-1}\pi_\ell\sup_{\theta_j\in\Theta_j}LR_{j,\theta_j}(\ell,n)}\right) = \lambda_{i,W}(k,n)-\lambda^\pi_j(n)+\log\pi_k,
\]
so for every set $A=(A_{ij})$ of positive thresholds $A_{ij}$ we have $(T_A-k)^+\le(T^{(i)}_A-k)^+\le\tau^{(k)}_i(A)$ and, hence, $\mathsf{E}_{k,i,\theta_i}[(T_A-k)^+]^r\le\mathsf{E}_{k,i,\theta_i}[(\tau^{(k)}_i(A))^r]$. Note that since we set $T_A-k=T_A$ for $k=-1$, it follows that $\mathsf{E}_{-1,i,\theta_i}[(T_A-k)^+]^r=\mathsf{E}_{0,i,\theta_i}[T_A]^r\le\mathsf{E}_{0,i,\theta_i}[(\tau^{(0)}_i(A))^r]$.

Setting $\tau=\tau^{(k)}_i(A)$ and $N=M_i(A)$ in inequality (A.1) in Lemma A1 in [18, p. 239], we obtain that the following inequality holds:
\[
\mathsf{E}_{k,i,\theta_i}\Big[\big(\tau^{(k)}_i(A)\big)^r\Big] \le [M_i(A)]^r + r2^{r-1}\sum_{n=M_i(A)}^{\infty}n^{r-1}\,\mathsf{P}_{k,i,\theta_i}\big(\tau^{(k)}_i(A)>n\big). \quad\text{(A.13)}
\]
Next, we have
\[
\mathsf{P}_{k,i,\theta_i}\big(\tau^{(k)}_i(A)>n\big) \le \mathsf{P}_{k,i,\theta_i}\left\{\frac{1}{n}\big[\lambda_{i,W}(k,k+n)-\lambda^\pi_j(k+n)\big] < \frac{1}{n}\log\left(\frac{A_{ij}}{\pi_k}\right)\ \text{for some } j\in\mathcal{N}\setminus\{i\}\right\} \le \mathsf{P}_{k,i,\theta_i}\left\{\frac{1}{n}\big[\lambda_{i,W}(k,k+n)-\log\Pi_{k+n-1}\big] < \frac{1}{n}\log\left(\frac{A_i}{\pi_k}\right)\right\}.
\]
Let
\[
\widetilde M_i(A_i) = 1+\left\lfloor\frac{\log(A_i/\pi_k)}{I_i(\theta_i)+\mu-\varepsilon}\right\rfloor.
\]
Clearly, for all $n\ge\widetilde M_i(A_i)$ the last probability does not exceed the probability
\[
\mathsf{P}_{k,i,\theta_i}\left\{\frac{1}{n}\lambda_{i,W}(k,k+n) < I_i(\theta_i)+\mu-\varepsilon-\frac{|\log\Pi_{k+n-1}|}{n}\right\}
\]
and, by condition $\mathbf{CP}$, for a sufficiently large value of $A_i$ there exists a small $\kappa$ such that
\[
\left|\mu-\frac{|\log\Pi_{k+\widetilde M_i(A_i)-1}|}{\widetilde M_i(A_i)}\right| < \kappa.
\]
Therefore, for all sufficiently large $n$,
\[
\mathsf{P}_{k,i,\theta_i}\big(\tau^{(k)}_i(A)>n\big) \le \mathsf{P}_{k,i,\theta_i}\left(\frac{1}{n}\lambda_{i,W}(k,k+n) < I_i(\theta_i)-\varepsilon+\kappa\right).
\]
Also,
\[
\lambda_{i,W}(k,k+n) \ge \inf_{\vartheta\in\Gamma_{\kappa,\theta_i}}\lambda_{i,\vartheta}(k,k+n) + \log W_i(\Gamma_{\kappa,\theta_i}),
\]
where $\Gamma_{\kappa,\theta_i}=\{\vartheta\in\Theta_i:|\vartheta-\theta_i|<\kappa\}$. Thus, for all sufficiently large $n$ and $A_{\min}$ for which $\kappa+|\log W_i(\Gamma_{\kappa,\theta_i})|/n<\varepsilon/2$, we have
\[
\mathsf{P}_{k,i,\theta_i}\big(\tau^{(k)}_i(A)>n\big) \le \mathsf{P}_{k,i,\theta_i}\left\{\frac{1}{n}\inf_{\vartheta\in\Gamma_{\kappa,\theta_i}}\lambda_{i,\vartheta}(k,k+n) < I_i(\theta_i)-\varepsilon+\kappa+\frac{1}{n}|\log W_i(\Gamma_{\kappa,\theta_i})|\right\} \le \mathsf{P}_{k,i,\theta_i}\left(\frac{1}{n}\inf_{\vartheta\in\Gamma_{\kappa,\theta_i}}\lambda_{i,\vartheta}(k,k+n) < I_i(\theta_i)-\varepsilon/2\right). \quad\text{(A.14)}
\]
Using (A.13) and (A.14) yields inequality (46), and the proof is complete.

Proof of Proposition 1:
By Theorem 1, the rule $\delta_A$ belongs to class $\mathbb{C}_\pi(\boldsymbol\alpha,\boldsymbol\beta)$ with
\[
\alpha_i = \frac{1}{1+A_i}; \qquad \beta_{ij} = \frac{1+A_i}{A_iA_{ji}}, \quad j\in\mathcal{N}\setminus\{i\},\ i\in\mathcal{N},
\]
and hence Theorem 2 implies (under condition $\mathbf{C}_1$) the asymptotic (as $A_{\min}\to\infty$) lower bounds
\[
\mathcal{R}^r_{k,i,\theta_i}(\delta_A) \ge [\Psi_i(A,\theta_i,\mu)]^r(1+o(1)) \quad \forall k\in\mathbb{Z} \quad\text{(A.15)}
\]
and
\[
\bar{\mathcal{R}}^r_{i,\theta_i}(\delta_A) \ge [\Psi_i(A,\theta_i,\mu)]^r(1+o(1)), \quad\text{(A.16)}
\]
which hold for all $r>0$, $\theta_i\in\Theta_i$, and $i\in\mathcal{N}$. Thus, to prove the validity of the asymptotic approximations (42) and (43) it suffices to show that, under the left-tail condition $\mathbf{C}_2$, for $0<m\le r$ and all $\theta_i\in\Theta_i$ and $i\in\mathcal{N}$ the following asymptotic upper bounds hold as $A_{\min}\to\infty$:
\[
\mathcal{R}^m_{k,i,\theta_i}(\delta_A) \le [\Psi_i(A,\theta_i,\mu)]^m(1+o(1)) \quad \forall k\in\mathbb{Z} \quad\text{(A.17)}
\]
and
\[
\bar{\mathcal{R}}^m_{i,\theta_i}(\delta_A) \le [\Psi_i(A,\theta_i,\mu)]^m(1+o(1)). \quad\text{(A.18)}
\]
It follows from inequality (46) in Lemma 1 that for any $0<\varepsilon<J_i(\theta_i,\mu)$,
\[
\mathsf{E}_{k,i,\theta_i}[(T_A-k)^r; d_A=i; T_A>k] \le \big[\widetilde\Psi_i(A,\pi_k,\theta_i,\mu,\varepsilon)\big]^r + r2^{r-1}\Upsilon_r(\kappa,\varepsilon;i,\theta_i), \quad\text{(A.19)}
\]
where $\Upsilon_r(\kappa,\varepsilon;i,\theta_i)$ is defined in (19). Similarly to (A.3), we have $\mathsf{P}_\infty(T_A\le k, d_A=i)\le[(1+A_i)\Pi_{k-1}]^{-1}$, so that
\[
\mathsf{P}_\infty(T_A\le k) \le \Pi_{k-1}^{-1}\sum_{i=1}^{N}\frac{1}{1+A_i},
\]
and hence
\[
\mathsf{P}_\infty(T_A>k) \ge 1-\Pi_{k-1}^{-1}\sum_{i=1}^{N}\frac{1}{1+A_i}.
\]
Using this inequality and inequality (A.19), we obtain
\[
\mathcal{R}^r_{k,i,\theta_i}(\delta_A) = \frac{\mathsf{E}_{k,i,\theta_i}[(T_A-k)^r; d_A=i; T_A>k]}{\mathsf{P}_\infty(T_A>k)} \le \frac{\left(\left\lfloor\frac{\log(A_i/\pi_k)}{I_i(\theta_i)+\mu-\varepsilon}\right\rfloor\right)^r + r2^{r-1}\Upsilon_r(\kappa,\varepsilon;i,\theta_i)}{1-\sum_{i=1}^{N}[(1+A_i)\Pi_{k-1}]^{-1}}. \quad\text{(A.20)}
\]
Since, by condition $\mathbf{C}_2$, $\Upsilon_r(\kappa,\varepsilon;i,\theta_i)<\infty$ for all $\theta_i\in\Theta_i$ and $i\in\mathcal{N}$, this implies the asymptotic upper bound (A.17). This completes the proof of the asymptotic approximation (42).

Next, using inequality (A.19), we obtain
\[
\mathsf{E}^\pi_{i,\theta_i}[(T_A-\nu)^r; d_A=i; T_A>\nu] = \sum_{k=-1}^{\infty}\pi_k\,\mathsf{E}_{k,i,\theta_i}[(T_A-k)^r; d_A=i; T_A>k] \le \sum_{k=-1}^{\infty}\pi_k\big[\widetilde\Psi_i(A,\pi_k,\theta_i,\mu,\varepsilon)\big]^r + r2^{r-1}\Upsilon_r(\kappa,\varepsilon;i,\theta_i).
\]
Recall that we set $T_A-k=T_A$ for $k=-1$. Applying this inequality together with the inequality
\[
1-\mathrm{PFA}^\pi(\delta_A) \ge 1-\sum_{i=1}^{N}\frac{1}{1+A_i}
\]
(see (26)) yields
\[
\bar{\mathcal{R}}^r_{i,\theta_i}(\delta_A) = \frac{\sum_{k=-1}^{\infty}\pi_k\,\mathsf{E}_{k,i,\theta_i}[(T_A-k)^r; d_A=i; T_A>k]}{1-\mathrm{PFA}^\pi(\delta_A)} \le \frac{\sum_{k=-1}^{\infty}\pi_k\big[\widetilde\Psi_i(A,\pi_k,\theta_i,\mu,\varepsilon)\big]^r + r2^{r-1}\Upsilon_r(\kappa,\varepsilon;i,\theta_i)}{1-\sum_{i=1}^{N}1/(1+A_i)}. \quad\text{(A.21)}
\]
By condition $\mathbf{C}_2$, $\Upsilon_r(\kappa,\varepsilon;i,\theta_i)<\infty$ for any $\varepsilon>0$ and any $\theta_i\in\Theta_i$ and, by condition (24), $\sum_{k=0}^{\infty}\pi_k|\log\pi_k|^r<\infty$. This implies that, as $A_{\min}\to\infty$, for all $0<m\le r$, all $\theta_i\in\Theta_i$, and all $i\in\mathcal{N}$, the following upper bound holds:
\[
\bar{\mathcal{R}}^r_{i,\theta_i}(\delta_A) \le \big[\widetilde\Psi_i(A,\pi_k=1,\theta_i,\mu,\varepsilon)\big]^r(1+o(1)).
\]
Since $\varepsilon$ can be arbitrarily small and $\lim_{\varepsilon\to0}\widetilde\Psi_i(A,\pi_k=1,\theta_i,\mu,\varepsilon)=\Psi_i(A,\theta_i,\mu)$, the upper bound (A.18) follows, and the proof of the asymptotic approximation (43) is complete.

REFERENCES

[1] P. A. Bakut, I. A. Bolshakov, B. M. Gerasimov, A. A. Kuriksha, V. G. Repin, G. P. Tartakovsky, and V. V. Shirokov,
Statistical Radar Theory. Moscow, USSR: Sovetskoe Radio, 1963, vol. 1 (G. P. Tartakovsky, Ed.), in Russian.
[2] S. Dayanik, W. B. Powell, and K. Yamazaki, “Asymptotically optimal Bayesian sequential change detection and identification rules,” Annals of Operations Research, vol. 208, no. 1, pp. 337–370, Jan. 2013.
[3] V. P. Dragalin, A. G. Tartakovsky, and V. V. Veeravalli, “Multihypothesis sequential probability ratio tests–Part II: Accurate asymptotic expansions for the expected sample size,” IEEE Transactions on Information Theory, vol. 46, no. 4, pp. 1366–1383, Apr. 2000.
[4] C.-D. Fuh and A. G. Tartakovsky, “Asymptotic Bayesian theory of quickest change detection for hidden Markov models,” IEEE Transactions on Information Theory, vol. 65, no. 1, pp. 511–529, Jan. 2019.
[5] T. L. Lai, “Sequential multiple hypothesis testing and efficient fault detection-isolation in stochastic systems,” IEEE Transactions on Information Theory, vol. 46, no. 2, pp. 595–608, Mar. 2000.
[6] G. Lorden, “Procedures for reacting to a change in distribution,” Annals of Mathematical Statistics, vol. 42, no. 6, pp. 1897–1908, Dec. 1971.
[7] J. Marage and Y. Mori, Sonar and Underwater Acoustics. London, UK and Hoboken, NJ: ISTE Ltd and John Wiley & Sons, 2013.
[8] I. V. Nikiforov, “A generalized change detection problem,” IEEE Transactions on Information Theory, vol. 41, no. 1, pp. 171–187, Jan. 1995.
[9] ——, “A simple recursive algorithm for diagnosis of abrupt changes in random signals,” IEEE Transactions on Information Theory, vol. 46, no. 7, pp. 2740–2746, Jul. 2000.
[10] ——, “A lower bound for the detection/isolation delay in a class of sequential tests,” IEEE Transactions on Information Theory, vol. 49, no. 11, pp. 3037–3046, Nov. 2003.
[11] S. Pergamenchtchikov and A. G. Tartakovsky, “Asymptotically optimal pointwise and minimax quickest change-point detection for dependent data,” Statistical Inference for Stochastic Processes, vol. 21, no. 1, pp. 217–259, Jan. 2018.
[12] ——, “Asymptotically optimal pointwise and minimax change-point detection for general stochastic models with a composite post-change hypothesis,” Journal of Multivariate Analysis, vol. 174, no. 4, pp. 1–20, Oct. 2019.
[13] M. Pollak, “Optimal detection of a change in distribution,” Annals of Statistics, vol. 13, no. 1, pp. 206–227, Mar. 1985.
[14] M. A. Richards, Fundamentals of Radar Signal Processing, 2nd ed. USA: McGraw-Hill Education Europe, 2014.
[15] A. N. Shiryaev, “On optimum methods in quickest detection problems,” Theory of Probability and its Applications, vol. 8, no. 1, pp. 22–46, Jan. 1963.
[16] ——, Optimal Stopping Rules, ser. Stochastic Modelling and Applied Probability. New York, USA: Springer-Verlag, 1978, vol. 8.
[17] A. G. Tartakovsky, “Rapid detection of attacks in computer networks by quickest changepoint detection methods,” in Data Analysis for Network Cyber-Security, N. Adams and N. Heard, Eds. London, UK: Imperial College Press, 2014, pp. 33–70.
[18] ——, Sequential Change Detection and Hypothesis Testing: General Non-i.i.d. Stochastic Models and Asymptotically Optimal Rules, ser. Monographs on Statistics and Applied Probability 165. Boca Raton, London, New York: Chapman & Hall/CRC Press, 2020.
[19] A. G. Tartakovsky and J. Brown, “Adaptive spatial-temporal filtering methods for clutter removal and target tracking,” IEEE Transactions on Aerospace and Electronic Systems, vol. 44, no. 4, pp. 1522–1537, Oct. 2008.
[20] A. G. Tartakovsky, I. V. Nikiforov, and M. Basseville, Sequential Analysis: Hypothesis Testing and Changepoint Detection, ser. Monographs on Statistics and Applied Probability 136. Boca Raton, London, New York: Chapman & Hall/CRC Press, 2015.
[21] A. G. Tartakovsky, A. S. Polunchenko, and G. Sokolov, “Efficient computer network anomaly detection by changepoint detection methods,” IEEE Journal of Selected Topics in Signal Processing, vol. 7, no. 1, pp. 4–11, Feb. 2013.
[22] A. G. Tartakovsky, “Multidecision quickest change-point detection: Previous achievements and open problems,” Sequential Analysis, vol. 27, no. 2, pp. 201–231, Apr. 2008.
[23] ——, “Asymptotic optimality of mixture rules for detecting changes in general stochastic models,” IEEE Transactions on Information Theory, vol. 65, no. 3, pp. 1413–1429, Mar. 2019.
[24] A. G. Tartakovsky, B. L. Rozovskii, R. B. Blažek, and H. Kim, “Detection of intrusions in information systems by sequential change-point methods,” Statistical Methodology, vol. 3, no. 3, pp. 252–293, Jul. 2006.
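As a numerical illustration of how the bounds above are used in practice, the following minimal Python sketch (not part of the paper; all function names are illustrative) inverts the false-alarm bound $\alpha_i = 1/(1+A_i)$ of Theorem 1 to pick a threshold meeting a target false-alarm probability, and evaluates the first-order term $\log A / (I_i(\theta_i)+\mu)$ of the expected-delay approximation, assuming the simplest case of a Gaussian mean shift $\theta$ with unit variance, for which the Kullback–Leibler information number is $I(\theta)=\theta^2/2$, and taking $\mu=0$ (no exponential prior decay).

```python
import math

def threshold_for_pfa(alpha):
    # Theorem 1 bounds the false-alarm probability by alpha_i = 1/(1 + A_i);
    # inverting gives the threshold A_i that meets a target alpha.
    return (1.0 - alpha) / alpha

def first_order_delay(A, kl_number, mu=0.0):
    # First-order term of the delay asymptotics: log A / (I_i(theta_i) + mu).
    return math.log(A) / (kl_number + mu)

theta = 1.0            # post-change Gaussian mean shift (unit variance)
kl = theta ** 2 / 2.0  # Kullback-Leibler information number I(theta)
for alpha in (1e-2, 1e-4, 1e-6):
    A = threshold_for_pfa(alpha)
    delay = first_order_delay(A, kl)
    print(f"alpha={alpha:.0e}  threshold A={A:.4g}  first-order delay={delay:.1f}")
```

The sketch makes the logarithmic tradeoff visible: each hundred-fold tightening of the false-alarm probability raises the threshold by a factor of roughly 100 but grows the first-order expected delay only additively, by $\log(100)/I(\theta)$.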