Conformal e-prediction for change detection

Abstract

We adapt conformal e-prediction to change detection, defining analogues of the Shiryaev–Roberts and CUSUM procedures for detecting violations of the IID assumption. Asymptotically, the frequency of false alarms for these analogues does not exceed the usual bounds.


Vladimir Vovk, June 4, 2020 (arXiv [math.ST])

The version of this paper at http://alrw.net (Working Paper 29) is updated most often.

We adapt conformal e-predictors, as defined in [7], to change detection. The standard approaches to change detection assume the independence of observations (given the change-point in the Bayesian approach) and known pre-change and post-change distributions (again given the change-point in the Bayesian approach). In this note we will just assume that before the change-point the observations are IID (the change-point may already be the first observation) and after the change-point the observations cease to be IID.

Since our problem has so little structure, we will be able to prove only validity results: before the change-point our procedures do not raise alarms too often. Efficiency (raising an alarm soon after the change) is a topic for further research, as we discuss in Section 5.

So far the only method of change detection with the general IID assumption (or the assumption of randomness) as the null hypothesis has been conformal change detection (see, e.g., [8]). The approach of this note is also based on conformal prediction but is simpler. On the negative side, our validity results will be weaker. For further details, see Section 4.

We start the main part of this note, in Section 2, with another conformal version of the Shiryaev–Roberts procedure and a simple statement about its asymptotic validity. As a corollary, in Section 3 we obtain the asymptotic validity of an analogous conformal version of Page's CUSUM procedure.

Informally (and formally in the proof of Proposition 1), this note is based on the idea of reversing time, which is standard in conformal prediction [9, Section 8.7]. This is how we obtain the procedures that we call Roberts–Shiryaev (reversing Shiryaev–Roberts) and MUSUC (reversing CUSUM). However, for simplicity, in Section 2 we first present the Roberts–Shiryaev procedure in its pure form, and only later, after stating the validity result, explain its connections with its prototype in the standard theory of change detection.

Let $\mathbf{Z}$ be the observation space (a measurable space), let $(\Omega, \mathcal{A}, \mathbb{P})$ be an underlying probability space (with the expectation operator $\mathbb{E}$), and let $Z_1, Z_2, \dots$ be an IID sequence of $\mathbf{Z}$-valued random elements. We are interested in a sequence $z_1, z_2, \dots$ of elements of $\mathbf{Z}$ and interpret $Z_1, Z_2, \dots$ as our observations and $z_1, z_2, \dots$ as their realized values.

We will use the notation $\lbag z_1, \dots, z_n \rbag$ for a bag (also known as a multiset) consisting of elements $z_1, \dots, z_n$. It will be regarded as an equivalence class of sequences $(z_1, \dots, z_n)$, where two sequences are defined to be equivalent when they can be obtained from each other by permuting their elements.

A conformal e-predictor is a measurable function $f$ that maps any finite sequence $(z_1, \dots, z_m)$, for any $m \in \{1, 2, \dots\}$, to a finite sequence $(\alpha_1, \dots, \alpha_m)$ of nonnegative numbers of the same length with average 1,
$$\frac{1}{m} \sum_{i=1}^{m} \alpha_i = 1, \tag{1}$$
that satisfies the following property of equivariance: for any $m \in \{1, 2, \dots\}$, any permutation $\pi$ of $\{1, \dots, m\}$, any $(z_1, \dots, z_m) \in \mathbf{Z}^m$, and any $(\alpha_1, \dots, \alpha_m) \in [0, \infty)^m$,
$$(\alpha_1, \dots, \alpha_m) = f(z_1, \dots, z_m) \Longrightarrow (\alpha_{\pi(1)}, \dots, \alpha_{\pi(m)}) = f(z_{\pi(1)}, \dots, z_{\pi(m)}).$$
In terms of betting [4], $f$ is our bet and $\alpha_i$ shows how strange $z_i$ looks in the bag $\lbag z_1, \dots, z_m \rbag$; for a large $m$ and under the assumption of exchangeability of $z_1, \dots, z_m$, we do not expect $\alpha_i$ to be large for a significant proportion of the $z_i$. It will be convenient to abuse the notation by setting
$$f(\lbag z_1, \dots, z_m \rbag, z) := \alpha, \tag{2}$$
where $\alpha$ is the last element of the sequence
$$(\alpha_1, \dots, \alpha_m, \alpha) := f(z_1, \dots, z_m, z).$$
(It is clear that the $\alpha$ in (2) does not depend on the ordering of the sequence $(z_1, \dots, z_m)$.)

With each conformal e-predictor $f$ we can associate the sequence of nonnegative random variables $E_1, E_2, \dots$, where
$$E_n := f(\lbag Z_1, \dots, Z_{n-1} \rbag, Z_n). \tag{3}$$
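The definitions (1)–(3) can be illustrated by a minimal Python sketch, assuming real-valued observations ($\mathbf{Z} = \mathbb{R}$). The strangeness score $|z_i - \bar z|$, where $\bar z$ is the mean of the whole sequence, is a hypothetical choice made only for illustration and is not proposed in this note; the function names `conformal_e_predictor` and `e_values` are likewise hypothetical.

```python
from typing import List, Sequence


def conformal_e_predictor(zs: Sequence[float]) -> List[float]:
    """Hypothetical conformal e-predictor for real-valued observations.

    alpha_i is proportional to the strangeness score |z_i - mean(zs)|,
    scaled so that the alphas average to 1, as required by (1). The mean
    does not depend on the order of zs (up to floating-point rounding), so
    permuting zs permutes the output in the same way (equivariance).
    """
    m = len(zs)
    if m == 0:
        raise ValueError("need at least one observation")
    centre = sum(zs) / m
    scores = [abs(z - centre) for z in zs]
    total = sum(scores)
    if total == 0:
        return [1.0] * m  # all observations equal: every alpha_i is 1
    return [m * s / total for s in scores]


def e_values(zs: Sequence[float]) -> List[float]:
    """E_n = f(bag(z_1, ..., z_{n-1}), z_n) as in (2)-(3), for n = 1, ..., len(zs)."""
    return [conformal_e_predictor(zs[:n])[-1] for n in range(1, len(zs) + 1)]


print(e_values([0.0, 0.0, 0.0, 10.0]))  # [1.0, 1.0, 1.0, 2.0]
```

Here the outlier 10.0 receives $E_4 = 2$, while $E_1 = E_2 = E_3 = 1$ because the first three observations coincide.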
Intuitively, large values of these random variables are evidence against $Z_1, Z_2, \dots$ being IID.

The Roberts–Shiryaev procedure for nonrandomness detection is the sequence of stopping times $\sigma_0 := 0$ and
$$\sigma_k := \min\Bigl\{ n > \sigma_{k-1} \Bigm| \sum_{i=\sigma_{k-1}+1}^{n} E_{\sigma_{k-1}+1} \cdots E_i \ge c \Bigr\}, \quad k = 1, 2, \dots, \tag{4}$$
where $c > 1$ is a parameter (typically, $c$ is a large number). The idea behind this definition is that we raise alarms at times $\sigma_1, \sigma_2, \dots$, warning the user that the IID assumption may have become violated. If the IID assumption is in fact never violated, we do not want to raise (false) alarms too often. The following proposition is a simple statement of validity. Remember that the sequence of observations $Z_1, Z_2, \dots$ is assumed to be IID, and so all alarms are false.

Proposition 1.

Let $A_n$ be the number of alarms
$$A_n := \max\{k \mid \sigma_k \le n\} \tag{5}$$
raised by the Roberts–Shiryaev procedure (4) after seeing the first $n$ observations $Z_1, \dots, Z_n$. Then
$$\limsup_{n \to \infty} \frac{A_n}{n} \le \frac{1}{c} \quad \text{in probability}. \tag{6}$$

The conclusion (6) can be spelled out as
$$\forall \epsilon > 0\ \exists N_0\ \forall N \ge N_0: \quad \mathbb{P}\Bigl(\frac{A_N}{N} > \frac{1}{c} + \epsilon\Bigr) \le \epsilon. \tag{7}$$

Let us see how the Roberts–Shiryaev procedure is obtained, informally, by reversing its standard counterpart. The conformal e-pseudomartingale corresponding to the random variables (3) is
$$S_n := E_1 \cdots E_n, \quad n = 0, 1, 2, \dots, \tag{8}$$
where $S_0$ is understood to be 1. It is not a genuine martingale since we only have $\mathbb{E} E_n = 1$ for all $n$ instead of $\mathbb{E}(E_n \mid E_1, \dots, E_{n-1}) = 1$. It is not clear what properties of validity the Shiryaev–Roberts procedure would retain if applied to $S_n$.

Instead, we can choose a large $N$ and apply the Shiryaev–Roberts procedure ([5, 3]; we will use the description in [8, (13)]) to the martingale
$$T_n := E_n \cdots E_N, \quad n = N, N-1, \dots, 1.$$
Applied to $T_n$, the Shiryaev–Roberts procedure divides time into intervals $(a, b)$ such that, roughly,
$$\sum_{i=a+1}^{b} \frac{T_a}{T_i} \approx c.$$

Since $T_a / T_i = E_a \cdots E_{i-1}$ by definition, this can be rewritten as
$$E_a + E_a E_{a+1} + \dots + E_a \cdots E_{b-1} \approx c,$$
which motivates our definition (4).

Proof of Proposition 1.

Let us fix $\epsilon > 0$ and look for $N_0$ satisfying (7). According to [8, Proposition 4.3], for the conformal Shiryaev–Roberts procedure the inequality in (6) holds almost surely; therefore, it holds in probability. Examination of the proof shows that (7) holds for the general Shiryaev–Roberts procedure (the underlying positive martingale does not have to be a conformal martingale) and, moreover, that (7) holds uniformly, in the sense that $N_0$ depends only on $\epsilon$ and nothing else. Let us choose such an $N_0$ and fix any $N \ge N_0$.

Now we use the idea of reversing time formally. Let $\mathcal{F}_n$ be the $\sigma$-algebra generated by the bag $\lbag Z_1, \dots, Z_{n-1} \rbag$ and the observations $Z_n, \dots, Z_N$. (Formally, $\mathcal{F}_n$ is the smallest $\sigma$-algebra containing the sets
$$\{(Z_1, \dots, Z_{n-1}) \in A\},\ \{Z_n \in A_n\},\ \dots,\ \{Z_N \in A_N\},$$
where $A \subseteq \mathbf{Z}^{n-1}$ is a symmetric measurable set of sequences of length $n-1$ and $A_n, \dots, A_N \subseteq \mathbf{Z}$ are measurable sets of observations.) We also allow $n = N+1$, in which case $\mathcal{F}_{N+1}$ is the $\sigma$-algebra generated by the bag $\lbag Z_1, \dots, Z_N \rbag$. Then $(E_n, \mathcal{F}_n)$, $n = N, \dots, 1$, is a stochastic sequence (meaning that each $E_n$ is $\mathcal{F}_n$-measurable); moreover, it is a martingale ratio, in the sense that
$$\mathbb{E}(E_n \mid \mathcal{F}_{n+1}) = 1, \quad n = N, \dots, 1.$$
The corresponding martingale is $(T_n, \mathcal{F}_n)$, $n = N+1, \dots, 1$, where
$$T_n := E_n \cdots E_N, \quad n = N+1, N, \dots, 1,$$
with $T_{N+1}$ understood to be 1. For simplicity, let us assume that all $E_n$ are positive, so that $T_n$ is a positive martingale.

Let us apply the Shiryaev–Roberts procedure to the martingale $T_{N+1}, \dots, T_1$. It gives us the stopping times $\tau_0 := N+1$ and
$$\tau_k := \max\Bigl\{ n < \tau_{k-1} \Bigm| \sum_{i=n}^{\tau_{k-1}-1} E_n \cdots E_i \ge c \Bigr\}, \quad k = 1, 2, \dots, \tag{9}$$
where $\max \emptyset := 0$. Let $A'_N$ be the largest $k$ such that $\tau_k > 0$; in words, $A'_N$ is the total number of alarms raised by the Shiryaev–Roberts procedure.

Notice that each set $\{\sigma_k + 1, \dots, \sigma_{k+1}\}$ with $\sigma_{k+1} \le N$ contains at least one stopping time $\tau_j$. This can be deduced from
$$\sum_{i=\sigma_k+1}^{\sigma_{k+1}} E_{\sigma_k+1} \cdots E_i \ge c. \tag{10}$$
Indeed, let $j$ be the largest number satisfying $\tau_j > \sigma_{k+1}$, so that $\tau_{j+1} \le \sigma_{k+1}$; our goal is to show that $\tau_{j+1} \ge \sigma_k + 1$. The inequality (10) implies
$$\sum_{i=\sigma_k+1}^{\tau_j - 1} E_{\sigma_k+1} \cdots E_i \ge \sum_{i=\sigma_k+1}^{\sigma_{k+1}} E_{\sigma_k+1} \cdots E_i \ge c,$$
which in combination with (9) implies, in turn, that indeed $\tau_{j+1} \ge \sigma_k + 1$.

Since the sets $\{\sigma_k + 1, \dots, \sigma_{k+1}\}$ are disjoint for different $k$, the argument of the previous paragraph shows that $A_N \le A'_N$. Therefore,
$$\mathbb{P}\Bigl(\frac{A_N}{N} > \frac{1}{c} + \epsilon\Bigr) \le \mathbb{P}\Bigl(\frac{A'_N}{N} > \frac{1}{c} + \epsilon\Bigr) \le \epsilon,$$
where the last inequality is (7) with $A'_N$ in place of $A_N$, which we know to be true.

A procedure that is even more popular than the Shiryaev–Roberts procedure in change detection is Page's CUSUM procedure [2], which can be obtained from Shiryaev–Roberts by replacing $\sum$ with $\max$ [8, Section 4]. The MUSUC procedure is the sequence of stopping times defined by $\sigma_0 := 0$ and (4) with $\max$ in place of $\sum$. Notice that this definition can be simplified by dropping the $\max$: the first $n$ at which $\max_{\sigma_{k-1} < i \le n} E_{\sigma_{k-1}+1} \cdots E_i \ge c$ is also the first $n$ at which $E_{\sigma_{k-1}+1} \cdots E_n \ge c$. We can say, equivalently, that the MUSUC procedure is the sequence of stopping times $\sigma_0 := 0$ and
$$\sigma_k := \min\bigl\{ n > \sigma_{k-1} \bigm| E_{\sigma_{k-1}+1} \cdots E_n \ge c \bigr\}, \quad k = 1, 2, \dots. \tag{11}$$

Notice that, if (8) were a genuine martingale, the conditional probability that the inequality in (11) holds for some $n$ would not exceed $1/c$; this is a version of Ville's inequality [6, p. 100]. But since (8) is not necessarily a martingale, we only have the following weaker statement analogous to Proposition 1.

Proposition 2.

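Let $A_n$ be the number (5) of alarms raised by the MUSUC procedure (11) after seeing the first $n$ observations $Z_1, \dots, Z_n$. Then (6) holds.

Proof. The usual relation between the CUSUM and Shiryaev–Roberts procedures with the same parameter $c$ is that the latter raises alarms more often than the former (see, e.g., [8], proof of Corollary 4.2). This relation still holds for the MUSUC and Roberts–Shiryaev procedures (although it becomes slightly less obvious), which, in combination with Proposition 1, implies Proposition 2.

Let us check this relation. Formally, the relation is that $\sigma_k \le \sigma'_k$ for all $k$, where $\sigma_k$ (resp. $\sigma'_k$) is the time of the $k$th alarm raised by Roberts–Shiryaev (resp. MUSUC). Suppose it does not hold and let $k$ be the smallest number such that $\sigma_k > \sigma'_k$. It is obvious that $k > 1$ (for $k = 1$ both procedures start at time 0, and the sum in (4) is at least its last term, which is the product in (11)). By definition, $\sigma_{k-1} \le \sigma'_{k-1}$. It is obvious that, in this case, $\sigma_{k-1} < \sigma'_{k-1}$ (if $\sigma_{k-1} = \sigma'_{k-1}$, the argument for $k = 1$ would give $\sigma_k \le \sigma'_k$). This is only possible if
$$\prod_{i=\sigma_{k-1}+1}^{\sigma'_{k-1}} E_i < 1 \tag{12}$$
(otherwise, at time $\sigma'_k$ the sum in (4) would contain the term $\prod_{i=\sigma_{k-1}+1}^{\sigma'_{k-1}} E_i \cdot \prod_{i=\sigma'_{k-1}+1}^{\sigma'_k} E_i \ge 1 \cdot c$, and so we would have $\sigma_k \le \sigma'_k$). However, (12) contradicts the definition of MUSUC; in this case we would have $\sigma'_{k-1} \le \sigma_{k-1}$.

Comparison with methods based on conformal martingales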

The only previously existing approach to detecting nonrandomness online is based on conformal prediction; see [9, Section 7.1] and [1, 8]. The approach of this paper is based on conformal e-prediction. The two approaches are very different, and it is unlikely that either of them will be better in all interesting applications. These are some differences:

• Design of conformal martingales involves two distinct steps: using a conformity measure to obtain p-values and then betting against those p-values. Conformal e-pseudomartingales do not involve such a rigid division and thus appear to be more flexible.
• On the other hand, when betting on the $n$th step against the $n$th p-value $p_n$, $n = 1, 2, \dots$, conformal martingales may use the previous p-values $p_1, \dots, p_{n-1}$. Such dependence on the past is not allowed for conformal e-pseudomartingales.
• Conformal martingales are randomized (without randomization we only obtain conformal supermartingales), whereas conformal e-pseudomartingales do not require randomization (it is optional and not used in this note).

As discussed in Section 1, this note only establishes simple validity results. Efficiency, in the sense of raising an alarm soon after the change, is an interesting topic of further research, theoretical or experimental (simulation or empirical studies).

Another interesting direction is to establish non-asymptotic validity results.
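As a possible starting point for such simulation studies, the procedures (4), (11) and (9) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not code from this note: all function names are hypothetical, times are 1-based as in the text, and `rank_e_values` uses a hypothetical rank-based conformal e-predictor, $\alpha_i := 2R_i/(m+1)$ with $R_i$ the rank of $z_i$ among $z_1, \dots, z_m$, which satisfies (1) when there are no ties. `reversed_shiryaev_roberts` scans time backwards as in the proof of Proposition 1, so the length of its output is $A'_N$.

```python
import bisect
import random
from typing import List, Sequence


def roberts_shiryaev(es: Sequence[float], c: float) -> List[int]:
    """Alarm times sigma_1, sigma_2, ... of the Roberts-Shiryaev procedure (4).

    After an alarm at time a, the statistic restarts as
    E_{a+1} + E_{a+1}E_{a+2} + ... + E_{a+1}...E_n.
    """
    alarms: List[int] = []
    stat, prod = 0.0, 1.0
    for n, e in enumerate(es, start=1):
        prod *= e      # E_{a+1} ... E_n
        stat += prod   # running sum of these products
        if stat >= c:
            alarms.append(n)
            stat, prod = 0.0, 1.0
    return alarms


def musuc(es: Sequence[float], c: float) -> List[int]:
    """Alarm times of the MUSUC procedure (11): the product restarts after each alarm."""
    alarms: List[int] = []
    prod = 1.0
    for n, e in enumerate(es, start=1):
        prod *= e      # E_{a+1} ... E_n
        if prod >= c:
            alarms.append(n)
            prod = 1.0
    return alarms


def reversed_shiryaev_roberts(es: Sequence[float], c: float) -> List[int]:
    """Stopping times tau_1 > tau_2 > ... of (9), scanning n = N, N-1, ..., 1.

    Uses sum_{i=n}^{t-1} E_n...E_i = E_n * (1 + sum_{i=n+1}^{t-1} E_{n+1}...E_i),
    where t is the previous stopping time (initially N + 1).
    """
    taus: List[int] = []
    stat = 0.0  # empty sum at n = t
    for n in range(len(es), 0, -1):
        stat = es[n - 1] * (1.0 + stat)
        if stat >= c:
            taus.append(n)
            stat = 0.0
    return taus


def rank_e_values(zs: Sequence[float]) -> List[float]:
    """E_n from (3) for the hypothetical rank-based e-predictor alpha_i = 2 R_i / (m + 1).

    Assumes no ties among the z's (e.g. continuous data).
    """
    seen: List[float] = []  # z_1, ..., z_{n-1}, kept sorted
    es: List[float] = []
    for n, z in enumerate(zs, start=1):
        rank = 1 + bisect.bisect_left(seen, z)  # rank of z_n among z_1, ..., z_n
        es.append(2 * rank / (n + 1))
        bisect.insort(seen, z)
    return es


if __name__ == "__main__":
    rng = random.Random(0)
    N, c = 20_000, 20.0
    es = rank_e_values([rng.random() for _ in range(N)])  # IID data: every alarm is false
    rs, mu = roberts_shiryaev(es, c), musuc(es, c)
    print(f"Roberts-Shiryaev: A_N/N = {len(rs) / N:.5f}")
    print(f"MUSUC:            A_N/N = {len(mu) / N:.5f}")
    print(f"bound 1/c:               {1 / c:.5f}")
    # Deterministic relations from the proofs (exact arithmetic, positive E_n, c >= 1):
    assert len(rs) <= len(reversed_shiryaev_roberts(es, c))  # A_N <= A'_N
    assert len(mu) <= len(rs)
    assert all(s <= t for s, t in zip(rs, mu))  # sigma_k <= sigma'_k
```

The assertions check, on one simulated path, the deterministic relations used in the proofs of Propositions 1 and 2 ($A_N \le A'_N$ and $\sigma_k \le \sigma'_k$); the printed frequencies are only a sanity check, since (6) is an asymptotic statement in probability.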

Acknowledgments

This research was partially supported by Amazon, Astra Zeneca, and Stena Line.

References

[1] Valentina Fedorova, Ilia Nouretdinov, Alex Gammerman, and Vladimir Vovk. Plug-in martingales for testing exchangeability on-line. In John Langford and Joelle Pineau, editors, Proceedings of the Twenty Ninth International Conference on Machine Learning, pages 1639–1646. Omnipress, 2012.

[2] Ewan S. Page. Continuous inspection schemes. Biometrika, 41:100–115, 1954.

[3] S. W. Roberts. A comparison of some control chart procedures. Technometrics, 8:411–430, 1966.

[4] Glenn Shafer. The language of betting as a strategy for statistical and scientific communication. Technical Report arXiv:1903.06991 [math.ST], arXiv.org e-Print archive, March 2019. To appear as a discussion paper in the Journal of the Royal Statistical Society A, and to be read in September 2020.

[5] Albert N. Shiryaev. On optimum methods in quickest detection problems. Theory of Probability and Its Applications, 8:22–46, 1963.

[6] Jean Ville. Étude critique de la notion de collectif. Gauthier-Villars, Paris, 1939.

[7] Vladimir Vovk. Cross-conformal e-prediction. Technical Report arXiv:2001.05989 [stat.ME], arXiv.org e-Print archive, January 2020.

[8] Vladimir Vovk. Testing randomness. Technical Report arXiv:1906.09256 [math.PR], arXiv.org e-Print archive, March 2020.

[9] Vladimir Vovk, Alex Gammerman, and Glenn Shafer.