Conformal e-prediction for change detection
Vladimir Vovk

June 4, 2020
Abstract
We adapt conformal e-prediction to change detection, defining analogues of the Shiryaev–Roberts and CUSUM procedures for detecting violations of the IID assumption. Asymptotically, the frequency of false alarms for these analogues does not exceed the usual bounds.

The version of this paper at http://alrw.net (Working Paper 29) is updated most often.
We adapt conformal e-predictors, as defined in [7], to change detection. The standard approaches to change detection assume the independence of observations (given the change-point in the Bayesian approach) and known pre-change and post-change distributions (again given the change-point in the Bayesian approach). In this note we will just assume that before the change-point the observations are IID (the change-point may already be the first observation) and after the change-point the observations cease to be IID.

Since our problem has so little structure, we will be able to prove only validity results: before the change-point our procedures do not raise alarms too often. The efficiency (raising an alarm soon after the change) is a topic of further research, as we discuss in Section 5.

So far the only method of change detection with the general IID assumption (or the assumption of randomness) as the null hypothesis has been conformal change detection (see, e.g., [8]). The approach of this note is also based on conformal prediction but is simpler. On the negative side, our validity results will be weaker. For further details, see Section 4.

We start the main part of this note, in Section 2, with another conformal version of the Shiryaev–Roberts procedure and a simple statement about its asymptotic validity. As a corollary, in Section 3 we obtain the asymptotic validity of an analogous conformal version of Page's CUSUM procedure.

Informally (and formally in the proof of Proposition 1), this note is based on the idea of reversing the time, which is standard in conformal prediction [9, Section 8.7]. This is how we obtain the procedures that we call Roberts–Shiryaev (reversing Shiryaev–Roberts) and MUSUC (reversing CUSUM). However, for simplicity, in Section 2 we first present the Roberts–Shiryaev procedure in its pure form, and only later, after stating the validity result, explain connections with its prototype in the standard theory of change detection.
Let $\mathbf{Z}$ be the observation space (a measurable space), $(\Omega, \mathcal{A}, P)$ be an underlying probability space (with the expectation operator $\mathbb{E}$), and $Z_1, Z_2, \dots$ be an IID sequence of $\mathbf{Z}$-valued random elements. We are interested in a sequence $z_1, z_2, \dots$ of elements of $\mathbf{Z}$ and interpret $Z_1, Z_2, \dots$ as our observations and $z_1, z_2, \dots$ as their realized values.

We will use the notation $\lbag z_1, \dots, z_n \rbag$ for a bag (also known as multiset) consisting of elements $z_1, \dots, z_n$. It will be regarded as an equivalence class of sequences $(z_1, \dots, z_n)$, where two sequences are defined to be equivalent when they can be obtained from each other by permuting their elements.

A conformal e-predictor is a measurable function $f$ that maps any finite sequence $(z_1, \dots, z_m)$, for any $m \in \{1, 2, \dots\}$, to a finite sequence $(\alpha_1, \dots, \alpha_m)$ of nonnegative numbers of the same length with average 1,

$$\frac{1}{m} \sum_{i=1}^{m} \alpha_i = 1, \tag{1}$$

that satisfies the following property of equivariance: for any $m \in \{1, 2, \dots\}$, any permutation $\pi$ of $\{1, \dots, m\}$, any $(z_1, \dots, z_m) \in \mathbf{Z}^m$, and any $(\alpha_1, \dots, \alpha_m) \in [0, \infty)^m$,

$$(\alpha_1, \dots, \alpha_m) = f(z_1, \dots, z_m) \Longrightarrow (\alpha_{\pi(1)}, \dots, \alpha_{\pi(m)}) = f(z_{\pi(1)}, \dots, z_{\pi(m)}).$$

In terms of betting [4], $f$ is our bet and $\alpha_i$ shows how strange $z_i$ looks in the bag $\lbag z_1, \dots, z_m \rbag$; for a large $m$ and under the assumption of exchangeability of $z_1, \dots, z_m$, we do not expect $\alpha_i$ to be large for a significant proportion of $z_i$. It will be convenient to abuse the notation by setting

$$f(\lbag z_1, \dots, z_m \rbag, z) := \alpha, \tag{2}$$

where $\alpha$ is the last element of the sequence

$$(\alpha_1, \dots, \alpha_m, \alpha) := f(z_1, \dots, z_m, z).$$

(It is clear that the $\alpha$ in (2) does not depend on the ordering of the sequence $(z_1, \dots, z_m)$.)

With each conformal e-predictor $f$ we can associate the sequence of nonnegative random variables $E_1, E_2, \dots$, where

$$E_n := f(\lbag Z_1, \dots, Z_{n-1} \rbag, Z_n). \tag{3}$$

Intuitively, large values of these random variables are evidence against $Z_1, Z_2, \dots$ being IID.

The Roberts–Shiryaev procedure for nonrandomness detection is the sequence of stopping times $\sigma_0 := 0$ and

$$\sigma_k := \min\left\{ n > \sigma_{k-1} \;\middle|\; \sum_{i=\sigma_{k-1}+1}^{n} E_{\sigma_{k-1}+1} \cdots E_i \ge c \right\}, \quad k = 1, 2, \dots, \tag{4}$$

where $c$ is a parameter of the procedure (typically $c$ is a large number). The idea behind this definition is that we raise alarms at times $\sigma_1, \sigma_2, \dots$ warning the user that the IID assumption may have become violated. If the IID assumption is in fact never violated, we do not want to raise (false) alarms too often. The following proposition is a simple statement of validity. Remember that the sequence of observations $Z_1, Z_2, \dots$ is assumed to be IID, and so all alarms are false.

Proposition 1.
Let $A_n$ be the number of alarms

$$A_n := \max\{ k \mid \sigma_k \le n \} \tag{5}$$

raised by the Roberts–Shiryaev procedure (4) after seeing the first $n$ observations $Z_1, \dots, Z_n$. Then

$$\limsup_{n \to \infty} \frac{A_n}{n} \le \frac{1}{c} \quad \text{in probability}. \tag{6}$$

The conclusion (6) can be spelled out as

$$\forall \epsilon > 0 \;\; \exists N_0 \;\; \forall N \ge N_0: \quad P\left( \frac{A_N}{N} > \frac{1}{c} + \epsilon \right) \le \epsilon. \tag{7}$$

Let us see how the Roberts–Shiryaev procedure is obtained, informally, by reversing its standard counterpart. The conformal e-pseudomartingale corresponding to the random variables (3) is

$$S_n := E_1 \cdots E_n, \quad n = 0, 1, 2, \dots, \tag{8}$$

where $S_0$ is understood to be 1. It is not a genuine martingale since we only have $\mathbb{E} E_n = 1$ for all $n$ instead of $\mathbb{E}(E_n \mid E_1, \dots, E_{n-1}) = 1$. It is not clear what properties of validity the Shiryaev–Roberts procedure would retain if applied to $S_n$.

Instead, we can choose a large $N$ and apply the Shiryaev–Roberts procedure ([5, 3]; we will use the description in [8, (13)]) to the martingale

$$T_n := E_n \cdots E_N, \quad n = N, N-1, \dots, 1.$$

Applied to $T_n$, the procedure divides the time $\{1, 2, \dots\}$ into intervals $(a, b)$ such that, roughly,

$$\sum_{i=a+1}^{b} \frac{T_a}{T_i} \approx c.$$

By definition, this can be rewritten as

$$E_a + E_a E_{a+1} + \cdots + E_a \cdots E_{b-1} \approx c,$$

which motivates our definition (4).

Proof of Proposition 1.
Let us fix $\epsilon > 0$; we need to find an $N_0$ satisfying (7). According to [8, Proposition 4.3], for the conformal Shiryaev–Roberts procedure the inequality in (6) holds almost surely; therefore, it holds in probability. Examination of the proof shows that (7) holds for the general Shiryaev–Roberts procedure (the underlying positive martingale does not have to be a conformal martingale) and, moreover, (7) holds uniformly in that $N_0$ depends only on $\epsilon$ and nothing else. Let us choose such an $N_0$. Fix any $N \ge N_0$.

Now we use the idea of reversing the time formally. Let $\mathcal{F}_n$ be the $\sigma$-algebra generated by the bag $\lbag Z_1, \dots, Z_{n-1} \rbag$ and observations $Z_n, \dots, Z_N$. (Formally, $\mathcal{F}_n$ is the smallest $\sigma$-algebra containing the sets $\{(Z_1, \dots, Z_{n-1}) \in A\}$, $\{Z_n \in A_n\}$, ..., $\{Z_N \in A_N\}$, where $A \subseteq \mathbf{Z}^{n-1}$ is a symmetric measurable set of sequences of length $n - 1$ and $A_n, \dots, A_N \subseteq \mathbf{Z}$ are measurable sets of observations.) We also allow $n = N + 1$, in which case $\mathcal{F}_{N+1}$ is the $\sigma$-algebra generated by the bag $\lbag Z_1, \dots, Z_N \rbag$. Then $(E_n, \mathcal{F}_n)$, $n = N, \dots, 1$, is a stochastic sequence (meaning that each $E_n$ is $\mathcal{F}_n$-measurable); moreover, it is a martingale ratio, in the sense

$$\mathbb{E}(E_n \mid \mathcal{F}_{n+1}) = 1, \quad n = N, \dots, 1.$$

The corresponding martingale is $(T_n, \mathcal{F}_n)$, $n = N + 1, \dots, 1$, where

$$T_n := E_n \cdots E_N, \quad n = N + 1, N, \dots, 1,$$

with $T_{N+1}$ understood to be 1. For simplicity, let us assume that all $E_n$ are positive, so that $T_n$ is a positive martingale.

Let us apply the Shiryaev–Roberts procedure to the martingale $T_{N+1}, \dots, T_1$. It gives us the stopping times $\tau_0 := N + 1$ and

$$\tau_k := \max\left\{ n < \tau_{k-1} \;\middle|\; \sum_{i=n}^{\tau_{k-1}-1} E_n \cdots E_i \ge c \right\}, \quad k = 1, 2, \dots, \tag{9}$$

where $\max \emptyset := 0$. Let $A'_N$ be the largest $k$ such that $\tau_k > 0$; in words, $A'_N$ is the total number of alarms raised by the Shiryaev–Roberts procedure.

Notice that each set $\{\sigma_k + 1, \dots, \sigma_{k+1}\}$ with $\sigma_{k+1} \le N$ contains at least one stopping time $\tau_j$. This can be deduced from

$$\sum_{i=\sigma_k+1}^{\sigma_{k+1}} E_{\sigma_k+1} \cdots E_i \ge c. \tag{10}$$

Indeed, let $j$ be the largest number satisfying $\tau_j > \sigma_{k+1}$ (our goal is to show that $\tau_{j+1} \ge \sigma_k + 1$). The inequality (10) implies

$$\sum_{i=\sigma_k+1}^{\tau_j-1} E_{\sigma_k+1} \cdots E_i \ge \sum_{i=\sigma_k+1}^{\sigma_{k+1}} E_{\sigma_k+1} \cdots E_i \ge c,$$

which in combination with (9) implies, in turn, that indeed $\tau_{j+1} \ge \sigma_k + 1$.

The argument of the previous paragraph shows that $A_N \le A'_N$. Therefore, the outer inequality in (7) holds once it holds for $A'_N$ in place of $A_N$ (which we know to be true).

A procedure that is even more popular than the Shiryaev–Roberts procedure in change detection is Page's CUSUM procedure [2], which can be obtained from Shiryaev–Roberts by replacing $\sum$ with $\max$ [8, Section 4]. The MUSUC procedure is the sequence of stopping times defined by $\sigma_0 := 0$ and (4) with $\max$ in place of $\sum$. Notice that this definition can be simplified by dropping the $\max$. We can say, equivalently, that the MUSUC procedure is the sequence of stopping times $\sigma_0 := 0$ and

$$\sigma_k := \min\left\{ n > \sigma_{k-1} \mid E_{\sigma_{k-1}+1} \cdots E_n \ge c \right\}, \quad k = 1, 2, \dots. \tag{11}$$

Notice that, if (8) were a genuine martingale, the conditional probability that the inequality in (11) holds for some $n$ would not exceed $1/c$; it is a version of Ville's inequality [6, p. 100]. But since (8) is not necessarily a martingale, we only have the following weaker statement analogous to Proposition 1.

Proposition 2.
Let $A_n$ be the number (5) of alarms raised by the MUSUC procedure (11) after seeing the first $n$ observations $Z_1, \dots, Z_n$. Then (6) holds.

Proof. The usual relation between the CUSUM and Shiryaev–Roberts procedures with the same parameter $c$ is that the latter raises alarms more often than the former (see, e.g., [8], proof of Corollary 4.2). This relation still holds for the MUSUC and Roberts–Shiryaev procedures (although it becomes slightly less obvious), which, in combination with Proposition 1, implies Proposition 2.

Let us check this relation. Formally, the relation is that $\sigma_k \le \sigma'_k$ for all $k$, where $\sigma_k$ (resp. $\sigma'_k$) is the time of the $k$th alarm raised by Roberts–Shiryaev (resp. MUSUC). Suppose it does not hold and let $k$ be the smallest number such that $\sigma_k > \sigma'_k$. It is obvious that $k > 1$. By the minimality of $k$, $\sigma_{k-1} \le \sigma'_{k-1}$. It is obvious that, in this case, $\sigma_{k-1} < \sigma'_{k-1}$. It is only possible if

$$\prod_{i=\sigma_{k-1}+1}^{\sigma'_{k-1}} E_i < 1 \tag{12}$$

(since otherwise $\sigma_k \le \sigma'_k$). However, (12) contradicts the definition of MUSUC; in this case we would have $\sigma'_{k-1} \le \sigma_{k-1}$.

4 Comparison with methods based on conformal martingales
The only existing approach to detecting nonrandomness online is based on conformal prediction; see [9, Section 7.1] and [1, 8]. The approach of this paper is based on conformal e-prediction. The two approaches are very different, and it is unlikely that either of them will be better in all interesting applications. These are some differences:

• Design of conformal martingales involves two distinct steps: using a conformity measure to obtain p-values and then betting against those p-values. Conformal e-pseudomartingales do not involve such a rigid division and thus appear to be more flexible.

• On the other hand, when betting on the $n$th step against the $n$th p-value $p_n$, $n = 1, 2, \dots$, conformal martingales may use the previous p-values $p_1, \dots, p_{n-1}$. Such dependence on the past is not allowed for conformal e-pseudomartingales.

• Conformal martingales are randomized (without randomization we only obtain conformal supermartingales) whereas conformal e-pseudomartingales do not require randomization (it is optional and not used in this note).
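To make the first and third differences concrete, here is a minimal Python sketch of one possible conformal e-predictor. The score function (one plus the squared deviation from the bag mean) and the names `conformal_e_predictor` and `e_value` are illustrative assumptions, not taken from this note or from [7]. The sketch produces the numbers $\alpha_i$ directly, with no intermediate p-value step and no randomization.

```python
from math import fsum


def conformal_e_predictor(zs):
    """Map (z_1, ..., z_m) to nonnegative (alpha_1, ..., alpha_m) with average 1.

    Hypothetical choice of score: 1 + squared deviation from the bag mean.
    The scores depend only on the bag and on z_i, so permuting the inputs
    permutes the outputs in the same way (equivariance).
    """
    m = len(zs)
    mean = fsum(zs) / m
    scores = [1.0 + (z - mean) ** 2 for z in zs]  # strictly positive
    total = fsum(scores)
    # Normalizing by total / m makes the average of the alphas equal to 1.
    return [m * s / total for s in scores]


def e_value(bag, z):
    """The abuse of notation (2): the alpha assigned to z appended to the bag."""
    return conformal_e_predictor(list(bag) + [z])[-1]


if __name__ == "__main__":
    zs = [0.1, 2.0, -1.3, 0.4]
    alphas = conformal_e_predictor(zs)
    print(alphas, sum(alphas) / len(alphas))  # the average is 1 up to rounding
    print(e_value(zs, 10.0))  # an outlying new observation gets a large alpha
```

Any other positive, permutation-equivariant score could be substituted; the normalization step is what enforces condition (1).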
As discussed in Section 1, this note only establishes simple validity results. The efficiency, in the sense of raising an alarm soon after the change, is an interesting topic of further research, theoretical or experimental (simulation or empirical studies).

Another interesting direction is to establish non-asymptotic validity results.
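As a starting point for the kind of simulation study mentioned above, the following Python sketch implements the alarm times (4) and (11) and compares the observed frequency of false alarms on simulated IID data with $1/c$. It is only a sketch under stated assumptions: the e-predictor (one plus the squared deviation from the bag mean, normalized to average 1), the Gaussian data, and all function names are hypothetical choices made for illustration. Propositions 1 and 2 are asymptotic, so nothing is guaranteed for a finite sample.

```python
import random
from math import fsum


def e_value(bag, z):
    """E_n as in (3), for a hypothetical score 1 + (x - bag mean)^2."""
    zs = list(bag) + [z]
    mean = fsum(zs) / len(zs)
    scores = [1.0 + (x - mean) ** 2 for x in zs]
    return len(zs) * scores[-1] / fsum(scores)


def e_values(zs):
    """The list [E_1, ..., E_N] for observations z_1, ..., z_N."""
    return [e_value(zs[:n], zs[n]) for n in range(len(zs))]


def roberts_shiryaev_alarms(es, c):
    """Alarm times sigma_1 < sigma_2 < ... of (4); times are 1-based."""
    alarms = []
    prod, total = 1.0, 0.0
    for n, e in enumerate(es, start=1):
        prod *= e      # product of the E's since the last alarm, up to time n
        total += prod  # the sum over i appearing in (4)
        if total >= c:
            alarms.append(n)
            prod, total = 1.0, 0.0  # restart after the alarm
    return alarms


def musuc_alarms(es, c):
    """Alarm times of the MUSUC procedure (11); times are 1-based."""
    alarms = []
    prod = 1.0
    for n, e in enumerate(es, start=1):
        prod *= e  # product of the E's since the last alarm, up to time n
        if prod >= c:
            alarms.append(n)
            prod = 1.0  # restart after the alarm
    return alarms


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducibility
    zs = [rng.gauss(0.0, 1.0) for _ in range(400)]  # IID, so all alarms are false
    es = e_values(zs)
    for c in (5.0, 20.0):
        rs = roberts_shiryaev_alarms(es, c)
        mu = musuc_alarms(es, c)
        print(f"c={c}: Roberts-Shiryaev {len(rs) / len(zs):.4f}, "
              f"MUSUC {len(mu) / len(zs):.4f}, bound 1/c = {1 / c:.4f}")
```

To study efficiency rather than validity, one could replace the IID sample by a sequence whose distribution changes at a chosen time and record the delay until the first alarm after the change.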
Acknowledgments
This research was partially supported by Amazon, Astra Zeneca, and Stena Line.
References

[1] Valentina Fedorova, Ilia Nouretdinov, Alex Gammerman, and Vladimir Vovk. Plug-in martingales for testing exchangeability on-line. In John Langford and Joelle Pineau, editors, Proceedings of the Twenty Ninth International Conference on Machine Learning, pages 1639–1646. Omnipress, 2012.

[2] Ewan S. Page. Continuous inspection schemes. Biometrika, 41:100–115, 1954.

[3] S. W. Roberts. A comparison of some control chart procedures. Technometrics, 8:411–430, 1966.

[4] Glenn Shafer. The language of betting as a strategy for statistical and scientific communication. Technical Report arXiv:1903.06991 [math.ST], arXiv.org e-Print archive, March 2019. To appear as discussion paper in the Journal of the Royal Statistical Society A, and to be read in September 2020.

[5] Albert N. Shiryaev. On optimum methods in quickest detection problems. Theory of Probability and Its Applications, 8:22–46, 1963.

[6] Jean Ville. Etude critique de la notion de collectif. Gauthier-Villars, Paris, 1939.

[7] Vladimir Vovk. Cross-conformal e-prediction. Technical Report arXiv:2001.05989 [stat.ME], arXiv.org e-Print archive, January 2020.

[8] Vladimir Vovk. Testing randomness. Technical Report arXiv:1906.09256 [math.PR], arXiv.org e-Print archive, March 2020.

[9] Vladimir Vovk, Alex Gammerman, and Glenn Shafer.