Optimal Sequential Detection of Signals with Unknown Appearance and Disappearance Points in Time
Alexander G. Tartakovsky, Nikita R. Berenkov, Alexei E. Kolessa, Igor V. Nikiforov
IIEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. , NO. , 2021 1
Optimal Sequential Detection of Signals withUnknown Appearance and Disappearance Points inTime
Alexander G. Tartakovsky,
Senior Member, IEEE , Nikita R. Berenkov, Alexei E. Kolessa, andIgor V. Nikiforov
Abstract —The paper addresses a sequential changepoint de-tection problem, assuming that the duration of change may befinite and unknown. This problem is of importance for manyapplications, e.g., for signal and image processing where signalsappear and disappear at unknown points in time or space.In contrast to the conventional optimality criterion in quickestchange detection that requires minimization of the expected delayto detection for a given average run length to a false alarm,we focus on a reliable maximin change detection criterion ofmaximizing the minimal probability of detection in a giventime (or space) window for a given local maximal probabilityof false alarm in the prescribed window. We show that theoptimal detection procedure is a modified CUSUM procedure. Wethen compare operating characteristics of this optimal procedurewith popular in engineering the Finite Moving Average (FMA)detection algorithm and the ordinary CUSUM procedure usingMonte Carlo simulations, which show that typically the lateralgorithms have almost the same performance as the optimalone. At the same time, the FMA procedure has a substantialadvantage – independence to the intensity of the signal, whichis usually unknown. Finally, the FMA algorithm is applied todetecting faint streaks of satellites in optical images.
Index Terms —Sequential Changepoint Detection; UnknownAppearance and Disappearance Times; Probability of Detection;Probability of False Alarm; Optimal Stopping; Detection ofObject Traces.
I. I
NTRODUCTION C Hangepoint detection problems arise in a variety ofapplications that are described in detail in [35]. In mostcases, the quickest change detection is considered where onehas to detect a change as soon as possible, i.e., with theminimal average delay to detection for a given false alarm rate(see, e.g., [35] and references therein). However, in certainapplications, it is of interest to consider a reliable change
The work was supported in part by the grant 18-19-00452 from the RussianScience Foundation at the Moscow Institute of Physics and Technology.A. G. Tartakovsky is a Deputy Head of the Space informatics Laboratoryat the Moscow Institute of Physics and Technology, Russia and President ofAGT StatConsult, Los Angeles, California, USA; e-mail: [email protected]. R. Berenkov is a postgraduate student at the Moscow Institute of Physicsand Technology, Russia; email: [email protected]. E. Kolessa is the Principal Scientist in the Space informatics Labo-ratory at the Moscow Institute of Physics and Technology, Russia; e-mail:[email protected]. V. Nikiforov is Emeritus Professor at the Universit´e de Technologie deTroyes, Troyes Cedex, France; e-mail: [email protected] received 2020; revised 2021Copyright (c) 2021 IEEE. Personal use of this material is permitted.However, permission to use this material for any other purposes must beobtained from the IEEE by sending a request to [email protected]. detection when it is needed to maximize a probability ofdetection at a certain time (or space) interval for a givenprobability of false alarm. For instance, in surveillance sys-tems such as radars, sonars, and electro-optic/infrared sensorsystems, which deal with detecting moving and maneuveringobjects that appear and disappear at unknown times, it isnecessary to detect a signal from a randomly appearing targetin clutter and noise with the maximal detection probability [2],[16], [25], [34]. Also, examples include but are not limitedto: (a) aircraft navigation with many safety-critical modes(landing, takeoff, etc.) [1], where the minimum operationalperformance specifies the required time-to-alert, the worst-casemissed detection probability and the worst-case probability offalse alarm during a given period; and (b) cyber-security [4],[32], [36], [38] when there are malicious intrusion attempts incomputer networks which incur significant financial damageand cause severe harm to the integrity of personal information.In these cases, it is essential to devise effective techniques todetect anomalies in observations reliably so that an appropriateresponse can be provided and the negative consequencesare mitigated. In these and other applications, the statisticalbehavior of observed data is dynamic, so it is of importanceto detect transient changes. For example, after an outage in thepower systems, the system’s transient behavior is dominatedby the generators’ inertial response.In this paper, we address the problem of detecting a changethat has a finite duration. The problem of detecting transientchanges with known and unknown durations has been con-sidered in [6]–[8], [18], [24], [30], [31]. In particular, articles[7], [8] establish the asymptotic performance of the window-limited CUSUM procedure as the false alarm probability goesto for detection of transient changes with known duration.However, the issue of optimality or asymptotic optimality isstill open. The problem of detection of transient and movinganomalies has also been considered in papers [5], [23], [26],[27], [39] but in terms of quickest change detection.The rest of the paper is organized as follows. In Section II,we describe the stochastic model, which is treated in the paper,as well as the optimality criteria. In Section III, we find the op-timal detection procedure that maximizes detection probabilityin the worst-case scenario in the class of detection procedureswith the given local false alarm probability (in a certainwindow), assuming that the duration of a change (or windowsize) is random and distributed with the geometric distribution.This procedure turns out to be the modified Cumulative Sum a r X i v : . [ m a t h . S T ] F e b IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. , NO. , 2021 (CUSUM) rule. Proof of optimality (see Theorem 1) is basedon the optimal stopping results established in the paper by Poor[22]. In Section IV, two alternative detection procedures areintroduced – the Finite Moving Average (FMA) procedure andthe conventional CUSUM procedure. In Section V, we providethe results of Monte Carlo simulations for the Gaussianmodel which show that the FMA and the CUSUM procedureshave almost the same operating characteristics as the optimalprocedure. This allows us to suggest using the FMA procedurein Section VI in an important practical problem of detectingstreaks of space objects with unknown position (beginningand end) in 2-D images obtained by telescopes. Section VIIconcludes the paper.II. T HE S TOCHASTIC M ODEL AND O PTIMALITY C RITERIA
Suppose there is a sequence of independent observations { Y n } n (cid:62) , observed sequentially in time subject to a changeat an unknown time ν ∈ { , , , . . . } , which lasts till thetime ν + N so that Y , . . . , Y ν and Y N + ν +1 , Y N + ν +2 , . . . aregenerated by one stochastic model and Y ν +1 , Y ν +2 , . . . , Y ν + N by another model. Throughout the paper, ν is treated asunknown and nonrandom, and N can be either unknowndeterministic (as in Section VI) or random (as in Section III).See also Remark 2. The joint density p ( Y n | H ν,N ) of thevector Y n = ( Y , . . . , Y n ) observed up to time n under thehypothesis H ν,N that the change happens at the time ν andends at N is of the form p ( Y n | H ν,N ) = p ( Y n | H ∞ ) = n (cid:89) t =1 g ( Y t ) for ν (cid:62) n,p ( Y n | H ν,N ) = ν (cid:89) t =1 g ( Y t ) × n (cid:89) t = ν +1 f ( Y t ) for ν < n (cid:54) ν + N,p ( Y n | H ν,N ) = ν (cid:89) t =1 g ( Y t ) × ν + N (cid:89) t = ν +1 f ( Y t ) × n (cid:89) t = ν + N +1 g ( Y t ) for n > ν + N, (1)where g ( Y t ) and f ( Y t ) are pre- and post-change densities,respectively. The event { ν = ∞} and the correspondinghypothesis H ∞ : ν = ∞ mean that there never is a change.Notice that the model (1) implies that Y ν +1 is the first post-change observation under hypothesis H ν,N .A sequential changepoint detection procedure T is a stop-ping time associated with the time of alarm on change.Conventional quickest detection optimality criteria requireminimizing the average delay to detection for a given falsealarm rate at an infinite time horizon (assuming N = ∞ )and do not consider a probability of detection of a change ina given fixed time interval [15], [20], [28], [29], [35]. Often,however, practitioners are interested in such probabilities undera given false alarm rate even if the change lasts infinitelylong. Besides, in many applications, the length of a change In practice, this means that the length of a change is substantially largerthan an average detection delay. N is finite, e.g., in problems of detecting transient changeswith known and unknown durations [7], [8], [24], [30], [31].Then stopping outside of the interval ( ν, ν + N ] of a givenduration N may not be quite appropriate. For example, in thecontext of safety-critical systems, serious degradation of thesystem security occurs when the transient change is detectedwith a delay greater than a required time-to-alert. Therefore,the probability of detection of a change within a given fixedtime interval should be used instead of the average delay todetection. In such cases, it is reasonable to find detection rulesthat maximize the probability of detection in a certain fixedtime interval ( ν, ν + M ] , M (cid:54) N , and to consider the followingoptimality criterion: Find a rule T opt ∈ C ( m, α ) such that forevery < α < and some m (cid:62) ν ∈ Z + ess inf P ν ( T opt (cid:54) ν + M | Y ν , T opt > ν )= sup T ∈ C ( m,α ) inf ν ∈ Z + ess inf P ν ( T (cid:54) ν + M | Y ν , T > ν ) , (2)where C ( m, α ) = (cid:40) T : sup (cid:96) ∈ Z + P ∞ ( T (cid:54) (cid:96) + m | T > (cid:96) ) (cid:54) α (cid:41) is the class of detection procedures for which the localmaximal probability of false alarm LPFA m ( T ) = sup (cid:96) ∈ Z + P ∞ ( T (cid:54) (cid:96) + m | T > (cid:96) ) in a time interval of a fixed length m (cid:62) does not exceeda predefined level α ∈ (0 , . Hereafter P ν denotes theprobability under which the change occurs at ν ∈ Z + and P ∞ when the change never happens; Z + = { , , , . . . } states for the set of nonnegative integers. Also, ess inf and ess sup denote essential infimum and essential supremum,respectively.Solving the optimization problem (2) for any fixed M (except for M = 1 ) is very difficult (see [17], [21], [33] forsome optimal properties established for the case M = 1 ). Evenin an asymptotic setting as α → this problem is still open.Assume now that M , M (cid:54) N is finite and random with thegiven distribution π i = Pr( M = i ) , i = 1 , , . . . . In particular,this assumption is reasonable when M = N and the unknownduration of a change N is a nuisance parameter, i.e., the factof change disappearance does not have to be detected. Theresults also valid when the change is persistent, i.e., N = ∞ ,but still, the goal is to maximize the probability of detectionin a time window of a random size M .Introduce the probability measure ¯ P ν ( I × A ) = (cid:88) i ∈I π i P ν ( A| M = i ) . Then the probability of detection is re-written as ¯ P ν ( T (cid:54) ν + M | Y ν , T > ν )= ∞ (cid:88) i =1 π i P ν ( T (cid:54) ν + i | Y ν , T > ν, M = i ) , (3) In general, m and M are different. ARTAKOVSKY, BERENKOV, KOLESSA, NIKIFOROV: OPTIMAL SEQUENTIAL DETECTION OF SIGNALS 3 and the optimality criterion (2) gets modified as inf ν ∈ Z + ess inf ¯ P ν ( T opt (cid:54) ν + M | Y ν , T opt > ν )= sup T ∈ C ( m,α ) inf ν ∈ Z + ess inf ¯ P ν ( T (cid:54) ν + M | Y ν , T > ν ) . (4)In the next section, we provide a solution to this problemfor the geometric distribution π i .III. A N O PTIMAL D ETECTION P ROCEDURE
Let π i be the geometric distribution Geom( (cid:37) ) with theparameter (cid:37) ∈ (0 , : π i = (cid:37) (1 − (cid:37) ) i − , i = 1 , , . . . . Let Λ n = f ( Y n ) /g ( Y n ) ( n = 1 , , . . . ) be the likelihood ratioand introduce the statistic V (cid:37) ( n ) by the recursion V (cid:37) ( n ) = max { , V (cid:37) ( n − } Λ n (1 − (cid:37) ) , n (cid:62) (5)with the initial condition V (cid:37) (0) = 1 as well as the associatedstopping time T (cid:37) ( B ) = inf { n (cid:62) V (cid:37) ( n ) (cid:62) B } , B > . (6)This kind of statistic first appears in the paper by Poor [22]who considered the exponential delay function in the minimax“quickest” detection problem. Also, his results will be used inthe proof of Theorem 1.Let E ν and E ∞ denote expectations under probability mea-sures P ν and P ∞ , respectively, where P ν corresponds to model(1) with an assumed value of the change point ν .The following theorem, whose proof is given in the Ap-pendix, establishes the structure of the optimal detectionprocedure. Theorem 1.
Let observations { Y n } n (cid:62) be independent with adensity g ( x ) pre-change and with a density f ( x ) post-change.Suppose the distribution of the window size M is Geom( (cid:37) ) .Further, assume that the P ∞ -distribution of the likelihood ratio Λ = f ( Y ) /g ( Y ) is continuous and that P ∞ { Λ > (1 − (cid:37) ) − } = 1 . Then the change detection rule T (cid:37) ( B ) defined in (6) with the statistic V (cid:37) ( n ) given by the recursion (5) and withthreshold B = B m,α that satisfies sup (cid:96) ∈ Z + P ∞ { T (cid:37) ( B ) (cid:54) (cid:96) + m | T (cid:37) > (cid:96) } = α (7) is maximin optimal in the problem (4) for all < α < . Note that statistic V (cid:37) ( n ) is the maximal weighted likelihoodratio V (cid:37) ( n ) = max (cid:54) k (cid:54) n n (cid:89) j = k (1 − (cid:37) )Λ j , so the optimal rule (6) is nothing but a modified CUSUM rulewith an additional factor − (cid:37) . If the distribution P ∞ (Λ (cid:54) y ) is not continuous the assertion of Theorem 1 holds for arandomized procedure with a randomization on the boundary B . In the standard CUSUM procedure (cid:37) = 0 . Remark 1.
As follows from the proof, the detection algorithm(5)–(6) is also optimal in the class of procedures subject tothe constraint on the Average Run Length to False Alarm(ARL2FA) E ∞ [ T ] (cid:62) γ since class C ( m, α ) is more stringentthan C γ = { T : E ∞ [ T ] (cid:62) γ } for some appropriately selected γ = γ ( m, α ) (see Lemma 1 in the Appendix). However,ARL2FA makes sense only if the P ∞ -distribution of stoppingtimes of detection procedures is geometric or close to geomet-ric – asymptotically exponential. Asymptotic exponentialityproperty holds for many detection procedures with Markovdetection statistics [19]. However, we do not know whetherthis is correct for the FMA procedure considered below. Ifthe no-change distribution of stopping times is not close togeometric, then a large value of ARL2FA does not guaranteea small value of the maximal local false alarm probability LPFA m ( T ) , which is usually a necessary property in practice.A detailed discussion of this issue may be found in [37]. Remark 2.
The assertions of Theorem 1 hold in two cases:(a) when the window size is equal to the unknown changeduration, M = N ∼ Geom( (cid:37) ) , and (b) when the changeis persistent, N = ∞ , and M ∼ Geom( (cid:37) ) . However, thelatter case has perhaps only theoretical rather than practicalsignificance. Remark 3.
The time index n in all previous formulas canbe replaced by the argument of the vector-valued function ofposition ( x i , y j ) , as it is done in Section VI for the problemof detecting objects in two-dimensional images.IV. A LTERNATIVE D ETECTION P ROCEDURES
In this and the next section, we set M = N , i.e., we assumethat the window size M is equal to the signal duration N .While detection procedure T (cid:37) given by (5)–(6) is strictlyoptimal, it requires a strong assumption on the geometric dis-tribution of the signal duration N . A more practical approachis to use the procedures in a fixed sliding window with size L . In papers [7], [8], a window-limited CUSUM procedure T = inf (cid:40) n ≥ L : max ≤ k ≤ L (cid:34) n (cid:88) t = n − k +1 λ t − A ( k ) (cid:35) ≥ (cid:41) , (8)where λ t = log Λ t is the log-likelihood ratio, has beenproposed for this purpose. It was shown that the Finite MovingAverage procedure given by the stopping time T FMA ( a ) = inf (cid:40) n (cid:62) L : n (cid:88) t = n − L +1 λ t (cid:62) a (cid:41) (9)and the window-limited CUSUM procedure T with a specific(optimal) variable threshold A ( k ) , minimizing the worst-casemissed detection probability, have the same asymptotic per-formance as the maximal probability of false alarm α → .Window-limited procedures were also considered by Lai in[13] and shown to be asymptotically optimal in the quickestchange detection problem with persistent changes for minimiz-ing average detection delay for i.i.d. and non-i.i.d stochasticmodels.Another reasonable method is a simple CUSUM procedure.It is easy to show that maximizing the likelihood ratio over IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. , NO. , 2021 the unknown points of change appearance and disappearance( ν and N , respectively) leads to Page’s CUSUM statistic: V ( n ) = max ν (cid:62) max N (cid:62) p ( Y n | H ν,N ) p ( Y n | H ∞ )= max { , V ( n − } Λ n , n (cid:62) . (10)Hence, define the CUSUM procedure by the stopping time: T CS ( C ) = inf { n (cid:62) V ( n ) (cid:62) C } , C > . Remark 4.
If the LLR λ t is a monotone function of thestatistic S t , then the FMA procedure can be written as T FMA (˜ a ) = inf (cid:40) n (cid:62) L : n (cid:88) t = n − L +1 S t (cid:62) ˜ a (cid:41) . (11)If the post-change distribution depends on an unknown pa-rameter θ , then λ t = λ t ( θ ) and the optimal modified CUSUM T (cid:37) ( B ) as well as the ordinary CUSUM T CS ( C ) depend on θ .Therefore, they are sensitive to the mismatch of the true valueand assumed values of θ , while the FMA structure does notdepend on θ . We expect that the FMA procedure maximizes(approximately) the probability of detection PD θ uniformlyfor all parameter values.In the next section, we compare the performance of detec-tion procedures T FMA (˜ a ) and T CS ( C ) with the optimal one.V. M ONTE C ARLO S IMULATIONS
We stress that in simulations the time window size M = N is assumed random with the geometric distribution, N ∼ Geom( (cid:37) ) . The window’s length L in the FMA rule is fixedand selected as L = E [ N ] = 1 /(cid:37) .Consider the standard signal-plus-noise model Y n = θ { ν
In simulations, we use the following parameters:1) The mean after changepoint: θ = 2 . , . .2) The tuning parameter (cid:37) of the modified CUSUM rule: . , . , . ( L = 5 , , , respectively).3) The local PFA LPFA m ( T ) : . .4) The window length m for the local PFA: , .5) The number of Monte Carlo repetitions: · .The results of comparing the performance of the optimalmodified CUSUM and the FMA rules are shown in Table I inthe case where both rules are tuned to the same true parametervalue θ . TABLE ID
ETECTION CHARACTERISTICS OF THE MODIFIED
CUSUM
RULE ANDTHE
FMA
RULE θ = 2 . , m = 20 (cid:37) PD ( T (cid:37) ) PD ( T FMA ) θ = 2 . , m = 80 PD ( T (cid:37) ) PD ( T FMA ) θ = 1 . , m = 20 (cid:37) PD ( T (cid:37) ) PD ( T FMA ) θ = 1 . , m = 80 (cid:37) PD ( T (cid:37) ) PD ( T FMA ) It is seen from Table I that the modified CUSUM ruleoutperforms the FMA rule in this setting, as expected sinceby Theorem 1 it is strictly optimal. However, the difference inperformance is not dramatic. It is small for (cid:37) = 0 . and . .For practical purposes, it is of interest to compare theperformance of the modified CUSUM rule against the FMArule under model mismatch. There are two types of mismatch:(a) the duration N of a transient change differs from theduration defined in the detection algorithm, and (b) the truesignal amplitude θ r differs from the assumed value θ used inthe detection algorithm.We first compare the modified CUSUM rule against theFMA rule under the transient change duration mismatch,assuming that N is fixed. The tuning parameter of the modifiedCUSUM rule is given as (cid:37) = 1 /N and the FMA window’slength is L = N . The results of this comparison are givenin Table II. Now, the FMA rule outperforms the modifiedCUSUM rule. However, the modified CUSUM rule retainsthe competitive characteristics.Second, we consider the parameter θ mismatch, i.e., wecompare the performance of the modified CUSUM and FMArules when the true signal intensity θ r differs from the assumedvalue θ . Specifically, the true model has the form Y n = θ r { ν TABLE IID ETECTION CHARACTERISTICS OF THE MODIFIED CUSUM AND FMA RULES FOR MISMATCH IN CHANGE DURATION θ = 2 . , m = 20 N PD ( T (cid:37) ) PD ( T FMA ) θ = 2 . , m = 80 N PD ( T (cid:37) ) PD ( T FMA ) θ = 1 . , m = 20 N PD ( T (cid:37) ) PD ( T FMA ) θ = 1 . , m = 80 N PD ( T (cid:37) ) PD ( T FMA ) (cid:37) = 0 . , N ∼ Geom( (cid:37) ) , L = E [ N ] = 10 , LPFA m =0 . , m = 20 and the assumed signal intensity values are θ = 2 . , . , . , . , . . TABLE IIID ETECTION CHARACTERISTICS OF THE MODIFIED CUSUM AND FMA RULES WITH MODEL MISMATCH IN SIGNAL INTENSITY θ = 2 . θ r PD ( T (cid:37) ) PD ( T FMA ) θ = 1 . θ r PD ( T (cid:37) ) PD ( T FMA ) θ = 1 . θ r PD ( T (cid:37) ) PD ( T FMA ) θ = 1 . θ r PD ( T (cid:37) ) PD ( T FMA ) θ = 1 . θ r PD ( T (cid:37) ) PD ( T FMA ) It follows from Table III that PD ( T FMA ) depends on the trueparameter value θ r but not on the assumed value θ . Very smalldifferences in different rows occur only because of statisticalerrors of Monte Carlo simulations. This is expected since thestructure of the FMA rule does not depend on the assumedparameter value θ . See (11) in Remark 4 where S t = Y t inour example. Thus, the detection probability PD ( T FMA ) is onlya function of θ r , but not of θ . On the contrary, the probabilityof detection PD ( T (cid:37) ) varies as a function of the assumed signalintensity θ because the structure of the modified CUSUMrule depends on the assumed value of θ . Hence, the detectionprobability PD ( T (cid:37) ) is a function of both θ r and θ .We can therefore conclude that in the sense of sensitivitywith respect to the mismatch between true and assumed parameter values the FMA rule has an advantage over themodified CUSUM rule. B. Comparison of CUSUM and Modified CUSUM Rules The results of the comparison of the optimal modifiedCUSUM rule T (cid:37) against the conventional CUSUM rule T CS are shown in Tables IV and V. It is seen that the operatingcharacteristics of these rules are almost the same. The optimalone only slightly outperforms the conventional one in all testedcases.We iterate that in contrast to the FMA rule structures ofboth rules depend on the assumed parameter value θ . TABLE IVD ETECTION CHARACTERISTICS OF THE MODIFIED CUSUM ANDCONVENTIONAL CUSUM RULES ( N ∼ Geom( (cid:37) ) , LPFA m = 0 . , θ = 2 . ) m = 20 (cid:37) PD ( T (cid:37) ) PD ( T CS ) m = 60 (cid:37) PD ( T (cid:37) ) PD ( T CS ) m = 100 (cid:37) PD ( T (cid:37) ) PD ( T CS ) ETECTION CHARACTERISTICS OF THE MODIFIED CUSUM ANDCONVENTIONAL CUSUM RULES ( N ∼ Geom( (cid:37) ) , LPFA m = 0 . , θ = 1 . ) m = 20 (cid:37) PD ( T (cid:37) ) PD ( T CS ) m = 60 (cid:37) PD ( T (cid:37) ) PD ( T CS ) m = 100 (cid:37) PD ( T (cid:37) ) PD ( T CS ) VI. A PPLICATION TO D ETECTION OF S TREAKS OF S PACE O BJECTS Extracting streaks of faint space objects with unknownorbits from digital frames, captured with ground-based tele-scopes, is an important problem for Space Informatics. Avariety of methods is employed using intra-frame as well asinter-frame data processing (see, e.g., [3], [9], [11], [12], [14]).In this section, the FMA procedure, described in Section IV, isapplied to the detection of a satellite streak with a low signal-to-noise ratio (SNR) in the digital frame (see Fig. 1).We use a two-stage approach. As our problem is similarto the changepoint detection problem (since the distribution IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. , NO. , 2021 Fig. 1. Digital frame with a low-contrast streak (SNR ≈ ). Red rectanglemarks the streak position. of observations changes abruptly when the streak starts andends), at the first stage, we use the FMA procedure (9), whichwill be re-written for the 2-D space case. Since FMA detectsstreaks with a delay (i.e., we can only localize streaks), at thenext stage, we use maximum likelihood estimation to calculatethe streak position more accurately.When operating with real frames situation is aggravatedby the presence of stars and background that produce strongdiscrete clutter. In this case, special preprocessing for clutterremoval have to be implemented (see, e.g., [34]). For ourproblem, we consider that the input frame is free from clutterand contains only one streak and Gaussian noise after prepro-cessing, independent from pixel to pixel.To be specific, let the satellite have a linear and uniformmotion in the frame. The satellite streak is given by the vector X = ( x , y , x , y ) , where ( x , y ) corresponds to the startpoint and ( x , y ) corresponds to the endpoint.Hence, consider the following model of the observation Y i,j in pixel ( i, j ) of the 2-D frame [10]: Y i,j = AS i,j ( X ) + (cid:15) i,j , (12)where A is a signal intensity from the object, { S i,j ( X ) } arevalues of the model profile of the streak that are calculatedbeforehand assuming point spread function (PSF) is Gaus-sian with a certain effective width (see Fig. 2); and (cid:15) i,j isGaussian noise after preprocessing with zero mean and known(estimated empirically) local variance σ . Thus, the obser-vation Y i,j has normal (“pre-change”) distribution g ( Y i,j ) = N (0 , σ ) when the streak does not “cover” pixel ( i, j ) and nor-mal (“post-change”) distribution f ( Y i,j ) = N ( AS i,j ( X ) , σ ) when the streak “covers” the pixel ( i, j ) .The problem is to detect the streak with the maximalprobability of detection as well as to find the most accurateestimate ˆ X = ( ˆ x , ˆ y , ˆ x , ˆ y ) of the streak position or to makea decision that streak does not exist in the frame. Remark 5. It is worth noting that the problem of extractionof objects’ steaks can be solved non-sequentially as a fixedsample size joint hypothesis testing and estimation problem,using, e.g., the generalized likelihood ratio hypothesis (GLR)test. However, the GLR test requires testing too many hypothe-ses since neither direction nor position of streaks is known. As a result, the GLR test is usually very time-consuming.The computational complexity of the proposed changepointdetection (with further estimation) algorithm is quite low. Thisis an mportant advantage over the GLR test. Fig. 2. Model profile of the streak with the length of 80 px. A. Stage 1: Detection and Localization of the Streak In what follows, we restrict our attention to intra-framedetection of faint streaks of only subequatorial satellites withunknown orbits on frames taken with telescopes mounted atthe equator. Thus, a signal from each satellite (with unknownintensity, start and end points) is located almost vertically ina small area at the center of the frame shown in Fig. 3. Fig. 3. Search area is blue. Dotted line shows one of the possible directions.White rectangle – streak, red rectangle – sliding window. Let Ω S denote streak search area. We select a step of . pxin the upper and lower borders of the search area Ω S to definea set of directions inside Ω S . Let d denote a certain direction.Let M d ( k ) , k (cid:62) stand for a 2-D sliding rectangular windowwhich contains certain pixel numbers ( i, j ) at each step k inthe direction d . Window M d ( k ) has a fixed length of N d pixelsand a fixed width of K d pixels (the choice of the parametersdepends on the expected SNR and PSF effective width).If we fix the certain direction d , then for this directionwe solve the sequential detection problem. Thus, observations ARTAKOVSKY, BERENKOV, KOLESSA, NIKIFOROV: OPTIMAL SEQUENTIAL DETECTION OF SIGNALS 7 { Y n } n (cid:62) from Section II are formed in the following manner(see Fig. 4): Y is formed from signals { Y i,j } ( i,j ) ∈ M d (1) , { Y k } k (cid:62) are formed from signals { Y i,j } ( i,j ) ∈ M d ( k ) − M d ( k − . Fig. 4. Observations { Y k } k (cid:62) contain signals Y i,j from new pixel numbers ( i, j ) which appear in the window M d ( k ) when sliding from the previous ( k − -th step. For the Gaussian model of the streak profile and consideringthat k plays the role of the time index n , defined in Sections IIand III, the FMA procedure (9) is re-written as (cid:100) T FMA (˜ h ) = inf (cid:110) k (cid:62) R M d ( k ) ( Y ) (cid:62) ˜ h (cid:111) ,R M d ( k ) ( Y ) = (cid:88) ( i,j ) ∈ M d ( k ) Y i,j S i,j ( X ) , where { S i,j ( X ) } are values of the Gaussian model profilecalculated beforehand. Profile location is given by the vector X = ( x , y , x , y ) ; points ( x , y ) and ( x , y ) are locatedin the direction d ; R M d ( k ) ( Y ) is the changepoint detectionstatistic.Experimentally we have chosen the most suitable param-eters of the window M d ( k ) in our case: N d = 15 px and K d = 8 px, while SNR ≈ N isunknown and cannot be less than px.Hence, when sliding the 2-D window in various direc-tions inside Ω S and then choosing the longest sequence of R M d ( k ) ( Y ) values above the threshold we determine theapproximate position of the streak with a typical accuracy of5-10 pixels. Therefore, we can determine an area (localizationarea), which with a high probability contains the streak.Fig. 5 shows one of the possible behaviors of the R M d ( k ) ( Y ) statistic along the right direction in the case of avery low SNR = 0 . . A more accurate estimation of the streakposition is performed in the localization area at Stage 2. B. Stage 2. Accurate Estimation of the Streak Position After localizing the streak in a certain area (denote it by Π ), we use maximum likelihood estimates ˆ A and ˆ X of thestreak parameters X , A calculated from ( ˆ X , ˆ A ) = arg min X ,A (cid:88) ( i,j ) ∈ Π [ Y i,j − AS i,j ( X )] , Fig. 5. The behavior of the R M d ( l ) ( Y ) statistic along the correct direction.Real streak position is marked with blue. The streak is detected withcoordinates of start and end at points 47 and 117, respectively, while thetrue values are 40 and 110. where minimization is under constraints ( x , y ) ∈ Π , ( x , y ) ∈ Π , A > . It can be easily seen that ˆ X = arg min X (cid:88) ( i,j ) ∈ Π Y i,j − (cid:88) ( i,j ) ∈ Π Y i,j S i,j ( X ) (cid:88) ( i,j ) ∈ Π S i,j ( X ) S i,j ( X ) . This maximum likelihood estimate ˆ X of the streak positionyields the final solution to our problem. C. Testing We tested the proposed algorithm using Monte Carlo sim-ulation of random linear streaks with the length of 50 pxsuperimposed on Gaussian noise.The standard deviation (SD) of the estimated start andendpoints of the streak as a function of SNR was obtainedfor the range of SNR values from to . (see Fig. 6). It isseen that the accuracy is high even for the SNR =1. Fig. 6. SD as function of SNR. Dotted line corresponds to the plot for SDof the estimate of the streak start, solid line – to the estimate of the streakend. Number of MC trials · . VII. C ONCLUSION We have found a strictly optimal solution to the problemof detecting signals with unknown moments of appearance IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. , NO. , 2021 and disappearance when it is required to maximize the prob-ability of detection in the window of random size, distributedaccording to geometric distribution, in the class of detectionprocedures with the given maximal probability of a false alarmin a prescribed finite window. This solution is obtained usingthe optimal stopping theory, and the optimal procedure is amodified CUSUM. The results of Monte Carlo simulations forthe Gaussian example show that the operating characteristicsof the optimal procedure are typically close to that of themore practical FMA procedure. As a result, we propose touse the latter procedure in an important for Space Informaticsproblem – for intra-frame detection of faint satellite streakswith unknown orbits in optical images.A CKNOWLEDGEMENT We are grateful to George Moustakides for useful discus-sions. Thanks also go to referees whose comments improvedthe article. A PPENDIX To prove Theorem 1 we need the following lemma, whichestablishes that class C ( m, α ) is more stringent than C γ = { T : E ∞ [ T ] (cid:62) γ } for some appropriately selected γ = γ ( m, α ) . The unconditional variant of this lemma has beenconsidered in the paper by Lai [13]. Lemma 1. Let m be a positive integer. If T ∈ C ( m, α ) , then T necessarily belongs to class C γ for some γ = γ ( m, α ) , inparticular for γ ( m, α ) = 12 α m − (cid:88) i =0 P ∞ ( T > i ) . (A.1) Proof: Let m be a positive integer, m < γ . If T ∈ C γ ,then there exists an (cid:96) , possibly depending on γ , such that P ∞ ( T (cid:54) (cid:96) + m | T > (cid:96) ) < m/γ. (A.2)Indeed, we have E ∞ [ T ] = ∞ (cid:88) (cid:96) =0 P ∞ ( T > (cid:96) ) = m − (cid:88) i =0 ∞ (cid:88) k =0 P ∞ ( T > i + km )= m − (cid:88) i =0 ∞ (cid:88) k =0 P ∞ ( T > i ) P ∞ ( T > i + km | T > i ) . (A.3)Suppose that P ∞ ( T > (cid:96) + m | T > (cid:96) ) < − m/γ for all (cid:96) ∈ Z + , which implies P ∞ ( T > i + km | T > i ) < − km/γ < (1 − m/γ ) k , so it follows from equality (A.3) that E ∞ [ T ] < m − (cid:88) i =0 P ∞ ( T > i ) ∞ (cid:88) k =0 (1 − m/γ ) k = ( γ/m ) m − (cid:88) i =0 P ∞ ( T > i ) < γ, which contradicts the assumption T ∈ C γ , and therefore,proves (A.2).Next, we prove that, for a given < α < , the constraint sup (cid:96) ∈ Z + P ∞ ( T (cid:54) (cid:96) + m | T > (cid:96) ) (cid:54) α for some m (cid:62) (A.4)is stronger than the average run length constraint E ∞ [ T ] (cid:62) γ ( γ (cid:62) ), i.e., if T ∈ C ( m, α ) , then this implies that T ∈ C γ for some γ = γ ( α, m ) . Write K = 1 /mβ . Let the inequality(A.4) hold with α = mβ . Then P ∞ ( T > i + km | T > i ) (cid:62) − kmβ = 1 − k/K for all i (cid:62) , and using (A.3) we obtain E ∞ [ T ] = m − (cid:88) i =0 ∞ (cid:88) k =0 P ∞ ( T > i ) P ∞ ( T > i + km | T > i ) , (cid:62) m − (cid:88) i =0 P ∞ ( T > i ) K (cid:88) k =0 (1 − k/K )= (cid:20) K − K (cid:18) K + 1 (cid:19)(cid:21) m − (cid:88) i =0 P ∞ ( T > i )= 12 ( K + 1) m − (cid:88) i =0 P ∞ ( T > i ) (cid:62) mβ m − (cid:88) i =0 P ∞ ( T > i )= 12 α m − (cid:88) i =0 P ∞ ( T > i ) . This implies (A.1). Proof of Theorem 1: We have ∞ (cid:88) i =1 π i P ν ( ν < T (cid:54) ν + i | T > ν, Y ν , M = i )= E ν (cid:34) ∞ (cid:88) i =1 (cid:37) (1 − (cid:37) ) i − { International Journal of Adaptive Control SignalProcessing , vol. 14, no. 7, pp. 683–700, Nov. 2000.[2] P. A. Bakut, I. A. Bolshakov, B. M. Gerasimov, A. A. Kuriksha, V. G.Repin, G. P. Tartakovsky, and V. V. Shirokov, Statistical Radar Theory .Moscow, USSR: Sovetskoe Radio, 1963, vol. 1 (G. P. Tartakovsky,Editor), in Russian.[3] V. Cvrcek and R. Sara, “Detection and certification of faint streaksin astronomical images,” in Proc. 14th International Conference onComputer Vision Theory and Applications , Jan. 2019, pp. 498–509.[4] V. L. Do, L. Fillatre, I. Nikiforov, and P. Willett, “Security of SCADAsystems against cyber-physical attacks,” IEEE Aerospace & ElectronicsSystems Magazine , vol. 32, no. 5, pp. 28–45, May 2017.[5] E. Ebrahimzadeh and A. Tchamkerten, “Sequential detection of transientchanges in stochastic systems under a sampling constraint,” in Proceed-ings of the IEEE International Symposium on Information Theory (ISIT) ,Jun. 2015, pp. 156–160.[6] D. Egea-Roca, J. A. L´opez-Salcedo, G. Seco-Granados, and H. V. Poor,“Performance bounds for finite moving average tests in transient changedetection,” IEEE Transactions on Signal Processing , vol. 66, no. 6, pp.1594–1606, Jan. 2018.[7] B. K. Gu´eppi´e, L. Fillatre, and I. V. Nikiforov, “Sequential detectionof transient changes,” Sequential Analysis , vol. 31, no. 4, pp. 528–547,Dec. 2012.[8] ——, “Detecting a suddenly arriving dynamic profile of finite duration,” IEEE Transactions on Information Theory , vol. 63, no. 5, pp. 3039–3052, May 2017.[9] P. Hickson, “A fast algorithm for the detection of faint orbital debristracks in optical images,” Advances in Space Research , vol. 62, pp.3078–3085, Sep. 2018.[10] A. E. Kolessa, “Correction of point target track parameters by meansof joint processing of sequential images,” Achievements of ModernRadioelectronics , vol. 4, Jan. 2009.[11] ——, “Detection of faint space debris elements with unknown orbits,”in , vol. 723. ESA SpecialPublication, Jan. 2013.[12] A. E. Kolessa, A. V. Pruglo, S. S. Ravdin, A. K. Kim, and A. P.Lukyanov, “Complex of algorithms for automatic detection of spaceobjects from optical images, evaluation of angular coordinates andorbital parameters,” Ecological bulletin of research centers of the BlackSea Economic Cooperation , vol. 3, no. 4, Jan. 2013.[13] T. L. Lai, “Information bounds and quick detection of parameter changesin stochastic systems,” IEEE Transactions on Information Theory ,vol. 44, no. 7, pp. 2917–2929, Nov. 1998.[14] M. L`evesque and S. Buteau, “Image processing technique for automaticdetection of satellite streaks,” Defence R&D Canada - Valcartier, Tech-nical report, 2007.[15] G. Lorden, “Procedures for reacting to a change in distribution,” Annalsof Mathematical Statistics , vol. 42, no. 6, pp. 1897–1908, Dec. 1971.[16] J. Marage and Y. Mori, Sonar and Underwater Acoustics . London,Hoboken: STE Ltd and John Wiley & Sons, 2013.[17] G. V. Moustakides, “Multiple optimality properties of the Shewhart test,” Sequential Analysis , vol. 33, no. 3, pp. 318–344, Jun. 2014.[18] J. Noonan and A. Zhigljavsky, “Power of the MOSUM test for onlinedetection of a transient change in mean,” Sequential Analysis , vol. 39,no. 2, pp. 269–293, 2020.[19] M. Pollak and A. G. Tartakovsky, “Asymptotic exponentiality of thedistribution of first exit times for a class of Markov processes withapplications to quickest change detection,” Theory of Probability andits Applications , vol. 53, no. 3, pp. 430–442, Jul. 2009. [20] M. Pollak, “Optimal detection of a change in distribution,” Annals ofStatistics , vol. 13, no. 1, pp. 206–227, Mar. 1985.[21] M. Pollak and A. M. Krieger, “Shewhart revisited,” Sequential Analysis ,vol. 32, no. 2, pp. 230–242, Jun. 2013.[22] H. V. Poor, “Quickest detection with exponential penalty for delay,” Annals of Statistics , vol. 26, no. 6, pp. 2179–2205, Dec. 1998.[23] K. Premkumar, A. Kumar, and V. Veeravalli, “Bayesian quickest tran-sient change detection,” in Proceedings of Fifth International Workshopon Applied Probability (IWAP) , Jul. 2010.[24] V. G. Repin, “Detection of a signal with unknown moments of appear-ance and disappearance,” Problems of Information Transmission , vol. 27,no. 1, pp. 61–72, Jan. 1991.[25] M. A. Richards, Fundamentals of Radar Signal Processing , ser. 2ndedition. USA: McGraw-Hill Education Europe, 2014.[26] G. Rovatsos, S. Zou, and V. Veeravalli, “Quickest change detectionunder transient dynamics,” in Proceedings of the IEEE InternationalConference on Acoustics, Speech and Signal Processing (ICASSP) , Mar.2017, pp. 4785–4789.[27] G. Rovatsos, S. Zou, and V. V. Veeravalli, “Sequential algorithms formoving anomaly detection in networks,” Sequential Analysis , vol. 39,no. 1, pp. 6–31, May 2020.[28] A. N. Shiryaev, “On optimum methods in quickest detection problems,” Theory of Probability and its Applications , vol. 8, no. 1, pp. 22–46, Jan.1963.[29] ——, Optimal Stopping Rules , ser. Series on Stochastic Modelling andApplied Probability. New York, USA: Springer-Verlag, 1978, vol. 8.[30] A. G. Tartakovskii, “Optimal detection of random-length signals,” Prob-lems of Information Transmission , vol. 23, no. 3, pp. 203–210, Jul. 1987.[31] ——, “Detection of signals with random moments of appearance anddisappearance,” Problems of Information Transmission , vol. 24, no. 2,pp. 39–50, Apr. 1988.[32] A. G. Tartakovsky, “Rapid detection of attacks in computer networks byquickest changepoint detection methods,” in Data Analysis for NetworkCyber-Security , N. Adams and N. Heard, Eds. London, UK: ImperialCollege Press, 2014, pp. 33–70.[33] ——, Sequential Change Detection and Hypothesis Testing: GeneralNon-i.i.d. Stochastic Models and Asymptotically Optimal Rules , ser.Monographs on Statistics and Applied Probability 165. Boca Raton,London, New York: Chapman & Hall/CRC Press, Taylor & FrancisGroup, 2020.[34] A. G. Tartakovsky and J. Brown, “Adaptive spatial-temporal filteringmethods for clutter removal and target tracking,” IEEE Transactions onAerospace and Electronic Systems , vol. 44, no. 4, pp. 1522–1537, Oct.2008.[35] A. G. Tartakovsky, I. V. Nikiforov, and M. Basseville, Sequential Anal-ysis: Hypothesis Testing and Changepoint Detection , ser. Monographson Statistics and Applied Probability 136. Boca Raton, London, NewYork: Chapman & Hall/CRC Press, Taylor & Francis Group, 2015.[36] A. G. Tartakovsky, A. S. Polunchenko, and G. Sokolov, “Efficient com-puter network anomaly detection by changepoint detection methods,” IEEE Journal of Selected Topics in Signal Processing , vol. 7, no. 1, pp.4–11, Feb. 2013.[37] A. G. Tartakovsky, “Discussion on “Is average run length to false alarmalways an informative criterion?” by Yajun Mei,” Sequential Analysis ,vol. 27, no. 4, pp. 396–405, Oct. 2008.[38] A. G. Tartakovsky, B. L. Rozovskii, R. B. Bla´zek, and H. Kim,“Detection of intrusions in information systems by sequential change-point methods,” Statistical Methodology , vol. 3, no. 3, pp. 252–293, Jul.2006.[39] S. Zou, G. Fellouris, and V. Veeravalli, “Asymptotic optimality of d-cusum for quickest change detection under transient dynamics,” in Pro-ceedings of the IEEE International Symposium on Information Theory(ISIT) , Jun. 2017, pp. 156–160. Alexander G. Tartakovsky (M’01-SM’02) researchinterests include theoretical and applied statistics;applied probability; sequential analysis; changepointdetection phenomena; and a variety of applicationsincluding statistical image and signal processing;video tracking; detection and tracking of targetsin radar and infrared search and track systems;near-Earth space informatics; information integra-tion/fusion; intrusion detection and network security;and detection and tracking of malicious activity. Heis the author of three books, several book chap-ters, and over 100 papers. Dr. Tartakovsky is a Fellow of the Institute ofMathematical Statistics and a senior member of IEEE. He received severalawards, including a Dr.Tartakovsky obtained a Ph.D. degree and an advanced Doctor-of-Sciencedegree both from Moscow Institute of Physics and Technology, Russia(FizTech). During 1981–92, he was first a Senior Research Scientist andthen a Department Head at the Institute of Radio Technology (Moscow,Russian Academy of Sciences) as well as a Professor at FizTech, workingon the application of statistical methods to optimization and modeling ofinformation systems. From 1993 to 1996, Dr. Tartakovsky worked at theUniversity of California, Los Angeles (UCLA), first in the Department ofElectrical Engineering and then in the Department of Mathematics. From1997 to 2013, he was a Professor in the Department of Mathematics andthe Associate Director of the Center for Applied Mathematical Sciences atthe University of Southern California (USC), Los Angeles. From 2013 to2015, he was a Professor of Statistics in the Department of Statistics at theUniversity of Connecticut, Storrs. Currently, he is the Deputy Head of theSpace Informatics Laboratory at FizTech as well as Vice President of AGTStatConsult, Los Angeles, California. Nikita R. Berenkov is a postgraduate student atthe Moscow Institute of Physics and Technology(MIPT), Russia and engineer in the Space informat-ics Laboratory at MIPT. Alexey E. Kolessa research interests include non-linear filtering, object detection and tracking, videosurveillance, image and signal processing, near-Earth space informatics. He has over than 50 pub-lications. Dr. Kolessa is the Principal Scientist inthe Space informatics Laboratory at the MoscowInstitute of Physics and Technology (MIPT), Russiaas well as the deputy chair of the InformationSystems Department, MIPT. Igor V. Nikiforov is Emeritus Professor at theUniversit´e de Technologie de Troyes, France. Heis currently active or has demonstrated activity inthe past in the following areas: sequential changedetection and isolation (multiple hypotheses case);statistical signal detection with nuisance parameters;statistical fault detection and isolation; sequentialanalysis; multivariable control theory and statisticalprocess control. The results of his research havefound implementations in the areas of navigation;hidden information/channel detection; safety-criticalsystem monitoring; network anomaly detection; digital parametric tomog-raphy; seismic signal processing and tsunami warning systems. He is theauthor/co-author of five books, 18 chapters, 56 journal papers and 183conference papers/presentations. He received a2017 Abraham Wald Awardin Sequential Analysis.