Learning the smoothness of noisy curves with application to online curve estimation
LLearning the smoothness of noisy curves with applicationto online curve estimation
Steven Golovkine ∗ Nicolas Klutchnikoff † Valentin Patilea ‡ September 9, 2020
Abstract
Combining information both within and across trajectories, we propose a sim-ple estimator for the local regularity of the trajectories of a stochastic process.Independent trajectories are measured with errors at randomly sampled timepoints. Non-asymptotic bounds for the concentration of the estimator are de-rived. Given the estimate of the local regularity, we build a nearly optimal localpolynomial smoother from the curves from a new, possibly very large sample ofnoisy trajectories. We derive non-asymptotic pointwise risk bounds uniformlyover the new set of curves. Our estimates perform well in simulations. Real datasets illustrate the effectiveness of the new approaches.
More and more phenomena in modern society produce observation entities in the formof a sequence of measurements recorded intermittently at several discrete points intime. Very often the measurements are noisy and the observation points in time areneither regularly distributed nor the same across the entities. Functional data analysis(FDA) considers such data as being values on the trajectories of a stochastic process,recorded with some error, at discrete random times. One of the main purposes ofthe FDA is to recover the trajectories, also called curves or functions, at any point intime. See, e.g. , [19, 13, 23, 25] for some recent references. Whatever the approach forrecovering the curve is, in the existing literature it is usually assumed that, for eachcurve, a certain number of derivatives exist. However, many applications, some ofthem presented in the following, indicate that assuming that the curves admit second,third,... order derivatives is not realistic. Assuming that the curves to be reconstructedare smoother than they really are could lead to missing important information carriedby the data. In this contribution, we propose a definition of the local regularity of thecurves which could be easily estimated from the data and used to estimate the curves.To formalize the framework, let I ⊂ R be a compact interval of time. We consider N functions X (1) , . . . , X ( n ) , . . . , X ( N ) generated as a random sample of a stochasticprocess X = ( X t : t ∈ I ) with continuous trajectories. For each ≤ n ≤ N , and ∗ Groupe Renault & CREST - UMR 9194, Rennes, France, [email protected] † Univ Rennes, CNRS, IRMAR - UMR 6625, F-35000 Rennes, France, nicolas.klutchnikoff@univ-rennes2.fr ‡ Ensai, CREST - UMR 9194, Rennes, France, [email protected] a r X i v : . [ m a t h . S T ] S e p iven a positive integer M n , let T ( n ) m , ≤ m ≤ M n , be the random observation timesfor the curve X ( n ) . These times are obtained as independent copies of a variable T taking values in I . The integers M , . . . , M N represent an independent sample of aninteger-valued random variable M with expectation µ which increases with N . Thus M , . . . , M N is the N th line in a triangular array of integer numbers. We assume thatthe realizations of X , M and T are mutually independent. The observations associatedwith a curve, or trajectory, X ( n ) consist of the pairs ( Y ( n ) m , T ( n ) m ) ∈ R × I where Y ( n ) m is defined as Y ( n ) m = X ( n ) ( T ( n ) m ) + ε ( n ) m , ≤ n ≤ N, ≤ m ≤ M n , (1)and ε ( n ) m are independent copies of a centered error variable ε . For the sake of readabil-ity, here and in the following, we use the notation X t for the value at t of the genericprocess X and X ( n ) ( t ) for the value at t of the realization X ( n ) of X . The N − sampleof X is composed of two sub-populations: a learning set of N curves and a set of N curves to be recovered that we call the online set. Thus, ≤ N , N < N and N + N = N . Let X (1) , . . . , X ( N ) denote the curves corresponding to the learning set.Our first aim is to define a meaningful concept of local regularity for the trajectoriesof X and to build an estimator for it. The estimator could be computed easily andrapidly from the observations ( Y ( n ) m , T ( n ) m ) corresponding to the curves in the learning set, and does not require a very large number N of curves. Moreover, it could be easilyupdated if more curves are added to the learning set. The problem of estimatingthe regularity of the trajectories is related to the estimation of the Hausdorff, orfractal, dimension of time series. See, for instance, [3, 2, 8] and the references therein.However, herein, we adopt the FDA point of view and use the so-called replication and regularization features of functional data (see [19], ch.22). More precisely, we combineinformation both across and within curves. Thus, taking strength from the informationcontained in the whole set of N available time series, we are able to investigate moregeneral situations: X need not to be a Gaussian, or a transformed Gaussian process,it is not necessarily stationary or with stationary increments, it could have a fractaldimension which changes over time, it is observed with possibly heteroscedastic noise,at random moments in time.Based on the regularity estimates, our second objective is to build an adaptive,nearly optimal smoothing for a possibly very large set of N new curves. Let X [1] = X ( N +1) , . . . , X [ N ] = X ( N ) , denote the curves from the online set to be recovered from the corresponding obser-vations ( Y ( n ) m , T ( n ) m ) . In the following, t ∈ I is an arbitrarily fixed point. The aimis thus to estimate X [1] ( t ) , . . . , X [ N ] ( t ) . This issue is a nonparametric estimationproblem and, if each curve regularity is given, nonparametric estimators of the curves X [1] , . . . , X [ N ] could be easily built, for instance using the local linear smoother orthe series estimator. Nevertheless in applications, there is no reason to suppose thatthe sample paths of the random process X have a known regularity at t . When it isnot reasonable to assume a given regularity for the trajectories, one could use one ofthe existing data-driven procedures for determining the optimal smoothing parameter.However, the existing procedures, such as the cross-validation or the Goldenshluger-Lepski method [9], were designed for the case where one observes only one curve. Thus2ne has to apply them for each curve separately, which could require large amounts ofresources.In Section 2, we define the local regularity and provide concentration bounds forthe estimator of the local regularity of the trajectories of X . Our results are new andof non-asymptotic type, in the sense that they hold for any values of the sample sizes N and the mean value of observation times µ , provided these values are sufficientlylarge. In Section 3, we explain the relationship between the probabilistic concept oflocal regularity for the trajectory of X and the analytic regularity of the curves whichusually determines the optimal risk rate in nonparametric estimation. Given the es-timate of the local regularity of the trajectories of X , in Section 3, we also provide anon-asymptotic bound for the pointwise risk of the local polynomial smoother, uni-formly over the online set. This uniform bound is obtained using an exponential-typemoment bound for the pointwise risk for the local polynomial smoother, a new resultof interest in itself. The pointwise risk bound is optimal, in the nonparametric regres-sion estimation sense, up to some logarithmic factors induced by our stochastic curvesmodel, the concentration of the local regularity estimator, and the uniformity over the online set. In Section 4, we provide some additional guidance for the implementationof the local polynomial smoother and report results from simulation showing that ourestimator perform well. A real data application on vehicle traffic flow analysis illus-trates the effectiveness of our approaches. The proofs of our results are postponed tothe Appendices. Additional technical proofs, simulation results, and details on trafficflow application are relegated to the Supplementary Material. To further illustrate theirregularity of the curves in applications, we also report in the Supplementary Materialthe local regularity estimates for another three functional data sets often analyzed inthe literature. The new local regularity estimator is introduced and studied in this section. Afterproviding some insight into the ideas behind the construction, we provide a concen-tration result for our estimator under general mild assumptions which do not imposea specific distribution for X . In particular, X could, but need not, be a Gaussianprocess. The case where the variance of the noise is not constant is also discussed. Let us present the main ideas behind the construction of the regularity estimate. Forthis, let us introduce some more notation used throughout the paper. Let K be aninteger value which will be defined below, and consider the order statistics of a M -sample T , . . . , T M distributed as T which admits the density f . Let t ∈ I such that f ( t ) > . We extract the subvector of the K closest values to t and denote thesevalues T (1) ≤ . . . ≤ T ( K ) . If t = inf( I ) then t ≤ T (1) , while if t = sup( I ) , then T ( K ) ≤ t . When t is an interior point of I , t likely lies between T (1) and T ( K ) .Next, we define the interval J µ ( t ) = (cid:0) t − | I | / log( µ ) , t + | I | / log( µ ) (cid:1) ∩ I, where | I | denotes the length of the interval I and, recall, µ is the expectation of M .3e assume that the process X generating the continuous curves X (1) , . . . , X ( N ) satisfies E (cid:2) ( X u − X v ) (cid:3) ≈ L t | v − u | H t , u, v ∈ J µ ( t ) , (2)for some H t ∈ (0 , . Here and in the following, ≈ means the left-hand side is equal tothe right-hand side times a quantity which tends towards 1 when µ increases. Whenthe trajectories of X are not differentiable, H t is what we call the local regularityof the process X at t . For now, we focus on this case. When, with probability 1,the trajectories of X admits derivatives of order d ≥ in a neighborhood of t , theproperty (2) will be used for the derivative of order d of the smooth trajectories. Inthis smooth case, the local regularity of the process X at t will be d + H t . SeeSection 2.3.To construct our estimator of H t , we consider the event B = { M ≥ K , T (1) ∈ J µ ( t ) , . . . , T ( K ) ∈ J µ ( t ) } , which is expected to be of high probability. Let B denote the indicator of B and letus define the expectation operator E B ( · ) = E ( · B ) . Using (2) and the independence between X and T , for any ≤ k < l ≤ K , E B h ( X T ( l ) − X T ( k ) ) i ≈ L t E B (cid:0) | T ( l ) − T ( k ) | H t (cid:1) . From this and the moments of the spacing T ( l ) − T ( k ) as given in the Lemma 2, weobtain E B h ( X T ( l ) − X T ( k ) ) i ≈ L t (cid:18) l − kf ( t )( µ + 1) (cid:19) H t . Now, for any ≤ k ≤ K , let ε ( k ) be a generic error term corresponding to the genericrealization X T ( k ) , and denote Y ( k ) = X T ( k ) + ε ( k ) . Moreover, for k such that k − ≤ K , let θ k = E B (cid:2) ( Y (2 k − − Y ( k ) ) (cid:3) . We then obtain θ k − σ L t ≈ (cid:18) k − f ( t )( µ + 1) (cid:19) H t . (3)We distinguish two situations : the case where σ is known and the case where it isunknown. In the former case, we suppose that k − is also less than K and usetwice the relationship (3) with k and k − , respectively. We deduce θ k − − σ θ k − σ ≈ H t . Taking the logarithm on both sides, we obtain the proxy value H t ( k, σ ) = log( θ k − − σ ) − log( θ k − σ )2 log 2 ,
4f the local regularity parameter H t , when σ is given. In the case where σ isunknown, assuming that k − ≤ K , we use the relationship (3) three times with k , k − and k − , respectively, to obtain θ k − − θ k − θ k − − θ k ≈ H t . A natural proxy of H t is then given by H t ( k ) = log( θ k − − θ k − ) − log( θ k − − θ k )2 log 2 . (4)Our estimator of the local regularity parameter H t is the empirical version of theproxy value H t ( k ) , or H t ( k, σ ) , built from a random sample of N trajectories of X , the learning set of curves. Formally, we consider the sequence of events B n = B n ( µ, N ) = n M n ≥ K , T ( n )(1) ∈ J µ ( t ) , . . . , T ( n )( K ) ∈ J µ ( t ) o , ≤ n ≤ N , (5)and we define ˆ θ k − = 1 N N X n =1 (cid:2) Y ( n )(4 k − − Y ( n )(2 k − (cid:3) B n and ˆ θ k = 1 N N X n =1 (cid:2) Y ( n )(2 k − − Y ( n )( k ) (cid:3) B n , (6)where, for any n and k , Y ( n )( k ) denotes the noisy measurement of X ( n ) ( T ( n )( k ) ) . If H t ( k, σ ) is indeed a good approximation of H t , a simple estimator of H t when σ is known is then b H t ( k, σ ) = log(ˆ θ k − − σ ) − log(ˆ θ k − σ )2 log 2 if min(ˆ θ k − , ˆ θ k ) > σ otherwise . When σ is unknown the corresponding estimator is b H t ( k ) = log(ˆ θ k − − ˆ θ k − ) − log(ˆ θ k − − ˆ θ k )2 log 2 if ˆ θ k − > ˆ θ k − > ˆ θ k otherwise , (7)where ˆ θ k − is obtained from the formula of ˆ θ k − after replacing k by k − .It is worth noting that our estimator could be easily updated every time new curvesare included in the learning sample, without revisiting the learning set already used.Indeed, one should only add new terms in the sums defining ˆ θ k , ˆ θ k − and ˆ θ k − . Below, we focus on the more complicated and realistic case with unknown variance.The case with given variance could be treated after obvious adjustments. The results inthis section depend on µ , the mean number of observation times T , and the cardinality N of the learning set of curves. However, they are non-asymptotic in the sense thatthey hold true for any sufficiently large µ and N satisfying our conditions. Wheneverit exists, let X ( d ) u denote the d − th derivative, d ≥ , of the generic curve X u at thepoint u . By definition X (0) u ≡ X u . For deriving our results, we impose the followingmild assumptions. 5H1) The data consist of the pairs ( Y ( n ) m , T ( n ) m ) ∈ R × I defined as in (1), with I ⊂ R acompact interval, and the realizations of X , M and T are mutually independent.(H2) The random variable T admits a density f : I → R such that f ( t ) > . More-over, there exist L f > and < β f ≤ such that | f ( u ) − f ( v ) | ≤ L f | u − v | β f , ∀ u, v ∈ J µ ( t ) . (H3) For some integer d ≥ , there exist a function φ t ( · , · ) > , the constants L t , L φ > and < β φ ≤ such that, for any u, v ∈ J µ ( t ) , we have E h ( X ( d ) u − X ( d ) v ) i = L t | u − v | H t { φ t ( u, v ) } and | φ t ( u, v ) | ≤ L φ | u − v | β φ . (H4) For d ≥ in Assumption (H3), two constants a , A > exist such that E (cid:2) | X ( d ) u − X ( d ) v | p (cid:3) ≤ (cid:18) p !2 aA p − (cid:19) | u − v | pH t , ∀ p ≥ , ∀ u, v ∈ J µ ( t ) . (H5) The variables ε ( n ) m , n, m ≥ , are independent copies of a centered variable ε ,with finite variance σ , for which constants b ≥ σ > and B > exist suchthat E ( | ε | p ) ≤ p !2 bB p − , ∀ p ≥ . (H6) The random variable M is such that M ≥ and γ > exists such that, for any s > , P ( | M − µ | > s ) ≤ exp( − γ s ) . Assumption (H2) imposes a mild condition on the distribution of the random ob-servation points which provides convenient moment bounds for their spacings. In par-ticular, it implies that, for a sufficiently large µ , f ( t ) / ≤ f ( t ) ≤ f ( t ) , ∀ t ∈ J µ ( t ) .Assumption (H3) is a version of the so-called local stationarity condition, here consid-ered for the d − th derivative process. More precisely, (H3) implies that the trajectoriesof X ( d ) = ( X ( d ) u : u ∈ I ) are Hölder continuous in quadratic mean in the neighborhoodof t , with exact exponent H t and local Hölder constant L t . Let us call ς t = d + H t the local regularity of the process X at t . Examples include, but are not limited to,stationary or stationary increment processes X . See, e.g. , [1] for some examples andreferences on processes satisfying the mild condition in (H3). Assumptions (H4) and(H5) are needed for deriving exponential bounds for the concentration of our localregularity estimator, while (H6) is a mild condition for controlling the variability ofnumber of observation points on the curves. The lower bound on M guarantees thateach curve in the learning set has a sufficient number of observation times for buildingour estimator.We first consider the case d = 0 , the general case being analyzed in Section 2.3.For a real number a , let b a c denote the largest integer not exceeding a . Theorem 1.
Let Assumptions (H1)–(H6) hold true with d = 0 . Let µ and K bepositive integers such that ( µ + 1) βf α βf α ≤ K ≤ µ µ ) , (8)6 ith α = 2 H t + β φ ∈ (0 , . Let c (cid:18) K − f ( t )( µ + 1) (cid:19) min( β φ ,β f H t / < (cid:15) < , (9) with c a constant depending only on L f , β f , β φ , f ( t ) and H t . Define k = b ( K +7) / c and let b H t = b H t ( k ) be defined as in (7) . Then, for a sufficiently large µ : P (cid:16)(cid:12)(cid:12) b H t − H t (cid:12)(cid:12) > (cid:15) (cid:17) ≤
12 exp " − f N (cid:15) (cid:18) k − f ( t )( µ + 1) (cid:19) H t , where f is a positive constant depending on a , A , b , B and the length of the interval I . To obtain a non-trivial estimator of H t , we need k ≥ , thus the upper bound in(8) should be larger than 9, and this happens as soon as µ ≥ . The exact expressionsof the constants c and f could be traced in the proof of Theorem 1. The conditionimposed on K provides a panel of choices depending on N and µ . As a result, up tosome constants, and depending on L f , β f , β φ , f ( t ) and H t , the concentration rate (cid:15) ,one could expect, could be in a range such that (cid:15)µ (cid:29) and (cid:15) log / ( µ ) (cid:28) . The bestpossible concentration of b H t is guaranteed as soon as N is larger than some power of µ , while for a concentration as fast as some negative power of log( µ ) , one only needsa small number N of curves in the learning set, that is larger than some power of log( µ ) .For the purpose of building an adaptive optimal kernel estimator for the trajectoriesof X , we will impose (cid:15) ≤ log − ( µ ) and an exponential bound equal to exp( − µ ) .The following corollary proposes a data-driven choice of K which guarantees theserequirements. This choice is guided by the fact that, for any constants a, b > , wehave the relationship log a ( µ ) ≤ exp((log log( µ )) ) ≤ µ b , provided µ is sufficiently large. Corollary 1.
Assume the conditions of Theorem 1 hold true. Let b µ = N − P N n =1 M n , b K = b b µ exp( − (log log( b µ )) ) c , and b H t = b H t ( b ( b K + 7) / c ) , with b H t defined in (7) . Then, for any constant C > , P (cid:16)(cid:12)(cid:12) b H t − H t (cid:12)(cid:12) > C log − ( µ ) (cid:17) ≤ exp( − µ ) , provided N ≥ µ b for some b > and µ is sufficiently large. One could also build b H t with only one trajectory of a stochastic process X withstationary increments. If the density of T is uniform and sufficiently many measure-ments are available, it suffices to split the interval [0 , into N intervals of the samelength and apply our methodology considering the measuring times and the noisymeasured values in each block as belonging to a different curve in the learning set.Theorem 1 and Corollary 1 remain valid. We can extend our learning approach for the local regularity of the trajectories of X to the case of smooth trajectories, i.e. , when almost all trajectories admit derivatives.7or s ∈ (0 , , let X ( d , s ) denote the class of stochastic processes X such that, withprobability 1, the trajectories admit derivatives of order d ≥ and X ( d ) = ( X ( d ) u : u ∈ I ) satisfies the Assumptions (H3) and (H4) with H t = s . Let us point outthat if X ∈ X ( d , s ) , for some d ≥ and s ∈ (0 , , and the first derivative X t hasan exponential moment, i.e. , E [exp( cX t )] < ∞ for some c > , then X ∈ X (0 , .Moreover, X cannot belong to any of the classes X (0 , s ) with s ∈ (0 , . Therefore,when d ≥ , our estimator b H t is expected to concentrate to the value 1. Meanwhile,when X belongs to X (0 , H t ) for some H t ∈ (0 , , b H t concentrates to H t . Thisremark suggests a simple check of the composite null hypothesis H : X ∈ X (0 , H t ) for some H t ∈ (0 , , against the alternative hypothesis H : X ∈ X ( d , s ) for some d ≥ and s ∈ (0 , , defined by the rule rejects H if b H t > − log − ( b µ ) , where b H t = b H t ( b ( b K + 7) / c ) is defined as in (7) with b K = b b µ exp( − (log log( b µ )) ) c and b µ = N − P N n =1 M n . The following result guarantees that this test procedure isconsistent.
Corollary 2.
Assume that log − ( µ ) µ ≤ M ≤ µ log( µ ) almost surely. Under theconditions of Corollary 1, if N ≥ µ b , for some b > , and sufficiently large µ , thenunder H , P (cid:16) b H t > − log − ( b µ ) (cid:17) ≤ exp( − µ ) , whereas under H , if E [exp( cX t )] < ∞ for some c > , P (cid:16) b H t > − log − ( b µ ) (cid:17) ≥ − exp( − µ ) . Corollary 2 is a direct consequence of Corollary 1 and of the mild simplifyingcondition on the support of M , which implies that log( µ ) / ≤ log( b µ ) ≤ µ ) . Wehence omit the details. In the case of smooth trajectories, the first step is to detectthat the trajectories are differentiable. Corollary 1 indicates that, for a suitable choiceof b K as a function of b µ , the event { b H t ( b ( b K + 7) / c ) > − log − ( b µ ) } has a very lowprobability, provided that X belongs to X (0 , H t ) and N is as large as a power largerthan 1 of µ . Based on Corollary 2, in practice, one could simply check the condition b H t ( b ( b K + 7) / c ) ≤ − log − ( b µ ) . If this condition fails, ideally one would like to consider the first derivative trajectoriesof X , build a new estimator b H t with these trajectories, and again test H with X (1 , H t ) instead of X (0 , H t ) . If H is not rejected, one could define b ς t = 1 + b H t ,while, if H is rejected, one would consider the second order derivative trajectories of X , build a new estimator b H t with these trajectories, and so on. Since the derivativesof the trajectories of X are not available, in Section 4.1, we propose a sequentialalgorithm to estimate them, d and H t , in the case of local regularities ς t larger than1. 8 .4 The case of conditionally heteroscedastic noise In some applications, the assumption of constant variance for the error term ε couldbe unrealistic. Therefore, we consider the following conditional heteroscedastic errorextension of model (1): Y ( n ) m = X ( n ) ( T ( n ) m ) + σ (cid:16) X ( n ) ( T ( n ) m ) , T ( n ) m (cid:17) u ( n ) m , ≤ n ≤ N, ≤ m ≤ M n , (10)where σ ( · , · ) is some unknown function and u ( n ) m are independent copies of a centeredvariable u with unit variance.Our approach also applies to the model (10) under some additional mild conditions.Indeed, assuming the expectations exist, we have θ k = E B (cid:2) ( Y (2 k − − Y ( k ) ) (cid:3) = E B h ( X T (2 k − − X T ( k ) ) i + E B h σ (cid:16) X T (2 k − , T (2 k − (cid:17)i + E B h σ (cid:16) X T ( k ) , T ( k ) (cid:17)i . From this identity it is clear that the arguments presented in Section 2.1 remain validas long as the last two expectations on the right-hand side of the last display are equaland their value does not depend on k . Thus, in this case, even if the conditionalvariance of ε ( n ) m is not given, we could consider the same estimator b H t . This remarkleads us to the following additional assumption.(E1) The variables u ( n ) m from model (10) satisfy the Assumption (H5) with unit vari-ance. Moreover, the function σ ( · , · ) is bounded and the map u E (cid:2) σ ( X u , u ) (cid:3) , u ∈ I , is constant in a fixed neighborhood of t .Assumption (E1) allows the error term to be conditionally heteroscedastic, butimposes marginal (unconditional) homoscedasticity in a neighborhood of t . UnderAssumption (E1), for any k we have E B h σ ( X T ( k ) , T ( k ) ) i = E h E (cid:16) σ ( X T ( k ) , T ( k ) ) | M, T , T , . . . , T M (cid:17) B i = E (cid:2) σ ( X u , u ) (cid:3) P ( B ) , and thus the terms like E B [ σ ( X T ( k ) , T ( k ) )] cancel when considering the differences θ k − − θ k − and θ k − − θ k . Corollary 3.
Assume the observations consist of the pairs ( Y ( n ) m , T ( n ) m ) ∈ R × I where Y ( n ) m defined as in (10) and the realizations of X , M and T are mutually independent.Assume that Assumptions (H2)–(H4), (H6), (E1) hold. Then Corollaries 1 and 2remain valid with the same local regularity estimator b H t . The proof of Corollary 3 could be obtained from the proof of Theorem 1 after obvi-ous modifications, and hence will be omitted. It is worthwhile noting that, even if theregularity H t is the same at any point t , one may not be able to estimate the regu-larity H t using only one observed noisy trajectory with conditionally heteroscedasticnoise. This because, intuitively, it might be impossible to identify the oscillations ofthe signal of interest, that is to separate the increments of the trajectory of X from the9ifferences of the error terms with variable variance. With our approach based on lo-cal observed increments averaged over several curves, the effect of the noise vanishes,provided the expectation of the conditional variance is constant. Hence, eventuallythe identification of the oscillations of X is recovered and there is no difference withrespect to the case of homoscedastic errors. With at hand an estimate of the local regularity ς t = d + H t obtained from a learningset of N curves, we aim at recovering N new noisy trajectories of X from what wecall the online dataset. One of the most popular smoother is the local polynomialestimator, see [5]. This type of estimator crucially depends on a tuning parameter,the bandwidth, which should ideally be chosen according to the unknown regularityof the target function.One has to connect a definition of local regularity that is meaningful from thetheory of stochastic processes to the usual definition of function regularity used innonparametric curve estimation. Fortunately, in our framework, the parameter ς t ,which is understood as the local regularity of the process ( X t : t ∈ J µ ( t )) in quadraticmean , see (2), is intrinsically linked with the regularity of the sample paths of theprocess. Indeed, in many important situations, which are covered by our assumptions,the regularity of the sample paths of a process does not depend on the realization ofthis process. For example, the regularity of any Brownian path is / , in the sense thatfor any (cid:15) > , almost surely the sample path belongs to the Hölder space C / − (cid:15) ( I ) and does not belong to C / (cid:15) ( J ) whatever J ⊂ I . Here, for any a > , C a ( I ) denotesthe space of uniformly a − Hölder continuous functions defined on I , see Theorem 2.2and Corollary 2.6 of [20] for precise definition. More generally the regularity of thesample paths of a process is linked to integrated regularities through the Kolmogorov’sContinuity Theorem [20, Theorem 2.1]. In particular, Assumption (H3) ensures that,with probability 1, the trajectories of the process ( X ( d ) t : t ∈ J µ ( t )) are Höldercontinuous with any exponent parameter < a < H t .Below, we define the local polynomial estimator and derive its theoretical prop-erties. Since our focus of interest is the simultaneous denoising of the additional N curves, we consider the following pointwise risk: for a generic estimator b X [ n ] t of X [ n ] t ,let R ( b X ; t ) = E (cid:20) max ≤ n ≤ N (cid:12)(cid:12)(cid:12) b X [ n ] t − X [ n ] t (cid:12)(cid:12)(cid:12) (cid:21) . (11)First, we provide a sharp bound for this risk with N = 1 , in the case where a suitableestimator of ς t = d + H t , computed from another independent sample, is given.Such a result, of interest in itself in nonparametric curve estimation, seems to be new.In this case, the expectation defining the risk R ( b X ; t ) should be understood as theconditional expectation given the estimator of ς t . Next, we provide a sharp boundfor R ( b X ; t ) in the case where N ≥ and the estimator of ς t is obtained using theapproach introduced in Section 2. 10 .1 Local polynomial estimation We assume that d ≥ is an integer and H t ∈ (0 , . Let ˆ d and b H t be some genericestimators of d and H t , respectively, and let b ς t = ˆ d + b H t be the correspondingestimator of ς t = d + H t . We assume that ˆ d and b H t are independent of the N fromthe online dataset, generated according to (1).The estimator of ς t could be used to smooth any curve Y [ n ] ( n = 1 , . . . , N )from the online dataset. For the sake of readability, we omit the superscript [ n ] andwe consider a generic curve from the online dataset: Y m = X ( T m ) + ε m , ≤ m ≤ M. For any u ∈ R , we consider the vector U ( u ) = (1 , u, . . . , u ˆ d / ˆ d !) . Let K : R → R be apositive kernel and define: ϑ M,h = arg min ϑ ∈ R d +1 M X m =1 (cid:26) Y m − ϑ > U (cid:18) T m − t h (cid:19)(cid:27) K (cid:18) T m − t h (cid:19) , where h is the bandwidth. The vector ϑ M,h satisfies the normal equations Aϑ M,h = a with A = A M,h = 1
M h M X m =1 U (cid:18) T m − t h (cid:19) U > (cid:18) T m − t h (cid:19) K (cid:18) T m − t h (cid:19) (12) a = a M,h = 1
M h M X m =1 Y m U (cid:18) T m − t h (cid:19) K (cid:18) T m − t h (cid:19) . (13)Let λ be the smallest eigenvalue of the matrix A and remark that, whenever λ > ,we have ϑ M,h = A − a .Taking into account the expression of the bandwidth minimizing the pointwisemean squared risk for a regression function defined on I , with derivative of order d which is Hölder continuous in a neighborhood of t , with exact exponent H t , weconsider the bandwidth b h = (cid:18) M (cid:19) / (2 b ς t +1) . Our focus of interest is on determining a nearly optimal rate of the bandwidth to beused to recover the trajectories of X . For the applications, one could also be interestedin a nearly optimal constant, which in general needs to be estimated. In Section 4.1we propose a simple way to estimate a suitable constant for the applications.With at hand the bandwidth b h , we propose the following definition of the localpolynomial estimator of X t of order d : ˆ X t = U > (0) b ϑ if λ > log − ( M ) and | U > (0) b ϑ | ≤ b τ / ( M ) b τ / ( M ) if λ > log − ( M ) and | U > (0) b ϑ | > b τ / ( M )0 otherwise, , where b ϑ = ϑ M, b h and, for any y > , b τ ( y ) = 1log ( y ) (cid:18) y log( y ) (cid:19) b ς t / (2 b ς t +1) . b τ / ( M ) is a technical device used to control the tails of ˆ X t . It has practically no influence in applications. For deriving our results on ˆ X t ,we impose the following mild assumptions.(LP1) There exist two positive constants, a and A , such that for any p ≥ E (cid:2) | X t | p (cid:3) ≤ p !2 aA p − , ∀ t ∈ [0 , . (LP2) We assume that, almost surely, µ/ log( µ ) ≤ M ≤ µ log( µ ) . (LP3) The estimator b H t satisfies the property P (cid:16)(cid:12)(cid:12) b H t − H t (cid:12)(cid:12) > log − ( µ ) (cid:17) ≤ K exp( − µ ) , ∀ µ > , where K is some positive constant.(LP4) The estimator ˆ d satisfies the property P (ˆ d = d ) ≤ K exp( − µ ) , ∀ µ > . Assumption (LP1) provides a suitably tight control on the moments of X t , butstill allows for unbounded trajectories. Assumption (LP2) is a convenient, but mild,technical condition. It could be relaxed at the price of controlling the probability ofthe complement of the event { µ/ log( µ ) ≤ M ≤ µ log( µ ) } , for instance using (H6).Assumptions (LP3) and (LP4) are very mild conditions that the generic estimatorsof the regularity should satisfy. Since µ / log ( µ ) = e / log( µ ) for any µ > , the con-centration of b H t at a suitable negative power of log( µ ) will suffice for the smoothingpurposes. For simplicity, and without loss of generality we consider the same constant K in Assumptions (LP3) and (LP4). Theorem 2.
Assume that Assumptions (H1), (H2), (H4)–(H6) and Assumptions (LP1)–(LP4) hold true and let K ( · ) be a kernel such that, for any t ∈ R : κ − [ − δ,δ ] ( t ) ≤ K ( t ) ≤ κ [ − , ( t ) , for some < δ < and κ ≥ . (14) There then exists a constant Γ such that for any µ ≥ , E " exp ((cid:18) τ ( µ ) (cid:12)(cid:12)(cid:12) ˆ X t − X t (cid:12)(cid:12)(cid:12) (cid:19) / ) ≤ Γ where τ ( µ ) = 1log ( µ ) (cid:18) µ log( µ ) (cid:19) ςt ςt . The bound on the exp( √ x ) − moment of the | ˆ X t − X t | seems a new result for localpolynomial estimators. For our purposes, it will entail a sharp bound for R ( b X ; t ) .More precisely, the price for considering a risk measure uniformly over the whole onlinedataset is very low, that is a multiplying factor as large as a power of log( N ) in the riskbound we derive below. [7] derived sharp bounds for all the moments of | ˆ X t − X t | .However, his bounds on the moments would induce a power of N as multiplying factorfor our risk bound, instead of the power of log( N ) . Theorem 3.
Assume that assumptions of Theorem 2 hold true, and let K be a kernelwhich satisfies (14) . There then exists a positive constant Γ such that R ( b X ; t ) = E (cid:20) max ≤ n ≤ N (cid:12)(cid:12)(cid:12) b X [ n ] t − X [ n ] t (cid:12)(cid:12)(cid:12) (cid:21) ≤ Γ log ( µ ) log (1+ N ) { log( µ ) } ςt ςt µ − ςt ςt .
12f all the trajectories X were in C ς t ( J µ ( t )) , ς t were known and N = 1 , therisk bound for R ( b X ; t ) would be of the usual nonparametric rate µ − ς t / (2 ς t +1) . Letus note that the fact that H t is not known does not have any consequence on therisk bound in Theorem 3. Indeed, since µ / log µ = e / log µ for any µ , the order ofthe risk bound does not change as soon as the probability of the event { ˆ d = d } ∩{| b H t − H t | ≤ / log µ } tends to 1. The log(1 + N ) factor is given by the maximumover the N curves in the online dataset. The factor { log( µ ) } ς t / (2 ς t +1) is due tothe concentration properties of M around its mean µ . This factor would not appearif M/µ is almost surely bounded and bounded away from zero. The factor log ( µ ) comes from probability theory. The trajectories of a stochastic process X with localregularity H t does not necessarily belong to C ς t ( J µ ( t )) but they are almost surelyin any C ς t − (cid:15) ( J µ ( t )) for any < (cid:15) < ς t .Finally, let us notice that Corollary 1 states that the estimator defined by (7)satisfies (H3) for d = 0 and any < H t < . This leads us to the following result. Corollary 4.
Assume d = 0 and let b H t be the estimator of < H t < defined inCorollary 1. Moreover, Assumptions (H1)–(H6) and Assumptions (LP1)–(LP2) holdtrue. If N ≥ µ b and N ≤ µ B for some b, B > , then R ( b X ; t ) ≤ Γ B log ( µ ) µ − Ht Ht . In the usual local polynomial (LP) smoothing framework, given a sample of size M , theoptimal bandwidth minimizing the pointwise mean squared error risk for a regressionfunction defined on I , with derivative of order d which is Hölder continuous in aneighborhood of t , with exact exponent H t and local Hölder constant L t , is h opt = (cid:18) CM (cid:19) / (2 ς t +1) with C = C t = σ t k K k b ς t c ! ς t L t R | K ( v ) || v | ς t dv , (15)where k K k = R K ( v ) dv and σ t is the variance of the noise that could depend on t . See for instance [22]. Based on the properties of the trajectories of X discussedabove, this is our target bandwidth. It depends on two more unknown quantities, L t and σ t , for which we now propose estimation procedures.The estimation of L t could be based on similar ideas as used for H t . For simplicity,we assume d = 0 . The extension to the case d ≥ could follow the same pattern as forthe estimation of the local regularity, using the trajectories of the derivatives. Usingtwice the relationship (3) with k and k − , respectively, we deduce L t ≈ θ k − − θ k H t − (cid:18) f ( t )( µ + 1) k − (cid:19) H t . On the other hand, using the approximation of the moments of the spacings, as givenin Lemma 2, we have η k − − η k := E B (cid:2) | T (4 k − − T (2 k − | H t (cid:3) − E B (cid:2) | T (2 k − − T ( k ) | H t (cid:3) ≈ (4 H t − (cid:18) k − f ( t )( µ + 1) (cid:19) H t . H t , the empirical counterparts of η k obtained from the learningset of N independent trajectories of X is b η k = 1 N N X n =1 (cid:12)(cid:12)(cid:12) T ( n )(2 k − − T ( n )( k ) (cid:12)(cid:12)(cid:12) b H t B n , where B n is the sequence of events defined in (5). An estimate of η k − could beobtained similarly. These facts lead us to the following estimator of the local Hölderconstant L t : b L t = b L t ( b H t ) = b θ k − − b θ k b η k − − b η k if b η k − > b η k and b θ k − > b θ k , otherwise . (16)For the implementation we propose k = b ( b K +7) / c with b K = b b µ exp (cid:0) − (log log b µ ) (cid:1) c and b µ = N − P N n =1 M n . To estimate the variance, we propose b σ = b σ t = 1 N N X n =1 |S n | X m ∈S n h Y ( n )( m ) − Y ( n )( m − i , (17)where S n ⊂ { , , . . . , M n } is a set of indices for the n − th trajectory and |S n | is thecardinal of S n . When the variance of the error ε is considered constant, one could take S n = { , , . . . , M n } . When it depends on t , one could take S n = n m : T ( n )(1) ≤ T ( n )( m ) ≤ T ( n )( b K ) o , with b K defined above. This is the choice we used in our empirical investigation.When the variance of the errors also depends on the realizations X u , as described inSection 2.4, in general it is no longer possible to consistently estimate σ ( X t , t ) . Oursimulation experiments indicate that the estimate (17) remains a reasonable choice.Finally, the constant involved in the definition of the bandwidth could be estimatedby b C obtained by plugging the estimates of the unknown quantities into the definitionof C in (15). Concerning the kernel, we use K ( t ) = (3 / (cid:0) − t (cid:1) [ − , ( t ) , that is theEpanechnikov kernel for which k K k = 3 / and R | K ( v ) || v | ς t dv = 3 { ( ς t + 1)( ς t +3) } − . An implementation of the method is available as a R package on Github at theURL adress: https://github.com/StevenGolovkine/denoisr . In this section, we illustrate the behavior of our local regularity estimator b ς t = ˆ d + b H t computed using the learning set of noisy curves, and the performance of kernelsmoother it induces for estimating the noisy curves from the online set. The procedurefor calculating b ς t is summarized in the following algorithm where LP ( d ) means localpolynomial smoother with degree d ≥ . The Nadaraya-Watson smoother correspondsto LP (0) . 14 lgorithm 1: Estimation of the local regularity ς t = d + H t Result:
Estimation of ς t from the learning set of N noisy curvesCalculate b µ = N − P N n =1 M n and b K = b b µ exp( − (log log( b µ )) ) c ;Calculate b H t and set ˆ d = 0 ; while b H t > − log − ( b µ ) do Calculate b L t (1) , as in (16), and b σ t ;Calculate b C with b ς t = ˆ d + b H t , b L t (1) and b σ t ;Calculate the bandwidth b h n = ( b C/M n ) / (2 b ς t +1) , ≤ n ≤ N ;Estimate the (ˆ d + 1) − th derivative of the trajectories of X with LP (ˆ d + 1) ;Calculate b H t using the estimated trajectories of the (ˆ d + 1) − th derivative;Set ˆ d = ˆ d + 1 ; end For the curve estimation, we use the observations ( Y [ n ]1 , T [ n ]1 ) , . . . , ( Y [ n ] M n , T [ n ] M n ) , ≤ n ≤ N , and LP (ˆ d ) with ˆ d delivered by Algorithm 1. The bandwidth is cal-culated as b h n = ( b C/M n ) / (2 b ς t +1) , ≤ n ≤ N , with b ς t obtained from Algorithm1. The constant estimate b C is the same for all curves in the online set, that is thatobtained with b ς t , b L t ( b H t ) and b σ t . We compare our approach with the classical cross-validation (CV) (least-squares leave-one-out) method applied for each curve X [ n ] sep-arately. For CV, we use the R package np [11], after rescaling the CV bandwidth toaccount for their different definition of the Epanechnikov kernel. At this stage, wewant to point out that our smoothing method is much faster than any standard,trajectory-by-trajectory approach, such as CV. We report a time comparison in theSupplementary Material, and as expected, the ratio between the times needed for CVand for our approach is at least of the same order as N . It is worth noting that onecannot follow an ad-hoc approach and transfer one CV bandwidth from a curve X [ n ] to another because H t is not known, and could even vary with t .The data are generated from the model (1) using different settings for X , thedistribution of T and the variance of the noise, as well as for N and N . For X , weconsider three types of Gaussian processes: fractional Brownian motion (fBm) withconstant Hurst parameter H ∈ (0 , , fBm with piecewise constant Hurst parameter,and integrated fBm. In the later case, X t = R t W H ( s ) ds , where W H denotes a fBmwith constant Hurst parameter H . The local regularity is constant for the first andthe third type, and variable for the second. The third type is an example of X withsmooth trajectories. We identify the setting for X by s ∈ { , , } . A more detaileddescription of these processes, as well as plots of their trajectories, are provided in theSupplementary Material. The number M of measuring times of a curve is a Poissonrandom variable with expectation µ , while for the measuring times T , we consideredeither a uniform distribution (identified by unif ), or a deterministic equispaced grid( equi ) on the range [0 , . For the noise, we considered the Gaussian distribution withboth constant and variable variance. The cases are identified by σ which could bea number or a list, respectively. The values of σ are chosen in such a way that thevariance ratio signal-to-noise remains almost unchanged. Thus, one simulation settingis defined by the -tuple ( s, N , N , µ, f, H, σ ) , with f ∈ { unif , equi } and H , the15 .00.51.01.52.0 N = 250 N = 1000 µ = 300 N = 250 N = 1000 µ = 1000 E q u i s p a ce d N = 250 N = 1000 N = 250 N = 1000 U n i f o r m H t = 0 . H t = 0 . H t = 0 . Figure 1:
Estimation of the local regularity for piecewise fBm, with constant noise variance σ = 0 . , at t = 1 / , / and / . True values: ς t = H t equal to . , . and . ,respectively. Hurst parameter, is a list in the case of fBm with piecewise constant local regularity.Below, we present the results for a few settings, complementary results are reported inthe Supplement. For each type of experiment, the reported results are obtained from500 replications of the experiment.Figure 1 presents the results for the local regularity estimation for piecewise fbMwith homoscedastic noise. The local estimations of H t are performed at t = 1 / , / and / which correspond to the middle of the interval for each regularity. The truevalues of H t are 0.4, 0.5 and 0.7, respectively. The results show a quite accurate esti-mator b H t and confirm the theoretical result on its concentration. Increasing either µ or N improves the concentration. The results for unif and equi are quite similar. Fig-ure 2 presents the estimation of ς t for different settings (3 , N , , µ, equi , . , . .As expected, our local regularity estimation approach also performs well for smoothtrajectories.Next, we present the results on the risk R ( b X ; t ) . Figure 3 presents the boxplots ofthe risk R ( b X ; t ) defined in (11) in the case of piecewise constant local regularity, withthree values of t , each one in the middle of the interval of the changes of regularityare defined. The results are quite good. Part of the curves with lower regularity areharder to estimate and thus results in higher risks than the more regular parts. Itappears that N and µ do not have the same influence on the risk as the estimationof the local regularity, and this is in line with the risk bound in Theorem 3. Thus,going from to sampling points leads to large improvement in terms of riskwhereas going from to curves in the learning dataset only results in little or noimprovement. Finally, it seems that the method achieves better results for equispacedsampling points.The same conclusions could be drawn from the results presented in Figure 4, ob-tained for the experiment defined by the -tuple (3 , , , , equi , . , . .16 = 300 µ = 1000 E s t i m a t i o n o f l o c a l r e g u l a r i t y ς t N = 250 N = 1000 Figure 2:
Estimation of the local regularity for integrated fBm, with constant noise variance σ = 0 . , at t = 0 . . True value: ς t = 1 . . Finally, we present a comparison with the CV. Because of the large amount of com-puting resources required by CV, we only considered a few cases. Figure 5 presentsthe results in terms of the risk calculated at t = 0 . for the simulation setting de-fined by the tuple (1 , , , , equi , . , . . We make the remark that ourmethod and CV perform similarly despite the fact that CV uses a specifically tailoredbandwidth for each curve in the online set. The homoscedastic setting is favorable toCV which, for a given curve, uses a global bandwidth at any t . Figure 6 presentsthe the heteroscedastic setting (2 , , , , equi , (0 . , . , . , . . CV pre-serves good performances when the local regularity varies moderately. Our methodshows close performance in this case, slightly better when H t = 0 . . Finally, Figure 7presents the results in the setting (3 , , , , equi , . , . . Again, CV andour method perform quite similarly. In this section, our method is applied to data from the Next Generation Simulation(NGSIM) study, which aims to “describe the interactions of multimodal travelers, ve-hicles and highway systems”, see [10]. This study is known to be one of the largestpublicly available source of naturalistic driving data. This dataset is widely used intraffic flow studies from the interpretation of traffic phenomena such as congestion tothe validation of models for trajectories prediction (see e.g. [4, 14, 24, 16, 12] for somerecent references). However, such data have been proved to be subject to measure-ment errors revealed by physical inconsistency between the space traveled, velocity andacceleration of the vehicles, cf. [18]. Montanino and Punzo [17] developed a trajectory-by-trajectory four-steps method to recover the signals from the noisy curves, and theirmethodology is now considered as a benchmark in the traffic flow engineering commu-nity for analyzing NGSIM data. The steps, finely tuned for the NGSIM data, are :1. removing the outliers; 2. cutting off the high- and medium-frequency responsesin the speed profile; 3. removing the residual nonphysical acceleration values, pre-serving the consistency requirements; 4. cutting off the high- and medium-frequencyresponses generated from step 3. The detailed description of these steps is providedin the Supplementary Material. 17 .00.10.20.30.40.5 µ = 300 µ = 1000 H = 0 . µ = 300 µ = 1000 H = 0 . µ = 300 µ = 1000 H = 0 . E q u i s p a ce d µ = 300 µ = 1000 µ = 300 µ = 1000 µ = 300 µ = 1000 U n i f o r m N = 250 N = 1000 Figure 3:
Estimation of the risks R ( b X ; 1 / , R ( b X ; 0 . and R ( b X ; 5 / for piecewise fBm,with constant noise variance σ = 0 . . = 300 µ = 1000 R i s k N = 250 N = 1000 Figure 4: Estimation of the risk R ( b X ; 0 . for smoothing the noisy trajectories of anintegrated fBm, with constant noise variance σ = 0 . . Our methodCV
Risks
Figure 5:
CV versus our method: comparing the pointwise risk R ( b X ; 0 . for smoothing thenoisy trajectories of a fBm; simulation (1 , , , , equi , . , . . = 0 . H = 0 . H = 0 . Risks
Our method CV
Figure 6:
CV versus our method: comparing R ( b X ; 1 / , R ( b X ; 0 . and R ( b X ; 5 / for smoothing the noisy trajectories of a piecewise fBm; simulation (2 , , , , equi , (0 . , . , . , . . Our methodCV
Risks
Figure 7:
CV versus our method: comparing the pointwise risk R ( b X ; 0 . for smoothing thenoisy trajectories of an integrated fBm; simulation (3 , , , , equi , . , . .
20o compare our smoothed curves to those of [17], we consider the following ratio: r ( b X, e X ) = P M n m =1 h Y ( n ) m − b X ( T ( n ) m ) i P M n m =1 h Y ( n ) m − e X ( T ( n ) m ) i , ≤ n ≤ , where b X denotes our curve estimation while e X is that obtained by [17]. A value ofthe ratio r ( b X, e X ) less than indicates smoothed values closer to the observations.For our illustration, we consider a subset of the NGSIM dataset, known as theI-80 dataset. It contains minutes of trajectories for vehicles on the Interstate 80Freeway in Emeryville, California, segmented into three 15-minute periods (from 4:00p.m. to 4:15 p.m.; from 5:00 p.m. to 5:15 p.m. and from 5:15 p.m. to 5:30 p.m.)on April 13, 2005 and corresponds to different traffic conditions (congested, transitionbetween uncongested and congested and fully congested). In total, the dataset containstrajectories, velocities and accelerations for N = 1714 individual vehicles that passedthrough this highway during this period, recorded every . . The number M n ofmeasurements for each curve varies from 165 to 946. We focus on the velocity variableand rescale the measurement times for each of the 1714 velocity curves such that thefirst velocity measurement corresponds to t = 0 and the last one to t = 1 . Figure8 presents a sample of five curves from this data. It can easily be noticed that thevelocities are quite erratic and their variation is not physically realistic, indicatingthe presence of a noise. Moreover, the data have been recorded at a moment ofthe day when traffic is evolving, it goes from fluid to dense traffic. Therefore, weconsider that there are three groups in the data: a first group corresponding to a fluid(high-speed) traffic, a second one for in-between fluid and dense traffic, and a thirdgroups corresponding to the dense (low-speed) traffic. To determine the three clusters,we fit a finite Gaussian mixture model to the vector of number of sampling points.The model is estimated by an EM algorithm initialized by hierarchical model-basedagglomerative clustering as proposed by Fraley and Raftery [6] and implemented inthe R package mclust [21]. The optimal model is then selected according to BIC.The three resulting classes have 239, 869 and 606 velocity trajectories, respectively.Plots of randomly selected subsamples of trajectories from each groups are providedin the Supplementary Material. The respective numbers of measures M n are plottedin Figure 9. The mean estimates b µ obtained in the three groups are , and ,respectively, and the corresponding values b K , as defined as in Corollary 1, are , and .Figure 10 presents the results of the estimation of ς t for values of t from . to . , for each group. The evolution of ς t is quite smooth, except for Group 1 (Figure10b). A possible explanation could be the small number of curves and the averageof M n in this group, which correspond to low values of N and b µ . We also providethe estimation of the regularity using the whole sample of size 1714. The differenceswe notice between the estimates of ς t from different groups support our preliminaryclustering step.To compute the curve estimate we adopt a leave-one-curve-out procedure: eachcurve is smoothed using the local regularity estimates computed from the other curvesin the group (or the other 1713 curves when the data is not split into groups). Thedensities of the resulting ratios r ( b X, e X ) are plotted in Figure 11. When the trafficis fluid and the speed is high (group 1), our method perform much better than that21 Normalized time V e h i c l e s v e l o c i t y ( k m / h ) Figure 8: I-80 dataset illustration: a sample of five velocity curves. M n : number of sampling points per curve C o un t Figure 9: I-80 dataset clusters: density of sampling points for fluid (darkest gray),in-between fluid and dense, and dense traffic (lightest gray).22
123 0.2 0.4 0.6 0.8
Normalized time L o c a l r e g u l a r i t y e s t i m a t i o n (a) Complete sample Normalized time L o c a l r e g u l a r i t y e s t i m a t i o n (b) Fluid traffic/high velocity subsample Normalized time L o c a l r e g u l a r i t y e s t i m a t i o n (c) In-between group subsample Normalized time L o c a l r e g u l a r i t y e s t i m a t i o n (d) Dense traffic/low velocity subsample Figure 10: Estimation of the local regularity of the velocity curves for different t .of Montanino and Punzo. When the traffic is dense with low speed (group 3), thesmoothed values obtained with the two methods are more similar, though our methodstill exhibits better performance for the majority of the curves. A Proof of Theorem 1
The proof of Theorem 1 is based on several lemmas that we present in the following.For these lemmas, we implicitly assume that the conditions of Theorem 1 are satisfied.
Lemma 1.
Let r be an integer such that ( µ + 1) βf (2 Ht βφ )4+ βf (2 Ht βφ ) ≤ r ≤ K . Let s ∈ { , , , } and let ≤ k, l ≤ K be such that l − k = s r . Then, for sufficientlylarge µ , we have (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) E B h(cid:12)(cid:12) X T ( l ) − X T ( k ) (cid:12)(cid:12) i − L t (cid:18) l − kf ( t )( µ + 1) (cid:19) H t (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ c (cid:18) l − kf ( t )( µ + 1) (cid:19) H t +min( β φ ,β f H t / , where c = max(2 L φ , c ) and c is a constant depending on H t , β φ , L f , β f and f ( t ) .Proof of Lemma 1. Note that, by the definition of E B , elementary properties of the23 .000.250.500.75 0 1 2 3 4 5 Ratio D e n s i t y (a) Complete sample Ratio D e n s i t y (b) Fluid traffic/high velocity subsample Ratio D e n s i t y (c) In-between group subsample Ratio D e n s i t y (d) Dense traffic/low velocity subsample Figure 11: Densities of the ratio r ( b X, e X ) within different groups.conditional expectation, and Assumption (H3), E B h(cid:12)(cid:12) X T ( l ) − X T ( k ) (cid:12)(cid:12) i = E h(cid:12)(cid:12) X T ( l ) − X T ( k ) (cid:12)(cid:12) B i = E n E h(cid:12)(cid:12) X T ( l ) − X T ( k ) (cid:12)(cid:12) B (cid:12)(cid:12)(cid:12) M, T , . . . , T M io = E n E h(cid:12)(cid:12) X T ( l ) − X T ( k ) (cid:12)(cid:12) (cid:12)(cid:12)(cid:12) M, T , . . . , T M i B o = E B (cid:8) L t | T ( l ) − T ( k ) | H t (cid:2) φ t ( T ( k ) , T ( l ) ) (cid:3)(cid:9) =: ( I ) + ( II ) , where ( I ) = L t E B (cid:8) | T ( l ) − T ( k ) | H t (cid:9) .By Lemma 2 applied with α = 2 H t ≤ , ( I ) = L t (cid:18) l − kf ( t )( µ + 1) (cid:19) H t (1+ R ) with | R | ≤ c (2 H t ) (cid:18) l − kf ( t )( µ + 1) (cid:19) β f H t / . (18)On the other hand, Assumption (H3) implies that | ( II ) | ≤ L t L φ E B (cid:16) | T ( l ) − T ( k ) | H t + β φ (cid:17) , and using again Lemma 2 with α = 2 H t + β φ ≤ we obtain | ( II ) | ≤ L t L φ (cid:18) l − kf ( t )( µ + 1) (cid:19) H t + β φ , (19)24or µ large enough such that c (2 H t + β φ ) ≤ . Then, from (18) and (19) we obtain E B h(cid:12)(cid:12) X T ( l ) − X T ( k ) (cid:12)(cid:12) i = L t (cid:18) l − kf ( t )( µ + 1) (cid:19) H t (1 + R ) , where R is a remainder term such that, for sufficiently large µ , | R | ≤ max ( L φ (cid:18) l − kf ( t )( µ + 1) (cid:19) β φ , c (2 H t ) (cid:18) l − kf ( t )( µ + 1) (cid:19) β f H t / ) ≤ c (cid:18) l − kf ( t )( µ + 1) (cid:19) min( β φ ,β f H t / , with c = max(2 L φ , c (2 H t )) with c ( · ) defined in Lemma 2.For the sake of readability, we state below a technical lemma on the moments ofthe spacings T ( l ) − T ( k ) , for which the proof is given in the Supplementary Material.In Lemma 2, we consider that µ is sufficiently large to ensure ( µ + 1) β f α/ (4+ β f α ) + 1 ≤ µ/ { µ ) } . Lemma 2.
Let < α ≤ be a fixed parameter and let r be an integer such that ( µ + 1) βf α βf α ≤ r ≤ K with K ≤ µ µ ) . Let s ∈ { , , , } and let ≤ k, l ≤ K be such that l − k = s r . Then, for sufficientlylarge µ , (cid:12)(cid:12)(cid:12)(cid:12) E B (cid:2)(cid:12)(cid:12) T ( l ) − T ( k ) (cid:12)(cid:12) α (cid:3) − (cid:18) l − kf ( t )( µ + 1) (cid:19) α (cid:12)(cid:12)(cid:12)(cid:12) ≤ c (cid:18) l − kf ( t )( µ + 1) (cid:19) α (1+ β f / , with c = c ( α ) = 8 c { f ( t ) } β f α/ and c a constant depending on α , L f , β f and f ( t ) . Lemma 3.
Let k be a positive integer such that k − ≤ K . Then for any η > , q k ( η ) := max n P (ˆ θ k − θ k ≥ η ) , P (ˆ θ k − θ k ≤ − η ) o ≤ exp (cid:0) − e N η (cid:1) , where, using the notations introduced in Assumptions (H4) and (H5), e = 1 / (2 d + 2 D ) with d = 3 (cid:0) a | J µ ( t ) | + 2 b (cid:1) and D = 3 max( A | J µ ( t ) | , B ) . Proof of Lemma 3 .
By the definition in (5) and (6), ˆ θ k = 1 N N X n =1 Z n where Z n = (cid:2) Y ( n )(2 k − − Y ( n )( k ) (cid:3) B n , and B n = n M n ≥ K , T ( n )(1) ∈ J µ ( t ) , . . . , T ( n )( K ) ∈ J µ ( t ) o . Note that E (ˆ θ k ) = θ k . More-over, for any p ≥ , using Assumptions (H4) and (H5), we have E (cid:0) | Z n | p (cid:1) = E B (cid:0) | Y (2 k − − Y ( k ) | p (cid:1) ≤ p − E B (cid:16) | X T (2 k − − X T ( k ) | p + | ε (2 k − | p + | ε ( k ) | p (cid:17) ≤ p − p !2 aA p − sup u,v ∈ J µ ( t ) | u − v | pH t + 2 bB p − ! ≤ p !2 dD p − , d and D are defined in the statement of this lemma. Bernstein’s inequalityimplies P (ˆ θ k − θ k ≥ η ) ≤ exp (cid:18) − N η d + 2 D η (cid:19) ≤ exp (cid:0) − e N η (cid:1) , and the same bound is valid for P (ˆ θ k − θ k ≤ − η ) . The bound for q k ( η ) follows. Lemma 4.
Let ≤ k < l ≤ ( K + 1) / be two positive integers. For any η > , define p + k,l ( η ) = P (ˆ θ l − ˆ θ k ≥ (1 + η )( θ l − θ k )) and p − k,l ( η ) = P (ˆ θ l − ˆ θ k ≤ (1 − η )( θ l − θ k )) . Then, for sufficiently large µ , max n p + k,l ( η ) , p − k,l ( η ) o ≤ " − e N η (cid:18) l − kµ + 1 (cid:19) H t , with e defined in Lemma 3.Proof of Lemma 4. Assume that k and l satisfy the assumptions stated in the Lemmaand assume moreover that µ is large enough so that η ( θ l − θ k ) / < . Then p + k,l ( η ) = P (cid:2) (ˆ θ l − θ l ) − (ˆ θ k − θ k ) ≥ η ( θ l − θ k ) (cid:3) ≤ P (cid:2) ˆ θ l − θ l ≥ η ( θ l − θ k ) / (cid:3) + P (cid:2) ˆ θ k − θ k ≤ − η ( θ l − θ k ) / (cid:3) ≤ q l ( η ( θ l − θ k ) /
2) + q k ( η ( θ l − θ k ) / , and the same bound is valid for p − k,l ( η ) . By (20) we have θ l − θ k ≥ { ( l − k ) / ( µ + 1) } H t / , provided µ is sufficiently large. Then we obtain the bound for max (cid:16) p + k,l ( η ) , p − k,l ( η ) (cid:17) after applying Lemma 3. Proof of Theorem 1.
Let (cid:15) > . With the notation from (4) and (7), we can write P (cid:16)(cid:12)(cid:12) b H t ( k ) − H t (cid:12)(cid:12) > ε (cid:17) ≤ {| H t ( k ) − H t | >(cid:15)/ } + P (cid:16)(cid:12)(cid:12) b H ( k ) − H t ( k ) (cid:12)(cid:12) > ε/ (cid:17) =: B + V, and thus it suffices to bound the terms B and V . The term B . The study of the set in the indicator function boils down to thestudy of the convergence of H t ( k ) to H t . Using Lemma 1 with l − k = k − r wehave θ k − σ = E B (cid:20)(cid:16) X T (2 k − − X T ( k ) (cid:17) (cid:21) = L t (cid:18) k − f ( t )( µ + 1) (cid:19) H t (1 + ρ k ) , and | ρ k | ≤ c (cid:18) k − f ( t )( µ + 1) (cid:19) min( β φ ,β f H t / =: ρ ∗ k , with c a constant defined in Lemma 1. Using again Lemma 1 with k = 2 k − , l = 4 k − and a = 2 and taking the difference, we deduce that there exists R k such that θ k − − θ k = L t (cid:18) k − f ( t )( µ + 1) (cid:19) H t (1 + ρ k − ) − L t (cid:18) k − f ( t )( µ + 1) (cid:19) H t (1 + ρ k )= (4 H t − L t (cid:18) k − f ( t )( µ + 1) (cid:19) H t (1 + R k ) , (20)26here | R k | = (cid:12)(cid:12)(cid:12)(cid:12) H t ρ k − − ρ k H t − (cid:12)(cid:12)(cid:12)(cid:12) = ≤ H t + 14 H t − ρ ∗ k − ≤ H t + 14 H t − ρ ∗ K . Similarly, we obtain: θ k − − θ k − = (4 H t − L t (cid:18) k − f ( t )( µ + 1) (cid:19) H t (1 + R k − ) . (21)Combining (20) and (21), we obtain log( θ k − − θ k − ) − log( θ k − − θ k ) = H t log 4 + log(1 + R k − ) − log(1 + R k ) , which leads, using the definition of H t ( k ) given by (4), to: H t ( k ) = H t + η k where η k = log(1 + R k − ) − log(1 + R k )2 log 2 . Note that, for sufficiently large µ , both R k and R k − are greater that − / . Thisimplies | η k | ≤ | R k − − R k | log 2 ≤ (cid:18) H t + 14 H t − (cid:19) ρ ∗ k − . Thus, since ρ ∗ k − ≤ ρ ∗ K , the condition (cid:12)(cid:12) H t ( k ) − H t (cid:12)(cid:12) > ε/ fails and B = 0 as soonas (cid:15) > (cid:18) H t + 14 H t − (cid:19) ρ ∗ K , that is as soon as condition (9) is satisfied, provided µ is sufficiently large. The term V . Defining the event D = { ˆ θ k − > ˆ θ k − > ˆ θ k } , we can write P (cid:16)(cid:12)(cid:12) b H ( k ) − H t ( k ) (cid:12)(cid:12) > (cid:15)/ (cid:17) ≤ P (cid:16)(cid:12)(cid:12) b H ( k ) − H t ( k ) (cid:12)(cid:12) > (cid:15)/ , D (cid:17) + P ( D ) . (22)First note that using Lemma 4 we have, for sufficiently large µ : P ( D ) ≤ P (ˆ θ k ≥ ˆ θ k − ) + P (ˆ θ k − ≥ ˆ θ k − ) ≤ p − k − ,k (1) + p − k − , k − (1) ≤ " − e N (cid:18) k − µ + 1 (cid:19) H t . (23)Now, it remains to bound the quantity ℘ = P (cid:16)(cid:12)(cid:12) b H ( k ) − H t ( k ) (cid:12)(cid:12) > (cid:15)/ , D (cid:17) = P "(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) log ˆ θ k − − ˆ θ k − θ k − − θ k − × θ k − − θ k ˆ θ k − − ˆ θ k !(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) > (cid:15) log 2 , D . ˆ θ k − − ˆ θ k − and ˆ θ k − − ˆ θ k are positive under D , we have ℘ ≤ P " ˆ θ k − − ˆ θ k − θ k − − θ k − × θ k − − θ k ˆ θ k − − ˆ θ k > (cid:15) , D + P " ˆ θ k − − ˆ θ k − θ k − − θ k − × θ k − − θ k ˆ θ k − − ˆ θ k < − (cid:15) , D ≤ P " ˆ θ k − − ˆ θ k − θ k − − θ k − > (cid:15) + P " ˆ θ k − − ˆ θ k θ k − − θ k < − (cid:15) + P " ˆ θ k − − ˆ θ k − θ k − − θ k − < − (cid:15) + P " ˆ θ k − − ˆ θ k θ k − − θ k > (cid:15) . Applying Lemma 4, we obtain: ℘ ≤ p +4 k − , k − (2 (cid:15) −
1) + p − k − , k − (1 − − (cid:15) ) + p +2 k − ,k (2 (cid:15) −
1) + p − k − ,k (1 − − (cid:15) ) . Now remark that p +2 k − ,k (2 (cid:15) − ≤ " − e N (cid:16) (cid:15) − (cid:17) (cid:18) k − µ + 1 (cid:19) H t ≤ " − e log (2)64 N (cid:15) (cid:18) k − µ + 1 (cid:19) H t , and, as soon as (cid:15) < / log 2 , we have − − (cid:15)/ ≤ (cid:15)/ , which implies: p − k − ,k (1 − − (cid:15) ) ≤ " − e N (cid:16) − − (cid:15) (cid:17) (cid:18) k − µ + 1 (cid:19) H t ≤ " − e log (2)256 N (cid:15) (cid:18) k − µ + 1 (cid:19) H t . Using similar derivations for the others terms, we obtain: ℘ ≤ " − e log (2)256 N (cid:15) (cid:18) k − µ + 1 (cid:19) H t . (24)Combining (22) with (23) and (24), we obtain, for sufficiently large µ and (cid:15) < / log 2 : P (cid:16)(cid:12)(cid:12) b H ( k ) − H t ( k ) (cid:12)(cid:12) > (cid:15)/ (cid:17) ≤
12 exp " − f N (cid:15) (cid:18) k − µ + 1 (cid:19) H t , where f = e log (2) / . B Proofs of Theorems 2 and 3
The proofs of Theorems 2 and 3 are based on the following lemmas for which theproofs are provided in the Supplementary Material. For the first lemma we consider28he matrix A defined in (12) with the bandwidth b h = M − / (2 b ς t +1) . Let λ be thesmallest eigenvalue of this matrix. Let A = f ( t ) Z R U ( u ) U > ( u ) K ( u ) du, and let λ denote its smallest eigenvalue. In the following, we assume that K ( · ) satisfies (14). Then A is positive definite [see 22, for details] and thus λ > . Lemma 5.
Under Assumptions (LP2)–(LP4), the matrix A is positive semidefinite.Moreover, there exists a positive constant g that depends only on the kernel K , d , f ( t ) , β f , L f and λ , such that, for sufficiently large µ , sup <β ≤ λ / P ( λ ≤ β ) ≤ K exp h − g τ ( µ ) log ( µ ) i where τ ( µ ) = 1log ( µ ) (cid:18) µ log( µ ) (cid:19) ςt ςt ,ς t = d + H t , and K is a universal constant. Since the dimension of A and A are given by ˆ d , the probability P ( · ) in Lemma 5should be understood as the conditional probability given the estimator ˆ d . Lemma 6.
Let ξ be a positive random variable such that c := E (cid:2) exp (cid:0) η ξ (cid:1)(cid:3) < ∞ , for some positive constant η . Then, for any τ ≥ : E [exp ( τ ξ )] ≤ c exp (cid:16) c τ / (cid:17) where c = (5 / η ) / . Proof of Theorem 2.
Without loss of generality, we could suppose that f ( t ) / ≤ f ( t ) ≤ f ( t ) , ∀ t ∈ J µ ( t ) . We define the events E = { λ > λ / } , F = (cid:8) | b H t − H t | ≤ log − ( µ ) (cid:9) ∩ n ˆ d = d o , and G = {| X t | ≤ τ ˜ α ( µ ) } , with / < ˜ α < α = 5 / . Next, let Z = (cid:12)(cid:12)(cid:12) b X t − X t (cid:12)(cid:12)(cid:12) .Assume that µ is such that log − ( µ/ log( µ )) < λ / , then using Assumption (H2), wehave: E [ ϕ ( τ ( µ ) Z )] = ( A ) + ( B ) + ( C ) + ( D ) , where ϕ ( x ) = exp( x / ) and ( A ) = E (cid:2) ϕ (cid:0) τ ( µ ) Z (cid:1) E F G (cid:3) ( B ) = E (cid:2) ϕ (cid:0) τ ( µ ) Z (cid:1) E (cid:3) ≤ E / (cid:2) ϕ (cid:0) τ ( µ ) Z (cid:1)(cid:3) P / ( E )( C ) = E (cid:2) ϕ (cid:0) τ ( µ ) Z (cid:1) F (cid:3) ≤ E / (cid:2) ϕ (cid:0) τ ( µ ) Z (cid:1)(cid:3) P / ( F )( D ) = E (cid:2) ϕ (cid:0) τ ( µ ) Z (cid:1) G (cid:3) ≤ E / (cid:2) ϕ (cid:0) τ ( µ ) Z (cid:1)(cid:3) P / ( G ) . We show that ( A ) is the main term, and it is bounded by a constant.By construction, (cid:12)(cid:12)(cid:12) ˆ X t (cid:12)(cid:12)(cid:12) ≤ τ α ( M ) and, by (LP2), τ α ( M ) ≥ τ α ( µ/ log( µ )) ≥ τ ˜ α ( µ ) , µ is sufficiently large. Thus, for sufficiently large µ , (cid:12)(cid:12)(cid:12) ˆ X t − X t (cid:12)(cid:12)(cid:12) G ≤ (cid:12)(cid:12)(cid:12) U > (0) ˆ ϑ − X t (cid:12)(cid:12)(cid:12) G ≤ (cid:12)(cid:12)(cid:12) U > (0) ˆ ϑ − X t (cid:12)(cid:12)(cid:12) , with ˆ ϑ = A − a and A and a defined in (12) and (13), respectively. Therefore, p τ ( µ ) Z G ≤ p τ ( µ ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) M X m =1 ( X T m − X t ) W m (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) + p τ ( µ ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) M X m =1 ε m W m (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) , where W m = 1 M h U > (0) A − U (cid:18) T m − t h (cid:19) K (cid:18) T m − t h (cid:19) . This leads to (cid:16)p τ ( µ ) Z (cid:17) / G ≤ p τ ( µ ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) M X m =1 ( X T m − X t ) W m (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)! / + p τ ( µ ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) M X m =1 ε m W m (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)! / . Then, to show that ( A ) is finite, it suffices to show that A = E exp p τ ( µ ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) M X m =1 ( X T m − X t ) W m (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)! / E F , and A = E exp p τ ( µ ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) M X m =1 ε m W m (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)! / E F , are finite and to apply Cauchy-Schwarz inequality. To control the stochastic term A ,remark that: A = 1 + X p ≥ p [ τ ( µ )] p/ p ! B p , (25)where B p = E (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) M X m =1 ε m W m (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) p/ E F . By Jensen’s inequality, B ≤ B / ≤ B / ≤ B / . Thus, it remains to control B p for any p ≥ . For such values of p , we use Marcinkiewicz-Zygmund’s inequality andobtain: B p ≤ (cid:16) p − (cid:17) p/ E M X m =1 ε m W m ! p/ E F . By a version of Lemma 1.3 of [22], for κ defined in (14), sup ≤ m ≤ M | W m | E λ / ≤ κλ M h , and M X m =1 | W m | E λ / ≤ κλ M h M X j =1 { t − h ≤ T m ≤ t + h } . (26)30et χ m = h − { t − h ≤ T m ≤ t + h } . By the Rosenthal inequality, there exists a universalconstant C , such that, for any q ≥ , ˜ E " M X m =1 χ m ! q ≤ C q q q log q q max ( M X m =1 ˜ E χ qm , M X m =1 ˜ E χ m ! q ) ≤ (cid:18) qCf ( t ) M log q (cid:19) q . (27)See [15]. Since W m ≤ | W m | sup ≤ j ≤ M | W j | , we deduce ˜ E " M X m =1 W m ! q E ≤ (cid:18) κλ M h (cid:19) q (cid:18) qκCf ( T ) λ log q (cid:19) q =: (cid:18) M h (cid:19) q (cid:18) c q log q (cid:19) q . Using Assumption (LP2), we deduce: E " M X m =1 W m ! q E F ≤ (cid:18) c q log q (cid:19) q E (cid:18) log µµ (cid:19) q b Ht b Ht F = 2 (cid:18) c q log q (cid:19) q (cid:8) τ ( µ ) log µ (cid:9) q . The last line can be deduced using similar arguments to those used to obtain (A) inthe Supplementary Material. Next, let f W m = W m / P Mj =1 W j . Since the error termsare independent on the T m ’s, and using (H5), by Jensen’s inequality B p (cid:16) p − (cid:17) − p/ ≤ E M X m =1 | ε m | p/ f W m ! M X j =1 W j p/ E F = E E " M X m =1 | ε m | p/ f W m ! | M, W , . . . , W M M X j =1 W j p/ E F ≤ (cid:0) E | ε | p (cid:1) / E M X m =1 W m ! p/ E F ≤ (cid:18) p !2 bB p − (cid:19) / (cid:18) c p p/ (cid:19) p/ (cid:18) τ ( µ ) log µ (cid:19) p/ . Thus we have B p ≤ (cid:16) p − (cid:17) p/ (cid:18) p !2 bB p − (cid:19) / (cid:18) c p p/ (cid:19) p/ (cid:18) τ ( µ ) log µ (cid:19) p/ = (cid:18) b B (cid:19) / (cid:18) τ ( µ ) log µ (cid:19) p/ D p , where D p = (cid:16) p − (cid:17) p/ ( p !) / (cid:18) c p B p/ (cid:19) p/ ≤ p ! (cid:18) c log p (cid:19) p/ , for some constant c . For the last inequality, we use Stirling’s formula. This impliesthat there exists a universal constant c such that B p p ! ≤ (cid:18) c log p (cid:19) p/ (cid:18) τ ( µ ) log µ (cid:19) p/ . (28)31ombining (25) with (28) we obtain: A = 1 + (cid:26) B τ / ( µ ) + 2 B τ / ( µ ) + 4 B τ / ( µ ) (cid:27) + X p ≥ p τ ( µ ) p/ p ! B p ≤ ( (cid:0) B τ ( µ ) (cid:1) / + 2 (cid:0) B τ ( µ ) (cid:1) / + 4 (cid:0) B τ ( µ ) (cid:1) / ) + X p ≥ (cid:18) c log p (cid:19) p/ (cid:18) µ (cid:19) p/ < ∞ . The inequality on the last line comes from the fact that B τ ( µ ) log ( µ ) is bounded.To control the bias term A , let us define, for any < β < H t : Λ β = sup u,v ∈ J µ ( t ) u = v | X ( d ) u − X ( d ) v || u − v | β , where here X ( d ) u denotes the d -th derivative of the trajectory X u . Applying Taylor’sformula and using the basic properties satisfied by the weights W m , we obtain: (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) M X m =1 X ( T m ) W m − X t (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) M X m =1 d X k =1 X ( k ) ( t ) k ! ( T m − t ) k W m (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) + M X m =1 (cid:12)(cid:12) X ( d ) ( t ) − X ( d ) ( ζ m ) (cid:12)(cid:12) d ! | T m − t | d | W m |≤ Λ β d ! M X m =1 | T m − t | d + β | W m | , where | ζ m − t | ≤ | T m − t | . Note that this result is obtained using : M X m =1 ( T m − t ) k W m = 0 . Since, under E we have, W m = 0 as soon as | T m − t | > h , : (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) M X m =1 X ( T m ) W m − X t (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) E ≤ Λ β h d + β d ! M X m =1 | W m | E ≤ Λ β d ! 4 κλ h d + β M h M X m =1 { t − h ≤ T m ≤ t + h } . The last line follows from (26). Moreover, combining the result obtained by [20, p. 27],with (H4), for any < H t − β < β where β is some sufficiently small fixed value,we have: E Λ p/ β ≤ + p ( H t +1) (cid:18) − β − H t (cid:19) p/ (cid:18) p !2 aA p − (cid:19) / ≤ a / A / ( p !) / √ A H t − β ! p/ . Λ β is independent of b H t , M and the T m ’s,by the last inequality above and inequality (27), we have: E (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) M X m =1 X ( T m ) W m − X t (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) p/ E F ≤ (cid:18) pCf ( t )log( p/ (cid:19) p/ E (cid:20)(cid:16) h d + β (cid:17) p/ (cid:21) E h (Λ β ) p/ i . We thus obtain: A ≤ X p ≥ (cid:0) τ ( µ ) (cid:1) p/ p ! E (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) M X m =1 X ( T m ) W m − X t (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) p/ E F ≤ X p ≥ (cid:0) τ ( µ ) (cid:1) p/ p ! (cid:18) pCf ( t )log( p/ (cid:19) p/ E (cid:20)(cid:16) h d + β (cid:17) p/ F (cid:21) E h (Λ β ) p/ i . Note that, on the event F , (cid:16) h d + β (cid:17) p/ ≤ C p/ (cid:18) log µµ (cid:19) p d + β )2( d + Ht . Taking β = H t − log − µ , since ( µ/ log µ ) / log µ is bounded, we deduce that, for someconstant C > , A ≤ X p ≥ C p/ log p/ ( p ) < ∞ . It remains to control ( B ) , ( C ) and ( D ) . For this purpose, let us first note that, bythe Assumption (LP1), c := E [exp( η X t )] < ∞ , for η = 1 / (2 A ) . We deduce that E (cid:2) ϕ (cid:0) τ ( µ ) Z (cid:1)(cid:3) ≤ E (cid:20) exp (cid:0) τ / ( µ ) (cid:12)(cid:12)(cid:12) ˆ X t − X t (cid:12)(cid:12)(cid:12) / (cid:1)(cid:21) ≤ E h exp (cid:16) τ / ( µ ) (cid:8) | ˆ X t | / + | X t | / (cid:9)(cid:17)i ≤ exp h τ / ( µ ) τ α/ ( µ log( µ )) i E h exp (cid:16) τ / ( µ ) | X t | / (cid:17)i ≤ c exp h τ / ( µ ) τ α/ ( µ log( µ )) + 2 / c τ / ( µ ) i , where for the last inequality, we apply Lemma 6 with and ξ = | X t | / , and thus c = (5 A / / . Now notice that, using Markov’s inequality P ( G ) = P ( | X t | > τ ˜ α ( µ )) ≤ c exp (cid:0) − η τ α ( µ ) (cid:1) . Since ˜ α < / , Assumptions (LP3) and (LP4) imply that for sufficiently large µ : P ( F ) ≤ K exp( − µ ) ≤ exp (cid:0) − η τ α ( µ ) (cid:1) . Moreover, Lemma 5 also implies that, for sufficiently large µ : P ( E ) ≤ K exp (cid:16) − g τ ( µ ) log ( µ ) (cid:17) ≤ exp (cid:0) − η τ α ( µ ) (cid:1) . H denotes either E , F or G , we have E (cid:2) ϕ (cid:0) τ ( µ ) Z (cid:1)(cid:3) P ( H ) ≤ C exp h τ / ( µ ) τ α/ ( µ log( µ )) + 2 / c τ / ( µ ) − η τ α ( µ ) i , where C denotes a positive constant. The choice α = 5 / and ˜ α = 9 / allows us todeduce that E (cid:2) ϕ (cid:0) τ ( µ ) Z (cid:1)(cid:3) P ( H ) is bounded. This concludes the proof of Theorem 2. Proof of Theorem 3.
By Theorem 2, max ≤ n ≤ N E (cid:20) ϕ (cid:26) τ ( µ ) (cid:12)(cid:12)(cid:12) b X [ n ] t − X [ n ] t (cid:12)(cid:12)(cid:12) (cid:27)(cid:21) ≤ Γ where τ ( µ ) = 1log ( µ ) (cid:18) µ log( µ ) (cid:19) ςt ςt , and ϕ ( x ) = exp( x / ) . Now, let x = 256 and consider e ϕ ≤ ϕ defined by e ϕ ( x ) = ( ϕ ( x )( x − x ) + ϕ ( x ) if x ≤ x ϕ ( x ) if x ≥ x , and note that e ϕ is nondecreasing and convex. Then, by Lemma 1.6 in [22], E (cid:18) max ≤ n ≤ N (cid:12)(cid:12)(cid:12) b X [ n ] t − X [ n ] t (cid:12)(cid:12)(cid:12) (cid:19) ≤ τ − ( µ ) e ϕ ← (Γ N ) , where e ϕ ← denotes the inverse function of e ϕ . Moreover, for N sufficiently large, wehave e ϕ ← (Γ N ) = log (Γ N ) . Acknowledgements
The authors wish to thank Groupe Renault and the ANRT (French National Associa-tion for Research and Technology) for their financial support via the CIFRE conventionno. 2017/1116.
Supplementary Material
In the Supplementary Material, we provide some additional proofs and simulationresults. In Section A, we provide the proof of three technical lemmas stated in theAppendix A and B and used for Theorems 1 and 2, while in Section B, momentbounds for the spacings between the observation times. Details on our simulationexperiments and additional results are gathered in Section C. Details on the trafficflow analysis are provided in Section D. Local regularity estimates for another threereal data applications (Canadian weather, Household Active Power Consumption andPPG-Dalie data sets) are presented in Section E.
References [1] D. Blanke and C. Vial. Global smoothness estimation of a Gaussian process fromgeneral sequence designs.
Electronic Journal of Statistics , 8(1):1152–1187, 2014.342] G. Chan and A. T. A. Wood. Estimation of fractal dimension for a class of non-Gaussian stationary processes and fields.
Annals of Statistics , 32(3):1222–1260,June 2004.[3] A. G. Constantine and P. Hall. Characterizing Surface Smoothness via Estimationof Effective Fractal Dimension.
Journal of the Royal Statistical Society. Series B(Methodological) , 56(1):97–113, 1994.[4] C. Dong, J. M. Dolan, and B. Litkouhi. Intention estimation for ramp mergingcontrol in autonomous driving. In , pages 1584–1589, 2017.[5] J. Fan and I. Gijbels.
Local polynomial modelling and its applications . Number 66in Monographs on statistics and applied probability , ISSN 0960-6696 ; ZDB-ID:22968-4. Chapman & Hall, London, 1996.[6] C. Fraley and A. E. Raftery. Model-Based Clustering, Discriminant Analysis, andDensity Estimation.
Journal of the American Statistical Association , 97(458):611–631, June 2002.[7] S. Gaïffas. On pointwise adaptive curve estimation based on inhomogeneous data.
ESAIM: Probability and Statistics , 11:344–364, 2007.[8] T. Gneiting, H. Ševčíková, and D. B. Percival. Estimators of Fractal Dimension:Assessing the Roughness of Time Series and Spatial Data.
Statistical Science ,27(2):247–277, 2012.[9] A. Goldenshluger and O. Lepski. Bandwidth selection in kernel density estimation:Oracle inequalities and adaptive minimax optimality.
The Annals of Statistics ,39(3):1608–1632, June 2011.[10] J. Halkias and J. Colyar. NGSIM interstate 80 freeway dataset. Technical report,US Federal Highway Administration, Washington, DC, USA, 2006.[11] T. Hayfield and J. S. Racine. Nonparametric Econometrics: The np Package.
Journal of Statistical Software , 27(1):1–32, 2008.[12] M. Henaff, A. Canziani, and Y. LeCun. Model-Predictive Policy Learning withUncertainty Regularization for Driving in Dense Traffic. arXiv:1901.02705 [cs,stat] , Jan. 2019.[13] L. Horvàth and P. Kokoszka.
Inference for Functional Data with Applications .Springer Series in Statistics. Springer-Verlag, New York, 2012.[14] Y. Hu, W. Zhan, and M. Tomizuka. A Framework for Probabilistic Generic TrafficScene Prediction. arXiv:1810.12506 [cs, stat] , Oct. 2018.[15] W. B. Johnson, G. Schechtman, and J. Zinn. Best Constants in Moment In-equalities for Linear Combinations of Independent and Exchangeable RandomVariables.
The Annals of Probability , 13(1):234–253, 1985.3516] J. Mercat, N. E. Zoghby, G. Sandou, D. Beauvois, and G. P. Gil. Inertial Sin-gle Vehicle Trajectory Prediction Baselines and Applications with the NGSIMDataset. arXiv:1908.11472 [cs] , Aug. 2019.[17] M. Montanino and V. Punzo. Making NGSIM Data Usable for Studies on TrafficFlow Theory.
Transportation Research Record: Journal of the TransportationResearch Board , 2390:99–111, Dec. 2013.[18] V. Punzo, M. T. Borzacchiello, and B. F. Ciuffo. Estimation of Vehicle Trajecto-ries from Observed Discrete Positions and Next-Generation Simulation Program(NGSIM) Data. 2009.[19] J. Ramsay and B. W. Silverman.
Functional Data Analysis . Springer Series inStatistics. Springer-Verlag, New York, 2 edition, 2005.[20] D. Revuz and M. Yor.
Continuous Martingales and Brownian Motion . SpringerScience & Business Media, Mar. 2013.[21] L. Scrucca, M. Fop, T. B. Murphy, and A. E. Raftery. mclust 5: clustering,classification and density estimation using Gaussian finite mixture models.
TheR Journal , 8(1):289–317, 2016.[22] A. B. Tsybakov.
Introduction to Nonparametric Estimation . Springer Series inStatistics. Springer New York, 2009.[23] J.-L. Wang, J.-M. Chiou, and H.-G. Mueller. Review of Functional Data Analysis. arXiv:1507.05135 [stat] , July 2015.[24] B. Wulfe, S. Chintakindi, S.-C. T. Choi, R. Hartong-Redden, A. Kodali, andM. J. Kochenderfer. Real-time Prediction of Intermediate-Horizon AutomotiveCollision Risk. arXiv:1802.01532 [cs] , Feb. 2018.[25] X. Zhang and J.-L. Wang. From sparse to dense functional data and beyond.
TheAnnals of Statistics , 44(5):2281–2321, Oct. 2016.36 upplementary material for “Learning the smoothness ofnoisy curves with application to online curve estimation”
Steven Golovkine ∗ Nicolas Klutchnikoff † Valentin Patilea ‡ September 9, 2020
In this Supplementary Material, we provide some additional proofs and simulationresults. In Section A below, we provide the proof of three technical lemmas statedin the Appendix of the main manuscript and used for Theorems 1 and 2, while inSection B, we derive moments bounds for the spacings between the observation times.Details on our simulation experiments and additional results are gathered in Section C.Details on the traffic flow analysis are provided in Section D. Local regularity estimatesfor another three real data applications (Canadian weather, Household Active PowerConsumption and PPG-Dalia data sets) are presented in Section E.
A Technical lemmas
Lemma 2.
Let < α ≤ be a fixed parameter and let r be an integer such that ( µ + 1) βf α βf α ≤ r ≤ K with K ≤ µ µ ) . Let s ∈ { , , , } and let ≤ k, l ≤ K be such that l − k = s r . Then, for sufficientlylarge µ , we have (cid:12)(cid:12)(cid:12)(cid:12) E B (cid:2)(cid:12)(cid:12) T ( l ) − T ( k ) (cid:12)(cid:12) α (cid:3) − (cid:18) l − kf ( t )( µ + 1) (cid:19) α (cid:12)(cid:12)(cid:12)(cid:12) ≤ c ( α ) (cid:18) l − kf ( t )( µ + 1) (cid:19) α (1+ β f / , where c ( α ) = 8 c (2 f ( t )) β f α/ and c is defined by (16) .Proof of Lemma 2. Let C = { M ≥ K } \ B . We have E B (cid:2)(cid:12)(cid:12) T ( l ) − T ( k ) (cid:12)(cid:12) α (cid:3) = E (cid:2)(cid:12)(cid:12) T ( l ) − T ( k ) (cid:12)(cid:12) α B (cid:3) = ( I ) − ( II ) where ( I ) = E (cid:2)(cid:12)(cid:12) T ( l ) − T ( k ) (cid:12)(cid:12) α M ≥ K (cid:3) and ( II ) = E (cid:2)(cid:12)(cid:12) T ( l ) − T ( k ) (cid:12)(cid:12) α C (cid:3) . We study separately the two terms of the right hand side of the above equation. ∗ Groupe Renault & CREST - UMR 9194, Rennes, France, [email protected] † Univ Rennes, CNRS, IRMAR - UMR 6625, F-35000 Rennes, France, nicolas.klutchnikoff@univ-rennes2.fr ‡ Ensai, CREST - UMR 9194, Rennes, France, [email protected] a r X i v : . [ m a t h . S T ] S e p tudy of ( II ) . Note that E (cid:2)(cid:12)(cid:12) T ( l ) − T ( k ) (cid:12)(cid:12) α C (cid:3) ≤ | I | α P ( C ) . The event C happens if, less than K random times among T , . . . , T M fall into theinterval J µ ( t ) . This implies that P ( C ) ≤ E (cid:2) P ( B M < K | M ) { M ≥ K } (cid:3) where, for any integer m ≥ , B m denotes a Binomial random variable B ( m, | J µ ( t ) | / | I | ) ,independent of M . Using the Bernstein inequality, we obtain P ( B M < K | M ) ≤ exp (cid:18) − | J µ ( t ) || I | M + 2 K (cid:19) . Since | J µ ( t ) | / | I | ≤ (log( µ )) − and K ≤ (2 log( µ )) − µ , we obtain P ( C ) ≤ exp (cid:18) − | J µ ( t ) || I | µ + 2 K (cid:19) E (cid:20) exp (cid:18) − | J µ ( t ) || I | ( M − µ ) (cid:19) { M ≥ K } (cid:21) ≤ exp (cid:18) − µ log( µ ) (cid:19) E (cid:20) exp (cid:18) − | J µ ( t ) || I | ( M − µ ) (cid:19) { M ≥ K } (cid:21) . To bound the last expectation, let < (cid:15) < be some real number. Then exp (cid:18) − µ log( µ ) (cid:19) E (cid:20) exp (cid:18) − | J µ ( t ) || I | ( M − µ ) (cid:19) { M ≥ K } (cid:21) ≤ exp (cid:18) − µ log( µ ) (cid:19)(cid:26) exp (cid:18) | J µ ( t ) || I | µ(cid:15) (cid:19) + E (cid:20) exp (cid:18) − | J µ ( t ) || I | ( M − µ ) (cid:19) { K ≤ M ≤ µ − µ(cid:15) } (cid:21)(cid:27) ≤ exp (cid:18) − µ log( µ ) (cid:19) (cid:26) exp (cid:18) µ(cid:15) log( µ ) (cid:19) + exp (cid:18) µ log( µ ) (cid:19) P [ | M − µ | > µ(cid:15) ] (cid:27) ≤ exp (cid:18) − µ log( µ ) (cid:19) (cid:26) exp (cid:18) µ(cid:15) log( µ ) (cid:19) + exp (cid:18) µ log( µ ) (cid:19) exp( − γ µ(cid:15) ) (cid:27) . This implies that: P ( C ) ≤ exp (cid:20) − µ log( µ ) (1 − (cid:15) ) (cid:21) + exp (cid:20) − µ(cid:15) (cid:18) γ − (cid:15) log( µ ) (cid:19)(cid:21) . Taking (cid:15) = 1 / , we obtain, for sufficiently large µ : P ( C ) ≤ (cid:20) − µ µ ) (cid:21) . We finally obtain, for sufficiently large µ , ( II ) = E (cid:2)(cid:12)(cid:12) T ( l ) − T ( k ) (cid:12)(cid:12) α C (cid:3) ≤ | I | α exp (cid:20) − µ µ ) (cid:21) . (1) Study of ( I ) . We define the random variable ρ by the equation: E (cid:2)(cid:12)(cid:12) T ( l ) − T ( k ) (cid:12)(cid:12) α | M (cid:3) = (cid:18) l − kf ( t )( M + 1) (cid:19) α (cid:0) ρ (cid:1) | ρ | ≤ c ( M + 1 s r + 1 M s r + (cid:18) s rM + 1 (cid:19) β f α/ ) . Whenever r ≥ ( µ + 1) β f α/ (4+ β f α ) , by bounding smaller terms by the dominant onesand balancing the dominant terms on the right hand side of the last inequality, wehave for µ large enough: | ρ | ≤ c (cid:18) s rM + 1 (cid:19) β f α/ + c (cid:18) s rµ + 1 (cid:19) β f α/ . (2)On the other hand we have E (cid:2)(cid:12)(cid:12) T ( l ) − T ( k ) (cid:12)(cid:12) α M ≥ K (cid:3) = E (cid:0) E (cid:2)(cid:12)(cid:12) T ( l ) − T ( k ) (cid:12)(cid:12) α | M (cid:3) M ≥ K (cid:1) = E (cid:20)(cid:18) l − kf ( t )( M + 1) (cid:19) α (cid:0) ρ (cid:1) M ≥ K (cid:21) = (cid:18) l − kf ( t )( µ + 1) (cid:19) α E (cid:20)(cid:18) µ + 1 M + 1 (cid:19) α (cid:0) ρ (cid:1) M ≥ K (cid:21) . (3)Now, define: t = (log( µ + 1)) ≤ ( µ + 1) / , and consider the following decomposition: E (cid:20)(cid:18) µ + 1 M + 1 (cid:19) α (cid:0) ρ (cid:1) M ≥ K (cid:21) = E (cid:20)(cid:18) µ + 1 M + 1 (cid:19) α (cid:0) ρ (cid:1) M ≥ K | M − µ |≤ t (cid:21) + E (cid:20)(cid:18) µ + 1 M + 1 (cid:19) α (cid:0) ρ (cid:1) M ≥ K | M − µ | >t (cid:21) . Using Assumption (H6), combined with the fact that r ≤ µ , the term of the right handside can be roughly bounded as follows: E (cid:20)(cid:18) µ + 1 M + 1 (cid:19) α (cid:0) ρ (cid:1) M ≥ K | M − µ | >t (cid:21) ≤ c ( µ + 1) α (1+ αβ f / P ( | M − µ | > t ) ≤ c ( µ + 1) α (1+ αβ f / exp (cid:16) − γ µ + 1)) (cid:17) . Thus, for sufficiently large µ , we have: E (cid:20)(cid:18) µ + 1 M + 1 (cid:19) α (cid:0) ρ (cid:1) M ≥ K | M − µ | >t (cid:21) ≤ µ + 1 . (4)It remains to study the term E (cid:20)(cid:18) µ + 1 M + 1 (cid:19) α (cid:0) ρ (cid:1) M ≥ K | M − µ |≤ t (cid:21) . To do so, let us define ˜ ρ α = (cid:18) µ + 1 M + 1 (cid:19) α − . K < ( µ + 1) / , we have (cid:18) µ + 1 M + 1 (cid:19) α (cid:0) ρ (cid:1) M ≥ K | M − µ |≤ t = (cid:0) ρ α (cid:1)(cid:0) ρ (cid:1) | M − µ |≤ t = 1 + (cid:0) ˜ ρ α + ρ + ˜ ρ α ρ (cid:1) | M − µ |≤ t − | M − µ | >t . (5)Under the event {| M − µ ≤ t |} , since t < ( µ + 1) / , we have: − αtµ + 1 ≤ (cid:18) µ + 1 M + 1 (cid:19) α ≤ α − tµ + 1 , which leads to | ˜ ρ α | {| M − µ ≤ t |} ≤ α − tµ + 1 = (2 α −
1) log ( µ + 1) µ + 1 . (6)Note also that by (2): | ρ | | M − µ |≤ t ≤ c (cid:18) s rµ + 1 (cid:19) β f α/ . (7)Gathering (5), (6) and (7) we obtain, for sufficiently large µ : (cid:12)(cid:12)(cid:12)(cid:12) E (cid:20)(cid:18) µ + 1 M + 1 (cid:19) α (cid:0) ρ (cid:1) M ≥ K | M − µ |≤ t (cid:21) − (cid:12)(cid:12)(cid:12)(cid:12) ≤ c (cid:18) s rµ + 1 (cid:19) β f α/ + P ( | M − µ | > t ) ≤ c (cid:18) s rµ + 1 (cid:19) β f α/ . (8)Combining (3) with (4) and (8), we obtain, for sufficiently large µ : E (cid:2)(cid:12)(cid:12) T ( l ) − T ( k ) (cid:12)(cid:12) α M ≥ K (cid:3) = (cid:18) l − kf ( t )( µ + 1) (cid:19) α (cid:16) R (cid:17) , (9)where | ˜ R | ≤ c (cid:18) s rµ + 1 (cid:19) β f α/ . From (1) and (9), we obtain, for µ large enough: E B (cid:2)(cid:12)(cid:12) T ( l ) − T ( k ) (cid:12)(cid:12) α (cid:3) = (cid:18) l − kf ( t )( µ + 1) (cid:19) α (1 + R ) , where | R | ≤ c (cid:18) s rµ + 1 (cid:19) β f α/ = 8 c (2 f ( t )) β f α/ (cid:18) l − kf ( t )( µ + 1) (cid:19) β f α/ . This ends the proof.Let us recall the definitions A = A M,h = 1
M h M X m =1 U (cid:18) T m − t h (cid:19) U > (cid:18) T m − t h (cid:19) K (cid:18) T m − t h (cid:19) , (10)4nd A = f ( t ) Z R U ( u ) U > ( u ) K ( u ) du, with U ( u ) = (1 , u, . . . , u ˆ d / ˆ d !) . Moreover, λ and λ are the smallest eigenvalues of A and A , respectively. The matrix A is positive definite and thus λ > . See [10].The following result shows that, with high probability, λ stays away from zero. Letus recall that in our context, ˆ d is a generic estimator of d , independent of the onlineset of curves. Since dimension of the matrices A and A are given by this estimator,the probability P ( · ) in Lemma 5 should be understood as the conditional probabilitygiven the estimator ˆ d . Finally, recall that b h = (cid:18) M (cid:19) / (2 b ς t +1) . Lemma 5.
Let K ( · ) be a kernel such that, for any t ∈ R : κ − [ − δ,δ ] ( t ) ≤ K ( t ) ≤ κ [ − , ( t ) , for some < δ < and κ ≥ . (11) Under Assumptions (LP2), (LP3) and (LP4), the matrix A defined as in (10) , with h = b h , is positive semidefinite. Moreover, there exists a positive constant g that dependsonly on K , d , f ( t ) and λ such that, for M sufficiently large, P ( λ ≤ β | M ) ≤ − g M b h ) , ∀ < β ≤ λ / , (12) and, for sufficiently large µ , sup <β ≤ λ / P ( λ ≤ β ) ≤ K exp h − g τ ( µ ) log ( µ ) i where τ ( µ ) = 1log ( µ ) (cid:18) µ log( µ ) (cid:19) ςt ςt , (13) with ς t = d + H t . Here, K is a universal constant.Proof of Lemma 5. Without loss of generality, we could work on the set { ˆ d = d } .Moreover, for simplicity, we write h instead of b h below in this proof.Note that, using Assumption (LP2), for any ≤ i ≤ j ≤ d , the element A i,j tendsalmost surely to the element A i,j as µ goes to infinity. This implies that the matrix A tends to the matrix A . This also implies that, for sufficiently large µ , we have λ > .More precisely, we have: | λ − λ | ≤ k A − A k ≤ ( d + 1) k A − A k ∞ , where k · k denotes the norm induced by the Euclidean norm whereas k · k ∞ denotesthe entrywise sup-norm. Let ˜ P ( · ) and ˜ E ( · ) denote the conditional probability P ( ·| M ) and conditional expectation E ( ·| M ) , respectively. Then: ˜ P ( λ ≤ β ) ≤ ˜ P ( | λ − λ | ≥ λ / ≤ X ≤ i,j ≤ d ˜ P ( | ( A n ) i,j − A i,j | ≥ λ / { d + 1) } ) . Next, we decompose A i,j − A i,j = A i,j − ˜ E ( A i,j ) + ˜ E ( A i,j ) − A i,j . K ( · ) has the support [ − , , we have: (cid:12)(cid:12)(cid:12) ˜ E ( A i,j ) − A i,j (cid:12)(cid:12)(cid:12) ≤ (cid:12)(cid:12)(cid:12)(cid:12)Z R h U ( u ) U > ( u ) i i,j K ( u ) (cid:8) f ( t + hu ) − f ( t ) (cid:9) du (cid:12)(cid:12)(cid:12)(cid:12) = L f h β f Z R (cid:12)(cid:12)(cid:12)(cid:12)h U ( u ) U > ( u ) i i,j uK ( u ) (cid:12)(cid:12)(cid:12)(cid:12) du ≤ L f h β f Z R | u | K ( u ) du =: L f k K k h β f . This implies that, for h sufficiently small, that is for M sufficiently large, ˜ P ( λ ≤ β ) ≤ X ≤ i,j ≤ d ˜ P ( | A i,j − ˜ E ( A i,j ) | ≥ λ / { d + 1) } − L f k K k h β f ) ≤ X ≤ i,j ≤ d ˜ P ( | A i,j − ˜ E ( A i,j ) | > λ / { d + 1) } ) . Let us define ξ m,i,j = (cid:20) U (cid:18) T m − t h (cid:19) U > (cid:18) T m − t h (cid:19)(cid:21) i,j K (cid:18) T m − t h (cid:19) = 1 i ! j ! (cid:18) T m − t h (cid:19) i + j K (cid:18) T m − t h (cid:19) . By property (11), we have (cid:12)(cid:12)(cid:12) ξ m,i,j − ˜ E ( ξ m,i,j ) (cid:12)(cid:12)(cid:12) ≤ κ. Moreover, for h sufficiently small, that is for M sufficiently large, f ( t ) ≤ f ( t ) , ∀| t − t | ≤ h , and thus M X m =1 g Var( ξ ( m ) i,j ) ≤ M X m =1 ˜ E h { ξ ( m ) i,j } i ≤ f ( t ) M h Z R (cid:12)(cid:12)(cid:12)(cid:12)h U ( u ) U > ( u ) i i,j (cid:12)(cid:12)(cid:12)(cid:12) K ( u ) du ≤ f ( t ) k K k M h.
Applying the Bernstein inequality [see 8, p. 95], we obtain, for any x > : ˜ P M h M X m =1 (cid:12)(cid:12)(cid:12) ξ m,i,j − ˜ E ( ξ m,i,j ) (cid:12)(cid:12)(cid:12) > x ! ≤ − M x k f k ∞ k K k Mh + κxM h ! . Then equation (12) follows if we define: g = ψ (cid:18) λ d + 1) (cid:19) with ψ ( x ) = x k f k ∞ k K k + κx .
6t remains to prove (13). Let us define the events E β = { λ > β } and F = (cid:8) | b H t − H t | ≤ log − ( µ ) (cid:9) ∩ n ˆ d = d o . Using (12), we have P ( E β ) = 2 E [exp( − g M h ) F ] + P ( F ) ≤ E [exp( − g M h ) F ] + 2 K exp( − µ ) . The last line comes from Assumption (LP3). Note that under F M h = M { ˆ d + b Ht } { ˆ d + b Ht } +1 = M { d + b Ht } { d + b Ht } +1 = M { d + Ht } { d + Ht } +1 + η , with | η | = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) b H t − H t )(2 { ˆ d + b H t } + 1)(2 { d + H t } + 1) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ | b H t − H t | . Assumption (LP2) implies that, under F and, for sufficiently large µ , M h ≥ (cid:18) µ log( µ ) (cid:19) { d + Ht } { d + Ht } +1 − µ ) ≥ (cid:18) µ log( µ ) (cid:19) { d + Ht } { d + Ht } +1 . Thus, we have P ( E β ) ≤ (cid:16) − g τ ( µ ) log ( µ ) (cid:17) + 2 K exp( − µ ) ≤ K exp h − g τ ( µ ) log ( µ ) i , for some positive constant K , that does not depend on < β ≤ λ / . Lemma 6.
Let ξ be a positive random variable such that c := E (cid:2) exp (cid:0) η ξ (cid:1)(cid:3) < ∞ , for some positive constant η . Then, for any τ ≥ : E [exp ( τ ξ )] ≤ c exp (cid:16) c τ / (cid:17) where c = (cid:18) η (cid:19) / . Proof of Lemma 6.
Defining ζ = (16 η / / ξ , we can assume, without loss of gener-ality that η = 5 / . Let γ ≥ τ . Remark that, since − τ ξγ = (cid:18) − τ ξ γ (cid:19) − (cid:18) τ ξ γ (cid:19) + 4 (cid:18) τ ξ γ (cid:19) − (cid:18) τ ξ γ (cid:19) , we obtain: E [exp ( τ ξ )] = exp( γ ) E (cid:20) exp (cid:18) − γ (cid:18) − τ ξγ (cid:19)(cid:19)(cid:21) ≤ exp( γ ) E " exp γ (cid:18) τ ξγ (cid:19) ! exp γ (cid:18) τ ξγ (cid:19) ! ≤ exp( γ + η ) E " exp − η − γ η (cid:18) τ ξγ (cid:19) !! exp γ (cid:18) τ ξγ (cid:19) ! . − γ η (cid:18) τ ξγ (cid:19) = " − γ η (cid:18) τ ξγ (cid:19) − " γ η (cid:18) τ ξγ (cid:19) , we obtain: E [exp ( τ ξ )] ≤ exp( γ + η ) E (cid:20) exp (cid:18) τ ξ ηγ + 1256 τ ξ γ (cid:19)(cid:21) . Taking γ = η = τ / / , we obtain: E [exp ( τ ξ )] ≤ exp( τ / ) E (cid:20) exp (cid:18) ξ (cid:19)(cid:21) . This completes the proof.
B Moment bounds for spacings
We need to find an accurate approximation for moments like E [( T ( k ) − T ( l ) ) α | M = m ] where ≤ l < k ≤ K ≤ m , α > . Here, T (1) ≤ . . . ≤ T ( K ) are defined as in section2, that is the subvector of the K closest values to t . We assume that T admits adensity f . Such moments will be considered with k and l such that, for some fixedvalue t ∈ [0 , such that f ( t ) > , max( |b t m c − k | , |b t m c − l | ) m + 1 ≤ k − lm + 1 (14)and k − lm + 1 is small , (15)and converges to zero when m → ∞ . Herein, for any real number a , b a c denotes thelargest integer smaller than or equal to a . These conditions on k and l allows for ( k − l ) increasing slower than m .Let us point out that T (1) ≤ . . . ≤ T ( K ) defined in section 2 is not the orderstatistics from a random sample of T . In fact, T ( k ) , with ≤ k ≤ K , is the ( G + k ) − thorder statistics of the sample T , . . . , T m . Here G is a random variable and its value isdetermined by the way the subvector of K closest values to t is built. It is importantto notice that G depends of the smallest and the largest values in this subvector, butis independent of the other components of the subvector. In particular, this meansthat in the case where T has a uniform distribution, the law of the spacings between T (1) ≤ . . . ≤ T ( K ) coincides with the law of the same type of spacings between theorder statistics of a uniform sample of size m on [0 , . In particular, in the uniformcase, the law of T ( k ) − T ( l ) depends only on m and k − l . For this reason, first we considerthe case of T with uniform law. In the general case, we use the transformation by thedistribution function in order to get back to the uniform case.8 .1 The uniform case Consider U a uniform random variable on [0 , . Let U , . . . , U m be an independentsample of U and let U (1) , . . . , U ( m ) be the order statistics. In the case of a uniformsample, U ( k ) − U ( l ) and U ( k − l ) have the same distribution, that is a beta distributionBeta ( k − l, m − ( k − l ) + 1) . Hence in this case, it is equivalent to study the moments of U ( r ) with ≤ r = k − l ≤ m − . The variable U ( r ) has a Beta ( r, m − r +1) distribution.It also worthwhile to notice that U ( k ) − U ( l ) and U ( l ) are independent, and the same istrue for U ( k ) − U ( l ) and U ( k ) .By elementary calculations, we have E h U α ( r ) i = B ( α + r, m − r + 1) B ( r, m − r + 1) = Γ( α + r )Γ( r ) Γ( m + 1)Γ( m + α + 1) , where B ( · , · ) denotes the beta function and Γ( · ) the gamma function. To derive thebounds for the moments of interest, we use some existing results on the approximationof the gamma functions and the ratios of the gamma functions. The results are recalledin Section B.3 below. B.1.1 Moment bounds in the uniform case
Let M be a random variable taking positive integer values. In the following propositionwe assume that, given the realization of M ≥ K , T , . . . , T M be an independent samplewith uniform distribution on [0 , . Lemma SM 1.
Consider < α ≤ and ≤ l < k ≤ m , and let r = k − l . Then, forany m ≥ K in the support of M , (cid:12)(cid:12)(cid:12)(cid:12) E (cid:2) ( T ( k ) − T ( l ) ) α | M = m (cid:3) − Γ( α + r )Γ( r ) 1( m + 1) α (cid:12)(cid:12)(cid:12)(cid:12) ≤ m Γ( α + r )Γ( r ) 1( m + 1) α and (cid:12)(cid:12)(cid:12)(cid:12) E (cid:2) ( T ( k ) − T ( l ) ) α | M = m (cid:3) − (cid:18) rm + 1 (cid:19) α (cid:12)(cid:12)(cid:12)(cid:12) ≤ (cid:18) rm + 1 (cid:19) α (cid:20) m + 4 r + 12 mr (cid:21) . Proof of Lemma SM 1.
Given that M = m , T ( k ) − T ( l ) is distributed as U ( r ) , the r − thorder statistic, with ≤ r = k − l ≤ m − , of an independent sample of size m fromthe uniform law on [0 , . Using inequality (21) with x = m + 1 and s = α , we canwrite (cid:12)(cid:12)(cid:12)(cid:12) E h U α ( r ) | M = m i − Γ( α + r )Γ( r ) 1( m + 1) α (cid:12)(cid:12)(cid:12)(cid:12) = Γ( α + r )Γ( r ) 1( m + 1) α (cid:12)(cid:12)(cid:12)(cid:12) ( m + 1) α Γ( m + 1)Γ( m + α + 1) − (cid:12)(cid:12)(cid:12)(cid:12) ≤ m Γ( α + r )Γ( r ) 1( m + 1) α . x = r and s = α , and triangle inequality (cid:12)(cid:12)(cid:12)(cid:12) E h U α ( r ) | M = m i − (cid:18) rm + 1 (cid:19) α (cid:12)(cid:12)(cid:12)(cid:12) ≤ (cid:12)(cid:12)(cid:12)(cid:12) E h U α ( r ) | M = m i − Γ( α + r )Γ( r ) 1( m + 1) α (cid:12)(cid:12)(cid:12)(cid:12) + (cid:18) rm + 1 (cid:19) α (cid:12)(cid:12)(cid:12)(cid:12) Γ( α + r ) r α Γ( r ) − (cid:12)(cid:12)(cid:12)(cid:12) ≤ (cid:18) rm + 1 (cid:19) α (cid:20) m Γ( α + r ) r α Γ( r ) + 4 r (cid:21) ≤ (cid:18) rm + 1 (cid:19) α (cid:20) m (cid:18) r (cid:19) + 4 r (cid:21) . B.2 The general case
Given the realization of M , let T , T , . . . be an independent sample from T , a randomvariable independent of M , with an absolute continuous distribution on [0 , . Let f (resp. F ) (resp. Q ) denote the density (resp. distribution function) (resp. quantilefunction) of T . We assume that F is strictly increasing on [0 , and thus Q is theinverse function for F , and Q is differentiable with Q = 1 /f . Then, given M = m ,for any ≤ l < k ≤ m , the joint distribution of the order statistics ( T ( k ) , T ( l ) ) is thesame as the joint distribution of ( Q ( U ( k ) ) , Q ( U ( l ) )) , where U (1) , . . . , U ( m ) is the orderstatistics of an independent uniform sample on [0 , . B.2.1 Moment bounds in the general case
Assume inf t ∈ [0 , f ( t ) > and f is Hölder continuous around t , i.e. there exists L f > , < β f ≤ , and a neighborhood of t in [0 , such that for any u, v in thisneighborhood, | f ( u ) − f ( v ) | ≤ L f | u − v | β f . Lemma SM 2.
Let m be an integer value in the support of M . Let t ∈ [0 , , assumethat k and l are satisfying the conditions (14) - (15) , and let r = k − l . The for any < α ≤ , (cid:12)(cid:12)(cid:12)(cid:12) E (cid:2) ( T ( k ) − T ( l ) ) α | M = m (cid:3) − Γ( α + r )Γ( r ) (cid:18) f ( t )( m + 1) (cid:19) α (cid:12)(cid:12)(cid:12)(cid:12) ≤ Γ( α + r )Γ( r ) (cid:18) f ( t )( m + 1) (cid:19) α " m + C (cid:18) rm + 1 (cid:19) αβ f / , and (cid:12)(cid:12)(cid:12)(cid:12) E (cid:2) ( T ( k ) − T ( l ) ) α | M = m (cid:3) − (cid:18) rf ( t )( m + 1) (cid:19) α (cid:12)(cid:12)(cid:12)(cid:12) ≤ (cid:18) rf ( t )( m + 1) (cid:19) α " m + 4 r + 12 mr + C (cid:18) rm + 1 (cid:19) αβ f / ≤ c (cid:18) rf ( t )( m + 1) (cid:19) α " m + 1 r + 1 mr + (cid:18) rm + 1 (cid:19) αβ f / , (16) with C and c are two constants depending only on α and L f , β f and f ( t ) . roof of Lemma SM 2. In the following, we use several times the following property:for any a, b, α ≥ , ( a + b ) α ≤ max(1 , α − ) ( a α + b α ) . Next, given M = m , E (cid:2) ( T ( k ) − T ( l ) ) α | M = m (cid:3) = E (cid:2) { Q ( U ( k ) ) − Q ( U ( l ) ) } α | M = m (cid:3) .By a first order Taylor expansion of Q ( U ( k ) ) around the point U ( l ) , we get Q ( U ( k ) ) − Q ( U ( l ) ) = 1 f ( t ) (cid:2) U ( k ) − U ( l ) (cid:3) [1 + r ( m, k, l )] , (17)with r ( m, k, l ) = Z f ( t ) − f ( U ( l ) + t [ U ( k ) − U ( l ) ]) f ( U ( l ) + t [ U ( k ) − U ( l ) ]) dt. Note that due to the fact the Q is increasing and almost surely U ( k ) > U ( l ) , the identity(17) implies that r ( m, k, l ) > almost surely. Using the triangle inequality andthe properties of f , | r ( m, k, l ) | ≤ L f f ( t ) / (cid:16) | U ( l ) − t | β f + | U ( k ) − U ( l ) | β f (cid:17) . Let t m = b t ( m − c + 1 m + 1 . Note that / ( m + 1) ≤ t m ≤ m/ ( m + 1) and t m = E [ U ( t m ( m +1)) ] . Next, we can bound | U ( l ) − t | ≤ | U ( l ) − t m | + | t m − t | ≤ | U ( l ) − E [ U ( t m ( m +1)) ] | + 2 m + 1 ≤ | U ( l ) − U ( t m ( m +1)) | + | U ( t m ( m +1)) − E [ U ( t m ( m +1)) ] | + 2 m + 1 . Thus, with the convention U (0) = 0 , E h | U ( l ) − t | β f | M = m i ≤ E h U β f ( | l − t m ( m +1) | ) | M = m i + E h(cid:12)(cid:12) U ( t m ( m +1)) − E [ U ( t m ( m +1)) ] (cid:12)(cid:12) β f | M = m i + (cid:18) m + 1 (cid:19) β f . By the facts presented in the uniform case, when l = t m ( m + 1) , E h U β f ( | l − t m ( m +1) | ) | M = m i = Γ( β f + | l − t m ( m + 1) | )Γ( | l − t m ( m + 1) | ) Γ( m + 1)Γ( m + β f + 1) , and using Wendel’s double inequality (19) with s = β f , and (14), the product of theratios of the gamma functions is bounded from above by (cid:18) | l − t m ( m + 1) | m + 1 + β f (cid:19) β f (cid:18) β f m + 1 (cid:19) ≤ (cid:18) rm + 1 (cid:19) β f .
11n the other hand, using Jensen’s inequality and the variance of a beta distributionwith parameters t m ( m + 1) and (1 − t m )( m + 1) , E h(cid:12)(cid:12) U ( t m ( m +1)) − E [ U ( t m ( m +1)) ] (cid:12)(cid:12) β f | M = m i ≤ E β f / h(cid:12)(cid:12) U ( t m ( m +1)) − E [ U ( t m ( m +1)) ] (cid:12)(cid:12) | M = m i = (cid:18) t m (1 − t m ) m + 2 (cid:19) β f / . Gathering facts and using Lemma SM 1, there exists a constant c such that E h | U ( l ) − t | β f | M = m i ≤ c (cid:18) rm + 1 (cid:19) β f / . On the other hand, since U ( k ) − U ( l ) is independent of U ( l ) , from above and LemmaSM 1 we deduce that for any < α ≤ α ≤ , E h { U ( k ) − U ( l ) } α | r ( m, k, l ) | α | M = m i ≤ C (cid:18) rm + 1 (cid:19) α + α β f / (18)for some constant C depending on L f , β f and f ( t ) .Coming back to relationship (17), taking power α on both sides of the identity, wecan write E (cid:2) { Q ( U ( k ) ) − Q ( U ( l ) ) } α | M = m (cid:3) = 1 f α ( t ) E h U α ( r ) | M = m i + R ( m, k, l ) with R ( m, k, l ) = E (cid:2) { U ( k ) − U ( l ) } α { [1 + r ( m, k, l )] α − } | M = m (cid:3) . Since for any a > − and < α ≤ , | (1 + a ) α − | = | (1 + a ) α/ − || (1 + a ) α/ + 1 | ≤ | a | α/ ( | a | α/ + 2) , using the bound (18) with α = α and α = α/ , | R ( m, k, l ) | ≤ c R (cid:18) rm + 1 (cid:19) α (1+ β f / , for some constant c R depending on L f , β f and f ( t ) . It remains to apply Lemma SM 1to complete the proof. B.3 Wendel’s type inequalities for gamma function ratios
Since in our case, we only need to consider α ∈ (0 , , we could use the sharp boundsfor the ratio of two gamma functions, as deduced by [11]. For any x > and s ≥ ,let R ( x, s ) = Γ( x + s )Γ( x ) . [11] proved that when ≤ s ≤ , (cid:18)
11 + s/x (cid:19) − s ≤ R ( x, s ) x s ≤ . (19)12ince − sx ≤ (cid:18)
11 + s/x (cid:19) − s , ∀ x ≥ , ≤ s ≤ , we can deduce that, when ≤ s ≤ , − x ≤ − sx ≤ R ( x, s ) x s ≤ , ∀ x ≥ . Next, using the recurrence formula for the gamma function, when ≤ s ≤ we canwrite R ( x, s ) x s = (cid:18) s − x (cid:19) R ( x, s − x s − and deduce − x ≤ (cid:18) s − x (cid:19) (cid:18) − s − x (cid:19) ≤ R ( x, s ) x s ≤ s − x ≤ x , ∀ x ≥ . For our purpose, we could deduce the following bounds: for any ≤ s ≤ , − x ≤ R ( x, s ) x s ≤ x , ∀ x ≥ , and − x − ≤ x s R ( x, s ) ≤ x − , ∀ x ≥ . Finally, using again the recurrence formula for the gamma function, when ≤ s ≤ ,we can write R ( x, s ) x s = (cid:18) s − x (cid:19) (cid:18) s − x (cid:19) R ( x, s − x s − and deduce, for ≤ s ≤ , and x ≥ , R ( x, s ) x s ≤ (cid:18) s − x (cid:19) (cid:18) s − x (cid:19) = 1 + 3 x + 2 x ≤ x , ∀ x ≥ , and x s R ( x, s ) ≥ − x + 2( x + 2)( x + 1) ≥ − x + 2 ≥ − x , ∀ x ≥ . On the other hand, R ( x, s ) x s ≥ (cid:18) x (cid:19) R ( x, s − x s − ≥ (cid:18) x (cid:19) (cid:18) − s − x (cid:19) ≥ − x and x s R ( x, s ) ≤ xx + 1 x s − R ( x, s − ≤ xx + 1 xx − ( s − ≤ x x − ≤ x − . Gathering facts, for ≤ s ≤ (cid:12)(cid:12)(cid:12)(cid:12) R ( x, s ) x s − (cid:12)(cid:12)(cid:12)(cid:12) ≤ x , ∀ x ≥ , (20)and (cid:12)(cid:12)(cid:12)(cid:12) x s R ( x, s ) − (cid:12)(cid:12)(cid:12)(cid:12) ≤ x − , ∀ x ≥ . (21)13 Additional simulation results
C.1 The settings for X In our simulations, we use three types of stochastic processes to generate the trajec-tories of X that we recall in the following. • Setting 1 : Fractional Brownian motion.
The curves are generated using aclassical fractional Brownian motion with constant Hurst parameter H ∈ (0 , .In this case, the local regularity of the process is the same at every point. Figure1a illustrates one realization of this setting. • Setting 2 : Piecewise fractional Brownian motion.
The curves are generated as aconcatenation of multiple fractional Brownian motions with different regularities,that is with different Hurst parameters for different time periods. In this case,the local regularity is no longer constant. Figure 1b illustrates one realization ofthis setting. • Setting 3 : Integrated fractional Brownian motion.
The curves X t are obtainedas integrals R t W H ( s ) ds , t ∈ [0 , , of the paths of a fractional Brownian motionprocess W H with constant Hurst parameter H . Here, the local regularity of theprocess is the same at each point but will be greater than , thus this setting cor-responds to the case of smooth trajectories. Figure 1c illustrates one realizationof this setting. C.2 On the computation time
Figure 2a presents the violin plots of the needed time to smooth N = 1000 curves.The results are obtained using the simulation (1 , , , , equi , . , . . Theycorrespond to the total CPU time (system time and user time) to estimate the band-width h n and then estimate the curves at their sampling points. We perform thesecomputations on a personal computer equipped with a processor Intel Core i7-6600U,CPU: 2.60 GHz , RAM: 24 Go and rerun the estimation times. We observe that oursmoothing device outperforms cross-validation and plug-in in terms of computationtime: about times faster than the cross-validation. Let H n be a set of bandwiths.For the cross-validation, we may explain these differences because of the computa-tion of the estimator for each bandwidth in H n and each curve X ( n ) of the sample( N × Card ( H n ) calls to the estimation function) while our estimator requires only oneestimation of the regularity of the functions and one evaluation of the estimator percurve ( N calls to the estimation function). In a similar way, figure 2b presents theviolin plots of the time necessary to smooth N = 1000 curves with the parametersof the simulation (3 , , , , equi , . , . . The same personal computeris used and the simulation is also run times. For setting , our procedure is slowerthan for setting , which can be easily explained by the computation of the derivativesof each curve X ( n ) . However, the computation time for the cross-validation is still notcomparable with ours. 14 (a) Brownian motion -2-10 0.00 0.25 0.50 0.75 1.00 (b) Piecewise Brownian motion (c) Integrated Brownian motion Figure 1: Illustrations of simulated data generated according to the different settings.The curves correspond to the generated trajectories without noise that we aim torecover, and the grey points correspond to the noisy measurements.15 ur methodCV1e+02 1e+03 1e+04 1e+05
Time [milliseconds] (a) For setting 1
Our methodCV 1e+03 1e+04 1e+05
Time [milliseconds] (b) For setting 3
Figure 2: Computational times (log scale) µ = 300 µ = 10000123 E s t i m a t i o n o f t h e l o c a l r e g u l a r i t y ς t N = 250 N = 1000 Figure 3: Estimation of the local regularity for fBm, with constant noise variance σ = 0 . , at t = 1 / . True value: ς t = 0 . . C.3 On the estimation of the local regularity
Figure 3 presents the results for the local regularity estimation for fBm with ho-moscedastic noise. The local esitmation of H t is performed at t = 1 / which corre-spond to the middle of the interval. The true value of H t is . . The results show an ac-curate estimator b H t , except, maybe, for the simulation (1 , , , , equi , . , . where there is not enough curves compared to the number of sampling points.Figure 4 presents the results for the local regularity estimation for piecewise fBmwith heteroscedastic noise. The local estimations of H t are performed at t = 1 / , / and / which correspond to the middle of the interval for each regularity. The truevalues of H t are . , . and . , respectively. The true values of σ are . , . and . , respectively. The results show an accurate estimator b H t .16 .00.51.01.52.0 N = 250 N = 1000 µ = 300 N = 250 N = 1000 µ = 1000 E q u i s p a ce d H = 0 . H = 0 . H = 0 . Figure 4: Estimation of the local regularity for piecewise fBm, with non-constant noisevariance σ = 0 . , . and . , at t = 1 / , / and / , respectively. True values: ς t = H t equal to . , . and . , respectively. C.4 On the pointwise risk
For technical convenience, in our theoretical study, we only considered the case wherethe regularity estimator ς t is applied with an independent sample. If one wants tosmooth the curves in the learning set, one can use a leave-one-out method. That is,for each curve, one can estimate the local regularity without that curve, and smooththe curve with the estimate obtained. Our method for calculating b H t is very fast, andsuch a leave-one-curve-out procedure is feasible. This idea was used to analyze theNGSIM data. However, one could also simply smooth the learning set curves usingthe same local regularity estimates obtained from this dataset.Figure 5 presents the estimation of the risks R ( b X ; 1 / , R ( b X ; 0 . and R ( b X ; 5 / for piecewise fBm, with constant noise variance σ = 0 . , when the training and thetest set are the same. The simulation results indicate that our theoretical results couldbe extended to the case where the online set is taken equal to the learning set, thoughthe concentration deteriorates. The theoretical investigation of this issue is left forfuture work.Figure 6 presents the estimation of the risks R ( b X ; 0 . for fBm, with constant noisevariance σ = 0 . .Figure 7 presents the estimation of the risks R ( b X ; 1 / , R ( b X ; 0 . and R ( b X ; 5 / for piecewise fBm, with heteroscedastic noise. The conclusion are the same than thehomoscedastic case. D Traffic flow: Montanino and Punzo [3] methodology
Montanino and Punzo [3] presents a four steps methodology to make the NGSIM datausable. For a complete description of the steps, we let the reader refer to their article17 .000.250.500.751.00 µ = 300 µ = 1000 H = 0 . µ = 300 µ = 1000 H = 0 . µ = 300 µ = 1000 H = 0 . E q u i s p a ce d µ = 300 µ = 1000 µ = 300 µ = 1000 µ = 300 µ = 1000 U n i f o r m N = 250 N = 1000 Figure 5: Estimation of the risks R ( b X ; 1 / , R ( b X ; 0 . and R ( b X ; 5 / for piecewisefBm, with constant noise variance σ = 0 . , when the training and the test set arethe same. 18 = 300 µ = 10000.000.250.500.751.00 R i s k N = 250 N = 1000 Figure 6: Estimation of the risk R ( b X ; 0 . for smoothing the noisy trajectories of afBm, with constant noise variance σ = 0 . . µ = 300 µ = 1000 H = 0 . µ = 300 µ = 1000 H = 0 . µ = 300 µ = 1000 H = 0 . E q u i s p a ce d N = 250 N = 1000 Figure 7: Estimation of the risks R ( b X ; 1 / , R ( b X ; 0 . and R ( b X ; 5 / for piecewisefBm, with non-constant noise variance σ = 0 . , . and . .193]. We briefly summarize their method here. The four steps below are applied foreach trajectory separately. Step . Removing the outliers They remove the measurements that lead to unreliable values of the accelerationby cutting all the records above a deterministic threshold of
30 m / s . The missingpoints are interpolated using a natural cubic spline with reference points before andafter the outliers. Step . Cutting off the high- and medium-frequency responses in the speed profile They remove the noise from the signal by linear smoothing of the signal withlow-pass filter. The considered one is a first-order Butterworth filter [1] with cutofffrequency of .
25 Hz . Step . Removing the residual unphysical acceleration values, keeping the consistencyrequirements They remove residual peaks that exceed defined thresholds (varying with speedlevels). For that, they move the position of the vehicle when the peak in accelerationappears in order to fulfill the thresholds. In order to prevent inconsistency, a th-degreepolynomial interpolation with constraint on the space traveled plus minor conditionswas applied on a window around the peak points. Step . Cutting off the high- and medium-frequency reponses generated from step 3This step is the same as the step 2 but using the results of the step 3.The methodology of [3] seems very specific to the NGSIM dataset, or at least sometrajectory dataset, and by extension can not be easily applied to others. For using thealgorithm on other trajectory datasets, their method requires some fine-tuning of theparameters.As explained in the main text, the 1714 observation units from the I-80 dataset,available in the NGSIM study, have been recorded at moments of the day when trafficis evolving, it goes from fluid to dense traffic. Therefore, we consider that there arethree groups in the data: a first group corresponding to a fluid (high-speed) traffic, asecond one for in-between fluid and dense traffic, and a third groups corresponding tothe dense (low-speed) traffic. Our local regularity approach, and the kernel smoothinginduced, are applied for each group separately. The three group clustering was per-formed using a Gaussian mixture model estimated by an EM algorithm initialized byhierarchical model-based agglomerative clustering as proposed by Fraley and Raftery[2] and implemented in the R package mclust [7]. The optimal model is then selectedaccording to BIC. The three resulting classes have 239, 869 and 606 velocity trajec-tories, respectively. Plots of randomly selected subsamples of trajectories from eachgroups are provided in Figure 8. E Complements on the real-data applications
In this section, we point out the fact that our situation is not specific only to the trafficflow data, but can be applied to other real datasets.
E.1 Canadian weather
The Canadian Weather dataset [5, 4] records the daily temperature and precipitationsin Canada averaged over the period from 1960 to 1994. Here, we are interested in the20
Normalized time V e h i c l e s v e l o c i t y ( k m / h ) (a) Fluid/high-speed traffic Normalized time V e h i c l e s v e l o c i t y ( k m / h ) (b) In-between traffic Normalized time V e h i c l e s v e l o c i t y ( k m / h ) (c) Dense/low-speed traffic Figure 8: I-80 dataset illustration of the clusters: a sample of five velocity curves fromeach of the three groups of curves 21
Normalized time T e m p e r a t u r e (a) A sample of five temperature curves. t E s t i m a t i o n o f H t (b) Estimation of H t Figure 9: Canadian weather dataset illustration.
Normalized time V o l t ag e (a) A sample of five power curves. t E s t i m a t i o n o f H t (b) Estimation of H t Figure 10: Household active power consumption dataset illustration.average daily temperature for each day of the year. It contains the measurements of canadian stations. Here, we have N = 35 and µ = 365 . A sample of five temperaturecurves has been plotted in the Figure 9a. Figure 9b presents the estimation of H t fordifferent t . We see that the estimation varies around with b K = 25 . E.2 Household Active Power Consumption
The Household Active Power Consumption dataset is part of the Monash University,UEA, UCR time series regression archive [9] and was sourced from the UCI reposi-tory . The data measures diverse energy related features of a house located in Sceaux,near Paris every minute between December 2006 and November 2010. In total, itsrepresents around million data points. These data are used to predict the dailypower consumption of a house. Here, we are only interested in the daily voltage. Thedataset contains N = 746 time series of µ = 1440 measurements. Figure 10a presentsa sample of five curves from this dataset. The estimation of the local regularity H t ,plotted in Figure 10b, is around . with b K = 73 . https://archive.ics.uci.edu/ml/datasets/Individual+household+electric+power+consumption Normalized time PP G S i g n a l (a) A sample of five PPG curves. t E s t i m a t i o n o f H t (b) Estimation of H t Figure 11: PPG-Dalia dataset illustration.
E.3 PPG-Dalia
The PPG-Dalia dataset is also part of the Monash University, UEA, UCR time seriesregression archive [9] and was also sourced from the UCI repository . PPG sensorsare widely used in smart wearable devices to measure heart rate [6]. They containa single channel PPG and 3D accelerometer motion data recorded from subjectsperforming various real-life activities. Measurements from each subject are segmentedinto second windows with second overlaps, resulting in N = 65000 time series of µ = 512 features. Here, we are interested in the PPG channel. A sample of five curvesis plotted in Figure 11a. The estimation of the local regularity H t is also around . (see Figure 11b) with b K = 25 . References [1] S. Butterworth. On the theory of filter amplifiers.
Wireless Engineer , 7(6):536–541, 1930.[2] C. Fraley and A. E. Raftery. Model-Based Clustering, Discriminant Analysis, andDensity Estimation.
Journal of the American Statistical Association , 97(458):611–631, June 2002.[3] M. Montanino and V. Punzo. Making NGSIM Data Usable for Studies on TrafficFlow Theory.
Transportation Research Record: Journal of the TransportationResearch Board , 2390:99–111, Dec. 2013.[4] J. Ramsay and B. W. Silverman.
Functional Data Analysis . Springer Series inStatistics. Springer-Verlag, New York, 2 edition, 2005.[5] J. O. Ramsay and B. W. Silverman.
Applied Functional Data Analysis: Methodsand Case Studies . Springer Series in Statistics. Springer-Verlag, New York, 2002.[6] A. Reiss, I. Indlekofer, P. Schmidt, and K. Van Laerhoven. Deep PPG: Large-ScaleHeart Rate Estimation with Convolutional Neural Networks.
Sensors (Basel,Switzerland) , 19(14), July 2019. https://archive.ics.uci.edu/ml/datasets/PPG-DaLiA TheR Journal , 8(1):289–317, 2016.[8] R. J. Serfling.
Approximation Theorems of Mathematical Statistics . John Wiley &Sons, Inc., New York, Sept. 2009. Wiley Series in Probability and MathematicalStatistics.[9] C. W. Tan, C. Bergmeir, F. Petitjean, and G. I. Webb. Monash University, UEA,UCR Time Series Regression Archive. arXiv:2006.10996 [cs, stat] , June 2020.[10] A. B. Tsybakov.
Introduction to Nonparametric Estimation . Springer Series inStatistics. Springer New York, 2009.[11] J. G. Wendel. Note on the Gamma Function.