Denoising modulo samples: k-NN regression and tightness of SDP relaxation
Michaël Fanuel* [email protected]    Hemant Tyagi† [email protected]

September 11, 2020
Abstract
Many modern applications involve the acquisition of noisy modulo samples of a function f, with the goal being to recover estimates of the original samples of f. For a Lipschitz function f : [0,1]^d → R, suppose we are given the samples y_i = (f(x_i) + η_i) mod 1, i = 1, ..., n, where η_i denotes noise. Assuming the η_i are zero-mean i.i.d. Gaussians and the x_i's form a uniform grid, we derive a two-stage algorithm that recovers estimates of the samples f(x_i) with a uniform error rate O((log n / n)^{1/(d+2)}) holding with high probability. The first stage involves embedding the points on the unit complex circle, and obtaining denoised estimates of f(x_i) mod 1 via a kNN (nearest neighbor) estimator. The second stage involves a sequential unwrapping procedure which unwraps the denoised mod 1 estimates from the first stage.

Recently, Cucuringu and Tyagi [7] proposed an alternative way of denoising modulo 1 data which works with their representation on the unit complex circle. They formulated a smoothness-regularized least squares problem on the product manifold of unit circles, where the smoothness is measured with respect to the Laplacian of a proximity graph G involving the x_i's. This is a nonconvex quadratically constrained quadratic program (QCQP), hence they proposed solving its semidefinite program (SDP) based relaxation. We derive sufficient conditions under which the SDP is a tight relaxation of the QCQP. Hence, under these conditions, the global solution of the QCQP can be obtained in polynomial time.

1 Introduction

In many real-life applications, we are often given access to noisy modulo samples of an underlying signal f : R^d → R, i.e.,

    y_i = (f(x_i) + η_i) mod ζ;  i = 1, ..., n,    (1.1)

for some ζ ∈ R_+, where η_i denotes noise. Here, a mod ζ ∈ [0, ζ) is the remainder term, so that a = qζ + (a mod ζ) for an integer q.
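To fix ideas, the measurement model (1.1) with ζ = 1 is straightforward to simulate; the sketch below (with an arbitrary choice of f and noise level, used only for illustration and not taken from the paper) generates samples on a uniform grid of [0, 1]:

```python
import math
import random

def modulo_samples(f, n, sigma, seed=0):
    """Generate noisy modulo-1 samples y_i = (f(x_i) + eta_i) mod 1,
    with eta_i ~ N(0, sigma^2), on the uniform grid x_i = (i - 1)/(n - 1)."""
    rng = random.Random(seed)
    xs = [(i - 1) / (n - 1) for i in range(1, n + 1)]
    ys = [(f(x) + rng.gauss(0.0, sigma)) % 1.0 for x in xs]
    return xs, ys

# An arbitrary Lipschitz test function.
f = lambda x: 3.0 * math.sin(2.0 * x) + x
xs, ys = modulo_samples(f, n=200, sigma=0.05)
```

Note that Python's `%` operator already returns a value in [0, 1) for negative arguments, so it implements a mod 1 directly.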
For example, self-reset analog-to-digital converters (ADCs) are a new generation of ADCs which handle voltage surges by resetting the voltage via a modulo operation. In other words, if the voltage signal lies outside the range [0, ζ], then its value is simply reset by taking its modulo ζ value [15, 24, 30]. Another important application is phase unwrapping, where the general idea is to infer the structure of an object by transmitting waveforms and capturing the phase coherence (measured modulo 2π radians) between the transmitted and scattered waveforms. This arises for instance in InSAR (synthetic aperture radar interferometry) for estimating the depth map of a terrain (e.g., [9, 32]); MRI, for estimating the position of veins in tissues (e.g., [10, 16]); and non-destructive testing of components (e.g., [21, 12]), to name a few applications.

Authors are listed in alphabetical order. * KU Leuven, Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium. † Inria, Univ. Lille, CNRS, UMR 8524 - Laboratoire Paul Painlevé, F-59000 Lille, France.

Given the measurement model in (1.1), where we assume from now on that ζ = 1, we are interested in unwrapping (y_i)_i to recover the original samples f(x_i) for i = 1, ..., n. This is clearly only possible up to a global integer shift. Moreover, it is not difficult to see that one needs to make additional structural assumptions on f, such as smoothness (Lipschitz continuity, continuous differentiability, etc.). Let us first discuss a natural sequential procedure for unwrapping y_i with f assumed to be Lipschitz; the reader is referred to Section 2.3 for details. To begin with, if there is no noise, one can exactly recover the original samples f(x_i) provided n is large enough.
To see this, let us consider first the univariate setting with f : [0,1] → R, and suppose for simplicity that the x_i's form a uniform grid. If n is large enough w.r.t. the Lipschitz constant of f, then the following identity is easy to verify (see Lemma 2) and is reminiscent of the classical Itoh condition [13] from the phase unwrapping literature:

    f(x_i) − f(x_{i−1}) =
        y_i − y_{i−1},        if |y_i − y_{i−1}| < 1/2,
        1 + y_i − y_{i−1},    if y_i − y_{i−1} < −1/2,
        −1 + y_i − y_{i−1},   if y_i − y_{i−1} > 1/2.

This directly suggests a simple sequential procedure for recovering the original samples f(x_i) using y_i. The above argument can be extended to the noisy setting: one can show that if |η_i mod 1| ≤ δ for all i, then provided δ is small enough and n is large enough, one can apply the same sequential procedure to obtain estimates f̃(x_i) such that for some integer q*,

    |f̃(x_i) + q* − f(x_i)| ≤ δ,  i = 1, ..., n;    (1.2)

see Lemma 3. In fact, perhaps surprisingly, one can generalize the above discussion to the general multivariate setting as well. In particular, we show that if δ is small enough and n is large enough, then there exists a sequential unwrapping procedure (see Algorithm 2) that generates estimates f̃(x_i) satisfying (1.2); see Lemmas 4 and 5.

An important takeaway from the above discussion is that one could consider a two-stage approach for unwrapping: first denoise the modulo samples (y_i)_i, and then apply the aforementioned unwrapping procedure. Indeed, the hope is that the denoising procedure will lead to estimates of f(x_i) mod 1 with a uniform error bound much smaller than δ, which in turn will improve the final error estimate for the unwrapped samples on account of (1.2). This is also the basis of our first main result for this problem, which we now outline.
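In code, the sequential procedure suggested by the identity above amounts to a few lines; the sketch below handles the noiseless case on a uniform grid (the test function is an arbitrary illustration):

```python
def sequential_unwrap(y):
    """Unwrap modulo-1 samples via the finite-difference identity above.
    Valid when consecutive samples of f differ by less than 1/2 in
    absolute value; recovers f(x_i) up to a global integer shift."""
    f_est = [y[0]]
    for i in range(1, len(y)):
        diff = y[i] - y[i - 1]
        if diff < -0.5:      # f increased across a wrap point
            diff += 1.0
        elif diff > 0.5:     # f decreased across a wrap point
            diff -= 1.0
        f_est.append(f_est[-1] + diff)
    return f_est

# Noiseless check with f(x) = 2x on a uniform grid of [0, 1].
n = 101
xs = [i / (n - 1) for i in range(n)]
y = [(2.0 * x) % 1.0 for x in xs]
f_est = sequential_unwrap(y)
```

Since f(x_1) = 0 here, the recovered sequence coincides with f exactly (the global integer shift is zero).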
Unwrapping noisy modulo data.

We derive an algorithm (namely, Algorithm 3) for the problem of unwrapping the noisy modulo 1 samples (y_i)_{i=1}^n generated as in (1.1). The algorithm is two-stage: in the first stage, we obtain denoised modulo samples by performing a kNN (k nearest neighbor) regression procedure. Specifically, this is done by embedding the mod 1 data onto the unit circle as z_i = exp(ι2πy_i) for each i, and by then performing a kNN estimation in this space (see Algorithm 1). Assuming the η_i ~ N(0, σ²) to be i.i.d. Gaussian, and the x_i's to form a uniform grid in [0,1]^d, this results in denoised estimates of f(x_i) mod 1 satisfying, with high probability, a uniform error bound (w.r.t. the wrap-around metric) of O((log n / n)^{1/(d+2)}). Then, feeding the denoised mod 1 samples to the multivariate unwrapping procedure in Algorithm 2, the same error rate carries over to the unwrapped estimates, i.e., δ = O((log n / n)^{1/(d+2)}) in (1.2). We outline this below in the form of the following informal theorem; the full result is in Theorem 4 and is completely non-asymptotic.

Theorem 1. In the model (1.1) with ζ = 1, suppose η_i ~ N(0, σ²) i.i.d. and f : [0,1]^d → R is Lipschitz continuous w.r.t. the ℓ∞ norm. Assume the x_i's form a uniform grid in [0,1]^d, and let σ ≤ 1/(2π). Then, if n is large enough, the estimates f̃(x_i) obtained from Algorithm 3 satisfy, with high probability, the uniform error bound

    |f̃(x_i) + q* − f(x_i)| = O( (log n / n)^{1/(d+2)} ),  i = 1, ..., n,

for some q* ∈ Z.

The rate (log n / n)^{1/(d+2)} is the well-known minimax optimal rate for estimating a Lipschitz function in the L∞ norm over the cube [0,1]^d (using a uniform grid); see for e.g., [20, Theorem 1.3.1].
While we defer a detailed discussion of existing work to the end of the paper, we remark that such a result has so far been elusive in the literature for the modulo measurement model in (1.1).

Tightness of the SDP relaxation for denoising modulo data.

In the preceding discussion, observe that the denoising of the modulo 1 samples was performed by representing the samples y_i on the unit complex circle (denoted T) as z_i = exp(ι2πy_i), with z = (z_1, ..., z_n) ∈ T^n. Here, T^n is the product manifold of n unit complex circles. This representation idea is motivated by a recent paper of Cucuringu and Tyagi [7], where they proposed a different scheme for denoising mod 1 samples, which we now describe.

For a smooth function f, we know that exp(ι2πf(x_i)) ≈ exp(ι2πf(x_j)) provided x_i ≈ x_j. Hence, [7] proposed constructing a proximity graph G on the sampling points – with an edge between i and j provided x_i is close enough to x_j – and solving the following optimization problem

    min_{g ∈ T^n} ||g − z||²₂ + λ g*Lg   ⇔   min_{g ∈ T^n} ( λ g*Lg − 2 Re(g*z) ).    (QCQP)

Here, L is the Laplacian of G, and λ ≥ 0 is a regularization parameter. This is a non-convex problem – albeit with a convex objective – and it is unclear whether one can efficiently (i.e., in polynomial time) find a global minimizer. Therefore, they considered solving the semidefinite programming relaxation of (QCQP), i.e.,

    min_{W ∈ C^{(n+1)×(n+1)}} Tr(T W)   s.t.   W ⪰ 0,  W_ii = 1,    (SDP)

which is solvable in polynomial time via interior point methods (see for e.g. [29]). The matrices T, W are defined in (3.2), and the steps leading to the formulation (SDP) are outlined in Section 3.1. As discussed therein, if the solution X of (SDP) is rank 1, then it has the form

    X = [ ĝĝ*  ĝ ;  ĝ*  1 ] = [ ĝ ; 1 ] [ ĝ*  1 ],    (1.3)

where ĝ ∈ T^n is a global solution of (QCQP).

Main result.
Hence, an important question is to identify conditions under which (SDP) is a tight relaxation of (QCQP), i.e., its solution is rank 1, since under those conditions the solution of (QCQP) would have been obtained in polynomial time. Such an analysis was missing in [7], although experimentally, (SDP) was shown to perform quite well. This brings us to the second main result of this paper, where we take a step towards answering this question. It is outlined in the theorem below; for the complete statement, see Theorem 6.

Theorem 2.
Let z ∈ T^n be a noisy observation of the ground truth signal h ∈ T^n satisfying ||z − h||∞ ≤ δ. Denote by ∆ the maximum degree of the graph G. Then there exist constants 0 < c₁, c₂ < 1 such that if δ ≤ c₁ and λ∆ ≤ c₂, then (SDP) has a unique solution X ∈ C^{(n+1)×(n+1)} of the form (1.3). Consequently, ĝ is the unique solution of (QCQP).

The above result holds for any graph G, and does not make any assumptions on the noise other than it being uniformly bounded. While in the setup of [7] we have h_i = exp(ι2πf(x_i)), the framework in which we study the problem is more abstract, since it applies to any graph G and does not necessarily assume the model in (1.1). Since ||z − h||∞ ≤ δ with δ small, one would expect that the smoothness of h w.r.t. G should also play an important role as part of the conditions ensuring tightness. While Theorem 6 does involve a smoothness parameter B_n, a more refined analysis in this direction is left for future work.

We now discuss the notation used throughout, followed by the outline of the rest of the paper.
Notation.
We will denote [n] = {1, ..., n}, and ι = √−1. The set T^n := {u ∈ C^n : |u_i| = 1, i = 1, ..., n} is the product manifold of unit radius circles, i.e., T^n = T × ··· × T. For u ∈ C^n, we define a projection onto T^n entrywise as (u/|u|)_i = u_i/|u_i| if u_i ≠ 0, for i ∈ [n], and we also define the angle arg(u) ∈ [0, 2π)^n such that u = |u| exp(ι arg(u)). Denote by d_w : [0,1)² → [0, 1/2] the usual wrap-around metric, defined as

    d_w(t, t') := min{ |t − t'|, 1 − |t − t'| }.

For a vector x ∈ C^n and any 1 ≤ p ≤ ∞, ||x||_p denotes the usual ℓ_p norm of x. We say that a function f : [0,1]^d → C is M-Lipschitz if there exists a constant M > 0 such that |f(x) − f(y)| ≤ M ||x − y||∞ for all x, y ∈ [0,1]^d. Moreover, the L∞ norm of f is defined as ||f||∞ := sup_{x ∈ [0,1]^d} |f(x)|. For non-negative numbers a, b, we write a ≲ b if there exists a constant C > 0 such that a ≤ Cb. Furthermore, we write a ≍ b if a ≲ b and b ≲ a. Finally, we also denote by ◦ the usual Hadamard product.

Outline of paper.
Section 2 contains the analysis for the unwrapping problem, culminating with Theorem 4, which is our main result for this problem. Section 3 derives sufficient conditions under which (SDP) is a tight relaxation of (QCQP), with Theorem 6 being our main result for this problem. Section 4 contains some numerical simulations, and we conclude with a discussion of related work along with directions for future work in Section 5.
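As a small numerical sanity check on the (QCQP) objective introduced above: for g, z ∈ T^n one has ||g − z||² = 2n − 2 Re(g*z), so the two forms of the objective differ only by the constant 2n. The sketch below verifies this on a path graph with random points on the circle (both arbitrary test choices, not from the paper):

```python
import cmath
import random

def path_laplacian(n):
    """Graph Laplacian of the path graph on n vertices, as dense rows."""
    L = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        L[i][i] += 1.0
        L[i + 1][i + 1] += 1.0
        L[i][i + 1] -= 1.0
        L[i + 1][i] -= 1.0
    return L

def quad_form(L, g):
    """Real value of the Hermitian quadratic form g* L g."""
    n = len(g)
    return sum((g[i].conjugate() * L[i][j] * g[j]).real
               for i in range(n) for j in range(n))

rng = random.Random(0)
n, lam = 6, 0.7
g = [cmath.exp(2j * cmath.pi * rng.random()) for _ in range(n)]
z = [cmath.exp(2j * cmath.pi * rng.random()) for _ in range(n)]
L = path_laplacian(n)

inner = sum(gi.conjugate() * zi for gi, zi in zip(g, z))
form1 = sum(abs(gi - zi) ** 2 for gi, zi in zip(g, z)) + lam * quad_form(L, g)
form2 = lam * quad_form(L, g) - 2.0 * inner.real
```

Since the two forms differ by a constant on T^n, minimizing either one yields the same minimizer.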
2 Denoising and unwrapping via kNN regression
In this section, we introduce and analyze an algorithm for robustly unwrapping noisy mod 1 samples of a Lipschitz function. We begin by formally outlining the problem setup.

2.1 Problem setup
Let f : [0,1]^d → R be an unknown M-Lipschitz function. Let the circle-valued function h : [0,1]^d → T be given as h(x) = exp(ι2πf(x)).

Fact 1.
The function h : [0,1]^d → T given as h(x) = exp(ι2πf(x)) is 2πM-Lipschitz.

Proof. We have |h(x) − h(y)| = 2|sin[π(f(x) − f(y))]| ≤ 2π|f(x) − f(y)| ≤ 2πM ||x − y||∞ for all x, y ∈ [0,1]^d.

We consider n datapoints on a uniform grid X of points x_i = (x_{i_1}, ..., x_{i_d}) ∈ [0,1]^d indexed by the d-tuple i = (i_1, ..., i_d) ∈ [m]^d, where x_{i_j} = (i_j − 1)/(m − 1). We assume that we have noisy versions of f(x_i) modulo 1, that is, y_i = (f(x_i) + η_i) mod 1, where the η_i ~ N(0, σ²) are i.i.d. These noisy modulo 1 samples are mapped to the complex circle as

    z_i = exp(ι2πy_i) = h_i exp(ι2πη_i),    (2.1)

where, for simplicity, we write h_i = h(x_i). The following simple fact will be used extensively in our analysis.

Fact 2.
We have E[z_i] = e^{−2π²σ²} h_i for all i ∈ [m]^d.

Proof. This follows from the moment generating function of a normal random variable.

Our goal is to obtain estimates f̃(x_i) of the samples f(x_i) such that, for some integer q* ∈ Z, |f̃(x_i) + q* − f(x_i)| is "small" for all i ∈ [m]^d. To this end, we propose a two-stage strategy, outlined formally as Algorithm 3.

1. In the first stage, we consider a k nearest neighbors (kNN) regression scheme applied to the noisy samples z_i. The purpose of this stage is to produce denoised mod 1 estimates ĝ(x_i) ∈ [0, 1) for all i ∈ [m]^d. This is outlined as Algorithm 1 and analyzed in Section 2.2.

2. The second stage involves a sequential unwrapping procedure which takes the denoised estimates ĝ(x_i) as input, and outputs the final unwrapped estimates f̃(x_i) for all i ∈ [m]^d. This is outlined as Algorithm 2 and analyzed in Section 2.3.

2.2 kNN regression
Our kNN scheme for denoising the modulo samples is outlined in Algorithm 1. Before proceeding with its analysis, it will be useful to introduce some preliminaries for the kNN estimator.

Algorithm 1
Denoising modulo samples with kNN regression

Input: integer k > 0; grid X ⊂ [0,1]^d with |X| = n = m^d; noisy modulo samples y_i = (f(x_i) + η_i) mod 1 for i ∈ [m]^d.
Output: denoised modulo samples ĝ(x_i) ∈ [0, 1) for all i ∈ [m]^d.

Compute z_j = exp(ι2πy_j) for all j ∈ [m]^d.
for j ∈ [m]^d do
    Compute h_k(x_j) = (1/k) Σ_{i : x_i ∈ N_k(x_j)} z_i and normalize ĥ_k(x_j) = h_k(x_j)/|h_k(x_j)|.
    Set ĝ(x_j) = (1/(2π)) arg( ĥ_k(x_j) ).
end for

kNN estimator.
Let x ∈ [0,1]^d and let k be a strictly positive integer. We define the kNN radius of x as the smallest distance such that an ℓ∞ ball centered at x contains at least k neighbours, namely r_k(x) = inf{ r : |B(x, r) ∩ X| ≥ k }, where B(x, r) = { x' ∈ [0,1]^d : ||x − x'||∞ ≤ r }. The kNN set of x is then simply N_k(x) = B(x, r_k(x)) ∩ X. We are ready to introduce the kNN estimator

    ĥ_k(x) = h_k(x)/|h_k(x)|   with   h_k(x) = (1/|N_k(x)|) Σ_{i : x_i ∈ N_k(x)} z_i.

Notice that ĥ_k(x) does not depend on the normalization of h_k(x). Hence, we introduce h̃_k(x) = e^{2π²σ²} h_k(x), where by construction E[h̃_k(x)] = (1/|N_k(x)|) Σ_{i : x_i ∈ N_k(x)} h_i holds thanks to Fact 2.

Statistical guarantees.
In order to obtain statistical guarantees for the kNN regressor, we first derive an expression for the kNN radius on a grid, which essentially allows for bounding the bias of our estimator.
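Before turning to the guarantees, here is a minimal one-dimensional sketch of Algorithm 1 (a toy implementation for illustration: grid distance plays the role of the ℓ∞ metric, ties in the kNN sets are broken by index, and the normalized average is assumed nonzero):

```python
import cmath

def knn_denoise_mod1(y, k):
    """Algorithm 1 for d = 1: embed the modulo-1 samples on the unit
    circle, average the k nearest embedded samples, and map the phase
    of the (normalized) average back to a value in [0, 1)."""
    n = len(y)
    z = [cmath.exp(2j * cmath.pi * yj) for yj in y]
    g_hat = []
    for j in range(n):
        # indices of the k grid points closest to j (ties broken by index)
        nbrs = sorted(range(n), key=lambda i: abs(i - j))[:k]
        h = sum(z[i] for i in nbrs) / k
        g_hat.append(cmath.phase(h) % (2 * cmath.pi) / (2 * cmath.pi))
    return g_hat

# Noiseless smoke test: modulo samples of f(x) = x on a uniform grid.
ys = [(i / 49.0) % 1.0 for i in range(50)]
g_den = knn_denoise_mod1(ys, k=3)
```

Averaging on the circle, rather than averaging the y_i directly, is what makes the estimator insensitive to the wrap-around at 0 ≡ 1.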
Lemma 1.
With the notation defined above, it holds that sup_{x ∈ [0,1]^d} r_k(x) = (⌈k^{1/d}⌉ − 1)/(m − 1).

Proof.
The supremum can be attained at several x ∈ [0,1]^d; in particular, it is attained at a corner of the hyper-cube [0,1]^d, say x = 0 to fix ideas. Let a sub-cube be positioned at 0 so that each of its edges contains ⌈k^{1/d}⌉ grid points. This means that this hypercube contains at least k grid points and that the length of its edge is c = (⌈k^{1/d}⌉ − 1)/(m − 1). This cube is included in an ℓ∞ ball centered at 0 and of radius c, which completes the proof.

An upper bound on the pointwise expected risk follows readily from Lemma 1. This result is given in Proposition 1, which displays a classical bias-variance trade-off in terms of the number of neighbours k.

Proposition 1 (Pointwise expected risk). Let x ∈ [0,1]^d. If σ ≤ 1/(2π) and n ≥ 2^d, we have

    E|ĥ_k(x) − h(x)|² ≤ 64π²M² (k/n)^{2/d} + 32π²σ²/k.

Proof.
Thanks to Fact 3, we have the inequality |ĥ_k(x) − h(x)| ≤ 2|h̃_k(x) − h(x)|, so that it suffices to upper bound |h̃_k(x) − h(x)|. Classically, we have the bias/variance splitting

    E|h̃_k(x) − h(x)|² = E|h̃_k(x) − E[h̃_k(x)]|²  (variance)  +  |E[h̃_k(x)] − h(x)|²  (squared bias).

By Fact 1, |h(x_i) − h(x)| ≤ 2πM ||x_i − x||∞, and therefore bias ≤ 2πM r_k(x). By using Lemma 1, we find sup_{x ∈ [0,1]^d} r_k(x) = (⌈k^{1/d}⌉ − 1)/(m − 1) ≤ 2k^{1/d}/m for m ≥ 2, since 1/(m − 1) ≤ 2/m. The variance can be computed exactly as follows:

    E|h̃_k(x) − E[h̃_k(x)]|² = (1/|N_k(x)|²) Σ_{i : x_i ∈ N_k(x)} E|z_i e^{2π²σ²} − h_i|² = (e^{4π²σ²} − 1)/|N_k(x)|,

where we used again the formula for the moment generating function of a normal. Next, we use the inequality e^x − 1 ≤ 2x if x ≤ 1. This gives e^{4π²σ²} − 1 ≤ 8π²σ² if 4π²σ² ≤ 1. By combining the bounds on the bias and variance, we obtain the desired result.
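The choice of k in Corollary 1 below comes from minimizing a bound of the generic form R(k) = α k^p + β/k (with an exponent p depending on d), whose minimizer over real k > 0 is k* = (β/(pα))^{1/(p+1)}. A quick numerical check of this closed form, with arbitrary test values of α, β and p:

```python
def kstar(alpha, beta, p):
    """Closed-form minimizer of R(k) = alpha*k**p + beta/k over k > 0,
    obtained from R'(k) = p*alpha*k**(p-1) - beta*k**(-2) = 0."""
    return (beta / (p * alpha)) ** (1.0 / (p + 1.0))

def R(k, alpha, beta, p):
    """The bias-variance style bound being minimized."""
    return alpha * k ** p + beta / k

# Example with p = 1; arbitrary alpha, beta chosen for the check.
alpha, beta, p = 0.01, 5.0, 1.0
k_opt = kstar(alpha, beta, p)
```

For p = 1 the minimizer reduces to k* = √(β/α) with optimal value 2√(αβ), which the test below verifies, together with local optimality.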
Corollary 1 (Rate for pointwise expected risk). Choosing the number of neighbours k = ⌈k*⌉ with k* ≍ (dσ²/M²)^{d/(d+2)} n^{2/(d+2)}, we obtain the following bound on the expected risk:

    E|ĥ_k(x) − h(x)|² ≲ M^{2d/(d+2)} σ^{4/(d+2)} n^{−2/(d+2)}.

The rate n^{−2/(d+2)} matches the pointwise rate for the estimation of a Lipschitz function on the cube [0,1]^d from a uniform grid; see page 24 of [20].

Proof. The value of k* is obtained by minimizing the RHS of the bound in Proposition 1, which we view as a function of a real variable of the form

    R(k) = α(n) k^{2/d} + β/k,

with α(n) = 64π²M² n^{−2/d} and β = 32π²σ². It is minimized at k* = (dβ/(2α(n)))^{d/(d+2)}. Hence, by using k* ≤ ⌈k*⌉ ≤ 2k* with k = ⌈k*⌉, we find

    R(k*) ≤ R(k) ≤ α(n)(2k*)^{2/d} + β/k* =: R̃(k*).

The latter upper bound is

    R̃(k*) = α(n)^{d/(d+2)} β^{2/(d+2)} ( 2^{2/d} (d/2)^{2/(d+2)} + (2/d)^{d/(d+2)} ) ≲ α(n)^{d/(d+2)} β^{2/(d+2)},

since each of the d-dependent factors above is bounded by an absolute constant. The final expression is obtained by substituting α(n) ≍ M² n^{−2/d} and β ≍ σ².

Theorem 3. If σ ≤ 1/(2π) and n ≥ 2^d, then the following is true.

1. (In-sample ℓ∞ bound) With probability at least 1 − 1/n, we have, for all i ∈ [m]^d,

    |ĥ_k(x_i) − h(x_i)| ≤ 8πM (k/n)^{1/d} + (64/3)(2π²σ² + 1) log(n)/k + 32πσ √(log(n)/k).    (2.2)

2. (Out-of-sample L∞ bound) With probability at least 1 − 1/n², it holds that

    ||ĥ_k − h||∞ ≤ 8πM (k/n)^{1/d} + 32(d + 2)( (1/3)(4π²σ² + 2) log(n)/k + πσ √(log(n)/k) ).    (2.3)

In order to have a non-vacuous bound, since |ĥ_k(x) − h(x)| ≤
2, we need to make sure that the RHS in Theorem 3 is smaller than 2. Notice that, compared with kNN regression in the absence of the mod 1 indeterminacy [14], the variance term in the bound does not vanish when σ = 0; this is due to the Bernstein inequality used in the proof.

Proof.
Let x ∈ [0,1]^d. As before, |ĥ_k(x) − h(x)| ≤ 2|h̃_k(x) − h(x)| thanks to Fact 3. Then, we have

    |h̃_k(x) − h(x)| ≤ |h̃_k(x) − E_η[h̃_k(x)]| + |E_η[h̃_k(x)] − h(x)| =: V_x + b_x.

Consider first the second term, which is interpreted as a bias and can be upper bounded with probability 1. It holds that

    b_x = |E_η[h̃_k(x)] − h(x)| ≤ (1/|N_k(x)|) Σ_{i : x_i ∈ N_k(x)} |h(x_i) − h(x)| ≤ 2πM r_k(x),

where we used Fact 1, namely |h(x_i) − h(x)| ≤ 2πM ||x_i − x||∞. On the other hand, the first term can be interpreted as a variance term. Write

    V_x = | (1/k) Σ_{i ∈ N_k(x)} Z_i |,   with   Z_i = z_i e^{2π²σ²} − h_i

a set of zero-mean independent random variables. Notice that |Z_i| ≤ 1 + e^{2π²σ²} =: K almost surely, and Var(Z_i) = e^{4π²σ²} − 1 =: S. In order to be able to use a concentration result, we split the sum above into real and imaginary parts as follows:

    V_x ≤ | (1/k) Σ_{i ∈ N_k(x)} Re Z_i | + | (1/k) Σ_{i ∈ N_k(x)} Im Z_i | = u_x + w_x.    (2.4)

Each of the two terms above involves a sum of zero-mean independent variables which can be bounded thanks to Bernstein's inequality for bounded random variables (cf. Theorem 7, with the change of variables t ↦ kt, in the Appendix). Firstly, we notice that |Re Z_i| ≤ K and Var(Re Z_i) ≤ S. Then, Bernstein's inequality gives

    Pr(u_x > t) ≤ 2 exp( −(kt²/2)/(S + Kt/3) ),    (2.5)

while the same bound holds for Pr(w_x > t).
Then, thanks to the union bound, we find that

    Pr( u_x > t/2  or  w_x > t/2 ) ≤ 4 exp( −(kt²/8)/(S + Kt/6) ),

i.e., u_x ≤ t/2 and w_x ≤ t/2 hold simultaneously with probability at least 1 − 4 exp(−(kt²/8)/(S + Kt/6)). Hence, in view of (2.4), V_x ≤ t with probability at least 1 − 4 exp(−(kt²/8)/(S + Kt/6)).

In-sample bound.
Let i ∈ [m]^d. The first statement follows by taking a fixed x = x_i. We know thanks to Bernstein's inequality (2.5) that V_{x_i} ≤ t with probability larger than 1 − 4 exp(−(kt²/8)/(S + Kt/6)). Thanks to a union bound, we find

    Pr( ⋃_{i ∈ [m]^d} {V_{x_i} > t} ) ≤ 4n exp( −(kt²/8)/(S + Kt/6) ).

Now, let δ > 0 be a failure probability chosen such that 4n exp(−(kt²/8)/(S + Kt/6)) ≤ δ, which yields the equivalent condition

    (k/8) t² − (K/6) log(4n/δ) t − S log(4n/δ) ≥ 0.

This gives, with probability larger than 1 − δ,

    V_{x_i} ≤ (2K/(3k)) log(4n/δ) + √( ((2K/(3k)) log(4n/δ))² + (8S/k) log(4n/δ) ),  ∀ i ∈ [m]^d.

Now, we recall that |h̃_k(x_i) − h(x_i)| ≤ b_{x_i} + V_{x_i}, where we can upper bound b_{x_i} ≤ 2πM (⌈k^{1/d}⌉ − 1)/(m − 1) by using Lemma 1. Then, if n^{1/d} = m ≥ 2, one obtains

    |ĥ_k(x_i) − h(x_i)| ≤ 8πM (k/n)^{1/d} + (8K/(3k)) log(4n/δ) + 4 √( (2S/k) log(4n/δ) ),

by using √(a + b) ≤ √a + √b in the first inequality, and then m − 1 ≥ m/2 and ⌈k^{1/d}⌉ − 1 ≤ k^{1/d}. Next, in order to simplify the expressions of K and S, we use the inequality exp(x) ≤ 1 + 2x for x ≤ 1. Namely, if 2π²σ² ≤ 1, we can upper bound K = 1 + e^{2π²σ²} ≤ 2 + 4π²σ². Similarly, S = e^{4π²σ²} − 1 ≤ 8π²σ² if 4π²σ² ≤ 1. Next, we choose δ = 1/n, so that we find

    |ĥ_k(x_i) − h(x_i)| ≤ 8πM (k/n)^{1/d} + 16( (1/3)(2π²σ² + 1) log(4n²)/k + πσ √(log(4n²)/k) ).

In order to simplify the bound above, we use the inequality log(4n²) = 2 log(2n) ≤ 4 log(n) for n ≥ 2. This finally yields

    |ĥ_k(x_i) − h(x_i)| ≤ 8πM (k/n)^{1/d} + (64/3)(2π²σ² + 1) log(n)/k + 32πσ √(log(n)/k).

Out-of-sample bound. Now, we can use (2.5) and a union bound over all distinct kNN sets to show that

    Pr( sup_{x ∈ [0,1]^d} V_x > t ) ≤ 4N exp( −(kt²/8)/(S + Kt/6) ),

where N = |{N_k(x) : x ∈ [0,1]^d}| is the number of distinct kNN sets. Now we use the inequality N ≤ d n^d due to Jiang [14, Lemma 3]. Again, letting δ > 0 denote the failure probability, we obtain that

    sup_{x ∈ [0,1]^d} V_x ≤ (2K/(3k)) log(4N/δ) + √( ((2K/(3k)) log(4N/δ))² + (8S/k) log(4N/δ) )

holds with probability at least 1 − δ. The remainder of the proof follows the same steps as above, now with δ = 1/n², so that we find

    ||ĥ_k − h||∞ ≤ 8πM (k/n)^{1/d} + (16/3)(2π²σ² + 1) log(4dn^{d+2})/k + 16πσ √(log(4dn^{d+2})/k).

Similarly as above, we have log(4dn^{d+2}) ≤ 2 log(dn^{d+2}) for dn^{d+2} ≥ 4. Then, we use

    log(dn^{d+2}) ≤ log( ((d+2)^{1/(d+2)} n)^{d+2} ) = (d + 2) log( (d+2)^{1/(d+2)} n ).

We remark that x^{1/x} (with x > 0) is maximized at x = e; hence (d+2)^{1/(d+2)} ≤ e^{1/e} ≈ 1.44 < 2, so that log(4dn^{d+2}) ≤ 2(d + 2) log(2n) ≤ 4(d + 2) log n, where we used that log(2n) ≤ 2 log(n) for n ≥ 2. Therefore, we find

    ||ĥ_k − h||∞ ≤ 8πM (k/n)^{1/d} + 32(d + 2)( (1/3)(4π²σ² + 2) log(n)/k + πσ √(log(n)/k) ),

where we used that √(d + 2) ≤ d + 2 since d ≥ 1.

A statistical rate for the ℓ∞ bound is a direct consequence of Theorem 3.

Corollary 2 (Statistical rate for ℓ∞ risk – in-sample). Let σ ≤ 1/(2π) and n ≥ 2^d. Let the number of neighbours be k = ⌈k*⌉ with

    k* = n^{2/(d+2)} (log n)^{d/(d+2)} ( 2d( (4π²σ² + 2)/3 + πσ )/(πM) )^{2d/(d+2)}.

Provided n/log(n) ≥ ( πM/( 2d( (4π²σ² + 2)/3 + πσ ) ) )^d, the following upper bound on the ℓ∞ risk in (2.2) holds with probability at least 1 − 1/n:

    |ĥ_k(x_i) − h(x_i)| ≤ γ (log n / n)^{1/(d+2)},  ∀ i ∈ [m]^d,    (2.6)

with γ = 6 (8πM)^{d/(d+2)} ( 32( (4π²σ² + 2)/3 + πσ ) )^{2/(d+2)}. Furthermore, if (2.6) holds and the RHS of (2.6) is less than or equal to 2, then this implies

    d_w( ĝ(x_i), g(x_i) ) ≤ γ (log n / n)^{1/(d+2)},  ∀ i ∈ [m]^d,

where ĝ(x) = (1/(2π)) arg(ĥ_k(x)) and g(x) = (1/(2π)) arg(h(x)).

Proof. Assuming k ≥ log n, the bound in (2.2) simplifies to

    |ĥ_k(x_i) − h(x_i)| ≤ 8πM (k/n)^{1/d} + 32( (1/3)(4π²σ² + 2) + πσ ) √(log(n)/k) = R(k),

with R(k) = α(n) k^{1/d} + β(n) k^{−1/2}, where α(n) = 8πM n^{−1/d} and β(n) = 32( (4π²σ² + 2)/3 + πσ ) √(log n). The minimization of R(k) with respect to k gives k* = ( dβ(n)/(2α(n)) )^{2d/(d+2)}, and the lower bound on n/log(n) in the statement follows from the requirement that k* ≥ log n. Hence, by using k* ≤ ⌈k*⌉ ≤ 2k* with k = ⌈k*⌉, we obtain the upper bound

    R(k*) ≤ R(k) ≤ α(n)(2k*)^{1/d} + β(n)(k*)^{−1/2} =: R̃(k*).

By substituting back the expression of k* in R̃(k*), we find

    R̃(k*) = α(n)^{d/(d+2)} β(n)^{2/(d+2)} ( 2^{1/d} (d/2)^{2/(d+2)} + (2/d)^{d/(d+2)} ) ≤ 6 α(n)^{d/(d+2)} β(n)^{2/(d+2)} = γ (log n / n)^{1/(d+2)},

where we used the simplifications (d/2)^{2/(d+2)} ≤ ((d+2)/2)^{2/(d+2)} ≤ 2 and (2/d)^{d/(d+2)} ≤ 2, as well as 2^{1/d} ≤ 2. If γ (log n / n)^{1/(d+2)} ≤ 2, then the stated bound on the wrap-around distance is obtained readily using Fact 4 in the Appendix.

For completeness, we also give the out-of-sample statistical rate for the L∞ risk.

Corollary 3 (Statistical rate for L∞ risk – out-of-sample). Let σ ≤ 1/(2π) and n ≥ 2^d. Let the number of neighbours be k = ⌈k*⌉ with

    k* = n^{2/(d+2)} (log n)^{d/(d+2)} ( 2d(d + 2)( (4π²σ² + 2)/3 + πσ )/(πM) )^{2d/(d+2)}.

Then, provided n/log(n) ≥ ( πM/( 2d(d + 2)( (4π²σ² + 2)/3 + πσ ) ) )^d, we obtain the following upper bound on the L∞ risk in (2.3):

    ||ĥ_k − h||∞ ≤ γ' (log n / n)^{1/(d+2)},

with γ' = 6 (8πM)^{d/(d+2)} ( 32(d + 2)( (4π²σ² + 2)/3 + πσ ) )^{2/(d+2)}.

Proof. The proof follows the in-sample case step by step, mutatis mutandis.

Remark 1.
The restriction σ ≤ 1/(2π) on the noise in the statements of this section is assumed in order to avoid cumbersome expressions; similar guarantees can be obtained by relaxing this condition.

2.3 Robustly unwrapping the modulo samples

In this subsection, we discuss and analyze a procedure for unwrapping the denoised mod 1 samples obtained from Algorithm 1. As a warm-up, let us first look at the univariate case where f : [0,1] → R. Consider the unknown ground truth function g(x) = f(x) mod 1 and let ĝ : [0,1] → [0,
1) be an estimate of g. Say ĝ(x) is close to g(x) for all x on the grid X = {x_1, ..., x_n}, where x_i = (i − 1)/(n − 1) for i = 1, ..., n. Formally, for some δ ∈ [0, 1/4), assume that d_w(ĝ(x_i), g(x_i)) ≤ δ holds for all i. Given the perturbed estimates ĝ(x_i), we will now show a stable recovery procedure that produces estimates f̃(x_i) of f(x_i) which satisfy (up to an integer shift) the bound

    |f(x_i) − f̃(x_i)| ≤ δ,  ∀ i = 1, ..., n.

To begin with, observe that for each i there exists η_i ∈ [−δ, δ] such that ĝ(x_i) = (f(x_i) + η_i) mod 1. Denoting f̂(x_i) := f(x_i) + η_i, we will now recover each f̂(x_i) (up to an integer shift) sequentially by a simple procedure that relies on the following lemma. It can be viewed as an adaptation of the classical Itoh condition [13] from the phase unwrapping literature to our setup.

Lemma 2. If 2δ + M(n − 1)^{−1} < 1/2, then the following holds true for each i = 2, ..., n:

    f̂(x_i) − f̂(x_{i−1}) =
        ĝ(x_i) − ĝ(x_{i−1}),        if |ĝ(x_i) − ĝ(x_{i−1})| < 1/2,
        1 + ĝ(x_i) − ĝ(x_{i−1}),    if ĝ(x_i) − ĝ(x_{i−1}) < −1/2,
        −1 + ĝ(x_i) − ĝ(x_{i−1}),   if ĝ(x_i) − ĝ(x_{i−1}) > 1/2.    (2.7)

Proof.
Using the Lipschitz continuity of $f$ and the triangle inequality, we readily obtain the bound
$$\left| \widehat f(x_i) - \widehat f(x_{i-1}) \right| \le 2\delta + M(n-1)^{-1} < \frac12, \quad \forall\, i = 2, \dots, n,$$
due to our assumption on $\delta, n$. Now, denoting $\widehat q(x_i) \in \mathbb{Z}$ to be the quotient term associated with $\widehat f(x_i)$, we arrive at the identity
$$\widehat f(x_i) - \widehat f(x_{i-1}) = \widehat q(x_i) - \widehat q(x_{i-1}) + \widehat g(x_i) - \widehat g(x_{i-1}). \tag{2.8}$$
Clearly, the bound $|\widehat f(x_i) - \widehat f(x_{i-1})| < 1/2$ implies $\widehat q(x_i) - \widehat q(x_{i-1}) \in \{0, 1, -1\}$.
1. If $\widehat q(x_i) = \widehat q(x_{i-1})$, then (2.8) readily implies $|\widehat g(x_i) - \widehat g(x_{i-1})| < 1/2$.
2. If $\widehat q(x_i) = \widehat q(x_{i-1}) + 1$, then $\widehat f(x_i) - \widehat f(x_{i-1}) \in (0, 1/2)$ and $\widehat g(x_i) - \widehat g(x_{i-1}) < -1/2$.
3. If $\widehat q(x_i) = \widehat q(x_{i-1}) - 1$, then $\widehat f(x_i) - \widehat f(x_{i-1}) \in (-1/2, 0)$ and $\widehat g(x_i) - \widehat g(x_{i-1}) > 1/2$.
Since the events on $\widehat g(x_i) - \widehat g(x_{i-1})$ in the three cases are disjoint, the identity in (2.7) follows.

The above lemma tells us that if $\delta \lesssim 1$ and $n \gtrsim M$, then the finite difference $\widehat f(x_i) - \widehat f(x_{i-1})$ is determined completely by $\widehat g(x_i) - \widehat g(x_{i-1})$. Therefore, we can recover the estimates $\widetilde f(x_i)$ as $\widetilde f(x_1) = \widehat g(x_1)$ and, for $i = 2, \dots, n$,
$$\widetilde f(x_i) = \widetilde f(x_{i-1}) + \begin{cases} \widehat g(x_i) - \widehat g(x_{i-1}), & \text{if } |\widehat g(x_i) - \widehat g(x_{i-1})| < 1/2, \\ 1 + \widehat g(x_i) - \widehat g(x_{i-1}), & \text{if } \widehat g(x_i) - \widehat g(x_{i-1}) < -1/2, \\ -1 + \widehat g(x_i) - \widehat g(x_{i-1}), & \text{if } \widehat g(x_i) - \widehat g(x_{i-1}) > 1/2. \end{cases} \tag{2.9}$$

**Remark 2.** The sequential procedure in (2.9) was also considered in [7], however without any formal analysis.
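The sequential procedure (2.9) can be sketched as follows (a hedged illustration; the function name and array conventions are ours, not from the paper):

```python
import numpy as np

def unwrap_mod1(g_hat):
    """Sequential unwrapping of denoised mod-1 samples, following (2.9).

    g_hat: (n,) array of mod-1 estimates on a 1D grid.
    Returns estimates f_tilde of the original samples, up to a global
    integer shift q_star (Lemma 3).
    """
    f_tilde = np.empty_like(g_hat)
    f_tilde[0] = g_hat[0]
    for i in range(1, len(g_hat)):
        diff = g_hat[i] - g_hat[i - 1]
        if diff < -0.5:        # an upward wrap occurred: add 1
            diff += 1.0
        elif diff > 0.5:       # a downward wrap occurred: subtract 1
            diff -= 1.0
        f_tilde[i] = f_tilde[i - 1] + diff
    return f_tilde
```

When the condition of Lemma 2 holds (small perturbation and fine grid relative to the Lipschitz constant), this reproduces the original samples exactly up to the integer shift.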
Using Lemma 2, it is easy to derive the following uniform error bound for the estimates $\widetilde f(x_i)$.

**Lemma 3.** *If $2\delta + M(n-1)^{-1} < 1/2$, then there exists $q^\star \in \mathbb{Z}$ such that*
$$\left| \widetilde f(x_i) + q^\star - f(x_i) \right| \le \delta, \quad \forall\, i = 1, \dots, n. \tag{2.10}$$

*Proof.*
Denote q (cid:63) = (cid:98) q ( x ) ∈ Z to be the quotient term of (cid:98) f ( x ). We will show by induction that (cid:101) f ( x i ) + q (cid:63) = (cid:98) f ( x i ) holds for each i . The bound in (2.10) then follows since (cid:12)(cid:12)(cid:12) (cid:98) f ( x i ) − f ( x i ) (cid:12)(cid:12)(cid:12) ≤ δ .To show the induction argument based step, note that (cid:101) f ( x ) + q (cid:63) = (cid:98) f ( x ) is trivially true. Forconvenience, denote the term within braces in (2.9) by a i,i − . Then for any i >
1, we have that (cid:101) f ( x i ) + q (cid:63) = (cid:101) f ( x i − ) + q (cid:63) + a i,i − (using (2.9))= (cid:98) f ( x i − ) + a i,i − (using the induction hypothesis on i )= (cid:98) f ( x i ) (using Lemma 2)which completes the proof. The general d ≥ setting. We now show that the above discussion generalizes to the mul-tivariate setting where f : [0 , d → R . For an integer m >
1, denote X = { x , . . . , x m } d where x i = i − m − to be the uniform grid, with |X | = n = m d . Also denote i = ( i , . . . , i d ) ∈ [ m ] d to bea d -tuple, and x i = ( x i , . . . , x i d ) ∈ [0 , d where x i j = i j − m − . Then with g : [0 , d → [0 ,
1) definedas g ( x ) = f ( x ) mod 1, denote (cid:98) g : [0 , d → [0 ,
1) to be an estimate of g in the sense that for some δ ∈ [0 , / d w ( (cid:98) g ( x i ) , g ( x i )) ≤ δ ∀ x i ∈ X . This means that for each i ∈ [ m ] d , there exists η i ∈ [ − δ, δ ] such that (cid:98) g ( x i ) = ( f ( x i ) + η i ) mod 1.Denoting (cid:98) f ( x i ) := f ( x i ) + η i , we will recover each (cid:98) f ( x i ) (up to an integer shift) by a generalizationof the procedure in (2.9). For clarity of exposition, let us define the finite difference operator D j (cid:98) f ( x i ) := (cid:98) f ( x i , . . . , x i j , . . . , x i d ) − (cid:98) f ( x i , . . . , x i j − , . . . , x i d ); ∀ j ∈ [ d ] , and i ∈ [ m ] d with i j > . We now present the following generalization of Lemma 2 to the multivariate setting.
**Lemma 4.** *If $2\delta + M(m-1)^{-1} < 1/2$, then the following holds for each $j \in [d]$ and $\mathbf{i} \in [m]^d$ with $i_j > 1$:*
$$D_j \widehat f(x_{\mathbf{i}}) = \begin{cases} D_j \widehat g(x_{\mathbf{i}}), & \text{if } |D_j \widehat g(x_{\mathbf{i}})| < 1/2, \\ 1 + D_j \widehat g(x_{\mathbf{i}}), & \text{if } D_j \widehat g(x_{\mathbf{i}}) < -1/2, \\ -1 + D_j \widehat g(x_{\mathbf{i}}), & \text{if } D_j \widehat g(x_{\mathbf{i}}) > 1/2. \end{cases} \tag{2.11}$$

*Proof.* The proof is similar to that of Lemma 2. Indeed, using the Lipschitz continuity of $f$ and the triangle inequality, we first have that
$$\left| D_j \widehat f(x_{\mathbf{i}}) \right| \le 2\delta + M(m-1)^{-1} < \frac12$$
for each $1 \le j \le d$ and $\mathbf{i} \in [m]^d$ with $i_j > 1$. Now, denoting $\widehat q(x_{\mathbf{i}}) \in \mathbb{Z}$ to be the quotient term associated with $\widehat f(x_{\mathbf{i}})$, note that
$$D_j \widehat f(x_{\mathbf{i}}) = D_j \widehat q(x_{\mathbf{i}}) + D_j \widehat g(x_{\mathbf{i}}). \tag{2.12}$$
The key observation is that (2.12) involves taking a finite difference only along the coordinate $j$; hence the same reasoning as for the univariate setting applies, and it is clear that $|D_j \widehat f(x_{\mathbf{i}})| < 1/2$ implies $D_j \widehat q(x_{\mathbf{i}}) \in \{0, \pm 1\}$. From here on, the rest of the argument is the same as for Lemma 2 and hence omitted.

Remark 3.
When $d = 2$, Lemma 4 is similar in spirit to the generalization of Itoh's condition to the bivariate case in the phase unwrapping literature (see [31, Lemma 1.2]). To the best of our knowledge, a general multivariate unwrapping procedure does not exist in the literature.

Equipped with the above lemma, we arrive at the procedure in Algorithm 2, which is a generalization of the procedure in (2.9), for recovering $\widehat f(x_{\mathbf{i}})$ at each $x_{\mathbf{i}}$.

**Algorithm 2** Sequentially unwrapping the modulo samples
Input: uniform grid $\mathcal X \subset [0,1]^d$ with $|\mathcal X| = n = m^d$; modulo samples $\widehat g(x_{\mathbf{i}}) \in [0,1)$, $\forall\, x_{\mathbf{i}} \in \mathcal X$.
Initialization: $\widetilde f(0, \dots, 0) = \widehat g(0, \dots, 0)$.
Output: $\widetilde f(x_{\mathbf{i}})$, $\forall\, x_{\mathbf{i}} \in \mathcal X$.
for $j = 1, \dots, d$ do
  Fix $\mathbf{i}_{j+1:d} := (i_{j+1}, \dots, i_d) = (1, \dots, 1)$.
  for each $\mathbf{i}_{1:j-1} := (i_1, \dots, i_{j-1}) \in [m]^{j-1}$ do
    for $i_j = 2, \dots, m$ do
$$\widetilde f(x_{\mathbf{i}_{1:j-1}}, x_{i_j}, x_{\mathbf{i}_{j+1:d}}) = \widetilde f(x_{\mathbf{i}_{1:j-1}}, x_{i_j - 1}, x_{\mathbf{i}_{j+1:d}}) + \begin{cases} D_j \widehat g(x_{\mathbf{i}_{1:j-1}}, x_{i_j}, x_{\mathbf{i}_{j+1:d}}), & \text{if } |D_j \widehat g(\cdot)| < 1/2, \\ 1 + D_j \widehat g(x_{\mathbf{i}_{1:j-1}}, x_{i_j}, x_{\mathbf{i}_{j+1:d}}), & \text{if } D_j \widehat g(\cdot) < -1/2, \\ -1 + D_j \widehat g(x_{\mathbf{i}_{1:j-1}}, x_{i_j}, x_{\mathbf{i}_{j+1:d}}), & \text{if } D_j \widehat g(\cdot) > 1/2. \end{cases} \tag{2.13}$$
    end for
  end for
end for

[Figure 1 panels: (a) unwrapping the first edge, starting from $\widetilde f(x_{(1,1)}) = \widehat g(x_{(1,1)})$; (b) unwrapping the face, row by row.]
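For $d = 2$, Algorithm 2 amounts to unwrapping the first edge of the grid and then each row rooted at it; a minimal sketch under that assumption (function name and array layout are ours):

```python
import numpy as np

def unwrap_mod1_grid(g_hat):
    """Sketch of Algorithm 2 for a 2D grid g_hat[i1, i2] of mod-1 estimates.

    First unwraps along the first edge (axis 0 at column 0), then each row
    rooted at that edge, correcting each finite difference as in (2.13).
    Returns estimates up to a global integer shift.
    """
    def step(diff):
        # finite-difference correction of (2.13)
        if diff < -0.5:
            return diff + 1.0
        if diff > 0.5:
            return diff - 1.0
        return diff

    m1, m2 = g_hat.shape
    f = np.empty_like(g_hat)
    f[0, 0] = g_hat[0, 0]
    for i in range(1, m1):                       # j = 1: unwrap the first edge
        f[i, 0] = f[i - 1, 0] + step(g_hat[i, 0] - g_hat[i - 1, 0])
    for i in range(m1):                          # j = 2: unwrap each row
        for j in range(1, m2):
            f[i, j] = f[i, j - 1] + step(g_hat[i, j] - g_hat[i, j - 1])
    return f
```

In higher dimensions the same two nested patterns repeat coordinate by coordinate, starting from the last unwrapped face.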
Figure 1: Illustration of the unwrapping algorithm in two dimensions. The unwrapping starts at the top corner in Figure 1a. Each arrow indicates an operation as given in (2.13) of Algorithm 2, while dark dots are unwrapped samples. Once the edge is unwrapped, each row 'rooted' at this edge is unwrapped, as in Figure 1b. In higher dimensions, the same procedure continues, starting from the last unwrapped face.

Algorithm 2 stably recovers the estimates $\widetilde f(x_{\mathbf{i}})$, $\forall\, x_{\mathbf{i}} \in \mathcal X$, in the following sense.

**Lemma 5.** *If $2\delta + M(m-1)^{-1} < 1/2$, then there exists $q^\star \in \mathbb{Z}$ such that*
$$\left| \widetilde f(x_{\mathbf{i}}) + q^\star - f(x_{\mathbf{i}}) \right| \le \delta, \quad \forall\, x_{\mathbf{i}} \in \mathcal X. \tag{2.14}$$

*Proof.*
The proof is along the same lines as that of Lemma 3. Denote $q^\star = \widehat q(0, \dots, 0) \in \mathbb{Z}$ to be the quotient term of $\widehat f(0, \dots, 0)$. It suffices to show, for each $1 \le j \le d$, that
$$\widetilde f(x_{\mathbf{i}_{1:j-1}}, x_{i_j}, x_{\mathbf{i}_{j+1:d}}) + q^\star = \widehat f(x_{\mathbf{i}_{1:j-1}}, x_{i_j}, x_{\mathbf{i}_{j+1:d}}), \quad \forall\, \mathbf{i} \in [m]^{j-1} \times [m] \times \{1\}^{d-j}, \tag{2.15}$$
as the bound in (2.14) then follows readily. Let us denote the term in braces in (2.13) by $a_{(i_j, i_j - 1)}$.
1. Consider $j = 1$. We show by induction that (2.15) is true for each $i_1 \in [m]$. When $i_1 = 1$, (2.15) is trivially true by construction. When $i_1 > 1$, we have
$$\widetilde f(x_{i_1}, 0, \dots, 0) + q^\star = \widetilde f(x_{i_1 - 1}, 0, \dots, 0) + q^\star + a_{(i_1, i_1 - 1)} \quad \text{(using (2.13))}$$
$$= \widehat f(x_{i_1 - 1}, 0, \dots, 0) + a_{(i_1, i_1 - 1)} \quad \text{(using the induction hypothesis on } i_1\text{)}$$
$$= \widehat f(x_{i_1}, 0, \dots, 0) \quad \text{(using Lemma 4)}.$$
2. Now consider any $j > 1$ and suppose (2.15) holds for $j - 1$. For $i_j = 1$, this implies
$$\widetilde f(x_{\mathbf{i}_{1:j-1}}, \underbrace{x_{i_j}}_{=0}, 0, \dots, 0) + q^\star = \widehat f(x_{\mathbf{i}_{1:j-1}}, 0, 0, \dots, 0),$$
so (2.15) holds for $i_j = 1$. If $i_j > 1$, then
$$\widetilde f(x_{\mathbf{i}_{1:j-1}}, x_{i_j}, 0, \dots, 0) + q^\star = \widetilde f(x_{\mathbf{i}_{1:j-1}}, x_{i_j - 1}, 0, \dots, 0) + q^\star + a_{(i_j, i_j - 1)} \quad \text{(using (2.13))}$$
$$= \widehat f(x_{\mathbf{i}_{1:j-1}}, x_{i_j - 1}, 0, \dots, 0) + a_{(i_j, i_j - 1)}$$
$$= \widehat f(x_{\mathbf{i}_{1:j-1}}, x_{i_j}, 0, \dots, 0) \quad \text{(using Lemma 4)},$$
where the second equality uses the induction hypothesis on $i_j$. This completes the proof.

The unwrapping procedure is summarized in Algorithm 2. For notational convenience, the loop over $\mathbf{i}_{1:j-1}$ is to be skipped whenever $\mathbf{i}_{1:j-1}$ is indexed by an empty set, i.e., when $j = 1$.

We can combine the results of Corollary 2 and Lemma 5 to provide a guarantee on the recovery of the samples of $f$ given noisy mod 1 samples, by following the procedure described in Algorithm 3.

**Theorem 4** (Main result). *Let $\sigma \le \pi$ and $m^d = n \ge 2^d$, and let*
$$\delta(n) = 6(8\pi M)^{\frac{d}{d+2}} \left( 32\left(4\pi^2\sigma^2 + \tfrac{2}{3} + \sqrt{\pi}\,\sigma\right) \right)^{\frac{1}{d+2}} \left( \frac{\log n}{n} \right)^{\frac{1}{d+2}}.$$
*If $\delta(n) \le 1$, then with probability at least $1 - 2/n$, Algorithm 1 yields denoised mod 1 estimates $\widehat g(x_{\mathbf{i}})$ such that*
$$d_w\left( \widehat g(x_{\mathbf{i}}), g(x_{\mathbf{i}}) \right) \le \delta(n), \quad \forall\, \mathbf{i} \in [m]^d.$$
*Furthermore, if $2\delta(n) + M(m-1)^{-1} < 1/2$, then there exist $q^\star \in \mathbb{Z}$ and $\widetilde f(x_{\mathbf{i}})$ given by Algorithm 2 such that*
$$\left| \widetilde f(x_{\mathbf{i}}) + q^\star - f(x_{\mathbf{i}}) \right| \le \delta(n), \quad \forall\, \mathbf{i} \in [m]^d.$$

*Proof.*
The result is obtained by using Corollary 2 and subsequently choosing $\delta$ in Lemma 5 as $\delta(n)$. Note that the condition $\delta(n) \le 1$ reads
$$\frac{n}{\log n} \ge 6^{d+2} (8\pi M)^d \cdot 32\left( 4\pi^2\sigma^2 + \tfrac{2}{3} + \sqrt{\pi}\,\sigma \right),$$
which is stricter than the condition $\frac{n}{\log n} \ge \left( \frac{\pi M}{d(\pi^2\sigma^2 + \sqrt{\pi}\,\sigma)} \right)^d$ stated in Corollary 2.

This result indicates that if the number of samples $n$ is large enough, the denoising step yields a sufficiently good estimate of the noiseless mod 1 signal, so that the unwrapping procedure achieves a good estimate of the ground-truth signal.

**Algorithm 3**
Denoising and unwrapping modulo samples with kNN regression
Input: integer $k > 0$; uniform grid $\mathcal X \subset [0,1]^d$, $|\mathcal X| = n = m^d$; noisy modulo samples $y_{\mathbf{i}} = (f(x_{\mathbf{i}}) + \eta_{\mathbf{i}}) \bmod 1$ for all $\mathbf{i} \in [m]^d$.
Output: unwrapped denoised modulo samples $\widetilde f(x_{\mathbf{i}})$ for all $\mathbf{i} \in [m]^d$.
Denoising step: input $(y_{\mathbf{i}})_{\mathbf{i} \in [m]^d}$ into Algorithm 1 to get denoised mod 1 samples $(\widehat g(x_{\mathbf{i}}))_{x_{\mathbf{i}} \in \mathcal X}$.
Unwrapping step: input $(\widehat g(x_{\mathbf{i}}))_{x_{\mathbf{i}} \in \mathcal X}$ into Algorithm 2 to yield $(\widetilde f(x_{\mathbf{i}}))_{x_{\mathbf{i}} \in \mathcal X}$.

3 Denoising modulo samples on a graph with an SDP relaxation

This section formally analyzes an SDP approach for denoising modulo 1 samples which was recently proposed by Cucuringu and Tyagi [7]. We begin by formally outlining the problem setup.
Let $G = ([n], E)$ be a connected, undirected graph, where $E \subseteq \{\{i,j\} : i \ne j \in [n]\}$ denotes its set of edges. Denote the degree of vertex $p$ by $d_p$, the maximum degree of $G$ by $\Delta := \max_p d_p$, and the (combinatorial) Laplacian matrix associated with $G$ by $L \in \mathbb{R}^{n \times n}$. Let $h \in \mathbb{T}^n$ be an unknown ground-truth signal which is smooth w.r.t. $G$ in the sense that
$$B_n := \max_{\{i,j\} \in E} |h_i - h_j| \tag{3.1}$$
is "small", where $B_n > 0$. Ideally, we will be interested in the setting where $B_n \to 0$ as $n \to \infty$. For example, in the setting of Section 2 with $d = 1$, we may choose $G$ to be the path graph where $E = \{\{i, i+1\} : i = 1, \dots, n-1\}$. Then $\Delta = 2$ and $B_n = 2\pi M/n$, the latter obtained from Fact 1.

Given information about $h$ in the form of noisy samples $z \in \mathbb{T}^n$ (cf. (2.1)), our goal is to identify conditions under which (SDP) is a tight relaxation of (QCQP), i.e., the solution to (SDP) is of rank 1. As we will see below, this will lead to the global solution $\widehat g$ of (QCQP); in particular, we would then have obtained $\widehat g$ in polynomial time.
**Remark 4.** In the notation of Section 2, Cucuringu and Tyagi [7] considered a specific class of graphs $G = ([m]^d, E)$ where
$$E = \left\{ \{\mathbf{i}, \mathbf{j}\} : \mathbf{i}, \mathbf{j} \in [m]^d,\ \mathbf{i} \ne \mathbf{j},\ \|\mathbf{i} - \mathbf{j}\|_\infty \le k \right\}$$
for some integer $k > 0$. For such graphs, we have $\Delta = (2k+1)^d - 1$. Moreover, no analysis was provided concerning the tightness of (SDP).

The SDP relaxation of (QCQP) can be derived as follows. Denoting
$$T = \begin{pmatrix} \lambda L & -z \\ -z^* & 0 \end{pmatrix}, \qquad W = \begin{pmatrix} g g^* & g \\ g^* & 1 \end{pmatrix}; \quad g \in \mathbb{T}^n, \tag{3.2}$$
the objective of (QCQP) is simply $\mathrm{Tr}(TW)$. Note that $W$ is a rank-1 positive semi-definite (p.s.d.) matrix with $W_{ii} = 1$ for each $i$. In fact, it is easy to see that any rank-1 p.s.d. matrix $W' \in \mathbb{C}^{(n+1)\times(n+1)}$ with $W'_{ii} = 1$ will be of the form
$$W' = \begin{pmatrix} g' g'^* & g' \\ g'^* & 1 \end{pmatrix}; \quad g' \in \mathbb{T}^n. \tag{3.3}$$
Hence (QCQP) is equivalent to
$$\min_{W \in \mathbb{C}^{(n+1)\times(n+1)}} \mathrm{Tr}(TW) \quad \text{s.t.} \quad W \succeq 0,\ \mathrm{rank}(W) = 1,\ W_{ii} = 1, \tag{3.4}$$
and we obtain a solution $g$ of (QCQP) as the first $n$ entries of the last column of a solution $W$ of (3.4). Problem (3.4) is non-convex (due to the rank constraint) and is in general NP-hard if $T$ is an arbitrary Hermitian matrix [33, Proposition 3.3]. By dropping the rank constraint, we finally arrive at (SDP).

Due to the equivalence of (3.4) and (QCQP), we can see that if $X$ is a solution of (3.4) and has rank 1, then it will be of the form (3.3), where $g'$ is a solution of (QCQP).

3.2 An $\ell_\infty$ error bound for (QCQP)

To begin with, we will prove the following $\ell_\infty$ stability bound for any solution $\widehat g \in \mathbb{T}^n$ of (QCQP). This stability result will then be employed later in Section 3.3 for proving the main result, namely Theorem 6.

Theorem 5.
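The lifting in (3.2) can be sanity-checked numerically: for any $g \in \mathbb{T}^n$, $\mathrm{Tr}(TW)$ should equal the QCQP objective $\lambda g^* L g - 2\,\mathrm{Re}(g^* z)$. A small sketch (the path graph and the random instance are our choices for illustration):

```python
import numpy as np

# Build a small random instance on the path graph and verify that the
# lifted objective Tr(T W) matches the QCQP objective.
n, lam = 8, 0.3
rng = np.random.default_rng(0)
A = np.diag(np.ones(n - 1), 1)
A = A + A.T                                   # path-graph adjacency
L = np.diag(A.sum(1)) - A                     # combinatorial Laplacian
g = np.exp(1j * rng.uniform(0, 2 * np.pi, n))  # g in T^n
z = np.exp(1j * rng.uniform(0, 2 * np.pi, n))  # observation in T^n

T = np.block([[lam * L, -z[:, None]],
              [-z.conj()[None, :], np.zeros((1, 1))]])
gt = np.concatenate([g, [1.0]])               # lifted vector (g; 1)
W = np.outer(gt, gt.conj())                   # rank-1 p.s.d., unit diagonal
obj_qcqp = lam * np.real(g.conj() @ L @ g) - 2 * np.real(g.conj() @ z)
obj_sdp = np.real(np.trace(T @ W))
```

Dropping the rank constraint on $W$ then gives the convex (SDP) over the same objective.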
*Assume that the observation $z \in \mathbb{T}^n$ satisfies $\|z - h\|_\infty \le \delta$. If $\lambda\Delta < \sqrt{2}$ holds, then any solution $\widehat g$ of (QCQP) satisfies the bound*
$$\|\widehat g - h\|_\infty \le \sqrt{ \frac{2\delta + \delta^2 + \lambda\Delta\left(B_n^2 + \sqrt{2}\right)}{1 - \frac{\lambda\Delta}{\sqrt{2}}} }.$$

*Proof of Theorem 5.* For any given $p \in [n]$, consider $\widetilde g \in \mathbb{T}^n$ of the form $\widetilde g = \widehat g + (h_p - \widehat g_p) e_p$, i.e.,
$$\widetilde g_q = \begin{cases} h_q & \text{if } q = p, \\ \widehat g_q & \text{otherwise,} \end{cases} \quad \text{for } q = 1, \dots, n.$$
[Footnotes: The steps leading to the relaxation are explained more clearly here as opposed to [7]. In our case, $T$ is a specific matrix involving the Laplacian $L$, so it is not clear if this instance is NP-hard. The idea of constructing $\widetilde g$ is taken from the proof of Lemma 4.]

Clearly, $\widetilde g$ is feasible for (QCQP). Since $\widehat g$ is optimal, we obtain the inequality
$$\lambda \widehat g^* L \widehat g - 2\,\mathrm{Re}(\widehat g^* z) \le \lambda \widetilde g^* L \widetilde g - 2\,\mathrm{Re}(\widetilde g^* z) \;\Leftrightarrow\; \lambda(\widehat g^* L \widehat g - \widetilde g^* L \widetilde g) \le 2\,\mathrm{Re}((\widehat g - \widetilde g)^* z). \tag{3.5}$$
The LHS of (3.5) can be simplified as
$$\lambda \sum_{\{i,j\}\in E} \left( |\widehat g_i - \widehat g_j|^2 - |\widetilde g_i - \widetilde g_j|^2 \right) = \lambda \sum_{\{i,j\}\in E: p \notin \{i,j\}} \left( |\widehat g_i - \widehat g_j|^2 - |\widetilde g_i - \widetilde g_j|^2 \right) + \lambda \sum_{j: \{p,j\}\in E} \left( |\widehat g_p - \widehat g_j|^2 - |\widetilde g_p - \widetilde g_j|^2 \right)$$
$$= \lambda \sum_{j: \{p,j\}\in E} \left( |\widehat g_p - \widehat g_j|^2 - |h_p - \widehat g_j|^2 \right) = 2\lambda \sum_{j: \{p,j\}\in E} \mathrm{Re}((h_p - \widehat g_p)^* \widehat g_j),$$
where the second equality uses the definition of $\widetilde g$, and the final equality follows from simple algebra. Since the RHS of (3.5) equals $2\,\mathrm{Re}((\widehat g_p - h_p)^* z_p)$, (3.5) is equivalent to
$$\lambda \sum_{j: \{p,j\}\in E} \mathrm{Re}((h_p - \widehat g_p)^* \widehat g_j) \le \mathrm{Re}((\widehat g_p - h_p)^* z_p). \tag{3.6}$$
Now let us denote $\varepsilon \in [-1, 1]$ to be the largest number such that $\mathrm{Re}(\widehat g_i^* h_i) \ge \varepsilon$ holds for all $i = 1, \dots, n$. Since $\mathrm{Re}(\widehat g_i^* h_i) = 1 - \frac{|\widehat g_i - h_i|^2}{2}$, the bound $|\widehat g_i - h_i|^2 \le 2(1 - \varepsilon)$ holds for each $i$. Our goal is now to provide simplified lower and upper bounds on the LHS and RHS of (3.6), respectively. In order to obtain the lower bound, we note that
$$\lambda \sum_{j: \{p,j\}\in E} \mathrm{Re}(h_p^* \widehat g_j) - \lambda \sum_{j: \{p,j\}\in E} \mathrm{Re}(\widehat g_p^* \widehat g_j) = \lambda \sum_{j} \mathrm{Re}(h_p^* h_j) + \lambda \sum_{j} \mathrm{Re}(h_p^* (\widehat g_j - h_j)) - \lambda \sum_{j} \mathrm{Re}(\widehat g_p^* \widehat g_j)$$
$$\ge \lambda \sum_{j: \{p,j\}\in E} \mathrm{Re}(h_p^* h_j) - \lambda d_p \sqrt{2(1-\varepsilon)} - \lambda d_p, \tag{3.7}$$
where the last inequality follows from $\mathrm{Re}(\widehat g_p^* \widehat g_j) \le 1$ and $\mathrm{Re}(h_p^* (\widehat g_j - h_j)) \ge -|\widehat g_j - h_j| \ge -\sqrt{2(1-\varepsilon)}$. Plugging the bound $\mathrm{Re}(h_p^* h_j) = 1 - \frac{|h_p - h_j|^2}{2} \ge 1 - \frac{B_n^2}{2}$ into (3.7), we obtain the lower bound
$$\lambda d_p \left( 1 - \frac{B_n^2}{2} \right) - \lambda d_p - \lambda d_p \sqrt{2(1-\varepsilon)} = -\lambda d_p \left( \frac{B_n^2}{2} + \sqrt{2(1-\varepsilon)} \right). \tag{3.8}$$
The upper bound on the RHS of (3.6) follows readily as shown below:
$$\mathrm{Re}((\widehat g_p - h_p)^* z_p) = \underbrace{\mathrm{Re}(\widehat g_p^* (z_p - h_p))}_{\le |z_p - h_p| \le \delta} + \mathrm{Re}(\widehat g_p^* h_p) - \underbrace{\mathrm{Re}(h_p^* z_p)}_{= 1 - |h_p - z_p|^2/2 \ge 1 - \delta^2/2} \le \mathrm{Re}(\widehat g_p^* h_p) - 1 + \delta + \frac{\delta^2}{2}. \tag{3.9}$$
Applying (3.8) and (3.9) in (3.6), we obtain
$$\mathrm{Re}(\widehat g_p^* h_p) \ge 1 - \delta - \frac{\delta^2}{2} - \lambda d_p \left( \frac{B_n^2}{2} + \sqrt{2(1-\varepsilon)} \right) \ge 1 - \delta - \frac{\delta^2}{2} - \lambda\Delta \left( \frac{B_n^2}{2} + \sqrt{2}\sqrt{1-\varepsilon} \right), \tag{3.10}$$
since $d_p \le \Delta$. Since $\varepsilon$ is the largest lower bound holding uniformly for $\mathrm{Re}(\widehat g_p^* h_p)$ for each $p = 1, \dots, n$, $\varepsilon$ must be larger than the RHS of (3.10). Using $\sqrt{1-\varepsilon} \le 1 - \frac{\varepsilon}{2}$ and, if furthermore $\lambda\Delta < \sqrt{2}$, we obtain
$$\varepsilon \ge 1 - \delta - \frac{\delta^2}{2} - \lambda\Delta \left( \frac{B_n^2}{2} + \sqrt{2} - \frac{\varepsilon}{\sqrt{2}} \right) \;\Leftrightarrow\; \varepsilon \ge \frac{1 - \delta - \frac{\delta^2}{2} - \lambda\Delta\left(\frac{B_n^2}{2} + \sqrt{2}\right)}{1 - \frac{\lambda\Delta}{\sqrt{2}}},$$
and the stated bound on $\|\widehat g - h\|_\infty$ follows since $|\widehat g_i - h_i|^2 \le 2(1-\varepsilon)$ holds for each $i = 1, \dots, n$.
**Remark 5.** The above result is a stability result for the solution $\widehat g$ of (QCQP): it states that if $\lambda \lesssim 1/\Delta$ then $\|\widehat g - h\|_\infty \lesssim \sqrt{\delta} + \sqrt{\lambda\Delta}$. The bound is admittedly not satisfactory since, as seen from Theorem 5, it does not capture the "denoising property" $\|\widehat g - h\|_\infty < \|z - h\|_\infty$, which is what one would ideally like to prove (since (QCQP) is denoising $z$). The problem lies in the proof technique: we do not make use of any particular property of $\widehat g$, but rather use a feasibility argument with a suitably constructed feasible point $\widetilde g$. In the next section, we will see certain optimality conditions necessarily satisfied by $\widehat g$; however, it is unclear how they can be employed to yield better bounds.

3.3 Tightness of the SDP relaxation of (QCQP)

We will now derive conditions under which (SDP) is a tight relaxation of (QCQP), i.e., the solution of (SDP) is a rank-1 matrix. The main result of this section is stated as the following theorem.
**Theorem 6.** *Assume that the observation $z \in \mathbb{T}^n$ satisfies $\|z - h\|_\infty \le \delta$. If $\lambda, \Delta, \delta$ satisfy*
1. $\delta + \sqrt{ \frac{8}{7}\left(3\delta + \lambda\Delta\left(B_n^2 + \sqrt{2}\right)\right) } \le \frac{\sqrt{2}}{3}$, *and*
2. $\lambda\Delta \le \frac{1}{8}$,
*then (SDP) has a unique solution $X \in \mathbb{C}^{(n+1)\times(n+1)}$, where*
$$X = \begin{pmatrix} \widehat g \widehat g^* & \widehat g \\ \widehat g^* & 1 \end{pmatrix} = \begin{pmatrix} \widehat g \\ 1 \end{pmatrix} \begin{pmatrix} \widehat g^* & 1 \end{pmatrix}; \quad \widehat g \in \mathbb{T}^n.$$
*Consequently, $\widehat g$ is the unique solution of (QCQP).*

Our technique essentially follows the idea proposed by Bandeira et al. [1] in the context of the tightness of the SDP relaxation of the MLE for the phase synchronization problem, and is detailed in the ensuing sections. The first step of the proof is to identify the KKT conditions which are satisfied by any global minimizer $X$ of (SDP) and its dual (see Lemma 6). This necessitates constructing a dual feasible matrix $\widehat S \in \mathbb{C}^{(n+1)\times(n+1)}$ satisfying $\widehat S X = 0$. The second step involves identifying the first-order optimality conditions of (QCQP), which are necessarily satisfied by any (local) minimizer of (QCQP) (and hence by a global minimizer $\widehat g$ as well). This enables us to guess the form of $\widehat S$, which itself depends on $\widehat g$, and satisfies all except one KKT condition, namely positive semi-definiteness. In the third step, we show in Lemma 8 that if $\delta$ (the noise level) and $\Delta\lambda$ are respectively sufficiently small, then $\mathrm{rank}(\widehat S) = n$ and $\widehat S \succeq 0$. This implies (from Lemma 6) that the solution $X$ of (SDP) is unique and has rank 1.

3.3.1 KKT conditions for (SDP) and its dual

To begin with, we will need the following lemma from [1] (adapted to our setup), which states the KKT conditions for (SDP) and its dual.
**Lemma 6** ([1, Lemma 4.3]). *A Hermitian matrix $X \in \mathbb{C}^{(n+1)\times(n+1)}$ is a global minimizer of (SDP) iff there exists a Hermitian matrix $\widehat S \in \mathbb{C}^{(n+1)\times(n+1)}$ such that*
1. $X_{ii} = 1$ *for all* $i = 1, \dots, n+1$;
2. $X \succeq 0$;
3. $\widehat S X = 0$;
4. $\widehat S - T$ *is (real) diagonal;*
5. $\widehat S \succeq 0$.
*If furthermore $\mathrm{rank}(\widehat S) = n$, then $X$ has rank one and is the unique global minimizer of (SDP).*

Conditions (1), (2) (resp. (4), (5)) are primal (resp. dual) feasibility conditions, while condition (3) is the complementary slackness condition.

3.3.2 Construction of the dual certificate $\widehat S$

We will now derive the first-order optimality conditions that are necessarily satisfied by any (local) minimizer of (QCQP). Since every $u = u_1 + \iota u_2 \in \mathbb{C}$ is uniquely identified with $(u_1, u_2) \in \mathbb{R}^2$, we can endow $\mathbb{C}$ with the Euclidean metric $\langle u, v \rangle = \mathrm{Re}(u^* v)$. Moreover, $\mathbb{T}$ is a submanifold of $\mathbb{C}$ with the tangent space at each $u \in \mathbb{T}$ given by
$$T_u \mathbb{T} = \{ \widetilde u \in \mathbb{C} : \langle \widetilde u, u \rangle = 0 \}.$$
Thus $\mathbb{T}$ (resp. $\mathbb{T}^n$) is a smooth Riemannian submanifold of $\mathbb{C}$ (resp. $\mathbb{C}^n$). For $x \in \mathbb{T}^n$, the tangent space of $\mathbb{T}^n$ at $x$ is given by
$$T_x \mathbb{T}^n = \{ \widetilde x \in \mathbb{C}^n : \mathrm{Re}(\mathrm{diag}(\widetilde x x^*)) = 0 \}.$$
Denoting $F(g) = \lambda g^* L g - 2\,\mathrm{Re}(g^* z)$, we would like to minimize $F$ over $\mathbb{T}^n$. Following [1], let us define the orthogonal projection operator $\mathrm{proj}_x : \mathbb{C}^n \to T_x \mathbb{T}^n$ as
$$\mathrm{proj}_x(\widetilde x) = \widetilde x - \mathrm{Re}(\mathrm{diag}(\widetilde x x^*))\, x. \tag{3.11}$$
Denote $\mathrm{grad}\, F(g) := \mathrm{proj}_g \nabla F(g)$ to be the Riemannian gradient of $F$ at $g \in \mathbb{T}^n$, where $\nabla F(g)$ is the usual Euclidean gradient of $F$ with respect to $(\mathrm{Re}(g), \mathrm{Im}(g))^\top$.
Then, for any local minimizer $g$ of (QCQP), it holds that $\mathrm{grad}\, F(g) = \mathrm{proj}_g \nabla F(g) = 0$. Since $\nabla F(g) = 2(\lambda L g - z)$, the global minimizer $\widehat g$ of (QCQP) satisfies $\mathrm{proj}_{\widehat g}(\lambda L \widehat g - z) = 0$, or equivalently
$$\left( \begin{pmatrix} \lambda L & -z \\ -z^* & 0 \end{pmatrix} - \mathrm{Re}\left( \mathrm{diag}\left( \begin{pmatrix} \lambda L & -z \\ -z^* & 0 \end{pmatrix} \begin{pmatrix} \widehat g \\ 1 \end{pmatrix} \begin{pmatrix} \widehat g^* & 1 \end{pmatrix} \right) \right) \right) \begin{pmatrix} \widehat g \\ 1 \end{pmatrix} = 0. \tag{3.12}$$
Recall the definition of the matrix $T$ in (3.2). Then, denoting $\widetilde g = \begin{pmatrix} \widehat g \\ 1 \end{pmatrix}$ and
$$\widehat S = T - \mathrm{Re}(\mathrm{diag}(T \widetilde g \widetilde g^*)), \tag{3.13}$$
(3.12) can be written as $\widehat S \widetilde g = 0$. Now denoting $X = \widetilde g \widetilde g^* \in \mathbb{C}^{(n+1)\times(n+1)}$, clearly
• $X$ is primal feasible for (SDP) (conditions (1), (2) of Lemma 6);
• $\widehat S X = 0$ (condition (3) of Lemma 6); and
• $\widehat S - T$ is a real diagonal matrix (condition (4) of Lemma 6).
Hence, setting $\widehat S$ as in (3.13) to be our dual certificate candidate, we now only need to find conditions under which $\widehat S \succeq 0$ and $\mathrm{rank}(\widehat S) = n$, since from Lemma 6 this would imply that $X$ is the unique solution of (SDP). Consequently, $\widehat g$ will be the unique solution of (QCQP).

**Lemma 7** (Properties of critical points). *Let $L = \mathrm{diag}(W \mathbf{1}) - W$, where $W$ is the adjacency matrix of the graph. If $\widehat g$ is a first-order critical point of (QCQP), then the following statements hold:*
*(i) $z^* \widehat g$ is real, and*
*(ii) $\widehat g_i^* (z + \lambda W \widehat g)_i$ is real, for all $1 \le i \le n$.*
*Furthermore, let $W_{\widehat g} = W \circ \mathrm{Re}(\widehat g \widehat g^*)$ be the corresponding real symmetric matrix. Then, if $\widehat g$ is a second-order critical point of (QCQP), we have*
*(iii) $u^\top \left\{ \mathrm{Re}\,\mathrm{diag}(z \widehat g^*) + \lambda \left[ \mathrm{diag}(W_{\widehat g} \mathbf{1}) - W_{\widehat g} \right] \right\} u \ge 0$, for all $u \in \mathbb{R}^n$.*

*Proof.*
The first-order condition (3.12) can be written as follows:
$$\lambda L \widehat g - \mathrm{Re}(\mathrm{diag}(\lambda L \widehat g \widehat g^*))\, \widehat g + \mathrm{Re}(\mathrm{diag}(z \widehat g^*))\, \widehat g = z.$$
Then, (i) and (ii) are obtained, respectively, by multiplying the above expression by $\widehat g^*$ and $\widehat g_i^* e_i^\top$. Next, we rely on the second-order condition (see Proposition 2 in the appendix): $\langle \dot g, \mathrm{Hess}\, F(\widehat g)[\dot g] \rangle \ge 0$ for all $\dot g \in T_{\widehat g} \mathbb{T}^n$, with
$$\mathrm{Hess}\, F(g)[\dot g] = 2 \left\{ \lambda L \dot g - \mathrm{Re}\left[ \mathrm{diag}\left( (\lambda L g - z) g^* \right) \right] \dot g \right\}.$$
Then, we parametrize $\dot g \in T_{\widehat g} \mathbb{T}^n$ as $\dot g = \iota\, \mathrm{diag}(u)\, \widehat g$, where $u \in \mathbb{R}^n$. Consequently, since $\mathrm{diag}(u)\widehat g = \mathrm{diag}(\widehat g) u$, and thanks to (ii), we have the equivalent expression
$$\langle \dot g, \mathrm{Hess}\, F(\widehat g)[\dot g] \rangle = 2\,\mathrm{Re}\left\{ u^\top \mathrm{diag}(\widehat g^*) \left[ \mathrm{Re}\,\mathrm{diag}(z \widehat g^* + \lambda W \widehat g \widehat g^*) - \lambda W \right] \mathrm{diag}(\widehat g)\, u \right\}$$
$$= 2\,\mathrm{Re}\left\{ u^\top \left[ \mathrm{Re}\,\mathrm{diag}(z \widehat g^* + \lambda W \widehat g \widehat g^*) - \lambda\, \mathrm{diag}(\widehat g^*) W \mathrm{diag}(\widehat g) \right] u \right\}.$$
Then, the condition becomes
$$\mathrm{Re}\left\{ u^\top \left[ \mathrm{Re}\,\mathrm{diag}(z \widehat g^*) + \mathrm{Re}\,\mathrm{diag}(\lambda W \widehat g \widehat g^*) - \lambda\, \mathrm{diag}(\widehat g^*) W \mathrm{diag}(\widehat g) \right] u \right\} \ge 0$$
for all $u \in \mathbb{R}^n$, and (iii) follows.

**Corollary 4.** *If $\widehat g$ is a second-order critical point of (QCQP), then it holds that $z^* \widehat g \ge 0$ and $\widehat g_i^* (z + \lambda W \widehat g)_i \ge 0$, for all $1 \le i \le n$.*

*Proof.* The inequality $z^* \widehat g \ge 0$ follows from (iii) with $u = (1, \dots, 1)^\top$, together with (i). The second inequality follows by using the positivity of the diagonal entries in (iii) and the statement (ii).

Remark 6.
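The first-order condition can be approached numerically by projected (Riemannian) gradient descent with the projection (3.11); the following heuristic sketch (the step size, iteration count, and instance below are ad hoc choices of ours) returns a first-order critical point, not necessarily the global minimizer:

```python
import numpy as np

def qcqp_descent(L, z, lam, steps=3000, lr=0.05):
    """Projected gradient descent for min_{g in T^n} lam*g^*Lg - 2Re(g^*z).

    Each step: Euclidean gradient, projection onto the tangent space of the
    torus via (3.11), a descent step, then retraction back to |g_i| = 1.
    """
    g = z.copy().astype(complex)                       # initialize at the data
    for _ in range(steps):
        egrad = 2 * (lam * (L @ g) - z)                # Euclidean gradient of F
        rgrad = egrad - np.real(egrad * g.conj()) * g  # proj onto T_g T^n
        g = g - lr * rgrad
        g = g / np.abs(g)                              # retract to the torus
    return g

# Small demo instance: path graph, mildly varying observation z on the torus.
n, lam = 6, 0.1
A = np.diag(np.ones(n - 1), 1)
A = A + A.T
L = np.diag(A.sum(1)) - A
rng = np.random.default_rng(1)
z = np.exp(1j * 0.2 * np.pi * rng.uniform(size=n))
g_hat = qcqp_descent(L, z, lam)
```

At convergence one can check the conclusions of Lemma 7 and Corollary 4 numerically: the Riemannian gradient vanishes and $z^* \widehat g$ is real and nonnegative.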
Although Lemma 7 states that $z^* \widehat g$ is real whenever $\widehat g$ is a first-order critical point, the quantity $z_i^* \widehat g_i$, with $1 \le i \le n$, is not necessarily real. Similarly, if $\widehat g$ is a second-order critical point, we have, thanks to Corollary 4, that $\widehat g_i^* z_i + \lambda \widehat g_i^* (W \widehat g)_i \ge 0$ for all $1 \le i \le n$. However, again, the first term in the latter inequality is not necessarily real.

3.3.3 Showing $\mathrm{rank}(\widehat S) = n$ and $\widehat S \succeq 0$

It remains to find conditions under which $\widehat S$ as in (3.13) is positive semidefinite and of rank $n$. The following lemma states that if the noise level $\delta$ and the term $\lambda\Delta$ are respectively small, then $\mathrm{rank}(\widehat S) = n$ and $\widehat S \succeq 0$.

**Lemma 8** (Sufficient condition for tightness of SDP). *Let $\widehat g \in \mathbb{T}^n$ be a global minimizer of (QCQP)*
$$\min_{g \in \mathbb{T}^n} \lambda g^* L g - 2\,\mathrm{Re}(g^* z),$$
*and denote $\widetilde g = \begin{pmatrix} \widehat g \\ 1 \end{pmatrix}$. Under the notation defined earlier, if we have*
$$\lambda\Delta \left( B_n + 2\|\widehat g - h\|_\infty \right) + \frac{(\delta + \|\widehat g - h\|_\infty)^2}{2} + \frac{(\delta + \|\widehat g - h\|_\infty)^2}{1 - \frac{(\delta + \|\widehat g - h\|_\infty)^2}{2}} < 1, \tag{3.14}$$
*then $\widehat S = T - \mathrm{Re}(\mathrm{diag}(T \widetilde g \widetilde g^*))$ satisfies $\widehat S \succeq 0$ and $\mathrm{rank}(\widehat S) = n$, while the unique solution to (SDP) reads*
$$X = \widetilde g \widetilde g^* = \begin{pmatrix} \widehat g \widehat g^* & \widehat g \\ \widehat g^* & 1 \end{pmatrix}.$$
*In particular, the condition (3.14) is satisfied if*
1. $\delta + \sqrt{ \frac{8}{7}\left(3\delta + \lambda\Delta\left(B_n^2 + \sqrt{2}\right)\right) } \le \frac{\sqrt{2}}{3}$, *and*
2. $\lambda\Delta \le \frac{1}{8}$.

*Proof.* To begin with, note that the first-order optimality condition (3.12) states that
$$\mathrm{Re}\left( (T \widetilde g)_i \widetilde g_i^* \right) \widetilde g_i = (T \widetilde g)_i; \quad i = 1, \dots, n+1,$$
and hence $(T \widetilde g)_i \widetilde g_i^*$ is real for each $i$. Consequently, $z^* \widehat g$ and $(\lambda L \widehat g - z)_i \widehat g_i^*$, for $i = 1, \dots, n$, are real. Thus $\widehat S = T - \mathrm{diag}(T \widetilde g \widetilde g^*)$.
Denoting $D_0 \in \mathbb{R}^{n \times n}$ to be the real diagonal matrix with $D_0 = \mathrm{diag}(\widehat g^* \circ (z - \lambda L \widehat g))$, one can verify that
$$\widehat S = \begin{pmatrix} \lambda L & -z \\ -z^* & 0 \end{pmatrix} + \begin{pmatrix} D_0 & 0 \\ 0 & z^* \widehat g \end{pmatrix} = \begin{pmatrix} \lambda L + D_0 & -z \\ -z^* & z^* \widehat g \end{pmatrix}.$$
If $z^* \widehat g \ne 0$, then from Sylvester's law of inertia (see [11, Theorem 4.5.8]) we know that $\widehat S$ is $*$-congruent to the Hermitian block-diagonal matrix
$$\widehat S_D = \begin{pmatrix} \lambda L + D_0 - \frac{z z^*}{z^* \widehat g} & 0 \\ 0 & z^* \widehat g \end{pmatrix},$$
so that $\widehat S$ and $\widehat S_D$ have the same inertia. Since $\widehat g$ is a second-order critical point of (QCQP), we have $z^* \widehat g \ge 0$ by Corollary 4; we will in fact need $z^* \widehat g > 0$, and we establish below conditions under which this is satisfied. Thus, it follows that $\widehat S$ is rank-$n$ and p.s.d. iff the matrix
$$M = \lambda L + D_0 - \frac{z z^*}{z^* \widehat g} \in \mathbb{C}^{n \times n}$$
is p.s.d. and has rank $n - 1$, i.e., $M \succeq 0$ and $\mathrm{rank}(M) = n - 1$. Since $\widehat g_i^* (\lambda L \widehat g - z)_i$ is real for each $i$, we have
$$\widehat g_i^* (\lambda L \widehat g - z)_i = \mathrm{Re}\left( \widehat g_i^* (\lambda L \widehat g - z)_i \right) = \lambda\, \mathrm{Re}(\widehat g_i^* (L \widehat g)_i) - \mathrm{Re}(\widehat g_i^* z_i).$$
Denote $D_1, D_2 \in \mathbb{R}^{n \times n}$ to be diagonal matrices with $(D_1)_{ii} = \mathrm{Re}(\widehat g_i^* z_i)$ and $(D_2)_{ii} = \lambda\, \mathrm{Re}(\widehat g_i^* (L \widehat g)_i)$ for $i = 1, \dots, n$. Then we can write $M$ as
$$M = \lambda L + D_1 - D_2 - \frac{z z^*}{z^* \widehat g},$$
where we observe that $\widehat g$ is an eigenvector of $M$ with eigenvalue 0. Let $u \in \mathbb{C}^n$ be orthogonal to $\widehat g$, i.e., $u^* \widehat g = 0$ and $u \ne 0$. We will now establish conditions under which $u^* M u > 0$, as this implies $\mathrm{rank}(M) = n - 1$ and $M \succeq 0$. Since $u^* L u \ge 0$ and $u^* \widehat g = 0$, we arrive at the bound
$$u^* M u \ge u^* D_1 u - u^* D_2 u - \frac{|u^* z|^2}{z^* \widehat g} = \sum_{i=1}^n |u_i|^2\, \mathrm{Re}(\widehat g_i^* z_i) - \lambda \sum_{i=1}^n |u_i|^2\, \mathrm{Re}(\widehat g_i^* (L \widehat g)_i) - \frac{|u^* (z - \widehat g)|^2}{z^* \widehat g}$$
$$\ge \sum_{i=1}^n |u_i|^2\, \mathrm{Re}(\widehat g_i^* z_i) - \lambda \sum_{i=1}^n |u_i|^2\, \mathrm{Re}(\widehat g_i^* (L \widehat g)_i) - \frac{\|u\|^2 \|z - \widehat g\|^2}{z^* \widehat g}. \tag{3.15}$$
Now note that
$$\mathrm{Re}(\widehat g_i^* z_i) = 1 - \frac{|z_i - \widehat g_i|^2}{2} \ge 1 - \frac{(\delta + \|\widehat g - h\|_\infty)^2}{2}, \quad i = 1, \dots, n, \tag{3.16}$$
where we used the triangle inequality $|z_i - \widehat g_i| \le |z_i - h_i| + |h_i - \widehat g_i|$. Hence $\mathrm{Re}(z_i \widehat g_i^*) > 0$ for each $i$ if $\delta + \|\widehat g - h\|_\infty < \sqrt{2}$. Consequently, we have the bounds
$$z^* \widehat g = \mathrm{Re}(z^* \widehat g) = \sum_{i=1}^n \mathrm{Re}(z_i^* \widehat g_i) \ge n \left( 1 - \frac{(\delta + \|\widehat g - h\|_\infty)^2}{2} \right), \tag{3.17}$$
$$\|z - \widehat g\|^2 = 2\left( n - \mathrm{Re}(z^* \widehat g) \right) \le n\, (\delta + \|\widehat g - h\|_\infty)^2. \tag{3.18}$$
Finally, for any $i = 1, \dots, n$, we can bound the term $\mathrm{Re}(\widehat g_i^* (L \widehat g)_i)$ in two ways as follows:
$$\mathrm{Re}(\widehat g_i^* (L \widehat g)_i) = \mathrm{Re}\Big( \widehat g_i^* \sum_{j : \{i,j\}\in E} (\widehat g_i - \widehat g_j) \Big) \le \sum_{j : \{i,j\}\in E} |\widehat g_i - \widehat g_j| \le 2\Delta,$$
and, by using the triangle inequality,
$$\mathrm{Re}(\widehat g_i^* (L \widehat g)_i) \le \|L h\|_\infty + \|L(\widehat g - h)\|_\infty \le \Delta\left( B_n + 2\|\widehat g - h\|_\infty \right). \tag{3.19}$$
Plugging (3.16), (3.17), (3.18), (3.19) into (3.15), we arrive at the bound
$$u^* M u \ge \|u\|^2 \left( 1 - \frac{(\delta + \|\widehat g - h\|_\infty)^2}{2} - \lambda\Delta \min\left\{ 2,\, B_n + 2\|\widehat g - h\|_\infty \right\} - \frac{(\delta + \|\widehat g - h\|_\infty)^2}{1 - \frac{(\delta + \|\widehat g - h\|_\infty)^2}{2}} \right).$$
One can verify that the RHS is positive, so that $u^* M u > 0$, whenever $\lambda\Delta \le \frac{1}{8}$ and $\delta + \|\widehat g - h\|_\infty \le \frac{\sqrt{2}}{3}$. Finally, note that under the condition $\lambda\Delta \le \frac{1}{8}$ and the bound on $\|\widehat g - h\|_\infty$ from Theorem 5, one can verify that
$$\delta + \|\widehat g - h\|_\infty \le \delta + \sqrt{ \frac{8}{7}\left( 3\delta + \lambda\Delta\left( B_n^2 + \sqrt{2} \right) \right) }.$$
Therefore $\delta + \|\widehat g - h\|_\infty \le \frac{\sqrt{2}}{3}$ holds if $\delta + \sqrt{ \frac{8}{7}\left(3\delta + \lambda\Delta\left(B_n^2 + \sqrt{2}\right)\right) } \le \frac{\sqrt{2}}{3}$. This completes the proof.
As a proof of concept, numerical illustrations of Algorithm 3 for denoising and unwrapping mod 1 samples are given, first on artificial 1D examples and then on a 2D problem constructed from real data.
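For concreteness, the two-stage procedure (kNN denoising on the unit circle followed by sequential unwrapping) can be sketched as follows for Example 1. This is our own minimal illustration, not the authors' Algorithm 3 verbatim: the parameter values are arbitrary, and a symmetric grid window stands in for the k nearest neighbours.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, sigma = 1000, 40, 0.1                        # illustrative values only
x = np.arange(n) / (n - 1)                         # uniform grid on [0, 1]
f = np.sin(4 * np.pi * x)                          # Example 1
y = (f + sigma * rng.standard_normal(n)) % 1.0     # noisy mod 1 samples

# Stage 1: embed the mod 1 samples on the unit circle and denoise by
# averaging the (approximately) k nearest grid neighbours in C.
z = np.exp(2j * np.pi * y)
denoised = np.empty(n)
for i in range(n):
    lo, hi = max(0, i - k // 2), min(n, i + k // 2 + 1)
    m = z[lo:hi].mean()                            # average in C ...
    denoised[i] = (np.angle(m) / (2 * np.pi)) % 1.0  # ... then back to mod 1

# Stage 2: sequentially unwrap the denoised mod 1 estimates; the
# recovery is only up to a global integer shift.
est = np.empty(n)
est[0] = denoised[0]
for i in range(1, n):
    d = denoised[i] - denoised[i - 1]
    est[i] = est[i - 1] + d - np.round(d)          # wrap difference into [-1/2, 1/2]
```

After aligning `est` with the ground truth by an integer shift, its error is far smaller than that of the raw noisy samples, mirroring the behaviour reported in the figures below.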
The output of Algorithm 3 is compared with two other methods, described in more detail in Section 5.1, namely a trust region subproblem (TRS) and an unconstrained quadratic program (UCQP). Two example functions are chosen to illustrate unwrapping and denoising on a uniform grid when $d = 1$:
• Example 1: $f : [0,1] \to \mathbb{R}$, $x \mapsto \sin(4\pi x)$,
• Example 2: $f : [0,1] \to \mathbb{R}$, $x \mapsto x\cos(2\pi x) - \sin(2\pi x) + 4$.
The corresponding error rates, as a function of $n$, can be visualized in Figure 3 and Figure 5, respectively. These error plots display averages over 50 Monte-Carlo trials with Gaussian noise, for (i) recovery of the mod 1 samples, measured by the mean square wrap-around error, and (ii) the unwrapped samples (after alignment), measured by the mean square error (MSE). The alignment procedure follows the same methodology as in [7], i.e., it relies on determining the mode of a histogram constructed from the distances between the unwrapped and clean samples. The Gaussian noise level in our experiments is taken to be σ = 0.

Figure 2: kNN denoising and unwrapping for Example 1. Parameters: n = 10, C = 0.
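The two error measures used in these plots can be made precise as follows. This is our own sketch of the evaluation step; the histogram-based alignment mirrors the description above, with a bin count of our choosing.

```python
import numpy as np

def wraparound_mse(a, b):
    """Mean square wrap-around error between mod 1 sequences a and b."""
    d = np.abs((a - b) % 1.0)
    d = np.minimum(d, 1.0 - d)     # wrap-around distance on [0, 1)
    return np.mean(d ** 2)

def aligned_mse(est, clean, bins=100):
    """MSE after aligning est with clean by the mode of the histogram of
    their differences, in the spirit of the alignment procedure of [7]."""
    diff = est - clean
    counts, edges = np.histogram(diff, bins=bins)
    j = np.argmax(counts)
    shift = 0.5 * (edges[j] + edges[j + 1])   # mode of the histogram
    return np.mean((est - shift - clean) ** 2)
```

For instance, two mod 1 sequences that disagree by a wrap-around distance of 0.2 at every sample have a wrap-around MSE of 0.04, while an unwrapped estimate that is correct up to a constant offset has a near-zero aligned MSE.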
Figure 3:
Comparisons between the error rates obtained by (kNN), (UCQP) and (TRS) for Example 1; see Figure 2. Error bars are standard deviations over 50 Monte-Carlo runs. Parameters: C = 0., κ = 0.

Figure 4: kNN denoising and unwrapping for Example 2. Parameters: n = 10, C = 0.
Figure 5:
Comparisons between the error rates obtained by (kNN), (UCQP) and (TRS) for Example 2; see Figure 4. Error bars are standard deviations over 50 Monte-Carlo runs. Parameters: C = 0., κ = 0.

The parameters of the methods are chosen as follows.
• For kNN, in view of Corollary 2, the number of neighbours is chosen as $k = \lceil k^\star \rceil$ with $k^\star = C n^{2/3} (\log n)^{1/3}$, where $C > 0$.
• For (UCQP) and (TRS), the analysis of the corresponding problems by Tyagi [28] (see Corollary 4 and Corollary 8 therein) indicates the choice $\lambda \asymp (\sigma^2 n^{3/2}/M)^{1/2}$. Hence, for the examples of Figure 2 and Figure 4, we take $\lambda = \kappa n^{3/4}$, where the constant $\kappa$ is reported in the figure captions. Both methods rely on an appropriate smoothness graph $G = ([n], E)$, which in our experiments is taken to be the path graph with $E = \{\{i, i+1\} : i = 1, \dots, n-1\}$.

In the relatively simple example of Figure 3, one observes that, for the chosen parameters, all methods have a similar performance. For the more complex example corresponding to Figure 5, we observe that (kNN) and (UCQP) yield a slightly smaller error. Notice that the differences between the methods are not always significant. Also, fine-tuning the parameters for a given $n$ might further improve the results; in any case, all three methods seem to achieve a similar error as $n$ becomes large.

To provide a proof of concept in two dimensions, we illustrate the performance of Algorithm 3 for denoising and unwrapping mod 1 samples on a 2D grid. To do so, in Figure 6, we simulate the reconstruction of the elevation map of Mount Vesuvius from noisy mod 1 samples, following a methodology similar to [7]. Firstly, the latitude and longitude grid is rescaled to a uniform grid in $[0,1]^2$. Next, the elevation data is scaled down by a factor of 500. This "change of units" is necessary to ensure that the clean data is smooth enough, so that unwrapping the noiseless mod 1 samples matches the original noiseless data.
Then, the mod 1 samples are corrupted by additive zero-mean Gaussian noise, and denoising is performed with a kNN estimator. In Figure 6, the output of Algorithm 3 is also compared to a simple unwrapping of the noisy data using Algorithm 2. The upshot is that, for a large enough noise level, the denoising step in Algorithm 3 is indeed necessary before applying the unwrapping algorithm: spurious jumps are visible in the plots of Figure 6f and Figure 6i. Naturally, the denoising procedure also smoothes out the peaks at the top of the mountain.

(Figure 6 panels: (a) noiseless mod 1 samples, (b) denoised mod 1 samples, (c) noisy mod 1 samples; (d) noiseless samples, (e) denoised unwrapped samples, (f) noisy unwrapped samples; (g) noiseless samples, (h) denoised unwrapped, (i) noisy unwrapped.)

Figure 6:
Denoising and unwrapping mod 1 samples obtained from the elevation map of Mount Vesuvius. The first two rows contain contour plots which allow one to visualize the smoothness of the samples. Parameters: k = 40 and σ = 0. The elevation map of Mount Vesuvius (N40E014.hgt.zip) was downloaded from https://dds.cr.usgs.gov/srtm/version2_1 and read with the readhgt.m script written by François Beauducel.

We start with a detailed overview of related work from the literature and conclude by outlining directions for future work.
As discussed in Section 1, the phase unwrapping problem has been studied extensively in the signal processing community, with a long history of work. Let us define a "wrap" function $w_\gamma : \mathbb{R} \to [-\gamma, \gamma)$,
\[
w_\gamma(t) := 2\gamma \Big( \Big[ \frac{t}{2\gamma} + \frac{1}{2} \Big] - \frac{1}{2} \Big),
\]
which outputs centered modulo $2\gamma$ values, with $[a]$ denoting the fractional part of $a \in \mathbb{R}$. Note that $t \bmod 2\gamma = w_\gamma(t) \bmod 2\gamma$, so there is a one-to-one correspondence between these two operators. In phase unwrapping, $\gamma = \pi$, so that we are given noisy modulo samples $y_i = w_\pi(f(x_i) + \eta_i)$ for $1 \le i \le n$, with $f : \mathbb{R}^d \to \mathbb{R}$ the unknown signal of interest. Denoting $\hat{f}_i = f(x_i) + \eta_i$, the classical Itoh condition [13] for $d = 1$ states that if
\[
|\hat{f}_i - \hat{f}_{i-1}| < \pi, \qquad i = 2, \dots, n,
\]
then $\hat{f}_i - \hat{f}_{i-1} = w_\pi(y_i - y_{i-1})$ for all $i$. This suggests that, if Itoh's condition holds, one can recover the samples $\hat{f}_i$, up to a global shift by an integer multiple of $2\pi$, in a sequential manner. As mentioned in Remark 3, the generalization of this to the case $d = 2$ is known; however, we are unaware of a general version of Itoh's condition for the multivariate setting.

Apart from the natural approach where one denoises the wrapped samples in the hope that the denoised estimates satisfy Itoh's condition, numerous other robust methods have been proposed in the phase unwrapping literature. While the list is too long to review in detail here, we remark that these approaches can be roughly classified as (i) least squares methods (e.g., [23, 18]), (ii) branch cut methods [22, 5], and (iii) network flow methods [6, 27]. The reader is referred to [7], as well as the excellent survey by Ying [31], for a more comprehensive discussion of the literature.
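Itoh's sequential recovery can be sketched in a few lines. This is a minimal illustration of the classical idea, not code from any of the cited works; the function names are ours.

```python
import numpy as np

def wrap(t, gamma=np.pi):
    """Centered modulo: maps t into [-gamma, gamma)."""
    return (t + gamma) % (2 * gamma) - gamma

def itoh_unwrap(y, gamma=np.pi):
    """Sequentially unwrap samples y_i = wrap(fhat_i), assuming Itoh's
    condition |fhat_i - fhat_{i-1}| < gamma holds for all i."""
    f = np.empty_like(y)
    f[0] = y[0]  # recovery is up to a global shift by a multiple of 2*gamma
    for i in range(1, len(y)):
        # the wrapped difference of the wrapped samples equals the true
        # difference whenever Itoh's condition holds
        f[i] = f[i - 1] + wrap(y[i] - y[i - 1], gamma)
    return f
```

If the condition fails at even a single index, the error propagates to all subsequent samples, which is precisely why denoising before unwrapping matters in the noisy setting.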
One drawback of the phase unwrapping literature is that the methods are typically based on heuristics and do not, in general, come with theoretical performance guarantees. In the past couple of years, several new approaches have been proposed for this problem with an emphasis on theoretical guarantees. As detailed below, these approaches typically rely on certain smoothness assumptions on the underlying $f$.

1. Bhandari et al. [2] considered a setup where $f$ is a univariate bandlimited function (spectrum lying in $[-\pi, \pi]$), with equispaced noiseless modulo samples available via the map $w_\gamma$. Their main result shows that if the sampling width satisfies $T \le 1/(2\pi e)$, then the samples of $f$, and hence $f$ itself, can be recovered exactly. The same authors extended their results to other settings where different assumptions are made on $f$. Specifically, they assume in [3] that $f$ can be represented as the convolution of a sum of $k$ Diracs, while in [4] they consider $f$ to be a sum of $k$ sinusoids. In both papers, they show that if the number of samples is large enough (i.e., $n \gtrsim k$) and $T \le 1/(2\pi e)$, then $f$ can be recovered exactly.

2. Rudresh et al. [25] consider $f$ to be a univariate Lipschitz function (with Lipschitz constant $M$), with equispaced sampling through the map $w_\gamma$. The method proposed therein applies a wavelet filter to the modulo samples, followed by a LASSO-type procedure to ultimately recover $f$. Their main result states that if $f$ is a polynomial of degree $p$, then a sampling width less than (up to a constant) $1/(Mp)$ suffices for exact recovery of $f$. While $M$ should of course depend on $p$, this was not stated explicitly in [25]. Although no theoretical results are provided in the presence of noise, they showed their approach to be more robust than that of Bhandari et al. [2] through numerical simulations.

3. The work of Cucuringu and Tyagi [7], introduced in Section 1.2, essentially focuses on solving (QCQP) and its relaxations for denoising mod 1 samples under the model (1.1). Apart from the SDP relaxation discussed earlier, they also considered a "sphere relaxation" of the constraint set, leading to the trust region subproblem
\[
\min_{\|g\|^2 = n} \|g - z\|^2 + \lambda g^* L g \iff \min_{\|g\|^2 = n} \lambda g^* L g - 2\,\mathrm{Re}(g^* z). \tag{TRS}
\]
The main theoretical results in [7] revolve around bounding the error term $\|\hat{g} - h\|$, where $\hat{g}$ is the solution of (TRS) and $h \in \mathbb{T}^n$ is the ground truth as defined in Section 2.1. For instance, when $d = 1$ and $\eta_i \sim \mathcal{N}(0, \sigma^2)$ i.i.d., they show that if $\lambda\triangle \lesssim 1$ and $\sigma \lesssim 1$, then, provided the $x_i$'s form a uniform grid in $[0,1]$, w.h.p.,
\[
\Big\| \frac{\hat{g}}{|\hat{g}|} - h \Big\| \lesssim \sqrt{\sigma n} + \sqrt{\lambda M \triangle n}, \tag{5.1}
\]
where $M$ is the Lipschitz constant of $f$. (More precisely, the result in [7] bounds $\|\hat{g} - h\|$; one passes to $\hat{g}/|\hat{g}|$ via the inequality in Fact 3.) However, this bound is in general weak, due to the fact that, w.h.p., $\|z - h\| \asymp \sigma\sqrt{n}$ when $\sigma \lesssim 1$. Therefore the bound in (5.1) does not show that $\|\hat{g}/|\hat{g}| - h\| \ll \|z - h\|$.

4. In work parallel to the present paper, Tyagi [28] provided an improved $\ell_2$ error analysis for the (TRS) estimator, as well as for an unconstrained quadratic program (UCQP) corresponding to the unconstrained relaxation of (QCQP),
\[
\min_{g \in \mathbb{C}^n} \|g - z\|^2 + \lambda g^* L g. \tag{UCQP}
\]
For both (TRS) and (UCQP), $\ell_2$ error bounds are derived in the more general denoising setting where $h \in \mathbb{T}^n$ is smooth with respect to an undirected, connected graph $G = ([n], E)$, in the sense that the quadratic variation $h^* L h$ is "small". The results are also applied to the model (1.1) when $d = 1$ and $\eta_i \sim \mathcal{N}(0, \sigma^2)$ i.i.d., with the $x_i$'s forming a uniform grid and $G$ a path graph. For the choice $\lambda \asymp (\sigma^2 n^{3/2}/M)^{1/2}$, it is shown for any fixed $\varepsilon \in (0,1)$ that if $\sigma \lesssim 1$ is not too small and $n$ is large enough, then the solutions $\hat{g}$ of (UCQP) and (TRS) satisfy, w.h.p.,
\[
\Big\| \frac{\hat{g}}{|\hat{g}|} - h \Big\| \le \varepsilon\, \|z - h\|.
\]

The above results are in the nonparametric setting where $f$ is typically highly non-linear. The setting where $f$ is linear has also been considered recently. For example, Shah and Hegde [26] assume $f$ to be a sparse linear function and provide conditions for its exact recovery in the noiseless setting (in the regime $n \ll d$); this is accomplished via an alternating minimization based algorithm. Musa et al. [19] also consider $f$ to be a sparse linear function, but assume that it is generated from a Bernoulli-Gaussian distribution. The recovery of $f$ is achieved via a generalized approximate message passing algorithm, but no theoretical analysis is provided.

An important direction for future work is to improve our analysis of the tightness of the SDP estimator. As discussed in Remark 5, the main bottleneck of our analysis is the $\ell_\infty$ error bound for the solution $\hat{g}$ of (QCQP) (in Theorem 5), which is admittedly not satisfactory. Deriving bounds satisfying the property $\|\hat{g} - h\|_\infty \ll \|z - h\|_\infty$ is an important question in its own right, and the analysis should utilize the information about $\hat{g}$ available through its first and second order optimality conditions. Moreover, we expect to see an "optimal choice" of the regularizer $\lambda$ – similar to the aforementioned $\ell_2$ error analysis for (TRS) and (UCQP) in [28] – which minimizes the bound on $\|\hat{g} - h\|_\infty$. Such an analysis is typically facilitated by a random noise model where $z_i = h_i \exp(2\pi\iota\, \eta_i)$, with $\eta_i \sim \mathcal{N}(0, \sigma^2)$ i.i.d. Gaussian for each $i$.
(Similarly, [7, Theorem 14] has a more complicated statement than what is stated in (5.1); however, it can be verified to be of the same order as (5.1).)

Acknowledgments

EU: The research leading to these results has received funding from the European Research Council under the European Union's Horizon 2020 research and innovation programme / ERC Advanced Grant E-DUALITY (787960). This paper reflects only the authors' views and the Union is not liable for any use that may be made of the contained information. Research Council KUL: Optimization frameworks for deep kernel machines C14/18/068. Flemish Government: FWO: projects: GOA4917N (Deep Restricted Kernel Machines: Methods and Foundations), PhD/Postdoc grant. This research received funding from the Flemish Government (AI Research Program). Ford KU Leuven Research Alliance Project KUL0076 (Stability analysis and performance improvement of deep reinforcement learning algorithms).
References

[1] A. S. Bandeira, N. Boumal, and A. Singer. Tightness of the maximum likelihood semidefinite relaxation for angular synchronization. Mathematical Programming, 163(1):145–167, 2017.
[2] A. Bhandari, F. Krahmer, and R. Raskar. On unlimited sampling. arXiv:1707.06340, 2017.
[3] A. Bhandari, F. Krahmer, and R. Raskar. Unlimited sampling of sparse signals. In ICASSP, pages 4569–4573, 2018.
[4] A. Bhandari, F. Krahmer, and R. Raskar. Unlimited sampling of sparse sinusoidal mixtures. In ISIT, pages 336–340, 2018.
[5] S. Chavez, Q.-S. Xiang, and L. An. Understanding phase maps in MRI: a new cutline phase unwrapping method. IEEE Transactions on Medical Imaging, 21(8):966–977, 2002.
[6] N. H. Ching, R. Rosenfeld, and M. Braun. Two-dimensional phase unwrapping using a minimum spanning tree algorithm. IEEE Transactions on Image Processing, 1(3):355–365, 1992.
[7] M. Cucuringu and H. Tyagi. Provably robust estimation of modulo 1 samples of a smooth function with applications to phase unwrapping. Journal of Machine Learning Research, 21(32):1–77, 2020.
[8] S. Foucart and H. Rauhut. A Mathematical Introduction to Compressive Sensing. Birkhäuser Basel, 2013.
[9] L. C. Graham. Synthetic interferometer radar for topographic mapping. Proceedings of the IEEE, 62(6):763–768, 1974.
[10] M. Hedley and D. Rosenfeld. A new two-dimensional phase unwrapping algorithm for MRI images. Magnetic Resonance in Medicine, 24(1):177–181, 1992.
[11] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, New York, NY, USA, 2nd edition, 2012.
[12] Y. Y. Hung. Shearography for non-destructive evaluation of composite structures. Optics and Lasers in Engineering, 24(2):161–182, 1996.
[13] K. Itoh. Analysis of the phase unwrapping algorithm. Applied Optics, 21(14):2470, 1982.
[14] H. Jiang. Non-asymptotic uniform rates of consistency for k-NN regression. In The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI), pages 3999–4006, 2019.
[15] W. Kester. MT-025 tutorial: ADC architectures VI: folding ADCs. Analog Devices, Technical report, 2009.
[16] P. Lauterbur. Image formation by induced local interactions: examples employing nuclear magnetic resonance. Nature, 242:190–191, 1973.
[17] H. Liu, M.-C. Yue, and A. Man-Cho So. On the estimation performance and convergence rate of the generalized power method for phase synchronization. SIAM Journal on Optimization, 27(4):2426–2446, 2017.
[18] J. L. Marroquin and M. Rivera. Quadratic regularization functionals for phase unwrapping. Journal of the Optical Society of America A, 12(11):2393–2400, 1995.
[19] O. Musa, P. Jung, and N. Goertz. Generalized approximate message passing for unlimited sampling of sparse signals. In GlobalSIP, pages 336–340, 2018.
[20] A. Nemirovski. Topics in non-parametric statistics. École d'Été de Probabilités de Saint-Flour, 28:85, 2000.
[21] D. Paoletti, G. S. Spagnolo, P. Zanetta, M. Facchini, and D. Albrecht. Manipulation of speckle fringes for non-destructive testing of defects in composites. Optics and Laser Technology, 26(2):99–104, 1994.
[22] C. Prati, M. Giani, and N. Leuratti. SAR interferometry: A 2-D phase unwrapping technique based on phase and absolute values informations. In IGARSS, pages 2043–2046, 1990.
[23] M. D. Pritt and J. S. Shipman. Least-squares two-dimensional phase unwrapping using FFTs. IEEE Transactions on Geoscience and Remote Sensing, 32(3):706–708, 1994.
[24] J. Rhee and Y. Joo. Wide dynamic range CMOS image sensor with pixel level ADC. Electronics Letters, 39(4):360–361, 2003.
[25] S. Rudresh, A. Adiga, B. A. Shenoy, and C. S. Seelamantula. Wavelet-based reconstruction for unlimited sampling. In ICASSP, pages 4584–4588, 2018.
[26] V. Shah and C. Hegde. Signal reconstruction from modulo observations, 2018.
[27] M. Takeda and T. Abe. Phase unwrapping by a maximum cross-amplitude spanning tree algorithm: a comparative study. Optical Engineering, 35(8), 1996.
[28] H. Tyagi. Error analysis for denoising smooth modulo signals on a graph. In preparation, 2020.
[29] L. Vandenberghe and S. Boyd. Semidefinite programming. SIAM Review, 38(1):49–95, 1996.
[30] T. Yamaguchi, H. Takehara, Y. Sunaga, M. Haruta, M. Motoyama, Y. Ohta, T. Noda, K. Sasagawa, T. Tokuda, and J. Ohta. Implantable self-reset CMOS image sensor and its application to hemodynamic response detection in living mouse brain. Japanese Journal of Applied Physics, 55(4S):04EM02, 2016.
[31] L. Ying. Phase Unwrapping. John Wiley and Sons, Inc., 2006.
[32] H. A. Zebker and R. M. Goldstein. Topographic mapping from interferometric synthetic aperture radar observations. Journal of Geophysical Research: Solid Earth, 91(B5):4993–4999, 1986.
[33] S. Zhang and Y. Huang. Complex quadratic optimization and semidefinite programming. SIAM Journal on Optimization, 16(3):871–890, 2006.
A Technical results
The following technical results are instrumental for proving our main results.
Fact 3 ([17]). Let $q \ge 1$, $z \in \mathbb{T}^n$ and $w \in \mathbb{C}^n$. Then it holds that
\[
\Big\| \frac{w}{|w|} - z \Big\|_q \le 2\,\|w - z\|_q.
\]
Fact 3 means that the distance between $z \in \mathbb{T}^n$ and $w \in \mathbb{C}^n$ after entrywise projection onto $\mathbb{T}^n$ can be upper bounded by the distance before projection, up to the scalar factor 2. It is proved in [17]. The following result relates the distance between two points $u, v \in \mathbb{T}$ to their arguments.

Fact 4.
Let $u = \exp(2\pi\iota f)$ and $v = \exp(2\pi\iota \hat{f})$, and for $0 \le \epsilon \le 2$ suppose that $|v - u| \le \epsilon$. Then we have $d_w(\hat{f} \bmod 1, f \bmod 1) \le \epsilon/4$. Proof.
The proof follows along the same lines as in [7]. We have
\[
|u - v| = |\exp(2\pi\iota f) - \exp(2\pi\iota \hat{f})| = |1 - \exp(2\pi\iota(\hat{f} \bmod 1 - f \bmod 1))| = 2\,|\sin(\pi(\hat{f} \bmod 1 - f \bmod 1))| = 2\sin(\pi\,|\hat{f} \bmod 1 - f \bmod 1|) = 2\sin\big(\pi(1 - |\hat{f} \bmod 1 - f \bmod 1|)\big) = 2\sin\big(\pi\, d_w(\hat{f} \bmod 1, f \bmod 1)\big),
\]
where the last equalities follow from $\hat{f} \bmod 1 - f \bmod 1 \in [-1,1]$ and $\sin(\pi x) = \sin(\pi(1-x))$, and where $d_w(\hat{f} \bmod 1, f \bmod 1) = \min\{|\hat{f} \bmod 1 - f \bmod 1|,\, 1 - |\hat{f} \bmod 1 - f \bmod 1|\}$. Hence $|u - v| \le \epsilon$ implies
\[
d_w(\hat{f} \bmod 1, f \bmod 1) \le \tfrac{1}{\pi} \arcsin(\epsilon/2).
\]
Notice that $\arcsin(x)$ is a convex function on $[0,1]$ with $\arcsin(0) = 0$ and $\arcsin(1) = \pi/2$. The upshot is that $\arcsin(x) \le \pi x/2$ for $x \in [0,1]$, so that
\[
d_w(\hat{f} \bmod 1, f \bmod 1) \le \epsilon/4.
\]

Theorem 7 (Bernstein inequality for bounded random variables [8]). Let $X_1, \dots, X_M$ be independent random variables with zero mean such that $|X_\ell| \le K$ almost surely for $\ell \in [M]$ and some constant $K > 0$. Furthermore, assume $\mathbb{E}|X_\ell|^2 \le \sigma_\ell^2$ for constants $\sigma_\ell > 0$, $\ell \in [M]$. Then, for all $t > 0$,
\[
\Pr\bigg( \Big| \sum_{\ell=1}^M X_\ell \Big| \ge t \bigg) \le 2\exp\bigg( -\frac{t^2/2}{\sigma^2 + K t/3} \bigg),
\]
where $\sigma^2 = \sum_{\ell=1}^M \sigma_\ell^2$.

B Optimality conditions of QCQP
Denote the objective of the QCQP by $F(g) = \lambda g^* L g - 2\,\mathrm{Re}(g^* z)$. Recall that the Riemannian gradient is
\[
\mathrm{grad}\, F(g) = \mathrm{proj}_g \nabla F(g) = 2\,\big\{ \lambda L g - z - \mathrm{Re}\big[\mathrm{diag}\big((\lambda L g - z) g^*\big)\big]\, g \big\},
\]
where the projection onto the tangent space at $g \in \mathbb{T}^n$ is defined in (3.11). The first order necessary optimality condition is then indeed $\mathrm{grad}\, F(\hat{g}) = 0$. The second order necessary condition involves the Hessian as follows:
\[
\langle \dot{g}, \mathrm{Hess}\, F(g)[\dot{g}] \rangle \ge 0 \quad \text{for all } \dot{g} \in T_g \mathbb{T}^n,
\]
where $\mathrm{Hess}\, F(g)[\dot{g}] = \mathrm{proj}_g \mathrm{D}\,\mathrm{grad}\, F(g)[\dot{g}]$, with $\mathrm{D}$ denoting the directional derivative.
We have