Error analysis for denoising smooth modulo signals on a graph
arXiv preprint [math.ST]
Hemant Tyagi ∗ [email protected] September 11, 2020
Abstract
In many applications, we are given access to noisy modulo samples of a smooth function, with the goal being to robustly unwrap the samples, i.e., to estimate the original samples of the function. In a recent work, Cucuringu and Tyagi [11] proposed denoising the modulo samples by first representing them on the unit complex circle, and then solving a smoothness-regularized least-squares problem – the smoothness measured w.r.t. the Laplacian of a suitable proximity graph G – on the product manifold of unit circles. This problem is a quadratically constrained quadratic program (QCQP) which is nonconvex, hence they proposed solving its sphere relaxation, leading to a trust-region subproblem (TRS). In terms of theoretical guarantees, ℓ_2 error bounds were derived for (TRS). These bounds are however weak in general and do not really demonstrate the denoising performed by (TRS).

In this work, we analyse the (TRS) as well as an unconstrained relaxation of (QCQP). For both these estimators we provide a refined analysis in the setting of Gaussian noise, and derive noise regimes where they provably denoise the modulo observations w.r.t. the ℓ_2 norm. The analysis is performed in a general setting where G is any connected graph.

1 Introduction

Many modern applications involve the acquisition of noisy modulo samples of a function f : R^d → R, i.e., we obtain

  y_i = (f(x_i) + η_i) mod ζ;  i = 1, ..., n,   (1.1)

for some ζ ∈ R_+, where η_i denotes noise. Here, a mod ζ lies in the interval [0, ζ) and is such that a = qζ + (a mod ζ) for an integer q. This situation arises, for instance, in self-reset analog-to-digital converters (ADCs), which handle voltage surges by simply storing the modulo value of the voltage signal [17, 26, 35]. In other words, if the voltage signal exceeds the range [0, ζ], then its value is simply reset via the modulo operation. Another important application is phase unwrapping, where one typically obtains noisy modulo 2π samples.
There, one usually seeks to infer the structure of an object by transmitting waveforms and measuring the difference in phase (in radians) between the transmitted and scattered waveforms. This is common in synthetic aperture radar interferometry (InSAR), where one aims to learn the elevation map of a terrain (see, e.g., [13, 36]), and also arises in many other domains such as MRI [15, 20] and diffraction tomography [25], to name a few.

∗ Inria, Univ. Lille, CNRS, UMR 8524 – Laboratoire Paul Painlevé, F-59000

Let us assume ζ = 1 from now on. Given (1.1), one can ask whether we can recover the original samples of f, i.e., f(x_i)? Clearly, this is only possible up to a global integer shift. Furthermore, answering this question requires making additional assumptions about f; for instance, we may assume f to be smooth (e.g., Lipschitz, continuously differentiable, etc.). In this setting, when ζ = 1, it is not difficult to see that we can exactly recover the samples of f when there is no noise (i.e., η_i = 0 in (1.1)) provided the sampling is fine enough. This is achieved by sequentially traversing the samples and reconstructing the quotient q_i ∈ Z corresponding to f(x_i) by setting it to q_{i−1} + a_{i−1,i}, where a_{i−1,i} equals either (a) 0 (if y_i is "close enough" to y_{i−1}), or (b) ±1 (if y_i is sufficiently smaller/larger than y_{i−1}). If f is smooth, then a fine enough sampling density will ensure that nearby samples of f have quotients which differ by either 0, 1 or −1. While exact recovery of f(x_i) is of course no longer possible in the presence of noise, one can show [12] that if η_i mod 1 is not too large (uniformly for all i), and if n is sufficiently large, then the estimates f̃(x_i) returned by the above procedure are such that (up to a global integer shift), for each i, |f̃(x_i) − f(x_i)| is bounded by a term proportional to the noise level.
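The sequential unwrapping procedure just described is easy to implement. The following is a minimal sketch (the function name and threshold parameter are our own choices); it assumes the sampling is fine enough that consecutive noiseless samples differ by less than the threshold:

```python
import numpy as np

def unwrap_mod1(y, threshold=0.5):
    """Sequentially unwrap samples y_i = f(x_i) mod 1 of a smooth f.

    Whenever consecutive wrapped samples jump by more than `threshold`,
    the integer quotient q_i is incremented/decremented accordingly.
    Recovery is only up to a global integer shift (here q_1 = 0).
    """
    y = np.asarray(y, dtype=float)
    q = np.zeros(len(y), dtype=int)
    for i in range(1, len(y)):
        d = y[i] - y[i - 1]
        if d < -threshold:        # f increased past an integer (wrap down)
            q[i] = q[i - 1] + 1
        elif d > threshold:       # f decreased past an integer (wrap up)
            q[i] = q[i - 1] - 1
        else:
            q[i] = q[i - 1]
    return q + y                  # estimates of f(x_i), up to an integer shift

# noiseless example: a Lipschitz function sampled on a fine uniform grid
x = np.linspace(0, 1, 200)
f = 3.0 * x ** 2                  # Lipschitz on [0, 1] with constant M = 6
y = np.mod(f, 1.0)
f_hat = unwrap_mod1(y)
shift = np.round(f_hat[0] - f[0])  # undo the global integer shift
assert np.allclose(f_hat - shift, f, atol=1e-12)
```

With noise, the same loop is run on the noisy wrapped samples; as discussed above, it then recovers the samples up to an error proportional to the noise level, provided the noise is uniformly small.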
This suggests a natural two-stage procedure for estimating f(x_i): first obtain denoised estimates of f(x_i) mod 1, and in the second stage, "unwrap" these denoised modulo samples to obtain the estimates f̃(x_i). This was the motivation behind the recent work of Cucuringu and Tyagi [10, 11], which focused primarily on the first (modulo denoising) stage – an interesting question by itself. Before discussing their result, it will be convenient to fix the notation used throughout the paper.

Notation.
Vectors and matrices are denoted by lower and upper case symbols respectively, while sets are denoted by calligraphic symbols (e.g., N), with the exception of [n] = {1, ..., n} for n ∈ N. The imaginary unit is denoted by ι = √−1. For a, b ≥ 0, we write a ≲ b when there exists an absolute constant C > 0 such that a ≤ Cb. If a ≲ b and a ≳ b, then we write a ≍ b. For u ∈ ℂ^n, we write u = Re(u) + ι Im(u), where Re(u), Im(u) ∈ R^n denote its real and imaginary parts respectively. The Hermitian conjugate of u is denoted by u*, and ‖u‖_p denotes the usual ℓ_p norm in ℂ^n for 1 ≤ p ≤ ∞.

Cucuringu and Tyagi [11, 10] considered the problem of denoising modulo samples by formulating it as an optimization problem on a manifold. Assume f : [0,1]^d → R, and suppose that the x_i's form a uniform grid in [0,1]^d. The main idea is to represent each sample y_i via z_i = exp(ι2πy_i), observing that if x_i is close to x_j, then exp(ι2πf(x_i)) ≈ exp(ι2πf(x_j)) holds provided f is smooth. In other words, the samples z = (z_1, ..., z_n) lie on the manifold

  𝒞^n := { u ∈ ℂ^n : |u_i| = 1; i = 1, ..., n },

which is the product manifold of unit-radius circles, i.e., 𝒞^n = 𝒞 × ··· × 𝒞. Hence, by constructing a proximity graph G = ([n], E) on the sampling points (x_i)_{i=1}^n – where there is an edge between i and j if x_i is within a specified distance of x_j – they proposed solving the quadratically constrained quadratic program

  min_{g ∈ 𝒞^n} ‖g − z‖_2² + γ g*Lg,   (QCQP)

where L is the Laplacian of G, and γ > 0 is a regularization parameter. Since 𝒞^n is a nonconvex set, it is not clear whether the global solution of (QCQP) can be obtained in polynomial time. Hence they proposed relaxing 𝒞^n to a sphere, leading to the trust-region subproblem

  min_{g ∈ ℂ^n : ‖g‖_2² = n} ‖g − z‖_2² + γ g*Lg  ≡  min_{g ∈ ℂ^n : ‖g‖_2² = n} −2 Re(g*z) + γ g*Lg.   (TRS)

Trust-region subproblems are well known to be solvable in polynomial time, with many efficient solvers available (e.g., [14, 1]). In [11, 10], (TRS) was shown to be quite robust to noise and scalable to large problem sizes through extensive experiments. On the theoretical front, an ℓ_2 stability result was shown for the solution ĝ of (TRS).
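As an aside on computation: such sphere-constrained problems can be solved with standard trust-region machinery. The following is a minimal sketch (our own function names, not code from the paper), using the classical dual characterization that the minimizer satisfies (I + γL − μI)g = z for a multiplier μ below the smallest eigenvalue of I + γL, and locating μ by bisection; it assumes the generic "easy case" where z is not orthogonal to the bottom eigenvector:

```python
import numpy as np

def path_laplacian(n):
    """Combinatorial Laplacian of the path graph P_n (dense, for clarity)."""
    L = np.zeros((n, n))
    for i in range(n - 1):
        L[i, i] += 1; L[i + 1, i + 1] += 1
        L[i, i + 1] -= 1; L[i + 1, i] -= 1
    return L

def solve_trs(L, z, gamma, tol=1e-13):
    """Solve min_{||g||^2 = n} ||g - z||^2 + gamma * g^* L g.

    Writes A = I + gamma*L; the stationarity condition is (A - mu*I) g = z
    with mu < lambda_min(A), and phi(mu) = ||g(mu)||^2 is increasing in mu,
    so the sphere constraint ||g||^2 = n is enforced by bisection on mu.
    """
    n = len(z)
    lam, Q = np.linalg.eigh(L)            # eigenvalues of L, ascending
    c = Q.T @ z                            # coefficients of z in the eigenbasis
    a = 1.0 + gamma * lam                  # eigenvalues of A = I + gamma*L
    phi = lambda mu: np.sum(np.abs(c) ** 2 / (a - mu) ** 2) - n
    lo, hi = a[0] - np.linalg.norm(z), a[0] - 1e-12
    while hi - lo > tol:                   # phi(lo) < 0 < phi(hi); bisect
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if phi(mid) < 0 else (lo, mid)
    mu = 0.5 * (lo + hi)
    return Q @ (c / (a - mu))

rng = np.random.default_rng(0)
n = 50
z = np.exp(2j * np.pi * rng.uniform(size=n))   # unit-modulus observations
g = solve_trs(path_laplacian(n), z, gamma=2.0)
assert abs(np.linalg.norm(g) ** 2 - n) < 1e-6  # the sphere constraint holds
```

For large n one would of course use sparse factorizations rather than a full eigendecomposition; the sketch above is only meant to make the estimator concrete.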
To describe this result, denote by h ∈ 𝒞^n the ground-truth samples, where h_i = exp(ι2πf(x_i)), and define the entrywise projection operator from ℂ^n to 𝒞^n [21] by

  (g/|g|)_i := g_i/|g_i| if g_i ≠ 0, and 1 if g_i = 0;  i = 1, ..., n.   (1.2)

Consider the univariate case with f : [0,1] → R, and assume that the (Gaussian) noise η_i ∼ N(0, σ²) is i.i.d. for i = 1, ..., n. Assuming f is M-Lipschitz and σ ≲ 1, denote by ∆ the maximum degree of G, with the x_i's forming a uniform grid on [0,1]. Then, for a choice γ∆ ≲ 1, the bounds in [11, Theorem 14] imply that with high probability, the solution ĝ of (TRS) satisfies

  ‖ĝ/|ĝ| − h‖_2² ≲ σ²n + γM²∆n.   (1.3)

On the other hand, one can show that ‖z − h‖_2² ≍ σ²n with high probability if σ ≲ 1, hence we cannot conclude that

  ‖ĝ/|ĝ| − h‖_2² ≪ ‖z − h‖_2².   (1.4)

This motivates the present paper, which seeks to identify conditions under which (1.4) holds. In fact, we will consider a more abstract problem setting where G is any connected graph, and h ∈ 𝒞^n is smooth w.r.t. G. This is formally described below, followed by a summary of our main results.

1.1 Problem setup

Consider an undirected, connected graph G = ([n], E), where E ⊆ {{i, j} : i ≠ j ∈ [n]} denotes its set of edges. Denote the maximum degree of G by △, and the (combinatorial) Laplacian matrix associated with G by L ∈ R^{n×n}. We will denote the eigenvalues of L by

  λ_n = 0 < λ_{n−1} = λ_min ≤ λ_{n−2} ≤ ··· ≤ λ_1,

where λ_min denotes the Fiedler value of G, well known to be a measure of the connectivity of G. The corresponding (unit ℓ_2 norm) eigenvectors of L will be denoted by q_j ∈ R^n; j = 1, ..., n, where q_n = (1/√n)(1, ..., 1)^T.

Let h = [h_1, ..., h_n]^T ∈ 𝒞^n be an unknown ground-truth signal, assumed to be smooth w.r.t. G in the sense that

  h*Lh = Σ_{{i,j}∈E} |h_i − h_j|² ≤ B_n.   (1.5)

Here, B_n will typically be "small", with the subscript n depicting possible dependence on n. We will assume that information about h is available in the form of noisy samples z ∈ 𝒞^n. Specifically, denoting by η_i ∼ N(0, σ²); i = 1, ..., n, i.i.d. Gaussian random variables, we are given

  z_i = h_i exp(ι2πη_i);  i = 1, ..., n.   (1.6)

(Footnote: The result in [11] bounds ‖ĝ − h‖_2, but we can then use ‖ĝ/|ĝ| − h‖_2 ≤ 2‖ĝ − h‖_2; see [21, Proposition 3.3]. Also, while the statement in [11, Theorem 14] has a more complicated form than that stated in (1.3), one can verify that it is essentially of the same order as in (1.3).)

Given the noisy samples z ∈ 𝒞^n and the graph G, we aim to answer the following two questions.

1.
Consider the unconstrained quadratic program (UCQP), obtained by relaxing the constraint g ∈ 𝒞^n in (QCQP) to g ∈ ℂ^n:

  min_{g ∈ ℂ^n} ‖g − z‖_2² + γ g*Lg.   (UCQP)

This is a convex program, also referred to as Tikhonov regularization in the literature (e.g., [31]), and its solution is given in closed form by ĝ = (I + γL)^{−1}z. Under what conditions can we ensure that ĝ satisfies (1.4)?

2. Under what conditions can we ensure that the solution ĝ of (TRS) satisfies (1.4)?

As we will see in Section 4, the solution of (TRS) is closely related to that of (UCQP), but requires a more careful analysis.

1.2 Main results

Example: denoising modulo samples of a function.
We will also seek to instantiate the above results for the setup in [11] described earlier. More precisely, denoting by f : [0,1] → R an M-Lipschitz function, i.e., |f(x) − f(y)| ≤ M|x − y| for all x, y ∈ [0,1], let x_i = (i−1)/(n−1) for i = 1, ..., n be a uniform grid. For each i, we are given

  y_i = (f(x_i) + η_i) mod 1;  η_i ∼ N(0, σ²),

where the η_i are i.i.d. Then, denoting z_i = exp(ι2πy_i) and h_i = exp(ι2πf(x_i)), we will consider G = ([n], E) to be the path graph (P_n), where E = {{i, i+1} : i = 1, ..., n−1}.

Before stating our results, let us define, for any λ ∈ [λ_min, λ_1], the set

  L_λ := { j ∈ [n−1] : λ_j < λ }   (1.7)

consisting of the indices corresponding to the "low-frequency" part of the spectrum of L. Moreover, while all our results are non-asymptotic, we suppress constants in this section for clarity.

Results for (UCQP). We begin by outlining our main results for the estimator (UCQP). Theorem 1 below identifies conditions on σ under which (UCQP) provably denoises the samples in expectation. The statement is a simplified version of Theorem 4.

Theorem 1.
For any given ε ∈ (0,1] and λ ∈ [λ_min, λ_1] satisfying 1 + |L_λ| ≲ εn, suppose that (1/(ελ))√(△B_n/n) ≲ σ ≲ 1. Then, for the choice γ ≍ (σ²n/(△B_nλ²))^{1/4}, the solution ĝ of (UCQP) satisfies

  E‖ĝ/|ĝ| − h‖_2² ≤ ε E‖z − h‖_2².

The parameter λ depends on the spectrum of the Laplacian and can be thought of as a "cut-off frequency". As can be seen from Theorem 1, we would ideally like λ to be "large" and |L_λ| to be "small"; in particular, |L_λ| = o(n). For example, when G is the complete graph (K_n), then λ_n = 0 and λ_i = n for i = 1, ..., n−1. Hence, the choice λ = n is ideal, as it leads to |L_λ| = 0. Furthermore, the noise regime in the Theorem involves a lower bound on σ, which might seem unnatural. We believe this is likely an artefact of the analysis involving the estimation error. Specifically, the error bound we obtain is of the form (see Lemma 2)

  E‖ĝ/|ĝ| − h‖_2² ≲ (σ/λ)√(△B_n n) + σ²(1 + |L_λ|).

The problematic term is the first one, which depends only linearly on σ. Indeed, since ‖z − h‖_2² ≍ σ²n w.h.p., we end up with the stated lower-bound requirement on σ. Provided (1/λ)√(△B_n/n) = o(1) as n increases, the requirement on σ is of the form o(1) ≲ σ ≲ 1, i.e., it weakens as n increases.

We also derive conditions under which denoising occurs with high probability (w.h.p.). Theorem 2 below is a simplified version of Theorem 6.

Theorem 2.
For any given ε ∈ (0,1] and λ ∈ [λ_min, λ_1], suppose that

  max{ (1/(ελ))√(△B_n/n), √(log n/(εn)) } ≲ σ ≲ 1  and  |L_λ| + √((1+|L_λ|) log n) ≲ εn.

If γ ≍ (σ²n/(△B_nλ²))^{1/4}, then w.h.p., the solution ĝ of (UCQP) satisfies ‖ĝ/|ĝ| − h‖_2² ≤ ε‖z − h‖_2².

The conditions in Theorem 2 are slightly more stringent compared to those in Theorem 1, due to the appearance of extra log factors. These arise from the concentration inequalities used in the analysis for bounding the estimation error; see Theorem 5.

In Section 3, we interpret our results for special graphs such as K_n, P_n and the star graph (S_n); see Corollaries 2, 3. In particular, the statement for P_n therein can be applied to the example from Section 1.2, readily yielding conditions that ensure denoising; see Corollary 4. It states that if log n/√n ≲ σ ≲ 1 and n ≳ 1, then for γ ≍ (σn^{5/3}/M)^{1/2}, the solution ĝ of (UCQP) satisfies (w.h.p.)

  ‖ĝ/|ĝ| − h‖_2² ≲ (σM + σ²) n^{2/3} + log n.

Hence, for any ε ∈ (0,1], if max{ M/(εn^{1/3}), √(log n/(εn)) } ≲ σ ≲ 1 and n ≳ (1/ε)³, then (UCQP) denoises z w.h.p.

Results for (TRS). We now describe conditions under which the estimator (TRS) provably denoises z ∈ 𝒞^n w.h.p. Theorem 3 below is a simplified version of Theorem 8.

Theorem 3.
For any ε ∈ (0,1], given k ∈ [n−1] such that λ_{n−k+1} < λ_{n−k}, and λ ∈ [λ_min, λ_1], with the choice γ ≍ (σ²n/(△B_nλ²))^{1/4}, suppose that the following conditions are satisfied.

(i) B_n ≲ min{ nλ_{n−k}, nλ }, and |L_λ| + √((1+|L_λ|) log n) ≲ εn.

(ii) σ ≲ min{ √ε, (λ/λ_{n−k+1})√(△B_n/n) } and

  σ ≳ max{ (log n/√n)^{1/2}, B_n/(nλ_{n−k}√ε), √(log n/(nε)), (1/(ελ))√(△B_n/n) + λ_{n−k+1}√(n/(△B_n)) }.

Then the (unique) solution ĝ ∈ ℂ^n of (TRS) satisfies ‖ĝ/|ĝ| − h‖_2² ≤ ε‖z − h‖_2² w.h.p. (Here, and in the sequel, w.h.p. means with probability approaching 1 as n → ∞.)

The above statement is admittedly more convoluted than that for (UCQP), which is primarily due to the more intricate nature of the estimation-error analysis. An interesting aspect of the above result is that we can consider the "best" choice of k (satisfying λ_{n−k+1} < λ_{n−k}), which leads to the mildest constraints on σ and B_n. Note that since G is connected (by assumption), k = 1 always satisfies this condition (0 = λ_n < λ_{n−1}). This leads to the following useful Corollary, which is a simplified version of Corollary 6.

Corollary 1.
For any ε ∈ (0,1] and λ ∈ [λ_min, λ_1], with the choice γ ≍ (σ²n/(△B_nλ²))^{1/4}, suppose that the following conditions are satisfied.

(i) B_n ≲ nλ_min and |L_λ| + √((1+|L_λ|) log n) ≲ εn.

(ii) max{ (log n/√n)^{1/2}, B_n/(nλ_min√ε), √(log n/n), (1/(ελ))√(△B_n/n) } ≲ σ ≲ √ε.

Then the (unique) solution ĝ ∈ ℂ^n of (TRS) satisfies ‖ĝ/|ĝ| − h‖_2² ≤ ε‖z − h‖_2² w.h.p.

It is natural to ask whether Corollary 1 is – for all practical purposes – sufficient, compared to Theorem 3. For the complete graph K_n, observe that the only valid choice is k = 1, hence we need Corollary 1. However, as we will see in Section 4, it turns out that for the path graph P_n, Corollary 1 leads to unreasonably strict conditions on σ and B_n (see Remark 4). In fact, for the path graph, λ_{n−k+1} < λ_{n−k} holds for each k ∈ [n−1], and the flexibility in choosing k in Theorem 3 is key to obtaining satisfactory conditions; see Corollary 8.

In the setting of the example from Section 1.2, Corollary 8 roughly states that for any ε ∈ (0,1], if (log n/√n)^{1/2} ≲ σ ≲ √ε, with n large enough and a suitably chosen γ, then (TRS) denoises z w.h.p. This condition on σ is visibly stricter than that for (UCQP); we believe that this is likely an artefact of the analysis. Nevertheless, for this example, we see for ε fixed and n → ∞ that both (TRS) and (UCQP) provably denoise z w.h.p. in the noise regime o(1) ≲ σ ≲ 1.

Outline of the paper.
The rest of the paper is organized as follows. Section 2 introduces some preliminaries: certain intermediate facts and technical results that will be needed in our analysis. Section 3 contains the analysis for (UCQP), while Section 4 analyzes (TRS). We conclude with Section 5, which contains a discussion of related work from the literature, along with directions for future work.
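Before moving on, note that the estimator (UCQP) analysed below is computationally trivial: a single linear solve followed by the entrywise projection (1.2). The following minimal sketch (our own function name and parameter choices, dense linear algebra for clarity) makes it concrete:

```python
import numpy as np

def ucqp_denoise(L, z, gamma):
    """(UCQP) estimate g = (I + gamma*L)^{-1} z, followed by the
    entrywise projection (1.2) back onto the unit circles."""
    n = len(z)
    g = np.linalg.solve(np.eye(n) + gamma * L, z)
    mod = np.abs(g)
    g = np.where(mod > 0, g, 1.0)          # (1.2): zero entries map to 1
    return g / np.where(mod > 0, mod, 1.0)

# toy check on the path graph P_n with a constant (perfectly smooth) signal
rng = np.random.default_rng(1)
n = 100
L = (np.diag(np.r_[1, 2 * np.ones(n - 2), 1])
     - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1))
h = np.ones(n, dtype=complex)               # h*Lh = 0, i.e. B_n = 0
z = h * np.exp(2j * np.pi * 0.1 * rng.standard_normal(n))
g = ucqp_denoise(L, z, gamma=50.0)
assert np.linalg.norm(g - h) < np.linalg.norm(z - h)   # closer after denoising
```

The choice γ = 50 here is illustrative only; the theoretically tuned choice of γ is derived in Section 3.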
2 Preliminaries

This section summarizes some useful technical tools that will be employed at multiple points in our analysis. We begin by deriving some simple consequences of the smoothness assumption in (1.5).
Smooth phase signals.
Similar to (1.7), for λ ∈ [λ_min, λ_1], let us define the set

  H_λ := { j ∈ [n−1] : λ_j ≥ λ } = [n−1] \ L_λ.

Then, using (1.5) and the fact that ‖h‖_2² = n, it is not difficult to establish that

  Σ_{j∈H_λ} |⟨h, q_j⟩|² ≤ B_n/λ,   Σ_{j∈L_λ∪{n}} |⟨h, q_j⟩|² ≥ n − B_n/λ.   (2.1)

Hence, the smaller B_n is, the more correlated h is with the eigenvectors corresponding to {n} ∪ L_λ. It is also useful to note that since

  |(Lh)_i|² = | Σ_{j : {i,j}∈E} (h_i − h_j) |² ≤ △ Σ_{j : {i,j}∈E} |h_i − h_j|²,

we have ‖Lh‖_2² ≤ 2△ h*Lh ≤ 2△B_n, which is equivalent to

  Σ_{j=1}^{n−1} λ_j² |⟨h, q_j⟩|² ≤ 2△B_n.   (2.2)

Finally, recall the setup in Section 1.2, where we obtain noisy modulo samples of a smooth function f on a uniform grid (with G = P_n). In this setting, we can relate B_n to the quadratic variation of f on the grid. Indeed, using the fact |h_i − h_{i+1}| ≤ 2π|f(x_i) − f(x_{i+1})| (see the proof of [11, Lemma 8]), it follows that

  Σ_{i=1}^{n−1} |h_i − h_{i+1}|² ≤ 4π² Σ_{i=1}^{n−1} |f(x_i) − f(x_{i+1})|².   (2.3)

Noise model.
Next, we collect some useful results related to the random noise model in (1.6). The following Proposition is easy to verify; its proof is provided in the appendix for completeness.
Proposition 1.
Denote z = [z_1, ..., z_n]^T ∈ 𝒞^n with z_i as defined in (1.6). Let u ∈ ℂ^n be given with ‖u‖_2 = 1. Then the following are true.

(i) E[z] = e^{−2π²σ²} h.
(ii) E[ |⟨z − e^{−2π²σ²}h, u⟩|² ] = 1 − e^{−4π²σ²}.
(iii) E[ |⟨z, u⟩|² ] = e^{−4π²σ²} |⟨h, u⟩|² + (1 − e^{−4π²σ²}).
(iv) E[ ‖z − e^{−2π²σ²}h‖_2² ] = n(1 − e^{−4π²σ²}).
(v) E[ ‖z − h‖_2² ] = 2n(1 − e^{−2π²σ²}).

In particular, if σ ≤ 1/(π√2), then the expected distance of z from h can be bounded as

  2π²σ²n ≤ E[‖z − h‖_2²] ≤ 4π²σ²n.   (2.4)

Proof.
See Appendix A.

We will also require several concentration bounds in order to derive high-probability error bounds later on in our analysis.

Proposition 2. Denote z = [z_1, ..., z_n]^T ∈ 𝒞^n with z_i as defined in (1.6). For any given U = [u_1 ··· u_k] ∈ R^{n×k} with orthonormal columns, the following hold.

(i) With probability at least 1 − O(n⁻²),

  z*UU^Tz − k(1−e^{−4π²σ²}) − e^{−4π²σ²} h*UU^Th ≥ −[ 32√(k(1−e^{−4π²σ²}) log n) + 11 log n ( ‖UU^Th‖_∞ + √(1−e^{−4π²σ²}) ‖UU^Th‖_2 ) ].

(ii) With probability at least 1 − O(n⁻²),

  (z − e^{−2π²σ²}h)*UU^T(z − e^{−2π²σ²}h) ≤ k(1−e^{−4π²σ²}) + 4096 log n + 32√(k(1−e^{−4π²σ²}) log n).

(iii) With probability at least 1 − O(n⁻²),

  ‖z − h‖_2² − 2n(1−e^{−2π²σ²}) ≤ 3 log n ( 2 + √(4n(1−e^{−2π²σ²})) ).

(iv) With probability at least 1 − O(n⁻²),

  ‖z − e^{−2π²σ²}h‖_2² − n(1−e^{−4π²σ²}) ≤ 3 log n ( 2 + √(2n(1−e^{−4π²σ²})) ).

In all the above inequalities, the event obtained by reversing both the inequality and the sign of the RHS term also holds with the stated probability of success.

Proof.
See Appendix B.
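Identities such as those in Proposition 1 are easy to check by simulation. The following quick Monte Carlo sketch (our own, not from the paper) verifies parts (i) and (v) numerically:

```python
import numpy as np

# Monte Carlo check of Proposition 1: with z_i = h_i * exp(i*2*pi*eta_i),
# eta_i ~ N(0, sigma^2) i.i.d., we expect
#   (i)  E[z]             = exp(-2*pi^2*sigma^2) * h,
#   (v)  E||z - h||_2^2   = 2 n (1 - exp(-2*pi^2*sigma^2)).
rng = np.random.default_rng(0)
n, sigma, trials = 20, 0.1, 100_000
h = np.exp(2j * np.pi * rng.uniform(size=n))    # arbitrary unit-modulus signal
eta = sigma * rng.standard_normal((trials, n))
z = h * np.exp(2j * np.pi * eta)

damp = np.exp(-2 * np.pi ** 2 * sigma ** 2)     # the "damping" factor
assert np.allclose(z.mean(axis=0), damp * h, atol=1e-2)    # Prop. 1(i)
emp = np.mean(np.sum(np.abs(z - h) ** 2, axis=1))
assert abs(emp - 2 * n * (1 - damp)) < 5e-2                # Prop. 1(v)
```

The tolerances are loose because they only need to absorb the Monte Carlo error for the stated number of trials.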
Entrywise projection on 𝒞^n. Lastly, we will make extensive use of the following useful result from [21, Proposition 3.3] concerning the projection operator in (1.2).
Proposition 3 ([21]). For any q ∈ [1, ∞], w ∈ ℂ^n and g ∈ 𝒞^n, we have

  ‖ w/|w| − g ‖_q ≤ 2 ‖ w − g ‖_q.

It is especially important to note that, for any number t > 0, we have w/|w| = tw/|tw|. Hence, from the above Proposition, it follows that ‖w/|w| − g‖_q ≤ 2‖tw − g‖_q for any t > 0. While it might be difficult to choose the optimal scale t > 0, we will see below that suitable choices of t can improve upon the default t = 1.

3 Analysis of (UCQP)

We now analyse the quality of the solution ĝ = (I + γL)^{−1}z of (UCQP). To begin with, we have the following Lemma, which bounds the error between ĝ/|ĝ| and h for any z ∈ 𝒞^n.

Lemma 1. For any z ∈ 𝒞^n and λ ∈ [λ_min, λ_1], it holds that

  ‖ ĝ/|ĝ| − h ‖_2² ≤ 8(E_1 + E_2), where

  E_1 = |⟨q_n, e^{2π²σ²}z − h⟩|² + (1/(1+γλ_min)²) Σ_{j∈L_λ} |⟨q_j, e^{2π²σ²}z − h⟩|² + ‖e^{2π²σ²}z − h‖_2²/(1+γλ)²,

  E_2 = 2△γ²B_n/(1+γλ_min)².

Proof.
From the definition in (1.2), observe that ĝ/|ĝ| = e^{2π²σ²}ĝ / |e^{2π²σ²}ĝ|, where we recall that ĝ = (I + γL)^{−1}z. We can write

  e^{2π²σ²}ĝ − h = (I + γL)^{−1}(e^{2π²σ²}z − h) [=: e_1] + (I + γL)^{−1}h − h [=: e_2],

which implies ‖e^{2π²σ²}ĝ − h‖_2 ≤ ‖e_1‖_2 + ‖e_2‖_2. Using Proposition 3 (with the scale t = e^{2π²σ²}), we then obtain

  ‖ ĝ/|ĝ| − h ‖_2 = ‖ e^{2π²σ²}ĝ/|e^{2π²σ²}ĝ| − h ‖_2 ≤ 2‖e^{2π²σ²}ĝ − h‖_2 ≤ 2(‖e_1‖_2 + ‖e_2‖_2),

and hence ‖ĝ/|ĝ| − h‖_2² ≤ 8(‖e_1‖_2² + ‖e_2‖_2²).

Bounding ‖e_2‖_2. Using h = Σ_{j=1}^n ⟨q_j, h⟩ q_j, we can write e_2 as

  e_2 = (I + γL)^{−1}h − h = Σ_{j=1}^n (⟨q_j, h⟩/(1+γλ_j)) q_j − Σ_{j=1}^n ⟨q_j, h⟩ q_j = −Σ_{j=1}^{n−1} (γλ_j/(1+γλ_j)) ⟨q_j, h⟩ q_j.

This leads to the bound (using orthonormality of the q_j's)

  ‖e_2‖_2² = Σ_{j=1}^{n−1} (γ²λ_j²/(1+γλ_j)²) |⟨h, q_j⟩|² ≤ (γ²/(1+γλ_min)²) Σ_{j=1}^{n−1} λ_j² |⟨h, q_j⟩|² ≤ 2△γ²B_n/(1+γλ_min)²,

where the last inequality uses (2.2).

Bounding ‖e_1‖_2. Writing v := e^{2π²σ²}z − h for brevity, we have

  e_1 = (I + γL)^{−1}v = ⟨q_n, v⟩ q_n + Σ_{j=1}^{n−1} (⟨q_j, v⟩/(1+γλ_j)) q_j,

and therefore

  ‖e_1‖_2² = |⟨q_n, v⟩|² + Σ_{j=1}^{n−1} |⟨q_j, v⟩|²/(1+γλ_j)²
  ≤ |⟨q_n, v⟩|² + (1/(1+γλ_min)²) Σ_{j∈L_λ} |⟨q_j, v⟩|² + (1/(1+γλ)²) Σ_{j∈H_λ} |⟨q_j, v⟩|²
  = |⟨q_n, v⟩|² (1 − 1/(1+γλ)²) + ( 1/(1+γλ_min)² − 1/(1+γλ)² ) Σ_{j∈L_λ} |⟨q_j, v⟩|² + ‖v‖_2²/(1+γλ)²
  ≤ |⟨q_n, v⟩|² + (1/(1+γλ_min)²) Σ_{j∈L_λ} |⟨q_j, v⟩|² + ‖v‖_2²/(1+γλ)²,

where, in the penultimate step, we used the identity

  Σ_{j∈H_λ} |⟨q_j, v⟩|² = ‖v‖_2² − Σ_{j∈L_λ∪{n}} |⟨q_j, v⟩|².

The quantity λ depends on the graph G and, as we will see shortly, we would like to choose λ such that (a) |L_λ| is small, and (b) Σ_{j∈L_λ∪{n}} |⟨h, q_j⟩|² ≈ ‖h‖_2² = n. If z ∈ 𝒞^n is generated randomly as in (1.6), we easily obtain from Lemma 1 a bound on the expected squared ℓ_2 error.

Lemma 2.
Let z ∈ 𝒞^n be generated as in (1.6), and assume σ ≤ 1/(2π). Then, for any given λ ∈ [λ_min, λ_1], it holds that

  E‖ĝ/|ĝ| − h‖_2² ≤ 16△γ²B_n/(1+γλ_min)² + 64π²σ² ( 1 + |L_λ|/(1+γλ_min)² + n/(1+γλ)² ).   (3.1)

In particular, if γ = (4π²σ²n/(△B_nλ²))^{1/4}, we obtain the simplified bound

  E‖ĝ/|ĝ| − h‖_2² ≤ 64π ( (σ/λ)√(△B_n n) + πσ²(1 + |L_λ|) ).   (3.2)

Proof.
By taking the expectation of both sides of the inequality in Lemma 1, and using Proposition 1, we obtain

  E‖ĝ/|ĝ| − h‖_2² ≤ 16△γ²B_n/(1+γλ_min)² + 8(e^{4π²σ²} − 1) ( 1 + |L_λ|/(1+γλ_min)² + n/(1+γλ)² ).

Using e^x − 1 ≤ 2x for any x ∈ [0,1], we have e^{4π²σ²} − 1 ≤ 8π²σ² when σ ≤ 1/(2π), which yields (3.1). Next, use 1 + γλ_min ≥ 1 and 1 + γλ ≥ γλ in (3.1), and observe that the resulting bound is a convex function of γ, minimized for the stated value of γ. Plugging this choice of γ into (3.1) then yields (3.2).

As can be seen from (3.2), there is a trade-off in the choice of λ. We are now ready to state our first main theorem, which establishes conditions under which (UCQP) provably denoises the input z ∈ 𝒞^n. It follows easily from Lemma 2 and Proposition 1(v).

Theorem 4.
Let z ∈ 𝒞^n be generated as in (1.6), and let ε ∈ (0,1]. For any given λ ∈ [λ_min, λ_1] satisfying 1 + |L_λ| ≤ εn/64, suppose that the noise level σ satisfies

  (64/(πελ))√(△B_n/n) ≤ σ ≤ 1/(2π).

Then, for the choice γ = (4π²σ²n/(△B_nλ²))^{1/4}, the solution ĝ of (UCQP) satisfies

  E‖ĝ/|ĝ| − h‖_2² ≤ ε E‖z − h‖_2².   (3.3)

Proof.
Recall from (2.4) that if σ ≤ 1/(π√2), then E‖z − h‖_2² ≥ 2π²σ²n holds. Then, using (3.2) from Lemma 2, we see that if σ ≤ 1/(2π), then for any ε ∈ (0,1], (3.3) is ensured provided

  64π ( (σ/λ)√(△B_n n) + πσ²(1 + |L_λ|) ) ≤ 2π²εσ²n.   (3.4)

Finally, (3.4) is clearly ensured provided each term on its LHS is at most π²εσ²n, i.e., provided

  1 + |L_λ| ≤ εn/64  and  (64/(πελ))√(△B_n/n) ≤ σ,

which completes the proof.

The following remarks are in order.

(i) The condition 1 + |L_λ| ≲ εn in Theorem 4 implies that we require |L_λ| = o(n) as n increases, for fixed ε. In other words, λ should not be chosen too large.

(ii) The Theorem requires that the noise level lie in the regime (1/(ελ))√(△B_n/n) ≲ σ ≲ 1. Nevertheless, if (1/λ)√(△B_n/n) = o(1) as n → ∞, then the condition on σ weakens considerably as n increases.

It is interesting to translate the conditions of Theorem 4 for special choices of G, namely K_n (complete graph), S_n (star graph) and P_n (path graph). This leads to the following Corollary.

Corollary 2.
Let z ∈ 𝒞^n be generated as in (1.6), and let ε ∈ (0,1].

(i) (G = K_n) Suppose n ≳ 1/ε and √B_n/(εn) ≲ σ ≲ 1. If γ ≍ (σ²/(n²B_n))^{1/4}, then the solution ĝ of (UCQP) satisfies (3.3).

(ii) (G = S_n) Suppose n ≳ 1/ε and √B_n/ε ≲ σ ≲ 1. If γ ≍ (σ²/B_n)^{1/4}, then the solution ĝ of (UCQP) satisfies (3.3).

(iii) (G = P_n) For a given θ ∈ [0,1), suppose n ≳ (1/ε)^{1/(1−θ)} and (n^{2−2θ}/ε)√(B_n/n) ≲ σ ≲ 1. If γ ≍ (σ²n^{5−4θ}/B_n)^{1/4}, then the solution ĝ of (UCQP) satisfies (3.3).

Proof. The spectra of L for K_n, S_n and P_n are well known; see, e.g., [8, Chapter 1].

(i) Here, △ = n−1 and λ_{n−1} = ··· = λ_1 = n. Choose λ = n, so that L_λ = ∅.

(ii) Here, △ = n−1, λ_{n−1} = ··· = λ_2 = 1 and λ_1 = n. Choose λ = 1, so that L_λ = ∅.

(iii) Here, λ_j = 4 sin²( π(n−j)/(2n) ) for j = 1, ..., n; hence λ_1 ≍ 1 and λ_min ≍ sin²(π/(2n)) ≍ n^{−2}. Choosing λ ≍ n^{−2(1−θ)} for θ ∈ [0,1) gives |L_λ| ≲ n^θ, and so 1 + |L_λ| ≲ εn provided n ≳ (1/ε)^{1/(1−θ)}. The conditions on σ and γ follow easily using △ = 2.

Remark 1.
For the graphs in Corollary 2, we can deduce conditions on the smoothness term B_n which lead to a non-vacuous regime for σ, of the form o(1) ≲ σ ≲ 1. These are detailed below for n → ∞.

1. (G = K_n) If ε is fixed, then B_n = o(n²) suffices. On the other hand, if ε = ε_n → 0 (as n → ∞) and ε_n = ω(1/n), then B_n = o(ε_n²n²) is sufficient.

2. (G = S_n) For ε fixed, it suffices that B_n = o(1), while if ε_n → 0 and ε_n = ω(1/n), then B_n = o(ε_n²) is sufficient.

3. (G = P_n) For fixed ε, it is sufficient that B_n = o(n^{4θ−3}), while the condition B_n = o(ε_n²n^{4θ−3}) suffices if ε_n → 0 and ε_n = ω(n^{−(1−θ)}).

We now proceed to derive high-probability bounds on the estimation error when z ∈ 𝒞^n is generated randomly as in (1.6). We begin with a simplification of some of the bounds stated in Proposition 2 that we will need in our analysis.

Lemma 3.
For any given λ ∈ [λ_min, λ_1], let U denote the n × (1+|L_λ|) matrix whose columns are the eigenvectors q_j for j ∈ L_λ ∪ {n}. Assuming z ∈ 𝒞^n is generated randomly as in (1.6), denote z̄ = E[z] = e^{−2π²σ²}h. Then the following hold.

(i) If σ ≤ 1/(2√2 π), then

  P( (z − z̄)*UU^T(z − z̄) ≤ 6190σ( σ(1+|L_λ|) + √((1+|L_λ|) log n) ) + 4096 log n ) ≥ 1 − O(n⁻²).

(ii) If 72 log n/(π√n) ≤ σ ≤ 1/(2√2 π), then P( ‖z − z̄‖_2² ≤ 5π²σ²n ) ≥ 1 − O(n⁻²).

(iii) If 72 log n/(π√n) ≤ σ ≤ 1/(2√2 π), then P( ‖z − h‖_2² ≥ π²σ²n ) ≥ 1 − O(n⁻²).

Proof. Recall that for x ∈ [0,1], x/2 ≤ 1 − e^{−x} ≤ x. Therefore, if 8π²σ² ≤ 1, then

  1 − e^{−8π²σ²} ∈ [4π²σ², 8π²σ²],  1 − e^{−4π²σ²} ∈ [2π²σ², 4π²σ²],  1 − e^{−2π²σ²} ∈ [π²σ², 2π²σ²].   (3.5)

Proof of (i). Using (3.5) in Proposition 2(ii) with k = 1 + |L_λ| readily yields the statement.

Proof of (ii). Using (3.5) in Proposition 2(iv), we obtain, w.p. at least 1 − O(n⁻²), the bound

  ‖z − z̄‖_2² ≤ 4π²σ²n + 3 log n ( 2 + √(8π²σ²n) ).   (3.6)

We now simplify the RHS of (3.6) by noting that, if σ ≥ 1/(π√n) and n ≥ 2, then

  2 + √(8π²σ²n) ≤ 2πσ√n + 2√2 πσ√n ≤ 24πσ√n.   (3.7)

Applying (3.7) in (3.6), we obtain

  ‖z − z̄‖_2² ≤ 4π²σ²n + 72πσ√n log n ≤ 5π²σ²n,

provided σ ≥ 72 log n/(π√n).

Proof of (iii). Using (3.5) in Proposition 2(iii), we obtain, w.p. at least 1 − O(n⁻²), the bound

  ‖z − h‖_2² ≥ 2π²σ²n − 3 log n ( 2 + √(8π²σ²n) ) ≥ 2π²σ²n − 72πσ√n log n,

provided σ ≥ 1/(π√n) and n ≥ 2. The condition σ ≥ 72 log n/(π√n) then implies ‖z − h‖_2² ≥ π²σ²n.

We will now use Lemma 3 and Lemma 1 to obtain a high-probability error bound on ‖ĝ/|ĝ| − h‖_2.

Theorem 5.
Let z ∈ 𝒞^n be generated as in (1.6), and assume that 72 log n/(π√n) ≤ σ ≤ 1/(2√2 π). For any given λ ∈ [λ_min, λ_1], if γ = (4π²σ²n/(△B_nλ²))^{1/4}, then, with probability at least 1 − O(n⁻²), the solution ĝ ∈ ℂ^n of (UCQP) satisfies

  ‖ĝ/|ĝ| − h‖_2² ≤ 72π(σ/λ)√(△B_n n) + 99040σ( σ(1+|L_λ|) + √((1+|L_λ|) log n) ) + 65536 log n.   (3.8)

Proof.
We first simplify the bound in Lemma 1 using 1 + γλ_min ≥ 1 and 1 + γλ ≥ γλ. Denoting z̄ = e^{−2π²σ²}h (so that e^{2π²σ²}z − h = e^{2π²σ²}(z − z̄)), this yields

  ‖ĝ/|ĝ| − h‖_2² ≤ 16γ²△B_n + 8e^{4π²σ²} Σ_{j∈L_λ∪{n}} |⟨q_j, z − z̄⟩|² + (8e^{4π²σ²}/(γλ)²) ‖z − z̄‖_2².   (3.9)

Applying the bounds in (i), (ii) of Lemma 3 in (3.9), and observing that e^{4π²σ²} ≤ 2 when σ ≤ 1/(2√2 π), we obtain, with probability at least 1 − O(n⁻²),

  ‖ĝ/|ĝ| − h‖_2² ≤ 16γ²△B_n + 99040σ( σ(1+|L_λ|) + √((1+|L_λ|) log n) ) + 65536 log n + 80π²σ²n/(γλ)².   (3.10)

Plugging the stated choice of γ into (3.10) leads to (3.8).

By combining Lemma 3(iii) with Theorem 5, we obtain our main result, which establishes conditions under which (UCQP) provably denoises z with high probability.

Theorem 6.
Let z ∈ C n be generated as in (1.6) , and ε ∈ (0 , . For any given λ ∈ [ λ min , λ ] suppose that max ( ελ r △ B n n ,
142 log n √ εn ) ≤ σ ≤ √ π and (cid:12)(cid:12) L λ (cid:12)(cid:12) + q (1 + (cid:12)(cid:12) L λ (cid:12)(cid:12) ) log n ≤ εn . If γ = ( π σ n △ B n λ ) / , then with probability at least − n , the solution b g of (UCQP) satisfies (cid:13)(cid:13)(cid:13)(cid:13) b g | b g | − h (cid:13)(cid:13)(cid:13)(cid:13) ≤ ε k z − h k . (3.11) Proof.
Recall from Lemma 3(iii) that if 72 log n/(π√n) ≤ σ ≤ √π then ‖z − h‖ ≥ π σ n holds with high probability. Using (3.8) from Theorem 5, we hence note that (3.11) is ensured if

72 π σ √(△ B_n n / λ) + 99040 σ ( |L_λ| + √((1 + |L_λ|) log n) ) + 65536 log n ≤ ε π σ n   (3.12)

which in turn is ensured provided each term on the LHS of (3.12) is less than or equal to ε π σ n. Combining the resulting three conditions with the requirement 72 log n/(π√n) ≤ σ ≤ √π, one can check that the stated conditions in the theorem suffice.

As in Corollary 2, we can translate the conditions of Theorem 6 for the special cases G = K_n, S_n or P_n. The proof is similar to that of Corollary 2 and hence omitted.

Corollary 3.
Let z ∈ C n be generated as in (1.6) , and ε ∈ (0 , . (i) ( G = K n ) Suppose n √ log n & ε and max n nε √ B n , log n √ εn o . σ . . If γ ≍ ( σ n B n ) / , then thesolution b g of (UCQP) satisfies (3.11) w.h.p. (ii) ( G = S n ) Suppose n √ log n & ε and max n ε √ B n , log n √ εn o . σ . . If γ ≍ ( σ B n ) / , then thesolution b g of (UCQP) satisfies (3.11) w.h.p. (iii) ( G = P n ) For a given θ ∈ [0 , , suppose n θ + p n θ log n . εn and max (cid:26) n − θ ε √ B n , log n √ εn (cid:27) . σ . . If γ ≍ ( σ n − θ B n ) / , then the solution b g of (UCQP) satisfies (3.11) w.h.p. Remark 2.
As in Remark 1, we can deduce conditions on the smoothness term B_n which lead to a non-vacuous regime for σ of the form o(1) ≤ σ ≲ 1. These are detailed below for n → ∞.

1. (G = K_n) If ε is fixed then B_n = o(n) suffices, while if ε = ε_n → 0 and ε_n = ω(log n/√n), then B_n = o(ε_n n) is sufficient.

2. (G = S_n) If ε is fixed then B_n = o(1) suffices, while if ε_n → 0 and ε_n = ω(log n/√n), then B_n = o(ε_n) is sufficient.

3. (G = P_n) For fixed ε, it is sufficient that B_n = o(n^{θ−}). On the other hand, if ε_n → 0 and ε_n = ω(max{log n/√n, (n^θ + √(n^θ log n))/n}) then the condition B_n = o(ε_n n^{θ−}) suffices.

Denoising modulo samples of a function.
We conclude this section by instantiating the aforementioned bounds in the setting described in Section 1.2. Recall that here, we obtain noisy modulo 1 samples of a smooth function f : [0, 1] → ℝ with G = P_n. This leads to the following corollary of Theorems 5 and 6.

Corollary 4.
Consider the example from Section 1.2 where we obtain noisy modulo 1 samples of a M -Lipschitz function f : [0 , → R . If γ ≍ (cid:16) σ n / M (cid:17) / then the following is true for the solution b g of (UCQP) .1. If log n √ n . σ . and n & then w.h.p, (cid:13)(cid:13)(cid:13)(cid:13) b g | b g | − h (cid:13)(cid:13)(cid:13)(cid:13) . ( σM + σ ) n / + log n.
2. For ε ∈ (0 , , if n & (1 /ε ) and max n Mεn / , log n √ εn o . σ . , then b g satisfies (3.11) w.h.p.Proof. Starting with (2.3), we have n − X i =1 | h i − h i +1 | = 4 π n − X i =1 | f ( x i ) − f ( x i +1 ) | ≤ π M n − ≤ π M n (= B n ) (3.13)where the penultimate inequality uses the Lipschitz continuity of f , and the last inequality uses n − ≥ n/ λ ≍ n − θ ) for θ ∈ [0 , (cid:12)(cid:12) L λ (cid:12)(cid:12) . n θ . Applying this along with (3.13) to Theorem 5, we obtain the bound (cid:13)(cid:13)(cid:13)(cid:13) b g | b g | − h (cid:13)(cid:13)(cid:13)(cid:13) . σM n − θ ) + σ ( n θ + p n θ log n ) + log n. Notice that we need 1 / < θ < θ = 2 / n , and if moreover n &
1, then we obtain the statement of the first part.The second part follows easily by plugging (3.13) along with θ = 2 / Remark 3.
We can see that the estimation error bound for (UCQP) is significantly better than the one in (1.3) derived by Cucuringu and Tyagi [11] for (TRS). For a constant σ, our ℓ₂ error bound scales as O(n^{1/4}) for large enough n; however, this is not optimal. In a work [12] parallel to the present paper, it is shown that for the complex valued function h(x) = exp(ι2πf(x)), for any given x ∈ [0, 1], one can obtain an estimate ĥ(x) such that E[|ĥ(x) − h(x)|] = O(n^{−1/3}), which is also consistent with the pointwise minimax rate for estimating a Lipschitz function [23]. This suggests that for the example from Section 1.2, the best ℓ₂ error bound we can hope for is O(n^{1/6}). It will be interesting to see whether the analysis for (UCQP) can be improved to an extent such that instantiating the result for G = P_n achieves this optimal bound; we discuss this further in Section 5.4; see also Remark 5 in Section 5.2.

(TRS)

We now proceed to derive an ℓ₂ error bound for the solution ĝ of (TRS). Following the notation in [11], we can represent any x ∈ C_n via x̄ ∈ ℝ^{2n}, where x̄ = [Re(x)^T Im(x)^T]^T. Moreover, consider the matrix

H = diag(γL, γL) = γ (I₂ ⊗ L) ∈ ℝ^{2n×2n}

where ⊗ denotes the Kronecker product. Then it is easy to check that (TRS) is equivalent to

min_{ḡ ∈ ℝ^{2n} : ‖ḡ‖² = n}  ḡ^T H ḡ − 2 ḡ^T z̄   (TRS_R)

where ĝ is a solution of (TRS) iff [Re(ĝ)^T Im(ĝ)^T]^T is a solution of (TRS_R).

The following lemma from [11], which in turn is a direct consequence of [32, Lemma 2.4, 2.8] (see also [14, Lemma 1]), characterizes any solution of (TRS_R).

Lemma 4 ([11]). For any given z̄ ∈ ℝ^{2n}, ḡ is a solution to (TRS_R) iff ‖ḡ‖² = n and there exists µ⋆ such that (a) 2H + µ⋆I ⪰ 0 and (b) (2H + µ⋆I)ḡ = 2z̄. Moreover, if 2H + µ⋆I ≻ 0, then the solution is unique.
Due to the equivalence of (TRS) and (TRS_R) discussed above, this readily leads to the following characterization of any solution ĝ of (TRS).

Lemma 5. For any given z ∈ C_n, ĝ is a solution to (TRS) iff ‖ĝ‖² = n and there exists µ⋆ such that (a) 2γL + µ⋆I ⪰ 0 and (b) (2γL + µ⋆I)ĝ = 2z. Moreover, if 2γL + µ⋆I ≻ 0, then the solution is unique.

Note that the above lemmas do not require any sphere constraint on z. We now use Lemma 5 to derive the following crucial lemma, which gives upper and lower bounds on µ⋆. The notation N(L) denotes the null space of L, which is the span of q_n since G is connected by assumption.

Lemma 6.
For any z ∈ C_n satisfying z ⊥̸ N(L), we have that ĝ = 2(2γL + µ⋆I)^{−1}z is the unique solution of (TRS) with µ⋆ ∈ (0, 2]. Additionally, if λ̃ is an eigenvalue of L satisfying

(2/√n) ( Σ_{j: λ_j ≤ λ̃} |⟨z, q_j⟩|² )^{1/2} > 2γλ̃

then it holds that

(2/√n) ( Σ_{j: λ_j ≤ λ̃} |⟨z, q_j⟩|² )^{1/2} − 2γλ̃ ≤ µ⋆ ≤ 2.

Note that if f : [0, 1] → ℝ is M-Lipschitz, then h(·) is 2πM-Lipschitz, since |h(x) − h(x′)| ≤ 2π|f(x) − f(x′)| ≤ 2πM|x − x′| for all x, x′ ∈ [0, 1].

Proof. Let us denote

φ(µ) := ‖2(2γL + µI)^{−1}z‖² = 4 Σ_{j=1}^{n} |⟨z, q_j⟩|² / (2γλ_j + µ)².   (4.1)

If z ⊥̸ N(L) then 0 is a pole of φ. Hence there exists a unique µ⋆ ∈ (0, ∞) such that ĝ = 2(2γL + µ⋆I)^{−1}z is the (unique) solution of (TRS), as it satisfies the conditions in Lemma 5. Since n = ‖ĝ‖² ≤ 4n/(µ⋆)², we obtain µ⋆ ≤ 2. To obtain the lower bound on µ⋆, note that (4.1) implies

‖ĝ‖² = n = ‖2(2γL + µ⋆I)^{−1}z‖² ≥ 4 Σ_{λ_j ≤ λ̃} |⟨z, q_j⟩|² / (2γλ̃ + µ⋆)²

which leads to the stated lower bound on µ⋆.

This is an important lemma, since localizing the value of µ⋆ will be key for controlling the ℓ₂ error bound for (TRS). Observe that λ̃ = 0 always satisfies the conditions of the lemma since |⟨z, q_n⟩| > 0. The next lemma bounds the distance between ĝ/|ĝ| and h for any z ∈ C_n such that z ⊥̸ N(L).

Lemma 7.
For any z ∈ C n such that z
6⊥ N ( L ) , and λ ∈ [ λ min , λ ] , the (unique) solution b g =2(2 γL + µ ⋆ I ) − z of (TRS) satisfies the bound (cid:13)(cid:13)(cid:13)(cid:13) b g | b g | − h (cid:13)(cid:13)(cid:13)(cid:13) ≤ µ ⋆ ) ( E + E ) + 8 (cid:18) µ ⋆ − (cid:19) |h h, q n i| + P j ∈L λ |h h, q j i| (1 + γλ min ) + B n λ (1 + γλ ) ! with E , E as defined in Lemma 1, and µ ⋆ ∈ (0 , .Proof. The proof is along the lines of Lemma 1 with only few technical differences. Firstly, we canwrite e π σ b g − h = 2 µ ⋆ (cid:18) I + 2 γµ ⋆ L (cid:19) − ( e π σ z − h ) | {z } e + 2 µ ⋆ (cid:18) I + 2 γµ ⋆ L (cid:19) − h − h | {z } e which in conjunction with Proposition 3 leads to the bound (cid:13)(cid:13)(cid:13)(cid:13) b g | b g | − h (cid:13)(cid:13)(cid:13)(cid:13) = (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) e π σ b g (cid:12)(cid:12) e π σ b g (cid:12)(cid:12) − h (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ≤ k e k + k e k ) . Proceeding identically to the proof of Lemma 1, it is easy to verify that k e k ≤ µ ⋆ ) E . In orderto bound k e k , we begin by expanding e as e = 2 µ ⋆ (cid:18) I + 2 γµ ⋆ L (cid:19) − h − h = (cid:18) µ ⋆ − (cid:19) h q n , h i q n + n − X j =1 µ ⋆ µ ⋆ γλ j − ! h q j , h i q j . q j ’s, we can bound k e k as follows. k e k ≤ (cid:18) µ ⋆ − (cid:19) |h h, q n i| + n − X j =1 ( µ ⋆ − + γ λ j ( µ ⋆ ) (1 + µ ⋆ γλ j ) |h h, q j i| (using ( a − b ) ≤ a + b for a, b ≥ ≤ (cid:18) µ ⋆ − (cid:19) |h h, q n i| + n − X j =1 ( µ ⋆ − + γ λ j ( µ ⋆ ) (1 + γλ j ) |h h, q j i| (using µ ⋆ ≤ ≤ (cid:18) µ ⋆ − (cid:19) |h h, q n i| + P j ∈L λ |h h, q j i| (1 + γλ min ) + P j ∈H λ |h h, q j i| (1 + γλ ) ! + 4 γ ( µ ⋆ ) (1 + γλ min ) n − X j =1 λ j |h h, q j i| ≤ (cid:18) µ ⋆ − (cid:19) |h h, q n i| + P j ∈L λ |h h, q j i| (1 + γλ min ) + B n λ (1 + γλ ) ! + 4( µ ⋆ ) E , where in the last inequality, we used (2.1),(2.2).When z ∈ C n is generated as in (1.6), the following Lemma presents a (high probability) lowerbound on µ ⋆ provided σ is small and h is sufficiently smooth. Lemma 8.
Let z ∈ C n be generated as in (1.6) , then the solution of (TRS) is unique. Moreover,suppose that for any given k ∈ [ n − s.t λ n − k +1 < λ n − k , the following holds. B n λ n − k ≤ n , σ ≤ π , n √ n ≤ and γλ n − k +1 ≤ . (4.2) Then with probability at least − n , we have that µ ⋆ ≥ − (cid:18) B n nλ n − k + 4 π σ + 24760 log n √ n + γλ n − k +1 (cid:19) . Proof.
We will lower bound the estimate of µ⋆ from Lemma 6 using Proposition 2(i). Note that z ∈ C_n satisfies z ⊥̸ N(L) a.s. Set λ̃ = λ_{n−k+1} in Lemma 6, and let U denote the n × k matrix consisting of the q_j's for n − k + 1 ≤ j ≤ n.

Let us first simplify the statement of Proposition 2(i) when σ ≤ π. Recall from the proof of Lemma 3 that this implies that 1 − e^{−2π²σ²} is of order π²σ². Then the (magnitude of the) RHS of the bound in Proposition 2(i) can be upper bounded as

4096 log n + 256 √π σ √(k log n) + 11 log n ‖U U^T h‖_∞ + 2√π σ ‖U U^T h‖
≲ n + σ √(k log n) + log n (‖U U^T h‖_∞ + σ√n)
≲ n (1 + ‖U U^T h‖_∞ + 2σ√n)   (4.3)

where the last inequality uses the upper bound on σ and k ≤ n, together with ‖U U^T h‖ ≤ √n. Plugging (4.3) into Proposition 2(i), we conclude that with probability at least 1 − O(1/n),

Σ_{j: λ_j ≤ λ_{n−k+1}} |⟨z, q_j⟩|² ≥ π σ k + (1 − π σ) h* U U^T h − n (1 + ‖U U^T h‖_∞ + 2σ√n)
≥ (1 − π σ) h* U U^T h − √n log n
≥ (1 − π σ) (n − B_n/λ_{n−k}) − √n log n   (using h* U U^T h ≥ n − B_n/λ_{n−k}, see (2.1))
≥ n − B_n/λ_{n−k} − π σ n − √n log n
= n (1 − B_n/(nλ_{n−k}) − π σ − log n/√n).

Using (4.2), the quantity B_n/(nλ_{n−k}) + 4π σ + log n/√n is suitably small. This leads to the bound

( Σ_{j: λ_j ≤ λ_{n−k+1}} |⟨z, q_j⟩|² / n )^{1/2} ≥ 1 − B_n/(nλ_{n−k}) − π σ − log n/√n

which in conjunction with Lemma 6 leads to the stated bound on µ⋆. In particular, the conditions in (4.2) ensure µ⋆ ≥ 1, which we will use to bound ‖ĝ/|ĝ| − h‖ for the solution ĝ of (TRS).

Theorem 7.
Let z ∈ C n be generated as in (1.6) , then the solution b g of (TRS) is unique. Forany given k ∈ [ n − s.t λ n − k +1 < λ n − k , and any λ ∈ [ λ min , λ ] with the choice γ = ( π σ n △ B n λ ) / ,suppose that the following conditions are satisfied. (i) B n ≤ min n nλ n − k , nλ o , and (ii) 286 (cid:16) log n √ n (cid:17) / ≤ σ ≤ min (cid:26) √ π , λ λ n − k +1 q △ B n π n (cid:27) .Then with probability at least − n , the solution b g ∈ C n of (TRS) satisfies (cid:13)(cid:13)(cid:13)(cid:13) b g | b g | − h (cid:13)(cid:13)(cid:13)(cid:13) ≤ C σλ p △ B n n + n / λ n − k +1 √△ B n ! + C σ (1 + (cid:12)(cid:12) L λ (cid:12)(cid:12) + q (1 + (cid:12)(cid:12) L λ (cid:12)(cid:12) ) log n ) (4.4)+ C σ n + C log n + C B n nλ n − k where C = 288 π , C = 396160 , C = 230400 , C = 262144 , C = 144 .Proof. We simply combine Lemmas 7 and 8. To this end, recall that the error bound in Theorem5 is a bound on the term 8( E + E ). This means that if
72 log n/(π√n) ≤ σ ≤ √π, then for the stated choice of γ we have, with probability at least 1 − O(1/n),

32(E₁ + E₂) ≤ π σ √(△ B_n n / λ) + 396160 σ (1 + |L_λ| + (1 + |L_λ|) √(log n)) + 262144 log n.   (4.5)

The conditions in Lemma 8 imply in particular that µ⋆ ≥ 1, hence (4.5) also bounds the corresponding term involving 1/µ⋆. This takes care of the first error term in the bound in Lemma 7. In order to bound the second term therein, observe that if B_n ≤ nλ, then

|⟨h, q_n⟩|² + Σ_{j ∈ L_λ} |⟨h, q_j⟩|²/(1 + γλ_min)² + B_n/(λ(1 + γλ)) ≲ n.

Also, condition (ii) for σ implies 24760 log n/√n ≤ σ/π (≤ 1). Note that the requirement σ ≥ (log n/√n)^{/} is stricter than σ ≥ 72 log n/(π√n). Given these observations, we can bound the second term in the bound of Lemma 7 as follows:

8 (2/µ⋆ − 1)² ( |⟨h, q_n⟩|² + Σ_{j ∈ L_λ} |⟨h, q_j⟩|²/(1 + γλ_min)² + B_n/(λ(1 + γλ)) )
≤ n (B_n/(nλ_{n−k}) + 4π σ + 24760 log n/√n + γλ_{n−k+1})
≤ n (B_n/(nλ_{n−k}) + 4π σ + 3σ/π + γλ_{n−k+1})
≤ n (B_n/(nλ_{n−k}) + 40 σ + γλ_{n−k+1})
≤ B_n/λ_{n−k} + 1600 σ n + (π σ n / (△ B_n λ))^{1/2} λ_{n−k+1} n.   (4.6)

Plugging (4.5) and (4.6) into Lemma 7 then readily yields the stated bound in the theorem.

The following corollary provides a simplification of Theorem 7 and is directly obtained by considering k = 1, since λ_n = 0 < λ_{n−1} = λ_min.

Corollary 5.
Let z ∈ C n be generated as in (1.6) , then the solution b g of (TRS) is unique. Forgiven λ ∈ [ λ min , λ ] with the choice γ = ( π σ n △ B n λ ) / , suppose that B n ≤ nλ min and (cid:18) log n √ n (cid:19) / ≤ σ ≤ √ π . Then with probability at least − n , the solution b g ∈ C n of (TRS) satisfies (cid:13)(cid:13)(cid:13)(cid:13) b g | b g | − h (cid:13)(cid:13)(cid:13)(cid:13) ≤ C σλ p △ B n n + + C σ (cid:18) (cid:12)(cid:12) L λ (cid:12)(cid:12) + q (1 + (cid:12)(cid:12) L λ (cid:12)(cid:12) ) log n (cid:19) + C σ n + C log n + C B n nλ where the constants C , . . . , C are as in Theorem 7. We are now in a position to derive conditions under which (TRS) provably denoises z with highprobability. We begin with the following Theorem which provides these conditions in their fullgenerality. Theorem 8.
Let z ∈ C n be generated as in (1.6) , then the solution b g of (TRS) is unique. Withconstants C , . . . , C as in Theorem 7, for any ε ∈ (0 , , given k ∈ [ n − s.t λ n − k +1 < λ n − k and λ ∈ [ λ min , λ ] with the choice γ = ( π σ n △ B n λ ) / , suppose that the following conditions are satisfied. B n ≤ min n nλ n − k , nλ o , and (cid:12)(cid:12) L λ (cid:12)(cid:12) + q (1 + (cid:12)(cid:12) L λ (cid:12)(cid:12) ) log n ≤ π C εn . (ii) σ ≤ min (cid:26) π √ ε √ C , λ λ n − k +1 q △ B n π n (cid:27) and σ ≥ max ( (cid:18) log n √ n (cid:19) / , √ C π (cid:18) B n nλ n − k √ ε (cid:19) , s(cid:18) C επ (cid:19) log nn , C π ελ r △ B n n + λ n − k +1 r n △ B n !) . Then with probability at least − n , the solution b g ∈ C n of (TRS) satisfies (cid:13)(cid:13)(cid:13)(cid:13) b g | b g | − h (cid:13)(cid:13)(cid:13)(cid:13) ≤ ε k z − h k . (4.7) Proof.
Recall from Lemma 3(iii) that ‖z − h‖ ≥ π σ n with probability at least 1 − O(1/n). Conditioning on the intersection of this event and the event in Theorem 7, it suffices to ensure that the bound in (4.4) is less than or equal to π σ n ε. This in turn is ensured provided each term in the RHS of (4.4) is less than or equal to ε π σ n. Combining the resulting conditions with those in Theorem 7 yields the statement of the theorem.

The following simplification of Theorem 8 is obtained for k = 1, as was done in Corollary 5.

Corollary 6.
Let z ∈ C n be generated as in (1.6) , then the solution b g of (TRS) is unique. Withconstants C , . . . , C as in Theorem 7, for any ε ∈ (0 , and λ ∈ [ λ min , λ ] with the choice γ =( π σ n △ B n λ ) / , suppose that the following conditions are satisfied. (i) B n ≤ nλ min and (cid:12)(cid:12) L λ (cid:12)(cid:12) + q (1 + (cid:12)(cid:12) L λ (cid:12)(cid:12) ) log n ≤ π C εn . (ii) max ( (cid:18) log n √ n (cid:19) / , √ C π (cid:18) B n nλ min √ ε (cid:19) , s(cid:18) C επ (cid:19) log nn , C π ελ r △ B n n ) ≤ σ ≤ π √ ε √ C . Then with probability at least − n , the solution b g ∈ C n of (TRS) satisfies (4.7) . Finally, as done previously for (UCQP), it will be instructive to translate Theorem 8 for thespecial cases G = K n , S n or P n . This is stated below using the simplified version in Corollary 6. Corollary 7.
Let z ∈ C n be generated as in (1.6) and ε ∈ (0 , . (i) ( G = K n ) Suppose n √ log n & ε , B n . n and max n √ B n nε , B n n √ ε , ( log n √ n ) / , ( log nεn ) / o . σ . √ ε .If γ ≍ ( σ n B n ) / , then the (unique) solution b g of (TRS) satisfies (4.7) w.h.p. (ii) ( G = S n ) Suppose n √ log n & ε , B n . n and max n √ B n ε , B n n √ ε , ( log n √ n ) / , ( log nεn ) / o . σ . √ ε .If γ ≍ ( σ B n ) / , then the (unique) solution b g of (TRS) satisfies (4.7) w.h.p. ( G = P n ) For a given θ ∈ [0 , , suppose n θ + p n θ log n . εn , B n . n and max ( n − θ ε p B n , nB n √ ε , (cid:18) log n √ n (cid:19) / , (cid:18) log nεn (cid:19) / ) . σ . √ ε. If γ ≍ ( σ n − θ B n ) / , then the (unique) solution b g of (TRS) satisfies (4.7) w.h.p.Proof. Use Corollary 6 with λ min , λ as in Corollary 2.For K n , note that only k = 1 meets the requirement of Theorem 8 since λ n − = · · · = λ . For S n , the only other possibility (apart from k = 1) is to choose k = n −
1, since λ₂ = 1 < λ₁ = n. But this choice of k leads to a vacuous noise regime due to the appearance of a √B_n term as a lower bound on σ.

Remark 4.
Similarly to Remark 2 for (UCQP), we can deduce conditions on the smoothness term B_n which lead, as n → ∞, to a non-vacuous regime for σ of the form o(1) ≤ σ ≲ √ε. Here, we will only treat the case where ε is fixed.

1. (G = K_n) B_n = o(n) suffices.

2. (G = S_n) B_n = o(1) suffices.

3. (G = P_n) B_n = o(1/n) suffices.

Denoising modulo samples of a function.
When G = P_n, the requirement B_n = o(1/n) is far from satisfactory and suggests that Corollary 6 is weak when applied to a path graph. Indeed, when we obtain noisy modulo 1 samples of an M-Lipschitz function f : [0, 1] → ℝ, then we have B_n ≍ M²/n as seen in (3.13). Unfortunately, the condition on σ in Corollary 7(iii) becomes vacuous when B_n ≍ 1/n. Interestingly, we can handle this smoothness regime by making use of Theorem 8 with a careful choice of k. This is made possible by the fact that the spectrum of the Laplacian of P_n satisfies the condition λ_{n−k+1} < λ_{n−k} for each k ∈ [n − 1].

Corollary 8.
Consider the example from Section 1.2 where we obtain noisy modulo 1 samples of a M -Lipschitz function f : [0 , → R . If γ ≍ (cid:16) σ n / M (cid:17) / then the following is true for the solution b g of (TRS) .1. If n & max (cid:8) , M (cid:9) and ( log n √ n ) / . σ . min (cid:8) , n / M (cid:9) then w.h.p, (cid:13)(cid:13)(cid:13)(cid:13) b g | b g | − h (cid:13)(cid:13)(cid:13)(cid:13) . (cid:18) σ (cid:18) M + 1 M (cid:19) + σ (cid:19) n / + σ n + log n + M n .
2. For ε ∈ (0 , , if n & max (cid:8) (1 /ε ) , M (cid:9) and max ( εn / (cid:18) M + 1 M (cid:19) , M n √ ε , (cid:18) log n √ n (cid:19) / , (cid:18) log nεn (cid:19) / ) . σ . min n √ ε, n / M o , then b g satisfies (3.11) w.h.p. The error bound in Corollary 8 is visibly worse than that in Corollary 4 due to the appearanceof an additional σ n term. For n large enough and ε ∈ (0 ,
1) fixed, Corollary 8 asserts that (TRS) succeeds in denoising in the noise regime (log n/√n)^{/} ≲ σ ≲ √ε. The corresponding noise regime for (UCQP) is the relatively weaker requirement M/(εn^{/}) ≲ σ ≲ 1, as seen from Corollary 4.
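As an aside, the characterization in Lemma 5 also yields a simple way to compute the (TRS) estimator numerically: diagonalize L once, then locate µ⋆ by bisection on the monotone secular equation ‖2(2γL + µI)^{−1}z‖² = n. The sketch below is illustrative only; the helper name `solve_trs`, the path graph, the value of γ and the synthetic data are my own choices, not the paper's experimental setup.

```python
import numpy as np

def solve_trs(L, z, gamma, tol=1e-12):
    """TRS solution via Lemma 5: g = 2(2*gamma*L + mu*I)^{-1} z with ||g||^2 = n."""
    n = len(z)
    lam, Q = np.linalg.eigh(L)              # L = Q diag(lam) Q^T, eigenvalues ascending
    c = np.abs(Q.T @ z) ** 2                # coefficients |<q_j, z>|^2

    def norm_sq(mu):                        # ||g(mu)||^2 = sum_j 4 c_j / (2 gamma lam_j + mu)^2
        return np.sum(4.0 * c / (2.0 * gamma * lam + mu) ** 2)

    # norm_sq decreases in mu; it blows up as mu -> 0+ (z not orthogonal to the
    # null space of L) and is <= n at mu = 2 whenever ||z||^2 = n (cf. Lemma 6).
    lo, hi = 1e-16, 2.0
    while norm_sq(hi) > n:                  # safeguard for general z
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if norm_sq(mid) > n else (lo, mid)
    mu = 0.5 * (lo + hi)
    g = Q @ ((2.0 / (2.0 * gamma * lam + mu)) * (Q.T @ z))
    return g, mu

# toy example on the path graph P_n with points on the unit circle
n = 16
rng = np.random.default_rng(0)
z = np.exp(2j * np.pi * np.cumsum(rng.normal(scale=0.05, size=n)))
L = (np.diag(np.r_[1.0, 2.0 * np.ones(n - 2), 1.0])
     - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1))
g, mu = solve_trs(L, z, gamma=1.0)
```

Since Q is real (L is symmetric), the routine applies verbatim to complex z; for large sparse graphs one would replace the dense eigendecomposition by a few shifted linear solves.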
Discussion
We now discuss in detail some related work and conclude with directions for future work.
There exist numerous methods for this problem in the phase unwrapping community, most of which are for the setting d = 2 (since this case has the most applications). Such methods can be broadly classified as belonging to the class of (a) least squares based approaches (e.g., [16, 27]), (b) branch cut methods (e.g., [9, 24]), or (c) network flow methods (e.g., [34, 33]). While we refer the reader to [11] for a more detailed discussion of these methods (as well as other related approaches from phase unwrapping), we remark that most of these approaches are based on heuristics with no theoretical performance guarantees.

A recent line of work for this problem has led to the development of new methods with provable performance guarantees. Bhandari et al. [4] considered equispaced sampling of a univariate bandlimited function (with spectrum in [−π, π]) and showed in the noiseless setting that if the sampling width is less than or equal to 1/(2πe), then the samples of g (and consequently the function g itself) can be recovered exactly. This work was extended by the same set of authors to other settings, such as in [5], where f is assumed to be the convolution of a low pass filter and a sum of k Diracs, and in [6] where f is considered to be a sum of k sinusoids. Then, given n equispaced (with step size T) noiseless modulo measurements of f, Bhandari et al. [5, 6] show that f can be recovered exactly provided n is large enough (roughly speaking, n ≳ k) and T ≤ 1/(2πe). In a follow up work, Rudresh et al. [28] considered the setting where f is a univariate Lipschitz function, and proposed a method based on first applying a wavelet filter to the (equispaced) modulo samples, followed by a LASSO based procedure for recovering f. They showed that if f is a polynomial of degree p, then it can be recovered exactly from its noiseless modulo samples provided the sampling width is ≲ ζ/(Lp), where L is the Lipschitz constant of f.
The authors do not provide any theoretical guarantees in the presence of noise, but demonstrate via simulation results that their method is more robust to noise than that of Bhandari et al. [4]. (It should of course depend on p, but this was not stated explicitly in [28].)

While the aforementioned results [4, 5, 6, 28] are for the nonparametric setting with f univariate, the setting where f is a d-variate linear function was considered by Shah and Hegde [30]. Assuming f to be sparse, exact recovery guarantees were provided (for the noiseless setting) in the regime n ≪ d for an alternating minimization based algorithm. Musa et al. [22] also consider f to be a sparse linear function, but assume it to be generated from a Bernoulli-Gaussian distribution. They propose a generalized approximate message passing algorithm for recovering f, but without any theoretical analysis.

In the work of Cucuringu and Tyagi [11], the authors also proposed a semi-definite programming (SDP) relaxation of (QCQP), and also considered solving (QCQP) using methods for optimization over manifolds. These approaches were shown to perform well via simulations, but without any theoretical analysis.

In a work parallel to the present paper, Fanuel and Tyagi [12] derived a two-stage algorithm for unwrapping noisy modulo 1 samples of a Lipschitz function f : [0, 1]^d → ℝ, for the model (1.6) with η_i ∼ N(0, σ²) i.i.d. In the first stage, they represent the noisy data (y_i)_{i=1}^n on the product manifold C_n (as in Section 1.1), and consider estimating the ground signal h ∈ C_n via a kNN (k nearest neighbor) procedure. Assuming that the x_i's form a uniform grid on [0, 1]^d, they show that the ensuing estimate ĥ ∈ C_n satisfies (w.h.p) the ℓ_∞ error rate ‖ĥ − h‖_∞ ≲ (log n/n)^{1/(d+2)} when n is large enough (the error being between f(x_i) mod 1 and the mod 1 estimate obtained from ĥ_i, for each i). Then in the second stage, they present a sequential unwrapping procedure for unwrapping these denoised mod 1 samples, and show that the estimates f̃(x_i) of f(x_i) satisfy (up to a global integer shift) the bound |f̃(x_i) − f(x_i)| ≲ (log n/n)^{1/(d+2)} for each i.

Fanuel and Tyagi [12] also studied the problem of identifying conditions under which the SDP formulation of Cucuringu and Tyagi [11] is a tight relaxation of (QCQP). This is done under a general graph based setup, as in the present paper. Without any statistical assumptions on the noisy data z ∈ C_n, their result states that if ‖z − h‖_∞ ≲ γ∆ ≲ 1, then the SDP relaxation of (QCQP) is tight, and consequently leads to the global solution of (QCQP). As discussed in [12], the derived conditions are stricter than what one would expect, and so there is still room for improvement in this regard.
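Before moving on to estimation on graphs, we note that the closed form of the unconstrained relaxation makes the circle-representation pipeline of Section 1.1 very short to experiment with: map noisy mod-1 samples to the unit circle, apply (I + γL)^{−1}, and project back entrywise. The sketch below is illustrative only; the helper names (`ucqp_denoise`, `circ_err`), the choice of f, the noise level and the value of γ are my own assumptions, not tuned or taken from the paper.

```python
import numpy as np

def ucqp_denoise(y_mod1, L, gamma):
    """Denoise noisy mod-1 samples via the unconstrained relaxation (UCQP).

    Represent the samples on the unit circle, z_i = exp(2*pi*1j*y_i); then
    minimize ||g - z||^2 + gamma * g^H L g, whose minimizer is
    g = (I + gamma*L)^{-1} z; finally project entrywise onto the circle
    (g/|g|) and read the denoised mod-1 samples off the angles."""
    n = len(y_mod1)
    z = np.exp(2j * np.pi * y_mod1)
    g = np.linalg.solve(np.eye(n) + gamma * L, z)
    return (np.angle(g / np.abs(g)) / (2.0 * np.pi)) % 1.0

def circ_err(a, b):
    """Wrap-around (circle) distance between mod-1 values."""
    e = np.abs(a - b) % 1.0
    return np.minimum(e, 1.0 - e)

# toy example on the path graph P_n, following the model (1.1) with zeta = 1
n, sigma, gamma = 200, 0.1, 5.0
x = np.linspace(0.0, 1.0, n)
f = 0.3 * np.sin(2.0 * np.pi * x)                 # smooth ground truth
rng = np.random.default_rng(1)
y = (f + sigma * rng.standard_normal(n)) % 1.0    # noisy modulo samples
L = (np.diag(np.r_[1.0, 2.0 * np.ones(n - 2), 1.0])
     - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1))
y_hat = ucqp_denoise(y, L, gamma)

err_noisy = circ_err(y, f % 1.0).mean()
err_denoised = circ_err(y_hat, f % 1.0).mean()
```

On such a toy instance the denoised circle error typically comes out noticeably smaller than the raw one, which is the qualitative effect that Theorems 5 and 6 quantify.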
Estimating a smooth function θ ⋆ ∈ R n on a graph G = ([ n ] , E ) is a well studied problem in signalprocessing and statistics. The statistical model is typically assumed to be y = θ ⋆ + η, η ∼ N (0 , σ I ) (5.1)with the smoothness of θ ⋆ measured by ( θ ⋆ ) ⊤ Lθ ⋆ which is assumed to be small. A common approachfor estimating θ ⋆ is via the so-called Tikhonov regularization where we aim to solvemin θ ∈ R n k θ − y k + γθ ⊤ Lθ. (5.2)While (5.2) is the same as (UCQP), the model in (5.1) is notably different from (1.6).Let us review some important theoretical results pertaining to (5.2). Belkin et al [2] consideredthe semi-supervised learning problem of predicting the values of θ ⋆ on the vertices of a partiallylabelled graph. They use the notion of algorithmic stability to derive generalization error boundsfor (5.2) which in particular depend on the Fiedler eigenvalue of L . Sadhanala et al. [29] considerthe problem of estimating θ ⋆ under the assumption that G is a d -dimensional regular grid. Thesmoothness assumption on θ ⋆ is that θ ⋆ ∈ S ( B n ) where S ( B n ) = n θ ∈ R n : θ ⊤ Lθ ≤ B n o . They establish a lower bound on the minimax risk [29, Theorem 5] for the class S ( B n ),inf b θ sup θ ⋆ ∈S ( B n ) n E [ k b θ − θ ⋆ k ] ≥ cn min n ( nσ ) d +2 ( B n ) dd +2 , nσ , n /d B n o + σ n with c > d = 1 , , b θ of (5.2) isminimax optimal since it satisfiessup θ ⋆ ∈S ( B n ) n E [ k b θ − θ ⋆ k ] ≤ ˜ cn min n ( nσ ) d +2 ( B n ) dd +2 , nσ , n /d B n o + ˜ cσ n (5.3)for a universal constant ˜ c >
0, when n is large enough and γ ≍ ( nB n ) d +2 . It is also mentioned in[29, Remark 5] that for d = 4, (5.2) is nearly minimax optimal due to an extra log factor in the translated to our notation in the present paper d ≥
5, but they conjecture that it is not optimal for this range of d .It is also shown that the Laplacian eigenmap estimator achieves the aforementioned upper boundfor all d , and hence is minimax optimal. Remark 5.
The crucial step in establishing (5.3) is Lemma 10 in [29]; it bounds the variance error by bounding Σ_{i=1}^n 1/(1 + γλ_i)². This latter bound is tight, as the analysis makes explicit use of the expressions for the eigenvalues of L. It is possible that the steps involved in [29, Lemma 10] could be appropriately used to further tighten our bound in Corollary 4 when G = P_n. However, the main purpose of our analysis is to work with general connected graphs G, and to derive general error bounds which depend on the spectrum of the Laplacian of G. It is then not surprising that instantiating these general bounds for particular graphs yields sub-optimal error bounds.

Kirichenko and van Zanten [18] considered a Bayesian regularization framework for estimating θ⋆. They make an asymptotic shape assumption on the graph, namely that G looks like an r-dimensional grid with n vertices as n → ∞. More precisely, they assume that λ_{n−i} = Θ((i/n)^{2/r}) for 1 ≤ i ≤ κn for some κ ∈ (0, 1). The smoothness of θ⋆ is captured by the assumption (θ⋆)^⊤ (I + (n^{2/r} L)^β) θ⋆ ≤ C where β, C >
0. Under these assumptions, with an appropriate assumption onthe prior distribution for a randomly generated θ ∈ R n , it is shown [18, Theorem 3.2] that for n large enough, 1 n k θ − θ ⋆ k = O ( n − β β + r )holds with high probability. This is then used to show [18, Theorem 5.1] that the posterior contractsaround θ ⋆ at the rate n − β β + r . This result was later shown to be optimal by Kirichenko and vanZanten [19] under the same set of smoothness and asymptotic shape assumptions on θ ⋆ and G respectively. It is important to understand the general idea behind the analysis technique in [11] for (TRS)that leads to the estimation error bound in (1.3). Denoting F ( g ) = k g − z k + γg ∗ Lg to be theobjective function, the main observation is that by feasibility of the ground truth h ∈ C n , we have F ( b g ) ≤ F ( h ) for any solution b g . Then after rearranging the terms followed by some simplification,one can readily check that the above inequality is equivalent to k b g − h k ≤ k z − h k − γ b g ∗ L b g + 2Re( b g ∗ ( z − h )) + γh ∗ Lh. (5.4)Now if z is generated randomly as in (1.6), we know that k z − h k . σ n w.h.p if σ .
1. Moreover, the term Re(ĝ*(z − h)) is bounded via Cauchy–Schwarz to obtain Re(ĝ*(z − h)) ≤ √n ‖z − h‖ ≲ σn w.h.p. Plugging these bounds into (5.4) leads to the bound

‖ĝ − h‖² ≲ σn − γ ĝ*Lĝ + γ h*Lh.   (5.5)

The final step in the analysis requires lower bounding the quadratic term ĝ*Lĝ, which first involves utilising the expression of the (TRS) solution ĝ to show that [11, Lemma 5]

ĝ*Lĝ ≳ (γ∆) z*Lz   (5.6)

(here, we project y onto the subspace spanned by the k smallest eigenvectors of L, for a suitably chosen k) and that z*Lz ≳ γ∆σ + h*Lh when σ ≲ 1. Plugging these considerations into (5.5), and noting that h*Lh ≲ M∆n, finally leads to the bound

‖ĝ − h‖² ≲ σn + γM∆n − γ∆(1 + 2γ∆)σ.

Using Proposition 3, we obtain the bound stated in (1.3), since the term γ∆(1 + 2γ∆) is bounded.

We believe that certain steps in the above analysis can likely be improved. For instance, the bound on Re(ĝ*(z − h)) could perhaps be improved by using the expression for the (TRS) solution ĝ, along with concentration inequalities. Furthermore, the lower bound in (5.6) is almost certainly sub-optimal, as can be seen from the proof of [11, Lemma 5]. But it seems unlikely that the ensuing improvements would improve the bounds drastically; they would probably at best improve the term σn to σ²n.

There are several important directions for future work.

1.
Optimality of the error bounds.
The ℓ₂ error bounds that we derived for the (TRS) and (UCQP) estimators are certainly not optimal. For instance, as noted earlier in Remark 3, when G = P_n and B_n ≍ 1/n, the ℓ₂ error bound for (UCQP) is O(n^{1/4}) while the optimal scaling should be O(n^{1/6}). Admittedly, our analysis does involve certain simplifications at different steps in order to make the calculations more tractable, especially in the derivation of the choice of the regularization parameter γ. In particular, since we work with general connected graphs and do not make any assumption on the spectrum of the Laplacian, the central theme behind our analysis – which is to separate the low and high frequency spectrum of the Laplacian – can be considered a bit crude for certain graphs whose Laplacian exhibits a more continuous spectrum (such as the path graph). In general, if one restricts the analysis to special families of graphs whose Laplacian spectrum possesses a specific structure (e.g., G = P_n; recall Remark 5), it is plausible that a more refined analysis could be carried out, with better error rates. Otherwise, providing an “optimal” analysis in its full generality would involve finding the best choice of γ, which – as evidenced by the proof of Lemma 1 – seems challenging. Nevertheless, this is an interesting question to consider for future work.

2. ℓ_∞ estimation error rates. While our focus throughout has been on deriving ℓ₂ error bounds, an interesting but more challenging task would be to derive ℓ_∞ error bounds for the (TRS) and (UCQP) estimators. Such a result would be especially useful for the problem of unwrapping noisy modulo samples of a function f, since it would enable us to obtain uniform error rates for the unwrapped samples of f, using the aforementioned results of Fanuel and Tyagi [12].

Acknowledgements
I would like to thank Stéphane Chrétien and Michaël Fanuel for carefully reading a preliminary version of the draft and for providing useful feedback, and Alain Celisse for the very helpful technical discussions during the early stages of this work.

References

[1] S. Adachi, S. Iwata, Y. Nakatsukasa, and A. Takeda. Solving the trust-region subproblem by a generalized eigenvalue problem. SIAM Journal on Optimization, 27(1):269–291, 2017.

[2] M. Belkin, I. Matveeva, and P. Niyogi. Regularization and semi-supervised learning on large graphs. In Learning Theory, pages 624–638, 2004.

[3] P.C. Bellec. Concentration of quadratic forms under a Bernstein moment assumption, 2019.

[4] A. Bhandari, F. Krahmer, and R. Raskar. On unlimited sampling. ArXiv e-prints, arXiv:1707.06340v1, 2017.

[5] A. Bhandari, F. Krahmer, and R. Raskar. Unlimited sampling of sparse signals. In , pages 4569–4573, 2018.

[6] A. Bhandari, F. Krahmer, and R. Raskar. Unlimited sampling of sparse sinusoidal mixtures. In , pages 336–340, 2018.

[7] S. Boucheron, G. Lugosi, and P. Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013.

[8] A.E. Brouwer and W.H. Haemers. Spectra of Graphs. New York, NY, 2012.

[9] S. Chavez, Q.S. Xiang, and L. An. Understanding phase maps in MRI: a new cutline phase unwrapping method. IEEE Transactions on Medical Imaging, 21(8):966–977, 2002.

[10] M. Cucuringu and H. Tyagi. On denoising modulo 1 samples of a function. In Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, volume 84, pages 1868–1876, 2018.

[11] M. Cucuringu and H. Tyagi. Provably robust estimation of modulo 1 samples of a smooth function with applications to phase unwrapping. Journal of Machine Learning Research, 21(32):1–77, 2020.

[12] M. Fanuel and H. Tyagi. Denoising modulo samples: k-NN regression and tightness of SDP relaxation. In preparation, 2020.

[13] L.C. Graham. Synthetic interferometer radar for topographic mapping. Proceedings of the IEEE, 62(6):763–768, 1974.

[14] W.W. Hager. Minimizing a quadratic over a sphere. SIAM Journal on Optimization, 12(1):188–208, 2001.

[15] M. Hedley and D. Rosenfeld. A new two-dimensional phase unwrapping algorithm for MRI images. Magnetic Resonance in Medicine, 24(1):177–181, 1992.

[16] H.Y.H. Huang, L. Tian, Z. Zhang, Y. Liu, Z. Chen, and G. Barbastathis. Path-independent phase unwrapping using phase gradient and total-variation (TV) denoising. Optics Express, 20(13):14075–14089, 2012.

[17] W. Kester. MT-025 tutorial: ADC architectures VI: folding ADCs. Analog Devices, Tech. report, 2009.

[18] A. Kirichenko and H. van Zanten. Estimating a smooth function on a large graph by Bayesian Laplacian regularisation. Electronic Journal of Statistics, 11(1):891–915, 2017.

[19] A. Kirichenko and H. van Zanten. Minimax lower bounds for function estimation on graphs. Electronic Journal of Statistics, 12(1):651–666, 2018.

[20] P. Lauterbur. Image formation by induced local interactions: examples employing nuclear magnetic resonance. Nature, 242:190–191, 1973.

[21] H. Liu, M.-C. Yue, and A. Man-Cho So. On the estimation performance and convergence rate of the generalized power method for phase synchronization. SIAM Journal on Optimization, 27(4):2426–2446, 2017.

[22] O. Musa, P. Jung, and N. Goertz. Generalized approximate message passing for unlimited sampling of sparse signals. In , pages 336–340, 2018.

[23] A. Nemirovski. Topics in non-parametric statistics. École d'Été de Probabilités de Saint-Flour, 28:85, 2000.

[24] C. Prati, M. Giani, and N. Leuratti. SAR interferometry: a 2-D phase unwrapping technique based on phase and absolute values informations. In , pages 2043–2046, 1990.

[25] R.G. Pratt and M.H. Worthington. The application of diffraction tomography to crosshole seismic data. Geophysics, 53(10):1284–1294, 1988.

[26] J. Rhee and Y. Joo. Wide dynamic range CMOS image sensor with pixel level ADC. Electronics Letters, 39(4):360–361, 2003.

[27] M. Rivera and J.L. Marroquin. Half-quadratic cost functions for phase unwrapping. Optics Letters, 29(5):504–506, 2004.

[28] S. Rudresh, A. Adiga, B.A. Shenoy, and C.S. Seelamantula. Wavelet-based reconstruction for unlimited sampling. In , pages 4584–4588, 2018.

[29] V. Sadhanala, Y.-X. Wang, and R.J. Tibshirani. Total variation classes beyond 1d: minimax rates, and the limitations of linear smoothers. In Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS'16, pages 3521–3529, 2016.

[30] V. Shah and C. Hegde. Signal reconstruction from modulo observations, 2018.

[31] D.I. Shuman, S.K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst. The emerging field of signal processing on graphs: extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Processing Magazine, 30(3):83–98, 2013.

[32] D.C. Sorensen. Newton's method with a model trust region modification. SIAM Journal on Numerical Analysis, 19(2):409–426, 1982.

[33] M. Takeda and T. Abe. Phase unwrapping by a maximum cross-amplitude spanning tree algorithm: a comparative study. Optical Engineering, 35:35–35–7, 1996.

[34] D.P. Towers, T.R. Judge, and P.J. Bryanston-Cross. Automatic interferogram analysis techniques applied to quasi-heterodyne holography and ESPI. Optics and Lasers in Engineering, 14:239–281, 1991.

[35] T. Yamaguchi, H. Takehara, Y. Sunaga, M. Haruta, M. Motoyama, Y. Ohta, T. Noda, K. Sasagawa, T. Tokuda, and J. Ohta. Implantable self-reset CMOS image sensor and its application to hemodynamic response detection in living mouse brain. Japanese Journal of Applied Physics, 55(4S):04EM02, 2016.

[36] H.A. Zebker and R.M. Goldstein. Topographic mapping from interferometric synthetic aperture radar observations. Journal of Geophysical Research: Solid Earth, 91(B5):4993–4999, 1986.
A Proof of Proposition 1
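Before the individual parts, the identities established below can be sanity-checked by simulation. A minimal sketch (assuming, as in the setup of the paper, that each observation is represented on the unit circle as z_i = h_i e^{ι2πη_i} with η_i ∼ N(0, σ²); the function values used to build h here are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma, trials = 20, 0.1, 100_000

# Illustrative ground-truth representation on the unit circle
h = np.exp(1j * 2 * np.pi * np.linspace(0.0, 1.0, n))

# Noisy samples mapped to the circle: z_i = h_i * exp(i 2 pi eta_i)
eta = rng.normal(0.0, sigma, size=(trials, n))
z = h * np.exp(1j * 2 * np.pi * eta)

# E[exp(i 2 pi eta)] = e^{-2 pi^2 sigma^2} for eta ~ N(0, sigma^2)
decay = np.exp(-2 * np.pi**2 * sigma**2)

# Part 1: E[z] = e^{-2 pi^2 sigma^2} h, checked coordinate-wise
mean_err = np.max(np.abs(z.mean(axis=0) - decay * h))

# E[||z - h||^2] = 2 n (1 - e^{-2 pi^2 sigma^2})
emp = np.mean(np.sum(np.abs(z - h) ** 2, axis=1))
theo = 2 * n * (1 - decay)

# Elementary bounds used for (2.4): x/2 <= 1 - e^{-x} <= x on [0, 1]
xs = np.linspace(0.0, 1.0, 101)
lower_ok = bool(np.all(xs / 2 <= 1 - np.exp(-xs) + 1e-12))
upper_ok = bool(np.all(1 - np.exp(-xs) <= xs + 1e-12))
```

With σ = 0.1 the empirical quantities agree with the closed forms to within Monte Carlo error.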
1. This follows from the fact that E[e^{ι2πη}] = e^{−2π²σ²} for η ∼ N(0, σ²).

2. We have

E[|⟨z − e^{−2π²σ²}h, u⟩|²] = Σ_{i=1}^n u_i² E[|z_i − e^{−2π²σ²}h_i|²] = Σ_{i=1}^n u_i² (1 + e^{−4π²σ²} − 2e^{−2π²σ²} E[Re(z_i* h_i)]),

where, by Part 1, E[Re(z_i* h_i)] = e^{−2π²σ²}, so that each summand equals u_i²(1 − e^{−4π²σ²}).

3. This follows from Part 2 by noting that

E[|⟨z, u⟩|²] = e^{−4π²σ²} |⟨h, u⟩|² + E[|⟨z − e^{−2π²σ²}h, u⟩|²].

4. Use Part 2 and the fact that for any orthonormal basis {u_j}_{j=1}^n of ℝⁿ,

‖z − e^{−2π²σ²}h‖² = Σ_{j=1}^n |⟨z − e^{−2π²σ²}h, u_j⟩|².

Finally, E[‖z − h‖²] = 2n − 2 Σ_{i=1}^n Re(E[z_i* h_i]) = 2n(1 − e^{−2π²σ²}), where the last identity uses Part 1. The bounds in (2.4) follow from the following standard fact: for x ≥ 0, we have x − x²/2 ≤ 1 − e^{−x} ≤ x. Hence, if x ∈ [0, 1], then x/2 ≤ 1 − e^{−x} ≤ x.

B Proof of Proposition 2
Before the proof, we recall some concentration inequalities that we will use. The first of these is the standard Bernstein inequality.
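Before stating it, a quick empirical illustration (a sketch with illustrative parameters, unrelated to the quantities in the proof): for independent X_i uniform on [−1, 1] we have b = 1 and v = n/3, and the simulated tails of Σ_i X_i indeed sit below exp(−t²/(2(v + bt/3))).

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 30, 100_000

# X_i ~ Uniform[-1, 1]: centered, |X_i| <= b = 1, v = sum_i E[X_i^2] = n/3
b, v = 1.0, n / 3.0
sums = rng.uniform(-1.0, 1.0, size=(trials, n)).sum(axis=1)

ts = np.array([3.0, 6.0, 9.0])
emp_tail = np.array([(sums >= t).mean() for t in ts])
# Bernstein bound: P(sum_i (X_i - E X_i) >= t) <= exp(-t^2 / (2 (v + b t / 3)))
bernstein = np.exp(-(ts**2) / (2.0 * (v + b * ts / 3.0)))
```

For each threshold t the empirical tail probability is well below the Bernstein bound, as the theorem guarantees.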
Theorem 9 ([7, Corollary 2.11]). Let X_1, . . . , X_n be independent random variables with |X_i| ≤ b for all i, and let v = Σ_{i=1}^n E[X_i²]. Then for any t ≥ 0,

P( Σ_{i=1}^n (X_i − E[X_i]) ≥ t ) ≤ exp( −t² / (2(v + bt/3)) ).

Note that replacing X_i with −X_i gives us the lower tail estimate. Next, we will use a recent, sharper version of the Hanson–Wright inequality due to Bellec [3] for the concentration of random quadratic forms. We state a shorter version of this theorem, sufficient for our purposes.

Theorem 10 ([3, Theorem 3]). Let ξ := (ξ_1, . . . , ξ_n)ᵀ be centered, independent (real-valued) random variables with ν_i² = E[ξ_i²], satisfying for some K > 0 the Bernstein condition

E|ξ_i|^p ≤ (p!/2) ν_i² K^{p−2} for all p ≥ 3.

For any real matrix A ∈ ℝ^{n×n} and any x > 0, we have with probability at least 1 − e^{−x} that

ξᵀAξ − E[ξᵀAξ] ≤ 256 K² ‖A‖ x + 8√2 K ‖A D_ν‖_F √x,

where D_ν := diag(ν_1, . . . , ν_n). Bounded random variables satisfy the Bernstein condition with K of the order of their bound, which will be the case in our setting. Note that replacing A with −A in Theorem 10 gives us the lower tail estimate.

Proof of Proposition 2.
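Before diving in, the per-coordinate variance computation (B.3) used in the proof below can be verified numerically. A sketch (assuming z_i = h_i e^{ι2πη_i} with η_i ∼ N(0, σ²); the angle φ of the unit-modulus coordinate h_i is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, phi, trials = 0.15, 0.7, 400_000

h = np.exp(1j * phi)                              # one unit-modulus coordinate
eta = rng.normal(0.0, sigma, size=trials)
zR = np.real(h * np.exp(1j * 2 * np.pi * eta))    # real part of z_i

q2 = np.exp(-4 * np.pi**2 * sigma**2)             # e^{-4 pi^2 sigma^2}
q4 = np.exp(-8 * np.pi**2 * sigma**2)             # e^{-8 pi^2 sigma^2}
hR, hI = np.cos(phi), np.sin(phi)

# (B.3): Var(z_{R,i}) = (h_R^2/2)(1 - e^{-4 pi^2 s^2})^2 + (h_I^2/2)(1 - e^{-8 pi^2 s^2})
theo_var = 0.5 * hR**2 * (1 - q2) ** 2 + 0.5 * hI**2 * (1 - q4)
emp_var = zR.var()
```

The empirical variance matches the closed form, and the claimed upper bound 1 − e^{−4π²σ²} holds regardless of φ.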
We will denote z̄ = E[z] (= e^{−2π²σ²} h), and for any v ∈ ℂⁿ we write v = v_R + ι v_I.

(1) Proof of (i). Note that

z*UUᵀz = (z − z̄)*UUᵀ(z − z̄) + 2 Re((z − z̄)*UUᵀ z̄) + e^{−4π²σ²} h*UUᵀh,   (B.1)

and so we will focus on lower bounding the first two terms on the RHS. In particular, we will bound the first term using Theorem 10 (with A = −UUᵀ) and the second term using Theorem 9.

(1a) Bounding the first term in (B.1). Since

(z − z̄)*UUᵀ(z − z̄) = (z_R − z̄_R)ᵀUUᵀ(z_R − z̄_R) + (z_I − z̄_I)ᵀUUᵀ(z_I − z̄_I),   (B.2)

we will bound each of the two terms in (B.2) using Theorem 10 with A = −UUᵀ. In fact, we will do this only for the first term, since the bound on the other term follows in an analogous manner. To this end, note that E[z_{R,i}] = e^{−2π²σ²} h_{R,i} and also

E[z_{R,i}²] = h_{R,i}² (1 + e^{−8π²σ²})/2 + h_{I,i}² (1 − e^{−8π²σ²})/2.

Then a simple calculation reveals the bound

ν_{R,i}² := E[z_{R,i}²] − (E[z_{R,i}])² = (h_{R,i}²/2)(1 − e^{−4π²σ²})² + (h_{I,i}²/2)(1 − e^{−8π²σ²}) ≤ 1 − e^{−4π²σ²}.   (B.3)

Denoting D_{ν,R} := diag(ν_{R,1}, . . . , ν_{R,n}), observe that

‖UUᵀ D_{ν,R}‖_F = ‖Qᵀ D_{ν,R}‖_F ≤ √k ‖D_{ν,R}‖ ≤ √(k(1 − e^{−4π²σ²})),

where the last inequality uses (B.3). Since |z_{R,i} − z̄_{R,i}| ≤ 2 for each i, we obtain from Theorem 10 that with probability at least 1 − e^{−x},

(z_R − z̄_R)ᵀUUᵀ(z_R − z̄_R) ≥ E[(z_R − z̄_R)ᵀUUᵀ(z_R − z̄_R)] − (1024 x + 16√2 √(k(1 − e^{−4π²σ²})) √x).

The same analysis holds for the other term in (B.2). Hence, plugging these estimates into (B.2), using Proposition 1, and setting x = 2 log n, we obtain with probability at least 1 − 2n^{−2} that

(z − z̄)*UUᵀ(z − z̄) ≥ k(1 − e^{−4π²σ²}) − 4096 log n − 64 √(k(1 − e^{−4π²σ²}) log n).   (B.4)

(1b) Bounding the second term in (B.1). We can write

Re((z − z̄)*UUᵀ z̄) = (z_R − z̄_R)ᵀQQᵀ z̄_R + (z_I − z̄_I)ᵀQQᵀ z̄_I,   (B.5)

so we will bound each of the two terms in (B.5) by Bernstein's inequality. In particular, we will only do this for the first term, since the other term can be bounded analogously. To this end, denoting u_R = QQᵀ z̄_R, we have

(z_R − z̄_R)ᵀQQᵀ z̄_R = Σ_{i=1}^n (z_{R,i} − z̄_{R,i}) u_{R,i} =: Σ_{i=1}^n X_{R,i},

where the X_{R,i} are bounded uniformly as

|X_{R,i}| ≤ 2 ‖u_R‖_∞ ≤ 2 e^{−2π²σ²} ‖QQᵀ h_R‖_∞ =: b_{max,R}; i = 1, . . . , n,

and the variance term satisfies

Σ_{i=1}^n E[X_{R,i}²] ≤ (1 − e^{−4π²σ²}) ‖u_R‖² = (1 − e^{−4π²σ²}) e^{−4π²σ²} ‖QQᵀ h_R‖² =: v_{max,R}.

Now using Theorem 9 with v = v_{max,R}, b = b_{max,R}, and t = (2/3) log n (b_{max,R} + √(b_{max,R}² + 9 v_{max,R})), we obtain

P( −(z_R − z̄_R)ᵀQQᵀ z̄_R ≥ (2/3) log n (b_{max,R} + √(b_{max,R}² + 9 v_{max,R})) ) ≤ n^{−2}.   (B.6)

Similarly, one can show that

P( −(z_I − z̄_I)ᵀQQᵀ z̄_I ≥ (2/3) log n (b_{max,I} + √(b_{max,I}² + 9 v_{max,I})) ) ≤ n^{−2}   (B.7)

with b_{max,I} = 2 e^{−2π²σ²} ‖QQᵀ h_I‖_∞ and v_{max,I} = (1 − e^{−4π²σ²}) e^{−4π²σ²} ‖QQᵀ h_I‖². Therefore, combining (B.6) and (B.7) in (B.5), we have with probability at least 1 − 2n^{−2} that

Re((z − z̄)*UUᵀ z̄) ≥ −(2/3) log n ( b_{max,R} + b_{max,I} + √(b_{max,R}² + 9 v_{max,R}) + √(b_{max,I}² + 9 v_{max,I}) )
≥ −(2/3) log n ( 8 e^{−2π²σ²} ‖QQᵀ h‖_∞ + 3(√v_{max,R} + √v_{max,I}) )
≥ −(2/3) log n ( 8 e^{−2π²σ²} ‖QQᵀ h‖_∞ + 6 √(1 − e^{−4π²σ²}) e^{−2π²σ²} ‖QQᵀ h‖ )
≥ −6 e^{−2π²σ²} log n ( ‖QQᵀ h‖_∞ + √(1 − e^{−4π²σ²}) ‖QQᵀ h‖ ).   (B.8)

Hence, plugging (B.4) and (B.8) into (B.1) and applying the union bound, we obtain the statement of part (i) of the proposition after a slight simplification involving the constants.

(2) Proof of (ii). This follows in an identical manner as (B.4), by using Theorem 10 with A = UUᵀ.

(3) Proof of (iii). Observe that

‖z − h‖² = 2n − 2 Re(z*h) = 2n(1 − e^{−2π²σ²}) − 2((z_R − z̄_R)ᵀ h_R + (z_I − z̄_I)ᵀ h_I).   (B.9)

We will bound (z_R − z̄_R)ᵀ h_R from above via Theorem 9; the same bound will hold for the term (z_I − z̄_I)ᵀ h_I, which then yields the stated bound in the proposition. To this end, note that

(z_R − z̄_R)ᵀ h_R = Σ_{i=1}^n (z_{R,i} − z̄_{R,i}) h_{R,i} =: Σ_{i=1}^n X_{R,i},

where the X_{R,i} are bounded uniformly as |X_{R,i}| ≤ 2 for each i, and the variance term satisfies

Σ_{i=1}^n E[X_{R,i}²] ≤ (1 − e^{−4π²σ²}) ‖h_R‖² ≤ (1 − e^{−4π²σ²}) n =: v_max.

Then applying Theorem 9 with v = v_max, b = 2, and t = (2/3) log n (2 + √(4 + 9 v_max)) yields

P( (z_R − z̄_R)ᵀ h_R ≤ −(2/3) log n (4 + 3 √(n(1 − e^{−4π²σ²}))) ) ≤ n^{−2}.   (B.10)

The same bound holds for the term (z_I − z̄_I)ᵀ h_I as well; hence, plugging these bounds into (B.9), together with the union bound on the success probability, yields the statement of part (iii).

(4) Proof of (iv). The proof is along the lines of that for part (iii). Observe that

‖z − e^{−2π²σ²} h‖² = n + e^{−4π²σ²} n − 2 e^{−2π²σ²} Re(z*h)
= n + e^{−4π²σ²} n − 2 e^{−2π²σ²} Re((z − z̄)*h) − 2 e^{−4π²σ²} n
= n(1 − e^{−4π²σ²}) − 2 e^{−2π²σ²} ((z_R − z̄_R)ᵀ h_R + (z_I − z̄_I)ᵀ h_I).   (B.11)

Then, bounding the terms (z_R − z̄_R)ᵀ h_R and (z_I − z̄_I)ᵀ h_I as in (B.10), and plugging these bounds into (B.11), we obtain the statement of part (iv) after the simplification e^{−2π²σ²} ≤ 1.

C Proof of Corollary 8
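The closed-form path-graph eigenvalues recalled below (from Corollary 2) can be checked directly. A minimal sketch (the path graph P_n is built explicitly; `numpy.linalg.eigvalsh` returns the spectrum of the unnormalized Laplacian):

```python
import numpy as np

n = 12
# Unnormalized Laplacian of the path graph P_n: degree matrix minus adjacency
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
L = np.diag(A.sum(axis=1)) - A

evals = np.sort(np.linalg.eigvalsh(L))  # ascending order
# Closed form: eigenvalues 4 sin^2(pi k / (2n)) for k = 0, ..., n-1
closed = np.sort(4 * np.sin(np.pi * np.arange(n) / (2 * n)) ** 2)
```

The two spectra agree to machine precision; in particular, the smallest nonzero eigenvalue is 4 sin²(π/(2n)) ≈ π²/n², matching the n^{−2} scaling used below.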
Recall from Corollary 2 that λ_j = 4 sin²(π(n−j)/(2n)) for j = 1, . . . , n. Hence, for k = 1, . . . , n−1,

λ_{n−k} = 4 sin²(πk/(2n)) ≍ k²/n²  and  λ_{n−k+1} = 4 sin²(π(k−1)/(2n)) ≍ (k−1)²/n²,

where we see that λ_{n−k+1} < λ_{n−k} for each k. In particular, λ_{n−1} = λ_min ≍ n^{−2}. Also recall that λ̄ ≍ n^{−2(1−θ)} for θ ∈ [0, 1), which implies |L_{λ̄}| ≲ n^{θ}.

Plugging the above bounds into the error bound in Theorem 7, we obtain

‖ĝ/|ĝ| − h‖ ≲ σ n^{−(1−θ)} (M + (k−1)²/(n² M)) + σ (n^{θ} + √(n^{θ} log n)) + σ √(n + log n) + M² n/k².   (C.1)

Setting k = ⌊n^{2/3}⌋ in (C.1) simplifies the bound to

‖ĝ/|ĝ| − h‖ ≲ σ n^{−(1−θ)} (M + 1/M) + σ (n^{θ} + √(n^{θ} log n)) + σ √(n + log n) + M² n^{−1/3}.   (C.2)

Note that the choice θ = 2/3 balances the powers of n in the first two terms in (C.2). For this choice of θ, we can see that (C.2) simplifies to the stated error bound in the first part of the Corollary when n ≳ 1. For k = ⌊n^{2/3}⌋ and θ = 2/3, we have λ_{n−k}, λ_{n−k+1} ≍ n^{−2/3}, λ̄ ≍ n^{−2/3} and |L_{λ̄}| ≲ n^{2/3}. Then

min{ n λ_{n−k}, n λ̄ } ≍ min{ n^{1/3}, n^{1/3} } = n^{1/3},

and since B_n ≍ M²/n, the condition B_n ≲ min{ n λ_{n−k}, n λ̄ } is ensured if n ≳ M^{3/2}. The condition on σ in the first part follows readily by applying the above bounds to the conditions on σ in Theorem 7. This completes the proof of the first part of Corollary 8.

The statement of the second part follows in a straightforward manner upon applying the above considerations to Theorem 8. We only remark that if n ≳ 1, then the condition

1 + |L_{λ̄}| + √((1 + |L_{λ̄}|) log n) ≲ εn

is ensured provided n^{2/3} ≲ εn, or equivalently n ≳ (1/ε)³ (this subsumes the requirement n ≳