Geometry of error amplification in solving Prony system with near-colliding nodes
ANDREY AKINSHIN, GIL GOLDMAN, AND YOSEF YOMDIN
Abstract.
We consider a reconstruction problem for “spike-train” signals $F$ of an a priori known form $F(x)=\sum_{j=1}^{d}a_j\delta(x-x_j)$, from their moments $m_k(F)=\int x^kF(x)\,dx$. We assume that the moments $m_k(F)$, $k=0,1,\ldots,2d-1$, are known with an absolute error not exceeding $\epsilon>0$, so the reconstruction amounts to approximately solving the system $\sum_{j=1}^{d}a_jx_j^k=m_k(F)$, $k=0,1,\ldots,2d-1$. We study the “geometry of error amplification” in the reconstruction of $F$ from $m_k(F)$, in situations where the nodes $x_1,\ldots,x_d$ near-collide, i.e. form a cluster of size $h\ll 1$. We show that in this case error amplification is governed by certain algebraic varieties in the parameter space of signals $F$, which we call the “Prony varieties”. Based on this we produce lower and upper bounds, of the same order, on the worst-case reconstruction error. In addition we derive separate lower and upper bounds on the reconstruction of the amplitudes and of the nodes. Finally, we discuss how to use the geometry of the Prony varieties to improve the reconstruction accuracy given additional a priori information.
1. Introduction
The problem of reconstruction of spike-trains, and of similar signals, from noisy moment measurements, and the closely related problem of robustly solving the classical Prony system, is a prominent problem in Mathematics and Engineering. Here we consider the case when the nodes nearly collide, which is well known to present major mathematical difficulties, and is closely related to the spike-train super-resolution problem (see [12], [24] as a small sample). The aim of the present paper is to describe the patterns of amplification of the measurement error $\epsilon$ in the reconstruction process, caused by the geometric nature of the Prony system, independently of the specific method of its inversion. We concentrate, following the line of [1, 2], on the “simplest non-trivial case”, where the nodes of a spike-train signal $F$ form a single cluster of size $h\ll 1$, with $d$ nodes, while the measurements are the first $2d$ real moments of $F$.

Assume that our signal $F(x)$ is a spike-train, i.e. a linear combination of $d$ shifted $\delta$-functions:
$$(1.1)\qquad F(x)=\sum_{j=1}^{d}a_j\,\delta(x-x_j),$$
where $a=(a_1,\ldots,a_d)\in\mathbb{R}^d$, $x=(x_1,\ldots,x_d)\in\mathbb{R}^d$. We assume that the form (1.1) is a priori known, but the specific parameters $(a,x)$ are unknown. Our goal is to reconstruct $(a,x)$ from the $2d$ moments $m_k(F)=\int_{-\infty}^{\infty}x^kF(x)\,dx$, $k=0,\ldots,2d-1$, which are known with a possible error bounded by $\epsilon>0$. The moments $m_k(F)$ are expressed through the unknown parameters $(a,x)$ as $m_k(F)=\sum_{j=1}^{d}a_jx_j^k$. Hence our reconstruction problem is equivalent to solving the Prony system of algebraic equations, with the unknowns $a_j,x_j$:
$$(1.2)\qquad \sum_{j=1}^{d}a_jx_j^k=\mu'_k,\quad |\mu'_k-m_k(F)|\le\epsilon,\quad k=0,1,\ldots,2d-1.$$
This system and its various extensions and generalizations appear in many theoretical and applied problems (see [29, 4, 30, 25, 26, 27, 31] and references therein).

We shall denote by $P=P_d$ the parameter space of signals $F$,
$$P_d=\{(a,x)=(a_1,\ldots,a_d,x_1,\ldots,x_d)\in\mathbb{R}^{2d},\;x_1<x_2<\ldots<x_d\},$$
and by $M=M_d\cong\mathbb{R}^{2d}$ the moment space, consisting of the $2d$-tuples of the moments $(m_0,m_1,\ldots,m_{2d-1})$. We will identify signals $F$ with their parameters $(a,x)\in P$.

Let a signal $F$ as above be fixed. The main object we study in this paper is the $\epsilon$-error set $E_\epsilon(F)$, consisting of all signals $F'(x)$ which may appear as the reconstruction of $F$ from the noisy moment measurements $\mu'_k$ with $|\mu'_k-m_k(F)|\le\epsilon$, $k=0,\ldots,2d-1$.

Mathematics Subject Classification. Primary 65H10, 94A12, 65J22.
Key words and phrases. Signal reconstruction, spike-trains, Fourier transform, Prony systems, sparsity.

Definition 1.1.
The error set $E_\epsilon(F)\subset P$ is the set consisting of all the signals $F'(x)\in P$ with $|m_k(F')-m_k(F)|\le\epsilon$, $k=0,\ldots,2d-1$.

Our ultimate goal is a detailed understanding of the geometry of the error set $E_\epsilon(F)$ in the various cases where the nodes of $F$ near-collide, and applying this information in order to improve the reconstruction accuracy. The results presented here describe the geometry of the error set of a single cluster, which we will show to have very different scales of magnitude in certain directions. For this purpose consider the following definition of a cluster configuration.

For a signal $F\in P_d$ we denote by $I_F=[x_1,x_d]$ the minimal interval in $\mathbb{R}$ containing all the nodes $x_1,\ldots,x_d$. We put $h(F)=\frac{1}{2}(x_d-x_1)$, the half-length of $I_F$, and $\kappa(F)=\frac{1}{2}(x_1+x_d)$, the central point of $I_F$.

Definition 1.2 (Regular cluster). For $F=(a,x)\in P_d$ with $h=h(F)\le 1$, $\kappa=\kappa(F)$, $m,M$ such that $0<m\le M$, and $\eta>0$, we say that $F$ forms an $(h,\kappa,\eta,m,M)$-regular cluster if its amplitudes $a_1,\ldots,a_d$ satisfy $m\le|a_j|\le M$, $j=1,\ldots,d$, and the distance between any neighboring nodes $x_j,x_{j+1}$, $j=1,\ldots,d-1$, is at least $\eta h$.

Consider the following “normalization” applied to signals $F$ forming an $(h,\kappa,\eta,m,M)$-regular cluster: shifting the interval $I_F$ to have its center at the origin, and then rescaling $I_F$ to the interval $[-1,1]$. For $\kappa\in\mathbb{R}$ and $h>0$ this is the mapping $\Psi_{\kappa,h}:P_d\to P_d$, defined by $(a,x)\mapsto(a,\bar{x})$, with $\bar{x}=(\bar{x}_1,\ldots,\bar{x}_d)$, $\bar{x}_j=\frac{1}{h}(x_j-\kappa)$, $j=1,\ldots,d$.
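The cluster geometry $h(F),\kappa(F)$ and the normalization $\Psi_{\kappa,h}$ translate directly into code. The following is a minimal numerical sketch of ours (the test signal is an arbitrary illustrative choice, not an example from the paper):

```python
import numpy as np

def cluster_geometry(x):
    """Half-length h(F) and center kappa(F) of the minimal interval I_F
    containing the sorted nodes x_1 < ... < x_d."""
    x = np.sort(np.asarray(x, dtype=float))
    h = 0.5 * (x[-1] - x[0])
    kappa = 0.5 * (x[0] + x[-1])
    return h, kappa

def normalize(a, x, kappa, h):
    """The normalization Psi_{kappa,h}: amplitudes unchanged,
    nodes mapped to x_bar_j = (x_j - kappa) / h."""
    a = np.asarray(a, dtype=float)
    x = np.asarray(x, dtype=float)
    return a, (x - kappa) / h

# Example: a cluster of d = 2 nodes of size h = 0.05 centered at kappa = 0.3.
a = np.array([1.0, 1.0])
x = np.array([0.25, 0.35])
h, kappa = cluster_geometry(x)
a_bar, x_bar = normalize(a, x, kappa, h)
print(h, kappa, x_bar)   # model nodes land at -1 and 1
```

By construction the model signal $G=\Psi_{\kappa,h}(F)$ always has $h(G)=1$ and $\kappa(G)=0$, which is what the printed model nodes $\pm 1$ reflect.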
For a given signal $F$ with $h=h(F)$ and $\kappa=\kappa(F)$, we call the signal $G=\Psi_{\kappa,h}(F)$ the model signal for $F$. Clearly, $h(G)=1$ and $\kappa(G)=0$. Explicitly, $G$ is written as $G(x)=\sum_{j=1}^{d}a_j\,\delta(x-\bar{x}_j)$. With a certain misuse of notation, we will denote the space $P_d$ containing the model signals $G$ by $\bar{P}_d$, and call it “the model space”. For a given $F\in P_d$ with the model signal $G=\Psi_{\kappa,h}(F)$, we denote by $\bar{E}_\epsilon(F)$ the “normalized” error set: $\bar{E}_\epsilon(F)=\Psi_{\kappa,h}(E_\epsilon(F))$.

Let $F$ form a cluster of size $h\ll 1$, while inside the cluster the nodes are well separated from one another. The reason for mapping such a signal $F$ into the model space is that in this case, as we will show, the moment coordinates centered at $F$, $(m_0(F')-m_0(F),\ldots,m_{2d-1}(F')-m_{2d-1}(F))$, are “stretched” in some directions, up to an order of $(1/h)^{2d-1}$. In contrast, the coordinate system $(m_0(G')-m_0(G),\ldots,m_{2d-1}(G')-m_{2d-1}(G))$ is bi-Lipschitz equivalent to the standard coordinates $(a,\bar{x})$ of $\bar{P}_d$, in a neighborhood of order $h^{2d-1}$ around $G$ (quantitative inverse function theorem, see Theorem 4.1 below).

Below we describe the geometry of the error set $\bar{E}_\epsilon(F)$ in the associated model space $\bar{P}_d$, which turns out to have distinct scaling along certain algebraic curves. Note that $\bar{E}_\epsilon(F)$ is simply a translated and rescaled version of $E_\epsilon(F)$ in the node coordinates. Hence, the description of $\bar{E}_\epsilon(F)$ directly describes $E_\epsilon(F)$ via the inverse transformations. Now we define the “Prony varieties”, which are just coordinate subspaces of different dimensions with respect to the moment coordinates.

Definition 1.3.
For each $q=0,\ldots,2d-1$, and $\mu=(\mu_0,\ldots,\mu_q)$, the Prony variety $S_q(\mu)$ is an algebraic variety in the parameter space $P_d$, defined by the first $q+1$ equations of the Prony system (1.2):
$$(1.4)\qquad \sum_{j=1}^{d}a_jx_j^k=\mu_k,\quad k=0,1,\ldots,q.$$
For a signal $F\in P_d$ and $\mu=(m_0(F),m_1(F),\ldots,m_q(F))$ we will denote by $S_q(F)$ the variety $S_q(\mu)$.

For a fixed signal $F$ and decreasing $q$, the Prony varieties $S_q(F)$ form an increasing chain of algebraic varieties in $P$:
$$F=S_{2d-1}(F)\subset S_{2d-2}(F)\subset\ldots\subset S_1(F)\subset S_0(F)\subset P.$$
Generically, $S_q(F)$ is a smooth subvariety of dimension $2d-q-1$. In particular, $S_{2d-1}(F)$ is the solution of the Prony system (1.2), while $S_{2d-2}(F)$ is a regular curve passing through $F$.

From the point of view of Algebraic Geometry, Singularity Theory and Number Theory, Prony varieties present an object highly important in itself. They are closely related to the Vandermonde varieties, introduced by Arnold and coauthors in [3], and to the geometry of hyperbolic polynomials ([18] and references therein). Understanding the geometry of Prony varieties is closely related to the “rank stratification” of the space of matrices, which is a central topic in Algebraic Geometry and Singularity Theory. In this paper we almost completely ignore this point of view, referring the interested reader to [16] and references therein. Notice that the appearance of Algebraic Geometry in the study of Prony systems is certainly not new (compare [21, 19] and references therein, in the case of multi-dimensional Prony systems). However, Prony varieties in the case of one-dimensional spike trains probably were not investigated earlier.

(For measurement error $\epsilon$ of order greater than $h^{2d-1}$, the Prony reconstruction becomes much more complicated. In particular, singularities of various types appear; see [9].)
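The generic dimension $2d-q-1$ of $S_q(F)$ can be checked numerically: at a signal with distinct nodes and nonzero amplitudes, the Jacobian of the first $q+1$ moment equations in the variables $(a,x)$ has full rank $q+1$, so the fiber $S_q$ has codimension $q+1$ in $\mathbb{R}^{2d}$. A minimal sketch of ours (the $d=3$ test signal is an arbitrary choice):

```python
import numpy as np

def moment_jacobian(a, x, n_moments):
    """Rows k = 0..n_moments-1 of the Jacobian of (a, x) -> m_k = sum_j a_j x_j^k:
    d m_k / d a_j = x_j^k,  d m_k / d x_j = k a_j x_j^(k-1)."""
    a, x = np.asarray(a, float), np.asarray(x, float)
    d = len(a)
    J = np.zeros((n_moments, 2 * d))
    for k in range(n_moments):
        J[k, :d] = x ** k
        if k > 0:
            J[k, d:] = k * a * x ** (k - 1)
    return J

# A d = 3 spike train with well-separated nodes and nonzero amplitudes.
a = np.array([1.0, -2.0, 0.5])
x = np.array([-0.7, 0.1, 0.8])
d = 3
for q in range(2 * d):
    rank = np.linalg.matrix_rank(moment_jacobian(a, x, q + 1))
    # dim S_q(F) = 2d - (q + 1) = 2d - q - 1 at a regular point
    assert rank == q + 1
print("generic dimensions 2d - q - 1 confirmed for q = 0 .. 2d-1")
```

Since the full $2d\times 2d$ Jacobian is invertible at such a regular signal (cf. Theorem 4.1 below), every truncation to the first $q+1$ rows automatically has full row rank, which is exactly what the loop verifies.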
Let the nodes $x_1,\ldots,x_d$ of $F$ form a cluster of size $h\ll 1$, and let $G=\Psi_{\kappa,h}(F)$ be the model signal of $F$. Informally, our main results in the case of $\epsilon$ of order $h^{2d-1}$ or less are the following:

(1) In Section 5, Theorem 5.1 and Theorem 5.2, we describe the geometry of the error set $\bar{E}_\epsilon(F)$. It is shown that the Prony varieties provide the “principal components” of the error set in the following sense: for each $q=2d-1,\ldots,0$, $\bar{E}_\epsilon(F)$ is contained within a neighborhood of size $\sim(1/h)^q\,\epsilon$ of the Prony variety $S_q(G)$. Put differently, the width of $\bar{E}_\epsilon(F)$ in the direction of the model moment coordinate $m_k(G')-m_k(G)$, $k=0,\ldots,2d-1$, is of order $h^{-k}\epsilon$. See Figures 1 and 2 below.

(2) In Section 6, Theorems 6.1 and 6.2, we use the above result to derive lower and upper bounds, of the same order, on the worst-case reconstruction error. We show that:
The worst-case reconstruction error, $\rho(F,\epsilon)=\max_{F'\in E_\epsilon(F)}\|F'-F\|$, is of order $\epsilon h^{-2d+1}$.
The worst-case reconstruction error of the amplitudes, $\rho_a(F,\epsilon)=\max_{F'=(a',x')\in E_\epsilon(F)}\|a'-a\|$, is of order $\epsilon h^{-2d+1}$.
The worst-case reconstruction error of the nodes, $\rho_x(F,\epsilon)=\max_{F'=(a',x')\in E_\epsilon(F)}\|x'-x\|$, is of order $\epsilon h^{-2d+2}$.
We stress that reconstructions $F'$ with reconstruction errors as above cannot occur everywhere: they fall into a small neighborhood of the Prony curve $S_{2d-2}(F)$. This fact is used in Section 3 to improve the reconstruction accuracy (see item 4 below).

Our next result concerns the accuracy of reconstruction of the Prony varieties $S_q(F)$:

(3) While the pointwise worst-case reconstruction error of the signal $F$ is of order $\epsilon h^{-2d+1}$, the curve $S_{2d-2}(F)$ itself can be reconstructed with a better accuracy of order $\epsilon h^{-(2d-2)}$. The “hierarchy of the accuracy rates” is continued along the chain $S_{2d-2}(F)\subset\ldots\subset S_0(F)$ of the Prony varieties $S_q(F)$: each $S_q(F)$ can be reconstructed with an accuracy of order $\epsilon h^{-q}$. See Theorem 6.4.

Based on the theory developed in Sections 5 and 6 we conclude our results with the following fact:

(Throughout this text we assume that the Prony inversion (when possible) is accurate, and that the reconstruction error is caused only by the measurement error. Moreover, we will always assume below that all the “algebraic-geometric” operations with the known parameters are performed accurately. Specifically, this concerns constructing certain algebraic curves and higher-dimensional varieties. Of course, such algorithmic constructions in Computational Algebraic Geometry may present well-known difficulties, but in the present paper we do not touch this topic.)
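The node-error rate $\epsilon h^{-2d+2}$ of item (2) can be observed already for $d=2$ with a hand-rolled classical Prony solver. The sketch below is our own illustration (not code from the paper): it perturbs the highest moment of a size-$h$ cluster by $\epsilon$ and measures the resulting node displacement; the choices $h=0.01$, $\epsilon=10^{-9}$ are arbitrary.

```python
import numpy as np

def moments(a, x, n):
    """First n moments m_k = sum_j a_j x_j^k of a spike train (a, x)."""
    return np.array([np.sum(a * x ** k) for k in range(n)])

def prony_solve(mu, d):
    """Classical Prony method: the nodes are the roots of
    P(z) = z^d + c_{d-1} z^{d-1} + ... + c_0, where the c_i solve the
    Hankel system  sum_i c_i mu_{k+i} = -mu_{k+d},  k = 0..d-1."""
    H = np.array([[mu[k + i] for i in range(d)] for k in range(d)])
    c = np.linalg.solve(H, -mu[d:2 * d])
    x = np.sort(np.roots(np.concatenate(([1.0], c[::-1]))).real)
    V = np.vander(x, d, increasing=True).T    # V[k, j] = x_j^k
    a = np.linalg.solve(V, mu[:d])
    return a, x

d, h, eps = 2, 0.01, 1e-9
a_true = np.array([1.0, 1.0])
x_true = np.array([-h, h])                    # cluster of size h
mu = moments(a_true, x_true, 2 * d)
mu_noisy = mu.copy()
mu_noisy[-1] += eps                           # perturb m_{2d-1} by epsilon
_, x_rec = prony_solve(mu_noisy, d)
node_err = np.max(np.abs(x_rec - x_true))
print(node_err / eps)                         # amplification factor, here ~ 1/(4 h^2)
```

For this configuration the perturbation of a single moment by $\epsilon$ moves the recovered nodes by roughly $\epsilon/(4h^2)$, i.e. by an $h^{-2d+2}$-type amplification, in line with $\rho_x(F,\epsilon)\sim\epsilon h^{-2d+2}$.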
Figure 1. The projections of the error set $\bar{E}_\epsilon(F)$ and a section of the Prony curve $S_2(G)$, for $F=\delta(x+0.1)+\delta(x-0.1)$, $h=0.1$, $\epsilon=h^3$ and $G=\Psi_{0,h}(F)$.

Figure 2.
The projections of the error set ¯ E ǫ ( F ) and a section of the Prony curve S ( G ), for F = δ ( x + 0 .
05) + δ ( x − . h = 0 . ǫ = h and G = Ψ ,h ( F ).Note the convergence of ¯ E ǫ ( F ) into S ( G ) as the cluster size reduces. A. AKINSHIN, G. GOLDMAN, AND Y. YOMDIN (4)
If certain additional a priori information is available on the signal $F$, the reconstruction accuracy can be significantly improved via the following procedure: first we reconstruct the Prony variety $S_q(F)$ for a certain appropriate $q$. The accuracy of this reconstruction (of order $\epsilon h^{-q}$) is higher than that of a single-point solution. Then we use the additional information available in order to accurately localize the signal $F$ inside the Prony variety $S_q(F)$. In Section 3 we demonstrate this procedure and show how it can improve the reconstruction accuracy with respect to the Prony method.

Remark: Consider the case of $\epsilon$ of order greater than $h^{2d-1}$. Our approach, based on the regularity of the moment coordinates, does not apply here, since for large errors the reconstruction encounters singularities. We do not study this case here; however, the Prony varieties $S_q$, being algebraic objects that are defined globally, remain a relevant tool in studying error amplification and collision singularities at much larger scales (see [9]).

This paper is organized as follows. In Section 2 we discuss related settings and results. In particular, we explain in detail the connection between the results of the present paper and the case of Fourier measurements / the super-resolution setting (in particular, with $N\gg d$ or continuous samples), and the possible extensions to the case of several clusters. In Section 3 we show possible applications of our results to improving the reconstruction accuracy of the Prony method. We provide a simple example, supported by numerical simulations, where taking into account the Prony varieties significantly improves the reconstruction accuracy. Sections 4–6 are devoted to the accurate statement of the results and their proofs. In Section 4 we introduce the “Prony mapping”, and study its inversion via a “quantitative inverse function theorem”. In Section 5 our main results on the geometry of the error set are stated and proved.
In Section 6 we derive, based on the previous section, tight estimates on the worst-case reconstruction error. Finally, in Appendix A, we prove a specific form of the quantitative inverse function theorem, giving explicit expressions for the constants used in the text.

The research of GG and YY is supported in part by the Minerva Foundation. The authors would like to thank the referees for suggesting significant improvements in the presentation.
2. Related work and discussion
As was already mentioned in the Introduction, in the present paper we concentrate on a rather restricted case of the spike-train reconstruction problem. First, we take the real moments as the measurements (instead of the much more common and natural Fourier samples). Second, we take exactly $2d$ moment measurements (instead of $N\gg d$ moments or Fourier samples). Finally, we assume that the nodes of $F$ form exactly one cluster, instead of the more general configuration of several clusters. The main reason for us to insist on this setting is that it presents, in a relatively compact form, the most essential patterns of the error amplification in multi-cluster moment/Fourier spike-train reconstruction. We discuss this fact in detail in Subsections 2.1, 2.2, 2.3 below.

In this subsection we outline the tight connection between the super-resolution problem, where the measurements are Fourier samples, and the results of the present paper about moment reconstruction. In fact, up to constants, the error
set in the case of Fourier measurements is described by exactly the same moment inequalities as in the present paper.

For a signal $F$ of the form (1.1), let $\mathcal{F}(F)$ denote the Fourier transform of $F$:
$$\mathcal{F}(F)(s)=\int_{-\infty}^{\infty}F(x)e^{-2\pi ixs}\,dx=\sum_{j=1}^{d}a_je^{-2\pi ix_js}.$$
In a super-resolution setting, it is frequently assumed that the measurements for the reconstruction of $F$ are given as a function $\Phi$ satisfying
$$(2.1)\qquad |\Phi(s)-\mathcal{F}(F)(s)|\le\epsilon,\quad s\in[-\Omega,\Omega],$$
where $\epsilon>0$ is the measurement error and $\Omega>0$ is the bandwidth. Similarly to the $\epsilon$-error set of Definition 1.1, we define the Fourier $\epsilon$-error set as follows.

Definition 2.1.
For $\epsilon,\Omega>0$ and $F\in P_d$, the Fourier error set $E_{\epsilon,\Omega}(F)\subset P_d$ is the set consisting of all the signals $F'(x)\in P_d$ with $|\mathcal{F}(F')(s)-\mathcal{F}(F)(s)|\le\epsilon$, $s\in[-\Omega,\Omega]$.

Similarly to the normalized moment error set, we define the normalized Fourier error set as $\bar{E}_{\epsilon,\Omega}(F)=\Psi_{\kappa(F),h(F)}(E_{\epsilon,\Omega}(F))$. Let $F$ form an $(h,\kappa,\eta,m,M)$-regular cluster, as in the case considered in this paper. Define the super-resolution factor as $SRF=\frac{1}{\Omega h}$.
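For intuition on why $SRF\gg 1$ makes reconstruction hard, one can check directly that two very different spike configurations sharing a small cluster are nearly indistinguishable on $[-\Omega,\Omega]$. The pair below is our own illustrative choice (a two-spike cluster versus its fully collided limit):

```python
import numpy as np

def fourier(a, x, s):
    """F(F)(s) = sum_j a_j exp(-2*pi*i*x_j*s) for a spike train (a, x)."""
    a = np.asarray(a, float)[:, None]
    x = np.asarray(x, float)[:, None]
    return np.sum(a * np.exp(-2j * np.pi * x * s), axis=0)

h, Omega = 1e-3, 10.0                    # SRF = 1/(Omega * h) = 100
s = np.linspace(-Omega, Omega, 2001)
# F: two unit spikes at +-h;  F2: a single spike of total mass 2 at the origin.
diff = np.abs(fourier([1, 1], [-h, h], s) - fourier([2], [0], s))
# Here the gap is 2*(1 - cos(2*pi*h*s)) <= (2*pi*Omega*h)^2 on [-Omega, Omega].
print(diff.max(), (2 * np.pi * Omega * h) ** 2)
```

So at $SRF=100$ the two signals differ on the whole band by less than $(2\pi\Omega h)^2\approx 4\cdot 10^{-3}$, although their node configurations are entirely different; any $\epsilon$ above this level cannot distinguish them.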
The radius of the Fourier error set, or equivalently the worst-case reconstruction error of $F$ in the super-resolution setting (2.1), was shown to be of order $SRF^{2d-1}\epsilon$ (see [1, 7, 6] for the off-grid setting and [11, 12] for the grid setting). If we further assume that at most $l\le d$ nodes of $F$ form a cluster of size $h$, then recent results show that the scaling of the radius of the error set improves to an order of $SRF^{2l-1}\epsilon$ (see [22, 20, 7]).

The Fourier error set and the moment error set are related via the Taylor series expansion of the Fourier transform, which is expressed using the moments as follows (see [1, Proposition 3.1]):
$$(2.2)\qquad \mathcal{F}(F)(s)=\sum_{k=0}^{\infty}\frac{m_k(F)}{k!}\,\tilde{s}^k,\quad\text{where }\tilde{s}=-2\pi is.$$
In fact, it is possible to show that these sets are equivalent in the following sense. Let $F=(a,x)\in P_d$ form an $(h,\kappa,\eta,m,M)$-regular cluster. Then there exist constants $B_1,B_2,B_3,B_4$, depending only on $\eta,d,m,M$, such that for each $\Omega,\epsilon$ satisfying $SRF\ge B_3$ and $0\le\epsilon\le B_4(SRF)^{-2d+1}$, it holds that
$$(2.3)\qquad E_{B_1\epsilon}(\Psi_{\kappa,1/\Omega}(F))\subseteq\Psi_{\kappa,1/\Omega}\big(E_{\epsilon,\Omega}(F)\big)\subseteq E_{B_2\epsilon}(\Psi_{\kappa,1/\Omega}(F)),$$
or equivalently
$$(2.4)\qquad \Psi_{\kappa,1/\Omega}^{-1}\big(E_{B_1\epsilon}(\Psi_{\kappa,1/\Omega}(F))\big)\subseteq E_{\epsilon,\Omega}(F)\subseteq\Psi_{\kappa,1/\Omega}^{-1}\big(E_{B_2\epsilon}(\Psi_{\kappa,1/\Omega}(F))\big).$$
Put differently, for any signal $F'\in P_d$, the Fourier difference is $\epsilon$-small, i.e.
$$\max_{s\in[-\Omega,\Omega]}|\mathcal{F}(F')(s)-\mathcal{F}(F)(s)|\le\epsilon,$$
Figure 3.
The projections of the normalized Fourier error set ¯ E ǫ, Ω ( F ) and a sectionof the Prony curve S ( G ), for F = δ ( x + 0 .
1) + δ ( x − . h = 0 . ǫ = 5 h and G = Ψ ,h ( F ). Compare with Figure 1.if and only if the moments m , . . . , m d − of the centered and scaled by Ω difference signal Ψ κ, ( F ) − Ψ κ, ( F ), are order of ǫ small.The above fact was used in [1, 7] to derive a tight lower bound on the minimax error rate of“off-grid” clustered super resolution that reads SRF l − ǫ , where l is the maximal number of nodesin each cluster (as mentioned earlier in this section).Let F (Ω) = Ψ κ, ( F ) and consider the moment error set E ǫ ( F (Ω) ) = E ǫ (Ψ κ, ( F )). The signal F (Ω) forms an (Ω h, , η, m, M )-regular cluster. The main result of this paper asserts that for each q = 0 , . . . , d −
1, the error sets E B ǫ ( F (Ω) ), E B ǫ ( F (Ω) ) are contained within a neighborhood of size ∼ ( SRF ) q ǫ of the Prony variety S q ( F (Ω) ) (See Theorem 5.1 and Theorem 5.2). Using the formerand (2.4) we get that the Fourier error set E ǫ, Ω ( F ) is also contained within a neighborhood of size ∼ ( SRF ) q ǫ of the Prony variety S q ( F ). See Figure 3.This geometrical structure of the Fourier error set suggests a similar procedure to improve thereconstruction accuracy as we demonstrate for the Prony method in Section 3. We intend to presentresults in this direction in future work. In the present paper, we keep the number of moment measurementsexactly 2 d : this is enough to obtain the correct error asymptotic behavior for the cluster size h ≪ RRORS IN SOLVING PRONY SYSTEM 9
1. Let $F=\sum_{j=1}^{d}a_j\delta(x-x_j)$, $a_j\in\mathbb{C}$, $x_j\in\mathbb{R}$. We apply “decimation” (see [5]), i.e. take exactly $2d$ uniformly spaced Fourier samples, with the step-size $\lambda$ of order $\frac{\Omega}{2d}$. In other words, we use “most of the available bandwidth $\Omega$”, keeping the number of the samples $2d$. As a result we get a Prony system with the nodes $z_j=e^{2\pi i\lambda x_j}$ on the unit circle. Clearly, the size $h$ of any cluster becomes $\lambda h\sim\frac{\Omega h}{2d}$.
2. We show that for “many” values of $\lambda$ no new proximities between the nodes on the circle are created.
3. We apply the approach of the present paper (but with the “quantitative inverse function theorem” extended to the complex spaces), and finally produce the accuracy bounds of the required form, with $h$ replaced by $\Omega h$ (see Section 2.1). This gives a “correct” decay rate of the reconstruction error with respect to the bandwidth $\Omega$.

Available studies of certain high-resolution algorithms such as MUSIC [23], ESPRIT/Matrix Pencil [13], the Approximate Prony Method [28], the multivariate Prony method [19] and others provide rigorous performance guarantees for the case $SRF<1$. We hope that our proof techniques here and in [7] may be used in deriving the stability limits of these and other methods in the super-resolution regime, i.e. for $SRF>1$.

Our description of the error set via the moment inequalities, and of its “skeleton”, provided by the hierarchy of the Prony varieties, extends to spike-train signals forming several clusters. Let $F$ be a signal with the node clusters $Q_1,\ldots,Q_m$, each $Q_s$ being of size $h_s$ and containing $d_s$ nodes, $s=1,\ldots,m$. Denote by $F_s$ the “local signals” corresponding to the clusters $Q_s$. The main fact in this situation is the following:

If the clusters $Q_s$ of $F$ are “well-separated”, in comparison to their size, then the error set of $F$ is, essentially, the Cartesian product of the “local” error sets of the $F_s$, $s=1,\ldots,m$. This holds up to constants depending on the mutual position of the clusters $Q_s$, on their “multiplicities” $d_s$, and on their sizes $h_s$.

This claim follows from the “mutual independence” of the local signals $F_s$ corresponding to the node clusters $Q_s$: the errors in the moments of the local signals $F_s$ cannot cancel in the moments of their sum $F$. This last property is important in many questions far beyond the study of multi-cluster error sets and Prony varieties, and some of its special cases were recently confirmed in the literature (compare [22, 20, 6]). We expect this property to hold in general.

Consequently, the description of the error set using the Prony varieties, given in the present paper for one cluster, also extends to the multi-cluster case via the Cartesian products of the local Prony varieties, as follows. For each $q$, consider the subvariety $\tilde{S}_q$ in the signal space which is the Cartesian product of the “local” Prony varieties $S_q^s$ corresponding to the clusters $Q_s$:
$$\tilde{S}_q=S_q^1\times S_q^2\times\ldots\times S_q^m.$$
We see immediately that the moments up to order $q$ are constant on $\tilde{S}_q$, while the higher moments $m_k$ can be locally bounded through the $k$-th powers of the cluster sizes $h_s$.
Consequently, $\tilde{S}_q$ plays in the multi-cluster case the same role of a “skeleton” of the error set as in the case of one cluster, described in detail in the present paper. Thus, in principle, the main results of the current paper can be extended to several clusters. However, technically, the accurate description becomes rather involved. Still, we believe that a detailed understanding of the “algebraic-geometric skeleton” of the error amplification in the case of several clusters is highly important. We plan to present results in this direction separately.
3. Improving the reconstruction accuracy given some additional information
In this section we briefly discuss the way one can use the Prony varieties in order to improve the reconstruction accuracy of a spike-train signal from its $2d$ initial moments. Specifically, we show that Prony varieties can help to optimally utilize additional information on the reconstructed signals. As we explain in Section 2.1, the spreading and scale of the error in Fourier reconstruction is tightly connected to moment reconstruction via (2.4) (see also Figure 3). We therefore expect that the procedure we describe here can ultimately help to improve the accuracy of widely used Fourier reconstruction methods (ESPRIT, APM, Matrix Pencil and variants). We intend to present results in this direction in future separate work.

Assume that the measured signal $F$ is known to form a small regular cluster of size $h\ll 1$, and that certain additional information is available on $F$. We do not specify here the nature of this information, which can either be known a priori or be the result of a different, non-moment, measurement of the signal; we assume just that the measured signal is known to reside within a subset $\Omega\subset P$.

Recall that for measurement error $\epsilon\ge 0$, our input for the reconstruction of $F$ are the moment measurements $\mu'=(\mu'_0,\ldots,\mu'_{2d-1})$ with
$$(3.1)\qquad |\mu'_k-m_k(F)|\le\epsilon,\quad k=0,1,\ldots,2d-1.$$
Now consider the following reconstruction procedure:

Algorithm 3.1:
Prony curve reconstruction procedure given a priori information - PCRP
Input: number of nodes $d$.
Input: measured moments $\mu'=(\mu'_0,\ldots,\mu'_{2d-1})$ satisfying (3.1).
Input: feasible set $\Omega\subset P_d$.
Output: an estimate $F_{PCRP}\in P_d$.
1. Solve the Prony system (3.1) with input $\mu'$ and recover the signal $F'$;
2. Use $F'$ to reconstruct the Prony curve $S_{2d-2}(F')$;
3. if $S_{2d-2}(F')\cap\Omega\neq\emptyset$ then
4. Find a signal $F_{PCRP}$ which is closest to $F'$ in the intersection $S_{2d-2}(F')\cap\Omega$, i.e. $F_{PCRP}=\arg\min_{F\in S_{2d-2}(F')\cap\Omega}\|F'-F\|$;
5. return the estimate $F_{PCRP}$.
6. else
7. Find a signal $F_{SRP}$ which is closest to $F'$ in the feasible set $\Omega$, i.e. $F_{SRP}=\arg\min_{F\in\Omega}\|F'-F\|$;
8. return the estimate $F_{SRP}$.

We compare the above procedure to the following “natural” solution algorithm using the Prony method, which does not rely on Prony curves (and appears as the edge case of the PCRP in steps 6–8):
Algorithm 3.2:
Standard reconstruction procedure given a priori information - SRP
Input: number of nodes $d$.
Input: measured moments $\mu'=(\mu'_0,\ldots,\mu'_{2d-1})$ satisfying (3.1).
Input: feasible set $\Omega\subset P_d$.
Output: an estimate $F_{SRP}\in P_d$.
1. Solve the Prony system (3.1) with input $\mu'$ and recover the signal $F'$;
2. Find a signal $F_{SRP}$ which is closest to $F'$ in the feasible set $\Omega$, i.e. $F_{SRP}=\arg\min_{F\in\Omega}\|F'-F\|$;
3. return the estimate $F_{SRP}$.

Let us now explain why the reconstruction procedure using the Prony curve, PCRP, is expected to improve the accuracy with respect to the standard reconstruction procedure, SRP, of solving the Prony system and then projecting the solution onto the feasible set.

Consider the solution $F'$ to the Prony system (3.1), with input $\mu'$, appearing as the first step in both reconstruction procedures. The distance of $F'$ from $F$, in the worst case, is of order $h^{-2d+1}\epsilon$ (see item 2 in the sketch of the main results, or the formal result in Theorem 6.2). The true solution $F$ is contained in a neighborhood of order $h^{-2d+2}\epsilon$ of the Prony curve $S_{2d-2}(F')$ (see item 3 in the sketch of the main results, or the formal result in Corollary 6.2).

At the final step of the SRP we take the closest signal to $F'$ in $\Omega$. This closest point is typically at the same distance of order $h^{-2d+1}\epsilon$ from $F$. In contrast, in the PCRP we take the closest signal to $F'$ in $\tilde{\Omega}=S_{2d-2}(F')\cap\Omega$ (presuming that this set is non-empty, see step 4 in the PCRP). Now, since $F$ is located in a tiny belt around $S_{2d-2}(F')$, and provided that the diameter of $\tilde{\Omega}$ is of order $h^{-2d+2}\epsilon$ or less, we get an order-of-$h$ better accuracy guarantee compared to the SRP. That is, in such a case the worst-case reconstruction error of the PCRP is $\sim h^{-2d+2}\epsilon$, while the worst-case reconstruction error of the SRP is $\sim h^{-2d+1}\epsilon$.

The same explanation as above holds for comparing the reconstruction accuracy of the nodes of $F$, but with all accuracy bounds multiplied by $h$.
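Since, in the moment coordinates, the Prony curve $S_{2d-2}(F')$ is obtained by freezing $\mu'_0,\ldots,\mu'_{2d-2}$ and letting the last moment vary, a crude PCRP for $d=2$ can scan that free coordinate, invert each candidate, and keep the admissible signal closest to $F'$. The following is only our own illustrative sketch (grid search in place of exact projection; the strip $\Omega=\{|x_1+x_2|\le w\}$ and all numerical values are arbitrary choices in the spirit of the experiments described below):

```python
import numpy as np

def prony_solve(mu):
    """Classical Prony inversion for d = 2: nodes are roots of z^2 + c1*z + c0."""
    H = np.array([[mu[0], mu[1]], [mu[1], mu[2]]])
    c = np.linalg.solve(H, -np.asarray(mu[2:4]))
    x = np.sort(np.roots([1.0, c[1], c[0]]).real)
    a = np.linalg.solve(np.vander(x, 2, increasing=True).T, mu[:2])
    return np.concatenate([a, x])             # signal as (a_1, a_2, x_1, x_2)

h, eps, w = 0.05, 0.05 ** 3, 1e-3
F_true = np.array([1.0, 1.0, -h, h])
mu_noisy = np.array([2.0, 0.0, 2 * h ** 2, eps])   # m_3 perturbed by epsilon
F1 = prony_solve(mu_noisy)                    # plain Prony solution F'

# PCRP sketch: scan the free coordinate t = m_3 along S_2(F'), keep points
# inside the strip |x_1 + x_2| <= w, return the admissible point closest to F'.
best, best_dist = None, np.inf
for t in np.linspace(mu_noisy[3] - 10 * eps, mu_noisy[3] + 10 * eps, 4001):
    Ft = prony_solve(np.concatenate([mu_noisy[:3], [t]]))
    if abs(Ft[2] + Ft[3]) <= w:
        dist = np.max(np.abs(Ft - F1))
        if dist < best_dist:
            best, best_dist = Ft, dist
F_pcrp = best
err_plain = np.max(np.abs(F1[2:] - F_true[2:]))
err_pcrp = np.max(np.abs(F_pcrp[2:] - F_true[2:]))
print(err_plain, err_pcrp)                    # node error shrinks by an order of h
```

Moving along the Prony curve into the narrow feasible strip recovers the nodes roughly an order of magnitude more accurately than the raw Prony solution, matching the $h^{-2d+2}\epsilon$ versus $h^{-2d+1}\epsilon$ comparison above.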
That is, if the diameter of the projection of $\tilde{\Omega}$ onto the node coordinates is $\sim h^{-2d+3}\epsilon$, then the worst-case reconstruction error of the nodes using the PCRP is $\sim h^{-2d+3}\epsilon$, whereas the worst-case reconstruction error using the SRP is $\sim h^{-2d+2}\epsilon$. In Figure 4 we demonstrate this effect on the reconstruction of the nodes of $F$.

Figure 5 shows the results of our numerical experiments, which are arranged as follows. We fix the signal $F=\delta(x+0.05)+\delta(x-0.05)$ and a noise level of $\epsilon=h^3=(0.05)^3$. The feasible sets $\Omega_w^x$ in the node space are the strips transversal to $S_{2d-2}(F)$,
$$\Omega_w^x=\{(x_1,x_2),\;|x_1+x_2|\le w\}$$
(as seen in Figure 4, highlighted in pink). For uniform $\epsilon$-noise (i.e. the measured moments are uniformly distributed inside the $\epsilon$-cube in the moment space), we plot the averages $e_{SRP}$, $e_{PCRP}$ of the reconstruction errors of the two procedures as a function of the width $w$ of $\Omega_w^x$. As seen in the figure, the advantage of the PCRP grows as the size $w$ decreases. For values of $w\sim h^2$ or less the PCRP attains a reconstruction error of $\sim h^2$, while the SRP attains a reconstruction error of $\sim h$.

Figure 4.
The signal $F$ is given by $F=\delta(x+0.05)+\delta(x-0.05)$ and $\epsilon=h^3$. The signal $F'$ is the reconstruction from the measurements $\mu'=(1,0,h^2,-h^3)$. Highlighted in blue is the $\epsilon$-error set. Note the improved reconstruction $F_{PCRP}$ that is attained by moving along the Prony curve, compared to $F_{SRP}$, which we get by projecting $F'$ onto $\Omega$.

Figure 5.
4. Prony mapping and its inversion

4.1. Prony mapping.

Definition 4.1.
The Prony mapping
$PM=PM_d:P_d\to M_d$ is given by $PM(F)=\mu=(\mu_0,\ldots,\mu_{2d-1})\in M$, $\mu_k=m_k(F)$, $k=0,\ldots,2d-1$. For $F\in P$, the problem of its reconstruction from the exact moment measurements $\mu=(\mu_0,\ldots,\mu_{2d-1})\in M$ is the problem of inverting the Prony mapping $PM$.
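In code, the Prony mapping is just moment evaluation; a minimal transcription of ours (the test signal is an arbitrary choice):

```python
import numpy as np

def prony_map(a, x):
    """PM(F): a signal (a, x) with d nodes -> its first 2d moments
    mu_k = m_k(F) = sum_j a_j x_j^k, k = 0, ..., 2d-1."""
    a = np.asarray(a, dtype=float)
    x = np.asarray(x, dtype=float)
    d = len(a)
    return np.array([np.sum(a * x ** k) for k in range(2 * d)])

mu = prony_map([1.0, 1.0], [-0.5, 0.5])
print(mu)   # moments (2, 0, 0.5, 0)
```

Inverting this mapping on noisy data is exactly the Prony problem; the error set $E_\epsilon(F)$ below is the preimage of a small cube under this map.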
In this paper we always assume that this inversion (when defined) is accurate.

Consider the noisy measurements $\mu'=(\mu'_0,\ldots,\mu'_{2d-1})\in M$ of the moments of $F$. By our assumption, the measurement error of each of the moments $m_k(F)$ does not exceed $\epsilon$, i.e. $|\mu'_k-m_k(F)|\le\epsilon$. Equivalently, the noisy measurement $\mu'$ may fall at any point of the cube
$$(4.1)\qquad Q_\epsilon(\mu)=\{\mu'=(\mu'_0,\ldots,\mu'_{2d-1})\in M,\;|\mu'_k-\mu_k|\le\epsilon,\;k=0,1,\ldots,2d-1\}.$$
Consequently, the $\epsilon$-error set $E_\epsilon(F)$ is the preimage $PM^{-1}(Q_\epsilon(\mu))\subset P$.

Throughout this text we will always use the maximum norm $\|\cdot\|$ on both spaces $M_d$ and $P_d$: for $\mu=(\mu_0,\ldots,\mu_{2d-1})$, $\mu'=(\mu'_0,\ldots,\mu'_{2d-1})\in M_d$,
$$\|\mu'-\mu\|=\max_{k=0,1,\ldots,2d-1}|\mu'_k-\mu_k|.$$
For $F=(a,x)$, $F'=(a',x')\in P_d$, $\|F-F'\|=\max(\|a-a'\|,\|x-x'\|)$. For a matrix $B=[b_{ij}]$, $\|B\|=\max_i\sum_j|b_{ij}|$.

Our first result describes the inversion of the Prony mapping in a neighborhood of a “regular point”, i.e. of a signal $G$ with all its $d$ nodes well separated, and with all its amplitudes bounded and well separated from zero. This result is, essentially, a direct application of the “quantitative inverse function theorem” (see, for instance, [17], page 264, Theorem 2.10.7, or [14], Theorem 3.2), combined with estimates of the norm of the Jacobian of the Prony mapping and the norm of its inverse.

Assume that the nodes $x_1,\ldots,x_d$ of a signal $G$ all belong to the interval $I=[-1,1]$, and that, for $\eta$ with $0<\eta\le\frac{2}{d-1}$, $d>1$, the distance between the neighboring nodes $x_j,x_{j+1}$, $j=1,\ldots,d-1$, is at least $\eta$. We also assume that for certain $m,M$ with $0<m<M$, the amplitudes $a_1,\ldots,a_d$ satisfy $m\le|a_j|\le M$, $j=1,\ldots,d$. We call such signals $(\eta,m,M)$-regular. We distinguish (as above) the parameter and the moment spaces of the model signals $G$, denoting them by $\bar{P}$, $\bar{M}$, respectively. For $G\in\bar{P}$ we denote by $\nu=(\nu_0,\ldots,\nu_{2d-1})$ its Prony image $PM(G)\in\bar{M}$.

Theorem 4.1.
Let $G$ be an $(\eta,m,M)$-regular signal. Then there exist constants $R,C_1,C_2,C_3,C_4$ (given explicitly below and in Appendix A), depending only on $d,\eta,m,M$, such that:
(1) The Jacobian $J$ at $G$ of the Prony mapping $PM$ is invertible, with $\|J^{-1}\|\le C_1$, $\|J\|\le C_2$.
(2) The inverse mapping $PM^{-1}$ exists on the cube $Q_R(\nu)$ of size $R$ centered at $\nu\in\bar{M}$, and provides a diffeomorphism of $Q_R(\nu)$ onto $\Omega_R(G)=PM^{-1}(Q_R(\nu))$. For each $\nu',\nu''\in Q_R(\nu)$,
$$C_3\|\nu''-\nu'\|\le\|PM^{-1}(\nu'')-PM^{-1}(\nu')\|\le C_4\|\nu''-\nu'\|.$$

Proof.
Let $J = J(G)$ denote the Jacobian matrix of $PM$ at a (regular) signal $G$, with entries
$$J_{k,j} = \begin{cases} \dfrac{\partial m_k(G)}{\partial a_j} = x_j^k, & k = 0, \ldots, 2d-1,\ j = 1, \ldots, d, \\[2mm] \dfrac{\partial m_k(G)}{\partial x_{j-d}} = k\, a_{j-d}\, x_{j-d}^{k-1}, & k = 0, \ldots, 2d-1,\ j = d+1, \ldots, 2d. \end{cases}$$
The matrix $J$ admits the following factorization (about factorizations of the Prony Jacobian see also [8]):
$$(4.2)\qquad J = \begin{pmatrix} 1 & \cdots & 1 & 0 & \cdots & 0 \\ x_1 & \cdots & x_d & 1 & \cdots & 1 \\ x_1^2 & \cdots & x_d^2 & 2x_1 & \cdots & 2x_d \\ \vdots & & \vdots & \vdots & & \vdots \\ x_1^{2d-1} & \cdots & x_d^{2d-1} & (2d-1)x_1^{2d-2} & \cdots & (2d-1)x_d^{2d-2} \end{pmatrix} \begin{pmatrix} I_d & 0 \\ 0 & D \end{pmatrix},$$
where $D = \mathrm{diag}(a_1, \ldots, a_d)$ is the $d \times d$ diagonal matrix with the amplitudes on the diagonal and $I_d$ is the $d \times d$ identity matrix. Denote the left-hand matrix in this factorization by $U_d$. This is a special type of confluent Vandermonde matrix; the norm of its inverse, which is important in our estimates, was studied in [15].

Theorem 4.2 (Gautschi, [15], Theorem 3).
$$\|U_d^{-1}\| \leq \max_{1 \leq \lambda \leq d} b_\lambda \prod_{j=1, j \neq \lambda}^d \left( \frac{1 + |x_j|}{|x_\lambda - x_j|} \right)^2, \qquad b_\lambda = \max\left( 1 + |x_\lambda|,\ 2(1 + |x_\lambda|)^2 \sum_{j=1, j \neq \lambda}^d \frac{1}{|x_\lambda - x_j|} \right).$$
Based on the above, for $U_d$ formed by the nodes of an $(\eta, m, M)$-regular signal, that is, $|x_i| \leq 1$, $|x_i - x_j| \geq \eta$, it is straightforward to bound $\|U_d^{-1}\|$ in terms of the constants $\eta, d$. The following proposition (given without proof) provides such an upper bound.

Proposition 4.1.
Let $|x_i| \leq 1$ and $|x_i - x_j| \geq \eta$. Then
$$\|U_d^{-1}\| \leq \left(1 + 4\eta^{-1}(\ln(d) + 1)\right) \eta^{-2d+1}\, 2^{2d-1} \left( \lfloor \tfrac{d-1}{2} \rfloor ! \right)^{-2}.$$
Now, using Proposition 4.1 and the factorization (4.2), we have
$$(4.3)\qquad \|J^{-1}\| \leq \max[1, m^{-1}] \left(1 + 4\eta^{-1}(\ln(d) + 1)\right) \eta^{-2d+1}\, 2^{2d-1} \left( \lfloor \tfrac{d-1}{2} \rfloor ! \right)^{-2} = C_1(m, \eta, d) = C_1.$$
In addition, for an $(\eta, m, M)$-regular signal, a direct calculation shows that $\|J\| \leq d + M(2d-1)d = C_2$. This concludes the proof of statement 1 of Theorem 4.1.

The second statement of Theorem 4.1 follows from the "quantitative inverse function theorem" (see [17], Theorem 2.10.7, or [14]), taking into account that in this result the constants $C_3, C_4$ and $R$ are given in terms of upper bounds on $\|J^{-1}\|$, $\|J\|$, and a local upper bound on the magnitude of the second derivatives of $PM$. The latter can easily be obtained in terms of $d, \eta, m, M$. The required constants $C_3, C_4$ and $R$ are derived explicitly in Appendix A. This completes the proof of Theorem 4.1. $\square$
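As an illustrative sanity check (not part of the paper; the helper names are our own), the factorization (4.2) of the Prony Jacobian into a confluent Vandermonde matrix times a diagonal block can be verified numerically:

```python
import numpy as np

def prony_jacobian(a, x):
    """Jacobian of the moment map m_k = sum_j a_j x_j^k, k = 0..2d-1,
    with respect to the parameters (a_1..a_d, x_1..x_d)."""
    d = len(a)
    J = np.zeros((2 * d, 2 * d))
    for k in range(2 * d):
        J[k, :d] = x ** k                                   # dm_k / da_j = x_j^k
        J[k, d:] = k * a * x ** (k - 1) if k > 0 else 0.0   # dm_k / dx_j = k a_j x_j^(k-1)
    return J

def confluent_vandermonde(x):
    """The matrix U_d from (4.2): columns x_j^k and k * x_j^(k-1)."""
    d = len(x)
    U = np.zeros((2 * d, 2 * d))
    for k in range(2 * d):
        U[k, :d] = x ** k
        U[k, d:] = k * x ** (k - 1) if k > 0 else 0.0
    return U

a = np.array([1.0, -0.5, 2.0])
x = np.array([-0.6, 0.1, 0.7])
J = prony_jacobian(a, x)
U = confluent_vandermonde(x)
D = np.block([[np.eye(3), np.zeros((3, 3))],
              [np.zeros((3, 3)), np.diag(a)]])
assert np.allclose(J, U @ D)   # the factorization (4.2) holds exactly
```

The maximum norm used in the text corresponds to `np.linalg.norm(·, np.inf)` (maximum absolute row sum) for matrices.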
We denote by $\Omega_R(G) \subset \bar{\mathcal{P}}$ the preimage $PM^{-1}(Q_R(\nu))$. We now give an equivalent formulation of Theorem 4.1 in terms of the moment coordinates.

Definition 4.2.
For $G$ a regular signal as above, and $G'$ denoting signals near $G$, the moment coordinates are the functions $f_k(G') = m_k(G') - m_k(G)$, $k = 0, \ldots, 2d-1$. The moment metric $d(G', G'')$ on $\bar{\mathcal{P}}$ is defined through the moment coordinates as
$$d(G', G'') := \max_{k=0,\ldots,2d-1} |m_k(G'') - m_k(G')| = \|PM(G') - PM(G'')\|.$$

Corollary 4.1.
Let $G$ be a regular signal as above. Then the moment coordinates form a regular analytic coordinate system on $\Omega_R(G)$. The moment metric $d(G', G'')$ is bi-Lipschitz equivalent on $\Omega_R(G)$ to the maximum metric $\|G'' - G'\|$ in $\bar{\mathcal{P}}$:
$$C_3\, d(G', G'') \leq \|G'' - G'\| \leq C_4\, d(G', G'').$$

Proof.
It follows directly from Theorem 4.1. $\square$
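The bi-Lipschitz equivalence of Corollary 4.1 can be observed numerically. The following sketch (our own, with hypothetical helper names) checks that, for a well-separated signal, a small parameter perturbation changes the moments by an amount controlled on both sides by $\|J\|$ and $\|J^{-1}\|$:

```python
import numpy as np

def moments(a, x, n):
    return np.array([np.sum(a * x ** k) for k in range(n)])

d = 2
a  = np.array([1.0, 1.0]);  x  = np.array([-0.5, 0.5])   # (eta, m, M)-regular signal
ap = np.array([1.0, 1.0]);  xp = np.array([-0.5, 0.5 + 1e-6])  # tiny node perturbation

# Jacobian of the moment map at (a, x)
J = np.zeros((2 * d, 2 * d))
for k in range(2 * d):
    J[k, :d] = x ** k
    J[k, d:] = k * a * x ** (k - 1) if k > 0 else 0.0

param_dist  = max(np.max(np.abs(a - ap)), np.max(np.abs(x - xp)))
moment_dist = np.max(np.abs(moments(a, x, 2 * d) - moments(ap, xp, 2 * d)))

nJ    = np.linalg.norm(J, np.inf)
nJinv = np.linalg.norm(np.linalg.inv(J), np.inf)
# First-order bi-Lipschitz control, with 1% slack for the quadratic remainder:
assert moment_dist <= 1.01 * nJ * param_dist
assert moment_dist >= param_dist / (1.01 * nJinv)
```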
5. The geometry of the error set for nodes forming an $h$-cluster

We use regular signals $G$ as above as a model for signals with a "regular cluster". For $F \in \mathcal{P}$ with $h = h(F)$ and $\kappa = \kappa(F)$ (i.e. $F$ having its nodes cluster in an interval of size $h$ with center $\kappa$), we say that $F$ forms an $(h, \kappa, \eta, m, M)$-regular cluster if $G = \Psi_{\kappa,h}(F)$ is an $(\eta, m, M)$-regular signal. Explicitly, $F$ forms an $(h, \kappa, \eta, m, M)$-regular cluster if its amplitudes $a_1, \ldots, a_d$ satisfy $m \leq |a_j| \leq M$, $j = 1, \ldots, d$, and the distance between neighboring nodes $x_j, x_{j+1}$, $j = 1, \ldots, d-1$, is at least $\eta h$. We formulate our main results in terms of the model signal $G$. For any $G \in \bar{\mathcal{P}}$ and $\epsilon, \alpha > 0$:

Definition 5.1.
Define $\Pi_{\epsilon,\alpha}(G) \subset \bar{\mathcal{P}}$ as the parallelepiped, in moment coordinates, consisting of all signals $G' \in \bar{\mathcal{P}}$ satisfying the inequalities $|m_k(G') - m_k(G)| \leq \epsilon \alpha^k$, $k = 0, \ldots, 2d-1$.

Definition 5.2.
For each $0 \leq q \leq 2d-1$, define $S_{q,\epsilon,\alpha}(G)$ as the part of the Prony variety $S_q(G)$ consisting of all signals $G' \in S_q(G)$ with $|m_k(G') - m_k(G)| \leq \epsilon \alpha^k$, $k = q+1, \ldots, 2d-1$.

Theorem 5.1 below describes the set $\bar{E}_\epsilon(F) \subset \bar{\mathcal{P}}$ under the additional assumption that there is no shift, i.e. $\kappa = 0$. In this case the description becomes especially transparent. The effect of a non-zero shift $\kappa$ is described in Section 5.2 below. In particular, a version of Theorem 5.1 with a non-zero shift is given in Theorem 5.2.

Theorem 5.1.
Let $F \in \mathcal{P}$ form an $(h, 0, \eta, m, M)$-regular cluster and let $G = \Psi_{0,h}(F)$ be the model signal for $F$. Then:
(1) For each positive $\epsilon$ we have $\bar{E}_\epsilon(F) = \Pi_{\epsilon, \frac{1}{h}}(G)$.
(2) For each positive $\epsilon \leq R h^{2d-1}$, $\bar{E}_\epsilon(F)$ is contained within the $\Delta_q$-neighborhood of the part of the Prony variety $S_{q,\epsilon,\frac{1}{h}}(G)$, for
$$\Delta_q = C_4 \left( \tfrac{1}{h} \right)^q \epsilon.$$

The constants
$R, C_4$ are defined in Theorem 4.1 above.

Remark: Assume that the measurement error satisfies $\epsilon \leq R h^{2d-1}$. By Corollary 4.1, the metric induced by the moments is equivalent to the maximum metric on $\bar{\mathcal{P}}$. Combining this with statement 1 of Theorem 5.1, we obtain that the error set $\bar{E}_\epsilon(F)$ is a "deformed" parallelepiped in the standard coordinates of $\bar{\mathcal{P}}$. See Figures 1 and 2 in Subsection 1.2.

Proof of Theorem 5.1.
Denote by $M\bar{E}_\epsilon(F) = PM(\bar{E}_\epsilon(F)) \subset \bar{\mathcal{M}}$ the set of all possible errors in the moments $m_k(G)$, $k = 0, 1, \ldots, 2d-1$, corresponding to errors not exceeding $\epsilon$ in the moments of $F$.

Consider the scaling transformation $SC_\alpha$, which acts on signals $F$ via scaling of the nodes of $F$: $SC_\alpha(F)(x) = F(\frac{x}{\alpha})$. For $F(x) = \sum_{j=1}^d a_j \delta(x - x_j)$ we have $SC_\alpha(F) = \sum_{j=1}^d a_j \delta(x - \alpha x_j)$, and therefore
$$(5.1)\qquad m_k(SC_\alpha(F)) = \sum_{j=1}^d a_j \alpha^k x_j^k = \alpha^k m_k(F).$$
Accordingly, we define the scaling transformation $SC^*_\alpha : \mathcal{M} \to \mathcal{M}$ on the moment space as follows: for $\mu = (\mu_0, \ldots, \mu_{2d-1})$,
$$(5.2)\qquad SC^*_\alpha(\mu) = \nu = (\nu_0, \ldots, \nu_{2d-1}), \qquad \nu_k = \alpha^k \mu_k, \quad k = 0, \ldots, 2d-1.$$
With these definitions we have, for all $F' \in \mathcal{P}$,
$$(5.3)\qquad PM(SC_\alpha(F')) = SC^*_\alpha(PM(F')).$$
For the model signal $G$ we have $G = SC_{\frac{1}{h}}(F)$. Set $\mu = PM(F)$. The initial moment error set is the $\epsilon$-cube
$$Q_\epsilon(\mu) = \{\mu' = (\mu'_0, \ldots, \mu'_{2d-1}) \in \mathcal{M},\ |\mu'_k - \mu_k| \leq \epsilon,\ k = 0, 1, \ldots, 2d-1\}.$$
Accordingly, the set $M\bar{E}_\epsilon(F)$ of possible measurements of the moments of $G$ is $SC^*_{\frac{1}{h}}(Q_\epsilon(\mu))$. Consequently, $M\bar{E}_\epsilon(F)$ is a coordinate parallelepiped
$$(5.4)\qquad M\Pi_{\epsilon, \frac{1}{h}}(\nu) := \{\nu' \in \mathcal{M},\ |\nu'_k - \nu_k| \leq \epsilon \left( \tfrac{1}{h} \right)^k,\ k = 0, 1, \ldots, 2d-1\}.$$
The error set $\bar{E}_\epsilon(F) \subset \bar{\mathcal{P}}$ is the preimage
$$\bar{E}_\epsilon(F) = PM^{-1}(M\bar{E}_\epsilon(F)) = PM^{-1}(M\Pi_{\epsilon, \frac{1}{h}}(\nu)) = \Pi_{\epsilon, \frac{1}{h}}(G).$$
This concludes the proof of the first part of Theorem 5.1.

We now prove the second part of Theorem 5.1. By part one of the theorem we already know that $M\bar{E}_\epsilon(F) = M\Pi_{\epsilon, \frac{1}{h}}(\nu)$ is the parallelepiped given in (5.4). On the other hand, $PM(S_{q,\epsilon,\frac{1}{h}}(G))$ is the projection of $M\Pi_{\epsilon, \frac{1}{h}}(\nu)$ onto the coordinate subspace through $\nu$ spanned by the last $2d - q - 1$ moment coordinates. Hence
$$\max_{\nu' \in M\bar{E}_\epsilon(F)} \min_{\nu'' \in PM(S_{q,\epsilon,\frac{1}{h}}(G))} \|\nu' - \nu''\| = \left( \tfrac{1}{h} \right)^q \epsilon.$$
In order to apply Theorem 4.1 and Corollary 4.1 we have to check that the parallelepiped $M\bar{E}_\epsilon(F) = M\Pi_{\epsilon, \frac{1}{h}}(\nu)$ is contained in the cube $Q_R(\nu)$ of size $R$ centered at $\nu \in \bar{\mathcal{M}}$. The maximal edge of $M\Pi_{\epsilon, \frac{1}{h}}(\nu)$ has length $\epsilon h^{-2d+1}$, and hence for $\epsilon \leq R h^{2d-1}$ the required inclusion holds. Now, applying Corollary 4.1, we get
$$\max_{G' \in \bar{E}_\epsilon(F)} \min_{G'' \in S_{q,\epsilon,\frac{1}{h}}(G)} \|G' - G''\| \leq C_4 \left( \tfrac{1}{h} \right)^q \epsilon. \qquad \square$$
For a signal $G \in \bar{\mathcal{P}}$, recall that the parallelepiped $\Pi_{\epsilon,\alpha}(G) \subset \bar{\mathcal{P}}$ is the set of all signals $G' \in \bar{\mathcal{P}}$ satisfying $|m_k(G') - m_k(G)| \leq \epsilon \alpha^k$, $k = 0, \ldots, 2d-1$.

Theorem 5.2.
Let $F \in \mathcal{P}$ form an $(h, \kappa, \eta, m, M)$-regular cluster and let $G = \Psi_{\kappa,h}(F)$ be the model signal for $F$. Set $\epsilon' = (1 + |\kappa|)^{-2d+1} \epsilon$ and $h' = \frac{h}{1 + |\kappa|}$. Then:
(1) For any $\epsilon > 0$, the error set $\bar{E}_\epsilon(F)$ is contained between the following two parallelepipeds in the moment coordinates:
$$\Pi_{\epsilon', \frac{1}{h}}(G) \subset \bar{E}_\epsilon(F) \subset \Pi_{\epsilon, \frac{1}{h'}}(G),$$
where
$$\Pi_{\epsilon', \frac{1}{h}}(G) = \left\{ G' \in \bar{\mathcal{P}} : |m_k(G) - m_k(G')| \leq (1 + |\kappa|)^{-2d+1} \epsilon \left( \tfrac{1}{h} \right)^k,\ k = 0, \ldots, 2d-1 \right\},$$
$$\Pi_{\epsilon, \frac{1}{h'}}(G) = \left\{ G' \in \bar{\mathcal{P}} : |m_k(G) - m_k(G')| \leq \epsilon \left( \tfrac{1 + |\kappa|}{h} \right)^k,\ k = 0, \ldots, 2d-1 \right\}.$$
(2) For any $\epsilon \leq R h'^{\,2d-1}$, the error set $\bar{E}_\epsilon(F)$ is contained within the $\Delta'_q$-neighborhood of the part of the Prony variety $S_{q,\epsilon,\frac{1}{h'}}(G)$, for
$$\Delta'_q = C_4 \left( \tfrac{1}{h'} \right)^q \epsilon.$$

The constants
$R, C_4$ are defined in Theorem 4.1 above.

Proof of Theorem 5.2. Let us describe the effect of a shift transformation in $\mathcal{P}$ and in $\mathcal{M}$. Define the shift transformation $SH_\kappa : \mathcal{P} \to \mathcal{P}$ of the parameter space by $SH_\kappa(F)(x) = F(x + \kappa)$. The following proposition describes the action of the coordinate shift on the moments of general spike-trains:

Proposition 5.1.
$$m_k(F) = \sum_{l=0}^k \binom{k}{l} \kappa^{k-l}\, m_l(SH_\kappa(F)), \qquad m_k(SH_\kappa(F)) = \sum_{l=0}^k \binom{k}{l} (-\kappa)^{k-l}\, m_l(F).$$

Proof. For $F(x) = \sum_{j=1}^d a_j \delta(x - x_j) \in \mathcal{P}$ we get
$$m_k(SH_\kappa(F)) = \sum_{j=1}^d a_j (x_j - \kappa)^k = \sum_{j=1}^d a_j \sum_{l=0}^k \binom{k}{l} (-\kappa)^{k-l} x_j^l = \sum_{l=0}^k \binom{k}{l} (-\kappa)^{k-l} \sum_{j=1}^d a_j x_j^l = \sum_{l=0}^k \binom{k}{l} (-\kappa)^{k-l} m_l(F).$$
Replacing $\kappa$ by $-\kappa$ we get the other expression. $\square$

Accordingly, we define the shift transformation $SH^*_\kappa : \mathcal{M} \to \mathcal{M}$ as the following linear transformation on the moment space: for $\mu' = (\mu'_0, \ldots, \mu'_{2d-1}) \in \mathcal{M}$,
$$SH^*_\kappa(\mu') = \nu = (\nu_0, \ldots, \nu_{2d-1}), \qquad \nu_k = \sum_{l=0}^k \binom{k}{l} (-\kappa)^{k-l} \mu'_l, \quad k = 0, 1, \ldots, 2d-1.$$
Proposition 5.1 shows that the shift transformations $SH_\kappa$ and $SH^*_\kappa$ and the Prony mapping $PM$ satisfy the following identity:
$$(5.5)\qquad PM(SH_\kappa(F)) = SH^*_\kappa(PM(F)).$$
Since $SH^*_\kappa$ is a linear transformation, we will omit the parentheses and write $SH^*_\kappa \mu$ instead of $SH^*_\kappa(\mu)$. We extend this rule to every linear transformation $T$ and write $Tv$ instead of $T(v)$. We have the following bounds for the norms of $SH^*_\kappa$ and $SH^{*-1}_\kappa$:

Proposition 5.2.
The shift transformation $SH^*_\kappa : \mathcal{M} \to \mathcal{M}$ satisfies, for each $0 \leq k \leq 2d-1$,
$$\max_{\mu \in \mathcal{M},\, \|\mu\|=1} |(SH^*_\kappa \mu)_k| \leq (1 + |\kappa|)^k, \qquad \max_{\mu \in \mathcal{M},\, \|\mu\|=1} |(SH^{*-1}_\kappa \mu)_k| \leq (1 + |\kappa|)^k,$$
where $(SH^*_\kappa \mu)_k$, $(SH^{*-1}_\kappa \mu)_k$ denote the $k$th coordinate of $SH^*_\kappa \mu$ and $SH^{*-1}_\kappa \mu$, respectively. As a result, $\|SH^*_\kappa\|, \|SH^{*-1}_\kappa\| \leq (1 + |\kappa|)^{2d-1}$.

Proof. For $\mu = (\mu_0, \ldots, \mu_{2d-1}) \in \mathcal{M}$ with $\|\mu\| = 1$ we have, for each $0 \leq k \leq 2d-1$,
$$|(SH^*_\kappa \mu)_k| \leq \sum_{l=0}^k \binom{k}{l} |\kappa|^{k-l} = (1 + |\kappa|)^k.$$
The inequality for $|(SH^{*-1}_\kappa \mu)_k|$ follows by noting that $SH^{*-1}_\kappa = SH^*_{-\kappa}$. $\square$

Let $F \in \mathcal{P}$, as above, form an $(h, \kappa, \eta, m, M)$-regular cluster, and put $PM(F) = \mu$. Then, by the identities (5.5) and (5.3),
$$PM(\bar{E}_\epsilon(F)) = PM(SC_{\frac{1}{h}} SH_\kappa E_\epsilon(F)) = SC^*_{\frac{1}{h}} SH^*_\kappa PM(E_\epsilon(F)) = SC^*_{\frac{1}{h}} SH^*_\kappa Q_\epsilon(\mu).$$
Put $\xi = (\xi_0, \ldots, \xi_{2d-1}) = SH^*_\kappa \mu$. By Proposition 5.2 we have
$$SH^*_\kappa Q_\epsilon(\mu) \subset M\Pi_{\epsilon,\, 1+|\kappa|}(\xi) = \{\xi' \in \mathcal{M},\ |\xi'_k - \xi_k| \leq \epsilon (1 + |\kappa|)^k,\ k = 0, 1, \ldots, 2d-1\}.$$
Put $\nu = (\nu_0, \ldots, \nu_{2d-1}) = PM(G)$. Then, again by the identities (5.5) and (5.3),
$$\nu = PM(G) = PM(\Psi_{\kappa,h}(F)) = PM(SC_{\frac{1}{h}} SH_\kappa F) = SC^*_{\frac{1}{h}} SH^*_\kappa PM(F) = SC^*_{\frac{1}{h}} SH^*_\kappa \mu.$$
Using the above and the definition of $SC^*$, we get
$$SC^*_{\frac{1}{h}} SH^*_\kappa Q_\epsilon(\mu) \subset SC^*_{\frac{1}{h}} M\Pi_{\epsilon,\, 1+|\kappa|}(\xi) = \{\nu' \in \mathcal{M},\ |\nu'_k - \nu_k| \leq \epsilon \left( \tfrac{1 + |\kappa|}{h} \right)^k,\ k = 0, 1, \ldots, 2d-1\}.$$
This proves that $\bar{E}_\epsilon(F) \subset \Pi_{\epsilon, \frac{1+|\kappa|}{h}}(G) = \Pi_{\epsilon, \frac{1}{h'}}(G)$.
We now prove that for $\epsilon' = (1 + |\kappa|)^{-2d+1} \epsilon$, $\Pi_{\epsilon', \frac{1}{h}}(G) \subset \bar{E}_\epsilon(F)$. By Proposition 5.2, the norm of the inverse shift transformation is bounded by $\|SH^{*-1}_\kappa\| \leq (1 + |\kappa|)^{2d-1}$. Consequently,
$$SH^*_\kappa Q_\epsilon(\mu) \supset Q_{\epsilon'}(\xi), \qquad \epsilon' = \left( \tfrac{1}{1 + |\kappa|} \right)^{2d-1} \epsilon.$$
Applying the scaling transformation $SC^*_{\frac{1}{h}}$, we get
$$SC^*_{\frac{1}{h}} SH^*_\kappa Q_\epsilon(\mu) \supset SC^*_{\frac{1}{h}} Q_{\epsilon'}(\xi) = M\Pi_{\epsilon', \frac{1}{h}}(\nu).$$
Therefore $\bar{E}_\epsilon(F) \supset \Pi_{(1+|\kappa|)^{-2d+1}\epsilon,\, \frac{1}{h}}(G) = \Pi_{\epsilon', \frac{1}{h}}(G)$. This completes the proof of the first statement of Theorem 5.2.

Next we prove the second statement of Theorem 5.2. For a given $0 \leq q \leq 2d-1$, we need to show that the error set $\bar{E}_\epsilon(F)$ is contained in a neighborhood of order $h'^{-q}\epsilon$ of the part of the Prony variety $S_{q,\epsilon,\frac{1}{h'}}(G)$, i.e.
$$\max_{G' \in \bar{E}_\epsilon(F)} \min_{G'' \in S_{q,\epsilon,\frac{1}{h'}}(G)} \|G' - G''\| \leq C_4 \left( \tfrac{1}{h'} \right)^q \epsilon.$$
Set, as above, $M\bar{E}_\epsilon(F) = PM(\bar{E}_\epsilon(F)) \subset \mathcal{M}$ and $\nu = (\nu_0, \ldots, \nu_{2d-1}) = PM(G)$. By statement 1 of Theorem 5.2,
$$M\bar{E}_\epsilon(F) \subset M\Pi_{\epsilon, \frac{1}{h'}}(\nu) = \{\nu' \in \mathcal{M},\ |\nu'_k - \nu_k| \leq \epsilon \left( \tfrac{1+|\kappa|}{h} \right)^k,\ k = 0, 1, \ldots, 2d-1\}.$$
On the other hand,
$PM(S_{q,\epsilon,\frac{1}{h'}}(G))$ is the projection of $M\Pi_{\epsilon, \frac{1}{h'}}(\nu)$ onto the coordinate subspace through $\nu$ spanned by the last $2d - q - 1$ moment coordinates. Hence
$$\max_{\nu' \in M\bar{E}_\epsilon(F)} \min_{\nu'' \in PM(S_{q,\epsilon,\frac{1}{h'}}(G))} \|\nu' - \nu''\| \leq \max_{\nu' \in M\Pi_{\epsilon, \frac{1}{h'}}(\nu)} \min_{\nu'' \in PM(S_{q,\epsilon,\frac{1}{h'}}(G))} \|\nu' - \nu''\| = \left( \tfrac{1}{h'} \right)^q \epsilon.$$
We now want to apply the equivalence of the moment metric on $\bar{\mathcal{M}}$ and the maximum metric on $\bar{\mathcal{P}}$ given in Corollary 4.1. For this purpose we need to check that $M\bar{E}_\epsilon(F) \subset Q_R(\nu)$. Again by statement 1 of the theorem, $M\bar{E}_\epsilon(F) \subset M\Pi_{\epsilon, \frac{1}{h'}}(\nu)$, and since by assumption $\epsilon \leq R h'^{\,2d-1}$, we get
$$M\bar{E}_\epsilon(F) \subset M\Pi_{\epsilon, \frac{1}{h'}}(\nu) \subset Q_R(\nu).$$
Now, applying Corollary 4.1, we get
$$\max_{G' \in \bar{E}_\epsilon(F)} \min_{G'' \in S_{q,\epsilon,\frac{1}{h'}}(G)} \|G' - G''\| \leq C_4 \left( \tfrac{1+|\kappa|}{h} \right)^q \epsilon.$$
This concludes the proof of statement 2 of Theorem 5.2. $\square$
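The two moment-space identities used throughout this section, the scaling law (5.1) and the binomial shift formula of Proposition 5.1, are elementary to check numerically. The sketch below (our own naming) does so for a three-spike signal:

```python
import numpy as np
from math import comb

def moments(a, x, n):
    """First n moments m_k = sum_j a_j x_j^k of a spike train."""
    return np.array([np.sum(a * x ** k) for k in range(n)])

a = np.array([1.0, -2.0, 0.5])
x = np.array([-0.3, 0.2, 0.8])
n = 6            # moments m_0 .. m_{2d-1} for d = 3
alpha, kappa = 0.25, 1.5

m = moments(a, x, n)

# (5.1): scaling the nodes by alpha multiplies m_k by alpha^k
m_scaled = moments(a, alpha * x, n)
assert np.allclose(m_scaled, np.array([alpha ** k for k in range(n)]) * m)

# Proposition 5.1: m_k(SH_kappa F) = sum_l C(k,l) (-kappa)^(k-l) m_l(F),
# where SH_kappa shifts the nodes to x_j - kappa
m_shifted = moments(a, x - kappa, n)
expected = np.array([sum(comb(k, l) * (-kappa) ** (k - l) * m[l]
                         for l in range(k + 1)) for k in range(n)])
assert np.allclose(m_shifted, expected)
```

Both identities hold exactly (up to floating-point rounding), which is what makes the moment-space transformations $SC^*_\alpha$ and $SH^*_\kappa$ well defined.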
6. Worst case reconstruction error
We now consider the worst case reconstruction error of a signal $F = (a, x)$ forming an $(h, \kappa, \eta, m, M)$-regular cluster. Define the worst case reconstruction error of $F$ as
$$\rho(F, \epsilon) = \max_{F' \in E_\epsilon(F)} \|F' - F\|.$$
In a similar way we define $\rho_a(F, \epsilon)$ and $\rho_x(F, \epsilon)$ as the worst case errors in the reconstruction of the amplitudes and the nodes of $F$, respectively:
$$\rho_a(F, \epsilon) = \max_{F' = (a', x') \in E_\epsilon(F)} \|a' - a\|, \qquad \rho_x(F, \epsilon) = \max_{F' = (a', x') \in E_\epsilon(F)} \|x' - x\|.$$
We show that for $\epsilon \leq O(h^{2d-1})$, $\rho(F, \epsilon)$ and $\rho_a(F, \epsilon)$ are of order $h^{-2d+1} \epsilon$, while $\rho_x(F, \epsilon)$ is of order $h^{-2d+2} \epsilon$. The following theorem provides tight, up to constants, upper bounds on $\rho(F, \epsilon), \rho_x(F, \epsilon), \rho_a(F, \epsilon)$. It is a direct consequence of the geometry of the error set presented in Theorem 5.2.
Theorem 6.1 (Reconstruction error upper bound). Let $F \in \mathcal{P}$ form an $(h, \kappa, \eta, m, M)$-regular cluster. Then for each positive $\epsilon \leq \left( \frac{h}{1+|\kappa|} \right)^{2d-1} R$ the following bounds on the worst case reconstruction errors are valid:
$$\rho(F, \epsilon),\, \rho_a(F, \epsilon) \leq C_4 \left( \tfrac{1+|\kappa|}{h} \right)^{2d-1} \epsilon, \qquad \rho_x(F, \epsilon) \leq C_4\, h \left( \tfrac{1+|\kappa|}{h} \right)^{2d-1} \epsilon,$$
where $C_4, R$ are the constants defined in Theorem 4.1.

Proof. For $F = (a, x)$ as in the theorem, let $G = (a, \bar{x}) = \Psi_{\kappa,h}(F)$ be the model signal of $F$. We define the model worst case reconstruction errors $\bar{\rho}(F, \epsilon)$, $\bar{\rho}_a(F, \epsilon)$ and $\bar{\rho}_x(F, \epsilon)$ by
$$\bar{\rho}(F, \epsilon) = \max_{G' \in \bar{E}_\epsilon(F)} \|G' - G\|, \qquad \bar{\rho}_a(F, \epsilon) = \max_{G' = (a', x') \in \bar{E}_\epsilon(F)} \|a' - a\|, \qquad \bar{\rho}_x(F, \epsilon) = \max_{G' = (a', x') \in \bar{E}_\epsilon(F)} \|x' - \bar{x}\|.$$
We define the model worst case reconstruction error $\tilde{\rho}(F, \epsilon)$ in the moment metric by
$$\tilde{\rho}(F, \epsilon) = \max_{G' \in \bar{E}_\epsilon(F)} d(G', G).$$
By Theorem 5.2, the error set satisfies $\bar{E}_\epsilon(F) \subset \Pi_{\epsilon, \frac{1+|\kappa|}{h}}(G)$. Therefore we have
$$(6.1)\qquad \tilde{\rho}(F, \epsilon) \leq \max_{G' \in \Pi_{\epsilon, \frac{1+|\kappa|}{h}}(G)} d(G', G) = \left( \tfrac{1+|\kappa|}{h} \right)^{2d-1} \epsilon.$$
For $\epsilon \leq \left( \frac{h}{1+|\kappa|} \right)^{2d-1} R$ we have
$$PM(\bar{E}_\epsilon(F)) \subset PM(\Pi_{\epsilon, \frac{1+|\kappa|}{h}}(G)) \subset Q_R(PM(G)).$$
We can therefore apply the equivalence of the moment and the maximum metrics given in Corollary 4.1 and get
$$(6.2)\qquad \bar{\rho}(F, \epsilon) \leq C_4\, \tilde{\rho}(F, \epsilon) = C_4 \left( \tfrac{1+|\kappa|}{h} \right)^{2d-1} \epsilon.$$
Since $\bar{\rho}_a(F, \epsilon)$ and $\bar{\rho}_x(F, \epsilon)$ are each the maximum of the errors projected onto the amplitude and the node subspaces, respectively, inequality (6.2) also implies
$$(6.3)\qquad \bar{\rho}_a(F, \epsilon),\, \bar{\rho}_x(F, \epsilon) \leq C_4 \left( \tfrac{1+|\kappa|}{h} \right)^{2d-1} \epsilon.$$
Now we return from $G$ to the original signal $F$, and from the model space $\bar{\mathcal{P}}$ to $\mathcal{P}$. Under this transformation the amplitudes remain unchanged, while the nodes are multiplied by $h$ (and shifted by $\kappa$). Therefore inequalities (6.2) and (6.3) imply
$$\rho(F, \epsilon),\, \rho_a(F, \epsilon) \leq C_4 \left( \tfrac{1+|\kappa|}{h} \right)^{2d-1} \epsilon, \qquad \rho_x(F, \epsilon) \leq C_4\, h \left( \tfrac{1+|\kappa|}{h} \right)^{2d-1} \epsilon. \qquad \square$$
We now give lower bounds on the worst case reconstruction errors $\rho(F, \epsilon)$, $\rho_a(F, \epsilon)$ and $\rho_x(F, \epsilon)$, of the same order as the upper bounds given in Theorem 6.1 above.
Theorem 6.2 (Reconstruction error lower bound). Let $F \in \mathcal{P}$ form an $(h, \kappa, \eta, m, M)$-regular cluster. Then:
(1) For each positive $\epsilon \leq C_6 h^{2d-1}$ we have the following lower bound on the worst case reconstruction error of the nodes of $F$:
$$K_1 \epsilon \left( \tfrac{1}{h} \right)^{2d-2} \leq \rho_x(F, \epsilon).$$
(2) For each positive $\epsilon \leq C_7 h^{2d-1}$ we have the following lower bound on the worst case reconstruction error of $F$ and of the amplitudes of $F$:
$$K_2 \epsilon \left( \tfrac{1}{h} \right)^{2d-1} \leq \rho(F, \epsilon),\, \rho_a(F, \epsilon).$$
Above, $K_1, K_2, C_6, C_7$ are constants not depending on $h$; they are defined within the proof of the theorem.

Proof. Let $G = (a, \bar{x}) = \Psi_{\kappa,h}(F)$ be the model signal of $F = (a, x)$, and let $PM(G) = \nu = (\nu_0, \ldots, \nu_{2d-1})$. Consider the Prony curve $S_{2d-2}(G)$, defined by the equations
$$m_k(G') = m_k(G) = \nu_k, \qquad k = 0, \ldots, 2d-2.$$
Assume that $\epsilon \leq R h^{2d-1}$ and let $\epsilon' = (1 + |\kappa|)^{-2d+1} \epsilon$. By the choice of $\epsilon$ we have $PM(\Pi_{\epsilon', \frac{1}{h}}(G)) \subset Q_R(\nu)$. Then, by Corollary 4.1, the moment coordinates form a regular analytic coordinate system on $\Pi_{\epsilon', \frac{1}{h}}(G)$. We can therefore fix the signal $G_{LB} \in \bar{\mathcal{P}}$ with moment coordinates $\nu_{LB} = (\nu_0, \ldots, \nu_{2d-2}, \nu_{2d-1} + \epsilon' h^{-2d+1})$. The signal $G_{LB}$ is one of the intersection points of the Prony curve $S_{2d-2}(G)$ and the boundary of the parallelepiped $\Pi_{\epsilon', \frac{1}{h}}(G)$.

By Theorem 5.2 the error set satisfies $\Pi_{\epsilon', \frac{1}{h}}(G) \subset \bar{E}_\epsilon(F)$, hence $G_{LB} \in \bar{E}_\epsilon(F)$. Once again, by Corollary 4.1, the moment metric and the maximum metric on $\bar{\mathcal{P}}$ are equivalent, and we have
$$(6.4)\qquad \|G_{LB} - G\| \geq C_3 \cdot d(G_{LB}, G) = C_3\, \epsilon' h^{-2d+1}.$$
The rest of the proof is essentially devoted to showing that the projection of the error onto both the amplitude and the node subspaces is non-degenerate, and to deriving specific constants that bound the size of these projections from below.
Let $G_{LB} = (\tilde{a}, \tilde{x})$ with $\tilde{a} = (\tilde{a}_1, \ldots, \tilde{a}_d)$ and $\tilde{x} = (\tilde{x}_1, \ldots, \tilde{x}_d)$. We now prove that for this specific signal (and for $\epsilon$ small enough), the errors in the amplitudes and in the nodes, $\|\tilde{a} - a\|$ and $\|\tilde{x} - \bar{x}\|$, are bounded from below as required.

We study in more detail the structure of the Jacobian matrix of the Prony mapping at the (regular) signal $G$. The Jacobian of $PM$ at the point $G = (a, \bar{x})$ is given by the matrix $J = J(G)$:
$$(6.5)\qquad J = \begin{pmatrix} 1 & \cdots & 1 & 0 & \cdots & 0 \\ \bar{x}_1 & \cdots & \bar{x}_d & a_1 & \cdots & a_d \\ \vdots & & \vdots & \vdots & & \vdots \\ \bar{x}_1^{2d-1} & \cdots & \bar{x}_d^{2d-1} & a_1(2d-1)\bar{x}_1^{2d-2} & \cdots & a_d(2d-1)\bar{x}_d^{2d-2} \end{pmatrix},$$
or $J = [J_{k,j}]$ with
$$J_{k,j} = \begin{cases} \dfrac{\partial m_k(G)}{\partial a_j} = \bar{x}_j^k, & k = 0, \ldots, 2d-1,\ j = 1, \ldots, d, \\[2mm] \dfrac{\partial m_k(G)}{\partial \bar{x}_{j-d}} = k\, a_{j-d}\, \bar{x}_{j-d}^{k-1}, & k = 0, \ldots, 2d-1,\ j = d+1, \ldots, 2d. \end{cases}$$
We use the following notation to refer to submatrix blocks of $J$. For $J$ as above, we index the rows of $J$ (corresponding to the moment functions $m_0, \ldots, m_{2d-1}$) by $0, \ldots, 2d-1$, and the columns of $J$ by $1, \ldots, 2d$. We denote by $J(m:n, i:j)$, $0 \leq m \leq n \leq 2d-1$, $1 \leq i \leq j \leq 2d$, the block of $J$ formed by the intersection of the rows $m, \ldots, n$ and the columns $i, \ldots, j$ of $J$. We now prove a lower bound for the worst case error of the nodes of $G$.

Proposition 6.1.
For $G_{LB} = (\tilde{a}, \tilde{x})$, $G = (a, \bar{x})$ as above, and for $\epsilon \leq C_6 h^{2d-1}$,
$$\|\tilde{x} - \bar{x}\| \geq K_1 \epsilon \left( \tfrac{1}{h} \right)^{2d-1},$$
where $K_1, C_6$ are constants depending only on $(\eta, \kappa, M, m, d)$, defined within the proof.

Proof. Consider the upper left $d \times d$ block of $J$, $J_1 = J(0:d-1, 1:d)$, and the upper right block of $J$, $J_2 = J(0:d-1, d+1:2d)$. We will need the following preliminaries. The next proposition bounds the remainder of the linear approximation of $PM$ near a regular signal $G$.

Proposition 6.2.
Let $G$ be an $(\eta, m, M)$-regular signal, let $r \leq d^{-1}$, and let $G'$ be a signal such that $\|G' - G\| \leq r$. Let $J = J(G)$ be the Jacobian matrix at $G$. Then
$$\left\| \left( PM(G') - PM(G) \right) - J \cdot (G' - G) \right\| \leq C_5 \cdot r \cdot \|G' - G\|,$$
where $C_5 = 6(M+1)(2d-1)d$. The proof of Proposition 6.2 is given as an intermediate step in the proof of our version of the quantitative inverse function theorem; see Appendix A, Proposition A.1.
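The quadratic nature of the remainder in Proposition 6.2 is easy to observe numerically. In the rough sketch below (our own, not from the paper), halving the perturbation size divides the linearization remainder by about four:

```python
import numpy as np

def moments(a, x, n):
    return np.array([np.sum(a * x ** k) for k in range(n)])

def jacobian(a, x):
    d = len(a)
    J = np.zeros((2 * d, 2 * d))
    for k in range(2 * d):
        J[k, :d] = x ** k
        J[k, d:] = k * a * x ** (k - 1) if k > 0 else 0.0
    return J

a = np.array([1.0, 1.0]); x = np.array([-0.5, 0.5])
J = jacobian(a, x)
da = np.array([1.0, -1.0]); dx = np.array([0.5, 1.0])  # perturbation direction

def remainder(r):
    """Sup-norm of PM(G') - PM(G) - J (G' - G) for a step of size r."""
    step = np.concatenate([r * da, r * dx])
    m1 = moments(a + r * da, x + r * dx, 4)
    m0 = moments(a, x, 4)
    return np.max(np.abs(m1 - m0 - J @ step))

r1, r2 = remainder(1e-3), remainder(5e-4)
assert 3.5 < r1 / r2 < 4.5   # quadratic scaling of the remainder
```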
Proposition 6.3.
Let $A$ be a non-singular $d \times d$ matrix and let $B$ be any non-zero $d \times d$ matrix. Let $v \neq 0$, $u \in \mathbb{R}^d$ be such that $\|Av + Bu\| \leq \alpha \|v\|$, where $\|\cdot\|$ is any norm on $\mathbb{R}^d$. Then
$$\|u\| \geq \frac{1 - \alpha \|A^{-1}\|}{\|A^{-1}\| \|B\|}\, \|v\|,$$
where $\|A^{-1}\|, \|B\|$ are the induced matrix norms.
Proof.
Put $\omega = Av + Bu$. Then
$$\|v\| = \|A^{-1}(-Bu + \omega)\| \leq \|A^{-1}\| (\|B\| \|u\| + \|\omega\|) \leq \|A^{-1}\| (\|B\| \|u\| + \alpha \|v\|).$$
Rearranging the above we get $\frac{1 - \alpha \|A^{-1}\|}{\|A^{-1}\| \|B\|} \|v\| \leq \|u\|$. $\square$

Let $P_{1,d} : \mathbb{R}^{2d} \to \mathbb{R}^d$ be the projection onto the first $d$ coordinates, i.e. for $x = (x_0, \ldots, x_{2d-1}) \in \mathbb{R}^{2d}$, $P_{1,d}\, x = (x_0, \ldots, x_{d-1})$. By Proposition 6.2, and since the first $d$ coordinates of $PM(G_{LB}) - PM(G)$ vanish ($G_{LB}$ lies on the Prony curve, so $m_k(G_{LB}) = m_k(G)$ for $k = 0, \ldots, 2d-2$), we get
$$(6.6)\qquad C_5 \|G_{LB} - G\|^2 \geq \left\| \left( PM(G_{LB}) - PM(G) \right) - J \cdot (G_{LB} - G) \right\| \geq \left\| P_{1,d} \left( \left( PM(G_{LB}) - PM(G) \right) - J \cdot (G_{LB} - G) \right) \right\| = \|J_1(\tilde{a} - a) + J_2(\tilde{x} - \bar{x})\|.$$
We note that $J_1$ is a Vandermonde matrix with nodes $\bar{x}_1, \ldots, \bar{x}_d$. The following theorem bounds the norm of an inverse Vandermonde matrix.

Theorem 6.3 (Gautschi, [15], Theorem 1). Let $V_d = V_d(x_1, \ldots, x_d)$ be a $d \times d$ Vandermonde matrix, $V_{i,j} = x_j^i$, $i = 0, \ldots, d-1$, $j = 1, \ldots, d$, with distinct nodes. Then
$$\|V_d^{-1}\| \leq \max_{1 \leq \lambda \leq d} \prod_{j=1, j \neq \lambda}^d \frac{1 + |x_j|}{|x_\lambda - x_j|}.$$
The nodes of $G$ satisfy $|\bar{x}_i| \leq 1$ and, for $i \neq j$, $|\bar{x}_i - \bar{x}_j| \geq \eta$. Based on Theorem 6.3 we can bound $\|J_1^{-1}\|$ by a constant depending only on the minimal separation $\eta$ of the nodes and on $d$. The next proposition, given without proof, is a direct consequence of Theorem 6.3.

Proposition 6.4.
Let $V_d = V_d(x_1, \ldots, x_d)$ be a $d \times d$ Vandermonde matrix, $V_{i,j} = x_j^i$, $i = 0, \ldots, d-1$, with $|x_i| \leq 1$ and $|x_i - x_j| \geq \eta$ for each $1 \leq i < j \leq d$. Then
$$\|V_d^{-1}\| \leq \eta^{-d+1}\, 2^{d-1} \left( \lfloor \tfrac{d-1}{2} \rfloor ! \right)^{-2}.$$
Therefore we fix the constant $C_8 = C_8(\eta, d)$ such that
$$\|J_1^{-1}\| \leq C_8(\eta, d) = \eta^{-d+1}\, 2^{d-1} \left( \lfloor \tfrac{d-1}{2} \rfloor ! \right)^{-2}.$$
By a direct calculation we also have $\|J_2\| \leq d(d-1)M$. By equation (6.4), $\|G_{LB} - G\| \geq C_3\, \epsilon' h^{-2d+1}$, with $\epsilon' = (1 + |\kappa|)^{-2d+1} \epsilon$. Hence, either $\|\tilde{x} - \bar{x}\| = \|G_{LB} - G\| \geq C_3\, \epsilon' h^{-2d+1}$, and in this case, setting $K_1 = C_3 (1 + |\kappa|)^{-2d+1}$, we are done; or
$$(6.7)\qquad \|\tilde{a} - a\| = \|G_{LB} - G\|.$$
We continue under the assumption of equation (6.7). From (6.7) and (6.6) we have that, for $\alpha = C_5 \|G_{LB} - G\|$,
$$\|J_1(\tilde{a} - a) + J_2(\tilde{x} - \bar{x})\| \leq \alpha \|\tilde{a} - a\|.$$
We now apply Proposition 6.3 with $A = J_1$, $B = J_2$, $v = \tilde{a} - a$, $u = \tilde{x} - \bar{x}$, $\alpha = C_5 \|G_{LB} - G\|$. We get
$$(6.8)\qquad \|\tilde{x} - \bar{x}\| \geq C_3\, \epsilon' h^{-2d+1}\, \frac{1 - C_5 \|G_{LB} - G\| \|J_1^{-1}\|}{\|J_1^{-1}\| \|J_2\|}.$$
Define the constant $C_6 = C_6(\kappa, d, \eta, m, M)$:
$$(6.9)\qquad C_6(\kappa, d, \eta, m, M) = \min\left[ \frac{(1 + |\kappa|)^{2d-1}}{2\, C_4 C_5 C_8},\ R \right].$$
Then for $\epsilon \leq C_6 h^{2d-1}$ the numerator in (6.8) satisfies
$$(6.10)\qquad 1 - C_5 \|G_{LB} - G\| \|J_1^{-1}\| \geq \frac{1}{2},$$
where we used Corollary 4.1 to bound $\|G_{LB} - G\|$ from above by $C_4\, \epsilon' h^{-2d+1}$. By the previously derived bounds on $\|J_1^{-1}\|$, $\|J_2\|$, and by inequality (6.10),
$$(6.11)\qquad \frac{1 - C_5 \|G_{LB} - G\| \|J_1^{-1}\|}{\|J_1^{-1}\| \|J_2\|} \geq \frac{1}{2 \|J_1^{-1}\| \|J_2\|} \geq \frac{1}{2\, C_8\, d(d-1) M}.$$
Plugging (6.11) back into (6.8), we have that for $\epsilon \leq C_6 h^{2d-1}$,
$$(6.12)\qquad \|\tilde{x} - \bar{x}\| \geq C_3\, \epsilon' h^{-2d+1} \cdot \frac{1}{2\, C_8\, d(d-1) M}.$$
Fixing
$$(6.13)\qquad K_1 = \frac{C_3 (1 + |\kappa|)^{-2d+1}}{2\, C_8\, d(d-1) M},$$
we get $K_1 \epsilon h^{-2d+1} \leq \|\tilde{x} - \bar{x}\|$. This concludes the proof of Proposition 6.1. $\square$

We now prove the lower bound for the worst case error of the amplitudes of $G$.
Proposition 6.5. For $G_{LB} = (\tilde{a}, \tilde{x})$, $G = (a, \bar{x})$ as above, and for $\epsilon \leq C_7 h^{2d-1}$,
$$\|\tilde{a} - a\| \geq K_2 \epsilon \left( \tfrac{1}{h} \right)^{2d-1},$$
where $K_2, C_7$ are constants depending only on $(\eta, \kappa, M, m, d)$, defined within the proof.

Proof. The proof of Proposition 6.5 goes along similar lines as that of Proposition 6.1. Consider the following blocks of the Jacobian matrix at $G$ given in equation (6.5): let $J_3 = J(1:d, 1:d)$ and $J_4 = J(1:d, d+1:2d)$. Let $P_{2,d} : \mathbb{R}^{2d} \to \mathbb{R}^d$ be the projection onto the coordinates $(2, \ldots, d+1)$, i.e. for $v = (v_0, \ldots, v_{2d-1}) \in \mathbb{R}^{2d}$, $P_{2,d}\, v = (v_1, \ldots, v_d)$. By Proposition 6.2, and since $m_k(G_{LB}) = m_k(G)$ for $k = 0, \ldots, 2d-2$, we get
$$(6.14)\qquad C_5 \|G_{LB} - G\|^2 \geq \left\| \left( PM(G_{LB}) - PM(G) \right) - J \cdot (G_{LB} - G) \right\| \geq \left\| P_{2,d} \left( \left( PM(G_{LB}) - PM(G) \right) - J \cdot (G_{LB} - G) \right) \right\| = \|J_3(\tilde{a} - a) + J_4(\tilde{x} - \bar{x})\|.$$
The block $J_4$ admits the following factorization:
$$(6.15)\qquad J_4 = \mathrm{diag}(1, 2, \ldots, d)\; V_d(\bar{x}_1, \ldots, \bar{x}_d)\; \mathrm{diag}(a_1, \ldots, a_d),$$
where $\mathrm{diag}(1, 2, \ldots, d)$ is the diagonal matrix with $(1, 2, \ldots, d)$ on the diagonal, $V_d(\bar{x}_1, \ldots, \bar{x}_d)$ is the Vandermonde matrix over the nodes $(\bar{x}_1, \ldots, \bar{x}_d)$, and $\mathrm{diag}(a_1, \ldots, a_d)$ is the diagonal matrix with the amplitudes on the diagonal.

By Theorem 6.3 and the factorization (6.15) we have $\|J_4^{-1}\| \leq \max[1, m^{-1}]\, C_8$. We also have $\|J_3\| \leq d$. By equation (6.4), $\|G_{LB} - G\| \geq C_3\, \epsilon' h^{-2d+1}$, with $\epsilon' = (1 + |\kappa|)^{-2d+1} \epsilon$. Hence, either $\|\tilde{a} - a\| = \|G_{LB} - G\| \geq C_3\, \epsilon' h^{-2d+1}$, and in this case, setting $K_2 = C_3 (1 + |\kappa|)^{-2d+1}$, we are done; or
$$(6.16)\qquad \|\tilde{x} - \bar{x}\| = \|G_{LB} - G\|.$$
We continue under the assumption of equation (6.16).
From (6.16) and (6.14) we have that, for $\alpha = C_5 \|G_{LB} - G\|$,
$$\|J_3(\tilde{a} - a) + J_4(\tilde{x} - \bar{x})\| \leq \alpha \|\tilde{x} - \bar{x}\|.$$
We now apply Proposition 6.3 with $A = J_4$, $B = J_3$, $v = \tilde{x} - \bar{x}$, $u = \tilde{a} - a$, $\alpha = C_5 \|G_{LB} - G\|$. We get
$$(6.17)\qquad \|\tilde{a} - a\| \geq C_3\, \epsilon' h^{-2d+1}\, \frac{1 - C_5 \|G_{LB} - G\| \|J_4^{-1}\|}{\|J_4^{-1}\| \|J_3\|}.$$
Define the constant $C_7 = C_7(\kappa, d, \eta, m, M)$:
$$(6.18)\qquad C_7(\kappa, d, \eta, m, M) = \min\left[ \frac{(1 + |\kappa|)^{2d-1}}{2 \max[1, m^{-1}]\, C_4 C_5 C_8},\ R \right].$$
Then for $\epsilon \leq C_7 h^{2d-1}$,
$$(6.19)\qquad 1 - C_5 \|G_{LB} - G\| \|J_4^{-1}\| \geq \frac{1}{2},$$
where we used Corollary 4.1 to bound $\|G_{LB} - G\|$ from above by $C_4\, \epsilon' h^{-2d+1}$. By the previously derived bounds on $\|J_4^{-1}\|$, $\|J_3\|$, and by inequality (6.19),
$$(6.20)\qquad \frac{1 - C_5 \|G_{LB} - G\| \|J_4^{-1}\|}{\|J_4^{-1}\| \|J_3\|} \geq \frac{1}{2 \|J_4^{-1}\| \|J_3\|} \geq \frac{1}{2 \max[1, m^{-1}]\, C_8\, d}.$$
Plugging (6.20) back into (6.17), we have that for $\epsilon \leq C_7 h^{2d-1}$,
$$(6.21)\qquad \|\tilde{a} - a\| \geq C_3\, \epsilon' h^{-2d+1} \cdot \frac{1}{2 \max[1, m^{-1}]\, C_8\, d}.$$
Fixing
$$(6.22)\qquad K_2 = \frac{C_3 (1 + |\kappa|)^{-2d+1}}{2 \max[1, m^{-1}]\, C_8\, d},$$
we get $K_2 \epsilon h^{-2d+1} \leq \|\tilde{a} - a\|$. This concludes the proof of Proposition 6.5. $\square$

By Propositions 6.1 and 6.5:
• For $\epsilon \leq C_6 h^{2d-1}$, $\|\tilde{x} - \bar{x}\| \geq K_1 \epsilon \left( \frac{1}{h} \right)^{2d-1}$.
• For $\epsilon \leq C_7 h^{2d-1}$, $\|\tilde{a} - a\| \geq K_2 \epsilon \left( \frac{1}{h} \right)^{2d-1}$.
To complete the proof of Theorem 6.2, we now set $F = \Psi^{-1}_{\kappa,h}(G)$ and $F_{LB} = \Psi^{-1}_{\kappa,h}(G_{LB}) \in E_\epsilon(F)$. Under this transformation the amplitudes $\tilde{a}, a$ remain unchanged, while the nodes $\bar{x}, \tilde{x}$ are multiplied by $h$ (and shifted by $\kappa$). Hence, denoting $F_{LB} = (\tilde{a}, \hat{x})$:
$$\|\hat{x} - x\| \geq K_1 \epsilon \left( \tfrac{1}{h} \right)^{2d-2}, \qquad \|\tilde{a} - a\|,\, \|F_{LB} - F\| \geq K_2 \epsilon \left( \tfrac{1}{h} \right)^{2d-1}.$$
This proves the stated lower bounds of Theorem 6.2. $\square$
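The $h^{-2d+1}$ amplification established above can be observed directly in the conditioning of the Prony Jacobian: as the cluster size $h$ shrinks, $\|J^{-1}\|$ grows like $h^{-2d+1}$. A small numerical sketch (our own, for $d = 2$ and equal unit amplitudes):

```python
import numpy as np

def jacobian_inv_norm(h, d=2):
    """Inf-norm of J^{-1} for d unit amplitudes and d equispaced nodes
    clustered in an interval of length h centered at 0."""
    a = np.ones(d)
    x = np.linspace(-h / 2, h / 2, d)
    J = np.zeros((2 * d, 2 * d))
    for k in range(2 * d):
        J[k, :d] = x ** k
        J[k, d:] = k * a * x ** (k - 1) if k > 0 else 0.0
    return np.linalg.norm(np.linalg.inv(J), np.inf)

# For d = 2 the predicted growth rate is h^{-2d+1} = h^{-3}:
ratio = jacobian_inv_norm(0.01) / jacobian_inv_norm(0.02)
assert 7.0 < ratio < 9.0   # halving h multiplies ||J^{-1}|| by about 2^3 = 8
```

This is only the local (first-order) picture; the theorems above make it rigorous for the full nonlinear inversion.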
Till now we have assumed that all the $d$ nodes of the signal $F$ form a cluster of size $h$. The lower bounds of Theorem 6.2 can easily be extended to the case where there are also non-cluster nodes:

Corollary 6.1.
Let $F \in \mathcal{P}_d$. Assume that some $s \leq d$ of the nodes of $F$ form an $(h, \kappa, \eta, m, M)$-regular cluster. Then:
• For each positive $\epsilon \leq C_6 h^{2s-1}$,
$$K_1 \epsilon \left( \tfrac{1}{h} \right)^{2s-2} \leq \rho_x(F, \epsilon).$$
• For each positive $\epsilon \leq C_7 h^{2s-1}$,
$$K_2 \epsilon \left( \tfrac{1}{h} \right)^{2s-1} \leq \rho(F, \epsilon),\, \rho_a(F, \epsilon).$$
The constants $K_1, K_2, C_6, C_7$ are the same constants as in Theorem 6.2, but with $d$ replaced by $s$.

Proof. The required lower bounds follow directly from Theorem 6.2. Indeed, we can perturb only the nodes and the amplitudes in the cluster, leaving the other nodes and amplitudes fixed, and then all the calculations and estimates above remain unchanged. $\square$
Remark
In the presence of non-cluster nodes, obtaining upper bounds for the worst case reconstruction error requires additional considerations. Indeed, perturbing both the cluster and the non-cluster nodes and amplitudes may a priori create even larger deviations than those of Theorem 6.1, with the moments remaining within $\epsilon$ of the original ones. Accuracy estimates in this situation presumably require analysis of several geometric scales at once. There are important open questions related to this multi-scale analysis. In particular, the following question was suggested in [10]: is it true (as numerical experiments suggest) that for well-separated non-cluster nodes, the accuracy of their reconstruction in Prony inversion is of order $\epsilon$, independently of the size and structure of the cluster?

Our next result concerns the worst case accuracy of reconstruction of the Prony varieties $S_q(F)$. The point is that the smaller $q$ is, the larger is the variety $S_q(F)$, but the higher is the accuracy of its reconstruction. This fact was used in Section 3 in order to improve the reconstruction accuracy of the signal $F$ itself. We state this result only in the normalized signal space $\bar{\mathcal{P}}$.

Let $F$ form an $(h, \kappa, \eta, m, M)$-regular cluster and let $G$ be the model signal of $F$. Recall the Hausdorff distance $d_H$ associated with the maximum metric: for $A, B \subseteq \bar{\mathcal{P}}$,
$$d_H(A, B) = \max\left\{ \sup_{G'' \in A} \inf_{G''' \in B} \|G'' - G'''\|,\ \sup_{G'' \in B} \inf_{G''' \in A} \|G'' - G'''\| \right\}.$$
Consider the local Prony variety $S^\pi_{q,\epsilon}(G) = S_q(G) \cap \Pi_{\epsilon, \frac{1}{h'}}(G)$, $h' = \frac{h}{1+|\kappa|}$, and its possible reconstructions $S^\pi_{q,\epsilon}(G') = S_q(G') \cap \Pi_{\epsilon, \frac{1}{h'}}(G)$, $G' \in \bar{E}_\epsilon(F)$. Define the worst case error in the reconstruction of the local Prony variety $S^\pi_{q,\epsilon}(G)$ via the Hausdorff distance $d_H$:
$$\bar{\rho}_q(F, \epsilon) = \max_{G' \in \bar{E}_\epsilon(F)} d_H\left( S^\pi_{q,\epsilon}(G),\, S^\pi_{q,\epsilon}(G') \right).$$

Theorem 6.4.
Let $F \in \mathcal{P}_d$ form an $(h, \kappa, \eta, m, M)$-regular cluster. Set $\epsilon' = (1 + |\kappa|)^{-2d+1} \epsilon$ and $h' = \frac{h}{1+|\kappa|}$. Then for each positive $\epsilon \leq h'^{\,2d-1} R$,
$$C_3\, \epsilon' \left( \tfrac{1}{h} \right)^q \leq \bar{\rho}_q(F, \epsilon) \leq C_4\, \epsilon \left( \tfrac{1}{h'} \right)^q,$$
where $C_3, C_4, R$ are the constants defined in Theorem 4.1.

Proof. Define the Hausdorff distance $d^M_H$ associated with the moment metric $d$: for $A, B \subseteq \bar{\mathcal{P}}$,
$$d^M_H(A, B) = \max\left\{ \sup_{G'' \in A} \inf_{G''' \in B} d(G'', G'''),\ \sup_{G'' \in B} \inf_{G''' \in A} d(G'', G''') \right\}.$$
Let $G$ be the model signal of $F$. Define the worst case error in the reconstruction of the local Prony variety $S^\pi_{q,\epsilon}(G)$, in the moment metric, by
$$\tilde{\rho}_q(F, \epsilon) = \max_{G' \in \bar{E}_\epsilon(F)} d^M_H\left( S^\pi_{q,\epsilon}(G),\, S^\pi_{q,\epsilon}(G') \right).$$
For each $G' \in \bar{E}_\epsilon(F)$, the Prony varieties $S_q(G), S_q(G') \subset \bar{\mathcal{P}}$ are the moment coordinate subspaces given by
$$S_q(G) = \{G'' : m_k(G'') = m_k(G),\ k = 0, \ldots, q\}, \qquad S_q(G') = \{G'' : m_k(G'') = m_k(G'),\ k = 0, \ldots, q\}.$$
The Hausdorff distance between them, with respect to the moment metric $d$, is equal to $\max_{k=0,\ldots,q} |m_k(G) - m_k(G')|$. As a result, for every $\epsilon > 0$,
$$(6.23)\qquad \tilde{\rho}_q(F, \epsilon) = \max_{G' \in \bar{E}_\epsilon(F)} \max_{k=0,\ldots,q} |m_k(G) - m_k(G')|.$$
By the first statement of Theorem 5.2, $\Pi_{\epsilon', \frac{1}{h}}(G) \subset \bar{E}_\epsilon(F) \subset \Pi_{\epsilon, \frac{1}{h'}}(G)$. Therefore, for every $\epsilon > 0$,
$$\epsilon' \left( \tfrac{1}{h} \right)^q \leq \tilde{\rho}_q(F, \epsilon) \leq \epsilon \left( \tfrac{1}{h'} \right)^q.$$
For $\epsilon \leq h'^{\,2d-1} R$,
$$PM(S^\pi_{q,\epsilon}(G)),\ PM(S^\pi_{q,\epsilon}(G')) \subset PM(\Pi_{\epsilon, \frac{1}{h'}}(G)) \subset Q_R(PM(G)).$$
We can therefore apply the equivalence of the moment and the maximum metrics given in Corollary 4.1 and obtain, from equation (6.23), the required result of Theorem 6.4. $\square$
Notice that, essentially, Theorems 6.1 and 6.2 are a special case of Theorem 6.4, for $q = 2d-1$.

Corollary 6.2.
Let $F = (a,x) \in P_d$ form an $(h,\kappa,\eta,m,M)$-regular cluster, let $\epsilon \le R\,(h'/2)^{2d-1}$, and set $h' = h|\kappa|$. Then for any $F' \in E_\epsilon(F)$ and for any $q = 0,\dots,2d-1$:

(1) $F$ is contained within the $\Delta'_q$-neighborhood of the Prony variety $S_q(F')$, for
\[
\Delta'_q = C_2 \left(\frac{2}{h'}\right)^{q} \epsilon.
\]
(2) The nodes vector $x$ is contained within an $\frac{h}{2}\Delta'_q$-neighborhood of the projection of the Prony variety $S_q(F')$ into the nodes coordinates, $S^x_q(F')$.

The constants $R, C_2$ are as defined in Theorem 4.1.

Proof. First we show a certain invariance of the Prony varieties under shift and scale transformations.
Proposition 6.6.
Let $F \in P_d$ and, for $h > 0$, $\kappa \in \mathbb{R}$, let $G = \Psi_{\kappa,h}(F)$. Then, for each $q = 0,\dots,2d-1$, the Prony varieties $S_q(F)$ and $S_q(G)$ satisfy $S_q(F) = \Psi^{-1}_{\kappa,h}(S_q(G))$.

This is simply a consequence of the fact that both the shift and the scale transformations act triangularly on the moment space. Formally:
Proof.
Let $F' \in \Psi^{-1}_{\kappa,h}(S_q(G))$ and let $G' \in S_q(G)$ be the signal such that $\Psi_{\kappa,h}(F') = SC_h\, SH_\kappa(F') = G'$. By equation (5.3) and Proposition 5.1, the moments of $F'$ are expressed via the moments of $G'$ as
\[
m_k(F') = \sum_{l=0}^{k} \binom{k}{l} \kappa^{k-l} h^{l}\, m_l(G').
\]
By definition of $S_q(G)$, for all $k = 0,\dots,q$ and for all $G' \in S_q(G)$, $m_k(G') = m_k(G)$. Therefore, for all $k = 0,\dots,q$,
\[
m_k(F') = \sum_{l=0}^{k} \binom{k}{l} \kappa^{k-l} h^{l}\, m_l(G) = m_k(F),
\]
and hence $F' \in S_q(F)$. Since each step is reversible, this completes the proof. □

Now the proof of Corollary 6.2 follows directly from combining the upper bound given in Theorem 6.4 and Proposition 6.6. □
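The triangular relation between the moments of a signal and the moments of its shifted-and-scaled model is just the binomial expansion of $(hy_j + \kappa)^k$ summed against the amplitudes. The sketch below assumes the normalization $\Psi_{\kappa,h}\colon x_j \mapsto y_j = (x_j - \kappa)/h$ (our reading of the maps $SH_\kappa$, $SC_h$; the paper's exact sign convention may differ) and verifies the relation numerically:

```python
from math import comb

def moments(a, x, n):
    return [sum(aj * xj**k for aj, xj in zip(a, x)) for k in range(n)]

a = [2.0, -1.0, 0.5]
x = [1.0, 1.5, 2.0]
kappa, h = 1.5, 0.5                     # assumed normalization: y_j = (x_j - kappa)/h
y = [(xj - kappa) / h for xj in x]      # model nodes, here [-1.0, 0.0, 1.0]

mF, mG = moments(a, x, 6), moments(a, y, 6)

# m_k(F) = sum_l C(k,l) * kappa**(k-l) * h**l * m_l(G): binomial expansion of
# (h*y_j + kappa)**k summed against the amplitudes a_j.
for k in range(6):
    rhs = sum(comb(k, l) * kappa**(k - l) * h**l * mG[l] for l in range(k + 1))
    assert abs(mF[k] - rhs) < 1e-12, (k, mF[k], rhs)
print("triangular moment relation verified for k = 0..5")
```

Because the change of variables is triangular in $k$, prescribing the first $q+1$ moments of $G$ prescribes the first $q+1$ moments of $F$ and vice versa, which is the content of Proposition 6.6.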
Appendix A. Quantitative inverse function theorem
Let $G$ be an $(\eta,m,M)$-regular signal and $PM(G) = \nu = (\nu_0,\dots,\nu_{2d-1})$. To prove Theorem 4.1, statement 2, we need to give explicit constants $R, C_1, C_2$, depending only on $d, \eta, m, M$, such that: the inverse mapping $PM^{-1}$ is regular analytic in the cube $Q_R(\nu)$ and for each $\nu', \nu'' \in Q_R(\nu)$,
\[
C_1 \|\nu'' - \nu'\| \le \| PM^{-1}(\nu'') - PM^{-1}(\nu') \| \le C_2 \|\nu'' - \nu'\|.
\]

(Footnote: Theorem 6.1, stated in the original signal space $P$, is strictly a special case of the upper bound given in Theorem 6.4, stated in the model space $\bar P$. Theorem 6.2 and the lower bound given in Theorem 6.4 have the same asymptotics in $h$; however, the constants and the required size of $\epsilon$ differ, since for the lower bound in the original space $P$ we need to ensure that the projection of the error into the amplitude space is non-degenerate.)
Theorem 4.1, statement 2.
Let $J = J(G)$ be the Jacobian matrix of $PM$ at $G$. Let $C_3 = C_3(m,\eta,d)$, $C_4 = C_4(d,M)$ be the constants derived in statement 1 of Theorem 4.1, satisfying $\|J^{-1}\| \le C_3$, $\|J\| \le C_4$. Then for
\[
R = \left( 48\, C_3^2\, d\, (M+1)(2d-1)^2 \right)^{-1}, \qquad
C_1 = \frac{2C_3}{2C_3 C_4 + 1}, \qquad
C_2 = 2C_3,
\]
the inverse mapping $PM^{-1}$ is regular analytic in the cube $Q_R(\nu)$ and for each $\nu', \nu'' \in Q_R(\nu)$,
\[
C_1 \|\nu'' - \nu'\| \le \| PM^{-1}(\nu'') - PM^{-1}(\nu') \| \le C_2 \|\nu'' - \nu'\|.
\]

Proof of Theorem 4.1, statement 2.
The next proposition provides a Lipschitz constant for the difference between $PM$ and its linear part in a neighborhood of $G$.

Proposition A.1.
Let $G$ be an $(\eta,m,M)$-regular signal, let $r \le \frac{1}{2d}$, and let $G'$ be a signal such that $\|G' - G\| \le r$. Let $J = J(G)$ be the Jacobian matrix of $PM$ at $G$. Then
\[
\left\| \left( PM(G') - PM(G) \right) - J \cdot (G' - G) \right\| \le C_5(d,M)\, r\, \|G' - G\|,
\]
where $C_5 = 6(M+1)(2d-1)^2 d$.

Proof. First, for each $G''$ such that $\|G - G''\| \le r$ we have the following upper bound on the second derivatives of the moment functions: for each moment of order $k = 0,\dots,2d-1$,
\[
\left| \frac{\partial^2 m_k}{\partial x_i^2}(G'') \right|,\ \left| \frac{\partial^2 m_k}{\partial a_i\, \partial x_i}(G'') \right| \le 3(M+1)(2d-1)^2, \tag{A.1}
\]
while the rest of the second derivatives are zero.

We use the standard multi-index notation: for $\alpha = (\alpha_1,\dots,\alpha_n)$, $\alpha \in (\mathbb{N}\cup\{0\})^n$, we define the absolute value $|\alpha| = \alpha_1 + \dots + \alpha_n$; the factorial $\alpha! = \alpha_1! \cdot \alpha_2! \cdots \alpha_n!$; the power, for $u \in \mathbb{R}^n$, $u^\alpha = u_1^{\alpha_1} \cdots u_n^{\alpha_n}$; and the partial derivative, for $x = (x_1,\dots,x_n) \in \mathbb{R}^n$, $D^\alpha = \frac{\partial^{|\alpha|}}{\partial x_1^{\alpha_1} \cdots \partial x_n^{\alpha_n}}$.

Put $G' = G + h$, where $\|h\| \le r$. Taking the first order Taylor approximation with remainder we have that, for each $k = 0,\dots,2d-1$,
\[
\left| (m_k(G') - m_k(G)) - \nabla m_k(G) \cdot h \right| = \Bigg| \sum_{|\alpha| = 2,\ \alpha \in (\mathbb{N}\cup\{0\})^{2d}} \frac{1}{\alpha!}\, D^\alpha m_k(G_k)\, h^\alpha \Bigg|,
\]
where $G_k \in [G, G']$. Since at most $2d$ of the second derivatives are nonzero (the $d$ pure derivatives $\partial^2/\partial x_i^2$ and the $d$ mixed derivatives $\partial^2/\partial a_i \partial x_i$), and $|h^\alpha| \le \|h\|^2 \le r\|h\|$ for $|\alpha| = 2$,
\[
\Bigg| \sum_{|\alpha| = 2} \frac{1}{\alpha!}\, D^\alpha m_k(G_k)\, h^\alpha \Bigg|
\le \sum_{|\alpha| = 2} |D^\alpha m_k(G_k)| \cdot |h^\alpha|
\le r \|h\| \sum_{|\alpha| = 2} |D^\alpha m_k(G_k)|
\le r \|h\| \cdot 2d \cdot 3(M+1)(2d-1)^2 = 6(M+1)(2d-1)^2 d \cdot r \|h\|.
\]
The proposition follows. □
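Proposition A.1 says the moment map deviates from its linearization at $G$ only to second order in the perturbation. This can be observed with a quick numerical sketch (our illustrative code; the Jacobian entries $\partial m_k/\partial a_j = x_j^k$ and $\partial m_k/\partial x_j = a_j k x_j^{k-1}$ follow directly from $m_k = \sum_j a_j x_j^k$):

```python
def PM(a, x):
    """Moment map: (a, x) -> (m_0, ..., m_{2d-1})."""
    d = len(a)
    return [sum(aj * xj**k for aj, xj in zip(a, x)) for k in range(2 * d)]

def jacobian(a, x):
    """Rows k = 0..2d-1; columns dm_k/da_j, then dm_k/dx_j."""
    d = len(a)
    return [[xj**k for xj in x] +
            [a[j] * k * x[j]**(k - 1) if k > 0 else 0.0 for j in range(d)]
            for k in range(2 * d)]

a, x = [1.0, 0.8], [0.3, 0.5]
delta = [1e-4, -2e-4, 3e-4, -1e-4]       # perturbation of (a_1, a_2, x_1, x_2)
ap = [a[j] + delta[j] for j in range(2)]
xp = [x[j] + delta[2 + j] for j in range(2)]

J = jacobian(a, x)
lin = [sum(Jk[i] * delta[i] for i in range(4)) for Jk in J]
residual = max(abs(m1 - m0 - lk) for m1, m0, lk in zip(PM(ap, xp), PM(a, x), lin))
norm = max(abs(t) for t in delta)
# The deviation from the linearization is quadratic in the perturbation,
# i.e. bounded by (const) * r * ||delta|| with r = ||delta||, as in Prop. A.1.
assert residual <= 100 * norm**2
```

Halving `delta` should roughly quarter `residual`, the signature of a second-order remainder.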
Corollary A.1.
Let $G$ be an $(\eta,m,M)$-regular signal, let $r \le \frac{1}{2d}$, and let $G', G''$ be signals such that $G'', G' \in Q_r(G)$. Denote by $J = J(G)$ the Jacobian matrix at $G$. Then
\[
\left\| \left( PM(G'') - PM(G') \right) - J \cdot (G'' - G') \right\| \le 2\, C_5(d,M)\, r\, \|G'' - G'\|.
\]
It is a direct consequence of Proposition A.1. □
Fix $r = \frac{1}{4 C_3 C_5}$. Then for $G', G'' \in Q_r(G)$ we have, by Corollary A.1, that
\[
\| PM(G'') - PM(G') \| \ge \| J(G'' - G') \| - 2 C_5 r \|G'' - G'\| = \| J(G'' - G') \| - \frac{1}{2C_3} \|G'' - G'\|
\ge \frac{1}{C_3} \|G'' - G'\| - \frac{1}{2C_3} \|G'' - G'\| = \frac{1}{2C_3} \|G'' - G'\|,
\]
and
\[
\| PM(G'') - PM(G') \| \le \| J(G'' - G') \| + 2 C_5 r \|G'' - G'\| = \| J(G'' - G') \| + \frac{1}{2C_3} \|G'' - G'\|
\le C_4 \|G'' - G'\| + \frac{1}{2C_3} \|G'' - G'\| = \left( C_4 + \frac{1}{2C_3} \right) \|G'' - G'\|.
\]
We conclude that for $r = \frac{1}{4 C_3 C_5}$ and $G', G'' \in Q_r(G)$, $PM$ is one to one on $Q_r(G)$ and satisfies there
\[
\frac{1}{2C_3} \|G'' - G'\| \le \| PM(G'') - PM(G') \| \le \left( C_4 + \frac{1}{2C_3} \right) \|G'' - G'\|. \tag{A.2}
\]
Since
$PM$ is one to one on the open cube $\mathrm{interior}(Q_r(G))$, by the invariance of domain theorem $PM$ is a homeomorphism between $\mathrm{interior}(Q_r(G))$ and $PM(\mathrm{interior}(Q_r(G)))$, and $PM(\mathrm{interior}(Q_r(G)))$ is open.

Let $PM(G) = \nu$. By equation (A.2), $PM(\mathrm{interior}(Q_r(G)))$ contains the cube $Q_R(\nu)$ of radius $R = \frac{1}{2C_3} r$, and for each $\nu', \nu'' \in Q_R(\nu)$,
\[
\frac{2C_3}{2 C_3 C_4 + 1} \|\nu'' - \nu'\| \le \| PM^{-1}(\nu'') - PM^{-1}(\nu') \| \le 2 C_3 \|\nu'' - \nu'\|.
\]
Fixing
\[
R = \frac{1}{2C_3}\, r, \qquad C_1 = \frac{2C_3}{2 C_3 C_4 + 1}, \qquad C_2 = 2 C_3 \tag{A.3}
\]
concludes the proof of statement 2 of Theorem 4.1. □
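The two-sided bound can be watched in action. The sketch below (illustrative code; the constants are computed for one well-separated $d = 2$ example, not the paper's worst case) computes $\|J\|$ and $\|J^{-1}\|$ in the max-norm and checks that a small perturbation of $(a,x)$ moves the moment vector by a factor inside $[\frac{1}{2C_3},\, C_4 + \frac{1}{2C_3}]$, as in (A.2):

```python
def PM(a, x):
    """Moment map: (a, x) -> (m_0, ..., m_{2d-1})."""
    d = len(a)
    return [sum(aj * xj**k for aj, xj in zip(a, x)) for k in range(2 * d)]

def jacobian(a, x):
    d = len(a)
    return [[xj**k for xj in x] +
            [a[j] * k * x[j]**(k - 1) if k > 0 else 0.0 for j in range(d)]
            for k in range(2 * d)]

def inverse(M):
    """Gauss-Jordan inverse with partial pivoting (square, nonsingular M)."""
    n = len(M)
    A = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(n):
            if r != c and A[r][c] != 0.0:
                A[r] = [vr - A[r][c] * vc for vr, vc in zip(A[r], A[c])]
    return [row[n:] for row in A]

def inf_norm(M):
    return max(sum(abs(v) for v in row) for row in M)

a, x = [1.0, 1.0], [-0.5, 0.5]          # well-separated nodes: J is well conditioned
J = jacobian(a, x)
C4 = inf_norm(J)                         # ||J||
C3 = inf_norm(inverse(J))                # ||J^{-1}||

delta = [1e-6, -1e-6, 0.5e-6, -0.5e-6]
dm = [u - v for u, v in zip(PM([a[0] + delta[0], a[1] + delta[1]],
                               [x[0] + delta[2], x[1] + delta[3]]), PM(a, x))]
ratio = max(abs(v) for v in dm) / max(abs(v) for v in delta)
assert 1 / (2 * C3) <= ratio <= C4 + 1 / (2 * C3)
```

For a clustered signal the same experiment shows $\|J^{-1}\|$ blowing up as the nodes approach each other, which is the quantitative source of the error amplification studied in Section 6.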
References

[1] Andrey Akinshin, Dmitry Batenkov, and Yosef Yomdin. Accuracy of spike-train Fourier reconstruction for colliding nodes. In 2015 International Conference on Sampling Theory and Applications (SampTA), pages 617–621. IEEE, 2015.
[2] Andrey Akinshin, Gil Goldman, Vladimir Golubyatnikov, and Yosef Yomdin. Accuracy of reconstruction of spike-trains with two near-colliding nodes. In Proc. Complex Analysis and Dynamical Systems VII, volume 699, pages 1–17. The AMS and Bar-Ilan University, 2015.
[3] Vladimir Igorevich Arnol'd. Hyperbolic polynomials and Vandermonde mappings. Functional Analysis and Its Applications, 20(2):125–127, 1986.
[4] Jon R. Auton and Michael L. Van Blaricum. Investigation of procedures for automatic resonance extraction from noisy transient electromagnetics data. Math. Notes, 1:79, 1981.
[5] Dmitry Batenkov. Accurate solution of near-colliding Prony systems via decimation and homotopy continuation. Theoretical Computer Science, 2017.
[6] Dmitry Batenkov, Laurent Demanet, Gil Goldman, and Yosef Yomdin. Stability of partial Fourier matrices with clustered nodes. arXiv preprint arXiv:1809.00658, 2018.
[7] Dmitry Batenkov, Gil Goldman, and Yosef Yomdin. Super-resolution of near-colliding point sources. arXiv preprint arXiv:1904.09186, 2019.
[8] Dmitry Batenkov and Yosef Yomdin. On the accuracy of solving confluent Prony systems. SIAM Journal on Applied Mathematics, 73(1):134–154, 2013.
[9] Dmitry Batenkov and Yosef Yomdin. Geometry and singularities of the Prony mapping. In Proceedings of 12th International Workshop on Real and Complex Singularities, volume 10, pages 1–25, 2014.
[10] Emmanuel J. Candès. Private communication, 2014.
[11] Laurent Demanet and Nam Nguyen. The recoverability limit for superresolution via sparsity. arXiv preprint arXiv:1502.01385, 2015.
[12] David L. Donoho. Superresolution via sparsity constraints. SIAM Journal on Mathematical Analysis, 23(5):1309–1331, 1992.
[13] Albert Fannjiang. Compressive spectral estimation with single-snapshot ESPRIT: stability and resolution. arXiv preprint arXiv:1607.01827 [cs, math], July 2016.
[14] Omer Friedland and Yosef Yomdin. Doubling coverings of algebraic hypersurfaces. arXiv preprint arXiv:1512.02903, 2015.
[15] Walter Gautschi. On inverses of Vandermonde and confluent Vandermonde matrices. Numerische Mathematik, 4(1):117–123, 1962.
[16] Gil Goldman, Yehonatan Salman, and Yosef Yomdin. Accuracy of noisy spike-train reconstruction: a singularity theory point of view. J. Singul., 18:409–426, 2018.
[17] John H. Hubbard and Barbara Burke Hubbard. Vector Calculus, Linear Algebra, and Differential Forms: A Unified Approach. Matrix Editions, 5th edition, 2015.
[18] Vladimir Petrov Kostov. Topics on hyperbolic polynomials in one variable. Panoramas et Synthèses, Société Mathématique de France, (33), 2011.
[19] Stefan Kunis, H. Michael Möller, Thomas Peter, and Ulrich von der Ohe. Prony method under an almost sharp multivariate Ingham inequality. J. Fourier Anal. Appl., 24(5):1306–1318, 2018.
[20] Stefan Kunis and Dominik Nagel. On the condition number of Vandermonde matrices with pairs of nearly-colliding nodes. arXiv preprint arXiv:1812.08645, 2018.
[21] Stefan Kunis, Thomas Peter, Tim Römer, and Ulrich von der Ohe. A multivariate generalization of Prony's method. Linear Algebra Appl., 490:31–47, 2016.
[22] Weilin Li and Wenjing Liao. Stable super-resolution limit and smallest singular value of restricted Fourier matrices. arXiv preprint arXiv:1709.03146, 2017.
[23] Wenjing Liao and Albert Fannjiang. MUSIC for single-snapshot spectral estimation: stability and super-resolution. Applied and Computational Harmonic Analysis, 40(1):33–67, 2016.
[24] Jari Lindberg. Mathematical concepts of optical superresolution. Journal of Optics, 14(8):083001, 2012.
[25] Victor Pereyra and Godela Scherer. Exponential Data Fitting and Its Applications. Bentham Science Publishers, January 2010.
[26] Thomas Peter and Gerlind Plonka. A generalized Prony method for reconstruction of sparse sums of eigenfunctions of linear operators. Inverse Problems, 29(2):025001, 2013.
[27] Gerlind Plonka and Manfred Tasche. Prony methods for recovery of structured functions. GAMM-Mitteilungen, 37(2):239–258, 2014.
[28] Daniel Potts and Manfred Tasche. Parameter estimation for exponential sums by approximate Prony method. Signal Processing, 90(4):1631–1642, 2010.
[29] R. Prony. Essai expérimental ... J. de l'École Polytechnique, 1795.
[30] P. Stoica and R. L. Moses. Spectral Analysis of Signals. Pearson/Prentice Hall, 2005.
[31] Martin Vetterli, Pina Marziliano, and Thierry Blu. Sampling signals with finite rate of innovation. IEEE Transactions on Signal Processing, 50(6):1417–1428, 2002.
Department of Mathematics, The Weizmann Institute of Science, Rehovot 76100, Israel
Laboratory of Inverse Problems of Mathematical Physics, Sobolev Institute of Mathematics SB RAS, Novosibirsk 630090, Russia
E-mail address: [email protected]

Department of Mathematics, The Weizmann Institute of Science, Rehovot 76100, Israel
E-mail address: [email protected]

Department of Mathematics, The Weizmann Institute of Science, Rehovot 76100, Israel
E-mail address: