Phase retrieval by random binary questions: Which complementary subspace is closer?
Dylan Domel-White and Bernhard G. Bodmann*
November 19, 2019
* The research in this paper was supported in part by NSF grant DMS-1715735.
Abstract
Phase retrieval in real or complex Hilbert spaces is the task of recovering a vector, up to an overall unimodular multiplicative constant, from magnitudes of linear measurements. In this paper, we assume that the vector is normalized, but retain only qualitative, binary information about the measured magnitudes by comparing them with a threshold. In more specific, geometric terms, we choose a sequence of subspaces in a real or complex Hilbert space and only record whether a given vector is closer to the subspace than to the complementary subspace. The subspaces have half the dimension of the Hilbert space and are independent, uniformly distributed with respect to the action of the orthogonal or unitary groups. The main goal of this paper is to find a feasible algorithm for approximate recovery based on the information gained about the vector from these binary questions, and to establish error bounds for its approximate recovery. We provide a pointwise bound for fixed input vectors and a uniform bound that controls the worst-case scenario among all inputs. Both bounds hold with high probability with respect to the choice of the subspaces. For real or complex vectors of dimension n, the pointwise bound requires m ≥ Cδ^{-2} n log(n) and the uniform bound m ≥ Cδ^{-2} n log(δ^{-1} n) binary questions in order to achieve an accuracy of δ. The accuracy δ is measured by the operator norm of the difference between the rank-one orthogonal projections corresponding to the normalized input vector and its approximate recovery.

1 Introduction

This paper is concerned with approximate phase retrieval from qualitative, binary measurements. Phase retrieval is the task of recovering a vector in a real or complex Hilbert space, up to an overall multiplicative unimodular constant, from magnitudes of linear quantities. Motivated by applications from diffraction imaging [18, 23, 32], or from studying properties of the Fourier transform [2, 3], results on phase retrieval first focused on the case where measurements consist of magnitudes of linear functionals [7, 8, 17]. Phase retrieval with quantized measurements was studied as well [24, 27]; see also the preceding works [1, 11, 26]. In this context, quantization means the magnitudes are replaced by values from a finite alphabet. Coarse, one-bit quantization represents the extreme case, for example when only qualitative information is obtained such as how each measured magnitude compares to a single given threshold. Another example in which coarsely quantized measurements appear is quantum state tomography, where the outcomes of experiments are recorded in order to estimate the state of a quantum system [20, 30]. In this case, the probability of an outcome is given by the squared norm of the projection of a (normalized) state vector onto a subspace associated with the outcome. Estimating this quantum state is then, up to the known normalization, equivalent to phase retrieval for the state vector. Phase retrieval based on norms of projections onto subspaces has also been studied outside of the context of quantum theory [6, 13, 16]. It may be viewed as a fusion-frame version of phase retrieval, where higher-rank maps replace linear functionals and the norm replaces the absolute value.
The recovery of matrices rather than vectors is yet another higher-rank generalization of phase retrieval [14, 27].

The main goal of the present paper is to combine coarse quantization with phase retrieval from norms of projections under the assumption that the input vector x is normalized. In our setup, each measured quantity is the answer to a binary question: is the input vector x closer to a given subspace or to its orthogonal complement? Hence, a measurement results in a binary string that encodes the orientation of x in terms of the answers to the binary questions associated with a collection of subspaces. This reduction to binary quantities is a dramatic loss of information compared to phase-insensitive, real-valued measurements. Since the outcome of a measurement is unchanged by rescaling the input vector, we are only obtaining information about the one-dimensional subspace spanned by it. The restriction of x being a unit vector permits us to perform phase retrieval from its proximity to subspaces. In analogy with the unresolvable ambiguity in phase retrieval, we only seek to recover the one-dimensional subspace spanned by x, or equivalently, the orthogonal rank-one projection X onto the span of x.

To achieve our goal, we use measure concentration arguments and show that measurements coming from randomly selected subspaces allow approximate recovery via a semidefinite program. The recovery strategy in this paper can be outlined as follows: we specialize to even-dimensional real or complex Hilbert spaces and to randomized one-bit measurements based on subspaces of half the dimension. For each random subspace V_j in a sequence {V_1, V_2, ..., V_m}, we determine whether the given input vector x is closer to V_j or to its orthogonal complement V_j^⊥. The outcome of the binary measurement is thus encoded in a sequence of orthogonal projections {P̂_1, P̂_2, ..., P̂_m} such that the range of each P̂_j is the subspace V̂_j ∈ {V_j, V_j^⊥} that is closest to x. The answer to each binary question is equivalently determined by comparing the squared norm ‖P_j x‖² = tr[P_j xx*] to a threshold. For the approximate recovery of the subspace spanned by x we then simply average over these orthogonal projections {P̂_j}_{j=1}^m and find the eigenspace corresponding to the largest eigenvalue of this average. We denote the orthogonal projection onto this eigenspace by X̂. This operator is, in fact, the solution of a semidefinite program which maximizes Σ_{j=1}^m tr[P̂_j Y] over the convex set of all positive semidefinite Y with tr[Y] ≤ 1 [21, Section 4.2]. This strategy is motivated by earlier results of Plan and Vershynin in the more general setting of one-bit low-rank matrix recovery [27].

Randomized constructions and associated algorithms for recovery based on measure concentration have been studied previously in the contexts of matrix recovery, compressed sensing, and other problems in phase retrieval [1, 6, 11, 15, 19, 22, 26, 27]. In contrast to the low-rank matrix recovery treated by Plan and Vershynin [26, 27], we use measure concentration in operator norm to achieve our error bounds.
For a related result based on measure concentration in operator norm, but without a low-rank prior, see the work by Guta and others [20] on approximate quantum state tomography from measurements associated with projections onto subspaces.

In this paper, we show results that control the accuracy of the approximate recovery, in particular the decay of the error as the number of random subspaces grows. There are two types of error estimates, pointwise and uniform in the input vector.

Pointwise Bound.
For a rank-one orthogonal projection X on a real or complex 2n-dimensional Hilbert space and a desired recovery accuracy δ > 0, we show that using m ≥ Cδ^{-2} n log(n) random subspaces for a binary measurement and the algorithm we described yields X̂ such that the operator norm difference is bounded by ‖X̂ − X‖ < δ with high probability. Here C is a constant independent of n and δ. See Theorem 2.3.3 for the exact statement and proof of this result, along with an exact value for C. One may compare this to a similar result from one-bit compressed sensing, which says that m = Cδ^{-4} n random one-bit measurements (of the form X ↦ sign(tr[G_j X]) for {G_j}_{j=1}^m independent matrices with independent standard normal entries) are sufficient to recover X̂ with nuclear norm tr[|X̂|] = 1 and Hilbert-Schmidt norm tr[X̂X̂*]^{1/2} ≤ 1 such that the Hilbert-Schmidt distance ‖X̂ − X‖_HS = (tr[|X̂ − X|²])^{1/2} < δ [27, Section 3.3]. Another result on one-bit phase retrieval [24] also gives comparable asymptotics when using measurements based on rank-two Gaussian random matrices.

Uniform Bound.
We also establish an error bound that holds uniformly for all rank-one projections as input, with one fixed choice of subspaces for measurement. For a desired recovery accuracy δ > 0, we show that using m ≥ Cδ^{-2} n log(δ^{-1} n) random subspaces for a binary measurement ensures with high probability that for each rank-one orthogonal projection X we obtain X̂ such that ‖X̂ − X‖ < δ. See Theorem 3.3.1 for details.

We note that for fixed n, the asymptotic dependence of m on δ improves on results derived by Plan and Vershynin in a more general setting. This can be attributed to our choice of measurements, which are constructed with random orthogonal projections, not Gaussian matrices. One expects that the Lipschitz regularity of the function X ↦ tr[PX] is better than that of X ↦ tr[GX], at least on a set of large measure among all rank-one projections. This is advantageous, in particular in combination with perturbation arguments as in Section 3.2.

We also include pointwise and uniform error bounds for faulty measurements. In this case, up to a fixed fraction of the answers to the binary questions have been flipped, possibly in an adversarial manner. For fixed dimension n, the faulty measurements contribute an additional term in the bound for the recovery error that is proportional to the fraction of bit flips.

The rest of this paper is organized as follows: after fixing some notation, the remainder of Section 1 describes our one-bit phaseless measurement model in more detail; we explain how we generate random projections for each binary measurement, and how we approximately recover a signal based on such a binary measurement of it. In Section 2 we prove the error bound for our pointwise recovery, Theorem 2.3.3. Lastly, in Section 3 we establish the uniform accuracy for recovery, Theorem 3.3.1. Each of the main theorems in Sections 2 and 3 is followed by a corollary that provides error bounds in the presence of faulty measurements. Both error bounds are also illustrated with plots showing empirical data from reconstruction using (PEP) in a fixed low real dimension.

Notation:
Since we are interested in both real and complex signals, we let F stand for either R or C, and define β = 1/2 when F = R and β = 1 when F = C in order to simplify some expressions which depend on the underlying field. We consider only unit-norm signals, and so denote the unit sphere in F^d by S^{d-1}_F. As mentioned previously, both our input signals and binary measurements can be defined in terms of orthogonal projections, so we let Proj_F(k,d) denote the space of rank-k orthogonal projections on F^d. For a vector x ∈ S^{d-1}_F, xx* ∈ Proj_F(1,d) is the rank-one projection onto the span of x. We write ‖x‖ for the Euclidean norm of a vector x ∈ F^d and ‖A‖ for the operator norm of a matrix A ∈ F^{d×d}.

1.1 Binary questions

Our measurements are constructed from qualitative information about the proximity of x ∈ S^{d-1}_F to subspaces in F^d. We formulate the measurements in terms of the orthogonal projections onto these subspaces. For a projection P ∈ Proj_F(k,d), we define its associated binary question as the map ϕ_P : S^{d-1}_F → {0,1} given by

ϕ_P(x) = 1 if ‖Px‖² ≥ k/d, and ϕ_P(x) = 0 otherwise. (1)

The choice of k/d as the cut-off value for quantization is natural since it is the average of x ↦ ‖Px‖² over all unit vectors. Equivalently, k/d is the average of P ↦ ‖Px‖² when x is a fixed unit vector and P is chosen uniformly at random in Proj_F(k,d), as discussed further below in Section 1.2.

These binary questions are in fact phaseless, since ϕ_P(x) = ϕ_P(αx) for any α ∈ F with |α| = 1. Additionally, for any such α and any x ∈ S^{d-1}_F we have αx(αx)* = xx*, and ‖Px‖² = tr[Pxx*], so these binary questions can be recast as maps on the set of rank-one orthogonal projections. In this framework, thinking of input signals as rank-one projections, the binary question associated to P is the map φ_P : Proj_F(1,d) → {0,1} defined by

φ_P(X) = 1 if tr[PX] > k/d, and φ_P(X) = 0 otherwise. (2)

Reformulating ϕ_P as φ_P encapsulates the fact that the map ϕ_P is constant on the set of unit vectors that differ from x by a unimodular multiplicative constant. Henceforth, we will use this latter framework and speak of measuring and reconstructing rank-one orthogonal projections rather than unit vectors.

The binary question φ_P measures qualitative proximity information about the input signal. For projections P ∈ Proj_F(k,d) and X ∈ Proj_F(1,d), tr[PX] = cos²(θ), where θ is the principal angle between the one-dimensional subspace Ran(X) and the k-dimensional subspace Ran(P). Thus, φ_P(X) = 1 if and only if Ran(X) is closer to Ran(P) than the average for a random one-dimensional subspace, and if this occurs we say P is proximal to X.

Our goal is to achieve accurate phase retrieval with the qualitative proximity information gained from a sufficiently large set of these binary questions from projections {P_j}_{j=1}^m. For such a collection, we define a corresponding binary measurement map.

Definition 1.1.1.
Given a sequence of orthogonal projections P = {P_j}_{j=1}^m on F^d, the binary measurement map associated with P is Φ_P : Proj_F(1,d) → {0,1}^m defined by

Φ_P(X) := (φ_{P_j}(X))_{j=1}^m. (3)

We also define the measurement Hamming distance (associated with P) between X and Y to be

d_P(X,Y) := d_H(Φ_P(X), Φ_P(Y)), (4)

where d_H denotes the normalized Hamming distance on {0,1}^m.

In other words: Φ_P(X) is a binary vector where each one-bit entry encodes the proximity of X to a projection in P. The value d_P(X,Y) gives the relative frequency of measurement projections that separate X and Y, i.e. the fraction of binary questions in the measurement that yield different answers for X and Y as inputs.
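Concretely, equations (2)-(4) translate into a few lines of NumPy. The sketch below is our illustration (the paper contains no code), and all function names are hypothetical.

```python
import numpy as np

def binary_question(P, X, k, d):
    """phi_P(X) from equation (2): one bit, 1 if tr[PX] > k/d."""
    return 1 if np.trace(P @ X).real > k / d else 0

def binary_measurement(Ps, X, k, d):
    """Phi_P(X) from equation (3): the bit string over a list of projections."""
    return np.array([binary_question(P, X, k, d) for P in Ps])

def measurement_hamming_distance(Ps, X, Y, k, d):
    """d_P(X, Y) from equation (4): fraction of questions answered differently."""
    bx = binary_measurement(Ps, X, k, d)
    by = binary_measurement(Ps, Y, k, d)
    return np.mean(bx != by)
```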
1.2 Random measurement projections

In the absence of an intuitive way to construct "optimal" collections of projections for our one-bit measurements, we instead consider projections chosen uniformly at random. The uniform probability measure on Proj_F(k,d) is induced by the Haar measure of the unitary group U_F(d), and is characterized by the property of being rotationally invariant, see [6]. In other words, if P is uniformly distributed in Proj_F(k,d) then for any U ∈ U_F(d) we have UPU* =(d) P, where =(d) denotes equality in distribution.

In practice, there are many equivalent ways to generate a uniformly distributed rank-k projection. For example, one can take k independent Gaussian random vectors in F^d and then form the projection onto their span. A second way is to take a fixed rank-k projection and conjugate it by a Haar-distributed random unitary U ∈ U_F(d). It can be helpful to think of a "uniformly distributed rank-k projection" as just a "projection onto a uniformly distributed k-dimensional subspace".

For most of the paper, we work with the binary measurement map associated to a collection P = {P_j}_{j=1}^m of independent uniformly distributed projections in Proj_F(n,2n). The reason for using half-dimensional projections is that their associated one-bit measurements φ_P have a geometrically intuitive meaning: for a fixed X ∈ Proj_F(1,2n), φ_P(X) = 1 if and only if tr[PX] > 1/2 ≥ tr[(I − P)X], i.e. the subspace Ran(X) is closer to Ran(P) than to its orthogonal complement Ran(I − P).
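The first recipe above is easy to implement: a QR factorization of a d×k Gaussian matrix gives an orthonormal basis for the span of k Gaussian vectors. The following sketch is ours, with hypothetical names.

```python
import numpy as np

def random_projection(k, d, field="C", rng=None):
    """Uniformly distributed rank-k orthogonal projection on F^d:
    project onto the span of k i.i.d. Gaussian vectors (rotation invariant)."""
    rng = np.random.default_rng(rng)
    G = rng.standard_normal((d, k))
    if field == "C":
        G = G + 1j * rng.standard_normal((d, k))
    Q, _ = np.linalg.qr(G)   # orthonormal basis for the column span of G
    return Q @ Q.conj().T    # rank-k orthogonal projection
```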
1.3 Approximate recovery

A main goal of this paper is to use the outcomes of a random binary measurement to estimate the input accurately. Suppose we have measured an unknown vector x ∈ S^{2n-1}_F with the binary measurement map Φ_P associated with a random collection of projections P ⊂ Proj_F(n,2n) and obtained the binary vector Φ_P(xx*). The information we gain from these measurements will not in general completely determine the rank-one projection X = xx* corresponding to the input vector x, but with enough measured quantities we can deduce a projection X̂ which approximates X in some metric. A consistent reconstruction would seek an element X̂ in the feasible set, that is, the set of all Y consistent with the binary measurement in the sense that Φ_P(Y) = Φ_P(X) [12]. A natural error bound for such a reconstruction strategy would then result from the diameter of the feasible set, which intuitively will be small if P is suitably large.

In this paper, we relax the perfect consistency condition, but still achieve approximate recovery with a computationally feasible, semidefinite programming algorithm investigated in other works [21, Section 4.2]. The approximate recovery of X is conveniently described in terms of projections obtained from the binary measurement Φ_P(X).
Definition 1.3.1. Given X ∈ Proj_F(1,d) and P ∈ Proj_F(k,d) we define the proximally flipped projection

P̂(X) := P if tr[PX] ≥ k/d, and P̂(X) := I − P if tr[PX] < k/d. (5)

Next, for a sequence of orthogonal projections P = {P_j}_{j=1}^m, the empirical average of the proximally flipped projections is

Q̂_P(X) := (1/m) Σ_{j=1}^m P̂_j(X). (6)

The recovery algorithm we study takes the binary measurement Φ_P(X) and produces X̂ that solves the semidefinite program

maximize over Y: tr[Q̂_P(X) Y], subject to Y ⪰ 0, tr[Y] ≤ 1. (PEP)

We call this the Principal Eigenspace Program (PEP) because it amounts to maximizing the Rayleigh quotient [21, Section 4.2] for Q̂_P(X). This special class of semidefinite programs can be implemented efficiently [25, Chapter 4].

Since Q̂_P(X) is a positive self-adjoint operator, it may be decomposed according to the spectral theorem as a linear combination of mutually orthogonal rank-one projections, Q̂_P(X) = Σ_{i=1}^{2n} λ_i E_i, where λ_1 ≥ λ_2 ≥ ... ≥ λ_{2n} ≥ 0. Thus, any positive self-adjoint trace-normalized operator with range contained in the principal eigenspace of Q̂_P(X) is a solution to (PEP). If in addition λ_1 is strictly larger than λ_2 (which happens with probability 1 for our random measurement model), then the principal eigenspace is one-dimensional, and so X̂ = E_1 is the unique solution to (PEP). Proposition 2.1.2 will show that E[Q̂_P(X)] = µ_1 X + µ_2 (I − X) with µ_1 > µ_2, and so for large m we might expect X̂ ≈ X by a measure concentration argument.
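As an illustration of Definition 1.3.1 and (PEP), here is a minimal sketch (ours, not the authors' implementation). Note that the receiver can form Q̂_P(X) from the bits Φ_P(X) alone, since each bit determines the corresponding proximal flip; solving (PEP) then reduces to a single Hermitian eigendecomposition.

```python
import numpy as np

def flipped_projection(P, X, k, d):
    """P-hat(X) from equation (5): keep P if tr[PX] >= k/d, else flip to I - P."""
    return P if np.trace(P @ X).real >= k / d else np.eye(d) - P

def pep_recover(Ps, bits):
    """Solve (PEP) from one-bit data: average the proximally flipped projections
    and return the rank-one projection onto the principal eigenvector."""
    d = Ps[0].shape[0]
    Q_hat = np.mean([P if b else np.eye(d) - P for P, b in zip(Ps, bits)], axis=0)
    w, V = np.linalg.eigh(Q_hat)   # eigenvalues in ascending order
    v = V[:, -1]                   # principal eigenvector
    return np.outer(v, v.conj())   # X-hat = E_1
```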
Section 2 of this paper shows the following pointwise result: for any fixed X ∈ Proj_F(1,2n) and any δ > 0, we can choose m large enough so that a collection of independent uniformly distributed half-dimensional projections P = {P_j}_{j=1}^m will, with high probability, yield a measurement Φ_P(X) for which the solution X̂ to (PEP) satisfies ‖X̂ − X‖ < δ. See Theorem 2.3.3 for details.

Much of the effort in Section 3 is directed toward getting uniform results from the above pointwise one. The uniform result we derive says: for any δ > 0, we can choose m large enough so that a collection of independent uniformly distributed half-dimensional projections P = {P_j}_{j=1}^m will, with high probability, yield measurements Φ_P(X) for every X ∈ Proj_F(1,2n) for which the solution X̂ to (PEP) satisfies ‖X̂ − X‖ < δ. See Theorem 3.3.1 for details.

According to the uniform result, we can generate a collection of projections for which every signal is approximately recoverable up to an error of δ from the one-bit questions using those projections. The pointwise result can be thought of as an averaged performance guarantee, whereas the uniform bound controls even the worst-case input.

2 Pointwise recovery

We begin deriving results on the statistics of signal recovery using (PEP) and our one-bit phaseless measurement model by considering a fixed unit-norm input vector x ∈ F^{2n} while the binary measurement map Φ_P is chosen randomly. As outlined before, we identify vectors that differ by a unimodular multiplicative constant, and when considering only unit-norm vectors as input signals we represent these equivalence classes by rank-one projection matrices. The random binary measurement map is determined by a sequence of random projections P = {P_j}_{j=1}^m whose rank is half the dimension of the signal space, and provides information whether the input signal is closer to the range of each projection or to its orthogonal complement. The main goal of this section is to prove that (PEP) provides accurate recovery of an input signal X ∈ Proj_F(1,2n) when sufficiently many random projections are used for the binary measurement, i.e. when m is large enough. The derivation of the results proceeds in three steps:

(1) If the orthogonal projections for the measurement of X are chosen uniformly at random and proximally flipped, then their empirical average has the expectation Q(X) := E[Q̂_P(X)] = µ_1 X + µ_2 (I − X), where 0 < µ_2 < µ_1 are constants. In particular, X is the projection onto the eigenspace corresponding to the largest eigenvalue of Q(X).
(2) The empirical average Q̂_P(X) concentrates near its expectation Q(X).
(3) The eigenspace of Q̂_P(X) corresponding to its largest eigenvalue concentrates near X.

2.1 The expectation of Q̂_P(X)

Before we can investigate the accuracy of (PEP), we need a simple fact about the distribution of the principal angle between a random n-dimensional subspace and a fixed one-dimensional subspace in F^{2n}.

Lemma 2.1.1.
Let X ∈ Proj_F(1,2n) be fixed and P ∈ Proj_F(n,2n) be uniformly distributed. Then tr[PX] ∼ Beta(βn, βn), i.e. tr[PX] has probability density function

p(t) = B(βn, βn)^{-1} [t(1−t)]^{βn−1}, (7)

where B(a,b) = ∫_0^1 t^{a−1}(1−t)^{b−1} dt is the Beta function. In particular, E[tr[PX]] = 1/2 and the distribution of tr[PX] is symmetric about 1/2.

Proof. Recall that if U ∈ U_F(2n) is uniformly distributed and E is the orthogonal projection onto the span of the first n standard basis vectors, then UEU* =(d) P. Thus

tr[PX] =(d) tr[UEU*X] = tr[EU*XU].

Observe that U*XU is a uniformly distributed rank-one projection, which has the same distribution as uu* where u ∈ S^{2n-1}_F is a uniformly distributed unit vector. Furthermore, u =(d) g/‖g‖ where g ∼ N(0, I_{2n}) is a standard Gaussian random vector in F^{2n}. So we have

tr[EU*XU] =(d) tr[Euu*] = ‖Eu‖² =(d) ‖Eg‖²/‖g‖² = (Σ_{k=1}^n |g_k|²) / (Σ_{k=1}^{2n} |g_k|²). (8)

If F = R, then the g_k's are independent standard Gaussian random variables, so the right-hand side of equation (8) has the form A/(A+B) where the random variables A, B ∼ χ²(n) are independent. Thus, equation (8) is a Beta(n/2, n/2) random variable.

If F = C, then each g_k = a_k + i b_k where all the a_k and b_k's are independent standard real Gaussian variables. In this case, since |g_k|² = |a_k|² + |b_k|², the right-hand side of equation (8) has the form A/(A+B) where A, B ∼ χ²(2n) are independent, and thus is a Beta(n, n) random variable.
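Lemma 2.1.1 is easy to probe numerically. The following Monte Carlo sketch (ours) tests the Beta(βn, βn) law for F = C, where β = 1, with a Kolmogorov-Smirnov statistic.

```python
import numpy as np
from scipy import stats

# Empirical check of Lemma 2.1.1: tr[PX] ~ Beta(beta*n, beta*n) for a fixed
# rank-one X and a uniform rank-n projection P on C^{2n}; here beta = 1.
n, trials, rng = 4, 20000, np.random.default_rng(0)
x = rng.standard_normal(2 * n) + 1j * rng.standard_normal(2 * n)
x /= np.linalg.norm(x)
samples = []
for _ in range(trials):
    G = rng.standard_normal((2 * n, n)) + 1j * rng.standard_normal((2 * n, n))
    Q, _ = np.linalg.qr(G)
    samples.append(np.linalg.norm(Q.conj().T @ x) ** 2)  # tr[P xx*] = ||Q* x||^2
print(stats.kstest(samples, stats.beta(n, n).cdf))  # should not reject Beta(n, n)
```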
Next we compute the expectation of the empirical average of the proximally flipped projections.

Proposition 2.1.2. Let X ∈ Proj_F(1,2n) and P = {P_j}_{j=1}^m be an independent sequence of uniformly distributed projections in Proj_F(n,2n). Then

Q(X) = µ_1 X + µ_2 (I − X), (9)

where

µ_1 = 1/2 + 1/(βn · 2^{2βn} · B(βn, βn)),  µ_2 = 1/2 − 1/(βn (2n−1) · 2^{2βn} · B(βn, βn)). (10)

Proof.
We begin with some manipulation and reasoning that does not depend on whether F is R or C; the field only enters when computing the values of µ_1 and µ_2.

Since the P_j's are identically distributed, we know that E[P̂_i(X)] = E[P̂_j(X)] for all i and j. Thus, by linearity of expectation we have Q(X) = E[P̂_1(X)].

Also, the distribution of P̂_1(X) is invariant under conjugation with a unitary that fixes X. In other words, for a unitary U ∈ U_F(2n) such that UXU* = X, we have UP̂_1(X)U* =(d) P̂_1(X). To verify this, we use the rotational invariance of P_1 and the cyclic property of the trace to obtain UP̂_1(X)U* =(d) U (U*P_1U)^(X) U* = P̂_1(UXU*) = P̂_1(X). Consequently, Q(X) is also invariant under conjugation by unitaries that fix X. This implies that every eigenspace of Q(X) is preserved under rotations by all such unitaries, hence Ran(X) and Ran(X)^⊥ are the eigenspaces of Q(X). Letting µ_1 and µ_2 denote the respective eigenvalues, we write

Q(X) = µ_1 X + µ_2 (I − X). (11)

In order to determine the value of µ_1, we use linearity of expectation to see

µ_1 = tr[Q(X) X] = E[tr[P̂_1(X) X]]. (12)

By the law of total probability we have

E[tr[P̂_1(X) X]] = E[tr[P̂_1(X) X] | tr[P_1X] ≥ 1/2] P{tr[P_1X] ≥ 1/2} + E[tr[P̂_1(X) X] | tr[P_1X] < 1/2] P{tr[P_1X] < 1/2},

so by the definition of P̂_1(X) and the symmetry of the distribution of tr[P_1X] (a consequence of Lemma 2.1.1) it follows that

E[tr[P̂_1(X) X]] = E[tr[P_1X] | tr[P_1X] ≥ 1/2]. (13)

We can compute this conditional expectation by a direct computation with the probability density function given in Lemma 2.1.1, yielding

µ_1 = 2 B(βn, βn)^{-1} ∫_{1/2}^1 t [t(1−t)]^{βn−1} dt = 1/2 + 1/(βn · 2^{2βn} · B(βn, βn)). (14)

Since tr[Q(X)] = n by linearity of expectation, we know µ_1 + (2n−1)µ_2 = n, from which we get the desired expression for µ_2.
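As a sanity check, the closed forms (10), in our reconstruction of the constants, can be compared with an empirical average of proximally flipped projections; the snippet below (ours) does this for F = R, where β = 1/2.

```python
import numpy as np
from scipy.special import beta as B

def mu1_mu2(n, beta_factor):
    """Closed forms (10); beta_factor is 1/2 for F = R and 1 for F = C."""
    bn = beta_factor * n
    s = 1.0 / (bn * 4**bn * B(bn, bn))
    return 0.5 + s, 0.5 - s / (2 * n - 1)

# Monte Carlo check of Q(X) = mu_1 X + mu_2 (I - X) for F = R, n = 3 (dim 6).
n, m, rng = 3, 20000, np.random.default_rng(1)
d = 2 * n
X = np.zeros((d, d)); X[0, 0] = 1.0          # fixed rank-one input xx*, x = e_1
acc = np.zeros((d, d))
for _ in range(m):
    W, _ = np.linalg.qr(rng.standard_normal((d, n)))
    P = W @ W.T                               # uniform rank-n projection on R^6
    acc += P if np.trace(P @ X) >= 0.5 else np.eye(d) - P
mu1, mu2 = mu1_mu2(n, 0.5)
print(acc[0, 0] / m, mu1)                     # entry (1,1) of Q-hat estimates mu_1
print(acc[1, 1] / m, mu2)                     # other diagonal entries estimate mu_2
```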
2.2 Concentration of Q̂_P(X) near Q(X)

Since the empirical average of the proximally flipped projections Q̂_P(X) is, after all, an empirical average, by the law of large numbers it should concentrate tightly around its expectation Q(X) as the number of measurements m goes to infinity. To make this precise, we use the Matrix Bernstein Inequality [31, Theorem 1.6.2].

Lemma 2.2.1. Let X ∈ Proj_F(1,2n) and P = {P_j}_{j=1}^m be an independent sequence of uniformly distributed projections in Proj_F(n,2n). Then

E[‖Q̂_P(X) − Q(X)‖] ≤ √(log(4n)/(2m)) + log(4n)/(3m), (15)

and for any 0 < t ≤ 1,

P{‖Q̂_P(X) − Q(X)‖ ≥ t} ≤ 4n exp(−(6/7) t² m). (16)

In particular, if m ≥ (7/6) t^{-2} (log(4n) + D) then

P{‖Q̂_P(X) − Q(X)‖ ≥ t} ≤ exp(−D). (17)

Proof.
Let S_j = (1/m)(P̂_j(X) − Q(X)). Then E[S_j] = 0 and ‖S_j‖ ≤ 1/m for all j = 1, ..., m. Note that Z := Σ_{j=1}^m S_j = Q̂_P(X) − Q(X). Additionally, since P̂_j(X) is a projection and E[P̂_j(X)] = Q(X) for all j, we may bound the matrix variance

v(Z) := ‖Σ_{j=1}^m E[S_j²]‖ = (1/m) ‖Q(X) − Q(X)²‖ ≤ 1/(4m). (18)

The expectation bound and the tail bound now follow from applying the Matrix Bernstein Inequality as in [31, Theorem 1.6.2]; for the tail bound we also use that v(Z) + t/(3m) ≤ 7/(12m) when t ≤ 1. Additionally, if m ≥ (7/6) t^{-2} (log(4n) + D) then log(4n) − (6/7) t² m ≤ −D, which yields (17).
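A small simulation (ours) shows how the empirical operator-norm deviation tracks the expectation bound (15).

```python
import numpy as np
from scipy.special import beta as B

# Empirical ||Q_hat - Q|| versus the bound (15), for F = R, n = 3.
n, d, rng = 3, 6, np.random.default_rng(2)
bn = n / 2                                    # beta*n for F = R
s = 1.0 / (bn * 4**bn * B(bn, bn))
mu1, mu2 = 0.5 + s, 0.5 - s / (2 * n - 1)     # Proposition 2.1.2
X = np.zeros((d, d)); X[0, 0] = 1.0
Q = mu1 * X + mu2 * (np.eye(d) - X)
for m in [100, 400, 1600]:
    devs = []
    for _ in range(100):
        acc = np.zeros((d, d))
        for _ in range(m):
            W, _ = np.linalg.qr(rng.standard_normal((d, n)))
            P = W @ W.T
            acc += P if np.trace(P @ X) >= 0.5 else np.eye(d) - P
        devs.append(np.linalg.norm(acc / m - Q, 2))   # spectral norm deviation
    print(m, np.mean(devs), np.sqrt(np.log(4*n)/(2*m)) + np.log(4*n)/(3*m))
```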
2.3 Concentration of X̂ near X (pointwise result)

From Lemma 2.2.1 we know that, with enough measurement projections, Q̂_P(X) is close to Q(X) in operator norm with high probability. When it is sufficiently close, the eigenspace of Q̂_P(X) corresponding to its maximum eigenvalue will also be close to X. To see this, we first need the following lemma. It is based on the fact that for two rank-one projections X and Y, the difference X − Y is a zero-trace self-adjoint operator of rank at most two, and hence has a spectral representation of the form X − Y = ‖X − Y‖(A − B) with two mutually orthogonal rank-one projections A and B.

Lemma 2.3.1. Let X, Y ∈ Proj_F(1,2n). Then

‖X − Y‖ = (µ_1 − µ_2)^{-1} tr[Q(X)(A − B)], (19)

where A, B ∈ Proj_F(1,2n) are the mutually orthogonal projections in the spectral decomposition X − Y = ‖X − Y‖(A − B).

Proof. Let θ be the principal angle between the subspaces associated to X and Y. Then we can pick x, y, z ∈ S^{2n-1}_F with x ⊥ z such that X = xx*, Y = yy* and y = cos(θ)x + sin(θ)z. Then

Y = yy* = cos²(θ)xx* + sin²(θ)zz* + sin(θ)cos(θ)(xz* + zx*).

Since Q(X) = µ_1 X + µ_2 (I − X),

tr[Q(X)(X − Y)] = µ_1 − cos²(θ)µ_1 − sin²(θ)µ_2 = (µ_1 − µ_2) sin²(θ).

Lastly, since sin(θ) = ‖X − Y‖, rewriting the left-hand side using the spectral decomposition X − Y = ‖X − Y‖(A − B) and cancelling the common factor of ‖X − Y‖ yields the desired equality.

The spectral gap µ_1 − µ_2 of Q(X) appears in the sufficient number of binary questions in both our pointwise and uniform results. The following lemma bounds this quantity in terms of the dimension n.

Lemma 2.3.2.
Let µ_1 and µ_2 be as in Proposition 2.1.2. Then for all n,

n / (e √(πβn) (2n−1)) ≤ µ_1 − µ_2 ≤ e n / (√(πβn) (2n−1)). (20)

In particular, (µ_1 − µ_2)^{-1} = O(√n).

Proof. From the expressions derived in Proposition 2.1.2 we have

µ_1 − µ_2 = 2n / (βn (2n−1) 2^{2βn} B(βn, βn)). (21)

Since B(α, α) = Γ(α)²/Γ(2α), we may use Stirling's formula to approximate the Beta function. In particular, from [29] we have for all real numbers x > 0

√(2π) x^{x−1/2} exp(−x) ≤ Γ(x) ≤ √(2π) x^{x−1/2} exp(−x) exp(1/(12x)). (22)

In particular, when βn ≥ 1/2 these inequalities for the Gamma function yield the bounds

2√π / (e · 2^{2βn} √(βn)) ≤ B(βn, βn) ≤ 2e√π / (2^{2βn} √(βn)). (23)

Using these bounds for the Beta function in (21) gives the desired inequalities for µ_1 − µ_2.
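The next snippet (ours) evaluates the exact gap (21) against the bounds (20), using the constants exactly as stated above, which are our reconstruction; the last printed column illustrates that (µ_1 − µ_2)√n stays of constant order, consistent with (µ_1 − µ_2)^{-1} = O(√n).

```python
import numpy as np
from scipy.special import beta as B

# Numeric look at Lemma 2.3.2: lower bound <= mu_1 - mu_2 <= upper bound.
for field, beta_factor in [("R", 0.5), ("C", 1.0)]:
    for n in [2, 8, 32, 128]:
        bn = beta_factor * n
        gap = 2 * n / (bn * (2 * n - 1) * 4**bn * B(bn, bn))   # equation (21)
        lo = n / (np.e * np.sqrt(np.pi * bn) * (2 * n - 1))
        hi = np.e * n / (np.sqrt(np.pi * bn) * (2 * n - 1))
        print(field, n, lo <= gap <= hi, gap * np.sqrt(n))     # last column ~ const
```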
Now we have the tools to prove the pointwise error bound for approximate recovery of a fixed input signal using (PEP).

Theorem 2.3.3. Let X ∈ Proj_F(1,2n) and δ > 0 be fixed. If

m ≥ (14/3) (µ_1 − µ_2)^{-2} δ^{-2} (log(4n) + D), (24)

and P = {P_j}_{j=1}^m is an independent sequence of uniformly distributed projections in Proj_F(n,2n), then with probability at least 1 − exp(−D),

‖X̂ − X‖ < δ, (25)

where X̂ is the solution to (PEP) with input Φ_P(X).

Proof. From Lemma 2.3.1, we know that

‖X̂ − X‖ = (µ_1 − µ_2)^{-1} tr[Q(X)(A − B)], (26)

where A, B ∈ Proj_F(1,2n) are the orthogonal projections from the spectral decomposition of the difference X − X̂ = ‖X − X̂‖(A − B).

Since X̂ is the projection onto the principal eigenspace of Q̂_P(X), we see

tr[Q̂_P(X)(X̂ − X)] ≥ 0, and hence (µ_1 − µ_2)^{-1} tr[Q̂_P(X)(B − A)] ≥ 0, (27)

and so

‖X̂ − X‖ ≤ (µ_1 − µ_2)^{-1} tr[(Q(X) − Q̂_P(X))(A − B)] ≤ 2(µ_1 − µ_2)^{-1} ‖Q(X) − Q̂_P(X)‖. (28)

We have chosen m such that m ≥ (7/6) t^{-2} (log(4n) + D) for t = (µ_1 − µ_2)δ/2, so the tail bound in Lemma 2.2.1 says that with probability at least 1 − exp(−D) we have ‖Q̂_P(X) − Q(X)‖ < t. If this occurs, then from (28) we see

‖X̂ − X‖ ≤ 2(µ_1 − µ_2)^{-1} ‖Q̂_P(X) − Q(X)‖ < δ. (29)

[Figure 1: Accuracy of recovery using (PEP) for a fixed input and 7200 independent collections of measurement projections in a fixed real dimension, plotted on logarithmic axes (number of measurement projections m versus accuracy of estimate δ). The single line separate from the cluster represents the upper bound on δ given by Theorem 2.3.3.]

See Figure 1 for a plot showing how our bound on the sufficient number of measurements to achieve an accuracy of δ relates to experimental results.
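The experiment behind Figure 1 is easy to reproduce in spirit. The following end-to-end sketch (ours; the dimension and trial counts are arbitrary choices, not the paper's) measures a fixed input with m random half-dimensional projections and records the operator-norm error of the (PEP) solution.

```python
import numpy as np

def run_trial(m, n, rng):
    """One recovery experiment on R^{2n}: measure, flip via the bits, solve (PEP)."""
    d = 2 * n
    x = rng.standard_normal(d); x /= np.linalg.norm(x)
    X = np.outer(x, x)
    acc = np.zeros((d, d))
    for _ in range(m):
        W, _ = np.linalg.qr(rng.standard_normal((d, n)))
        P = W @ W.T
        bit = np.trace(P @ X) > 0.5          # the one-bit answer phi_P(X)
        acc += P if bit else np.eye(d) - P   # receiver flips using the bit only
    w, V = np.linalg.eigh(acc / m)
    v = V[:, -1]
    return np.linalg.norm(np.outer(v, v) - X, 2)

rng = np.random.default_rng(3)
for m in [50, 200, 800, 3200]:
    errs = [run_trial(m, 3, rng) for _ in range(50)]
    print(m, np.median(errs))   # error shrinks roughly like 1/sqrt(m)
```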
Our proof lets us fine-tune the probability of successful recovery by adjusting the value of D in (24). By increasing D we increase the probability of success, but also increase the sufficient number of measurements. In particular, we can take D = α log(n) to ensure success with high probability, i.e. a failure rate that decays on the order of n^{−α}. To do so, we gain a constant factor that depends on α in the number of sufficient measurement projections m.

Corollary 2.3.4. Let X ∈ Proj_F(1,2n) and δ > 0 be fixed. If α > 0, m ≥ C_α δ^{-2} n log(n), and P = {P_j}_{j=1}^m is an independent sequence of uniformly distributed projections in Proj_F(n,2n), then with probability at least 1 − n^{−α},

‖X̂ − X‖ < δ, (30)

where X̂ is the solution to (PEP) with input Φ_P(X) and C_α is a constant that only depends on α.

We can also take D = n log(n) in (24) to ensure success with overwhelming probability, i.e. a failure rate that decays at least on the order of exp(−n). In this case, we gain an additional factor of n in the number of sufficient measurement projections m.

Corollary 2.3.5. Let X ∈ Proj_F(1,2n) and δ > 0 be fixed. If m ≥ Cδ^{-2} n² log(n) and P = {P_j}_{j=1}^m is an independent sequence of uniformly distributed projections in Proj_F(n,2n), then with probability at least 1 − exp(−n),

‖X̂ − X‖ < δ, (31)

where X̂ is the solution to (PEP) with input Φ_P(X) and C is a constant.

The pointwise accuracy guarantee of Theorem 2.3.3 can also be thought of as an "average case" error bound with respect to the random sequence of measurement projections P. The following corollary makes this explicit.

Corollary 2.3.6.
Let X ∈ Proj_F(1,2n) and δ > 0 be fixed. If m ≥ Cδ^{-2} n log(n) and P = {P_j}_{j=1}^m is an independent sequence of uniformly distributed projections in Proj_F(n,2n), then

E[‖X̂ − X‖] < δ, (32)

where X̂ is the solution to (PEP) with input Φ_P(X) and C is a constant.

Proof. Take the expectation on both sides of (28) with respect to the random sequence of projections P and use the expectation bound from Lemma 2.2.1.

We also remark that recovery using (PEP) is robust to bit-flip errors in the binary measurement, which can be seen via a small addition to the proof of Theorem 2.3.3. To this end, we consider a faulty measurement Φ̃_P with the property that the normalized Hamming distance between the faulty and correct measurements is bounded by a fixed fraction, d_H(Φ̃_P(X), Φ_P(X)) ≤ τ.

Corollary 2.3.7.
Let X, δ, m, and {P_j}_{j=1}^m be as in Theorem 2.3.3, and fix 0 < τ < 1. Then with probability at least 1 − exp(−D), for all Φ̃_P(X) ∈ {0,1}^m such that

d_H(Φ_P(X), Φ̃_P(X)) ≤ τ, (33)

we have

‖X̃ − X‖ ≤ δ + 2(µ_1 − µ_2)^{-1} τ, (34)

where X̃ denotes the solution to (PEP) with input Φ̃_P(X) and µ_1 − µ_2 is controlled by Lemma 2.3.2.

Proof. Let Q̃_P(X) denote the empirical average of the (faulty) flipped projections, i.e. flipped using Φ̃_P(X) rather than Φ_P(X). Then as before, we have

‖X̃ − X‖ ≤ 2(µ_1 − µ_2)^{-1} ‖Q̃_P(X) − Q(X)‖. (35)

By the triangle inequality, it follows that

‖Q̃_P(X) − Q(X)‖ ≤ ‖Q̃_P(X) − Q̂_P(X)‖ + ‖Q̂_P(X) − Q(X)‖. (36)

Since the normalized Hamming distance between Φ_P(X) and Φ̃_P(X) is bounded by τ, and each flipped bit replaces P̂_j(X) by I − P̂_j(X), we see

‖Q̃_P(X) − Q̂_P(X)‖ ≤ τ. (37)

Since ‖Q̂_P(X) − Q(X)‖ ≤ (µ_1 − µ_2)δ/2 with probability at least 1 − exp(−D) by the same proof as in Theorem 2.3.3, the result follows.

We expect that a deeper analysis will reveal a better dependence on the error rate, or perhaps eliminate the dimension-dependent factor (µ_1 − µ_2)^{-1}.

3 Uniform recovery

In this section we extend the result from Theorem 2.3.3 to show that the recovery error using (PEP) is small uniformly across all inputs X ∈ Proj_F(1,2n) for a single random binary measurement Φ_P. Our strategy consists of the following steps:

(1) Using sufficiently many random projections, Q̂_P(X) concentrates near Q(X) for all X in an ε-net of Proj_F(1,2n).
(2) With high probability the measurement Hamming distance between a pair X, Y ∈ Proj_F(1,2n) is not much larger than ‖X − Y‖, uniformly for all such pairs.
(3) The eigenspace of Q̂_P(X) corresponding to its largest eigenvalue concentrates near X uniformly for all X ∈ Proj_F(1,2n).

3.1 Concentration of Q̂_P(X) near Q(X) uniformly on a net

First, we show an inequality relating the Euclidean distance between unit vectors to the operator norm distance between their associated rank-one projections.
Lemma 3.1.1. Let d ∈ N. Then for all x, y ∈ S^{d-1}_F,

‖xx* − yy*‖ ≤ ‖x − y‖. (38)

Proof. Let θ be the principal angle between the subspaces associated to xx* and yy*, and recall ‖xx* − yy*‖ = sin(θ). Thus

‖x − y‖² = ⟨x − y, x − y⟩ = 2 − 2 Re⟨x, y⟩ ≥ 2 − 2|⟨x, y⟩| = 2 − 2cos(θ).

Since θ ∈ [0, π/2] we know 0 ≤ cos(θ) ≤ 1 and so

2 − 2cos(θ) = 2(1 − cos(θ)) ≥ (1 + cos(θ))(1 − cos(θ)) = sin²(θ) = ‖xx* − yy*‖².
Next, we use Lemma 3.1.1 to prove the existence of ε-nets of Proj_F(1,2n) with explicit cardinality bounds. This follows from the analogous results for ε-nets of S^{2n-1}_F.

Lemma 3.1.2. For any ε > 0, there exists an ε-net N_ε for Proj_F(1,2n) with respect to the operator norm with cardinality satisfying

log|N_ε| ≤ 4βn log(1 + 2ε^{-1}). (39)

Proof.
By the standard volume bound for the covering number of the sphere in real Euclidean space [9], and the fact that S^{2n-1}_C is naturally isometric to S^{4n-1}_R, for every ε > 0 there exists an ε-net N'_ε for S^{2n-1}_F (with respect to the Euclidean distance) with cardinality satisfying

|N'_ε| ≤ (1 + 2/ε)^{4βn}.

By Lemma 3.1.1, N_ε := {xx* : x ∈ N'_ε} is an ε-net for Proj_F(1,2n) with the desired cardinality bound.

Now that we have existence of ε-nets with control on their cardinality, we use a union bound and Lemma 2.2.1 to show that with sufficiently many measurements, Q̂_P(X) concentrates near Q(X) uniformly for all X in an ε-net of Proj_F(1,2n).
Lemma 3.1.3. Let ε > 0 and N_ε be an ε-net of Proj_F(1,2n) such that log|N_ε| ≤ 4βn log(1 + 2ε^{-1}). Also, let 0 < δ ≤ 1, m ≥ (7/6) δ^{-2} [log(4n) + 4βn log(1 + 2ε^{-1}) + D], and P = {P_j}_{j=1}^m be an independent sequence of uniformly distributed projections in Proj_F(n,2n). Then with probability at least 1 − exp(−D) we have

‖Q̂_P(X) − Q(X)‖ ≤ δ (40)

for all X ∈ N_ε.

Proof. By Lemma 2.2.1 and our assumption on m, for each X ∈ N_ε we know

P{‖Q̂_P(X) − Q(X)‖ ≥ δ} ≤ exp(−4βn log(1 + 2ε^{-1}) − D).

By taking a union bound over all X ∈ N_ε it follows that

P{‖Q̂_P(X) − Q(X)‖ ≤ δ for all X ∈ N_ε} ≥ 1 − |N_ε| exp(−4βn log(1 + 2ε^{-1}) − D).

The claim follows from our upper bound on |N_ε|.

3.2 Relation between the measurement Hamming distance and operator norm distance

The main goal of this section is to prove a uniform concentration guarantee for the measurement Hamming distance, Theorem 3.2.6: with sufficiently many measurements, with high probability the measurement Hamming distance between any pair X, Y ∈ Proj_F(1,2n) is not much larger than the operator norm of their difference. It is relatively simple to show that this happens for fixed X and Y, but showing that it holds uniformly for all such pairs requires more complicated techniques. To this end, we define the t-soft Hamming distance similarly as in Plan and Vershynin's "Dimension reduction by random hyperplane tessellations" [28]. We establish a continuity property and concentration results for the t-soft Hamming distance, which allow us to show uniform concentration of the measurement Hamming distance near its expected value over all of Proj_F(1,2n). We then show that E[d_P(X,Y)] can be bounded in terms of ‖X − Y‖, after which Theorem 3.2.6 follows.

3.2.1 The t-soft Hamming distance and its continuity properties

For any X, Y ∈ Proj_F(1,2n) let S_{X,Y} := {P ∈ Proj_F(n,2n) : φ_P(X) ≠ φ_P(Y)}, i.e. the set of projections that yield different measurements of X and Y. If P ∈ S_{X,Y}, then we say that P separates X and Y. For a sequence P = {P_j}_{j=1}^m ⊂ Proj_F(n,2n), notice that d_P(X,Y) = (1/m) Σ_{j=1}^m 1_{S_{X,Y}}(P_j). With this expression for the measurement Hamming distance in mind, we define

S^t_{X,Y} := {P ∈ Proj_F(n,2n) : tr[PX] + t < 1/2 ≤ tr[PY] − t} ∪ {P ∈ Proj_F(n,2n) : tr[PY] + t < 1/2 ≤ tr[PX] − t} (41)

for all t ∈ R, and if P ∈ S^t_{X,Y} then we say P t-separates X and Y.

Definition 3.2.1.
Given a sequence of orthogonal projections P = {P_j}_{j=1}^m in Proj_F(n,2n) and t ∈ R, we define the t-soft Hamming distance between input projections X, Y ∈ Proj_F(1,2n) to be

d^t_P(X,Y) := (1/m) Σ_{j=1}^m 1_{S^t_{X,Y}}(P_j). (42)
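In code, the t-soft Hamming distance is a one-line modification of d_P; the sketch below (ours, with illustrative names) implements (41)-(42) directly.

```python
import numpy as np

def soft_hamming(Ps, X, Y, t):
    """d_P^t(X, Y) from equation (42): fraction of projections that t-separate
    X and Y, i.e. tr[PX] and tr[PY] fall on opposite sides of 1/2 with margin t
    (t may be negative, which loosens the criterion; t = 0 recovers d_P)."""
    count = 0
    for P in Ps:
        a, b = np.trace(P @ X).real, np.trace(P @ Y).real
        if (a + t < 0.5 <= b - t) or (b + t < 0.5 <= a - t):
            count += 1
    return count / len(Ps)
```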
Ultimately we want to prove uniform results for the measurement Hamming distance, but its discontinuity causes problems with standard ε-net arguments. The t-soft Hamming distance helps us work around this discontinuity, where the parameter t determines how strict the criterion should be for determining whether the measurements of two inputs are different. This is reflected in the fact that for t_1 ≤ 0 ≤ t_2 we have S^{t_2}_{X,Y} ⊂ S_{X,Y} ⊂ S^{t_1}_{X,Y}.

The addition of this extra parameter lets us formulate a type of continuity for d^t_P(X,Y) where both t and the projections X and Y are allowed to vary. If we perturb the projections X, Y by a small amount in operator norm, then we can make up for it by slightly increasing or decreasing the parameter t.

Proposition 3.2.2.
Let P = {P_j}_{j=1}^m be a sequence of projections in Proj_F(n,2n), t ∈ R, ε > 0, and X_0, Y_0, X, Y ∈ Proj_F(1,2n) such that ‖X − X_0‖ < ε and ‖Y − Y_0‖ < ε. Then

d^{t+2ε}_P(X,Y) ≤ d^t_P(X_0,Y_0) ≤ d^{t−2ε}_P(X,Y). (43)

Proof. Suppose P ∈ S^{t+2ε}_{X,Y}. Then, without loss of generality, we may assume that

tr[PY] + t + 2ε < 1/2 ≤ tr[PX] − t − 2ε.

Since Y − Y_0 is a difference of rank-one projections and tr[PZ] ∈ [0,1] for every rank-one projection Z, we have |tr[P(Y − Y_0)]| ≤ ‖Y − Y_0‖ < ε, so

tr[PY_0] + t = tr[PY] − tr[P(Y − Y_0)] + t ≤ tr[PY] + t + ε < 1/2 − ε < 1/2,

and also

tr[PX_0] − t = tr[PX] − tr[P(X − X_0)] − t ≥ tr[PX] − t − ε ≥ 1/2 + ε > 1/2.

Thus S^{t+2ε}_{X,Y} ⊂ S^t_{X_0,Y_0}, and so d^{t+2ε}_P(X,Y) ≤ d^t_P(X_0,Y_0). The second inequality follows from the first by swapping the roles of X, Y with X_0, Y_0 and replacing t with t − 2ε.

3.2.2 Concentration of the t-soft Hamming distance

In this section, we state a basic concentration result for the t-soft Hamming distance between two fixed inputs, and then extend it to a uniform result over an ε-net.

Lemma 3.2.3.
Let P = {P_j}_{j=1}^m be an independent sequence of uniformly distributed projections in Proj_F(n,2n), t ∈ R, δ > 0, and X, Y ∈ Proj_F(1,2n) be fixed. Then

P{|d^t_P(X,Y) − E[d^t_P(X,Y)]| > δ} ≤ 2 exp(−2δ²m). (44)

Proof. From the way that we defined the t-soft Hamming distance, m · d^t_P(X,Y) ∼ Bin(m, p) where p = E[d^t_P(X,Y)]. The result then follows from a standard Chernoff-Hoeffding bound for binomial random variables (see [4]).

We can now use Lemma 3.2.3 and the bounds on the size of ε-nets of Proj_F(1,2n) from Lemma 3.1.2 to take a union bound. The result is a bound for the probability that the t-soft Hamming distance is close to its expectation for all pairs of projections in an ε-net simultaneously.

Proposition 3.2.4.
Let ε > 0 and N_ε be an ε-net of Proj_F(1,2n) such that log|N_ε| ≤ 4βn log(1 + 2ε^{-1}). Also, let t ∈ R, δ > 0, m ≥ δ^{-2} (4βn log(1 + 2ε^{-1}) + log(2) + D), and P = {P_j}_{j=1}^m be an independent sequence of uniformly distributed projections in Proj_F(n,2n). Then with probability at least 1 − exp(−D) we have

|d^t_P(X,Y) − E[d^t_P(X,Y)]| ≤ δ (45)

for all X, Y ∈ N_ε.

Proof. By Lemma 3.2.3 and taking a union bound over at most |N_ε|² pairs in N_ε × N_ε, we have that

P{|d^t_P(X,Y) − E[d^t_P(X,Y)]| ≤ δ for all (X,Y) ∈ N_ε × N_ε} ≥ 1 − 2|N_ε|² exp(−2δ²m).

By the bound on |N_ε| and our assumption about m we have

2|N_ε|² exp(−2δ²m) ≤ 2 exp(8βn log(1 + 2ε^{-1}) − 2δ²m) ≤ exp(−D).

The following proposition addresses how varying t affects the expected difference of the t-soft Hamming distance from the measurement Hamming distance.

Proposition 3.2.5.
Let P = {P_j}_{j=1}^m be an independent sequence of uniformly distributed projections in Proj_F(n,2n), t ∈ R, and X, Y ∈ Proj_F(1,2n) be fixed. Then

|E[d^t_P(X,Y) − d_P(X,Y)]| = |P{P_1 ∈ S^t_{X,Y}} − P{P_1 ∈ S_{X,Y}}| ≤ (8e√(βn)/√π) |t|. (46)

Proof. Because the t-soft and regular Hamming distances are linear combinations of indicator functions, and the P_j are i.i.d., we have

|E[d^t_P(X,Y) − d_P(X,Y)]| = |E[1_{S^t_{X,Y}}(P_1) − 1_{S_{X,Y}}(P_1)]|,

and by Jensen's inequality it follows that

|E[1_{S^t_{X,Y}}(P_1) − 1_{S_{X,Y}}(P_1)]| ≤ E[|1_{S^t_{X,Y}}(P_1) − 1_{S_{X,Y}}(P_1)|] = P{P_1 ∈ S^t_{X,Y} △ S_{X,Y}}. (47)

We break up this symmetric difference into two disjoint pieces,

P{P_1 ∈ S^t_{X,Y} △ S_{X,Y}} = P{P_1 ∈ S^t_{X,Y} \ S_{X,Y}} + P{P_1 ∈ S_{X,Y} \ S^t_{X,Y}},

and look at two cases. First, if t > 0 then S^t_{X,Y} \ S_{X,Y} is empty, and

S_{X,Y} \ S^t_{X,Y} ⊂ {|tr[P_1X] − 1/2| < t} ∪ {|tr[P_1Y] − 1/2| < t}.

Similarly, if t < 0 then S_{X,Y} \ S^t_{X,Y} is empty and again

S^t_{X,Y} \ S_{X,Y} ⊂ {|tr[P_1X] − 1/2| < −t} ∪ {|tr[P_1Y] − 1/2| < −t}.

Since tr[P_1X] =(d) tr[P_1Y], in both cases we have

P{P_1 ∈ S^t_{X,Y} △ S_{X,Y}} ≤ 2 P{|tr[P_1X] − 1/2| < |t|}. (48)

By Lemma 2.1.1 we know tr[P_1X] ∼ Beta(βn, βn), and so we can bound this probability using the probability density function of the beta distribution. To begin with, we see

P{|tr[P_1X] − 1/2| < |t|} = 2 B(βn, βn)^{-1} ∫_{1/2}^{1/2+|t|} x^{βn−1}(1−x)^{βn−1} dx = 2 B(βn, βn)^{-1} ∫_0^{|t|} (1/4 − u²)^{βn−1} du ≤ 2 B(βn, βn)^{-1} ∫_0^{|t|} 4^{1−βn} du = 8|t| / (2^{2βn} B(βn, βn)). (49)

Using the lower bound for the Beta function in (23) then yields

P{|tr[P_1X] − 1/2| < |t|} ≤ (4e√(βn)/√π) |t|. (50)

The result follows from combining equation (47) with inequalities (48) and (50).

We now have all the tools we need to prove that with sufficiently many measurements the Hamming distance concentrates near its expected value for all pairs in Proj_F(1,2n).

Theorem 3.2.6.
Let 0 < δ ≤ 1, m ≥ 4δ^{-2} (4βn log(1 + 128e√(βn)/(√π δ)) + 2log(2) + D), and P = {P_j}_{j=1}^m be a collection of independent uniformly distributed projections in Proj_F(n,2n). Then with probability at least 1 − exp(−D) we have

|d_P(X,Y) − E[d_P(X,Y)]| < δ (51)

for all X, Y ∈ Proj_F(1,2n).

Proof. Let ε = √π δ/(64e√(βn)) and let N_ε be an ε-net of Proj_F(1,2n) with log|N_ε| ≤ 4βn log(1 + 2ε^{-1}) as in Lemma 3.1.2. By our assumption on m, Proposition 3.2.4 (applied at level δ/2) says that

P{|d^{2ε}_P(X,Y) − E[d^{2ε}_P(X,Y)]| > δ/2 for some X, Y ∈ N_ε} ≤ exp(−log(2) − D),

and also

P{|d^{−2ε}_P(X,Y) − E[d^{−2ε}_P(X,Y)]| > δ/2 for some X, Y ∈ N_ε} ≤ exp(−log(2) − D),

and so with probability at least 1 − exp(−D) we have

|d^{±2ε}_P(X,Y) − E[d^{±2ε}_P(X,Y)]| ≤ δ/2 for all X, Y ∈ N_ε (call this event A).

Suppose that A occurs. Consider an arbitrary pair X, Y ∈ Proj_F(1,2n) and let X_0, Y_0 ∈ N_ε be such that ‖X − X_0‖ < ε and ‖Y − Y_0‖ < ε. By Proposition 3.2.2 we know that d_P(X,Y) ≤ d^{−2ε}_P(X_0,Y_0) ≤ d^{−4ε}_P(X,Y). These inequalities together with A holding imply

d_P(X,Y) ≤ d^{−2ε}_P(X_0,Y_0) ≤ E[d^{−2ε}_P(X_0,Y_0)] + δ/2 ≤ E[d^{−4ε}_P(X,Y)] + δ/2. (52)

By Proposition 3.2.5 we have |E[d^{−4ε}_P(X,Y)] − E[d_P(X,Y)]| ≤ (8e√(βn)/√π) · 4ε = δ/2, hence

d_P(X,Y) ≤ E[d_P(X,Y)] + δ. (53)

Similarly, using Proposition 3.2.2 again shows that d^{4ε}_P(X,Y) ≤ d^{2ε}_P(X_0,Y_0) ≤ d_P(X,Y), and since A holds we have

d_P(X,Y) ≥ d^{2ε}_P(X_0,Y_0) ≥ E[d^{2ε}_P(X_0,Y_0)] − δ/2 ≥ E[d^{4ε}_P(X,Y)] − δ/2. (54)

Using Proposition 3.2.5 as above but for t = 4ε yields

d_P(X,Y) ≥ E[d_P(X,Y)] − δ. (55)

We have just shown that when the measurement projections are chosen uniformly and independently, then d_P(X,Y) concentrates near E[d_P(X,Y)] = P{P ∈ S_{X,Y}} for all X, Y ∈ Proj_F(1,2n), where P is a single uniformly distributed projection in Proj_F(n,2n). When n = 1, then P{P ∈ S_{X,Y}} = (2/π)θ ≤ sin(θ) = ‖X − Y‖, where θ is the principal angle between Ran(X) and Ran(Y). In the remainder of this section, we show that this upper bound holds for arbitrary n, see Proposition 3.2.9. To achieve this, we need to investigate the joint distribution of (tr[PX], tr[PY]).

By rotational invariance of the distribution of P we may assume that Ran(X) and Ran(Y) are in the two-dimensional subspace spanned by e_1 and e_2, the first two standard basis vectors. Viewed as matrices, this means that all entries of X and Y are zero outside of the top-left 2×2 submatrix. Furthermore, if P̃, X̃, and Ỹ are the top-left 2×2 submatrices of their respective matrices then (tr[PX], tr[PY]) = (tr[P̃X̃], tr[P̃Ỹ]).

We study the joint distribution of (tr[PX], tr[PY]) through the submatrix P̃ acting on F². Since P is Hermitian, so is P̃. Thus we may write P̃ = λ_1E_1 + λ_2E_2 where λ_1 ≥ λ_2 are the eigenvalues of P̃ and E_1 ⊥ E_2 are the projections onto their corresponding eigenspaces.
Wewrite λ ( ˜ P ) := ( λ , λ ) , and E ( ˜ P ) := ( E , E ) . By the rotational invariance of P , E is uniformlydistributed in Proj (1 , and E = I − E since Hermitian matrices have mutually orthogonaleigenspaces. Note also that λ ( ˜ P ) and E ( ˜ P ) are independent of each other. The distribution of λ ( ˜ P ) is given in the following lemma. Lemma 3.2.7.
Let n ≥ 2 and P ∈ Proj_F(n,2n) be uniformly distributed. Then λ(P̃) has probability density function p_n on D := {(x,y) ∈ [0,1]² : y ≤ x} defined by

p_n(x,y) := M_n^{-1} (x−y)^{2β} [x(1−x)y(1−y)]^{β(n−1)−1}, (56)

with the normalization constant

M_n = (2/(n−1)) B(n−1, n−1) if F = R, and M_n = (1/(4(2n−1))) B(n−1, n−1)² if F = C. (57)

Proof.
The probability density functions are given by [5, Proposition 4.1.4] with p = 2, q = 2n−2, r = n−1 and s = n−1. It only remains to compute the normalization constants M_n.

Suppose F = R. Then p_n(x,y) = M_n^{-1} (x−y)[x(1−x)y(1−y)]^{(n−3)/2}. Define the functions

f_n(x,y) = −(1/(n−1)) [x(1−x)]^{(n−3)/2} [y(1−y)]^{(n−1)/2}, (58)
g_n(x,y) = −(1/(n−1)) [x(1−x)]^{(n−1)/2} [y(1−y)]^{(n−3)/2}. (59)

With these definitions, we have M_n p_n = ∂g_n/∂x − ∂f_n/∂y on D. So by Green's theorem,

∫∫_D M_n p_n(x,y) dx dy = ∮_{∂D} f_n dx + g_n dy, (60)

where ∂D is the positively oriented boundary of D. Note that f_n and g_n both vanish on the boundary of D except for the diagonal ∆ := {(x,y) ∈ D : x = y}, so we only need to compute the line integral over ∆. Parameterizing ∆ by x(t) = y(t) = 1−t for t ∈ [0,1], we see

M_n = ∮_{∂D} f_n dx + g_n dy = −∫_0^1 [f_n(x(t),y(t)) + g_n(x(t),y(t))] dt = (2/(n−1)) ∫_0^1 t^{n−2}(1−t)^{n−2} dt = (2/(n−1)) B(n−1, n−1). (61)

Next, we consider the case when F = C. Then p_n(x,y) = M_n^{-1} (x−y)² [x(1−x)y(1−y)]^{n−2}. By symmetry, ∫∫_{[0,1]²} p_n(x,y) dx dy = 2, and by expanding this integral and using facts about the Beta distribution, we see

∫∫_{[0,1]²} p_n(x,y) dx dy = M_n^{-1} · 2 var(b) · B(n−1, n−1)², (62)

where b ∼ Beta(n−1, n−1). This beta-distributed random variable has variance var(b) = 1/(4(2n−1)), which determines M_n.

Let D_Sep := {(x,y) ∈ D : y < 1/2 < x}. Then λ(P̃) ∈ D_Sep if and only if there exist projections A, B ∈ Proj_F(1,2) such that P̃ ∈ S_{A,B}. This is true because λ_1 = max over A' ∈ Proj_F(1,2) of tr[P̃A'] and λ_2 = min over B' ∈ Proj_F(1,2) of tr[P̃B']. In particular, P ∈ S_{X,Y} requires λ(P̃) ∈ D_Sep. For this reason, we compute the probability that λ(P̃) ∈ D_Sep.

Lemma 3.2.8.
Let n ≥ 2, and P ∈ Proj_F(n,2n) be uniformly distributed. Then, as n → ∞,

P{λ(P̃) ∈ D_Sep} = B((n−1)/2, (n−1)/2) / (2^n B(n−1, n−1)) → 1/√2 if F = R,
P{λ(P̃) ∈ D_Sep} = 1/2 + 2(2n−1) / ((n−1)² 16^{n−1} B(n−1, n−1)²) → 1/2 + 1/π if F = C. (63)

Proof.
First, suppose F = R , so p n ( x, y ) = M − n ( x − y ) [ x (1 − x ) y (1 − y )] n − . Then, P (cid:110) λ ( ˜ P ) ∈ D Sep (cid:111) = M − n (cid:90) (cid:90) ( x − y ) [ x (1 − x ) y (1 − y )] n − dxdy. (64)By linearity and Fubini’s theorem, we get (cid:90) (cid:90) ( x − y ) [ x (1 − x ) y (1 − y )] n − dxdy = 14 (cid:20) E (cid:20) b | b ≥ (cid:21) − E (cid:20) b | b ≤ (cid:21)(cid:21) B (cid:18) n − , n − (cid:19) , where b ∼ Beta (cid:0) n − , n − (cid:1) . Calculating these conditional expectations we get E (cid:20) b | b ≥ (cid:21) − E (cid:20) b | b ≤ (cid:21) = 1( n − n − B (cid:0) n − , n − (cid:1) , and combining this with Lemma 3.2.7 yields P (cid:110) λ ( ˜ P ) ∈ D Sep (cid:111) = B (cid:0) n − , n − (cid:1) n B ( n − , n − . (65)Next, suppose F = C , so p n ( x, y ) = M − n ( x − y ) [ x (1 − x ) y (1 − y )] n − . Then, P (cid:110) λ ( ˜ P ) ∈ D Sep (cid:111) = M − n (cid:90) (cid:90) ( x − y ) [ x (1 − x ) y (1 − y )] n − dxdy. (66)Expanding ( x − y ) and rewriting integrals in terms of expectations of beta-distributed randomvariables, we see (cid:90) (cid:90) ( x − y ) [ x (1 − x ) y (1 − y )] n − dxdy = 12 (cid:20) E (cid:2) b (cid:3) − E (cid:20) b | b ≥ (cid:21) · E (cid:20) b | b ≤ (cid:21)(cid:21) B ( n − , n − , (67)where b ∼ Beta( n − , n − . We know that E (cid:2) b (cid:3) = n n − = + n − , and also E (cid:20) b | b ≥ (cid:21) · E (cid:20) b | b ≤ (cid:21) = (cid:18)
$$\mathbb{E}\Big[b \,\Big|\, b \ge \tfrac12\Big]\cdot\mathbb{E}\Big[b \,\Big|\, b \le \tfrac12\Big] = \left(\frac12 + \frac{4^{1-n}}{(n-1)B(n-1,n-1)}\right)\left(\frac12 - \frac{4^{1-n}}{(n-1)B(n-1,n-1)}\right) = \frac14 - \frac{4^{2(1-n)}}{(n-1)^2 B(n-1,n-1)^2}.$$
Putting this all together yields
$$\mathbb{P}\big\{\lambda(\tilde P) \in D_{\mathrm{Sep}}\big\} = \frac12 + \frac{8(2n-1)}{4^{2n-1}(n-1)^2\,B(n-1,n-1)^2}. \qquad (68)$$
The asymptotic limit of $\mathbb{P}\{\lambda(\tilde P) \in D_{\mathrm{Sep}}\}$ as $n \to \infty$ follows from Stirling's approximation as in (23), see [29].
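The formula can be compared against simulation; the following sketch (ours, same sampling convention as above) estimates the probability that the eigenvalue pair lands in $D_{\mathrm{Sep}}$:

```python
# Monte Carlo estimate of P{lambda(P~) in D_Sep}, to compare with (63).
import numpy as np
rng = np.random.default_rng(1)

def sep_frequency(n, field, trials=20000):
    hits = 0
    for _ in range(trials):
        A = rng.standard_normal((2 * n, n))
        if field == "C":
            A = A + 1j * rng.standard_normal((2 * n, n))
        Q, _ = np.linalg.qr(A)
        P = Q @ Q.conj().T
        lam = np.linalg.eigvalsh(P[:2, :2])   # ascending eigenvalues
        hits += lam[0] < 0.5 < lam[1]         # eigenvalues straddle 1/2
    return hits / trials

print(sep_frequency(4, "R"), sep_frequency(4, "C"))
```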
Now we are prepared to bound $\mathbb{P}\{P \in S_{X,Y}\}$ in terms of the operator norm distance $\|X - Y\|$.

Proposition 3.2.9. Let $P \in \mathrm{Proj}_{\mathbb{F}}(n, 2n)$ be uniformly distributed. Then for any $X, Y \in \mathrm{Proj}_{\mathbb{F}}(1, 2n)$,
$$\mathbb{P}\{P \in S_{X,Y}\} \le \|X - Y\|. \qquad (69)$$
Proof. The case when $n = 1$ is simple and was mentioned previously, so we consider here $n \ge 2$. Further, without loss of generality, assume $\mathrm{Ran}(X), \mathrm{Ran}(Y) \subset \mathrm{Ran}(E)$ where $E$ is the orthogonal projection onto $\mathrm{span}\{e_1, e_2\}$. By conditioning, $\mathbb{P}\{P \in S_{X,Y}\} = \mathbb{E}\big[\mathbb{P}\{P \in S_{X,Y} \mid \lambda(\tilde P)\}\big]$. By the definition of $D_{\mathrm{Sep}}$ we see that $\mathbb{P}\{P \in S_{X,Y} \mid \lambda(\tilde P)\} = 0$ if $\lambda(\tilde P) \in D_{\mathrm{Sep}}^{c}$. Hence
$$\mathbb{E}\big[\mathbb{P}\{P \in S_{X,Y} \mid \lambda(\tilde P)\}\big] = \mathbb{E}\big[\mathbb{P}\{P \in S_{X,Y} \mid \lambda(\tilde P)\}\,\mathbf{1}_{D_{\mathrm{Sep}}}(\lambda(\tilde P))\big]. \qquad (70)$$

Suppose now that $\lambda(\tilde P) \in D_{\mathrm{Sep}}$, and first consider the case when $\mathbb{F} = \mathbb{R}$. Then $\mathrm{Proj}_{\mathbb{R}}(1,2)$ can be viewed as $S^1 \subset \mathbb{R}^2$ with its opposite points identified, and $E(\tilde P)$ is a (uniformly distributed) random pair of antipodal points in this quotient space. Letting $E_1 = v_1 v_1^*$ and $E_2 = v_2 v_2^*$ where $v_1$ and $v_2$ are normalized eigenvectors corresponding to eigenvalues $\lambda_1$ and $\lambda_2$ of $\tilde P$, we may parameterize $\mathrm{Proj}_{\mathbb{R}}(1,2)$ by $\phi \in [-\frac{\pi}{2}, \frac{\pi}{2}]$ via
$$\phi \mapsto Z_\phi := (\cos(\phi)v_1 + \sin(\phi)v_2)(\cos(\phi)v_1 + \sin(\phi)v_2)^* = \cos^2(\phi)E_1 + \sin^2(\phi)E_2 + \sin(\phi)\cos(\phi)(v_1 v_2^* + v_2 v_1^*).$$
We see that $\mathrm{tr}[\tilde P Z_\phi] = \lambda_1\cos^2(\phi) + \lambda_2\sin^2(\phi) = \lambda_1 - (\lambda_1 - \lambda_2)\sin^2(\phi)$. Since $\mathrm{tr}[\tilde P Z_0] = \lambda_1 > \frac12$ and $\mathrm{tr}[\tilde P Z_{\pm\pi/2}] = \lambda_2 < \frac12$, there exists some $\phi_h \in (0, \frac{\pi}{2})$ such that $\mathrm{tr}[\tilde P Z_{\phi_h}] = \mathrm{tr}[\tilde P Z_{-\phi_h}] = \frac12$. In fact,
$$\phi_h = \arcsin\left(\sqrt{\frac{\lambda_1 - \frac12}{\lambda_1 - \lambda_2}}\right).$$
We see that $\mathrm{tr}[\tilde P Z_\phi] > \frac12$ for $\phi \in (-\phi_h, \phi_h)$, and $\mathrm{tr}[\tilde P Z_\phi] < \frac12$ for $\phi \in [-\frac{\pi}{2}, -\phi_h) \cup (\phi_h, \frac{\pi}{2}]$.

All of this goes to show that $\lambda(\tilde P)$ determines $\phi_h$, which along with the orientation of $E(\tilde P)$ determines which rank-one projections in $\mathrm{Ran}(E)$ the projection $P$ separates. In the quotient space picture, the open arc between $Z_{\phi_h}$ and $Z_{-\phi_h}$ containing $E_1$ represents the rank-one projections with measurements greater than $\frac12$, and the complementary arc represents those with measurements less than $\frac12$. Here $\theta \in [0, \frac{\pi}{2}]$ denotes the angle between the lines $\mathrm{Ran}(X)$ and $\mathrm{Ran}(Y)$, so that $\|X - Y\| = \sin(\theta)$. Let $w = \min\{2\phi_h, \pi - 2\phi_h\}$, which is the length of the smaller of these two arcs. If $w \le \theta$, then $\mathbb{P}\{P \in S_{X,Y} \mid \lambda(\tilde P)\} = \frac{2w}{\pi} \le \frac{2\theta}{\pi}$. If $w > \theta$, then $\mathbb{P}\{P \in S_{X,Y} \mid \lambda(\tilde P)\} = \frac{2\theta}{\pi}$. So
$$\mathbb{E}\big[\mathbb{P}\{P \in S_{X,Y} \mid \lambda(\tilde P)\}\,\mathbf{1}_{D_{\mathrm{Sep}}}(\lambda(\tilde P))\big] \le \mathbb{E}\left[\frac{2\theta}{\pi}\,\mathbf{1}_{D_{\mathrm{Sep}}}(\lambda(\tilde P))\right] \qquad (71)$$
$$= \frac{2\theta}{\pi}\,\mathbb{P}\big\{\lambda(\tilde P) \in D_{\mathrm{Sep}}\big\} \le \sin(\theta) = \|X - Y\|,$$
since $\frac{2\theta}{\pi} \le \sin(\theta)$ for $\theta \in [0, \frac{\pi}{2}]$.

Next, we consider the case when $\mathbb{F} = \mathbb{C}$, in which case $\mathrm{Proj}_{\mathbb{C}}(1,2)$ can be identified with the Bloch sphere [10]; see Figure 2. By rotational invariance, $E(\tilde P)$ is a pair of (uniformly distributed) antipodal points on the sphere, and $\lambda(\tilde P)$ determines which pairs of projections are separated by $P$.

Figure 2: The $2 \times 2$ principal submatrix $\tilde P$ of $P$ divides $\mathrm{Proj}_{\mathbb{F}}(1,2)$ into two disjoint sets based on whether the Hilbert-Schmidt inner product of a rank-one orthogonal projection with $\tilde P$ is greater or less than $\frac12$ (Left: $\mathbb{F} = \mathbb{R}$; Right: $\mathbb{F} = \mathbb{C}$). If $P$ separates two points $X$ and $Y$, then $\tilde P = \lambda_1 E_1 + \lambda_2 E_2$ with eigenvalues $\lambda_1 > 1/2 > \lambda_2$ and mutually orthogonal eigenprojections $E_1$ and $E_2$. The subset shaded in darker gray contains the points for which the Hilbert-Schmidt inner product with $\tilde P$ is greater than $1/2$.

If $v_1$ and $v_2$ are eigenvectors of $\tilde P$ as above, $E_1 = v_1 v_1^*$ and $E_2 = v_2 v_2^*$, and $v_{\phi,\psi} := \cos(\frac{\phi}{2})v_1 + e^{i\psi}\sin(\frac{\phi}{2})v_2$ for $\phi \in [0, \pi]$, $\psi \in [0, 2\pi)$, then $Z_{\phi,\psi} = v_{\phi,\psi}v_{\phi,\psi}^*$ lies on the circle of points in the Bloch sphere at an angle of $\phi$ from $E_1$.
Moreover, this representation shows that $\mathrm{tr}[\tilde P Z_{\phi,\psi}] = \mathrm{tr}[\tilde P Z_{\phi,\psi'}]$ for all $\phi$, $\psi$, and $\psi'$. By continuity, there must exist some $\phi_h \in [0, \pi]$ such that $\mathrm{tr}[\tilde P Z_{\phi_h,\psi}] = \frac12$ for all $\psi \in [0, 2\pi)$. In fact, we can calculate
$$\phi_h = 2\arcsin\left(\sqrt{\frac{\lambda_1 - \frac12}{\lambda_1 - \lambda_2}}\right).$$
The open spherical cap centered at $E_1$ of angle $\phi_h$ consists exactly of those projections $Z \in \mathrm{Proj}_{\mathbb{C}}(1,2)$ such that $\mathrm{tr}[\tilde P Z] > \frac12$, and the complementary cap consists of those for which $\mathrm{tr}[\tilde P Z] < \frac12$.

Conditioning on $\lambda(\tilde P)$ determines the opening angles of these two spherical caps, which are oriented along a random diameter determined by $E(\tilde P)$. The projections $X, Y$ are two fixed points on the Bloch sphere at an angle of $2\theta$, and are separated if and only if they are not in the same cap. Let $w = \min\{\phi_h, \pi - \phi_h\}$, which is the smaller opening angle of these two caps. If $w \le \theta$, then any cap of angle $w$ containing $X$ cannot contain $Y$ (and vice versa), so $\mathbb{P}\{P \in S_{X,Y} \mid \lambda(\tilde P)\}$ is just twice the normalized area of a cap of angle $w$ (which is just its normalized height), i.e.
$$\mathbb{P}\{P \in S_{X,Y} \mid \lambda(\tilde P)\} = 1 - \cos(w) \le 1 - \cos(\theta) \le \sin(\theta) = \|X - Y\|. \qquad (72)$$
If $w > \theta$, then it is possible for both $X$ and $Y$ to be in a cap of opening angle $w$. In this case, $\mathbb{P}\{P \in S_{X,Y} \mid \lambda(\tilde P)\}$ is just the normalized area of the symmetric difference of spherical caps of angle $w$ centered at $X$ and $Y$. The intersection of these two caps contains a spherical cap of angle $w - \theta$ centered at the geodesic midpoint of $X$ and $Y$, so for this case
$$\mathbb{P}\{P \in S_{X,Y} \mid \lambda(\tilde P)\} \le \cos(w - \theta) - \cos(w) \le \sin(\theta) = \|X - Y\|, \qquad (73)$$
where the last inequality follows since $w \le \frac{\pi}{2}$. Thus we have
$$\mathbb{E}\big[\mathbb{P}\{P \in S_{X,Y} \mid \lambda(\tilde P)\}\,\mathbf{1}_{D_{\mathrm{Sep}}}(\lambda(\tilde P))\big] \le \|X - Y\|\,\mathbb{P}\big\{\lambda(\tilde P) \in D_{\mathrm{Sep}}\big\} \qquad (74)$$
$$\le \|X - Y\|.$$
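A simulation matching the proposition (ours, real case; it assumes that $S_{X,Y}$ consists of the projections whose binary answers for $X$ and $Y$ disagree, i.e. $\mathrm{tr}[PX]$ and $\mathrm{tr}[PY]$ fall on opposite sides of $\frac12$):

```python
# Estimate P{P in S_{X,Y}} for a fixed pair X, Y and check it against ||X - Y||.
import numpy as np
rng = np.random.default_rng(2)
n, trials = 4, 20000

def rand_proj(cols):
    Q, _ = np.linalg.qr(rng.standard_normal((2 * n, cols)))
    return Q @ Q.T

X, Y = rand_proj(1), rand_proj(1)       # two fixed rank-one projections
dist = np.linalg.norm(X - Y, 2)         # operator norm ||X - Y||
sep = 0
for _ in range(trials):
    P = rand_proj(n)
    sep += (np.trace(P @ X) - 0.5) * (np.trace(P @ Y) - 0.5) < 0
print(sep / trials, "<=", dist)         # Proposition 3.2.9
```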
The uniform bound for the measurement Hamming distance in terms of the operator norm distance now follows directly by combining Theorem 3.2.6 with Proposition 3.2.9.

Theorem 3.2.10.
Let $\delta > 0$,
$$m \ge \delta^{-2}\left(\beta n \log\left(\sqrt{\beta n - 1}\,\sqrt{\pi}\,\delta^{-1}\right) + \log(2) + D\right),$$
and $\mathcal{P} = \{P_j\}_{j=1}^m$ be a collection of independent uniformly distributed projections in $\mathrm{Proj}_{\mathbb{F}}(n, 2n)$. Then with probability at least $1 - \exp(-D)$,
$$d_{\mathcal{P}}(X, Y) \le \|X - Y\| + \delta \qquad (75)$$
for all $X, Y \in \mathrm{Proj}_{\mathbb{F}}(1, 2n)$.
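In simulation terms, $d_{\mathcal{P}}(X, Y)$ is the fraction of the $m$ binary questions answered differently by $X$ and $Y$; a short sketch (ours, real case, same conventions as above) compares it with $\|X - Y\|$:

```python
# Empirical measurement Hamming distance d_P(X, Y) versus ||X - Y||.
import numpy as np
rng = np.random.default_rng(3)
n, m = 4, 2000

def rand_proj(cols):
    Q, _ = np.linalg.qr(rng.standard_normal((2 * n, cols)))
    return Q @ Q.T

X, Y = rand_proj(1), rand_proj(1)
projs = [rand_proj(n) for _ in range(m)]
bits = lambda Z: np.array([np.trace(P @ Z) > 0.5 for P in projs])
d_P = np.mean(bits(X) != bits(Y))       # fraction of disagreeing answers
print(d_P, np.linalg.norm(X - Y, 2))    # Theorem 3.2.10: d_P <= ||X-Y|| + delta w.h.p.
```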
With the results from Sections 3.1 and 3.2 we are ready to extend the pointwise result given in Theorem 2.3.3 to a uniform result that controls the behavior of our recovery procedure for all input vectors simultaneously.

Theorem 3.3.1.
Let $\delta > 0$ and set $\epsilon = (\mu_1 - \mu_2)\delta/8$. If
$$m \ge \epsilon^{-2}\left(\beta n \log\left(\sqrt{\beta n - 1}\,\sqrt{\pi}\,\epsilon^{-1}\right) + 2\log(2) + D\right) \qquad (76)$$
and $\mathcal{P} = \{P_j\}_{j=1}^m$ is an independent sequence of uniformly distributed projections in $\mathrm{Proj}_{\mathbb{F}}(n, 2n)$, then with probability at least $1 - \exp(-D)$,
$$\|\hat{X} - X\| < \delta \qquad (77)$$
for all $X \in \mathrm{Proj}_{\mathbb{F}}(1, 2n)$, where $\hat{X}$ is the solution to (PEP) with input $\Phi_{\mathcal{P}}(X)$.

Proof. Let $\mathcal{N}_\epsilon$ be an $\epsilon$-net for $\mathrm{Proj}_{\mathbb{F}}(1, 2n)$ such that $\log|\mathcal{N}_\epsilon| \le \beta n \log(1 + 2\epsilon^{-1})$ as in Lemma 3.1.2. By our choice of $m$, Lemma 3.1.3 says that with probability greater than $1 - \exp(-\log(2) - D)$ we have $\|\hat{Q}_{\mathcal{P}}(X_0) - Q(X_0)\| \le \epsilon$ for all $X_0 \in \mathcal{N}_\epsilon$ (call this event $A$). Also by our choice of $m$, Theorem 3.2.10 says that with probability at least $1 - \exp(-\log(2) - D)$ we have $d_{\mathcal{P}}(X, Y) \le \|X - Y\| + \epsilon$ for all $X, Y \in \mathrm{Proj}_{\mathbb{F}}(1, 2n)$ (call this event $B$).

Suppose that $A$ and $B$ both occur, which happens with probability at least $1 - \exp(-D)$, and consider an arbitrary $X \in \mathrm{Proj}_{\mathbb{F}}(1, 2n)$. We know from (28) that
$$\|\hat{X} - X\| \le 2(\mu_1 - \mu_2)^{-1}\,\|\hat{Q}_{\mathcal{P}}(X) - Q(X)\|. \qquad (78)$$
To bound the right-hand side of this last inequality we pass to the $\epsilon$-net $\mathcal{N}_\epsilon$ by picking $X_0 \in \mathcal{N}_\epsilon$ with $\|X - X_0\| < \epsilon$. Then
$$\|\hat{Q}_{\mathcal{P}}(X) - Q(X)\| \le \|\hat{Q}_{\mathcal{P}}(X) - \hat{Q}_{\mathcal{P}}(X_0)\| + \|\hat{Q}_{\mathcal{P}}(X_0) - Q(X_0)\| + \|Q(X_0) - Q(X)\|. \qquad (79)$$
Next, we examine each of the three terms on the right side of (79). To bound the first term, note that $|\{j : \hat{P}_j(X) \ne \hat{P}_j(X_0)\}| = m \cdot d_{\mathcal{P}}(X, X_0)$. Using this and the assumption that $B$ holds yields
$$\|\hat{Q}_{\mathcal{P}}(X) - \hat{Q}_{\mathcal{P}}(X_0)\| = \Bigg\|\frac{1}{m}\sum_{j:\,\hat{P}_j(X) \ne \hat{P}_j(X_0)} \big(\hat{P}_j(X) - \hat{P}_j(X_0)\big)\Bigg\| \le d_{\mathcal{P}}(X, X_0) \le 2\epsilon, \qquad (80)$$
since each summand is $\pm(2P_j - I)$, of operator norm one. Since $A$ holds, we can bound the second term by $\|\hat{Q}_{\mathcal{P}}(X_0) - Q(X_0)\| \le \epsilon$. Lastly, using Proposition 2.1.2 gives $Q(X) - Q(X_0) = (\mu_1 - \mu_2)(X - X_0)$, and so we can bound the third term by
$$\|Q(X) - Q(X_0)\| = (\mu_1 - \mu_2)\|X - X_0\| < (\mu_1 - \mu_2)\epsilon. \qquad (81)$$
Using these three bounds together in (79) gives
$$\|\hat{Q}_{\mathcal{P}}(X) - Q(X)\| \le 3\epsilon + (\mu_1 - \mu_2)\epsilon \le 4\epsilon = \frac12(\mu_1 - \mu_2)\delta, \qquad (82)$$
where we used $\mu_1 - \mu_2 \le 1$; combined with (78) this yields $\|\hat{X} - X\| < \delta$.

See Figure 3 for a plot showing how our bound on the sufficient number of measurements to achieve a uniform accuracy of $\delta$ relates to experimental results.

As in the pointwise case, our proof allows us to fine-tune the probability that a sequence of measurement projections provides uniformly accurate recovery by adjusting the value of $D$ in (76). In particular, we can take $D = n$ to ensure success with overwhelming probability, i.e. the failure rate decays exponentially in $n$. In the pointwise case, this resulted in gaining an additional factor of $n$ in the number of measurement projections, see Corollary 2.3.5. In the uniform case, however, the asymptotics remain the same.
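For concreteness, here is a hedged end-to-end sketch of the experiment behind Figure 3. It relies on conventions this excerpt does not spell out, so both are assumptions: the flipped projection is $\hat{P}_j = P_j$ when the binary answer reports the input closer to $\mathrm{Ran}(P_j)$ and $I - P_j$ otherwise, and (PEP) is solved by taking a top eigenvector of the empirical average $\hat{Q}_{\mathcal{P}}$, a reading consistent with the eigenvector-perturbation bound (78) but not confirmed here:

```python
# Hedged recovery sketch: binary measurements -> flipped-projection average
# -> top eigenvector as the (assumed) solution of (PEP).
import numpy as np
rng = np.random.default_rng(4)
n, m = 8, 4000
N = 2 * n

def haar_columns(cols):
    Q, _ = np.linalg.qr(rng.standard_normal((N, cols)))
    return Q

x = haar_columns(1)[:, 0]                  # random normalized input vector
X = np.outer(x, x)

Qhat = np.zeros((N, N))
for _ in range(m):
    Q = haar_columns(n)
    P = Q @ Q.T                            # uniform rank-n projection on R^(2n)
    closer = x @ P @ x > 0.5               # binary question: is x closer to Ran(P)?
    Qhat += (P if closer else np.eye(N) - P) / m

xhat = np.linalg.eigh(Qhat)[1][:, -1]      # top eigenvector of Qhat (assumed PEP)
Xhat = np.outer(xhat, xhat)
print(np.linalg.norm(Xhat - X, 2))         # recovery accuracy in operator norm
```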
Corollary 3.3.2. Let $\delta > 0$. If $m \ge C\delta^{-2}\,n\log(\delta^{-1}n)$ and $\mathcal{P} = \{P_j\}_{j=1}^m$ is an independent sequence of uniformly distributed projections in $\mathrm{Proj}_{\mathbb{F}}(n, 2n)$, then with probability at least $1 - \exp(-n)$,
$$\|\hat{X} - X\| < \delta \qquad (83)$$
for all $X \in \mathrm{Proj}_{\mathbb{F}}(1, 2n)$, where $\hat{X}$ is the solution to (PEP) with input $\Phi_{\mathcal{P}}(X)$ and $C$ is a universal constant.

Figure 3: Plot of the accuracy of the estimate ($\delta$) against the number of measurement projections ($m$) for the recovery of 15000 random inputs using (PEP) with a fixed collection of measurement projections. The single line separate from the cluster represents the upper bound on $\delta$ given by Theorem 3.3.1.

As in the pointwise case, we can modify the proof of Theorem 3.3.1 to show that uniform recovery using (PEP) is robust to bit-flip errors occurring in a faulty measurement $\tilde{\Phi}_{\mathcal{P}}$.
Corollary 3.3.3. Let $\delta$, $m$, and $\{P_j\}_{j=1}^m$ be as in Theorem 3.3.1, and additionally let $0 < \tau < 1$. Then with probability at least $1 - \exp(-D)$, for all $X \in \mathrm{Proj}_{\mathbb{F}}(1, 2n)$ and all $\tilde{\Phi}_{\mathcal{P}}(X) \in \{0,1\}^m$ with
$$d_H\big(\Phi_{\mathcal{P}}(X), \tilde{\Phi}_{\mathcal{P}}(X)\big) \le \tau \qquad (84)$$
we have
$$\|\tilde{X} - X\| \le \delta + 2(\mu_1 - \mu_2)^{-1}\tau, \qquad (85)$$
where $\tilde{X}$ is the solution to (PEP) with input $\tilde{\Phi}_{\mathcal{P}}(X)$ and $\mu_1 - \mu_2$ is bounded by Lemma 2.3.2.

Proof. Let $\tilde{Q}_{\mathcal{P}}(X)$ denote the empirical average of the (faulty) flipped projections, i.e. flipped using $\tilde{\Phi}_{\mathcal{P}}(X)$ rather than $\Phi_{\mathcal{P}}(X)$. As before, for all $X \in \mathrm{Proj}_{\mathbb{F}}(1, 2n)$ we have
$$\|\tilde{X} - X\| \le 2(\mu_1 - \mu_2)^{-1}\,\|\tilde{Q}_{\mathcal{P}}(X) - Q(X)\|. \qquad (86)$$
Using the triangle inequality, we expand
$$\|\tilde{Q}_{\mathcal{P}}(X) - Q(X)\| \le \|\tilde{Q}_{\mathcal{P}}(X) - \hat{Q}_{\mathcal{P}}(X)\| + \|\hat{Q}_{\mathcal{P}}(X) - Q(X)\| \le \tau + \|\hat{Q}_{\mathcal{P}}(X) - Q(X)\|. \qquad (87)$$
Bounding $\|\hat{Q}_{\mathcal{P}}(X) - Q(X)\|$ with high probability proceeds exactly as in Theorem 3.3.1.
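Under the same assumptions as the recovery sketch above, the bit-flip robustness can be observed directly: flipping each bit independently with probability $\tau$ keeps the normalized Hamming distance near $\tau$, and the extra error in (85) grows at most linearly in $\tau$ (illustration only):

```python
# Hedged bit-flip experiment for Corollary 3.3.3 (same assumed conventions
# as the recovery sketch: flipped projections, top eigenvector as (PEP)).
import numpy as np
rng = np.random.default_rng(5)
n, m, tau = 8, 4000, 0.05
N = 2 * n

Q0, _ = np.linalg.qr(rng.standard_normal((N, 1)))
x = Q0[:, 0]
projs, bits = [], []
for _ in range(m):
    Q, _ = np.linalg.qr(rng.standard_normal((N, n)))
    P = Q @ Q.T
    projs.append(P)
    bits.append(x @ P @ x > 0.5)
bits = np.array(bits)
flip = rng.random(m) < tau                 # faulty channel: flip bits w.p. tau
faulty = bits ^ flip

Qhat = sum((P if b else np.eye(N) - P) for P, b in zip(projs, faulty)) / m
xtil = np.linalg.eigh(Qhat)[1][:, -1]      # top eigenvector (assumed PEP)
print(np.linalg.norm(np.outer(xtil, xtil) - np.outer(x, x), 2))
```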
References

[1] Albert Ai, Alex Lapanowski, Yaniv Plan, and Roman Vershynin. One-bit compressed sensing with non-Gaussian measurements. Linear Algebra Appl., 441:222–239, 2014.
[2] Edwin J. Akutowicz. On the determination of the phase of a Fourier integral. I. Trans. Amer. Math. Soc., 83:179–192, 1956.
[3] Edwin J. Akutowicz. On the determination of the phase of a Fourier integral. II. Proc. Amer. Math. Soc., 8:234–238, 1957.
[4] Noga Alon and Joel H. Spencer. The Probabilistic Method. Wiley Publishing, 4th edition, 2016.
[5] Greg W. Anderson, Alice Guionnet, and Ofer Zeitouni. An Introduction to Random Matrices, volume 118 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 2010.
[6] Christine Bachoc and Martin Ehler. Signal reconstruction from the magnitude of subspace components. IEEE Trans. Inform. Theory, 61(7):4015–4027, 2015.
[7] Radu Balan, Pete Casazza, and Dan Edidin. On signal reconstruction without phase. Appl. Comput. Harmon. Anal., 20(3):345–356, 2006.
[8] Afonso S. Bandeira, Jameson Cahill, Dustin G. Mixon, and Aaron A. Nelson. Saving phase: injectivity and stability for phase retrieval. Appl. Comput. Harmon. Anal., 37(1):106–125, 2014.
[9] Richard Baraniuk, Mark Davenport, Ronald DeVore, and Michael Wakin. A simple proof of the restricted isometry property for random matrices. Constructive Approximation, 28(3):253–263, December 2008.
[10] F. Bloch. Nuclear induction. Phys. Rev., 70:460–474, October 1946.
[11] Petros T. Boufounos and Richard G. Baraniuk. 1-bit compressive sensing. In 42nd Annual Conference on Information Sciences and Systems (CISS), pages 16–21, March 2008.
[12] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[13] Jameson Cahill, Peter G. Casazza, Jesse Peterson, and Lindsey Woodland. Phase retrieval by projections. Houston J. Math., 42(2):537–558, 2016.
[14] Emmanuel J. Candès, Yonina C. Eldar, Thomas Strohmer, and Vladislav Voroninski. Phase retrieval via matrix completion. SIAM J. Imaging Sci., 6(1):199–225, 2013.
[15] Emmanuel J. Candès, Thomas Strohmer, and Vladislav Voroninski. PhaseLift: exact and stable signal recovery from magnitude measurements via convex programming. Comm. Pure Appl. Math., 66(8):1241–1274, 2013.
[16] Peter G. Casazza and Lindsey M. Woodland. Phase retrieval by vectors and projections. In Operator Methods in Wavelets, Tilings, and Frames, volume 626 of Contemp. Math., pages 1–17. Amer. Math. Soc., Providence, RI, 2014.
[17] Aldo Conca, Dan Edidin, Milena Hering, and Cynthia Vinzant. An algebraic characterization of injectivity in phase retrieval. Appl. Comput. Harmon. Anal., 38(2):346–356, 2015.
[18] James R. Fienup. Reconstruction of an object from the modulus of its Fourier transform. Opt. Lett., 3(1):27–29, July 1978.
[19] David Gross, Yi-Kai Liu, Steven T. Flammia, Stephen Becker, and Jens Eisert. Quantum state tomography via compressed sensing. Physical Review Letters, 105(15):150401, 2010.
[20] Madalin Guta, Jonas Kahn, Richard Kueng, and Joel A. Tropp. Fast state tomography with optimal error bounds. arXiv preprint arXiv:1809.11162, 2018.
[21] Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, New York, 2nd edition, 2013.
[22] L. Jacques, J. N. Laska, P. T. Boufounos, and R. G. Baraniuk. Robust 1-bit compressive sensing via binary stable embeddings of sparse vectors. IEEE Transactions on Information Theory, 59(4):2082–2102, April 2013.
[23] Rick P. Millane. Phase retrieval in crystallography and optics. J. Opt. Soc. Am. A, 7(3):394–411, March 1990.
[24] Youssef Mroueh and Lorenzo Rosasco. On efficiency and low sample complexity in phase retrieval. In 2014 IEEE International Symposium on Information Theory (ISIT), pages 931–935, June 2014.
[25] Beresford N. Parlett. The Symmetric Eigenvalue Problem, volume 20 of Classics in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1998. Corrected reprint of the 1980 original.
[26] Yaniv Plan and Roman Vershynin. One-bit compressed sensing by linear programming. Comm. Pure Appl. Math., 66(8):1275–1297, 2013.
[27] Yaniv Plan and Roman Vershynin. Robust 1-bit compressed sensing and sparse logistic regression: a convex programming approach. IEEE Trans. Inform. Theory, 59(1):482–494, 2013.
[28] Yaniv Plan and Roman Vershynin. Dimension reduction by random hyperplane tessellations. Discrete Comput. Geom., 51(2):438–461, 2014.
[29] Herbert Robbins. A remark on Stirling's formula. Amer. Math. Monthly, 62:26–29, 1955.
[30] A. J. Scott. Tight informationally complete quantum measurements. Journal of Physics A: Mathematical and General, 39(43):13507–13530, October 2006.
[31] Joel A. Tropp. An introduction to matrix concentration inequalities.