Corruption Robust Phase Retrieval via Linear Programming
Paul Hand ∗ and Vladislav Voroninski † December 13, 2016
Abstract
We consider the problem of phase retrieval from corrupted magnitude observations. In particular, we show that a fixed $x_0 \in \mathbb{R}^n$ can be recovered exactly from corrupted magnitude measurements $|\langle a_i, x_0 \rangle| + \eta_i$, $i = 1, \ldots, m$, with high probability for $m = O(n)$, where the $a_i \in \mathbb{R}^n$ are i.i.d. standard Gaussian and $\eta \in \mathbb{R}^m$ has fixed sparse support and is otherwise arbitrary, by using a version of the PhaseMax algorithm augmented with slack variables subject to a penalty. This linear programming formulation, which we call RobustPhaseMax, operates in the natural parameter space, and our proofs rely on a direct analysis of the optimality conditions using concentration inequalities.

Introduction

Recovering signals from corrupted data is a prevalent problem in modern applied science. For example, data corruption may be due to sensor failure, adversarial tampering as in cybersecurity, or gross errors that are inherent to necessary preprocessing, as in the structure from motion problem in computer vision. Appropriately handling such corrupted data is a challenging aspect of algorithm design, and falls, broadly speaking, under the umbrella of robust statistics [15]. A prominent example is that of principal component analysis, where standard solutions based on singular value decompositions are optimal under Gaussian noise but can be arbitrarily skewed by a single corrupted data point. Much work has focused on formulating robust versions of PCA, culminating in breakthrough results by Candes et al. [5], which exploit concepts from compressed sensing and matrix completion to formulate a convex program that operates in the space of matrices to decouple low rank and sparse structures, the latter of which model corruptions. In addition to its favorable theoretical properties, the robust PCA framework has found much empirical success in a wide variety of applications.
There has also been work on corruption and erasure robustness in other linear settings such as compressed sensing and matrix completion [17, 20, 8]. Beyond the linear model setting, there are many situations in which signal recovery must be performed from corrupted nonlinear observations. One such scenario is phase retrieval, in which corrupted data is a natural byproduct of some of its most promising applications, such as X-ray Free Electron Laser high energy imaging. In this imaging scenario, a stream of identical particles is blasted by very high intensity lasers, yielding a multitude of highly noisy diffraction images of random rotations of the particle [2, 9]. Thus, the task at hand is recovering information from redundant yet highly corrupted data. We introduce here a linear program for phase retrieval in the natural parameter space called RobustPhaseMax, which we show recovers real vectors exactly with high probability from $O(n)$ random Gaussian magnitude measurements, $\Omega(n)$ of which are arbitrarily corrupted. The RobustPhaseMax formulation is highly amenable to analysis, and our proofs rely on a simple characterization of its optimality conditions and standard concentration inequalities.

The first theoretical guarantees for stable and efficient phase retrieval at optimal sample complexity were for the PhaseLift algorithm, which is a convex program in a lifted space of matrices [4, 7]. Subsequently, an L1-aware version of PhaseLift was shown to be robust to noise in [4], and to an $O(1)$ fraction of arbitrary outliers, in addition to noise, in [11].

∗ Department of Computational and Applied Mathematics, Rice University, TX.
† Helm.ai, CA.
While having favorable theoretical properties, the PhaseLift framework is not computationally tractable in high dimensional regimes due to the squaring of the natural dimensionality of the problem at hand.

Meanwhile, lifting to higher dimensional spaces is not necessary for successful algorithm design for non-linear inverse problems, even for those that require robustness to outliers. For instance, nonlinear corruption-robust recovery in the natural parameter space was recently achieved in computer vision by the empirical success of the LUD algorithm for location recovery from corrupted relative directions [18]. It was shown by [12] that an alternative convex program, called ShapeFit, recovers locations exactly from corrupted relative directions under broad conditions on the locations and the graph of relative direction observations. This location recovery problem can be interpreted as a robust PCA problem under projective ambiguity in each observation. That is, if we observed the pairwise distances as well as the relative directions, the problem would be an instance of a structured rank-2 robust PCA problem. Remarkably, instead of lifting to a space of matrices, ShapeFit operates in the natural parameter space and still achieves the goals of robust PCA, even while subject to projective ambiguity in each observation and adversarial corruptions.

Indeed, corruption-robust recovery has also been achieved for phase retrieval in the natural parameter space in [16], by modifying the Wirtinger Flow framework of Candes et al. [6, 3]. Unlike ShapeFit, which consists of a single convex program, Wirtinger Flow approaches rely on a spectral initialization, followed by a non-convex optimization program.

Recently, a linear programming formulation for phase retrieval, called PhaseMax, was independently discovered by [10, 1]. When provided with an anchor vector that correlates with the desired signal, PhaseMax has been shown to recover signals from magnitude measurements with high probability.
Several recovery theorems exist for PhaseMax, some involving tools from statistical learning theory [1], geometric probability [10], and elementary probabilistic concentration arguments [14]. A sparsity-aware version called SparsePhaseMax recovers sparse signals at optimal sample complexity when properly initialized [13]. While noise stability for PhaseMax has been established [10, 1], the assumptions on the noise are in the infinity norm, which are not realistic for Gaussian or even Poisson noise. At first glance, it is not clear whether the PhaseMax framework can accommodate more realistic noise models, due to the structure of its feasible set. Namely, the feasible set of PhaseMax can shrink very rapidly into a subset of a small ball around the origin when $O(n)$ measurements significantly underestimate their true value, as may occur in the regime of highly redundant yet corrupted data. In this paper, we introduce an outlier-tolerant algorithm called RobustPhaseMax. It consists of augmenting the PhaseMax linear program with non-negative slack variables that are subject to a penalty on their sum.

One way to get an anchor vector for RobustPhaseMax is through the initializer provided by the Median-Truncated Wirtinger Flow work of [16], which succeeds with high probability under $m = O(n)$ Gaussian measurements and a constant fraction of corruptions. Thus, our results, together with the initializer of [16], guarantee exact phase retrieval from $O(n)$ corrupted measurements. This is the first result of its kind for corruption-robust phase retrieval in two respects. Firstly, RobustPhaseMax is a convex program in the natural parameter space. Secondly, the best previously published result that operates in the natural parameter space is that of [16], which relies on $O(n \log n)$ measurements due to the post-initialization step of the Median-Truncated Wirtinger Flow pipeline.
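The construction just described, PhaseMax augmented with penalized slack variables, can be prototyped in a few lines. The sketch below is our illustration, not the authors' code: it simulates Gaussian magnitude measurements with a sparse corruption, uses a noisy copy of the signal as the anchor, sets the penalty weight to $6\|\phi\|/m$ (a choice of ours within the $\Theta(\|x_0\|/m)$ scaling; in practice $\|x_0\|$ is unknown and is approximated by $\|\phi\|$), and solves the resulting linear program with `scipy.optimize.linprog`.

```python
# Prototype of a PhaseMax-style LP with penalized slack variables
# (an illustrative sketch, not the authors' implementation).
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m, k = 20, 400, 20              # dimension, measurements, number of corruptions

x0 = rng.standard_normal(n)        # signal to recover
A = rng.standard_normal((m, n))    # rows are the Gaussian vectors a_i
eta = np.zeros(m)
eta[rng.choice(m, k, replace=False)] = 10 * rng.standard_normal(k)
b = np.abs(A @ x0) + eta           # corrupted magnitude measurements

phi = x0 + 0.1 * rng.standard_normal(n)    # anchor, assumed close to x0
lam = 6 * np.linalg.norm(phi) / m          # penalty weight (our illustrative choice)

# Variables z = (x, e).  Maximize <phi, x> - lam * sum(e)
# subject to  -b_i - e_i <= <a_i, x> <= b_i + e_i  and  e_i >= 0.
c = np.concatenate([-phi, lam * np.ones(m)])         # linprog minimizes
A_ub = np.block([[A, -np.eye(m)], [-A, -np.eye(m)]])
b_ub = np.concatenate([b, b])
bounds = [(None, None)] * n + [(0, None)] * m

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
x_hat = res.x[:n]
err = min(np.linalg.norm(x_hat - x0), np.linalg.norm(x_hat + x0))
print(err / np.linalg.norm(x0))    # small relative error when recovery succeeds
```

The sign ambiguity inherent to magnitude measurements is handled by comparing the estimate against both $x_0$ and $-x_0$; with this oversampling ratio and corruption level the recovery error is typically negligible.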
In sum, our work establishes that despite the non-linear nature of magnitude measurements, exact phase retrieval can be performed robustly to corruptions in the natural parameter space via linear programming, at optimal sample complexity.

Let $x_0 \in \mathbb{R}^n$. Let $a_i \sim N(0, I_{n \times n})$ for $i = 1 \ldots m$. Let $b_i = |\langle a_i, x_0 \rangle| + \eta_i$, where $\eta \in \mathbb{R}^m$ is a sparse vector of corruptions of arbitrary magnitude and sign. If an anchor vector $\phi$ is known to approximate the signal $x_0$, then we consider recovering $x_0$ by the following linear program, called RobustPhaseMax:

$$\max_{x \in \mathbb{R}^n,\; e \in \mathbb{R}^m} \; \langle \phi, x \rangle - \lambda \langle \mathbf{1}, e \rangle \quad \text{s.t.} \quad -b_i - e_i \le \langle a_i, x \rangle \le b_i + e_i, \quad e_i \ge 0, \quad i = 1 \ldots m. \qquad (1)$$

The variables $\{e_i\}_{i=1}^m$ are slack variables, $\mathbf{1}$ represents the $m$-vector of all ones, and $\lambda > 0$. We show that this linear program recovers $x_0$ exactly with high probability from $O(n)$ Gaussian measurements, of which an adversarially chosen $O(1)$-fraction are subject to arbitrary corruptions, provided that $\phi$ is sufficiently accurate and $\lambda = \Theta(\|x_0\|/m)$.

Theorem 1. For any $M \ge 1$, there exist positive constants $c, \gamma, \delta$ such that the following holds. Let $x_0 \in \mathbb{R}^n$. Let $\phi \in \mathbb{R}^n$ be such that $\|\phi - x_0\| < 0.4 \|x_0\|$. Let $a_i \sim N(0, I_{n \times n})$ be independent for $i = 1 \ldots m$. If $m \ge c n$ and $\lambda \in \bigl[\, 12 \|x_0\|/m, \; 12 M \|x_0\|/m \,\bigr]$, then with probability at least $1 - m e^{-\gamma m}$, it holds that, simultaneously for all $\eta \in \mathbb{R}^m$ such that $|\operatorname{supp}(\eta)| \le \delta m$, the point $(x_0, -\eta_-)$ is the unique maximizer of (1) for $b_i = |\langle a_i, x_0 \rangle| + \eta_i$, $i = 1 \ldots m$. Here, $\eta_- = \min(\eta, 0)$.

We note that when recovery occurs under the conditions of this theorem, the slack variables $e$ act to correct only the measurements that are below their true value. Finally, RobustPhaseMax is equivalent to another natural formulation for corruption-robust phase retrieval, as follows.

Proposition 2.
Let $\phi \in \mathbb{R}^n$, $\lambda \in \mathbb{R}$ with $\lambda > 0$, $a_i \in \mathbb{R}^n$ for $i = 1 \ldots m$, and $b \in \mathbb{R}^m$. Then the following are equivalent:

$$(\tilde{x}, \tilde{e}) \in \operatorname*{argmax}_{x \in \mathbb{R}^n,\, e \in \mathbb{R}^m} \; \langle \phi, x \rangle - \lambda \|e\|_1 \quad \text{subject to} \quad |\langle a_i, x \rangle| \le b_i + e_i, \quad i = 1 \ldots m, \qquad (2)$$

$$(\tilde{x}, \tilde{e}) \in \operatorname*{argmax}_{x \in \mathbb{R}^n,\, e \in \mathbb{R}^m} \; \langle \phi, x \rangle - \lambda \langle \mathbf{1}, e \rangle \quad \text{subject to} \quad |\langle a_i, x \rangle| \le b_i + e_i, \quad e_i \ge 0, \quad i = 1 \ldots m. \qquad (3)$$

This proposition follows from the fact that if $(\tilde{x}, \tilde{e})$ is feasible for the program in (2), then $(\tilde{x}, \max(0, \tilde{e}))$ is also feasible and has a strictly greater objective if $\tilde{e}$ has any negative entries. Thus, any $(\tilde{x}, \tilde{e})$ that satisfies (2) is such that $\tilde{e} \ge 0$.

Proofs
Proof of Theorem 1. Let $\Omega_+ = \{ i \mid \eta_i > 0 \}$, $\Omega_- = \{ i \mid \eta_i < 0 \}$, $\Omega = \Omega_+ \cup \Omega_-$, and write $\eta = \eta_+ + \eta_-$ with $\eta_+ = \max(\eta, 0)$ and $\eta_- = \min(\eta, 0)$. Write an arbitrary feasible point of (1) as $(x_0 + h, -\eta_- + g)$. To show that $(x_0, -\eta_-)$ is the unique solution, it suffices to show that for any feasible perturbation $(h, g) \ne (0, 0)$,

$$\langle \phi, h \rangle - \lambda \langle \mathbf{1}, g \rangle < 0. \qquad (4)$$

Because of the feasibility of the perturbation $(h, g)$,

$$\operatorname{sgn}(\langle a_i, x_0 \rangle) \langle a_i, x_0 + h \rangle \le |\langle a_i, x_0 \rangle| + \eta_i^+ + \eta_i^- - \eta_i^- + g_i = |\langle a_i, x_0 \rangle| + \eta_i^+ + g_i,$$

which implies

$$\operatorname{sgn}(\langle a_i, x_0 \rangle) \langle a_i, h \rangle \le g_i \quad \text{for all } i \in \Omega^c \cup \Omega_-. \qquad (5)$$

Define $S := \{ i \mid \langle a_i, x_0 \rangle \langle a_i, h \rangle > 0 \}$. We will need the following special case of (5):

$$0 < |\langle a_i, h \rangle| \le g_i \quad \text{for all } i \in S \cap (\Omega^c \cup \Omega_-). \qquad (6)$$

By the feasibility condition $e_i \ge 0$, $i = 1 \ldots m$, we also have

$$g_i \ge 0 \quad \text{for all } i \in \Omega^c \cup \Omega_+. \qquad (7)$$

We begin by establishing (4) if $h = 0$ and $g \ne 0$. In this case, (5) and (7) give $g \ge 0$, and hence $-\lambda \langle \mathbf{1}, g \rangle = -\lambda \|g\|_1 < 0$ since $g \ne 0$. In the remainder of this proof, we assume $h \ne 0$ and establish (4). Write

$$\langle \phi, h \rangle - \lambda \langle \mathbf{1}, g \rangle = \langle x_0, h \rangle - \lambda \sum_{i \in \Omega^c} g_i - \lambda \sum_{i \in \Omega_-} g_i - \lambda \sum_{i \in \Omega_+} g_i + \langle \phi - x_0, h \rangle \le \underbrace{\langle x_0, h \rangle - \lambda \sum_{i \in \Omega^c} g_i}_{I} \; \underbrace{- \lambda \sum_{i \in \Omega_-} g_i}_{II} \; + \langle \phi - x_0, h \rangle, \qquad (8)$$

where the inequality follows by (7).

First, we bound term $I$. Let $1_i = 1_{|\langle a_i, x_0 \rangle| \le 3 \|x_0\|}$ be the indicator of the event on which $|\langle a_i, x_0 \rangle| \le 3 \|x_0\|$. By Lemma 4, there exist positive $\delta, c_4, \gamma_4$ such that if $|\Omega| \le \delta m$ and if $m \ge c_4 n$,

$$\langle x_0, h \rangle = \left\langle \frac{1}{|\Omega^c|} \sum_{i \in \Omega^c} 1_i \, a_i a_i^\top, \; h x_0^\top \right\rangle - \left\langle \frac{1}{|\Omega^c|} \sum_{i \in \Omega^c} 1_i \, a_i a_i^\top - I_{n \times n}, \; h x_0^\top \right\rangle \le \frac{1}{|\Omega^c|} \sum_{i \in \Omega^c} 1_i \, \langle a_i, x_0 \rangle \langle a_i, h \rangle + 0.1 \|x_0\| \|h\|, \qquad (9)$$

on an event with probability at least $1 - m e^{-\gamma_4 m}$. We write

$$I = \langle x_0, h \rangle - \lambda \sum_{i \in \Omega^c} g_i \le \frac{1}{|\Omega^c|} \sum_{i \in \Omega^c} 1_i \langle a_i, x_0 \rangle \langle a_i, h \rangle - \lambda \sum_{i \in \Omega^c} g_i + 0.1 \|x_0\| \|h\|$$
$$= \frac{1}{|\Omega^c|} \sum_{i \in \Omega^c \cap S^c} 1_i \langle a_i, x_0 \rangle \langle a_i, h \rangle + \frac{1}{|\Omega^c|} \sum_{i \in \Omega^c \cap S} 1_i \langle a_i, x_0 \rangle \langle a_i, h \rangle - \lambda \sum_{i \in \Omega^c} g_i + 0.1 \|x_0\| \|h\|$$
$$\le -\frac{1}{|\Omega^c|} \sum_{i \in \Omega^c \cap S^c} 1_i |\langle a_i, x_0 \rangle| |\langle a_i, h \rangle| + \frac{1}{|\Omega^c|} \sum_{i \in \Omega^c \cap S} 1_i \langle a_i, x_0 \rangle \langle a_i, h \rangle - \lambda \sum_{i \in \Omega^c \cap S} g_i + 0.1 \|x_0\| \|h\|$$
$$\le -\frac{1}{|\Omega^c|} \sum_{i \in \Omega^c \cap S^c} 1_i |\langle a_i, x_0 \rangle| |\langle a_i, h \rangle| + \frac{1}{|\Omega^c|} \sum_{i \in \Omega^c \cap S} 1_i |\langle a_i, x_0 \rangle| \, g_i - \lambda \sum_{i \in \Omega^c \cap S} g_i + 0.1 \|x_0\| \|h\|$$
$$\le -\frac{1}{|\Omega^c|} \sum_{i \in \Omega^c \cap S^c} 1_i |\langle a_i, x_0 \rangle| |\langle a_i, h \rangle| - \frac{1}{|\Omega^c|} \sum_{i \in \Omega^c \cap S} 1_i |\langle a_i, x_0 \rangle| \, g_i + 0.1 \|x_0\| \|h\|$$
$$\le -\frac{1}{|\Omega^c|} \sum_{i \in \Omega^c \cap S^c} 1_i |\langle a_i, x_0 \rangle| |\langle a_i, h \rangle| - \frac{1}{|\Omega^c|} \sum_{i \in \Omega^c \cap S} 1_i |\langle a_i, x_0 \rangle| |\langle a_i, h \rangle| + 0.1 \|x_0\| \|h\|$$
$$= -\frac{1}{|\Omega^c|} \sum_{i \in \Omega^c} 1_i |\langle a_i, x_0 \rangle| |\langle a_i, h \rangle| + 0.1 \|x_0\| \|h\|, \qquad (10)$$

where the second line follows from (9); the third line uses the fact that $\langle a_i, x_0 \rangle \langle a_i, h \rangle \le 0$ on $S^c$ and the fact that $g_i \ge 0$ for $i \in \Omega^c$; the fourth line uses the feasibility condition (5); the fifth line uses the fact that $g_i \ge 0$ for $i \in \Omega^c$, the assumption $\lambda \ge 12 m^{-1} \|x_0\|$, which implies $\lambda \ge 6 |\Omega^c|^{-1} \|x_0\|$ for $\delta < 1/2$, and the fact that $1_i \, |\langle a_i, x_0 \rangle| \le 3 \cdot 1_i \cdot \|x_0\|$; and the sixth line uses the feasibility condition (6). By Lemma 8, if $m \ge c_8 n$ for some $c_8$, then on an event with probability at least $1 - m e^{-\gamma_8 m}$,

$$\frac{1}{|\Omega^c|} \sum_{i \in \Omega^c} 1_i |\langle a_i, x_0 \rangle| |\langle a_i, h \rangle| \ge 0.51 \, \|x_0\| \|h\|.$$

By (10), we thus have

$$I \le -0.41 \, \|x_0\| \|h\|. \qquad (11)$$

Second, we now bound term $II$. We have

$$II = -\lambda \sum_{i \in \Omega_-} g_i \le -\lambda \sum_{i \in \Omega_-} \operatorname{sgn}(\langle a_i, x_0 \rangle) \langle a_i, h \rangle \le \lambda \sum_{i \in \Omega_-} |\langle a_i, h \rangle|,$$

where the first inequality follows by (5). By Lemma 6, if $m \ge 2n/\delta$ and if $|\Omega_-| \le \delta m$, then on an event with probability at least $1 - e^{-\delta m}$,

$$\sum_{i \in \Omega_-} |\langle a_i, h \rangle| \le \left( \sqrt{4 + 2 \log(1/\delta)} + 2 \right) \delta m \, \|h\|.$$

Using $\lambda \le 12 M m^{-1} \|x_0\|$, and choosing $\delta$ small enough so that $12 M \bigl( \sqrt{4 + 2 \log(1/\delta)} + 2 \bigr) \delta \le 0.01$, we get

$$II \le 12 M m^{-1} \|x_0\| \cdot \left( \sqrt{4 + 2 \log(1/\delta)} + 2 \right) \delta m \, \|h\| \le 0.01 \, \|x_0\| \|h\|. \qquad (12)$$

Finally, we conclude from (8), (11), (12), and the assumption that $\|\phi - x_0\| < 0.4 \|x_0\|$ that

$$\langle \phi, h \rangle - \lambda \langle \mathbf{1}, g \rangle \le -0.41 \, \|x_0\| \|h\| + 0.01 \, \|x_0\| \|h\| + \|\phi - x_0\| \|h\| < 0.$$

The estimates above hold simultaneously on the intersection of the events from Lemmas 4, 6, and 8, which has probability at least $1 - m e^{-\gamma m}$ for appropriate $c$ and $\gamma$, completing the proof.

We now prove several technical lemmas used in the proof above. The first two lemmas concern probabilistic concentration of the eigenvalues of appropriately truncated averages of $a_i a_i^\top$ for $i$ belonging to an arbitrarily selected $\Omega^c$ of appropriate size.

Lemma 3.
There exist positive constants $c_3, \gamma_3$ such that the following holds. Fix $x_0 \in \mathbb{R}^n$. Let $a_i \sim N(0, I_{n \times n})$, $i = 1 \ldots m$, be independent. If $m \ge c_3 n$, then with probability at least $1 - e^{-\gamma_3 m}$,

$$\left\| \frac{1}{m} \sum_{i=1}^m 1_{|\langle a_i, x_0 \rangle| \le 3 \|x_0\|} \, a_i a_i^\top - I_{n \times n} \right\| \le 0.1.$$

Proof. If $x_0 = 0$, then the lemma follows directly from standard concentration estimates for Gaussian matrices, such as Corollary 5.35 in [19]. Consider $x_0 \ne 0$. Without loss of generality, let $x_0 = e_1$. Let

$$L = \begin{pmatrix} \alpha & \\ & \beta \cdot I_{n-1 \times n-1} \end{pmatrix} \quad \text{and} \quad L^{1/2} = \begin{pmatrix} \alpha^{1/2} & \\ & \beta^{1/2} \cdot I_{n-1 \times n-1} \end{pmatrix},$$

where $\alpha = \mathbb{E}[z^2 1_{|z| \le 3}] \approx 0.97$ and $\beta = \mathbb{P}[|z| \le 3] \approx 0.997$ for $z \sim N(0, 1)$. Let $\zeta_i = 1_{|\langle a_i, x_0 \rangle| \le 3 \|x_0\|} L^{-1/2} a_i$. We have

$$\left\| \frac{1}{m} \sum_{i=1}^m 1_{|\langle a_i, x_0 \rangle| \le 3 \|x_0\|} \, a_i a_i^\top - I_{n \times n} \right\| \le \left\| L^{1/2} \left( \frac{1}{m} \sum_{i=1}^m \zeta_i \zeta_i^\top - I \right) L^{1/2} \right\| + \| L - I \| \le \left\| \frac{1}{m} \sum_{i=1}^m \zeta_i \zeta_i^\top - I \right\| \, \| L^{1/2} \|^2 + \| L - I \| \le \left\| \frac{1}{m} \sum_{i=1}^m \zeta_i \zeta_i^\top - I \right\| + 0.03, \qquad (13)$$

where the third inequality follows from $\| L^{1/2} \| \le 1$ and $\| L - I \| = 1 - \alpha < 0.03$. Note that $\mathbb{E}[\zeta_i \zeta_i^\top] = I_{n \times n}$, and let $A$ be the $m \times n$ matrix whose $i$th row is given by $\zeta_i^\top$. By Theorem 5.39 in [19], there exist positive $C, \gamma$ such that for all $t \ge 0$, with probability at least $1 - e^{-\gamma t^2}$,

$$1 - C \sqrt{\frac{n}{m}} - \frac{t}{\sqrt{m}} \le \sigma_{\min}\!\left( \frac{A}{\sqrt{m}} \right) \le \sigma_{\max}\!\left( \frac{A}{\sqrt{m}} \right) \le 1 + C \sqrt{\frac{n}{m}} + \frac{t}{\sqrt{m}}.$$

Setting $\delta = C \sqrt{n/m} + t/\sqrt{m}$, we have by Lemma 5.36 in [19] that $\| \frac{1}{m} A^\top A - I \| \le 3 \max(\delta, \delta^2)$ with probability at least $1 - e^{-\gamma t^2}$. Let $\varepsilon = 0.07/3$. If $m \ge 4 C^2 \varepsilon^{-2} n$ and $t = \varepsilon \sqrt{m}/2$, we have $\delta \le \varepsilon$. Thus,

$$\left\| \frac{1}{m} \sum_{i=1}^m \zeta_i \zeta_i^\top - I \right\| = \left\| \frac{1}{m} A^\top A - I \right\| \le 3 \varepsilon = 0.07, \qquad (14)$$

with probability at least $1 - e^{-\gamma \varepsilon^2 m / 4}$. The result follows by combining (13) and (14).

Lemma 4.
Fix $x_0 \in \mathbb{R}^n$ and let $a_i \sim N(0, I_{n \times n})$ for $i = 1 \ldots m$ be independent. There exist universal constants $\delta, c_4, \gamma_4 > 0$ such that if $m \ge c_4 n$, then on an event of probability at least $1 - m e^{-\gamma_4 m}$, the following holds simultaneously for all $\Omega \subset [m]$ such that $|\Omega| \le \delta m$:

$$\left\| \frac{1}{|\Omega^c|} \sum_{i \in \Omega^c} 1_{|\langle a_i, x_0 \rangle| \le 3 \|x_0\|} \, a_i a_i^\top - I_{n \times n} \right\| \le 0.1. \qquad (15)$$

Proof. Let $c_3, \gamma_3$ be given by Lemma 3. For $\delta < 1/2$, the assumption $m \ge 2 c_3 n$ implies $|\Omega^c| \ge m/2 \ge c_3 n$. By Lemma 3, for any fixed $\Omega$ with $|\Omega^c| \ge c_3 n$,

$$\left\| \frac{1}{|\Omega^c|} \sum_{i \in \Omega^c} 1_{|\langle a_i, x_0 \rangle| \le 3 \|x_0\|} \, a_i a_i^\top - I_{n \times n} \right\| \le 0.1$$

with probability at least $1 - e^{-\gamma_3 |\Omega^c|} \ge 1 - e^{-\gamma_3 m / 2}$. The number of subsets of $[m]$ of size $\lfloor \delta m \rfloor$ is

$$\binom{m}{\lfloor \delta m \rfloor} \le \left( \frac{e m}{\delta m} \right)^{\delta m} = \left[ \left( \frac{e}{\delta} \right)^{\delta} \right]^m.$$

As $\lim_{\delta \to 0} (e/\delta)^{\delta} = 1$, for sufficiently small $\delta$ we have $(e/\delta)^{\delta} \le e^{\gamma_3 / 4}$. Thus, by a union bound, the desired estimate holds simultaneously for all $|\Omega| \le \delta m$ with probability at least

$$1 - \sum_{k=1}^{\lfloor \delta m \rfloor} \binom{m}{k} e^{-\gamma_3 m / 2} \ge 1 - \lfloor \delta m \rfloor \binom{m}{\lfloor \delta m \rfloor} e^{-\gamma_3 m / 2} \ge 1 - m e^{-\gamma_3 m / 4},$$

so the lemma holds with $c_4 = 2 c_3$ and $\gamma_4 = \gamma_3 / 4$.

The next two lemmas establish probabilistic concentration estimates on the quantity $\sum_{i \in \Omega} |\langle a_i, h \rangle| = \| A_{\Omega} h \|_1$ for arbitrarily selected small subsets $\Omega$, where $A$ is a matrix with i.i.d. standard Gaussian entries.

Lemma 5.
Let $A$ be a $k \times n$ matrix with i.i.d. $N(0,1)$ entries, and fix $C > 0$. If $k \ge n$, then on an event of probability at least $1 - e^{-C^2 k / 2}$, the following holds simultaneously for all $h \in \mathbb{R}^n$:

$$\| A h \|_1 \le (C + 2) \, k \, \| h \|.$$

Proof. By Corollary 5.35 in [19], the spectral norm of $A$ satisfies $\| A \| \le \sqrt{k} + \sqrt{n} + t$ with probability at least $1 - e^{-t^2/2}$. Because $k \ge n$, taking $t = C \sqrt{k}$ gives $\| A \| \le (C + 2) \sqrt{k}$ on an event of probability at least $1 - e^{-C^2 k / 2}$. On this event, we conclude

$$\| A h \|_1 \le \sqrt{k} \, \| A h \|_2 \le \sqrt{k} \, \| A \| \, \| h \| \le (C + 2) \, k \, \| h \|.$$

Lemma 6. Fix $\delta > 0$. Let $A$ be an $m \times n$ matrix with i.i.d. $N(0,1)$ entries. If $m \ge 2n/\delta$, then on an event of probability at least $1 - e^{-\delta m}$, the following holds simultaneously for all $\Omega \subset [m]$ with $|\Omega| \le \delta m$ and for all $h \in \mathbb{R}^n$:

$$\| A_{\Omega} h \|_1 \le \left( \sqrt{4 + 2 \log(1/\delta)} + 2 \right) \delta m \, \| h \|.$$

Proof. Without loss of generality, it suffices to consider only $\Omega$ such that $|\Omega| = \lfloor \delta m \rfloor \ge n$. Let $C = \sqrt{4 + 2 \log(1/\delta)}$. By Lemma 5, for any fixed such $\Omega$, with probability at least $1 - e^{-C^2 \lfloor \delta m \rfloor / 2}$ we have

$$\| A_{\Omega} h \|_1 \le (C + 2) \, \delta m \, \| h \|.$$

The result follows by a union bound and by noting that

$$1 - \binom{m}{\lfloor \delta m \rfloor} e^{-C^2 \delta m / 2} \ge 1 - \left( \frac{e m}{\delta m} \right)^{\delta m} e^{-C^2 \delta m / 2} \ge 1 - e^{\delta m (1 + \log(1/\delta))} e^{-C^2 \delta m / 2} = 1 - e^{\delta m (1 + \log(1/\delta) - C^2/2)} = 1 - e^{-\delta m}.$$

The final three lemmas establish probabilistic concentration lower bounds on the average of $1_{|\langle a_i, x_0 \rangle| \le 3 \|x_0\|} \cdot |\langle a_i, x_0 \rangle| |\langle a_i, x \rangle|$ over arbitrarily selected subsets $\Omega^c$.

Lemma 7.
Fix $x_0 \in \mathbb{R}^n$. Let $a_i \sim N(0, I_{n \times n})$ be independent for $i = 1 \ldots m$. There exist constants $c_7, \gamma_7$ such that if $m \ge c_7 n$, then with probability at least $1 - e^{-\gamma_7 m}$,

$$\frac{1}{m} \sum_{i=1}^m 1_{|\langle a_i, x_0 \rangle| \le 3 \|x_0\|} \cdot |\langle a_i, x_0 \rangle| |\langle a_i, x \rangle| \ge 0.51 \, \|x_0\| \|x\| \quad \text{for all } x \in \mathbb{R}^n. \qquad (16)$$

Proof. Without loss of generality, we take $\|x_0\| = \|x\| = 1$. Define $\mathcal{A}_{x_0} : \mathbb{R}^n \to \mathbb{R}^m$ by $\mathcal{A}_{x_0}(x) = ( 1_{|\langle a_i, x_0 \rangle| \le 3} \cdot \langle a_i, x_0 \rangle \langle a_i, x \rangle )_{i=1}^m$. Note that $\mathcal{A}_{x_0}$ is linear and the left hand side of (16) is $\frac{1}{m} \| \mathcal{A}_{x_0}(x) \|_1$.

First, we show that for any fixed unit vector $x$, $\frac{1}{m} \| \mathcal{A}_{x_0}(x) \|_1 \ge 0.55$ with probability at least $1 - e^{-\gamma m}$. To show this, observe that $\frac{1}{m} \| \mathcal{A}_{x_0}(x) \|_1 = \frac{1}{m} \sum_{i=1}^m \xi_i$, where $\xi_i = 1_{|\langle a_i, x_0 \rangle| \le 3} \cdot |\langle a_i, x_0 \rangle| |\langle a_i, x \rangle|$ is a sub-gaussian random variable. Let $K$ be the sub-gaussian norm of $\xi_i - \mathbb{E} \xi_i$. By Lemma 9, $\mathbb{E} \xi_i \ge 0.59$, and hence there is a $c > 0$ such that

$$\mathbb{P} \left( \frac{1}{m} \| \mathcal{A}_{x_0}(x) \|_1 \ge 0.55 \right) \ge 1 - e \cdot e^{-c m / K^2} \ge 1 - e^{-\gamma m},$$

for some $\gamma > 0$. We now extend this estimate to all $x \in \mathbb{R}^n$. Let $N_{\varepsilon}$ be an $\varepsilon$-net of $S^{n-1}$, where $\varepsilon$ will be specified below. By Lemma 5.2 in [19], we may take $|N_{\varepsilon}| \le (3/\varepsilon)^n$. For any $x$, there exists an $\tilde{x} \in N_{\varepsilon}$ such that $\| x - \tilde{x} \| \le \varepsilon$. Because $\mathcal{A}_{x_0}(x) = \mathcal{A}_{x_0}(\tilde{x}) + \mathcal{A}_{x_0}(x - \tilde{x})$, we have

$$\frac{1}{m} \| \mathcal{A}_{x_0}(x) \|_1 \ge \frac{1}{m} \| \mathcal{A}_{x_0}(\tilde{x}) \|_1 - \frac{1}{m} \| \mathcal{A}_{x_0}(x - \tilde{x}) \|_1.$$

On the intersection of the events

$$E_1 = \left\{ \frac{1}{m} \| \mathcal{A}_{x_0}(x) \|_1 \ge 0.55 \text{ for all } x \in N_{\varepsilon} \right\}, \quad E_2 = \left\{ \frac{1}{m} \| \mathcal{A}_{x_0}(h) \|_1 \le 9 \| h \| \text{ for all } h \in \mathbb{R}^n \right\},$$

we have

$$\frac{1}{m} \| \mathcal{A}_{x_0}(x) \|_1 \ge 0.55 - 9 \| x - \tilde{x} \| \ge 0.55 - 9 \varepsilon = 0.51,$$

by choosing $\varepsilon = 0.04/9$. It remains to estimate the probability of $E_1 \cap E_2$. We have by a union bound that

$$\mathbb{P}(E_1) \ge 1 - |N_{\varepsilon}| \cdot e^{-\gamma m} \ge 1 - \left( \frac{3}{\varepsilon} \right)^n e^{-\gamma m}.$$

If $m \ge c_7 n$ for a sufficiently large $c_7$, then $\mathbb{P}(E_1) \ge 1 - e^{-\gamma m / 2}$. To bound $\mathbb{P}(E_2)$, note that

$$\frac{1}{m} \| \mathcal{A}_{x_0}(x) \|_1 = \frac{1}{m} \sum_{i=1}^m 1_{|\langle a_i, x_0 \rangle| \le 3} \cdot |\langle a_i, x_0 \rangle| |\langle a_i, x \rangle| \le \frac{3}{m} \sum_{i=1}^m |\langle a_i, x \rangle|.$$

By Lemma 5 with $C = 1$ and by $m \ge n$, there is an event $E_3$, with $\mathbb{P}(E_3) \ge 1 - e^{-m/2}$, on which $\frac{1}{m} \sum_{i=1}^m |\langle a_i, x \rangle| \le 3 \| x \|$ for all $x \in \mathbb{R}^n$. Notice that $E_3 \subset E_2$, and hence $\mathbb{P}(E_2) \ge 1 - e^{-m/2}$. Thus, $\mathbb{P}(E_1 \cap E_2) \ge 1 - e^{-\gamma_7 m}$ for some constant $\gamma_7$.

Lemma 8.
Fix $x_0 \in \mathbb{R}^n$. Let $a_i \sim N(0, I_{n \times n})$ be independent for $i = 1 \ldots m$. There exist positive constants $c_8, \gamma_8, \delta$ such that if $m \ge c_8 n$, then with probability at least $1 - m e^{-\gamma_8 m}$, the following holds simultaneously for all $\Omega \subset [m]$ such that $|\Omega| \le \delta m$:

$$\frac{1}{|\Omega^c|} \sum_{i \in \Omega^c} 1_{|\langle a_i, x_0 \rangle| \le 3 \|x_0\|} \cdot |\langle a_i, x_0 \rangle| |\langle a_i, x \rangle| \ge 0.51 \, \|x_0\| \|x\| \quad \text{for all } x \in \mathbb{R}^n. \qquad (17)$$

Proof. Let $c_7, \gamma_7$ be the constants from Lemma 7. For any fixed $\Omega$ such that $|\Omega| \le \delta m$, and for any $\delta < 1/2$ with $m \ge 2 c_7 n$, we have (17) with probability at least $1 - e^{-\gamma_7 |\Omega^c|} \ge 1 - e^{-\gamma_7 m / 2}$. By a union bound, and by selecting $\delta$ small enough such that $(e/\delta)^{\delta} \le e^{\gamma_7 / 4}$, the result holds with probability at least

$$1 - \lfloor \delta m \rfloor \binom{m}{\lfloor \delta m \rfloor} e^{-\gamma_7 m / 2} \ge 1 - \delta m \left( \frac{e m}{\delta m} \right)^{\delta m} e^{-\gamma_7 m / 2} \ge 1 - m e^{-\gamma_7 m / 4},$$

so we may take $c_8 = 2 c_7$ and $\gamma_8 = \gamma_7 / 4$.

Lemma 9.
Fix $x_0, x \in \mathbb{R}^n$. If $a \sim N(0, I_{n \times n})$, then

$$\mathbb{E} \left[ 1_{|\langle a, x_0 \rangle| \le 3 \|x_0\|} \cdot |\langle a, x_0 \rangle| |\langle a, x \rangle| \right] \ge 0.59 \, \|x_0\| \|x\|.$$

Proof. Without loss of generality, take $\|x_0\| = \|x\| = 1$. Further, without loss of generality, take $x_0 = e_1$ and $x = \cos\theta \, e_1 + \sin\theta \, e_2$, where $\theta$ is the angle between $x_0$ and $x$. The desired expected value is

$$\mathbb{E} \left[ 1_{|a_1| \le 3} \cdot \left| a_1 ( a_1 \cos\theta + a_2 \sin\theta ) \right| \right] \ge \mathbb{E} \left[ 1_{\sqrt{a_1^2 + a_2^2} \le 3} \cdot \left| a_1 ( a_1 \cos\theta + a_2 \sin\theta ) \right| \right]$$
$$= \frac{1}{2\pi} \int_0^3 r^3 e^{-r^2/2} \, dr \int_0^{2\pi} \left| \cos\varphi \cos(\theta - \varphi) \right| d\varphi = \frac{1}{2\pi} \left( 2 - 11 e^{-9/2} \right) \int_0^{2\pi} \left| \cos\varphi \cos(\theta - \varphi) \right| d\varphi$$
$$= \frac{2 - 11 e^{-9/2}}{4\pi} \int_0^{2\pi} \left| \cos\theta + \cos(2\varphi - \theta) \right| d\varphi = \frac{2 - 11 e^{-9/2}}{4\pi} \int_0^{2\pi} \left| \cos\theta + \cos\tilde{\varphi} \right| d\tilde{\varphi}$$
$$= \frac{2 - 11 e^{-9/2}}{\pi} \left( |\sin\theta| + \sin^{-1}(\cos\theta) \cos\theta \right) \ge \frac{2 - 11 e^{-9/2}}{\pi} \ge 0.59,$$

where the third equality is because $2 \cos\varphi \cos(\theta - \varphi) = \cos\theta + \cos(2\varphi - \theta)$, the fourth equality follows from the substitution $\tilde{\varphi} = 2\varphi - \theta$ and the $2\pi$-periodicity of the integrand, and the final inequality uses the fact that $|\sin\theta| + \sin^{-1}(\cos\theta)\cos\theta \ge 1$ for all $\theta$.

Acknowledgements
PH acknowledges funding by the grant NSF DMS-1464525.
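As a numerical sanity check on the constants appearing in the proofs (this check is our addition, not part of the original analysis), the Monte Carlo sketch below estimates the truncated second moment $\alpha = \mathbb{E}[z^2 1_{|z| \le 3}]$ and the truncation probability $\beta = \mathbb{P}(|z| \le 3)$ from Lemma 3, together with the Lemma 9 expectation at the worst angle $\theta = \pi/2$ (that is, $x_0 = e_1$, $x = e_2$), whose analytic lower bound is $(2 - 11 e^{-9/2})/\pi \approx 0.598$.

```python
# Monte Carlo sanity check (our addition) of constants used in the proofs.
import numpy as np

rng = np.random.default_rng(1)
N = 2_000_000
z = rng.standard_normal(N)

alpha = np.mean(z**2 * (np.abs(z) <= 3))   # E[z^2 1_{|z|<=3}], approx 0.97
beta = np.mean(np.abs(z) <= 3)             # P(|z| <= 3), approx 0.997

# Lemma 9 quantity at the worst angle theta = pi/2 (x0 = e1, x = e2):
a1, a2 = rng.standard_normal(N), rng.standard_normal(N)
val = np.mean((np.abs(a1) <= 3) * np.abs(a1 * a2))

analytic_lb = (2 - 11 * np.exp(-4.5)) / np.pi   # approx 0.598
print(alpha, beta, val, analytic_lb)
```

The estimate `val` exceeds `analytic_lb`, as expected, because the lemma truncates on the cruder event $\sqrt{a_1^2 + a_2^2} \le 3$ rather than $|a_1| \le 3$.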
References

[1] Sohail Bahmani and Justin Romberg. Phase retrieval meets statistical learning theory: A flexible convex relaxation. arXiv preprint arXiv:1610.04210, 2016.
[2] M. J. Bogan et al. Single particle x-ray diffractive imaging. Nano Lett., 8(2):310–316, 2008.
[3] T. Tony Cai, Xiaodong Li, and Zongming Ma. Optimal rates of convergence for noisy sparse phase retrieval via thresholded Wirtinger flow. arXiv preprint arXiv:1506.03382, 2015.
[4] Emmanuel J. Candès and Xiaodong Li. Solving quadratic equations via PhaseLift when there are about as many equations as unknowns. Found. Comput. Math., pages 1–10, 2012.
[5] Emmanuel J. Candès, Xiaodong Li, Yi Ma, and John Wright. Robust principal component analysis? CoRR, abs/0912.3599, 2009.
[6] Emmanuel J. Candès, Xiaodong Li, and Mahdi Soltanolkotabi. Phase retrieval via Wirtinger flow: Theory and algorithms. IEEE Transactions on Information Theory, 61(4):1985–2007, 2015.
[7] Emmanuel J. Candès, Thomas Strohmer, and Vladislav Voroninski. PhaseLift: Exact and stable signal recovery from magnitude measurements via convex programming. Comm. Pure Appl. Math., 66(8):1241–1274, 2013.
[8] Yudong Chen, Ali Jalali, Sujay Sanghavi, and Constantine Caramanis. Low-rank matrix recovery from errors and erasures. IEEE Transactions on Information Theory, 59(7):4324–4337, 2013.
[9] M. Dierolf et al. Ptychographic x-ray computed tomography at the nanoscale. Nature, 467:436–440, 2010.
[10] Tom Goldstein and Christoph Studer. PhaseMax: Convex phase retrieval via basis pursuit. arXiv preprint arXiv:1610.07531, 2016.
[11] Paul Hand. PhaseLift is robust to a constant fraction of arbitrary errors. arXiv preprint arXiv:1502.04241, 2015.
[12] Paul Hand, Choongbum Lee, and Vladislav Voroninski. ShapeFit: Exact location recovery from corrupted pairwise directions. CoRR, abs/1506.01437, 2015.
[13] Paul Hand and Vladislav Voroninski. Compressed phaseless sensing in the natural parameter space via linear programming. arXiv preprint arXiv:1611.05985, 2016.
[14] Paul Hand and Vladislav Voroninski. An elementary proof of convex phase retrieval in the natural parameter space via the linear program PhaseMax. arXiv preprint arXiv:1611.03935, 2016.
[15] Peter J. Huber. Robust Statistics. Springer, 2011.
[16] Huishuai Zhang, Yuejie Chi, and Yingbin Liang. Provable non-convex phase retrieval with outliers: Median truncated Wirtinger flow. In International Conference on Machine Learning, pages 1022–1031, 2016.
[17] Xiaodong Li. Compressed sensing and matrix completion with constant proportion of corruptions. Constructive Approximation, 37(1):73–99, 2013.
[18] Onur Özyeşil and Amit Singer. Robust camera location estimation by convex programming. CoRR, abs/1412.0165, 2014.
[19] R. Vershynin. Introduction to the non-asymptotic analysis of random matrices. In Y. C. Eldar and G. Kutyniok, editors, Compressed Sensing: Theory and Applications. Cambridge University Press, 2012.
[20] Vladislav Voroninski and Zhiqiang Xu. A strong restricted isometry property, with an application to phaseless compressed sensing. Applied and Computational Harmonic Analysis, 40(2):386–395, 2016.