SPARSE PHASE RETRIEVAL VIA PHASELIFTOFF
YU XIA AND ZHIQIANG XU
Abstract.
The aim of sparse phase retrieval is to recover a $k$-sparse signal $x_0 \in \mathbb{C}^d$ from quadratic measurements $|\langle a_i, x_0\rangle|^2$, where $a_i \in \mathbb{C}^d$, $i = 1, \ldots, m$. Noting that $|\langle a_i, x_0\rangle|^2 = \mathrm{Tr}(A_i X_0)$ with $A_i = a_i a_i^* \in \mathbb{C}^{d\times d}$ and $X_0 = x_0 x_0^* \in \mathbb{C}^{d\times d}$, one can recast sparse phase retrieval as the problem of recovering a rank-one sparse matrix from linear measurements. Yin and Xin introduced PhaseLiftOff, which presents a proxy for the rank-one condition via the difference of the trace and the Frobenius norm. By adding a sparsity penalty to PhaseLiftOff, in this paper we present a novel model for recovering sparse signals from quadratic measurements. Theoretical analysis shows that the solution to our model provides a stable recovery of $x_0$ under the almost optimal sampling complexity $m = O(k \log(d/k))$. The computation for our model is carried out by the difference of convex functions algorithm (DCA). Numerical experiments demonstrate that our algorithm outperforms other state-of-the-art algorithms for sparse phase retrieval.

1. Introduction
1.1. Phase retrieval.
We assume that $x_0 \in \mathbb{F}^d$ is a target signal, where $\mathbb{F} \in \{\mathbb{R}, \mathbb{C}\}$. The aim of phase retrieval is to recover $x_0 \in \mathbb{F}^d$ from $|\langle a_j, x_0\rangle|^2 + w_j$, $j = 1, \ldots, m$, up to a unimodular constant, where the $a_j \in \mathbb{F}^d$ are known measurement vectors and $w := (w_1, \ldots, w_m)^T \in \mathbb{R}^m$ is a noise vector. Phase retrieval arises in many areas, such as X-ray crystallography, astronomy, quantum tomography, optics, and microscopy. For convenience, let $\mathcal{A} : \mathbb{F}^{d\times d} \to \mathbb{R}^m$ be the linear map defined as
(1.1)  $\mathcal{A}(X) = (a_1^* X a_1, \ldots, a_m^* X a_m),$
where $X \in \mathbb{F}^{d\times d}$ and $a_j \in \mathbb{F}^d$, $j = 1, \ldots, m$. We abuse notation and set $\mathcal{A}(x) := \mathcal{A}(xx^*) = (|\langle a_1, x\rangle|^2, \ldots, |\langle a_m, x\rangle|^2)$ for $x \in \mathbb{F}^d$. With these notations, the aim of phase retrieval is to estimate the matrix $X_0 = x_0 x_0^* \in \mathbb{C}^{d\times d}$ from $\mathcal{A}(x_0) + w \in \mathbb{R}^m$. For the noiseless case, to guarantee that the solution to $\mathcal{A}(x) = \mathcal{A}(x_0)$ is unique for all $x_0 \in \mathbb{C}^d$, it is shown in [20] that $m \ge 4d - 4 - \alpha_d$ measurements are necessary, where $\alpha_d$ denotes the number of 1's in the binary expansion of $d - 1$. The authors in [8] proved that $m \ge 4d - 4$ generic measurement vectors $a_j \in \mathbb{C}^d$, $j = 1, \ldots, m$, are enough to guarantee the uniqueness of the solution.
Key words and phrases.
Signal recovery, phase retrieval, compressed sensing, restricted isometry property, compressed phaseless sensing.
Yu Xia was supported by NSFC grant (11901143), Zhejiang Provincial Natural Science Foundation (LQ19A010008), and Education Department of Zhejiang Province Science Foundation (Y201840082). Zhiqiang Xu was supported by Beijing Natural Science Foundation (Z180002) and by NSFC grant (11688101).
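Before proceeding, the measurement map (1.1) and the identification $\mathcal{A}(x) = \mathcal{A}(xx^*)$ can be illustrated numerically (a minimal sketch with our own function names, assuming complex Gaussian measurement vectors):

```python
import numpy as np

# Sketch of (1.1): A(X) = (a_1^* X a_1, ..., a_m^* X a_m), and the phaseless
# measurements A(x) := A(x x^*) = (|<a_1, x>|^2, ..., |<a_m, x>|^2).
def A_map(a_rows, X):
    # a_rows has shape (m, d); returns the m real numbers a_j^* X a_j
    return np.real(np.einsum('ji,ik,jk->j', a_rows.conj(), X, a_rows))

def A_vec(a_rows, x):
    # |<a_j, x>|^2 for each measurement vector a_j
    return np.abs(a_rows.conj() @ x) ** 2

rng = np.random.default_rng(0)
d, m = 6, 12
a_rows = rng.standard_normal((m, d)) + 1j * rng.standard_normal((m, d))
x = rng.standard_normal(d) + 1j * rng.standard_normal(d)
# the lifting identity: A(x x^*) equals the phaseless measurements of x
assert np.allclose(A_map(a_rows, np.outer(x, x.conj())), A_vec(a_rows, x))
```

This identity is exactly what allows the quadratic problem in $x$ to be rewritten as a linear problem in the lifted matrix $X = xx^*$.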
In [5, 6, 7], phase retrieval was recast as a semidefinite programming problem, i.e., the PhaseLift problem:
(1.2)  $\min_{X \in \mathbb{F}^{d\times d}} \mathrm{Tr}(X) \quad \mathrm{s.t.} \quad \mathcal{A}(X) = \mathcal{A}(x_0), \; X \succeq 0.$
In [6], it is shown that the solution to (1.2) is $X_0$ with high probability provided the $a_j$ are Gaussian random vectors and $m = O(d \log d)$; this was reduced to $m = O(d)$ in [5]. For computational purposes, the regularized trace-norm minimization is suggested in [6, 7]:
(1.3)  $\min_{X \succeq 0,\, X \in \mathbb{F}^{d\times d}} \|\mathcal{A}(X) - b\|_2^2 + \lambda \mathrm{Tr}(X).$
Noting that $\mathrm{Tr}(X) - \|X\|_F \ge 0$ for $X \succeq 0$, with equality if and only if $\mathrm{rank}(X) = 1$ (or $X = 0$), Yin and Xin suggested the following regularization problem [23], which is called PhaseLiftOff:
(1.4)  $\min_{X \succeq 0,\, X \in \mathbb{F}^{d\times d}} \|\mathcal{A}(X) - b\|_2^2 + \lambda(\mathrm{Tr}(X) - \|X\|_F).$
The numerical experiments in [23] showed that PhaseLiftOff outperforms PhaseLift.

1.2. Sparse phase retrieval.
In many applications, one also requires $\|x_0\|_0 \le k$, i.e., the number of nonzero entries of $x_0$ is less than or equal to $k$ [13, 19, 12]. The aim of sparse phase retrieval is to recover the $k$-sparse signal $x_0$ from $|\langle a_j, x_0\rangle|^2 = b_j$, $j = 1, \ldots, m$. For convenience, we set $\Sigma_k^{\mathbb{F}} := \{x \in \mathbb{F}^d : \|x\|_0 \le k\}$. It was shown in [19] that, for $\mathbb{F} = \mathbb{C}$ and $x_0 \in \Sigma_k^{\mathbb{F}}$, if $m \ge 4k - 2$ (resp. $m \ge 2k$ for $\mathbb{F} = \mathbb{R}$) and $a_1, \ldots, a_m$ are generic vectors in $\mathbb{C}^d$ (resp. $\mathbb{R}^d$), then the solution to $\mathcal{A}(x) = \mathcal{A}(x_0)$ with $x \in \Sigma_k^{\mathbb{F}}$ is unique up to a unimodular constant. The $\ell_1$-minimization is a commonly used method for recovering sparse signals, so it is natural to employ $\ell_1$-minimization for sparse phase retrieval as well. For $\mathbb{F} = \mathbb{R}$, the following model was considered in [17]:
(1.5)  $\min_{x \in \mathbb{R}^d} \|x\|_1 \quad \mathrm{s.t.} \quad \mathcal{A}(x) = \mathcal{A}(x_0).$
In particular, it is proved that the solution to (1.5) is $\pm x_0$ with high probability if $m \gtrsim k \log d$ and the $a_j$, $j = 1, \ldots, m$, are independent Gaussian random vectors. In [22], the authors extended this result to the case $\mathbb{F} = \mathbb{C}$. In [10], the following convex model was considered:
(1.6)  $\min_{X} \|X\|_1 + \lambda \mathrm{Tr}(X) \quad \mathrm{s.t.} \quad \mathcal{A}(X) = \mathcal{A}(x_0), \; X \succeq 0.$
The objective function in (1.6) is the sum of the trace and the $\ell_1$ norm, so (1.6) is a convex model. To guarantee that the solution to (1.6) is $x_0 x_0^*$, one has to require $m \gtrsim k^2 \log d$ measurements, which is quadratic in the sparsity level $k$ [10]. Beyond convex models, many nonconvex algorithms have been developed for sparse phase retrieval, such as Sparse Truncated Amplitude Flow (SPARTA) [18], Thresholded Wirtinger Flow (ThWF) [4], Sparse Wirtinger Flow (SWF) [25], and Sparse Phase Retrieval via Smoothing Function (SPRSF) [11].
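In passing, note how sparsity lifts: for $X_0 = x_0x_0^*$ one has $\|X_0\|_1 = \|x_0\|_1^2$, and the nonzero entries of $X_0$ lie in a $k\times k$ submatrix. A quick numerical check (our own illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 10, 3
x = np.zeros(d, dtype=complex)
support = rng.choice(d, size=k, replace=False)
x[support] = rng.standard_normal(k) + 1j * rng.standard_normal(k)

X = np.outer(x, x.conj())            # lifted matrix X0 = x0 x0^*
l1_lifted = np.abs(X).sum()          # entrywise ||X0||_1
l1_vec = np.abs(x).sum()             # ||x0||_1

# ||x0 x0^*||_1 = ||x0||_1^2, and X0 has exactly k nonzero columns
assert np.isclose(l1_lifted, l1_vec ** 2)
assert (np.abs(X).sum(axis=0) > 0).sum() == k
```

This is why an entrywise $\ell_1$ penalty on the lifted matrix, as in (1.6), promotes sparsity of the underlying signal.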
These algorithms consist of two stages: (i) recover the support of the underlying sparse signal by some analytical rule, and construct an initialization near the ground-truth signal $x_0$; (ii) refine the initialization by gradient-type iterations together with an extra truncation procedure such as hard thresholding. However, to guarantee convergence to the true signal, the algorithms mentioned above require a sample complexity of $m = O(k^2 \log d)$.
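The truncation step in stage (ii) is typically a hard-thresholding operator that keeps the $k$ largest entries in magnitude. A generic version (our own sketch, not the specific rule of any one of the cited algorithms) looks like:

```python
import numpy as np

def hard_threshold(x, k):
    # keep the k entries of largest magnitude, zero out the rest
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

z = np.array([0.5, -3.0, 1.0, 2.5])
assert np.allclose(hard_threshold(z, 2), np.array([0.0, -3.0, 0.0, 2.5]))
```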
1.3. Our contribution.
A natural model for sparse phase retrieval is to use $\ell_1$-regularization, i.e.,
(1.7)  $\min_{X \in \mathbb{C}^{d\times d}} \mu\|X\|_1 + \frac{1}{2}\|\mathcal{A}(X) - b\|_2^2 \quad \mathrm{s.t.} \quad X \succeq 0, \; \mathrm{rank}(X) = 1,$
where $\|X\|_1 = \sum_{l=1}^d\sum_{j=1}^d |x_{j,l}|$ and the $x_{j,l}$ are the entries of $X$. Motivated by the notable PhaseLiftOff [23], we reformulate (1.7) as the following regularization problem:
(1.8)  $\min_{X \in \mathbb{C}^{d\times d}} \lambda(\mathrm{Tr}(X) - \|X\|_F) + \mu\|X\|_1 + \frac{1}{2}\|\mathcal{A}(X) - b\|_2^2 \quad \mathrm{s.t.} \quad X \succeq 0.$
For convenience, we call (1.8) the
Sparse PhaseLiftOff model. Note that the objective function in (1.8) is a difference of convex functions, and hence (1.8) can be solved by the difference of convex functions algorithm (DCA). To study the performance of (1.8), we first establish the equivalence between (1.8) and (1.7) under some mild conditions on $\lambda$, $\mu$, and $\|w\|_2$:

Lemma 1.1.
Assume that $b = \mathcal{A}(x_0x_0^*) + w$, where $x_0 \in \mathbb{C}^d$ and $w \in \mathbb{R}^m$ is the noise term. Let $\hat{X}$ be the global minimizer of (1.8). If $\|b\|_2 > \sqrt{2\mu}\|x_0\|_1 + \|w\|_2$, $\mu > 0$, and
(1.9)  $\lambda > \frac{\mu d + \|\mathcal{A}\|(\sqrt{2\mu}\|x_0\|_1 + \|w\|_2)}{\sqrt{2} - 1},$
then $\mathrm{rank}(\hat{X}) = 1$.

As said before, $\mathrm{Tr}(X) - \|X\|_F = 0$ provided $\mathrm{rank}(X) = 1$. Under the conditions of Lemma 1.1, $\hat{X}$ is also a minimizer of (1.7). Hence, we turn to study the performance of (1.7). To do that, we require that $\mathcal{A}$ satisfies a restricted isometry property over low-rank and sparse matrices:

Definition 1.1. [22]
We say that the map $\mathcal{A} : \mathbb{H}^{d\times d} \to \mathbb{R}^m$ satisfies the restricted isometry property of order $(r, s)$ if there exist positive constants $c$ and $C$ such that the inequality
(1.10)  $c\|X\|_F \le \frac{1}{m}\|\mathcal{A}(X)\|_1 \le C\|X\|_F$
holds for all $X \in \mathbb{H}^{d\times d}$ with $\mathrm{rank}(X) \le r$ and $\|X\|_{0,2} \le s$.

Then, we have the following theorem.
Theorem 1.2.
Let $b = \mathcal{A}(x_0x_0^*) + w$, where $x_0 \in \mathbb{C}^d$ and $w \in \mathbb{R}^m$ is the noise term. Assume that $\mathcal{A}(\cdot)$ satisfies the RIP condition of order $(2, 2ak)$ with RIP constants $c, C > 0$ and $a \ge 1$ such that
(1.11)  $c - \frac{4\sqrt{2}C}{\sqrt{a}} - \frac{C}{a} > 0.$
Set
$\alpha := \frac{1 + \frac{1}{a} + \frac{4\sqrt{2}}{\sqrt{a}}}{c - \frac{C}{a} - \frac{4\sqrt{2}C}{\sqrt{a}}}.$
Assume that $\mu > 0$. For any $k$-sparse signal $x_0 \in \mathbb{C}^d$, the solution $\hat{X} := \hat{x}(\hat{x})^*$ to (1.7) satisfies
(1.12)  $\|\hat{x}(\hat{x})^* - x_0x_0^*\|_F \le \frac{(2C\alpha + 2)\|w\|_2}{\sqrt{\mu ak}}\left(\|x_0\|_1 + \frac{\|w\|_2}{\sqrt{2\mu}}\right) + \frac{\|w\|_2^2}{2\mu ak} + \alpha\cdot\frac{\mu ak}{2C}\left(\frac{C\|w\|_2}{\mu ak} + \frac{1}{\sqrt{m}}\right)^2.$

Combining Lemma 1.1, Theorem 1.2, and Theorem 2.1, we have the following corollary:
Corollary 1.3.
Assume that $a_j$, $j = 1, \ldots, m$, are independent complex Gaussian random vectors, i.e., $a_j \sim \mathcal{N}(0, I_d) + \mathcal{N}(0, I_d)\,i$. Assume that $b = \mathcal{A}(x_0x_0^*) + w$, where $w \in \mathbb{R}^m$ is a noise vector and $x_0 \in \mathbb{C}^d$ with $\|x_0\|_0 \le k$. Assume that $m \gtrsim k\log(ed/k)$. Let $\hat{X}$ be the global minimizer of model (1.8). The following holds with probability at least $1 - \exp(-cm)$: if $\|b\|_2 > \sqrt{2\mu}\|x_0\|_1 + \|w\|_2$, $\mu > 0$, and $\lambda > \frac{\mu d + \|\mathcal{A}\|(\sqrt{2\mu}\|x_0\|_1 + \|w\|_2)}{\sqrt{2} - 1}$, then $\mathrm{rank}(\hat{X}) = 1$, and $\hat{X} = \hat{x}(\hat{x})^*$ satisfies
$\|\hat{x}(\hat{x})^* - x_0x_0^*\|_F \lesssim \frac{\|w\|_2}{\sqrt{\mu k}}\left(\|x_0\|_1 + \frac{\|w\|_2}{\sqrt{\mu}}\right) + \frac{\|w\|_2^2}{\mu k} + \mu k\left(\frac{\|w\|_2}{\mu k} + \frac{1}{\sqrt{m}}\right)^2.$

Remark 1.4.
According to Corollary 1.3, the parameter $\lambda$ depends on $\|\mathcal{A}\|$. We next show that $\|\mathcal{A}\| = O((m + d)\sqrt{d})$ with high probability. Assume that the singular value decomposition of $X \in \mathbb{C}^{d\times d}$ with $\|X\|_F = 1$ is $X = \sum_{j=1}^d \sigma_j u_jv_j^*$; here $\sum_{j=1}^d \sigma_j^2 = 1$, so that $\sum_{j=1}^d \sigma_j \le \sqrt{d}$. We claim that $\|\mathcal{A}(u_jv_j^*)\|_2 = O(m + d)$ holds with probability at least $1 - \exp(-m)$. Then we have
$\|\mathcal{A}(X)\|_2 \le \sum_j \sigma_j\|\mathcal{A}(u_jv_j^*)\|_2 = O((m + d)\sqrt{d}).$
Note that
$\|\mathcal{A}(u_jv_j^*)\|_2 \le \sum_i |\langle a_i, u_j\rangle\langle a_i, v_j\rangle| \le \sqrt{\sum_i |\langle a_i, u_j\rangle|^2}\sqrt{\sum_i |\langle a_i, v_j\rangle|^2} \le (\sqrt{m} + \sqrt{d} + t)^2$
holds with probability larger than $1 - 2\exp(-t^2/2)$. Here, the last inequality follows from the bound on the singular values of Gaussian random matrices [16, Corollary 5.35]. By taking $t = \sqrt{m} + \sqrt{d}$, we have $\|\mathcal{A}(u_jv_j^*)\|_2 = O(m + d)$.

Remark 1.5.
If we take $\|w\|_2 = 0$ in Corollary 1.3, then the following holds with high probability:
$\|\hat{x}(\hat{x})^* - x_0x_0^*\|_F \lesssim \frac{\mu k}{m},$
provided $\lambda > \frac{\mu d + \sqrt{2\mu}\|\mathcal{A}\|\|x_0\|_1}{\sqrt{2} - 1}$.

1.4. Notations.
We use $\mathbb{H}^{d\times d}$ to denote the set of all $d\times d$ Hermitian matrices. For any
$X, Y \in \mathbb{H}^{d\times d}$, set $\langle X, Y\rangle := \mathrm{Tr}(X^*Y)$. For $x \in \mathbb{C}$, we use $\mathcal{R}(x)$ and $\mathcal{I}(x)$ to denote the real and imaginary parts of $x$, respectively. For $X \in \mathbb{C}^{d\times d}$, we use $X_{i,:}$ and $X_{:,j}$ to denote the $i$-th row and $j$-th column of $X$, respectively. For $S, T \subset \{1, \ldots, d\}$, we use $X_{S,T}$ to denote the submatrix of $X$ with rows indexed by $S$ and columns indexed by $T$. We also set $\|X\|_1 := \sum_{i,j}\sqrt{\mathcal{R}(X_{i,j})^2 + \mathcal{I}(X_{i,j})^2}$, $\|X\|_F := \sqrt{\sum_{i,j}(\mathcal{R}(X_{i,j})^2 + \mathcal{I}(X_{i,j})^2)}$, and $\|X\|_{1,2} := \sum_j\|X_{:,j}\|_2$. We use $\|X\|_{0,2}$ to denote the number of nonzero columns of $X$, and $\mathrm{vec}(X) \in \mathbb{C}^{d^2}$ to denote the vectorization of $X \in \mathbb{C}^{d\times d}$.

1.5. Organization.
The paper is organized as follows. After introducing some useful lemmas in Section 2, we present the proof of Theorem 1.2 in Section 3. The proof of Lemma 1.1 is presented in Section 4. In Section 5, we report numerical experiments which show that our method outperforms other known algorithms for sparse phase retrieval.
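For concreteness, the objective of the Sparse PhaseLiftOff model (1.8) can be evaluated directly; the following sketch (our own code, with `A_map` denoting the quadratic measurement map (1.1)) also shows that the liftoff term vanishes at a rank-one ground truth:

```python
import numpy as np

def A_map(a_rows, X):
    # (1.1): the m real numbers a_j^* X a_j
    return np.real(np.einsum('ji,ik,jk->j', a_rows.conj(), X, a_rows))

def objective(X, a_rows, b, lam, mu):
    # lam*(Tr(X) - ||X||_F) + mu*||X||_1 + 0.5*||A(X) - b||_2^2, cf. (1.8)
    liftoff = np.real(np.trace(X)) - np.linalg.norm(X, 'fro')
    return (lam * liftoff + mu * np.abs(X).sum()
            + 0.5 * np.sum((A_map(a_rows, X) - b) ** 2))

rng = np.random.default_rng(2)
d, m = 5, 20
a_rows = rng.standard_normal((m, d)) + 1j * rng.standard_normal((m, d))
x = rng.standard_normal(d) + 1j * rng.standard_normal(d)
X = np.outer(x, x.conj())
b = A_map(a_rows, X)                 # noiseless measurements
# for rank-one PSD X we have Tr(X) = ||X||_F, so at the ground truth the
# objective reduces to mu*||X||_1
assert np.isclose(objective(X, a_rows, b, lam=1.0, mu=0.1), 0.1 * np.abs(X).sum())
```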
2. Preliminaries and Lemmas
The following theorem shows that the complex Gaussian random quadratic map $\mathcal{A}$ satisfies RIP of order $(2, s)$ with high probability provided $m \gtrsim s\log(d/s)$.

Theorem 2.1. [22]
Assume that the linear map $\mathcal{A}(\cdot)$ is defined as $\mathcal{A}(X) = (a_1^*Xa_1, \ldots, a_m^*Xa_m)$, where the $a_j$ are independent complex Gaussian random vectors, i.e., $a_j \sim \mathcal{N}(0, I_d) + \mathcal{N}(0, I_d)\,i$. If $m \gtrsim s\log(d/s)$, then with probability at least $1 - \exp(-c_0m)$ the linear map $\mathcal{A}$ satisfies the restricted isometry property of order $(2, s)$ with absolute constants $c, C > 0$, i.e.,
$c\|X\|_F \le \frac{1}{m}\|\mathcal{A}(X)\|_1 \le C\|X\|_F$
for all $X \in \mathbb{H}^{d\times d}$ with $\mathrm{rank}(X) \le 2$ and $\|X\|_{0,2} \le s$ (also $\|X^*\|_{0,2} \le s$).

We also need the following lemma.
Lemma 2.2. [22] If $x, y \in \mathbb{C}^d$ and $\langle x, y\rangle \ge 0$, then $\|xx^* - yy^*\|_F \ge \frac{1}{\sqrt{2}}\|x\|_2\|x - y\|_2$.

The following lemma follows from the proof of Theorem 3.1 in [23]. We include a proof here for completeness.
Lemma 2.3.
Suppose that $\langle X, Y\rangle = 0$, where $X, Y \succeq 0$. If $\mathrm{rank}(X) = r \ge 1$, then
(2.1)  $\left\|\lambda\left(I - \frac{X}{\|X\|_F}\right) - Y\right\|_F \ge \lambda(\sqrt{r} - 1).$
Here $\lambda$ is a non-negative constant.

Proof. First of all, we have
(2.2)  $\left\|\lambda\left(I - \frac{X}{\|X\|_F}\right) - Y\right\|_F \ge \|\lambda I - Y\|_F - \lambda\left\|\frac{X}{\|X\|_F}\right\|_F = \|\lambda I - Y\|_F - \lambda.$
We then estimate a lower bound of $\|\lambda I - Y\|_F$. Suppose that the singular value decomposition of $X$ is of the form
$X = \sum_{i=1}^r \sigma_i u_iu_i^*,$
where $U_1 = (u_1, \ldots, u_r) \in \mathbb{C}^{d\times r}$ and $\sigma_i > 0$ for $i = 1, \ldots, r$. Construct $U_2 \in \mathbb{C}^{d\times(d-r)}$ such that $I = U_1U_1^* + U_2U_2^*$ and $U_1^*U_2 = 0$. Then we have
(2.3)  $\|\lambda I - Y\|_F^2 = \|\lambda(U_1U_1^* + U_2U_2^*) - Y\|_F^2 = \|\lambda U_1U_1^* + (\lambda U_2U_2^* - Y)\|_F^2 = \|\lambda U_1U_1^*\|_F^2 + \|\lambda U_2U_2^* - Y\|_F^2 + 2\lambda\langle U_1U_1^*, \lambda U_2U_2^* - Y\rangle = \|\lambda U_1U_1^*\|_F^2 + \|\lambda U_2U_2^* - Y\|_F^2 - 2\lambda\langle U_1U_1^*, Y\rangle.$
The last equality follows from $U_1^*U_2 = 0$. Since $\sigma_i > 0$ and $Y \succeq 0$, the condition $\langle X, Y\rangle = 0$ implies that
$0 = \langle X, Y\rangle = \sum_{i=1}^r \sigma_i\langle u_iu_i^*, Y\rangle = \sum_{i=1}^r \sigma_i\mathrm{Tr}(u_i^*Yu_i),$
which leads to $\langle u_iu_i^*, Y\rangle = 0$, $i = 1, \ldots, r$. Therefore, we obtain that
(2.4)  $\langle U_1U_1^*, Y\rangle = \sum_{i=1}^r \langle u_iu_i^*, Y\rangle = 0,$
and (2.3) becomes
(2.5)  $\|\lambda I - Y\|_F^2 = \|\lambda U_1U_1^*\|_F^2 + \|\lambda U_2U_2^* - Y\|_F^2 \ge \|\lambda U_1U_1^*\|_F^2 = \lambda^2 r.$
Combining (2.2) and (2.5), we have
$\left\|\lambda\left(I - \frac{X}{\|X\|_F}\right) - Y\right\|_F \ge \|\lambda I - Y\|_F - \lambda \ge \lambda(\sqrt{r} - 1). \qquad \Box$

3. Proof of Theorem 1.2
The aim of this section is to present the proof of Theorem 1.2.
Proof of Theorem 1.2.
Set
(3.1)  $\hat{x} := \operatorname*{argmin}_{x\in\mathbb{C}^d}\; \mu\|x\|_1^2 + \frac{1}{2}\sum_{i=1}^m\big(|\langle a_i, x\rangle|^2 - b_i\big)^2.$
Noting that $\exp(i\theta)\hat{x}$ is also a solution to (3.1) for any $\theta \in \mathbb{R}$, without loss of generality we can assume that $\langle\hat{x}, x_0\rangle \in \mathbb{R}$ and $\langle\hat{x}, x_0\rangle \ge 0$. Then a simple observation is that $\hat{X}$ is the solution to (1.7) if and only if $\hat{X} = \hat{x}(\hat{x})^*$. Set $X_0 := x_0x_0^*$ and $H := \hat{X} - X_0 = \hat{x}(\hat{x})^* - x_0x_0^*$. To prove the conclusion, it is enough to bound $\|H\|_F$ from above. Set $T_0 := \mathrm{supp}(x_0)$. Let $T_1$ be the index set containing the indices of the $ak$ largest entries of $\hat{x}_{T_0^c}$ in magnitude, $T_2$ the indices of the next $ak$ largest entries, and so on. For simplicity, we set $T := T_0 \cup T_1$ and $\bar{H} := H_{T,T}$. Note that
$\|H\|_F \le \|\bar{H}\|_F + \sum_{i\ge1,j\ge1}\|H_{T_i,T_j}\|_F + 2\sum_{j\ge2,\,i=0,1}\|H_{T_i,T_j}\|_F.$
So it is enough to present upper bounds for $\|\bar{H}\|_F$, $\sum_{i\ge1,j\ge1}\|H_{T_i,T_j}\|_F$, and $\sum_{j\ge2,\,i=0,1}\|H_{T_i,T_j}\|_F$.

We first consider $\sum_{i\ge1,j\ge1}\|H_{T_i,T_j}\|_F$. According to
$\mu\|\hat{X}\|_1 + \frac{1}{2}\|\mathcal{A}(\hat{X}) - b\|_2^2 \le \mu\|X_0\|_1 + \frac{1}{2}\|\mathcal{A}(X_0) - b\|_2^2,$
we can obtain
(3.2)  $\mu\|H - H_{T_0,T_0}\|_1 \le \mu\|H_{T_0,T_0}\|_1 + \|w\|_2\|\mathcal{A}(H)\|_2 - \frac{1}{2}\|\mathcal{A}(H)\|_2^2.$
Here, we use
$\|\hat{X}_{T_0,T_0}\|_1 = \|H_{T_0,T_0} + X_0\|_1 \ge \|X_0\|_1 - \|H_{T_0,T_0}\|_1$
and
$\|\hat{X}\|_1 - \|\hat{X}_{T_0,T_0}\|_1 = \|\hat{X} - \hat{X}_{T_0,T_0}\|_1 = \|H - H_{T_0,T_0}\|_1.$
Therefore, we have
(3.3)  $\sum_{i\ge1,j\ge1}\|H_{T_i,T_j}\|_F = \sum_{i\ge1,j\ge1}\|\hat{x}_{T_i}\|_2\|\hat{x}_{T_j}\|_2 = \Big(\sum_{i\ge1}\|\hat{x}_{T_i}\|_2\Big)^2 \le \frac{1}{ak}\|\hat{x}_{T_0^c}\|_1^2 = \frac{1}{ak}\|H_{T_0^c,T_0^c}\|_1 \le \frac{1}{ak}\|H - H_{T_0,T_0}\|_1 \le \frac{1}{ak}\|H_{T_0,T_0}\|_1 + \frac{\|w\|_2\|\mathcal{A}(H)\|_2}{\mu ak} - \frac{\|\mathcal{A}(H)\|_2^2}{2\mu ak} \le \frac{1}{a}\|H_{T_0,T_0}\|_F + \frac{\|w\|_2\|\mathcal{A}(H)\|_2}{\mu ak} - \frac{\|\mathcal{A}(H)\|_2^2}{2\mu ak} \le \frac{1}{a}\|\bar{H}\|_F + \frac{\|w\|_2\|\mathcal{A}(H)\|_2}{\mu ak} - \frac{\|\mathcal{A}(H)\|_2^2}{2\mu ak}.$
The second line is based on $\|\hat{x}_{T_j}\|_2 \le \|\hat{x}_{T_{j-1}}\|_1/\sqrt{ak}$, and the third line follows from (3.2). Second, we consider $\sum_{j\ge2,\,i=0,1}\|H_{T_i,T_j}\|_F$.
Applying $\|\hat{x}_{T_j}\|_2 \le \|\hat{x}_{T_{j-1}}\|_1/\sqrt{ak}$, we have, for $i \in \{0, 1\}$,
(3.4)  $\sum_{j\ge2}\|H_{T_i,T_j}\|_F = \|\hat{x}_{T_i}\|_2\sum_{j\ge2}\|\hat{x}_{T_j}\|_2 \le \frac{1}{\sqrt{ak}}\|\hat{x}_{T_0^c}\|_1\|\hat{x}_{T_i}\|_2 \le \frac{1}{\sqrt{a}}\|\hat{x}_{T_0} - x_0\|_2\|\hat{x}_{T_i}\|_2 + \frac{\|w\|_2}{\sqrt{2\mu ak}}\|\hat{x}_{T_i}\|_2 \le \frac{\sqrt{2}}{\sqrt{a}}\|\hat{x}_T(\hat{x}_T)^* - x_0x_0^*\|_F + \frac{\|w\|_2}{\sqrt{2\mu ak}}\|\hat{x}_{T_i}\|_2 = \frac{\sqrt{2}}{\sqrt{a}}\|\bar{H}\|_F + \frac{\|w\|_2}{\sqrt{2\mu ak}}\|\hat{x}_{T_i}\|_2.$
Here, the third inequality follows from Lemma 2.2 (together with $\|\hat{x}_{T_0} - x_0\|_2\|\hat{x}_{T_i}\|_2 \le \|\hat{x}_T - x_0\|_2\|\hat{x}_T\|_2$), and the second inequality follows from
(3.5)  $\|\hat{x}_{T_0^c}\|_1 \le \sqrt{k}\|\hat{x}_{T_0} - x_0\|_2 + \frac{\|w\|_2}{\sqrt{2\mu}}.$
Indeed, noting that
$\mu\|\hat{x}(\hat{x})^*\|_1 + \frac{1}{2}\|\mathcal{A}(\hat{x}(\hat{x})^*) - b\|_2^2 \le \mu\|x_0x_0^*\|_1 + \frac{1}{2}\|\mathcal{A}(x_0x_0^*) - b\|_2^2,$
we obtain that
$\mu\|\hat{x}\|_1^2 + \frac{1}{2}\|\mathcal{A}(\hat{x}(\hat{x})^*) - b\|_2^2 \le \mu\|x_0\|_1^2 + \frac{1}{2}\|\mathcal{A}(x_0x_0^*) - b\|_2^2,$
which implies
(3.6)  $\|\hat{x}\|_1 \le \sqrt{\|x_0\|_1^2 + \frac{1}{2\mu}\|\mathcal{A}(x_0x_0^*) - b\|_2^2 - \frac{1}{2\mu}\|\mathcal{A}(\hat{x}(\hat{x})^*) - b\|_2^2} = \sqrt{\|x_0\|_1^2 + \frac{1}{2\mu}\|w\|_2^2 - \frac{1}{2\mu}\|\mathcal{A}(H) - w\|_2^2} = \sqrt{\|x_0\|_1^2 + \frac{1}{\mu}\langle\mathcal{A}(H), w\rangle - \frac{1}{2\mu}\|\mathcal{A}(H)\|_2^2} \le \sqrt{\|x_0\|_1^2 + \frac{1}{\mu}\|w\|_2\|\mathcal{A}(H)\|_2 - \frac{1}{2\mu}\|\mathcal{A}(H)\|_2^2} \le \|x_0\|_1 + \sqrt{\max\Big\{0,\; \frac{1}{\mu}\|w\|_2\|\mathcal{A}(H)\|_2 - \frac{1}{2\mu}\|\mathcal{A}(H)\|_2^2\Big\}} \le \|x_0\|_1 + \frac{\|w\|_2}{\sqrt{2\mu}}.$
Therefore, we have
$\|\hat{x}_{T_0^c}\|_1 \le -\|\hat{x}_{T_0}\|_1 + \|x_0\|_1 + \frac{\|w\|_2}{\sqrt{2\mu}} \le \|\hat{x}_{T_0} - x_0\|_1 + \frac{\|w\|_2}{\sqrt{2\mu}} \le \sqrt{k}\|\hat{x}_{T_0} - x_0\|_2 + \frac{\|w\|_2}{\sqrt{2\mu}},$
which implies (3.5). Combining (3.3) and (3.4), we have
(3.7)  $\sum_{i\ge1,j\ge1}\|H_{T_i,T_j}\|_F + 2\sum_{j\ge2,\,i=0,1}\|H_{T_i,T_j}\|_F \le \Big(\frac{1}{a} + \frac{4\sqrt{2}}{\sqrt{a}}\Big)\|\bar{H}\|_F + \frac{2\|w\|_2}{\sqrt{\mu ak}}\|\hat{x}_T\|_2 + \frac{\|w\|_2\|\mathcal{A}(H)\|_2}{\mu ak} - \frac{\|\mathcal{A}(H)\|_2^2}{2\mu ak} \le \Big(\frac{1}{a} + \frac{4\sqrt{2}}{\sqrt{a}}\Big)\|\bar{H}\|_F + \frac{2\|w\|_2}{\sqrt{\mu ak}}\|\hat{x}_T\|_2 + \frac{\|w\|_2^2}{2\mu ak}.$
Third, we claim that
(3.8)  $\|\bar{H}\|_F \le \frac{1}{c - \frac{C}{a} - \frac{4\sqrt{2}C}{\sqrt{a}}}\left(\frac{2C\|w\|_2}{\sqrt{\mu ak}}\|\hat{x}_T\|_2 + \frac{\mu ak}{2C}\Big(\frac{C\|w\|_2}{\mu ak} + \frac{1}{\sqrt{m}}\Big)^2\right).$
Combining (3.7) and (3.8), we obtain
$\|H\|_F \le \|\bar{H}\|_F + \sum_{i\ge1,j\ge1}\|H_{T_i,T_j}\|_F + 2\sum_{j\ge2,\,i=0,1}\|H_{T_i,T_j}\|_F \le \Big(1 + \frac{1}{a} + \frac{4\sqrt{2}}{\sqrt{a}}\Big)\|\bar{H}\|_F + \frac{2\|w\|_2}{\sqrt{\mu ak}}\|\hat{x}_T\|_2 + \frac{\|w\|_2^2}{2\mu ak} \le \frac{(2C\alpha + 2)\|w\|_2}{\sqrt{\mu ak}}\|\hat{x}_T\|_2 + \frac{\|w\|_2^2}{2\mu ak} + \alpha\cdot\frac{\mu ak}{2C}\Big(\frac{C\|w\|_2}{\mu ak} + \frac{1}{\sqrt{m}}\Big)^2 \le \frac{(2C\alpha + 2)\|w\|_2}{\sqrt{\mu ak}}\Big(\|x_0\|_1 + \frac{\|w\|_2}{\sqrt{2\mu}}\Big) + \frac{\|w\|_2^2}{2\mu ak} + \alpha\cdot\frac{\mu ak}{2C}\Big(\frac{C\|w\|_2}{\mu ak} + \frac{1}{\sqrt{m}}\Big)^2,$
which leads to the conclusion. Here, the last inequality is based on
$\|\hat{x}_T\|_2 \le \|\hat{x}\|_2 \le \|\hat{x}\|_1 \le \|x_0\|_1 + \frac{\|w\|_2}{\sqrt{2\mu}},$
where the final bound follows from (3.6). It remains to prove (3.8). Note that
$\frac{1}{\sqrt{m}}\|\mathcal{A}(H)\|_2 \ge \frac{1}{m}\|\mathcal{A}(H)\|_1 \ge \frac{1}{m}\|\mathcal{A}(\bar{H})\|_1 - \frac{1}{m}\|\mathcal{A}(H - \bar{H})\|_1,$
which implies
(3.9)  $\frac{1}{m}\|\mathcal{A}(\bar{H})\|_1 \le \frac{1}{m}\|\mathcal{A}(H - \bar{H})\|_1 + \frac{1}{\sqrt{m}}\|\mathcal{A}(H)\|_2.$
Here we can see that
$\frac{1}{m}\|\mathcal{A}(H - \bar{H})\|_1 \le \sum_{i=0,1}\frac{1}{m}\|\mathcal{A}(H_{T_i,T^c} + H_{T^c,T_i})\|_1 + \frac{1}{m}\|\mathcal{A}(H_{T^c,T^c})\|_1.$
For $i = 0,$
$1$, since $\mathcal{A}(\cdot)$ satisfies the RIP condition of order $(2, 2ak)$ with upper RIP constant $C$, we have
(3.10)  $\frac{1}{m}\|\mathcal{A}(H_{T_i,T^c} + H_{T^c,T_i})\|_1 = \frac{1}{m}\Big\|\sum_{j\ge2}\mathcal{A}(H_{T_i,T_j} + H_{T_j,T_i})\Big\|_1 \le \frac{1}{m}\sum_{j\ge2}\|\mathcal{A}(H_{T_i,T_j} + H_{T_j,T_i})\|_1 = \frac{1}{m}\sum_{j\ge2}\|\mathcal{A}(\hat{x}_{T_i}(\hat{x}_{T_j})^* + \hat{x}_{T_j}(\hat{x}_{T_i})^*)\|_1 \le 2C\sum_{j\ge2}\|\hat{x}_{T_i}\|_2\|\hat{x}_{T_j}\|_2 \le \frac{2C}{\sqrt{ak}}\|\hat{x}_{T_0^c}\|_1\|\hat{x}_{T_i}\|_2 \le \frac{2\sqrt{2}C}{\sqrt{a}}\|\bar{H}\|_F + \frac{2C\|w\|_2}{\sqrt{2\mu ak}}\|\hat{x}_{T_i}\|_2.$
Here, the last line follows from (3.4). On the other hand, based on (3.3), we have
(3.11)  $\frac{1}{m}\|\mathcal{A}(H_{T^c,T^c})\|_1 \le \sum_{i<j,\,i,j\ge2}\frac{1}{m}\|\mathcal{A}(H_{T_i,T_j} + H_{T_j,T_i})\|_1 + \frac{1}{m}\sum_{i\ge2}\|\mathcal{A}(H_{T_i,T_i})\|_1 \le C\sum_{i\ge2,j\ge2}\|H_{T_i,T_j}\|_F \le \frac{C}{a}\|\bar{H}\|_F + \frac{C\|w\|_2\|\mathcal{A}(H)\|_2}{\mu ak} - \frac{C\|\mathcal{A}(H)\|_2^2}{2\mu ak}.$
Since $\mathcal{A}(\cdot)$ satisfies the RIP condition of order $(2, 2ak)$ with lower RIP constant $c > 0$, combining (3.9), (3.10), and (3.11), we obtain
(3.12)  $c\|\bar{H}\|_F \le \frac{1}{m}\|\mathcal{A}(\bar{H})\|_1 \le \frac{1}{\sqrt{m}}\|\mathcal{A}(H)\|_2 + \frac{1}{m}\|\mathcal{A}(H - \bar{H})\|_1 \le \frac{1}{\sqrt{m}}\|\mathcal{A}(H)\|_2 + \sum_{i=0,1}\frac{1}{m}\|\mathcal{A}(H_{T_i,T^c} + H_{T^c,T_i})\|_1 + \frac{1}{m}\|\mathcal{A}(H_{T^c,T^c})\|_1 \le \frac{1}{\sqrt{m}}\|\mathcal{A}(H)\|_2 + \frac{2C\|w\|_2}{\sqrt{2\mu ak}}(\|\hat{x}_{T_0}\|_2 + \|\hat{x}_{T_1}\|_2) + \Big(\frac{C}{a} + \frac{4\sqrt{2}C}{\sqrt{a}}\Big)\|\bar{H}\|_F + \frac{C\|w\|_2\|\mathcal{A}(H)\|_2}{\mu ak} - \frac{C\|\mathcal{A}(H)\|_2^2}{2\mu ak} \le \frac{1}{\sqrt{m}}\|\mathcal{A}(H)\|_2 + \frac{2C\|w\|_2}{\sqrt{\mu ak}}\|\hat{x}_T\|_2 + \Big(\frac{C}{a} + \frac{4\sqrt{2}C}{\sqrt{a}}\Big)\|\bar{H}\|_F + \frac{C\|w\|_2\|\mathcal{A}(H)\|_2}{\mu ak} - \frac{C\|\mathcal{A}(H)\|_2^2}{2\mu ak},$
which implies
$\Big(c - \frac{C}{a} - \frac{4\sqrt{2}C}{\sqrt{a}}\Big)\|\bar{H}\|_F \le \frac{2C\|w\|_2}{\sqrt{\mu ak}}\|\hat{x}_T\|_2 + \Big(\frac{C\|w\|_2}{\mu ak} + \frac{1}{\sqrt{m}}\Big)\|\mathcal{A}(H)\|_2 - \frac{C\|\mathcal{A}(H)\|_2^2}{2\mu ak} \le \frac{2C\|w\|_2}{\sqrt{\mu ak}}\|\hat{x}_T\|_2 + \frac{\mu ak}{2C}\Big(\frac{C\|w\|_2}{\mu ak} + \frac{1}{\sqrt{m}}\Big)^2.$
This leads to inequality (3.8). $\Box$

4. Proof of Lemma 1.1
Denote by $\mathbb{R}^{d\times d}_{\mathrm{sym}}$ the set of symmetric real $d\times d$ matrices and by $\mathbb{R}^{d\times d}_{\mathrm{skew}}$ the set of skew-symmetric real $d\times d$ matrices. If $X \in \mathbb{H}^{d\times d}$, then $X$ can be written as $X = X_1 + iX_2$, where $X_1 \in \mathbb{R}^{d\times d}_{\mathrm{sym}}$ and $X_2 \in \mathbb{R}^{d\times d}_{\mathrm{skew}}$ are the real and imaginary parts of $X$. Thus the set $\{X \in \mathbb{H}^{d\times d} : X \succeq 0\}$ corresponds to
$\mathcal{H}^{d\times d}_+ := \big\{(X_1, X_2) \in \mathbb{R}^{d\times d}_{\mathrm{sym}}\times\mathbb{R}^{d\times d}_{\mathrm{skew}} : z_1^TX_1z_1 + z_2^TX_1z_2 + z_2^TX_2z_1 - z_1^TX_2z_2 \ge 0 \text{ for all } z_1, z_2 \in \mathbb{R}^d\big\}.$
Let $\tilde{\mathcal{A}} : \mathbb{R}^{d\times d}\times\mathbb{R}^{d\times d} \to \mathbb{R}^m$ be defined by
(4.1)  $\tilde{\mathcal{A}}(X_1, X_2) = \big(\mathcal{R}(a_i)^TX_1\mathcal{R}(a_i) + \mathcal{I}(a_i)^TX_1\mathcal{I}(a_i) + \mathcal{I}(a_i)^TX_2\mathcal{R}(a_i) - \mathcal{R}(a_i)^TX_2\mathcal{I}(a_i)\big)_{i=1}^m.$
Then $\mathcal{A}(X) = \tilde{\mathcal{A}}(X_1, X_2)$. By a simple calculation, its conjugate map $\tilde{\mathcal{A}}^* : \mathbb{R}^m \to \mathbb{R}^{d\times d}\times\mathbb{R}^{d\times d}$ is given by
(4.2)  $\tilde{\mathcal{A}}^*\big((b_i)_{i=1}^m\big) = \Big(\sum_{i=1}^m b_i\big(\mathcal{R}(a_i)\mathcal{R}(a_i)^T + \mathcal{I}(a_i)\mathcal{I}(a_i)^T\big),\; \sum_{i=1}^m b_i\big(\mathcal{I}(a_i)\mathcal{R}(a_i)^T - \mathcal{R}(a_i)\mathcal{I}(a_i)^T\big)\Big).$
For $X = X_1 + iX_2 \in \mathbb{C}^{d\times d}$, $\|X\|_1$ and $\|X\|_F$ can also be written as
$\|X\|_1 = \sum_{i,j}\sqrt{[X_1]_{i,j}^2 + [X_2]_{i,j}^2} \quad\text{and}\quad \|X\|_F = \sqrt{\|X_1\|_F^2 + \|X_2\|_F^2}.$
Using the notations above, we recast model (1.8) as
(4.3)  $\min_{(X_1,X_2)\in\mathcal{H}^{d\times d}_+} \lambda\Big(\mathrm{Tr}(X_1) - \sqrt{\|X_1\|_F^2 + \|X_2\|_F^2}\Big) + \mu\sum_{i,j}\sqrt{[X_1]_{i,j}^2 + [X_2]_{i,j}^2} + \frac{1}{2}\|\tilde{\mathcal{A}}(X_1, X_2) - b\|_2^2.$
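The splitting (4.1) can be checked numerically: for Hermitian $X = X_1 + iX_2$, the real map $\tilde{\mathcal{A}}$ reproduces $\mathcal{A}(X)$ (a small sanity check in our own code):

```python
import numpy as np

def A_tilde(a_rows, X1, X2):
    # real formulation (4.1): R(a)^T X1 R(a) + I(a)^T X1 I(a)
    #                        + I(a)^T X2 R(a) - R(a)^T X2 I(a)
    out = np.empty(a_rows.shape[0])
    for j, a in enumerate(a_rows):
        r, s = a.real, a.imag
        out[j] = r @ X1 @ r + s @ X1 @ s + s @ X2 @ r - r @ X2 @ s
    return out

rng = np.random.default_rng(3)
d, m = 4, 7
a_rows = rng.standard_normal((m, d)) + 1j * rng.standard_normal((m, d))
B = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
X = B + B.conj().T                   # a Hermitian test matrix
X1, X2 = X.real, X.imag              # X1 symmetric, X2 skew-symmetric
direct = np.real(np.einsum('ji,ik,jk->j', a_rows.conj(), X, a_rows))
assert np.allclose(A_tilde(a_rows, X1, X2), direct)
```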
If $(\hat{X}_1, \hat{X}_2)$ is a minimizer of (4.3), then the optimal solution $\hat{X}$ of (1.8) satisfies $\hat{X} = \hat{X}_1 + i\hat{X}_2$. In order to prove Lemma 1.1, we first introduce some technical lemmas from convex optimization and matrix theory. Assume that $\Omega \subset \mathbb{R}^n$. We use $T_\Omega(x)$ and $T_\Omega(x)^*$ to denote the tangent cone of $\Omega$ at $x \in \Omega$ and its dual cone, respectively. In particular, we have
Proposition 4.1. If $\Omega$ is a convex cone in $\mathbb{R}^n$ and $x \in \Omega$, then
$T_\Omega(x)^* = \{y \in \mathbb{R}^n : \langle y, \tilde{x}\rangle \le 0 \text{ for all } \tilde{x} \in \Omega, \text{ and } \langle y, x\rangle = 0\}.$

Proof.
According to Proposition 4.6.3 in [2], we have
(4.4)  $T_\Omega(x)^* = \{y \in \mathbb{R}^n : \langle y, \tilde{x} - x\rangle \le 0 \text{ for all } \tilde{x} \in \Omega\}.$
Assume that $y \in T_\Omega(x)^*$. Then $\langle y, \tilde{x} - x\rangle \le 0$ for all $\tilde{x} \in \Omega$. Since $\Omega$ is a cone, we have $x/2, 2x \in \Omega$. Taking $\tilde{x} = 2x$, we obtain $\langle y, x\rangle \le 0$. Similarly, taking $\tilde{x} = x/2$, we have $\langle y, x\rangle \ge 0$. We arrive at $\langle y, x\rangle = 0$, which leads to $\langle y, \tilde{x}\rangle \le 0$ for all $\tilde{x} \in \Omega$. $\Box$

The following proposition provides a property of local minimizers of constrained models.
Proposition 4.2. [2, Proposition 4.7.3]
Let x be a local minimizer of the model: min x ∈ Ω f ( x ) + f ( x ) , where f is convex and f is smooth over a subset Ω of R n . Assume that the tangent cone T Ω ( x ) is convex. Then −∇ f ( x ) ∈ ∂f ( x ) + T Ω ( x ) ∗ . We next present the sub-gradient set of ∂ ( k X k ): Proposition 4.3. ( [1] ) Assume that X = X + iX with X , X ∈ R d × d . Then the subgradient setof k X k in real space is ∂ (cid:13)(cid:13)(cid:13)(cid:13)(cid:20) vec ( X ) vec ( X ) (cid:21)(cid:13)(cid:13)(cid:13)(cid:13) , ! := ( (cid:20) G G (cid:21) , G , G ∈ R d × d : [ G ] i ,i + [ G ] i ,i ≤ , if [ X ] i ,i = [ X ] i ,i = 0;([ G ] i ,i , [ G ] i ,i ) = ([ X ] i ,i , [ X ] i ,i ) q [ X ] i ,i + [ X ] i ,i , otherwise ) (4.5)Combining Proposition 4.1, Proposition 4.2 with Proposition 4.3, we have Lemma 4.4.
Assume that $(\hat{X}_1, \hat{X}_2)$ is a local minimizer of model (4.3). Then there exist $(\Lambda_1, \Lambda_2) \in \mathcal{H}^{d\times d}_+$ and $(G_1, G_2) \in \partial\|\hat{X}\|_1$ such that the following hold:
(i) Stationary condition:
(4.6)  $\lambda\Big((I, 0) - \frac{(\hat{X}_1, \hat{X}_2)}{\sqrt{\|\hat{X}_1\|_F^2 + \|\hat{X}_2\|_F^2}}\Big) + \mu(G_1, G_2) + \tilde{\mathcal{A}}^*\big(\tilde{\mathcal{A}}(\hat{X}_1, \hat{X}_2) - b\big) - (\Lambda_1, \Lambda_2) = (0, 0);$
(ii) Complementary slackness condition:
(4.7)  $\langle(\Lambda_1, \Lambda_2), (\hat{X}_1, \hat{X}_2)\rangle = 0.$

Proof.
Set
$f_1(X_1, X_2) := \mu\sum_{i,j}\sqrt{[X_1]_{i,j}^2 + [X_2]_{i,j}^2}, \qquad f_2(X_1, X_2) := \lambda\Big(\mathrm{Tr}(X_1) - \sqrt{\|X_1\|_F^2 + \|X_2\|_F^2}\Big) + \frac{1}{2}\|\tilde{\mathcal{A}}(X_1, X_2) - b\|_2^2,$
and $\Omega := \mathcal{H}^{d\times d}_+$. Then $f_1$ is convex and $f_2$ is smooth. Since $\Omega$ is convex, we obtain that $T_\Omega\big((\hat{X}_1, \hat{X}_2)\big)$ is convex by Proposition 4.6.2 in [2]. According to Proposition 4.2, there exists $-(\Lambda_1, \Lambda_2) \in T_\Omega\big((\hat{X}_1, \hat{X}_2)\big)^*$ such that the stationary condition (4.6) holds. Furthermore, we can use Proposition 4.1 to obtain the complementary slackness condition (4.7). It remains to prove that $(\Lambda_1, \Lambda_2) \in \mathcal{H}^{d\times d}_+$. Take $T_1 = t_1t_1^T + t_2t_2^T$ and $T_2 = t_2t_1^T - t_1t_2^T$ for arbitrary fixed $t_1, t_2 \in \mathbb{R}^d$. Then $(T_1, T_2) \in \Omega$. By the definition of $T_\Omega\big((\hat{X}_1, \hat{X}_2)\big)^*$ and Proposition 4.1, we obtain
$\langle-(\Lambda_1, \Lambda_2), (T_1, T_2)\rangle \le 0,$
which implies
(4.8)  $t_1^T\Lambda_1t_1 + t_2^T\Lambda_1t_2 + t_2^T\Lambda_2t_1 - t_1^T\Lambda_2t_2 \ge 0 \text{ for all } t_1, t_2 \in \mathbb{R}^d.$
If $(\Lambda_1, \Lambda_2) \in \mathbb{R}^{d\times d}_{\mathrm{sym}}\times\mathbb{R}^{d\times d}_{\mathrm{skew}}$, then we arrive at the conclusion. Otherwise, we can replace $\Lambda_1$ and $\Lambda_2$ by
$\tilde{\Lambda}_1 := \frac{\Lambda_1 + \Lambda_1^T}{2}, \qquad \tilde{\Lambda}_2 := \frac{\Lambda_2 - \Lambda_2^T}{2}.$
Noting that $(\tilde{\Lambda}_1, \tilde{\Lambda}_2) \in \mathbb{R}^{d\times d}_{\mathrm{sym}}\times\mathbb{R}^{d\times d}_{\mathrm{skew}}$ and
$\langle-(\tilde{\Lambda}_1, \tilde{\Lambda}_2), (T_1, T_2)\rangle = \langle-(\Lambda_1, \Lambda_2), (T_1, T_2)\rangle \le 0,$
we obtain that $(\tilde{\Lambda}_1, \tilde{\Lambda}_2) \in \mathcal{H}^{d\times d}_+$. After a simple calculation, we also have
$\lambda\Big((I, 0) - \frac{(\hat{X}_1, \hat{X}_2)}{\sqrt{\|\hat{X}_1\|_F^2 + \|\hat{X}_2\|_F^2}}\Big) + \mu(\tilde{G}_1, \tilde{G}_2) + \tilde{\mathcal{A}}^*\big(\tilde{\mathcal{A}}(\hat{X}_1, \hat{X}_2) - b\big) - (\tilde{\Lambda}_1, \tilde{\Lambda}_2) = (0, 0)$
and
$\langle(\tilde{\Lambda}_1, \tilde{\Lambda}_2), (\hat{X}_1, \hat{X}_2)\rangle = \langle(\Lambda_1, \Lambda_2), (\hat{X}_1, \hat{X}_2)\rangle = 0,$
where $\tilde{G}_1 := \frac{G_1 + G_1^T}{2}$, $\tilde{G}_2 := \frac{G_2 - G_2^T}{2}$, and $(\tilde{G}_1, \tilde{G}_2) \in \partial\|\hat{X}\|_1$.
Therefore, the stationary condition (4.6) and the complementary slackness condition (4.7) also hold with $(\Lambda_1, \Lambda_2) := (\tilde{\Lambda}_1, \tilde{\Lambda}_2)$. $\Box$

We next present the proof of Lemma 1.1.
Proof of Lemma 1.1.
Since $\|b\|_2 > \sqrt{2\mu}\|x_0\|_1 + \|w\|_2$, we obtain that $\hat{X} \ne 0$. We next consider the equivalent model (4.3) with global minimizer $(\hat{X}_1, \hat{X}_2)$. According to Lemma 4.4, there exist $(\Lambda_1, \Lambda_2) \in \mathcal{H}^{d\times d}_+$ and $(G_1, G_2) \in \partial\|\hat{X}\|_1$ such that the following hold:
(4.9)  $\lambda\Big((I, 0) - \frac{(\hat{X}_1, \hat{X}_2)}{\sqrt{\|\hat{X}_1\|_F^2 + \|\hat{X}_2\|_F^2}}\Big) + \mu(G_1, G_2) + \tilde{\mathcal{A}}^*\big(\tilde{\mathcal{A}}(\hat{X}_1, \hat{X}_2) - b\big) - (\Lambda_1, \Lambda_2) = (0, 0)$
and
(4.10)  $\langle(\Lambda_1, \Lambda_2), (\hat{X}_1, \hat{X}_2)\rangle = 0.$
According to (4.9), we obtain
(4.11)  $\Big\|\tilde{\mathcal{A}}^*\big(\tilde{\mathcal{A}}(\hat{X}_1, \hat{X}_2) - b\big) + \mu(G_1, G_2)\Big\|_F = \Big\|\lambda\Big((I, 0) - \frac{(\hat{X}_1, \hat{X}_2)}{\sqrt{\|\hat{X}_1\|_F^2 + \|\hat{X}_2\|_F^2}}\Big) - (\Lambda_1, \Lambda_2)\Big\|_F = \Big\|\lambda\Big(I - \frac{\hat{X}}{\|\hat{X}\|_F}\Big) - \Lambda\Big\|_F \ge \lambda(\sqrt{r} - 1),$
where $\Lambda := \Lambda_1 + i\Lambda_2 \in \mathbb{C}^{d\times d}$ and $r := \mathrm{rank}(\hat{X})$.
The last inequality in (4.11) follows from (4.10) and Lemma 2.3. On the other hand, we have
(4.12)  $\Big\|\tilde{\mathcal{A}}^*\big(\tilde{\mathcal{A}}(\hat{X}_1, \hat{X}_2) - b\big) + \mu(G_1, G_2)\Big\|_F \le \Big\|\tilde{\mathcal{A}}^*\big(\tilde{\mathcal{A}}(\hat{X}_1, \hat{X}_2) - b\big)\Big\|_F + \mu\|(G_1, G_2)\|_F \le \|\mathcal{A}\|\,\big\|\tilde{\mathcal{A}}(\hat{X}_1, \hat{X}_2) - b\big\|_2 + \mu d = \|\mathcal{A}\|\,\|\mathcal{A}(\hat{X}) - b\|_2 + \mu d \le \|\mathcal{A}\|\sqrt{2\mu\|x_0\|_1^2 + \|w\|_2^2} + \mu d \le \|\mathcal{A}\|\big(\sqrt{2\mu}\|x_0\|_1 + \|w\|_2\big) + \mu d.$
Here, the second inequality follows from Proposition 4.3 and $[G_1]_{i_0,j_0}^2 + [G_2]_{i_0,j_0}^2 \le 1$ for all $i_0, j_0 \in \{1, \ldots, d\}$, so that $\|(G_1, G_2)\|_F \le d$. Combining (4.11) and (4.12), we obtain
$\lambda(\sqrt{r} - 1) \le \mu d + \|\mathcal{A}\|\big(\sqrt{2\mu}\|x_0\|_1 + \|w\|_2\big).$
By the assumption on $\lambda$ in (1.9), namely $\lambda > \frac{\mu d + \|\mathcal{A}\|(\sqrt{2\mu}\|x_0\|_1 + \|w\|_2)}{\sqrt{2} - 1},$

Algorithm 1
The DCA algorithm for solving model (1.8)
Input: the map $\mathcal{A}$, the vector $b$, the tolerance tol $\ge 0$, the parameters $\lambda, \mu$, and MAXiter.
Output: a matrix $X$.
Initialize: $X^0 = 0$.
Loop: for $k = 0$ to MAXiter
  $Y^k = X^k/\|X^k\|_F$ if $X^k \ne 0$; $\; Y^k = 0$ if $X^k = 0$
  (5.1)  $X^{k+1} = \operatorname*{argmin}_{X\succeq0}\Big\{\frac{1}{2}\|\mathcal{A}(X) - b\|_2^2 + \lambda\mathrm{Tr}(X) - \lambda\langle X, Y^k\rangle + \mu\|X\|_1\Big\}$
  If $\frac{\|X^k - X^{k-1}\|_F}{\max\{\|X^k\|_F, 1\}} \le \mathrm{tol}$ then break
$X = X^k$.

we have
$\sqrt{r} - 1 \le \frac{\mu d + \|\mathcal{A}\|(\sqrt{2\mu}\|x_0\|_1 + \|w\|_2)}{\lambda} < \sqrt{2} - 1.$
Thus the only proper choice of $r$ is $r = 1$. $\Box$

5. Algorithms for solving Sparse PhaseLiftOff
5.1. The DCA algorithm.
In this section, we establish an algorithm to solve the Sparse PhaseLiftOff model (1.8), stated in Algorithm 1. Our algorithm is based on DCA, a descent method introduced by Tao and An [14, 15]. DCA has also been studied in compressed sensing and in matrix recovery problems (see [21, 23, 24]). Step 6 of Algorithm 1 solves the subproblem (5.1). We suggest employing the ADMM method [3] to solve it, as shown in Algorithm 2. The convergence rate of ADMM was established in [9]. To derive the ADMM, we rewrite (5.1) as
(5.2)  $\min_{X_3\succeq0,\,X_1 = X_3,\,X_2 = X_3} \frac{1}{2}\|\mathcal{A}(X_1) - b\|_2^2 + \lambda\mathrm{Tr}(X_1) - \lambda\langle X_1, Y^k\rangle + \mu\|X_2\|_1.$
Problem (5.2) is a global consensus problem [3, Equation (7.2)] with local variables $X_1$ and $X_2$ and a common global variable $X_3$. The augmented Lagrangian function corresponding to (5.2) is
$L_\delta(X_1, X_2, X_3, Y_1, Y_2) = \frac{1}{2}\|\mathcal{A}(X_1) - b\|_2^2 + \langle X_1, \lambda(I - Y^k)\rangle + \mu\|X_2\|_1 + g_\succeq(X_3) + \langle Y_1, X_1 - X_3\rangle + \langle Y_2, X_2 - X_3\rangle + \frac{\delta}{2}\|X_1 - X_3\|_F^2 + \frac{\delta}{2}\|X_2 - X_3\|_F^2,$
where $Y_1, Y_2$ are dual variables, $\delta$ is the augmented Lagrangian parameter, and
$g_\succeq(Z) = \begin{cases} 0 & Z \succeq 0,\\ \infty & \text{otherwise}.\end{cases}$
We can employ the standard ADMM to solve
(5.3)  $\min_{X_1, X_2, X_3} L_\delta(X_1, X_2, X_3, Y_1, Y_2),$
Algorithm 2
ADMM for solving the subproblem (5.1)
Input: the map $\mathcal{A}$, the vector $b$, $Y^k$, $W = \lambda(I - Y^k)$, the parameters $\lambda, \mu, \delta$, and MAXiter.
Output:
a matrix $X^{k+1}$.
Initialize: $X_1^0 = X_2^0 = X_3^0 = Y_1^0 = Y_2^0 = 0$.
Loop: for $l = 0$ to MAXiter
  $X_1^{l+1} = (\mathcal{A}^*\mathcal{A} + \delta I)^{-1}\big(\mathcal{A}^*(b) - W + \delta X_3^l - Y_1^l\big)$
  $X_2^{l+1} = \mathcal{S}_{\mu/\delta}\big(X_3^l - \frac{1}{\delta}Y_2^l\big)$
  $X_3^{l+1} = \mathcal{P}_\succeq\Big(\frac{1}{2}(X_1^{l+1} + X_2^{l+1}) + \frac{1}{2\delta}(Y_1^l + Y_2^l)\Big)$
  $Y_1^{l+1} = Y_1^l + \delta(X_1^{l+1} - X_3^{l+1})$
  $Y_2^{l+1} = Y_2^l + \delta(X_2^{l+1} - X_3^{l+1})$
$X^{k+1} = X_3^l$.

which consists of updating both the primal and the dual variables [3, Equations (7.3)–(7.5)]:
(5.4)  $X_1^{l+1} = \arg\min_{X_1} L_\delta(X_1, X_2^l, X_3^l, Y_1^l, Y_2^l)$, $\; X_2^{l+1} = \arg\min_{X_2} L_\delta(X_1^{l+1}, X_2, X_3^l, Y_1^l, Y_2^l)$, $\; X_3^{l+1} = \arg\min_{X_3} L_\delta(X_1^{l+1}, X_2^{l+1}, X_3, Y_1^l, Y_2^l)$, $\; Y_1^{l+1} = Y_1^l + \delta(X_1^{l+1} - X_3^{l+1})$, $\; Y_2^{l+1} = Y_2^l + \delta(X_2^{l+1} - X_3^{l+1})$.
According to [3], $\delta$ can be fixed or adaptively updated by the rules
$\delta^{l+1} = \begin{cases} 2\delta^l & \text{if } \|R^l\|_F > 10\|S^l\|_F,\\ \delta^l/2 & \text{if } 10\|R^l\|_F < \|S^l\|_F,\\ \delta^l & \text{otherwise},\end{cases}$
where $\|R^l\|_F = \|X_1^l - X_3^l\|_F + \|X_2^l - X_3^l\|_F$ and $\|S^l\|_F = 2\delta^l\|X_3^l - X_3^{l-1}\|_F$. More explicitly, we state the ADMM for solving (5.4) in Algorithm 2. In Algorithm 2, we use $\mathcal{S}_\lambda : \mathbb{C}^{n\times n} \to \mathbb{C}^{n\times n}$ to denote the soft-thresholding operator applied to each element of the matrix, i.e.,
$[\mathcal{S}_\lambda(Z)]_{i,j} = \begin{cases} (|Z_{i,j}| - \lambda)\frac{Z_{i,j}}{|Z_{i,j}|} & |Z_{i,j}| \ge \lambda,\\ 0 & \text{otherwise}.\end{cases}$
We use $\mathcal{P}_\succeq : \mathbb{H}^{n\times n} \to \mathbb{H}^{n\times n}$ to denote the projection onto the positive semidefinite cone, i.e., $\mathcal{P}_\succeq(X) = U\max\{\Sigma, 0\}U^*$, where $X = U\Sigma U^*$ is the eigenvalue decomposition of $X$.

5.2. The convergence property of Algorithm 1.
The aim of this subsection is to study the convergence property of Algorithm 1. Motivated by the techniques developed in [23] and [24], we will show that Algorithm 1 converges to a stationary point. For convenience, we set

$F(X) := \lambda(\mathrm{Tr}(X) - \|X\|_F) + \mu\|X\|_1 + \tfrac12\|\mathcal{A}(X) - b\|_2^2.$

We first show that $\{F(X^k)\}_{k\ge 0}$ generated by Algorithm 1 is a monotonically decreasing sequence.

Lemma 5.1. If $\{X^k\}_{k\ge 0}$ is a sequence generated by Algorithm 1, then we have $F(X^k) - F(X^{k+1}) \ge 0$ for all $k \ge 0$.

Proof.
We consider the $k$-th iteration of Algorithm 1. Recall that $X^{k+1}$ is the solution to (5.1) in Algorithm 1. Set $X^{k+1} := X_1^{k+1} + iX_2^{k+1}$ and $Y^k := Y_1^k + iY_2^k$, where $X_1^{k+1}, X_2^{k+1}, Y_1^k, Y_2^k \in \mathbb{R}^{d\times d}$. Take

$f_1(X_1, X_2) := \mu \left\| \begin{bmatrix} \mathrm{vec}(X_1) \\ \mathrm{vec}(X_2) \end{bmatrix} \right\|_{2,1}, \qquad f_2(X_1, X_2) := \lambda\left(\mathrm{Tr}(X_1) - \left\langle \begin{bmatrix} Y_1^k \\ Y_2^k \end{bmatrix}, \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} \right\rangle\right) + \frac12\left\| \widetilde{\mathcal{A}}\left( \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} \right) - b \right\|_2^2,$

and $\Omega := \mathcal{H}_+^{d\times d}$. Then $f_1$ is convex, $f_2$ is smooth, and $T_\Omega\left(\begin{bmatrix} X_1^{k+1} \\ X_2^{k+1} \end{bmatrix}\right)$ is convex. According to Proposition 4.2, we have

(5.5) $\lambda\left(\begin{bmatrix} I \\ 0 \end{bmatrix} - \begin{bmatrix} Y_1^k \\ Y_2^k \end{bmatrix}\right) + \mu\begin{bmatrix} G_1^{k+1} \\ G_2^{k+1} \end{bmatrix} + \widetilde{\mathcal{A}}^*\left(\widetilde{\mathcal{A}}\left(\begin{bmatrix} X_1^{k+1} \\ X_2^{k+1} \end{bmatrix}\right) - b\right) = \begin{bmatrix} \Lambda_1^{k+1} \\ \Lambda_2^{k+1} \end{bmatrix}$

and

(5.6) $\left\langle \begin{bmatrix} \Lambda_1^{k+1} \\ \Lambda_2^{k+1} \end{bmatrix}, \begin{bmatrix} X_1^{k+1} \\ X_2^{k+1} \end{bmatrix} \right\rangle = 0,$

for some $\Lambda^{k+1} = \Lambda_1^{k+1} + i\Lambda_2^{k+1} \in \mathcal{H}_+^{d\times d}$ and $G^{k+1} = G_1^{k+1} + iG_2^{k+1}$ with

$\begin{bmatrix} G_1^{k+1} \\ G_2^{k+1} \end{bmatrix} \in \partial\left\| \begin{bmatrix} \mathrm{vec}(X_1^{k+1}) \\ \mathrm{vec}(X_2^{k+1}) \end{bmatrix} \right\|_{2,1}.$

According to Proposition 4.3, we have

$[G_1^{k+1}]_{i_1,i_2}^2 + [G_2^{k+1}]_{i_1,i_2}^2 \le 1 \ \text{ if } [X_1^{k+1}]_{i_1,i_2} = [X_2^{k+1}]_{i_1,i_2} = 0; \qquad ([G_1^{k+1}]_{i_1,i_2}, [G_2^{k+1}]_{i_1,i_2}) = \frac{([X_1^{k+1}]_{i_1,i_2}, [X_2^{k+1}]_{i_1,i_2})}{\sqrt{[X_1^{k+1}]_{i_1,i_2}^2 + [X_2^{k+1}]_{i_1,i_2}^2}} \ \text{ otherwise.}$

Using a method similar to the proof of Lemma 4.4, we can obtain (5.6).
According to (5.5), we have

(5.7) $\left\langle \begin{bmatrix} X_1^k - X_1^{k+1} \\ X_2^k - X_2^{k+1} \end{bmatrix},\ \lambda\left(\begin{bmatrix} I \\ 0 \end{bmatrix} - \begin{bmatrix} Y_1^k \\ Y_2^k \end{bmatrix}\right) + \mu\begin{bmatrix} G_1^{k+1} \\ G_2^{k+1} \end{bmatrix} + \widetilde{\mathcal{A}}^*\left(\widetilde{\mathcal{A}}\left(\begin{bmatrix} X_1^{k+1} \\ X_2^{k+1} \end{bmatrix}\right) - b\right) \right\rangle = \left\langle \begin{bmatrix} X_1^k - X_1^{k+1} \\ X_2^k - X_2^{k+1} \end{bmatrix}, \begin{bmatrix} \Lambda_1^{k+1} \\ \Lambda_2^{k+1} \end{bmatrix} \right\rangle.$

Combining (5.7) with

$\left\langle \begin{bmatrix} X_1^{k+1} \\ X_2^{k+1} \end{bmatrix}, \begin{bmatrix} G_1^{k+1} \\ G_2^{k+1} \end{bmatrix} \right\rangle = \|X^{k+1}\|_1, \qquad \left\langle \begin{bmatrix} X_1^k \\ X_2^k \end{bmatrix}, \begin{bmatrix} Y_1^k \\ Y_2^k \end{bmatrix} \right\rangle = \|X^k\|_F, \qquad \left\langle \begin{bmatrix} \Lambda_1^{k+1} \\ \Lambda_2^{k+1} \end{bmatrix}, \begin{bmatrix} X_1^{k+1} \\ X_2^{k+1} \end{bmatrix} \right\rangle = 0,$

we obtain that

(5.8) $\langle X^k, \Lambda^{k+1} \rangle = \lambda\,\mathrm{Tr}(X^k - X^{k+1}) - \lambda\|X^k\|_F + \lambda\langle X^{k+1}, Y^k \rangle + \mu\langle X^k, G^{k+1} \rangle - \mu\|X^{k+1}\|_1 + \langle \mathcal{A}(X^k - X^{k+1}), \mathcal{A}(X^{k+1}) - b \rangle,$

since $\widetilde{\mathcal{A}}\left(\begin{bmatrix} X_1^{k+1} \\ X_2^{k+1} \end{bmatrix}\right) = \mathcal{A}(X^{k+1})$ and $\widetilde{\mathcal{A}}\left(\begin{bmatrix} X_1^k - X_1^{k+1} \\ X_2^k - X_2^{k+1} \end{bmatrix}\right) = \mathcal{A}(X^k - X^{k+1})$, with $X^k = X_1^k + iX_2^k$ and $X^{k+1} = X_1^{k+1} + iX_2^{k+1}$. Combining

$F(X^k) - F(X^{k+1}) = \tfrac12\|\mathcal{A}(X^{k+1} - X^k)\|_2^2 + \langle \mathcal{A}(X^k - X^{k+1}), \mathcal{A}(X^{k+1}) - b \rangle + \mu(\|X^k\|_1 - \|X^{k+1}\|_1) + \lambda(\mathrm{Tr}(X^k - X^{k+1}) - \|X^k\|_F + \|X^{k+1}\|_F)$

and (5.8), we arrive at

(5.9) $F(X^k) - F(X^{k+1}) = \tfrac12\|\mathcal{A}(X^{k+1} - X^k)\|_2^2 + \mu(\|X^k\|_1 - \langle X^k, G^{k+1} \rangle) + \langle X^k, \Lambda^{k+1} \rangle + \lambda(\|X^{k+1}\|_F - \langle X^{k+1}, Y^k \rangle) \ge 0.$

Here, the last inequality follows from $\|X^k\|_1 - \langle X^k, G^{k+1} \rangle \ge 0$, $\|X^{k+1}\|_F - \langle X^{k+1}, Y^k \rangle \ge 0$, and $\langle X^k, \Lambda^{k+1} \rangle \ge 0$, since $\|G^{k+1}\|_\infty \le 1$, $\|Y^k\|_F \le 1$, and $\Lambda^{k+1} \succeq 0$. $\Box$

We next show the convergence property of Algorithm 1.
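The inequality $\|X^k\|_1 - \langle X^k, G^{k+1}\rangle \ge 0$ used above is just the subgradient inequality for the $(2,1)$-norm of the stacked real and imaginary parts, whose subgradients have entries $X_{ij}/|X_{ij}|$ on the support and magnitude at most 1 elsewhere. A quick numerical sanity check of this fact (a sketch assuming numpy; `subgrad_l1` is our illustrative helper, not notation from the paper):

```python
import numpy as np

def subgrad_l1(X):
    """A subgradient of sum_ij |X_ij| at X: X_ij/|X_ij| on the support, 0 off it."""
    mag = np.abs(X)
    return np.where(mag > 0, X / np.where(mag > 0, mag, 1.0), 0.0)

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
G = subgrad_l1(X)

# At the base point equality holds: Re<X, G> = ||X||_1.
gap_at_X = np.abs(X).sum() - np.real(np.sum(X * G.conj()))
# For any other Z the subgradient inequality gives ||Z||_1 - Re<Z, G> >= 0.
Z = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
gap_at_Z = np.abs(Z).sum() - np.real(np.sum(Z * G.conj()))
```

In the proof, $Z$ plays the role of $X^k$ and the base point plays the role of $X^{k+1}$, which is exactly where equality $\langle X^{k+1}, G^{k+1}\rangle = \|X^{k+1}\|_1$ is used.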
Theorem 5.2.
Assume that $\{X^k\}_{k\ge 0}$ is a sequence generated by Algorithm 1. We have:

(1) $\{X^k\}_{k\ge 0}$ is a bounded sequence;

(2) $\lim_{k\to\infty} \|X^{k+1} - X^k\|_F = 0$;

(3) Assume that $\widetilde{X} = \widetilde{X}_1 + i\widetilde{X}_2$ is an accumulation point of $\{X^k\}_{k\ge 0}$. Then $\widetilde{X}$ satisfies:

(i) the stationary condition

(5.10) $\lambda\left(\begin{bmatrix} I \\ 0 \end{bmatrix} - \frac{1}{\|\widetilde{X}\|_F}\begin{bmatrix} \widetilde{X}_1 \\ \widetilde{X}_2 \end{bmatrix}\right) + \mu\begin{bmatrix} \widetilde{G}_1 \\ \widetilde{G}_2 \end{bmatrix} + \widetilde{\mathcal{A}}^*\left(\widetilde{\mathcal{A}}\left(\begin{bmatrix} \widetilde{X}_1 \\ \widetilde{X}_2 \end{bmatrix}\right) - b\right) - \begin{bmatrix} \widetilde{\Lambda}_1 \\ \widetilde{\Lambda}_2 \end{bmatrix} = 0;$

(ii) the complementary slackness condition

(5.11) $\left\langle \begin{bmatrix} \widetilde{\Lambda}_1 \\ \widetilde{\Lambda}_2 \end{bmatrix}, \begin{bmatrix} \widetilde{X}_1 \\ \widetilde{X}_2 \end{bmatrix} \right\rangle = 0,$

for some $\widetilde{\Lambda}_1 + i\widetilde{\Lambda}_2 \in \mathcal{H}_+^{d\times d}$ and

(5.12) $\begin{bmatrix} \widetilde{G}_1 \\ \widetilde{G}_2 \end{bmatrix} \in \partial\left\| \begin{bmatrix} \mathrm{vec}(\widetilde{X}_1) \\ \mathrm{vec}(\widetilde{X}_2) \end{bmatrix} \right\|_{2,1},$

where the subdifferential $\partial\left\| \begin{bmatrix} \mathrm{vec}(\widetilde{X}_1) \\ \mathrm{vec}(\widetilde{X}_2) \end{bmatrix} \right\|_{2,1}$ is given in (4.5).

Proof. (1) The definition of $F$ implies $\mu\|X^{k+1}\|_1 \le F(X^{k+1})$ and hence $\|X^{k+1}\|_1 \le F(X^{k+1})/\mu \le F(X^0)/\mu$ for all $k \ge 0$. Here we use Lemma 5.1, i.e., $\{F(X^k)\}_{k\ge 0}$ is monotonically decreasing. Hence, $\{X^k\}_{k\ge 0}$ is a bounded sequence.

(2) We first consider the case where $X^1 = 0$. A simple calculation shows that $X^k = 0$ for $k \ge 1$ provided $X^1 = 0$, and we arrive at the conclusion immediately. So, we need only consider the case $X^1 \neq 0$. Taking $k = 0$ in (5.9), we obtain

$F(X^0) - F(X^1) = F(0) - F(X^1) = \tfrac12\|\mathcal{A}(X^1)\|_2^2 + \lambda\|X^1\|_F \ge \lambda\|X^1\|_F > 0,$

since $Y^0 = 0$. It implies $F(X^k) \le F(X^1) < F(0)$ for any $k \ge 1$. Hence, we obtain that $X^k \neq 0$ for all $k \ge 1$. By (5.9), we obtain

$F(X^k) - F(X^{k+1}) \ge \tfrac12\|\mathcal{A}(X^{k+1} - X^k)\|_2^2 + \lambda(\|X^{k+1}\|_F - \langle X^{k+1}, Y^k \rangle).$

Noting that $\{F(X^k)\}_{k\ge 0}$ is a convergent sequence and $\|X^{k+1}\|_F - \langle X^{k+1}, Y^k \rangle \ge 0$, we have

(5.13) $\lim_{k\to\infty} \|\mathcal{A}(X^k - X^{k+1})\|_2 = 0$

and

(5.14) $\lim_{k\to\infty} (\|X^{k+1}\|_F - \langle X^{k+1}, Y^k \rangle) = \lim_{k\to\infty}\left(\|X^{k+1}\|_F - \left\langle X^{k+1}, \frac{X^k}{\|X^k\|_F} \right\rangle\right) = 0.$

The following argument is similar to that of Proposition 3.1 (b) in [24]; we include it here for completeness. Set $c_k := \frac{\langle X^k, X^{k+1} \rangle}{\|X^k\|_F^2}$ and $E^k := X^{k+1} - c_k X^k$. It suffices to prove that $E^k \to 0$ and $c_k \to 1$. According to (5.14) and the boundedness of $\{X^k\}_{k\ge 0}$, we have

$\|E^k\|_F^2 = \|X^{k+1}\|_F^2 - \frac{\langle X^k, X^{k+1} \rangle^2}{\|X^k\|_F^2} = \left(\|X^{k+1}\|_F - \frac{\langle X^k, X^{k+1} \rangle}{\|X^k\|_F}\right)\left(\|X^{k+1}\|_F + \frac{\langle X^k, X^{k+1} \rangle}{\|X^k\|_F}\right) \to 0.$

Then we have

$0 = \lim_{k\to\infty} \|\mathcal{A}(X^k - X^{k+1})\|_2 = \lim_{k\to\infty} \|\mathcal{A}((1 - c_k)X^k - E^k)\|_2 = \lim_{k\to\infty} |1 - c_k|\,\|\mathcal{A}(X^k)\|_2.$

If $\lim_{k\to\infty} c_k \neq 1$, then there exists a subsequence $\{X^{k_j}\}$ such that $\|\mathcal{A}(X^{k_j})\|_2 \to 0$. Therefore, we can obtain that

$\lim_{k_j\to\infty} F(X^{k_j}) \ge \lim_{k_j\to\infty} \tfrac12\|\mathcal{A}(X^{k_j}) - b\|_2^2 = \tfrac12\|b\|_2^2 = F(0),$

which contradicts the fact that $F(X^{k_j}) \le F(X^1) < F(0)$. Thus we get $c_k \to 1$ and $E^k \to 0$, and hence $X^{k+1} - X^k \to 0$ as $k \to \infty$.

(3) Assume that $\{X^{k_j}\}_{j\ge 0} \subset \{X^k\}_{k\ge 0}$ is a subsequence satisfying $\lim_{j\to\infty} X^{k_j} = \widetilde{X} = \widetilde{X}_1 + i\widetilde{X}_2 \neq 0$. For simplicity, we abuse the notation and denote $\{X^{k_j}\}$ by $\{X^j\}$. Replacing $k$ by $j - 1$ in (5.5) and (5.6), we have

(5.15) $\lambda\left(\begin{bmatrix} I \\ 0 \end{bmatrix} - \begin{bmatrix} Y_1^{j-1} \\ Y_2^{j-1} \end{bmatrix}\right) + \mu\begin{bmatrix} G_1^j \\ G_2^j \end{bmatrix} + \widetilde{\mathcal{A}}^*\left(\widetilde{\mathcal{A}}\left(\begin{bmatrix} X_1^j \\ X_2^j \end{bmatrix}\right) - b\right) = \begin{bmatrix} \Lambda_1^j \\ \Lambda_2^j \end{bmatrix}$

and

(5.16) $\left\langle \begin{bmatrix} \Lambda_1^j \\ \Lambda_2^j \end{bmatrix}, \begin{bmatrix} X_1^j \\ X_2^j \end{bmatrix} \right\rangle = 0,$

for some $\Lambda^j = \Lambda_1^j + i\Lambda_2^j \in \mathcal{H}_+^{d\times d}$ and $G^j = G_1^j + iG_2^j$ with

(5.17) $\begin{bmatrix} G_1^j \\ G_2^j \end{bmatrix} \in \partial\left\| \begin{bmatrix} \mathrm{vec}(X_1^j) \\ \mathrm{vec}(X_2^j) \end{bmatrix} \right\|_{2,1}.$

Note that (5.15) is equivalent to

(5.18) $\lambda\left(\begin{bmatrix} I \\ 0 \end{bmatrix} - \begin{bmatrix} Y_1^{j-1} \\ Y_2^{j-1} \end{bmatrix}\right) + \widetilde{\mathcal{A}}^*\left(\widetilde{\mathcal{A}}\left(\begin{bmatrix} X_1^j \\ X_2^j \end{bmatrix}\right) - b\right) = \begin{bmatrix} \Lambda_1^j \\ \Lambda_2^j \end{bmatrix} - \mu\begin{bmatrix} G_1^j \\ G_2^j \end{bmatrix}.$

Noting that $\lim_{j\to\infty} \begin{bmatrix} Y_1^{j-1} \\ Y_2^{j-1} \end{bmatrix} = \frac{1}{\|\widetilde{X}\|_F}\begin{bmatrix} \widetilde{X}_1 \\ \widetilde{X}_2 \end{bmatrix}$, we obtain that the left-hand side of (5.18) converges:

(5.19) $\lim_{j\to\infty}\left[ \lambda\left(\begin{bmatrix} I \\ 0 \end{bmatrix} - \begin{bmatrix} Y_1^{j-1} \\ Y_2^{j-1} \end{bmatrix}\right) + \widetilde{\mathcal{A}}^*\left(\widetilde{\mathcal{A}}\left(\begin{bmatrix} X_1^j \\ X_2^j \end{bmatrix}\right) - b\right)\right] = \lambda\left(\begin{bmatrix} I \\ 0 \end{bmatrix} - \frac{1}{\|\widetilde{X}\|_F}\begin{bmatrix} \widetilde{X}_1 \\ \widetilde{X}_2 \end{bmatrix}\right) + \widetilde{\mathcal{A}}^*\left(\widetilde{\mathcal{A}}\left(\begin{bmatrix} \widetilde{X}_1 \\ \widetilde{X}_2 \end{bmatrix}\right) - b\right).$

For convenience, we set $P^j := \begin{bmatrix} \Lambda_1^j \\ \Lambda_2^j \end{bmatrix}$ and $Q^j := -\mu\begin{bmatrix} G_1^j \\ G_2^j \end{bmatrix}$. According to (5.17) and Proposition 4.3, we have $\|G_1^j\|_\infty \le 1$ and $\|G_2^j\|_\infty \le 1$. Combining (5.18) and the boundedness of $\{X^j\}_{j\ge 0}$, we obtain that $\{P^j\}_{j\ge 0}$ and $\{Q^j\}_{j\ge 0}$ are also bounded sequences, which therefore belong to some compact sets $S \subset \mathcal{H}_+^{d\times d}$ and $T$, respectively.

We assume that $\{j_l\}_{l\ge 0}$ is a subsequence of $\{j\}_{j\ge 0}$ such that $\lim_{l\to\infty} P^{j_l} = \widetilde{P}$ and $\lim_{l\to\infty} Q^{j_l} = \widetilde{Q}$ for some $\widetilde{P} \in S$, $\widetilde{Q} \in T$. More concretely, we have

$\lim_{l\to\infty} P^{j_l} = \lim_{l\to\infty} \begin{bmatrix} \Lambda_1^{j_l} \\ \Lambda_2^{j_l} \end{bmatrix} = \begin{bmatrix} \widetilde{\Lambda}_1 \\ \widetilde{\Lambda}_2 \end{bmatrix} \qquad\text{and}\qquad \lim_{l\to\infty} Q^{j_l} = \lim_{l\to\infty} -\mu\begin{bmatrix} G_1^{j_l} \\ G_2^{j_l} \end{bmatrix} = -\mu\begin{bmatrix} \widetilde{G}_1 \\ \widetilde{G}_2 \end{bmatrix}$

for some $\widetilde{\Lambda}_1 + i\widetilde{\Lambda}_2 \in S \subset \mathcal{H}_+^{d\times d}$. According to (5.18), we have

$\lambda\left(\begin{bmatrix} I \\ 0 \end{bmatrix} - \frac{1}{\|\widetilde{X}\|_F}\begin{bmatrix} \widetilde{X}_1 \\ \widetilde{X}_2 \end{bmatrix}\right) + \widetilde{\mathcal{A}}^*\left(\widetilde{\mathcal{A}}\left(\begin{bmatrix} \widetilde{X}_1 \\ \widetilde{X}_2 \end{bmatrix}\right) - b\right) = \begin{bmatrix} \widetilde{\Lambda}_1 \\ \widetilde{\Lambda}_2 \end{bmatrix} - \mu\begin{bmatrix} \widetilde{G}_1 \\ \widetilde{G}_2 \end{bmatrix},$

which implies the stationary condition (5.10). The complementary slackness condition (5.11) is obtained by

$\left\langle \begin{bmatrix} \widetilde{\Lambda}_1 \\ \widetilde{\Lambda}_2 \end{bmatrix}, \begin{bmatrix} \widetilde{X}_1 \\ \widetilde{X}_2 \end{bmatrix} \right\rangle = \lim_{l\to\infty} \left\langle \begin{bmatrix} \Lambda_1^{j_l} \\ \Lambda_2^{j_l} \end{bmatrix}, \begin{bmatrix} X_1^{j_l} \\ X_2^{j_l} \end{bmatrix} \right\rangle = 0.$

Here, we use (5.16).

It remains to prove (5.12). For sufficiently large $j_l$, we have $\mathrm{supp}(\widetilde{X}) \subset \mathrm{supp}(X^{j_l})$. If $(i_1, i_2) \in \mathrm{supp}(\widetilde{X})$, then

$\lim_{l\to\infty} ([G_1^{j_l}]_{i_1,i_2}, [G_2^{j_l}]_{i_1,i_2}) = \frac{([\widetilde{X}_1]_{i_1,i_2}, [\widetilde{X}_2]_{i_1,i_2})}{\sqrt{[\widetilde{X}_1]_{i_1,i_2}^2 + [\widetilde{X}_2]_{i_1,i_2}^2}}.$

If $(i_1, i_2) \notin \mathrm{supp}(\widetilde{X})$, we have $([G_1^{j_l}]_{i_1,i_2})^2 + ([G_2^{j_l}]_{i_1,i_2})^2 \le 1$ and hence $([\widetilde{G}_1]_{i_1,i_2})^2 + ([\widetilde{G}_2]_{i_1,i_2})^2 \le 1$. Thus

$\lim_{l\to\infty} \begin{bmatrix} G_1^{j_l} \\ G_2^{j_l} \end{bmatrix} = \begin{bmatrix} \widetilde{G}_1 \\ \widetilde{G}_2 \end{bmatrix} \in \partial\left\| \begin{bmatrix} \mathrm{vec}(\widetilde{X}_1) \\ \mathrm{vec}(\widetilde{X}_2) \end{bmatrix} \right\|_{2,1},$

which leads to (5.12). $\Box$

Numerical experiments
The purpose of the numerical experiments is to compare the performance of (1.8) with that of SPARTA [18], SWF [25], and SPRSF [11]. We choose the parameters of those algorithms as in [18, 25, 11]. In this section, we use the relative error

$\text{relative error} := \frac{d_r(z, x)}{\|x\|_2},$

where $d_r(z, x) = \min\|z \pm x\|_2$ in the real case and $d_r(z, x) = \min_{\theta\in[0, 2\pi)} \|e^{-i\theta}z - x\|_2$ in the complex case. In our numerical experiments, we assume that the sampling vectors $a_j$, $j = 1, \ldots, m$, are Gaussian random vectors, i.e., $a_j \sim \mathcal{N}(0, I_d)$ in the real case and $a_j \sim \mathcal{N}(0, I_d) + i\mathcal{N}(0, I_d)$ in the complex case.

For each fixed $k$, the support of a $k$-sparse signal $x$ is drawn from the uniform distribution over the set of all subsets of $\{1, \ldots, d\}$ of size $k$. The non-zero entries of the real (resp. complex) $k$-sparse signal $x$ have the Gaussian distribution $\mathcal{N}(0, 1)$ (resp. $\mathcal{N}(0, 1) + i\mathcal{N}(0, 1)$), after which $x$ is normalized so that $\|x\|_2 = 1$. All experiments are carried out in Matlab 2017 on a 3.7GHz Intel Core i7-8700K with 64 GB memory.

Example 6.1.
The aim of this numerical experiment is to test the success rate of Algorithm 1 against the measurement number $m$. In this example, we take $k = 5$ and $d = 50$, and vary the ratio between $m$ and $d$ over an equispaced grid. We choose fixed values of $\mu$ and $\lambda$ (with $\lambda$ proportional to $\mu$) in Algorithm 1. We classify a recovery as a success if the relative error falls below a fixed small tolerance. For each fixed $m$, we repeat the experiment over independent trials and calculate the success rate.

Figure 1 depicts the empirical probability of successful recovery against the measurement number $m$. The numerical results show that Algorithm 1 outperforms the other algorithms.
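The test-signal generation, the Gaussian measurement model, and the phase-invariant error metric described above can be sketched in numpy (rather than the authors' MATLAB) as follows; this is an illustration under the stated distributions, and the function names are ours:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_sparse_signal(d, k, complex_case=True):
    """Unit-norm k-sparse signal: uniform random support, Gaussian entries."""
    x = np.zeros(d, dtype=complex if complex_case else float)
    supp = rng.choice(d, size=k, replace=False)
    vals = rng.standard_normal(k)
    if complex_case:
        vals = vals + 1j * rng.standard_normal(k)
    x[supp] = vals
    return x / np.linalg.norm(x)

def measure(x, m, complex_case=True):
    """Quadratic measurements b_j = |<a_j, x>|^2 with Gaussian sensing vectors."""
    d = len(x)
    A = rng.standard_normal((m, d))
    if complex_case:
        A = A + 1j * rng.standard_normal((m, d))
    return A, np.abs(A.conj() @ x) ** 2

def relative_error(z, x):
    """d_r(z, x)/||x||: distance up to a global phase (up to sign in the real case)."""
    if np.iscomplexobj(z) or np.iscomplexobj(x):
        phase = np.exp(1j * np.angle(np.vdot(z, x)))  # optimal phase alignment
        d = np.linalg.norm(phase * z - x)
    else:
        d = min(np.linalg.norm(z - x), np.linalg.norm(z + x))
    return d / np.linalg.norm(x)
```

The closed form for the optimal phase, $\theta^* = -\arg\langle z, x\rangle$, avoids a grid search over $\theta \in [0, 2\pi)$.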
Figure 1. Comparison of different algorithms (SPRSF, SWF, SPARTA, Algorithm 1) for fixed $k = 5$ with varying $m/d$ ratio: (a) noiseless real-valued Gaussian model; (b) noiseless complex-valued Gaussian model.
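The success-rate experiment behind Figure 1 reduces to a Monte Carlo loop of the following shape. This is only an illustrative harness: `solver` stands in for Algorithm 1 (or any competitor), and its interface, the default tolerance, and the trial count are assumptions of ours:

```python
import numpy as np

def success_rate(solver, d, k, m, trials=100, tol=1e-3, seed=0):
    """Empirical probability that `solver(A, b, k)` recovers a random k-sparse
    unit-norm complex signal from b_j = |<a_j, x>|^2, up to a global phase."""
    rng = np.random.default_rng(seed)
    successes = 0
    for _ in range(trials):
        x = np.zeros(d, dtype=complex)
        supp = rng.choice(d, size=k, replace=False)
        x[supp] = rng.standard_normal(k) + 1j * rng.standard_normal(k)
        x /= np.linalg.norm(x)
        A = rng.standard_normal((m, d)) + 1j * rng.standard_normal((m, d))
        b = np.abs(A.conj() @ x) ** 2
        z = solver(A, b, k)
        phase = np.exp(1j * np.angle(np.vdot(z, x)))  # best phase alignment
        if np.linalg.norm(phase * z - x) / np.linalg.norm(x) < tol:
            successes += 1
    return successes / trials
```

Plotting this rate against $m/d$ for each solver reproduces the layout of Figure 1.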
Example 6.2.
In this example, we test the success rate of Algorithm 1 against the sparsity level $k$. We take $d = 50$ and $m = 2d$. The parameters $\mu$ and $\lambda$ in Algorithm 1 are taken as in Example 6.1. Figure 2 depicts the numerical results. It shows that Algorithm 1 is superior to SPRSF, SWF and SPARTA in both the real and the complex case. Furthermore, Algorithm 1 maintains good performance even at relatively large sparsity levels.
Figure 2. Comparison of different algorithms for varying sparsity level $k$: (a) noiseless real-valued Gaussian model; (b) noiseless complex-valued Gaussian model.

Example 6.3.
In this example, we test the robustness of Algorithm 1. We take $d = 50$, $m = 2d$, and $k = 5$ for both the real and the complex case, and add white Gaussian noise via the MATLAB function awgn($\mathcal{A}(x)$, snr), i.e., $b_j = |\langle a_j, x \rangle|^2 + w_j$, $j = 1, \ldots, m$, with

$w \sim \sqrt{\frac{\|\mathcal{A}(x)\|_2^2 / m}{10^{\,\mathrm{snr}/10}}}\; \mathcal{N}(0, I_m).$

Since the other algorithms fail to recover the signal under this setting, we only show the robustness performance of Algorithm 1. The SNR value varies from 10dB to 50dB, with step-size 5dB. The result at each noise level is averaged over 20 independent trials. According to Theorem 1.2, we choose $\mu$ proportional to $\|w\|_2$ (bounded below by a small constant) and take $\lambda$ as in Example 6.1. We compute the signal-to-noise ratio of the reconstruction in dB as $-20\log_{10}(\text{relative error})$. Figure 3 shows that Algorithm 1 yields robust recovery with respect to the different noise levels. In addition, the recovery error is a bit larger in the complex case.

References

[1] Dimitri P. Bertsekas. Nonlinear Programming. Athena Scientific, 1999.
[2] Dimitri P. Bertsekas, Angelia Nedic, and Asuman E. Ozdaglar. Convex Analysis and Optimization. Athena Scientific, 2003.
[3] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers.
Foundations and Trends in Machine Learning, 3(1):1–122, 2010.
[4] T. Tony Cai, Xiaodong Li, and Zongming Ma. Optimal rates of convergence for noisy sparse phase retrieval via thresholded Wirtinger flow. The Annals of Statistics, 44(5):2221–2251, 2016.
[5] Emmanuel J. Candès and Xiaodong Li. Solving quadratic equations via PhaseLift when there are about as many equations as unknowns. Foundations of Computational Mathematics, 14(5):1017–1026, 2014.
Figure 3. SNR of signal recovery vs. noise level in the measurements when $k = 5$: the $x$-axis is the noise level varying from 10dB to 50dB; the $y$-axis is the reconstruction error in dB as $-20\log_{10}(\text{relative error})$.

[6] Emmanuel J. Candès, Thomas Strohmer, and Vladislav Voroninski. PhaseLift: exact and stable signal recovery from magnitude measurements via convex programming. Communications on Pure and Applied Mathematics, 66(8):1241–1274, 2013.
[7] Emmanuel J. Candès, Yonina C. Eldar, Thomas Strohmer, and Vladislav Voroninski. Phase retrieval via matrix completion.
SIAM Review, 57(2):225–251, 2015.
[8] Aldo Conca, Dan Edidin, Milena Hering, and Cynthia Vinzant. An algebraic characterization of injectivity in phase retrieval. Applied and Computational Harmonic Analysis, 38(2):346–356, 2015.
[9] Bingsheng He and Xiaoming Yuan. On the O(1/n) convergence rate of the Douglas–Rachford alternating direction method. SIAM Journal on Numerical Analysis, 50(2):700–709, 2012.
[10] Xiaodong Li and Vladislav Voroninski. Sparse signal recovery from quadratic measurements via convex programming. SIAM Journal on Mathematical Analysis, 45(5):3019–3033, 2013.
[11] Samuel Pinilla, Jorge Bacca, and Henry Arguello. SPRSF: sparse phase retrieval via smoothing function. arXiv:1807.09703.
[12] Yoav Shechtman, Yonina C. Eldar, Alexander Szameit, and Mordechai Segev. Sparsity based sub-wavelength imaging with partially incoherent light via quadratic compressed sensing. Optics Express, 19(16):14807–14822, 2011.
[13] Yoav Shechtman, Amir Beck, and Yonina C. Eldar. GESPAR: efficient phase retrieval of sparse signals. IEEE Transactions on Signal Processing, 62(4):928–938, 2014.
[14] Pham D. Tao and Le T. H. An. Convex analysis approach to d.c. programming: theory, algorithms and applications. Acta Mathematica Vietnamica, 22:289–355, 1997.
[15] Pham D. Tao and Le T. H. An. A d.c. optimization algorithm for solving the trust-region subproblem. SIAM Journal on Optimization, 8(2):476–505, 1998.
[16] ∼romanv/papers/papers.html.
[17] Vladislav Voroninski and Zhiqiang Xu. A strong restricted isometry property, with an application to phaseless compressed sensing. Applied and Computational Harmonic Analysis, 40(2):386–395, 2016.
[18] Gang Wang, Liang Zhang, and Georgios B. Giannakis. Sparse phase retrieval via truncated amplitude flow. IEEE Transactions on Signal Processing, 66(2):479–491, 2018.
[19] Yang Wang and Zhiqiang Xu. Phase retrieval for sparse signals. Applied and Computational Harmonic Analysis, 37(3):531–544, 2014.
[20] Yang Wang and Zhiqiang Xu. Generalized phase retrieval: measurement number, matrix recovery and beyond. Applied and Computational Harmonic Analysis, 47(2):423–446, 2019.
[21] Yu Xia and Song Li. Identifiability of multichannel blind deconvolution and nonconvex regularization algorithm. IEEE Transactions on Signal Processing, 66(20):5299–5312, 2018.
[22] Yu Xia and Zhiqiang Xu. The recovery of complex sparse signals from few phaseless measurements. To appear in Applied and Computational Harmonic Analysis.
[23] Penghang Yin and Jack Xin. PhaseLiftOff: an accurate and stable phase retrieval method based on difference of trace and Frobenius norms. Communications in Mathematical Sciences, DOI: 10.4310/CMS.2015.v13.n4.a10, 2015.
[24] Penghang Yin, Yifei Lou, Qi He, and Jack Xin. Minimization of $\ell_{1-2}$ for compressed sensing. SIAM Journal on Scientific Computing, 37(1):A536–A563, 2015.
[25] Ziyang Yuan, Hongxia Wang, and Qi Wang. Phase retrieval via sparse Wirtinger flow. Journal of Computational and Applied Mathematics, 355:162–173, 2019.
Department of Mathematics, Hangzhou Normal University, Hangzhou 311121, China
E-mail address: [email protected]

LSEC, Inst. Comp. Math., Academy of Mathematics and System Science, Chinese Academy of Sciences, Beijing 100091, China
School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
E-mail address: