SDP Achieves Exact Minimax Optimality in Phase Synchronization
Chao Gao (University of Chicago) and Anderson Y. Zhang (University of Pennsylvania)
January 8, 2021
Abstract
We study the phase synchronization problem with noisy measurements $Y=z^*z^{*\mathrm{H}}+\sigma W\in\mathbb{C}^{n\times n}$, where $z^*$ is an $n$-dimensional complex unit-modulus vector and $W$ is a complex-valued Gaussian random matrix. It is assumed that each entry $Y_{jk}$ is observed with probability $p$. We prove that an SDP relaxation of the MLE achieves the error bound $(1+o(1))\frac{\sigma^2}{2np}$ under a normalized squared $\ell_2$ loss. This result matches the minimax lower bound of the problem, and even the leading constant is sharp. The analysis of the SDP is based on an equivalent non-convex programming whose solution can be characterized as a fixed point of the generalized power iteration lifted to a higher dimensional space. This viewpoint unifies the proofs of the statistical optimality of three different methods: MLE, SDP, and generalized power method. The technique is also applied to the analysis of the SDP for $\mathbb{Z}_2$ synchronization, and we achieve the minimax optimal error $\exp\left(-(1-o(1))\frac{np}{2\sigma^2}\right)$ with a sharp constant in the exponent.

1 Introduction

Consider the problem of phase synchronization [26] with observations
$$Y_{jk}=z_j^*\bar{z}_k^*+\sigma W_{jk}\in\mathbb{C}, \quad (1)$$
for $1\le j<k\le n$. Our goal is to estimate the unit-modulus entries $z_1^*,\cdots,z_n^*\in\mathbb{C}_1=\{x\in\mathbb{C}:|x|=1\}$. Since $|z_j^*|=1$, we can write $z_j^*=e^{i\theta_j^*}$ with some $\theta_j^*\in(0,2\pi]$ for all $j\in[n]$, and thus $Y_{jk}$ is understood to be a noisy observation of the pairwise difference between the two angles $\theta_j^*$ and $\theta_k^*$. Following [2, 5, 17, 30], we consider an additive noise model and assume that $W_{jk}$ is a standard complex Gaussian variable independently for all $1\le j<k\le n$. That is, $W_{jk}\sim\mathcal{CN}(0,1)$, so that $\mathrm{Re}(W_{jk})\sim N(0,\tfrac12)$ and $\mathrm{Im}(W_{jk})\sim N(0,\tfrac12)$ independently. The estimation of $z^*\in\mathbb{C}_1^n$ has been studied by [16] under the loss function
$$\ell(\hat{z},z)=\min_{a\in\mathbb{C}_1}\frac{1}{n}\sum_{j=1}^n|\hat{z}_j-z_ja|^2.$$
(2)
We note that the minimization over $a\in\mathbb{C}_1$ in the definition of (2) is necessary, since a global rotation of the angles $\theta_1^*,\cdots,\theta_n^*$ does not change the distribution of the observations $\{Y_{jk}\}_{1\le j<k\le n}$.

Theorem 1.1. Assume $\sigma^2=o(np)$ and $\frac{np}{\log n}\to\infty$. Let $\hat{Z}$ be a global maximizer of the SDP (13) and $\hat{z}_j=u_j/|u_j|$ for $j\in[n]$ with $u\in\mathbb{C}^n$ being the leading eigenvector of $\hat{Z}$. There exists some $\delta=o(1)$ such that
$$\frac{1}{n^2}\|\hat{Z}-z^*z^{*\mathrm{H}}\|_{\mathrm{F}}^2\le(1+\delta)\frac{\sigma^2}{np},\qquad \ell(\hat{z},z^*)\le(1+\delta)\frac{\sigma^2}{2np},$$
with probability at least $1-n^{-1}-\exp\left(-\left(\frac{np}{\sigma^2}\right)^{1/4}\right)$. In particular, $\delta$ can be chosen to satisfy $\delta=O\left(\left(\frac{\log n+\sigma^2}{np}\right)^{1/4}\right)$.

Compared with the minimax lower bound (Theorem 2.1 in Section 2), Theorem 1.1 shows that the SDP leads to minimax optimal estimation of both the matrix $z^*z^{*\mathrm{H}}$ and the vector $z^*$. The two error bounds are not just rate-optimal; the leading constants are sharp as well. We remark that both conditions $\sigma^2=o(np)$ and $\frac{np}{\log n}\to\infty$ are essential for the results of the above theorem to hold. Since the minimax risk of the problem is of order $\frac{\sigma^2}{np}$, the condition $\sigma^2=o(np)$, which is equivalent to $\frac{\sigma^2}{np}=o(1)$, guarantees that the minimax risk is of smaller order than the trivial one. The order $O(1)$ is trivial, as it can simply be achieved by a random guess. The condition $\frac{np}{\log n}\to\infty$ guarantees that the random graph $A$ is connected with high probability. It is known that when $p\le\frac{c\log n}{n}$ for some sufficiently small constant $c>0$, the random graph has several disjoint components, which makes the recovery of $z^*$ up to a global phase impossible.

Our analysis of the SDP does not rely on its connection to the MLE, and it is therefore fundamentally different from the approaches considered by [5, 23, 30].
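As a concrete illustration of the observation model (1) and the loss (2), here is a minimal simulation sketch in Python; the helper names `simulate` and `loss` are ours, not the paper's, and we take the fully observed case for simplicity.

```python
import numpy as np

def simulate(n, sigma, rng):
    """Draw z* uniformly on the unit circle and form Y = z* z*^H + sigma * W."""
    theta = rng.uniform(0, 2 * np.pi, size=n)
    z = np.exp(1j * theta)                       # z*_j = e^{i theta*_j}
    # Standard complex Gaussian noise: Re and Im parts are N(0, 1/2), independent.
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    W = np.triu(G, 1) / np.sqrt(2)
    W = W + W.conj().T                           # Hermitian, zero diagonal
    Y = np.outer(z, z.conj()) + sigma * W
    np.fill_diagonal(Y, 1.0)
    return z, Y

def loss(zhat, z):
    """The loss (2): minimum over a global phase a with |a| = 1.
    The minimizing a is the phase of z^H zhat."""
    w = np.vdot(z, zhat)                         # z^H zhat
    a = w / abs(w)
    return np.mean(np.abs(zhat - z * a) ** 2)

rng = np.random.default_rng(0)
z, Y = simulate(200, 0.5, rng)
```

The global-phase minimization makes the loss invariant under $\hat{z}\mapsto \hat{z}e^{i\phi}$, which is exactly the identifiability issue discussed above.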
To study the statistical properties of the SDP directly, we consider the following iteration procedure,
$$V_j^{(t)}=\frac{\sum_{k\in[n]\setminus\{j\}}\bar{Y}_{jk}V_k^{(t-1)}}{\left\|\sum_{k\in[n]\setminus\{j\}}\bar{Y}_{jk}V_k^{(t-1)}\right\|}\in\mathbb{C}^n,\quad j=1,\cdots,n. \quad (6)$$
Define the matrix $V^{(t)}\in\mathbb{C}^{n\times n}$ with its $j$th column being $V_j^{(t)}$. The above iteration can be shorthanded as $V^{(t)}=f(V^{(t-1)})$. We use (6) as a non-convex characterization of the SDP (5), because the solution to (5) can always be written as $\hat{Z}=\hat{V}^{\mathrm{H}}\hat{V}$ for some $\hat{V}\in\mathbb{C}^{n\times n}$ satisfying the fixed-point equation $\hat{V}=f(\hat{V})$. Note that the iterative procedure (6) resembles the formula of the generalized power method (GPM) [8, 26, 30],
$$z_j^{(t)}=\frac{\sum_{k\in[n]\setminus\{j\}}Y_{jk}z_k^{(t-1)}}{\left|\sum_{k\in[n]\setminus\{j\}}Y_{jk}z_k^{(t-1)}\right|}\in\mathbb{C},\quad j=1,\cdots,n. \quad (7)$$
We can therefore think of (6) as a lift of the GPM (7) into a higher dimensional space. This allows us to analyze the statistical error of the SDP from an iterative algorithm perspective, and previous techniques of analyzing general iterative algorithms in [15, 24] can be borrowed for the current purpose. To understand the exact statistical error of the SDP, we establish the following convergence result for the iterative procedure (6),
$$\ell(V^{(t)},z^*)\le\delta\,\ell(V^{(t-1)},z^*)+\text{optimal statistical error},\quad\text{for all }t\ge1, \quad (8)$$
for some $\delta=o(1)$ with high probability, as long as it is properly initialized. Here, with slight abuse of notation, the loss of $\hat{V}$ is defined by
$$\ell(\hat{V},z^*)=\min_{a\in\mathbb{C}^n:\|a\|=1}\frac{1}{n}\sum_{j=1}^n\left\|\hat{V}_j-\bar{z}_j^*a\right\|^2, \quad (9)$$
which is natural given that the matrix $\hat{Z}=\hat{V}^{\mathrm{H}}\hat{V}$ is used to estimate $z^*z^{*\mathrm{H}}$. Since the SDP solution is a fixed point of the iteration (6), the convergence result (8) directly leads to the sharp statistical error bounds in Theorem 1.1.

Our analysis of the SDP through (6) also unifies the understandings of the GPM and the MLE.
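To make the lift concrete, the following sketch (our notation; full observation, so every pair is used) implements one step of the map $f$ in (6) for a matrix $V$ with unit-norm columns, together with the loss (9) in the closed form $\ell(V,z^*)=2-\frac{2}{n}\|\sum_j z_j^*V_j\|$ derived later in Lemma 5.5. Starting from the rank-one embedding $V_j=\bar{z}_j^*e_1$ (a single-row matrix), one step of (6) reproduces a GPM step (7).

```python
import numpy as np

def f_step(V, Y):
    """One step of the lifted iteration (6): column j <- normalized
    sum_{k != j} conj(Y_jk) V_k, where V has unit-norm columns."""
    S = V @ Y.conj().T - V * np.diag(Y).conj()   # subtract the k = j term
    return S / np.linalg.norm(S, axis=0, keepdims=True)

def lifted_loss(V, z):
    """Loss (9) in closed form: 2 - (2/n) || sum_j z_j V_j ||."""
    n = len(z)
    return 2 - 2 * np.linalg.norm(V @ z) / n

rng = np.random.default_rng(1)
n, sigma = 300, 0.5
z = np.exp(1j * rng.uniform(0, 2 * np.pi, n))
G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
W = np.triu(G, 1) / np.sqrt(2); W = W + W.conj().T
Y = np.outer(z, z.conj()) + sigma * W
np.fill_diagonal(Y, 1.0)

V0 = z.conj()[None, :]          # V_j = conj(z_j) e1: the GPM embedding, loss 0
V1 = f_step(V0, Y)              # one lifted step; stays near the truth
```

The columns of `V1` remain unit vectors, and the loss after one step from the truth is at the oracle noise level rather than $O(1)$, which is the mechanism behind the convergence statement (8).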
Given the relation between (6) and (7), the convergence result (8) directly implies
$$\ell(z^{(t)},z^*)\le\delta\,\ell(z^{(t-1)},z^*)+\text{optimal statistical error},\quad\text{for all }t\ge1, \quad (10)$$
for some $\delta=o(1)$ with high probability, as long as the GPM is properly initialized. This provides an alternative proof of the minimax optimality of the GPM that has been previously established by [16]. In addition, just as the SDP can be viewed as a fixed point of the iteration (6), the MLE can be viewed as a fixed point of the iteration (7). The minimax optimality of the MLE can also be derived. To summarize, we are able to show the exact minimax optimality of the SDP, GPM, and MLE using a single proof based on the iterative procedure (6).

In addition to phase synchronization, we also establish the optimality of the SDP for $\mathbb{Z}_2$ synchronization. In the setting of $\mathbb{Z}_2$ synchronization, one observes $Y_{jk}=z_j^*z_k^*+\sigma W_{jk}\in\mathbb{R}$ for $1\le j<k\le n$, and the goal is to estimate $z_1^*,\cdots,z_n^*\in\{-1,1\}$. Assume $W_{jk}\sim N(0,1)$ and that each $Y_{jk}$ is observed with probability $p$. We show that the SDP for $\mathbb{Z}_2$ synchronization achieves the error
$$\exp\left(-(1-o(1))\frac{np}{2\sigma^2}\right). \quad (11)$$
We also prove a matching lower bound for this problem. Since $\mathbb{Z}_2$ synchronization is a discrete parameter estimation problem, the minimax risk is an exponential function of the signal-to-noise ratio, compared with the polynomial function for phase synchronization. Despite being a continuous optimization method, the SDP is able to adapt to the discreteness of the problem. The exponential rate (11) has been previously derived for $p=1$ by [13]. Our analysis based on the iterative algorithm perspective generalizes their result to more general values of $p\gg\frac{\log n}{n}$.

Paper Organization. The rest of the paper is organized as follows. In Section 2, we establish the statistical optimality of the SDP for phase synchronization. The implications of the SDP analysis on the statistical error bounds of the GPM and the MLE are discussed in Section 3.
The analysis of the SDP for $\mathbb{Z}_2$ synchronization is presented in Section 4. Finally, Section 5 collects all the technical proofs of the paper.

Notation. For $d\in\mathbb{N}$, we write $[d]=\{1,\ldots,d\}$. Given $a,b\in\mathbb{R}$, we write $a\vee b=\max(a,b)$ and $a\wedge b=\min(a,b)$. For a set $S$, we use $\mathbb{I}\{S\}$ and $|S|$ to denote its indicator function and cardinality respectively. For a complex number $x\in\mathbb{C}$, we use $\bar{x}$ for its complex conjugate, $\mathrm{Re}(x)$ for its real part, $\mathrm{Im}(x)$ for its imaginary part, and $|x|$ for its modulus. For a complex vector $x\in\mathbb{C}^d$, we use $\|x\|=\sqrt{\sum_{j=1}^d|x_j|^2}$ for its norm. For a matrix $B=(B_{jk})\in\mathbb{C}^{d_1\times d_2}$, we use $B^{\mathrm{H}}\in\mathbb{C}^{d_2\times d_1}$ for its conjugate transpose, so that $B^{\mathrm{H}}=(\bar{B}_{kj})$. The Frobenius norm and the operator norm of $B$ are defined by $\|B\|_{\mathrm{F}}=\sqrt{\sum_{j=1}^{d_1}\sum_{k=1}^{d_2}|B_{jk}|^2}$ and $\|B\|_{\mathrm{op}}=\sup_{u\in\mathbb{C}^{d_1},v\in\mathbb{C}^{d_2}:\|u\|=\|v\|=1}u^{\mathrm{H}}Bv$. We use $\mathrm{Tr}(B)$ for the trace of a square matrix $B$. For $U,V\in\mathbb{C}^{d_1\times d_2}$, $U\circ V\in\mathbb{C}^{d_1\times d_2}$ is the Hadamard product $U\circ V=(U_{jk}V_{jk})$. The notations $\mathbb{P}$ and $\mathbb{E}$ are generic probability and expectation operators whose distribution is determined from the context.

2 Optimality of the SDP for Phase Synchronization

Recall that we observe a random graph $A_{jk}\sim\text{Bernoulli}(p)$ independently for all $1\le j<k\le n$, and the entry $Y_{jk}$ given by (1) is observed whenever $A_{jk}=1$. The observations can be organized as an adjacency matrix $A$ and a masked version of the pairwise interactions $A\circ Y$. All the matrices $A$, $W$, and $Y$ are Hermitian, as we define $A_{jk}=A_{kj}$, $W_{jk}=\bar{W}_{kj}$, and $Y_{jk}=\bar{Y}_{kj}$ for all $1\le k<j\le n$, and $A_{jj}=W_{jj}=0$ and $Y_{jj}=1$ for all $j\in[n]$. Hence, we have the matrix representation $Y=z^*z^{*\mathrm{H}}+\sigma W$.

To estimate the vector $z^*\in\mathbb{C}_1^n$, the MLE is defined as a global maximizer of the following optimization problem,
$$\max_{z\in\mathbb{C}_1^n}z^{\mathrm{H}}(A\circ Y)z. \quad (12)$$
Since (12) is computationally infeasible, we consider the following convex relaxation of (12) via SDP,
$$\max_{Z=Z^{\mathrm{H}}\in\mathbb{C}^{n\times n}}\mathrm{Tr}((A\circ Y)Z)\quad\text{subject to}\quad\mathrm{diag}(Z)=I_n\text{ and }Z\succeq0.$$
(13)
The goal of our paper is to establish the statistical optimality of the SDP (13). We first provide a minimax lower bound as the benchmark of the problem.

Theorem 2.1 (Theorem 4.1 of [16]). Assume $\sigma^2=o(np)$. Then, we have
$$\inf_{\hat{Z}\in\mathbb{C}^{n\times n}}\sup_{z\in\mathbb{C}_1^n}\mathbb{E}_z\frac{1}{n^2}\|\hat{Z}-zz^{\mathrm{H}}\|_{\mathrm{F}}^2\ge(1-\delta)\frac{\sigma^2}{np},\qquad \inf_{\hat{z}\in\mathbb{C}_1^n}\sup_{z\in\mathbb{C}_1^n}\mathbb{E}_z\,\ell(\hat{z},z)\ge(1-\delta)\frac{\sigma^2}{2np},$$
for some $\delta=o(1)$.

The above theorem has been established by [16] as the minimax lower bound for phase synchronization. In fact, Theorem 4.1 of [16] only states the lower bound result for the loss function $\ell(\hat{z},z)$. However, the proof of Theorem 4.1 of [16] actually establishes the lower bound under the loss $\frac{1}{n^2}\|\hat{Z}-zz^{\mathrm{H}}\|_{\mathrm{F}}^2$, and the lower bound for $\ell(\hat{z},z)$ is proved as a direct consequence in view of the inequality
$$\inf_{\hat{z}\in\mathbb{C}_1^n}\sup_{z\in\mathbb{C}_1^n}\mathbb{E}_z\,\ell(\hat{z},z)\ge\frac{1}{2}\inf_{\hat{Z}\in\mathbb{C}^{n\times n}}\sup_{z\in\mathbb{C}_1^n}\mathbb{E}_z\frac{1}{n^2}\|\hat{Z}-zz^{\mathrm{H}}\|_{\mathrm{F}}^2.$$
Since the solution of the SDP (13) is a matrix, it is natural to study the statistical error under $\frac{1}{n^2}\|\hat{Z}-zz^{\mathrm{H}}\|_{\mathrm{F}}^2$ in addition to the loss $\ell(\hat{z},z)$.

Our analysis of the SDP (13) relies on an equivalent non-convex characterization. Since $Z$ is a positive semi-definite Hermitian matrix, it admits a decomposition
$$Z=V^{\mathrm{H}}V,$$
for some $V\in\mathbb{C}^{n\times n}$. Let $V_j$ be the $j$th column of $V$, so that $Z_{jk}=V_j^{\mathrm{H}}V_k$. In particular, the constraint $\mathrm{diag}(Z)=I_n$ can be written as $Z_{jj}=\|V_j\|^2=1$ for all $j\in[n]$. Replacing $Z$ by $V^{\mathrm{H}}V$, the SDP (13) can be equivalently represented as
$$\max_{V\in\mathbb{C}^{n\times n}}\mathrm{Tr}((A\circ Y)V^{\mathrm{H}}V)\quad\text{subject to}\quad\|V_j\|=1\text{ for all }j\in[n]. \quad (14)$$
The formulation (14) is closely related to the Burer-Monteiro problem [9, 20] for the SDP, except that here $V$ is still an $n\times n$ matrix without dimension reduction.
This non-convex formulation allows us to derive sharp statistical error bounds for the SDP (13). We analyze (14) through the following iteration procedure,
$$V_j^{(t)}=\frac{\sum_{k\in[n]\setminus\{j\}}A_{jk}\bar{Y}_{jk}V_k^{(t-1)}}{\left\|\sum_{k\in[n]\setminus\{j\}}A_{jk}\bar{Y}_{jk}V_k^{(t-1)}\right\|}. \quad (15)$$
Let us shorthand the above formula by
$$V^{(t)}=f(V^{(t-1)}), \quad (16)$$
by introducing a map $f:\mathcal{C}^{n\times n}\to\mathcal{C}^{n\times n}$ such that the $j$th column of $f(V^{(t-1)})$ is given by (15). We use the notation $\mathcal{C}^{n\times n}$ for the set of $n\times n$ complex matrices whose columns all have unit norms. The update (16) can be seen as a local approach (or more precisely, a block coordinate ascent approach) [12, 27] to solving (14). To see why this is true, consider the following local optimization problem,
$$\max_{V_j\in\mathbb{C}^n:\|V_j\|=1}\left[V_j^{\mathrm{H}}\sum_{k\in[n]\setminus\{j\}}A_{jk}\bar{Y}_{jk}V_k^{(t-1)}+\left(\sum_{k\in[n]\setminus\{j\}}A_{jk}\bar{Y}_{jk}V_k^{(t-1)}\right)^{\mathrm{H}}V_j\right]. \quad (17)$$
The objective of (17) collects the terms in the expansion of $\mathrm{Tr}((A\circ Y)V^{\mathrm{H}}V)=\sum_{jk}A_{jk}\bar{Y}_{jk}V_j^{\mathrm{H}}V_k$ that depend on $V_j$, and replaces $V_k$ by $V_k^{(t-1)}$ for all $k\in[n]\setminus\{j\}$. By simple algebra, we can see that the solution of (17) is exactly (15).

Let $\hat{V}$ be a global maximizer of (14). The matrix $\hat{V}$ must be a fixed point of the map $f$,
$$\hat{V}=f(\hat{V}). \quad (18)$$
To see why (18) holds, we consider the local optimization problem (17) with $V_k^{(t-1)}$ replaced by $\hat{V}_k$ for all $k\in[n]\setminus\{j\}$. As long as $\hat{V}$ maximizes (14), its $j$th column $\hat{V}_j$ must maximize this local optimization problem, which then implies the fixed-point equation (18).

Since the SDP solution $\hat{Z}=\hat{V}^{\mathrm{H}}\hat{V}$ is an estimator of the matrix $z^*z^{*\mathrm{H}}$, we can think of $\hat{V}_j$ as an estimator of $\bar{z}_j^*$ embedded in $\mathbb{C}^n$. Note that
$$z_j^*\bar{z}_k^*=z_j^*a^{\mathrm{H}}a\bar{z}_k^*,$$
for any $a\in\mathbb{C}^n$ such that $\|a\|=1$, and thus we can embed each $\bar{z}_j^*$ in $\mathbb{C}^n$ by considering the vector $\bar{z}_j^*a\in\mathbb{C}^n$. This motivates the definition of the loss function $\ell(\hat{V},z^*)$ given in (9).
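The claim that (15) solves the local problem (17) is just Cauchy-Schwarz: writing $b=\sum_{k\ne j}A_{jk}\bar{Y}_{jk}V_k^{(t-1)}$, the objective equals $2\,\mathrm{Re}(V_j^{\mathrm{H}}b)$, which over unit vectors is maximized by $V_j=b/\|b\|$ with value $2\|b\|$. A quick numeric spot-check (illustrative only; `b` stands in for the weighted sum):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
b = rng.standard_normal(n) + 1j * rng.standard_normal(n)   # plays the role of the weighted sum
best = b / np.linalg.norm(b)
opt_val = 2 * np.real(np.vdot(best, b))                    # equals 2 ||b||

# no random unit vector can beat the normalized sum
for _ in range(100):
    v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    v /= np.linalg.norm(v)
    assert 2 * np.real(np.vdot(v, b)) <= opt_val + 1e-9
```

The same one-line argument, applied column by column, is what makes (16) a block coordinate ascent scheme for (14).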
The following lemma characterizes the evolution of this loss function through the map $f$.

Lemma 2.1. Assume $\sigma^2=o(np)$ and $\frac{np}{\log n}\to\infty$. Then, for any $\gamma=o(1)$, there exist some $\delta_1=o(1)$ and $\delta_2=o(1)$ such that
$$\mathbb{P}\left(\ell(f(V),z^*)\le\delta_1\ell(V,z^*)+(1+\delta_2)\frac{\sigma^2}{2np}\ \text{for any }V\in\mathcal{C}^{n\times n}\text{ such that }\ell(V,z^*)\le\gamma\right)\ge1-(2n)^{-1}-\exp\left(-\left(\frac{np}{\sigma^2}\right)^{1/4}\right).$$
In particular, $\delta_1$ and $\delta_2$ can be chosen to satisfy
$$\delta_1=O\left(\left(\gamma+\sqrt{\frac{\log n+\sigma^2}{np}}\right)^{1/2}\right)\quad\text{and}\quad\delta_2=O\left(\sqrt{\frac{\log n+\sigma^2}{np}}\right).$$

The lemma shows that for any $V\in\mathcal{C}^{n\times n}$ that has a nontrivial error, the matrix $f(V)$ will have an error that is smaller by a multiplicative factor $\delta_1$, up to an additive term $(1+\delta_2)\frac{\sigma^2}{2np}$. Define $V^*\in\mathcal{C}^{n\times n}$ with the $j$th column given by $V_j^*=\bar{z}_j^*a$ for some $a\in\mathbb{C}^n$ that satisfies $\|a\|=1$. We immediately have
$$\ell(f(V^*),z^*)\le(1+\delta_2)\frac{\sigma^2}{2np}.$$
In other words, $(1+\delta_2)\frac{\sigma^2}{2np}$ can be understood as the oracle statistical error given the knowledge of $z^*$.

The two conditions $\sigma^2=o(np)$ and $\frac{np}{\log n}\to\infty$ are essential for the result to hold. While $\sigma^2=o(np)$ makes sure that the statistical error $\frac{\sigma^2}{2np}$ is of a nontrivial order, the condition $\frac{np}{\log n}\to\infty$ guarantees that the random graph is connected. We can slightly relax both conditions to $np\ge C\sigma^2$ and $p\ge\frac{C\log n}{n}$ for some sufficiently large constant $C>0$, by replacing $\delta_1$ and $\delta_2$ in the result of Lemma 2.1 by some sufficiently small constants. However, vanishing $\delta_1$ and $\delta_2$ require that $\sigma^2$ be of smaller order than $np$ and $p$ be of greater order than $\frac{\log n}{n}$.

In this section, we show that the result of Lemma 2.1 implies the statistical optimality of the SDP (13). Since the solution of the SDP can be written as $\hat{Z}=\hat{V}^{\mathrm{H}}\hat{V}$ with $\hat{V}$ satisfying the fixed-point equation (18), we can apply the result of Lemma 2.1 to $\hat{V}=f(\hat{V})$ as long as a crude bound $\ell(\hat{V},z^*)\le\gamma$ can be proved for some $\gamma=o(1)$.

Lemma 2.2. Assume $\frac{np}{\log n}\to\infty$.
Let $\hat{Z}=\hat{V}^{\mathrm{H}}\hat{V}$ be a global maximizer of the SDP (13). Then, there exists some constant $C>0$ such that
$$\ell(\hat{V},z^*)\le C\sqrt{\frac{\sigma^2+1}{np}},$$
with probability at least $1-(2n)^{-1}$.

Under the conditions $\sigma^2=o(np)$ and $\frac{np}{\log n}\to\infty$, we have $\ell(\hat{V},z^*)\le\gamma$ for some $\gamma=o(1)$. Thus, Lemma 2.1 and the fact that $\hat{V}=f(\hat{V})$ imply that
$$\ell(\hat{V},z^*)\le\delta_1\ell(\hat{V},z^*)+(1+\delta_2)\frac{\sigma^2}{2np}. \quad (19)$$
After rearrangement, we obtain the bound $\ell(\hat{V},z^*)\le\frac{1+\delta_2}{1-\delta_1}\frac{\sigma^2}{2np}$. The result is summarized in the following theorem.

Theorem 2.2. Assume $\sigma^2=o(np)$ and $\frac{np}{\log n}\to\infty$. Let $\hat{Z}=\hat{V}^{\mathrm{H}}\hat{V}$ be a global maximizer of the SDP (13). Then, there exists some $\delta=o(1)$ such that
$$\ell(\hat{V},z^*)\le(1+\delta)\frac{\sigma^2}{2np},\qquad\frac{1}{n^2}\|\hat{Z}-z^*z^{*\mathrm{H}}\|_{\mathrm{F}}^2\le(1+\delta)\frac{\sigma^2}{np},$$
with probability at least $1-n^{-1}-\exp\left(-\left(\frac{np}{\sigma^2}\right)^{1/4}\right)$. In particular, $\delta$ can be chosen to satisfy $\delta=O\left(\left(\frac{\log n+\sigma^2}{np}\right)^{1/4}\right)$.

The theorem bounds both $\ell(\hat{V},z^*)$ and $\frac{1}{n^2}\|\hat{Z}-z^*z^{*\mathrm{H}}\|_{\mathrm{F}}^2$. While the result for $\ell(\hat{V},z^*)$ is derived from (19), the result for $\frac{1}{n^2}\|\hat{Z}-z^*z^{*\mathrm{H}}\|_{\mathrm{F}}^2$ is a consequence of the inequality
$$\frac{1}{2n^2}\|\hat{V}^{\mathrm{H}}\hat{V}-z^*z^{*\mathrm{H}}\|_{\mathrm{F}}^2\le\ell(\hat{V},z^*),$$
which is established by Lemma 5.5 in Section 5.1. Compared with the minimax lower bound in Theorem 2.1, we can conclude that the SDP (13) is minimax optimal for the estimation of the matrix $z^*z^{*\mathrm{H}}$. It not only achieves the optimal rate, but the leading constant is also sharp.

Since the solution of the SDP is a matrix, some post-processing step is required to obtain a vector estimator of $z^*$. This can easily be done by extracting the leading eigenvector of $\hat{Z}$. Let $u\in\mathbb{C}^n$ be the leading eigenvector of $\hat{Z}$, and define $\hat{z}$ with each entry $\hat{z}_j=u_j/|u_j|$. The statistical optimality of $\hat{z}$ is established by the following result. Recall that for two vectors in $\mathbb{C}_1^n$, the loss $\ell(\hat{z},z^*)$ is defined by (2).

Theorem 2.3. Assume $\sigma^2=o(np)$ and $\frac{np}{\log n}\to\infty$.
Let $\hat{Z}=\hat{V}^{\mathrm{H}}\hat{V}$ be a global maximizer of the SDP (13). Then, there exists some $\delta=o(1)$ such that
$$\ell(\hat{z},z^*)\le(1+\delta)\frac{\sigma^2}{2np},$$
with probability at least $1-n^{-1}-\exp\left(-\left(\frac{np}{\sigma^2}\right)^{1/4}\right)$. In particular, $\delta$ can be chosen to satisfy $\delta=O\left(\left(\frac{\log n+\sigma^2}{np}\right)^{1/4}\right)$.

Compared with the minimax lower bound in Theorem 2.1, the SDP (13) is also minimax optimal for the estimation of the vector $z^*$ in phase synchronization.

3 Implications for the GPM and the MLE

In this section, we show that the analysis of the SDP through Lemma 2.1 also leads to the statistical optimality of the generalized power method (GPM) and the maximum likelihood estimator (MLE). We note that it has already been established by [16] that both the GPM and the MLE achieve the optimal error bound $(1+o(1))\frac{\sigma^2}{2np}$ under the loss $\ell(\hat{z},z^*)$. By deriving the same results using the analysis of the SDP, we can unify the three proofs and thus form a coherent understanding of the three different methods.

The iteration of the GPM for phase synchronization is
$$z_j^{(t)}=\frac{\sum_{k\in[n]\setminus\{j\}}A_{jk}Y_{jk}z_k^{(t-1)}}{\left|\sum_{k\in[n]\setminus\{j\}}A_{jk}Y_{jk}z_k^{(t-1)}\right|}. \quad (20)$$
The similarity between (20) and (15) is obvious. To make an explicit connection between the two iteration procedures, we can embed (20) into the space of (15). Let $e_1\in\mathbb{C}^n$ be the first canonical vector, with the first entry 1 and the remaining entries all 0. It is easy to check that as long as $V_j^{(t-1)}=\bar{z}_j^{(t-1)}e_1$ for all $j\in[n]$, we also have $V_j^{(t)}=\bar{z}_j^{(t)}e_1$ for all $j\in[n]$. This is because once the columns $V_1^{(t)},\cdots,V_n^{(t)}$ lie in the same one-dimensional subspace for some $t$, the iteration (15) remains in this subspace. Thus, the formula (15) exactly describes the GPM iteration (20). In addition to the connection between (20) and (15), the two loss functions $\ell(V,z^*)$ and $\ell(z,z^*)$ are also equivalent.
Under the condition that $V_j=\bar{z}_je_1$ for all $j\in[n]$, we have
$$\ell(V,z^*)=\ell(z,z^*).$$
Therefore, Lemma 2.1 directly implies that
$$\ell(g(z),z^*)\le\delta_1\ell(z,z^*)+(1+\delta_2)\frac{\sigma^2}{2np}, \quad (21)$$
uniformly over all $z\in\mathbb{C}_1^n$ such that $\ell(z,z^*)\le\gamma$, with high probability. The map $g:\mathbb{C}_1^n\to\mathbb{C}_1^n$ is defined so that (20) can be shorthanded by $z^{(t)}=g(z^{(t-1)})$.

From (21), we know that as long as $\ell(z^{(t-1)},z^*)\le\gamma$ for some $\gamma=o(1)$, the next step of the power iteration (20) satisfies
$$\ell(z^{(t)},z^*)\le\delta_1\ell(z^{(t-1)},z^*)+(1+\delta_2)\frac{\sigma^2}{2np}. \quad (22)$$
The condition $\ell(z^{(t-1)},z^*)\le\gamma$ then implies $\ell(z^{(t)},z^*)\le\delta_1\gamma+(1+\delta_2)\frac{\sigma^2}{2np}$. Given that $\frac{\sigma^2}{np}=o(1)$, we can always choose $\gamma=o(1)$ that satisfies $\frac{\sigma^2}{np}=o(\gamma)$. Therefore, $\ell(z^{(t)},z^*)\le\gamma$. Thus, a simple induction argument implies that (22) holds for all $t\ge1$ whenever $\ell(z^{(0)},z^*)\le\gamma$.

The one-step iteration bound (22) immediately implies the linear convergence
$$\ell(z^{(t)},z^*)\le\delta_1^t\ell(z^{(0)},z^*)+\frac{1+\delta_2}{1-\delta_1}\frac{\sigma^2}{2np}, \quad (23)$$
for all $t\ge1$. It has been shown by [16] that the initial error condition $\ell(z^{(0)},z^*)\le\gamma=o(1)$ is satisfied by a simple eigenvector method; that is, $z_j^{(0)}=v_j/|v_j|$ with $v\in\mathbb{C}^n$ being the leading eigenvector of the matrix $A\circ Y$. Then, (23) implies $\ell(z^{(t)},z^*)\le(1+o(1))\frac{\sigma^2}{2np}$ for all $t\ge\log\left(\frac{np}{\sigma^2}\right)$.

The optimality of the MLE can be derived from a similar embedding argument. Let $\hat{z}$ be a global maximizer of (12). By the definition of $\hat{z}$, its $j$th entry must satisfy
$$\hat{z}_j=\mathop{\mathrm{argmin}}_{z_j\in\mathbb{C}_1}\sum_{k\in[n]\setminus\{j\}}A_{jk}|Y_{jk}-z_j\bar{\hat{z}}_k|^2=\frac{\sum_{k\in[n]\setminus\{j\}}A_{jk}Y_{jk}\hat{z}_k}{\left|\sum_{k\in[n]\setminus\{j\}}A_{jk}Y_{jk}\hat{z}_k\right|}.$$
By letting $\hat{V}=e_1\hat{z}^{\mathrm{H}}$, it can be shown that the fixed-point equation $\hat{V}=f(\hat{V})$ holds.
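Stepping back to the GPM (20) and its eigenvector initialization discussed above, the following sketch runs the full pipeline in the fully observed case ($p=1$, so $A\circ Y=Y$) and compares the final error with the level $\sigma^2/(2np)$. Function names and the factor-of-3 tolerance are ours, chosen only for illustration.

```python
import numpy as np

def gpm(Y, iters=20):
    vals, vecs = np.linalg.eigh(Y)
    v = vecs[:, -1]                       # leading eigenvector of A o Y (here p = 1)
    z = v / np.abs(v)                     # entrywise normalization gives z^(0)
    for _ in range(iters):                # iteration (20) with A_jk = 1
        s = Y @ z - np.diag(Y) * z        # sum over k != j of Y_jk z_k
        z = s / np.abs(s)
    return z

def loss(zhat, z):
    w = np.vdot(z, zhat)                  # optimal global phase is w / |w|
    return np.mean(np.abs(zhat - z * (w / abs(w))) ** 2)

rng = np.random.default_rng(3)
n, sigma = 400, 1.0
z = np.exp(1j * rng.uniform(0, 2 * np.pi, n))
G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
W = np.triu(G, 1) / np.sqrt(2); W = W + W.conj().T
Y = np.outer(z, z.conj()) + sigma * W
np.fill_diagonal(Y, 1.0)

err = loss(gpm(Y), z)
oracle = sigma**2 / (2 * n)               # sigma^2 / (2 n p) with p = 1
```

On runs like this one, `err` sits within a small constant factor of `oracle`, consistent with the sharp bound; this is a sanity check, not a verification of the leading constant.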
Given the equivalence of the losses $\ell(\hat{V},z^*)=\ell(\hat{z},z^*)$, as long as we can show a crude bound $\ell(\hat{z},z^*)\le\gamma=o(1)$ for the MLE, the inequality (19) holds and can be written as
$$\ell(\hat{z},z^*)\le\delta_1\ell(\hat{z},z^*)+(1+\delta_2)\frac{\sigma^2}{2np},$$
which gives $\ell(\hat{z},z^*)\le\frac{1+\delta_2}{1-\delta_1}\frac{\sigma^2}{2np}$ after rearrangement. The crude bound $\ell(\hat{z},z^*)\le\gamma=o(1)$ can be easily established for the MLE using the argument in [16] or by an argument similar to the proof of Lemma 2.2, and thus we obtain the optimal error bound $\ell(\hat{z},z^*)\le(1+o(1))\frac{\sigma^2}{2np}$ for the MLE.

4 $\mathbb{Z}_2$ Synchronization

In this section, we show that our analysis of the SDP can also be applied to $\mathbb{Z}_2$ synchronization and leads to a sharp exponential statistical error rate. Suppose we observe a random graph $A_{jk}\sim\text{Bernoulli}(p)$ independently for all $1\le j<k\le n$. For each pair $(j,k)$, we observe $Y_{jk}=z_j^*z_k^*+\sigma W_{jk}$ with $z_j^*,z_k^*\in\{-1,1\}$ and $W_{jk}\sim N(0,1)$ whenever $A_{jk}=1$. In $\mathbb{Z}_2$ synchronization, our goal is to estimate the binary vector $z^*\in\{-1,1\}^n$ from the observations $\{A_{jk}\}$ and $\{A_{jk}Y_{jk}\}$. We first state a minimax lower bound for this problem.

Theorem 4.1. Assume $\sigma^2=o(np)$ and $\frac{np}{\log n}\to\infty$. Then, we have
$$\inf_{\hat{Z}\in\mathbb{R}^{n\times n}}\sup_{z\in\{-1,1\}^n}\mathbb{E}_z\frac{1}{n^2}\|\hat{Z}-zz^{\mathrm{T}}\|_{\mathrm{F}}^2\ge\exp\left(-(1+\delta)\frac{np}{2\sigma^2}\right),\qquad \inf_{\hat{z}\in\{-1,1\}^n}\sup_{z\in\{-1,1\}^n}\mathbb{E}_z\,\ell(\hat{z},z)\ge\exp\left(-(1+\delta)\frac{np}{2\sigma^2}\right),$$
for some $\delta=o(1)$.

When $p=1$, the above result has been proved by [13], but the lower bound result for a general $p$ is unknown in the literature. Compared with Theorem 2.1, the minimax lower bound for $\mathbb{Z}_2$ synchronization is an exponential function of the signal-to-noise ratio, a consequence of the discreteness of the problem.

To estimate $z^*\in\{-1,1\}^n$, the MLE is defined as a global maximizer of the following optimization problem,
$$\max_{z\in\{-1,1\}^n}z^{\mathrm{T}}(A\circ Y)z. \quad (24)$$
Similarly to (13), a convex relaxation of (24) leads to the following SDP,
$$\max_{Z=Z^{\mathrm{T}}\in\mathbb{R}^{n\times n}}\mathrm{Tr}((A\circ Y)Z)\quad\text{subject to}\quad\mathrm{diag}(Z)=I_n\text{ and }Z\succeq0. \quad (25)$$
The SDP for $\mathbb{Z}_2$ synchronization is almost in the exact form of (13).
The only difference between (25) and (13) is that the optimization in (25) is over real symmetric matrices, while the optimization in (13) is over complex Hermitian matrices.

Our analysis of the SDP (25) for $\mathbb{Z}_2$ synchronization relies on a non-convex characterization similar to (14). Any positive semi-definite real symmetric matrix $Z$ admits a decomposition $Z=V^{\mathrm{T}}V$ for some $V\in\mathbb{R}^{n\times n}$. Writing the $j$th column of $V$ as $V_j$, we can replace the constraint $\mathrm{diag}(Z)=I_n$ by $\|V_j\|=1$ for all $j\in[n]$. Then, an equivalent non-convex form of the SDP (25) is
$$\max_{V\in\mathbb{R}^{n\times n}}\mathrm{Tr}((A\circ Y)V^{\mathrm{T}}V)\quad\text{subject to}\quad\|V_j\|=1\text{ for all }j\in[n]. \quad (26)$$
We will study the solution of (26) using the following loss function,
$$\ell(\hat{V},z)=\min_{a\in\mathbb{R}^n:\|a\|=1}\frac{1}{n}\sum_{j=1}^n\|\hat{V}_j-z_ja\|^2.$$
By the same argument that leads to (18), we know that if $\hat{V}$ is a global maximizer of (26), it satisfies the equation $\hat{V}=f(\hat{V})$, where $f:\mathcal{R}^{n\times n}\to\mathcal{R}^{n\times n}$ is a map such that the $j$th column of $f(\hat{V})$ is given by
$$[f(\hat{V})]_j=\frac{\sum_{k\in[n]\setminus\{j\}}A_{jk}Y_{jk}\hat{V}_k}{\left\|\sum_{k\in[n]\setminus\{j\}}A_{jk}Y_{jk}\hat{V}_k\right\|}.$$
Here, we use the notation $\mathcal{R}^{n\times n}$ for the set of $n\times n$ real matrices whose columns all have unit norms. For each $j\in[n]$, define the random variable
$$U_j=\frac{\sigma}{(n-1)p}\sum_{k\in[n]\setminus\{j\}}z_k^*A_{jk}W_{jk}.$$
The following lemma characterizes the evolution of the loss $\ell(V,z^*)$ through the map $f$.

Lemma 4.1. Assume $\sigma^2=o(np)$ and $\frac{np}{\log n}\to\infty$. Then, for any $\gamma=o(1)$, there exists some $\delta=o(1)$ such that
$$\mathbb{P}\left(\ell(f(V),z^*)\le\frac{1}{4}\ell(V,z^*)+\frac{4}{n}\sum_{j=1}^n\mathbb{I}\{|U_j|>1-\delta\}\ \text{for any }V\in\mathcal{R}^{n\times n}\text{ such that }\ell(V,z^*)\le\gamma\right)\ge1-(2n\log n)^{-1}.$$
In particular, $\delta$ can be chosen to satisfy $\delta=O\left(\sqrt{\frac{\log n+\sigma^2}{np}}\right)$.
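To get a feel for the variables $U_j$, here is a quick simulation with illustrative parameters. In the strong-signal regime $\frac{np}{2\sigma^2}>\log n$ the indicator count in Lemma 4.1 is typically zero, consistent with the second claim of Lemma 4.2 below.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, sigma, delta = 2000, 0.5, 1.0, 0.1       # illustrative parameters
z = rng.choice([-1.0, 1.0], size=n)
A = (rng.uniform(size=(n, n)) < p).astype(float)
A = np.triu(A, 1); A = A + A.T                  # symmetric mask, zero diagonal
W = rng.standard_normal((n, n))
W = np.triu(W, 1); W = W + W.T

# U_j = sigma / ((n-1) p) * sum_{k != j} z_k A_jk W_jk
U = sigma / ((n - 1) * p) * (A * W) @ z
count = int(np.sum(np.abs(U) > 1 - delta))
```

Each $U_j$ has standard deviation roughly $\sigma/\sqrt{(n-1)p}\approx0.03$ here, so the threshold $1-\delta$ is about thirty standard deviations away and `count` comes out 0.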
Lemma 4.1 immediately implies that for any $\hat{V}$ that satisfies the fixed-point equation $\hat{V}=f(\hat{V})$ and the crude error bound $\ell(\hat{V},z^*)\le\gamma=o(1)$, we have
$$\ell(\hat{V},z^*)\le\frac{16}{3n}\sum_{j=1}^n\mathbb{I}\{|U_j|>1-\delta\}, \quad (27)$$
with high probability. The random variable $\frac{1}{n}\sum_{j=1}^n\mathbb{I}\{|U_j|>1-\delta\}$ can be easily analyzed, and we present the following lemma.

Lemma 4.2. Assume $\sigma^2=o(np)$ and $\frac{np}{\log n}\to\infty$. Then, for any $\delta=o(1)$, there exists some $\delta'=o(1)$ such that
$$\frac{1}{n}\sum_{j=1}^n\mathbb{I}\{|U_j|>1-\delta\}\le\exp\left(-(1-\delta')\frac{np}{2\sigma^2}\right),$$
with probability at least $1-\exp\left(-\sqrt{\frac{np}{\sigma^2}}\right)-n^{-1}$. If we additionally assume $(1-\delta')\frac{np}{2\sigma^2}>\log n$, then
$$\frac{1}{n}\sum_{j=1}^n\mathbb{I}\{|U_j|>1-\delta\}=0,$$
with probability at least $1-\exp\left(-\sqrt{\frac{np}{\sigma^2}}\right)-n^{-1}$. In particular, $\delta'$ can be chosen to satisfy $\delta'=O\left(\delta+\sqrt{\frac{\log n+\sigma^2}{np}}\right)$.

We also need a lemma that establishes a crude error bound for $\ell(\hat{V},z^*)$.

Lemma 4.3. Assume $\frac{np}{\log n}\to\infty$. Let $\hat{Z}=\hat{V}^{\mathrm{T}}\hat{V}$ be a global maximizer of the SDP (25). Then, there exists some constant $C>0$ such that
$$\ell(\hat{V},z^*)\le C\sqrt{\frac{\sigma^2+1}{np}},$$
with probability at least $1-n^{-1}$.

The results of Lemma 4.1, Lemma 4.2 and Lemma 4.3 immediately imply the statistical optimality of the SDP (25).

Theorem 4.2. Assume $\sigma^2=o(np)$ and $\frac{np}{\log n}\to\infty$. Let $\hat{Z}=\hat{V}^{\mathrm{T}}\hat{V}$ be a global maximizer of the SDP (25) and $\hat{z}_j=u_j/|u_j|$ for $j\in[n]$ with $u\in\mathbb{R}^n$ being the leading eigenvector of $\hat{Z}$. Then, there exists some $\delta=o(1)$ such that
$$\ell(\hat{V},z^*)\le\exp\left(-(1-\delta)\frac{np}{2\sigma^2}\right),\qquad\frac{1}{n^2}\|\hat{Z}-z^*z^{*\mathrm{T}}\|_{\mathrm{F}}^2\le\exp\left(-(1-\delta)\frac{np}{2\sigma^2}\right),\qquad\ell(\hat{z},z^*)\le\exp\left(-(1-\delta)\frac{np}{2\sigma^2}\right),$$
with probability at least $1-\exp\left(-\sqrt{\frac{np}{\sigma^2}}\right)-(n\log n)^{-1}$. In particular, $\delta$ can be chosen to satisfy $\delta=O\left(\sqrt{\frac{\log n+\sigma^2}{np}}\right)$.
Moreover, if we additionally assume $\sigma^2<(1-\epsilon)\frac{np}{2\log n}$ for some arbitrarily small constant $\epsilon>0$, the SDP solution $\hat{Z}$ is a rank-one matrix that satisfies $\hat{Z}=z^*z^{*\mathrm{T}}$ with probability at least $1-\exp\left(-\sqrt{\frac{np}{\sigma^2}}\right)-(n\log n)^{-1}$.

While the result for $\ell(\hat{V},z^*)$ follows from (27) and Lemma 4.2, the result for $\frac{1}{n^2}\|\hat{Z}-z^*z^{*\mathrm{T}}\|_{\mathrm{F}}^2$ is a consequence of the inequality
$$\frac{1}{2n^2}\|\hat{V}^{\mathrm{T}}\hat{V}-z^*z^{*\mathrm{T}}\|_{\mathrm{F}}^2\le\ell(\hat{V},z^*),$$
which is established by Lemma 5.5 in Section 5.1. The result for the loss $\ell(\hat{z},z^*)$ follows from a matrix perturbation bound [11].

Theorem 4.2 establishes the minimax optimality of the SDP (25) for $\mathbb{Z}_2$ synchronization in view of the matching lower bound results in Theorem 4.1. The special case $p=1$ recovers the results of [13]. Moreover, under the condition $\sigma^2<(1-\epsilon)\frac{np}{2\log n}$, we show that the SDP solution $\hat{Z}$ is exactly rank-one, and therefore rounding through the leading eigenvector is not needed. This result generalizes the exact recovery threshold of $\mathbb{Z}_2$ synchronization when $p=1$ [1, 4, 5]. The phenomenon that the SDP can achieve exact recovery has also been revealed in community detection under stochastic block models [3, 10, 18, 19, 22, 25].

We shall compare Theorem 4.2 with Theorem 2.2 and Theorem 2.3. Though the two SDPs (25) and (13) have the same type of constraints, the difference of the domain implies two types of convergence rates, $\exp\left(-(1-o(1))\frac{np}{2\sigma^2}\right)$ and $(1+o(1))\frac{\sigma^2}{2np}$. It is quite surprising that the SDP (25), a continuous optimization problem, is able to achieve an exponential rate, which is typical for a discrete problem. The adaptation of the SDP (25) to the discrete structure is a consequence of the fact that both (25) and (26) are optimization problems over $\mathbb{R}^{n\times n}$.
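As an end-to-end illustration of the $\mathbb{Z}_2$ pipeline with missing data, the following sketch (illustrative parameters and names, well inside the exact-recovery regime $\frac{np}{2\sigma^2}\gg\log n$ of Lemma 4.2) runs an eigenvector initialization followed by the sign iteration discussed next as (28).

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, sigma = 400, 0.5, 1.0
z = rng.choice([-1.0, 1.0], size=n)
W = rng.standard_normal((n, n)); W = np.triu(W, 1); W = W + W.T
A = (rng.uniform(size=(n, n)) < p).astype(float)
A = np.triu(A, 1); A = A + A.T
Yobs = A * (np.outer(z, z) + sigma * W)        # A o Y, zero diagonal

vals, vecs = np.linalg.eigh(Yobs)
zt = np.sign(vecs[:, -1])                      # spectral initialization
for _ in range(20):                            # sign iteration: x/|x| = sign(x)
    zt = np.sign(Yobs @ zt)

err = min(np.mean(zt != z), np.mean(zt != -z)) # mistakes up to a global sign flip
```

With $np/(2\sigma^2)=100\gg\log n$, the iterate stays in $\{-1,1\}^n$ and `err` comes out 0 on runs like this, matching the exact-recovery discussion above.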
We make this effect explicit by bounding the statistical error by the random variable $\frac{1}{n}\sum_{j=1}^n\mathbb{I}\{|U_j|>1-\delta\}$ in Lemma 4.1.

To close this section, we briefly discuss the implications of Lemma 4.1 for the MLE (24) and the generalized power method defined by the iteration procedure
$$z_j^{(t)}=\frac{\sum_{k\in[n]\setminus\{j\}}A_{jk}Y_{jk}z_k^{(t-1)}}{\left|\sum_{k\in[n]\setminus\{j\}}A_{jk}Y_{jk}z_k^{(t-1)}\right|}. \quad (28)$$
We note that the iteration (28) is real-valued, so that we always have $z_j^{(t)}\in\{-1,1\}$, which makes it different from (20). The statistical optimality of the generalized power method (28) has been established by [15] for $\mathbb{Z}_2$ synchronization when $p=1$. Following the same argument as in Section 3, we can embed both the MLE and the GPM into $\mathcal{R}^{n\times n}$, and thus Lemma 4.1 also implies that both the MLE and the GPM achieve the optimal rate $\exp\left(-(1-o(1))\frac{np}{2\sigma^2}\right)$ for a general $p$ as well. Just as for phase synchronization, the analyses of the MLE, the GPM, and the SDP for $\mathbb{Z}_2$ synchronization are all based on Lemma 4.1, and thus we have unified the three different methods from an iterative algorithm perspective.

5 Proofs

This section presents the proofs of all technical results in the paper. We first list some auxiliary lemmas in Section 5.1. The key lemmas of the SDP analyses, Lemma 2.1 and Lemma 4.1, are proved in Section 5.2 and Section 5.3, respectively. We then prove the main results, including Theorem 2.2, Theorem 2.3 and Theorem 4.2, in Section 5.4. Theorem 4.1 is proved in Section 5.5. Finally, the proofs of Lemma 2.2, Lemma 4.3 and Lemma 4.2 are given in Section 5.6.

5.1 Some Auxiliary Lemmas

Lemma 5.1. Assume $\frac{np}{\log n}\to\infty$. Then, there exists a constant $C>0$ such that
$$\max_{j\in[n]}\left|\sum_{k\in[n]\setminus\{j\}}(A_{jk}-p)\right|\le C\sqrt{np\log n},\quad\text{and}\quad\|A-\mathbb{E}A\|_{\mathrm{op}}\le C\sqrt{np},$$
with probability at least $1-n^{-1}$.

Proof. The first result is a direct application of the union bound and Bernstein's inequality. The second result is Theorem 5.2 of [21].

The following result is essentially Corollary 3.11 of [7].
The specific form that we need is from Lemma 5.2 of [16].

Lemma 5.2 (Corollary 3.11 of [7]). Assume $\frac{np}{\log n}\to\infty$. Then, there exists a constant $C>0$ such that
$$\|A\circ W\|_{\mathrm{op}}\le C\sqrt{np},$$
with probability at least $1-n^{-1}$. The result holds for both the complex $W$ in Section 2 and the real $W$ in Section 4.

Lemma 5.3 (Lemma 13 of [14]). Consider independent random variables $X_j\sim N(0,1)$ and $E_j\sim\text{Bernoulli}(p)$. Then,
$$\mathbb{P}\left(\left|\sum_{j=1}^nX_jE_j/p\right|>t\right)\le2\exp\left(-\frac{1}{8}\min\left(\frac{pt^2}{n},pt\right)\right),$$
for any $t>0$.

Lemma 5.4. The following three statements hold:
1. For any $x,y\in\mathbb{C}^n$ such that $\|y\|=1$ and $\mathrm{Re}(y^{\mathrm{H}}x)>0$, we have
$$\left\|\frac{x}{\|x\|}-y\right\|^2\le\frac{\|(I_n-yy^{\mathrm{H}})x\|^2+|\mathrm{Im}(y^{\mathrm{H}}x)|^2}{|\mathrm{Re}(y^{\mathrm{H}}x)|^2}.$$
2. For any $x,y\in\mathbb{R}^n$ such that $\|y\|=1$ and $y^{\mathrm{T}}x>0$, we have
$$\left\|\frac{x}{\|x\|}-y\right\|^2\le\frac{\|(I_n-yy^{\mathrm{T}})x\|^2}{|y^{\mathrm{T}}x|^2}.$$
3. For any $x\in\mathbb{C}$ such that $\mathrm{Re}(x)>0$, we have
$$\left|\frac{x}{|x|}-1\right|^2\le\frac{|\mathrm{Im}(x)|^2}{|\mathrm{Re}(x)|^2}.$$

Proof. It is easy to see that the last two statements are special cases of the first one. Thus, we only need to prove the first statement. Note that
$$\left\|\frac{x}{\|x\|}-y\right\|^2=\left\|\frac{(I_n-yy^{\mathrm{H}})x+(y^{\mathrm{H}}x)y}{\|x\|}-y\right\|^2=\frac{\|(I_n-yy^{\mathrm{H}})x\|^2+|y^{\mathrm{H}}x-\|x\||^2}{\|x\|^2}=\frac{b^2+\left(a-\sqrt{a^2+b^2}\right)^2}{a^2+b^2},$$
where $a=\mathrm{Re}(y^{\mathrm{H}}x)>0$ and $b^2=|\mathrm{Im}(y^{\mathrm{H}}x)|^2+\|(I_n-yy^{\mathrm{H}})x\|^2$. Since
$$\frac{b^2+\left(a-\sqrt{a^2+b^2}\right)^2}{a^2+b^2}=\frac{2b^2}{a^2+b^2+a\sqrt{a^2+b^2}}\le\frac{b^2}{a^2},$$
the proof is complete.

Lemma 5.5. For any $V=(V_1,\cdots,V_n)\in\mathbb{C}^{n\times n}$ and any $z\in\mathbb{C}_1^n$ such that $\|V_j\|=1$ for all $j\in[n]$, we have
$$\frac{1}{2n^2}\|V^{\mathrm{H}}V-zz^{\mathrm{H}}\|_{\mathrm{F}}^2\le\ell(V,z).$$
For any $V=(V_1,\cdots,V_n)\in\mathbb{R}^{n\times n}$ and any $z\in\{-1,1\}^n$ such that $\|V_j\|=1$ for all $j\in[n]$, we have
$$\frac{1}{2n^2}\|V^TV-zz^T\|_{\rm F}^2\le\ell(V,z).$$

Proof. We only prove the complex version of the inequality. The real version follows from the same argument. By definition, we have
$$\ell(V,z)=2-\max_{a\in\mathbb{C}^n:\|a\|=1}\left\{a^H\left(\frac{1}{n}\sum_{j=1}^nz_jV_j\right)+\left(\frac{1}{n}\sum_{j=1}^nz_jV_j\right)^Ha\right\}=2-2\left\|\frac{1}{n}\sum_{j=1}^nz_jV_j\right\|.$$
We also have
$$\frac{1}{n^2}\|V^HV-zz^H\|_{\rm F}^2=\frac{1}{n^2}\sum_{j=1}^n\sum_{l=1}^n|V_j^HV_l-z_j\bar z_l|^2\le\frac{1}{n^2}\sum_{j=1}^n\sum_{l=1}^n\left(2-V_j^HV_l\bar z_jz_l-V_l^HV_jz_j\bar z_l\right)=2-2\left\|\frac{1}{n}\sum_{j=1}^nz_jV_j\right\|^2.$$
Therefore, $\frac{1}{n^2}\|V^HV-zz^H\|_{\rm F}^2\le\ell(V,z)\left(2-\frac{\ell(V,z)}{2}\right)\le2\ell(V,z)$, and the proof is complete.

5.2 Proof of Lemma 2.1

We organize the proof into four steps. We first list a few high-probability events in Step 1. These events are assumed to be true in later steps. Step 2 provides an error decomposition of $\ell(f(V),z^*)$, and then each error term in the decomposition will be analyzed and bounded in Step 3. Finally, we combine the bounds and derive the desired result in Step 4.

Step 1: Some high-probability events. By Lemma 5.1 and Lemma 5.2, we know that
$$\min_{j\in[n]}\sum_{k\in[n]\setminus\{j\}}A_{jk}\ge(n-1)p-C\sqrt{np\log n},\qquad(29)$$
$$\max_{j\in[n]}\sum_{k\in[n]\setminus\{j\}}A_{jk}\le(n-1)p+C\sqrt{np\log n},\qquad(30)$$
$$\|A-\mathbb{E}A\|_{\rm op}\le C\sqrt{np},\qquad(31)$$
$$\|A\circ W\|_{\rm op}\le C\sqrt{np},\qquad(32)$$
all hold with probability at least $1-2n^{-10}$ for some constant $C>0$. In addition to (29)-(32), we need two more high-probability inequalities. For a $\rho$ that satisfies $\rho\to0$ and $\frac{\rho^2np}{\sigma^2}\to\infty$, we want to bound the random variable $\sum_{j=1}^n\mathbb{I}\left\{\frac{2\sigma}{np}\big|\sum_{k\in[n]\setminus\{j\}}A_{jk}\bar W_{jk}\bar z_k^*\big|>\rho\right\}$. The existence of such a $\rho$ is guaranteed by the condition $\sigma^2=o(np)$, and the specific choice will be given later.
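Before proceeding, the elementary inequalities above can be checked numerically. The randomized sketch below (illustrative only, not part of any proof) tests the first statement of Lemma 5.4, the inequality of Lemma 5.5 with $\ell(V,z)=2-2\|n^{-1}\sum_jz_jV_j\|$ in closed form, and the renormalization bound $\|x/\|x\|-y/\|y\|\|\le2\|x-y\|/\|x\|$ that is used later in the proof of Lemma 2.2.

```python
import numpy as np

rng = np.random.default_rng(0)

def cvec(shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def check_lemma_5_4(n=6):
    # ||x/||x|| - y||^2 <= (||(I - yy^H)x||^2 + Im(y^H x)^2) / Re(y^H x)^2.
    x, y = cvec(n), cvec(n)
    y /= np.linalg.norm(y)
    if np.real(np.vdot(y, x)) <= 0:
        x = -x                                 # enforce Re(y^H x) > 0
    inner = np.vdot(y, x)                      # y^H x
    resid = x - inner * y                      # (I - y y^H) x
    lhs = np.linalg.norm(x / np.linalg.norm(x) - y) ** 2
    rhs = (np.linalg.norm(resid) ** 2 + inner.imag ** 2) / inner.real ** 2
    return lhs <= rhs + 1e-12

def check_lemma_5_5(n=6):
    # (2n^2)^{-1} ||V^H V - z z^H||_F^2 <= ell(V, z).
    V = cvec((n, n))
    V /= np.linalg.norm(V, axis=0)             # unit columns V_1, ..., V_n
    z = np.exp(1j * rng.uniform(0, 2 * np.pi, n))
    ell = 2 - 2 * np.linalg.norm((V * z).sum(axis=1) / n)
    lhs = np.linalg.norm(V.conj().T @ V - np.outer(z, z.conj())) ** 2 / (2 * n * n)
    return lhs <= ell + 1e-12

def check_renormalization(n=6):
    # ||x/||x|| - y/||y|||| <= 2 ||x - y|| / ||x||, used in Lemma 2.2's proof.
    x, y = cvec(n), cvec(n)
    lhs = np.linalg.norm(x / np.linalg.norm(x) - y / np.linalg.norm(y))
    return lhs <= 2 * np.linalg.norm(x - y) / np.linalg.norm(x) + 1e-12

print(all(check_lemma_5_4() and check_lemma_5_5() and check_renormalization()
          for _ in range(1000)))
```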
We first bound its expectation by Lemma 5.3, n X j =1 P σnp (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X k ∈ [ n ] \{ j } A jk ¯ W jk ¯ z ∗ k (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) > ρ ≤ n X j =1 P σnp (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X k ∈ [ n ] \{ j } A jk Re( ¯ W jk ¯ z ∗ k ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) > ρ + n X j =1 P σnp (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X k ∈ [ n ] \{ j } A jk Im( ¯ W jk ¯ z ∗ k ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) > ρ ≤ n exp (cid:18) − ρ np σ (cid:19) + 4 n exp (cid:16) − ρnp σ (cid:17) . 17y Markov inequality, we have n X j =1 I σnp (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X k ∈ [ n ] \{ j } A jk ¯ W jk ¯ z ∗ k (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) > ρ ≤ σ ρ p exp − r ρ npσ ! , (33)with probability at least1 − ρ pnσ exp − ρ np σ + 116 r ρ npσ ! + exp − ρnp σ + 116 r ρ npσ !! ≥ − ρ pnσ exp − r ρ npσ ! ≥ − exp − r ρ npσ ! . The second high-probability bound we need is for the random variable P nj =1 (cid:12)(cid:12)(cid:12)P k ∈ [ n ] \{ j } A jk Im( ¯ W jk ¯ z ∗ k z ∗ j ) (cid:12)(cid:12)(cid:12) .We first find its expectation. By direct calculation, we have n X j =1 E (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X k ∈ [ n ] \{ j } A jk Im( ¯ W jk ¯ z ∗ k z ∗ j ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = n ( n − p . To study its variance, we introduce the notation ǫ jk = A jk Im( ¯ W jk ¯ z ∗ k z ∗ j ) and ǫ j = P k ∈ [ n ] \{ j } ǫ jk .The random variable ǫ jk satisfies the property − ǫ jk = − A jk Im( ¯ W jk ¯ z ∗ k z ∗ j ) = A kj Im( W jk z ∗ k ¯ z ∗ j ) = A kj Im( ¯ W kj ¯ z ∗ j z ∗ k ) = ǫ kj . With the new notation, we have Var n X j =1 ǫ j ≤ n X j =1 (cid:0) E ǫ j − ( E ǫ j ) (cid:1) + X ≤ j = l ≤ n (cid:0) E ǫ j ǫ l − E ǫ j E ǫ l (cid:1) . (34)For any j = l , we have E ǫ j ǫ l = E X k ∈ [ n ] \{ j } ǫ jk X k ∈ [ n ] \{ l } ǫ lk = X k ∈ [ n ] \{ j } X k ∈ [ n ] \{ j } E ǫ jk ǫ lk . 
Observe that E ǫ jk ǫ lk is either p or p , depending on whether or not ( j, k ) and ( l, k )correspond to the same edge. Therefore E ǫ j ǫ l = ( n − n ) p p , j = l . We also have for any j , E ǫ j = E X k ∈ [ n ] \{ j } ǫ jk = X k =[ n ] \{ j } E ǫ jk + X k,l ∈ [ n ] \{ j } : k = l E ǫ jk E ǫ lk = 3( n − p n − n + 2) p . We plug the above results into the bound (34), and we have Var n X j =1 ǫ j ≤ n ( n − p n ( n − p ≤ n p . (35)Therefore, by Chebyshev inequality, we can conclude that with probability at least 1 − (6 n ) − , n X j =1 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X k ∈ [ n ] \{ j } A jk Im( ¯ W jk ¯ z ∗ k z ∗ j ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ n p (cid:18) √ np (cid:19) . (36)Using the same analysis, we also have n X j =1 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X k ∈ [ n ] \{ j } A jk Re( ¯ W jk ¯ z ∗ k z ∗ j ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ n p (cid:18) √ np (cid:19) , (37)with probability at least 1 − (6 n ) − . Finally, we conclude that the events (29), (30), (31), (32),(33), (36) and (37) hold simultaneously with probability at least 1 − (2 n ) − − exp (cid:18) − q ρ npσ (cid:19) . Step 2: Error decomposition. For any V ∈ C n × n such that ℓ ( V, z ∗ ) ≤ γ , we can write b V = f ( V ) with each column b V j = e V j / k e V j k , where e V j = P k ∈ [ n ] \{ j } A jk ¯ Y jk V k P k ∈ [ n ] \{ j } A jk . ℓ ( V, z ∗ ) ≤ γ implies there exists some b ∈ C n such that k b k = 1 and nℓ ( V, z ∗ ) = k V − bz ∗ H k ≤ γn . By direct calculation, we can write z ∗ j e V j = b + P k ∈ [ n ] \{ j } A jk z ∗ k ( V k − ¯ z ∗ k b ) P k ∈ [ n ] \{ j } A jk + σz ∗ j b P k ∈ [ n ] \{ j } A jk ¯ W jk ¯ z ∗ k P k ∈ [ n ] \{ j } A jk + σz ∗ j P k ∈ [ n ] \{ j } A jk ¯ W jk ( V k − ¯ z ∗ k b ) P k ∈ [ n ] \{ j } A jk = b + 1 n − n X k =1 z ∗ k ( V k − ¯ z ∗ k b ) − n − z ∗ j ( V j − ¯ z ∗ j b )+ P k ∈ [ n ] \{ j } A jk z ∗ k ( V k − ¯ z ∗ k b ) P k ∈ [ n ] \{ j } A jk − n − n X k =1 z ∗ k ( V k − ¯ z ∗ k b ) ! 
+ σz ∗ j b P k ∈ [ n ] \{ j } A jk ¯ W jk ¯ z ∗ k P k ∈ [ n ] \{ j } A jk + σz ∗ j P k ∈ [ n ] \{ j } A jk ¯ W jk ( V k − ¯ z ∗ k b ) P k ∈ [ n ] \{ j } A jk . Now we define a = b + n − P nk =1 z ∗ k ( V k − ¯ z ∗ k b ) and a = a / k a k , and we have z ∗ j a H e V j = k a k − n − z ∗ j a H ( V j − ¯ z ∗ j b ) + a H F j + a H bG j + a H H j , (38) k ( I n − aa H ) e V j k ≤ n − k V j − ¯ z ∗ j b k + k F j k + k ( I n − aa H ) b k| G j | + k H j k , (39)where F j = P k ∈ [ n ] \{ j } A jk z ∗ k ( V k − ¯ z ∗ k b ) P k ∈ [ n ] \{ j } A jk − n − n X k =1 z ∗ k ( V k − ¯ z ∗ k b ) ,G j = σz ∗ j P k ∈ [ n ] \{ j } A jk ¯ W jk ¯ z ∗ k P k ∈ [ n ] \{ j } A jk ,H j = σz ∗ j P k ∈ [ n ] \{ j } A jk ¯ W jk ( V k − ¯ z ∗ k b ) P k ∈ [ n ] \{ j } A jk . By Lemma 5.4, we have the bound k b V j − ¯ z ∗ j a k ≤ k ( I n − aa H ) e V j k + | Im( z ∗ j a H e V j ) | | Re( z ∗ j a H e V j ) | , (40)whenever Re( z ∗ j a H e V j ) > k a − b k = (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) n − n X k =1 z ∗ k ( V k − ¯ z ∗ k b ) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ≤ n − √ n k V − bz ∗ H k F ≤ nn − √ γ ≤ √ γ, we have k a − b k ≤ k a − b k ≤ √ γ . Therefore, k a k ≥ k b k − k a − b k ≥ − √ γ, (41) | a H b − | = | ( a − b ) H b | ≤ k a − b k ≤ √ γ, (42) k ( I n − aa H ) b k ≤ k a − b k + | a H b − | ≤ √ γ. (43)20e also have (cid:12)(cid:12)(cid:12)(cid:12) n − z ∗ j a H ( V j − ¯ z ∗ j b ) (cid:12)(cid:12)(cid:12)(cid:12) ≤ n − k V j − ¯ z ∗ j b k ≤ √ γnn − . (44)Therefore, as long as k F j k ∨ | G j | ∨ k H j k ≤ ρ , we haveRe (cid:16) z ∗ j a H e V j (cid:17) ≥ − √ γ − √ γnn − − ρ ≥ − √ γ + ρ ) , (45)where we have used (41) and (44). 
By (40), we obtain the bound k b V j − ¯ z ∗ j a k ≤ k ( I n − aa H ) e V j k + | Im( z ∗ j a H e V j ) | | Re( z ∗ j a H e V j ) | I {k F j k ∨ | G j | ∨ k H j k ≤ ρ } +4 I {k F j k ∨ | G j | ∨ k H j k > ρ }≤ (cid:16) n − k V j − ¯ z ∗ j b k + k F j k + 8 √ γ | G j | + k H j k (cid:17) (1 − √ γ + ρ )) + (cid:16) n − k V j − ¯ z ∗ j b k + k F j k + (1 + 4 √ γ ) | Im( a H bG j ) | + k H j k (cid:17) (1 − √ γ + ρ )) +4 I {k F j k > ρ } + 4 I {| G j | > ρ } + 4 I {k H j k > ρ }≤ (1 + η )(1 + 4 √ γ ) | Im( a H bG j ) | + 256 γ | G j | (1 − √ γ + ρ )) + (7 + 4 η − ) (cid:16) n − k V j − ¯ z ∗ j b k + k F j k + k H j k (cid:17) (1 − √ γ + ρ )) +4 I {k F j k > ρ } + 4 I {| G j | > ρ } + 4 I {k H j k > ρ } , for some η = o (1) to be specified later. The last inequality above is due to Jensen’s inequality. Step 3: Analysis of each error term. Next, we will analyze the error terms F j , H j and G j separately. By triangle inequality, (29) and (30), we have k F j k ≤ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) P k ∈ [ n ] \{ j } ( A jk − p ) z ∗ k ( V k − ¯ z ∗ k b ) P k ∈ [ n ] \{ j } A jk (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) + (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) p X k ∈ [ n ] \{ j } z ∗ k ( V k − ¯ z ∗ k b ) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) P k ∈ [ n ] \{ j } A jk − n − p (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ np (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) X k ∈ [ n ] \{ j } ( A jk − p ) z ∗ k ( V k − ¯ z ∗ k b ) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) + p √ n k V − bz ∗ H k F (cid:12)(cid:12)(cid:12)P k ∈ [ n ] \{ j } ( A jk − p ) (cid:12)(cid:12)(cid:12) n p ≤ np (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) X k ∈ [ n ] \{ j } ( A jk − p ) z ∗ k ( V k − ¯ z ∗ k b ) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) + C √ p log nnp k V − bz ∗ H k F . 
n X j =1 k F j k ≤ n p n X j =1 (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) X k ∈ [ n ] \{ j } ( A jk − p ) z ∗ k ( V k − ¯ z ∗ k b ) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) + 2 C log nnp k V − bz ∗ H k ≤ n p k A − E A k k V − bz ∗ H k + 2 C log nnp k V − bz ∗ H k ≤ C log nnp k V − bz ∗ H k . (46)The above bound also implies n X j =1 I {k F j k > ρ } ≤ ρ − n X j =1 k F j k ≤ C ρ log nnp k V − bz ∗ H k . Similarly, we can also bound the error terms that depend on H j . By (29) and (32), we have n X j =1 k H j k ≤ σ n p n X j =1 (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) X k ∈ [ n ] \{ j } A jk ¯ W jk ( V k − ¯ z ∗ k b ) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) = 2 σ n p k ( V − bz ∗ H )( A ◦ W ) H k ≤ σ n p k A ◦ W k k V − bz ∗ H k ≤ C σ np k V − bz ∗ H k , (47)and thus n X j =1 I {k H j k > ρ } ≤ ρ − n X j =1 k H j k ≤ C ρ σ np k V − bz ∗ H k . For the contribution of G j , we use (29) and (33), and have n X j =1 I {| G j | > ρ } ≤ n X j =1 I σnp (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X k ∈ [ n ] \{ j } A jk ¯ W jk ¯ z ∗ k (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) > ρ ≤ σ ρ p exp − r ρ npσ ! . (48)22ext, we study the main error term | Im( a H bG j ) | . By (29), we have n X j =1 | Im( a H bG j ) | ≤ C s log nnp ! σ n p n X j =1 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X k ∈ [ n ] \{ j } A jk Im( ¯ W jk z ∗ j ¯ z ∗ k a H b ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ (1 + η ) C s log nnp ! σ n p n X j =1 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X k ∈ [ n ] \{ j } A jk Im( ¯ W jk z ∗ j ¯ z ∗ k ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) +(1 + η − ) C s log nnp ! σ n p n X j =1 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X k ∈ [ n ] \{ j } A jk Re( ¯ W jk z ∗ j ¯ z ∗ k ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) | Im( a H b ) | . By (42), we have | Im( a H b ) | = | Im( a H b − | ≤ | a H b − | ≤ √ γ. 
Together with (36) and (37), we have n X j =1 | Im( a H bG j ) | ≤ C η + η − γ + s log nnp !! σ p . (49)We also have n X j =1 | G j | ≤ σ n p n X j =1 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X k ∈ [ n ] \{ j } A jk ¯ W jk z ∗ j ¯ z ∗ k (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ C σ p , (50)by (36) and (37). Step 4: Combining the bounds. Plugging all the individual error bounds obtained inStep 3 into the error decomposition in Step 2, we obtain nℓ ( b V , z ∗ ) ≤ n X j =1 k b V j − ¯ z ∗ j a k ≤ C ρ + η + √ γ + η − γ + s log nnp !! σ p + 16 σ ρ p exp − r ρ npσ ! + C (cid:0) η − + ρ − (cid:1) log n + σ np nℓ ( V, z ∗ ) . We set η = s γ + log n + σ np and ρ = √ s log n + σ np . Then, since ρ npσ → ∞ , we have16 σ ρ p exp − r ρ npσ ! ≤ σ ρ p (cid:18) σ ρ np (cid:19) ≤ σ p s σ np . ℓ ( b V , z ∗ ) ≤ C (cid:18) γ + log n + σ np (cid:19) / ! σ np + C s log n + σ np ℓ ( V, z ∗ ) . Since the above inequality is derived from the conditions (29), (30), (31), (32), (33), (36),(37) and ℓ ( V, z ∗ ) ≤ γ , it holds uniformly over all V ∈ C n × n such that ℓ ( V, z ∗ ) ≤ γ withprobability at least 1 − (2 n ) − − exp (cid:16) − (cid:0) npσ (cid:1) / (cid:17) . The proof is complete. Similar to the proof of Lemma 2.1, we organize the proof of Lemma 4.1 into four steps. Step 1: Some high-probability events. We already know that (29), (30) and (31) holdwith probability at least 1 − n − . We also have k A ◦ W k op ≤ C √ np, (51)with probability at least 1 − n − by Lemma 5.2. Note that the matrix W in (51) is real-valued,compared with the complex version of the bound (32). Another high-probability event weneed is for the random variable P nj =1 (cid:12)(cid:12)(cid:12)P k ∈ [ n ] \{ j } A jk W jk z ∗ j z ∗ k (cid:12)(cid:12)(cid:12) . By direct calculation, thisrandom variable has expectation n ( n − p . With a similar analysis that leads to (35), itsvariance can also be bounded by the order of n p . 
Therefore, by Chebyshev inequality, wecan conclude that with probability at least 1 − (3 n p ) − , n X j =1 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X k ∈ [ n ] \{ j } A jk W jk z ∗ j z ∗ k (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ Cn p. (52)In the end, we conclude that the events (29), (30), (31), (51) and (52) hold simultaneouslywith probability at least 1 − (2 n p ) − . Step 2: Error decomposition. For any V ∈ R n × n such that ℓ ( V, z ∗ ) ≤ γ , we can write b V = f ( V ) with each column b V j = e V j / k e V j k , where e V j = P k ∈ [ n ] \{ j } A jk Y jk V k P k ∈ [ n ] \{ j } A jk . The condition ℓ ( V, z ∗ ) ≤ γ implies there exists some b ∈ R n such that k b k = 1 and nℓ ( V, z ∗ ) = k V − bz ∗ T k ≤ γn . By direct calculation, we can write z ∗ j e V j = b + 1 n − n X k =1 z ∗ k ( V k − z ∗ k b ) − n − z ∗ j ( V j − z ∗ j b )+ P k ∈ [ n ] \{ j } A jk z ∗ k ( V k − z ∗ k b ) P k ∈ [ n ] \{ j } A jk − n − n X k =1 z ∗ k ( V k − z ∗ k b ) ! + σz ∗ j b P k ∈ [ n ] \{ j } A jk W jk z ∗ k P k ∈ [ n ] \{ j } A jk + σz ∗ j P k ∈ [ n ] \{ j } A jk W jk ( V k − z ∗ k b ) P k ∈ [ n ] \{ j } A jk . a = b + n − P nk =1 z ∗ k ( V k − z ∗ k b ) and a = a / k a k , and we have z ∗ j a T e V j = k a k − n − z ∗ j a T ( V j − z ∗ j b ) + a T F j + a T bG j + a T H j , k ( I n − aa T ) e V j k ≤ n − k V j − z ∗ j b k + k F j k + k ( I n − aa T ) b k| G j | + k H j k , where F j = P k ∈ [ n ] \{ j } A jk z ∗ k ( V k − z ∗ k b ) P k ∈ [ n ] \{ j } A jk − n − n X k =1 z ∗ k ( V k − z ∗ k b ) ,G j = σz ∗ j P k ∈ [ n ] \{ j } A jk W jk z ∗ k P k ∈ [ n ] \{ j } A jk ,H j = σz ∗ j P k ∈ [ n ] \{ j } A jk W jk ( V k − z ∗ k b ) P k ∈ [ n ] \{ j } A jk . 
By Lemma 5.4, we have the bound k b V j − z ∗ j a k ≤ k ( I n − aa T ) e V j k | z ∗ j a T e V j | , (53)whenever z ∗ j a T e V j > k a − b k = (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) n − n X k =1 z ∗ k ( V k − z ∗ k b ) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ≤ n − √ n k V − bz ∗ T k F ≤ √ n k V − bz ∗ T k F , we have k a − b k ≤ k a − b k ≤ √ n k V − bz ∗ T k F . Therefore, k a k ≥ k b k − k a − b k ≥ − √ n k V − bz ∗ T k F , (54) | a T b − | = | ( a − b ) T b | ≤ k a − b k ≤ √ n k V − bz ∗ T k F , (55) k ( I n − aa T ) b k ≤ k a − b k + | a T b − | ≤ √ n k V − bz ∗ T k F . (56)We also have (cid:12)(cid:12)(cid:12)(cid:12) n − z ∗ j a T ( V j − z ∗ j b ) (cid:12)(cid:12)(cid:12)(cid:12) ≤ n − k V j − z ∗ j b k ≤ n − k V − bz ∗ T k F . (57)Therefore, as long as k F j k ∨ k H j k ≤ ρ and | G j | ≤ − ρ , we have z ∗ j a T e V j ≥ − √ n k V − bz ∗ T k F − n − k V − bz ∗ T k F − k F j k − | G j | − k H j k≥ − √ γ − ρ − (1 − ρ ) ≥ ρ, ρ to be some sequence that satisfies ρ = o (1)and ρ ≥ √ γ . The specific choice of ρ will be given later. By (53), we obtain the bound k b V j − z ∗ j a k ≤ k ( I n − aa T ) e V j k | z ∗ j a T e V j | I {k F j k ∨ k H j k ≤ ρ, | G j | ≤ − ρ } +4 I {k F j k ∨ k H j k ≤ ρ, | G j | ≤ − ρ }≤ ρ (cid:18) n − k V j − z ∗ j b k + k F j k + k ( I n − aa T ) b k| G j | + k H j k (cid:19) +4 I {k F j k > ρ } + 4 I {| G j | > − ρ } + 4 I {k H j k > ρ }≤ k V j − z ∗ j b k ρ ( n − + 4 k F j k ρ + 4 k ( I n − aa T ) b k | G j | ρ + 4 k H j k ρ + 4 k F j k ρ + 4 k H j k ρ + 4 I {| G j | > − ρ }≤ k V j − z ∗ j b k ρ ( n − + 8 k F j k ρ + 256 k V − bz ∗ T k | G j | nρ + 8 k H j k ρ +4 I {| G j | > − ρ } . We have used (56), Jensen’s inequality and Markov’s inequality in the above derivation. Step 3: Analysis of each error term. Next, we will analyze the error terms F j , H j and G j separately. 
Following the same analysis that leads to (46), (47) and (50), we have n X j =1 k F j k ≤ C log nnp k V − bz ∗ H k , n X j =1 k H j k ≤ C σ np k V − bz ∗ H k , n X j =1 | G j | ≤ C σ p . Note that the above three bounds are based on the events (29), (30), (31), (51) and (52). Step 4: Combining the bounds. Plugging all the individual error bounds obtained inStep 3 into the error decomposition in Step 2, we obtain nℓ ( b V , z ∗ ) ≤ n X j =1 k b V j − z ∗ j a k ≤ (cid:18) ρ ( n − + 8 C log n + (8 C + 256 C ) σ ρ np (cid:19) nℓ ( V, z ∗ ) + 4 n X j =1 I {| G j | > − ρ } . Set ρ = C (cid:18) log n + σ np (cid:19) , C such that ρ ( n − + C log n +(8 C +256 C ) σ ρ np ≤ . Then,we have ℓ ( b V , z ∗ ) ≤ ℓ ( V, z ∗ ) + 4 n n X j =1 I {| G j | > − ρ }≤ ℓ ( V, z ∗ ) + 4 n n X j =1 I σ ( n − p (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X k ∈ [ n ] \{ j } z ∗ k A jk W jk (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) > − C s log n + σ np ! , where the last inequality is by (29) and (30). Since the above about is derived from theconditions (29), (30), (31), (51) and (52) and ℓ ( V, z ∗ ) ≤ γ , it holds uniformly over all V ∈ R n × n such that ℓ ( V, z ∗ ) ≤ γ with probability at least 1 − (2 n p ) − . The proof is complete. Proof of Theorem 2.2. We obtain (19) as a consequence of Lemma 2.1 and Lemma 2.2, whichimmediately implies the first conclusion. The second conclusion is a consequence of Lemma5.5. Proof of Theorem 2.3. By Theorem 2.2, we have k b V − bz ∗ H k ≤ (1 + o (1)) σ p with highprobability for some b ∈ C n such that k b k = 1. Since b V = f ( b V ), we can follow the sameanalysis in the proof of Lemma 2.1 and obtain the bound k b V − az ∗ H k ≤ (1 + δ ) σ p , (58)with high probability, where δ = O (cid:18)(cid:16) log n + σ np (cid:17) / (cid:19) and a = a / k a k with a = b + n − P nk =1 z ∗ k ( b V k − ¯ z ∗ k b ). 
By the definition of b z , we can write b z j = e z j / | e z j | for all j ∈ [ n ]with e z = b V H b a , where b a is the leading left singular vector of b V . By (58) and Wedin’s sin-thetatheorem [28], we have k b a − ha k ≤ σ np , (59)form some h ∈ C . Define d = a H b a and d = d / | d | . With e z j = b V H j b a , we have e z j ¯ z ∗ j ¯ d = | d | + h ¯ d ¯ z ∗ j ( b V j − ¯ z ∗ j a ) H a + ¯ d ( b V j − ¯ z ∗ j a ) H ( b a − ha )¯ z ∗ j . (60)By Lemma 5.4, we have the bound | b z j − dz ∗ j | = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) e z j ¯ z ∗ j ¯ d | e z j ¯ z ∗ j ¯ d | − (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ | Im( e z j ¯ z ∗ j ¯ d ) | | Re( e z j ¯ z ∗ j ¯ d ) | , (61)as long as Re( e z j ¯ z ∗ j ¯ d ) > 0. By (59), we have | d | ≥ Re( h b a H a ) ≥ − σ np and | ¯ d ( b V j − ¯ z ∗ j a ) H ( b a − ha )¯ z ∗ j | ≤ q σ np k b V j − ¯ z ∗ j a k . Moreover, | h ¯ d ¯ z ∗ j ( b V j − ¯ z ∗ j a ) H a | ≤ | ( b V j − ¯ z ∗ j a ) H a | = | b V H j a ¯ z ∗ j − | = | a H b V j z ∗ j − | . e z j ¯ z ∗ j ¯ d ) ≥ − σ np − | a H b V j z ∗ j − | − s σ np k b V j − ¯ z ∗ j a k . (62)Since b V = f ( b V ), we can write b V j = e V j / k e V j k , where e V j = P k ∈ [ n ] \{ j } A jk Y jk b V k P k ∈ [ n ] \{ j } A jk . Similar to the decomposition (38), we can write a H e V j z ∗ j = k a k − n − z ∗ j a H ( b V j − ¯ z ∗ j b ) + a H F j + a H bG j + a H H j , where F j = P k ∈ [ n ] \{ j } A jk z ∗ k ( b V k − ¯ z ∗ k b ) P k ∈ [ n ] \{ j } A jk − n − n X k =1 z ∗ k ( b V k − ¯ z ∗ k b ) ,G j = σz ∗ j P k ∈ [ n ] \{ j } A jk ¯ W jk ¯ z ∗ k P k ∈ [ n ] \{ j } A jk ,H j = σz ∗ j P k ∈ [ n ] \{ j } A jk ¯ W jk ( b V k − ¯ z ∗ k b ) P k ∈ [ n ] \{ j } A jk . By the same argument that leads to (45) with γ = σ np , we know that as long as k F j k ∨ | G j | ∨k H j k ≤ ρ , we have | Re( a H e V j z ∗ j ) − | ≤ ρ + 3 s σ np . 
(63)Moreover, | Im( a H e V j z ∗ j ) | ≤ n − k b V j − ¯ z ∗ j b k + k F j k + | Im( a H bG j ) | + k H j k≤ n − s σ p + k F j k + | Im( a H bG j ) | + k H j k (64) ≤ n − s σ p + 3 ρ. By a similar bound to (39), we also have k ( I n − aa H ) e V j k ≤ n − k b V j − ¯ z ∗ j b k + k F j k + | G j | + k H j k ≤ n − s σ p + 3 ρ. With the decomposition k e V j k = k ( I n − aa H ) e V j k + | Im( a H e V j z ∗ j ) | + | Re( a H e V j z ∗ j ) | , we have (cid:12)(cid:12)(cid:12) k e V j k − (cid:12)(cid:12)(cid:12) ≤ k ( I n − aa H ) e V j k + | Im( a H e V j z ∗ j ) | + (cid:12)(cid:12)(cid:12) | Re( a H e V j z ∗ j ) | − (cid:12)(cid:12)(cid:12) ≤ ρ + 4 s σ np . (65)28his leads to the bound | a H b V j z ∗ j − | ≤ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) Re( a H e V j z ∗ j ) − k e V j kk e V j k (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) + | Im( a H e V j z ∗ j ) |k e V j k ≤ C ρ + s σ np ! . By Lemma 5.4, we have the bound k b V j − ¯ z ∗ j a k ≤ k ( I n − aa H ) e V j k + | Im( z ∗ j a H e V j ) | | Re( z ∗ j a H e V j ) | ≤ C (cid:18) ρ + σ n p (cid:19) . (66)Plugging the above two bounds into (62), we haveRe( e z j ¯ z ∗ j ¯ d ) ≥ − C ρ + s σ np ! . Therefore, the bound (61) holds when k F j k ∨ | G j | ∨ k H j k ≤ ρ , and we have | b z j − dz ∗ j | ≤ | Im( e z j ¯ z ∗ j ¯ d ) | (cid:16) − C (cid:16) ρ + q σ np (cid:17)(cid:17) I {k F j k ∨ | G j | ∨ k H j k ≤ ρ } + 4 I {k F j k ∨ | G j | ∨ k H j k > ρ } . Now we need to bound | Im( e z j ¯ z ∗ j ¯ d ) | according to the expansion (60). We have | Im( e z j ¯ z ∗ j ¯ d ) | ≤ (cid:12)(cid:12)(cid:12) Im(¯ z ∗ j ( b V j − ¯ z ∗ j a ) H a ) (cid:12)(cid:12)(cid:12) + | Im( h ¯ d ) | (cid:12)(cid:12)(cid:12) Re(¯ z ∗ j ( b V j − ¯ z ∗ j a ) H a ) (cid:12)(cid:12)(cid:12) + k b V j − ¯ z ∗ j a kk b a − ha k . 
(67)By (59) and (66), the third term in the bound (67) can be further bounded by C q σ np (cid:16) ρ + q σ n p (cid:17) .To bound the second term on the right hand side of (67), we have | Im( h ¯ d ) | ≤ | Im( ha H b a ) | ≤ p − | Re( ha H a ) | ≤ q σ np by (59). Together with (63), we obtain the bound 3 q σ np (cid:16) ρ + q σ np (cid:17) .By (65), we can bound the first term in the bound (67) by (cid:12)(cid:12)(cid:12) Im(¯ z ∗ j ( b V j − ¯ z ∗ j a ) H a ) (cid:12)(cid:12)(cid:12) = (cid:12)(cid:12)(cid:12) Im( b V H j a ¯ z ∗ j ) (cid:12)(cid:12)(cid:12) = (cid:12)(cid:12)(cid:12) Im( a H b V j z ∗ j ) (cid:12)(cid:12)(cid:12) ≤ (cid:12)(cid:12)(cid:12) Im( a H e V j z ∗ j ) (cid:12)(cid:12)(cid:12) − (cid:16) ρ + q σ np (cid:17) . Then, we have | b z j − dz ∗ j | ≤ C ρ + s σ np !! (cid:12)(cid:12)(cid:12) Im( a H e V j z ∗ j ) (cid:12)(cid:12)(cid:12) + C σ np (cid:18) ρ + σ np (cid:19) +4 I {k F j k ∨ | G j | ∨ k H j k > ρ }≤ C ρ + s σ np + η !! | Im( a H bG j ) | + C η − (cid:18) σ n p + k F j k + k H j k (cid:19) + C σ np (cid:18) ρ + σ np (cid:19) + 4 ρ − (cid:0) k F j k + k H j k (cid:1) + 4 I {| G j | > ρ } , η = o (1) to be specified later, where the last inequality is by (64). Summing over j ∈ [ n ], we obtain nℓ ( b z, z ∗ ) ≤ n X j =1 | b z j − dz ∗ j | ≤ C ρ + s σ np + η !! n X j =1 | Im( a H bG j ) | + C η − σ np + C σ p (cid:18) ρ + σ np (cid:19) +( C η − + 4 ρ − ) n X j =1 (cid:0) k F j k + k H j k (cid:1) + 4 n X j =1 I {| G j | > ρ } . By the same argument that leads to the bound (46), (47), (48) and (49) (with γ = σ np in(49)), we have n X j =1 ( k F j k + k H j k ) ≤ C ′ log nnp k b V − bz ∗ H k ≤ C ′ log nnp σ p , n X j =1 I {| G j | > ρ } ≤ σ ρ p exp − r ρ npσ ! , n X j =1 | Im( a H bG j ) | ≤ C ′′ η + η − σ np + s log nnp !! σ p . Take η = ρ = q log n + σ np , and we have ℓ ( b z, z ∗ ) ≤ O (cid:18) log n + σ np (cid:19) / !! σ np . 
Note that the above bound is derived from the conditions (29), (30), (31), (32), (33), (36), (37), and thus the result holds with high probability.

Proof of Theorem 4.2. The first conclusion is an immediate consequence of Lemma 4.1, Lemma 4.2 and Lemma 4.3. By Lemma 5.5, we also obtain the second conclusion. For the last conclusion, we have $|\widehat z_j-z_j^*|\le|\sqrt nu_j-z_j^*|$ and $|\widehat z_j+z_j^*|\le|\sqrt nu_j+z_j^*|$ by the definition of $\widehat z_j$. Then,
$$\ell(\widehat z,z^*)\le\left\|u-z^*/\sqrt n\right\|^2\wedge\left\|u+z^*/\sqrt n\right\|^2\le\frac{C}{n^2}\|\widehat Z-z^*z^{*T}\|_{\rm F}^2,$$
by the Davis-Kahan theorem [11]. Thus, we can derive the third conclusion from the second one. Finally, when $\sigma^2<\frac{(1-\epsilon)np}{2\log n}$, we know from (27) and Lemma 4.2 that $\ell(\widehat V,z^*)=0$. Lemma 5.5 then implies that $\|\widehat Z-z^*z^{*T}\|_{\rm F}^2=0$, and thus $\widehat Z=z^*z^{*T}$ is a rank-one matrix.

5.5 Proof of Theorem 4.1

Since $\ell(\widehat z,z)=2\left(1-\frac{1}{n}|\widehat z^Tz|\right)$ and $\frac{1}{n^2}\|\widehat z\widehat z^T-zz^T\|_{\rm F}^2=2\left(1-\frac{1}{n^2}|\widehat z^Tz|^2\right)$, we have
$$\frac{1}{n^2}\|\widehat z\widehat z^T-zz^T\|_{\rm F}^2=\ell(\widehat z,z)\left(1+\frac{1}{n}|\widehat z^Tz|\right)\le2\ell(\widehat z,z),$$
and thus
$$\inf_{\widehat z\in\{-1,1\}^n}\sup_{z\in\{-1,1\}^n}\mathbb{E}_z\ell(\widehat z,z)\ge\frac{1}{2}\inf_{\widehat z\in\{-1,1\}^n}\sup_{z\in\{-1,1\}^n}\mathbb{E}_z\frac{1}{n^2}\|\widehat z\widehat z^T-zz^T\|_{\rm F}^2\ge\frac{1}{2}\inf_{\widehat Z\in\mathbb{R}^{n\times n}}\sup_{z\in\{-1,1\}^n}\mathbb{E}_z\frac{1}{n^2}\|\widehat Z-zz^T\|_{\rm F}^2.$$
It suffices to prove a lower bound for the loss $\frac{1}{n^2}\|\widehat Z-zz^T\|_{\rm F}^2$. We lower bound the minimax risk by a Bayes risk,
$$\inf_{\widehat Z\in\mathbb{R}^{n\times n}}\sup_{z\in\{-1,1\}^n}\mathbb{E}_z\frac{1}{n^2}\|\widehat Z-zz^T\|_{\rm F}^2\ge\inf_{\widehat Z\in\mathbb{R}^{n\times n}}\frac{1}{2^n}\sum_{z\in\{-1,1\}^n}\mathbb{E}_z\frac{1}{n^2}\|\widehat Z-zz^T\|_{\rm F}^2\ge\frac{1}{n^2}\sum_{1\le j\ne k\le n}\frac{1}{2^{n-2}}\sum_{z_{-(j,k)}\in\{-1,1\}^{n-2}}\inf_{\widehat T}\frac{1}{4}\sum_{z_j\in\{-1,1\}}\sum_{z_k\in\{-1,1\}}\mathbb{E}_z|\widehat T-z_jz_k|^2,$$
where $z_{-(j,k)}$ is the sub-vector of $z$ obtained by excluding the $j$th and the $k$th entries.
For each $z_{-(j,k)}$, we have
$$\inf_{\widehat T}\frac{1}{4}\sum_{z_j\in\{-1,1\}}\sum_{z_k\in\{-1,1\}}\mathbb{E}_z|\widehat T-z_jz_k|^2\ge\frac{1}{4}\inf_{\widehat T}\left(\mathbb{E}_{(z_{-(j,k)},z_j=1,z_k=-1)}|\widehat T+1|^2+\mathbb{E}_{(z_{-(j,k)},z_j=1,z_k=1)}|\widehat T-1|^2\right)\ge\frac{1}{2}\int d\mathbb{P}_{(z_{-(j,k)},z_j=1,z_k=-1)}\wedge d\mathbb{P}_{(z_{-(j,k)},z_j=1,z_k=1)},$$
where the last inequality is due to the classical Le Cam two-point argument. The total variation affinity characterizes the optimal testing error between the two simple hypotheses $z_k=-1$ and $z_k=1$ when the values of all the other parameters are known. By the Neyman-Pearson lemma, we have
$$\int d\mathbb{P}_{(z_{-(j,k)},z_j=1,z_k=-1)}\wedge d\mathbb{P}_{(z_{-(j,k)},z_j=1,z_k=1)}\ge\mathbb{P}_{(z_{-(j,k)},z_j=1,z_k=-1)}\left(\frac{d\mathbb{P}_{(z_{-(j,k)},z_j=1,z_k=1)}}{d\mathbb{P}_{(z_{-(j,k)},z_j=1,z_k=-1)}}>1\right)=\mathbb{P}_{(z_{-(j,k)},z_j=1,z_k=-1)}\left(\sum_{j\in[n]\setminus\{k\}}z_jA_{jk}Y_{jk}>0\right)=\mathbb{P}\left(\sigma\sum_{j\in[n]\setminus\{k\}}z_jA_{jk}W_{jk}>\sum_{j\in[n]\setminus\{k\}}A_{jk}\right).$$
Let $\mathcal{A}$ be the collection of $A$'s that satisfy the conclusions of Lemma 5.1, so that $\mathbb{P}(\mathcal{A})\ge1-n^{-10}$. Let $\mathbb{P}_A$ be shorthand for the conditional probability $\mathbb{P}(\cdot|A)$. For each $A\in\mathcal{A}$, a standard Gaussian tail bound implies
$$\mathbb{P}_A\left(\sigma\sum_{j\in[n]\setminus\{k\}}z_jA_{jk}W_{jk}>\sum_{j\in[n]\setminus\{k\}}A_{jk}\right)\ge\exp\left(-(1+\delta)\frac{np}{2\sigma^2}\right),$$
for some $\delta=o(1)$. This implies
$$\mathbb{P}\left(\sigma\sum_{j\in[n]\setminus\{k\}}z_jA_{jk}W_{jk}>\sum_{j\in[n]\setminus\{k\}}A_{jk}\right)\ge\inf_{A\in\mathcal{A}}\mathbb{P}_A\left(\sigma\sum_{j\in[n]\setminus\{k\}}z_jA_{jk}W_{jk}>\sum_{j\in[n]\setminus\{k\}}A_{jk}\right)\mathbb{P}(\mathcal{A})\ge\frac{1}{2}\exp\left(-(1+\delta)\frac{np}{2\sigma^2}\right).$$
Therefore,
$$\inf_{\widehat z\in\{-1,1\}^n}\sup_{z\in\{-1,1\}^n}\mathbb{E}_z\ell(\widehat z,z)\ge\frac{1}{2}\inf_{\widehat Z\in\mathbb{R}^{n\times n}}\sup_{z\in\{-1,1\}^n}\mathbb{E}_z\frac{1}{n^2}\|\widehat Z-zz^T\|_{\rm F}^2\ge\frac{1}{16}\exp\left(-(1+\delta)\frac{np}{2\sigma^2}\right).$$
By absorbing the constant $1/16$ into the exponent, the proof is complete.

5.6 Proofs of Lemma 2.2, Lemma 4.3 and Lemma 4.2

Proof of Lemma 2.2. By the definition of $\widehat Z=\widehat V^H\widehat V$, we have $\text{Tr}((A\circ Y)\widehat Z)\ge\text{Tr}((A\circ Y)z^*z^{*H})$. Rearranging this inequality, we obtain
$$\text{Tr}(z^*z^{*H}(z^*z^{*H}-\widehat Z))\le\text{Tr}\left((A\circ Y/p-z^*z^{*H})(\widehat Z-z^*z^{*H})\right).\qquad(68)$$
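The Gaussian tail probability appearing in the lower bound argument for Theorem 4.1 above is easy to simulate: conditional on the Bernoulli mask, the statistic $\sigma\sum_jz_jA_{jk}W_{jk}$ is exactly $N(0,\sigma^2m)$ with $m=\sum_jA_{jk}$, so the two-point testing error is the Gaussian tail $\mathbb{P}(N(0,1)>\sqrt m/\sigma)$. A small Monte Carlo sketch (parameter values are ours, chosen only for illustration):

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(1)
n, p, sigma = 200, 0.5, 6.0

# Bernoulli(p) mask for the edges incident to node k, and arbitrary signs z_j.
active = rng.random(n - 1) < p
m = int(active.sum())
z = rng.choice([-1.0, 1.0], size=m)

# Empirical frequency of the event sigma * sum_j z_j A_jk W_jk > sum_j A_jk,
# compared against the exact Gaussian tail P(N(0,1) > sqrt(m)/sigma).
trials = 50_000
stat = sigma * (rng.standard_normal((trials, m)) * z).sum(axis=1)
empirical = float(np.mean(stat > m))
gaussian_tail = 0.5 * erfc(sqrt(m) / sigma / sqrt(2))
print(empirical, gaussian_tail)
```

The two numbers agree up to Monte Carlo error, and the tail behaves like $\exp(-(1+o(1))np/(2\sigma^2))$, the rate in the theorem.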
The right hand side of (68) can be bounded by
$$\left|\text{Tr}\left((A\circ Y/p-z^*z^{*H})\widehat Z\right)\right|+\left|\text{Tr}\left((A\circ Y/p-z^*z^{*H})z^*z^{*H}\right)\right|\le\|A\circ Y/p-z^*z^{*H}\|_{\rm op}\text{Tr}(\widehat Z)+\|A\circ Y/p-z^*z^{*H}\|_{\rm op}\text{Tr}(z^*z^{*H})=2n\|A\circ Y/p-z^*z^{*H}\|_{\rm op}\le2n\left(\frac{1}{p}\|(A-\mathbb{E}A)\circ z^*z^{*H}\|_{\rm op}+\frac{\sigma}{p}\|A\circ W\|_{\rm op}\right).$$
By Lemma 5.1,
$$\|(A-\mathbb{E}A)\circ z^*z^{*H}\|_{\rm op}=\sup_{\|u\|=1}\Big|\sum_{1\le j\ne k\le n}(A_{jk}-p)z_j^*\bar z_k^*u_j\bar u_k\Big|\le\|A-\mathbb{E}A\|_{\rm op}\le C\sqrt{np},$$
with probability at least $1-n^{-10}$. By Lemma 5.2, $\|A\circ W\|_{\rm op}\le C\sqrt{np}$ with probability at least $1-n^{-10}$. Thus, we have
$$\text{Tr}(z^*z^{*H}(z^*z^{*H}-\widehat Z))\le C_1n\sqrt{\frac{(1+\sigma^2)n}{p}}.$$
Define $m=\frac{1}{n}\sum_{j=1}^n\widehat V_jz_j^*$. By the inequality $\|x/\|x\|-y/\|y\|\|\le2\|x-y\|/\|x\|$, we have
$$\ell(\widehat V,z^*)=\min_{a\in\mathbb{C}^n:\|a\|=1}\frac{1}{n}\sum_{j=1}^n\|\widehat V_jz_j^*-a\|^2=\min_{a\in\mathbb{C}^n\setminus\{0\}}\frac{1}{n}\sum_{j=1}^n\left\|\widehat V_jz_j^*-\frac{a}{\|a\|}\right\|^2\le\min_{a\in\mathbb{C}^n\setminus\{0\}}\frac{4}{n}\sum_{j=1}^n\|\widehat V_jz_j^*-a\|^2=\frac{4}{n}\sum_{j=1}^n\|\widehat V_jz_j^*-m\|^2=\frac{2}{n^2}\sum_{j=1}^n\sum_{l=1}^n\|\widehat V_jz_j^*-\widehat V_lz_l^*\|^2=\frac{4}{n^2}\sum_{j=1}^n\sum_{l=1}^n\left(1-\text{Re}(\bar z_j^*z_l^*\widehat V_j^H\widehat V_l)\right)=\frac{4}{n^2}\text{Tr}(z^*z^{*H}(z^*z^{*H}-\widehat Z)).$$
Therefore, we have $\ell(\widehat V,z^*)\le C_2\sqrt{\frac{1+\sigma^2}{np}}$, and the proof is complete.

Proof of Lemma 4.3. Following the same argument as in the proof of Lemma 2.2, we have
$$\text{Tr}(z^*z^{*T}(z^*z^{*T}-\widehat Z))\le Cn\sqrt{\frac{(1+\sigma^2)n}{p}},$$
with probability at least $1-2n^{-10}$, and $\ell(\widehat V,z^*)\le\frac{4}{n^2}\text{Tr}(z^*z^{*T}(z^*z^{*T}-\widehat Z))$. Then, we obtain the bound $\ell(\widehat V,z^*)\le C\sqrt{\frac{1+\sigma^2}{np}}$, and the proof is complete.

Proof of Lemma 4.2. Let $\mathcal{A}$ be the collection of $A$'s that satisfy the conclusions of Lemma 5.1, so that $\mathbb{P}(\mathcal{A})\ge1-n^{-10}$. Let $\mathbb{P}_A$ be shorthand for the conditional probability $\mathbb{P}(\cdot|A)$.
For each $A\in\mathcal{A}$, a standard Gaussian tail bound implies
$$\frac{1}{n}\sum_{j=1}^n\mathbb{P}_A(|U_j|>1-\delta)\le\frac{1}{n}\sum_{j=1}^n2\exp\left(-\frac{(1-\delta)^2(n-1)^2p^2}{2\sigma^2\sum_{k\in[n]\setminus\{j\}}A_{jk}}\right)\le2\exp\left(-(1-\bar\delta)\frac{np}{2\sigma^2}\right),$$
for some $\bar\delta=O\left(\delta+\sqrt{\frac{\log n}{np}}\right)$. Therefore,
$$\mathbb{P}\left(\frac{1}{n}\sum_{j=1}^n\mathbb{I}\{|U_j|>1-\delta\}>\exp\left(-\left(1-\bar\delta-\sqrt{\frac{\sigma^2}{np}}\right)\frac{np}{2\sigma^2}\right)\right)\le\sup_{A\in\mathcal{A}}\mathbb{P}_A\left(\frac{1}{n}\sum_{j=1}^n\mathbb{I}\{|U_j|>1-\delta\}>\exp\left(-\left(1-\bar\delta-\sqrt{\frac{\sigma^2}{np}}\right)\frac{np}{2\sigma^2}\right)\right)+\mathbb{P}(\mathcal{A}^c)\le\exp\left(-\sqrt{\frac{np}{\sigma^2}}\right)+n^{-10},$$
by Markov's inequality. This immediately implies the first conclusion. For the second conclusion, it is easy to see that when $\left(1-\bar\delta-\sqrt{\frac{\sigma^2}{np}}\right)\frac{np}{2\sigma^2}>\log n$, we have $\frac{1}{n}\sum_{j=1}^n\mathbb{I}\{|U_j|>1-\delta\}<\frac{1}{n}$, and thus the value of $\frac{1}{n}\sum_{j=1}^n\mathbb{I}\{|U_j|>1-\delta\}$ has to be $0$.

References

[1] Abbe, E., Fan, J., Wang, K. and Zhong, Y. [2020]. Entrywise eigenvector analysis of random matrices with low expected rank, Annals of Statistics (3): 1452-1474.
[2] Abbe, E., Massoulie, L., Montanari, A., Sly, A. and Srivastava, N. [2017]. Group synchronization on grids, arXiv preprint arXiv:1706.08561.
[3] Amini, A. A., Levina, E. et al. [2018]. On semidefinite relaxations for the block model, The Annals of Statistics (1): 149-179.
[4] Bandeira, A. S. [2018]. Random Laplacian matrices and convex relaxations, Foundations of Computational Mathematics (2): 345-379.
[5] Bandeira, A. S., Boumal, N. and Singer, A. [2017]. Tightness of the maximum likelihood semidefinite relaxation for angular synchronization, Mathematical Programming (1-2): 145-167.
[6] Bandeira, A. S., Khoo, Y. and Singer, A. [2014]. Open problem: Tightness of maximum likelihood semidefinite relaxations, arXiv preprint arXiv:1404.2655.
[7] Bandeira, A. S. and Van Handel, R. [2016]. Sharp nonasymptotic bounds on the norm of random matrices with independent entries, The Annals of Probability (4): 2479-2506.
[8] Boumal, N. [2016]. Nonconvex phase synchronization, SIAM Journal on Optimization (4): 2355-2377.
[9] Burer, S. and Monteiro, R. D. [2003].
A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization, Mathematical Programming (2): 329-357.
[10] Chen, Y., Li, X. and Xu, J. [2018]. Convexified modularity maximization for degree-corrected stochastic block models, The Annals of Statistics (4): 1573-1602.
[11] Davis, C. and Kahan, W. M. [1970]. The rotation of eigenvectors by a perturbation. III, SIAM Journal on Numerical Analysis (1): 1-46.
[12] Erdogdu, M. A., Ozdaglar, A., Parrilo, P. A. and Vanli, N. D. [2018]. Convergence rate of block-coordinate maximization Burer-Monteiro method for solving large SDPs, arXiv preprint arXiv:1807.04428.
[13] Fei, Y. and Chen, Y. [2020]. Achieving the Bayes error rate in synchronization and block models by SDP, robustly, IEEE Transactions on Information Theory (6): 3929-3953.
[14] Gao, C., Lu, Y., Ma, Z. and Zhou, H. H. [2016]. Optimal estimation and completion of matrices with biclustering structures, The Journal of Machine Learning Research (1): 5602-5630.
[15] Gao, C. and Zhang, A. Y. [2019]. Iterative algorithm for discrete structure recovery, arXiv preprint arXiv:1911.01018.
[16] Gao, C. and Zhang, A. Y. [2020]. Exact minimax estimation for phase synchronization, arXiv preprint arXiv:2010.04345.
[17] Gao, T. and Zhao, Z. [2020]. Multi-frequency phase synchronization, Proceedings of Machine Learning Research.
[18] Hajek, B., Wu, Y. and Xu, J. [2016a]. Achieving exact cluster recovery threshold via semidefinite programming, IEEE Transactions on Information Theory (5): 2788-2797.
[19] Hajek, B., Wu, Y. and Xu, J. [2016b]. Achieving exact cluster recovery threshold via semidefinite programming: Extensions, IEEE Transactions on Information Theory (10): 5918-5937.
[20] Javanmard, A., Montanari, A. and Ricci-Tersenghi, F. [2016]. Phase transitions in semidefinite relaxations, Proceedings of the National Academy of Sciences (16): E2218-E2223.
[21] Lei, J. and Rinaldo, A. [2015].
Consistency of spectral clustering in stochastic block models, The Annals of Statistics (1): 215-237.
[22] Li, X., Chen, Y. and Xu, J. [2018]. Convex relaxation methods for community detection, arXiv preprint arXiv:1810.00315.
[23] Ling, S. [2020]. Solving orthogonal group synchronization via convex and low-rank optimization: Tightness and landscape analysis, arXiv preprint arXiv:2006.00902.
[24] Lu, Y. and Zhou, H. H. [2016]. Statistical and computational guarantees of Lloyd's algorithm and its variants, arXiv preprint arXiv:1612.02099.
[25] Perry, A. and Wein, A. S. [2017]. A semidefinite program for unbalanced multisection in the stochastic block model, IEEE, pp. 64-67.
[26] Singer, A. [2011]. Angular synchronization by eigenvectors and semidefinite programming, Applied and Computational Harmonic Analysis (1): 20-36.
[27] Wang, P.-W., Chang, W.-C. and Kolter, J. Z. [2017]. The mixing method: low-rank coordinate descent for semidefinite programming with diagonal constraints, arXiv preprint arXiv:1706.00476.
[28] Wedin, P.-Å. [1972]. Perturbation bounds in connection with singular value decomposition, BIT Numerical Mathematics (1): 99-111.
[29] Zhang, A. Y. and Zhou, H. H. [2016]. Minimax rates of community detection in stochastic block models, The Annals of Statistics (5): 2252-2280.
[30] Zhong, Y. and Boumal, N. [2018]. Near-optimal bounds for phase synchronization, SIAM Journal on Optimization 28.