Settling the Sharp Reconstruction Thresholds of Random Graph Matching
Yihong Wu, Jiaming Xu, and Sophie H. Yu*

February 2, 2021
Abstract
This paper studies the problem of recovering the hidden vertex correspondence between two edge-correlated random graphs. We focus on the Gaussian model, where the two graphs are complete graphs with correlated Gaussian weights, and the Erdős–Rényi model, where the two graphs are subsampled from a common parent Erdős–Rényi graph $\mathcal{G}(n,p)$. For dense graphs with $p = n^{-o(1)}$, we prove that there exists a sharp threshold, above which one can correctly match all but a vanishing fraction of vertices and below which correctly matching any positive fraction is impossible, a phenomenon known as the "all-or-nothing" phase transition. Even more strikingly, in the Gaussian setting, above the threshold all vertices can be exactly matched with high probability. In contrast, for sparse Erdős–Rényi graphs with $p = n^{-\Theta(1)}$, we show that the all-or-nothing phenomenon no longer holds, and we determine the thresholds up to a constant factor. Along the way, we also derive the sharp threshold for exact recovery, sharpening the existing results in Erdős–Rényi graphs [CK16, CK17].

The proof of the negative results builds upon a tight characterization of the mutual information based on the truncated second-moment computation in [WXY20] and an "area theorem" that relates the mutual information to the integral of the reconstruction error. The positive results follow from a tight analysis of the maximum likelihood estimator that takes into account the cycle structure of the induced permutation on the edges.
*Y. Wu is with the Department of Statistics and Data Science, Yale University, New Haven, CT, USA, [email protected]. J. Xu and S. H. Yu are with The Fuqua School of Business, Duke University, Durham, NC, USA, {jx77,haoyang.yu}@duke.edu. Y. Wu is supported in part by the NSF Grant CCF-1900507, an NSF CAREER award CCF-1651588, and an Alfred Sloan fellowship. J. Xu is supported by the NSF Grants IIS-1838124, CCF-1850743, and CCF-1856424.
The problem of graph matching (or network alignment) refers to finding the underlying vertex correspondence between two graphs on the sole basis of their network topologies. Going beyond the worst-case intractability of finding the optimal correspondence (the quadratic assignment problem [PW94, BCPP98]), an emerging line of research is devoted to the average-case analysis of graph matching under meaningful statistical models, focusing on either information-theoretic limits [CK16, CK17, CKMP19, HM20, WXY20, Gan20] or computationally efficient algorithms [FQRM+16, LFF+16, DMWX20, BCL+19, FMWX19a, FMWX19b, GM20]. Despite these recent advances, the sharp thresholds of graph matching remain not fully understood, especially for approximate reconstruction. The current paper aims to close this gap.

Following [PG11, DMWX20], we consider the following probabilistic model for two random graphs correlated through a hidden vertex correspondence. Let the ground truth $\pi$ be a uniformly random permutation on $[n]$. We generate two random weighted graphs on the common vertex set $[n]$ with (weighted) adjacency vectors $A = (A_{ij})_{1 \le i < j \le n}$ and $B = (B_{ij})_{1 \le i < j \le n}$, such that, conditioned on $\pi$, the pairs $(A_{\pi(i)\pi(j)}, B_{ij})$ are i.i.d. with joint distribution $P$. We consider two choices of $P$:

• (Gaussian model): $P$ denotes the joint distribution of two standard normal random variables with correlation coefficient $\rho \ge 0$. In this case, we have $B = \rho A^{\pi} + \sqrt{1-\rho^2}\, Z$, where $A$ and $Z$ are independent standard normal vectors and $A^{\pi}_{ij} = A_{\pi(i)\pi(j)}$.

• (Erdős–Rényi random graph): $P$ denotes the joint distribution of two correlated $\mathrm{Bern}(q)$ random variables $X$ and $Y$ such that $\mathbb{P}\{Y = 1 \mid X = 1\} = s$, where $q \le s \le 1$. In this case, $A$ and $B$ are the adjacency vectors of two Erdős–Rényi random graphs $G_1, G_2 \sim \mathcal{G}(n, q)$, where $G_1^{\pi}$ (with the adjacency vector $A^{\pi}$) and $G_2$ are independently edge-subsampled from a common parent graph $G_0 \sim \mathcal{G}(n, p)$ with $p = q/s$.

Given the observations $A$ and $B$, the goal is to recover the latent vertex correspondence $\pi$ as accurately as possible. More specifically, given two permutations $\pi, \hat{\pi}$ on $[n]$, denote the fraction of their overlap by $\mathsf{overlap}(\pi, \hat{\pi}) \triangleq \frac{1}{n} |\{ i \in [n] : \pi(i) = \hat{\pi}(i)\}|$.

Definition 1.
We say an estimator $\hat{\pi}$ of $\pi$ achieves, as $n \to \infty$,

• partial recovery, if $\mathbb{P}\{\mathsf{overlap}(\hat{\pi}, \pi) \ge \delta\} = 1 - o(1)$ for some constant $\delta \in (0, 1]$;

• almost exact recovery, if $\mathbb{P}\{\mathsf{overlap}(\hat{\pi}, \pi) \ge \delta\} = 1 - o(1)$ for any constant $\delta \in (0, 1)$;

• exact recovery, if $\mathbb{P}\{\mathsf{overlap}(\hat{\pi}, \pi) = 1\} = 1 - o(1)$.

The information-theoretic threshold of exact recovery has been determined for the Erdős–Rényi graph model [CK17] in the regime of $p = O(\log^{-3}(n))$ and more recently for the Gaussian model [Gan20]; however, the results and proof techniques in [CK17] do not hold for denser graphs. In contrast, approximate recovery is far less well understood. Apart from the sharp condition for almost exact recovery in the sparse regime $p = n^{-\Omega(1)}$ [CKMP19], only upper and lower bounds are known for partial recovery [HM20]. See Section 1.2 for a detailed review of these previous results.

In this paper, we characterize the sharp reconstruction thresholds for both the Gaussian and dense Erdős–Rényi graphs with $p = n^{-o(1)}$. Specifically, we prove that there exists a sharp threshold, above which one can estimate all but a vanishing fraction of the latent permutation and below which recovering any positive fraction is impossible, a phenomenon known as the "all-or-nothing" phase transition [RXZ19]. This phenomenon is even more striking in the Gaussian model, in the sense that above the threshold the hidden permutation can be estimated error-free with high probability. In contrast, for sparse Erdős–Rényi graphs with $p = n^{-\Theta(1)}$, we show that the all-or-nothing phenomenon no longer holds. To this end, we determine the threshold for partial recovery up to a constant factor and show that it is order-wise smaller than the almost exact recovery threshold found in [CKMP19].

Along the way, we also derive a sharp threshold for exact recovery, sharpening existing results in [CK16, CK17].
As a byproduct, the same technique yields an alternative proof of the result in [Gan20] for the Gaussian model.
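To make the two observation models and the overlap metric concrete, the following Python sketch (our own illustration; all function names are ours, not from the paper) samples each model and runs the brute-force maximum likelihood estimator $\hat{\pi}_{\mathrm{ML}} \in \arg\max_{\pi'} \langle A^{\pi'}, B\rangle$ for a tiny $n$, where enumeration over all of $S_n$ is still feasible:

```python
# Hedged sketch of the two correlated models and the brute-force MLE.
# Function names and parameter choices are our own, for illustration only.
import itertools
import numpy as np

rng = np.random.default_rng(0)

def sample_gaussian_model(n, rho):
    """A_ij i.i.d. N(0,1); B_ij = rho * A_{pi(i)pi(j)} + sqrt(1-rho^2) * Z_ij."""
    pi = rng.permutation(n)
    A = rng.standard_normal((n, n)); A = np.triu(A, 1); A = A + A.T
    Z = rng.standard_normal((n, n)); Z = np.triu(Z, 1); Z = Z + Z.T
    A_pi = A[np.ix_(pi, pi)]                     # relabeled weights A^pi
    B = rho * A_pi + np.sqrt(1 - rho**2) * Z
    return A, B, pi

def sample_er_model(n, p, s):
    """Parent G0 ~ G(n,p); G1^pi and G2 keep each parent edge indep. w.p. s."""
    pi = rng.permutation(n)
    parent = np.triu(rng.random((n, n)) < p, 1)
    keep1 = np.triu(rng.random((n, n)) < s, 1)
    keep2 = np.triu(rng.random((n, n)) < s, 1)
    A_pi = (parent & keep1).astype(float); A_pi = A_pi + A_pi.T
    B = (parent & keep2).astype(float); B = B + B.T
    inv = np.argsort(pi)                         # pi^{-1}: undo the relabeling
    A = A_pi[np.ix_(inv, inv)]                   # so that A^pi = (given matrix)
    return A, B, pi

def overlap(pi, pi_hat):
    return np.mean(pi == pi_hat)

def mle(A, B):
    """Brute-force argmax_{pi'} <A^{pi'}, B>; only feasible for very small n."""
    n = A.shape[0]
    best, best_pi = -np.inf, None
    for perm in itertools.permutations(range(n)):
        perm = np.array(perm)
        val = np.sum(A[np.ix_(perm, perm)] * B) / 2
        if val > best:
            best, best_pi = val, perm
    return best_pi

A, B, pi = sample_gaussian_model(6, rho=0.95)
print(overlap(pi, mle(A, B)))   # with high correlation, typically perfect overlap
```

At such a small $n$ the thresholds of the theorems below are of course not visible; the sketch only illustrates the data-generating processes and the objective in (1).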
Throughout the paper, we let $\epsilon > 0$ denote an arbitrarily small but fixed constant. Let $\hat{\pi}_{\mathrm{ML}}$ denote the maximum likelihood estimator, which reduces to
$$\hat{\pi}_{\mathrm{ML}} \in \arg\max_{\pi'} \langle A^{\pi'}, B \rangle. \quad (1)$$

Theorem 1 (Gaussian model). If
$$\rho^2 \ge \frac{(4+\epsilon)\log n}{n}, \quad (2)$$
then $\mathbb{P}\{\mathsf{overlap}(\hat{\pi}_{\mathrm{ML}}, \pi) = 1\} = 1 - o(1)$. Conversely, if
$$\rho^2 \le \frac{(4-\epsilon)\log n}{n}, \quad (3)$$
then for any estimator $\hat{\pi}$ and any fixed constant $\delta > 0$, $\mathbb{P}\{\mathsf{overlap}(\hat{\pi}, \pi) \le \delta\} = 1 - o(1)$.

Theorem 1 implies that in the Gaussian setting, the recovery of $\pi$ exhibits a sharp phase transition in terms of the limiting value of $\frac{n\rho^2}{\log n}$ at threshold 4, above which exact recovery is possible and below which even partial recovery is impossible. The positive part of Theorem 1 was first shown in [Gan20]. Here we provide an alternative proof that does not rely on the Gaussian property and works for Erdős–Rényi graphs as well.

The next result determines the sharp threshold for the Erdős–Rényi model in terms of the key quantity $nps^2$, the average degree of the intersection graph $G_1 \wedge G_2$ (whose edges are sampled by both $G_1$ and $G_2$).

Theorem 2 (Erdős–Rényi graphs, dense regime). Assume $p$ is bounded away from 1 and $p = n^{-o(1)}$. If
$$nps^2 \ge \frac{(2+\epsilon)\log n}{\log\frac{1}{p} - 1 + p}, \quad (4)$$
then for any constant $\delta < 1$, $\mathbb{P}\{\mathsf{overlap}(\hat{\pi}_{\mathrm{ML}}, \pi) \ge \delta\} = 1 - o(1)$. Conversely, if
$$nps^2 \le \frac{(2-\epsilon)\log n}{\log\frac{1}{p} - 1 + p}, \quad (5)$$
then for any estimator $\hat{\pi}$ and any constant $\delta > 0$, $\mathbb{P}\{\mathsf{overlap}(\hat{\pi}, \pi) \le \delta\} = 1 - o(1)$.

Theorem 2 implies that, analogous to the Gaussian model, in dense Erdős–Rényi graphs the recovery of $\pi$ exhibits an "all-or-nothing" phase transition in terms of the limiting value of $\frac{nps^2 (\log\frac{1}{p} - 1 + p)}{\log n}$ at threshold 2, above which almost exact recovery is possible and below which even partial recovery is impossible. However, as we will see in Theorem 4, unlike the Gaussian model, this threshold differs from that of exact recovery.

Remark 1 (Entropy interpretation of the thresholds). The sharp thresholds in Theorem 1 and Theorem 2 can be interpreted from an information-theoretic perspective.
Suppose an estimator $\hat{\pi} = \hat{\pi}(A, B)$ achieves almost exact recovery with $\mathbb{E}[\mathsf{overlap}(\pi, \hat{\pi})] = 1 - o(1)$, which, by a rate-distortion computation, implies that $I(\pi; \hat{\pi})$ must be close to the full entropy of $\pi$, that is, $I(\pi; \hat{\pi}) = (1-o(1)) n \log n$. On the other hand, by the data processing inequality, we have $I(\pi; \hat{\pi}) \le I(\pi; A, B)$. The latter can be bounded simply as (see Section 1.3.2)
$$I(\pi; A, B) \le \binom{n}{2} I(P), \quad (6)$$
where $I(P)$ denotes the mutual information between a pair of random variables with joint distribution $P$. For the Gaussian model, we have
$$I(P) = \frac{1}{2} \log \frac{1}{1-\rho^2}. \quad (7)$$
For the correlated Erdős–Rényi graph,
$$I(P) = q\, d(s\|q) + (1-q)\, d(\eta\|q), \quad (8)$$
where $q = ps$, $\eta \triangleq \frac{q(1-s)}{1-q}$, and $d(s\|q) \triangleq D(\mathrm{Bern}(s)\|\mathrm{Bern}(q))$ denotes the binary KL divergence. By Taylor expansion, we have $I(P) = s^2 p \left( \log\frac{1}{p} - 1 + p \right)(1+o(1))$. Combining these with the requirement $\binom{n}{2} I(P) \ge (1-o(1)) n \log n$ shows that under the conditions (3) and (5), almost exact recovery is impossible. The fact that these conditions also preclude partial recovery takes more effort to show, which we do in Section 2.

Theorem 3 (Erdős–Rényi graphs, sparse regime). Assume $p = n^{-\Omega(1)}$. If
$$nps^2 \ge (2+\epsilon) \max\left\{ \frac{\log n}{\log(1/p)},\ 1 \right\}, \quad (9)$$
then there exists a constant $\delta > 0$ such that $\mathbb{P}\{\mathsf{overlap}(\hat{\pi}_{\mathrm{ML}}, \pi) \ge \delta\} = 1 - o(1)$. Conversely, assuming $np = \omega(\log^2 n)$, if
$$nps^2 \le 1 - \epsilon, \quad (10)$$
then for any estimator $\hat{\pi}$ and any constant $\delta > 0$, $\mathbb{P}\{\mathsf{overlap}(\hat{\pi}, \pi) \le \delta\} = 1 - o(1)$.

Theorem 3 implies that for sparse Erdős–Rényi graphs with $p = n^{-\alpha}$ for a constant $\alpha \in (0, 1)$, the partial recovery threshold is $nps^2 \asymp 1$, which is much lower than the almost exact recovery threshold $nps^2 = \omega(1)$ established in [CKMP19]. Hence, interestingly, the all-or-nothing phenomenon no longer holds for sparse Erdős–Rényi graphs. Note that the conditions (9) and (10) differ by a constant factor. Determining the sharp threshold for partial recovery in the sparse regime remains an open question.

Finally, we address the exact recovery threshold in the Erdős–Rényi graph model.

Theorem 4 (Erdős–Rényi graphs, exact recovery). Assume $p$ is bounded away from 1. If
$$nps^2 \ge \frac{(1+\epsilon)\log n}{(1-\sqrt{p})^2}, \quad (11)$$
then $\mathbb{P}\{\mathsf{overlap}(\hat{\pi}_{\mathrm{ML}}, \pi) = 1\} = 1 - o(1)$. Conversely, if
$$nps^2 \le \frac{(1-\epsilon)\log n}{(1-\sqrt{p})^2}, \quad (12)$$
then for any estimator $\hat{\pi}$, $\mathbb{P}\{\mathsf{overlap}(\hat{\pi}, \pi) = 1\} = o(1)$.

Note that $\log\frac{1}{p} - 1 + p \ge 2(1-\sqrt{p})^2$, with equality if and only if $p = 1$. Thus, when $p$ is bounded away from 1, the threshold of exact recovery is strictly higher than that of almost exact recovery in the Erdős–Rényi graph model, unlike the Gaussian model.

Exact recovery
The information-theoretic thresholds for exact recovery have been determined for the Gaussian model and the general Erdős–Rényi graph model in certain regimes. In particular, for the Gaussian model, it is shown in [Gan20] that if $n\rho^2 \ge (4+\epsilon)\log n$ for any constant $\epsilon > 0$, then the MLE achieves exact recovery; if instead $n\rho^2 \le (4-\epsilon)\log n$, then exact recovery is impossible. Theorem 1 significantly strengthens this negative result by showing that under the same condition even partial recovery is impossible.

Analogously, for Erdős–Rényi random graphs, it is shown in [CK16, CK17] that the MLE achieves exact recovery when $nps^2 = \log n + \omega(1)$, under the additional restriction that $p = O(\log^{-3}(n))$.¹ Conversely, exact recovery is shown in [CK16] to be information-theoretically impossible provided that $nps^2 \le \log n - \omega(1)$, based on the fact that the intersection graph $G_1 \wedge G_2 \sim \mathcal{G}(n, ps^2)$ has many isolated vertices. Thus, for $p = O(\log^{-3}(n))$, the exact recovery threshold is given by $\lim \frac{nps^2}{\log n} = 1$, coinciding with the connectivity threshold of $G_1 \wedge G_2$. In comparison, Theorem 4 implies that as long as $p$ is bounded away from 1, the precise exact recovery threshold is instead given by $\lim \frac{nps^2 (1-\sqrt{p})^2}{\log n} = 1$, strictly higher than the connectivity threshold when $p = \Theta(1)$. In particular, deriving the tight condition (12) requires more than eliminating isolated nodes. See the discussions in Section 1.3.4 for more details.

¹In fact, [CK16, CK17] studied the exact recovery condition in a more general correlated Erdős–Rényi model where $\mathbb{P}\{A_{\pi(i)\pi(j)} = a, B_{ij} = b\} = p_{ab}$ for $a, b \in \{0, 1\}$, which will also be the setting in Section 4.

Partial and almost exact recovery
Compared to exact recovery, the understanding of partial and almost exact recovery is less precise. It is shown in [CKMP19] that in the sparse regime $p = n^{-\Omega(1)}$, almost exact recovery is information-theoretically possible if and only if $nps^2 = \omega(1)$. The more recent work [HM20] further investigates partial recovery. It is shown that if $nps^2 \ge C(\delta) \max\left\{1, \frac{\log n}{\log(1/p)}\right\}$, then there exists an exponential-time estimator $\hat{\pi}$ that achieves $\mathsf{overlap}(\hat{\pi}, \pi) \ge \delta$ with high probability, where $C(\delta)$ is some large constant that tends to $\infty$ as $\delta \to 1$; conversely, if $I(P)\,\delta = o\left(\frac{\log(n)}{n}\right)$ with $I(P)$ given in (8), then no estimator can achieve $\mathsf{overlap}(\hat{\pi}, \pi) \ge \delta$ with positive probability. These conditions do not match in general and are much looser than the results in Theorems 2 and 3.

We start by introducing some preliminary definitions associated with permutations (cf. [WXY20, Section 3.1] for more details and examples).
Let $S_n$ denote the set of permutations on the node set $[n]$. Each $\sigma \in S_n$ induces a permutation $\sigma^E$ on the edge set $\binom{[n]}{2}$ of unordered pairs, according to
$$\sigma^E((i,j)) \triangleq (\sigma(i), \sigma(j)). \quad (13)$$
We shall refer to $\sigma$ and $\sigma^E$ as a node permutation and an edge permutation, respectively. Each permutation can be decomposed into disjoint cycles known as orbits. Orbits of $\sigma$ (resp. $\sigma^E$) are referred to as node orbits (resp. edge orbits). Let $n_k$ (resp. $N_k$) denote the number of $k$-node (resp. $k$-edge) orbits of $\sigma$ (resp. $\sigma^E$). The cycle structure of $\sigma^E$ is determined by that of $\sigma$. For example, we have
$$N_1 = \binom{n_1}{2} + n_2, \quad (14)$$
because an edge $(i,j)$ is a fixed point of $\sigma^E$ if and only if either both $i$ and $j$ are fixed points of $\sigma$ or $(i,j)$ forms a 2-node orbit of $\sigma$. Let $F_1$ be the set of fixed points of $\sigma$ with $|F_1| = n_1$. Denote by $\mathcal{O}_1 = \binom{F_1}{2} \subset \mathcal{O}$ the subset of fixed points of the edge permutation $\sigma^E$, where $\mathcal{O}$ denotes the collection of all edge orbits of $\sigma^E$.

Let $\mathcal{P}$ denote the joint distribution of $A$ and $B$ under the correlated model. To prove our negative results, we introduce an auxiliary null model $\mathcal{Q}$, under which $A$ and $B$ are independent with the same marginals as $\mathcal{P}$. In other words, under $\mathcal{Q}$, $(A_{ij}, B_{ij})$ are i.i.d. pairs of independent random variables with a joint distribution $Q$ equal to the product of the marginals of $P$.

As the first step, we leverage the previous truncated second moment computation in [WXY20] to conclude that the KL divergence $D(\mathcal{P}\|\mathcal{Q})$ is negligible under the desired conditions. By expressing the mutual information as $I(\pi; A, B) = \binom{n}{2} D(P\|Q) - D(\mathcal{P}\|\mathcal{Q})$, where $D(P\|Q) = I(P)$, this readily implies that $I(\pi; A, B) = \binom{n}{2} I(P)(1+o(1))$. Next, we relate the mutual information $I(\pi; A, B)$ to the integral of the minimum mean-squared error (MMSE) of $A^\pi$, the weighted adjacency vector relabeled according to the ground truth.
For the Gaussian model, this directly follows from the celebrated I-MMSE formula [GSV05]. For correlated Erdős–Rényi graphs, we introduce an appropriate interpolating model and obtain an analogous but more involved "area theorem", following [MMRU09, DAM17]. These two steps together imply that the MMSE of $A^\pi$ given the observations $(A, B)$ is asymptotically equal to the estimation error of the trivial estimator $\mathbb{E}[A^\pi]$. Finally, we connect the MMSE of $A^\pi$ to the Hamming loss of reconstructing $\pi$, concluding the impossibility of partial recovery.

Note that by the non-negativity of $D(\mathcal{P}\|\mathcal{Q})$, we arrive at the simple upper bound (6), that is, $I(\pi; A, B) \le \binom{n}{2} I(P)$. Interestingly, our proof relies on establishing an asymptotically matching lower bound to the mutual information $I(\pi; A, B)$. This significantly deviates from the existing results in [HM20] based on Fano's inequality, $\mathbb{P}\{\mathsf{overlap}(\hat{\pi}, \pi) \le \delta\} \ge 1 - \frac{I(\pi; A, B) + 1}{\log(n!/m)}$ with $m = |\{\pi' : \mathsf{overlap}(\pi', \pi) \ge \delta\}|$, followed by applying the simple bound (6).

Our positive results follow from a large deviation analysis of the maximum likelihood estimator (1). A crucial observation is that the difference of the objective function in (1) evaluated at a given permutation $\pi'$ and the ground truth $\pi$ can be decomposed across the edge orbits of $\sigma \triangleq \pi^{-1} \circ \pi'$ as
$$\langle A^{\pi'} - A^{\pi}, B \rangle = \sum_{O \in \mathcal{O} \setminus \mathcal{O}_1} X_O - \sum_{O \in \mathcal{O} \setminus \mathcal{O}_1} Y_O \triangleq X - Y,$$
where $F_1$ is the set of fixed points of $\sigma$, $\mathcal{O}_1 = \binom{F_1}{2} \subset \mathcal{O}$ is a subset of fixed points of the edge permutation $\sigma^E$, $X_O \triangleq \sum_{(i,j)\in O} A_{\pi'(i)\pi'(j)} B_{ij}$, and $Y_O \triangleq \sum_{(i,j)\in O} A_{\pi(i)\pi(j)} B_{ij}$, which are independent across edge orbits $O$. Crucially, $Y$ depends on $\pi'$ only through its fixed point set $F_1$, which has substantially fewer choices than $\pi'$ itself when $n - |F_1| \asymp n$. Therefore, for the purpose of applying the union bound, it is beneficial to separately control $X$ and $Y$. Indeed, we show that $Y$ is highly concentrated on its mean.
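The orbit structure just described can be checked numerically. The sketch below (our own illustration, not from the paper) computes the edge orbits of $\sigma^E$, verifies the fixed-point identity $N_1 = \binom{n_1}{2} + n_2$ from (14), and confirms that $\langle A^{\pi'} - A^{\pi}, B\rangle$ regroups exactly along edge orbits, taking $\pi = \mathrm{id}$ for simplicity so that $\pi' = \sigma$:

```python
# Illustrative check (our own) of the edge-orbit structure behind the
# decomposition <A^{pi'} - A^{pi}, B> = sum over edge orbits of (X_O - Y_O).
import itertools
import numpy as np

def edge_orbits(sigma):
    """Orbits (cycles) of the induced permutation on unordered pairs."""
    n = len(sigma)
    seen, orbits = set(), []
    for e in itertools.combinations(range(n), 2):
        if e in seen:
            continue
        orbit, cur = [], e
        while cur not in seen:
            seen.add(cur)
            orbit.append(cur)
            cur = tuple(sorted((int(sigma[cur[0]]), int(sigma[cur[1]]))))
        orbits.append(orbit)
    return orbits

rng = np.random.default_rng(1)
n = 7
sigma = rng.permutation(n)

# node orbits of sigma: n_k = number of k-node orbits
node_orbits, seen = [], set()
for i in range(n):
    if i in seen:
        continue
    orb, cur = [], i
    while cur not in seen:
        seen.add(cur)
        orb.append(cur)
        cur = int(sigma[cur])
    node_orbits.append(orb)
n1 = sum(len(o) == 1 for o in node_orbits)
n2 = sum(len(o) == 2 for o in node_orbits)
N1 = sum(len(O) == 1 for O in edge_orbits(sigma))
assert N1 == n1 * (n1 - 1) // 2 + n2        # identity (14)

# With pi = id and pi' = sigma, summing X_O - Y_O over edge orbits
# regroups the inner product <A^{sigma} - A, B> exactly.
A = rng.standard_normal((n, n)); A = np.triu(A, 1); A = A + A.T
B = rng.standard_normal((n, n)); B = np.triu(B, 1); B = B + B.T
lhs = sum((A[sigma[i], sigma[j]] - A[i, j]) * B[i, j]
          for i, j in itertools.combinations(range(n), 2))
rhs = sum(sum(A[sigma[i], sigma[j]] * B[i, j] for (i, j) in O)   # X_O
          - sum(A[i, j] * B[i, j] for (i, j) in O)               # Y_O
          for O in edge_orbits(sigma))
assert abs(lhs - rhs) < 1e-9
```

The substantive content of the decomposition is of course not the regrouping itself but the independence of the summands across orbits and the fact that $Y$ depends on $\pi'$ only through $F_1$, which the analysis below exploits.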
Hence, it remains to analyze the large-deviation event of $X$ exceeding $\mathbb{E}[Y]$, which is accomplished by a careful computation of the moment generating function $M_{|O|} \triangleq \mathbb{E}[\exp(tX_O)]$ and proving that
$$M_{|O|} \le M_2^{|O|/2}, \quad \text{for } |O| \ge 2. \quad (15)$$
Intuitively, it means that the contribution of longer edge orbits can be effectively bounded by that of the 2-edge orbits. Capitalizing on this key finding and applying the Chernoff bound together with a union bound over $\pi'$ yields our tight conditions.

We remark that the partial recovery results in [HM20] are obtained by analyzing an estimator slightly different from the MLE, and the same MGF bound (15) is used. However, there are two major differences that lead to the looseness of their results. First, their analysis does not separately bound $X$ and $Y$. Second, the tilting parameter in the Chernoff bound is suboptimal.

For exact recovery, we need to further consider $\pi'$ that is close to $\pi$, i.e., $n - |F_1| = o(n)$. In this regime, the number of choices of $F_1$ is comparable to that of $\pi'$. Hence, instead of separately bounding $X$ and $Y$, it is more favorable to directly apply the Chernoff bound to the difference $X - Y$. Crucially, the moment generating function $\mathbb{E}[\exp(t(X_O - Y_O))]$ continues to satisfy the relation (15), and the bottleneck for exact recovery happens at $|F_1| = n - 2$, where $\pi'$ differs from $\pi$ by a 2-cycle (transposition).

Prompted by this observation, we prove a matching necessary condition for exact recovery by considering all possible permutations $\sigma \triangleq \pi^{-1} \circ \pi'$ that consist of $n-2$ fixed points and a single 2-node orbit $(i,j)$ (transposition), in which case
$$\langle A^{\pi'} - A^{\pi}, B \rangle = -\sum_{k \in [n]\setminus\{i,j\}} \left( A^{\pi}_{ik} - A^{\pi}_{jk} \right)\left( B_{ik} - B_{jk} \right). \quad (16)$$
There remain two key challenges to conclude the existence of many choices of $(i,j)$ for which $\langle A^{\pi'}, B \rangle \ge \langle A^{\pi}, B \rangle$. First, to derive a tight impossibility condition, we need to obtain a tight large-deviation lower estimate for this event. Second, the RHS of (16) for different pairs $(i,j)$ are correlated. This dependency is addressed by restricting the choices of $(i,j)$ and applying a second moment computation.

Note that the impossibility proof of exact recovery for the Gaussian model in [Gan20] also considers the permutations that consist of a single transposition. The difference is that the large-deviation lower estimate simply follows from the Gaussian tail probability, and the correlations among different pairs $(i,j)$ are bounded by a second-moment calculation using densities of correlated Gaussians.

To start, we characterize the asymptotic value of the mutual information $I(A, B; \pi)$, a key quantity that measures the amount of information about $\pi$ provided by the observations $(A, B)$. By definition,
$$I(A, B; \pi) \triangleq \mathbb{E}\left[ D\left( \mathcal{P}_{A,B|\pi} \,\|\, \mathcal{P}_{A,B} \right) \right] = \mathbb{E}\left[ D\left( \mathcal{P}_{A,B|\pi} \,\|\, \mathcal{Q}_{A,B} \right) \right] - D\left( \mathcal{P}_{A,B} \,\|\, \mathcal{Q}_{A,B} \right)$$
for any joint distribution $\mathcal{Q}_{A,B}$ of $(A, B)$ such that $D(\mathcal{P}_{A,B}\|\mathcal{Q}_{A,B}) < \infty$. Note that $\mathcal{P}_{A,B|\pi}$ factorizes into a product distribution over the edges.

Proposition 1. It holds that
$$I(A, B; \pi) = \binom{n}{2} I(P) - \zeta_n,$$
where

• $\zeta_n = o(1)$ in the Gaussian model with $\rho^2 \le \frac{(4-\epsilon)\log n}{n}$;

• $\zeta_n = o(1)$ in the dense Erdős–Rényi graphs with $p = n^{-o(1)}$ and $nps^2\left(\log\frac{1}{p} - 1 + p\right) \le (2-\epsilon)\log(n)$;

• $\zeta_n = O(\log n)$ in the sparse Erdős–Rényi graphs with $p = n^{-\Omega(1)}$, $np = \omega(1)$, and $nps^2 \le 1 - \epsilon$;

for some arbitrarily small but fixed constant $\epsilon > 0$.

Given the tight characterization of the mutual information in Proposition 1, we now relate it to the Bayes risk. Using the chain rule, we have
$$I(A, B; \pi) = I(B; \pi \mid A) = I(B; A^\pi \mid A),$$
where the second equality follows from the fact that $A \to A^\pi \to B$ forms a Markov chain. The intuition is that conditioned on $A$, $B$ is a noisy observation of $A^\pi$ (which is random owing to $\pi$). In such a situation, the mutual information can typically be related to an integral of the reconstruction error of the signal $A^\pi$. To make this precise, we first introduce a parametric model $\mathcal{P}_\theta$ that interpolates between the planted model $\mathcal{P}$ and the null model $\mathcal{Q}$ as $\theta$ varies. We write $\mathbb{E}_\theta$ to indicate expectation taken with respect to the law $\mathcal{P}_\theta$.

For the Gaussian model, let $\mathcal{P}_\theta$ denote the model under which $B = \sqrt{\theta} A^\pi + \sqrt{1-\theta} Z$, where $A, Z$ are two independent Gaussian matrices and $\theta \in [0, \rho^2]$. Then $\theta = \rho^2$ corresponds to the planted model $\mathcal{P}$ while $\theta = 0$ corresponds to the null model $\mathcal{Q}$. As $\theta$ increases from 0 to $\rho^2$, $\mathcal{P}_\theta$ interpolates between $\mathcal{Q}$ and $\mathcal{P}$. Let
$$\mathsf{mmse}_\theta(A^\pi) \triangleq \mathbb{E}_\theta\left[ \left\| A^\pi - \mathbb{E}_\theta[A^\pi \mid A, B] \right\|^2 \right] \quad (17)$$
denote the minimum mean-squared error (MMSE) of estimating $A^\pi$ based on $(A, B)$ distributed according to $\mathcal{P}_\theta$. The following proposition follows from the celebrated I-MMSE formula [GSV05].

Proposition 2 (Gaussian model).
$$I(A, B; \pi) = \frac{1}{2} \int_0^{\rho^2} \frac{\mathsf{mmse}_\theta(A^\pi)}{(1-\theta)^2}\, d\theta.$$

The correlated Erdős–Rényi graph model requires more effort.
Let us fix $q = ps$ and consider the following coupling $P_\theta$ between two $\mathrm{Bern}(q)$ random variables with joint probability mass function $p_\theta$, where $p_\theta(11) = q\theta$, $p_\theta(01) = p_\theta(10) = q(1-\theta)$, and $p_\theta(00) = 1 - (2-\theta)q$, with $\theta \in [q, s]$. Let $\mathcal{P}_\theta$ denote the interpolated model under which $(A_{\pi(i)\pi(j)}, B_{ij})$ are i.i.d. pairs of correlated random variables with joint distribution $P_\theta$. As $\theta$ increases from $q$ to $s$, $\mathcal{P}_\theta$ interpolates between the null model $\mathcal{Q} = \mathcal{P}_q$ and the planted model $\mathcal{P} = \mathcal{P}_s$. We have the following area theorem that relates $I(A, B; \pi)$ to the MMSE of $A^\pi$.

Proposition 3 (Erdős–Rényi random graph). It holds that
$$I(A, B; \pi) \le \binom{n}{2} I(P) + \binom{n}{2} qs^2 + \int_q^s \frac{\theta - q}{s(1-q)} \left( \mathsf{mmse}_\theta(A^\pi) - \binom{n}{2} q(1-q) \right) d\theta.$$

Finally, we relate the estimation error of $A^\pi$ to that of $\pi$.

Proposition 4. In both the Gaussian and Erdős–Rényi graph models, if
$$\mathsf{mmse}_\theta(A^\pi) \ge \mathbb{E}\left[\|A\|^2\right](1 - \xi), \quad (18)$$
for some $\xi > 0$, then for any estimator $\hat{\pi} = \hat{\pi}(A, B)$,
$$\mathbb{E}_\theta[\mathsf{overlap}(\hat{\pi}, \pi)] \le O\left( \xi^{1/4} + \left( \frac{n\log n}{\mathbb{E}[\|A\|^2]} \right)^{1/4} \right).$$

Now we are ready to prove the negative results on partial recovery. We start with the Gaussian case.

Proof of Theorem 1. In the Gaussian model, we have
$$I(P) = D\left( \mathcal{N}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right) \Big\| \mathcal{N}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right) \right) = \frac{1}{2}\log\frac{1}{1-\rho^2}.$$
Assume that $\rho^2 = (4 - \epsilon/2)\frac{\log n}{n}$. Fix some $\theta \in (0, \rho^2)$ to be chosen.
Then
$$\binom{n}{2}\frac{1}{2}\log\frac{1}{1-\rho^2} - \zeta_n \overset{(a)}{=} I(A, B; \pi) \overset{(b)}{\le} \frac{1}{2(1-\rho^2)^2} \int_0^{\rho^2} \mathsf{mmse}_{\theta'}(A^\pi)\, d\theta' \overset{(c)}{\le} \frac{1}{2(1-\rho^2)^2} \left( \binom{n}{2}\theta + \mathsf{mmse}_\theta(A^\pi)(\rho^2 - \theta) \right) = \frac{1}{2(1-\rho^2)^2} \left( \binom{n}{2}\rho^2 + \left( \mathsf{mmse}_\theta(A^\pi) - \binom{n}{2} \right)(\rho^2 - \theta) \right),$$
where $\zeta_n = o(1)$ and (a) holds by Proposition 1; (b) follows from the I-MMSE formula given in Proposition 2; (c) holds because $\mathsf{mmse}_\theta(A^\pi) \le \mathbb{E}[\|A\|^2] = \binom{n}{2}$ and the fact that $\mathsf{mmse}_\theta(A^\pi)$ is non-increasing in $\theta$. Rearranging the terms in the last displayed equation, we get
$$\mathsf{mmse}_\theta(A^\pi) - \binom{n}{2} \ge \frac{(1-\rho^2)^2}{\rho^2 - \theta}\left( \binom{n}{2}\left( \log\frac{1}{1-\rho^2} - \frac{\rho^2}{(1-\rho^2)^2} \right) - 2\zeta_n \right) \ge -\frac{(1-\rho^2)^2}{\rho^2 - \theta}\left( \binom{n}{2}\frac{2\rho^4}{(1-\rho^2)^2} + 2\zeta_n \right),$$
where the last inequality holds because $\log(1+x) \ge x - x^2$ for $x \ge 0$. Choosing $\theta = (4-\epsilon)\frac{\log n}{n}$, we conclude that
$$\mathsf{mmse}_\theta(A^\pi) \ge \binom{n}{2}\left( 1 - O\left( \rho^2 + \frac{\zeta_n}{n^2\rho^2} \right) \right) = \binom{n}{2}\left( 1 - O\left( \frac{\log n}{n} \right) \right), \quad (19)$$
where the last equality holds because $\rho^2 = \Theta(\log(n)/n)$ and $\zeta_n = o(1)$. Since $\mathbb{E}[\|A\|^2] = \binom{n}{2}$, it follows from Proposition 4 that
$$\mathbb{E}_\theta[\mathsf{overlap}(\hat{\pi}, \pi)] \le O\left( \left( \frac{\log n}{n} \right)^{1/4} \right).$$
Finally, by Markov's inequality, for any $\delta_n = \omega\left( \left(\frac{\log n}{n}\right)^{1/4} \right)$ (in particular, any fixed constant $\delta_n > 0$), $\mathbb{P}_\theta\{\mathsf{overlap}(\hat{\pi}, \pi) \ge \delta_n\} = o(1)$. Note that $\mathcal{P}_\theta$ corresponds to the Gaussian model with squared correlation coefficient equal to $\theta = (4-\epsilon)\frac{\log n}{n}$. By the arbitrariness of $\epsilon$, this completes the proof of Theorem 1.

Next, we move to the Erdős–Rényi graph model.

Proof of the negative parts in Theorems 2 and 3. Let $s^2 = \frac{(2-\epsilon)\log n}{np\left(\log\frac{1}{p} - 1 + p\right)}$ in the dense regime and $s^2 = \frac{1-\epsilon}{np}$ in the sparse regime.
Then we get that
$$-\zeta_n - \binom{n}{2} qs^2 \overset{(a)}{\le} \int_q^s \frac{\theta-q}{s(1-q)}\left( \mathsf{mmse}_\theta(A^\pi) - \binom{n}{2} q(1-q) \right) d\theta \overset{(b)}{\le} \int_{(1-\epsilon)s}^s \frac{\theta-q}{s(1-q)}\left( \mathsf{mmse}_{(1-\epsilon)s}(A^\pi) - \binom{n}{2} q(1-q) \right) d\theta = \underbrace{\frac{s^2(2\epsilon - \epsilon^2)/2 - \epsilon s q}{s(1-q)}}_{=\Theta(s)} \left( \mathsf{mmse}_{(1-\epsilon)s}(A^\pi) - \binom{n}{2} q(1-q) \right),$$
where $\zeta_n = o(1)$ in the dense regime and $\zeta_n = O(\log n)$ in the sparse regime; (a) follows from Proposition 1 and Proposition 3; (b) holds because $\mathsf{mmse}_\theta(A^\pi) \le \mathbb{E}\left[\|A - \mathbb{E}[A]\|^2\right] = \binom{n}{2} q(1-q)$ and $\mathsf{mmse}_\theta(A^\pi)$ is non-increasing in $\theta$.² Rearranging the terms in the last displayed equation, we conclude that
$$\mathsf{mmse}_{(1-\epsilon)s}(A^\pi) \ge \binom{n}{2} q(1-q)\left( 1 - O\left( \frac{\zeta_n}{n^2 q s} + s \right) \right) \ge \binom{n}{2} q\left( 1 - O(s) \right),$$
where the last inequality holds because $q \le s$ and $\zeta_n = O(n^2 q s^2)$ in both regimes. Since $\mathbb{E}[\|A\|^2] = \binom{n}{2} q$, it follows from Proposition 4 that
$$\mathbb{E}_{(1-\epsilon)s}[\mathsf{overlap}(\hat{\pi}, \pi)] \le O\left( s^{1/4} + \left( \frac{\log n}{nq} \right)^{1/4} \right) = O\left( \left( \frac{\log n}{nq} \right)^{1/4} \right),$$
where the last equality holds because $nqs = nps^2 = O(\log n)$. Note that in the dense regime, since $s^2 \asymp \frac{\log n}{np\left(\log\frac{1}{p} - 1 + p\right)}$ and $p = n^{-o(1)}$, we have $nq = nps = \omega(\log n)$. This also holds in the sparse regime, where $s^2 \asymp \frac{1}{np}$, under the extra assumption that $np = \omega((\log n)^2)$. Thus, by Markov's inequality, for any $\delta_n = \omega\left( \left(\frac{\log n}{nq}\right)^{1/4} \right)$, in particular any fixed constant $\delta_n > 0$, we have $\mathbb{P}_{(1-\epsilon)s}\{\mathsf{overlap}(\hat{\pi}, \pi) \ge \delta_n\} = o(1)$. In other words, we have shown the desired impossibility result under the distribution $\mathcal{P}_{(1-\epsilon)s}$, which corresponds to the correlated Erdős–Rényi model with parameters $p' = p/(1-\epsilon)$ and $s' = s(1-\epsilon)$. By the arbitrariness of $\epsilon$, this completes the proof.

In this subsection, we prove Proposition 1, which reduces to bounding $D(\mathcal{P}_{A,B} \| \mathcal{Q}_{A,B})$.
It is well known that the KL divergence can be bounded by the $\chi^2$-divergence (the variance of the likelihood ratio). This method, however, is often too loose, as the second moment can be derailed by rare events. A more robust version is by means of the truncated second moment, which has been carried out in [WXY20] (for $\mathcal{P}_{A,B}$ versus $\mathcal{Q}_{A,B}$) for studying the hypothesis testing problem in graph matching. Here we leverage the same result to bound the KL divergence. To this end, we first present a general bound and then specialize it to our problem in both the Gaussian and Erdős–Rényi models.

²The fact that $\mathsf{mmse}_\theta(A^\pi)$ is non-increasing in $\theta$ follows from a simulation argument. Let $(A, B) \sim \mathcal{P}_\theta$. Fix $\theta'$ such that $q < \theta' < \theta < s$. Define $B' = (B'_{ij})$ by passing each $B_{ij}$ independently through the same (asymmetric) channel $W$ to obtain $B'_{ij}$, where $W(0|1) = \frac{(1-q)(\theta-\theta')}{\theta-q}$ and $W(1|0) = \frac{q(\theta-\theta')}{\theta-q}$ are well-defined. Then $(A, B') \sim \mathcal{P}_{\theta'}$.

Lemma 1. Let $\mathcal{P}_{XY}$ denote the joint distribution of $(X, Y)$. Let $\mathcal{E}$ be an event independent of $X$ such that $\mathbb{P}(\mathcal{E}) = 1 - \delta$. Let $\mathcal{Q}_Y$ be an auxiliary distribution such that $\mathcal{P}_{Y|X} \ll \mathcal{Q}_Y$, $\mathcal{P}_X$-a.s. Then
$$D(\mathcal{P}_Y \| \mathcal{Q}_Y) \le \log\left(1 + \chi^2(\mathcal{P}_{Y|\mathcal{E}} \| \mathcal{Q}_Y)\right) + \delta\left( \log\frac{1}{\delta} + \mathbb{E}\left[ D(\mathcal{P}_{Y|X} \| \mathcal{Q}_Y) \right] \right) + \sqrt{\delta \cdot \mathrm{Var}\left( \log\frac{d\mathcal{P}_{Y|X}}{d\mathcal{Q}_Y} \right)}, \quad (20)$$
where $\mathcal{P}_{Y|\mathcal{E}}$ denotes the distribution of $Y$ conditioned on $\mathcal{E}$, and the $\chi^2$-divergence is defined as $\chi^2(P\|Q) = \mathbb{E}_Q\left[\left(\frac{dP}{dQ}\right)^2\right] - 1$ if $P \ll Q$ and $\infty$ otherwise.

Proof. Note that $\mathcal{P}_Y = (1-\delta)\mathcal{P}_{Y|\mathcal{E}} + \delta\mathcal{P}_{Y|\mathcal{E}^c}$. Thanks to the convexity of the KL divergence, Jensen's inequality yields
$$D(\mathcal{P}_Y \| \mathcal{Q}_Y) \le (1-\delta) D(\mathcal{P}_{Y|\mathcal{E}} \| \mathcal{Q}_Y) + \delta D(\mathcal{P}_{Y|\mathcal{E}^c} \| \mathcal{Q}_Y).$$
The first term can be bounded using the generic fact that $D(P\|Q) \le \log \mathbb{E}_Q\left[\left(\frac{dP}{dQ}\right)^2\right] = \log(1 + \chi^2(P\|Q))$. Let $g(X, Y) = \log\frac{d\mathcal{P}_{Y|X}}{d\mathcal{Q}_Y}$.
Using the convexity of the KL divergence and the independence of $\mathcal{E}$ and $X$, we bound the second term as follows:
$$\delta D(\mathcal{P}_{Y|\mathcal{E}^c} \| \mathcal{Q}_Y) \le \delta\, \mathbb{E}\left[ D(\mathcal{P}_{Y|X, \mathcal{E}^c} \| \mathcal{Q}_Y) \right] = \mathbb{E}\left[ \left( g(X, Y) + \log\frac{1}{\delta} \right) \mathbf{1}_{\{(X,Y)\in\mathcal{E}^c\}} \right] = \delta\left( \log\frac{1}{\delta} + \mathbb{E}[g(X, Y)] \right) + \mathbb{E}\left[ \left( g(X, Y) - \mathbb{E}[g(X, Y)] \right) \mathbf{1}_{\{(X,Y)\in\mathcal{E}^c\}} \right].$$
Applying Cauchy–Schwarz to the last term completes the proof.

Next we apply Lemma 1 in the context of random graph matching by identifying $X$ and $Y$ with the latent $\pi$ and the observation $(A, B)$, respectively. Let $\mathcal{E}$ be a certain high-probability event independent of $\pi$. Then
$$\mathcal{P}_{A,B|\mathcal{E}} = \frac{1}{\mathbb{P}(\mathcal{E})} \cdot \frac{1}{n!} \sum_{\pi \in S_n} \mathcal{P}_{A,B|\pi}\, \mathbf{1}_{\{(A,B,\pi)\in\mathcal{E}\}}.$$
Recall that the null model is chosen to be $\mathcal{Q}_{A,B} = \mathcal{P}_A \otimes \mathcal{P}_B$. As shown in [WXY20], for both the Gaussian and Erdős–Rényi graph models, it is possible to construct a high-probability event $\mathcal{E}$ satisfying the symmetry condition $\mathbb{P}(\mathcal{E}|\pi) = \mathbb{P}(\mathcal{E})$, such that $\chi^2(\mathcal{P}_{A,B|\mathcal{E}} \| \mathcal{Q}_{A,B}) = o(1)$. This bounds the first term in (20). For the second term, since both $A$ and $B$ are individually independent of $\pi$, we have
$$\mathbb{E}\left[ \log\frac{d\mathcal{P}_{A,B|\pi}}{d\mathcal{Q}_{A,B}} \right] = I(A; B \mid \pi) = \binom{n}{2} D(P\|Q) = \binom{n}{2} I(P), \quad (21)$$
where the last equality holds because $Q$ is the product of the marginals of $P$. The third term in (20) can be computed explicitly. Next we give details on how to complete the proof of Proposition 1.

Gaussian model. It is shown in [WXY20, Section 4.1] that there exists an event $\mathcal{E}$ independent of $\pi$ such that $\mathbb{P}(\mathcal{E}^c) = e^{-\Omega(n)}$ and $\chi^2(\mathcal{P}_{\mathcal{E}} \| \mathcal{Q}) = o(1)$, provided that $\rho^2 \le (4-\epsilon)\frac{\log n}{n}$. Furthermore, by (21) and (7), we have $\mathbb{E}\left[ \log\frac{d\mathcal{P}_{A,B|\pi}}{d\mathcal{Q}_{A,B}} \right] = \binom{n}{2}\frac{1}{2}\log\frac{1}{1-\rho^2} = O(n\log n)$. To compute the variance, note that
$$\log\frac{d\mathcal{P}_{A,B|\pi}}{d\mathcal{Q}_{A,B}} = -\binom{n}{2}\frac{1}{2}\log(1-\rho^2) - \frac{h(A, B, \pi)}{2(1-\rho^2)},$$
where $h(A, B, \pi) \triangleq \rho^2\|A\|^2 + \rho^2\|B\|^2 - 2\rho\langle A^\pi, B\rangle$. Thus $\mathrm{Var}\left(\log\frac{d\mathcal{P}_{A,B|\pi}}{d\mathcal{Q}_{A,B}}\right) = \frac{1}{4(1-\rho^2)^2}\mathrm{Var}(h(A, B, \pi))$.
Writing $B = \rho A^\pi + \sqrt{1-\rho^2} Z$, where $Z$ is an independent copy of $A$, we have $h(A, B, \pi) = \rho^2\left( \|B\|^2 - \|A\|^2 \right) - 2\rho\sqrt{1-\rho^2}\,\langle A^\pi, Z\rangle$. Here both $\|A\|^2$ and $\|B\|^2$ are distributed as $\chi^2\left(\binom{n}{2}\right)$, with variance equal to $2\binom{n}{2}$. Furthermore, $\mathrm{Var}(\langle A^\pi, Z\rangle) = \binom{n}{2}$. Thus $\mathrm{Var}(h(A, B, \pi)) = O(n\log n)$. Applying Lemma 1, we conclude that $D(\mathcal{P}_{A,B} \| \mathcal{Q}_{A,B}) = o(1)$.

Erdős–Rényi graphs. In the dense regime of $p = n^{-o(1)}$ and $p = 1 - \Omega(1)$, it is shown in [WXY20, Section 6.5] that there exists an event $\mathcal{E}$ such that $\mathbb{P}(\mathcal{E}^c) = e^{-n^{1-o(1)}}$ and $\chi^2(\mathcal{P}_{\mathcal{E}} \| \mathcal{Q}) = o(1)$, provided that $nps^2\left(\log\frac{1}{p} - 1 + p\right) \le (2-\epsilon)\log(n)$. In the sparse regime (see [WXY20, Section 5]), it is possible to choose $\mathcal{E}$ such that $\mathbb{P}(\mathcal{E}^c) = O(n^{-1})$ and $\chi^2(\mathcal{P}_{\mathcal{E}} \| \mathcal{Q}) = o(1)$, provided that $nps^2 \le 1-\epsilon$ and $np = \omega(1)$.

By (21) and (8), we have $\mathbb{E}\left[ \log\frac{d\mathcal{P}_{A,B|\pi}}{d\mathcal{Q}_{A,B}} \right] = \binom{n}{2} I(P)$, where $I(P) = q\, d(s\|q) + (1-q)\, d(\eta\|q)$, with $q = ps$ and $\eta = \frac{q(1-s)}{1-q}$. In both the dense and sparse regimes, one can verify that
$$I(P) = s^2 p\left( \log\frac{1}{p} - 1 + p \right)(1 + o(1)).$$
As a result, we have $\mathbb{E}\left[ \log\frac{d\mathcal{P}_{A,B|\pi}}{d\mathcal{Q}_{A,B}} \right] = O(n\log n)$ in both cases.

It remains to bound the variance in (20). Note that
$$\log\frac{\mathcal{P}(A, B \mid \pi)}{\mathcal{Q}(A, B)} = \binom{n}{2}\log\frac{1-\eta}{1-ps} + h(A, B, \pi),$$
where $h(A, B, \pi) \triangleq \log\frac{1-s}{1-\eta}\left( \|A\|^2 + \|B\|^2 \right) + \log\frac{s(1-\eta)}{\eta(1-s)}\langle A^\pi, B\rangle$. Since $p$ is bounded away from 1 and $s = o(1)$ in both the dense regime ($p = n^{-o(1)}$) and the sparse regime ($p = n^{-\Omega(1)}$ and $np = \omega(1)$), it follows that
$$\log\frac{1-\eta}{1-s} = (1+o(1))\, s(1-p), \qquad \frac{s(1-\eta)}{\eta(1-s)} = \frac{(1-\eta)(1-ps)}{p(1-s)^2} = \frac{1+o(1)}{p}.$$
Note that $\|A\|^2, \|B\|^2 \sim \mathrm{Binom}\left(\binom{n}{2}, ps\right)$ and $\langle A^\pi, B\rangle \sim \mathrm{Binom}\left(\binom{n}{2}, ps^2\right)$. We have $\mathrm{Var}(h) = O\left(n^2 p s^2 \log^2\frac{1}{p}\right)$. Consequently, $\sqrt{\mathrm{Var}(h)\,\mathbb{P}(\mathcal{E}^c)} = o(1)$ and $O(\log n)$ in the dense and sparse cases, respectively.
Applying Lemma 1 yields the same upper bound on $D(P_{A,B}\,\|\,Q_{A,B})$.

Proof of Proposition 2

The proof follows from a simple application of the I-MMSE formula for the additive Gaussian channel. Note that under the interpolated model $P_\theta$, we have
$$\frac{B}{\sqrt{1-\theta}}=\sqrt{\frac{\theta}{1-\theta}}\,A^\pi+Z,$$
which is the output of an additive Gaussian channel with input $A^\pi$ and standard Gaussian noise $Z$. Letting $I(\theta)=I(B;A^\pi|\pi)=I(A,B;\pi)$ and using the I-MMSE formula [GSV05, Theorem 2], we have
$$\frac{dI(\theta)}{d(\theta/(1-\theta))}=\frac{1}{2}\operatorname{mmse}_\theta(A^\pi).\qquad(22)$$
Thus $\frac{dI(\theta)}{d\theta}=\frac{1}{2(1-\theta)^2}\operatorname{mmse}_\theta(A^\pi)$. Integrating over $\theta$ from $0$ to $\rho^2$ and noting $I(0)=0$, we arrive at
$$I(\rho^2)=\int_0^{\rho^2}\frac{1}{2(1-\theta)^2}\operatorname{mmse}_\theta(A^\pi)\,d\theta.$$
Note that under the interpolated model $P_\theta$, we have
$$p_\theta(y|x)=\begin{cases}\theta & x=1,\,y=1\\ 1-\theta & x=1,\,y=0\\ \eta & x=0,\,y=1\\ 1-\eta & x=0,\,y=0\end{cases},\qquad \eta=\frac{q(1-\theta)}{1-q}.$$
Let $g(\theta)\triangleq D(P_\theta\,\|\,Q)=q\,d(\theta\|q)+(1-q)\,d(\eta\|q)$. Then $g(s)=D(P\,\|\,Q)$. Let $I(\theta)=I_\theta(A,B;\pi)$, where the subscript $\theta$ indicates that $A^\pi$ and $B$ are distributed according to $P_\theta$. Then
$$I_s(A^\pi;B|A)=H_s(A^\pi|A)-H_s(A^\pi|B,A)=H_q(A^\pi|B,A)-H_s(A^\pi|B,A)=-\int_q^s\frac{d}{d\theta}H_\theta(A^\pi|B,A)\,d\theta,$$
where the second equality holds because, for a fixed $q$, $H_\theta(A^\pi|A)$ does not change with $\theta$, and when $\theta=q$, $A^\pi$ and $B$ are independent, hence $I_q(A^\pi;B|A)=0$ so that $H_q(A^\pi|A)=H_q(A^\pi|B,A)$. By [DAM17, Lemma 7.1], we have
$$\frac{d}{d\theta}H_\theta(A^\pi|B,A)=\mathrm{(I)}+\binom{n}{2}\frac{d}{d\theta}\left(h(q)-g(\theta)\right)=\mathrm{(I)}-\binom{n}{2}g'(\theta),$$
where $h(q)=-q\log q-(1-q)\log(1-q)$ is the binary entropy function,
$$\mathrm{(I)}=\sum_{e\in\binom{[n]}{2}}\sum_{x_e,y_e}\frac{\partial p_\theta(y_e|x_e)}{\partial\theta}\,\mathbb{E}\left[\mu_e(x_e|B_{\setminus e},A)\log\sum_{x'_e}p_\theta(y_e|x'_e)\,\mu_e(x'_e|B_{\setminus e},A)\right],$$
$B_{\setminus e}$ denotes the adjacency vector $B$ excluding $B_e$, and $\mu_e(\cdot|B_{\setminus e},A)$ is the distribution of $A^\pi_e$ conditional on $(B_{\setminus e},A)$ under $P_\theta$.
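In the scalar case, the I-MMSE identity (22) invoked above can be sanity-checked numerically: for $Y=\sqrt{\mathsf{snr}}\,X+Z$ with $X,Z\sim\mathcal N(0,1)$, both sides are available in closed form, $I(\mathsf{snr})=\frac12\log(1+\mathsf{snr})$ and $\operatorname{mmse}(\mathsf{snr})=\frac1{1+\mathsf{snr}}$, so $dI/d\mathsf{snr}=\frac12\operatorname{mmse}(\mathsf{snr})$. A minimal sketch (not part of the proof; function names are ours):

```python
import math

def mutual_info(snr):
    # I(snr) = 0.5 * log(1 + snr) for Gaussian input X ~ N(0, 1)
    return 0.5 * math.log(1 + snr)

def mmse(snr):
    # mmse(snr) = 1 / (1 + snr) for the scalar Gaussian channel
    return 1 / (1 + snr)

# Check dI/dsnr = 0.5 * mmse(snr) via a central finite difference.
snr, h = 2.0, 1e-6
numeric_derivative = (mutual_info(snr + h) - mutual_info(snr - h)) / (2 * h)
assert abs(numeric_derivative - 0.5 * mmse(snr)) < 1e-8
```

The same identity, written in the $\theta/(1-\theta)$ parametrization, is what drives the interpolation argument above.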
Since $g(q)=0$, we have
$$I_s(A^\pi;B|A)=-\int_q^s\mathrm{(I)}\,d\theta+\binom{n}{2}g(s)=-\int_q^s\mathrm{(I)}\,d\theta+\binom{n}{2}D(P\,\|\,Q).\qquad(23)$$
It remains to relate (I) to the reconstruction error. Note that for $x,y\in\{0,1\}$,
$$p_\theta(y|x)=\alpha(x)y+(1-\alpha(x))(1-y),\qquad \alpha(x)=\theta x+\eta(1-x),\qquad(24)$$
and
$$\frac{\partial p_\theta(1|x)}{\partial\theta}=\frac{\partial\alpha(x)}{\partial\theta}=x+\frac{\partial\eta}{\partial\theta}(1-x)=-\frac{\partial p_\theta(0|x)}{\partial\theta}.$$
Thus for each $x_e$,
$$\sum_{y_e=0,1}\frac{\partial p_\theta(y_e|x_e)}{\partial\theta}\,\mathbb{E}\left[\mu_e(x_e|B_{\setminus e},A)\log\sum_{x'_e=0,1}p_\theta(y_e|x'_e)\,\mu_e(x'_e|B_{\setminus e},A)\right]=\frac{\partial p_\theta(1|x_e)}{\partial\theta}\,\mathbb{E}\left[\mu_e(x_e|B_{\setminus e},A)\log\frac{\sum_{x'_e}p_\theta(1|x'_e)\mu_e(x'_e|B_{\setminus e},A)}{1-\sum_{x'_e}p_\theta(1|x'_e)\mu_e(x'_e|B_{\setminus e},A)}\right]=-\frac{\partial p_\theta(1|x_e)}{\partial\theta}\,\mathbb{E}\left[\mu_e(x_e|B_{\setminus e},A)\,h'(\bar y_e)\right],$$
where we defined
$$\bar y_e\equiv\bar y_e(B_{\setminus e},A)\triangleq\sum_{x_e=0,1}p_\theta(1|x_e)\,\mu_e(x_e|B_{\setminus e},A)$$
and used $h'(x)=\log\frac{1-x}{x}$. Then we have
$$\sum_{x_e=0,1}\sum_{y_e=0,1}\frac{\partial p_\theta(y_e|x_e)}{\partial\theta}\,\mathbb{E}\left[\mu_e(x_e|B_{\setminus e},A)\log\sum_{x'_e=0,1}p_\theta(y_e|x'_e)\,\mu_e(x'_e|B_{\setminus e},A)\right]=-\mathbb{E}\left[\frac{\partial\bar y_e}{\partial\theta}h'(\bar y_e)\right].$$
Let $\tilde x_e=\mathbb{E}[A^\pi_e|B_{\setminus e},A]$. Then $\bar y_e=\theta\tilde x_e+\eta(1-\tilde x_e)$. Let
$$\Delta_e=\bar y_e-q=(\theta-\eta)(\tilde x_e-q)=\frac{\theta-q}{1-q}(\tilde x_e-q).$$
Then $\mathbb{E}[\Delta_e]=0$ and $\mathbb{E}[\Delta_e^2]=\left(\frac{\theta-q}{1-q}\right)^2\operatorname{Var}(\tilde x_e)$. Furthermore,
$$\frac{\partial\bar y_e}{\partial\theta}=\frac{\tilde x_e-q}{1-q}.$$
Using $h''(x)=-\frac{1}{x(1-x)}$, we get $h'(\bar y_e)=h'(q)-\frac{\Delta_e}{\xi(1-\xi)}$ for some $\xi$ between $\bar y_e$ and $q$. Note that $\eta\le q\le\theta\le s$ and $\bar y_e\in[\eta,\theta]$. Thus $\eta\le\xi\le s$. So
$$-\mathbb{E}\left[\frac{\partial\bar y_e}{\partial\theta}h'(\bar y_e)\right]=-\frac{h'(q)}{1-q}\mathbb{E}[\tilde x_e-q]+\frac{\theta-q}{(1-q)^2}\mathbb{E}\left[\frac{(\tilde x_e-q)^2}{\xi(1-\xi)}\right]\ge\frac{\theta-q}{s(1-q)^2}\operatorname{Var}_\theta(\tilde x_e).$$
Integrating over $\theta$ we get
$$\int_q^s\mathrm{(I)}\,d\theta=\sum_e\int_q^s\left(-\mathbb{E}\left[\frac{\partial\bar y_e}{\partial\theta}h'(\bar y_e)\right]\right)d\theta\qquad(25)$$
$$\ge\sum_e\int_q^s\frac{\theta-q}{s(1-q)^2}\operatorname{Var}_\theta(\tilde x_e)\,d\theta.\qquad(26)$$
Finally, note that the above bound pertains to $\tilde x_e=\mathbb{E}[A^\pi_e|B_{\setminus e},A]$, which we now relate to $\hat x_e=\mathbb{E}[A^\pi_e|B,A]$.
Denote by $\mu_e(\cdot|B,A)$ the full posterior law of $A^\pi_e$. Note that
$$\hat x_e=\sum_{x_e}x_e\,\mu_e(x_e|B,A)=\frac{\sum_{x_e}x_e\,\mu_e(x_e|B_{\setminus e},A)\,p_\theta(B_e|x_e)}{\sum_{x_e}\mu_e(x_e|B_{\setminus e},A)\,p_\theta(B_e|x_e)}.$$
By (24), we have $p_\theta(y|x)=1-y+(\theta x+\eta(1-x))(2y-1)$. After some simplification, we have
$$\hat x_e=\begin{cases}\dfrac{(1-\theta)\tilde x_e}{1-\eta-\tilde x_e(\theta-\eta)} & B_e=0,\\[2mm] \dfrac{\theta\tilde x_e}{\eta+\tilde x_e(\theta-\eta)} & B_e=1.\end{cases}$$
Since $\eta\le q\le\theta\le s$, we have
$$\hat x_e\le B_e\min\left(1,\frac{s}{\eta}\tilde x_e\right)+(1-B_e)\,\tilde x_e,$$
and hence
$$\mathbb{E}[\hat x_e^2]\le\mathbb{E}\left[\min\left(1,\frac{s}{\eta}\tilde x_e\right)^2B_e\right]+\mathbb{E}[\tilde x_e^2].$$
Note that
$$\mathbb{E}\left[\min\left(1,\frac{s}{\eta}\tilde x_e\right)^2B_e\right]\overset{(a)}{=}\mathbb{E}\left[\mathbb{E}\left[\min\left(1,\frac{s}{\eta}\tilde x_e\right)^2\Big|A^\pi_e\right]\mathbb{E}[B_e|A^\pi_e]\right]\overset{(b)}{=}\mathbb{E}\left[\mathbb{E}\left[\min\left(1,\frac{s}{\eta}\tilde x_e\right)^2\Big|A^\pi_e\right]\left(sA^\pi_e+\eta(1-A^\pi_e)\right)\right]\overset{(c)}{\le}s\,\mathbb{E}[A^\pi_e]+s\,\mathbb{E}\left[\mathbb{E}\left[\tilde x_e\big|A^\pi_e\right](1-A^\pi_e)\right]\le sq+s\,\mathbb{E}[\tilde x_e]=2sq,$$
where (a) follows from the conditional independence of $\tilde x_e$ (which depends on $(A,B_{\setminus e})$) and $B_e$ given $A^\pi_e$; (b) follows from (24); (c) follows by using $\min(1,\frac s\eta\tilde x_e)^2\le\min(1,\frac s\eta\tilde x_e)\le\frac s\eta\tilde x_e$ to get the second term. Combining the previous two displays yields
$$\mathbb{E}_\theta[\hat x_e^2]\le2sq+\mathbb{E}[\tilde x_e^2]\le2sq+q^2+\operatorname{Var}(\tilde x_e).$$
It follows that
$$\operatorname{mmse}_\theta(A^\pi)=\sum_e\mathbb{E}\left[(A^\pi_e-\hat x_e)^2\right]=\sum_e\left(\mathbb{E}[(A^\pi_e)^2]-\mathbb{E}[\hat x_e^2]\right)\ge\binom{n}{2}q(1-q)-2\binom{n}{2}sq-\sum_e\operatorname{Var}(\tilde x_e).\qquad(27)$$
Combining (26) with (27) yields
$$\int_q^s\mathrm{(I)}\,d\theta\ge\int_q^s\frac{\theta-q}{s(1-q)^2}\left(\binom{n}{2}q(1-q)-2\binom{n}{2}sq-\operatorname{mmse}_\theta(A^\pi)\right)d\theta$$
$$=\int_q^s\frac{\theta-q}{s(1-q)^2}\left(\binom{n}{2}q(1-q)-\operatorname{mmse}_\theta(A^\pi)\right)d\theta-\binom{n}{2}\frac{q(s-q)^2}{(1-q)^2}\ge\int_q^s\frac{\theta-q}{s(1-q)^2}\left(\binom{n}{2}q(1-q)-\operatorname{mmse}_\theta(A^\pi)\right)d\theta-\binom{n}{2}\frac{qs^2}{(1-q)^2}.$$
The conclusion follows by combining the last display with (23).

In this section, we prove Proposition 4 by connecting the MSE of $A^\pi$ to the Hamming risk of estimating $\pi$. In particular, assuming (18), that is, $\operatorname{mmse}(A^\pi)\ge\mathbb{E}[\|A\|^2](1-\xi)$, we aim to show that
$$\mathbb{E}[\operatorname{overlap}(\pi,\hat\pi)]=O\left(\xi^{1/4}+\left(\frac{n\log n}{\mathbb{E}[\|A\|^2]}\right)^{1/4}\right)$$
for any estimator $\hat\pi(A,B)$. We first present a general program and then specialize the argument to the Gaussian and Erdős–Rényi graph models. Recall that $\operatorname{overlap}(\pi,\hat\pi)$ denotes the fraction of fixed points of $\sigma\triangleq\pi^{-1}\circ\hat\pi$. Let $\alpha(\pi,\hat\pi)$ denote the fraction of fixed points of the edge permutation $\sigma^E$ induced by the node permutation $\sigma$ (cf. Section 1.3.1). The following simple lemma relates $\alpha(\pi,\hat\pi)$ to $\operatorname{overlap}(\pi,\hat\pi)$.

Lemma 2. It holds that $\mathbb{E}[\operatorname{overlap}(\pi,\hat\pi)]\le\sqrt{\mathbb{E}[\alpha(\pi,\hat\pi)]}+\frac1n$.

Proof. In view of (14),
$$\binom{n\operatorname{overlap}(\pi,\hat\pi)}{2}\le\binom{n}{2}\alpha(\pi,\hat\pi).$$
By Jensen's inequality,
$$\binom{n\,\mathbb{E}[\operatorname{overlap}(\pi,\hat\pi)]}{2}\le\mathbb{E}\left[\binom{n\operatorname{overlap}(\pi,\hat\pi)}{2}\right]\le\binom{n}{2}\mathbb{E}[\alpha(\pi,\hat\pi)].$$
The desired conclusion follows because, for $x,y\ge0$,
$$\binom{nx}{2}\le\binom{n}{2}y\iff nx^2-x-(n-1)y\le0\implies x\le\frac{1+\sqrt{1+4n(n-1)y}}{2n}\le\sqrt y+\frac1n.$$

In view of Lemma 2 and the fact that $\mathbb{E}[\|A\|^2]\le n^2$, it suffices to show
$$\mathbb{E}[\alpha(\pi,\hat\pi)]=O\left(\xi^{1/2}+\left(\frac{n\log n}{\mathbb{E}[\|A\|^2]}\right)^{1/2}\right).$$
Let $\alpha=\mathbb{E}[\alpha(\pi,\hat\pi)]$ and define an estimator of $A^\pi$ by $\hat A=\alpha A^{\hat\pi}+(1-\alpha)\mathbb{E}[A]$.
This is well-defined since $\alpha$ is deterministic and $\hat\pi$ only depends on $(A,B)$. Intuitively, $\hat A$ can be viewed as an interpolation between the "plug-in" estimator $A^{\hat\pi}$ and the trivial estimator $\mathbb{E}[A^\pi]=\mathbb{E}[A]$. We remark that to derive the desired lower bound on $\alpha$, it is crucial to use the interpolated estimator $\hat A$ rather than the plug-in estimator, because we expect $\alpha$ to be small and $\hat\pi$ to be only slightly correlated with $\pi$.

On the one hand, by the definition of the MMSE and the assumption (18),
$$\mathbb{E}\left[\|A^\pi-\hat A\|^2\right]\ge\operatorname{mmse}(A^\pi)\ge\mathbb{E}[\|A\|^2](1-\xi).\qquad(28)$$
On the other hand, we claim that in both the Gaussian and Erdős–Rényi models,
$$\mathbb{E}\left[\langle A^\pi,A^{\hat\pi}\rangle\right]\ge\mathbb{E}[\|A\|^2]\,\alpha-O\left(\sqrt{\mathbb{E}[\|A\|^2]\,n\log n}\right),\qquad(29)$$
so that
$$\mathbb{E}\left[\|A^\pi-\hat A\|^2\right]=\mathbb{E}[\|A^\pi\|^2]+\mathbb{E}[\|\hat A\|^2]-2\,\mathbb{E}[\langle A^\pi,\hat A\rangle]\overset{(a)}{=}(1+\alpha^2)\,\mathbb{E}[\|A\|^2]-(1-\alpha)^2\|\mathbb{E}[A]\|^2-2\alpha\,\mathbb{E}[\langle A^\pi,A^{\hat\pi}\rangle]\le(1-\alpha^2)\,\mathbb{E}[\|A\|^2]+O\left(\alpha\sqrt{\mathbb{E}[\|A\|^2]\,n\log n}\right),\qquad(30)$$
where in (a) we used the fact that $\mathbb{E}[A^\pi]=\mathbb{E}[A]$ is entrywise constant and $\langle\mathbb{E}[A],A^{\hat\pi}\rangle=\langle\mathbb{E}[A],A\rangle$.

Let $C$ be a sufficiently large constant. For each permutation $\pi'\in S_n$, define the event
$$\mathcal{F}_{\pi'}=\left\{\langle A^\pi,A^{\pi'}\rangle\ge\mathbb{E}[\|A\|^2]\,\alpha(\pi,\pi')-C\sqrt{\mathbb{E}[\|A\|^2]\,n\log n}\right\}\qquad(32)$$
and set $\mathcal{F}\triangleq\cap_{\pi'\in S_n}\mathcal{F}_{\pi'}$. It follows that
$$\mathbb{E}\left[\langle A^\pi,A^{\hat\pi}\rangle\right]=\mathbb{E}\left[\langle A^\pi,A^{\hat\pi}\rangle\mathbf{1}_{\mathcal{F}}\right]+\mathbb{E}\left[\langle A^\pi,A^{\hat\pi}\rangle\mathbf{1}_{\mathcal{F}^c}\right]\ge\mathbb{E}[\|A\|^2]\,\mathbb{E}\left[\alpha(\pi,\hat\pi)\mathbf{1}_{\mathcal{F}}\right]-C\sqrt{\mathbb{E}[\|A\|^2]\,n\log n}-\mathbb{E}\left[\|A\|^2\mathbf{1}_{\mathcal{F}^c}\right]\ge\mathbb{E}[\|A\|^2]\left(\alpha-P\{\mathcal{F}^c\}\right)-C\sqrt{\mathbb{E}[\|A\|^2]\,n\log n}-\sqrt{\mathbb{E}[\|A\|^4]\,P\{\mathcal{F}^c\}},$$
where the last inequality holds because
$$\mathbb{E}\left[\alpha(\pi,\hat\pi)\mathbf{1}_{\mathcal{F}}\right]=\mathbb{E}[\alpha(\pi,\hat\pi)]-\mathbb{E}\left[\alpha(\pi,\hat\pi)\mathbf{1}_{\mathcal{F}^c}\right]\ge\alpha-P\{\mathcal{F}^c\},$$
and $\mathbb{E}[\|A\|^2\mathbf{1}_{\mathcal{F}^c}]\le\sqrt{\mathbb{E}[\|A\|^4]\,P\{\mathcal{F}^c\}}$ by the Cauchy–Schwarz inequality.
Note that $\mathbb{E}[\|A\|^4]=O(n^4)$, and $\mathbb{E}[\|A\|^2]$ is equal to $\binom n2$ in the Gaussian case and $\binom n2q$ in the Erdős–Rényi case (with $q\ge n^{-O(1)}$). To get (29), it suffices to prove $P\{\mathcal{F}^c\}\le e^{-n\log n}$, which, by the union bound, further reduces to showing that $P\{\mathcal{F}^c_{\pi'}\}\le e^{-2n\log n}$ for any permutation $\pi'\in S_n$. To this end, we consider the Gaussian and Erdős–Rényi graph models separately.

For the Gaussian Wigner model, let $M\in\{0,1\}^{\binom{[n]}2\times\binom{[n]}2}$ denote the permutation matrix corresponding to the edge permutation $\sigma^E$ induced by $\sigma=\pi^{-1}\circ\pi'$. Recalling that $A$ denotes the weighted adjacency vector, we have $\langle A^\pi,A^{\pi'}\rangle=A^\top MA$. By the Hanson–Wright inequality (Lemma 6), with probability at least $1-\delta$,
$$A^\top MA\ge\operatorname{Tr}(M)-C'\left(\|M\|_F\sqrt{\log(1/\delta)}+\|M\|\log(1/\delta)\right),\qquad C'>0.$$
Since $\operatorname{Tr}(M)=\binom n2\alpha(\pi,\pi')$, $\|M\|_F=\sqrt{\binom n2}$, and the spectral norm $\|M\|=1$, it follows that with probability at least $1-\delta$,
$$A^\top MA\ge\binom n2\alpha(\pi,\pi')-C'\left(n\sqrt{\log(1/\delta)}+\log(1/\delta)\right).$$
Choosing $\delta=e^{-2n\log n}$ and $C$ in (32) to be a large enough constant, we get that $P\{\mathcal{F}_{\pi'}\}\ge1-e^{-2n\log n}$.

Next, we move to the Erdős–Rényi graph model. Fix any permutation $\pi'\in S_n$. Let $O$ denote the set of fixed points of the edge permutation induced by $\pi'\circ\pi^{-1}$. By definition, $|O|=\binom n2\alpha(\pi,\pi')$ and
$$\langle A^\pi,A^{\pi'}\rangle\ge\sum_{(i,j)\in O}A_{ij}\sim\operatorname{Binom}\left(\binom n2\alpha(\pi,\pi'),\,q\right).$$
By Bernstein's inequality, with probability at least $1-\delta$,
$$\langle A^\pi,A^{\pi'}\rangle\ge\binom n2\alpha(\pi,\pi')\,q-C'\left(\sqrt{\binom n2\alpha(\pi,\pi')\,q\log(1/\delta)}+\log(1/\delta)\right)\ge\binom n2\alpha(\pi,\pi')\,q-C'\left(\sqrt{n^2q\log(1/\delta)}+\log(1/\delta)\right),$$
where $C'>0$ is a constant. Choosing $\delta=e^{-2n\log n}$ and $C$ to be a large enough constant, we get that $P\{\mathcal{F}_{\pi'}\}\ge1-e^{-2n\log n}$.
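The combinatorial inequality behind Lemma 2 above — a node permutation with $f$ fixed points induces an edge permutation fixing at least $\binom f2$ of the $\binom n2$ unordered pairs, so $\binom{n\operatorname{overlap}}2\le\binom n2\alpha$ — can be verified exhaustively for small $n$. A minimal sketch (the helper name is ours, not from the paper):

```python
import itertools
import math

def edge_fixed_fraction(sigma, n):
    # Fraction of unordered pairs {i, j} fixed (as a set) by the induced edge permutation.
    fixed = sum(1 for i, j in itertools.combinations(range(n), 2)
                if {sigma[i], sigma[j]} == {i, j})
    return fixed / math.comb(n, 2)

n = 6
for sigma in itertools.permutations(range(n)):
    fixed_nodes = sum(1 for i in range(n) if sigma[i] == i)  # n * overlap
    alpha = edge_fixed_fraction(sigma, n)
    # binom(n * overlap, 2) <= binom(n, 2) * alpha, the inequality used in Lemma 2
    assert math.comb(fixed_nodes, 2) <= math.comb(n, 2) * alpha + 1e-9
```

Note that $2$-cycles of $\sigma$ also fix pairs as sets, which is why $\alpha$ can exceed the $\binom f2/\binom n2$ lower bound.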
For any two permutations $\pi,\pi'\in S_n$, let $d(\pi,\pi')$ denote the number of non-fixed points of $\pi'\circ\pi^{-1}$. The following proposition provides sufficient conditions for $\hat\pi_{\mathrm{ML}}$ defined in (1) to achieve partial recovery and almost exact recovery in Erdős–Rényi graphs.

Proposition 5. Assume that $p\le1-c_0$ for some constant $c_0>0$. Suppose that
$$nps^2\ge\begin{cases}\dfrac{(2+\epsilon)\log n}{\log(1/p)-1+p} & \text{if }p\ge n^{-1/2},\\[2mm] 4+\epsilon & \text{if }p<n^{-1/2},\end{cases}\qquad(33)$$
for an arbitrarily small constant $\epsilon>0$. Then there exists a constant $\delta_0=\delta_0(\epsilon,c_0)<1$ such that
$$P\{d(\hat\pi_{\mathrm{ML}},\pi)<\delta_0n\}\ge1-n^{-1+o(1)},$$
that is, $\hat\pi_{\mathrm{ML}}$ achieves partial recovery with high probability. If in addition $nps^2=\omega(1)$, then for any constant $\delta>0$,
$$P\{d(\hat\pi_{\mathrm{ML}},\pi)<\delta n\}\ge1-n^{-1+o(1)},$$
that is, $\hat\pi_{\mathrm{ML}}$ achieves almost exact recovery with high probability.

We remark that in the dense regime of $p=n^{-o(1)}$, (33) already implies that $nps^2=\omega(1)$, and hence the MLE achieves almost exact recovery provided $nps^2\ge\frac{(2+\epsilon)\log n}{\log(1/p)-1+p}$; this proves the positive part of Theorem 2. In contrast, in the sparse regime of $p=n^{-\Omega(1)}$, the MLE achieves almost exact recovery provided that $nps^2=\omega(1)$, which is in fact needed for any estimator to succeed [CKMP19].

To prove Proposition 5, we need the following intermediate lemma, which bounds the probability that the ML estimator (1) makes a given number of errors.

Lemma 3. Suppose that, for any constant $0<\epsilon<1$:
• for Erdős–Rényi random graphs, $p\le1-c_0$ for some constant $c_0$ and (33) holds; there exists a constant $c=c(c_0,\epsilon)>0$ such that if $h(k/n)\le c\epsilon^2kps^2$;
• for the Gaussian model, $n\rho^2\ge(4+\epsilon)\log n$; there exists a constant $c$ such that if $h(k/n)\le c\epsilon^2k\rho^2$;
then
$$P\{d(\hat\pi_{\mathrm{ML}},\pi)=k\}\le\exp\left(-nh\left(\frac kn\right)\right)\mathbf{1}_{\{k\le n-1\}}+e^{-n}\mathbf{1}_{\{k=n\}}+\exp\left(-\frac\epsilon4k\log n\right),\qquad(34)$$
where $h(x)=-x\log x-(1-x)\log(1-x)$ is the binary entropy function.

Note that Lemma 3 also includes the Gaussian case.
In fact, analogous to Proposition 5, we can apply Lemma 3 to show that the MLE attains almost exact recovery when $n\rho^2\ge(4+\epsilon)\log n$. We will not do so here; instead, in the next section, we directly prove a stronger result, showing that the MLE attains exact recovery under the same condition. Now we proceed to prove Proposition 5.

Proof (of Proposition 5). Note that $h(x)/x$ is monotone decreasing in $x$ and converges to $0$ as $x\to1$.
• Since $\log(1/p)-1+p$ is monotone decreasing in $p\in(0,1)$, under (33) we have $nps^2\ge4+\epsilon$, and hence there exists a constant $\delta_0>0$ such that $h(\delta_0)/\delta_0\le c\epsilon^2(4+\epsilon)$, where $c$ is the constant in Lemma 3.
• If further $nps^2=\omega(1)$, then for any constant $\delta>0$, $h(\delta)/\delta\le c\epsilon^2nps^2$ for all sufficiently large $n$.
In both cases, we get that $h(k/n)\le c\epsilon^2kps^2$ for all $k\ge\delta n$. Applying Lemma 3 with a union bound yields
$$P\{d(\hat\pi_{\mathrm{ML}},\pi)\ge\delta n\}\le\sum_{k\ge\delta n}P\{d(\hat\pi_{\mathrm{ML}},\pi)=k\}\le\exp(-n)+\sum_{k\ge\delta n}\exp\left(-nh\left(\frac kn\right)\right)+\sum_{k\ge\delta n}\exp\left(-\frac\epsilon4k\log n\right)=n^{-1+o(1)},\qquad(35)$$
where the last step follows from
$$\sum_{k\ge\delta n}\exp\left(-\frac\epsilon4k\log n\right)\le\frac{\exp\left(-\frac\epsilon4\delta n\log n\right)}{1-\exp\left(-\frac\epsilon4\log n\right)}=n^{-\Omega(n)}$$
for any fixed constant $\delta>0$, and
$$\sum_{k\ge\delta n}\exp\left(-nh\left(\frac kn\right)\right)\overset{(a)}{\le}2\sum_{1\le k\le n/2}\exp\left(-k\log\frac nk\right)\le2\sum_{k=1}^{10\log n}\exp\left(-k\log\frac nk\right)+2\sum_{10\log n\le k\le n/2}2^{-k}=n^{-1+o(1)},$$
where (a) follows from $h(x)=h(1-x)$ and $h(x)\ge x\log\frac1x$.

Proof of Lemma 3

Fix $k\in[n]$. Let $T_k$ denote the set of permutations $\pi'$ such that $d(\pi,\pi')=k$. Recall that $F$ is the set of fixed points of $\sigma\triangleq\pi^{-1}\circ\pi'$, with $|F|=n-k$, and $O=\binom F2$ is a subset of the fixed points of the edge permutation $\sigma^E$. It follows that for any $\tau\in\mathbb{R}$,
$$\{d(\hat\pi_{\mathrm{ML}},\pi)=k\}\subset\left\{\exists\pi'\in T_k:X_{\pi'}\ge\tau\right\}\cup\{Y\le\tau\},\qquad X_{\pi'}\triangleq\sum_{(i,j)\notin\binom F2}A_{\pi'(i)\pi'(j)}B_{ij},\qquad Y\triangleq\sum_{(i,j)\notin\binom F2}A_{\pi(i)\pi(j)}B_{ij}.$$
Hence it suffices to show
$$\binom nkP\{Y\le\tau\}\le\exp\left(-nh\left(\frac kn\right)\right)\mathbf{1}_{\{k\le n-1\}}+e^{-n}\mathbf{1}_{\{k=n\}}\qquad(37)$$
and to bound, via the Chernoff bound and a union bound, for any $t\ge0$,
$$\mathrm{(II)}\triangleq\sum_{\pi'\in T_k}P\{X_{\pi'}\ge\tau\}\le n^ke^{-t\tau}\,\mathbb{E}[\exp(tX_{\pi'})].\qquad(38)$$
It follows that
$$\mathbb{E}\left[\exp(tX_{\pi'})\right]\overset{(a)}{=}M_1^{n_2}\prod_{\ell=2}^{\binom n2}M_\ell^{N_\ell}\overset{(b)}{\le}M_1^{n_2}\prod_{\ell=2}^{\binom n2}M_2^{\ell N_\ell/2}\overset{(c)}{=}\left(\frac{M_1^2}{M_2}\right)^{n_2/2}M_2^{\left(\binom n2-\binom{n_1}2\right)/2}\overset{(d)}{\le}\left(\frac{M_1^2}{M_2}\right)^{k/4}M_2^{m/2},$$
where (a) follows from (39) and $N_1=\binom{n_1}2+n_2$ in view of (14); (b) follows from $M_\ell\le M_2^{\ell/2}$ for $\ell\ge2$; (c) follows from $\sum_{\ell=1}^{\binom n2}\ell N_\ell=\binom n2$; (d) follows from $n_2\le k/2$, $n_1=n-k$, and $m\triangleq\binom n2-\binom{n-k}2$.

Combining the last displayed equation with (38) yields
$$\mathrm{(II)}\le\exp\left(k\log n-t\tau+\frac k4\log\frac{M_1^2}{M_2}+\frac m2\log M_2\right)\le\exp\left(-\frac\epsilon4k\log n\right),$$
where the last inequality follows from the claim that
$$\inf_{t\ge0}\left\{-t\tau+\frac k4\log\frac{M_1^2}{M_2}+\frac m2\log M_2\right\}\le-\left(1+\frac\epsilon4\right)k\log n.\qquad(40)$$
It remains to check (37) and (40) by choosing $\tau$ appropriately for Erdős–Rényi random graphs and the Gaussian model separately.
• For Erdős–Rényi random graphs, note that for any $F\subset[n]$ with $|F|=n-k$,
$$Y=\sum_{(i,j)\notin\binom F2}A_{\pi(i)\pi(j)}B_{ij}\sim\operatorname{Binom}\left(m,ps^2\right).$$
Let $\mu\triangleq\mathbb{E}[Y]=mps^2$ and set $\tau=(1-\delta)\mu$, where
$$\delta\triangleq\begin{cases}\sqrt{\dfrac{8h(k/n)}{kps^2}} & k\le n-1,\\[2mm] \sqrt{\dfrac{4n}{n(n-1)ps^2}} & k=n.\end{cases}$$
It can be verified that $\delta\in(0,1)$: if $k\le n-1$, by the assumption that $h(k/n)\le c\epsilon^2kps^2$ for sufficiently small $c$, we have $\delta\le\sqrt{8c}\,\epsilon<1$; if $k=n$, under condition (33), $nps^2\ge4+\epsilon$ and thus $\delta\le\sqrt{\frac{4n}{(4+\epsilon)(n-1)}}<1$.
If $k=n$, applying the Chernoff bound yields
$$P\{Y\le\tau\}=P\left\{\operatorname{Binom}\left(\binom n2,ps^2\right)\le(1-\delta)\mu\right\}\le\exp\left(-\frac{\delta^2\mu}2\right)=\exp(-n).$$
If $1\le k\le n-1$,
$$\binom nkP\{Y\le\tau\}\overset{(a)}{\le}\binom nk\exp\left(-\frac{\delta^2\mu}2\right)\overset{(b)}{\le}\exp\left(-nh\left(\frac kn\right)\right),$$
where (a) holds by applying the Chernoff bound to $Y\sim\operatorname{Binom}(m,ps^2)$ with mean $\mu$ and $\tau=(1-\delta)\mu$; (b) follows from $\binom nk\le\exp\left(nh\left(\frac kn\right)\right)$, the definition of $\delta$, and $\mu=\left(1-\frac{k+1}{2n}\right)nkps^2\ge\frac12nkps^2$ when $k\le n-1$.
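The multiplicative Chernoff bound used above for the lower tail, $P\{\operatorname{Binom}(m,q)\le(1-\delta)\mu\}\le\exp(-\delta^2\mu/2)$ with $\mu=mq$, can be checked against the exact binomial CDF. A small numeric sketch (the parameters are illustrative, not those of the proof):

```python
import math

def binom_cdf(m, q, k):
    # Exact P{Binom(m, q) <= k}
    return sum(math.comb(m, i) * q**i * (1 - q)**(m - i) for i in range(k + 1))

m, q, delta = 500, 0.2, 0.3
mu = m * q
exact = binom_cdf(m, q, math.floor((1 - delta) * mu))
chernoff = math.exp(-delta**2 * mu / 2)
assert exact <= chernoff  # lower-tail multiplicative Chernoff bound
```

Here $\exp(-\delta^2\mu/2)\approx0.011$, while the exact lower-tail probability is smaller by more than an order of magnitude; the bound is loose but of the right exponential order, which is all the proof needs.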
In conclusion, we arrive at the desired (37). Next, we check (40). In view of (52) in Lemma 7, $M_1=T$ and $M_2=T^2-D$, where
$$T=ps^2(e^t-1)+1,\qquad D=2ps^2(1-p)(e^t-1)^2,\qquad M_2=1+ps^2\left(e^{2t}ps^2+2e^tp(1-s^2)-2p+ps^2\right).$$
Moreover,
$$\frac{M_1^2}{M_2}=1+\frac D{T^2-D}\le1+D=1+2ps^2(1-p)(e^t-1)^2,$$
where the inequality holds because $T^2-D=M_2\ge1$. In the sequel, we assume $0\le t\le\log\frac1{ps^2}$. Then we have $\frac{M_1^2}{M_2}\le1+2=3$, so that $\frac k4\log\frac{M_1^2}{M_2}\le k$. Therefore,
$$-t\tau+\frac k4\log\frac{M_1^2}{M_2}+\frac m2\log M_2\le-t(1-\delta)mps^2+k+\frac m2(M_2-1)\le k+\frac12mps^2f(t),$$
where
$$f(t)\triangleq-2(1-\delta)t+e^{2t}ps^2+2e^tp(1-s^2)-2p+ps^2.$$
Therefore, to verify (40), it suffices to check
$$\frac12mps^2\inf_{0\le t\le\log(1/ps^2)}f(t)\le-\left(1+\frac\epsilon4\right)k\log n-k.\qquad(41)$$
Next, we choose different values of $t\in[0,\log(1/ps^2)]$ according to three cases.

Case 1: $nps^2(\log(1/p)-1+p)\ge(4+\epsilon)\log n$. We pick $t=\frac12\log\frac1p$. Then
$$f(t)=-(1-\delta)\log\frac1p+s^2(1-\sqrt p)^2-2\sqrt ps(1-s)+2\sqrt p(1-\sqrt p)\overset{(a)}{\le}-(1-\delta)\log\frac1p+1-p\le-(1-\delta)\left(\log\frac1p-1+p\right)+\delta\overset{(b)}{\le}-(1-C\delta)\left(\log\frac1p-1+p\right),$$
where (a) holds because $s\le1$ and $(1-\sqrt p)^2+2\sqrt p(1-\sqrt p)=1-p$; (b) holds for some large enough constant $C$, because $\log\frac1p-1+p$ is bounded away from $0$ as $p$ is bounded away from $1$. Recalling that $m=\binom n2-\binom{n-k}2=nk\left(1-\frac{k+1}{2n}\right)$, it follows that in this case,
$$\frac12mps^2f(t)\le-\left(2+\frac\epsilon2\right)\left(1-\frac{k+1}{2n}\right)(1-C\delta)\,k\log n.$$

Case 2: $nps^2(\log(1/p)-1+p)\le(4+\epsilon)\log n$ and $p\ge n^{-1/2}$. We pick $t=\log\frac1p$.
Then
$$f(t)=-2(1-\delta)\log\frac1p+\frac{s^2}p+2(1-s^2)-2p+ps^2\le-2(1-\delta)\left(\log\frac1p-1+p\right)+2\delta+\frac{s^2}p\overset{(a)}{\le}-2(1-\delta)\left(1-\frac\epsilon4\right)\left(\log\frac1p-1+p\right)+2\delta\overset{(b)}{\le}-2(1-C\delta)\left(1-\frac\epsilon4\right)\left(\log\frac1p-1+p\right),$$
where (a) holds because $\frac{s^2}p\le\frac{(4+\epsilon)\log n}{np^2(\log(1/p)-1+p)}\le\frac\epsilon2\left(\log\frac1p-1+p\right)$ for all sufficiently large $n$ when $p\ge n^{-1/2}$; (b) holds for some large enough constant $C$, because $\log\frac1p-1+p$ is bounded away from $0$ as $p$ is bounded away from $1$. Then, under the assumption that $nps^2(\log(1/p)-1+p)\ge(2+\epsilon)\log n$,
$$\frac12mps^2f(t)\le-(2+\epsilon)\left(1-\frac{k+1}{2n}\right)(1-C\delta)\left(1-\frac\epsilon4\right)k\log n.$$

Case 3: $nps^2\ge4+\epsilon$ and $p<n^{-1/2}$. We pick $t=\frac12\log\frac1{ps^2}$. Then
$$f(t)=-(1-\delta)\log\frac1{ps^2}+1+2\sqrt{\frac p{s^2}}(1-s^2)-2p+ps^2\le-(1-\delta)\log\frac1{ps^2}+1+2\sqrt{\frac p{s^2}}\le-(1-\delta)\left(1-\frac\epsilon4\right)\log\frac1{ps^2},$$
where the last inequality holds because $\sqrt{p/s^2}\le\sqrt{np^2/(4+\epsilon)}\le\frac12$ and $\log\frac1{ps^2}\ge\frac12\log n$ when $nps^2\ge4+\epsilon$ and $p<n^{-1/2}$. Then,
$$\frac12mps^2f(t)\le-\left(1+\frac\epsilon4\right)\left(1-\frac{k+1}{2n}\right)(1-\delta)\left(1-\frac\epsilon4\right)k\log n.$$
Hence, combining the three cases, we get
$$\frac12mps^2\inf_{0\le t\le\log(1/ps^2)}f(t)\le-\left(1+\frac\epsilon4\right)\left(1-\frac{k+1}{2n}\right)(1-C\delta)\left(1-\frac\epsilon4\right)k\log n\le-\left(1+\frac\epsilon4\right)k\log n-k,$$
where the last inequality holds for all sufficiently large $n$, because $\delta\le\sqrt{8c}\,\epsilon\le\frac\epsilon{8C}$ by choosing $c$ small enough (depending on $C$) when $k\le n-1$, and $\delta\le\sqrt{\frac{4n}{(4+\epsilon)(n-1)}}<1$ when $k=n$. Thus we arrive at the desired (41), which further implies (40).

• For the Gaussian model, set $\tau=\rho m-a_k$, where
$$a_k=\begin{cases}C\sqrt{h\left(\frac kn\right)nm} & k\le n-1,\\ Cn\sqrt{\log n} & k=n,\end{cases}$$
for some universal constant $C>0$. For $k\le n-1$, by the assumption $h(k/n)\le c\epsilon^2k\rho^2$ (with $c$ sufficiently small) and $m=kn\left(1-\frac{k+1}{2n}\right)\ge\frac{kn}2$, we have $a_k\le C\sqrt{c\epsilon^2k\rho^2\cdot nm}\le\frac\epsilon8\rho m$. For $k=n$, by the assumption that $n\rho^2\ge(4+\epsilon)\log n$, we again have $a_k\le\frac\epsilon8\rho m$ for all sufficiently large $n$. In conclusion, we have $\tau\ge\rho m\left(1-\frac\epsilon8\right)$. First, we check (37).
Note that $(A_{\pi(i)\pi(j)},B_{ij})$ are i.i.d. pairs of standard normals with correlation coefficient $\rho$. If $k=n$, applying the Hanson–Wright inequality (see Lemma 6) with $M=I_m$ to $Y=\sum_{(i,j)\notin\binom F2}A_{\pi(i)\pi(j)}B_{ij}$ yields $P\{Y\le\tau\}\le e^{-n}$. If $k\le n-1$, applying Lemma 6 again yields
$$\binom nkP\{Y\le\tau\}\le\binom nk\exp\left(-2nh\left(\frac kn\right)\right)\le\exp\left(-nh\left(\frac kn\right)\right).$$
Next, we check (40). In view of (53) in Lemma 8, $M_1=\lambda_1$ and $M_2=\lambda_1\lambda_2$, where
$$\lambda_1=\frac1{\sqrt{(1-\rho t)^2-t^2}}\qquad\text{and}\qquad\lambda_2=\frac1{\sqrt{(1+\rho t)^2-t^2}}\qquad\text{for }0\le t<\frac1{1+\rho}.$$
For the values of $t$ chosen below, one can check that $\frac{M_1^2}{M_2}=\frac{\lambda_1}{\lambda_2}\le2$. It follows that
$$\inf_{t\ge0}\left(-t\tau+\frac k4\log\frac{M_1^2}{M_2}+\frac m2\log M_2\right)\le k+\inf_{0\le t<\frac1{1+\rho}}\left(-t\tau+\frac{m+k}2\log M_2\right)\le k+\frac12k(n-1)\rho^2\inf_{0\le t<\frac1{1+\rho}}f(t),$$
where
$$f(t)\triangleq-\frac t\rho\left(1-\frac\epsilon8\right)+\frac1{2\rho^2}\left(1+\frac2{n-1}\right)\log M_2,$$
and the last inequality holds because $\tau\ge\rho m\left(1-\frac\epsilon8\right)$ and $m=\frac{k(2n-k-1)}2\ge\frac{k(n-1)}2$. It remains to show
$$\frac12(n-1)\rho^2\inf_{0\le t<\frac1{1+\rho}}f(t)\le-\left(1+\frac\epsilon4\right)\log n-1.\qquad(42)$$

Case 1: $\rho^2\ge\frac\epsilon{10}$. We pick $t=\frac\rho{1+3\rho^2}$. Then, using $\log(1+x)\le x$, one can verify that
$$\log M_2\le\frac{3\rho^2}{2(1+3\rho^2)},$$
and therefore
$$f(t)\le-\frac1{1+3\rho^2}\left(1-\frac\epsilon8\right)+\frac3{4(1+3\rho^2)}\left(1+\frac2{n-1}\right)\le-\frac1{1+3\rho^2}\left(\frac14-\frac\epsilon8-\frac2{n-1}\right)\le-\frac14\left(\frac14-\frac\epsilon8-\frac2{n-1}\right),$$
where the last inequality follows from $1+3\rho^2\le4$. Hence,
$$\frac12(n-1)\rho^2f(t)\le-\frac{(n-1)\rho^2}8\left(\frac14-\frac\epsilon8-\frac2{n-1}\right).$$
Thus it follows from $\rho^2\ge\frac\epsilon{10}$ that (42) holds for all sufficiently large $n$.

Case 2: $\frac{(4+\epsilon)\log n}n\le\rho^2<\frac\epsilon{10}$. We pick $t=\rho$.
Then,
$$\log M_2=-\frac12\log\left(1-2\rho^2-\rho^4-2\rho^6+\rho^8\right)\le\rho^2(1+2\rho^2),$$
where the inequality holds because $\rho^2\le\frac\epsilon{10}$ and $\log(1+x)\le x$. Therefore,
$$f(t)\le-\left(1-\frac\epsilon8\right)+\frac{1+2\rho^2}2\left(1+\frac2{n-1}\right)\le-\frac12\left(1-\frac\epsilon2\right),$$
where the last inequality holds because $\rho^2<\frac\epsilon{10}$. Hence,
$$\frac12(n-1)\rho^2f(t)\le-\frac14(n-1)\rho^2\left(1-\frac\epsilon2\right)\le-\left(1+\frac\epsilon4\right)\left(1-\frac1n\right)\left(1-\frac\epsilon2\right)\log n,$$
where the last inequality follows from the assumption that $\rho^2\ge\frac{(4+\epsilon)\log n}n$. Thus (42) holds for all sufficiently large $n$.

Building upon the almost exact recovery results in the preceding section, we now analyze the MLE for exact recovery. Improving upon Lemma 3, the following lemma gives a tighter bound on the probability that the MLE makes a small number of errors. We state this result under a general correlated Erdős–Rényi graph model specified by the joint distribution $P=(p_{ab}:a,b\in\{0,1\})$, so that $P\{A_{\pi(i)\pi(j)}=a,B_{ij}=b\}=p_{ab}$ for $a,b\in\{0,1\}$. In this general Erdős–Rényi model, $\hat\pi_{\mathrm{ML}}$ is again given by the maximization problem (1) if $p_{11}p_{00}>p_{10}p_{01}$ (positive correlation), and changes to minimization if $p_{11}p_{00}<p_{10}p_{01}$ (negative correlation). The subsampling model is a special case with positive correlation, where
$$p_{11}=ps^2,\qquad p_{10}=p_{01}=ps(1-s),\qquad p_{00}=1-2ps+ps^2.\qquad(43)$$

Lemma 4. Suppose that, for any constant $0<\epsilon<1$:
• for general Erdős–Rényi random graphs, $n\left(\sqrt{p_{11}p_{00}}-\sqrt{p_{10}p_{01}}\right)^2\ge(1+\epsilon)\log n$;
• for the Gaussian model, $n\rho^2\ge(4+\epsilon)\log n$;
then for any $k\in[n]$ such that $k\le\frac{\epsilon n}3$,
$$P\{d(\hat\pi_{\mathrm{ML}},\pi)=k\}\le\exp\left(-\frac\epsilon4k\log n\right).\qquad(44)$$
Note that Lemma 4 only holds when $k/n$ is small, but requires less stringent conditions than Lemma 3.
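The parametrization (43) can be confirmed by direct simulation of the subsampling mechanism: each edge of the parent graph $G(n,p)$ is retained in each of the two graphs independently with probability $s$, so the per-pair joint law of $(A^\pi_e,B_e)$ is exactly as displayed. A quick sketch (variable names are ours):

```python
import random

random.seed(0)
p, s, trials = 0.4, 0.5, 200_000
counts = {(1, 1): 0, (1, 0): 0, (0, 1): 0, (0, 0): 0}
for _ in range(trials):
    parent = random.random() < p          # edge present in the parent graph
    a = parent and (random.random() < s)  # subsample into graph A
    b = parent and (random.random() < s)  # subsample into graph B
    counts[(int(a), int(b))] += 1

# Compare empirical frequencies with (43):
# p11 = p s^2, p10 = p01 = p s (1-s), p00 = 1 - 2 p s + p s^2
assert abs(counts[(1, 1)] / trials - p * s**2) < 0.01
assert abs(counts[(1, 0)] / trials - p * s * (1 - s)) < 0.01
assert abs(counts[(0, 0)] / trials - (1 - 2 * p * s + p * s**2)) < 0.01
```

In particular, $p_{11}p_{00}>p_{10}p_{01}$ for any $p,s\in(0,1)$, so the subsampling model is always positively correlated.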
Inspecting the proof of Lemma 4, one can see that (44) holds for any $k\in[n]$ if the condition is strengthened by a factor of two to $n\left(\sqrt{p_{11}p_{00}}-\sqrt{p_{10}p_{01}}\right)^2\ge(2+\epsilon)\log n$ for Erdős–Rényi graphs, which recovers the sufficient condition for exact recovery in [CK16]. Conversely, we will prove in Section 4.2 that exact recovery is impossible if $n\left(\sqrt{p_{11}p_{00}}-\sqrt{p_{10}p_{01}}\right)^2\le(1-\epsilon)\log n$. Closing this gap of two for general Erdős–Rényi graphs is an open problem.

In the following, we apply Lemma 4 for small $k$ and Lemma 3 (or Proposition 5) for large $k$ to show the success of the MLE in exact recovery.

Proof of the positive parts in Theorem 1 and Theorem 4. Note that when $p$ is bounded away from $1$, the condition (11), that is, $nps^2(1-\sqrt p)^2\ge(1+\epsilon)\log n$, implies
$$nps^2\left(\sqrt{1-2ps+ps^2}-\sqrt p(1-s)\right)^2\ge(1+\epsilon)\log n,$$
which is further equivalent to $n\left(\sqrt{p_{11}p_{00}}-\sqrt{p_{10}p_{01}}\right)^2\ge(1+\epsilon)\log n$ in view of the relation (43). Therefore, for all $k\le\frac{\epsilon n}3$, by Lemma 4, for both Erdős–Rényi random graphs and the Gaussian model,
$$\sum_{k=1}^{\epsilon n/3}P\{d(\hat\pi_{\mathrm{ML}},\pi)=k\}\le\sum_{k=1}^{\epsilon n/3}\exp\left(-\frac\epsilon4k\log n\right)\le\frac{\exp\left(-\frac\epsilon4\log n\right)}{1-\exp\left(-\frac\epsilon4\log n\right)}=n^{-\epsilon/4+o(1)}.$$
Moreover:
• For the subsampled Erdős–Rényi random graphs, since $\log\frac1p-1+p\ge(1-\sqrt p)^2$, it follows that $nps^2(1-\sqrt p)^2\ge(1+\epsilon)\log n$ implies (33) and $nps^2=\omega(1)$. Then, by Proposition 5,
$$P\left\{d(\hat\pi_{\mathrm{ML}},\pi)>\frac{\epsilon n}3\right\}\le n^{-1+o(1)}.\qquad(45)$$
• For the Gaussian model, since $n\rho^2\ge(4+\epsilon)\log n$, it follows that $h(k/n)\le c\epsilon^2k\rho^2$ holds for all $k\ge\frac{\epsilon n}3$, where $c$ is the constant in Lemma 3. Thus, by (34) in Lemma 3 and (35), (45) follows.
Hence, for both the Erdős–Rényi random graph and the Gaussian model,
$$P\{d(\hat\pi_{\mathrm{ML}},\pi)>0\}\le\sum_{k=1}^{\epsilon n/3}P\{d(\hat\pi_{\mathrm{ML}},\pi)=k\}+P\left\{d(\hat\pi_{\mathrm{ML}},\pi)>\frac{\epsilon n}3\right\}\le n^{-\epsilon/4+o(1)}.$$
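For intuition, the maximization (1) defining $\hat\pi_{\mathrm{ML}}$ can be carried out by brute force on tiny instances. The sketch below uses a noiseless toy case: $B$ is an exact relabeling of an asymmetric graph (trivial automorphism group), so the unique maximizer of $\langle A^\pi,B\rangle$ is the planted correspondence. The specific graph and names are ours, for illustration only:

```python
import itertools

# Adjacency matrix of a small asymmetric graph (trivial automorphism group).
n = 6
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (1, 4), (2, 4)]
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

pi = [2, 0, 5, 1, 3, 4]  # planted vertex correspondence
B = [[A[pi[i]][pi[j]] for j in range(n)] for i in range(n)]

def objective(perm):
    # <A^perm, B>: number of edges matched under the candidate correspondence
    return sum(A[perm[i]][perm[j]] * B[i][j]
               for i in range(n) for j in range(i + 1, n))

ml = max(itertools.permutations(range(n)), key=objective)
assert list(ml) == pi  # exact recovery in the noiseless, asymmetric case
```

With noise (correlated rather than identical graphs), the same $720$-permutation search illustrates why the analysis above must control $\langle A^{\pi'},B\rangle$ uniformly over all $\pi'$ at distance $k$ from $\pi$.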
In this proof we focus on the case of positive correlation, i.e., $p_{11}p_{00}\ge p_{10}p_{01}$ in the general Erdős–Rényi graphs and $\rho\ge0$ in the Gaussian model; the other case is entirely analogous. Fix $k\in[n]$ and let $T_k$ denote the set of permutations $\pi'$ such that $d(\pi,\pi')=k$. Let $O'$ be the set of fixed points of the edge permutation $\sigma^E$, where, in view of (14),
$$|O'|=N_1=\binom{n_1}2+n_2\ge\binom{n-k}2.$$
Then, applying the Chernoff bound together with the union bound yields that for any $t\ge0$,
$$P\{d(\hat\pi_{\mathrm{ML}},\pi)=k\}\le\sum_{\pi'\in T_k}P\{X_{\pi'}\ge0\}\le n^k\,\mathbb{E}[\exp(tX_{\pi'})],\qquad(46)$$
where
$$X_{\pi'}\triangleq\sum_{i<j:\,(i,j)\notin O'}\left(A_{\pi'(i)\pi'(j)}-A_{\pi(i)\pi(j)}\right)B_{ij}.$$
It follows that
$$\mathbb{E}[\exp(tX_{\pi'})]=\prod_{\ell=2}^{\binom n2}M_\ell^{N_\ell}\le\prod_{\ell=2}^{\binom n2}M_2^{\ell N_\ell/2}\le M_2^{m/2},$$
where $m\triangleq\binom n2-\binom{n-k}2-n_2$ and the last inequality follows from $\sum_{\ell=1}^{\binom n2}\ell N_\ell=\binom n2$ and $N_1=\binom{n_1}2+n_2\ge\binom{n-k}2$. Hence,
$$P\{d(\hat\pi_{\mathrm{ML}},\pi)=k\}\overset{(a)}{\le}\exp\left(k\log n+\frac12kn\left(1-\frac{k+2}{2n}\right)\log M_2\right)\overset{(b)}{\le}\exp\left(-\left(\frac\epsilon2-\frac{k+2}{2n}\left(1+\frac\epsilon2\right)\right)k\log n\right)\overset{(c)}{\le}\exp\left(-\frac\epsilon4k\log n\right),$$
where (a) holds because $n_2\le k/2$ and $m\ge kn\left(1-\frac{k+2}{2n}\right)$; (b) holds by the claim that $\frac n2\log M_2\le-\left(1+\frac\epsilon2\right)\log n$ for appropriately chosen $t$; (c) holds for all sufficiently large $n$ given $k/n\le\frac\epsilon3$ and $0<\epsilon<1$. It remains to check $\frac n2\log M_2\le-\left(1+\frac\epsilon2\right)\log n$ by appropriately choosing $t$ for Erdős–Rényi random graphs and the Gaussian model separately.
• For Erdős–Rényi random graphs, in view of (54) in Lemma 8,
$$M_2=1+2\left(p_{10}p_{01}\left(e^t-1\right)+p_{11}p_{00}\left(e^{-t}-1\right)\right).$$
Since $p_{11}p_{00}\ge p_{10}p_{01}$, by choosing the optimal $t\ge0$ with $e^t=\sqrt{\frac{p_{11}p_{00}}{p_{10}p_{01}}}$, we get
$$M_2=1-2\left(\sqrt{p_{11}p_{00}}-\sqrt{p_{10}p_{01}}\right)^2,$$
and hence
$$\frac n2\log M_2\le-n\left(\sqrt{p_{11}p_{00}}-\sqrt{p_{10}p_{01}}\right)^2\le-(1+\epsilon)\log n,$$
where the last inequality holds by the assumption that $n\left(\sqrt{p_{11}p_{00}}-\sqrt{p_{10}p_{01}}\right)^2\ge(1+\epsilon)\log n$.
• For the Gaussian model, in view of (55) in Lemma 8,
$$M_2=\frac1{\sqrt{1+2t\rho-t^2(1-\rho^2)}}.$$
By choosing the optimal $t=\frac\rho{1-\rho^2}\ge0$, we get $M_2=\sqrt{1-\rho^2}$,
and hence
$$\frac n2\log M_2=\frac n4\log(1-\rho^2)\le-\frac{n\rho^2}4\le-\left(1+\frac\epsilon4\right)\log n,$$
where the last inequality holds by the assumption that $n\rho^2\ge(4+\epsilon)\log n$.

4.2 Impossibility of Exact Recovery

In this subsection, we prove the negative result in Theorem 4. As in Section 4.1, we consider a general correlated Erdős–Rényi graph model, where $P\{A_{\pi(i)\pi(j)}=a,B_{ij}=b\}=p_{ab}$. We aim to show that if
$$n\left(\sqrt{p_{11}p_{00}}-\sqrt{p_{10}p_{01}}\right)^2\le(1-\epsilon)\log n,\qquad(47)$$
then exact recovery of the latent permutation $\pi$ is impossible. Particularizing to the subsampling model parameterized by (43), the condition (47) reduces to
$$nps^2\left(\sqrt{1-2ps+ps^2}-\sqrt p(1-s)\right)^2\le(1-\epsilon)\log n,$$
which, under the assumption that $p=1-\Omega(1)$ in Theorem 4, is further equivalent to the desired (12).

Since the true permutation $\pi$ is uniformly distributed, the MLE $\hat\pi_{\mathrm{ML}}$ minimizes the error probability among all estimators. In the sequel, we focus on the case of positive correlation, as the other case is entirely analogous. Without loss of generality, the latent permutation $\pi$ is assumed to be the identity permutation $\mathrm{id}$.

To prove the failure of the MLE, it suffices to show the existence of a permutation $\pi'$ that achieves a likelihood at least as high as that of the true permutation $\pi=\mathrm{id}$. To this end, we consider the set $T$ of permutations $\pi'$ such that $d(\pi',\pi)=2$; in other words, $\pi'$ coincides with $\pi$ except for swapping two vertices. Then the cycle decomposition of $\pi'$ consists of $n-2$ fixed points and one $2$-cycle $(i,j)$ for some pair of vertices $(i,j)$. It follows that
$$\langle A,B\rangle-\langle A^{\pi'},B\rangle=\sum_{k\in[n]\setminus\{i,j\}}(A_{ik}-A_{jk})(B_{ik}-B_{jk})\triangleq\sum_{k\in[n]\setminus\{i,j\}}X_{ij,k},$$
where $X_{ij,k}\overset{\text{i.i.d.}}{\sim}a\delta_{+1}+b\delta_{-1}+(1-a-b)\delta_0$ for $k\in[n]\setminus\{i,j\}$, with $a\triangleq2p_{11}p_{00}$, $b\triangleq2p_{10}p_{01}$, and $a\ge b$ by assumption. We aim to show the existence of $\pi'\in T$ such that $\langle A,B\rangle\le\langle A^{\pi'},B\rangle$, which further implies that $\pi'$ has a likelihood at least as high as that of $\pi$.
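The two-vertex-swap identity above can be checked numerically for arbitrary symmetric $0/1$ matrices, since the pair $\{i,j\}$ itself and all pairs disjoint from $\{i,j\}$ contribute equally to $\langle A,B\rangle$ and $\langle A^{\pi'},B\rangle$. A short sketch (names are ours):

```python
import random

random.seed(1)
n, i, j = 8, 2, 5
# Random symmetric 0/1 matrices with empty diagonals
A = [[0] * n for _ in range(n)]
B = [[0] * n for _ in range(n)]
for u in range(n):
    for v in range(u + 1, n):
        A[u][v] = A[v][u] = random.randint(0, 1)
        B[u][v] = B[v][u] = random.randint(0, 1)

swap = list(range(n))
swap[i], swap[j] = j, i  # pi' = transposition (i j)

def inner(P):
    # <A^P, B> over unordered pairs
    return sum(A[P[u]][P[v]] * B[u][v] for u in range(n) for v in range(u + 1, n))

lhs = inner(list(range(n))) - inner(swap)
rhs = sum((A[i][k] - A[j][k]) * (B[i][k] - B[j][k]) for k in range(n) if k not in (i, j))
assert lhs == rhs
```

Each summand $(A_{ik}-A_{jk})(B_{ik}-B_{jk})$ takes values in $\{-1,0,+1\}$, which is exactly the three-point law of $X_{ij,k}$ used in the analysis.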
We divide the remaining analysis into two cases, depending on whether $na\ge(2-\epsilon)\log n$.

First we consider the case where $na\le(2-\epsilon)\log n$. We show that with probability at least $1-n^{-\Omega(\epsilon)}$, there are at least $n^{\Omega(\epsilon)}$ distinct $\pi'\in T$ such that $\langle A,B\rangle\le\langle A^{\pi'},B\rangle$. This implies that the MLE $\hat\pi_{\mathrm{ML}}$ coincides with $\pi$ with probability at most $n^{-\Omega(\epsilon)}$. Specifically, define $\chi_{ij}$ as the indicator random variable which equals $1$ if $X_{ij,k}\le0$ for all $k\ne i,j$, and $0$ otherwise. Then
$$P\{\chi_{ij}=1\}=\prod_{k\ne i,j}P\{X_{ij,k}\le0\}=(1-a)^{n-2}\ge n^{-2+\epsilon-o(1)},$$
where the last inequality holds because $an\le(2-\epsilon)\log n$. Let $I=\sum_{1\le i\le n/2}\sum_{n/2<j\le n}\chi_{ij}$, so that $\mathbb{E}[I]\ge\frac{n^2}4n^{-2+\epsilon-o(1)}=n^{\epsilon-o(1)}$; a second-moment computation then shows that $I\ge\frac12\mathbb{E}[I]$ with probability at least $1-n^{-\Omega(\epsilon)}$. In conclusion, we have shown that $I\ge n^{\Omega(\epsilon)}$ with probability at least $1-n^{-\Omega(\epsilon)}$. This implies that the MLE exactly recovers the true permutation with probability at most $n^{-\Omega(\epsilon)}$.

Next, we shift to the case where $na\ge(2-\epsilon)\log n$. The assumption (47) translates to $n\left(\sqrt a-\sqrt b\right)^2\le2(1-\epsilon)\log n$, and we have
$$(2-\epsilon)\log n\left(1-\sqrt{\frac ba}\right)^2\le na\left(1-\sqrt{\frac ba}\right)^2\le2(1-\epsilon)\log n.$$
It follows that
$$\sqrt{\frac ba}\ge1-\sqrt{\frac{2(1-\epsilon)}{2-\epsilon}}\ge\frac\epsilon4,\qquad\text{i.e.,}\qquad\frac{\epsilon^2}{16}\le\frac ba\le1.$$
Let $T_0$ be a set of $2m$ vertices, where $m=\lfloor n/\log^2n\rfloor$. We further partition $T_0$ into $T_1\cup T_2$, where $|T_1|=|T_2|=m$. Let $S$ denote the set of permutations $\pi'$ that consist of $n-2$ fixed points and one $2$-cycle $(i,j)$ for some $i\in T_1$ and $j\in T_2$. Then $|S|=m^2$. Next we show that
$$P\left\{\exists\pi'\in S\text{ s.t. }\langle A,B\rangle-\langle A^{\pi'},B\rangle<0\right\}=1-o(1).\qquad(48)$$
Fix a $\pi'\in S$ with $(i,j)$ as its $2$-cycle, i.e., $\pi'(i)=j$, $\pi'(j)=i$, and $\pi'(k)=k$ for any $k\in[n]\setminus\{i,j\}$. Then
$$\langle A,B\rangle-\langle A^{\pi'},B\rangle=X_{ij}+Y_{ij},\qquad\text{where}\qquad X_{ij}\triangleq\sum_{k\in T_0^c}X_{ij,k}\quad\text{and}\quad Y_{ij}\triangleq\sum_{k\in T_0\setminus\{i,j\}}X_{ij,k}.$$
Note that $\mathbb{E}[Y_{ij}]=(2m-2)(a-b)$. Letting $\tau=(2m-2)(a-b)\log n$, by Markov's inequality, $P\{Y_{ij}\ge\tau\}\le\frac1{\log n}$. Define $T'=\{(i,j)\in T_1\times T_2:Y_{ij}<\tau\}$. Then $\mathbb{E}[|(T_1\times T_2)\setminus T'|]\le\frac{m^2}{\log n}$. Hence, by Markov's inequality again, $|(T_1\times T_2)\setminus T'|\ge\frac{m^2}2$ with probability at most $\frac2{\log n}$.
Hence, we have $|T'|\ge\frac{m^2}2$ with probability at least $1-\frac2{\log n}$. Note that, crucially, $T'$ is independent of $\{X_{ij}\}_{i\in T_1,j\in T_2}$. Thus, we condition on the event that $|T'|\ge\frac{m^2}2$ in the sequel. Define $I_n=\sum_{(i,j)\in T'}\mathbf{1}_{\{X_{ij}\le-\tau\}}$. To bound $P\{X_{ij}\le-\tau\}$ from below, we need the following reverse large-deviation estimate (proved at the end of this subsection):

Lemma 5. Suppose $\{X_k\}_{k\in[n]}\overset{\text{i.i.d.}}{\sim}a\delta_{+1}+b\delta_{-1}+(1-a-b)\delta_0$ with some $a,b\in[0,1]$ such that $0\le a+b\le1$, $1\le a/b=\Theta(1)$, $an=\omega(1)$, and $\left(\sqrt a-\sqrt b\right)^2\le\frac{4\log n}n$. For all $\tau$ such that $\tau=o\left(\sqrt{an}\log n\right)$ and $\tau=o(an)$, and any constant $0<\delta<1$, there exists $n_0$ such that for all $n\ge n_0$,
$$P\left\{\sum_{k=1}^nX_k\le-\tau\right\}\ge\exp\left(-(1+\delta)\,n\left(\sqrt a-\sqrt b\right)^2\right).$$

To apply this lemma, note that $\frac\tau{\sqrt{an}\log n}=O\left(m\left(\sqrt a-\sqrt b\right)/\sqrt n\right)=o(1)$, $\frac\tau{an}=O\left(\frac1{\log n}\right)$, $m\le n/\log^2n$, $b\le a$, and $\sqrt a-\sqrt b=O\left(\sqrt{\log n/n}\right)$. Then, applying Lemma 5 with $\delta=\frac\epsilon2$ yields
$$\mathbb{E}[I_n]\ge\frac{m^2}2\exp\left(-\left(1+\frac\epsilon2\right)|T_0^c|\left(\sqrt a-\sqrt b\right)^2\right)\ge n^{\epsilon-o(1)},\qquad(49)$$
where the last inequality holds by the assumption that $\left(\sqrt a-\sqrt b\right)^2\le\frac{2(1-\epsilon)\log n}n$. Next, we show that $\operatorname{Var}[I_n]/\mathbb{E}[I_n]^2=o(1)$, which, by Chebyshev's inequality, implies that $I_n\ge\frac12\mathbb{E}[I_n]$ with high probability. Write
$$\operatorname{Var}[I_n]=\sum_{(i,j),(i',j')\in T'}\left(P\{X_{ij}\le-\tau,X_{i'j'}\le-\tau\}-P\{X_{ij}\le-\tau\}P\{X_{i'j'}\le-\tau\}\right)\le\mathrm{(I)}+\mathrm{(II)},$$
where
$$\mathrm{(I)}=\sum_{(i,j)\in T'}P\{X_{ij}\le-\tau\}=\mathbb{E}[I_n],\qquad \mathrm{(II)}=\sum_{\substack{(i,j),(i,j')\in T'\\j\ne j'}}P\{X_{ij}\le-\tau,X_{ij'}\le-\tau\}+\sum_{\substack{(i,j),(i',j)\in T'\\i\ne i'}}P\{X_{ij}\le-\tau,X_{i'j}\le-\tau\}.$$
To bound (II), fix $(i,j),(i,j')\in T'$ such that $j\ne j'$. Note that $\{X_{ij,k}\}_{j\in T_2}$ i.i.d. $\sim\operatorname{Bern}(p_{00})$ conditional on $A_{ik}=1,B_{ik}=1$; $\{-X_{ij,k}\}_{j\in T_2}$ i.i.d.
$\sim \mathrm{Bern}(p_{01})$ conditional on $A_{ik}=1, B_{ik}=0$; $\{-X_{ij,k}\}_{j\in T_2} \overset{\text{i.i.d.}}{\sim} \mathrm{Bern}(p_{10})$ conditional on $A_{ik}=0, B_{ik}=1$; and $\{X_{ij,k}\}_{j\in T_2} \overset{\text{i.i.d.}}{\sim} \mathrm{Bern}(p_{11})$ conditional on $A_{ik}=0, B_{ik}=0$. Then, for $\ell, \ell' \in \{0,1\}$, we define

$$
M_{\ell\ell'} = \left|\left\{k \in T^c : A_{ik}=\ell, B_{ik}=\ell'\right\}\right|,
$$

and get that for any $\lambda \ge 0$,

$$
P\left\{\sum_{k\in T^c} X_{ij,k} \le 0, \sum_{k\in T^c} X_{ij',k} \le 0\right\} \overset{(a)}{=} E\left[P\left\{\sum_{k\in T^c} X_{ij,k}\le 0 \,\Big|\, M_{00}, M_{01}, M_{10}, M_{11}\right\} P\left\{\sum_{k\in T^c} X_{ij',k}\le 0 \,\Big|\, M_{00}, M_{01}, M_{10}, M_{11}\right\}\right]
$$

$$
\overset{(b)}{\le} E\left[\gamma_{00}^{2M_{00}}\gamma_{01}^{2M_{01}}\gamma_{10}^{2M_{10}}\gamma_{11}^{2M_{11}}\right] \overset{(c)}{=} \left(\gamma_{00}^2 p_{00} + \gamma_{01}^2 p_{01} + \gamma_{10}^2 p_{10} + \gamma_{11}^2 p_{11}\right)^{|T^c|},
$$

where $\gamma_{\ell\ell'} \triangleq E[e^{-\lambda X_{ij,k}} \mid A_{ik}=\ell, B_{ik}=\ell'] = 1 - p_{\bar{\ell}\bar{\ell}'} + p_{\bar{\ell}\bar{\ell}'}\, e^{(2|\ell-\ell'|-1)\lambda}$ with $\bar{\ell} \triangleq 1-\ell$; $(a)$ holds because $\sum_{k\in T^c} X_{ij,k}$ and $\sum_{k\in T^c} X_{ij',k}$ are independent conditional on $M_{00}, M_{01}, M_{10}, M_{11}$; $(b)$ holds by applying the Chernoff bound; $(c)$ holds by applying the MGF of the multinomial distribution, since $(M_{00}, M_{01}, M_{10}, M_{11}) \sim \mathrm{Multi}(|T^c|, (p_{00}, p_{01}, p_{10}, p_{11}))$. Choosing $e^{\lambda} = \sqrt{\frac{p_{00}p_{11}}{p_{01}p_{10}}} = \sqrt{a/b}$, where $a = 2p_{00}p_{11}$ and $b = 2p_{01}p_{10}$, we have

$$
P\left\{\sum_{k\in T^c} X_{ij,k} \le 0, \sum_{k\in T^c} X_{ij',k} \le 0\right\} \le \left(1 - \left(\tfrac{3}{2}-o(1)\right)\left(\sqrt{a}-\sqrt{b}\right)^2\right)^{|T^c|}.
$$

The same upper bound applies to any $(i,j), (i',j)\in T'$ such that $i \ne i'$. Therefore, we get that

$$
\mathrm{(II)} \le 2m^3\left(1 - \left(\tfrac{3}{2}-o(1)\right)\left(\sqrt{a}-\sqrt{b}\right)^2\right)^{|T^c|} \le 2m^3\exp\left(-\left(\tfrac{3}{2}-o(1)\right)\left(\sqrt{a}-\sqrt{b}\right)^2|T^c|\right).
$$

Hence, by Chebyshev's inequality,

$$
P\left\{I_n \le \frac{1}{2}E[I_n]\right\} \le \frac{\mathrm{Var}[I_n]}{(E[I_n]/2)^2} \le \frac{4}{E[I_n]} + \frac{8\,\mathrm{(II)}}{(E[I_n])^2} \overset{(a)}{\le} n^{-\epsilon+o(1)} + \frac{64}{m}\exp\left(\left(\tfrac{1}{2}+o(1)\right)\left(\sqrt{a}-\sqrt{b}\right)^2|T^c| + \frac{\epsilon}{2}\log n\right) \overset{(b)}{\le} n^{-\epsilon/2+o(1)},
$$

where $(a)$ is due to (49) and the above bound on (II); $(b)$ holds by the assumption that $(\sqrt{a}-\sqrt{b})^2 \le \frac{(2-2\epsilon)\log n}{n}$ and $m = n^{1-o(1)}$, so that the second term is at most $n^{-1+o(1)}\cdot n^{1-\epsilon+\epsilon/2} = n^{-\epsilon/2+o(1)}$.

Therefore, with probability $1-n^{-\Omega(\epsilon)}$ there exists some $(i,j)\in T'$ such that $X_{ij} \le -\tau$. By definition of $T'$, we have $Y_{ij} < \tau$ and hence $X_{ij} + Y_{ij} < 0$. Thus, we arrive at the desired claim (48).

Proof of Lemma 5.
Let $E_n = \{\sum_{k=1}^n X_k \le -\tau\}$ and let $Q$ denote the distribution of $X_k$. The following large-deviation lower estimate based on the data processing inequality is well known (see [CK82, Eq. (5.21), p. 167] and [HWX17, Lemma 3]): for any distribution $Q'$,

$$
Q(E_n) \ge \exp\left(-\frac{nD(Q'\|Q) + \log 2}{Q'(E_n)}\right). \tag{50}
$$

Choose

$$
Q' = \frac{\alpha-\beta}{2}\delta_{+1} + \frac{\alpha+\beta}{2}\delta_{-1} + (1-\alpha)\delta_0, \quad \text{where } \alpha \triangleq \frac{2\sqrt{ab}}{1-(\sqrt{a}-\sqrt{b})^2} \quad \text{and} \quad \beta \triangleq \min\left\{\frac{\alpha}{2},\ \frac{\delta}{4}\sqrt{\frac{b\log n}{n}}\right\}.
$$

Note that under the assumptions that $1 \le a/b = \Theta(1)$ and $(\sqrt{a}-\sqrt{b})^2 \le \frac{2\log n}{n}$, we have $2b \le \alpha = \Theta(a)$. Moreover, since $\tau = o(\sqrt{an\log n})$ and $\tau = o(an)$, it follows that $\tau = o(\beta n)$. Then, letting $\{Y_k\}_{k\in[n]}$ be i.i.d. with distribution $Q'$,

$$
Q'(E_n^c) = Q'\left(\sum_{k=1}^n Y_k > -\tau\right) \overset{(a)}{\le} Q'\left(\sum_{k=1}^n (Y_k - E[Y_k]) \ge \beta n - \tau\right) \overset{(b)}{\le} \frac{\sum_{k=1}^n \mathrm{Var}[Y_k]}{(\beta n - \tau)^2} \overset{(c)}{\le} \frac{\alpha n}{(\beta n - \tau)^2} \overset{(d)}{=} o(1),
$$

where $(a)$ holds because $E[Y_k] = -\beta$; $(b)$ follows from Chebyshev's inequality, as the $\{Y_k\}$ are independent and $\beta n - \tau > 0$; $(c)$ holds because $\sum_{k=1}^n \mathrm{Var}[Y_k] \le \sum_{k=1}^n E[Y_k^2] = \alpha n$; $(d)$ holds because $\tau = o(\beta n)$ and $\frac{\alpha}{\beta^2 n} = \max\left\{\frac{16\alpha}{\delta^2 b\log n}, \frac{4}{\alpha n}\right\} = o(1)$, in view of the facts that $\delta = \Theta(1)$, $\alpha = \Theta(a) = \Theta(b)$, and $\alpha n = \Theta(an) = \omega(1)$.

Next, we upper bound $D(Q'\|Q)$.
We get

$$
D(Q'\|Q) = \frac{\alpha-\beta}{2}\log\frac{\alpha-\beta}{2a} + \frac{\alpha+\beta}{2}\log\frac{\alpha+\beta}{2b} + (1-\alpha)\log\frac{1-\alpha}{1-a-b}
$$

$$
\overset{(a)}{=} -\log\left(1-\left(\sqrt{a}-\sqrt{b}\right)^2\right) + \frac{\alpha}{2}\log\frac{(\alpha-\beta)(\alpha+\beta)}{\alpha^2} + \frac{\beta}{2}\log\frac{a(\alpha+\beta)}{b(\alpha-\beta)}
$$

$$
\overset{(b)}{\le} -\log\left(1-\left(\sqrt{a}-\sqrt{b}\right)^2\right) + \frac{\beta^2}{\alpha-\beta} + \frac{\beta(\sqrt{a}-\sqrt{b})}{\sqrt{b}} \overset{(c)}{\le} \left(\sqrt{a}-\sqrt{b}\right)^2 + \frac{\delta\log n}{2n},
$$

where $(a)$ holds because $\alpha = \frac{2\sqrt{ab}}{1-(\sqrt{a}-\sqrt{b})^2}$ and hence $1-\alpha = \frac{1-a-b}{1-(\sqrt{a}-\sqrt{b})^2}$; $(b)$ holds because $\frac{\alpha}{2}\log\frac{(\alpha-\beta)(\alpha+\beta)}{\alpha^2} \le 0$, $\log\frac{\alpha+\beta}{\alpha-\beta} \le \frac{2\beta}{\alpha-\beta}$, and $\log\frac{a}{b} \le \frac{2(\sqrt{a}-\sqrt{b})}{\sqrt{b}}$, in view of $\log(1+x)\le x$; $(c)$ holds for all sufficiently large $n$ because $(\sqrt{a}-\sqrt{b})^2 \le \frac{2\log n}{n}$, so that $-\log(1-(\sqrt{a}-\sqrt{b})^2) = (\sqrt{a}-\sqrt{b})^2 + O(\log^2(n)/n^2)$; $\beta \le \alpha/2$ and $\beta \le \frac{\delta}{4}\sqrt{\frac{b\log n}{n}}$, so that $\frac{\beta^2}{\alpha-\beta} \le \frac{2\beta^2}{\alpha} \le \frac{\delta^2 b\log n}{8\alpha n} \le \frac{\delta^2\log n}{16n}$; and $\frac{\beta(\sqrt{a}-\sqrt{b})}{\sqrt{b}} \le \frac{\delta}{4}\sqrt{\frac{\log n}{n}}\left(\sqrt{a}-\sqrt{b}\right) \le \frac{\sqrt{2}\,\delta\log n}{4n}$.

In conclusion, it follows from (50) that

$$
Q(E_n) \ge \exp\left(-(1+o(1))\left(n\left(\sqrt{a}-\sqrt{b}\right)^2 + \frac{\delta\log n}{2} + \log 2\right)\right) \ge \exp\left(-n\left(\sqrt{a}-\sqrt{b}\right)^2 - \delta\log n\right),
$$

where the last inequality holds for all sufficiently large $n$, in view of $n(\sqrt{a}-\sqrt{b})^2 \le 2\log n$.

A Hanson-Wright Inequality

The following lemma, stated in [WXY20, Lemma 12], follows from the Hanson-Wright inequality [HW71, RV13].

Lemma 6 (Hanson-Wright inequality). Let $U, V \in \mathbb{R}^n$ be standard Gaussian vectors such that the pairs $(U_k, V_k) \sim N\left(\begin{pmatrix}0\\0\end{pmatrix}, \begin{pmatrix}1 & \rho\\ \rho & 1\end{pmatrix}\right)$ are independent for $k = 1, \ldots, n$. Let $M \in \mathbb{R}^{n\times n}$ be any deterministic matrix. There exists some universal constant $c > 0$ such that with probability at least $1-\delta$,

$$
\left|U^\top M V - \rho\,\mathrm{Tr}(M)\right| \le c\left(\|M\|_F\sqrt{\log(1/\delta)} + \|M\|\log(1/\delta)\right). \tag{51}
$$

B Moment Generating Functions Associated with Edge Orbits

The claim (52) in the following lemma has been proved in [HM20, Lemmas B.2 and B.3]. For completeness, we present a more concise proof, following that of [WXY20, Proposition 1].

Lemma 7.
Fix $\pi$ and $\pi'$, where $\pi$ is the latent permutation under $P$. For any edge orbit $O$ of $\sigma = \pi^{-1}\circ\pi'$ with $|O| = k$ and $t \ge 0$, let $M_k \triangleq E\left[\exp\left(t\sum_{(i,j)\in O} A_{\pi'(i)\pi'(j)} B_{ij}\right)\right]$.

• For Erdős-Rényi random graphs,

$$
M_k = \left(\frac{T-\sqrt{T^2-4D}}{2}\right)^k + \left(\frac{T+\sqrt{T^2-4D}}{2}\right)^k, \tag{52}
$$

where $T = ps^2(e^t-1)+1$ and $D = ps^2(1-p)(e^t-1)$.

• For the Gaussian model, for $t \le \frac{1}{1+\rho}$,

$$
M_k = \left[\left(\frac{\lambda_1+\lambda_2}{2}\right)^k - \left(\frac{\lambda_1-\lambda_2}{2}\right)^k\right]^{-1}, \tag{53}
$$

where $\lambda_1 = \sqrt{(1+\rho t)^2 - t^2}$ and $\lambda_2 = \sqrt{(1-\rho t)^2 - t^2}$.

Moreover, $M_k \le M_2^{k/2}$ for all $k \ge 2$.

Proof. For ease of notation, let $\{(a_i, b_i)\}_{i=1}^k$ be independently and identically distributed pairs of random variables with joint distribution $P$. Let $a_{k+1} = a_1$ and $b_{k+1} = b_1$. Since $O$ is an edge orbit of $\sigma^E$, we have $\{A_{\pi(i)\pi(j)}\}_{(i,j)\in O} = \{A_{\pi'(i)\pi'(j)}\}_{(i,j)\in O}$ and $(\pi'(i), \pi'(j)) = (\pi(\sigma(i)), \pi(\sigma(j)))$. Then, we have that

$$
M_k = E\left[\exp\left(\sum_{i=1}^k t a_{i+1} b_i\right)\right] = E\left[E\left[\exp\left(\sum_{i=1}^k t a_{i+1} b_i\right) \,\Big|\, a_1, a_2, \cdots, a_k\right]\right] = E\left[\prod_{i=1}^k E\left[\exp\left(t a_{i+1} b_i\right) \,\Big|\, a_i, a_{i+1}\right]\right].
$$

• For Erdős-Rényi random graphs, $M_k = \mathrm{tr}(M^k)$, where $M$ is a $2\times 2$ matrix indexed by $\{0,1\}$ and

$$
M(\ell, m) = E\left[\exp(t a_{i+1} b_i) \mid a_i = \ell, a_{i+1} = m\right] P(a_{i+1} = m) = E\left[\exp(t m b_i) \mid a_i = \ell\right] P(a_{i+1} = m).
$$

Explicitly, we have

$$
M = \begin{pmatrix} 1-ps & ps\left(\eta e^t + 1 - \eta\right) \\ 1-ps & ps\left(s e^t + 1 - s\right) \end{pmatrix}, \quad \text{where } \eta = \frac{ps(1-s)}{1-ps}.
$$

The eigenvalues of $M$ are $\frac{T-\sqrt{T^2-4D}}{2}$ and $\frac{T+\sqrt{T^2-4D}}{2}$, where $T \triangleq \mathrm{Tr}(M) = ps^2(e^t-1)+1$ and $D \triangleq \det(M) = ps^2(1-p)(e^t-1)$.

• For the Gaussian model,

$$
M_k = E\left[\prod_{i=1}^k \exp\left(t\rho a_i a_{i+1} + \frac{1}{2}t^2\left(1-\rho^2\right)a_{i+1}^2\right)\right],
$$

where the equality follows from $b_i \sim N(\rho a_i, 1-\rho^2)$ conditional on $a_i$ and $E[\exp(tZ)] = \exp(t\mu + t^2\nu^2/2)$ for $Z \sim N(\mu, \nu^2)$.
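The Erdős-Rényi eigenvalue formula (52) can be sanity-checked numerically by comparing $\mathrm{tr}(M^k)$, computed by direct matrix powering, against $\left(\frac{T-\sqrt{T^2-4D}}{2}\right)^k + \left(\frac{T+\sqrt{T^2-4D}}{2}\right)^k$; the values of $p$, $s$, $t$, $k$ below are arbitrary illustrative choices:

```python
import math

# Check tr(M^k) = lam1^k + lam2^k for the 2x2 transfer matrix M above,
# with T = Tr(M) and D = det(M) as in (52). Illustrative parameters:
p, s, t, k = 0.3, 0.6, 0.5, 7

eta = p * s * (1 - s) / (1 - p * s)
M = [[1 - p * s, p * s * (eta * math.exp(t) + 1 - eta)],
     [1 - p * s, p * s * (s * math.exp(t) + 1 - s)]]

def matmul(A, B):
    return [[sum(A[i][r] * B[r][j] for r in range(2)) for j in range(2)]
            for i in range(2)]

Mk = M
for _ in range(k - 1):
    Mk = matmul(Mk, M)
trace = Mk[0][0] + Mk[1][1]

T = p * s**2 * (math.exp(t) - 1) + 1
D = p * s**2 * (1 - p) * (math.exp(t) - 1)
lam1 = (T - math.sqrt(T**2 - 4 * D)) / 2
lam2 = (T + math.sqrt(T**2 - 4 * D)) / 2
print(abs(trace - (lam1**k + lam2**k)))  # agrees up to floating-point error
```

This only verifies the linear-algebra identity behind (52) (trace of a matrix power equals the power sum of its eigenvalues), not the probabilistic reduction to $\mathrm{tr}(M^k)$, which is proved above.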
Let $\lambda_1 = \sqrt{(1+\rho t)^2 - t^2}$ and $\lambda_2 = \sqrt{(1-\rho t)^2 - t^2}$, which are well defined since $t \le \frac{1}{1+\rho}$. By change of variables,

$$
M_k = \int\cdots\int \prod_{i=1}^k \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}\left(\frac{\lambda_1+\lambda_2}{2}a_i - \frac{\lambda_1-\lambda_2}{2}a_{i+1}\right)^2\right)\mathrm{d}a_1\cdots\mathrm{d}a_k
$$

$$
= \int\cdots\int \prod_{i=1}^k \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}X_i^2\right)\mathrm{d}X_1\cdots\mathrm{d}X_k \,\det\left(J^{-1}\right) = \left[\left(\frac{\lambda_1+\lambda_2}{2}\right)^k - \left(\frac{\lambda_1-\lambda_2}{2}\right)^k\right]^{-1},
$$

where the second equality holds by letting $X_i = \frac{\lambda_1+\lambda_2}{2}a_i - \frac{\lambda_1-\lambda_2}{2}a_{i+1}$ and denoting by $J$ the Jacobian matrix with entries $J_{ij} = \frac{\partial X_i}{\partial a_j}$, namely the circulant matrix

$$
J = \begin{pmatrix}
\frac{\lambda_1+\lambda_2}{2} & -\frac{\lambda_1-\lambda_2}{2} & 0 & \cdots & 0 \\
0 & \frac{\lambda_1+\lambda_2}{2} & -\frac{\lambda_1-\lambda_2}{2} & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
-\frac{\lambda_1-\lambda_2}{2} & 0 & \cdots & 0 & \frac{\lambda_1+\lambda_2}{2}
\end{pmatrix};
$$

the last equality holds because $\det(J^{-1}) = \det(J)^{-1}$, where

$$
\det(J) = \left(\frac{\lambda_1+\lambda_2}{2}\right)^{|O|} - \left(\frac{\lambda_1-\lambda_2}{2}\right)^{|O|}.
$$

Finally, we prove that $M_k \le M_2^{k/2}$ for $k \ge 2$. Indeed, for the Erdős-Rényi graphs, this simply follows from $x^{k/2} + y^{k/2} \le (x+y)^{k/2}$ for $x, y \ge 0$ and $k \ge 2$, applied with $x$ and $y$ being the squared eigenvalues of $M$. Analogously, for the Gaussian model, this follows from $(a+b)^k - (a-b)^k \ge (4ab)^{k/2}$, which holds by rewriting $(a+b)^2 = (a-b)^2 + 4ab$ and letting $x = (a-b)^2$ and $y = 4ab$ in the previous inequality.

Lemma 8. Fix $\pi$ and $\pi'$, where $\pi$ is the latent permutation under $P$. For any edge orbit $O$ of $\sigma = \pi^{-1}\circ\pi'$ with $|O| = k$ and $t > 0$, let

$$
M_k \triangleq E\left[\exp\left(t\left(\sum_{(i,j)\in O} A_{\pi'(i)\pi'(j)} B_{ij} - \sum_{(i,j)\in O} A_{\pi(i)\pi(j)} B_{ij}\right)\right)\right].
$$

• For the general Erdős-Rényi random graph model,

$$
M_k = \left(\frac{T-\sqrt{T^2-4D}}{2}\right)^k + \left(\frac{T+\sqrt{T^2-4D}}{2}\right)^k, \tag{54}
$$

where $T = 1$ and $D = -\left(p_{01}p_{10}\left(e^t-1\right) + p_{00}p_{11}\left(e^{-t}-1\right)\right)$.

• For the Gaussian model, for $t \le \frac{1}{2(1-\rho)}$,

$$
M_k = \left[\left(\frac{\lambda_1+\lambda_2}{2}\right)^k - \left(\frac{\lambda_1-\lambda_2}{2}\right)^k\right]^{-1}, \tag{55}
$$

where $\lambda_1 = 1$ and $\lambda_2 = \sqrt{1 + 4t\rho - 4t^2(1-\rho^2)}$.

Moreover, $M_k \le M_2^{k/2}$ for all $k \ge 2$.

Proof. For ease of notation, let $\{(a_i, b_i)\}_{i=1}^k$ be independently and identically distributed pairs of random variables with joint distribution $P$. Let $a_{k+1} = a_1$ and $b_{k+1} = b_1$.
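The Gaussian closed form (53) of Lemma 7 can also be verified exactly for small $k$: after conditioning on the $a_i$'s, $M_k$ is the MGF of a Gaussian quadratic form, which equals $\det(I-2C)^{-1/2}$ for the coefficient matrix $C$ of that form. A numerical sketch (illustrative $\rho$, $t$, $k$ satisfying $t \le \frac{1}{1+\rho}$, assuming NumPy is available):

```python
import numpy as np

# M_k = E exp( t*rho*sum_i a_i a_{i+1} + (t^2/2)(1-rho^2)*sum_i a_i^2 ) with
# a ~ N(0, I_k) and cyclic indexing; by the Gaussian quadratic-form MGF this
# equals det(I - 2C)^{-1/2}, which we compare with the closed form (53).
rho, t, k = 0.7, 0.4, 6

P = np.roll(np.eye(k), 1, axis=1)                  # cyclic shift matrix
C = 0.5 * t**2 * (1 - rho**2) * np.eye(k) + 0.5 * t * rho * (P + P.T)
mgf = np.linalg.det(np.eye(k) - 2 * C) ** (-0.5)   # quadratic-form MGF

lam1 = np.sqrt((1 + rho * t) ** 2 - t ** 2)
lam2 = np.sqrt((1 - rho * t) ** 2 - t ** 2)
closed = 1 / (((lam1 + lam2) / 2) ** k - ((lam1 - lam2) / 2) ** k)
print(abs(mgf - closed))  # agrees up to floating-point error
```

The determinant route is an independent cross-check of the change-of-variables computation above, since both reduce the same Gaussian integral in different ways.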
Since $O$ is an edge orbit of $\sigma^E$, we have $\{A_{\pi(i)\pi(j)}\}_{(i,j)\in O} = \{A_{\pi'(i)\pi'(j)}\}_{(i,j)\in O}$ and $(\pi'(i), \pi'(j)) = (\pi(\sigma(i)), \pi(\sigma(j)))$. Then, we have that

$$
M_k = E\left[\exp\left(t\sum_{(i,j)\in O}\left(A_{\pi'(i)\pi'(j)} - A_{\pi(i)\pi(j)}\right)B_{ij}\right)\right] = E\left[E\left[\exp\left(\sum_{i=1}^k t(a_{i+1}-a_i)b_i\right) \,\Big|\, a_1, a_2, \cdots, a_k\right]\right] = E\left[\prod_{i=1}^k E\left[\exp\left(t(a_{i+1}-a_i)b_i\right) \,\Big|\, a_i, a_{i+1}\right]\right].
$$

• For Erdős-Rényi random graphs, $M_k = \mathrm{tr}(M^k)$, where $M$ is a $2\times 2$ matrix indexed by $\{0,1\}$ and

$$
M(\ell, m) = E\left[\exp\left(t(a_{i+1}-a_i)b_i\right) \mid a_i = \ell, a_{i+1} = m\right] P(a_{i+1} = m) = E\left[\exp\left(t(m-\ell)b_i\right) \mid a_i = \ell\right] P(a_{i+1} = m).
$$

Explicitly, writing $p_0 \triangleq p_{00}+p_{01}$ and $p_1 \triangleq p_{10}+p_{11}$, we have

$$
M = \begin{pmatrix}
p_0 & \frac{p_{00} + p_{01}e^{t}}{p_0}\, p_1 \\[4pt]
\frac{p_{10} + p_{11}e^{-t}}{p_1}\, p_0 & p_1
\end{pmatrix}.
$$

The eigenvalues of $M$ are $\frac{T-\sqrt{T^2-4D}}{2}$ and $\frac{T+\sqrt{T^2-4D}}{2}$, where $T = \mathrm{Tr}(M) = 1$ and $D = \det(M) = -\left(p_{01}p_{10}\left(e^t-1\right) + p_{00}p_{11}\left(e^{-t}-1\right)\right)$.

• For the Gaussian model,

$$
M_k = E\left[\prod_{i=1}^k \exp\left(t(a_{i+1}-a_i)\rho a_i + t^2(a_{i+1}-a_i)^2\left(1-\rho^2\right)/2\right)\right] = E\left[\exp\left(\left(t^2\left(1-\rho^2\right) - t\rho\right)\sum_{i=1}^k\left(a_i^2 - a_i a_{i+1}\right)\right)\right] = \left[\left(\frac{\lambda_1+\lambda_2}{2}\right)^k - \left(\frac{\lambda_1-\lambda_2}{2}\right)^k\right]^{-1},
$$

where the first equality follows from $b_i \sim N(\rho a_i, 1-\rho^2)$ conditional on $a_i$ and $E[\exp(tZ)] = \exp(t\mu + t^2\nu^2/2)$ for $Z \sim N(\mu, \nu^2)$; the last equality holds by a change of variables and a Gaussian integral analogous to the proof of Lemma 7.

Finally, $M_k \le M_2^{k/2}$ for $k \ge 2$ follows by the same argument as in the proof of Lemma 7.

References

[BCL+19] Boaz Barak, Chi-Ning Chou, Zhixian Lei, Tselil Schramm, and Yueqi Sheng. (Nearly) efficient algorithms for the graph matching problem on correlated random graphs. In Advances in Neural Information Processing Systems, pages 9186–9194, 2019.

[BCPP98] Rainer E. Burkard, Eranda Cela, Panos M. Pardalos, and Leonidas S. Pitsoulis. The quadratic assignment problem.
In Handbook of Combinatorial Optimization, pages 1713–1809. Springer, 1998.

[CK82] Imre Csiszár and János Körner. Information Theory: Coding Theorems for Discrete Memoryless Systems. Academic Press, Inc., 1982.

[CK16] Daniel Cullina and Negar Kiyavash. Improved achievability and converse bounds for Erdős-Rényi graph matching. ACM SIGMETRICS Performance Evaluation Review, 44(1):63–72, 2016.

[CK17] Daniel Cullina and Negar Kiyavash. Exact alignment recovery for correlated Erdős-Rényi graphs. arXiv preprint arXiv:1711.06783, 2017.

[CKMP19] Daniel Cullina, Negar Kiyavash, Prateek Mittal, and H. Vincent Poor. Partial recovery of Erdős-Rényi graph alignment via k-core alignment. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 3(3):1–21, 2019.

[DAM17] Yash Deshpande, Emmanuel Abbe, and Andrea Montanari. Asymptotic mutual information for the balanced binary stochastic block model. Information and Inference: A Journal of the IMA, 6(2):125–170, 2017.

[DMWX20] Jian Ding, Zongming Ma, Yihong Wu, and Jiaming Xu. Efficient random graph matching via degree profiles. Probability Theory and Related Fields, pages 1–87, Sep 2020.

[FMWX19a] Zhou Fan, Cheng Mao, Yihong Wu, and Jiaming Xu. Spectral graph matching and regularized quadratic relaxations I: The Gaussian model. arXiv preprint arXiv:1907.08880, 2019.

[FMWX19b] Zhou Fan, Cheng Mao, Yihong Wu, and Jiaming Xu. Spectral graph matching and regularized quadratic relaxations II: Erdős-Rényi graphs and universality. arXiv preprint arXiv:1907.08883, 2019.

[FQRM+16] Soheil Feizi, Gerald Quon, Mariana Recamonde-Mendoza, Muriel Médard, Manolis Kellis, and Ali Jadbabaie. Spectral alignment of networks. arXiv preprint arXiv:1602.04181, 2016.

[Gan20] Luca Ganassali. Sharp threshold for alignment of graph databases with Gaussian weights. arXiv preprint arXiv:2010.16295, 2020.

[GM20] Luca Ganassali and Laurent Massoulié. From tree matching to sparse graph alignment.
In Conference on Learning Theory, pages 1633–1665. PMLR, 2020.

[GSV05] Dongning Guo, Shlomo Shamai, and Sergio Verdú. Mutual information and minimum mean-square error in Gaussian channels. IEEE Trans. on Information Theory, 51, 2005.

[HM20] Georgina Hall and Laurent Massoulié. Partial recovery in the graph alignment problem. arXiv preprint arXiv:2007.00533, 2020.

[HW71] David Lee Hanson and Farroll Tim Wright. A bound on tail probabilities for quadratic forms in independent random variables. The Annals of Mathematical Statistics, 42(3):1079–1083, 1971.

[HWX17] Bruce Hajek, Yihong Wu, and Jiaming Xu. Information limits for recovering a hidden community. IEEE Trans. on Information Theory, 63(8):4729–4745, 2017.

[LFF+16] Vince Lyzinski, Donniell Fishkind, Marcelo Fiori, Joshua Vogelstein, Carey Priebe, and Guillermo Sapiro. Graph matching: Relax at your own risk. IEEE Transactions on Pattern Analysis & Machine Intelligence, 38(1):60–73, 2016.

[MMRU09] Cyril Méasson, Andrea Montanari, Thomas J. Richardson, and Rüdiger Urbanke. The generalized area theorem and some of its consequences. IEEE Transactions on Information Theory, 55(11):4793–4821, 2009.

[PG11] Pedram Pedarsani and Matthias Grossglauser. On the privacy of anonymized networks. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1235–1243, 2011.

[PW94] Panos M. Pardalos and Henry Wolkowicz. Quadratic Assignment and Related Problems: DIMACS Workshop, May 20-21, 1993, volume 16. American Mathematical Soc., 1994.

[RV13] Mark Rudelson and Roman Vershynin. Hanson-Wright inequality and sub-Gaussian concentration. Electronic Communications in Probability, 18, 2013.

[RXZ19] Galen Reeves, Jiaming Xu, and Ilias Zadik. The all-or-nothing phenomenon in sparse linear regression. In Conference on Learning Theory, pages 2652–2663. PMLR, 2019.

[WXY20] Yihong Wu, Jiaming Xu, and Sophie H. Yu. Testing correlation of unlabeled random graphs.
arXiv preprint arXiv:2008.10097, 2020.