Chi-squared Amplification: Identifying Hidden Hubs
Ravi Kannan ∗ Santosh Vempala † August 29, 2018
Abstract
We consider the following general hidden hubs model: an $n \times n$ random matrix $A$ with a subset $S$ of $k$ special rows (hubs); entries in rows outside $S$ are generated from the (Gaussian) probability distribution $p_0 \sim N(0,\sigma_0^2)$; for each row in $S$, some $k$ of its entries are generated from $p_1 \sim N(0,\sigma_1^2)$, $\sigma_1 > \sigma_0$, and the rest of the entries from $p_0$. The special rows with higher variance entries can be viewed as hidden higher-degree hubs. The problem we address is to identify them efficiently. This model includes and significantly generalizes the planted Gaussian Submatrix Model, where the special entries are all in a $k \times k$ submatrix. There are two well-known barriers: if $k \ge c\sqrt{n\ln n}$, just the row sums are sufficient to find $S$ in the general model. For the submatrix problem, this can be improved by a $\sqrt{\ln n}$ factor to $k \ge c\sqrt{n}$ by spectral methods or combinatorial methods. In the variant with $p_0 = \pm 1$ (with equal probability) and $p_1 \equiv 1$, neither barrier has been broken (in spite of much effort, particularly for the submatrix version, which is called the Planted Clique problem).

Here, we break both these barriers for the general model with Gaussian entries. We give a polynomial-time algorithm to identify all the hidden hubs with high probability for $k \ge n^{0.5-\delta}$ for some $\delta > 0$, when $\sigma_1^2 > 2\sigma_0^2$. The algorithm extends easily to the setting where planted entries might have different variances, each at least as large as $\sigma_1^2$. We also show a nearly matching lower bound: for $\sigma_1^2 \le 2\sigma_0^2$, there is no polynomial-time Statistical Query algorithm for distinguishing between a matrix whose entries are all from $N(0,\sigma_0^2)$ and a matrix with $k = n^{0.5-\delta}$ hidden hubs for any $\delta > 0$. The lower bound as well as the algorithm are related to whether the chi-squared distance of the two distributions diverges. At the critical value $\sigma_1^2 = 2\sigma_0^2$, we show that the general hidden hubs problem can be solved for $k \ge c\sqrt{n}(\ln n)^{1/4}$, improving on the naive row-sum-based method.

∗ Microsoft Research India. Email: [email protected]
† Georgia Tech. Email: [email protected]
Introduction
Identifying hidden structure in random graphs and matrices is a fundamental topic in unsupervised machine learning, with many application areas and deep connections to probability, information theory, linear algebra, statistical physics and other disciplines. A prototypical example is finding a large hidden clique in a random graph, where the best known algorithms can find a clique of size $k = \Omega(\sqrt{n})$ planted in $G_{n,1/2}$, and smaller planted cliques are impossible to find by statistical algorithms [FGR+13] or using powerful convex programming hierarchies [BHK+16]. A closely related problem is the planted Gaussian submatrix problem: all entries of an $n \times n$ matrix are drawn from $N(0,\sigma^2)$, except for entries from a $k \times k$ submatrix, which are drawn from $N(\mu,\sigma^2)$. Algorithms for both are based on spectral or combinatorial methods. Information-theoretically, even a planting of size $O(\log n)$ can be found in time $n^{O(\log n)}$ by enumerating subsets of size $O(\log n)$. This raises the question of the threshold for efficient algorithms. Since the planted part has different variance, it is natural to try to detect the planting using either the sums of the rows (degrees in the case of graphs) or the spectrum of the matrix. However, these approaches can only detect the planting at rather large separations (when $\mu = \omega(\sigma)$ for example) or for $k = \Omega(\sqrt{n})$ [Bop87, Kuc95, AKS98, FR10, DGGP11, BCC+10, MRZ15, DM15b]. Roughly speaking, the relatively few entries of the planted part must be large enough to dominate the variance of the many entries of the rest of the matrix. A precise threshold for a rank-one perturbation to a random matrix to be noticeable was given by Féral and Péché [FP07] and applied in a lower bound by Montanari et al. on using the spectrum to detect a planting [MRZ15]. Tensor optimization (or higher moment optimization) rather than eigen/singular vectors can find smaller cliques [FK08, BV09], but the technique has not yielded a polynomial-time algorithm to date. A different approach to planted clique and planted Gaussian submatrix problems is to use convex programming relaxations, which also seem unable to go below $\sqrt{n}$. Many recent papers demonstrate the limitations of these approaches [FK00, FGR+13, MPW15, HKP+16, BHK+16, FGV17] (see also [Jer92]).
Model.
In this paper, we consider a more general model of hidden structure: the presence of a small number of hidden hubs. These hubs might represent more influential or atypical nodes of a network. Recovering such nodes is of interest in many areas (information networks, protein interaction networks, cortical networks etc.). In this model, as before, the entries of the matrix are drawn from $N(0,\sigma_0^2)$ except for special entries that all lie in $k$ rows, with $k$ entries from each of these $k$ rows. This is a substantial generalization of the above hidden submatrix problems, as the only structure is the existence of $k$ higher "degree" rows (hubs) rather than a large submatrix. (Our results also extend to unequal variances for the special entries and varying numbers of them for each hub.)

More precisely, we are given an $N \times n$ random matrix $A$ with independent entries. There is some unknown subset $S$ of special rows, with $|S| = s$. Each row in $S$ has $k$ special entries, each picked according to $p_1(x) \sim N(0,\sigma_1^2)$, whereas all the other $Nn - k|S|$ entries are distributed according to $p_0 \sim N(0,\sigma_0^2)$. The task is to find $S$, given $s = |S|$, $k$, $n$, $\sigma_0$, $\sigma_1$. One may also think of the $S$ rows as picking $n$ i.i.d. samples from the mixture
$$\frac{k}{n}\,p_1(x) + \left(1 - \frac{k}{n}\right)p_0(x),$$
whereas the non-$S$ rows are picking i.i.d. samples from $p_0(x)$. This makes it clear that we cannot assume that the planted entries in the $S$ rows are all in the same columns.

If $\sigma_1 = \sigma_0$, obviously, we cannot find $S$. If $\sigma_1^2 > \sigma_0^2(1+c)$ for a positive constant $c$ (independent of $n, k$), then it is easy to see that $k \ge \Omega(\sqrt{n\ln n})$ suffices to have a polynomial-time algorithm to find $S$: set $B_{ij} = A_{ij}^2 - \sigma_0^2$.
Let $\sum_j B_{ij} = \rho_i$. It is not difficult to show that if $k \ge c\sqrt{n}\sqrt{\ln n}$, then, whp,
$$\min_{i\,:\,\text{hub}} \rho_i \;>\; \max_{i\,:\,\text{non-hub}} \rho_i.$$
The above algorithm is just the analog of the "degree algorithm" for hidden (Gaussian) clique (take the $k$ vertices with the highest degrees) and works with high probability for $k \ge c\sqrt{n\ln n}$. The remaining literature on upper bounds removes the $\sqrt{\ln n}$ factor, by using either a spectral approach (SVD) or a combinatorial approach (iteratively remove the minimum degree vertex). For the general hub model, however, this improvement is not possible: the algorithms (both spectral and combinatorial) rely on the special entries being in a submatrix. This leads to our first question:

Q. Are there efficient algorithms for finding hidden hubs for $k = o(\sqrt{n\ln n})$?
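To make this baseline concrete, here is a minimal sketch of the row-sum detector just described, in Python with NumPy. The function name, the toy parameters, and the planting loop are ours for illustration, not the paper's.

```python
import numpy as np

def rowsum_detector(A, s, sigma0):
    """Row-sum ("degree") baseline: score each row by
    rho_i = sum_j (A_ij^2 - sigma0^2) and return the s highest-scoring rows.
    This needs k >= c*sqrt(n ln n) to succeed with high probability."""
    scores = (A ** 2 - sigma0 ** 2).sum(axis=1)
    return np.argsort(scores)[-s:]

# Toy instance (all constants illustrative): k above the sqrt(n ln n) barrier.
rng = np.random.default_rng(0)
n, s, k, sigma0, sigma1 = 2000, 10, 300, 1.0, 1.6
A = rng.normal(0.0, sigma0, size=(n, n))
for i in range(s):                               # rows 0..s-1 are the hidden hubs
    cols = rng.choice(n, size=k, replace=False)
    A[i, cols] = rng.normal(0.0, sigma1, size=k)
print(sorted(rowsum_detector(A, s, sigma0)))     # typically [0, 1, ..., 9]
```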
Main results. Our main results can be summarized as follows. (For this statement, assume $\varepsilon, \delta$ are positive constants. In detailed statements later in the paper, they are allowed to depend on $n$.)

Theorem 1.1
For the hidden hubs model with $k$ hubs:

1. For $\sigma_1^2 = 2(1+\varepsilon)\sigma_0^2$, there is an efficient algorithm for $k \ge n^{0.5-\delta}$ for some $\delta > 0$, depending only on $\varepsilon$.

2. For $\sigma_1^2 \in [c\sigma_0^2, 2\sigma_0^2]$, any $c > 0$, no polynomial Statistical Query algorithm can detect hidden hubs for $k = n^{0.5-\delta}$, for any $\delta > 0$.

3. At the critical value $\sigma_1^2 = 2\sigma_0^2$, with $N = n$, $k \ge c\sqrt{n}(\ln n)^{1/4}$ suffices.

Our algorithm also gives improvements for the special case of identifying hidden Gaussian cliques. For that problem, the closest upper bound in the literature is the algorithm of [BCC+10], which takes time $n^{O(1/(\varepsilon-\delta))}$ for $\sigma_1^2 = 2(1+\varepsilon)\sigma_0^2$ and $k = n^{0.5-\delta}$, so $\varepsilon$ must be $\Omega(1)$ for it to be polynomial-time. Moreover, as with all previous algorithms, it does not extend to the hidden hubs model and needs the special (higher variance) entries to span a $k \times k$ submatrix. In contrast, our simple algorithms run in time linear in the number of entries of the matrix for $\varepsilon = \Omega(1/\log n)$.

Our upper bound can be extended to an even more general model, where each planted entry could have its own distribution $p_{ij} \sim N(0,\sigma_{ij}^2)$ with bounded $\sigma_{ij}$. There is a set of rows $S$ that are hubs, with $|S| = k$. For each $i \in S$, we now assume there is some subset $T_i$ of higher variance entries. The $|T_i|$ are not given and need not be equal. We assume that the special entries satisfy $\sigma_{ij}^2 \ge \sigma_1^2$, where $\sigma_1^2 = 2(1+\varepsilon)\sigma_0^2$, $\varepsilon > 0$.

Theorem 1.2
Let $\tau_i = \sum_{j \in T_i} n^{-\sigma_0^2/\sigma_{ij}^2}$. Suppose, for all $i \in S$,
$$\tau_i \ge \frac{c}{\sqrt{\varepsilon}}\,(\ln N)(\ln n)^{1.5};$$
then there is a randomized algorithm to identify all of $S$ with high probability.

As a corollary, we get that if $|T_i| = k$ for all $i \in S$, all special entries satisfy $\sigma_{ij} = \sigma_1$, and $k = n^{0.5-\delta}$, with
$$\varepsilon \ge \frac{2\delta}{1-2\delta} + c\left(\frac{\ln\ln N + \ln\ln n}{\ln n}\right),$$
then we can identify all of $S$. We also have a result for values of $\varepsilon \in \Omega(1/\ln n)$; see Theorem 3.1.

Techniques.
Our algorithm is based on a new technique to amplify the higher variance entries, which we illustrate next. Let
$$p_0(x) = \frac{1}{\sqrt{2\pi}\,\sigma_0}\exp\left(-\frac{x^2}{2\sigma_0^2}\right), \qquad p_1(x) = \frac{1}{\sqrt{2\pi}\,\sigma_1}\exp\left(-\frac{x^2}{2\sigma_1^2}\right)$$
be the two probability densities. The central (intuitive) idea behind our algorithm is to construct another matrix $\hat A$ of "likelihood ratios", defined as
$$\hat A_{ij} = \frac{p_1(A_{ij})}{p_0(A_{ij})} - 1.$$
Such a transformation was also described in the context of the planted clique problem [DM15a] (although it does not give an improvement for that problem). At a high level, one computes the row sums of $\hat A$ and shows that the row sums of the $k$ rows of the planted part are all higher than all the row sums of the non-planted part. First, note that
$$E_{p_0}(\hat A_{ij}) = \int p_1 - \int p_0 = 0; \qquad \operatorname{Var}_{p_0}(\hat A_{ij}) = \int\left(\frac{p_1}{p_0}-1\right)^2 p_0 = \int\frac{p_1^2}{p_0} - 1 = \chi^2(p_1\,\|\,p_0),$$
the chi-squared distance between the two distributions $p_1, p_0$. Also,
$$E_{p_1}\left(\frac{p_1}{p_0}-1\right) = \chi^2(p_1\,\|\,p_0).$$
Intuitively, since the expected sum of row $i$, for any $i \notin S$, is $0$, we expect success if the expected row sum in each row of $S$ exceeds the standard deviation of the row sum in any row not in $S$ times a log factor, namely, if
$$\sqrt{\chi^2(p_1\,\|\,p_0)} \;\ge\; \Omega^*\!\left(\frac{\sqrt{n}}{k}\right) \;=\; \Omega^*(n^{\delta}). \qquad (1)$$
Now,
$$\chi^2(p_1\,\|\,p_0) = \int\frac{p_1^2}{p_0} - 1 = \frac{c\,\sigma_0}{\sigma_1^2}\int\exp\left(x^2\left(\frac{1}{2\sigma_0^2}-\frac{1}{\sigma_1^2}\right)\right)dx - 1.$$
So, if $\sigma_1^2 \ge 2\sigma_0^2$, then clearly $\chi^2(p_1\,\|\,p_0)$ is infinite, and so, intuitively, (1) can be made to hold. This is not a proof. Indeed, substantial technical work is needed to make this succeed. The starting point is to truncate entries, so that the integrals are finite. We also have to compute higher moments to ensure enough concentration to translate these intuitive statements into rigorous ones.

On the other hand, if $\sigma_1^2 < 2\sigma_0^2$, then $\chi^2(p_1\,\|\,p_0)$ is finite, and indeed bounded by a constant independent of $k$ and $\sqrt{n}$, so (1) does not hold. This shows that this line of approach will not yield an algorithm. Our lower bounds show that there is no polynomial-time Statistical Query algorithm at all when $\sigma_1^2 \in (0, 2\sigma_0^2]$.
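As a quick numeric illustration of this dichotomy: evaluating the Gaussian integral above for $\mu = 0$ gives the closed form $\chi^2(p_1\|p_0) = \sigma_0^2/(\sigma_1\sqrt{2\sigma_0^2-\sigma_1^2}) - 1$ when $\sigma_1^2 < 2\sigma_0^2$. The snippet below (ours, assuming this closed form) prints how the divergence blows up at the threshold.

```python
import numpy as np

def chi2_centered_gaussians(sigma0, sigma1):
    """chi^2(p1 || p0) = int p1^2/p0 - 1 for p_i = N(0, sigma_i^2);
    finite iff sigma1^2 < 2*sigma0^2, with the closed form derived above."""
    if sigma1 ** 2 >= 2 * sigma0 ** 2:
        return np.inf
    return sigma0 ** 2 / (sigma1 * np.sqrt(2 * sigma0 ** 2 - sigma1 ** 2)) - 1.0

for s1 in [1.2, 1.4, 1.41, 1.414, np.sqrt(2)]:
    print(round(s1, 4), chi2_centered_gaussians(1.0, s1))  # blows up at sqrt(2)
```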
The algorithms are based on the following transformation to the input matrix: truncate each entry of the matrix, i.e., replace $A_{ij}^2$ by $\min\{M^2, A_{ij}^2\}$; then apply $p_1(\cdot)/p_0(\cdot)$; then take row sums. The analysis needs a nonstandard concentration inequality via a careful estimation of higher moments; standard concentration inequalities like the Hoeffding inequality are not sufficient to deal with the fact that the absolute bound on $p_1/p_0$ is too large.

Our algorithms also apply directly to the following distributional version of the hidden hubs problem, with essentially the same separation guarantees. A hidden hubs distribution is a distribution over vectors $x \in \mathbb{R}^n$ defined by a subset $S \subseteq [n]$ and parameters $\mu, \sigma_0, \sigma_1$ as follows: $x_i \sim N(0,\sigma_0^2)$ for $i \notin S$, and for $i \in S$,
$$x_i \sim \begin{cases} N(\mu,\sigma_1^2) & \text{with probability } k/n, \\ N(0,\sigma_0^2) & \text{with probability } 1 - k/n. \end{cases}$$
The problem is to identify $S$.

For almost all known distributional problems, the best-known algorithms are statistical or can be made statistical, i.e., they only need to compute expectations of functions on random samples rather than requiring direct access to the samples. (The only known exception where a nonstatistical algorithm solves a distributional problem efficiently is learning parities with no noise using Gaussian elimination.) This characterization of algorithms, introduced by Kearns [Kea93, Kea98], has been insightful in part because it is possible to prove lower bounds on the complexity of statistical query algorithms. For example, Feldman et al. [FGR+13] have shown that the bipartite planted clique problem cannot be solved efficiently by such algorithms when the clique size is $k \le n^{0.5-\delta}$ for any $\delta > 0$. A statistical query algorithm can query the input distribution via a statistical oracle. Three natural oracles are STAT, VSTAT and 1-STAT. Roughly speaking, STAT($\tau$) returns the expectation of any bounded function on a random sample to within additive tolerance $\tau$; VSTAT($t$) returns the expectation of a $0/1$ function to within the standard deviation of its average over $t$ random samples; and 1-STAT simply returns the value of a $0/1$ function on a single random sample.

Our lower bounds apply below the $\sqrt{n}$ threshold on the number of hubs (size of clique for the special case of hidden Gaussian clique). Under the conditions of the algorithmic bounds, for $\sigma_1^2 \ge 2(1+\varepsilon)\sigma_0^2$, there is a $\delta > 0$ such that hidden hubs can be detected with a single query to VSTAT($O(n/k)$), i.e., using $O(n/k)$ independent samples. We complement the algorithmic results with a lower bound on the separation between parameters that is necessary for statistical query algorithms to be efficient (Theorem 5.1). Our application of statistical query lower bounds is based on the $\chi$-squared divergence of the planted Gaussian and the base Gaussian.

The model and results raise several interesting open questions, including: (1) Can the upper bounds be extended to more general distributions on the entries, assuming independent entries? (2) Does the $\chi$-squared divergence condition suffice for general distributions? (3) Can we recover $k = O(\sqrt{n})$ hidden hubs when $\sigma_1^2 = 2\sigma_0^2$? (Our current upper bound is $k = \sqrt{n}(\ln n)^{1/4}$ and our lower bounds do not apply above $\sqrt{n}$.) (4) Are there reductions between planted clique problems with $\pm 1$ entries and with Gaussian entries?

Summary of algorithms.
Our basic algorithm for all cases is the same. Define a threshold $M$ (which is $\sigma_1\sqrt{\ln n}(1+o(1))$); the exact value of $M$ differs from case to case. Define the matrix $B$ by $B_{ij} = \exp(\gamma \min(A_{ij}^2, M^2))$, where always
$$\gamma = \frac{1}{2\sigma_0^2} - \frac{1}{2\sigma_1^2}.$$
Then, we prove that with high probability, the maximum $|S|$ row sums of $B$ occur precisely in the $S$ rows. However, the bounds are delicate, and so we present the proofs in each case separately (taking advantage of the no page limit rule).

The critical value: $\sigma_1^2 = 2\sigma_0^2$

In this section, we assume $\sigma_1^2 = 2\sigma_0^2$ and $N = n$. Then $\frac{p_1}{p_0} = c\,e^{\gamma x^2}$, where
$$\gamma = \frac{1}{2\sigma_0^2} - \frac{1}{2\sigma_1^2} = \frac{1}{4\sigma_0^2}. \qquad (2)$$
Define $L, M$ by
$$L = \sqrt{\ln n - c_0\ln\ln n} \ \ \text{(for a suitable constant $c_0$)}; \qquad M = L\sigma_0. \qquad (3)$$
$$B_{ij} = \exp\left(\gamma\min(M^2, A_{ij}^2)\right). \qquad (4)$$

Theorem 2.1  If $k \ge c\sqrt{n}(\ln n)^{1/4}$, then with high probability, the top $s$ row sums of the matrix $B$ occur precisely in the $S$ rows.
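A minimal sketch of this procedure in Python/NumPy follows. It is ours, not the paper's code; the constant c0 in the truncation level is illustrative, and the guard for small n is a practical tweak only.

```python
import numpy as np

def amplified_top_rows(A, s, sigma0, sigma1, c0=2.0):
    """Sketch of the algorithm of Theorem 2.1: form
    B_ij = exp(gamma * min(M^2, A_ij^2)) as in (4) and report the s rows
    with the largest row sums. gamma = 1/(2 sigma0^2) - 1/(2 sigma1^2),
    which equals 1/(4 sigma0^2) at the critical value, and the truncation
    level uses L^2 = ln n - c0 * ln ln n with an illustrative c0."""
    n = A.shape[1]
    gamma = 1.0 / (2 * sigma0 ** 2) - 1.0 / (2 * sigma1 ** 2)
    L2 = np.log(n) - c0 * np.log(np.log(n))
    M = sigma0 * np.sqrt(max(L2, 1.0))          # M = L * sigma0, guarded for tiny n
    B = np.exp(gamma * np.minimum(M ** 2, A ** 2))
    return np.argsort(B.sum(axis=1))[-s:]
```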
Proposition 2.2  Suppose $X$ is a non-negative real-valued random variable and $l$ is a positive integer. Then
$$E\left(|X - E(X)|^l\right) \le 2E(X^l).$$

Proof.
$$E\left(|X-E(X)|^l\right) \le \int_{x=0}^{E(X)} (EX)^l\,\Pr(X=x)\,dx + \int_{x=E(X)}^{\infty} x^l\,\Pr(X=x)\,dx \le (EX)^l + E(X^l) \le 2E(X^l),$$
the last since $E(X) \le (E(X^l))^{1/l}$. □

Let
$$\mu_0 = E_{p_0}(B_{ij}) = \frac{1}{\sqrt{2\pi}\,\sigma_0}\int_{-\infty}^{\infty} \exp\left(\gamma\min(M^2,x^2)\right)\exp(-x^2/2\sigma_0^2)\,dx. \qquad (5)$$
$$\mu_0 \le \frac{1}{\sqrt{2\pi}\,\sigma_0}\int_{-\infty}^{\infty} \exp(\gamma x^2)\exp(-x^2/2\sigma_0^2)\,dx = \frac{1}{\sqrt{2\pi}\,\sigma_0}\int_{-\infty}^{\infty} \exp(-x^2/4\sigma_0^2)\,dx = \sqrt{2}. \qquad (6)$$
Using Proposition 2.2,
$$E_{p_0}((B_{ij}-\mu_0)^2) \le 2E_{p_0}(B_{ij}^2) \le \frac{c}{\sigma_0}\int_0^M \exp\left(x^2\left(2\gamma - \frac{1}{2\sigma_0^2}\right)\right)dx + \frac{c\sigma_0}{M}\exp\left(M^2\left(2\gamma - \frac{1}{2\sigma_0^2}\right)\right) \le cL, \qquad (7)$$
since $2\gamma - 1/(2\sigma_0^2) = 0$ here. For even $l \ge 4$, we have $\gamma l - \frac{1}{2\sigma_0^2} > 0$ and
$$E_{p_0}((B_{ij}-\mu_0)^l) \le 2E_{p_0}(B_{ij}^l) \le \frac{c}{\sigma_0}\int_0^M \exp\left(x^2\left(\gamma l - \frac{1}{2\sigma_0^2}\right)\right)dx + \frac{c\sigma_0}{M}\exp\left(M^2\left(\gamma l - \frac{1}{2\sigma_0^2}\right)\right) \le cL\exp\left(\frac{L^2(l-2)}{4}\right). \qquad (8)$$

We will use a concentration result from [Kan09] (Theorem 2), which specialized to our case states:

Theorem 2.3  If $X_1, X_2, \ldots, X_n$ are i.i.d. mean-zero random variables, then for any even positive integer $m$, we have
$$E\left(\sum_{j=1}^n X_j\right)^m \le (cm)^m\left(\sum_{l=1}^{m/2}\frac{1}{l^2}\left(\frac{nE(X_1^l)}{m}\right)^{2/l}\right)^{m/2}.$$

With $X_j = B_{ij} - \mu_0$ in Theorem 2.3, we plug in the bounds (7) and (8) to get:

Lemma 2.4  For all even $m \le c\ln n$,
$$E_{p_0}\left(\sum_{j=1}^n(B_{ij}-\mu_0)\right)^m \le (cmnL)^{m/2}.$$

Proof. For all even $m$,
$$E_{p_0}\left(\sum_{j=1}^n(B_{ij}-\mu_0)\right)^m \le (cm)^m\left(\frac{nL}{m} + e^{L^2/2}\sum_{l\ge 4}\frac{1}{l^2}\left(\frac{nL\,e^{-L^2/2}}{m}\right)^{2/l}\right)^{m/2}.$$
Now, it is easy to check that for $m \le c\ln n$,
$$\frac{cnL}{m} \ge e^{L^2/2}\left(\frac{nL\,e^{-L^2/2}}{m}\right)^{2/l} \quad \forall\, l \ge 2.$$
Hence the Lemma follows, noting that $\sum_l(1/l^2) \le c$. □

Lemma 2.5
Let $t = c\sqrt{n}(\ln n)^{3/4}$, for $c$ a suitable constant. For $i \notin S$,
$$\Pr\left(\Big|\sum_{j=1}^n(B_{ij}-\mu_0)\Big| \ge t\right) \le \frac{1}{n^4}.$$
Thus, we have
$$\Pr\left(\exists\, i \notin S:\ \Big|\sum_{j=1}^n(B_{ij}-\mu_0)\Big| \ge t\right) \le \frac{1}{n^3}.$$
We use Markov’s inequality on the random variable (cid:12)(cid:12)(cid:12)P nj =1 ( B ij − µ ) (cid:12)(cid:12)(cid:12) m and Lemma (2.4)with m set to 4 ln n to get Pr (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n X j =1 ( B ij − µ ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≥ t ≤ e − m ≤ n , giving us the first inequality. The second follows by union bound. (cid:3) Now focus on i ∈ S . Let T i be the set of k special entries in row i . We will use arguments similarto (8) to prove an upper bound on the l th moment of B ij − µ for planted entries and use that toprove that P T i B ij is concentrated about its mean.We first need to get a lower bound on µ = E p ( B ij ): µ ≥ cσ Z M e x / σ e − x / σ dx = cσ Z M dx = cL. l ≥ E p (( B ij − µ ) l ) ≤ E p ( B lij ) ≤ √ πσ Z M exp( γlx ) exp( − x / σ ) + 4 exp( γlM ) √ πσ Z ∞ M xM exp( − x / σ ) dx ≤ σ Z M exp (cid:18) M x (cid:18) γl − σ (cid:19)(cid:19) dx + 2 σ M exp (cid:18) M (cid:18) γl − σ (cid:19)(cid:19) ≤ σ M (2 γ − (1 / σ )) exp (cid:18) M (cid:18) γl − σ (cid:19)(cid:19) ≤ cL exp (cid:18) L ( l − (cid:19) . (9) Lemma 2.6
Let $t$ be as in Lemma 2.5. Let
$$t_1 = c\ln n\left(\exp(L^2/4) + \sqrt{k\ln n}\,\sqrt{L}\,\exp(L^2/8)\right).$$
Then
$$\Pr\left(\exists\, i \in S:\ \sum_{j\in T_i}(B_{ij}-\mu_1) < -t_1\right) \le \frac{1}{n},$$
and
$$\Pr\left(\exists\, i \in S:\ \sum_{j=1}^n(B_{ij}-\mu_0) < t\right) \le \frac{1}{n}.$$
First, fix attention on one i ∈ S . We use Theorem (2.3) with X j = B ij − µ for j ∈ T i . Weplug in (9) for E ( X lj ) to get, with m = 4 ln N : E X j ∈ T i ( B ij − µ ) m ≤ ( cm exp( L / m m/ X l =1 l (cid:18) kmL exp( − L / (cid:19) /l m/ ≤ ( cm exp( L / m (cid:18) kmL exp( − L /
4) + 1 (cid:19) m/ , the last using x /l ≤ x + 1 for all x >
0. Now, we get that for a single i ∈ S , probability that P j ∈ T i ( B ij − µ ) < − t is at most 1 /n by using Markov inequality on (cid:12)(cid:12)(cid:12)P j ∈ T i ( B ij − µ ) (cid:12)(cid:12)(cid:12) m . Weget the first statement of the Lemma by a union bound over all i ∈ S .For the second statement we have, using the same argument as in Lemma (2.5), with highprobability, ∀ i ∈ S, X j / ∈ T i ( B ij − µ ) ≥ − t. (10)We now claim that kL > t + t ) . From the definition of t, t , it suffices to prove the following three inequalities to show this: kL > c √ n (ln n ) / ; kL > c ln ne L / ; kL > √ k ln n √ L e L / . n X j =1 ( B ij − µ ) ≥ k ( µ − µ ) − t − t ≥ t + t ) , proving Lemma (2.6). (cid:3) σ > σ Recall that all planted entries are N (0 , σ ). There are k planted entries in each row of S . Assume(only) ε > c ln n . Define: M = 2 σ (ln n − ln ε − ln ln N −
The general case: $\sigma_1^2 = 2(1+\varepsilon)\sigma_0^2$

Recall that all planted entries are $N(0,\sigma_1^2)$, and there are $k$ planted entries in each row of $S$. Assume (only) that $\varepsilon > \frac{c}{\ln n}$. Define
$$M_1^2 = 2\sigma_0^2\left(\ln n - \ln\varepsilon - \ln\ln N - 12\ln\ln n\right)$$
and
$$B_{ij} = \exp\left(\gamma\min(M_1^2, A_{ij}^2)\right).$$

Theorem 3.1  If $\varepsilon > c/\ln n$ and $k \ge \frac{c}{\varepsilon}(\ln N)\sqrt{\ln n}\; n^{1/(2(1+\varepsilon))}$, then with high probability, the top $s$ row sums of $B$ occur precisely in the $S$ rows.

Corollary 3.2  If $\varepsilon > c/\ln n$ and $k \in \Omega^*\left(n^{0.5 - \frac{\varepsilon}{2(1+\varepsilon)}}\right)$, then, with high probability, the top $s$ row sums of $B$ occur precisely in the $S$ rows.

Let
$$\mu_0 = E_{p_0}(B_{ij}) = \frac{1}{\sqrt{2\pi}\,\sigma_0}\int_{-\infty}^{\infty}\exp\left(\gamma\min(M_1^2,x^2)\right)\exp(-x^2/2\sigma_0^2)\,dx \le \frac{1}{\sqrt{2\pi}\,\sigma_0}\int_{-\infty}^{\infty}\exp\left(\gamma x^2 - \frac{x^2}{2\sigma_0^2}\right)dx = \sqrt{2(1+\varepsilon)}. \qquad (11)$$
Let $l \ge 2$; then $\gamma l - \frac{1}{2\sigma_0^2} \ge 2\gamma - \frac{1}{2\sigma_0^2} > 0$. Using Proposition 2.2, we get (recall $i \notin S$)
$$E_{p_0}((B_{ij}-\mu_0)^l) \le 2E_{p_0}(B_{ij}^l) \le \frac{c}{\sigma_0}\int_0^{M_1}\exp\left(x^2\left(\gamma l - \frac{1}{2\sigma_0^2}\right)\right)dx + \frac{c\sigma_0}{M_1}\exp\left(M_1^2\left(\gamma l - \frac{1}{2\sigma_0^2}\right)\right) \le \frac{c\sigma_0}{\varepsilon M_1}\exp\left(M_1^2\left(\gamma l - \frac{1}{2\sigma_0^2}\right)\right), \qquad (12)$$
using $2\gamma - \frac{1}{2\sigma_0^2} = \frac{\varepsilon}{2\sigma_0^2(1+\varepsilon)} \ge \frac{\varepsilon}{4\sigma_0^2}$ for $\varepsilon \le 1$.

With $X_j = B_{ij} - \mu_0$ in Theorem 2.3, we plug in the bounds of (12) to get:

Lemma 3.3  For all even $m$,
$$E_{p_0}\left(\sum_{j=1}^n(B_{ij}-\mu_0)\right)^m \le (cm)^m\, e^{\gamma m M_1^2}\left(\sum_{l=1}^{m/2}\frac{1}{l^2}\left(\frac{cn\sigma_0 m}{\varepsilon M_1}\exp\left(-\frac{M_1^2}{2\sigma_0^2}\right)\right)^{2/l}\right)^{m/2}.$$
In particular, with $m = 4\ln N$,
$$E_{p_0}\left(\Big|\sum_{j=1}^n(B_{ij}-\mu_0)\Big|^m\right) \le \left(cm\exp(\gamma M_1^2)\right)^m\left(\frac{cn\sigma_0 m}{\varepsilon M_1}\exp\left(-\frac{M_1^2}{2\sigma_0^2}\right) + 1\right)^{m/2}. \qquad (13)$$
Here, the last inequality is because $x^{2/l} \le x+1$ for all real $x > 0$, and $\sum_l(1/l^2)$ is a convergent series.

Lemma 3.4
Let
$$t = c(\ln N)\exp(\gamma M_1^2)\left(1 + \sqrt{\frac{cn\sigma_0\ln N}{\varepsilon M_1}}\,\exp\left(-\frac{M_1^2}{4\sigma_0^2}\right)\right),$$
for $c$ a suitable constant. For $i \notin S$,
$$\Pr\left(\Big|\sum_{j=1}^n(B_{ij}-\mu_0)\Big| \ge t\right) \le \frac{1}{N^4}.$$
Thus, we have
$$\Pr\left(\exists\, i \notin S:\ \Big|\sum_{j=1}^n(B_{ij}-\mu_0)\Big| \ge t\right) \le \frac{1}{N^3}.$$
Proof.
We use Markov's inequality on the random variable $|\sum_{j=1}^n(B_{ij}-\mu_0)|^m$ and (13) with $m$ set to $4\ln N$ to get
$$\Pr\left(\Big|\sum_{j=1}^n(B_{ij}-\mu_0)\Big| \ge t\right) \le e^{-m} \le \frac{1}{N^4},$$
giving us the first inequality. The second follows by a union bound. □

Now focus on $i \in S$. We will use arguments similar to (12) to prove an upper bound on the $l$th moment of $B_{ij} - \mu_1$ for planted entries, and use that to prove that $\sum_{T_i} B_{ij}$ is concentrated about its mean. Let $l \ge 2$:
$$E_{p_1}((B_{ij}-\mu_1)^l) \le 2E_{p_1}(B_{ij}^l) \le \frac{c}{\sigma_1}\int_0^{M_1}\exp\left(x^2\left(\gamma l - \frac{1}{2\sigma_1^2}\right)\right)dx + \frac{c\sigma_1}{M_1}\exp\left(M_1^2\left(\gamma l - \frac{1}{2\sigma_1^2}\right)\right) \le \frac{c\sigma_0}{M_1}\exp\left(M_1^2\left(\gamma l - \frac{1}{2\sigma_1^2}\right)\right).$$
Plugging this into Theorem 2.3, we get
$$E_{p_1}\left(\sum_{j\in T_i}(B_{ij}-\mu_1)\right)^m \le (cm\exp(\gamma M_1^2))^m\left(\sum_{l=1}^{m/2}\frac{1}{l^2}\left(\frac{ck\sigma_0 m}{M_1}\exp\left(-\frac{M_1^2}{2\sigma_1^2}\right)\right)^{2/l}\right)^{m/2}. \qquad (14)$$

Lemma 3.5
Let
$$t_1 = c\ln N\exp(\gamma M_1^2)\left(1 + \sqrt{k}\,\sqrt{\ln N}\,(\ln n)^{-1/4}\exp\left(-\frac{M_1^2}{4\sigma_1^2}\right)\right).$$
Then
$$\Pr\left(\exists\, i \in S:\ \Big|\sum_{j\in T_i}(B_{ij}-\mu_1)\Big| \ge t_1\right) \le \frac{1}{N^2},$$
and
$$\Pr\left(\exists\, i \in S:\ \sum_{j=1}^n(B_{ij}-\mu_0) < t\right) \le \frac{1}{N}.$$
Proof.
The first statement of the Lemma follows from (14) with $m = 4\ln N$, by applying Markov's inequality to $|\sum_{j\in T_i}(B_{ij}-\mu_1)|^m$ and then a union bound over all $i \in S$ (using $\sum_l (1/l^2)\,x^{2/l} \le \sum_l (1/l^2)(1+x) \le c(1+x)$).

For the second statement, we start with a lower bound on $\mu_1$:
$$\mu_1 \ge \frac{c}{\sigma_1}\int_0^{M_1}\exp\left(\gamma x^2 - \frac{x^2}{2\sigma_1^2}\right)dx \ge \frac{c\sigma_0}{\varepsilon M_1}\exp\left(\gamma M_1^2 - \frac{M_1^2}{2\sigma_1^2}\right), \qquad (15)$$
the last using: for $\lambda > 0$, $\int_0^M e^{\lambda x^2}\,dx \ge \int_{M-1/(\lambda M)}^M \exp\left(\lambda(M - 1/(\lambda M))^2\right)dx \ge c\,e^{\lambda M^2}/(\lambda M)$. (Note: we also needed $\varepsilon M_1^2 \ge c\sigma_0^2$, which holds because $M_1^2 \in \Theta(\sigma_0^2\ln n)$ and $\varepsilon > c/\ln n$.)

We assert that $k\mu_1 > c(t + t_1)$. This is proved by checking three inequalities:
$$\frac{ck\sigma_0}{\varepsilon M_1}\exp\left(\gamma M_1^2 - \frac{M_1^2}{2\sigma_1^2}\right) > c\ln N\exp(\gamma M_1^2),$$
$$\frac{ck\sigma_0}{\varepsilon M_1}\exp\left(\gamma M_1^2 - \frac{M_1^2}{2\sigma_1^2}\right) > c\ln N\exp(\gamma M_1^2)\sqrt{\frac{cn\sigma_0\ln N}{\varepsilon M_1}}\exp\left(-\frac{M_1^2}{4\sigma_0^2}\right),$$
$$\frac{ck\sigma_0}{\varepsilon M_1}\exp\left(\gamma M_1^2 - \frac{M_1^2}{2\sigma_1^2}\right) > c\ln N\exp(\gamma M_1^2)\sqrt{k}\,\sqrt{\ln N}\,(\ln n)^{-1/4}\exp\left(-\frac{M_1^2}{4\sigma_1^2}\right).$$
These all hold, as can be checked by simple calculations. Now, we have
$$\sum_{j=1}^n(B_{ij}-\mu_0) = k(\mu_1-\mu_0) + \sum_{j\in T_i}(B_{ij}-\mu_1) + \sum_{j\notin T_i}(B_{ij}-\mu_0).$$
The third term is at least $-t$ with high probability (the proof is exactly as for the non-planted entries). The second term is at least $-t_1$ (whp). We have already shown that $\mu_0 \le \sqrt{2(1+\varepsilon)}$ and $k\mu_1 > 2(t + t_1 + k\mu_0)$. This proves the second statement of the Lemma. □

Lemmas 3.5 and 3.4 together prove Theorem 3.1.
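A toy simulation, consistent with (but not a substitute for) Theorem 3.1, is sketched below. All parameter choices, including the truncation level $M$, are ours and tuned for small $n$ rather than taken from the asymptotic statements; at this scale the separation is visible on typical runs but not guaranteed.

```python
import numpy as np
rng = np.random.default_rng(7)

# Regime of Theorem 3.1: sigma1^2 = 2(1+eps) sigma0^2 with eps = 1, and k
# chosen near the sqrt(n ln n) barrier where plain row sums are unreliable.
n, s, k, sigma0, eps = 3000, 5, 150, 1.0, 1.0
sigma1 = np.sqrt(2 * (1 + eps)) * sigma0
A = rng.normal(0.0, sigma0, size=(n, n))
for i in range(s):                                   # plant the hub rows
    cols = rng.choice(n, size=k, replace=False)
    A[i, cols] = rng.normal(0.0, sigma1, size=k)

gamma = 1.0 / (2 * sigma0 ** 2) - 1.0 / (2 * sigma1 ** 2)
M = sigma0 * np.sqrt(2 * np.log(n))                  # illustrative truncation level
B = np.exp(gamma * np.minimum(M ** 2, A ** 2))
print(sorted(np.argsort(B.sum(axis=1))[-s:]))        # typically [0, 1, 2, 3, 4]
```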
Noise Tolerance
This algorithm can tolerate (adversarial) noise which can perturb $\Omega^*(e^{1/\varepsilon})$ (which is, for example, a power of $n$ when $\varepsilon = c/\ln n$) of the planted entries in each row of $S$. Here is a sketch of the argument: the crucial lower bound on planted row sums in $B$ comes from the lower bound on $k\mu_1$, the expected row sum in $S$ rows. The lower bound (15) on $\mu_1$ involves an integral over $[0, M_1]$. It is easy to see that we only lose a constant factor if the integral is taken from $0$ to $M_1 - \frac{\sigma_0^2}{\varepsilon M_1}$ (instead of to $M_1$). Thus, corruption of all $x \in \left[M_1 - \frac{\sigma_0^2}{\varepsilon M_1},\, M_1\right]$ would only cost a constant factor. It is easy to see that (i) there are $\Omega^*(e^{1/\varepsilon})$ points in this interval and (ii) these are the worst possible points to be corrupted.

General variances

We assume the non-planted entries of an $N \times n$ matrix are drawn from $N(0,\sigma_0^2)$. There is again a set $S$ of "planted" rows, with $|S| = k$. For each $i \in S$, we now assume there is some subset $T_i$ of "planted entries". (The $|T_i|$ need not be equal, and we are not given the $|T_i|$.) Planted entry $(i,j)$ has distribution $p_{ij} \sim N(0,\sigma_{ij}^2)$. We assume each planted $\sigma_{ij}^2 \ge \sigma_1^2$, where $\sigma_1^2 = 2(1+\varepsilon)\sigma_0^2$, $\varepsilon > 0$. Let
$$\tau_i = \sum_{j\in T_i} n^{-\sigma_0^2/\sigma_{ij}^2}. \qquad (16)$$
Let
$$\gamma = \frac{1}{2\sigma_0^2} - \frac{1}{2\sigma_1^2}. \qquad (17)$$
Define $M$ by
$$M = \sqrt{2}\,\sigma_0\sqrt{\ln n}. \qquad (18)$$
$$B_{ij} = \exp\left(\gamma\min(M^2, A_{ij}^2)\right). \qquad (19)$$
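A small helper (ours) that computes $\tau_i$ directly from (16), and checks the equal-variance special case $\tau_i = k\,n^{-1/(2(1+\varepsilon))}$:

```python
import numpy as np

def tau_i(n, sigma0, sigmas_Ti):
    """tau_i = sum_{j in T_i} n^(-sigma0^2 / sigma_ij^2) as in (16);
    sigmas_Ti lists the planted sigma_ij in row i (an assumed input format)."""
    s = np.asarray(sigmas_Ti, dtype=float)
    return float(np.sum(n ** (-(sigma0 ** 2) / s ** 2)))

# With equal variances sigma_ij = sigma1, tau_i = k * n^{-1/(2(1+eps))}:
n, k, eps, sigma0 = 10 ** 6, 900, 0.5, 1.0
sigma1 = np.sqrt(2 * (1 + eps)) * sigma0
print(tau_i(n, sigma0, [sigma1] * k), k * n ** (-1 / (2 * (1 + eps))))
```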
Theorem 4.1  With the above notation, if, for all $i \in S$,
$$\tau_i \ge \frac{c}{\sqrt{\varepsilon}}\,(\ln N)(\ln n)^{1.5},$$
then, with high probability, the set of $k$ rows of $B$ with the largest row sums is precisely $S$.

Corollary 4.2  If $|T_i| = k$ for all $i \in S$ and all planted $\sigma_{ij} = \sigma_1$, and $k = n^{0.5-\delta}$, with
$$\varepsilon \ge \frac{2\delta}{1-2\delta} + c\left(\frac{\ln\ln N + \ln\ln n}{\ln n}\right),$$
then, with high probability, the largest $k$ row sums of $B$ occur in the $S$ rows.

The analysis for the non-planted entries is the same as before.

Planted entries are large
Now focus on $i \in S$. We will use arguments similar to (12) to prove an upper bound on the $l$th moment of $B_{ij} - \mu_{ij}$ (where $\mu_{ij} = E_{p_{ij}}(B_{ij})$) for planted entries, and use that to prove that $\sum_{T_i} B_{ij}$ is concentrated about its mean. Let $l \ge 2$:
$$E_{p_{ij}}((B_{ij}-\mu_{ij})^l) \le 2E_{p_{ij}}(B_{ij}^l) \le \frac{c}{\sigma_{ij}}\int_0^M \exp\left(x^2\left(\gamma l - \frac{1}{2\sigma_{ij}^2}\right)\right)dx + \frac{c\sigma_{ij}}{M}\exp\left(M^2\left(\gamma l - \frac{1}{2\sigma_{ij}^2}\right)\right) \le \frac{c\sigma_0}{M}\exp\left(M^2\left(\gamma l - \frac{1}{2\sigma_{ij}^2}\right)\right). \qquad (20)$$

Lemma 4.3
For $i \in S$, let
$$t_i = c\ln N\exp(\gamma M^2)\left(1 + \sqrt{\tau_i}\,\sqrt{\ln N}\,(\ln n)^{1/2}\right).$$
Then
$$\Pr\left(\exists\, i \in S:\ \sum_{j\in T_i}(B_{ij}-\mu_{ij}) < -t_i\right) \le \frac{1}{N^2},$$
and
$$\Pr\left(\exists\, i \in S:\ \sum_{j=1}^n(B_{ij}-\mu_0) < t\right) \le \frac{1}{N}.$$
Proof.
First, fix attention on one $i \in S$. We use a more general version of Theorem 2.3, also from [Kan09]:

Theorem 4.4  If $X_1, X_2, \ldots, X_n$ are independent (not necessarily identical) mean-zero random variables, then for any even positive integer $m$, we have
$$E\left(\sum_{j=1}^n X_j\right)^m \le (cm)^m\left(\sum_{l=1}^{m/2}\frac{1}{l^2}\left(\frac{\sum_{j=1}^n E(X_j^l)}{m}\right)^{2/l}\right)^{m/2}.$$

We apply this with $X_j = B_{ij}-\mu_{ij}$ for $j \in T_i$. We plug in (20) for $E(X_j^l)$ to get, with $m = 4\ln N$:
$$E\left(\sum_{j\in T_i}(B_{ij}-\mu_{ij})\right)^m \le (cm\exp(\gamma M^2))^m\left(\sum_{l=1}^{m/2}\frac{1}{l^2}\left(\sum_{j\in T_i} mM^2\exp\left(-\frac{M^2}{2\sigma_{ij}^2}\right)\right)^{2/l}\right)^{m/2} \le (cm)^m\exp(\gamma mM^2)\left(mM^2\sum_{j\in T_i} n^{-\sigma_0^2/\sigma_{ij}^2} + 1\right)^{m/2} = (cm)^m\exp(\gamma mM^2)\left(\tau_i\, mM^2 + 1\right)^{m/2},$$
using $x^{2/l} \le x+1$ for all $x > 0$ and $\exp(-M^2/2\sigma_{ij}^2) = n^{-\sigma_0^2/\sigma_{ij}^2}$ by the choice (18) of $M$. With $m = 4\ln N$, we get that, for a single $i \in S$, the probability that $\sum_{j\in T_i}(B_{ij}-\mu_{ij}) < -t_i$ is at most $1/N^3$, by using Markov's inequality on $|\sum_{j\in T_i}(B_{ij}-\mu_{ij})|^m$ (noting $M \ge c\sqrt{\ln n}$). We get the first statement of the Lemma by a union bound over all $i \in S$.

For the second statement, we first need a lower bound on $\mu_{ij}$:
$$\mu_{ij} \ge \int_0^M \frac{c}{\sigma_{ij}}\exp\left(\gamma x^2 - \frac{x^2}{2\sigma_{ij}^2}\right)dx \ge \frac{c\sigma_0}{M}\exp\left(\gamma M^2 - \frac{M^2}{2\sigma_{ij}^2}\right),$$
the last using: for $\lambda > 0$, $\int_0^M e^{\lambda x^2}\,dx \ge \int_{M-1/(\lambda M)}^M \exp\left(\lambda(M - 1/(\lambda M))^2\right)dx \ge c\,e^{\lambda M^2}/(\lambda M)$. So,
$$\sum_{j\in T_i}\mu_{ij} \ge \frac{c\sigma_0}{M}\exp(\gamma M^2)\,\tau_i. \qquad (21)$$
We have, using the same argument as in Lemma 3.4, with high probability,
$$\forall\, i \in S, \quad \sum_{j\notin T_i}(B_{ij}-\mu_0) \ge -t. \qquad (22)$$
Thus, from (22), (21) and the first assertion of the current Lemma,
$$\sum_{j=1}^n(B_{ij}-\mu_0) = \sum_{j\in T_i}(B_{ij}-\mu_{ij}) + \sum_{j\in T_i}(\mu_{ij}-\mu_0) + \sum_{j\notin T_i}(B_{ij}-\mu_0) \ge -t_i + \frac{c\sigma_0}{M}\exp(\gamma M^2)\,\tau_i - t.$$
We would like to assert the following inequalities, which together prove the second assertion of the Lemma:
$$\frac{c\sigma_0}{M}\exp(\gamma M^2)\,\tau_i > c\ln N\exp(\gamma M^2),$$
$$\frac{c\sigma_0}{M}\exp(\gamma M^2)\,\tau_i > c(\ln N)\exp(\gamma M^2)\sqrt{\tau_i}\,\sqrt{\ln N}\,(\ln n)^{1/2},$$
$$\frac{c\sigma_0}{M}\exp(\gamma M^2)\,\tau_i > c\ln N\exp(\gamma M^2)\sqrt{\frac{cn\sigma_0\ln N}{\varepsilon M}}\exp\left(-\frac{M^2}{4\sigma_0^2}\right).$$
Each follows by a simple calculation. □
Statistical Query Lower Bounds

For problems over distributions, the input is a distribution which can typically be accessed via a sampling oracle that provides i.i.d. samples from the unknown distribution.
Statistical algorithms are a restricted class of algorithms that are only allowed to query functions of the distribution rather than directly access samples. We consider three types of statistical query oracles from the literature. Let $X$ be the domain over which distributions are defined (e.g., $\{-1,1\}^n$ or $\mathbb{R}^n$).

1. STAT($\tau$): For any bounded function $f: X \to [-1,1]$ and any tolerance $\tau \in (0,1)$, STAT($\tau$) returns a number $p \in [E_D(f(x)) - \tau,\ E_D(f(x)) + \tau]$.

2. VSTAT($t$): For any function $f: X \to \{0,1\}$, and any integer $t > 0$, VSTAT($t$) returns a number $p \in [E_D(f(x)) - \gamma,\ E_D(f(x)) + \gamma]$ where $\gamma = \max\left\{\frac{1}{t}, \sqrt{\frac{\operatorname{Var}_D(f)}{t}}\right\}$. Note that in the second term, $\operatorname{Var}_D(f) = E_D(f)(1 - E_D(f))$.

3. 1-STAT: For any $f: X \to \{0,1\}$, returns $f(x)$ on a single random sample $x$ from $D$.
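For intuition, one simple way to realize these oracles in simulation (our sketch, not from the paper) is to answer from a finite sample and add adversarial noise within the allowed tolerance:

```python
import numpy as np

def stat_oracle(f, samples, tau, rng):
    """STAT(tau): an estimate of E_D[f] perturbed within tolerance tau;
    uniform noise here stands in for the oracle's adversarial slack."""
    est = np.mean([f(x) for x in samples])
    return float(np.clip(est + rng.uniform(-tau, tau), -1.0, 1.0))

def vstat_oracle(f, samples, t, rng):
    """VSTAT(t) for boolean f: tolerance max{1/t, sqrt(Var_D(f)/t)} with
    Var_D(f) = E(f)(1 - E(f)), estimated from the samples."""
    p = np.mean([f(x) for x in samples])
    gamma = max(1.0 / t, np.sqrt(p * (1.0 - p) / t))
    return float(np.clip(p + rng.uniform(-gamma, gamma), 0.0, 1.0))

def one_stat_oracle(f, sample):
    """1-STAT: the value of boolean f on a single fresh sample."""
    return f(sample)
```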
The first oracle was defined by Kearns in his seminal paper [Kea93, Kea98] showing a lower bound for learning parities using statistical queries, and analyzed more generally by Blum et al. [BFJ+94]. The latter oracles were introduced in [FGR+13] to get stronger lower bounds, including for the planted clique problem. For relationships between these oracles (and simulations of one by another), the reader is referred to [FGR+13, FPV13].

Our algorithm for the hidden hubs problem can be made statistical. We focus on the detection problem P: determine, with probability at least $3/4$, whether the input distribution is $N(0,\sigma_0^2)$ for every entry with no planting, or is a hidden hubs instance, i.e., on a fixed $k$-subset of coordinates, the distribution is a mixture of $N(0,\sigma_0^2)$ and $N(\mu,\sigma_1^2)$, where the latter distribution is used with mixing weight $k/n$. To get a statistical version of our algorithm ($p_1/p_0$), consider the following query function $f$: for a random sample (column) $x$, truncate each entry, apply $p_1/p_0$ and subtract $\mu_0$, add all the entries, and output $1$ if the sum exceeds $t$; else output $0$.

By Lemmas 3.4 and 4.3, with the threshold $t$ as in Lemma 3.4, we have the following consequence: if there is no planting, the probability that this query is $1$ is at most $1/N^3$, while if there is a planting, it is $1$ with probability at least $\frac{k}{n}(1 - \frac{1}{N})$. Thus it suffices to approximate the expectation to within relative error $1/2$.
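A sketch (ours) of this query function: the amplifier $\exp(\gamma\min(M^2, x_i^2))$ is the truncated, monotone form of $p_1/p_0$ used by the algorithm; $\mu_0$ is estimated by Monte Carlo here, and $M, t$ would be set as in Lemma 3.4.

```python
import numpy as np

def make_detection_query(n, sigma0, sigma1, M, t, n_mc=200_000, seed=0):
    """Builds the boolean query f described above: truncate each coordinate,
    amplify by exp(gamma*min(M^2, x_i^2)), center by the null mean mu0,
    and threshold the sum at t."""
    gamma = 1.0 / (2 * sigma0 ** 2) - 1.0 / (2 * sigma1 ** 2)
    z = np.random.default_rng(seed).normal(0.0, sigma0, size=n_mc)
    mu0 = np.exp(gamma * np.minimum(M ** 2, z ** 2)).mean()  # Monte Carlo null mean

    def f(x):  # x in R^n is one sample (a column of the matrix)
        return int(np.exp(gamma * np.minimum(M ** 2, x ** 2)).sum() - n * mu0 > t)

    return f
```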
To do this with VSTAT($t$), we set $t = Cn/k$ for a large enough constant $C$. Thus, a planted Gaussian of size $n^{0.5-\delta}$ can be detected with a single query to VSTAT($O(n/k)$), provided $\sigma_1^2 \ge 2(1+\varepsilon)\sigma_0^2$.

We will now prove that this upper bound is essentially tight. For $c\sigma_0^2 \le \sigma_1^2 \le 2\sigma_0^2$, for any $c > 0$, and $k = n^{0.5-\delta}$ for any $\delta > 0$, any statistical algorithm that detects hidden hubs must have superpolynomial complexity. For the lower bounds, we assume the planted entries are drawn from $N(\mu,\sigma_1^2)$. The cases of most interest are (a) $\mu = 0$ and (b) $\sigma_1 = \sigma_0$. In both cases, the lower bounds will nearly match the algorithmic upper bounds.

Theorem 5.1
For a planting of size $k = n^{0.5-\delta}$:

1. For $\mu = 0$ and $c\sigma_0^2 \le \sigma_1^2 \le 2\sigma_0^2(1-\epsilon)$, any $c > 0$, any statistical algorithm that solves P with probability at least $3/4$ needs $n^{\Omega(\log n)}$ calls to VSTAT($n^{1+\delta}$).

2. For $\mu = 0$ and $\sigma_1^2 = 2\sigma_0^2$, any statistical algorithm that solves P with probability at least $3/4$ needs $n^{\Omega(\log n/\log\log n)}$ calls to VSTAT($n^{1+\delta}$).

3. For $\mu = 0$ and $\sigma_1^2 \le (2 + o(\delta))\sigma_0^2$, any statistical algorithm that solves P with probability at least $3/4$ needs $n^{\omega(1)}$ calls to VSTAT($n^{1+\delta}$).

4. For $\sigma_1 = \sigma_0$, if $\mu = o\left(\sigma_0\sqrt{\ln(\sqrt{n}/k)}\right)$, any statistical algorithm that solves P with probability at least $3/4$ needs $n^{\omega(1)}$ calls to VSTAT($n^{1+\delta}$).

Moreover, the number of queries to 1-STAT for any of the above settings is $\Omega(n^{1+\delta})$.
94] for learning problems. We first need to define the correlation of two distributions
A, B and a reference distribution U , all over a domain X , ρ U ( A, B ) = E X (cid:18)(cid:18) A ( x ) U ( x ) − (cid:19) (cid:18) B ( x ) U ( x ) − (cid:19)(cid:19) . The average correlation of a set of distributions D with respect to reference distribution U is ρ U ( D ) = 1 |D| X A,B ∈D ρ U ( A, B ) . Definition 5.2
For $\bar\gamma > 0$, domain $X$, a set of distributions $\mathcal{D}$ over $X$ and a reference distribution $U$ over $X$, the statistical dimension of $\mathcal{D}$ relative to $U$ with average correlation $\bar\gamma$ is denoted by SDA($\mathcal{D}, U, \bar\gamma$) and defined to be the largest integer $d$ such that for any subset $\mathcal{D}' \subseteq \mathcal{D}$,
$$|\mathcal{D}'| \ge |\mathcal{D}|/d \implies \rho_U(\mathcal{D}') \le \bar\gamma.$$
Theorem 5.3 [FGR +
[FGR+13] For any decision problem P with reference distribution $U$, let $\mathcal{D}$ be a set of distributions such that $d = \mathrm{SDA}(\mathcal{D}, U, \bar\gamma)$. Then any randomized algorithm that solves P with probability at least $\nu > 1/2$ must make at least $(2\nu - 1)\,d$ queries to VSTAT($1/(3\bar\gamma)$). Moreover, any algorithm that solves P with probability at least $3/4$ needs $\Omega(1)\cdot\min\{d,\ 1/\bar\gamma\}$ calls to 1-STAT.
$S, T$, each of size $k$, the correlation of their corresponding distributions $F_S, F_T$ is
$$\rho(F_S, F_T) = \left\langle\frac{F_S(x)}{F(x)} - 1,\ \frac{F_T(x)}{F(x)} - 1\right\rangle_F = E_F\left(\left(\frac{F_S(x)}{F(x)} - 1\right)\left(\frac{F_T(x)}{F(x)} - 1\right)\right),$$
where $F$ is the distribution with no planting, i.e., $N(0,\sigma_0^2)^n$. For proving the lower bound at the threshold $\sigma_1^2 = 2\sigma_0^2$, it will be useful to define $\bar F_S$ as $F_S$ with each coordinate restricted to the interval $[-M, M]$. We will set $M = \sigma_0\sqrt{C\ln k}$. As before, we focus on the range $\sigma_1^2 \in [c\sigma_0^2,\ (2+o(1))\sigma_0^2]$.

Lemma 5.4
For $\sigma_1^2 < 2\sigma_0^2$,
$$\rho(F_S, F_T) = \frac{k^2}{n^2}\left(\left(\frac{\sigma_0^2}{\sigma_1\sqrt{2\sigma_0^2-\sigma_1^2}}\right)^{|S\cap T|}\exp\left(\frac{\mu^2}{2\sigma_0^2-\sigma_1^2}\cdot|S\cap T|\right) - 1\right).$$
For $\sigma_1^2 = 2\sigma_0^2$,
$$\rho(\bar F_S, \bar F_T) \le \frac{k^2}{n^2}\,(C\ln k)^{|S\cap T|/2}.$$
For $\sigma_1^2 = (2+\alpha)\sigma_0^2$ and $\alpha = o(1)$,
$$\rho(\bar F_S, \bar F_T) \le \frac{k^2}{n^2}\,k^{C\alpha|S\cap T|/2}.$$

Proof.
$$\rho(F_S,F_T) = \int\frac{dF_S(x)\,dF_T(x)}{dF(x)} - 1 = \frac{k^2}{n^2}\left(\prod_{i\in S\cap T}\frac{\sigma_0}{\sqrt{2\pi}\,\sigma_1^2}\int\exp\left(-\frac{(x_i-\mu)^2}{2\sigma_1^2} - \frac{(x_i-\mu)^2}{2\sigma_1^2} + \frac{x_i^2}{2\sigma_0^2}\right)dx_i - 1\right)$$
$$= \frac{k^2}{n^2}\left(\prod_{i\in S\cap T}\frac{\sigma_0}{\sqrt{2\pi}\,\sigma_1^2}\int\exp\left(-x_i^2\cdot\frac{2\sigma_0^2-\sigma_1^2}{2\sigma_0^2\sigma_1^2} - \frac{\mu^2 - 2x_i\mu}{\sigma_1^2}\right)dx_i - 1\right).$$
Setting $z = \frac{\sigma_0\sigma_1}{\sqrt{2\sigma_0^2-\sigma_1^2}}$,
$$\rho(F_S,F_T) = \frac{k^2}{n^2}\left(\prod_{i\in S\cap T}\frac{\sigma_0}{\sqrt{2\pi}\,\sigma_1^2}\int\exp\left(-\frac{(x_i - 2\mu z^2/\sigma_1^2)^2}{2z^2} + \mu^2\left(\frac{2z^2}{\sigma_1^4} - \frac{1}{\sigma_1^2}\right)\right)dx_i - 1\right).$$
We note that if $2\sigma_0^2 - \sigma_1^2 \le 0$, then the integral diverges. Assuming $2\sigma_0^2 - \sigma_1^2 > 0$, a direct computation gives $\mu^2\left(\frac{2z^2}{\sigma_1^4} - \frac{1}{\sigma_1^2}\right) = \frac{\mu^2}{2\sigma_0^2-\sigma_1^2}$, so
$$\rho(F_S,F_T) = \frac{k^2}{n^2}\left(\left(\frac{\sigma_0 z}{\sigma_1^2}\exp\left(\frac{\mu^2}{2\sigma_0^2-\sigma_1^2}\right)\right)^{|S\cap T|} - 1\right) = \frac{k^2}{n^2}\left(\left(\frac{\sigma_0^2}{\sigma_1\sqrt{2\sigma_0^2-\sigma_1^2}}\exp\left(\frac{\mu^2}{2\sigma_0^2-\sigma_1^2}\right)\right)^{|S\cap T|} - 1\right).$$
Note that $\sigma_0^2 \ge \sigma_1\sqrt{2\sigma_0^2-\sigma_1^2}$ (by the AM-GM inequality applied to $\sigma_1^2$ and $2\sigma_0^2-\sigma_1^2$), so the above bound is of the form $\alpha\beta^{|S\cap T|}$, where $\beta \ge 1$. For the second part, we have
$$\rho(\bar F_S,\bar F_T) \le \frac{k^2}{n^2}\left(\frac{\sigma_0}{\sqrt{2\pi}\,\sigma_1^2}\int_{-M}^M dx\right)^{|S\cap T|} \le \frac{k^2}{n^2}\,(C\ln k)^{|S\cap T|/2}.$$
For $\sigma_1^2 = (2+\alpha)\sigma_0^2$,
$$\rho(\bar F_S,\bar F_T) \le \frac{k^2}{n^2}\left(\frac{\sigma_0}{\sqrt{2\pi}\,\sigma_1^2}\int_{-M}^M e^{\alpha x^2/2\sigma_1^2}\,dx\right)^{|S\cap T|} \le \frac{k^2}{n^2}\,k^{C\alpha|S\cap T|/2}. \quad \square$$
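The base of the exponent in Lemma 5.4 with $\mu = 0$ is exactly the $\chi^2$ quantity from the introduction. A quick numeric check (ours) of how it behaves across the allowed range:

```python
import numpy as np

def beta(sigma0, sigma1):
    """Base of the exponent in Lemma 5.4 with mu = 0 (valid for sigma1^2 < 2 sigma0^2)."""
    return sigma0 ** 2 / (sigma1 * np.sqrt(2 * sigma0 ** 2 - sigma1 ** 2))

for s1 in [0.5, 1.0, 1.2, 1.35, 1.41]:
    print(s1, beta(1.0, s1))   # equals 1 only at sigma1 = sigma0; diverges at sqrt(2)
```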
Let $\sigma_1^2 < 2\sigma_0^2$ and let $\mathcal{D}$ be the set of distributions induced by every possible subset of $[n]$ of size $k$. Assume $\rho(F_S,F_T) \le \alpha\beta^{|S\cap T|}$ for some $\beta > 1$. Then, for any subset $A \subseteq \mathcal{D}$ with
$$|A| \ge \frac{\binom{n}{k}}{\ell!\,(n/2k^2)^{\ell}},$$
the average correlation of $A$ with any subset $S$ is at most
$$\rho(A,S) = \frac{1}{|A|}\sum_{T\in A}\rho(F_T,F_S) \le c\,\alpha\beta^{\ell}.$$

Proof.
This proof is similar to [FGR+13]. Let $T_r = \{T \in A : |T\cap S| = r\}$. Then
$$\sum_{T\in A}\rho(F_S,F_T) \le \alpha\sum_{T\in A}\beta^{|S\cap T|} = \alpha\sum_{r=0}^k |T_r\cap A|\,\beta^r.$$
To maximize the bound, we would include in $A$ sets that intersect $S$ in $k$ elements, then $k-1$ elements, and so on; hence a lower bound on $|A|$ gives us an upper bound on the minimum intersection size $r_0$ occurring in $A$, as follows. Note that for $0 \le j \le k-1$,
$$\frac{|T_{j+1}|}{|T_j|} = \frac{\binom{k}{j+1}\binom{n-k}{k-j-1}}{\binom{k}{j}\binom{n-k}{k-j}} = \frac{(k-j)^2}{(j+1)(n-2k+j+1)} \le \frac{2k^2}{(j+1)\,n},$$
where the last step assumes $2k < n$. Therefore,
$$|T_j| \le \frac{1}{j!}\left(\frac{2k^2}{n}\right)^j|T_0| \le \frac{\binom{n}{k}}{j!\,(n/2k^2)^j}.$$
This gives a bound on the minimum intersection size: since
$$\sum_{j=r}^k |T_j| < \frac{c\binom{n}{k}}{r!\,(n/2k^2)^r},$$
a set $A$ of the assumed size cannot consist only of sets with intersection at least $\ell$, so $r_0 < \ell$. Using this,
$$\sum_{T\in A}\rho(F_S,F_T) \le \alpha\sum_{r=r_0}^k |T_r\cap A|\,\beta^r \le \alpha\left(|T_{r_0}\cap A|\,\beta^{r_0} + 2|T_{r_0+1}|\,\beta^{r_0+1}\right) \le c\,\alpha|A|\,\beta^{r_0+1} \le c\,\alpha\beta^{\ell}|A|. \quad \square$$
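The intersection-count ratio above is an exact identity, which a few lines (ours) can confirm numerically:

```python
from math import comb

# |T_{j+1}| / |T_j| = (k-j)^2 / ((j+1)(n-2k+j+1)), of order k^2/((j+1) n).
n, k = 1000, 30
T = lambda j: comb(k, j) * comb(n - k, k - j)  # size-k sets meeting S in j points
for j in range(4):
    print(j, T(j + 1) / T(j), (k - j) ** 2 / ((j + 1) * (n - 2 * k + j + 1)))
```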
For the planted Gaussian problem P, with (a) $\sigma_1^2 < 2\sigma_0^2$ and average correlation at most
$$\bar\gamma = \frac{2k^2}{n^2}\left(\frac{\sigma_0^2}{\sigma_1\sqrt{2\sigma_0^2-\sigma_1^2}}\exp\left(\frac{\mu^2}{2\sigma_0^2-\sigma_1^2}\right)\right)^{\ell},$$
or (b) $\sigma_1^2 = 2\sigma_0^2$ and average correlation
$$\bar\gamma = \frac{2k^2}{n^2}(C\ln k)^{\ell/2},$$
or (c) $\sigma_1^2 = (2+\alpha)\sigma_0^2$ for $\alpha = o(1)$ and average correlation
$$\bar\gamma = \frac{2k^2}{n^2}\,k^{C\alpha\ell/2},$$
the statistical dimension of P is at least $\ell!\,(n/2k^2)^{\ell}/2$.

We now state explicitly the three main corollaries of this theorem; together they complete the proof of Theorem 5.1.
Corollary 5.7
With $\mu = 0$ and $\sigma_1^2 = 2\sigma_0^2(1-\epsilon)$, we have
$$\bar\gamma = \frac{2k^2}{n^2}\left(\frac{1}{4\epsilon(1-\epsilon)}\right)^{\ell/2},$$
and for any $\delta > 0$, with $k = n^{0.5-\delta}$ and $\ell = c\log n/\log\left(1/(4\epsilon(1-\epsilon))\right)$, we have $\bar\gamma = 2n^{c-1-2\delta}$ and
$$\mathrm{SDA}(P,\bar\gamma) = n^{\Omega\left(\delta\log n/\log(1/(\epsilon(1-\epsilon)))\right)}.$$
Hence, with $c = \delta$, any statistical algorithm that solves P with probability at least $3/4$ needs $n^{\Omega(\log n)}$ calls to VSTAT($n^{1+\delta}$).
We note that the above corollary applies for any $0 < \sigma_1^2 < 2\sigma_0^2$, with the bounds depending mildly on how close $\sigma_1^2$ is to the ends of this range. This is quantified by the dependence on $\epsilon(1-\epsilon)$ above.

Our lower bound extends slightly above the threshold $\sigma_1^2 = 2\sigma_0^2$. For this, we need to observe that with respect to any $n^C$ samples, the distributions $F_S$ and $\bar F_S$ are indistinguishable with high probability ($1 - n^{-C}$). Therefore, proving a lower bound on the statistical dimension of P with distributions $\bar F_S$ is effectively a lower bound for the original problem P with distributions $F_S$.
With $\mu = 0$ and $\sigma_1^2 = 2\sigma_0^2$, we have
$$\bar\gamma = \frac{2k^2}{n^2}(C\ln k)^{\ell/2},$$
and for any $\delta > 0$, with $k = n^{0.5-\delta}$ and $\ell = c\log n/\log\log n$, we have $\bar\gamma = 2n^{c-1-2\delta}$ and $\mathrm{SDA}(P,\bar\gamma) = n^{\Omega(\delta\log n/\log\log n)}$. Hence, with $c = \delta$, any statistical algorithm that solves P with probability at least $3/4$ needs $n^{\Omega(\log n/\log\log n)}$ calls to VSTAT($n^{1+\delta}$).

Moreover, for $\sigma_1^2 = (2+\alpha)\sigma_0^2$ with $\alpha = o(\delta)$, we have
$$\bar\gamma = \frac{2k^2}{n^2}\,k^{C\alpha\ell/2},$$
and for any $\delta > 0$, with $k = n^{0.5-\delta}$ and $\ell = 8\delta/(C\alpha)$, we have $\bar\gamma = 2n^{-1-\delta}$ and $\mathrm{SDA}(P,\bar\gamma) \ge n^{\delta\ell}$. Hence any statistical algorithm that solves P with probability at least $3/4$ needs $n^{\omega(1)}$ calls to VSTAT($n^{1+\delta}$).
For $\sigma_1 = \sigma_0$,
$$\bar\gamma = \frac{2k^2}{n^2}\exp\left(\frac{\mu^2\ell}{\sigma_0^2}\right),$$
and for any $\delta > 0$, with $k = n^{0.5-\delta}$ and $\mu^2 = c\,\sigma_0^2\ln(\sqrt{n}/k)$, we have $\bar\gamma = 2n^{c\delta\ell - 1 - 2\delta}$ and $\mathrm{SDA}(P,\bar\gamma) = \Omega(n^{\delta\ell})$. If $\mu = o\left(\sigma_0\sqrt{\ln(\sqrt{n}/k)}\right)$, any statistical algorithm that solves P with probability at least $3/4$ needs $n^{\omega(1)}$ calls to VSTAT($n^{1+\delta}$).

References

[AKS98] N. Alon, M. Krivelevich, and B. Sudakov. Finding a large hidden clique in a random graph. Random Structures and Algorithms, 13:457–466, 1998.
[BCC+10] Aditya Bhaskara, Moses Charikar, Eden Chlamtac, Uriel Feige, and Aravindan Vijayaraghavan. Detecting high log-densities: an O(n^{1/4}) approximation for densest k-subgraph. In Proceedings of the 42nd ACM Symposium on Theory of Computing, STOC 2010, pages 201–210, 2010.

[BFJ+94] Avrim Blum, Merrick Furst, Jeffrey Jackson, Michael Kearns, Yishay Mansour, and Steven Rudich. Weakly learning DNF and characterizing statistical query learning using Fourier analysis. In STOC, pages 253–262, 1994.

[BHK+16] Boaz Barak, Samuel B. Hopkins, Jonathan A. Kelner, Pravesh Kothari, Ankur Moitra, and Aaron Potechin. A nearly tight sum-of-squares lower bound for the planted clique problem. CoRR, abs/1604.03084, 2016.

[Bop87] R. Boppana. Eigenvalues and graph bisection: An average-case analysis. In Proceedings of the 28th IEEE Symposium on Foundations of Computer Science, pages 280–285, 1987.

[BV09] S. Charles Brubaker and Santosh Vempala. Random tensors and planted cliques. In Approximation, Randomization, and Combinatorial Optimization (APPROX/RANDOM 2009), pages 406–419, 2009.

[DGGP11] Yael Dekel, Ori Gurel-Gurevich, and Yuval Peres. Finding hidden cliques in linear time with high probability. In Proceedings of the Meeting on Analytic Algorithmics and Combinatorics, ANALCO '11, pages 67–75, 2011.

[DM15a] Yash Deshpande and Andrea Montanari. Finding hidden cliques of size \sqrt{N/e} in nearly linear time. Found. Comput. Math., 15(4):1069–1128, 2015.

[DM15b] Yash Deshpande and Andrea Montanari. Improved sum-of-squares lower bounds for hidden clique and hidden submatrix problems. In Proceedings of the 28th Conference on Learning Theory, COLT 2015, pages 523–562, 2015.

[FGR+13] Vitaly Feldman, Elena Grigorescu, Lev Reyzin, Santosh Vempala, and Ying Xiao. Statistical algorithms and a lower bound for planted clique. In Proceedings of the 45th Annual ACM Symposium on Theory of Computing, pages 655–664, 2013.

[FGV17] Vitaly Feldman, Cristobal Guzman, and Santosh Vempala. Statistical query algorithms for mean estimation and stochastic convex optimization. In SIAM Symposium on Discrete Algorithms, 2017.

[FK00] U. Feige and R. Krauthgamer. Finding and certifying a large hidden clique in a semi-random graph. Random Structures and Algorithms, 16(2):195–208, 2000.

[FK08] Alan M. Frieze and Ravi Kannan. A new approach to the planted clique problem. In FSTTCS 2008, pages 187–198, 2008.

[FP07] Delphine Féral and Sandrine Péché. The largest eigenvalue of rank one deformation of large Wigner matrices. Communications in Mathematical Physics, 272(1):185–228, 2007.

[FPV13] Vitaly Feldman, Will Perkins, and Santosh Vempala. On the complexity of random satisfiability problems with planted solutions. CoRR, abs/1311.4821, 2013. Extended abstract in STOC 2015.

[FR10] Uriel Feige and Dorit Ron. Finding hidden cliques in linear time. In 21st International Meeting on Probabilistic, Combinatorial, and Asymptotic Methods in the Analysis of Algorithms (AofA'10), pages 189–204, 2010.

[HKP+