[PDF] Computational Lower Bounds for Community Detection on Random Graphs

Abstract

This paper studies the problem of detecting the presence of a small dense community planted in a large Erdős-Rényi random graph G(N,q) , where the edge probability within the community exceeds q by a constant factor. Assuming the hardness of the planted clique detection problem, we show that the computational complexity of detecting the community exhibits the following phase transition phenomenon: As the graph size N grows and the graph becomes sparser according to q= N −α , there exists a critical value of α= 2 3 , below which there exists a computationally intensive procedure that can detect far smaller communities than any computationally efficient procedure, and above which a linear-time procedure is statistically optimal. The results also lead to the average-case hardness results for recovering the dense community and approximating the densest K -subgraph.

Full PDF

aa r X i v : . [ m a t h . S T ] M a r Computational Lower Bounds for Community Detection onRandom Graphs

Bruce Hajek Yihong Wu Jiaming Xu ∗ October 2, 2018

Abstract

This paper studies the problem of detecting the presence of a small dense community plantedin a large Erd˝os-R´enyi random graph G ( N, q ), where the edge probability within the commu-nity exceeds q by a constant factor. Assuming the hardness of the planted clique detectionproblem, we show that the computational complexity of detecting the community exhibits thefollowing phase transition phenomenon: As the graph size N grows and the graph becomessparser according to q = N − α , there exists a critical value of α = , below which there exists acomputationally intensive procedure that can detect far smaller communities than any compu-tationally eﬃcient procedure, and above which a linear-time procedure is statistically optimal.The results also lead to the average-case hardness results for recovering the dense communityand approximating the densest K -subgraph. Networks often exhibit community structure with many edges joining the vertices of the same com-munity and relatively few edges joining vertices of diﬀerent communities. Detecting communitiesin networks has received a large amount of attention and has found numerous applications in socialand biological sciences, etc (see, e.g., the exposition [23] and the references therein). While mostprevious work focuses on identifying the vertices in the communities, this paper studies the morebasic problem of detecting the presence of a small community in a large random graph, proposedrecently in [8]. This problem has practical applications including detecting new events and moni-toring clusters, and is also of theoretical interest for understanding the statistical and algorithmiclimits of community detection [15].Inspired by the model in [8], we formulate this community detection problem as a planted densesubgraph detection (

PDS ) problem. Speciﬁcally, let G ( N, q ) denote the Erd˝os-R´enyi random graphwith N vertices, where each pair of vertices is connected independently with probability q . Let G ( N, K, p, q ) denote the planted dense subgraph model with N vertices where: (1) each vertex isincluded in the random set S independently with probability KN ; (2) for any two vertices, theyare connected independently with probability p if both of them are in S and with probability q otherwise, where p > q . In this case, the vertices in S form a community with higher connectivitythan elsewhere. The planted dense subgraph here has a random size with mean K , which is similarto the models adopted in [16, 34, 35, 22, 31], instead of a deterministic size K as assumed in[8, 38, 15]. ∗ The authors are with the Department of ECE, University of Illinois at Urbana-Champaign, Urbana, IL, { b-hajek,yihongwu,jxu18 } @illinois.edu . eﬁnition 1. The planted dense subgraph detection problem with parameters (

N, K, p, q ), hence-forth denoted by

PDS ( N, K, p, q ), refers to the problem of distinguishing hypotheses: H : G ∼ G ( N, q ) , P ,H : G ∼ G ( N, K, p, q ) , P . The statistical diﬃculty of the problem depends on the parameters (

N, K, p, q ). Intuitively, ifthe expected dense subgraph size K decreases, or if the edge probabilities p and q both decreaseby the same factor, or if p decreases for q ﬁxed, the distributions under the null and alternativehypotheses become less distinguishable. Recent results in [8, 38] obtained necessary and suﬃcientconditions for detecting planted dense subgraphs under certain assumptions of the parameters.However, it remains unclear whether the statistical fundamental limit can always be achieved byeﬃcient procedures. In fact, it has been shown in [8, 38] that many popular low-complexity tests,such as total degree test, maximal degree test, dense subgraph test, as well as tests based oncertain convex relaxations, can be highly suboptimal. This observation prompts us to investigatethe computational limits for the PDS problem, i.e., what is the sharp condition on (

N, K, p, q )under which the problem admits a computationally eﬃcient test with vanishing error probability,and conversely, without which no algorithm can detect the planted dense subgraph reliably inpolynomial time. To this end, we focus on a particular case where the community is denser bya constant factor than the rest of the graph, i.e., p = cq for some constant c >

1. Adoptingthe standard reduction approach in complexity theory, we show that the

PDS problem in someparameter regime is at least as hard as the planted clique problem in some parameter regime,which is conjectured to be computationally intractable. Let G ( n, k, γ ) denote the planted cliquemodel in which we add edges to k vertices uniformly chosen from G ( n, γ ) to form a clique. Deﬁnition 2.

The PC detection problem with parameters ( n, k, γ ), denoted by PC ( n, k, γ ) hence-forth, refers to the problem of distinguishing hypotheses: H C0 : G ∼ G ( n, γ ) ,H C1 : G ∼ G ( n, k, γ ) . The problem of ﬁnding the planted clique has been extensively studied for γ = and the state-of-the-art polynomial-time algorithms [4, 20, 32, 21, 17, 6, 18] only work for k = Ω( √ n ). Thereis no known polynomial-time solver for the PC problem for k = o ( √ n ) and any constant γ > PC problem cannot be solved in polynomial time for k = o ( √ n ) with γ = , which we refer to as the PC Hypothesis.

Hypothesis 1.

Fix some constant 0 < γ ≤ . For any sequence of randomized polynomial-timetests { ψ n,k n } such that lim sup n →∞ log k n log n < / n →∞ P H C0 { ψ n,k ( G ) = 1 } + P H C1 { ψ n,k ( G ) = 0 } ≥ . The PC Hypothesis with γ = is similar to [30, Hypothesis 1] and [11, Hypothesis B PC ]. Ourcomputational lower bounds require that the PC Hypothesis holds for any positive constant γ . Aneven stronger assumption that PC Hypothesis holds for γ = 2 − log . n has been used in [7, Theorem10.3] for public-key cryptography. Furthermore, [22, Corollary 5.8] shows that under a statisticalquery model, any statistical algorithm requires at least n Ω( log n log(1 /γ ) ) queries for detecting the plantedbi-clique in an Erd˝os-R´enyi random bipartite graph with edge probability γ .2 .1 Main Results We consider the

PDS ( N, K, p, q ) problem in the following asymptotic regime: p = cq = Θ( N − α ) , K = Θ( N β ) , N → ∞ , (1)where c > α ∈ [0 ,

2] governs the sparsity of the graph, and β ∈ [0 ,

1] capturesthe size of the dense subgraph. Clearly the detection problem becomes more diﬃcult if either α increases or β decreases. Assuming the PC Hypothesis holds for any positive constant γ , we showthat the parameter space of ( α, β ) is partitioned into three regimes as depicted in Fig. 1: • The Simple Regime: β > + α . The dense subgraph can be detected in linear time withhigh probability by thresholding the total number of edges. • The Hard Regime: α < β < + α . Reliable detection can be achieved by thresholdingthe maximum number of edges among all subgraphs of size K ; however, no polynomial-timesolver exists in this regime. • The Impossible Regime: β < min { α, + α } . No test can detect the planted subgraphregardless of the computational complexity. p = cq = Θ( N − α ) K = Θ( N β )1 / β = α simple1 / O αβ / β = α / + / Figure 1: The simple (green), hard (red), impossible (gray) regimes for detecting the planted densesubgraph.The computational hardness of the

PDS problem exhibits a phase transition at the critical value α = 2 /

3: For moderately sparse graphs with α < /

3, there exists a combinatorial algorithm thatcan detect far smaller communities than any eﬃcient procedures; For highly sparse graphs with α > /

3, optimal detection is achieved in linear time based on the total number of edges. Equivalently,attaining the statistical detection limit is computationally tractable only in the large-communityregime ( β > / The case of α > K = N ). β < + α . It should be notedthat Fig. 1 only captures the leading polynomial term according to the parametrization (1); at theboundary β = α/ /

2, it is plausible that one needs to go beyond simple edge counting in orderto achieve reliable detection. This is analogous to the planted clique problem where the maximaldegree test succeeds if the clique size satisﬁes k = Ω( √ n log n ) [29] and the more sophisticatedspectral method succeeds if k = Ω( √ n ) [4].The above hardness result should be contrasted with the recent study of community detectionon the stochastic block model, where the community size scales linearly with the network size.When the edge density scales as Θ( N ) [34, 35, 31] (resp. Θ( log NN ) [1, 36, 24]), the statisticallyoptimal threshold for partial (resp. exact) recovery can be attained in polynomial time up to thesharp constants. In comparison, this paper focuses on the regime when the community size grows sublinearly as N β and the edge density decays more slowly as N − α . It turns out that in this caseeven achieving the optimal exponent is computationally as demanding as solving the planted cliqueproblem.Our computational lower bound for the PDS problem also implies the average-case hardnessof approximating the planted dense subgraph or the densest K -subgraph of the random graphensemble G ( N, K, p, q ), complementing the worst-case inapproximability result in [3], which is basedon the planted clique hardness as well. In particular, we show that no polynomial-time algorithmcan approximate the planted dense subgraph or the densest K -subgraph within any constant factorin the regime of α < β < + α , which provides a partial answer to the conjecture made in [15,Conjecture 2.6] and the open problem raised in [3, Section 4] (see Section 4.1). Our approachand results can be extended to the bipartite graph case (see Section 4.3) and shed light on thecomputational limits of the PDS problem with a ﬁxed planted dense subgraph size studied in[8, 38] (see Section 4.2).

This work is inspired by an emerging line of research (see, e.g., [28, 10, 11, 14, 30, 15, 39]) whichexamines high-dimensional inference problems from both the statistical and computational per-spectives. Our computational lower bounds follow from a randomized polynomial-time reductionscheme which approximately reduces the PC problem to the PDS problem of appropriately chosenparameters. Below we discuss the connections to previous results and highlight the main technicalcontributions of this paper. PC Hypothesis

Various hardness results in the theoretical computer science literature have beenestablished based on the PC Hypothesis with γ = , e.g. cryptographic applications [27], approxi-mating Nash equilibrium [25], testing k -wise independence [2], etc. More recently, the PC Hypoth-esis with γ = has been used to investigate the penalty incurred by complexity constraints oncertain high-dimensional statistical inference problems, such as detecting sparse principal compo-nents [11] and noisy biclustering (submatrix detection) [30]. Compared with most previous works,our computational lower bounds rely on the stronger assumption that the PC Hypothesis holds forany positive constant γ . An even stronger assumption that PC Hypothesis holds for γ = 2 − log . n has been used in [7] for public-key cryptography. It is an interesting open problem to prove that PC Hypothesis for a ﬁxed γ ∈ (0 , ) follows from that for γ = . Reduction from the PC Problem

Most previous work [25, 2, 3, 7] in the theoretical computerscience literature uses the reduction from the PC problem to generate computationally hard in-stances of problems and establish worst-case hardness results; the underlying distributions of the4nstances could be arbitrary. Similarly, in the recent works [11, 30] on the computational limits ofcertain minimax inference problems, the reduction from the PC problem is used to generate com-putationally hard but statistically feasible instances of their problems; the underlying distributionsof the instances can also be arbitrary as long as they are valid priors on the parameter spaces. Incontrast, here our goal is to establish the average-case hardness of the PDS problem based on thatof the PC problem. Thus the underlying distributions of the problem instances generated fromthe reduction must be close to the desired distributions in total variation under both the null andalternative hypotheses. To this end, we start with a small dense graph generated from G ( n, γ )under H and G ( n, k, γ ) under H , and arrive at a large sparse graph whose distribution is exactly G ( N, q ) under H and approximately equal to G ( N, K, p, q ) under H . Notice that simply spar-sifying the PC problem does not capture the desired tradeoﬀ between the graph sparsity and thecluster size. Our reduction scheme diﬀers from those used in [11, 30] which start with a large densegraph. Similar to ours, the reduction scheme in [3] also enlarges and sparsiﬁes the graph by takingits subset power; but the distributions of the resulting random graphs are rather complicated andnot close to the Erd˝os-R´enyi type. Inapproximability of the

DKS

Problem

The densest K -subgraph ( DKS ) problem refers toﬁnding the subgraph of K vertices with the maximal number of edges. In view of the NP-hardness ofthe DKS problem which follows from the NP-hardness of MAXCLIQUE, it is of interest to consideran η -factor approximation algorithm, which outputs a subgraph with K vertices containing atleast a η -fraction of the number of edges in the densest K -subgraph. Proving the NP-hardnessof (1 + ǫ )-approximation for DKS for any ﬁxed ǫ > PC Hypothesis holds with γ = , [3] shows that the DKS problem is hard to approximate within any constant factor even if the densest K -subgraphis a clique of size K = N β for any β <

1, where N denotes the total number of vertices. Thisworst-case inapproximability result is in stark contrast to the average-case behavior in the planteddense subgraph model G ( N, K, p, q ) under the scaling (1), where it is known [15, 5] that the planteddense subgraph can be exactly recovered in polynomial time if β > + α (see the simple regionin Fig. 2 below), implying that the densest K -subgraph can be approximated within a factor of1 + ǫ in polynomial time for any ǫ >

0. On the other hand, our computational lower bound for

PDS ( N, K, p, q ) shows that any constant-factor approximation of the densest K -subgraph has highaverage-case hardness if α < β < + α (see Section 4.1). Variants of

PDS

Model

Three versions of the

PDS model were considered in [12, Section 3].Under all three the graph under the null hypothesis is the Erd˝os-R´enyi graph. The versions of thealternative hypothesis, in order of increasing diﬃculty of detection, are: (1) The random planted model, such that the graph under the alternative hypothesis is obtained by generating an Erd˝os-R´enyi graph, selecting K nodes arbitrarily, and then resampling the edges among the K nodes witha higher probability to form a denser Erd˝os-R´enyi subgraph. This is somewhat more diﬃcult todetect than the model of [8, 38], for which the choice of which K nodes are in the planted densesubgraph is made before any edges of the graph are independently, randomly generated. (2) The dense in random model, such that both the nodes and edges of the planted dense K -subgraph arearbitrary; (3) The dense versus random model, such that the entire graph under the alternativehypothesis could be an arbitrary graph containing a dense K -subgraph. Our PDS model is closelyrelated to the ﬁrst of these three versions, the key diﬀerence being that for our model the size ofthe planted dense subgraph is binomially distributed with mean K (see Section 4.2). Thus, ourhardness result is for the easiest type of detection problem. A bipartite graph variant of the PDS

PDS problem is usedin [7] for cryptographic applications.

For any set S , let | S | denote its cardinality. Let s n = { s , . . . , s n } . For any positive integer N , let[ N ] = { , . . . , N } . For a, b ∈ R , let a ∧ b = min { a, b } and a ∨ b = max { a, b } . We use standardbig O notations, e.g., for any sequences { a n } and { b n } , a n = Θ( b n ) if there is an absolute constant C > /C ≤ a n /b n ≤ C . Let Bern( p ) denote the Bernoulli distribution with mean p andBinom( N, p ) denote the binomial distribution with N trials and success probability p . For randomvariables X, Y , we write X ⊥⊥ Y if X is independent with Y . For probability measures P and Q ,let d TV ( P , Q ) = R | d P − d Q | denote the total variation distance and χ ( P k Q ) = R (d P − d Q ) d Q the χ -divergence. The distribution of a random variable X is denoted by P X . We write X ∼ P if P X = P . All logarithms are natural unless the base is explicitly speciﬁed. This section determines the statistical limit for the

PDS ( N, K, p, q ) problem with p = cq for a ﬁxedconstant c >

1. For a given pair (

N, K ), one can ask the question: What is the smallest density q such that it is possible to reliably detect the planted dense subgraph? When the subgraph size K is deterministic , this question has been thoroughly investigated by Arias-Castro and Verzelen[8, 38] for general ( N, K, p, q ) and the statistical limit with sharp constants has obtained in certainasymptotic regime. Their analysis treats the dense regime log(1 ∨ ( Kq ) − ) = o (log NK ) [8] andsparse regime log NK = O (log(1 ∨ ( Kq ) − )) [38] separately. Here as we focus on the special case of p = cq and are only interested in characterizations within absolute constants, we provide a simplenon-asymptotic analysis which treats the dense and sparse regimes in a uniﬁed manner. Our resultsdemonstrate that the PDS problem in Deﬁnition 1 has the same statistical detection limit as the

PDS problem with a deterministic size K studied in [8, 38]. By the deﬁnition of the total variation distance, the optimal testing error probability is deter-mined by the total variation distance between the distributions under the null and the alternativehypotheses: min φ : { , } N ( N − / →{ , } ( P { φ ( G ) = 1 } + P { φ ( G ) = 0 } ) = 1 − d TV ( P , P ) . The following result (proved in Section A.1) shows that if q = O ( K log eNK ∧ N K ), then there existsno test which can detect the planted subgraph reliably. Proposition 1.

Suppose p = cq for some constant c > . There exists a function h : R + → R + satisfying h (0+) = 0 such that the following holds: For any ≤ K ≤ N , C > and q ≤ C ( K log eNK ∧ N K ) , d TV ( P , P ) ≤ h ( Cc ) + exp( − K/ . (2)6 .2 Upper Bound Let A denote the adjacency matrix of the graph G . The detection limit can be achieved by thelinear test statistic and scan test statistic proposed in [8, 38]: T lin , X i

Suppose p = cq for a constant c > . For the linear test statistic, set τ = (cid:0) N (cid:1) q + (cid:0) K (cid:1) ( p − q ) / . For the scan test statistic, set τ = (cid:0) K (cid:1) ( p + q ) / . Then there exists a constant C which only depends on c such that P [ T lin > τ ] + P [ T lin ≤ τ ] ≤ (cid:18) − C K qN (cid:19) + exp (cid:18) − K (cid:19) P [ T scan > τ ] + P [ T scan ≤ τ ] ≤ (cid:18) K log N eK − CK q (cid:19) + exp (cid:18) − K (cid:19) . To illustrate the implications of the above lower and upper bounds, consider the

PDS ( N, K, p, q )problem with the parametrization p = cq , q = N − α and K = N β for α > β ∈ (0 ,

1) and c >

1. In this asymptotic regime, the fundamental detection limit is characterized by the followingfunction β ∗ ( α ) , α ∧ (cid:18)

12 + α (cid:19) , (4)which gives the statistical boundary in Fig. 1. Indeed, if β < β ∗ ( α ), as a consequence of Propo-sition 1, P { φ ( G ) = 1 } + P { φ ( G ) = 0 } → β > β ∗ ( α ),then Proposition 2 implies that the test φ ( G ) = { T lin >τ or T scan >τ } achieves vanishing Type-I+IIerror probabilities. More precisely, the linear test succeeds in the regime β > + α , while the scantest succeeds in the regime β > α .Note that T lin can be computed in linear time. However, computing T scan amounts to enumer-ating all subsets of [ N ] of cardinality K , which can be computationally intensive. Therefore it isunclear whether there exists a polynomial-time solver in the regime α < β < + α . Assuming the PC Hypothesis, this question is resolved in the negative in the next section.

In this section, we establish the computational lower bounds for the

PDS problem assuming theintractability of the planted clique problem. We show that the

PDS problem can be approximatelyreduced from the PC problem of appropriately chosen parameters in randomized polynomial time.Based on this reduction scheme, we establish a formal connection between the PC problem and the PDS problem in Proposition 3, and the desired computational lower bounds follow as Theorem 1.We aim to reduce the PC ( n, k, γ ) problem to the PDS ( N, K, cq, q ) problem. For simplicity, wefocus on the case of c = 2; the general case follows similarly with a change in some numerical7onstants that come up in the proof. We are given an adjacency matrix A ∈ { , } n × n , or equiva-lently, a graph G, and with the help of additional randomness, will map it to an adjacency matrix e A ∈ { , } N × N , or equivalently, a graph e G such that the hypothesis H C0 (resp. H C1 ) in Deﬁnition 2 ismapped to H exactly (resp. H approximately) in Deﬁnition 1. In other words, if A is drawn from G ( n, γ ), then e A is distributed according to P ; If A is drawn from G ( n, k, , γ ), then the distributionof e A is close in total variation to P .Our reduction scheme works as follows. Each vertex in e G is randomly assigned a parent vertexin G, with the choice of parent being made independently for diﬀerent vertices in e G, and uniformlyover the set [ n ] of vertices in G. Let V s denote the set of vertices in e G with parent s ∈ [ n ] and let ℓ s = | V s | . Then the set of children nodes { V s : s ∈ [ n ] } form a random partition of [ N ]. For any1 ≤ s ≤ t ≤ n, the number of edges, E ( V s , V t ), from vertices in V s to vertices in V t in e G will beselected randomly with a conditional probability distribution speciﬁed below. Given E ( V s , V t ) , theparticular set of edges with cardinality E ( V s , V t ) is chosen uniformly at random.It remains to specify, for 1 ≤ s ≤ t ≤ n, the conditional distribution of E ( s, t ) given l s , l t , and A s,t . Ideally, conditioned on ℓ s and ℓ t , we want to construct a Markov kernel from A s,t to E ( s, t )which maps Bern(1) to the desired edge distribution Binom( ℓ s ℓ t , p ), and Bern( γ ) to Binom( ℓ s ℓ t , q ),depending on whether both s and t are in the clique or not, respectively. Such a kernel, unfortu-nately, provably does not exist. Nonetheless, this objective can be accomplished approximately interms of the total variation. For s = t ∈ [ n ] , let E ( V s , V t ) ∼ Binom( (cid:0) ℓ t (cid:1) , q ) . For 1 ≤ s < t ≤ n ,denote P ℓ s ℓ t , Binom( ℓ s ℓ t , p ) and Q ℓ s ℓ t , Binom( ℓ s ℓ t , q ). Fix 0 < γ ≤ and put m , ⌊ log (1 /γ ) ⌋ .Deﬁne P ′ ℓ s ℓ t ( m ) =  P ℓ s ℓ t ( m ) + a ℓ s ℓ t for m = 0 ,P ℓ s ℓ t ( m ) for 1 ≤ m ≤ m , γ Q ℓ s ℓ t ( m ) for m < m ≤ ℓ s ℓ t . where a ℓ s ℓ t = P m ℓ. (5)The next proposition (proved in Section A.3) shows that the randomized reduction deﬁned abovemaps G ( n, γ ) into G ( N, q ) under the null hypothesis and G ( n, k, γ ) approximately into G ( N, K, p, q )under the alternative hypothesis, respectively. The intuition behind the reduction scheme is asfollows: By construction, (1 − γ ) Q ′ ℓ s ℓ t + γP ′ ℓ s ℓ t = Q ℓ s ℓ t = Binom( ℓ s ℓ t , q ) and therefore the nulldistribution of the PC problem is exactly matched to that of the PDS problem, i.e., P e G | H C = P . Thecore of the proof lies in establishing that the alternative distributions are approximately matched.The key observation is that P ′ ℓ s ℓ t is close to P ℓ s ℓ t = Binom( ℓ s ℓ t , p ) and thus for nodes with distinctparents s = t in the planted clique, the number of edges E ( V s , V t ) is approximately distributed asthe desired Binom( ℓ s ℓ t , p ); for nodes with the same parent s in the planted clique, even though E ( V s , V s ) is distributed as Binom( (cid:0) ℓ s (cid:1) , q ) which is not suﬃciently close to the desired Binom( (cid:0) ℓ s (cid:1) , p ),after averaging over the random partition { V s } , the total variation distance becomes negligible. Proposition 3.

Let ℓ, n ∈ N , k ∈ [ n ] and γ ∈ (0 , ] . Let N = ℓn , K = kℓ , p = 2 q and m = ⌊ log (1 /γ ) ⌋ . Assume that qℓ ≤ and k ≥ eℓ . If G ∼ G ( n, γ ) , then e G ∼ G ( N, q ) , i.e., e G | H C = P . If G ∼ G ( n, k, , γ ) , then d TV (cid:16) P e G | H C , P (cid:17) ≤ e − K + 1 . ke − ℓ + 2 k (8 qℓ ) m +1 + 0 . p e e qℓ − √ . ke − ℓ . (6)An immediate consequence of Proposition 3 is the following result (proved in Section A.4)showing that any PDS solver induces a solver for a corresponding instance of the PC problem. Proposition 4.

Let the assumption of Proposition 3 hold. Suppose φ : { , } ( N ) → { , } is a testfor PDS ( N, K, q, q ) with Type-I+II error probability η . Then G φ ( e G ) is a test for the PC ( n, k, γ ) whose Type-I+II error probability is upper bounded by η + ξ with ξ given by the right-hand side of(6). The following theorem establishes the computational limit of the

PDS problem as shown inFig. 1.

Theorem 1.

Assume Hypothesis 1 holds for a ﬁxed < γ ≤ / . Let m = ⌊ log (1 /γ ) ⌋ . Let α > and < β < be such that α < β <

12 + m α + 44 m α + 4 α − m α . (7) Then there exists a sequence { ( N ℓ , K ℓ , q ℓ ) } ℓ ∈ N satisfying lim ℓ →∞ log q ℓ log N ℓ = α, lim ℓ →∞ log K ℓ log N ℓ = β such that for any sequence of randomized polynomial-time tests φ ℓ : { , } ( Nℓ ) → { , } for the PDS ( N ℓ , K ℓ , q ℓ , q ℓ ) problem, the Type-I+II error probability is lower bounded by lim inf ℓ →∞ P { φ ℓ ( G ′ ) = 1 } + P { φ ℓ ( G ′ ) = 0 } ≥ , where G ′ ∼ G ( N, q ) under H and G ′ ∼ G ( N, K, p, q ) under H . Consequently, if Hypothesis 1holds for all < γ ≤ / , then the above holds for all α > and < β < such that α < β < β ♯ ( α ) ,

12 + α . (8) Remark 1.

Consider the asymptotic regime given by (1). The function β ♯ in (8) gives the com-putational barrier for the PDS ( N, K, p, q ) problem (see Fig. 1). Compared to the statistical limit β ∗ given in (4), we note that β ∗ ( α ) < β ♯ ( α ) if and only if α < , in which case computationaleﬃciency incurs a signiﬁcant penalty on the detection performance. Interestingly, this phenomenonis in line with the observation reported in [30] for the noisy submatrix detection problem, wherethe statistical limit can be attained if and only if the submatrix size exceeds the (2 / th power ofthe matrix size. In this section, we discuss the extension of our results to: (1) the planted dense subgraph recoveryand

DKS problem; (2) the

PDS problem where the planted dense subgraph has a deterministic size.(3) the bipartite

PDS problem; 9 .1 Recovering Planted Dense Subgraphs and

DKS

Problem

Closely related to the

PDS detection problem is the recovery problem, where given a graph generatedfrom G ( N, K, p, q ), the task is to recover the planted dense subgraph. As a consequence of ourcomputational lower bound for detection, we discuss implications on the tractability of the recoveryproblem as well as the closely related

DKS problem as illustrated in Fig. 2.Consider the asymptotic regime of (1), where it has been shown [15, 5] that recovery is possibleif and only if β > α and α <

1. Note that in this case the recovery problem is harder than ﬁndingthe

DKS , because if the planted dense subgraph is recovered with high probability, we can obtaina (1 + ǫ )-approximation of the densest K -subgraph for any ǫ > Results in[15, 5] imply that the planted dense subgraph can be recovered in polynomial time in the simple(green) regime of Fig. 2 where β > + α . Consequently (1 + ǫ )-approximation of the DKS can befound eﬃciently in this regime.Conversely, given a polynomial time η -factor approximation algorithm to the DKS problem withthe output b S , we can distinguish H : G ∼ G ( N, q ) versus H : G ∼ G ( N, K, p = cq, q ) if β > α and c > η in polynomial time as follows: Fix any positive ǫ > − ǫ ) c > (1 + ǫ ) η .Declare H if the density of b S is larger than (1 + ǫ ) q and H otherwise. Assuming β > α , one canshow that the density of b S is at most (1 + ǫ ) q under H and at least (1 − ǫ ) p/η under H . Hence,our computational lower bounds for the PC problem imply that the densest K -subgraph as well asthe planted dense subgraph is hard to approximate to any constant factor if α < β < β ♯ ( α ) (thered regime in Fig. 1). Whether DKS is hard to approximate with any constant factor in the blueregime of β ♯ ( α ) ∨ α ≤ β ≤ + α is left as an interesting open problem. p = cq = Θ( n − α ) K = Θ( n β )1 / / O αβ Figure 2: The simple (green), hard (red), impossible (gray) regimes for recovering planted densesubgraphs, and the hardness in the blue regime remains open.

PDS

Problem with a Deterministic Size

In the

PDS problem with a deterministic size K , the null distribution corresponds to the Erd˝os-R´enyi graph G ( N, q ); under the alternative, we choose K vertices uniformly at random to planta dense subgraph with edge probability p . Although the subgraph size under our PDS model is If the planted dense subgraph size is smaller than K , output any K -subgraph containing it; otherwise outputany of its K -subgraph. K , it is not entirely clear whether these two models are equivalent. Although our reduction schemein Section 3 extends to the ﬁxed-size model with V n being the random ℓ -partition of [ N ] with | V t | = ℓ for all t ∈ [ n ], so far we have not been able to prove the alternative distributions areapproximately matched: The main technical hurdle lies in controlling the total variation betweenthe distribution of { E ( V t , V t ) , t ∈ [ n ] } after averaging over the random ℓ -partition { V t } and thedesired distribution.Nonetheless, our result on the hardness of solving the PDS problem extends to the case ofdeterministic dense subgraph size if the tests are required to be monotone. (A test φ is monotoneif φ ( G ) = 1 implies φ ( G ′ ) = 1 whenever G ′ is obtained by adding edges to G .) It is intuitive toassume that any reasonable test should be more likely to declare the existence of the planted densesubgraph if the graph contains more edges, such as the linear and scan test deﬁned in (3). Moreover,by the monotonicity of the likelihood ratio, the statistically optimal test is also monotone. If werestrict our scope to monotone tests, then our computational lower bound implies that for the PDS problem with a deterministic size, there is no eﬃciently computable monotone test in the hardregime of α < β < β ♯ in Fig. 1. In fact, for a given monotone polynomial-time solver φ for the PDS problem with size K , the PDS ( N, K, p, q ) can be solved by φ in polynomial time because with highprobability the planted dense subgraph is of size at least K . It is an interesting open problem toprove the computational lower bounds without restricting to monotone tests, or prove the optimalpolynomial-time tests are monotone. We conjecture that the computational limit of PDS of ﬁxedsize is identical to that of the random size, which can indeed by established in the bipartite case asdiscussed in the next subsection.Finally, we can show that the

PDS recovery problem with a deterministic planted dense subgraphsize K is computationally intractable if α < β < β ♯ ( α ) (the red regime in Fig. 1). This followsfrom the fact that given a polynomial-time algorithm for the PDS recovery problem with size K ,we can construct a polynomial-time solver for PDS ( N, K, p = cq, q ) if α < β (See Appendix B fora formal statement and the proof). PDS

Problem

Let G b ( N, q ) denote the bipartite Erd˝os-R´enyi random graph model with N top vertices and N bottom vertices. Let G b ( N, K, p, q ) denote the bipartite variant of the planted densest subgraphmodel in Deﬁnition 1 with a planted dense subgraph of K top vertices and K bottom vertices onaverage. The bipartite PDS problem with parameters (

N, K, p, q ), denoted by

BPDS ( N, K, p, q ),refers to the problem of testing H : G ∼ G b ( N, q ) versus H : G ∼ G b ( N, K, p, q ).Consider the asymptotic regime of (1). Following the arguments in Section 2, one can show thatthe statistical limit is given by β ∗ deﬁned in (4). To derive computational lower bounds, we usethe reduction from the bipartite PC problem with parameters ( n, k, γ ), denoted by BPC ( n, k, γ ),which tests H : G ∼ G b ( n, γ ) versus H : G ∼ G b ( n, k, γ ), where G b ( n, k, γ ) is the bipartite variantof the planted clique model with a planted bi-clique of size k × k . The BPC

Hypothesis refersto the assumption that for some constant 0 < γ ≤ /

2, no sequence of randomized polynomial-time tests for

BPC succeeds if lim sup n →∞ log k n log n < /

2. The reduction scheme from

BPC ( n, k, γ ) to BPDS ( N, K, q, q ) is analogue to the scheme used in non-bipartite case. The proof of computationallower bounds in bipartite graph is much simpler. In particular, under the null hypothesis, G ∼G b ( n, γ ) and one can verify that e G ∼ G b ( N, q ). Under the alternative hypothesis, G ∼ G b ( n, k, γ ).Lemma 1 directly implies that the total variation distance between the distribution of e G and G b ( N, K, q, q ) is on the order of k ( qℓ ) ( m +1) . Then, following the arguments in Proposition 4and Theorem 1, we conclude that if the BPC

Hypothesis holds for any positive γ , then no eﬃciently11omputable test can solve BPDS ( N, K, q, q ) in the regime α < β < β ♯ ( α ) given by (8). The sameconclusion also carries over to the bipartite PDS problem with a deterministic size K and thestatistical and computational limits shown in Fig. 1 apply verbatim. References [1] E. Abbe, A. S. Bandeira, and G. Hall. Exact recovery in the stochastic block model.

Arxivpreprint arXiv:1405.3267 , 2014.[2] N. Alon, A. Andoni, T. Kaufman, K. Matulef, R. Rubinfeld, and N. Xie. Testing k -wise andalmost k -wise independence. In Proceedings of the thirty-ninth annual ACM symposium onTheory of computing , pages 496–505. ACM, 2007.[3] N. Alon, S. Arora, R. Manokaran, D. Moshkovitz, and O. Weinstein. Inapprox-imabilty of densest κ -subgraph from average case hardness. ∼ rajsekar/ papers/ dks.pdf , 2011.[4] N. Alon, M. Krivelevich, and B. Sudakov. Finding a large hidden clique in a random graph. Random Structures and Algorithms , 13(3-4), 1998.[5] B. P. Ames. Robust convex relaxation for the planted clique and densest k -subgraph problems. arXiv:1305.4891 , 2013.[6] B. P. Ames and S. A. Vavasis. Nuclear norm minimization for the planted clique and bicliqueproblems. Mathematical programming , 129(1):69–89, 2011.[7] B. Applebaum, B. Barak, and A. Wigderson. Public-key cryptography from diﬀerent assump-tions. In

Proceedings of the Forty-second ACM Symposium on Theory of Computing ∼ boaz/Papers/ncpkcFull1.pdf.[8] E. Arias-Castro and N. Verzelen. Community detection in dense random networks. The Annalsof Statistics , 42(3):940–969, 06 2014.[9] S. Arora, B. Barak, M. Brunnermeier, and R. Ge. Computational complexity and informationasymmetry in ﬁnancial products. In

Innovations in Computer Science (ICS 2010) ∼ rongge/derivativelatest.pdf .[10] S. Balakrishnan, M. Kolar, A. Rinaldo, A. Singh, and L. Wasserman. Statistical and com-putational tradeoﬀs in biclustering. In NIPS 2011 Workshop on Computational Trade-oﬀs inStatistical Learning .[11] Q. Berthet and P. Rigollet. Complexity theoretic lower bounds for sparse principal componentdetection.

J. Mach. Learn. Res. , 30:1046–1066 (electronic), 2013.[12] A. Bhaskara, M. Charikar, E. Chlamtac, U. Feige, and A. Vijayaraghavan. Detecting high log-densities: An o ( n / ) approximation for densest k -subgraph. In Proceedings of the Forty-secondACM Symposium on Theory of Computing , STOC ’10, pages 201–210, 2010.[13] C. Butucea and Y. I. Ingster. Detection of a sparse submatrix of a high-dimensional noisymatrix.

Bernoulli , 19(5B):2652–2688, 11 2013.1214] V. Chandrasekaran and M. I. Jordan. Computational and statistical tradeoﬀs via convexrelaxation.

PNAS , 110(13):E1181–E1190, 2013.[15] Y. Chen and J. Xu. Statistical-computational tradeoﬀs in planted problems and submatrixlocalization with a growing number of clusters and submatrices. arXiv:1402.1267 , 2014.[16] A. Decelle, F. Krzakala, C. Moore, and L. Zdeborova. Asymptotic analysis of the stochas-tic block model for modular networks and its algorithmic applications.

Physics Review E ,84:066106, 2011.[17] Y. Dekel, O. Gurel-Gurevich, and Y. Peres. Finding hidden cliques in linear time with highprobability. arxiv:1010.2997 , 2010.[18] Y. Deshpande and A. Montanari. Finding hidden cliques of size p N/e in nearly linear time. arxiv:1304.7047 , 2012.[19] D. Dubhashi and D. Ranjan. Balls and bins: A study in negative dependence.

RandomStructures and Algorithms , 13(2):99–124, 1998.[20] U. Feige and R. Krauthgamer. Finding and certifying a large hidden clique in a semirandomgraph.

Random Structures & Algorithms , 16(2):195–208, 2000.[21] U. Feige and D. Ron. Finding hidden cliques in linear time. In , pages 189–203, 2010.[22] V. Feldman, E. Grigorescu, L. Reyzin, S. Vempala, and Y. Xiao. Statistical algorithms and alower bound for detecting planted cliques. In

Proceedings of the 45th annual ACM symposiumon Symposium on theory of computing , pages 655–664, 2013.[23] S. Fortunato. Community detection in graphs.

Physics Reports , 486(3):75–174, 2010.[24] B. Hajek, Y. Wu, and J. Xu. Achieving exact cluster recovery threshold via semideﬁniteprogramming. preprint, arxiv:1412.6156 , Nov 2014.[25] E. Hazan and R. Krauthgamer. How hard is it to approximate the best Nash equilibrium?

SIAM Journal on Computing , 40(1):79–91, 2011.[26] M. Jerrum. Large cliques elude the metropolis process.

Random Structures & Algorithms ,3(4):347–359, 1992.[27] A. Juels and M. Peinado. Hiding cliques for cryptographic security.

Designs, Codes & Crypto. ,2000.[28] M. Kolar, S. Balakrishnan, A. Rinaldo, and A. Singh. Minimax localization of structuralinformation in large noisy matrices. In

NIPS , 2011.[29] L. Kuˇcera. Expected complexity of graph partitioning problems.

Discrete Applied Mathematics ,57(2):193–212, 1995.[30] Z. Ma and Y. Wu. Computational barriers in minimax submatrix detection. to appear in

TheAnnals of Statistics, arXiv:1309.5914 , 2015.1331] L. Massouli´e. Community detection thresholds and the weak Ramanujan property. arxiv:1109.3318 , 2013.[32] F. McSherry. Spectral partitioning of random graphs. In

FOCS , pages 529 – 537, 2001.[33] M. Mitzenmacher and E. Upfal.

Probability and Computing: Randomized Algorithms andProbabilistic Analysis . Cambridge University Press, New York, NY, USA, 2005.[34] E. Mossel, J. Neeman, and A. Sly. Stochastic block models and reconstruction. available at:http://arxiv.org/abs/1202.1499 , 2012.[35] E. Mossel, J. Neeman, and A. Sly. A proof of the block model threshold conjecture. arxiv:1311.4115 , 2013.[36] E. Mossel, J. Neeman, and A. Sly. Consistency thresholds for binary symmetric block models.

Arxiv preprint arXiv:1407.1591 ∼ romanv/papers/decoupling-simple.pdf, 2011.[38] N. Verzelen and E. Arias-Castro. Community detection in sparse random networks. arXiv:1308.2955 , 2013.[39] J. Xu, R. Wu, K. Zhu, B. Hajek, R. Srikant, and L. Ying. Jointly clustering rows and columns ofbinary matrices: Algorithms and trade-oﬀs. SIGMETRICS Perform. Eval. Rev. , 42(1):29–41,June 2014.

A Proofs

A.1 Proof of Proposition 1

Proof.

Let P A || S | denote the distribution of A conditional on | S | under the alternative hypothesis.Since | S | ∼ Binom(

N, K/N ), by the Chernoﬀ bound, P [ | S | > K ] ≤ exp( − K/ d TV ( P , P ) = d TV ( P , E | S | [ P A || S | ]) ≤ E | S | (cid:2) d TV ( P , P A || S | ) (cid:3) ≤ exp( − K/

8) + X K ′ ≤ K d TV ( P , P A || S | = K ′ ) P [ | S | = K ′ ] , (9)where the ﬁrst inequality follows from the convexity of ( P, Q ) d TV ( P, Q ), Next we condition on | S | = K ′ for a ﬁxed K ′ ≤ K . Then S is uniformly distributed over all subsets of size K ′ . Let e S be an independent copy of S . Then | S ∩ e S | ∼ Hypergeometric(

N, K ′ , K ′ ). By the deﬁnition of the14 -divergence and Fubini’s theorem, χ ( P A || S | = K ′ k P ) = Z E S [ P A | S ] E e S [ P A | e S ] P − E S ⊥⊥ e S "Z P A | S P A | e S P − E S ⊥⊥ e S (cid:18) p − q ) q (1 − q ) (cid:19) ( | S ∩ e S | )  − ≤ E S ⊥⊥ e S " exp ( c − q − q (cid:18) | S ∩ e S | (cid:19)! − ( a ) ≤ E h exp (cid:16) ( c − cq | S ∩ e S | (cid:17)i − ( b ) ≤ τ ( Cc ) − , where ( a ) is due to the fact that q = pc ≤ c ; ( b ) follows from Lemma 6 in Appendix C with anappropriate choice of function τ : R + → R + satisfying τ (0+) = 1. Therefore, we get that2 d ( P , P A || S | = K ′ ) ≤ log( χ ( P A || S | = K ′ k P ) + 1) ≤ log( τ ( Cc )) , (10)Combining (9) and (10) yields (2) with h , log ◦ τ . A.2 Proof of Proposition 2

Proof.

Let

C > c and may change line by line.Under P , T lin ∼ Binom (cid:16)(cid:0) N (cid:1) , q (cid:17) . By the Bernstein inequality, P [ T lin > τ ] ≤ exp − (cid:0) K (cid:1) ( p − q ) / (cid:0) N (cid:1) q + (cid:0) K (cid:1) ( p − q ) / ! ≤ exp (cid:18) − C K qN (cid:19) . Under P , Since | S | ∼ Binom(

N, K/N ), by the Chernoﬀ bound, P [ | S | < . K ] ≤ exp( − K/ | S | = K ′ for some K ′ ≥ . K , then T lin is distributed as an independent sum ofBinom (cid:16)(cid:0) K ′ (cid:1) , p (cid:17) and Binom (cid:16)(cid:0) N (cid:1) − (cid:0) K ′ (cid:1) , q (cid:17) . By the multiplicative Chernoﬀ bound (see, e.g., [33,Theorem 4.5]), P [ T lin ≤ τ ] ≤ P [ | S | < . K ] + exp  − (cid:16) (cid:0) K ′ (cid:1) − (cid:0) K (cid:1)(cid:17) ( p − q ) (cid:16)(cid:0) N (cid:1) q + (cid:0) K ′ (cid:1) ( p − q ) (cid:17)  ≤ exp (cid:18) − K (cid:19) + exp (cid:18) − C K qN (cid:19) . S of size K , P i,j ∈ S A ij ∼ Binom (cid:16)(cid:0) K (cid:1) , q (cid:17) . By the union bound and the Bernstein inequality, P [ T scan > τ ] ≤ (cid:18) NK (cid:19) P [ X ≤ i τ ] ≤ (cid:18) N eK (cid:19) K exp − (cid:0) K (cid:1) ( p − q ) / (cid:0) K (cid:1) q + (cid:0) K (cid:1) ( p − q ) / ! ≤ exp (cid:18) K log N eK − CK q (cid:19) . Under the alternative hypothesis, conditional on | S | = K ′ for some K ′ ≥ . K , P i,j ∈ S A ij ∼ Binom (cid:16)(cid:0) K ′ (cid:1) , p (cid:17) and thus T scan is stochastically dominated by Binom (cid:16)(cid:0) K ′ ∧ K (cid:1) , p (cid:17) . By the multi-plicative Chernoﬀ bound, P [ T scan ≤ τ ] ≤ P [ | S | < . K ] + exp  − (cid:16) (cid:0) K ′ ∧ K (cid:1) − (cid:0) K (cid:1)(cid:17) ( p − q ) (cid:0) K ′ ∧ K (cid:1) p  ≤ exp (cid:18) − K (cid:19) + exp (cid:0) − CK q (cid:1) . A.3 Proof of Proposition 3

We ﬁrst introduce several key auxiliary results used in the proof. The following lemma ensuresthat P ′ ℓ s ℓ t and Q ′ ℓ s ℓ t are well-deﬁned under suitable conditions and that P ′ ℓ s ℓ t and P ℓ s ,ℓ t are close intotal variation. Lemma 1.

Suppose that p = 2 q and qℓ ≤ . Fix { ℓ t } such that ℓ t ≤ ℓ for all t ∈ [ k ] . Then forall ≤ s < t ≤ k , P ′ ℓ s ℓ t and Q ′ ℓ s ℓ t are probability measures and d TV ( P ′ ℓ s ℓ t , P ℓ s ℓ t ) ≤ qℓ ) ( m +1) . Proof.

Fix an ( s, t ) such that 1 ≤ s < t ≤ k . We ﬁrst show that P ′ ℓ s ℓ t and Q ′ ℓ s ℓ t are well-deﬁned.By deﬁnition, P ℓ s ℓ t m =0 P ′ ℓ s ℓ t ( m ) = P ℓ s ℓ t m =0 Q ′ ℓ s ℓ t ( m ) = 1 and it suﬃces to show positivity, i.e., P ℓ s ℓ t (0) + a ℓ s ℓ t ≥ , (11) Q ℓ s ℓ t ( m ) ≥ γP ′ ℓ s ℓ t ( m ) , ∀ ≤ m ≤ m . (12)Recall that P ℓ s ℓ t ∼ Binom( ℓ s ℓ t , p ) and Q ℓ s ℓ t ∼ Binom( ℓ s ℓ t , q ). Therefore, Q ℓ s ℓ t ( m ) = (cid:18) ℓ s ℓ t m (cid:19) q m (1 − q ) ℓ s ℓ t − m , P ℓ s ℓ t ( m ) = (cid:18) ℓ s ℓ t m (cid:19) p m (1 − p ) ℓ s ℓ t − m , ∀ ≤ m ≤ ℓ s ℓ t , It follows that1 γ Q ℓ s ℓ t ( m ) − P ℓ s ℓ t ( m ) = 1 γ (cid:18) ℓ s ℓ t m (cid:19) q m (1 − q ) ℓ s ℓ t − m "(cid:18) − q − q (cid:19) ℓ s ℓ t − m − m γ . m = ⌊ log (1 /γ ) ⌋ and thus Q ℓ s ℓ t ( m ) ≥ γP ℓ s ℓ t ( m ) for all m ≤ m . Furthermore, Q ℓ s ℓ t (0) = (1 − q ) ℓ s ℓ t ≥ (1 − qℓ s ℓ t ) ≥ − qℓ ≥ ≥ γ ≥ γP ′ ℓ s ℓ t (0) , and thus (12) holds. Recall that a ℓ s ℓ t = X m qℓ ≤ /

2, it follows that1 γ X m m (2 ℓ s ℓ t q ) m ≤ qℓ ) ( m +1) , (13)and therefore a ℓ s ℓ t ≥ − /

2. Furthermore, P ℓ s ℓ t (0) = (1 − p ) ℓ s ℓ t ≥ − pℓ s ℓ t ≥ − qℓ ≥ / , and thus (11) holds.Next we bound d TV (cid:0) P ′ ℓ s ℓ t , P ℓ s ℓ t (cid:1) . Notice that X m m ( ℓ s ℓ t p ) m ≤ qℓ ) ( m +1) . (14)Therefore, by the deﬁnition of the total variation distance and a ℓ s ℓ t , d TV ( P ′ ℓ s ℓ t , P ℓ s ℓ t ) = 12 | a ℓ s ℓ t | + 12 X m

Let P Y | X be a Markov kernel from X to Y and denote the marginal of Y by P Y = E X ∼ P X [ P Y | X ] . Let Q Y be such that P Y | X = x ≪ Q Y for all x . Let E be a measurable subset of X .Deﬁne g : X → ¯ R + by g ( x, e x ) , Z d P Y | X = x d P Y | X = e x d Q .

Then d TV ( P Y , Q Y ) ≤ P X ( E c ) + 12 r E h g ( X, e X ) E ( X ) E ( e X ) i − P X ( E c ) , (15) where e X is an independent copy of X ∼ P X . roof. By deﬁnition of the total variation distance, d TV ( P Y , Q Y ) = 12 k P Y − Q Y k ≤ k E [ P Y | X ] − E [ P Y | X { X ∈ E } ] k + 12 k E [ P Y | X { X ∈ E } ] − Q Y k , where the ﬁrst term is k E [ P Y | X ] − E [ P Y | X { X ∈ E } ] k = k E [ P Y | X { X E } ] k = P { X E } . Thesecond term is controlled by k E [ P Y | X { X ∈ E } ] − Q Y k = E Q Y "(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) E (cid:2) P Y | X { X ∈ E } (cid:3) Q Y − (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ E Q Y  E (cid:2) P Y | X { X ∈ E } (cid:3) Q Y − !  (16)= E Q Y  E (cid:2) P Y | X { X ∈ E } (cid:3) Q Y !  + 1 − E [ E (cid:2) P Y | X { X ∈ E } (cid:3) ] (17)= E h g ( X, ˜ X ) E ( X ) E ( e X ) i + 1 − P { X ∈ E } , (18)where (16) is Cauchy-Schwartz inequality, (18) follows from Fubini theorem. This proves the desired(15).Note that { V t : t ∈ [ n ] } can be equivalently generated as follows: Throw balls indexed by [ N ]into bins indexed by [ n ] independently and uniformly at random; let V t denote the set of balls in the t th bin. Furthermore, Fix a subset C ⊂ [ n ] and let S = ∪ t ∈ C V t . Conditioned on S , { V t : t ∈ C } canbe generated by throwing balls indexed by S into bins indexed by C independently and uniformlyat random. We need the following negative association property [19, Deﬁnition 1]. Lemma 3.

Fix a subset C ⊂ [ n ] and let S = ∪ t ∈ C V t . Let { e V t : t ∈ C } be an independent copyof { V t : t ∈ C } conditioned on S . Then conditioned on S , the full vector {| V s ∩ e V t | : s, t ∈ C } isnegatively associated, i.e., for every two disjoint index sets I, J ⊂ C × C , E [ f ( V s ∩ e V t , ( s, t ) ∈ I ) g ( V s ∩ e V t , ( s, t ) ∈ J )] ≤ E [ f ( V s ∩ e V t , ( s, t ) ∈ I )] E [ g ( V s ∩ e V t , ( s, t ) ∈ J )] , for all functions f : R | I | → R and g : R | J | → R that are either both non-decreasing or bothnon-increasing in every argument.Proof. Deﬁne the indicator random variables Z m,s,t for m ∈ S, s, t ∈ C as Z m,s,t = (cid:26) m th ball is contained in V s and e V t , . By [19, Proposition 12], the full vector { Z m,s,t : m ∈ S, s, t ∈ C } is negatively associated. Bydeﬁnition, we have | V s ∩ e V t | = X m ∈ S Z m,s,t , which is a non-decreasing function of { Z m,s,t : m ∈ S } . Moreover, for distinct pairs ( s, t ) = ( s ′ , t ′ ),the sets { ( m, s, t ) : m ∈ S } and { ( m, s ′ , t ′ ) : m ∈ S } are disjoint. Applying [19, Proposition 8]yields the desired statement. 18he negative association property of {| V s ∩ e V t | : s, t ∈ C } allows us to bound the expectationof any non-decreasing function of {| V s ∩ e V t | : s, t ∈ C } conditional on C and S as if they wereindependent [19, Lemma 2], i.e., for any collection of non-decreasing functions { f s,t : s, t ∈ [ n ] } , E  Y s,t ∈ C f s,t ( | V s ∩ e V t | ) (cid:12)(cid:12)(cid:12)(cid:12) C, S  ≤ Y s,t ∈ C E (cid:20) f s,t ( | V s ∩ e V t | ) (cid:12)(cid:12)(cid:12)(cid:12) C, S (cid:21) . (19) Lemma 4.

Suppose that X ∼ Binom(1 . K, k ) and Y ∼ Binom(3 ℓ, ek ) with K = kℓ and k ≥ eℓ .Then for all ≤ m ≤ ℓ − , P [ X = m ] ≤ P [ Y = m ] , and P [ X ≥ ℓ ] ≤ P [ Y = 2 ℓ ] .Proof. In view of the fact that ( nm ) m ≤ (cid:0) nm (cid:1) ≤ ( enm ) m , we have for 1 ≤ m ≤ ℓ , P [ X = m ] = (cid:18) . Km (cid:19) (cid:18) k (cid:19) m (cid:18) − k (cid:19) . K − m ≤ (cid:18) . eKmk (cid:19) m . Therefore, P [ X ≥ ℓ ] ≤ ∞ X m =2 ℓ (cid:18) . eℓkm (cid:19) m ≤ ∞ X m =2 ℓ (cid:18) e k (cid:19) m ≤ (0 . e/k ) ℓ − . e/k . On the other hand, for 1 ≤ m ≤ ℓ − P [ Y = m ] = (cid:18) ℓm (cid:19) (cid:16) ek (cid:17) m (cid:16) − ek (cid:17) ℓ − m ≥ (cid:18) eℓmk (cid:19) m (cid:18) − eℓk (cid:19) ≥ m − (cid:18) . eℓmk (cid:19) m ≥ P [ X = m ] . Moreover, P [ Y = 2 ℓ ] ≥ P [ X ≥ ℓ ] . Lemma 5.

Let T ∼ Binom( ℓ, τ ) and λ > . Assume that λℓ ≤ . Then E [exp( λT ( T − ≤ exp (cid:0) λℓ τ (cid:1) . (20) Proof.

Let ( s , . . . , s ℓ , t , . . . , t ℓ ) i.i.d. ∼ Bern( τ ), S = P ℓi =1 s i and T = P ℓi =1 t i . Next we use a decou-pling argument to replace T − T by ST : E [exp ( λT ( T − E " exp λ X i = j t i t j ! ≤ E " exp λ X i = j s i t j ! , (21) ≤ E [exp (4 λST )] , λT ≤ λℓ ≤ and exp( x ) − ≤ exp( a ) x for all x ∈ [0 , a ], the desired (20) follows from E [exp (4 λST )] = E h (1 + τ (exp(4 λT ) − ℓ i ≤ E h (1 + 8 τ λT ) ℓ i ≤ E [exp (8 τ λℓT )]= (1 + τ (exp (8 τ λℓ ) − ℓ ≤ exp (cid:0) τ λℓ (cid:1) . Proof of Proposition 3.

Let [ i, j ] denote the unordered pair of i and j . For any set I ⊂ [ N ], let E ( I )denote the set of unordered pairs of distinct elements in I , i.e., E ( I ) = { [ i, j ] : i, j ∈ S, i = j } , andlet E ( I ) c = E ([ N ]) \ E ( I ). For s, t ∈ [ n ] with s = t , let e G V s V t denote the bipartite graph where theset of left (right) vertices is V s (resp. V t ) and the set of edges is the set of edges in e G from verticesin V s to vertices in V t . For s ∈ [ n ], let e G V s V s denote the subgraph of e G induced by V s . Let e P V s V t denote the edge distribution of e G V s V t for s, t ∈ [ n ].First, we show that the null distributions are exactly matched by the reduction scheme. Lemma 1implies that P ′ ℓ s ℓ t and Q ′ ℓ s ℓ t are well-deﬁned probability measures, and by deﬁnition, (1 − γ ) Q ′ ℓ s ℓ t + γP ′ ℓ s ℓ t = Q ℓ s ℓ t = Binom( ℓ s ℓ t , q ). Under the null hypothesis, G ∼ G ( n, γ ) and therefore, accordingto our reduction scheme, E ( V s , V t ) ∼ Binom( ℓ s ℓ t , q ) for s < t and E ( V t , V t ) ∼ Binom( (cid:0) ℓ t (cid:1) , q ). Sincethe vertices in V s and V t are connected uniformly at random such that the total number of edges is E ( V s , V t ), it follows that e P V s V t = Q ( i,j ) ∈ V s × V t Bern( q ) for s < t and e P V s V t = Q [ i,j ] ∈E ( V s ) Bern( q )for s = t . Conditional on V n , { E ( V s , V t ) : 1 ≤ s ≤ t ≤ n } are independent and so are { e G V s V t : 1 ≤ s ≤ t ≤ n } . Consequently, P e G | H C = P = Q [ i,j ] ∈E ([ N ]) Bern( q ) and e G ∼ G ( N, q ).Next, we proceed to consider the alternative hypothesis, under which G is drawn from theplanted clique model G ( n, k, γ ). Let C ⊂ [ n ] denote the planted clique. Deﬁne S = ∪ t ∈ C V t andrecall K = kℓ . Then | S | ∼ Binom(

N, K/N ) and conditional on | S | , S is uniformly distributed overall possible subsets of size | S | in [ N ]. By the symmetry of the vertices of G , the distribution of e A conditional on C does not depend on C . Hence, without loss of generality, we shall assume that C = [ k ] henceforth. The distribution of e A can be written as a mixture distribution indexed by therandom set S as e A ∼ e P , E S  e P SS × Y [ i,j ] ∈E ( S ) c Bern( q )  ,

20y the deﬁnition of P , d TV ( e P , P ) = d TV  E S  e P SS × Y [ i,j ] ∈E ( S ) c Bern( q )  , E S  Y [ i,j ] ∈E ( S ) Bern( p ) Y [ i,j ] ∈E ( S ) c Bern( q )  ≤ E S  d TV  e P SS × Y [ i,j ] ∈E ( S ) c Bern( q ) , Y [ i,j ] ∈E ( S ) Bern( p ) Y [ i,j ] ∈E ( S ) c Bern( q )  = E S  d TV  e P SS , Y [ i,j ] ∈E ( S ) Bern( p )  ≤ E S  d TV  e P SS , Y [ i,j ] ∈E ( S ) Bern( p )  {| S |≤ . K }  + exp( − K/ , (22)where the ﬁrst inequality follows from the convexity of ( P, Q ) d TV ( P, Q ), and the last inequalityfollows from applying the Chernoﬀ bound to | S | . Fix an S ⊂ [ N ] such that | S | ≤ . K . Deﬁne P V t V t = Q [ i,j ] ∈E ( V t ) Bern( q ) for t ∈ [ k ] and P V s V t = Q ( i,j ) ∈ V s × V t Bern( p ) for 1 ≤ s < t ≤ k . By thetriangle inequality, d TV  e P SS , Y [ i,j ] ∈E ( S ) Bern( p )  ≤ d TV  e P SS , E V k  Y ≤ s ≤ t ≤ k P V s V t (cid:12)(cid:12)(cid:12)(cid:12) S  (23)+ d TV  E V k  Y ≤ s ≤ t ≤ k P V s V t (cid:12)(cid:12)(cid:12)(cid:12) S  , Y [ i,j ] ∈E ( S ) Bern( p )  . (24)To bound the term in (23), ﬁrst note that conditional on S , { V k } can be generated as follows:Throw balls indexed by S into bins indexed by [ k ] independently and uniformly at random; let V t is the set of balls in the t th bin. Deﬁne the event E = { V k : | V t | ≤ ℓ, t ∈ [ k ] } . Since | V t | ∼ Binom( | S | , /k ) is stochastically dominated by Binom(1 . K, /k ) for each ﬁxed 1 ≤ t ≤ k ,it follows from the Chernoﬀ bound and the union bound that P { E c } ≤ k exp( − ℓ/ d TV  e P SS , E V k  Y ≤ s ≤ t ≤ k P V s V t (cid:12)(cid:12)(cid:12)(cid:12) S  ( a ) = d TV  E V k  Y ≤ s ≤ t ≤ k e P V s V t (cid:12)(cid:12)(cid:12)(cid:12) S  , E V k  Y ≤ s ≤ t ≤ k P V s V t (cid:12)(cid:12)(cid:12)(cid:12) S  ≤ E V k  d TV  Y ≤ s ≤ t ≤ k e P V s V t , Y ≤ s ≤ t ≤ k P V s V t  (cid:12)(cid:12)(cid:12)(cid:12) S  ≤ E V k  d TV  Y ≤ s ≤ t ≤ k e P V s V t , Y ≤ s ≤ t ≤ k P V s V t  { V k ∈ E } (cid:12)(cid:12)(cid:12)(cid:12) S  + k exp( − ℓ/ , where ( a ) holds because conditional on V k , n e A V s V t : s, t ∈ [ k ] o are independent. Recall that ℓ t =21 V t | . For any ﬁxed V k ∈ E , we have d TV  Y ≤ s ≤ t ≤ k e P V s V t , Y ≤ s ≤ t ≤ k P V s V t  ( a ) = d TV  Y ≤ s

4; ( b ) follows from the negative associationproperty of {| V s ∩ e V t | : s, t ∈ [ k ] } proved in Lemma 3 and (19), in view of the monotonicity of x e q ( x ∧ ℓ ) on R + ; ( c ) follows because | V s ∩ e V t | is stochastically dominated by Binom(1 . K, /k )for all ( s, t ) ∈ [ k ] ; ( d ) follows from Lemma 4; ( e ) follows from Lemma 5 with λ = q/ qℓ ≤ / d TV  e P SS , Y [ i,j ] ∈E ( S ) Bern( p )  ≤ . ke − ℓ + 0 . q e e qℓ − ke − ℓ ≤ . ke − ℓ + 0 . p e e qℓ − √ . ke − ℓ . (28)The proposition follows by combining (22), (23), (24), (25) and (28). A.4 Proof of Proposition 4

Proof.

By assumption the test φ satisﬁes P { φ ( G ′ ) = 1 } + P { φ ( G ′ ) = 0 } = η, where G ′ is the graph in PDS ( N, K, q, q ) distributed according to either P or P . Let G denotethe graph in the PC ( n, k, γ ) and e G denote the corresponding output of the randomized reduc-tion scheme. Proposition 3 implies that e G ∼ G ( N, q ) under H C0 . Therefore P H C0 { φ ( e G ) = 1 } = P { φ ( G ′ ) = 1 } . Moreover, | P H C1 { φ ( e G ) = 0 } − P { φ ( G ′ ) = 0 }| ≤ d TV ( P e G | H C , P ) ≤ ξ. It follows that P H C0 { φ ( e G ) = 1 } + P H C1 { φ ( e G ) = 0 } ≤ η + ξ. .5 Proof of Theorem 1 Proof.

Fix α > < β < α < β < min (cid:26) m δ δ α, − δ + 1 + 2 δ δ α (cid:27) (29)holds for some δ >

0. Let ℓ ∈ N and q ℓ = ℓ − (2+ δ ) . Deﬁne n ℓ = ⌊ ℓ δα − ⌋ , k ℓ = ⌊ ℓ (2+ δ ) βα − ⌋ , N ℓ = n ℓ ℓ, K ℓ = k ℓ ℓ. (30)Then lim ℓ →∞ log q ℓ log N ℓ = (2 + δ )(2 + δ ) /α − α, lim ℓ →∞ log K ℓ log N ℓ = (2 + δ ) β/α − δ ) /α − β. (31)Suppose that for the sake of contradiction there exists a small ǫ > { φ ℓ } for PDS ( N ℓ , K ℓ , q ℓ , q ℓ ), such that P { φ N ℓ ,K ℓ ( G ′ ) = 1 } + P { φ N ℓ ,K ℓ ( G ′ ) = 0 } ≤ − ǫ holds for arbitrarily large ℓ , where G ′ is the graph in the PDS ( N ℓ , K ℓ , q ℓ , q ℓ ). Since β > α , we have k ℓ ≥ ℓ δ . Therefore, 16 q ℓ ℓ ≤ k ℓ ≥ eℓ for all suﬃciently large ℓ . Applying Proposition 4, weconclude that G φ ( e G ) is a randomized polynomial-time test for PC ( n ℓ , k ℓ , γ ) whose Type-I+IIerror probability satisﬁes P H C0 { φ ℓ ( e G ) = 1 } + P H C1 { φ ℓ ( e G ) = 0 } ≤ − ǫ + ξ, (32)where ξ is given by the right-hand side of (6). By the deﬁnition of q ℓ , we have q ℓ ℓ = ℓ − δ and thus k ℓ ( q ℓ ℓ ) m +1 ≤ ℓ δ ) β/α − − ( m +1) δ ≤ ℓ − δ , where the last inequality follows from (29). Therefore ξ → ℓ → ∞ . Moreover, by the deﬁnitionin (30), lim ℓ →∞ log k ℓ log n ℓ = (2 + δ ) β/α − δ ) /α − ≤ − δ, where the above inequality follows from (29). Therefore, (32) contradicts our assumption thatHypothesis 1 holds for γ . Finally, if Hypothesis 1 holds for any γ >

0, (8) follows from (7) bysending γ ↓ B Computational Lower Bounds for Approximately Recovering aPlanted Dense Subgraph with Deterministic Size

Let e G ( N, K, p, q ) denote the planted dense subgraph model with N vertices and a deterministicdense subgraph size K : (1) A random set S of size K is uniformly chosen from [ N ]; (2) for anytwo vertices, they are connected with probability p if both of them are in S and with probability q otherwise, where p > q . Let PDSR ( n, K, p, q, ǫ ) denote the planted dense subgraph recoveryproblem, where given a graph generated from e G ( N, K, p, q ) and an ǫ <

1, the task is to output a set b S of size K such that b S is a (1 − ǫ )-approximation of S , i.e., | b S ∩ S | ≥ (1 − ǫ ) K . The following theoremimplies that PDSR ( N, K, p = cq, q, ǫ ) is at least as hard as PDS ( N, K, p = cq, q ) if Kq = Ω(log N ).Notice that in PDSR ( N, K, p, q, ǫ ), the planted dense subgraph has a deterministic size K , whilein PDS ( N, K, p, q ), the size of the planted dense subgraph is binomially distributed with mean K .24 heorem 2. For any constant ǫ < and c > , suppose there is an algorithm A N with runningtime T N that solves the PDSR ( N, K, cq, q, ǫ ) problem with probability − η N . Then there exists atest φ N with running time at most N + N T N + N K that solves the PDS ( N, K, cq, q ) problemwith Type-I+II error probabilities at most η N + e − CK +2 N e − CK q + K log N , where the constant C > only depends on ǫ and c .Proof. Given a graph G , we construct a sequence of graphs G , . . . , G N sequentially as follows:Choose a permutation π on the N vertices uniformly at random. Let G = G . For each t ∈ [ N ],replace the vertex π ( t ) in G t − with a new vertex that connects to all other vertices independentlyat random with probability q . We run the given algorithm A N on G , . . . , G N and let S , . . . , S N denote the outputs which are sets of K vertices. Let E ( S i , S i ) denote the total number of edgesin S i and τ = q + (1 − ǫ ) ( p − q ) /

2. Deﬁne a test φ : G → { , } such that φ ( G ) = 1 if and onlyif max i ∈ [ N ] E ( S i , S i ) > τ (cid:0) K (cid:1) . The construction of each G i takes N time units; the running timeof A on G i is at most T N time units; the computation of E ( S i , S i ) takes at most K time units.Therefore, the total running time of φ is at most N + N T N + N K .Next we upper bound the Type-I and II error probabilities of φ . Let C = C ( ǫ, c ) denotea positive constant whose value may depend on the context. If G ∼ G ( N, q ), then all G i aredistributed according to G ( N, q ). By the union bound and the Bernstein inequality, P { φ ( G ) = 1 } ≤ N X i =1 P (cid:26) E ( S i , S i ) ≥ τ (cid:18) K (cid:19)(cid:27) ≤ N X i =1 X S ′ : S ′ ⊂ [ N ] , | S ′ | = K P (cid:26) E ( S ′ , S ′ ) ≥ τ (cid:18) K (cid:19)(cid:27) ≤ N (cid:18) NK (cid:19) exp − (cid:0) K (cid:1) (1 − ǫ ) ( p − q ) / (cid:0) K (cid:1) q + (cid:0) K (cid:1) (1 − ǫ ) ( p − q ) / ! ≤ N exp( − CK q + K log N ) . If G ∼ G ( N, K, p, q ), let S denote the set of vertices in the planted dense subgraph. Then | S | ∼ Binom( N, KN ) and by the Chernoﬀ bound, P [ | S | < K ] ≤ exp( − CK ). If | S | = K ′ ≥ K , then theremust exist some I ∈ [ N ] such that G I is distributed exactly as e G ( N, K, p, q ). Let S ∗ denote theset of vertices in the planted dense subgraph of G I such that | S ∗ | = K . Then conditional on I = i and the success of A N on G i , | S i ∩ S ∗ | ≥ (1 − ǫ ) K . Thus by the union bound and the Bernsteininequality, for K ′ ≥ K , P { φ ( G ) = 0 || S | = K ′ , I = i }≤ η N + X S ′ ⊂ [ N ]: | S ′ | = K, | S ′ ∩ S ∗ |≥ (1 − ǫ ) K P (cid:26) E ( S ′ , S ′ ) ≤ τ (cid:18) K (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) | S | = K ′ , I = i (cid:27) ≤ η N + K X t ≥ (1 − ǫ ) K (cid:18) Kt (cid:19)(cid:18) N − KK − t (cid:19) exp − (cid:0) K (cid:1) (1 − ǫ ) ( p − q ) / (cid:0) K (cid:1) p + (cid:0) K (cid:1) (1 − ǫ ) ( p − q ) / ! ≤ η N + K exp( − CK q + K log N ) .

25t follows that P { φ ( G ) = 0 }≤ P {| S | < K } + X K ′ ≥ K N X i =1 P {| S | = K ′ , I = i } P { φ ( G ) = 0 || S | = K ′ , I = i }≤ exp( − CK ) + η N + K exp( − CK q + K log N ) . C A Lemma on Hypergeometric Distributions

Lemma 6.

There exists a function τ : R + → R + satisfying τ (0+) = 1 such that the followingholds: For any p ∈ N and m ∈ [ p ] , let H ∼ Hypergeometric ( p, m, m ) and λ = b (cid:16) m log e pm ∧ p m (cid:17) with < b < / (16e) . Then E (cid:2) exp (cid:0) λH (cid:1)(cid:3) ≤ τ ( b ) . (33) Proof.

Notice that if p ≤

64, then the lemma trivially holds. Hence, assume p ≥

64 in the rest of theproof. We consider three separate cases depending on the value of m . We ﬁrst deal with the case of m ≥ p . Then λ = bp m ≤ bp . Since H ≤ p with probability 1, we have E (cid:2) exp (cid:0) λH (cid:1)(cid:3) ≤ exp(256 b ).Next assume that m ≤ log e pm . Then m ≤ log p and λ = bm log e pm . Let ( s , . . . , s m ) i.i.d. ∼ Bern( mp − m ).Then S = P mi =1 s i ∼ Bin( m, mp − m ) which dominates H stochastically. It follows that E (cid:2) exp (cid:0) λH (cid:1)(cid:3) ≤ E [exp ( λmS )]= (cid:20) mp − m (cid:16) e λm − (cid:17)(cid:21) m ( a ) ≤ exp (cid:18) m p (cid:18)(cid:16) e pm (cid:17) b − (cid:19)(cid:19) ( b ) ≤ exp p ) p (cid:18) e p log p (cid:19) b − !! ( c ) ≤ max ≤ p ≤ ( exp p ) p (cid:18) e p log p (cid:19) b − !!) := τ ( b ) , (34)where ( a ) follows because 1 + x ≤ exp( x ) for all x ∈ R and m ≤ p/

2; ( b ) follows because m ≤ log p and f ( x ) = x p (cid:16)(cid:0) e px (cid:1) b − (cid:17) in non-decreasing in x ; ( c ) follows because g ( x ) = x ) x (cid:20)(cid:16) e x log x (cid:17) b − (cid:21) is non-increasing when x ≥ τ (0+) = 1 by deﬁnition.In the rest of the proof we shall focus on the intermediate regime: log e pm ≤ m ≤ p . Since S dominates H stochastically, E (cid:2) exp (cid:0) λH (cid:1)(cid:3) ≤ E (cid:2) exp (cid:0) λS (cid:1)(cid:3) . (35)Let ( t , . . . , t m ) i.i.d. ∼ Bern( mp − m ) and T = P mi =1 t i , which is an independent copy of S . Next we use26 decoupling argument to replace S by ST : (cid:0) E (cid:2) exp (cid:0) λS (cid:1)(cid:3)(cid:1) =  E " exp λ m X i =1 s i + λ X i = j s i s j ! ≤ E [exp (2 λS )] E " exp λ X i = j s i s j ! (36) ≤ E [exp (2 λS )] E " exp λ X i = j s i t j ! , (37) ≤ E [exp (2 λS )] E [exp (8 λST )] , (38)where (36) is by Cauchy-Schwartz inequality and (37) is a standard decoupling inequality (see, e.g.,[37, Theorem 1]).The ﬁrst expectation on the right-hand side (38) can be easily upper bounded as follows: Since m ≥ log e pm , we have λ ≤ b . Using the convexity of the exponential function:exp( ax ) − ≤ (e a − x, x ∈ [0 , , (39)we have E [exp (2 λS )] ≤ exp (cid:18) m p − m (cid:16) e λ − (cid:17)(cid:19) ≤ exp (cid:0) e b − (cid:1) m λbp ! ≤ exp (cid:18) (cid:16) e b − (cid:17) m log e pm p (cid:19) ≤ exp (cid:16) (cid:16) e b − (cid:17)(cid:17) , (40)where the last inequality follows from max ≤ x ≤ x log e x = 1.Next we prove that for some function τ ′ : R + → R + satisfying τ ′ (0+) = 1, E [exp (8 λST )] ≤ τ ′ ( b ) , (41)which, in view of (35), (38) and (40), completes the proof of the lemma. We proceed toward thisend by truncating on the value of T . First note that E h exp (8 λST ) { T > λ } i ≤ E h exp (cid:16) bT log e pm (cid:17) { T > λ } i (42)27here the last inequality follows from S ≤ m and λm ≤ b log e pm . It follows from the deﬁnition that E h exp (cid:16) bT log e pm (cid:17) { T > λ } i ≤ X t ≥ / (8 λ ) exp (cid:16) bt log e pm (cid:17) (cid:18) mt (cid:19) (cid:18) mp − m (cid:19) t ( a ) ≤ X t ≥ / (8 λ ) exp (cid:16) bt log e pm + t log e mt − t log p m (cid:17) ( b ) ≤ X t ≥ / (8 λ ) exp (cid:16) bt log e pm + t log (cid:16) b log e pm (cid:17) − t log p m (cid:17) ( c ) ≤ X t ≥ / (8 λ ) exp [ − t (log 2 − b log(4e) − log (8e b log(4e)))] ( d ) ≤ X t ≥ / (8 b ) exp [ − t (log 2 − b log(4e) − log (8e b log(4e)))] := τ ′′ ( b ) (43)where ( a ) follows because (cid:0) mt (cid:1) ≤ (cid:0) emt (cid:1) t and m ≤ p/

2; ( b ) follows because mt ≤ mλ ≤ b log e pm ; ( c )follows because m ≤ p/ b ≤ / (16e); ( d ) follows because λ ≤ b ; τ ′′ (0+) = 0 holds becauselog 2 < b log(4e) + log (8e b log(4e)) for b ≤ / (16e).Recall that m ≥ log e pm . Then λ = b (cid:16) m log e pm ∧ p m (cid:17) ≤ b (cid:16) ∧ p m (cid:17) . Hence, we have m λp ≤ b (cid:18) m p ∧ pm (cid:19) ≤ b (44)By conditioning on T and averaging with respect to S , we have E h exp (8 λST ) { T ≤ λ } i ≤ E (cid:20) exp (cid:18) m p (exp(8 λT ) − (cid:19) { T ≤ λ } (cid:21) ( a ) ≤ E (cid:20) exp (cid:18) m p λT (cid:19)(cid:21) ( b ) ≤ exp (cid:26) m p (cid:18) exp (cid:18) m λp (cid:19) − (cid:19)(cid:27) ( c ) ≤ exp (cid:26) m p λ (cid:27) ( d ) ≤ exp(32e b ) , (45)where ( a ) follows from e x − ≤ e a x for x ∈ [0 , a ]; ( b ) follows because T ∼ Bin( m, mp − m ) and p ≥ m ;( c ) follows due to (44) and 16e b ≤

1; ( d ) follows because λ ≤ b p m4