Approximation Algorithms for Hypergraph Small Set Expansion and Small Set Vertex Expansion
Anand Louis∗ (Georgia Tech)    Yury Makarychev† (TTIC)
Abstract
The expansion of a hypergraph, a natural extension of the notion of expansion in graphs, is defined as the minimum over all cuts in the hypergraph of the ratio of the number of hyperedges cut to the size of the smaller side of the cut. We study the Hypergraph Small Set Expansion problem, which, for a parameter δ ∈ (0, 1/2], asks to compute the cut having the least expansion while having at most a δ fraction of the vertices on the smaller side of the cut. We present two algorithms. Our first algorithm gives an Õ(δ⁻¹√log n) approximation. The second algorithm finds a set with expansion Õ(δ⁻¹(√(d_max log r / r · φ*) + φ*)) in an r-uniform hypergraph with maximum degree d_max (where φ* is the expansion of the optimal solution). Using these results, we also obtain algorithms for the Small Set Vertex Expansion problem: we get an Õ(δ⁻¹√log n) approximation algorithm and an algorithm that finds a set with vertex expansion O(δ⁻¹√(φ^V log d_max) + δ⁻¹φ^V) (where φ^V is the vertex expansion of the optimal solution).

For δ = 1/2, Hypergraph Small Set Expansion is equivalent to the hypergraph expansion problem. In this case, our approximation factor of O(√log n) for expansion in hypergraphs matches the corresponding approximation factor for expansion in graphs due to Arora, Rao, and Vazirani (2004).

1 Introduction

The expansion of a hypergraph, a natural extension of the notion of expansion in graphs, is defined as follows.
Definition 1.1 (Hypergraph Expansion). Given a hypergraph H = (V, E) on n vertices (each edge e ∈ E of H is a subset of vertices), we say that an edge e ∈ E is cut by a set S if e ∩ S ≠ ∅ and e ∩ S̄ ≠ ∅ (i.e., some vertices in e lie in S and some vertices lie outside of S). We denote the set of edges cut by S by E_cut(S). The expansion φ(S) of a set S ⊂ V (S ≠ ∅, S ≠ V) in a hypergraph H = (V, E) is defined as

φ(S) = |E_cut(S)| / min(|S|, |S̄|).

Hypergraph expansion and related hypergraph partitioning problems are of immense practical importance, having applications in parallel and distributed computing (Catalyurek and Aykanat (1999)), VLSI circuit design and computer architecture (Karypis et al. (1999); Girard et al. (2000)), scientific computing (Devine et al. (2006)), and other areas. In spite of this, there has not been much theoretical work on them. In this paper, we study a generalization of the Hypergraph Expansion problem, namely the Hypergraph Small Set Expansion problem.

∗Supported by Santosh Vempala's NSF award CCF-1217793.
†Supported by NSF CAREER award CCF-1150062 and NSF award IIS-1302662.

Problem 1.2 (Hypergraph Small Set Expansion Problem). Given a hypergraph H = (V, E) and a parameter δ ∈ (0, 1/2], the Hypergraph Small Set Expansion problem (H-SSE) is to find a set S ⊂ V of size at most δn that minimizes φ(S). The value of the optimal solution to H-SSE is called the small set expansion of H. That is, for δ ∈ (0, 1/2], the small set expansion φ*_{H,δ} of a hypergraph H = (V, E) is defined as

φ*_{H,δ} = min_{S ⊂ V : 0 < |S| ≤ δn} φ(S).

Note that for δ = 1/2, the Hypergraph Small Set Expansion Problem is the Hypergraph Expansion Problem. Small Set Expansion in graphs has attracted a lot of attention recently.
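For concreteness, the quantities in Definition 1.1 can be evaluated directly for any given set. The sketch below is purely illustrative (hyperedges represented as Python sets; the helper names are ours, not the paper's):

```python
def cut_hyperedges(hyperedges, S):
    """E_cut(S): hyperedges with at least one vertex inside S and at least
    one outside (Definition 1.1)."""
    S = set(S)
    return [e for e in hyperedges if set(e) & S and set(e) - S]

def expansion(hyperedges, S, V):
    """phi(S) = |E_cut(S)| / min(|S|, |complement of S|), for a set S
    with 0 < |S| < |V|."""
    S = set(S)
    return len(cut_hyperedges(hyperedges, S)) / min(len(S), len(set(V) - S))
```

For example, in the hypergraph with edges {1,2,3}, {3,4,5}, {4,5} on V = {1,...,5}, the set S = {1,2} cuts one hyperedge, so φ(S) = 1/2.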
The problem was introduced by Raghavendra and Steurer (2010), who showed that it is closely related to the Unique Games problem. Raghavendra, Steurer and Tetali (2010) designed an algorithm for SSE that finds a set of size O(δn) with expansion O(√(φ* d log(1/δ))) in d-regular graphs (where φ* is the expansion of the optimal solution). Later Bansal, Feige, Krauthgamer, Makarychev, Nagarajan, Naor, and Schwartz (2011) gave an O(√(log n log(1/δ))) approximation algorithm for the problem.

We present analogs of the results of Bansal et al. (2011) and Raghavendra, Steurer and Tetali (2010) for hypergraphs. Our first result is an Õ(δ⁻¹√log n) approximation algorithm for H-SSE (see Theorem 1.3). Our second result is an algorithm that finds a set with expansion at most Õ(δ⁻¹(√(d_max log r / r · φ*_{H,δ}) + φ*_{H,δ})) if H is an r-uniform hypergraph with maximum degree d_max (see Theorem 1.4; the result also applies to non-uniform hypergraphs, see Theorem B.2).

We note that H-SSE can be reduced to SSE (small set expansion in graphs) if all hyperedges have bounded size. Let r be the size of the largest hyperedge in H. Construct an auxiliary graph F on V as follows: pick a vertex in each hyperedge e and connect it in F to all other vertices of e (i.e., replace e with a star). Then solve SSE in the graph F. It is easy to see that if we solve SSE using an α approximation algorithm, then we get an (r − 1)α approximation for H-SSE. This approach gives an O(√(log n log(1/δ))) approximation if r is bounded. However, if H is an arbitrary hypergraph, we only get an O(n√(log n log(1/δ))) approximation. The goal of this paper is to give an approximation guarantee valid for hypergraphs with hyperedges of arbitrary size. We now formally state our main results.

Theorem 1.3.
There is a randomized polynomial-time approximation algorithm for the Hypergraph Small Set Expansion problem that given a hypergraph H = (V, E) and parameters ε ∈ (0, 1) and δ ∈ (0, 1/2], finds a set S ⊂ V of size at most (1 + ε)δn such that

φ(S) ≤ O_ε(δ⁻¹ log δ⁻¹ log log δ⁻¹ · √log n · φ*_{H,δ}) = Õ_ε(δ⁻¹ √log n · φ*_{H,δ})

(where the constant in the O_ε notation depends polynomially on 1/ε). That is, the algorithm gives an O(√log n) approximation when δ and ε are fixed.

We state our second result, Theorem 1.4, for r-uniform hypergraphs. We present and prove a more general Theorem B.2, which applies to arbitrary hypergraphs, in Section B.

Theorem 1.4.
There is a randomized polynomial-time algorithm for the Hypergraph Small Set Expansion problem that given an r-uniform hypergraph H = (V, E) with maximum degree d_max and parameters ε ∈ (0, 1) and δ ∈ (0, 1/2], finds a set S ⊂ V of size at most (1 + ε)δn such that

φ(S) ≤ Õ_ε(δ⁻¹(√(d_max log r / r · φ*_{H,δ}) + φ*_{H,δ})).

The Õ-notation hides a log δ⁻¹ log log δ⁻¹ term.

Both of our algorithms find a set S of size at most (1 + ε)δn. We note that this is similar to the algorithm of Bansal et al. (2011) for SSE, which also finds a set of size at most (1 + ε)δn rather than a set of size at most δn. The algorithm of Raghavendra, Steurer and Tetali (2010) finds a set of size O(δn). The approximation factor of our first algorithm does not depend on the size of hyperedges in the input hypergraph. It has the same dependence on n as the algorithm of Bansal et al. (2011) for SSE. However, the dependence on 1/δ is quasi-linear, whereas it is logarithmic in the algorithm of Bansal et al. (2011). In fact, we show that the integrality gap of the standard SDP relaxation for H-SSE is at least linear in 1/δ (Theorem D.1). The approximation guarantee of our second algorithm is analogous to that of the algorithm of Raghavendra, Steurer and Tetali (2010).

Small Set Vertex Expansion.
Our techniques can also be used to obtain an approximation algorithm for Small Set Vertex Expansion (SSVE) in graphs.
Problem 1.5 (Small Set Vertex Expansion Problem). Given a graph G = (V, E), the vertex expansion of a set S ⊂ V is defined as

φ^V(S) = |{u ∈ S̄ : ∃v ∈ S such that {u, v} ∈ E}| / |S|.

Given a parameter δ ∈ (0, 1/2], the Small Set Vertex Expansion problem (SSVE) is to find a set S ⊂ V of size at most δn that minimizes φ^V(S). The value of the optimal solution to SSVE is called the small set vertex expansion of G. That is, for δ ∈ (0, 1/2], the small set vertex expansion φ^V_{G,δ} of a graph G = (V, E) is defined as

φ^V_{G,δ} = min_{S ⊂ V : 0 < |S| ≤ δn} φ^V(S).

Small Set Vertex Expansion recently gained interest due to its connection to obtaining subexponential-time, constant factor approximation algorithms for many combinatorial problems like Sparsest Cut and Graph Coloring (Arora and Ge (2011); Louis, Raghavendra and Vempala (2012)). Using a reduction from vertex expansion in graphs to hypergraph expansion, we can get an approximation algorithm for SSVE having the same approximation guarantee as that for H-SSE.
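The quantity φ^V(S) in Problem 1.5 is likewise easy to evaluate for a given set. A minimal sketch (adjacency stored as a dict of neighbor sets; names are ours):

```python
def vertex_expansion(adj, S):
    """phi_V(S) = |{u not in S : u has a neighbor in S}| / |S|
    (Problem 1.5).  adj maps each vertex to the set of its neighbors."""
    S = set(S)
    boundary = {u for u in adj if u not in S and adj[u] & S}
    return len(boundary) / len(S)
```

On the path 1–2–3–4, the set S = {1, 2} has exactly one outside neighbor (vertex 3), so φ^V(S) = 1/2.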
Theorem 1.6.
There exist absolute constants c₁, c₂ ∈ ℝ₊ such that for every graph G = (V, E), there exists a polynomial-time computable hypergraph H = (V′, E′) such that

c₁ φ*_{H,δ} ≤ φ^V_{G,δ} ≤ c₂ φ*_{H,δ}.

Also, η_H^max ≤ log(d_max + 1), where d_max is the maximum degree of G (η_H^max is defined in Definition B.1).

From this theorem and Theorems 1.3 and B.2 we immediately get algorithms for SSVE.
Theorem 1.7 (Corollary to Theorems 1.3 and 1.6). There is a randomized polynomial-time approximation algorithm for the Small Set Vertex Expansion problem that given a graph G = (V, E) and parameters ε ∈ (0, 1) and δ ∈ (0, 1/2], finds a set S ⊂ V of size at most (1 + ε)δn such that

φ^V(S) ≤ O_ε(√log n · δ⁻¹ log δ⁻¹ log log δ⁻¹ · φ^V_{G,δ}).

That is, the algorithm gives an O(√log n) approximation when δ and ε are fixed.

Theorem 1.8 (Corollary to Theorems B.2 and 1.6). There is a randomized polynomial-time algorithm for the Small Set Vertex Expansion problem that given a graph G = (V, E) of maximum degree d_max and parameters ε ∈ (0, 1) and δ ∈ (0, 1/2], finds a set S ⊂ V of size at most (1 + ε)δn such that

φ^V(S) ≤ O_ε(√(φ^V_{G,δ} log d_max) · δ⁻¹ log δ⁻¹ log log δ⁻¹ + δ⁻¹ φ^V_{G,δ}) = Õ_ε(δ⁻¹ √(φ^V_{G,δ} log d_max) + δ⁻¹ φ^V_{G,δ}).
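To get a rough feel for how the two guarantees compare, one can drop the constants and the log δ⁻¹ log log δ⁻¹ factors and compare only the remaining shapes (a back-of-the-envelope sketch of ours, not from the paper; natural logarithms are used, which only shifts constants):

```python
import math

def thm_1_7_shape(phi_v, n, delta):
    """Shape of the Theorem 1.7 guarantee, ignoring constants and
    log(1/delta) factors: sqrt(log n) / delta * phi_v."""
    return math.sqrt(math.log(n)) / delta * phi_v

def thm_1_8_shape(phi_v, d_max, delta):
    """Shape of the Theorem 1.8 guarantee, same caveats:
    (sqrt(phi_v * log d_max) + phi_v) / delta."""
    return (math.sqrt(phi_v * math.log(d_max)) + phi_v) / delta
```

Comparing the dominant terms, the Theorem 1.8 shape is smaller roughly when log d_max is below φ^V_{G,δ} · log n, e.g., on bounded-degree graphs whose optimal vertex expansion is not too small.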
We note that the Small Set Vertex Expansion problem for δ = 1/2 is just the Vertex Expansion problem. In that case, Theorem 1.8 gives the same approximation guarantee as the algorithm of Louis, Raghavendra and Vempala (2013).

Techniques.
Our general approach to solving H-SSE is similar to the approach of Bansal et al. (2011). We recall how the algorithm of Bansal et al. (2011) for (graph) SSE works. The algorithm solves a semidefinite programming relaxation for SSE and gets an SDP solution. The SDP solution assigns a vector ū to each vertex u. Then the algorithm generates an orthogonal separator. Informally, an orthogonal separator S with distortion D is a random subset of vertices such that

(a) If ū and v̄ are close to each other, then the probability that u and v are separated by S is small; namely, it is at most αD‖ū − v̄‖², where α is a normalization factor such that Pr(u ∈ S) = α‖ū‖².

(b) If the angle between ū and v̄ is larger than a certain threshold, then the probability that both u and v are in S is much smaller than the probability that one of them is in S.

Bansal et al. (2011) showed that condition (b) together with the SDP constraints implies that S is of size at most (1 + ε)δn with sufficiently high probability. Then condition (a) implies that the expected number of cut edges is at most D times the SDP value. That means that S is a D-approximate solution to SSE.

If we run this algorithm on an instance of H-SSE, we will still find a set of size at most (1 + ε)δn, but the cost of the solution might be very high. Indeed, consider a hyperedge e. Even though every two vertices u and v in e are unlikely to be separated by S, at least one pair out of the (|e| choose 2) pairs of vertices is quite likely to be separated by S; hence, e is quite likely to be cut by S. To deal with this problem, we develop hypergraph orthogonal separators. In the definition of a hypergraph orthogonal separator, we strengthen condition (a) by requiring that a hyperedge e is cut by S with small probability if all vertices in e are close to each other. Specifically, we require that

Pr(e is cut by S) ≤ αD max_{u,v∈e} ‖ū − v̄‖².  (1)

We show that there is a hypergraph orthogonal separator with distortion proportional to √log n (the distortion also depends on parameters of the orthogonal separator). Plugging this hypergraph orthogonal separator into the algorithm of Bansal et al. (2011), we get Theorem 1.3. We also develop another variant of hypergraph orthogonal separators, ℓ₂–ℓ₂² orthogonal separators. An ℓ₂–ℓ₂² orthogonal separator with ℓ₂-distortion D_{ℓ₂}(r) and ℓ₂²-distortion D_{ℓ₂²} satisfies the following condition:

Pr(e is cut by S) ≤ αD_{ℓ₂}(|e|) · min_{w∈e} ‖w̄‖ · max_{u,v∈e} ‖ū − v̄‖ + αD_{ℓ₂²} · max_{u,v∈e} ‖ū − v̄‖².  (2)

(It may look strange that we have two terms in this bound. One may expect that we could have only the term D_{ℓ₂²} max_{u,v∈e} ‖ū − v̄‖², as in the previous definition, or only the term D_{ℓ₂}(|e|) · min_{w∈e} ‖w̄‖ · max_{u,v∈e} ‖ū − v̄‖. However, the latter is not possible: there is no ℓ₂–ℓ₂² separator with D_{ℓ₂²} = 0.) We show that there is an ℓ₂–ℓ₂² hypergraph orthogonal separator whose ℓ₂- and ℓ₂²-distortions do not depend on n (in contrast, there is no hypergraph orthogonal separator whose distortion does not depend on n). This result yields Theorem 1.4.

minimize Σ_{e∈E} max_{u,v∈e} ‖ū − v̄‖²  (3)
subject to:
Σ_{v∈V} ⟨ū, v̄⟩ ≤ δn · ‖ū‖²   for every u ∈ V  (4)
Σ_{u∈V} ‖ū‖² = 1  (5)
‖ū − v̄‖² + ‖v̄ − w̄‖² ≥ ‖ū − w̄‖²   for every u, v, w ∈ V  (6)
0 ≤ ⟨ū, v̄⟩ ≤ ‖ū‖²   for every u, v ∈ V  (7)

Figure 1: SDP relaxation for H-SSE

We now give a brief conceptual overview of our construction of hypergraph orthogonal separators. We use the framework developed in (Chlamtac, Makarychev, and Makarychev, 2006, Section 4.3) for (graph) orthogonal separators. For simplicity, we ignore vector normalization steps in this overview; we do not explain how we take into account vector lengths. Note, however, that these normalization steps are crucial. We first design a procedure that partitions the hypergraph into two pieces (the procedure labels every vertex with either 0 or 1). In a sense, each set S in the partition is a "very weak" hypergraph orthogonal separator. It satisfies property (1) with D′ ∼ √log n · log log(1/δ) and α′ = 1/2, and a weak variant of property (b): if the angle between vectors ū and v̄ is larger than the threshold, then the events u ∈ S and v ∈ S are "almost" independent. We repeat the procedure l = log₂(1/δ) + O(1) times and obtain a partition of the hypergraph into 2^l = O(1/δ) pieces. Then we randomly choose one set S among them; this set S is our hypergraph orthogonal separator. Note that by running the procedure many times we decrease, exponentially in l, the probability that two vertices, as in condition (b), belong to S. So condition (b) holds for S. We also affect the distortion in (1) in two ways. First, the probability that the edge is cut increases by a factor of l; that is, we get Pr(e is cut by S) ≤ l × α′D′ max_{u,v∈e} ‖ū − v̄‖². Second, the probability that we choose a vertex u goes down from ‖ū‖²/2 to Ω(δ)‖ū‖², since, roughly speaking, we choose one set S among O(1/δ) possible sets. That is, the parameter α of S is Ω(δ). Therefore, Pr(e is cut by S) ≤ α(α′lD′/α) max_{u,v∈e} ‖ū − v̄‖²; that is, we get a hypergraph orthogonal separator with distortion α′lD′/α ∼ Õ(δ⁻¹√log n). The construction of ℓ₂–ℓ₂² orthogonal separators is similar but a bit more technical.
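The failure of pairwise guarantees on large hyperedges, which motivated strengthening condition (a) to requirement (1), can be seen in a toy model: suppose each vertex of a hyperedge e lands in S independently with some probability q. Then any fixed pair is separated with probability 2q(1 − q), yet e is cut unless all |e| vertices agree. A small illustrative sketch (ours, not the paper's rounding scheme):

```python
def pair_separation_prob(q):
    """Probability a fixed pair {u, v} is separated when each vertex
    independently lands in S with probability q."""
    return 2 * q * (1 - q)

def hyperedge_cut_prob(q, r):
    """Probability an r-vertex hyperedge is cut under the same model:
    cut unless all r vertices are in S or all are out."""
    return 1 - q ** r - (1 - q) ** r
```

For q = 0.05, a pair is separated with probability 0.095, while a 50-vertex hyperedge is cut with probability about 0.92; this is why property (1) bounds the cut probability of the whole hyperedge directly.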
We present our SDP relaxation and introduce our main technique, hypergraph orthogonal separators, in Section 2. We describe our first algorithm for H-SSE in Section 3, and then describe an algorithm that generates hypergraph orthogonal separators in Section 4. We define ℓ₂–ℓ₂² hypergraph orthogonal separators, give an algorithm that generates them, and then present our second algorithm for H-SSE in Sections A and B. Finally, we show a simple SDP integrality gap for H-SSE in Section D. This integrality gap also gives a lower bound on the quality of m-orthogonal separators. We give a proof of Theorem 1.6 in Section C.

2.1 SDP Relaxation

We use the SDP relaxation for H-SSE shown in Figure 1. There is an SDP variable ū for every vertex u ∈ V. Every combinatorial solution S (with |S| ≤ δn) defines the corresponding (intended) SDP solution: ū = e₀/√|S| if u ∈ S, and ū = 0 otherwise, where e₀ is a fixed unit vector. It is easy to see that this solution satisfies all SDP constraints. Note that max_{u,v∈e} ‖ū − v̄‖² is equal to 1/|S| if e is cut, and to 0 otherwise. Therefore, the objective function equals

Σ_{e∈E} max_{u,v∈e} ‖ū − v̄‖² = Σ_{e∈E_cut(S)} 1/|S| = |E_cut(S)| / |S| = φ(S).

Thus our SDP for H-SSE is indeed a relaxation.

2.2 Hypergraph Orthogonal Separators
The main technical tool for proving Theorem 1.3 is hypergraph orthogonal separators. Orthogonal separators were introduced by Chlamtac, Makarychev, and Makarychev (2006) (see also Bansal et al. (2011), Louis and Makarychev (2014), and Makarychev and Makarychev (2014)) and were previously used for solving Unique Games and various graph partitioning problems. In this paper, we extend the technique of orthogonal separators to hypergraphs and introduce hypergraph orthogonal separators. We then use hypergraph orthogonal separators to solve H-SSE. In Section A, we introduce another version of hypergraph orthogonal separators, ℓ₂–ℓ₂² hypergraph orthogonal separators, and then use them to prove Theorem 1.4 and Theorem B.2.

Definition 2.1 (Hypergraph Orthogonal Separators). Let {ū : u ∈ V} be a set of vectors in the unit ball that satisfy ℓ₂²-triangle inequalities (6) and (7). We say that a random set S ⊂ V is a hypergraph m-orthogonal separator with distortion D ≥ 1, probability scale α > 0, and separation threshold β ∈ (0, 1) if it satisfies the following properties.

1. For every u ∈ V, Pr(u ∈ S) = α‖ū‖².
2. For every u and v such that ‖ū − v̄‖² ≥ β min(‖ū‖², ‖v̄‖²),

Pr(u ∈ S and v ∈ S) ≤ α min(‖ū‖², ‖v̄‖²) / m.
3. For every e ⊂ V, Pr(e is cut by S) ≤ αD max_{u,v∈e} ‖ū − v̄‖².

The definition of a hypergraph m-orthogonal separator is similar to that of a (graph) m-orthogonal separator: a random set S is an m-orthogonal separator if it satisfies properties 1, 2, and property 3′, which is property 3 restricted to edges e of size 2.

3′. For every pair (u, v), Pr({u, v} is cut by S) ≤ αD‖ū − v̄‖².

In this paper, we design an algorithm that generates a hypergraph m-orthogonal separator with distortion O_β(√log n · m log m log log m). We note that the distortion of any hypergraph orthogonal separator must depend on m at least linearly (see Section D). We remark that there are two constructions of (graph) orthogonal separators, "orthogonal separators via ℓ₁" and "orthogonal separators via ℓ₂", with distortions O_β(√(log n) log m) and O_β(√(log n log m)), respectively (presented in (Chlamtac, Makarychev, and Makarychev, 2006)). Our construction of hypergraph orthogonal separators uses the framework of orthogonal separators via ℓ₂. We prove the following theorem in Section 4.

Theorem 2.2.
There is a polynomial-time randomized algorithm that, given a set of vertices V, a set of vectors {ū} satisfying ℓ₂²-triangle inequalities (6) and (7), and parameters m ≥ 2 and β ∈ (0, 1), generates a hypergraph m-orthogonal separator with probability scale α ≥ 1/n and distortion D = O(β⁻¹ m log m log log m × √log n).

3 Algorithm for H-SSE

In this section, we present our algorithm for Hypergraph Small Set Expansion. Our algorithm uses hypergraph orthogonal separators, which we describe in Section 4. We use the approach of Bansal et al. (2011). Suppose that we are given a polynomial-time algorithm that generates hypergraph m-orthogonal separators with distortion D(m, β) (with probability scale α > 1/poly(n)). We show how to get a 4D(4/(εδ), ε/2) approximation for H-SSE.

Theorem 3.1. There is a randomized polynomial-time approximation algorithm for the Hypergraph Small Set Expansion problem that given a hypergraph H = (V, E) and parameters ε ∈ (0, 1) and δ ∈ (0, 1/2], finds a set S ⊂ V of size at most (1 + ε)δn such that φ(S) ≤ 4D(4/(εδ), ε/2) · φ*_{H,δ}.

Proof.
We solve the SDP relaxation for H-SSE and obtain an SDP solution {ū}. Denote the SDP value by sdp-cost. Consider a hypergraph orthogonal separator S with m = 4/(εδ) and β = ε/2. Define a set S′:

S′ = S, if |S| ≤ (1 + ε)δn;  S′ = ∅, otherwise.

Clearly, |S′| ≤ (1 + ε)δn. Bansal et al. (2011) showed that Pr(u ∈ S′) ∈ [α‖ū‖²/2, α‖ū‖²] for every u ∈ V (see also Theorem A.1 in (Makarychev and Makarychev, 2014)). Note that

Pr(S′ cuts edge e) ≤ Pr(S cuts edge e) ≤ αD* max_{u,v∈e} ‖ū − v̄‖²,

where D* denotes D(4/(εδ), ε/2) for the sake of brevity. Let

Z = |S′| − |E_cut(S′)| / (4D* · sdp-cost).

We have

E[Z] = E[|S′|] − E[|E_cut(S′)|] / (4D* · sdp-cost) ≥ Σ_{u∈V} α‖ū‖²/2 − (Σ_{e∈E} αD* max_{u,v∈e} ‖ū − v̄‖²) / (4D* · sdp-cost) = α/2 − (αD* · sdp-cost) / (4D* · sdp-cost) = α/4.

Since Z ≤ |S′| ≤ (1 + ε)δn < n (always), by Markov's inequality we have Pr(Z > 0) ≥ α/(4n) and hence

Pr(|E_cut(S′)| / |S′| < 4D* · sdp-cost) ≥ α/(4n).

We sample S independently n/α times and return the first set S′ such that |E_cut(S′)|/|S′| < 4D* · sdp-cost. This gives a set S′ such that |S′| ≤ (1 + ε)δn and φ(S′) ≤ 4D* φ*_{H,δ}. The algorithm succeeds (finds such a set S′) with a constant probability. By repeating the algorithm n times, we can make the success probability exponentially close to 1.

In Section 4, we describe how to generate a hypergraph m-orthogonal separator with distortion D = O(√log n × β⁻¹ m log m log log m). That gives us an algorithm for H-SSE with approximation factor O_ε(δ⁻¹ log δ⁻¹ log log δ⁻¹ × √log n).

4 Generating Hypergraph Orthogonal Separators

In this section, we present an algorithm that generates a hypergraph m-orthogonal separator. At the high level, the algorithm is similar to the algorithm for generating orthogonal separators from Section 4.3 in (Chlamtac, Makarychev, and Makarychev, 2006). We use a different procedure for generating words W(u) (see below) and set parameters differently; also, the analysis of our algorithm is different.

In our algorithm, we use a "normalization" map ϕ from (Chlamtac, Makarychev, and Makarychev, 2006). The map ϕ maps a set {ū} of vectors satisfying ℓ₂²-triangle inequalities (6) and (7) to ℝⁿ. It has the following properties.

1. For all vertices u, v, w: ‖ϕ(ū) − ϕ(v̄)‖² + ‖ϕ(v̄) − ϕ(w̄)‖² ≥ ‖ϕ(ū) − ϕ(w̄)‖².
2. For all u and v with ū, v̄ ≠ 0: ⟨ϕ(ū), ϕ(v̄)⟩ = ⟨ū, v̄⟩ / max(‖ū‖², ‖v̄‖²).
3. In particular, for every ū ≠ 0, ‖ϕ(ū)‖² = ⟨ϕ(ū), ϕ(ū)⟩ = 1. Also, ϕ(0) = 0.

4. For all non-zero vectors ū and v̄: ‖ϕ(ū) − ϕ(v̄)‖² ≤ 2‖ū − v̄‖² / max(‖ū‖², ‖v̄‖²).

We also use the following theorem of Arora, Lee, and Naor (2005) (see also Arora, Rao, and Vazirani (2004)).
Theorem 4.1 (Arora, Lee, and Naor (2005), Theorem 3.1). There exist constants C ≥ 1 and p ∈ (0, 1/2) such that for every n unit vectors x_u (u ∈ V) satisfying ℓ₂²-triangle inequalities (6), and every Δ > 0, the following holds. There exists a random subset U of V such that for every u, v ∈ V with ‖x_u − x_v‖² ≥ Δ,

Pr(u ∈ U and d(v, U) ≥ Δ/(C√log n)) ≥ p,

where d(v, U) = min_{u∈U} ‖x_u − x_v‖².

First we describe an algorithm that randomly assigns each vertex u a symbol, either 0 or 1. Then we use this algorithm to generate an orthogonal separator.

Lemma 4.2.
There is a randomized polynomial-time algorithm that, given a finite set V, unit vectors ϕ(ū) for u ∈ V satisfying ℓ₂²-triangle inequalities, and a parameter β ∈ (0, 1), returns a random assignment ω : V → {0, 1} that satisfies the following properties.

• For every u and v such that ‖ϕ(ū) − ϕ(v̄)‖² ≥ β,

Pr(ω(u) ≠ ω(v)) ≥ p,

where p > 0 is the constant from Theorem 4.1.

• For every set e ⊂ V of size at least 2,

Pr(ω(u) ≠ ω(v) for some u, v ∈ e) ≤ O(β⁻¹ √log n · max_{u,v∈e} ‖ϕ(ū) − ϕ(v̄)‖²).

Proof.
Let U be the random set from Theorem 4.1 for vectors x_u = ϕ(ū) and Δ = β. Choose t ∈ (0, Δ/(C√log n)) uniformly at random. Let

ω(u) = 0, if d(U, u) ≤ t;  ω(u) = 1, otherwise.

Consider first vertices u and v such that ‖ϕ(ū) − ϕ(v̄)‖² ≥ β. By Theorem 4.1,

Pr(u ∈ U and d(v, U) ≥ Δ/(C√log n)) ≥ p  and  Pr(v ∈ U and d(u, U) ≥ Δ/(C√log n)) ≥ p.

Note that in the former case, when u ∈ U and d(v, U) ≥ Δ/(C√log n), we have ω(u) = 0 and ω(v) = 1; in the latter case, when v ∈ U and d(u, U) ≥ Δ/(C√log n), we have ω(v) = 0 and ω(u) = 1. Therefore, the probability that ω(u) ≠ ω(v) is at least p.

Now consider a set e ⊂ V of size at least 2. Let τ_m = min_{w∈e} d(U, ϕ(w̄)) and τ_M = max_{w∈e} d(U, ϕ(w̄)). We have τ_M − τ_m ≤ max_{u,v∈e} ‖ϕ(ū) − ϕ(v̄)‖². Note that if t < τ_m then ω(u) = 1 for all u ∈ e; if t ≥ τ_M then ω(u) = 0 for all u ∈ e. Thus ω(u) ≠ ω(v) for some u, v ∈ e only if t ∈ [τ_m, τ_M). Since the probability density of the random variable t is at most C√log n / Δ, we get

Pr(∃u, v ∈ e : ω(u) ≠ ω(v)) ≤ Pr(t ∈ [τ_m, τ_M)) ≤ (C√log n / Δ)(τ_M − τ_m) ≤ (C√log n / β) max_{u,v∈e} ‖ϕ(ū) − ϕ(v̄)‖².
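The thresholding step in the proof above is simple to sketch. Below, the set U is taken as given (in the lemma it is the random set of Theorem 4.1, which we do not construct here), distances are squared Euclidean distances as in the ℓ₂² metric, and the constant C and all names are illustrative:

```python
import math
import random

def assign_bits(points, U, beta, C=1.0, t=None, rng=random):
    """One round of the 0/1 labeling from the proof of Lemma 4.2:
    omega(u) = 0 if d(U, u) <= t, and 1 otherwise, where d is the minimum
    squared Euclidean distance to U and t is drawn uniformly from
    (0, Delta/(C*sqrt(log n))) with Delta = beta.
    points: dict vertex -> vector (tuple of floats); U: subset of vertices."""
    n = max(len(points), 2)
    if t is None:
        t = rng.uniform(0.0, beta / (C * math.sqrt(math.log(n))))
    def d(u):  # squared distance from u to the set U
        return min(sum((a - b) ** 2 for a, b in zip(points[u], points[w]))
                   for w in U)
    return {u: (0 if d(u) <= t else 1) for u in points}
```

With points 0, 0.7, and 1 on a line and U consisting of the first point, the squared distances to U are 0, 0.49, and 1, so a threshold of t = 0.5 yields labels 0, 0, 1; vertices of U always receive label 0.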
We now amplify the result of Lemma 4.2.
Lemma 4.3.
There is a randomized polynomial-time algorithm that, given V, vectors ϕ(ū), and β ∈ (0, 1) as in Lemma 4.2, and a parameter m ≥ 2, returns a random assignment ω̃ : V → {0, 1} such that:

• For every u and v such that ‖ϕ(ū) − ϕ(v̄)‖² ≥ β, Pr(ω̃(u) ≠ ω̃(v)) ≥ 1/2 − 1/log₂ m.

• For every set e ⊂ V of size at least 2, Pr(ω̃(u) ≠ ω̃(v) for some u, v ∈ e) ≤ O(β⁻¹ √log n · log log m · max_{u,v∈e} ‖ϕ(ū) − ϕ(v̄)‖²).

We independently sample K = max(⌈log₂ log₂ m / (−log₂(1 − p))⌉, 1) assignments ω₁, . . . , ω_K, and let ω̃(u) = ω₁(u) ⊕ · · · ⊕ ω_K(u) (where ⊕ denotes addition modulo 2). It is easy to see that the assignment ω̃ satisfies the required properties. We give the proof in Section E.

We are now ready to present our algorithm.

1. Set l = ⌈log₂ m / (1 − log₂(1 + 2/log₂ m))⌉ = log₂ m + O(1).
2. Sample l independent assignments ω̃₁, . . . , ω̃_l using Lemma 4.3.
3. For every vertex u, define the word W(u) = ω̃₁(u) . . . ω̃_l(u) ∈ {0, 1}^l.
4. If n ≥ 2^l, pick a word W ∈ {0, 1}^l uniformly at random. If n < 2^l, pick a random word W ∈ {0, 1}^l so that Pr(W = W(u)) = 1/n for every u ∈ V. This is possible since the number of distinct words constructed in step 3 is at most n (we may pick a word W not equal to any W(u)).
5. Pick r ∈ (0, 1] uniformly at random.
6. Let S = {u ∈ V : ‖ū‖² ≥ r and W(u) = W}.

Theorem 4.4.
Random set S is a hypergraph m-orthogonal separator with distortion D = O(√log n × β⁻¹ m log m log log m), probability scale α ≥ 1/n, and separation threshold β.

Proof. We verify that S satisfies properties 1–3 in the definition of a hypergraph m-orthogonal separator with α = max(1/2^l, 1/n).

Property 1.
We compute the probability that u ∈ S. Observe that u ∈ S if and only if W(u) = W and r ≤ ‖ū‖² (these two events are independent). If n ≥ 2^l, the probability that W = W(u) is 1/2^l, since we choose W uniformly at random from {0, 1}^l; if n < 2^l, the probability is 1/n. That is, Pr(W = W(u)) = max(1/2^l, 1/n) = α. The probability that r ≤ ‖ū‖² is ‖ū‖². We conclude that property 1 holds.

Property 2.
Consider two vertices u and v such that ‖ū − v̄‖² ≥ β min(‖ū‖², ‖v̄‖²). Assume without loss of generality that ‖ū‖ ≤ ‖v̄‖. Note that u, v ∈ S if and only if r ≤ ‖ū‖² and W = W(u) = W(v). We first upper bound the probability that W(u) = W(v). We have

⟨ū, v̄⟩ = (‖ū‖² + ‖v̄‖² − ‖ū − v̄‖²) / 2 ≤ ((1 − β)‖ū‖² + ‖v̄‖²) / 2 ≤ (2 − β)‖v̄‖² / 2.

Therefore, ⟨ū, v̄⟩ / ‖v̄‖² ≤ 1 − β/2. Hence,

‖ϕ(ū) − ϕ(v̄)‖² = 2 − 2⟨ϕ(ū), ϕ(v̄)⟩ = 2 − 2⟨ū, v̄⟩ / max(‖ū‖², ‖v̄‖²) ≥ β = Δ.

From Lemma 4.3 we get that Pr(ω̃_i(u) ≠ ω̃_i(v)) ≥ 1/2 − 1/log₂ m for every i. The probability that W(u) = W(v) is therefore at most (1/2 + 1/log₂ m)^l ≤ 1/m. We have

Pr(u ∈ S, v ∈ S) = Pr(r ≤ min(‖ū‖², ‖v̄‖²)) × Pr(W = W(u) = W(v) | W(u) = W(v)) × Pr(W(u) = W(v)) ≤ min(‖ū‖², ‖v̄‖²) × α × (1/m),

as required.

Property 3.
Let e be an arbitrary subset of V , | e | ≥ . Let ρ m = min w ∈ e (cid:107) ¯ w (cid:107) and ρ M = max w ∈ e (cid:107) ¯ w (cid:107) .Note that ρ M − ρ m = (cid:107) ¯ w (cid:107) − (cid:107) ¯ w (cid:107) ≤ (cid:107) ¯ w − ¯ w (cid:107) ≤ max u,v ∈ e (cid:107) ¯ u − ¯ v (cid:107) , for some w , w ∈ e . Here we used that SDP constraint (7) implies that (cid:107) ¯ w (cid:107) − (cid:107) ¯ w (cid:107) ≤ (cid:107) ¯ w − ¯ w (cid:107) .Let A = (cid:8) u ∈ e : (cid:107) ¯ u (cid:107) ≥ r (cid:9) . Note that S ∩ e = { u ∈ A : W ( u ) = W } . Therefore, if e is cut by S thenone of the following events happens. • Event E : A (cid:54) = e and S ∩ e (cid:54) = ∅ . • Event E : A = e and A ∩ S (cid:54) = ∅ , A ∩ S (cid:54) = A .If E happens then r ∈ [ ρ m , ρ M ] since A (cid:54) = e and A (cid:54) = ∅ . We have, Pr ( E ) ≤ Pr ( r ∈ ( ρ m , ρ M ]) ≤ | ρ M − ρ m | ≤ max u,v ∈ e (cid:107) ¯ u − ¯ v (cid:107) . If E happens then (1) r ≤ ρ m (since A = e ) and (2) W ( u ) (cid:54) = W ( v ) for some u, v ∈ e . The probabilitythat r ≤ ρ m is ρ m . We now upper bound the probability that W ( u ) (cid:54) = W ( v ) for some u, v ∈ e . For each i ∈ { , . . . , l } , Pr (˜ ω i ( u ) (cid:54) = ˜ ω i ( v ) for some u, v ∈ e ) ≤ O ( β − (cid:112) log n · log log m ) max u,v ∈ e (cid:107) ϕ (¯ u ) − ϕ (¯ v ) (cid:107) ≤ O ( β − (cid:112) log n · log log m ) max u,v ∈ e (cid:107) ¯ u − ¯ v (cid:107) min( (cid:107) ¯ u (cid:107) , (cid:107) ¯ v (cid:107) ) ≤ O ( β − (cid:112) log n · log log m ) × ρ − m × max u,v ∈ e (cid:107) ¯ u − ¯ v (cid:107) . By the union bound over i ∈ { , . . . , l } , the probability that W ( u ) (cid:54) = W ( v ) for some u, v ∈ e is at most O ( l × β − √ log n · log log m ) × ρ − m × max u,v ∈ e (cid:107) ¯ u − ¯ v (cid:107) . Therefore, Pr ( E ) ≤ ρ m × O ( l × β − (cid:112) log n log log m ) × ρ − m × max u,v ∈ e (cid:107) ¯ u − ¯ v (cid:107) ≤ O ( β − (cid:112) log n log m log log m ) × max u,v ∈ e (cid:107) ¯ u − ¯ v (cid:107) . 
We get that the probability that e is cut by S is at most

Pr(E₁) + Pr(E₂) ≤ O(β^{-1} √(log n) log m log log m) × max_{u,v∈e} ‖ū − v̄‖².

For D = O(β^{-1} √(log n) log m log log m)/α we get

Pr(e is cut by S) ≤ αD max_{u,v∈e} ‖ū − v̄‖².

Note that α ≥ 2^{-l} ≥ Ω(1/m). Thus D ≤ O(β^{-1} √(log n) · m log m log log m).

References

S. Arora and R. Ge. New tools for graph coloring. APPROX 2011.
S. Arora, J. R. Lee, and A. Naor. Euclidean distortion and the sparsest cut. STOC 2005.
S. Arora, S. Rao, and U. Vazirani. Expander flows, geometric embeddings and graph partitioning. STOC 2004.
N. Bansal, U. Feige, R. Krauthgamer, K. Makarychev, V. Nagarajan, J. Naor, and R. Schwartz. Min-max Graph Partitioning and Small Set Expansion. FOCS 2011.
U. Catalyurek and C. Aykanat. Hypergraph-partitioning-based decomposition for parallel sparse-matrix vector multiplication. IEEE Transactions on Parallel and Distributed Systems 1999.
E. Chlamtac, K. Makarychev, and Y. Makarychev. How to Play Unique Games Using Embeddings. FOCS 2006.
K. Devine, E. Boman, R. Heaphy, R. Bisseling, and U. Catalyurek. Parallel hypergraph partitioning for scientific computing. IPDPS 2006.
P. Girard, L. Guiller, C. Landrault, and S. Pravossoudovitch. Low power BIST design by hypergraph partitioning: methodology and architectures. IEEE Test Conference 2000.
G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar. Multilevel hypergraph partitioning: applications in VLSI domain. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 1999.
R. Krauthgamer, J. Naor, and R. Schwartz. Partitioning graphs into balanced components. SODA 2009.
A. Louis and K. Makarychev. Approximation Algorithm for Sparsest k-Partitioning. SODA 2014.
A. Louis. Hypergraph Markov Operators, Eigenvalues and Approximation Algorithms. Manuscript 2014.
A. Louis, P. Raghavendra, and S. Vempala. Private Communication. 2012.
A. Louis, P. Raghavendra, and S. Vempala.
The Complexity of Approximating Vertex Expansion. FOCS 2013.
K. Makarychev and Y. Makarychev. Nonuniform Graph Partitioning with Unrelated Weights. To appear at ICALP 2014. Preprint arXiv:1401.0699 [cs.DS].
P. Raghavendra and D. Steurer. Graph expansion and the unique games conjecture. STOC 2010.
P. Raghavendra, D. Steurer, and P. Tetali. Approximations for the isoperimetric and spectral profile of graphs and related parameters. STOC 2010.

A ℓ₂²–ℓ₂ Hypergraph Orthogonal Separators
In this section, we present another variant of hypergraph orthogonal separators, which we call ℓ₂²–ℓ₂ hypergraph orthogonal separators. The advantage of ℓ₂²–ℓ₂ hypergraph orthogonal separators is that their distortions do not depend on n (the number of vertices). Then in Section B, we use ℓ₂²–ℓ₂ hypergraph orthogonal separators to prove Theorem B.2 (which, in turn, implies Theorem 1.4).

Definition A.1 (ℓ₂²–ℓ₂ Hypergraph Orthogonal Separator). Let {ū : u ∈ V} be a set of vectors in the unit ball. We say that a random set S ⊂ V is an ℓ₂²–ℓ₂ hypergraph m-orthogonal separator with ℓ₂²-distortion D_{ℓ₂²} > 0, ℓ₂-distortion D_{ℓ₂} : ℕ → ℝ, probability scale α > 0, and separation threshold β ∈ (0, 1) if it satisfies the following properties.

1. For every u ∈ V, Pr(u ∈ S) = α‖ū‖².
2. For every u and v such that ‖ū − v̄‖² ≥ β min(‖ū‖², ‖v̄‖²),

Pr(u ∈ S and v ∈ S) ≤ α min(‖ū‖², ‖v̄‖²)/m.
3. For every e ⊂ V,

Pr(e is cut by S) ≤ αD_{ℓ₂²} · max_{u,v∈e} ‖ū − v̄‖² + αD_{ℓ₂}(|e|) · min_{w∈e} ‖w̄‖ · max_{u,v∈e} ‖ū − v̄‖.

(This definition differs from Definition 2.1 only in item 3.)

Theorem A.2.
There is a polynomial-time randomized algorithm that, given a set of vertices V, a set of vectors {ū} satisfying ℓ₂² triangle inequalities, and parameters m and β, generates an ℓ₂²–ℓ₂ hypergraph m-orthogonal separator with probability scale α ≥ 1/n and distortions

D_{ℓ₂²} = O(m),  D_{ℓ₂}(r) = O(β^{-1/2} √(log r) · m log m log log m).

Note that the distortions D_{ℓ₂²} and D_{ℓ₂} do not depend on n. The algorithm and its analysis are very similar to those in the proof of Theorem 2.2. The only difference is that we use another procedure to generate random assignments ω : V → {0, 1}. The following lemma is an analog of Lemma 4.2.

Lemma A.3.
There is a randomized polynomial-time algorithm that, given a finite set V, vectors φ(ū) for u ∈ V satisfying ℓ₂ triangle inequalities, and a parameter β ∈ (0, 1), returns a random assignment ω : V → {0, 1} that satisfies the following properties.

• For every set e ⊂ V of size at least 2,

Pr(ω(u) ≠ ω(v) for some u, v ∈ e) ≤ O(β^{-1/2} √(log |e|)) × max_{u,v∈e} ‖φ(ū) − φ(v̄)‖.

• For every u and v such that ‖φ(ū) − φ(v̄)‖² ≥ β, Pr(ω(u) ≠ ω(v)) ≥ 0.3.

Proof. We sample a random Gaussian vector g ∼ N(0, I_n) (each component g_i of g is distributed as N(0, 1); all random variables g_i are mutually independent). Let N be a Poisson process on ℝ with rate 1/√β. Let ω(u) = 1 if N(⟨g, φ(ū)⟩) is even, and ω(u) = 0 if N(⟨g, φ(ū)⟩) is odd. Note that ω(u) = ω(v) if and only if N(⟨g, φ(ū)⟩) − N(⟨g, φ(v̄)⟩) is even.

Consider a set e ⊂ V of size at least 2. Denote diam(e) = max_{u,v∈e} ‖φ(ū) − φ(v̄)‖. Let τ_m = min_{w∈e} ⟨g, φ(w̄)⟩ and τ_M = max_{w∈e} ⟨g, φ(w̄)⟩. Note that

N(τ_m) = min_{w∈e} N(⟨g, φ(w̄)⟩),  N(τ_M) = max_{w∈e} N(⟨g, φ(w̄)⟩).

If all numbers N(⟨g, φ(ū)⟩) are equal then ω(u) = ω(v) for all u, v ∈ e. Thus if ω(u) ≠ ω(v) for some u, v ∈ e then N(⟨g, φ(ū)⟩) ≠ N(⟨g, φ(v̄)⟩) for some u, v ∈ e. In particular, then N(τ_M) − N(τ_m) > 0. Given g, N(τ_M) − N(τ_m) is a Poisson random variable with rate (τ_M − τ_m)/√β. We have,

Pr(ω(u) ≠ ω(v) for some u, v ∈ e | g) ≤ Pr(N(τ_M) − N(τ_m) > 0 | g) = 1 − e^{−(τ_M − τ_m)/√β} ≤ β^{-1/2}(τ_M − τ_m).
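A minimal simulation of this sampling procedure can be written as follows (a sketch: the function `sample_assignment` and the finite-window discretization of the Poisson process are our own illustration, not part of the paper).

```python
import numpy as np

def sample_assignment(phi, beta, rng):
    """One draw of the random assignment omega from Lemma A.3 (sketch).

    phi  -- dict mapping each vertex to its vector phi(u-bar) (numpy array)
    beta -- separation threshold in (0, 1)
    """
    dim = len(next(iter(phi.values())))
    g = rng.standard_normal(dim)                  # g ~ N(0, I)
    proj = {u: float(np.dot(g, x)) for u, x in phi.items()}
    # Simulate a Poisson process of rate 1/sqrt(beta) on a finite window
    # covering all projections; N(t) counts process points up to t.
    lo, hi = min(proj.values()) - 1.0, max(proj.values()) + 1.0
    num_points = rng.poisson((hi - lo) / np.sqrt(beta))
    points = np.sort(rng.uniform(lo, hi, size=num_points))
    def N(t):
        return int(np.searchsorted(points, t, side="right"))
    # omega(u) is the parity of N(<g, phi(u)>); shifting the window only
    # flips all parities together, which leaves agreements unchanged.
    return {u: N(t) % 2 for u, t in proj.items()}

rng = np.random.default_rng(0)
phi = {"a": np.array([1.0, 0.0]), "b": np.array([0.0, 1.0])}
omega = sample_assignment(phi, beta=0.5, rng=rng)
```

The key invariant, as in the proof, is that ω(u) = ω(v) exactly when the number of process points between the two projections ⟨g, φ(ū)⟩ and ⟨g, φ(v̄)⟩ is even.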
Let ξ_uv = ⟨g, φ(ū)⟩ − ⟨g, φ(v̄)⟩ for u, v ∈ e (u ≠ v). Note that the ξ_uv are Gaussian random variables with mean 0, and

Var[ξ_uv] = Var[⟨g, φ(ū)⟩ − ⟨g, φ(v̄)⟩] = ‖φ(ū) − φ(v̄)‖² ≤ diam(e)².

Note that the expectation of the maximum of (not necessarily independent) N Gaussian random variables with standard deviation bounded by σ is O(√(log N) σ). We have,

E[τ_M − τ_m] = E[max_{u,v∈e} ξ_uv] = O(√(log |e|) diam(e)),

since the total number of random variables ξ_uv is |e|(|e| − 1). Therefore,

Pr(ω(u) ≠ ω(v) for some u, v ∈ e) ≤ β^{-1/2} E[τ_M − τ_m] = O(β^{-1/2} √(log |e|) max_{u,v∈e} ‖φ(ū) − φ(v̄)‖).

We proved that ω satisfies the first property. Now we verify that ω satisfies the second condition. Consider two vertices u and v with ‖φ(ū) − φ(v̄)‖² ≥ β. Given g, the random variable Z = N(⟨g, φ(ū)⟩) − N(⟨g, φ(v̄)⟩) has Poisson distribution with rate λ = |⟨g, φ(ū)⟩ − ⟨g, φ(v̄)⟩|/√β. We have,

Pr(Z is even | g) = Σ_{k=0}^∞ Pr(Z = 2k | g) = Σ_{k=0}^∞ e^{−λ} λ^{2k}/(2k)! = (1 + e^{−2λ})/2.

Note that λ is the absolute value of a Gaussian random variable with mean 0 and standard deviation σ = ‖φ(ū) − φ(v̄)‖/√β ≥ 1. Thus

Pr(Z is even) = E[(1 + e^{−2σ|γ|})/2],

where γ is a standard Gaussian random variable, γ ∼ N(0, 1). We have,

Pr(ω(u) ≠ ω(v)) = E[(1 − e^{−2σ|γ|})/2] ≥ E[(1 − e^{−2|γ|})/2] ≥ 0.3.

We use the same algorithm as before to generate ℓ₂²–ℓ₂ hypergraph orthogonal separators. The only difference is that we use the procedure from Lemma A.3 rather than from Lemma 4.2 to generate assignments ω. We obtain an ℓ₂²–ℓ₂ hypergraph orthogonal separator.

Theorem A.4.
Random set S is an ℓ₂²–ℓ₂ hypergraph m-orthogonal separator with distortions D_{ℓ₂²} = O(m), D_{ℓ₂}(r) = O(β^{-1/2} √(log r) · m log m log log m), probability scale α ≥ 1/n, and separation threshold β ∈ (0, 1).

Proof. The proof of the theorem is almost identical to that of Theorem 4.4. We first check conditions 1 and 2 of ℓ₂²–ℓ₂ hypergraph orthogonal separators in the same way as we checked conditions 1 and 2 of hypergraph orthogonal separators in Theorem 4.4. When we verify that property 3 holds, we use bounds from Lemma A.3. The only difference is how we upper bound the probability of the event E₂.

If E₂ happens then (1) r ≤ ρ_m (since A = e) and (2) W(u) ≠ W(v) for some u, v ∈ e. The probability that r ≤ ρ_m is ρ_m. We upper bound the probability that W(u) ≠ W(v) for some u, v ∈ e. For each i ∈ {1, …, l},

Pr(ω̃_i(u) ≠ ω̃_i(v) for some u, v ∈ e) ≤ O(β^{-1/2} √(log |e|) log log m) max_{u,v∈e} ‖φ(ū) − φ(v̄)‖
≤ O(β^{-1/2} √(log |e|) log log m) max_{u,v∈e} ‖ū − v̄‖ / min(‖ū‖, ‖v̄‖)
≤ O(β^{-1/2} √(log |e|) log log m) × ρ_m^{-1/2} × max_{u,v∈e} ‖ū − v̄‖.

By the union bound over i ∈ {1, …, l}, the probability that W(u) ≠ W(v) for some u, v ∈ e is at most

O(l × β^{-1/2} √(log |e|) log log m) × ρ_m^{-1/2} × max_{u,v∈e} ‖ū − v̄‖.

Therefore,

Pr(E₂) ≤ ρ_m × O(l × β^{-1/2} √(log |e|) log log m) × ρ_m^{-1/2} × max_{u,v∈e} ‖ū − v̄‖ ≤ O(l × β^{-1/2} √(log |e|) log log m) × ρ_m^{1/2} × max_{u,v∈e} ‖ū − v̄‖.
We get that the probability that e is cut by S is at most

Pr(E₁) + Pr(E₂) ≤ max_{u,v∈e} ‖ū − v̄‖² + O(l × β^{-1/2} √(log |e|) log log m) × ρ_m^{1/2} × max_{u,v∈e} ‖ū − v̄‖
≤ max_{u,v∈e} ‖ū − v̄‖² + O(l × β^{-1/2} √(log |e|) log log m) × min_{w∈e} ‖w̄‖ × max_{u,v∈e} ‖ū − v̄‖.

For D_{ℓ₂²} = 1/α and D_{ℓ₂}(r) = O(β^{-1/2} √(log r) log m log log m)/α, we get

Pr(e is cut by S) ≤ αD_{ℓ₂²} · max_{u,v∈e} ‖ū − v̄‖² + αD_{ℓ₂}(|e|) · min_{w∈e} ‖w̄‖ · max_{u,v∈e} ‖ū − v̄‖.

Note that α ≥ 2^{-l} ≥ Ω(1/m). Thus

D_{ℓ₂²} = O(m),  D_{ℓ₂}(r) = O(β^{-1/2} √(log r) · m log m log log m).

B Algorithm for Hypergraph Small Set Expansion via ℓ₂²–ℓ₂ Hypergraph Orthogonal Separators
In this section, we present another algorithm for Hypergraph Small Set Expansion. The algorithm finds a set with expansion proportional to √(φ*_{H,δ}). The proportionality constant depends on the degrees of the vertices and on the hyperedge sizes, but not on the graph size. Here, we present our result for arbitrary hypergraphs. The result for uniform hypergraphs (Theorem 1.4) stated in the introduction follows from our general result. In order to state our result for arbitrary hypergraphs, we need the following definition.
Definition B.1.
Consider a hypergraph H = (V, E). Suppose that for every edge e we are given a non-empty subset e° ⊆ e. Let

η(u) = Σ_{e : u ∈ e°} log |e| / |e°|,  η_max = max_{u∈V} η(u).

Finally, let η^H_max be the minimum of η_max over all possible choices of subsets e°.

Claim B.1.

1. η^H_max ≤ max_{u∈V} Σ_{e : u ∈ e} (log |e|)/|e|.
2. If H is an r-uniform hypergraph with maximum degree d_max, then η^H_max ≤ (d_max log r)/r.
3. Suppose that we can choose one vertex in every edge so that no vertex is chosen more than once. Then η^H_max ≤ log r_max, where r_max is the size of the largest hyperedge in H.

Proof.
1. Let e° = e for every e ∈ E. We have η^H_max ≤ max_{u∈V} Σ_{e : u ∈ e} (log |e|)/|e|.

2. By item 1, η^H_max ≤ max_{u∈V} Σ_{e : u ∈ e} (log |e|)/|e| = max_{u∈V} Σ_{e : u ∈ e} (log r)/r = (d_max log r)/r.

3. For every edge e ∈ E, let e° be the set that contains the vertex chosen for e. Then |e°| = 1 and |{e : u ∈ e°}| ≤ 1 for every u. We have,

η^H_max ≤ max_{u∈V} Σ_{e : u ∈ e°} log |e| / |e°| ≤ log r_max.

Theorem B.2.
There is a randomized polynomial-time algorithm for the Hypergraph Small Set Expansion problem that, given a hypergraph H = (V, E) and parameters ε ∈ (0, 1) and δ ∈ (0, 1/2], finds a set S ⊂ V of size at most (1 + ε)δn such that

φ(S) ≤ O_ε(δ^{-1} log δ^{-1} log log δ^{-1} √(η^H_max · φ*_{H,δ}) + δ^{-1} φ*_{H,δ}) = Õ_ε(δ^{-1}(√(η^H_max · φ*_{H,δ}) + φ*_{H,δ})).

In particular, if H is an r-uniform hypergraph with maximum degree d_max, then we have,

φ(S) ≤ Õ_ε(δ^{-1}(√((d_max log r / r) · φ*_{H,δ}) + φ*_{H,δ})).

Proof. The proof is similar to that of Theorem 3.1. We solve the SDP relaxation for H-SSE and obtain an SDP solution {ū}. Denote the SDP value by sdp-cost. Consider an ℓ₂²–ℓ₂ hypergraph orthogonal separator S with m = 4/(εδ) and β = ε/4. Define a set S′:

S′ = S, if |S| ≤ (1 + ε)δn;  S′ = ∅, otherwise.

Clearly, |S′| ≤ (1 + ε)δn. As in the proof of Theorem 3.1, Pr(u ∈ S′) ∈ [α‖ū‖²/2, α‖ū‖²]. Note that

Pr(S′ cuts edge e) ≤ Pr(S cuts edge e) ≤ αD_{ℓ₂²} max_{u,v∈e} ‖ū − v̄‖² + αD_{ℓ₂}(|e|) min_{w∈e} ‖w̄‖ max_{u,v∈e} ‖ū − v̄‖.

Let C = α^{-1} E[|E_cut(S′)|] and Z = |S′| − |E_cut(S′)|/(4C). We have,

E[Z] = E[|S′|] − E[|E_cut(S′)|/(4C)] ≥ Σ_{u∈V} α‖ū‖²/2 − α/4 = α/2 − α/4 = α/4.

Now we upper bound C. Consider the optimal choice of e° for H in the definition of η^H_max.
C = α^{-1} E[|E_cut(S′)|] ≤ α^{-1} Σ_{e∈E} Pr(e is cut by S)
≤ D_{ℓ₂²} Σ_{e∈E} max_{u,v∈e} ‖ū − v̄‖² + Σ_{e∈E} D_{ℓ₂}(|e|) min_{w∈e} ‖w̄‖ max_{u,v∈e} ‖ū − v̄‖
≤ D_{ℓ₂²} · sdp-cost + Σ_{e∈E} D_{ℓ₂}(|e|) Σ_{w∈e°} (‖w̄‖/|e°|) × max_{u,v∈e} ‖ū − v̄‖
= D_{ℓ₂²} · sdp-cost + Σ_{e∈E} Σ_{w∈e°} [D_{ℓ₂}(|e|) ‖w̄‖/√|e°|] × [max_{u,v∈e} ‖ū − v̄‖/√|e°|]
≤ D_{ℓ₂²} · sdp-cost + √(Σ_{e∈E} Σ_{w∈e°} D_{ℓ₂}(|e|)² ‖w̄‖²/|e°|) × √(Σ_{e∈E} Σ_{w∈e°} max_{u,v∈e} ‖ū − v̄‖²/|e°|)  (by Cauchy–Schwarz)
≤ D_{ℓ₂²} · sdp-cost + √(Σ_{w∈V} (Σ_{e : w∈e°} D_{ℓ₂}(|e|)²/|e°|) ‖w̄‖²) × √(sdp-cost).

For every vertex w,

Σ_{e : w∈e°} D_{ℓ₂}(|e|)²/|e°| ≤ O_β((m log m log log m)²) Σ_{e : w∈e°} log |e| / |e°| ≤ O_β((m log m log log m)²) × η^H_max,

and Σ_{w∈V} ‖w̄‖² = 1. Therefore,

C ≤ O_β(m · sdp-cost + m log m log log m · √(η^H_max · sdp-cost)).

By the argument from Theorem 3.1, we get that if we sample S′ sufficiently many times (i.e., O(n/α) times), we will find a set S′ such that

|E_cut(S′)|/|S′| ≤ 4C ≤ O_β(δ^{-1} log δ^{-1} log log δ^{-1} √(η^H_max · sdp-cost) + δ^{-1} · sdp-cost)

with probability exponentially close to 1.

C Reduction from Vertex Expansion to Hypergraph Expansion
In the reduction from vertex expansion to hypergraph expansion, we will use the notion of Symmetric Vertex Expansion. For a graph G = (V, E) and a set S ⊂ V, we define its internal neighborhood N_in(S) and its outer neighborhood N_out(S) as follows:

N_in(S) = {u ∈ S : ∃v ∈ S̄ such that {u, v} ∈ E},  N_out(S) = {u ∈ S̄ : ∃v ∈ S such that {u, v} ∈ E}.

The symmetric vertex expansion of a set S, denoted by Φ^V(S), is defined as

Φ^V(S) = |N_in(S) ∪ N_out(S)| / min(|S|, |S̄|),  Φ^V_{G,δ} = min_{S⊂V, 0<|S|≤δn} Φ^V(S).

We will use the following reduction from vertex expansion to symmetric vertex expansion.
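As a concrete sanity check on these definitions (and on the hyperedge construction {v} ∪ N_out({v}) used in Theorem C.2 below), the following sketch computes Φ^V(S) and the expansion of the corresponding hypergraph directly; the function names are ours.

```python
def symmetric_vertex_expansion(adj, S):
    """Phi^V(S) for a graph given as a dict of adjacency sets (sketch)."""
    V = set(adj)
    Sbar = V - S
    n_in = {u for u in S if adj[u] & Sbar}      # N_in(S)
    n_out = {u for u in Sbar if adj[u] & S}     # N_out(S)
    return len(n_in | n_out) / min(len(S), len(Sbar))

def hypergraph_expansion(edges, V, S):
    """phi_H(S): number of cut hyperedges over the smaller side's size."""
    cut = [e for e in edges if e & S and e - S]
    return len(cut) / min(len(S), len(V - S))

# 4-cycle a-b-c-d
adj = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
V = set(adj)
# one hyperedge {v} union N_out({v}) per vertex, as in Theorem C.2
edges = [frozenset({v} | adj[v]) for v in adj]
S = {"a", "b"}
assert symmetric_vertex_expansion(adj, S) == hypergraph_expansion(edges, V, S)
```

On this example both quantities equal 2, matching the equality φ_H(S) = Φ^V(S) established in the proof of Theorem C.2.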
Theorem C.1 (Louis, Raghavendra, and Vempala (2013)). Given a graph G, there exists a graph G′ such that

c₁ φ^V_{G,δ} ≤ Φ^V_{G′,δ} ≤ c₂ φ^V_{G,δ},

where c₁, c₂ > 0 are absolute constants, and the maximum degree of graph G′ is equal to the maximum degree of graph G. Moreover, there exists a polynomial-time algorithm to compute such a graph G′.

Theorem C.2 (Restatement of Theorem 1.6). There exist absolute constants c′₁, c′₂ ∈ ℝ⁺ such that for every graph G = (V, E), there exists a polynomial-time computable hypergraph H = (V′, E′′) such that

c′₁ φ*_{H,δ} ≤ φ^V_{G,δ} ≤ c′₂ φ*_{H,δ},

and η^H_max ≤ log(d_max + 1).

Proof. Starting with graph G, we use Theorem C.1 to obtain a graph G′ = (V′, E′) such that

c₁ φ^V_{G,δ} ≤ Φ^V_{G′,δ} ≤ c₂ φ^V_{G,δ}.  (8)

Next we construct hypergraph H = (V′, E′′) as follows. For every vertex v ∈ V′, we add the hyperedge {v} ∪ N_out({v}) to E′′ (note that N_out({v}) is the set of neighbors of v in G′). Fix an arbitrary set S ⊂ V′. We first show that Φ^V(S) ≤ φ_H(S). Consider the vertices in N_in(S). Each vertex v ∈ N_in(S) has a neighbor, say u, in S̄. Therefore the hyperedge {v} ∪ N_out({v}) is cut by S in H. Similarly, for each vertex v ∈ N_out(S), the hyperedge {v} ∪ N_out({v}) is cut by S in H. All these hyperedges are distinct by construction. Therefore,

Φ^V(S) = (|N_in(S)| + |N_out(S)|)/|S| ≤ |E_cut(S)|/|S| ≤ φ_H(S).

Now we verify that φ_H(S) ≤ Φ^V(S). For any hyperedge ({v} ∪ N_out({v})) ∈ E_cut(S), the vertex v has to belong to either N_in(S) or N_out(S). Therefore,

φ_H(S) ≤ |E_cut(S)|/|S| ≤ (|N_in(S)| + |N_out(S)|)/|S| = Φ^V(S).
Thus φ_H(S) = Φ^V(S) for every S ⊂ V′, and hence φ*_{H,δ} = Φ^V_{G′,δ}. We get from (8),

c₁ φ^V_{G,δ} ≤ φ*_{H,δ} ≤ c₂ φ^V_{G,δ}.

Finally, we upper bound η^H_max. We use part 3 of Claim B.1: we choose vertex v in the hyperedge {v} ∪ N_out({v}). By Claim B.1, η^H_max ≤ log r_max, where r_max is the size of the largest hyperedge. Note that |{v} ∪ N_out({v})| = deg v + 1. Thus

η^H_max ≤ log r_max ≤ log(d_max + 1).

D SDP Integrality Gap
In this section, we present an integrality gap for the SDP relaxation for H-SSE. We also give a lower bound on the distortion of a hypergraph m-orthogonal separator.

Theorem D.1.
For δ = 1/r, the integrality gap of the SDP for H-SSE is at least 1/(2δ) = r/2.

Proof. Consider a hypergraph H = (V, E) on n = r vertices with one hyperedge e = V (e contains all vertices). Note that the expansion of every set of size δn = 1 is 1. Thus φ*_{H,δ} = 1. Consider an SDP solution that assigns the vertices mutually orthogonal vectors of length 1/√r. It is easy to see that this is a feasible SDP solution. Its value is max_{u,v∈e} ‖ū − v̄‖² = 2/r. Therefore, the SDP integrality gap is at least r/2.

Now we give a lower bound on the distortion of hypergraph m-orthogonal separators.

Lemma D.2.
For every m > 1, there is an SDP solution such that every hypergraph m-orthogonal separator with separation threshold β ≤ 2 has distortion at least ⌈m⌉/4.

Proof. Consider the SDP solution from Theorem D.1 for n = r = ⌈m⌉. Consider a hypergraph m-orthogonal separator S for this solution. Let D be its distortion. Note that condition (2) from the definition of hypergraph orthogonal separators applies to any pair of distinct vertices (u, v), since ⟨ū, v̄⟩ = 0. By the inclusion–exclusion principle, we have,

Pr(|S| = 1) ≥ Σ_{u∈V} Pr(u ∈ S) − Σ_{u,v∈V, u≠v} Pr(u ∈ S, v ∈ S)
≥ Σ_{u∈V} α‖ū‖² − Σ_{u,v∈V, u≠v} α min(‖ū‖², ‖v̄‖²)/m
= α − α n(n − 1)/(2mr) = α(1 − (n − 1)/(2m)) ≥ α/2.

On the other hand, if |S| = 1 then S cuts e. We have,

Pr(|S| = 1) ≤ Pr(S cuts e) ≤ αD max_{u,v∈e} ‖ū − v̄‖² = 2αD/r.

We get that α/2 ≤ 2αD/r and thus D ≥ r/4 = ⌈m⌉/4.

E Proof of Lemma 4.3
Let K = max(⌈log₂ log m / (−log₂(1 − 2p))⌉, 1). We independently sample K assignments ω₁, …, ω_K. Let ω̃(u) = ω₁(u) ⊕ ⋯ ⊕ ω_K(u), where ⊕ denotes addition modulo 2.

Consider u and v such that ‖φ(ū) − φ(v̄)‖ ≥ β. Let p̃ = Pr(ω_i(u) ≠ ω_i(v)) ≥ p for i ∈ {1, …, K} (the expression does not depend on the value of i since all ω_i are identically distributed). Note that ω̃(u) ≠ ω̃(v) if and only if ω_i(u) ≠ ω_i(v) for an odd number of values i. Therefore,

Pr(ω̃(u) ≠ ω̃(v)) = Σ_{0 ≤ k < K/2} (K choose 2k+1) p̃^{2k+1} (1 − p̃)^{K−2k−1} = (1 − (1 − 2p̃)^K)/2 ≥ (1 − (1 − 2p)^K)/2 ≥ 1/2 − 1/(2 log m).

Now let e ⊂ V be a subset of size at least 2. We have,

Pr(ω̃(u) ≠ ω̃(v) for some u, v ∈ e) ≤ Pr(ω_i(u) ≠ ω_i(v) for some i and some u, v ∈ e) ≤ O(K β^{-1} √(log n)) max_{u,v∈e} ‖φ(ū) − φ(v̄)‖ = O(β^{-1} √(log n) log log m) max_{u,v∈e} ‖φ(ū) − φ(v̄)‖.
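For illustration, the XOR-amplification step above can be sketched as follows (the wrapper `boost` and the toy base sampler `sample_omega` are our own; any procedure whose far pairs disagree with probability at least p can be plugged in).

```python
import random
from math import ceil, log2

def boost(sample_omega, p, m, vertices, rng):
    """XOR K independent assignments to amplify the disagreement
    probability of far pairs from p to 1/2 - 1/(2 log m) (Lemma 4.3 sketch)."""
    K = max(ceil(log2(log2(m)) / -log2(1 - 2 * p)), 1)
    draws = [sample_omega(rng) for _ in range(K)]
    # omega-tilde(u) = omega_1(u) XOR ... XOR omega_K(u)
    return {u: sum(d[u] for d in draws) % 2 for u in vertices}

# Toy base assignment: independent fair coins, so any fixed pair
# disagrees with probability 1/2 >= p.
verts = ["u", "v"]
def sample_omega(rng):
    return {w: rng.randrange(2) for w in verts}

rng = random.Random(0)
omega_tilde = boost(sample_omega, p=0.3, m=256, vertices=verts, rng=rng)
```

The parity identity used in the proof is visible here: ω̃(u) ≠ ω̃(v) exactly when an odd number of the K draws separate u from v.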