Approximate Hypergraph Vertex Cover and generalized Tuza's conjecture
An Algorithmic Study of the Hypergraph Turán Problem
Venkatesan Guruswami ∗, Sai Sandeep †
Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213
Abstract
We propose an algorithmic version of the hypergraph Turán problem (AHTP): given a $t$-uniform hypergraph $H = (V, E)$, the goal is to find the smallest collection of $(t-1)$-element subsets of $V$ such that every hyperedge $e \in E$ contains one of these subsets. In addition to its inherent combinatorial interest (for instance, the $t = 3$ case is connected to Tuza's famous conjecture on covering triangles of a graph with edges), variants of AHTP arise in recently proposed reductions to fundamental Euclidean clustering problems.

AHTP admits a trivial factor $t$ approximation algorithm, as it can be cast as an instance of vertex cover on a structured $t$-uniform hypergraph that is a "blown-up" version of $H$. Our main result is an approximation algorithm with ratio $\frac{t}{2} + o(t)$. The algorithm is based on rounding the natural LP relaxation using a careful combination of thresholding and color coding.

We also present results motivated by structural aspects of the blown-up hypergraph. The blown-up hypergraph is a simple hypergraph, with hyperedges intersecting in at most one element. We prove that vertex cover on simple $t$-uniform hypergraphs is as hard to approximate as on general $t$-uniform hypergraphs. The blown-up hypergraph further has many forbidden structures, including a "tent" structure for the case $t = 3$. Whether a generalization of Tuza's conjecture could also hold for tent-free $3$-uniform hypergraphs was posed in a recent work. We answer this question in the negative by giving a construction based on combinatorial lines that is tent-free, and yet needs to include most of the vertices in a vertex cover.

∗ [email protected]. Research supported in part by NSF grant CCF-1908125.
† [email protected]. Research supported in part by NSF grants CCF-1563742 and CCF-1908125.

1 Introduction
What is the largest number of edges in an $n$-vertex graph without a copy of the $r$-clique $K_r$? This question is answered by Turán's theorem [Tur41]: among the graphs with no copy of $K_r$, the graph obtained by partitioning the $n$ vertices into $(r-1)$ equal or nearly equal parts and adding all edges across the parts has the largest number of edges. A hypergraph analog of the above question is the following: what is the largest number of edges in an $n$-vertex $3$-uniform hypergraph without a copy of $K_4^{(3)}$, the complete $3$-uniform hypergraph on $4$ vertices? Even though Turán's theorem is one of the earliest results in extremal combinatorics, the above hypergraph problem is still open.

More generally, given an $r$-uniform hypergraph $F$, finding the largest number of edges in an $r$-uniform hypergraph that does not have $F$ as a subhypergraph is referred to as the Hypergraph Turán problem. We refer the reader to an excellent survey due to Keevash [Kee11] on this topic. In this paper, we restrict ourselves to the case when $F$ is the complete $r$-uniform hypergraph on $(r+1)$ vertices. In this case, taking $t = r + 1$ and passing to the set of non-edges, the question can be equivalently posed as finding the minimum size of a family $\mathcal{F} \subseteq \binom{[n]}{t-1}$ of subsets of $[n]$ with cardinality $(t-1)$ such that for every subset $S$ of $[n]$ of size $t$, there exists a set $T \in \mathcal{F}$ such that $T$ is a subset of $S$.

The best known upper bound is due to [Sid97]: there exists a family $\mathcal{F} \subseteq \binom{[n]}{t-1}$ of size $O\left(\frac{\log t}{t}\right)\binom{n}{t-1}$ such that for every subset $S$ of $[n]$ of size $t$, there exists a subset $T \in \mathcal{F}$ such that $T$ is contained in $S$. On the other hand, the lower bound situation is rather dire, with only second-order improvements [CL99, LZ09] over the trivial $\frac{1}{t}\binom{n}{t-1}$ lower bound. Towards this, de Caen [dC94] conjectured that $t \cdot \binom{n}{t-1}^{-1} |\mathcal{F}| \to \infty$ for any such $\mathcal{F} \subseteq \binom{[n]}{t-1}$. In this paper, we study an algorithmic version of the above problem.

Problem 1.
(Algorithmic Hypergraph Turán Problem (AHTP)) Given a $t$-uniform hypergraph $G = (V = [n], E)$, find the minimum size of a family $\mathcal{F} \subseteq \binom{[n]}{t-1}$ of subsets of $V$ of size $(t-1)$ such that for every hyperedge $e \in E$, there exists $T \in \mathcal{F}$ such that $T$ is a subset of $e$.

The problem is a generalization of minimum vertex cover on graphs, which corresponds to the case $t = 2$. It seems fundamental and is interesting in its own right from a combinatorial perspective. Our main computational motivation comes from recent work on the hardness of clustering in Euclidean metrics [CCL20]. In that paper, the authors obtained several hardness results for $k$-median and related problems on Euclidean metric spaces under a hardness assumption called the Johnson Coverage Hypothesis. In the underlying coverage problem, we are given a $t$-uniform hypergraph $H = ([n], E)$ and a positive integer $k$. The goal is to pick $k$ subsets $S_1, S_2, \ldots, S_k \in \binom{[n]}{t-1}$ so as to maximize the fraction of hyperedges $f \in E$ such that there is an $i \in [k]$ with $S_i \subseteq f$. They conjectured that this coverage problem cannot be approximated to a factor better than $1 - \frac{1}{e} + \epsilon$ for every $\epsilon > 0$ for large enough $t$. AHTP is the covering version of the above problem, and we believe understanding the computational complexity of
AHTP is a natural and important step towards resolving the Johnson Coverage Hypothesis.

Note that the AHTP gets harder with increasing $t$, and thus the problem is NP-hard for every $t \geq 2$. We can view this problem as a special case of hypergraph vertex cover on the blown-up hypergraph $H = G^{(t-1)}$, whose vertices are all the $(t-1)$-sized subsets that are contained in at least one edge of $G$; corresponding to every edge $e$ in $G$, all the $(t-1)$-sized subsets of $e$ form an edge in $H$. Note that this blown-up hypergraph is a $t$-uniform hypergraph as well. Thus, there is a trivial factor $t$ approximation algorithm for the AHTP.

The original hypergraph Turán problem corresponds to the case when the input $G$ for the AHTP is the complete $t$-uniform hypergraph on $[n]$. There is a gap of $\Omega(\log t)$ between the current best upper bound and the best lower bound for the hypergraph Turán problem. Thus, a polynomial time algorithm achieving an $o(\log t)$ approximation factor for AHTP would imply a breakthrough in the hypergraph Turán problem, either by improving the state of the art on the lower bound front or by improving the upper bound of [Sid97].

Our main result is that, unlike for general hypergraph vertex cover, we can get an improved approximation algorithm for
AHTP.

Theorem 2. For every integer $t \geq 3$, there is a randomized polynomial time algorithm for AHTP that on any given $t$-uniform hypergraph $G$ outputs a vertex cover of $H = G^{(t-1)}$ whose expected size is at most $\frac{t}{2} + 2\sqrt{t \ln t}$ times the optimal solution.

We round the standard Linear Programming relaxation of the vertex cover problem on the blown-up hypergraph $H = G^{(t-1)} = (V(H), E(H))$. First, we use thresholding to ensure that the LP solution does not have any variables that are assigned values greater than roughly $\frac{2}{t}$. Let $S$ be the set of vertices of $H$ that are assigned a non-zero LP value. The thresholding procedure ensures that every hyperedge $e \in E(H)$ intersects $S$ in at least roughly $\frac{t}{2}$ vertices. We can bound the cardinality of $S$ from above by $t \cdot \mathrm{OPT}$ using the dual matching LP, where
OPT is the cost of the optimal LP solution. Thus, our goal is to find an algorithm that outputs a vertex cover with cardinality at most roughly $\frac{|S|}{2}$. We achieve this by a color-coding technique: we randomly assign a color from $\{0, 1\}$ to each vertex of $G$ independently. Once we assign the colors to the vertices of $G$, we argue that most of the hyperedges of $G$ satisfy a certain uniformity property, in the sense that each color appears nearly $\frac{t}{2}$ times in the edge. We then use this uniformity property to find a small vertex cover in $H$.

As mentioned earlier, AHTP is a special case of vertex cover on $t$-uniform hypergraphs. In fact, the blown-up hypergraph $G^{(t-1)}$ satisfies a stronger property: any two edges intersect in at most one vertex. This is simply because any two distinct $t$-sized subsets of $[n]$ share at most one $(t-1)$-sized subset. A hypergraph in which any two edges intersect in at most one vertex is known as a simple hypergraph. Simple hypergraphs have been well studied in graph theory, especially in the context of the Erdős-Faber-Lovász conjecture [Erd81, Erd88] and Ryser's conjecture [FHMW17]. A natural question to ask is whether our algorithm can be extended to obtain better approximation algorithms for vertex cover on simple hypergraphs. We prove that this is not the case, and in fact, vertex cover on simple hypergraphs is as hard as vertex cover on general $t$-uniform hypergraphs.

Theorem 3. (Vertex cover on simple hypergraphs) For every $\epsilon > 0$, unless $\mathrm{NP} \subseteq \mathrm{BPP}$, no polynomial time algorithm can approximate vertex cover on simple $t$-uniform hypergraphs within a factor of $t - 1 - \epsilon$.

The blown-up hypergraph is also studied in a recent work [AZ20] on a generalization of Tuza's conjecture. Tuza's conjecture [Tuz81, Tuz90] states that $\tau(H) \leq 2\nu(H)$, where $\tau(H)$ and $\nu(H)$ are the minimum vertex cover and the maximum matching, respectively, of the $3$-uniform hypergraph $H$ obtained with the edges of a graph $G$ as the vertices, and the triangles of $G$ as the edges of $H$.
Aharoni and Zerbib [AZ20] conjectured that more generally, for any $t$-uniform hypergraph $H$, the minimum vertex cover $\tau(H')$ of $H' = H^{(t-1)}$ is at most $\left\lceil \frac{t+1}{2} \right\rceil$ times the maximum matching $\nu(H')$. Tuza's conjecture is the special case of their conjecture when $t = 3$ and $H$ has hyperedges corresponding to the triangles in a graph.

[Footnote: Simple hypergraphs are also referred to as linear hypergraphs.]

[Figure 1: a 3-tent.]

Krivelevich [Kri95] proved the fractional version of Tuza's conjecture, namely that $\tau(G^{(2)}) \leq 2\tau^*(G^{(2)})$ for any $3$-uniform hypergraph $G$, where $\tau^*(H)$ denotes the minimum fractional vertex cover of a hypergraph $H$. Note that $\nu(H) \leq \tau^*(H) \leq \tau(H)$ for any hypergraph $H$, and thus the fractional version is a necessary step towards proving Tuza's conjecture. As our algorithm is based on rounding the standard LP relaxation, we obtain an upper bound on the integrality gap of the standard LP relaxation of AHTP. In other words, we prove a generalization of [Kri95] to the setting of Aharoni and Zerbib's conjecture:
Corollary 4. For any $t$-uniform hypergraph $G$,
$$\tau(H) \leq \left(\frac{t}{2} + 2\sqrt{t \ln t}\right) \tau^*(H),$$
where $\tau$ and $\tau^*$ are the minimum vertex cover and minimum fractional vertex cover of the blown-up hypergraph $H = G^{(t-1)}$, respectively.

As mentioned earlier, a key property of the blown-up hypergraphs $G^{(t-1)}$ is that they are simple hypergraphs. In addition, they have more structural properties. One such property is the absence of "tent" subhypergraphs. Aharoni and Zerbib [AZ20] studied this property and asked if we can prove a generalization of Tuza's conjecture to hypergraphs without tents. In this paper, we answer this question in the negative and prove that there are hypergraphs on $n$ vertices without tents where $\tau = (1 - o(1))n$.

Theorem 5.
For every $\epsilon > 0$, there exists a $3$-uniform hypergraph $H$ without a tent such that $\tau(H) \geq (3 - \epsilon)\nu(H)$.

Our counterexample is the hypergraph $H$ with vertex set $[3]^n$ for large enough $n$, with the edges being the set of combinatorial lines. By the Density Hales-Jewett Theorem [FK91, Pol12], there is no large independent set in $H$, and using the structure of combinatorial lines, we can prove that $H$ does not have any tent.

Related work. $\frac{t}{2}$-factor approximation algorithms have been obtained for the vertex cover problem on several other families of $t$-uniform hypergraphs: Lovász [Lov75] gave an algorithm to round the natural LP relaxation to get a $\frac{t}{2}$-factor approximation algorithm for vertex cover on $t$-uniform $t$-partite hypergraphs. This algorithm is shown to be optimal under the Unique Games Conjecture by Guruswami, Sachdeva, and Saket [GSS15] (and an almost matching NP-hardness is also shown). Aharoni, Holzman, and Krivelevich [AHK96] generalized the above algorithmic result to other classes of hypergraphs which have a partition of vertices obeying certain properties. A factor $\frac{t}{2}$ approximation algorithm has also been obtained on subdense regular $t$-uniform hypergraphs [CKSV12].

Outline.
In Section 2, we introduce some notation and definitions. In Section 3, we describe our algorithm and prove Theorem 2. Then, in Section 4, we consider simple hypergraphs and prove Theorem 3. Finally, in Section 5, we study the structural properties of the blown-up hypergraphs.
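The blown-up hypergraph $G^{(t-1)}$ that the paper works with throughout can be built directly from its description. The following is a minimal Python sketch, not part of the original paper; the function names `blow_up` and `is_cover` and the small example instance are illustrative assumptions.

```python
from itertools import combinations


def blow_up(edges, k):
    """Build the k-blown-up hypergraph of a uniform hypergraph.

    edges: iterable of frozensets (the hyperedges of G).
    Returns (vertices, blown_edges), where each vertex is a k-subset
    (frozenset) contained in some edge of G, and each blown-up edge is the
    frozenset of all k-subsets of one edge of G.
    """
    vertices = set()
    blown_edges = []
    for e in edges:
        subsets = frozenset(frozenset(s) for s in combinations(sorted(e), k))
        blown_edges.append(subsets)
        vertices |= subsets
    return vertices, blown_edges


def is_cover(cover, blown_edges):
    """Check that every blown-up edge contains at least one chosen subset."""
    return all(any(v in cover for v in e) for e in blown_edges)


# Hypothetical example: the complete 3-uniform hypergraph on {1, 2, 3, 4}.
G = [frozenset(e) for e in combinations(range(1, 5), 3)]
V, E = blow_up(G, 2)
print(len(V), len(E))
```

In this toy instance the two pairs {1, 2} and {3, 4} already hit every triple, so `is_cover` accepts them; the same helper can be used to validate the output of any rounding procedure.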
2 Preliminaries
Notation.
We use $[n]$ to denote the set $\{1, 2, \ldots, n\}$. We use $\mathbb{Z}_n$ to denote the set $\{0, 1, \ldots, n-1\}$. For a set $S$ and an integer $1 \leq k \leq |S|$, we use $\binom{S}{k}$ to denote the family of all the $k$-sized subsets of $S$. A hypergraph $H' = (V', E')$ is called a subhypergraph of $H = (V, E)$ if $V' \subseteq V$ and $E' \subseteq E$. For a hypergraph $H = (V, E)$, we use $\tau(H)$ and $\nu(H)$ to denote the size of the minimum vertex cover and the maximum matching, respectively. Similarly, we use $\tau^*(H)$ to denote the minimum fractional vertex cover of $H$:
$$\tau^*(H) = \min\left\{ \sum_{v \in V} x_v \;:\; x_v \in \mathbb{R}_{\geq 0}\ \forall v \in V,\ \sum_{v \in e} x_v \geq 1\ \forall e \in E \right\}$$
We define the $k$-blown-up hypergraph formally:

Definition 6.
For a $t$-uniform hypergraph $G = (V, E)$ and for an integer $1 \leq k < t$, we define the $k$-blown-up hypergraph $H = G^{(k)} = (V', E')$ as follows:
1. The vertex set $V' \subseteq \binom{V}{k}$ is the set of all $k$-sized subsets of $V$ that are contained in an edge of $G$:
$$V' = \{ U : U \subseteq V,\ |U| = k,\ \exists e \in E : U \subseteq e \}$$
2. For every edge $e \in E$, we include in $E'$ the family of all the $k$-sized subsets of $e$, so that
$$E' = \left\{ e' : e' = \binom{e}{k},\ e \in E \right\}$$

We will need the following Chernoff bound:
Lemma 7. (Multiplicative Chernoff bound) Suppose $X_1, X_2, \ldots, X_n$ are independent random variables taking values in $\{0, 1\}$. Let $X = X_1 + X_2 + \ldots + X_n$, and let $\mu = \mathbb{E}[X]$. Then, for any $0 \leq \delta \leq 1$,
$$\Pr\left( X \leq (1 - \delta)\mu \right) \leq e^{-\frac{\delta^2 \mu}{2}}$$

In this section, we present our algorithm for the
AHTP. Given a $t$-uniform hypergraph $G$ as an input to the AHTP, let $H = G^{(t-1)}$ be the $(t-1)$-blown-up hypergraph of $G$.

We first prove a lemma that in any $(t-1)$-blown-up hypergraph $H = ([n], E)$, there is a vertex cover of size at most $O\left(\frac{\log t}{t}\right) n$, using a color-coding argument. This lemma illustrates the color-coding idea well, and is also useful later in the context of the structural characterization (Conjecture 21) of the blown-up hypergraphs. This lemma is not used in the main algorithm, and the reader can skip to Section 3.2 for the algorithm.

Lemma 8.
Suppose $G = ([n], E(G))$ is a $t$-uniform hypergraph and $H = G^{(t-1)} = (V(H), E(H))$. Then, there exists a randomized polynomial time algorithm that outputs a vertex cover of $H$ with expected size at most $|V| \left( \frac{2 \ln t}{t} + O\left(\frac{1}{t}\right) \right)$, where $V = V(H)$.

Proof. Our algorithm is based on the color-coding technique used to get upper bounds for the hypergraph Turán problem [KR83, Sid95]. Let $P = \left\lceil \frac{t-1}{2 \ln t} \right\rceil$. Color each vertex of $G$ with $c : [n] \to [P]$ uniformly and independently at random. For $v \in V(H)$ and $i \in [P]$, let $C_i(v)$ denote the number of nodes of $v$ that are colored with $i$, i.e.,
$$C_i(v) := |\{ j \in v : c(j) = i \}|$$
We define a function $f : V(H) \to \mathbb{Z}_P$ as
$$f(v) = C_1(v) + 2 C_2(v) + \ldots + (P-1) C_{P-1}(v) \bmod P$$
For an element $i \in \mathbb{Z}_P$, let $f^{-1}(i)$ denote the set $\{ v \in V(H) : f(v) = i \}$. Let $p \in \mathbb{Z}_P$ be such that $|f^{-1}(p)| \leq |f^{-1}(i)|$ for all $i \in \mathbb{Z}_P$. Note that by definition, $|f^{-1}(p)| \leq \frac{|V|}{P}$. Let $U \subseteq V(H)$ be defined as follows:
$$U = \{ v : v \in V(H),\ \exists i \in [P] \text{ such that } C_i(v) = 0 \}$$
We claim that $S = f^{-1}(p) \cup U$ is a vertex cover of $H$. Consider an arbitrary edge $e = \{v_1, v_2, \ldots, v_t\} \in E(H)$. Let the corresponding edge in $G$ be $e(G) = \bigcup_{j \in [t]} v_j = (u_1, u_2, \ldots, u_t) \in E(G)$, where $u_1, u_2, \ldots, u_t$ are elements of $[n]$. Without loss of generality, let $v_j = e(G) \setminus \{u_j\}$. For a color $i \in [P]$, let $C_i(e) = |\{ j \in [t] : c(u_j) = i \}|$. We consider two cases separately:
1. First, if there exists a color $i \in [P]$ such that $C_i(e) = 0$, then for every $j \in [t]$, $C_i(v_j) = 0$, and thus, for every $j \in [t]$, $v_j \in U$, and thus, $e \cap S \neq \emptyset$.
2. Suppose that for every color $i \in [P]$, $C_i(e) > 0$. We define $f(e) \in \mathbb{Z}_P$ as
$$f(e) = C_1(e) + 2 C_2(e) + \ldots + (P-1) C_{P-1}(e) \bmod P$$
Note that for every $j \in [t]$, we have
$$f(v_j) = f(e) - c(u_j) \bmod P$$
As the cardinality of $\{c(u_1), c(u_2), \ldots, c(u_t)\}$ is equal to $P$, the cardinality of $\{f(v_1), f(v_2), \ldots, f(v_t)\}$ is equal to $P$ as well. Thus, there exists a $j \in [t]$ such that $f(v_j) = p$, which implies that $v_j \in S$.

Thus, our goal is to upper bound the expected value of $|S|$. Note that $P \leq \frac{t-1}{\ln t}$. By taking a union bound over all the colors, and treating $\frac{t-1}{P}$ as $2 \ln t$ (the rounding in the definition of $P$ affects only the constant in the $O(\cdot)$ term), we get
$$\mathbb{E}[|U|] \leq P \left(1 - \frac{1}{P}\right)^{t-1} |V| \leq \frac{t-1}{\ln t}\, e^{-2 \ln t}\, |V| \leq \left(\frac{1}{t \ln t}\right) |V| = O\left(\frac{1}{t}\right) |V|$$
Thus, the expected value of $|S|$ is at most $|f^{-1}(p)| + \mathbb{E}[|U|]$, which is at most $\left( \frac{2 \ln t}{t-1} + O\left(\frac{1}{t}\right) \right) |V|$.

3.2 LP rounding based algorithm for AHTP
Consider the standard LP relaxation for vertex cover in $H$:
$$\text{Minimize } \sum_{v \in V(H)} x_v \quad \text{such that} \quad \sum_{v \in e} x_v \geq 1\ \ \forall e \in E(H), \qquad x_v \geq 0\ \ \forall v \in V(H)$$
Let $x$ be an optimal solution to the above Linear Program, and let $\mathrm{OPT} = \sum_{v \in V(H)} x_v$. In general, OPT could be much smaller than $|V(H)|$, and thus we cannot use Lemma 8 directly. We now describe a randomized algorithm to round the above LP to obtain an integral solution whose expected size is at most $\left( \frac{t}{2} + 2\sqrt{t \ln t} \right) \mathrm{OPT}$.

For ease of notation, let $t' = \frac{t}{2} + 2\sqrt{t \ln t}$. Our first step is to round all the variables above a certain threshold to $1$ (Algorithm 1). However, we need to do it recursively to ensure that we can bound the optimal value of the remaining instance.

Algorithm 1
Recursive thresholding for AHTP
1: Let $\gamma = \frac{1}{t'}$.
2: Let $x$ be an optimal solution of the LP and let $V' = \{ v : x_v \geq \gamma \}$. Let $U = V'$.
3: while $V'$ is non-empty do
4:     Delete $V'$ from $V(H)$, and delete all the edges $e \in E(H)$ that contain at least one vertex $v \in V'$.
5:     Solve the LP with the updated $H$. Update $x$ to be the new LP solution.
6:     Update $V' = \{ v \in V(H) : x_v \geq \gamma \}$. Update $U \leftarrow U \cup V'$.
7: Output $U$ and the updated $H$.

Let the final updated hypergraph $H$ when Algorithm 1 terminates be denoted by $H'$. Let the optimal cost of the solution $x$ for the vertex cover LP on $H'$ be denoted by $\mathrm{OPT}'$. We prove that the size of the vertex cover output by the algorithm is not too large:

Lemma 9.
When the above recursive thresholding algorithm (Algorithm 1) terminates, we have $|U| \leq t' \cdot \left( \mathrm{OPT} - \mathrm{OPT}' \right)$.

Proof. We will inductively prove the following: after line 6 in the while loop of the algorithm, $|U| \leq t' \cdot (\mathrm{OPT} - \mathrm{OPT}_{new})$, where $\mathrm{OPT}_{new}$ is the cost of the current optimal solution $x$. Let $x'$ be the optimal solution before deleting $V'$ from $H$. Let $\mathrm{OPT}_{old}$ be the cost of the solution $x'$. By the inductive hypothesis, we have $|U| - |V'| \leq t' \cdot (\mathrm{OPT} - \mathrm{OPT}_{old})$.

We claim that $|V'| \leq t' \cdot (\mathrm{OPT}_{old} - \mathrm{OPT}_{new})$. As $x$ is an optimal fractional vertex cover of the updated $H$, and $x'$ restricted to the updated $H$ is a feasible solution for it, $x'$ restricted to the updated $H$ has cost at least $\mathrm{OPT}_{new}$. This implies that $\sum_{v \in V'} x'_v \leq \mathrm{OPT}_{old} - \mathrm{OPT}_{new}$. As each $x'_v$, $v \in V'$, is at least $\frac{1}{t'}$, we obtain the required claim.

We are now ready to state our main algorithm for the AHTP. The input to the algorithm is a $t$-uniform hypergraph $G$, and the output is a vertex cover for the hypergraph $H = G^{(t-1)}$.

Algorithm 2 Main algorithm
Apply Algorithm 1 to obtain $U$ and let $H' = (V(H'), E(H'))$ be the updated $H$. Let $x$ be an optimal solution of the vertex cover LP on $H'$ with $x_v \leq \gamma$ for all $v \in V(H')$.
Let $S \subseteq V(H')$ be defined as $S = \{ v \in V(H') : x_v > 0 \}$.
Let $\delta = \sqrt{\frac{4 \ln t}{t-1}}$.
Color the vertices $[n]$ of $G$ using $c : [n] \to \{0, 1\}$ uniformly and independently at random.
For a vertex $v \in S$ and a color $i \in \{0, 1\}$, let $C_i(v)$ denote the number of nodes of $v$ that are colored with the color $i$ (recall that $S \subseteq V(H') \subseteq \binom{[n]}{t-1}$), i.e., $C_i(v) = |\{ j \in v : c(j) = i \}|$.
Let $S' \subseteq S$ be defined as the set of vertices in $S$ where the discrepancy between the two colors is high: $S' = \left\{ v \in S : \exists i \in \{0, 1\} : C_i(v) \leq (1 - \delta) \frac{t-1}{2} \right\}$.
We now define a function $f : S \to \{0, 1\}$ as $f(v) = C_1(v) \bmod 2$. For $i \in \{0, 1\}$, let $f^{-1}(i)$ denote the set of all the vertices $v \in S$ such that $f(v) = i$. Let $p \in \{0, 1\}$ be such that $|f^{-1}(p)| \leq |f^{-1}(1 - p)|$.
Let $T \subseteq S$ be defined as $T = S' \cup f^{-1}(p)$.
Output $T \cup U$.

We will first prove that Algorithm 2 indeed outputs a valid vertex cover of $H$.

Lemma 10. $T \cup U$ is a vertex cover of $H$.

Proof. It suffices to prove that $T$ is a vertex cover of $H'$. Consider an arbitrary edge $e = (v_1, v_2, \ldots, v_t) \in E(H')$ corresponding to the edge $e(G) = \bigcup_{j \in [t]} v_j = \{u_1, u_2, \ldots, u_t\} \in E(G)$. Since $x_v \leq \gamma$ for all $v \in V(H')$, we can deduce that $|e \cap S| \geq \frac{1}{\gamma} = t'$. Our goal is to show that there exists $j \in [t]$ such that $v_j \in T$. We consider two separate cases:
1. If there is a color $i \in \{0, 1\}$ such that there are at most $(1 - \delta)\frac{t-1}{2}$ nodes of color $i$ in $e(G)$, then for all $j \in [t]$, $C_i(v_j) \leq (1 - \delta)\frac{t-1}{2}$. Since $e \cap S$ is non-empty, there exists $j \in [t]$ such that $v_j \in S$. By the definition of $S'$, this implies that $v_j \in S'$ as well, and thus $e \cap T \neq \emptyset$.
2. Suppose that in the coloring $c$, both the colors $0, 1$ occur more than $(1 - \delta)\frac{t-1}{2}$ times in $e(G)$. Let $e' = e \cap S$ and let $k = |e'| \geq t'$. Without loss of generality, let $e' = \{v_1, v_2, \ldots, v_k\}$. For every $j \in [k]$, let $v_j = e(G) \setminus \{u_j\}$ for $u_j \in [n]$. First, we claim that $t - k < (1 - \delta)\frac{t-1}{2}$. We have
$$t - k - (1 - \delta)\frac{t-1}{2} \leq t - t' - (1 - \delta)\frac{t-1}{2}$$
$$= \frac{t}{2} - 2\sqrt{t \ln t} - \left( 1 - \sqrt{\frac{4 \ln t}{t-1}} \right) \frac{t-1}{2}$$
$$= \frac{1}{2}\left( t - 4\sqrt{t \ln t} - (t - 1) + 2\sqrt{(t-1) \ln t} \right)$$
$$\leq \frac{1}{2}\left( 1 - 2\sqrt{t \ln t} \right) < 0$$
Since each color occurs more than $(1 - \delta)\frac{t-1}{2}$ times in $e(G)$, while only $t - k$ nodes of $e(G)$ lie outside $\{u_1, u_2, \ldots, u_k\}$, we can infer using the above that $|\{c(u_1), c(u_2), \ldots, c(u_k)\}| \geq 2$.

We define the value $f(e)$ in the same fashion as we have defined $f(v)$ for $v \in S$: for $i \in \{0, 1\}$, let $C_i(e)$ denote the number of nodes $j \in [t]$ such that $c(u_j) = i$, and let $f(e) = C_1(e) \bmod 2$. Using this definition, we get
$$f(v_j) = f(e) - c(u_j) \bmod 2 \quad \forall j \in [k].$$
As $\{c(u_1), c(u_2), \ldots, c(u_k)\} = \{0, 1\}$, we have $\{f(v_1), f(v_2), \ldots, f(v_k)\} = \{0, 1\}$ as well. Thus, there exists $j \in [k]$ such that $f(v_j) = p$, which proves that $v_j \in f^{-1}(p) \subseteq T$.

In order to bound the expected size of the output of the algorithm, we need a couple of lemmas. First, we prove that the cardinality of $S$ is not too large:

Lemma 11.
The cardinality of $S$ is at most $t \cdot \mathrm{OPT}'$.

Proof. Consider the dual of the vertex cover LP:
$$\text{Maximize } \sum_{e \in E(H')} y_e \quad \text{such that} \quad \sum_{e \ni v} y_e \leq 1\ \ \forall v \in V(H'), \qquad y_e \geq 0\ \ \forall e \in E(H')$$
Let $y$ be an optimal solution to the above matching LP. By LP duality, we get $\sum_{e \in E(H')} y_e = \mathrm{OPT}'$. Recall that for all $v \in S$, $x_v \neq 0$. By the complementary slackness conditions, we get that for all $v \in S$, $\sum_{e \ni v} y_e = 1$. Summing over all $v \in S$, and using that every edge contains exactly $t$ vertices, we obtain
$$|S| = \sum_{v \in S} \sum_{e \ni v} y_e \leq t \sum_{e \in E(H')} y_e = t \cdot \mathrm{OPT}'.$$

Note that the expected number of nodes of each color $i \in \{0, 1\}$ in a vertex $v = (u_1, u_2, \ldots, u_{t-1}) \in S$ is equal to $\frac{t-1}{2}$. The set $S'$ is the set of vertices of $S$ where there is a color that occurs much less often than its expected value. We prove that this happens with low probability:

Lemma 12.
The expected cardinality of $S'$ is at most $\frac{2}{t} |S|$.

Proof. Let $v = (u_1, u_2, \ldots, u_{t-1}) \in S$ be an arbitrary vertex in $S$, where $u_1, u_2, \ldots, u_{t-1}$ are elements of $[n]$. For a color $i \in \{0, 1\}$, let the random variable $X(i)$ denote the number of nodes $j \in [t-1]$ such that $c(u_j) = i$. We can write $X(i) = \sum_{j \in [t-1]} X(i, j)$, where $X(i, j)$ is the indicator random variable of the event that $c(u_j) = i$. We have $\mu = \mathbb{E}[X(i)] = \frac{t-1}{2}$. Using the multiplicative Chernoff bound (Lemma 7), we can upper bound the probability that $X(i) \leq (1 - \delta)\frac{t-1}{2}$ by
$$\Pr\left( X(i) \leq (1 - \delta)\frac{t-1}{2} \right) \leq e^{-\frac{\delta^2 (t-1)}{4}}$$
By substituting $\delta = \sqrt{\frac{4 \ln t}{t-1}}$, we get that the above probability is at most $\frac{1}{t}$. By applying a union bound over the two colors and adding the expectation over all the vertices in $S$, we obtain the lemma.

Finally, we bound the expected size of the output of the algorithm:

Lemma 13.
The expected cardinality of $T \cup U$ is at most $\left( \frac{t}{2} + 2\sqrt{t \ln t} \right) \cdot \mathrm{OPT}$.

Proof. Note that by definition, $|f^{-1}(p)| \leq \frac{1}{2}|S|$. We bound the expected size of the output $T \cup U$ of the algorithm as
$$\mathbb{E}[|T \cup U|] \leq \mathbb{E}[|T|] + \mathbb{E}[|U|] \leq \mathbb{E}[|S'|] + \frac{1}{2}|S| + \mathbb{E}[|U|]$$
$$\leq \left( \frac{1}{2} + \frac{2}{t} \right) |S| + \mathbb{E}[|U|] \quad \text{(using Lemma 12)}$$
$$\leq \left( \frac{t}{2} + 2 \right) \mathrm{OPT}' + \mathbb{E}[|U|] \quad \text{(using Lemma 11)}$$
$$\leq \left( \frac{t}{2} + 2\sqrt{t \ln t} \right) \mathrm{OPT} \quad \text{(using Lemma 9)}.$$

Lemma 10 and Lemma 13 together give a proof of Theorem 2.
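The two-color rounding step of Algorithm 2 can be sketched in Python as follows. This is a hedged illustration rather than the authors' implementation: it skips the LP entirely and takes the support set $S$ as input, and instead of deriving $|e \cap S| \geq t'$ from an LP solution it checks directly the inequality $t - |e \cap S| < (1 - \delta)\frac{t-1}{2}$ used in the proof of Lemma 10. The function name `round_support` and the test instance (with $S$ taken to be all of $V(H)$) are assumptions made for the example.

```python
import math
import random
from itertools import combinations


def round_support(G_edges, S, t, seed=0):
    """Two-colour rounding: return T = S' ∪ f^{-1}(p) as a set of (t-1)-subsets.

    G_edges: list of frozensets of size t (edges of G).
    S: set of frozensets of size t-1 (stand-in for the LP support).
    Raises ValueError if the inequality used in Lemma 10 fails for some edge.
    """
    rng = random.Random(seed)
    delta = math.sqrt(4 * math.log(t) / (t - 1))
    threshold = (1 - delta) * (t - 1) / 2

    # Precondition from the proof of Lemma 10: t - |e ∩ S| < threshold.
    for e in G_edges:
        k = sum(1 for u in e if e - {u} in S)
        if t - k >= threshold:
            raise ValueError("an edge meets S in too few vertices")

    # Colour every vertex of G with 0 or 1, uniformly and independently.
    ground = sorted(set().union(*G_edges))
    colour = {u: rng.randint(0, 1) for u in ground}

    def ones(v):
        return sum(colour[u] for u in v)  # C_1(v)

    # S': vertices of S in which some colour is rare.
    S_prime = {v for v in S if min(ones(v), len(v) - ones(v)) <= threshold}

    # Parity classes f^{-1}(0), f^{-1}(1); keep the smaller one.
    classes = {0: set(), 1: set()}
    for v in S:
        classes[ones(v) % 2].add(v)
    p = 0 if len(classes[0]) <= len(classes[1]) else 1
    return S_prime | classes[p]


# Hypothetical instance: complete 12-uniform hypergraph on 14 vertices.
t = 12
G = [frozenset(e) for e in combinations(range(14), t)]
S = {e - {u} for e in G for u in e}
T = round_support(G, S, t, seed=0)
covered = all(any(e - {u} in T for u in e) for e in G)
print(len(S), len(T), covered)
```

The parity class $f^{-1}(p)$ contributes at most $|S|/2$ vertices by construction; how many extra vertices $S'$ adds depends on the random coloring, which is what Lemma 12 controls in expectation.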
In this section, we prove Theorem 3. Our hardness result is obtained using a reduction from the general problem of vertex cover on $t$-uniform hypergraphs. In particular, we use the following result:

Theorem 14. ([DGKR05]) For every constant $\epsilon > 0$ and $t \geq 3$, the following holds: given a $t$-uniform hypergraph $G = (V, E)$, it is NP-hard to distinguish between the following cases:
1. Completeness: $G$ has a vertex cover of measure $\frac{1 + \epsilon}{t - 1}$.
2. Soundness: Any subset of $V$ of measure $\epsilon$ contains an edge from $E$.

We give a randomized reduction from Theorem 14 to Theorem 3. In particular, we instantiate Theorem 14 with $\epsilon$ replaced by $\epsilon' = \frac{\epsilon}{4}$, and let the resulting hypergraph be denoted by $G$. Now, given this $t$-uniform hypergraph $G = (V, E)$, we output a $t$-uniform hypergraph $H = (V', E')$ as follows. Let $n = |V|$, $m = |E|$. We have integer parameters $B, P$ depending on $\epsilon, t, n, m$ to be set later. The vertex set of $H$ is $V' = V \times [B]$: we have a cloud of $B$ vertices $v^1, v^2, \ldots, v^B$ in $V'$ corresponding to every vertex $v \in V$. For every edge $e = (v_1, v_2, \ldots, v_t) \in E$, we pick $P$ edges $e_1, e_2, \ldots, e_P$ with $e_i = ((v_1)_i, (v_2)_i, \ldots, (v_t)_i)$ and add them to $E'$, where for each $j \in [t]$ and $i \in [P]$, $(v_j)_i$ is chosen uniformly and independently at random from $(v_j)^1, (v_j)^2, \ldots, (v_j)^B$. Thus, so far, we have added $mP$ edges to $E'$.

We first upper bound the expected value of the number of pairs of edges in $E'$ that intersect in more than one vertex. Order the edges in $E'$ as $e_1, e_2, \ldots, e_{mP}$. Let $X$ denote the random variable that counts the number of pairs of edges in $E'$ that intersect in more than one vertex. For every pair of indices $i, j \in [mP]$, let the random variable $X_{ij}$ be the indicator variable of the event that the edges $e_i$ and $e_j$ of $E'$ intersect in more than one vertex. Note that the edges in $E$ corresponding to $e_i$ and $e_j$ have at most $t$ vertices in common.
Thus, the probability that $e_i$ and $e_j$ intersect in at least two vertices is upper bounded by $\binom{t}{2} \frac{1}{B^2}$. Summing over all the pairs $i, j$, we get
$$\mathbb{E}[X] \leq \binom{mP}{2} \binom{t}{2} \frac{1}{B^2} \leq \frac{m^2 t^2 P^2}{B^2}.$$
By Markov's inequality, with probability at least $\frac{9}{10}$, $X$ is at most $\frac{10 m^2 t^2 P^2}{B^2}$.

We consider all the pairs of edges that intersect in more than one vertex in $E'$, and arbitrarily delete one of those edges. Let the resulting set of edges be denoted by $E''$. The final hypergraph resulting from this reduction is $H = (V', E'')$. Note that $H$ is indeed a simple hypergraph. We will prove the following:
1. (Completeness) If $G$ has a vertex cover of measure $\mu$, then there is a vertex cover of measure $\mu$ in $H$.
2. (Soundness) If every subset of $V$ of measure $\epsilon'$ contains an edge from $E$, then with probability at least $\frac{4}{5}$, every subset of $V'$ of measure $\epsilon$ contains an edge from $E''$.

Completeness. If $G$ has a vertex cover of size $\mu n$, then picking all the vertices in $V'$ in the clouds corresponding to these vertices ensures that $H$ has a vertex cover of size $\mu n B$. Thus, in the completeness case, there is a vertex cover of measure $\mu$ in $H$.

Soundness.
Suppose that every $\epsilon'$ measure subset of $V$ contains an edge from $E$. Our goal is to show that with probability at least $\frac{4}{5}$, every $\epsilon$ measure subset of $V'$ contains an edge from $E''$. We first prove the following lemma:

Lemma 15.
With probability at least $\frac{9}{10}$ over the choice of $E'$, the following holds: for every edge $e = (v_1, v_2, \ldots, v_t) \in E$, and every subset $S \subseteq V'$ such that for each $i \in [t]$, $S$ contains at least $\frac{\epsilon}{4} B$ vertices from $\{v_i^1, v_i^2, \ldots, v_i^B\}$, there exists an edge $e' \in E'$ all of whose vertices are in $S$.

Proof. The probability that there exists an edge $e = (v_1, v_2, \ldots, v_t)$ and a subset $S$ which contains at least $\frac{\epsilon}{4} B$ vertices from each cloud and does not contain any edge from $E'$ is at most
$$m\, 2^{tB} \left( 1 - \left( \frac{\epsilon}{4} \right)^t \right)^P \leq m\, 2^{tB - \log e \cdot \frac{\epsilon^t P}{4^t}} \leq \frac{1}{10}$$
when $P = maB$, where $a := a(t, \epsilon) = \frac{4^{t+2} t}{\epsilon^t}$.

Using the above lemma, we can conclude that with probability at least $\frac{4}{5}$, $X \leq \frac{10 m^2 t^2 P^2}{B^2} = 10 m^4 t^2 a^2$, and for every edge $e \in E$ and every subset $S \subseteq V'$ such that for each $i \in [t]$, $S$ contains at least $\frac{\epsilon}{4} B$ vertices from $\{v_i^1, v_i^2, \ldots, v_i^B\}$, there exists an edge $e' \in E'$ all of whose vertices are in $S$. We claim that this implies that with probability at least $\frac{4}{5}$, every $\epsilon$ measure subset of $V'$ contains an edge of $E''$. Consider an arbitrary subset $U \subseteq V'$ such that $|U| \geq \epsilon n B$. We choose $B$ large enough such that $t (10 m^4 t^2 a^2) \leq \frac{\epsilon}{2} n B$. Thus, the set $W$ of the vertices in the edges deleted from $E'$ to obtain $E''$ has cardinality at most $\frac{\epsilon}{2} n B$.

Let $U' = U \setminus W$. Note that all the edges in $U'$ that are in $E'$ are present in $E''$ as well. As $U'$ has a measure of at least $\frac{\epsilon}{2}$ in $V'$, for at least $\frac{\epsilon}{4} n$ vertices $v$ in $V$, $U'$ should contain at least an $\frac{\epsilon}{4}$ fraction of the vertices in the cloud $\{v^1, v^2, \ldots, v^B\}$, since otherwise the cardinality of $U'$ is at most $\left( n - \frac{\epsilon n}{4} \right) \cdot \frac{\epsilon B}{4} + \frac{\epsilon n}{4} \cdot B < \frac{\epsilon n B}{2}$, a contradiction. These vertices form a subset of $V$ of measure at least $\epsilon'$, which by assumption contains an edge of $E$. By Lemma 15, we can deduce that there exists an edge $e \in E'$ all of whose vertices are in $U'$, which implies that the edge $e$ is in $E''$ as well. This proves that in the soundness case, with probability at least $\frac{4}{5}$, there exists an edge in every $\epsilon$ measure subset of $V'$.

This completes the proof of Theorem 3.
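The mechanics of the cloud reduction (sample $P$ random copies of each edge inside the clouds, then delete edges until no two intersect in more than one vertex) can be sketched as follows. This Python sketch is illustrative only: the name `cloud_reduction`, the greedy deletion order, and the tiny parameters are assumptions, and the values of $B$ and $P$ needed for the soundness analysis are far larger than anything used here.

```python
import random
from itertools import combinations


def cloud_reduction(G_edges, B, P, seed=0):
    """Return the edges E'' of a simple hypergraph on vertex set V x [B].

    G_edges: list of tuples of distinct vertices (edges of G).
    Each vertex v of G is replaced by a cloud {(v, 0), ..., (v, B-1)}.
    """
    rng = random.Random(seed)
    # E': P independent random copies of every edge of G.
    sampled = []
    for e in G_edges:
        for _ in range(P):
            sampled.append(frozenset((v, rng.randrange(B)) for v in e))
    # One way to delete an edge from every bad pair: keep an edge only if it
    # shares at most one vertex with each edge kept so far.
    kept = []
    for e in sampled:
        if all(len(e & f) <= 1 for f in kept):
            kept.append(e)
    return kept


def is_simple(edges):
    """Check that any two edges intersect in at most one vertex."""
    return all(len(a & b) <= 1 for a, b in combinations(edges, 2))


# Hypothetical toy input: three triples on four vertices.
G = [(0, 1, 2), (1, 2, 3), (0, 2, 3)]
H = cloud_reduction(G, B=50, P=5, seed=1)
print(len(H), is_simple(H))
```

Every edge dropped by the greedy pass belongs to some pair counted by $X$, so the number of deletions is at most $X$, in line with the bound on $|W|$ in the proof.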
Under the Unique Games Conjecture [Kho02], the hardness of vertex cover in $t$-uniform hypergraphs can be improved to $t - \epsilon$. We remark that we can get the same hardness for simple hypergraphs by our reduction.

Since AHTP is the problem of vertex cover on $H = G^{(t-1)}$ for a given $t$-uniform hypergraph $G$, an interesting question is to characterize for which $t$-uniform hypergraphs $H$ there exists another $t$-uniform hypergraph $G$ such that $H = G^{(t-1)}$. As mentioned in the previous section, one necessary condition is that the hypergraph $H$ should be simple. However, this is not sufficient: there are simple $t$-uniform hypergraphs that cannot be written as $G^{(t-1)}$. One possible avenue towards resolving this is the following question: is there a finite set of hypergraphs $\mathcal{F}$ such that every hypergraph without any subhypergraph from $\mathcal{F}$ can be represented as $G^{(t-1)}$ for some hypergraph $G$?

Aharoni and Zerbib [AZ20] study this question for $t = 3$ in the context of Tuza's conjecture. They propose a general form of Tuza's conjecture, that $\tau\left(G^{(2)}\right) \leq 2\nu\left(G^{(2)}\right)$ for all $3$-uniform hypergraphs $G$. They suggested that pinning down the exact structural property of the blown-up hypergraphs $H = G^{(2)}$ that leads to $\tau(H) \leq 2\nu(H)$ could be a way to resolve the conjecture. A candidate characterization they had is the absence of a "tent", which we define below.

Definition 16. A $t$-tent (Figure 1) is a set of four $t$-uniform edges $e_1, e_2, e_3, e_4$ such that
1. $\bigcap_{i=1}^{3} e_i \neq \emptyset$.
2. $|e_4 \cap e_i| = 1$ for all $i \in [3]$.
3. $e_4 \cap e_i \neq e_4 \cap e_j$ for all $i \neq j \in [3]$.

In [AZ20], the authors ask the following problem:
Problem 17. Is it true that for every $3$-uniform hypergraph $H$ without a $3$-tent, $\tau(H) \leq 2\nu(H)$?

We answer this question in the negative. Our counterexample is a hypergraph with vertex set $[3]^n$ for large enough $n$, whose edge set is the set of all combinatorial lines, which we formally define below.

Definition 18. (Combinatorial lines in $[3]^n$) A set of three distinct vectors $u = (u_1, u_2, \ldots, u_n)$, $v = (v_1, v_2, \ldots, v_n)$, $w = (w_1, w_2, \ldots, w_n) \in [3]^n$ forms a combinatorial line if there exists a subset $S \subseteq [n]$ such that
1. For all $i \in [n] \setminus S$, $u_i = v_i = w_i$.
2. There exist three distinct integers $u', v', w' \in [3]$ such that for all $i \in S$, $u_i = u'$, $v_i = v'$, $w_i = w'$.

We will use the following seminal result about combinatorial lines:

Theorem 19. (Density Hales-Jewett Theorem [FK91], [Pol12]) For every positive integer $k$ and every real number $\delta > 0$ there exists a positive integer $DHJ(k, \delta)$ such that if $n \geq DHJ(k, \delta)$ and $A$ is any subset of $[k]^n$ of density at least $\delta$, then $A$ contains a combinatorial line.

We now prove Theorem 5.
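Before the proof, Definitions 16 and 18 can be made concrete for small $n$. The sketch below is illustrative only and is not part of the original argument: the function names `combinatorial_lines`, `is_linear` and `has_tent` are hypothetical, and the tent test is a plain brute-force search that is practical only for tiny instances.

```python
from itertools import product


def combinatorial_lines(n, k=3):
    """All combinatorial lines in [k]^n, each as a frozenset of k points.

    A line is described by a template over {1, ..., k, '*'} with at least
    one wildcard; its points arise by setting every '*' to the same value.
    """
    symbols = list(range(1, k + 1)) + ["*"]
    lines = set()
    for template in product(symbols, repeat=n):
        if "*" not in template:
            continue  # the wildcard set S must be non-empty
        points = frozenset(
            tuple(a if c == "*" else c for c in template)
            for a in range(1, k + 1)
        )
        lines.add(points)
    return lines


def is_linear(edges):
    """True if any two distinct edges share at most one vertex."""
    edges = list(edges)
    return all(
        len(e & f) <= 1
        for i, e in enumerate(edges)
        for f in edges[i + 1:]
    )


def has_tent(edges):
    """Brute-force test for a 3-tent (Definition 16) in a 3-uniform hypergraph."""
    edges = list(edges)
    vertices = set().union(*edges)
    for x in vertices:  # x plays the role of the common vertex of e1, e2, e3
        through_x = [e for e in edges if x in e]
        for e4 in edges:
            # Vertices of e4 reached by an edge through x that meets e4
            # in exactly one vertex.
            hit = {next(iter(e & e4)) for e in through_x if len(e & e4) == 1}
            if len(hit) >= 3:  # three edges with pairwise distinct meeting points
                return True
    return False


if __name__ == "__main__":
    lines = combinatorial_lines(3)
    print(len(lines), is_linear(lines), has_tent(lines))
```

For $n = 3$ this enumerates $4^3 - 3^3 = 37$ lines, one per template with at least one wildcard. The tent search should find nothing, in agreement with the first claim in the proof of Theorem 5.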
Theorem 5. For every $\epsilon > 0$, there exists a $3$-uniform hypergraph $H$ without a $3$-tent such that $\tau(H) > (3 - \epsilon)\nu(H)$.

Proof. The hypergraph $H = (V, E)$ that we use has $V = [3]^n$ for $n$ large enough, to be set later, and its edges are all the combinatorial lines in $[3]^n$.

First, we claim that $H$ does not have a $3$-tent. Suppose for contradiction that there are edges $e_1, e_2, e_3, e_4$ satisfying the properties of Definition 16. Let $u = (u_1, u_2, \ldots, u_n) \in e_1 \cap e_4$, $v = (v_1, v_2, \ldots, v_n) \in e_2 \cap e_4$, and $w = (w_1, w_2, \ldots, w_n) \in e_3 \cap e_4$. By the third property of Definition 16 these three points are distinct, so $e_4 = \{u, v, w\}$. Thus, there exists a subset $S \subseteq [n]$ such that for all $i \in [n] \setminus S$, $u_i = v_i = w_i$. Without loss of generality, we can also assume that for all $i \in S$, $u_i = 1$, $v_i = 2$, $w_i = 3$.

Let $x = (x_1, x_2, \ldots, x_n) \in e_1 \cap e_2 \cap e_3$. Note that $\{x, u\} \subseteq e_1$, $\{x, v\} \subseteq e_2$, $\{x, w\} \subseteq e_3$. Consider an arbitrary element $p \in S$, and without loss of generality, let $x_p = 1$. Thus, we have $x_p = 1$, $v_p = 2$, and both $x$ and $v$ lie on the combinatorial line $e_2$. This implies that there exists a subset $S_1 \subseteq [n]$ such that for all $i \in [n] \setminus S_1$, $x_i = v_i$, and for all $i \in S_1$, $x_i = 1$, $v_i = 2$. Similarly, there exists a subset $S_2 \subseteq [n]$ such that for all $i \in [n] \setminus S_2$, $x_i = w_i$, and for all $i \in S_2$, $x_i = 1$, $w_i = 3$.

Note that $S_1 \subseteq S$. Suppose for contradiction that there exists $j \in S_1 \setminus S$. Then, we have $v_j = 2$, $x_j = 1$. However, since $v_i = w_i$ for all $i \in [n] \setminus S$, we get that $w_j = 2$, and thus $j \notin S_2$, which implies that $x_j = w_j = 2$, a contradiction. Thus, $S_1 \subseteq S$, and similarly $S_2 \subseteq S$. We can also observe that $S_1 \neq S$, since otherwise $x = u$, which would place both $u$ and $v$ in $e_2 \cap e_4$ and cannot happen since $|e_2 \cap e_4| = 1$. By the same argument on $e_3$, we can deduce that $S_2 \neq S$. As $S_1$ is a strict subset of $S$, there exists $j \in S \setminus S_1$. As $v_i = x_i$ for all $i \in [n] \setminus S_1$, $x_j = v_j = 2$. As $j \in S$, we have $w_j = 3$. However, as $w_j \neq x_j$, this implies that $j \in S_2$, which then implies that $x_j = 1$, a contradiction.

Now, we will prove that for large enough $n$, $\tau(H) > (3 - \epsilon)\nu(H)$. Let $N = 3^n$. Since the cardinality of $V$ is equal to $N$ and every edge has three vertices, we have $\nu(H) \leq \frac{N}{3}$. We apply Theorem 19 with $k = 3$, $\delta = \frac{\epsilon}{3}$, and set $n \geq DHJ(k, \delta)$. Thus, we can infer that in any subset $T \subseteq V$ of size at least $\frac{\epsilon}{3} N$, there exists an edge of $H$ fully contained in $T$. Since the complement of a vertex cover contains no edge, we get that $\tau(H) > \left(1 - \frac{\epsilon}{3}\right) N$, which gives $\tau(H) > (3 - \epsilon)\nu(H)$.

One might wonder if the above construction based on combinatorial lines can be used as a counterexample to the generalized Tuza's conjecture of Aharoni and Zerbib [AZ20] that $\tau(H) \leq 2\nu(H)$ for all $3$-uniform hypergraphs $H$ such that $H = G^{(2)}$ for some $G$. However, the blown-up hypergraphs have stronger structural properties. For example, the "(2,3)-grid" subhypergraph (Figure 2) is absent in blown-up hypergraphs but is abundant in the hypergraph of combinatorial lines.

Figure 2: $(2,3)$-grid

This raises the question of whether there are finite substructures the exclusion of which fully characterizes the blown-up hypergraphs. Aharoni and Zerbib [AZ20] informally ask this question, and we make it formal in the conjecture below.

Conjecture 20. (Subhypergraph characterization of the blown-up hypergraphs) For every $t \geq 3$, there is a finite family of $t$-uniform hypergraphs $\mathcal{F}_t$ such that a $t$-uniform hypergraph $H$ can be written as $G^{(t-1)}$ for another $t$-uniform hypergraph $G$ if and only if $H$ does not have any subhypergraph from $\mathcal{F}_t$.

Similar to Problem 17, Conjecture 20 also implies (using Lemma 8) the following weaker conjecture on the presence of large independent sets in hypergraphs without certain substructures:
Conjecture 21. For every $t \geq 3$, there is a finite family of $t$-uniform hypergraphs $\mathcal{F}_t$ and a constant $c_t < 1$ such that:
1. For every hypergraph $H$ such that $H = G^{(t-1)}$, $H$ does not contain any subhypergraph from $\mathcal{F}_t$.
2. For every hypergraph $H = ([n], E)$ without any subhypergraph from $\mathcal{F}_t$, there is a vertex cover of $H$ with cardinality $c_t n$.

We believe that either a positive or negative resolution to Conjecture 20 and Conjecture 21 would improve our understanding of the blown-up hypergraphs and help in making progress on Tuza's conjecture and AHTP.

In this paper, we introduced an algorithmic version of the hypergraph Turán problem (AHTP) for $t$-uniform hypergraphs and gave a factor $\frac{t}{2} + o(t)$ algorithm for it. Our work also raises several natural directions to explore further:

1. Finding better algorithms and improved hardness results for AHTP is still wide open. Especially on the hardness front, the best hardness result (for any $t$) is the factor $2$ hardness of the vertex cover problem on graphs, under the Unique Games Conjecture. Towards this, an interesting open problem is to obtain an $\omega(1)$ lower bound on the integrality gap of the standard LP relaxation for AHTP.

2. $(t, k)$-version of AHTP: Throughout the paper, we have studied the problem of vertex cover on $H = G^{(t-1)}$ for a given $t$-uniform hypergraph $G$. An interesting generalization is the problem of vertex cover on $H = G^{(k)}$ for a $t$-uniform hypergraph $G$, for an arbitrary $1 \leq k < t$. The case of $k = 1$ is the standard vertex cover on $t$-uniform hypergraphs, and $k = t - 1$ is AHTP. Of special interest is the case when $k = 2$: finding an $o(t)$ factor approximation algorithm or showing NP-hardness of finding one is an interesting open problem.

3. The dual problem to AHTP is the maximum matching problem on $t$-blown-up hypergraphs. For the general case of the maximum matching problem on $t$-uniform hypergraphs, also known as $t$-set packing, Cygan [Cyg13] gave a local search algorithm that achieves an approximation factor of $\frac{t+1}{3} + \epsilon$ for any $\epsilon > 0$. Can we get better algorithms for the maximum matching problem on $H = G^{(t-1)}$? On the hardness front, by a simple reduction from the independent set problem on graphs with maximum degree $t$ [AKS11, Cha16], it follows that the maximum matching problem on $t$-uniform simple hypergraphs is NP-hard to approximate better than $\Omega\left(\frac{t}{\log^{4} t}\right)$.

As mentioned earlier, the hardness of the coverage version of AHTP, the Johnson Coverage Hypothesis (JCH), has various implications for fundamental clustering problems in Euclidean metrics. The coverage version of the set cover problem, known as Maximum Coverage, has $1 - \frac{1}{e} + \epsilon$ hardness, which can be proved by a simple reduction from the $\ln n$ hardness of set cover. Whether such progress towards JCH can be made assuming some form of hardness of AHTP is an interesting open problem.
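The $(t, k)$-version described among the open directions above is easy to experiment with on very small instances. The following sketch is an illustration under stated assumptions and not an algorithm from this paper: `blow_up` and `min_vertex_cover_size` are hypothetical helper names, and the cover is found by exhaustive search, which takes exponential time.

```python
from itertools import combinations


def blow_up(edges, k):
    """Return G^(k): one hyperedge per edge e of G, whose vertices are
    the k-element subsets of e (each stored as a sorted tuple)."""
    return [frozenset(combinations(sorted(e), k)) for e in edges]


def min_vertex_cover_size(hyperedges):
    """Exhaustive search for the minimum vertex cover size (tiny inputs only)."""
    vertices = sorted(set().union(*hyperedges))
    for size in range(len(vertices) + 1):
        for candidate in combinations(vertices, size):
            chosen = set(candidate)
            # A vertex cover must intersect every hyperedge.
            if all(chosen & h for h in hyperedges):
                return size
    return len(vertices)


if __name__ == "__main__":
    # G: the complete 3-uniform hypergraph on {1, 2, 3, 4}, so t = 3.
    G = [frozenset(c) for c in combinations(range(1, 5), 3)]
    H = blow_up(G, 2)  # k = t - 1: the AHTP instance for G
    print(len(H), min_vertex_cover_size(H))
```

With $k = t - 1$ the output is a simple hypergraph, since two distinct $t$-sets share at most one $(t-1)$-subset. In the example, the four triples on $\{1, 2, 3, 4\}$ are covered by the two pairs $\{1, 2\}$ and $\{3, 4\}$, and no single pair suffices.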
Acknowledgment
We thank Vincent Cohen-Addad for telling us about the Johnson Coverage Hypothesis and its connection to hardness of clustering and for sharing a copy of [CCL20].
References

[AHK96] Ron Aharoni, Ron Holzman, and Michael Krivelevich. On a theorem of Lovász on covers in tau-partite hypergraphs. Combinatorica, 16(2):149–174, 1996.

[AKS11] Per Austrin, Subhash Khot, and Muli Safra. Inapproximability of vertex cover and independent set in bounded degree graphs. Theory Comput., 7(1):27–43, 2011.

[AZ20] Ron Aharoni and Shira Zerbib. A generalization of Tuza's conjecture. Journal of Graph Theory, 94(3):445–462, 2020.

[CCL20] Vincent Cohen-Addad, Karthik C.S., and Euiwoong Lee. On approximability of k-means, k-median, and k-minsum clustering. Manuscript, 2020.

[Cha16] Siu On Chan. Approximation resistance from pairwise-independent subgroups. J. ACM, 63(3):27:1–27:32, 2016.

[CKSV12] Jean Cardinal, Marek Karpinski, Richard Schmied, and Claus Viehmann. Approximating vertex cover in dense hypergraphs. Journal of Discrete Algorithms, 13:67–77, 2012.

[CL99] Fan Chung and Linyuan Lu. An upper bound for the Turán number t3(n, 4). Journal of Combinatorial Theory, Series A, 87(2):381–389, 1999.

[Cyg13] Marek Cygan. Improved approximation for 3-dimensional matching via bounded pathwidth local search. In , 2013.

[dC94] D. de Caen. The current status of Turán's problem on hypergraphs. Extremal Problems for Finite Sets, 1991, Bolyai Soc. Math. Stud., Vol. 3, pp. 187–197, János Bolyai Math. Soc., Budapest, 1994.

[DGKR05] Irit Dinur, Venkatesan Guruswami, Subhash Khot, and Oded Regev. A new multilayered PCP and the hardness of hypergraph vertex cover. SIAM J. Comput., 34(5):1129–1146, 2005.

[Erd81] Paul Erdős. On the combinatorial problems which I would most like to see solved. Combinatorica, 1(1):25–42, 1981.

[Erd88] Paul Erdős. Problems and results in combinatorial analysis and graph theory. Graph Theory and Applications, Proceedings of the First Japan Conference on Graph Theory and Applications, pages 81–92, 1988.

[FHMW17] Nevena Francetic, Sarada Herke, Brendan D. McKay, and Ian M. Wanless. On Ryser's conjecture for linear intersecting multipartite hypergraphs. Eur. J. Comb., 61:91–105, 2017.

[FK91] H. Furstenberg and Y. Katznelson. A density version of the Hales-Jewett theorem. J. Anal. Math., 57:64–119, 1991.

[GSS15] Venkatesan Guruswami, Sushant Sachdeva, and Rishi Saket. Inapproximability of minimum vertex cover on k-uniform k-partite hypergraphs. SIAM J. Discret. Math., 29(1):36–58, 2015.

[Kee11] P. Keevash. Hypergraph Turán problems. Surveys in Combinatorics 2011, 392:83–139, 2011.

[Kho02] Subhash Khot. Hardness results for approximate hypergraph coloring. In Proceedings on 34th Annual ACM Symposium on Theory of Computing, May 19-21, 2002, Montréal, Québec, Canada, 2002.

[KR83] K. H. Kim and F. W. Roush. On a problem of Turán, pages 423–425. Birkhäuser Basel, Basel, 1983.

[Kri95] Michael Krivelevich. On a conjecture of Tuza about packing and covering of triangles. Discret. Math., 142(1-3):281–286, 1995.

[Lov75] L. Lovász. On minmax theorems of combinatorics, Doctoral thesis. Mathematiki Lapok, 26, 1975.

[LZ09] Linyuan Lu and Yi Zhao. An exact result for hypergraphs and upper bounds for the Turán density of $K^{r}_{r+1}$. SIAM J. Discret. Math., 23(3):1324–1334, 2009.

[Pol12] D.H.J. Polymath. A new proof of the density Hales-Jewett theorem. Annals of Mathematics, 175(3):1283–1327, May 2012.

[Sid95] Alexander Sidorenko. What we know and what we do not know about Turán numbers. Graphs Comb., 11(2):179–199, 1995.

[Sid97] Alexander Sidorenko. Upper bounds for Turán numbers. J. Comb. Theory, Ser. A, 77(1):134–147, 1997.

[Tur41] Paul Turán. On an extremal problem in graph theory. Mat. Fiz. Lapok, 48:436–452, 1941.

[Tuz81] Zsolt Tuza. Conjecture. Finite and Infinite Sets, Proc. Colloq. Math. Soc. Janos Bolyai:888, 1981.

[Tuz90] Zsolt Tuza. A conjecture on triangles of graphs.