On Approximability of Clustering Problems Without Candidate Centers
arXiv [cs.CC], Oct.
Vincent Cohen-Addad*    Karthik C. S.†    Euiwoong Lee‡

Abstract
The k-means objective is arguably the most widely-used cost function for modeling clustering tasks in a metric space. In practice and historically, k-means is thought of in a continuous setting, namely where the centers can be located anywhere in the metric space. For example, the popular Lloyd's heuristic locates a center at the mean of each cluster.

Despite persistent efforts on understanding the approximability of k-means, and other classic clustering problems such as k-median and k-minsum, our knowledge of the hardness of approximation factors of these problems remains quite poor. In this paper, we significantly improve upon the hardness of approximation factors known in the literature for these objectives. We show that if the input lies in a general metric space, it is NP-hard to approximate:

• Continuous k-median to a factor of 2 − o(1); this improves upon the previous inapproximability factor of 1.36 shown by Guha and Khuller (J. Algorithms '99).

• Continuous k-means to a factor of 4 − o(1); this improves upon the previous inapproximability factor of 2.10 shown by Guha and Khuller (J. Algorithms '99).

• k-minsum to a factor of 1.415; this improves upon the APX-hardness shown by Guruswami and Indyk (SODA '03).

Our results shed new and perhaps counter-intuitive light on the differences between clustering problems in the continuous setting versus the discrete setting (where the candidate centers are given as part of the input).

* Google Research, Switzerland. [email protected].
† New York University, USA. [email protected].
‡ University of Michigan, USA. [email protected].

1 Introduction
Given a set of points in a metric space, a clustering is a partition of the points such that points in the same part are close to each other. This makes clustering a basic, crucial computational problem for a variety of applications, ranging from unsupervised learning, to information retrieval, and even arching over bioinformatics. The most popular clustering problem (in metric spaces) is arguably the k-means problem: Given a set of points P in a metric space, the k-means problem asks to identify a set of k representatives, called centers, such that the sum of the squared distances from each point to its closest center is minimized (for the k-median problem, the goal is to minimize the sum of distances, not squared distances); see Section 2 for formal definitions. Finding efficient algorithms that produce good solutions with respect to the k-means or k-median objectives has been a major challenge over the last 40 years.

From a theoretical standpoint, the picture is rather frustrating: the hardness of approximation bounds for k-means and k-median remain quite far from the approximation guarantees that the best known efficient algorithms achieve. In general metrics, the k-median and k-means problems are known to be hard to approximate within factors of 1.73 and 3.94 respectively [GK99], whereas the best known approximation algorithms achieve approximation guarantees of 2.67 and 9 respectively [BPR+15, ANSW20].

The k-median and k-means problems come in two flavours: continuous, where the centers can be picked arbitrarily in the metric space; and discrete, where the centers have to be picked from a specific set given as input. While most of the known approximation algorithms focus on the discrete case, algorithms used in practice (such as, e.g., Lloyd's method) often leverage the freedom in the location of the centers to obtain empirically good performance. In practice, the continuous case is arguably more relevant: when looking for a representative of a set of points, we would like to find the best one and not constrain ourselves to some specific set. In fact, for several metrics such as edit distance, the problem of computing a "good representative" of a set of arbitrary strings (i.e., a string whose sum of distances to the other strings is minimized) is a well-studied problem in itself.

At first glance, it appears that the continuous case is computationally easier than the discrete case, as the algorithm designer is not forced to pick from the input set of candidate centers. In Euclidean space, an important result of Matousek [Mat00] shows that an α-approximation algorithm for the discrete case of k-means can be used to obtain a (1 + ε) · α-approximation to the continuous case of k-means under the ℓ2 distance. This suggests that the continuous case is somewhat easier than the discrete case in the Euclidean metric. Moreover, the 20-year-old hardness results of Guha and Khuller [GK99] of 1 + 2/e and 1 + 8/e for k-median and k-means respectively only apply to the discrete case, and the only known bounds for the continuous setting derived from their approach are 1 + 1/e ≈ 1.36 and 1 + 3/e ≈ 2.10 for k-median and k-means respectively. We thus ask:

Can we approximate continuous k-means (resp. continuous k-median) to a factor less than 1 + 3/e (resp. 1 + 1/e) in polynomial time?

Another classic clustering objective is the k-minsum problem.
Given a set of points in a metric space, the k-minsum problem asks for a partition of the points into k parts that minimizes the sum of the pairwise distances between points in the same part of the partition (see Section 2 for a formal definition). Compared to k-means and k-median, the fact that the objective function sums over a quadratic number of distances within each cluster favors balanced clusterings, where clusters are of similar sizes. This fundamental clustering problem, introduced in the 70s by Sahni and Gonzalez [SG76], is, together with the capacitated k-median problem, one of the problems for which designing an O(1)-approximation algorithm, or showing that none exists, in the general metric case remains an important open problem. (We note that non-trivial inapproximability results are known for Euclidean k-means and k-median [ACKS15, LSW17, CK19].)

The k-minsum problem has received a large amount of attention over the years [GBH98, Sch00, Ind00, dlVKKR03, CS04, CS10], but the current understanding of k-minsum is worse than that of k-median and k-means: while no better than an O(log n)-approximation is known in polynomial time [BCR01, BFSS19], the best known hardness of approximation factor is (1 + ε), due to Guruswami and Indyk [GI03], for some small implicit constant ε > 0. Getting better hardness of approximation for k-minsum remains an important open problem. Arguably, the intrinsic continuous nature of the problem (the fact that the hardness must be directly encoded into the locations of the points) has been one of the most important roadblocks for the problem.

Can we show a hardness of approximation result for k-minsum for some explicit, non-negligible constant greater than 1?
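For concreteness, the k-minsum objective can be evaluated by brute force on toy inputs. The following is an illustrative sketch of ours (not from the paper), with all function names hypothetical:

```python
from itertools import product

def minsum_cost(points, labels, dist):
    """k-minsum cost of a labeled partition: sum of distances over
    unordered pairs of points that share a cluster label."""
    cost = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if labels[i] == labels[j]:
                cost += dist(points[i], points[j])
    return cost

def best_k_minsum(points, k, dist):
    """Exhaustive search over all k^n assignments (toy sizes only)."""
    return min(minsum_cost(points, labels, dist)
               for labels in product(range(k), repeat=len(points)))

# Two well-separated groups on the line with k = 2: the optimum
# pairs up the nearby points rather than mixing the groups.
opt = best_k_minsum([0.0, 0.1, 10.0, 10.1], 2, lambda a, b: abs(a - b))
```

Because the cost of a cluster grows with the square of its size, the objective favors balanced clusters, which is the phenomenon discussed above.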
Technical Barriers.
A well-known framework for obtaining hardness of approximation results in general metrics for clustering objectives is a straightforward reduction from the Max k-Coverage or the Set Cover problem. Given an instance of Max k-Coverage that consists of a universe and a collection of its subsets, we create a 'point' for each element of the universe and a 'candidate center', namely a location where it is possible to place a center, for each set. Then, we define the distance between a point (corresponding to an element of the universe) and a candidate center (corresponding to a set) to be 1 if the set contains the element and 3 otherwise. This reduction, due to Guha and Khuller [GK99], yields lower bounds of 1 + 2/e and 1 + 8/e for the k-median and k-means problems, respectively, in general discrete metric spaces.

The reduction of Guha and Khuller [GK99] for k-median in general metrics does not even rule out a PTAS for k-minsum, mainly due to the fact that, even in one cluster, the objective function sums over all pairs of points, whose edges may come from different sets. To bypass this issue, the only known APX-hardness [GI03] starts from a very restricted set system where every set has 3 elements, and only rules out (1 + ε)-factor approximation algorithms for some implicit constant ε > 0. However, reductions from bounded-degree set systems are highly restrictive, and one cannot typically hope to prove inapproximability for factors 1 + α for non-negligible α. One may thus wonder if there are other structured set systems which could be the right starting point for proving hardness of approximation results for k-minsum. In fact, one may further wonder whether the hard instances of clustering problems as a whole are completely captured by hard instances of various kinds of set systems, or whether there are other mathematical objects which might be more appropriate for proving improved inapproximability results for certain clustering problems.

1.1 Our Results

The main contributions of this paper are conceptual. First, we develop an approach to provide the first explicit constant inapproximability ratio for the k-minsum problem. En route to proving the inapproximability of k-minsum, we also prove that the (1 − 1/e)-hardness of approximation for Max k-Coverage holds even for set systems of bounded VC dimension, an important notion in computational geometry and machine learning. We believe that further study of the approximability of Max k-Coverage restricted to set systems with additional combinatorial and geometric structures will produce not only interesting results on their own but also have wide applications. We discuss the details of the result and the technique further in Section 1.1.1.

Our second contribution is an insight for proving hardness of approximation results for continuous versions of k-means and k-median in general metrics. In particular, instead of starting the reduction from set-cover-type problems, we start from coloring problems, and obtain the surprising result that the complexities of the discrete and continuous versions are significantly different, but in the counter-intuitive direction: the continuous version of the problem is harder to approximate than the discrete version! This is elaborated further in Section 1.1.2.
Objective              Continuous k-means    Continuous k-median    k-minsum
Hardness (previous)    2.10 [GK99]           1.36 [GK99]            APX-hard [GI03]
Hardness (this paper)  4 − o(1)              2 − o(1)               1.415
Algorithms             36 [KMN+02]           5.3 [BPR+15]           O(log n) [BFSS19]

Table 1: State-of-the-art approximability results for clustering objectives without candidate centers in general metrics. The algorithmic results for k-means and k-median, though not explicitly stated in the literature, can be obtained by considering the data points as candidate centers (which loses a factor 4 and 2 for k-means and k-median respectively) and running the algorithms for the discrete problems cited in the references.

1.1.1 k-minsum

We state our results on the k-minsum problem.

Theorem 1.1 (k-minsum in ℓ∞-metric). Given n points in an O(log n)-dimensional ℓ∞-metric space, it is NP-hard (under randomized reductions) to distinguish between the following two cases:
• Completeness: The k-minsum objective is at most 1.
• Soundness: The k-minsum objective is at least 1.415.
(We write the result in this paper for the ℓ∞-metric, but the reader should note that there is a Fréchet embedding from any discrete metric to the ℓ∞-metric in high dimensions.)

In order to prove Theorem 1.1, we prove hardness of Max k-Coverage on a specialized set system. Given an instance (U, E, k) of Max k-Coverage, where U is the universe and E
is a collection of subsets, let the girth of the set system (U, E) be the girth of the incidence bipartite graph: the vertex set is U ∪ E, and there is an edge (u, S) ∈ U × E if and only if u ∈ S. When the girth of a set system is strictly greater than 4, no two sets intersect in more than a single element, so the VC dimension of the set system is also at most 2. Set systems with bounded VC dimension are known to admit qualitatively better algorithms, such as an O(log OPT)-approximation algorithm for Set Cover [BG95] and an FPT approximation scheme for Max k-Coverage [BKL12], which cannot exist for general set systems [KLM19, Man20]. We prove a hardness result showing that, for polynomial-time approximation of Max k-Coverage, having a bounded VC dimension (even a super-constant girth) does not help.

Theorem 1.2 (Informal statement of Theorem 3.1). For any ε > 0, it is NP-hard (under randomized reductions) to approximate Max k-Coverage within a factor of (1 − 1/e + ε) even when the set system has girth Ω(log n / log log n) and maximum degree O_ε(1).

The above result is proved by "lifting" Feige's optimal hard instances of Max k-Coverage [Fei98]. Given a hard instance of Max k-Coverage without any girth guarantee, we take the dual set system to view it as a hypergraph vertex coverage problem. For each vertex, we create a cloud of many vertices, and for each hyperedge, we create many random copies, where each copy contains a random vertex in each cloud.

Intuitively, putting too many hyperedges will result in many intersections between hyperedges, which may create a short cycle. On the other hand, putting too few hyperedges will make the new instance significantly different from the original instance, possibly allowing a small hitting set that does not reveal the hitting set in the original hypergraph.
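The cloud-and-random-copies construction just described can be sketched in code. This is an illustrative toy of ours with hypothetical names; the actual proof additionally deletes a hyperedge from every short cycle, a step omitted here:

```python
import random

def lift_hypergraph(vertices, hyperedges, B, ell, rng):
    """Replace each vertex by a cloud of B copies and each hyperedge
    by ell copies.  For every (hyperedge, vertex) pair we sample a
    balanced sequence of cloud indices, so each copy of a vertex is
    used equally often (requires B to divide ell).  The short-cycle
    deletion step of the actual construction is omitted."""
    assert ell % B == 0
    new_vertices = [(v, j) for v in vertices for j in range(B)]
    new_edges = []
    for e in hyperedges:
        picks = {}
        for v in e:
            seq = list(range(B)) * (ell // B)  # balanced multiset of indices
            rng.shuffle(seq)
            picks[v] = seq
        for i in range(ell):
            new_edges.append(frozenset((v, picks[v][i]) for v in e))
    return new_vertices, new_edges

rng = random.Random(0)
NV, NE = lift_hypergraph(["a", "b", "c"], [("a", "b"), ("b", "c")],
                         B=2, ell=4, rng=rng)
# 3 clouds of 2 copies each, and 4 random copies per hyperedge.
```

The balanced sampling guarantees that each copy of a vertex appears in exactly ell/B of the copies of each incident hyperedge, which keeps the maximum degree of the new instance bounded.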
By appropriately choosing the size of the clouds and the number of hyperedges, and carefully analyzing the probabilities of both bad events, it can be shown that the hardness is almost preserved while the girth becomes large.

Given the hardness of Max k-Coverage with large girth, the reduction to k-minsum is simple; given a set system for Max k-Coverage, the instance for k-minsum is given by the graphic metric where each vertex corresponds to an element, and two vertices are connected if the corresponding elements are contained in the same set. If the universe can be partitioned into k sets from the system, the graph can be partitioned into k cliques, so every pair of vertices in the same cluster is at distance 1 from each other. To analyze the soundness, even though edges within one cluster may come from different sets, the girth Ω(log n / log log n) is larger than the average cluster size (which is still bounded by O_ε(1)), so we can argue that most clusters roughly correspond to only one set of Max k-Coverage.

1.1.2 k-means and k-median in General Metric Space

Finally, we state below the inapproximability of k-median and k-means in the continuous case for the ℓ∞-metric, whose factors are even higher than those of [GK99] for k-median and k-means in the discrete case for the ℓ∞-metric. (By applying a Fréchet embedding, we can embed any discrete metric into the ℓ∞-metric, preserving all pairwise distances.)

Theorem 1.3 (Informal statement of Theorems 5.2, 5.3, and 5.6). For every constant ε > 0, there exists a constant integer k such that, given n points in a poly(n)-dimensional ℓ∞-metric space, it is NP-hard to approximate:
• the k-means objective to within a 4 − ε factor.
• the k-median objective to within a 2 − ε factor.
Moreover, the above statement holds for k = … (and can be further strengthened to hold for k = … by assuming the Unique Games Conjecture).

The above result is very surprising, as it breaks the more-than-twenty-year-old bound of [GK99].
Furthermore, it is believed that the bound of [GK99] is indeed tight for the discrete case, as there are (1 + 2/e) and (1 + 8/e) parameterized approximation algorithms for the k-median and k-means problems respectively in general metrics [CGK+19] (note that this is merely an indication that [GK99] might be tight for the discrete case and not a formal conclusion). Therefore this provides morally the first separation between the continuous and discrete versions of clustering problems.

Further, we show that the bound in Theorem 1.3 is tight for a large range of settings. First, for any constant k, we note that there is a simple 2-approximation algorithm for the continuous k-median problem and a 4-approximation algorithm for the continuous k-means problem in the ℓ∞-metric, both running in polynomial time. Second, we show that the hardness result with the same gap cannot hold for much smaller dimensions (see Corollaries 5.11, 5.13 and 5.14).

The proof of Theorem 1.3 follows from a new technique for constructing clustering problem inputs; instead of starting from set-cover-type problems (as in the framework of [GK99]), we start our reductions from the hard instances of k-coloring (or, equivalently, of finding k disjoint independent sets) in graphs due to [KS12]. In other words, instead of starting from covering problems on graphs (like almost all other results in the literature) and embedding a pair of vertices sharing an edge as points that are close and other vertex pairs far away, we start from the complement of cover problems, i.e., the independent set problem, and embed a pair of vertices not sharing an edge as points that are close and other vertex pairs far away, leveraging the stronger inapproximability of the independent set problem.

The paper is organized as follows. In Section 2, we introduce some notation that is used throughout the paper. In Section 3, we prove our hardness of approximation result for Max k-Coverage on instances with large girth (i.e., Theorem 1.2). In Section 4, we prove our hardness of approximation result for the k-minsum objective in general metrics (i.e., Theorem 1.1).
In Section 5, we prove our improved inapproximability results for k-means and k-median in general metrics (i.e., Theorem 1.3).

2 Preliminaries
Notations.
For any two points a, b ∈ R^d, the distance between them in the ℓ∞-metric is denoted by ‖a − b‖∞ = max_{i∈[d]} |a_i − b_i|. Let e_i denote the vector which is 1 on coordinate i and 0 everywhere else. We denote by ~(1/2) the vector that is 1/2 on all coordinates.

Clustering Objectives.
Given two sets of points P and C in a metric space, we define the k-means cost of P for C to be ∑_{p∈P} (min_{c∈C} dist(p, c))² and the k-median cost to be ∑_{p∈P} (min_{c∈C} dist(p, c)). Given a set of points P in a metric space and a partition π of P into P_1 ∪̇ P_2 ∪̇ ··· ∪̇ P_k, we define the k-minsum cost of P for π to be ∑_{i∈[k]} (∑_{p,q∈P_i} dist(p, q)). Given a set of points P, the k-means/k-median (resp. k-minsum) objective is the minimum over all C (resp. π) of cardinality k of the k-means/k-median (resp. k-minsum) cost of P for C (resp. π). Given a point p ∈ P, the contribution to the k-means (resp. k-median) cost of p is (min_{c∈C} dist(p, c))² (resp. min_{c∈C} dist(p, c)).

3 Max k-Coverage with large girth

In this section, we prove the following hardness of Max k-Coverage with large girth and bounded degree, and then use the hardness result to prove Theorem 4.1 for k-minsum clustering in the next section. Like for k-median [GK99], the result is based on hardness of Max k-Coverage; given an instance (U, E, k) of Max k-Coverage, we output the corresponding instance of k-minsum consisting of a graph G = (U, E′) where v, u ∈ U have an edge if and only if there exists S ∈ E that contains both u and v. However, unlike for k-median, the objective function value of Max k-Coverage alone does not suffice to prove results for k-minsum. For example, consider an instance of Max k-Coverage where typical sets are large, but we add a set of size two for each pair of elements. These sets of size two are small, so they will not affect the Max k-Coverage objective function, but the outcome of the reduction will be a complete graph! Therefore, we need to start from hardness of Max k-Coverage on a specialized set system.

The proof starts from the standard Max k-Coverage hardness result of Feige [Fei98], which has no guarantee on the girth.
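The element-graph reduction described above can be illustrated on a toy set system. This is our sketch, not the paper's code; note that any {1, 2}-metric of this form automatically satisfies the triangle inequality:

```python
from itertools import combinations

def coverage_to_minsum_metric(sets):
    """Build the {1,2}-metric of the reduction: two distinct elements
    are at distance 1 if some set contains both, and at distance 2
    otherwise (distances are capped at 2, as in the analysis)."""
    close = set()
    for s in sets:
        for u, v in combinations(sorted(s), 2):
            close.add((u, v))
    def dist(u, v):
        if u == v:
            return 0
        return 1 if (min(u, v), max(u, v)) in close else 2
    return dist

# If the universe is partitioned by k sets, clustering along those
# sets makes every intra-cluster distance equal to 1.
S = [{0, 1, 2}, {3, 4, 5}]
d = coverage_to_minsum_metric(S)
cost = sum(d(u, v) for s in S for u, v in combinations(sorted(s), 2))
```

Here cost counts each unordered pair once; with two clusters of size 3 it equals 2 · (3 choose 2) = 6, whereas any cluster mixing elements of different sets immediately incurs distance-2 pairs.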
Viewing the dual set system as a hypergraph, we put many copies of each vertex and many random copies of each hyperedge. This idea was previously used for subgraph hitting sets and constraint satisfaction problems [GL15, GT17].

Theorem 3.1.
For any ε > 0, given an instance (U, E, k) of Max k-Coverage where the incidence graph has girth Ω(log n / log log n) and maximum degree O_ε(1), it is NP-hard (under randomized reductions) to distinguish between the following two cases:
• Completeness: There exist k sets that cover E.
• Soundness: Any k sets cover at most a (1 − 1/e + ε) fraction of E.

Proof. We consider the dual set system of the hard instance of Max k-Coverage given by Feige [Fei98] as a regular r-uniform hypergraph H = (V, E), which has n vertices, m hyperedges, and degree d (so that nd = mr). In the completeness case, there is a set S* ⊆ V with |S*| = k = n/r = m/d that hits every e ∈ E. In the soundness case, any set |S| ≤ k hits at most a (1 − 1/e + ε)-fraction of the hyperedges. Feige's reduction also ensures that this hardness can be achieved with r and d being constants (depending on ε).

The new hypergraph H′ = (V′, E′) is the following. Let ℓ and B be numbers determined later (they will both be Θ(n)).
• V′ = V × [B].
• For each e ∈ E,
  – For each v ∈ e, sample (j_{1,v}, …, j_{ℓ,v}) ∈ [B]^ℓ uniformly from the set of ℓ-tuples where every number in [B] appears the same number of times (we will ensure that B divides ℓ).
  – For each i ∈ [ℓ], add {(v, j_{i,v})}_{v∈e} to E′.
• For each simple cycle of the incidence bipartite graph of length at most t (which will be fixed later), delete an arbitrary hyperedge in it.
Then |V′| = |V| · B and |E′| ≤ |E| · ℓ. Note that the girth is at least t, and the maximum degree is at most d · Θ(ℓ/B) = O(1).

Girth control.
We bound how many hyperedges we deleted in the last step of the construction. Consider the incidence bipartite graph of the hypergraph; hyperedge vertices are (a subset of) E × [ℓ] and element vertices are V × [B]. Fix a 2t-tuple ((v_1, p_1), (e_1, q_1), (v_2, p_2), (e_2, q_2), …, (v_t, p_t), (e_t, q_t)), where all vertices are different and v_i, v_{i+1} ∈ e_i (and v_1 ∈ e_t). We have n choices for v_1, and after that d choices for each e_i and r choices for each v_i, so the number of such tuples is upper bounded by n · (dr)^t · (Bℓ)^t.

For each possible edge in the tuple (say ((v_i, p_i), (e_i, q_i))), the probability that it appears is the probability that j_{q_i, v_i} = p_i in the above sampling procedure for e_i. Since j_{q_i, v_i} draws from B numbers and we will take t = o(log n) ≪ B, this probability, conditioned on the existence of an arbitrary set of edges in the tuple, is at most 2/B. So the expected number of cycles is at most n · (dr)^t · (Bℓ)^t · (2/B)^{2t} = n · (4dr)^t (ℓ/B)^t.

We will take ℓ = aB for some constant a depending on r and ε. Let B = n. Using Markov's inequality, with probability at least 3/4, the number of hyperedges we deleted is at most 4n · (4dr)^t (ℓ/B)^t = 4n · (4adr)^t = o(mℓ), as long as t = o(log n). Fix t = Ω(log n / log log n). We can thus ensure that the girth is at least t while losing only an o(1) fraction of the hyperedges.

Completeness. If S ⊆ V is a feasible solution for the original Max k-Coverage instance (i.e., S intersects every e ∈ H), then S × [B] is a feasible solution for the new instance.

Soundness. Fix a hyperedge e ∈ E. For simplicity let us assume e = (v_1, …, v_r). Fix C_1, …, C_r ⊆ [B], and let α_i := |C_i|/B. We want to show that out of the ℓ hyperedges in the new instance coming from e, approximately a 1 − ∏_{i=1}^r (1 − α_i) fraction of the hyperedges intersect ∪_{i=1}^r ({v_i} × C_i). For one such hyperedge, the probability is exactly 1 − ∏_{i=1}^r (1 − α_i).
The ℓ hyperedges are not independent, but the distribution is negatively correlated (i.e., if one hyperedge intersects ∪_{i=1}^r ({v_i} × C_i), other hyperedges are less likely to intersect it), so we can still apply the Chernoff bound: the probability that the total number is εℓ more than the expectation is at most exp(−Θ(ε²ℓ)). Since there are at most 2^{Br} choices of C_1, …, C_r and we let ℓ = aB, the probability that this happens for some choice is at most
2^{Br} · exp(−Θ(ε²aB)) ≤ exp(B(r − Θ(ε²a))),
which is exponentially small in B (and thus in n) if we take a to be a large enough constant depending on r and ε.

Union bounding over all e ∈ E, we have shown that for any S ⊆ V′ for the new instance with |S| ≤ kB, if we let α_v := |S ∩ ({v} × [B])|/B (so that ∑_v α_v = k), then the fraction of hyperedges S intersects in the new instance is at most ε more than the expected fraction of hyperedges hit in the old instance if we round each v ∈ V independently with probability α_v. In the soundness case the latter is at most (1 − 1/e + ε), so with high probability the optimal value in the new instance is at most (1 − 1/e + 2ε).

To prove hardness of k-minsum, we additionally need to prove that in the soundness case, no αk sets cover more than a (1 − e^{−α}) fraction of the elements, for any constant α > 0. The same construction ensures it.
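The per-copy probability 1 − ∏_i (1 − α_i) used in the soundness argument above can be checked exactly by enumeration. This is an illustrative sketch of ours, using the fact that a single lifted copy of a hyperedge picks each cloud index uniformly:

```python
from itertools import product
from fractions import Fraction

def hit_probability(B, chosen):
    """Exact probability that a uniformly random copy of a hyperedge
    (one independent index in [B] per vertex) intersects the marked
    copies; chosen[i] is the set C_i of marked indices for vertex i."""
    hits = total = 0
    for idx in product(range(B), repeat=len(chosen)):
        total += 1
        if any(j in C for j, C in zip(idx, chosen)):
            hits += 1
    return Fraction(hits, total)

# Two vertices, B = 4, alpha_1 = 1/4 and alpha_2 = 1/2:
# the predicted value is 1 - (1 - 1/4)(1 - 1/2) = 5/8.
p = hit_probability(4, [{0}, {0, 1}])
```

The dependence between the ℓ copies of a hyperedge (from the balanced sampling) is exactly what the negative-correlation and Chernoff-bound step in the proof handles.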
Corollary 3.2.
Theorem 3.1 holds with the following stronger soundness: For any constant α > 0,
• Soundness: Any αk sets cover at most a (1 − e^{−α} + ε) fraction of E.

Proof. Guha and Khuller [GK99] proved the same soundness for general set systems. Their result uses a tight ((1 − ε) ln n)-hardness of Set Cover, whose reduction took time n^{O(log log n)} at that time, but the running time has since become polynomial [DS14]. The proof of Theorem 3.1 indeed shows that the maximum fraction of elements covered by any β fraction of the sets in the new set system is at most ε plus the same quantity in the original set system, so we can transfer this strong hardness for general set systems to set systems of high girth, up to an additive ε factor.

4 k-minsum in General Metric

In this section, we use Theorem 3.1 to prove hardness of k-minsum clustering. The reduction is simple; given an instance (U, E, k) of Max k-Coverage, we output the corresponding instance of k-minsum consisting of a graph G = (U, E′) where v, u ∈ U have an edge if and only if there exists S ∈ E that contains both u and v. Therefore, if each cluster is a clique of G, then each pairwise distance is 1, and if it is a sparse subgraph of G, then the average pairwise distance is approximately at least 2. Using the large girth guarantee in Theorem 3.1, we prove that any dense induced subgraph of a certain size must correspond to elements covered by a single set, so that any good solution for k-minsum implies a good solution for Max k-Coverage. Since the objective function considers all pairwise distances in each cluster, more technical calculations are needed to prove a better inapproximability factor.

Theorem 4.1 (Restatement of Theorem 1.1).
Given n points in an O(log n)-dimensional ℓ∞-metric space, it is NP-hard (under randomized reductions) to distinguish between the following two cases:
• Completeness: The k-minsum objective is at most β,
• Soundness: The k-minsum objective is at least 1.415 · β,
where β is some positive real number depending only on n.

Proof. Given an instance S of Max k-Coverage promised in Theorem 3.1, where the maximum set size is r = O(1) and the incidence bipartite graph has maximum degree O(1) and girth t = Ω(log n / log log n), let G = (V, E) be the graph where V consists of the elements and, for each set S, we put a clique on its elements. Since the girth of the set system is greater than 4, these cliques are all edge-disjoint. Note that n = (1 − o(1))kr from Theorem 3.1. The instance for k-minsum clustering is the shortest-path metric on G, along with the same k. Indeed, since our analysis only uses distances 1 and 2, we can change all distances greater than 2 to 2. Guruswami and Indyk [GI03] showed that any {1, 2}-metric where each point has only O(1) other points at distance 1 can be embedded into an O(log n)-dimensional ℓ∞ space, which can be applied to our metric because each vertex in G only has O(1) neighbors.

Completeness.
In the completeness case of Theorem 3.1, we can partition G into k cliques, each of size at most r. The clustering cost is then at most k · (r choose 2) ≤ (1 + o(1)) nr/2.

Soundness.
Fix V′ ⊆ V and let n′ := |V′|. We will bound the k-minsum cost of V′ as one cluster. Consider the following cases.

1. n′ ≤ t/2: Consider the set system S′ induced (in the bipartite graph sense) by V′ ∪ {S : |S ∩ V′| ≥ 2, S ∈ S}. The corresponding bipartite graph is acyclic, so a forest. Let S′_1, …, S′_{m′} be the sets of this restricted system, and let a′_i := |S′_i|. Let r′ := max_i a′_i. We want to upper bound ∑_i (a′_i choose 2). For each tree in the forest, root it at an arbitrary element vertex. For each S′_i we get (a′_i choose 2) = a′_i(a′_i − 1)/2; charge this to its a′_i − 1 children, a′_i/2 each. Since a′_i ≤ r′, every element vertex is charged at most r′/2. This shows that ∑_i (a′_i choose 2) ≤ r′n′/2. When r′ > n′/2, using the fact that all other a′_i ≤ n′ − r′, we have the better bound of (r′)²/2 + (n′ − r′)²/2. Note that ∑_i (a′_i choose 2) is exactly the number of edges in the subgraph of G induced by V′. Therefore, the cost of V′ is at least
2 · (n′ choose 2) − min( r′n′/2, (r′)²/2 + (n′ − r′)²/2 ) = (1 − o_r(1)) · max( (n′)² − n′r′/2, (n′)²/2 + n′r′ − (r′)² ).
Here o_r(1) denotes a quantity decreasing to 0 as r increases. By taking r a large enough (but still) constant, we can ignore it, up to an arbitrarily small additive factor in the final inapproximability ratio.

2. n′ > t/2: Since (the bipartite graph of) S has degree O(1), G also has degree O(1). Therefore, if V′ ⊆ V has n′ = |V′| ≥ t/2 = Ω(log n / log log n), the induced graph G_{V′} has density at most o(1), so the cost is at least (1 − o(1))(n′)².

Now we compute the k-minsum cost of a k-clustering. Let V_1, …, V_k be a partition of V and let n_i := |V_i|. Let r_i be the largest clique size in G_{V_i} (playing the role of r′ in case 1). Suppose, after reordering, that n_i ≥ t/2 for each i ∈ [ℓ]. The total cost from these ℓ clusters is at least (1 − o(1)) ∑_{i=1}^ℓ (n_i)².
If ∑_{i=1}^ℓ n_i = Ω(n / log log n), then, since t = Ω(log n / log log n),
(1 − o(1)) ∑_{i=1}^ℓ (n_i)² ≥ (1 − o(1)) · (t/2) · ∑_{i=1}^ℓ n_i = Ω(n log n / (log log n)²),
which is a superconstant factor larger than the cost (1 + o(1)) nr/2 in the completeness case. Therefore, we can conclude that clusters of size at least t/2 cover at most an o(1) fraction of the vertices, so up to a (1 − o(1)) factor we can assume that every V_i satisfies n_i ≤ t/2. Then case 1 above applies to every V_i, so the total cost is at least (again up to a (1 − o_r(1)) factor)

∑_{i=1}^k max( (n_i)² − r_i n_i/2, (n_i)²/2 + n_i r_i − (r_i)² ).   (1)

Let f(n, r) := max(f_1(n, r), f_2(n, r)), with f_1(n, r) := n² − rn/2 and f_2(n, r) := n²/2 + nr − r². Note that f_1(n, r) = f_2(n, r) when r = n/2. For any fixed n_i, it can be checked that f(n_i, r_i) is decreasing in r_i. Therefore, (1) is minimized when the r_i's are as large as possible. So we can apply Corollary 3.2 and assume that the worst case for (1) happens (up to a (1 + o(1)) factor) when r_i = r · e^{−i/k}.

For the sake of exposition, we let n_α := n_{αk}/r and r_α := r_{αk}/r for α ∈ [0, 1]. So (1) becomes

∑_{i=1}^k f(n_i, r_i) = kr² ∑_{i=1}^k ( f(n_i/r, r_i/r) · (1/k) ) = (1 ± o(1)) nr · ∫_0^1 f(n_α, r_α) dα,   (2)

where we use linear interpolation to extend f to all of [0, 1]. Given r_α = e^{−α}, we find the best (n_α)_{α∈[0,1]} to minimize (2). There are three requirements for (n_α).
1. n_α ≥ r_α for all α ∈ [0, 1].
2. ∫_{α=0}^{1} n_α dα = 1.
3. There exists t > 0 such that for each α ∈ [0, 1], one of the following must hold, because otherwise we can decrease one n_α and increase another n_{α′} to further decrease (2). Note that df_1(n, r)/dn = 2n − r/2 and df_2(n, r)/dn = n + r. (This threshold t is a Lagrange-multiplier value, not the girth parameter from before.)
• If n_α = r_α, then df(n_α⁺, r_α)/dn_α = df_2(n_α, r_α)/dn_α = 2n_α ≥ t.
• If n_α = 2r_α, then df(n_α⁻, r_α)/dn_α = df_2(n_α, r_α)/dn_α = (3/2)n_α ≤ t and df(n_α⁺, r_α)/dn_α = df_1(n_α, r_α)/dn_α = (7/4)n_α ≥ t.
• Otherwise, df(n_α, r_α)/dn_α = t.

It is easy to see that t < 2, because otherwise n_α > 1 for all α ∈ (0, 1], violating condition 2. This implies that t = 2e^{−c} for some c > 0 with

n_α = r_α for all α ∈ [0, c].   (3)

Since f(n, r) = f_2(n, r) when n ≤ 2r and f(n, r) = f_1(n, r) otherwise, to meet condition 3, we have the following conditions. Whenever r_α < n_α < 2r_α (which implies f(n_α, r_α) = f_2(n_α, r_α)),
df_2(n_α, r_α)/dn_α = n_α + r_α = t  ⇒  n_α = 2e^{−c} − e^{−α}.   (4)
Whenever n_α > 2r_α (which implies f(n_α, r_α) = f_1(n_α, r_α)),
df_1(n_α, r_α)/dn_α = 2n_α − r_α/2 = t  ⇒  n_α = e^{−c} + e^{−α}/4.   (5)

To meet (3), (4), and (5), n_α has to be

n_α = e^{−α},                 α ∈ [0, c]
      2e^{−c} − e^{−α},        α ∈ [c, d_1]
      2e^{−α},                 α ∈ [d_1, d_2]
      e^{−c} + e^{−α}/4,       α ∈ [d_2, 1]   (6)

where d_1 = ln(3/2) + c and d_2 = ln(7/4) + c, so that f(n_α, r_α) = f_2(n_α, r_α) for α ∈ [c, d_1] and f(n_α, r_α) = f_1(n_α, r_α) for α ∈ [d_2, 1] (and f_1(n_α, r_α) = f_2(n_α, r_α) when α ∈ [d_1, d_2]). Then
∫_0^1 n_α dα = ∫_{α=0}^{c} e^{−α} dα + ∫_{α=c}^{d_1} (2e^{−c} − e^{−α}) dα + ∫_{α=d_1}^{d_2} 2e^{−α} dα + ∫_{α=d_2}^{1} (e^{−c} + e^{−α}/4) dα
= (1 − e^{−c}) + (2 ln(3/2) e^{−c} − e^{−c} + e^{−d_1}) + (2e^{−d_1} − 2e^{−d_2}) + ((1 − d_2) e^{−c} + (e^{−d_2} − e^{−1})/4)
= 1 − e^{−1}/4 + e^{−c}(ln(9/7) − c),
using e^{−d_1} = (2/3)e^{−c} and e^{−d_2} = (4/7)e^{−c} in the last step. By condition 2 this must equal 1, which implies e^{−c}(ln(9/7) − c) = e^{−1}/4, and this solves to c = ln(9/7) − W(9/(28e)) ≈ 0.145, where W(z) denotes the real solution of z = W e^W. Plugging this value into
∫_0^1 f(n_α, r_α) dα = ∫_{α=0}^{d_2} f_2(n_α, r_α) dα + ∫_{α=d_2}^{1} f_1(n_α, r_α) dα
gives ∫_0^1 f(n_α, r_α) dα ≥ 0.7075, i.e., the k-minsum cost in the soundness case is at least (0.7075 − o(1)) nr. Compared to the cost (1/2 + o(1)) nr in the completeness case, the gap is ≥ 1.415.

5 Inapproximability of Continuous k-means and k-median in ℓ∞-metric

In this section, we prove the highest inapproximability factors known for k-means and k-median in the literature (in any metric), i.e., we prove Theorem 1.3. The proof relies crucially on the following result of Khot and Saket.

Theorem 5.1 (Khot and Saket [KS12]).
For any constant ε > 0 and all positive integers t and q such that q ≥ 2^t + 1, given a graph G(V, E), it is NP-hard to distinguish between the following two cases:

• Completeness: There are q disjoint independent sets V_1, . . . , V_q ⊆ V, such that for all i ∈ [q] we have |V_i| = ((1 − ε)/q) · |V|.

• Soundness: There is no independent set in G of size (1/q^{t+1}) · |V|.

We are now ready to prove the main result of this section.
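To see that the parameters of Theorem 5.1 can be instantiated for the reductions that follow, note that for any target constant ε′ > 0 one may take t := ⌈log₂(1/ε′)⌉ and q := 2^t + 1; then q ≥ 2^t + 1 holds and, reading the soundness bound as |V|/q^{t+1}, the soundness fraction is at most ε′. A minimal sketch of this parameter choice (the concrete choice here is ours, for illustration only):

```python
import math

def ks_parameters(eps_prime):
    # Pick t so that 2^t >= 1/eps', then q = 2^t + 1, so q^(t+1) >= q > 1/eps'.
    t = max(1, math.ceil(math.log2(1.0 / eps_prime)))
    q = 2 ** t + 1
    soundness_fraction = 1.0 / q ** (t + 1)  # no IS of this fractional size
    return q, t, soundness_fraction

for eps_prime in (0.5, 0.1, 0.01):
    q, t, s = ks_parameters(eps_prime)
    assert q >= 2 ** t + 1          # the hypothesis of Theorem 5.1
    assert s <= eps_prime           # soundness fraction below the target
```

With constant ε′ this yields constant q and t, matching the statement that k := k(ε, α) is a constant in the theorems below.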
Theorem 5.2 (k-means without candidate centers in n^{O(1)}-dimensional ℓ∞-metric space). For any constant ε > 0 and any constant α ∈ N, there exists a constant k := k(ε, α) ∈ N, such that given a point-set P ⊂ R^m of size n (and m = poly(n)), it is NP-hard to distinguish between the following two cases:

• Completeness: There exist C′ := {c_1, . . . , c_k} ⊆ R^m and σ : P → C′ such that ∑_{a∈P} (‖a − σ(a)‖∞)² ≤ (1 + ε) · n.

• Soundness: For every C′ := {c_1, . . . , c_{αk}} ⊆ R^m and every σ : P → C′ we have: ∑_{a∈P} (‖a − σ(a)‖∞)² ≥ (4 − ε) · n.

Theorem 5.3 (k-median without candidate centers in n^{O(1)}-dimensional ℓ∞-metric space). For any constant ε > 0 and any constant α ∈ N, there exists a constant k := k(ε, α) ∈ N, such that given a point-set P ⊂ R^m of size n (and m = poly(n)), it is NP-hard to distinguish between the following two cases:

• Completeness: There exist C′ := {c_1, . . . , c_k} ⊆ R^m and σ : P → C′ such that ∑_{a∈P} ‖a − σ(a)‖∞ ≤ (1 + ε) · n.

• Soundness: For every C′ := {c_1, . . . , c_{αk}} ⊆ R^m and every σ : P → C′ we have: ∑_{a∈P} ‖a − σ(a)‖∞ ≥ (2 − ε) · n.

Proof of Theorems 5.2 and 5.3.
Fix ε > 0, let r := αk, and set ε′ := ε/(9r). Starting from the hard instance (G(V, E), q, t, ε′) given in Theorem 5.1, we create an instance of the k-means, or of the k-median, problem where k = q (and t = o(log k)), as follows.

Construction. The k-median or k-means instance consists of the set of points to be clustered P ⊆ R^m of size n (where n = |V|, m = |E|), which will be defined below. First, we arbitrarily orient the edges of G (so that for every (u, v) ∈ V × V, at most one of (u, v) or (v, u) is in E). Then, we construct a function A : V → R^m. Given A, the point-set P is just defined to be P := { A(v) | v ∈ V }. For every v ∈ V and every (u′, v′) ∈ E, we define the (u′, v′)th coordinate of A(v) as follows:

A(v)_{(u′, v′)} := 2 if v = u′, −2 if v = v′, and 0 otherwise.

We now analyze the k-means and k-median cost of the instance. Consider the completeness case first.

Completeness.
Suppose there are k disjoint independent sets V_1, . . . , V_k ⊆ V, such that for all i ∈ [k] we have |V_i| = ((1 − ε′)/k) · |V|. Then, we partition P into k clusters, say C_1, . . . , C_k, as follows. For every p ∈ P, where p := A(v) for some v ∈ V, if there is some i ∈ [k] such that v ∈ V_i then we assign p to cluster C_i; otherwise, we assign it to cluster C_1. Next, we define the cluster centers C = {c_1, . . . , c_k} ⊆ R^m as follows. For every i ∈ [k] and every (u′, v′) ∈ E, the (u′, v′)th coordinate of c_i is defined as

c_i(u′, v′) := 1 if u′ ∈ V_i, −1 if v′ ∈ V_i, and 0 otherwise.

Note that the (u′, v′)th coordinate of c_i is consistent, as V_i is an independent set and thus both u′ and v′ cannot be in V_i. For any p ∈ P and any c ∈ C, we have the following upper bound on their distance: ‖p − c‖∞ ≤
3. (7)

On the other hand, for every i ∈ [k] and every v ∈ V_i, we have the following computation of the distance of A(v) to its center:

‖A(v) − c_i‖∞ = max{ max_{(u,v)∈E} |A(v)_{(u,v)} + 1|, max_{(v,u)∈E} |A(v)_{(v,u)} − 1|, max_{e∈E : v∉e} |c_i(e)| } =
1. (8)

Therefore, from (8), the k-means and k-median cost of cluster C_i, for all i ∈ [k] \ {1}, is exactly |V_i|. On the other hand, putting together (7) and (8), the k-means cost of C_1 is upper bounded by

|V_1| + 9 · ( |V| − ∑_{i∈[k]} |V_i| ) ≤ |V_1| + 9ε′|V|.

Similarly, we have that the k-median cost of C_1 is upper bounded by |V_1| + 3ε′|V|. Thus, the k-means cost of the overall instance is at most |V|(1 + 9ε′) ≤ (1 + ε)|V|, while the k-median cost is at most |V|(1 + 3ε′) ≤ (1 + ε)|V|. Finally, we turn to the soundness analysis.

Soundness.
We have from the soundness case assumption that every subset S ⊆ V of size at least ε′|V| is not an independent set in G. Consider any set of centers C′ = {c_1, . . . , c_r} ⊂ R^m that is optimal for the k-median or k-means objective (and let C_1, . . . , C_r be the corresponding partitioning of P into r clusters). We have the following claim.

Claim 5.4.
Let i ∈ [r] and V_i := {v ∈ V | A(v) ∈ C_i}. Then, there are (|V_i| − ε′|V|)/2 vertex-disjoint edges in the subgraph of G induced by V_i.

Proof. Suppose |V_i| ≥ ε′|V|; then there exists an edge in the subgraph induced by V_i in G. Remove the two corresponding endpoints of the edge from V_i. Repeat the above procedure until |V_i| < ε′|V|. The vertex pairs (which are edges in G) that were removed number at least (|V_i| − ε′|V|)/2.

For every i ∈ [r], let E_i be the set of vertex-disjoint edges guaranteed by the above claim. Fix i ∈ [r]. For every e := (u′, v′) ∈ E_i we have:

‖A(u′) − c_i‖∞ + ‖A(v′) − c_i‖∞ ≥ ‖A(v′) − A(u′)‖∞ ≥ |A(u′)_e − A(v′)_e| ≥
4. (9)

We also have:

(‖A(u′) − c_i‖∞)² + (‖A(v′) − c_i‖∞)² ≥ (A(u′)_e − c_i(e))² + (A(v′)_e − c_i(e))² ≥ (1/2) · (A(u′)_e − A(v′)_e)² ≥
8. (10)

Therefore, the optimal solution w.r.t. the k-median objective has cost at least:

∑_{i∈[r]} ∑_{v∈V_i} ‖A(v) − c_i‖∞ ≥ ∑_{i∈[r]} ∑_{(u′,v′)∈E_i} ( ‖A(u′) − c_i‖∞ + ‖A(v′) − c_i‖∞ )
≥ ∑_{i∈[r]} 4 · |E_i|  (from (9))
≥ ∑_{i∈[r]} 2 · ( |V_i| − ε′|V| )  (from Claim 5.4)
≥ (2 − 2ε′r) · |V| ≥ (2 − ε) · |V|.

Similarly, the optimal solution w.r.t. the k-means objective has cost at least:

∑_{i∈[r]} ∑_{v∈V_i} (‖A(v) − c_i‖∞)² ≥ ∑_{i∈[r]} ∑_{(u′,v′)∈E_i} ( (‖A(u′) − c_i‖∞)² + (‖A(v′) − c_i‖∞)² )
≥ ∑_{i∈[r]} 8 · |E_i|  (from (10))
≥ ∑_{i∈[r]} 4 · ( |V_i| − ε′|V| )  (from Claim 5.4)
≥ (4 − 4ε′r) · |V| ≥ (4 − ε) · |V|.

To prove that Theorems 5.2 and 5.3 hold even when in the completeness case we have k =
4, we simply start from the theorem below instead of Theorem 5.1.
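As a sanity check on the construction above, the following sketch instantiates the embedding on an oriented 4-cycle (whose vertex set splits into two independent sets), with edge-endpoint coordinates ±2 and completeness centers with entries in {−1, 0, 1} as we read the construction, and verifies (7)–(10) numerically. The example graph and the grid search are ours, for illustration only.

```python
# Oriented 4-cycle: vertices 0..3; {0, 2} and {1, 3} are independent sets.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
V1, V2 = {0, 2}, {1, 3}

def point(v):
    # A(v): +2 on coordinates of edges leaving v, -2 on edges entering v.
    return [2 if v == u else (-2 if v == w else 0) for (u, w) in edges]

def center(S):
    # Completeness center for an independent set S: entries in {-1, 0, 1}.
    return [1 if u in S else (-1 if w in S else 0) for (u, w) in edges]

def linf(p, q):
    return max(abs(a - b) for a, b in zip(p, q))

c1, c2 = center(V1), center(V2)
# (8): every vertex is at l_inf-distance exactly 1 from its cluster center.
assert all(linf(point(v), c1) == 1 for v in V1)
assert all(linf(point(v), c2) == 1 for v in V2)
# (7): every point-center distance is at most 3.
assert all(linf(point(v), c) <= 3 for v in range(4) for c in (c1, c2))
# (9)/(10): an edge's endpoints take values +2 and -2 in its coordinate, so
# any single real center value c pays at least 4 (median) / 8 (means).
grid = [x / 100.0 for x in range(-400, 401)]
assert min(abs(2 - c) + abs(-2 - c) for c in grid) >= 4 - 1e-9
assert min((2 - c) ** 2 + (-2 - c) ** 2 for c in grid) >= 8 - 1e-9
```

Since the ℓ∞ metric decouples coordinate-wise, checking a single coordinate against a one-dimensional grid of center values suffices for the per-edge lower bounds.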
Theorem 5.5 ([KMS17, DKK+18a, DKK+18b, KMS18, BKS19]). For any constant ε > 0, given a graph G(V, E), it is NP-hard to distinguish between the following two cases:

• Completeness: There are 4 disjoint independent sets V_1, V_2, V_3, V_4 ⊆ V, such that |V_1| = |V_2| = |V_3| = |V_4| = ((1 − ε)/4) · |V|.

• Soundness: There is no independent set in G of size ε · |V|.

We remark that Theorem 1.3 can also be obtained for ℓ_p-metrics as p tends to ∞. An interesting variant of Theorems 5.2 and 5.3 arises when we require the centers to be picked from Z^d (where d = poly(n)) instead of allowing them to be picked from anywhere in R^d. This can be seen as in between the traditional discrete and continuous cases: the set of candidate centers has size exponential in the number of points to be clustered, but it has a compact representation (in this case a fixed representation depending only on n). Surprisingly, for this variant we show even stronger inapproximability factors of 9 − ε for k-means and 3 − ε for k-median (see Theorems A.1 and A.2 in Appendix A), for any small ε >
0. We prove below a strengthening of Theorems 5.2 and 5.3 under the unique games conjecture.
Theorem 5.6 (Bi-criteria 2-means and 2-median without candidate centers in n^{O(1)}-dimensional ℓ∞-metric space). Assuming the unique games conjecture, for any constant ε > 0 and every constant r ∈ N, given a point-set P ⊂ R^m of size n (and m = poly(n)), it is NP-hard to distinguish between the following two cases:

• Completeness: There exist C′ := {c_1, c_2} ⊆ R^m and σ : P → C′ such that

∑_{a∈P} (‖a − σ(a)‖∞)² ≤ (1 + ε) · n  ( resp. ∑_{a∈P} ‖a − σ(a)‖∞ ≤ (1 + ε) · n ).

• Soundness: For every C′ := {c_1, . . . , c_r} ⊆ R^m and every σ : P → C′ we have:

∑_{a∈P} (‖a − σ(a)‖∞)² ≥ (4 − ε) · n  ( resp. ∑_{a∈P} ‖a − σ(a)‖∞ ≥ (2 − ε) · n ).

The proof simply follows by using the following result of Bansal and Khot instead of Theorem 5.1.

Theorem 5.7 (Bansal and Khot [BK09]). Assuming the unique games conjecture, for any constant ε > 0, given a graph G(V, E), it is NP-hard to distinguish between the following two cases:

• Completeness: There are 2 disjoint independent sets V_1, V_2 ⊆ V, such that |V_1| = |V_2| = ((1 − ε)/2) · |V|.

• Soundness: There is no independent set in G of size ε · |V|.

We now show that the above bounds are tight for a large range of settings. First, for any k, there is an algorithm running in time dn^{k+1} that takes as input a set of points in R^d and outputs a 2-approximate solution to the continuous k-median problem (and a 4-approximate solution for the continuous k-means problem) in the ℓ∞-metric (see Fact 5.8). Second, we show how to obtain a (2 + ε)-approximate solution in time (kdε^{−1} log n)^{O(k)} + poly(nd/ε) (see Corollary 5.11). Third, we show a (1 + ε)-approximate solution in time (kdε^{−1} log n)^{O(k)} (1/ε)^{O(dk)} + poly(nd/ε), which is fixed-parameter tractable when parameterized by k for any d = O(log^{1−δ}(n)), where δ is a constant less than 1 (see Corollary 5.13).
Finally, we provide a (1 + 2/e + ε)-approximate solution in time (kdε^{−1} log n)^{O(k)} + (kdε^{−1} log n)^{O(1)} (1/ε)^{O(d)} + poly(nd/ε), which shows that for the hardness bounds mentioned above, the dependency on d cannot be significantly improved unless k becomes large (see Corollary 5.14).

Fact 5.8.
There exists a 2-approximation algorithm (resp. 4-approximation algorithm) that, for any instance of the continuous k-median (resp. k-means) problem consisting of n points P in R^d in the ℓ∞-metric, runs in time dn^{k+1}.

Proof. Consider an instance of the continuous k-median problem consisting of a set of n points in R^d (an analogous argument applies to the k-means problem). Consider the solution S̃ obtained from the optimal solution as follows: for each center c_i of the optimal solution, pick the point p_{c_i} of P that is the closest to c_i. S̃ obviously contains at most k centers and so is a valid solution. Now, each point p ∈ P whose closest center in the optimal solution is c_i has a center that is no further away than p_{c_i}. By the choice of p_{c_i} we have ‖p − c_i‖∞ ≥ ‖p_{c_i} − c_i‖∞, and so by the triangle inequality ‖p − p_{c_i}‖∞ ≤ ‖p − c_i‖∞ + ‖p_{c_i} − c_i‖∞ ≤ 2‖p − c_i‖∞; hence S̃ is at most a 2-approximation. Thus, the algorithm that enumerates all possible k-tuples of P and outputs the one that induces the minimum k-median cost achieves a 2-approximation in the above time bound.

We then turn to the following lemma, which states that, up to losing a (1 + ε)-factor in the approximation guarantee, one can identify a discrete set of candidate centers of size at most n(1/ε)^{O(d)} log n. Given an instance P of the continuous k-median (resp. k-means) problem, we define an ε-approximate candidate center set for P as a set C such that there exists a set of k points of C whose k-median (resp. k-means) cost is at most (1 + ε) times the cost of the optimal continuous k-median (resp. k-means) clustering.

Lemma 5.9.
There exists an algorithm that takes as input an instance P of the continuous k-median (resp. continuous k-means) problem in R^d and produces an ε-approximate candidate center set C of size |P| (1/ε)^{O(d)} log |P|.

Proof. The proof follows from designing approximate candidate center sets (see [Mat00, CL19] for similar results for the ℓ_2-metric). Let n = |P|. The set of candidate centers C is iteratively constructed as follows. Let γ be an estimate of the cost of the optimal solution (which can be computed in polynomial time using an O(1)-approximate solution to the discrete version of the problem where the set of candidate centers is P; Fact 5.8 guarantees that it is an O(1)-approximate solution to the continuous version). First, start with C = P. Then, for each point p ∈ P and for each 2^i such that εγ/n ≤ 2^i ≤ γ, consider the ball of center p and radius 2^i and pick an ε·2^i-net of this ball; the size of the net is at most (1/ε)^{O(d)}. Add the net to C.

The total size of the candidate center set C follows immediately from the definition. We thus turn to proving correctness. Consider the optimal solution, and let us build a solution S ⊆ C of cost at most (1 + ε) times higher. For any center c in the optimal solution, consider the closest point p_c in P and let δ := ‖p_c − c‖∞. Let c̃ be the point of C that is the closest to c. By the triangle inequality and the definition of the net, we have ‖c̃ − c‖∞ < εδ. Therefore, applying the triangle inequality again, each point in the cluster of c can be assigned to c̃ at an additive cost increase of εδ. Moreover, since each point of the cluster is at distance at least δ from c, the cost of assigning each point in the cluster of c to c̃ is no more than (1 + ε) times the cost of assigning these points to c, and so follows the lemma.

For proving Corollaries 5.11, 5.13, and 5.14, we will make use of the notion of coreset.
A (strong) ε-coreset for a discrete k-median instance of n points P and m candidate centers C is a set of points W with a weight function w : W → R_+ such that for any set of centers S ⊆ C of size k, we have:

∑_{p∈P} min_{s∈S} dist(p, s) = (1 ± ε) ∑_{p∈W} w(p) min_{s∈S} dist(p, s).

We now consider the following lemma of Feldman and Langberg [FL11] and Chen [Che09].

Lemma 5.10 ([FL11, Che09] – Restated). There exists a polynomial-time algorithm that, on any instance of the discrete k-median problem consisting of n points and m candidate centers, outputs an ε-coreset of size (kε^{−1} log m)^{O(1)}. From there we can deduce the following corollary.
Corollary 5.11.
There exists a (2 + ε)-approximation algorithm for continuous k-median instances of n points in R^d with running time (kdε^{−1} log n)^{O(k)} + poly(nd/ε).

Proof. The corollary follows from Lemma 5.9 and Lemma 5.10: one can obtain an ε-coreset C of size (kdε^{−1} log n)^{O(1)} for any k-median instance consisting of n points in R^d. Hence, by Fact 5.8, the best k-median solution whose centers are points of C is a (2 + ε)-approximation to the original continuous k-median instance, and so the algorithm that enumerates all k-tuples of C and outputs the one that has minimum k-median cost for the instance achieves a (2 + ε)-approximation in the prescribed time bounds.

Corollary 5.12.
There exists an algorithm that, on any continuous k-median instance of n points in R^d, produces an ε-approximate candidate center set of size (kdε^{−1} log n)^{O(1)} (1/ε)^{O(d)}.

Proof. The proof follows from applying Lemma 5.10 to the input points and the ε-approximate candidate center set C described by Lemma 5.9. Then, by observing that the proof of Lemma 5.9 also applies to weighted sets of points, one can further reduce the number of candidate centers to a set C′ of size (kdε^{−1} log n)^{O(1)} (1/ε)^{O(d)}.

Corollary 5.13.
There exists a (1 + ε)-approximation algorithm for continuous k-median instances of n points in R^d with running time (kdε^{−1} log n)^{O(k)} (1/ε)^{O(dk)} + poly(nd/ε).

Proof.
The (1 + ε)-approximation algorithm follows from computing the set of candidate centers C′ prescribed by Corollary 5.12, enumerating all k-tuples of C′, and outputting the one which induces the smallest k-median cost.

Corollary 5.14.
There exists a (1 + 2/e + ε)-approximation algorithm for continuous k-median instances of n points in R^d with running time (kdε^{−1} log n)^{O(k)} + (kdε^{−1} log n)^{O(1)} (1/ε)^{O(d)} + poly(nd/ε).

Proof.
Applying Corollary 5.12, one constructs an instance of the discrete k-median problem with n = |P| points and m = (kdε^{−1} log n)^{O(1)} (1/ε)^{O(d)} candidate centers. Then, one can compute a (1 + 2/e + ε)-approximation to this instance in time (kε^{−1} log m log n)^{O(k)} + m using the FPT approximation algorithm of [CGK+19].

Acknowledgements
We are truly grateful to Pasin Manurangsi for various detailed discussions that inspired manyof the results in this paper.Karthik C. S. was supported by Irit Dinur’s ERC-CoG grant 772839, the Israel ScienceFoundation (grant number 552/16), the Len Blavatnik and the Blavatnik Family foundation,and Subhash Khot’s Simons Investigator Award. Euiwoong Lee was supported in part by theSimons Collaboration on Algorithms and Geometry.
References

[ACKS15] Pranjal Awasthi, Moses Charikar, Ravishankar Krishnaswamy, and Ali Kemal Sinop. The hardness of approximation of Euclidean k-means. In Proceedings of the 31st International Symposium on Computational Geometry, SoCG 2015, pages 754–767, 2015.

[ANSW20] Sara Ahmadian, Ashkan Norouzi-Fard, Ola Svensson, and Justin Ward. Better guarantees for k-means and Euclidean k-median by primal-dual algorithms.
SIAM J. Comput., 49(4), 2020.

[BCR01] Yair Bartal, Moses Charikar, and Danny Raz. Approximating min-sum k-clustering in metric spaces. In Proceedings of the 33rd Annual ACM Symposium on Theory of Computing, STOC 2001, Heraklion, Crete, Greece, pages 11–20, 2001.

[BFSS19] Babak Behsaz, Zachary Friggstad, Mohammad R. Salavatipour, and Rohit Sivakumar. Approximation algorithms for min-sum k-clustering and balanced k-median.
Algorithmica, 81(3):1006–1030, March 2019.

[BG95] Hervé Brönnimann and Michael T. Goodrich. Almost optimal set covers in finite VC-dimension.
Discrete & Computational Geometry, 14(4):463–479, 1995.

[BK09] Nikhil Bansal and Subhash Khot. Optimal long code test with one free bit. In Proceedings of the 50th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2009, pages 453–462, 2009.

[BKL12] Ashwinkumar Badanidiyuru, Robert Kleinberg, and Hooyeon Lee. Approximating low-dimensional coverage problems. In
Proceedings of the Twenty-Eighth Annual Symposium on Computational Geometry, SoCG 2012, pages 161–170. ACM, 2012.

[BKS19] Boaz Barak, Pravesh K. Kothari, and David Steurer. Small-set expansion in shortcode graph and the 2-to-2 conjecture. In Proceedings of the 10th Innovations in Theoretical Computer Science Conference, ITCS 2019, pages 9:1–9:12, 2019.

[BPR+
15] Jaroslaw Byrka, Thomas Pensyl, Bartosz Rybicki, Aravind Srinivasan, and Khoa Trinh. An improved approximation for k-median, and positive correlation in budgeted optimization. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, January 4-6, 2015, pages 737–756, 2015.

[CGK+
19] Vincent Cohen-Addad, Anupam Gupta, Amit Kumar, Euiwoong Lee, and Jason Li. Tight FPT approximations for k-median and k-means. In Proceedings of the 46th International Colloquium on Automata, Languages, and Programming, ICALP 2019, pages 42:1–42:14, 2019.

[Che09] Ke Chen. On coresets for k-median and k-means clustering in metric and Euclidean spaces and their applications.
SIAM Journal on Computing, 39(3):923–947, 2009.

[CK19] Vincent Cohen-Addad and Karthik C. S. Inapproximability of clustering in ℓ_p-metrics. In Proceedings of the 60th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2019, pages 519–539, 2019.

[CL19] Vincent Cohen-Addad and Jason Li. On the fixed-parameter tractability of capacitated clustering. In Proceedings of the 46th International Colloquium on Automata, Languages, and Programming, ICALP 2019, pages 41:1–41:14, 2019.

[CS04] Artur Czumaj and Christian Sohler. Sublinear-time approximation for clustering via random sampling. In International Colloquium on Automata, Languages, and Programming, pages 396–407. Springer, 2004.

[CS10] Artur Czumaj and Christian Sohler. Small space representations for metric min-sum k-clustering and their applications.
Theory of Computing Systems, 46(3):416–442, 2010.

[DKK+18a] Irit Dinur, Subhash Khot, Guy Kindler, Dor Minzer, and Muli Safra. On non-optimally expanding sets in Grassmann graphs. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, Los Angeles, CA, USA, June 25-29, 2018, pages 940–951, 2018.

[DKK+18b] Irit Dinur, Subhash Khot, Guy Kindler, Dor Minzer, and Muli Safra. Towards a proof of the 2-to-1 games conjecture? In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, Los Angeles, CA, USA, June 25-29, 2018, pages 376–389, 2018.

[dlVKKR03] Wenceslas Fernandez de la Vega, Marek Karpinski, Claire Kenyon, and Yuval Rabani. Approximation schemes for clustering problems. In
Proceedings of the 35th Annual ACM Symposium on Theory of Computing, STOC 2003, San Diego, CA, USA, pages 50–58, 2003.

[DS14] Irit Dinur and David Steurer. Analytical approach to parallel repetition. In
Proceedings of the Forty-Sixth Annual ACM Symposium on Theory of Computing, STOC 2014, pages 624–633. ACM, 2014.

[Fei98] Uriel Feige. A threshold of ln n for approximating set cover. J. ACM, 45(4):634–652, 1998.

[FL11] Dan Feldman and Michael Langberg. A unified framework for approximating and clustering data. In
Proceedings of the Forty-Third Annual ACM Symposium on Theory of Computing, STOC 2011, pages 569–578. ACM, 2011.

[GBH98] Nili Guttmann-Beck and Refael Hassin. Approximation algorithms for min-sum p-clustering.
Discrete Applied Mathematics, 89(1-3):125–142, 1998.

[GI03] Venkatesan Guruswami and Piotr Indyk. Embeddings and non-approximability of geometric problems. In
Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2003, Baltimore, Maryland, USA, pages 537–538, 2003.

[GK99] Sudipto Guha and Samir Khuller. Greedy strikes back: Improved facility location algorithms.
J. Algorithms, 31(1):228–248, 1999.

[GL15] Venkatesan Guruswami and Euiwoong Lee. Inapproximability of H-transversal/packing. In
Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2015). Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 2015.

[GT17] Mrinalkanti Ghosh and Madhur Tulsiani. From weak to strong LP gaps for all CSPs. In 32nd Computational Complexity Conference (CCC 2017). Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 2017.

[Ind00] Piotr Indyk.
High-Dimensional Computational Geometry. PhD thesis, Stanford University, 2000.

[KLM19] Karthik C. S., Bundit Laekhanukit, and Pasin Manurangsi. On the parameterized complexity of approximating dominating set.
J. ACM, 66(5):33:1–33:38, 2019.

[KMN+
02] Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, and Angela Y. Wu. A local search approximation algorithm for k-means clustering. In
Proceedings of the Eighteenth Annual Symposium on Computational Geometry, SoCG 2002, pages 10–18, 2002.

[KMS17] Subhash Khot, Dor Minzer, and Muli Safra. On independent sets, 2-to-2 games, and Grassmann graphs. In
Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, Montreal, QC, Canada, June 19-23, 2017, pages 576–589, 2017.

[KMS18] Subhash Khot, Dor Minzer, and Muli Safra. Pseudorandom sets in Grassmann graph have near-perfect expansion. In Proceedings of the 59th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2018, pages 592–601, 2018.

[KS12] Subhash Khot and Rishi Saket. Hardness of finding independent sets in almost q-colorable graphs. In Proceedings of the 53rd Annual IEEE Symposium on Foundations of Computer Science, FOCS 2012, pages 380–389, 2012.

[LSW17] Euiwoong Lee, Melanie Schmidt, and John Wright. Improved and simplified inapproximability for k-means.
Inf. Process. Lett., 120:40–43, 2017.

[Man20] Pasin Manurangsi. Tight running time lower bounds for strong inapproximability of maximum k-coverage, unique set cover and related problems (via t-wise agreement testing theorem). In
SODA, 2020.

[Mat00] Jiří Matoušek. On approximate geometric k-clustering.
Discrete & Computational Geometry, 24(1):61–84, 2000.

[Sch00] Leonard J. Schulman. Clustering for edge-cost minimization. In
STOC, 2000.

[SG76] Sartaj Sahni and Teofilo Gonzalez. P-complete approximation problems.
Journal of the ACM (JACM), 23(3):555–565, 1976.
A Inapproximability of Continuous k-means and k-median in ℓ∞-metric with Centers from Integral Lattice

Theorem A.1 (k-means with centers from the integral lattice in n^{O(1)}-dimensional ℓ∞-metric space). For any constant ε > 0, given a point-set P ⊂ R^m of size n (and m = poly(n)) and a parameter k as input, it is NP-hard to distinguish between the following two cases:

• Completeness: There exist C′ := {c_1, . . . , c_k} ⊆ Z^m and σ : P → C′ such that ∑_{a∈P} (‖a − σ(a)‖∞)² ≤ n.

• Soundness: For every C′ := {c_1, . . . , c_k} ⊆ Z^m and every σ : P → C′ we have: ∑_{a∈P} (‖a − σ(a)‖∞)² ≥ (9 − ε) · n.

Theorem A.2 (k-median with centers from the integral lattice in n^{O(1)}-dimensional ℓ∞-metric space). For any constant ε > 0, given a point-set P ⊂ R^m of size n (and m = poly(n)) and a parameter k as input, it is NP-hard to distinguish between the following two cases:

• Completeness: There exist C′ := {c_1, . . . , c_k} ⊆ Z^m and σ : P → C′ such that ∑_{a∈P} ‖a − σ(a)‖∞ ≤ n.

• Soundness: For every C′ := {c_1, . . . , c_k} ⊆ Z^m and every σ : P → C′ we have: ∑_{a∈P} ‖a − σ(a)‖∞ ≥ (3 − ε) · n.

Proof of Theorems A.1 and A.2.
The proof follows along the same lines as the proof of Theorems 5.2 and 5.3, except for the following modified construction of A : V → R^m. For every v ∈ V and every (u′, v′) ∈ E, we define the (u′, v′)th coordinate of A(v) to be nonzero only when v ∈ {u′, v′}: it is set to a fixed positive value if v = u′, and to the negation of that value if v = v′.