Clustering under Perturbation Stability in Near-Linear Time
Pankaj K. Agarwal, Hsien-Chih Chang, Kamesh Munagala, Erin Taylor, Emo Welzl
CClustering under Perturbation Stability in Near-Linear Time
Pankaj K. Agarwal † Hsien-Chih Chang ‡ Kamesh Munagala † Erin Taylor † Emo Welzl § September 28, 2020
Abstract
We consider the problem of center-based clustering in low-dimensional Euclidean spaces under the perturbationstability assumption. An instance is α -stable if the underlying optimal clustering continues to remain optimaleven when all pairwise distances are arbitrarily perturbed by a factor of at most α . Our main contribution is inpresenting efficient exact algorithms for α -stable clustering instances whose running times depend near-linearlyon the size of the data set when α ≥ + (cid:112)
3. For k -center and k -means problems, our algorithms also achievepolynomial dependence on the number of clusters, k , when α ≥ + (cid:112) + (cid:34) for any constant (cid:34) > k -median, our algorithms have polynomial dependence on k for α > α ≥ + (cid:112) Clustering is a fundamental problem in unsupervised learning and data summarization, with wide-rangingapplications that span myriad areas. Typically, the data points are assumed to lie in a Euclidean space, and thegoal in center-based clustering is to open a set of k centers to minimize the objective cost, usually a function overthe distance from each data point to its closest center. The k -median objective minimizes the sum of distances; the k -means minimizes the sum of squares of distances; and the k -center minimizes the longest distance. In the worstcase, all these objectives are NP-hard even in 2D [
52, 54 ] .A substantial body of work has focused on developing polynomial-time approximation algorithms and analyzingnatural heuristics for these problems. Given the sheer size of modern data sets, such as those generated in genomicsor mapping applications, even a polynomial-time algorithm is too slow to be useful in practice—just computingall pairs of distances can be computationally burdensome. What we need is an algorithm whose running time isnear-linear in the input size and polynomial in the number of clusters.Because of NP-hardness results, we cannot hope to compute an optimal solution in polynomial time, but in theworst case an approximate clustering can be different from an optimal clustering. We focus on the case when theoptimal clustering can be recovered under some reasonable assumptions on the input that hold in practice. Suchmethodology is termed “beyond worst-case analysis” and has been adopted by recent work [
2, 10, 25 ] . In recentyears, the notion of stability has emerged as a popular assumption under which polynomial-time optimal clusteringalgorithms have been developed. An instance of clustering is called stable if any “small perturbation” of inputpoints does not change the optimal solution. This is natural in real datasets, where often, the optimal clusteringis clearly demarcated, and the distances are obtained heuristically. Different notions of stability differ in how“small perturbation” is defined, though most of them are related. In this paper, we focus on the notions of stabilityintroduced in Bilu and Linial [ ] and Awasthi, Blum, and Sheffet [ ] . A clustering instance is α -perturbation Pankaj Agarwal has been partially supported by NSF grants IIS-18-14493 and CCF-20-07556. Kamesh Munagala is supported by NSFgrants CCF-1637397 and IIS- 1447554; ONR award N00014-19-1-2268; and DARPA award FA8650-18-C-7880. † Department of Computer Science, Duke University, USA. ‡ Department of Computer Science, Dartmouth, USA. § Department of Computer Science, ETH Zürich, Switzerland. a r X i v : . [ c s . D S ] S e p lustering under Perturbation Stability in Near-Linear Time resilient or α -stable if the optimal clustering does not change when all distances are perturbed by a factor of atmost α . Similarly, a clustering instance is α -center proximal if any point is at least a factor of α closer to its ownoptimal center than any other optimal center. Awasthi, Blum, and Sheffet showed that α -stability implies α -centerproximity [ ] . This line of work designs algorithms to recover the exact optimal clustering—the ground truth—inpolynomial time for α -stable instances.This paper also focuses on recovering the optimal clustering for stable clustering instances. But instead offocusing on polynomial-time algorithms and optimizing the value of α , we ask the question: Can algorithmsbe designed that compute exact solutions to stable instances of Euclidean center-based clustering that run in timenear-linear in the input size?
We note that an ( + (cid:34) ) -approximation solution, for an arbitrarily small constant (cid:34) >
0, may differ significantly from an optimal solution (the ground truth) even for stable instances, so one cannothope to use an approximation algorithm to recover the optimal clustering.
In this paper, we make progress on the above question, and present near-linear time algorithms for findingoptimal solutions of stable clustering instances with moderate values of α . In particular, we show the followingmeta-theorem: Theorem 1.1.
Let X be a set of n points in (cid:82) d for some constant d , let k ≥ be an integer, and let α ≥ + (cid:112) bea parameter. If the k -median, k -means, or k -center clustering instance for X is α -stable, then the optimal solutioncan be computed in ˜ O ( n poly k + f ( k )) time. In the above theorem, the ˜ O notation suppresses logarithmic terms in n and the spread of the point set. Thefunction f ( k ) depends on the choice of algorithm, and we present the exact dependence below. We also omit termsdepending solely on the dimension, d . Furthermore, the above theorem is robust in the sense that the algorithm isnot restricted to choosing the input points as centers ( discrete setting ), and can potentially choose arbitrary pointsin the Euclidean plane as centers ( continuous setting , sometimes referred to as the Steiner point setting )—indeed,we show that these notions are identical under a reasonable assumption on stability.At a more fine-grained level, we present several algorithms that require mild assumptions on the stabilitycondition. In the results below, as well as throughout the paper, we present our results both for the Euclideanplane, as well as generalizations to higher (but fixed number of) dimensions.
Dynamic Programming.
In Section 3, we present a dynamic programming algorithm that computes the optimalclustering in O ( nk + n polylog n ) time for α -stable k -means, k -median, and k -center in any fixed dimension,provided that α ≥ + (cid:112) + (cid:34) for any constant (cid:34) >
0. For d =
2, it suffices to assume that α ≥ + (cid:112) Local Search.
In Sections 4 and 5, we show that the standard 1-swap local-search algorithm, which iterativelyswaps out a center in the current solution for a new center as long as the resulting total cost improves,computes an optimal clustering for α -stable instances of k -median assuming α >
5. We also show that it canbe implemented in O ( nk log n log ∆ ) for d = O ( nk d − polylog n log ∆ ) for d > ∆ is the spreadof the point set. Coresets.
In the Section 6, we use multiplicative coresets to compute the optimal clustering for k -means, k -medianand k -center in any fixed dimension, when α ≥ + (cid:112)
3. The running time is O ( nk + f ( k )) where f ( k ) is anexponential function of k . Remark 1.2.
While the current analysis of the dynamic programming based algorithm suggests that it is betterthan the local-search and coreset based approaches, the latter are of independent interest—our local-searchanalysis is considerably simpler than the previous analysis [ ] , and coresets have mostly been used to computeapproximate, rather than exact, solutions. We also note that our analysis of the local-search algorithm is probablynot tight. Furthermore, variants of all three approaches might work for smaller values of α . We note that thevalue of α assumed in the above results in larger than what is known for polynomial-time algorithm (e.g. α ≥ inAngelidakis et al. [ ] ) and that in some applications the input may not satisfy our assumption, but our resultsare a big first step toward developing near-linear time algorithms for stable instances. We are not aware of anyprevious near-linear time algorithms for computing optimal clustering even for larger values of α . We leave theproblem of reducing the assumption on α as an important open question. The spread of a point set is the ratio between the longest and shortest pairwise distances.
P. K. Agarwal, H.-C. Chang, K. Munagala, E. Taylor, and E. Welzl
Techniques.
The key difficulty with developing fast algorithms for computing the optimal clustering is thatsome clusters could have a very small size compared to others. This issue persists even when the instances arestable. Imagine a scenario where there are multiple small clusters, and an algorithm must decide whether to mergethese into one cluster while splitting some large cluster, or keep them intact. Now imagine this situation happeningrecursively, so that the algorithm has multiple choices about which clusters to recursively split. The difference incost between these options and the size of the small clusters can be small enough that any ( + (cid:34) ) -approximationcan be agnostic, while an exact solution cannot. As such, work on finding exact optima use techniques such asdynamic programming [ ] or local search with large number of swaps [
28, 40 ] in order to recover small clusters.Other work makes assumptions lower-bounding the size of the optimal clusters or the spread of their centers [ ] .Our main technical insight for the first two results is simple in hindsight, yet powerful: For a stable instance,if the Euclidean metric is replaced by another metric that is a good approximation, then the optimal clusteringdoes not change under the new metric and in fact the instance remains stable albeit with a smaller stabilityparameter. In particular, we replace the Euclidean metric with an appropriate polyhedral metric —that is, a convexdistance function where each unit ball is a regular polyhedron—yielding efficient procedures for the following twoprimitives:• Cost of 1-swap.
Given a candidate set of centers S , maintain a data structure that efficiently updates thetotal cost if center x ∈ S is replaced by center y / ∈ S .• Cost of 1-clustering.
Given a partition of the data points, maintain a data structure where the cost of1-clustering (under any objectives) can be efficiently updated as partitions are merged.We next combine the insight of changing the metrics with additional techniques. For local search, we build onthe approach in [
28, 33, 40 ] that shows local search with t -swaps for large enough constant t finds an optimalsolution for stable instances in polynomial time for any fixed-dimension Euclidean space. None of the prior analysisdirectly extends as is to 1-swap, which is critical in achieving near-linear running time—note that even when t = α ≥ + (cid:112) All of k -median, k -means, and k -center are widely studied from the perspective of approximation algorithms andare known to be hard to approximate [ ] . Indeed, for general metric spaces, k -center is hard to approximate towithin a factor of 2 − (cid:34) [ ] ; k -median is hard to ( + / e ) -approximate [ ] ; and k -means is hard to ( + / e ) -approximate in general metrics [ ] , and is hard to approximate within a factor of 1.0013 in the Euclideansetting [ ] . Even when the metric space is Euclidean, k -means is still NP-hard when k = [
9, 34 ] , and there isan n Ω ( k ) lower bound on running time for k -median and k -means in 4-dimensional Euclidean space under theexponential-time hypothesis [ ] .There is a long line of work in developing ( + (cid:34) ) -approximations for these problems in Euclidean spaces.The holy grail of this work has been the development of algorithms that are near-linear time in n , and severaltechniques are now known to achieve this. This includes randomly shifted quad-trees [ ] , coresets [ ] ,sampling [ ] , and local search [
28, 30, 32 ] , among others.There are many notions of clustering stability that have been considered in literature [
1, 8, 15, 19, 20, 24, 37, 49,56 ] . The exact definition of stability we study here was first introduced in Awasthi et al. [ ] ; their definitionin particular resembles the one of Bilu and Linial [ ] for max-cut problem, which later has been adapted toother optimization problems [
11, 12, 21, 53, 55 ] . Building on a long line of work [
16, 18, 22, 23 ] , which graduallyreduced the stability parameter, Angelidakis et al. [ ] present a dynamic programming based polynomial-timeoptimal algorithm for discrete 2-stable instances for all center-based objectives.Chekuri and Gupta [ ] show that a natural LP-relaxation is integral for the 2-stable k -center problem. Recentwork by Cohen-Addad [ ] provides a framework for analyzing local search algorithms for stable instances. Thiswork shows that for an α -stable instance with α >
3, any solution is optimal if it cannot be improved by swapping (cid:100) / ( α − ) (cid:101) centers. Focusing on Euclidean spaces of fixed dimensions, Friggstad et al. [ ] show that a local-search lustering under Perturbation Stability in Near-Linear Time O ( ) -swaps runs in polynomial time under a ( + δ ) -stable assumption for any δ >
0. However,none of the algorithms for stable instances of clustering so far have running time near-linear in n , even when thestability parameter α is large, points lie in (cid:82) , and the underlying metric is Euclidean.On the hardness side, solving ( − δ ) -center proximal k -median instances in general metric spaces is NP-hardfor any δ > [ ] . When restricted to Euclidean spaces in arbitrary dimensions, Ben-David and Reyzin [ ] showed that for every δ >
0, solving discrete ( − δ ) -center proximal k -median instances is NP-hard. Similarly,the clustering problem for discrete k -center remains hard for α -stable instances when α <
2, assuming standardcomplexity assumption that NP (cid:54) = RP [ ] . Under the same complexity assumption, discrete α -stable k -meansis also hard when α < + δ for some positive constant δ [ ] . Deshpande et al. [ ] showed it is NP-hard to ( + (cid:34) ) -approximate ( − δ ) -center proximal k -means instances. Clustering.
Let X = { p , . . . , p n } be a set of n points in (cid:82) d , and let δ : (cid:82) d × (cid:82) d → (cid:82) ≥ be a distance function(not necessarily a metric satisfying triangle inequality). For a set Y ⊆ (cid:82) d , we define δ ( p , Y ) : = min y ∈ Y δ ( p , y ) . A k -clustering of X is a partition of X into k non-empty clusters X , . . . , X k . We focus on center-based clusteringsthat are induced by a set S : = { c , . . . , c k } of k centers ; each X i is the subset of points of X that are closest to c i in S under δ , that is, X i : = (cid:8) p ∈ X | δ ( p , c i ) ≤ δ ( p , c j ) (cid:9) (ties are broken arbitrarily). Assuming the nearest neighborof each point of X in S is unique (under distance function δ ), S defines a k -clustering of X . Sometimes it is moreconvenient to denote a k -clustering by its set of centers S .The quality of a clustering S of X is defined using a cost function $ ( X , S ) ; cost function $ depends on thedistance function δ , so sometimes we may use the notation $ δ to emphasize the underlying distance function. Thegoal is to compute S ∗ : = arg min S $ ( X , S ) where the minimum is taken over all subsets S ⊂ (cid:82) d of k points. Severaldifferent cost functions have been proposed, leading to various optimization problems. We consider the followingthree popular variants:• k -median clustering : the cost function is $ ( X , S ) = (cid:80) p ∈ X δ ( p , S ) .• k -means clustering : the cost function is $ ( X , S ) = (cid:80) p ∈ X ( δ ( p , S )) .• k -center clustering : the cost function is $ ( X , S ) = max p ∈ X δ ( p , S ) .In some cases we wish S to be a subset of X , in which case we refer to the problem as the discrete k -clustering problem. For example, the discrete k -median problem is to compute arg min S ⊆ X , | S | = k (cid:80) p ∈ X δ ( p , S ) . The discrete k -means and discrete k -center problems are defined analogously.Given point set X , distance function δ , and cost function $, we refer to ( X , δ , $ ) as a clustering instance . If $is defined directly by the distance function δ , we use ( X , δ ) to denote a clustering instance. Note that a center of aset of points may not be unique (e.g. when δ is defined by the L -metric and $ is the sum of distances) or it maynot be easy to compute (e.g. when δ is defined by the L -metric and $ is the sum of distances). Stability.
Let X be a point set in Euclidean space (cid:82) d . For α ≥
1, a clustering instance ( X , δ , $ δ ) is α -stable if for any perturbed distance function ˜ δ (not necessary a metric) satisfying δ ( p , q ) ≤ ˜ δ ( p , q ) ≤ α · δ ( p , q ) for all p , q ∈ (cid:82) d , any optimal clustering of ( X , δ , $ δ ) is also an optimal clustering of ( X , ˜ δ , $ ˜ δ ) . Note that the clustercenters as well as the cost of optimal clustering may be different for the two instances. We exploit the followingproperty of stability, which follows directly from its definition. Lemma 2.1.
Let ( X , δ ) be an α -stable clustering instance with α > . Then the optimal clustering O of ( X , δ ) isunique. Proof:
Assume for contradiction that there are two optimal clusterings O and O (cid:48) . There must be a point p in X that belongs to a cluster centered at c in O but is assigned to a different center c (cid:48) in O (cid:48) . Consider the perturbeddistance ˜ δ by scaling inter-cluster distances in O by an α factor while preserving all intra-cluster distances. Then α · δ ( p , c ) ≤ α · δ ( p , c (cid:48) ) = ˜ δ ( p , c (cid:48) ) ≤ ˜ δ ( p , c ) = δ ( p , c ) ,where the first inequality is by definition of clustering O , the second inequality is by definition of clustering O (cid:48) still being optimal under ˜ δ by α -stability, and the two equalities are follows from how the perturbed distance isdefined. This give a contradiction as long as α > (cid:131) P. K. Agarwal, H.-C. Chang, K. Munagala, E. Taylor, and E. Welzl
Metric approximations.
The next lemma, which we rely on heavily throughout the paper, is the observationthat a change of metric preserves the optimal clustering as long as the new metric is a β -approximation of theoriginal metric satisfying β < α . Lemma 2.2.
Given point set X , let δ and δ (cid:48) be two metrics satisfying δ ( p , q ) ≤ δ (cid:48) ( p , q ) ≤ β · δ ( p , q ) for all p and q in X for some β . Let ( X , δ ) be an α -stable clustering instance with α > β . Then the optimal clustering of ( X , δ ) is also the optimal clustering of ( X , δ (cid:48) ) , and vice versa. Furthermore, ( X , δ (cid:48) ) is an ( α/β ) -stable clustering instance. Proof:
Because ( X , δ ) is α -stable for α > β , the optimal clustering of ( X , δ ) is also an optimal clustering of ( X , δ (cid:48) ) by taking δ (cid:48) to be the perturbed distance. Now, for an arbitrary perturbed distance ˜ δ (cid:48) satisfying δ (cid:48) ( p , q ) ≤ ˜ δ (cid:48) ( p , q ) ≤ ( α/β ) · δ (cid:48) ( p , q ) for all p , q ∈ X , one has δ ( p , q ) ≤ δ (cid:48) ( p , q ) ≤ ˜ δ (cid:48) ( p , q ) ≤ αβ · δ (cid:48) ( p , q ) ≤ α · δ ( p , q ) ,and therefore the optimal clustering O of ( X , δ ) and ( X , δ (cid:48) ) is must be an optimal clustering of ( X , ˜ δ (cid:48) ) , provingthat ( X , δ (cid:48) ) is ( α/β ) -stable. Providing α > β , the optimal clustering of ( X , δ (cid:48) ) is again unique by Lemma 2.1; inother words, the optimal clustering of ( X , δ (cid:48) ) is by definition equal to the optimal clustering of ( X , δ ) . (cid:131) Polyhedral metric.
In light of the metric approximation lemma, we would like to approximate the Euclideanmetric without losing too much stability, using a collection of convex distance functions generalizing the L ∞ -metricin Euclidean space. Let N ⊆ S d − be a centrally-symmetric set of γ unit vectors (that is, if u ∈ N then − u ∈ N )such that for any unit vector v ∈ S d − , there is a vector u ∈ N within angle arccos ( − (cid:34) ) = O ( (cid:112) (cid:34) ) . The numberof vectors needed in N is known to be O ( (cid:34) − ( d − ) / ) . We define the polyhedral metric δ N : (cid:82) d × (cid:82) d → (cid:82) ≥ to be δ N ( p , q ) : = max u ∈ N 〈 p − q , u 〉 .Since N is centrally symmetric, δ N is symmetric and thus a metric. The unit ball under δ N is a convexpolyhedron, thus the name polyhedral metric. By construction, an easy calculation shows that for any p and q in (cid:82) d , (cid:107) p − q (cid:107) ≥ δ N ( p , q ) ≥ ( − (cid:34) ) · (cid:107) p − q (cid:107) . By scaling each vector in N by a 1 + (cid:34) factor, we can ensurethat ( + (cid:34) ) · (cid:107) p − q (cid:107) ≥ δ N ( p , q ) ≥ (cid:107) p − q (cid:107) . By taking (cid:34) to be small enough, the optimal clustering for α -stableclustering instance ( X , (cid:107)·(cid:107) , $ ) is the same as that for ( X , δ N , $ ) by Lemma 2.2, and the new instance ( X , δ N , $ ) is ( − (cid:34) ) α -stable if the original instance ( X , (cid:107)·(cid:107) , $ ) is α -stable. Center proximity.
A clustering instance ( X , δ ) satisfies α -center proximity property [ ] if for any distinctoptimal clusters X i and X j with centers c i and c j and any point p ∈ X i , one has α · δ ( p , c i ) < δ ( p , c j ) . Awasthi, Blum,and Sheffet showed that any α -stable instance satisfies α -center proximity [
16, Fact 2.2 ] (also [
12, Theorem 3.1 ] under metric perturbation). Optimal solutions of α -stable instances satisfy the following separation properties. • α -center proximity implies that ( α − ) · δ ( p , c i ) < δ ( p , q ) for any p ∈ X i and any q (cid:54)∈ X i . For α ≥
2, a pointis closer to its own center than to any point of another cluster. • For α ≥ + (cid:112) α -center proximity implies that δ ( p , p (cid:48) ) < δ ( p , q ) for any p , p (cid:48) ∈ X i and any q (cid:54)∈ X i . In otherwords, from any point p in X , any intra -cluster distance to a point p (cid:48) is shorter than any inter -cluster distanceto a point q . We make use of the following stronger intra-inter distance property on α -stable instances, which allows us tocompare any intra-distance between two points in X i and any inter-distance between a point in X i and a point in X j . Lemma 2.3.
Let ( X , δ ) be an α -stable instance, α > , and let X be a cluster in an optimal clustering with q ∈ X \ X and p , p (cid:48) , p (cid:48)(cid:48) ∈ X . If δ is a metric, then δ ( p , p (cid:48) ) ≤ δ ( p (cid:48)(cid:48) , q ) for α ≥ + (cid:112) . If δ is the Euclidean metricin (cid:82) d , then δ ( p , p (cid:48) ) ≤ δ ( p (cid:48)(cid:48) , q ) for α ≥ + (cid:112) . Proof.
See Appendix A.1.Finally, we note that it is enough to consider the discrete version of the clustering problem for stable instances. We give an additional list of known separation properties in Appendix A.1. They are known as weak center proximity [ ] and strict separation property [ ] respectively. lustering under Perturbation Stability in Near-Linear Time Lemma 2.4.
For any α -stable instance ( X , δ , $ δ ) with α ≥ + (cid:112) , any continuous optimal k -clustering is a discreteoptimal k -clustering and vice versa. Proof:
Consider O to be an optimal solution of an arbitrary α -stable instance ( X , δ ) in the continuous setting;denote the centers in O as o , . . . , o k . Define solution O (cid:48) to be the set of centers nn , . . . , nn k , where nn i is definedto be the nearest point of o i in X . By definition O (cid:48) is a discrete solution as all centers nn i lie in X . We now arguethat O (cid:48) is in fact an optimal solution of the k -clustering instance ( X , δ ) .First we show that nn i must be a point that was assigned to o i in clustering O . Assume for contradiction thatnn i was in a different cluster with center o j . Let p be an arbitrary point in cluster O i . By center proximity onehas δ ( p , nn i ) > ( α − ) · δ ( p , o i ) . But then this implies ( α − ) · δ ( p , o i ) < δ ( p , nn i ) ≤ δ ( p , o i ) + δ ( o i , nn i ) , that is, δ ( p , o i ) ≤ ( α − ) · δ ( p , o i ) < δ ( o i , nn i ) given α ≥
3, a contradiction.Now again take an arbitrary point p in some arbitrary cluster O i . Compare the distances δ ( p , nn i ) and δ ( p , nn j ) for any other center nn j in O (cid:48) . By [
24, Theorem 8 ] for α > + (cid:112) δ ( p , nn i ) < δ ( p , nn j ) since nn i lies in O i and nn j lies in O j . Therefore the clusteringformed by the centers in O (cid:48) is identical to the clustering of O , thus proving that O (cid:48) is an optimal solution of ( X , δ ) . (cid:131) We now describe a simple, efficient algorithm for computing the optimal clustering for the k -means, k -center,and k -median problem assuming the given instance is α -stable for α ≥ + (cid:112)
3. Roughly speaking, we make thefollowing observation: if there are at least two clusters, then the two endpoints of the longest edge of the minimumspanning tree of X belong to different clusters, and no cluster has points in both subtrees of the minimum spanningtree delimited by the longest edge. We describe the dynamic programming algorithm in Section 3.1 and thendescribe the procedure for computing cluster costs in Section 3.2. We summarize the results in this section by thefollowing theorem. Theorem 3.1.
Let X be a set with n points lying in (cid:82) d and k ≥ an integer. If the k -means, k -median, or k -centerinstance for X under the Euclidean metric is α -stable for α ≥ + (cid:112) + (cid:34) for any constant (cid:34) > , then the optimalclustering can be computed in O ( nk + n polylog n ) time. For d = the assumption can be relaxed to α ≥ + (cid:112) . The following lemma is the key observation for our algorithm.
Lemma 3.2.
Let ( X , δ , $ ) be an α -stable k -clustering instance with α ≥ + (cid:112) and k ≥ , and let T be theminimum spanning tree of X under metric δ . Then (1) The two endpoints u and v of the longest edge e in T donot belong to the same cluster; (2) each cluster lies in the same connected component of T \ { e } . Proof:
Assume for contradiction that the longest spanning tree edge uv belongs to the same cluster X i in theoptimal k -clustering O . Since k >
1, there is at least one other cluster X j of O with a spanning tree edge x y connecting X i to X j . Given α ≥ + (cid:112) d ( u , v ) < d ( x , y ) by Lemma 2.3, a contradiction. The second statementfollows from Angelidakis et al. [
12, Lemma 4.1 ] . (cid:131) Algorithm.
We fix the metric δ and the cost function $. For a subset Y ⊆ X and for an integer j between 1and k −
1, let µ ( Y ; j ) denote the optimal cost of an j -clustering on Y (under δ and $). Recall that our definitionof j -clustering required all clusters to be non-empty, so it is not defined for | Y | < j . For simplicity, we assume that µ ( Y ; j ) = ∞ for | Y | < j . Let T be the minimum spanning tree on X under δ , let uv be the longest edge in T ; let X u and X v be the set of vertices of the two components of T \ { uv } . Then µ ( X ; k ) satisfies the following recurrencerelation: µ ( X ; k ) = µ ( X ; 1 ) if k = ∞ if k > | X | ,min ≤ i < k { µ ( X u ; i ) + µ ( X v ; k − i ) } if | X | > k >
1. (1)
P. K. Agarwal, H.-C. Chang, K. Munagala, E. Taylor, and E. Welzl
Using recurrence (1), we compute µ ( X ; k ) as follows. Let R be a recursion tree , a binary tree where each node v in R is associated with a subtree T v of T . If v is the root of R , then T v = T . Recursion tree R is defined recursivelyas follows. Let X v ⊆ X be the set of vertices of T in T v . If | X v | =
1, then v is a leaf. Each interior node v of T is alsoassociated with the longest edge e v of T v . Removal of e v decomposes T v into two connected components, each ofwhich is associated with one of the children of v . After having computed T , R can be computed in O ( n log n ) timeby sorting the edges in decreasing order of their costs. For each node v ∈ R and for every i between 1 and k −
1, we compute µ ( X v ; i ) as follows. If v is a leaf, weset µ ( X v ; 1 ) = µ ( X v ; i ) = ∞ otherwise. For all interior nodes v , we compute µ ( X v ; 1 ) using the algorithmsdescribed in Section 3.2. Finally, if v is an interior node and i >
1, we compute µ ( X v ; i ) using the recurrencerelation (1). Recall that if w and z are the children of v , then µ ( X w ; (cid:96) ) and µ ( X z ; r ) for all (cid:96) and r have beencomputed before we compute µ ( X v ; i ) .Let τ ( n ) be the time spent in computing T plus the total time spent in computing µ ( X v , 1 ) for all nodes v ∈ R .Then the overall time taken by the algorithm is O ( nk + τ ( n )) . What is left is to compute the minimum spanningtree T and all µ ( X v , 1 ) efficiently. In this section, we show how to obtain the minimum spanning tree and compute µ ( X v ; 1 ) efficiently for 1-mean,1-center, and 1-median when X ⊆ (cid:82) d . We can compute the Euclidean minimum spanning tree T in O ( n log n ) timein (cid:82) [ ] . We can then compute µ ( X v ; 1 ) efficiently either under Euclidean metric (for 1-mean), or switch to the L -metric and compute µ ( X v ; 1 ) efficiently using Lemma 2.2 (for 1-center and 1-median).There are two difficulties in extending the 2D data structures to higher dimensions. No near-linear timealgorithm is known for computing the Euclidean minimum spanning tree for d ≥
3, and we can work with the L -metric only if α ≥ (cid:112) d (Lemma 2.2). We address both of these difficulties by working with a polyhedral metric δ N . Let α ≥ + (cid:112) + Ω ( ) be the stability parameter. By taking the number of vectors in N (defined by thepolyhedral metric) to be large enough, we can ensure that ( − (cid:34) ) (cid:107) p − q (cid:107) ≤ δ N ( p , q ) ≤ (cid:107) p − q (cid:107) for all p , q ∈ (cid:82) d .By Lemma 2.2, X is an α -stable instance under δ N for α ≥ + (cid:112)
3. We first compute the minimum spanning treeof X in O ( n polylog n ) time under δ N using the result of Callahan and Kosaraju [ ] , and then compute µ ( X v , 1 ) . Data structure.
We compute µ ( X v ; 1 ) in a bottom-up manner. When processing a node v of R , we maintain adynamic data structure Ψ v on X v from which µ ( X v ; 1 ) can be computed quickly. The exact form of Ψ v depends onthe cost function to be described below. Before that, we analyze the running time τ ( n ) spent on computing every µ ( X v ; 1 ) . Let w and z be the two children of v . Suppose we have Ψ w and Ψ z at our disposal and suppose | X w | ≤ | X z | .We insert the points of X w into Ψ z one by one and obtain Ψ v from which we compute µ ( X v ; 1 ) . Suppose Q ( n ) is theupdate time of Ψ v as well as the time taken to compute µ ( X v ; 1 ) from Ψ v . The total number of insert operationsperformed over all nodes of R is O ( n log n ) because we insert the points of the smaller set into the larger set ateach node of R [
45, 58 ] . Hence τ ( n ) = O ( Q ( n ) · n log n ) . We now describe the data structure for each specificclustering problem. We work with the L -metric. Here the center of a single cluster consisting of X v is the centroid σ v : = (cid:128)(cid:80) p ∈ X v p (cid:138) / | X v | , and µ ( X v ; 1 ) = (cid:80) p ∈ X v (cid:107) p (cid:107) − | X v | · (cid:107) σ v (cid:107) . At each node v , we maintain (cid:80) p ∈ X v p and (cid:80) p ∈ X v (cid:107) p (cid:107) . Point insertion takes O ( ) time so Q ( n ) = As mentioned in the beginning of the section, we can work with the L -metric for d =
2. We wishto find the smallest L -disc (a diamond) that contains X v . Let e + = (
1, 1 ) and e − = ( −
1, 1 ) . Then the radius ρ v ofthe smaller L -disc containing X v is ρ v =
12 max (cid:167) max p ∈ X v 〈 p , e + 〉 − min p ∈ X v 〈 p , e + 〉 , max p ∈ X v 〈 p , e − 〉 − min p ∈ X v 〈 p , e − 〉 (cid:170) . (2)We maintain the four terms max p ∈ X v 〈 p , e + 〉 , min p ∈ X v 〈 p , e + 〉 , max p ∈ X v 〈 p , e − 〉 , and min p ∈ X v 〈 p , e − 〉 at v . A pointcan be inserted in O ( ) time and ρ v can be computed from these four terms in O ( ) time. Therefore, Q ( n ) = O ( ) . Tree R is nothing but the minimum spanning tree constructed by Kruskal’s algorithm. lustering under Perturbation Stability in Near-Linear Time For d >
2, we work with a polyhedral metric δ N with N = O ( d ) . For a node v , we need to compute the smallest ball B ( X v ) under δ N that contains X v . We need a few geometric observationsto compute the smallest enclosing ball efficiently.For each u ∈ N , let H u be the halfspace 〈 x , u 〉 ≤
1, that is, the halfspace bounded by the hyperplane tangent to S d − at u and containing the origin. Define Q : = (cid:84) u ∈ N H u . A ball of radius λ centered at p under δ N is P + λ Q .For a vector u ∈ N , let p u : = arg max p ∈ x v 〈 p , u 〉 be the maximal point in direction u . Set X v : = (cid:8) p u | u ∈ N (cid:9) . Thefollowing simple lemma is the key to computing B ( X v ) . Lemma 3.3.
Any δ N -ball that contains X v also contains X v . By Lemma 3.3, it suffices to compute B ( X v ) . The next observation is that B ( X v ) has a basis of size d +
1, i.e.there is a subset Y of d + X v such that B ( Y ) = B ( X v ) = B ( X v ) . One can try all possible subsets of X v in O ( γ d + ) = O ( d ) time. We note that X v can be maintained under insertion in O ( γ ) = O ( d ) time, and we thenre-compute B ( X v ) in 2 O ( d ) time. Hence, Q ( n ) = O ( ) . Similar to 1-center, we work with the polyhedral metric. Fix a node v of T . For a point x ∈ (cid:82) d , let F v ( x ) = (cid:80) p ∈ X v δ N ( x , p ) which is a piecewise-linear function. Our goal is to compute ξ ∗ v = arg min x ∈ (cid:82) d F v ( x ) . Ourdata structure is a dynamic range-tree [ ] used for orthogonal range searching that can insert a point in O ( log n ) time. Using multi-dimensional parametric search [ ] , ξ ∗ v can be computed in O ( poly log n ) time after each update. For simplicity, we describe the data structure for d =
2. It extends to highdimensions in a straightforward manner. u u u u u u w w w w w w C Figure 3.1.
Polyhedral metric defined by N = { u , . . . u } , with C corresponding to Ψ . Fix a node v . We describe the data structure for maintaining ξ ∗ v under insertion of points Let N = { u , . . . , u r − } ⊂ S be the set of unit vectors that define the metric δ N . We partition the plane into a family (cid:67) = { C , . . . , C r − } of r cones such that for a point p ∈ C i , δ N ( p , 0 ) = 〈 p , u i 〉 . C i is defined by unit vectors w i − , w i ,where w j is the unit vector in direction ( u j − + u j ) | ; see Figure 3.1. For a point x ∈ (cid:82) and j < r , let C j ( x ) = C j + x .Then, F v ( x ) = r − (cid:88) i = (cid:88) p ∈ C i ( x ) ∩ X v δ N ( p , x ) = r − (cid:88) i = (cid:88) p ∈ C i ( x ) ∩ X v 〈 p − x , u i 〉 = r − (cid:88) i = (cid:88) p ∈ C i ( x ) ∩ X v 〈 p , u i 〉 − r (cid:88) i = | C i ( x ) ∩ X v | · 〈 x , u i 〉 . A more complex algorithm can compute B ( X v ) in γ · O ( d ) = O ( d ) time, but we ignore this improvement. We note that ξ ∗ v may not be unique. The minimum may be realized of a convex polygon. P. K. Agarwal, H.-C. Chang, K. Munagala, E. Taylor, and E. Welzl
We note that F v ( x ) is a piecewise-linear convex function. We construct a separate data structure Ψ i for each i so that for any x ∈ (cid:82) , Ψ i computes α i ( x ) = (cid:80) p ∈ C i ( x ) ∩ X v 〈 p , u i 〉 and β i ( x ) = | C i ( x ) ∩ X v | . Ψ i is basically a dynamic2 D range tree in which the coordinates of a point are described with w i − , w i as the coordinate axes; see [ ] . Ψ i requires O ( n log n ) space, a query can be answered in O ( log n ) time, and a point can be inserted in O ( log n ) amortized time. Hence, the overall query and update time is O ( r log n ) . We note that α i , β i for i < r can be usedto compute the linear function L v , x that represents x (recall that F v is piecewise linear). Let Ψ = ( Ψ , . . . , Ψ r − ) denote the overall data structure, and let Q ( x ) be the above query procedure on Ψ .Using Ψ , Q ( · ) , and multi-dimensional parametric search, we compute ξ ∗ v as follows. For a line (cid:96) in (cid:82) , let ξ ∗ v , (cid:96) = arg min x ∈ (cid:96) F v ( x ) . We first describe how to compute ξ ∗ v , (cid:96) . Let q be a point on (cid:96) . By invoking Q ( q ) on Ψ ,we can compute F v ( q ) as well as L v , q . Using L v , q , we can determine whether q = ξ ∗ v , l , q lies to left of ξ ∗ v , l , or q lies to the right of ξ ∗ v , l . We refer to this as the “decision” procedure. In order to compute ξ ∗ v , we simulate Q generically on ξ ∗ v , l without knowing its value and using Q on known points as the decision procedure at each stepof this generic procedure. More precisely, at each step, a Ψ i compares the w i − or w i -coordinate, say w i -coordinate ξ v , x , with a real value ∆ . Let q be the intersection point of (cid:96) and the line w i = ∆ . By invoking Q ( q ) on Ψ , wecan determine in O ( r log n ) time whether q lies to the left or right of ξ ∗ v , which in turn determines whether the w i -coordinate of ξ ∗ v , (cid:96) is smaller or greater than ∆ . (If q = ξ ∗ x , (cid:96) , then we have found ξ ∗ x , (cid:96) ). Hence, each step ofthe decision procedure can be determined in O ( r log n ) time. The total time taken by the generic procedure is O ( r log n ) . The parametric search technique ensures that the generic procedure will query with ξ ∗ v , (cid:96) as one ofthe steps, so the decision procedure will detect this and return ξ ∗ v , (cid:96) .Let Q ( (cid:96) ) denote the above procedure to compute ξ ∗ v , (cid:96) . By simulating Q on ξ ∗ v generically but now using Q as the decision procedure, we can compute ξ ∗ v in O ( r log n ) time. Hence, we can maintain ξ ∗ v in O ( log n ) timeunder insertion of a point. In higher dimensions, Q takes O ( log d n ) time in (cid:82) d . So the parametric search will take O ( log d ( d + ) n ) time to compute ξ ∗ v . k -Median: Single-Swap Local Search We customize the standard local-search framework for the k -clustering problem [
32, 33, 41 ] . In order to recoverthe optimal solution, we must define near-optimality more carefully. Let ( X , δ ) be an instance of α -stable k -medianin (cid:82) for α >
5. By Lemma 2.4, it suffices to consider the discrete k -median problem In Section 4, we describe asimple local-search algorithm for finding the optimal clustering of ( X , δ ) . In Section 4 we show that the algorithmterminates within O ( k log ( n ∆ )) iterations. We obtain the following. Theorem 4.1.
Let ( X , δ ) be an α -stable instance of the k -median problem for some α > where X is a set of n points in (cid:82) equipped with L p -metric δ . The -swap local search algorithm terminates with the optimal clusteringin O ( k log ( n ∆ )) iterations. Local-search algorithm.
The local-search algorithm maintains a k -clustering induced by a set S of k clustercenters. At each step, it finds a pair of points x ∈ X and y ∈ S such that $ ( X , S + x − y ) is minimized. If$ ( X , S + x − y ) ≥ $ ( X , S ) , it stops and returns the k -clustering induced by S . Otherwise it replaces S with S + x − y and repeats the above step. The pair ( x , y ) will be referred to as a . Local-search analysis.
The high-level structure of our analysis follows Friggstad et al. [ ] , however newideas are needed for 1-swap. In this subsection, we denote a k -clustering by the set of its cluster centers. Let S bea fixed k -clustering, and let O be the optimal clustering. For a subset Y ⊆ X , we use $ ( Y ) and $ ∗ ( Y ) to denote$ ( Y , S ) and $ ( Y , O ) , respectively. Similarly, for a point p ∈ X , we use nn ( p ) and nn ∗ ( p ) to denote the nearestneighbor of p in S and in O , respectively; define δ ( p ) to be δ ( p , S ) and δ ∗ ( p ) to be δ ( p , O ) . We partition X intofour subsets as follows:• X : = (cid:8) p ∈ X | nn ( p ) ∈ S \ O , nn ∗ ( p ) ∈ O \ S (cid:9) ;• X : = (cid:8) p ∈ X | nn ( p ) ∈ S \ O , nn ∗ ( p ) ∈ S ∩ O (cid:9) ;• X : = (cid:8) p ∈ X | nn ( p ) ∈ S ∩ O , nn ∗ ( p ) ∈ O \ S (cid:9) ; lustering under Perturbation Stability in Near-Linear Time Figure 4.1.
Illustration of candidate swaps (cid:83) in (cid:82) . The blue dots belong to set S , the red dots belong to set O ; the only purple dot is in S ∩ O .The thick gray segments indicate pairs inside the stars; each star has exact one blue dot as its center. The black pairs are the candidate swaps.Notice that the partitions of S and O form connected components. • X : = (cid:8) p ∈ X | nn ( p ) ∈ S ∩ O , nn ∗ ( p ) ∈ S ∩ O (cid:9) .Observe that for any point p in X , nn ( p ) = nn ∗ ( p ) and $ ( p ) = $ ∗ ( p ) ; for any point p in X , one has $ ( p ) ≤ $ ∗ ( p ) ;and for any point p in X , one has $ ( p ) ≥ $ ∗ ( p ) . Costs δ ( p ) and δ ∗ ( p ) are not directly comparable for point p in X . A k -clustering S is C -good for some parameter C ≥ ( X ) ≤ $ ∗ ( X ) + C · $ ∗ ( X ) . Lemma 4.2.
Any C -good clustering S for an α -stable clustering instance ( X , δ , $ ) must be optimal for α ≥ C + . Proof:
Define a perturbed distance function ˜ δ : X × X → (cid:82) ≥ with respect to the given clustering S as follows:˜ δ ( p (cid:48) , p ) : = (cid:168) α · δ ( p (cid:48) , p ) if p (cid:54) = nn ( p (cid:48) ) , δ ( p (cid:48) , p ) otherwise.Note that ˜ δ is not symmetric. Let ˜$ ( · , · ) denote the cost function under the perturbed distance function ˜ δ . Theoptimal clustering under perturbed cost function is the same as the original optimal clustering O by the stabilityassumption. Since nn ( p ) = nn ∗ ( p ) if and only if p ∈ X , the cost of O under the perturbed cost can be written as:˜$ ( X , O ) = α · $ ( X , O ) + α · $ ( X , O ) + α · $ ( X , O ) + $ ( X , O ) .By definition of perturbed distance ˜ δ , ˜$ ( X , S ) = $ ( X , S ) . Now, by the assumption that clustering S is C -good,˜$ ( X , S ) = $ ( X , S ) ≤ $ ( X , O ) + C · $ ( X , O ) ≤ ( C + ) · $ ( X , O ) + $ ( X , O ) + $ ( X , O ) + $ ( X , O ) ≤ ˜$ ( X , O ) ;the last inequality follows by taking α ≥ C +
1. This implies that S is an optimal clustering for ( X , ˜ δ ) , and thus isequal to O . (cid:131) Next, we prove a lower bound on the improvement in the cost of a clustering that is not C -good after performinga 1-swap. Following Arya et al. [ ] , define the set of candidate swaps (cid:83) as follows: For each center i in S ,consider the star Σ i centered at i defined as the collection of pairs Σ i : = { ( i , j ) ∈ S × O | nn ( j ) = i } . Denote center ( j ) to be the center of the star where j belongs; in other words, center ( j ) = i if j belongs to Σ i .For i ∈ S , let O i : = { j ∈ O | center ( j ) = i } be the set of centers of O in star Σ i . If | O i | =
1, then we add theonly pair ( i , j ) ∈ Σ i to the candidate set (cid:83) . Let S ∅ : = { i ∈ S | O i = ∅ } . Let O > contain centers in O that belong toa star of size greater than 1. We pick | O > | pairs from S ∅ × O > such that each point of O > is matched only onceand each point of S ∅ is matched at most twice and add them to (cid:83) ; this is feasible because | S ∅ | ≥ | O > | /
2. Sinceeach center in O belongs to exactly one pair of (cid:83) , |(cid:83) | = k . By construction, if | Σ i | ≥
2, then i does not belong toany candidate swap. See Figure 4.1. Lemma 4.3.
For each point p in X , X , or X , the set of candidate swaps (cid:83) satisfies (cid:88) ( i , j ) ∈(cid:83) ( δ ( p ) − δ (cid:48) ( p )) ≥ δ ( p ) − δ ∗ ( p ) ; (3)0 P. K. Agarwal, H.-C. Chang, K. Munagala, E. Taylor, and E. Welzland for each point p in X , the set of candidate swaps (cid:83) satisfies (cid:88) ( i , j ) ∈(cid:83) ( δ ( p ) − δ (cid:48) ( p )) ≥ ( δ ( p ) − δ ∗ ( p )) − δ ∗ ( p ) , (4) where $ (cid:48) is the cost function on X defined with respect to S (cid:48) : = S − i + j , and δ (cid:48) ( p ) is the distance between p andits nearest neighbor in S (cid:48) . Proof:
For point p in X , both nn ( p ) and nn ∗ ( p ) are in S (cid:48) , so δ (cid:48) ( p ) = δ ( p ) = δ ∗ ( p ) . For point p in X , δ ( p ) ≤ δ ∗ ( p ) ; when nn ( p ) is being swapped out by some in 1-swap S (cid:48) , nn ∗ ( p ) must be in S (cid:48) . For point p in X , δ ( p ) ≥ δ ∗ ( p ) ; center nn ( p ) will never be swapped out by any 1-swap in (cid:83) , so δ (cid:48) ( p ) ≤ δ ( p ) . By construction of (cid:83) , there is exactly one choice of S (cid:48) that swaps nn ∗ ( p ) in; for that particular swap we have δ (cid:48) ( p ) = δ ∗ ( p ) . In allthree cases one has inequality (3). Our final goal is to prove inequality (4). Consider a swap ( i , j ) in (cid:83) . There arethree cases to consider:• j = nn ∗ ( p ) . There is exactly one swap for which j = nn ∗ ( p ) . In this case δ ( p ) ≤ δ ∗ ( p ) , therefore δ ( p ) − δ (cid:48) ( p ) ≥ δ ( p ) − δ ∗ ( p ) .• j (cid:54) = nn ∗ ( p ) and i (cid:54) = nn ( p ) . Since nn ( p ) ∈ S (cid:48) , δ (cid:48) ( p ) ≤ δ ( p ) . Therefore δ ( p ) − δ (cid:48) ( p ) ≥ j (cid:54) = nn ∗ ( p ) and i = nn ( p ) . By construction, there are most two swaps in (cid:83) that may swap out nn ( p ) . We claimthat i (cid:54) = center ( nn ∗ ( p )) . Indeed, if i = center ( nn ∗ ( p )) , then by construction, Σ i = { ( i , nn ∗ ( p )) } because thecenter of star of size greater than one is never added to a candidate swap. But this contradicts the assumptionthat j (cid:54) = nn ∗ ( p ) . The claim implies that center ( nn ∗ ( p )) ∈ S (cid:48) and thus δ (cid:48) ( p ) ≤ δ ( p , center ( nn ∗ ( p ))) . Weobtain a bound on δ ( p , center ( nn ∗ ( p ))) as follows: δ ( p , center ( nn ∗ ( p ))) ≤ δ ( p , nn ∗ ( p )) + δ ( nn ∗ ( p ) , center ( nn ∗ ( p ))) ≤ δ ∗ ( p ) + δ ( nn ∗ ( p ) , nn ( p )) ≤ δ ∗ ( p ) + ( δ ∗ ( p ) + δ ( p )) = δ ( p ) + δ ∗ ( p ) .Therefore, δ ( p ) − δ (cid:48) ( p ) ≥ δ ( p ) − δ ( p , center ( nn ∗ ( p ))) . Putting everything together, we obtain: (cid:88) S (cid:48) ∈(cid:83) ( δ ( p ) − δ (cid:48) ( p )) ≥ ( δ ( p ) − δ ∗ ( p )) + + ( δ ( p ) − δ ( p ) − δ ∗ ( p )) = δ ( p ) − δ ∗ ( p ) . (cid:131) Using Lemma 4.3, we can prove the following.
Lemma 4.4.
Let S be a k -clustering of ( X , δ ) that is not C -good for some fixed constant C > + (cid:34) with arbitrarilysmall (cid:34) > . There is always a -swap S (cid:48) such that $ (cid:48) ( X ) − $ ∗ ( X ) ≤ ( − (cid:34)/ ( + (cid:34) ) k ) · ( $ ( X ) − $ ∗ ( X )) , where $ (cid:48) isthe cost function defined with respect to S (cid:48) . Proof:
By Lemma 4.3 one has $ ( X ) − $ (cid:48) ( X ) ≥ ( $ ( X ) − $ ∗ ( X ) − Ψ ( X )) / k for some 1-swap S (cid:48) and its correspondingcost function $ (cid:48) ( · ) . Since S is not C -good, $ ( X ) − $ ∗ ( X ) > C · $ ∗ ( X ) . Rearranging and plugging the definition of Ψ ( · ) , we have$ (cid:48) ( X ) − $ ∗ ( X ) ≤ $ ( X ) − $ ∗ ( X ) − ( $ ( X ) − $ ∗ ( X ) − Ψ ( X )) / k ≤ $ ( X ) − $ ∗ ( X ) − ( $ ( X ) − $ ∗ ( X ) − · $ ∗ ( X )) / k ≤ $ ( X ) − $ ∗ ( X ) − ( $ ( X ) − $ ∗ ( X ) + ( M − ) · ( $ ( X ) − $ ∗ ( X )) − M · $ ∗ ( X )) / M k ≤ (cid:129) − (cid:34) ( + (cid:34) ) k (cid:139) · ( $ ( X ) − $ ∗ ( X )) ,where the last inequality holds by taking M to be arbitrarily large (say M > + /(cid:34) ). (cid:131) lustering under Perturbation Stability in Near-Linear Time Figure 5.1. L Voronoi diagram V , quadrant decomposition ˜ V , and trapezoid decomposition V (cid:107) . We describe an efficient implementation of each step of the local-search algorithm in this section. By Lemma 2.2,it suffices to implement the algorithm using a polyhedral metric δ N . We show that each step of 1-swap can beimplemented in O ( nk d − polylog n ) time under the assumption that α >
5. We obtain the following:
Theorem 5.1.
Let ( X , δ ) be an α -stable instance of the k -median problem where X ⊂ (cid:82) d and δ is the Euclidean met-ric. For α > , the -swap local search algorithm computes the optimal k -clustering of ( X , δ ) in O ( nk d − polylog n ) time. For simplicity, we present a slightly weaker result for d = L -metric, as it is straightforward toimplement and more intuitive. Using the L -metric requires α > (cid:112)
2. The extension to higher dimensionalEuclidean space using the polyhedral works for α > Voronoi diagram under L norm. First, we fix a point x ∈ X \ S to insert and a center y ∈ S to drop. Define S (cid:48) : = S + x − y . We build the L Voronoi diagram V of S (cid:48) . The cells of V may not be convex, but they are star-shaped : for any c ∈ S (cid:48) and for any point x ∈ Vor ( c ) , the segment c x lies completely in Vor ( c ) . Furthermore, allline segments on the cell boundaries of V must have slopes belonging to one of the four possible values: vertical,horizontal, diagonal, or antidiagonal.Next, decompose each Voronoi cell Vor ( c ) into four quadrants centered at c . Denote the resulting subdivisionof V as ˜ V . We compute a trapezoidal decomposition V (cid:107) of the diagram ˜ V by drawing a vertical segment fromeach vertex of ˜ V in both directions until it meets an edge of V ; V (cid:107) has O ( k ) trapezoids, see Figure 5.1. For eachtrapezoid τ ∈ V (cid:107) , let X τ : = X ∩ τ . The cost of the new clustering S (cid:48) can be computed as $ ( X , S (cid:48) ) = (cid:80) τ ∈ V (cid:107) $ ( X τ , S (cid:48) ) . Range-sum queries.
Now we discuss how to compute $ ( X τ , S (cid:48) ) . Each trapezoid τ in cells Vor ( c ) is associatedwith a vector u ( τ ) ∈ {± } , depending on which of the four quadrants τ belongs to with respect to the axis-parallelsegments drawn passing through the center c of the cell. If τ lies in the top-right quadrant then u ( τ ) = (
1, 1 ) .Similarly if τ lies in the top-left (resp. bottom-left, bottom-right) then u ( τ ) = ( −
1, 1 ) (resp. ( − − ) , ( − ) ).$ ( X τ , S (cid:48) ) = (cid:88) x ∈ X τ (cid:107) x − c (cid:107) = (cid:88) x ∈ X τ 〈 x − c , u ( τ ) 〉 = (cid:88) x ∈ X τ 〈 x , u ( τ ) 〉 − | X τ | · 〈 c , u ( τ ) 〉 . (5)We preprocess X into a data structure that answers the following query:• T RAPEZOID S UM ( τ , u ) : Given a trapezoid τ and a vector u ∈ {± } , return | X ∩ τ | as well as (cid:80) x ∈ X ∩ τ 〈 x , u 〉 .The above query can be viewed as a 3-oriented polygonal range query [ ] . We construct a 3-level rangetree Ψ on X . Omitting the details (which can be found in [ ] ), Ψ can be constructed in O ( n log n ) timeand uses O ( n log n ) space. Each node ξ at the third level of Ψ is associated with a subset X ξ ⊆ X . We store w ( ξ , u ) : = (cid:80) x ∈ X ξ 〈 x , u 〉 for each u ∈ {± } and | X ξ | at ξ . For a trapezoid τ , the query procedure identifies in O ( log n ) time a set Ξ τ of O ( log n ) third-level nodes such that X ∩ τ = ∪ ξ ∈ Ξ τ X ξ and each point of X ∩ τ appearsas exactly one node of Ξ τ . Then (cid:80) x ∈ X τ 〈 x , u 〉 = (cid:80) ξ ∈ Ξ τ w ( ξ , u ) and | X τ | = (cid:80) ξ ∈ Ξ τ | X ξ | .2 P. K. Agarwal, H.-C. Chang, K. Munagala, E. Taylor, and E. Welzl
WAP ( X , S ) : input: Point set X and centers S for each point x ∈ X \ S and center y ∈ S : S (cid:48) ← S + x − yV ← L Voronoi diagram of S (cid:48) ˜ V ← decompose each cell Vor ( c ) into four quadrants centered at cV (cid:107) ← trapezoidal decomposition of ˜ V for each trapezoid τ ∈ V (cid:107) :$ ( X τ , S (cid:48) ) ← T RAPEZOID S UM ( τ , u ( τ )) $ ( X , S (cid:48) ) ← (cid:80) τ ∈ V (cid:107) $ ( X τ , S (cid:48) ) return ( x , y ) with the lowest $ ( X , S + x − y ) Figure 5.2.
Efficient implementation of 1-swap under 1-norm.
With the information stored at the nodes in Ξ τ , T RAPEZOID S UM ( τ , u ) query can be answered in O ( log n ) time.By performing T RAPEZOID S UM ( τ , u ( τ )) query for all τ ∈ V (cid:107) , $ ( X τ , S (cid:48) ) can be computed in O ( k log n ) time since V (cid:107) has a total of O ( k ) trapezoids.We summarize the implementation of 1-swap algorithm in Figure 5.2. The 1-swap procedure considers at most nk different k -clusterings. Therefore we obtain the following. Lemma 5.2.
Let ( X , δ , $ ) be a given clustering instance where δ is the L metric, and let S be a given k -clustering.After O ( n log n ) time preprocessing, we find a k -clustering S (cid:48) : = S + x − y minimizing $ ( X , S (cid:48) ) among all choicesof ( x , y ) in O ( nk log n ) time. The 1-
SWAP algorithm can be extended to higher dimensions using the theory of geometric arrangements [
6, 7, 57 ] .The details are rather technical, so we only sketch the proofs here. As in Section 3.2, instead of working withthe L metric, we work with a polyhedral metric. Let the centrally-symmetric set N ⊆ S d − and the convexpolyhedron Q be defined as in Section 3.2. The set N partitions (cid:82) d into a set of O ( ) polyhedral cones denoted by (cid:67) : = (cid:8) C , . . . , C γ (cid:9) , each with 0 as its apex so that all points (when viewed as vectors) in a cone have the samevector u of N as the nearest neighbor under the cosine distance, i.e. the polyhedral distance δ N ( u ) is realized by u . The total complexity of (cid:67) is O ( γ ) = O ( ) .We show that X can be preprocessed into a data structure so that $ ( X , S ) , the cost of the k -clustering inducedby any k -point subset S of X under δ N can be computed in O ( k d polylog ( n )) time.Let S ⊂ X be a set of k points. We compute the Voronoi diagram VD ( S ) of S under the distance function δ N . More precisely, for a point c ∈ S , let f c : (cid:82) d → (cid:82) ≥ be the function f c ( x ) : = δ N ( c , x ) . The graph of f c is apolyhedral cone in (cid:82) d + whose level set for the value λ is the homothet copy of Q , c + λ Q . Voronoi diagram VD ( S ) is the minimization diagram of function f c over every point c in S ; that is, the projection of the lower envelope f ( x ) : = min c f c ( x ) onto the hyperplane X d + = (cid:82) d ). We further decompose each Voronoi cellVor ( c ) of VD ( S ) by drawing the family of cones in (cid:67) from c ; put it differently, by drawing the cone c + C j for1 ≤ j ≤ k , within the cell Vor ( c ) . Each cell τ in the refined subdivision of Vor ( c ) has the property that for all points x ∈ τ , δ N ( x , c ) is realized by the vector of N —by u j if x ∈ c + C j . Let ˜ V denote the resulting refinement of VD ( S ) .Finally, we compute the vertical decomposition of each cell in ˜ V , which is the extension of the trapezoidaldecomposition to higher dimensions; see [
48, 57 ] for details. Let V (cid:107) denote the resulting convex subdivision of (cid:82) d .It is known that V (cid:107) has O ( k d − ) cells, that it can be computed in O ( k d − ) time, and that each cell of V (cid:107) is convexand bounded by at most 2 d facets, namely it is the intersection of at most 2 d halfspaces. Using the same structureof the distance function δ N , we can show that there is a set U of O ( γ d ) = O ( ) unit vectors such that each facet ofa cell in V (cid:107) is normal to a vector in U , and that U depends only on N and not on S .With these observations at hand, we preprocess X into a data structure as follows: we fix a 2 d -tuple ¯ u : =( u , . . . , u d ) ∈ U d . Let R ¯ u be the set of all convex polyhedra formed by the intersection of at most 2 d halfspaceseach of which is normal to a vector in ¯ u . Using a multi-level range tree (consisting of 2 d levels), we preprocess X in O ( n log d n ) time into a data structure Ψ ¯ u of size O ( n log d − n ) for each ¯ u , so that for a query cell τ ∈ R ¯ u andfor a vector u ∈ N , we can quickly compute the total weight w ( τ , u ) = (cid:80) p ∈ X ∩ τ 〈 p , u 〉 in O ( log d n ) time. lustering under Perturbation Stability in Near-Linear Time S , we compute the cost $ ( X , S ) as follows. We first compute VD ( S ) and V (cid:107) ( S ) . For each cell τ ∈ V (cid:107) ( S ) lying in Vor ( c ) , let u ( τ ) ∈ N be the vector u j such that τ ⊆ c + C j . As in the 2d case,$ ( X , S ) = (cid:88) c (cid:88) p ∈ Vor ( c ) δ N ( p , c ) = (cid:88) c (cid:88) τ ∈ V (cid:107) ( S ) ∩ Vor ( c ) (cid:88) p ∈ X ∩ τ 〈 p − c , u ( τ ) 〉 = (cid:88) c (cid:88) τ ∈ V (cid:107) ( S ) ∩ Vor ( c ) (cid:32) (cid:88) p ∈ X ∩ τ 〈 p , u ( τ ) 〉 − | X ∩ τ | · 〈 c , u ( τ ) 〉 (cid:33) .Fix a cell τ ∈ V (cid:107) ( S ) ∩ Vor ( c ) . Suppose τ ∈ R ¯ u . Then by querying the data structure Ψ ¯ u with τ and u ( τ ) , we cancompute w ( τ , u ) = (cid:80) p ∈ X ∩ τ 〈 p , u ( τ ) 〉 in O ( log d n ) time. Repeating this procedure over all cells of V (cid:107) ( S ) , $ ( X , S ) can be computed in O ( k d − log d n ) time, after an initial preprocessing of O ( n log d n ) time. In this section we provide an alternative way to compute the optimal k -clustering, where the objective can beany of k -center, k -means, or k -median. Here we are aiming for a running time linear in n , but potentially withexponential dependence on k . With such a goal we can further relax the stability requirement using the ideaof coresets . When there is strict separation between clusters (when α ≥ + (cid:112) k -medianover the local search approach, albeit with worse running time dependence on k . Coresets.
In this section we provide an alternative way to compute the optimal $k$-clustering, where the objective can be any of $k$-center, $k$-means, or $k$-median. Here we are aiming for a running time linear in $n$, but potentially with exponential dependence on $k$. With such a goal we can further relax the stability requirement using the idea of coresets. When there is strict separation between clusters (that is, when $\alpha \geq 2 + \sqrt{3}$), this approach improves the stability requirement for $k$-median over the local search approach, albeit with worse running time dependence on $k$.

Coresets. Let $(X, \delta)$ be a clustering instance. The radius of a cluster $X_i$ is the maximum distance between its center and any point in $X_i$. Let $S$ be a given $k$-clustering of $(X, \delta)$, with clusters $X_1, \ldots, X_k$, centers $c_1, \ldots, c_k$, and radii $r_1, \ldots, r_k$, respectively. Let $O$ be the optimal $k$-clustering of $(X, \delta)$, with clusters $X^*_1, \ldots, X^*_k$, centers $c^*_1, \ldots, c^*_k$, and radii $r^*_1, \ldots, r^*_k$, respectively. Let $B(c, r)$ denote the ball centered at $c$ with radius $r$ under $\delta$. A point set $Q \subseteq X$ is a multiplicative $\epsilon$-coreset of $X$ if every $k$-clustering $S$ of $Q$ satisfies
\[
X \subseteq \bigcup_i B\bigl(c_i, (1 + \epsilon) \cdot r_i\bigr).
\]
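Operationally, the coreset condition can be tested against a single witness clustering of $Q$; a genuine multiplicative $\epsilon$-coreset must pass this test for every $k$-clustering of $Q$. A naive checker, written for Euclidean $\delta$ purely as an illustration:

```python
import numpy as np

def covers_X(X, clusters, centers, eps):
    """Check X subset of union_i B(c_i, (1 + eps) * r_i) for ONE
    k-clustering of Q, given by `clusters` (lists of points) and
    `centers`; r_i is the max center-to-point distance in cluster i."""
    radii = [max((np.linalg.norm(q - c) for q in cl), default=0.0)
             for cl, c in zip(clusters, centers)]
    return all(any(np.linalg.norm(x - c) <= (1 + eps) * r
                   for c, r in zip(centers, radii))
               for x in X)
```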
Lemma 6.1. Let $(X, \delta)$ be a $(1 + \epsilon)$-stable clustering instance with optimal $k$-clustering $O$. A multiplicative $\epsilon$-coreset of $X$ contains at least one point from each cluster of $O$.

Proof:
Let $Q$ be a multiplicative $\epsilon$-coreset of $X$. We start by defining a $k$-clustering $S_Q$ of $Q$. For each point $q$ in $Q$, assign $q$ to its cluster in the optimal clustering $O$. This results in some clustering with at most $k$ clusters; insert additional empty clusters to create a valid $k$-clustering $S_Q$ of $Q$.

Assume that $Q$ does not contain any point from some optimal cluster $X^*_i$ of $O$. Consider the center point $c^*_i$ of $X^*_i$. Because $Q$ is a multiplicative $\epsilon$-coreset, $c^*_i$ must be contained in one of the balls obtained by expanding the clusters of $S_Q$ by an $\epsilon$-fraction of their radii. In notation, let the cluster of $S_Q$ whose expansion covers $c^*_i$ be $X_j$, with center $c_j$ and radius $r_j$; then $\delta(c_j, c^*_i) \leq (1 + \epsilon) \cdot r_j$.

Because $S_Q$ is constructed by restricting the optimal clustering $O$ to $Q$, the cluster $X_j$ is a subset of some optimal cluster of $O$: $X_j \subseteq X^*_j$. This implies $r_j \leq r^*_j$. Additionally, $c_j$ and $c^*_i$ lie in different optimal clusters, as $c_j$ is in $Q$ and therefore does not lie in $X^*_i$. So by $(1 + \epsilon)$-center proximity,
\[
\delta(c_j, c^*_i) \;>\; (1 + \epsilon) \cdot \delta(c_j, c^*_j) \;=\; (1 + \epsilon) \cdot r^*_j \;\geq\; (1 + \epsilon) \cdot r_j,
\]
contradicting $\delta(c_j, c^*_i) \leq (1 + \epsilon) \cdot r_j$. Therefore, $Q$ must contain at least one point from each optimal cluster. $\Box$
Algorithm. We first compute a constant-factor approximation to the clustering instance $(X, \delta)$. We then recursively construct a multiplicative coreset of size $O(k!/\epsilon^{dk})$ [4, 44]. By taking $\epsilon = 1$, the coreset has size $O(k!)$, and for $2$-stable instances this coreset $Q$ contains at least one point from every optimal cluster by Lemma 6.1. After obtaining this coreset, we then need to reconstruct the optimal clustering. By [24, Corollary 9], when our instance satisfies strict separation ($\alpha \geq 2 + \sqrt{3}$), the optimal clustering can be recovered from any $k$ points of $X$, one from each optimal cluster. To find $k$ such points, each from a different optimal cluster, we try all possible $k$-subsets of $Q$ as the candidate $k$ centers. For each set of centers we compute its cost (under the polyhedral metric) using the cost computation scheme of Section 5, then take the clustering with minimum cost. Finally, we recompute the optimal centers by running any $1$-clustering algorithm on each cluster of $O$.

It is known that a constant-factor approximation to any of the $k$-means, $k$-median, or $k$-center instances can be computed in $O(nk)$ time [ ] and even in $\tilde{O}(n)$ time [30, 44] in constant-dimensional Euclidean spaces. Thus computing the multiplicative coreset takes $O(nk + k!)$ time [4, 44]. Using the cost computation scheme from Section 5, after $O(n \log^{2d} n)$ preprocessing time, the cost of each candidate clustering can be computed in $\tilde{O}(k^{d-1})$ time, and there are at most $O((k!)^k)$ possible choices of a center set of size $k$.
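In outline, the search itself is a plain enumeration over $k$-subsets of the coreset. In the sketch below, `constant_approx`, `build_coreset`, and `clustering_cost` are hypothetical placeholders for the cited subroutines and for the Section 5 cost structure; only the enumeration logic is meant literally:

```python
from itertools import combinations

def cluster_via_coreset(X, k, constant_approx, build_coreset, clustering_cost):
    """Exhaustive search over candidate center sets drawn from a
    multiplicative coreset Q with eps = 1. For 2-stable instances,
    Lemma 6.1 guarantees Q contains a point of every optimal cluster,
    so some k-subset of Q picks one point per cluster, and strict
    separation then recovers the optimal clustering."""
    approx = constant_approx(X, k)             # O(nk) or ~O(n) time
    Q = build_coreset(X, k, approx, eps=1.0)   # size O(k!)
    best = min(combinations(Q, k),
               key=lambda S: clustering_cost(X, list(S)))
    # Afterwards: assign each point to its nearest center in `best`,
    # then re-center every cluster with a 1-clustering algorithm (omitted).
    return best
```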
We conclude the section with the following theorem.

Theorem 6.2. Let $X$ be a set of $n$ points in $\mathbb{R}^d$ and $k \geq 2$ an integer. If the $k$-means, $k$-median, or $k$-center instance for $X$ under the Euclidean distance is $\alpha$-stable for $\alpha \geq 2 + \sqrt{3}$, then the optimal clustering can be computed in $\tilde{O}(nk + k^{d-1} \cdot (k!)^k)$ time.

References

[1] M. Ackerman and S. Ben-David. Clusterability: A theoretical study. In Proceedings of the 12th International Conference on Artificial Intelligence and Statistics, volume 5 of JMLR Proceedings, pages 1–8, 2009.
[2] P. Afshani, J. Barbay, and T. M. Chan. Instance-optimal geometric algorithms. Journal of the ACM, 64(1):3, 2017.
[3] P. K. Agarwal, J. Erickson, et al. Geometric range searching and its relatives. Contemporary Mathematics, 223:1–56, 1999.
[4] P. K. Agarwal, S. Har-Peled, and K. R. Varadarajan. Geometric approximation via coresets. Combinatorial and Computational Geometry, 52:1–30, 2005.
[5] P. K. Agarwal and J. Matoušek. Ray shooting and parametric search. SIAM Journal on Computing, 22(4):794–806, 1993.
[6] P. K. Agarwal, J. Pach, and M. Sharir. State of the union (of geometric objects). In J. E. Goodman, J. Pach, and R. Pollack, editors, Surveys on Discrete and Computational Geometry: Twenty Years Later, volume 453 of Contemporary Mathematics, pages 9–48. American Mathematical Society, 2008.
[7] P. K. Agarwal and M. Sharir. Arrangements and their applications. In J.-R. Sack and J. Urrutia, editors, Handbook of Computational Geometry, pages 49–119. Elsevier, 2000.
[8] N. Ailon, A. Bhattacharya, R. Jaiswal, and A. Kumar. Approximate clustering with same-cluster queries. In A. R. Karlin, editor, 9th Innovations in Theoretical Computer Science Conference (ITCS 2018), volume 94 of Leibniz International Proceedings in Informatics (LIPIcs), pages 40:1–40:21, Dagstuhl, Germany, 2018. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.
[9] D. Aloise, A. Deshpande, P. Hansen, and P. Popat. NP-hardness of Euclidean sum-of-squares clustering. Machine Learning, 75(2):245–248, 2009.
[10] O. Angel, S. Bubeck, Y. Peres, and F. Wei. Local max-cut in smoothed polynomial time. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 429–437. ACM, 2017.
[11] H. Angelidakis, P. Awasthi, A. Blum, V. Chatziafratis, and C. Dan. Bilu–Linial stability, certified algorithms and the independent set problem. Preprint, October 2018.
[12] H. Angelidakis, K. Makarychev, and Y. Makarychev. Algorithms for stable and perturbation-resilient problems. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 438–451. ACM, 2017.
[13] S. Arora, P. Raghavan, and S. Rao. Approximation schemes for Euclidean k-medians and related problems. In Proceedings of the 30th Annual ACM Symposium on Theory of Computing, pages 106–113, 1998.
[14] V. Arya, N. Garg, R. Khandekar, A. Meyerson, K. Munagala, and V. Pandit. Local search heuristics for k-median and facility location problems. SIAM Journal on Computing, 33(3):544–562, 2004.
[15] H. Ashtiani, S. Kushagra, and S. Ben-David. Clustering with same-cluster queries. In Advances in Neural Information Processing Systems, pages 3216–3224, 2016.
[16] P. Awasthi, A. Blum, and O. Sheffet. Center-based clustering under perturbation stability. Information Processing Letters, 112(1–2):49–54, 2012.
[17] M. Bādoiu, S. Har-Peled, and P. Indyk. Approximate clustering via core-sets. In Proceedings of the 34th Annual ACM Symposium on Theory of Computing, pages 250–257. ACM, 2002.
[18] A. Bakshi and N. Chepurko. Polynomial time algorithm for 2-stable clustering instances. Preprint, July 2016.
[19] M.-F. Balcan, A. Blum, and A. Gupta. Approximate clustering without the approximation. In Proceedings of the 20th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1068–1077. Society for Industrial and Applied Mathematics, 2009.
[20] M.-F. Balcan, A. Blum, and S. Vempala. A discriminative framework for clustering via similarity functions. In Proceedings of the 40th Annual ACM Symposium on Theory of Computing, pages 671–680. ACM, 2008.
[21] M.-F. Balcan and M. Braverman. Nash equilibria in perturbation-stable games. Theory of Computing, 13(1):1–31, 2017.
[22] M.-F. Balcan, N. Haghtalab, and C. White. k-center clustering under perturbation resilience. In I. Chatzigiannakis, M. Mitzenmacher, Y. Rabani, and D. Sangiorgi, editors, 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016), volume 55 of Leibniz International Proceedings in Informatics (LIPIcs), pages 68:1–68:14, Dagstuhl, Germany, 2016. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.
[23] M.-F. Balcan and Y. Liang. Clustering under perturbation resilience. SIAM Journal on Computing, 45(1):102–155, 2016.
[24] S. Ben-David and L. Reyzin. Data stability in clustering: A closer look. Theoretical Computer Science, 558(1):51–61, 2014.
[25] Y. Bilu and N. Linial. Are stable instances easy? Combinatorics, Probability and Computing, 21(5):643–660, 2012.
[26] P. B. Callahan and S. R. Kosaraju. Faster algorithms for some geometric graph problems in higher dimensions. In Proceedings of the 4th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 291–300. Society for Industrial and Applied Mathematics, 1993.
[27] C. Chekuri and S. Gupta. Perturbation resilient clustering for k-center and related problems via LP relaxations. In E. Blais, K. Jansen, J. D. P. Rolim, and D. Steurer, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2018), volume 116 of Leibniz International Proceedings in Informatics (LIPIcs), pages 9:1–9:16, Dagstuhl, Germany, 2018. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.
[28] V. Cohen-Addad. A fast approximation scheme for low-dimensional k-means. In Proceedings of the 29th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '18), pages 430–440. Society for Industrial and Applied Mathematics, 2018.
[29] V. Cohen-Addad, A. de Mesmay, E. Rotenberg, and A. Roytman. The bane of low-dimensionality clustering. In Proceedings of the 29th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '18), pages 441–456. Society for Industrial and Applied Mathematics, 2018.
[30] V. Cohen-Addad, A. E. Feldmann, and D. Saulpic. Near-linear time approximation schemes for clustering in doubling metrics. Preprint, June 2019.
[31] V. Cohen-Addad, A. Gupta, A. Kumar, E. Lee, and J. Li. Tight FPT approximations for k-median and k-means. In C. Baier, I. Chatzigiannakis, P. Flocchini, and S. Leonardi, editors, 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019), volume 132 of Leibniz International Proceedings in Informatics (LIPIcs), pages 42:1–42:14, Dagstuhl, Germany, 2019. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.
[32] V. Cohen-Addad, P. N. Klein, and C. Mathieu. Local search yields approximation schemes for k-means and k-median in Euclidean and minor-free metrics. SIAM Journal on Computing, 48(2):644–667, 2019.
[33] V. Cohen-Addad and C. Schwiegelshohn. On the local structure of stable clustering instances. In Proceedings of the 58th Annual IEEE Symposium on Foundations of Computer Science, pages 49–60. IEEE, 2017.
[34] S. Dasgupta. The hardness of k-means clustering. Technical report, Department of Computer Science and Engineering, University of California, San Diego, 2008.
[35] M. de Berg, M. van Kreveld, M. Overmars, and O. Schwarzkopf. Computational Geometry: Algorithms and Applications. Springer, 1997.
[36] A. Deshpande, A. Louis, and A. Vikram Singh. On Euclidean k-means clustering with α-center proximity. In Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, pages 2087–2095, 2019.
[37] A. Dutta, A. Vijayaraghavan, and A. Wang. Clustering stable instances of Euclidean k-means. In Advances in Neural Information Processing Systems, pages 6500–6509, 2017.
[38] T. Feder and D. Greene. Optimal algorithms for approximate clustering. In Proceedings of the 20th Annual ACM Symposium on Theory of Computing, pages 434–444. ACM, 1988.
[39] D. Feldman, M. Monemizadeh, and C. Sohler. A PTAS for k-means clustering based on weak coresets. In Proceedings of the 23rd Annual Symposium on Computational Geometry, pages 11–18. ACM, 2007.
[40] Z. Friggstad, K. Khodamoradi, and M. R. Salavatipour. Exact algorithms and lower bounds for stable instances of Euclidean k-means. In Proceedings of the 30th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '19), pages 2958–2972. Society for Industrial and Applied Mathematics, 2019.
[41] Z. Friggstad, M. Rezapour, and M. R. Salavatipour. Local search yields a PTAS for k-means in doubling metrics. SIAM Journal on Computing, 48(2):452–480, 2019.
[42] T. F. Gonzalez. Clustering to minimize the maximum intercluster distance. Theoretical Computer Science, 38:293–306, 1985.
[43] S. Har-Peled. No, coreset, no cry. In International Conference on Foundations of Software Technology and Theoretical Computer Science, pages 324–335. Springer, 2004.
[44] S. Har-Peled and S. Mazumdar. On coresets for k-means and k-median clustering. In Proceedings of the 36th Annual ACM Symposium on Theory of Computing, pages 291–300. ACM, 2004.
[45] D. Harel and R. E. Tarjan. Fast algorithms for finding nearest common ancestors. SIAM Journal on Computing, 13(2):338–355, 1984.
[46] D. S. Hochbaum and D. B. Shmoys. A best possible heuristic for the k-center problem. Mathematics of Operations Research, 10(2):180–184, 1985.
[47] K. Jain, M. Mahdian, and A. Saberi. A new greedy approach for facility location problems. In Proceedings of the 34th Annual ACM Symposium on Theory of Computing, pages 731–740. ACM, 2002.
[48] V. Koltun. Almost tight upper bounds for vertical decompositions in four dimensions. Journal of the ACM, 51(5):699–730, 2004.
[49] A. Kumar and R. Kannan. Clustering with spectral norm and the k-means algorithm. In Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer Science, pages 299–308. IEEE, 2010.
[50] A. Kumar, Y. Sabharwal, and S. Sen. A simple linear time (1+ε)-approximation algorithm for k-means clustering in any dimensions. In Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science, pages 454–462. IEEE, 2004.
[51] E. Lee, M. Schmidt, and J. Wright. Improved and simplified inapproximability for k-means. Information Processing Letters, 120:40–43, 2017.
[52] M. Mahajan, P. Nimbhorkar, and K. Varadarajan. The planar k-means problem is NP-hard. Theoretical Computer Science, 442:13–21, 2012.
[53] K. Makarychev, Y. Makarychev, and A. Vijayaraghavan. Bilu–Linial stable instances of max cut and minimum multiway cut. In Proceedings of the 25th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 890–906. SIAM, 2014.
[54] N. Megiddo and K. J. Supowit. On the complexity of some common geometric location problems. SIAM Journal on Computing, 13(1):182–196, 1984.
[55] M. Mihalák, M. Schöngens, R. Šrámek, and P. Widmayer. On the complexity of the metric TSP under stability considerations. In International Conference on Current Trends in Theory and Practice of Computer Science, pages 382–393. Springer, 2011.
[56] R. Ostrovsky, Y. Rabani, L. J. Schulman, and C. Swamy. The effectiveness of Lloyd-type methods for the k-means problem. Journal of the ACM, 59(6):28, 2012.
[57] M. Sharir and P. K. Agarwal. Davenport–Schinzel Sequences and Their Geometric Applications. Cambridge University Press, New York, NY, 1995.
[58] D. D. Sleator and R. E. Tarjan. A data structure for dynamic trees. Journal of Computer and System Sciences, 26(3):362–391, 1983.
[59] C. D. Toth, J. O'Rourke, and J. E. Goodman, editors. Handbook of Discrete and Computational Geometry. Chapman and Hall/CRC, 3rd edition, 2017.
A Appendix
A.1 Stability Properties
Properties of α-center proximity. Let $(X, \delta)$ be a clustering instance satisfying $\alpha$-center proximity, where $\delta$ is a metric and $\alpha > 1$. Let $X_1$ be a cluster with center $c_1$ in an optimal clustering. Let $p, p', p'' \in X_1$ and $q \in X \setminus X_1$, with $c_2$ the center of $q$'s cluster. Then:

(1) $(\alpha - 1) \cdot \delta(p, c_1) < \delta(p, q)$;
(2) $(\alpha - 1) \cdot \delta(p, c_1) < \delta(c_1, c_2)$;
(3) $(\alpha - 1) \cdot \delta(c_1, c_2) < (\alpha + 1) \cdot \delta(p, q)$;
(4) $(\alpha - 1) \cdot \delta(p, p') < \frac{2\alpha}{\alpha - 1} \cdot \delta(p, q)$; in particular, $\delta(p, p') < \delta(p, q)$ for $\alpha \geq 2 + \sqrt{3}$;
(5) $(\alpha - 1) \cdot \delta(p', p'') < \frac{2(\alpha + 1)}{\alpha - 1} \cdot \delta(p, q)$; in particular, $\delta(p', p'') < \delta(p, q)$ for $\alpha \geq 2 + \sqrt{5}$.

Proof: (1) Suppose to the contrary that $\delta(p, q) \leq (\alpha - 1) \cdot \delta(p, c_1)$. Then the following two implications yield a contradiction:
\[
\alpha \cdot \delta(q, c_2) < \delta(q, c_1) \leq \delta(p, c_1) + \delta(p, q) \leq \alpha \cdot \delta(p, c_1) \;\Rightarrow\; \delta(q, c_2) < \delta(p, c_1);
\]
\[
\alpha \cdot \delta(p, c_1) < \delta(p, c_2) \leq \delta(q, c_2) + \delta(p, q) \leq \delta(q, c_2) + (\alpha - 1) \cdot \delta(p, c_1) \;\Rightarrow\; \delta(p, c_1) < \delta(q, c_2).
\]

(2) Follows from $\alpha \cdot \delta(p, c_1) < \delta(p, c_2) \leq \delta(p, c_1) + \delta(c_1, c_2)$.

(3) Follows by applying (1) to $p$, and symmetrically to $q$:
\[
\delta(c_1, c_2) \leq \delta(c_1, p) + \delta(p, q) + \delta(q, c_2) \overset{(1)}{<} \Bigl(\tfrac{2}{\alpha - 1} + 1\Bigr) \cdot \delta(p, q) = \tfrac{\alpha + 1}{\alpha - 1} \cdot \delta(p, q).
\]

(4) Follows by
\[
(\alpha - 1) \cdot \delta(p, p') \leq (\alpha - 1) \cdot \delta(p, c_1) + (\alpha - 1) \cdot \delta(p', c_1) \overset{(1),(2)}{<} \delta(p, q) + \delta(c_1, c_2) \overset{(3)}{<} \delta(p, q) + \tfrac{\alpha + 1}{\alpha - 1} \cdot \delta(p, q) = \tfrac{2\alpha}{\alpha - 1} \cdot \delta(p, q).
\]

(5) Follows by
\[
(\alpha - 1) \cdot \delta(p', p'') \leq (\alpha - 1) \cdot \delta(p', c_1) + (\alpha - 1) \cdot \delta(p'', c_1) \overset{(2)}{<} 2 \cdot \delta(c_1, c_2) \overset{(3)}{<} \tfrac{2(\alpha + 1)}{\alpha - 1} \cdot \delta(p, q). \qquad \Box
\]
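The two thresholds can be checked numerically: $\alpha = 2 + \sqrt{3}$ is exactly where the ratio $2\alpha/(\alpha - 1)^2$ implicit in (4) reaches $1$, and $\alpha = 2 + \sqrt{5}$ is where $2(\alpha + 1)/(\alpha - 1)^2$ from (5) does. A quick verification, assuming nothing beyond the inequalities above:

```python
import math

a4 = 2 + math.sqrt(3)   # threshold for property (4)
a5 = 2 + math.sqrt(5)   # threshold for property (5)

assert abs(2 * a4 / (a4 - 1) ** 2 - 1) < 1e-12        # 2a/(a-1)^2 == 1
assert abs(2 * (a5 + 1) / (a5 - 1) ** 2 - 1) < 1e-12  # 2(a+1)/(a-1)^2 == 1
```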
Proof (of Lemma 2.3): Let $c_1$ and $c_2$ be the centers of the cluster $X_1$ and of $q$'s cluster, respectively.

(i) First we show that $\delta(c_1, c_2) < \frac{\alpha + 1}{\alpha - 1} \cdot \delta(p'', q)$:
\[
\delta(p'', q) \geq \delta(q, c_1) - \delta(p'', c_1) \geq \delta(c_1, c_2) - \delta(q, c_2) - \delta(p'', c_1) > \delta(c_1, c_2) - \bigl(\delta(q, p'') + \delta(p'', q)\bigr)/(\alpha - 1),
\]
where the last step bounds $\delta(q, c_2)$ and $\delta(p'', c_1)$ using property (1). Rearranging the inequality proves the claim.

Now $\alpha \cdot \delta(p, c_1) < \delta(p, c_2) \leq \delta(p, c_1) + \delta(c_1, c_2)$, which implies that $(\alpha - 1) \cdot \delta(p, c_1) < \delta(c_1, c_2)$. It follows that
\[
\delta(p, p') \leq \delta(p, c_1) + \delta(p', c_1) < 2 \cdot \delta(c_1, c_2)/(\alpha - 1) < \tfrac{2(\alpha + 1)}{(\alpha - 1)^2} \cdot \delta(p'', q) \leq \delta(p'', q),
\]
where the last inequality holds when $\alpha \geq 2 + \sqrt{5}$.
(ii) Without loss of generality, assume $\delta(c_1, c_2) = 1$. We know that $\delta(p, c_2) > \alpha \cdot \delta(p, c_1)$ for all $p \in X_1$. The set of points $p$ with $\delta(p, c_2) = \alpha \cdot \delta(p, c_1)$ is known as an Apollonian circle $A_1$ (with $c_1$ inside, but not centered at $c_1$!); see Fig. A.1. The cluster $X_1$ must be contained inside this circle $A_1$, or sphere in higher dimensions. Similarly, there is a sphere $A_2$ enclosing $q$'s cluster (relative to $X_1$).

We take the classical fact that these sets are circles as given, but we want to understand the involved parameters. Of course, the circle $A_1$ has to be centered on the line $\ell$ through $c_1$ and $c_2$. Let $a$ and $b$ be the intersections of $A_1$ with $\ell$, with $b$ on the segment $c_1 c_2$. Then $\delta(c_2, b) = \alpha \cdot \delta(c_1, b) = \alpha \cdot (1 - \delta(c_2, b))$, hence $\delta(c_2, b) = \frac{\alpha}{\alpha + 1}$ and $\delta(c_1, b) = \frac{1}{\alpha + 1}$. Similarly, $\delta(c_2, a) = \alpha \cdot \delta(c_1, a) = 1 + \delta(c_1, a)$, hence $\delta(c_1, a) = \frac{1}{\alpha - 1}$. This sets the diameter of $A_1$ to $\frac{1}{\alpha + 1} + \frac{1}{\alpha - 1} = \frac{2\alpha}{\alpha^2 - 1}$, and the distance between $A_1$ and $A_2$ to $1 - \frac{2}{\alpha + 1} = \frac{\alpha - 1}{\alpha + 1}$. It follows that
\[
\delta(p', p'')/\delta(p''', q) \;<\; \frac{2\alpha}{\alpha^2 - 1} \Big/ \frac{\alpha - 1}{\alpha + 1} \;=\; \frac{2\alpha}{(\alpha - 1)^2}. \qquad \Box
\]
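The parameters in the Apollonius argument are easy to reproduce numerically (with $\delta(c_1, c_2)$ normalized to $1$); at $\alpha = 2 + \sqrt{3}$ the diameter-to-gap ratio is exactly $1$:

```python
import math

def apollonius_ratio(a):
    diameter = 1 / (a + 1) + 1 / (a - 1)  # diameter of A_1: 2a/(a^2 - 1)
    gap = 1 - 2 / (a + 1)                 # distance A_1 to A_2: (a-1)/(a+1)
    return diameter / gap                 # = 2a/(a-1)^2

assert abs(apollonius_ratio(2 + math.sqrt(3)) - 1) < 1e-12
```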
Figure A.1. The Apollonian circles (with parameter $\alpha$) for clusters centered at $c_1$ and $c_2$.