Minimum Enclosing Ball Revisited: Stability and Sub-linear Time Algorithms
MMinimum Enclosing Ball Revisited: Stability, Sub-linear TimeAlgorithms, and Extension
Hu Ding
School of Computer Science and Engineering, University of Science and Technology of ChinaHe Fei, China [email protected]
Abstract.
In this paper, we revisit the Minimum Enclosing Ball (MEB) problem and its robustversion, MEB with outliers, in Euclidean space R d . Though the problem has been extensivelystudied before, most of the existing algorithms need at least linear time (in the number of inputpoints n and the dimensionality d ) to achieve a (1 + (cid:15) )-approximation. Motivated by some recentdevelopments on beyond worst-case analysis, we introduce the notion of stability for MEB (withoutliers), which is natural and easy to understand. Under the stability assumption, we present twosampling algorithms for computing approximate MEB with sample complexities independent of thenumber of input points; further, we achieve the first sub-linear time single-criterion approximationalgorithm for the MEB with outliers problem. Our result can be viewed as a new step along thedirection of beyond worst-case analysis. We also show that our ideas can be extended to be moregeneral techniques, a novel uniform-adaptive sampling method and a sandwich lemma, for solvingthe general case of MEB with outliers ( i.e., without the stability assumption) and the problem of k -center clustering with outliers. We achieve sub-linear time bi-criteria approximation algorithmsfor these problems respectively; the algorithms have sample sizes independent of the number ofpoints n and the dimensionality d , which significantly improve the time complexities of existingalgorithms. We expect that our technique will be applicable to design sub-linear time algorithms forother shape fitting with outliers problems. a r X i v : . [ c s . C G ] J u l Introduction
Given a set P of n points in Euclidean space R d , where d could be quite high, the problem of Minimum Enclosing Ball (MEB) is to find a ball with minimum radius to cover all the points in P [8, 33, 50]. MEB is a fundamental problem in computational geometry and finds applicationsin many fields such as machine learning and data mining. For example, one of the most popularclassification models, Support Vector Machine (SVM) , can be formulated as an MEB problemin high dimensional space, and fast MEB algorithms can be adopted to speed up its trainingprocedure [22, 23, 67]. Recently, MEB has also been used for preserving privacy [32, 59] andquantum cryptography [39].In real world applications, we often need to assume the presence of outliers in given datasets.MEB with outliers is a natural generalization of the MEB problem, where the goal is to find theminimum ball covering at least a certain fraction or number of input points; for example, theball may be required to cover at least 90% of the points and leave the remaining 10% of pointsas outliers. The existence of outliers makes the problem not only non-convex but also highlycombinatorial; the high dimensionality of the problem further increases its challenge.The MEB (with outliers) problem has been extensively studied before (a detailed discussionon previous works is given in Section 1.2). However, almost all of them need at least linear time(in terms of n and d ) to obtain a (1 + (cid:15) )-approximation. This is not quite ideal, especially inbig data where the size of the dataset could be so large that we cannot even afford to read thewhole dataset once. This motivates us to ask the following question: is it possible to developapproximation algorithms for MEB (with outliers) that run in sub-linear time in the input size?Designing sub-linear time algorithms has become a promising approach to handle many big dataproblems and has attracted a great deal of attentions in the past decades [24, 64]. Our idea for designing sub-linear time MEB (with outliers) algorithms is inspired by some recentdevelopments on optimization with respect to stable instances, under the umbrella of beyondworst-case analysis [63]. Many NP-hard optimization problems have shown to be challengingeven for approximation, but admit efficient solutions in practice. Several recent works tried toexplain this phenomenon and introduced the notion of stability for problems like clusteringand max-cut [6, 14, 15, 60]. In this paper, we give the notion of “stability” for MEB. Roughlyspeaking, an instance of MEB is stable, if the radius of the resulting ball cannot be significantlyreduced by removing a small fraction of the input points ( e.g., the radius cannot be reduced by10% if only 1% of the points are removed). The rationale behind this notion is quite natural: ifthe given instance is not stable, the small fraction of points causing significant reduction in theradius should be viewed as outliers (or they may need to be covered by additional balls, e.g., k -center clustering [37, 44]). To the best of our knowledge, this is the first study on MEB (withoutliers) from the perspective of stability. Our main contribution contains the following threeparts. (1) We prove an important implication of the stability assumption that is useful not onlyfor designing sub-linear time MEB (with outliers) algorithms, but also for handling incompletedatasets (Section 3). 
Using this implication, we propose two sampling algorithms for computingapproximate MEB with sub-linear time complexities (Section 4); in particular, our secondalgorithm has the sample size ( i.e., the number of sampled points) independent of the inputsize n and dimensionality d . The approximation ratios of both algorithms are in the form ofsome function f ( (cid:15), α ); lim (cid:15),α → f ( (cid:15), α ) = 1, where (cid:15) is a small error caused in the computationand α is a parameter for measuring the stability (the instance is more stable if α is smaller).We further consider the MEB with outliers problem of stable instances, and obtain a sub-lineartime single-criterion algorithm in Section 5.Note that if we arbitrarily select a point from the input dataset, it will be the center ofa 2-approximate MEB by the triangle inequality. However, it is challenging to determine the1adius of the ball in sub-linear time. In some applications, only estimating the position of theball center may not be sufficient, and a ball covering all the given points is thus needed. In thispaper, we aim to determine not only the center of the ball, but also its radius, in sub-linear time. (2) In Section 6, we consider the general case of MEB with outliers ( i.e., without thestability assumption). We modify our previous ideas for stable instances, and propose two generaltechniques, a novel “uniform-adaptive sampling” method and a “sandwich” lemma. By usingthese techniques, we obtain a sub-linear time bi-criteria approximation algorithm, where the“bi-criteria” means that the ball is allowed to exclude a little more points than the pre-specifiednumber of outliers. Our result is the first sub-linear time approximation algorithm for MEB withoutliers with sample size independent of the number of points n and the dimensionality d , whichsignificantly improves the time complexities of existing algorithms. (3) Finally, we observe that our uniform-adaptive sampling method and sandwich lemmacan be used to solve the general k -center clustering with outliers problem, where the goal isto find k balls to cover at least a certain fraction of input points and minimize the maximumradius of the balls (note that the MEB with outliers problem is just the case of k = 1). Similarto our result for MEB with outliers, in Section 7 we present the first sub-linear time bi-criteriaapproximation algorithm for k -center clustering with outliers with sample size independent ofthe number of points n and the dimensionality d . Moreover, we expect that our uniform-adaptivesampling method and sandwich lemma will be applicable to design sub-linear time algorithmsfor other shape fitting with outliers problems, e.g., flat and polytope fitting [26, 41, 42, 61]. The works most related to ours are [5, 23]. Alon et al. [5] studied the following property testingproblem: given a set of n points in some metric space, determine whether the instance is( k, b )-clusterable, where an instance is called ( k, b )-clusterable if it can be covered by k ballswith radius (or diameter) b >
0. They proposed several sampling algorithms to answer thequestion “approximately”. Particularly, they distinguish between the case that the instanceis ( k, b )-clusterable and the case that it is (cid:15) -far away from ( k, b (cid:48) )-clusterable, where (cid:15) ∈ (0 , b (cid:48) ≥ b . “ (cid:15) -far” means that more than (cid:15)n points should be removed so that it becomes( k, b (cid:48) )-clusterable. Note that their method cannot yield a single-criterion approximation algorithmfor MEB or k -center clustering (with outliers), since it will introduce an unavoidable error onthe number of covered points due to the relaxation of “ (cid:15) -far”. However, it is possible to convertit into bi-criteria approximation algorithms for MEB and k -center clustering with outliers (asdefined in Section 2); but its sample size depends on the dimensionality d (similar results werealso presented in [29, 45]). Our bi-criteria approximation algorithms presented in Section 6 and 7have the sample sizes independent of both n and d . Note that Alon et al. showed in [5] anotherproperty testing algorithm with sample size independent of d for testing ( k, b )-clusterable, but itis challenging to be used to solve the problems of MEB and k -center clustering with outliers(their algorithm relies on the property of minimum enclosing ball, but the ball is mixed withoutliers in our case).Clarkson et al. [23] developed an elegant perceptron framework for solving several optimizationproblems arising in machine learning, such as MEB. For a set of n points in R d represented as an n × d matrix with M non-zero entries, their framework can solve the MEB problem in ˜ O ( n(cid:15) + d(cid:15) ) time. Note that the parameter “ (cid:15) ” is an additive error ( i.e., the resulting radius is r + (cid:15) if r isthe radius of the optimal MEB) which can be converted into a relative error ( i.e., (1 + (cid:15) ) r ) in O ( M ) preprocessing time. Thus, if M = o ( nd ), the running time is still sub-linear in the inputsize nd . Our algorithms have different sub-linear time complexities which are independent of thenumber of input points. The asymptotic notation ˜ O ( f ) = O (cid:0) f · polylog ( nd(cid:15) ) (cid:1) . EB and k -center clustering (with outliers). A core-set [1] is a small set of points thatapproximates the structure/shape of a much larger point set, and thus can be used to significantlyreduce the time complexities for many optimization problems (the reader is referred to a recentsurvey [62] for more details on core-sets). The core-set idea has also been used to approximatethe MEB problem in high dimensional space [10, 50]. B˘adoiu and Clarkson [8] showed that it ispossible to find a core-set of size (cid:100) /(cid:15) (cid:101) that yields a (1 + (cid:15) )-approximate MEB ; later, they [9]further proved that actually only (cid:100) /(cid:15) (cid:101) points are sufficient, but their core-set construction ismore complicated. In fact, the algorithm for computing the core-set of MEB is a Frank-Wolfe style algorithm [34], which has been systematically studied by Clarkson [22]. There are alsoseveral exact and approximation algorithms for MEB that do not rely on core-sets [4, 33, 61, 65].Most of these algorithms have linear time complexities. Agarwal and Sharathkumar [2] presenteda streaming ( √ + (cid:15) )-approximation algorithm for MEB; later, Chan and Pathak [19] provedthat the same algorithm has an approximation ratio less than 1 . et al. 
[10] extended their core-set idea to the problems of MEB and k -center clusteringwith outliers, and achieved linear time bi-criteria approximation algorithms (if k is assumed to bea constant). Recently, Ding et al. [29] provided a linear time tri-criteria approximation algorithm(it outputs more than k clusters) for the k -center clustering with outliers problem, where the ideais based on the well-known Gonzalez’s algorithm for ordinary k -center clustering [37]. Severalalgorithms for the low dimensional MEB with outliers problem have also been developed [3, 30,40,54]. A 3-approximation algorithm for k -center clustering with outliers in arbitrary metrics wasproposed by Charikar et al. [20]; recently, Chakrabarty et al. [18] proposed a 2-approximationalgorithm for k -center clustering with outliers. These algorithms often have high time complexities( e.g., Ω ( n d )). There are a number of existing works on streaming and distributed MEB and k -center clustering with outliers, such as [17, 21, 38, 51, 53, 55, 69]. Optimizations under stability.
Bilu and Linial [15] showed that the Max-Cut problembecomes easier if the given instance is stable with respect to perturbation on edge weights.Ostrovsky et al. [60] proposed a separation condition for k -means clustering which refers to thescenario where the clustering cost of k -means is significantly lower than that of ( k − et al. [14] introduced the concept of approximation-stability forfinding the ground-truth of k -median and k -means clustering. Awasthi et al. [6] introducedanother notion of clustering stability and gave a PTAS for k -median and k -means clustering.More algorithms on clustering problems under stability assumption were studied in [7, 11–13, 49]. Sub-linear time algorithms.
Indyk presented sub-linear time algorithms for several metricspace problems, such as k -median clustering [46] and 2-clustering [47]. More sub-linear timeclustering algorithms have been studied in [25,56,57]. Another important motivation for designingsub-linear time algorithms is property testing. For example, Goldreich et al. [36] focused onusing small sample to test some natural graph properties. More detailed discussion on sub-lineartime algorithms can be found in the survey papers [24, 64]. In this paper, we let | A | denote the number of points of a given point set A in R d , and || x − y || denote the Euclidean distance between two points x and y in R d . We use B ( c, r ) to denote theball centered at a point c with radius r >
0. Below, we first give the definitions of MEB and theproperty of stability.
Definition 1 (Minimum Enclosing Ball (MEB)).
Given a set P of n points in R d , theMEB problem is to find a ball with minimum radius to cover all the points in P . The resultingball and its radius are denoted by M EB ( P ) and Rad ( P ) , respectively. A ball B ( c, r ) is called a λ -approximation of M EB ( P ) for some λ ≥
1, if the ball covers allpoints in P and has radius r ≤ λRad ( P ). 3 efinition 2 (( α , β )-stable). Given a set P of n points in R d with two small parameters α and β in (0 , , P is an ( α , β )-stable instance if Rad ( P (cid:48) ) ≥ (1 − α ) Rad ( P ) for any P (cid:48) ⊂ P with | P (cid:48) | ≥ (1 − β ) n . Intuitively, the property of stability indicates that
Rad ( P ) cannot be significantly reducedafter removing any small fraction of points from P . For a fixed β , the smaller α is, the morestable P becomes. Actually, our stability assumption is quite reasonable in practice. For example,if the radius of MEB can be reduced considerably (say by 10%) after removing only a smallfraction (say 1%) of points, it is natural to view the small fraction of points as outliers. Anotherintuition of stability is shown in Section 9, which says that if the distribution of P is denseenough and β is fixed, α will tend to 0 as d increases. Moreover, the stability property impliesthat the MEB of a stable instance stays stable in the space, even a small fraction of points aremissed (we prove this implication in Section 3). Definition 3 (MEB with Outliers).
Given a set P of n points in R d and a small parameter γ ∈ (0 , , the MEB with outliers problem is to find the smallest ball that covers (1 − γ ) n points.Namely, the task is to find a subset of P with size (1 − γ ) n such that the resulting MEB is thesmallest among all possible choices of the subset. The obtained ball is denoted by M EB ( P, γ ) . For convenience, we use P opt to denote the optimal subset of P with respect to M EB ( P, γ ).That is, P opt = arg Q min (cid:110) Rad ( Q ) | Q ⊂ P, | Q | = (1 − γ ) n (cid:111) . From Definition 3, we can seethat the main issue is to determine the subset of P . Actually, solving such combinatorialproblems involving outliers are often challenging. For example, Mount et al. [58] showed that anyapproximation for linear regression with n points and γn outliers requires Ω (cid:0) ( γn ) d (cid:1) time underthe assumption of the hardness of affine degeneracy [31]; they then turned to find an efficientbi-criteria approximation algorithm instead. Similarly, we also design a bi-criteria approximationfor the general case of the MEB with outliers problem. Definition 4 (Bi-criteria Approximation).
Given an instance ( P, γ ) for MEB with outliersand two small parameters < (cid:15), δ < , a (1 + (cid:15), δ ) -approximation of ( P, γ ) is a ball thatcovers at least (cid:0) − (1 + δ ) γ (cid:1) n points and has radius at most (1 + (cid:15) ) Rad ( P opt ) . When both (cid:15) and δ are small, the bi-criteria approximation is very close to the optimal solutionwith only slight changes on the number of covered points and the radius.We also extend the stability property of MEB to MEB with outliers. Definition 5 (( α , β )-stable for MEB with Outliers). Given an instance ( P, γ ) of theMEB with outliers problem in Definition 3, ( P, γ ) is an ( α , β )-stable instance if Rad ( P (cid:48) ) ≥ (1 − α ) Rad ( P opt ) for any P (cid:48) ⊂ P with | P (cid:48) | ≥ (cid:0) − γ − β (cid:1) n . Definition 5 directly implies the following claim.
Claim 1.
If (
P, γ ) is an ( α , β )-stable instance of the problem of MEB with outliers, thecorresponding P opt is an ( α , β − γ )-stable instance of MEB.To see the correctness of Claim 1, we can use contradiction. Suppose that there exists asubset P (cid:48) ⊂ P opt such that | P (cid:48) | ≥ (1 − β − γ ) | P opt | = (1 − γ − β ) n and Rad ( P (cid:48) ) < (1 − α ) Rad ( P opt ).Then, it is in contradiction to the fact that ( P, γ ) is an ( α, β )-stable instance of MEB withoutliers.
Before presenting our main results, we first revisit the core-set construction algorithm for MEBby B˘adoiu and Clarkson [8], since their method will be used in our algorithms for MEB (withoutliers). 4et 0 < (cid:15) <
1. The algorithm of B˘adoiu and Clarkson [8] yields an MEB core-set of size2 /(cid:15) (for convenience, we always assume that 2 /(cid:15) is an integer). However, there is a small issuein their paper. The analysis assumes that the exact MEB of the core-set is computed in eachiteration, but instead one may only compute an approximate MEB. Thus, an immediate questionis whether the quality is still guaranteed with such a change. Kumar et al. [50] fixed this issue,and showed that computing a (1 + O ( (cid:15) ))-approximate MEB for the core-set in each iterationstill guarantees a core-set with size O (1 /(cid:15) ), where the hidden constant is >
80. Increasing thecore-set size from 2 /(cid:15) to 80 /(cid:15) is neglectable in asymptotic analysis. But in Section 6, we will showthat it could cause serious issues if outliers exist. Hence, a core-set of size 2 /(cid:15) is still desirable.For this purpose, we will provide a new analysis below.For the sake of completeness, we first briefly introduce the idea of the core-set constructionalgorithm in [8]. Given a point set Q ⊂ R d , the algorithm is a simple iterative procedure. Initially,it selects an arbitrary point from Q and places it into an initially empty set T . In each ofthe following 2 /(cid:15) iterations, the algorithm updates the center of M EB ( T ) and adds to T thefarthest point from the current center of M ES ( T ). Finally, the center of M EB ( T ) induces a(1 + (cid:15) )-approximation for M EB ( Q ). The selected set of 2 /(cid:15) points ( i.e. , T ) is called the core-setof MEB. To ensure the expected improvement in each iteration, [8] showed that the followingtwo inequalities hold if the algorithm always selects the farthest point to the current center of M EB ( T ): r i +1 ≥ (1 + (cid:15) ) Rad ( Q ) − L i ; r i +1 ≥ (cid:113) r i + L i , (1)where r i and r i +1 are the radii of M EB ( T ) in the i -th and ( i + 1)-th iterations, respectively,and L i is the shifting distance of the center of M EB ( T ) from the i -th to ( i + 1)-th iteration.Fig. 1: An illustration of (2).As mentioned earlier, we often compute only an approx-imate M EB ( T ) in each iteration. In i -th iteration, we let c i and o i denote the centers of the exact and the approximate M EB ( T ), respectively. Suppose that || c i − o i || ≤ ξr i , where ξ ∈ (0 , (cid:15) (cid:15) ) (we will see why this bound is needed later). Notethat we only compute o i rather than c i in each iteration. Asa consequence, we can only select the farthest point (say q )to o i . If || q − o i || ≤ (1 + (cid:15) ) Rad ( Q ), we are done and a (1 + (cid:15) )-approximation of MEB is already obtained. Otherwise, we have(1 + (cid:15) ) Rad ( Q ) < || q − o i || ≤ || q − c i +1 || + || c i +1 − c i || + || c i − o i || ≤ r i +1 + L i + ξr i (2)by the triangle inequality (see Figure 1). In other words, we should replace the first inequalityof (1) by r i +1 > (1 + (cid:15) ) Rad ( Q ) − L i − ξr i . Also, the second inequality of (1) still holds since itdepends only on the property of the exact MEB (see Lemma 2.1 in [8]). Thus, we have r i +1 ≥ max (cid:110)(cid:113) r i + L i , (1 + (cid:15) ) Rad ( Q ) − L i − ξr i (cid:111) . (3)This leads to the following theorem whose proof can be found in Section 10. Theorem 1.
In the core-set construction algorithm of [8], if one computes an approximate MEBfor T in each iteration and the resulting center o i has the distance to c i less than ξr i = s (cid:15) (cid:15) r i for some s ∈ (0 , , the final core-set size is bounded by z = − s ) (cid:15) . Also, the bound could bearbitrarily close to /(cid:15) when s is small enough.Remark 1. We want to emphasize a simple observation on the above core-set constructionprocedure, which will be used in our algorithms and analysis later on. The above core-setconstruction algorithm always selects the farthest point to o i in each iteration. However, this5s actually not necessary. As long as the selected point has distance at least (1 + (cid:15) ) Rad ( Q ),the inequality (2) always holds and the following analysis is still true. If no such a point exists( i.e., Q \ B (cid:0) o i , (1 + (cid:15) ) Rad ( Q ) (cid:1) = ∅ ), a (1 + (cid:15) )-approximate MEB ( i.e., B (cid:0) o i , (1 + (cid:15) ) Rad ( Q ) (cid:1) ) hasalready been obtained. Fig. 2: We expand B (˜ o, r ), andthe larger ball is an approxi-mate MEB of P .In this section, we show an important implication of the stabilityproperty described in Definition 2. Theorem 2.
Let P be an ( α , β )-stable instance of the MEBproblem, and o be the center of its MEB. Let (cid:15) ∈ [0 , and ˜ o bea given point in R d . If the ball B (cid:0) ˜ o, r (cid:1) covers at least (1 − β ) n points from P and r ≤ (1 + (cid:15) ) Rad ( P ) , the following holds || ˜ o − o || < (cid:0) √ (cid:15) + 2 √ α (cid:1) Rad ( P ) . (4)Theorem 2 indicates that if a ball covers a large enoughsubset of P and its radius is bounded, its center should be closeto the center of M EB ( P ). Furthermore, the more stable theinstance P is ( i.e., α is smaller), the closer the two centers are.Actually, besides using it to design our sub-linear time MEB algorithms later, Theorem 2 is alsouseful in other practical scenarios. For example, if we miss βn points from P , we can computea (1 + (cid:15) )-approximate MEB of the remaining (1 − β ) n points, denoted by B (˜ o, r ) the obtainedball. Since the ball is a (1 + (cid:15) )-approximate MEB of a subset of P , we have r ≤ (1 + (cid:15) ) Rad ( P ).Moreover, due to Definition 2, we know r ≥ (1 − α ) Rad ( P ). Together with Theorem 2, we have P ⊂ (cid:124)(cid:123)(cid:122)(cid:125) by (4) B (cid:16) ˜ o, (cid:0) √ (cid:15) + 2 √ α (cid:1) Rad ( P ) (cid:17) ⊂ (cid:124)(cid:123)(cid:122)(cid:125) by r ≥ (1 − α ) Rad ( P ) B (cid:16) ˜ o, √ (cid:15) + 2 √ α − α r (cid:17) (5)and the radius √ (cid:15) +2 √ α − α r ≤ √ (cid:15) + 2 √ α − α (1 + (cid:15) ) Rad ( P ) = 1 + O ( √ (cid:15) ) + 2(1 + (cid:15) ) √ α − α Rad ( P ) . (6)That is, the ball B (cid:16) ˜ o, √ (cid:15) +2 √ α − α r (cid:17) is a O ( √ (cid:15) )+2(1+ (cid:15) ) √ α − α -approximate MEB of P (see Figure 2).Note that we cannot directly use B (cid:16) ˜ o, (cid:0) √ (cid:15) + 2 √ α (cid:1) Rad ( P ) (cid:17) since we do not know thevalue of Rad ( P ). Based on the above analysis, even if we have βn missed points, we are still ableto compute an approximate MEB of P . But this approach has a time complexity of Ω ((1 − β ) nd ).In Section 4, we will present sub-linear time algorithms for MEB.Now, we prove Theorem 2. Let P (cid:48) = B (cid:0) ˜ o, r (cid:1) ∩ P . To bound the distance between ˜ o and o ,we need to bridge them by the ball M EB ( P (cid:48) ). Let o (cid:48) be the center of M EB ( P (cid:48) ). The followingare two key lemmas to the proof. Lemma 1.
The distance || o (cid:48) − o || ≤ √ α − α Rad ( P ) .Proof. We consider two cases:
M EB ( P (cid:48) ) is totally covered by M EB ( P ) and otherwise. For thefirst case (see Figure 3a), it is easy to see that || o (cid:48) − o || ≤ Rad ( P ) − (1 − α ) Rad ( P ) = αRad ( P ) < (cid:112) α − α Rad ( P ) , (7)where the first inequality comes from the fact that M EB ( P (cid:48) ) has radius at least (1 − α ) Rad ( P )(Definition 2), and the last inequality comes from the fact that α <
1. Thus, we can focus on thesecond case below.Let a be any point located on the intersection of the two spheres of M EB ( P (cid:48) ) and M EB ( P ).Consequently, we have the following claim. 6 a) (b) (c) (d) Fig. 3: (a) The case
M EB ( P (cid:48) ) ⊂ M EB ( P ); (b) an illustration of Claim 2; (c) the angle ∠ ao (cid:48) o ≥ π/
2; (d) an illustration of Lemma 2.
Claim 2.
The angle ∠ ao (cid:48) o ≥ π/ Proof.
Suppose that ∠ ao (cid:48) o < π/
2. Note that ∠ aoo (cid:48) is always smaller than π/ || o − a || = Rad ( P ) ≥ Rad ( P (cid:48) ) = || o (cid:48) − a || . Therefore, o and o (cid:48) are separated by the hyperplane H that isorthogonal to the segment o (cid:48) o and passing through the point a . See Figure 3b.Now we show that P (cid:48) can be covered by a ball smaller than M EB ( P (cid:48) ). Let o H be the point H ∩ o (cid:48) o , and t (resp., t (cid:48) ) be the point collinear with o and o (cid:48) on the right side of the sphere of M EB ( P (cid:48) ) (resp., left side of the sphere of M EB ( P ); see Figure 3b). Then, we have || t − o H || + || o H − o (cid:48) || = || t − o (cid:48) || = || a − o (cid:48) || < || o (cid:48) − o H || + || o H − a || = ⇒ || t − o H || < || o H − a || . (8)Similarly, we have || t (cid:48) − o H || < || o H − a || . Consequently, M EB ( P ) ∩ M EB ( P (cid:48) ) is coveredby the ball B ( o H , || o H − a || ). Further, because P (cid:48) is covered by M EB ( P ) ∩ M EB ( P (cid:48) ) and || o H − a || < || o (cid:48) − a || = Rad ( P (cid:48) ), P (cid:48) is covered by the ball B ( o H , || o H − a || ) that is smaller than M EB ( P (cid:48) ). This contradicts to the fact that M EB ( P (cid:48) ) is the minimum enclosing ball of P (cid:48) .Thus, the claim ∠ ao (cid:48) o ≥ π/ (cid:117)(cid:116) Given Claim 2, we know that || o (cid:48) − o || ≤ (cid:113)(cid:0) Rad ( P ) (cid:1) − (cid:0) Rad ( P (cid:48) ) (cid:1) . See Figure 3c. Moreover,Definition 2 implies that Rad ( P (cid:48) ) ≥ (1 − α ) Rad ( P ). Therefore, we have || o (cid:48) − o || ≤ (cid:113)(cid:0) Rad ( P ) (cid:1) − (cid:0) (1 − α ) Rad ( P ) (cid:1) = (cid:112) α − α Rad ( P ) . (9) (cid:117)(cid:116) Lemma 2.
The distance || ˜ o − o (cid:48) || ≤ √ (cid:15) + (cid:15) + 2 α − α Rad ( P ) .Proof. Let L be the hyperplane orthogonal to the segment ˜ oo (cid:48) and passing through the center o (cid:48) .Suppose ˜ o is located on the left side of L . Then, there exists a point b ∈ P (cid:48) located on the rightclosed semi-sphere of M EB ( P (cid:48) ) divided by L (this result was proved in [10, 35] and see Lemma2.2 in [10]; for completeness, we also state the lemma in Section 11). See Figure 3d. That is, theangle ∠ bo (cid:48) ˜ o ≥ π/
2. As a consequence, we have || ˜ o − o (cid:48) || ≤ (cid:112) || ˜ o − b || − || b − o (cid:48) || . (10)Moreover, since || ˜ o − b || ≤ r ≤ (1 + (cid:15) ) Rad ( P ) and || b − o (cid:48) || = Rad ( P (cid:48) ) ≥ (1 − α ) Rad ( P ), (10)implies that || ˜ o − o (cid:48) || ≤ (cid:112) (1 + (cid:15) ) − (1 − α ) Rad ( P ) = √ (cid:15) + (cid:15) + 2 α − α Rad ( P ). (cid:117)(cid:116)
7y triangle inequality and Lemmas 1 and 2, we immediately have || ˜ o − o || ≤ || ˜ o − o (cid:48) || + || o (cid:48) − o ||≤ (cid:0)(cid:112) (cid:15) + (cid:15) + 2 α − α + (cid:112) α − α (cid:1) Rad ( P ) < (cid:0) √ (cid:15) + 2 α + √ α (cid:1) Rad ( P ) < (cid:0) √ (cid:15) + 2 √ α (cid:1) Rad ( P ) . (11)This completes the proof of Theorem 2. Using Theorem 2, we present two different sub-linear time sampling algorithms for computingMEB. The first one is simpler, but has a sample size depending on the dimensionality d , whilethe second one has a sample size independent of both n and d . Algorithm 1 is based on the theory of VC dimension and (cid:15) -net [43, 68]. Roughly speaking, wecompute an approximate MEB of a small random sample ( i.e., B ( c, r )), and expand the ballslightly; then we prove that this expanded ball is an approximate MEB of the whole data set.The key idea is to show that B ( c, r ) covers at least (1 − β ) n points and therefore c is close to theoptimal center by Theorem 2. Due to space limit, we leave the proof of Theorem 3 to Section 12. Algorithm 1
MEB Algorithm I
Input:
An ( α , β )-stable instance P of MEB problem in R d ; a small parameter (cid:15) > S of Θ ( dβ log dβ ) points from P .2: Apply any approximate MEB algorithm (such as the core-set based algorithm [8]) to compute a (1 + (cid:15) )-approximate MEB of S , and let the resulting ball be B ( c, r ).3: Output the ball B (cid:0) c, √ (cid:15) +2 √ α − α r (cid:1) . Theorem 3.
With constant probability, Algorithm 1 returns a λ -approximate MEB of P , where λ = 1 + O ( √ (cid:15) ) + 2(1 + (cid:15) ) √ α − α and lim (cid:15),α → λ = 1 . (12) The running time is O (cid:0) d (cid:15)β log dβ + d(cid:15) (cid:1) . If the dimensionality d is too high, the random projection based technique Johnson-Lindenstrauss (JL) transform [27] can be used to approximately preserve the radius of enclosingball [48, 66]. However, it is not very useful for reducing the time complexity of Algorithm 1. If weapply JL-transform on the sampled Θ ( dβ log dβ ) points in Step 1, the JL-transform step itself willtake Ω ( d β log dβ ) time (our second algorithm in Section 4.2 has the time complexity linear in d ). To better understand the second sampling algorithm, we briefly overview our idea below.8ig. 4: An illustration ofLemma 3; the red points arethe set Q of sampled points. High level idea:
Recall our remark below Theorem 1 in Sec-tion 2.1. If we know the value of (1 + (cid:15) ) Rad ( P ), we can performalmost the same core-set construction procedure described inTheorem 1 to achieve an approximate center of M EB ( P ),where the only difference is that we add a point with distanceat least (1 + (cid:15) ) Rad ( P ) to o i in each iteration. In this way, weavoid selecting the farthest point to o i , since this operationwill inevitably have a linear time complexity. To implementour strategy in sub-linear time, we need to determine the valueof (1 + (cid:15) ) Rad ( P ) first. We use Lemma 3 to estimate the rangeof Rad ( P ), and then perform a binary search on the rangeto determine the value of (1 + (cid:15) ) Rad ( P ) approximately. Based on the stability property, weobserve that the core-set construction procedure can serve as an “oracle” to help us guess thevalue of (1 + (cid:15) ) Rad ( P ) (see Algorithm 2). Let h > h to o i in each iteration. We prove that the procedure cannot continue morethan z iterations if h ≥ (1 + (cid:15) ) Rad ( P ), and will continue more than z iterations with constantprobability if h < (1 − α ) Rad ( P ), where z = − s ) (cid:15) is the size of core-set described in Theorem 1.Also, during the procedure of core-set construction, we add the points to the core-set via randomsampling, rather than a deterministic way. As a consequence, we obtain our second sub-lineartime algorithm and the final result is presented in Theorem 5. Lemma 3.
Let P be an ( α , β )-stable instance of MEB problem. Given a parameter η ∈ (0 , ,one selects an arbitrary point p ∈ P and takes a random sample Q ⊂ P with | Q | = β log η . Let p be the point farthest to p from Q . Then, with probability − η , Rad ( P ) ∈ [ 12 || p − p || , − α || p − p || ] . (13) Proof.
First, the lower bound of
Rad ( P ) is obvious since || p − p || is always no larger than2 Rad ( P ). Then, we consider the upper bound. Let B ( p , l ) be the ball covering exactly (1 − β ) n points of P , and thus l ≥ (1 − α ) Rad ( P ) according to Definition 2. To complete our proof, wealso need the following folklore lemma presented in [28]. Lemma 4. [28] Let N be a set of elements, and N (cid:48) be a subset of N with size | N (cid:48) | = β | N | for some β ∈ (0 , . If one randomly samples ln 1 /η ln 1 / (1 − β ) ≤ β ln η elements from N , then withprobability at least − η , the sample contains at least one element of N (cid:48) for any η ∈ (0 , . In Lemma 4, let N and N (cid:48) be the point set P and the subset P \ B ( p , l ), respectively. We knowthat Q contains at least one point from N (cid:48) according to Lemma 4. Namely, Q contains at least onepoint outside B ( p , l ). See Figure 4. As a consequence, we have || p − p || ≥ l ≥ (1 − α ) Rad ( P ), i.e. , Rad ( P ) ≤ − α || p − p || . (cid:117)(cid:116) Note that Lemma 3 directly implies the following result.
Theorem 4.
In Lemma 3, the ball B ( p , − α || p − p || ) is a − α -approximate MEB of P , withprobability − η .Proof. From the upper bound in Lemma 3, we know that − α || p − p || ≥ Rad ( P ). It impliesthat the ball B ( p , − α || p − p || ) covers the whole point set P . From the lower bound in Lemma 3,we know that − α || p − p || ≤ − α Rad ( P ). Therefore, it is a − α -approximate MEB of P . (cid:117)(cid:116) Since | Q | = β log η in Lemma 3, Theorem 4 indicates that we can easily obtain a − α -approximate MEB of P in O ( β (log η ) d ) time. We further show our second sampling algorithm(Algorithm 3) that achieves a lower approximation ratio. Algorithm 2 serves as a subroutine inAlgorithm 3. In Algorithm 2, we simply set z = (cid:15) with s = 1 / o i having distance less than s (cid:15) (cid:15) Rad ( T ) to the center of M EB ( T ) in Step 2(1).9 lgorithm 2 Oracle on (1 + (cid:15) ) Rad ( P ) Input:
An ( α , β )-stable instance P of MEB problem in R d ; two small parameters (cid:15) and η ∈ (0 , h >
0, and apositive integer z = (cid:15) .1: Initially, arbitrarily select a point p ∈ P and let T = { p } .2: i = 1; repeat the following steps:(1) Compute an approximate MEB of T and let the ball center be o i .(2) Randomly select a subset Q ⊂ P with | Q | = β log zη .(3) Select the point q ∈ Q that is farthest to o i , and add it to T .(4) If || q − o i || < h , stop the loop and output “yes”.(5) i = i + 1; if i > z , stop the loop and output “no”. Lemma 5. If h ≥ (1 + (cid:15) ) Rad ( P ) , Algorithm 2 returns “yes”; else if h < (1 − α ) Rad ( P ) ,Algorithm 2 returns “no” with probability at least − η .Proof. First, we assume that h ≥ (1 + (cid:15) ) Rad ( P ). Recall the remark following Theorem 1. If wealways add a point q with distance at least h ≥ (1 + (cid:15) ) Rad ( P ) to o i , the loop 2(1)-(5) cannotcontinue more than z iterations, i.e. , Algorithm 2 will return “yes”.Now, we consider the case h < (1 − α ) Rad ( P ). Similar to the proof of Lemma 3, we considerthe ball B ( o i , l ) covering exactly (1 − β ) n points of P . We know that l ≥ (1 − α ) Rad ( P ) > h according to Definition 2. Also, with probability 1 − η/z , the sample Q contains at least one pointoutside B ( o i , l ) from Lemma 4. By taking the union bound, with probability (1 − η/z ) z ≥ − η , || q − o i || is always larger than h and Algorithm 2 will return “no”. (cid:117)(cid:116) Algorithm 3
MEB Algorithm II
Input:
An ( α , β )-stable instance P of MEB problem in R d ; two small parameters (cid:15) and η ∈ (0 ,
1) and a positiveinteger z = (cid:15) ; the interval [ a, b ] for Rad ( P ) obtained by Lemma 3.1: Among the set { (1 − α ) a, (1 + (cid:15) )(1 − α ) a, · · · , (1 + (cid:15) ) w (1 − α ) a = (1 + (cid:15) ) b } where w = (cid:100) log (cid:15) − α ) (cid:101) + 1 = O ( (cid:15) log − α ), perform binary search for the value h by using Algorithm 2 with η = η w .2: Suppose that Algorithm 2 returns “no” when h = (1+ (cid:15) ) i (1 − α ) a and returns “yes” when h = (1+ (cid:15) ) i +1 (1 − α ) a .3: Run Algorithm 2 again with h = (1 + (cid:15) ) i +2 a and η = η /
2; let ˜ o be the resulting ball center of T when theloop stops.4: Return the ball B (˜ o, r ), where r = (cid:114) α + O ( (cid:15) )1 − α +2 √ α (cid:15) h . Theorem 5.
With probability − η , Algorithm 3 returns a λ -approximate MEB of P , where λ = (1 + x )(1 + x )1 + (cid:15) with x = α + O ( (cid:15) )1 − α , x = (cid:114) α + O ( (cid:15) )1 − α + 2 √ α, (14) and lim (cid:15),α → λ = 1 . The running time is ˜ O (cid:0) ( (cid:15)β + (cid:15) ) d (cid:1) , where ˜ O ( f ) = O ( f · polylog ( (cid:15) , − α , η )) .Proof. Since Algorithm 2 returns “no” when h = (1 + (cid:15) ) i (1 − α ) a and returns “yes” when h = (1 + (cid:15) ) i +1 (1 − α ) a , we know that(1 + (cid:15) ) i (1 − α ) a < (1 + (cid:15) ) Rad ( P ); (15)(1 + (cid:15) ) i +1 (1 − α ) a ≥ (1 − α ) Rad ( P ) , (16)from Lemma 5. The above inequalities together imply that(1 + (cid:15) ) − α Rad ( P ) > (1 + (cid:15) ) i +2 a ≥ (1 + (cid:15) ) Rad ( P ) . (17)10hus, when running Algorithm 2 with h = (1 + (cid:15) ) i +2 a in Step 3, the algorithm returns “yes”(by the right hand-side of (17)). Then, consider the ball B (˜ o, h ). We claim that | P \ B (˜ o, h ) | < βn .Otherwise, the sample Q contains at least one point outside B (˜ o, h ) with probability 1 − η/z in Step 2(2) of Algorithm 2, i.e., the loop will continue. Thus, it contradicts to the fact thatthe algorithm returns “yes”. Let P (cid:48) = P ∩ B (˜ o, h ), and then | P (cid:48) | > (1 − β ) n . Moreover, the lefthand-side of (17) indicates that h = (1 + (cid:15) ) i +2 a ≤ (1 + α + O ( (cid:15) )1 − α ) Rad ( P ) . (18)Now, we can apply Theorem 2, where the only difference is that we replace the “ (cid:15) ” by “ α + O ( (cid:15) )1 − α ”in the theorem. Let o be the center of M EB ( P ). Consequently, we have || ˜ o − o || ≤ (cid:0)(cid:114) α + O ( (cid:15) )1 − α + 2 √ α (cid:1) Rad ( P ) . (19)For simplicity, we let x = α + O ( (cid:15) )1 − α and x = (cid:113) α + O ( (cid:15) )1 − α + 2 √ α . Hence, h ≤ (1 + x ) Rad ( P )and || ˜ o − o || ≤ x Rad ( P ) via (18) and (19). From (19), we know that P ⊂ B (˜ o, (1 + x ) Rad ( P )).From the right hand-side of (17), we know that (1 + x ) Rad ( P ) ≤ x (cid:15) h . Thus, we have P ⊂ B (cid:16) ˜ o, x (cid:15) h (cid:17) , (20)where x (cid:15) h = (cid:113) α + O ( (cid:15) )1 − α +2 √ α (cid:15) h . Also, the radius1 + x (cid:15) h ≤ (1 + x )(1 + x )1 + (cid:15) Rad ( P ) = λRad ( P ) . (21)This means that B (cid:16) ˜ o, x (cid:15) h (cid:17) is a λ -approximate MEB of P with lim (cid:15),α → λ = 1. Success probability.
The success probability of Algorithm 2 is 1 − η . In Algorithm 3, weset η = η w in Step 1 and η = η / − η w ) log w (1 − η / > − η . Running time.
As the subroutine, Algorithm 2 runs in O ( z ( β (log zη ) d + (cid:15) d )) time; Al-gorithm 3 calls the subroutine O (cid:0) log( (cid:15) log − α ) (cid:1) times. Note that z = O ( (cid:15) ). Thus, the totalrunning time is ˜ O (cid:0) ( (cid:15)β + (cid:15) ) d (cid:1) . (cid:117)(cid:116) The result in this section is an extension of Theorem 4, but needs a more complicated analysis.A key step is to estimate the range of
Rad ( P opt ). In Lemma 3, we can estimate the range via asimple sampling procedure. However, this idea cannot be applied to the case with outliers, sincethe farthest sampled point p could be an outlier. We briefly introduce our idea below. High level idea:
To estimate the range of
Rad ( P opt ), we imagine two balls centered at p (recall in the proof of Lemma 3, we only consider one ball B ( p , l ) as in Figure 4) with twoappropriate radii (see Figure 5). Intuitively, these two balls guarantee a large enough gap suchthat there exists at least one sampled point, say p , falling in the ring between the two spheres.Moreover, together with the stability property described in Definition 5, we can show that thedistance || p − p || provides a range of Rad ( P opt ) in Lemma 6. We also extend this idea to be a“sandwich” lemma (Lemma 10) for designing our sub-linear time algorithm for general instancesin Section 6. 11 emma 6. Let ( P, γ ) be an ( α, β ) -stable instance of MEB with outliers, and p be a pointrandomly selected from P . Let Q be a random sample from P with size | Q | = O (cid:0) max { β , γ } × (2 γ + β ) β log η (cid:1) for a given η ∈ (0 , . Then, if p is the t -th farthest point to p in Q , where t = γ +2 β γ + β γ | Q | + 1 , the following holds with probability (1 − η )(1 − γ ) , Rad ( P opt ) ∈ [ 12 || p − p || , − α || p − p || ] . (22)Fig. 5: An illustration of Lemma 6. Proof.
First, we assume that p ∈ P opt (note that this happens with probability 1 − γ ). Weconsider two balls B ( p , l ) and B ( p , l (cid:48) ) such that | P ∩ B ( p , l ) | = (1 − γ − β ) n ; (23) | P ∩ B ( p , l (cid:48) ) | = (1 − γ ) n. (24)That is, B ( p , l (cid:48) ) contains βn more points than B ( p , l ) from P (see Figure 5). Further, we definetwo subsets A = P \ B ( p , l (cid:48) ) and B = P ∩ ( B ( p , l (cid:48) ) \ B ( p , l )). Therefore, | A | = γn and | B | = βn .Now, suppose that we randomly sample m points Q from P , where the value of m will bedetermined later. Let { x i | ≤ i ≤ m } be m independent random variables with x i = 1 if the i -th sampled point belongs to A , and x i = 0 otherwise. Thus, E [ x i ] = γ for each i . Let σ be asmall parameter in (0 , Pr (cid:16) (cid:80) mi =1 x i / ∈ (1 ± σ ) γm (cid:17) ≤ e − O ( σ mγ ) . That is, Pr (cid:16) | Q ∩ A | ∈ (1 ± σ ) γm (cid:17) ≥ − e − O ( σ mγ ) . (25)Similarly, we have Pr (cid:16) | Q ∩ B | ∈ (1 ± σ ) βm (cid:17) ≥ − e − O ( σ mβ ) . (26)Consequently, if m = O (max { β , γ } × σ log η ), with probability (1 − η ) > − η , we have | Q ∩ A | ∈ (1 ± σ ) γm and | Q ∩ B | ∈ (1 ± σ ) βm. (27)Therefore, if we rank the points of Q by their distances to p decreasingly, we know that at mostthe top (1 + σ ) γm points belong to A , and at least the top (1 − σ )( γ + β ) m points belong to A ∪ B . To ensure (1 + σ ) γm < (1 − σ )( γ + β ) m ( i.e. , there is a gap between (1 + σ ) γm and(1 − σ )( γ + β ) m ), we need to set σ < β γ + β ( e.g. , we can set σ = β γ + β ). Then, we pick the t -thfarthest point to p from Q , where t = (1 + σ ) γm + 1, and denote it as p . As a consequence, p ∈ B with probability 1 − η .Suppose p ∈ B (see Figure 5). From Definition 5, we directly have || p − p || ≥ l ≥ (1 − α ) Rad ( P opt ) . (28)12o obtain the upper bound of || p − p || , we consider two cases: A ∩ P opt = ∅ and A ∩ P opt (cid:54) = ∅ .For the former case, since A = P \ B ( p , l (cid:48) ) and | A | = γn , we know that the whole P opt is coveredby B ( p , l (cid:48) ) and all the points of P \ P opt are outside of B ( p , l (cid:48) ). Since p ∈ B ⊂ B ( p , l (cid:48) ), wehave p ∈ P opt . It implies that || p − p || ≤ Rad ( P opt ). For the latter case, let p ∈ A ∩ P opt (seeFigure 5). Then we have || p − p || ≤ l (cid:48) ≤ || p − p || ≤ Rad ( P opt ) . (29)Thus, we have || p − p || ≤ Rad ( P opt ) for both cases.Overall, p ∈ P opt with probability 1 − γ and p ∈ B with probability 1 − η , and thus Rad ( P opt ) ∈ [ || p − p || , − α || p − p || ] with probability (1 − η )(1 − γ ). We set σ = β γ + β , andthe sample size | Q | = m = O (max { β , γ } × σ log η ) = O (cid:0) max { β , γ } × (2 γ + β ) β log η (cid:1) . (cid:117)(cid:116) Similar to Theorem 4, we can obtain an approximate solution of MEB with outliers viaLemma 6. In the proof of Lemma 6, we assume p ∈ P opt , and thus P opt ⊂ B ( p , Rad ( P opt )).Moreover, since Rad ( P opt ) ∈ [ || p − p || , − α || p − p || ], we know that B ( p , Rad ( P opt )) ⊂ B ( p , − α || p − p || ) and − α || p − p || ≤ − α Rad ( P opt ). Note that p can be selected from thesample Q in linear time O ( | Q | d ) by the algorithm in [16]. Thus, we have the following result. Theorem 6.
In Lemma 6, the ball B ( p , − α || p − p || ) is a − α -approximation of the instance ( P, γ ) , with probability (1 − η )(1 − γ ) . The running time for obtaining the ball is O (cid:0) max { β , γ } × (2 γ + β ) β (log η ) d (cid:1) . We consider the general case of MEB with outliers and present (1 + (cid:15), δ )-approximationalgorithms in this section. Recall the remark following Theorem 1. As long as the selected point has a distance to thecenter of
M EB ( T ) larger than (1 + (cid:15) ) times the optimal radius, the expected improvementwill always be guaranteed. Following this observation, we investigate the following approach.Suppose we run the core-set construction procedure decribed in Theorem 1. In the i -th step,we add an arbitrary point from P opt \ B ( o i , (1 + (cid:15) ) Rad ( P opt )) to T where o i is the approximatecenter of T . We know that a (1 + (cid:15) )-approximation is obtained after at most − s ) (cid:15) steps, thatis, P opt ⊂ B (cid:0) o i , (1 + (cid:15) ) Rad ( P opt ) (cid:1) for some i ≤ − s ) (cid:15) .However, we need to solve two key issues in order to implement the above approach: (i) how to determine the value of Rad ( P opt ) and (ii) how to correctly select a point from P opt \ B ( o i , (1 + (cid:15) ) Rad ( P opt )). Actually, we can implicitly avoid the first issue via replacing(1 + (cid:15) ) Rad ( P opt ) by the t -th largest distance from the points of P to o i , where t is to bedetermined later in our following analysis. For the second issue, we randomly select one pointfrom the farthest t points of P to o i , and show that it belongs to P opt \ B ( o i , (1 + (cid:15) ) Rad ( P opt ))with high probability.Based on the above idea, we present a linear time (1 + (cid:15), δ )-approximation algorithm inAlgorithm 4 in Sections 6.2. Note that B˘adoiu et al. [10] also achieved a bi-criteria approximationalgorithm but with a higher complexity (see more details in our analysis on the running timeat the end of Sections 6.2). More importantly, we focus on improving the running time ofAlgorithm 4 to be sub-linear in this section. For this purpose, we need to avoid computing thefarthest t points to o i , since this operation will take linear time. Also, Algorithm 4 generates aset of candidates for the solution and we need to select the best one. This process also costslinear time. 13ur idea is inspired by our second sampling algorithm for computing the MEB of stableinstance (Algorithm 2 in Section 4.2), where it takes a random sample first and selects thefarthest point from the sample (see step 2(2)-(3)). We generalize this idea to be a “two level”sampling procedure: we randomly take a small sample (say A ) first; then adaptively sample onepoint from A , rather than directly select the farthest point as Algorithm 2, and add it to thecoreset. We call this procedure as “ Uniform-Adaptive Sampling ”. Another challenge is howto select the best candidate in sub-linear time. We modify our previous idea used in Lemma 6 ofSection 5, and propose a “
Sandwich Lemma ” to estimate the radius of each candidate. Wepresent our sub-linear time (1 + (cid:15), δ )-approximation algorithm that has the sample complexityindependent of n and d , in Section 6.3.As mentioned in Section 1.2, it is also possible to obtain a sub-linear time bi-critera approxima-tion by using uniform sampling [5, 29, 45]. But the sample size will depend on the dimensionality d , which is roughly O (cid:0) δ γ kd · polylog ( kdδγ ) (cid:1) , to guarantee a (1+ (cid:15), δ )-approximation for k -centerclustering with outliers ( k = 1 if considering MEB with outliers). The other property testingalgorithm presented in [5], which has the sample size independent of d , is challenging to beused to solve MEB with outliers problem as the algorithm relies on the property of minimumenclosing ball, but the ball M EB ( P opt ) is mixed with outliers in our case. (1 + (cid:15), δ )-approximation Algorithm for MEB with Outliers Input:
A point set P with n points in R d , the fraction of outliers γ ∈ (0 , < (cid:15), δ < z ∈ Z + .1: Let t = (1 + δ ) γn .2: Initially, randomly select a point p ∈ P and let T = { p } .3: i = 1; repeat the following steps until i > z :(1) Compute the approximate MEB center o i of T .(2) Let Q be the set of farthest t points from P to o i ; denote by l i the ( t + 1)-th largest distance from P to o i .(3) Randomly select a point q ∈ Q , and add it to T .(4) i = i + 1.4: Output the ball B ( o ˆ i , l ˆ i ) where ˆ i = arg i min { l i | ≤ i ≤ z } . In this section, we present our linear time (1 + (cid:15), δ )-approximation algorithm for MEBwith outliers (see Algorithm 4). In Step 3(1), we compute the approximate center o i with adistance to the exact one less than ξr v = s (cid:15) (cid:15) Rad ( T ), where s ∈ (0 ,
1) as described in Theorem 1(we will determine the value of s in our following analysis on the running time). The followingtheorem shows the success probability of Algorithm 4. Theorem 7. If z = − s ) (cid:15) , then with probability (1 − γ )( δ δ ) z , Algorithm 4 outputs a (1+ (cid:15), δ ) -approximation for the MEB with outliers problem. Before proving Theorem 7, we present the following two lemmas first.
Lemma 7.
With probability (1 − γ )( δ δ ) z , the set T ⊂ P opt in Algorithm 4.Proof. Initially, because | P opt | / | P | = 1 − γ , the first selected point in Step 2 belongs to P opt with probability 1 − γ . In each of the z rounds in Step 3, the selected point belongs to P opt with14robability δ δ , since | P opt ∩ Q || Q | = 1 − | Q \ P opt || Q |≥ − | P \ P opt || Q | = 1 − γn (1 + δ ) γn = δ δ . (30)Therefore, T ⊂ P opt with probability (1 − γ )( δ δ ) z . (cid:117)(cid:116) For convenience, denote by c i and r i the exact center and radius of M EB ( T ) respectively inthe i -th round of Step 3 of Algorithm 4. Lemma 8.
For each round of Step 3, at least one of the following two events happens: (1) o i isthe ball center of a (1 + (cid:15), δ ) -approximation; (2) r i +1 > (1 + (cid:15) ) Rad ( P opt ) − || c i − c i +1 || − ξr i .Proof. If l i ≤ (1 + (cid:15) ) Rad ( P opt ), then we are done. That is, B ( o i , l i ) covers (1 − (1 + δ ) γ ) n pointsand l i ≤ (1 + (cid:15) ) Rad ( P opt ). Otherwise, l i > (1 + (cid:15) ) Rad ( P opt ) and we consider the second event.Let q be the point added to T in the i -th round. Using the triangle inequality, we have || o i − q || ≤ || o i − c i || + || c i − c i +1 || + | c i +1 − q || ≤ ξr i + || c i − c i +1 || + r i +1 . (31)Since l i > (1 + (cid:15) ) Rad ( P opt ) and q lies outside of B ( o i , l i ), i.e, || o i − q || ≥ l i > (1 + (cid:15) ) Rad ( P opt ),(31) implies that the second event happens and the proof is completed. (cid:117)(cid:116) Suppose that the first event of Lemma 8 never happens. As a consequence, we obtain a seriesof inequalities for each pair of radii r i +1 and r i , i.e., r i +1 > (1 + (cid:15) ) Rad ( P opt ) − || c i − c i +1 || − ξr i .Assume that T ⊂ P opt in Lemma 7, i.e., each time the algorithm correctly adds a point from P opt to T . Using the almost identical idea for proving Theorem 1 in Section 2.1, we know that a(1 + (cid:15) )-approximate MEB of P opt is obtained after at most z rounds. The success probabilitydirectly comes from Lemma 7. Overall, we obtain Theorem 7. Moreover, Theorem 7 directlyimplies the following corollary. Corollary 1.
If one repeatedly runs Algorithm 4 O ( − γ (1 + δ ) z ) times, with constant probability,the algorithm outputs a (1 + (cid:15), δ ) -approximation for the problem of MEB with outliers. Running time.
In Theorem 7, we set z = − s ) (cid:15) and s ∈ (0 , z small, accordingto Theorem 1, we set s = (cid:15) (cid:15) so that z = (cid:15) + 1 (only larger than the lower bound (cid:15) by 1). Foreach round of Step 3, we need to compute an approximate center o i that has a distance to theexact one less than ξr i = s (cid:15) (cid:15) r i = O ( (cid:15) ) r i . Using the proposed algorithm in [8], this can be donein O ( ξ | T | d ) = O ( (cid:15) d ) time. Also, the set Q can be obtained in linear time by the algorithmin [16]. In total, the time complexity for obtaining a (1 + (cid:15), δ )-approximation in Corollary 1 is O (cid:0) C(cid:15) ( n + 1 (cid:15) ) d (cid:1) , (32)where C = O ( − γ (1 + δ ) (cid:15) +1 ). As mentioned before, B˘adoiu et al. [10] also achieved a linear timebi-criteria approximation. However, the hidden constant of their running time is exponential on O ( (cid:15)µ ) (where µ is defined in [10], and should be δγ to ensure a (1 + (cid:15), δ )-approximation)that is much larger than (cid:15) + 1. 15 .3 Improvement on Running Time In this section, we show that the running time of Algorithm 4 can be further improved to beindependent of the number of points n . First, we observe that it is not necessary to compute theset Q of the farthest t points in Step 3(2) of the algorithm. Actually, as long as the selected point q ∈ P opt ∩ Q in Step 3(3), a (1 + (cid:15), δ )-approximation is still guaranteed. Our new algorithmrelies on the following two key lemmas. In Lemma 9, we show that it is possible to obtain apoint q ∈ P opt ∩ Q via a novel uniform-adaptive sampling procedure. In Lemma 10, we show thatthe radius of each candidate solution can be estimated via random sampling. Overall, we achievea sub-linear time algorithm and the final result is presented in Theorem 8 and Corollary 2. Lemma 9 (Uniform-Adaptive Sampling).
Let η ∈ (0 , . In Step 3(2) of Algorithm 4, ifwe randomly select n (cid:48) = O ( δγ log η ) points from P and let Q (cid:48) be the set of farthest (1 + δ ) γn (cid:48) points to o i from the sample, then, with probability at least − η , the following holds (cid:12)(cid:12)(cid:12) Q (cid:48) ∩ (cid:0) P opt ∩ Q (cid:1)(cid:12)(cid:12)(cid:12) | Q (cid:48) | ≥ δ δ ) . (33)Roughly speaking, we can take a random sample first ( i.e., the uniform sampling step), and thenrandomly select a point from the top (1 + δ ) γn (cid:48) sampled points ( i.e., the adaptive samplingstep); according to Lemma 9, with probability at least (1 − η ) δ δ ) , the selected point belongsto P opt ∩ Q . This strategy can help us avoid computing the set Q that costs Ω ( nd ). Proof (of Lemma 9).
Let A denote the set of sampled n (cid:48) points from P . First, we know that | Q | = t = (1 + δ ) γn and | P opt ∩ Q | ≥ δγn (since there are at most γn outliers in Q ). Consequently,since n (cid:48) = O ( δγ log η ), we can apply the Chernoff bound and the same idea for proving (27)(let σ < /
2) to obtain: (cid:12)(cid:12)(cid:12) A ∩ (cid:0) P opt ∩ Q (cid:1)(cid:12)(cid:12)(cid:12) > δγn (cid:48) and (cid:12)(cid:12)(cid:12) A ∩ Q (cid:12)(cid:12)(cid:12) <
32 (1 + δ ) γn (cid:48) (34)with probability 1 − η . Note that Q contains all the t points having distance larger than l i inStep 3(2), thus A ∩ Q = { p ∈ A | || p − o i || > l i } . (35)Also, since Q (cid:48) is the set of the farthest (1 + δ ) γn (cid:48) points to o i from A , there exists some l (cid:48) i > Q (cid:48) = { p ∈ A | || p − o i || > l (cid:48) i } . (36)(35) and (36) imply that either ( A ∩ Q ) ⊆ Q (cid:48) or Q (cid:48) ⊆ ( A ∩ Q ). Since (cid:12)(cid:12) A ∩ Q (cid:12)(cid:12) < (1 + δ ) γn (cid:48) and | Q (cid:48) | = (1 + δ ) γn (cid:48) , we know (cid:16) A ∩ Q (cid:17) ⊆ Q (cid:48) . Therefore, (cid:16) A ∩ (cid:0) P opt ∩ Q (cid:1)(cid:17) = (cid:16) P opt ∩ (cid:0) A ∩ Q (cid:1)(cid:17) ⊆ Q (cid:48) . (37)Obviously, (cid:16) A ∩ (cid:0) P opt ∩ Q (cid:1)(cid:17) ⊆ (cid:0) P opt ∩ Q (cid:1) . (38)The above (37) and (38) together imply (cid:16) A ∩ (cid:0) P opt ∩ Q (cid:1)(cid:17) ⊆ (cid:16) Q (cid:48) ∩ (cid:0) P opt ∩ Q (cid:1)(cid:17) . (39)16oreover, since Q (cid:48) ⊆ A , we have (cid:16) Q (cid:48) ∩ (cid:0) P opt ∩ Q (cid:1)(cid:17) ⊆ (cid:16) A ∩ (cid:0) P opt ∩ Q (cid:1)(cid:17) . (40)Consequently, (39) and (40) together imply Q (cid:48) ∩ (cid:0) P opt ∩ Q (cid:1) = A ∩ (cid:0) P opt ∩ Q (cid:1) and hence (cid:12)(cid:12)(cid:12) Q (cid:48) ∩ (cid:0) P opt ∩ Q (cid:1)(cid:12)(cid:12)(cid:12) | Q (cid:48) | = (cid:12)(cid:12)(cid:12) A ∩ (cid:0) P opt ∩ Q (cid:1)(cid:12)(cid:12)(cid:12) | Q (cid:48) |≥ δ δ ) , (41)where the final inequality comes from the first inequality of (34) and the fact | Q (cid:48) | = (1 + δ ) γn (cid:48) . (cid:117)(cid:116) Another place needs modification in Algorithm 4 is the computation of l i in Step 3(2), since itcosts at least linear time. In fact, the set { o , o , · · · , o z } can be viewed as a set of candidates ofthe ball center. For each candidate o i , we need to estimate the value of l i . Lemma 10 (Sandwich Lemma).
Let $\eta_2 \in (0,1)$ and suppose $\delta < 1/3$. In Step 3(2) of Algorithm 4, if we randomly select $n'' = O\big(\frac{1}{\delta^2\gamma}\log\frac{1}{\eta_2}\big)$ points from $P$ and let $\tilde{l}_i$ be the $\big((1+\delta)^2\gamma n'' + 1\big)$-th largest distance from the sampled points to $o_i$, then, with probability $1-\eta_2$, the following holds:

$$\tilde{l}_i \leq l_i; \qquad (42)$$
$$\big|P \setminus B(o_i, \tilde{l}_i)\big| \leq (1+O(\delta))\gamma n. \qquad (43)$$

The intuition of Lemma 10 is to show that the ball $B(o_i, \tilde{l}_i)$ is "sandwiched" by the two balls $B(o_i, \tilde{l}'_i)$ and $B(o_i, l_i)$, where $\tilde{l}'_i$ is the value such that $\big|P \setminus B(o_i, \tilde{l}'_i)\big| = (1+O(\delta))\gamma n$; namely, $\tilde{l}'_i \leq \tilde{l}_i \leq l_i$. See Figure 6 for an illustration. (43) implies that the ball $B(o_i, \tilde{l}_i)$ covers at least $\big(1-(1+O(\delta))\gamma\big)n$ points of $P$. (42) implies that $\min\{\tilde{l}_i \mid 1 \leq i \leq z\} \leq \min\{l_i \mid 1 \leq i \leq z\}$. Thus, if there exists some $B(o_i, l_i)$ that is a $(1+\epsilon, \delta)$-approximation, the selected ball $B(o_{\hat{i}}, \tilde{l}_{\hat{i}})$ should be a $(1+\epsilon, O(\delta))$-approximation, where $\hat{i} = \arg\min_i \{\tilde{l}_i \mid 1 \leq i \leq z\}$.

Fig. 6: The red points are the sampled point set $B$, and the $\big((1+\delta)^2\gamma n'' + 1\big)$-th farthest point lies in the ring bounded by the spheres $B(o_i, \tilde{l}'_i)$ and $B(o_i, l_i)$.
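The sandwich estimate is a one-liner in practice. The following sketch (again with illustrative names, and assuming the sample is large enough that the requested order statistic exists) returns the estimated radius for one candidate center:

import numpy as np

def estimate_radius(P, center, gamma, delta, n_pp, rng):
    # Draw n'' points uniformly and sort their distances to o_i (ascending).
    B = P[rng.choice(P.shape[0], size=min(n_pp, P.shape[0]), replace=False)]
    dists = np.sort(np.linalg.norm(B - center, axis=1))
    # The ((1+delta)^2 * gamma * |B| + 1)-th LARGEST distance is the estimate;
    # by Lemma 10 it is sandwiched between l'_i and l_i.
    k = int((1 + delta) ** 2 * gamma * len(B))
    return dists[-(k + 1)]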
Proof (of Lemma 10). Let $B$ denote the set of $n''$ points sampled from $P$, and let $\tilde{l}'_i > 0$ be the value such that $\big|P \setminus B(o_i, \tilde{l}'_i)\big| = \frac{(1+\delta)^2}{1-\delta}\gamma n$. Recall that $l_i$ is the $(t+1)$-th largest distance from $P$ to $o_i$ in Step 3(2) of Algorithm 4. Since $t = (1+\delta)\gamma n < \frac{(1+\delta)^2}{1-\delta}\gamma n$, it is easy to see that $\tilde{l}'_i \leq l_i$. Below, we aim to prove that the $\big((1+\delta)^2\gamma n'' + 1\big)$-th farthest point of $B$ lies in the ring bounded by the spheres $B(o_i, \tilde{l}'_i)$ and $B(o_i, l_i)$ (see Figure 6).

Again, using the Chernoff bound (with $\sigma = \delta/2$) and the same idea used for proving (27), since $|B| = n'' = O\big(\frac{1}{\delta^2\gamma}\log\frac{1}{\eta_2}\big)$, we have

$$\big|B \setminus B(o_i, \tilde{l}'_i)\big| \geq (1-\delta/2)\frac{(1+\delta)^2}{1-\delta}\gamma n'' > (1-\delta)\frac{(1+\delta)^2}{1-\delta}\gamma n'' = (1+\delta)^2\gamma n''; \qquad (44)$$
$$\big|B \cap Q\big| \leq (1+\delta/2)\frac{t}{n}n'' < (1+\delta)\frac{t}{n}n'' = (1+\delta)^2\gamma n'', \qquad (45)$$

with probability $1-\eta_2$. Suppose that (44) and (45) both hold. Recall that $\tilde{l}_i$ is the $\big((1+\delta)^2\gamma n'' + 1\big)$-th largest distance from the sampled points $B$ to $o_i$, so $\big|B \setminus B(o_i, \tilde{l}_i)\big| = (1+\delta)^2\gamma n''$, and thus $\tilde{l}_i \geq \tilde{l}'_i$ by (44).

The inequality (45) implies that the $\big((1+\delta)^2\gamma n'' + 1\big)$-th farthest point (say $q_x$) of $B$ to $o_i$ is not in $Q$. Then, we claim that $B(o_i, \tilde{l}_i) \cap Q = \emptyset$. Otherwise, let $q_y \in B(o_i, \tilde{l}_i) \cap Q$. Then we have

$$\|q_y - o_i\| \leq \tilde{l}_i = \|q_x - o_i\|. \qquad (46)$$

Note that $Q$ is the set of the farthest $t$ points of $P$ to $o_i$. So $q_x \notin Q$ implies

$$\|q_x - o_i\| < \min_{q \in Q}\|q - o_i\| \leq \|q_y - o_i\|, \qquad (47)$$

which contradicts (46). Therefore, $B(o_i, \tilde{l}_i) \cap Q = \emptyset$. Further, since $B(o_i, l_i)$ excludes exactly the farthest $t$ points (i.e., $Q$), $B(o_i, \tilde{l}_i) \cap Q = \emptyset$ implies $\tilde{l}_i \leq l_i$.

Overall, we have $\tilde{l}_i \in [\tilde{l}'_i, l_i]$; that is, the $\big((1+\delta)^2\gamma n'' + 1\big)$-th farthest point of $B$ lies in the ring bounded by the spheres $B(o_i, \tilde{l}'_i)$ and $B(o_i, l_i)$, as shown in Figure 6. Also, $\tilde{l}_i \geq \tilde{l}'_i$ implies

$$\big|P \setminus B(o_i, \tilde{l}_i)\big| \leq \big|P \setminus B(o_i, \tilde{l}'_i)\big| = \frac{(1+\delta)^2}{1-\delta}\gamma n = (1+O(\delta))\gamma n, \qquad (48)$$

where the last equality comes from the assumption $\delta < 1/3$. So (42) and (43) are both true. ⊓⊔
By Lemmas 9 and 10, we have the following sub-linear time algorithm for MEB with outliers (Algorithm 5). Following the analysis in Section 6.2, we set $s = \frac{\epsilon}{2+\epsilon}$ so that $z = \frac{2}{(1-s)\epsilon} = \frac{2}{\epsilon}+1$. We present the results in Theorem 8 and Corollary 2. Compared with Theorem 7, the success probability in Theorem 8 contains an extra factor $(1-\eta_1)(1-\eta_2)$, due to the probabilities in Lemmas 9 and 10.

Theorem 8. If $z = \frac{2}{\epsilon}+1$, then with probability $(1-\gamma)\big((1-\eta_1)(1-\eta_2)\frac{\delta}{3(1+\delta)}\big)^z$, Algorithm 5 outputs a $(1+\epsilon, O(\delta))$-approximation for the problem of MEB with outliers.

To boost the success probability in Theorem 8, we need to repeatedly run Algorithm 5 and output the best candidate. However, we need to be careful when setting the parameters. The success probability in Theorem 8 consists of two parts, $P_1 = (1-\gamma)\big((1-\eta_1)\frac{\delta}{3(1+\delta)}\big)^z$ and $P_2 = (1-\eta_2)^z$, where $P_1$ indicates the probability that $\{o_1, \cdots, o_z\}$ contains a qualified candidate, and $P_2$ indicates the success probability of Lemma 10 over all the $z$ rounds. Therefore, if we run Algorithm 5 $N = O(\frac{1}{P_1})$ times, with constant probability (by taking the union bound), the set of all the generated candidates contains at least one that yields a $(1+\epsilon, O(\delta))$-approximation; moreover, to guarantee that we can correctly estimate the resulting radii of all the candidates (with constant probability), we need to set $\eta_2 = O(\frac{1}{zN})$.
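To make this parameter schedule concrete, the following arithmetic sketch computes $z$, $N$, and $\eta_2$ from the inputs; all constants hidden by the $O(\cdot)$ notation are set to $1$ here, which is an assumption for illustration only.

import math

def schedule(eps, delta, gamma, eta1):
    s = eps / (2 + eps)
    z = math.ceil(2 / ((1 - s) * eps))        # = 2/eps + 1 core-set rounds
    # Number of repetitions ~ 1/P1, with P1 the per-run success probability.
    N = math.ceil((1 / (1 - gamma)) *
                  ((1 / (1 - eta1)) * (3 + 3 / delta)) ** z)
    eta2 = 1.0 / (z * N)                      # so all z*N estimates succeed w.c.p.
    return z, N, eta2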
Algorithm 5 Sub-linear Time $(1+\epsilon, O(\delta))$-approximation Algorithm for MEB with Outliers

Input: A point set $P$ with $n$ points in $\mathbb{R}^d$, the fraction of outliers $\gamma \in (0,1)$, and the parameters $\epsilon, \eta_1, \eta_2 \in (0,1)$, $\delta \in (0, 1/3)$, and $z \in \mathbb{Z}^+$.
1: Let $n' = O(\frac{1}{\delta\gamma}\log\frac{1}{\eta_1})$, $n'' = O\big(\frac{1}{\delta^2\gamma}\log\frac{1}{\eta_2}\big)$, $t' = \frac{3}{2}(1+\delta)\gamma n'$, and $t'' = (1+\delta)^2\gamma n''$.
2: Initially, randomly select a point $p \in P$ and let $T = \{p\}$.
3: $i = 1$; repeat the following steps until $i = z$:
   (1) Compute the approximate MEB center $o_i$ of $T$.
   (2) Randomly select $n'$ points from $P$, and let $Q'$ be the set of the farthest $t'$ points to $o_i$ from the sample.
   (3) Randomly select a point $q \in Q'$, and add it to $T$.
   (4) Randomly select $n''$ points from $P$, and let $\tilde{l}_i$ be the $(t''+1)$-th largest distance from the sampled points to $o_i$.
   (5) $i = i + 1$.
4: Output the ball $B(o_{\hat{i}}, \tilde{l}_{\hat{i}})$, where $\hat{i} = \arg\min_i \{\tilde{l}_i \mid 1 \leq i \leq z\}$.
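A single run of Algorithm 5 can be sketched as follows, reusing the two sampling helpers given above. The use of the coordinate-wise mean as the approximate MEB center of $T$ in Step 3(1) is only a crude stand-in; the paper's algorithm computes it with the core-set method of [8].

import numpy as np

def meb_with_outliers(P, gamma, eps, delta, n_prime, n_pp, rng):
    s = eps / (2 + eps)
    z = int(np.ceil(2 / ((1 - s) * eps)))          # = 2/eps + 1 rounds
    T = [P[rng.integers(P.shape[0])]]              # Step 2: one random point
    best_center, best_radius = None, np.inf
    for _ in range(z):
        # Step 3(1): approximate MEB center of T (mean is a stand-in for [8]).
        o_i = np.mean(np.asarray(T), axis=0)
        # Step 3(4): estimate the candidate radius via Lemma 10.
        l_i = estimate_radius(P, o_i, gamma, delta, n_pp, rng)
        if l_i < best_radius:
            best_center, best_radius = o_i, l_i
        # Steps 3(2)-(3): uniform-adaptive sampling via Lemma 9.
        T.append(uniform_adaptive_sample(P, o_i, gamma, delta, n_prime, rng))
    return best_center, best_radius               # the ball B(o_hat, l_hat)

Following Corollary 2, one would call this function $N$ times and keep the candidate with the smallest estimated radius.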
Corollary 2. If one repeatedly runs Algorithm 5 $N = O\Big(\frac{1}{1-\gamma}\big(\frac{1}{1-\eta_1}(3+\frac{3}{\delta})\big)^{z}\Big)$ times with setting $\eta_2 = O(\frac{1}{zN})$, then with constant probability, the algorithm outputs a $(1+\epsilon, O(\delta))$-approximation for the problem of MEB with outliers.

The calculation of the running time is similar to (32) in Section 6.2: we just replace $n$ by $\max\{n', n''\} = O\big(\frac{1}{\delta^2\gamma}\log\frac{1}{\eta_2}\big) = O\big(\frac{1}{\delta^2\gamma}\log(zN)\big) = \tilde{O}\big(\frac{1}{\delta^2\gamma\epsilon}\big)$, and change the value of $C$ to $O\Big(\frac{1}{1-\gamma}\big(\frac{1}{1-\eta_1}(3+\frac{3}{\delta})\big)^{\frac{2}{\epsilon}+1}\Big)$, where the asymptotic notation $\tilde{O}(f) = O\big(f \cdot \mathrm{polylog}(\frac{1}{\eta_1\delta(1-\gamma)})\big)$. So the total running time is independent of $n$. Also, to convert the result from a $(1+\epsilon, O(\delta))$-approximation to a $(1+\epsilon, \delta)$-approximation, we just need to reduce the value of $\delta$ in the input of Algorithm 5 appropriately.

7 k-Center Clustering with Outliers

In this section, we show that the ideas in Section 6.3 can be applied to the more general k-center clustering with outliers problem; in particular, Lemmas 9 and 10 still hold when the objective is to use multiple balls (rather than a single ball) to cover the points, and therefore we can achieve a sub-linear time bi-criteria algorithm.

Here, we follow the notations for MEB with outliers introduced before. Let $P$ be a set of $n$ points in $\mathbb{R}^d$ and $\gamma \in (0,1)$; the problem of k-center clustering with outliers is to find $k$ balls covering $(1-\gamma)n$ points of $P$ such that the maximum radius of the balls is minimized. We also use $P_{opt}$, a subset of $P$ with size $(1-\gamma)n$, to denote the subset yielding the optimal solution. Also, let $\{C_1, \cdots, C_k\}$ be the $k$ clusters forming $P_{opt}$, and let the resulting clustering cost be $r_{opt}$; that is, each $C_j$ is covered by an individual ball with radius $r_{opt}$. We define the bi-criteria approximation for k-center clustering with outliers as in Definition 4, where the only difference is that a $(1+\epsilon, \delta)$-approximate solution contains $k$ balls with radius $(1+\epsilon) r_{opt}$. For convenience, given a point $q$ and a point set $U$ in $\mathbb{R}^d$, the distance between $q$ and $U$ is defined as

$$dist(q, U) = \min_{u \in U}\|q - u\|. \qquad (49)$$

Linear time algorithm.
First, our algorithm in Section 6.2 can be generalized to a linear time bi-criteria algorithm for the problem of k-center clustering with outliers, if $k$ is assumed to be a constant. Our idea is as follows. In Algorithm 4, we maintain a set $T$ as the core-set of $P_{opt}$; here, we instead maintain $k$ sets $T_1, T_2, \cdots, T_k$ as the core-sets of $C_1, C_2, \cdots, C_k$, respectively. Consequently, each $T_j$ for $1 \leq j \leq k$ has an approximate MEB center $o^j_i$ in the $i$-th round of Step 3, and we let $O_i = \{o^1_i, \cdots, o^k_i\}$. Initially, $O_i$ and the sets $T_j$ for $1 \leq j \leq k$ are all empty; we randomly select a point $p \in P$, and with probability $1-\gamma$, $p \in P_{opt}$ (w.l.o.g., we assume $p \in C_1$ and add it to $T_1$; thus $O_i = \{p\}$ after this step). We let $Q$ be the set of the farthest $t = (1+\delta)\gamma n$ points to $O_i$, and let $l_i$ be the $(t+1)$-th largest distance from $P$ to $O_i$. Then, we randomly select a point $q \in Q$, and with probability $\frac{\delta}{1+\delta}$, $q \in P_{opt}$ (as in Lemma 7). For ease of presentation, we assume that the event $q \in P_{opt}$ happens and that we have an "oracle" to guess which optimal cluster $q$ belongs to, say $q \in C_{j_q}$; then, we add $q$ to $T_{j_q}$ and update the approximate MEB center of $T_{j_q}$. Since each optimal cluster $C_j$ for $1 \leq j \leq k$ has a core-set of size $\frac{2}{\epsilon}+1$ (by setting $s = \frac{\epsilon}{2+\epsilon}$ in Theorem 1), after adding at most $k(\frac{2}{\epsilon}+1)$ points, the distance $l_i$ will be smaller than $(1+\epsilon) r_{opt}$. That is, a $(1+\epsilon, \delta)$-approximate solution is obtained when $i \geq k(\frac{2}{\epsilon}+1)$.

To remove the oracle for guessing the cluster containing $q$, we can enumerate all the possible cases: since we add $k(\frac{2}{\epsilon}+1)$ points to $T_1, T_2, \cdots, T_k$, this generates $k^{k(\frac{2}{\epsilon}+1)} = 2^{k \log k\,(\frac{2}{\epsilon}+1)}$ solutions in total, and at least one of them yields a $(1+\epsilon, \delta)$-approximation with probability $(1-\gamma)(\frac{\delta}{1+\delta})^{k(\frac{2}{\epsilon}+1)}$ (in the same manner as the proof of Theorem 7).
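The following sketch runs one guessing sequence of this linear-time idea. Here assign_seq is an illustrative stand-in for the "oracle": assign_seq[i] guesses which optimal cluster the $i$-th added point belongs to, and enumerating all possible sequences produces the $2^{k\log k(\frac{2}{\epsilon}+1)}$ candidates; the cluster means are again only stand-ins for the approximate MEB centers of [8].

import numpy as np

def k_center_run(P, k, gamma, delta, assign_seq, rng):
    n = P.shape[0]
    t = int(np.ceil((1 + delta) * gamma * n))
    T = [[] for _ in range(k)]                      # one core-set per cluster
    T[assign_seq[0]].append(P[rng.integers(n)])     # first point, guessed cluster
    for j in assign_seq[1:]:
        # Current center set O_i (skip clusters whose core-set is still empty).
        O = np.array([np.mean(Tj, axis=0) for Tj in T if Tj])
        # dist(p, O_i) for every p in P, as defined in (49).
        d = np.min(np.linalg.norm(P[:, None, :] - O[None, :, :], axis=2), axis=1)
        Q = P[np.argsort(d)[-t:]]                   # farthest t points to O_i
        T[j].append(Q[rng.integers(t)])             # in P_opt w.p. delta/(1+delta)
    O = np.array([np.mean(Tj, axis=0) for Tj in T if Tj])
    d = np.min(np.linalg.norm(P[:, None, :] - O[None, :, :], axis=2), axis=1)
    return O, np.sort(d)[-(t + 1)]                  # centers and the radius l_i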
Theorem 9. Let $(P, \gamma)$ be an instance of k-center clustering with outliers. Given two parameters $\epsilon, \delta \in (0,1)$, there exists an algorithm that outputs a $(1+\epsilon, \delta)$-approximation with probability $(1-\gamma)(\frac{\delta}{1+\delta})^{k(\frac{2}{\epsilon}+1)}$. The running time is $O\big(2^{k\log k\,(\frac{2}{\epsilon}+1)}(n+\frac{1}{\epsilon^5})d\big)$. If one repeatedly runs the algorithm $O\big(\frac{1}{1-\gamma}(1+\frac{1}{\delta})^{k(\frac{2}{\epsilon}+1)}\big)$ times, then with constant probability, the algorithm outputs a $(1+\epsilon, \delta)$-approximate solution.

Similar to the scenario of MEB with outliers (Section 6.2), Bădoiu et al. [10] also achieved a linear time bi-criteria approximation for the k-center clustering with outliers problem (see Section 4 of their paper). However, the hidden constant of their running time is exponential in $(\frac{k}{\epsilon\mu})^{O(1)}$ (where $\mu$ is defined in [10], and should be $\delta\gamma$ to ensure a $(1+\epsilon, \delta)$-approximation), which is much larger than "$k \log k\,(\frac{2}{\epsilon}+1)$" in Theorem 9.

Sub-linear time algorithm.
Our aforementioned linear time algorithm can be further improved to run in sub-linear time. Revisiting the proofs of Lemmas 9 and 10, we find that the results also hold for the more general k-center clustering with outliers problem: the proofs only rely on computing the success probabilities of the samplings, so there is no fundamental difference whether the points are covered by one ball or by multiple balls. In Lemma 9, we still randomly select $n' = O(\frac{1}{\delta\gamma}\log\frac{1}{\eta_1})$ points from $P$; the only difference is that we let $Q'$ be the set of the farthest $\frac{3}{2}(1+\delta)\gamma n'$ points to the set $O_i$ (rather than to $o_i$) from the sample, and correspondingly the distance "$\|p - o_i\|$" is replaced by "$dist(p, O_i)$" in (35) and (36). Then, we can use the same idea in the proof to show that $\frac{|Q' \cap (P_{opt} \cap Q)|}{|Q'|} \geq \frac{\delta}{3(1+\delta)}$ with probability $1-\eta_1$. In Lemma 10, we make similar modifications: we let $\tilde{l}_i$ be the $\big((1+\delta)^2\gamma n'' + 1\big)$-th largest distance from the sampled points to the set $O_i$ (rather than to $o_i$), and in (43), "$P \setminus B(o_i, \tilde{l}_i)$" is replaced by "$P \setminus \big(\cup_{p \in O_i} B(p, \tilde{l}_i)\big)$".
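In code, the only change to the two sampling helpers given earlier is swapping the distance to a single center for the set distance of (49); a sketch of the modified versions (names illustrative):

import numpy as np

def dist_to_set(points, O):
    # dist(q, O) = min over u in O of ||q - u||, vectorized over many points q.
    return np.min(np.linalg.norm(points[:, None, :] - O[None, :, :], axis=2), axis=1)

def adaptive_sample_k(P, O, gamma, delta, n_prime, rng):
    # Lemma 9 for k-center: identical to the single-ball version, with
    # distances to o_i replaced by distances to the center set O_i.
    A = P[rng.choice(P.shape[0], size=min(n_prime, P.shape[0]), replace=False)]
    m = max(1, int(np.ceil(1.5 * (1 + delta) * gamma * len(A))))
    return A[np.argsort(dist_to_set(A, O))[-m:]][rng.integers(m)]

def estimate_radius_k(P, O, gamma, delta, n_pp, rng):
    # Lemma 10 for k-center: the ((1+delta)^2 * gamma * n'' + 1)-th largest
    # distance from the sample to the set O_i.
    B = P[rng.choice(P.shape[0], size=min(n_pp, P.shape[0]), replace=False)]
    d = np.sort(dist_to_set(B, O))
    return d[-(int((1 + delta) ** 2 * gamma * len(B)) + 1)]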
Theorem 10. Let $(P, \gamma)$ be an instance of k-center clustering with outliers. Given the parameters $\epsilon, \delta, \eta_1, \eta_2 \in (0,1)$, there exists an algorithm that outputs a $(1+\epsilon, O(\delta))$-approximation with probability $(1-\gamma)\big((1-\eta_1)(1-\eta_2)\frac{\delta}{3(1+\delta)}\big)^{k(\frac{2}{\epsilon}+1)}$. The running time is $\tilde{O}\big(2^{k\log k\,(\frac{2}{\epsilon}+1)}(\frac{1}{\delta^2\gamma}+\frac{1}{\epsilon^5})d\big)$. If one repeatedly runs the algorithm $N = O\Big(\frac{1}{1-\gamma}\big(\frac{1}{1-\eta_1}(3+\frac{3}{\delta})\big)^{k(\frac{2}{\epsilon}+1)}\Big)$ times with setting $\eta_2 = O\big(\frac{1}{k\log k\,(\frac{2}{\epsilon}+1)N}\big)$, then with constant probability, the algorithm outputs a $(1+\epsilon, O(\delta))$-approximate solution.

8 Future Work

Following our work, several interesting problems deserve to be studied in the future. For example, is it possible to generalize our proposed notion of stability to other optimization problems (e.g., the problem of k-center clustering with outliers studied in this paper)? Also, Lemmas 9 and 10 actually do not rely on any special property of the minimum enclosing ball. Therefore, it is interesting to consider developing sub-linear time algorithms for other shape fitting with outliers problems by using these lemmas, e.g., flat fitting and polytope fitting [26, 41, 42, 61].

References
1. P. K. Agarwal, S. Har-Peled, and K. R. Varadarajan. Geometric approximation via coresets. Combinatorial and Computational Geometry, 52:1–30, 2005.
2. P. K. Agarwal and R. Sharathkumar. Streaming algorithms for extent problems in high dimensions. Algorithmica, 72(1):83–98, 2015.
3. A. Aggarwal, H. Imai, N. Katoh, and S. Suri. Finding k points with minimum diameter and related problems. Journal of Algorithms, 12(1):38–56, 1991.
4. Z. Allen-Zhu, Z. Liao, and Y. Yuan. Optimization algorithms for faster computational geometry. In Proceedings of the International Colloquium on Automata, Languages, and Programming (ICALP), pages 53:1–53:6, 2016.
5. N. Alon, S. Dar, M. Parnas, and D. Ron. Testing of clustering. SIAM Journal on Discrete Mathematics, 16(3):393–417, 2003.
6. P. Awasthi, A. Blum, and O. Sheffet. Stability yields a PTAS for k-median and k-means clustering. In Proceedings of the IEEE Symposium on Foundations of Computer Science (FOCS), pages 309–318, 2010.
7. P. Awasthi, A. Blum, and O. Sheffet. Center-based clustering under perturbation stability. Information Processing Letters, 112(1-2):49–54, 2012.
8. M. Badoiu and K. L. Clarkson. Smaller core-sets for balls. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 801–802, 2003.
9. M. Badoiu and K. L. Clarkson. Optimal core-sets for balls. Computational Geometry, 40(1):14–22, 2008.
10. M. Badoiu, S. Har-Peled, and P. Indyk. Approximate clustering via core-sets. In Proceedings of the ACM Symposium on Theory of Computing (STOC), pages 250–257, 2002.
11. M. Balcan and M. Braverman. Finding low error clusterings. In Proceedings of the 22nd Conference on Learning Theory (COLT), 2009.
12. M. Balcan, N. Haghtalab, and C. White. k-center clustering under perturbation resilience. In Proceedings of the International Colloquium on Automata, Languages, and Programming (ICALP), pages 68:1–68:14, 2016.
13. M. Balcan and Y. Liang. Clustering under perturbation resilience. SIAM Journal on Computing, 45(1):102–155, 2016.
14. M.-F. Balcan, A. Blum, and A. Gupta. Clustering under approximation stability. Journal of the ACM (JACM), 60(2):8, 2013.
15. Y. Bilu and N. Linial. Are stable instances easy? Combinatorics, Probability & Computing, 21(5):643–660, 2012.
16. M. Blum, R. W. Floyd, V. Pratt, R. L. Rivest, and R. E. Tarjan. Time bounds for selection. Journal of Computer and System Sciences, 7(4):448–461, 1973.
17. M. Ceccarello, A. Pietracaprina, and G. Pucci. Solving k-center clustering (with outliers) in MapReduce and streaming, almost as accurately as sequentially. CoRR, abs/1802.09205, 2018.
18. D. Chakrabarty, P. Goyal, and R. Krishnaswamy. The non-uniform k-center problem. In Proceedings of the International Colloquium on Automata, Languages, and Programming (ICALP), pages 67:1–67:15, 2016.
19. T. M. Chan and V. Pathak. Streaming and dynamic algorithms for minimum enclosing balls in high dimensions. Computational Geometry, 47(2):240–247, 2014.
20. M. Charikar, S. Khuller, D. M. Mount, and G. Narasimhan. Algorithms for facility location problems with outliers. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 642–651, 2001.
21. M. Charikar, L. O'Callaghan, and R. Panigrahy. Better streaming algorithms for clustering problems. In Proceedings of the ACM Symposium on Theory of Computing (STOC), pages 30–39, 2003.
22. K. L. Clarkson. Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm. ACM Transactions on Algorithms, 6(4):63, 2010.
23. K. L. Clarkson, E. Hazan, and D. P. Woodruff. Sublinear optimization for machine learning. Journal of the ACM, 59(5):23:1–23:49, 2012.
24. A. Czumaj and C. Sohler. Sublinear-time algorithms.
25. A. Czumaj and C. Sohler. Sublinear-time approximation for clustering via random sampling. In Proceedings of the International Colloquium on Automata, Languages, and Programming (ICALP), pages 396–407, 2004.
26. G. D. Da Fonseca. Fitting flats to points with outliers. International Journal of Computational Geometry & Applications, 21(05):559–569, 2011.
27. S. Dasgupta and A. Gupta. An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures & Algorithms, 22(1):60–65, 2003.
28. H. Ding and J. Xu. Sub-linear time hybrid approximations for least trimmed squares estimator and related problems. In Proceedings of the International Symposium on Computational Geometry (SoCG), page 110, 2014.
29. H. Ding, H. Yu, and Z. Wang. Greedy strategy works for clustering with outliers and coresets construction. CoRR, abs/1901.08219, 2019.
30. A. Efrat, M. Sharir, and A. Ziv. Computing the smallest k-enclosing circle and related problems. Computational Geometry, 4(3):119–136, 1994.
31. J. Erickson and R. Seidel. Better lower bounds on detecting affine and spherical degeneracies. Discrete & Computational Geometry, 13(1):41–57, 1995.
32. D. Feldman, C. Xiang, R. Zhu, and D. Rus. Coresets for differentially private k-means clustering and applications to privacy in mobile sensor networks. In Proceedings of the 16th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN), pages 3–15, 2017.
33. K. Fischer, B. Gärtner, and M. Kutz. Fast smallest-enclosing-ball computation in high dimensions. In Proceedings of the 11th Annual European Symposium on Algorithms (ESA), pages 630–641, 2003.
34. M. Frank and P. Wolfe. An algorithm for quadratic programming. Naval Research Logistics Quarterly, 3(1-2):95–110, 1956.
35. A. Goel, P. Indyk, and K. R. Varadarajan. Reductions among high dimensional proximity problems. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 769–778, 2001.
36. O. Goldreich, S. Goldwasser, and D. Ron. Property testing and its connection to learning and approximation. Journal of the ACM, 45(4):653–750, 1998.
37. T. F. Gonzalez. Clustering to minimize the maximum intercluster distance. Theoretical Computer Science, 38:293–306, 1985.
38. S. Guha, Y. Li, and Q. Zhang. Distributed partial clustering. In Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 143–152, 2017.
39. L. Gyongyosi and S. Imre. Geometrical analysis of physically allowed quantum cloning transformations for quantum cryptography. Information Sciences, 285:1–23, 2014.
40. S. Har-Peled and S. Mazumdar. Fast algorithms for computing the smallest k-enclosing circle. Algorithmica, 41(3):147–157, 2005.
41. S. Har-Peled and K. Varadarajan. Projective clustering in high dimensions using core-sets. In Proceedings of the International Symposium on Computational Geometry (SoCG), pages 312–318, 2002.
42. S. Har-Peled and K. R. Varadarajan. High-dimensional shape fitting in linear time. Discrete & Computational Geometry, 32(2):269–288, 2004.
43. D. Haussler and E. Welzl. Epsilon-nets and simplex range queries. Discrete & Computational Geometry, 2(2):127–151, 1987.
44. D. S. Hochbaum and D. B. Shmoys. A best possible heuristic for the k-center problem. Mathematics of Operations Research, 10(2):180–184, 1985.
45. L. Huang, S. Jiang, J. Li, and X. Wu. Epsilon-coresets for clustering (with outliers) in doubling metrics. In Proceedings of the IEEE Symposium on Foundations of Computer Science (FOCS), pages 814–825, 2018.
46. P. Indyk. Sublinear time algorithms for metric space problems. In Proceedings of the ACM Symposium on Theory of Computing (STOC), pages 428–434, 1999.
47. P. Indyk. A sublinear time approximation scheme for clustering in metric spaces. In Proceedings of the IEEE Symposium on Foundations of Computer Science (FOCS), pages 154–159, 1999.
48. M. Kerber and S. Raghvendra. Approximation and streaming algorithms for projective clustering via random projections. In Proceedings of the 27th Canadian Conference on Computational Geometry (CCCG), 2015.
49. A. Kumar and R. Kannan. Clustering with spectral norm and the k-means algorithm. In Proceedings of the IEEE Symposium on Foundations of Computer Science (FOCS), pages 299–308, 2010.
50. P. Kumar, J. S. B. Mitchell, and E. A. Yildirim. Approximate minimum enclosing balls in high dimensions using core-sets. ACM Journal of Experimental Algorithmics, 8, 2003.
51. S. Li and X. Guo. Distributed k-clustering for data with heavy noise. In Advances in Neural Information Processing Systems, pages 7849–7857, 2018.
52. S. Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129–137, 1982.
53. G. Malkomes, M. J. Kusner, W. Chen, K. Q. Weinberger, and B. Moseley. Fast distributed k-center clustering with outliers on massive data. In Advances in Neural Information Processing Systems, pages 1063–1071, 2015.
54. J. Matoušek. On enclosing k points by a circle. Information Processing Letters, 53(4):217–221, 1995.
55. R. M. McCutchen and S. Khuller. Streaming algorithms for k-center clustering with outliers and with anonymity. In Approximation, Randomization and Combinatorial Optimization. Algorithms and Techniques, pages 165–178. Springer, 2008.
56. A. Meyerson, L. O'Callaghan, and S. Plotkin. A k-median algorithm with running time independent of data size. Machine Learning, 56(1-3):61–87, 2004.
57. N. Mishra, D. Oblinger, and L. Pitt. Sublinear time approximate clustering. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 439–447, 2001.
58. D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman, and A. Y. Wu. On the least trimmed squares estimator. Algorithmica, 69(1):148–183, 2014.
59. K. Nissim, U. Stemmer, and S. P. Vadhan. Locating a small cluster privately. In Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS), pages 413–427, 2016.
60. R. Ostrovsky, Y. Rabani, L. J. Schulman, and C. Swamy. The effectiveness of Lloyd-type methods for the k-means problem. Journal of the ACM (JACM), 59(6):28, 2012.
61. R. Panigrahy. Minimum enclosing polytope in high dimensions. arXiv preprint cs/0407020, 2004.
62. J. M. Phillips. Coresets and sketches. Computing Research Repository, 2016.
63. T. Roughgarden. Beyond worst-case analysis. Communications of the ACM, 62(3):88–96, 2019.
64. R. Rubinfeld. Sublinear time algorithms. Citeseer, 2006.
65. A. Saha, S. V. N. Vishwanathan, and X. Zhang. New approximation algorithms for minimum enclosing convex shapes. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1146–1160, 2011.
66. D. R. Sheehy. The persistent homology of distance functions under random projection. In Proceedings of the International Symposium on Computational Geometry (SoCG), page 328, 2014.
67. I. W. Tsang, J. T. Kwok, and P. Cheung. Core vector machines: Fast SVM training on very large data sets. Journal of Machine Learning Research, 6:363–392, 2005.
68. V. N. Vapnik and A. Y. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. In Measures of Complexity, pages 11–30. Springer, 2015.
69. H. Zarrabi-Zadeh and A. Mukhopadhyay. Streaming 1-center with outliers in high dimensions. In Proceedings of the Canadian Conference on Computational Geometry (CCCG), pages 83–86, 2009.
Suppose that the distribution of $P$ is uniform and dense inside $MEB(P)$, and that $\beta$ is fixed to be, say, $10\%$. If we want the radius of the remaining $90\%$ of the points to be as small as possible, intuitively we should remove the outermost $10\%$ of the points (since $P$ is uniform and dense). Let $P'$ denote the set of the innermost $90\%$ of the points. Thus, we have

$$\frac{|P'|}{|P|} \approx \frac{Vol\big(MEB(P')\big)}{Vol\big(MEB(P)\big)} = \frac{\big(Rad(P')\big)^d}{\big(Rad(P)\big)^d},$$

where $Vol(\cdot)$ denotes the volume. W.l.o.g., let $Rad(P) = 1$. Then $Rad(P') \approx 0.9^{1/d}$. Let $1-\alpha = 0.9^{1/d}$, so that $(1-\alpha)^d = 0.9$. Note that $\lim_{d \to \infty}(1-\frac{1}{d})^d = 1/e < 0.9$; hence $\alpha < \frac{1}{d}$ when $d$ is large enough. Thus, in this case, $\alpha$ tends to $0$ as $d$ increases.
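This behavior is easy to verify numerically; $1 - 0.9^{1/d}$ shrinks like $\frac{-\ln 0.9}{d} \approx \frac{0.105}{d}$:

# Numeric check of the claim above.
for d in (10, 100, 1000, 10000):
    alpha = 1 - 0.9 ** (1 / d)
    print(d, alpha, alpha * d)   # alpha * d tends to -ln(0.9), about 0.105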
10 Proof of Theorem 1
Similar to the analysis in [8], we let $\lambda_i = \frac{r_i}{(1+\epsilon)Rad(Q)}$. Because $r_i$ is the radius of $MEB(T)$ and $T \subset Q$, we know $r_i \leq Rad(Q)$ and thus $\lambda_i \leq \frac{1}{1+\epsilon}$. By a simple calculation, we know that the lower bound of $r_{i+1}$ in (3) achieves its minimum value when

$$L_i = \frac{\big((1+\epsilon)Rad(Q) - \xi r_i\big)^2 - r_i^2}{2\big((1+\epsilon)Rad(Q) - \xi r_i\big)}.$$

Plugging this value of $L_i$ into (3), we have

$$\lambda_{i+1}^2 \geq \lambda_i^2 + \frac{\big((1-\xi\lambda_i)^2 - \lambda_i^2\big)^2}{4(1-\xi\lambda_i)^2}. \qquad (50)$$

To simplify inequality (50), we consider the function $g(x) = \frac{(1-x)^2 - \lambda_i^2}{1-x}$, where $0 < x < \xi$. Its derivative $g'(x) = -1 - \frac{\lambda_i^2}{(1-x)^2}$ is always negative, thus we have

$$g(x) \geq g(\xi) = \frac{(1-\xi)^2 - \lambda_i^2}{1-\xi}. \qquad (51)$$

Because $\xi < \frac{\epsilon}{1+\epsilon}$ and $\lambda_i \leq \frac{1}{1+\epsilon}$, we know that the right-hand side of (51) is always non-negative. Using (51), inequality (50) can be simplified to

$$\lambda_{i+1}^2 \geq \lambda_i^2 + \frac{1}{4}\big(g(\xi)\big)^2 = \lambda_i^2 + \frac{\big((1-\xi)^2 - \lambda_i^2\big)^2}{4(1-\xi)^2}. \qquad (52)$$

Dividing both sides of (52) by $(1-\xi)^2$ and using the identity $a^2 + \frac{1}{4}(1-a^2)^2 = \frac{1}{4}(1+a^2)^2$, (52) can be further rewritten as

$$\Big(\frac{\lambda_{i+1}}{1-\xi}\Big)^2 \geq \frac{1}{4}\Big(1+\Big(\frac{\lambda_i}{1-\xi}\Big)^2\Big)^2 \implies \frac{\lambda_{i+1}}{1-\xi} \geq \frac{1}{2}\Big(1+\Big(\frac{\lambda_i}{1-\xi}\Big)^2\Big). \qquad (53)$$

Now, we can apply a transformation of $\lambda_i$ similar to the one used in [8]. Let $\gamma_i = \frac{1}{1-\frac{\lambda_i}{1-\xi}}$. We know $\gamma_i \geq 1$ (since $0 \leq \lambda_i \leq \frac{1}{1+\epsilon}$ and $\xi < \frac{\epsilon}{1+\epsilon}$). Then, (53) implies that

$$\gamma_{i+1} \geq \frac{\gamma_i}{1-\frac{1}{2\gamma_i}} = \gamma_i\Big(1 + \frac{1}{2\gamma_i} + \big(\frac{1}{2\gamma_i}\big)^2 + \cdots\Big) > \gamma_i + \frac{1}{2}, \qquad (54)$$

where the equation comes from the fact that $\gamma_i \geq 1$ and hence $\frac{1}{2\gamma_i} \in (0,1)$. Note that $\lambda_1 = 0$ and thus $\gamma_1 = 1$. As a consequence, we have $\gamma_i \geq \frac{i+1}{2}$. In addition, since $\lambda_i \leq \frac{1}{1+\epsilon}$, that is, $\gamma_i \leq \frac{1}{1-\frac{1}{(1+\epsilon)(1-\xi)}} = \frac{(1+\epsilon)(1-\xi)}{\epsilon-\xi-\epsilon\xi}$, we have

$$i < \frac{2(1+\epsilon)(1-\xi)}{\epsilon-\xi-\epsilon\xi} = \frac{2(1+\epsilon)(1-\xi)}{\big(1-\frac{(1+\epsilon)\xi}{\epsilon}\big)\epsilon}. \qquad (55)$$

Consequently, we obtain the theorem.
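The recurrence is easy to check numerically. The following sketch iterates the worst-case bound (52) under the reconstruction above (with $\xi = \frac{s\epsilon}{1+\epsilon}$, which is an assumption of this sketch) and confirms that $\lambda_i$ exceeds $\frac{1}{1+\epsilon}$ within roughly $\frac{2}{(1-s)\epsilon}$ rounds:

def rounds(eps, s):
    xi = s * eps / (1 + eps)
    lam, i = 0.0, 0
    while lam < 1 / (1 + eps):
        g = ((1 - xi) ** 2 - lam ** 2) / (1 - xi)   # g(xi) from (51)
        lam = (lam ** 2 + (g / 2) ** 2) ** 0.5       # the update (52)
        i += 1
    return i

for eps in (0.5, 0.2, 0.1, 0.05):
    s = eps / (2 + eps)
    print(eps, rounds(eps, s), 2 / ((1 - s) * eps))  # observed vs. bound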
11 Lemma 2.2 in [10]
Lemma 11 ([10]).
Let $B(c, r)$ be a minimum enclosing ball of a point set $P \subset \mathbb{R}^d$. Then any closed half-space that contains $c$ must also contain at least one point from $P$ that is at distance $r$ from $c$.
12 Proof of Theorem 3
We prove the following lemma first.
Lemma 12.
Suppose $\beta \in (0,1)$. Let $S$ be a set of $\Theta(\frac{d}{\beta}\log\frac{d}{\beta})$ points sampled randomly and independently from a given point set $P \subset \mathbb{R}^d$, and let $B$ be any ball covering $S$. Then, with constant probability, $|B \cap P| \geq (1-\beta)|P|$.

Proof. Consider the range space $\Sigma = (P, \Phi)$ where each range $\phi \in \Phi$ is the complement of a ball in the space. In a range space, a subset $Y \subset P$ is a $\beta$-net if, for any $\phi \in \Phi$,

$$\frac{|P \cap \phi|}{|P|} \geq \beta \implies Y \cap \phi \neq \emptyset.$$

Since $|S| = \Theta(\frac{d}{\beta}\log\frac{d}{\beta})$, we know that $S$ is a $\beta$-net of $P$ with constant probability [43, 68]. Thus, if $|B \cap P| < (1-\beta)|P|$, i.e., $|P \setminus B| > \beta|P|$, we would have $S \cap (P \setminus B) \neq \emptyset$. This contradicts the fact that $S$ is covered by $B$. Consequently, $|B \cap P| \geq (1-\beta)|P|$. ⊓⊔

Denote by $o$ the center of $MEB(P)$. Since $S \subset P$ and $B(c, r)$ is a $(1+\epsilon)$-approximate MEB of $S$, we know that $r \leq (1+\epsilon)Rad(P)$. Moreover, Lemma 12 implies that $|B(c, r) \cap P| \geq (1-\beta)|P|$ with constant probability. Suppose this is true, and let $P' = B(c, r) \cap P$. Then, we have

$$\|c - o\| \leq \big(\sqrt{\epsilon} + 2\sqrt{\alpha}\big)Rad(P) \qquad (56)$$

via Theorem 2. For simplicity, we use $x$ to denote $\sqrt{\epsilon} + 2\sqrt{\alpha}$. (56) implies that the point set $P$ is covered by the ball $B(c, (1+x)Rad(P))$. Note that we cannot directly return $B(c, (1+x)Rad(P))$ as the final result, since we do not know the value of $Rad(P)$; thus, we have to estimate the radius $(1+x)Rad(P)$.

Since $P'$ is covered by $B(c, r)$ and $|P'| \geq (1-\beta)|P|$, $r$ should be at least $(1-\alpha)Rad(P)$ due to Definition 2. Hence, we have

$$\frac{1+x}{1-\alpha}r \geq (1+x)Rad(P). \qquad (57)$$

That is, $P$ is covered by the ball $B(c, \frac{1+x}{1-\alpha}r)$. Moreover, the radius

$$\frac{1+x}{1-\alpha}r \leq \frac{1+x}{1-\alpha}(1+\epsilon)Rad(P). \qquad (58)$$

This means that the ball $B(c, \frac{1+x}{1-\alpha}r)$ is a $\lambda$-approximate MEB of $P$, where

$$\lambda = (1+\epsilon)\frac{1+x}{1-\alpha} = 1 + O(\sqrt{\epsilon}) + \frac{2(1+\epsilon)\sqrt{\alpha}}{1-\alpha} \qquad (59)$$

and $\lim_{\epsilon, \alpha \to 0}\lambda = 1$. If we use the core-set based algorithm [8] to compute $B(c, r)$, the running time of Algorithm 1 is $O\big(\frac{1}{\epsilon}(|S|d + \frac{1}{\epsilon^4}d)\big) = O\big(\frac{1}{\epsilon}\big(\frac{d^2}{\beta}\log\frac{d}{\beta} + \frac{d}{\epsilon^4}\big)\big)$.
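The whole argument above compresses into a short procedure: sample, solve on the sample, inflate. The sketch below assumes the hidden constant in the $\Theta(\cdot)$ sample size is $1$, and uses the exact mean/max of the sample only as a crude stand-in for the $(1+\epsilon)$-approximate MEB of [8]:

import numpy as np

def sampled_meb(P, beta, eps, alpha, rng):
    n, d = P.shape
    # Sample Theta((d/beta) * log(d/beta)) points, a beta-net w.c.p. [43, 68].
    m = min(n, int(np.ceil((d / beta) * np.log(d / beta))))
    S = P[rng.choice(n, size=m, replace=False)]
    # Stand-in for a (1+eps)-approximate MEB of S (the paper uses [8] here).
    c = np.mean(S, axis=0)
    r = np.max(np.linalg.norm(S - c, axis=1))
    # Inflate by (1+x)/(1-alpha) so the ball covers all of P, as in (57).
    x = np.sqrt(eps) + 2 * np.sqrt(alpha)
    return c, (1 + x) / (1 - alpha) * r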