Expected Size of Random Tukey Layers and Convex Layers
Zhengyang Guo a,∗, Yi Li a, Shaoyu Pei b

a School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore
b College of Science and Mathematics, California State University, Fresno, California, United States
Abstract
Given a planar point set X, we study the convex shells and the convex layers. We prove that when X consists of n points independently and uniformly sampled inside a convex polygon with k vertices, the expected number of vertices on the first t convex shells is O(kt log n) for t = O(√n), and the expected number of vertices on the first t convex layers is O(kt log(n/t)). We also show a lower bound of Ω(t log n) for both quantities in the special cases where k = 3, 4. The implications of these results in the average-case analysis of two computational geometry algorithms are then discussed.
Keywords: convex hull, convex layer, convex shell, computational geometry, geometric probability
1. Introduction
The motivation of this work is to understand the combinatorial and geometric properties of random convex layers and shells (see Definitions 1 and 4). The underlying planar point set X is assumed to be uniformly sampled from a convex polygon with k vertices.

∗ Corresponding author
Email addresses: [email protected] (Zhengyang Guo), [email protected] (Yi Li), [email protected] (Shaoyu Pei)
Preprint submitted to Journal Name, August 6, 2020

There has been a lot of research on the expected size of the convex hull of a random point set [1, 2, 3], the relation between the expected size and the expected area of the convex hull [4, 5], and the expected convex depth [6]. However, few of these works study the convex layers. In fact, the vertices on the first t convex layers, denoted by V[t](X), are closely related to the partial enclosing problem introduced by Atanassov et al. in [7]. The objective of this problem is to find the convex hull with the minimum area that encloses (n − t) of the n points in X. In some of the applications, the t excluded points are regarded as outliers.

In [7], Atanassov et al. give an algorithm with worst-case time complexity O(n log n + binom(4t, 2t)·(3t)^t·n), where the n in the second term refers to the size |V[t](X)| in the worst case. However, the actual runtime seldom meets such worst cases, which implies that the algorithm can be far more efficient in some situations than in others. To give an overall measure of the algorithm's efficiency, it makes more sense to study it from a probabilistic point of view, namely its average-case time complexity.

Assuming that X is uniformly sampled from a convex k-gon as in [4, 8, 9, 5, 10, 2], we shall prove in Section 4 that E|V[t](X)| = O(kt log(n/t)), which is much smaller than n whenever kt log(n/t) = o(n). As a consequence, the expected complexity of Atanassov et al.'s algorithm in [7] is O(n log n + binom(4t, 2t)·(3t)^t·kt log(n/t)). This explains the gap between the worst-case complexity and the actual runtime.

In addition, we also study the expected number of vertices on the first t convex shells, U[t](X), as defined in Definition 4. This is also related to a partial shape fitting problem [11], in which a parallelogram rather than a convex polygon as in [7] is concerned. 
The time complexity of the algorithm in [11] is O(nt + n log n), where the n in the first term nt refers to |U[t](X)| in the worst case. As we shall prove that E|U[t](X)| = O(kt log n) for t = O(√n) in Section 3, the expected time complexity is then O(kt² log n + n log n), smaller than the worst-case complexity when Ω((n/k)^(1/2)) ≤ t ≤ O(n/(k log n)).

It is beneficial to study the convex layers and convex shells together. Their close relation is revealed in Lemma 2, which shows U[t](X) ⊆ V[t](X). An upper bound on |V[t](X)| is then automatically an upper bound on |U[t](X)|, and a lower bound on |U[t](X)| is automatically a lower bound on |V[t](X)|.

1.1. Notations and Definitions

We introduce the notations and definitions before reviewing the existing works. Let X denote the planar point set and n = |X| its size. When X is a random point set, we use P to denote the convex polygon from which X is sampled, and k denotes the number of vertices of P. We now present the definition of the convex layer structure as in [12].

Definition 1 (Convex Layers). Given a planar point set X, the first convex layer H_1(X) is defined to be the convex hull H(X) of the whole point set. The t-th convex layer H_t(X) is inductively defined to be the convex hull of the remaining points, after the points on the first (t − 1) convex layers have been removed from X.

Definition 2 (Convex Order). If a point p ∈ X is on the t-th convex layer, then we say the convex order of p is t.

Definition 3 (Shell Order). Suppose p ∈ X. 
For a line ℓ through p, let N^(1)(p, ℓ, X) and N^(2)(p, ℓ, X) denote the numbers of points of X on its two sides respectively, and let N(p, ℓ, X) := min(N^(1)(p, ℓ, X), N^(2)(p, ℓ, X)). The shell order of p is then defined to be min over all lines ℓ through p of N(p, ℓ, X), where {ℓ : p ∈ ℓ} consists of all lines passing through the point p.

Remark 1. Intuitively, if a point p has shell order t, then for every line ℓ through p there cannot be fewer than t points on either side of ℓ. At the same time, there exists a line ℓ through p such that there are exactly t points on one side of ℓ.

Definition 4 (t-th Convex Shell). For t ≥ 1, the subset U_t(X) of X is defined to be the set of points of shell order (t − 1) (see Figure 1 for an illustration), and S_t(X) is defined to be the convex hull of U_t(X). As will be proved in Lemma 1, the points in U_t(X) are in convex position and they are exactly the vertices of S_t(X). The convex polygon S_t(X) is thus referred to as the t-th convex shell, and the size of S_t(X) is defined as |U_t(X)|.

H(X)    the convex hull of X
H_t(X)  the t-th convex layer of X
V(X)    the vertices of H(X)
V_t(X)  the vertices of H_t(X)
S_t(X)  the t-th convex shell of X
U_t(X)  the vertices of S_t(X)
A(X)    the area of S_1(X)
A_t(X)  the area of S_t(X)

Table 1: Notations used in this work.
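To make the definitions above concrete, the following brute-force sketch (ours, not part of the paper; the helper names are hypothetical) computes the convex order of Definition 2 by onion peeling and the shell order of Definition 3. It assumes points in general position and uses the fact that an optimal line in Definition 3 can be rotated about p until it passes through another point of the set.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def convex_order(points):
    """Definition 2: peel convex layers (Definition 1); returns {point: layer index}."""
    remaining, order, t = list(points), {}, 1
    while remaining:
        layer = set(convex_hull(remaining))
        for p in layer:
            order[p] = t
        remaining = [p for p in remaining if p not in layer]
        t += 1
    return order

def shell_order(p, points):
    """Definition 3: minimum over lines through p of the smaller strict side count.

    Brute force: it suffices to try the lines through p and each q != p,
    since rotating an optimal line about p until it hits a point cannot
    increase the smaller side count."""
    others = [q for q in points if q != p]
    best = len(others)
    for q in others:
        dx, dy = q[0] - p[0], q[1] - p[1]
        left = right = 0
        for r in others:
            s = dx * (r[1] - p[1]) - dy * (r[0] - p[0])
            if s > 0:
                left += 1
            elif s < 0:
                right += 1
        best = min(best, left, right)
    return best

X = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 2)]
print({p: convex_order(X)[p] for p in X})   # corners lie on layer 1, inner points on layer 2
print({p: shell_order(p, X) for p in X})    # corners have shell order 0, so they form U_1(X)
```

Per Definition 4, U_t(X) is then `{p for p in X if shell_order(p, X) == t - 1}`.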
The frequently used notations are listed in Table 1. Note that S_1(X) = H_1(X) by definition. For convenience, we also let V[t](X) := ∪_{i=1}^{t} V_i(X) and U[t](X) := ∪_{i=1}^{t} U_i(X).

The main results in Sections 3 and 4 are proved using the techniques developed for computing the expected convex hull size. We thus review the works that study the random convex hull, in terms of its area and the number of its vertices. Most of the interest has been in their expectations, concentration bounds and asymptotic behaviors.

A fundamental result is that the expected size of a random convex hull is O(k log n) when a large number n of points are independently and uniformly sampled from a convex k-gon. The result was first stated by Stein [10], and a geometric proof was later provided by Har-Peled [2, Section 2]. By the relation E|V(X)| = n[1 − E A(X)] proposed in [10], an upper bound on E|V(X)| will follow from a lower bound on E A(X). Thus in [2], the effort is devoted to deriving a lower bound on the expected area of the convex hull. A critical observation in [2, Section 2] is that if p ∈ X is a vertex of the convex hull, then there exists a line ℓ through p such that one side of ℓ contains no points of X. This gives a necessary condition for p ∈ V(X), and an upper bound on the probability of the event p ∈ V(X) can then be obtained. Multiplying this upper bound by n immediately yields an upper bound on E|V(X)|, or equivalently, a lower bound on E A(X).

In addition to the expectation, there have been a number of studies on the asymptotic behaviors of the convex hull size, such as [4, 13, 14, 15]. In [4, Corollary 3], Affentranger and Wieacker proved that, given X uniformly sampled from a simple polytope in R^d with k vertices, the expected size of the convex hull E|V(X)| is asymptotically (kd/(d+1)^(d−1)) log^(d−1) n + O(log^(d−2) n) as n → ∞. When d = 2, the result becomes E|V(X)| = (2k/3) log n + O(1). 
Massé further proved that |V(X)|/((2k/3) log n) converges to 1 in probability [14]. There are also studies that assume a different underlying distribution for the point set. When the n points are sampled independently from a coordinate-wise independent distribution in R^d, it is proved by Nguyen in [16] that the expected size of the t-th convex layer is O(t^d log^(d−1)(n/t^d)). Some studies assume the point set is sampled independently and uniformly from shapes other than a convex polygon. In the case of a disc, the expected size of the convex hull is Θ(n^(1/3)), due to Raynaud [17].

In this work, we introduce the definition of the t-th convex shell S_t(X) and its fundamental properties. When X is uniformly sampled from a convex k-gon, we show that the expected number of vertices on the first t convex shells is E|U[t](X)| = O(kt log n) and the expected size of the first t convex layers is E|V[t](X)| = O(kt log(n/t)). We also prove a matching lower bound E|U[t](X)| = Ω(t log n) when X is sampled from a triangle or a parallelogram, which, since U[t](X) ⊆ V[t](X), is also a lower bound for E|V[t](X)| in these two special cases. Finally, we show that the two upper bounds are helpful in understanding the average-case complexity of two partial shape fitting algorithms, both of which aim to enclose (n − t) of the n given points with a shape of the minimum area; one shape is a parallelogram and the other is a convex polygon.

In Section 2 we give the fundamental properties of convex layers and convex shells. In Section 3, we present the proof of the upper bound on the expected size of the first t convex shells, when the n points in X are sampled from a convex polygon. In Section 4, we prove the upper bound on the expected size of the first t convex layers under the same setting. In Section 5, we derive the lower bounds on the expected size of the first t convex shells for two special cases. Finally, in Section 6, we apply our results to the average-case analysis of two shape fitting algorithms.
2. Preliminaries
In this section, we collect some fundamental facts about convex shells and convex hulls.
The following lemma shows that the points in U_t(X) are the vertices of the t-th convex shell S_t(X). With this lemma, we can let S_t(X) denote the convex polygon with the points in U_t(X) as its vertices, as already mentioned in Definition 4.

Lemma 1. For a planar point set X, the points in the t-th convex shell of X are in convex position. Equivalently, U_t(X) has only one convex layer.

Proof. Suppose there are at least two convex layers in U_t(X). Denote the first and second convex layers of U_t(X) by C_1 and C_2, respectively. Let V_1 denote the vertices of C_1, and V_2 := U_t(X) \ V_1. For any point p ∈ V_2, let ℓ be a line through p such that there are exactly (t − 1) points on one side. Notice that ℓ is through p and thus also through the interior of C_1. Hence, on the side of ℓ which contains (t − 1) points, there must exist a point q which belongs to V_1. This implies that for the line ℓ′ through q and parallel to ℓ, there are at most (t − 2) points on one side. This contradicts the fact that q ∈ V_1 ⊆ U_t(X). We conclude that there can be only one convex layer in U_t(X).

The next lemma reveals the relation between convex shells and convex layers.

Figure 1: The solid line is the 1st convex shell S_1(X), the dashed line is the 2nd shell S_2(X), the dotted line is the 3rd shell S_3(X), and the dash-dot line is the 4th shell S_4(X). The vertices on each shell are in convex position, so each shell is referred to as a convex shell.

Lemma 2. U[t](X) ⊆ V[t](X).

Proof. If a point p ∈ X \ V[t](X), then p can only lie on the (t + 1)-st or a deeper layer of X. On each side of any line passing through p, there must be at least one vertex from each of the previous layers, that is, from the 1st to the t-th; in total there are at least t points on each side. By Definition 4 it holds that p ∉ U[t](X). In conclusion, U[t](X) ∩ (X \ V[t](X)) = ∅ and thus U[t](X) ⊆ V[t](X).

The following lemma reveals the relative position of different convex shells. It shows that the vertices on the first t convex shells are outside the (t + 1)-st shell.

Lemma 3. U[t](X) ∩ S_{t+1}(X) = ∅, and thus S_t(X) ⊆ H(X \ U[t−1](X)).

Proof. Suppose not. Let p ∈ U[t](X) ∩ S_{t+1}(X) and let ℓ be a line through p on one side of which there are at most (t − 1) points. As S_{t+1}(X) is a convex polygon with U_{t+1}(X) as its vertices and p ∈ U[t](X), the point p must lie in the interior of S_{t+1}(X), and thus ℓ passes through the interior of S_{t+1}(X). Then there must be a q ∈ U_{t+1}(X) on the side of ℓ where there are at most (t − 1) points. Let ℓ′ denote the line through q and parallel to ℓ. Then there are at most (t − 2) points on one side of ℓ′, and this contradicts the fact that q ∈ U_{t+1}(X).

Lemma 4. If X_1 ∪ X_2 = X, then we have U[t](X) ⊆ U[t](X_1) ∪ U[t](X_2).

Proof. For each point p ∈ U[t](X), there exists a line ℓ through it, on one side of which there are at most (t − 1) points of X. Then there are at most (t − 1) points of X_1 and at most (t − 1) points of X_2 on that side of ℓ. Hence p ∈ U[t](X_1) if p ∈ X_1, or p ∈ U[t](X_2) if p ∈ X_2.

The following corollary is a generalization to k subsets.

Corollary 1.
Given X = X_1 ∪ X_2 ∪ ··· ∪ X_k, we have U[t](X) ⊆ U[t](X_1) ∪ U[t](X_2) ∪ ··· ∪ U[t](X_k).

The following lemma is an analogue of Lemma 4 for V[t].

Lemma 5. If X_1 ∪ X_2 = X, then H_t(X_1) ∪ H_t(X_2) ⊆ H_t(X) and V[t](X) ⊆ V[t](X_1) ∪ V[t](X_2).

Proof. The statement is well known when t = 1. Assume it holds for t; we shall prove it for (t + 1). By the induction hypothesis, V[t](X) ⊆ V[t](X_1) ∪ V[t](X \ X_1), so X_1 ∩ V[t](X) ⊆ V[t](X_1), and hence

X_1 \ V[t](X_1) ⊆ X_1 \ V[t](X) ⊆ X \ V[t](X).

Further, by Definition 1,

H_{t+1}(X_1) = H(X_1 \ V[t](X_1)) ⊆ H(X \ V[t](X)) = H_{t+1}(X).

Similarly we also have H_{t+1}(X_2) ⊆ H_{t+1}(X). Therefore H_{t+1}(X_1) ∪ H_{t+1}(X_2) ⊆ H_{t+1}(X).

Now we prove V[t+1](X) ⊆ V[t+1](X_1) ∪ V[t+1](X_2). A point p ∈ V[t+1](X) cannot lie in the interior of H_{t+1}(X). We have already shown that H_{t+1}(X_1) ∪ H_{t+1}(X_2) ⊆ H_{t+1}(X), so p lies in the interior of neither H_{t+1}(X_1) nor H_{t+1}(X_2). If p ∈ X_1, then p ∈ V[t+1](X_1); otherwise, p ∈ X_2 and p ∈ V[t+1](X_2).

Corollary 2.
Given X = X_1 ∪ X_2 ∪ ··· ∪ X_k, we have V[t](X) ⊆ V[t](X_1) ∪ V[t](X_2) ∪ ··· ∪ V[t](X_k).

2.2. Convex Order

The following lemma examines how the convex order of a point p ∈ X changes after an additional point q is added to X.

Lemma 6.
Given a planar point set X and a point p ∈ X, the convex order of p will either remain unchanged or increase by at most 1 after an additional point q is added to X.

Proof. By the proof of [6, Lemma 3.1], we know that V_t(X) ⊆ V_t(X ∪ {q}) ∪ V_{t+1}(X ∪ {q}). Hence if p ∈ V_t(X), then either p ∈ V_t(X ∪ {q}) or p ∈ V_{t+1}(X ∪ {q}). Therefore, the convex order of p either remains unchanged or increases by 1.

The following lemma shows the relation between the expected size and the expected area of the convex shells.
Lemma 7.
Let C denote a bounded and closed convex set of unit area in the plane. Then for n points independently and uniformly sampled in C, we have

E|U[t](X)| ≤ n[1 − E A(S_{t+1}(X))].
On the one hand, by Lemma 3, the points in U[t](X) cannot lie in S_{t+1}(X). On the other hand, there might be points of X \ U[t](X) not lying in S_{t+1}(X) either. Since the points not belonging to S_{t+1}(X) are uniformly distributed in C \ S_{t+1}(X), in expectation we have

E|U[t](X)| ≤ n E[1 − A(S_{t+1}(X))] = n[1 − E A(S_{t+1}(X))].

The following are elementary calculus inequalities connecting a sum with the corresponding integral.
Lemma 8.
When f(x) is an increasing function on [0, +∞), we have

f(0) + ∫_0^n f(x) dx ≤ Σ_{i=0}^{n} f(i) ≤ ∫_0^{n+1} f(x) dx.
Let f(x) be a function which is increasing on [0, s] and decreasing on [s, +∞), where s ∈ [t, t + 1] and 0 < t < n − 1 is an integer. Then we have

f(0) + ∫_0^{n+1} f(x) dx − ∫_t^{t+1} f(x) dx ≤ Σ_{i=0}^{n} f(i) ≤ ∫_0^n f(x) dx + 2f(s).   (1)

Proof. On [0, t], where the function f(x) is increasing, we have

f(0) + ∫_0^t f(x) dx ≤ Σ_{0 ≤ i ≤ t} f(i) ≤ ∫_0^t f(x) dx + f(s),

and on [t + 1, +∞), where the function f(x) is decreasing, we have

∫_{t+1}^{n+1} f(x) dx ≤ Σ_{t+1 ≤ i ≤ n} f(i) ≤ ∫_{t+1}^{n} f(x) dx + f(s).

Adding the two inequalities above, we get

f(0) + ∫_0^{n+1} f(x) dx − ∫_t^{t+1} f(x) dx ≤ Σ_{i=0}^{n} f(i) ≤ ∫_0^n f(x) dx + 2f(s).
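As a quick numerical sanity check (ours, not part of the paper), the sketch below verifies inequality (1) of Lemma 9 for a unimodal function of the kind used later in the proof of Theorem 1. The helper `check_lemma9` is hypothetical, and the midpoint rule stands in for the exact integrals.

```python
import math

def check_lemma9(f, s, t, n, steps=20000):
    """Verify f(0) + int_0^{n+1} f - int_t^{t+1} f <= sum_{i=0}^n f(i) <= int_0^n f + 2 f(s)."""
    def integral(a, b):
        h = (b - a) / steps
        return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h  # midpoint rule
    total = sum(f(i) for i in range(n + 1))
    lower = f(0) + integral(0, n + 1) - integral(t, t + 1)
    upper = integral(0, n) + 2 * f(s)
    return lower <= total <= upper

# f(x) = x^3 e^{-x/4} increases up to s = 12 and decreases afterwards,
# mirroring the function x^t exp(-c x) analysed in Section 3.
f = lambda x: x ** 3 * math.exp(-x / 4)
print(check_lemma9(f, s=12.0, t=12, n=100))  # True
```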
3. An Upper Bound on E|U[t](X)|

In this section, we prove an upper bound O(kt log n) on E|U[t](X)| when the n points in X are sampled independently and uniformly from a convex k-gon P. Our proof is inspired by [2]. By Definition 4, U[t](X) consists of all the points p ∈ X with shell order at most (t − 1); that is, p is in U[t](X) if and only if there exists a line ℓ through p such that on one side of ℓ there are at most (t − 1) points. By this observation, we can derive an upper bound on the probability that the shell order of a point p ∈ X is at most (t − 1).

Lemma 10.
Suppose that a point p ∈ X is passed through by two lines ℓ_1 and ℓ_2, as shown in Figure 2. If all the four regions partitioned by ℓ_1 and ℓ_2 contain at least t points of X, then for any line ℓ through p, there must be at least t points on either of the two sides. In other words, the point p cannot be on the first t convex shells of X.

Figure 2: The plane is divided into 4 sectors by the two lines ℓ_1 and ℓ_2. On each side of the line ℓ, there is a complete sector which contains at least t points.

Proof. On each side of any line ℓ through p, there is a complete region of the partition by ℓ_1 and ℓ_2 (the gray and black regions in Figure 2), which contains at least t points of X. By Definition 3, the shell order of p must be at least t.

Lemma 10 gives a necessary condition for a point p ∈ X to be on the first t convex shells: for any given partition of the plane by two lines ℓ_1 and ℓ_2 through p, at least one of the four regions must contain at most (t − 1) points. By this necessary condition, we can now prove an upper bound on E|U[t](X)|.

Theorem 1.
Let X be a set of n points sampled independently and uniformly from a triangle T, and let t = O(√n). Then E|U[t](X)| = O(t log n).

There are two differences between our proof of Theorem 1 and the one of [2, Lemma 2.5]. First, we consider the convex shell instead of the convex hull. Second, for the relation between the expected area and the expected size, we resort to Lemma 7 rather than [2, Lemma 2.1].
Figure 3: The figure shows how we partition a triangle into n² cells when n = 5. There are n columns and n rows, and all cells have equal area.

Proof of Theorem 1. We partition T into n triangles of equal area by segments emanating from a fixed vertex. Each of these triangles is further partitioned into (n − 1) trapezoids and a triangle of equal area by line segments parallel to the opposite side. See Figure 3 for details. There are thus n² cells in T, each of which has area 1/n². Let G_{ij} denote the cell in the i-th row and j-th column. We also define

G_{[i1,i2],[j1,j2]} := ∪_{i'=i1}^{i2} ∪_{j'=j1}^{j2} G_{i'j'}.

When i1 = i2 or j1 = j2, we abbreviate it as G_{i,[j1,j2]} and G_{[i1,i2],j}, respectively.

A cell G_{ij} is said to be outside the convex polygon S_t(X) if G_{ij} \ S_t(X) ≠ ∅. Let Z_j denote the number of cells which are at the bottom of the j-th column and at the same time outside the t-th convex shell. We need to find an upper bound on E[Z_j]. To do this, let I_1 and I_2 be the smallest row indices such that G_{[1,I_1],[1,j−1]} and G_{[1,I_2],[j+1,n]} contain at least t points of X, respectively. Then Z_j ≤ max(I_1, I_2) ≤ I_1 + I_2 by Lemma 10, and thus E Z_j ≤ E I_1 + E I_2. Note that

Pr(I_1 = k) ≤ binom(n, t−1) · ((j−1)k/n²)^(t−1) · ((j−1)/n²) · (1 − (j−1)(k−1)/n²)^(n−t)
≤ binom(n, t−1) · ((j−1)k/n²)^(t−1) · ((j−1)/n²) · e^(−(j−1)(k−1)(n−t)/n²)
≤ (n^(t−1)/(t−1)!) · ((j−1)/n²)^t · k^(t−1) · e^(−((j−1)(n−t)/n²)(k−1)),

so that

E(I_1) = Σ_{k=1}^{n} k · Pr(I_1 = k)
≤ (n^(t−1)/(t−1)!) · ((j−1)/n²)^t Σ_{k=1}^{n} k^t e^(−((j−1)(n−t)/n²)(k−1))
= e^((j−1)(n−t)/n²) · (n^(t−1)/(t−1)!) · ((j−1)/n²)^t Σ_{k=1}^{n} k^t e^(−((j−1)(n−t)/n²)k)
≤ e · (n^(t−1)/(t−1)!) · ((j−1)/n²)^t Σ_{k=1}^{n} k^t e^(−((j−1)(n−t)/n²)k),

using (j−1)(n−t)/n² ≤ 1 in the last step. Observe that f(x) = x^t exp(−((j−1)(n−t)/n²)x) is increasing on [0, tn²/((j−1)(n−t))] and decreasing on [tn²/((j−1)(n−t)), ∞). It follows from Lemma 9 that

E(I_1) ≤ e · ((j−1)/n²)^t (n^(t−1)/(t−1)!) [∫_0^∞ x^t e^(−((j−1)(n−t)/n²)x) dx + 2e^(−t) (tn²/((j−1)(n−t)))^t]
= e · ((j−1)/n²)^t (n^(t−1)/(t−1)!) [(n²/((j−1)(n−t)))^(t+1) t! + 2e^(−t) (tn²/((j−1)(n−t)))^t]
= (e/(j−1)) (n/(n−t))^(t+1) t + (2e · e^(−t) t^t/(n (t−1)!)) (n/(n−t))^t.

Since

(n/(n−t))^t = (1 − t/n)^(−t) ≤ e^(t²/(n−t)) = O(1)

by t = O(√n), and by Stirling's approximation

e^(−t) t^t/(t−1)! = e^(−t) t^(t+1)/t! ≤ e^(−t) t^(t+1)/(√(2πt) · t^t · e^(−t)) = O(√t),

we arrive at E I_1 = O(t/(j−1) + √t/n). Similarly, E I_2 = O(t/(n−j) + √t/n). Then we have

E Z_j ≤ E I_1 + E I_2 = O(t/(j−1) + t/(n−j) + √t/n).

The same bound holds at the top of the j-th column for the expected number of cells outside S_t(X). Counting from the four directions, the expected number of cells in T which are outside S_t(X) is in total at most

4 Σ_{j=2}^{n−1} O(t/(j−1) + t/(n−j) + √t/n) = O(t log n),

whence it follows that

E A(S_t(X)) ≥ 1 − O(t log n/n²) ≥ 1 − O(t log n/n).

By Lemma 7, we finally conclude that E|U[t−1](X)| = O(t log n), where the hidden constant is absolute.

Theorem 2.
Let X be a set of n points sampled independently and uniformly from a convex k-gon. Then for t = O(√n), we have E|U[t](X)| = O(kt log n).

Proof. We partition the convex k-gon into k triangles. Let X_1, X_2, ..., X_k denote the sets of points of X in the triangles and n_1, n_2, ..., n_k the numbers of points, respectively. Note that n_1, n_2, ..., n_k are random and Σ_{i=1}^{k} n_i = n. Then by Corollary 1, we have

E[|U[t](X)| | n_1, n_2, ..., n_k] ≤ Σ_{i=1}^{k} E[|U[t](X_i)| | n_i] ≤ Σ_{i=1}^{k} O(t log n_i) = O(kt log n).

The claimed result follows immediately.
4. An Upper Bound on E|V[t](X)|

In this section, we prove an upper bound of O(kt log(n/t)) on E|V[t](X)| when X is sampled uniformly from a convex k-gon. The proof is inspired by [5] and [16]. We first study the case where the points in X are sampled uniformly from a triangle T and obtain an upper bound of O(t log(n/t)) on E|V[t](X)|. Then by Corollary 2 we obtain an upper bound of O(kt log(n/t)) when the points in X are sampled uniformly from a k-gon. The problem can be further reduced to finding an upper bound on the probability Pr(p ∈ V[t](X)) for a single point p ∈ X, which multiplied by n gives an upper bound on E|V[t](X)|.

Theorem 3.
Let X be a set of n points sampled independently and uniformly from a triangle T. Then E|V[t](X)| = O(t log(n/t)).

Figure 4: The triangle with vertices (0, 0), (1, 0) and (0, 1) is divided into three parts R_1, R_2 and R_3 by the three line segments from the centroid to the midpoint of each edge.
Proof of Theorem 3.
As the combinatorial properties of convex hulls are affine invariant, we may assume the three vertices of T are (0, 0), (1, 0) and (0, 1). By the three segments from the centroid (1/3, 1/3) to the midpoints (1/2, 0), (0, 1/2) and (1/2, 1/2) of the edges, the triangle T is partitioned into three regions R_1, R_2 and R_3 of equal area (see Figure 4). By symmetry, the probabilities Pr(p ∈ V[t](X) | p ∈ R_i) are equal for i = 1, 2, 3, so

Pr(p ∈ V[t](X)) = Σ_{i=1}^{3} Pr(p ∈ V[t](X) | p ∈ R_i) · Pr(p ∈ R_i)
= Σ_{i=1}^{3} Pr(p ∈ V[t](X) | p ∈ R_i) · (1/3)
= Pr(p ∈ V[t](X) | p ∈ R_1).

Figure 5: By the horizontal line and the vertical line through the given point p, the plane is divided into four quadrants.

We now focus on finding an upper bound on Pr(p ∈ V[t](X) | p ∈ R_1). For this purpose, the triangle T is divided into four quadrants by a vertical and a horizontal line through p, as shown in Figure 5.

Figure 6: The four figures illustrate how we partition each quadrant of the triangle into cells when t = 4. In each single quadrant, the cells have equal area. The diagonal cells in each quadrant are marked out by shadow. The divisions ensure that there are t diagonal cells in each quadrant.

Each quadrant is further partitioned into multiple cells by equally spaced horizontal and vertical lines, as in Figure 6. The constructions ensure that in each quadrant the cells have equal area and there are exactly t diagonal cells, which are marked by shadow in Figure 6. We claim that if p ∈ V[t](X), then at least one of the diagonal cells must be empty. The proof of this claim is deferred to Lemma 11.

When p ∈ R_1, the area of each quadrant is at least p_x p_y by [5, Section 2], where p_x and p_y denote the coordinates of p, so each quadrant has probability mass at least p_x p_y. Therefore each diagonal cell has probability mass at least p_x p_y/(8t), and the expected number of points in every single diagonal cell is at least n p_x p_y/(8t). By the multiplicative form of the Chernoff bound [18, Theorem 4.5], the probability that a diagonal cell is empty is at most exp(−n p_x p_y/(16t)). Further, by a union bound, the probability that in the triangle T there is at least one empty diagonal cell is at most 4t exp(−n p_x p_y/(16t)). Now we claim that

Pr(p ∈ V[t](X) | p ∈ R_1) ≤ 4t ∫_0^{1/9} e^(−ny/(16t)) log(1/y) dy,

whose proof is postponed to Lemma 14. Further we claim that

∫_0^{1/9} e^(−ny/(16t)) log(1/y) dy = O((t/n) log(n/t)),

and we put off the proof to Lemma 15. Combining the two results above, we see that for any p ∈ X, Pr(p ∈ V[t](X)) = O((t/n) log(n/t)). Finally we obtain that E|V[t](X)| = O(t log(n/t)).

Now we are ready to prove the following main theorem.

Theorem 4.
Let X be a set of n points sampled independently and uniformly from a convex k-gon. Then we have E|V[t](X)| = O(kt log(n/t)).

Proof. As in the proof of Theorem 2, we partition the k-gon into k triangles. Let n_1, n_2, ..., n_k denote the numbers of points in the triangles. Then by Corollary 2 we have

E[|V[t](X)| | n_1, n_2, ..., n_k] ≤ Σ_{i=1}^{k} E[|V[t](X_i)| | n_i] ≤ Σ_{i=1}^{k} O(t log(n_i/t)) = O(kt log(n/t)).
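For intuition, the bound of Theorem 4 can be placed next to a direct simulation. The sketch below (ours, not from the paper; the helper names are hypothetical) peels convex layers off uniform points in the unit square, so k = 4, and reports the number of vertices on the first t layers alongside kt log(n/t).

```python
import math, random

def convex_hull(pts):
    # Andrew's monotone chain; returns hull vertices in counter-clockwise order.
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def first_layers_size(pts, t):
    # |V^[t](X)|: total number of vertices on the first t convex layers.
    remaining, count = list(pts), 0
    for _ in range(t):
        if not remaining:
            break
        layer = set(convex_hull(remaining))
        count += len(layer)
        remaining = [p for p in remaining if p not in layer]
    return count

random.seed(1)
n, t, k, trials = 2000, 3, 4, 5
est = sum(first_layers_size([(random.random(), random.random()) for _ in range(n)], t)
          for _ in range(trials)) / trials
print(round(est, 1), round(k * t * math.log(n / t), 1))
```

The observed layer sizes sit well below n and track the kt log(n/t) scale up to constants.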
Figure 7: The diagonal cells are marked by the shadows. Connecting one point in the diagonal cells of the same order, one from each quadrant, we get a convex layer. The figure presents the case t = 4: there are in all t convex layers, each marked by a closed polyline.

In the rest of this section, we state and prove the lemmas that are used in the proof of Theorem 3. We denote the density and the cumulative distribution functions of p_x p_y by ρ_{p_x p_y}(·) and F_{p_x p_y}(·), respectively.

Lemma 11. If p ∈ V[t](X), then there must be at least one empty diagonal cell.

Proof. If none of the 4t diagonal cells is empty, we can construct t convex layers enclosing p, where each layer consists of four points from diagonal cells of the same order, one from each quadrant, as shown in Figure 7. This implies that, taking only the points in the diagonal cells into account, the convex order of p is at least (t + 1). Although there may be other points besides those in the diagonal cells, from Lemma 6 we know that the convex order of p cannot decrease after those additional points are included. This contradicts the fact that p ∈ V[t](X), so our assumption that none of the 4t diagonal cells is empty cannot be true.

Lemma 12 ([5, Theorem 1]). F_{p_x p_y}(y | p ∈ R_1) ≤ F_{p_x p_y}(y | p ∈ [0, 1] × [0, 1]).

Lemma 13 ([19, Section I.8]). ρ_{p_x p_y}(y | p ∈ [0, 1] × [0, 1]) = log(1/y).

Lemma 14. If Pr(p ∈ V[t](X) | p_x p_y = y, p ∈ R_1) ≤ 4t e^(−ny/(16t)), then

Pr(p ∈ V[t](X) | p ∈ R_1) ≤ 4t ∫_0^{1/9} e^(−ny/(16t)) log(1/y) dy.

Proof. It is easy to verify that p_x p_y attains its maximum value 1/9 at (1/3, 1/3) for p ∈ R_1. Then we have

Pr(p ∈ V[t](X) | p ∈ R_1) = ∫_0^{1/9} Pr(p ∈ V[t](X) | p_x p_y = y, p ∈ R_1) · ρ_{p_x p_y}(y | p ∈ R_1) dy
≤ 4t ∫_0^{1/9} e^(−ny/(16t)) · ρ_{p_x p_y}(y | p ∈ R_1) dy
= 4t ∫_0^{1/9} e^(−ny/(16t)) dF_{p_x p_y}(y | p ∈ R_1).

By Lemma 12 and Lemma 13,

∫_0^{1/9} e^(−ny/(16t)) dF_{p_x p_y}(y | p ∈ R_1) ≤ ∫_0^{1/9} e^(−ny/(16t)) dF_{p_x p_y}(y | p ∈ [0,1] × [0,1]) ≤ ∫_0^{1/9} e^(−ny/(16t)) log(1/y) dy,

and thus

Pr(p ∈ V[t](X) | p ∈ R_1) ≤ 4t ∫_0^{1/9} e^(−ny/(16t)) log(1/y) dy.

Lemma 15. ∫_0^{1/9} e^(−ny/(16t)) log(1/y) dy = O((t/n) log(n/t)).

Proof. Substituting z = ny/(16t) for y, we get

I = ∫_0^{1/9} e^(−ny/(16t)) log(1/y) dy = (16t/n) ∫_0^{n/(144t)} e^(−z) (log(n/(16t)) + log(1/z)) dz
≤ (16t/n) log(n/(16t)) ∫_0^∞ e^(−z) dz + (16t/n) ∫_0^1 e^(−z) log(1/z) dz,

where the second term uses that the integrand e^(−z) log(1/z) is non-positive for z ≥ 1. Since both ∫_0^∞ e^(−z) dz and ∫_0^1 e^(−z) log(1/z) dz are absolute constants, we conclude

∫_0^{1/9} e^(−ny/(16t)) log(1/y) dy = O((t/n) log(n/t)).
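Two ingredients above are easy to confirm numerically (our check, not the paper's): the product density of Lemma 13, equivalently the CDF Pr(p_x p_y ≤ y) = y(1 − log y) on the unit square, and the fact used in Lemma 15 that ∫_0^∞ e^(−z) log(1/z) dz is a finite constant (it equals Euler's constant γ ≈ 0.577).

```python
import math, random

# Lemma 13: for p uniform in [0,1]^2 the density of p_x p_y is log(1/y),
# so the CDF is F(y) = y(1 - log y).  Monte Carlo check at y = 0.1:
random.seed(2)
N, y0 = 200_000, 0.1
hits = sum(1 for _ in range(N) if random.random() * random.random() <= y0)
print(abs(hits / N - y0 * (1 - math.log(y0))) < 0.01)  # True

# The constant behind Lemma 15: integrate e^{-z} log(1/z) on (0, 40]
# with the midpoint rule (the log singularity at 0 is integrable).
steps, upper = 400_000, 40.0
h = upper / steps
gamma = sum(math.exp(-(i + 0.5) * h) * math.log(1 / ((i + 0.5) * h))
            for i in range(steps)) * h
print(round(gamma, 3))  # approx. 0.577
```

The second printed value approximates Euler's γ = 0.5772..., confirming the integral is an absolute constant.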
5. The Lower Bound of $\mathbb{E}\,\big|U^{[t]}(X)\big|$

We shall prove the lower bound on the size of the first $t$ convex shells for two special cases. In Subsection 5.1, we show an asymptotic lower bound of $2t\log n$ when the $n$ points in $X$ are sampled independently and uniformly from a parallelogram. Next, in Subsection 5.2, we prove a lower bound of $t\log(2n)$ when the points are sampled uniformly from a triangle. We need the following lemma.
Lemma 16 ([4, Section 3]). For all integers $r \ge 0$, $s \ge 1$ and for all $c \in (0, 1]$, we have
$$\int_0^1\!\!\int_0^1 (1 - cxy)^{n-s}\,(xy)^r \,\mathrm{d}x\,\mathrm{d}y = \frac{r!}{c^{r+1}} \cdot \frac{\log n}{n^{r+1}} + O\Big(\frac{1}{n^{r+1}}\Big)$$
as $n$ tends to infinity.

Without loss of generality, we may assume the parallelogram is the unit square $[0,1] \times [0,1]$. Given a point $p = (x_0, y_0) \in X$, we now compute the probability that it is on the first $t$ convex shells of $X$. For this purpose, we introduce the following definition.

Definition 5.
Given a point $p = (x_0, y_0)$ with $0 \le x_0 \le 1/2$ and $0 \le y_0 \le 1/2$, the dividing line is defined to be
$$\ell:\ \frac{x}{2x_0} + \frac{y}{2y_0} = 1.$$
The line divides the unit square into a triangle of area $2x_0y_0$ and a pentagon of area $(1 - 2x_0y_0)$. Notice that a sufficient condition for a point $p$ to be on the first $t$ convex shells is that there are no more than $(t-1)$ points in the triangular part. We thus have the following theorem.
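Before stating it, note that the sufficient condition above is easy to check computationally. The Python sketch below (our own illustration, written for the corner at the origin; the function names and the Monte Carlo wrapper are not part of the paper) tests the condition and uses it to produce a crude lower bound on the number of shallow points.

```python
import random

def on_first_t_shells(p, others, t):
    """Sufficient condition of Definition 5, for the corner (0, 0):
    p = (x0, y0) with 0 < x0, y0 <= 1/2 lies on the first t convex
    shells if at most (t - 1) of the other points fall in the triangle
    cut off by the dividing line x/(2*x0) + y/(2*y0) = 1."""
    x0, y0 = p
    if not (0 < x0 <= 0.5 and 0 < y0 <= 0.5):
        return False  # the condition as stated applies near this corner only
    in_triangle = sum(1 for (x, y) in others
                      if x / (2 * x0) + y / (2 * y0) <= 1)
    return in_triangle <= t - 1

def estimate_shallow_count(n, t, rng=random.Random(0)):
    """Monte Carlo: a crude lower bound on |U^[t]| from this condition."""
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    return sum(on_first_t_shells(p, [q for q in pts if q is not p], t)
               for p in pts)
```

Since the condition is only sufficient, the count it produces lower-bounds the number of points on the first $t$ shells, which is exactly how it is used in the proof below.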
Theorem 5.
For $n$ points independently and uniformly sampled from a square, the expected size of the first $t$ convex shells is at least $2t\log n$.

Proof.
$$\Pr\big(p \in U^{[t]}\big) \ge \Pr\big(\text{no more than } (t-1) \text{ points under the dividing line } \ell\big) = 4\int_0^{1/2}\!\!\int_0^{1/2} \sum_{i=0}^{t-1}\binom{n-1}{i}(2xy)^i(1-2xy)^{n-1-i}\,\mathrm{d}x\,\mathrm{d}y$$
$$= \sum_{i=0}^{t-1}\binom{n-1}{i}\int_0^{1/2}\!\!\int_0^{1/2} (2xy)^i(1-2xy)^{n-1-i}\,\mathrm{d}(2x)\,\mathrm{d}(2y) = \sum_{i=0}^{t-1}\binom{n-1}{i}\int_0^1\!\!\int_0^1 \Big(\frac{xy}{2}\Big)^i\Big(1-\frac{xy}{2}\Big)^{n-1-i}\,\mathrm{d}x\,\mathrm{d}y$$
$$= \sum_{i=0}^{t-1} 2^{-i}\binom{n-1}{i}\int_0^1\!\!\int_0^1 (xy)^i\Big(1-\frac{xy}{2}\Big)^{n-1-i}\,\mathrm{d}x\,\mathrm{d}y.$$
By Lemma 16 (with $c = 1/2$, $r = i$ and $s = i+1$), when $n \to \infty$, we have
$$\int_0^1\!\!\int_0^1 (xy)^i\Big(1-\frac{xy}{2}\Big)^{n-1-i}\,\mathrm{d}x\,\mathrm{d}y = i!\,2^{i+1}\,\frac{\log n}{n^{i+1}} + O\Big(\frac{1}{n^{i+1}}\Big).$$
Therefore, as $n \to \infty$,
$$\Pr\big(p \in U^{[t]}\big) \ge \sum_{i=0}^{t-1} 2^{-i}\binom{n-1}{i}\bigg[i!\,2^{i+1}\,\frac{\log n}{n^{i+1}} + O\Big(\frac{1}{n^{i+1}}\Big)\bigg] = \sum_{i=0}^{t-1}\bigg[2 \cdot \frac{(n-1)!}{(n-1-i)!\,n^i} \cdot \frac{\log n}{n} + O\Big(\frac{1}{n}\Big)\bigg] \approx \sum_{i=0}^{t-1} \frac{2\log n}{n} + O\Big(\frac{t}{n}\Big) = \frac{2t\log n}{n} + O\Big(\frac{t}{n}\Big).$$
Finally, the expected number of points on the first $t$ convex shells is
$$\mathbb{E}\,\big|U^{[t]}\big| = \sum_{p \in X} \Pr\big(p \in U^{[t]}\big) \ge 2t\log n.$$

Theorem 6.
For $n$ points independently and uniformly sampled from a triangle, the expected size of the first $t$ convex shells is at least $t\log(2n)$ as $n$ tends to infinity.

Proof. The square is divided into two congruent right triangles by a diagonal. Let $X$ denote the set of points in the first triangle, and $X'$ denote those in the second. By Lemma 4, we know that $U^{[t]}(X \cup X') \subseteq U^{[t]}(X) \cup U^{[t]}(X')$, and by the independent choice of points,
$$\mathbb{E}_{X \cup X'}\big|U^{[t]}(X \cup X')\big| \le \mathbb{E}_X \mathbb{E}_{X'}\big|U^{[t]}(X) \cup U^{[t]}(X')\big| \le \mathbb{E}_X\big|U^{[t]}(X)\big| + \mathbb{E}_{X'}\big|U^{[t]}(X')\big|$$
for any $X$ and $X'$. Now we consider the special case where $|X| = |X'| = n$. Since the two triangles are congruent and $|X| = |X'| = n$, it holds that $\mathbb{E}_X|U^{[t]}(X)| = \mathbb{E}_{X'}|U^{[t]}(X')|$. It follows from Theorem 5, applied to the $2n$ points in the square, that
$$\mathbb{E}_X\big|U^{[t]}(X)\big| \ge \frac{1}{2}\,\mathbb{E}_{X\cup X'}\big|U^{[t]}(X \cup X')\big| \ge \frac{1}{2} \cdot 2t\log(2n) = t\log(2n).$$

As proved in Lemma 2, we have $U^{[t]}(X) \subseteq V^{[t+1]}(X)$ for any planar point set. This indicates that for a random point set $X$, any lower bound on $\mathbb{E}\,|U^{[t]}(X)|$ is automatically a lower bound on $\mathbb{E}\,|V^{[t+1]}(X)|$.
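For small inputs, the convex layers $V^{[t]}(X)$ appearing in these bounds can be computed directly by repeated convex-hull peeling. The following is a minimal Python sketch of our own (not an algorithm from the paper); it uses Andrew's monotone chain for the hull and peels only hull vertices at each round.

```python
def cross(o, a, b):
    # twice the signed area of the triangle (o, a, b)
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def hull_vertices(pts):
    """Vertices of the convex hull (Andrew's monotone chain);
    collinear boundary points are not counted as vertices."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts
    def chain(points):
        h = []
        for p in points:
            while len(h) >= 2 and cross(h[-2], h[-1], p) <= 0:
                h.pop()
            h.append(p)
        return h
    lower, upper = chain(pts), chain(pts[::-1])
    return lower[:-1] + upper[:-1]

def convex_layers(pts):
    """Peel hull vertices repeatedly; the first t layers together form V^[t]."""
    remaining, layers = set(pts), []
    while remaining:
        layer = hull_vertices(list(remaining))
        layers.append(layer)
        remaining -= set(layer)
    return layers
```

For instance, a 3x3 integer grid peels into layers of sizes 4, 4 and 1: the four corners, then the four edge midpoints, then the center.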
6. Applications
In this section, we illustrate how our results in Sections 3 and 4 help in the average-case analysis of two partial enclosing problems. The objective is to enclose $(n-t)$ of the given $n$ points in $X$ by a specified shape, such that the area of the shape is minimized. This kind of problem is known as partial shape fitting, where a predefined shape is to be detected on a given point set, only $(n-t)$ points of which are supposed to be in this shape. Such studies include [7, 20, 21, 22, 23], and in [7, 21, 22] the other $t$ points are regarded as outliers.

The average-case complexity is another important measure besides the worst-case complexity. As pointed out in [1], average-case analysis is desirable because the best-case and worst-case performances of an algorithm usually differ greatly, especially for output-sensitive algorithms. In this situation, the average-case complexity seems to be a more accurate and fair measure of an algorithm's performance. Before conducting the analysis, a probability distribution must be assigned to the input point set. We choose the uniform distribution in a convex polygon, which has been widely adopted in the computational geometry community [4, 8, 9, 5, 10, 2].

The algorithm given in [11] studies how to find a parallelogram with the minimum area that encloses $(n-t)$ of the $n$ given points. The time complexity of the algorithm is $O(t^2\tau^2 + n\log n)$, where $\tau$ is the number of $p \in X$ for which there exists a $q \in X$ such that there are no more than $t$ points on either side of the line $pq$. By this definition, we know that this point set is exactly $U^{[t]}(X)$ and $\tau = \big|U^{[t]}(X)\big|$. In the worst case, we have $\big|U^{[t]}(X)\big| \le n$ and the worst-case time complexity is thus $O(n^2t^2 + n\log n)$.
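The quantity $\tau$ can be computed by brute force in $O(n^3)$ time for small inputs. A sketch of our own follows, reading "no more than $t$ points on either side" as: at least one open side of the line $pq$ contains at most $t$ points (an interpretation, since the paper does not spell this out).

```python
def cross(o, a, b):
    # twice the signed area of the triangle (o, a, b)
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def tau(points, t):
    """Count the points p for which some q in the set leaves at most t
    points strictly on one side of the line pq (brute force, O(n^3))."""
    count = 0
    for p in points:
        for q in points:
            if p == q:
                continue
            left = sum(1 for r in points
                       if r != p and r != q and cross(p, q, r) > 0)
            right = sum(1 for r in points
                        if r != p and r != q and cross(p, q, r) < 0)
            if min(left, right) <= t:
                count += 1
                break  # one witness q suffices for this p
    return count
```

On four unit-square corners plus the center, the corners qualify already for $t = 0$ (an edge line leaves zero points on one side), while the center needs $t = 1$.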
However, on average, we have
$$\mathbb{E}\Big[O\big(t^2\big|U^{[t]}(X)\big|^2 + n\log n\big)\Big] \le \mathbb{E}\Big[O\big(nt^2\big|U^{[t]}(X)\big| + n\log n\big)\Big] \le O\big(knt^3\log n + n\log n\big)$$
when $X$ is uniformly sampled from a $k$-gon. When $t$ is between $\Omega\big((n/k)^{1/3}\big)$ and $O\big(\frac{n}{k\log n}\big)$, the average-case complexity is smaller than the worst-case complexity. This explains why in many cases the actual runtime of the algorithm is faster than the worst-case complexity suggests.

Another application of our result is the algorithm for the minimum enclosing convex hull. The problem is as follows. Let $X$ be a set of $n$ points in $\mathbb{R}^2$. The task is to find a subset $X' \subset X$ with $|X'| = t$ such that the area of the convex hull of $X \setminus X'$ is minimized. In [7], Atanassov et al. provide an elegant solution to this problem with time complexity $O\big(n\log n + \binom{4t}{t}(3t)^t \,\big|V^{[t]}(X)\big|\big)$. In the worst case, $\big|V^{[t]}(X)\big| = n$, which happens when $X$ has at most $t$ layers. For the average case, Theorem 4 implies a time complexity of $O\big(n\log n + \binom{4t}{t}(3t)^t \cdot kt\log\frac{n}{t}\big)$ when $X$ is uniformly distributed in a convex $k$-gon. This is better when $t$ is between $\Omega\big(\frac{\log n}{k}\big)$ and $O\big(\frac{n}{k\log n}\big)$.

References

[1] R. A. Dwyer, Average-case analysis of algorithms for convex hulls and Voronoi diagrams, Citeseer, 1988.
[2] S. Har-Peled, On the expected complexity of random convex hulls, arXiv preprint arXiv:1111.5340 (2011).
[3] I. Hueter, The convex hull of a normal sample, Advances in Applied Probability 26 (4) (1994) 855–875.
[4] F. Affentranger, J. A. Wieacker, On the convex hull of uniform random points in a simple d-polytope, Discrete & Computational Geometry 6 (3) (1991) 291–305.
[5] R. A. Dwyer, On the convex hull of random points in a polytope, Journal of Applied Probability 25 (4) (1988) 688–699.
[6] K. Dalal, Counting the onion, Random Structures & Algorithms 24 (2) (2004) 155–165.
[7] R. Atanassov, P. Bose, M. Couture, A. Maheshwari, P. Morin, M. Paquette, M. Smid, S. Wuhrer, Algorithms for optimal outlier removal, Journal of Discrete Algorithms 7 (2) (2009) 239–248.
[8] I. Bárány, et al., Sylvester's question: The probability that n