Algorithms for Colourful Simplicial Depth and Medians in the Plane
aa r X i v : . [ c s . C G ] A ug ALGORITHMS FOR COLOURFUL SIMPLICIAL DEPTH AND MEDIANSIN THE PLANE
OLGA ZASENKO AND TAMON STEPHEN
Abstract.
The colourful simplicial depth (CSD) of a point x ∈ R relative to a configu-ration P = ( P , P , . . . , P k ) of n points in k colour classes is exactly the number of closedsimplices (triangles) with vertices from 3 different colour classes that contain x in their con-vex hull. We consider the problems of efficiently computing the colourful simplicial depth ofa point x , and of finding a point in R , called a median , that maximizes colourful simplicialdepth.For computing the colourful simplicial depth of x , our algorithm runs in time O ( n log n + kn )in general, and O ( kn ) if the points are sorted around x . For finding the colourful median,we get a time of O ( n ). For comparison, the running times of the best known algorithm forthe monochrome version of these problems are O ( n log n ) in general, improving to O ( n ) ifthe points are sorted around x for monochrome depth, and O ( n ) for finding a monochromemedian. Introduction
The simplicial depth of a point x ∈ R relative to a set P of n data points is exactlythe number of simplices (triangles) formed with the points from P that contain x in theirconvex hull. A simplicial median of the set P is any point in R which is contained in themost triangles formed by elements of P , i.e. has maximum simplicial depth with respect to P . Here we consider a set P that consists of k colour classes P , . . . , P k . The colourfulsimplicial depth of x with respect to configuration P is the number of triangles with verticesfrom 3 different colour classes that contain x . A colourful simplicial median of a configuration P = ( P , P , . . . , P k ) is any point in the convex hull of P with maximum colourful simplicialdepth.The monochrome simplicial depth was introduced by Liu [Liu90]. Up to a constant, it canbe interpreted as the probability that x is in the convex hull of a random simplex generatedby P . The colourful version, see [DHST06], generalizes this to selecting points from k distributions. Then medians are central points which are in some sense most representativeof the distribution(s). Our objective is to find efficient algorithms for finding both thecolourful simplicial depth of a given point x with respect to a configuration, and a colourfulsimplicial median of a configuration.1.1. Background.
Both monochrome and colourful simplicial depth extend to R d and arenatural objects of study in discrete geometry. For more background on simplicial depth andcompeting measures of data depth, see [Alo06] and [FR05]. Monochrome depth has seen aflurry of activity in the past few years, most notably relating to the First Selection Lemma ,which is a lower bound for the depth of the median, see e.g. [MW14].The colourful setting for simplicial depth is suggested by B´ar´any’s approach [B´ar82] toproving a colourful version of Carath´eodory’s theorem. Deza et al. [DHST06] formalized thenotion and considered bounds for the colourful depth of points in the intersection of the onvex hulls of the colours. Among the recent work on colourful depth are proofs of thelower [Sar15] and upper [ABP +
16] bounds conjectured by Deza et al., with the latter resultshowing beautiful connections to Minkowski sums of polytopes.The monochrome simplicial depth can be computed by enumerating simplices, but ingeneral dimension, it is quite challenging to compute it more efficiently [Alo06], [CO01],[FR05]. Several authors have considered the two-dimensional version of the problem, includ-ing Khuller and Mitchell [KM90], Gil, Steiger and Wigderson [GSW92] and Rousseeuw andRuts [RR96]. Each of these groups produced an algorithm that computes the monochromedepth in O ( n log n ) time, with sorting the input as the bottleneck. If the input points aresorted, these algorithms take linear time.For simplicial medians, Khuller and Mitchell [KM90], and Gil, Steiger, and Wigder-son [GSW92] considered an in-sample version of a simplicial median, that, they looked fora point from P with maximum simplicial depth. However, we consider a simplicial median to be any point x ∈ R maximizing the simplicial depth. Rousseeuw and Ruts [RR96] foundan algorithm to compute the simplicial median in O ( n log n ) time, Aloupis et al. [ALST03]improved this to O ( n ). This is arguably as good as should be expected, following the obser-vation of Lemma 3.1 in Section 3.1 that shows that there are in some sense Θ( n ) candidatepoints for the location of the colourful median. Remark 1.1.
The two groups [KM90, GSW92] who studied the in-sample median computedthe simplicial depth of each point in the data set P in total O ( n ) time. Organization and Main Results.
In Section 2, we develop an algorithm for comput-ing colourful simplicial depth that runs in O ( n log n + kn ) time. This retains the O ( n log n )asymptotics of the monochrome algorithms when k is fixed. As in the monochrome case,sorting the initial input is a bottleneck, and the time drops to O ( kn ) if the input is sortedaround x . In this case, for fixed k , it is a linear time algorithm.In Section 3, we turn our attention to computing a colourful simplicial median. We developan algorithm that does this in O ( n ) time using a topological sweep. This is independentof k and matches the running time from the monotone case. Section 4 contains conclusionsand discussion about future directions.2. Computing Colourful Simplicial Depth
Preliminaries.
We consider a family of sets P , P , . . . , P k ⊆ R , k ≥
3, where each P i consists of the points of some particular colour i . Refer to the j th element of P i as P ij .We generally use superscripts for colour classes, while subscripts indicate the position in thearray. We will sometime perform arithmetic operations on the subscripts, in which case theindices are taken modulo the size of the array i.e. (mod n i ).We denote the union of all colour sets by P : P = k S i =1 P i . The total number of points is n , where | P i | = n i , k P i =1 n i = n . We assume that points of P S { x } are in general position toavoid technicalities. Without loss of generality, we can take x = ~
0, the zero vector.
Definition 2.1. A colourful triangle is a triangle with one vertex of each colour, i.e. it isa triangle whose vertices v , v , v are chosen from distinct sets P i , P i , P i , where i i = i , i ; i = i . efinition 2.2. The colourful simplicial depth ˆ D ( x, P ) of a point x relative to the set P in R is the number of colourful triangles containing x in their convex hull. We reserve D ( x, P ) for the (monochrome) simplicial depth, which counts all triangles from P regardless of thecolours of their vertices. x Figure 1.
A configuration P of 8 points in R surrounding a point x withˆ D ( x, P ) = 6. Remark 2.3.
We are checking containment in closed triangles. With our general positionassumption, this will not affect the value of ˆ D ( x, P ) . It is more natural to consider closedtriangles than open triangles in defining colourful medians; the open triangles version of thisquestion may also be interesting. Throughout the paper we work with polar angles θ ij formed by the data points P ij anda fixed ray from x . We remark that simplicial containment does not change as points aremoved on rays from x , see for example [ZS00]. Thus we can ignore the moduluses of the P ij ,and work entirely with the θ ij , which lie on the unit circle C with x as its origin. We will attimes abuse notation, and not distinguish between P ij and θ ij .Note that the ray taken to have angle 0 is arbitrary, and may be chosen based on anunderlying coordinate system if available, or set to the direction of the first data point P .We can sort the input by polar angle, in other words, we can order the points around x .(Perhaps it is naturally presented this way.) We reduce the θ ij to lie in the range [0 , π ).The antipode of some point α on the unit circle is ¯ α = ( α + π ) mod 2 π . A key fact incomputing CSD is that a triangle △ abc does not contain x if and only if the correspond-ing polar angles of points a , b and c lie on a circular arc of less than π radians. This isillustrated in Fig. 2, and is equivalent to the following lemma, stated by Gil, Steiger andWigderson [GSW92]: Lemma 2.4.
Given points a , b , c on the unit circle C centred at x , let ¯ a be antipodal to a . Then △ abc contains x if and only if ¯ a is contained in the minor arc (i.e. of at most π radians) with endpoints b and c . Outline of Strategy.
Recall that we denote the ordinary and colourful simplicialdepth by D ( x, P ) and ˆ D ( x, P ) respectively. We can compute ˆ D ( x, P ) by first computing D ( x, P ) and then removing all triangles that contain less than three distinct colours. To thisend, we denote the number of triangles with at least two vertices of colour i as D i ( x, P ).When x and P are clear from the context, we will abbreviate these to D , ˆ D and D i .Since we can compute D ( x, P ) efficiently using the algorithms mentioned in the introduc-tion [GSW92], [KM90], [RR96], the challenge is to compute D i ( x, P ) for each i = 1 , , . . . , k . bc ¯ aa Figure 2.
Antipode ¯ a falls in the minor arc between b and c and, therefore,the triangle △ abc contains x .Then we conclude ˆ D ( x, P ) = D ( x, P ) − k P i =1 D i ( x, P ). To compute D i efficiently for eachcolour i , we walk around the unit circle tracking the minor arcs between pairs of points ofcolour i , and the number of antipodes between them. We do this is in linear time in n bymoving the front and back of the interval once around the circle, and adjusting the num-ber of relevant antipodes with each move. This builds on the approach of Gil, Steiger andWigderson [GSW92] for monochrome depth. Remark 2.5.
When computing D i , we count antipodes of all k colours; the triangles withthree vertices of colour i will be counted three times: △ abc , △ bca and △ cab . Thus thequantity obtained by this count is in fact D i ∗ := D i + 2 k P i =1 D ( x, P i ) . We separately compute k P i =1 D ( x, P i ) , allowing us to correct for the overcounting at the end. Data Structures and Preprocessing.
We begin with the arrays θ i of polar angles,which we sort if necessary. All elements in k S i =1 θ i are distinct due to the general positionrequirement. By construction we have:(1) 0 ≤ θ i < θ i < . . . < θ in i − < π, for all 1 ≤ i ≤ k . Let ¯ θ i be the array of antipodes of θ i , also sorted in ascending order. We generate ¯ θ i byfinding the first θ ij ≥ π , moving the part of the array that begins with that element to thefront, and hence the front of the original array to the back; π is subtracted from the elementsmoved to the front and added to those moved to the back. This takes linear time.We merge all ¯ θ i into a common sorted array denoted by A . Now we have all antipodesordered as if we were scanning them in counter-clockwise order around the circle C with origin x . Let us index the n elements of A starting from 0. Then, for each colour i = 1 , . . . , k , wemerge A and θ i into a sorted array A i . Once again, this corresponds to a counter-clockwiseordering of data points around C .While building A i , we associate pointers from the elements of array θ i to the correspondingposition (index) in A i . This is done by updating the pointers whenever a swap occurs duringthe process of merging the arrays. Denote the index of some θ ij in A i by p ( θ ij ). Then thenumber of the antipodes that fall in the minor arc between two consecutive points θ ij and θ ij θ il ( i,j ) ¯ θ ij θ il ( i,j )+1 Figure 3.
Index l ( i, j ) and index ( l ( i, j ) + 1) θ ij +1 on C is (cid:0) p (cid:0) θ ij +1 (cid:1) − p (cid:0) θ ij (cid:1) − (cid:1) , if p (cid:0) θ ij (cid:1) < p (cid:0) θ ij +1 (cid:1) , or (cid:0) n + n i − p (cid:0) θ ij (cid:1) + p (cid:0) θ ij +1 (cid:1) − (cid:1) ,if p (cid:0) θ ij (cid:1) > p (cid:0) θ ij +1 (cid:1) . Note that p (cid:0) θ ij (cid:1) is never equal to p (cid:0) θ ij +1 (cid:1) .Now, for each point θ ij , we find the index l ( i, j ) in the corresponding array θ i suchthat ∠ θ ij , x, θ il ( i,j ) < π and ∠ θ ij , x, θ il ( i,j )+1 > π (Fig. 3). Thus the sequence of points θ ij , θ ij +1 , . . . , θ il ( i,j ) is maximal on an arc shorter than π . Viewing the minor arc betweentwo points as an interval, the intervals with left endpoint θ ij and right end point from thissequence overlap and can be split into small disjoint intervals as follows:(2) (cid:2) θ ij , θ it (cid:1) = t [ h = j +1 (cid:2) θ ih − , θ ih (cid:1) , where t = j + 1 , . . . , l ( i, j ) . Computing D i ∗ . Let us the denote the count of the antipodes within the minor arcbetween a and b by c ( a, b ). Then D i ∗ can be written as follows:(3) D i ∗ = n i − X j =0 l ( i,j ) X t = j +1 c (cid:0) θ ij , θ it (cid:1) . Note that index t is taken modulo n i . From (2) we have:(4) c (cid:0) θ ij , θ it (cid:1) = t X h = j +1 c (cid:0) θ ih − , θ ih (cid:1) , for t = j + 1 , . . . , l ( i, j ) . Due to (3) and (4), we have:(5) D i ∗ = n i − X j =0 l ( i,j ) X t = j +1 t X h = j +1 c (cid:0) θ ih − , θ ih (cid:1) . Let C ih = c (cid:0) θ ih − , θ ih (cid:1) , | C i | = n i . Then (5) can be rewritten as:(6) D i ∗ = n i − X j =0 l ( i,j ) X t = j +1 t X h = j +1 C ih . Let us create an array of prefix sums: S i , where S it = P th =0 C ih , | S i | = n i . This array canbe filled in O ( n i ) time and proves to be very useful when we need to calculate a sum of theelements of C i between two certain indices. In fact, such sum can be obtained in constanttime using the elements of array S i : t X h = j +1 C ih = S it − S ij , if t ≥ j + 1 , j = n i − ,S in i − + S it − S ij , if t < j + 1 , j = n i − ,S it , if j = n i − . Combining (6) and (7), we get: D i ∗ = n i − X j =0 l ( i,j ) X t = j +1 S it − n i − X j =0 (( l ( i, j ) − j ) mod n i ) · S ij + , if t ≥ j + 1 or j = n i − , n i − P j =0 l ( i,j ) P t = j +1 S in i − , if t < j + 1 . (8)Let us create another array of prefix sums T i , where T ij = P jt =0 S it , | T i | = n i . This arrayis used to retrieve the sum of elements of S i between the indices j + 1 and l ( i, j ) in O (1)time:(9) l ( i,j ) X t = j +1 S it = T il ( i,j ) − T ij , if l ( i, j ) ≥ j + 1 , j = n i − ,T in i − + T il ( i,j ) − T ij , if l ( i, j ) < j + 1 , j = n i − ,T il ( i,j ) , if j = n i − . Also note that the index t runs from j + 1 to l ( i, j ). So t < j + 1 in (8) is only possibleif initially j + 1 > l ( i, j ) and we wrapped around the array. In other words, t < j + 1 isequivalent to j + 1 > l ( i, j ) and t = 0 , . . . , l ( i, j ). Hence: D i ∗ = n i − X j =0 (cid:0) T il ( i,j ) − T ij − (( l ( i, j ) − j ) mod n i ) · S ij (cid:1) + n i · T in i − + n i − P j =0 l ( i,j ) P t =0 S in i − , if l ( i, j ) < j + 1 , , otherwise . (10)After simplifying, we obtain: D i ∗ = n i − X j =0 (cid:0) T il ( i,j ) − T ij − (( l ( i, j ) − j ) mod n i ) · S ij (cid:1) + ( n i · (cid:0) T in i − + (( l ( i, j ) + 1) mod n i ) · S in i − (cid:1) , if l ( i, j ) < j + 1 , , otherwise . (11)2.5. Algorithm and Analysis.
First, we find all polar angles and their antipodes, whichtakes O ( n ) in total. Second, we sort the arrays of polar angles θ i and their correspondingantipodal elements ¯ θ i , which gives us O (cid:18) k P i =1 n i log n i (cid:19) . Third, we need to rotate ¯ θ i , so thatthey are in ascending order. This will take O ( n ) time. Then we compute for each i thenumber of triangles with all three vertices of colour i that contain x , i.e. D ( x, P i ), using thealgorithm of Rousseeuw and Ruts [RR96] for sorted data. This will run in O ( n i ), for each i , or lgorithm 1 CSD ( x , P ) Input: x , P = ( P , . . . , P k ) . Output: ^D ( x , P ) . Sum1 ← , Sum2 ← ; for i ← , k do for j ← , n i − θ ij ← polar angle of ( P ij − x ) mod π ; ¯ θ ij ← ( θ ij + π ) mod π ; end for Sort ( θ i ); ⊲ while permuting ¯ θ i Restore the order in ¯ θ i ; Sum1 ← Sum1 + D ( x , θ i ); ⊲ use the algorithm from [RR96] end for A ← Merge ( ¯ θ , . . . , ¯ θ k ); ⊲ A is sorted D ← D ( x , A ); ⊲ use the algorithm from [RR96] for i ← , k do B ← Merge ( A , θ i ); ⊲ update p ( θ ij ) the pointers of θ ij , ⊲ B stands for A i for j ← , n i do ⊲ j = j mod n i if p ( θ ij − ) < p ( θ ij ) then C j ← p ( θ ij ) − p ( θ ij − ) − ; ⊲ C = C i - array of antipodal counts else C j ← n + n i − p ( θ ij − ) + p ( θ ij ) − ; end if end for Find l ( i , ) using binary search in θ i ; S ← C ; T ← S ; ⊲ S = S i , T = T i - prefix sum arrays for j ← , n i − Find l ( i , j ); S j ← S j − + C j ; T j ← T j − + S j ; end for Sum2 ← Sum2 + D i ∗ ( x , P ) obtained from the formula (11); delete B , C , S , T ; end for return ^D ( x , P ) = D − ( Sum2 − ∗ Sum1 ) ; ⊲ Sum1 = k P i = D ( x , P i ) ( n ) in total. Hence lines 2-10 of the Algorithm 1 take O (cid:18) k P i =1 n i + k P i =1 n i log n i (cid:19) = O ( n log n )time to complete. This follows from the facts that k P i =1 n i = n and n log n is convex.To generate the sorted array A of antipodes, we merge the k single-coloured arrays usinga heap (following e.g. [CLR89]) in O ( n log k ) time. We need to compute the monochromedepth D ( x, P ) of x with respect to all points in P , regardless of colour. For this we canuse the sorted array of antipodes rather than sorting the original array. Thus we again usethe linear time monochrome algorithm [RR96] with x and A . Note that working with theantipodes is equivalent due to the fact that the simplicial depth of x does not change if werotate the system of data points around the centre x .After that, we execute a cycle of k iterations – one for each colour. It starts with merg-ing two sorted arrays A and θ i , which is linear in the size of arrays we are merging andtakes O (cid:18) k P i =1 ( n + n i ) (cid:19) = O ( kn ) in total. Filling the arrays C is linear. Since the l ( i, j )appear in sequence in the array θ i , we find the first one l ( i,
0) using a binary search thattakes O (log n i ), and O ( k log n ) in total. Then we find the rest of l ( i, j ) in O ( n ) time foreach i by scanning through the array starting from the element θ il ( i, . The remaining theoperations take constant time to execute. Therefore, total running time of Algorithm 1 is O ( n + n log n + n + n log k + kn + k log n + kn ) = O ( n log n + kn ). The n log n term cor-responds to the initial sorting of the data points, if they are presented in sorted order, therunning time drops to O ( kn ).As for space, arrays θ i , ¯ θ i and A take O (3 n ) = O ( n ) space in total. Note that merging k sorted arrays into A can be done in place [GG10]. At each iteration i , we create B of size O ( n + n i ), and C , S , T of size O ( n i ) each. Fortunately, we only need these arrays withinthe i th iteration, so we can delete them in the end (line 31 of the Algorithm 1) and reuse thespace freed. To store the indices l ( i, j ), we need O ( n ) space, which again can be reallocatedwhen i changes. Thus the amount of space used by our algorithm is O ( n ).An implementation of this algorithm is available on-line [Zas16]. Remark 2.6.
In Section 3, we will want to compute the colourful simplicial depth of thedata points themselves. This can be done by computing ˆ D ( x, P \ { x } ) and counting colourfulsimplices which have x as a vertex. This is the number of pairs of vertices of some othercolour, so for x of colour i ′ , it is k P i =1 i = i ′ n i · k P j = i +1 j = i ′ n j . A naive evaluation of this expression takes O ( k ) time. Instead we use prefix sum arrays to compute it in linear time. Let K i = i P j =1 j = i ′ n j .Then k P j = i +1 j = i ′ n j = K k − K i , and we compute k P i =1 i = i ′ n i · ( K k − K i ) to obtain the result. Array K takes O ( k ) space and takes linear time to fill out. Computing Colourful Simplicial Medians
Preliminaries.
Consider a family of sets P , P , . . . , P k ∈ R , k ≥
3, where each P i consists of the points of some particular colour i . Define n i = | P i | , for i = 1 , . . . , k . Let P e the union of all colour sets: P = k S i =1 P i . Recall that we denote the CSD of a point x ∈ R relative to P by ˆ D ( x, P ).Our objective is to find a point x inside the convex hull of P , denoted conv ( P ), maximizingˆ D ( x, P ). Call the depth of such a point ˆ µ ( P ). Let S be the set of line segments formed byall possible pairs of points ( A, B ), where A ∈ P i , B ∈ P j , i < j . We will refer to these as colourful segments . R R R G G B B a bce df gh i Figure 4.
A configuration P of 7 points in R , whose simplicial median hasdepth 6 and occurs at points B , G , B , G , d, f, g .The following lemma (from [ALST03]) is here adapted to a colourful setting: Lemma 3.1.
To find a point with maximum colourful simplicial depth it suffices to considerthe intersection points of the colourful segments in S .Proof. The segments of S partition conv ( P ) into cells of dimension 2, 1, 0 of constantcolourful simplicial depth [DHST06]. Consider a 2-dimensional cell. Let p be a point inthe interior of this cell, q a point on the interior of an edge and v a vertex, so that q and v belong to the same line segment (Fig. 5). Then the following inequality holds:ˆ D ( p, P ) ≤ ˆ D ( q, P ) ≤ ˆ D ( v, P ), since any colourful simplices containing p also contain q , andany containing q also contain v . (cid:3) Let col ( A ) denote the colour of a point A . We store the segments in S as pairs of points: s = ( A, B ), col ( A ) < col ( B ). It is helpful to view each segment as directed, i.e. a vector,with A as the tail and B as the head. Each segment s extends to a directed line h dividing R into two open half-spaces: s + and s − , where s + lies to the right of the vector s , and s − to the left (Fig. 6). We denote the set lines generated by segments by H , so that everysegment s ∈ S corresponds to a line h ∈ H .We call the intersection points of the segments in S vertices . Note that drawing thecolourful segments is equivalent to generating a rectilinear drawing of the complete graph K n Unlike the monochrome case, here some cells may not be convex, and some points of conv ( P ) may falloutside any cell. q v Figure 5.
An example of a cell
AB s + s − (a) A Bs + s − (b) Figure 6. s + and s − of the segment s = ( A, B )with a few edges removed (the monochrome ones). Thus, unless the points are concentratedin a single colour class, the Crossing Lemma (see e.g. [PRTT06]) shows that we will haveΘ( n ) vertices. More precisely, we have a rectilinear drawing of a complete k -partite graph;bounds for this are considered some graphs from this family by Gethner et al. [GHL +
16] andreferences therein.Computing the CSD of each of these points gives an O ( n log n ) algorithm for findinga simplicial median. To improve this, we follow Aloupis et al. [ALST03], and computethe monochrome simplicial depth of most vertices based on values of their neighbours andinformation about the half-spaces of local segments.Denote the number of points in s + that have colours different from the endpoints of s by r ( s ), and those in s − by l ( s ). Let r i ( s ) and l i ( s ) be the number of points of a colour i in s + and s − respectively. Let ¯ r i ( s ) and ¯ l i ( s ) be the number of points of all k colours except for thecolour i in s + and s − respectively. So for segment s = ( A, B ), we have quantities as follows¯ r col ( A ) ( s ) = k P i =1 ,i = col ( A ) r i ( s ), ¯ l col ( A ) ( s ) = k P i =1 ,i = col ( A ) l i ( s ). Then it follows: r ( s ) = ¯ r col ( A ) ( s ) − r col ( B ) ( s )and l ( s ) = ¯ l col ( A ) ( s ) − l col ( B ) ( s ). The quantities ¯ r col ( A ) ( s ) and ¯ l col ( A ) ( s ), r col ( B ) ( s ) and l col ( B ) ( s ),can be obtained as byproducts of an algorithm that computes half-space depth.The half-space depth HSD ( x, P ) of a point x relative to data set P is the smallest numberof data points in a half-plane through the point x [Tuk75]. An algorithm to compute half-space depth is described by Rousseeuw and Ruts [RR96], it runs in O ( | P | ) time when P issorted around x . It calculates the number of points k i in P that lie strictly to the left of eachline formed by x and some point P i , where x is the tail of the vector −→ xP i . Then the numberof points to the right −→ xP i is | P | − k i −
1. Algorithm 2 uses a version of the half-space depthalgorithm to produce the quantities r ( s ) and l ( s ) that are used by our algorithm. lgorithm 2 Preprocessing: Computing r ( s ) , l ( s ) Input: P , P , . . . , P k . Output: r ( s ) , l ( s ) for all s ∈ S. Construct List ( P i ) lists of points sorted around P i for eachi = , . . . , n − S ← ∅ , H ← ∅ ; for i ← , n − t ← pop ( List ( P i )); ¯ θ col ( P i ) = polar angles of List ( P i ) with NO points of colour col ( P i ); θ col ( P t ) = polar angles of List ( P i ) of colour col ( P t ) only; Compute ¯r col ( P i ) ( s ) , ¯l col ( P i ) ( s ) while running HSD ( P i , ¯ θ col ( P i ) ) [RR96]; for i ′ ← , k do if i ′ > col ( P i ) then Compute r i ′ ( s ) , l i ′ ( s ) during the execution of HSD ( P i , θ i ′ ) [RR96]; for j ← , n i ′ − s ← ( P i , P i ′ j ); ⊲ create a new segment ver ( s ) ← ∅ ; cross ( ver ( s )) ← ∅ ; h ← ( slope ( s ) , intercept ( s )); push ( S , s ); push ( H , h ); r ( s ) ← ¯r col ( P i ) ( s ) − r i ′ ( s ); l ( s ) ← ¯l col ( P i ) ( s ) − l i ′ ( s ); end for delete θ i ′ ; end if end for delete ¯ θ col ( P i ) ; end for The algorithm of [LC85] will, for each P i ∈ P , sort P \ { P i } around P i in Θ( | P | ) time. Inparticular, it assigns every point P i ∈ P a list of indices that determine the order of points P \ { P i } in the clockwise ordering around P i . Denote this by List ( P i ). These ideas allow usto compute r ( s ) and l ( s ) for every segment s . At every iteration i , we form arrays of sortedpolar angles ¯ θ col ( P i ) and θ i ′ . Together they take O (2 n ) = O ( n ) space.3.2. Computing a Median.
To compute the CSD of all vertices, we carry out a topologicalsweep (see e.g. [EG89]). We begin by extending the segments in S to a set of lines H . Theset V ∗ of intersection points of lines of H includes the Θ( n ) vertices V which are on theinterior of a pair of segments of S , points from P , and additional exterior intersections. Wecall points in V ∗ \ V phantom vertices .Call a line segment of any line in H between two neighbouring vertices, or a ray from avertex on a line that contains no further vertices an edge . A topological line is a curve in R that is topologically a line and intersects each line in H exactly once. We choose an initialtopological line to be an unbounded curve that divides R into two pieces such that all thefinitely many vertices in V lie on one side of the curve, by convention the right side. We callthis line the leftmost cut . We call a vertical cut the list ( c , c , . . . , c m ) of the m = | H | edges ntersecting a particular topological line. For each i , 1 ≤ i ≤ m − c i and c i +1 share a 2-cellin the complex induced by H . Two vertical cuts are illustrated in Fig. 7(a).The topological sweep begins with the leftmost cut and moves across the arrangement tothe right, crossing one vertex at a time. If two edges c i and c i +1 of the current cut have acommon right endpoint, we store the index i in the stack I . For example, in Figure 7(a), I = { , } . An elementary step is performed when we move to a new vertex by popping thestack I . In Figure 7(b), we have moved past the vertex v , a common right endpoint of c and c which is the intersection point of h and h . The updated stack is I = { , } . h h h h h c c c c c (a) The leftmost cut h h h h h vc c c c c (b) An elementary step in a topologicalsweep Figure 7.
We focus on the elementary steps, because at each step we can compute the CSD of thecrossed vertex. As it moves, the topological line retains the property that everything tothe left of it has already been swept over. That is, if we are crossing vertex v that belongsto segment s , every vertex of the line containing s on the opposite side of the topologicalline prior to crossing has already been swept. For each segment s ∈ S we store the lastprocessed vertex and denote it by ver ( s ), along with its CSD. Since every vertex lies at theintersection of two segments, we also store the crossing segment for s and ver ( s ), denote itby cross ( ver ( s )). Before starting the topological sweep, for each s ∈ S we assign ver ( s ) = ∅ ,and cross ( ver ( s )) = ∅ . After completing an elementary step where we crossed a vertex v thatlies at the intersection of s i and s j , we assign ver ( s i ) ← v , ver ( s j ) ← v , cross ( ver ( s i )) = s j , cross ( ver ( s j )) = s i .The topological sweep skips through phantom vertices (see Lemma 3.2), and computesthe CSD of vertices in P directly. We now explain how we process a non-phantom vertex v at an elementary step when we have an adjacent vertex already computed. Assume v isat the intersection of s i = −→ AB and s k = −→ EF , see Figure 8(a). Without loss of generalitywe take ver ( s i ) = p , where cross ( ver ( s i )) = s j . We view this elementary step as movingalong the segment s i from its intersection point with s j to the one with s k . Each intersectingsegment forms a triangle with every point strictly to one side. Thus when we leave segment s j = ( C, D ) behind, we exit as many colourful triangles that contain p as there are points onthe other side of s j of colours different from col ( C ) and col ( D ). When we encounter segment s k = ( E, F ), we enter the colourful triangles that contain v formed by s k and each point ofa colour different from col ( E ) and col ( F ) on the other side of s k . Let us denote the x and y coordinates of a point A by A.x and
A.y respectively. Now, to compute the CSD of v knowing the CSD of p , we execute Subroutine 3.When both ver ( s i ) = ∅ , ver ( s k ) = ∅ , i.e. vertex v is the first vertex to be discovered forboth segments (Fig. 8(b)), we execute CSD ( v, P ) to find the depth, and otherwise update vAA ′ B B ′ C D EFs i s j s k (a) Two adjacent vertices p and v and theircorresponding line segments. A colourful tri-angle △ CDA ′ contains p but not v , where col ( A ′ ) / ∈ { col ( C ) , col ( D ) } . Similarly, a colour-ful triangle △ EF B ′ contains v but not p ,where col ( B ′ ) / ∈ { col ( E ) , col ( F ) } . vA = p BEFs i s k (b) Here ver ( s i ) = ∅ , hence cross ( ver ( s i )) = ∅ ,and we can not runSubroutine 3. Figure 8.
Capturing a new vertex
Subroutine 3 Computing ˆ D ( v ) from ˆ D ( p ) Input: ^D ( p ) , p , v , s j = ( C , D ) , s k = ( E , F ) . Output: ^D ( v ) . if ( v . x − C . x )( D . y − C . y ) − ( v . y − C . y )( D . x − C . x ) < then ^D ( v ) ← ^D ( p ) − r ( s j ); else ^D ( v ) ← ^D ( p ) − l ( s j ); end if if ( p . x − E . x )( F . y − E . y ) − ( p . y − E . y )( F . x − E . x ) < then ^D ( v ) ← ^D ( v ) + r ( s k ); else ^D ( v ) ← ^D ( v ) + l ( s k ); end if in the usual way. Since once a segment s has ver ( s ) nonempty it cannot return to beingempty, we call CSD at most O ( n ) times. Lemma 3.2.
At phantom vertices, we do not need to compute the CSD or update values of ver ( s ) .Proof. First, notice that the simplicial median itself won’t occur at such a point as it iseither inside a cell or a colourful segment, as noted in Lemma 3.1. Then, notice that if weare moving along a segment of an extended line s outside the segment (i.e. extended from asegment, but not inside it) then we will next encounter either another phantom vertex or avertex from P . In the first case, we repeat the current situation and no computations areneeded, and in the second, we do not require prior vertex information to find the depth aswe have already computed the vertex directly from a call to CSD ( v, P ).On the other hand, if we are moving along s inside a segment, then the line we are crossingmeets s outside of its segment. This means that as we cross we are not entering or leavingany colourful triangles, so colourful depth remains constant and will not attain its maximum t the crossing. We do not need to update ver ( s ) since the previous point from V remainsthe relevant one from the perspective (cid:3) Indeed, it is possible to run the algorithm without generating or considering phantomvertices. We include them here as they provide a canonical starting point, the leftmost cut,and a clean definition of vertical cuts. Also, we wanted to address why they don’t affectthe calculation. In practice, including phantom vertices will increase the time and memoryrequirements by constant factor.3.3.
Running Time and Space Analysis.
Algorithm 4 is our main algorithm. First,it computes the half-space counts r ( s ) and l ( s ), which has a running time of O ( n ). Atthe same time, we initialize the structure S that contains the colourful segments, setting ver ( s ) = ∅ and cross ( ver ( s )) = ∅ for all s ∈ S . Note that these as well as H , List ( P i ), r ( s ), l ( s ) require O ( n ) storage.Sorting the lines in H according to their slopes while also permuting the segments in S takes O ( n log n ) time. We assume non-degeneracy and no vertical lines (these can use somespecial handling, see e.g. [EM88]). Computing the CSD of points where no previous vertexis available takes O ( n log n + kn ) total time. The topological sweep takes linear time in thenumber of intersection points of H, so O ( n ). We do not store all the vertices, but only oneper segment. Steps 15–29 (except for 18) in Algorithm 4 take O(1) time, including the callsto the Subroutine 3. As for step 18, it is executed at most n times; following its execution, ver ( s ) will be initiated for the two relevant segments. Therefore, the total time it will takeis O ( n log n + kn ). Hence overall our algorithm takes O ( n ) time and needs O ( n ) storage.Algorithm 4 returns a point that has maximum colourful simplicial depth along with itsCSD. It is simple to modify the algorithm to return a list of all such points if there is morethan one. We believe that maintaining such a list will not increase the required storage,i.e. it will contain O ( n ) points throughout the execution of the algorithm, but we haven’tproved this. 4. Conclusions and Questions
Our main result is an algorithm computing the colourful simplicial depth of a point x relative to a configuration P = (cid:0) P , P , . . . , P k (cid:1) of n points in R in k colour classes canbe solved in O ( n log n + kn ) time, or in O ( kn ) time if the input is sorted. If we assume, asseems likely, that we cannot do better without sorting the input, then for fixed k this resultis optimal up to a constant factor. It is an interesting question whether we can improve thedependence on k , in particular when k is large.Computing colourful simplicial depth in higher dimension is very challenging, in particularbecause there is no longer a natural (circular) order of the points. Non-trivial algorithms formonochrome depth do exist in dimension 3 [CO01], [GSW92], but we do not know of anynon-trivial algorithms for d ≥
4. Algorithms for monochrome and colourful depth in higherdimension are an appealing challenge. Indeed, for ( d + 1) colours in R d , it is not even clearhow efficiently one can exhibit a single colourful simplex containing a given point [BO97],[DHST08]. lgorithm 4 Computing ^ µ ( P ) Input: P , . . . , P k , S , H , r ( s ) , l ( s ) . Output: v , ^ µ ( P ) . Run Algorithm 2; ⊲ Compute r ( s ) , l ( s ) ; Sort H while permuting S; max ← for i ← , n − θ = polar angles of List ( P i ); ^D ( P i ) ← CSD ( P i , θ ); if d > max then max ← ^D ( P i ); median ← P i ; end if end for I ← ∅ ; Push common right endpoints of the edges of the leftmost cut onto I; while I = ∅ do ⊲ Start of the topological sweep. v ← pop ( I ); ⊲ v lies at the intersection of s i = ( A , B ) and s k = ( E , F ) if v lies in the interiors of s i and s k then if ver ( s i ) = ∅ & ver ( s k ) = ∅ then ^D ( v ) = CSD ( v , P ); else if ver ( s i ) = ∅ then ^D ( v ) ← Subroutine 3 ( ^D ( p ) , p , v , s j , s k ); ⊲ p = ver ( s i ) , s j = cross ( ver ( s i )) else ^D ( v ) ← Subroutine 3 ( ^D ( p ) , p , v , s j , s i ); ⊲ p = ver ( s k ) , s j = cross ( ver ( s k )) end if if ^D ( v ) > max then max ← ^D ( v ); median ← v ; end if ver ( s i ) ← v, ver ( s k ) ← v, cross ( ver ( s i )) ← s k , cross ( ver ( s k )) ← s i ; end if Push any new common right endpoints of the edges onto I; end while ⊲ End of the topological sweep. return ( median , max ) . cknowledgments This research was partially supported by an NSERC Discovery Grant to T. Stephen andby an SFU Graduate Fellowships to O. Zasenko. We thank A. Deza for comments on thepresentation.
References [ABP +
16] Karim Adiprasito, Philip Brinkmann, Arnau Padrol, Pavel Pat´ak, Zuzana Pat´akov´a, and RamanSanyal,
Colorful simplicial depth, Minkowski sums, and generalized Gale transforms , preprint.arXiv:1607.00347, 2016.[Alo06] Greg Aloupis,
Geometric measures of data depth , Data depth: robust multivariate analysis,computational geometry and applications, DIMACS Ser. Discrete Math. Theoret. Comput. Sci.,vol. 72, Amer. Math. Soc., Providence, RI, 2006, pp. 147–158.[ALST03] Greg Aloupis, Stefan Langerman, Michael Soss, and Godfried Toussaint,
Algorithms for bivariatemedians and a Fermat-Torricelli problem for lines , Comput. Geom. (2003), no. 1, 69–79.[B´ar82] Imre B´ar´any, A generalization of Carath´eodory’s theorem , Discrete Math. (1982), no. 2-3,141–152.[BO97] Imre B´ar´any and Shmuel Onn, Colourful linear programming and its relatives , Math. Oper. Res. (1997), no. 3, 550–567.[CLR89] Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest, Introduction to algorithms , TheMIT Press and McGraw-Hill Book Company, 1989.[CO01] Andrew Y. Cheng and Ming Ouyang,
On algorithms for simplicial depth , Proceedings of the 13thCanadian Conference on Computational Geometry, 2001, pp. 53–56.[DHST06] Antoine Deza, Sui Huang, Tamon Stephen, and Tam´as Terlaky,
Colourful simplicial depth , Dis-crete Comput. Geom. (2006), no. 4, 597–615.[DHST08] Antoine Deza, Sui Huang, Tamon Stephen, and Tam´as Terlaky, The colourful feasibility problem ,Discrete Appl. Math. (2008), no. 11, 2166–2177.[EG89] Herbert Edelsbrunner and Leonidas J. Guibas,
Topologically sweeping an arrangement , J. Com-put. System Sci. (1989), no. 1, 165–194, 18th Annual ACM Symposium on Theory of Com-puting (Berkeley, CA, 1986).[EM88] Herbert Edelsbrunner and Ernst Peter M¨ucke, Simulation of simplicity: a technique to copewith degenerate cases in geometric algorithms , Proceedings of the Fourth Annual Symposium onComputational Geometry (Urbana, IL, 1988), ACM, New York, 1988, pp. 118–133.[FR05] Komei Fukuda and Vera Rosta,
Data depth and maximum feasible subsystems , Graph theoryand combinatorial optimization, GERAD 25th Anniv. Ser., vol. 8, Springer, New York, 2005,pp. 37–67.[GG10] Viliam Geffert and Jozef Gajdoˇs,
Multiway in-place merging , Theoret. Comput. Sci. (2010),no. 16-18, 1793–1808.[GHL +
16] Ellen Gethner, Leslie Hogben, Bernard Lidick´y, Florian Pfender, Amanda Ruiz, and MichaelYoung,
On crossing numbers of complete tripartite and balanced complete multipartite graphs ,Journal of Graph Theory (2016), to appear.[GSW92] Joseph Gil, William Steiger, and Avi Wigderson,
Geometric medians , Discrete Math. (1992),no. 1-3, 37–51.[KM90] Samir Khuller and Joseph S. B. Mitchell,
On a triangle counting problem , Inform. Process. Lett. (1990), no. 6, 319–321.[LC85] D. T. Lee and Y. T. Ching, The power of geometric duality revisited , Inform. Process. Lett. (1985), no. 3, 117–122.[Liu90] Regina Y. Liu, On a notion of data depth based on random simplices , Ann. Statist. (1990),no. 1, 405–414.[MW14] Jir´ı Matousek and Uli Wagner, On Gromov’s method of selecting heavily covered points , DiscreteComput. Geom. (2014), no. 1, 1–33.[PRTT06] J´anos Pach, Radoˇs Radoiˇci´c, G´abor Tardos, and G´eza T´oth, Improving the crossing lemma byfinding more crossings in sparse graphs , Discrete Comput. Geom. (2006), no. 4, 527–552. RR96] Peter J Rousseeuw and Ida Ruts,
Bivariate location depth , Applied Statistics: Journal of theRoyal Statistical Society Series C (1996), no. 4, 516–526.[Sar15] Pauline Sarrabezolles, The colourful simplicial depth conjecture , J. Combin. Theory Ser. A (2015), 119–128.[Tuk75] John W. Tukey,
Mathematics and the picturing of data , Proceedings of the International Congressof Mathematicians (Vancouver, B. C., 1974), Vol. 2, Canad. Math. Congress, Montreal, Que.,1975, pp. 523–531.[Zas16] Olga Zasenko,
Colourful simplicial depth in the plane , 2016, Java code, available at https://github.com/olgazasenko/ColourfulSimplicialDepthInThePlane , accessed Augst25th, 2016.[ZS00] Yijun Zuo and Robert Serfling,