Computation of quantile sets for bivariate data
Andreas H. Hamel∗, Daniel Kostner†

Abstract
Algorithms are proposed for the computation of set-valued quantiles and the values of the lower cone distribution function for bivariate data sets. These new objects make data analysis possible involving an order relation for the data points in form of a vector order in two dimensions. The bivariate case deserves special attention since two-dimensional vector orders are much simpler to handle than such orders in higher dimensions. Several examples illustrate how the algorithms work and what kind of conclusions can be drawn with the proposed approach.
MSC 2010.
Keywords. cone distribution function, set-valued quantile, polyhedral set, Benson's algorithm, vector order, complete lattice

1 Introduction
Algorithms are presented for computing the quantile sets for bivariate random variables as well as the values of the corresponding lower cone distribution function in the presence of an order relation for their values. Such quantiles in the general multivariate case have been defined in [6] as a common generalization of univariate quantiles and Tukey's (halfspace) depth regions. Likewise, the lower cone distribution function is a common generalization of the univariate cumulative distribution function and Tukey's (halfspace) depth function. It can also be used as a ranking function for multi-criteria decision making problems [9]. Moreover, it was shown in [2] that set-valued quantiles and the lower cone distribution functions form Galois connections between complete lattices of sets and the interval [0,1] of real numbers. This generalizes a property which is straightforward and well-known in the univariate case, but has never been discussed with respect to depth functions and depth regions. It is also shown in [2] that set-valued quantiles characterize the distribution of a random set extension of the original random variable as well as its capacity functional.

∗ Free University of Bozen, Faculty of Economics and Management, University Square 1, 39031 Bruneck, Italy, [email protected]
† Free University of Bozen, Faculty of Economics and Management, University Square 1, 39100 Bozen, Italy, [email protected]

The bivariate case deserves special attention since convex cones in IR^2 have a very simple representation (every closed convex cone is polyhedral, i.e., the intersection of a finite number of halfspaces, this number being 1 or 2 in almost all cases) and, using this, the computations can be done much faster than in the general IR^d-valued case: there are polyhedral cones with arbitrarily many facets already in IR^3.

Algorithms for the bivariate location depth and corresponding depth regions were given in [17, 18].
Algorithms for depth functions and regions in general dimensions can be found, for example, in [5, 10, 11, 12, 15]. These references are mainly concerned with Tukey depth functions/regions, i.e., they do not take an order relation for the values of the random variable into account. The reader may compare [4], which is one of the very few references dealing with statistics for multivariate ordered data.

On the other hand, an order relation is often present and intuitive since decision makers have preferences or the impact of some events is clearly preferred over that of others. A few examples illustrating this feature are discussed below: hurricane scales, hail insurance and human resource management. It is beyond the scope of this paper to give an exhaustive statistical analysis of these examples; they are used as showcases for the type of conclusions which can be drawn with the approach initiated in [6], in particular what sets it apart from a mere depth function approach.

The paper is organized as follows. In the next section, vector orders in IR^2 are reviewed. Section 3 includes the definition of the main concepts and preparatory results. The algorithms are presented in Section 4, while in Section 5 several examples are discussed, including a new view on hurricane scales and the problem of finding best candidates for tasks/jobs.

2 Vector orders in IR^2

The basic assumption is that there is a preference relation for the two-dimensional data points in form of a vector preorder, i.e., a reflexive and transitive relation which is compatible with the algebraic operations in IR^2. Such vector preorders are in one-to-one correspondence with convex cones C ⊆ IR^2 including 0 ∈ IR^2 via

y ≤_C z ⇔ z − y ∈ C (2.1)

(see, for example, [1, Chap. 8]). A convex cone C ⊆ IR^2 is a set satisfying sC ⊆ C for all s > 0 and C + C ⊆ C. In the following, it is assumed that the cone C generating the preorder via (2.1) is closed. Such vector preorders in IR^2 have some special features compared to the case d >
2. Only the following cases are possible:

1. The cone is a linear subspace, which is either C = {0}, or C = IR^2, or a straight line C = L in IR^2.
2. The cone is a ray: C = {sb | s ≥ 0} for some b ∈ IR^2\{0}.
3. The cone is generated by two linearly independent vectors: C = {s_1 b^1 + s_2 b^2 | s_1, s_2 ≥ 0} for some b^1, b^2 ∈ IR^2 which are linearly independent.
4. The cone is a closed (homogeneous) halfspace: C = {s_1 b − s_2 b + s_3 z̄ | s_1, s_2, s_3 ≥ 0} for two linearly independent vectors b, z̄ ∈ IR^2\{0}.

There are interesting non-closed cones such as the lexicographic ordering cone even in IR^2; such cases require a different type of analysis (since the bipolar theorem does not apply) and therefore, they will not be considered here.

The case C = {0} leads to the Tukey halfspace depth function and regions; it is dealt with, e.g., already in [18]. While the case C = IR^2 is trivial, the case C = L will not be discussed in this paper. Finally, if C is a closed halfspace, there is w ∈ IR^2\{0} such that C = H^+(w) = {z ∈ IR^2 | w^T z ≥ 0}; in this case, the order ≤_{H^+(w)} is a total preorder (a reflexive, transitive relation such that either y ≤_{H^+(w)} z or z ≤_{H^+(w)} y or both) and the situation can be reduced to the univariate case.

In this paper, the main subject is the case of a closed convex pointed cone with non-empty interior, i.e., C is generated by two linearly independent vectors b^1, b^2 ∈ IR^2. Then C is the intersection of exactly two halfspaces, i.e., there are v^1, v^2 ∈ IR^2\{0}, linearly independent, such that

C = H^+(v^1) ∩ H^+(v^2)

(choose v^1 and v^2 orthogonal to b^2 and b^1, respectively, such that (v^1)^T b^1 ≥
0, (v^2)^T b^2 ≥ 0). In the following, it is assumed that b^1, b^2 as well as v^1, v^2 are known, and the two sets {b^1, b^2} and {v^1, v^2} are called a V-representation and an H-representation of C, respectively. The set

C^+ = {w ∈ IR^2 | ∀z ∈ C: w^T z ≥ 0} = {w ∈ IR^2 | w^T b^1 ≥ 0, w^T b^2 ≥ 0}

is called the (positive) dual cone of C (always a closed convex cone). Under the given assumptions,

C^+ = ∪_{t ≥ 0} tB^+, where B^+ = {sv^1 + (1 − s)v^2 | s ∈ [0,1]}

is a base of C^+, i.e., for each w ∈ C^+\{0} there are unique t > 0 and b ∈ B^+ such that w = tb.

3 Empirical cone distribution functions and quantiles
In this section, we give the definitions of lower cone distribution functions and associated quantiles for bivariate random variables in the case of a finite data set. Let X̃ = {x^1, x^2, ..., x^N} ⊆ IR^2 be a finite collection of data points which could be a sample of a random variable. The following definition provides the bivariate empirical counterpart to the concepts from [6]. Compare also [9] for the IR^d-valued case with applications to a multi-criteria decision making problem.

Definition 3.1
The functions F_{X̃,w}: IR^2 → [0,1] for w ∈ B^+ and F_{X̃,C}: IR^2 → [0,1] defined by

F_{X̃,w}(z) = (1/N) |{x ∈ X̃ | x ∈ z − H^+(w)}| and (3.1)
F_{X̃,C}(z) = min_{w ∈ B^+} F_{X̃,w}(z) = (1/N) min_{w ∈ B^+} |{x ∈ X̃ | x ∈ z − H^+(w)}| (3.2)

are called the empirical lower w-distribution function and the empirical lower C-distribution function, respectively, for the data set X̃. The functions w-depth(z; X̃) := N · F_{X̃,w}(z) and c-depth(z; X̃) := N · F_{X̃,C}(z) are called the w-location depth and the cone location depth for X̃.

The functions w-depth and c-depth can be interpreted as follows. For each point z ∈ IR^2, the w-location depth gives the number of data points which are dominated by z with respect to the total preorder generated by H^+(w), i.e., data points x ∈ X̃ satisfying w^T x ≤ w^T z. The cone location depth of z ∈ IR^2 gives the minimal number of data points which are dominated by z with respect to all total preorders generated by H^+(w) for w ∈ C^+. Thus, a point z ∈ IR^2 dominates at least c-depth(z; X̃) data points with respect to the total preorder generated by H^+(w) for all w ∈ C^+, i.e., no matter which weighted average with weights from C^+ is taken. Data points which are higher ranked than others are "deeper" in the sense that they improve with respect to more w's at the same time.

Proposition 3.2 (1) w^T y ≤ w^T z implies w-depth(y; X̃) ≤ w-depth(z; X̃);
(2) y ≤_C z implies c-depth(y; X̃) ≤ c-depth(z; X̃).

Proof.
This follows directly from the monotonicity property of F_{X̃,w} and F_{X̃,C}, respectively, in [6, Proposition 1 (b)]. □
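The two depth notions above can be made concrete in a few lines. The following sketch (our own illustration, not the algorithm of Section 4) works for the cone C = IR^2_+, whose dual cone is again IR^2_+ with base B^+ spanned by v^1 = (1, 0) and v^2 = (0, 1); the exact minimum over B^+ is approximated by sampling finitely many directions, and all function names are ours.

```python
# Sketch of Definition 3.1 for C = IR^2_+, i.e., C^+ = IR^2_+ with base B^+
# parametrized by w = (1-s)*v1 + s*v2, v1 = (1,0), v2 = (0,1), s in [0,1].

def w_depth(z, data, w):
    """Number of data points x with w^T x <= w^T z (w-location depth)."""
    wz = w[0] * z[0] + w[1] * z[1]
    return sum(1 for x in data if w[0] * x[0] + w[1] * x[1] <= wz)

def c_depth(z, data, num_dirs=201):
    """Approximate cone location depth: min of w-depth over sampled w in B^+."""
    depths = []
    for i in range(num_dirs):
        s = i / (num_dirs - 1)
        w = (1.0 - s, s)          # (1-s)*v1 + s*v2 with v1 = (1,0), v2 = (0,1)
        depths.append(w_depth(z, data, w))
    return min(depths)

data = [(1, 1), (2, 3), (3, 2), (4, 4)]
print(c_depth((4, 4), data))  # (4,4) dominates every point: depth 4
print(c_depth((1, 1), data))  # only itself: depth 1
```

For finite data the exact minimum is attained at one of finitely many critical directions (normals to difference vectors of data points), which is what the algorithms of Section 4 exploit; the sampled minimum here is only an upper bound in general.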
The following example shows that the cone location depth (as well as F_{X̃,C}) can be understood as a ranking function for the data points which reflects the order ≤_C. This seems to be very much in the spirit of Tukey's original work. This example also shows that points which are non-comparable with respect to the order ≤_C can have the same or very different cone location depths.

Example 3.3 For every data point in Figure 3.1, Tukey's (halfspace) depth HD and the cone location depth CD are computed. One may already realize that the cone location depth "follows the cone" (increases in directions in which the cone "opens") whereas the halfspace depth increases toward the center of the data cloud. This means that the cone location depth ranks the data points taking into account the order generated by the cone. This is a new feature not captured by depth functions.
Figure 3.1: C = IR^2_+, N = 20, ⌈Np⌉ = 4

Empirical quantiles are set-valued functions, i.e., they map into the power set P(IR^2), the set of all subsets of IR^2 including the empty set.

Definition 3.4
The empirical w-quantile function Q^-_{X̃,w}: [0,1] → P(IR^2) and the empirical C-quantile function Q^-_{X̃,C}: [0,1] → P(IR^2) associated to X̃ and C are defined by

Q^-_{X̃,w}(p) = {z ∈ IR^2 | F_{X̃,w}(z) ≥ p} and Q^-_{X̃,C}(p) = {z ∈ IR^2 | F_{X̃,C}(z) ≥ p},

respectively. The definitions of F_{X̃,w} and F_{X̃,C} immediately yield

Q^-_{X̃,w}(p) = {z ∈ IR^2 | |{x ∈ X̃ | x ∈ z − H^+(w)}| ≥ ⌈Np⌉} (3.3)
Q^-_{X̃,C}(p) = {z ∈ IR^2 | min_{w ∈ B^+} |{x ∈ X̃ | x ∈ z − H^+(w)}| ≥ ⌈Np⌉} (3.4)

where ⌈Np⌉ is the value of the ceiling function at Np defined by ⌈r⌉ = min{k ∈ IN | r ≤ k} (the least natural number which is greater than or equal to r ∈ IR). Clearly,

Q^-_{X̃,w}(p) = {z ∈ IR^2 | w-depth(z; X̃) ≥ ⌈Np⌉} and Q^-_{X̃,C}(p) = {z ∈ IR^2 | c-depth(z; X̃) ≥ ⌈Np⌉}.

Let Ỹ ⊆ IR be a finite univariate data set. We denote its empirical lower quantile by

q^-_{Ỹ}(p) = min{ȳ ∈ Ỹ | |{y ∈ Ỹ | y ≤ ȳ}| ≥ ⌈Np⌉}.

With this notation, one has

∀p ∈ (0,1): Q^-_{X̃,w}(p) = {z ∈ IR^2 | w^T z ≥ q^-_{w^T X̃}(p)} and (3.5)
∀p ∈ (0,1): Q^-_{X̃,C}(p) = ∩_{w ∈ B^+} {z ∈ IR^2 | w^T z ≥ q^-_{w^T X̃}(p)} (3.6)

as in the general (multi-dimensional) case (see [6, Proposition 6]).

Next, we formally state that at least one data point must be on the boundary of each w-quantile.

Proposition 3.5 If p ∈ [0,1], w ∈ B^+ and Q^-_{X̃,w}(p) ∉ {IR^2, ∅}, then there is x(w,p) ∈ X̃ such that

Q^-_{X̃,w}(p) = x(w,p) + H^+(w). (3.7)

Moreover, the following three conditions are equivalent for x̃ ∈ X̃:
(a) Q^-_{X̃,w}(p) = x̃ + H^+(w).
(b) One has

|{x ∈ X̃ | x ∈ x̃ − H^+(w)}| ≥ ⌈Np⌉, (3.8)
|{x ∈ X̃ | x ∈ x̃ − int H^+(w)}| < ⌈Np⌉. (3.9)

(c) q^-_{w^T X̃}(p) = w^T x̃.

Proof.
Since Q^-_{X̃,w}(p) is a closed halfspace with normal w, there is a point y ∈ IR^2 such that Q^-_{X̃,w}(p) = y + H^+(w). By (3.3), |{x ∈ X̃ | x ∈ y − H^+(w)}| ≥ ⌈Np⌉. If there were no data point on the boundary of this halfspace, one would even have |{x ∈ X̃ | x ∈ y − int H^+(w)}| ≥ ⌈Np⌉. Then, there would exist y' ∈ y − int H^+(w) with |{x ∈ X̃ | x ∈ y' − int H^+(w)}| ≥ ⌈Np⌉, hence y' ∈ Q^-_{X̃,w}(p), but y' ∉ y + H^+(w), a contradiction.

(a) ⇒ (b): If (a) is true, then x̃ ∈ Q^-_{X̃,w}(p) and (3.8) follows from (3.3). If (3.9) were not true, then, by a similar argument as in the first part of the proof, x̃ would not be in Q^-_{X̃,w}(p).

(b) ⇒ (a): Assume z ∈ x̃ + H^+(w). Then z − H^+(w) ⊇ x̃ − H^+(w), hence

|{x ∈ X̃ | x ∈ z − H^+(w)}| ≥ |{x ∈ X̃ | x ∈ x̃ − H^+(w)}| ≥ ⌈Np⌉,

which means z ∈ Q^-_{X̃,w}(p). Hence x̃ + H^+(w) ⊆ Q^-_{X̃,w}(p). Conversely, if z ∈ Q^-_{X̃,w}(p), then |{x ∈ X̃ | x ∈ z − H^+(w)}| ≥ ⌈Np⌉ by (3.8). This and (3.9) imply z ∉ x̃ − int H^+(w) (otherwise, z − H^+(w) ⊆ x̃ − int H^+(w), which leads to a contradiction). But then z ∈ x̃ + H^+(w), hence Q^-_{X̃,w}(p) ⊆ x̃ + H^+(w).

(a) ⇔ (c): Both directions follow from (3.5). □

For the following result, some notation is needed which is also used in the remainder of the paper. For w ∈ B^+, we define the following sets:

X̃^=(w,p) = {x ∈ X̃ | w^T x = w^T x(w,p)},
X̃^≤(w,p) = {x ∈ X̃ | w^T x ≤ w^T x(w,p)}.

The set X̃^=(w,p) includes all data points on the boundary of the shifted halfspace Q^-_{X̃,w}(p) = x(w,p) + H^+(w), whereas X̃^≤(w,p) includes the data points in x(w,p) − H^+(w), with x(w,p) ∈ X̃ from (3.7) in both cases.
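Formula (3.5) reduces the membership test z ∈ Q^-_{X̃,w}(p) to a univariate quantile computation for the projected sample w^T X̃. A minimal sketch (our own code and names, not the paper's implementation):

```python
# Membership in the empirical w-quantile via formula (3.5): z belongs to
# Q^-_{X,w}(p) iff w^T z >= q, where q is the univariate lower p-quantile
# of the projections {w^T x | x in X}.

import math

def lower_quantile(values, p):
    """Empirical lower p-quantile: smallest v with #{y <= v} >= ceil(N*p)."""
    ordered = sorted(values)
    k = math.ceil(len(ordered) * p)
    return ordered[k - 1]

def in_w_quantile(z, data, w, p):
    """Membership test for Q^-_{X,w}(p) via the projected univariate quantile."""
    proj = [w[0] * x[0] + w[1] * x[1] for x in data]
    q = lower_quantile(proj, p)
    return w[0] * z[0] + w[1] * z[1] >= q

data = [(1, 1), (2, 3), (3, 2), (4, 4)]
w = (1.0, 0.0)                      # projection onto the first coordinate
print(in_w_quantile((2.5, 0.0), data, w, 0.5))  # projections 1,2,3,4; q = 2 -> True
print(in_w_quantile((1.5, 9.9), data, w, 0.5))  # 1.5 < 2 -> False
```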
Standing assumption.
In the remainder of the paper, it is assumed that C^+ is generated by the two linearly independent vectors v^1, v^2 ∈ C^+, i.e., {v^1, v^2} is a V-representation of C^+. In this case, the set B^+ = {(1 − s)v^1 + sv^2 | s ∈ [0,1]} is a base of C^+. Equivalently, C is generated by two linearly independent vectors b^1, b^2 ∈ C, i.e., {b^1, b^2} is a V-representation of C.

Proposition 3.6 If p ∈ (0,1] and Q^-_{X̃,C}(p) ∉ {IR^2, ∅}, then Q^-_{X̃,C}(p) can be represented as

Q^-_{X̃,C}(p) = ∩_{w ∈ W(p)} Q^-_{X̃,w}(p)

such that W(p) ⊆ B^+ and |X̃^=(w,p)| ≥ 2 and |X̃^≤(w,p)| ≥ ⌈Np⌉ + 1 for all w ∈ W(p)\{v^1, v^2}. In particular, W(p) is a finite set.

Proof. First, take w ∈ B^+\{v^1, v^2}. By Proposition 3.5, |X̃^=(w,p)| ≥ 1 and

X̃^=(w,p) = {x ∈ X̃ | w^T x = w^T x(w,p)} = {x ∈ X̃ | Q^-_{X̃,w}(p) = x + H^+(w)}.

Since w ∉ {v^1, v^2}, there is s ∈ (0,1) such that w = (1 − s)v^1 + sv^2. Assume that |X̃^=(w,p)| = 1, i.e., X̃^=(w,p) = {x(w,p)}. Then, there are ε_1, ε_2 > 0 such that s_1 := s + ε_1 ∈ (0,1), s_2 := s − ε_2 ∈ (0,1) and the two halfspaces x(w,p) + H^+(w(s_1)) and x(w,p) + H^+(w(s_2)) contain exactly the same set of data points as Q^-_{X̃,w}(p) = x(w,p) + H^+(w), where w(s_i) = (1 − s_i)v^1 + s_i v^2 ∈ B^+, i = 1, 2. Hence Q^-_{X̃,w(s_i)}(p) = x(w,p) + H^+(w(s_i)), i = 1, 2, and

x(w,p) + H^+(w) ⊋ (x(w,p) + H^+(w(s_1))) ∩ (x(w,p) + H^+(w(s_2))).

This means that Q^-_{X̃,w}(p) does not contribute to the intersection in

Q^-_{X̃,C}(p) = ∩_{w ∈ B^+} Q^-_{X̃,w}(p), (3.10)

and it is enough to run it over those w ∈ B^+ with |X̃^=(w,p)| ≥ 2 together with v^1, v^2.

Secondly, take such a w ∈ B^+\{v^1, v^2} and assume |X̃^≤(w,p)| ≤ ⌈Np⌉. Now, (3.8) implies "=" in this inequality. Pick x̃^1, x̃^2 ∈ X̃^=(w,p) such that

(v^1)^T x̃^1 = max{(v^1)^T x | x ∈ X̃^=(w,p)}, (3.11)
(v^2)^T x̃^2 = max{(v^2)^T x | x ∈ X̃^=(w,p)}. (3.12)

One has x̃^1 ≠ x̃^2, and (v^1)^T x̃^1 > (v^1)^T x̃^2 and (v^2)^T x̃^2 > (v^2)^T x̃^1 by (3.11), (3.12), hence

(v^1 − v^2)^T (x̃^1 − x̃^2) > 0. (3.13)

Since the data set X̃ is finite, it is always possible to find ε_1, ε_2 > 0 and w^1 := w^1(ε_1), w^2 := w^2(ε_2) such that

w^1(ε) = w + ε(v^1 − v^2) ∈ B^+ for ε ∈ [0, ε_1],
w^2(ε) = w − ε(v^1 − v^2) ∈ B^+ for ε ∈ [0, ε_2],

and the following conditions are satisfied: x̃^i is kept on the boundary of the halfspace x̃^i + H^+(w^i(ε)) for i = 1, 2, and one has |X̃^=(w^i(ε),p)| = 1, Q^-_{X̃,w^i(ε)}(p) = x̃^i + H^+(w^i(ε)) and |X̃^≤(w^i(ε),p)| = ⌈Np⌉ for ε ∈ [0, ε_i), and for w^i (i.e., for ε = ε_i), i = 1, 2, one has Q^-_{X̃,w^i}(p) = x̃^i + H^+(w^i), |X̃^≤(w^i,p)| ≥ ⌈Np⌉ and either w^i = v^i or |X̃^=(w^i,p)| ≥ 2. This corresponds to rotating w in direction v^1 and v^2, respectively, around x̃^1 and x̃^2 until the next data point is hit. The data points in X̃^≤(w^i(ε),p) with ε ∈ [0, ε_i), i = 1, 2, are exactly the same as in X̃^≤(w,p).

Then, one has

x̃^1 + H^+(w) = x̃^2 + H^+(w) ⊋ [x̃^1 + H^+(w^1)] ∩ [x̃^2 + H^+(w^2)], (3.14)

which means that w is indeed redundant in the intersection (3.10). To see this, observe that there is ẑ ∈ [x̃^1 + H^+(w^1)] ∩ [x̃^2 + H^+(w^2)] satisfying

x̃^1 + H^+(w^1) = ẑ + H^+(w^1) and x̃^2 + H^+(w^2) = ẑ + H^+(w^2)

(ẑ is the intersection point of the two boundary lines of x̃^1 + H^+(w^1) and x̃^2 + H^+(w^2); it does not have to be a data point, of course).

Claim: ẑ ∈ x̃^i + int H^+(w), i.e., w^T(ẑ − x̃^i) > 0, for i = 1, 2. Indeed, one has (w^1)^T(ẑ − x̃^1) = 0 and (w^2)^T(ẑ − x̃^2) = 0, i.e.,

(w + ε_1(v^1 − v^2))^T (ẑ − x̃^1) = 0, (3.15)
(w − ε_2(v^1 − v^2))^T (ẑ − x̃^2) = 0. (3.16)

Multiplying the second equation by −1, adding it to the first and using w^T x̃^1 = w^T x̃^2 yields

ε_1 (v^1 − v^2)^T (ẑ − x̃^1) + ε_2 (v^1 − v^2)^T (ẑ − x̃^2) = 0,

so one of the two parts of the sum must be ≤ 0, the other ≥ 0. With the help of (3.11), (3.12) one gets

(v^1 − v^2)^T (ẑ − x̃^1) = (v^1 − v^2)^T ẑ − (v^1)^T x̃^1 + (v^2)^T x̃^1 < (v^1 − v^2)^T ẑ − (v^1)^T x̃^2 + (v^2)^T x̃^2 = (v^1 − v^2)^T (ẑ − x̃^2),

hence

(v^1 − v^2)^T (ẑ − x̃^1) < 0, (v^1 − v^2)^T (ẑ − x̃^2) > 0.

Together with (3.15), (3.16), this gives

w^T (ẑ − x̃^1) = −ε_1 (v^1 − v^2)^T (ẑ − x̃^1) > 0, w^T (ẑ − x̃^2) = ε_2 (v^1 − v^2)^T (ẑ − x̃^2) > 0,

so ẑ ∈ x̃^i + int H^+(w) for i = 1, 2, which proves the claim. Now, take

z ∈ [x̃^1 + H^+(w^1)] ∩ [x̃^2 + H^+(w^2)] = [ẑ + H^+(w^1)] ∩ [ẑ + H^+(w^2)],

i.e., (w^1)^T (z − ẑ) ≥ 0 and (w^2)^T (z − ẑ) ≥ 0. The definitions of w^1 and w^2 yield

(w + ε_1(v^1 − v^2))^T (z − ẑ) ≥ 0, (w − ε_2(v^1 − v^2))^T (z − ẑ) ≥ 0,

hence

w^T (z − ẑ) ≥ max{−ε_1 (v^1 − v^2)^T (z − ẑ), ε_2 (v^1 − v^2)^T (z − ẑ)} ≥ 0,

so z ∈ ẑ + H^+(w) ⊆ x̃^i + int H^+(w), i = 1, 2, where the inclusion is the claim above. This proves (3.14), which completes the proof of the proposition. □
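Proposition 3.6 says that only finitely many directions matter: besides v^1 and v^2, only those w ∈ B^+ whose boundary line contains at least two data points. For C = IR^2_+ (so v^1 = (1, 0), v^2 = (0, 1), and B^+ normalized so that the components of w sum to 1) these candidates can be enumerated directly as normals of difference vectors of data points. A sketch with our own naming:

```python
# Enumerate the finitely many candidate directions of Proposition 3.6 for
# C = IR^2_+: v1, v2, and every w in B^+ orthogonal to some x - y with x, y
# data points (i.e., with two data points on the boundary line of w).

from itertools import combinations

def candidate_directions(data):
    dirs = [(1.0, 0.0), (0.0, 1.0)]               # v1 and v2
    for x, y in combinations(data, 2):
        d = (x[0] - y[0], x[1] - y[1])
        w = (-d[1], d[0])                          # w^T (x - y) = 0
        if w[0] < 0 or w[1] < 0:
            w = (-w[0], -w[1])                     # try the other orientation
        tot = w[0] + w[1]
        if w[0] >= 0 and w[1] >= 0 and tot > 0:    # keep only w in C^+ \ {0}
            dirs.append((w[0] / tot, w[1] / tot))
    return dirs

# Two incomparable points yield one extra direction besides v1 and v2.
print(candidate_directions([(1, 2), (2, 1)]))  # [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
```

Pairs whose difference lies in int C produce no admissible normal, in line with Remark 4.8 below.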
Remark 3.7
For i = 1, 2, one can have |X̃^≤(w^i,p)| = ⌈Np⌉ or |X̃^≤(w^i,p)| ≥ ⌈Np⌉ + 1. In the first case, w^i is also redundant by Proposition 3.6. This is exploited in the algorithm below. In the second, w^i cannot be ruled out by the proposition, but it could still be redundant for the intersection in (3.10).

Remark 3.8
Under the standing assumption, the situation can be reduced to the case when the cone is C = IR^2_+. Let C be generated by the two linearly independent vectors b^1, b^2 ∈ IR^2, i.e., C = {s_1 b^1 + s_2 b^2 | s_1 ≥ 0, s_2 ≥ 0}. Then, there is an invertible matrix A ∈ IR^{2×2} such that AC = IR^2_+, and one can use the affine invariance of the cone distribution function F_{X,C} (see [6, Proposition 2.7]) to get

∀z ∈ IR^2: F_{X,C}(z) = F_{AX, IR^2_+}(Az).

Indeed, since AC = IR^2_+ if, and only if, Ab^1 = e^1 and Ab^2 = e^2, one may easily see that

A^{-1} = [ b^1_1  b^2_1 ; b^1_2  b^2_2 ]  and  A = (1/(b^1_1 b^2_2 − b^2_1 b^1_2)) [ b^2_2  −b^2_1 ; −b^1_2  b^1_1 ]

(the columns of A^{-1} are b^1 and b^2; rows are separated by ";") do the job. Moreover, w ∈ C^+ ⇔ (A^{-1})^T w ∈ IR^2_+, so (A^{-1})^T C^+ = IR^2_+. The procedure now is: first, transform the data and the cone C by A; secondly, find F_{AX, IR^2_+} and AQ^-_{X,C}; finally, transform back. Clearly, this idea is restricted to the bivariate case.

4 The bivariate algorithms
In this section, we provide some more theoretical background for the algorithms which produce the empirical quantiles Q^-_{X̃,w}(p), with empirical halfspace depth regions as special cases, and the values of c-depth, with the halfspace depth function as a special case, respectively. Pseudocodes of the algorithms will also be given.

The algorithm below is designed such that it starts with w = v^1 and then runs through B^+ until it hits v^2. At an intermediate step, a w^n ∈ B^+ is generated. There are three cases:

(1) w^n = v^1 and |X̃^=(v^1,p)| = 1.
(2) w^n ≠ v^2 and |X̃^=(w^n,p)| ≥ 2.
(3) w^n = v^2.

Case (3) serves as a stopping criterion. In cases (1) and (2), a permutation of the data points is generated which in turn is used to generate a new w^{n+1} ∈ B^+. Set K := ⌈Np⌉.

Case (1).
The input is w := w^0 = v^1 ∈ B^+ and x(w,p) (the only element of X̃^=(w,p)). Find the permutation π_w of {1, ..., N} such that

w^T x^{π_w(1)} ≤ ... < w^T x^{π_w(K)} < ... ≤ w^T x^{π_w(N)}. (4.1)

One has x^{π_w(K)} = x(w,p).

Case (2).
The input is w := w^n ∈ B^+. Find the permutation π_w of {1, ..., N} such that

w^T x^{π_w(1)} ≤ ... ≤ w^T x^{π_w(K)} ≤ ... ≤ w^T x^{π_w(N)}. (4.2)

Find the set X̃^=(w,p) = {x ∈ X̃ | w^T x = w^T x^{π_w(K)}} and set L := |X̃^=(w,p)|. Define a new permutation π_{v^2} of the points in X̃^=(w,p) by

(v^2)^T x^{π_{v^2}(k_1)} ≤ ... ≤ (v^2)^T x^{π_{v^2}(k_L)} (4.3)

of the L data points in X̃^=(w,p).

Proposition 4.1
All inequalities in (4.3) are strict.
Proof.
Assume to the contrary that (v^2)^T x = (v^2)^T y with x, y ∈ X̃^=(w,p), x ≠ y. One has w = (1 − s)v^1 + sv^2 for some s ∈ [0,1). First, let s = 0. Then w = v^1 and (v^1)^T x = (v^1)^T y from (4.2) as well as (v^2)^T x = (v^2)^T y. This is impossible for x ≠ y since v^1, v^2 are linearly independent. Secondly, if s ∈ (0,1), subtract s(v^2)^T x = s(v^2)^T y from ((1 − s)v^1 + sv^2)^T x = ((1 − s)v^1 + sv^2)^T y and again get the two equations (v^1)^T x = (v^1)^T y, (v^2)^T x = (v^2)^T y. Thus, one ends up with the same contradiction as before. □

Next, find k_ℓ such that |X̃^<(w,p)| + ℓ = K and re-arrange the permutation (4.1) as follows:

x^{π_w(1)}, ..., x^{π_w(|X̃^<(w,p)|)}, x^{π_{v^2}(k_1)}, ..., x^{π_{v^2}(k_ℓ)}, x^{π_{v^2}(k_{ℓ+1})}, ..., x^{π_{v^2}(k_L)}, x^{π_w(|X̃^<(w,p)| + L + 1)}, ..., x^{π_w(N)}.

Re-label this permutation by π_w; its K-th element now is x^{π_w(K)} = x^{π_{v^2}(k_ℓ)}.

Case (1) and (2).
In both cases, the next step is a rotation of w in direction v^2: set w(s) = (1 − s)w + sv^2 and solve the problem

maximize s (RP)
subject to w(s)^T x^{π_w(k)} ≤ w(s)^T x^{π_w(K)} for k = 1, ..., K − 1, (4.4)
w(s)^T x^{π_w(k)} ≥ w(s)^T x^{π_w(K)} for k = K + 1, ..., N, (4.5)
s ∈ [0,1]. (4.6)

The output is w(s̄) with some s̄ ∈ (0,1] and X̃^<(w(s̄),p), X̃^=(w(s̄),p). If w(s̄) ≠ v^2 (i.e., s̄ ≠ 1), then |X̃^=(w(s̄),p)| ≥ 2. The idea is to keep x^{π_w(K)} on the boundary of the w(s)-quantile for s ∈ [0, s̄].

Lemma 4.2 (the rotation lemma for quantiles)
One has
(a) Q^-_{X̃,w(s)}(p) = x^{π_w(K)} + H^+(w(s)) for all 0 ≤ s ≤ s̄;
(b) strict inequalities in (4.4), i.e., X̃^<(w,p) ⊆ X̃^<(w(s),p) and x^{π_w(K − k_ℓ)}, ..., x^{π_w(K − 1)} ∈ X̃^<(w(s),p), for all 0 < s < s̄;
(c) strict inequalities in (4.5) for all 0 < s < s̄;
(d) |X̃^=(w(s),p)| = 1 for all 0 < s < s̄.

Proof.
The first claim is by construction, i.e., (4.4), (4.5). If one assumes w(t)^T x^{π_w(n)} = w(t)^T x^{π_w(K)} for some t ∈ (0, s̄) and n ∈ {1, ..., K − 1}, then

((1 − t)w + tv^2)^T x^{π_w(n)} = ((1 − t)w + tv^2)^T x^{π_w(K)}. (4.7)

By (4.2), one has w^T x^{π_w(n)} ≤ w^T x^{π_w(K)}. If "=" were true in this inequality, then one would have the two equations w^T x^{π_w(n)} = w^T x^{π_w(K)} and (v^2)^T x^{π_w(n)} = (v^2)^T x^{π_w(K)}, where the second is a consequence of the first and (4.7). This is a contradiction since the two vectors w, v^2 are linearly independent.

If w^T x^{π_w(n)} < w^T x^{π_w(K)} were true, then (4.7) would yield (v^2)^T x^{π_w(n)} > (v^2)^T x^{π_w(K)}. However, this cannot be true, as one can see as follows. Taking ε > 0 with t + ε < s̄, one has by (4.4)

((1 − t − ε)w + (t + ε)v^2)^T x^{π_w(n)} ≤ ((1 − t − ε)w + (t + ε)v^2)^T x^{π_w(K)}.

Together with (4.7), this yields (−εw + εv^2)^T x^{π_w(n)} ≤ (−εw + εv^2)^T x^{π_w(K)}. Rearranging terms, using w^T x^{π_w(n)} ≤ w^T x^{π_w(K)} and dividing by ε > 0 gives (v^2)^T x^{π_w(n)} ≤ (v^2)^T x^{π_w(K)}, which produces a contradiction. □

The proposed algorithm works as follows.
Quantile Algorithm.

Step 1.
Initialize w^0 := v^1, W := {w^0}, RI := ∅.

Step 2.
Until w^n = v^2, repeat: perform a rotation step with input w^n and output w^{n+1} := w(s̄); update W := W ∪ {w^{n+1}} and n := n + 1. If |X̃^<(w^n,p)| + |X̃^=(w^n,p)| = K, then RI := RI ∪ {n}.

Step 3.
Update W := W\{w^k | k ∈ RI}.

Step 4.
Compute

Q^-_{X̃,C}(p) = ∩_{w ∈ W} [x(w,p) + H^+(w)] (4.8)

while removing the redundant w's from W.

Corollary 4.3 (1) The Quantile Algorithm terminates after a finite number of rotation steps.
(2) (4.8) is true.

Proof. (1) There are only finitely many data points and hence only finitely many halfspaces in IR^2 with two points on their boundaries. The algorithm checks those which satisfy the conditions of Proposition 3.6 consecutively, which is guaranteed by Lemma 4.2, i.e., it identifies the set W(p) ∪ {v^1, v^2}.
(2) By construction, the algorithm identifies all elements of W(p) and only removes those (in Step 2) which are redundant according to Proposition 3.6. □

Remark 4.4 If ⌈Np⌉ = 1, then Q^-_{X̃,w}(p) is the convex hull of the data points plus the cone C.

Remark 4.5
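Since w(s)^T x is affine in s, each constraint of (RP) stays valid up to a bound computable by a ratio test, so (RP) can be solved without a general LP solver. The following sketch (our own code, not the paper's implementation) does this for C = IR^2_+, with K 1-based as in the text:

```python
# One rotation step (RP): find the largest s in [0,1] such that
# w(s) = (1-s)*w + s*v2 keeps x^{pi_w(K)} feasible in (4.4)-(4.6).
# Each constraint is linear in s, so a ratio test over all constraints suffices.

def rotation_step(w, v2, data, perm, K):
    """perm is a 0-based permutation of range(len(data)); K is 1-based."""
    xK = data[perm[K - 1]]
    s_max = 1.0
    for pos, idx in enumerate(perm):
        if pos == K - 1:
            continue
        d = [data[idx][0] - xK[0], data[idx][1] - xK[1]]
        if pos > K - 1:            # constraint (4.5): w(s)^T d >= 0, flip sign
            d = [-d[0], -d[1]]
        a = w[0] * d[0] + w[1] * d[1]      # constraint value at s = 0 (<= 0)
        b = v2[0] * d[0] + v2[1] * d[1]    # constraint value at s = 1
        if b > 1e-12:                      # constraint eventually becomes active
            s_max = min(s_max, -a / (b - a))
    w_new = ((1 - s_max) * w[0] + s_max * v2[0],
             (1 - s_max) * w[1] + s_max * v2[1])
    return w_new, s_max

# Starting at w = v1 = (1,0) with data ordered by first coordinate, the
# rotation stops when a second point hits the boundary line.
w_new, s_bar = rotation_step((1.0, 0.0), (0.0, 1.0),
                             [(0, 0), (1, 2), (2, 1)], [0, 1, 2], 2)
print(w_new, s_bar)   # w(0.5) = (0.5, 0.5): (1,2) and (2,1) tie on the boundary
```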
In our version of the algorithm, Step 4 makes use of the so-called Benson algorithm for the representation of convex polyhedra implemented in the Bensolve package [13, 14]. Bensolve generates the smallest H-representation as well as the V-representation of Q^-_{X̃,C}(p), which is a convex polyhedron. In particular, it removes the remaining redundant w's. In the bivariate case, a more direct approach is possible, but we preferred to use Bensolve since it is also usable for higher-dimensional data.
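The "more direct approach" in the bivariate case can be sketched as follows (our own illustration, not Bensolve's method): the vertices of a finite intersection of halfspaces {z | w^T z ≥ c} are exactly those intersection points of pairs of boundary lines that satisfy all remaining inequalities. This brute-force version is enough for the small sets W produced here:

```python
# Vertices of an intersection of 2D halfspaces, each given as (w, c) meaning
# w[0]*z0 + w[1]*z1 >= c: solve every 2x2 boundary-line system by Cramer's
# rule and keep the solutions feasible for all halfspaces.

from itertools import combinations

def vertices(halfspaces, tol=1e-9):
    verts = []
    for (w1, c1), (w2, c2) in combinations(halfspaces, 2):
        det = w1[0] * w2[1] - w1[1] * w2[0]
        if abs(det) < tol:
            continue                      # parallel boundary lines
        z0 = (c1 * w2[1] - c2 * w1[1]) / det
        z1 = (w1[0] * c2 - w2[0] * c1) / det
        if all(w[0] * z0 + w[1] * z1 >= c - tol for w, c in halfspaces):
            verts.append((z0, z1))
    return verts

# The region x >= 0, y >= 0, x + y >= 1 has the two vertices (0,1) and (1,0).
print(vertices([((1, 0), 0), ((0, 1), 0), ((1, 1), 1)]))
```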
We show the result of the algorithm for the 20-data-point example from above.
Example 4.6
Figure 4.1 illustrates the quantiles for five values of p based on the 20-data-point example from above.

Figure 4.1: C = IR^2_+, N = 20, five values of p
In the worst case, each data point but two is on the boundary of two halfspaces which contribute to Q^-_{X̃,C}(p).

Example 4.7 The following example showcases the situation in which indeed N rotation steps have to be performed and the quantile set is the intersection of N + 1 halfspaces. One may suspect that this feature is mainly due to the fact that the data points are not comparable with respect to ≤_{IR^2_+}.

Figure 4.2: C = IR^2_+, N = 8

Remark 4.8
If there are two data points x, y ∈ X̃ with x − y ∈ int C, then these two points cannot be on the boundary of a w-quantile for w ∈ C^+\{0}. Indeed, if v ⊥ (x − y), then v ∉ C^+\{0} since x − y ∈ int C. This confirms that the quantile Q^-_{X̃,C}(p) tends to have fewer vertices if there are more pairs of data points comparable with respect to ≤_C. In the extreme case of a linearly ordered data set, the lower C-quantile has the form "data point plus cone." Of course, the univariate case can be seen as a very special one for this situation.

4.3 The cone distribution function algorithm

The algorithm in this subsection produces the value of F_{X̃,C}(z) and c-depth(z; X̃), respectively, for z ∈ IR^2. Basic ideas from the algorithm in Section 4.2 will reappear, but the two algorithms are in some sense dual to each other: while the quantile algorithm changes the points on the boundary of the intermediate w-quantiles and keeps the property of being a p-quantile, the cone distribution function algorithm keeps the point z ∈ IR^2 at which F_{X̃,C} is computed on the boundary of intermediate w-quantiles, but the value of w-depth(z; X̃) changes.

First, if necessary, z ∈ IR^2 is added to the set of data points X̃ and N is adjusted. This means, without loss of generality, z ∈ X̃ and |X̃| = N. For w ∈ B^+, let

X̃^=(w,z) = {x ∈ X̃ | w^T x = w^T z}

be the set of all data points which are on the boundary of the halfspace z − H^+(w). Note the difference to X̃^=(w,p) used previously.

Again, the algorithm is designed such that it starts with w^0 = v^1 and then runs through B^+ until it hits v^2. At an intermediate step, a w^n ∈ B^+ is generated. In each of the three cases

(1) w^n = v^1 and |X̃^=(v^1,z)| = 1, i.e., X̃^=(v^1,z) = {z},
(2) w^n ≠ v^2 and |X̃^=(w^n,z)| ≥ 2,
(3) w^n = v^2,

a value K_n is computed which is a current upper bound of c-depth(z; X̃), and then a rotation step is carried out, except in case (3), which again serves as a stopping criterion. In cases (1) and (2), the rotation step is performed such that z stays on the boundary of the resulting w^{n+1}-quantile.

Case (1).
Find the permutation w (cid:62) x π w (1) ≤ . . . < w (cid:62) z < . . . ≤ w (cid:62) x π w ( N ) (4.9)and the number K such that x π w ( K ) = z . Case (2).
The input is w ∈ B + \{ v } . Find the permutation w (cid:62) x π w (1) ≤ . . . ≤ w (cid:62) z ≤ . . . ≤ w (cid:62) x π w ( N ) (4.10)Find the set X = ( w, z ) and determine L := X = ( w, z ) as well as k := (cid:110) x ∈ ˜ X | w (cid:62) x < w (cid:62) z (cid:111) .Clearly, one always has z ∈ X = ( w, z ). On X = ( w, z ), define a new permutation π by( v ) (cid:62) x π (1) ≤ . . . ≤ ( v ) (cid:62) z ≤ . . . ≤ ( v ) (cid:62) x π ( L ) . (4.11)As in Proposition 4.1, all inequalities in this permutation are strict. Let (cid:96) be the numbersuch that z = x π ( (cid:96) ) . Set K = k + (cid:96) . In the permutation (4.10), replace the elements in X = ( w, z ) in the order generated by the permutation (4.11) and relabel the permutationby π w . Now, z = x π w ( K ) . Case (1) and (2).
In both cases, set w ( s ) = (1 − s ) w + sv and solve the problem(RP), (4.4)-(4.6). The output is w (¯ s ) with some ¯ s ∈ (0 , w (¯ s ) (cid:54) = v (i.e., ¯ s (cid:54) = 1), then X = ( w (¯ s ) , z ) ≥
2. 16 emma 4.9 (the rotation lemma for the CDF) If w ∈ B + \{ v } , then(a) there are k + L data points in z − H + ( w ) (with L = 1 in Case (1)), i.e., w - depth ( z ; ˜ X ) = k + L ,(b) there are k + (cid:96) data points in z − H + ( w ( s )) for all s ∈ (0 , ¯ s ) , i.e., w ( s ) - depth ( z ; ˜ X ) = k + (cid:96) for s ∈ (0 , ¯ s ) ,(c) there are at least k + (cid:96) data points in z − H + ( w (¯ s )) , i.e., w (¯ s ) - depth ( z ; ˜ X ) ≥ k + (cid:96) . Proof.
First, observe that one has X^=(w(s), z) = {z} for s ∈ (0, s̄) due to the fact that there are only finitely many data points: the rotation of w in the direction of v^2 around z removes all data points from the boundary line of z − H^+(w) except z.
(a) This is due to the definition of k and L. (b) According to (4.11) and (4.5), the L − ℓ data points x_{π(ℓ+1)}, …, x_{π(L)} are not in z − H^+(w(s)), along with the data points x_{π(L+1)}, …, x_{π(N)}, while the points x_{π(1)}, …, x_{π(ℓ)} remain in z − H^+(w(s)) according to (4.4) for all s ∈ (0, s̄). (c) According to (4.4) and (4.5), there are at least k + ℓ data points in z − H^+(w(s̄)), and if at least one of the inequalities in (4.5) is satisfied as an equation for s = s̄, then there are at least k + ℓ + 1 data points in z − H^+(w(s̄)). □

The cone location depth algorithm.

Step 1.
Initialize w^1 := v^1 and set K^1 := K, with K as found in Case (1) or (2).

Step 2.
Until w^n = v^2, repeat: starting with π_w obtained from the permutations found in Case (1) or (2), perform a rotation step with input w^n, output w^{n+1} := w(s̄^n) and K^{n+1}. Update K := min{K^n, K^{n+1}}. Update n := n + 1.

Step 3. If w^n = v^2, compute (v^2)-depth(z; X̃) and update K := min{K, (v^2)-depth(z; X̃)}.

Step 4.
Compute

F_{X̃,C}(z) = K/N if z is an original data point, and (K − 1)/(N − 1) if z is not an original data point.    (4.12)

Corollary 4.10 (1) The cone distribution function algorithm terminates after a finite number of rotation steps. (2) Formula (4.12) is true.
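A minimal sketch of the final step (4.12), under the reconstructed values K/N and (K − 1)/(N − 1); the latter accounts for z having been appended to the sample when it was not an original data point. The helper name is ours:

```python
def cone_cdf_value(K: int, N: int, z_was_in_sample: bool) -> float:
    """Final step (4.12): K is the minimal halfspace count found by the
    sweep, N the sample size *after* possibly appending z.  If z was
    appended, both z itself and its count contribution are discounted."""
    return K / N if z_was_in_sample else (K - 1) / (N - 1)
```

For example, a minimal count of K = 3 in a sample of N = 10 gives 3/10 when z is one of the data points and 2/9 when z had to be appended.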
Proof. (1) There are only finitely many data points and hence only finitely many halfspaces in IR^2 with at least two data points on their boundaries. The algorithm checks those with z as one boundary point and normal direction in B^+, as well as z − H^+(v^1) and z − H^+(v^2). (2) By Lemma 4.9, K = k + ℓ ≤ w(s)-depth(z; X̃) for each s ∈ [0, s̄], with equality for s ∈ (0, s̄). By construction in Step 2, the algorithm determines the minimum of these numbers and (v^2)-depth(z; X̃). Equation (4.12) follows taking into account that z might or might not be an original data point. □

4.4 Pseudocodes

In this section, pseudocodes for the two algorithms are provided along with a few explanatory remarks. The following algorithms assign an index to each element of the set of data points X̃. This index is then used throughout the algorithm in order to simplify the identification of each point and to minimize rounding errors.

Algorithm 1
Bivariate Lower Cone Quantile.

procedure ConeQuantile(p, X̃, b)                        ▷ p ∈ (0, 1], X̃ ∈ IR^{2×N}, b = (b^1, b^2) ∈ IR^2 × IR^2
    v^1 ← (b^2_2, −b^2_1)^T, v^2 ← (−b^1_2, b^1_1)^T   ▷ see Remark 4.13
    if (v^1)^T b^1 < 0 or (v^2)^T b^2 < 0 then
        v^1 ← (−b^2_2, b^2_1)^T, v^2 ← (b^1_2, −b^1_1)^T
    end if
    K ← ⌈p · N⌉                                        ▷ see Proposition 3.5
    q ← ∅, I_K ← ∅                                     ▷ initialize
    w^1 ← v^1, W ← {w^1}                               ▷ v^1 is assigned to w^1, which is then saved in W
    n ← 1                                              ▷ n counts the iterations
    while true do
        w ← w^n
        π^X̃_w ← IndexSort(X̃^T w)                       ▷ see Remark 4.14
        q ← q ∪ { w^T x_{π^X̃_w(K)} }                   ▷ see Remark 4.19
        B_R ← { j ∈ {1, …, N} | w^T x_{π^X̃_w(K)} = w^T x_{π^X̃_w(j)} }   ▷ see Remark 4.15
        B_I ← { π^X̃_w(j) | j ∈ B_R }                   ▷ indexes of boundary points
        B ← { x ∈ X̃ | w^T x_{π^X̃_w(K)} = w^T x }       ▷ set of boundary points
        if max(B_R) > K then                           ▷ see Remark 4.20
            I_K ← I_K ∪ {n}
        end if
        if w = v^2 then                                ▷ stopping condition, see Remark 4.21
            break while
        end if
        π^B_v ← IndexSort(B^T v^2)
        π^X̃_w(B_R) ← B_I(π^B_v)                        ▷ see Remark 4.17
        w^{n+1} ← Rotation(X̃, w, v^2, π^X̃_w, K)        ▷ see (RP), (4.4)-(4.6)
        W ← W ∪ {w^{n+1}}                              ▷ add the new direction w^{n+1} to W
        n ← n + 1                                      ▷ update the counting variable n
    end while
    I_K ← I_K ∪ {1, n}                                 ▷ see Remark 4.22
    W ← W(I_K)                                         ▷ extract the directions with indexes in I_K
    q ← q(I_K)                                         ▷ extract the scalarized boundary points with indexes in I_K
    Q ← polyh(W, q)                                    ▷ see Remark 4.23
    W, q ← hrep(Q)
    D, V ← vrep(Q)
end procedure

Algorithm 2 Bivariate Cone Distribution Function.

procedure ConeDistribution(z, X̃, b)                    ▷ z ∈ IR^2, X̃ ∈ IR^{2×N}, b ∈ IR^2 × IR^2
    I ← {1, …, N}                                      ▷ index set of X̃ = {x^1, …, x^N}
    I_z ← { i ∈ I | z = x^i }                          ▷ set of positions of z in X̃
    if #(I_z) = 0 then                                 ▷ see Remark 4.11
        X̃ ← X̃ ∪ {z}                                   ▷ add z at the end of X̃
        N ← N + 1
        i_z ← N
        I ← I ∪ {i_z}
    else if #(I_z) = 1 then
        i_z ← I_z                                      ▷ index of z in X̃
    else if #(I_z) > 1 then
        i_z ← max(I_z)                                 ▷ see Remark 4.12
    end if
    v^1 ← (b^2_2, −b^2_1)^T, v^2 ← (−b^1_2, b^1_1)^T   ▷ see Remark 4.13
    if (v^1)^T b^1 < 0 or (v^2)^T b^2 < 0 then
        v^1 ← (−b^2_2, b^2_1)^T, v^2 ← (b^1_2, −b^1_1)^T
    end if
    w ← v^1, k ← N                                     ▷ w and k are updated in each iteration
    while true do
        π^X̃_w ← IndexSort(X̃^T w)                       ▷ see Remark 4.14
        B_R ← { j ∈ {1, …, N} | w^T x^{i_z} = w^T x_{π^X̃_w(j)} }   ▷ see Remark 4.15
        if w = v^2 then                                ▷ stopping condition, see Remark 4.16
            k ← min(k, max(B_R))
            break while
        end if
        B_I ← { π^X̃_w(j) | j ∈ B_R }                   ▷ indexes of boundary points
        B ← { x ∈ X̃ | w^T x^{i_z} = w^T x }            ▷ set of boundary points
        π^B_v ← IndexSort(B^T v^2)
        π^X̃_w(B_R) ← B_I(π^B_v)                        ▷ see Remark 4.17
        K ← the j ∈ {1, …, N} with i_z = π^X̃_w(j)      ▷ see Remark 4.18
        k ← min(k, K)                                  ▷ update k if K < k
        w ← Rotation(X̃, w, v^2, π^X̃_w, K)              ▷ see (RP), (4.4)-(4.6)
    end while
    if #(I_z) = 0 then                                 ▷ p equals k/N or (k − 1)/(N − 1) if z ∉ X̃
        p ← (k − 1)/(N − 1)
    else
        p ← k/N
    end if
end procedure

Remark 4.11 I_z is the set of positions of z in X̃, whereas #(·) is a counting function. There are three important cases: z ∉ X̃, #(I_z) = 1 and #(I_z) > 1.

Remark 4.12
As the sort algorithm keeps the order of the copies of z, we need the index (i ∈ I) of the "last" copy in X̃.

Remark 4.13
The V-representation of C (matrix b) is converted into the V-representation of C^+ (matrix v): v^1 and v^2 are chosen orthogonal to b^2 and b^1, respectively, such that (v^1)^T b^1 ≥ 0 and (v^2)^T b^2 ≥ 0.

Remark 4.14
The function IndexSort(B) sorts the elements of B in ascending order and outputs the indexes of the elements of B in sorted order.

Remark 4.15
B_R collects the ranks/positions j of the boundary points in the permutation π^X̃_w(j).

Remark 4.16 The while loop is stopped if the w generated in the previous iteration is equal to v^2. Moreover, k is updated if #X^<(v^2, p) + #X^=(v^2, p) < k.

Remark 4.17
Rearrange the elements of B_I based on the permutation π^B_v and replace the indexes in the permutation π^X̃_w at the positions B_R with these rearranged indexes.

Remark 4.18
The value K for each iteration (direction) is found by looking for the position of the index i_z of z in the permutation π^X̃_w.

Remark 4.19
The point x^i ∈ X̃ with index i = π^X̃_w(K) is scalarized with w, and the value w^T x^i is saved to q.

Remark 4.20
The halfspaces that include more than K points are relevant for the calculation of the quantile. The number of points in the halfspace can be derived by taking the biggest number in the set B_R, which includes the ranks j of the boundary points in the permutation π^X̃_w(j).

Remark 4.21
The while loop is stopped if the w generated in the previous iteration is equal to v^2.

Remark 4.22
The halfspaces that include exactly K points are redundant, with the exception of those belonging to v^1 and v^2. Therefore, the indexes 1 and n are also included in I_K.

Remark 4.23
The Bensolve tools (see [13, 14]) provide two functions, polyh and vrep, to convert an H-representation into a V-representation. First, the H-representation is transformed into the 'polyhedral object' format with polyh. Secondly, the output of the latter function is then used as input for vrep, which returns the V-representation.
Bensolve is also used to remove all redundant halfspaces via hrep.

The same algorithm can be used to compute bivariate Tukey depth regions. In this case, it is applied consecutively to the three sets

B^+_0 = co{(−1, −1)^T, (1, −1)^T},  B^+_1 = co{(1, −1)^T, (0, 1)^T},  B^+_2 = co{(−1, −1)^T, (0, 1)^T}.

The resulting algorithm is very close to the one described in [18]. It is worth noting, though, that replacing the unit circle used in [18] by the three sections B^+_0, B^+_1, B^+_2 admits the formulation of the rotation step as a linear problem. The following picture shows a Tukey depth region for Example 3.3 with 20 data points.

Figure: Tukey depth region for p = 0.25. Halfspace boundaries: B^+_0, B^+_1, B^+_2.

The cone location depth algorithm was applied to compute the values of the halfspace depth function in Example 3.3 above.
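On small samples, such halfspace depth values can be cross-checked with a brute-force sweep over the finitely many angular arcs determined by the data. The helper below is our illustrative sketch, not the rotation algorithm of Section 4: it exploits the fact that the count in a closed halfplane through z is piecewise constant in the angle of the normal and can only change at angles perpendicular to some x_i − z.

```python
import numpy as np

def tukey_depth(z, X, tol=1e-9):
    """Brute-force bivariate Tukey (halfspace) depth of z w.r.t. the rows
    of X: the smallest number of sample points contained in a closed
    halfplane whose boundary line passes through z."""
    z, X = np.asarray(z, float), np.asarray(X, float)
    D = X - z
    on_z = np.all(np.abs(D) <= tol, axis=1)   # copies of z are always counted
    D = D[~on_z]
    if len(D) == 0:
        return len(X)
    # critical normal angles: perpendicular to each x_i - z (mod pi),
    # deduplicated up to rounding so that every arc between them is nonempty
    crit = np.sort(np.unique(np.round(
        (np.arctan2(D[:, 1], D[:, 0]) + np.pi / 2) % np.pi, 9)))
    crit = np.concatenate([crit, crit + np.pi])          # both rotations
    gaps = np.diff(np.append(crit, crit[0] + 2 * np.pi))
    best = len(X)
    for a in crit + gaps / 2:                            # one angle per open arc
        w = np.array([np.cos(a), np.sin(a)])
        best = min(best, int(np.sum(D @ w < 0)))         # strict: no ties on arcs
    return best + int(on_z.sum())
```

For the four corners of the unit square, the center has depth 2, while each corner, as a sample point, has depth 1.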
Example 5.1
The following figure shows the values of the cone location depth for a sample of the bivariate standard normal distribution which comprises 80 points, with C = IR^2_+.

Figure 5.1: Cone location depth for a sample of a bivariate normal distribution

Example 5.2
The next figure shows the values of the cone location depth for a sample of a bivariate uniform distribution over two squares, with the cone C = IR^2_+. This example is taken from [4] and was also discussed in [6].

Figure 5.2: Cone location depth for a sample of X ∼ Uniform[(0,1) × (0,1) ∪ (1,2) × (1,2)].

A few examples with real-world data are added. The aim is not to provide an extensive analysis for them, but to show how a bivariate approach might change the picture compared to a univariate one. One should note that in each of these examples there is an intuitive understanding of "better" or "worse" data points. Consequently, a mere depth function approach would not produce meaningful results. For all of these examples, the cone IR^2_+ generating the componentwise order is used for the sake of simplicity, but there are of course other (and maybe better) options.

Example 5.3
The motivation to consider this data set stems from [7, 8], where it is discussed that the maximum sustained wind speed (WS) alone (and hence the storm category according to the Saffir-Simpson scale) is not always a good proxy for the potential destructive impact of a hurricane. While the minimal sea level pressure (SLP) already seems to be a better proxy (see the strong arguments in [8]), one can also think of using more than one parameter: along with wind speed and central pressure, storm wind radii and the storm translation speed (in particular, the devastating impact of Hurricane Dorian 2019 on the Bahamas supports such a parameter) are discussed in the quoted references. Yet another parameter should be storm surge, as discussed in [19], since this feature was removed from the Saffir-Simpson scale in the wake of Katrina in 2005.

Here, the two parameters WS and SLP are used to compare hurricanes. Note that their values usually occur at different times, which are also different from the time of landfall; other choices are possible and can easily be implemented. The cone C is generated by {b^1, b^2} = {(1, 0)^T, (0, −1)^T}, since stronger wind and lower pressure characterize potentially more destructive hurricanes. The quantiles could then be used to categorize hurricanes. A comparison with a scale based only on SLP produced very similar results, but with the following difference: the C-quantiles categorize the hurricanes based on the SLP as well as the WS. This means that a hurricane needs to overcome a threshold in both variables. Therefore, a hurricane that is in a specific category on a scale based on one variable is not always in the corresponding category on the scale based on C-quantiles, as it may not reach the threshold in the other variable.

This serves just as an example to illustrate the potential of a categorization via set-valued quantiles. We think that even different cones should be used.
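The cone order behind this comparison can be checked numerically through the dual cone generators of Remark 4.13. The sketch below assumes that b^1, b^2 are non-parallel generators of a pointed cone; the function names are ours. For the hurricane cone generated by (1, 0) and (0, −1), the test reduces to "wind speed no smaller, sea-level pressure no larger":

```python
import numpy as np

def dual_generators(b1, b2):
    """Generators v1 ⟂ b2 and v2 ⟂ b1 of the dual cone C+ (cf. Remark 4.13),
    signed so that v1·b1 >= 0 and v2·b2 >= 0.  Assumes b1, b2 are not
    parallel, i.e., C = cone{b1, b2} is a pointed cone with nonempty interior."""
    v1 = np.array([b2[1], -b2[0]], float)
    if v1 @ np.asarray(b1, float) < 0:
        v1 = -v1
    v2 = np.array([b1[1], -b1[0]], float)
    if v2 @ np.asarray(b2, float) < 0:
        v2 = -v2
    return v1, v2

def leq_C(x, y, b1, b2):
    """x <=_C y  iff  y - x lies in C = cone{b1, b2}, checked via the dual
    cone: d ∈ C  ⟺  v1·d >= 0 and v2·d >= 0 for the generators of C+."""
    v1, v2 = dual_generators(b1, b2)
    d = np.asarray(y, float) - np.asarray(x, float)
    return bool(v1 @ d >= 0 and v2 @ d >= 0)
```

With (WS, SLP) data, a storm with 120 kt winds at 950 hPa dominates one with 100 kt at 970 hPa, while a storm with higher pressure is incomparable under this cone.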
The one generated by b^1 = (1, 0)^T and b^2 = (√2/2)·(−1, −1)^T (or similar vectors) would be a good candidate to capture destructive storms with very low central pressure but only moderate wind speed: the 2012 hurricane Sandy is the prominent example, since it was not even categorized as a hurricane at landfall according to the Saffir-Simpson scale. On the other hand, the halfspace H^+((0, −1)^T) as cone C reproduces the situation considered in [8]. The question of how to choose the cone should be subject to an extended analysis using historical data.

Such data can also be utilized in the following way: the track of a current hurricane can be followed in the quantile graph, thus providing a strong impression of how "close" it comes to previous (major) hurricanes (see Dorian's 2019 track below). This could deliver a strong warning message.

Finally, note that fast computations in real time (see [8]) are an issue. The algorithms presented in this paper could meet such demands.

Figure 5.3: C-quantiles.
Figure 5.4: Dorian's 2019 track through the quantile graph.

Example 5.4
South Tyrol, a region in the north-east of Italy, is one of the biggest apple producers in Europe. Due to its geographical position in the middle of the Alps, it is hit regularly by severe hailstorms. The three main possibilities to hedge against the risk of hail damage are a hail insurance with substantial European subsidies, the installation of hail nets, and the combination of these two options. In the following analysis, the third option is compared to the first one. Hence, two types of farmers are compared: one type has stipulated only a hail insurance, and the other one hedges their product with hail nets and insures it additionally. Both insurance products are subsidized by the European Union, whereas the contract with hail nets is discounted.

The data used for this analysis is given by the Hagelschutzkonsortium. It comprises, inter alia, the area insured (AI), the sum insured (SU), the premia paid by the farmer (FP), the sustained damage (SD) and the indemnity payments (IP) for each insurance contract signed between 2013 and 2017 by the members of the hail-defense syndicate. Only the farmers that have signed a contract in all 5 years are taken for the analysis. Each farmer is represented by two numbers. The first number is a proxy for the yearly business return:

no hail net:   RB = SU − SD
with hail net: RB = SU − SD − c · AI,

where c is the estimated cost per hectare of hail net. The second number is

RI = IP − FP.
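In code, the mapping from contract data to the bivariate point (RB, RI) might look as follows. The helper name and the parameter net_cost_per_ha are ours; the latter is a placeholder for the estimated per-hectare cost of the hail net, which is left unspecified here:

```python
def farmer_point(SU, SD, IP, FP, AI=0.0, net_cost_per_ha=None):
    """Map one farmer's aggregated contract data to the bivariate point
    (RB, RI): RB proxies the yearly business return, RI the net result
    of the insurance contract.  Pass net_cost_per_ha=None for farmers
    without hail nets."""
    RB = SU - SD
    if net_cost_per_ha is not None:
        RB -= net_cost_per_ha * AI   # deduct the (discounted) hail-net cost
    RI = IP - FP
    return RB, RI
```

Each farmer then contributes one such point to the blue (no net) or red (net) sample analyzed below.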
Figure 5.5 has RB-values on the x-axis and RI-values on the y-axis. The blue data points represent the farmers without hail nets that have signed an insurance contract in each year between 2013 and 2017, whereas the red data points represent the farmers with hail nets. Some points are located far away from the rest; this actually corresponds to the market situation with a few very big producers and a vast majority of small farmers.

The last figure shows the lower C-quantiles at three values of p for the farmers without hail nets in blue and for the farmers with hail nets in red. There are two main insights that can be derived from this analysis. First, the red quantile is shifted to the right with respect to the blue quantile for all p. This means that the farmers with hail nets have higher yearly returns. Moreover, as p increases, the horizontal distance between the quantiles increases. This implies that the combination of hail nets and insurance is more profitable the bigger the farm. Second, the blue and red quantiles for the two smaller values of p are aligned a little below the x-axis. This can be interpreted as the insurance contracts being profitable for the insurance company, but also not being too costly for the farmer. For the largest value of p, the quantiles are substantially above the x-axis, indicating that bigger farmers actually make a profit out of the insurance contracts. This phenomenon is even stronger for the farmers with hail nets.

In general, the insurance is paying off for all farmers. However, the farmers that install a hail net and stipulate an insurance are more profitable, and from a certain size on they even have better payoffs on the insurance.

Example 5.5 A problem in human resource management very often is that applicants for a job (or persons potentially assigned to carry out a certain task or job) have to be evaluated according to several, often contradictory or competing criteria.
The authors of this paper, as many other professionals working in academia, were subject to such evaluations. Typically, while the decision maker almost always has a clear understanding of what is better for each criterion (more publications, more project money raised, more contributions to administrative tasks etc.), the final decision is often the result of an ad hoc aggregation procedure (distribute some points for each achievement in each activity and sum the points at the end) which does not take into account the incomparability of the different criteria and hence of the different candidates. Such an aggregation is usually a scalarization: assign numbers to each candidate and then use the total order in IR.

Here, we use data from student results for a simple illustration of what can be done in such cases and what kind of information one could expect if one applied the methods suggested in this note. The two criteria are the average grade and the total number of credit points, where the latter may serve as a proxy for the study time: the higher this number, the more courses the student finished within the time interval considered. The traditional aggregation procedure is: use the average grade as a ranking for all students who earned a minimum number of credit points (made the threshold) and do not consider the others.

Note that these rankings have serious consequences: they are used to admit (or not admit) students to Erasmus programs, award prizes etc.

The average grades above 28 are highlighted in blue (one in red) in Table 5.1. Their distribution in the right column shows that the ranking can change drastically if the cone location depth for two criteria is used instead of a (more or less arbitrarily chosen) aggregation procedure. The part highlighted in red shows that students with the same value of the cone location depth can have very different average grades.
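A naive sketch of the cone location depth used for such a ranking, for the componentwise order C = IR^2_+, is given below. A uniform grid over the directions of the dual cone stands in for the paper's exact rotation sweep, and the helper name is ours:

```python
import numpy as np

def cone_depth(z, X, n_dir=3600):
    """Naive empirical cone location depth for C = IR^2_+: minimise, over
    directions w in the dual cone IR^2_+, the number of sample points x
    with w'x <= w'z.  A uniform angular grid over [0, pi/2] approximates
    the exact sweep through the critical directions."""
    z, X = np.asarray(z, float), np.asarray(X, float)
    best = len(X)
    for a in np.linspace(0.0, np.pi / 2, n_dir):
        w = np.array([np.cos(a), np.sin(a)])
        best = min(best, int(np.sum(X @ w <= w @ z)))
    return best
```

Students (as points of average grade and credits, both oriented so that larger is better) with a higher depth value are then "deeper" in the upper tail of the sample; ties in depth can still hide very different average grades, as the table illustrates.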
We do not claim here that the ranking according to the cone location depth is the "true" ranking (it depends on the cone, which also is a choice), but we would like to point out that "traditional" decision makers should be equally aware that each ranking based on an aggregation procedure could be highly questionable and very far from being "objective." The higher the value of the cone distribution function, the "better" the student.

St01  St02  St03  St04  St05  St06  St07  St08  St09  St10  St11
CD    31    30    28    27    26    26    25    23    22    22    21
AG    28.38 27.00 26.71 26.22 28.43 26.86 26.13 28.86 28.57 25.57 27.00

St12  St13  St14  St15  St16  St17  St18  St19  St20  St21  St22
CD    20    19    18    17    16    15    14    13    13    13    12
AG    25.11 27.80 28.50 28.17 29.17 28.50 27.17 25.50 24.11 24.09 25.33

St23  St24  St25  St26  St27  St28  St29  St30  St31  St32  St33
CD    11    10    10    10    9     8     7     7     7     7     6
AG    23.25 24.33 23.43 23.14 24.17 22.86 25.00 25.00 22.14 22.00 26.00

St34  St35  St36  St37  St38  St39  St40  St41  St42
CD    6     5     4     3     3     3     2     2     1
AG    21.71 25.20 24.50 29.50 20.00 19.38 20.25 19.13 18.25

Table 5.1: Student data: cone depth vs. average grades

Conclusions
In multivariate data analysis models, an order relation for the data points is very often (intuitively) present, but not part of the statistical analysis. In this paper, it is shown how decision makers can analyze bivariate data based on lower cone distribution functions and set-valued quantiles. It seems to us that a mere depth function approach is only appropriate if there is no (intuitive or explicitly modeled) preference for data points present and the data are roughly rotation symmetric. This could be subject to a debate to which this paper aims to contribute.

The algorithms for computing the new objects separate the combinatorial/nonlinear part (permutations) from the linear/computational geometry part (linear rotation step for computing convex polyhedra; see also, e.g., [16] with emphasis on the link between computational geometry and depth functions/regions). This makes the computations tractable.

The approach works independently of dependence structures: in Example 5.3, the maximum wind speed and the minimal pressure are clearly strongly correlated, which is not the case in Example 5.5. The relationship of our approach with dependence structure approaches is an interesting research question. The major difference (and difficulty) is that the lower cone distribution function is different from the joint distribution function (even if the cone is IR^2_+, see [6]), which is the basic object, e.g., for copula approaches.

Funding.
D. Kostner's work was part of the project "Re-insurance of hail risks in South Tyrol" (Hail-Risk) with PI Prof. A. Weißensteiner, funded by the Free University of Bozen, Italy.
References

[1] CD Aliprantis, KC Border, Infinite Dimensional Analysis, 3rd edition, Springer, 2006
[2] C Ararat, AH Hamel, Lower cone distribution functions and set-valued quantiles form Galois connections, SIAM Theory Probab. Appl. 65(2), 179-190, 2020
[3] D Ballatore, G Vittone, I costi della protezione nel melo: l'esperienza piemontese premia le reti, Dossier grandine, FRUTTICOLTURA 5, 2008, https://terraevita.edagricole.it/agri24/img/Frutticoltura052008_DossierGrandine2.pdf
[4] A Belloni, RL Winkler, On multivariate quantiles under partial orders, Ann. Statistics 39(2), 1125-1179, 2011
[5] R Dyckerhoff, P Mozharovskyi, Exact computation of the halfspace depth, Computational Statistics & Data Analysis 98, 19-30, 2016
[6] AH Hamel, D Kostner, Cone distribution functions and quantiles for multivariate random variables, J. Multivariate Analysis 167, 97-113, 2018
[7] CG Hebert, RA Weinzapfel, MA Chambers, Hurricane severity index: a new way of estimating a tropical cyclone's destructive potential, 29th Conf. on Hurricanes and Tropical Meteorology, 2010
[8] PJ Klotzbach, MM Bell, SG Bowen, EJ Gibney, KR Knapp, CJ Schreck III, Surface pressure a more skillful predictor of normalized hurricane damage than maximum sustained wind, Bull. Amer. Meteorol. Soc. 101(6), 830-846, 2020
[9] D Kostner, Multi-criteria decision making via multivariate quantiles, Math. Methods Oper. Res. 91, 73-88, 2020
[10] X Liu, Fast implementation of the Tukey depth, Computational Statistics 32(4), 1395-1410, 2017
[11] X Liu, K Mosler, P Mozharovskyi, Fast computation of Tukey trimmed regions and median in dimension p > 2, J. Comp. Graph. Statistics 28(3), 682-697, 2019
[12] X Liu, Y Zuo, Computing halfspace depth and regression depth, Communications in Statistics - Simulation and Computation 43, 969-985, 2014
[13] A Löhne, B Weißing, The vector linear program solver BENSOLVE - notes on theoretical background
[15] D Paindaveine, M Šiman, Computing multiple-output regression quantile regions, Computational Statistics & Data Analysis 56(4), 840-853, 2012
[16] PJ Rousseeuw, M Hubert, Statistical depth meets computational geometry: a short survey, in: JE Goodman, J O'Rourke, CD Toth (eds.), Handbook of Discrete and Computational Geometry, 3rd edition, Chapman and Hall/CRC Press, 2017
[17] PJ Rousseeuw, I Ruts, Algorithm AS 307: Bivariate location depth, J. Royal Stat. Soc. Series C 45(4), 516-526, 1996
[18] I Ruts, PJ Rousseeuw, Computing depth contours of bivariate point clouds, Comp. Statistics & Data Anal. 23, 153-168, 1996
[19] AM Walker, DW Titley, ME Mann, RG Naijjar, SK Miller,