Directional quantile classifiers
By Alessio Farcomeni†, Marco Geraci‡ and Cinzia Viroli§
University of Rome "Tor Vergata"†, University of South Carolina‡, University of Bologna§

Abstract
We introduce classifiers based on directional quantiles. We derive theoretical results for selecting the optimal quantile level given a direction and, conversely, the optimal direction given a quantile level. We also show that the misclassification rate converges to zero if the population distributions differ by at most a location shift and if the number of directions is allowed to diverge at the same rate as the problem's dimension. We illustrate the satisfactory performance of our proposed classifiers in both small and high dimensional settings via a simulation study and a real data example. The code implementing the proposed methods is publicly available in the R package Qtools.
1. Introduction.
The idea of using quantiles in classification is relatively recent and largely unexplored. The median classifier for high-dimensional problems proposed by Hall, Titterington and Xue (2009), which calculates the L_1 distance of the coordinates of a multivariate data point from componentwise medians (rather than centroids), is particularly advantageous when data exhibit heavy-tailed or skewed distributions. Building on Hall, Titterington and Xue's (2009) idea, Hennig and Viroli (2016a) proposed quantile classifiers, which hinge on the sum of distances from componentwise quantiles at some generic level θ ∈ (0, 1).

† Corresponding author: Alessio Farcomeni, Department of Economics and Finance, University of Rome "Tor Vergata", Italy. E-mail: [email protected]
MSC 2010 subject classifications:
Primary 62G05; secondary 62G20
Keywords and phrases: classification, L_1 distance, machine learning, quantiles for multivariate data

Our proposal extends quantile-based classification in several directions. First, the possible interdependence among the variables is taken into account by computing linear combinations of the input variables. Second, directional quantiles have a simple interpretation, since the projections' weights embody the relative importance of the variables involved in the classification problem. Finally, in the special case of p canonical directions (with p equal to the number of variables), the use of directional quantiles leads to the componentwise quantile classifier (Hennig and Viroli, 2016a), and thus inherits its asymptotic optimality properties, as shown in the Appendix. Directional quantiles have already found application in risk classification problems (Geraci et al., 2020) and proved to be a worthwhile alternative to risk classification based on componentwise quantile thresholds.

In general, the application of our methods does not require any assumption on the shape of the population distributions. We derive asymptotic theoretical properties of the proposed classifier under the assumption that the distributions of the alternative populations differ by at most a location shift. While this assumption may be unrealistic in practice, empirical results support the merit of the proposed classifier also when the distributions differ by shape and not just by location.

The rest of the paper is organised as follows. In the next section, we introduce notation and basic definitions, followed by our proposal of directional quantile classifiers. Theoretical results are stated in Section 3. We report the results of a simulation study in Section 4 and of a real data analysis in Section 5. Concluding remarks are given in Section 6. All proofs of theoretical results are reported in the Appendix. A software implementation of our approach can be found in the package Qtools (Geraci, 2016), freely available on the Comprehensive R Archive Network (R Core Team, 2020).
2. Methods.
2.1. Notation and definitions.
Let X^{(1)} = (X^{(1)}_1, X^{(1)}_2, ..., X^{(1)}_p)^⊤ and X^{(2)} = (X^{(2)}_1, X^{(2)}_2, ..., X^{(2)}_p)^⊤ denote two p-variate random variables with absolutely continuous distributions F^{(1)} and F^{(2)} defined on the same space X ⊆ R^p for two populations Π^{(1)} and Π^{(2)}, respectively. The marginal distributions of the components of X^{(k)} are denoted by F^{(k)}_j, for j = 1, 2, ..., p and k = 1, 2. Further, I(·) denotes the indicator function, which is equal to 1 if its argument is true and 0 otherwise.

Our goal is to assign a new observation y = (y_1, y_2, ..., y_p)^⊤ to either Π^{(1)} or Π^{(2)} according to how close the point is to one or the other. In quantile-based classification (Hennig and Viroli, 2016a), the distance is first calculated for each component of y using the asymmetrically weighted loss function

(1)   Φ^{(k)}(θ; y_j) = {θ + (1 − θ) I(y_j − Q^{(k)}_{X_j}(θ) < 0)} |y_j − Q^{(k)}_{X_j}(θ)|,

for j = 1, 2, ..., p and k = 1, 2, where Q^{(k)}_{X_j}(θ) is the componentwise quantile at level θ ∈ (0, 1) for the kth population, which can be obtained by inversion of F^{(k)}_j. Subsequently, y is assigned to Π^{(1)} if the discrepancy

(2)   d(y, θ) = Σ_{j=1}^{p} {Φ^{(2)}(θ; y_j) − Φ^{(1)}(θ; y_j)}

is positive, and to Π^{(2)} otherwise. The quantile classifier reduces to the componentwise median classifier of Hall, Titterington and Xue (2009) for θ = 0.5. An extension of (2) to more than two populations is straightforward.

The classification rule based on (2) does not acknowledge the possible interdependence among the variables, since quantiles are obtained marginally for each variable. We address this limitation by using directional quantiles for multivariate data (Kong and Mizera, 2012). We now explain our idea informally and, in the next section, give a rigorous treatment.

Define u to be a vector with unit norm in R^p. Throughout this paper, our focus will be on the projected random variables u^⊤X^{(k)} ≡ Z^{(k)}, k = 1, 2, with values in Z ⊆ R. By assumption, the Z^{(k)}'s are continuous. We denote the corresponding distribution and density functions with G^{(k)}(·; u) and g^{(k)}(·; u), respectively. Our goal is to develop a classifier where the quantities in (1) are opportunely redefined on the corresponding projections along u to capture the multivariate nature of the distributions, namely

(3)   Φ^{(k)}(θ; u^⊤y) = {θ + (1 − θ) I(u^⊤y − Q^{(k)}_Z(θ; u) < 0)} |u^⊤y − Q^{(k)}_Z(θ; u)|,

for k = 1, 2, where Q^{(k)}_Z(θ; u) ≡ Q^{(k)}_{u^⊤X}(θ) is the θth quantile of Z^{(k)}. The latter is obtained by inverting G^{(k)} and can be recognised as the θth directional quantile of X^{(k)} in the direction u (Kong and Mizera, 2012).

By working with projections, we basically summarise a multivariate problem as a univariate one. Clearly, one difficulty to address is how many and which directions should be considered. To this end, we should note that not all directions are equally useful for classification. To exemplify, consider Figure 1, which depicts bivariate normal samples from two independent populations centred at (1,1) and (3,3), respectively, with the same variance. We want to assign a new observation y, marked by the red filled squares in Figure 1, to one of the two populations.

[Figure 1. Simulated data depicting bivariate normal samples from two independent distributions (black and grey dots). The red filled squares mark the new observation y, while dashed lines mark directions.]

The log-densities at y of two bivariate normal distributions, with sample means and covariance matrices separately estimated from the two samples, suggest that y is more likely to have been generated from F^{(2)} than from F^{(1)}. Now compute Φ^{(k)}(0.5; u^⊤y), k = 1, 2, as in (3) for four normalised directions. The results are reported in Table 1. Based on a principle of minimum distance, we assign y to F^{(2)}, consistently with the maximum likelihood principle, for three, though not all four, directions.
Table 1
Distances Φ^{(k)}(0.5; u^⊤y), k = 1, 2, calculated for simulated data using four different directions u.

u^⊤                Φ^{(1)}   Φ^{(2)}
(−0.59, −0.81)     0.27      0.01
(0.24, 0.97)       1.58      0.07
(−0.20, −0.98)     0.30      0.08
(1.00, 0.02)       0.03      0.23
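To make the computations in (1)-(3) concrete, the following minimal R sketch reproduces the logic of the example above. The centres (1,1) and (3,3) come from the text; the coordinates of y, the direction u and the seed are hypothetical choices made purely for illustration.

    # Asymmetrically weighted distance in (1) and (3): Phi(theta; z) for a
    # scalar z relative to the quantile q of a (projected) population
    phi <- function(z, q, theta) {
      (theta + (1 - theta) * (z - q < 0)) * abs(z - q)
    }

    set.seed(1)
    n <- 200
    # Bivariate normal samples centred at (1,1) and (3,3), as in Figure 1
    X1 <- cbind(rnorm(n, 1), rnorm(n, 1))
    X2 <- cbind(rnorm(n, 3), rnorm(n, 3))
    y <- c(1.5, 2.5)                 # hypothetical new observation
    u <- c(1, 1) / sqrt(2)           # one normalised direction
    theta <- 0.5

    # Directional quantiles of the projections, as in (3)
    q1 <- quantile(X1 %*% u, theta)
    q2 <- quantile(X2 %*% u, theta)

    # Assign y to the population with the smaller directional distance
    d <- phi(sum(u * y), q2, theta) - phi(sum(u * y), q1, theta)
    if (d > 0) "Pi(1)" else "Pi(2)"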
2.2. Directional quantile classifier.
Let ϑ = {θ_1, θ_2, ..., θ_R} be a set of R distinct quantile levels in (0, 1), let υ_r = {u_{r1}, u_{r2}, ..., u_{rS_r}} be a set containing S_r normalised directions associated with θ_r, r = 1, ..., R, and let υ = {υ_1, υ_2, ..., υ_R}. (Note that for convenience one may set S_r = S for r = 1, ..., R.) As mentioned in the previous section, we need to be wary of particular directions that may lead us to a classification error. Therefore, we introduce weights ω_{rs} associated with each direction u_{rs} to decrease (or increase) their relative importance. Let ω = (ω_{11}, ..., ω_{1S_1}, ..., ω_{RS_R})^⊤ denote the vector of all such weights. We propose the discrepancy

(4)   d(y, ϑ, υ, ω) = Σ_{r=1}^{R} Σ_{s=1}^{S_r} ω_{rs} {Φ^{(2)}(θ_r; u_{rs}^⊤y) − Φ^{(1)}(θ_r; u_{rs}^⊤y)},

where Φ^{(k)} is defined in (3). Then our directional quantile classifier (DQC) assigns the observation y to Π^{(1)} if d(y, ϑ, υ, ω) > 0, or to Π^{(2)} otherwise. Note that if R = 1, S_r = p, ω_{rs} = 1, and υ = {e_1, e_2, ..., e_p}, the standard basis in R^p, then (4) reduces to (2).

A difficulty associated with the calculation of (4) is the selection of quantile levels, directions, and weights in the training data, say x, that give the best performance on the test data, say y. For some prior probabilities π_1 and π_2, let

(5)   ψ(x, ϑ, υ, ω) = π_1 ∫_X I{d(x, ϑ, υ, ω) > 0} dF^{(1)}(x) + π_2 ∫_X I{d(x, ϑ, υ, ω) ≤ 0} dF^{(2)}(x)

denote the population probability of correct classification by the DQC. Note that maximising (5) is equivalent to minimising the theoretical misclassification rate. For any given level θ and direction u, the optimal misclassification rate is obtained when

π_1 ∫_X Φ^{(1)}(θ; u^⊤x) dF^{(1)}(x) < π_1 ∫_X Φ^{(2)}(θ; u^⊤x) dF^{(1)}(x)

and

π_2 ∫_X Φ^{(2)}(θ; u^⊤x) dF^{(2)}(x) < π_2 ∫_X Φ^{(1)}(θ; u^⊤x) dF^{(2)}(x),

which is equivalent to minimising

(6)   π_1 ∫_X {Φ^{(1)}(θ; u^⊤x) − Φ^{(2)}(θ; u^⊤x)} dF^{(1)}(x) + π_2 ∫_X {Φ^{(2)}(θ; u^⊤x) − Φ^{(1)}(θ; u^⊤x)} dF^{(2)}(x).

In the general problem with K populations, the minimum misclassification rate is obtained when

(7)   Σ_{k=1}^{K} π_k ∫_X Φ^{(k)}(θ; u^⊤x) dF^{(k)}(x) < Σ_{k=1}^{K} π_k ∫_X min_{k' ≠ k} Φ^{(k')}(θ; u^⊤x) dF^{(k)}(x).

Let ∆^{(k)}(x, θ, u) = Φ^{(k)}(θ; u^⊤x) − min_{k' ≠ k} Φ^{(k')}(θ; u^⊤x). Given a sample of n observations x_i and corresponding class labels ℓ_i ∈ {1, 2, ..., K}, we aim to solve

(8)   min_{ϑ, υ, ω} Σ_{k=1}^{K} Σ_{i: ℓ_i = k} Σ_{r=1}^{R} Σ_{s=1}^{S_r} ω_{rs} ∆^{(k)}(x_i, θ_r, u_{rs}).

Problem (8) may seem daunting, but luckily we can solve for ω rather easily. Given ϑ and υ, problem (8) is linear with a unit-norm constraint and can be minimised by using the Lagrange multiplier method. This problem has a closed-form solution given by ω̂ = (ω̂_{11}, ..., ω̂_{1S_1}, ..., ω̂_{RS_R})^⊤ with generic rs-th element

ω̂_{rs} = ∆̃_{rs} / √(Σ_{r=1}^{R} Σ_{s=1}^{S_r} ∆̃_{rs}^2),

where ∆̃_{rs} = Σ_{k=1}^{K} Σ_{i: ℓ_i = k} ∆^{(k)}(x_i, θ_r, u_{rs}).
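In R, the closed-form solution for the weights is a one-liner. The sketch below assumes the aggregated quantities ∆̃_{rs} have already been computed and arranged in an R × S matrix; names and toy values are hypothetical.

    # Closed-form weights solving (8) for fixed quantile levels and
    # directions: omega is proportional to Delta-tilde, normalised to
    # unit Euclidean norm
    omega_hat <- function(Delta_tilde) {
      Delta_tilde / sqrt(sum(Delta_tilde^2))
    }

    # Toy usage with R = 2 levels and S = 3 directions per level
    Delta_tilde <- matrix(c(-0.80, -0.20, 0.10,
                            -0.50, -0.10, 0.05), nrow = 2, byrow = TRUE)
    omega_hat(Delta_tilde)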
We now turn to how to choose directions and quantile levels. A crude solution would consist in doing a multidimensional grid search in p + 1 dimensions. However, such a solution would become computationally prohibitive even at modest values of p. Thankfully, we are able to mitigate the computational cost of a naïve numerical solution with some theoretical results (Section 3); in particular, with Theorem 1, which guarantees that for each projection there exists (at least) a quantile level that leads to the optimal Bayes misclassification probability, and Theorem 2, which, conversely, identifies the best direction for a given quantile level. Unfortunately, a theoretical result for the simultaneous optimisation with respect to θ and u does not exist. Nevertheless, we show that our DQC is asymptotically optimal (i.e., the misclassification rate goes to zero) when the number of directions increases with p and n (Theorem 3), under certain assumptions.

In summary, there are different possible approaches, including randomly selecting one or more directions and using the optimal quantile levels associated with those directions, or spanning a grid of quantile levels and using the optimal directions associated with those quantiles. After some empirical investigation, we found that the following strategy gives satisfactory results in different settings, as illustrated in the sketch below. First, we define a grid of θ values spanning the unit interval and, for each of these values, randomly draw a set of normalised directions from the hyperplane that is identified as optimal according to Theorem 2. The performance of a DQC based on each single θ value is evaluated using five-fold cross-validation. In the end, we use a single quantile level (optimal according to cross-validation), with the corresponding directions sampled from the optimal hyperplane. In particular, this strategy improves over the use of an asymptotically optimal quantile level when n is small. Moreover, when p is not too large, a similar strategy can be used to select an approximately optimal hyperplane.
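A bare-bones version of this strategy might look as follows. For brevity, the sketch samples directions from the whole unit sphere rather than from the optimal hyperplane of Theorem 2, uses equal weights, and all tuning constants and toy data are illustrative assumptions.

    set.seed(2)
    phi_mat <- function(Z, q, theta) {
      D <- sweep(Z, 2, q)                    # z - Q(theta; u_s), per direction
      (theta + (1 - theta) * (D < 0)) * abs(D)
    }

    # Five-fold cross-validated error of an equally weighted DQC at level theta
    cv_error <- function(X, cl, theta, U, folds = 5) {
      id <- sample(rep_len(1:folds, nrow(X)))
      miss <- 0
      for (f in 1:folds) {
        tr <- id != f
        q1 <- apply(X[tr & cl == 1, ] %*% U, 2, quantile, probs = theta)
        q2 <- apply(X[tr & cl == 2, ] %*% U, 2, quantile, probs = theta)
        Z <- X[!tr, , drop = FALSE] %*% U
        d <- rowSums(phi_mat(Z, q2, theta) - phi_mat(Z, q1, theta))
        miss <- miss + sum(ifelse(d > 0, 1, 2) != cl[!tr])
      }
      miss / nrow(X)
    }

    # Toy data: two location-shifted populations in p = 10 dimensions
    p <- 10; n <- 100
    X <- rbind(matrix(rnorm(n / 2 * p), ncol = p),
               matrix(rnorm(n / 2 * p, mean = 0.4), ncol = p))
    cl <- rep(1:2, each = n / 2)
    U <- matrix(rnorm(p * 30), p, 30)
    U <- sweep(U, 2, sqrt(colSums(U^2)), "/")   # 30 random unit directions

    thetas <- seq(0.1, 0.9, by = 0.1)
    errs <- sapply(thetas, function(th) cv_error(X, cl, th, U))
    thetas[which.min(errs)]                     # selected quantile level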
3. Theoretical results.
In this section, we present theoretical results concerning our DQC. The proofs of lemmas and theorems are reported in the Appendix.
3.1. Optimal quantile level θ. We derive the theoretical rate of correct classification as a function of θ, for given u. We assume K = 2 populations, although the results can be generalised to K > 2.

Lemma 1. For given u, let Q_α(θ; u) = min{Q^{(1)}_Z(θ; u), Q^{(2)}_Z(θ; u)} with corresponding distribution function G_α(·; u), density g_α(·; u), and prior probability π_α, and let Q_β(θ; u) = max{Q^{(1)}_Z(θ; u), Q^{(2)}_Z(θ; u)} with corresponding distribution function G_β(·; u), density g_β(·; u), and prior probability π_β. The probability of correct classification of the directional quantile classifier is

(9)   ψ(θ) = π_α G_α(Q̃(θ; u); u) + π_β {1 − G_β(Q̃(θ; u); u)},

where Q̃(θ; u) = θ Q_α(θ; u) + (1 − θ) Q_β(θ; u). Analogously, the theoretical misclassification rate is

(10)   1 − ψ(θ) = π_α {1 − G_α(Q̃(θ; u); u)} + π_β G_β(Q̃(θ; u); u).

[Figure 2. Misclassification probability (shaded grey area) with two location-shifted skewed distributions according to the median classifier (upper panel) and the optimal quantile classifier (lower panel).]
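The quantities in (9) and (10) are easy to evaluate numerically. The sketch below, in the spirit of Figure 2, uses two location-shifted gamma densities with equal priors; the gamma family and the size of the shift are assumptions made purely for illustration.

    # Theoretical misclassification rate (10) as a function of theta for
    # two right-skewed, location-shifted densities with equal priors
    theta <- seq(0.01, 0.99, by = 0.01)
    shift <- 1.5
    q_a <- qgamma(theta, shape = 2)             # Q_alpha(theta)
    q_b <- q_a + shift                          # Q_beta(theta), location shift
    q_tilde <- theta * q_a + (1 - theta) * q_b  # decision boundary
    mis <- 0.5 * (1 - pgamma(q_tilde, shape = 2)) +
      0.5 * pgamma(q_tilde - shift, shape = 2)
    c(median = mis[which.min(abs(theta - 0.5))],  # theta = 0.5 (median rule)
      optimal = min(mis))                         # optimal theta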
Theorem 1. Assume that the density functions g_α(z; u) and g_β(z; u) exist and are nonzero on the same compact domain Z. Further assume that there is a point z* with π_α g_α(z*; u) = π_β g_β(z*; u), so that π_α g_α(z; u) > π_β g_β(z; u) for z on one side of z* and π_α g_α(z; u) < π_β g_β(z; u) for z on the other side of z*. Then the quantile classifier using the quantile Q̃(θ*; u) that minimises the theoretical misclassification probability achieves the optimal Bayes misclassification probability.

The advantage of the optimal quantile classifier may be illustrated with an example. Consider a two-class decision problem where one population is a location-shifted version of the other. Figure 2 shows two distributions which have both the same right skewness. The quantiles Q_α(θ) and Q_β(θ) are marked by dashed lines. The median classifier (Hall, Titterington and Xue, 2009) in the upper panel leads to a non-optimal misclassification probability equal to 0.30. However, the misclassification probability is reduced to 0.28 by setting θ at its optimal level (lower panel).

3.2. Optimal direction u. The next lemma and theorem give the optimal direction that minimises the misclassification rate at a given θ.

Lemma 2. Let z be a realisation of either Z^{(1)} or Z^{(2)}; then

Φ^{(2)}(θ; z) − Φ^{(1)}(θ; z) ≤ Q^{(2)}_Z(θ) − Q^{(1)}_Z(θ),

where Φ^{(k)}(θ; z) = θ max(η^{(k)}, 0) + (1 − θ) max(−η^{(k)}, 0) and η^{(k)} = z − Q^{(k)}_Z(θ), k = 1, 2.

Theorem 2. Let W = (W_1, W_2, ..., W_p)^⊤ be a p-variate random variable such that Q_{W_j}(θ) = 0 for j = 1, ..., p, and let µ^{(k)} = (µ^{(k)}_1, µ^{(k)}_2, ..., µ^{(k)}_p)^⊤ be a vector of constants, k = 1, 2. We assume that X^{(k)} = W + µ^{(k)}, with probability distribution function F^{(k)}, for k = 1, 2. Moreover, assume that Q^{(2)}_Z(θ; u) > Q^{(1)}_Z(θ; u), where Q^{(k)}_Z(θ; u) is the θ-quantile of Z^{(k)} ≡ u^⊤X^{(k)}. (Notice that there is no loss of generality with this assumption, since the case Q^{(2)}_Z(θ; u) ≤ Q^{(1)}_Z(θ; u) can be reformulated as Q^{(2)}_Z(θ; −u) > Q^{(1)}_Z(θ; −u).) Under these assumptions, the normalised direction u that minimises the misclassification error (6) is

(11)   û = (µ^{(2)} − µ^{(1)}) / ‖µ^{(2)} − µ^{(1)}‖.

The generalisation of Theorem 2 to K > 2 populations involves the K(K − 1)/2 pairwise comparisons between classes.
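In practice, µ^{(1)} and µ^{(2)} are unknown, and a natural plug-in version of (11) replaces them with location estimates from the training samples. A minimal sketch, using componentwise medians as the location estimates (one choice consistent with Q_{W_j}(0.5) = 0):

    # Plug-in estimate of the optimal direction (11)
    opt_direction <- function(X1, X2) {
      d <- apply(X2, 2, median) - apply(X1, 2, median)
      d / sqrt(sum(d^2))                      # normalise to unit length
    }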
3.3. Asymptotic misclassification rate. In this section, we show that, under certain assumptions, the correct classification probability converges to unity when the number of dimensions grows to infinity along with the sample size and the number of projections. The proof is built following a strategy similar to that used in Hall, Titterington and Xue (2009, Theorem 2), although our premises start from milder assumptions. In particular, the projections are not required to obey the "ψ-mixing condition" (Bradley, 2005), which is rather strict in practice. Our theorem is developed for any θ_r ∈ (0, 1), with ω_{rs} = 1 and R = 1. Thus, the asymptotic result holds for the sub-components of the summation in (8), which are then weighted and summed to minimise the misclassification rate. Hence, the overall criterion inherits the optimal properties of its additive components. As we did with the theorems in the previous sections, we present this theorem for K = 2 classes; its extension to K > 2 can be obtained along the same lines by means of K − 1 pairwise comparisons.
Theorem 3. Consider a set of directions υ = {u_1, ..., u_S} sampled from a unit p-sphere, and let n = max(n_1, n_2), with n_1 and n_2 denoting the sample sizes of the two groups in the training set. Assume:

(i) For a constant A_1 > 0, S ≥ A_1 n.
(ii) The p variables X^{(k)}_1, X^{(k)}_2, ..., X^{(k)}_p have each the same distribution as W_1 + µ^{(k)}_1, W_2 + µ^{(k)}_2, ..., W_p + µ^{(k)}_p, respectively. Moreover, Q_{W_j}(θ) = 0 for all j, and sup_{j ≥ 1} Var(W_j) = A_2 < +∞.
(iii) The first moments of the projections are uniformly bounded in a strong sense. This implies that for all c > 0 and all u_s, there exists v with |u_s^⊤v| > c such that inf_{s ≥ 1} inf_{|u_s^⊤v| > c} θ E|u_s^⊤W + u_s^⊤v| − (1 − θ) E|u_s^⊤W| > 0.
(iv) For some ε > 0, the proportion of values s ∈ {1, 2, ..., S} for which |θ u_s^⊤µ^{(2)} − (1 − θ) u_s^⊤µ^{(1)}| > ε, multiplied by n^{1/2}, say n^{1/2} ♯K_ε, is of larger order than S; that is, S (n^{1/2} ♯K_ε)^{−1} goes to zero as n and S increase.

Under the previous assumptions, the directional quantile classifier C based on

d(y, θ, υ, ω) = Σ_{s=1}^{S} {Φ^{(2)}(θ; u_s^⊤y) − Φ^{(1)}(θ; u_s^⊤y)}

makes the correct choice asymptotically. More specifically, as p → ∞, the classifier C makes the correct decision with probabilities P^{(1)}{C(Y) = 1} and P^{(2)}{C(Y) = 2} both converging to 1 if both n_1 and n_2 diverge with p, where P^{(k)}, k = 1, 2, denotes probability computed under the assumption that Y is drawn from population k.
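Theorem 3 requires directions sampled uniformly from the unit p-sphere, with S growing at least linearly with n. Normalised standard Gaussian vectors are uniform on the sphere, so the sampling step is straightforward; the constant A_1 = 1 and the toy sizes below are arbitrary illustrative choices.

    # Directions sampled uniformly from the unit p-sphere
    runif_sphere <- function(S, p) {
      U <- matrix(rnorm(p * S), p, S)
      sweep(U, 2, sqrt(colSums(U^2)), "/")
    }

    n1 <- 100; n2 <- 120; p <- 500        # toy sizes
    S <- max(n1, n2)                      # S >= A1 * n with A1 = 1
    U <- runif_sphere(S, p)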
4. Simulation study.
We assessed the performance of the proposed classifier in a simulation study under three scenarios with two populations. In the first scenario, observations were generated independently from a multivariate Student's t distribution with 3 degrees of freedom, with either uncorrelated or correlated variables. In the second scenario, observations were generated as in the first scenario, but each variable was subsequently transformed according to x ↦ log(|x|) to induce asymmetry. In both cases, the two populations differed by a location shift equal to 0.4. Finally, in the third scenario, observations were generated as in the first scenario, but each variable was subsequently transformed according to x ↦ log(|x|) or to x ↦ −log(|x|), depending on whether observations belonged to one or the other population, respectively.
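The three data-generating mechanisms can be sketched in a few lines of R; the helper below assumes the packages mvtnorm and clusterGeneration are available and that the two populations are balanced.

    library(mvtnorm)             # rmvt: multivariate Student's t
    library(clusterGeneration)   # rcorrmatrix: random correlation matrices

    gen_data <- function(n, p, scenario = 1, correlated = FALSE, delta = 0.4) {
      R <- if (correlated) rcorrmatrix(p) else diag(p)
      X1 <- rmvt(n / 2, sigma = R, df = 3)
      X2 <- rmvt(n / 2, sigma = R, df = 3) + delta   # location shift
      if (scenario == 2) { X1 <- log(abs(X1)); X2 <- log(abs(X2)) }
      if (scenario == 3) { X1 <- log(abs(X1)); X2 <- -log(abs(X2)) }
      list(X = rbind(X1, X2), cl = rep(1:2, each = n / 2))
    }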
Data were generated for each combination of overall sample size n ∈ {50, 100, 500} (with n/2 observations from each population) and dimension p ∈ {10, 50, 100, 500}. All in all, this resulted in 3 × 2 × 3 × 4 = 72 simulation cases (scenarios × correlation structures × sample sizes × dimensions). The correlation matrix of the t distribution with correlated variables was generated randomly for each p using the function rcorrmatrix with default settings as provided in the package clusterGeneration (Qiu and Joe, 2015; Joe, 2006). This resulted in non-constant pairwise correlations on the interval (−1, 1).

We compared the DQC with the centroid classifier (Tibshirani et al., 2002), the median classifier (Hall, Titterington and Xue, 2009), the componentwise quantile classifier (CQC) (Hennig and Viroli, 2016a), the ensemble quantile classifier (EQC) (Lai and McLeod, 2020), linear discriminant analysis (LDA), k-nearest neighbour (KNN) (Cover and Hart, 1967), penalised logistic regression (PLR) (Park and Hastie, 2008), support vector machines (SVM) (Cortes and Vapnik, 1995; Wang, Zhu and Zou, 2008), and the naïve Bayes classifier (Bayes) (Hand and Yu, 2001). Tuning parameters for PLR, KNN, and SVM were selected using cross-validation. For the CQC, the Galton correction was used to reduce skewness, and the optimal quantile was selected by minimising the error rate on the training set (Hennig and Viroli, 2016a).

We used the package Qtools (Geraci, 2016, 2020) for the directional quantile classifier; the package quantileDA (Hennig and Viroli, 2016b) for the centroid, median and componentwise quantile classifiers; the package eqc (Lai and McLeod, 2019) for the ensemble quantile classifier; the package MASS (Venables and Ripley, 2002) for linear discriminant analysis; the package class (Venables and Ripley, 2002) for k-nearest neighbour; the package e1071 (Meyer et al., 2019) for support vector machines and Bayes classifiers; and the package stepPlr (Park and Hastie, 2018) for penalised logistic regression. All analyses were carried out in R version 4.0.0 (R Core Team, 2020).

The misclassification rates averaged over 100 replications for all simulation cases are reported in Tables 2-4. The results indicate that the performance of our proposed classifier improves as n and p increase, in agreement with the theoretical results. In the first two scenarios, our classifier outperforms the competitors when variables are uncorrelated. When variables are correlated, the proposed classifier still performs very well, even if it is not uniformly the best. In the third scenario, where class distributions have different shapes, the performance of our classifier is often, but not always, the best.
5. Clinical trial on Crohn’s disease.
We analyse data from a matched case-control study in first-degree relatives (FDRs) of Crohn's disease (CD) patients, originally published by Sorrentino et al. (2014). The goal of the study was to identify asymptomatic FDRs with early CD signs using several intestinal inflammatory markers. The latter included hemoglobin, erythrocyte sedimentation rate, C-reactive protein, fecal calprotectin, and average mature ileum score. In our analysis, we grouped subjects into two classes, one with signs of inflammation (n = 9 subjects with early or frank CD) and one with normal values of the markers (n = 26 subjects with no signs of inflammation, including healthy controls). In a separate analysis, we augmented the dataset with 45 artificial markers generated from independent standard normal distributions to investigate the impact of uninformative noise on the performance of the DQC. We approached the data analysis with leave-one-out validation and evaluated the misclassification rate as the proportion of subjects that are misclassified when each is left out of the analysis.

We estimated the classification error for all the classifiers included in our simulation study (Section 4). The results are reported in Table 5. The proposed DQC outperforms its competitors in both the original (p = 5) and noisy (p = 50) versions of the dataset.
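The leave-one-out procedure used here is generic and can wrap any of the classifiers considered; in the sketch below, fit_predict is a hypothetical function that fits a classifier on the training data and returns the predicted class label of a single held-out observation.

    # Leave-one-out estimate of the misclassification rate
    loo_error <- function(X, cl, fit_predict) {
      pred <- vapply(seq_len(nrow(X)), function(i) {
        fit_predict(X[-i, , drop = FALSE], cl[-i], X[i, , drop = FALSE])
      }, numeric(1))
      mean(pred != cl)
    }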
6. Conclusions.
We proposed directional quantile classifiers whose predictive ability is consistently good in both simulation and real data studies, on small and large dimensional classification problems. In particular, the empirical results show that our approach either outperforms its competitors or, when this is not the case, its performance is still in the ballpark of that of the best classifiers. Such reliable behaviour across different scenarios is not shared by the other selected classifiers. Moreover, the directional quantile classifiers enjoy optimal theoretical properties under certain assumptions. A limitation of the approach is that the number of directions needed to span a p-sphere with a regular grid becomes prohibitive already at modest values of p. On the other hand, our theoretical results indicate that one can sample directions from an optimal hyperplane, thus reducing the computational burden, but not at the expense of the classifier's performance. Our strategy allows us to balance the importance of the quantile levels and directions used for classification by means of weights, which can be optimised using a convenient closed-form expression.
Table 2
Misclassification rates averaged over 100 replications for ten classifiers (DQC, directional quantile classifier; Centroid, centroid classifier; Median, median classifier; CQC, componentwise quantile classifier; EQC, ensemble quantile classifier; LDA, linear discriminant analysis; KNN, k-nearest neighbour; PLR, penalised logistic regression; SVM, support vector machines; Bayes, naïve Bayes) in the first scenario where populations have symmetric distributions.

                     Uncorrelated                    Correlated
Dimension p      10     50     100    500       10     50     100    500

Sample size n = 50
DQC             0.334  0.187  0.120  0.020     0.315  0.202  0.128  0.028
Centroid        0.355  0.232  0.168  0.049     0.349  0.277  0.189  0.059
Median          0.372  0.230  0.153  0.043     0.357  0.252  0.170  0.047
CQC             0.362  0.273  0.220  0.180     0.367  0.284  0.222  0.177
EQC             0.373  0.240  0.172  0.044     0.339  0.253  0.168  0.055
LDA             0.365  0.382  0.295  0.313     0.245  0.252  0.308  0.339
KNN             0.362  0.287  0.263  0.212     0.360  0.300  0.271  0.211
PLR             0.348  0.199  0.134  0.023     0.275  0.154  0.103  0.025
SVM             0.413  0.252  0.140  0.046     0.401  0.263  0.140  0.049
Bayes           0.390  0.333  0.302  0.225     0.395  0.327  0.287  0.237

Sample size n = 100
DQC             0.306  0.145  0.089  0.015     0.283  0.146  0.076  0.017
Centroid        0.325  0.181  0.114  0.025     0.331  0.210  0.132  0.036
Median          0.334  0.194  0.129  0.032     0.339  0.211  0.136  0.033
CQC             0.343  0.213  0.151  0.076     0.341  0.223  0.157  0.092
EQC             0.337  0.213  0.135  0.039     0.329  0.193  0.138  0.040
LDA             0.338  0.236  0.393  0.182     0.226  0.055  0.105  0.240
KNN             0.346  0.214  0.184  0.102     0.325  0.223  0.191  0.108
PLR             0.329  0.182  0.113  0.019     0.240  0.092  0.056  0.020
SVM             0.370  0.176  0.106  0.032     0.382  0.153  0.069  0.034
Bayes           0.367  0.284  0.227  0.179     0.371  0.291  0.242  0.179

Sample size n = 500
DQC             0.286  0.128  0.069  0.010     0.263  0.126  0.058  0.010
Centroid        0.300  0.145  0.080  0.014     0.288  0.154  0.077  0.016
Median          0.320  0.173  0.101  0.020     0.315  0.178  0.099  0.019
CQC             0.327  0.176  0.108  0.023     0.320  0.183  0.103  0.025
EQC             0.324  0.177  0.106  0.021     0.296  0.149  0.085  0.021
LDA             0.302  0.160  0.106  0.367     0.196  0.027  0.000  0.036
KNN             0.326  0.163  0.098  0.018     0.283  0.166  0.097  0.022
PLR             0.301  0.161  0.104  0.013     0.194  0.044  0.018  0.008
SVM             0.300  0.142  0.084  0.018     0.238  0.066  0.020  0.014
Bayes           0.329  0.198  0.145  0.077     0.326  0.199  0.140  0.076
Table 3
Misclassification rates averaged over 100 replications for ten classifiers (DQC, directional quantile classifier; Centroid, centroid classifier; Median, median classifier; CQC, componentwise quantile classifier; EQC, ensemble quantile classifier; LDA, linear discriminant analysis; KNN, k-nearest neighbour; PLR, penalised logistic regression; SVM, support vector machines; Bayes, naïve Bayes) in the second scenario where populations have distributions with the same skewness.

                     Uncorrelated                    Correlated
Dimension p      10     50     100    500       10     50     100    500

Sample size n = 50
DQC             0.313  0.170  0.096  0.052     0.306  0.169  0.095  0.059
Centroid        0.323  0.212  0.145  0.097     0.330  0.220  0.140  0.113
Median          0.334  0.206  0.147  0.105     0.334  0.215  0.140  0.106
CQC             0.350  0.235  0.187  0.234     0.360  0.245  0.193  0.248
EQC             0.340  0.180  0.089  0.015     0.333  0.178  0.102  0.019
LDA             0.317  0.383  0.238  0.228     0.315  0.397  0.233  0.237
KNN             0.382  0.275  0.210  0.064     0.364  0.282  0.213  0.078
PLR             0.313  0.183  0.095  0.002     0.322  0.186  0.098  0.004
SVM             0.330  0.224  0.150  0.021     0.332  0.240  0.150  0.021
Bayes           0.378  0.281  0.220  0.153     0.377  0.272  0.223  0.161

Sample size n = 100
DQC             0.293  0.129  0.060  0.012     0.280  0.118  0.058  0.010
Centroid        0.310  0.168  0.106  0.057     0.307  0.161  0.104  0.065
Median          0.328  0.177  0.110  0.067     0.316  0.171  0.106  0.071
CQC             0.317  0.173  0.110  0.135     0.314  0.160  0.113  0.137
EQC             0.310  0.135  0.071  0.006     0.296  0.126  0.062  0.006
LDA             0.301  0.218  0.374  0.084     0.281  0.203  0.395  0.089
KNN             0.358  0.242  0.188  0.047     0.353  0.256  0.186  0.045
PLR             0.300  0.163  0.079  0.001     0.284  0.153  0.078  0.001
SVM             0.318  0.177  0.089  0.005     0.333  0.168  0.079  0.005
Bayes           0.330  0.229  0.167  0.095     0.334  0.225  0.166  0.098

Sample size n = 500
DQC             0.273  0.097  0.038  0.000     0.265  0.097  0.035  0.000
Centroid        0.282  0.119  0.059  0.007     0.275  0.116  0.056  0.008
Median          0.295  0.128  0.074  0.017     0.286  0.124  0.069  0.019
CQC             0.272  0.099  0.053  0.019     0.261  0.092  0.050  0.018
EQC             0.267  0.088  0.035  0.001     0.244  0.079  0.032  0.001
LDA             0.279  0.116  0.060  0.374     0.266  0.114  0.057  0.372
KNN             0.323  0.206  0.140  0.016     0.310  0.207  0.140  0.015
PLR             0.279  0.121  0.060  0.000     0.266  0.119  0.056  0.000
SVM             0.283  0.109  0.046  0.000     0.274  0.107  0.044  0.000
Bayes           0.273  0.129  0.080  0.020     0.266  0.125  0.077  0.021

Table 4
Misclassification rates averaged over 100 replications for ten classifiers (DQC, directional quantile classifier; Centroid, centroid classifier; Median, median classifier; CQC, componentwise quantile classifier; EQC, ensemble quantile classifier; LDA, linear discriminant analysis; KNN, k-nearest neighbour; PLR, penalised logistic regression; SVM, support vector machines; Bayes, naïve Bayes) in the third scenario where populations have distributions with opposite skewness.

                     Uncorrelated                    Correlated
Dimension p      10     50     100    500       10     50     100    500

Sample size n = 50
DQC             0.199  0.171  0.166  0.159     0.237  0.172  0.110  0.023
Centroid        0.228  0.176  0.169  0.160     0.362  0.265  0.190  0.066
Median          0.321  0.283  0.273  0.264     0.359  0.240  0.166  0.045
CQC             0.236  0.112  0.087  0.073     0.371  0.279  0.215  0.181
EQC             0.315  0.279  0.256  0.234     0.349  0.239  0.162  0.051
LDA             0.277  0.450  0.253  0.161     0.248  0.270  0.298  0.349
KNN             0.277  0.213  0.192  0.173     0.365  0.284  0.214  0.074
PLR             0.259  0.252  0.213  0.173     0.317  0.189  0.100  0.003
SVM             0.231  0.175  0.170  0.159     0.338  0.240  0.157  0.018
Bayes           0.229  0.132  0.123  0.106     0.373  0.288  0.227  0.145

Sample size n = 100
DQC             0.188  0.162  0.165  0.166     0.195  0.133  0.071  0.016
Centroid        0.214  0.167  0.167  0.166     0.336  0.212  0.128  0.033
Median          0.314  0.287  0.285  0.275     0.341  0.215  0.132  0.032
CQC             0.214  0.086  0.071  0.058     0.346  0.226  0.159  0.091
EQC             0.296  0.254  0.256  0.241     0.334  0.198  0.132  0.039
LDA             0.237  0.300  0.456  0.174     0.222  0.056  0.110  0.238
KNN             0.246  0.204  0.191  0.184     0.351  0.251  0.189  0.044
PLR             0.234  0.252  0.235  0.187     0.284  0.156  0.079  0.001
SVM             0.221  0.169  0.170  0.166     0.323  0.174  0.083  0.004
Bayes           0.187  0.112  0.105  0.095     0.343  0.232  0.169  0.092

Sample size n = 500
DQC             0.182  0.166  0.162  0.159     0.177  0.111  0.053  0.010
Centroid        0.209  0.170  0.165  0.160     0.283  0.153  0.076  0.015
Median          0.312  0.288  0.283  0.279     0.312  0.178  0.099  0.020
CQC             0.203  0.069  0.052  0.041     0.316  0.182  0.104  0.024
EQC             0.282  0.249  0.241  0.236     0.293  0.148  0.085  0.021
LDA             0.212  0.194  0.212  0.474     0.194  0.027  0.000  0.032
KNN             0.193  0.173  0.176  0.178     0.311  0.206  0.138  0.015
PLR             0.213  0.201  0.226  0.237     0.266  0.118  0.057  0.001
SVM             0.209  0.172  0.167  0.163     0.274  0.107  0.043  0.001
Bayes           0.164  0.102  0.096  0.086     0.269  0.126  0.080  0.019
Table 5
Cross-validated estimates of the misclassification rates for the Crohn's disease dataset (p = 5) and its noisy version (p = 50) using ten classifiers (DQC, directional quantile classifier; Centroid, centroid classifier; Median, median classifier; CQC, componentwise quantile classifier; EQC, ensemble quantile classifier; LDA, linear discriminant analysis; KNN, k-nearest neighbour; PLR, penalised logistic regression; SVM, support vector machines; Bayes, naïve Bayes).

            p = 5    p = 50
DQC         0.229    0.229
Centroid    0.286    0.286
Median      0.400    0.400
CQC         0.314    0.343
EQC         0.314    0.314
LDA         0.257    0.543
KNN         0.371    0.343
PLR         0.286    0.343
SVM         0.257    0.257
Bayes       0.286    0.257

APPENDIX A - PROOFS OF THEOREMS
A.1. Proofs of Lemma 1 and Theorem 1.
Proof.
The proofs of Lemma 1 and Theorem 1 follow the arguments given in Hennig and Viroli (2016a, Supplementary Material). Here, we briefly sketch the main idea. The optimal value θ* that minimises the theoretical misclassification probability can be obtained by setting the first derivative of (10) to zero, from which

π_α g_α{Q̃(θ; u)} = π_β g_β{Q̃(θ; u)}.

By assumption, there exists θ* ∈ (0, 1) such that Q̃(θ*; u) = z*. Hence, the identity above is satisfied because Q_α(θ; u) and Q_β(θ; u) are continuous functions of θ that converge to the lower and upper bound of Z for θ approaching either 0 or 1, respectively. Furthermore, under the assumptions of Theorem 1, the optimal Bayes classifier has a single decision boundary at Q̃(θ*; u).

A.2. Proof of Lemma 2.
Proof.
Without loss of generality, assume Q^{(1)}_Z(θ) ≤ Q^{(2)}_Z(θ). Let ∆(θ; z) = Φ^{(2)}(θ; z) − Φ^{(1)}(θ; z) and consider three possible, distinct cases: z ≤ Q^{(1)}_Z(θ), Q^{(1)}_Z(θ) < z < Q^{(2)}_Z(θ), and Q^{(2)}_Z(θ) ≤ z.

If z ≤ Q^{(1)}_Z(θ), then

∆(θ; z) = (1 − θ){Q^{(2)}_Z(θ) − z} − (1 − θ){Q^{(1)}_Z(θ) − z} = (1 − θ){Q^{(2)}_Z(θ) − Q^{(1)}_Z(θ)} ≤ Q^{(2)}_Z(θ) − Q^{(1)}_Z(θ)

by definition. If Q^{(1)}_Z(θ) < z < Q^{(2)}_Z(θ), then

∆(θ; z) = (1 − θ){Q^{(2)}_Z(θ) − z} − θ{z − Q^{(1)}_Z(θ)} = θ{Q^{(1)}_Z(θ) − Q^{(2)}_Z(θ)} + Q^{(2)}_Z(θ) − z ≤ (1 − θ){Q^{(2)}_Z(θ) − Q^{(1)}_Z(θ)} ≤ Q^{(2)}_Z(θ) − Q^{(1)}_Z(θ),

where the first inequality follows from z > Q^{(1)}_Z(θ). Finally, if Q^{(2)}_Z(θ) ≤ z, then

∆(θ; z) = θ{z − Q^{(2)}_Z(θ)} − θ{z − Q^{(1)}_Z(θ)} = θ{Q^{(1)}_Z(θ) − Q^{(2)}_Z(θ)} ≤ 0 ≤ Q^{(2)}_Z(θ) − Q^{(1)}_Z(θ).
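A quick numerical check of the bound, under arbitrary illustrative values of θ and the two quantiles (an illustration, not a substitute for the proof):

    # Lemma 2: Phi^(2) - Phi^(1) never exceeds the quantile gap Q2 - Q1
    theta <- 0.3; Q1 <- 0; Q2 <- 2
    phi_k <- function(z, q) {
      eta <- z - q
      theta * pmax(eta, 0) + (1 - theta) * pmax(-eta, 0)
    }
    z <- seq(-5, 5, by = 0.01)
    max(phi_k(z, Q2) - phi_k(z, Q1)) <= Q2 - Q1   # TRUE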
A.3. Proof of Theorem 2.
Proof.
By Lemma 2, the differences Φ^{(1)}(θ; u^⊤x) − Φ^{(2)}(θ; u^⊤x) and Φ^{(2)}(θ; u^⊤x) − Φ^{(1)}(θ; u^⊤x) are bounded above by Q^{(2)}_Z(θ; u) − Q^{(1)}_Z(θ; u), since Q^{(2)}_Z(θ; u) > Q^{(1)}_Z(θ; u). Therefore the quantity in (6), which is to be minimised with respect to u subject to ‖u‖ = 1, is uniformly bounded above by

π_1 ∫_X {Φ^{(1)}(θ; u^⊤x) − Φ^{(2)}(θ; u^⊤x)} dF^{(1)}(x) + π_2 ∫_X {Φ^{(2)}(θ; u^⊤x) − Φ^{(1)}(θ; u^⊤x)} dF^{(2)}(x) + λ(u^⊤u − 1)
≤ Q^{(2)}_Z(θ; u) − Q^{(1)}_Z(θ; u) + λ(u^⊤u − 1)
= (Q_W(θ; u) + u^⊤µ^{(2)}) − (Q_W(θ; u) + u^⊤µ^{(1)}) + λ(u^⊤u − 1)
= u^⊤(µ^{(2)} − µ^{(1)}) + λ(u^⊤u − 1).

To find u, we optimise the Lagrangian function u^⊤(µ^{(2)} − µ^{(1)}) + λ(u^⊤u − 1) with respect to u and λ, which yields (11).

A.4. Proof of Theorem 3.
Proof.
Let Q̂^{(k)}_Z(θ; u_s) be the empirical quantile computed on the projected training data u_s^⊤X^{(k)}. We write

Φ^{(k)}(θ; u_s^⊤Y) = γ^{(k)}_s(θ) |u_s^⊤Y − Q̂^{(k)}_Z(θ; u_s)|,

where γ^{(k)}_s(θ) = θ + (1 − θ) I{u_s^⊤Y < Q̂^{(k)}_Z(θ; u_s)}. Let µ_y denote the vector of quantiles of Y, and put µ^{(k)}_y = µ^{(k)} − µ_y for k = 1, 2, and V = Y − µ_y. By the triangular inequality,

γ^{(2)}_s(θ)|u_s^⊤Y − Q̂^{(2)}_Z(θ; u_s)| − γ^{(1)}_s(θ)|u_s^⊤Y − Q̂^{(1)}_Z(θ; u_s)| = γ^{(2)}_s(θ)|u_s^⊤V − u_s^⊤µ^{(2)}_y| − γ^{(1)}_s(θ)|u_s^⊤V − u_s^⊤µ^{(1)}_y| + τ_2 |Q̂^{(2)}_Z(θ; u_s) − u_s^⊤µ^{(2)}| + τ_1 |Q̂^{(1)}_Z(θ; u_s) − u_s^⊤µ^{(1)}|,

where τ_1 and τ_2 satisfy |τ_k| ≤ 1, k = 1, 2. Hence

T ≡ Σ_{s=1}^{S} γ^{(2)}_s(θ)|u_s^⊤Y − Q̂^{(2)}_Z(θ; u_s)| − γ^{(1)}_s(θ)|u_s^⊤Y − Q̂^{(1)}_Z(θ; u_s)| = T_1 + τ_1 R_1 + τ_2 R_2,

where T_1 = Σ_{s=1}^{S} γ^{(2)}_s(θ)|u_s^⊤V − u_s^⊤µ^{(2)}_y| − γ^{(1)}_s(θ)|u_s^⊤V − u_s^⊤µ^{(1)}_y|, R_1 = Σ_{s=1}^{S} |Q̂^{(1)}_Z(θ; u_s) − u_s^⊤µ^{(1)}| and R_2 = Σ_{s=1}^{S} |Q̂^{(2)}_Z(θ; u_s) − u_s^⊤µ^{(2)}|. Given the convergence of the empirical quantiles to the respective population quantiles, it follows that

P^{(1)}(T > c − 2c_2 S n^{−1/2}) ≥ P^{(1)}(T_1 > c) − P(R_1 > c_2 S n^{−1/2}) − P(R_2 > c_2 S n^{−1/2}) ≥ P^{(1)}(T_1 > c) − 2 Σ_{s=1}^{S} e^{−2n(δ^{(1)}_s)^2} − 2 Σ_{s=1}^{S} e^{−2n(δ^{(2)}_s)^2}

for any c, c_2 > 0, where

δ^{(k)}_s = min{F^{(k)}(u_s^⊤µ^{(k)} + c_2 n^{−1/2}) − θ, θ − F^{(k)}(u_s^⊤µ^{(k)} − c_2 n^{−1/2})}.

Now define d_s = E{γ^{(2)}_s(θ)|u_s^⊤(V − µ^{(2)}_y)| − γ^{(1)}_s(θ)|u_s^⊤(V − µ^{(1)}_y)|}. Given ε > 0, let K_ε denote the set of indices s ∈ {1, 2, ..., S} such that |γ^{(2)}_s(θ) u_s^⊤µ^{(2)} − γ^{(1)}_s(θ) u_s^⊤µ^{(1)}| > ε for all θ ∈ (0, 1). When Y has distribution F^{(1)}, we have

d_s = E{γ^{(2)}_s(θ)|u_s^⊤(Y − µ^{(2)})| − γ^{(1)}_s(θ)|u_s^⊤(Y − µ^{(1)})|} = γ^{(2)}_s(θ) E|u_s^⊤(W + µ^{(1)} − µ^{(2)})| − γ^{(1)}_s(θ) E|u_s^⊤W|,

where E is the expectation under P^{(1)}. Therefore, by assumption (iii) and provided c ≥ ε, we have Σ_{s ∈ K_ε} d_s ≥ a(c)(♯K_c), where a(c) > 0, with a(c) = γ^{(2)}_s(θ) E|u_s^⊤(W + µ^{(1)} − µ^{(2)})| − γ^{(1)}_s(θ) E|u_s^⊤W| in view of (iii). As a consequence, for E(T_1) = Σ_{s=1}^{S} d_s and ε → 0, and for all c, we have

(A.1)   E(T_1) ≥ a(c)(♯K_c),

where ♯A denotes the cardinality of the set A.

By the Chebychev inequality, and provided that c < E(T_1), we have

(A.2)   P^{(1)}(T_1 > c) ≥ 1 − P^{(1)}(|T_1 − E(T_1)| > c) ≥ 1 − c^{−2} E{T_1 − E(T_1)}^2 = 1 − c^{−2} var(T_1) ≥ 1 − A_2 c^{−2} S,

where var denotes the variance under P^{(1)} and the last inequality follows from assumption (ii); more specifically,

var(T_1) = var{Σ_{s=1}^{S} (γ^{(2)}_s(θ)|u_s^⊤(V − µ^{(2)}_y)| − γ^{(1)}_s(θ)|u_s^⊤(V − µ^{(1)}_y)|)}
≤ var{Σ_{s=1}^{S} (γ^{(2)}_s(θ) u_s^⊤(V − µ^{(2)}_y) − γ^{(1)}_s(θ) u_s^⊤(V − µ^{(1)}_y))}
= var{Σ_{s=1}^{S} (γ^{(2)}_s(θ) u_s^⊤(W + µ^{(1)} − µ^{(2)}) − γ^{(1)}_s(θ) u_s^⊤W)}
≤ Σ_{s=1}^{S} A_2 u_s^⊤u_s + 2 Σ_{s=1}^{S−1} Σ_{s'=s+1}^{S} A_2 u_s^⊤u_{s'}.

Stam (1982) proved that a uniform random variable on the sphere, U ∈ R^p, converges to a standard Gaussian as p → ∞. Therefore, for S → ∞, by the strong law of large numbers we have

{2 Σ_{s=1}^{S−1} Σ_{s'=s+1}^{S} A_2 U_s^⊤U_{s'}} / {S(S − 1)} → A_2 E(Ξ_1^⊤Ξ_2) = 0 almost surely,

where Ξ_1 and Ξ_2 are two independent standard Gaussians. This explains why the covariances become negligible in the last part of (A.2) as p increases.

It remains to prove that c < E(T_1). Consider c = c_1 S n^{−1/2}, where c_1 is a positive constant. By (A.1), the latter holds if c_1 S n^{−1/2} < a(c)(♯K_c). But this is true because it implies that S(n^{1/2} ♯K_c)^{−1} < a(c) c_1^{−1}, where the term on the left goes to zero according to assumption (iv) while a(c) > 0. Thus, with c = c_1 S n^{−1/2}, we have

P^{(1)}(T > c_1 S n^{−1/2} − 2c_2 S n^{−1/2}) ≥ 1 − A_2 n / (c_1^2 S) − 2 Σ_{s=1}^{S} e^{−2n(δ^{(1)}_s)^2} − 2 Σ_{s=1}^{S} e^{−2n(δ^{(2)}_s)^2}.

We wish to choose c_1 and c_2 such that P^{(1)}(T > 0) ≥ 1 − ε. Therefore, we fix ε and choose c_1 such that A_2 / (c_1^2 A_1) ≤ ε, where A_1 is defined in assumption (i). It follows that

A_2 S c^{−2} = A_2 n / (c_1^2 S) ≤ A_2 / (c_1^2 A_1) ≤ ε.

Then we choose c_2 such that c_1 > 2c_2 and observe that 2 Σ_{s=1}^{S} e^{−2n(δ^{(1)}_s)^2} + 2 Σ_{s=1}^{S} e^{−2n(δ^{(2)}_s)^2} → 0 as n, S → ∞. Since this is true for each ε > 0, then P^{(1)}(T > 0) → 1, and similarly P^{(2)}(T < 0) → 1.
REFERENCES
Bradley, R. C. (2005). Basic properties of strong mixing conditions: A survey and some open questions. Probability Surveys.

Cortes, C. and Vapnik, V. (1995). Support-vector networks. Machine Learning.

Cover, T. M. and Hart, P. E. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory.

Geraci, M. (2016). Qtools: A collection of models and other tools for quantile inference. R Journal.

Geraci, M. (2020). Qtools: Utilities for quantiles. R package version 1.5.2.

Geraci, M., Boghossian, N. S., Farcomeni, A. and Horbar, J. D. (2020). Quantile contours and allometric modelling for risk classification of abnormal ratios with an application to asymmetric growth-restriction in preterm infants. Statistical Methods in Medical Research.

Hall, P., Titterington, D. M. and Xue, J. H. (2009). Median-based classifiers for high-dimensional data. Journal of the American Statistical Association.

Hand, D. J. and Yu, K. (2001). Idiot's Bayes - Not so stupid after all? International Statistical Review.

Hennig, C. and Viroli, C. (2016a). Quantile-based classifiers. Biometrika.

Hennig, C. and Viroli, C. (2016b). quantileDA: Quantile classifier. R package version 1.1.

Joe, H. (2006). Generating random correlation matrices based on partial correlations. Journal of Multivariate Analysis.

Kong, L. and Mizera, I. (2012). Quantile tomography: Using quantiles with multivariate data. Statistica Sinica.

Lai, Y. and McLeod, I. (2019). eqc: Ensemble quantile classification. R package version 1.2-2.

Lai, Y. and McLeod, I. (2020). Ensemble quantile classifier. Computational Statistics & Data Analysis.

Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A. and Leisch, F. (2019). e1071: Misc functions of the Department of Statistics, Probability Theory Group (formerly: E1071), TU Wien. R package version 1.7-3.

Park, M. Y. and Hastie, T. (2008). Penalized logistic regression for detecting gene interactions. Biostatistics.

Park, M. Y. and Hastie, T. (2018). stepPlr: L2 penalized logistic regression with stepwise variable selection. R package version 0.93.

Qiu, W. and Joe, H. (2015). clusterGeneration: Random cluster generation (with specified degree of separation). R package version 1.3.4.

Sorrentino, D., Avellini, C., Geraci, M., Dassopoulos, T., Zarifi, D., Vadalà di Prampero, S. F. and Benevento, G. (2014). Tissue studies in screened first-degree relatives reveal a distinct Crohn's disease phenotype. Inflammatory Bowel Diseases.

Stam, A. J. (1982). Limit theorems for uniform distributions on spheres in high-dimensional Euclidean spaces. Journal of Applied Probability.

R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

Tibshirani, R., Hastie, T., Narasimhan, B. and Chu, G. (2002). Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proceedings of the National Academy of Sciences of the United States of America.

Venables, W. N. and Ripley, B. D. (2002). Modern Applied Statistics with S, Fourth ed. Springer, New York, NY.

Wang, L., Zhu, J. and Zou, H. (2008). Hybrid Huberized support vector machines for microarray classification and gene selection. Bioinformatics.