A test for comparing conditional ROC curves with multidimensional covariates

Abstract

The comparison of Receiver Operating Characteristic (ROC) curves is frequently used in the literature to compare the discriminatory capability of different classification procedures based on diagnostic variables. The performance of these variables can be sometimes influenced by the presence of other covariates, and thus they should be taken into account when making the comparison. A new non-parametric test is proposed here for testing the equality of two or more dependent ROC curves conditioned to the value of a multidimensional covariate. Projections are used for transforming the problem into a one-dimensional approach easier to handle. Simulations are carried out to study the practical performance of the new methodology. A real data set of patients with Pleural Effusion is analysed to illustrate this procedure.

Arís Fanjul-Hevia, Juan Carlos Pardo-Fernández, Ingrid Van Keilegom, and Wenceslao González-Manteiga

Departamento de Estadística e Investigación Operativa y Didáctica de la Matemática, Universidad de Oviedo
Departamento de Estatística e Investigación Operativa and Centro de Investigacións Biomédicas (CINBIO), Universidade de Vigo
Research Centre for Operations Research and Statistics, KU Leuven
Departamento de Estatística, Análise Matemática e Optimización, Universidade de Santiago de Compostela

Keywords : bootstrap, covariates, hypothesis testing, projections, ROC curves.

In any classification problem, such as a diagnostic method in which the aim is to discriminate between two populations (usually identified as the healthy population and the diseased population), the main concern is to minimize the number of subjects that are misclassified. Receiver Operating Characteristic (ROC) curves are commonly used in this context for studying the behaviour of the classification variables (see, for example, the monograph of Pepe, 2003, as an introduction to the topic). They combine the notions of sensitivity (the ability to classify a diseased patient as diseased) and specificity (the ability to classify a healthy individual as healthy), two measurements that can be expressed in terms of the cumulative distribution functions of the diagnostic variables in the diseased and the healthy populations.

When there is more than one variable for diagnosing a certain disease, one can compare their respective ROC curves in order to decide whether their discriminatory capability is different or not. This is what happens in the medical example that we use in this paper for illustrative purposes: a real data set containing information on patients with pleural effusion. In this data set there are two variables (the carbohydrate antigen 152 and the cytokeratin fragment 21-1) that can be used for deciding whether the pleural effusion is due to the presence of a malignant tumour or not. The objective of the analysis is to compare the diagnostic capability of these markers.

There are several methodologies discussed in the literature for making this sort of comparison (for a review, see Fanjul-Hevia and González-Manteiga, 2018), although most of them do not consider the possible effect that the presence of covariates can have on the performance of the test.
In the example provided, apart from the diagnostic variables there are other covariates, such as the age or the neuron-specific enolase of the patients. It is important to take this information into account, because the diagnostic capability of a marker may change with the value of a covariate (Pardo-Fernández et al., 2014). In this paper the aim is to propose a test to compare ROC curves that includes a multidimensional covariate in the analysis.

One way of introducing the effect of the covariates into the study is by using the conditional ROC curve. Let $Y^F$ and $Y^G$ be the continuous diagnostic markers in the diseased and healthy populations, respectively, let $X^F = (X^F_1, \dots, X^F_d)$ be the continuous $d$-dimensional covariate of the diseased population and $X^G = (X^G_1, \dots, X^G_d)$ the continuous $d$-dimensional covariate of the healthy population. Given a fixed value $x = (x_1, \dots, x_d) \in R_X$, where $R_X$ is the intersection of $R_{X^F}$ and $R_{X^G}$, the supports of $X^F$ and $X^G$ (assumed to be non-empty), the conditional ROC curve is defined as

$$\mathrm{ROC}^x(p) = 1 - F\big(G^{-1}(1-p \mid x) \mid x\big), \qquad p \in (0,1), \tag{1}$$

where $F(y \mid x) = P(Y^F \le y \mid X^F = x)$ and $G(y \mid x) = P(Y^G \le y \mid X^G = x)$.

By comparing these conditional ROC curves instead of the standard ROC curves, it is possible to incorporate the potential effect of the covariates in the analysis of the equivalence of two or more diagnostic methods. A test for performing this comparison is proposed in Fanjul-Hevia et al. (2021) for the case of a continuous one-dimensional covariate. The objective here is to extend that methodology to the case of a multidimensional covariate. Thus, the aim is to test, given a certain $x \in R_X$,

$$H_0: \mathrm{ROC}^x_1(p) = \cdots = \mathrm{ROC}^x_K(p) \quad \text{for all } p \in (0,1), \tag{2}$$

where $K$ is the number of diagnostic markers (and thus of ROC curves) being compared.
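As a small numerical illustration of definition (1), the sketch below evaluates $\mathrm{ROC}^x(p)$ when, at a fixed $x$, both conditional distributions are assumed to be Gaussian. The functions `mu_f`, `sd_f`, `mu_g` and `sd_g` and their coefficients are hypothetical choices made only for this example; they are not models taken from the paper.

```python
from statistics import NormalDist


def conditional_roc(p, mean_f, std_f, mean_g, std_g):
    """Evaluate ROC^x(p) = 1 - F(G^{-1}(1 - p | x) | x), equation (1), assuming
    Y^F | X^F = x ~ N(mean_f, std_f^2) and Y^G | X^G = x ~ N(mean_g, std_g^2)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie in (0, 1)")
    F = NormalDist(mean_f, std_f)  # diseased population, conditional on x
    G = NormalDist(mean_g, std_g)  # healthy population, conditional on x
    cutoff = G.inv_cdf(1.0 - p)  # threshold whose false-positive rate is p
    return 1.0 - F.cdf(cutoff)  # sensitivity (true-positive rate) at that threshold


# Hypothetical covariate effects for a 2-dimensional covariate x = (x1, x2).
def mu_f(x):
    return 1.0 + 0.5 * x[0] + x[1]


def sd_f(x):
    return 1.0 + 0.5 * x[0]


def mu_g(x):
    return 0.2 * x[0] * x[1]


def sd_g(x):
    return 1.0


for x in [(0.2, 0.2), (0.8, 0.8)]:
    curve = [conditional_roc(p, mu_f(x), sd_f(x), mu_g(x), sd_g(x)) for p in (0.1, 0.3, 0.5)]
    print(x, [round(v, 3) for v in curve])
```

Evaluating the curve at two covariate values shows how the same marker can discriminate differently depending on $x$, which is the reason for conditioning on the covariate.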
In this context we would have $K$ diagnostic variables and one $d$-dimensional covariate in the diseased population, $(X^F, Y^F_1, \dots, Y^F_K)$, and similar variables in the healthy population, $(X^G, Y^G_1, \dots, Y^G_K)$. In practice, this kind of test could help to design a more personalised diagnostic method based on the covariate values of each patient. In the medical example at hand, this methodology would allow us to determine whether the carbohydrate antigen 152 and the cytokeratin fragment 21-1 are equally suitable for the diagnosis of a patient with a certain age and a certain enolase value.

In order to make this comparison, we rely on the estimation of the corresponding conditional ROC curves. There is a wide range of estimation methods in the literature: some of them estimate the conditional distribution functions involved in the definition of the conditional ROC curve, while others use regression functions to include the effect of the covariates (following direct or indirect approaches). See Pardo-Fernández et al. (2014) for a further review of this topic.

In Fanjul-Hevia et al. (2021) the estimator of the conditional ROC curve is based on the indirect (or induced) regression methodology, which incorporates the covariate information through regression models that describe the effect of the covariates on the diagnostic marker in the healthy and diseased populations separately. However, this method was originally designed for a single covariate. One could think of extending that methodology by replacing the estimator of the conditional ROC curve with another one capable of handling multidimensional covariates. Nevertheless, there are not many methods in the literature that consider more than one covariate when estimating the conditional ROC curve, and most of them make parametric assumptions that we would like to avoid. See Inácio de Carvalho et al. (2013) for a non-parametric Bayesian model to estimate the conditional distribution functions involved in the ROC curves, Rodríguez-Álvarez et al. (2011) or Rodríguez-Álvarez et al. (2018) for direct ROC regression models, or Rodríguez and Martínez (2014) for an induced methodology (framed in a Bayesian setting). In our case we follow a frequentist approach.

Tests involving multidimensional data tend to become less powerful as the dimension of the problem increases. This is why, in this paper, the problem of comparing conditional ROC curves is first transformed using projections, in such a way that the multidimensional problem becomes a unidimensional one that is easier to handle. This idea has been applied several times in the literature to reduce the dimension in goodness-of-fit problems (see, for example, Escanciano, 2006; García-Portugués et al., 2014; Patilea et al., 2016), but, to the best of our knowledge, this is the first time that it is applied in an ROC curve setting. In recent years, random projections have increasingly been used as a way to overcome the curse of dimensionality. What allows for the reduction of the dimension is the characterization of the multidimensional distribution of the original data by the distribution of the randomly projected unidimensional data.

To that end, in Section 2 we show how (2) can be transformed into a test with one-dimensional covariates by using projections, and a methodology is proposed for testing that equivalent hypothesis. In Section 3 the results of a simulation study show the practical performance of the test in terms of level approximation and power. The procedure is illustrated in Section 4 by analysing the real data set containing information on patients with pleural effusion.

Section 2 is divided into three subsections.
In the first one, Subsection 2.1, we present a result that allows us to transform the problem in (2) into an equivalent one that is easier to handle, by using projections to reduce the multidimensional covariate to a unidimensional one. In Subsection 2.2 we present a methodology to test the equality of conditional ROC curves in a unidimensional problem (based on the one proposed in Fanjul-Hevia et al., 2021). Finally, in Subsection 2.3, we combine that methodology with the result obtained in Subsection 2.1 to solve our original problem with multidimensional covariates. Both Subsections 2.2 and 2.3 include the statistic proposed to perform the test and a bootstrap algorithm to approximate its distribution.

In order to present the transformation of the problem, we first need to introduce the definition of the ROC curve conditioned to a pair $(x^F, x^G) \in R_{X^F} \times R_{X^G}$:

$$\mathrm{ROC}^{x^F, x^G}(p) = 1 - F\big(G^{-1}(1-p \mid x^G) \mid x^F\big), \qquad p \in (0,1). \tag{3}$$

This concept is very similar to the conditional ROC curve (1); the only difference is that this new definition allows us to condition on different values in the diseased and healthy populations. In this case $x^F$ and $x^G$ are unidimensional, but the definition could also be applied in a multidimensional case. Even if the interpretation of this new ROC curve is not very clear in practice, theoretically it does not present any problems (nor does its estimation), as the healthy and diseased populations are always considered to be independent.

The following result is the basis for developing the test for comparing ROC curves with multidimensional covariates. It borrows the idea in Escanciano (2006) of using projections to reduce the dimension of the covariate in a regression context. Since here we are dealing with ROC curves, the dimension reduction is less straightforward and some adjustments are required, as each ROC curve depends on two cumulative distribution functions. To the best of our knowledge, the idea of using projections has not been considered before in the context of ROC curves.

Given $x, \beta \in \mathbb{R}^d$, $\beta^\top x$ denotes the scalar product of the vectors $x$ and $\beta$. From now on, all the vectors representing the projections are assumed to lie on the unit sphere $S^{d-1} = \{\beta \in \mathbb{R}^d : \|\beta\| = 1\}$. This way we ensure that all possible directions are equally important.

Lemma 1.

Assume $E|Y^F_k| < \infty$ and $E|Y^G_k| < \infty$ for every $k \in \{1, \dots, K\}$. Then, given a certain $x \in R_X$, and assuming dependence among the ROC curves (meaning that the covariate is common to all the $K$ curves considered),
$\mathrm{ROC}^x_1(p) = \cdots = \mathrm{ROC}^x_K(p)$ for all $p \in (0,1)$ a.s.
if and only if
$\mathrm{ROC}^{(\beta^F)^\top x, (\beta^G)^\top x}_1(p) = \cdots = \mathrm{ROC}^{(\beta^F)^\top x, (\beta^G)^\top x}_K(p)$ for all $p \in (0,1)$ a.s., for any $\beta^F, \beta^G$,
where $\beta^F$ and $\beta^G$ are $d$-dimensional vectors in $S^{d-1}$ that represent the directions of the projections.

The proof of this Lemma can be found in the Appendix. Note that $(\beta^F)^\top x$ and $(\beta^G)^\top x$ are one-dimensional values. By using these ROC curves conditioned to a pair of projected covariates (as defined in (3)), the problem is reduced, for each possible pair of directions $\beta^F$ and $\beta^G$, to a comparison of conditional ROC curves with a one-dimensional covariate.

Thus, taking advantage of the result in Lemma 1, instead of testing the null hypothesis (2), we may use this equivalent formulation to develop a methodology that, given a certain $x \in R_X$, tests

$$H_0: \mathrm{ROC}^{(\beta^F)^\top x, (\beta^G)^\top x}_1(p) = \cdots = \mathrm{ROC}^{(\beta^F)^\top x, (\beta^G)^\top x}_K(p) \quad \text{for all } p \in (0,1),\ \forall\, \beta^F, \beta^G \tag{4}$$

against the general alternative $H_1$: $H_0$ is not true. The symbol $\forall$ is used instead of "for any" to shorten the expressions (mainly in the proofs found in the Appendix).

In a first step, a statistic for testing the equality of these ROC curves is presented for a fixed pair of projections, and then that statistic is adapted to include all possible directions.

The objective of Subsection 2.2 is to develop a test for the equivalent problem presented in Lemma 1 for a fixed pair of projections $\beta^F$ and $\beta^G$. Here a test is presented for comparing two or more dependent ROC curves conditioned to two one-dimensional values.
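For a fixed pair of directions, the reduction in Lemma 1 amounts to replacing each $d$-dimensional covariate by its scalar product with the corresponding direction, both in the samples and in the conditioning point. The sketch below uses hypothetical function names (`random_direction`, `dot`, `project_pair`) and toy covariates; it draws directions uniformly on $S^{d-1}$ by normalising a standard Gaussian vector (a standard way of sampling the uniform distribution on the sphere) and builds the projected covariates $(\beta^F)^\top X^F_i$, $(\beta^G)^\top X^G_i$ and the conditioning pair $((\beta^F)^\top x, (\beta^G)^\top x)$.

```python
import math
import random


def random_direction(d, rng):
    """Draw beta uniformly on the unit sphere S^{d-1} by normalising a Gaussian vector."""
    while True:
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in z))
        if norm > 0.0:  # the zero vector has probability zero, but guard anyway
            return [v / norm for v in z]


def dot(beta, x):
    """Scalar product beta^T x."""
    return sum(b * v for b, v in zip(beta, x))


def project_pair(beta_f, beta_g, X_f, X_g, x):
    """Project the covariates of each population onto its own direction.

    Returns the one-dimensional covariates of both samples and the
    conditioning pair ((beta^F)^T x, (beta^G)^T x)."""
    z_f = [dot(beta_f, xi) for xi in X_f]
    z_g = [dot(beta_g, xi) for xi in X_g]
    return z_f, z_g, (dot(beta_f, x), dot(beta_g, x))


rng = random.Random(2021)
d = 3
X_f = [[rng.random() for _ in range(d)] for _ in range(5)]  # toy diseased covariates
X_g = [[rng.random() for _ in range(d)] for _ in range(4)]  # toy healthy covariates
beta_f, beta_g = random_direction(d, rng), random_direction(d, rng)
z_f, z_g, (xf, xg) = project_pair(beta_f, beta_g, X_f, X_g, [0.5, 0.5, 0.5])
print(len(z_f), len(z_g), round(xf, 3), round(xg, 3))
```

Only the covariates are projected; the marker values $Y^D_{k,i}$ are kept as they are, so the one-dimensional procedure of this subsection can then be applied to the projected samples.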
Given the pair $(x^F, x^G) \in R_{X^F} \times R_{X^G}$, the aim is then to test

$$H_0: \mathrm{ROC}^{x^F, x^G}_1(p) = \cdots = \mathrm{ROC}^{x^F, x^G}_K(p) \quad \text{for all } p \in (0,1) \tag{5}$$

against the general alternative $H_1$: $H_0$ is not true. The samples available in this context are:

- $\{(X^F_i, Y^F_{1,i}, \dots, Y^F_{K,i})\}_{i=1}^{n_F}$, an i.i.d. sample from the distribution of $(X^F, Y^F_1, \dots, Y^F_K)$;
- $\{(X^G_i, Y^G_{1,i}, \dots, Y^G_{K,i})\}_{i=1}^{n_G}$, an i.i.d. sample from the distribution of $(X^G, Y^G_1, \dots, Y^G_K)$,

with $n_F$ and $n_G$ the sample sizes of the diseased and healthy populations, respectively. Define $n = n_F + n_G$ as the total sample size used for the estimation of each conditional ROC curve (which is the same for all $k \in \{1, \dots, K\}$). Note that both $X^F$ and $X^G$ are one-dimensional covariates here.

The method used for the estimation of the conditional ROC curves is based on the one proposed in González-Manteiga et al. (2011), which relies on non-parametric location-scale regression models. To be more precise, for each $k = 1, \dots, K$, assume that

$$Y^F_k = \mu^F_k(X^F) + \sigma^F_k(X^F)\, \varepsilon^F_k, \tag{6}$$
$$Y^G_k = \mu^G_k(X^G) + \sigma^G_k(X^G)\, \varepsilon^G_k, \tag{7}$$

where, for $D \in \{F, G\}$, $\mu^D_k(\cdot) = E(Y^D_k \mid X^D = \cdot)$ and $(\sigma^D_k)^2(\cdot) = \mathrm{Var}(Y^D_k \mid X^D = \cdot)$ are the conditional mean and conditional variance functions (both of them unknown smooth functions), and the error $\varepsilon^D_k$ is independent of $X^D$. The dependence between the $K$ diagnostic variables is modelled through the errors: $(\varepsilon^D_1, \dots, \varepsilon^D_K)$ follows a multivariate distribution with zero mean and a covariance matrix with ones on the diagonal.

Given this location-scale regression structure for the diagnostic variables, the $k$-th ROC curve conditioned to a pair of values $(x^F, x^G) \in R_{X^F} \times R_{X^G}$ can be expressed in terms of the marginal cumulative distribution functions of the errors, $H^F_k$ and $H^G_k$:

$$\mathrm{ROC}^{x^F, x^G}_k(p) = 1 - H^F_k\Big( (H^G_k)^{-1}(1-p)\, b_k(x^F, x^G) - a_k(x^F, x^G) \Big), \tag{8}$$

where $a_k(x^F, x^G) = \dfrac{\mu^F_k(x^F) - \mu^G_k(x^G)}{\sigma^F_k(x^F)}$ and $b_k(x^F, x^G) = \dfrac{\sigma^G_k(x^G)}{\sigma^F_k(x^F)}$. Thus, this $k$-th conditional ROC curve can be estimated by

$$\widehat{\mathrm{ROC}}^{x^F, x^G}_k(p) = 1 - \int \hat H^F_k\Big( (\hat H^G_k)^{-1}(1 - p + h_k u)\, \hat b_k(x^F, x^G) - \hat a_k(x^F, x^G) \Big)\, \kappa(u)\, du, \tag{9}$$

where, for $D \in \{F, G\}$:

- $\hat H^D_k(y) = n_D^{-1} \sum_{i=1}^{n_D} I(\hat\varepsilon^D_{k,i} \le y)$;
- $\hat\varepsilon^D_{k,i} = \big(Y^D_{k,i} - \hat\mu^D_k(X^D_i)\big) / \hat\sigma^D_k(X^D_i)$, with $i \in \{1, \dots, n_D\}$;
- $\hat\mu^D_k(x) = \sum_{i=1}^{n_D} W^D_{k,i}(x, g^D_k)\, Y^D_{k,i}$ is a non-parametric estimator of $\mu^D_k(x)$ based on local weights $W^D_{k,i}(x, g^D_k)$ depending on a bandwidth parameter $g^D_k$;
- $(\hat\sigma^D_k)^2(x) = \sum_{i=1}^{n_D} W^D_{k,i}(x, g^D_k)\, \big[Y^D_{k,i} - \hat\mu^D_k(X^D_i)\big]^2$ is a non-parametric estimator of $(\sigma^D_k)^2(x)$. For simplicity we take the same bandwidth parameter $g^D_k$ that is used for the estimation of the regression function $\hat\mu^D_k(x)$;
- $W^D_{k,i}(x, g^D_k) = \dfrac{\kappa_{g^D_k}(x - X^D_i)}{\sum_{l=1}^{n_D} \kappa_{g^D_k}(x - X^D_l)}$ are Nadaraya-Watson-type weights, where $\kappa_{g^D_k}(\cdot) = \kappa(\cdot / g^D_k)/g^D_k$ and $\kappa$ is a probability density function symmetric around zero;
- $\hat a_k(x^F, x^G) = \big(\hat\mu^F_k(x^F) - \hat\mu^G_k(x^G)\big)/\hat\sigma^F_k(x^F)$ and $\hat b_k(x^F, x^G) = \hat\sigma^G_k(x^G)/\hat\sigma^F_k(x^F)$;
- $h_k$ is a bandwidth parameter that controls the smoothness of the estimator. Its value does not seem to have a significant effect on the estimation of the conditional ROC curve.

This way of estimating the conditional ROC curve is similar to the one proposed in González-Manteiga et al. (2011), with the difference that they condition the ROC curve on a single value $x$, whereas here we have a pair of values $x^F$ and $x^G$, related to the diseased and the healthy population, respectively. As both populations are independent, the adaptation of the methodology of González-Manteiga et al. (2011) to this case is straightforward.

Once we know how to estimate this doubly conditional ROC curve, we can propose a test statistic for (5):

$$S^x = \sum_{k=1}^{K} \psi\Big( \sqrt{n g_k}\, \big\{ \widehat{\mathrm{ROC}}^{x^F, x^G}_k(p) - \widehat{\mathrm{ROC}}^{x^F, x^G}_\bullet(p) \big\} \Big), \tag{10}$$

where:

- for $k \in \{1, \dots, K\}$, $g_k = (n_F g^F_k + n_G g^G_k)/n$, where $g^F_k$ and $g^G_k$ are the bandwidth parameters involved in the estimation of the $k$-th conditional ROC curve;
- for $k \in \{1, \dots, K\}$, $\widehat{\mathrm{ROC}}^{x^F, x^G}_k(p)$ is the estimated conditional ROC curve given $(x^F, x^G)$, as in (9);
- $\widehat{\mathrm{ROC}}^{x^F, x^G}_\bullet(p) = \big(\sum_{k=1}^K g_k\big)^{-1} \sum_{k=1}^K g_k\, \widehat{\mathrm{ROC}}^{x^F, x^G}_k(p)$ is a weighted average of the $K$ conditional ROC curves;
- $\psi$ is a real-valued function that measures the difference between each estimated conditional ROC curve and the weighted average of all of them. This function may be similar to the ones used for the comparison of cumulative distribution functions (after all, an ROC curve can be viewed as a cumulative distribution function). For example, if one considers the $L_2$-measure, the resulting test statistic is
$$S^x_{L2} = \sum_{k=1}^{K} n g_k \int_0^1 \Big( \widehat{\mathrm{ROC}}^{x^F, x^G}_k(p) - \widehat{\mathrm{ROC}}^{x^F, x^G}_\bullet(p) \Big)^2 dp.$$
On the other hand, when using the Kolmogorov-Smirnov criterion, the resulting test statistic is
$$S^x_{KS} = \sum_{k=1}^{K} \sqrt{n g_k}\, \sup_{p} \Big| \widehat{\mathrm{ROC}}^{x^F, x^G}_k(p) - \widehat{\mathrm{ROC}}^{x^F, x^G}_\bullet(p) \Big|.$$

The null hypothesis is rejected for large values of $S^x$. In order to approximate the distribution of this statistic, a bootstrap algorithm is proposed. This algorithm is adapted from the procedure proposed in Martínez-Camblor and Corral (2012) and has already been used by Martínez-Camblor et al. (2013) and by Fanjul-Hevia et al. (2021) in the context of ROC curves. The key to this algorithm is that

$$T^x = \sum_{k=1}^{K} \psi\Big( \sqrt{n g_k}\, \Big\{ \big( \widehat{\mathrm{ROC}}^{x^F, x^G}_k(p) - \widehat{\mathrm{ROC}}^{x^F, x^G}_\bullet(p) \big) - \big( \mathrm{ROC}^{x^F, x^G}_k(p) - \mathrm{ROC}^{x^F, x^G}_\bullet(p) \big) \Big\} \Big)$$

coincides with the statistic $S^x$ whenever the null hypothesis holds, where

$$\mathrm{ROC}^{x^F, x^G}_\bullet(p) = \Big( \sum_{k=1}^K g_k \Big)^{-1} \sum_{k=1}^K g_k\, \mathrm{ROC}^{x^F, x^G}_k(p), \qquad 0 < p < 1.$$

Indeed, under $H_0$ all the curves $\mathrm{ROC}^{x^F, x^G}_k$ are equal, so each difference $\mathrm{ROC}^{x^F, x^G}_k(p) - \mathrm{ROC}^{x^F, x^G}_\bullet(p)$ vanishes. The quantity $T^x$ can be rewritten as

$$T^x = \sum_{k=1}^{K} \psi\Big( \sum_{j=1}^{K} \sqrt{n g_j}\, \alpha_{kj} \big\{ \widehat{\mathrm{ROC}}^{x^F, x^G}_j(p) - \mathrm{ROC}^{x^F, x^G}_j(p) \big\} \Big), \tag{11}$$

where $\alpha_{kj} = I(k = j) - \sqrt{g_k}\sqrt{g_j} \big( \sum_{i=1}^K g_i \big)^{-1}$.
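The estimator (9) and the statistic (10) can be prototyped in plain Python. The sketch below is a simplified illustration under stated assumptions, not the authors' implementation: it assumes a Gaussian kernel for $\kappa$ (both in the Nadaraya-Watson weights and in the smoothing over $u$), fixed bandwidths $g^F_k$, $g^G_k$ supplied by the user instead of cross-validated ones, $h_k = 1/\sqrt{n}$ (the choice the paper makes for this bandwidth), a midpoint grid on $[-4, 4]$ for the integral in $u$, and a grid of $p$ values to approximate the integral ($L_2$) and the supremum (KS). All function names are hypothetical, and the toy data do not reproduce the paper's simulation design.

```python
import math
import random
from bisect import bisect_right


def kappa(u):
    """Gaussian kernel: our assumed choice of a density symmetric around zero."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nw_weights(x, X, g):
    """Nadaraya-Watson weights kappa_g(x - X_i) / sum_l kappa_g(x - X_l)."""
    w = [kappa((x - xi) / g) for xi in X]
    total = sum(w)
    return [wi / total for wi in w]


def location_scale_fit(X, Y, g):
    """mu_hat and sigma_hat (same bandwidth g) plus the sorted standardised residuals."""
    def mu(x):
        return sum(w * y for w, y in zip(nw_weights(x, X, g), Y))
    fitted = [mu(xi) for xi in X]

    def sigma(x):
        w = nw_weights(x, X, g)
        return math.sqrt(sum(wi * (y - m) ** 2 for wi, y, m in zip(w, Y, fitted)))
    resid = sorted((y - m) / sigma(xi) for xi, y, m in zip(X, Y, fitted))
    return mu, sigma, resid


def ecdf(sorted_e, y):
    """Empirical cdf H_hat(y) of the residuals."""
    return bisect_right(sorted_e, y) / len(sorted_e)


def quantile(sorted_e, q):
    """Generalised inverse inf{y : H_hat(y) >= q}, infinite outside (0, 1]."""
    if q <= 0.0:
        return -math.inf
    if q > 1.0:
        return math.inf
    return sorted_e[max(1, math.ceil(len(sorted_e) * q - 1e-12)) - 1]


# Midpoint grid on [-4, 4] approximating the integral against kappa(u) du in (9).
U = [-4.0 + 8.0 * (i + 0.5) / 200 for i in range(200)]
_raw = [kappa(u) for u in U]
_total = sum(_raw)
U_WEIGHTS = [w / _total for w in _raw]


def roc_hat(p, fit_f, fit_g, xf, xg, h):
    """Estimator (9) of ROC_k^{x^F, x^G}(p) from the two location-scale fits."""
    mu_f, sd_f, e_f = fit_f
    mu_g, sd_g, e_g = fit_g
    a = (mu_f(xf) - mu_g(xg)) / sd_f(xf)
    b = sd_g(xg) / sd_f(xf)
    return 1.0 - sum(w * ecdf(e_f, quantile(e_g, 1.0 - p + h * u) * b - a)
                     for u, w in zip(U, U_WEIGHTS))


def roc_statistics(X_f, Y_f, X_g, Y_g, xf, xg, g_f, g_g, n_grid=99):
    """S^x of (10) with psi = L2 (Riemann sum over p) and psi = KS (max over a p grid)."""
    n_f, n_g = len(X_f), len(X_g)
    n = n_f + n_g
    h = 1.0 / math.sqrt(n)  # h_k = 1/sqrt(n)
    ps = [(j + 1) / (n_grid + 1) for j in range(n_grid)]
    curves, gs = [], []
    for k in range(len(Y_f[0])):  # one conditional ROC curve per marker
        fit_f = location_scale_fit(X_f, [y[k] for y in Y_f], g_f[k])
        fit_g = location_scale_fit(X_g, [y[k] for y in Y_g], g_g[k])
        curves.append([roc_hat(p, fit_f, fit_g, xf, xg, h) for p in ps])
        gs.append((n_f * g_f[k] + n_g * g_g[k]) / n)
    avg = [sum(g * c[j] for g, c in zip(gs, curves)) / sum(gs) for j in range(n_grid)]
    s_l2 = sum(n * g * sum((cj - aj) ** 2 for cj, aj in zip(c, avg)) / n_grid
               for g, c in zip(gs, curves))
    s_ks = sum(math.sqrt(n * g) * max(abs(cj - aj) for cj, aj in zip(c, avg))
               for g, c in zip(gs, curves))
    return s_l2, s_ks


# Toy data (hypothetical): two markers with the same model, error correlation 0.5.
rng = random.Random(0)


def simulate(n, shift, rho=0.5):
    X, Y = [], []
    for _ in range(n):
        x, e1 = rng.random(), rng.gauss(0.0, 1.0)
        e2 = rho * e1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        X.append(x)
        Y.append((shift + x + e1, shift + x + e2))
    return X, Y


X_f, Y_f = simulate(100, shift=1.0)
X_g, Y_g = simulate(100, shift=0.0)
print(roc_statistics(X_f, Y_f, X_g, Y_g, 0.5, 0.5, g_f=[0.2, 0.2], g_g=[0.2, 0.2]))
```

As a sanity check, feeding the same marker twice yields identical estimated curves, so both statistics are zero up to rounding.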
Note that, in general, $T^x$ cannot be computed from the data, as it depends on the unknown theoretical conditional ROC curves, but it is useful when applying the bootstrap algorithm.

The bootstrap algorithm suggested to approximate the p-value of this test is the following:

A.1 From the original samples, $\{(X^F_i, Y^F_{1,i}, \dots, Y^F_{K,i})\}_{i=1}^{n_F}$ and $\{(X^G_i, Y^G_{1,i}, \dots, Y^G_{K,i})\}_{i=1}^{n_G}$, compute the value of the test statistic (10), which we denote by $s^x$.

A.2 For $b = 1, \dots, B$, generate the bootstrap samples $\{(X^F_i, Y^{F,b*}_{1,i}, \dots, Y^{F,b*}_{K,i})\}_{i=1}^{n_F}$ and $\{(X^G_i, Y^{G,b*}_{1,i}, \dots, Y^{G,b*}_{K,i})\}_{i=1}^{n_G}$ as follows:
(i) For each $D \in \{F, G\}$, let $\{(\varepsilon^{D,b*}_{1,i}, \dots, \varepsilon^{D,b*}_{K,i})\}_{i=1}^{n_D}$ be an i.i.d. sample from the empirical multivariate cumulative distribution function of the original residual vectors $(\hat\varepsilon^D_{1,i}, \dots, \hat\varepsilon^D_{K,i})$, $i = 1, \dots, n_D$.
(ii) For each $D \in \{F, G\}$, reconstruct the bootstrap sample $\{(X^D_i, Y^{D,b*}_{1,i}, \dots, Y^{D,b*}_{K,i})\}_{i=1}^{n_D}$, where $Y^{D,b*}_{k,i} = \hat\mu^D_k(X^D_i) + \hat\sigma^D_k(X^D_i)\, \varepsilon^{D,b*}_{k,i}$.

A.3 For $b = 1, \dots, B$, compute the test statistic based on the bootstrap samples using (11) as
$$t^{x,b*} = \sum_{k=1}^{K} \psi\Big( \sum_{j=1}^{K} \sqrt{n g_j}\, \alpha_{kj} \big\{ \widehat{\mathrm{ROC}}^{x^F, x^G, b*}_j(p) - \widehat{\mathrm{ROC}}^{x^F, x^G}_j(p) \big\} \Big),$$
where $\widehat{\mathrm{ROC}}^{x^F, x^G, b*}_j$ is the estimated $j$-th conditional ROC curve of the $b$-th bootstrap sample.

A.4 The distribution of $S^x$ under the null hypothesis (and thus the distribution of $T^x$) is approximated by the empirical distribution of the values $\{t^{x,1*}, \dots, t^{x,B*}\}$, and the p-value is approximated by
$$\text{p-value} \approx \frac{1}{B} \sum_{b=1}^{B} I\big(s^x \le t^{x,b*}\big).$$
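Several ingredients of the algorithm can be checked numerically in a few lines. The sketch below (plain Python; the function names are ours, not the paper's) builds the weights $\alpha_{kj}$ of (11), verifies at a single value of $p$ that the $\alpha$-form of the inner terms of $T^x$ agrees with the centred form, resamples whole residual vectors as in Step A.2(i) so that the dependence between the $K$ markers is preserved, and computes the Monte Carlo p-value of Step A.4. The numerical inputs are arbitrary illustrations.

```python
import math
import random


def alpha(gs):
    """alpha_kj = I(k = j) - sqrt(g_k g_j) / sum_i g_i, as in (11)."""
    total = sum(gs)
    return [[(1.0 if k == j else 0.0) - math.sqrt(gk * gj) / total
             for j, gj in enumerate(gs)] for k, gk in enumerate(gs)]


def inner_alpha_form(n, gs, est, true):
    """sum_j sqrt(n g_j) alpha_kj (est_j - true_j), for each k, at one value of p."""
    a = alpha(gs)
    return [sum(math.sqrt(n * gj) * a[k][j] * (est[j] - true[j]) for j, gj in enumerate(gs))
            for k in range(len(gs))]


def inner_centred_form(n, gs, est, true):
    """sqrt(n g_k) {(est_k - est_avg) - (true_k - true_avg)}, with g-weighted averages."""
    w = sum(gs)
    est_avg = sum(g * e for g, e in zip(gs, est)) / w
    true_avg = sum(g * t for g, t in zip(gs, true)) / w
    return [math.sqrt(n * g) * ((e - est_avg) - (t - true_avg)) for g, e, t in zip(gs, est, true)]


def resample_residual_vectors(residuals, rng):
    """Step A.2(i): draw whole K-vectors of residuals with replacement (keeps the dependence)."""
    return rng.choices(residuals, k=len(residuals))


def bootstrap_p_value(s_obs, t_boot):
    """Step A.4: proportion of bootstrap statistics t^{x,b*} that are >= s^x."""
    return sum(s_obs <= t for t in t_boot) / len(t_boot)


n, gs = 200, [0.20, 0.30, 0.25]
est, true = [0.61, 0.58, 0.64], [0.60, 0.60, 0.60]  # equal true values: H0 holds at this p
print(inner_alpha_form(n, gs, est, true))
print(inner_centred_form(n, gs, est, true))
rng = random.Random(7)
print(resample_residual_vectors([(0.1, 0.2), (-0.3, -0.1), (0.5, 0.4)], rng))
print(bootstrap_p_value(1.0, [0.5, 1.0, 2.0, 3.0]))
```

With equal true values, as under $H_0$, the centred form reduces to $\sqrt{n g_k}\,(\text{est}_k - \text{est}_\bullet)$, the inner term of $S^x$; this is why $T^x$ can stand in for $S^x$ in Step A.3.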

In contrast with the usual bootstrap algorithms in testing setups, in this case the null hypothesis is not employed when generating the bootstrap samples (Step A.2), because replicating the null hypothesis of equal ROC curves is not a straightforward problem. Instead, it is used in the computation of the bootstrap statistic (Step A.3), by using $T^x$ instead of $S^x$, which are equal under the null hypothesis. This particularity also appears in the bootstrap algorithm of the next subsection.

There are two kinds of bandwidth parameters in the estimation of the $k$-th conditional ROC curve (9), with $k \in \{1, \dots, K\}$. The first one, $h_k$, is taken as $1/\sqrt{n}$, and the other two, $g^F_k$ and $g^G_k$, are selected by least-squares cross-validation. Note that, in each bootstrap iteration, the bandwidth parameters could change, as their selection depends on the sample. However, $h_k$ remains constant, as it is chosen in terms of the sample size, which is the same in each bootstrap iteration. As for $g^F_k$ and $g^G_k$, for computational reasons we compute them in Step A.1 using the original sample, and then apply the same bandwidths in all the bootstrap estimations. The cross-validation method can be very time-consuming, and this simplification prevents the simulations from becoming infeasible.

Having seen a strategy for testing (4) for only one pair of fixed directions, the idea now is to modify the previous procedure so that the new statistic takes into account all the possible directions that $\beta^F$ and $\beta^G$ can take. For that purpose, consider the test statistic

$$D^x_S = \int_{S^{d-1}} \int_{S^{d-1}} S^{(\beta^F)^\top x, (\beta^G)^\top x}\, d\beta^F\, d\beta^G, \tag{12}$$

where $d\beta^F$ and $d\beta^G$ represent the uniform density on the unit sphere $S^{d-1}$ of $\mathbb{R}^d$. This ensures that all directions are equally important.

The expression $S^{(\beta^F)^\top x, (\beta^G)^\top x}$ is the statistic (10) for testing the equality of $K$ ROC curves conditioned to the pair $\big((\beta^F)^\top x, (\beta^G)^\top x\big)$, that is,

$$S^{(\beta^F)^\top x, (\beta^G)^\top x} = \sum_{k=1}^{K} \psi\Big( \sqrt{n g_k}\, \big\{ \widehat{\mathrm{ROC}}^{(\beta^F)^\top x, (\beta^G)^\top x}_k(p) - \widehat{\mathrm{ROC}}^{(\beta^F)^\top x, (\beta^G)^\top x}_\bullet(p) \big\} \Big).$$

Note that, in this context with $d$-dimensional covariates, the samples are $\{(X^F_i, Y^F_{1,i}, \dots, Y^F_{K,i})\}_{i=1}^{n_F}$ and $\{(X^G_i, Y^G_{1,i}, \dots, Y^G_{K,i})\}_{i=1}^{n_G}$, with $X^F_i = (X^F_{1,i}, \dots, X^F_{d,i})$ and $X^G_i = (X^G_{1,i}, \dots, X^G_{d,i})$.

In practice, as is done in Colling and Van Keilegom (2017), to compute the test statistic $D^x_S$, random directions $\beta^F_1, \dots, \beta^F_{n_\beta}$ and $\beta^G_1, \dots, \beta^G_{n_\beta}$ are drawn uniformly from $S^{d-1}$, where $n_\beta$ is the number of random directions considered (the same number of directions is taken for $\beta^F$ and for $\beta^G$). With them, the approximated statistic is

$$\tilde D^x_S = \frac{1}{n_\beta^2} \sum_{r=1}^{n_\beta} \sum_{l=1}^{n_\beta} S^{(\beta^F_r)^\top x, (\beta^G_l)^\top x}. \tag{13}$$

In order to approximate the distribution of the statistic, a bootstrap algorithm (similar to the one described in the previous subsection) is proposed. To do so, the following expression is introduced:

$$D^x_T = \int_{S^{d-1}} \int_{S^{d-1}} T^{(\beta^F)^\top x, (\beta^G)^\top x}\, d\beta^F\, d\beta^G, \tag{14}$$

where $T^{(\beta^F)^\top x, (\beta^G)^\top x}$ is the same as in (11), but for the conditioning values $\big((\beta^F)^\top x, (\beta^G)^\top x\big)$:

$$T^{(\beta^F)^\top x, (\beta^G)^\top x} = \sum_{k=1}^{K} \psi\Big( \sum_{j=1}^{K} \sqrt{n g_j}\, \alpha_{kj} \big\{ \widehat{\mathrm{ROC}}^{(\beta^F)^\top x, (\beta^G)^\top x}_j(p) - \mathrm{ROC}^{(\beta^F)^\top x, (\beta^G)^\top x}_j(p) \big\} \Big).$$

As happened with (11), $T^{(\beta^F)^\top x, (\beta^G)^\top x}$ cannot be computed without knowing the true distribution of the diagnostic markers. However, it can be computed in the bootstrap algorithm below, where $D^x_T$ is approximated by

$$\tilde D^x_T = \frac{1}{n_\beta^2} \sum_{r=1}^{n_\beta} \sum_{l=1}^{n_\beta} T^{(\beta^F_r)^\top x, (\beta^G_l)^\top x}. \tag{15}$$

As before, for two given projections $\beta^F$ and $\beta^G$, $S^{(\beta^F)^\top x, (\beta^G)^\top x}$ and $T^{(\beta^F)^\top x, (\beta^G)^\top x}$ coincide whenever the null hypothesis holds, and thus the same happens with $D^x_S$ and $D^x_T$.

Taking these approximations into account, the resulting bootstrap algorithm is as follows:

B.1 Draw $n_\beta$ random directions $\beta^F_1, \dots, \beta^F_{n_\beta}$ and $\beta^G_1, \dots, \beta^G_{n_\beta}$ uniformly from $S^{d-1}$.

B.2 For each pair of random directions $\beta^F_r$ and $\beta^G_l$ (with $r, l \in \{1, \dots, n_\beta\}$), consider the samples $\{((\beta^F_r)^\top X^F_i, Y^F_{1,i}, \dots, Y^F_{K,i})\}_{i=1}^{n_F}$ and $\{((\beta^G_l)^\top X^G_i, Y^G_{1,i}, \dots, Y^G_{K,i})\}_{i=1}^{n_G}$ and the conditioning values $\big((\beta^F_r)^\top x, (\beta^G_l)^\top x\big)$. With them, following Steps A.1–A.3 of the bootstrap algorithm of the previous subsection, compute the value $s^{(\beta^F_r)^\top x, (\beta^G_l)^\top x}$ and the $B$ corresponding values $t^{(\beta^F_r)^\top x, (\beta^G_l)^\top x, b*}$, $b = 1, \dots, B$.

B.3 Compute $\tilde d^x_S = n_\beta^{-2} \sum_{r=1}^{n_\beta} \sum_{l=1}^{n_\beta} s^{(\beta^F_r)^\top x, (\beta^G_l)^\top x}$ and, for $b = 1, \dots, B$, $\tilde d^{x,b*}_T = n_\beta^{-2} \sum_{r=1}^{n_\beta} \sum_{l=1}^{n_\beta} t^{(\beta^F_r)^\top x, (\beta^G_l)^\top x, b*}$, as in (13) and (15).

B.4 Approximate the p-value of the test by
$$\text{p-value} \approx \frac{1}{B} \sum_{b=1}^{B} I\big(\tilde d^x_S \le \tilde d^{x,b*}_T\big).$$

Remark 1.

Note that $n_\beta$ represents the number of random directions drawn from $S^{d-1}$ for the approximation of (13) and (15), but that, in fact, we are using $n_\beta^2$ different pairs $(\beta^F, \beta^G) \in S^{d-1} \times S^{d-1}$ to make that approximation. This could become a problem from the computational point of view, as the cost of the procedure grows quadratically with $n_\beta$.

As an alternative, we could consider

$$D^x_S = \int_{S^{d-1} \times S^{d-1}} S^{(\beta^F)^\top x, (\beta^G)^\top x}\, d(\beta^F, \beta^G)$$

instead of statistic (12), where $d(\beta^F, \beta^G)$ represents the uniform density on the product space $S^{d-1} \times S^{d-1}$ (a torus when $d = 2$). This ensures, as before, that all pairs of directions are equally important. Thus, in practice, instead of using the approximation (13) we could consider

$$\hat D^x_S = \frac{1}{m_\beta} \sum_{r=1}^{m_\beta} S^{(\beta^F_r)^\top x, (\beta^G_r)^\top x},$$

where $(\beta^F_1, \beta^G_1), \dots, (\beta^F_{m_\beta}, \beta^G_{m_\beta})$ are pairs of random directions drawn uniformly from $S^{d-1} \times S^{d-1}$, and where $m_\beta$ plays the role that $n_\beta^2$ played before, with the advantage of greater flexibility, since it need not be a perfect square. A similar adaptation could be applied to the approximation of $D^x_T$ in (14).

Remark 2.

In the literature we can ﬁnd papers, like for example Cuesta-Albertos et al. (2007) orCuesta-Albertos et al. (2019), that use only one random projection. The main idea is to performthe test at hand for a randomly selected projection instead of for all possible projections. The useof projections results in a dimension reduction (as desired), and, despite being a procedure that mayproduce less powerful tests, the use of one single projection results in a reduction of the computationalcost.Following that idea, instead of testing the equality of covariate-projected ROC curves for allpossible projections, we could test the equality of covariate-projected ROC curves for some randompair of projections given a certain x P R X , meaning: H : ROC p β F q x , p β G q x “ ¨ ¨ ¨ “ ROC p β F q x , p β G q x K for some β F , β G . (16)The equivalence between this hypothesis and the one of interest in this paper given in (2) stillneeds theoretical justiﬁcation. However, it is a possibility worth studying, if only for computationalreasons. A way of perform this approach could be to consider the proposed methodology for n β “ In order to analyse the performance of the proposed methodology, simulations were run for the com-parison of several dependent conditional ROC curves. On a ﬁrst stage, these simulations were focusedon analysing the behaviour of the unidimensional test described in Section 2.2, but we do not displaythem here, as they are very similar to the ones that can be found in Fanjul-Hevia et al. (2021). 
Instead, we show the results for several scenarios (first under the null hypothesis and then under the alternative) in which we compare $K$ ROC curves (with $K \in \{2, 3\}$) conditioned to a $d$-dimensional covariate (with $d \in \{2, 3\}$).

All the curves used in the simulation study were drawn from location-scale regression models similar to the ones presented in (6) and (7), except that, in this case, the regression and conditional standard deviation functions are defined for $d$-dimensional covariates. The construction of those curves is summarized in Table 1, where all the conditional mean and conditional standard deviation functions are displayed. The regression errors were taken to have a multivariate normal distribution with zero mean, unit variance and correlation $\rho$ for all the models. In all scenarios the covariates $X_1^F$, $X_1^G$, $X_2^F$, $X_2^G$, $X_3^F$ and $X_3^G$ are uniformly distributed on the unit interval. Thus, the value of the multidimensional covariate $x$ at which the conditional ROC curves are compared lies in $[0, 1]^d$. In particular, the comparisons are made at $x = (0.5, 0.6)$ for $d = 2$ and at $x = (0.5, 0.6, 0.5)$ for $d = 3$.

Table 1: Conditional mean and conditional standard deviation functions of the conditional ROC curves considered in the simulation study. For each curve the table gives the regression functions $\mu_F(x)$, $\mu_G(x)$ and the conditional standard deviation functions $\sigma_F(x)$, $\sigma_G(x)$; three curves use the bidimensional covariate $x = (x_1, x_2)^\top$ and three use the tridimensional covariate $x = (x_1, x_2, x_3)^\top$.

The study contains simulations for different sample sizes $(n_F, n_G) \in \{(100, 100), (250, 150), (250, 350)\}$ and different values of $\rho$ representing different degrees of correlation between the diagnostic variables under comparison ($\rho \in \{-0.5, 0, 0.5\}$).

Moreover, two different functions $\psi$ were considered for the construction of the statistic $S^{(\beta^F)^\top x,\, (\beta^G)^\top x}$: one based on the $L_2$-measure and the other one based on the Kolmogorov-Smirnov criterion (from now on denoted by $L_2$ and $KS$, respectively). The number of iterations used in the bootstrap algorithm was 200, and 500 data sets were simulated to compute the proportion of rejection in each scenario. Furthermore, the number of directions used for approximating the test statistic $D_S^x$ was taken as $n_\beta = 5$ (so $n_\beta^2 = 25$ different pairs of directions were considered).

Table 2: Scenarios under the null hypothesis considered for calibrating the level of the test (columns: 2-dimensional covariate at $x = (0.5, 0.6)$ and 3-dimensional covariate at $x = (0.5, 0.6, 0.5)$; rows: $K = 2$ and $K = 3$).
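The data-generating step and the pairs of projection directions described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' simulation code: the mean and standard deviation functions are hypothetical placeholders (the Table 1 coefficients are not reproduced), the $K$ regression errors are assumed equicorrelated with common correlation $\rho$, and directions are drawn uniformly on $S^{d-1}$ by normalizing Gaussian vectors, which is one common way to obtain random projections but not necessarily the choice made in the paper. The names `simulate_population` and `random_directions` are illustrative.

```python
import itertools

import numpy as np


def simulate_population(n, d, rho, mean_funcs, sd_funcs, rng):
    """Draw n individuals from one population (diseased F or healthy G).

    Every individual has one covariate vector X ~ U[0, 1]^d shared by its K
    markers; marker k is mean_funcs[k](X) + sd_funcs[k](X) * eps_k, with the
    errors eps_1, ..., eps_K standard normal with pairwise correlation rho.
    """
    k = len(mean_funcs)
    x = rng.uniform(0.0, 1.0, size=(n, d))
    # Equicorrelation matrix: ones on the diagonal, rho elsewhere (assumed for K > 2).
    cov = np.full((k, k), rho) + (1.0 - rho) * np.eye(k)
    eps = rng.multivariate_normal(np.zeros(k), cov, size=n)
    y = np.column_stack(
        [mean_funcs[j](x) + sd_funcs[j](x) * eps[:, j] for j in range(k)]
    )
    return x, y


def random_directions(n_dirs, d, rng):
    """Directions uniformly distributed on the unit sphere S^{d-1}."""
    z = rng.standard_normal((n_dirs, d))
    return z / np.linalg.norm(z, axis=1, keepdims=True)


# Hypothetical location-scale functions for K = 2 markers and d = 2.
means = [
    lambda x: np.sin(np.pi * x[:, 0]) + 0.5 * x[:, 1],
    lambda x: 0.5 * x[:, 0] * x[:, 1],
]
sds = [
    lambda x: 0.5 + 0.25 * x[:, 0],
    lambda x: 0.5 + 0.25 * x[:, 1],
]

rng = np.random.default_rng(2024)
x_f, y_f = simulate_population(250, 2, 0.5, means, sds, rng)

# n_beta directions per population give n_beta**2 pairs (beta_F, beta_G).
n_beta = 5
betas_f = random_directions(n_beta, 2, rng)
betas_g = random_directions(n_beta, 2, rng)
pairs = list(itertools.product(betas_f, betas_g))

# One-dimensional projected covariate beta_F^T X for the first direction.
proj_f = x_f @ betas_f[0]
```

The point of the projection step is that each pair $(\beta^F, \beta^G)$ turns the $d$-dimensional covariates into scalars, so one-dimensional conditional ROC machinery can be applied to `proj_f` and its counterpart for the healthy population.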

The scenarios that were considered for calibrating the level of the test (by comparing the same conditional ROC curves) are represented in Table 2. The results of the simulations obtained for $n_\beta = 5$ are summarized in Figures 1 (for $d = 2$) and 2 (for $d = 3$).

Figure 1: Estimated proportion of rejection under the null hypothesis and the corresponding limits of the critical region (in gray) for the level 0.05 (dotted black line) with $d = 2$, $n_\beta = 5$ and different values of $\rho$. (Panels: $K = 2$ and $K = 3$; $(n_F, n_G) = (100,100)$, $(250,150)$ and $(250,350)$; $\rho \in \{-0.5, 0, 0.5\}$ on the horizontal axis; statistics $L_2$ and $KS$.)

Figure 2: Estimated proportion of rejection under the null hypothesis and the corresponding limits of the critical region (in gray) for the level 0.05 (dotted black line) with $d = 3$, $n_\beta = 5$ and different values of $\rho$. (Panels as in Figure 1.)

In general, the nominal level is attained, as most of the estimated proportions are close to the corresponding nominal level. The $L_2$ statistic seems to overestimate the level in a few scenarios, but its behaviour improves as the sample size increases. The $KS$ statistic is a little more conservative.

On the other hand, the scenarios that were considered for studying the power of the test (by comparing different conditional ROC curves) are represented in Table 3.

Table 3: Scenarios under the alternative hypothesis considered for studying the power of the test (columns: 2-dimensional covariate at $x = (0.5, 0.6)$ and 3-dimensional covariate at $x = (0.5, 0.6, 0.5)$; rows: $K = 2$, two curves compared, and $K = 3$, three curves compared). Pairs of conditional ROC curves are represented in purple, green and yellow.

The results of the simulations are summarized in Figure 3 (for $n_\beta = 5$), where the first and the second row correspond to $K = 2$ and $K = 3$, respectively, and the first and the second column represent the simulation results for $d = 2$ and $d = 3$, respectively. In this case, only $\alpha = 0.05$ was considered.

Figure 3: Estimated proportion of rejection under the alternative hypothesis for different sample sizes and different $\rho$, for $n_\beta = 5$ and $\alpha = 0.05$. (Panels: $d = 2$, $x = (0.5, 0.6)$ and $d = 3$, $x = (0.5, 0.6, 0.5)$, for $K = 2$ and $K = 3$; total sample size $n_F + n_G$ on the horizontal axis; $\rho \in \{-0.5, 0, 0.5\}$ for $L_2$ and $KS$.)

It can be seen that the power of the test grows with the sample size. The $L_2$ statistic yields higher power than the $KS$ statistic, which is consistent with $KS$ being more conservative. Moreover, the difference between the conditional ROC curves considered is not the same for $d = 2$ and $d = 3$, and this translates into different power for the two dimensions. The highest power is obtained for $\rho = 0.5$, and the lowest for $\rho = -0.5$. Note that the smallest sample size is balanced, with $(n_F, n_G) = (100,100)$, whereas for the second sample size considered $(n_F, n_G) = (250,150)$; the largest sample size, $(250,350)$, is also unbalanced, but less so.

Remark 3. In order to evaluate the modification of the method proposed in Remarks 1 and 2, we have run simulations for the same scenarios previously described. We show here the results for one combination of $K$ and $d$, using $m_\beta = 50$ (first row of Figure 4) and $m_\beta = 25$ (second row), together with the results of considering only one random projection (Remark 2), i.e., $m_\beta = 1$ (third row). Note that $m_\beta = 25$ is comparable with $n_\beta = 5$, since the latter also amounts to $n_\beta^2 = 25$ pairs of directions. Under the null hypothesis, the level is overestimated by the $L_2$ statistic for the smallest sample size and otherwise close to the nominal level, and the $KS$ statistic is always more conservative. Increasing $m_\beta$ from 25 to 50 does not seem to affect the results significantly, and neither does reducing it to a single random projection ($m_\beta = 1$).

Figure 4: Estimated proportion of rejection under the null hypothesis and the corresponding limits of the critical region (in gray) for the level 0.05 (dotted black line) with $m_\beta = 50, 25, 1$ and different values of $\rho$. (Rows: $m_\beta = 50$, 25 and 1; columns: $(n_F, n_G) = (100,100)$, $(250,150)$ and $(250,350)$; statistics $L_2$ and $KS$.)

The power obtained for $m_\beta = 50$, $m_\beta = 25$ and $m_\beta = 1$ is shown in Figure 5. The first two panels are very similar to the one obtained for $n_\beta = 5$ in Figure 3, so there seems to be no appreciable gain in increasing the number of random projections $m_\beta$ used to approximate the value of the statistic from 25 to 50. It remains an open problem to determine an optimal value for that parameter. As for the idea mentioned in Remark 2, using only one random projection seems to produce a well calibrated test, despite having considerably lower power.

Figure 5: Estimated proportion of rejection under the alternative hypothesis for different sample sizes and different $\rho$, for $m_\beta = 50, 25, 1$.

An illustration of the proposed test is displayed in this section through the analysis of the previously mentioned data set concerning 463 patients with pleural effusion. This data set has been provided by Dr. F. Gude, from the Unidade de Epidemioloxía Clínica of the Hospital Clínico Universitario de Santiago (CHUS), and it was used in a previous study by Valdés et al. (2013).

From a medical perspective, the goal is to find a way to discriminate the patients whose pleural effusion (PE) has a malignant origin (MPE) from those whose PE is due to other, non-cancer-related causes. Of the individuals in the sample, 200 had MPE (the diseased population in this context) and 263 did not (the healthy population). For that purpose, two diagnostic markers were considered: the carbohydrate antigen 125 (ca125) and the cytokeratin fragment 21-1 (cyfra). Moreover, information on two different covariates is also available: age and neuron-specific enolase (nse). Due to the characteristics of the data (positive values, most of them close to zero, with some extremely high values), the logarithms of those variables, except age, were considered for the study. Since the logarithm is a monotone transformation, its use does not affect the estimation of the ROC curve without covariates. However, it does affect the estimation of the conditional ROC curves, as it reduces the effect of the most extreme values of the variables.
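The invariance claim above (a monotone transformation of the markers leaves the ROC curve without covariates unchanged) can be checked numerically. The following Python sketch is illustrative only: it uses simulated log-normal values rather than the pleural effusion data, and the estimators (the usual empirical ROC curve and the Mann-Whitney form of the AUC) are standard choices assumed here, not necessarily the ones used in the paper. Because the logarithm is strictly increasing, it preserves every comparison between observations, so both estimates coincide on the raw and on the log scale.

```python
import numpy as np


def empirical_quantile(sample, u):
    """Left-continuous empirical quantile G_n^{-1}(u) = inf{y : G_n(y) >= u}."""
    s = np.sort(sample)
    idx = np.ceil(len(s) * np.asarray(u)).astype(int) - 1
    return s[np.clip(idx, 0, len(s) - 1)]


def empirical_roc(y_diseased, y_healthy, p):
    """Empirical ROC(p) = 1 - F_n(G_n^{-1}(1 - p)) for false-positive rates p."""
    thresholds = empirical_quantile(y_healthy, 1.0 - np.asarray(p))
    # 1 - F_n(c) is the proportion of diseased values strictly above c.
    return np.mean(y_diseased[:, None] > thresholds[None, :], axis=0)


def empirical_auc(y_diseased, y_healthy):
    """Mann-Whitney estimate of P(Y_F > Y_G), counting ties as 1/2."""
    greater = y_diseased[:, None] > y_healthy[None, :]
    ties = y_diseased[:, None] == y_healthy[None, :]
    return greater.mean() + 0.5 * ties.mean()


rng = np.random.default_rng(0)
# Hypothetical positive, right-skewed marker values (200 diseased, 263 healthy).
y_dis = rng.lognormal(mean=1.0, sigma=1.0, size=200)
y_hea = rng.lognormal(mean=0.0, sigma=1.0, size=263)

p_grid = np.linspace(0.01, 0.99, 99)
roc_raw = empirical_roc(y_dis, y_hea, p_grid)
roc_log = empirical_roc(np.log(y_dis), np.log(y_hea), p_grid)
auc_raw = empirical_auc(y_dis, y_hea)
auc_log = empirical_auc(np.log(y_dis), np.log(y_hea))
```

Conditional estimators behave differently: an estimator built on conditional means and variances of the markers, as in a location-scale model, changes when the markers are log-transformed, which is consistent with the remark above.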
A representation of the relationship of each one of those biomarkers with the two covariates is depicted in Figure 6, for both the MPE (green) and the non-MPE (blue) patients. It can be observed that the shape of the point clouds of the two populations changes with the values of the covariates, especially in the case of the diseased population.

Figure 6: Scatterplot of the three different diagnostic biomarkers as a function of the two covariates considered: age and log(nse). The healthy subjects are represented in blue and the diseased ones in green.

In order to evaluate whether the discriminatory capability of those markers ($Y_1^F$ and $Y_1^G$ being the variables containing the information of log(ca125), and $Y_2^F$ and $Y_2^G$ those containing the information of log(cyfra)) is the same when the covariates age and log(nse) are taken into account, the methodology explained in previous sections is applied, comparing their respective ROC curves conditioned to different values of the bidimensional covariate $X = (X_1, X_2)$ with $X_1 = $ age and $X_2 = $ log(nse). In order to explore the advantages of using this method over the ones that do not consider multidimensional covariates, we also test the equality of the ROC curves of those diagnostic markers for the case in which no covariates are taken into account and for the case in which only one of the covariates is included in the analysis.

Figure 7 shows how those two covariates are distributed in the diseased and healthy populations. Note that the covariates have different magnitudes: the values of age are always much larger than those of log(nse). Thus, if we were to apply the procedure directly to these variables, when projecting the multidimensional covariate $X$ on any direction, the effect of the second component would be overshadowed by that of the first. To prevent this, we decided to use the standardized versions of $X_1$ and $X_2$ instead of the originals.

Figure 7: Histograms and boxplots of the two covariates considered (age and log(nse)). The healthy subjects are represented in blue and the diseased ones in green. The black histogram lines and the white boxplot correspond to the two populations of the healthy and the diseased patients combined.

This also affects the value $x$ at which the conditional ROC curves are compared. Note that an ROC curve conditioned to a certain value $x$ is the same as the ROC curve in which the covariate is modified by a one-to-one transformation and that is conditioned to the corresponding transformed value of $x$. Given a non-degenerate multidimensional covariate $X$, the standardization proposed here is to consider the multidimensional covariate $X_s = B^{-1}(X - a)$, with $B$ a diagonal matrix with $(\sqrt{\mathrm{Var}(X_1)}, \ldots, \sqrt{\mathrm{Var}(X_d)})$ on the diagonal and $a = (E(X_1), \ldots, E(X_d))^\top$. Then, for a given variable $Y$, a given $y \in \mathbb{R}$ and a certain value of the covariate $x$,
$$P(Y \le y \mid X = x) = P\big(Y \le y \mid B^{-1}(X - a) = B^{-1}(x - a)\big) = P(Y \le y \mid X_s = x_s),$$
with $x_s = B^{-1}(x - a)$ and, thus,
$$\mathrm{ROC}^x(p) = 1 - F\big(G^{-1}(1 - p \mid x) \mid x\big) = 1 - F\big(G^{-1}(1 - p \mid x_s) \mid x_s\big) = \mathrm{ROC}^{x_s}(p).$$
Note that this standardization does not account for the covariance between the components of $X$, as we are only interested in obtaining covariates of similar magnitudes. Also, in practice the standardization is carried out using the sample mean and the sample standard deviation of the covariates at hand.

We start the analysis of the performance of the two diagnostic markers by comparing their respective ROC curves without taking into account any covariate information. For that purpose we use the method proposed by DeLong et al. (1988).
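The standardization $X_s = B^{-1}(X - a)$, with the sample means and sample standard deviations plugged in as described above, can be sketched as follows. This is an illustrative Python/NumPy sketch: the function name `standardize_covariates` is hypothetical, and the simulated age and log(nse) values are made up for the example (they are not the pleural effusion data). The evaluation point is transformed with the same $a$ and $B$, and the map is one-to-one, so the original point is recovered as $x = a + B x_s$.

```python
import numpy as np


def standardize_covariates(X, x_eval):
    """Return X_s = B^{-1}(X - a) and x_s = B^{-1}(x - a).

    a holds the sample means and B = diag(b) the sample standard deviations
    of the d covariates (columns of X). Covariances are deliberately ignored:
    only the scales of the components are equalized.
    """
    X = np.asarray(X, dtype=float)
    a = X.mean(axis=0)
    b = X.std(axis=0, ddof=1)
    if np.any(b == 0):
        raise ValueError("degenerate covariate: zero sample variance")
    X_s = (X - a) / b
    x_s = (np.asarray(x_eval, dtype=float) - a) / b
    return X_s, x_s, a, b


rng = np.random.default_rng(1)
n = 463
# Hypothetical covariates on very different scales: age (years) and log(nse).
age = rng.normal(65.0, 15.0, size=n)
log_nse = rng.normal(1.1, 1.0, size=n)
X = np.column_stack([age, log_nse])

x = np.array([67.0, 1.14])  # evaluation point (age, log(nse))
X_s, x_s, a, b = standardize_covariates(X, x)

beta = np.array([1.0, 1.0]) / np.sqrt(2.0)  # one direction on S^1
# Spread contributed by each component to beta^T X, before and after.
spread_raw = np.abs(beta) * X.std(axis=0, ddof=1)
spread_std = np.abs(beta) * X_s.std(axis=0, ddof=1)
```

With these simulated scales the age component contributes about 15 times more spread to $\beta^\top X$ than log(nse) before standardization (by construction of the example), and both contribute equally afterwards.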
The estimated ROC curves for both markers are depicted in Figure 8. The p-value obtained for that comparison was 0.138. Similar results were obtained when using other methods for comparing ROC curves without covariates (such as Martínez-Camblor et al. (2013) or Venkatraman and Begg (1996)). Thus, we do not find significant differences between the two diagnostic variables in terms of diagnostic accuracy.

Next, we compare the two diagnostic markers taking into account a unidimensional covariate, using the test proposed in Fanjul-Hevia et al. (2021) for dependent diagnostic markers. We consider the covariates age and log(nse), one at a time. We test the equality of the ROC curves conditioned to the values {51, 67, 83} in the case of age and {−0.92, 1.14, 3.20} in the case of log(nse). The corresponding estimated ROC curves are shown in Figure 8.

Figure 8: ROC curve estimation for both diagnostic variables (log(ca125) and log(cyfra), represented by the solid and the dashed line, respectively) without covariates and conditioned to different values of the covariates age and log(nse). (Panels: no covariates; age = 51, 67, 83; log(nse) = −0.92, 1.14, 3.20.)

For each considered covariate and each value of the covariate we obtain a p-value of the test, summarized in Table 4.

Table 4: Results for the comparison of the ROC curves of the diagnostic markers log(ca125) and log(cyfra) when considering a unidimensional covariate, that covariate being age or log(nse).
Covariate    Value    p-value (L2)    p-value (KS)
age          51       0.454           0.512
age          67       0.218           0.202
age          83       0.936           0.762
log(nse)     −0.92    0.844           0.900
log(nse)     1.14     0.012           0.008
log(nse)     3.20     0.470           0.412

The test is carried out with two types of statistics, one based on the $L_2$-measure and the other on the Kolmogorov-Smirnov criterion, and both yield similar results. When comparing the ROC curves conditioned on different values of age, the results are in line with those obtained in the previous case, in which no covariates were taken into account: the equality of the two curves is not rejected. However, when considering the covariate log(nse), the null hypothesis is rejected for the value 1.14 (at a significance level of 5%).
This matches the representation of the conditional ROC curves depicted in Figure 8.

Finally, we compare the performance of the two diagnostic variables considering the effect of both age and log(nse) at the same time. This is where we use the methodology proposed in this paper. We test the equality of their respective ROC curves conditioned to nine pairs of values of the two covariates: all the possible combinations of {51, 67, 83} and {−0.92, 1.14, 3.20}. As before, two different types of statistics were considered, $L_2$ and $KS$ (and once again, the results are similar in both cases). The results obtained are summarized in Table 5. Note that in this case we did not represent the estimated ROC curves conditioned to the bidimensional covariate (age, log(nse)). This is to stress the fact that, with this methodology, $\widehat{\mathrm{ROC}}^x$ (with $x$ bidimensional) does not need to be computed at all.

Table 5: Results for the comparison of the ROC curves of the diagnostic markers log(ca125) and log(cyfra) when considering the multidimensional covariate (age, log(nse)). (One panel for each statistic, with rows log(nse) = −0.92, 1.14, 3.20 and columns age = 51, 67, 83.)

The obtained p-values show that, depending on the pair of covariate values considered, we can find significant differences between the ROC curves of the log(ca125) and the log(cyfra) markers, including pairs of values for which the null hypothesis was not rejected when each covariate was considered separately in the previous test. Likewise, finding differences between the ROC curves conditioned to marginal covariates at certain values does not mean that those differences will be significant when considering the multidimensional covariate (for example, when we condition the ROC curves marginally on the value 1.14 of log(nse) we find differences, but when considering both covariates this difference only remains significant for the age of 83).

In this work a new non-parametric methodology has been presented for comparing two or more dependent ROC curves conditioned to the value of a continuous multidimensional covariate. This method combines existing techniques for reducing the dimension in goodness-of-fit tests with techniques for estimating and comparing ROC curves conditioned to a one-dimensional covariate.

A simulation study was carried out in order to analyse the practical performance of the test. Two different functions were proposed for the construction of the statistic, $L_2$ and $KS$, the second one being a little more conservative.
Different correlations between the diagnostic variables and different sample sizes have been considered, including unbalanced sample sizes, which had no appreciable effect on the test performance.

Finally, the methodology was illustrated by means of an application to a real data set: with this new test it was possible to detect differences in the discriminatory ability of two diagnostic variables conditioned to two different covariates without the need for an estimator of an ROC curve conditioned to a multidimensional covariate. This application makes clear the importance of being able to include the effect of multidimensional covariates in ROC curve analysis, as different conclusions can be drawn from the comparison of those curves when considering a multidimensional covariate, when considering unidimensional covariates, or when excluding the covariates from the study.

Acknowledgements

The research of A. Fanjul-Hevia is supported by the Ministerio de Educación, Cultura y Deporte (fellowship FPU14/05316), as well as by the Spanish Ministerio de Educación y Formación Profesional (Mobility Grant EST18/00673). A. Fanjul-Hevia, W. González-Manteiga and I. Van Keilegom acknowledge the support from the Spanish Ministerio de Economía, Industria y Competitividad, through grant number MTM2016-76969-P, which includes support from the European Regional Development Fund (ERDF). J.C. Pardo-Fernández acknowledges financial support from grant MTM2017-89422-P, funded by the Spanish Ministerio de Economía, Industria y Competitividad, the Agencia Estatal de Investigación and the ERDF. I. Van Keilegom is financially supported by the European Research Council (2016-2021, Horizon 2020 / ERC grant agreement No. 694409). The Supercomputing Center of Galicia (CESGA) is acknowledged for providing the computational resources that allowed us to run most of the simulations. Dr. F. Gude (Unidade de Epidemioloxía Clínica, Hospital Clínico Universitario de Santiago) is thanked for providing the data set analysed in this article.

Appendix: proofs

The proofs needed for Lemma 1 are presented below.

Lemma 2 (Escanciano (2006); Cuesta-Albertos et al. (2019)). Given a random variable $Y$ such that $E|Y| < \infty$,
$$E[Y \mid X] = 0 \text{ a.s.} \iff E[Y \mid \beta^\top X] = 0 \text{ a.s. for all } \beta \in S^{d-1}. \qquad (17)$$
From now on it will be assumed that all projections $\beta$ considered satisfy $\beta \in S^{d-1}$.

Lemma 3.

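Let $Y_1, \ldots, Y_K$ be $K$ dependent random variables with cumulative distribution functions $F_1, \ldots, F_K$, respectively, such that $E|Y_k| < \infty$ for every $k \in \{1, \ldots, K\}$. Let $X$ be a multidimensional covariate. Then, given $c_1, \ldots, c_K$,
$$F_1(c_1 \mid X) = \cdots = F_K(c_K \mid X) \text{ a.s.} \iff F_1^{\beta}(c_1 \mid \beta^\top X) = \cdots = F_K^{\beta}(c_K \mid \beta^\top X) \text{ a.s. } \forall \beta, \qquad (18)$$
with $\beta \in S^{d-1}$.

Proof. It is proven for $K = 2$:
$$\begin{aligned}
F_1(c_1 \mid X) = F_2(c_2 \mid X) \text{ a.s.}
&\iff E[I(Y_1 \le c_1) \mid X] = E[I(Y_2 \le c_2) \mid X] \text{ a.s.}\\
&\overset{(*)}{\iff} E[I(Y_1 \le c_1) - I(Y_2 \le c_2) \mid X] = 0 \text{ a.s.}\\
&\overset{(17)}{\iff} E[I(Y_1 \le c_1) - I(Y_2 \le c_2) \mid \beta^\top X] = 0 \text{ a.s. } \forall \beta\\
&\iff E[I(Y_1 \le c_1) \mid \beta^\top X] = E[I(Y_2 \le c_2) \mid \beta^\top X] \text{ a.s. } \forall \beta\\
&\iff F_1^{\beta}(c_1 \mid \beta^\top X) = F_2^{\beta}(c_2 \mid \beta^\top X) \text{ a.s. } \forall \beta,
\end{aligned}$$
where $F_i^{\beta}(c_i \mid \beta^\top x) = P(Y_i \le c_i \mid \beta^\top X = \beta^\top x)$ for $i = 1, 2$. The step $(*)$ uses that both variables are conditioned on the same covariate $X$ (i.e., there is no $X_1$ and $X_2$ as there would be in the independent case).

Definition 1.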

The inverted ROC curve (IROC) is defined as:
$$\mathrm{IROC}(q) = 1 - G\big(F^{-1}(1 - q)\big), \quad q \in (0, 1).$$
Related to the previous definition, the inverted conditional ROC curve ($\mathrm{IROC}^x$), given the pair $(x_F, x_G) \in R_{X_F} \times R_{X_G}$, can also be defined as:
$$\mathrm{IROC}^{x_G, x_F}(q) = 1 - G\big(F^{-1}(1 - q \mid x_F) \mid x_G\big), \quad q \in (0, 1).$$

Lemma 4.

The equality of ROC curves is equivalent to the equality of the inverted ROC curves, i.e.,

$$\mathrm{ROC}_1(p) = \cdots = \mathrm{ROC}_K(p)\ \forall p \in (0, 1) \iff \mathrm{IROC}_1(q) = \cdots = \mathrm{IROC}_K(q)\ \forall q \in (0, 1).$$
Moreover, the same property holds for conditional ROC curves. Given the pair $(x_F, x_G) \in R_{X_F} \times R_{X_G}$,
$$\mathrm{ROC}_1^{x_F, x_G}(p) = \cdots = \mathrm{ROC}_K^{x_F, x_G}(p)\ \forall p \in (0, 1) \iff \mathrm{IROC}_1^{x_G, x_F}(q) = \cdots = \mathrm{IROC}_K^{x_G, x_F}(q)\ \forall q \in (0, 1). \qquad (19)$$

Proof.

It is proven for the unconditional case and for $K = 2$. The conditional case is similar.

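$$\mathrm{ROC}_1(p) = \mathrm{ROC}_2(p)\ \forall p \in (0, 1) \iff 1 - F_1\big(G_1^{-1}(1 - p)\big) = 1 - F_2\big(G_2^{-1}(1 - p)\big)\ \forall p \in (0, 1).$$
Take $q = 1 - F_1\big(G_1^{-1}(1 - p)\big)$ (and hence $q = \mathrm{ROC}_1(p)$). Then $q$ takes all the values in $(0, 1)$, and thus $p = 1 - G_1\big(F_1^{-1}(1 - q)\big) = \mathrm{IROC}_1(q)$. Then,
$$\begin{aligned}
\mathrm{ROC}_1(p) = \mathrm{ROC}_2(p)\ \forall p \in (0, 1)
&\iff 1 - F_2\Big(G_2^{-1}\big(1 - [1 - G_1(F_1^{-1}(1 - q))]\big)\Big) = q\ \forall q \in (0, 1)\\
&\iff 1 - G_1\big(F_1^{-1}(1 - q)\big) = 1 - G_2\big(F_2^{-1}(1 - q)\big)\ \forall q \in (0, 1)\\
&\iff \mathrm{IROC}_1(q) = \mathrm{IROC}_2(q)\ \forall q \in (0, 1).
\end{aligned}$$

Proof of Lemma 1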

Proof.

It is proven for $K = 2$. For $p \in (0, 1)$,
$$\begin{aligned}
\mathrm{ROC}_1^x(p) = \mathrm{ROC}_2^x(p) \text{ a.s.}
&\iff 1 - F_1\big(G_1^{-1}(1 - p \mid x) \mid x\big) = 1 - F_2\big(G_2^{-1}(1 - p \mid x) \mid x\big) \text{ a.s.}\\
&\iff F_1\big(G_1^{-1}(1 - p \mid x) \mid x\big) = F_2\big(G_2^{-1}(1 - p \mid x) \mid x\big) \text{ a.s.}\\
&\overset{(18)}{\iff} F_1^{\beta^F}\big(G_1^{-1}(1 - p \mid x) \mid (\beta^F)^\top x\big) = F_2^{\beta^F}\big(G_2^{-1}(1 - p \mid x) \mid (\beta^F)^\top x\big) \text{ a.s. } \forall \beta^F\\
&\iff \mathrm{ROC}_1^{(\beta^F)^\top x,\, x}(p) = \mathrm{ROC}_2^{(\beta^F)^\top x,\, x}(p) \text{ a.s. } \forall \beta^F\\
&\overset{(19)}{\iff} \mathrm{IROC}_1^{x,\, (\beta^F)^\top x}(q) = \mathrm{IROC}_2^{x,\, (\beta^F)^\top x}(q) \text{ a.s. } \forall \beta^F, \text{ for } q \in (0, 1)\\
&\iff G_1\big((F_1^{\beta^F})^{-1}(1 - q \mid (\beta^F)^\top x) \mid x\big) = G_2\big((F_2^{\beta^F})^{-1}(1 - q \mid (\beta^F)^\top x) \mid x\big) \text{ a.s. } \forall \beta^F\\
&\overset{(18)}{\iff} G_1^{\beta^G}\big((F_1^{\beta^F})^{-1}(1 - q \mid (\beta^F)^\top x) \mid (\beta^G)^\top x\big) = G_2^{\beta^G}\big((F_2^{\beta^F})^{-1}(1 - q \mid (\beta^F)^\top x) \mid (\beta^G)^\top x\big) \text{ a.s. } \forall \beta^F, \beta^G\\
&\iff \mathrm{IROC}_1^{(\beta^G)^\top x,\, (\beta^F)^\top x}(q) = \mathrm{IROC}_2^{(\beta^G)^\top x,\, (\beta^F)^\top x}(q) \text{ a.s. } \forall \beta^F, \beta^G\\
&\overset{(19)}{\iff} \mathrm{ROC}_1^{(\beta^F)^\top x,\, (\beta^G)^\top x}(\tilde p) = \mathrm{ROC}_2^{(\beta^F)^\top x,\, (\beta^G)^\top x}(\tilde p) \text{ a.s. } \forall \beta^F, \beta^G, \text{ for } \tilde p \in (0, 1),
\end{aligned}$$
where
$$\begin{aligned}
F_k^{\beta^F}\big(c \mid (\beta^F)^\top x\big) &= P\big(Y_k^F \le c \mid (\beta^F)^\top X^F = (\beta^F)^\top x\big),\\
G_k^{\beta^G}\big(c \mid (\beta^G)^\top x\big) &= P\big(Y_k^G \le c \mid (\beta^G)^\top X^G = (\beta^G)^\top x\big), \qquad k = 1, 2.
\end{aligned}$$

References

Colling, B. and Van Keilegom, I. (2017). Goodness-of-fit tests in semiparametric transformation models using the integrated regression function. Journal of Multivariate Analysis, 160:10–30.
Cuesta-Albertos, J. A., del Barrio, E., Fraiman, R., and Matrán, C. (2007). The random projection method in goodness of fit for functional data. Computational Statistics & Data Analysis, 51(10):4814–4831.
Cuesta-Albertos, J. A., García-Portugués, E., Febrero-Bande, M., and González-Manteiga, W. (2019). Goodness-of-fit tests for the functional linear model based on randomly projected empirical processes. Annals of Statistics, 47(1):439–467.
DeLong, E. R., DeLong, D. M., and Clarke-Pearson, D. L. (1988). Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics, 44:837–845.
Escanciano, J. C. (2006). A consistent diagnostic test for regression models using projections. Econometric Theory, 22(6):1030–1051.
Fanjul-Hevia, A. and González-Manteiga, W. (2018). A comparative study of methods for testing the equality of two or more ROC curves. Computational Statistics, 33:357–377.
Fanjul-Hevia, A., González-Manteiga, W., and Pardo-Fernández, J. C. (2021). A non-parametric test for comparing conditional ROC curves. Computational Statistics & Data Analysis, 157:107146.
García-Portugués, E., González-Manteiga, W., and Febrero-Bande, M. (2014). A goodness-of-fit test for the functional linear model with scalar response. Journal of Computational and Graphical Statistics, 23(3):761–778.
González-Manteiga, W., Pardo-Fernández, J. C., and Van Keilegom, I. (2011). ROC curves in non-parametric location-scale regression models. Scandinavian Journal of Statistics, 38(1):169–184.
Inácio de Carvalho, V., Jara, A., Hanson, T. E., and de Carvalho, M. (2013). Bayesian nonparametric ROC regression modeling. Bayesian Analysis, 8(3):623–646.
Martínez-Camblor, P., Carleos, C., and Corral, N. (2013). General nonparametric ROC curve comparison. Journal of the Korean Statistical Society, 42(1):71–81.
Martínez-Camblor, P. and Corral, N. (2012). A general bootstrap algorithm for hypothesis testing. Journal of Statistical Planning and Inference, 142(2):589–600.
Pardo-Fernández, J. C., Rodríguez-Álvarez, M. X., and Van Keilegom, I. (2014). A review on ROC curves in the presence of covariates. REVSTAT–Statistical Journal, 12(1):21–41.
Patilea, V., Sánchez-Sellero, C., and Saumard, M. (2016). Testing the predictor effect on a functional response. Journal of the American Statistical Association, 111(516):1684–1695.
Pepe, M. S. (2003). The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford University Press, Oxford.
Rodríguez, A. and Martínez, J. C. (2014). Bayesian semiparametric estimation of covariate-dependent ROC curves. Biostatistics, 15(2):353–369.
Rodríguez-Álvarez, M. X., Roca-Pardiñas, J., and Cadarso-Suárez, C. (2011). A new flexible direct ROC regression model: Application to the detection of cardiovascular risk factors by anthropometric measures. Computational Statistics & Data Analysis, 55(12):3257–3270.
Rodríguez-Álvarez, M. X., Roca-Pardiñas, J., Cadarso-Suárez, C., and Tahoces, P. G. (2018). Bootstrap-based procedures for inference in nonparametric receiver-operating characteristic curve regression analysis. Statistical Methods in Medical Research, 27(3):740–764.
Valdés, L., San-José, E., Ferreiro, L., González-Barcala, F.-J., Golpe, A., Álvarez-Dobaño, J. M., Toubes, M. E., Rodríguez-Núñez, N., Rábade, C., Lama, A., and Gude, F. (2013). Combining clinical and analytical parameters improves prediction of malignant pleural effusion. Lung, 191(6):633–643.
Venkatraman, E. S. and Begg, C. B. (1996). A distribution-free procedure for comparing receiver operating characteristic curves from a paired experiment.