Analysis of ensemble feature selection for correlated high-dimensional RNA-Seq cancer data
Aneta Polewko-Klim and Witold R. Rudnicki

Institute of Informatics, University of Białystok, Białystok, Poland
[email protected]
Computational Center, University of Białystok, Białystok, Poland
Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, Warsaw, Poland
Abstract.
Discovery of diagnostic and prognostic molecular markers is an important and actively pursued research field in cancer research. For complex diseases, this process is often performed using machine learning. The current study compares two approaches for the discovery of relevant variables: application of a single feature selection algorithm versus an ensemble of diverse algorithms. These approaches are used to identify variables that are relevant for discerning four cancer types using RNA-seq profiles from The Cancer Genome Atlas. The comparison is carried out in two directions: evaluating the predictive performance of models and monitoring the stability of selected variables. The most informative features are identified using four feature selection algorithms, namely the U-test, ReliefF, and two variants of the MDFS algorithm. Discerning normal and tumor tissues is performed using the Random Forest algorithm. The highest stability of the feature set was obtained when the U-test was used. Unfortunately, models built on feature sets obtained from the ensemble of feature selection algorithms were no better than models developed on feature sets obtained from individual algorithms. On the other hand, the feature selectors leading to the best classification results varied between data sets.
Keywords:
Random forest · RNA · Feature selection · Ensemble learning
The high-throughput DNA sequencing techniques produce data with tens of thousands of probes, each of which could be potentially relevant for diagnostics, prognosis, and therapeutics. Feature selection (FS) techniques are indispensable tools for filtering out irrelevant variables and ranking the relevant ones in molecular biological investigations [28], [15]. The choice of the FS method is very important for further investigation because it greatly limits the number of features under scrutiny, allowing one to concentrate on the most relevant ones. On the other hand, FS increases the risk of omitting biologically important variables.

FS methods are typically divided into three major groups, namely filters, wrappers, and embedded methods [1]. The bias in filtering FS methods does not correlate with the classification algorithms, hence they generalise better than the other methods. Nevertheless, it is well known that individual feature selection algorithms are not robust with respect to fluctuations in the input data [22]. Consequently, application of a single FS algorithm cannot ensure optimal modelling results, both in terms of predictive performance and stability. This is particularly evident in the integrative analysis of high-dimensional *omics data [18].

There are numerous FS algorithms that are based on different principles and can generate highly variable results for the same data set. The presence of highly correlated features may result in multiple equally optimal sets of features and consequently lead to instability of the FS method [10]. Such instability reduces the confidence in selected features [22] and in their usage as diagnostic or prognostic markers. This variability can be to some extent minimised by application of ensemble feature selection (EFS) methods that involve a combination of different selectors [1].
Ensemble FS can be broadly assigned to one of two classes: homogeneous (the same base feature selector) and heterogeneous (different feature selectors) [1]. Regardless of the class, the output of ensemble FS is given either in the form of a final feature set or as a ranking of features. Therefore, some papers focus on the comparison of different strategies for the ordering of these feature subsets [29]. Other researchers focus on the evaluation of ensembles. Two quantities of interest are the diversity [25] and the stability of the feature selection process [20, 22]. And though various methods of feature selection have been developed for high-dimensional data, such as high-throughput genomics data, it is still a big challenge to choose the appropriate method for this type of data [15, 30]. The stability of FS algorithms for the classification of this type of data has been investigated, for instance, by Moulos et al. [20] and Dessì and Pes [6]. It was shown that the stability of ensemble feature selection increases only for those FS methods that are intrinsically weak (in terms of stability). Shahrjooihaghighi et al. [26] proposed an ensemble FS based on the fusion of five feature selection methods (rank product, fold change ratio, ABCR, t-test, and PLSDA) for more effective biomarker discovery. A methodology for comparing the outcomes of different FS techniques is presented in [5].

The current study is focused on the development and optimisation of a feature selection protocol aimed at the identification of biomarkers important for the diagnostics of cancers using the results of high-throughput molecular biology experimental methods. It is based on an ensemble of four diverse feature selection methods and the application of a classification algorithm that is used to evaluate the quality of the set of features.
In particular, the following questions are examined:
– whether application of an ensemble of FS methods gives more stable results than individual algorithms;
– what is the optimal number of variables for individual algorithms and for the ensemble;
– whether models built using features returned by the ensemble are better than models built using the same number of variables returned by individual algorithms;
– which feature selection algorithm returns the best sets of variables.

The main contributions of the current study are the following:
– we present a novel perspective on the optimization and evaluation of feature selection for correlated high-dimensional RNA-Seq cancer data;
– we compare both the predictive performance of models and the stability of selected feature sets in ensemble feature selection with those of individual FS algorithms;
– we show that the performance of feature selection methods varies between data sets, even for very similar data sets;
– we propose to use an ensemble approach as a reference for selecting the method that works best for a particular data set.

Four data sets from The Cancer Genome Atlas database that contain RNA-sequencing data of tumor and tumor-adjacent normal tissues for various types of cancer were used [3, 4, 8, 12, 14, 31]. These data sets all include a large number of highly correlated and potentially informative features [21]. The preprocessing of the data involved standard steps for RNA-Seq data. First, the log2 transformation was performed. Then features with zero and near-zero (1%) variance across patients were removed. After preprocessing the data sets contain:
– the primary BRCA dataset: 1205 samples (112 normal and 1093 tumor), 20223 variables;
– the LUAD dataset: 574 samples (59 normal and 515 tumor), 20172 variables;
– the KIRC dataset: 605 samples (72 normal and 533 tumor), 20222 variables;
– the HNSC dataset: 564 samples (44 normal and 520 tumor), 20235 variables.

All data sets are imbalanced; they contain roughly ten times more cancer than normal samples.
The procedure outlined above was applied to four filter FS methods, namely the Mann-Whitney U-test [17], ReliefF [11], [13], and MDFS [19, 23] in two variants: one-dimensional (MDFS-1D) and two-dimensional (MDFS-2D). Since only the ranking of variables is used in the procedure, no corrections of p-values due to multiple testing were necessary.
U-test is a robust statistical filter that is routinely used in the analysis of *omics data. It assigns a probability to the hypothesis that the two samples corresponding to the two decision classes (normal and tumor tissue) are drawn from populations with the same average value. The U-test uses the p-value to select and rank the features.
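The ranking step of such a U-test filter can be sketched in plain Python. This is purely illustrative — the study itself used R, the helper names `mann_whitney_p` and `rank_by_u_test` are my own, and the normal approximation (adequate for the large RNA-Seq sample sizes) stands in for an exact test:

```python
import math

def mann_whitney_p(sample_a, sample_b):
    """Two-sided Mann-Whitney U-test p-value via the normal approximation
    (midranks for ties, no tie correction in the variance)."""
    n_a, n_b = len(sample_a), len(sample_b)
    combined = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    ranks = [0.0] * (n_a + n_b)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1          # average of 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    rank_sum_a = sum(r for r, (_, grp) in zip(ranks, combined) if grp == 0)
    u = rank_sum_a - n_a * (n_a + 1) / 2   # U statistic for sample_a
    mu = n_a * n_b / 2
    sigma = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u - mu) / sigma
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def rank_by_u_test(normal_rows, tumor_rows, n_features):
    """Return feature indices ordered by ascending p-value (best first)."""
    pvals = [(mann_whitney_p([r[f] for r in normal_rows],
                             [r[f] for r in tumor_rows]), f)
             for f in range(n_features)]
    return [f for _, f in sorted(pvals)]
```

Because only the relative order of p-values matters for the ranking, the lack of a multiple-testing correction is harmless here, as noted above.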
MDFS is a filter that is based on an information-theoretic approach and that can take synergistic effects between variables into account [19, 23]. MDFS also uses p-values of the test to rank features. In the current study, the 1D and 2D versions of the MDFS algorithm were used, referred to as MDFS-1D and MDFS-2D, respectively.
ReliefF is a filter that computes a ranking of variable importance in the information system, based on distances in small-dimensional subspaces of the system [13]. Two variants of the distance between nearest neighbours, namely ReliefFexpRank and ReliefFbestK, were tested for the current study. Slightly better results were obtained for the former, hence it was used in all subsequent work. The R implementation of the algorithm from the CORElearn package was used [24].
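To illustrate the underlying idea — not the CORElearn implementation, which the study actually used — a Relief-style score can be sketched with a single nearest hit and nearest miss per sample (a deliberate simplification of ReliefF's multiple neighbours):

```python
def relief_scores(rows, labels):
    """Simplified Relief: for every sample, find its nearest hit (same
    class) and nearest miss (other class), then reward features that
    differ across the decision boundary and penalise features that
    differ within a class.  One neighbour, Manhattan distance."""
    n, p = len(rows), len(rows[0])
    weights = [0.0] * p

    def manhattan(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))

    for i, row in enumerate(rows):
        hits = [r for j, r in enumerate(rows)
                if j != i and labels[j] == labels[i]]
        misses = [r for j, r in enumerate(rows) if labels[j] != labels[i]]
        near_hit = min(hits, key=lambda r: manhattan(r, row))
        near_miss = min(misses, key=lambda r: manhattan(r, row))
        for f in range(p):
            weights[f] += (abs(row[f] - near_miss[f])
                           - abs(row[f] - near_hit[f])) / n
    return weights
```

A feature that separates the classes accumulates a positive weight, while a feature that varies as much within a class as between classes drifts toward zero or below; sorting features by weight yields the ranking used by the protocol.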
Filter-based feature selection.
Individual prediction models in k-fold cross-validation were constructed for each of the four filter FS methods and data sets. The feature selection process and the learning process from the RNA-Seq data sets were realized using Algorithm 1. The algorithm was repeated for several values of N, and it was repeated multiple times, to minimize the effects of random fluctuations. The stability of feature selection was measured by comparing the feature sets obtained in multiple runs of the procedure. Ensemble feature selection.
The ensemble set of top-N relevant variables was constructed as a union of the top-N variables from each filter FS method, as shown in Algorithm 2. The size of the set may vary between N and 4N, depending on the similarity of the sets returned by the individual FS algorithms. All comparisons between feature sets obtained from the ensemble and feature sets obtained by individual filters were performed on sets with comparable numbers of total variables. For example, if the union of four top-5 sets resulted in a set with 20 variables, it was subsequently compared with other sets containing 20 variables.

Algorithm 1: FS(l, f, N, D = {S_1, ..., S_k}) — the filter feature selection algorithm with the Random Forest classifier
input: learning method l; feature selection method f; number of top features N; data set D = {(y_i, x_i)}_{i=1}^{M} with features V = {v_1, ..., v_p} and M instances, randomly partitioned into roughly equally-sized folds S_j
output: ranked feature sets F_j, j = 1, ..., k; performance estimation metric E
foreach S_j do
    define the training set D_{-j}(V) ← D(V) \ S_j(V)
    perform feature selection on the training set: R_j ← f(D_{-j}(V))
    remove highly correlated features from the ranked list R_j
    collect the N highest-ranked features F_j = {v_1, ..., v_N} from R_j
    build the model on the training set: L_j ← l(D_{-j}(F_j))
    performance estimation: apply the trained model L_j to the test set S_j, giving E_j
end
E ← (1/k) Σ_j E_j

Algorithm 2: EFS(l, W = {F_1, ..., F_k}, D = {S_1, ..., S_k}) — the ensemble feature selection algorithm with the Random Forest classifier
input: learning method l; the 4 × k sets of top-N uncorrelated features from the 4 filters, F_{i,j}, i = 1, ..., 4, j = 1, ..., k; data set D = {(y_m, x_m)}_{m=1}^{M} described with the features F_{i,j} and M instances, randomly partitioned into roughly equally-sized folds S_n
output: collected feature sets C_p, p = 1, ..., k; performance estimation metric E_p
foreach S_n do
    collect the union of feature sets C_p = F_{1,n} ∪ F_{2,n} ∪ F_{3,n} ∪ F_{4,n}
    define the training set D_{-n}(C_p) ← D(C_p) \ S_n(C_p)
    build the model on the training set: L_{n,p} ← l(D_{-n}(C_p))
    performance estimation: E_{n,p} ← L_{n,p}(S_n(C_p))
end
E_p ← (1/k) Σ_n E_{n,p}
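The union step of Algorithm 2 can be sketched as follows (illustrative Python, not the study's R code; the filter names and gene identifiers are hypothetical):

```python
def ensemble_top_n(rankings, n):
    """Union of the top-n features from every filter's ranking -- the
    C_p collection step of Algorithm 2.  `rankings` maps a filter name
    to its ranked feature list, best first; the union therefore holds
    between n and len(rankings) * n features, depending on overlap."""
    selected = set()
    for ranked in rankings.values():
        selected.update(ranked[:n])
    return selected

# Hypothetical rankings from the four filters for one cross-validation fold:
rankings = {
    "u_test":  ["g1", "g2", "g3", "g4"],
    "mdfs_1d": ["g2", "g1", "g5", "g3"],
    "mdfs_2d": ["g5", "g6", "g1", "g2"],
    "relieff": ["g1", "g7", "g2", "g6"],
}
union = ensemble_top_n(rankings, 2)
```

Here the four top-2 sets overlap partially, so the union holds 5 features, between the lower bound of 2 and the upper bound of 8.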
All applied filters provide their own ranking of the features. The U-test and MDFS algorithms rank features by their statistical significance, and the ReliefF algorithm by their performance in classification. Then the joint set of the most important variables is created as a union of the top-N sets from the individual rankings. Ranking within the combined set was not necessary and was never performed. Algorithm 2 was repeated several times for different values of N, as in the case of Algorithm 1. The stability of feature selection was also estimated. Classification.
The quality of the feature set was evaluated by building a machine learning model using the selected features and measuring its quality. To this end, the Random Forest algorithm [2] was used. It has been shown that Random Forest is a generally reliable algorithm that works well out of the box, rarely fails, and in most cases returns results that are very close to the best achievable for a given problem [7]. The quality of the model was evaluated using the area under the ROC curve (AUC). This measure is independent of the balance of classes in the data and does not need any fitting. The scheme of ensemble feature selection and supervised classification is presented in Figure 1.

Fig. 1: Pipeline of the ensemble FS method. See notation in the text.
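The AUC used as the quality measure can be computed directly from the rank-sum identity; a minimal sketch (illustrative only, not the evaluation code used in the study):

```python
def auc(scores, labels):
    """AUC via the rank-sum identity: the probability that a randomly
    chosen positive (tumor) sample scores higher than a randomly chosen
    negative (normal) one, with ties counted as one half.  Insensitive
    to class imbalance, which matters for these roughly 10:1 data sets."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

Because the measure depends only on the ordering of scores, no decision threshold has to be fitted, which is the property exploited above.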
Measuring stability of feature selection
The total stability of a filter FS method is measured as the average pairwise similarity over all pairs of the most informative feature subsets (s_i, s_j) from n runs of a model in full k-fold cross-validation. To this end, Lustgarten's stability measure (ASM) [16], which can be applied to sets of unequal sizes, was used. It is described by the formula:

ASM = \frac{2}{c(c-1)} \sum_{i=1}^{c-1} \sum_{j=i+1}^{c} \frac{|s_i \cap s_j| - |s_i| \cdot |s_j| / m}{\min(|s_i|, |s_j|) - \max(0, |s_i| + |s_j| - m)}   (1)

where m is the total number of features in the data set and c = n · k. Optimization of feature selection.
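Lustgarten's measure of Eq. (1) translates directly into code; a plain-Python sketch (the function name is my own, and the study's computations were done in R):

```python
def lustgarten_asm(subsets, m):
    """Lustgarten's adjusted stability measure (Eq. 1): each pair of
    feature subsets contributes its observed overlap minus the overlap
    expected by chance, normalised by the range the overlap can take
    given the subset sizes and the total feature count m."""
    c = len(subsets)
    total = 0.0
    for i in range(c - 1):
        for j in range(i + 1, c):
            s_i, s_j = subsets[i], subsets[j]
            expected = len(s_i) * len(s_j) / m
            hi = min(len(s_i), len(s_j))       # largest possible overlap
            lo = max(0, len(s_i) + len(s_j) - m)  # smallest possible overlap
            total += (len(s_i & s_j) - expected) / (hi - lo)
    return 2 * total / (c * (c - 1))
```

Note that even identical subsets score below 1 (for two copies of a size-k subset the pair contributes 1 − k/m), because the chance-expected overlap is subtracted; disjoint subsets score below zero.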
In the first step, four threshold levels for defining highly correlated variables were examined to establish the threshold leading to the best classification results. The subsequent analyses were performed for the optimal threshold level. The following analyses were performed for each individual FS filter and for the ensemble FS filter:
– how many uncorrelated variables should be included in the model to obtain the best classification;
– how stable the stability measure is for top-N feature subsets;
– whether adding the highly correlated variables back to the top-N variables influences predictive power.

The entire modelling protocol, including both the feature selection and model building steps, was performed within k = 5 fold cross-validation and was repeated n = 30 times, independently for each FS method and data set. Within each cross-validation iteration the feature selection algorithm was performed once and then models were trained for all feature set sizes N. The analysis was performed using R (version 3.5) [27] and R/Bioconductor packages [9].

In the first step, the impact of correlation between informative features on the predictive power of the RF model was examined. The results of this analysis are displayed in Figure 2. It can be seen that the squares corresponding to the correlation threshold 0.7 in many cases fall below the other lines on the AUC plots. Therefore, the threshold for removal of highly correlated variables was set at a Spearman's correlation coefficient r higher than 0.75. This value of the coefficient is applied in the subsequent analysis. One may note that the MDFS-1D filter is the most robust with respect to changes in the feature-level correlation among the applied FS methods. At the next stage of the analysis, the accuracy of models built using top-N features was examined, see Figure 3.
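The removal of highly correlated variables from a ranking can be sketched as a greedy best-first pass (illustrative Python, not the study's R code; ties in the Spearman ranks are ignored for brevity):

```python
import math

def _ranks(values):
    # 1-based ranks; ties are not averaged in this sketch
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for position, idx in enumerate(order):
        ranks[idx] = position + 1
    return ranks

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def prune_correlated(ranked, columns, threshold=0.75):
    """Walk the ranking best-first and keep a feature only if its
    Spearman |rho| with every already kept feature stays at or below
    the threshold; higher-ranked features thus shadow their copies."""
    kept = []
    for f in ranked:
        if all(abs(spearman(columns[f], columns[g])) <= threshold
               for g in kept):
            kept.append(f)
    return kept
```

In this scheme the redundant variables are not discarded permanently; they can be added back to the kept set later, which is exactly the experiment reported below for Figure 4.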
The number of variables for the ensemble model is obtained as the average number of variables in the union of the top-N variables from all FS methods, averaged over 150 cross-validated sets. Generally, the performance of the models is poor for the smallest sizes of variable sets but increases rapidly with an increasing number of variables, reaching a plateau after roughly 40 variables are included. However, there are notable exceptions.

Fig. 2: AUC vs top-N biomarkers for different values of Spearman's rank correlation coefficient. Results for the BRCA, HNSC, KIRC and LUAD data sets are displayed in rows I, II, III, and IV, respectively.

In particular, for BRCA the best model was obtained with 20 variables returned by the U-test. An even stronger effect was obtained for the KIRC data set. Here MDFS-2D feature selection leads to clearly the best results at 25 variables, whereas the best results for the other filters are obtained with 15 variables. The relative performance of models developed using different FS algorithms varies significantly between data sets. For example, MDFS-2D is clearly the best feature selector for KIRC, which strongly suggests that non-trivial synergies between variables are present in this data set.
MDFS-1D is the best feature selector for the HNSC data set, the U-test is best for BRCA, and both algorithms are similarly good for LUAD. In all cases, the AUC values of models built using variables returned by an ensemble of FS algorithms are comparable to those of individual models built with a similar number of variables; the AUC curves of the ensemble models (full circle points) are located roughly in the middle of the other models, as shown in Fig. 3. Only for the BRCA data set, for numbers of variables larger than 40, is the performance of the model built on ensemble variables comparable with the best model for the individual data set, which in this case is the model built using variables from the U-test.
Fig. 3: AUC for models built using top-N variables.

In the next step the effect of adding back redundant variables was examined. To this end, RF models were built for sets of variables consisting of the uncorrelated top-N variables and all informative variables highly correlated with them that were previously removed from the feature ranking list. The results are displayed in Figure 4. Clearly, adding redundant variables to the main feature set does not improve classification results in most cases. Exceptions are models built using variables obtained with the MDFS-2D method for the BRCA, HNSC and KIRC data. This effect may arise due to the inclusion of correlated variables that interact synergistically with other variables in a slightly different way than those previously included.
Fig. 4: AUC for models built using top-N variables. Solid lines correspond to models built using the top-N uncorrelated variables. Dashed lines correspond to models built using the top-N uncorrelated variables and the variables correlated with them.
An important property of a feature selection method is the stability, or robustness, of the selected features to perturbations in the data. This is particularly important for the identification of prognostic or diagnostic markers. Therefore, the sensitivity of the feature selection algorithms to the variations in the training sets that arise in cross-validation was examined. The similarity between the 150 feature subsets obtained in 150 iterations of cross-validation was measured using Lustgarten's index ASM, see Figure 5. The highest stability was obtained for variables selected with the U-test. For this FS method the value of the ASM index varies between 0.7 and 0.8. The remaining FS methods are much less stable,
Fig. 5: The average similarity (ASM) between 150 feature subsets for top-N variables. Dotted lines correspond to sets consisting of the top-N variables. Solid lines correspond to sets consisting of the top-N variables and the variables highly correlated with them.

with the least stable being MDFS-2D, for which the ASM index is generally below 0.2, and the most stable being ReliefF, for which the ASM index varies between 0.3 and 0.5. The difference in stability between the algorithms is due to the differences in the approaches they use. The U-test is a deterministic algorithm, for which differences arise exclusively due to variation of the sample composition. All the remaining algorithms rely on randomisation, hence increased variance can be expected. In most cases the stability increases with an increasing number of variables. The notable exception is the MDFS-2D algorithm for the KIRC data set, where relatively high stability indicates the existence of a small core of the most relevant variables that is present in nearly all cases and strongly contributes to classification. This small core is augmented by a diverse group of loosely correlated, relevant but redundant variables. Finally, in most cases adding redundant variables increases the stability of the feature subsets, but the difference is small.

The training time does not depend on the type of molecular data, such as microarray gene expression data or DNA copy number data. The execution time of the task depends on the size of the dataset, the number of training iterations, the feature selection algorithm, as well as the CPU or GPU model. The most time-consuming individual step of the algorithm is feature selection; model building with Random Forest is relatively quick. However, 150 distinct Random Forest models were built using the results of the same feature selection step, therefore the total times of both components were similar. Example execution times for a single iteration of the algorithm for the KIRC data set are presented in Table 1.
Among the feature selection algorithms used in the study, ReliefF is by far the most time-consuming.

Table 1: Execution times for a single iteration of the algorithm for the KIRC data set. Computations were performed on an Intel Xeon E5-2650v2 CPU. The MDFS-2D algorithm was executed using a GPU-accelerated version on an NVIDIA Tesla K80 co-processor.
      U-test   MDFS-1D  MDFS-2D  ReliefF  Ensemble  RF       RF × 100  Total
Time  00m:37s  00m:04s  00m:03s  05m:41s  05m:54s   00m:03s  05m:19s   10m:13s
A single run of the algorithm involved calling the four FS methods, removing correlated features, producing a ranking of the features, and calling the RF classification algorithm 150 times (5 feature sets × 20 feature set sizes). The algorithm was executed 150 times; computations for one data set took about 25 hours of CPU time.
The current study demonstrates that relying on a single FS algorithm is not optimal. Different FS algorithms are best suited for the identification of the most relevant feature sets in various data sets. Combining variables from multiple FS algorithms into a single feature set does not improve the performance in comparison with an equally numerous feature set generated by the individual algorithm that is best suited for the particular data set. On the other hand, application of multiple algorithms increases the chances of identifying the best FS algorithm for the problem under scrutiny. In particular, application of FS algorithms that can detect synergies in the data can significantly improve the quality of machine learning models.

Interestingly, the stability of a FS algorithm is not required for building highly predictive machine learning models. This is possible because biological systems often contain multiple informative variables; therefore, useful models can be obtained using very diverse combinations of predictive variables.
Acknowledgements
This work was supported by the National Science Centre, Poland, under grant Miniatura 2 No. 2018/02/X/ST6/02571.
References
1. Bolón-Canedo, V., Alonso-Betanzos, A.: Ensembles for feature selection: A review and future trends. Information Fusion, 1–12 (2019)
2. Breiman, L.: Random forests. Machine Learning 45(1), 5–32 (2001)
3. Ciriello, G., Gatza, M.L., Beck, A.H., Wilkerson, M.D., et al.: Comprehensive molecular portraits of invasive lobular breast cancer. Cell 163(2), 506–519 (2015)
4. Collisson, E., Campbell, J., Brooks, A., et al.: Comprehensive molecular profiling of lung adenocarcinoma. Nature 511, 543–550 (2014)
5. Dessì, N., Pascariello, E., Pes, B.: A comparative analysis of biomarker selection techniques. BioMed Research International, 387673
6. Dessì, N., Pes, B.: Stability in biomarker discovery: Does ensemble feature selection really help? In: Current Approaches in Applied Artificial Intelligence, IEA/AIE 2015, Lecture Notes in Computer Science, vol. 9101, pp. 191–200. Springer (2015)
7. Fernández-Delgado, M., Cernadas, E., Barro, S., Amorim, D.: Do we need hundreds of classifiers to solve real world classification problems? Journal of Machine Learning Research 15, 3133–3181 (2014)
8. Hammerman, P., Lawrence, M., Voet, D., et al.: Comprehensive genomic characterization of squamous cell lung cancers. Nature 489, 519–525 (2012)
9. Huber, W., Carey, V., Gentleman, R., Anders, S., et al.: Orchestrating high-throughput genomic analysis with Bioconductor. Nature Methods 12(2), 115–121 (2015)
10. Kamkar, I., Gupta, S.K., Phung, D., Venkatesh, S.: Exploiting feature relationships towards stable feature selection. In: 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 1–10 (2015)
11. Kira, K., Rendell, L.: The feature selection problem: Traditional methods and a new algorithm. In: Proceedings of AAAI-92, pp. 129–134 (1992)
12. Koboldt, D., Fulton, R., McLellan, M., et al.: Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012)
13. Kononenko, I.: Estimating attributes: Analysis and extensions of RELIEF. In: Machine Learning: ECML-94, pp. 171–182 (1994)
14. Lawrence, M., Sougnez, C., Lichtenstein, L., et al.: Comprehensive genomic characterization of head and neck squamous cell carcinomas. Nature 517, 576–582 (2015)
15. Liang, S., Ma, A., Yang, S., Wang, Y., Ma, Q.: A review of matched-pairs feature selection methods for gene expression data analysis. Computational and Structural Biotechnology Journal 16, 88–97 (2018)
16. Lustgarten, J.L., Gopalakrishnan, V., Visweswaran, S.: Measuring stability of feature selection in biomedical datasets. In: AMIA Annual Symposium Proceedings, pp. 406–410. AMIA (2009)
17. Mann, H., Whitney, D.: On a test of whether one of two random variables is stochastically larger than the other. Annals of Mathematical Statistics 18(1), 50–60 (1947)
18. Meng, C., Zeleznik, O.A., Thallinger, G.G., et al.: Dimension reduction techniques for the integrative analysis of multi-omics data. Briefings in Bioinformatics 17(4), 628–641 (2016)
19. Mnich, K., Rudnicki, W.: All-relevant feature selection using multidimensional filters with exhaustive search. Information Sciences, 277–297 (2020)
20. Moulos, P., Kanaris, I., Bontempi, G.: Stability of feature selection algorithms for classification in high-throughput genomics datasets. In: 13th IEEE International Conference on BioInformatics and BioEngineering, pp. 1–4 (2013)
21. Peng, L., Bian, X.W., Li, D.K., et al.: Large-scale RNA-Seq transcriptome analysis of 4043 cancers and 548 normal tissue controls across 12 TCGA cancer types. Scientific Reports, 1–18 (2015)
22. Pes, B.: Ensemble feature selection for high-dimensional data: a stability analysis across multiple domains. Neural Computing and Applications, 1–23 (2019)
23. Piliszek, R., Mnich, K., Migacz, S., et al.: MDFS: Multidimensional feature selection in R. The R Journal 11(1) (2019)
24. Robnik-Šikonja, M., Savicky, P.: CORElearn: Classification, Regression and Feature Evaluation. R package version 1.54.1 (2018), https://CRAN.R-project.org/package=CORElearn
25. Seijo-Pardo, B., Bolón-Canedo, V., Alonso-Betanzos, A.: Testing different ensemble configurations for feature selection. Neural Processing Letters (2), 32–39
30. Wenric, S., Shemirani, R.: Using supervised learning methods for gene selection in RNA-seq case-control studies. Frontiers in Genetics (4), 301–312 (2018)
31. Zhou, Y., Zhou, B., Pache, L., Chang, M., et al.: Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nature Communications 10