A summary of the prevalence of Genetic Algorithms in Bioinformatics from 2015 onwards
RESEARCH

Mekaal Swerhun*†, Jasmine Foley†, Brandon Mossop† and Vijay Mago*

*Correspondence: [email protected], Department of Computer Science, Lakehead University, 955 Oliver Rd, P7B 5E1, Thunder Bay, Canada. Full list of author information is available at the end of the article.
†Equal contributor
Abstract

Background:
In recent years, machine learning has seen an increasing presence in a large variety of fields, especially in health care and bioinformatics. More specifically, one class of machine learning algorithms that has found particularly wide application in these fields is Genetic Algorithms.
Objective:
The objective of this paper is to conduct a survey of articles published from 2015 onwards that deal with Genetic Algorithms and how they are used in bioinformatics.
Methods:
To achieve this objective, a scoping review was conducted that utilized Google Scholar alongside Publish or Perish and the Scimago Journal & Country Rank to search for reputable sources.
Results:
Upon analyzing 31 articles from the field of bioinformatics, it became apparent that genetic algorithms rarely form a full application on their own; instead, they rely on other vital algorithms such as support vector machines. Indeed, support vector machines were the most prevalent algorithms used alongside genetic algorithms (GA); however, while the usage of such algorithms contributes to the heavy focus on accuracy by GA programs, it often sidelines computation time in the process. In fact, most applications employing GAs for classification and feature selection are nearing, or at, a 100% success rate, and the focus of future GA development should be directed elsewhere.
Conclusion:
Population-based searches, like GA, are often combined with other machine learning algorithms. In this scoping review, genetic algorithms combined with Support Vector Machines were found to perform best. The performance metric evaluated most often was accuracy. Measuring the accuracy avoids measuring the main weakness of GAs, which is computational time. The future of genetic algorithms could be "open-ended" evolutionary algorithms, which attempt to increase complexity and find diverse solutions, rather than optimize a fitness function and converge to a single "best" solution from the initial population of solutions.
Keywords:
Genetic Algorithm; Bioinformatics; Machine Learning; Feature Selection; Datasets
Genetic Algorithms
Genetic Algorithms (GA) belong to the larger class of evolutionary algorithms. A GA is a parallel search heuristic inspired by Charles Darwin's theory of natural selection and modeled on the guiding principle of Survival of the Fittest [1]. The algorithm selects the fittest individuals of the population with the aim of producing offspring for the next generation that inherit the optimal characteristics of the parents. This process continues to iterate, developing sequential populations, until it converges on a generation with the fittest individuals [2]. A GA solves problems by optimizing a single criterion, known as a fitness function. The fitness function estimates importance by assigning each chromosome a value that relates to its ability to solve the problem [2, 3]. A chromosome could be an array of numbers, a binary string, or a list of instances in a database, depending on the problem. Each individual in the population represents a different possible solution. Chromosomes deemed fitter have an increased likelihood of being used in the following generation. The individuals proceed through a process of evolution governed by the principles of mutation, selection, and crossover, all of which impact the fitness value [2, 4]. The most noteworthy benefit of GA is its ability to search sophisticated and massive spaces proficiently and identify near-optimal solutions rapidly [3]. Often, in order to achieve better performance, GA-selected features are applied as input to classifiers [2].
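The loop just described can be made concrete in a few lines. The following is a minimal illustrative sketch, not taken from any surveyed paper: chromosomes are binary strings, selection is a simple two-way tournament, and the fitness function is the classic "one-max" count of 1 bits. The operator choices and parameter values are assumptions for demonstration only.

```python
import random

def genetic_algorithm(fitness, n_bits=20, pop_size=50, generations=100,
                      crossover_rate=0.9, mutation_rate=0.02, seed=0):
    """Minimal GA: evolve binary chromosomes that maximize `fitness`."""
    rng = random.Random(seed)
    # Initial population of random chromosomes (binary strings).
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(c) for c in pop]

        def select():
            # Two-way tournament: fitter individuals reproduce more often.
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return pop[a] if scores[a] >= scores[b] else pop[b]

        children = []
        while len(children) < pop_size:
            p1, p2 = select(), select()
            if rng.random() < crossover_rate:  # single-point crossover
                cut = rng.randrange(1, n_bits)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            for c in (c1, c2):  # bit-flip mutation
                for i in range(n_bits):
                    if rng.random() < mutation_rate:
                        c[i] = 1 - c[i]
                children.append(c)
        pop = children[:pop_size]
    return max(pop, key=fitness)

# "One-max" fitness: the number of 1 bits; the GA drives it toward all ones.
best = genetic_algorithm(sum)
```

After a hundred generations the population has all but converged on the all-ones chromosome, illustrating how selection pressure plus crossover and mutation steer the search.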
Popularity of Genetic Algorithms in Biomedical Applications
While the properties accredited to GAs make them desirable to a variety of fields, their use in biomedical applications is far-ranging and well-established, as shall be made evident in this article. In the medical field, GA-based solutions have been proposed for a variety of problems, including symptom and ailment classification [3, 4, 5], visualization [6], and the identification and diagnosis of diseases [2, 7]. GA-based solutions have also increasingly been used at the molecular level in tasks such as handling and predicting transposon-derived piRNAs [8]. Yet the importance of GA-based solutions in the medical field is not limited to problems at the microscopic scale, as applications have been developed to handle larger-scale infrastructure and logistics that can be vital for entire health care systems [9, 10]. Among the most frequent uses of GAs, however, is their role in feature selection, where they help to narrow down the possible features so that a complementary algorithm can achieve far greater performance [7, 11, 12, 13, 14]. At times, a GA-based solution may involve the GA filling multiple of the above-mentioned roles, such as finding usage in both feature selection and classification. Of course, other GA applications beyond what has already been mentioned exist; however, the applications mentioned here show just how important GA has become in the biomedical field, and these are the most common uses found in the papers surveyed in this article.
Key Findings of the Survey
While conducting research, a few key points were discerned that frequently appeared in the papers selected for this survey. These are summarized below.
• Applications often use GA alongside other machine learning algorithms, most commonly classification algorithms.
• Among classification engines used in conjunction with GA, Support Vector Machines (SVM) are the top performers.
• Accuracy is one of the prime evaluation metrics focused on, while computation time is often ignored or under-performing for usage in live biomedical situations.
• In general, applications employing GAs for classification and feature selection are reaching close-to-perfect, and at times even perfect, results.
Structure of the Paper
The following sections of this survey article are organized as follows: Section 2 focuses on the thirty-one papers surveyed for this article. This section first discusses the methodology explaining how the papers were selected before discussing the biomedical issues the papers investigate. Section 2 concludes with a discussion of the common data sets and tools used within the papers. In Section 3, the focus is on how the researchers evaluate their studies, with the various performance metrics used being examined and explained to discern the advantages and disadvantages of prioritizing one metric over another. Next, in Section 4, this article briefly discusses the future of GA. The final section concisely concludes the findings of this survey article.
Paper Selection
The searching procedure proposed in this survey aims to outline a simple yet effective sequence of operations to identify and select high-quality manuscripts published in journals. Utilizing Google Scholar and/or Publish or Perish [15], the first step was to establish the date range of the journals published, starting with 2015 and proceeding onward. This survey focuses on the applications of GA, which yields a wide range of possibilities. Therefore, in order to narrow the scope, additional key search terms were needed. In step two, additional key terms, such as biomedical/medicine and machine learning, were used alongside the main search term. Once a paper was identified, it was added to a list of prospective sources. The quality of the paper was examined in step three by utilizing the Scimago Journal & Country Rank (SJR) [16] to assess the quality of the journal where the paper had been published. Papers published in journals ranked Q2, Q3, or Q4 were immediately removed from the list, and papers published in journals ranked Q1 at the time of publication were kept. Once a paper met the quality criterion for its journal ranking, step four checked whether the GA had a dominant role or was used as a key element in the paper. If the paper had neither, it was removed from the list. Papers in which GA served a dominant role or was used as a key element were kept, further analyzed, and contributed to this survey. Therefore, each paper had to meet all of the above requirements to be selected. The whole process is illustrated as a flowchart in Figure 1. As a result of this searching methodology, a total of 31 papers were selected for this survey; they can be found in Table 1.
Applications of GA in Bioinformatics
Using the searching procedure described above, Table 1 provides a summary containing key information on the papers selected for this survey. In addition, Table 3 shows the extent to which results could be replicated to obtain findings similar to those of the papers studied in this survey. Yet Table 3 also serves to highlight a concerning issue, as it shows how few papers provide the information necessary for others to reproduce their results. All chosen papers discuss possible and proposed biomedical applications, are limited to SJR Q1 rankings, and date from 2015 or later. Key findings included in Table 1 in addition to the biomedical application were examined, the use of GA was noted, and the benefits of the proposed application were identified. Nineteen of the 31 papers surveyed mention the GA playing a key role in feature selection. Feature selection is a data pre-processing technique that reduces the overall number of features by eliminating redundant ones [1]. The task of feature selection is to extract those features that are deemed the most informative and important in predicting the outcome for an individual [2]. This technique is an essential step in reducing the dimensionality of the search space and the computational complexity. Alongside feature selection, GAs are commonly used in classification programs. About half of the papers surveyed, 16 out of 31, use a form of classification. Classification aims to predict outcomes associated with a particular individual given a feature vector describing that individual. GA provides an efficient and robust feature selection algorithm that speeds up the learning process of classifiers and stabilizes the classification accuracy. Within bioinformatics, feature selection and classification both serve vital roles and can often be found within the same program, with the GA selecting features that are then used by a separate algorithm to assign a label, which may be a diagnosis of a general disease or even the identification of symptoms. In recent years, GA-based applications have developed to not only identify ailments but also recommend what treatments should be used to combat an ailment appearing in different patients [5].
GA has also been utilized in non-standard implementations, such as running multiple GAs in parallel [11] or nesting them inside one another as in [7], which has allowed for the diagnosis and identification of different cancer biomarkers. Indeed, non-standard implementations have even allowed for a hybrid GA-based application that can determine the person who would receive the highest quality-of-life improvement from a lung transplant, helping to ensure that any unforeseen bias does not affect the transplant [12]. Additionally, GAs have been used for imaging and visualizing applications, both due to their importance in feature selection and their ability to combine representations of learned information, such as known shapes and relative position, into a single framework that can be used in three-dimensional segmentation [17]. Finally, GAs have been employed to handle logistics, both in managing complex hospital supply chains [9] and in optimizing ambulance dispatches to non-emergency situations [10]. It can therefore be easily seen that bioinformatics research entails many problems that can be solved using machine learning, and that GA is well-suited for such tasks. Yet it is important that research conducted in this area be highly accurate, efficient, and reliable in order for the results to be meaningful. Solutions need to be prompt and able to withstand the volatile situations found in this field, especially since they are becoming prevalent in nearly every aspect of bioinformatics.
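As a rough illustration of the wrapper pattern described above, the sketch below evolves a binary feature mask whose fitness is the in-sample accuracy of a stand-in nearest-centroid classifier, used here in place of the SVMs and other classifiers in the surveyed papers, minus a small penalty per retained feature. The toy dataset, parameter values, and scoring choices are all hypothetical, for demonstration only.

```python
import random

rng = random.Random(1)

# Toy dataset: 8 features, but only features 0 and 3 carry the label signal.
def make_row(label):
    row = [rng.gauss(0, 1) for _ in range(8)]
    row[0] += 3 * label
    row[3] -= 3 * label
    return row

labels = [rng.randint(0, 1) for _ in range(60)]
data = [make_row(y) for y in labels]

def score(mask):
    """Wrapper fitness: nearest-centroid accuracy on the selected features,
    minus a small penalty per feature (a stand-in for a real classifier)."""
    sel = [j for j, b in enumerate(mask) if b]
    if not sel:
        return 0.0
    cent = {}
    for c in (0, 1):
        rows = [r for r, y in zip(data, labels) if y == c]
        cent[c] = [sum(r[j] for r in rows) / len(rows) for j in sel]
    def predict(r):
        return min((sum((r[j] - cent[c][k]) ** 2 for k, j in enumerate(sel)), c)
                   for c in (0, 1))[1]
    acc = sum(predict(r) == y for r, y in zip(data, labels)) / len(data)
    return acc - 0.02 * len(sel)  # penalize large feature subsets

# Generational GA over binary feature masks.
pop = [[rng.randint(0, 1) for _ in range(8)] for _ in range(30)]
for _ in range(40):
    scored = sorted(pop, key=score, reverse=True)
    pop = [m[:] for m in scored[:6]]  # elitism: keep the best masks
    while len(pop) < 30:
        p1, p2 = rng.sample(scored[:15], 2)
        cut = rng.randrange(1, 8)
        child = p1[:cut] + p2[cut:]     # single-point crossover
        if rng.random() < 0.3:          # occasionally flip one bit
            i = rng.randrange(8)
            child[i] = 1 - child[i]
        pop.append(child)

best_mask = max(pop, key=score)
```

Because only the two informative columns separate the classes, the evolved mask keeps at least one of them while the penalty term prunes the noise features, which is precisely the dimensionality reduction role the surveyed papers assign to the GA.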
Datasets
In order to learn more about how the papers selected for this survey came to their conclusions, a closer look was given to the data used and its sources. Out of the 31 surveyed papers, not a single one used the exact same raw data. Three general patterns emerge from the diversity of datasets.
The most common method of data acquisition in the 31 papers was accessing digital repositories to find datasets relevant to the topic of the paper. These repositories act as a tool, compiling datasets that are available to the public and therefore allowing researchers to focus on their project immediately rather than having to conduct a multitude of tests just to acquire data for testing. Some examples of repositories seen in the surveyed papers are as follows.
• The UCSC Genome Browser, used by both Li et al. (2016) and Tangherloni et al. (2019), provides access to assembled genomes, including the human genome [8, 18].
• The Gene Expression Omnibus, used by Sayed et al. (2019), provides more specialized data related to genomics and is itself part of the National Center for Biotechnology Information data resources [7].
• The Protein Data Bank, used by Moraes et al. (2017), provides data relating to a wide selection of proteins and related components [19].
Besides acquiring data from public repositories, another method of data acquisition employed by some of the surveyed papers was requesting access to data that is generally kept private. Among the sources for this type of data, private databases curated by institutions were the most common. It is important to note that not all required a paper's authors to be a member of the institution, as is the case in Oztekin et al. (2018), who accessed their data from the United Network for Organ Sharing [12]. In addition, some data sources originate from entities whose primary concern was not data curation, but who could grant access to records of their regular functions. One instance of such data collection can be seen in the work of Fogue et al. (2016), who received their data from an ambulance company based in Huesca, Spain [10].
The final method of data acquisition was used by only a minority of the papers surveyed: creation of the data by the project members [20]. This final method, although necessary in cases where the needed data is not available, does not ensure an unbiased result and consumes significant time for properly compiling the information. Indeed, it would appear that, due to these downsides, this method of data acquisition is far from favoured.
Despite the prevalence of acquiring data from pre-existing sources, the raw data acquired often has to go through preprocessing before it is used. What this entails can differ widely depending on the source of the data and its intended purpose; however, most commonly the goal is to narrow down the raw data into a set deemed usable for the project. Such a process may be necessary because, in some cases, a) the raw dataset does not have enough records, b) not all records are complete, or c) records are not usable (too much noise) [13]. A summary of the datasets used by the 31 surveyed papers and their sources can be found in Table 2.
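The narrowing step described above can be as simple as filtering out records that trip conditions b) or c). The sketch below is purely illustrative, and its field names (`expr`, `label`, `noise`) are hypothetical rather than drawn from any surveyed dataset.

```python
def preprocess(records, required_fields, max_noise=0.2):
    """Keep only records that are complete and whose upstream noise
    estimate is acceptable. Thresholds and fields are hypothetical."""
    usable = []
    for rec in records:
        if any(rec.get(f) is None for f in required_fields):
            continue  # condition b): incomplete record
        if rec.get("noise", 0.0) > max_noise:
            continue  # condition c): too noisy to use
        usable.append(rec)
    return usable

raw = [
    {"id": 1, "expr": 0.8, "label": "A", "noise": 0.05},
    {"id": 2, "expr": None, "label": "B", "noise": 0.01},  # incomplete
    {"id": 3, "expr": 0.4, "label": "A", "noise": 0.35},   # noisy
]
clean = preprocess(raw, ["expr", "label"])
```

Only the first record survives the filter; in practice this step shrinks the raw dataset to the usable subset before the GA ever sees it.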
Tools
In addition to looking at what datasets the surveyed papers use, this paper takes a look at the tools and additional machine learning algorithms employed alongside the GA, although a few papers rely solely on GA. Indeed, when looking at the surveyed papers, it would appear that GA-focused solutions benefit the most when they are supported by complementary tools and algorithms. The use of components is much like that of the datasets mentioned above, in that a wide variety was used across studies to achieve the goal of each particular paper. However, unlike the datasets, a few tools and additional machine learning algorithms were employed across multiple papers fairly regularly. The full selection of tools and machine learning algorithms employed has been compiled in Table 4.
Amongst the 31 surveyed articles, two tools proved to be the most prevalent. The first of these is MATLAB, which is used in [6, 9, 11, 21, 17, 22, 23, 13]. The second tool is Weka, which sees usage in [24, 2, 25, 21]. MATLAB is a fairly well-known and important tool in studies such as signal processing, data analytics, image processing, and machine learning, partially due to its versatility. In fact, even though all the surveyed papers have a focus on GAs, the way that MATLAB is utilized varies from paper to paper. For instance, Soufan et al. (2015) make only limited use of MATLAB, to ensure fairness when evaluating programs [11]. Pławiak (2018) uses MATLAB alongside the library LIBSVM to implement their study [13].
Weka is a more specialized tool that provides an environment for classification, regression, clustering, and feature selection. It accomplishes this by aiding its users in the extraction of information and helping them find suitable algorithms for creating accurate predictive models with that information [26]. Although Weka has a far smaller toolbox, it can be ideal for researchers working in bioinformatics due to its focus. Indeed, both of these tools have proven beneficial for a number of the surveyed articles, as shown by Hashem et al. (2017), who use both tools to perform algorithms such as Particle Swarm Optimization [21].
Throughout the surveyed articles, additional machine learning algorithms are often used alongside the GA, where they predominantly serve as classification algorithms. The goal of such algorithms is to successfully predict the correct outcome associated with a particular occurrence after having received a selection of features that describe the occurrence [26]. A vast number of these algorithms are used in the articles surveyed, including different types of Neural Networks (NN), as seen in Table 4; however, the most common is the Support Vector Machine (SVM). SVMs are frequently used in biomedical applications, and this survey shows that the addition of GA does not change this fact. One of the biggest appeals of SVMs is their near-perfect success rate and their perceived simplicity of simply assigning labels to objects based on which side of a hyperplane they end up on [27]. Computation requirements for the SVM scale quadratically, resulting in longer run times as data inputs increase [27]. This in itself is not necessarily a current negative; however, as applications become more complex, the SVM's quadratic run-time growth should not be ignored in future works employing it alongside GA.
Performance Metrics
A key step in the process of building a machine learning model is to estimate its performance on data that was not part of building the model. The data used to evaluate the performance of the model is called the testing set, while the data used to build the model is called the training set. A primary concern for any machine learning prediction model is avoiding a model with either high bias or high variance. Bias is the error resulting from a wrong assumption. A model with high bias oversimplifies; this is also known as underfitting. It results in a large error between the test set outcome value and the model prediction. Variance is the error from the model being overly sensitive to fluctuations in the training set. High variance can cause an algorithm to model the noise in the data, which results in model overfitting. High variance decreases flexibility and reduces the ability of the model to generalize to unseen instances. A visualization of the trade-offs made between bias and variance can be seen in Figure 2.
The confusion matrix is a key concept related to the performance metrics of a classifier model. The confusion matrix is simply a square matrix that records the counts of the true positive (TP), true negative (TN), false positive (FP), and false negative (FN) predictions of a classifier. The true positive rate (TPR) is calculated as the number of true positives divided by the sum of the true positives and the false negatives,
TPR = TP / (TP + FN)   (1)
The false positive rate (FPR) is calculated as the number of false positives divided by the sum of the false positives and the true negatives,
FPR = FP / (FP + TN)   (2)
One dimension of the confusion matrix represents the instances in a predicted class, while the other dimension represents the instances in the actual class (ground truth). If the predicted class is the same as the ground truth, then the confusion matrix records the sample as a true prediction, otherwise as a false one [28]. The precision is defined as the ratio of the true positives to the sum of the true positives and the false positives,
Precision = TP / (TP + FP)   (3)
The recall is defined as the ratio of the true positives to the sum of the true positives and the false negatives,
Recall = TP / (TP + FN)   (4)
The F score is defined as two divided by the sum of the inverse of the recall and the inverse of the precision,
F = 2 / (recall^-1 + precision^-1)   (5)
Receiver Operating Characteristic (ROC) graphs are useful tools to select models for classification based on performance with respect to the false positive rate (FPR) and true positive rate (TPR), which are computed by shifting the decision threshold of the classifier. The diagonal of an ROC graph represents random guessing (50 percent probability of being correct), and classification models that fall below this line are considered worse than random guessing. A perfect classifier would fall into the top left corner of the graph, with a TPR of 1 and an FPR of 0. Based on the ROC curve, the area under the curve can be computed to characterize the performance of the classification model [28].
The prediction error and accuracy provide general information regarding the performance of the prediction model. The error can be understood as the sum of the false predictions divided by the total number of predictions,
Error = (FP + FN) / (TP + TN + FP + FN)   (6)
The accuracy is calculated as the sum of the correct predictions divided by the total number of predictions. More precisely, accuracy is the ratio of the number of correct predictions (the sum of the true positives and true negatives) to the total number of predictions from the model (the sum of the true positives, true negatives, false positives, and false negatives),
Accuracy = (TP + TN) / (TP + TN + FP + FN)   (7)
There are many methods to evaluate the performance of a model. Each performance metric has certain advantages and disadvantages based on the data, such as the number of classes in the prediction variable, the number of instances of each class or how imbalanced the outcome class happens to be, and the cost of misclassifying a prediction. In medicine, misclassification can be deadly. The discussion of advantages and disadvantages will focus on accuracy, as it was the most common performance metric. Some attention will also be paid to the true positive rate and false positive rate, as they offer more nuanced metrics, especially in relation to biomedical applications. The metrics used by each surveyed paper can be found in Table 5.
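Equations (1) through (7) can all be computed directly from the four confusion-matrix counts, as in the short sketch below; the example counts are invented purely for illustration.

```python
def metrics(tp, tn, fp, fn):
    """Compute the performance metrics of equations (1)-(7) from the
    confusion-matrix counts of a binary classifier."""
    total = tp + tn + fp + fn
    tpr = tp / (tp + fn)                        # (1) true positive rate
    fpr = fp / (fp + tn)                        # (2) false positive rate
    precision = tp / (tp + fp)                  # (3)
    recall = tpr                                # (4) same quantity as TPR
    f_score = 2 / (1 / recall + 1 / precision)  # (5) harmonic mean
    error = (fp + fn) / total                   # (6)
    accuracy = (tp + tn) / total                # (7)
    return {"TPR": tpr, "FPR": fpr, "precision": precision,
            "recall": recall, "F": f_score, "error": error,
            "accuracy": accuracy}

# Hypothetical confusion matrix: 40 TP, 45 TN, 5 FP, 10 FN.
m = metrics(tp=40, tn=45, fp=5, fn=10)
```

For these counts the accuracy is 0.85 and the error 0.15, while the TPR of 0.8 makes visible the ten positive cases the classifier missed, detail that the accuracy alone hides.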
Advantages
Accuracy is a simple performance metric to compute and the most intuitive evaluation method. It is the most common metric, so it is often used for comparison with other models in the literature.
The true positive rate and false positive rate are especially useful for imbalanced class problems. For example, in tumour diagnosis, the detection of malignant tumours is the primary concern, since missing the potential presence of a tumour could have serious implications, like death. However, it is also important to decrease the number of benign tumours that are incorrectly classified as malignant (false positives) so as not to unnecessarily concern a patient. The true positive rate provides useful information about the fraction of positive (or relevant) samples that were correctly identified out of the total number of positives. In medicine, the samples tend to be imbalanced, so the true positive rate and false positive rate will often be the most appropriate performance metrics.
An ROC graph is a useful tool to visualize the true positive rate and false positive rate. Finding the area under the curve is a simple method to determine the performance of the model.
Disadvantages
Accuracy was the primary performance metric used in this scoping review. However, it has some limitations that are important to consider, especially in the medical domain. It is only a reliable performance metric when the number of samples is equal for each class (no imbalance). For example, consider a case where 99 percent of samples belong to class A and only 1 percent to class B. Then it is trivial for the model to obtain 99 percent accuracy by simply predicting every instance to belong to class A. If the identical model is evaluated on a different test set, the accuracy can be significantly reduced. For example, if the test set has 60 percent of its samples from class A and 40 percent from class B, then the accuracy would plummet to 60 percent. This example illuminates the potential for the accuracy metric to be misleading, which can lead to assuming the model is better than it really is. In the medical field, the price of misclassifying a sample has the potential to be extremely costly. If the model is attempting to predict a rare but fatal disease, the cost of failing to diagnose the disease in a sick person is much greater than the cost of sending a healthy person for more tests.
The papers mostly failed to evaluate a major drawback of GA, which is the amount of computation it requires. In traditional machine learning, such as neural networks, the model improves as the amount of training data increases. However, the performance of a GA might degrade before it improves. GAs also keep a population of solutions, instead of a single solution. These requirements of GA are computationally costly and should be evaluated as a performance metric whenever considering a genetic algorithm as a learning algorithm [14].
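The 99/1 scenario above is easy to reproduce: a model that always predicts the majority class scores 99 percent accuracy while detecting none of the rare-class cases.

```python
# Majority-class baseline on an imbalanced test set:
# 99 samples of class "A" (healthy), 1 sample of class "B" (sick).
y_true = ["A"] * 99 + ["B"]
y_pred = ["A"] * 100  # model always predicts the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall for the rare class "B": fraction of actual B's detected.
tp = sum(t == p == "B" for t, p in zip(y_true, y_pred))
recall_b = tp / y_true.count("B")

# accuracy == 0.99, yet recall_b == 0.0: every sick patient is missed.
```

The same model evaluated on a 60/40 split would drop to 60 percent accuracy, which is exactly the instability the text describes.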
The Future of Genetic Algorithms
Some of the founders of computer science, such as Alan Turing, John von Neumann, and Norbert Wiener, were motivated by the idea of providing computer programs with operations like self-replication and adaptation [14]. These motivations have been explored in various areas of research, such as evolution strategies, evolutionary programming, and genetic algorithms. These efforts grew into the field known as evolutionary computation, of which GAs are the most prominent example.
GAs are a powerful tool for solving problems and for simulating natural systems in a wide range of scientific fields, and they are promising approaches for solving challenging technological problems. GAs are an important area of research in machine learning, especially working together with other approaches such as neural networks. GAs are part of a movement in computer science that explores biologically-inspired approaches to computation. These systems are adaptable, parallel, able to handle complexity, able to learn, and even creative [14]. Furthermore, the computing resources that are now widely available, allowing for unprecedented parallel processing, are well-suited to implementing GAs.
GAs attempt to model natural evolution, using operators such as adaptation, selection, crossover, and mutation. This approach retains a population of solutions that converges on the objective, which is a form of black-box optimization. However, natural evolution is a process that ceaselessly creates greater complexity and novelty, rather than a process that converges on a single solution. In fact, evolution on Earth can be thought of as a single run of a single algorithm that invented all of nature [29]. Another term for the notion of a single process inventing massive complexity for near-eternity is "open-ended." Open-endedness has proven impossible to program; presently, no algorithm exists that has the endless, prolific creative potential of natural evolution.
Currently, most evolutionary algorithms (EAs) converge to a solution based on the fitness function that is chosen. The fitness function, which tends to select the "best"-performing individuals in the population of solutions, acts as an objective that is optimized. The optimization consists of selecting more of the fitter solutions on average, while only selecting a minority of other, less fit solutions to maintain some diversity. However, the divergence of natural evolution and its "open-endedness" are not captured by this approach. Natural evolution is not structured like an optimization algorithm, as there is no explicit objective, and organisms are often rewarded for being different rather than just better. For example, organisms that are sufficiently different from their predecessors can establish a new niche in which they benefit from reduced competition and are therefore more likely to survive [30, 31]. In opposition to optimization algorithms that converge to a single "best" solution, natural evolution has a tendency toward divergence. This suggests an alternative perspective in evolutionary computation: that evolution is an algorithm for diversification rather than optimization [32].
An EA inspired by this perspective is novelty search (NS), which searches for behavioural diversity without any explicit objective. In some domains, NS finds the global optimum even when objective-based searches consistently fail [32]. An algorithm that avoids an objective function is able to find solutions that are not reachable by attempting to solve for them directly with objectives.
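A toy sketch of the novelty-search idea follows: individuals are selected for how far their behaviour lies from behaviours already seen (stored in an archive), with no objective fitness anywhere in the loop. The genome, the behaviour descriptor, and every parameter here are invented for illustration and are far simpler than NS implementations in the literature.

```python
import random

rng = random.Random(2)

def behavior(genome):
    """Behaviour descriptor: collapse the genome to a 2-D point."""
    return (sum(genome[::2]), sum(genome[1::2]))

def novelty(b, archive, k=5):
    """Mean distance to the k nearest behaviours seen so far."""
    if not archive:
        return float("inf")
    dists = sorted(((b[0] - a[0]) ** 2 + (b[1] - a[1]) ** 2) ** 0.5
                   for a in archive)
    return sum(dists[:k]) / min(k, len(dists))

pop = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(20)]
archive = []
for _ in range(30):
    # Rank by novelty, not by any objective fitness function.
    scored = sorted(pop, key=lambda g: novelty(behavior(g), archive),
                    reverse=True)
    archive.extend(behavior(g) for g in scored[:3])  # remember what was seen
    # Reproduce the most novel individuals with small Gaussian mutations.
    pop = [[x + rng.gauss(0, 0.1) for x in g] for g in scored[:10] for _ in (0, 1)]
```

The archive grows with every generation, so the only way to score well is to behave unlike anything recorded before, which is the divergent pressure the text contrasts with fitness-driven convergence.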
This insight has implications beyond GA, such as in the pursuit of "human-level" AI, since it captures what many consider our most human-like quality: creativity.
A potentially fruitful application for open-ended evolutionary algorithms is any sort of creative design. This includes the design of cars, art, medicines, robots, video games, and so on. Open-ended evolutionary algorithms offer the potential to generate endless alternatives in almost any conceivable design domain, in the same way that natural evolution generated endless solutions to the problems of survival and reproduction in nature [29].
There are many potential biomedical applications for open-ended evolutionary algorithms. One would be the development of vaccines. The open-ended algorithm could search the space of possibilities while simultaneously finding solutions that work in each environment. Provided some initial set of rules that describe what is possible biologically, the algorithm could continuously explore this space of possibilities and report any number of potentially useful findings to researchers to investigate further.
Conclusion
Population-based searches like GAs are often combined with other machine learning algorithms. In classification problems, a GA maintains a population of solutions, rather than a single solution. In this scoping review, GAs combined with Support Vector Machines were found to perform best. The performance metric evaluated most often was accuracy, which avoids measuring the main weakness of GA: computational time. In an attempt to better utilize the power of GAs, the future of GAs could be "open-ended" evolutionary algorithms, which attempt to increase complexity and find diverse solutions, rather than optimize a fitness function to find a single "best" solution. This approach attempts to model the most powerful feature of natural evolution, its endless ability to create novel and creative solutions to fit an environment that is constantly changing.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
The first three authors contributed equally to the development of this research article. The last author provided supervision and guidance.
Acknowledgements
The authors would like to acknowledge the infrastructure support provided by the CASES Building at Lakehead University.
References
1. Lu, H., Chen, J., Yan, K., Jin, Q., Xue, Y., Gao, Z.: A hybrid feature selection algorithm for gene expression data classification. Neurocomputing, 56–62 (2017)
2. Aličković, E., Subasi, A.: Breast cancer diagnosis using GA feature selection and rotation forest. Neural Computing and Applications (4), 753–763 (2017)
3. Salem, H., Attiya, G., El-Fishawy, N.: Classification of human cancer diseases by gene expression profiles. Applied Soft Computing, 124–134 (2017)
4. Subasi, A., Kevric, J., Canbaz, M.A.: Epileptic seizure detection using hybrid machine learning methods. Neural Computing and Applications (1), 317–325 (2019)
5. Zhang, P., West, N.P., Chen, P.-Y., Thang, M.W., Price, G., Cripps, A.W., Cox, A.J.: Selection of microbial biomarkers with genetic algorithm and principal component analysis. BMC Bioinformatics (6), 413 (2019)
6. Mohammed, M.A., Ghani, M.K.A., Arunkumar, N., Hamed, R.I., Abdullah, M.K., Burhanuddin, M.: A real time computer aided object detection of nasopharyngeal carcinoma using genetic algorithm and artificial neural network based on Haar feature fear. Future Generation Computer Systems, 539–547 (2018)
7. Sayed, S., Nassef, M., Badr, A., Farag, I.: A nested genetic algorithm for feature selection in high-dimensional cancer microarray datasets. Expert Systems with Applications, 233–243 (2019)
8. Li, D., Luo, L., Zhang, W., Liu, F., Luo, F.: A genetic algorithm-based weighted ensemble method for predicting transposon-derived piRNAs. BMC Bioinformatics (1), 329 (2016)
9. Khanduzi, R., Sangaiah, A.K.: A fast genetic algorithm for a critical protection problem in biomedical supply chain networks. Applied Soft Computing, 162–179 (2019)
10. Fogue, M., Sanguesa, J.A., Naranjo, F., Gallardo, J., Garrido, P., Martinez, F.J.: Non-emergency patient transport services planning through genetic algorithms. Expert Systems with Applications, 262–271 (2016)
11. Soufan, O., Kleftogiannis, D., Kalnis, P., Bajic, V.B.: DWFS: a wrapper feature selection tool based on a parallel genetic algorithm. PLoS ONE (2) (2015)
12. Oztekin, A., Al-Ebbini, L., Sevkli, Z., Delen, D.: A decision analytic approach to predicting quality of life for lung transplant recipients: A hybrid genetic algorithms-based methodology. European Journal of Operational Research (2), 639–651 (2018)
13. Pławiak, P.: Novel methodology of cardiac health recognition based on ECG signals and evolutionary-neural system. Expert Systems with Applications, 334–349 (2018)
14. Mitchell, M.: An Introduction to Genetic Algorithms. MIT Press (1998)
15. Publish or Perish. https://harzing.com/resources/publish-or-perish
16. Scimago Journal & Country Rank.
17. Ghosh, P., Mitchell, M., Tanyi, J.A., Hung, A.Y.: Incorporating priors for medical image segmentation using a genetic algorithm. Neurocomputing, 181–194 (2016)
18. Tangherloni, A., Spolaor, S., Rundo, L., Nobile, M.S., Cazzaniga, P., Mauri, G., Liò, P., Merelli, I., Besozzi, D.: GenHap: a novel computational method based on genetic algorithms for haplotype assembly. BMC Bioinformatics (4), 172 (2019)
19. Moraes, J.P., Pappa, G.L., Pires, D.E., Izidoro, S.C.: GASS-WEB: a web server for identifying enzyme active sites based on genetic algorithms. Nucleic Acids Research (W1), 315–319 (2017)
20. Liu, P., El Basha, M.D., Li, Y., Xiao, Y., Sanelli, P.C., Fang, R.: Deep evolutionary networks with expedited genetic algorithms for medical image denoising. Medical Image Analysis, 306–315 (2019)
21. Hashem, S., Esmat, G., Elakel, W., Habashy, S., Raouf, S.A., Elhefnawi, M., Eladawy, M.I., ElHefnawi, M.: Comparison of machine learning approaches for prediction of advanced liver fibrosis in chronic hepatitis C patients. IEEE/ACM Transactions on Computational Biology and Bioinformatics (3), 861–868 (2017)
22. Hemanth, D.J., Anitha, J.: Modified genetic algorithm approaches for classification of abnormal magnetic resonance brain tumour images. Applied Soft Computing, 21–28 (2019)
23. Tan, M.S., Tan, J.W., Chang, S.-W., Yap, H.J., Kareem, S.A., Zain, R.B.: A genetic programming approach to oral cancer prognosis. PeerJ, 2482 (2016)
24. Al-Rajab, M., Lu, J., Xu, Q.: Examining applying high performance genetic data feature selection and classification algorithms for colon cancer diagnosis. Computer Methods and Programs in Biomedicine, 11–24 (2017)
25. Gangavarapu, T., Patil, N.: A novel filter–wrapper hybrid greedy ensemble approach optimized using the genetic algorithm to reduce the dimensionality of high-dimensional biomedical datasets. Applied Soft Computing, 105538 (2019)
26. Frank, E., Hall, M., Trigg, L., Holmes, G., Witten, I.H.: Data mining in bioinformatics using Weka. Bioinformatics (15), 2479–2481 (2004)
27. Noble, W.S.: What is a support vector machine? Nature Biotechnology (12), 1565–1567 (2006)
28. Chawla, N.V.: Data mining for imbalanced datasets: An overview. In: Data Mining and Knowledge Discovery Handbook, pp. 875–886. Springer (2009)
29. Lehman, J., Stanley, K.O.: Abandoning objectives: Evolution through the search for novelty alone. Evolutionary Computation (2), 189–223 (2011)
30. Kirschner, M., Gerhart, J.: Evolvability. Proceedings of the National Academy of Sciences (15), 8420–8427 (1998)
31. Lehman, J., Stanley, K.O.: Evolvability is inevitable: Increasing evolvability without the pressure to adapt. PLoS ONE (4) (2013)
32. Pugh, J.K., Soros, L.B., Stanley, K.O.: Quality diversity: A new frontier for evolutionary computation. Frontiers in Robotics and AI, 40 (2016)
33. Ramadan, E., Naef, A., Ahmed, M.: Protein complexes predictions within protein interaction networks using genetic algorithms. BMC Bioinformatics (7), 269 (2016)
34. Lee, N.K., Li, X., Wang, D.: A comprehensive survey on genetic algorithms for DNA motif prediction. Information Sciences, 25–43 (2018)
35. Corus, D., Oliveto, P.S.: Standard steady state genetic algorithms can hillclimb faster than mutation-only evolutionary algorithms. IEEE Transactions on Evolutionary Computation (5), 720–732 (2017)
36. Ansótegui, C., Malitsky, Y., Samulowitz, H., Sellmann, M., Tierney, K.: Model-based genetic algorithms for algorithm configuration. In: Twenty-Fourth International Joint Conference on Artificial Intelligence (2015)
37. Bhardwaj, A., Tiwari, A., Krishna, R., Varma, V.: A novel genetic programming approach for epileptic seizure detection. Computer Methods and Programs in Biomedicine, 2–18 (2016)
38. Tan, C.H., Tan, M.S., Chang, S.W., Yap, K.S., Yap, H.J., Wong, S.Y.: Genetic algorithm fuzzy logic for medical knowledge-based pattern classification. Journal of Engineering Science and Technology, 242–258 (2018)
39. Dashtban, M., Balafar, M.: Gene selection for microarray cancer classification using a new evolutionary method employing artificial intelligence concepts. Genomics (2), 91–107 (2017)
40. La Cava, W., Silva, S., Danai, K., Spector, L., Vanneschi, L., Moore, J.H.: Multidimensional genetic programming for multiclass classification. Swarm and Evolutionary Computation, 260–272 (2019)
41. Devarriya, D., Gulati, C., Mansharamani, V., Sakalle, A., Bhardwaj, A.: Unbalanced breast cancer data classification using novel fitness functions in genetic programming. Expert Systems with Applications, 112866 (2020)

Figure 1 Finding Quality Papers
Search criteria used to identify papers for the article.
Figure 2 Bias Variance Trade-Offs
Visualization of underfitting and overfitting.
Table 1: Information about the Quality of the Papers. For each article: SJR rank; cites per year; year of publication; how the GA is used; benefits; biomedical applications.

[3] Q1; 23; 2017. GA use: Genetic programming used for cancer disease classification. Benefits: The IG/GA method improves classification accuracy by reducing the number of features and preventing the GA from being trapped by a local optimum. Application: Cancer classification.

[24] Q1; 6.67; 2017. GA use: GA for feature selection; combining GA and PSO for feature selection; classification using GP. Benefits: By selecting fewer genes, the classification algorithm takes less computational time; GA/DT and GA/GP yield the highest classification accuracy. Application: Colon cancer.

[1] Q1; 28.33; 2017. GA use: An Adaptive Genetic Algorithm (AGA) improves the conventional GA by adjusting the values of crossover and mutation probability; the adaptability increases robustness, increasing the chance of finding optimal solutions. Benefits: Combining MIM (Mutual Information Maximization) with the AGA eliminates redundant samples and reduces the dimension of the gene expression data. Application: General applications to biomedical datasets.

[2] Q1; 27.67; 2017. GA use: GA feature selection: extraction of information and significant features. Benefits: Reduces computational complexity and speeds up the data mining process; GA for feature selection combined with Rotation Forest resulted in the highest classification accuracy. Application: Breast cancer diagnosis.

[25] Q1; 2.00; 2019. GA use: GA optimizes the subspace ensembling process. Benefits: Optimizing with the GA outperforms selected base feature selection techniques in terms of prediction accuracy. Application: General applications to biomedical datasets.

[6] Q1; 18.50; 2018. GA use: Machine learning approaches based on the GA for feature selection. Benefits: Reduces overlap between classes, and reduces the number of features to enhance the time cost. Application: Visualizing border points for resection of nasopharyngeal carcinoma.

[9] Q1; 2.00; 2019. GA use: The GA results in high-quality solutions (accuracy and execution time). Benefits: GA-FBC (Fast Branch Cut Method) provides efficient solutions with regard to performance metrics. Application: Biomedical supply chain networks.

[4] Q1; 37.00; 2019. GA use: GA used to determine optimum parameters of the SVM. Benefits: Combining the GA with an SVM offers quick global optimizing ability. Application: Classification of EEG data for epileptic seizure detection.

[11] Q1; 13; 2015. GA use: Feature selection tool developed based on a GA. Benefits: Able to significantly reduce the number of features without sacrificing classification performance. Application: Feature selection for biomedical data.

[8] Q1; 9.25; 2016. GA use: Uses a GA-based weighted ensemble method to predict transposon-derived piRNAs. Benefits: Higher performance and robustness compared to similar methods. Application: Prediction of piRNAs.

[18] Q1; 8.00; 2019. GA use: GAs with tournament selection and elitism. Benefits: Speeds up the required computations, and can take into account datasets produced by third-generation sequencing technologies. Application: Helps solve the haplotyping problem.

[19] Q1; 1.33; 2017. GA use: The GA performs the search of the generated database. Benefits: A freely available method, through a web app, that ranks among the top (4th). Application: Identification of enzyme active sites allowing for non-exact matches.

[5] Q1; 1.00; 2019. GA use: GA used to find a subset of the principal components from a principal component analysis. Benefits: Use of principal component analysis before the GA improves the results of GA selection. Application: Helps identify what treatments should be done for different patients.

[33] Q1; 3.50; 2016. GA use: GA used to identify complexes in protein interaction networks. Benefits: The method allows for identifying clusterings with varying densities; it is more scalable and robust, and it can be tuned. Application: Used to detect dense and sparse protein clusters.

[7] Q1; 11.00; 2019. GA use: Uses two GAs: the outer GA serves as the main algorithm and outputs the subset of genes evaluated by an SVM; the inner GA takes data from DNA methylation and outputs a subset of CpG sites. Benefits: Far higher accuracy compared to other methods, and has been shown to be able to differentiate between lung cancer subtypes. Application: Identification of disease (cancer) biomarkers.

[34] Q1; 4.00; 2018. GA use: Compares the performance of multiple GAs. Benefits: N/A. Application: Guidelines for the development of GA-based solutions for DNA motif prediction.

[12] Q1; 17; 2018. GA use: GA used in feature selection while predicting quality of life. Benefits: The study included all UNOS features (after preprocessing), allowing for their effect to be assessed. Application: Minimize or eliminate personal bias in lung transplants by automation, helping to increase the rate of successful lung transplants.

[35] Q1; 10.67; 2018. GA use: Proves the benefits of crossover in genetic algorithms. Benefits: Established that a GA with crossover is 25 percent faster than mutation alone, with certain parameters. Application: N/A.

[21] Q1; 2.67; 2018. GA use: Finding the best features to predict advanced fibrosis. Benefits: The GA is able to work in parallel. Application: Predicting advanced fibrosis.

[36] Q1; 10.20; 2015. GA use: Automatic algorithm configuration. Benefits: Numerical results show that model-based genetic algorithms significantly improve the ability to effectively configure algorithms automatically. Application: N/A.

[17] Q1; 10.00; 2016. GA use: GA for combining representations of learned information, such as known shapes, regional properties, and relative positions of objects, into a single framework to perform automated three-dimensional segmentation. Benefits: GA-based methods are very useful for medical imaging applications. Application: GA tested for prostate segmentation on pelvic computed tomography and magnetic resonance images.

[22] Q1; 5.00; 2018. GA use: Three different modified genetic algorithm approaches are proposed for feature selection. Benefits: The number of features is reduced, decreasing the dimensionality of the features. Application: Magnetic resonance brain image classification.

[37] Q1; 8.25; 2016. GA use: Classification. Benefits: Proposes a constructive genetic programming approach that increases the number of useful "building blocks". Application: Classifying EEG signals.

[23] Q1; 1.50; 2016. GA use: Feature selection. Benefits: Compared against support vector machines and logistic regression and performed better. Application: Recognition of cancerous cells and gene expression profiling data.
Table 4: Tools Used. For each article: tools; additional ML algorithms utilized / validation.

[3] Tools: does not specify. Validation: 10-fold cross-validation. Classification algorithm: Genetic Programming (GP).

[24] Tools: Weka machine learning package. Validation: leave-one-out cross-validation (LOOCV), k-fold cross-validation. Classification algorithms: Decision Tree, Naive Bayes, Support Vector Machine, Genetic Programming.

[1] Tools: does not specify. Validation: multiple cross-validations. Classification algorithms: Back Propagation Neural Network (BP), Support Vector Machine (SVM), Extreme Learning Machine (ELM), Regularized Extreme Learning Machine (RELM).

[2] Tools: Weka employed to implement algorithms. Validation: 10-fold cross-validation. Classification algorithms: Rotation Forest model, Logistic Regression, Bayesian Network, Multilayer Perceptron (MLP), Radial Basis Function Networks (RBFN), Support Vector Machine (SVM), C4.5 Decision Tree, Random Forest, Rotation Forest.

[25] Tools: all experiments coded in Python 2.7 and Weka 3.8.3 (to implement all the predetermined feature selection methods); the Python scikit-learn package implemented all the classifiers. Validation: 10-fold cross-validation. Classification algorithms: Random Forests, Bootstrap Aggregating with C4.5 Decision Trees, K-Nearest Neighbour.

[6] Tools: MATLAB 2014a utilized for the evaluation of the present approach. Validation: cross-validation. Classification algorithms: Artificial Neural Networks.

[9] Tools: all approaches in the study coded using MATLAB. Additional algorithms: N/A.

[4] Tools: does not specify. Validation: 10-fold cross-validation. Classification algorithm: Support Vector Machine.

[11] Tools: PGAPack software libraries, K-Nearest Neighbour from the AlgLib library, MATLAB R2012b. Classification algorithms: K-Nearest Neighbour, Naive Bayes, and a combination of the two.

[8] Tools: random forest classification engine from the scikit-learn Python package. Validation: 10-fold cross-validation; their weighted ensemble method is constructed using training data. Classification algorithms: Random Forest, Support Vector Machine.

[18] Tools: Message Passing Interface specifications in C++, Roche/454 genome sequencer, PacBio RS II sequencer, General Error-Model based SIMulator toolbox. Additional algorithms: N/A.

[19] Tools: Flask framework for Python; frontend developed using the Bootstrap framework; runs on top of an Apache server with communication made using a Web Server Gateway Interface. Additional algorithms: N/A.

[5] Tools: sequence analysis pipelines such as DADA2, PEAR software v0.9.6, BWA software package v0.7.12, and the stats package in R. Validation: 5-fold cross-validation. Classification algorithm: Logistic Regression.

[33] Tools: GO term finder. Additional algorithms: spectral clustering.

[7] Tools: the biomaRt, GenomicRanges, minfi, and IlluminaHumanMethylation27kanno.ilmn12.hg19 R packages; SVM method from the e1071 package; Gene Ontology; Kyoto Encyclopedia of Genes and Genomes. Validation: 5-fold cross-validation; deep-learning neural network. Classification algorithm: Support Vector Machine.

[34] Tools: local search techniques, Gibbs sampling, expectation maximization; additional non-GA methods/tools mentioned but not shown to be tested (list in the supplementary materials PDF). GA motif discovery methods: PCEA, GAPWM, kmerGA, GAMI, FGMA, Paul and Iba, GADEM, GA-DPAF, GASMEN, MDGA, GALF (GALF-P), GALF-G, GAME, GEMFA, GAPK, iGAPK.

[12] Tools: does not specify. Validation: 5-fold cross-validation; random undersampling. Classification algorithms: k-Nearest Neighbour, Support Vector Machine (SVM), Artificial Neural Network (ANN).

[35] Tools: the ONEMAX benchmark function. Additional algorithms: N/A.

[21] Tools: MedCalc, MATLAB, Weka. Additional algorithms: implemented several types of machine learning techniques for comparison: particle swarm optimization, multi-linear regression, decision tree learning algorithms.

[36] Tools: Comparing Continuous Optimizers (COCO) software. Classification algorithm: Random Trees.

[17] Tools: in preprocessing, the images were improved with the "imadjust" function in MATLAB. Additional algorithms: N/A.

[22] Tools: implemented in MATLAB. Additional algorithms: Neural Network.

[37] Tools: N/A. Additional algorithms: N/A.

[23] Tools: GPLAB, a genetic programming toolbox that runs in the MATLAB environment. Classification algorithms: Support Vector Machine, Logistic Regression.

[38] Tools: N/A. Additional algorithms: Fuzzy Logic.

[39] Tools: does not specify. Validation: LOOCV and 10-fold cross-validation. Classifiers: KNN, Support Vector Machine, Naive Bayes. Filter methods: Laplacian score, Fisher score.

[20] Tools: GA progress is processed on the TensorFlow platform with GeForce GTX TITAN GPUs. Additional algorithms: Convolutional Neural Networks.

[10] Tools: Google Maps API. Additional algorithms: N/A.

[13] Tools: MATLAB R2014b, libsvm library for MATLAB. Validation: 4-fold and 10-fold cross-validation. Classification algorithms: Support Vector Machine, K-Nearest Neighbour, Probabilistic Neural Network, Radial Basis Function Neural Network.

[40] Tools: PyTorch. Additional algorithms: Neural Network, Decision Tree.

[41] Tools: Python packages. Additional algorithms: none.

Table 5: Performance Evaluation. For each article, the metrics reported (drawn from: accuracy; ROC curve; AUC; TP; TN; FP; FN; specificity; sensitivity/recall; precision/PPV; F-measure; average runtime; computational complexity; other).

[3] Accuracy; TP; TN; FP; FN; specificity; sensitivity/recall; computational complexity.
[24] Accuracy; average runtime; computational complexity.
[1] Accuracy.
[2] Accuracy; ROC curve; AUC; TP; FP; F-measure; computational complexity.
[25] Accuracy; computational complexity. Other: feature importance, chi-square test.
[6] Accuracy; ROC curve; TP; FP; specificity; sensitivity/recall.
[9] Accuracy; sensitivity/recall; average runtime.
[4] Accuracy; TP; TN; FP; FN; specificity; sensitivity/recall. Other: fitness classification accuracy.
[11] TP; TN; FP; FN; specificity; sensitivity/recall; precision/PPV; F-measure; average runtime. Other: stability, G-mean.
[8] Accuracy; ROC curve; AUC; TP; TN; FP; FN; specificity; sensitivity/recall.
[18] Accuracy; average runtime. Other: convergence rate for average best fitness.
[19] Accuracy; average runtime.
[5] ROC curve; AUC.
[33] TP; TN; FP; sensitivity/recall; precision/PPV; F-measure. Other: discard ratio.
[7] Accuracy; TP; TN; FP; FN.
[34] TP; FP; FN; sensitivity/recall; precision/PPV; F-measure.
[12] Accuracy; TP; TN; FP; FN; specificity; sensitivity/recall; precision/PPV; F-measure. Other: G-mean.
[35] Average runtime.
[21] Accuracy; ROC curve; AUC; TP; TN; FP; FN; specificity; sensitivity/recall.
[36] Accuracy.
[17] Other: Dice similarity.
[22] Accuracy; ROC curve; AUC; TP; specificity; sensitivity/recall.
[37] Accuracy; ROC curve; AUC; TP; specificity; sensitivity/recall.
[23] Accuracy; ROC curve; AUC.
[38] Accuracy; ROC curve; AUC.
[39] Accuracy; average runtime; computational complexity. Other: Laplacian score, Fisher score.
[20] Accuracy.
[10] Other: ambulance usage, patient waiting time.
[13] Accuracy; TP; TN; FP; FN; specificity; sensitivity/recall; average runtime; computational complexity. Other: sum of errors, k-coefficient, acceptance feature coefficient.
[40] Accuracy; ROC curve; AUC; TP; TN; FP; FN; specificity; sensitivity/recall; precision/PPV; F-measure.
[41] Accuracy; ROC curve; AUC; TP; TN; FP; FN; specificity; sensitivity/recall; precision/PPV; F-measure.
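Most of the metrics tallied in Table 5 derive from the four confusion-matrix counts (TP, TN, FP, FN). The relationships can be sketched as follows; the function name is illustrative.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Derive the common metrics of Table 5 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # recall / true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value (PPV)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision,
            "f_measure": f_measure}

# e.g. 80 true positives, 90 true negatives, 10 false positives,
# 20 false negatives -> accuracy (80 + 90) / 200 = 0.85
m = confusion_metrics(tp=80, tn=90, fp=10, fn=20)
```

This makes concrete why accuracy alone can be misleading on the imbalanced datasets common in biomedical classification [28]: a classifier can score high accuracy while sensitivity on the rare positive class remains poor.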