Tuning a Multiple Classifier System for Side Effect Discovery using Genetic Algorithms
Jenna M. Reps, Uwe Aickelin and Jonathan M. Garibaldi
Abstract — In previous work, a novel supervised framework implementing a binary classifier was presented that obtained excellent results for side effect discovery. Interestingly, unique side effects were identified when different binary classifiers were used within the framework, prompting the investigation of applying a multiple classifier system. In this paper we investigate tuning a multiple classifier system for side effect discovery using genetic algorithms. The results of this research show that the novel framework implementing a multiple classifier system trained using genetic algorithms can obtain a higher partial area under the receiver operating characteristic curve than implementing a single classifier. Furthermore, the framework is able to detect side effects efficiently and obtains a low false positive rate.
I. INTRODUCTION

Side effects of prescription drugs are a common occurrence that often leads to patient morbidity and mortality. When there is an association between a medical event (e.g., sickness, rash or weakness) and a drug, this is termed an adverse event (AE). When the relationship is proven to be causal (i.e., the drug causes the medical event), it is referred to as an adverse drug reaction (ADR).
As large quantities of medical data are often stored in databases, numerous methods have been presented that make use of medical databases with the aim of identifying ADRs efficiently [1], [2]. Unfortunately, the majority of these methods work by finding medical events that are highly associated with a drug; therefore, rather than detecting ADRs they detect AEs. This has led to the methods having high false positive rates [3], [4], as the majority of associations are not causal. Recent research has focused on using supervised techniques such as logistic regression [5] to reduce the impact of confounding (i.e., when a hidden variable is responsible for the association). These supervised methods aim to distinguish between associations that are causal or non-causal by finding alternative causes of the medical event. Unfortunately, this requires generating a large number of regression models and also requires additional knowledge of possible confounders (e.g., other possible causes of the medical event). Consequently, these methods are often slow and dependent on current knowledge. Alternatively, a recent framework, the side effect classifier (SEC), has been proposed that applies a single supervised classifier to identify ADRs efficiently [6], and the results suggest this framework is less susceptible to confounding.
Jenna M. Reps, Uwe Aickelin and Jonathan M. Garibaldi are with the School of Computer Science, The University of Nottingham, UK (email: {jenna.reps, uwe.aickelin, jonathan.garibaldi}@nottingham.ac.uk).

II. BACKGROUND
A. Genetic Algorithms
Genetic algorithms are probabilistic search procedures inspired by the natural process of evolution [8]. The algorithm is an iterative process that starts with a randomly generated population of candidate solutions, which are then evolved. Each candidate solution has a set of genotypes (e.g., parameter values), and this set of genotypes determines the candidate solution's fitness. During each iteration, a new generation of candidate solutions is created by recombination and mutation of the previous candidate solutions' genotypes based on their fitness.

Fig. 1. The schema of a multiple classifier system.

B. Multiple Classifier System
The term ensemble is used to describe a composition of multiple classifiers. A type of ensemble that consists of a composition of various different classifiers has frequently been termed a multiple classifier system [9] rather than an ensemble. This is to help distinguish between a combination of the same classifier trained with different perspectives (e.g., combining decision trees that are trained using different independent variables) and a combination of different classifiers (e.g., combining an SVM, a random forest, a neural network and a logistic regression model). Fig. 1 illustrates a multiple classifier system that combines the output of multiple single classifiers to generate a single output. The aim of a multiple classifier system is to take advantage of diversity between classifiers to improve the classification accuracy while maintaining efficiency. Multiple classifier systems have been successfully implemented in numerous machine learning tasks, including diagnosing melanoma [10], classifying breast lesions [11] and detecting naked bodies in images [12]. In these examples, combining multiple classifiers under a suitable weighting scheme was shown to improve performance compared to a single classifier.
As the classifiers used to identify ADRs within the SEC framework appear to be diverse, implementing a multiple classifier system that combines all the classifiers may improve the detection of ADRs.
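Concretely, such a weighted combination can be sketched as follows; the confidence values, weights and threshold below are illustrative, not those learned in this paper:

```python
import numpy as np

def combine(confidences, weights, alpha=0.5):
    """Weighted multiple classifier system: each base classifier reports a
    confidence in [0, 1] that the example is an ADR; the weighted sum is
    compared against the threshold alpha to give the final class."""
    score = float(np.dot(weights, confidences))
    return 1 if score >= alpha else -1

# Illustrative confidences from five diverse classifiers for one example
conf = np.array([0.8, 0.6, 0.4, 0.7, 0.3])
w = np.array([0.3, 0.2, 0.1, 0.25, 0.15])  # hypothetical weights
print(combine(conf, w))  # weighted score 0.62 >= 0.5, so prints 1
```

The weighting scheme lets a strong classifier dominate without discarding the votes of the weaker ones.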
C. Previous Pharmacovigilance
Pharmacovigilance is the study of prescription drug side effects. One important part of pharmacovigilance is the process of detecting drug side effects after the drugs have been approved and marketed. Identifying drug side effects is a difficult task, as the majority of side effects depend on multiple factors, so it is common for some side effects to be observed rarely. Clinical trials are unable to identify the majority of side effects prior to marketing because they involve only a small number of patients and are conducted under unrealistic conditions [13]. For example, patients involved in clinical trials are unlikely to take other drugs during the trial, so drug interactions cannot be analysed.

Fig. 2. An example of an SRS database entity relationship diagram.

In general, the most widely implemented pharmacovigilance techniques have been developed for a specific type of medical database known as the spontaneous reporting system (SRS) databases [14]. These databases consist of all the reports made by medical staff or the general public relating to a suspected ADR. The general design of the SRS databases is illustrated in Fig. 2. The SRS databases contain natural links between drugs and medical events, see Fig. 3. Sometimes additional information about the patient is included in the report, such as age and gender, but this is not compulsory. The techniques for detecting ADRs look for medical events linked disproportionally more often to the drug than expected [15]. Unfortunately, because reporting is voluntary, many ADRs may not be reported, and it is possible that some rare ADRs may never be noticed. This under-reporting can prevent the early detection of ADRs, meaning patients are put at risk for longer. In addition, there are known data quality issues such as missing, duplicated or incorrect data [16].

Due to the limitations associated with the SRS databases, recent work has focused on using different types of medical databases [17]. One example is the longitudinal healthcare databases. These databases contain medical information about patients, often spanning many years, and it is common for them to contain records for millions of patients. As this type of database does not rely on voluntary reporting, it presents a unique perspective for signalling ADRs. However, it has been shown to suffer from different limitations.
The main limitation is that there are no clear links between drugs and medical events within the data itself, so potential links are inferred by finding the medical events that occur shortly after the drug in time. This is illustrated in Fig. 5. Unfortunately, the majority of the drug and medical event pairs linked by time are associated but do not correspond to ADRs, and it has proved difficult for unsupervised algorithms to distinguish between the non-causal and causal relationships.

Fig. 5. An example of inferring a link between a drug and medical event within a longitudinal healthcare database. The medical events are represented by circles and the drugs by squares. The potential acute ADRs are the medical events observed during a time window centred around the prescription.

Fig. 3. Illustration of how the reports in the SRS database contain direct links between drugs and medical events. Each report within the database consists of an observation of a patient taking a drug and then experiencing the medical event sometime after.

In [18], the authors presented a semi-supervised algorithm that requires a user to input a drug of interest and then returns a ranked list of medical events. The higher a medical event is ranked by the algorithm, the more likely that medical event corresponds to a rare ADR of the specified drug of interest. The algorithm generated the data by extracting attributes that are insightful for ADR detection from a longitudinal healthcare database and determined labels for some medical events by mining online medical websites. The labelled and unlabelled data were then used to cluster similar medical events into either an ADR cluster, an indicator (a cause of taking the drug) cluster or a noise cluster. Medical events assigned to the noise cluster were filtered, and the remaining medical events were ranked based on how often they occurred after the drug divided by how often they occurred before the drug, multiplied by a cluster dependent weight.

Fig. 4. An example of a longitudinal healthcare database entity relationship diagram.

The success of the semi-supervised algorithm then prompted the idea of generating causal inference based attributes for a selection of drug-medical event pairs that are definitively known ADRs or non-ADRs [19] and using these data to train a classifier that can then be used to predict new ADRs. One such supervised framework generated attributes based on the counterfactual theory of causality [20], whereas another framework, SEC, generated attributes based on the Bradford Hill causality criteria [7]. Rather than mining online forums for the known ADR and non-ADR labels, both frameworks used an online resource that contains lists of ADRs that were mined from drug packaging. Both supervised frameworks demonstrated excellent performance, and previous results suggest supervised techniques may help improve current pharmacovigilance.
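The ranking statistic described above can be sketched as follows; the function name and the guard against division by zero are assumptions made for illustration:

```python
def rank_score(n_after, n_before, cluster_weight):
    """Rank a medical event for a drug of interest: occurrences after the
    drug divided by occurrences before the drug, multiplied by a cluster
    dependent weight. max(..., 1) guards against division by zero for
    events never observed before the prescription."""
    return cluster_weight * n_after / max(n_before, 1)

# An event seen 10 times after and twice before the drug,
# with a hypothetical ADR-cluster weight of 1.0
print(rank_score(10, 2, 1.0))  # -> 5.0
```

Events that occur far more often after the prescription than before it, and that do not fall in the noise cluster, float to the top of the ranked list.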
1) SEC Framework:
The previously presented SECframework is a supervised algorithm for detecting ADRs.The algorithm automates the technique of inferring causalityvia the Bradford Hill causality criteria, as this technique iscommonly applied to assess whether a side effect is causedby a drug or not. The SEC framework requires three steps.The first step is data generation where suitable labelleddata are extracted for each drug-medical event pair thatrepresent a possible acute ADR. The second step is traininga binary classifier using the labelled data to classify eachdrug-medical event pair as an ADR or non-ADR, and thefinal step is applying the trained classifier to new unlabelleddata.
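At a high level, the three steps might be sketched as below; the random attributes, the labels and the nearest-centroid stand-in classifier are placeholders for the Bradford Hill attributes and the user-chosen classifier:

```python
import numpy as np

# Step 1 (stand-in): attribute vectors x_i and labels y_i in {-1, +1}
rng = np.random.default_rng(0)
X = rng.random((100, 5))            # placeholder for Bradford Hill attributes
y = np.where(X[:, 0] > 0.5, 1, -1)  # placeholder ADR / non-ADR labels

# Step 2: train a binary classifier (a trivial nearest-centroid stand-in)
centroid_adr = X[y == 1].mean(axis=0)
centroid_non = X[y == -1].mean(axis=0)

def classify(x):
    """Return +1 (ADR) or -1 (non-ADR) for a new attribute vector."""
    d_adr = np.linalg.norm(x - centroid_adr)
    d_non = np.linalg.norm(x - centroid_non)
    return 1 if d_adr < d_non else -1

# Step 3: apply the trained classifier to a new drug-medical event pair
print(classify(rng.random(5)))  # -> +1 or -1
```

In the actual framework any classifier can fill the role of `classify`, and it is trained with ten-fold cross validation as described below.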
Step 1) Data generation
As we are interested in detecting acutely occurring ADRs, we find the drug-medical event pairs that are possible ADRs by investigating the medical events that occur within a month of a drug being prescribed. To train a binary classifier we need a set of attribute vectors x_i ∈ R^n and their corresponding classes y_i ∈ {−1, 1}. In the SEC framework, each data point corresponds to a drug-medical event pair of interest, where the i-th drug-medical event pair has the attribute vector x_i and class y_i. Therefore, to generate the training data, the first step is to identify the drug-medical event pairs of interest, the second step is to determine their labels and the final step is to calculate their attributes.
To identify the drug-medical event pairs of interest, we restrict our attention to a set of specified drugs, denoted by D. For each drug d_i ∈ D, we use temporal relationships to identify the risk medical events of d_i (RME_{d_i}). The risk medical events of d_i are the medical events that were observed during the month after a prescription of d_i for one or more patients, RME_{d_i} = {medical events | the medical event occurs within a month of d_i for one or more patients}. The drug-medical event pairs of interest are all the possible combinations of d-e, where d ∈ D and e ∈ RME_d. The labels of the drug-medical event pairs of interest are then determined. For the i-th drug-medical event pair, if the medical event is labelled as a known side effect of the drug within the online drug resource known as SIDER [21], then the pair is labelled as an ADR (y_i = 1). Alternatively, if the medical event cannot possibly correspond to an acute ADR (e.g., the medical event is 'cancer', 'menopause' or 'death of family member'), the drug-medical event pair is labelled as a non-ADR (y_i = −1).
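The labelling rules above can be sketched as follows; the drug and event names, and the dictionary-based SIDER lookup, are illustrative assumptions:

```python
def label_pair(drug, event, sider_adrs, impossible_acute_events):
    """Label a drug-medical event pair: +1 if SIDER lists the event as a
    known ADR of the drug, -1 if the event clearly cannot be an acute ADR,
    and None otherwise (pairs with no definitive label are excluded
    from the training data)."""
    if event in sider_adrs.get(drug, set()):
        return 1
    if event in impossible_acute_events:
        return -1
    return None

# Illustrative lookup tables
sider = {"nifedipine": {"headache", "flushing"}}
impossible = {"cancer", "menopause", "death of family member"}
print(label_pair("nifedipine", "headache", sider, impossible))  # -> 1
print(label_pair("nifedipine", "cancer", sider, impossible))    # -> -1
```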
Any drug-medical event pair neither listed on SIDER as corresponding to a known ADR nor clearly a non-ADR is ignored, as the pair has no definitive label.
For the i-th drug-medical event pair labelled as an ADR or non-ADR, we calculate the Bradford Hill causality criteria based attributes, described in [6], and denote the vector consisting of these attributes by x_i. The attributes are derived from a selection of the Bradford Hill causality criteria:
• Association strength: How strong the association between the drug and medical event is.
• Temporality: Does the drug precede the medical event or the other way around?
• Specificity: How specific the medical event is, or how similar the patients experiencing the medical event are.
• Biological gradient: Measures whether the probability of the medical event increases as the drug dosage increases.
• Experimentation: Does the medical event start and stop when the drug starts and stops?
In summary, for the i-th labelled drug-medical event pair we have (x_i, y_i), where x_i contains the Bradford Hill causality attributes, y_i = 1 when the i-th drug-medical event pair is a known ADR and y_i = −1 when the i-th drug-medical event pair is a known non-ADR. The complete set of labelled data is denoted by X, where X = {(x_i, y_i)}.

Step 2) Training a binary classifier
The labelled data are then used to train a binary classifier (the choice of classifier is determined by the user, as any classifier can be used within the framework),

f : X → Y ; f(x_i) → {−1, 1}    (1)

where f(x_i) = −1 means the drug-medical event pair is classified as a non-ADR and f(x_i) = 1 means the drug-medical event pair is classified as an ADR. The chosen classifier is trained using ten-fold cross validation to reduce overfitting. In previous work [6], the random forest classifier was found to perform better than a support vector machine, a logistic regression and a naive Bayes classifier.

Step 3) Applying trained classifier
The trained classifier is then applied to the attribute vector x* for a new drug-medical event pair, and the prediction f(x*) is returned.
For evaluating the framework, the labelled data are partitioned into training/testing data and validation data. The training/testing data are used to train the classifier, and the validation data are used to evaluate the performance of the trained classifier by comparing the predicted class with the true class.

III. MATERIALS
The THIN database contains temporal medical data for over 11 million patients (approximately 4 million currently active patients). The data are anonymised, so each patient is represented by a unique patient ID rather than the patient's real name. There are three main tables within the THIN database: the patient table, the medical table and the therapy table, see Figs. 6-8. The patient table contains personal information about each patient in the database, including their year of birth, their gender and their date of registration. The therapy table contains timestamped records of each patient's drug prescription history, so each record includes the patient ID, the date of the prescription and information about the prescription (drug details and dose details). The medical table is similar to the therapy table but contains timestamped records of each patient's medical event history (i.e., illnesses, diseases, laboratory tests and administrative events), so a typical record contains the patient ID, the date of the medical event and the medical event information, recorded via the READ codes.

Fig. 6. The patient table within the THIN database.

Fig. 7. The medical table within the THIN database.

Fig. 8. The therapy table within the THIN database.

Each READ code consists of five elements from an alphanumeric alphabet (including the characters a-z, A-Z and '.') and the codes have a hierarchical structure. The depth of a node within the tree is the length of the minimum path from the node to the root. Unfortunately, the READ codes have redundancies, and the same medical event can be represented by various distinct READ codes. This can cause issues for data miners; however, the SEC algorithm generates attributes specifically to prevent this issue having a negative effect on its ability to detect ADRs.

IV. METHODOLOGY
In this paper we develop a multiple classifier system to be implemented within the SEC framework and compare its ability to detect side effects with that of the framework implementing a single classifier. Therefore, in this section the methods used to analyse the single classifier framework and the multiple classifier system framework are both described.
To evaluate each framework, we determine all the labelled drug-medical event pairs corresponding to the drugs: nifedipine, amlodipine, felodipine, nicardipine, verapamil, ciprofloxacin, ofloxacin, norfloxacin, nalidixic acid, moxifloxacin, fluconazole, itraconazole, posaconazole, voriconazole, ibuprofen, fenoprofen, ketoprofen, celecoxib, flurbiprofen, nabumetone, naproxen, budesonide, beclometasone, hydrocortisone and prednisolone. These labelled data are composed of the Bradford Hill causality criteria derived attributes for each drug-medical event data point and a label specifying whether the drug-medical event data point is listed as an ADR on SIDER or is one of the manually selected non-ADRs.
The drug-medical event data points with known labels corresponding to the chosen drugs were partitioned into training/testing data X_T (80% of the labelled data) and validation data X_V (20% of the labelled data). The training/testing data were used to train the classifier or multiple classifier system, and the validation data were used to evaluate the framework implementing the single classifier or multiple classifier system.
The measure used to determine the effectiveness of each framework is the area under the receiver operating characteristic curve. This measure corresponds to the probability of a drug-medical event pair known to be an ADR being assigned a higher confidence of being within the ADR class by the framework than a drug-medical event pair known to be a non-ADR [22].
In particular, we restrict our attention to a partial area, as we are only interested in the section of the curve where few drug-medical event pairs are classed as side effects [23]. When many drug-medical event pairs are classed as ADRs, there are likely to be many non-ADR pairs incorrectly classed as ADRs, and this is undesirable. The partial area under the curve that we are interested in is denoted by pAUC_[0.9,1], and a more detailed explanation of how the measure is calculated can be found in Section IV-C.
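A sketch of how such a partial AUC can be computed from classifier scores is shown below; the interpolation grid and the helper name are implementation choices, not taken from the paper:

```python
import numpy as np

def pauc_high_specificity(y_true, scores, spec_range=(0.9, 1.0)):
    """Partial AUC over the high-specificity region of the ROC curve,
    i.e. false positive rate in [1 - spec_range[1], 1 - spec_range[0]]."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    y = np.asarray(y_true)[order]
    pos, neg = (y == 1).sum(), (y == -1).sum()
    tpr = np.concatenate(([0.0], np.cumsum(y == 1) / pos))   # sensitivity
    fpr = np.concatenate(([0.0], np.cumsum(y == -1) / neg))  # 1 - specificity
    lo, hi = 1 - spec_range[1], 1 - spec_range[0]
    grid = np.linspace(lo, hi, 101)          # FPR grid over [0, 0.1]
    t = np.interp(grid, fpr, tpr)            # ROC curve sampled on the grid
    # Trapezoid rule over the retained segment
    return float(np.sum((t[1:] + t[:-1]) / 2 * np.diff(grid)))

# Perfectly separated scores give approximately the maximum area (0.1)
# over the FPR range [0, 0.1]
y = [1, 1, -1, -1]
s = [0.9, 0.8, 0.2, 0.1]
print(pauc_high_specificity(y, s))
```

The unnormalised maximum over this range is 0.1; reported pAUC values are typically rescaled so that a perfect classifier scores 1.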
To analyse the single classifier framework, the SEC framework implementing either a random forest, support vector machine, logistic regression, naive Bayes or k-nearest neighbours classifier is trained using ten-fold cross validation on the training/testing data X_T. The trained classifier is denoted by f : R^n → {−1, 1}, where f(x_i) = −1 represents the i-th drug-medical event pair being classified as a non-ADR and f(x_i) = 1 represents the i-th drug-medical event pair being classified as an ADR.

B. SEC Framework: Ensemble Classifier
The multiple classifier system framework requires training multiple classifiers and learning the optimal weighted combination of the classifiers. In this framework, after the training data are generated, the data are first used to train various classifiers and then used to determine a weighted combination of all the classifiers.
1) Training the classifiers:
Five classifiers (random forest, support vector machine, logistic regression, naive Bayes and k-nearest neighbours) are trained via ten-fold cross validation to determine the optimal parameters that maximise the partial area of interest under the curve (pAUC_[0.9,1], see Section IV-C) using the training/testing set X_T. Each classifier is trained using a grid search over suitable parameter values; the grid ranges and the chosen parameter values are listed in Table I.
For each trained classifier f_i, we can also extract the classifier's confidence that the drug-medical event pair is in the ADR class; this is denoted by c_i : R^n → [0, 1]. So c_i(x_j) is the confidence of the i-th classifier that the j-th drug-medical event pair is an ADR.
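Given five confidence functions of this kind, the combination weights can be searched with a genetic algorithm. The sketch below is heavily scaled down and simplified: the population (50) and generation count (100) are smaller than the paper's 1000 and 500, the fitness is plain training accuracy rather than the cross-validated pAUC, and the confidence matrix and labels are synthetic:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative confidences: rows = 5 classifiers, cols = 200 training pairs
C = rng.random((5, 200))
y = np.where(C.mean(axis=0) > 0.5, 1, -1)  # synthetic stand-in labels
ALPHA = 0.5                                 # natural threshold

def fitness(w):
    """Simplified fitness: training accuracy of the weighted combination."""
    pred = np.where(w @ C >= ALPHA, 1, -1)
    return (pred == y).mean()

pop = rng.random((50, 5))                   # weights initialised in [0, 1]
for _ in range(100):
    fit = np.array([fitness(w) for w in pop])
    elite = pop[np.argmax(fit)].copy()      # elitism: keep the best
    # Fitness-proportional selection of parents
    p = (fit + 1e-9) / (fit + 1e-9).sum()
    parents = pop[rng.choice(len(pop), size=len(pop), p=p)]
    # Arithmetic crossover between consecutive parents
    lam = rng.random((len(pop), 5))
    pop = lam * parents + (1 - lam) * np.roll(parents, 1, axis=0)
    # Uniform random mutation with rate 0.1
    mask = rng.random(pop.shape) < 0.1
    pop[mask] = rng.random(mask.sum())
    pop[0] = elite

best = pop[np.argmax([fitness(w) for w in pop])]
```

Swapping the fitness function for a cross-validated pAUC estimate recovers the setup described next.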
2) Determining the weights:
Using these confidence functions, genetic algorithms are applied to find the optimal weights β_i, i ∈ [1, 5], for the multiple classifier system, which determines the class of the j-th drug-medical event pair by

f(x_j) = 1 if Σ_i β_i c_i(x_j) ≥ α, and f(x_j) = −1 otherwise,    (2)

where α ∈ (0, 1) is the natural threshold, which controls the stringency of the multiple classifier system.
The weights are determined by implementing a genetic algorithm with a mutation rate of 0.1 and applying elitism with a candidate population size of 1000 until convergence; see Table II for full details. The fitness of each weight vector β is the ten-fold cross validation average of the partial AUC over the specificity range [0.9,1] for the multiple classifier system based on that weight scheme on the training/testing set. The optimal weight vector found was

β = (β_1, β_2, β_3, β_4, β_5),    (3)

where c_1 is the random forest, c_2 the support vector machine, c_3 the k-nearest neighbours, c_4 the logistic regression and c_5 the naive Bayes confidence function.

C. Evaluation
The framework implementing a single trained classifier or the multiple classifier system is then applied to the validation set, and the prediction of each data point in the validation set is compared with the truth. The numbers of true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN) are calculated as follows:

TP : |{i | y_i = f(x_i) = 1}|
FP : |{i | y_i = −1, f(x_i) = 1}|
FN : |{i | y_i = 1, f(x_i) = −1}|
TN : |{i | y_i = f(x_i) = −1}|

Using the above values, the accuracy, precision, sensitivity and specificity can be calculated:

Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
Accuracy = (TP + TN) / (TP + FP + FN + TN)
Precision = TP / (TP + FP)    (4)

The receiver operating characteristic (ROC) curve is generated by plotting the sensitivity against 1 minus the specificity, and the AUC is the area under this curve. The AUC measures the general ability of a classifier rather than only considering how well it does at its natural threshold, and is a fairer measure for comparing different classifiers. The pAUC_[0.9,1] is the partial area under the ROC curve between the specificity values of 0.9 and 1; this value is useful as we are interested in the classifier's ability when the specificity is high and the number of false positives is low.

V. RESULTS & DISCUSSION
The results are presented in Table III, and ROC plots for the framework implementing the range of classifiers or the multiple classifier system can be seen in Fig. 9. An optimal value for α (the multiple classifier system's natural threshold) was also selected during tuning. It can be seen that the framework implementing a multiple classifier system obtained a superior accuracy, sensitivity and pAUC_[0.9,1] compared to the framework implementing any single classifier. However, using a bootstrap test to compare the pAUC_[0.9,1]s [24] of the random forest and the multiple classifier system at a 5% significance level, the pAUC_[0.9,1] was not shown to be significantly different (p-value = 0.499). The highest precision and specificity values were obtained by the framework implementing a support vector machine and not the multiple classifier system. This is probably due to the multiple classifier system being optimised specifically for the partial AUC. If the precision or specificity were deemed to be more important, different weights could be calculated by the genetic algorithm to optimise the multiple classifier system for the desired measure (e.g., precision or specificity).

The ensemble weights do not necessarily reflect the importance of the classifiers within the ensemble, as each classifier has a varying range for its confidence function values. It may be useful to normalise the confidence function values prior to determining the optimal ensemble weights. If the classifier confidence values were normalised, then the ensemble weights would correspond to the importance of the classifiers, and this would help indicate which of the classifiers was most influential within the ensemble. This knowledge could be used to remove classifiers that had little influence.

The advantage of ensemble approaches over relying on any individual classifier is that they generally reduce the classifier's variance. This is useful for ADR detection, as the training set is likely to change and grow as new ADRs are discovered. An ensemble approach for ADR detection is also useful, as previous results have shown that each classifier tends to make different mistakes, so the ensemble can overcome an individual classifier's misclassifications. This is the likely reason why the ensemble obtained an improved performance. However, the disadvantage is that the ensemble is computationally more expensive due to the requirement of training multiple classifiers and then tuning the ensemble weights. Although the multiple classifier system improved the accuracy and pAUC_[0.9,1] compared to each single classifier, the improvement was not significant. This may suggest that when the training data are sufficiently large to enable good performance from a single classifier, the small benefit in performance of the ensemble is not enough to overcome the extra cost of complexity. It would be interesting to investigate how the ensemble performs relative to each individual classifier at various training set sizes.

TABLE I
THE DIFFERENT CLASSIFIERS USED BY THE MULTIPLE CLASSIFIER SYSTEM AND THEIR OPTIMAL PARAMETERS.

Classifier                           | Parameters (grid search range)          | Optimal parameters
f1: Random Forest                    | mtry: [1,30]                            | mtry = 11
f2: Support Vector Machine (Radial)  | sigma: (0,1], C: (0,10]                 | sigma = 0.0978, C = 6.1624
f3: K-Nearest Neighbours             | K: [1,100]                              | K = 17
f4: Logistic Regression              | decay: [0,10]                           | decay = 0
f5: Naive Bayes                      | fL: [0,1], usekernel: {TRUE, FALSE}     | fL = 0, usekernel = TRUE

TABLE II
THE GENETIC ALGORITHM PARAMETERS.

Population size: 1000
Crossover type: local arithmetic crossover
Mutation type: uniform random mutation
Elitism used: true
Selection criteria: fitness proportional selection with fitness linear scaling
Initialisation: uniformly chosen from [0,1]
Stopping criteria: after 500 iterations

TABLE III
THE RESULTS OF THE SEC FRAMEWORK IMPLEMENTING A SINGLE CLASSIFIER OR MULTIPLE CLASSIFIER SYSTEM FOR THE VALIDATION SET.

Framework classifier            | Accuracy | Precision | Sensitivity | Specificity | pAUC_[0.9,1]
f1: Random Forest               | 0.930    | 0.789     | 0.380       | 0.989       | 0.769
f2: Support Vector Machine      | 0.921    |           |             |             |
f3: K-Nearest Neighbours        | 0.917    | 0.729     | 0.222       | 0.991       | 0.695
f4: Logistic Regression         | 0.086    | 0.085     |             |             |
f5: Naive Bayes                 | 0.912    | 0.577     | 0.354       | 0.972       | 0.710
f6: Multiple Classifier System  |          |           |             |             |

VI. CONCLUSIONS
In previous work, it was shown that different classifiers detected different side effects. In this paper we combined various classifiers with the aim of improving the overall discovery of side effects. The classifiers were combined using genetic algorithms to tune a multiple classifier system that can be used within a side effect discovery framework. We then compared the side effect discovery framework implementing a multiple classifier system with the framework implementing a single classifier. The results show that a larger partial AUC can be obtained by a multiple classifier system that integrates multiple diverse classifiers by calculating a weighted aggregate of their confidences that a data point belongs to the ADR class. This research presents a novel, useful application of genetic algorithms.
Possible areas of future work could investigate using a suitable evolutionary algorithm to tune each of the individual classifiers rather than using a grid search (i.e., where a selection of values for each parameter is input and the search is done over all possible parameter combinations), as this may increase their individual performance in addition to the multiple classifier system's performance.

Fig. 9. The ROC plots for the framework's ability to detect ADRs when implementing the different classifiers.

REFERENCES
[1] G. N. Norén, J. Hopstadius, A. Bate, K. Star, and I. R. Edwards, "Temporal pattern discovery in longitudinal electronic patient records,"
Data Mining and Knowledge Discovery, vol. 20, no. 3, pp. 361–387, 2010.
[2] I. Zorych, D. Madigan, P. Ryan, and A. Bate, "Disproportionality methods for pharmacovigilance in longitudinal observational databases," Statistical Methods in Medical Research, vol. 22, no. 1, pp. 39–56, 2013.
[3] P. B. Ryan, D. Madigan, P. E. Stang, J. Marc Overhage, J. A. Racoosin, and A. G. Hartzema, "Empirical assessment of methods for risk identification in healthcare data: results from the experiments of the Observational Medical Outcomes Partnership," Statistics in Medicine, vol. 31, no. 30, pp. 4401–4415, 2012.
[4] J. M. Reps, J. M. Garibaldi, U. Aickelin, D. Soria, J. E. Gibson, and R. B. Hubbard, "Comparison of algorithms that detect drug side effects using electronic healthcare databases," Soft Computing, vol. 17, no. 12, pp. 2381–2397, 2013. [Online]. Available: http://link.springer.com/content/pdf/10.1007%2Fs00500-013-1097-4.pdf
[5] O. Caster, N. Norén, D. Madigan, and A. Bate, "Logistic regression in signal detection: another piece added to the puzzle," Clinical Pharmacology & Therapeutics, vol. 94.
[6] J. M. Reps, J. M. Garibaldi, U. Aickelin, D. Soria, J. E. Gibson, and R. B. Hubbard, "Automating the Bradford Hill causality assessment for signalling drug side effects," Journal of the American Medical Informatics Association (Submitted), 2014.
[7] A. B. Hill, "The environment and disease: association or causation?" Proceedings of the Royal Society of Medicine, vol. 58, no. 5, p. 295, 1965.
[8] D. E. Goldberg and J. H. Holland, "Genetic algorithms and machine learning," Machine Learning, vol. 3, no. 2, pp. 95–99, 1988.
[9] T. Windeatt, "Diversity measures for multiple classifier system analysis and design," Information Fusion, vol. 6, no. 1, pp. 21–36, 2005.
[10] A. Sboner, C. Eccher, E. Blanzieri, P. Bauer, M. Cristofolini, G. Zumiani, and S. Forti, "A multiple classifier system for early melanoma diagnosis," Artificial Intelligence in Medicine, vol. 27, no. 1, pp. 29–44, 2003.
[11] R. Fusco, M. Sansone, A. Petrillo, and C. Sansone, "A multiple classifier system for classification of breast lesions using dynamic and morphological features in DCE-MRI," in Structural, Syntactic, and Statistical Pattern Recognition. Springer, 2012, pp. 684–692.
[12] L. G. Esposito and C. Sansone, "A multiple classifier approach for detecting naked human bodies in images," in Proceedings of the 17th International Conference on Image Analysis and Processing (ICIAP). Springer, 2013, pp. 389–398.
[13] O. P. Corrigan, "A risky business: the detection of adverse drug reactions in clinical trials and post-marketing exercises," Social Science & Medicine, vol. 55, no. 3, pp. 497–507, 2002.
[14] L. Härmark and A. Van Grootheest, "Pharmacovigilance: methods, recent developments and future perspectives," European Journal of Clinical Pharmacology, vol. 64, no. 8, pp. 743–752, 2008.
[15] E. P. van Puijenbroek, A. Bate, H. G. Leufkens, M. Lindquist, R. Orre, and A. C. Egberts, "A comparison of measures of disproportionality for signal detection in spontaneous reporting systems for adverse drug reactions," Pharmacoepidemiology and Drug Safety, vol. 11, no. 1, pp. 3–10, 2002.
[16] J. Lexchin, "Is there still a role for spontaneous reporting of adverse drug reactions?" Canadian Medical Association Journal, vol. 174, no. 2, pp. 191–192, 2006.
[17] P. M. Coloma, G. Trifirò, M. J. Schuemie, R. Gini, R. Herings, J. Hippisley-Cox, G. Mazzaglia, G. Picelli, G. Corrao, L. Pedersen et al., "Electronic healthcare databases for active drug safety surveillance: is there enough leverage?" Pharmacoepidemiology and Drug Safety, vol. 21, no. 6, pp. 611–621, 2012.
[18] J. M. Reps, J. M. Garibaldi, U. Aickelin, D. Soria, J. E. Gibson, and R. B. Hubbard, "A novel semi-supervised algorithm for rare prescription side effect discovery," IEEE Journal of Biomedical and Health Informatics, vol. 18, no. 2, pp. 537–547, 2014.
[19] J. Reps, J. M. Garibaldi, U. Aickelin, D. Soria, J. E. Gibson, and R. B. Hubbard, "Attributes for causal inference in electronic healthcare databases," in Proceedings of the IEEE 26th International Symposium on Computer-Based Medical Systems (CBMS). IEEE, 2013, pp. 548–549.
[20] J. M. Reps, J. M. Garibaldi, U. Aickelin, D. Soria, J. E. Gibson, and R. B. Hubbard, "Signalling paediatric side effects using an ensemble of simple study designs," Drug Safety, vol. 37, no. 3, pp. 163–170, 2014.
[21] M. Kuhn, M. Campillos, I. Letunic, L. J. Jensen, and P. Bork, "A side effect resource to capture phenotypic effects of drugs," Molecular Systems Biology, vol. 6, no. 1, pp. 343–348, 2010.
[22] A. P. Bradley, "The use of the area under the ROC curve in the evaluation of machine learning algorithms," Pattern Recognition, vol. 30, no. 7, pp. 1145–1159, 1997.
[23] Y. Jiang, C. E. Metz, and R. M. Nishikawa, "A receiver operating characteristic partial area index for highly sensitive diagnostic tests," Radiology, vol. 201, no. 3, pp. 745–750, 1996.
[24] M. Pepe, G. M. Longton, and H. Janes, "Estimation and comparison of receiver operating characteristic curves,"