Tuning a Multiple Classifier System for Side Effect Discovery using Genetic Algorithms
Jenna M. Reps, Uwe Aickelin and Jonathan M. Garibaldi
Abstract — In previous work, a novel supervised framework implementing a binary classifier was presented that obtained excellent results for side effect discovery. Interestingly, unique side effects were identified when different binary classifiers were used within the framework, prompting the investigation of applying a multiple classifier system. In this paper we investigate tuning a multiple classifier system for side effect discovery using genetic algorithms. The results of this research show that the novel framework implementing a multiple classifier system trained using genetic algorithms can obtain a higher partial area under the receiver operating characteristic curve than implementing a single classifier. Furthermore, the framework is able to detect side effects efficiently and obtains a low false positive rate.
I. INTRODUCTION

Side effects of prescription drugs are a common occurrence that often leads to patient morbidity and mortality. When there is an association between a medical event (e.g., sickness, rash or weakness) and a drug, this is termed an adverse event (AE). When the relationship is proven to be causal (i.e., the drug causes the medical event), it is referred to as an adverse drug reaction (ADR).
As large quantities of medical data are often stored in databases, numerous methods have been presented that make use of medical databases with the aim of identifying ADRs efficiently [1], [2]. Unfortunately, the majority of these methods work by finding medical events that are highly associated with a drug; therefore, rather than detecting ADRs they detect AEs. This has led to the methods having high false positive rates [3], [4], as the majority of associations are not causal. Recent research has focused on using supervised techniques such as logistic regression [5] to reduce the impact of confounding (i.e., when a hidden variable is responsible for the association). These supervised methods aim to distinguish between associations that are causal or non-causal by finding alternative causes of the medical event. Unfortunately, this requires generating a large number of regression models and also requires additional knowledge of possible confounders (e.g., other possible causes of the medical event). Consequently, these methods are often slow and dependent on current knowledge. Alternatively, a recent framework, the side effect classifier (SEC), has been proposed that applies a single supervised classifier to identify ADRs efficiently [6], and the results suggest this framework is less susceptible to confounding.
Jenna M. Reps, Uwe Aickelin and Jonathan M. Garibaldi are with the School of Computer Science, The University of Nottingham, UK (email: {jenna.reps, uwe.aickelin, jonathan.garibaldi}@nottingham.ac.uk).

II. BACKGROUND
A. Genetic Algorithms
Genetic algorithms are probabilistic search procedures inspired by the natural process of evolution [8]. The algorithm is an iterative process that starts with a randomly generated population of candidate solutions, which are then evolved. Each candidate solution has a set of genotypes (e.g., parameter values), and this set of genotypes determines the candidate solution's fitness. During each iteration, a new generation of candidate solutions is created by recombination and mutation of the previous candidate solutions' genotypes based on their fitness.

Fig. 1. The schema of a multiple classifier system.

B. Multiple Classifier System
The term ensemble is used to describe a composition of multiple classifiers. A type of ensemble that consists of a composition of various different classifiers has frequently been termed a multiple classifier system [9] rather than an ensemble. This is to help distinguish between a combination of the same classifier trained with different perspectives (e.g., combining decision trees that are trained using different independent variables) and a combination of different classifiers (e.g., combining an SVM, a random forest, a neural network and a logistic regression model). Fig. 1 illustrates a multiple classifier system that combines the output of multiple single classifiers to generate a single output. The aim of a multiple classifier system is to take advantage of diversity between classifiers to improve the classification accuracy while maintaining efficiency. Multiple classifier systems have been successfully implemented in numerous machine learning tasks, including diagnosing melanoma [10], classifying breast lesions [11] and detecting naked bodies in images [12]. In these examples, combining multiple classifiers under a suitable weighting scheme was shown to improve performance compared to a single classifier.
As the classifiers used to identify ADRs within the SEC framework appear to be diverse, implementing a multiple classifier system that combines all the classifiers may improve the detection of ADRs.
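Concretely, such a weighted combination can be sketched as follows; the confidence values, weights and threshold below are illustrative, not those learned in this paper:

```python
import numpy as np

def combine(confidences, weights, alpha=0.5):
    """Weighted multiple classifier system: each base classifier reports a
    confidence in [0, 1] that the example is an ADR; the weighted sum is
    compared against the threshold alpha to give the final class."""
    score = float(np.dot(weights, confidences))
    return 1 if score >= alpha else -1

# Illustrative confidences from five diverse classifiers for one example
conf = np.array([0.8, 0.6, 0.4, 0.7, 0.3])
w = np.array([0.3, 0.2, 0.1, 0.25, 0.15])  # hypothetical weights
print(combine(conf, w))  # weighted score 0.62 >= 0.5, so prints 1
```

The weighting scheme lets a strong classifier dominate without discarding the votes of the weaker ones.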
C. Previous Pharmacovigilance
Pharmacovigilance is the study of prescription drug side effects. One important part of pharmacovigilance is the process of detecting drug side effects after the drugs have been approved and marketed. Identifying drug side effects is a difficult task, as the majority of side effects depend on multiple factors, so it is common for some side effects to be observed rarely. Clinical trials are unable to identify the majority of side effects prior to marketing because they involve only a small number of patients and are conducted under unrealistic conditions [13]. For example, patients involved in clinical trials are unlikely to take other drugs during the trial, so drug interactions cannot be analysed.

Fig. 2. An example of an SRS database entity relationship diagram.

In general, the most widely implemented pharmacovigilance techniques have been developed for a specific type of medical database known as the spontaneous reporting system (SRS) databases [14]. These databases consist of all the reports made by medical staff or the general public relating to a suspected ADR. The general design of the SRS databases is illustrated in Fig. 2. The SRS databases contain natural links between drugs and medical events, see Fig. 3. Sometimes additional information about the patient is included in the report, such as age and gender, but this is not compulsory. The techniques for detecting ADRs look for medical events linked disproportionally more often to the drug than expected [15]. Unfortunately, because reporting is voluntary, many ADRs may not be reported, and it is possible that some rare ADRs may never be noticed. This under-reporting can prevent the early detection of ADRs, meaning patients are put at risk for longer. In addition, there are known data quality issues such as missing, duplicated or incorrect data [16].

Due to the limitations associated with the SRS databases, recent work has focused on using different types of medical databases [17]. One example is the longitudinal healthcare databases. These databases contain medical information about patients, often spanning many years, and it is common for them to contain records for millions of patients. As this type of database does not rely on voluntary reporting, it presents a unique perspective for signalling ADRs. However, it has been shown to suffer from different limitations.
The main limitation is that there are no clear links between drugs and medical events within the data itself, so potential links are inferred by finding the medical events that occur shortly after the drug in time. This is illustrated in Fig. 5. Unfortunately, the majority of the drug and medical event pairs linked by time are associated but do not correspond to ADRs, and it has proved difficult for unsupervised algorithms to distinguish between the non-causal and causal relationships.

Fig. 5. An example of inferring a link between a drug and medical event within a longitudinal healthcare database. The medical events are represented by circles and the drugs by squares. The potential acute ADRs are the medical events observed during a time window centred around the prescription.

Fig. 3. Illustration of how the reports in the SRS database contain direct links between drugs and medical events. Each report within the database consists of an observation of a patient taking a drug and then experiencing the medical event sometime after.

In [18], the authors presented a semi-supervised algorithm that requires a user to input a drug of interest and then returns a ranked list of medical events. The higher a medical event is ranked by the algorithm, the more likely that medical event corresponds to a rare ADR of the specified drug of interest. The algorithm generated the data by extracting attributes that are insightful for ADR detection from a longitudinal healthcare database and determined labels for some medical events by mining online medical websites. The labelled and unlabelled data were then used to cluster similar medical events into either an ADR cluster, an indicator (a cause of taking the drug) cluster or a noise cluster. Medical events assigned to the noise cluster were filtered, and the remaining medical events were ranked based on how often they occurred after the drug divided by how often they occurred before the drug, multiplied by a cluster dependent weight.

Fig. 4. An example of a longitudinal healthcare database entity relationship diagram.

The success of the semi-supervised algorithm then prompted the idea of generating causal inference based attributes for a selection of drug-medical event pairs that are definitively known ADRs or non-ADRs [19] and using these data to train a classifier that can then be used to predict new ADRs. One such supervised framework generated attributes based on the counterfactual theory of causality [20], whereas another framework, SEC, generated attributes based on the Bradford Hill causality criteria [7]. Rather than mining online forums for the known ADR and non-ADR labels, both frameworks used an online resource that contains lists of ADRs that were mined from drug packaging. Both supervised frameworks demonstrated excellent performance, and previous results suggest supervised techniques may help improve current pharmacovigilance.
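The ranking statistic described above can be sketched as follows; the function name and the guard against division by zero are assumptions made for illustration:

```python
def rank_score(n_after, n_before, cluster_weight):
    """Rank a medical event for a drug of interest: occurrences after the
    drug divided by occurrences before the drug, multiplied by a cluster
    dependent weight. max(..., 1) guards against division by zero for
    events never observed before the prescription."""
    return cluster_weight * n_after / max(n_before, 1)

# An event seen 10 times after and twice before the drug,
# with a hypothetical ADR-cluster weight of 1.0
print(rank_score(10, 2, 1.0))  # -> 5.0
```

Events that occur far more often after the prescription than before it, and that do not fall in the noise cluster, float to the top of the ranked list.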
1) SEC Framework:
The previously presented SECframework is a supervised algorithm for detecting ADRs.The algorithm automates the technique of inferring causalityvia the Bradford Hill causality criteria, as this technique iscommonly applied to assess whether a side effect is causedby a drug or not. The SEC framework requires three steps.The first step is data generation where suitable labelleddata are extracted for each drug-medical event pair thatrepresent a possible acute ADR. The second step is traininga binary classifier using the labelled data to classify eachdrug-medical event pair as an ADR or non-ADR, and thefinal step is applying the trained classifier to new unlabelleddata.
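At a high level, the three steps might be sketched as below; the random attributes, the labels and the nearest-centroid stand-in classifier are placeholders for the Bradford Hill attributes and the user-chosen classifier:

```python
import numpy as np

# Step 1 (stand-in): attribute vectors x_i and labels y_i in {-1, +1}
rng = np.random.default_rng(0)
X = rng.random((100, 5))            # placeholder for Bradford Hill attributes
y = np.where(X[:, 0] > 0.5, 1, -1)  # placeholder ADR / non-ADR labels

# Step 2: train a binary classifier (a trivial nearest-centroid stand-in)
centroid_adr = X[y == 1].mean(axis=0)
centroid_non = X[y == -1].mean(axis=0)

def classify(x):
    """Return +1 (ADR) or -1 (non-ADR) for a new attribute vector."""
    d_adr = np.linalg.norm(x - centroid_adr)
    d_non = np.linalg.norm(x - centroid_non)
    return 1 if d_adr < d_non else -1

# Step 3: apply the trained classifier to a new drug-medical event pair
print(classify(rng.random(5)))  # -> +1 or -1
```

In the actual framework any classifier can fill the role of `classify`, and it is trained with ten-fold cross validation as described below.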
Step 1) Data generation
As we are interested in detecting acutely occurring ADRs, we find the drug-medical event pairs that are possible ADRs by investigating the medical events that occur within a month of a drug being prescribed. To train a binary classifier we need a set of attribute vectors x_i ∈ R^n and their corresponding classes y_i ∈ {−1, 1}. In the SEC framework, each data point corresponds to a drug-medical event pair of interest, where the i-th drug-medical event pair has the attribute vector x_i and class y_i. Therefore, to generate the training data, the first step is to identify the drug-medical event pairs of interest, the second step is to determine their labels and the final step is to calculate their attributes.
To identify the drug-medical event pairs of interest, we restrict our attention to a set of specified drugs, denoted by D. For each drug d_i ∈ D, we use temporal relationships to identify the risk medical events of d_i (RME_{d_i}). The risk medical events of d_i are the medical events that were observed during the month after a prescription of d_i for one or more patients, RME_{d_i} = {medical events | the medical event occurs within a month of d_i for one or more patients}. The drug-medical event pairs of interest are all the possible combinations of d-e, where d ∈ D and e ∈ RME_d. The labels of the drug-medical event pairs of interest are then determined. For the i-th drug-medical event pair, if the medical event is labelled as a known side effect of the drug within the online drug resource known as SIDER [21], then the pair is labelled as an ADR (y_i = 1). Alternatively, if the medical event cannot possibly correspond to an acute ADR (e.g., the medical event is 'cancer', 'menopause' or 'death of family member'), the drug-medical event pair is labelled as a non-ADR (y_i = −1).
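The labelling rules above can be sketched as follows; the drug and event names, and the dictionary-based SIDER lookup, are illustrative assumptions:

```python
def label_pair(drug, event, sider_adrs, impossible_acute_events):
    """Label a drug-medical event pair: +1 if SIDER lists the event as a
    known ADR of the drug, -1 if the event clearly cannot be an acute ADR,
    and None otherwise (pairs with no definitive label are excluded
    from the training data)."""
    if event in sider_adrs.get(drug, set()):
        return 1
    if event in impossible_acute_events:
        return -1
    return None

# Illustrative lookup tables
sider = {"nifedipine": {"headache", "flushing"}}
impossible = {"cancer", "menopause", "death of family member"}
print(label_pair("nifedipine", "headache", sider, impossible))  # -> 1
print(label_pair("nifedipine", "cancer", sider, impossible))    # -> -1
```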
Any drug-medical event pair neither listed on SIDER as corresponding to a known ADR nor clearly a non-ADR is ignored, as the pair has no definitive label.
For the i-th drug-medical event pair labelled as an ADR or non-ADR, we calculate the Bradford Hill causality criteria based attributes, described in [6], and denote the vector consisting of these attributes by x_i. The attributes are derived from a selection of the Bradford Hill causality criteria:
• Association strength: How strong the association between the drug and medical event is.
• Temporality: Does the drug precede the medical event or the other way around?
• Specificity: How specific the medical event is, or how similar the patients experiencing the medical event are.
• Biological gradient: Measures whether the probability of the medical event increases as the drug dosage increases.
• Experimentation: Does the medical event start and stop when the drug starts and stops?
In summary, for the i-th labelled drug-medical event pair we have (x_i, y_i), where x_i contains the Bradford Hill causality attributes, y_i = 1 when the i-th drug-medical event pair is a known ADR and y_i = −1 when the i-th drug-medical event pair is a known non-ADR. The complete set of labelled data is denoted by X, where X = {(x_i, y_i)}.

Step 2) Training a binary classifier
The labelled data are then used to train a binary classifier (the choice of classifier is determined by the user, as any classifier can be used within the framework),

f : X → Y ; f(x_i) → {−1, 1}    (1)

where f(x_i) = −1 means the drug-medical event pair is classified as a non-ADR and f(x_i) = 1 means the drug-medical event pair is classified as an ADR. The chosen classifier is trained using ten-fold cross validation to reduce overfitting. In previous work [6], the random forest classifier was found to perform better than a support vector machine, a logistic regression and a naive Bayes classifier.

Step 3) Applying trained classifier
The trained classifier is then applied to the attribute vector x* for a new drug-medical event pair, and the prediction f(x*) is returned.
For evaluating the framework, the labelled data are partitioned into training/testing data and validation data. The training/testing data are used to train the classifier, and the validation data are used to evaluate the performance of the trained classifier by comparing the predicted class with the true class.

III. MATERIALS
The THIN database contains temporal medical data for over 11 million patients (approximately 4 million currently active patients). The data are anonymised, so each patient is represented by a unique patient ID rather than the patient's real name. There are three main tables within the THIN database: the patient table, the medical table and the therapy table, see Figs. 6-8. The patient table contains personal information about each patient in the database, including their year of birth, their gender and their date of registration. The therapy table contains timestamped records of each patient's drug prescription history, so each record includes the patient ID, the date of the prescription and information about the prescription (drug details and dose details). The medical table is similar to the therapy table but contains timestamped records of each patient's medical event history (i.e., illnesses, diseases, laboratory tests and administrative events), so a typical record contains the patient ID, the date of the medical event and the medical event information, recorded via the READ codes.

Fig. 6. The patient table within the THIN database.

Fig. 7. The medical table within the THIN database.

Fig. 8. The therapy table within the THIN database.

Each READ code consists of five elements from an alphanumeric alphabet (including the characters a-z, A-Z and '.') and the codes have a hierarchical structure. The depth of a node within the tree is the length of the minimum path from the node to the root. Unfortunately, the READ codes have redundancies, and the same medical event can be represented by various distinct READ codes. This can cause issues for data miners; however, the SEC algorithm generates attributes specifically to prevent this issue having a negative effect on its ability to detect ADRs.

IV. METHODOLOGY
In this paper we develop a multiple classifier system to be implemented within the SEC framework and compare its ability to detect side effects with that of the framework implementing a single classifier. Therefore, in this section the methods used to analyse the single classifier framework and the multiple classifier system framework are both described.
To evaluate each framework, we determine all the labelled drug-medical event pairs corresponding to the drugs: nifedipine, amlodipine, felodipine, nicardipine, verapamil, ciprofloxacin, ofloxacin, norfloxacin, nalidixic acid, moxifloxacin, fluconazole, itraconazole, posaconazole, voriconazole, ibuprofen, fenoprofen, ketoprofen, celecoxib, flurbiprofen, nabumetone, naproxen, budesonide, beclometasone, hydrocortisone and prednisolone. These labelled data are composed of the Bradford Hill causality criteria derived attributes for each drug-medical event data point and a label specifying whether the drug-medical event data point is listed as an ADR on SIDER or is one of the manually selected non-ADRs.
The drug-medical event data points with known labels corresponding to the chosen drugs were partitioned into training/testing data X_T (80% of the labelled data) and validation data X_V (20% of the labelled data). The training/testing data were used to train the classifier or multiple classifier system, and the validation data were used to evaluate the framework implementing the single classifier or multiple classifier system.
The measure used to determine the effectiveness of each framework is the area under the receiver operating characteristic curve. This measure corresponds to the probability of a drug-medical event pair known to be an ADR being assigned a higher confidence of being within the ADR class by the framework than a drug-medical event pair known to be a non-ADR [22].
In particular, we restrict our attention to a partial area, as we are only interested in the section of the curve where few drug-medical event pairs are classed as side effects [23]. When many drug-medical event pairs are classed as ADRs, there are likely to be many non-ADR pairs incorrectly classed as ADRs, and this is undesirable. The partial area under the curve that we are interested in is denoted by pAUC_[0.9,1], and a more detailed explanation of how the measure is calculated can be found in Section IV-C.
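A sketch of how such a partial AUC can be computed from classifier scores is shown below; the interpolation grid and the helper name are implementation choices, not taken from the paper:

```python
import numpy as np

def pauc_high_specificity(y_true, scores, spec_range=(0.9, 1.0)):
    """Partial AUC over the high-specificity region of the ROC curve,
    i.e. false positive rate in [1 - spec_range[1], 1 - spec_range[0]]."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    y = np.asarray(y_true)[order]
    pos, neg = (y == 1).sum(), (y == -1).sum()
    tpr = np.concatenate(([0.0], np.cumsum(y == 1) / pos))   # sensitivity
    fpr = np.concatenate(([0.0], np.cumsum(y == -1) / neg))  # 1 - specificity
    lo, hi = 1 - spec_range[1], 1 - spec_range[0]
    grid = np.linspace(lo, hi, 101)          # FPR grid over [0, 0.1]
    t = np.interp(grid, fpr, tpr)            # ROC curve sampled on the grid
    # Trapezoid rule over the retained segment
    return float(np.sum((t[1:] + t[:-1]) / 2 * np.diff(grid)))

# Perfectly separated scores give approximately the maximum area (0.1)
# over the FPR range [0, 0.1]
y = [1, 1, -1, -1]
s = [0.9, 0.8, 0.2, 0.1]
print(pauc_high_specificity(y, s))
```

The unnormalised maximum over this range is 0.1; reported pAUC values are typically rescaled so that a perfect classifier scores 1.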
To analyse the single classifier framework, the SEC framework implementing either a random forest, support vector machine, logistic regression, naive Bayes or k-nearest neighbours classifier is trained using ten-fold cross validation on the training/testing data X_T. The trained classifier is denoted by f : R^n → {−1, 1}, where f(x_i) = −1 represents the i-th drug-medical event pair being classified as a non-ADR and f(x_i) = 1 represents the i-th drug-medical event pair being classified as an ADR.

B. SEC Framework: Ensemble Classifier
The multiple classifier system framework requires training multiple classifiers and learning the optimal weighted combination of the classifiers. In this framework, after the training data are generated, the data are first used to train various classifiers and then used to determine a weighted combination of all the classifiers.
1) Training the classifiers:
Five classifiers (random forest, support vector machine, logistic regression, naive Bayes and k-nearest neighbours) are trained via ten-fold cross validation to determine the optimal parameters that maximise the partial area of interest under the curve (pAUC_[0.9,1], see Section IV-C) using the training/testing set X_T. Each classifier is trained using a grid search over suitable parameter values; the grid ranges and the chosen parameter values are listed in Table I.
For each trained classifier f_i, we can also extract the classifier's confidence that the drug-medical event pair is in the ADR class; this is denoted by c_i : R^n → [0, 1]. So c_i(x_j) is the confidence of the i-th classifier that the j-th drug-medical event pair is an ADR.
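Given five confidence functions of this kind, the combination weights can be searched with a genetic algorithm. The sketch below is heavily scaled down and simplified: the population (50) and generation count (100) are smaller than the paper's 1000 and 500, the fitness is plain training accuracy rather than the cross-validated pAUC, and the confidence matrix and labels are synthetic:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative confidences: rows = 5 classifiers, cols = 200 training pairs
C = rng.random((5, 200))
y = np.where(C.mean(axis=0) > 0.5, 1, -1)  # synthetic stand-in labels
ALPHA = 0.5                                 # natural threshold

def fitness(w):
    """Simplified fitness: training accuracy of the weighted combination."""
    pred = np.where(w @ C >= ALPHA, 1, -1)
    return (pred == y).mean()

pop = rng.random((50, 5))                   # weights initialised in [0, 1]
for _ in range(100):
    fit = np.array([fitness(w) for w in pop])
    elite = pop[np.argmax(fit)].copy()      # elitism: keep the best
    # Fitness-proportional selection of parents
    p = (fit + 1e-9) / (fit + 1e-9).sum()
    parents = pop[rng.choice(len(pop), size=len(pop), p=p)]
    # Arithmetic crossover between consecutive parents
    lam = rng.random((len(pop), 5))
    pop = lam * parents + (1 - lam) * np.roll(parents, 1, axis=0)
    # Uniform random mutation with rate 0.1
    mask = rng.random(pop.shape) < 0.1
    pop[mask] = rng.random(mask.sum())
    pop[0] = elite

best = pop[np.argmax([fitness(w) for w in pop])]
```

Swapping the fitness function for a cross-validated pAUC estimate recovers the setup described next.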
2) Determining the weights:
Using these confidence functions, genetic algorithms are applied to find the optimal weights β_i, i ∈ [1, 5], for the multiple classifier system, which determines the class of the j-th drug-medical event pair by

f(x_j) = 1 if Σ_i β_i c_i(x_j) ≥ α, and f(x_j) = −1 otherwise,    (2)

where α ∈ (0, 1) is the natural threshold, which controls the stringency of the multiple classifier system.
The weights are determined by implementing a genetic algorithm with a mutation rate of 0.1 and applying elitism with a candidate population size of 1000 until convergence; see Table II for full details. The fitness of each weight vector β is the ten-fold cross validation average of the partial AUC over the specificity range [0.9,1] for the multiple classifier system based on that weight scheme on the training/testing set. The optimal weight vector found was

β = (β_1, β_2, β_3, β_4, β_5),    (3)

where c_1 is the random forest, c_2 the support vector machine, c_3 the k-nearest neighbours, c_4 the logistic regression and c_5 the naive Bayes confidence function.

C. Evaluation
The framework implementing a single trained classifier or the multiple classifier system is then applied to the validation set, and the prediction of each data point in the validation set is compared with the truth. The numbers of true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN) are calculated as follows:

TP : |{i | y_i = f(x_i) = 1}|
FP : |{i | y_i = −1, f(x_i) = 1}|
FN : |{i | y_i = 1, f(x_i) = −1}|
TN : |{i | y_i = f(x_i) = −1}|

Using the above values, the accuracy, precision, sensitivity and specificity can be calculated:

Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
Accuracy = (TP + TN) / (TP + FP + FN + TN)
Precision = TP / (TP + FP)    (4)

The receiver operating characteristic (ROC) curve is generated by plotting the sensitivity against 1 minus the specificity, and the AUC is the area under this curve. The AUC measures the general ability of a classifier rather than only considering how well it does at its natural threshold, and is a fairer measure for comparing different classifiers. The pAUC_[0.9,1] is the partial area under the ROC curve between the specificity values of 0.9 and 1; this value is useful as we are interested in the classifier's ability when the specificity is high and the number of false positives is low.

V. RESULTS & DISCUSSION
The results are presented in Table III, and ROC plots for the framework implementing the range of classifiers or the multiple classifier system can be seen in Fig. 9. An optimal value for α (the multiple classifier system's natural threshold) was also selected during tuning. It can be seen that the framework implementing a multiple classifier system obtained a superior accuracy, sensitivity and pAUC_[0.9,1] compared to the framework implementing any single classifier. However, using a bootstrap test to compare the pAUC_[0.9,1]s [24] of the random forest and the multiple classifier system at a 5% significance level, the pAUC_[0.9,1] was not shown to be significantly different (p-value = 0.499). The highest precision and specificity values were obtained by the framework implementing a support vector machine and not the multiple classifier system. This is probably due to the multiple classifier system being optimised specifically for the partial AUC. If the precision or specificity were deemed to be more important, different weights could be calculated by the genetic algorithm to optimise the multiple classifier system for the desired measure (e.g., precision or specificity).

The ensemble weights do not necessarily reflect the importance of the classifiers within the ensemble, as each classifier has a varying range for its confidence function values. It may be useful to normalise the confidence function values prior to determining the optimal ensemble weights. If the classifier confidence values were normalised, then the ensemble weights would correspond to the importance of the classifiers, and this would help indicate which of the classifiers was most influential within the ensemble. This knowledge could be used to remove classifiers that had little influence.

The advantage of ensemble approaches over relying on any individual classifier is that they generally reduce the classifier's variance. This is useful for ADR detection, as the training set is likely to change and grow as new ADRs are discovered. An ensemble approach for ADR detection is also useful, as previous results have shown that each classifier tends to make different mistakes, so the ensemble can overcome an individual classifier's misclassifications. This is the likely reason why the ensemble obtained an improved performance. However, the disadvantage is that the ensemble is computationally more expensive due to the requirement of training multiple classifiers and then tuning the ensemble weights. Although the multiple classifier system improved the accuracy and pAUC_[0.9,1] compared to each single classifier, the improvement was not significant. This may suggest that when the training data are sufficiently large to enable good performance from a single classifier, the small benefit in performance of the ensemble is not enough to overcome the extra cost of complexity. It would be interesting to investigate how the ensemble performs relative to each individual classifier at various training set sizes.

TABLE I
THE DIFFERENT CLASSIFIERS USED BY THE MULTIPLE CLASSIFIER SYSTEM AND THEIR OPTIMAL PARAMETERS.

Classifier                           | Parameters (grid search range)          | Optimal parameters
f1: Random Forest                    | mtry: [1,30]                            | mtry = 11
f2: Support Vector Machine (Radial)  | sigma: (0,1], C: (0,10]                 | sigma = 0.0978, C = 6.1624
f3: K-Nearest Neighbours             | K: [1,100]                              | K = 17
f4: Logistic Regression              | decay: [0,10]                           | decay = 0
f5: Naive Bayes                      | fL: [0,1], usekernel: {TRUE, FALSE}     | fL = 0, usekernel = TRUE

TABLE II
THE GENETIC ALGORITHM PARAMETERS.

Population size: 1000
Crossover type: local arithmetic crossover
Mutation type: uniform random mutation
Elitism used: true
Selection criteria: fitness proportional selection with fitness linear scaling
Initialisation: uniformly chosen from [0,1]
Stopping criteria: after 500 iterations

TABLE III
THE RESULTS OF THE SEC FRAMEWORK IMPLEMENTING A SINGLE CLASSIFIER OR MULTIPLE CLASSIFIER SYSTEM FOR THE VALIDATION SET.

Framework classifier            | Accuracy | Precision | Sensitivity | Specificity | pAUC_[0.9,1]
f1: Random Forest               | 0.930    | 0.789     | 0.380       | 0.989       | 0.769
f2: Support Vector Machine      | 0.921    |           |             |             |
f3: K-Nearest Neighbours        | 0.917    | 0.729     | 0.222       | 0.991       | 0.695
f4: Logistic Regression         | 0.086    | 0.085     |             |             |
f5: Naive Bayes                 | 0.912    | 0.577     | 0.354       | 0.972       | 0.710
f6: Multiple Classifier System  |          |           |             |             |

VI. CONCLUSIONS
In previous work, it was shown that different classifiers detected different side effects. In this paper we combined various classifiers with the aim of improving the overall discovery of side effects. The classifiers were combined using genetic algorithms to tune a multiple classifier system that can be used within a side effect discovery framework. We then compared the side effect discovery framework implementing a multiple classifier system with the framework implementing a single classifier. The results show that a larger partial AUC can be obtained by a multiple classifier system that integrates multiple diverse classifiers by calculating a weighted aggregate of their confidences that a data point belongs to the ADR class. This research presents a novel, useful application of genetic algorithms.
Possible areas of future work could investigate using a suitable evolutionary algorithm to tune each of the individual classifiers rather than using a grid search (i.e., where a selection of values for each parameter is input and the search is done over all possible parameter combinations), as this may increase their individual performance in addition to the multiple classifier system's performance.

Fig. 9. The ROC plots for the framework's ability to detect ADRs when implementing the different classifiers.

REFERENCES
[1] G. N. Norén, J. Hopstadius, A. Bate, K. Star, and I. R. Edwards, "Temporal pattern discovery in longitudinal electronic patient records,"
Data Mining and Knowledge Discovery, vol. 20, no. 3, pp. 361–387, 2010.
[2] I. Zorych, D. Madigan, P. Ryan, and A. Bate, "Disproportionality methods for pharmacovigilance in longitudinal observational databases," Statistical Methods in Medical Research, vol. 22, no. 1, pp. 39–56, 2013.
[3] P. B. Ryan, D. Madigan, P. E. Stang, J. Marc Overhage, J. A. Racoosin, and A. G. Hartzema, "Empirical assessment of methods for risk identification in healthcare data: results from the experiments of the Observational Medical Outcomes Partnership," Statistics in Medicine, vol. 31, no. 30, pp. 4401–4415, 2012.
[4] J. M. Reps, J. M. Garibaldi, U. Aickelin, D. Soria, J. E. Gibson, and R. B. Hubbard, "Comparison of algorithms that detect drug side effects using electronic healthcare databases," Soft Computing, vol. 17, no. 12, pp. 2381–2397, 2013. [Online]. Available: http://link.springer.com/content/pdf/10.1007%2Fs00500-013-1097-4.pdf
[5] O. Caster, N. Norén, D. Madigan, and A. Bate, "Logistic regression in signal detection: another piece added to the puzzle," Clinical Pharmacology & Therapeutics, vol. 94.
[6] J. M. Reps, J. M. Garibaldi, U. Aickelin, D. Soria, J. E. Gibson, and R. B. Hubbard, "Automating the Bradford Hill causality assessment for signalling drug side effects," Journal of the American Medical Informatics Association (Submitted), 2014.
[7] A. B. Hill, "The environment and disease: association or causation?" Proceedings of the Royal Society of Medicine, vol. 58, no. 5, p. 295, 1965.
[8] D. E. Goldberg and J. H. Holland, "Genetic algorithms and machine learning," Machine Learning, vol. 3, no. 2, pp. 95–99, 1988.
[9] T. Windeatt, "Diversity measures for multiple classifier system analysis and design," Information Fusion, vol. 6, no. 1, pp. 21–36, 2005.
[10] A. Sboner, C. Eccher, E. Blanzieri, P. Bauer, M. Cristofolini, G. Zumiani, and S. Forti, "A multiple classifier system for early melanoma diagnosis," Artificial Intelligence in Medicine, vol. 27, no. 1, pp. 29–44, 2003.
[11] R. Fusco, M. Sansone, A. Petrillo, and C. Sansone, "A multiple classifier system for classification of breast lesions using dynamic and morphological features in DCE-MRI," in Structural, Syntactic, and Statistical Pattern Recognition. Springer, 2012, pp. 684–692.
[12] L. G. Esposito and C. Sansone, "A multiple classifier approach for detecting naked human bodies in images," in Proceedings of the 17th International Conference on Image Analysis and Processing (ICIAP). Springer, 2013, pp. 389–398.
[13] O. P. Corrigan, "A risky business: the detection of adverse drug reactions in clinical trials and post-marketing exercises," Social Science & Medicine, vol. 55, no. 3, pp. 497–507, 2002.
[14] L. Härmark and A. Van Grootheest, "Pharmacovigilance: methods, recent developments and future perspectives," European Journal of Clinical Pharmacology, vol. 64, no. 8, pp. 743–752, 2008.
[15] E. P. van Puijenbroek, A. Bate, H. G. Leufkens, M. Lindquist, R. Orre, and A. C. Egberts, "A comparison of measures of disproportionality for signal detection in spontaneous reporting systems for adverse drug reactions," Pharmacoepidemiology and Drug Safety, vol. 11, no. 1, pp. 3–10, 2002.
[16] J. Lexchin, "Is there still a role for spontaneous reporting of adverse drug reactions?" Canadian Medical Association Journal, vol. 174, no. 2, pp. 191–192, 2006.
[17] P. M. Coloma, G. Trifirò, M. J. Schuemie, R. Gini, R. Herings, J. Hippisley-Cox, G. Mazzaglia, G. Picelli, G. Corrao, L. Pedersen et al., "Electronic healthcare databases for active drug safety surveillance: is there enough leverage?" Pharmacoepidemiology and Drug Safety, vol. 21, no. 6, pp. 611–621, 2012.
[18] J. M. Reps, J. M. Garibaldi, U. Aickelin, D. Soria, J. E. Gibson, and R. B. Hubbard, "A novel semi-supervised algorithm for rare prescription side effect discovery," IEEE Journal of Biomedical and Health Informatics, vol. 18, no. 2, pp. 537–547, 2014.
[19] J. Reps, J. M. Garibaldi, U. Aickelin, D. Soria, J. E. Gibson, and R. B. Hubbard, "Attributes for causal inference in electronic healthcare databases," in Proceedings of the IEEE 26th International Symposium on Computer-Based Medical Systems (CBMS). IEEE, 2013, pp. 548–549.
[20] J. M. Reps, J. M. Garibaldi, U. Aickelin, D. Soria, J. E. Gibson, and R. B. Hubbard, "Signalling paediatric side effects using an ensemble of simple study designs," Drug Safety, vol. 37, no. 3, pp. 163–170, 2014.
[21] M. Kuhn, M. Campillos, I. Letunic, L. J. Jensen, and P. Bork, "A side effect resource to capture phenotypic effects of drugs," Molecular Systems Biology, vol. 6, no. 1, pp. 343–348, 2010.
[22] A. P. Bradley, "The use of the area under the ROC curve in the evaluation of machine learning algorithms," Pattern Recognition, vol. 30, no. 7, pp. 1145–1159, 1997.
[23] Y. Jiang, C. E. Metz, and R. M. Nishikawa, "A receiver operating characteristic partial area index for highly sensitive diagnostic tests," Radiology, vol. 201, no. 3, pp. 745–750, 1996.
[24] M. Pepe, G. M. Longton, and H. Janes, "Estimation and comparison of receiver operating characteristic curves,"