Detecting adverse drug reactions for the drug Simvastatin
2012 Fourth International Conference on Multimedia Information Networking and Security
Yihui Liu, Uwe Aickelin

Institute of Intelligent Information Processing, Shandong Polytechnic University, China
Department of Computer Science, University of Nottingham, UK
Yihui Liu: [email protected]

Abstract- Adverse drug reactions (ADRs) are a widely recognized public health concern. In this study we propose an original approach to detecting ADRs using a feature matrix and feature selection. The experiments are carried out on the drug Simvastatin. Major side effects of the drug are detected, and better performance is achieved compared to other computerized methods. Because the detected ADRs are based on a computerized method, further investigation is needed.

Keywords- adverse drug reaction; feature matrix; feature selection; Simvastatin

I. INTRODUCTION

Adverse drug reactions (ADRs) are a widely recognized public health concern. ADRs are one of the most common reasons for withdrawing drugs from the market [1]. Currently, the two major methods for detecting ADRs are spontaneous reporting systems (SRS) [2, 3] and prescription event monitoring (PEM) [4, 5]. The World Health Organization (WHO) defines a signal in pharmacovigilance as "any reported information on a possible causal relationship between an adverse event and a drug, the relationship being unknown or incompletely documented previously" [6].

For spontaneous reporting systems, many machine learning methods are used to detect ADRs, such as the Bayesian confidence propagation neural network (BCPNN) [7], decision support methods [8], genetic algorithms [9], and knowledge based approaches [10]. One limitation is the mechanism for submitting ADR reports [8], which suffers from serious underreporting and cannot accurately quantify the corresponding risk. Another limitation is that ADRs are hard to detect when each drug-event association occurs only a small number of times in the database.

In this paper we propose a feature selection approach to detect ADRs from The Health Improvement Network (THIN) database. First, a feature matrix, which represents the medical events for the patients before and after taking drugs, is created by linking patients' prescriptions and the corresponding medical events together. Then significant features are selected with feature selection methods by comparing the feature matrix before patients take drugs with the one after patients take drugs. Finally, the significant ADRs can be detected from thousands of medical events based on the corresponding features. Experiments are carried out on the drug Simvastatin, and good performance is achieved.

II. FEATURE MATRIX AND FEATURE SELECTION
A. The Extraction of Feature Matrix
To detect the ADRs of drugs, a feature matrix is first extracted from the THIN database; it describes the medical events that patients experience before or after taking drugs. Student's t-test is then applied as a feature selection method to select the significant features from the feature matrix, which contains thousands of medical events. Figure 1 shows the process of detecting ADRs using the feature matrix. Feature matrix A describes the medical events for each patient during the 60 days before they take the drug. Feature matrix B reflects the medical events during the 60 days after patients take the drug. To reduce the effect of rare events and to save computation time and space, we treat every 100 patients as one group. Matrices X and Y are the feature matrices obtained after patients are divided into groups.
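The extraction and grouping steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: the input layout (`first_rx`, `events`), the 0/1 encoding of matrix entries, the choice to count events on the prescription day as "after", and the dropping of an incomplete final group are all assumptions. The last assumption is at least consistent with 14905 patients yielding 149 groups.

```python
from datetime import date, timedelta

WINDOW_DAYS = 60   # observation window stated in the text
GROUP_SIZE = 100   # patients per group stated in the text


def build_matrices(first_rx, events, codes):
    """Build per-patient 0/1 matrices A (before) and B (after).

    first_rx: dict patient_id -> date of first prescription (hypothetical layout)
    events:   iterable of (patient_id, readcode, event_date) (hypothetical layout)
    codes:    list of Readcodes that define the matrix columns
    """
    col = {code: j for j, code in enumerate(codes)}
    patients = sorted(first_rx)
    row = {pid: i for i, pid in enumerate(patients)}
    A = [[0] * len(codes) for _ in patients]
    B = [[0] * len(codes) for _ in patients]
    window = timedelta(days=WINDOW_DAYS)
    for pid, code, when in events:
        if pid not in row or code not in col:
            continue
        start = first_rx[pid]
        if start - window <= when < start:
            A[row[pid]][col[code]] = 1      # event in the 60 days before
        elif start <= when <= start + window:
            B[row[pid]][col[code]] = 1      # event in the 60 days after
    return A, B


def group_rows(matrix, size=GROUP_SIZE):
    """Sum consecutive blocks of `size` patient rows; an incomplete tail is dropped."""
    n_groups = len(matrix) // size
    return [
        [sum(column) for column in zip(*matrix[g * size:(g + 1) * size])]
        for g in range(n_groups)
    ]
```

Under these assumptions, `X = group_rows(A)` and `Y = group_rows(B)` give one row per group, where each entry counts the patients in that group who had the event.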
Figure 1. The process to detect ADRs. Matrices A and B are the feature matrices before and after patients take drugs, respectively. The observation period is set to 60 days. Matrices X and Y are the feature matrices after patients are divided into groups. We set 100 patients as one group.

B. Medical Events and Readcodes
Medical events or symptoms are represented by medical codes, or Readcodes. There are 103387 types of medical events in the "Readcodes" database. The Read Codes used in general practice (GP) were invented and developed by Dr James Read in 1982. The NHS (National Health Service) has expanded the codes to cover all areas of clinical practice. The code is hierarchical from left to right, that is, from level 1 to level 5, so each additional level gives more detailed information. Table 1 shows the medical symptoms based on Readcodes at level 3 and at level 5. 'Other soft tissue disorders' is a general term using Readcodes at level 3. 'Foot pain', 'Heel pain', etc., give more details using Readcodes at level 5.
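The hierarchy can be illustrated with a small helper that maps a detailed code to its ancestor at a coarser level. This is a sketch under assumptions not stated in the paper: that each code is 7 characters long, that the first 5 characters carry the hierarchy with dots as padding, and that the last 2 characters are a term suffix with "00" as the default. The function names are hypothetical.

```python
def readcode_level(code):
    """Return the hierarchy level (1-5) of a 7-character Readcode."""
    return len(code[:5].rstrip("."))


def truncate_readcode(code, level):
    """Map a 7-character Readcode to its ancestor at the given level (1-5)."""
    if len(code) != 7 or not 1 <= level <= 5:
        raise ValueError("expected a 7-character code and a level from 1 to 5")
    stem = code[:5].rstrip(".")        # hierarchy part without padding dots
    kept = stem[:level]                # keep at most `level` characters
    return kept.ljust(5, ".") + "00"   # re-pad and use the assumed default suffix
```

For example, `truncate_readcode("N245.17", 3)` gives `"N24..00"`, which corresponds to the level 3 term 'Other soft tissue disorders'.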
C. Feature Selection Based on Student's t-test
Feature extraction and feature selection are widely used in biomedical data processing [11-18]. In our research we use Student's t-test [19] as the feature selection method to detect the significant ADRs from thousands of medical events. Student's t-test is a statistical hypothesis test that assumes normally distributed data and is used to measure the difference between two groups of samples.
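A minimal sketch of this selection step is given below. The paper does not say whether the test is paired or unpaired, so the unpaired two-sample form with pooled variance is an assumption, and the function names are hypothetical. The sketch ranks events by the absolute t statistic. When every event has the same number of groups, the degrees of freedom are identical, so this ordering matches ascending p value. Applying a p value threshold would additionally need the t distribution, for instance via `scipy.stats.ttest_ind`.

```python
from math import copysign, inf, sqrt
from statistics import mean, variance


def t_statistic(before, after):
    """Two-sample Student's t statistic with pooled variance (assumed variant)."""
    n1, n2 = len(before), len(after)
    diff = mean(after) - mean(before)
    pooled = ((n1 - 1) * variance(before) + (n2 - 1) * variance(after)) / (n1 + n2 - 2)
    if pooled == 0:
        # No spread in either sample: any difference is infinitely significant.
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / sqrt(pooled * (1 / n1 + 1 / n2))


def rank_events(X, Y, codes):
    """Rank columns of the grouped matrices X (before) and Y (after) by |t|."""
    scores = []
    for j, code in enumerate(codes):
        before = [row[j] for row in X]
        after = [row[j] for row in Y]
        scores.append((code, t_statistic(before, after)))
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)
```

A positive t value means the event count per group increased after the drug was taken, which is the direction of interest for ADR detection.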
TABLE I. MEDICAL EVENTS BASED ON READCODES AT LEVEL
D. Other Parameters
The ratio variable R1 is defined to evaluate significant changes in the medical events, as the ratio of the number of patients after taking the drug to the number before taking the drug. The variable R2 represents the ratio of the number of patients after taking the drug to the whole population, for one particular medical symptom. The ratio variables R1 and R2 are defined as follows:

$$R_1 = \begin{cases} N_A / N_B, & \text{if } N_B \neq 0 \\ N_A, & \text{if } N_B = 0 \end{cases} \qquad\qquad R_2 = N_A / N$$

where N_B and N_A represent the numbers of patients who have one particular medical event before and after they take the drug, respectively. The variable N represents the size of the whole population taking the drug.

III. EXPERIMENTS AND RESULTS
Simvastatin [20], marketed under the trade name Zocor, is a hypolipidemic drug used to control elevated cholesterol (hypercholesterolemia). It is a member of the statin class of pharmaceuticals. Simvastatin has the following side effects [20, 21, 22]: severe allergic reactions (rash; hives; itching; difficulty breathing; tightness in the chest; swelling of the mouth, face, lips, or tongue; unusual hoarseness); burning, numbness, or tingling; change in the amount of urine produced; confusion; dark or red-colored urine; decreased sexual ability; depression; dizziness; fast or irregular heartbeat; fever, chills, or persistent sore throat; joint pain; loss of appetite; memory problems; muscle pain, tenderness, or weakness (with or without fever and fatigue); pale stools; red, swollen, blistered, or peeling skin; severe or persistent nausea or stomach or back pain; shortness of breath; trouble sleeping; unusual bruising or bleeding; unusual tiredness or weakness; vomiting; yellowing of the skin or eyes.

In the THIN database, 14905 patients from the data of 20 GPs are taking Simvastatin, and 13060 medical events are obtained based on Readcodes at levels 1-5. After grouping, a 149x13060 feature matrix is obtained. For Readcodes at levels 1-3, a 149x2693 feature matrix is obtained.

Table 2 shows the top 30 detected results in ascending order of the p value of Student's t-test, using Readcodes at levels 1-5 and at levels 1-3. The detected results have p values below 0.05, which indicates a significant change after patients take the drug. Table 3 shows the results in descending order of the ratio of the number of patients after taking the drug to the number before taking the drug. Table 4 shows potential cancer-related ADRs for Simvastatin. The detected ADRs are based on our computerized method, so further investigation is needed.

Our detected results are consistent with the published side effects of statin drugs [21, 22]. Major ADRs involving 'muscle and musculoskeletal' events for statin drugs are detected based on Readcodes at levels 1-5 and also based on Readcodes at levels 1-3.
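The ratio variables R1 and R2 from Section II.D can be checked against the tabulated values with a short sketch. The function names are hypothetical. The `as_percent` option is an assumption based on the reported figures: with N = 14905, the tabulated R2 values appear to equal 100 N_A / N (for example, 1095 / 14905 is about 7.35%).

```python
def ratio_r1(n_before, n_after):
    """R1: patients with the event after the drug relative to before.

    When no patient had the event before, R1 falls back to the count after.
    """
    return n_after / n_before if n_before != 0 else float(n_after)


def ratio_r2(n_after, n_total, as_percent=True):
    """R2: share of the whole drug-taking population with the event afterwards."""
    value = n_after / n_total
    return 100 * value if as_percent else value
```

For the event 'Chronic kidney disease stage 3' (N_B = 185, N_A = 1095), `round(ratio_r1(185, 1095), 2)` gives 5.92 and `round(ratio_r2(1095, 14905), 2)` gives 7.35.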
IV. CONCLUSIONS
In this study we propose a novel method to detect ADRs using a feature matrix and feature selection. A feature matrix, which characterizes the medical events before and after patients take drugs, is created from the THIN database. Student's t-test is used as the feature selection method to detect the significant features from thousands of medical events. The significant ADRs, which correspond to the significant features, are then detected. Experiments are performed on the drug Simvastatin. Compared to other computerized methods, our proposed method achieves good performance.
REFERENCES

[1] G. Severino and M.D. Zompo, "Adverse drug reactions: role of pharmacogenomics," Pharmacological Research, vol. 49, pp. 363-373, 2004.
[2] Y. Qian, X. Ye, W. Du, J. Ren, Y. Sun, H. Wang, B. Luo, Q. Gao, M. Wu, and J. He, "A computerized system for signal detection in spontaneous reporting system of Shanghai China," Pharmacoepidemiology and Drug Safety, vol. 18, pp. 154-158, 2009.
[3] K.S. Park and O. Kwon, "The state of adverse event reporting and signal generation of dietary supplements in Korea," Regulatory Toxicology and Pharmacology, vol. 57, pp. 74-77, 2010.
[4] R. Kasliwal, L.V. Wilton, V. Cornelius, B. Aurich-Barrera, and S.A.W. Shakir, "Safety profile of Rosuvastatin - results of a prescription-event monitoring study of 11680 patients," Drug Safety, vol. 30, pp. 157-170, 2007.
[5] R.D. Mann, K. Kubota, G. Pearce, and L. Wilton, "Salmeterol: a study by prescription-event monitoring in a UK cohort of 15,407 patients," J Clin Epidemiol, vol. 49, pp. 247-250, 1996.
[6] R.H. Meyboom, M. Lindquist, A.C. Egberts, and I.R. Edwards, "Signal detection and follow-up in pharmacovigilance," Drug Saf., vol. 25, pp. 459-465, 2002.
[7] A. Bate, M. Lindquist, I.R. Edwards, S. Olsson, R. Orre, A. Lansner, and R.M. De Freitas, "A Bayesian neural network method for adverse drug reaction signal generation," Eur J Clin Pharmacol, vol. 54, pp. 315-321, 1998.
[8] M. Hauben and A. Bate, "Decision support methods for the detection of adverse events in post-marketing data," Drug Discovery Today, vol. 14, pp. 343-357, 2009.
[9] Y. Koh, C.W. Yap, and S.C. Li, "A quantitative approach of using genetic algorithm in designing a probability scoring system of an adverse drug reaction assessment system," International Journal of Medical Informatics, vol. 77, pp. 421-430, 2008.
[10] C. Henegar, C. Bousquet, A.L. Lillo-Le, P. Degoulet, and M.C. Jaulent, "A knowledge based approach for automated signal generation in pharmacovigilance," Stud Health Technol Inform, vol. 107, pp. 626-630, 2004.

TABLE II.
Rank | Readcodes | Medical events | NB | NA | R1 | R2
Level 1-5
1 | 1Z12.00 | Chronic kidney disease stage 3 | 185 | 1095 | 5.92 | 7.35
2 | M03z000 | Cellulitis NOS | 98 | 503 | 5.13 | 3.37
3 | F4C0.00 | Acute conjunctivitis | 113 | 525 | 4.65 | 3.52
4 | N131.00 | Cervicalgia - pain in neck | 140 | 609 | 4.35 | 4.09
5 | H06z000 | Chest infection NOS | 284 | 1201 | 4.23 | 8.06
6 | N143.00 | Sciatica | 83 | 366 | 4.41 | 2.46
7 | F46..00 | Cataract | 40 | 312 | 7.80 | 2.09
8 | 1M10.00 | Knee pain | 198 | 762 | 3.85 | 5.11
9 | A53..11 | Shingles | 41 | 262 | 6.39 | 1.76
10 | C34..00 | Gout | 107 | 381 | 3.56 | 2.56
11 | 1A55.00 | Dysuria | 70 | 308 | 4.40 | 2.07
12 | N245.17 | Shoulder pain | 185 | 717 | 3.88 | 4.81
13 | F45..00 | Glaucoma | 20 | 148 | 7.40 | 0.99
14 | K190.00 | Urinary tract infection, site not specified | 128 | 607 | 4.74 | 4.07
15 | F501.00 | Infective otitis externa | 89 | 372 | 4.18 | 2.50
16 | 1D14.00 | C/O: a rash | 152 | 689 | 4.53 | 4.62
17 | N094K12 | Hip pain | 96 | 461 | 4.80 | 3.09
18 | 1832.11 | Ankle swelling symptom | 34 | 190 | 5.59 | 1.27
19 | 1C9..00 | Sore throat symptom | 97 | 410 | 4.23 | 2.75
20 | B33..11 | Basal cell carcinoma | 42 | 212 | 5.05 | 1.42
Level 1-3
1 | H06..00 | Acute bronchitis and bronchiolitis | 598 | 2221 | 3.71 | 14.90
2 | 1Z1..00 | Chronic renal impairment | 213 | 1286 | 6.04 | 8.63
3 | 171..00 | Cough | 571 | 2192 | 3.84 | 14.71
4 | N24..00 | Other soft tissue disorders | 807 | 2643 | 3.28 | 17.73
5 | N21..00 | Peripheral enthesopathies and allied syndromes | 265 | 1054 | 3.98 | 7.07
6 | H05..00 | Other acute upper respiratory infections | 213 | 1074 | 5.04 | 7.21
7 | M03..00 | Other cellulitis and abscess | 140 | 659 | 4.71 | 4.42
8 | F4C..00 | Disorders of conjunctiva | 147 | 731 | 4.97 | 4.90
9 | 173..00 | Breathlessness | 461 | 1403 | 3.04 | 9.41
10 | 19F..00 | Diarrhoea symptoms | 189 | 861 | 4.56 | 5.78
11 | K19..00 | Other urethral and urinary tract disorders | 221 | 1010 | 4.57 | 6.78
12 | 183..00 | Oedema | 177 | 795 | 4.49 | 5.33
13 | N09..00 | Other and unspecified joint disorders | 355 | 1413 | 3.98 | 9.48
14 | N13..00 | Other cervical disorders | 146 | 648 | 4.44 | 4.35
15 | F46..00 | Cataract | 67 | 435 | 6.49 | 2.92
16 | 1B1..00 | General nervous symptoms | 437 | 1413 | 3.23 | 9.48
17 | J57..00 | Other disorders of intestine | 75 | 361 | 4.81 | 2.42
18 | N14..00 | Other and unspecified back disorders | 246 | 984 | 4.00 | 6.60
19 | 1M1..00 | Pain in lower limb | 228 | 851 | 3.73 | 5.71
20 | 1D1..00 | C/O: a general symptom | 317 | 1278 | 4.03 | 8.57

Variables NB and NA represent the numbers of patients who have one particular medical event before and after they take the drug, respectively. Variable R1 represents the ratio of the number of patients after taking the drug to the number of patients before taking the drug. Variable R2 represents the ratio of the number of patients after taking the drug to the number of the whole population.

TABLE III. THE TOP 20 ADRS FOR SIMVASTATIN BASED ON DESCENDING ORDER OF R1 VALUE.
Rank | Readcodes | Medical events | NB | NA | R1 | R2
Level 1-5
1 | 1Z1E.00 | Chronic kidney disease stage 3A without proteinuria | 0 | 40 | 40.00 | 0.27
2 | Eu32000 | [X]Mild depressive episode | 1 | 39 | 39.00 | 0.26
3 | C106.00 | Diabetes mellitus with neurological manifestation | 1 | 39 | 39.00 | 0.26
4 | Eu32100 | [X]Moderate depressive episode | 0 | 27 | 27.00 | 0.18
5 | SK17100 | Other leg injury | 1 | 26 | 26.00 | 0.17
6 | H120.11 | Catarrh unspecified | 0 | 26 | 26.00 | 0.17
7 | S646000 | Minor head injury | 2 | 51 | 25.50 | 0.34
8 | H25..00 | Bronchopneumonia due to unspecified organism | 1 | 24 | 24.00 | 0.16
9 | 173L.00 | MRC Breathlessness Scale: grade 5 | 0 | 22 | 22.00 | 0.15