Development of a Machine Learning Model and Mobile Application to Aid in Predicting Dosage of Vitamin K Antagonists Among Indian Patients
Amruthlal M, Devika S, Ameer Suhail P A, Aravind K Menon, Vignesh Krishnan, Alan Thomas, Manu Thomas, Sanjay G, Lakshmi Kanth L R, Jimmy Jose, Harikrishnan S
Department of Computer Science and Engineering, National Institute of Technology Calicut, India
Sree Chitra Tirunal Institute for Medical Sciences and Technology, Thiruvananthapuram, India
Abstract.
Patients who undergo mechanical heart valve replacement or have conditions such as atrial fibrillation have to take Vitamin K Antagonist (VKA) drugs to prevent blood coagulation. These drugs have a narrow therapeutic range and need to be monitored very closely because of life-threatening side effects. The dosage of a VKA drug is determined and revised by a physician based on the Prothrombin Time - International Normalised Ratio (PT-INR) value obtained through a blood test. Our work aimed at predicting the maintenance dosage of warfarin, currently the most widely recommended anticoagulant drug, using de-identified medical data collected from 109 patients from Kerala. A Support Vector Machine (SVM) Regression model was built to predict the maintenance dosage of warfarin for patients who have been undergoing treatment from a physician and have reached stable INR values between 2.0 and 4.0.
Keywords:
Cardiac Valve Replacement - Mechanical Heart Valve - Atrial Fibrillation - Vitamin K Antagonists (VKA) - Prothrombin Time-International Normalised Ratio (PT-INR) - Warfarin - Support Vector Machine (SVM) Regression - Algorithm - Artificial Intelligence - Machine Learning.
Patients who undergo cardiac procedures such as mechanical heart valve replacement, and those who have atrial fibrillation, require oral anticoagulant (OAC) drugs, mostly Vitamin K antagonists (VKA), to prevent blood clotting. The dosage of VKA drugs is monitored by physicians by observing PT-INR (Prothrombin Time International Normalised Ratio) values obtained through a blood test. VKA drugs have a very narrow therapeutic range [1], so they require very close monitoring, which normally requires the services of a physician.
* The current work is the result of a Memorandum of Understanding between National Institute of Technology Calicut and Sree Chitra Tirunal Institute for Medical Sciences and Technology, Thiruvananthapuram.
Amruthlal M et al.
Developing countries like India have disparities in access to healthcare, and patients often find it difficult to obtain the services of a physician. Patients have to travel long distances to meet a physician, show the results of the PT-INR test and alter the dosage of the drugs if necessary. The cost of travelling, and physical disabilities that prevent patients from travelling, make recurrent blood tests a tedious process. This leads some patients to avoid the test, which can result in bleeding or in the formation of blood clots inside blood vessels (thrombosis).
Of late, the PT-INR test has become widely available in many small laboratories, even in villages, and also as point-of-care (POC) devices that are readily available and can be used at home. If the warfarin dosage can be predicted from PT-INR results using a handheld device such as a mobile phone, or a computer-based application, it will be useful to people with limited healthcare access.
Identifying a stable dose of warfarin just after initiation of the drug is a tedious process that requires the supervision of a physician and is usually done before the patient is discharged, for example after valve surgery [1].
Our work focuses on predicting the maintenance dosage in patients who are on follow-up at home and have had stable INR readings for some time.
Studies in the field have shown that incorporating pharmacogenomic data can lead to higher accuracy than clinical models in warfarin dosage prediction [2]. However, the genetic information used in pharmacogenetic models is difficult to obtain, and the gene variant with the biggest influence on prediction varies between populations [3]. We therefore set out to develop a simple algorithm based on the previous and current INR values and the last drug dose, as required for the different indications.
This paper is organized as follows. Section 2 discusses existing literature on clinical and pharmacogenetic warfarin dosage prediction. Section 3 provides details of the preliminary analysis of the dataset used for this work. Section 4 discusses the different machine learning models that were tried out. Section 5 describes the algorithm used to convert the predicted daily dosage (a decimal value) into a weekly dosage sequence (integer values) that matches current clinical practice. Section 6 describes the client application designed to be used by patients and doctors for dosage prediction.
As mentioned in the introduction, finding the required dosage of VKAs for a particular patient begins just after surgery, while he/she is in the hospital, and the patient is discharged on this dose. The patient then checks PT-INR periodically and adjusts the dose, initially every week for one month, then fortnightly for two months and monthly thereafter.
Machine learning model for anti-coagulation guidance
There have been many attempts to use machine learning to predict the VKA dose requirement. Sharabiani et al. [4] developed a methodology to predict the initial dosage of warfarin for new patients. The patients were classified into two groups: those who require more than 30 mg of warfarin per week and those requiring less than 30 mg per week. The classification was done using Relevance Vector Machines (RVM). For each class, customised regression models were developed to predict the dosage for each patient in that class. The dataset used in this approach was multi-ethnic and contained continuous variables such as body surface area (BSA) and PT-INR, along with categorical variables such as gender, race, presence of diabetes and history of smoking, with the most weight given to BSA. The prediction error was 11.6 in terms of root mean squared error (RMSE), which is too high for practical use. One reason for the high error is the strong dependence of warfarin dose requirements on demographic factors [5]; hence, training the model on a multi-ethnic dataset can lower its accuracy.
Another study, by Schelleman et al. [3], aimed to develop a dosage prediction algorithm for Caucasians and African Americans by taking into account clinical, environmental and genetic factors, and compared the results with an empirical maintenance dose of 5 mg per day. The dataset included variables such as age, gender and body surface area, and had information about variants in the CYP2C9 and VKORC1 genes, which are responsible for the metabolism and action of VKA drugs.
Separate models were built for Caucasians and African Americans. In both models, the highest weight was given to the VKORC1 gene variant variable. The model for Caucasians obtained better accuracy than the model for African Americans; the authors suggested that this might be because they had not considered certain gene variants that might be more important than the VKORC1 variant. Since the model focused primarily on Caucasians and African Americans, it may not be applicable to the Indian population, and we do not have genetic data for all our patients.
The members of the International Warfarin Pharmacogenetics Consortium (IWPC) developed a pharmacogenetic prediction model [2] to predict a stable therapeutic dose of warfarin, and compared the results with those of a model that considered only clinical factors and a model that gave a fixed dosage to all individuals. The dataset contained data of individuals from 9 countries and 4 continents whose target INR was between 2 and 3. Genotypic variables were taken into account along with variables such as age, race and height. For each of the three cases considered, the model with the lowest mean absolute prediction error was chosen as the best model. The three cases are:
1. a model taking into account the pharmacogenetic factors,
2. a model taking into account only the clinical factors, and
3. a model that gave a fixed dose of 5 mg of warfarin per day.
The performance of the algorithms was evaluated in three dose groups: patients requiring less than 21 mg per week, those requiring more than 49 mg per week, and those requiring between 21 mg and 49 mg per week. The algorithm was developed using least-squares linear regression, with the square root of the required dose as the predicted quantity. The model incorporating the pharmacogenetic factors predicted the dosage most accurately, followed by the model using only the clinical factors.
It was observed that in most cases the physician predicts the dosage without considering the patient's genotypic data, which is not usually available. Genetic factors have a greater influence on the prediction of the initial dose than on the prediction of the maintenance dose.
Considering the factors discussed above, there is a need for a better, more localised approach to developing the required algorithm. There are no predictive algorithms specific to Indian patients. We attempted to develop an algorithm using de-identified patient data obtained from Sree Chitra Tirunal Institute for Medical Sciences and Technology (SCTIMST).
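The square-root modelling used by the IWPC and its three evaluation groups can be illustrated with a small sketch. It is not the IWPC algorithm: it assumes a single hypothetical predictor `x` instead of the many clinical and genetic covariates in [2], and it assumes that exactly 21 and 49 mg/week fall into the intermediate group.

```python
import math
from typing import Sequence, Tuple


def fit_sqrt_dose(x: Sequence[float], weekly_doses: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares of sqrt(weekly dose) on one predictor.

    Returns (slope, intercept) on the square-root scale.
    """
    ys = [math.sqrt(d) for d in weekly_doses]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(ys) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_weekly_dose(slope: float, intercept: float, x_new: float) -> float:
    """Predict on the square-root scale, then square to get mg/week."""
    return (intercept + slope * x_new) ** 2


def dose_group(weekly_dose_mg: float) -> str:
    """Assign a weekly dose to one of the three evaluation groups (boundaries assumed inclusive)."""
    if weekly_dose_mg < 21:
        return "low"
    if weekly_dose_mg > 49:
        return "high"
    return "intermediate"


slope, intercept = fit_sqrt_dose([0, 1, 2], [4, 9, 16])
dose = predict_weekly_dose(slope, intercept, 3)
print(dose, dose_group(dose))
```

Fitting on the square-root scale and squaring afterwards keeps predicted doses non-negative and compresses the long right tail of high-dose patients.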
This project was carried out in collaboration between National Institute of Technology Calicut (NITC) and SCTIMST. SCTIMST provided NITC with the de-identified data of 109 Indian patients attending the INR Clinic of SCTIMST. The data model parameters are described in Table 1.
Table 1: Data parameters description
S.No  Parameter        Description
1     Age              Age of the patient
2     Old INR value    INR value of the patient before the PT-INR test
3     New INR value    INR value of the patient after the PT-INR test
4     Old dosage       Old warfarin dosage of the patient (in mg)
5     Gender           Gender of the patient
6     Procedure type   Type of procedure the patient had undergone: MVR, DVR, AVR or AF
7     New dosage       New warfarin dosage prescribed by the doctor after the INR test (in mg)
MVR - Mitral Valve Replacement, AVR - Aortic Valve Replacement, DVR - Double Valve Replacement, AF - Atrial Fibrillation, INR - International Normalised Ratio
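One way to hold a row of Table 1 in code is an immutable record with basic validation. This is a sketch only: the field names are hypothetical and are not taken from the authors' implementation.

```python
from dataclasses import dataclass
from typing import Tuple

# Procedure codes listed in Table 1.
VALID_PROCEDURES = {"MVR", "DVR", "AVR", "AF"}


@dataclass(frozen=True)
class PatientRecord:
    age: int               # years
    old_inr: float         # INR before the current PT-INR test
    new_inr: float         # INR from the current PT-INR test
    old_dosage_mg: float   # previous warfarin dosage (mg)
    gender: str
    procedure: str         # one of MVR, DVR, AVR, AF
    new_dosage_mg: float   # dose prescribed after the test (prediction target)

    def __post_init__(self) -> None:
        if self.procedure not in VALID_PROCEDURES:
            raise ValueError(f"unknown procedure type: {self.procedure}")
        if self.old_dosage_mg < 0 or self.new_dosage_mg < 0:
            raise ValueError("dosages must be non-negative")

    def inputs(self) -> Tuple[int, float, float, float, str, str]:
        """Model inputs (parameters 1-6 of Table 1), excluding the target."""
        return (self.age, self.old_inr, self.new_inr,
                self.old_dosage_mg, self.gender, self.procedure)


record = PatientRecord(58, 2.1, 2.8, 3.0, "F", "MVR", 3.0)
print(record.inputs())
```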
A preliminary analysis was conducted on the dataset to identify patterns and possible biases in the data. The frequency distributions of the parameters were plotted to identify shortcomings in the dataset; the plots are given in Figures 1 to 5.
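The counts behind such frequency plots can be produced with the standard library alone. This minimal sketch assumes the column values are already in Python lists. The bin width for numeric columns is an illustrative choice, and plotting is omitted.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Optional


def frequency_table(values: Iterable, bin_width: Optional[float] = None) -> Dict:
    """Count occurrences of each value.

    Categorical values are counted as-is. Numeric values are grouped into
    bins of width `bin_width`, each labelled by its lower edge.
    """
    if bin_width is None:
        counts = Counter(values)
    else:
        counts = Counter(math.floor(v / bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


print(frequency_table(["M", "F", "M"]))              # gender column
print(frequency_table([12, 45, 47, 63, 94], 10))     # ages grouped by decade
```

Non-integer bin widths (for example 0.5 for INR) can introduce small floating-point artefacts in the bin labels.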
Gender Distribution
The dataset has adequate representation of the male and female populations. Data points from other genders are absent, which may lead to inaccurate results for patients of other genders. See Figure 1.
Procedure Type Distribution
Patients who underwent AVR and MVR procedures are well represented. A shortage of data from the AF and DVR categories can be seen in Figure 2.
Fig. 1: Gender distribution    Fig. 2: Procedure type distribution
Age Distribution
Patients aged 12 to 94 are present in the dataset, and patients in the 40 to 70 age group are generally well represented. See Figure 3.
Fig. 3: Age distribution    Fig. 4: Old INR distribution
Old INR Value Distribution
The dataset contains adequate data points with old INR values ranging from 1.7 to 5.5. See Figure 4.
New INR Value Distribution
The dataset contains sufficient data points with new INR values ranging from 1.6 to 5.5. See Figure 5.
Fig. 5: New INR distribution
Data preprocessing was performed to improve the accuracy of the model. In the dataset, the old and new dosages are given as a daily dosage for some patients and as a dosage sequence for others. The dosage sequence varies from a fixed single daily dose to sequential dosing (two-day to four-day sequences; e.g. 2 mg, 3 mg, 3 mg, with the cycle repeating every third day). To make the dataset uniform, each sequence is converted to an average daily dosage by dividing the sum of the sequence by the number of days.
Some patient records also contain dosages of acenocoumarol (Acitrom), another commonly used oral VKA drug. Doses are converted between the two drugs using a factor of two based on clinical data [6]: a warfarin dose divided by two gives the equivalent Acitrom dose.
The categorical parameters in the dataset (gender and procedure type) are encoded using one-hot encoding. One-hot encoding represents each category with its own binary variable, which avoids implicitly assigning extra weight to categories with higher integer labels [7]. Gender is encoded with two binary variables and procedure type with four binary variables.
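The three preprocessing steps can be combined into one function that turns a raw record into a feature vector. This minimal sketch makes several assumptions. The dictionary keys and the category orderings are hypothetical, since the text does not give the column order. The conversion direction (Acitrom × 2 = warfarin equivalent) follows the factor of two described above.

```python
from typing import Dict, List, Sequence

# Assumed category orderings: two gender columns and four procedure columns.
GENDERS = ["F", "M"]
PROCEDURES = ["AVR", "DVR", "MVR", "AF"]
# 1 mg acenocoumarol (Acitrom) ~ 2 mg warfarin.
ACITROM_TO_WARFARIN = 2.0


def average_daily_dose(sequence: Sequence[float]) -> float:
    """Collapse a repeating dose cycle (e.g. 2, 3, 3 mg) into a mean daily dose."""
    if not sequence:
        raise ValueError("empty dose sequence")
    return sum(sequence) / len(sequence)


def to_warfarin_equivalent(dose_mg: float, drug: str) -> float:
    """Express an Acitrom dose as the equivalent warfarin dose."""
    if drug == "warfarin":
        return dose_mg
    if drug == "acitrom":
        return dose_mg * ACITROM_TO_WARFARIN
    raise ValueError(f"unknown drug: {drug}")


def one_hot(value: str, categories: List[str]) -> List[int]:
    """One binary column per category, so no label gets a larger integer weight."""
    if value not in categories:
        raise ValueError(f"unknown category: {value}")
    return [1 if value == c else 0 for c in categories]


def encode_row(row: Dict) -> List[float]:
    """Turn one raw record (hypothetical keys) into a numeric feature vector."""
    old_daily = to_warfarin_equivalent(average_daily_dose(row["old_dose_seq"]), row["drug"])
    return ([row["age"], row["old_inr"], row["new_inr"], old_daily]
            + one_hot(row["gender"], GENDERS)
            + one_hot(row["procedure"], PROCEDURES))


print(encode_row({"age": 55, "old_inr": 2.4, "new_inr": 2.9,
                  "old_dose_seq": [1, 1.5], "drug": "acitrom",
                  "gender": "M", "procedure": "MVR"}))
```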
A general procedure of randomly splitting the dataset into 70% training data and 30% testing data was followed, and a set of machine learning models was applied to evaluate accuracy.
The accuracy of the regression models was compared using the R² value:
$$R^2 = 1 - \frac{u}{v}, \qquad u = \sum_{n=1}^{N} \left( y^{(n)}_{\text{true}} - y^{(n)}_{\text{predicted}} \right)^2, \qquad v = \sum_{n=1}^{N} \left( y^{(n)}_{\text{true}} - y_{\text{mean}} \right)^2$$
where $N$ is the dataset size, $y^{(n)}_{\text{true}}$ is the real warfarin dosage of the $n$-th patient, $y^{(n)}_{\text{predicted}}$ is the predicted warfarin dosage of the $n$-th patient, and $y_{\text{mean}}$ is the mean real warfarin dosage. The best possible value of R² is 1.0.
In statistics, linear regression is a modelling tool used for mapping the relationship between a scalar response and one or more explanatory variables. The regression coefficients obtained in linear regression training are given in Table 2.
Table 2: Regression coefficients of LR model
Feature          Coefficient
Gender            0.268093776
Procedure type   -0.0392217685
Age               0.000704527364
Old INR           0.164906416
New INR          -0.777297970
Old dose          0.917862769
Observations
A variance score of 0.951 and a mean square error of 0.439 were observed, which suggests that the new dosage depends largely linearly on the input features. The coefficients with the largest magnitudes were obtained for old dosage and new INR, suggesting that the new dosage is most strongly related to these parameters.
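The 70/30 split and the R² and mean square error metrics defined above need no external library. This minimal sketch uses an arbitrary fixed seed for the shuffle, and the function names are illustrative.

```python
import random
from typing import List, Sequence, Tuple


def train_test_split(rows: Sequence, test_fraction: float = 0.3,
                     seed: int = 42) -> Tuple[list, list]:
    """Shuffle rows and return (train, test) with ~test_fraction in test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """R^2 = 1 - u/v with u, v as defined in the text."""
    y_mean = sum(y_true) / len(y_true)
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    v = sum((t - y_mean) ** 2 for t in y_true)
    return 1 - u / v


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average squared difference between real and predicted dosages."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


train, test = train_test_split(list(range(109)))
print(len(train), len(test))                        # 109 patients -> 76 / 33
print(r2_score([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))   # predicting the mean gives 0.0
```

A model that always predicts the mean dosage scores R² = 0, so the reported values near 0.95 indicate that most of the variance in the new dosage is explained.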
Support Vector Machines are models trained on the assumption that an optimal hyperplane exists which separates the dataset into different classes [8]. The hyperplane can be represented as
$$f(x) = w \cdot x + b \qquad (1)$$
where $f(x)$ is the optimal hyperplane with normal vector $w$ and intercept $b$. Support vector regression adds the extra condition that $f(x)$ should satisfy $|f(x) - y(x)| \le \epsilon$ on the training data, where $\epsilon$ is the maximum tolerated deviation between predicted and true values. Kernels can be used to map the dataset to different dimensions to obtain better separability. The kernels tested were:
1. Linear kernel with ε = 0.01
2. Polynomial kernel with degree = 2 and ε = 0.01
3. Radial Basis Function kernel with γ = 0.1 and ε = 0.01
The optimal ε value for training was found to be 0.01 through k-fold cross-validation. The dataset was randomly split into 10 groups (k = 10); in each round, one group was used as the testing set and the rest as the training set. Training was conducted with ε values of 0.001, 0.01, 0.1, 1.0 and 10, and ε = 0.01 gave the best average variance score.
Table 3: Coefficients of linear SVR model
Feature     Coefficient
Sex          0.95736124
Procedure   -0.00674263
Age          0.13047947
Old INR      1.36002697
New INR     -3.26086646
Old dose     0.955252
Observations
The linear kernel model gave the lowest mean square error (0.41) and a variance score of 0.955. The coefficients of the linear SVR are given in Table 3.
The predicted daily dosage of warfarin is a decimal value with sub-milligram precision, whereas warfarin tablets are available only in whole-milligram doses. Hence, an algorithm was designed to convert the decimal daily dosage (in mg) into a weekly sequence of integer doses. The algorithm takes the predicted daily dosage and the sequence length, and produces the integer sequence of the given length whose total deviates least from the predicted daily dosage multiplied by the sequence length.
1. Initialize lower-bound = floor(daily-dosage)
2. Initialize upper-bound = ceil(daily-dosage)
3. Initialize predicted-sequence = lower-bound repeated sequence-length times, and previous-sequence = predicted-sequence
4. Initialize target-sum = daily-dosage × sequence-length
5. while (lower-bound < upper-bound and sum(predicted-sequence) − target-sum ≤ 0) do
   – previous-sequence = predicted-sequence
   – replace the last occurrence of lower-bound in predicted-sequence with upper-bound
6. Return whichever of previous-sequence and predicted-sequence minimizes abs(sum(sequence) − target-sum)
The condition lower-bound < upper-bound in step 5 prevents an endless loop when the predicted daily dosage is already an integer.
The application was developed on the Android platform. Since the app is intended for use by the general public, care was taken to make it simple and user-friendly. When the mobile application is loaded, the user first authenticates with a username and password, which leads to the main page. The main page has fields for the patient's age, gender, old INR value, new INR value and old dosage, which the user has to fill in. The initial entry is to be done by hospital staff, after which the patient needs to enter only the old INR value, new INR value and old dosage. The old dosage can be entered as a single value or as a set of values. When the user clicks the Predict button and all entered values are valid, the user is directed to the output page, which shows the warfarin dosage of the patient for the next week.
The initial version of the application was tested in the heart failure clinic of SCTIMST, and many modifications were made based on the feedback. A window to enter the doses of three consecutive days and a provision to enter the drug acenocoumarol (Acitrom) were added. Acitrom doses are converted to warfarin doses by multiplying by a factor of 2, as Acitrom is more potent. The current version of the app displays the drug doses for a week starting from the date of entry, not in descending order as before (e.g. 3 mg, 3 mg, 2 mg, 2 mg, 2 mg, 2 mg, 2 mg) but with varying doses interspersed (e.g. 2 mg, 3 mg, 2 mg, 3 mg, 2 mg, 2 mg, 2 mg).
Fig. 6: Login screen    Fig. 7: Input screen    Fig. 8: Output screen
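The ε-tube condition and the 10-fold search over ε described in Section 4 can be sketched in plain Python. This is a minimal sketch, not the authors' training code. `evaluate` is a hypothetical callback that would fit a model (for example a linear SVR) on the training indices and return a score, such as the variance score on the held-out fold. The random seed is an arbitrary choice.

```python
import random
from typing import Callable, Iterator, List, Optional, Sequence, Tuple


def epsilon_insensitive_loss(y_true: float, y_pred: float, epsilon: float) -> float:
    """Zero inside the tube |f(x) - y| <= epsilon, growing linearly outside it."""
    return max(0.0, abs(y_pred - y_true) - epsilon)


def k_fold_splits(n: int, k: int = 10,
                  seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Randomly partition indices 0..n-1 into k folds and yield (train, test) per fold."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def select_epsilon(n: int, epsilons: Sequence[float],
                   evaluate: Callable[[List[int], List[int], float], float],
                   k: int = 10) -> Optional[float]:
    """Return the epsilon with the best mean score over the same k folds."""
    best_eps, best_score = None, float("-inf")
    for eps in epsilons:
        scores = [evaluate(train, test, eps) for train, test in k_fold_splits(n, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_eps, best_score = eps, mean_score
    return best_eps


# Toy scorer standing in for "train an SVR and compute its variance score".
print(select_epsilon(109, [0.001, 0.01, 0.1, 1.0, 10],
                     lambda train, test, eps: -abs(eps - 0.01)))
```

Reusing the same seed for every ε means each candidate is scored on identical folds, so differences in the mean score reflect ε rather than the random split.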
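The dose-to-sequence pseudocode in Section 5 can be transcribed directly into Python, including the guard for integer daily doses. The `weekly_plan` wrapper is a hypothetical helper. It encodes the rule that predictions are offered only for INR values between 2 and 4, and treating those boundaries as inclusive is an assumption.

```python
import math
from typing import List, Optional


def dose_sequence(daily_dose_mg: float, length: int = 7) -> List[int]:
    """Integer dose sequence whose total is closest to daily_dose_mg * length."""
    lower = math.floor(daily_dose_mg)
    upper = math.ceil(daily_dose_mg)
    target = daily_dose_mg * length
    sequence = [lower] * length
    previous = list(sequence)
    # Each pass raises the total by exactly 1 mg, because upper = lower + 1
    # whenever the dose is not an integer.
    while lower < upper and sum(sequence) - target <= 0:
        previous = list(sequence)
        last = length - 1 - sequence[::-1].index(lower)  # last occurrence of lower
        sequence[last] = upper
    return min(previous, sequence, key=lambda s: abs(sum(s) - target))


def weekly_plan(daily_dose_mg: float, new_inr: float) -> Optional[List[int]]:
    """Return a 7-day plan, or None to signal referral to a physician."""
    if not 2.0 <= new_inr <= 4.0:
        return None
    return dose_sequence(daily_dose_mg)


print(dose_sequence(2.3))      # [2, 2, 2, 2, 2, 3, 3]
print(weekly_plan(2.3, 4.5))   # None: INR outside 2-4
```

For a predicted 2.3 mg/day, the target weekly total is 16.1 mg and the sketch returns five 2 mg days followed by two 3 mg days (16 mg). The interspersed display order used by the current app is not reproduced here.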
Various machine learning models were compared, and the linear SVM regression model with the coefficients given in Table 3 gave the best variance score and accuracy. Regression models with a linear base generally outperformed the other models in prediction. The new INR value and the old dosage were found to be the principal contributors to the prediction of the new dosage. The daily dosage predicted by the server is stable and usable within the stable INR range of two to four. For INR values below 2 or above 4, it was decided to refer the patient to a physician, as he/she may need urgent medical help. The weekly prediction model also provides reasonable accuracy, with off-by-one errors in boundary cases. The weekly model is going to be tested in the INR clinic of SCTIMST to further improve its accuracy. A variance score of 0.955 was obtained for daily prediction, with a mean square error of 0.41.
The application was pretested using a different set of 50 physician-assigned dose values in the INR Clinic of SCTIMST. The application predicted the values accurately in the lower INR ranges, but towards the higher range (INR 3.5-4) its predictions differed from the physician-assigned values by up to 5 mg in weekly doses. This indicates the need for further refinement of the algorithm, probably by re-training it with more physician-derived data. We are planning to use 500 more data points to try to improve the accuracy of the algorithm.
Once the accuracy has been improved as described above and the values are found to be in an acceptable range compared to physician-derived values, we will approach the ethics committee of SCTIMST to test the efficacy and utility of this application in patients attending the INR clinic of SCTIMST. Once the efficacy and utility are proven, it will be released to the public.
10 Conclusion
A warfarin dosage prediction algorithm was developed using data from Indian patients.
The linear regression model and the support vector regression model were tested, and the support vector regression model was found to give better results, with the lowest mean square error of 0.41 and a variance score of 0.955. A mobile application was developed using the algorithm and is going to be tested in the INR clinic of SCTIMST on a larger group of patients for ease of use and accuracy. After testing, the application can be used to predict the daily and weekly dosage of warfarin and acenocoumarol for patients without consulting a physician. Patients on these oral anticoagulants in remote areas can use this application installed on their mobile phones.
References
1. Clive Kearon, Jeffrey S. Ginsberg, Michael J. Kovacs, David R. Anderson, Philip Wells, Jim A. Julian, Betsy MacKinnon, Jeffrey I. Weitz, Mark A. Crowther, Sean Dolan, Alexander G. Turpie, William Geerts, Susan Solymoss, Paul van Nguyen, Christine Demers, Susan R. Kahn, Jeannine Kassis, Marc Rodger, Julie Hambleton, and Michael Gent. Comparison of Low-Intensity Warfarin Therapy with Conventional-Intensity Warfarin Therapy for Long-Term Prevention of Recurrent Venous Thromboembolism. New England Journal of Medicine, 349(7):631–639, 2003.
2. T. E. Klein, R. B. Altman, N. Eriksson, B. F. Gage, S. E. Kimmel, M.-T. M. Lee, N. A. Limdi, D. Page, D. M. Roden, M. J. Wagner, M. D. Caldwell, and J. A. Johnson. Estimation of the Warfarin Dose with Clinical and Pharmacogenetic Data. New England Journal of Medicine, 360(8):753–764, 2009.
3. H. Schelleman, J. Chen, Z. Chen, J. Christie, C. W. Newcomb, C. M. Brensinger, M. Price, A. S. Whitehead, C. Kealey, C. F. Thorn, F. F. Samaha, and S. E. Kimmel. Dosing algorithms to predict warfarin maintenance dose in Caucasians and African Americans. Clin Pharmacol Ther, 84(3):332–339, 2008.
4. Ashkan Sharabiani, Adam Bress, Elnaz Douzali, and Houshang Darabi. Revisiting Warfarin Dosing Using Machine Learning Techniques. Computational and Mathematical Methods in Medicine, page 9, 2015.
5. Steven Lane, Sameh Al-Zubiedi, Ellen Hatch, Ivan Matthews, Andrea L. Jorgensen, Panos Deloukas, Ann K. Daly, B. Kevin Park, Leon Aarons, Kayode Ogungbenro, Farhad Kamali, Dyfrig Hughes, and Munir Pirmohamed. The population pharmacokinetics of R- and S-warfarin: Effect of genetic and clinical factors.