A Minimal-Input Multilayer Perceptron for Predicting Drug-Drug Interactions Without Knowledge of Drug Structure
Alun Stokes, William Hum, Jonathan Zaslavsky
McMaster University
May 22, 2020
Abstract
The necessity of predictive models in the drug discovery industry cannot be overstated. With the sheer volume of potentially useful compounds that are considered for use, it is becoming increasingly computationally difficult to investigate the overlapping interactions between drugs. Understanding these interactions is also important to laypeople who need to know which substances they can and cannot mix, especially those who use recreational drugs, which do not carry the same rigorous warnings as prescription drugs. Without access to deterministic, experimental results for every drug combination, other methods are necessary to bridge this knowledge gap. Ideally, such a method would require minimal inputs, have high accuracy, and be computationally feasible. We have not come across a model that meets all these criteria. To this end, we propose a minimal-input multi-layer perceptron that predicts the interactions between two drugs. This model has the great advantage of requiring no structural knowledge of the molecules in question; instead, it uses only experimentally accessible chemical and physical properties, 20 per compound in total. Using a set of known drug-drug interactions and the associated properties of the drugs involved, we trained our model on a dataset of about , entries. We report an accuracy of 0.965 on unseen samples of interactions between drugs on which the model was trained, and an accuracy of 0.942 on unseen samples of interactions between unseen drugs. We believe this to be a promising and highly extensible model that has potential for high generalized predictive accuracy with further tuning.

Keywords: drug interaction, neural network, minimal-input model, multi-layer perceptron, classification
With the rapid technological advances in the drug discovery process, as well as the prevalence of increasingly many drugs in our lives, understanding the ways in which drugs affect one another in the body is a pressing problem. Not only do we need to delineate the individual effects of chemicals in the body, we must also be able to identify the potential for interaction in order to better design these compounds. Exhaustively testing every drug-drug combination is both cost- and time-prohibitive due to the sheer volume of compounds to be tested, making optimized methods necessary to perform proper analysis [3]. A practical approach for identifying potential drug interactions and assessing the associated risks has emerged in recent years with the development of predictive pharmacological networks [1]. The applications of such methods are far-reaching, both in the process of drug discovery and in the societal good that would come from easy, dynamic access to tools that allow laypeople to better control their health. Because recreational drug use is prevalent, many drug-related ailments result from mixing drugs that should not be mixed. As such, developing accessible, fast models to determine these interactions is of great importance to our society [3]. In this paper, we propose the use of machine learning methods and known drug-drug interactions to predict interactions in unknown drug pairings, as well as interactions involving drugs on which the model has not been trained.

One of the largest barriers to predicting drug-drug interactions is the complexity of the factors associated with an interaction. As such, many deterministic or exhaustive methods fail the test of feasibility due to their computational complexity. Even a problem such as codifying the molecular structure of a chemical produces a large amount of information to be processed, and this processing is non-trivial in its own right.
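To make the scale of exhaustive testing concrete: with n drugs there are n(n-1)/2 unordered pairs, so the number of combinations grows quadratically with the size of the drug library. The short Python sketch below only illustrates this arithmetic; the library sizes are hypothetical examples, not figures from our dataset.

```python
def unordered_pairs(n_drugs: int) -> int:
    """Number of distinct unordered drug pairs among n_drugs compounds."""
    return n_drugs * (n_drugs - 1) // 2

# Hypothetical library sizes, chosen only to show the quadratic growth.
for n in (100, 1_000, 10_000):
    print(f"{n:>6} drugs -> {unordered_pairs(n):>12,} pairs")
```

Multiplying the number of drugs by ten multiplies the number of pairs to consider by roughly one hundred.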
Therefore, any strategy to predict these interactions reasonably and at scale needs to do so from a minimal set of parameters, with limited computational complexity. Our model fits these requirements: each drug molecule is parameterized by 20 floating-point values, and all of the necessary computation consists of matrix operations that are hardware-accelerated on modern computers. This leads to a model which, on an average consumer laptop processor (Intel® Core i5-8259U), can predict interactions for over 10,000 drug pairs per second, with ≈ . accuracy.

This model determines the binary condition of interactivity, rather than categorizing the type or location of the interaction between two drugs. The latter is, however, expected to follow naturally from the presented model and would require only superficial adjustments. The reason for using a less specific model in this paper is two-fold: the limited accessibility of a dataset that characterizes the location and types of interactions, and the amount of time allotted for the completion of this project. We do not, however, feel that this is a weakness or limitation of the model.

The drug-drug interaction dataset used in this paper was provided by the Science and Engineering of Emerging Systems Lab [2], who collected their data from DrugBank. The molecular properties of each drug were obtained from PubChem [4], a database maintained by the National Center for Biotechnology Information.

The primary dataset used in this paper [2] is a line-separated compilation of lists of drugs that are known to interact with one another. There are approximately , unique species documented, and about , individual drug-drug interactions. Certain parts of the dataset were unusable, as they specified macromolecule-drug interactions, and we are only interested in small-molecule drugs.
Additionally, certain compounds did not have entries in [4] that could be programmatically downloaded and processed, so those interactions were discarded. This left about , drug-drug interactions for the training, validation, and testing of the model.

A larger and more generalized set would help the model learn the important factors in a drug-drug interaction. As such, several data augmentation techniques were applied to the set in order to increase its size without compromising the accuracy or validity of its information. First, since we only had records of drug-drug interactions, we needed to generate a set of drug-drug non-interactions. Given that this dataset lists all known interacting drug pairs, a list of non-interacting pairs could be generated by pairing two drugs from the set at random and checking that the pair does not appear in the original dataset. This doubled the size of the original dataset; doubling was chosen to keep the training categories evenly distributed, which improves training performance and avoids the need to weight the training rates per category. Next, each 2-tuple of interacting drugs was reversed and added back into the list, again doubling the dataset. The order in which the two drugs are fed into the model should not affect whether they interact, so the model should be trained to find an interaction between two drugs regardless of which is listed first. This brought the size of the dataset up to approximately , drug pairs. This set was then shuffled in order to prevent the model from learning any patterns present in the original ordering of the data. Of this full set, is set aside for validation during training, and a further composes the testing set.
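The augmentation steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not our actual pipeline: the function name `augment` and the placeholder drug names are hypothetical, negatives are drawn by rejection sampling of random unordered pairs, and every pair (positive and negative) is added again in reversed order, which is one reading of the text that keeps the dataset doubling and the classes balanced. A fixed seed makes runs comparable.

```python
import random

def augment(interactions, seed=0):
    """Build a shuffled list of (drug_a, drug_b, label) samples.

    interactions: list of (drug_a, drug_b) pairs known to interact (label 1).
    """
    rng = random.Random(seed)  # fixed seed so runs are comparable
    known = {frozenset(p) for p in interactions}
    drugs = sorted({d for pair in interactions for d in pair})

    # Negative sampling: draw random unordered pairs not known to interact,
    # one per positive, so the two classes stay balanced.
    capacity = len(drugs) * (len(drugs) - 1) // 2 - len(known)
    if capacity < len(interactions):
        raise ValueError("too few non-interacting pairs to balance classes")
    negatives = set()
    while len(negatives) < len(interactions):
        pair = frozenset(rng.sample(drugs, 2))
        if pair not in known:
            negatives.add(pair)

    samples = [(a, b, 1) for a, b in interactions]
    # Sort first so the result does not depend on set iteration order.
    samples += [(*sorted(p), 0) for p in sorted(negatives, key=sorted)]

    # Order invariance: append every pair again with the drugs swapped.
    samples += [(b, a, label) for a, b, label in samples]

    rng.shuffle(samples)
    return samples
```

For example, `augment([("A", "B"), ("B", "C"), ("D", "E")], seed=1)` returns 12 samples, six of each label, containing both `("A", "B", 1)` and `("B", "A", 1)`.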
The validation and testing subsets are chosen randomly, although for the purposes of comparing models the generator seed was fixed in order to allow comparable statistics.

Once the interaction set was processed, information about the individual compounds needed to be associated with each drug in an interaction. This data was obtained from [4], and certain attributes were extracted and assigned to the interacting molecules to be fed into the model. The full list of extracted properties can be seen in Table 1. Note that although these are mostly integer-valued parameters, for the purposes of computation they are treated as floating-point values.

Table 1: Chemical Parameters for Input into Model (columns: Parameter, Type)

The neural network is a feed-forward network, meaning that propagation only goes in the forward direction. Specifically, it is a multi-layer perceptron with numerous hidden layers. Given the assumed non-linearity of this problem, at least several latent layers were necessary to predict interactions with any degree of accuracy. Each latent layer is either a fully-connected (FC) layer, a dropout layer, or a batch normalization (BN) layer. The network follows a diamond-like structure for the number of nodes per layer, starting at 256, increasing over several layers to 512, and then decreasing over several layers to 128 before the final output. Every hidden FC layer is followed by a BN layer. Table 2 breaks down each layer of the network. Additionally, layers are grouped together to create a ‘super-layer’ structure composed of two identically sized sets of an FC and a BN layer, all followed by a dropout layer. The purpose of these dropout layers is to prevent the model from ‘memorizing’ the training data to artificially increase its training accuracy. The first and second latent layers, as well as the last latent layer, are not followed by a dropout layer, to prevent early-stage data loss and highly variable classification, respectively.
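As a sketch of how a pair could be presented to the network, each drug's properties can be flattened into a fixed-order list of floats and the two lists concatenated; with 20 properties per drug this gives the 40 inputs listed in Table 2. Because the contents of Table 1 are not reproduced here, the property names below are hypothetical stand-ins (only five are used), and the helper names are our own.

```python
# Hypothetical subset of per-drug properties; the model described uses 20.
PROPERTIES = ["MolecularWeight", "XLogP", "TPSA",
              "HBondDonorCount", "HBondAcceptorCount"]

def drug_vector(props):
    """Flatten one drug's property dict into floats in a fixed order."""
    # Integer-valued counts are cast to float, as described in the text.
    return [float(props[name]) for name in PROPERTIES]

def pair_vector(props_a, props_b):
    """Concatenate two drugs' vectors into a single network input."""
    return drug_vector(props_a) + drug_vector(props_b)
```

Swapping the two arguments produces a different input vector, which is why reversed pairs are added to the training data so that the prediction does not depend on input order.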
The FC layers use rectified linear units (ReLUs) for activation, while the output layer uses a sigmoid function. Adam was used as the optimizer, with its parameters left at their defaults. All dropout layers use a drop rate of . , and the total number of trainable parameters in the model is , . Various numbers of epochs were used for training in order to identify the optimal value, and batch sizes between 32 and 256 were tested for the same purpose.

After training, the model is evaluated on the testing set to quantify how well it has learned, i.e. its generalizability. The model is also tested on the interactions (or lack thereof) of drugs upon which it has not been trained, a further test of how well this model actually predicts drug-drug interactions. The data for this last test is obtained from [4].

Table 2: Layers Composing Model
Type      Activation  Output  Parameters
Input     -           40      -
FC        ReLU        256     11520
BN        -           256     1024
FC        ReLU        256     65792
BN        -           256     1024
FC        ReLU        512     131584
BN        -           512     2048
FC        ReLU        512     262656
BN        -           512     2048
Dropout   -           512     -
FC        ReLU        256     131328
BN        -           256     1024
FC        ReLU        256     65792
BN        -           256     1024
Dropout   -           256     -
FC        ReLU        128     32896
BN        -           128     512
FC        ReLU        128     16512
BN        -           128     512
Dropout   -           128     -
FC        ReLU        128     16512
BN        -           128     512
FC        Sigmoid     1       129

In this project, the architecture of the network remained fundamentally unchanged throughout testing after being determined from early tests on several variants. These early tests focused on the number of FC layers in each super-layer, as well as the number of super-layers (i.e. the depth of the network). We also determined how many FC layers were to be included before the first super-layer and after the final one. As can be seen in Table 2, the model performed optimally (in terms of validation accuracy after 100 epochs) with two FC layers, then three super-layers, then a single FC layer before the output. In total, every combination of 1 to 3 FC layers before the super-layers, 1 to 5 super-layers, and 1 to 3 FC layers after the super-layers was tested.
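The layer stack in Table 2 and the depth search described above can be reproduced with a small parameter-counting sketch. We assume Keras-style counts: an FC layer with n_in inputs and n_out units has n_in * n_out + n_out parameters, and a BN layer reports 4 * n parameters (gamma, beta, and the two moving statistics), of which only 2 * n are trainable. The function names are hypothetical. With 40 inputs this formula gives 10,496 parameters for the first FC layer, whereas Table 2 lists 11,520 (which the same formula yields for 44 inputs); every other row of Table 2 matches the formula.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully-connected layer."""
    return n_in * n_out + n_out

def bn_params(n):
    """Keras-style BN count: gamma, beta, moving mean, moving variance."""
    return 4 * n

def build_stack(n_inputs, pre, supers, post):
    """Return (type, activation, units, params) rows for the MLP.

    pre:    widths of the FC+BN layers before the first super-layer
    supers: widths of the super-layers (2 x [FC+BN], then dropout)
    post:   widths of the FC+BN layers after the last super-layer
    """
    rows, width = [], n_inputs

    def fc_bn(units):
        nonlocal width
        rows.append(("FC", "ReLU", units, dense_params(width, units)))
        rows.append(("BN", None, units, bn_params(units)))
        width = units

    for units in pre:
        fc_bn(units)
    for units in supers:
        fc_bn(units)
        fc_bn(units)
        rows.append(("Dropout", None, units, 0))
    for units in post:
        fc_bn(units)
    rows.append(("FC", "Sigmoid", 1, dense_params(width, 1)))
    return rows

# Configuration matching Table 2: two FC layers of 256, super-layers of
# 512, 256 and 128, then one FC layer of 128 before the sigmoid output.
table2 = build_stack(40, pre=[256, 256], supers=[512, 256, 128], post=[128])
for row in table2:
    print(row)

# Depth search: 1-3 layers before, 1-5 super-layers, 1-3 layers after.
search_space = [(p, s, q) for p in range(1, 4)
                for s in range(1, 6) for q in range(1, 4)]
print(len(search_space))  # prints 45
```

Each tuple in `search_space` fixes only the depth; the widths of each layer were varied separately within the diamond pattern.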
Different numbers of neurons in each layer and super-layer were also tested, all with the diamond pattern described earlier. Of the tested configurations, the sizes 256, 512, 256, 128, 128 performed best.

The final, best-performing model used the architecture described above, with uniform dropout rates of . , a batch size of , and training over epochs. The training accuracy curve can be seen in Figure 1. Of particular note is how closely the validation accuracy tracks the training accuracy. When tested on a set of unseen interactions for known drugs, the model scored a binary accuracy of 0.965, lining up very closely with the validation accuracy at epoch of . . When tested on a set of unseen interactions between unknown drugs, the model scored a binary accuracy of 0.942. This small drop in accuracy is expected, since these drugs were never seen during training. A confusion matrix for the known-drug testing set can be seen in Figure 2. The model itself is relatively small, with the whole architecture and learned weights coming out to about MB in size.

Figure 1: A plot of the training and validation accuracy over epochs for a model trained on approximately , samples with a batch size of . Training took about minutes. The testing accuracy for this model is 0.965 on unseen interactions, and 0.942 on unseen drugs.

In training a slew of models based on the determined architecture, we were able to tune the hyperparameters, including the dropout rate, the output size of the FC layers, the activation functions, and the learning rate, and to obtain the best performance from our chosen architecture. The architecture is simple, but that simplicity is what makes the model as efficient as it is. Training over epochs on about , interactions takes just under two hours on two Nvidia® P100 GPUs, quite a short time in this field. This makes the model highly extensible to much larger datasets while staying within reasonable training times, even without optimal hardware. The potential for models such as this to find a purpose outside dedicated research laboratories hinges on their efficiency. To be used in a consumer-facing application that predicts drug interactions, this model needs to be accessible outside of the field in terms of necessary computational power, ease of use, and ease of access to the necessary information. We believe that our model meets these criteria.

Figure 2: A confusion matrix showing the correct and incorrect predictions of drug-drug interactions over the seen-drugs testing set. The column-wise contrast of colour intensity represents the statistical power and specificity of . and . respectively.

Most notable among our results are the final accuracies this model was able to achieve for both known and unknown drugs. The latter was lower, at 0.942, but this is to be expected when the model must generalize to new drugs. The fact that the drop in accuracy was so small (just over 0.02) is a great encouragement regarding the efficacy of this model at predicting drug-drug interactions. It appears to have extracted some of the most important information that determines whether two drugs will interact, since it does so without having seen those drugs during training.

The importance of models such as the one we have created here cannot be overstated. Being able to predict drug-drug interactions is vitally important both in the commercial setting of drug development and for individuals who need to know which drugs they can and cannot safely take together. In order for a model to address the needs of either sector, it must be able to quickly determine, with high accuracy and from minimal information, whether two compounds will interact. It is in this last criterion that many other models working to this end have previously fallen short [1, 3].
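The binary accuracy, statistical power and specificity reported in the results are standard confusion-matrix quantities. The following sketch shows how they follow from the four cell counts, assuming sigmoid outputs are thresholded at 0.5 (our assumption; the threshold is not stated in the text). The function names are hypothetical, and the sketch assumes both classes are present in `y_true`.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Count TP, FN, TN, FP for binary labels and sigmoid outputs."""
    tp = fn = tn = fp = 0
    for truth, prob in zip(y_true, y_prob):
        pred = 1 if prob >= threshold else 0
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif pred == 0:
            tn += 1
        else:
            fp += 1
    return tp, fn, tn, fp

def summarize(y_true, y_prob, threshold=0.5):
    """Accuracy, power (sensitivity) and specificity from the counts."""
    tp, fn, tn, fp = confusion_counts(y_true, y_prob, threshold)
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "power": tp / (tp + fn),        # fraction of interactions detected
        "specificity": tn / (tn + fp),  # fraction of non-interactions rejected
    }
```

For example, `summarize([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.3, 0.2, 0.6, 0.1])` gives an accuracy, power and specificity of 2/3 each (two true positives, one false negative, two true negatives, one false positive).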
Most models require, at a minimum, intimate knowledge of the structures of the chemicals of interest, and often also require a host of other information that is not easily accessible for any given molecule. The strength of our minimal information requirement is that these parameters can be experimentally determined without knowing or having to encode the structure of the molecule, and that they are readily available properties of compounds. This not only reduces the burden of procuring datasets that have all the necessary information, but also reduces the number of tests that must be run on a compound before one has enough information to predict the interaction of two drugs. This improves the logistics of using such a model, decreases the time needed to both train and use it, and makes it more accessible as a whole.

As mentioned previously, one of the primary extensions of the model in its current state is to switch from the binary classification of interaction to a categorical classification of the type and location of interaction. We consider this a natural extension, and preliminary tests with small sets indicate promising results, similar to the accuracy for binary classification. The primary barrier to this is the procurement of a dataset with labelled interaction information, but such a dataset could be created by processing raw data provided by [4].

A further improvement is to increase the number of parameters read into the network for each drug. While the given set provides a relatively high degree of accuracy, the classification accuracy could be improved by including other easily obtainable properties of the chemicals of interest. Again, the only barrier to this is the time it would take to compile a labelled dataset with the requisite information.

Finally, we would like to perform sensitivity analysis on the model.
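One simple way such a sensitivity analysis could begin is a one-at-a-time finite-difference probe: perturb each input feature of a given pair and measure the change in the predicted interaction probability. The sketch below assumes the trained model is available as a function mapping a list of floats to a probability; `toy_model` is a hypothetical logistic stand-in so that the example runs, and the function names are our own.

```python
import math

def finite_difference_sensitivity(model, x, rel_step=1e-3):
    """Approximate d(model)/d(x_i) for each input by central differences.

    model: callable mapping a list of floats to a probability.
    x:     one input vector (e.g. the concatenated properties of a pair).
    """
    grads = []
    for i, xi in enumerate(x):
        h = rel_step * max(abs(xi), 1.0)  # scale the step to the feature's size
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grads.append((model(up) - model(down)) / (2 * h))
    return grads

def toy_model(v):
    """Hypothetical stand-in: a logistic function of a fixed linear score."""
    weights = [0.5, -2.0, 0.0]
    score = sum(w * f for w, f in zip(weights, v))
    return 1.0 / (1.0 + math.exp(-score))
```

At `x = [0.0, 0.0, 0.0]` the toy model's sigmoid slope is 0.25, so the estimated sensitivities are close to `[0.125, -0.5, 0.0]`. Ranking features by the magnitude of such derivatives, averaged over many pairs, would give a first indication of which properties most influence the prediction, provided the features are on comparable scales.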
Sensitivity analysis is an endeavour in and of itself, but it would give the model an entirely new and informative purpose by correlating each input with its effect on the model's output. This is effectively a way to determine which chemical properties are most important in determining whether two drugs will interact in the body. We hope that doing so would indicate where research should focus to best characterize and predict these interactions.

Conclusions
With available resources and sufficient drug data, this model has the potential to predict drug-drug interactions with a high degree of accuracy. Such a model could bring great societal benefit in terms of pharmacoeconomics and public health [3, 5]. With sufficient accuracy, the model could reduce high development expenses and save time by avoiding rigorous testing for drug-drug interactions during drug discovery. By predicting interactions computationally rather than experimentally, the efficiency of the pharmaceutical industry can be improved [5]. As the number of drugs and chemicals continues to grow, along with their interactions, models such as this become increasingly important.

With recreational drug use becoming more common, drug-drug interaction prediction is increasingly pertinent [3]. The growing variety of both legal and illegal recreational drugs raises the potential for unforeseen consequences related to various drug combinations [3]. The minimal-input multi-layer perceptron network described here allows possible problematic interactions to be predicted quickly from information that is largely available online. Making the model available to the world of medicine as well as to the public could have far-reaching benefits for public health.
References
[1] Aurel Cami, Shannon Manzi, Alana Arnold, and Ben Y. Reis. Pharmacointeraction Network Models Predict Unknown Drug-Drug Interactions. PLOS ONE, 8(4):e61468, April 2013.
[2] Roger Guimerà and Marta Sales-Pardo. A Network Inference Method for Large-Scale Unsupervised Identification of Novel Drug-Drug Interactions. PLoS Computational Biology, 9(12):e1003374, December 2013.
[3] Andrej Kastrin, Polonca Ferk, and Brane Leskošek. Predicting potential drug-drug interactions on topological and semantic similarity features using statistical learning. PLOS ONE, 13(5):e0196865, May 2018.
[4] Sunghwan Kim, Jie Chen, Tiejun Cheng, Asta Gindulyte, Jia He, Siqian He, Qingliang Li, Benjamin A. Shoemaker, Paul A. Thiessen, Bo Yu, Leonid Zaslavsky, Jian Zhang, and Evan E. Bolton. PubChem 2019 update: improved access to chemical data. Nucleic Acids Research, 47(D1):D1102–D1109, 2019.
[5] Takako Takeda, Ming Hao, Tiejun Cheng, Stephen H. Bryant, and Yanli Wang. Predicting drug–drug interactions through drug structural similarities and interaction networks incorporating pharmacokinetics and pharmacodynamics knowledge.