Drug Package Recommendation via Interaction-aware Graph Induction

Abstract

Recent years have witnessed the rapid accumulation of massive electronic medical records (EMRs), which strongly support intelligent medical services such as drug recommendation. However, prior work mainly follows traditional recommendation strategies like collaborative filtering, which usually treat individual drugs as mutually independent, while the latent interactions among drugs, e.g., synergistic or antagonistic effects, have been largely ignored. To that end, in this paper, we aim to develop a new paradigm for drug package recommendation that considers the interaction effects among drugs, where the interaction effects can be affected by patient conditions. Specifically, we first design a pre-training method based on neural collaborative filtering to get the initial embeddings of patients and drugs. Then, the drug interaction graph is initialized based on medical records and domain knowledge. Along this line, we propose a new Drug Package Recommendation (DPR) framework with two variants, DPR on Weighted Graph (DPR-WG) and DPR on Attributed Graph (DPR-AG), in which the interactions are described as signed weights or attribute vectors, respectively. In detail, a mask layer is utilized to capture the impact of patient condition, and graph neural networks (GNNs) are leveraged for the final graph induction task to embed the package. Extensive experiments on a real-world data set from a first-rate hospital demonstrate the effectiveness of our DPR framework compared with several competitive baseline methods, and further support the heuristic study for the drug package generation task with adequate performance.


Zhi Zheng, Chao Wang, Tong Xu∗, Dazhong Shen, Penggang Qin, Baoxing Huai, Tongzhu Liu, Enhong Chen
Anhui Province Key Lab of Big Data Analysis and Application, University of Science and Technology of China; Huawei Technologies; The First Affiliated Hospital of USTC
{zhengzhi97, wdyx2012, sdz, qinpg}@mail.ustc.edu.cn, {tongxu, cheneh}@ustc.edu.cn, [email protected], [email protected]


CCS CONCEPTS
• Information systems → Data mining.

KEYWORDS
Drug Recommendation, Package Recommendation, Graph Neural Network

∗ Tong Xu is the corresponding author.

With the growth of population and the intensification of population aging, people’s demand for high-quality medical services continues to rise, and the pressure on medical workers is increasing. Moreover, certain public health emergencies, such as the outbreak of COVID-19, will also have a huge impact on the medical system. Meanwhile, artificial intelligence (AI) technologies have shown enormous potential to reduce human labor. Therefore, if AI technologies could be effectively utilized to realize intelligent diagnosis and drug recommendation clinically, it will greatly improve the overall quality of medical services.

Fortunately, with the popularization of information technology in the medical industry, electronic medical records (EMRs) have been widely used in major hospitals, which powerfully support downstream intelligent applications like medical image analysis [11, 26], chronic disease management [12, 30], medical text analysis [2, 31], etc. However, due to the limitation of data and technology, drug recommendation based on EMR is still largely unexplored. In terms of data, similar to traditional recommendation systems, drug recommendation is sensitive to data quality, but it is hard to get reliable medical data sources. Moreover, most patients have only been recorded once or several times in the EMR database, which makes it hard to utilize conventional personalized recommendation methods based on user preference analysis. In terms of technology, it is very important for the recommender system to consider both drug effect and the interaction between drugs at the same time, and give the patient a suitable drug package, which contains multiple drugs. However, most existing studies generally rely on traditional methods such as collaborative filtering [43] to solve this problem.
Due to the lack of item relation data for interaction analysis, these methods are limited in achieving satisfactory performance in practical applications.

In order to address the above challenges, in this paper, we aim to develop a new paradigm for drug package recommendation with the awareness of drug interaction. The rationale behind this is that the interaction between drugs will influence the effect of the drug package, and the impact of drug interaction on drug effect will be further affected by patient conditions. We illustrate this by a patient with kidney disease as shown in Figure 1. The drug package for this patient contains three drugs, namely pyridoxine, aztreonam and cefuroxime. Cefuroxime is synergistic with the other two drugs, which can improve the effect of the drug package. Torasemide is antagonistic with pyridoxine, so it is not included in the package. Furthermore, the combination of cefuroxime and gentamicin has a synergistic antibacterial effect, but at the same time it may increase nephrotoxicity, so it is not suitable for this patient.

Figure 1: An example for a patient with kidney disease.

Along this line, we first design a pre-training model to get the embeddings of patients and drugs based on neural collaborative filtering (NCF). Then we collect drug interaction data from public online datasets and divide drug pairs into three categories with the help of domain experts, namely No Interaction, Synergism and Antagonism. After that, we propose to represent drug packages as graphs based on the labeled data. Furthermore, we propose a Drug Package Recommendation (DPR) framework with two variants. The first one, namely DPR on Weighted Graph (DPR-WG), regards the effect of drug interaction as graph edge weights, while the second one, DPR on Attributed Graph (DPR-AG), utilizes edge attribute vectors to describe the influence of drug interaction. In both models, we exploit a mask layer to capture the impact of the patient condition on the drug package representation, and Graph Neural Networks (GNNs) are leveraged for the final graph induction task to embed the package. Finally, extensive experiments on a real-world dataset from a first-rate hospital demonstrate the effectiveness of our DPR framework compared with several competitive baseline methods, and further support the heuristic study for the drug package generation task with adequate performance.

Specifically, the major contributions of this paper can be summarized as follows:
• We develop a new paradigm to represent drug packages as graphs based on drug interaction classification.
• We design a drug package recommendation framework with two variants, which can integrate drug interaction information based on graph induction.
• We propose to utilize a mask layer to capture the impact of patient condition on the drug package representation.
• We conduct extensive experiments on a real-world data set from a first-rate hospital, which clearly validate the effectiveness of our DPR framework and reveal some interesting rules based on the derived insights on patient conditions and drug interaction.

In this section, we summarize the related work in three categories, namely drug recommendation systems, package recommendation systems, and graph neural networks.

Recommendation systems have been widely used in a variety of applications like social networking and e-commerce. The methods can be broadly classified into two categories, namely neighborhood-based collaborative filtering methods based on similar users or items [1], and model-based methods, particularly latent factor models that factorize the user-item matrix into user factors and item factors [19]. Recent recommender systems have been further advanced by the significant contribution from deep learning [16, 39, 42], where user preferences and item characteristics can be learned in deep architectures. Based on these technologies, some methods focusing on drug recommendation have been put forward. For example, [44] introduces an LDA-based contextual collaborative model called Medicine-LDA to integrate multi-source information. [41] constructs a heterogeneous graph which includes patients and drugs, and describes a novel recommendation system based on label propagation. [8] develops a joint model with a recommendation component and an ADR label prediction component to recommend a set of to-avoid drugs. With the increasing emergence of knowledge graphs, some researchers have extracted information from medical databases like [22] to build up giant medical knowledge graphs. Based on these knowledge graphs, [38] proposes to jointly embed diseases, drugs and patients into a shared lower dimensional space, and decomposes drug recommendation into a link prediction process. However, these models lack the ability to recommend drugs as a package, and the studies on drug interaction are not thorough enough.

Most recommendation research concentrates on recommending one item to users at a time. However, in many real-world scenarios, the platform needs to show users a set of items, in other words, a package (or a bundle). Several efforts have been made to solve this problem. Some studies turn this problem into optimization problems like the 0-1 Knapsack problem, and provide approximate solutions due to the NP-hardness [10, 21, 32, 45]. [27] puts forward a Tourist-Area-Season topic model and proposes a cocktail approach to personalized travel package recommendation. [3] proposes a bundle generation network which decomposes the problem by determinantal point processes. [33] develops a model which utilizes the trained features of an item recommendation model to learn the personalized ranking over bundles. [7] contributes a neural network solution based on a factorized attention network to aggregate the item embeddings in a package. [6] proposes a model based on graph neural networks which explicitly models the interaction and affiliation between users, bundles, and items by unifying them into a heterogeneous graph. However, these models neglect the different types of interactions between items, which prevents them from achieving satisfactory performance for drug package recommendation.

Recently, many studies on extending deep learning approaches to graph data have emerged. Unlike standard neural networks, GNNs retain a state that can represent information from a node's neighborhood with arbitrary depth. For example, [18] presents the graph convolutional network (GCN) for semi-supervised learning on graph data via an approximation of spectral graph convolutions. [14] presents GraphSAGE to generate node embeddings by sampling and aggregating features from the local neighborhoods of nodes. [35] presents graph attention networks (GATs) which leverage masked self-attentional layers to address the shortcomings of methods based on graph convolutions. [13] further shows that the essence of existing GNNs is to learn a message passing algorithm and an aggregation procedure to compute a function of the entire input graph, and reformulates existing models into a single common framework called Message Passing Neural Networks (MPNNs). With their strong power of learning structure, GNNs have been widely applied in many fields. For example, [24, 40] utilize graph data and graph neural networks for competitive analysis. [28] proposes a deep model to integrate structural and temporal social contexts to address the dynamic social-aware recommendation task.

In this section, we first introduce the real-world dataset used in our study, and then propose the problem formulation of drug package recommendation.

The EMR dataset used in this paper comes from the electronic medical record database of a first-rate hospital in China. As shown in Figure 2, each medical record contains the following information:
• Demographics. Demographics are formatted data including basic patient information, such as the patient’s gender, age, type of medical insurance, whether surgery has been performed, etc. This information provides guidance for doctors to prescribe. For example, some drugs are not suitable for children, while some drugs are only covered by certain medical insurance.
• Laboratory results. A laboratory test is a procedure in which the hospital takes a sample of the patient’s body fluid or body tissue to get information about the patient’s health. The laboratory results are shown as the patient’s values and normal values for laboratory items. For example, "glucose value: 77 mg/dL, normal value: 65-99 mg/dL".
• Admission notes. An admission note is part of a medical record that documents the patient’s status including physical examination findings, reasons why the patient is being admitted for inpatient care to a hospital, and the initial instructions for the patient’s care.
• Drugs. This information includes all of the drugs used during the patient’s hospital stay.

In order to integrate and utilize the above multi-source heterogeneous data, we conduct the following preprocessing steps. First, for the demographics, we convert them into documents, e.g., "Gender : Male, Age : Teenager". Second, for the laboratory results, we divide the results into three levels, namely normal, abnormally high and abnormally low, according to the given normal values. We then extract all abnormal test results (abnormally high and low) from the results and convert them into documents, e.g., "glucose value : abnormally high, lipid panel : abnormally high". After that, we merge the demographic documents and laboratory result documents into what we call disease documents.
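The demographic and laboratory preprocessing described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the creatinine range, the sample values and the small vocabulary are all hypothetical, and only the glucose range (65-99 mg/dL) and the "item : level" token format come from the text.

```python
def lab_level(value, low, high):
    """Classify one laboratory value against its normal range [low, high]."""
    if value < low:
        return "abnormally low"
    if value > high:
        return "abnormally high"
    return "normal"


def disease_document(demographics, lab_results):
    """Merge demographic fields and abnormal lab results into one token list."""
    tokens = [f"{field} : {value}" for field, value in demographics.items()]
    for item, (value, low, high) in lab_results.items():
        level = lab_level(value, low, high)
        if level != "normal":  # only abnormal results enter the document
            tokens.append(f"{item} : {level}")
    return tokens


def one_hot(tokens, vocabulary):
    """0/1 indicator vector of a disease document over a fixed vocabulary."""
    present = set(tokens)
    return [1 if term in present else 0 for term in vocabulary]


# Hypothetical patient: values and the creatinine range are made up.
demographics = {"Gender": "Male", "Age": "Teenager"}
lab_results = {
    "glucose value": (130.0, 65.0, 99.0),  # (value, normal low, normal high)
    "creatinine": (0.9, 0.6, 1.3),
}
document = disease_document(demographics, lab_results)

# Hypothetical vocabulary; in practice it would be collected from all records.
vocabulary = [
    "Gender : Male",
    "Gender : Female",
    "Age : Teenager",
    "glucose value : abnormally high",
    "glucose value : abnormally low",
]
indicator = one_hot(document, vocabulary)
```

Here `document` keeps the two demographic tokens plus the single abnormal glucose result, and `indicator` is the kind of 0/1 vector that the problem formulation later treats as the disease document of a patient.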
Finally, for the admission notes, we remove all the punctuation and meaningless characters, and adjust all of the admission notes in the dataset to the same length by padding and truncation.

For the purpose of studying the interaction between drugs, we collect data from two large online pharmaceutical knowledge bases, i.e., DrugBank (https://go.drugbank.com/releases/latest) and YaoZhi (https://db.yaozh.com/interaction), where users can check drug properties and drug-drug interactions. The drug interaction information in these two databases is stored in text format based on certain templates. We further classify the templates into three categories with the help of domain experts, namely No Interaction, Synergism and Antagonism. No Interaction means there is no interaction between two drugs. Synergism means the combination of two drugs can lead to an enhanced drug effect, and Antagonism is the opposite. Table 2 shows some examples of different drug interactions. Note that the interaction can be directed: for example, if drug A can increase the effect of drug B, then the direction is from A to B. Moreover, for most of the drug pairs, we cannot confirm whether there is any type of interaction between them, so we leave them as unlabeled. Section 4.2.1 will further discuss how to exploit these labeled and unlabeled data.

Finally, we pick out the EMR records containing more than one drug, which gives 158,556 EMR records in total with complete information. More detailed statistics of our data are shown in Table 1.

Table 1: Statistics of our dataset.

| Description | Number |
|---|---|
| The number of records | 158,556 |
| The number of drugs | 1,428 |
| The number of words in disease document | 1,242 |
| The average size of drug packages | 18 |
| The number of aligned drugs | 565 |
| The number of drug pairs with No Interaction | 2,560 |
| The number of drug pairs with Synergism | 22,986 |
| The number of drug pairs with Antagonism | 6,389 |

Based on the above EMR and drug interaction data, here we introduce the problem formulation of drug package recommendation. For ease of illustration, Table 3 lists some important mathematical notations used throughout this paper.

Suppose there are $N$ patients and $M$ drugs in the training set. Based on the above preprocessing method, for patient $i$, we can construct the disease document and turn it into one-hot encoding form as $\mathcal{W}_i = \{w_{i,1}, w_{i,2}, \ldots, w_{i,p}\}$, where $w_{i,\cdot}$ is the 0/1 indicator value for a demographic feature or a lab result. In addition, we can formulate the admission note as $\mathcal{T}_i = \{t_{i,1}, t_{i,2}, \ldots, t_{i,q}\}$, where $t_{i,\cdot}$ is a word in the processed admission note. In this way, patient $i$ can be expressed as a patient description $\mathcal{U}_i = \{\mathcal{W}_i, \mathcal{T}_i\}$. We also have the drug package $\mathcal{P}_i = \{d_{i,1}, d_{i,2}, \ldots, d_{i,s}\}$, where $d_{i,\cdot}$ is a drug that patient $i$ used. Moreover, based on the labeled drug interaction data, we can construct the drug relation matrix $\mathbf{R} \in \mathbb{R}^{M \times M}$, where $\mathbf{R}_{ij}$ represents the interaction between $d_i$ and $d_j$, namely

$$
\mathbf{R}_{ij} =
\begin{cases}
-1, & \text{Antagonism}, \\
0, & \text{No Interaction}, \\
1, & \text{Synergism}, \\
-2, & \text{unlabeled}.
\end{cases}
$$

Note that the direction is from $d_i$ to $d_j$. Along this line, the problem of drug package recommendation is formulated in Definition 1.

Table 2: Examples of drug interaction labeling.

| Drug A | Drug B | Description | Classification | Direction |
|---|---|---|---|---|
| Amoxicillin | Oseltamivir | No Interaction | No Interaction | Bidirection |
| Dipyridamole | Valsartan | Dipyridamole may increase the antihypertensive activities of Valsartan. | Synergism | A to B |
| Repaglinide | Doxepin | Doxepin may decrease the hypoglycemic activities of Repaglinide. | Antagonism | B to A |
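As an illustration of how labeled rows like those in Table 2 could populate the directed relation matrix, the sketch below assumes the integer codes 1, −1, 0 and −2 for Synergism, Antagonism, No Interaction and unlabeled pairs. The function name, the tuple layout and the choice to leave the diagonal unlabeled are hypothetical.

```python
# Assumed integer codes for the entries of the relation matrix R.
UNLABELED, ANTAGONISM, NO_INTERACTION, SYNERGISM = -2, -1, 0, 1


def build_relation_matrix(drugs, labeled_pairs):
    """Build a directed M x M relation matrix as a list of lists.

    labeled_pairs holds (drug_a, drug_b, code, direction) tuples, where
    direction is "A to B", "B to A" or "Bidirection" as in Table 2.
    R[i][j] describes the effect of drug i on drug j.
    """
    index = {name: k for k, name in enumerate(drugs)}
    size = len(drugs)
    R = [[UNLABELED] * size for _ in range(size)]
    for drug_a, drug_b, code, direction in labeled_pairs:
        i, j = index[drug_a], index[drug_b]
        if direction in ("A to B", "Bidirection"):
            R[i][j] = code
        if direction in ("B to A", "Bidirection"):
            R[j][i] = code
    return R, index


drugs = ["Amoxicillin", "Oseltamivir", "Dipyridamole",
         "Valsartan", "Repaglinide", "Doxepin"]
table2_rows = [
    ("Amoxicillin", "Oseltamivir", NO_INTERACTION, "Bidirection"),
    ("Dipyridamole", "Valsartan", SYNERGISM, "A to B"),
    ("Repaglinide", "Doxepin", ANTAGONISM, "B to A"),
]
R, index = build_relation_matrix(drugs, table2_rows)
```

With these three rows, only four entries of `R` are labeled: both directions of the Amoxicillin/Oseltamivir pair, the Dipyridamole-to-Valsartan entry, and the Doxepin-to-Repaglinide entry. Every other pair keeps the unlabeled code.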

Medical Record
• Demographics: Age: 4; Gender: Female; Insurance: Basic medical insurance; Operation: None.
• Laboratory Results: Anion gap - abnormally high; Creatinine - abnormally low; Aspartate/glutamate - abnormally high; etc.
• Admission Notes: The child was admitted to the hospital for "coughing for 10 days and fever for 4 days". Physical examination: neck soft, thick breath sounds in lungs ......
• Drugs: Glucose, Sodium Bicarbonate, Xylitol, Budesonide, Erythromycin, Terbutaline, Cefuroxime, Sodium Chloride.

Figure 2: An example of the medical record in our dataset.

Definition 1 (Drug Package Recommendation). Given a set of patient descriptions $\{\mathcal{U}_1, \mathcal{U}_2, \ldots, \mathcal{U}_N\}$ with the corresponding drug packages $\{\mathcal{P}_1, \mathcal{P}_2, \ldots, \mathcal{P}_N\}$, and the drug relation matrix $\mathbf{R}$, the goal of drug package recommendation is to get a personalized scoring function for each patient: $f_u : \mathcal{P} \to \mathbb{R}$.

Note that cold-start patients and packages are very common in our drug package recommendation problem. For example, a new patient comes to the hospital or a doctor prescribes a new drug package. This requires the model to score a package based on the patient condition and the effect of drug packages, making the problem radically different from traditional recommendation based on a user-item interaction matrix.

In this section, we introduce the framework of our model in detail. As shown in Figure 3, our framework mainly consists of three components, i.e., pre-training, package graph construction, and drug package recommendation. Specifically, we first design a pre-training method based on neural collaborative filtering to get the initial embeddings of patients and drugs. Then, we propose to construct drug package graphs based on the medical records and domain knowledge. Finally, a novel Drug Package Recommendation (DPR) framework with two variants is proposed to solve the drug package recommendation problem.

Table 3: Mathematical notations.

| Symbol | Description |
|---|---|
| $N, M$ | The number of patients and the number of drugs |
| $\mathcal{P}_i$ | The drug package of patient $i$ |
| $\mathcal{W}_i$ | The disease document of patient $i$ |
| $\mathcal{T}_i$ | The admission note of patient $i$ |
| $\mathcal{U}_i$ | The patient description of patient $i$ |
| $\mathcal{G}_i$ | The drug package graph of patient $i$ |
| $\mathbf{R}$ | The drug relation matrix |
| $\Theta$ | Model parameters |
| $d_j$ | The $j$th drug in the entire drug set |
| $d_{i,\cdot}$ | Drug in the drug package of patient $i$ |
| $w_{i,\cdot}$ | Indicator value in the disease document of patient $i$ |
| $t_{i,\cdot}$ | Word in the admission note of patient $i$ |
| $MLP(\cdot)$ | Multilayer Perceptron with ReLU activation function |

Figure 3: A framework overview of the drug package recommendation system (panels: Pre-training, Package Graph Construction, Package Recommendation).

A patient’s description consists of two heterogeneous parts, and a drug package consists of several drugs. In order to recommend drug packages, we first need to get the embeddings of drugs and patients. Therefore, we propose a pre-training method as follows.

First, we propose a hybrid method to get the patient embedding $\mathbf{u}$ based on the patient description $\mathcal{U} = \{\mathcal{W}, \mathcal{T}\}$, which can be split into two steps. To be specific, in the first step, we extract the feature of the patient’s disease document by MLP as:

$$\mathbf{m}_w = MLP(\mathcal{W}). \tag{1}$$

In the second step, we associate each word $t_k$ in patients’ admission notes with a word embedding vector $\mathbf{x}_k$. In this way we can convert $\mathcal{T}$ to a sequence of vectors $(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_q)$. Then we input the sequence into char-LSTM [20] as:

$$
\begin{aligned}
\mathbf{i}_t &= \sigma(\mathbf{W}_{xi}\mathbf{x}_t + \mathbf{W}_{hi}\mathbf{h}_{t-1} + \mathbf{W}_{ci}\mathbf{c}_{t-1} + \mathbf{b}_i), \\
\mathbf{f}_t &= \sigma(\mathbf{W}_{xf}\mathbf{x}_t + \mathbf{W}_{hf}\mathbf{h}_{t-1} + \mathbf{W}_{cf}\mathbf{c}_{t-1} + \mathbf{b}_f), \\
\mathbf{c}_t &= \mathbf{f}_t \odot \mathbf{c}_{t-1} + \mathbf{i}_t \odot \tanh(\mathbf{W}_{xc}\mathbf{x}_t + \mathbf{W}_{hc}\mathbf{h}_{t-1} + \mathbf{b}_c), \\
\mathbf{o}_t &= \sigma(\mathbf{W}_{xo}\mathbf{x}_t + \mathbf{W}_{ho}\mathbf{h}_{t-1} + \mathbf{W}_{co}\mathbf{c}_t + \mathbf{b}_o), \\
\mathbf{h}_t &= \mathbf{o}_t \odot \tanh(\mathbf{c}_t).
\end{aligned} \tag{2}
$$

We take the final time step output $\mathbf{h}_q$ as the embedding of $\mathcal{T}$, and the patient embedding $\mathbf{u}$ is the concatenation of the two parts:

$$\mathbf{u} = [\mathbf{m}_w \,\|\, \mathbf{h}_q]. \tag{3}$$

Second, we associate each drug $d_j$ with a randomly initialized embedding $\mathbf{d}_j$ which directly projects the drug one-hot ID to the latent space. Finally, we utilize the Neural Collaborative Filtering (NCF) framework [16] and the Bayesian Personalized Ranking (BPR) loss [34] to train the above embeddings and models. Specifically, for patient $i$, we get a patient-drug predictive model by feeding patient embedding $\mathbf{u}_i$ and drug embedding $\mathbf{d}_j$ into a matching model:

$$\hat{r}_{ij} = MLP([\mathbf{u}_i \,\|\, \mathbf{d}_j]). \tag{4}$$

Then we adopt the BPR loss as:

$$L = \sum_{i=1}^{N} \sum_{j \in \mathcal{P}_i} \sum_{l \notin \mathcal{P}_i} -\ln \sigma(\hat{r}_{ij} - \hat{r}_{il}) + \lambda \lVert \Theta \rVert_2^2, \tag{5}$$

where $d_j$ is in the drug package and $d_l$ is not. We minimize the loss function, forcing the prediction $\hat{r}_{ij}$ to be larger than $\hat{r}_{il}$. $\sigma(\cdot)$ is the sigmoid function, and $\Theta$ is the parameter set. $L_2$ regularization is applied to prevent overfitting.

Compared with traditional item recommendation, the core problem of drug package recommendation is how to get the representation of drug packages considering the interaction between drugs. Therefore, in this section, we propose to utilize graph models to solve this problem. To be specific, we first present a method to convert the drug packages into package graphs. Then, we formulate the message passing framework which will be further utilized for the graph induction task.
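The pairwise BPR objective used for pre-training (Equation 5) can be sketched for a single patient as below. This is only an illustration under stated assumptions: the scores stand in for the outputs of the matching model, the function names and the flat `params` list are hypothetical, and the regularizer is taken to be a squared L2 norm.

```python
import math


def neg_log_sigmoid(x):
    """Numerically stable -ln(sigmoid(x))."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def bpr_loss(scores, package, params=(), lam=0.0):
    """BPR loss for one patient.

    scores  : dict mapping every candidate drug to its predicted score.
    package : set of drugs actually prescribed to the patient (positives).
    params  : flat iterable of model parameters for the L2 penalty (assumed).
    """
    loss = 0.0
    for j in package:
        for l in scores:
            if l in package:
                continue  # negatives are drugs outside the package
            loss += neg_log_sigmoid(scores[j] - scores[l])
    return loss + lam * sum(p * p for p in params)


# Hypothetical scores for three drugs from the Figure 1 example.
scores = {"cefuroxime": 2.0, "aztreonam": 1.0, "torasemide": 0.0}
package = {"cefuroxime", "aztreonam"}
loss = bpr_loss(scores, package)

# Ranking the excluded drug above the package drugs should cost more.
bad_scores = {"cefuroxime": 0.0, "aztreonam": 0.0, "torasemide": 2.0}
bad_loss = bpr_loss(bad_scores, package)
```

Summing this quantity over all patients gives the data term of the objective. The same pairwise form reappears later with package graph embeddings in place of single drug embeddings.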

For drug package $\mathcal{P}$, we define a corresponding package graph $\mathcal{G} = \{\mathcal{V}, \mathcal{E}\}$, where $\mathcal{V}$ is the node set and $\mathcal{E}$ is the edge set. Each specific node $v \in \mathcal{V}$ is associated with the corresponding drug embedding $\mathbf{d}$. Each directed edge $e_{vu} \in \mathcal{E}$ also has its attribute, whose form changes with different methods, as discussed in later sections.

The topology of the package graph $\mathcal{G}$, i.e., whether edge $e_{vu}$ should exist, needs to be defined. Theoretically, since any pair of drugs may have a drug interaction, the package graph $\mathcal{G}$ should be a complete graph, where all nodes are connected with each other. However, this will make the time complexity of graph induction increase from $O(n)$ to $O(n^2)$ owing to the pairwise interaction. Furthermore, we find that the frequency of drug co-occurrence obeys a long-tailed distribution, which means most of the drug pairs have no clear relationship. Therefore, we propose the following two criteria to define the topology of a package graph. For nodes $v, u$:
1) If $\mathbf{R}_{vu} \neq -2$, which means this drug pair has been labeled in Section 3.1, then edge $e_{vu}$ exists.
2) Calculate the co-occurrence proportion $p_{vu} = num_{vu} / num_v$, where $num_v$ is the number of packages containing drug $v$, and $num_{vu}$ is the number of packages containing both drug $v$ and drug $u$. If $p_{vu}$ is bigger than a threshold value, then edge $e_{vu}$ exists.

We propose to exploit the MPNN [13] framework to make use of the package graphs constructed in the last section. MPNN is a general approach to describe GNNs, which inductively learns a node representation by recursively aggregating and transforming the feature vectors of its neighboring nodes. A per-layer update of the MPNN model in our setting involves message passing, message aggregation, and node representation updating, which can be expressed as:

$$\mathbf{m}^{(l)}_{vu} = \text{MESSAGE}(\mathbf{h}^{(l-1)}_u, \mathbf{h}^{(l-1)}_v, \mathbf{e}_{vu}), \tag{6}$$

$$\mathbf{M}^{(l)}_u = \text{AGGREGATION}(\{\mathbf{m}^{(l)}_{vu}, \mathbf{e}_{vu} \mid v \in \mathcal{N}(u)\}), \tag{7}$$

$$\mathbf{h}^{(l)}_u = \text{UPDATE}(\mathbf{M}^{(l)}_u, \mathbf{h}^{(l-1)}_u), \tag{8}$$

where $\mathbf{m}^{(l)}_{vu}$ is the message vector passing from $v$ to $u$, $\mathbf{h}^{(l)}_u$ is the representation of node $u$ on layer $l$, and $\mathbf{e}_{vu}$ is the attribute corresponding to edge $e_{vu}$. $\mathcal{N}(u)$ is the neighborhood of node $u$ from which it collects information to update its aggregated message $\mathbf{M}_u$. $\mathbf{h}^{(0)}_u$ is initialized by the corresponding drug embedding, which we also denote as $\mathbf{d}_u$ for ease of illustration.

Figure 4: Edge attribute updating process of DPR-WG.

After the formulation of package graphs and message passing neural networks, we can finish the graph induction task, i.e., get the embedding of the package graph, based on the MPNN framework and further solve the drug package recommendation problem. The key to obtaining an effective representation of the drug package graph is to utilize the edge attributes to capture the interaction between the drugs. Therefore, we propose the following two ways to formulate the edge attributes in package graphs from two different points of view. First, since the two major interactions in our dataset, namely Synergism and Antagonism, are opposite to each other, we can simply exploit signed edge weights to describe the drug interaction intensity. Second, if we expect our model to be more generic, we can define the edge attributes as vectors which contain the information about the type of interaction. Along this line, we propose our Drug Package Recommendation (DPR) model with two variants, namely DPR on Weighted Graph (DPR-WG) and DPR on Attributed Graph (DPR-AG), in the following sections.
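The two edge criteria used to define the package graph topology can be sketched as follows. The toy packages, the threshold of 0.6 and the helper names are hypothetical (the text only mentions "a threshold value"), and −2 is taken as the code for unlabeled pairs in the relation matrix.

```python
from collections import Counter
from itertools import permutations

UNLABELED = -2  # code of unlabeled pairs in the relation matrix


def cooccurrence_proportion(packages):
    """Return p[(v, u)] = (#packages with both v and u) / (#packages with v)."""
    single = Counter()
    pair = Counter()
    for package in packages:
        drugs = set(package)
        single.update(drugs)
        pair.update(permutations(sorted(drugs), 2))  # ordered pairs (v, u)
    return {(v, u): n / single[v] for (v, u), n in pair.items()}


def package_edges(package, R, index, p, threshold):
    """Directed edges of one package graph under the two criteria."""
    edges = []
    for v, u in permutations(sorted(set(package)), 2):
        labeled = R[index[v]][index[u]] != UNLABELED    # criterion 1
        frequent = p.get((v, u), 0.0) > threshold       # criterion 2
        if labeled or frequent:
            edges.append((v, u))
    return edges


# Toy data: four packages over three drugs, one labeled pair (b -> c).
packages = [["a", "b"], ["a", "b", "c"], ["a", "c"], ["a"]]
index = {"a": 0, "b": 1, "c": 2}
R = [[UNLABELED] * 3 for _ in range(3)]
R[index["b"]][index["c"]] = 1

p = cooccurrence_proportion(packages)
edges = package_edges(["a", "b", "c"], R, index, p, threshold=0.6)
```

Because the proportion is normalized by the count of the source drug, it is asymmetric: a rarely used drug that always appears with a common one gets an outgoing edge, while the reverse edge may fall below the threshold.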

In DPR-WG, we convert a package graph $\mathcal{G}$ into a weighted graph by assigning real numbers to edge attributes, i.e., $e_{vu} \in \mathbb{R}$. Specifically, for edge $e_{vu}$ in a package graph $\mathcal{G}$, we initialize the edge attribute as:

$$
\mathbf{e}_{vu} =
\begin{cases}
1, & \mathbf{R}_{vu} = 1, \\
-1, & \mathbf{R}_{vu} = -1, \\
p_{vu}, & \text{otherwise}.
\end{cases}
$$

Note that we set the edge attribute for $e_{vu}$ even if $\mathbf{R}_{vu} = 0$ or $\mathbf{R}_{vu} = -2$. We then apply a mask layer to the patient embedding $\mathbf{u}$, and get the conditional drug embedding $\hat{\mathbf{d}}_u$ as follows:

$$\hat{\mathbf{d}}_u = \sigma(MLP(\mathbf{u})) \odot \mathbf{d}_u, \tag{9}$$

where the mask layer is formed as $\sigma(MLP(\cdot))$, and the mask vector $\sigma(MLP(\mathbf{u}))$ plays the role of feature selection on the drug embeddings. $\odot$ represents the element-wise product of two vectors.

Figure 5: Edge attribute updating process of DPR-AG.

Then, for edge $e_{vu}$, we can calculate a contextual impact factor $c_{vu}$ as:

$$c_{vu} = \tanh(\mathbf{a}^{\top} MLP([\hat{\mathbf{d}}_u \,\|\, \hat{\mathbf{d}}_v])), \tag{10}$$

where $\mathbf{a}^{\top}$ is a row vector which has the same length as the MLP output. The contextual impact factor $c_{vu}$ reflects the impact of the patient condition on the drug interaction between $d_u$ and $d_v$. After the above calculation, we can update the edge attribute as $\hat{\mathbf{e}}_{vu} = c_{vu} * \mathbf{e}_{vu}$. Figure 4 shows the updating process in detail.

Then, we can form the GNN layer, using the edge weight for filtering, as follows:

$$\mathbf{m}^{(l)}_{vu} = W^{(l-1)} \mathbf{h}^{(l-1)}_v, \tag{11}$$

$$\mathbf{M}^{(l)}_u = \sum_{v \in \mathcal{N}(u)} GRU\left(\hat{\mathbf{e}}_{vu}\, \mathbf{m}^{(l)}_{vu},\ \mathbf{h}^{(l-1)}_u\right), \tag{12}$$

$$\mathbf{h}^{(l)}_u = MLP\left(W^{(l-1)} \mathbf{h}^{(l-1)}_u + \mathbf{M}^{(l)}_u\right), \tag{13}$$

where $W$ denotes the model’s parameters to be learned, and GRU denotes the gated recurrent neural network [9]. We set the dimension of all the layers equal to the dimension of the 0th layer.

Now we can utilize the formed GNN layer for the graph induction task. Note that, different from general sparse graphs, a drug package graph is dense. Therefore, we only need one GNN layer to extract almost all the information we expect, and there is no need for high-order neighbors, which will be discussed later. For each node $v$, we have the initial node embedding $\mathbf{d}_v$ and the corresponding hidden representation $\mathbf{h}_v$ from the GNN layer. Following [25], the package graph embedding can be formed as:

$$\mathbf{g} = \sum_{v \in V} \sigma(MLP([\mathbf{d}_v \,\|\, \mathbf{h}_v])) \odot (MLP([\mathbf{d}_v \,\|\, \mathbf{h}_v])). \tag{14}$$

Again, we utilize the NCF framework and the BPR loss to train the model. For patient $i$, we have the patient embedding $\mathbf{u}_i$ and the corresponding package graph embedding $\mathbf{g}_i$.
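The mask layer and the contextual impact factor of Equations 9 and 10 can be sketched with plain Python lists as below. This is a toy illustration rather than the trained model: each MLP is replaced by a single linear layer (with ReLU for the hidden layer), and all weights, dimensions and embeddings are made-up values chosen so the arithmetic is easy to follow.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear(W, x):
    """Matrix-vector product with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def mask_drug(u, d, W_mask):
    """Eq. 9: d_hat = sigmoid(MLP(u)) * d, element-wise (MLP = one linear layer)."""
    mask = [sigmoid(v) for v in linear(W_mask, u)]
    return [m * x for m, x in zip(mask, d)]


def contextual_factor(d_hat_u, d_hat_v, W_ctx, a):
    """Eq. 10: c_vu = tanh(a^T MLP([d_hat_u || d_hat_v]))."""
    hidden = [max(0.0, h) for h in linear(W_ctx, d_hat_u + d_hat_v)]  # ReLU
    return math.tanh(sum(ai * hi for ai, hi in zip(a, hidden)))


def update_edge_weight(e_vu, u, d_u, d_v, W_mask, W_ctx, a):
    """Scale the prior edge weight by the patient-dependent factor c_vu."""
    d_hat_u = mask_drug(u, d_u, W_mask)
    d_hat_v = mask_drug(u, d_v, W_mask)
    return contextual_factor(d_hat_u, d_hat_v, W_ctx, a) * e_vu


# Hypothetical 2-dimensional embeddings and weights.
u = [1.0, -1.0]                      # patient embedding
d_u, d_v = [2.0, 0.0], [0.0, 2.0]    # drug embeddings
W_mask = [[0.0, 0.0], [0.0, 0.0]]    # gives a constant mask of 0.5
W_ctx = [[1.0, 0.0, 0.0, 1.0]]       # one hidden unit
a = [0.5]

e_hat = update_edge_weight(-1.0, u, d_u, d_v, W_mask, W_ctx, a)
```

Since the factor lies in (−1, 1), the updated weight never exceeds the prior weight in magnitude, but it can shrink or even change sign depending on the patient embedding, which is how the patient condition modulates a labeled interaction in this sketch.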
The loss function can be formed as follows, where the MLP model is the final prediction model:

$$L = \sum_{i=1}^{N} \sum_{j \neq i} -\ln \sigma\left(MLP([\mathbf{u}_i \,\|\, \mathbf{g}_i]) - MLP([\mathbf{u}_i \,\|\, \mathbf{g}_j])\right) + \lambda \lVert \Theta \rVert_2^2. \tag{15}$$

In DPR-AG, the package graph $\mathcal{G}$ is formed as an attributed graph, where both nodes and edges have corresponding attribute vectors. Specifically, for edge $e_{vu}$ and the corresponding drug embeddings $\mathbf{d}_v$, $\mathbf{d}_u$, we first form the edge attribute vector $\mathbf{e}_{vu}$ as the interaction vector between $\mathbf{d}_v$ and $\mathbf{d}_u$, which is calculated by an MLP model as:

$$\mathbf{e}_{vu} = MLP([\mathbf{d}_v \,\|\, \mathbf{d}_u]). \tag{16}$$

Then, we utilize the mask layer again to update the edge attributes by adding the impact of the patient’s condition on the interaction vector as follows:

$$\hat{\mathbf{e}}_{vu} = \sigma(MLP(\mathbf{u})) \odot \mathbf{e}_{vu}. \tag{17}$$

The detailed updating process is shown in Figure 5. Based on the above steps, we can form the GNN layer as follows; note that the settings are the same as for DPR-WG in the former section:

$$\mathbf{m}^{(l)}_{vu} = W^{(l-1)} \hat{\mathbf{e}}^{(l-1)}_{vu}, \tag{18}$$

$$\mathbf{M}^{(l)}_u = \sum_{v \in \mathcal{N}(u)} \mathbf{m}^{(l)}_{vu}, \tag{19}$$

$$\mathbf{h}^{(l)}_u = MLP\left(W^{(l-1)} \mathbf{h}^{(l-1)}_u + \mathbf{M}^{(l)}_u\right). \tag{20}$$

After getting the package graph embedding by Equation 14, we can form the loss function for DPR-AG. The essential difference between DPR-WG and DPR-AG is that, in DPR-WG, the prior knowledge is leveraged explicitly by initializing the edge weights according to the relation matrix $\mathbf{R}$. In contrast, we propose to utilize the prior knowledge implicitly in DPR-AG. Specifically, we design a hybrid loss function as:

$$
\begin{aligned}
L = {} & \sum_{i=1}^{N} \sum_{j \neq i} -\ln \sigma\left(MLP([\mathbf{u}_i \,\|\, \mathbf{g}_i]) - MLP([\mathbf{u}_i \,\|\, \mathbf{g}_j])\right) \\
& - \sum_{i=1}^{N} \sum_{\substack{u,v \in \mathcal{G}_i \\ \mathbf{R}_{uv} \neq -2}} \ln\left(softmax\left(\mathbf{e}_{vu}^{\top} \mathbf{Q}\right)_{\mathbf{R}_{uv}}\right) + \lambda \lVert \Theta \rVert_2^2,
\end{aligned} \tag{21}
$$

where the MLP model is the final prediction model. $\mathbf{Q} \in \mathbb{R}^{D \times 3}$ is the transfer matrix that transforms the edge attribute $\mathbf{e}_{vu}$ into classification probabilities, where $D$ is the dimension of $\mathbf{e}_{vu}$.
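The edge classification term of the hybrid loss (the second sum in Equation 21) can be sketched for one package graph as below. The mapping from relation codes to softmax columns, the dictionary layout and all numeric values are assumptions of this sketch, and pairs carrying the unlabeled code −2 are skipped.

```python
import math

UNLABELED = -2
# Assumed mapping from relation codes to columns of the transfer matrix Q.
CLASS_INDEX = {-1: 0, 0: 1, 1: 2}


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def edge_classification_loss(edge_attrs, labels, Q):
    """Cross entropy over labeled edges of one package graph.

    edge_attrs : dict (v, u) -> edge attribute vector of length D.
    labels     : dict (v, u) -> relation code; missing pairs are unlabeled.
    Q          : D x 3 transfer matrix given as a list of rows.
    """
    loss = 0.0
    for edge, e in edge_attrs.items():
        code = labels.get(edge, UNLABELED)
        if code == UNLABELED:
            continue  # unlabeled pairs contribute nothing
        logits = [sum(e[d] * Q[d][k] for d in range(len(e))) for k in range(3)]
        loss -= math.log(softmax(logits)[CLASS_INDEX[code]])
    return loss


# Toy graph with two edges, only one of which is labeled (Synergism).
edge_attrs = {("a", "b"): [0.0, 0.0], ("b", "c"): [1.0, -1.0]}
labels = {("a", "b"): 1}
Q = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
aux_loss = edge_classification_loss(edge_attrs, labels, Q)

# A transfer matrix that favors the Synergism column lowers the loss
# for an edge whose attribute points in the matching direction.
informative = edge_classification_loss({("a", "b"): [1.0, 0.0]}, labels,
                                       [[0.0, 0.0, 5.0], [0.0, 0.0, 0.0]])
```

With an all-zero transfer matrix the prediction is uniform over the three classes, so each labeled edge costs ln 3; training lowers this term only when the edge vectors carry enough information to recover the labeled interaction type.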
We add a cross-entropy loss to the loss function, which aims to force the edge attribute $\mathbf{e}_{vu}$ to contain the interaction type information.

In this section, we evaluate the proposed model against a number of competitive baselines. Meanwhile, several discussions and case studies on drug package recommendation will be presented.

We omit the dataset description in this section since it has been introduced in Section 3.1. Other experimental settings will be described in the following parts.

To evaluate the performance of our models for drug package recommendation, we selected a number of state-of-the-art methods as baselines. Specifically, we first chose two popular traditional recommendation approaches, and then several state-of-the-art package recommendation models, as follows:

• NCF [16]: NCF is a state-of-the-art deep neural network for recommender systems, which replaces the inner product in matrix factorization with a neural architecture. This model recommends the top $K$ drugs as packages for the patients in the test set based on the patient embeddings, where $K$ is the average size of drug packages.
• NN: This method utilizes the pretrained patient embeddings based on NCF, and returns the drug package corresponding to the Nearest Neighbor (NN) by calculating the cosine similarity of patient embeddings.
• Package2vec: [36] proposes to utilize Item2vec [4] for enhancing the item embeddings in a package, and we extend Item2vec following [23] to get the embedding of a package. The NCF framework and BPR loss are utilized to train the package recommendation model.
• LDA [5]: This method utilizes the LDA model to get the embedding of a package and uses the same framework as Package2vec to recommend packages.
• BR [33]: BR is a package recommendation method which aggregates item latent vectors to get the package embeddings based on package size and item compatibility.
• DAM [7]: DAM is a state-of-the-art neural network architecture for package recommendation which utilizes a factorized attention network to get the embedding of packages.
• GNN: This method is a simplified variant of our models, which only uses the package graph structure and ignores the edge attributes.

It is worth noting that drug package recommendation differs considerably from general recommendation since there are no fixed users in our task. Therefore, in all of the baseline methods, we exploited the patient embedding model proposed in Section 4.1 to get the representation of patients. Another problem is how to generate packages for patients in the test set, since most of the models are discriminative. Therefore, except for the NCF model, which can generate packages itself, all the remaining models only pick out the best package from a candidate set, and the candidate set consists of the drug packages of the 10 most similar patients. The similarity was calculated as the cosine similarity between patient embeddings. Evaluation metrics including Precision, Recall and F1-score were utilized to compare the performance of the models.
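The evaluation protocol above (a candidate set built from the most similar patients, then set-based metrics) can be sketched as follows. This is a minimal illustration under assumptions: embeddings are plain lists, the function names are hypothetical, and the metrics are computed per patient on drug sets; how the per-patient values are aggregated is not specified here, so averaging over test patients is assumed.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def candidate_packages(query_emb, train_embs, train_packages, k=10):
    """Packages of the k training patients most similar to the query patient."""
    ranked = sorted(range(len(train_embs)),
                    key=lambda i: cosine(query_emb, train_embs[i]), reverse=True)
    return [train_packages[i] for i in ranked[:k]]

def precision_recall_f1(recommended, ground_truth):
    """Set-based Precision, Recall and F1 for one patient."""
    recommended, ground_truth = set(recommended), set(ground_truth)
    hits = len(recommended & ground_truth)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(ground_truth) if ground_truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

# Toy data: three training patients with 2-dimensional embeddings.
train_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
train_packages = [{"A", "B"}, {"C"}, {"A", "D"}]
candidates = candidate_packages([1.0, 0.1], train_embs, train_packages, k=2)
p, r, f1 = precision_recall_f1({"A", "B", "C"}, {"B", "C", "D", "E"})
print(candidates, p, r, f1)
```

Under per-patient averaging, the reported F1-score need not equal the harmonic mean of the reported Precision and Recall. For NCF in Table 4, for instance, the harmonic mean of 0.3812 and 0.5442 is about 0.448, whereas the reported F1-score is 0.4200, which is consistent with that reading.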

We implemented our model with PyTorch (https://pytorch.org/) and PyTorch Geometric (https://github.com/rusty1s/pytorch_geometric). The parameters were all initialized using Kaiming [15] initialization. For the pre-training model, we set the output dimension of the MLP, the dimension of char embeddings, and the hidden size of the LSTM as 32, while the dimension of patient embeddings was set as 64. For the construction of the package graph, we set the threshold value of co-occurrence proportion as 0.01. For the BPR loss used in this paper, we used negative sampling to train the model and set the negative sampling ratio as 10, which means 10 negative samples for one positive sample. For all the MLP models used in this paper, we set the dimension of hidden layers as 128. In the process of model training, we used the Adam optimizer [17] for parameter optimization. We set the learning rate as 0.001 and the mini-batch size as 256. The parameters of the baselines were set up similarly to our method and were all tuned to be optimal to ensure fair comparisons. For the dataset splitting, we divided our dataset into 80%/10%/10% training/validation/test, and we report performance on the test set for the model that performed best on the validation set.

Table 4: The performance of each model.
model | Precision | Recall | F1-score
NCF | 0.3812 | 0.5442 | 0.4200
NN | 0.4890 | 0.4985 | 0.4732
Package2vec | 0.4846 | 0.5268 | 0.4857
LDA | 0.5014 | 0.5219 | 0.4904
BR | 0.5068 | 0.5106 | 0.4879
DAM | 0.5254 | 0.5107 | 0.4979
GNN | 0.5085 | 0.5288 | 0.5009
DPR-WG | 0.5133 | 0.5488 | 0.5137
DPR-AG | 0.5260 | 0.5407 | 0.5162

To demonstrate the effectiveness of our drug package recommendation framework, we compared DPR-WG and DPR-AG with all the baselines, and the results are shown in Table 4. From the results, we can make several observations:

First, the performance of our models surpasses most of the baseline methods on the different evaluation metrics, and both variants achieve the highest F1-scores. This supports the effectiveness of our DPR framework based on package graph construction and message passing neural networks. Furthermore, our models obtain higher recall than every baseline that selects whole packages (NCF, which assembles the top $K$ drugs itself, reaches a similar recall only at a much lower precision), which indicates that our models are more likely to prevent doctors from neglecting certain factors in practical application.

Second, the overall performance of the NCF model is the worst, since this method, being based on collaborative filtering, prefers to recommend items with higher popularity and cannot model the drugs as a whole. This verifies the necessity of studying package recommendation systems.

Third, the GNN model, which only leverages the graph topological structure to exchange information between different drugs, cannot achieve results comparable with our models. However, this model surpasses all the other baselines in F1-score. This verifies the effectiveness of constructing package graphs to capture the interaction between drugs, and further indicates the effectiveness of our method for the graph induction process.

Last but not least, the results of the models except NCF are close to each other, since patients with similar conditions are more likely to use similar drugs.

To further validate the effectiveness of each component of our models, we also designed some simplified variants of our models as follows:

• DPR-WG-Context: This method is a simplified variant of DPR-WG which only utilizes the edge attributes initialized by the drug interaction matrix and ignores the influence of the patient condition.
• DPR-WG-Type: This method is a simplified variant of DPR-WG which only uses the contextual impact factor as edge attributes and ignores the drug interaction type.
• DPR-AG-Mask: This method is a simplified variant of DPR-AG which deletes the mask layer in the calculation process.
• DPR-AG-Type: This method is a simplified variant of DPR-AG which deletes the cross-entropy loss in the loss function. In this way, the edge attributes do not contain the information of the drug interaction type.

The results of the ablation study are shown in Table 5, from which we can draw the following conclusions. First, DPR-WG performs better than its two variants. This indicates that both the contextual impact factors and the initial edge weights are significant, which supports our assumption that the patient condition influences the interaction effect between drugs. Second, DPR-AG also performs better than its two variants, which verifies that both the drug interaction type and the mask vectors are effective, and that the mask layer we proposed can effectively extract the features of the patient condition.

Table 5: The results of ablation study.
model | Precision | Recall | F1-score
DPR-WG-Context | 0.5126 | 0.5330 | 0.5053
DPR-WG-Type | 0.5126 | 0.5377 | 0.5074
DPR-AG-Mask | 0.5152 | 0.5342 | 0.5061
DPR-AG-Type | 0.5154 | 0.5317 | 0.5056
DPR-WG | 0.5133 | 0.5488 | 0.5137
DPR-AG | 0.5260 | 0.5407 | 0.5162

Figure 6: The performance of our models with different co-occurrence proportion thresholds (WG-0.01, WG-0.05, WG-0.1, AG-0.01, AG-0.05, AG-0.1).

Table 6: The performance of our models with different numbers of GNN layers.
model | Precision | Recall | F1-score
DPR-WG-1 | 0.5133 | 0.5488 | 0.5137
DPR-WG-2 | 0.4994 | 0.5582 | 0.5100
DPR-AG-1 | 0.5260 | 0.5407 | 0.5162
DPR-AG-2 | 0.5139 | 0.5457 | 0.5128

Figure 7: The performance of DPR-WG and DPR-AG with different numbers of negative samples. (a) DPR-WG (b) DPR-AG

We investigated the sensitivity of our models to their parameters in this section. First, we evaluated how the threshold for the co-occurrence proportion affected the performance, and the results are shown in Figure 6. From the results, we can find that as the number of edges decreases, the model performance does not change significantly, although the F1-score shows a downward trend. This indicates that there is no interaction between most of the drug pairs.

Next, we investigated whether utilizing two GNN layers affects the results. Table 6 shows the results of our two models with one and two GNN layers. Adding one more GNN layer does not improve the F1-score. As mentioned before, different from general graphs, we only need one GNN layer to extract almost all the information we expect, since the drug package graph is dense.

Finally, we verified the impact of the negative sampling ratio. As shown in Figure 7, the performance only fluctuates within a small range, and the model with a small number of negative samples also works well in practice. All the above experiments support the robustness of the methods proposed in this paper.
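The effect of the co-occurrence threshold on graph density can be illustrated with a small sketch. The exact definition of the co-occurrence proportion belongs to Section 4.2 and is not restated here, so the normalization below (pair count divided by the number of packages) is only an assumption, and the function name and toy packages are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(packages, threshold):
    """Keep drug pairs whose co-occurrence proportion reaches `threshold`.

    Assumption of this sketch: proportion(u, v) = (#packages containing both
    u and v) / (#packages); Section 4.2 may normalise differently.
    """
    pair_counts = Counter()
    for package in packages:
        for u, v in combinations(sorted(set(package)), 2):
            pair_counts[(u, v)] += 1
    n = len(packages)
    return {pair for pair, count in pair_counts.items() if count / n >= threshold}

packages = [["A", "B", "C"], ["A", "B"], ["A", "C"], ["B", "D"], ["A", "B", "D"]]
for threshold in (0.2, 0.4, 0.6):
    print(threshold, sorted(cooccurrence_edges(packages, threshold)))
```

Raising the threshold can only remove edges, so the three thresholds compared in Figure 6 correspond to progressively sparser package graphs.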

In this part, we present some cases to illustrate the effectiveness of our models and reveal some interesting medical rules based on the derived insights on patient conditions and drug interaction.

As mentioned before, we extracted the mask vector $\sigma(\mathrm{MLP}(\mathbf{u}))$ of patient $u$ to describe the impact of the patient condition. In order to analyze the effect of the mask vectors, we randomly selected 1,000 patients and their corresponding mask vectors, and projected them into two-dimensional space with t-SNE [29]. We further selected three representative patient groups with special needs for drugs based on common sense, namely pregnant women, infants (or young children) and patients with liver disease.

Figure 8 shows the visualization result. We can find that the mask vectors of infants and pregnant women deviate the most from the vectors of other patients, which indicates that these two groups have the most special requirements for drug selection, and this is consistent with common sense. Moreover, the mask vectors of patients with liver disease also deviate from those of other patients, but their degree of aggregation is lower than that of the previous two groups. This indicates that patients with liver disease have special needs for drugs, but there are also certain personalized needs. The impact of patient conditions on drug selection can be studied further with statistical methods such as clustering, which shows the potential of our method to help medical researchers.

Figure 8: Visualization of mask vectors (legend: Pregnant, Infant, Liver disease, Others).

In Section 4.3.1, we proposed to utilize contextual impact factors to reflect the impact of the patient condition on drug interaction. In this section we show how these impact factors play a role in recommending packages. We picked patient

Table 7: Contextual Impact Factor Analysis for Patient
Drug 1 | Drug 2 | Description | Type | Factor
Potassium Chloride | Cefazolin | drug 2 may decrease the excretion rate of drug 1. | Synergism | 0.993
Midazolam | Potassium Chloride | drug 1 may decrease the excretion rate of drug 2. | Synergism | -0.264
Ephedrine | Methylprednisolone | drug 1 may increase the excretion rate of drug 2. | Antagonism | -0.309

In Section 4.3.2, edge attribute vectors are calculated to describe the interaction between two drugs. The attribute vectors are forced to contain drug interaction category information, and mask vectors are utilized to bring in the impact of the patient condition. We propose that the mask vector plays its role through feature selection. If we multiply a contextual edge attribute vector $\hat{\mathbf{e}}_{vu}$ with the classification transfer matrix $\mathbf{Q}$, we can get a personalized drug interaction classification result, and we illustrate this intuition in this case study. We picked patient

Table 8: Edge Attribute Analysis for Patient
Drug 1 | Drug 2 | Type | $\mathrm{softmax}(\mathbf{e}_{vu}^{\top}\mathbf{Q})$ | $\mathrm{softmax}(\hat{\mathbf{e}}_{vu}^{\top}\mathbf{Q})$
Warfarin | Ondansetron | Synergism | [0.007, 0.807, 0.184] | [0.015, 0.923, 0.061]
Metformin | Spironolactone | Antagonism | [0.358, 0.163, 0.478] | [0.769, 0.022, 0.208]

Until now, we have considered recommending drug packages that already exist within the EMR database. However, existing packages sometimes cannot meet the needs of new patients. Therefore, we present a heuristic algorithm which combines the existing packages, personalized drug prediction lists and the drug interaction matrix to generate new packages. The algorithm is described as follows.

First, we get the drug frequency rank list $L$, which contains drugs in descending order of occurrence frequency in the EMR dataset. Then, we calculate the drug co-occurrence proportion matrix $M$, which is mentioned in Section 4.2. For a new patient, we can get the patient embedding based on the patient's description. With the patient embedding, we can get the candidate set $S_0$ from similar patients as previously mentioned, and we can get the personalized prediction list $l$ of all drugs by utilizing the NCF model obtained in the pre-training phase, which contains drugs in descending order of predicted value. It is worth noting that, as shown in Section 5.2, the top drugs in $l$ can be incorrect. Finally, starting with the initial candidate set $S_0$, we can get new drug packages as follows:

(1) Form a new candidate set $S_1$ by reforming the packages in $S_0$ in the following ways:
• Delete the drugs that only appear in a small number of packages in $S_0$ and rank low in $l$;
• Add the drugs that rank low in $L$ and rank high in $l$, which means these drugs are not recommended just because they have high popularity.
(2) Generate candidate set $S_2$ by modifying the drugs in $S_1$ using more radical strategies:
• If drug $d$ ranks high in $l$ and has a synergism relationship with a drug in package $p$, then add drug $d$ to package $p$;
• If drug $d$ ranks high in $l$ and has a high co-occurrence proportion with a drug in package $p$, then add drug $d$ to package $p$;
• If drugs $d_1$ and $d_2$ in package $p$ have an antagonism relationship and a low co-occurrence proportion, then delete the drug with the lower rank in $l$.
(3) Return the final candidate set $S = S_0 \cup S_1 \cup S_2$.

We verified the effectiveness of our heuristic algorithm on DPR-WG, where the non-heuristic model selected the best package from the initial candidate set $S_0$, and the heuristic model selected the best package from $S$. Due to the hidden security risks of directly using the generated packages, we randomly selected some test samples and handed them to five doctors to mark the packages they preferred. The results are shown in Table 9, where the percentages reflect the ratio of the doctors' choices. From the results, we can find that the doctors preferred the packages produced with the heuristic algorithm in 62% of the cases on average, which indicates that utilizing the generated drug packages can improve the performance of drug package recommendation.

Table 9: The results of package generation.
model | Non-heuristic | Heuristic
doctor1 | 39% | 61%
doctor2 | 37% | 63%
doctor3 | 39% | 61%
doctor4 | 45% | 55%
doctor5 | 30% | 70%
average | 38% | 62%

Furthermore, we picked two examples to illustrate the effect of the heuristic method. The examples are shown in Table 10, where patient $L$ and $l$, and Pyridoxine was added because of the synergism interaction with Levofloxacin in the first example. For the deleting strategy, several incorrect drugs were deleted in the second example. All the results confirm the effectiveness of our package generation method.

Table 10: Examples for the heuristic method.
Patient ID | Ground Truth | Non-heuristic | Heuristic
Isoniazid, Silybin, Pyridoxine
Hydroxyethyl starch, Peptide hormones, Lidocaine

In this paper, we studied the problem of drug package recommendation. Specifically, we first designed a pre-training method based on neural collaborative filtering to get the initial embeddings of patients and drugs. Then, the drug interaction graph was initialized based on medical records and domain knowledge. Furthermore, we proposed a new drug package recommendation framework with two variants, DPR-WG and DPR-AG, to solve the problem, in which the interactions were described as signed weights or attribute vectors, respectively. Finally, extensive experiments on a real-world data set from a first-rate hospital demonstrated the effectiveness of our DPR framework compared with several competitive baseline methods, and further supported the heuristic study for the drug package generation task with adequate performance.
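As a supplementary illustration, the three steps of the package generation heuristic described earlier can be sketched in plain Python. This is a rough sketch under several assumptions and should not be read as the authors' implementation: "rank high in $l$" is taken to mean the top `top_k` entries, "rank low" is taken to mean the lower half of a list, the cut-offs `rare`, `high` and `low` are hypothetical, step (2) is assumed to start from the output of step (1), and the drug names and interaction sets are toy placeholders with no medical meaning.

```python
from collections import Counter
from itertools import combinations

def generate_packages(S0, l, L, synergy, antagonism, cooc,
                      top_k=2, rare=1, high=0.5, low=0.1):
    """Return S = S0 | S1 | S2 as a set of frozensets of drug names."""
    l_rank = {d: i for i, d in enumerate(l)}   # personalised rank, 0 = best
    L_rank = {d: i for i, d in enumerate(L)}   # popularity rank, 0 = most frequent
    top_l = set(l[:top_k])                     # "rank high in l" (assumed cut-off)
    unpopular = {d for d in l if L_rank.get(d, len(L)) >= len(L) // 2}  # "rank low in L"
    low_l = {d for d in l if l_rank[d] >= len(l) // 2}                  # "rank low in l"
    usage = Counter(d for p in S0 for d in set(p))

    def pair(a, b):
        return frozenset((a, b))

    # Step (1): drop rare, low-ranked drugs; add personalised but unpopular drugs.
    S1 = []
    for p in S0:
        kept = {d for d in p if not (usage[d] <= rare and d in low_l)}
        S1.append(frozenset(kept | (top_l & unpopular)))

    # Step (2): more radical edits driven by interactions and co-occurrence.
    S2 = []
    for p in S1:
        q = set(p)
        for d in top_l - q:
            if any(pair(d, x) in synergy or cooc.get(pair(d, x), 0.0) >= high for x in p):
                q.add(d)
        for a, b in combinations(sorted(q), 2):
            conflict = pair(a, b) in antagonism and cooc.get(pair(a, b), 0.0) <= low
            if a in q and b in q and conflict:
                # Remove the drug that ranks lower in the personalised list l.
                q.discard(max(a, b, key=lambda d: l_rank.get(d, len(l))))
        S2.append(frozenset(q))

    # Step (3): union of the initial and the two generated candidate sets.
    return set(map(frozenset, S0)) | set(S1) | set(S2)

# Toy inputs (placeholder drug names).
S0 = [{"A", "B", "C"}, {"A", "B", "D"}, {"A", "B"}]
l = ["E", "F", "A", "B", "C", "D"]
L = ["A", "E", "B", "C", "D", "F"]
synergy = {frozenset(("A", "E"))}
antagonism = {frozenset(("B", "E"))}
cooc = {frozenset(("B", "E")): 0.05}
S = generate_packages(S0, l, L, synergy, antagonism, cooc)
print(sorted(sorted(p) for p in S))
```

Returning the union keeps every original package available, so a downstream ranking model such as DPR-WG can still select an unmodified package when none of the generated ones scores higher.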

REFERENCES
[1] Gediminas Adomavicius and Alexander Tuzhilin. 2005. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE transactions on knowledge and data engineering 17, 6 (2005), 734–749.
[2] Naveed Afzal, Sunghwan Sohn, Sara Abram, Christopher G Scott, Rajeev Chaudhry, Hongfang Liu, Iftikhar J Kullo, and Adelaide M Arruda-Olson. 2017. Mining peripheral arterial disease cases from narrative clinical notes using natural language processing. Journal of vascular surgery 65, 6 (2017), 1753–1761.
[3] Jinze Bai, Chang Zhou, Junshuai Song, Xiaoru Qu, Weiting An, Zhao Li, and Jun Gao. 2019. Personalized bundle list recommendation. In The World Wide Web Conference. 60–71.
[4] Oren Barkan and Noam Koenigstein. 2016. Item2vec: neural item embedding for collaborative filtering. IEEE, 1–6.
[5] David M Blei, Andrew Y Ng, and Michael I Jordan. 2003. Latent dirichlet allocation. Journal of machine Learning research 3, Jan (2003), 993–1022.
[6] Jianxin Chang, Chen Gao, Xiangnan He, Depeng Jin, and Yong Li. 2020. Bundle Recommendation with Graph Convolutional Networks. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 1673–1676.
[7] Liang Chen, Yang Liu, Xiangnan He, Lianli Gao, and Zibin Zheng. 2019. Matching User with Item Set: Collaborative Bundle Recommendation with Deep Attention Network. In IJCAI. 2095–2101.
[8] Wen-Hao Chiang, Li Shen, Lang Li, and Xia Ning. 2018. Drug recommendation toward safe polypharmacy. arXiv preprint arXiv:1803.03185 (2018).
[9] Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555 (2014).
[10] Ting Deng, Wenfei Fan, and Floris Geerts. 2013. On the complexity of package recommendation problems. SIAM J. Comput. 42, 5 (2013), 1940–1986.
[11] Steven E Dilsizian and Eliot L Siegel. 2014. Artificial intelligence in medicine and cardiac imaging: harnessing big data and advanced computing to provide personalized medical diagnosis and treatment. Current cardiology reports 16, 1 (2014), 441.
[12] Francisca García-Lizana and Antonio Sarría-Santamera. 2007. New technologies for chronic disease management and control: a systematic review. Journal of telemedicine and telecare 13, 2 (2007), 62–68.
[13] Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. 2017. Neural Message Passing for Quantum Chemistry. In Proceedings of the 34th International Conference on Machine Learning - Volume 70 (Sydney, NSW, Australia) (ICML'17). JMLR.org, 1263–1272.
[14] Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Advances in neural information processing systems. 1024–1034.
[15] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE international conference on computer vision. 1026–1034.
[16] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural collaborative filtering. In Proceedings of the 26th international conference on world wide web. 173–182.
[17] Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. Yoshua Bengio and Yann LeCun (Eds.). http://arxiv.org/abs/1412.6980
[18] Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016).
[19] Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 42, 8 (2009), 30–37.
[20] Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. 2016. Neural Architectures for Named Entity Recognition. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, San Diego, California, 260–270. https://doi.org/10.18653/v1/N16-1030
[21] Theodoros Lappas, Kun Liu, and Evimaria Terzi. 2009. Finding a team of experts in social networks. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. 467–476.
[22] Vivian Law, Craig Knox, Yannick Djoumbou, Tim Jewison, An Chi Guo, Yifeng Liu, Adam Maciejewski, David Arndt, Michael Wilson, Vanessa Neveu, et al. 2014. DrugBank 4.0: shedding new light on drug metabolism. Nucleic acids research
International conference on machine learning. 1188–1196.
[24] Shuangli Li, Tong Xu, Hao Liu, Xinjiang Lu, and Hui Xiong. 2020. Competitive Analysis for Points of Interest. 1265–1274. https://doi.org/10.1145/3394486.3403179
[25] Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. 2015. Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493 (2015).
[26] Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, Arnaud Arindra Adiyoso Setio, Francesco Ciompi, Mohsen Ghafoorian, Jeroen Awm Van Der Laak, Bram Van Ginneken, and Clara I Sánchez. 2017. A survey on deep learning in medical image analysis. Medical image analysis 42 (2017), 60–88.
[27] Qi Liu, Yong Ge, Zhongmou Li, Enhong Chen, and Hui Xiong. 2011. Personalized travel package recommendation. IEEE, 407–416.
[28] Yang Liu, Zhi Li, Wei Huang, Tong Xu, and En-Hong Chen. 2020. Exploiting Structural and Temporal Influence for Dynamic Social-Aware Recommendation. Journal of Computer Science and Technology 35 (2020), 281–294.
[29] Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of machine learning research 9, Nov (2008), 2579–2605.
[30] Suzanne Martin, Greg Kelly, W George Kernohan, Bernadette McCreight, and Christopher Nugent. 2008. Smart home technologies for health and social care support. Cochrane database of systematic reviews
British journal of haematology
Proceedings of the third ACM conference on Recommender systems. 353–356.
[33] Apurva Pathak, Kshitiz Gupta, and Julian McAuley. 2017. Generating and personalizing bundle recommendations on steam. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1073–1076.
[34] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian Personalized Ranking from Implicit Feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (Montreal, Quebec, Canada) (UAI '09). AUAI Press, Arlington, Virginia, USA, 452–461.
[35] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2017. Graph attention networks. arXiv preprint arXiv:1710.10903 (2017).
[36] Mengting Wan, Di Wang, Jie Liu, Paul Bennett, and Julian McAuley. 2018. Representing and recommending shopping baskets with complementarity, compatibility and loyalty. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management. 1133–1142.
[37] Hao Wang, Tong Xu, Qi Liu, Defu Lian, Enhong Chen, Dongfang Du, Han Wu, and Wen Su. 2019. MCNE: An end-to-end framework for learning multiple conditional network representations of social network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1064–1072.
[38] Meng Wang, Mengyue Liu, Jun Liu, Sen Wang, Guodong Long, and Buyue Qian. 2017. Safe medicine recommendation via medical knowledge graph embedding. arXiv preprint arXiv:1710.05980 (2017).
[39] Hong-Jian Xue, Xinyu Dai, Jianbing Zhang, Shujian Huang, and Jiajun Chen. 2017. Deep Matrix Factorization Models for Recommender Systems. In IJCAI, Vol. 17. Melbourne, Australia, 3203–3209.
[40] Le Zhang, Tong Xu, Hengshu Zhu, Chuan Qin, Qingxin Meng, Hui Xiong, and Enhong Chen. 2020. Large-Scale Talent Flow Embedding for Company Competitive Analysis. In Proceedings of The Web Conference 2020. 2354–2364.
[41] Ping Zhang, Fei Wang, Jianying Hu, and Robert Sorrentino. 2014. Towards personalized medicine: leveraging patient similarity and drug similarity analytics. AMIA Summits on Translational Science Proceedings
ACM Computing Surveys (CSUR) 52, 1 (2019), 1–38.
[43] Yin Zhang, Daqiang Zhang, Mohammad Mehedi Hassan, Atif Alamri, and Limei Peng. 2015. CADRE: Cloud-assisted drug recommendation service for online pharmacies. Mobile Networks and Applications 20, 3 (2015), 348–355.
[44] Zhi Zheng, Tong Xu, Chuan Qin, Xiangwen Liao, Yi Zheng, Tongzhu Liu, and Guixian Tong. 2020. Multi-Source contextual collaborative recommendation for medicine. Journal of computer research and development 57, 8 (2020), 1741–1754.
[45] Tao Zhu, Patrick Harrington, Junjun Li, and Lei Tang. 2014. Bundle recommendation in ecommerce. In