DCDIR: A Deep Cross-Domain Recommendation System for Cold Start Users in Insurance Domain
Ye Bi, Liqiang Song, Mengqiu Yao, Zhenyu Wu, Jianming Wang, Jing Xiao
[email protected], {songliqiang537,yaomengqiu621,wuzhenyu447,wangjianming888,xiaojing661}@pingan.com.cn
Ping An Technology Shenzhen Co., Ltd
ABSTRACT
Internet insurance products differ markedly from traditional e-commerce goods in their complexity, low purchasing frequency, etc., so the cold start problem is even worse. In the traditional e-commerce field, several cross-domain recommendation (CDR) methods have been studied to infer the preferences of cold start users from their preferences in other domains. However, these CDR methods cannot be applied to the insurance domain directly because of product complexity. In this paper, we propose a Deep Cross-Domain Insurance Recommendation System (DCDIR) for cold start users. Specifically, we first learn more effective user and item latent features in both domains. In the target domain, given the complexity of insurance products, we design a meta-path based method over an insurance product knowledge graph. In the source domain, we employ GRU to model users' dynamic interests. Then we learn a feature mapping function with a multi-layer perceptron. We apply DCDIR to our company's dataset and show that DCDIR significantly outperforms state-of-the-art solutions.
CCS CONCEPTS
• Applied computing → Online insurance; • Information systems → Data mining; • Networks → Data path algorithms.
KEYWORDS
Insurance Recommendation, Cross-domain Recommendation, Cold Start Problem, Knowledge Graph
ACM Reference Format:
Ye Bi, Liqiang Song, Mengqiu Yao, Zhenyu Wu, Jianming Wang, Jing Xiao. 2020. DCDIR: A Deep Cross-Domain Recommendation System for Cold Start Users in Insurance Domain. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '20), July 25–30, 2020, Virtual Event, China. ACM, Xi'an, China, 5 pages. https://doi.org/10.1145/3397271.3401193
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
SIGIR '20, July 25–30, 2020, Virtual Event, China
© 2020 Association for Computing Machinery. ACM ISBN 978-1-4503-8016-4/20/07...$15.00
https://doi.org/10.1145/3397271.3401193

Nowadays, internet finance is booming and rapidly infiltrating all kinds of traditional financial fields. Internet insurance has adapted to the economic boom of the internet age, since it can not only overcome the limitations of live sales and geography, but also provide savings for both companies and their consumers. Due to the nature of the insurance industry, the products that insurance companies can provide on the internet always have the following characteristics: 1) the coverage time is no more than 1 year; 2) the prices are lower than those of long-term insurance; 3) they cover a wide range, including property and casualty, etc.; 4) customers are not required to have bought other insurance products earlier. However, recommending insurance products online is challenging. First, insurance policies are so complex that ordinary users lack the knowledge to understand them. Besides, insurance products are typically bought to be used for a long time period (e.g. one year for car insurance), so there exist data sparsity and cold start problems. Researchers have tried to solve the problem with recommendation systems (RS) [5, 9, 11]; however, these methods directly apply traditional RS models to the insurance domain, neglecting item complexity and data sparsity.

PingAn Jinguanjia (PAJGJ) is one of the most popular comprehensive applications (APP) in China. In addition to traditional e-commerce products (defined as nonfinancial products in this paper), e.g. household supplies, it also provides financial products such as insurance products and investment services. Here we focus on recommending insurance products. Traditional RS methods, like collaborative filtering (CF), cannot perform effectively in the insurance domain because of its particular characteristics.
To get more accurate recommendations, our company has tried to use side information from PAJGJ (interaction behaviors from the nonfinancial domain), but to little avail.

Cross-domain recommendation (CDR) [4, 6, 7], which employs data from multiple domains, is one of the promising ways to solve the data sparsity and cold start problems. Generally, CDR methods fall into two categories. One aims at improving the overall performance in the target domain by aggregating knowledge between two domains [6]. The other aims at inferring the preferences of cold start users based on their preferences observed in other domains [4, 7]. These methods assume that there exists overlap in information between users and/or items across different domains, and train a mapping function from the source domain into the target domain. Unfortunately, we cannot apply CDR methods to the insurance and nonfinancial domains directly because of their properties.

Based on these observations, we propose a novel framework called Deep Cross-Domain Insurance Recommendation System (DCDIR) for cold start users. Specifically, we first try to learn more effective user and item latent features in both the source and target domains. In the target domain, given the complexity of insurance products, we design a meta-path based method over the knowledge graph we constructed. In the source domain, we employ a gated recurrent unit (GRU) to model users' dynamic interests. After obtaining the latent features of the overlapping users, a feature mapping function between the two domains is learned by a multi-layer perceptron (MLP).

In summary, our contributions in this paper are as follows:
• To the best of our knowledge, this is the first work to utilize a cross-domain mechanism to give personalized recommendations for cold start users in the insurance domain.
• Given the complexity of insurance products, we design a meta-path based method to learn more effective latent user and item features, revealing the reasons behind recommendations.
• We conduct experiments on our company's scenarios, and the results demonstrate the efficacy of DCDIR over several baselines.
Let $\mathcal{U} = \{u_1, u_2, \ldots, u_m\}$ denote the overlapping users between the nonfinancial domain (source domain) $D_s$ and the insurance domain (target domain) $D_t$. If a user appears in only one domain, he/she is a cold start user in the other domain. The user-item interaction matrices are denoted as $Y^s \in \mathbb{R}^{m \times s}$ and $Y^t \in \mathbb{R}^{m \times t}$, which are defined according to users' implicit feedback. We additionally use $\mathcal{NI}^s_u$ and $\mathcal{NI}^t_u$ for the sequences of items that user $u$ has interacted with in the source and target domains. We also have an insurance knowledge graph (ISKG) $\mathcal{G}$, which consists of multiple entity types (i.e. Product, Feature, Need) and many entity-relation-entity triples $(h, r, t)$. For example, (travel accident insurance, insurance.product-insurance.type, accident insurance) states that the type of "travel accident insurance" is accident insurance. Given the rating matrices and the ISKG, our goal is to learn the mapping function from the nonfinancial domain to the insurance domain, which helps us deal with cold start users.

To provide recommendations to cold start users, we propose DCDIR. As shown in Figure 1, DCDIR contains three main parts: learning user latent features in the target domain, learning user latent features in the source domain, and mapping user latent features between the two domains.
Figure 1: The Framework of DCDIR
As mentioned above, the complexity of insurance products is non-trivial, and understanding the items may impose a considerable cognitive load [11]. To help users better understand insurance products, we design a meta-path based method. Figure 2 shows the framework. We first pretrain the KG with TransD [3] to obtain entity and relation embeddings, denoted by $\mathbf{e}, \mathbf{r} \in \mathbb{R}^d$. Then, we generate meta-paths connecting the user's interacted items and the target item. To select high-quality meta-paths, we design a score function. Finally, we use GRU to model each meta-path and employ max-pooling to aggregate the selected paths.

Figure 2: Meta-Path based ISKG Module
The triples in the KG describe relational properties of items, which constitute several paths between the user's interacted items and the target item. For a given user $u$, we formally define the path from $i \in \mathcal{NI}^t_u$ to the target item $v$ as a sequence of entities and relations:

$$p_{e_0,e_L} = \left[e_0 \xrightarrow{r_0} e_1 \xrightarrow{r_1} \cdots \xrightarrow{r_{L-1}} e_L\right],$$

where $e_0 = i \in \mathcal{NI}^t_u$, $e_L = v$, $(e_l, r_l, e_{l+1})$ is the $l$-th triple in $p_{e_0,e_L}$, and $L$ denotes the number of triples in the path. We use $S_u = \{p_{e_0,e_L} \mid e_0 \in \mathcal{NI}^t_u\}$ to denote all generated paths of $u$. From the construction of the ISKG, we know that relation $r_{l-1}$ and entity $e_l$ have similar semantics, so the embedding of $p_{e_0,e_L}$ is denoted as $[\mathbf{e}_0, \mathbf{e}_1, \ldots, \mathbf{e}_L]$. Because long meta-paths are likely to introduce noisy semantics [12], we design two meta-paths based on our scenario, in which we fix the entity types and the path length. They are represented as $(P, F, N, F, P)$ and $(P, N, F, N, P)$, where $P$, $N$ and $F$ denote "Product", "Need" and "Feature". Here are two examples.

(P, F, N, F, P): GEF → compensate critical illness → insure critical illness → high premium → AX BFB
(P, N, F, N, P): ESB → insure medical treatment → high level assurance → insure accident → BW RW X

where GEF is critical illness insurance, ESB is health insurance, and AX BFB and BW RW X are accident insurances.
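To make the meta-path generation step concrete, the following is a minimal sketch that enumerates all paths of a fixed type pattern between an interacted product and a target product. It is an illustration only and not the authors' implementation: the toy entities (`prod_a`, `feat_x`, ...), the undirected and untyped edges, and the rule that an entity may not repeat within a path are all assumptions made for this example.

```python
from collections import defaultdict

# Toy knowledge graph (hypothetical names): entity -> type, plus undirected edges.
ENTITY_TYPE = {
    "prod_a": "P", "prod_b": "P",
    "feat_x": "F", "feat_y": "F",
    "need_1": "N", "need_2": "N",
}
EDGES = [
    ("prod_a", "feat_x"), ("feat_x", "need_1"), ("need_1", "feat_y"),
    ("feat_y", "prod_b"), ("prod_a", "need_2"), ("need_2", "feat_y"),
    ("need_1", "prod_b"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map from (head, tail) pairs."""
    adj = defaultdict(set)
    for head, tail in edges:
        adj[head].add(tail)
        adj[tail].add(head)
    return adj


def enumerate_meta_paths(start, target, pattern, adj, entity_type):
    """Return every entity sequence from start to target whose entity
    types match the fixed pattern, e.g. ("P", "F", "N", "F", "P")."""
    if entity_type[start] != pattern[0]:
        return []
    paths = []

    def extend(path):
        if len(path) == len(pattern):
            if path[-1] == target:
                paths.append(list(path))
            return
        wanted = pattern[len(path)]
        for nxt in sorted(adj[path[-1]]):
            # Assumption: an entity is not visited twice in one path.
            if entity_type[nxt] == wanted and nxt not in path:
                path.append(nxt)
                extend(path)
                path.pop()

    extend([start])
    return paths
```

Calling `enumerate_meta_paths("prod_a", "prod_b", ("P", "F", "N", "F", "P"), build_adjacency(EDGES), ENTITY_TYPE)` on this toy graph yields a single path through `feat_x`, `need_1` and `feat_y`. Fixing the pattern keeps the search depth bounded, which matches the motivation for avoiding long meta-paths.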
There are still many meta-paths even though we have fixed the path structure. Some of the paths bring much more noise than useful signal, so we use a top-$K$ sampling module to select $K$ useful paths. Specifically, for a given path $p_{e_0,e_L} = [e_0, e_1, \ldots, e_L]$, we define a score function:

$$s_{e_0,e_L} = \mathrm{softmax}\left(\frac{P}{|\mathcal{NI}^t_u|}\right) + \frac{\mathbf{e}_L^{T}}{\|\mathbf{e}_L\|} \sum_{i=0}^{L-1} \frac{\mathbf{e}_i}{\|\mathbf{e}_i\|}, \qquad (1)$$

where $P$ is $e_0$'s position in $\mathcal{NI}^t_u$. The first part of (1) measures interaction time, since more recent items in a sequence have a larger impact on users' next actions. The second part measures the similarity between the path and the target item. For a user, we select the top-$K$ paths with the highest scores, denoted by the set $S^{top\text{-}K}_u$, where $K$ is a given parameter.

3.1.3 Path Embedding and User Feature Representation. A path instance is a sequence of entity nodes. To embed such a sequence into a low-dimensional vector, we use GRU [2]. The formulations are:

$$\begin{aligned}
\mathbf{x}_n &= \sigma(W_x \mathbf{e}_n + U_x \mathbf{h}_{n-1} + \mathbf{b}_x) \\
\mathbf{r}_n &= \sigma(W_r \mathbf{e}_n + U_r \mathbf{h}_{n-1} + \mathbf{b}_r) \\
\tilde{\mathbf{h}}_n &= \tanh(W_h \mathbf{e}_n + \mathbf{r}_n \circ U_h \mathbf{h}_{n-1} + \mathbf{b}_h) \\
\mathbf{h}_n &= (1 - \mathbf{x}_n) \circ \mathbf{h}_{n-1} + \mathbf{x}_n \circ \tilde{\mathbf{h}}_n,
\end{aligned} \qquad (2)$$

where $\sigma$ is the sigmoid function, $\circ$ is the element-wise product, $W_x, W_r, W_h \in \mathbb{R}^{n_H \times d}$, $U_x, U_r, U_h \in \mathbb{R}^{n_H \times n_H}$, and $n_H = d$ is the hidden size. Let $\mathbf{p}_{e_0,e_L} = \mathbf{h}_n$, the final hidden state, and apply max-pooling, i.e.:

$$\mathbf{u}^t = \text{max-pooling}\left\{\mathbf{p}_{e_0,e_L} \mid p_{e_0,e_L} \in S^{top\text{-}K}_u\right\}.$$

In our APP, each item $i$ in the nonfinancial domain is associated with a description $c_i$. In order to learn more effective latent features, we employ word2vec [8]. Suppose there are $n$ words in $i$'s content $c_i$. We utilize word2vec to obtain word vectors, which are represented as $\{\mathbf{w}_{ik}\}_{k=1}^{n}$. Then we get the final item embedding by:

$$\mathbf{i} = \text{max-pooling}\left(\text{concat}\left(\{\mathbf{w}_{ik}\}_{k=1}^{n}\right)\right).$$

To model the user latent feature $\mathbf{u}^s$, we employ GRU over $\mathcal{NI}^s_u$ and let $\mathbf{u}^s = \mathbf{h}^{GRU}_n(\mathcal{NI}^s_u)$; the equations are obtained by replacing $\mathbf{e}_n$ with $\mathbf{i}_n$ in Eq. (2).
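As an illustration of the target-domain pipeline built from Eqs. (1) and (2), the following minimal sketch scores candidate paths, keeps the top $K$, encodes each kept path with a GRU, and max-pools the results into a user feature. It is a sketch under stated assumptions rather than the authors' code: all function names are hypothetical, vectors are plain Python lists, the initial hidden state is assumed to be zero, and the softmax in Eq. (1) is assumed to be taken across the positions $1, \ldots, |\mathcal{NI}^t_u|$ of the user's sequence.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unit(v):
    """L2-normalise a vector (returned unchanged if it is all zeros)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def recency_weights(num_items):
    # Assumption: softmax over P/|NI| across positions P = 1..|NI|.
    logits = [pos / num_items for pos in range(1, num_items + 1)]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def path_score(entity_embs, position, num_items):
    """Eq. (1): recency term plus similarity of target e_L to e_0..e_{L-1}."""
    target = unit(entity_embs[-1])
    similarity = sum(dot(target, unit(e)) for e in entity_embs[:-1])
    return recency_weights(num_items)[position - 1] + similarity


def top_k_paths(paths, k, num_items):
    """paths: list of (position of e_0 in the sequence, entity embeddings)."""
    ranked = sorted(paths, key=lambda p: path_score(p[1], p[0], num_items),
                    reverse=True)
    return ranked[:k]


def gate(W, U, b, e_n, h_prev):
    """sigmoid(W e_n + U h_{n-1} + b), computed row by row."""
    return [1.0 / (1.0 + math.exp(-(dot(w, e_n) + dot(u, h_prev) + bias)))
            for w, u, bias in zip(W, U, b)]


def gru_step(p, e_n, h_prev):
    """One step of Eq. (2) with update gate x_n and reset gate r_n."""
    x = gate(p["Wx"], p["Ux"], p["bx"], e_n, h_prev)
    r = gate(p["Wr"], p["Ur"], p["br"], e_n, h_prev)
    cand = [math.tanh(dot(w, e_n) + r_k * dot(u, h_prev) + bias)
            for w, u, bias, r_k in zip(p["Wh"], p["Uh"], p["bh"], r)]
    return [(1.0 - x_k) * h_k + x_k * c_k
            for x_k, h_k, c_k in zip(x, h_prev, cand)]


def encode_path(p, entity_embs):
    h = [0.0] * len(p["bx"])  # assumed zero initial hidden state
    for e_n in entity_embs:
        h = gru_step(p, e_n, h)
    return h


def user_target_feature(p, paths, k, num_items):
    """Select top-k paths, encode each with the GRU, then max-pool."""
    selected = top_k_paths(paths, k, num_items)
    encoded = [encode_path(p, embs) for _, embs in selected]
    return [max(column) for column in zip(*encoded)]
```

Here `p` is a dictionary holding the nine GRU parameters of Eq. (2) as nested lists. In practice these would be learned jointly with the rest of the model, whereas the sketch only shows the forward computation.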
We employ MLP [7] to learn the mapping function between the two domains, taking $\mathbf{u}^s$ as input and $\mathbf{u}^t$ as output. The loss function is:

$$L_{cross} = \sum_{u \in \mathcal{U}} \left\| f_{mlp}(\mathbf{u}^s) - \mathbf{u}^t \right\|.$$

In the training process, the loss functions of all parts are added together for joint optimization. The overall loss function is:

$$L = L_{cross} + L_T + L_S,$$

where $L_T$ and $L_S$ are the recommendation losses in the target and source domains, respectively. Taking the target domain as an example,

$$L_T = \sum_{(u,v) \in Y^t} -\left(y_{uv} \log \hat{y}_{uv} + (1 - y_{uv}) \log(1 - \hat{y}_{uv})\right),$$

where $\hat{y}_{uv} = \sigma(f(\mathbf{u}^t, \mathbf{v}^t))$, $\sigma(\cdot)$ is the sigmoid function, and $f$ is a ranking function which can be a dot product or a deep neural network.

In this paper, we assume cold start users have interactions in the nonfinancial domain but no interactions in the insurance domain. After learning the latent features $\mathbf{u}^s$ in the nonfinancial domain, we obtain the corresponding mapped latent features $\hat{\mathbf{u}}^t = f_{mlp}(\mathbf{u}^s)$. Based on the learned $\hat{\mathbf{u}}^t$, we can make recommendations to cold start users.

Table 1: Statistics of the JGJISNF dataset.
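| Domain | Statistic | Value |
|---|---|---|
| IS-domain (Target domain) | Items | 42 |
| IS-domain (Target domain) | Interactions | 300,000 |
| IS-domain (Target domain) | KG relations | 7 |
| IS-domain (Target domain) | KG entities | 77 |
| IS-domain (Target domain) | KG triples | 282 |
| NF-domain (Source domain) | Items | 3,836 |
| NF-domain (Source domain) | Interactions | 600,000 |
| Cross-domain | Overlapped-users | 21,016 |
| Cross-domain | Training-sequences | 12,437 |
| Cross-domain | Test-sequences | 4,218 |
| Cross-domain | Validation-sequences | 4,298 |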
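Returning to the mapping step described before Table 1, the following minimal sketch shows how $L_{cross}$ could be computed and how a cold start user could then be served. It is only an illustration under assumptions: the function names are hypothetical, hidden layers use tanh, the norm is taken to be Euclidean, and items are ranked by a dot product (one of the two choices the text allows for $f$).

```python
import math


def mlp_forward(layers, u_s):
    """Apply f_mlp. layers is a list of (W, b) pairs; hidden layers use
    tanh (an assumption), the last layer is linear."""
    h = list(u_s)
    for idx, (W, b) in enumerate(layers):
        h = [sum(w * x for w, x in zip(row, h)) + bias
             for row, bias in zip(W, b)]
        if idx < len(layers) - 1:
            h = [math.tanh(x) for x in h]
    return h


def cross_loss(layers, pairs):
    """L_cross: sum over overlapping users of ||f_mlp(u_s) - u_t||,
    with the norm assumed to be Euclidean."""
    total = 0.0
    for u_s, u_t in pairs:
        pred = mlp_forward(layers, u_s)
        total += math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, u_t)))
    return total


def recommend_for_cold_start(layers, u_s, item_embs, top_n=3):
    """Map a source-domain feature into the target space and rank
    target items by dot product with the mapped feature."""
    u_hat = mlp_forward(layers, u_s)
    scores = {item: sum(a * b for a, b in zip(u_hat, emb))
              for item, emb in item_embs.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Only overlapping users contribute pairs to `cross_loss`, since they are the only users for whom both $\mathbf{u}^s$ and $\mathbf{u}^t$ exist. At inference time a cold start user needs nothing but the source-domain feature.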
We conduct extensive experiments to answer the following questions:
RQ1: How does the DCDIR model perform compared with baselines in terms of NDCG and Recall@3?
RQ2: Can DCDIR alleviate the data sparsity problem?
RQ3: How does the path-based ISKG module affect the performance of DCDIR for cold start users?
Datasets.
There is no publicly available dataset for CDR-ISNF (cross-domain recommendation for insurance and nonfinancial products). To demonstrate the overall effectiveness of the proposed DCDIR model, we build and release a sub-dataset (named JGJISNF) from a comprehensive e-commerce dataset that contains the purchase logs of about 20 million users from June 1st 2018 to May 31st 2019. The purchase logs are collected on the IS-domain and the NF-domain from PAJGJ, a well-known e-commerce platform. The IS-domain contains interactions with short-term insurance products (with periods of less than 1 year, e.g., illness insurance, accident insurance, etc.). The NF-domain contains user logs of non-financial products (daily necessities, e.g., clothes, skincare products, fruits, electronics, etc.). In the two domains, we gather chronological user behaviors, user profiles and detailed product descriptions. Due to the complexity of insurance products, we construct a knowledge graph of insurance products based on their own information.
Comparative Models and Metrics.
We compare DCDIR with four baselines and two variants of DCDIR. The compared models can be categorized into a single-domain group (BPR [10] and GRU4REC [2]) and a cross-domain group (EMCDR-BPR [7], EMCDR-GRU, DCDIR, DCDIR-V1 and DCDIR-V2). The first group is used to validate the usefulness of CDR models, and the second group is used to demonstrate the advantage of the path-based method. DCDIR leverages the path-based method to deal with the complex knowledge graph of insurance products, while DCDIR-V1 and DCDIR-V2 use only simple product attributes and a KGE method (2-hop entity aggregation over the ISKG), respectively.

We evaluate all models in terms of Recall@N (N=3) and NDCG. We adopt a common and widely used strategy to avoid heavy computation on evaluating all user-item pairs [1, 13, 14]. For each user $u$, we randomly sample negative items that do not appear in the training set and rank them together with the single ground-truth item.

Parameter Setting.
We randomly select 30% of the overlapped users and remove their information in the target domain, treating them as cold start users for evaluating performance (i.e., test users). To study the performance of DCDIR with respect to the number of overlapped users, we restrict the number of overlapped users similarly to the real-world distribution. We build four training sets, each with a certain fraction η of the overlapped users who do not belong to the test users; η = 10% is the sparsest setting. The following settings are chosen with grid search on the validation set. Item embedding size and GRU hidden state size are set to 50. We use dropout with drop ratio p = 0.8. For the parameters in Section 3.1.2 (path-based method section), we try different settings, the analysis of which can be found in Section 4.3. For the hyper-parameters of the Adam optimizer, we set the learning rate α = 0.001. To speed up training and converge quickly, the batch size is set to 32. We test the model performance on the validation set after every epoch.

Table 2: Performance comparison in Recall@3 and NDCG. The best baseline except DCDIR is bolded. Numbers in "()" represent the percentage of the three variants' performance at η = 10% compared with their best performance at the other sparsity levels.

| Method | NDCG (η=10%) | Recall@3 (η=10%) | NDCG (η=…%) | Recall@3 (η=…%) | NDCG (η=…%) | Recall@3 (η=…%) | NDCG (η=…%) | Recall@3 (η=…%) |
|---|---|---|---|---|---|---|---|---|
| BPR | 0.27011 | 0.06418 | 0.27105 | 0.06518 | 0.27133 | 0.06451 | 0.27325 | 0.07124 |
| GRU4REC | 0.23923 | 0.02143 | 0.25964 | 0.07768 | 0.30725 | 0.09611 | 0.30623 | 0.08602 |
| EMCDR-BPR | 0.27343 | 0.07291 | 0.27342 | 0.07291 | 0.27342 | 0.07325 | 0.27347 | 0.07325 |
| EMCDR-GRU | 0.26775 | 0.11794 | 0.26801 | 0.11794 | 0.29056 | 0.11996 | 0.31288 | 0.12298 |
| DCDIR-V1 | 0.34781 (-4.66%) | 0.17321 (-6.28%) | 0.35196 | 0.18016 | 0.35653 | 0.18078 | 0.36481 | 0.18481 |
| DCDIR-V2 | … (-13.05%) | … (-27.60%) | … | … | … | … | … | … |
| DCDIR | 0.39394 (-3.95%) | 0.25185 (-5.31%) | 0.39741 | 0.25227 | 0.40773 | 0.26268 | 0.41016 | 0.26597 |
| DCDIR vs. best | 8.59% | … | … | … | … | … | … | … |

To answer RQ1 and RQ2, three variants of DCDIR are compared with four state-of-the-art models at different data densities. Table 2 shows the performance comparison. Overall, benefiting from the proposed KG path-based representations of insurance products and from source domain information, DCDIR beats all comparative methods, achieving improvements in the range of 0.22%-8.59% and 0.51%-26.23% over the best comparative model in NDCG and Recall@3, respectively, under all levels of data sparsity. These experiments reveal a number of interesting findings: (1) All cross-domain methods yield better performance than single-domain methods trained on a mixture of target and source domain data, demonstrating the importance of the cross-domain module; (2) Owing to their ability to use knowledge about insurance products, the three variants of DCDIR (DCDIR, DCDIR-V1 and DCDIR-V2) defeat the other comparative methods; (3) DCDIR achieves larger improvements on a sparser dataset than on a denser one. This validates that, compared with the comparative approaches, DCDIR can better diminish the negative impact of the data sparsity issue.

We also conduct experiments to compare DCDIR with DCDIR-V1 and DCDIR-V2 (defined in Section 4.1, Comparative Models). The numbers in "()" show that the performance of DCDIR-V2, which uses the KGE method, declines sharply in terms of NDCG (-13.05%) and Recall@3 (-27.60%) when a sparser dataset is used, while DCDIR-V1 cannot outperform DCDIR at any level of sparsity. This shows that DCDIR achieves more stable and better performance with limited data.
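The two metrics reported in Table 2 can be sketched for the sampled-negatives protocol, in which each test user has a single ground-truth item ranked among sampled negatives. This is a minimal illustration, not the authors' evaluation script: the function names are hypothetical, and NDCG is assumed to be computed over the full candidate list without a cutoff, since the text does not state one.

```python
import math


def rank_of_ground_truth(scores, truth):
    """1-based rank of the ground-truth item when all candidates
    (ground truth plus sampled negatives) are sorted by score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(truth) + 1


def recall_at_n(rank, n=3):
    """With one relevant item, Recall@N is 1 if it is in the top N."""
    return 1.0 if rank <= n else 0.0


def ndcg(rank):
    """With one relevant item the ideal DCG is 1, so NDCG reduces to
    1 / log2(rank + 1). No cutoff is applied (an assumption)."""
    return 1.0 / math.log2(rank + 1)


def evaluate(users, n=3):
    """users: list of (scores dict, ground-truth item). Returns the mean
    Recall@N and mean NDCG over all test users."""
    ranks = [rank_of_ground_truth(scores, truth) for scores, truth in users]
    mean_recall = sum(recall_at_n(r, n) for r in ranks) / len(ranks)
    mean_ndcg = sum(ndcg(r) for r in ranks) / len(ranks)
    return mean_recall, mean_ndcg
```

Because each user has exactly one relevant item, both metrics depend only on the rank of that item, which is why ranking against sampled negatives is much cheaper than scoring all user-item pairs.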
The cold start problem is one of the major challenges for RS. It is necessary to study whether our meta-path based ISKG module can deal with the cold start user problem effectively. Therefore, we compare DCDIR under different parameter values, namely the number of selected paths and the strategy for choosing high-quality paths, on an extremely sparse dataset with η = 10%, where the split into training, testing and validation sets is as introduced above. Table 3 indicates that, under the cold start problem, the best ISKG module parameters for DCDIR are a path number of 20 and our top-K path selection strategy, in terms of both Recall@3 and NDCG. Specifically, the path strategy affects the performance of DCDIR significantly, with a large improvement in both Recall@3 and NDCG. The top-K strategy optimizes the choice of high-quality KG paths of insurance products, leveraging rich and complicated information while reducing interfering information. Therefore, DCDIR can better handle cold start users.

Table 3: Performance comparison in Recall@3 and NDCG under a sparse setting (η = 10%) with changing path number and path selection strategy.

| ISKG module parameter | Value | NDCG | Recall@3 |
|---|---|---|---|
| path_num | 10 | 0.36611 | 0.18207 |
| path_num | 20 | 0.39394 | 0.25185 |
| path_num | 30 | 0.38435 | 0.18541 |
| path_strategy | 'topk' | 0.39394 | 0.25185 |
| path_strategy | 'random' | 0.34624 | 0.16065 |

To deal with insurance product complexity and the cold start problem, we propose DCDIR for cold start users. Specifically, we first learn more effective user and item latent features in the two domains. In the target domain, given the complexity of insurance products, we design a meta-path based method over an insurance product knowledge graph, which can provide interpretable recommendations to users. In the source domain, we employ GRU to model users' dynamic interests. Then we learn a feature mapping function with a multi-layer perceptron. We apply DCDIR to our company's dataset and show that DCDIR significantly outperforms state-of-the-art solutions.
REFERENCES
[1] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T. Chua. 2017. Neural Collaborative Filtering. In WWW. 173–182.
[2] B. Hidasi, A. Karatzoglou, L. Baltrunas, and D. Tikk. 2016. Session-based Recommendations with Recurrent Neural Networks. In ICLR.
[3] G. Ji, S. He, L. Xu, K. Liu, and J. Zhao. 2015. Knowledge Graph Embedding via Dynamic Mapping Matrix. In ACL. 687–696.
[4] S. K. Kang, J. Hwang, D. Lee, and H. Yu. 2019. Semi-Supervised Learning for Cross-Domain Recommendation to Cold-Start Users. In CIKM. 1563–1572.
[5] Z. Liu, C. Zang, K. Kuang, H. Zou, H. Zheng, and P. Cui. 2019. Causation-Driven Visualizations for Insurance Recommendation. In ICME Workshops. 471–476.
[6] M. Ma, P. Ren, Y. Lin, Z. Chen, J. Ma, and M. D. Rijke. 2019. π-Net: A Parallel Information-sharing Network for Shared-account Cross-domain Sequential Recommendations. In SIGIR. 685–694.
[7] T. Man, H. Shen, X. Jin, and X. Cheng. 2017. Cross-Domain Recommendation: An Embedding and Mapping Approach. In IJCAI. 2464–2470.
[8] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. In NIPS.
[9] M. Qazi, G. M. Fung, K. J. Meissner, and E. R. Fontes. 2017. An Insurance Recommendation System Using Bayesian Networks. In RecSys. 274–278.
[10] S. Rendle, C. Freudenthaler, Z. Gantner, and L. S. Thieme. 2009. BPR: Bayesian Personalized Ranking from Implicit Feedback. In UAI. 452–461.
[11] L. Rokach, G. Shani, B. Shapira, E. Chapnik, and G. Siboni. 2013. Recommending insurance riders. In SAC. 253–260.
[12] Y. Sun, J. Han, X. Yan, P. S. Yu, and T. Wu. 2011. PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous Information Networks. In PVLDB.
[13] X. Wang, X. He, Y. Cao, M. Liu, and T. Chua. 2019. KGAT: Knowledge Graph Attention Network for Recommendation. In SIGKDD. 950–958.
[14] X. Wang, D. Wang, C. Xu, X. He, Y. Cao, and T. Chua. 2019. Explainable Reasoning over Knowledge Graphs for Recommendation. In