CRSLab: An Open-Source Toolkit for Building Conversational Recommender System
Kun Zhou†, Xiaolei Wang†, Yuanhang Zhou, Chenzhan Shang, Yuan Cheng, Wayne Xin Zhao∗, Yaliang Li, and Ji-Rong Wen

School of Information, Renmin University of China; Gaoling School of Artificial Intelligence, Renmin University of China; School of Statistics, Renmin University of China; Beijing Key Laboratory of Big Data Management and Analysis Methods; Alibaba Group
Abstract
In recent years, the conversational recommender system (CRS) has received much attention in the research community. However, existing studies on CRSs vary in scenarios, goals and techniques, and lack a unified, standardized implementation or comparison. To tackle this challenge, we propose an open-source CRS toolkit, CRSLab, which provides a unified and extensible framework with highly-decoupled modules to develop CRSs. Based on this framework, we collect 6 commonly-used human-annotated CRS datasets and implement 18 models, including recent techniques such as graph neural networks and pre-training models. Besides, our toolkit provides a series of automatic evaluation protocols and a human-machine interaction interface to test and compare different CRS methods. The project and documents are released at https://github.com/RUCAIBox/CRSLab.

† Equal contribution. ∗ Corresponding author. Email: batmanfl[email protected]

Introduction

Recent years have witnessed remarkable progress in conversational recommender systems (CRSs) (Christakopoulou et al., 2016; Sun and Zhang, 2018; Li et al., 2018), which aim to provide high-quality recommendations to users through natural language conversations. To build effective CRSs, researchers have proposed a number of datasets (Kang et al., 2019; Zhou et al., 2020c; Liu et al., 2020) and models (Lei et al., 2020; Chen et al., 2019; Liao et al., 2019). However, these works differ in scenarios (e.g., movies or e-commerce platforms), goals (e.g., accurate recommendation or user activation) and techniques (e.g., graph neural networks or pre-training models), so it is challenging for users to quickly set up reasonable baseline systems or develop new CRS models.
To alleviate the above issues, we have developed CRSLab, the first open-source CRS toolkit for research purposes. In CRSLab, we offer a unified and extensible framework with highly-decoupled modules to develop a CRS. Specifically, we unify the task description of existing works on CRSs into three sub-tasks, namely recommendation, conversation and policy, covering the common functional requirements of mainstream CRSs. To implement the overall framework, we design and develop highly-decoupled modules (e.g., data modules and model modules) that provide clear interfaces. Besides, we encapsulate useful procedures and common functions shared by different modules for reuse. In this way, it is easy for users to add new datasets or develop new models with our toolkit.

Based on the framework, we integrate comprehensive benchmark datasets and models in CRSLab. So far, we have incorporated 6 commonly-used human-annotated datasets and implemented 18 models, including advanced techniques such as graph neural networks (GNNs) (Schlichtkrull et al., 2018; Zhou et al., 2020a) and pre-training models (Devlin et al., 2019; Zhou et al., 2020b). To support these models, we perform the necessary preprocessing on the integrated datasets (e.g., entity linking and word segmentation) and release the processed data. CRSLab provides flexible supporting mechanisms, via configuration files or the command line, to run, compare and test these models on the integrated datasets, with which users can develop a powerful CRS.

Furthermore, CRSLab provides a series of automatic evaluation protocols and a human-machine interaction interface for testing and comparing different CRSs, which help to standardize the evaluation protocol for conversational recommendation. Specifically, we implement various automatic evaluation metrics to test a CRS on the recommendation, conversation and policy tasks, respectively, covering commonly-used metrics in existing works.
In addition, CRSLab provides a human-machine interactive interface to perform quantitative analysis, which helps users deploy their systems and converse with them via a webpage.

Task Description

As aforementioned, existing CRS datasets and models vary in scenarios and goals, so the corresponding domains and task definitions can differ, which creates gaps when applying existing models to different datasets or scenarios. To fill these gaps, based on previous works (Lei et al., 2020; Zhou et al., 2020c; Sun and Zhang, 2018), we unify the task of CRS into two basic sub-tasks and an auxiliary sub-task, namely recommendation, conversation and policy. These three sub-tasks are described as follows: given the dialog context (i.e., historical utterances) and other useful side information (e.g., interaction history and knowledge graph), we aim to (1) predict user-preferred items (recommendation), (2) generate a proper response (conversation), and (3) select a proper interactive action (policy).

It is worth noting that the above task description covers most CRS models and datasets. The recommendation and conversation sub-tasks are considered by all of these works. The policy sub-task is needed by recent works (Zhou et al., 2020c; Lei et al., 2020), through which the CRS can proactively guide the dialog towards better recommendations. The policy sub-task can differ across goals and scenarios. For example, TG-ReDial (Zhou et al., 2020c) utilizes a topic prediction model to accomplish the policy sub-task, while DuRecDial (Liu et al., 2020) defines it as a goal planning task.
Framework

The overall framework of our toolkit CRSLab is presented in Figure 1. The configuration module provides a flexible interface for users to easily set up the experiment environment (e.g., datasets, models and hyperparameters). The data, model and evaluation modules are built upon the configuration module and form the core part of our toolkit. The bottom part is the utility module, providing auxiliary functions and interfaces for reuse in other modules (e.g., logger and resource). In the following, we briefly present the designs of the above modules; more details can be found in the toolkit documents.

[Figure 1: The overall framework of CRSLab. Modules: Configuration (configuration file, command line); Data (Dataset, DataLoader); Model (Model, System); Evaluator (Metrics, Evaluator); Utilities (Layers, Scheduler, Resource, Logger).]
Configuration Module

In CRSLab, we design the configuration module for users to conveniently select or modify the experiment setup (e.g., dataset, model and hyperparameters). Specifically, we design the class Config to store all the configuration settings, which specify the model and its hyperparameters for each component of the CRS, as well as the environment for a given experiment. To avoid specifying complicated command-line parameters, we expose a few commonly-used settings (i.e., file path and debug mode) on the command line, while keeping the others in YAML configuration files. In this way, users can build and evaluate a variety of different CRSs with only slight modifications to the configuration files.
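As a rough illustration of this setup, merging file-level settings with command-line overrides might be sketched as follows. The class body, option names and flags here are hypothetical stand-ins, not CRSLab's actual implementation (which reads YAML files):

```python
import argparse

class Config:
    """Minimal sketch of a configuration store: settings loaded from a
    config file are overridden by command-line flags."""
    def __init__(self, file_opts, cli_args=None):
        self.opts = dict(file_opts)  # base settings from the config file
        parser = argparse.ArgumentParser()
        parser.add_argument('--dataset')
        parser.add_argument('--debug', action='store_true')
        args, _ = parser.parse_known_args(cli_args or [])
        # command-line values take precedence over file values
        for key, value in vars(args).items():
            if value not in (None, False):
                self.opts[key] = value

    def __getitem__(self, key):
        return self.opts[key]

# the command line overrides the dataset chosen in the file settings
config = Config({'dataset': 'ReDial', 'model': 'KGSF', 'lr': 1e-3},
                cli_args=['--dataset', 'TG-ReDial'])
```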
Data Module

For extensibility and reusability, we design an elegant data flow that transforms a raw dataset into model input as follows: Raw Public Dataset → Preprocessed Dataset → Dataset → DataLoader → System. Next, we detail the design of these components.
Since raw public datasets vary in formats and features, we preprocess these datasets to support unified interfaces in the data modules. Based on the task description above, we first preprocess CRS datasets to match the input and output formats. Specifically, we organize the dialog context and side information as the input, and extract the recommended items, dialog actions and responses as the output of the recommendation, policy and conversation sub-tasks, respectively. To support some advanced models (e.g., graph neural networks and pre-training models), we incorporate useful side data (e.g., knowledge graphs) and conduct specific preprocessing (e.g., entity linking and BPE segmentation).

As shown in Table 1, we have collected 6 commonly-used human-annotated datasets and released the preprocessed versions with the side data in CRSLab. Besides, we also release the pre-trained word embeddings and other associated files, which ease the use of the integrated datasets and reduce the time cost.

Dataset                        | Dialogs | Utterances | Domain       | Policy Model        | Entity KG | Word KG
ReDial (Li et al., 2018)       | 10,006  | 182,150    | Movie        | –                   | DB        | CNet
TG-ReDial (Zhou et al., 2020c) | 10,000  | 129,392    | Movie        | Topic Prediction    | CN-DB     | HNet
GoRecDial (Kang et al., 2019)  | 9,125   | 170,904    | Movie        | Action Prediction   | DB        | CNet
DuRecDial (Liu et al., 2020)   | 10,200  | 156,000    | Movie, Music | Goal Planning       | CN-DB     | HNet
INSPIRED (Hayati et al., 2020) | 1,001   | 35,811     | Movie        | Strategy Prediction | DB        | CNet
OpenDialKG (Moon et al., 2019) | 13,802  | 91,209     | Movie, Book  | Path Generation     | DB        | CNet

Table 1: The collected datasets in CRSLab. DB and CN-DB stand for the entity-oriented knowledge graphs DBpedia and CN-DBpedia, respectively. CNet and HNet stand for the word-oriented knowledge graphs ConceptNet and HowNet, respectively.
To decouple the implementation of data preparation in CRSLab, we design the class Dataset to integrate the model-independent data processing functions, while the remaining functions are implemented by the class DataLoader. In this way, Dataset only focuses on processing the input data into a unified format (i.e., a list of python dicts), without considering specific models. In CRSLab, we design the class BaseDataset, which includes common attributes (e.g., configurations and data paths) and basic functions (e.g., loading data) of Dataset, so users can inherit BaseDataset with very few modifications to integrate new datasets.
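The inheritance pattern described above can be sketched with a toy stand-in. The real BaseDataset's attributes and method names differ; the unified list-of-dict output follows the description above, with illustrative field names:

```python
# Toy stand-in for the BaseDataset inheritance pattern described above;
# all method and field names here are illustrative, not CRSLab's API.
class BaseDataset:
    def __init__(self, opt):
        self.opt = opt                         # configurations, data path, ...
        raw = self.load_data()                 # basic function: load raw files
        self.data = self.data_preprocess(raw)  # unified list-of-dict format

    def load_data(self):
        raise NotImplementedError

    def data_preprocess(self, raw):
        raise NotImplementedError

class MyDataset(BaseDataset):
    """A new dataset only fills in loading and preprocessing."""
    def load_data(self):
        # real code would read the downloaded corpus here
        return [('hi can you recommend a movie', 'The Martian')]

    def data_preprocess(self, raw):
        # emit the unified format: a list of python dicts
        return [{'context': c.split(), 'item': i} for c, i in raw]

dataset = MyDataset({'name': 'my_dataset'})
```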
Indeed, different CRS models need different data formats. Since Dataset has processed the input data into a unified format, DataLoader further reformulates the data to support various models. Specifically, DataLoader focuses on selecting features from the data processed by Dataset to form tensor data (i.e., torch.Tensor) in batches or mini-batches, which can be directly used for the update and computation of downstream models. To implement this, we design the class BaseDataLoader to integrate common attributes and functions, and inherit it to produce new dataloaders for the corresponding models.
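A minimal sketch of the batching step described above. The real DataLoader emits torch.Tensor batches; plain padded lists are used here to keep the sketch dependency-free, and the field name token_ids is illustrative:

```python
# Sketch of how a DataLoader-style component might turn the unified
# list-of-dict format into fixed-width batches ready for a model.
def batchify(samples, batch_size, pad=0):
    batches = []
    for start in range(0, len(samples), batch_size):
        chunk = [s['token_ids'] for s in samples[start:start + batch_size]]
        width = max(len(seq) for seq in chunk)
        # pad every sequence in the batch to the same length
        batches.append([seq + [pad] * (width - len(seq)) for seq in chunk])
    return batches

data = [{'token_ids': [5, 8]}, {'token_ids': [3]}, {'token_ids': [7, 2, 9]}]
batches = batchify(data, batch_size=2)
```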
Model and System Modules

Based on the task description and the above data modules, we reorganize the implementations of existing CRSs in a hierarchical framework, in which the model module provides functions and interfaces for building and running specific models, while the system module trains or evaluates the contained models to accomplish the defined task.
As mentioned before, a CRS may consist of several models for the corresponding sub-tasks. In the model module, we focus on providing a basic structure and useful highly-decoupled functions or procedures for development. Specifically, we unify the basic attributes and functions of various models (e.g., parameter initialization and model loading) into the class BaseModel. A user can inherit BaseModel and implement a few functions to develop and design new models.

We have carefully surveyed the recent literature and selected commonly-used models in four categories, namely CRS models, recommendation models, conversation models and policy models. Among them, CRS models integrate the recommendation model and the conversation model to improve both, while recommendation, policy and conversation models focus on one individual sub-task. As illustrated in Table 2, we mainly focus on recently proposed neural methods, and also keep some classic heuristic methods such as Popularity and PMI. In the first release version, we have implemented 18 models, including advanced models such as graph neural networks and pre-training models. For all the implemented models, we have tested their performance on two or three selected datasets, and invited a code reviewer to examine the correctness of the implementation. In the future, more methods will be incorporated along with regular updates.

Category             | Model        | GNN | PTM | Reference
CRS model            | ReDial       | ×   | ×   | (Li et al., 2018)
                     | KBRD         | √   | ×   | (Chen et al., 2019)
                     | KGSF         | √   | ×   | (Zhou et al., 2020a)
                     | TG-ReDial    | ×   | √   | (Zhou et al., 2020c)
Recommendation model | Popularity   | ×   | ×   | –
                     | GRU4Rec      | ×   | ×   | (Hidasi et al., 2016)
                     | SASRec       | ×   | ×   | (Kang and McAuley, 2018)
                     | TextCNN      | ×   | ×   | (Kim, 2014)
                     | R-GCN        | √   | ×   | (Schlichtkrull et al., 2018)
                     | BERT         | ×   | √   | (Devlin et al., 2019)
Conversation model   | HERD         | ×   | ×   | (Serban et al., 2016)
                     | Transformer  | ×   | ×   | (Vaswani et al., 2017)
                     | GPT-2        | ×   | √   | (Radford et al., 2019)
Policy model         | PMI          | ×   | ×   | –
                     | MGCG         | ×   | ×   | (Liu et al., 2020)
                     | Conv-BERT    | ×   | √   | (Zhou et al., 2020c)
                     | Topic-BERT   | ×   | √   | (Zhou et al., 2020c)
                     | Profile-BERT | ×   | √   | (Zhou et al., 2020c)

Table 2: The implemented models in CRSLab. Recommendation, policy and conversation models target the corresponding individual sub-task, while CRS models accomplish these sub-tasks together. GNN and PTM stand for graph neural networks and pre-training models, respectively.
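The BaseModel inheritance pattern can be sketched with the heuristic Popularity recommender from Table 2 as a toy example. The method names follow the text's description of building and running models; the class bodies are stand-ins, not CRSLab's code:

```python
# Sketch of the BaseModel inheritance pattern: a subclass only fills in
# how the model is built and how it computes a prediction.
class BaseModel:
    def __init__(self, opt):
        self.opt = opt
        self.build_model()  # subclass hook: create parameters, losses, ...

    def build_model(self):
        raise NotImplementedError

    def forward(self, batch):
        raise NotImplementedError

class PopularityModel(BaseModel):
    """Toy recommender: always ranks items by a fixed popularity count."""
    def build_model(self):
        self.popularity = self.opt['popularity']  # item -> count

    def forward(self, batch):
        # ignore the context and return items sorted by popularity
        return sorted(self.popularity, key=self.popularity.get, reverse=True)

model = PopularityModel({'popularity': {'A': 3, 'B': 10, 'C': 7}})
```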
To support flexible CRS architectures at a high level, we devise the system module, which serves as a junction to integrate the dataloader, model and evaluator modules into a complete CRS. Specifically, the system module sets up the models to accomplish the CRS task, distributes the tensor data from the dataloader to the corresponding models, trains the models with a proper optimization strategy, and conducts evaluation with the specified protocols.

To implement the above requirements, we design the class BaseSystem to unify the structure and interfaces, containing the corresponding functions. In BaseSystem, we also implement a series of useful functions, such as optimizer initialization, learning rate adjustment and an early-stopping strategy. These functions and small tricks ease the development of new systems and largely improve the user experience with CRSLab.
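The early-stopping strategy mentioned above might look like this in isolation: a generic patience-based scheme, not CRSLab's exact code:

```python
# Generic patience-based early stopping, as a fit()-style training loop
# in a BaseSystem-like class might use it (hypothetical sketch).
def fit(valid_losses, patience=2):
    """Stop when the validation loss fails to improve `patience` times;
    return the index of the epoch where training stops."""
    best, bad_epochs = float('inf'), 0
    for epoch, loss in enumerate(valid_losses):
        if loss < best:
            best, bad_epochs = loss, 0   # improvement: reset the counter
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch             # patience exhausted: stop here
    return len(valid_losses) - 1         # ran all epochs without stopping

stopped_at = fit([0.9, 0.7, 0.71, 0.72, 0.5])
```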
Evaluation Module

The function of the evaluation module is to implement the evaluation protocols for CRS models. In CRSLab, we implement commonly-used automatic evaluation metrics. Besides, we also design a human-machine interactive interface for users to perform an end-to-end quantitative analysis.
Since the CRS task is divided into three sub-tasks, we develop corresponding automatic metrics in the evaluation module. We summarize all the supported automatic evaluation metrics in Table 3. For the recommendation sub-task, following existing CRS models (Sun and Zhang, 2018; Zhang et al., 2018), we develop ranking-based metrics to measure the ranking performance of the recommendation lists generated by a CRS. For the conversation sub-task, CRSLab supports both relevance-based and diversity-based evaluation metrics. The relevance-based metrics include Perplexity, BLEU (Papineni et al., 2002) and Embedding metrics (Liu et al., 2016), which measure the similarity between ground-truth and generated responses from the perspectives of probability, n-grams and word embeddings, respectively. The diversity-based metrics are Distinct-n (Li et al., 2016), measuring the number of distinct n-grams in the generated responses. Since the policy sub-task varies across existing CRSs (e.g., action and topic prediction), we implement the commonly-used metrics Accuracy and Hit@K to evaluate the agreement between the true and predicted values.

Similarly, we design the class BaseEvaluator to implement common attributes and functions. Then, we inherit BaseEvaluator and implement RecEvaluator, ConvEvaluator and PolicyEvaluator for evaluating the recommendation, conversation and policy sub-tasks, respectively. It is worth noting that we implement a report() function in these evaluators, with which users can print and monitor the performance of models evaluated on the validation or test set.

Category               | Metrics
Recommendation Metrics | Hit@{K}, MRR@{K}, NDCG@{K}
Conversation Metrics   | Perplexity, BLEU-{n}, Embedding Average/Extreme/Greedy, Distinct-{n}
Policy Metrics         | Accuracy, Hit@{K}

Table 3: The implemented automatic evaluation metrics in CRSLab.

To evaluate a CRS quantitatively, CRSLab offers a human-machine interaction interface to help users perform an end-to-end evaluation. The human-machine interaction interface is integrated with the system module, by which the interaction strategy within the interface can be easily adapted for a specific policy model. In this way, a user can converse with a CRS and diagnose the system, which provides an approach to directly evaluating the overall performance of a CRS. Besides, the interaction interface enables users to correct errors by modifying intermediate results.

Specifically, to perform an end-to-end evaluation, users first set up the background of a simulated user (e.g., interaction history and user profile), then freely chat with the CRS through the interface. During a conversation, the dialog history and the output of each component (including the recommended items and the selected policy) are stored in a dictionary, which helps users understand how their system works.
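Two of the automatic metrics described above, Hit@K and Distinct-n, follow standard definitions and can be sketched as follows (these are generic implementations, not CRSLab's exact code):

```python
# Hit@K for the recommendation sub-task: 1 if the ground-truth item
# appears among the top-k recommended items.
def hit_at_k(ranked_items, target, k):
    return int(target in ranked_items[:k])

# Distinct-n for the conversation sub-task: ratio of distinct n-grams
# over all n-grams in the generated responses.
def distinct_n(responses, n):
    ngrams = []
    for tokens in responses:
        ngrams += [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0
```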
Utility Module

To better support the use of CRSLab, we design the utility module, which includes auxiliary functions (e.g., logger() and scheduler()). Specifically, we implement a series of useful functions to facilitate the use of our toolkit. A particularly useful function is scheduler(), which provides a set of strategies for training large-scale models, such as a warm-up strategy and weight decay. Besides, we also implement other functions to improve the user experience with our toolkit, such as save_model() and load_model() to store and reuse the learned models, and logger() to print and monitor the running process.

To ease the development of a new CRS, we also decouple commonly-used functions or procedures (e.g., Layers) from other modules into a utility file (i.e., utils.py), which constitutes another part of the utility module. In this way, users can assemble or slightly modify the functions in the utility files to develop and design a new CRS.
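A warm-up strategy like the one scheduler() provides could be sketched as follows. The linear-ramp-then-decay formula is an assumption for illustration, not CRSLab's exact schedule:

```python
# Hypothetical warm-up learning-rate schedule: linear ramp-up for the
# first `warmup_steps`, then a gentle inverse-square-root decay.
def warmup_lr(step, base_lr=1e-3, warmup_steps=100):
    if step < warmup_steps:
        return base_lr * step / warmup_steps        # linear ramp-up
    return base_lr * (warmup_steps / step) ** 0.5   # decay afterwards
```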
Usage

In this section, we show how to use CRSLab with code examples. We describe the usage in two parts: running an existing CRS with our toolkit, and implementing a new CRS based on the interfaces our toolkit provides.
Our CRSLab allows for the easy creation of a CRS within a few lines of code. Figure 2 presents the general procedure for running an existing CRS with our toolkit.

To begin with, the whole procedure relies on the configuration to prepare the dataset and build the system. In the configuration, the user selects a dataset and specifies the tokenizer. Then, the Dataset class automatically downloads the dataset and performs the necessary processing steps (e.g., tokenizing and converting tokens to IDs) based on the configurations. This procedure is executed by the function get_dataset(). Based on the processed dataset, users can use the function get_dataloader() to generate the training, validation and test sets, where the configurations specify the batch size and other parameters for data processing. After that, users can adopt the function get_system() to build a CRS, which leverages the prepared side data from the dataset and the above dataloaders. In the CRS, the configurations specify the structure of the models and set up the training and evaluation procedures. Users can start the running process by calling System.fit().
[Figure 2: An illustrative usage flow of our CRSLab — get the configuration (from the config file and command line), build the Dataset (dataset, vocab, side data), build the DataLoaders (train, valid, test), then build the System (build, initialize, train and evaluate models).]
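The running flow above can be mirrored end-to-end with stand-in components. The real get_dataset(), get_dataloader(), get_system() and System.fit() take richer arguments and do real work; this sketch only reproduces the call order:

```python
# Stand-in components mirroring the running flow; everything is stubbed.
def get_dataset(config):
    # would download and preprocess the configured dataset
    splits = {'train': [1, 2], 'valid': [3], 'test': [4]}
    side_data = {'entity_kg': {}}
    return splits, side_data

def get_dataloader(config, split_data):
    # would wrap examples into batched tensor data
    return list(split_data)

class System:
    def __init__(self, config, loaders, side_data):
        self.config, self.loaders, self.side_data = config, loaders, side_data
        self.trained = False

    def fit(self):
        # would train the contained models and run evaluation
        self.trained = True

def get_system(config, loaders, side_data):
    return System(config, loaders, side_data)

config = {'dataset': 'ReDial', 'model': 'KGSF'}
splits, side_data = get_dataset(config)
loaders = {name: get_dataloader(config, data) for name, data in splits.items()}
system = get_system(config, loaders, side_data)
system.fit()
```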
With our toolkit, it is convenient to implement a new CRS through the provided interfaces. The user only needs to inherit a few basic classes and implement some interface functions. In this part, we introduce the detailed implementation process of adding a new dataset and a new model, respectively.
To add a new dataset, one needs to inherit BaseDataset to design a new Dataset class that prepares the dataset in the unified format. In the new Dataset, the following functions are required: init(), load_data() and data_preprocess(). Specifically, in init(), users set up the parameters and the dataset links. In load_data(), the training, validation and test data, together with other side data, are loaded from the corresponding files. Note that if users follow our naming protocol, all they need to do is reuse the implemented functions from the existing Dataset classes. The function data_preprocess() prepares the loaded data. We integrate useful functions in the utility module to ease the implementation. To add a new model, users should inherit
BaseModel to design a new Model class, in which they need to implement the build_model() and forward() functions. In build_model(), users build the model, initialize the parameters and set up the loss function, while in forward(), users use the model to predict results or calculate the loss for the input data. Users can leverage the encapsulated layers and functions from the utility files to implement the two functions; these are decoupled from existing CRS models and may be useful in most cases.

Conclusion

In this paper, we have released a new conversational recommender system (CRS) toolkit called CRSLab, which is the first open-source CRS toolkit for research purposes. In CRSLab, we offer a unified and extensible framework with highly-decoupled modules to develop a CRS. Based on this framework, we integrate comprehensive benchmark datasets and models. So far, we have incorporated 6 commonly-used datasets and implemented 18 models in our toolkit. Besides, CRSLab also provides extensive automatic evaluation protocols and a human-machine interactive interface to compare and test different CRSs.

With the CRSLab toolkit, we expect to help users quickly implement existing CRSs, ease the development of new systems, and set up a benchmark framework for CRS research. In the future, we will make continuous efforts to add more datasets and models, and will also consider adding more utilities to improve the usability of our toolkit, such as result visualization and algorithm debugging.
References
Qibin Chen, Junyang Lin, Yichang Zhang, Ming Ding, Yukuo Cen, Hongxia Yang, and Jie Tang. 2019. Towards knowledge-based recommender dialog system. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3-7, 2019, pages 1803–1813.

Konstantina Christakopoulou, Filip Radlinski, and Katja Hofmann. 2016. Towards conversational recommender systems. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016, pages 815–824.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pages 4171–4186.

Shirley Anugrah Hayati, Dongyeop Kang, Qingxiaoyang Zhu, Weiyan Shi, and Zhou Yu. 2020. INSPIRED: Toward sociable recommendation dialog systems. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020, pages 8142–8152.

Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. 2016. Session-based recommendations with recurrent neural networks.

Dongyeop Kang, Anusha Balakrishnan, Pararth Shah, Paul Crook, Y-Lan Boureau, and Jason Weston. 2019. Recommendation as a communication game: Self-supervised bot-play for goal-oriented dialogue. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3-7, 2019, pages 1951–1961.

Wang-Cheng Kang and Julian J. McAuley. 2018. Self-attentive sequential recommendation. In IEEE International Conference on Data Mining, ICDM 2018, Singapore, November 17-20, 2018, pages 197–206.

Yoon Kim. 2014. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, pages 1746–1751.

Wenqiang Lei, Xiangnan He, Yisong Miao, Qingyun Wu, Richang Hong, Min-Yen Kan, and Tat-Seng Chua. 2020. Estimation-action-reflection: Towards deep interaction between conversational and recommender systems. In WSDM '20: The Thirteenth ACM International Conference on Web Search and Data Mining, Houston, TX, USA, February 3-7, 2020, pages 304–312.

Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2016. A diversity-promoting objective function for neural conversation models. In NAACL HLT 2016, The 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, California, USA, June 12-17, 2016, pages 110–119.

Raymond Li, Samira Ebrahimi Kahou, Hannes Schulz, Vincent Michalski, Laurent Charlin, and Chris Pal. 2018. Towards deep conversational recommendations. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada, pages 9748–9758.

Lizi Liao, Ryuichi Takanobu, Yunshan Ma, Xun Yang, Minlie Huang, and Tat-Seng Chua. 2019. Deep conversational recommender in travel. CoRR, abs/1907.00710.

Chia-Wei Liu, Ryan Lowe, Iulian Serban, Michael Noseworthy, Laurent Charlin, and Joelle Pineau. 2016. How NOT to evaluate your dialogue system: An empirical study of unsupervised evaluation metrics for dialogue response generation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016, pages 2122–2132.

Zeming Liu, Haifeng Wang, Zheng-Yu Niu, Hua Wu, Wanxiang Che, and Ting Liu. 2020. Towards conversational recommendation over multi-type dialogs. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020, pages 1036–1049.

Seungwhan Moon, Pararth Shah, Anuj Kumar, and Rajen Subba. 2019. OpenDialKG: Explainable conversational reasoning with attention-based walks over knowledge graphs. In Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28-August 2, 2019, Volume 1: Long Papers, pages 845–854.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, July 6-12, 2002, Philadelphia, PA, USA, pages 311–318.

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI Blog, 1(8):9.

Michael Sejr Schlichtkrull, Thomas N. Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, and Max Welling. 2018. Modeling relational data with graph convolutional networks. In The Semantic Web - 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, June 3-7, 2018, Proceedings, pages 593–607.

Iulian Vlad Serban, Alessandro Sordoni, Yoshua Bengio, Aaron C. Courville, and Joelle Pineau. 2016. Building end-to-end dialogue systems using generative hierarchical neural network models. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, February 12-17, 2016, Phoenix, Arizona, USA, pages 3776–3784.

Yueming Sun and Yi Zhang. 2018. Conversational recommender system. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, SIGIR 2018, Ann Arbor, MI, USA, July 08-12, 2018, pages 235–244.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pages 5998–6008.

Yongfeng Zhang, Xu Chen, Qingyao Ai, Liu Yang, and W. Bruce Croft. 2018. Towards conversational search and recommendation: System ask, user respond. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, CIKM 2018, Torino, Italy, October 22-26, 2018, pages 177–186.

Kun Zhou, Wayne Xin Zhao, Shuqing Bian, Yuanhang Zhou, Ji-Rong Wen, and Jingsong Yu. 2020a. Improving conversational recommender systems via knowledge graph based semantic fusion. In KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, CA, USA, August 23-27, 2020, pages 1006–1014.

Kun Zhou, Wayne Xin Zhao, Hui Wang, Sirui Wang, Fuzheng Zhang, Zhongyuan Wang, and Ji-Rong Wen. 2020b. Leveraging historical interaction data for improving conversational recommender system. In CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, Virtual Event, Ireland, October 19-23, 2020, pages 2349–2352.

Kun Zhou, Yuanhang Zhou, Wayne Xin Zhao, Xiaoke Wang, and Ji-Rong Wen. 2020c. Towards topic-guided conversational recommender system. In