A Cooperative Memory Network for Personalized Task-oriented Dialogue Systems with Incomplete User Profiles
Jiahuan Pei
University of Amsterdam, Amsterdam, The Netherlands
[email protected]

Pengjie Ren∗
Shandong University, Qingdao, China
[email protected]

Maarten de Rijke
University of Amsterdam & Ahold Delhaize, Amsterdam, The Netherlands
[email protected]
ABSTRACT
There is increasing interest in developing personalized Task-oriented Dialogue Systems (TDSs). Previous work on personalized TDSs often assumes that complete user profiles are available for most or even all users. This is unrealistic because (1) not everyone is willing to expose their profiles due to privacy concerns; and (2) rich user profiles may involve a large number of attributes (e.g., gender, age, tastes, . . . ). In this paper, we study personalized TDSs without assuming that user profiles are complete. We propose a Cooperative Memory Network (CoMemNN) that has a novel mechanism to gradually enrich user profiles as dialogues progress and to simultaneously improve response selection based on the enriched profiles. CoMemNN consists of two core modules: User Profile Enrichment (UPE) and Dialogue Response Selection (DRS). The former enriches incomplete user profiles by utilizing collaborative information from neighbor users as well as current dialogues. The latter uses the enriched profiles to update the current user query so as to encode more useful information, based on which a personalized response to a user request is selected.

We conduct extensive experiments on the personalized bAbI dialogue benchmark datasets. We find that CoMemNN is able to enrich user profiles effectively, which results in an improvement of 3.06% in terms of response selection accuracy compared to state-of-the-art methods. We also test the robustness of CoMemNN against incompleteness of user profiles by randomly discarding attribute values from user profiles. Even when discarding 50% of the attribute values, CoMemNN is able to match the performance of the best performing baseline without discarding user profiles, showing the robustness of CoMemNN.
CCS CONCEPTS
• Computing methodologies → Discourse, dialogue and pragmatics; • Information systems → Personalization.

KEYWORDS
Dialogue systems, personalization, neural networks, collaborative agents

∗ Corresponding author.

This paper is published under the Creative Commons Attribution 4.0 International (CC-BY 4.0) license. Authors reserve their rights to disseminate the work on their personal and corporate Web sites with the appropriate attribution.
WWW '21, April 19–23, 2021, Ljubljana, Slovenia
© 2021 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC-BY 4.0 License.
ACM ISBN 978-1-4503-8312-7/21/04.
https://doi.org/10.1145/3442381.3449843
ACM Reference Format:
Jiahuan Pei, Pengjie Ren, and Maarten de Rijke. 2021. A Cooperative Memory Network for Personalized Task-oriented Dialogue Systems with Incomplete User Profiles. In Proceedings of the Web Conference 2021 (WWW '21), April 19–23, 2021, Ljubljana, Slovenia. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3442381.3449843
The use of Task-oriented Dialogue Systems (TDSs) is becoming increasingly widespread. Unlike Open-ended Dialogue Systems (ODSs) [12, 48], TDSs are meant to help users achieve specific goals during multiple-turn dialogues [3]. Applications include booking restaurants, planning trips, grocery shopping, and customer service [e.g., 2, 21, 25, 26, 39, 45].

Considerable progress has been made in improving the performance of TDSs [e.g., 2, 7, 14, 15, 17, 27, 42]. Human-human dialogues reflect diverse personalized preferences in terms of, e.g., modes of expression [6, 46] and individual needs related to specific goals [9, 19, 23]. Recent work has begun to explore how to improve the user experience by personalizing TDSs in similar ways. Several personalized TDS models have been proposed and have achieved good performance [9, 19, 47]. Personalized TDS models use user profiles in order to be able to capture, and optimize for, users' personal preferences. Those user profiles may not always be available or complete. While profiles may be obtained by asking users to fill in personal profiles with all predefined attributes [9, 19, 47], more often than not, they are incomplete and have missing values for some of the attributes of interest: (1) not all users are willing to expose their profiles due to privacy concerns [37]; Tigunova et al. [36] have shown that users rarely reveal their personal information in dialogues explicitly; and (2) user profiles may involve many attributes (such as, e.g., gender, age, tastes), which makes it hard to collect values for all of them. For example, even if we know a user's favorite food is "fish and chips," this does not mean the user does not like "hamburgers."

In this paper, we study the problem of personalized TDSs with incomplete user profiles. This problem comes with two key challenges: (1) how to infer missing attribute values of incomplete user profiles; and (2) how to use enriched profiles so as to enhance personalized TDSs. There have been previous attempts to extract user profiles from open-ended dialogues [11, 13, 35, 36, 41], but to the best of our knowledge the problem of inferring and using missing attribute values has not been studied yet in the context of TDSs.

We address the problem of personalized TDSs with incomplete user profiles by proposing an end-to-end
Cooperative Memory Network (CoMemNN) in which profiles and dialogues are used to mutually improve each other. See Figure 1 for an intuitive sketch.
Figure 1: Cooperative interaction between user profiles and dialogues.

The intuition behind CoMemNN is that user profiles can be gradually improved (i.e., missing values can be added) by leveraging useful information from each dialogue turn, and, simultaneously, the performance of Dialogue Response Selection (DRS) can be improved based on enriched profiles for later turns. For example, when user u produces the utterance "Does it have 'decent' french fries?", the user reveals a liking for "french fries," so the attribute "favorite food" in the user profile can be enriched with the value "french fries." In addition, we want to consider collaborative information from similar users, assuming that similar users have similar preferences as reflected in their user profiles. For example, a young male non-vegetarian who is a big fan of "pizza" might also love "fish and chips" if there are several users with similar profiles stating "fish and chips" as their favorite food. In turn, knowledge of these preferences can affect the choice of the response selected by a TDS in case there are multiple candidate responses. In other words, users with similar profiles may expect the same or a similar response given a certain dialogue context [19]. CoMemNN operationalizes these intuitions with two key modules: User Profile Enrichment (UPE) and Dialogue Response Selection (DRS). The former enriches incomplete user profiles by utilizing useful information from the current dialogue as well as collaborative information from similar users. The latter uses the enriched profiles to update the query representing all requested information, based on which a personalized response is selected to reply to user requests.

To verify the effectiveness of CoMemNN, we conduct extensive experiments on the personalized bAbI dialogue (PbAbI) benchmark dataset, which comes in two flavors: a small version with 1,000 dialogues and a large version with 12,000 dialogues. First, we find that CoMemNN improves over the best baseline by 3.06%/2.80% on the small/large dataset, respectively, when using all available user profiles. Second, to assess the performance of CoMemNN in the presence of incomplete user profiles, we randomly discard attribute values with varying probabilities and find that even when it discards 50% of the attribute values, the performance of CoMemNN matches the performance of the best performing baseline without discarding user profiles. In contrast, the best performing baseline decreases 2.12%/1.97% in performance on the small/large dataset with the same amount of discarded values.

The main contributions of this paper are as follows:
• We consider the task of personalized TDSs with incomplete user profiles, which has not been investigated so far, to the best of our knowledge.
• We devise a CoMemNN model with dedicated modules to gradually enrich user profiles as a dialogue progresses and to improve response selection based on enriched profiles at the same time.
• We carry out extensive experiments to show the robustness of CoMemNN in the presence of incomplete user profiles.
In this section, we briefly present an overview of related work on personalized Open-ended Dialogue Systems (ODSs) and personalized Task-oriented Dialogue Systems (TDSs).
Previous studies on personalized ODSs mainly fuse unstructured persona information [22, 48]. Li et al. [12] first attempt to incorporate a persona into the Seq2Seq framework [34] to generate personalized responses. Ficler and Goldberg [6] apply an RNN language model conditioned on a persona to control response generation with linguistic style. Zhang et al. [48] find that selection models based on Memory Networks [33] are more promising than recurrent generation models based on Seq2Seq [34]. Mazare et al. [22] develop a response selection model based on MemNN and model persona to improve the performance of an ODS. Song et al. [32] explore how to generate diverse personalized responses using a variational autoencoder conditioned on a persona memory. Liu et al. [16] make use of persona interaction between two interlocutors. Xu et al. [43] further exploit topical information to extend persona.

Prior attempts to address data sparsity problems in order to enhance personalized ODSs have considered pretraining [8, 51], sketch generation and filling [30], multiple-stage decoding [31], multi-task learning [18], transfer learning [40, 44, 49], and meta-learning [20]. Only few studies have explored structured user profiles for ODSs [28, 50, 52].

Most of the methods listed above focus on unstructured persona information while we target structured user profiles. Importantly, they focus on ODSs, so they cannot be applied to TDSs directly.
Unlike ODSs, personalized TDSs have not been investigated extensively so far. Joshi et al. [9] release the first and, so far, only benchmark dataset for personalized TDSs, to the best of our knowledge. They propose a memory network based model, MemNN, to encode user profiles and conduct personalized response selection. They also propose an extension of MemNN, Split MemNN, which splits a memory into a profile memory followed by a dialogue memory. Zhang et al. [47] introduce Retrieval MemNN, which incorporates a retrieval module into a memory network and enhances performance by retrieving relevant responses from other users. Luo et al. [19] present Personalized MemNN, which learns distributed embeddings for user profiles, the dialogue history, and the dialogue history from users with the same gender and age, and shows better performance by modeling user bias towards Knowledge Base (KB) entries over candidate responses. Mo et al. [23] introduce
a transfer reinforcement learning paradigm to alleviate data scarcity, which uses a collection of multiple users as a source domain and an individual user as a target domain.

The methods above all assume that complete user profiles can be obtained by urging users to fill in all blanks in user profiles, which is unrealistic in practice. Thus, it remains unexplored how the methods above perform when incomplete user profiles are provided, and whether we can bridge the gap in performance if their performance is negatively affected. An alternative is to first infer missing user profiles, e.g., by mining query logs or previous conversations [11, 13, 35, 36], and then apply the above methods. But to do so, we need to train a model to infer missing user profiles asynchronously. Besides, this will likely introduce cumulative errors into downstream TDS tasks. Instead, we propose to enrich user profiles and carry out the TDS task simultaneously with an end-to-end model.

In this work, we follow previous studies and model a personalized TDS as a response selection task, which selects a response from predefined candidates given a dialogue context [5, 9, 19, 27, 29, 38, 47]. Table 1 summarizes the main notation used in this paper.

Table 1: Summary of main notation used in the paper.
X^u_t: User utterance at turn t.
X^s_t: System response at turn t.
D_t: Dialogue history at turn t.
h_t: Hidden representation of D_t.
u: A user profile in the form of {(a_i, v_i)}^m_{i=1}; v_i is a candidate value of the i-th attribute a_i.
p: One-hot representation of u.
q_t: A query representation that represents the user's current request at turn t.
M^P_t: Profile memory that contains the user profile representations of u and his/her neighbors at turn t.
M^D_t: Dialogue memory that contains the dialogue history representations of u and his/her neighbors at turn t.

Given a dialogue context (u, D_t, X^u_t) at the t-th dialogue turn, our goal is to select an appropriate response y_t = X^s_t from the candidate responses Y = {X^s_j}^{|Y|}_{j=1}. Here, u is the user profile, which consists of m attribute-value pairs {(a_i, v_i)}^m_{i=1}, where a_i is the i-th attribute and v_i is a candidate value of a_i. For example, in Fig. 1, the user profile is denoted as {(Gender, Male), (Age, Young), (Dietary, Non-vegetarian), (Favorite food, Fish and Chips)}. D_t = X_{1:t-1} is the dialogue history. Similar to [9, 19, 47], D_t is represented as a sequence of words that are aggregated from the historical utterances [X^u_1, X^s_1, ..., X^u_{t-1}, X^s_{t-1}], alternating between the user u and the system s. X^u_t denotes the current user utterance, representing the user's current request.
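To make the notation above concrete, the following minimal sketch shows how a single PbAbI-style dialogue context could be represented in Python. The class and field names (and the example strings) are illustrative assumptions, not the authors' released data format.

```python
# A minimal, hypothetical representation of one PbAbI-style training example.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class DialogueContext:
    profile: List[Tuple[str, Optional[str]]]   # u: attribute-value pairs (a_i, v_i); values may be missing
    history: List[str]                          # D_t: [X^u_1, X^s_1, ..., X^u_{t-1}, X^s_{t-1}]
    query: str                                  # X^u_t: current user utterance
    candidates: List[str] = field(default_factory=list)  # Y: predefined candidate responses
    label: int = -1                             # index of the correct response X^s_t in Y

example = DialogueContext(
    profile=[("gender", "male"), ("age", "young"),
             ("dietary", "non-vegetarian"), ("favorite food", None)],  # favorite food is missing
    history=["good morning", "hello what can i help you with today"],
    query="may i have a table with british food",
    candidates=["i'm on it", "how many people would be in your party", "..."],
    label=0,
)
```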
Figure 2: An overview of the CoMemNN architecture, which consists of two cooperative modules: UPE and DRS.

An overview of the proposed architecture, CoMemNN, is shown in Fig. 2. A key aspect of the architecture is that it aims to capture all useful information from the given dialogue context (u, D_t, X^u_t), based on which we learn a query representation q_t that represents the user's current request. q_t is usually initialized with the current user utterance X^u_t [9, 19, 47]. Then, q_t is updated by the User Profile Enrichment (UPE) module by incorporating dialogue and personal information from dialogues and user profiles, respectively. Specifically, UPE captures the interaction between user profiles and dialogues with three submodules: Memory Initialization (MI), Memory Updating (MU), and Memory Reading (MR). MI searches neighbors of the current user to initialize the profile memory M^P_t, which contains profiles from both the current user and his/her neighbors. MI also initializes the dialogue memory M^D_t with the dialogue history of both the current user and his/her neighbors, each of which is represented by addressing the dialogue history utterance representations with q_t. MU updates the profile memory M^P_t and the dialogue memory M^D_t by considering their interaction, after which the user profiles are enriched by inferring missing values based on the dialogue and personal information from the current user and his/her neighbors. Afterwards, MR updates the query representation q_t by reading from the enriched profile memory as well as the dialogue memory. Finally, the Dialogue Response Selection (DRS) module uses the updated query to match candidate responses so as to select an appropriate response. Next, we introduce each of the modules MI, MU, and MR, one by one.
Figure 3: An overview of the dynamic pipeline of the CoMemNN model. The UPE module captures the interaction between user profiles and dialogues by three submodules: MI, MU and MR. The DRS module and the UPE module cooperate so as to select better responses. Section 3 contains a walkthrough of the model.

Profile Memory Initialization. To model user-profile relations, we initialize the profile memory as M^P_t = [Ψ(u_1), ..., Ψ(u_k)] ∈ R^{k×d}, where u_1 is the Current Profile (CP) from the current user. The others are Neighbor Profiles (NPs) from neighbor users. For each user profile, the i-th attribute can be represented as a one-hot vector p̃_i ∈ R^{C(a_i)}, where C(a_i) is the number of candidate values of attribute a_i. Then, each user profile can be initialized as a one-hot vector p = Concat(p̃_1, ..., p̃_m) ∈ R^n (n = Σ^m_{i=1} C(a_i)), which is the concatenation of the one-hot representations of the attributes. Here k is the number of users, d is the embedding dimension, and Ψ is a linear transformation function. Given any user profile u, we find his/her (k−1) nearest neighbors based on dot-product similarity.
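The profile encoding and neighbor search described above can be sketched as follows. This is a minimal illustration under our own assumptions: the attribute sizes and the helper names (`encode_profile`, `init_profile_memory`) are hypothetical, and the released implementation uses faiss for the neighbor search rather than the brute-force similarity shown here.

```python
import torch
import torch.nn as nn

# Hypothetical sizes: m attributes with C(a_i) candidate values each, embedding dim d.
attr_sizes = [2, 2, 2, 4]          # e.g., gender, age, dietary, favorite food
n = sum(attr_sizes)                # length of the concatenated one-hot profile vector p
d = 128                            # embedding dimension used in the experimental settings

def encode_profile(value_ids):
    """Concatenate per-attribute one-hot vectors; zeros model a missing attribute value."""
    parts = []
    for size, vid in zip(attr_sizes, value_ids):
        one_hot = torch.zeros(size)
        if vid is not None:
            one_hot[vid] = 1.0
        parts.append(one_hot)
    return torch.cat(parts)        # p in R^n

psi = nn.Linear(n, d, bias=False)  # Psi: linear transformation into the memory space

def init_profile_memory(current_p, other_profiles, k):
    """Pick the (k-1) nearest neighbours of the current user by inner product and
    stack the transformed profiles into the profile memory M^P_t (current user first)."""
    sims = other_profiles @ current_p                # dot-product similarity to every other user
    neighbour_ids = torch.topk(sims, k - 1).indices
    stacked = torch.vstack([current_p, other_profiles[neighbour_ids]])
    return psi(stacked)                              # M^P_t in R^{k x d}
```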
Dialogue Memory Initialization. To model user-dialogue relations, we initialize a dialogue memory M^D_t = [h^1_t, ..., h^k_t] ∈ R^{k×d}, where h^1_t is the representation of the Current Dialogue (CD) from the current user. The others are the Neighbor Dialogues (NDs) from neighbor users. For each user, the dialogue history representation is computed as:

    h_t = \sum_{i=1}^{2(t-1)} \lambda_t^i H_t^i \in \mathbb{R}^d, \qquad \lambda_t^i = (\tilde{q}_t)^\top H_t^i \in \mathbb{R},    (1)

where we use the updated query q̃_t to address the aggregated dialogue history H_t; the addressing weight λ^i_t is the dot product of the query q̃_t and the i-th utterance representation H^i_t. Following [4, 19], we represent each utterance as a bag-of-words using the embedding matrix E ∈ R^{d×V}, where d is the embedding dimension, V is the vocabulary size, and Φ(·) maps an utterance to a bag-of-words vector of dimension V. At the beginning of turn t, the updated query q̃_t is initialized as:

    \tilde{q}_t = E \Phi(X^u_t) \in \mathbb{R}^d.    (2)

Similarly, the aggregated dialogue history H_t of the current user u can be embedded as:

    H_t = [E \Phi(X^u_1), E \Phi(X^s_1), \ldots, E \Phi(X^u_{t-1}), E \Phi(X^s_{t-1})] \in \mathbb{R}^{2(t-1) \times d}.    (3)
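A compact sketch of Eqs. (1)-(3) is given below, assuming utterances are already mapped to lists of token ids. The extracted text defines the addressing weights as plain dot products, so no softmax normalization is applied in the sketch; the released code may differ on this point, and the helper names are ours.

```python
import torch
import torch.nn as nn

V, d = 14819, 128                       # vocabulary size and embedding dimension from the setup
E = nn.Embedding(V, d)                  # word embedding matrix E

def bow(token_ids):
    """E Phi(X): bag-of-words utterance representation (sum of word embeddings)."""
    return E(torch.as_tensor(token_ids)).sum(dim=0)                # in R^d

def dialogue_representation(history_token_ids, q_tilde):
    """Eqs. (1) and (3): embed the history utterances and address them with the query."""
    H = torch.stack([bow(u) for u in history_token_ids])           # (2(t-1), d) aggregated history H_t
    lam = H @ q_tilde                                               # lambda^i_t = q_tilde^T H^i_t
    return lam @ H                                                  # h_t = sum_i lambda^i_t H^i_t

# Eq. (2): the query is initialized from the current user utterance X^u_t, e.g.
# q_tilde = bow(current_utterance_token_ids)
```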
Dialogue Memory Updating. To obtain an intermediate dialogue memory M̃^D_t, we update the i-th dialogue memory slot M̃^D_t[:, i] using the newest updated query q̃_t to address the initial dialogue memory M^D_t:

    \tilde{M}^D_t[:, i] = \sum_{j=1}^{k} \beta_t^j M^D_t[:, j] \in \mathbb{R}^d, \qquad \beta_t^j = (\tilde{q}_t)^\top M^D_t[:, j] \in \mathbb{R}.    (4)

Next, the initial dialogue memory is updated by assigning M^D_t = M̃^D_t. As the dialogue evolves, the profile memory gradually improves the dialogue memory, because q̃_t contains information from the previous profile memory, so addressing with q̃_t links profile-dialogue relations to the dialogue memory.
Profile Memory Updating. Similarly, we can obtain an intermediate profile memory M̃^P_t as follows:

    \tilde{M}^P_t[:, i] = \sum_{j=1}^{k} \alpha_t^j M^P_t[:, j] \in \mathbb{R}^d, \qquad \alpha_t^j = (M^P_t[:, i])^\top M^P_t[:, j] \in \mathbb{R}.    (5)

Next, the profile memory slot M^P_t[:, i] is updated by a function Γ(·) using the intermediate profile memory slot M̃^P_t[:, i] and the newest updated dialogue memory slot M̃^D_t[:, i]:

    M^P_t[:, i] = \Gamma(\tilde{M}^P_t[:, i], \tilde{M}^D_t[:, i]) \in \mathbb{R}^d,    (6)

where Γ(·) is a mapping function that is implemented as a Multi-Layer Perceptron (MLP) in this work. In this process, the dialogue memory helps to improve the profile memory because Γ(·) links dialogue-profile relations to the profile memory.
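Eqs. (4)-(6) amount to two addressing steps followed by an MLP merge. The sketch below is a literal reading of the extracted equations (raw dot-product weights, every intermediate dialogue slot receiving the same query-addressed summary, and Γ as a single hidden-layer MLP); the released implementation may normalize the weights or use a different Γ, so treat this as an illustration only.

```python
import torch
import torch.nn as nn

d = 128
gamma = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU())   # one plausible choice for Gamma (an MLP)

def memory_update(M_P, M_D, q_tilde):
    """One MU step (Eqs. 4-6).  M_P, M_D: (k, d) profile / dialogue memories; q_tilde: (d,)."""
    beta = M_D @ q_tilde                                 # Eq. (4): beta^j_t = q_tilde^T M^D_t[:, j]
    dialogue_summary = beta @ M_D                        # sum_j beta^j_t M^D_t[:, j]
    M_D_new = dialogue_summary.expand(M_D.size(0), -1)   # written to every intermediate dialogue slot
    alpha = M_P @ M_P.T                                  # Eq. (5): profile slots address each other
    M_P_tilde = alpha @ M_P                              # intermediate profile memory
    M_P_new = gamma(torch.cat([M_P_tilde, M_D_new], dim=-1))  # Eq. (6): Gamma merges both slots
    return M_P_new, M_D_new
```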
Dialogue Memory Reading. Since the first memory slot corresponds to the current user, we compute m^D_t by hard addressing and use it to update the query q̃_t as follows:

    \tilde{q}_t = \tilde{q}_t + m^D_t \in \mathbb{R}^d, \qquad m^D_t = \tilde{M}^D_t[:, 1] \in \mathbb{R}^d.    (7)
Profile Memory Reading.
Similarly, we obtain m^P_t by hard addressing and use it to update the query q̃_t as follows:

    \tilde{q}_t = \tilde{q}_t + m^P_t \in \mathbb{R}^d, \qquad m^P_t = M^P_t[:, 1] \in \mathbb{R}^d.    (8)

We use the latest updated query q̃_t to match the candidate dialogue responses, and the predicted response distribution is computed as follows:

    \tilde{y}_t = \mathrm{Softmax}(\tilde{q}_t^\top r_1 + b_1, \ldots, \tilde{q}_t^\top r_{|Y|} + b_{|Y|}) \in \mathbb{R}^{|Y|},
    b_j = \begin{cases} f_i & \text{if } r_j \text{ mentions the } i\text{-th attribute of a KB entry} \\ 0 & \text{otherwise} \end{cases}, \qquad
    f = \mathrm{ReLU}(F p) \in \mathbb{R}^{kb},    (9)

where r_j is the representation of the j-th candidate response and |Y| is the number of candidate responses. We follow Luo et al. [19] and model the user bias towards KB entries over the j-th candidate response by the term b_j, where the dimension kb is the number of attributes of a KB entry, p ∈ R^n is the one-hot representation of the current user profile, and F ∈ R^{kb×n} maps user profiles onto a KB entry.

Multiple-hop reading or updating has been shown to improve the performance of MemNN by reading or updating the memory multiple times [9, 19, 33]. To enhance CoMemNN, we devise a learning algorithm to update the query and memories with multiple hops, and further differentiate the specific losses of the UPE and DRS modules. The learning procedure is shown in Algorithm 1. First, MI searches the (k−1) neighbors of the current user u to initialize the profile memory M^P_t and the dialogue memory M^D_t. Second, MU and MR are conducted HopN times; in each hop, MU updates the dialogue memory M^D_t and the profile memory M^P_t by considering their cooperative interaction, after which MR updates the query representation q_t by reading from the enriched dialogue memory followed by the profile memory. Last, the Dialogue Response Selection (DRS) module uses the newest updated query q̃_t to match candidate responses so as to predict a response distribution ỹ_t.

To evaluate the performance of DRS and UPE, we define two mapping functions to obtain prediction labels:
• Argmax(·): outputs the index y_t with the highest probability in a predicted response distribution ỹ_t;
• PiecewiseArgmax(·): generates a 1-0 vector from the predicted enriched profile m^P_t, where p̃_t[i] = 1 if m^P_t[i] achieves the highest probability among the values that belong to the same attribute, and p̃_t[i] = 0 otherwise.

To optimize DRS, we use a standard cross-entropy loss between the prediction ỹ and the one-hot encoded true label y:

    \mathcal{L}_{\mathrm{DRS}}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{|Y|} y_j \log \tilde{y}_j,    (10)

where θ are all parameters in the model and N is the number of training samples.
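The response scoring of Eq. (9) can be sketched as below. How candidate responses are linked to KB attributes is not spelled out in the extracted text, so `kb_attr_of_response` is a hypothetical pre-computed index (a LongTensor with −1 where a candidate mentions no KB attribute); the other names are also ours.

```python
import torch
import torch.nn as nn

d, kb, n = 128, 8, 10                   # illustrative sizes: embedding dim, #KB attributes, profile length
F = nn.Linear(n, kb, bias=False)        # maps the one-hot profile p to a KB-attribute bias f

def select_response(q_tilde, R, kb_attr_of_response, p):
    """Eq. (9): score every candidate with the final query plus a profile-driven KB bias.

    q_tilde: (d,) final query; R: (|Y|, d) candidate representations;
    kb_attr_of_response: (|Y|,) LongTensor, index of the mentioned KB attribute or -1;
    p: (n,) one-hot profile of the current user.
    """
    f = torch.relu(F(p))                                   # f = ReLU(Fp) in R^{kb}
    bias = torch.zeros(R.size(0))
    mentions = kb_attr_of_response >= 0
    bias[mentions] = f[kb_attr_of_response[mentions]]      # b_j = f_i if r_j mentions attribute i
    scores = R @ q_tilde + bias
    y_dist = torch.softmax(scores, dim=0)                  # predicted response distribution
    return y_dist, int(y_dist.argmax())                    # Argmax(.) gives the selected index
```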
Algorithm 1: Multiple-hop CoMemNN.
Input: turn t, user u, profile p, dialogue history H_t, query q_t, response candidates {r_1, ..., r_|Y|}, max hop HopN, (k−1) neighbors.
Output: the index y_t of the next response; a one-hot vector p̃_t representing the enriched profile.

{u_2, ..., u_k} ← Search(p, k−1);                                  ⊲ MI
M^P_t ← [p_1, ..., p_k]; M^D_t ← [h^1_t, ..., h^k_t], with h^i_t computed from (q̃_t, H^i_t), i ∈ [1, k];
q̃_t ← q_t;
while hop ≤ HopN do
    M̃^D_t ← M^D_t (Eq. 4); M̃^P_t ← M^P_t (Eq. 5);                 ⊲ MU
    M^D_t ← M̃^D_t; M^P_t ← Γ(M̃^P_t, M̃^D_t) (Eq. 6);
    m^D_t ← M̃^D_t[:, 1]; q̃_t ← q̃_t + m^D_t (Eq. 7);               ⊲ MR
    m^P_t ← M^P_t[:, 1]; q̃_t ← q̃_t + m^P_t (Eq. 8);
end
ỹ_t ← Softmax(q̃_t^T r_1 + b_1, ..., q̃_t^T r_|Y| + b_|Y|);         ⊲ DRS
y_t ← Argmax_j(ỹ_t);
p̃_t ← PiecewiseArgmax(m^P_t)

To control the learning of UPE, we introduce an element-wise mean squared loss between the sampled profile values p = {p_1, ..., p_N} and the corresponding enriched profile values p̃ = {p̃_1, ..., p̃_N}:

    \mathcal{L}_{\mathrm{UPE}}(\theta) = \frac{1}{N} \sum_{i=1}^{N} (p_i - \tilde{p}_i)^2,    (11)

where θ are all parameters in the model and N is the number of sampled values.

Finally, the overall loss is a linear combination:

    \mathcal{L}(\theta) = \mu \mathcal{L}_{\mathrm{DRS}}(\theta) + (1 - \mu) \mathcal{L}_{\mathrm{UPE}}(\theta),    (12)

where μ is a hyperparameter that balances the relative importance of the constituent losses.
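A minimal sketch of the training objective (Eqs. 10-12) and of PiecewiseArgmax follows. The extracted text does not fully specify how the enriched profile vector is aligned with the n profile dimensions, so the sketch simply assumes `p_enriched` and `p_sampled` are vectors of equal length; μ = 0.5 as in the experimental settings, and the function names are ours.

```python
import torch
import torch.nn.functional as F

def piecewise_argmax(scores, attr_sizes):
    """PiecewiseArgmax(.): per attribute, set the highest-scoring value to 1 and the rest to 0."""
    out, start = torch.zeros_like(scores), 0
    for size in attr_sizes:
        segment = scores[start:start + size]
        out[start + int(segment.argmax())] = 1.0
        start += size
    return out

def total_loss(y_dist, y_true_index, p_enriched, p_sampled, mu=0.5):
    """Eqs. (10)-(12): cross-entropy for DRS plus mean squared error for UPE, linearly combined."""
    log_probs = torch.log(y_dist + 1e-12).unsqueeze(0)              # (1, |Y|) log-probabilities
    loss_drs = F.nll_loss(log_probs, torch.tensor([y_true_index]))  # Eq. (10)
    loss_upe = F.mse_loss(p_enriched, p_sampled)                    # Eq. (11)
    return mu * loss_drs + (1.0 - mu) * loss_upe                    # Eq. (12)
```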
We seek to answer the following questions in our experiments: (Q1) How well does CoMemNN perform? Does it significantly and consistently outperform state-of-the-art methods? (Q2) What are the effects of different components in CoMemNN? (Q3) Do different profile attributes contribute differently? (Q4) How well does CoMemNN perform in terms of robustness?

We use the personalized bAbI dialogue (PbAbI) dataset [9] for our experiments; this is an extension of the bAbI dialogue (bAbI) dataset that incorporates personalization [2]. To the best of our knowledge, this is the only available open dataset for personalized TDSs. There are two versions: a large version with around 12,000 dialogues and a small version with 1,000 dialogues. The two versions share the same vocabulary of 14,819 tokens and the same candidate response set of 43,863 responses. The dataset defines four user profile attributes (gender, age, dietary preference, and favorite food) and composes
corresponding attribute-value pairs into a user profile. Each conversation is provided with all of the above user profile attributes, e.g., {(Gender, Male), (Age, Young), (Dietary, Non-vegetarian), (Favorite, Fish and Chips)}. But this does not mean the given user profile is complete, because the user may also like "Paella," although "Fish and Chips" is his/her favorite food. To simulate incomplete profiles with various degrees of incompleteness, we randomly discard attribute values from a user profile with probabilities of [0%, 10%, 30%, 50%, 70%, 90%, 100%], obtaining 7 alternative datasets.

We evaluate the performance of the full dialogue task using the following two metrics [9]:
• Response Selection Accuracy (RSA): the fraction of correct responses out of all candidate responses [9, 19]; and
• Profile Enrichment Accuracy (PEA): we define this metric as the fraction of correct profile values out of all discarded profile values.

We use a paired t-test to measure statistical significance (p < 0.05). In addition, we introduce a statistic σ, namely the stability coefficient, which is defined as the standard deviation of a list of performance results. Formally, given a list of evaluation values [z_1, ..., z_{N+1}], either RSA or PEA scores, σ is computed as follows:

    \sigma(z) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (z_i - \bar{z})^2}, \qquad z = [z_2 - z_1, \ldots, z_{N+1} - z_N],    (13)

where z̄ is the mean of the values in the performance difference list z.

We compare with all the methods that have reported results on the PbAbI dataset [9].
• Memory Network (MemNN). It regards the profile information as the first user utterance ahead of each dialogue and achieves personalization by modeling the dialogue context using the standard MemNN model [1].
• Split Memory Network (SMemNN). It splits the memory into a profile memory and a dialogue memory. The former encodes user profile attributes as separate entries and the latter operates the same as MemNN. The element-wise sum of both memories is used for the final decision [9].
• Retrieval Memory Network (RMemNN). It features an encoder-encoder memory network with a retrieval module that employs the user utterances and user profiles to collect relevant information from similar users' conversations [47].
• Personalized Memory Network (PMemNN). It uses MemNN to model the current user profile, the current dialogue history, as well as the dialogue history of all users with the same gender and age. It also models user bias towards different KB entries [19].
• Neighbor-based Personalized Memory Network (NPMemNN). Our implementation of PMemNN, based on PyTorch (https://pytorch.org/). Unlike PMemNN, we use the dialogue history from the nearest (k−1) neighbors instead of all users with the same gender and age.

We follow the experimental settings detailed in [19]. The embedding size of words/profiles is 128. The size of the memory is 250. The mini-batch size is 64. The maximum number of training epochs is 250, and the number of hops is 3 (see Algorithm 1). The K-Nearest Neighbors (KNN) algorithm is implemented with faiss (https://github.com/facebookresearch/faiss) using the inner-product measurement to select the k collaborative users. The code of the other models is taken from the original papers. We use Adam [10] as our optimization algorithm with a learning rate of 0.01 and initialize the learnable parameters with the Xavier initializer. We also apply gradient clipping [24] during training. We treat the importance of the losses of DRS and UPE equally, i.e., μ = 0.5. The code is available online (https://github.com/Jiahuan-Pei/CoMemNN).
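For reference, the two evaluation quantities are simple to compute; a minimal sketch (our own reading of the metric definitions and Eq. 13, not the authors' evaluation script) is:

```python
import numpy as np

def response_selection_accuracy(pred_ids, gold_ids):
    """RSA: fraction of turns for which the selected candidate is the correct response."""
    return float(np.mean(np.asarray(pred_ids) == np.asarray(gold_ids)))

def stability_coefficient(scores):
    """Eq. (13): standard deviation of successive performance differences, e.g. across discard ratios."""
    z = np.diff(np.asarray(scores, dtype=float))   # [z_2 - z_1, ..., z_{N+1} - z_N]
    return float(np.sqrt(np.mean((z - z.mean()) ** 2)))
```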
Table 2: Overall performance in terms of the RSA metric. Bold face indicates leading results. Significant improvements over NPMemNN are marked with ∗ (paired t-test, p < 0.05).

              Small set (%)   Large set (%)
MemNN [9]        77.74           85.10
SMemNN [9]       78.10           87.28
RMemNN [47]      83.94           87.33
PMemNN [19]      88.07           95.33
NPMemNN          87.91           97.49
CoMemNN          91.13∗          98.13∗

First, CoMemNN outperforms all baselines on both the small and large datasets by a large margin. It significantly outperforms the best baseline, PMemNN, by 3.06% on the small dataset and 2.80% on the large dataset. The improvements demonstrate the effectiveness of CoMemNN. We believe the main reason is that the proposed cooperative mechanism is able to enrich the incomplete profiles gradually as dialogues progress, and the enriched profiles simultaneously help to improve response selection. We analyze this in more depth in the following sections.

Second, the performance of NPMemNN is comparable to that of PMemNN on the small dataset, and it achieves a 2.16% higher RSA on the large dataset. Recall that NPMemNN is our implementation of PMemNN using PyTorch; the only difference is the KNN algorithm used for neighbor searching, so the result shows that our neighbor searching method is more effective. Since CoMemNN is built upon NPMemNN, we use NPMemNN for further comparison and analysis in the remaining experiments.

Third, the results on the small and large datasets mostly show consistent trends. For the remaining analysis experiments in the next section (Section 6), we report results on the small dataset only. The findings on the large dataset are qualitatively similar.
We compare CoMemNN and NPMemNN under different profile discard ratios. The results are shown in Table 3.
Table 3: Comparison of CoMemNN and NPMemNN in terms of the RSA metric w.r.t. different profile discard ratios. Bold face indicates leading results. Significant improvements over NPMemNN are marked with ∗ (paired t-test, p < 0.05). The values of Diff. are computed as the absolute difference of RSA (%) between CoMemNN and NPMemNN.

Discard ratio         0%      10%     30%     50%     70%     90%     100%
Small set: NPMemNN    87.91   86.11   86.56   85.79   83.93   84.08   –
Small set: CoMemNN    91.13∗  89.90∗  88.69∗  87.80∗  86.35∗  84.83∗  82.85
Small set: Diff.      3.22    3.79    2.13    2.01    2.42    0.75    –
Large set: Diff.      0.64    0.93    1.63    2.01    1.58    5.67    2.23

First, CoMemNN significantly outperforms NPMemNN on both the small and large datasets when the profile discard ratio ranges from 0% to 90%. Specifically, it gains an improvement of 0.75%–3.79% on the small dataset and 0.64%–5.67% on the large dataset, respectively. Without discarding profile attribute values, CoMemNN achieves 3.22%/0.64% improvement compared with NPMemNN. Unlike the raw profiles, where each attribute has only one value, the enriched profile generated by CoMemNN is able to represent a distribution over all possible values, which can better capture users' preferences. For example, a user may label "Fish and Chips" as his/her favorite food, but this does not mean he/she does not like "Paella." With the raw profile, this is not captured.

Second, the performance of CoMemNN steadily decreases as the profile discard ratio increases, as is to be expected: it becomes more and more challenging for CoMemNN to recover the missing values of user profiles. Interestingly, the performance difference between CoMemNN and NPMemNN first increases and then decreases as the profile discard ratio increases. A possible reason is that CoMemNN is able to infer the missing values of user profiles effectively at lower profile discard ratios, while its profile enrichment ability decreases when too many profile values are missing. This hypothesis is supported by the observation that the increasing trend lasts longer on the large dataset: even with the same profile discard ratio, there are more profile values left on the large dataset from which CoMemNN can infer the missing ones. We note that NPMemNN outperforms CoMemNN when all user profiles are discarded on the small dataset. The reason is that UPE cannot enrich user profiles properly in this case, which has a negative impact on DRS. This is not the case on the large dataset, where UPE can still enrich user profiles properly because the model can find enough personal information clues in the larger amount of dialogue history.

Third, to answer Q4, we compute the statistic σ (Eq. 13) to compare model stability. The σ values for CoMemNN and NPMemNN are 0.3357/1.0407 on the small dataset and 1.3479/1.4849 on the large dataset, respectively. Thus, NPMemNN has higher deviations, which shows that CoMemNN is more stable than NPMemNN across profile discard ratios.

We analyze the performance of the following variants of CoMemNN:
• CoMemNN. The full model.
• CoMemNN-PEL. CoMemNN without the Profile Enrichment Loss (PEL) defined in Eq. 11.
• CoMemNN-PEL-UPE. CoMemNN without PEL or UPE. This is exactly NPMemNN.
• CoMemNN-NP. CoMemNN without the Neighbor Profiles (NP) as input for UPE.
• CoMemNN-NP-CP. CoMemNN without NP or the Current Profile (CP) as input for UPE.
• CoMemNN-ND. CoMemNN without the Neighbor Dialogues (ND) as input for UPE.
• CoMemNN-ND-CD. CoMemNN without ND or the Current Dialogue (CD) as input for UPE.
• CoMemNN-ND-NP. CoMemNN without ND or NP as input for UPE.
We study the PEA performance of the different variants in Table 4. First, CoMemNN can effectively enrich user profiles by inferring the missing values: it correctly predicts more than 98.98% of the missing values in user profiles under the different profile discard ratios. We believe UPE benefits a lot from modeling the interaction between user profiles and dialogues; UPE is able to capture more personal information from the dialogue history as dialogues progress. The PEA scores are all very high because the PbAbI dataset is simulated, which makes it relatively easy to predict missing attribute values of user profiles.

Second, each component of UPE generally has a positive effect on the performance, since the PEA scores of most variants decrease. Specifically, CoMemNN-PEL decreases by 8.38%–14.20% compared with CoMemNN. This means that it is important to add the UPE loss (Eq. 11), rather than only optimizing the DRS loss (Eq. 10). We also show how the four components of UPE (i.e., NP, CP, ND, and CD, as defined in Section 3.3) affect its performance. We find that: (1) CoMemNN-ND-NP continuously decreases by 0.90%–2.32% as the profile discard ratio increases. This means that neighbor users play an important role. (2) CoMemNN-ND-CD (with 100% profile discard ratio) decreases dramatically, which is as expected, because CoMemNN cannot infer the missing values without any dialogue history or profiles. This also explains the increase of the corresponding RSA score in Table 5. (3) The decrease is mostly less than 2.32%, except that the decrease of CoMemNN-ND-CD (with 100% profile discard ratio, i.e., without NP or CP as well) is 64.2%. This reveals that the different information sources are complementary to each other; the performance is not affected largely unless all four inputs (i.e., NP, CP, ND, CD) are removed.

Lastly, we compute the stability coefficient σ (Eq. 13) of the variants in Table 4 (0.1867, 1.8781, 0.2236, 0.1402, 25.6845, . . . ).
Table 4: Performance of UPE evaluated in terms of Profile Enrichment Accuracy (PEA). In each cell, the first number represents the PEA (%), and the number in parentheses shows the difference compared with CoMemNN. ↓ and || denote a decrease and no change compared to CoMemNN, respectively.

Discard ratio   10%      30%     50%     70%     90%     100%
CoMemNN         99.99    99.93   99.82   99.83   99.38   98.98
CoMemNN-PEL     85.71 (↓)  –       –       –       –       –

Table 5: Ablation study on DRS evaluated in terms of Response Selection Accuracy (RSA). In each cell, the first number represents the RSA (%), and the number in parentheses shows the difference compared with CoMemNN. ↓ and ↑ denote a decrease and an increase, respectively. Underlining marks results that are ≥ . . .
Discard ratio   0%       10%     30%     50%     70%     90%     100%
CoMemNN         91.13    89.90   88.69   87.80   86.35   84.83   82.85
CoMemNN-PEL     90.84 (↓)  –       –       –       –       –       –

We investigate the RSA performance of the different variants in Table 5. First, the performance generally decreases when any component of UPE is removed. In particular, removing PEL has a greater effect on RSA when the profile discard ratio gets larger. This is reasonable: the larger the profile discard ratio, the more room for improvement the proposed model has compared with NPMemNN. CoMemNN-PEL-UPE is generally inferior to CoMemNN-PEL, which means that the UPE module helps as it implicitly impacts the DRS loss (Eq. 10). But this ability weakens when the profile discard ratio is larger than 90%.

Second, we observe that the four information sources (i.e., NP, CP, ND, CD) have different effects under different profile discard ratios. In particular, the profiles of the current users and their neighbors generally contribute most to the RSA performance: CoMemNN-NP-CP drops by 1.50%–4.53% under all profile discard ratios. The reason is that user profiles directly store personal information; it is easier to infer missing values from collaborative user profiles than from dialogues.

Third, we find that NP and ND are complementary to each other. CoMemNN-NP either has a massive drop (2.54%–3.05%) or small changes (≤ . . . ).

We compare the RSA performance of CoMemNN and NPMemNN with different numbers of hops. The results are shown in Table 6.
Table 6: Analysis of the effect of the number of hops on DRS. Bold face indicates leading results. Significant improvements over NPMemNN are marked with ∗ (paired t-test, p < 0.05). The values of Diff. are computed as the absolute difference of RSA (%) between CoMemNN and NPMemNN.

Hops        1        2        3        4
NPMemNN     88.11    87.22    87.91    87.61
CoMemNN     90.07∗   90.78∗   91.13∗   90.77∗
Diff.       1.96     3.56     3.22     3.16
We see that CoMemNN outperforms NPMemNN by a large margin (1.96%–3.56%) for all numbers of hops. This further confirms the non-trivial improvement of CoMemNN. Besides, CoMemNN improves by 1.06% when the number of hops changes from 1 to 3 and slightly decreases with 4 hops. This means that CoMemNN benefits from the multiple-hop mechanism.
We explore how the four types of profile attributes (i.e., gender, age, dietary preference, and favorite food) affect the RSA performance. The results are shown in Table 7.
Table 7: Analysis of profile attribute importance to DRS. The "Discarded attribute" part shows results when we discard all values of a specific attribute or of a combination of two specific attributes; the "Retained attribute" part shows results when we retain all values of a specific attribute and discard all values of the rest. Underlining indicates the lower-bound baseline that retains no attributes. Bold face indicates the upper-bound baseline that retains all attributes.
First, each attribute works well in isolation. Specifically, when we only retain the values of a single attribute, we obtain 87.46%, 87.93%, 90.57%, and 87.37% for gender, age, dietary, and favorite, respectively. The attribute "dietary" contributes most, followed by "age," "gender," and "favorite."

Second, different types of attributes depend on each other and influence the RSA performance differently. If we only remove the values of one attribute, we obtain the results on the diagonal of Table 7: 93.05%, 92.26%, 86.74%, and 90.25%, respectively. Removing "dietary" leads to the largest drop, followed by "favorite." Thus, "dietary" contributes more than the rest.

An exception is that the RSA performance increases when discarding "gender" and "age." We believe this is an effect of the neighbors. To show this, we further investigate the effect of "gender" and "age" without using neighbor information. The results are shown in Table 8. We can see that removing "gender" and "age" decreases the performance in this case. Thus, the different effects of "gender" and "age" are due to the neighbors.
In this paper, we have studied personalized TDSs without assuming that complete user profiles are available. We have proposed the Cooperative Memory Network (CoMemNN), which introduces a cooperative mechanism to gradually enrich user profiles as dialogues progress, and to simultaneously improve response selection based on the enriched profiles. We have also devised a learning algorithm to effectively learn CoMemNN with multiple hops.
Table 8: Analysis of profile attribute importance to DRS without the effect of neighbors. Bold face indicates the baseline of CoMemNN without neighbors. In each cell, the first number represents the RSA (%), the number in parentheses shows the difference, and ↓ denotes a decrease compared with the baseline.

                                      RSA (Diff.)
CoMemNN w/o neighbors                 –
CoMemNN w/o neighbors − gender        88.25 (↓)

ACKNOWLEDGMENTS
This research was partially supported by the China Scholarship Council (CSC). All content represents the opinion of the authors, which is not necessarily shared or endorsed by their respective employers and/or sponsors.
REFERENCES
[1] Antoine Bordes, Y-Lan Boureau, and Jason Weston. 2016. Learning end-to-end goal-oriented dialog. arXiv preprint arXiv:1605.07683 (2016).
[2] Antoine Bordes and Jason Weston. 2017. Learning end-to-end goal-oriented dialog. In ICLR.
[3] Hongshen Chen, Xiaorui Liu, Dawei Yin, and Jiliang Tang. 2017. A survey on dialogue systems: Recent advances and new frontiers. ACM SIGKDD Explorations Newsletter 19, 2 (2017), 25–35.
[4] Jesse Dodge, Andreea Gane, Xiang Zhang, Antoine Bordes, Sumit Chopra, Alexander Miller, Arthur Szlam, and Jason Weston. 2015. Evaluating prerequisite qualities for learning end-to-end dialog systems. arXiv preprint arXiv:1511.06931 (2015).
[5] Mihail Eric, Lakshmi Krishnan, Francois Charette, and Christopher D Manning. 2017. Key-value retrieval networks for task-oriented dialogue. In SIGDIAL. 37–49.
[6] Jessica Ficler and Yoav Goldberg. 2017. Controlling linguistic style aspects in neural language generation. In Proceedings of the Workshop on Stylistic Variation. 94–104.
[7] Matthew Henderson, Ivan Vulić, Daniela Gerz, Iñigo Casanueva, Paweł Budzianowski, Sam Coope, Georgios Spithourakis, Tsung-Hsien Wen, Nikola Mrkšić, and Pei-Hao Su. 2019. Training neural response selection for task-oriented dialogue systems. In ACL. 5392–5404.
[8] Jonathan Herzig, Michal Shmueli-Scheuer, Tommy Sandbank, and David Konopnicki. 2017. Neural response generation for customer service based on personality traits. In Proceedings of the 10th International Conference on Natural Language Generation. 252–256.
[9] Chaitanya K Joshi, Fei Mi, and Boi Faltings. 2017. Personalization in goal-oriented dialog. In NIPS Conversational AI Workshop.
[10] Diederik Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In ICLR.
[11] Aaron W Li, Veronica Jiang, Steven Y Feng, Julia Sprague, Wei Zhou, and Jesse Hoey. 2020. ALOHA: Artificial learning of human attributes for dialogue agents. In AAAI. 8155–8163.
[12] Jiwei Li, Michel Galley, Chris Brockett, Georgios Spithourakis, Jianfeng Gao, and Bill Dolan. 2016. A persona-based neural conversation model. In ACL. 994–1003.
[13] Xiang Li, Gokhan Tur, Dilek Hakkani-Tür, and Qi Li. 2014. Personal knowledge graph population from user utterances in conversational understanding. IEEE, 224–229.
[14] Zibo Lin, Deng Cai, Yan Wang, Xiaojiang Liu, Hai-Tao Zheng, and Shuming Shi. 2020. Grayscale data construction and multi-level ranking objective for dialogue response selection. arXiv preprint arXiv:2004.02421 (2020).
[15] Fei Liu and Julien Perez. 2017. Gated end-to-end memory networks. In EACL. 1–10.
[16] Qian Liu, Yihong Chen, Bei Chen, Jian-Guang Lou, Zixuan Chen, Bin Zhou, and Dongmei Zhang. 2020. You impress me: Dialogue generation via mutual persona perception. arXiv preprint arXiv:2004.05388 (2020).
[17] Yichao Lu, Manisha Srivastava, Jared Kramer, Heba Elfardy, Andrea Kahn, Song Wang, and Vikas Bhardwaj. 2019. Goal-oriented end-to-end conversational models with profile features in a real-world setting. In NAACL-HLT. 48–55.
[18] Yi Luan, Chris Brockett, Bill Dolan, Jianfeng Gao, and Michel Galley. 2017. Multi-task learning for speaker-role adaptation in neural conversation models. In IJCNLP. 605–614.
[19] Liangchen Luo, Wenhao Huang, Qi Zeng, Zaiqing Nie, and Xu Sun. 2019. Learning personalized end-to-end goal-oriented dialog. In AAAI. 6794–6801.
[20] Andrea Madotto, Zhaojiang Lin, Chien-Sheng Wu, and Pascale Fung. 2019. Personalizing dialogue agents via meta-learning. In ACL. 5454–5459.
[21] Andrea Madotto, Chien-Sheng Wu, and Pascale Fung. 2018. Mem2Seq: Effectively incorporating knowledge bases into end-to-end task-oriented dialog systems. In ACL. 1468–1478.
[22] Pierre-Emmanuel Mazare, Samuel Humeau, Martin Raison, and Antoine Bordes. 2018. Training millions of personalized dialogue agents. In EMNLP. 2775–2779.
[23] Kaixiang Mo, Yu Zhang, Shuangyin Li, Jiajun Li, and Qiang Yang. 2018. Personalizing a dialogue system with transfer reinforcement learning. In AAAI. 5317–5324.
[24] Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. 2013. On the difficulty of training recurrent neural networks. In ICML. 1310–1318.
[25] Jiahuan Pei, Pengjie Ren, and Maarten de Rijke. 2019. A modular task-oriented dialogue system using a neural mixture-of-experts. In SIGIR Workshop on Conversational Interaction Systems.
[26] Jiahuan Pei, Pengjie Ren, Christof Monz, and Maarten de Rijke. 2020. Retrospective and prospective mixture-of-generators for task-oriented dialogue response generation. In ECAI. 2148–2155.
[27] Jiahuan Pei, Arent Stienstra, Julia Kiseleva, and Maarten de Rijke. 2019. SEntNet: Source-aware recurrent entity network for dialogue response selection. In IJCAI Workshop SCAI.
[28] Qiao Qian, Minlie Huang, Haizhou Zhao, Jingfang Xu, and Xiaoyan Zhu. 2018. Assigning personality/profile to a chatting machine for coherent conversation generation. In IJCAI. 4279–4285.
[29] Janarthanan Rajendran, Jatin Ganhotra, Satinder Singh, and Lazaros Polymenakos. 2018. Learning end-to-end goal-oriented dialog with multiple answers. In EMNLP. 3834–3843.
[30] Michael Shum, Stephan Zheng, Wojciech Kryściński, Caiming Xiong, and Richard Socher. 2019. Sketch-Fill-AR: A persona-grounded chit-chat generation framework. arXiv preprint arXiv:1910.13008 (2019).
[31] Haoyu Song, Yan Wang, Wei-Nan Zhang, Xiaojiang Liu, and Ting Liu. 2020. Generate, delete and rewrite: A three-stage framework for improving persona consistency of dialogue generation. arXiv preprint arXiv:2004.07672 (2020).
[32] Haoyu Song, Wei-Nan Zhang, Yiming Cui, Dong Wang, and Ting Liu. 2019. Exploiting persona information for diverse generation of conversational responses. In IJCAI. 5190–5196.
[33] Sainbayar Sukhbaatar, Jason Weston, Rob Fergus, et al. 2015. End-to-end memory networks. In Advances in Neural Information Processing Systems. 2440–2448.
[34] Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. Sequence to sequence learning with neural networks. In NeurIPS. 3104–3112.
[35] Anna Tigunova. 2020. Extracting personal information from conversations. In Companion Proceedings of The Web Conference 2020. 284–288.
[36] Anna Tigunova, Andrew Yates, Paramita Mirza, and Gerhard Weikum. 2019. Listening between the lines: Learning personal attributes from conversations. In The Web Conference. ACM, 1818–1828.
[37] Hao Wang, Naiyan Wang, and Dit-Yan Yeung. 2015. Collaborative deep learning for recommender systems. In SIGKDD. ACM, 1235–1244.
[38] Tsung-Hsien Wen, David Vandyke, Nikola Mrkšić, Milica Gasic, Lina M Rojas Barahona, Pei-Hao Su, Stefan Ultes, and Steve Young. 2017. A network-based end-to-end trainable task-oriented dialogue system. In EACL. 438–449.
[39] Jason D Williams, Kavosh Asadi, and Geoffrey Zweig. 2017. Hybrid code networks: Practical and efficient end-to-end dialog control with supervised and reinforcement learning. In ACL. 665–677.
[40] Thomas Wolf, Victor Sanh, Julien Chaumond, and Clement Delangue. 2019. TransferTransfo: A transfer learning approach for neural network based conversational agents. arXiv preprint arXiv:1901.08149 (2019).
[41] Chien-Sheng Wu, Andrea Madotto, Zhaojiang Lin, Peng Xu, and Pascale Fung. 2020. Getting to know you: User attribute extraction from dialogues. In Proceedings of the 12th Language Resources and Evaluation Conference. 581–589.
[42] Chien-Sheng Wu, Andrea Madotto, Genta Indra Winata, and Pascale Fung. 2018. End-to-end dynamic query memory network for entity-value independent task-oriented dialog. In IEEE-ICASSP. IEEE, 6154–6158.
[43] Minghong Xu, Piji Li, Haoran Yang, Pengjie Ren, Zhaochun Ren, Zhumin Chen, and Jun Ma. 2020. A neural topical expansion framework for unstructured persona-oriented dialogue generation. In ECAI.
[44] Min Yang, Zhou Zhao, Wei Zhao, Xiaojun Chen, Jia Zhu, Lianqiang Zhou, and Zigang Cao. 2017. Personalized response generation via domain adaptation. In SIGIR. 1021–1024.
[45] Steve Young, Milica Gašić, Blaise Thomson, and Jason D Williams. 2013. POMDP-based statistical spoken dialog systems: A review. Proc. IEEE.
[46] In ACL: Student Research Workshop. 229–235.
[47] Bowen Zhang, Xiaofei Xu, Xutao Li, Yunming Ye, Xiaojun Chen, and Zhongjie Wang. 2020. A memory network based end-to-end personalized task-oriented dialogue generation. Knowledge-Based Systems (2020), 106398.
[48] Saizheng Zhang, Emily Dinan, Jack Urbanek, Arthur Szlam, Douwe Kiela, and Jason Weston. 2018. Personalizing dialogue agents: I have a dog, do you have pets too?. In ACL. 2204–2213.
[49] Wei-Nan Zhang, Qingfu Zhu, Yifa Wang, Yanyan Zhao, and Ting Liu. 2019. Neural personalized response generation as domain adaptation. World Wide Web 22, 4 (2019), 1427–1446.
[50] Yinhe Zheng, Guanyi Chen, Minlie Huang, Song Liu, and Xuan Zhu. 2019. Personalized dialogue generation with diversified traits. arXiv preprint arXiv:1901.09672 (2019).
[51] Yinhe Zheng, Rongsheng Zhang, Minlie Huang, and Xiaoxi Mao. 2020. A pre-training based personalized dialogue generation model with persona-sparse data. In AAAI. 9693–9700.
[52] Li Zhou, Jianfeng Gao, Di Li, and Heung-Yeung Shum. 2020. The design and implementation of XiaoIce, an empathetic social chatbot.