INSPIRED: Toward Sociable Recommendation Dialog Systems
Shirley Anugrah Hayati, Dongyeop Kang, Qingxiaoyang Zhu, Weiyan Shi, Zhou Yu
Shirley Anugrah Hayati♡, Dongyeop Kang☆, Qingxiaoyang Zhu♡, Weiyan Shi♡, Zhou Yu♡
♡ Department of Computer Science, University of California, Davis
☆ University of California, Berkeley
{sahayati, qinzhu, wyshi, joyu}@ucdavis.edu, [email protected]

Abstract
In recommendation dialogs, humans commonly disclose their preferences and make recommendations in a friendly manner. However, this is a challenge in developing a sociable recommendation dialog system, due to the lack of dialog datasets annotated with such sociable strategies. Therefore, we present INSPIRED, a new dataset of 1,001 human-human dialogs for movie recommendation with measures for successful recommendations. To better understand how humans make recommendations in communication, we design an annotation scheme for recommendation strategies based on social science theories and annotate these dialogs. Our analysis shows that sociable recommendation strategies, such as sharing personal opinions or communicating with encouragement, more frequently lead to successful recommendations. Based on our dataset, we train end-to-end recommendation dialog systems with and without our strategy labels. In both automatic and human evaluation, our model with strategy incorporation outperforms the baseline model. This work is a first step toward building sociable recommendation dialog systems grounded in social science theories.

Sociable conversational agents build rapport with users in order to gain their trust and favor. Social science researchers believe that such rapport enables more persuasive recommendations, successfully suggesting items that satisfy user needs (Yoo et al., 2012; Gkika and Lekakos; Pecune et al., 2019; Gretzel and Fesenmaier, 2006). However, existing work on recommendation dialog systems lacks a study of the communication strategies used by human speakers for making successful and persuasive recommendations.

Dataset and code are available at https://github.com/sweetpeach/Inspired
REC: Hi! Happy Thanksgiving! I'm here to help you find a trailer!
SEEK: Happy Thanksgiving! My favorite movie is finding Nemo I really like it
REC: Awesome! So do you like Disney movies in general?
SEEK: Yup they are so colorful and full of life!
REC: Yeah, I love Disney too! I have Disney + and watch it everyday haha. Have you seen the new Lady and the Tramp? I find it relatable to my dog!
SEEK: Lol that’s good enough! Never heard of that one! what is it about?
REC: It's about a dog named Lady who runs away with a stray named Tramp out of jealousy... What do you think?
SEEK: Woo sounds good! I definitely want to see this. Thank you!
REC: No problem! Hope you enjoy it as I did!
Strategy labels annotated in this dialog: OFFERING HELP, ENCOURAGEMENT, PERSONAL EXPERIENCE, PREFERENCE CONFIRMATION, PERSONAL OPINION, SIMILARITY, CREDIBILITY, EXPERIENCE INQUIRY, OPINION INQUIRY
Figure 1: An example snippet of a human-human recommendation dialog in INSPIRED. REC refers to the person who recommends a movie and SEEK refers to the person who looks for a recommendation. Above each recommender's utterance is the recommendation strategy annotated by human workers. Best seen in colors.

Prior works collect datasets in scenario-based settings or convert product review datasets into question-answering conversations (Reschke et al., 2013; Yan et al., 2017; Sun and Zhang, 2018; Kang et al., 2019; Li et al., 2018). Common issues with these types of datasets are: (1) homologous utterances, (2) mostly question-answering pairs, and (3) lack of user engagement.

In this work, we aim to validate whether sociable recommendation strategies are effective for making a successful recommendation in a dialog. To do so, we propose INSPIRED, a recommendation dialog dataset of two paired crowd-workers in a natural setting, with additional annotations for sociable recommendation strategies. The dataset consists of 1,001 dialogs, and each utterance is manually annotated with sociable strategies based on social science theory. To encourage a more natural dialog flow, we set no restrictions on the number or type of movies to recommend. Figure 1 shows an example of an annotated dialog; more examples are in Tables 11 and 12 in the Appendix.

Dataset             | INSPIRED | CONVREC (Sun and Zhang, 2018) | GORECDIAL (Kang et al., 2019) | REDIAL (Li et al., 2018)
Naturalness         | ✓        | ✗                             | ✗                             | ✓
Sociable Strategies | ✓        | ✗                             | ✗                             | ✗
Movie Information   | ✓        | ✗                             | ✓                             | ✗
Conversation Types  | Mixed    | QA                            | Mixed                         | Mixed

Table 1: Comparison of related recommendation dialog datasets. "QA" refers to question-answer pairs. "Mixed" indicates that the conversations contain both statements and question-answer pairs. CONVREC collected 385 human-curated dialogs, but only released 875,721 simulated dialogs.

Our analyses show that sociable recommendation strategies are correlated with successful recommendations in dialogs. These insights motivate us to build a more sociable recommendation dialog system to achieve better persuasion outcomes. For extrinsic evaluation, we build two end-to-end dialog systems trained on the INSPIRED dataset: one is encoded with recommendation strategies and the other is not. We find that the model encoded with our strategy annotations performs better in both automatic and human evaluation.

We believe that enriching the intersection between social science and computational linguistics in INSPIRED opens plenty of room for future studies on sociable recommendation dialog.
Social science theories on recommendation.
Psychological researchers believe that interactions with recommendation systems should not only be seen from a technical perspective but should also be examined from a social and emotional perspective (Zanker et al., 2006). Yoo et al. (2012) propose that credibility, likeability, friendliness, humor, and other language styles are significant factors for persuasive recommendations. Pecune et al. (2019) studied modeling social explanation for movie recommendation, such as personal opinion and personal experience. Häubl and Murray (2003) find that more information on recommendations may help consumers make better purchase decisions, but leaves them overwhelmed by the abundance of information. Inspired by these theories, we borrow such principles in the design of our sociable recommendation strategies.
Conversational recommendation systems.
While studies on conversational recommendation systems exist, none of them focus on sociable recommendation strategies for persuasive outcomes. This is due to the lack of existing datasets for studying effective strategies in recommendation dialog. Table 1 compares different factors across the recommendation dialog datasets, including INSPIRED.

Prior works on recommendation dialogs collect data based on template-based question-answering pairs from user reviews (Thompson et al., 2004; Reschke et al., 2013; Sun and Zhang, 2018; Zhang et al., 2018b). These datasets contain structured utterances where the recommender continuously asks for the seeker's product preference.

Kang et al. (2019) collected goal-driven recommendation dialogs (GORECDIAL) in a gamified setting where both speakers are given a small set of movies with descriptions to find the best recommendation. This role-play game setting may not effectively reflect real-world situations, since the seeker pretends to like the given movies.

The most similar work to ours is Li et al. (2018)'s REDIAL dataset, which consists of chit-chat for movie recommendation. However, the recommendations are conditioned on the movies mentioned in the dialog, not directly on the language usage. Also, the speakers tend to mention only movie names rather than discussing movie preferences in depth.

Figure 2: Movie search interface for recommenders.

Our work is also closely related to Radlinski et al. (2019) on movie preference elicitation and Galetzka et al. (2020) on movie discussion in the dialog setting. Preference elicitation is an important step for the human recommender to comprehend the seeker's taste before recommending, but these datasets are not recommendation conversations. Meanwhile, dialogs in INSPIRED contain both stages: preference elicitation and recommendation. INSPIRED also captures sociable recommendation strategies in conversations and measures recommendation success with ratings.
Sociability in dialog systems.
In human-human conversations, people engage in talk that does not only contain task-oriented topics (Bickmore and Cassell, 2005). Thus, sociability has drawn more attention in dialog systems as they become more sociable, engaging, and user-adaptive (Zhang et al., 2018a; Shi and Yu, 2018; Göker and Thompson, 2000). Zhang et al. (2018a) proposed a chit-chat dataset and presented the task of more personalized dialog systems conditioned on user profile information. Sociability leads to a more persuasive conversation (Yoo et al., 2012), so social skills are essential for dialog systems to make successful recommendations.

Communication strategies for specific tasks, such as donation and product price negotiation, have been found useful for task completion (Wang et al., 2019; Zhou et al., 2019). In this work, we connect different sociable strategies with recommendation in dialog settings and show that sociable strategies have a positive impact on recommendation success.
To ensure that each recommended movie has a trailer and metadata information, we curate a database with all movie trailers from the Movieclips Trailers channel released between 2008 and 2020, plus movies from the MovieLens dataset (Harper and Konstan, 2015). In total, we have 17,869 movies with trailers and metadata information. We design a simple movie search interface (Figure 2) to assist recommenders in searching for a movie.

We recruit crowd-workers from Amazon Mechanical Turk. In each conversation, two workers are randomly paired and assigned different roles: one as a recommender and the other as a seeker. Our collection set-up is more realistic than prior work because (1) recommenders face no limit on the number of movies they may recommend, (2) seekers accept or reject a movie following their true preferences, and (3) we record whether seekers actually watch the video trailer.
Recommender.
Recommenders' task is to recommend a movie successfully to the seeker. Before chatting, we show them tips for sociable recommendation strategies with example utterances. Then they chat with the seekers in two phases: user information gathering and movie recommendation. In the user information gathering phase, recommenders are asked to understand the seekers' movie tastes. In the recommendation phase, the recommenders can still ask for the seekers' preferences while browsing movies to recommend. We encourage the recommenders to continue the conversation until the seekers accept a movie.
Seeker.
Seekers are asked to talk about movie recommendations without any strategy support. After they complete the conversation, seekers can opt to accept or reject the provided movie recommendations. If the seekers accept the recommendation, they can watch the entire recommended movie trailer or part of it, or simply skip it after the conversation. We record how long seekers watch the recommended movie trailer and ask them to rate the trailer on a 5-point Likert scale in the post-task survey. (Trailers are collected from youtube.com/user/movieclipsTRAILERS.)

Table 2: INSPIRED's statistics (overall dataset, recommender, and seeker statistics).
We use the ParlAI platform (Miller et al., 2017) and hire 1,594 US crowd-workers from Amazon Mechanical Turk with a minimum 90% task acceptance rate. The dialog collection process lasted from November 2019 to March 2020.

Workers first fill out questionnaires related to their personality traits and values before their conversations. The questionnaire covers three personality trait models: the Big Five personality traits (15 questions) (Goldberg, 1993), the Schwartz Portrait Value (10 questions) (Schwartz, 2003), and the Decision Making Style (2 questions) (Hamilton et al., 2016). We also release this personality information in our dataset for future work. Then, the recommenders start the conversation, and both workers chat for a minimum of 10 turns or until a recommendation is made. After the conversation ends, both workers answer a post-task survey with demographic questions such as age and gender. Seekers are asked to rate the trailer on a 5-point Likert scale and to provide the reason if they reject the recommendation or do not finish watching the video. Both workers receive a bonus of up to $2 if they complete the entire process, in addition to the base pay of $0.50. Dialog collection interfaces are shown in Appendix H.

Table 2 presents statistics of the collected dataset. Even though our dataset has a relatively small number of samples compared to REDIAL or GORECDIAL, it has human annotations for each sociable strategy.
Cases
Accept (Rating 4-5)        532 (53.1%)
Accept (Rating 3 or lower)  45 (4.5%)
Accept (Other Reasons)     289 (28.9%)
Accept Uninterested        123 (12.3%)
Reject                      12 (1.2%)
Table 3: Statistics of dialogs where the seekers accept or reject the final recommended movie. "Accept (Rating 4-5)" means that the seekers accept the recommendation and rate it 4 or 5, and similarly for "Accept (Rating 3 or lower)". "Accept (Other Reasons)" indicates that the seekers give other reasons for not finishing the video. "Accept Uninterested" indicates that the seekers accept the recommendation, do not finish watching the video, and explain in the post-task survey that they are not interested in the recommended video.

Moreover, our dataset can be used in combination with other datasets in a semi-supervised setting, as shown in our implementation of recommendation dialog systems in §6.

The statistics of accept and reject cases are shown in Table 3. We have a higher number of successful cases (79.7%) than failure cases. This shows that people tend to accept recommendations, which is not surprising since watching a video trailer is an entertaining, low-risk activity. For training the dialog model, we use every dialog from all cases so that the dialog system will be able to respond to diverse responses.
After the conversations are collected, two experts with linguistics training develop an annotation scheme using the content analysis method (Krippendorff, 2004) and past studies on human behavior in making recommendations. Similar approaches have been taken in prior studies on persuasion (Wang et al., 2019) and negotiation (Zhou et al., 2019) tasks. We divide the recommendation strategies into two categories: sociable strategies and preference elicitation strategies. The sociable strategies are also derived from our literature study of social science theories.
Sociable strategies comprise eight strategies related to the recommendation task. These strategies capture the recommenders' attempts to build rapport with the seekers.
Category                | Example
Personal Opinion        | "I really like Disney's more recent princesses"
Personal Experience     | "I have Disney+ and watched it everyday!"
Similarity              | "Oh, I love Disney as well."
Encouragement           | "You should definitely watch it!"
Offering Help           | "I'm here to help you find a trailer!"
Preference Confirmation | "So do you like Disney movies in general?"
Credibility             | "It's about a dog named Lady who runs away with a stray named Tramp"
Self-Modeling           | "We are planning to go see Maleficent, we heard it was a very good movie."
Experience Inquiry      | "Have you seen the new Lady and the Tramp?"
Opinion Inquiry         | "What do you like about the Avengers: End-game?"
Recommendation          | "You should check out Shazam!"

Table 4: Example utterances for each strategy.

• Personal opinion refers to a condition where recommenders express their subjective opinion about a movie, including its plot, actors, or other movie attributes.
• Personal experience refers to sharing personal experience related to a movie. For example, recommenders may say that they have watched the movie several times to convince the seekers that the movie is good. Both personal opinion and personal experience are part of self-disclosure, which leads to establishing rapport with the seekers (Altman, 1973).

• Similarity refers to a condition where the recommenders empathize and act like-minded toward the seekers about their movie preference to produce similarity between them. Similarity is believed to influence the seekers' liking of the source, leading them to trust the recommenders' judgment more (O'Keefe, 2004), following Lazarsfeld and Merton (1964)'s homophily theory, which states that humans like other people who are similar to them.

• Encouragement is the use of praise of the seekers' movie taste and encouragement to watch a recommended movie, to build rapport and promote the recommended movie.

• Offering help is a strategy where the recommenders disclose an explicit intention to help the seeker or be transparent. It is part of the "transparency" strategy from Gretzel and Fesenmaier (2006).

• Preference confirmation is a strategy where the recommenders ask about or rephrase the seeker's preference. This strategy is also part of the "transparency" strategy, in which the recommenders disclose their thinking process of understanding the seekers' preference.

• Self-modeling is a strategy where the recommender becomes a role model by doing something first so that the seeker will follow (Dowrick, 1999).

• Credibility happens when the recommender shows expertise and trustworthiness in providing information to persuade the seeker (Fogg, 2002; O'Keefe, 2004; Rhoads and Cialdini, 2002). In our study, a recommender makes a credibility appeal when they provide factual information about movie attributes, such as the plot, actors, or awards that the movie has received.

Preference elicitation inquiries include the following inquiries, asked by the recommenders to learn the seekers' movie tastes.

• Experience inquiry asks about the seeker's movie-watching experience, such as whether the seeker has watched a certain movie or not.

• Opinion inquiry asks for the seeker's opinion on movie-related attributes. Example answers to this inquiry are the seeker's explanation of what they like about the plot or whether they admire the actors' acting skills.

Other kinds of utterances, such as greetings or thanks, fall into the non-strategy category. We also label sentences that are recommendations.
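For downstream processing, the taxonomy above can be captured as a plain label-to-category mapping. This is a sketch of our own representation; the paper does not prescribe a data structure:

```python
# Sketch: the annotation scheme as a label-to-category mapping
# (eight sociable strategies, two preference elicitation inquiries).
STRATEGY_CATEGORY = {
    "personal_opinion": "sociable",
    "personal_experience": "sociable",
    "similarity": "sociable",
    "encouragement": "sociable",
    "offering_help": "sociable",
    "preference_confirmation": "sociable",
    "self_modeling": "sociable",
    "credibility": "sociable",
    "experience_inquiry": "preference_elicitation",
    "opinion_inquiry": "preference_elicitation",
}

sociable = [s for s, c in STRATEGY_CATEGORY.items() if c == "sociable"]
print(len(sociable))  # → 8
```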
Category                         | Count (%)
Sociable Strategies
Credibility                      | 2,687 (13.7%)
Personal Opinion                 | 2,599 (13.9%)
Encouragement                    | 1,975 (10.6%)
Similarity                       |   957 (5.1%)
Offering Help                    |   953 (5.1%)
Preference Confirmation          |   950 (5.1%)
Personal Experience              |   564 (3%)
Self-Modeling                    |   449 (2.4%)
Preference Elicitation Inquiries
Experience Inquiry               | 1,505 (8.1%)
Opinion Inquiry                  | 2,120 (11.3%)
Non-strategy
No Strategy                      | 2,566 (13.7%)
Acknowledgment                   | 1,354 (7.2%)
Recommendation                   | 2,177 (6.1%)

Table 5: Statistics of the number of utterances annotated with strategies in INSPIRED.

A recommendation is defined as the recommender suggesting a new movie title to the seeker for the first time. 30% of the recommendation sentences are "experience inquiries", 27% are "encouragement", and 14% are "personal opinion". Example annotated utterances are displayed in Table 4, while Table 5 shows the number of annotated utterances in INSPIRED.

To ensure annotation quality, we split our annotation study into two steps. First, we hire two experts with linguistics training to perform annotation, in order to test the validity of the scheme. The two experts annotated 30 randomly selected conversations and reached a Kappa agreement of 0.77, suggesting that our scheme is replicable.

Our dataset contains more than 18k utterances, so it is too costly to have experts annotate all of them. In the second step, we hire US-based crowd-workers (95% task acceptance) from Amazon Mechanical Turk for the annotation tasks. In each task, a worker is given a tutorial on the annotation and then 10 dialogs to annotate. One of these dialogs, called the evaluation dialog, is labeled by the experts to calibrate the quality of the worker's annotation. Five workers work on the same task. We filter out workers whose score on the evaluation dialog is below a threshold of 0.60.
Figure 3: Distribution of sociable strategies over the dialog turns. Best viewed in color.

To set this threshold at a reasonable value, we conducted the following study: we ran one task in which all the dialogs were already labeled by the experts, including the evaluation dialogs. We found that if a worker's score on the evaluation dialog is above 0.60, their agreement with the experts' annotations on the rest of the dialogs in the task is 0.77.

These selected high-quality crowd-workers annotate the rest of the dialogs, with five workers annotating each dialog. If more than one worker disagrees on an utterance's annotation, the experts are involved to annotate it as quality control. The inter-annotator majority agreement among all workers is 0.78 over all dialogs. The annotation scheme for the crowd-workers is provided in Figure 12 in the Appendix.
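The agreement scores above are kappa statistics between pairs of annotators. As a minimal sketch of Cohen's kappa for two annotators' strategy labels (the example utterance labels and the pure-Python implementation are illustrative, not the authors' code):

```python
# Sketch: Cohen's kappa between two annotators' strategy labels,
# as used to check annotation quality. Example labels are illustrative.
from collections import Counter

def cohen_kappa(a, b):
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n       # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[l] * cb[l] for l in ca) / (n * n)    # chance agreement
    return (p_o - p_e) / (1 - p_e)

expert = ["credibility", "encouragement", "similarity", "no_strategy",
          "personal_opinion", "credibility", "opinion_inquiry", "encouragement"]
worker = ["credibility", "encouragement", "similarity", "no_strategy",
          "personal_opinion", "credibility", "experience_inquiry", "encouragement"]

print(round(cohen_kappa(expert, worker), 2))  # → 0.85
```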
As shown in Figure 3, we observe that different sociable strategies are unequally distributed across conversation turns. Most notably, "offering help" and "similarity" often occur at the beginning, indicating that recommenders strategically attempt to build rapport with seekers at the early stages. "Credibility" and "personal opinion" then appear frequently throughout the conversations, as recommenders seek to persuade. Moreover, "encouragement" mostly appears in the middle and at the end of conversations.

4.2 What Strategies Contribute to Successful Recommendations?
We study the association between sociable strategies and successful recommendations. A recommendation is considered successful if the seekers finish watching a substantial portion of the recommended movie trailer and rate the trailer with a high score (4 or 5 stars). We set a threshold that seekers need to watch more than 50% of the video duration, since some videos have advertisements at the end. On the other hand, a recommendation is considered unsuccessful if the seekers reject the recommendation ("Reject") or skip watching the trailer ("Accept Uninterested"). Thus, our analysis of the association between strategies and successful recommendations uses 532 successful dialogs and 135 unsuccessful dialogs.

To analyze the effect of our sociable recommendation strategies on recommendation success, we run a logistic regression model to predict the success of the recommendation (1 = successful, 0 = unsuccessful). We use the frequency of each strategy in a dialog as the feature value.

Table 6 shows the coefficients of each strategy with respect to the recommendation. We observe that the "personal opinion", "similarity", "encouragement", and "credibility" strategies have a significant positive effect on successful recommendations. This confirms previous studies showing that more sociable recommenders are more likely to succeed in recommendation.

The "similarity" strategy has the highest coefficient, which suggests that if the recommender conforms to the seeker's preference, the seeker is more likely to favor the recommendation. This also supports the theory in O'Keefe (2004) that likeability helps in recommendation. We also observe that the preference elicitation inquiries do not contribute significantly to successful recommendations. This does not mean that recommenders need not query seekers' preferences, since doing so is crucial for understanding their tastes. Rather, a more sociable approach is necessary for a more successful recommendation.
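The regression set-up described above can be sketched as follows, assuming one binary outcome per dialog and per-strategy frequency features. The toy data and the plain gradient-descent fit are our own illustration, not the authors' implementation:

```python
# Sketch: logistic regression of recommendation success on per-dialog
# strategy frequencies, mirroring the paper's analysis. The toy dialogs
# and the hand-rolled SGD fit are illustrative only.
import math, random

STRATEGIES = ["personal_opinion", "similarity", "encouragement", "credibility"]

def featurize(dialog):
    # Feature value = how often each strategy occurs in the dialog.
    return [dialog.count(s) for s in STRATEGIES]

def fit_logreg(X, y, lr=0.1, epochs=2000):
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - yi                       # gradient of the log-loss
            b -= lr * g
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
    return w, b

# Toy data: dialogs that use sociable strategies more often succeed (label 1).
random.seed(0)
X, y = [], []
for _ in range(200):
    n = random.randint(0, 4)
    dialog = random.choices(STRATEGIES, k=n)
    X.append(featurize(dialog))
    y.append(1 if n >= 2 else 0)

w, b = fit_logreg(X, y)
print({s: round(wj, 2) for s, wj in zip(STRATEGIES, w)})
```

On this synthetic data every strategy count is positively predictive, so all fitted coefficients come out positive, analogous to the positive coefficients reported in Table 6.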
In a recommendation task, a natural question to ask is how big a role the recommended product plays in the acceptance of a recommendation. If the quality of the product matters more than how you recommend it, it makes more sense to improve the products rather than the recommendation skills. Therefore, we also analyze whether movie attributes, such as the genre, a recent release date, and the number of likes of the movie trailer, have an impact on successful recommendation, along with the eight sociable strategies and two preference elicitation inquiries.

Category                | Coefficient
Sociable Strategies
Personal Opinion        | 0.12*
Personal Experience     | 0.05
Similarity              | *
Encouragement           | 0.20**
Offering Help           | 0.03
Preference Confirmation | 0.05
Self-Modeling           | 0.02
Credibility             | 0.09*
Preference Elicitation
Experience Inquiry      | −

Table 6: Associations between different strategies and successful recommendation. * p < 0.05, ** p < 0.01.

For popularity, we categorize the top 10% of movies in our database in terms of the number of likes as popular and the rest as non-popular. A movie is considered recent if it was released in 2019 or 2020. For genre, we select the top five most popular genres in the movie database; when we check against the recommended movies in INSPIRED, 96% of recommended movies are covered by these top five genres.

Results of the analysis of the strategies and movie attributes are shown in Table 8 in the Appendix. Sociable strategies remain significantly correlated with successful recommendations. Recommenders who use the "similarity" strategy, express "personal opinion", and show "encouragement" are more likely to successfully recommend a movie (p < 0.05). Surprisingly, none of the movie attributes has a significant effect on successful recommendations. A possible reason is that the seekers' movie tastes are so diverse that movie attributes such as genre do not have a significant impact on recommendation success.

Figure 4: Alternating memory of the recommender LM and the seeker LM.
To validate whether the sociable strategies in INSPIRED are useful in creating a more engaging and persuasive recommendation dialog, we develop a generative dialog model as our baseline to compare against our strategy-incorporated dialog system. We split the dialogs into 801/100/100 for the train/validation/test split. We use an external recommendation system from TMDB with heuristics to select the movies; more details on the heuristics and training set-up are in the Appendix.

The baseline dialog model uses two separate Transformer-based pretrained language models (Vaswani et al., 2017; Radford et al., 2019; Wu et al., 2019) to learn the recommender's and the seeker's language models separately, in alternating order. Both language models are trained to maximize the likelihood of generating the ground-truth utterance given the alternating memory, as shown in Figure 4. The model is pretrained on a non-task-related corpus, WebText, and on task-related corpora: the recommendation dataset REDIAL (Li et al., 2018) and the movie preference elicitation dataset (Radlinski et al., 2019). Then, we fine-tune the model on INSPIRED.

We replace movie attributes such as titles, actors, and genres with indexed placeholders, because multiple attributes may be mentioned several times in a single conversation. The replacement with placeholders improves factual correctness, as we replace them back with the original movie attributes later. At the end of the sentence, we append the attribute information as below:

Original: "If you like La La Land, you should also see Amazing Spiderman with Emma Stone"

With placeholder: "If you like [MOVIE TITLE 0], you should also see [MOVIE TITLE 1] with [MOVIE P ACTOR 0]; movies: La La Land (2016), The Amazing Spider-Man (2012); people: Emma Stone"
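The placeholder substitution can be sketched as a simple dictionary-based delexicalizer. The function name and entity lists here are hypothetical; the paper does not specify its exact implementation:

```python
# Sketch: replacing movie attributes with indexed placeholders and
# appending the originals at the end, as in the paper's preprocessing.
# The entity lists and the function name are our own illustration.
def delexicalize(utterance, movies, people):
    out = utterance
    used_movies, used_people = [], []
    for surface, canonical in movies:     # (surface form, canonical name)
        if surface in out:
            out = out.replace(surface, f"[MOVIE TITLE {len(used_movies)}]")
            used_movies.append(canonical)
    for name in people:
        if name in out:
            out = out.replace(name, f"[MOVIE P ACTOR {len(used_people)}]")
            used_people.append(name)
    if used_movies:
        out += "; movies: " + ", ".join(used_movies)
    if used_people:
        out += "; people: " + ", ".join(used_people)
    return out

src = "If you like La La Land, you should also see Amazing Spiderman with Emma Stone"
movies = [("La La Land", "La La Land (2016)"),
          ("Amazing Spiderman", "The Amazing Spider-Man (2012)")]
print(delexicalize(src, movies, ["Emma Stone"]))
```

Relexicalization would then reverse the mapping, substituting the canonical attributes back into the generated text.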
We prepend the strategy as a special token to the input utterance so that the model generates not only sentences but also strategies. A similar method was used to control text generation style (Rashkin et al., 2019); it is a simple and effective way to incorporate the strategies. (We also tried multi-task learning for utterance generation and strategy prediction; its automatic metric results are comparable to the prepend method's.) The input to the encoder is as follows:

Prepend: "encouragement If you like [MOVIE TITLE 0], you should also see [MOVIE TITLE 1] with [MOVIE P ACTOR 0]; title: La La Land (2016), The Amazing Spider-Man (2012); people: Emma Stone"

The model first generates five candidate sentences. Then, it randomly selects a generated candidate that either contains the "encouragement" strategy or has the greatest sentence length. In our experiments, we tried various combinations of the top three strategies (e.g., "encouragement" only, or "encouragement" and "similarity"), and the "encouragement"-only model gave the best result. The sentence-length selection is based on our intuition from chatting with the system. This aligns with our findings that "encouragement" is the second most frequently used strategy when humans make recommendations (§4.1) and that "recommendation" is associated positively with successful recommendation (Table 8).

To decide whether a sentence is a recommendation or not, we train a BERT-based recommendation classifier that takes as input the recommender's current utterance and the seeker's utterances from the previous turn; it achieves 95.4% accuracy and a 91.2% F1-score. While the placeholder index may serve as a proxy for deciding whether the system needs to recommend a movie, it is not strictly supervised. Thus, if a generated sentence is labeled as "recommendation", we enforce our dialog system to recommend a new movie.

Model    | PPL ↓ | BLEU-4 ↑
Baseline | 9.28  | 5.11
Strategy | 8.93  | 6.63

Table 7: Results for automatic metrics.
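The decoding-time selection described above, preferring candidates tagged with "encouragement" and otherwise falling back to the longest candidate, can be sketched as follows; generate_candidates() is a stub standing in for sampling from the fine-tuned language model:

```python
# Sketch: decoding-time candidate selection, preferring candidates whose
# predicted strategy token is "encouragement" and otherwise falling back
# to the longest candidate. generate_candidates() is a stub for the LM.
import random

def generate_candidates(context, n=5):
    # Stub: a real system samples n sequences from the fine-tuned LM,
    # each beginning with a generated strategy token.
    return [
        "encouragement You should definitely watch it!",
        "personal_opinion I really liked the animation style.",
        "encouragement I think you will love this one, give it a try!",
        "credibility It is about a dog named Lady who runs away with a stray.",
        "no_strategy Sounds good.",
    ]

def select_response(context, preferred="encouragement"):
    cands = generate_candidates(context)
    tagged = [c for c in cands if c.split(" ", 1)[0] == preferred]
    if tagged:
        return random.choice(tagged)   # random pick among preferred strategy
    return max(cands, key=len)         # fallback: greatest sentence length

random.seed(0)
print(select_response("SEEK: I love Disney movies!"))
```

The leading strategy token would be stripped before showing the response to the user, and a separate classifier (the BERT-based one described above) decides when a new movie must actually be recommended.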
We compare the baseline dialog model without strategy supervision against our dialog model with strategy supervision, using both automatic metrics and human evaluation.

For automatic metrics, we compute perplexity and BLEU scores (Papineni et al., 2002); the results in Table 7 suggest that prepending strategies improves model performance. For human evaluation, twenty-eight participants chat with both models two to three times for a more reliable judgment. We randomize which model they chat with first, in order to avoid exposure bias. After chatting, they are asked to decide which model is better in five aspects: fluency, consistency, naturalness, persuasiveness, and engagingness. If they are unable to distinguish the dialog systems, they may choose a "can't tell" option. (We also ran an additional user study with five-point ratings on these five aspects, with results in Table 10 in the Appendix.)

Figure 5: Human evaluation result. "Flu." stands for fluency, "cons.": consistency, "nat.": naturalness, "pers.": persuasiveness, and "eng.": engagingness.

Results in Figure 5 suggest that human users prefer the model with strategies over the baseline in all aspects. Interestingly, although the strategy model is preferred on all metrics, people find that the two models differ most in engagingness, followed by naturalness. This supports our hypothesis that human users find conversations more engaging and more natural when sociable strategies are incorporated in recommendation dialog systems.

In this work, we have introduced INSPIRED, a new recommendation dialog dataset collected in a natural setting and annotated with sociable recommendation strategies. We analyze the connection between different strategies and recommendation results. Our findings show that sociable strategies have a positive impact on the acceptance of recommendations and on dialog quality. This work opens up several directions for future studies in building sociable and personalized recommendation dialog systems: First, we will explore more ways of utilizing the strategies, including dynamic strategy selection after decoding. Then, we plan to investigate strategy patterns for people with different personalities and movie preferences to make the dialog system more personalized. Finally, another interesting direction is to extend the model with jointly trainable movie recommendation and movie information modules.
Acknowledgments
We would like to thank the members of the NLP lab at UC Davis for discussion and participation in the pilot study. We are also grateful to the human evaluation participants and the Mechanical Turk workers for their contributions to building this dataset.
References
Irwin Altman. 1973. Reciprocity of interpersonal exchange. Journal for the Theory of Social Behaviour, 3(2):249–261.

Timothy Bickmore and Justine Cassell. 2005. Social dialogue with embodied conversational agents. In Advances in Natural Multimodal Dialogue Systems, pages 23–54. Springer.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186.

Peter W. Dowrick. 1999. A review of self modeling and related interventions. Applied and Preventive Psychology, 8(1):23–39.

Fabian Galetzka, Chukwuemeka U. Eneh, and David Schlangen. 2020. A corpus of controlled opinionated and knowledgeable movie discussions for training neural conversation models. In Language Resources and Evaluation Conference (LREC), volume 12.

B. J. Fogg. 2002. Persuasive technology: Using computers to change what we think and do. Ubiquity, 2002(December).

Sofia Gkika and George Lekakos. The persuasive role of explanations in recommender systems.

Mehmet H. Göker and Cynthia A. Thompson. 2000. Personalized conversational case-based recommendation. In European Workshop on Advances in Case-Based Reasoning, pages 99–111. Springer.

Lewis R. Goldberg. 1993. The structure of phenotypic personality traits. American Psychologist, 48(1):26.

Ulrike Gretzel and Daniel R. Fesenmaier. 2006. Persuasion in recommender systems. International Journal of Electronic Commerce, 11(2):81–100.

Katherine Hamilton, Shin-I Shih, and Susan Mohammed. 2016. The development and validation of the rational and intuitive decision styles scale. Journal of Personality Assessment, 98(5):523–535.

F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens datasets: History and context. ACM Transactions on Interactive Intelligent Systems, 5(4):19:1–19:19.

Gerald Häubl and Kyle B. Murray. 2003. Preference construction and persistence in digital marketplaces: The role of electronic recommendation agents. Journal of Consumer Psychology, 13(1-2):75–91.

Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. 2019. The curious case of neural text degeneration. arXiv preprint arXiv:1904.09751.

Dongyeop Kang, Anusha Balakrishnan, Pararth Shah, Paul Crook, Y-Lan Boureau, and Jason Weston. 2019. Recommendation as a communication game: Self-supervised bot-play for goal-oriented dialogue. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 1951–1961, Hong Kong, China. Association for Computational Linguistics.

Klaus Krippendorff. 2004. Reliability in content analysis: Some common misconceptions and recommendations. Human Communication Research, 30(3):411–433.

Paul F. Lazarsfeld and Robert King Merton. 1964. Friendship as social process: A substantive and methodological analysis. Pages 18–66, New York. Van Nostrand.

Raymond Li, Samira Ebrahimi Kahou, Hannes Schulz, Vincent Michalski, Laurent Charlin, and Chris Pal. 2018. Towards deep conversational recommendations. In Advances in Neural Information Processing Systems, pages 9725–9735.

Kaihui Liang, Austin Chau, Yu Li, Xueyuan Lu, Dian Yu, Mingyang Zhou, Ishan Jain, Sam Davidson, Josh Arnold, Minh Nguyen, and Zhou Yu. 2020. Gunrock 2.0: A user adaptive social conversational system.

Ilya Loshchilov and Frank Hutter. 2019. Decoupled weight decay regularization. In International Conference on Learning Representations.

A. H. Miller, W. Feng, A. Fisch, J. Lu, D. Batra, A. Bordes, D. Parikh, and J. Weston. 2017. ParlAI: A dialog research software platform. arXiv preprint arXiv:1705.06476.

Daniel James O'Keefe. 2004. Trends and prospects in persuasion theory and research. In Readings in Persuasion, Social Influence, and Compliance Gaining, pages 31–43. Pearson/Allyn and Bacon.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318. Association for Computational Linguistics.

Florian Pecune, Shruti Murali, Vivian Tsai, Yoichi Matsuyama, and Justine Cassell. 2019. A model of social explanations for a conversational movie recommendation system. In Proceedings of the 7th International Conference on Human-Agent Interaction, HAI '19, pages 135–143, New York, NY, USA. ACM.

Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners.

Filip Radlinski, Krisztian Balog, Bill Byrne, and Karthik Krishnamoorthi. 2019. Coached conversational preference elicitation: A case study in understanding movie preferences. In Proceedings of the Annual SIGdial Meeting on Discourse and Dialogue.

Hannah Rashkin, Eric Michael Smith, Margaret Li, and Y-Lan Boureau. 2019. Towards empathetic open-domain conversation models: A new benchmark and dataset. In ACL.

Kevin Reschke, Adam Vogel, and Dan Jurafsky. 2013. Generating recommendation dialogs by extracting information from user reviews. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 499–504, Sofia, Bulgaria. Association for Computational Linguistics.

Kelton V. Rhoads and Robert B. Cialdini. 2002. The business of influence. In Persuasion Handbook: Developments in Theory and Practice, pages 513–542, London, United Kingdom. Sage.

Shalom H. Schwartz. 2003. A proposal for measuring value orientations across nations. Questionnaire Package of the European Social Survey, 259(290):261.

Weiyan Shi and Zhou Yu. 2018. Sentiment adaptive end-to-end dialog systems. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1509–1519, Melbourne, Australia. Association for Computational Linguistics.

Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1631–1642.

Yueming Sun and Yi Zhang. 2018. Conversational recommender system. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pages 235–244. ACM.

Cynthia A. Thompson, Mehmet H. Göker, and Pat Langley. 2004. A personalized system for conversational recommendations. Journal of Artificial Intelligence Research, 21:393–428.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS'17, pages 6000–6010, Red Hook, NY, USA. Curran Associates Inc.

Xuewei Wang, Weiyan Shi, Richard Kim, Yoojung Oh, Sijia Yang, Jingwen Zhang, and Zhou Yu. 2019. Persuasion for good: Towards a personalized persuasive dialogue system for social good. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5635–5649, Florence, Italy. Association for Computational Linguistics.

Qingyang Wu, Yichi Zhang, Yu Li, and Zhou Yu. 2019. Alternating recurrent dialog model with large-scale pre-trained language models. arXiv preprint arXiv:1910.03756.

Zhao Yan, Nan Duan, Peng Chen, Ming Zhou, Jianshe Zhou, and Zhoujun Li. 2017. Building task-oriented dialogue systems for online shopping. In Thirty-First AAAI Conference on Artificial Intelligence.

Kyung-Hyan Yoo, Ulrike Gretzel, and Markus Zanker. 2012. Persuasive Recommender Systems: Conceptual Background and Implications, 1st edition. Springer Publishing Company, Incorporated.

Markus Zanker, Marcel Bricman, Sergiu Gordea, Dietmar Jannach, and Markus Jessenitschnig. 2006. Persuasive online-selling in quality and taste domains. In International Conference on Electronic Commerce and Web Technologies, pages 51–60. Springer.

Saizheng Zhang, Emily Dinan, Jack Urbanek, Arthur Szlam, Douwe Kiela, and Jason Weston. 2018a. Personalizing dialogue agents: I have a dog, do you have pets too? In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2204–2213.

Yongfeng Zhang, Xu Chen, Qingyao Ai, Liu Yang, and W. Bruce Croft. 2018b. Towards conversational search and recommendation: System ask, user respond. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pages 177–186. ACM.

Yiheng Zhou, He He, Alan W. Black, and Yulia Tsvetkov. 2019. A dynamic strategy coach for effective negotiation. In Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue, pages 367–378, Stockholm, Sweden. Association for Computational Linguistics.
A Movie Trailer Database Creation
For each movie, we obtain metadata from YouTube and add other movie attributes, such as plot, actors, and genre, using the OMDB API (omdbapi.com). We enrich the movies from the MovieLens dataset (Harper and Konstan, 2015) with more movie trailers by searching the movie title plus "trailer" on YouTube with a duration restriction of less than 5 minutes. We choose trailers shorter than 5 minutes so that the crowd-workers do not have to spend a long time watching them. We use the first retrieved video link under the duration constraint and remove movies without a retrieved trailer from our database. Our motivation for using MovieLens and including more trailers is to link our movie database with MovieLens user reviews, so that it can be used for future work on building recommendation systems.

Category                  Coefficient
Sociable Strategies
  Personal Opinion          0.12*
  Personal Experience       0.04
  Similarity                *
  Encouragement             0.17*
  Offering Help             0.02
  Preference Confirmation   0.05
  Self-Modeling             0.02
  Credibility               0.07
Preference Elicitation
  Experience Inquiry        −
Movie Attributes
  Recently Released         0.17
  Popularity                −
Genre
  Action                    − − −

Table 8: The associations between different strategies, movie attributes, and successful recommendations. * p <

B Heuristics for Recommendation System
Our heuristics for the recommendation system to handle cold-start are as follows. If the seeker has never mentioned a movie and the generated text of the recommender dialog system is labeled as "recommendation", the most recent movie with the last mentioned genre is recommended. If the seeker has already mentioned a movie, we query the recommendation system with the last mentioned movie that has positive or neutral sentiment. The first movie in the recommendation system's output is chosen by our dialog system; if that movie has already been recommended, we choose the next movie in the output list.

To detect which movies are favored by the seekers and to detect movie titles in a sentence, we use the modules from Liang et al. (2020). The sentiment classifier is a BERT-based (Devlin et al., 2019) model trained on the Stanford Sentiment Treebank (Socher et al., 2013). For movie title detection, the model is a bidirectional LSTM-CRF with character-augmented word embeddings for the input, combined with retrieval of similar movie titles from the movie database. The movie database is from TMDB, and the model was trained on speech transcripts.

To detect movie genre in a sentence, we use regular expression matching for the following genres from the OMDB information in our database: Action, Animation, Biography, Comedy, Crime, Drama, Documentary, Fantasy, History, Horror, Mystery, Musical, News, Romance, Sport, Thriller, War, and Western. To detect movie actors, actresses, and directors, we use pattern matching on capitalized first letters and check whether the name exists in the TMDB people search.
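The cold-start heuristic above can be sketched as follows. The function name and the `recommender` interface (`most_recent_by_genre`, `similar_to`) are hypothetical names for illustration, not our actual module API; the caller is assumed to have already checked that the generated text is labeled "recommendation".

```python
def pick_recommendation(mentioned_movies, sentiments, last_genre,
                        recommender, already_recommended):
    """Cold-start heuristic: if the seeker has not named any movie yet,
    fall back to the most recent movie in the last-mentioned genre;
    otherwise seed the recommender with the last movie rated positively
    or neutrally and return its first not-yet-recommended suggestion."""
    if not mentioned_movies:
        return recommender.most_recent_by_genre(last_genre)
    liked = [m for m in mentioned_movies
             if sentiments.get(m) in ("positive", "neutral")]
    seed = liked[-1] if liked else mentioned_movies[-1]
    for movie in recommender.similar_to(seed):
        if movie not in already_recommended:
            return movie
    return None
```

Skipping already-recommended movies mirrors the rule of moving to the next item in the recommendation output list.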
C Dialog Model
The dialog model p(d) of a dialog d with T turns is defined as follows:

p(d) = \prod_{t=1}^{T} p_s(s_t \mid s_{<t}, r_{<t}) \, p_r(r_t \mid s_{<t}, r_{<t})    (1)

where s_t is the seeker's utterance at turn t, r_t is the recommender's utterance at turn t, and p_s(s_t \mid s_{<t}, r_{<t}) is the probability of generating the seeker's utterance given the history. The conversation history is represented by the query/key/value features using self-attention. Interested readers can refer to Wu et al. (2019) for more details.

D Training Set-up
We adopt GPT-2 small, a 12-head, 12-layer, 768-hidden-size Transformer with 117M parameters. We use the pre-trained GPT-2 Byte Pair Encoding (BPE) tokenizer with an extended vocabulary of 50,310 tokens. The optimizer is AdamW (Loshchilov and Hutter, 2019), the number of warm-up steps is 100, the learning rate is set to × −, and the dropout rate is set to 0.1. All experiments are run on an NVIDIA GeForce GTX 1080 Ti GPU.

The movie information in the input data, such as actor/actress names, movie genre, and movie plot, is delexicalized as special tokens, and the real information (genre, movie title, etc.) is appended to the utterance. The strategy label is also treated as a special token.

We leverage the ReDial (Li et al., 2018) and movie preference elicitation (Radlinski et al., 2019) datasets to conduct task-related pretraining; one pretraining epoch takes around 1.37 hours. Training on the INSPIRED dataset takes around 16 minutes per epoch. We train each model until it converges: the baseline model usually converges after the second epoch, while the strategy-incorporated model converges after the third.

During inference, we combine top-k sampling with top-p sampling (the Nucleus Sampling method; Holtzman et al., 2019): we keep the highest-probability tokens whose cumulative probability mass exceeds the threshold p. We manually tune the temperature, p, and k so that both models achieve their best performance. The temperature is set to . for the baseline and 0.8 for the strategy-incorporated model; for both models, the threshold of k is set to  and the upper bound of p is set to . .

For the strategy-incorporated model, the strategy is generated first and the utterance is then generated conditioned on the strategy. Although this is a loose constraint, the model learns categorical strategic patterns.
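The combined top-k / top-p filtering can be sketched as below. This is a simplified pure-Python version under our own naming, operating on a single step's logits; the default values of k and p are illustrative, not the tuned values from our experiments.

```python
import math

def top_k_top_p_filter(logits, k=40, p=0.9):
    """Combined top-k and nucleus (top-p) filtering: walk tokens in
    descending probability, keeping at most k of them and stopping
    early once their cumulative mass exceeds p, then renormalize."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]      # stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cum = set(), 0.0
    for rank, idx in enumerate(order):
        if rank >= k or cum >= p:
            break
        keep.add(idx)
        cum += probs[idx]
    mass = sum(probs[i] for i in keep)
    return [probs[i] / mass if i in keep else 0.0 for i in range(len(probs))]
```

The returned distribution assigns zero probability to filtered tokens, and the next token would be sampled from it (with temperature applied to the logits beforehand).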
For completeness, we also provide validation perplexity and BLEU-4 scores in Table 9.

Model     Test PPL  Test BLEU-4  Valid PPL  Valid BLEU-4
Baseline  9.28      5.11         9.21       5.09
Strategy  8.93      6.63         8.90       7.55

Table 9: Results for automatic metrics on both validation and test data.
E Additional User Study
In addition to the comparison study done by human users mentioned in §6.3, we conduct another user study which asks each participant to rate the same five aspects from 1 (worst) to 5 (best): fluency, consistency, naturalness, persuasiveness, and engagingness. For each model, 25 participants chat interactively with it (50 users in total). Unlike the user study in §6.3, where one user interacts with both models, here each user interacts with only one model, since users do not need to compare. These participants are different from the ones reported in the comparison user study (§6.3). From Table 10, we can see that the strategy model has higher ratings than the baseline model in all aspects.
F Example Human-Human Dialogs in INSPIRED
We include two annotated examples of human-human dialogs in Tables 11 and 12.
G Example Human-System Dialogs
We include an example dialog between a human seeker and the baseline model in Table 13 and an example dialog with the strategy-incorporated model in Table 14, both from the user study. In the user study evaluating the dialog systems, we do not set a minimum number of turns for the human user.
H Dialog Collection and Annotation Interfaces
Figures 6, 7, 9, and 11 show the dialog collection interface. Figures 12 and 13 show the dialog annotation interfaces for the crowd-workers.

Model           Fluency  Consistency  Naturalness  Persuasiveness  Engagingness
Baseline        3.84     3.80         3.20         3.24            3.56
Strategy
Human Speaker*  -        -            4.17         3.97            4.20

Table 10: Average score for human ratings on a 5-point Likert scale. Note that the human-human dialogues were collected before the user study and we did not measure fluency and consistency for the human recommender.
Conversation 1 (strategy labels in brackets)

REC: Hi There! [No Strategy]
REC: What types of movies do you like to watch? [Offering Help]
SEEK: Hello! I'm more of an action movie or a good romance and mystery movie.
REC: I just saw the trailer for Knives Out when I went to see Joker and it looked like a good mix of action and mystery! [Self-Modeling]
SEEK: I seen that one too as I seen Joker about a month ago. I thought about asking my fiance about going and seeing it.
REC: It looks like a good movie for people who like many different movies. [Personal Opinion]
REC: It also has a great cast! [Personal Opinion]
REC: I was surprised to see Chris Evans in the trailer! [Personal Opinion]
SEEK: Maybe with Chris Evans in it it'll be easier to convince my fiance to see it. Do you know who else is in the cast?
REC: Daniel Craig and Jamie Lee Curtis are also in the cast. [Credibility]
REC: Daniel Craig does a lot of 007 so definitely a good hearthrob role to convince the misses lol! [Encouragement]
SEEK: I am the misses lol. But he loves the bond movies so that should be a good incentive for him to go see it. Do you have any other recommendations?
REC: The new Star Wars comes out in less than a month, if you are into the franchise.
SEEK: He is, I think he told me we're getting it when it comes out to add to our movie collection.
REC: Well that is another great action movie. [Encouragement]
REC: I also recommend the John Wick series [Encouragement]
SEEK: I haven't seen any of that series. Could you tell me what the general plot is >
REC: John Wick is a former member of a gang, he was basically an assassin. [Credibility]
REC: He falls in love and quits the game, but then his wife dies, and someone comes in and kills his dog. [Credibility]
REC: He then goes on a revenge rampage against the people who broke into his house. [Credibility]
REC: I have yet to watch the 3rd one but the action scenes were really cool! [Personal Opinion]
SEEK: Oh I'd definitely would cry at the dogs death.
REC: It is really sad! [Similarity]
REC: the dog was a last gift from his dying wife which makes it so much worse [Personal Opinion]
SEEK: I couldn't even finish I am legend because of the dog dying. Anything with animal death makes me ball like a baby.
REC: Marley & Me had me crying for a good half hour so I completely understand that! [Similarity]
SEEK: I avoided that movie because someone told me he passed away. My fiance took me to see jurrasic world as our first date and I cried at the dinosuars dying.
REC: I would definitely avoid that movie if animal deaths make you said. [Similarity]
REC: Oh that is so cute though! [No Strategy]
SEEK: Yeah, he had to calm me down for about an hour and bought me ice cream to apologize for it.
REC: Aww that is so sweet. [No Strategy]
REC: Given that you dont want to see animals die, and you are looking for an Action / Mystery, I think Knives Out would be a good movie choice. [Pref. Confirmation]
REC: Do you agree? [Pref. Confirmation]
SEEK: I do agree with that. When it comes out i'll bring it up for date night. Thank you!!

Table 11: Example 1 for labeled human-human dialog. REC refers to Recommender and SEEK to Seeker.

Conversation 2 (strategy labels in brackets)
REC: Hi there. [No Strategy]
REC: How are you? [No Strategy]
SEEK: I'm doing well. I was wondering if you had any movie recommendations.
REC: Let me see. [Offering Help]
REC: Well, what kind of movies do you usually find yourself watching? [Opinion Inquiry]
REC: What do you like? [Opinion Inquiry]
SEEK: I tend to like action movies that also involve comedy.
REC: Oh, me too. [Similarity]
REC: So, what did you see last that you liked? [Experience Inquiry]
SEEK: I really liked Avengers: Endgame because I liked the pacing and the effects were great.
REC: Okay, I think I kind of understand your taste a little. [Pref. Confirmation]
REC: It's sort of similar to mine, although I haven't seen an action hero movie in a while. [Similarity]
REC: Did you like Guardians of the Galaxy? [Opinion Inquiry]
SEEK: No, I haven't, but I heard good things about it.
REC: Oh, I was assuming that you'd seen it. [Offering Help]
REC: I saw that with my son and I was surprised at how much I liked it. [Personal Experience]
SEEK: Were you able to see the sequel for it?
REC: No, but I would like to. [Self-Modeling]
REC: I would really recommend that. [Encouragement]
REC: I don't know if you know who Chris Pratt is, but he's a keeper. [Personal Opinion]
REC: He was on Parks & Rec, the comedy show. [Credibility]
SEEK: I love Chris Pratt! I accept your recommendation. Thank you so much.
REC: :) oh, I'm so happy. [Acknowledgment]
REC: Cool. [Acknowledgment]
REC: I hope you like it. [Encouragement]
SEEK: Do you have any other recommendations for action movies?
REC: Hmmm...I don't know if I can think of any others that are funny like that. [No Strategy]
REC: I know there are some though. [No Strategy]
SEEK: Oh, I just remembered. I really liked Deadpool because it did a great job of mashing comedy and action together.
REC: My 20 year old son liked that. [Personal Experience]
REC: I have only seen a couple parts. [Personal Experience]
REC: You've no doubt seen the MIB series, but those were kind of good as well. [Personal Opinion]
SEEK: Can you tell me what you like about MIB? I like Will Smith, but I haven't see any of the movies.
REC: Oh, yeah. [Acknowledgment]
REC: They're telling us to quit the hit, but real quick. [No Strategy]
REC: He's funny and creative in it and he's great when paired with Tommy Lee Jones. [Personal Opinion]
REC: It's just creative and makes you laugh unexpectedly. [Personal Opinion]
REC: Anyway, that's all I got. [No Strategy]
REC: It was nice talking to you. [No Strategy]
SEEK: It was great talking to you. I'll be sure to check out your recommendations.

Table 12: Example 2 for labeled human-human dialog. REC refers to Recommender and SEEK to Seeker.

Human Conversation with Baseline Dialog Model
REC SYS: Hello there!
SEEK: hello, how are you
REC SYS: I'm doing good!
SEEK: that's nice to hear
REC SYS: I'm here to help you look for a movie trailer.
SEEK: really, do you have a recommendation?
REC SYS: Do you know what kind of movie trailers you'd like to see?
SEEK: I like super hero movies
REC SYS: Do you know what kind of movies you'd like to see?
SEEK: i like action movies
REC SYS: I do too.
SEEK: that's pretty awesome, I am glad we have something in common
REC SYS: Have you seen the movie Avengers: Endgame?
SEEK: yes
REC SYS: What about Avengers: Endgame?
SEEK: yes I have seen it
REC SYS: Have you seen the movie Assassin's Creed?
SEEK: no i haven't
REC SYS: Did you enjoy it?
SEEK: i don't know because I haven't seen it
REC SYS: If you liked it, you'll love the movie Assassin's Creed.
SEEK: hm, that's great.
REC SYS: Great!

Table 13: Example dialog of human-system. REC SYS refers to the baseline recommendation dialog system and SEEK to the human Seeker.
Human Conversation with Strategy-Incorporated Dialog Model