INSPIRED: Toward Sociable Recommendation Dialog Systems
Shirley Anugrah Hayati, Dongyeop Kang, Qingxiaoyang Zhu, Weiyan Shi, Zhou Yu
Shirley Anugrah Hayati♡, Dongyeop Kang☆, Qingxiaoyang Zhu♡, Weiyan Shi♡, Zhou Yu♡
♡ Department of Computer Science, University of California, Davis
☆ University of California, Berkeley
{sahayati, qinzhu, wyshi, joyu}@ucdavis.edu, [email protected]

Abstract
In recommendation dialogs, humans commonly disclose their preferences and make recommendations in a friendly manner. However, this is a challenge in developing a sociable recommendation dialog system, due to the lack of dialog datasets annotated with such sociable strategies. Therefore, we present INSPIRED, a new dataset of 1,001 human-human dialogs for movie recommendation with measures for successful recommendations. To better understand how humans make recommendations in communication, we design an annotation scheme for recommendation strategies based on social science theories and annotate these dialogs. Our analysis shows that sociable recommendation strategies, such as sharing personal opinions or communicating with encouragement, more frequently lead to successful recommendations. Based on our dataset, we train end-to-end recommendation dialog systems with and without our strategy labels. In both automatic and human evaluation, our model with strategy incorporation outperforms the baseline model. This work is a first step toward building sociable recommendation dialog systems grounded in social science theories.

Sociable conversational agents build rapport with users in order to gain their trust and favor. Social science researchers believe that such rapport enables more persuasive recommendations, successfully suggesting items that satisfy user needs (Yoo et al., 2012; Gkika and Lekakos; Pecune et al., 2019; Gretzel and Fesenmaier, 2006). However, existing work on recommendation dialog systems lacks a study of the communication strategies used by human speakers for making successful and persuasive recommendations.

Dataset and code are available at https://github.com/sweetpeach/Inspired
REC: Hi! Happy Thanksgiving! I'm here to help you find a trailer!
SEEK: Happy Thanksgiving! My favorite movie is finding Nemo I really like it
REC: Awesome! So do you like Disney movies in general?
SEEK: Yup they are so colorful and full of life!
REC: Yeah, I love Disney too! I have Disney + and watch it everyday haha. Have you seen the new Lady and the Tramp? I find it relatable to my dog!
SEEK: Lol that’s good enough! Never heard of that one! what is it about?
REC: It's about a dog named Lady who runs away with a stray named Tramp out of jealousy... What do you think?
SEEK: Woo sounds good! I definitely want to see this. Thank you!
REC: No problem! Hope you enjoy it as I did!
Strategy labels annotated in this dialog: OFFERING HELP, ENCOURAGEMENT, PERSONAL EXPERIENCE, PREFERENCE CONFIRMATION, PERSONAL OPINION, SIMILARITY, CREDIBILITY, EXPERIENCE INQUIRY, OPINION INQUIRY
Figure 1: An example snippet of a human-human recommendation dialog in INSPIRED. REC refers to the person who recommends a movie and SEEK refers to the person who looks for a recommendation. Above each recommender's utterance is the recommendation strategy annotated by human workers. Best seen in colors.

Prior works collect datasets in scenario-based settings or convert product review datasets into question-answering conversations (Reschke et al., 2013; Yan et al., 2017; Sun and Zhang, 2018; Kang et al., 2019; Li et al., 2018). Common issues with these types of datasets are: (1) homologous utterances, (2) mostly question-answering pairs, and (3) lack of user engagement.

In this work, we aim to validate whether sociable recommendation strategies are effective for making a successful recommendation in a dialog. To do so, we propose INSPIRED, a recommendation dialog dataset of two paired crowd-workers in a natural setting, with additional annotations for sociable recommendation strategies. The dataset consists of 1,001 dialogs, and each utterance is manually annotated with sociable strategies based on social science theory. To encourage a more natural dialog flow, we set no restrictions on the number or type of movies to recommend. Figure 1 shows an example of an annotated dialog; more examples are in Tables 11 and 12 in the Appendix.

Dataset             | INSPIRED | CONVREC (Sun and Zhang, 2018) | GORECDIAL (Kang et al., 2019) | REDIAL (Li et al., 2018)
Naturalness         | ✓        | ✗                             | ✗                             | ✓
Sociable Strategies | ✓        | ✗                             | ✗                             | ✗
Movie Information   | ✓        | ✗                             | ✓                             | ✗
Conversation Types  | Mixed    | QA                            | Mixed                         | Mixed

Table 1: Comparison of related recommendation dialog datasets. "QA" refers to question-answer pairs. "Mixed" indicates that the conversations contain both statements and question-answer pairs. CONVREC collected 385 human-curated dialogs, but only released 875,721 simulated dialogs.

Our analyses show that sociable recommendation strategies are correlated with successful recommendations in dialogs. These insights motivate us to build a more sociable recommendation dialog system to achieve better persuasion outcomes. For extrinsic evaluation, we build two end-to-end dialog systems trained on the INSPIRED dataset: one is encoded with recommendation strategies and the other is not. We find that the model encoded with our strategy annotations performs better in both automatic and human evaluation.

We believe that enriching the intersection between social science and computational linguistics in INSPIRED opens plenty of room for future studies on sociable recommendation dialog.
Social science theories on recommendation.
Psychological researchers believe that interactions with recommendation systems should not only be seen from a technical perspective but should also be examined from a social and emotional perspective (Zanker et al., 2006). Yoo et al. (2012) propose that credibility, likeability, friendliness, humor, and other language styles are significant factors for persuasive recommendations. Pecune et al. (2019) studied modeling social explanation for movie recommendation, such as personal opinion and personal experience. Häubl and Murray (2003) find that more information on recommendations may help consumers make better purchase decisions, but leaves them overwhelmed by the abundance of information. Inspired by these theories, we borrow such principles in the design of our sociable recommendation strategies.
Conversational recommendation systems.
While studies on conversational recommendation systems exist, none of them focus on sociable recommendation strategies for persuasive outcomes. This is due to the lack of existing datasets for studying effective strategies in recommendation dialog. Table 1 compares different factors across the recommendation dialog datasets, including INSPIRED.

Prior works on recommendation dialogs collect data based on template-based question-answering pairs from user reviews (Thompson et al., 2004; Reschke et al., 2013; Sun and Zhang, 2018; Zhang et al., 2018b). These datasets contain structured utterances where the recommender continuously asks for the seeker's product preference.

Kang et al. (2019) collected goal-driven recommendation dialogs (GORECDIAL) in a gamified setting where both speakers are given a small set of movies with descriptions to find the best recommendation. This role-play game setting may not effectively reflect real-world situations, since the seeker pretends to like the given movies.

The most similar work to ours is Li et al. (2018)'s REDIAL dataset, which consists of chit-chat for movie recommendation. However, the recommendations are conditioned on the movies mentioned in the dialog, not directly on the language usage. Also, the speakers tend to mention only movie names rather than discussing movie preferences in depth.

Figure 2: Movie search interface for recommenders.

Our work is also closely related to Radlinski et al. (2019) on movie preference elicitation and Galetzka et al. (2020) on movie discussion in the dialog setting. Preference elicitation is an important step for the human recommender to comprehend the seeker's taste before recommending, but these datasets are not recommendation conversations. Meanwhile, dialogs in INSPIRED contain both stages: preference elicitation and recommendation. INSPIRED also captures sociable recommendation strategies in conversations and measures recommendation success with ratings.
Sociability in dialog systems.
In human-human conversations, people engage in talk that does not only contain task-oriented topics (Bickmore and Cassell, 2005). Thus, sociability has drawn more attention in dialog systems as they become more sociable, engaging, and user-adaptive (Zhang et al., 2018a; Shi and Yu, 2018; Göker and Thompson, 2000). Zhang et al. (2018a) proposed a chit-chat dataset and presented the task of more personalized dialog systems conditioned on user profile information. Sociability leads to a more persuasive conversation (Yoo et al., 2012), so social skills are essential for dialog systems to make successful recommendations.

Communication strategies for specific tasks, such as donation and product price negotiation, have been found useful for task completion (Wang et al., 2019; Zhou et al., 2019). In this work, we connect different sociable strategies with recommendation in dialog settings and show that sociable strategies have a positive impact on recommendation success.
To ensure that each recommended movie has a trailer and metadata information, we curate a database with all movie trailers from the Movieclips Trailers channel released between 2008 and 2020, plus movies from the MovieLens dataset (Harper and Konstan, 2015). In total, we have 17,869 movies with trailers and metadata information. We design a simple movie search interface (Figure 2) to assist recommenders in searching for a movie.

We recruit crowd-workers from Amazon Mechanical Turk. In each conversation, two workers are randomly paired and assigned different roles: one as a recommender and the other as a seeker. Our collection set-up is more realistic than prior work because (1) recommenders face no limit on the number of movies they may recommend, (2) seekers accept or reject a movie following their true preferences, and (3) we record whether seekers actually watch the video trailer.
Recommender.
Recommenders' task is to recommend a movie successfully to the seeker. Before chatting, we show them tips for sociable recommendation strategies with example utterances. Then they chat with the seekers in two phases: user information gathering and movie recommendation. In the user information gathering phase, recommenders are asked to understand the seekers' movie tastes. In the recommendation phase, the recommenders can still ask for the seekers' preferences while browsing movies to recommend. We encourage the recommenders to continue the conversation until the seekers accept a movie.
Seeker.
Seekers are asked to talk about movie recommendations without any strategy support. After they complete the conversation, seekers can opt to accept or reject the provided movie recommendations. If the seekers accept the recommendation, they can watch the entire recommended movie trailer or part of it, or simply skip it after the conversation. We record how long seekers watch the recommended movie trailer and ask them to rate the trailer on a 5-point Likert scale in the post-task survey. (Trailers are collected from youtube.com/user/movieclipsTRAILERS.)

Table 2: INSPIRED's statistics (overall dataset, recommender, and seeker statistics).
We use the ParlAI platform (Miller et al., 2017) and hire 1,594 US crowd-workers from Amazon Mechanical Turk with a minimum 90% task acceptance rate. The dialog collection process lasted from November 2019 to March 2020.

Workers first fill out questionnaires related to their personality traits and values before their conversations. The questionnaire covers three personality trait models: the Big Five personality traits (15 questions) (Goldberg, 1993), the Schwartz Portrait Value (10 questions) (Schwartz, 2003), and the Decision Making Style (2 questions) (Hamilton et al., 2016). We also release this personality information in our dataset for future work. Then, the recommenders start the conversation, and both workers chat for a minimum of 10 turns or until a recommendation is made. After the conversation ends, both workers answer a post-task survey with demographic questions such as age and gender. Seekers are asked to rate the trailer on a 5-point Likert scale and to provide the reason if they reject the recommendation or do not finish watching the video. Both workers receive a bonus of up to $2 if they complete the entire process, in addition to the base pay of $0.50. Dialog collection interfaces are shown in Appendix H.

Table 2 presents statistics of the collected dataset. Even though our dataset has a relatively small number of samples compared to REDIAL or GORECDIAL, it has human annotations for each sociable strategy.
Cases
Accept (Rating 4-5)        532 (53.1%)
Accept (Rating 3 or lower)  45 (4.5%)
Accept (Other Reasons)     289 (28.9%)
Accept Uninterested        123 (12.3%)
Reject                      12 (1.2%)
Table 3: Statistics of dialogs where the seekers accept or reject the final recommended movie. "Accept (Rating 4-5)" means that the seekers accept the recommendation and rate it 4 or 5, and similarly for "Accept (Rating 3 or lower)". "Accept (Other Reasons)" indicates that the seekers give other reasons for not finishing the video. "Accept Uninterested" indicates that the seekers accept the recommendation, do not finish watching the video, and explain in the post-task survey that they are not interested in the recommended video.

Moreover, our dataset can be used in combination with other datasets in a semi-supervised setting, as shown in our implementation of recommendation dialog systems in §6.

The statistics of accept and reject cases are shown in Table 3. We have a higher number of successful cases (79.7%) than failure cases. This shows that people tend to accept recommendations, which is not surprising since watching a video trailer is an entertaining, low-risk activity. For training the dialog model, we use every dialog from all cases so that the dialog system will be able to respond to diverse responses.
After the conversations are collected, two experts with linguistics training develop an annotation scheme using the content analysis method (Krippendorff, 2004) and past studies on human behavior in making recommendations. Similar approaches have been taken in prior studies on persuasion (Wang et al., 2019) and negotiation (Zhou et al., 2019) tasks. We divide the recommendation strategies into two categories: sociable strategies and preference elicitation strategies. The sociable strategies are also derived from our literature study of social science theories.
Sociable strategies comprise eight strategies related to the recommendation task. These strategies capture the recommenders' attempts to build rapport with the seekers.
Category                | Example
Personal Opinion        | "I really like Disney's more recent princesses"
Personal Experience     | "I have Disney+ and watched it everyday!"
Similarity              | "Oh, I love Disney as well."
Encouragement           | "You should definitely watch it!"
Offering Help           | "I'm here to help you find a trailer!"
Preference Confirmation | "So do you like Disney movies in general?"
Credibility             | "It's about a dog named Lady who runs away with a stray named Tramp"
Self-Modeling           | "We are planning to go see Maleficent, we heard it was a very good movie."
Experience Inquiry      | "Have you seen the new Lady and the Tramp?"
Opinion Inquiry         | "What do you like about the Avengers: End-game?"
Recommendation          | "You should check out Shazam!"

Table 4: Example utterances for each strategy.

• Personal opinion refers to a condition where recommenders express their subjective opinion about a movie, including its plot, actors, or other movie attributes.
• Personal experience refers to sharing personal experience related to a movie. For example, recommenders may say that they have watched the movie several times to convince the seekers that the movie is good. Both personal opinion and personal experience are part of self-disclosure, which leads to establishing rapport with the seekers (Altman, 1973).

• Similarity refers to a condition where the recommenders empathize and act like-minded toward the seekers about their movie preference to produce similarity between them. Similarity is believed to influence the seekers' liking of the source, leading them to trust the recommenders' judgment more (O'Keefe, 2004), following Lazarsfeld and Merton (1964)'s homophily theory, which states that humans like other people who are similar to them.

• Encouragement is the use of praise of the seekers' movie taste and encouragement to watch a recommended movie, to build rapport and promote the recommended movie.

• Offering help is a strategy where the recommenders disclose an explicit intention to help the seeker or be transparent. It is part of the "transparency" strategy from Gretzel and Fesenmaier (2006).

• Preference confirmation is a strategy where the recommenders ask about or rephrase the seeker's preference. This strategy is also part of the "transparency" strategy, in which the recommenders disclose their thinking process of understanding the seekers' preference.

• Self-modeling is a strategy where the recommender becomes a role model by doing something first so that the seeker will follow (Dowrick, 1999).

• Credibility happens when the recommender shows expertise and trustworthiness in providing information to persuade the seeker (Fogg, 2002; O'Keefe, 2004; Rhoads and Cialdini, 2002). In our study, a recommender makes a credibility appeal when they provide factual information about movie attributes, such as the plot, actors, or awards that the movie has received.

Preference elicitation inquiries include the following inquiries, asked by the recommenders to learn the seekers' movie tastes.

• Experience inquiry asks about the seeker's movie-watching experience, such as whether the seeker has watched a certain movie or not.

• Opinion inquiry asks for the seeker's opinion on movie-related attributes. Example answers to this inquiry are the seeker's explanation of what they like about the plot or whether they admire the actors' acting skills.

Other kinds of utterances, such as greetings or thanks, fall into the non-strategy category. We also label sentences that are recommendations.
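For downstream processing, the taxonomy above can be captured as a plain label-to-category mapping. This is a sketch of our own representation; the paper does not prescribe a data structure:

```python
# Sketch: the annotation scheme as a label-to-category mapping
# (eight sociable strategies, two preference elicitation inquiries).
STRATEGY_CATEGORY = {
    "personal_opinion": "sociable",
    "personal_experience": "sociable",
    "similarity": "sociable",
    "encouragement": "sociable",
    "offering_help": "sociable",
    "preference_confirmation": "sociable",
    "self_modeling": "sociable",
    "credibility": "sociable",
    "experience_inquiry": "preference_elicitation",
    "opinion_inquiry": "preference_elicitation",
}

sociable = [s for s, c in STRATEGY_CATEGORY.items() if c == "sociable"]
print(len(sociable))  # → 8
```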
Category                         | Count (%)
Sociable Strategies
Credibility                      | 2,687 (13.7%)
Personal Opinion                 | 2,599 (13.9%)
Encouragement                    | 1,975 (10.6%)
Similarity                       |   957 (5.1%)
Offering Help                    |   953 (5.1%)
Preference Confirmation          |   950 (5.1%)
Personal Experience              |   564 (3%)
Self-Modeling                    |   449 (2.4%)
Preference Elicitation Inquiries
Experience Inquiry               | 1,505 (8.1%)
Opinion Inquiry                  | 2,120 (11.3%)
Non-strategy
No Strategy                      | 2,566 (13.7%)
Acknowledgment                   | 1,354 (7.2%)
Recommendation                   | 2,177 (6.1%)

Table 5: Statistics of the number of utterances annotated with strategies in INSPIRED.

A recommendation is defined as the recommender suggesting a new movie title to the seeker for the first time. 30% of the recommendation sentences are "experience inquiries", 27% are "encouragement", and 14% are "personal opinion". Example annotated utterances are displayed in Table 4, while Table 5 shows the number of annotated utterances in INSPIRED.

To ensure annotation quality, we split our annotation study into two steps. First, we hire two experts with linguistics training to perform annotation, in order to test the validity of the scheme. The two experts annotated 30 randomly selected conversations and reached a Kappa agreement of 0.77, suggesting that our scheme is replicable.

Our dataset contains more than 18k utterances, so it is too costly to have experts annotate all of them. In the second step, we hire US-based crowd-workers (95% task acceptance) from Amazon Mechanical Turk for the annotation tasks. In each task, a worker is given a tutorial on the annotation and then 10 dialogs to annotate. One of these dialogs, called the evaluation dialog, is labeled by the experts to calibrate the quality of the worker's annotation. Five workers work on the same task. We filter out workers whose score on the evaluation dialog is below a threshold of 0.60.
Figure 3: Distribution of sociable strategies over the dialog turns. Best viewed in color.

To set this threshold at a reasonable value, we conducted the following study: we ran one task in which all the dialogs were already labeled by the experts, including the evaluation dialogs. We found that if a worker's score on the evaluation dialog is above 0.60, their agreement with the experts' annotations on the rest of the dialogs in the task is 0.77.

These selected high-quality crowd-workers annotate the rest of the dialogs, with five workers annotating each dialog. If more than one worker disagrees on an utterance's annotation, the experts are involved to annotate it as quality control. The inter-annotator majority agreement among all workers is 0.78 over all dialogs. The annotation scheme for the crowd-workers is provided in Figure 12 in the Appendix.
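The agreement scores above are kappa statistics between pairs of annotators. As a minimal sketch of Cohen's kappa for two annotators' strategy labels (the example utterance labels and the pure-Python implementation are illustrative, not the authors' code):

```python
# Sketch: Cohen's kappa between two annotators' strategy labels,
# as used to check annotation quality. Example labels are illustrative.
from collections import Counter

def cohen_kappa(a, b):
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n       # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[l] * cb[l] for l in ca) / (n * n)    # chance agreement
    return (p_o - p_e) / (1 - p_e)

expert = ["credibility", "encouragement", "similarity", "no_strategy",
          "personal_opinion", "credibility", "opinion_inquiry", "encouragement"]
worker = ["credibility", "encouragement", "similarity", "no_strategy",
          "personal_opinion", "credibility", "experience_inquiry", "encouragement"]

print(round(cohen_kappa(expert, worker), 2))  # → 0.85
```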
As shown in Figure 3, we observe that different sociable strategies are unequally distributed across conversation turns. Most notably, "offering help" and "similarity" often occur at the beginning, indicating that recommenders strategically attempt to build rapport with seekers at the early stages. "Credibility" and "personal opinion" then appear frequently throughout the conversations, as recommenders seek to persuade. Moreover, "encouragement" mostly appears in the middle and at the end of conversations.

4.2 What Strategies Contribute to Successful Recommendations?
We study the association between sociable strategies and successful recommendations. A recommendation is considered successful if the seekers finish watching a substantial portion of the recommended movie trailer and rate the trailer with a high score (4 or 5 stars). We set a threshold that seekers need to watch more than 50% of the video duration, since some videos have advertisements at the end. On the other hand, a recommendation is considered unsuccessful if the seekers reject the recommendation ("Reject") or skip watching the trailer ("Accept Uninterested"). Thus, our analysis of the association between strategies and successful recommendations uses 532 successful dialogs and 135 unsuccessful dialogs.

To analyze the effect of our sociable recommendation strategies on recommendation success, we run a logistic regression model to predict the success of the recommendation (1 = successful, 0 = unsuccessful). We use the frequency of each strategy in a dialog as the feature value.

Table 6 shows the coefficients of each strategy with respect to the recommendation. We observe that the "personal opinion", "similarity", "encouragement", and "credibility" strategies have a significant positive effect on successful recommendations. This confirms previous studies showing that more sociable recommenders are more likely to succeed in recommendation.

The "similarity" strategy has the highest coefficient, which suggests that if the recommender conforms to the seeker's preference, the seeker is more likely to favor the recommendation. This also supports the theory in O'Keefe (2004) that likeability helps in recommendation. We also observe that the preference elicitation inquiries do not contribute significantly to successful recommendations. This does not mean that recommenders need not query seekers' preferences, since doing so is crucial for understanding their tastes. Rather, a more sociable approach is necessary for a more successful recommendation.
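The regression set-up described above can be sketched as follows, assuming one binary outcome per dialog and per-strategy frequency features. The toy data and the plain gradient-descent fit are our own illustration, not the authors' implementation:

```python
# Sketch: logistic regression of recommendation success on per-dialog
# strategy frequencies, mirroring the paper's analysis. The toy dialogs
# and the hand-rolled SGD fit are illustrative only.
import math, random

STRATEGIES = ["personal_opinion", "similarity", "encouragement", "credibility"]

def featurize(dialog):
    # Feature value = how often each strategy occurs in the dialog.
    return [dialog.count(s) for s in STRATEGIES]

def fit_logreg(X, y, lr=0.1, epochs=2000):
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - yi                       # gradient of the log-loss
            b -= lr * g
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
    return w, b

# Toy data: dialogs that use sociable strategies more often succeed (label 1).
random.seed(0)
X, y = [], []
for _ in range(200):
    n = random.randint(0, 4)
    dialog = random.choices(STRATEGIES, k=n)
    X.append(featurize(dialog))
    y.append(1 if n >= 2 else 0)

w, b = fit_logreg(X, y)
print({s: round(wj, 2) for s, wj in zip(STRATEGIES, w)})
```

On this synthetic data every strategy count is positively predictive, so all fitted coefficients come out positive, analogous to the positive coefficients reported in Table 6.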
In a recommendation task, a natural question to ask is how big a role the recommended product plays in the acceptance of a recommendation. If the quality of the product matters more than how you recommend it, it makes more sense to improve the products rather than the recommendation skills. Therefore, we also analyze whether movie attributes, such as the genre, a recent release date, and the number of likes of the movie trailer, have an impact on successful recommendation, along with the eight sociable strategies and two preference elicitation inquiries.

Category                | Coefficient
Sociable Strategies
Personal Opinion        | 0.12*
Personal Experience     | 0.05
Similarity              | *
Encouragement           | 0.20**
Offering Help           | 0.03
Preference Confirmation | 0.05
Self-Modeling           | 0.02
Credibility             | 0.09*
Preference Elicitation
Experience Inquiry      | −

Table 6: Associations between different strategies and successful recommendation. * p < 0.05, ** p < 0.01.

For popularity, we categorize the top 10% of movies in our database in terms of the number of likes as popular and the rest as non-popular. A movie is considered recent if it was released in 2019 or 2020. For genre, we select the top five most popular genres in the movie database; when we check against the recommended movies in INSPIRED, 96% of recommended movies are covered by these top five genres.

Results of the analysis of the strategies and movie attributes are shown in Table 8 in the Appendix. Sociable strategies remain significantly correlated with successful recommendations. Recommenders who use the "similarity" strategy, express "personal opinion", and show "encouragement" are more likely to successfully recommend a movie (p < 0.05). Surprisingly, none of the movie attributes has a significant effect on successful recommendations. A possible reason is that the seekers' movie tastes are so diverse that movie attributes such as genre do not have a significant impact on recommendation success.

Figure 4: Alternating memory of the recommender LM and the seeker LM.
To validate whether the sociable strategies in INSPIRED are useful in creating a more engaging and persuasive recommendation dialog, we develop a generative dialog model as our baseline to compare against our strategy-incorporated dialog system. We split the dialogs into 801/100/100 for the train/validation/test split. We use an external recommendation system from TMDB with heuristics to select the movies; more details on the heuristics and training set-up are in the Appendix.

The baseline dialog model uses two separate Transformer-based pretrained language models (Vaswani et al., 2017; Radford et al., 2019; Wu et al., 2019) to learn the recommender's and the seeker's language models separately, in alternating order. Both language models are trained to maximize the likelihood of generating the ground-truth utterance given the alternating memory, as shown in Figure 4. The model is pretrained on a non-task-related corpus, WebText, and on task-related corpora: the recommendation dataset REDIAL (Li et al., 2018) and the movie preference elicitation dataset (Radlinski et al., 2019). Then, we fine-tune the model on INSPIRED.

We replace movie attributes such as titles, actors, and genres with indexed placeholders, because multiple attributes may be mentioned several times in a single conversation. The replacement with placeholders improves factual correctness, as we replace them back with the original movie attributes later. At the end of the sentence, we append the attribute information as below:

Original: "If you like La La Land, you should also see Amazing Spiderman with Emma Stone"

With placeholder: "If you like [MOVIE TITLE 0], you should also see [MOVIE TITLE 1] with [MOVIE P ACTOR 0]; movies: La La Land (2016), The Amazing Spider-Man (2012); people: Emma Stone"
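The placeholder substitution can be sketched as a simple dictionary-based delexicalizer. The function name and entity lists here are hypothetical; the paper does not specify its exact implementation:

```python
# Sketch: replacing movie attributes with indexed placeholders and
# appending the originals at the end, as in the paper's preprocessing.
# The entity lists and the function name are our own illustration.
def delexicalize(utterance, movies, people):
    out = utterance
    used_movies, used_people = [], []
    for surface, canonical in movies:     # (surface form, canonical name)
        if surface in out:
            out = out.replace(surface, f"[MOVIE TITLE {len(used_movies)}]")
            used_movies.append(canonical)
    for name in people:
        if name in out:
            out = out.replace(name, f"[MOVIE P ACTOR {len(used_people)}]")
            used_people.append(name)
    if used_movies:
        out += "; movies: " + ", ".join(used_movies)
    if used_people:
        out += "; people: " + ", ".join(used_people)
    return out

src = "If you like La La Land, you should also see Amazing Spiderman with Emma Stone"
movies = [("La La Land", "La La Land (2016)"),
          ("Amazing Spiderman", "The Amazing Spider-Man (2012)")]
print(delexicalize(src, movies, ["Emma Stone"]))
```

Relexicalization would then reverse the mapping, substituting the canonical attributes back into the generated text.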
We prepend the strategy as a special token to the input utterance so that the model generates not only sentences but also strategies. A similar method was used to control text generation style (Rashkin et al., 2019); it is a simple and effective way to incorporate the strategies. (We also tried multi-task learning for utterance generation and strategy prediction; its automatic metric results are comparable to the prepend method's.) The input to the encoder is as follows:

Prepend: "encouragement If you like [MOVIE TITLE 0], you should also see [MOVIE TITLE 1] with [MOVIE P ACTOR 0]; title: La La Land (2016), The Amazing Spider-Man (2012); people: Emma Stone"

The model first generates five candidate sentences. Then, it randomly selects a generated candidate that either contains the "encouragement" strategy or has the greatest sentence length. In our experiments, we tried various combinations of the top three strategies (e.g., "encouragement" only, or "encouragement" and "similarity"), and the "encouragement"-only model gave the best result. The sentence-length selection is based on our intuition from chatting with the system. This aligns with our findings that "encouragement" is the second most frequently used strategy when humans make recommendations (§4.1) and that "recommendation" is associated positively with successful recommendation (Table 8).

To decide whether a sentence is a recommendation or not, we train a BERT-based recommendation classifier that takes as input the recommender's current utterance and the seeker's utterances from the previous turn; it achieves 95.4% accuracy and a 91.2% F1-score. While the placeholder index may serve as a proxy for deciding whether the system needs to recommend a movie, it is not strictly supervised. Thus, if a generated sentence is labeled as "recommendation", we enforce our dialog system to recommend a new movie.

Model    | PPL ↓ | BLEU-4 ↑
Baseline | 9.28  | 5.11
Strategy | 8.93  | 6.63

Table 7: Results for automatic metrics.
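The decoding-time selection described above, preferring candidates tagged with "encouragement" and otherwise falling back to the longest candidate, can be sketched as follows; generate_candidates() is a stub standing in for sampling from the fine-tuned language model:

```python
# Sketch: decoding-time candidate selection, preferring candidates whose
# predicted strategy token is "encouragement" and otherwise falling back
# to the longest candidate. generate_candidates() is a stub for the LM.
import random

def generate_candidates(context, n=5):
    # Stub: a real system samples n sequences from the fine-tuned LM,
    # each beginning with a generated strategy token.
    return [
        "encouragement You should definitely watch it!",
        "personal_opinion I really liked the animation style.",
        "encouragement I think you will love this one, give it a try!",
        "credibility It is about a dog named Lady who runs away with a stray.",
        "no_strategy Sounds good.",
    ]

def select_response(context, preferred="encouragement"):
    cands = generate_candidates(context)
    tagged = [c for c in cands if c.split(" ", 1)[0] == preferred]
    if tagged:
        return random.choice(tagged)   # random pick among preferred strategy
    return max(cands, key=len)         # fallback: greatest sentence length

random.seed(0)
print(select_response("SEEK: I love Disney movies!"))
```

The leading strategy token would be stripped before showing the response to the user, and a separate classifier (the BERT-based one described above) decides when a new movie must actually be recommended.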
We compare the baseline dialog model without strategy supervision against our dialog model with strategy supervision, using both automatic metrics and human evaluation.

For automatic metrics, we compute perplexity and BLEU scores (Papineni et al., 2002); the results in Table 7 suggest that prepending strategies improves model performance. For human evaluation, twenty-eight participants chat with both models two to three times for a more reliable judgment. We randomize which model they chat with first, in order to avoid exposure bias. After chatting, they are asked to decide which model is better in five aspects: fluency, consistency, naturalness, persuasiveness, and engagingness. If they are unable to distinguish the dialog systems, they may choose a "can't tell" option. (We also ran an additional user study with five-point ratings on these five aspects, with results in Table 10 in the Appendix.)

Figure 5: Human evaluation result. "Flu." stands for fluency, "cons.": consistency, "nat.": naturalness, "pers.": persuasiveness, and "eng.": engagingness.

Results in Figure 5 suggest that human users prefer the model with strategies over the baseline in all aspects. Interestingly, although the strategy model is preferred on all metrics, people find that the two models differ most in engagingness, followed by naturalness. This supports our hypothesis that human users find conversations more engaging and more natural when sociable strategies are incorporated in recommendation dialog systems.

In this work, we have introduced INSPIRED, a new recommendation dialog dataset collected in a natural setting and annotated with sociable recommendation strategies. We analyze the connection between different strategies and recommendation results. Our findings show that sociable strategies have a positive impact on the acceptance of recommendations and on dialog quality. This work opens up several directions for future studies in building sociable and personalized recommendation dialog systems: First, we will explore more ways of utilizing the strategies, including dynamic strategy selection after decoding. Then, we plan to investigate strategy patterns for people with different personalities and movie preferences to make the dialog system more personalized. Finally, another interesting direction is to extend the model with jointly trainable movie recommendation and movie information modules.
Acknowledgments
We would like to thank the members of the NLP lab at UC Davis for discussion and participation in the pilot study. We are also grateful to the human evaluation participants and the Mechanical Turk workers for their contributions to building this dataset.
References
Irwin Altman. 1973. Reciprocity of interpersonal exchange. Journal for the Theory of Social Behaviour, 3(2):249–261.

Timothy Bickmore and Justine Cassell. 2005. Social dialogue with embodied conversational agents. In Advances in Natural Multimodal Dialogue Systems, pages 23–54. Springer.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186.

Peter W. Dowrick. 1999. A review of self modeling and related interventions. Applied and Preventive Psychology, 8(1):23–39.

Fabian Galetzka, Chukwuemeka U. Eneh, and David Schlangen. 2020. A corpus of controlled opinionated and knowledgeable movie discussions for training neural conversation models. In Language Resources and Evaluation Conference (LREC), volume 12.

B. J. Fogg. 2002. Persuasive technology: Using computers to change what we think and do. Ubiquity, 2002(December).

Sofia Gkika and George Lekakos. The persuasive role of explanations in recommender systems.

Mehmet H. Göker and Cynthia A. Thompson. 2000. Personalized conversational case-based recommendation. In European Workshop on Advances in Case-Based Reasoning, pages 99–111. Springer.

Lewis R. Goldberg. 1993. The structure of phenotypic personality traits. American Psychologist, 48(1):26.

Ulrike Gretzel and Daniel R. Fesenmaier. 2006. Persuasion in recommender systems. International Journal of Electronic Commerce, 11(2):81–100.

Katherine Hamilton, Shin-I Shih, and Susan Mohammed. 2016. The development and validation of the rational and intuitive decision styles scale. Journal of Personality Assessment, 98(5):523–535.

F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens datasets: History and context. ACM Transactions on Interactive Intelligent Systems, 5(4):19:1–19:19.

Gerald Häubl and Kyle B. Murray. 2003. Preference construction and persistence in digital marketplaces: The role of electronic recommendation agents. Journal of Consumer Psychology, 13(1-2):75–91.

Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. 2019. The curious case of neural text degeneration. arXiv preprint arXiv:1904.09751.

Dongyeop Kang, Anusha Balakrishnan, Pararth Shah, Paul Crook, Y-Lan Boureau, and Jason Weston. 2019. Recommendation as a communication game: Self-supervised bot-play for goal-oriented dialogue. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 1951–1961, Hong Kong, China. Association for Computational Linguistics.

Klaus Krippendorff. 2004. Reliability in content analysis: Some common misconceptions and recommendations. Human Communication Research, 30(3):411–433.

Paul F. Lazarsfeld and Robert King Merton. 1964. Friendship as social process: A substantive and methodological analysis. Pages 18–66, New York. Van Nostrand.

Raymond Li, Samira Ebrahimi Kahou, Hannes Schulz, Vincent Michalski, Laurent Charlin, and Chris Pal. 2018. Towards deep conversational recommendations. In Advances in Neural Information Processing Systems, pages 9725–9735.

Kaihui Liang, Austin Chau, Yu Li, Xueyuan Lu, Dian Yu, Mingyang Zhou, Ishan Jain, Sam Davidson, Josh Arnold, Minh Nguyen, and Zhou Yu. 2020. Gunrock 2.0: A user adaptive social conversational system.

Ilya Loshchilov and Frank Hutter. 2019. Decoupled weight decay regularization. In International Conference on Learning Representations.

A. H. Miller, W. Feng, A. Fisch, J. Lu, D. Batra, A. Bordes, D. Parikh, and J. Weston. 2017. ParlAI: A dialog research software platform. arXiv preprint arXiv:1705.06476.

Daniel James O'Keefe. 2004. Trends and prospects in persuasion theory and research. In Readings in Persuasion, Social Influence, and Compliance Gaining, pages 31–43. Pearson/Allyn and Bacon.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318. Association for Computational Linguistics.

Florian Pecune, Shruti Murali, Vivian Tsai, Yoichi Matsuyama, and Justine Cassell. 2019. A model of social explanations for a conversational movie recommendation system. In Proceedings of the 7th International Conference on Human-Agent Interaction, HAI '19, pages 135–143, New York, NY, USA. ACM.

Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners.

Filip Radlinski, Krisztian Balog, Bill Byrne, and Karthik Krishnamoorthi. 2019. Coached conversational preference elicitation: A case study in understanding movie preferences. In Proceedings of the Annual SIGdial Meeting on Discourse and Dialogue.

Hannah Rashkin, Eric Michael Smith, Margaret Li, and Y-Lan Boureau. 2019. Towards empathetic open-domain conversation models: A new benchmark and dataset. In ACL.

Kevin Reschke, Adam Vogel, and Dan Jurafsky. 2013. Generating recommendation dialogs by extracting information from user reviews. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 499–504, Sofia, Bulgaria. Association for Computational Linguistics.

Kelton V. Rhoads and Robert B. Cialdini. 2002. The business of influence. In Persuasion Handbook: Developments in Theory and Practice, pages 513–542, London, United Kingdom. Sage.

Shalom H. Schwartz. 2003. A proposal for measuring value orientations across nations. Questionnaire Package of the European Social Survey, 259(290):261.

Weiyan Shi and Zhou Yu. 2018. Sentiment adaptive end-to-end dialog systems. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1509–1519, Melbourne, Australia. Association for Computational Linguistics.

Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1631–1642.

Yueming Sun and Yi Zhang. 2018. Conversational recommender system. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pages 235–244. ACM.

Cynthia A. Thompson, Mehmet H. Göker, and Pat Langley. 2004. A personalized system for conversational recommendations. Journal of Artificial Intelligence Research, 21:393–428.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS'17, pages 6000–6010, Red Hook, NY, USA. Curran Associates Inc.

Xuewei Wang, Weiyan Shi, Richard Kim, Yoojung Oh, Sijia Yang, Jingwen Zhang, and Zhou Yu. 2019. Persuasion for good: Towards a personalized persuasive dialogue system for social good. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5635–5649, Florence, Italy. Association for Computational Linguistics.

Qingyang Wu, Yichi Zhang, Yu Li, and Zhou Yu. 2019. Alternating recurrent dialog model with large-scale pre-trained language models. arXiv preprint arXiv:1910.03756.

Zhao Yan, Nan Duan, Peng Chen, Ming Zhou, Jianshe Zhou, and Zhoujun Li. 2017. Building task-oriented dialogue systems for online shopping. In Thirty-First AAAI Conference on Artificial Intelligence.

Kyung-Hyan Yoo, Ulrike Gretzel, and Markus Zanker. 2012. Persuasive Recommender Systems: Conceptual Background and Implications, 1st edition. Springer Publishing Company, Incorporated.

Markus Zanker, Marcel Bricman, Sergiu Gordea, Dietmar Jannach, and Markus Jessenitschnig. 2006. Persuasive online-selling in quality and taste domains. In International Conference on Electronic Commerce and Web Technologies, pages 51–60. Springer.

Saizheng Zhang, Emily Dinan, Jack Urbanek, Arthur Szlam, Douwe Kiela, and Jason Weston. 2018a. Personalizing dialogue agents: I have a dog, do you have pets too? In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2204–2213.

Yongfeng Zhang, Xu Chen, Qingyao Ai, Liu Yang, and W. Bruce Croft. 2018b. Towards conversational search and recommendation: System ask, user respond. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pages 177–186. ACM.

Yiheng Zhou, He He, Alan W. Black, and Yulia Tsvetkov. 2019. A dynamic strategy coach for effective negotiation. In Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue, pages 367–378, Stockholm, Sweden. Association for Computational Linguistics.
A Movie Trailer Database Creation
For each movie, we obtain metadata from YouTube and add other movie attributes, such as plot, actors, and genre, using the OMDB API (omdbapi.com). We enrich the movies from the MovieLens dataset (Harper and Konstan, 2015) with more movie trailers by searching the movie title plus "trailer" on YouTube with a duration restriction of less than 5 minutes. We choose trailers shorter than 5 minutes so that the crowd-workers do not have to spend a long time watching them. We use the first retrieved video link under the duration constraint and remove movies without a retrieved trailer from our database. Our motivation for using MovieLens and including more trailers is to link our movie database with MovieLens user reviews, so that it can be used for future work on building recommendation systems.

Category                  Coefficient
Sociable Strategies
  Personal Opinion          0.12*
  Personal Experience       0.04
  Similarity                *
  Encouragement             0.17*
  Offering Help             0.02
  Preference Confirmation   0.05
  Self-Modeling             0.02
  Credibility               0.07
Preference Elicitation
  Experience Inquiry        −
Movie Attributes
  Recently Released         0.17
  Popularity                −
Genre
  Action                    − − −

Table 8: The associations between different strategies, movie attributes, and successful recommendations. * p <

B Heuristics for Recommendation System
Our heuristics for the recommendation system to handle cold-start are as follows. If the seeker has never mentioned a movie and the generated text of the recommender dialog system is labeled as "recommendation", the most recent movie with the last mentioned genre is recommended. If the seeker has already mentioned a movie, we query the recommendation system with the last mentioned movie that has positive or neutral sentiment. The first movie in the recommendation system's output is chosen by our dialog system; if that movie has already been recommended, we choose the next movie in the output list.

To detect which movies are favored by the seekers and to detect movie titles in a sentence, we use the modules from Liang et al. (2020). The sentiment classifier is a BERT-based (Devlin et al., 2019) model trained on the Stanford Sentiment Treebank (Socher et al., 2013). For movie title detection, the model is a bidirectional LSTM-CRF with character-augmented word embeddings for the input, combined with retrieval of similar movie titles from the movie database. The movie database is from TMDB, and the model was trained on speech transcripts.

To detect movie genre in a sentence, we use regular expression matching for the following genres from the OMDB information in our database: Action, Animation, Biography, Comedy, Crime, Drama, Documentary, Fantasy, History, Horror, Mystery, Musical, News, Romance, Sport, Thriller, War, and Western. To detect movie actors, actresses, and directors, we use pattern matching on capitalized first letters and check whether the name exists in the TMDB people search.
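The cold-start heuristic above can be sketched as follows. The function name and the `recommender` interface (`most_recent_by_genre`, `similar_to`) are hypothetical names for illustration, not our actual module API; the caller is assumed to have already checked that the generated text is labeled "recommendation".

```python
def pick_recommendation(mentioned_movies, sentiments, last_genre,
                        recommender, already_recommended):
    """Cold-start heuristic: if the seeker has not named any movie yet,
    fall back to the most recent movie in the last-mentioned genre;
    otherwise seed the recommender with the last movie rated positively
    or neutrally and return its first not-yet-recommended suggestion."""
    if not mentioned_movies:
        return recommender.most_recent_by_genre(last_genre)
    liked = [m for m in mentioned_movies
             if sentiments.get(m) in ("positive", "neutral")]
    seed = liked[-1] if liked else mentioned_movies[-1]
    for movie in recommender.similar_to(seed):
        if movie not in already_recommended:
            return movie
    return None
```

Skipping already-recommended movies mirrors the rule of moving to the next item in the recommendation output list.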
C Dialog Model
The dialog model p(d) of a dialog d with T turns is defined as follows:

p(d) = \prod_{t=1}^{T} p_s(s_t \mid s_{<t}, r_{<t}) \, p_r(r_t \mid s_{<t}, r_{<t})    (1)

where s_t is the seeker's utterance at turn t, r_t is the recommender's utterance at turn t, and p_s(s_t \mid s_{<t}, r_{<t}) is the probability of generating the seeker's utterance given the history. The conversation history is represented by the query/key/value features using self-attention. Interested readers can refer to Wu et al. (2019) for more details.

D Training Set-up
We adopt GPT-2 small, a 12-head, 12-layer, 768-hidden-size Transformer with 117M parameters. We use the pre-trained GPT-2 Byte Pair Encoding (BPE) tokenizer with an extended vocabulary of 50,310 tokens. The optimizer is AdamW (Loshchilov and Hutter, 2019), the number of warm-up steps is 100, the learning rate is set to × −, and the dropout rate is set to 0.1. All experiments are run on an NVIDIA GeForce GTX 1080 Ti GPU.

The movie information in the input data, such as actor/actress names, movie genre, and movie plot, is delexicalized as special tokens, and the real information (genre, movie title, etc.) is appended to the utterance. The strategy label is also treated as a special token.

We leverage the ReDial (Li et al., 2018) and movie preference elicitation (Radlinski et al., 2019) datasets to conduct task-related pretraining; one pretraining epoch takes around 1.37 hours. Training on the INSPIRED dataset takes around 16 minutes per epoch. We train each model until it converges: the baseline model usually converges after the second epoch, while the strategy-incorporated model converges after the third.

During inference, we combine top-k sampling with top-p sampling (the Nucleus Sampling method; Holtzman et al., 2019): we keep the highest-probability tokens whose cumulative probability mass exceeds the threshold p. We manually tune the temperature, p, and k so that both models achieve their best performance. The temperature is set to . for the baseline and 0.8 for the strategy-incorporated model; for both models, the threshold of k is set to  and the upper bound of p is set to . .

For the strategy-incorporated model, the strategy is generated first and the utterance is then generated conditioned on the strategy. Although this is a loose constraint, the model learns categorical strategic patterns.
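The combined top-k / top-p filtering can be sketched as below. This is a simplified pure-Python version under our own naming, operating on a single step's logits; the default values of k and p are illustrative, not the tuned values from our experiments.

```python
import math

def top_k_top_p_filter(logits, k=40, p=0.9):
    """Combined top-k and nucleus (top-p) filtering: walk tokens in
    descending probability, keeping at most k of them and stopping
    early once their cumulative mass exceeds p, then renormalize."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]      # stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cum = set(), 0.0
    for rank, idx in enumerate(order):
        if rank >= k or cum >= p:
            break
        keep.add(idx)
        cum += probs[idx]
    mass = sum(probs[i] for i in keep)
    return [probs[i] / mass if i in keep else 0.0 for i in range(len(probs))]
```

The returned distribution assigns zero probability to filtered tokens, and the next token would be sampled from it (with temperature applied to the logits beforehand).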
For completeness, we also provide validation perplexity and BLEU-4 scores in Table 9.

Model     Test PPL  Test BLEU-4  Valid PPL  Valid BLEU-4
Baseline  9.28      5.11         9.21       5.09
Strategy  8.93      6.63         8.90       7.55

Table 9: Results for automatic metrics on both validation and test data.
E Additional User Study
In addition to the comparison study done by human users mentioned in §6.3, we conduct another user study which asks each participant to rate the same five aspects from 1 (worst) to 5 (best): fluency, consistency, naturalness, persuasiveness, and engagingness. For each model, 25 participants chat interactively with it (50 users in total). Unlike the user study in §6.3, where one user interacts with both models, here each user interacts with only one model, since users do not need to compare. These participants are different from the ones reported in the comparison user study (§6.3). From Table 10, we can see that the strategy model has higher ratings than the baseline model in all aspects.
F Example Human-Human Dialogs in INSPIRED
We include two annotated examples of human-human dialogs in Tables 11 and 12.
G Example Human-System Dialogs
We include an example dialog between a human seeker and the baseline model in Table 13 and an example dialog with the strategy-incorporated model in Table 14, both from the user study. In the user study evaluating the dialog systems, we do not set a minimum number of turns for the human user.
H Dialog Collection and Annotation Interfaces
Figures 6, 7, 9, and 11 show the dialog collection interface. Figures 12 and 13 show the dialog annotation interfaces for the crowd-workers.

Model           Fluency  Consistency  Naturalness  Persuasiveness  Engagingness
Baseline        3.84     3.80         3.20         3.24            3.56
Strategy
Human Speaker*  -        -            4.17         3.97            4.20

Table 10: Average score for human ratings on a 5-point Likert scale. Note that the human-human dialogues were collected before the user study and we did not measure fluency and consistency for the human recommender.
Conversation 1 (strategy labels in brackets)

REC: Hi There! [No Strategy]
REC: What types of movies do you like to watch? [Offering Help]
SEEK: Hello! I'm more of an action movie or a good romance and mystery movie.
REC: I just saw the trailer for Knives Out when I went to see Joker and it looked like a good mix of action and mystery! [Self-Modeling]
SEEK: I seen that one too as I seen Joker about a month ago. I thought about asking my fiance about going and seeing it.
REC: It looks like a good movie for people who like many different movies. [Personal Opinion]
REC: It also has a great cast! [Personal Opinion]
REC: I was surprised to see Chris Evans in the trailer! [Personal Opinion]
SEEK: Maybe with Chris Evans in it it'll be easier to convince my fiance to see it. Do you know who else is in the cast?
REC: Daniel Craig and Jamie Lee Curtis are also in the cast. [Credibility]
REC: Daniel Craig does a lot of 007 so definitely a good hearthrob role to convince the misses lol! [Encouragement]
SEEK: I am the misses lol. But he loves the bond movies so that should be a good incentive for him to go see it. Do you have any other recommendations?
REC: The new Star Wars comes out in less than a month, if you are into the franchise.
SEEK: He is, I think he told me we're getting it when it comes out to add to our movie collection.
REC: Well that is another great action movie. [Encouragement]
REC: I also recommend the John Wick series [Encouragement]
SEEK: I haven't seen any of that series. Could you tell me what the general plot is >
REC: John Wick is a former member of a gang, he was basically an assassin. [Credibility]
REC: He falls in love and quits the game, but then his wife dies, and someone comes in and kills his dog. [Credibility]
REC: He then goes on a revenge rampage against the people who broke into his house. [Credibility]
REC: I have yet to watch the 3rd one but the action scenes were really cool! [Personal Opinion]
SEEK: Oh I'd definitely would cry at the dogs death.
REC: It is really sad! [Similarity]
REC: the dog was a last gift from his dying wife which makes it so much worse [Personal Opinion]
SEEK: I couldn't even finish I am legend because of the dog dying. Anything with animal death makes me ball like a baby.
REC: Marley & Me had me crying for a good half hour so I completely understand that! [Similarity]
SEEK: I avoided that movie because someone told me he passed away. My fiance took me to see jurrasic world as our first date and I cried at the dinosuars dying.
REC: I would definitely avoid that movie if animal deaths make you said. [Similarity]
REC: Oh that is so cute though! [No Strategy]
SEEK: Yeah, he had to calm me down for about an hour and bought me ice cream to apologize for it.
REC: Aww that is so sweet. [No Strategy]
REC: Given that you dont want to see animals die, and you are looking for an Action / Mystery, I think Knives Out would be a good movie choice. [Pref. Confirmation]
REC: Do you agree? [Pref. Confirmation]
SEEK: I do agree with that. When it comes out i'll bring it up for date night. Thank you!!

Table 11: Example 1 for labeled human-human dialog. REC refers to Recommender and SEEK to Seeker.

Conversation 2 (strategy labels in brackets)
REC: Hi there. [No Strategy]
REC: How are you? [No Strategy]
SEEK: I'm doing well. I was wondering if you had any movie recommendations.
REC: Let me see. [Offering Help]
REC: Well, what kind of movies do you usually find yourself watching? [Opinion Inquiry]
REC: What do you like? [Opinion Inquiry]
SEEK: I tend to like action movies that also involve comedy.
REC: Oh, me too. [Similarity]
REC: So, what did you see last that you liked? [Experience Inquiry]
SEEK: I really liked Avengers: Endgame because I liked the pacing and the effects were great.
REC: Okay, I think I kind of understand your taste a little. [Pref. Confirmation]
REC: It's sort of similar to mine, although I haven't seen an action hero movie in a while. [Similarity]
REC: Did you like Guardians of the Galaxy? [Opinion Inquiry]
SEEK: No, I haven't, but I heard good things about it.
REC: Oh, I was assuming that you'd seen it. [Offering Help]
REC: I saw that with my son and I was surprised at how much I liked it. [Personal Experience]
SEEK: Were you able to see the sequel for it?
REC: No, but I would like to. [Self-Modeling]
REC: I would really recommend that. [Encouragement]
REC: I don't know if you know who Chris Pratt is, but he's a keeper. [Personal Opinion]
REC: He was on Parks & Rec, the comedy show. [Credibility]
SEEK: I love Chris Pratt! I accept your recommendation. Thank you so much.
REC: :) oh, I'm so happy. [Acknowledgment]
REC: Cool. [Acknowledgment]
REC: I hope you like it. [Encouragement]
SEEK: Do you have any other recommendations for action movies?
REC: Hmmm...I don't know if I can think of any others that are funny like that. [No Strategy]
REC: I know there are some though. [No Strategy]
SEEK: Oh, I just remembered. I really liked Deadpool because it did a great job of mashing comedy and action together.
REC: My 20 year old son liked that. [Personal Experience]
REC: I have only seen a couple parts. [Personal Experience]
REC: You've no doubt seen the MIB series, but those were kind of good as well. [Personal Opinion]
SEEK: Can you tell me what you like about MIB? I like Will Smith, but I haven't see any of the movies.
REC: Oh, yeah. [Acknowledgment]
REC: They're telling us to quit the hit, but real quick. [No Strategy]
REC: He's funny and creative in it and he's great when paired with Tommy Lee Jones. [Personal Opinion]
REC: It's just creative and makes you laugh unexpectedly. [Personal Opinion]
REC: Anyway, that's all I got. [No Strategy]
REC: It was nice talking to you. [No Strategy]
SEEK: It was great talking to you. I'll be sure to check out your recommendations.

Table 12: Example 2 for labeled human-human dialog. REC refers to Recommender and SEEK to Seeker.

Human Conversation with Baseline Dialog Model
REC SYS: Hello there!
SEEK: hello, how are you
REC SYS: I'm doing good!
SEEK: that's nice to hear
REC SYS: I'm here to help you look for a movie trailer.
SEEK: really, do you have a recommendation?
REC SYS: Do you know what kind of movie trailers you'd like to see?
SEEK: I like super hero movies
REC SYS: Do you know what kind of movies you'd like to see?
SEEK: i like action movies
REC SYS: I do too.
SEEK: that's pretty awesome, I am glad we have something in common
REC SYS: Have you seen the movie Avengers: Endgame?
SEEK: yes
REC SYS: What about Avengers: Endgame?
SEEK: yes I have seen it
REC SYS: Have you seen the movie Assassin's Creed?
SEEK: no i haven't
REC SYS: Did you enjoy it?
SEEK: i don't know because I haven't seen it
REC SYS: If you liked it, you'll love the movie Assassin's Creed.
SEEK: hm, that's great.
REC SYS: Great!

Table 13: Example dialog of human-system. REC SYS refers to the baseline recommendation dialog system and SEEK to the human Seeker.
Human Conversation with Strategy-Incorporated Dialog Model