Eliciting and Analysing Users' Envisioned Dialogues with Perfect Voice Assistants
Sarah Theres Völkel, Daniel Buschek, Malin Eiband, Benjamin R. Cowan, Heinrich Hussmann
Sarah Theres Völkel, [email protected], LMU Munich, Munich, Germany
Daniel Buschek, [email protected], Group HCI + AI, Department of Computer Science, University of Bayreuth, Bayreuth, Germany
Malin Eiband, [email protected], LMU Munich, Munich, Germany
Benjamin R. Cowan, [email protected], University College Dublin, Dublin, Ireland
Heinrich Hussmann, [email protected], LMU Munich, Munich, Germany
ABSTRACT
We present a dialogue elicitation study to assess how users envision conversations with a perfect voice assistant (VA). In an online survey, N=205 participants were prompted with everyday scenarios, and wrote the lines of both user and VA in dialogues that they imagined as perfect. We analysed the dialogues with text analytics and qualitative analysis, including number of words and turns, social aspects of conversation, implied VA capabilities, and the influence of user personality. The majority envisioned dialogues with a VA that is interactive and not purely functional; it is smart, proactive, and has knowledge about the user. Attitudes diverged regarding the assistant's role as well as it expressing humour and opinions. An exploratory analysis suggested a relationship with personality for these aspects, but correlations were low overall. We discuss implications for research and design of future VAs, underlining the vision of enabling conversational UIs, rather than single command "Q&As".
CCS CONCEPTS
• Human-centered computing → Empirical studies in HCI; Natural language interfaces.

KEYWORDS
Adaptation, conversational agent, dialogue, personality, voice assistant
ACM Reference Format:
Sarah Theres Völkel, Daniel Buschek, Malin Eiband, Benjamin R. Cowan, and Heinrich Hussmann. 2021. Eliciting and Analysing Users' Envisioned Dialogues with Perfect Voice Assistants. In CHI Conference on Human Factors in Computing Systems (CHI '21), May 8–13, 2021, Yokohama, Japan. ACM, New York, NY, USA, 15 pages. https://doi.org/10.1145/3411764.3445536
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
CHI '21, May 8–13, 2021, Yokohama, Japan
© 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-8096-6/21/05...$15.00
https://doi.org/10.1145/3411764.3445536
INTRODUCTION
Voice assistants are ubiquitous. They are conversational agents available through a number of devices such as smartphones, computers, and smart speakers [16, 68], and are widely used in a number of contexts such as domestic [68] and automotive settings [9]. A recent analysis of more than 250,000 command logs of users interacting with smart speakers [1] showed that, whilst people use them for functional requests (e.g., playing music, switching the light on/off), they also request more social forms of interaction with assistants (e.g., asking to tell a joke or a good night story). Recent reports on smart speakers [42] and in-car assistant usage trends [41] corroborate these findings, emphasising that voice assistants are more than just speech-enabled "remote controls".

Moreover, voice assistants are perceived as particularly appealing if adapted to user preferences, behaviour, and background [19, 20, 22, 45]. Since conversational agents tend to be seen as social actors in general [58], with users often assigning them personalities [69], personality has been highlighted as a promising direction for designing and adapting voice assistants. For example, Braun et al. [9] found that users trusted and liked a personalised in-car voice assistant more than the default version, especially if its personality matched their own. Although efforts have been made to generate and adapt voice interface personality [49], commercially available voice assistants have so far taken a one-size-fits-all approach, ignoring the potential benefits that adaptation to user preferences may bring.

Systematically adapting a voice assistant to the user is challenging: People tend to show individual differences in preferences for conversations when asked about their envisioned version of a perfect voice assistant [83]. Personalisation also harbours certain dangers, as an incorrectly matched voice assistant may be less accepted by a user than a default [9]. Current techniques for generating personalised agent dialogues tend to take a top-down approach [9, 45], with little user engagement. That is, different versions of voice assistants are developed and then contrasted in an evaluation, without investigating how they should behave in specific tasks or contexts.

To overcome these problems, we present a pragmatic bottom-up approach, eliciting what users envision as dialogues with perfect voice assistants: Concretely, in an online survey, we asked N=205 participants to write what they imagined to be a conversation between a perfect voice assistant and a user for a set of common use cases. In an exploratory approach, we then analysed participants' resulting dialogues qualitatively and quantitatively: We examined the share of speech and interaction between the interlocutors, as well as social aspects of the dialogue, voice assistant and user behaviour, and knowledge attributed to the voice assistant. We also assessed relationships of user personality and conversation characteristics. Specifically, we address these research questions:

RQ1: How do users envision a dialogue between a user and a perfect voice assistant, and how does this vision vary?
RQ2: How does the user personality influence the envisioned conversation between a user and a perfect voice assistant?
Our contribution is twofold: First, on a conceptual level, we propose a new approach to engage users in voice assistant design. Specifically, it allows designers to gain insight into what users consider a perfect dialogue with a voice assistant. Second, we present a set of qualitative and quantitative analyses and insights into users' preferences for personalised voice assistant dialogues. This provides much needed information to researchers and practitioners on how to design voice assistant dialogues, and how these vary and might thus be adapted to users.
RELATED WORK
Below we summarise work on the characteristics of today's human-agent conversation, human and conversational agent personality, and adapting the agent to the user.
Characteristics of Human-Agent Conversation
Despite the promise that the name Conversational Agent implies, several studies show that conversations with voice assistants are highly constrained, falling short of users' expectations [21, 48, 68, 70]. Existing dialogues with voice assistants are heavily task oriented, taking the form of adjacency pairs that revolve around requesting and confirming actions [17, 34]. Voice assistant research is increasingly interested in imbuing speech systems with abilities that encompass the wider nature of human conversational capabilities, in an attempt to mimic more closely the types of conversations between humans. The ability to generate social talk [76], humour [17], as well as fillers and disfluencies [77] are being developed as ways of making interaction with speech systems seem more natural. On the other hand, there is scepticism around the benefits this type of naturalness may produce. Recent research on the perception of humanness in voice user interfaces suggests that users tend to perceive voice assistants as impersonal, unemotional, and inauthentic, especially when producing opinions [28].

Users also tend to see a clear difference between humans and machines as capable interlocutors, rather than blurring the boundaries between these two types of partner [28]. Machine dialogue partners are regularly seen as "basic" [8] or "at risk listeners" [64]. To compensate for this, users develop strategies to adapt their speech in interaction [16, 66]. For example, people's speech becomes more formal and precise, with fewer disfluencies and filled pauses, along with a more command-like and keyword-based structure [36, 46, 48, 61–63]. Moreover, users are also prone to mimic the syntax [19] and lexical choices [7, 8] of the voice assistant's language, a phenomenon termed alignment. Alignment also occurs frequently in human-human interaction, but within human-machine dialogue it is thought to be driven by a user's attempt to ensure communication success with the system [7]. This phenomenon can also be leveraged in conversational design, with recent work indicating that users ascribe high likability and integrity to a voice user interface that aligns its language to the user [47]. In contrast to prior work, which examined user conversation with voice assistants given the current technological status quo, we take a different approach: We let users freely imagine a conversation they consider to be perfect, using their desired conversation style, syntax, and wording, thus exploring what users actually do want when given the choice.

Human and Conversational Agent Personality
Personality describes consistent and characteristic patterns which determine how an individual behaves, feels, and thinks [51].
The Big Five (also Five-Factor model or OCEAN) is the most prevalent paradigm for modelling human personality in scientific research and has five broad dimensions [23, 26, 27, 35, 39, 40, 50–53]:
Openness reflects an individual's inclination to seek new experiences, imagination, artistic interests, creativity, intellectual curiosity, and an open-minded value and norm system.
Conscientiousness reflects a tendency to be disciplined, orderly, dutiful, competent, ambitious, and cautious.
Extraversion reflects a tendency to be friendly, sociable, assertive, dynamic, adventurous, and cheerful.
Agreeableness reflects a tendency to be trustful, genuine, helpful, modest, obliging, and cooperative.
Neuroticism reflects an individual's emotional stability and relates to experiencing anxiety, negative affect, stress, and depression.

A plethora of work in psychology and linguistics has examined the role of personality in human language use [12, 13, 25, 33, 54, 59, 65, 67, 71, 73]. This relationship is most pronounced for Extraversion. For example, extraverts tend to talk more, use a more explicit and concrete speech style, simpler sentence structure, and a limited vocabulary with highly frequent words, in contrast to introverts [4, 25, 33, 59, 67].

Although not yet common in commercially available voice assistants [82], the construct of personality, in particular the Big Five, has also been leveraged to describe differences in how conversational agents express behaviour [56, 74, 79, 81]. Focusing on voice assistant personality modelling rather than voice assistant personality design, work by Völkel et al. [84] points out that the way users describe voice assistant personality may not fit the Big Five model, proposing ten alternative dimensions for modelling a conversational agent personality. Their dimensions, such as "Social-entertaining", can be expected to be realised by designers also via dialogue-level characteristics, such as including humorous remarks, as we examine here.
Adapting the Agent to the User
Previous work has noted that users enjoy interacting with voice assistants, imbuing them with human-like personality [21]. Deliberately manipulating this personality has an impact on user interaction, influencing acceptance and engagement [11, 88].
Much like in human-human interaction, users have preferences for particular personality types, tending to prefer voice assistants who share similar personalities to them [6, 30, 57], termed the similarity attraction effect [10, 57]. When interacting with a book buying website, extraverted participants showed more positive attitudes towards more extraverted voice user interfaces [57], whilst matching a voice user interface's personality to the user's personality also increases feelings of social presence [37, 45]. Similarity attraction effects have also been seen in in-car voice assistants, whereby users liked and trusted the assistant more if their personalities were matched [9]. A user's personality also influences their preference for the type of talk voice assistants engage in, with extraverted users preferring a virtual real estate agent that engaged in social talk, and more introverted users preferring a purely task-oriented dialogue [5, 6]. While previous work focused on similarity attraction for extraversion in voice assistants, we look at all voice assistant/user personality dimensions.
Overall, related work has provided insights into current shortcomings in voice assistant interaction [21, 48, 68] and how users perceive conversations with voice assistants [17] and their humanness [28], in contrast to human-human interaction. In contrast to this recent qualitative work, we take a mixed methods approach to explore what users themselves would prefer in a voice assistant dialogue given no technical limitations.

This motivates us to ask users to write their envisioned dialogues with a perfect voice assistant. In this way, we engage users to inform future assistant design, beyond contributing "compensation strategies" for current technical limitations. Moreover, given the literature's focus on user personality as a basis for agent adaptation, we explore relationships of personality and such envisioned dialogues. Finally, regarding the level of analysis, our study provides the first in-depth dialogue-level assessment, beyond, for example, phrasing of single commands [9], social vs functional talk [6], or nonverbal [45] investigations of agent personalisation.
METHOD
We conducted an online study to investigate our research questions. Our research design is inspired by our previous method [83], which presented participants with different social as well as functional scenarios. In each scenario, participants were asked to complete a dialogue between a user and a voice assistant where the user part was given, that is, they had to add the part of the voice assistant only. We found that there are differences between participants with regard to how they "designed" the voice assistant. These differences were more notable in social scenarios than in functional ones. Moreover, dialogues in functional scenarios were very similar to the current state of the art in interaction with voice assistants.

In our study, we built on this approach, but decided to let participants write entire dialogues (i.e. both the user and the voice assistant part) because we assumed that differences between participants might then emerge more clearly. Participants were presented with different smart home scenarios in which a user solves a specific issue by conversing with the voice assistant (cf. below). We instructed them to write down their envisioned conversation with a "perfect" voice assistant, assuming there were no technical limitations to its capabilities, with it being fully capable of participating and engaging in a natural conversation to whatever extent they prefer. Following the Oxford English Dictionary (OED), we define a perfect voice assistant as users' vision of "complete excellence" that is "free from any imperfection or defect of quality" [60]. Furthermore, we asked participants to "imagine living in a smart home with a voice assistant". Hence, we expect they described their version of a conversation in a context of long-term use. This method combines aspects of the story completion method [18] (i.e. participants writing envisioned interactions) and elicitation approaches [80] (i.e. asking people to come up with input for a presented outcome), shedding light on user preferences on a technology in the making.

Figure 1: Participants were asked to sketch an envisioned conversation with a perfect voice assistant. For eight given scenarios, they first selected who is speaking from a drop-down menu and then wrote down what the selected speaker is saying. Example dialogue written by participant 28.
Scenarios
We designed eight scenarios based on the most popular use cases for Google Home and Amazon Alexa/Echo, as recently identified by Ammari et al. from 250,000 command logs of users interacting with smart speakers [1]. In each scenario, we described a specific everyday situation a user encounters and an issue. Notably, we designed the scenarios in a way so that the participant could choose whether the user or the voice assistant initiates the conversation. In addition, we included an open scenario where participants could describe a situation in which they would like to use a voice assistant. The final scenarios are listed in Table 1. This selection of scenarios corresponds to similar analyses of everyday use of voice user interfaces as described by prior research [3, 21, 48] and consumer reports [41, 42].

Name | Description & Issue
Search | You want to go to the cinema to see a film, but you do not know the film times for your local cinema.
Music | You are cooking dinner. You are on your own and you like to listen to some music while cooking.
Internet of Things | You are going to bed. You like to read a book before going to sleep. You often fall asleep with the lights on.
Volume | You are listening to loud music, but your neighbours are sensitive to noise.
Weather | You are planning a trip to Italy in two days but do not know what kind of clothing to pack. You like to be prepared for the weather.
Joke | You and your friends are hanging out. You like to entertain your friends, but the group seems to have run out of funny stories.
Conversational | You are going to bed, but you are having trouble falling asleep.
Alarm | You are going to bed. You have an important meeting early next morning, and you tend to oversleep.
Open Scenario | Please think about another situation in which you would like to use the perfect voice assistant.

Table 1: Scenarios used in our study. Each scenario contains a descriptive part and a specific issue which participants should address and solve in their envisioned dialogue between a user and a perfect voice assistant.

Procedure
Participants were introduced to the study purpose and asked for their consent in line with our institution's regulations. After that, they were presented with their task of writing dialogues between a user and a voice assistant for different scenarios. We highlighted that the conversation could be initiated by both parties, and also provided an example scenario with two example dialogues (one initiated by the user, the other by the voice assistant). Participants were then presented with the eight different scenarios in random order before concluding with an open scenario, where they were given the opportunity to think of another situation in which they would like to use the perfect voice assistant. For each scenario, participants were asked to first select who is speaking from a drop-down menu (You or Voice assistant) and then write down what the selected speaker is saying (cf. Figure 1). If they wanted, participants could give the voice assistant a name. At the end of the study, we collected participants' self-reported personality via the Big Five Inventory-2 questionnaire (BFI-2) [75], their previous experience with voice assistants, as well as demographic data.
Analysis
Thematic analysis. We conducted a data-driven inductive thematic analysis on the emerging dialogues. Two authors independently coded 27 randomly selected dialogues per scenario (13.2% of total dialogues), deriving a preliminary coding scheme. Afterwards, four researchers closely reviewed and discussed the resulting categories to derive a codebook. The two initial coders then refined these categories and re-coded the first sample with the codebook at hand to ensure mutual understanding. After comparing the results, the first author performed the final analysis. In case of uncertainty, single dialogues were discussed by two authors to eliminate any discrepancies. In the findings below, we present representative quotes for the themes as well as noteworthy examples of extraordinary dialogues. All user quotes are reproduced with original spelling and emphasis. Our approach follows common practice in comparable qualitative HCI research [17, 21].
Statistical analysis. In an exploratory analysis, we analysed the relationship of user personality and the analysed aspects of the dialogues with (generalised) linear mixed-effects models (LMMs), using the R package lme4 [2]. We further used the R package lmerTest [43], which provides p-values for mixed models using Satterthwaite's method. Following similar analyses in related work [86], we used LMMs to account for individual differences via random intercepts (for participant and scenario), in addition to the fixed effects (participants' Big Five personality dimensions). In line with best-practice guidelines [55], we report LMM results in brief format here, with the full analyses as supplementary material.
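To make the model specification concrete, the following minimal R sketch shows how such models can be fitted with lme4/lmerTest. The data frame and column names (d, opinion, conscientiousness, word_count_user, extraversion, participant, scenario) are illustrative assumptions, not the authors' actual code.

```r
# Minimal sketch: (generalised) LMMs with random intercepts for participant
# and scenario, as described above. Data frame "d" has one row per dialogue;
# all column names here are illustrative assumptions.
library(lme4)
library(lmerTest)  # adds Satterthwaite p-values for lmer() models

# Binary dialogue aspect (e.g., does the dialogue contain an opinion?):
m_opinion <- glmer(opinion ~ conscientiousness + (1 | participant) + (1 | scenario),
                   data = d, family = binomial)
summary(m_opinion)

# Continuous measure (e.g., the user's word count per dialogue):
m_words <- lmer(word_count_user ~ extraversion + (1 | participant) + (1 | scenario),
                data = d)
summary(m_words)
```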
Participants
To determine the required sample size, we performed an a priori power analysis for a point-biserial correlation model. We used G*Power [31] for the analysis, specifying the common values of 80% for statistical power and 5% for the alpha level. Earlier studies regarding the role of personality in language usage [54, 67] informed the expected effect size of around 0.2, so that we stipulated a minimum sample size of 191.
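As a rough cross-check of this power analysis (the original was run in G*Power, not in code), the approximate correlation test in the R package pwr gives a similar estimate:

```r
# Sketch: a priori power analysis for a correlation of r = 0.2,
# alpha = .05, power = .80 (approximate arctanh-based test).
library(pwr)
pwr.r.test(r = 0.2, sig.level = 0.05, power = 0.80, alternative = "two.sided")
# yields n of roughly 194; G*Power's exact point-biserial computation
# produced the minimum sample size of 191 reported above.
```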
We recruited participants using the web platform Prolific. After excluding three participants due to incomplete answers, our sample consisted of 205 participants (48.8% male, 50.7% female, 0.5% non-binary; mean age 36.2 years, range: 18–80 years).

Participants on Prolific are paid in GBP (£) and studies are required to pay a minimum amount that is equivalent to USD ($) 6.50 per hour. Based on a pilot run, we estimated our study to take 30 minutes. Considering Prolific's recommendation for fair payment, we thus offered £3.75 as compensation. We observed a median completion time of 32 minutes with a high standard deviation of 21 minutes. Since we wanted to exclude language proficiency and dialect as confounding factors, we decided to only include British English native speakers.

59.0% of participants had a university degree, 28.8% an A-level degree, and 9.8% a middle school degree (2.4% did not have an educational degree). 94.1% of participants had interacted with a voice assistant at least once, while 32.2% used a voice assistant on a daily basis. The most popular use cases were searching for information and playing music (mentioned by 54.1% and 51.2% of participants, respectively), followed by asking for the weather (35.1%), setting a timer or an alarm (16.1% and 12.7%), asking for entertainment in the form of jokes or games (12.7%), controlling IoT devices (12.7%), and making a call (11.2%). Overall, this reflects Ammari et al.'s findings [1], on which we based our scenarios.

Figure 2 shows the distribution of participants' personality scores in the Big Five model.
RESULTS
We elicited 1,845 dialogues from 205 people, with a total number of 79,485 words and 9,239 speaker lines. On average, a dialogue comprised 43.08 words (SD=31.06) and 5.01 lines (SD=2.89).

Figure 2: Distribution of the Big Five personality scores in our sample (histogram and KDE plot).

Scenario | Initiated by U | Terminated by U | Turns U | Turns VA | Word count U | Word count VA | Questions U | Questions VA
Search | 100.0% | 33.9% | 2.89 (SD 1.54) | 2.64 (SD 1.56) | 23.68 (SD 13.51) | 29.92 (SD 22.23) | 1.25 (SD 1.04) | 1.01 (SD 1.08)
Music | 96.0% | 29.8% | 2.29 (SD 1.40) | 2.04 (SD 1.35) | 15.05 (SD 12.23) | 15.36 (SD 14.49) | 0.48 (SD 0.81) | 0.82 (SD 1.01)
IoT | 89.1% | 23.7% | 2.00 (SD 1.24) | 1.88 (SD 1.24) | 18.19 (SD 12.93) | 16.03 (SD 12.96) | 0.61 (SD 0.70) | 0.54 (SD 0.89)
Volume | 72.3% | 37.9% | 2.05 (SD 1.24) | 1.96 (SD 1.32) | 16.16 (SD 12.2) | 17.48 (SD 15.34) | 0.62 (SD 0.79) | 0.56 (SD 0.84)
Weather | 95.5% | 40.7% | 2.62 (SD 1.33) | 2.36 (SD 1.33) | 23.40 (SD 13.07) | 31.25 (SD 22.62) | 1.30 (SD 0.99) | 0.41 (SD 0.70)
Joke | 95.5% | 28.9% | 2.50 (SD 1.38) | 2.33 (SD 1.37) | 17.07 (SD 12.03) | 22.94 (SD 26.61) | 0.85 (SD 1.00) | 1.04 (SD 1.10)
Conversational | 95.0% | 37.0% | 2.63 (SD 1.40) | 2.38 (SD 1.39) | 17.36 (SD 10.41) | 22.85 (SD 16.20) | 0.72 (SD 0.92) | 0.99 (SD 0.97)
Alarm | 87.2% | 31.1% | 2.54 (SD 1.39) | 2.32 (SD 1.37) | 23.41 (SD 14.88) | 22.95 (SD 18.48) | 0.57 (SD 0.77) | 0.69 (SD 0.83)
Open | 95.4% | 33.7% | 2.94 (SD 1.74) | 2.78 (SD 1.80) | 24.53 (SD 18.45) | 30.11 (SD 26.85) | 0.91 (SD 1.00) | 1.06 (SD 1.27)

Table 2: Automatically extracted data from the dialogues: Percent of dialogues which were initiated and terminated by the user (U) in contrast to the voice assistant (VA). For the other columns, the mean and the standard deviation (SD) over all dialogues in the respective scenario are given.
Initiation and termination. We automatically extracted whether a dialogue was initiated by the voice assistant or the user. In the majority of cases, this was done by the user (91.7% out of 1,798 dialogues). Scenario Volume presents a notable exception, where the voice assistant initiated 27.7% of dialogues (cf. Table 2). In contrast, overall 67% of the dialogues were terminated by the voice assistant.
Wake word. When a dialogue was initiated by the user, we analysed whether they addressed the voice assistant by name as a wake word. To this end, we examined whether the first line of a dialogue included the voice assistant name the participant had specified, one of the prevalent voice assistant names (e.g., Siri, Alexa, Google), or "Voice Assistant" or "Assistant". This was true for 73.0% of user-initiated dialogues. (Please note that a few participants forgot to indicate the speaker in their dialogues, so that the number of dialogues analysed here differs slightly from the total number of dialogues.)
Word count. As shown in Table 2, the overall number of words (including stop words) varied substantially within one scenario and also differed between scenarios. On average, the voice assistant had a bigger share of speech (M=23.21 words per dialogue, SD=20.92 words) than the user (M=19.87 words per dialogue, SD=13.91 words).
Turns. Despite the smaller share of speech, the user had on average slightly more turns (M=2.50 lines per dialogue, SD=1.44 lines) than the voice assistant (M=2.30 lines per dialogue, SD=1.45 lines). The number of turns did not vary much between scenarios (cf. Table 2). On average, participants described 3.66 speaker turns (SD=2.59 turns) per dialogue.
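The automatically extracted measures reported above (initiation, termination, turns, and word counts per dialogue) can be computed from the speaker-annotated lines roughly as follows; the data frame "lines" and its column names are our illustrative assumptions, not the authors' code:

```r
# Sketch: per-dialogue measures from a data frame "lines" with columns
# dialogue_id, speaker ("U" or "VA"), and text, ordered by line position.
library(dplyr)

count_words <- function(x) sum(lengths(strsplit(x, "\\s+")))

metrics <- lines %>%
  group_by(dialogue_id) %>%
  summarise(
    initiated_by_user  = first(speaker) == "U",
    terminated_by_user = last(speaker)  == "U",
    turns_user = sum(speaker == "U"),
    turns_va   = sum(speaker == "VA"),
    words_user = count_words(text[speaker == "U"]),
    words_va   = count_words(text[speaker == "VA"])
  )
```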
Questions. We automatically classified all written sentences as questions vs statements, building on an open source question detection method using the nltk library (https://github.com/kartikn27/nlp-question-detection, last accessed 15.09.2020). We further extended this method with a list of keywords that in our context clearly marked a question, as informed by our qualitative analysis (e.g., "could you", "would you", "have you"). Table 2 shows the number of questions per scenario and speaker. Over all scenarios, the grand mean was 0.81 questions for the user and 0.79 for the voice assistant per dialogue.
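The authors' detector builds on the Python/nltk method linked above; the following R approximation merely illustrates the rule-based idea (question mark, interrogative openers, and the keyword list mentioned in the text). The exact rule set and full keyword list are assumptions:

```r
# Sketch: rule-based question detection (illustrative approximation only).
is_question <- function(sentences) {
  s <- tolower(trimws(sentences))
  keywords <- c("could you", "would you", "have you")  # examples from the text
  ends_in_qmark <- grepl("\\?$", s)
  interrogative <- grepl("^(who|what|when|where|why|how|which|do|does|did|can|shall|will|is|are)\\b", s)
  has_keyword   <- Reduce(`|`, lapply(keywords, function(k) startsWith(s, k)))
  ends_in_qmark | interrogative | has_keyword
}

is_question(c("Could you dim the lights", "Alarm set."))  # TRUE FALSE
```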
Social Talk
Clark et al. [17] stressed that people perceive a clear dichotomy between social and functional goals of a dialogue with a voice assistant. As anticipated by our study design, the collected dialogues mainly comprised task-related exchange with clear functional goals. Still, the majority of participants also incorporated social aspects, that is, "talk in which interpersonal goals are foregrounded and task goals – if existent – are backgrounded" [44]. Social talk is not necessary to fulfill a given task, but rather fosters rapport and trust among the speakers and helps them agree on an interaction style [29]. Our thematic analysis suggests three different kinds of social talk in the elicited dialogues: social protocol, chit-chat, and interpersonal connection.

Figure 3: Percent of dialogues covering each coded category in each scenario (categories: social protocol, chit-chat, interpersonal, opinion, recommendation, suggestion, thinking ahead, humour, contradicting, asking for feedback, knowledge about the user, knowledge about the environment, lead to VA, trusting the VA).

Social protocol. We here define social protocol as an exchange of polite conventions or obligations, such as saying "thank you", "please", a form of general affirmation (e.g., "great"), or wishing the other a "good night". 91.7% of participants incorporated at least one of such phrases in at least one of the scenarios. Yet, most participants did not do so in all their dialogues. The use of social protocol ranged from 39.5% of participants in scenario Joke to 58.5% in scenario Alarm.

Chit-chat. With chit-chat, we here refer to an informal conversation on an impersonal level that is not relevant for the actual task. This includes wishing the user fun or affirming a particular decision (e.g.,
VA: "no problems enjoy the movie i have heard it is very good" (P40)), assuring to be "glad to be of service" (P179), or small talk (e.g., VA: "Yes, although hopefully will be some sunny breaks in the weather." (P105); "Oh, dinner time, already? Where has the day gone?" (P55)). 40.0% of participants used chit-chat at least once. Chit-chat occurred most frequently in the scenarios Music (13.7% of participants), Search, and Open (12.2% of dialogues each).
Interpersonal connection. Following Doyle et al. [28], we define interpersonal connection as talk about personal topics that builds an interpersonal relationship. 20.5% of participants described interpersonal connection in at least one of the scenarios, in a broad range of ways. Interpersonal connection appeared over all scenarios but was slightly more prominent in Conversational (7.3% of dialogues) and Open (4.9% of dialogues). In the former, it was primarily manifested through enquiries about the user (e.g., VA: "It appears you are not sleeping yet, what's bothering you?" (P173)), which the user responds to by sharing what is on their mind, such as anxiety about speaking in public (P145), dealing with a child with autism (P141), or difficulties at work (P87). The voice assistant then comforts the user (e.g., "Don't worry I got the perfect plan" (P120)) or makes suggestions on how to deal with the situation. For example, P87 sketched a voice assistant which offers to have the user's back (e.g., U: "My boss reprimanded me" – VA: "WHAT?? Shall I suggest ways to take your revenge? [...] Take me into work with you with your headpiece on and I'll suggest replies the next time he's nasty to you"), and P152's voice assistant motivates the user in a witty way (e.g., VA: "Get out of bed and then i will start" – U: "That's harsh" – VA: "Come on, i'll play the Spice Girls if you promise to dance along and sing into your hairbrush").

In the Open scenario, interpersonal connection was manifested in various ways, such as through emphasising the relationship with the user (e.g., VA: "I hope you wouldn't ever lie to me as I'm your best friend" (P87)), by recollecting shared experiences (e.g., VA: "Here are my favorite pictures of last Halloween. Personally this was my favorite costume, and if I remember correctly we listened to this artist all night. I turn up the music and play some of her tunes." (P2)), or by discussing the user's love life (P179):

VA: "Was that hesitation I registered in your voice?"
U: "No, what are you talking about? Of course I'm cooking for myself, who else would I be cooking for?"
VA: "A lady, maybe?"
U: "....."
VA: "What's her name?"
U: "None of your business"
VA: "Dude I am an AI that lives in your house, of course it's gonna be my business. If it is a lady coming over then you need to be a lot cooler than you are with me"
U: "Ahh dude you're right, I'm sorry I'm just nervous"
VA: "No shit"
Voice Assistant Taking the Lead
Our thematic analysis showed that the majority of participants let their voice assistant take the lead in parts of the dialogue. By taking the lead, we refer to the voice assistant either providing advice to the user or doing something the user did not specifically ask for. We further differentiate between suggesting, recommending, giving an opinion, thinking ahead, contradicting, and refusing, as emerged from our analysis.

Suggesting denotes "mention[ing] an idea, possible plan, or action for other people to consider", according to the Cambridge Dictionary (https://dictionary.cambridge.org/dictionary/english/suggest, last accessed 12.09.2020). That is, the voice assistant selects individual options and presents them to the user, without indicating a preference for one of them. Suggestions are often introduced by "What about", "You could", or "Which would you prefer". 87.8% of participants had their voice assistant give at least one suggestion over all dialogues. Suggestions occurred most often in the scenarios Conversational (60.0% of dialogues) and Joke (42.0% of dialogues), while less than 10% of dialogues in the scenarios Search and IoT contained a suggestion.

Suggestions mainly came in the form of possible options the voice assistant pointed out to the user, such as films, music, books, jokes, games, or recipes. When giving a suggestion, the voice assistant often took into account user preferences (e.g., VA: "There's a film called Onward from Disney, I know you like the Pixar films." (P107)) or context (e.g., VA: "What are you cooking today?" – U: "I'm making meatloaf." – VA: "OK, I've found a playslist for you starting with Bat out of Hell." (P67)).

Other suggestions were more complex. Depending on the scenario, this included strategies for falling asleep, avoiding oversleeping, preparing a trip, or dealing with the neighbours (e.g., U: "Hey Lexi, My neighbours think my music is too loud." – VA: "How about I find a new home?" – U: "No, that isn't realistic enough." – VA: "What if i search for some great headphones?" – U: "Sure! That would be great" – VA: "I will get onto that." (P27)).

Recommending, on the other hand, describes advising someone to do something and emphasising the best option (https://dictionary.cambridge.org/dictionary/english/recommend, last accessed 27.07.2020). The voice assistant usually ushers in recommendations with phrases such as "I would do/choose" or "I recommend". 51.7% of participants let their voice assistant give at least one recommendation, in varying complexity. For instance, the advice given by P8's voice assistant is rather straightforward ("I recommend spaghetti ala carbonara"), while P15 described a more complex recommendation: VA: "hey rami, let me optimise the frequency of the speakers so we have the maximum volume indoors without decibels spilling over into the neighbours ear shot." – U: "that's great, I didn't even know you could do that". Recommendations concerned entertainment, such as films, music, videos, or how users could achieve their goals, for example, going to the cinema or not oversleeping. In a few cases, the voice assistant nudged the user towards better behaviour (e.g., VA: "You should turn it [the music] down as your neighbours have complained before" (P186)) or helped saving energy (P103):

VA: "I've noticed you've been leaving the lights on all night."
U: "I know. It's when I read in bed. I fall asleep and forget to turn them out."
VA: "Did you want me to turn them off for you? [...]"
U: "Will it make much difference whether they are on or off?"
VA: "It'll save electricity. On your current plan, you would save £3.00 per month by turning out the lights every night."
Over all scenarios, recommendations occurred most prominently in Weather (25.8% of dialogues), in which the voice assistant recommended what to pack. These recommendations were often based on knowledge about the weather forecast (e.g., VA: "The weather in Rome, Italy is expected to be hot and dry this week. I would recommend bringing light, breathable shorts and shirts." (P137)) or knowledge about the expected context (e.g., VA: "The best views of the city are from the gardens above the valley, so make sure to take something you can walk comfortably in." (P125)). Second was the Conversational scenario (13.2% of dialogues), in which the user asked the voice assistant's help for falling asleep (e.g., U: "I cannot sleep, do you have any useful recommendations?" (P20)).

Giving an opinion refers to sharing thoughts, beliefs, and judgments about someone or something (https://dictionary.cambridge.org/dictionary/english/opinion, last accessed 27.07.2020). 39% of participants had their voice assistant give an opinion in at least one dialogue. In terms of scenarios, the voice assistant most often expressed an opinion in Volume (20.0% of dialogues). Here, the voice assistant commented: "I think the music you are playing is too loud. It will annoy your neighbours." (P204). Apart from this, the voice assistant also shared its opinion on the user's choice of film or food, usually praising the user (e.g., U: "Great, I think I'd like to go to see (film) at 7pm." – VA: "Good choice [...]" (P191); U: "Hey Lexi, I'm going to Italy" – VA: "Ciao! what a beautiful country" (P27)). Moreover, it commented on bad habits of the user, such as VA: "Haha yeah it's [leaving on the lights at night] not a good habit" (P179). In addition, the voice assistant shared its taste, confident that the user will like it, too: VA: "I'll play you a mix of some songs you know and some new things I think you'll like." (P48).

Thinking ahead describes that the voice assistant anticipates possible next steps and proposes them to the user without her asking for them. Note that this does not include voice assistant enquiries due to incomplete information on a task (e.g.,
U: "Hi frank, what are the film times for local cinema" – VA: "Please choose which cinema" (P45)). Examples for thinking ahead include offering to book tickets when a user asks for film showings (e.g., VA: "They play it [the film] at 7:30pm on Saturday. Do you want me to book it?" (P13)), suggesting to set a reminder or a morning routine (e.g., VA: "No problem I will wake you at least one hour before that and prepare a coffee so that you are actually awake." (P2)), or making the user comfortable (e.g., VA: "Its going to be chilly in the morning shall I set the Hive for the heating to come on a little earlier than usual so its warm when you get up?" (P99)). 83.4% of participants created such a foresighted voice assistant at least once, even though their users did not always accept the proposed actions. Thinking ahead was particularly prevalent in the scenarios Alarm (45.4% of dialogues) and Search (41.0% of dialogues).

Contradicting denotes parts of the dialogue in which the voice assistant disagrees or argues with the user. Only 8.3% of participants let the voice assistant contradict the user at least once. Single cases of contradicting were spread over all scenarios, while most occurrences were part of the scenario Volume (in 13 out of 205 dialogues). While in some cases the voice assistant carefully phrases its objection (e.g., U: "I don't think so Sally, I like it loud." – VA: "Well, forgive me, but I have very sensitive hearing and can hear them next door getting a bit upset with you." (P65)), it made this objection very clear in others (e.g., VA: "You do realise that your music is so extremely and inconsideratly loud that it could be annoying everyone including the neighbours" (P106)). Other examples for contradicting included the voice assistant having a different opinion on a particular topic (e.g., U: "Hmm, no I don't like her [Jennifer Anniston] as an actress." – VA: "She is very talented" (P20)) or on fulfilling a task (e.g., U: "Can you set the alarm for 8am please?" – VA: "Maybe I should set it for 7.30am just in case and to give you more time." (P20)). Interestingly, in all arguments, the user always gave in and followed the voice assistant's advice.
Refusing what the user asked for was very rare: Only three participants (in four dialogues) had the voice assistant do so. For example, the voice assistant declined to increase the volume of the music (VA: "Yes, but it's so loud that it's keeping me awake" (P87)) and asked the user instead to "please plug in your headphones". Another participant (P179) described an assertive human-like voice assistant which tells the user's friends a funny story about the user despite their protest, and fights with the user about who turns down the music: VA: "You're the one with hands, you turn it down" – U: "You're literally in the ether where the electronics live, you turn it down [...]".

Asking for feedback occurred most often in scenario Volume (10.2% of dialogues), for instance, to enquire if the user was "happy with that [adjusted] sound level" (P21). In other scenarios, the voice assistant wondered whether the user "[liked] that story" (P81) or the music (VA: "Ok, but if it is getting too funky just say it!" (P2)) the voice assistant had suggested.
Humour on the part of the voice assistant was most frequent in the Joke scenario (44.4% of dialogues). Examples from other scenarios included remarks on the user's film choice Terminator (VA: "and don't forget, I'll be back" (P19)) and sarcasm when asked for a suitable conversation topic (VA: "Ok. Lets make it interesting. What's everyone's position on brexit?" (P113)). However, voice assistant humour appealed to the users differently. While some dialogues encompassed appreciation (e.g., U: "You are so funny" (P28)), others described the user as less convinced (e.g., U: "Nice try" (P244)).
Voice Assistant Knowledge
Participants attributed to the voice assistant knowledge about the user as well as knowledge about the environment. With knowledge about the user, we refer to voice assistant knowledge about user behaviour and preferences, for example, when the voice assistant is aware of the user's schedule (e.g., VA: "It looks like the 8PM showing would fit into your schedule best" (P178)) and favoured choices (e.g., VA: "Maybe one of your favourite playlists - last time you were cooking you played this one?" (P99)). Participants also let the voice assistant know about the user's health (e.g., VA: "I see your heart beep is moving irregular[ly]. You okay[?]" (P120)), habits (e.g., U: "Hey Masno, you know i snoring every night." – VA: "Yes, you are so loud." (P5)), and past events (e.g., VA: "Hi, It's Sally, why are you not sharing the stories about your last holiday with your friends?" (P65)). 58.5% of participants equipped the voice assistant with knowledge about the user in at least one scenario. In terms of the scenarios, this kind of knowledge was most strongly represented in Music (28.3% of dialogues) and IoT (15.6% of dialogues). In Music, it primarily related to the user's preferences in terms of musical taste. Conversely, in scenario IoT the voice assistant was equipped with knowledge about the user's behaviour, in particular to automatically recognise whether they are already asleep.
Knowledge about the environment includes intelligence about the status of other devices in the house (e.g., U: "Henry, can you tell me what's low in stock in the fridge[?]" (P185)) as well as the ability to interact with these devices (e.g., VA: "I will get the coffee machine ready for when you wake so the smell might get you to rise" (P105)). It also comprises awareness of the current location and distance to points of interest in the vicinity (e.g., U: "Hey Masno, could you check whats time the local cinema are playing this new action film?" (P5)), and a kind of omniscient knowledge about others (e.g., VA: "I'll turn it down when I hear them enter th[ei]r house." (P23)). 71.2% of participants equipped the voice assistant with such knowledge in at least one of the scenarios. This was most prevalent in Search (50.2% of dialogues) with knowledge about the nearest or local cinema, followed by Volume (19.0% of dialogues) with knowledge about the neighbours, and by the Open scenario (18.5%). In the latter, the voice assistant could often tell its user what is in the fridge, interact with other devices in the house, or even knew the stock and prices of items in all local supermarkets (e.g., U: "Can you check my local supermarkets to see if anyone has got Nescafe on offer?" – VA: "I can see that Morrisons has 2 jars for the price of 1, would you like me to add this to your shopping list?").

Trusting the voice assistant with challenging tasks appeared most often in the Open scenario (12.2% of dialogues), indicating that this is more of a future use case. Examples included social tasks, such as writing a message without specifying the exact content (e.g., U: "I need you to write an email to my daughter's college. [...] The additional help provided for her because of her dsylexia. They promised reader pens and a dictaphone but she hasn't received them yet. Please ask why" (P41)), sending out birthday cards, or selecting pictures to show to friends. Moreover, participants trusted the voice assistant with preparing a weekly meal plan and ordering the according ingredients, putting together a suitable outfit or planning a trip, paying for expenses, or editing presentations for work as well as making a website.
Leaving the lead to the voice assistant was most frequent in the scenarios Conversational (52.7% of dialogues) and Weather (39.5% of dialogues). In the former, participants sought advice from the voice assistant on how to fall asleep or waited for the voice assistant to help by simply stating that they were "having trouble falling asleep" (P11). In Weather, participants let the user not only ask the voice assistant for the weather but also for recommendations on what to pack (e.g., U: "Can I ask your advice on what type of clothes to pack for Italy, will I need any light jumpers or anything?" (P107)). Participants also liked to see the voice assistant as a source of inspiration, which provides suggestions on what to read, cook, play, or listen to. For example, P2 requested: "Surprise me and play something you like".

In a few cases, participants incorporated an explicit description of the voice assistant by letting the user comment on it (e.g., U: "You are so funny." (P103)). These descriptions were only found in sixteen dialogues (0.9%) and included seven times "funny", four times "smart" or "clever", and once "reassuring". Once the voice assistant was called an "entertainer" and a "mindreader". One participant noted: "You've got my back – ain't you" (P179). On the other hand, three participants also commented on the voice assistant's lack of wittiness (e.g., "No! Something actually funny!" (P103)).

Status Quo Dialogues
Finally, we analysed how many dialogues did not fall into any of the aforementioned categories. These dialogues can be seen as a depiction of the status quo: a functional task-related request. Table 3 provides example status quo dialogues for each scenario. The occurrence of these dialogues ranged from 14.1% in scenario Joke to 32.7% in scenario IoT.

Status quo dialogues were on average shorter than dialogues overall (difference between the two indicated by Δ, respectively). The user (M=11.76 words per dialogue, SD=8.21, Δ=8.11) and in particular the voice assistant (M=9.52 words per dialogue, SD=9.89, Δ=13.69) had a smaller share of speech in the status quo dialogues, in contrast to the average word count over all dialogues. Similarly, there were fewer speaker turns both by the user (M=1.45, SD=0.90, Δ=1.05) and the voice assistant (M=1.39, SD=0.96, Δ=0.91). 97.1% of the status quo dialogues were initiated by the user (in 18 of 401 dialogues, the first speaker was not defined), while 90.4% of the status quo dialogues were terminated by the voice assistant (in 35 of 401 dialogues, the last speaker was not defined).

Open Scenario
As a last task, participants were asked to write a dialogue for another scenario they would like to use their perfect voice assistant in. These dialogues indicated a broad spectrum of imagined use cases, yet the majority (58%) reflected already existing ones. This is common when people are asked to imagine a technology which does not exist yet [78]. These scenarios included receiving recommendations or suggestions from the assistant (mentioned by 10.8% of participants in all open scenarios), searching for information (10.8%), controlling IoT devices (10.2%), using the assistant instead of typing (e.g., for notes, shopping lists, text messages; 8.3%), getting directions (5.9%), or setting an alarm, timer, or notification (2.3%).

However, 47.3% of people mentioned scenarios in which the voice assistant's capabilities exceed the status quo. In most of these cases (43.9%), they imagined the voice assistant to become a personal assistant with very diverse roles and tasks, which supports them in their decision-making. For example, P114 would like to have cooking assistance: "I would ask the voice assistant [...] for help in cooking dishes like homemade curries and perfect pork crackling joints and perfect roast potatoes". P65 saw her perfect voice assistant as a diet and meal planner: "[It orders] me food shopping with good dates, healthy choices in the foods I like. It would also consider my dietary requirements (lactose intolerant) and add the substitutes I like for dairy [...]". P13 imagined a housework organiser ("To plan my housework for the week and give me reminders to do it. And chase me up if I don't say that it is completed"), and P61 a personal shopper ("Give your preferences, size etc [...] Give [...] the event type you are attending and your price range and ask to order you outfits for the occasion."). P139 even trusted the voice assistant in "coping with an autistic child and helping to handle them", and P25 would like to use it for mental health support. Another four people described a scenario in which the assistant helps in an emergency, such as alerting the neighbour in case of a household accident (P73).

We further classified the roles participants implicitly ascribed to their perfect voice assistant in the Open scenario. Three different roles emerged from the dialogues: tool, assistant, and friend. We classified the role as tool if the user utilises the voice assistant in order to do something they want to do (https://dictionary.cambridge.org/dictionary/english/tool, last accessed 04.01.2021), that is, a clearly defined task which the voice assistant simply carries out. 26.3% of all Open scenario dialogues featured a voice assistant as a tool. For example, P38 sketched a dialogue for setting an alarm: U: "Set an alarm for 10 minutes please." – VA: "Alarm set". A voice assistant as an assistant is someone who helps the user to do their job (https://dictionary.cambridge.org/dictionary/english/assistant, last accessed 04.01.2021). In contrast to the tool, however, the task is not precisely defined, but requires a certain amount of creativity, thinking ahead, or individual responsibility. Moreover, the voice assistant is seen as a person rather than a thing. 72.3% of participants ascribed an assistant role to the voice assistant. For example, P16 would like support in finding presents: U: "Hey google, it's sarah's from works birthday on 22nd January, can you remind me to get her a gift?" – VA: "Hey rami, sure thing, let me put that in the calendar for you. We can put together a list of gift ideas, do you have anything in mind?"
Finally, a voice assistant as a friend knows the user well and has a close, personal relationship with them (https://dictionary.cambridge.org/dictionary/english/friend, last accessed 04.01.2021). Only three participants imagined a closer relationship with their voice assistant – "a best friend who will never betray me. :-)", as P86 put it.

The Role of User Personality
As an overview, Figure 4 shows the correlation coefficients between user personality and the examined measures. Overall, we see positive associations of Conscientiousness, Openness, and Agreeableness with measures of dialogue length (turns, word counts of both user and voice assistant). Further associations stand out for Openness and Trusting the VA (positive), and Neuroticism and Humour (negative).

In addition, we created one generalised LMM for each measure, as described in Section 3.3.2. For brevity, we only report on some of the models here. In particular, to account for the exploratory nature of our analysis, we make this decision based on the uncorrected p-value: That is, we report on all models with a predictor with p<.05. We provide the analysis output of all models in the supplementary material.
Scenario | % of dialogues | Example Status Quo Dialogue
Search | 20.0% | U: "Assistant, search up the film times for Shrek at the Odeon in Liverpool." – VA: "The film times are at 2:00, 2:45 and 5:00." (P170)
Music | 22.4% | U: "Eleonora play Tom Petty on Spotify." – VA: "Playing songs by Tom Petty on Spotify." (P26)
IoT | 32.7% | U: "Google, switch off all home lights at 2am." – VA: "Ok, done, lights will switch off at 2am" (P34)
Volume | 27.8% | U: "Alexa turn the music down to 6." – VA: "Ok." (P18)
Weather | 21.0% | U: "Minerva, what is the weather going to be like in Italy this week?" – VA: "The weather will be mostly sunny in Italy this week." (P186)
Joke | 14.1% | U: "Bubble, it's a party! Tell us something fun and interesting." – VA: "Here are some fun stories i have found on the internet.." (P109)
Conversational | 16.6% | U: "Hey google, play rain sounds." – VA: "Playing rain sounds" (P6)
Alarm | 24.4% | U: "Google, set an alarm for 8am." – VA: "OK, alarm set for 8am." (P34)
Open Scenario | 20.5% | U: "Dotty, reminder for hospital appointment at 3 pm tomorrow." – VA: "Reminder set." (P15)

Table 3: Example dialogues for each scenario which did not fall into any other category. These dialogues can be seen as a depiction of the status quo: a functional task-related request. Percentages refer to their share in all dialogues per scenario.
Figure 4: Spearman correlations of Big Five personality scores and aspects of the dialogues.

Since this is an exploratory analysis, we highlight that significance here is not to be interpreted as confirmatory. Rather, we intend our results to serve the community as pointers for further investigation in future (confirmatory) work.
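For reference, the correlation overview in Figure 4 can be computed along these lines; "traits" and "aspects" (one row per participant: Big Five scores, and per-participant means or proportions of each dialogue measure) are illustrative assumptions, not the authors' code:

```r
# Sketch: Spearman correlations between Big Five scores and dialogue aspects,
# yielding a 5 x k matrix of coefficients as visualised in Figure 4.
rho <- cor(traits, aspects, method = "spearman")
round(rho, 2)
```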
For Opinion, the model had Conscientiousness as a significant negative predictor (β=-0.479, SE=0.239, β_std=-0.352, 95% CI=[-0.946, -0.011], z=-2.01, p<.05), indicating that people who score higher on this personality dimension might prefer voice assistants that less frequently express their own opinions. Based on the coefficient exp(β_std), a one point increase in Conscientiousness results in 0.70 times the chance of including an opinion in the dialogue.

For Humour, the model had Neuroticism as a significant negative predictor (β=-0.797, SE=0.252, β_std=-0.701, 95% CI=[-1.291, -0.302], z=-3.16, p<.01), indicating that people who score higher on this dimension might prefer assistants that less frequently express humour: In this model, a one point increase in Neuroticism results in 0.50 times the chance of including humour in the dialogue.

For Questions (User), the model had Conscientiousness as a significant positive predictor (β=0.198, SE=0.098, β_std=0.150, 95% CI=[0.007, 0.389], z=2.03, p<.05), indicating that people who score higher on this dimension might prefer asking more questions when conversing with voice assistants: In this model, a one point increase in Conscientiousness results in 1.22 times the chance of a user's sentence being a question.
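As a worked check, the chance multipliers reported for the first two models follow directly from exponentiating the standardised coefficients:

\[
e^{-0.352} \approx 0.70 \quad \text{(Opinion, Conscientiousness)}, \qquad
e^{-0.701} \approx 0.50 \quad \text{(Humour, Neuroticism)}.
\]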
LIMITATIONS

Our data, method, and findings are limited in several ways and should be understood with these limitations in mind.

First, while our scenario selection was informed by the most popular real-world use cases for voice assistants [1], our data was collected in an online survey. In contrast to everyday use of voice assistants, where conversations are usually embedded in various real-life situations, this created a more artificial setting which might have influenced the dialogue production [68]. Moreover, users might display different dialogue preferences in practice than in theory.

Second, our dialogues were written down and are therefore limited in what they can tell us about actual, spoken conversations. This concerns, for example, the negotiation of turn-taking, which is usually an important part of conversation analysis [72] but cannot be assessed in our data. We focus on how the envisioned dialogues should be structured in terms of content and the proportion of distinct linguistic behaviour (e.g., whether a dialogue contains social talk). Paralinguistic aspects of speech (e.g., accent, tone) are equally important, but require spoken conversation. However, it might be more difficult for participants to embody both voice assistant and user while inventing a spoken dialogue in a study. Evaluating the content – as we did – may therefore be (initially) more actionable for user-centred design. Also, asking crowdworkers to write dialogues for a conversation flow has been effectively utilised before [14], further demonstrating the potential of written dialogue elicitation for conversational interface design.

Third, as anticipated by our study design, the collected dialogues mainly comprised task-related conversation with clear functional goals. Our findings and implications thus might not generalise to non-task-related dialogues with a fuzzy goal or no goal at all.

Fourth, while writing the dialogues, participants had to anticipate a technology which does not yet exist, at least in the form we requested. We acknowledge that the dialogues might therefore have been influenced by participants’ imagination. More creative participants might have come up with richer dialogues than others.

Finally, it is important to note that, rather than representing gold-standard voice assistant interactions, the dialogues here should be interpreted as the first step in a user-centred design process towards personalised dialogues, based on users’ vision of what perfect voice assistants should do in these tasks. As such, the elicited dialogues can inform the design of personalised voice assistant prototypes in a next step. However, as in any user-centred design process, these dialogues and prototypes must then be evaluated and validated with users. In particular, users’ visions of a perfect voice assistant might change after experiencing the use of such a voice assistant. It is therefore essential to understand the design of personalised voice assistants as a process with several iteration loops.
DISCUSSION

By writing their envisioned dialogues with a voice assistant, participants implicitly painted a picture of the characteristics of their perfect voice assistant. In the following, we analyse and discuss these characteristics.
Our first research question asks how users envision a conversation with a perfect voice assistant. The wide range and diversity of dialogues suggest that there is no single answer. Here, we discuss both common trends and diverging preferences. Moreover, we point out implications for voice assistant design and research.
The majority of people envisions a voice assistant which is smarter and more proactive than today’s agents, and which has personal knowledge about users and their environment. In particular, it gives well thought-through suggestions and recommendations to solve complex problems. The perfect assistant is also foresighted and proactive, anticipating possible next actions. However, users in the dialogues do not always accept their assistant’s suggestions. Together, these findings indicate that, rather than a master-servant relationship [28], users wish for perfect voice assistants to be more collaborative.
Knowledge about the users and their environment may also make conversations with assistants more effective and natural by creating the impression of shared knowledge and common ground, which is integral to the effectiveness of human dialogue [15, 17]. Such shared knowledge is currently missing in the design of voice assistants [17]. Considering our results on questions, interactivity, and “thinking ahead”, this might be realised in current systems by allowing the assistant to proactively ask the user (more) questions at opportune moments. Moreover, knowledge about the user also allows for more personalised suggestions and conversations, which are more likely to appeal to the user.
Another trend in the majority of dialogues is that they are not intended or optimised for fast information retrieval. Current dialogues with voice assistants are characterised by a question-answer structure [28] and a median command length of four words [3]. In contrast, people’s envisioned dialogues comprise longer speech acts and more interactivity, creating the impression of being more conversational. This is further supported by the observed amount of non-task-related talk, such as chit-chat, personal talk, or humour. Hence, there appears to be a demand for more human-like personal conversation with voice assistants than is currently available, despite recent discussions about whether humanness is the best metaphor for interacting with conversational agents [28].

In the long term, the design of voice assistants should aim for multiple-turn conversations. Yet, in the short term, a variety of fillers to begin answers (e.g., “Sure, let me get on to this.”) and closing remarks (e.g., “Enjoy the movie!”) could be used to avoid raising unrealistically high expectations.
People imagine different roles for their perfect voice assistant: 22.0% of dialogues were purely functional, suggesting that the assistant is seen as a tool to get things done. However, the majority of dialogues depicts a helpful assistant who supports the users in their chores and might take over more complex tasks in the future, as suggested by participants in the Open scenario. As a consequence, users feel obliged to follow conversational rules, including saying “thank you” and “please”. The number of participants following these social protocols was higher than expected from previous studies examining interaction with a robot receptionist [46]. It was also surprising that 40.0% of participants included a form of chit-chat, since this kind of small talk was previously flagged as inappropriate and unwelcome [28, 85]. A reason for this difference to related work could be that participants imagined a more intelligent voice assistant than is currently available. Echoing previous findings [17, 28], few participants regarded the voice assistant as a friend. However, the scenarios participants described in the Open task reveal use cases in which the assistant also listens to and advises on personal issues.

Overall, these findings motivate considering such roles as a conceptual basis for the personalisation of voice assistants, beyond or in addition to the currently dominant focus on personality. For instance, people living alone or with a smaller circle of acquaintances might be more likely to seek personal advice from a voice assistant, seeing it as a friend rather than an assistant.
Attitudes diverged on whether the assistant should express its own opinions. One participant welcomed this option: “[w]e all need a bit of help from time to time and advice, and yet sometimes there is nobody to talk to. The option to ask [for] an opinion would be a great thing to have at anybody’s disposal.”
In addition, most participants did not link their voice assistant to a particular company, which might also influence trust in its opinions. In certain situations, a small proportion of participants seems to accept a voice assistant which contradicts the user. In one of our scenarios, participants heeded the voice assistant’s objection to avoid a conflict with the neighbours. Future work could leverage this knowledge and evaluate the effectiveness of persuasive voice assistants for other topics, such as supporting a healthy lifestyle or environmentally friendly behaviour. In the short term, a voice assistant could offer its “own” opinions from time to time, yet only after the user has asked it to do so at least once.
Humour is considered an integral part of conversations with humans as well as an interesting novelty feature and entry point for voice assistants [17, 48]. Our findings suggest that there are individual preferences for humour: More than half of the participants did not equip their perfect assistant with a sense of humour, although they were given the task to entertain their friends. Three people even let the user comment on the voice assistant’s lack of “actual” humour. On the other hand, others acknowledged the assistant’s wittiness. Apart from the Joke scenario, humour was often included in the form of comments on the situation, for example, alluding to the user’s film choice or habits such as snoring. This kind of humour seems currently difficult to implement. Overall, our findings thus imply approaching humour carefully in voice assistant design today.
Comparing people’s vision of a perfect voice assistant with commercially available voice assistants today (most prominently, Amazon’s Alexa, Apple’s Siri, and the Google Assistant), one of the most notable differences concerns the delivery of recommendations and suggestions. While most participants included suggestions and recommendations in their envisioned dialogues, today’s voice assistants include recommendations only sparsely. When prompted with the scenarios used in our study, Alexa, for example, does not offer any recommendations on what to pack for a trip, while Siri only tells the weather without a specific suggestion when asked what to wear today. On the other hand, Alexa offers different suggestions for activities based on the user’s current mood. Since this form of suggestion seems to be valued by people, voice assistants could offer such features more extensively. However, the envisioned dialogues suggest that personalising suggestions to individual users is likely to be challenging.

Notably, today’s commercial voice assistants already implement a kind of humour similar to what people envisioned in their dialogues. For example, when asked for a good night story, Siri sarcastically asks whether the user would like a glass of warm milk next. The Google Assistant jokingly suggests overtone singing upon being asked for music recommendations. Conversely, commercial voice assistants avoid giving an opinion. For example, when asked whether the music is too loud, Siri, Alexa, and the Google Assistant turn down the volume instead of answering the question. While this seems reasonable at the moment to avoid raising expectations [28], future voice assistants might carefully assess whether their user enjoys humour and opinions and correspondingly decide whether to incorporate them. For example, a voice assistant could consider the current volume, the time of day, the user’s living situation, and past music behaviour to give an opinion.

Besides, the envisioned perfect voice assistant seems to be able to “think” more independently by directly presenting an answer, while commercial voice assistants often fall back on web searches. For example, when asked for movie times, most participants envisioned the voice assistant giving an immediate answer, whereas Siri presents a web search with the results.
In summary, most people envisioned dialogues with a perfect voice assistant that were highly interactive and not purely functional; the assistant is smart, proactive, and has personalised knowledge about the user. On the other hand, people’s attitudes towards the assistant’s role and its expressing humour and opinions diverged. The envisioned characteristics echo previous findings on the need to convey voice assistant skills through dialogue [48] and that few users see a voice assistant as a friend [17, 28], while expanding on the importance of different user requirements for conversational skills missing at present [68]. They challenge the assumption that users feel voice assistants should not use opinions, humour, or social talk [28] – some users welcome this for a perfect voice assistant. To formalise these findings, we conclude this section using the ten dimensions for conversational agent personality by Völkel et al. [84]: The assistant personality envisioned here seems to be high on Serviceable, Approachable, Social-Inclined, and Social-Assistant, and low on Confrontational, Unstable, and Artificial. With respect to the dimensions Social-Entertaining and Self-Conscious, participants seemed to have mixed opinions.
Our exploratory analysis indicates a limited effect of personality on people’s vision of a perfect voice assistant. Moreover, the significant results are to be interpreted with caution due to the number of tests performed. Figure 4 shows correlations comparable to previous research [54, 67].

Our results suggest that Neuroticism has a small negative relationship with humour. Neurotic individuals tend to perceive new technologies as less useful and often experience negative emotions when using them, since they associate them with stress [24]. Besides, the Joke scenario described a situation in which the user is responsible for entertaining friends – a potentially stressful situation for a neurotic user. Therefore, neurotic individuals might prefer staying in control of the situation by telling the voice assistant exactly what to do instead of relying on its sense of humour.

Our LMM analysis indicates Conscientiousness as a negative predictor for the assistant offering an opinion. When seeking information, conscientious people are described as deep divers, valuing high-quality information and structured deep analysis [38]. It thus seems fitting that these users prefer their assistant to provide fact-based instead of opinionated knowledge, in particular since it is difficult to assess the quality of this information.

The correlations further suggest a small positive relation between Openness and trusting the assistant with complex tasks. Individuals who score high on Openness are intellectually curious and were found to be early adopters of new technology [87]. Hence, it seems likely they might be more willing to try out new use cases. However, it could also be possible that this correlation stems from open individuals’ higher creativity.

Our findings do not indicate any meaningful relationship between Extraversion and the characteristics of an envisioned conversation with a perfect voice assistant. This is surprising since the relationship between extraversion and linguistic features is usually most pronounced [4, 25, 33, 59, 67].

Summing up, our findings give first pointers to potential relationships between Big Five personality traits and characteristics of the envisioned dialogue with a perfect voice assistant. However, this relationship might be less pronounced than could have been expected from related work. A reason for this lack of effect could be that our work only concentrates on the linguistic content of a dialogue, while previous work particularly synthesised personality from paraverbal features (e.g. [45]). This opens up opportunities for future work, which we discuss in the following section.
CONCLUSION

While recent work has emphasised the gulf between user expectations and voice assistant capabilities [21, 48, 68], little has been known about what users actually do want. To address this gap, we contribute a systematic empirical analysis of users’ vision of a conversation with a perfect voice assistant, based on 1,845 dialogues written by 205 participants in an online study.

Overall, our dialogues reveal a preference for human-like conversations with voice assistants, which go beyond being purely functional. In particular, they imply assistants that are smart, proactive, and include knowledge about the user. We further found varying user preferences for the assistant’s role, as well as its expression of humour and opinions.

Since these differences between users can only be explained to a limited extent by their personality, future research should examine other user characteristics more closely to shed further light on how to make the interaction experience more personal. For example, user preference for a particular role of the voice assistant could also be due to age or current living situation. Our work also suggests that a perfect voice assistant adapts to different situations. Thus, exploring the usage context and its influence on users’ vision can be another starting point for future research. Finally, our work points to the importance of a trustworthy voice assistant that acts in the user’s best interest. Given recent eavesdropping scandals involving voice assistants in users’ homes [32], future work should examine how this trust can be built while at the same time integrating the interests of companies.

In a wider view, our study underlines the vision of enabling conversational UIs, rather than single-command “Q&As”. Towards this vision, our method was effective in enabling people to depict potential experiences anchored by existing concrete use cases. Looking ahead, allowing people to draw upon their own creativity and experiences seems particularly promising in the context of user-centred design of technologies that are envisioned to permeate users’ everyday lives.

Beyond our analysis here, we release the collected dataset to the community to support further research:
ACKNOWLEDGMENTS
We greatly thank Robin Welsch, Sven Mayer, and Ville Mäkelä for their helpful feedback on the manuscript. This project is partly funded by the Bavarian State Ministry of Science and the Arts and coordinated by the Bavarian Research Institute for Digital Transformation (bidt), and the Science Foundation Ireland ADAPT Centre (13/RC/2106).
REFERENCES
[1] Tawfiq Ammari, Jofish Kaye, Janice Y. Tsai, and Frank Bentley. 2019. Music, Search, and IoT: How People (Really) Use Voice Assistants. ACM Trans. Comput.-Hum. Interact. 26, 3, Article 17 (April 2019), 28 pages. https://doi.org/10.1145/3311956
[2] Douglas Bates, Martin Mächler, Ben Bolker, and Steve Walker. 2015. Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software 67, 1 (2015), 1–48. https://doi.org/10.18637/jss.v067.i01
[3] Frank Bentley, Chris Luvogt, Max Silverman, Rushani Wirasinghe, Brooke White, and Danielle Lottridge. 2018. Understanding the Long-Term Use of Smart Speaker Assistants. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2, 3, Article 91 (Sept. 2018), 24 pages. https://doi.org/10.1145/3264901
[4] Camiel J. Beukeboom, Martin Tanis, and Ivar E. Vermeulen. 2013. The language of extraversion: Extraverted people talk more abstractly, introverts are more concrete. Journal of Language and Social Psychology 32, 2 (2013), 191–201. https://doi.org/10.1177/0261927X12460844
[5] Timothy Bickmore and Justine Cassell. 2001. Relational Agents: A Model and Implementation of Building User Trust. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Seattle, Washington, USA) (CHI ’01). ACM, New York, NY, USA, 396–403. https://doi.org/10.1145/365024.365304
[6] Timothy Bickmore and Justine Cassell. 2005. Social Dialogue with Embodied Conversational Agents. In Advances in Natural Multimodal Dialogue Systems, Jan C. J. van Kuppevelt, Laila Dybkjær, and Niels Ole Bernsen (Eds.). Springer Netherlands, Dordrecht, 23–54. https://doi.org/10.1007/1-4020-3933-6_2
[7] Holly P. Branigan, Martin J. Pickering, Jamie Pearson, and Janet F. McLean. 2010. Linguistic alignment between people and computers. Journal of Pragmatics 42, 9 (2010), 2355–2368. https://doi.org/10.1016/j.pragma.2009.12.012
[8] Holly P. Branigan, Martin J. Pickering, Jamie Pearson, Janet F. McLean, and Ash Brown. 2011. The role of beliefs in lexical alignment: Evidence from dialogues with humans and computers. Cognition 121, 1 (2011), 41–57.
[9] Michael Braun, Anja Mainz, Ronee Chadowitz, Bastian Pfleging, and Florian Alt. 2019. At Your Service: Designing Voice Assistant Personalities to Improve Automotive User Interfaces. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (Glasgow, Scotland, UK) (CHI ’19). ACM, New York, NY, USA, Article 40, 11 pages. https://doi.org/10.1145/3290605.3300270
[10] Donn Erwin Byrne. 1971. The attraction paradigm. Academic Press, Cambridge, MA, USA.
[11] Angelo Cafaro, Hannes Högni Vilhjálmsson, and Timothy Bickmore. 2016. First Impressions in Human–Agent Virtual Encounters. ACM Trans. Comput.-Hum. Interact. 23, 4, Article 24 (Aug. 2016), 40 pages. https://doi.org/10.1145/2940325
[12] Anne Campbell and J. Philippe Rushton. 1978. Bodily communication and personality. British Journal of Social and Clinical Psychology 17, 1 (1978), 31–36. https://doi.org/10.1111/j.2044-8260.1978.tb00893.x
[13] D.W. Carment, C.G. Miles, and V.B. Cervin. 1965. Persuasiveness and persuasibility as related to intelligence and extraversion. British Journal of Social and Clinical Psychology 4, 1 (1965), 1–7. https://doi.org/10.1111/j.2044-8260.1965.tb00433.x
[14] Yoonseo Choi, Toni-Jan Keith Palma Monserrat, Jeongeon Park, Hyungyu Shin, Nyoungwoo Lee, and Juho Kim. 2021. ProtoChat: Supporting the Conversation Design Process with Crowd Feedback. Proc. ACM Hum.-Comput. Interact.
[15] Herbert H. Clark. 1996. Using language. Cambridge University Press, Cambridge, UK.
[16] Leigh Clark, Philip Doyle, Diego Garaialde, Emer Gilmartin, Stephan Schlögl, Jens Edlund, Matthew Aylett, João Cabral, Cosmin Munteanu, Justin Edwards, and Benjamin R. Cowan. 2019. The State of Speech in HCI: Trends, Themes and Challenges. Interacting with Computers 31, 4 (2019), 349–371. https://doi.org/10.1093/iwc/iwz016
[17] Leigh Clark, Nadia Pantidi, Orla Cooney, Philip Doyle, Diego Garaialde, Justin Edwards, Brendan Spillane, Emer Gilmartin, Christine Murad, Cosmin Munteanu, Vincent Wade, and Benjamin R. Cowan. 2019. What Makes a Good Conversation?: Challenges in Designing Truly Conversational Agents. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (Glasgow, Scotland, UK) (CHI ’19). ACM, New York, NY, USA, Article 475, 12 pages. https://doi.org/10.1145/3290605.3300705
[18] Victoria Clarke, Nikki Hayfield, Naomi Moller, and Irmgard Tischner. 2017. Once Upon A Time?: Story Completion Methods. In Collecting Qualitative Data: A Practical Guide to Textual, Media and Virtual Techniques, Virginia Braun, Victoria Clarke, and Debra Gray (Eds.). Cambridge University Press, Cambridge, UK, 45–70. http://oro.open.ac.uk/48404/
[19] Benjamin R. Cowan, Holly P. Branigan, Mateo Obregón, Enas Bugis, and Russell Beale. 2015. Voice anthropomorphism, interlocutor modelling and alignment effects on syntactic choices in human-computer dialogue. International Journal of Human-Computer Studies 83 (2015), 27–42. https://doi.org/10.1016/j.ijhcs.2015.05.008
[20] Benjamin R. Cowan, Derek Gannon, Jenny Walsh, Justin Kinneen, Eanna O’Keefe, and Linxin Xie. 2016. Towards Understanding How Speech Output Affects Navigation System Credibility. In Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems (San Jose, California, USA) (CHI EA ’16). ACM, New York, NY, USA, 2805–2812. https://doi.org/10.1145/2851581.2892469
[21] Benjamin R. Cowan, Nadia Pantidi, David Coyle, Kellie Morrissey, Peter Clarke, Sara Al-Shehri, David Earley, and Natasha Bandeira. 2017. “What Can I Help You with?”: Infrequent Users’ Experiences of Intelligent Personal Assistants. In Proceedings of the 19th International Conference on Human-Computer Interaction with Mobile Devices and Services (Vienna, Austria) (MobileHCI ’17). ACM, New York, NY, USA, Article 43, 12 pages. https://doi.org/10.1145/3098279.3098539
[22] Nils Dahlbäck, QianYing Wang, Clifford Nass, and Jenny Alwin. 2007. Similarity is More Important than Expertise: Accent Effects in Speech Interfaces. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (San Jose, California, USA) (CHI ’07). ACM, New York, NY, USA, 1553–1556. https://doi.org/10.1145/1240624.1240859
[23] Boele de Raad. 2000. The Big Five Personality Factors: The psycholexical approach to personality. Hogrefe & Huber Publishers, Göttingen, Germany.
[24] Sarv Devaraj, Robert F. Easley, and J. Michael Crant. 2008. Research Note – How Does Personality Matter? Relating the Five-Factor Model to Technology Acceptance and Use. Information Systems Research 19, 1 (2008), 93–105. https://doi.org/10.1287/isre.1070.0153
[25] Jean-Marc Dewaele and Adrian Furnham. 2000. Personality and speech production: A pilot study of second language learners. Personality and Individual Differences 28, 2 (2000), 355–365. https://doi.org/10.1016/S0191-8869(99)00106-3
[26] Colin G. DeYoung. 2014. Openness/Intellect: A dimension of personality reflecting cognitive exploration. In APA Handbook of Personality and Social Psychology: Personality Processes and Individual Differences, M. Mikulincer, P.R. Shaver, M.L. Cooper, and R.J. Larsen (Eds.). Vol. 4. American Psychological Association, Washington, DC, USA, 369–399. https://doi.org/10.1037/14343-017
[27] Ed Diener, Ed Sandvik, William Pavot, and Frank Fujita. 1992. Extraversion and subjective well-being in a US national probability sample. Journal of Research in Personality 26, 3 (1992), 205–215. https://doi.org/10.1016/0092-6566(92)90039-7
[28] Philip R. Doyle, Justin Edwards, Odile Dumbleton, Leigh Clark, and Benjamin R. Cowan. 2019. Mapping Perceptions of Humanness in Intelligent Personal Assistant Interaction. In Proceedings of the 21st International Conference on Human-Computer Interaction with Mobile Devices and Services (Taipei, Taiwan) (MobileHCI ’19). ACM, New York, NY, USA, Article 5, 12 pages. https://doi.org/10.1145/3338286.3340116
[29] Robin Dunbar and Robin Ian MacDonald Dunbar. 1998. Grooming, gossip, and the evolution of language. Harvard University Press, Boston, MA, USA.
[30] Patrick Ehrenbrink, Seif Osman, and Sebastian Möller. 2017. Google Now is for the Extraverted, Cortana for the Introverted: Investigating the Influence of Personality on IPA Preference. In Proceedings of the 29th Australian Conference on Computer-Human Interaction (Brisbane, Queensland, Australia) (OZCHI ’17). ACM, New York, NY, USA, 257–265. https://doi.org/10.1145/3152771.3152799
[31] Franz Faul, Edgar Erdfelder, Axel Buchner, and Albert-Georg Lang. 2009. Statistical power analyses using G*Power 3.1: Tests for correlation and regression analyses. Behavior Research Methods 41, 4 (2009), 1149–1160.
[33] Adrian Furnham. 1990. Language and personality. In Handbook of Language and Social Psychology, William Peter Robinson and Howard Giles (Eds.). John Wiley & Sons, Chichester, UK, 73–95.
[34] Emer Gilmartin, Brendan Spillane, Maria O’Reilly, Ketong Su, Christian Saam, Benjamin R. Cowan, Nick Campbell, and Vincent Wade. 2017. Dialog Acts in Greeting and Leavetaking in Social Talk. In Proceedings of the 1st ACM SIGCHI International Workshop on Investigating Social Interactions with Artificial Agents (Glasgow, UK) (ISIAA 2017). ACM, New York, NY, USA, 29–30. https://doi.org/10.1145/3139491.3139493
[35] Lewis R. Goldberg. 1981. Language and individual differences: The search for universals in personality lexicons. In Review of Personality and Social Psychology, L. Wheeler (Ed.). Vol. 2. Sage Publications, Beverly Hills, CA, USA, 141–166.
[36] Alexander G. Hauptmann and Alexander I. Rudnicky. 1988. Talking to computers: an empirical investigation. International Journal of Man-Machine Studies 28, 6 (1988), 583–604. https://doi.org/10.1016/S0020-7373(88)80062-2
[37] Carrie Heeter. 1992. Being There: The Subjective Experience of Presence. Presence: Teleoperators and Virtual Environments 1, 2 (1992), 262–271. https://doi.org/10.1162/pres.1992.1.2.262
[38] Jannica Heinström. 2005. Fast surfing, broad scanning and deep diving: The influence of personality and study approach on students’ information-seeking behavior. Journal of Documentation 61, 2 (2005), 228–247. https://doi.org/10.1108/00220410510585205
[39] Joshua J. Jackson, Dustin Wood, Tim Bogg, Kate E. Walton, Peter D. Harms, and Brent W. Roberts. 2010. What do conscientious people do? Development and validation of the Behavioral Indicators of Conscientiousness (BIC). Journal of Research in Personality 44, 4 (2010), 501–511. https://doi.org/10.1016/j.jrp.2010.06.005
[40] Lauri A. Jensen-Campbell and William G. Graziano. 2001. Agreeableness as a moderator of interpersonal conflict. Journal of Personality 69, 2 (2001), 323–362. https://doi.org/10.1111/1467-6494.00148
[41] Bret Kinsella and Ava Mutchler. 2020. In-car Voice Assistant Consumer Adoption Report. http://voicebot.ai/wp-content/uploads/2020/02/in_car_voice_assistant_consumer_adoption_report_2020_voicebot.pdf, accessed July 30, 2020.
[42] Bret Kinsella and Ava Mutchler. 2020. Smart Speaker Consumer Adoption Report 2020. https://research.voicebot.ai/report-list/smart-speaker-consumer-adoption-report-2020/, accessed July 30, 2020.
[43] Alexandra Kuznetsova, Per B. Brockhoff, and Rune H. B. Christensen. 2017. lmerTest Package: Tests in Linear Mixed Effects Models. Journal of Statistical Software 82, 13 (2017), 1–26. https://doi.org/10.18637/jss.v082.i13
[44] John Laver. 1981. Linguistic routines and politeness in greeting and parting. In Conversational Routine, F. Coulmas (Ed.). Mouton Publisher, The Hague, Netherlands, 289–304.
[45] Kwan Min Lee and Clifford Nass. 2003. Designing Social Presence of Social Actors in Human Computer Interaction. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Ft. Lauderdale, Florida, USA) (CHI ’03). ACM, New York, NY, USA, 289–296. https://doi.org/10.1145/642611.642662
[46] Min Kyung Lee and Maxim Makatchev. 2009. How Do People Talk with a Robot? An Analysis of Human-Robot Dialogues in the Real World. In CHI ’09 Extended Abstracts on Human Factors in Computing Systems (Boston, MA, USA) (CHI EA ’09). ACM, New York, NY, USA, 3769–3774. https://doi.org/10.1145/1520340.1520569
[47] Gesa Alena Linnemann and Regina Jucks. 2018. ‘Can I Trust the Spoken Dialogue System Because It Uses the Same Words as I Do?’ – Influence of Lexically Aligned Spoken Dialogue Systems on Trustworthiness and User Satisfaction. Interacting with Computers 30, 3 (2018), 173–186. https://doi.org/10.1093/iwc/iwy005
[48] Ewa Luger and Abigail Sellen. 2016. “Like Having a Really Bad PA”: The Gulf between User Expectation and Experience of Conversational Agents. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (San Jose, California, USA) (CHI ’16). ACM, New York, NY, USA, 5286–5297. https://doi.org/10.1145/2858036.2858288
[49] François Mairesse and Marilyn A. Walker. 2010. Towards personality-based user adaptation: psychologically informed stylistic language generation. User Modeling and User-Adapted Interaction 20, 3 (2010), 227–278. https://doi.org/10.1007/s11257-010-9076-2
[50] Gerald Matthews, Ian J. Deary, and Martha C. Whiteman. 2003. Personality traits. Cambridge University Press, Cambridge, UK.
[51] Robert R. McCrae and Paul T. Costa. 2008. A five-factor theory of personality. In Handbook of Personality: Theory and Research, O.P. John, R.W. Robins, and L.A. Pervin (Eds.). Vol. 3. The Guilford Press, New York, NY, USA, 159–181.
[52] Robert R. McCrae and Oliver P. John. 1992. An introduction to the five-factor model and its applications. Journal of Personality 60, 2 (1992), 175–215. https://doi.org/10.1111/j.1467-6494.1992.tb00970.x
[53] J. Murray McNiel and William Fleeson. 2006. The causal effects of extraversion on positive affect and neuroticism on negative affect: Manipulating state extraversion and state neuroticism in an experimental approach. Journal of Research in Personality 40, 5 (2006), 529–550. https://doi.org/10.1016/j.jrp.2005.05.003
[54] Matthias R. Mehl, Samuel D. Gosling, and James W. Pennebaker. 2006. Personality in its natural habitat: Manifestations and implicit folk theories of personality in daily life. Journal of Personality and Social Psychology 90, 5 (2006), 862–877. https://doi.org/10.1037/0022-3514.90.5.862
[55] Lotte Meteyard and Robert A.I. Davies. 2020. Best practice guidance for linear mixed-effects models in psychological science. Journal of Memory and Language 112 (2020), 104092. https://doi.org/10.1016/j.jml.2020.104092
[56] Clifford Nass and Scott Brave. 2005. Wired for speech: How voice activates and advances the human-computer relationship. MIT Press, Cambridge, MA, USA.
[57] Clifford Nass and Kwan Min Lee. 2001. Does computer-synthesized speech manifest personality? Experimental tests of recognition, similarity-attraction, and consistency-attraction. Journal of Experimental Psychology: Applied 7, 3 (2001), 171. https://doi.org/10.1037/1076-898X.7.3.171
[58] Clifford Nass, Jonathan Steuer, and Ellen R. Tauber. 1994. Computers Are Social Actors. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Boston, Massachusetts, USA) (CHI ’94). ACM, New York, NY, USA, 72–78. https://doi.org/10.1145/191666.191703
[59] Jon Oberlander and Alastair J. Gill. 2004. Individual differences and implicit language: personality, parts-of-speech and pervasiveness. In Proceedings of the Annual Meeting of the Cognitive Science Society.
[61] Sharon Oviatt. 1995. Predicting spoken disfluencies during human-computer interaction. Computer Speech and Language 9, 1 (1995), 19–36. https://doi.org/10.1006/csla.1995.0002
[62] Sharon Oviatt. 1996. User-centered modeling for spoken language and multimodal interfaces. IEEE MultiMedia 3, 4 (1996), 26–35. https://doi.org/10.1109/93.556458
[63] Sharon Oviatt and Bridget Adams. 2000. Designing and evaluating conversational interfaces with animated characters. In Embodied Conversational Agents, J. Cassell, J. Sullivan, S. Prevost, and E. Churchill (Eds.). MIT Press, Cambridge, MA, USA, 319–343.
[64] Sharon Oviatt, Jon Bernard, and Gina-Anne Levow. 1998. Linguistic Adaptations During Spoken and Multimodal Error Resolution. Language and Speech 41, 3 (1998), 419–442. https://doi.org/10.1177/002383099804100409
[65] M. Patterson and D.S. Holmes. 1966. Social Interaction Correlates of the MMPI Extraversion-Introversion Scale. American Psychologist 21 (1966), 724–725.
[66] Hannah R.M. Pelikan and Mathias Broth. 2016. Why That Nao? How Humans Adapt to a Conventional Humanoid Robot in Taking Turns-at-Talk. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (San Jose, California, USA) (CHI ’16). ACM, New York, NY, USA, 4921–4932. https://doi.org/10.1145/2858036.2858478
[67] James W. Pennebaker and Laura A. King. 1999. Linguistic styles: Language use as an individual difference. Journal of Personality and Social Psychology 77, 6 (1999), 1296–1312. https://doi.org/10.1037/0022-3514.77.6.1296
[68] Martin Porcheron, Joel E. Fischer, Stuart Reeves, and Sarah Sharples. 2018. Voice Interfaces in Everyday Life. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (Montreal, QC, Canada) (CHI ’18). ACM, New York, NY, USA, 1–12. https://doi.org/10.1145/3173574.3174214
[69] Byron Reeves and Clifford Ivar Nass. 1996. The media equation: How people treat computers, television, and new media like real people and places. Cambridge University Press, Cambridge, UK.
[70] Stuart Reeves. 2019. Conversation Considered Harmful?. In Proceedings of the 1st International Conference on Conversational User Interfaces (Dublin, Ireland) (CUI ’19). ACM, New York, NY, USA, Article 10, 3 pages. https://doi.org/10.1145/3342775.3342796
[71] D. R. Rutter, Ian E. Morley, and Jane C. Graham. 1972. Visual interaction in a group of introverts and extraverts. European Journal of Social Psychology 2, 4 (1972), 371–384. https://doi.org/10.1002/ejsp.2420020403
[72] Harvey Sacks, Emanuel Schegloff, and Gail Jefferson. 1974. A simplest systematics for the organization of turn-taking for conversation. Language 50, 4 (1974), 696–735. https://doi.org/10.1353/lan.1974.0010
[73] Klaus Rainer Scherer. 1979. Personality markers in speech. In Social Markers in Speech, Klaus Rainer Scherer and Howard Giles (Eds.). Cambridge University Press, Cambridge, UK.
[74] Michael Schmitz, Antonio Krüger, and Sarah Schmidt. 2007. Modelling Personality in Voices of Talking Products through Prosodic Parameters. In Proceedings of the 12th International Conference on Intelligent User Interfaces (Honolulu, Hawaii, USA) (IUI ’07). ACM, New York, NY, USA, 313–316. https://doi.org/10.1145/1216295.1216355
[75] Christopher J. Soto and Oliver P. John. 2017. The next Big Five Inventory (BFI-2): Developing and assessing a hierarchical model with 15 facets to enhance bandwidth, fidelity, and predictive power. Journal of Personality and Social Psychology 113, 1 (2017), 117–143.
[76] In Proceedings of the 1st ACM SIGCHI International Workshop on Investigating Social Interactions with Artificial Agents (Glasgow, UK) (ISIAA 2017). ACM, New York, NY, USA, 43–44. https://doi.org/10.1145/3139491.3139492
[77] Eva Székely, Joseph Mendelson, and Joakim Gustafson. 2017. Synthesising Uncertainty: The Interplay of Vocal Effort and Hesitation Disfluencies. In Proc. Interspeech. International Speech Communication Association, Baixas, France, 804–808. https://doi.org/10.21437/Interspeech.2017-1507
[78] Madiha Tabassum, Tomasz Kosiński, Alisa Frik, Nathan Malkin, Primal Wijesekera, Serge Egelman, and Heather Richter Lipford. 2019. Investigating Users’ Preferences and Expectations for Always-Listening Voice Assistants. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 3, 4, Article 153 (Dec. 2019), 23 pages. https://doi.org/10.1145/3369807
[79] Jürgen Trouvain, Sarah Schmidt, Marc Schröder, Michael Schmitz, and William J. Barry. 2006. Modelling personality features by changing prosody in synthetic speech. In Proceedings of the 3rd International Conference on Speech Prosody. TUDpress, Dresden, Germany, 4. https://doi.org/10.22028/D291-25920
[80] Santiago Villarreal-Narvaez, Jean Vanderdonckt, Radu-Daniel Vatavu, and Jacob O. Wobbrock. 2020. A Systematic Review of Gesture Elicitation Studies: What Can We Learn from 216 Studies?. In Proceedings of the 2020 ACM Designing Interactive Systems Conference (Eindhoven, Netherlands) (DIS ’20). ACM, New York, NY, USA, 855–872. https://doi.org/10.1145/3357236.3395511
[81] Alessandro Vinciarelli and Gelareh Mohammadi. 2014. A survey of personality computing. IEEE Transactions on Affective Computing 5, 3 (2014), 273–291. https://doi.org/10.1109/TAFFC.2014.2330816
[82] James Vlahos. 2019. Talk to Me: How Voice Computing Will Transform the Way We Live, Work, and Think. Houghton Mifflin Harcourt, Boston, MA, USA.
[83] Sarah Theres Völkel, Penelope Kempf, and Heinrich Hussmann. 2020. Personalised Chats with Voice Assistants: The User Perspective. In Proceedings of the 2nd Conference on Conversational User Interfaces (Bilbao, Spain) (CUI ’20). ACM, New York, NY, USA, Article 53, 4 pages. https://doi.org/10.1145/3405755.3406156
[84] Sarah Theres Völkel, Ramona Schödel, Daniel Buschek, Clemens Stachl, Verena Winterhalter, Markus Bühner, and Heinrich Hussmann. 2020. Developing a Personality Model for Speech-Based Conversational Agents Using the Psycholexical Approach. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI ’20). ACM, New York, NY, USA, 1–14. https://doi.org/10.1145/3313831.3376210
[85] Yorick Wilks. 2010. Close engagements with artificial companions: key social, psychological, ethical and design issues. Vol. 8. John Benjamins Publishing, Amsterdam, Netherlands.
[86] Yunhan Wu, Justin Edwards, Orla Cooney, Anna Bleakley, Philip R. Doyle, Leigh Clark, Daniel Rough, and Benjamin R. Cowan. 2020. Mental Workload and Language Production in Non-Native Speaker IPA Interaction. In Proceedings of the 2nd Conference on Conversational User Interfaces (Bilbao, Spain) (CUI ’20). ACM, New York, NY, USA, Article 3, 8 pages. https://doi.org/10.1145/3405755.3406118
[87] Hyun Shik Yoon and Linsey M. Barker Steege. 2013. Development of a quantitative model of the impact of customers’ personality and perceptions on Internet banking use. Computers in Human Behavior 29, 3 (2013), 1133–1141. https://doi.org/10.1016/j.chb.2012.10.005
[88] Michelle X. Zhou, Gloria Mark, Jingyi Li, and Huahai Yang. 2019. Trusting Virtual Agents: The Effect of Personality.