Artificial intelligence in communication impacts language and social relationships
Jess Hohenstein, Dominic DiFranzo, Rene F. Kizilcec, Zhila Aghajari, Hannah Mieczkowski, Karen Levy, Mor Naaman, Jeff Hancock, Malte Jung
Department of Information Science, Cornell University, Ithaca, NY 14853, USA
Department of Computer Science and Engineering, Lehigh University, Bethlehem, PA 18015, USA
Department of Communication, Stanford University, Stanford, CA 94305, USA
Cornell Tech, New York, NY 10044, USA
* [email protected]; [email protected]

ABSTRACT
Artificial intelligence (AI) is now widely used to facilitate social interaction, but its impact on social relationships and communication is not well understood. We study the social consequences of one of the most pervasive AI applications: algorithmic response suggestions ("smart replies"). Two randomized experiments (n = 1036) provide evidence that a commercially-deployed AI changes how people interact with and perceive one another in pro-social and anti-social ways. We find that using algorithmic responses increases communication efficiency, use of positive emotional language, and positive evaluations by communication partners. However, consistent with common assumptions about the negative implications of AI, people are evaluated more negatively if they are suspected to be using algorithmic responses. Thus, even though AI can increase communication efficiency and improve interpersonal perceptions, it risks changing users' language production and continues to be viewed negatively.
Introduction
Communication is the basic process through which people form perceptions of others, build and maintain social relationships, and achieve cooperative outcomes. Applications of artificial intelligence (AI) are increasingly shaping the way that people communicate and interact with one another. One of the most visible AI applications is AI-generated reply suggestions in text-based communication, commonly known as smart replies, which aim to help users compose messages with "just one tap". Despite the rapid deployment of AI applications in new products and contexts and people's growing concerns about the societal consequences of AI, research has predominantly focused on the technical aspects and largely ignored the potential social impacts of integrating AI-generated messages into human-to-human communication. Reports from the AI Now Institute liken this scenario to "...conducting an experiment without bothering to note the results" and have repeatedly noted the under-investment in research on the social implications of AI technologies while calling for an increase in interdisciplinary research focusing on examining these systems within human populations.

As of last year, algorithmic responses constituted 12% of messages sent through Gmail alone, representing about 6.7 billion emails written by AI on our behalf each day. Smart reply systems aim to make text production more efficient by drawing from general and user-specific text corpora to predict what a person might type and generating one or more suggested responses that a user can choose from when responding to a message. Users' rapid adoption of this type of AI in interpersonal communication has been facilitated by a large body of technical research regarding various methods for generating algorithmic responses (e.g., ).
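To make the retrieval idea concrete, the suggestion step can be caricatured as ranking a small pool of canned replies against an incoming message. This is only an illustrative sketch with a hypothetical candidate list; production systems such as Gmail's use large learned models over massive corpora rather than simple word overlap.

```python
import re
from collections import Counter

# Hypothetical candidate pool; real systems learn candidates from corpora.
CANDIDATES = ["Sounds good!", "I agree.", "Not sure about that.",
              "Can you say more?", "Thanks!"]

def _tokens(text):
    # Lowercase word tokens with punctuation stripped.
    return Counter(re.findall(r"[a-z']+", text.lower()))

def suggest_replies(incoming, k=3):
    """Rank canned candidates by crude word overlap with the incoming
    message and return the top k as 'smart reply' suggestions."""
    query = _tokens(incoming)
    def overlap(candidate):
        return sum((query & _tokens(candidate)).values())
    return sorted(CANDIDATES, key=overlap, reverse=True)[:k]
```

For an incoming message like "do you agree with me?", this toy ranker surfaces "I agree." among its top suggestions, mirroring the one-tap interaction pattern described above.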
However, the social implications of this type of AI involvement remain largely unclear. Given the broad integration of AI systems like these in our social lives, a growing body of work at the intersection of computer and social science is concerned with understanding how such systems may be influencing human behavior and how they are perceived (e.g., ). Initial studies have found that algorithmic responses can impact how people write, and people believe that the mere presence of smart replies influences the way that they communicate, in part because of the linguistic skew of smart replies, which tend to express more positive emotions. However, the social implications of smart reply use remain unclear.

To examine the social consequences of using AI to help generate messages, we conducted a set of randomized controlled experiments to study how the display and use of AI-generated smart replies in real-time text-based communication affects how people interact and perceive each other. We show that a commercially-deployed AI affects various aspects of interpersonal communication. More specifically, we find that AI influences multiple dimensions of social engagement, including communication efficiency, emotional tone, and interpersonal evaluations, in both positive and negative ways.

Results
AI is Perceived Negatively but Improves Interpersonal Perceptions
Inspired by theories of how computer-mediated communication can affect intimacy and relationship maintenance, we hypothesized that seeing AI-generated reply suggestions could influence participants' feelings of connectedness with their conversation partner. To test the effect of AI mediation on interpersonal trait inferences and perceptions of cooperativeness, we developed a messaging application in which we can manipulate the smart replies that are displayed to users while collecting data about the conversation.

To identify the effects and perceptions of using algorithmic responses in conversation (beyond merely being presented with them), we randomly assigned 219 pairs of participants to three different messaging conditions: 1) both participants can use smart replies (i.e., suggested responses generated using the Google Reply API), 2) only one participant can use smart replies, or 3) neither participant can use smart replies. Participants engaged in a conversation about a policy issue while the application tracked their use of smart replies. Presenting participants with smart replies encouraged them to use them in conversation, which serves as our causal identification strategy for estimating the effects of smart reply use by the self and the partner. After completing the conversation, participants were given a definition of smart replies and asked to rate how often they believed that their partner had used them. They also responded to established measures of dominance and affiliation (Revised Interpersonal Adjective Scales) and perceived cooperative communication.
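The aggregation of the 16 IAS-R adjective ratings into dominance (DOM) and affiliation (AFF) scores, detailed in the Methods, can be sketched directly in code. The octant formulas below follow the Methods section; the example ratings themselves are made up (1-7 scale).

```python
# Sketch of the IAS-R aggregation used for the dominance (DOM) and
# affiliation (AFF) outcome measures. Octant formulas follow the Methods
# section; the example ratings are illustrative, not study data.
def ias_r_scores(r):
    """r maps the 16 adjectives to ratings; returns (DOM, AFF)."""
    o = {
        "PA": (r["dominant"] + r["assertive"]) / 2,
        "BC": (r["sly"] + r["cunning"]) / 2,
        "DE": (r["unsympathetic"] + r["warmthless"]) / 2,
        "FG": (r["unsociable"] + r["antisocial"]) / 2,
        "HI": (r["shy"] + r["unaggressive"]) / 2,
        "JK": (r["uncunning"] + r["unsly"]) / 2,
        "LM": (r["gentle"] + r["tender"]) / 2,
        "NO": (r["friendly"] + r["outgoing"]) / 2,
    }
    dom = o["PA"] - o["HI"] + 0.707 * (o["NO"] + o["BC"] - o["FG"] - o["JK"])
    aff = o["LM"] - o["DE"] + 0.707 * (o["NO"] - o["BC"] - o["FG"] + o["JK"])
    return dom, aff

# A flat rating profile (all 4s) yields DOM = AFF = 0 by construction.
ratings = {a: 4 for a in ["dominant", "assertive", "sly", "cunning",
                          "unsympathetic", "warmthless", "unsociable",
                          "antisocial", "shy", "unaggressive", "uncunning",
                          "unsly", "gentle", "tender", "friendly", "outgoing"]}
dom, aff = ias_r_scores(ratings)
```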
Figure 1. Results of the randomized experiment independently manipulating the availability of smart replies (SR) for each partner. Left panel: average proportion of SR use in conversation by SR condition (i.e., first-stage effect). Right panel: local average treatment effect (LATE) estimates of how increased SR use by the self and by the partner affected conversation sentiment, communication efficiency, cooperative communication, and affiliation. Error bars represent cluster-robust standard errors.

We find that the presence of algorithmic responses was a strong encouragement to use them: smart replies account for 14.3% of sent messages on average (t(211)=13.8, p<.0001), and Figure 1 (left panel) shows average smart reply use by experimental condition. Because the variation in smart reply use is experimentally and independently induced for each participant, we can use instrumental variable (IV) estimation to identify the consequences of increased smart reply use by the self and the partner (Figure 1, right panel). Using IV estimation, we find that increased use of smart replies by the self (but not the partner) led to more efficient communication in terms of the number of messages the self sent per minute (t(198)=2.21, p=0.0286). While smart reply use clearly improves communication efficiency, its consequences for interpersonal perceptions are more complex.

Participants are capable of recognizing their partner's use of smart replies to some degree: beliefs about how much their partner used smart replies correlated with actual use, but not strongly (Pearson's r=0.22, t(97)=3.62, p=0.0005). Consistent with commonly held beliefs about the negative implications of AI in social interactions, we find strong associations between perceived smart reply use by the partner and attitudes towards them. The more participants thought their partner used smart replies, the less cooperative they rated them (t(92)=-9.89, p<.0001) and the less affiliation they felt towards them (t(92)=-6.90, p<.0001; Figure 2).
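The instrumental variable logic used in these analyses can be sketched as a plain two-stage least squares (2SLS) computation. The snippet below is a minimal illustration on synthetic data with an assumed true effect of 2.0; it is not the paper's actual analysis, which used IV regression with cluster-robust standard errors in R.

```python
import numpy as np

# Synthetic stand-in data: random condition assignment (the instrument)
# shifts smart-reply use, which in turn shifts an outcome. All numbers
# here are illustrative assumptions, not study estimates.
rng = np.random.default_rng(0)
n = 1000
condition = rng.integers(0, 2, n).astype(float)       # instrument Z
sr_use = 0.15 * condition + rng.normal(0, 0.05, n)    # endogenous X
outcome = 2.0 * sr_use + rng.normal(0, 0.1, n)        # outcome Y

def two_stage_ls(y, x, z):
    """2SLS: regress x on z (first stage), then y on the fitted values
    of x (second stage). Returns the estimated slope of x on y."""
    Z = np.column_stack([np.ones(len(z)), z])
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]   # first stage
    X = np.column_stack([np.ones(len(y)), x_hat])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]     # second stage

effect = two_stage_ls(outcome, sr_use, condition)      # close to 2.0
```

With a single binary instrument, this reduces to the Wald estimator: the outcome difference between conditions divided by the difference in smart reply use, recovering the effect of use rather than of mere availability.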
Figure 2. Average rating of the partner's affiliation and cooperative communication by the self for different levels of perceived smart reply use by the partner (N = 361). Error bars show 1 cluster-robust standard error.

Our IV estimation strategy reveals that increased use of smart replies by the partner actually improved the self's rating of the partner's cooperation (t(167)=2.23, p=0.0273) and sense of affiliation towards them (t(167)=2.54, p=0.0120). Although perceived smart reply use is judged negatively, actual use results in more positive attitudes. Moreover, we find that conversation sentiment became more positive as a result of both the self using more smart replies (t(198)=2.06, p=0.0404) and the partner using more smart replies (t(198)=2.11, p=0.0362). This finding suggests that the effects of AI mediation on interpersonal perceptions are related to changes in language introduced by the AI system.

AI Sentiment Affects Emotional Content in Human Conversations
To better understand how the sentiment of AI-suggested responses affects conversational language, we conducted a second experiment. Using a between-subjects design, we randomly assigned 299 pairs to discuss a policy issue using our app in one of four conditions: with Google smart replies (i.e., participants receive algorithmic responses generated using the Google Reply API), positive smart replies (i.e., participants receive algorithmic responses that have positive sentiment, as rated by crowdworkers), negative smart replies (i.e., participants receive algorithmic responses that have negative sentiment, as rated by crowdworkers), or no smart replies (i.e., participants do not receive algorithmic responses). We measured conversation sentiment using VADER, a lexicon and rule-based sentiment analysis tool that is ideal for analyzing short, social messages. A robustness check with another sentiment analysis dictionary is presented in the Methods section. We aggregated VADER scores into a sentiment polarity score ranging from most negative (-1) to most positive (1), with neutral (0) in the middle. The average conversation comprised 20 messages (sd=8.6) and lasted 6.33 minutes (sd=2.67).

We found that the presence of positive and Google smart replies caused conversations to have more positive emotional content than conversations with negative or no smart replies (t(127)=2.75, p=0.007, d=.352; Figure 3). Moreover, the finding that widely-used Google smart replies have a similar effect on conversation sentiment as a set of positive (t(150)=0.51, p=0.61) but not negative smart replies (t(127)=2.40, p=0.018) highlights the positive sentiment bias of smart replies in commercial messaging apps. Taken together, these findings demonstrate how AI-generated sentiment affects the emotional language used in human conversation.

Discussion
Our research shows that a commercially-deployed AI can fundamentally reshape how people communicate with others, and this has both positive and negative consequences. We find that people choose to use AI when given the opportunity, and this increases the efficiency of communication and leads to more emotionally positive language. However, we also find that as participants think that their partner is using more algorithmic responses, they perceive them as less cooperative and feel less affiliation towards them.

Figure 3. Mean overall conversation sentiment by experimental condition: both participants assigned to no smart replies, negative, positive, or Google smart replies. Error bars show 1 cluster-robust standard error.

This finding could be related to common assumptions about the negative implications of AI in social interactions. For example, humans are already predisposed to trust other humans over computers, and most current communication systems featuring AI mediation lack transparency for users (i.e., the sender knows that their responses have been modified or generated by AI, while the receiver does not). Taken together with users' preference for reducing uncertainty in interactions, this could lead to negative perceptions of AI in everyday communication. Indeed, these negative perceptions are confirmed by recent work, such as , where users described how smart replies often did not capture what they wanted to say and could be altering the way that they communicated with others, and , where text suspected of or labeled as being written by AI was perceived as less trustworthy.

Despite negative perceptions about AI in communication, we find that as people actually use more algorithmic responses, their communication partner has more positive attitudes about them. Even though perceived smart reply use is viewed negatively, actual smart reply use results in communicators being viewed as more cooperative and affiliative. In other words, it seems that the negative perceptions of using AI to help us communicate do not match the reality.

Our work provides evidence that AI can alter the language that people use when interacting with others. Understanding the impact on language is important because language is inextricably linked with listeners' characterizations of a communicator, including their personality, emotions, sentiment, and level of dominance (i.e., expressing more aggressive instead of affiliative behavior in an interaction).
Indeed, we find that AI-generated responses changed the expression of emotion in human conversations. The influence of AI on human emotional communication is deeply concerning given that AI already writes about 6.7 billion emails on our behalf daily. With the increasing popularity of other forms of AI mediating our everyday communication (e.g., Smart Compose), we have little insight into how regularly people are allowing AI to help them communicate or the potential long-term implications of the interference of AI in human communication. Our work suggests that interpersonal relationships are likely to be affected, potentially positively. However, the demonstrated changes in language suggest that we could potentially lose our personal communication styles, with language becoming increasingly homogeneous over time. While current implementations of AI in messaging apps increase efficiency by allowing users to respond to messages more quickly, smart reply use is still viewed negatively, and as we have now demonstrated, it has the ability to alter our language when communicating with other humans.

Our work has implications for the development of AI systems and highlights both opportunities and risks of deploying such systems. A recent laboratory study has shown that a humanoid robot can improve interpersonal communication when expressing vulnerability within a team. Our work takes this research further by demonstrating how a commercially-deployed AI can influence interactions in positive ways through much more subtle forms of intervention than a robot's overt behavior. Merely providing suggestions changes the language used in a conversation, and the changes are consistent with the linguistic qualities of the algorithmic responses. Additionally, previous work has shown that when conversations go awry, people trust the AI more than the person that they're communicating with and assign some of the blame that they otherwise would have assigned to this person to the AI. Taken together, these findings suggest possible opportunities for developers to affect conversational dynamics and outcomes by carefully controlling the linguistics of smart replies that are shown to users, such as in . On the other hand, the finding that changes in language are consistent with changes in smart replies raises potential risks as AI continues to gain influence over our social interactions. Knowing that AI can shape the way that we communicate, it is important for researchers and practitioners to consider the broader social consequences when designing algorithms that support communication.

Overall, while AI has the potential to help people communicate more efficiently and improve interpersonal perceptions in everyday conversation, users should be cautioned that these benefits are coupled with alterations to the emotional aspects of our language and a corresponding potential loss of personal expression.

Methods
The study procedure and all materials were approved by our Institutional Review Board (1610006732), and the study waspre-registered on AsPredicted (40389).
Web-Based AI-MC Platform

Figure 4. We can use our web-based AI-MC platform to control and record the smart replies that participants see. This figure shows both positive and negative sentiment smart reply examples (i.e., blue and grey boxes, respectively). During actual use, participants see only one of those sets.

To develop a nuanced and systematic understanding of the mechanisms by which smart replies, their linguistic properties, and their presentation modalities affect communication, we developed a flexible web-based research platform that allows us to recruit participants online (e.g., through crowdsourcing platforms) and engage them in real-time interpersonal communication tasks while receiving smart reply support, as shown in Figure 4.

The platform is designed as a web application that allows two participants to text chat with one another in real time and runs on all major modern browsers (e.g., Google Chrome 60, Mozilla Firefox 54, Microsoft Edge 14, and Apple Safari 10). It is built using Node.js and MongoDB on the backend with jQuery and the Semantic UI framework on the client side. The interface is responsive to device type and resizes itself to work well on desktops, tablets, and mobile devices (Android and iOS). Throughout the design process, we elicited feedback from colleagues to ensure that the application seemed natural and easy to use. As in existing commercial messaging applications that feature smart replies, in addition to the standard text box for typing messages, participants can also receive smart replies that they can tap to send. Participants can also scroll to see the history of their conversation at any point during the chat.

Four implementations of the messenger were used in this work: positive and negative sentiment smart replies, real smart replies (i.e., generated by the Google Reply API), and no smart replies. In the positive and negative sentiment smart reply conditions, the smart replies shown to participants had only positive or negative sentiment, respectively.
For example, in the positive condition, a participant might see smart replies such as "I like it" and "I can't agree more", whereas in the negative condition, a participant might see smart replies such as "I don't get it" and "No you are not". These smart replies are chosen randomly from an input JSON file without being too repetitive (i.e., all three utterances shown in each instance are different, and the same utterance is not shown in immediately subsequent instances). These utterances were pulled from previous work in which crowdworkers rated the sentiment of smart replies, and the suggestions included only those that were rated as having definitive positive or negative sentiment, respectively. In both of these implementations, each time a participant sends or receives a message, the smart replies are updated.

In the implementation that did not include smart replies, which served as our control condition, participants had to manually type each message that they sent.

The final implementation uses Google's Reply model to generate smart replies. However, since this model performs its pre- and post-processing tasks at run-time and its framework is built with C++ and compiled into an Android archive, it is not possible to run it on desktop environments. A stand-alone CPython library on top of the Reply model can be compiled on a Linux operating system, which we used to generate smart replies using Google's Reply model. When users send or receive a message, the Python API receives that message and generates smart replies through the Reply model.

Study 1
Participants & Procedure
We collected data from Mechanical Turk participants (N=438, 33.7% female, 0.005% non-binary) who received monetary payment for their participation. Participants ranged in age from 18-68 (M=34.15, SD=10.1).

The survey itself was conducted using Qualtrics. After obtaining consent, participants were informed that they would be using a messaging system to complete a task with an anonymous partner. Participants were then presented with a task involving a discussion of unfair rejection of work, an issue that is relevant to all crowdworkers on Mechanical Turk. Specifically, we asked pairs to come to an agreement on the "top 3 changes that Mechanical Turk could make to better handle unfairly rejected work". Participants were asked to open the web-based messaging platform in another window while still viewing the Qualtrics survey. After opening the messaging platform, participants waited up to 5 minutes for another participant to enter the conversation. If 5 minutes elapsed without another participant arriving, participants were able to prematurely exit the survey and receive partial compensation. Once another participant arrived, the pair was given as much time as they needed to come to an agreement on a ranked list. When finished with the task, participants could press a "Conversation complete" button in the messenger and receive a conversation completion code that they pasted into the Qualtrics survey to confirm that they had completed a conversation with a partner.

After verifying that a conversation was completed and giving a brief description of smart replies, we asked participants how much they believed their partner had used smart replies.
Participants were also asked to fill out the Perceived Cooperative Communication scale and the Interpersonal Adjective Scales, Revised (IAS-R).

Perceived cooperative communication was measured through a 7-item scale where participants rated their agreement with statements describing cooperative communication in their overall interaction with their partner. The instructions read, "Thinking about your interaction with your partner, please rate the extent to which you agree with each of these statements". Participants rated each statement on rating-scale items anchored by "Strongly disagree" (1) and "Strongly agree" (7).

The IAS-R provides an empirical measure of various dimensions that underlie interpersonal transactions. To shorten the measure, the two adjectives with the highest factor loadings from each interpersonal octant were selected, based on the analysis of Wiggins et al., resulting in 16 items to be rated. The instructions read, "Below are a list of words that describe how people interact with others. Based on your intuition, please rate how accurately each word describes your conversation partner" (adapted from ). Participants rated each statement on rating-scale items anchored by "Extremely inaccurate" (1), "Somewhat accurate" (4), and "Extremely accurate" (7). These ratings were then combined according to a formula adapted from to determine ratings of affiliation and dominance.

The presentation of the 3 post-task measures was randomized between participants to avoid any possible order bias. Lastly, participants were asked about demographic information as well as for any comments that they had about the survey.

Statistics
We analyzed the data using instrumental variable regression, with self smart reply use and partner smart reply use instrumented by condition assignment; no covariates were added. We used the ivreg function in the R AER package. We computed cluster-robust standard errors (i.e., CR2) using the coef_test function in the R clubSandwich package. The reported and plotted estimates represent coefficients, t-statistics, and p-values from the IV regression output.

Analysis and Results
We excluded conversations from the analysis that had fewer than 10 messages exchanged overall and where one participant sent fewer than 3 messages. For the analysis of post-conversation self-report outcomes, we also excluded participants who did not complete the full survey.

Sentiment was analyzed using VADER, a lexicon and rule-based sentiment analysis tool specifically attuned to sentiments expressed on social media. This analysis tool yields a sentiment metric indicating how positive, negative, or neutral the sentiment of the supplied text is. For our purposes, messages were analyzed individually using the VADER compound sentiment output, an aggregated score ranging from -1 to 1 (i.e., most negative to most positive) based on the three aforementioned sentiment components.

The formulae for deriving aggregate measures of dominance and affiliation from the IAS-R (adapted from ) are given below:

1. Octant scores are computed from adjective ratings:

PA = (dominant + assertive)/2
BC = (sly + cunning)/2
DE = (unsympathetic + warmthless)/2
FG = (unsociable + antisocial)/2
HI = (shy + unaggressive)/2
JK = (uncunning + unsly)/2
LM = (gentle + tender)/2
NO = (friendly + outgoing)/2

2. Dominance and affiliation scores are computed from these octant scores:

DOM = PA - HI + .707(NO + BC - FG - JK)
AFF = LM - DE + .707(NO - BC - FG + JK)

Study 2
Participants & Procedure
We collected messaging conversations from participants (N=599, 37.2% female, 0.1% non-binary) through Mechanical Turk; participants received monetary payment for their participation. Participants ranged in age from 19-69 (M=35.6, SD=9.96). To ensure that any language differences that we found were not the result of demographic differences between the four conditions, we examined the demographic makeup (i.e., age, gender, and race) between conditions and did not find any significant differences.

The survey and procedure were similar to the previous study, except participants in the AI-mediated messaging conditions were informed that they would be "[...] using an AI-mediated messaging system to have a conversation with your partner. While you are messaging, artificial intelligence (AI) will provide smart replies that you can simply tap to send.", while participants in the control condition were told that they would be "[...] using a standard messaging system to have a conversation with your partner." After verifying that a conversation was completed, participants were asked about demographic information as well as for any comments that they had about the survey.

Statistics
We analyzed the resulting data at the individual level using a simple linear regression with cluster-robust standard errors via the lm_robust function in the R estimatr package. The dependent variable was the individual language measure (i.e., VADER sentiment) and the independent variable was the assigned condition; no covariates were added. The reported statistics are the t-statistic and p-value for the relevant coefficient, and Cohen's d computed manually.

Robustness Check for Sentiment Results

We presented an analysis of conversation sentiment using VADER, a lexicon and rule-based sentiment analysis tool that is ideal for analyzing short, social messages. As in Study 1, we excluded conversations from the analysis that had fewer than 10 messages exchanged overall and where one participant sent fewer than 3 messages.

To ensure that results do not significantly change with other dictionaries, we performed a robustness check using Linguistic Inquiry and Word Count (LIWC), a dictionary-based text analysis tool that determines the percentage of words that reflect a number of linguistic processes, psychological processes, and personal concerns. To verify our findings with respect to sentiment from VADER, we analyzed Affect scores from LIWC. Affect, with values ranging from 0-100, is made up of Positive and Negative Emotion variables, which also range from 0-100. For example, a message with an Affect score of 50 could be made up of a Positive Emotion score of 50, a Positive Emotion score of 25 and a Negative Emotion score of 25, or a Negative Emotion score of 50.

Figure 5. Mean overall conversation affect by experimental condition: both participants assigned to no smart replies, negative, positive, or Google smart replies. Error bars show 1 cluster-robust standard error.

All findings with respect to VADER sentiment were confirmed using LIWC. We found that the presence of positive and Google smart replies caused conversations to have higher affect than conversations without smart replies (t(124)=2.95, p<0.001, d=0.272). The effect of positive and Google smart replies on affect was statistically similar (t(150)=0.354, p=0.724). The presence of negative smart replies had a strong negative effect on conversation affect compared to the control condition without smart replies (t(123)=-3.50, p<0.001, d=0.454).

Limitations
There were several limitations to this work. First, we analyzed conversations from participants completing a contrived task on Mechanical Turk. Although we attempted to choose a task that would be personally relevant to all crowdworkers and effectuate the interpersonal closeness that we hoped to examine, many other types of everyday messaging conversations exist, and future work should examine how these results hold up in disparate contexts.

Since our web-based messenger is not yet robust enough for mobile use, this work focused specifically on AI-mediated messaging conversations in a desktop computer environment and may not generalize to similar messaging situations in other use contexts. Interpersonal perceptions in mobile messaging contexts featuring smart replies should also be examined.

As is standard in similar literature (e.g., ), interpersonal perceptions were measured as momentary states. However, these perceptions change and develop over time, so future work should examine whether and how these measures are affected longitudinally under the influence of smart replies. Similarly, these studies occurred with anonymous crowdworkers completing a one-time interaction. We do not know whether our findings would be different in relationships with various levels of interpersonal closeness. Future work should investigate how interpersonal perceptions are related to smart reply use in more socially intimate relationships, such as between friends or co-workers. Additionally, we investigated interpersonal perceptions resulting from real-time messaging conversations, which could manifest differently in other communication contexts. Future work should examine how interpersonal relationships are affected by the presence of AI mediation in asynchronous communication contexts, such as email.

Data Availability
The data have been deposited in the Mendeley Data repository (DOI: 10.17632/6v5r6jmd3y.1).
References
Mairesse, F. & Walker, M. Automatic recognition of personality in conversation. In Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers, 85–88 (Association for Computational Linguistics, 2006).
Mairesse, F., Walker, M. et al. Words mark the nerds: Computational models of personality recognition through language. In Proceedings of the Annual Meeting of the Cognitive Science Society, vol. 28 (2006).
Mairesse, F., Walker, M. A., Mehl, M. R. & Moore, R. K. Using linguistic cues for the automatic recognition of personality in conversation and text. J. Artif. Intell. Res., 457–500 (2007).
Pennebaker, J. W., Mehl, M. R. & Niederhoffer, K. G. Psychological aspects of natural language use: Our words, ourselves. Annu. Rev. Psychol., 547–577 (2003).
Zhang, J. et al. Conversations gone awry: Detecting early signs of conversational failure. arXiv preprint arXiv:1805.05345 (2018).
Stone, P. et al. "Artificial Intelligence and Life in 2030." One Hundred Year Study on Artificial Intelligence: Report of the 2015–2016 Study Panel. Tech. Rep., Stanford University (2016).
Rahwan, I. et al. Machine behaviour. Nature, 477–486 (2019).
Kannan, A. et al. Smart reply: Automated response suggestion for email. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 955–964 (2016).
Shakirov, V. Review of state-of-the-arts in artificial intelligence with application to AI safety problem. arXiv preprint arXiv:1605.04232 (2016).
Crawford, K. et al. The AI Now Report: The Social and Economic Implications of Artificial Intelligence Technologies in the Near-Term (AI Now Institute at New York University, 2016).
Campolo, A., Sanfilippo, M., Whittaker, M. & Crawford, K. AI Now 2017 Report (AI Now Institute at New York University, 2017).
Whittaker, M. et al. AI Now Report 2018 (AI Now Institute at New York University, 2018).
Kraus, R. Gmail smart replies may be creepy, but they're catching on like wildfire (2018). URL: https://mashable.com/article/gmail-smart-reply-growth/.
Henderson, M. et al. Efficient natural language response suggestion for smart reply. arXiv preprint arXiv:1705.00652 (2017).
Ritter, A., Cherry, C. & Dolan, W. B. Data-driven response generation in social media. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 583–593 (Association for Computational Linguistics, 2011).
Lee, M. K. Understanding perception of algorithmic decisions: Fairness, trust, and emotion in response to algorithmic management. Big Data & Soc., 2053951718756684 (2018).
Hancock, J. T., Naaman, M. & Levy, K. AI-mediated communication: Definition, research agenda, and ethical considerations. J. Comput. Commun. (2020).
Arnold, K. C., Chauncey, K. & Gajos, K. Z. Predictive text encourages predictable writing. In Proceedings of the 25th International Conference on Intelligent User Interfaces, 128–138 (2020).
Hohenstein, J. & Jung, M. AI-supported messaging: An investigation of human-human text conversation with AI support. In Extended Abstracts of the 2018 CHI Conference on Human Factors in Computing Systems - CHI '18, DOI: 10.1145/3170427.3188487 (2018).
Tong, S. & Walther, J. Relational maintenance and CMC. Comput. Commun. Pers. Relationships.
Google. Smart Reply ML Kit (2020). URL: https://developers.google.com/ml-kit/language/smart-reply.
Wiggins, J. S., Trapnell, P. & Phillips, N. Psychometric and geometric characteristics of the revised Interpersonal Adjective Scales (IAS-R). Multivar. Behav. Res., 517–530 (1988).
Lee, J. Leader-member exchange, the "Pelz effect," and cooperative communication between group members. Manag. Commun. Q., 266–287 (1997).
Jakesch, M., French, M., Ma, X., Hancock, J. T. & Naaman, M. AI-Mediated Communication. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems - CHI '19, 1–13, DOI: 10.1145/3290605.3300469 (ACM Press, New York, New York, USA, 2019).
Promberger, M. & Baron, J. Do patients trust computers? J. Behav. Decis. Mak., 455–468, DOI: 10.1002/bdm.542 (2006).
Hutto, C. J. & Gilbert, E. VADER: A parsimonious rule-based model for sentiment analysis of social media text. In Eighth International AAAI Conference on Weblogs and Social Media (2014).
Berger, C. R. & Calabrese, R. J. Some explorations in initial interaction and beyond: Toward a developmental theory of interpersonal communication. Hum. Commun. Res., 99–112 (1975).
Liscombe, J., Venditti, J. & Hirschberg, J. Classifying subject ratings of emotional speech using acoustic features. In Eighth European Conference on Speech Communication and Technology (2003).
Pierre-Yves, O. The production and recognition of emotions in speech: Features and algorithms. Int. J. Human-Computer Stud., 157–183 (2003).
Breck, E., Choi, Y. & Cardie, C. Identifying expressions of opinion in context. In IJCAI, vol. 7, 2683–2688 (2007).
Pang, B. & Lee, L. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, 115–124 (Association for Computational Linguistics, 2005).
Popescu, A.-M., Nguyen, B. & Etzioni, O. OPINE: Extracting product features and opinions from reviews. In Proceedings of HLT/EMNLP 2005 Interactive Demonstrations (2005).
Turney, P. D. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, 417–424 (Association for Computational Linguistics, 2002).
Rienks, R. & Heylen, D. Dominance detection in meetings using easily obtainable features. In International Workshop on Machine Learning for Multimodal Interaction, 76–86 (Springer, 2005).
Traeger, M. L., Sebo, S. S., Jung, M., Scassellati, B. & Christakis, N. A. Vulnerable robots positively shape human conversational dynamics in a human–robot team. Proc. Natl. Acad. Sci., 6370–6375 (2020).
Hohenstein, J. & Jung, M. AI as a moral crumple zone: The effects of AI-mediated communication on attribution and trust. Comput. Hum. Behav., 106190 (2020).
Sukumaran, A., Vezich, S., McHugh, M. & Nass, C. Normative influences on thoughtful online participation. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '11, 3401–3410, DOI: 10.1145/1978942.1979450 (ACM, New York, NY, USA, 2011).
Prasanna, N. smartreply (2020). URL: https://github.com/Narasimha1997/smartreply.
McInnis, B., Cosley, D., Nam, C. & Leshed, G. Taking a hit: Designing around rejection, mistrust, risk, and workers' experiences in Amazon Mechanical Turk. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, 2271–2282 (2016).
Knutson, B. Facial expressions of emotion influence interpersonal trait inferences. J. Nonverbal Behav., 165–182 (1996).
Kleiber, C., Zeileis, A. & Zeileis, M. A. Package 'AER'. R package version 1.2 (2020).
Pustejovsky, J. clubSandwich: Cluster-robust (sandwich) variance estimators with small-sample corrections. R package version 0.2.3. R Found. Stat. Comput., Vienna (2017).
Schwartz, H. A. et al. Personality, gender, and age in the language of social media: The open-vocabulary approach. PLoS ONE, e73791 (2013).
Blair, G. et al. Package 'estimatr'. Stat, 295–318 (2018).
Hohenstein, J., Kizilcec, R., DiFranzo, D., Aghajari, Z. & Jung, M. AI-mediated communication: Effects on language and interpersonal perceptions. http://dx.doi.org/10.17632/6v5r6jmd3y.1 (2021).
Acknowledgements
We would like to acknowledge Hirokazu Shirado and Michael Macy for providing feedback on this work, and to acknowledge support from National Science Foundation grants (Award Numbers IIS-1901151 and 72517).
Author contributions statement
J.H., D.D., R.K., and M.J. conceptualized the experiments; J.H., D.D., R.K., Z.A., and M.J. curated the data; J.H. and R.K. performed the formal analysis; J.H. performed the investigation; J.H., R.K., and M.J. determined the methodology; J.H. and M.J. administered the project; J.H., D.D., and Z.A. provisioned resources; J.H., D.D., and Z.A. developed software; J.H. and M.J. wrote the original draft; K.L., M.N., J.H., and M.J. acquired funding; and all authors reviewed and edited the manuscript.
Additional information
Competing interests: The authors declare no competing interests.