Expressive Interviewing: A Conversational System for Coping with COVID-19
Charles Welch, Allison Lahnala, Verónica Pérez-Rosas, Siqi Shen, Sarah Seraj, Larry An, Kenneth Resnicow, James Pennebaker, Rada Mihalcea
Computer Science & Engineering, University of Michigan: Charles Welch, Allison Lahnala, Verónica Pérez-Rosas, Siqi Shen, Rada Mihalcea
Department of Psychology, University of Texas: Sarah Seraj, James Pennebaker
Medical School, University of Michigan: Larry An
School of Public Health, University of Michigan: Kenneth Resnicow
{cfwelch,alcllahn,vrncapr,shensq,lcan,kresnic,mihalcea}@umich.edu
{sarahseraj,pennebaker}@utexas.edu
Abstract
The ongoing COVID-19 pandemic has raised concerns for many regarding personal and public health implications, financial security and economic stability. Alongside many other unprecedented challenges, there are increasing concerns over social isolation and mental health. We introduce Expressive Interviewing, an interview-style conversational system that draws on ideas from motivational interviewing and expressive writing. Expressive Interviewing seeks to encourage users to express their thoughts and feelings through writing by asking them questions about how COVID-19 has impacted their lives. We present relevant aspects of the system's design and implementation as well as quantitative and qualitative analyses of user interactions with the system. In addition, we conduct a comparative evaluation with a general purpose dialogue system for mental health that shows our system's potential in helping users to cope with COVID-19 issues.
The COVID-19 pandemic has changed our world in unimaginable ways, dramatically challenging our health system and drastically changing our daily lives. As we learned from recent large-scale analyses that we performed on social media datasets and extensive surveys (http://trackingsocial.life), many people are currently experiencing increased anxiety, loneliness, depression, concerns for the health of family and themselves, unexpected unemployment, increased child care or homeschooling, and general concern with what the future might look like.

Research in Expressive Writing (Pennebaker, 1997b) and Motivational Interviewing (Miller and Rollnick, 2012) has shown that even simple interactions where people talk about one particular experience can have significant psychological value. Numerous studies have demonstrated their effectiveness in improving people's mental and physical health (Vine et al., 2020; Pennebaker and Chung, 2011; Resnicow et al., 2017). Both Expressive Writing and Motivational Interviewing rely on the fundamental idea that by putting emotional upheavals into words, one can start to understand them better and therefore gain a sense of agency and coherence of the thoughts and emotions surrounding their experience.

In this paper, we introduce a new interview-style dialogue paradigm called Expressive Interviewing that unites strategies from Expressive Writing and Motivational Interviewing through a system that guides an individual to reflect on, express, and better understand their own thoughts and feelings during the pandemic. By encouraging introspection and self-expression, the dialogue aims to reduce stress and anxiety. Our system is currently online at https://expressiveinterviewing.org and available for anyone to try anonymously.
Expressive Writing.
Expressive writing is a writing paradigm where people are asked to disclose their emotions and thoughts about significant life upheavals. It was originally studied in the scope of traumatic experiences (Pennebaker and Beall, 1986); study participants are usually asked to write about an assigned topic for about 15 minutes for one to five consecutive days. Later studies expanded to specific experiences such as losing a job (Spera et al., 1994). Multiple meta-analyses have shown expressive writing to be effective on both physical and mental health measures (Frattaroli, 2006), associating it with drops in physician visits, positive behavioral changes, and long-term mood improvements. No single theory at present explains the cause of its benefits, but it is believed that the process of expressing emotions and constructing a story may play a role for participants in forming a new perspective on their lives (Pennebaker and Chung, 2011).
Motivational Interviewing.
Motivational Interviewing (MI) is a counseling technique designed to help people change a desired behavior by leveraging their own values and interests. The approach accepts that many people looking for a change are ambivalent about doing so, as they have reasons to both change and sustain the behavior. Therefore, the goal of an MI counselor is to elicit their client's own motivation for changing by asking open questions and reflecting back on the client's statements. MI has been shown to correlate with positive behavior changes in a large variety of client goals, such as weight management (Small et al., 2009), chronic care intervention (Brodie et al., 2008), and substance abuse prevention (D'Amico et al., 2008).
Dialogue Systems.
With the development of deep learning techniques, dialogue systems have been applied to a large variety of tasks to meet increasing demands. In recent work, Afzal et al. (2019) built a dialogue-based tutoring system to guide learners through varying levels of content granularity to facilitate a better understanding of content. Henderson et al. (2019) applied a response retrieval approach in restaurant search and booking to let users ask various questions about a restaurant. Ortega et al. (2019) built an open-source dialogue system framework that navigates students through course selection.

There are also dialogue system building tools such as Google's Dialogflow (https://dialogflow.com/) and IBM's Watson Assistant, which enable numerous dialogue systems for customer service or conversational user interfaces.
Chatbots for Automated Counseling.
Two dialogue systems for automated counseling services available on mobile platforms are Wysa (https://wysa.io/) and Woebot (https://woebot.io/). These chatbots provide cognitive behavioral therapy with the goal of easing anxiety and depression by allowing users to express their thoughts. A study of Wysa users over three months showed that more active users had significantly improved symptoms of depression (Inkster et al., 2018). Another study shows that young students using Woebot significantly reduced anxiety levels after two weeks of using the conversational agent (Fitzpatrick et al., 2017). These findings suggest a promising benefit of automated counseling for the nonclinical population.

Our system is distinct from Wysa and Woebot in that it is designed specifically for coping with COVID-19 and allows users to write more topic-related free-form responses. It asks open-ended questions and encourages users to introspect, and then provides visualized feedback afterward, whereas the others have a conversational logic mainly based on precoded multiple-choice options.

Our system conducts an interview-style interaction with the users about how the COVID-19 pandemic has been affecting them. The interview consists of several writing prompts in the form of questions about specific issues related to the pandemic. During the interview, the system provides reflective feedback based on the user's answers. After the interaction is concluded, the system presents users with detailed graphical and textual feedback.

The system's goal is to encourage users to write as much as possible about themselves, building upon previous findings regarding the psychological value of writing about personal upheavals and the use of reflective listening for behavioral change (Pennebaker, 1997b; Miller and Rollnick, 2012). To achieve this, the system guides the interaction by asking four main open-ended questions. Then, based on the user's responses, the system provides feedback and asks additional questions whenever appropriate. In order to provide reflective feedback, the system automatically detects the topics being discussed (e.g., work, family) or emotions being felt (e.g., anger, anxiety), and responds with a reflective prompt that asks the user to elaborate or to answer a related question to explore that concept more deeply. For instance, if the system detects work as a topic of interest, it responds with "How has work changed under COVID? What might you be able to do to keep your career moving during these difficult times?"
During the formulation of the guiding questions used by our system, we worked closely with our psychology and public health collaborators to identify a set of questions on COVID-19 topics that would motivate individuals to talk about their personal experience with the pandemic. We formulated the following question as the system's conversation starting point:

[Major issues] What are the major issues in your life right now, especially in the light of the COVID outbreak?

We also formulated three follow-up questions, which were generated after several refining iterations. The order of these questions is randomized across users of the system.

[Looking Forward] What do you most look forward to doing once the pandemic is over?

[Advice to Others] What advice would you give other people about how to cope with any of the issues you are facing?

[Grateful] The outbreak has been affecting everyone's life, but people have the amazing ability to find good things even in the most challenging situations. What is something that you have done or experienced recently that you are grateful for?
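The question ordering described above (a fixed opening question followed by three follow-up questions in randomized order) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `build_interview`, the short keys, and the use of a seeded `random.Random` are assumptions made for the example.

```python
import random

# Question texts are taken from the paper; the short keys are hypothetical labels.
OPENING_QUESTION = (
    "major_issues",
    "What are the major issues in your life right now, "
    "especially in the light of the COVID outbreak?",
)
FOLLOW_UP_QUESTIONS = [
    ("looking_forward",
     "What do you most look forward to doing once the pandemic is over?"),
    ("advice_to_others",
     "What advice would you give other people about how to cope "
     "with any of the issues you are facing?"),
    ("grateful",
     "What is something that you have done or experienced recently "
     "that you are grateful for?"),
]


def build_interview(rng=None):
    """Return the four main prompts: the opening question first,
    then the three follow-up questions in a per-user random order."""
    rng = rng or random.Random()
    follow_ups = list(FOLLOW_UP_QUESTIONS)  # copy so the module constant is untouched
    rng.shuffle(follow_ups)
    return [OPENING_QUESTION] + follow_ups


if __name__ == "__main__":
    for key, text in build_interview():
        print(f"[{key}] {text}")
```

Passing a seeded `random.Random` makes the order reproducible for testing, while the default gives each session its own order.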
During refinement of the guiding questions, we removed an additional question about how people's lives have changed since the outbreak, as well as a question about what people missed the most about their previous lives.

Our system's capability for language understanding relies on identifying words belonging to various lexicons. This simple strategy allowed us to quickly develop a platform upon which we intend to implement a more sophisticated language understanding ability in future work.

When a user responds to one of the main prompts, the system looks for words belonging to specific topics and word categories. The system examines the user responses to identify dominant word categories or topics and triggers a reflection from a set of appropriate reflections. A dominant word category is defined as a word type whose frequency of occurrence is at least 50% higher than that of the second most frequent category for that group. If none of these types are matched, the system responds with a generic reflection.

The word categories are derived from the LIWC, WordNet-Affect and MPQA lexicons (Pennebaker et al., 2001; Strapparava et al., 2004; Wiebe et al.). The politics category is taken from Roget's Thesaurus (Roget, 1911), to which we add a small number of proper nouns covered in recent news (e.g. Trump, Biden, Fauci, Sanders).

We formulate a set of specific reflections for each word category and topic, which were refined by our psychology and public health collaborators. For instance, if the dominant emotion category is anxiety, the system responds "You mention feelings such as fear and anxiety. What do you think is the best way for people to cope with these feelings?" Initially, we also considered reflections for different types of pronouns, but found that they did not steer the dialogue in a meaningful direction. Instead, we flag responses with dominant use of impersonal pronouns and a lack of references to the self, reflect that fact back to the user, and further ask them how they are specifically being affected. We also crafted generic reflections to be applicable to a large number of situations even though the system does not understand the content of what the user has said (e.g. "I see. Tell me about a time when things were different", and "I hear you. What have you tried in the past that has worked well").
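The lexicon matching and the "at least 50% higher than the second most frequent category" rule can be sketched as below. This is an illustrative sketch under stated assumptions: the tiny word lists stand in for the LIWC, WordNet-Affect and MPQA lexicons, the function names are hypothetical, and checking emotions before topics is an arbitrary choice that the paper does not specify. The reflection texts are the ones quoted in the paper.

```python
import random
import re
from collections import Counter

# Hypothetical miniature lexicons; the real system uses much larger ones.
EMOTION_LEXICON = {
    "anxiety": {"worried", "anxious", "fear", "afraid", "nervous"},
    "sadness": {"sad", "lonely", "cry"},
}
TOPIC_LEXICON = {
    "work": {"job", "work", "boss", "career"},
    "family": {"family", "mother", "father", "kids"},
}
REFLECTIONS = {
    "anxiety": "You mention feelings such as fear and anxiety. What do you think "
               "is the best way for people to cope with these feelings?",
    "work": "How has work changed under COVID? What might you be able to do to "
            "keep your career moving during these difficult times?",
}
GENERIC_REFLECTIONS = [
    "I see. Tell me about a time when things were different.",
    "I hear you. What have you tried in the past that has worked well?",
]


def count_categories(text, lexicon):
    """Count how many tokens of the text fall into each lexicon category."""
    counts = Counter()
    for token in re.findall(r"[a-z']+", text.lower()):
        for category, words in lexicon.items():
            if token in words:
                counts[category] += 1
    return counts


def dominant_category(counts, margin=0.5):
    """Return the top category if its count is at least (1 + margin) times
    the runner-up's count, otherwise None."""
    ranked = counts.most_common(2)
    if not ranked:
        return None
    top_name, top_count = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    return top_name if top_count >= (1 + margin) * runner_up else None


def select_reflection(text, rng=None):
    """Pick a specific reflection for a dominant emotion or topic, falling
    back to a randomly chosen generic reflection."""
    rng = rng or random.Random()
    for lexicon in (EMOTION_LEXICON, TOPIC_LEXICON):  # order is an assumption
        category = dominant_category(count_categories(text, lexicon))
        if category in REFLECTIONS:
            return REFLECTIONS[category]
    return rng.choice(GENERIC_REFLECTIONS)
```

For example, `select_reflection("I am worried and anxious about everything")` returns the anxiety reflection, while a response with no lexicon matches falls through to a generic one.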
After the interview, the system provides visual and textual feedback based on the user's responses and provides links to resources (i.e., mental health resources) appropriate given their main concerns.

The visual feedback consists of four pie charts showing the relative usage of different word categories, including: discussed topics (work, finance, home, health, family, friends and politics), affect (positive, negative), emotions (anger, sadness, fear, anxiety, joy), and pronouns (I, we, other).

The textual feedback includes a comparison with others (to normalize the user's reactions) and interpretations of where the user falls within normalized scales. The system also presents a summary of the most and least discussed topics and how they compare to the average user, along with normalized values for meaningfulness, self-reflection, and emotional tone (using a 0-10 scale) and textual descriptors for the shown scale values. These metrics are inspired by previous work on expressive writing and represent the self-reported meaningfulness, usage of self-referring pronouns, and the difference in positive and negative word usage (Pennebaker, 1997a). Finally, the system provides relevant resources for further exploration (e.g. for the work topic it lists external links to COVID related job resources and safety practices).
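To make the feedback quantities above more concrete, the following sketch computes pie-chart shares from category counts and maps an emotional-tone value onto a 0-10 scale. The paper does not state how the scales are normalized, so the per-word rate and the min-max bounds used here (`low`, `high`) are assumptions chosen only for illustration, as are all function names.

```python
def category_shares(counts):
    """Relative usage of each category, as used for one pie chart.
    Returns an empty dict when no category word was used."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}


def emotional_tone(positive_words, negative_words, total_words):
    """Difference between positive and negative word usage, expressed
    as a rate per word (the per-word rate is an assumption)."""
    if total_words == 0:
        return 0.0
    return (positive_words - negative_words) / total_words


def to_ten_point(value, low, high):
    """Min-max scale a raw value onto 0-10, clipping values outside the
    hypothetical reference range [low, high]."""
    scaled = 10 * (value - low) / (high - low)
    return max(0.0, min(10.0, scaled))


if __name__ == "__main__":
    print(category_shares({"work": 3, "family": 1}))
    tone = emotional_tone(positive_words=6, negative_words=2, total_words=80)
    print(to_ten_point(tone, low=-0.1, high=0.1))
```

In a deployed system the reference range would presumably come from a sample of earlier users, which is how the comparison with others described above could be grounded.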
The system is implemented as a web interface so it is accessible and easy to use. The interface is built with the Django platform and jQuery and uses Python on the backend (Django Software Foundation, 2019).

Before the interaction users are asked to report on a 1-7 scale: (1) [Life satisfaction] how satisfied they are with their life in general, and (2) [Stress before] what is their level of stress. The user then proceeds to the conversational interaction with our system. After the interaction, the user is asked again about (3) [Stress after] what is their level of stress; (4) [Personal] how personal their interaction was; and (5) [Meaningful] how meaningful their interaction was. Once this is submitted, the user can proceed to the feedback page and view details about what they wrote and how their interaction compares to a sample of recent users. The user is finally presented with a list of resources triggered by the topics discussed.

We made an effort to make our system appear human-like to make users more comfortable while interacting with it, although this can vary for different individuals. In future work, we hope to explore individual personas and more sophisticated rapport building techniques. We named our dialogue agent 'C.P.', which stands for Computer Program. This name acknowledges that the user is interacting with a computer, while at the same time it makes the system more human by assigning it a name. When responding to the user, C.P. pauses for a few seconds as if it is thinking and then proceeds to type a response one letter at a time with a low probability of making typos, similarly to how human users would type.

The textual descriptions shown with the feedback scales are predefined for different ranges of each scale.
Rating 1            Rating 2         rho
Life satisfaction   Stress before    -0.261
Life satisfaction   Stress after     -0.166
Life satisfaction   ∆Stress          0.083
Life satisfaction   Personal
Life satisfaction   Meaningful
Meaningful          Stress before
Meaningful          Stress after     -0.226
Meaningful          ∆Stress          -0.202
Meaningful          Personal
Personal            Stress before
Personal            Stress after     -0.067
Personal            ∆Stress          -0.073

Table 1: Spearman correlation coefficients between pairs of ratings for the 174 interactions. Bold indicates significance with p < .

After the system was launched (and up to when we conducted this analysis), we had 174 users interact with the system. We analyze these interactions to evaluate system usefulness, user engagement, and reflection effectiveness.
System Usefulness.
We examine the system's ability to help users cope with COVID-19 related issues by analyzing the different ratings provided by users before and after their interaction with C.P. Throughout this discussion, we use ∆Stress to indicate how the user's stress rating differs before and after the interaction: $\Delta\mathrm{Stress} = \mathrm{Stress}_{\mathrm{after}} - \mathrm{Stress}_{\mathrm{before}}$. Negative values for ∆Stress are therefore an indicator of stress reduction, whereas positive values for ∆Stress reflect an increase in stress.

We start by measuring the Spearman correlation between the different ratings for the 174 interactions with C.P. Results are shown in Table 1.

The strongest correlation we observe is between the personal and meaningful ratings, suggesting that interactions that are more meaningful appear to feel more personal, or vice versa.

We also observe a negative correlation between ∆Stress and the meaningfulness of the interaction (rho = -0.202 in Table 1), suggesting that the interactions that the users found to be meaningful are associated with a reduction in stress.
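As a worked illustration of the analysis above, the sketch below computes ∆Stress per user and a Spearman correlation as the Pearson correlation of average ranks. It uses only the standard library; the ratings are made-up toy values, not data from the study, and the function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation of two equally long rating lists."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy 1-7 ratings for five hypothetical users (not data from the paper).
stress_before = [6, 5, 7, 4, 6]
stress_after = [4, 5, 6, 2, 6]
meaningful = [7, 3, 5, 6, 2]

# Negative values indicate that stress went down after the interaction.
delta_stress = [after - before for before, after in zip(stress_before, stress_after)]

if __name__ == "__main__":
    print(delta_stress)
    print(spearman(meaningful, delta_stress))
```

In practice a library routine such as SciPy's `spearmanr` would also report a p-value; the hand-rolled version is shown only to make the rank-based definition explicit.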
User engagement.
We examine user engagement by analyzing the time users spend in the interaction and the number of words they write throughout the session. Figure 1 shows histograms of the session lengths in the number of words used by the user and of the session duration in seconds. The rightmost column of Table 2 shows Spearman correlation coefficients between user ratings and the length and duration of the sessions. We find significant negative correlations of both Stress before and Stress after with session duration and number of words, suggesting an association between user engagement and lower stress. There is also a weak negative correlation between session duration and ∆Stress.

Figure 1: Histograms of overall user engagement measured by session length and duration.

We also investigate whether there is a relationship between the pre- and post-session ratings and how engaged a user was with each prompt, in terms of the length of their response and the time spent writing it. Table 2 shows Spearman correlation coefficients for these relationships. It appears that Life Satisfaction has no correlation with the length of any prompt response except a potentially weak negative correlation with length on the Major Issues prompt ( p = 0 . ). A lower rating may relate to having more personal challenges to write about.

Stress before has a weak negative correlation with both the number of words used and the time spent in the response to Looking Forward. Higher stress may relate to present concerns, which may make one less inclined to spend time thinking and writing about positive aspects of their future than someone with less stress. We presume this could also be the case for the Grateful prompt, which likewise correlates weakly and negatively with Stress before.

Stress after has a negative correlation with the time spent on every prompt response except Major Issues. This could reflect the fact that those who have a lot to write about major issues in their life also incur high levels of stress.

The Personal rating shows no correlations with the time spent on any of the responses, except potentially Advice to Others ( p = 0 . ). We do observe weak negative correlations between Personal ratings and response lengths on Major Issues and Looking Forward, and potentially on Grateful ( p = 0 . ) and Advice to Others ( p = 0 . ). Perhaps if a user writes more, there is a greater expectation for more personal reflections. We discuss engagement related to reflections more deeply in the next section.

The Meaningful rating shows weak negative correlations with length on Major Issues, Advice to Others, and possibly on Grateful ( p = 0 . ) and Looking Forward ( p = 0 . ). We do not observe a significant correlation with duration on Major Issues or Grateful, but we do observe positive correlations between duration and Looking Forward and Advice to Others. Users who spend more time thinking about advice they would give others facing their issues may find the interaction more meaningful, and may experience benefits from having reflected on their agency in managing their challenges.
Reflection Effectiveness.
To investigate the effectiveness of Expressive Interviewing reflections, we compare the reflections that were triggered for users whose stress decreased to the reflections that were triggered for users whose stress increased. For each of these user groups, we compute the dominance of each reflection as the proportion of times it was triggered out of all reflections triggered. In Figure 2, we compare the dominance of each reflection across these user groups by dividing the reflection dominance in the decreased-stress group by that of the increased-stress group.

Importantly, we observe that all emotion reflections and more topic reflections were triggered at a higher rate for users whose stress decreased, whereas more generic reflections were triggered at a higher rate for users whose stress increased. While we do not presume that increased stress was due to generic reflections, the correspondence between emotion and topic reflections with stress reduction aligns with expectations of effective reflections from Motivational Interviewing: generic reflections and specific reflections resemble simple reflections and complex reflections respectively, as referred to in Motivational Interviewing. While both types of reflections serve a purpose, complex reflections both communicate an understanding of what the client has said and also contribute an additional layer of understanding or a new interpretation for the user, whereas simple reflections focus on the former (Rollnick and Allison, 2004).

                    Major Issues   Grateful   Looking Forward   Advice to Others   Overall
Length in Words
Life Satisfaction   -0.148         -0.121     -0.079            -0.096             -0.070
Personal            -0.156         -0.147     -0.185            -0.134             -0.159
Meaningful          -0.181         -0.148     -0.142            -0.151             -0.151
Stress before       -0.001         -0.076     -0.161            -0.083             -0.151
Stress after        -0.020         -0.135     -0.130            -0.129             -0.177
∆Stress             -0.067         -0.106     -0.039            -0.112             -0.092
Duration in Seconds
Life Satisfaction   -0.057         0.016      0.048             0.091              0.066
Personal            -0.017         -0.041     0.053             0.136              0.066
Meaningful          -0.036         0.099
Stress before       -0.067         -0.252     -0.178            -0.099             -0.198
Stress after        -0.120         -0.241     -0.207            -0.192             -0.233
∆Stress             -0.069         -0.023     -0.052            -0.092             -0.068

Table 2: Spearman correlation coefficients between each rating provided by a user and (top) the length in number of words of the user's response to each particular prompt, and (bottom) duration in seconds of the user's response to each particular prompt, from 174 full interactions. Bold denotes significance with p < .

HEALTH     I'd like to know more about your feelings surrounding your own health and the health of people close to you. What actions can you take to help keep you healthy during these challenging times?
FAMILY     What can you do to keep your family resilient during these tough times?
POLITICS   What is it about the political world that may be hooking you? What are your reactions saying about you?
GEN1       Interesting to hear that. How does what you say relate to your values?
GEN3       I see. Tell me about a time when things were different.

Table 3: Sample topic specific and generic reflections.

Figure 2: The dominance of each reflection triggered for users whose stress decreased divided by each reflection's dominance for users whose stress increased. Scores above 1 (red line) correspond to a decrease in stress; scores below 1 correspond to an increase in stress. See Table 3 for sample reflections, including the GENeric reflections.

In qualitatively analyzing the instances where generic reflections were triggered, we observe that contextual appropriateness seems to be the best indicator of their success (in terms of ability to elicit a deeper thought, feeling, or interpretation) given that the user was invested in the experience. As these generic reflections are selected at random, their contextual appropriateness was inconsistent, illuminating the scenarios in which they are more or less appropriate. For instance, out of the seven times the reflection "Interesting to hear that. How does what you say relate to your values?" was triggered for the increased-stress users, one user expanded on their previous message, one expressed confusion about the question, and another copied and pasted the definition of core values as their response. Two other instances of this reflection were triggered when a user had expressed negative feelings such as worry and feeling lazy, where it appeared misplaced, and the last case was triggered by a message that was not readable. Out of the thirteen times the same reflection was triggered for the decreased-stress group, one user expressed not having much to say, another gave one-word responses before and after, and all others expanded on their previous message in relation to their values or gave a simple response to indicate the degree to which it relates. This reflection appeared more "successful" (based on whether the user expanded on their previous message or values) when it was triggered by a message with more neutral to positive sentiment, such as when the user was expressing what they were looking forward to, or when they had several pieces of advice to offer for a friend in their situation, as opposed to one with more negative sentiment like the messages expressing worry or laziness.

In instances of other generic reflections, we observed that another issue for appropriateness was whether the reflection matched the user's frame of thought in terms of past, present, or future. For instance, the reflection "I see. Tell me about a time when things were different," best matched scenarios when users described thoughts about changes to their daily lives, but not when users described future topics such as what they were looking forward to, nor when they were already describing the past.

Based on our observations of the reflections in action, we have three main takeaways. First, topic and emotion specific reflections are more associated with the group of users whose stress decreased. These reflections are only triggered if the system determines a dominant topic or emotion, which depends on the effectiveness of its heuristics, as well as the amount of detail and context that a user expresses. This leads to the next takeaway, that the system appears to be more effective when users approach the experience with an intention for expression, or conversely it seems less effective when the intent to not engage and express is explicit. Third, the generic reflections were developed with the intent to function in generic contexts, but we learned in practice that some clashed with emotional and situational content or were confusing given the context. As we did observe as many, if not more, successful instances of generic reflections, we are able to contrast these contexts to the unsuccessful contexts, and can develop a heuristic for selecting the generic reflections rather than selecting at random, as well as adapt the language of our current generic reflections to be more appropriate for the Expressive Interviewing setting.
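The dominance comparison behind Figure 2 (each reflection's share of all triggered reflections in the decreased-stress group, divided by its share in the increased-stress group) can be sketched as follows. The reflection logs below are invented toy data and the function names are hypothetical; reflections that never fire in one of the groups are skipped here to avoid dividing by zero, which is a choice of this sketch rather than something the paper specifies.

```python
from collections import Counter


def dominance(triggered):
    """Share of each reflection among all reflections triggered in a group."""
    counts = Counter(triggered)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}


def dominance_ratio(decreased_group, increased_group):
    """Dominance in the decreased-stress group divided by dominance in the
    increased-stress group, for reflections seen in both groups.
    A ratio above 1 means the reflection was relatively more common
    among users whose stress decreased."""
    dec = dominance(decreased_group)
    inc = dominance(increased_group)
    return {name: dec[name] / inc[name] for name in dec.keys() & inc.keys()}


# Toy logs of triggered reflection labels (not data from the study).
decreased = ["ANXIETY", "ANXIETY", "WORK", "GEN1"]
increased = ["ANXIETY", "GEN1", "GEN1", "WORK"]

if __name__ == "__main__":
    for name, ratio in sorted(dominance_ratio(decreased, increased).items()):
        print(f"{name}: {ratio:.2f}")
```

With these toy logs the anxiety reflection gets a ratio of 2.0 and the generic reflection 0.5, mirroring the direction of the pattern reported above without reproducing its values.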
To assess the extent to which our Expressive Interviewing system delivers an engaging user experience, we conduct a comparative study between our system and the conversational mental health app Woebot (Fitzpatrick et al., 2017).

We recruited 12 participants and asked them to interact independently with each system to discuss their COVID-19 related concerns. More specifically, we asked them to use each system for 10-15 minutes and provide evaluative feedback pre- and post-interaction. To avoid cognitive bias, we randomized the order in which each participant evaluated the systems. In addition, we randomized the order in which the evaluation questions are shown.

Before interacting with either system, participants rated their life satisfaction and their stress level. After the interaction, participants reported again their stress level and rated several aspects of their interaction with the system, including ease of use, usefulness (in terms of discussing COVID-19 related issues and motivation to write about it), overall experience, and satisfaction using mainly binary scales. For example, the questions "Did <system> motivate you to write at length about your thoughts and feelings? yes/no" and "How useful was C.P. to discuss your concerns about COVID? useful/not useful" assess whether the system encouraged the user to write about their thoughts and feelings about COVID and whether the system provided guidance for it. Tables 4 and 5 show the percentage of users that provided positive or high scores ( > on a 7-point scale) for each of these aspects after interacting with both systems.

                Woebot   Expressive Interviewing
Stress before   91%      91%
Stress after    73%      64%

Table 4: Percentage of users reporting high levels of stress ( > on a 7-point Likert scale) before and after using Woebot and Expressive Interviewing.

                         Woebot   Expr. Interv.
Ease of Use              82%      91%
Useful                   18%      73%
Motivation to Write      27%      91%
User Satisfaction        36%      36%
Meaningful Interaction   64%      73%
Overall Experience       36%      46%

Table 5: Comparative evaluation of Woebot and Expressive Interviewing. Percentage of users reporting positive/high ratings (with scores > ).

As observed, there are fewer participants reporting high levels of stress after using either system. However, we see a smaller fraction of participants reporting high levels of stress after interacting with Expressive Interviewing, thus suggesting that our system was more effective in helping participants to reduce their stress levels.

Overall, participants reported that Expressive Interviewing was easier to use, more useful to discuss their COVID concerns and motivated them to write more than Woebot. Similarly, users reported a more meaningful interaction and a better overall experience. However, it is important to mention that Woebot was not specifically designed for discussing COVID-19 concerns and it is of more general purpose than our system. Nonetheless, we believe that this comparison provides evidence that a dialogue system such as Expressive Interviewing is more effective in helping users cope with COVID-19 issues as compared to a general purpose dialogue system for mental health.
We followed the suggestions of previous research on automated mental health counseling and adopted the goals of being respectful of user privacy, following evidence based methods, ensuring user safety, and being transparent in system capabilities (Kretzschmar et al., 2019). The practices of motivational interviewing and expressive writing have numerous studies supporting their efficacy (Miller and Rollnick, 2012; Pennebaker and Chung, 2007). The combination of these methods in an interviewing format has not previously been studied and we intend to continue publishing our findings as the user population expands and becomes more diverse. We will also continue to improve our system and assessment.

We have taken efforts to secure user data. We do not ask for identifiers and data is stored anonymously by session ID. The website is secured with SSL. Data is only accessible to researchers directly involved with our study.

Our study has been approved by the University of Michigan IRB.
In this paper, we introduced an interview-style dialogue system called Expressive Interviewing to help people cope with the effects of the COVID-19 pandemic. We provided a detailed description of how the system is designed and implemented.

We analyzed a sample of 174 user interactions with our system and conducted qualitative and quantitative analyses on aspects such as system usefulness, user engagement and reflection effectiveness. We also conducted a comparative evaluation study between our system and Woebot, a general purpose dialogue system for mental health. Our main findings suggest that users benefited from the reflective strategies used by our system and experienced meaningful interactions leading to reduced stress levels. Furthermore, our system was judged to be easier to use and more useful than Woebot when discussing COVID-19 related concerns.

In future work we intend to explore the applicability of the developed system to other health-related domains.
Acknowledgements
This material is based in part upon work supported by the Precision Health initiative at the University of Michigan, by the National Science Foundation (grant
References
Shazia Afzal, Tejas Dhamecha, Nirmal Mukhi, Renuka Sindhgatta, Smit Marvaniya, Matthew Ventura, and Jessica Yarbro. 2019. Development and deployment of a large-scale dialog-based intelligent tutoring system. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers), pages 114–121, Minneapolis, Minnesota. Association for Computational Linguistics.

David A Brodie, Allison Inoue, and David G Shaw. 2008. Motivational interviewing to change quality of life for people with chronic heart failure: a randomised controlled trial. International journal of nursing studies, 45(4):489–500.

Elizabeth J D'Amico, Jeremy NV Miles, Stefanie A Stern, and Lisa S Meredith. 2008. Brief motivational interviewing for teens at risk of substance use consequences: A randomized pilot study in a primary care clinic. Journal of substance abuse treatment, 35(1):53–61.

Django Software Foundation. 2019. Django.

Kathleen Kara Fitzpatrick, Alison Darcy, and Molly Vierhile. 2017. Delivering cognitive behavior therapy to young adults with symptoms of depression and anxiety using a fully automated conversational agent (woebot): a randomized controlled trial. JMIR mental health, 4(2):e19.

Joanne Frattaroli. 2006. Experimental disclosure and its moderators: a meta-analysis. Psychological bulletin, 132(6):823.

Matthew Henderson, Ivan Vulić, Iñigo Casanueva, Paweł Budzianowski, Daniela Gerz, Sam Coope, Georgios Spithourakis, Tsung-Hsien Wen, Nikola Mrkšić, and Pei-Hao Su. 2019. PolyResponse: A rank-based approach to task-oriented dialogue with application in restaurant search and booking. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations, pages 181–186, Hong Kong, China. Association for Computational Linguistics.

Becky Inkster, Shubhankar Sarda, and Vinod Subramanian. 2018. An empathy-driven, conversational artificial intelligence agent (wysa) for digital mental well-being: real-world data evaluation mixed-methods study. JMIR mHealth and uHealth, 6(11):e12106.

Kira Kretzschmar, Holly Tyroll, Gabriela Pavarini, Arianna Manzini, Ilina Singh, and NeurOx Young People's Advisory Group. 2019. Can your phone be your therapist? young people's ethical perspectives on the use of fully automated conversational agents (chatbots) in mental health support. Biomedical informatics insights, 11:1178222619829083.

William R Miller and Stephen Rollnick. 2012. Motivational interviewing: Helping people change. Guilford press.

Daniel Ortega, Dirk Väth, Gianna Weber, Lindsey Vanderlyn, Maximilian Schmidt, Moritz Völkel, Zorica Karacevic, and Ngoc Thang Vu. 2019. ADVISER: A dialog system framework for education & research. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 93–98, Florence, Italy. Association for Computational Linguistics.

James W Pennebaker. 1997a. Writing about emotional experiences as a therapeutic process. Psychological science, 8(3):162–166.

James W Pennebaker and Sandra K Beall. 1986. Confronting a traumatic event: toward an understanding of inhibition and disease. Journal of abnormal psychology, 95(3):274.

James W Pennebaker and Cindy K Chung. 2007. Expressive writing, emotional upheavals, and health. Foundations of health psychology, pages 263–284.

James W Pennebaker and Cindy K Chung. 2011. Expressive writing: Connections to physical and mental health. Oxford University Press.

James W Pennebaker, Martha E Francis, and Roger J Booth. 2001. Linguistic inquiry and word count: Liwc 2001. Mahway: Lawrence Erlbaum Associates, 71(2001):2001.

J.W. Pennebaker. 1997b. Writing about emotional experiences as a therapeutic process. Psychological Science, (8):162–166.

K. Resnicow, PJ Teixeira, and GC Williams. 2017. Efficient allocation of public health and behavior change resources: The "difficulty by motivation" matrix. American Journal of Public Health, 107(1):55–57.

Peter Mark Roget. 1911. Roget's Thesaurus of English Words and Phrases... TY Crowell Company.

Stephen Rollnick and Jeff Allison. 2004. Motivational interviewing. The essential handbook of treatment and prevention of alcohol problems, pages 105–116.

Leigh Small, Deborah Anderson, Kimberly Sidora-Arcoleo, and Bonnie Gance-Cleveland. 2009. Pediatric nurse practitioners' assessment and management of childhood overweight/obesity: Results from 1999 and 2005 cohort surveys. Journal of Pediatric Health Care, 23(4):231–241.

Stefanie P Spera, Eric D Buhrfeind, and James W Pennebaker. 1994. Expressive writing and coping with job loss. Academy of management journal, 37(3):722–733.

Carlo Strapparava, Alessandro Valitutti, et al. 2004. Wordnet affect: An affective extension of wordnet. In Lrec, volume 4, page 40. Citeseer.

V. Vine, R.L. Boyd, and J.W. Pennebaker. 2020. Feelings in many words: Natural emotion vocabularies as windows on distress and well-being. Nature Communications.

Janyce Wiebe, Theresa Wilson, and Claire Cardie. 2005. Annotating expressions of opinions and emotions in language. Language resources and evaluation, 39(2-3):165–210.

Appendices
Figure 3: Average number of words in each response grouped by prompt order, divided by the average number of words in each response overall. An equal number of words is at 1, marked with the line. The order of the prompts is indicated by first letter: A = Advice to Others, G = Grateful, L = Looking Forward.

Figure 4: Histogram of the prompt response durations in seconds.

Figure 5: Histogram of the prompt response lengths in tokens.

| Order | Sessions | Life Sat. | Stress_b | Stress_a | Personal | Meaning | ∆ Stress |
|---|---|---|---|---|---|---|---|
| Advice to Others, Looking Forward, Grateful | 32 | 5.00 | 3.66 | 3.53 | 5.53 | 5.06 | -0.12 |
| Looking Forward, Grateful, Advice to Others | 29 | 5.10 | 3.62 | 3.17 | 5.31 | 5.21 | -0.45 |
| Grateful, Advice to Others, Looking Forward | 36 | 5.75 | 3.03 | 3.03 | 5.39 | 5.53 | 0.00 |
| Looking Forward, Advice to Others, Grateful | 29 | 5.41 | 3.93 | 2.83 | 5.17 | 5.24 | -1.10 |
| Grateful, Looking Forward, Advice to Others | 22 | 5.27 | 3.73 | 3.59 | 5.05 | 4.77 | -0.14 |
| Advice to Others, Grateful, Looking Forward | 25 | 5.04 | 3.76 | 3.40 | 4.56 | 4.92 | -0.36 |
Table 6: Average ratings grouped by order that the prompts appeared. All sessions begin with “