Alexa Depression and Anxiety Self-tests: A Preliminary Analysis of User Experience and Trust
Juan C. Quiroz
Centre for Big Data Research in Health, UNSW; Australian Institute of Health Innovation, Macquarie University; Sydney, Australia; [email protected]
Tristan Bongolan
Macquarie University; Sydney, Australia; [email protected]
Kiran Ijaz
Australian Institute of Health Innovation, Macquarie University; Sydney, Australia; [email protected]
ABSTRACT
Mental health resources available via websites and mobile apps provide support such as advice, journaling, and elements from cognitive behavioral therapy. The proliferation of spoken conversational agents, such as Alexa, Siri, and Google Home, has led to an increasing interest in developing mental health apps for these devices. We present the pilot study outcomes of an Alexa Skill that allows users to conduct depression and anxiety self-tests. Ten participants were given access to the Alexa Skill for two weeks, followed by an online evaluation of the Skill's usability and trust. Our preliminary evaluation suggests that participants trusted the Skill and rated its usability and user experience as average. Usage of the Skill was low, with most participants using the Skill only once. As this is work in progress, we also discuss implementation and study design challenges to inform ongoing work on designing spoken conversational agents for mental health applications.
CCS CONCEPTS
• Human-centered computing → Sound-based input/output.

KEYWORDS
mental health, conversational agent, depression, anxiety, Alexa
ACM Reference Format:
Juan C. Quiroz, Tristan Bongolan, and Kiran Ijaz. 2020. Alexa Depression and Anxiety Self-tests: A Preliminary Analysis of User Experience and Trust. In Adjunct Proceedings of the 2020 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2020 ACM International Symposium on Wearable Computers (UbiComp/ISWC '20 Adjunct), September 12–16, 2020, Virtual Event, Mexico. ACM, New York, NY, USA, 3 pages. https://doi.org/10.1145/3410530.3414374
INTRODUCTION
Mental health problems are a growing global challenge affecting people of all backgrounds, ages, and socioeconomic statuses [19]. To tackle this challenge, mental health resources are increasingly available online and via mobile apps [8, 12, 14]. The proliferation of conversational agents has made them attractive for health applications [11] and for mental health [18]. Notable chatbots that monitor mood and use aspects of cognitive behavioral therapy (CBT) to help users deal with anxiety and depression include Woebot [17] and TESS [4].

The advancement and rapid adoption of spoken conversational agents, such as Siri and Alexa, make them attractive as a channel for providing mental health resources to users [13]. Spoken conversational agents interact with users via spoken natural language. For mental health applications, this requires users to vocalize responses about their mental health status, which is different from typing responses to a chatbot or on a website. One study reported that some people are more likely to have truthful interactions about their mental health with technology than with mental health professionals [16].

This paper presents work-in-progress findings on an Alexa Skill we developed that performs depression and anxiety self-tests. Current Alexa Skills focus on guiding, educating, and helping users manage mental health issues. Examples include managing anxiety and stress through advice sessions (AntiAnxiety, Anxiety Stress), assisting people with depression by providing tasks to boost their mood (Mental Health Day Manager), providing management advice and education for children and teenagers dealing with anger, stress, anxiety, and depression (Mental Health Spies), and offering targeted exercises depending on the situation (work, studies, life) causing the user stress (Mindscape). One study used Alexa to monitor a user's mental health behaviors and symptoms, requiring users to self-report data on sleep, mood, and activity levels [13]. However, further studies are required to provide evidence regarding the efficacy of delivering mental health resources via spoken conversational agents.

Our contributions are: (1) we developed an Alexa Skill that allows users to conduct depression and anxiety self-tests and makes exercise recommendations to alleviate anxiety and depression symptoms; and (2) we conducted a pilot study with 10 participants to assess the usability of our Alexa Skill.

METHODS
In this preliminary study, we recruited 10 participants who owned an Alexa device or had a smartphone on which the Alexa app could be installed. Participants first completed an online questionnaire that collected demographics and conversational agent usage habits. The questionnaire also included depression (PHQ-9 [10]) and anxiety (GAD-7 [7]) self-tests. This familiarized the participants with the depression and anxiety self-tests that they would later complete with our Alexa Skill. In our results, we compare the online self-test scores with the self-test scores completed using our Alexa Skill.

Participants were given access to the Alexa Skill for two weeks and were encouraged to use the Skill regularly, with reminders sent every three days. After two weeks, we asked participants to complete another questionnaire, which included questions to assess usability, user experience, and trust. Ethics approval was granted by Macquarie University's Human Research Ethics Committee for Medical Sciences (ethics reference number 52020662417083).
The Alexa Skill allowed users to express their emotions and conduct self-tests for depression and anxiety, and it made a number of suggestions to improve the user's current state of mind. Each session began with Alexa asking the user how they were feeling. After expressing their emotions, the user was prompted to either complete self-tests for depression and anxiety or to hear self-help exercises. If the user expressed a symptom related to anxiousness or depression, Alexa prompted them to complete a self-test depending on their listed emotions. After the user completed the depression and anxiety self-tests, Alexa stated their depression and anxiety scores. Afterward, Alexa would recommend that the user practice one of five actions (randomly selected): a breathing exercise; a muscle relaxation exercise; lifestyle recommendations such as proper sleep, exercise, and maintaining a healthy diet; journaling; or practicing gratitude.
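As an illustration of this flow, the sketch below applies the standard published scoring of the PHQ-9 and GAD-7 (items answered 0 to 3 and summed, with the usual severity cut-offs) and picks one of the five actions at random. The function names and spoken wording are hypothetical, not our Skill's actual implementation:

```python
import random

# Standard PHQ-9 / GAD-7 scoring: each item is answered 0-3
# ("not at all" .. "nearly every day") and item scores are summed.
# Bands are (exclusive upper bound, severity label).
PHQ9_BANDS = [(5, "minimal"), (10, "mild"), (15, "moderate"),
              (20, "moderately severe"), (28, "severe")]
GAD7_BANDS = [(5, "minimal"), (10, "mild"), (15, "moderate"), (22, "severe")]

ACTIONS = [  # the five randomly selected recommendations described above
    "breathing exercise",
    "muscle relaxation exercise",
    "lifestyle recommendation (sleep, exercise, healthy diet)",
    "journaling session",
    "gratitude practice",
]

def severity(total, bands):
    """Map a summed self-test score to its severity label."""
    for upper, label in bands:
        if total < upper:
            return label
    return bands[-1][1]

def build_response(phq9_items, gad7_items):
    """Compose the spoken response after both self-tests (hypothetical)."""
    phq9, gad7 = sum(phq9_items), sum(gad7_items)
    action = random.choice(ACTIONS)  # one of five actions, chosen at random
    return (f"Your depression score is {phq9} ({severity(phq9, PHQ9_BANDS)}) "
            f"and your anxiety score is {gad7} ({severity(gad7, GAD7_BANDS)}). "
            f"You could try a {action}.")

# Example: all-"several days" answers give PHQ-9 = 9 (mild), GAD-7 = 7 (mild).
print(build_response([1] * 9, [1] * 7))
```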
The questionnaire administered after the two-week period collected the participants' final self-test scores for depression and anxiety, along with their perspectives on the app's usability, user experience, and trust, plus general feedback. We used the following instruments to assess the Alexa Skill: the System Usability Scale (SUS) for system usability [2]; the User Experience Questionnaire (UEQ) for pragmatic and hedonic qualities [5]; and the Technology Trust Questionnaire for trust between the system and the user [6].
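As a reference point for readers unfamiliar with the SUS, it yields a single 0-100 score from ten 5-point items. A minimal sketch of this standard scoring follows (illustrative only; not the study's analysis code):

```python
def sus_score(responses):
    """Standard System Usability Scale scoring.

    `responses` is a list of 10 item ratings on a 1-5 scale.
    Odd-numbered items contribute (rating - 1) and even-numbered items
    contribute (5 - rating); the sum is scaled by 2.5 to give 0-100.
    """
    assert len(responses) == 10
    raw = sum(r - 1 if i % 2 == 0 else 5 - r
              for i, r in enumerate(responses))
    return raw * 2.5

# Example: an all-neutral response pattern scores exactly 50.
print(sus_score([3] * 10))  # 50.0
```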
RESULTS
The average age of the participants was 21.6 years, and 8/10 were male. 9/10 of the participants had used conversational agents before, and 7/10 indicated that they had not previously used an app for mental well-being. After the two-week period, 7/10 participants indicated that they rarely used our Alexa Skill.

Figure 1 shows the anxiety and depression self-test scores completed using a webpage (online) versus using Alexa.

Figure 1: Depression (PHQ-9) and anxiety (GAD-7) self-test scores completed online (webpage) vs using the Alexa Skill.

Given that the majority of the participants rarely used the Alexa Skill, we cannot attribute the change in anxiety or depression scores to the use of the Alexa Skill or to the delivery of the self-tests via Alexa. Figure 2 shows the user experience scores of the Alexa Skill. The participants found the app's attractiveness, dependability, and stimulation to be above average, but perspicuity, efficiency, and novelty were scored below average. Figure 3 shows the trust scores for the Alexa Skill. We conclude that participants mostly trusted the app.
Figure 2: User experience scores of the Alexa Skill for depression and anxiety self-tests.

Figure 3: Trust scores of the Alexa Skill.
DISCUSSION
This pilot showed a willingness from participants to trust Alexa with personal information such as depression and anxiety scores.
The user experience scores of the Alexa Skill also showed that participants considered it lacking in efficiency and novelty. Designing this study, implementing the Alexa Skill, and conducting the pilot highlighted a number of challenges in developing an Alexa Skill for the mental health space.
Cold Start Problem:
First interactions with an Alexa Skill that has multi-step or branching dialogues can be challenging for new users. New users do not know the possible dialogue flows, the intents that have been programmed into the Alexa Skill, or which responses are valid (i.e., how to respond to a question from the Alexa Skill). While Alexa Skills can be deployed with a user manual or by walking the user through a tutorial, complex Skills may require regular use and several trials before the user is comfortable interacting with the Alexa Skill. In our pre-pilot tests, users struggled when interacting with our Alexa Skill because they gave responses that were not captured by the intent recognition rules we had programmed. Careful design is needed to ensure intuitive interactions between the user and the Alexa Skill, as any difficulties are bound to discourage users from future use. This is especially important when designing spoken conversational agent apps for mental health, where users may be experiencing distress when they decide to use the Skill.
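One common mitigation, sketched below using the ASK SDK for Python, is a fallback handler that answers unrecognized utterances with concrete examples of valid responses rather than failing silently. The example phrases are illustrative assumptions, not our Skill's actual utterances:

```python
from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.handler_input import HandlerInput
from ask_sdk_core.utils import is_intent_name
from ask_sdk_model import Response


class FallbackHandler(AbstractRequestHandler):
    """Handles utterances that match no programmed intent by teaching
    the user what they can say, instead of ending the session."""

    def can_handle(self, handler_input: HandlerInput) -> bool:
        return is_intent_name("AMAZON.FallbackIntent")(handler_input)

    def handle(self, handler_input: HandlerInput) -> Response:
        speech = ("Sorry, I didn't catch that. You can say, for example, "
                  "'I feel anxious', 'start the depression test', or "
                  "'give me an exercise'.")
        # Keep the session open and re-prompt with a shorter hint.
        return (handler_input.response_builder
                .speak(speech)
                .ask("Try saying 'I feel anxious' or 'start the test'.")
                .response)
```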
Robust Dialogues:
Alexa Skills with deep dialogues need robust handling of user responses. In our Alexa Skill, the depression and anxiety self-tests involved Alexa reading multiple questions and waiting for a user response to each question. During testing, some users were frustrated by having to repeat the depression or anxiety self-test because Alexa's speech recognition did not understand their response, or because the Skill closed due to an Alexa error or an error in our Skill. If a user is experiencing distress, this type of experience may worsen their mental state or fail to provide the support intended by the app/Skill.
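One way to soften this, sketched hypothetically below, is to keep per-question progress in the Alexa session attributes so that a misheard or invalid answer re-asks only the current question instead of restarting the whole self-test. The question list is abbreviated and the helper name is our own:

```python
# First two PHQ-9 items shown; the instrument has nine in total.
PHQ9_QUESTIONS = [
    "Little interest or pleasure in doing things?",
    "Feeling down, depressed, or hopeless?",
    # ... remaining PHQ-9 items
]

def record_answer(handler_input, item_score):
    """Store one answer and return the next prompt (hypothetical helper).

    Progress lives in the session attributes, so an unrecognized or
    invalid answer leaves the state unchanged and the same question is
    asked again, rather than the whole self-test starting over.
    """
    attrs = handler_input.attributes_manager.session_attributes
    answers = attrs.setdefault("phq9_answers", [])
    if isinstance(item_score, int) and 0 <= item_score <= 3:
        answers.append(item_score)  # PHQ-9 items are answered 0-3
    if len(answers) < len(PHQ9_QUESTIONS):
        return PHQ9_QUESTIONS[len(answers)]
    return f"Your depression score is {sum(answers)}."
```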
Handling expression of emotions:
Our Alexa Skill allowed users to state how they were feeling using single words. This aspect of our Alexa Skill was brittle, since expressing a word not included in our list of emotions meant that Alexa would ask the user to repeat themselves. Ideally, our long-term goal is to support all expressions of emotions, with Alexa being able to acknowledge what the user is experiencing. We believe this acknowledgment can help users who are isolated and experiencing distress, but it poses technical challenges in natural language understanding. Some of the users who tested our Skill also expressed their emotions by referring to physical symptoms associated with an emotion they were feeling, e.g., "sweaty palms" or "heart racing", which poses the additional challenge of knowing the association between a body response and an emotion.
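A simplified sketch of this keyword approach, extended to the symptom phrases we observed, is shown below; the word lists and mappings are illustrative assumptions rather than our Skill's full vocabulary:

```python
# Hypothetical mapping from single-word emotions and physical symptom
# phrases to the two symptom categories the Skill screens for.
EMOTION_MAP = {
    "anxious": "anxiety", "nervous": "anxiety", "worried": "anxiety",
    "sad": "depression", "down": "depression", "hopeless": "depression",
    # physical symptoms users volunteered during testing
    "sweaty palms": "anxiety", "heart racing": "anxiety",
}

def classify_emotion(utterance):
    """Return 'anxiety', 'depression', or None for unrecognized input."""
    text = utterance.lower()
    for phrase, category in EMOTION_MAP.items():
        if phrase in text:
            return category
    return None  # triggers a re-prompt: the brittle case described above

print(classify_emotion("my heart racing won't stop"))  # anxiety
```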
Engagement with the self-test Alexa Skill:
Engagement with health support technologies remains a concern. Related literature on mobile health apps for mental health reports uptake and engagement challenges among users [15]. In our pilot, we observed similar trends, with participants rarely using the Alexa Skill. Enhancing the engagement of conversational agents through storytelling [1], personalization [9], and affect [3] has been proposed in the past. However, further research is needed to understand the design implications of such features for spoken conversational agents in the mental health context.
REFERENCES
[1] Cristina Battaglino and Timothy W. Bickmore. 2015. Increasing the engagement of conversational agents through co-constructed storytelling. In INT/SBG@AIIDE 2015.
[2] Simone Borsci, Stefano Federici, and Marco Lauriola. 2009. On the dimensionality of the System Usability Scale: a test of alternative measurement models. Cognitive Processing 10, 3 (Aug. 2009), 193–197. https://doi.org/10.1007/s10339-009-0268-9
[3] Zoraida Callejas, Ramón López-Cózar, Nieves Ábalos, and David Griol. 2011. Affective conversational agents: the role of personality and emotion in spoken interactions. In Conversational Agents and Natural Language Interaction: Techniques and Effective Practices. IGI Global, 203–222.
[4] Russell Fulmer, Angela Joerin, Breanna Gentile, Lysanne Lakerink, and Michiel Rauws. 2018. Using Psychological Artificial Intelligence (Tess) to Relieve Symptoms of Depression and Anxiety: Randomized Controlled Trial. JMIR Mental Health 5, 4 (Dec. 2018), e64. https://doi.org/10.2196/mental.9782
[5] Andreas Hinderks, Martin Schrepp, Francisco José Domínguez Mayo, María José Escalona, and Jörg Thomaschewski. 2019. Developing a UX KPI based on the user experience questionnaire. Computer Standards & Interfaces 65 (July 2019), 38–44. https://doi.org/10.1016/j.csi.2019.01.007
[6] Jiun-Yin Jian, Ann M. Bisantz, and Colin G. Drury. 2000. Foundations for an Empirically Determined Scale of Trust in Automated Systems. International Journal of Cognitive Ergonomics 4, 1 (March 2000), 53–71. https://doi.org/10.1207/S15327566IJCE0401_04
[7] Pascal Jordan, Meike C. Shedden-Mora, and Bernd Löwe. 2017. Psychometric analysis of the Generalized Anxiety Disorder scale (GAD-7) in primary care using modern item response theory. PLoS ONE 12, 8 (Aug. 2017). https://doi.org/10.1371/journal.pone.0182162
[8] Takeshi Kamita, Tatsuya Ito, Atsuko Matsumoto, Tsunetsugu Munakata, and Tomoo Inoue. 2019. A Chatbot System for Mental Healthcare Based on SAT Counseling Method. Mobile Information Systems (2019).
[9] Ahmet Baki Kocaballi, Shlomo Berkovsky, Juan C. Quiroz, Liliana Laranjo, Huong Ly Tong, Dana Rezazadegan, Agustina Briatore, and Enrico Coiera. 2019. The Personalization of Conversational Agents in Health Care: Systematic Review. Journal of Medical Internet Research 21, 11 (Nov. 2019), e15360. https://doi.org/10.2196/15360
[10] K. Kroenke, R. L. Spitzer, and J. B. Williams. 2001. The PHQ-9: validity of a brief depression severity measure. Journal of General Internal Medicine 16, 9 (Sept. 2001), 606–613. https://doi.org/10.1046/j.1525-1497.2001.016009606.x
[11] Liliana Laranjo, Adam G. Dunn, Huong Ly Tong, Ahmet Baki Kocaballi, Jessica Chen, Rabia Bashir, Didi Surian, Blanca Gallego, Farah Magrabi, Annie Y. S. Lau, and Enrico Coiera. 2018. Conversational agents in healthcare: a systematic review. Journal of the American Medical Informatics Association 25, 9 (2018), 1248–1258. https://doi.org/10.1093/jamia/ocy072
[12] Kien Hoa Ly, Ann-Marie Ly, and Gerhard Andersson. 2017. A fully automated conversational agent for promoting mental well-being: A pilot RCT using mixed methods. Internet Interventions 10 (Dec. 2017), 39–46. https://doi.org/10.1016/j.invent.2017.10.002
[13] Raju Maharjan, Per Bækgaard, and Jakob E. Bardram. 2019. "Hear me out": smart speaker based conversational agent to monitor symptoms in mental health. In Adjunct Proceedings of the 2019 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2019 ACM International Symposium on Wearable Computers (UbiComp/ISWC '19 Adjunct). Association for Computing Machinery, London, United Kingdom, 929–933. https://doi.org/10.1145/3341162.3346270
[14] Robert R. Morris, Stephen M. Schueller, and Rosalind W. Picard. 2015. Efficacy of a Web-based, crowdsourced peer-to-peer cognitive reappraisal platform for depression: randomized controlled trial. Journal of Medical Internet Research 17, 3 (2015), e72. https://doi.org/10.2196/jmir.4167
[16] In Proceedings of the 19th Annual SIGDIAL Meeting on Discourse and Dialogue.
[17] Colleen Stiles-Shields. [n.d.]. Woebot: A Professional Review. One Mind PsyberGuide. https://onemindpsyberguide.org/expert-review/woebot-an-expert-review/
[18] Aditya Nrusimha Vaidyam, Hannah Wisniewski, John David Halamka, Matcheri S. Kashavan, and John Blake Torous. 2019. Chatbots and Conversational Agents in Mental Health: A Review of the Psychiatric Landscape. Canadian Journal of Psychiatry 64, 7 (2019), 456–464. https://doi.org/10.1177/0706743719828977
[19] K. Woodward, E. Kanjo, D. Brown, T. M. McGinnity, B. Inkster, D. J. Macintyre, and A. Tsanas. 2019. Beyond mobile apps: a survey of technologies for mental well-being.