Investigation of the Effect of Fear and Stress on Password Choice (Extended Version)

Abstract

Background. The current cognitive state, such as cognitive effort and depletion, incidental affect, or stress, may unconsciously impact the strength of a chosen password. Aim. We investigate the effect of incidental fear and stress on the measured strength of a chosen password. Method. We conducted two experiments with within-subject designs, using the zxcvbn log10 number of guesses as the measure of password strength and dependent variable. In both experiments, participants were signed up to a site holding their personal data and, for the second run a day later, asked under a security-incident pretext to change their password. (a) Fear. $N_F = 34$ participants were exposed to standardized fear and happiness stimulus videos in random order. (b) Stress. $N_S = 50$ participants were either exposed to a battery of standard stress tasks or left in a control condition, in random order. The zxcvbn password strength was compared across conditions. Results. We did not observe a statistically significant difference in mean zxcvbn password strength for fear (Hedges' $g_{av} = -0.11$, 95% CI $[-0.45, 0.23]$) or for stress versus control (Hedges' $g_{av} = 0.01$, 95% CI $[-0.31, 0.33]$). However, we found a statistically significant cross-over interaction of stress and TLX mental demand. Conclusions. While we observed negligible main effect size estimates for incidental fear and stress, we offer evidence of an interaction between stress and cognitive effort that warrants further investigation.


Tom Fordyce, Sam Green, and Thomas Groß (School of Computing, Newcastle University, UK)

Open Science Framework: https://osf.io/3cd9h. This is the authors' technical-report copy of this work. The definitive version has been published in Proceedings of the 7th Workshop on Socio-Technical Aspects in Security and Trust (STAST'17), ACM Press, December 2018, pp. 3–15, https://doi.org/10.1145/3167996.3168000.

Although their days have recently been declared numbered [3, 4], username and password are still the predominant authentication mechanism. At the same time, password choice is an archetypical security task, in which users are focused on reaching a primary goal (accessing a Web site), while the security of the password is only a secondary goal. Hence, results on users' password choice can be informative beyond the act of choosing a password itself.

While there is a body of research on user habits in password choice [5], recent research has also investigated how the user's current cognitive or affective state impacts the strength of chosen passwords. Imagine a user being depleted after a long day's work or being stressed out after a painful social interaction.

Given that such cognitive and affective states have been found to impact, inter alia, executive function as well as working and declarative memory, effects on security decision making are plausible. This may hold for security specialists fighting a security incident, experiencing stress and cognitive depletion in the process, as much as for off-the-street users, experiencing stress and cognitive depletion in everyday life. The current state of the user could then unconsciously impair the strength of the password choice.

While Florêncio et al. [6] already considered password strategies for finite-effort users, Groß et al. [1] investigated the effect of cognitive effort and depletion [7, 8] on password choice in depth, concluding that cognitive effort is a necessary condition for strong passwords.
By their account, cognitive depletion would impair password choice significantly.

We investigate the effect of fear and stress on password choice as an archetypical security task. We focus on incidental stress, that is, stress that is unrelated to the security task at hand.

It is rather elusive to induce stress independently from cognitive effort, because most stress induction instruments require the participant to exert cognitive effort at the same time. The reason is that, by Baumeister's limited strength model [8], the same limited resource also governs the use of willpower. Stress induction techniques, whether cognitive [9], physical [10], or social [11], require the participant to exert willpower as well, to keep going and to persevere in the task.

For that reason, we have not only designed an experiment that induces stress, but also a second experiment that induces fear. We have chosen to induce fear with a standardized video stimulus, which does not demand any cognitive effort whatsoever.

In this paper, we contribute two studies establishing the effect of incidental fear as well as of stress on password choice. To our knowledge, these are the first studies that induce fear and stress in a password choice scenario. In addition, we investigate the interaction between stress and cognitive effort. We offer a research synthesis between these two studies and the earlier work on cognitive effort in a network meta analysis, to put these results in context of one another.

This study is founded on the principles of parameter and interval estimation [12], that is, we are less interested in null-hypothesis significance testing (NHST). Instead, we seek to quantify the magnitude of effects and to offer confidence intervals that bracket the likely effect sizes in the population. While NHST has received its share of criticism and observed fallacies [13, 14], we aim at gaining robustness by complementing NHST with estimation methods [12].

We use standardized effect size metrics to ascertain the magnitude of observed effects. Such an effect size estimates the difference between means in the population, standardized by the observed variance. A prevalent recommendation in the community [12, 15] for correlated (within-subjects) means is Hedges' $g_{av}$:

$$g_{av} = \left(1 - \frac{3}{4\,df - 1}\right)\left(\frac{M_{\mathrm{diff}}}{(SD_1 + SD_2)/2}\right)$$

where $M_{\mathrm{diff}}$ is the mean of the within-subject differences and $SD_1$, $SD_2$ are the standard deviations in the two conditions.

We further seek to ascertain the confidence interval of the effect size. We consider 95% confidence intervals: if the study were run infinitely often, the intervals would capture the true population effect size in 95% of the runs.

For our correlated-samples case, we use an estimation method recommended by Algina and Keselman [16, 12] to gain an interval estimate on the population effect size. We (a) estimate the noncentrality parameter $\lambda$ with a $t$ distribution, (b) compute the confidence interval on $\lambda$, and (c) finally multiply its confidence limits with the following factor to obtain the final confidence interval on the effect size, where $COV$ is the covariance between the two conditions and $n$ the number of participants:

$$\sqrt{\frac{2\,(SD_1^2 + SD_2^2 - 2\,COV)}{n\,(SD_1^2 + SD_2^2)}}$$

Network Meta Analysis

Meta analysis refers to a set of statistical techniques to combine the results of multiple studies [12, 17].

Network meta analysis (NMA), in particular, specializes in combining the results of studies with different treatments in the same meta analysis simultaneously. For instance, in this paper we consider studies which all examined the impact of certain "treatments," such as stress, fear or cognitive effort, on password strength. Schwarzer et al. [18] offer an introduction to the method itself, while Binod et al. [19] offer a good overview of such techniques available in R. In this work, we use the R package netmeta, which implements frequentist and graph-theoretical techniques following Rücker [20].

In terms of overall user password behavior, the average user is said to have 6.5 passwords, each shared across 3.9 different sites; each user has 25 accounts requiring passwords and types 8 passwords per day [5].

It has been argued that recall in password authentication itself is a humanly impossible task, because non-meaningful items are inherently difficult to remember [21]. This is aggravated by typical password policies asking users to comply with arcane procedures, such as monthly password resets. Such policies cause users to feel frustrated and security-fatigued [22]. Consequently, users naturally employ alternative strategies, such as writing passwords down, incrementing the number in the password at each reset [23], storing passwords in electronic files, and reusing or recycling old passwords [22].

Password reuse in particular, which has been observed both as an overall tendency and as an individual preference, has received further attention in recent research [24].

While it is possible to create strong and meaningful passwords using pseudo-random combinations of letters, numbers and characters that are meaningful only to the owner [25], qualitative research found that four to five passwords are the most a typical user can be expected to use effectively [23].

There are controversies around how to measure password strength soundly and reliably [26]; earlier methods such as the NIST password entropy heuristic (NIST Special Publication 800-63 [27]) have received criticism. A recent research direction is to estimate, in a practicable way, the number of guesses an adversary would need to break the password [28, 2]. In this work, we considered the Password Guessability Service (PGS) [28] hosted by CMU as well as the Dropbox password meter zxcvbn [2], both of which output an estimate of the number of guesses. We chose zxcvbn as the final measurement instrument.

Affect is the experience of an emotion or feeling, where we also consider the observable affect, that is, behavior serving as an indicator of affect.

While there exist a number of conceptualizations of affect, mood and emotions as well as a body of mature research in psychology [29, 30], we consider Russell's core affect conceptualization [31] as a guiding work. Russell considers the dimensions activation-deactivation and pleasure-displeasure as foundational.

We make a distinction between two kinds of affect [32]:
• integral affect (also task-related affect) refers to the experienced feelings with respect to a stimulus, and
• incidental affect refers to feelings such as mood states that are independent of a stimulus but can be misattributed to it or can influence decision processes.

For our enquiry into fear and stress, Russell's core affect model is especially interesting, as it allows us to classify distress and fear in the common quadrant of displeasure/activation.

Fear and Fear Appeals

Affects have been considered in security mostly concerning fear, especially in relation to fear appeals.

Fear appeals [33, 34, 35] are messages designed to motivate a certain behavior by eliciting fear. While fear appeals operate with integral (task-related) fear, we consider incidental fear and stress in this work.

Boss et al. [36] gave an overview of fear appeals research in information security, pointing out that most prior studies in the space did not measure fear directly. In general, Ruiter et al. [37] pointed out that coping information is bound to be more important than risk warnings and fear arousal in yielding a protection motivation.

There has been considerable research on affect elicitation. For instance, the first ten chapters of the Handbook of Emotion Elicitation and Assessment [38] are concerned with different elicitation methods. Not all elicitation methods are created equal, though: Westermann et al. found considerable differences in the effectiveness and validity of mood induction procedures (MIPs) [39]. In this study, we focus on Film MIPs without instructions, a category recommended by Westermann et al. From inspecting this past research, we conclude that the chosen MIPs are sound to elicit affect in participants.

Affect elicitation with stimulus films has received a systematic treatment in affective psychology [40]; our stimulus videos are drawn from Rottenberg et al.'s work [41]. Here, we find particular recommendations for video segments validated for designated target emotions, such as fear or happiness. Among others, Rottenberg et al. [41, p. 103] reported that fear is a challenging emotion to elicit discretely. In their experiments with stimulus films, such as The Silence of the Lambs, they found that the films also elicited tension and interest in equal measure; however, they argued that the confluence of fear, interest and tension is indeed natural.

We note that affect elicitation with stimulus films has been validated in within-subject experiment designs [41]. The variable "within-subject design" was taken into account in a comprehensive meta analysis on MIPs [39], where the impact of this variable was not found to be statistically significant (r = .08 for film/story MIPs).

While Coan and Allen [38] give an overview of a range of affect measurement methods, we focus on self-report instruments such as the Positive and Negative Affect Schedule (PANAS-X) [42]. The questionnaire primarily measures the valence of reported affect (positive or negative) and its content (e.g., fear or joviality). The questionnaire was designed to reliably measure affect while still being easily administered. The full 60-item schedule usually takes participants about 10 minutes.

Selye [43] originally defined stress as "the non-specific response of the body to any demand made upon it," and distinguished between eustress and distress, hyperstress and hypostress.

It was postulated that some levels of stress may improve performance and that performance will deteriorate once an optimal range of arousal is passed. With respect to arousal in habit formation, such a relationship was also formulated as the Yerkes-Dodson law [44].

Stress can be elicited in experiments predominantly by cognitive, physical or social instruments. Liao and Carey [9] offer an overview of lab stress elicitation methods.

Existing experimental protocols often combine validated cognitive, physical and social stress tasks, where such tasks have been used to induce psychological stress and elicit a cardiovascular response [9]. In addition, there exist instruments, such as the Trier Social Stress Test (TSST) [11], which are elaborate protocols to induce stress in multiple stages.

While stress can be measured psychophysiologically from heart rate variability and skin conductance, a number of instruments have been proposed to measure stress in self-report questionnaires. Some researchers used affective instruments, such as the State-Trait Anxiety Inventory (STAI) [45], to gauge stress. Others developed and used specialized stress scales, such as the Dundee Stress State Questionnaire (DSSQ) [46] or its short form, the Short Stress State Questionnaire (SSSQ) [47, 48].

Research on cognitive effort has a long history in psychology, starting, for instance, from Kahneman's work on effort and attention [7].

Baumeister et al. [8] first proposed that human beings have a limited store of cognitive energy, formulated in the limited strength model of cognitive effort. Willpower and self-control are said to draw from this inner resource, as do cognitively hard tasks. Examples include controlling attention, emotions, impulses, thoughts and cognitive processing, choice and volition, and social processing [49]. In general, all tasks that are cognitively effortful draw from the limited cognitive energy.

Cognitive effort as a construct can, for instance, be measured by tasks that require cognitive effort themselves. In those examples, the error rate on the tasks indicates cognitive depletion. These measurement methods have the disadvantage that they change the participants' state by inducing cognitive depletion themselves. Baumeister et al. consequently proposed to use the Brief Mood Introspection Scale (BMIS) [50, 8] or a short form [51] as a proxy to measure cognitive effort, through items such as "being tired" and "being worn-out."

Other research focused on the measurement of the cognitive effort needed for a task, the task load. A notable measurement instrument along these lines that has stood the test of time is the NASA Task Load Index (TLX).

RQ 1 (Study 1: Fear). To what extent do the elicited affects happiness and fear impact password strength?

Table 1 gives an overview of the operationalization of this research question. As independent variable (IV), we elicited affect with two levels: Fear and Happiness.

We intend to check that the manipulation was successful by evaluating the Positive and Negative Affect Schedule (PANAS-X) [42] on fear and joviality. The null hypothesis of this manipulation check is $H_{mc,F,0}$: There is no mean difference in either fear or joviality between conditions. We call the manipulation successful if this null hypothesis is rejected.

The null hypothesis of the overall experiment is $H_{F,0}$: There is no mean difference in zxcvbn log10 guesses between conditions. The corresponding alternative hypothesis $H_{F,1}$ is: The zxcvbn log10 guesses differ between conditions.
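The comparison of zxcvbn log10 guesses between two within-subject conditions is quantified with the estimation approach described earlier: Hedges' $g_{av}$ plus an Algina–Keselman interval, obtained by (a) computing the observed paired $t$, (b) bracketing the noncentrality parameter $\lambda$, and (c) rescaling its limits. The following is a minimal sketch of that procedure in Python, not the authors' R analysis. It uses only the standard library, evaluates the noncentral $t$ CDF by numerical integration, and assumes $df = n - 1$ in the bias correction. All function names and the example data are hypothetical.

```python
import math
from statistics import mean, stdev


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def chi_pdf(w, k):
    """Density of the chi distribution with k degrees of freedom."""
    if w <= 0.0:
        return math.sqrt(2.0 / math.pi) if k == 1 else 0.0
    return math.exp((k - 1) * math.log(w) - w * w / 2.0
                    - (k / 2.0 - 1.0) * math.log(2.0) - math.lgamma(k / 2.0))


def nct_cdf(t, df, ncp, steps=800):
    """P(T <= t) for a noncentral t via T = (Z + ncp) / (W / sqrt(df)), W ~ chi(df)."""
    h = (math.sqrt(df) + 10.0) / steps
    total = 0.0
    for i in range(steps + 1):
        w = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)  # Simpson weights
        total += weight * chi_pdf(w, df) * norm_cdf(t * w / math.sqrt(df) - ncp)
    return total * h / 3.0


def solve_ncp(t_obs, df, target):
    """Bisection for the ncp at which P(T <= t_obs) equals target."""
    lo, hi = t_obs - 10.0, t_obs + 10.0
    for _ in range(50):
        mid = (lo + hi) / 2.0
        if nct_cdf(t_obs, df, mid) > target:
            lo = mid  # the CDF decreases as ncp grows
        else:
            hi = mid
    return (lo + hi) / 2.0


def g_av_with_ci(cond1, cond2, alpha=0.05):
    """Hedges' g_av for cond2 - cond1 and its Algina-Keselman (1 - alpha) CI."""
    n = len(cond1)
    df = n - 1
    diffs = [b - a for a, b in zip(cond1, cond2)]
    m_diff = mean(diffs)
    m1, m2 = mean(cond1), mean(cond2)
    sd1, sd2 = stdev(cond1), stdev(cond2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(cond1, cond2)) / df
    # Point estimate: bias-corrected mean difference over the average SD.
    g = (1.0 - 3.0 / (4.0 * df - 1.0)) * m_diff / ((sd1 + sd2) / 2.0)
    # (a) observed paired t statistic
    t_obs = m_diff / (stdev(diffs) / math.sqrt(n))
    # (b) confidence limits on the noncentrality parameter lambda
    lam_lo = solve_ncp(t_obs, df, 1.0 - alpha / 2.0)
    lam_hi = solve_ncp(t_obs, df, alpha / 2.0)
    # (c) rescale the limits onto the effect-size metric
    factor = math.sqrt(2.0 * (sd1**2 + sd2**2 - 2.0 * cov)
                       / (n * (sd1**2 + sd2**2)))
    return g, (lam_lo * factor, lam_hi * factor)


if __name__ == "__main__":
    # Hypothetical log10 guesses for ten participants in two conditions.
    control = [10.2, 12.5, 9.8, 14.1, 11.0, 13.3, 10.7, 12.0, 9.5, 11.8]
    treated = [11.0, 12.9, 10.6, 14.0, 12.1, 13.9, 11.5, 12.2, 10.4, 12.6]
    g, (lo, hi) = g_av_with_ci(control, treated)
    print(f"g_av = {g:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Swapping the argument order flips the sign of $g_{av}$ and mirrors the interval. This is a quick sanity check on such an implementation.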

RQ 2 (Study 2: Stress). To what extent does elicited stress impact password strength?

Table 2 operationalizes this research question. As independent variable (IV), we elicited stress with two levels: Stress and Control.

We check the success of the manipulation with two kinds of instruments: the Short Stress State Questionnaire (SSSQ) [48], considering overall stress and distress, as well as the State-Trait Anxiety Inventory (STAI) [45], considering state anxiety. The null hypothesis of this manipulation check is $H_{mc,S,0}$: There is no mean difference in stress, distress, or state anxiety between conditions. We call the manipulation successful if this null hypothesis is rejected.

The null hypothesis of the overall experiment is $H_{S,0}$: There is no mean difference in zxcvbn log10 guesses between the stressed and control conditions. The corresponding alternative hypothesis $H_{S,1}$ is: The zxcvbn log10 guesses differ between conditions.

Table 1: Operationalization of Study 1: Effect of Fear on Password Strength.

| | Levels | Instrument | Intervention/Variable |
|---|---|---|---|
| IV: Affect | Fear | Stimulus Video [41] | The Silence of the Lambs |
| | Happiness | | When Harry Met Sally |
| IV Check | Fear | PANAS-X [42] | fear |
| | Happiness | | joviality |
| DV: Pwd Strength | | zxcvbn [2] | log10 Guesses |
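Each manipulation check covers several outcomes (fear and joviality in Study 1; stress, distress and state anxiety in Study 2). The analysis treats each set as one test family with a Bonferroni-Holm correction. The following is a minimal Python sketch of the step-down adjustment. The function name is hypothetical, and the p-values in the example are made up for illustration.

```python
def holm_adjust(p_values):
    """Return Bonferroni-Holm adjusted p-values in the original order.

    The i-th smallest of m p-values is multiplied by (m - i), capped at 1,
    and a running maximum keeps the adjusted values monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[i] = running_max
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values for a three-outcome manipulation check.
    raw = {"stress": 0.012, "distress": 0.030, "state anxiety": 0.004}
    for (name, p), p_mc in zip(raw.items(), holm_adjust(list(raw.values()))):
        print(f"{name}: p = {p:.3f}, p_MC(3) = {p_mc:.3f}, reject: {p_mc < 0.05}")
```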

For reproducibility and scientific integrity, the study has been registered at the Open Science Framework (OSF, osf.io/3cd9h). The comprehensive report [54] of all analyses computed is registered at the OSF, as well. Analyses, graphs and statistical reporting in this paper were computed directly from the data using the R package knitr.

Both studies were conducted simultaneously as within-subject experiments with two conditions, that is, each participant went through both conditions. We determined the order in which each participant was exposed to the conditions by constrained random block assignment, while maintaining balanced sub-sample sizes.

We stress that affect elicitation and mood induction procedures have been experimentally and meta-analytically validated for within-subject settings [41, 39]. To ensure that the MIP stimulus of a preceding session does not confound a subsequent session, we left a break of at least 24 hours between sessions.

We have further chosen to run the studies as lab experiments and not as studies on Amazon Mechanical Turk (AMT), primarily because the stress manipulations required the physical presence of the participants. Given that we induced stress as well as fear, we deemed it ethical that the experiments be conducted by a physically present experimenter, who could offer information about the experiment in person, allow participants to withdraw from the experiment in dignity, and ensure that participants did not leave the experiment overly disturbed.

The participants for both studies were recruited independently from one another. We chose to run the experiments as within-subject designs with the given sample sizes to gain at least 80% power for medium effect sizes, after an a priori power analysis: a two-tailed dependent-samples t-test requires a sample size of N = 34 to reach 80% power at a significance level of α = .05.

In both studies, participants were asked to register on a Web site, which stores personal information as well as sensitive data about them, such as personality traits and psychometric test results. The participants were made aware of the sensitivity of the data. There was no password policy imposed on the participants. Participants were asked to "choose" a password, that is, they were not asked to refrain from reusing prior passwords.

Table 2: Operationalization of Study 2: Effect of Stress on Password Strength.

| | Levels | Instrument | Intervention/Variable |
|---|---|---|---|
| IV: Stress | Stress | Serial Subtraction Task [9] | cont. sub. 7 from 9095, 1.5 min; cont. sub. 13 from 5245, 1.5 min |
| | | Isometric Handgrip Task [10] | 30% max strength, 2.5 min |
| | | Social Stress akin to TSST [11] | results judged; fail: start over |
| | Control | Balanced: Serial addition | cont. add 7 to 9095, 1.5 min; cont. add 13 to 5245, 1.5 min |
| | | Balanced: Isometric Handgrip | 30% max strength, till discomfort |
| IV Check | Total Stress | SSSQ [47, 48, 46] | stress |
| | Distress | | distress |
| | Anxiety | STAI [45] | state anxiety |
| | Cognitive Effort | Task Load Index (TLX) [52, 53] | tlx |
| | Mental Demand | | tlx mental |
| DV: Pwd Strength | | zxcvbn [2] | log10 Guesses |
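The a priori power analysis can be reproduced with a short calculation. The following is a minimal Python sketch under stated assumptions: a medium within-subject effect $d_z = 0.5$ (standardized by the SD of the differences), a two-tailed test, and α = .05. It computes the power of the dependent-samples t-test from the noncentral t distribution by numerical integration, using only the standard library. The source does not name the tool the authors used, and all function names here are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def chi_pdf(w, k):
    """Density of the chi distribution (square root of a chi-square with k df)."""
    if w <= 0.0:
        return math.sqrt(2.0 / math.pi) if k == 1 else 0.0
    return math.exp((k - 1) * math.log(w) - w * w / 2.0
                    - (k / 2.0 - 1.0) * math.log(2.0) - math.lgamma(k / 2.0))


def t_tail(c, df, ncp, upper=True, steps=400):
    """P(T > c) if upper else P(T < c), for a noncentral t variable.

    Uses T = (Z + ncp) / (W / sqrt(df)) with Z standard normal and W ~ chi(df),
    integrating over W with the composite Simpson rule.
    """
    h = (math.sqrt(df) + 10.0) / steps
    total = 0.0
    for i in range(steps + 1):
        w = i * h
        z = c * w / math.sqrt(df) - ncp
        p = 1.0 - norm_cdf(z) if upper else norm_cdf(z)
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * chi_pdf(w, df) * p
    return total * h / 3.0


def t_crit(df, alpha):
    """Two-tailed critical value of the central t distribution (bisection)."""
    lo, hi = 0.0, 50.0
    for _ in range(40):
        mid = (lo + hi) / 2.0
        if 2.0 * t_tail(mid, df, 0.0) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def paired_t_power(n, d_z, alpha=0.05):
    """Power of a two-tailed dependent-samples t-test with n pairs."""
    df = n - 1
    ncp = d_z * math.sqrt(n)
    c = t_crit(df, alpha)
    return t_tail(c, df, ncp) + t_tail(-c, df, ncp, upper=False)


def minimal_n(d_z, alpha=0.05, target=0.80):
    """Smallest number of pairs whose power reaches the target."""
    n = 2
    while paired_t_power(n, d_z, alpha) < target:
        n += 1
    return n
```

Under these assumptions, `minimal_n(0.5)` yields 34 pairs, consistent with the sample size stated above. Dedicated tools, such as R's pwr package (`pwr.t.test` with `type = "paired"`), perform the same computation.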

The experiments were conducted over two days, to let the effects of prior manipulations subside. The break between both runs was at least 24 hours, but no more than five days. All participants returned for the second run. No participant withdrew from the study.

When the participants returned on the second day, they were informed, under the pretext of a security incident, that they needed to set a new password for their personal data. The system enforced that they could not repeat exactly the same password.

For both experiments, we first elicited an affective state, then asked the participants to register an account with a password, and finally administered a post-task manipulation check to evaluate how well the affect elicitation worked. We note that the manipulation check was deliberately placed after the password task, to ensure that the affect during the task was at least as strong as the one measured.

The participants were only told in the debriefing of each study that the experiment's true purpose concerned password strength. Study 2 included a debriefing questionnaire on the modalities of the password choices made.
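The paper states only that the system rejected an exact repetition of the previous password; it does not say how passwords were stored. The following minimal sketch assumes salted PBKDF2-HMAC-SHA256 hashes and shows how such a check can work without keeping the plaintext. The iteration count and the function names are hypothetical.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # hypothetical work factor, not taken from the paper


def hash_password(password, salt=None):
    """Return (salt, digest) for a password using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def is_same_password(candidate, stored):
    """Compare a candidate against a stored (salt, digest) pair in constant time."""
    salt, digest = stored
    _, candidate_digest = hash_password(candidate, salt)
    return hmac.compare_digest(candidate_digest, digest)


def renew_password(old_record, new_password):
    """Accept a renewal only if the new password is not exactly the old one."""
    if is_same_password(new_password, old_record):
        raise ValueError("The new password must differ from the previous one.")
    return hash_password(new_password)
```

This check blocks only exact repetition. A variant, such as an old password with an incremented trailing digit, is accepted, which matches the absence of any further password policy in the studies.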

Password strength was measured as the log10 number of password guesses, as evaluated with an offline zxcvbn [2] with standard dictionaries.

Both studies followed the institution's ethics guidelines and were approved in its ethics process.

Affect Elicitation.

Participants were exposed to mild discomfort in the form of stress or fear, yet not more so than in daily life. The stimuli used have been validated in affective psychology and stress research and found appropriate for use in experiments with adult participants.

The experiments were conducted in a face-to-face setting to ensure the experimenter could offer aftercare should the participants feel uncomfortable or upset after a session.

Informed Consent and Opt-Out.

Participants were informed of the requirements (two lab sessions) of the studies in advance. Participants received a consent form, could ask questions before and during the experiments, and were informed that they could withdraw from the experiment at any time. All participants were able to exercise informed consent.

Deception.

The participants were deceived in that we did not disclose that our main interest was the password choice. Instead, the personality traits, affect and stress measurements were presented as part of a personality profile Web site. Participants received a debriefing in which the true purpose of the experiment was explained.

Compensation.

Participants were reimbursed for their time spent in the experiment at the institution's customary rate for lab experiments. We set the policy that participants would be reimbursed even if they chose to withdraw from the study.

Data Protection.

We ensured data protection and privacy of the participants' sensitive information. Records were anonymized and stored on an encrypted hard disk. The zxcvbn metrics on the participants' passwords were computed offline.

The participants were recruited through e-mailing lists within the university, targeting students in computer science, mathematics, and statistics, as well as through flyers. The sample consisted of 25 students and 9 participants from a range of professions, incl. nursing, teaching, and management. A total sample of $N_F = 34$ was recruited, 11 women and 23 men. We note that for mood induction procedures, gender has not been found to be a statistically significant confounding variable [39].

88% of the participants were Caucasian, 9% from Pacific or Asian islands, and one of mixed Asian/Caucasian ethnicity. The majority of participants had a BSc degree (79%), 12% a high school degree, 3% an MSc and 9% a PhD. Table 3 shows the distribution of gender and age (M = …, SD = …). […] a priori with a sensitivity to detect medium differences between dependent means of Cohen's d = ….

Table 3: Demographics of Study 1: Fear

| Characteristic | Value |
|---|---|
| Gender: Female | 32% |
| Gender: Male | 68% |
| Age | |

Figure 1: Experiment design of Study 1: Fear. [Diagram stages: Prequestions: Big Five Inventory, Demographics; Manipulation: Fear Video / Happiness Video; Check: PANAS-X Fear/Joviality; Personality Site: First-Time Registration, Post-Incident Renewal; Debriefing: Password Strategies, AfterCare.]

We offer an overview of the exact procedure in Figure 1. Participants were assigned by constrained randomization to be exposed to either the fear or the happiness stimulus in the first session.
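Permuted-block randomization is one common way to realize a constrained random assignment of condition orders with balanced sub-samples. The following sketch assumes that scheme with a block size of two. The paper does not specify the block size or the random number generator, and the function name is hypothetical.

```python
import random
from collections import Counter


def assign_orders(participant_ids, orders=("AB", "BA"), block_size=2, seed=None):
    """Assign each participant a condition order by permuted-block randomization.

    Every block contains each order equally often, so the sub-samples stay
    balanced; they are exactly balanced when the number of participants is a
    multiple of block_size.
    """
    if block_size % len(orders) != 0:
        raise ValueError("block_size must be a multiple of the number of orders")
    rng = random.Random(seed)
    assignment = {}
    block = []
    for pid in participant_ids:
        if not block:  # start a fresh, shuffled block
            block = list(orders) * (block_size // len(orders))
            rng.shuffle(block)
        assignment[pid] = block.pop()
    return assignment


if __name__ == "__main__":
    # Hypothetical Study 1 labels: which stimulus is shown in the first session.
    plan = assign_orders(range(1, 35), orders=("fear first", "happiness first"), seed=1)
    print(Counter(plan.values()))
```

A larger block size makes the next assignment harder for the experimenter to predict. The cost is a possible imbalance of up to half a block when the sample ends mid-block.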

Participants were asked to watch standardized stimulus videos that induce either happiness or fear [41] from the Handbook of Emotion Elicitation and Assessment [38]. To elicit fear, we selected a specified scene from The Silence of the Lambs, in which an FBI agent is trying to find a psychopath, the movie's villain, in a dark basement. To elicit happiness, we selected a specified scene from When Harry Met Sally, in which the two main characters are sitting in a café talking about a fake orgasm and Sally starts to fake an orgasm to prove her point. The effectiveness of both stimuli has been documented in the corresponding research in affective psychology [41].

The manipulation check allowed us to test whether the manipulation was successful, as well as to compute a correlation between the measured strength of the affect and the password strength. As post-task manipulation check, we administered the Positive and Negative Affect Schedule (PANAS-X) [42] as a self-report questionnaire to evaluate the participants' current affects.

The PANAS-X scale is based on 5-point Likert items anchored on 1 – "very slightly or not at all," 2 – "a little," 3 – "moderately," 4 – "quite a bit," and 5 – "extremely." We restricted PANAS-X to the items pertaining to the variables na (negative affect), pa (positive affect), fear and joviality, which yields a 40-item questionnaire. We anchored PANAS-X on "at the present moment."

The $N_S = 50$ participants were recruited from university students, mostly with a computer science background. Table 4 includes their gender and age distribution (M = …, SD = …). […] a priori with a sensitivity to detect differences between means of dependent groups of Cohen's d = ….

Table 4: Demographics of Study 2: Stress

| Characteristic | Value |
|---|---|
| Gender: Female | 52% |
| Gender: Male | 48% |
| Age | |

Figure 2: Experiment design of Study 2: Stress. [Diagram stages: Prequestions: Big Five Inventory, Demographics; Manipulation: Stress Tasks / Control Tasks; Check: State-Trait Anxiety, Short Stress State; Personality Site: First-Time Registration, Post-Incident Renewal; Debriefing: Password Strategies, AfterCare.]

Figure 2 depicts the experiment design and procedure for the second study on stress. Participants were assigned by constrained randomization to be exposed to the stress condition either in the first or the second session.

The manipulation in the experiment condition consisted of two stressful tasks combined with induction of social stress. In the control condition, the participants completed balanced tasks which did not induce stress.

Serial Subtraction Task.

We induced cognitive stress by the serial subtraction task, one of the most used tasks to induce psychological stress and elicit a cardiovascular response [9]. In the experiment condition, the participant is asked to continually subtract a one- or two-digit prime number from a four-digit number. The participants completed two serial subtraction tasks, counting down by 7 from 9095 and by 13 from 5245, respectively. Each task lasted 1.5 min. When a participant miscomputed a value, the participant was asked to start over. This was framed as a test of cognitive ability.

In the control condition, participants were asked to continuously add 7 or 13 to 9095 and 5245, respectively, for 1.5 min. If they made a mistake, they were told the correct result, but were not asked to start over.
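The two conditions differ mainly in how errors are handled: a restart from the initial number in the stress condition versus a correction followed by continuation in the control condition. The following minimal sketch replays a participant's answers under either rule. The function names and the response sequences in the tests are hypothetical.

```python
def expected_sequence(start, step, length):
    """The first `length` correct answers, e.g. 9088, 9081, ... for 9095 and -7."""
    return [start + step * k for k in range(1, length + 1)]


def score_serial_task(start, step, responses, restart_on_error=True):
    """Replay a serial arithmetic task and count correct answers and errors.

    step < 0 models serial subtraction (stress condition: start over on error);
    step > 0 models serial addition (control condition: the experimenter states
    the correct result and the participant continues from it).
    """
    current = start
    correct = errors = 0
    for answer in responses:
        expected = current + step
        if answer == expected:
            current = expected
            correct += 1
        else:
            errors += 1
            # Restart from the initial number, or continue from the correct value.
            current = start if restart_on_error else expected
    return correct, errors
```

In the stress rule, a single slip discards all progress. This is one reason the task is both cognitively and socially demanding.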

Isometric Handgrip Task.

The isometric handgrip task has been used to induce physical stress in terms of cardiovascular response [10]. An electronic hand dynamometer was used to measure the participant's maximal grip strength.

In the experiment condition, the participants were asked to hold the grip at no less than 30% of their maximal grip strength for 2.5 min. Should they go under 30%, the experimenter would sternly tell them: "Keep it above [30% of their max]." The experimenter took notes of the participants' performance.

In the control condition, the participants were asked to hold the grip at 30% max strength until they started to feel uncomfortable and, then, to stop. No notes were taken in the control condition.
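The following minimal sketch implements the monitoring rule of the stress condition. It assumes the dynamometer delivers a stream of readings in the same unit as the maximal grip strength; the sampling and the function name are hypothetical. The function reports where the grip first drops below 30% of the maximum, that is, the moments at which the experimenter would issue a warning and take a note.

```python
def slip_episodes(readings, max_strength, threshold=0.30):
    """Return the start indices of episodes in which the grip falls below threshold.

    A new episode begins each time a reading drops under threshold * max_strength
    after having been at or above it; consecutive low readings belong to the
    same episode.
    """
    floor = threshold * max_strength
    starts = []
    below = False
    for i, reading in enumerate(readings):
        if reading < floor and not below:
            starts.append(i)
        below = reading < floor
    return starts
```

Counting episodes rather than individual low samples matches the described protocol, in which one warning is issued per slip rather than per sample.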

Social Stress.

Part of the protocol was to induce social stress in the experiment condition, akin to the methods of the Trier Social Stress Test (TSST) [11]. While we did not replicate the TSST exactly, we took inspiration from it. In the serial subtraction task, participants were told that their results would be reviewed with the principal investigator of the study. The TSST also uses serial subtraction and, similar to the TSST, participants were asked to start over when they made a mistake.

In the isometric handgrip task's experiment condition, the experimenter stood behind the seated participants. The experimenter issued stern warnings and made notes whenever the participant slipped under 30% of the max grip strength.

We used two instruments to check the stress of participants: the Short Stress State Questionnaire [47, 48] and the State-Trait Anxiety Inventory [45].

Short Stress State Questionnaire.

The Short Stress State Questionnaire (SSSQ) [48] is a self-report questionnaire on state stress, an abridged version of the Dundee Stress State Questionnaire (DSSQ) [46]. The short version contains 24 questions, formalized as 5-point Likert items anchored on 1 – "Strongly Agree," 2 – "Agree," 3 – "Neither Agree nor Disagree," 4 – "Disagree," and 5 – "Strongly Disagree." Questions referred to the time "in this moment." The SSSQ contains three factors: engagement, distress, and worry. We considered the overall stress, the sum of these three, as well as distress.

State-Trait Anxiety Inventory.

The State-Trait Anxiety Inventory for Adults (STAI-AD) [45] is a 40-question self-report questionnaire. We are interested in the temporary construct of state anxiety, that is, “how you feel right now.” It uses 4-point Likert items anchored on 1 – “Not At All,” 2 – “Somewhat,” 3 – “Moderately So,” and 4 – “Very Much So.”

NASA Task Load Index.

The NASA Task Load Index (TLX) [52, 53] is a standardized and validated instrument to measure the overall task load (strongly related to cognitive effort). It includes sub-scales for mental, physical, and temporal demand as well as performance, effort, and frustration level. It measures these sub-scales on visual analogue scales (VAS), which we projected onto the interval [−, +]. For the overall TLX measurement, the different sub-scales are weighted by the subjective workload ranks.

The debriefing of Study 2 inquired about multiple aspects of the participants’ password choice for both conditions. The questions included:
• reuse: Has the password or a variant been used previously?
• frequency: How often has the password been used in the past?
• last use: When was the password last used?
• strategy: Why was the password chosen that way?

As a general rule, statistics were computed with a significance level of α = .05. We used two-tailed dependent-samples tests throughout. We consider the manipulation checks of each study as a test family and report p-values with Bonferroni-Holm multiple-comparisons corrections as p_MC(n), where n is the number of comparisons made. We do not correct the password strength comparison or the order-effect analysis, to guard against Type-II errors.

Study 1: Effect of Fear

The study measured PANAS-X fear and joviality as manipulation checks and zxcvbn log10 guesses as the dependent variable. Table 5 offers an overview of the descriptive statistics of the fear experiment (N_F = 34).

Table 5: Descriptive statistics of Study 1: Fear, reporting M and SD of PANAS-X fear, PANAS-X joviality, and zxcvbn log10 guesses for (a) Elicited Affect: Fear and (b) Elicited Affect: Happiness.

As a manipulation check, we compared the PANAS-X measurements of fear and joviality across the two conditions, Fear and Happiness. Figure 3 compares the distributions of the treatments for each measurement.
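The Outlier Labeling Rule applied in the assumption checks of both studies flags values lying outside fences placed a multiple g of the interquartile range beyond the first and third quartiles. A minimal Python sketch follows. The multiplier g = 2.2 (the value recommended by Hoaglin and Iglewicz), the quartile interpolation method, the helper name `outlier_labeling_rule`, and the example differences are all assumptions, not taken from the paper or its analysis report.

```python
import statistics

def outlier_labeling_rule(values, g=2.2):
    """Return the labeling-rule fences and the values lying outside them.

    Fences: Q1 - g * (Q3 - Q1) and Q3 + g * (Q3 - Q1). g = 2.2 follows
    Hoaglin and Iglewicz; whether the paper used this g is an assumption.
    """
    # Quartiles by linear interpolation between order statistics (assumed method).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    lower, upper = q1 - g * spread, q3 + g * spread
    outliers = [v for v in values if v < lower or v > upper]
    return (lower, upper), outliers

# Hypothetical within-subject differences (condition A minus condition B).
diffs = [0.4, -1.2, 0.8, 2.1, -0.3, 0.0, 1.5, -0.9, 0.6, 9.5]
fences, flagged = outlier_labeling_rule(diffs)
print(fences, flagged)
```

In this example only the extreme difference of 9.5 falls beyond the upper fence; such flagged cases are the ones the studies then inspected further before deciding to keep or cap them.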

Assumptions.

We analyzed the differences between both conditions for outliers based on the Outlier Labeling Rule. We further checked for outliers with the Mahalanobis distance D and, finally, concluded that no outlier needed to be capped or removed. We tested the distribution of differences between conditions for normality with Shapiro-Wilk. While the differences of the joviality measurements were sufficiently normally distributed, W = …, p = …, the differences of the fear measurements were not, W = …, p = …. Although the t-test is deemed to some extent robust against violations of normality, we complement it with a Wilcoxon signed-rank test.

Success of the Fear/Happiness Manipulations.

Comparing across conditions, the mean fear was statistically significantly greater under elicited fear than under elicited happiness, t(…) = …, p_MC(…) < …, g_av = ….15, 95% CI […, …]. We observed a very large effect. The Wilcoxon signed-rank test confirmed this result, V = …, p < …. The mean joviality was statistically significantly lower under elicited fear than under elicited happiness, t(…) = …, p_MC(…) < …, g_av = ….38, 95% CI […, …]. This was a large effect. We rejected the null hypothesis H_mc,F. Consequently, the manipulation check showed that the stimulus videos The Silence of the Lambs and When Harry Met Sally indeed caused fear and happiness, respectively, in the participants.
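The p_MC(n) values above are Bonferroni-Holm adjusted within each study’s family of manipulation checks. The step-down adjustment can be sketched as follows; it is the same procedure as R’s `p.adjust(..., method = "holm")`. The helper name `holm_adjust` and the example p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Return Bonferroni-Holm adjusted p-values in the original order."""
    m = len(p_values)
    # Visit the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1, capped at 1.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity: an adjusted p-value never drops below its predecessor.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted

# Hypothetical raw p-values for a family of three manipulation checks (n = 3).
print(holm_adjust([0.01, 0.02, 0.03]))
```

The step-down form is uniformly more powerful than a plain Bonferroni correction while still controlling the family-wise error rate, which suits a small family of manipulation checks.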

We compared the password strength measured in zxcvbn log10 guesses across conditions.

Assumptions.

By the Outlier Labeling Rule and the evaluation of the Mahalanobis distance D, there were no significant outliers.

Differences Between Conditions.

The log10 number of guesses did not differ statistically significantly across conditions, t(…) = −…, p = …, g_av = −0.11, 95% CI [−0.45, 0.23]. The effect size was trivial. We failed to reject the null hypothesis H_F. The log10 guesses are statistically significantly correlated across conditions, r = .50, 95% CI […, …]. Finally, we compared the effect sizes across conditions and estimated their confidence intervals. Figure 14a offers a forest plot of these parameter and interval estimates.
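Hedges’ g_av, the effect size reported throughout, divides the mean difference of the paired conditions by the average of the two standard deviations and applies a small-sample bias correction (Lakens [15]). The sketch below computes it next to the dependent-samples t statistic. The correction factor J = 1 − 3/(4·2n − 9), the helper names, and the example scores are assumptions for illustration, not the paper’s code or data.

```python
import math
import statistics

def hedges_g_av(x, y):
    """Hedges' g_av for two paired samples x and y (x minus y)."""
    if len(x) != len(y):
        raise ValueError("paired samples must have equal length")
    n = len(x)
    mean_diff = statistics.mean(x) - statistics.mean(y)
    # d_av: mean difference standardized by the average of both SDs.
    d_av = mean_diff / ((statistics.stdev(x) + statistics.stdev(y)) / 2)
    # Small-sample correction; using 2n in J is an assumption (see lead-in).
    j = 1 - 3 / (4 * (2 * n) - 9)
    return d_av * j

def paired_t(x, y):
    """Dependent-samples t statistic and its degrees of freedom."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical zxcvbn log10 guesses of five participants under two conditions.
cond_a = [10, 12, 11, 13, 14]
cond_b = [9, 11, 11, 12, 12]
print(round(hedges_g_av(cond_a, cond_b), 3), paired_t(cond_a, cond_b))
```

Unlike d_z, which standardizes by the SD of the differences, g_av does not depend on the correlation between conditions, which makes it easier to compare with between-subjects effect sizes such as those in the network meta analysis.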

Figure 3: Density plots by manipulation check for Study 1: Fear. (a) PANAS-X fear; (b) PANAS-X joviality; each by condition (Fear, Happiness).

Figure 4: Comparison of the effects of the manipulations vis-à-vis the effect of fear on password choice (Hedges’ g_av, 95% confidence intervals; measurements: fear, joviality, log10 guesses).

We observed that even though the manipulations yielded large and very large effects, the effect on password strength was trivial. The margin of error of the effect size estimate was less than half a standard deviation.

Order Effects.

Dependent-samples t-tests of fear and joviality by the order of conditions showed no statistically significant difference, ps > .45 and ps > .25, respectively. We note that the differences of fear by order were neither normally nor symmetrically distributed, so we used a dependent-samples sign test for the corresponding analysis.

Considering the impact of the order on password strength is interesting in itself, because in the first password choice participants made an initial account registration, whereas in the second password choice they made a password reset after an incident. The differences of zxcvbn log10 guesses fulfilled the assumptions (no outliers, normality) for a dependent-samples t-test. zxcvbn log10 guesses were not statistically significantly different by condition order, t(…) = −…, p = …, g_av = 0.28, 95% CI [−0.13, 0.69]. The mean password strength for the first password (registration) was M_F = ….79. The mean password strength for the second password (renewal) was M_F = ….

We found that the dependent variable zxcvbn log10 guesses was not statistically significantly correlated with either fear or joviality. Table 6 offers an overview of the pair-wise correlations.

Table 6: Correlations of fear vs. password strength in zxcvbn log10 guesses [95% confidence intervals].

                 PX fear                   PX joviality
PX fear
PX joviality     -0.52*** [-0.67, -0.32]
log10 Guesses    -0.03 [-0.27, 0.21]       0.08 [-0.16, 0.31]

Study 2: Effect of Stress

Data Preparation

We analyzed the data for univariate outliers with the Outlier Labeling Rule and for multivariate outliers with the Mahalanobis distance D. We found two cases with extreme values of zxcvbn log10 guesses greater than 16 and D > 19. We decided to cap the outlying values at the 95th percentile instead of removing the cases altogether.
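Capping an outlying value at the 95th percentile, as done for the two extreme cases in Study 2, can be sketched as follows. The percentile definition (`statistics.quantiles` with `method="inclusive"`, i.e., linear interpolation between order statistics), the helper name `cap_flagged_at_percentile`, and the example data are assumptions. Only the indices previously flagged as upper outliers are replaced, mirroring the decision to cap rather than remove cases.

```python
import statistics

def cap_flagged_at_percentile(values, flagged, pct=95):
    """Replace upper outliers at the flagged indices with the pct-th percentile."""
    # quantiles(n=100) returns the 1st..99th percentiles; index pct - 1 is the pct-th.
    # The percentile is computed over all values, including the flagged ones.
    cut = statistics.quantiles(values, n=100, method="inclusive")[pct - 1]
    capped = [min(v, cut) if i in flagged else v for i, v in enumerate(values)]
    return capped, cut

# Hypothetical log10 guesses; the last value is an extreme case flagged earlier.
scores = list(range(1, 20)) + [40]
capped, cut = cap_flagged_at_percentile(scores, flagged={19})
print(cut, capped[-1])
```

Capping keeps the participant in the within-subject comparison while limiting the leverage of a single extreme value on the mean difference.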

Table 7 shows the means and standard deviations of the stress study.

Table 7: Descriptive statistics of Study 2: Stress, reporting M and SD of SSSQ stress, SSSQ distress, STAI state anxiety, and zxcvbn log10 guesses for (a) Elicited Affect: Stress and (b) Control.

For the Short State Stress Questionnaire (SSSQ), our analysis did not call for capping any outliers. The differences of the total stress values were normally distributed, Shapiro-Wilk W = …, p = …; W = …, p = …; W = …, p = ….

Success of the Stress Manipulation.

All three measurements, SSSQ stress and distress as well as STAI state anxiety, showed that the stress manipulations were successful. We offer an overview of the distributions of the treatments by measurement in Figure 5. Participants in the stress condition showed statistically significantly more overall stress than in the control condition, t(…) = …, p_MC(…) < …, g_av = 0.66, 95% CI […, …]. They exhibited statistically significantly more distress than in the control condition, t(…) = …, p_MC(…) < …, g_av = 0.87, 95% CI […, …]. Furthermore, they exhibited statistically significantly more state anxiety than in the control condition, t(…) = …, p_MC(…) < …, g_av = 0.88, 95% CI […, …]. As a consequence, we reject the null hypothesis H_mc,S.

While we detected one outlier, we decided not to cap it, as it was close to the inner fence. The differences of zxcvbn log10 guesses between conditions were normally distributed, W = …, p = ….

Difference Between Conditions.

There was no statistically significant mean difference in the log10 number of guesses between the stress and control conditions, t(…) = …, p = …, g_av = 0.01, 95% CI [−0.31, 0.33]. Hence, we failed to reject the null hypothesis H_S. There was a statistically significant correlation between zxcvbn log10 guesses in the stress and control conditions, r = .33, 95% CI […, …]. We include a forest plot of the standardized mean difference of password strength under stress in Figure 14b. Even though the manipulation caused stress with medium to large effect sizes, the effect size observed on password strength in log10 guesses is essentially zero, and the confidence interval brackets it to at most a small effect.

Figure 5: Density plots by manipulation check for Study 2: Stress. (a) SSSQ stress; (b) SSSQ distress; (c) STAI state anxiety; each by condition (Stress, Control).
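The 95% confidence intervals on the correlations (for example, r = .33 between the two conditions of Study 2, with N_S = 50) can be approximated with the Fisher z-transformation. The CI method is not stated in this section, so the sketch below is an assumed, commonly used construction rather than a reproduction of the reported interval; the helper name `pearson_r_ci` is hypothetical.

```python
import math
from statistics import NormalDist

def pearson_r_ci(r, n, level=0.95):
    """Approximate CI for a Pearson correlation via Fisher's z-transformation."""
    z = math.atanh(r)              # variance-stabilizing transform of r
    se = 1 / math.sqrt(n - 3)      # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    # Build the interval on the z scale, then back-transform to the r scale.
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

lo, hi = pearson_r_ci(0.33, 50)
print(f"r = .33, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Because the back-transformation is nonlinear, the resulting interval is asymmetric around r, which is expected for correlations.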

Figure 6: Comparison of the effects of the manipulations vis-à-vis the effect of stress on password choice (Hedges’ g_av, 95% confidence intervals; measurements: state anxiety, overall stress, distress, log10 guesses).

Order Effects.

Having checked the assumptions, we computed dependent-samples t-tests on stress, distress, and state anxiety by the order of experiment and control condition. There were no statistically significant order effects, ps > …. zxcvbn log10 guesses were not statistically significantly different by condition order, t(…) = −…, p = …, g_av = 0.17, 95% CI [−0.22, 0.55]. The mean password strength for the first password (registration) was M_S = …; for the second password (renewal), it was M_S = ….

Difference by Reuse.

We analyzed the difference in password strength depending on whether participants chose a new password or reused (a variant of) an old one. Having checked the corresponding assumptions, we computed independent-samples Welch t-tests. The zxcvbn log10 guesses were not statistically significantly different by reuse in either the experiment or the control condition, EXP: t(…) = …, p = …, g = −….26, 95% CI [−…, …]; CTRL: t(…) = −…, p = …, g = ….27, 95% CI [−…, …]. Consequently, we failed to reject the null hypothesis that reuse of passwords has no impact on password strength.

We found no statistically significant correlation between the measurements of stress and the password strength in log10 guesses. Table 8 contains the overall correlation matrix.

Stress × Cognitive Load Interaction

We observed indications of a disordinal (cross-over) interaction between the experiment condition and task load. Figure 7 offers an interaction diagram illustrating the situation.

Table 8: Correlations of stress vs. password strength in zxcvbn log10 guesses [95% confidence intervals].

                 Total Stress            Distress                State Anxiety
Total Stress
Distress         0.64*** [0.51, 0.75]
State Anxiety    0.35*** [0.16, 0.51]    0.64*** [0.51, 0.74]
log10 Guesses    -0.02 [-0.21, 0.18]     0.08 [-0.12, 0.27]      0.05 [-0.14, 0.25]

Figure 7: Interaction of TLX Mental level by Condition (zxcvbn log10 guesses for high vs. low TLX Mental demand).

We conducted a repeated-measures mixed-effects analysis with TLX Mental Demand and the Experiment Condition as fixed effects. We included a random intercept and a distress slope with the subject as the grouping factor. Compared to the intercept model, the model under consideration was borderline but not statistically significant under a likelihood-ratio test, χ²(…) = …, p = … ≮ .05. Using R² between the original data and the fitted values as an estimate of overall effect size, we obtain R² = …, R² = ….

Figure 8: Fit of the stress–mental demand interaction model (observed vs. fitted values).

We offer an overview of the model’s coefficient estimates in Figure 9.
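The overall effect size above is the squared correlation between the observed values and the fitted values of the mixed-effects model. A minimal sketch of that computation follows; the helper name `r_squared_observed_fitted` and the example vectors are hypothetical, and obtaining the fitted values from the mixed-effects model itself is outside this sketch.

```python
def r_squared_observed_fitted(observed, fitted):
    """Squared Pearson correlation between observed and model-fitted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_f = sum(fitted) / n
    # Cross-product and sums of squares of the centered values.
    cov = sum((o - mean_o) * (f - mean_f) for o, f in zip(observed, fitted))
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_f = sum((f - mean_f) ** 2 for f in fitted)
    # r^2 = cov^2 / (ss_o * ss_f); squaring avoids an explicit square root.
    return cov ** 2 / (ss_o * ss_f)

# Hypothetical observed log10 guesses and fitted values from some model.
print(r_squared_observed_fitted([1.0, 2.0, 3.0], [1.0, 3.0, 2.0]))
```

For mixed-effects models this quantity depends on whether the fitted values include the random effects, which is one reason two R² values may be reported.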

Figure 9: Interaction Model Coefficient Estimates (estimates with 95% confidence intervals for (Intercept), ConditionEXP, ConditionEXP:TLX_Mental, and TLX_Mental).

The impact of the Condition × TLX Mental Demand interaction was statistically significant, F(…) = …, p = …, with an effect size estimate of ….25, 95% CI […, …]. We observe a statistically significant intercept estimate, F(…) = …, p < …, with an effect size estimate of ….31, 95% CI […, …]. The main fixed effects were not statistically significant, Condition: F(…) = …, p = …; TLX Mental Demand: F(…) = …, p = …. In the control condition, participants reporting low mental demand on average chose passwords with more log10 guesses than participants reporting high mental demand. However, in the experiment condition, when the participants completed stressful tasks, participants who reported high mental demand on average chose better passwords than participants who reported low mental demand.

We note that the companion analysis report [54] also contains an analysis of the three-way interaction Condition × TLX Mental × Reuse. The mixed-effects model of the three-way interaction was not statistically significant, χ²(…) = …, p = ….

A meta analysis of the order effects across both studies showed an overall effect of Hedges’ g_av = 0.22, 95% CI [−0.06, 0.50]. The summary effect was not statistically significant, though, p = …. The heterogeneity Q-test was not statistically significant at α = …, Q(1) = …, p = …, I² = ….

Figure 10: Forest plot of order effects (Hedges’ g_av, 95% CI): Study 1: Fear 0.28 [−0.13, 0.69]; Study 2: Stress 0.17 [−0.22, 0.55]; RE Model 0.22 [−0.06, 0.50].

We conducted a network meta analysis to put the effects of the different studies in relation. We considered the 2016 LASER paper by Groß et al. [1], which considered the effect of cognitive effort and depletion on password choice. For the network meta analysis, we coded Groß et al.’s “Undepleted,” “Effortful,” and “Depleted” conditions as three conditions of the same study, noting that, in fact, these categories are derived from the study’s manipulation check on depletion level. We coded the “Fear” and “Happiness” conditions of Study 1 of this paper such that “Happiness” is mapped onto “Undepleted.” Similarly, we coded the “Stress” and “Control” conditions of Study 2 of this paper such that “Control” is mapped onto “Undepleted.” We display an overview of the resulting network meta analysis in Figure 11. The meta analysis is based on standardized mean differences: Hedges’ g in the case of Groß et al. and Hedges’ g_av in the case of this paper. Figure 11a shows a forest plot of the results, while Figure 11b shows the network of treatment relations. We observe that the effects associated with the “Effortful” and “Depleted” categories of Groß et al. [1] are maintained at medium to large effect sizes. The treatments fear and stress only yield trivial effect sizes. Consequently, stress is not supported as an alternative explanation for the reported effects of cognitive effort and depletion.

Figure 11: Network Meta Analysis of LASER’2016 [1] as well as this paper’s Study 1: Fear and Study 2: Stress. (a) Forest plot of treatment effects, comparison: other vs. ‘Undepleted’ (random effects model), SMD [95% CI]: Depleted −0.76 [−1.40; −0.11]; Effortful 0.52 [−0.02; 1.05]; Fear −0.11 [−0.44; 0.22]; Stress 0.01 [−0.31; 0.32]; Undepleted 0.00 (reference). (b) Network graph of the treatments Depleted, Effortful, Fear, Stress, and Undepleted.

We induced incidental fear and stress, that is, an affective state not related to the password choice scenario. Yielding medium to very large effect sizes, the different induction techniques (affect stimulus videos, stress battery) were shown to have worked well. Clearly, task-related fear and stress should be more likely to impact password strength, as pursued in research on fear appeals. For instance, participants could be exposed to a news article describing the negative impact of identity theft or a password breach. Such an experiment setup would lend itself to an analysis with the Protection Motivation Theory (PMT) [33, 34, 35].

While the 95% confidence intervals on the effect of fear and stress on password strength bracketed the effects as small, the effect of incidental stress corrected for other predictors was negative, similar to the effect of incidental fear. Hence, future research may aim at pinpointing a negative influence of incidental stressors. At the same time, we found a statistically significant interaction between stress and mental demand, which calls for further investigation seeking to isolate both conditions.

While a three-way interaction model including password reuse [54] was not statistically significant, we observe weak evidence that newly created passwords would be weaker when the user is either stressed or under mental demand, and stronger under baseline conditions. While recommendations that users should create new passwords when they are rested, and be allowed to rely on variants of prior passwords when they are stressed or depleted, seem plausible, this area requires further investigation for a conclusive result.

Groß et al. hypothesized that stress could be an alternative explanation for the observed effects (a) that users under cognitive effort but not depleted created better passwords than the control group, and (b) that users reporting high depletion created worse passwords than the control group. Having induced incidental fear and stress and compared the results in a network meta analysis, we have not found evidence that these factors cause an effect of similar magnitude as the cognitive effort and depletion reported in Groß et al. [1]. Hence, cognitive effort and depletion are still plausible explanations of the observed differences in password strength.

When users are asked to renew their password because of a security incident, their password strength may improve.

Having considered the meta analysis of differences between password choice in a first-time registration and in a renewal after a security incident, we found evidence of an effect consistent across studies: the password strength in the renewal condition is slightly greater than in the first-time registration. In terms of magnitude, the greater log10 number of guesses during renewal after a reported security incident makes for a small effect size, g_av = 0.22 [−0.06, 0.50]. This yields an early indication that task-related fear (induced by a security incident message) may cause a change in user behavior towards protection motivation. Hence, we expect that a future experiment constructed to test the full Protection Motivation Theory (PMT) in a password choice scenario will yield a conclusive result. Given the small effect size estimates, however, neither the individual studies nor the meta analysis had enough power to reject the null hypothesis with statistical significance. Hence, this calls for further investigation.

The participants were largely recruited from university students, limiting generalizability. In terms of ecological validity, the studies created a scenario in which actual private information of participants (personality traits, stress and anxiety data) was stored on a Web site. Having been made aware of the sensitivity of such personal data, the participants’ incentive to protect the data was similar to real life. The participants were exposed to a deception in that they were only informed after the experiment that the research aim included password strength, and in that they were misled under the pretext of a security incident to change their password. Consequently, the first and the second run of the password choice trial were different by design in terms of “first registration” vs. “password change after incident”.
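The pooled order effect g_av = 0.22 [−0.06, 0.50] can be approximately reproduced from the two per-study estimates in Figure 10 by inverse-variance pooling. The sketch below uses the DerSimonian–Laird random-effects estimator and recovers standard errors from the rounded 95% CIs; both choices, and the helper name `pool_random_effects`, are assumptions, as the paper only labels the pooled result as an RE model. For these inputs, Q falls below its single degree of freedom, so τ² is truncated at 0 and the random-effects estimate coincides with the fixed-effect one.

```python
import math
from statistics import NormalDist

def pool_random_effects(estimates, lowers, uppers, level=0.95):
    """DerSimonian-Laird random-effects pooling from estimates and CI limits."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # Recover standard errors from the CI widths (assumes symmetric Wald CIs).
    ses = [(hi - lo) / (2 * z) for lo, hi in zip(lowers, uppers)]
    w = [1 / se ** 2 for se in ses]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance, truncated at 0
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0  # share of heterogeneity
    w_re = [1 / (se ** 2 + tau2) for se in ses]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    ci = (pooled - z * se_pooled, pooled + z * se_pooled)
    return pooled, ci, q, tau2, i2

# Order effects read off Figure 10 (Study 1: Fear, Study 2: Stress).
pooled, ci, q, tau2, i2 = pool_random_effects(
    [0.28, 0.17], [-0.13, -0.22], [0.69, 0.55]
)
print(f"g_av = {pooled:.2f} [{ci[0]:.2f}, {ci[1]:.2f}], Q = {q:.2f}, I2 = {i2:.0%}")
```

With only two studies, the heterogeneity estimate is very imprecise; other τ² estimators (such as REML) may differ from DerSimonian–Laird in general, although they agree here because the heterogeneity is estimated as zero.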

We note that a within-subjects experiment to establish a lower bound on the impact of fear or stress on password strength would need a considerable number of participants. For 95% a priori power on a dependent-samples t-test for a standardized mean difference of Cohen’s d = 0.1, one would need a sample size of 1300.
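The sample size of 1300 can be checked with the normal approximation n = ((z_{1−α/2} + z_{1−β}) / d)², assuming a two-tailed α = .05 and that d is the standardized mean of the paired differences (d_z). An exact noncentral-t calculation, as performed by dedicated power software, yields a marginally larger n. The helper name `paired_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist

def paired_sample_size(d, power=0.95, alpha=0.05):
    """Normal-approximation n for a two-tailed dependent-samples t-test.

    d is taken as the standardized mean of the paired differences (d_z).
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_power = std_normal.inv_cdf(power)          # quantile for the desired power
    return math.ceil(((z_alpha + z_power) / d) ** 2)

print(paired_sample_size(0.1))  # 1300
```

Because n grows with 1/d², halving the smallest effect of interest quadruples the required sample, which is why bounding trivial effects is so costly.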

This is the first work to investigate the effects of incidental fear and stress on password choice. It is the first to estimate the magnitude of such effects across studies on the influence of the user’s current cognitive and affective state on password decision making. As future work, the two studies yield an observation on the effect of fear appeals in password choice. There were first indications that the message of a security incident led participants to choose a stronger password, with a small effect size. This calls for further investigation, for instance using the full Protection Motivation Theory (PMT) [33, 34, 35] as a foundation.

Acknowledgements

We are grateful for the discussions with Kovila Coopamootoo, especially on earlier work on cognitive effort, on incidental and integral affect, as well as on the Protection Motivation Theory. We are grateful for the discussions with Uchechi Phyllis Nwadike, especially on the effects of incidental fear, sadness and happiness on privacy decision making [56]. We appreciated discussions with Roy Maxion on experimentation on stress. This work was in parts supported by the ERC Starting Grant CASCAde (GA no …).

References

[1] T. Groß, K. Coopamootoo, and A. Al-Jabri, “Effect of cognitive depletion on password choice,” in Learning from Authoritative Security Experiment Results (LASER’16), S. Peisert, Ed., July 2016.
[2] D. L. Wheeler, “zxcvbn: Low-budget password strength estimation,” in Proc. USENIX Security, 2016.
[3] J. Bonneau, C. Herley, P. C. Van Oorschot, and F. Stajano, “The quest to replace passwords: A framework for comparative evaluation of web authentication schemes,” in Security and Privacy (SP), 2012 IEEE Symposium on. IEEE, 2012, pp. 553–567.
[4] J. Bonneau and S. Preibusch, “The password thicket: Technical and market failures in human authentication on the web,” in WEIS, 2010.
[5] D. Florencio and C. Herley, “A large-scale study of web password habits,” in Proceedings of the 16th International Conference on World Wide Web. ACM, 2007, pp. 657–666.
[6] D. Florêncio, C. Herley, and P. C. Van Oorschot, “Password portfolios and the finite-effort user: Sustainably managing large numbers of accounts,” in USENIX Security Symposium, 2014, pp. 575–590.
[7] D. Kahneman, Attention and Effort. Citeseer, 1973.
[8] R. Baumeister, E. Bratslavsky, E. Muraven, and D. Tice, “Ego depletion: Is the active self a limited resource?” Personality and Social Psychology, vol. 74, pp. 1252–1265, 1998.
[9] R. Li-Mei Liao and M. G. Carey, “Laboratory-induced mental stress, cardiovascular response, and psychological characteristics,” Rev Cardiovasc Med, vol. 16, no. 1, pp. 28–35, 2015.
[10] K. A. Matthews and C. M. Stoney, “Influences of sex and age on cardiovascular responses during stress,” Psychosomatic Medicine, vol. 50, no. 1, pp. 46–56, 1988.
[11] C. Kirschbaum, K.-M. Pirke, and D. H. Hellhammer, “The ‘Trier Social Stress Test’ – a tool for investigating psychobiological stress responses in a laboratory setting,” Neuropsychobiology, vol. 28, no. 1-2, pp. 76–81, 1993.
[12] G. Cumming, Understanding the New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis. Routledge, 2013.
[13] R. S. Nickerson, “Null hypothesis significance testing: A review of an old and continuing controversy,” Psychological Methods, vol. 5, no. 2, p. 241, 2000.
[14] M. J. Gardner and D. G. Altman, “Confidence intervals rather than p values: Estimation rather than hypothesis testing,” Br Med J (Clin Res Ed), vol. 292, no. 6522, pp. 746–750, 1986.
[15] D. Lakens, “Calculating and reporting effect sizes to facilitate cumulative science: A practical primer for t-tests and ANOVAs,” Frontiers in Psychology, vol. 4, 2013.
[16] J. Algina and H. Keselman, “Approximate confidence intervals for effect sizes,” Educational and Psychological Measurement, vol. 63, no. 4, pp. 537–553, 2003.
[17] H. Cooper, L. V. Hedges, and J. C. Valentine, The Handbook of Research Synthesis and Meta-Analysis. Russell Sage Foundation, 2009.
[18] G. Schwarzer, J. R. Carpenter, and G. Rücker, Network Meta-Analysis. Cham: Springer International Publishing, 2015, pp. 187–216. [Online]. Available: https://doi.org/10.1007/978-3-319-21416-0_8
[19] B. Neupane, D. Richer, A. J. Bonner, T. Kibret, and J. Beyene, “Network meta-analysis using R: A review of currently available automated packages,” PLOS ONE, vol. 9, no. 12, pp. 1–17, Dec. 2014. [Online]. Available: https://doi.org/10.1371/journal.pone.0115065
[20] G. Rücker, “Network meta-analysis, electrical networks and graph theory,” Research Synthesis Methods, vol. 3, no. 4, pp. 312–324, 2012.
[21] M. A. Sasse, S. Brostoff, and D. Weirich, “Transforming the ‘weakest link’ – a human/computer interaction approach to usable and effective security,” BT Technology Journal, vol. 19, no. 3, pp. 122–131, 2001.
[22] P. Hoonakker, N. Bornoe, and P. Carayon, “Password authentication from a human factors perspective,” in Proc. Human Factors and Ergonomics Society Annual Meeting, vol. 53, no. 6. SAGE Publications, 2009, pp. 459–463.
[23] A. Adams and M. A. Sasse, “Users are not the enemy,” Communications of the ACM, vol. 42, no. 12, pp. 40–46, 1999.
[24] A. Das, J. Bonneau, M. Caesar, N. Borisov, and X. Wang, “The tangled web of password reuse,” in NDSS, vol. 14, 2014, pp. 23–26.
[25] M. Zviran and W. J. Haga, “A comparison of password techniques for multilevel authentication mechanisms,” The Computer Journal, vol. 36, no. 3, pp. 227–237, 1993.
[26] J. Bonneau, “The science of guessing: Analyzing an anonymized corpus of 70 million passwords,” in Security and Privacy (SP), 2012 IEEE Symposium on. IEEE, 2012, pp. 538–552.
[27] W. E. Burr, D. F. Dodson, and W. T. Polk, “Electronic authentication guideline,” NIST, NIST Special Publication 800-63, Jun. 2004.
[28] P. G. Kelley, S. Komanduri, M. L. Mazurek, R. Shay, T. Vidas, L. Bauer, N. Christin, L. F. Cranor, and J. Lopez, “Guess again (and again and again): Measuring password strength by simulating password-cracking algorithms,” in Security and Privacy (SP), 2012 IEEE Symposium on. IEEE, 2012, pp. 523–537.
[29] R. J. Davidson, K. R. Scherer, and H. H. Goldsmith, Handbook of Affective Sciences. Oxford University Press, 2009.
[30] M. Lewis, J. M. Haviland-Jones, and L. F. Barrett, Handbook of Emotions. Guilford Press, 2010.
[31] J. A. Russell, “Core affect and the psychological construction of emotion,” Psychological Review, vol. 110, no. 1, p. 145, 2003.
[32] E. Peters, D. Västfjäll, T. Gärling, and P. Slovic, “Affect and decision making: A ‘hot’ topic,” Journal of Behavioral Decision Making, vol. 19, no. 2, pp. 79–85, 2006.
[33] R. W. Rogers, “Cognitive and psychological processes in fear appeals and attitude change: A revised theory of protection motivation,” Social Psychophysiology: A Sourcebook, pp. 153–176, 1983.
[34] D. L. Floyd, S. Prentice-Dunn, and R. W. Rogers, “A meta-analysis of research on protection motivation theory,” Journal of Applied Social Psychology, vol. 30, no. 2, pp. 407–429, 2000.
[35] K. Witte, “Putting the fear back into fear appeals: The extended parallel process model,” Communications Monographs, vol. 59, no. 4, pp. 329–349, 1992.
[36] S. R. Boss, D. F. Galletta, P. B. Lowry, G. D. Moody, and P. Polak, “What do users have to fear? Using fear appeals to engender threats and fear that motivate protective security behaviors,” 2015.
[37] R. A. Ruiter, L. T. Kessels, G.-J. Y. Peters, and G. Kok, “Sixty years of fear appeal research: Current state of the evidence,” International Journal of Psychology, vol. 49, no. 2, pp. 63–70, 2014.
[38] J. A. Coan and J. J. Allen, Handbook of Emotion Elicitation and Assessment. Oxford University Press, 2007.
[39] R. Westermann, G. Stahl, and F. Hesse, “Relative effectiveness and validity of mood induction procedures: Analysis,” European Journal of Social Psychology, vol. 26, pp. 557–580, 1996.
[40] J. J. Gross and R. W. Levenson, “Emotion elicitation using films,” Cognition & Emotion, vol. 9, no. 1, pp. 87–108, 1995.
[41] J. Rottenberg, R. D. Ray, and J. J. Gross, “Emotion elicitation using films,” Handbook of Emotion Elicitation and Assessment, pp. 9–28, 2007.
[42] D. Watson and L. A. Clark, “The PANAS-X: Manual for the positive and negative affect schedule – expanded form,” University of Iowa, Department of Psychology, Tech. Rep., 1999.
[43] H. Selye, “Stress without distress,” New York, pp. 26–39, 1974.
[44] K. H. Teigen, “Yerkes-Dodson: A law for all seasons,” Theory & Psychology, vol. 4, no. 4, pp. 525–547, 1994.
[45] C. D. Spielberger, R. L. Gorsuch, and R. E. Lushene, “Manual for the State-Trait Anxiety Inventory,” 1970.
[46] G. Matthews, L. Joyner, K. Gilliland, S. Campbell, S. Falconer, and J. Huggins, “Validation of a comprehensive stress state questionnaire: Towards a state big three,” Personality Psychology in Europe, vol. 7, pp. 335–350, 1999.
[47] W. S. Helton, “Validation of a short stress state questionnaire,” in Proceedings of the Human Factors and Ergonomics Society Annual Meeting, vol. 48, no. 11. SAGE Publications, Los Angeles, CA, 2004, pp. 1238–1242.
[48] W. S. Helton and K. Näswall, “Short stress state questionnaire: Factor structure and state change assessment,” European Journal of Psychological Assessment, vol. 31, no. 1, p. 20, 2015.
[49] R. F. Baumeister, K. D. Vohs, and D. M. Tice, “The strength model of self-control,” Current Directions in Psychological Science, vol. 16, no. 6, pp. 351–355, 2007.
[50] J. D. Mayer and Y. N. Gaschke, “The experience and meta-experience of mood,” Journal of Personality and Social Psychology, vol. 55, no. 1, p. 102, 1988.
[51] D. M. Tice, R. F. Baumeister, D. Shmueli, and M. Muraven, “Restoring the self: Positive affect helps improve self-regulation following ego depletion,” Journal of Experimental Social Psychology, vol. 43, no. 3, pp. 379–384, 2007.
[52] S. G. Hart and L. E. Staveland, “Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research,” Advances in Psychology, vol. 52, pp. 139–183, 1988.
[53] S. G. Hart, “NASA-Task Load Index (NASA-TLX); 20 years later,” in Proceedings of the Human Factors and Ergonomics Society Annual Meeting, vol. 50. Sage Publications, 2006, pp. 904–908.
[54] T. Groß, “Analysis report – investigation of the effect of fear and stress on password choice,” Open Science Framework, OSF Report https://osf.io/3cd9h/, 2017.
[55] M. M. Kelly, A. R. Tyrka, G. M. Anderson, L. H. Price, and L. L. Carpenter, “Sex differences in emotional and physiological responses to the Trier Social Stress Test,” Journal of Behavior Therapy and Experimental Psychiatry, vol. 39, no. 1, pp. 87–98, 2008.
[56] U. Nwadike, T. Groß, and K. P. Coopamootoo, “Evaluating users’ affect states: Towards a study on privacy concerns,” in Privacy and Identity Management. Facing up to Next Steps. Springer, 2016, pp. 248–262.

Figure 13: Corrgram of the variables irrespective of conditions (Stress vs. Password Strength: total stress, distress, state anxiety, log10 guesses).

A Cross-Correlations

We determined in Studies 1 and 2 that the zxcvbn log10 guesses were not statistically significantly correlated with either fear or stress. Figures 12 and 13 show the corresponding corrgrams for fear and stress, respectively.

Figure 12: Corrgram of the variables irrespective of conditions (Fear vs. Password Strength: fear, joviality, log10 guesses).

B Effect and Interval Estimates

Figure 14: Effect and interval estimates (Hedges’ g with 95% CI). (a) Study 1: Fear (fear, joviality, log10 guesses); (b) Study 2: Stress (state anxiety, overall stress, distress, log10 guesses); (c) Comparison of previously reported depletion effects (effortful, depleted, fear, stress).