On the Economics of Offline Password Cracking

Jeremiah Blocki (Purdue University), Ben Harsha (Purdue University), Samson Zhou (Carnegie Mellon University)

Abstract. We develop an economic model of an offline password cracker which allows us to make quantitative predictions about the fraction of accounts that a rational password attacker would crack in the event of an authentication server breach. We apply our economic model to analyze recent massive password breaches at Yahoo!, Dropbox, LastPass and AshleyMadison. All four organizations were using key-stretching to protect user passwords. In fact, LastPass' use of PBKDF2-SHA256 with $10^5$ hash iterations exceeds the 2017 NIST minimum recommendation by an order of magnitude. Nevertheless, our analysis paints a bleak picture: the adopted key-stretching levels provide insufficient protection for user passwords. In particular, we present strong evidence that most user passwords follow a Zipf's law distribution, and characterize the behavior of a rational attacker when user passwords are selected from a Zipf's law distribution. We show that there is a finite threshold, which depends on the Zipf's law parameters, that characterizes the behavior of a rational attacker: if the value of a cracked password (normalized by the cost of computing the password hash function) exceeds this threshold, then the adversary's optimal strategy is always to continue attacking until each user password has been cracked. In all cases (Yahoo!, Dropbox, LastPass and AshleyMadison) we find that the value of a cracked password almost certainly exceeds this threshold, meaning that a rational attacker would crack all passwords that are selected from the Zipf's law distribution (i.e., most user passwords). This prediction holds even if we incorporate an aggressive model of diminishing returns for the attacker (e.g., the total value of 500 million cracked passwords is less than 100 times the total value of 5 million passwords). On a positive note, our analysis demonstrates that memory hard functions (MHFs) such as SCRYPT or Argon2i can significantly reduce the damage of an offline attack. In particular, we find that because MHFs substantially increase guessing costs, a rational attacker will give up well before he cracks most user passwords, and this prediction holds even if the attacker does not encounter diminishing returns for additional cracked passwords. Based on our analysis, we advocate that password hashing standards should be updated to require the use of memory hard functions for password hashing and to disallow the use of non-memory hard functions such as BCRYPT or PBKDF2.

1. Introduction

In the last few years, breaches at organizations like Yahoo!, Dropbox, LastPass, AshleyMadison, LinkedIn, eBay and Adult FriendFinder have exposed over a billion user passwords to offline attacks. Password hashing algorithms are a critical last line of defense against an offline attacker who has stolen password hash values from an authentication server. An attacker who has stolen a user's password hash value can attempt to crack each user's password offline by comparing the hashes of likely password guesses with the stolen hash value. Because the attacker can check each guess offline, it is no longer possible to lock out the adversary after several incorrect guesses.

An offline attacker is limited only by the cost of computing the hash function. Ideally, the password hashing algorithm should be moderately expensive to compute, so that it is prohibitively expensive for an offline attacker to crack most user passwords, e.g., by checking millions, billions or even trillions of password guesses for each user. It is perhaps encouraging that AshleyMadison, Dropbox, LastPass and Yahoo! had adopted slow password hashing algorithms like BCRYPT and PBKDF2-SHA256 to discourage an offline attacker from cracking passwords. In the aftermath of these breaches, the claim that slow password hashing algorithms like BCRYPT [1] or PBKDF2 [2] are sufficient to protect most user passwords from offline attackers has been repeated frequently. For example, LastPass [3] claimed that “Cracking our algorithms [PBKDF2-SHA256] is extremely difficult, even for the strongest of computers.” Security experts have made similar claims about BCRYPT; e.g., after the Dropbox breach [4] a prominent security expert confidently stated that “all but the worst possible password choices are going to remain secure” because Dropbox had used the BCRYPT hashing algorithm.

Are these strong claims about the security of BCRYPT and PBKDF2 true?
Despite all of their problems, passwords remain prevalent and are likely to remain entrenched as the dominant form of authentication on the internet for years to come, because they are easy to use and deploy, and users are already familiar with them [5], [6], [7]. It is therefore imperative to develop tools to quantify the damage of password breaches, and to provide guidance to organizations on how to store passwords. In this work we seek to address the following question:

Can we quantitatively predict how many user passwords a rational attacker will crack after a breach?

We introduce a game-theoretic model to answer this question and analyze recent data breaches. Our analysis strongly challenges the claim that BCRYPT and PBKDF2-SHA256 provide adequate protection for user passwords. On the positive side, our analysis indicates that more modern password hashing algorithms [8] (e.g., memory hard functions [9]) can provide meaningful protection against offline attackers.

We first develop a new decision-theoretic framework to quantify the damage of an offline attack. Our model generalizes the Stackelberg game-theoretic model of Blocki and Datta [10]. A rational password attacker is economically motivated and will quit guessing once his marginal guessing costs exceed his marginal reward. The attacker's marginal reward is given by the probability $p_i$ that the next ($i$th) password guess is correct, times the value of an additional cracked password to the adversary, e.g., the additional revenue of selling that password on the black market or the expected amount of additional money that could be extorted from this user. Given the average value $v$ of each cracked password for the adversary (see footnote 1), the cost $k$ of computing the password hash function and the probability distribution $p_1 \ge p_2 \ge \ldots$ over user selected passwords, our model allows us to predict exactly how many passwords a rational adversary will crack.
Unlike the model of Blocki and Datta [10], we can use our framework to model a setting in which the attacker encounters diminishing returns, as we would expect in most (black) markets, i.e., the total value of 500 million cracked passwords may be significantly less than 100 times the total value of 5 million passwords.

Second, we present the strongest evidence to date that Zipf's law models the distribution of user selected passwords (with the possible exception of the tail of the distribution). These findings strongly support previous conclusions of Wang and Wang [11]. In particular, we show that Zipf's law closely fits the Yahoo! password frequency corpus. This dataset was collected by Bonneau [12] and later published by Blocki et al. [13]. In contrast to datasets from password breaches, the Yahoo! dataset was collected by trusted parties and is representative of active Yahoo! users (researchers have observed that hacked datasets contain many passwords that appear to be fake [14]). Our sample size is also more than twice as large as the datasets Wang and Wang [11] used to support their argument that Zipf's law closely models password datasets.

Third, we show that there is a finite threshold $T(\cdot)$ which characterizes the behavior of a rational value-$v$ adversary whenever the distribution over passwords follows Zipf's law. In particular, if the first cracked password has value $v \ge T(\cdot) \times k$, then the adversary's optimal strategy is always to continue guessing until he cracks the user's password. The threshold $T(y, r, a)$ is parameterized by the Zipf's law parameters $y$ and $r$ and by a parameter $a$ representing the rate of password value decay. We remark that, even if Zipf's law fails to model the tail of the password distribution, the threshold $T(y, r, a)$ still provides a useful characterization of the attacker's behavior. In particular, if $(100 - x)\%$ of passwords in a distribution follow Zipf's law and the other $x\%$ follow some unknown (possibly uncrackable) distribution, then our bounds imply that an attacker will compromise at least $(100 - x)\%$ of user passwords whenever $v \ge T(y, r, a) \times k$.

1. More precisely, if there are $N$ users in the dataset and the total value of all $N$ cracked passwords is $V$, then $v = V/N$. When there are diminishing returns for additional cracked passwords, the parameter $v$ may be significantly lower than the value of the first cracked password.

Fourth, we also derive model independent upper and lower bounds on the fraction of passwords that a rational adversary would crack. While these bounds are slightly weaker than the bounds we can derive using Zipf's law, they do not require any modeling assumptions, which matters because it is impossible to determine for sure whether or not Zipf's law fits the tail of the password distribution. Interestingly, the lower bounds we derive suggest that state of the art password crackers [15] could still be improved substantially.

Fifth, we apply our framework to analyze recent large scale password breaches including LastPass, AshleyMadison, Dropbox and Yahoo! Our analysis strongly challenges the claim that BCRYPT and PBKDF2-SHA256 provide adequate protection for user passwords. In fact, if the password distribution follows Zipf's law, then our analysis indicates that a rational attacker will almost certainly crack 100% of user passwords, unless Dropbox/LastPass/AshleyMadison/Yahoo! passwords are significantly less valuable than black market projections [16].

Finally, we derive model independent upper and lower bounds on the percentage of passwords cracked by a rational adversary. These bounds do not rely on the assumption that Zipf's law models the tail of the password distribution (see footnote 2). Nevertheless, our predictions are still quite dire, e.g., a rational adversary will still crack a substantial fraction of Yahoo! passwords at minimum.
Our analysis indicates that, to achieve sufficient levels of protection with BCRYPT or PBKDF2, it would be necessary to run these algorithms for well over a second on a modern CPU, which would constitute an unacceptable authentication delay in many contexts [17]. On a more positive note, our analysis suggests that more modern password hashing techniques like memory hard functions can provide strong protection against a rational password attacker without introducing inordinate delays for users during authentication. In particular, our analysis suggests that it could be possible to substantially reduce the percentage of cracked passwords without increasing authentication delays to a full second.

In light of our analysis, we contend that there is a clear need to update standards for password storage to provide developers with clear guidance about the importance of using memory hard functions such as SCRYPT [9] or Argon2id [18]. In a recent user study, Naiakshina et al. [19] asked developers to select a password hash function for a new social networking platform. None of the developers in this study selected a memory hard function (see footnote 3), and the strongest password hashing algorithms selected were PBKDF2 with 20,000 hash iterations and BCRYPT with 1,024 iterations. The selection of PBKDF2 with 20,000 hash iterations would be deemed acceptable under 2017 NIST standards [20]: PBKDF2 with at least 10,000 iterations is presented as an acceptable selection for password hashing (see footnote 4). In this sense, LastPass' use of PBKDF2-SHA256 with $10^5$ iterations greatly exceeds current NIST standards. Nevertheless, our analysis suggests that even PBKDF2-SHA256 with $10^5$ hash iterations is insufficient to protect a majority of user passwords, while memory hard functions such as SCRYPT [9] or Argon2id [18] would provide meaningful protection. In addition to memory hard functions, we also advocate for the use of secure distributed password hashing protocols [22], [23], [24] whenever feasible, so that an attacker cannot mount an offline attack without breaching multiple authentication servers.

2. Wang and Wang [11] observed that the tails of empirical password datasets are not inconsistent with a Zipf's law distribution. However, we cannot be entirely confident that Zipf's law models the tail of the distribution since, by definition, we do not have many samples for passwords in the tail of the distribution.

2. Economic Model

Given a dataset $D$ of $N$ user passwords, we use $f_i$ to denote the frequency of the $i$'th most common password in the dataset, and we use $pwd_i$ to denote the $i$'th most common password in the dataset. We use $p_1, p_2, \ldots$ to denote the actual distribution over passwords $pwd_1, pwd_2, \ldots$; that is, $p_i$ is the probability that a random user selects password $pwd_i$. We use $\hat{p}_i = f_i/N$ to denote an empirical estimate of $p_i$ given a dataset $D$ which was sampled from the real password distribution. We also use $\lambda_i = \sum_{j=1}^{i} p_j$ to denote the cumulative probability of the $i$ most likely passwords. Equivalently, $\lambda_i$ denotes the probability that an adversary cracks the user's password within the first $i$ guesses.

We say that the probability distribution $p_1 \ge p_2 \ge \ldots$ follows Zipf's law if $p_i = z/i^s$ for some constants $s$ and $z$. We say that a probability distribution follows a CDF-Zipf distribution if $\lambda_i = y \cdot i^r$ for some constants $r$ and $y$.

Offline Attack. To authenticate users, password authentication servers traditionally store salted password hashes. In more detail, to authenticate user $u$ the authentication server stores a record like the following: $(u, s_u, H(pwd_u, s_u))$. Here, $u$ is the username, $pwd_u$ is the user's password, $s_u$ is a random string called the salt value, used to protect against rainbow table attacks, and $H$ is a cryptographic hash function. An adversary who breaches the authentication server will be able to obtain the hash value along with the salt value. This adversary can now attempt as many guesses as he desires offline by computing the hashes of likely password guesses $H(g_1, s_u), H(g_2, s_u), \ldots$ and comparing these values with the stolen password hash. The attacker is only limited by the resources that he is willing to invest trying to crack the user's password.

3. On a positive note, the authors did find that priming developers about the importance of password security resulted in the selection of stronger password hashing algorithms.

4. An upgrade from 1,000 iterations as the minimal acceptable number of hash iterations for PBKDF2 in an older 2010 NIST standard [21].

Key Questions and Parameters. We aim to address the following questions: How many guesses will our rational adversary attempt? What fraction of the user passwords will an adversary manage to break? The answers to these questions will depend on several factors. How valuable is a cracked password to the adversary? How much does it cost to compute $H$ each time we validate a new password guess? What does the distribution over user passwords look like?

We use $v$ to denote the value of a cracked password to the adversary, measured in units of $C_H$, the cost of evaluating an underlying cryptographic hash function $H$ such as SHA256. We can estimate $v_{\$}$ by looking at black market prices for cracked passwords. For example, Fossi et al. [16] reported a range of market prices for hacked passwords. A more recent analysis of Yahoo! passwords found that they sell for lower prices [25]; the drop in price may be due to an increased supply of Yahoo! passwords. Herley and Florencio found that dishonest behavior can significantly inhibit trade on black markets [26]. Thus, these prices may underestimate the true value of a cracked password.

Password hash functions are often constructed from an underlying cryptographic hash function $H$. For example, PBKDF2-SHA256 simply iterates the SHA256 hash function multiple times. We use $k$ to denote the cost of computing the final password hash function, once again measured in units of $C_H$. We use $v_{\$} = v \times C_H$ (resp. $k_{\$} = k \times C_H$) to denote the value (resp. cost) in USD given an estimate of $C_H$. We model a rational adversary who has obtained the salted password hash of a user's password.
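
As a concrete illustration of the offline attack described above, the following Python sketch stores a salted record $(u, s_u, H(pwd_u, s_u))$ and then plays the attacker's side, hashing guesses in order of likelihood until it finds a match or exhausts a guess budget $t$. It assumes PBKDF2-HMAC-SHA256 from Python's standard hashlib module as the password hash; the helper names make_record and offline_attack, the 16-byte salt and the iteration count are illustrative choices, not part of the paper.

```python
import hashlib
import hmac
import os

# Illustrative work factor tau; the text mentions 10,000, 20,000 and 10^5
# PBKDF2 iterations. Each evaluation costs roughly k = tau units of C_H.
ITERATIONS = 10_000


def make_record(username, password, iterations=ITERATIONS):
    """Server side: store (u, s_u, H(pwd_u, s_u)) with a fresh random salt."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return username, salt, digest


def offline_attack(record, ranked_guesses, t, iterations=ITERATIONS):
    """Attacker side: try the t most likely guesses, in order.

    Returns (cracked_password or None, number_of_hash_evaluations).
    The evaluation count is j if the j-th guess is correct and t (or the
    length of the guess list, if shorter) otherwise, mirroring the j*k
    versus t*k charges of the attacker game.
    """
    _, salt, target = record
    evaluations = 0
    for guess in ranked_guesses[:t]:
        evaluations += 1
        candidate = hashlib.pbkdf2_hmac("sha256", guess.encode(), salt, iterations)
        if hmac.compare_digest(candidate, target):
            return guess, evaluations
    return None, evaluations


if __name__ == "__main__":
    record = make_record("alice", "iloveyou")
    guesses = ["123456", "password", "iloveyou", "qwerty"]  # toy ranked list
    print(offline_attack(record, guesses, t=10))  # -> ('iloveyou', 3)
    print(offline_attack(record, guesses, t=2))   # -> (None, 2)
```

Because every guess is checked locally, no server-side lockout can intervene; in this sketch the only brake on the attacker is the per-guess cost, which is why the iteration count (and hence $k$) is the defender's main lever.
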
Our model generalizes the Stackelberg game-theoretic framework of Blocki and Datta [10] by introducing a parameter $0 \le a \le 1$ which models diminishing returns. We assume that the adversary knows the password distribution $p_1, p_2, \ldots$ as well as the corresponding passwords $pwd_1, pwd_2, \ldots$. However, the adversary does not know which password the user selected.

Attacker Game. We model password cracking using a single-shot game. In the game we sample a random password $pwd$ from the password distribution $\Pr[pwd_i] = p_i$. The adversary picks a threshold $t \ge 0$. The threshold $t$ specifies an ordered list $L(t) = pwd_1, \ldots, pwd_t$ of the $t$ most likely passwords. If the real password is contained in the list of adversary guesses, $pwd \in L(t)$, then the adversary receives a payment of $v$ and we charge the adversary $j \cdot k$, where $j$ is the index of the correct password guess $pwd = pwd_j$. If the real password is not contained in the list $pwd_1, \ldots, pwd_t$ of adversary guesses, then the adversary receives no payment and is charged $t \cdot k$. Notice that $t = 0$ corresponds to the strategy in which the adversary gives up without guessing, and $t = \infty$ corresponds to the strategy in which the adversary never quits. Observe that $\lambda_t = \sum_{j=1}^{t} p_j$ denotes the fraction of user passwords that are cracked by a threshold-$t$ adversary.

About the Attacker. In our analysis we consider an attacker that is
(1) Informed: The attacker knows the password distribution $p_1, p_2, \ldots$ and the associated passwords $pwd_1, pwd_2, \ldots$. However, the attacker does not know which password a particular user $u$ selected.
(2) Untargeted: The attacker does not have personal knowledge about the user that can be exploited to improve the guessing attack.
(3) Rational: The attacker is economically motivated, and will stop attacking the user once marginal guessing costs exceed the marginal guessing rewards.

Discussion. Our attacker model captures the most common types of password attacks. It is generally reasonable to assume that the attacker knows the password distribution, possibly excluding the tail of the distribution. In particular, previous password breaches provide plenty of training data for the attacker, and it is reasonable to assume that password cracking models will continue to improve as attackers obtain more and more training data from future password breaches. We focus on an untargeted attacker in our analysis. However, we stress that our model may also be useful when considering a targeted attacker with background knowledge of the user (e.g., name, birthdate, hobbies, etc.). In particular, let $p_i$ denote the probability that a targeted adversary's $i$'th guess is correct. Wang et al. observed that a targeted distribution over user passwords $p_1, p_2, \ldots$ still seems to follow Zipf's law [27].

Rational Attacker Behavior. If the adversary chooses a threshold $t$, then his expected guessing costs are
$$C(t) = t\left(1 - \sum_{j=1}^{t} p_j\right) k + k \sum_{j=1}^{t} j \cdot p_j .$$
Similarly, his expected reward is
$$R(t) = v\left(\sum_{j=1}^{t} p_j\right)^{a},$$
where the parameter $0 \le a \le 1$ allows us to model diminishing returns for the attacker as he obtains additional cracked passwords. For example, fix a fraction $q \le 1/2$ and let $t_q$ (resp. $t_{2q}$) be given such that $p_1 + \ldots + p_{t_q} = q$ (resp. $p_1 + \ldots + p_{t_{2q}} = 2q$). Then for $a < 1$ we have $R(t_{2q}) = 2^a R(t_q) < 2 \times R(t_q)$, even though the adversary cracks twice as many passwords by increasing his threshold from $t_q$ to $t_{2q}$.

Diminishing Returns:

We note that the original model of Blocki and Datta [10] is a special case of our model when $a = 1$ (no diminishing returns). There are a number of reasons why an attacker may encounter diminishing returns ($a < 1$) for additional cracked passwords. First, if the attacker plans to sell the passwords on the black market, then basic economics suggests that increasing the supply of cracked passwords is likely to drive down prices. In the case of a large breach like Yahoo! (500 million passwords), it is conceivable that the number of available passwords on the black market might quickly increase by two orders of magnitude. Second, the more user accounts that are hacked or actively exploited, the more likely it is that the original breach will be detected. If the breach is detected, then an organization can ask (or require) users to change their passwords or require two-factor authentication, which will reduce the value of each cracked password (see footnote 5).

Interpreting model parameter $v$: We note that $v = R(\infty)$, where $R(\infty) \times N$ denotes the total value of a completely cracked password dataset of size $N$. Thus, the parameter $v$ denotes the average value of a cracked password given that all passwords have been cracked. We can estimate this parameter $v$ based on black market sales data. For example, suppose that we know that $R(t_q) = \$X \times q$, where $\$X$ is the equilibrium black market price per password when only a fraction $q$ of the cracked passwords are on the market. In this case we can extrapolate
$$v = R(\infty) = R(t_{100\%}) = q^{-a} R(t_q) = \$X \cdot q^{1-a}. \qquad (1)$$

Rational Attacker Behavior:

Formally, the rational adversary will select the threshold $t^*$ maximizing his overall utility,
$$t^* = \arg\max_t \left(R(t) - C(t)\right).$$
Intuitively, a rational adversary should stop guessing if the marginal cost of one more password guess exceeds the marginal benefit of that guess. Thus, we will have $MC(t^*) = C(t^*) - C(t^*-1) \approx MR(t^*) = R(t^*) - R(t^*-1)$. The marginal cost of increasing the threshold from $t-1$ to $t$ is
$$MC(t) = C(t) - C(t-1) = k\left(1 - \sum_{j=1}^{t-1} p_j\right). \qquad (2)$$
Intuitively, the attacker pays an extra cost $k$ to hash $pwd_t$ if and only if the first $t-1$ guesses are incorrect. Similarly, the attacker's marginal revenue is $MR(t) = R(t) - R(t-1)$. When $a = 1$ we have $MR(t) = v \times p_t$; otherwise
$$MR(t) = v\left[\left(\sum_{j=1}^{t} p_j\right)^{a} - \left(\sum_{j=1}^{t-1} p_j\right)^{a}\right]. \qquad (3)$$
Note that $\lambda_{t^*}$ denotes the expected fraction of passwords compromised by a rational attacker. Given a specific assumption about the password distribution (e.g., Zipf's law), we can derive bounds on $\lambda_{t^*}$.

Competition. We do not attempt to directly model the behavior of an adversary who faces competition from other password crackers. Many breaches (e.g., Yahoo!, LinkedIn, Dropbox) remained undetected for several years. In these cases it may be reasonable to assume that the password cracker faced no competition. However, competition certainly could occur in the event that the breach is public (e.g., AshleyMadison). In an extremely competitive setting (e.g., the password for a cryptocurrency wallet), only the first attacker to crack the password will be rewarded (see footnote 6). Such competition would decrease the expected reward for each cracked password and could potentially reduce the total percentage of passwords cracked by each individual attacker.

However, from the defender's point of view the goal is to minimize the percentage of passwords that are cracked by any attacker. Thus, we can argue that competition will have a minimal impact on the total percentage of cracked passwords. In particular, even in an extremely competitive setting where only the first attacker to find the password is rewarded, we still have
$$\mathrm{CompCrack}(v, a) \ge \min_{0 \le p \le 1} \max\left\{\mathrm{Cracked}(pv, a),\ 1 - p\right\}.$$
Here $\mathrm{CompCrack}(v, a)$ (resp. $\mathrm{Cracked}(v, a)$) denotes the percentage of passwords that are cracked by some attacker when the value of a password is $v$ and attackers face competition (resp. do not face competition). This follows because the expected reward for an attacker facing competition is at least $R_{comp}(t) \ge p_{first} \times R(t)$, where $p_{first}$ is the probability that no competing attacker has already cracked the password. If $p_{first}$ is small, then the marginal rewards will also be small, so the attacker may quit earlier; but in this case it is likely (with probability $1 - p_{first}$) that another attacker has already compromised the account.

5. However, the cracked passwords arguably still have significant value after the breach is detected, for three reasons. First, many users will not update their passwords unless they are required to do so. Second, many of the users that do update their passwords may do so in a predictable way [28]. Third, many users will have the same password for other accounts.

6. However, we remark that in many instances attackers may unknowingly “share” the benefit of a cracked account. For example, an attacker who cracks a password may not actually change the password, since such an action would alert the legitimate user of the breach.

Defender Actions. The value $\lambda_{t^*}$ will depend on $k$ and $v$ as well as on the underlying password distribution $p_1 \ge p_2 \ge \ldots$. The goal of key-stretching is to increase $k$ so that we can reduce $\lambda_{t^*}$, the fraction of compromised accounts, in the event of an authentication server breach. However, the defender is constrained by server workload and by authentication times. In particular, the number of sequential hash iterations ($\tau$) is bounded by usability constraints, as users may be unhappy if they need to wait a long time to authenticate; e.g., computing PBKDF2-SHA256 with a very large number $\tau$ of hash iterations would take at least a second on a modern CPU [29]. Similarly, the total workload $k$ is bounded by workload constraints, e.g., the authentication server must be able to handle all of the authentication requests even during traffic peaks. If the value $v$ is sufficiently large (in proportion to the cost $k$ of a password guess), then a rational attacker will crack every password, $\lambda_{t^*} = 1$. In this case we say that all of the key-stretching effort was useless against a value-$v$ rational adversary.

Password hashing algorithms like BCRYPT, PBKDF2 and SCRYPT have parameters that control the running time (number of hash iterations) $\tau$ and the total cost $k$ of computing the password hash function. Thus, the cost $k$ of computing PBKDF2 or BCRYPT is $k = \tau \times C_H$, where $C_H$ denotes the cost of computing the underlying hash function (e.g., SHA256 or Blowfish). We will treat $C_H$ as a unit of measurement when we report the cost $k$, and write $k = \tau$ for the BCRYPT and PBKDF2 functions. Given an estimate of $C_H$ in USD, we will use $k_{\$} = k \times C_H$ to denote the cost of computing the password hash function in USD.

Intuitively, a memory hard function is a function whose computation requires large amounts of memory. One of the key advantages of a memory hard function is that the cost $k$ potentially scales with $\tau^2$ instead of $\tau$, making it possible to increase costs without introducing intolerable authentication delays. An ideal memory hard function runs in time $\tau$ and requires $\tau$ blocks of memory to compute. Thus, the Area × Time (AT) complexity of computing the memory hard function scales with $\tau^2$, because the adversary must allocate $\tau$ blocks of memory for $\tau$ units of time. In particular, we use $k = \tau \times C_H + \tau^2 \times C_{mem}$ to model the approximate cost of computing a memory hard function which iteratively makes $\tau$ calls to the underlying hash function $H$ and requires $\tau$ blocks of memory. By contrast, the AT complexity of BCRYPT and PBKDF2 is just $k = \tau$, since these functions can be computed with a single block of memory. Here, $C_{mem}$ is a constant representing the core-memory area ratio, that is, the area of one block of memory on chip divided by the area of a core evaluating $H$ on chip. In this paper we use the estimate of $C_{mem}$ from [30], [31], though we stress that our analysis could easily be repeated with different parameter choices.

Model Limitations. To keep the exposition simple, we do not attempt to incorporate any model of equilibrium prices for cracked passwords on the black market, and instead assume that the value of a cracked password $v_{\$}$ is static for all users. A targeted adversary may have higher valuations for specific user passwords, e.g., those of celebrities or politicians.
Similarly, an attacker who floods a black market with cracked passwords may drive equilibrium prices down. Our primary findings would not be altered in any significant way by including such a model unless equilibrium prices drop by multiple orders of magnitude [16]. We also remark that our intention is to model an untargeted, economically motivated attacker and not a nation state focused on cracking the passwords of a particular person of interest. However, it may still be reasonable to believe that a nation state attacker will largely be constrained by economic considerations (e.g., the expected value of additional intelligence gained by cracking the password versus the expected cost to crack the password).
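
To make the model of this section concrete, the following Python sketch computes the rational attacker's threshold $t^*$ by brute force: it evaluates $R(t) - C(t)$ for every $t$ over a finite, sorted distribution and returns $t^*$ together with $\lambda_{t^*}$. It assumes a finite PDF-Zipf distribution $p_i = z/i^s$ normalized to sum to one (so $t = \infty$ is represented by $t = n$), and compares the BCRYPT/PBKDF2 cost $k = \tau$ with the memory hard cost $k = \tau + \tau^2 C_{mem}$, taking $C_H = 1$. The function names and the parameter values in the usage lines (n, s, v, tau, c_mem) are hypothetical placeholders, not the estimates used in the paper.

```python
def zipf_distribution(n, s):
    """Finite PDF-Zipf distribution p_i = z / i**s for i = 1..n.

    z is chosen so that the probabilities sum to 1 (a simplifying
    assumption; real password distributions have a long tail).
    """
    weights = [1.0 / i**s for i in range(1, n + 1)]
    z = 1.0 / sum(weights)
    return [z * w for w in weights]


def rational_threshold(p, v, k, a=1.0):
    """Return (t_star, lambda_t_star) maximizing R(t) - C(t).

    C(t) = t * (1 - lambda_t) * k + k * sum_{j<=t} j * p_j
    R(t) = v * lambda_t ** a
    p must be sorted in non-increasing order. t = 0 (give up at once) has
    utility 0; ties are broken in favor of the smaller threshold.
    """
    best_t, best_utility, best_lambda = 0, 0.0, 0.0
    lam = 0.0        # running lambda_t
    weighted = 0.0   # running sum_{j<=t} j * p_j
    for t, p_t in enumerate(p, start=1):
        lam += p_t
        weighted += t * p_t
        cost = t * (1.0 - lam) * k + k * weighted
        utility = v * lam**a - cost
        if utility > best_utility:
            best_t, best_utility, best_lambda = t, utility, lam
    return best_t, best_lambda


def cost_pbkdf2(tau):
    """k = tau for BCRYPT/PBKDF2, measured in units of C_H."""
    return tau


def cost_mhf(tau, c_mem):
    """k = tau * C_H + tau**2 * C_mem for a memory hard function, with C_H = 1."""
    return tau + tau**2 * c_mem


if __name__ == "__main__":
    p = zipf_distribution(n=100_000, s=0.8)   # hypothetical Zipf parameters
    v, tau, c_mem = 1e7, 1e4, 0.01            # hypothetical values
    for name, k in [("PBKDF2", cost_pbkdf2(tau)), ("MHF", cost_mhf(tau, c_mem))]:
        t_star, frac = rational_threshold(p, v, k)
        print(f"{name}: k = {k:.3g}, t* = {t_star}, cracked = {frac:.1%}")
```

The exhaustive scan is a deliberate choice: for a general distribution the sign of $MR(t) - MC(t)$ from equations (2) and (3) can change more than once, so stopping at the first $t$ where the marginal cost exceeds the marginal revenue is not guaranteed to find the global maximum.
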

3. Yahoo! Passwords follow Zipf’s Law

Zipf's law states that the frequency of an element in a distribution is related to its rank in the distribution. There are two variants of Zipf's law for passwords: PDF-Zipf and CDF-Zipf. In the CDF-Zipf model we have $\lambda_t = \sum_{j=1}^{t} p_j = y \cdot t^r$, where the constants $y$ and $r$ are the CDF-Zipf parameters. In the PDF-Zipf model we have $f_i = C/i^s$, where $s$ and $C$ are the PDF-Zipf parameters. Normalizing by $N$, the number of users, we have $p_i = z/i^s$, where $z = C/N$.

Wang et al. [32] previously found that password frequencies tend to follow PDF-Zipf's law if the tail of the password distribution (e.g., passwords with frequency $f_i < 5$) is dropped. Wang and Wang [11] subsequently found that the CDF-Zipf model is superior, in that the CDF-Zipf fits were more stable than PDF-Zipf fits and the CDF-Zipf fit performed better under Kolmogorov-Smirnov (KS) tests. Furthermore, the CDF-Zipf model can fit the entire password distribution (e.g., without excluding passwords with frequency $f_i < 5$). These claims were based on analysis of several smaller password datasets which were released by hackers.

In 2016 Yahoo! allowed the release of a differentially private list of password frequencies for users of their services [13]. We refer an interested reader to [12], [13] for additional details about how the Yahoo! data was collected and how it was perturbed to preserve differential privacy. The Yahoo! dataset is superior to other datasets in that it offers the largest sample size, and the dataset was collected and released by trusted parties. We show that the Yahoo! dataset is also well modeled by CDF-Zipf's law. Because of the advantages of the Yahoo! dataset, our analysis comprises the strongest evidence to date for Wang and Wang's premise [11] that password distributions follow CDF-Zipf's law. We focus on the CDF-Zipf model in this section since it can fit the entire password distribution [11]. We also verified that the Yahoo!
dataset isalso well modeled by PDF-Zipf’s law if we drop passwordswith frequency f i < 5 like Wang et al. [32], but we omitthis analysis from the submission due to lack of space.The rest of this section is structured as follows: First, insection 3.1 we discuss the advantages of using the Yahoo!dataset over leaked datasets like RockYou. In 3.2 we showthat the noise that was added to preserve differential privacywill have a negligibly small impact on CDF-Zipf ﬁttings. Insection 3.3 we use subsampling to show that the CDF-Zipfﬁttings for Yahoo! converge to a stable solution. Finally, insection 3.4 we present the CDF-Zipf ﬁtting for the entireYahoo! dataset. The Yahoo! frequency corpus offers many advantagesover breached password datasets such as RockYou orTianya. • The Yahoo! password frequency corpus is based on million Yahoo! passwords — more than twice as largeas any of the breached datasets analyzed by Wang andWang [11]. • The records were collected in a trusted fashion. Noinﬁltration, hacking, tricks, or general foul play wasused to obtain any of this data. There was no ulteriormotive behind collecting these passwords other thanto provide valuable data in a way that can be usedfor scientiﬁc research. By contrast, it is possible thathackers strategically omit (or inject) password databefore they release a breached dataset like RockYou or List Version y σ y RockYou Standard

RockYou Diff. Private ∗ − r σ r RockYou Standard

RockYou Diff. Private ∗ − R σ R RockYou Standard

RockYou Diff. Private ∗ − TABLE 1: Impact of Differential Privacy on CDF FitTianya! Why should we trust rogue hackers to provideresearchers with representative password data? • Breached password datasets often contain many pass-words/ accounts that look suspiciously fake. In 2016Yang et al [14] suggested that such passwords canbe removed with DBSCAN [33]. Cleansing operationsended up removing a reasonable portion of the dataset(e.g., 5 million passwords were removed from Rock-You’s data). With the Yahoo! data such cleansing isnot needed, as it was collected in a manner that ensuredcollected passwords were in use. Previous work that hasbeen done on Zipf distributions in breached passworddatasets [11] did not perform any sort of sanitizing stepon the data. It is unclear how such operations wouldaffect the Zipf law ﬁt. • The information is released in a responsible way thatpreserves users’ privacy. The differential privacy mech-anism means that even with the released data it isnot possible to determine any new information aboutYahoo’s users that an adversary would not be able toobtain anyways. • Data from the Yahoo! password frequency corpus ulti-mately is derived from the passwords of active Yahoo!users who were logging in during the course of thestudy as opposed to passwords from throwaway ac-counts that have been long forgotten.
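The two Zipf variants defined at the start of this section can be made concrete with a minimal Python sketch. This is our illustration, not code from the paper, and the function names are hypothetical. It evaluates the CDF-Zipf cumulative mass $\lambda_t = y \cdot t^{r}$, recovers per-rank probabilities as $p_t = \lambda_t - \lambda_{t-1}$, and evaluates the PDF-Zipf probability $p_i = z / i^{s}$. It also computes the rank $(1/y)^{1/r}$ at which the CDF-Zipf mass reaches 1. Capping $\lambda_t$ at 1 beyond that rank is our own assumption, made so the model remains a valid distribution.

```python
def cdf_zipf(t: int, y: float, r: float) -> float:
    """Cumulative mass lambda_t = y * t**r of the t most popular passwords (capped at 1)."""
    return min(1.0, y * t ** r)


def cdf_zipf_pmf(t: int, y: float, r: float) -> float:
    """Probability p_t = lambda_t - lambda_{t-1} of the rank-t password under CDF-Zipf."""
    return cdf_zipf(t, y, r) - cdf_zipf(t - 1, y, r)


def pdf_zipf_pmf(i: int, z: float, s: float) -> float:
    """Probability p_i = z / i**s of the rank-i password under PDF-Zipf."""
    return z / i ** s


def cdf_zipf_support(y: float, r: float) -> float:
    """Rank (1/y)**(1/r) at which y * t**r reaches 1."""
    return (1.0 / y) ** (1.0 / r)


if __name__ == "__main__":
    y, r = 0.0211, 0.2166  # Yahoo! CDF-Zipf parameters from LLS regression
    print(cdf_zipf(1, y, r))                  # mass of the single most popular password
    print(f"{cdf_zipf_support(y, r):.3g}")    # rank where the modeled mass reaches 1
```

With the Yahoo! LLS parameters $y = 0.0211$ and $r = 0.2166$, the modeled cumulative mass reaches 1 at a rank of roughly $5.5 \times 10^7$.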

The published Yahoo! password frequency lists were perturbed to ensure differential privacy. Before attempting to fit this dataset using Zipf's law, we seek to answer the following question: does this noise, however small, affect our CDF-Zipf fitting process in any significant way? We claim that the answer is no, and we offer strong empirical evidence in support of this claim. In particular, we took the RockYou dataset ($N \approx 32.6$ million users) and generated different perturbed versions of the frequency list by running the $(\epsilon, \delta)$-differentially private algorithm of Blocki et al. [13]. We set $\epsilon$ to the same value that was used to collect the Yahoo! dataset that we analyze. For each of these perturbed frequency lists we compute a CDF-Zipf fit using linear least squares (LLS) regression. To apply LLS regression, we take logarithms of both sides of the CDF-Zipf equation $\lambda_t = y \cdot t^{r}$ to obtain the linear equation $\log \lambda_t = \log y + r \log t$.

Our results, shown in Table 1, strongly suggest that the differential privacy mechanism does not impact the parameters $y$ and $r$ of a CDF-Zipf fitting in any significant way. In particular, the parameters $y$ and $r$ we obtain from fitting the original data with a CDF-Zipf model are virtually indistinguishable from the parameters we obtain by fitting one of the perturbed datasets. Similarly, differential privacy does not affect the $R^2$ value of the CDF-Zipf fit. Here, $R^2$ measures how well the linear regression models the data ($R^2$ values closer to 1 indicate better fits). Thus, one can compute CDF-Zipf's law parameters for the Yahoo! data collected by [12], [13] without worrying about the impact of the $(\epsilon, \delta)$-differentially private algorithm used to perturb this dataset. We also verified that the noise added to the Yahoo! dataset has a negligible effect on the parameters $s$ and $z$ of a PDF-Zipf fitting.

There are two primary ways to find a CDF-Zipf fit: Golden Section Search (GSS) and Linear Least Squares (LLS). Wang et al. [11] previously found that CDF-Zipf fits stabilize more quickly with GSS than with LLS. This was particularly important because the datasets they tested were much smaller than the Yahoo! dataset. In this section we test the stability of LLS by subsampling from the much larger Yahoo! dataset. In particular, we subsample (without replacement) datasets of size 15 million, 30 million, 45 million and 60 million, and use LLS to compute the CDF-Zipf parameters $y$ and $r$ for each subsampled dataset. Our results are shown in Table 2 and graphically in Figure 1. While the CDF-Zipf fit returned by LLS does take longer to stabilize, our results indicate that it does eventually stabilize at larger (sub)sample sizes (e.g., the full Yahoo! dataset). We also found that the PDF-Zipf parameters $s$ and $z$ stabilize before $N =$ × samples.

TABLE 2: Yahoo! CDF-Zipf with Sub-sampling
Sample Size (Millions)    y          r         R²
15                        0.00949    0.2843    0.9542
30                        0.01321    0.2544    0.9531
45                        0.01592    0.2384    0.9529
60                        0.01810    0.2277    0.9530
Full                      0.02112    0.2166    0.9544

Fig. 1: Yahoo! CDF-Zipf Subsampling

We used both LLS regression and GSS to obtain separate CDF-Zipf fittings for the Yahoo! dataset. The results, shown in Table 3 and graphically in Figure 6, show that both methods produce high-quality fittings. In addition to the parameters $y$ and $r$, we report $R^2$ values and the Kolmogorov-Smirnov (KS) distance. The KS distance is the largest distance between the observed discrete distribution $F_n(x)$ and the proposed theoretical distribution $F(x)$. Formally,
$$D_{KS} = \sup_x \left| F_n(x) - F(x) \right|.$$
Intuitively, smaller $D_{KS}$ values (resp. larger $R^2$ values) indicate better fits. Both LLS and GSS produce high-quality CDF-Zipf fittings for the Yahoo! dataset (Table 3). LLS regression outperforms golden section search under both the $R^2$ and KS tests. Wang and Wang [11] had previously adopted golden section search because its results stabilized quickly. While this was most likely the right choice for smaller password datasets like RockYou, our analysis of subsampled datasets suggests that LLS eventually produces stable solutions when the sample size is large, as it is in the Yahoo! dataset. Thus, in the remainder of the paper we use the CDF-Zipf parameters $y = 0.0211$ and $r = 0.2166$ from LLS regression. We stress that the decision to use the CDF-Zipf parameters from LLS instead of the parameters returned by GSS does not affect our findings in any significant way.

We remark that LLS is also more efficient computationally. While we were able to run GSS to find a CDF-Zipf fit for the Yahoo! dataset ($N \approx 7 \times 10^7$), running GSS on a dataset on the order of a billion passwords (e.g., the size of the most recent Yahoo! breach [34]) would be difficult if not intractable. By contrast, LLS could still be used to find a CDF-Zipf fit, and our analysis suggests that the fit would be superior.

TABLE 3: Yahoo! CDF-Zipf Test Results
Method    y          r         R²    KS
LLS       0.0211     0.2166
GSS       0.03315    0.1811

TABLE 4: CDF-Zipf threshold $T(y, r, a) < v/k$ at which the adversary cracks 100% of passwords, for $a \in \{1, 0.8\}$.
Dataset         y         r         T(y, r, 1)    T(y, r, 0.8)
RockYou
Battlefield
Tianya
Dodonew
CSDN
Mail.ru
Gmail
Flirtlife.de
Yahoo!          0.0211    0.2166
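The LLS fitting procedure and the KS distance described in this section can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' code, and the function names are hypothetical. The input is a frequency list sorted in descending order. The fit is ordinary least squares on $(\log t, \log \hat{\lambda}_t)$ over every observed rank. $R^2$ is measured in log space. The supremum in $D_{KS}$ is taken over the observed ranks, comparing the empirical CDF $\hat{\lambda}_t$ with $\min(1, y \cdot t^{r})$.

```python
import math
from typing import List, Sequence, Tuple


def empirical_cdf(freqs: Sequence[float]) -> List[float]:
    """lambda_hat_t = (f_1 + ... + f_t) / N for frequencies sorted in descending order."""
    n = float(sum(freqs))
    out, running = [], 0.0
    for f in freqs:
        running += f
        out.append(running / n)
    return out


def fit_cdf_zipf_lls(freqs: Sequence[float]) -> Tuple[float, float, float]:
    """Fit log(lambda_t) = log(y) + r*log(t) by ordinary least squares.

    Returns (y, r, R^2), with R^2 computed on the log-transformed data.
    """
    lam = empirical_cdf(freqs)
    if len(lam) < 2:
        raise ValueError("need at least two ranks to fit a line")
    xs = [math.log(t) for t in range(1, len(lam) + 1)]
    ys = [math.log(v) for v in lam]
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (v - my) for x, v in zip(xs, ys))
    r = sxy / sxx                     # slope = CDF-Zipf exponent r
    log_y = my - r * mx               # intercept = log(y)
    ss_res = sum((v - (log_y + r * x)) ** 2 for x, v in zip(xs, ys))
    ss_tot = sum((v - my) ** 2 for v in ys)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return math.exp(log_y), r, r2


def ks_distance(freqs: Sequence[float], y: float, r: float) -> float:
    """D_KS = max over observed ranks t of |lambda_hat_t - min(1, y * t**r)|."""
    lam = empirical_cdf(freqs)
    return max(abs(v - min(1.0, y * t ** r)) for t, v in enumerate(lam, start=1))
```

A subsampling experiment in the spirit of Table 2 would call `fit_cdf_zipf_lls` on frequency lists rebuilt from random subsamples. Fitting a line in log space makes the whole procedure a single pass over the ranks, which is one reason LLS scales more easily than an iterative search such as GSS.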

4. Analysis of Rational Adversary Model for Zipf's Law

In this section, we show that there is a finite threshold $T(y, r, a)$ which characterizes the behavior of a rational offline adversary when user passwords follow CDF-Zipf's law with parameters $y$ and $r$, i.e., $\lambda_i = y \cdot i^{r}$. In particular, Theorem 1 gives a precise formula for computing this threshold $T(y, r, a)$. If $v/k \geq T(y, r, a)$, then a rational value-$v$ adversary will proceed to crack all user passwords, as marginal guessing rewards will always exceed marginal guessing costs. In Table 4 we use this formula to explicitly compute $T(y, r, a)$ for the Yahoo! dataset as well as for nine other password datasets analyzed by Wang and Wang [11].

We choose to focus on CDF-Zipf's law in this section because it is believed to be a better model than PDF-Zipf. However, we stress that similar bounds can be derived using PDF-Zipf's law, though we omit these results from the submission for lack of space.

Theorem 1.

Let $k$ denote the cost of attempting a password guess. If
$$\frac{v}{k} \geq T(y, r, a) = \max_{t \leq Z} \left( \frac{1 - y (t-1)^{r}}{y^{a} (ra)\, t^{ra-1}} \right), \quad \text{where } Z = \left\lceil \left( \frac{1}{y} \right)^{1/r} \right\rceil + 1,$$
then a value-$v$ rational attacker will crack 100% of passwords chosen from a Zipf's law distribution with parameters $y$ and $r$.

Proof:

Suppose a password frequency distribution follows Zipf's law for some parameters $r$ and $y$, so that $\lambda_n = y n^{r}$. Since the marginal revenue is $MR(t) = v\left(\lambda_t^{a} - \lambda_{t-1}^{a}\right)$ and the marginal cost is $MC(t) = k\left(1 - \sum_{n=1}^{t-1} p_n\right)$, a rational adversary can be assumed to continue attacking as long as $MR(t) \geq MC(t)$. Therefore, the attacker will not quit as long as
$$v\left(\lambda_t^{a} - \lambda_{t-1}^{a}\right) \geq k\left(1 - \sum_{n=1}^{t-1} p_n\right), \quad \text{i.e.,} \quad v\left(y^{a} t^{ra} - y^{a} (t-1)^{ra}\right) \geq k\left(1 - y (t-1)^{r}\right).$$
In particular, the attacker will not quit as long as
$$\frac{v}{k} \geq \frac{1 - y (t-1)^{r}}{y^{a} t^{ra} - y^{a} (t-1)^{ra}}.$$
Notably, if $\frac{v}{k} \geq \max_t \left( \frac{1 - y (t-1)^{r}}{y^{a} t^{ra} - y^{a} (t-1)^{ra}} \right)$, then a rational adversary will eventually crack all passwords. For $g(x) := y^{a} (ra)\, x^{ra-1}$ we have $y^{a} t^{ra} - y^{a} (t-1)^{ra} = \int_{t-1}^{t} g(x)\, dx$. Since $ra \leq 1$, the function $g$ is non-increasing, so $g(t) \leq g(x) \leq g(t-1)$ for all $x \in [t-1, t]$. Thus we have
$$y^{a} (ra)\, t^{ra-1} \leq y^{a} t^{ra} - y^{a} (t-1)^{ra} \leq y^{a} (ra) (t-1)^{ra-1}$$
and
$$\max_t \left( \frac{1 - y (t-1)^{r}}{y^{a} t^{ra} - y^{a} (t-1)^{ra}} \right) \leq \max_t \left( \frac{1 - y (t-1)^{r}}{y^{a} (ra)\, t^{ra-1}} \right).$$
Thus, it suffices to prove that $\frac{v}{k} \geq \max_t f(t)$, where $f(t) := \frac{1 - y (t-1)^{r}}{y^{a} (ra)\, t^{ra-1}}$. From the theorem statement we have $\frac{v}{k} \geq f(t)$ for any $t \leq Z$; it remains to argue that the same is true when $t > Z$. Since we already know that $f(Z) \leq v/k$, it suffices to show that the function $f(\cdot)$ is non-increasing over $[Z, \infty)$, i.e., $f'(t) \leq 0$ for all $t \geq Z$. We calculate the derivative $f'(t)$ as follows:
$$f'(t) = -\frac{(t-1)^{r-1}\, t^{1-ra}\, y^{1-a}}{a} + \frac{(1-ra)\, t^{-ra}\, y^{-a} \left(1 - (t-1)^{r} y\right)}{ra},$$
so that $f'(t) \leq 0$ if and only if
$$\frac{(1-ra)\, y^{-a} \left(1 - (t-1)^{r} y\right)}{ra\, t^{ra}} \leq \frac{y^{1-a}\, t\, (t-1)^{r-1}}{a\, t^{ra}}
\iff (1-ra)\left(1 - (t-1)^{r} y\right) \leq y\, t\, (t-1)^{r-1} r
\iff (1-ra) \leq y (t-1)^{r-1} \left( (t-1)(1-ra) + tr \right).$$
Since $(t-1)(1-ra) \leq (t-1)(1-ra) + tr$, the last expression certainly holds if $(1-ra) \leq y (t-1)^{r-1} (t-1)(1-ra)$, or equivalently, $1 \leq y (t-1)^{r}$. Since $Z := \left\lceil 1 + (1/y)^{1/r} \right\rceil$, it follows that $f'(t) \leq 0$ for all $t \geq Z$. ∎

7. We remark that when $a =$ it is possible to derive a closed form expression for the threshold $T(y, r, a)$.
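Theorem 1's threshold can be evaluated numerically. The sketch below is our own illustration, not the authors' implementation. For the Yahoo! parameters $Z$ is roughly $5.5 \times 10^7$, so instead of scanning every integer $t \leq Z$ it takes a cheaper route. It checks every integer up to a cutoff, then only integers on a geometric grid up to $Z$. The returned value therefore approximates the exact maximum and can only under-estimate it. The cutoff `dense_limit` and grid ratio `ratio` are assumptions chosen for speed.

```python
import math


def f_theorem1(t: int, y: float, r: float, a: float) -> float:
    """f(t) = (1 - y*(t-1)**r) / (y**a * (r*a) * t**(r*a - 1)) from Theorem 1."""
    return (1.0 - y * (t - 1) ** r) / (y ** a * (r * a) * t ** (r * a - 1.0))


def zipf_threshold(y: float, r: float, a: float,
                   dense_limit: int = 10_000, ratio: float = 1.0005) -> float:
    """Approximate T(y, r, a) = max_{1 <= t <= Z} f(t) with Z = ceil((1/y)**(1/r)) + 1.

    Every integer t <= min(Z, dense_limit) is evaluated exactly; beyond the
    cutoff only integers on a geometric grid (step `ratio`) are evaluated.
    """
    Z = math.ceil((1.0 / y) ** (1.0 / r)) + 1
    best = max(f_theorem1(t, y, r, a) for t in range(1, min(Z, dense_limit) + 1))
    t = float(dense_limit)
    while t < Z:
        t *= ratio
        ti = min(Z, int(t))           # last grid point is clamped to Z itself
        best = max(best, f_theorem1(ti, y, r, a))
    return best


if __name__ == "__main__":
    y, r = 0.0211, 0.2166  # Yahoo! CDF-Zipf parameters (LLS)
    print(f"T(y, r, 1)   ~ {zipf_threshold(y, r, 1.0):.3g}")
    print(f"T(y, r, 0.8) ~ {zipf_threshold(y, r, 0.8):.3g}")
```

For the Yahoo! parameters this approximation puts $T(y, r, 1)$ on the order of $2 \times 10^7$, with $T(y, r, 0.8)$ somewhat larger. These are outputs of this sketch, not values taken from Table 4. The geometric grid works because $f$ varies slowly near its interior maximum, so a step ratio close to 1 loses very little.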

5. Analysis of Previous Password Breaches

In this section, we apply our economic model to analyze the consequences of recent password breaches and the impact of defenses that could have been adopted.

We focus on the following breaches in our analysis:

5.1.1. Yahoo!

Attackers stole password hashes for million Yahoo! users in 2014, though the breach was unknown to the general public until 2016 [35]. While Yahoo! used BCRYPT to hash passwords, they have not publicly specified the number of hash iterations $\tau$ that they used. However, we do have empirical password frequency data from 70 million Yahoo! users, which allowed us to derive CDF-Zipf parameters $y = 0.0211$ and $r = 0.2166$ for Yahoo! passwords. Thus, we can predict the % of cracked passwords for different values of $\tau$ that Yahoo! might have chosen.

5.1.2. Dropbox
Attackers stole password hashes for ≈ million Dropbox users, though the breach was unknown to the general public until 2016 [4]. Dropbox used BCRYPT at a fixed cost level (i.e., $\tau = 2^{\text{level}}$ hash iterations) to hash passwords. We do not have empirical password data from Dropbox users from which we can derive Zipf's law parameters $y$ and $r$. However, we have Zipf's law parameters for many other datasets such as RockYou, Tianya, CSDN and Yahoo!, allowing us to predict how many passwords a value-$v$ adversary would crack if, say, Dropbox passwords and RockYou passwords have similar strength. Arguably, Dropbox passwords could be quite valuable, as they are often used to protect sensitive data.

5.1.3. AshleyMadison
Attackers stole nearly million AshleyMadison password hashes [36] in 2015 and released the stolen data publicly a month later. AshleyMadison primarily used BCRYPT at level 12 ($\tau = 2^{12} = 4{,}096$ hash iterations) to hash passwords [37]. However, CynoSure Prime noticed that some passwords were effectively protected with MD5 instead of BCRYPT due to an implementation error. CynoSure Prime managed to crack approximately million of these MD5 hashes in just days [36], though it has been claimed that most of the passwords protected by BCRYPT are uncrackable [37]. Similar to Dropbox, we do not have Zipf's law parameters for AshleyMadison users. However, it is plausible that these parameters are comparable to the parameters derived from other datasets such as Yahoo! or RockYou.

5.1.4. LastPass
LastPass was using PBKDF2-SHA256 with $\tau = 10^5$ rounds of iteration when they were breached in 2015. As with the AshleyMadison and Dropbox breaches, we do not currently have Zipf's law parameters for LastPass passwords. We can still predict how many passwords would be breached under the assumption that these passwords have similar strength to passwords in other datasets like RockYou or Yahoo!. Arguably, master passwords will be more valuable to an attacker than regular passwords, as a master password unlocks multiple user accounts. On the other hand, previous research [12] has not found a clear correlation between password strength and account value.

8. An earlier 2013 Yahoo! breach affected approximately billion Yahoo! users [34]. We focus on the 2014 breach because it occurred after Yahoo! upgraded their password hashing algorithm from MD5 to BCRYPT. We note that any negative findings about the 2014 breach will certainly extend to the earlier breach, since a weaker hashing algorithm was involved.

Estimating $v$. As described in Section 2, the value $v$ represents the value per password when all passwords are released on the market. Thus, although the actual black market prices may vary with supply, the parameter $v$ is fixed. Our estimate of this value parameter will depend on the current black market price and on the model parameter $a$ (diminishing returns). In Table 5 we show various estimates of $v$ obtained from multiple estimates of black market password prices. These estimates include measurements from Fossi [16] and more recent estimates from [25], which finds that Yahoo! passwords go for 0.70–1.20 USD on the black market. To obtain the estimates in Table 5, we assume that the black market prices were observed when just 1% of the passwords were on the market. This allows us to estimate the value $v$ if all passwords were to be released, using Equation 1. We remark that the difference between the two estimates [25] and [16] may be explained by additional black market supply. We view $a = 0.8$ as substantial diminishing returns, e.g., the marginal revenue per cracked password is considerably lower once the attacker has compromised all accounts. An interesting direction for future work may be to estimate the parameter $a$ from a longitudinal study of black markets.

Translating between $v$ and $v_{\$}$. Bonneau and Schechter [29] observed how many SHA-256 hashes Bitcoin miners were able to perform in 2013 in exchange for bitcoin rewards of a known dollar value.
Correspondingly, one can estimate the cost of evaluating a SHA-256 hash, $C_H$, as the ratio of these rewards to the number of hashes performed. Alternatively, the cost can be viewed as the economic opportunity cost of evaluating each hash function (for instance, renting a botnet or computing on a cloud platform). Because Bitcoin mining is almost exclusively performed on application specific integrated circuits (ASICs), the above cost analysis implicitly assumes that the attacker is willing to fabricate an ASIC to evaluate PBKDF2-SHA256 or BCRYPT. We contend that this is a plausible scenario for a rational attacker, since fabrication costs would amortize over the number of user accounts being attacked (e.g., millions of accounts). Furthermore, we note that an attacker who is not willing to pay to fabricate an ASIC could obtain similar performance gains using a field programmable gate array (FPGA).

TABLE 5: v conversion chart
Price at 1% (USD)    v, a = 0.8    v, a = 0.9    v, a = 1.0
0.70                 0.28          0.44          0.70
1.20                 0.48          0.76          1.20
4.00                 1.59          2.52          4.00
30.00                11.94         18.93         30.00

In Section 4 we showed that if passwords follow CDF-Zipf's law with parameters $y$ and $r$, and $v/k \geq T(y, r, a)$, then a rational adversary will crack 100% of user passwords. Figure 2(a) plots $v = k \times T(y, r, 0.8)$ for various thresholds from Table 4, including Yahoo! and RockYou. Thus, for a point $(v, \tau)$ lying on the blue line, a value-$v$ rational adversary will crack 100% of Yahoo! passwords when he can compute the hash function at cost $k = \tau$. Note that $\tau = k$ for hash functions like BCRYPT and PBKDF2, the ones used by Yahoo!, Dropbox, AshleyMadison and LastPass. For reference, Figure 2(a) includes the actual values of $\tau$ selected by AshleyMadison, Dropbox and LastPass, as well as the value of $\tau$ that corresponds to one second of computation. Bonneau and Schechter estimated how many times SHA256 can be evaluated in 1 second on a modern CPU [38].
Thus, this number upper bounds the value of $\tau$ that one could select without delaying authentication for more than 1 second when using PBKDF2-SHA256. The plots predict that, unless we set $\tau$ to be extremely large, the adversary will crack 100% of passwords in almost every instance. In particular, the levels of key-stretching performed by Dropbox, AshleyMadison and even LastPass are all well below the thresholds necessary to protect Yahoo!, RockYou or CSDN passwords.

Figure 2(b) is similar to Figure 2(a) except that we rescale the y-axis to show $v_{\$}$, given monetary estimates of computation cost and password values, so that we can focus on the number of hash iterations necessary simply to avoid all passwords being cracked.

While we do not have CDF-Zipf parameters for other breaches such as AshleyMadison, Dropbox, or LastPass, we do have the value $\tau = k$ for each of these breaches. Figure 2(c) plots $v = k \times T(y, r, 0.8)$, only this time we hold $k$ constant and allow $T(y, r, 0.8)$ to vary. For example, in the black line we fix $k = \tau = 10^5$, since LastPass used PBKDF2-SHA256 with $\tau = 10^5$ hash iterations, and allow $T(y, r, 0.8)$ to vary. The vertical lines represent the thresholds $T(y, r, 0.8)$ we derive from CDF-Zipf's law fits for RockYou, Tianya and Yahoo!. Table 4 shows the value of $T(y, r, 0.8)$ obtained from different password datasets. Observe that in all cases we had $T(y, r, 0.8) \leq$ × . As in Figure 2(b), the y-axis in Figure 2(c) is scaled to show the value $v_{\$}$ in USD (estimated). Thus, if Dropbox (resp. AshleyMadison/LastPass) passwords have comparable strength to Yahoo! passwords (resp. Tianya, RockYou), then a rational adversary would crack 100% of these passwords. Indeed, Figure 2(c) shows that unless the thresholds $T(y, r, a)$ for Dropbox/LastPass/AshleyMadison are significantly larger than the previously observed thresholds, a rational adversary would be compelled to crack all passwords, given the range of password values. For example, even if the threshold $T(y, r, a)$ for Dropbox exceeds the threshold for Yahoo! by four orders of magnitude, the adversary will still crack 100% of these passwords.

Figures 2(a), 2(b) and 2(c) paint a grim picture. PBKDF2 and BCRYPT most likely provide dramatically insufficient protection for most AshleyMadison, Dropbox, Yahoo! and LastPass users, even if we use the lowest estimate of the value parameter $v$ from Table 5 ($v_{\$} = 0.28$ USD) and assume that the attacker faces substantial diminishing returns ($a = 0.8$) for additional cracked passwords. Furthermore, it would not have been possible to provide sufficient protection for users with PBKDF2 or BCRYPT without introducing intolerable authentication delays ($\geq 1$ second).

Our analysis assumes that the password distribution truly follows CDF-Zipf's law. While previous research (e.g., [32], [11] and our own results in Section 3) strongly supports the hypothesis that most of the password distribution follows Zipf's law, it is not possible to definitively state that the tail of the password distribution follows Zipf's law, since each of the passwords in the tail was (by definition) observed with low frequency. We stress that even if CDF-Zipf's law does not fit the tail of the password distribution, $T(y, r, a)$ still characterizes adversary behavior. For example, suppose that $(100 - x)\%$ of passwords follow a Zipf's law distribution with parameters $y, r$ while the $x\%$ of passwords in the tail of the password distribution do not. In this case, whenever $v/k \geq T(y, r, a)$, a rational adversary will crack at least the $(100 - x)\%$ of user passwords which follow Zipf's law.

Memory hard functions potentially provide a way of increasing computation cost without drastically increasing computation time. As the name suggests, memory hard functions require a large amount of memory to evaluate. Thus, the cost of purchasing or renting hardware for password cracking, approximated by a function's Area × Time (AT) complexity, can be substantial for an attacker.
Specifically, the AT complexity of SCRYPT [9] scales quadratically with the number of time steps [39]. Thus, as discussed in Section 2, we estimate $k_{\$} = \tau C_H + \tau^{2} C_{mem}$, where $C_H$ is the per-hash cost estimated from [29] and $C_{mem}$ is estimated relative to $C_H$ as in [30], [31].

In the last section we assumed that the attacker faced aggressive diminishing marginal returns for additional cracked passwords, and we used the lowest possible estimates of adversary value, finding that an attacker still cracks 100% of passwords from a Zipf's law distribution. By contrast, in this section we operate under the conservative assumptions that the attacker does not face diminishing returns, and we use the larger estimates of adversary value in our analysis. Nevertheless, we find that the use of MHFs can substantially reduce the % of cracked passwords.

Figure 3 plots $v_{\$}$ (estimate) versus the minimum value of $\tau$ necessary to prevent a rational attacker from cracking 100% of passwords. For example, the blue line predicts that if Yahoo! had adopted memory hard functions with only $\tau =$ iterations ( seconds), then a value-$v_{\$}$ adversary will not crack all passwords selected from a CDF-Zipf's law distribution with the parameters $y = 0.0211$ and $r = 0.2166$, the parameters of our CDF-Zipf fit for Yahoo! passwords. By contrast, Yahoo! would need to set $\tau =$ ($\approx$ seconds) when using a function like PBKDF2 or BCRYPT just to ensure that the adversary does not crack 100% of passwords when $a = 1$.

Fig. 2 ($a = 0.8$): (a) $v/k = T(y, r, 0.8)$ for RockYou, CSDN and Yahoo!, plotted against $\log(\tau)$; (b) $v_{\$}$ vs. $\tau$ for $v = k \times T(y, r, 0.8)$; (c) $v_{\$}$ versus $T(y, r, 0.8)$ when $v = k \times T(y, r, 0.8)$, at fixed values of $k$ (AshleyMadison, Dropbox, LastPass and the NIST minimum). Panels (a) and (b) mark the $\tau$ values of Dropbox, AshleyMadison, LastPass, the NIST minimum, and one second of computation.
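The dollar conversions used in this section can be sketched in a few lines. Two caveats apply. First, the rule $v = \text{price} \times 0.01^{1-a}$ is our reading of Table 5, not a restatement of Equation 1. It assumes that prices are observed when 1% of passwords are on the market, and it reproduces every entry of Table 5. Second, the per-hash cost $C_H$ and memory cost $C_{mem}$ are left as parameters, since their numeric values come from [29], [30], [31]. All function names are hypothetical.

```python
def value_from_price(price_usd: float, a: float, market_fraction: float = 0.01) -> float:
    """Estimate v from a black-market price observed when `market_fraction` of passwords are sold.

    Assumes price = v * market_fraction**(a - 1), i.e. v = price * market_fraction**(1 - a)
    (our reading of Table 5).
    """
    return price_usd * market_fraction ** (1.0 - a)


def guess_cost_usd(tau: int, c_h: float, c_mem: float = 0.0, memory_hard: bool = False) -> float:
    """k_$ = tau * C_H for iterated hashing (PBKDF2/BCRYPT);
    k_$ = tau * C_H + tau**2 * C_mem for a memory hard function (quadratic AT cost)."""
    cost = tau * c_h
    if memory_hard:
        cost += tau ** 2 * c_mem
    return cost


def cracks_everything(v_usd: float, k_usd: float, threshold: float) -> bool:
    """Theorem 1 condition v/k >= T(y, r, a): a rational attacker cracks 100% of Zipf passwords."""
    return v_usd / k_usd >= threshold


if __name__ == "__main__":
    # Reproduces the rows of Table 5 (price at 1% supply -> v for a = 0.8, 0.9, 1.0).
    for price in (0.70, 1.20, 4.00, 30.00):
        print(price, [round(value_from_price(price, a), 2) for a in (0.8, 0.9, 1.0)])
```

The quadratic term in `guess_cost_usd` is what makes memory hard functions attractive here. Doubling $\tau$ roughly quadruples the attacker's cost per guess while only doubling the defender's evaluation time.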

Fig. 3: Memory Hard Functions: $v_{\$}$ vs. $\tau$ (plotted against $\log(\tau)$) when $v = k \times T(y, r, 1)$, using thresholds $T(y, r, 1)$ for RockYou and Yahoo!, with and without MHFs; $k_{\$} = \tau C_H + \tau^{2} C_{mem}$ for MHFs and $k_{\$} = C_H \times \tau$ otherwise. The LastPass value of $\tau$ is marked.

Figure 3 predicts that MHFs prevent a rational adversary from cracking all passwords from a Zipf's law distribution. Of course, if the adversary still cracks nearly all passwords, this result would not be particularly exciting. Figure 4 plots the % of cracked passwords vs. $\tau$ against an adversary with value $v_{\$}$. These plots provide an optimistic outlook for MHFs. For example, the plots predict that we can significantly reduce the % of cracked passwords (easily below %) without introducing unacceptably long authentication delays when passwords follow a Zipf's law distribution. By contrast, the plots predict that we would need to set $\tau \approx$ ( + seconds) to achieve the same result using PBKDF2 or BCRYPT when $a = 1$.
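The rational attacker behind Figures 3 and 4 can be sketched directly from the marginal revenue $MR(t) = v(\lambda_t^{a} - \lambda_{t-1}^{a})$ and marginal cost $MC(t) = k(1 - \lambda_{t-1})$ used in the proof of Theorem 1. We assume that the attacker's utility for guessing the top $B$ passwords is the sum of these marginal terms, $U(B) = v\lambda_B^{a} - k\sum_{t=1}^{B}(1 - \lambda_{t-1})$. The sketch picks the $B$ that maximizes $U(B)$ and reports $\lambda_B$, the fraction of accounts cracked. It takes an explicit list of probabilities, so it suits small or truncated distributions rather than the full Yahoo! model. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def rational_attack(probs: Sequence[float], v: float, k: float, a: float = 1.0) -> Tuple[int, float]:
    """Return (B, lambda_B) for the guess budget B maximizing
    U(B) = v * lambda_B**a - k * sum_{t=1..B} (1 - lambda_{t-1}).

    `probs` must be sorted in descending order; B = 0 means the attack is not worth starting.
    """
    best_b, best_u, best_lam = 0, 0.0, 0.0
    lam, spent = 0.0, 0.0
    for b, p in enumerate(probs, start=1):
        spent += k * (1.0 - lam)  # guess b is only paid for on accounts not yet cracked
        lam += p
        u = v * lam ** a - spent
        if u > best_u:
            best_b, best_u, best_lam = b, u, lam
    return best_b, best_lam
```

For example, on a uniform distribution over ten passwords, an attacker with $v/k = 100$ guesses all ten, while one with $v/k = 1$ never starts. The sketch maximizes $U(B)$ globally rather than stopping at the first $t$ with $MR(t) < MC(t)$, because the marginal cost $k(1 - \lambda_{t-1})$ keeps shrinking as more accounts are cracked.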

Fig. 4: Memory Hard Functions: % cracked (vs. $\log(\tau)$) for RockYou and Yahoo!, with and without MHFs, by a value-$v$ adversary against an MHF with running time parameter $\tau$.

6. Model Independent Analysis

In this section we derive model-independent upper and lower bounds on the % of users whose passwords would be cracked by a rational adversary. The advantage of a model-independent analysis is that the bounds we derive apply even if we make no assumptions about the shape of the password distribution. As we observed previously, it is not possible to definitively claim that the tail of the password distribution follows Zipf's law, even if the tail of the distribution is not known to be inconsistent with Zipf's law [32], [11]. The disadvantage of a model-independent analysis is that the bounds we are able to derive may not always be as tight as the bounds we could derive under specific modeling assumptions, e.g., Zipf's law. In this section we assume for the sake of simplicity that $a = 1$, i.e., the marginal value of each additional cracked password remains constant.

Suppose that we are given $N$ independent samples $pwd_1, \ldots, pwd_N \leftarrow \mathcal{X}$ from an (unknown) distribution $\mathcal{X}$. As before, we let $f_i$ denote the number of users who chose password $pwd_i$ in a dataset, and without loss of generality assume that these frequencies are sorted so that $f_i \geq f_{i+1}$. We can use $f_i$ to obtain an estimate $\hat{p}_i = f_i / N$ for $p_i$, the true probability that a random user selects the password $pwd_i$. While we do have $\hat{p}_i \geq \hat{p}_{i+1}$, we stress that we may no longer assume that $p_i \geq p_{i+1}$, since our empirical value $\hat{p}_i$ (resp. $\hat{p}_{i+1}$) may over- or underestimate the true probability $p_i$ (resp. $p_{i+1}$).

Theorem 2 lower bounds the number of passwords that will be cracked by a rational adversary in expectation. The expectation is taken over $N$ passwords sampled from $\mathcal{X}^N$.

Theorem 2. If $V/k \geq NL$ and $a = 1$, then a rational adversary will crack at least
$$\sum_{i : f_i \geq j} f_i - \frac{N}{(j-1)!\, L^{j-1}}$$
user passwords, in expectation.

The proof of Theorem 2 is in Appendix A. The proof begins with the observation that a password $pwd_t = pwd_i$ will certainly be cracked by a value-$V$ adversary if $p_i \geq \frac{1}{NL}$. We then introduce the notion of a $(j, L)$-bad overestimate. In particular, a $(j, L)$-bad overestimate for $pwd_t$ occurs when $p_i < \frac{1}{NL}$ but $f_i \geq j$. If $f_i \geq j$, then either $p_i \geq \frac{1}{NL}$ and the password will be cracked, or we have a $(j, L)$-bad overestimate for the password $pwd_t$. We can then show that $\frac{N}{(j-1)!\, L^{j-1}}$ upper bounds the expected number of passwords $pwd_t$ with $(j, L)$-bad overestimates. In contrast to Theorem 2, Theorem 3 upper bounds the % of passwords that we expect an attacker to compromise.

Theorem 3. If $\frac{V}{k} \leq NL\left(1 - \frac{1+\epsilon}{N} \sum_{i=1}^{t} f_i\right)$ for fixed $\epsilon$ and $t$, then, except with probability $\exp\left(-\frac{\epsilon^{2} N \sum_{i=1}^{t} p_i}{2+\epsilon}\right)$, a rational adversary will crack at most $\sum_{i : f_i > j} f_i + \mu(N, L, j)$ user passwords, where
$$\mu(N, L, j) = \sum_{i : f_i \leq j} f_i \sum_{\ell=0}^{j-1} \binom{N-1}{\ell} \left(\frac{1}{NL}\right)^{\ell} \left(1 - \frac{1}{NL}\right)^{N-\ell-1}.$$
The proof of Theorem 3 is in Appendix A. Briefly, we apply Chernoff bounds to show that, if $\frac{V}{k} \leq NL\left(1 - \frac{1+\epsilon}{N} \sum_{i=1}^{t} f_i\right)$, then with high probability the number of user passwords in our dataset that a rational adversary cracks is at most
$$\sum_{i : f_i \geq j} f_i + \sum_i f_i \times C_i.$$
Here, $C_i$ denotes the event that we have a $(j, L)$-bad underestimate for the password $pwd_i$. We then separately upper bound the sum $\sum_i f_i \times C_i$ to obtain the bound in Theorem 3.

Theorems 2 and 3 allow us to derive different upper and lower bounds by plugging in different values of $j$ and $L$. For example, by increasing $j$ we decrease the term $\frac{N}{(j-1)!\, L^{j-1}}$ in Theorem 2, but we also decrease the sum $\sum_{i : f_i \geq j} f_i$. Increasing (resp. decreasing) $L$ is equivalent to assuming the adversary has a higher (resp. lower) value for cracked passwords, which intuitively allows us to establish higher lower bounds (resp. smaller upper bounds) on the percentage of passwords cracked. Applying Theorem 2, we can derive specific lower bounds for each of the datasets studied by [11] as well as for the Yahoo! frequency corpus. For most datasets we obtain our lower bound by setting $j =$ and $L =$ . For the Yahoo! and RockYou datasets we obtained better lower bounds by setting $j =$ and $L =$ . The results appear below:

Dataset    Unique PWs    Total PWs     V/k    % cracked
RockYou    14,326,970    32,581,870

Remark:

When $j = 1$ we have $\sum_{i : f_i \geq j} f_i - \frac{N}{(j-1)!\, L^{j-1}} = N - N = 0$, meaning that Theorem 2 provides no lower bound on the % of cracked passwords. At first glance this may appear to be a shortcoming of the theorem. However, we observe that it is impossible to obtain better lower bounds without making assumptions about the password distribution. In particular, let $\mathcal{X}_1$ (resp. $\mathcal{X}_2$) be the uniform distribution over a set of (resp. ) passwords. Observe that $\mathcal{X}_1$ and $\mathcal{X}_2$ can induce dramatically different rational attacker behavior (e.g., if the value of a password is $k$, the adversary will crack % of passwords if the true password distribution is $\mathcal{X}_1$ and % of passwords if the true distribution is $\mathcal{X}_2$). However, if we draw $N = n$ samples from $\mathcal{X}_1$ and $\mathcal{X}_2$, then the frequency lists for the two password distributions will be indistinguishable ($f_1 = f_2 = \ldots = f_N = 1$) by birthday bounds ($N \ll$ ).

Similarly, we may use Theorem 3 to derive model-independent upper bounds on the percentage of Yahoo! passwords cracked by a rational adversary, as shown in Figure 5. As Figure 5 shows, we could potentially use memory hard functions to reduce the % of cracked passwords to ≈ % without increasing authentication time past 1 second. This is particularly impressive when one considers that an attacker needs only a single guess to achieve a success rate of %!

TABLE 6: Model Independent Upper Bound % cracked
V/k
% cracked    100    99    61.38    56.53    52.42
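The birthday-bound step in the Remark can be made explicit with a short calculation (a standard argument, written in our own notation, with $M$ the size of the uniform support). For $N$ samples drawn uniformly from $M$ passwords, the expected number of colliding pairs, combined with Markov's inequality, gives:

```latex
\mathbb{E}\bigl[\#\{(u, w) : u < w,\ pwd_u = pwd_w\}\bigr]
  = \binom{N}{2} \cdot \frac{1}{M} \le \frac{N^2}{2M},
\qquad
\Pr[\text{some collision}] \le \frac{N^2}{2M}.
```

When $N \ll \sqrt{M}$ this probability is negligible. With high probability the observed frequency list is then $f_1 = \cdots = f_N = 1$ for any such $M$, so two uniform distributions with very different support sizes cannot be told apart from the sample.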

Fig. 5: Memory Hard Functions: % cracked (vs. $\log(\tau)$) by value-$v_{\$}$ adversaries, for two values of $v_{\$}$, against an ideal MHF with running time parameter $\tau$; the value of $\tau$ corresponding to 1 second is marked.

7. Related Work

The issue of offline password cracking has been known for decades [40]. Password cracking tools have improved steadily as researchers have explored probabilistic password models [41], Probabilistic Context Free Grammars for passwords [42], [43], [44], Markov chain models [45], [46], [47], [48] and even neural networks [49]. Attackers may also use public resources (e.g., quotes from the Internet Movie Database or Project Gutenberg to crack sentence-based passwords [50], [51]) as well as "training data" from previous breaches at companies like RockYou or Tianya to improve cracking algorithms. Improved password cracking tools make it all the more crucial to develop secure tools for key-stretching (e.g., data-independent MHFs) to minimize the number of guesses an attacker can try. Allodi has studied the economics of the black market for certain attacks and malware, which may be useful in understanding how password cracking markets work [52].

Efforts to encourage (or force) users to select stronger passwords have shown limited success [53], [54], [55], [56], [57], [58] and often induce high usability costs [59]. Users can be encouraged to select stronger passwords by providing feedback during the password creation process (e.g., [60], [61], [62]) or by providing clear instructions for the user to follow when creating passwords [50], [63]. Another extensive line of research explored the use of password composition policies, in which a user is required to select a password satisfying certain requirements (e.g., it contains numbers and/or capital letters) [53], [54], [55], [56], [58], [64]. Password composition policies also introduce a high usability cost [57], [65], [66], [59], and they typically do not increase password strength significantly. In fact, these policies sometimes result in weaker user passwords [67], [54]. Similarly, password strength meters often provide inconsistent feedback [61], [62] and often fail to persuade users to select strong passwords.

Another line of research has focused on helping users to generate and remember passwords. One prominent suggestion is to turn a phrase or a sentence into a password. It has been claimed that these passwords are as strong as random ones [50], [51], and this approach has been promoted by NIST and by security experts such as Bruce Schneier [68]. However, subsequent research indicates that these suggestions are less secure than previously believed [69], [70]. Another line of research seeks to develop and promote secure and usable strategies for password management when the user needs to create and remember multiple passwords [71], [72], [73], [74]. However, all of these schemes require a motivated user. Bonneau and Schechter [29] and Blocki et al. [63] showed that users are capable of memorizing higher entropy secrets (e.g., 56 bits) by following spaced repetition schedules.
If an organization has multiple authentication servers, then it could distribute storage and/or computation of the password hashes across multiple servers [75], [22], [23], [24]. Juels and Rivest [76] proposed storing the hashes of fake passwords (honeywords) and using a second auxiliary server to detect authentication attempts with honeywords (alerting the organization that a breach has occurred). The expense of purchasing and maintaining extra servers may prevent widespread adoption of these proposals. Even if these defenses were adopted, there is still a clear need for secure key-stretching mechanisms: an adversary who breaches both servers can still mount an offline attack. Another line of research has sought to include the solution(s) to hard artificial intelligence problems in the password hash so that an offline attacker needs human assistance to verify each password guess [77], [78], [79]. These solutions increase user workload during authentication, e.g., by requiring the user to solve a CAPTCHA puzzle [77], [79].

Malone and Maher initially explored the feasibility of modeling the distribution of user password choices using Zipf’s law [80]. Wang et al. [32] and Wang and Wang [11] continued this line of work by providing improved techniques to fit Zipf’s law parameters to a dataset. Bonneau [12] took a different approach: collecting and analyzing a massive password frequency corpus with permission from Yahoo! The Yahoo! dataset was recently released using a differentially private algorithm [13]. We elaborate on Zipf’s law and the Yahoo! frequency corpus at length in the body of the paper.
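To give a feel for what fitting Zipf’s law parameters involves, here is a minimal Python sketch of an ordinary least-squares fit in log-log space. It assumes the CDF-Zipf form $F(i) \approx y \cdot i^{r}$, where $F(i)$ is the fraction of accounts that use one of the $i$ most frequent passwords. The function names `empirical_cdf` and `fit_cdf_zipf` are hypothetical, and the refined procedures of [32], [11] (for example, deciding which low-frequency passwords to exclude) are not reproduced here.

```python
import math
from typing import List, Optional, Sequence, Tuple


def empirical_cdf(freqs: Sequence[int]) -> List[float]:
    """F(i): fraction of accounts that use one of the i most frequent passwords."""
    ordered = sorted(freqs, reverse=True)
    total = sum(ordered)
    cdf: List[float] = []
    running = 0
    for f in ordered:
        running += f
        cdf.append(running / total)
    return cdf


def fit_cdf_zipf(cdf: Sequence[float], max_rank: Optional[int] = None) -> Tuple[float, float]:
    """Least-squares fit of log F(i) = log y + r * log i; returns (y, r).

    If max_rank is given, only ranks 1..max_rank are used.
    """
    n = len(cdf) if max_rank is None else min(max_rank, len(cdf))
    if n < 2:
        raise ValueError("need at least two ranks to fit a line")
    xs = [math.log(i) for i in range(1, n + 1)]
    ys = [math.log(cdf[i - 1]) for i in range(1, n + 1)]
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (v - y_bar) for x, v in zip(xs, ys))
    r = sxy / sxx
    y = math.exp(y_bar - r * x_bar)
    return y, r


# Synthetic check: an exact CDF-Zipf curve with y = 0.02, r = 0.2.
y, r = fit_cdf_zipf([0.02 * i ** 0.2 for i in range(1, 1001)])
print(round(y, 6), round(r, 6))  # 0.02 0.2

print(empirical_cdf([3, 1, 6]))  # [0.6, 0.9, 1.0]
```

On real data one would pass `empirical_cdf(freqs)` to `fit_cdf_zipf`, typically with `max_rank` restricting the fit to the better-sampled head of the distribution.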

Key-stretching was proposed as early as 1979 [40] with the goal of protecting lower-entropy secrets like passwords against offline attacks by making it economically infeasible for an offline attacker to try millions or billions of guesses. Traditionally, key stretching has been performed using hash iteration, e.g., PBKDF2 [2] and BCRYPT [1]. However, password hash functions like PBKDF2 and BCRYPT require minimal memory to evaluate, and thus passwords protected by these hash functions are highly vulnerable to attackers with customized hardware [81]. Memory hard functions (MHFs), first explicitly introduced by Percival [9], are a promising tool for constructing an ideal key-stretching function. MHFs are motivated by the observation that the cost of storing/retrieving items from memory is relatively constant across different computer architectures. At a high level, a memory hard function is moderately expensive to compute, and most of the costs associated with computing the function are memory related (e.g., storing/retrieving items from memory). Ideally, we want the Area × Time complexity of computing an MHF to scale with $\tau^2$, where $\tau$ denotes the running time on a standard PC. Intuitively, to compute the MHF once, the attacker must dedicate $\tau$ blocks of memory for $\tau$ time steps, which ensures that the cost of computing the function is equitable across different computer architectures (memory on an ASIC is still expensive). By contrast, the Area × Time complexity of computing BCRYPT or PBKDF2 is simply $\tau$. Recall that we want to increase costs quickly to minimize delay during authentication. If costs scale with $\tau^2$, then we can rapidly drive up costs, and if computation requires memory, then an adversary will not be able to significantly reduce guessing costs by constructing an ASIC. Almost all of the entrants to the recent Password Hashing Competition (PHC) [8] claimed some form of memory-hardness.
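The scaling argument above can be checked with a few lines of arithmetic. This Python sketch uses the idealized costs from the paragraph (Area × Time ≈ $\tau^2$ for an ideal MHF, ≈ $\tau$ for hash iteration), measured in abstract units of memory blocks × time steps. The function names and the attacker budget of $2^{50}$ units are illustrative assumptions, not measured hardware costs.

```python
def area_time_cost(tau: int, memory_hard: bool) -> int:
    """Idealized Area x Time cost of one password-hash evaluation.

    An ideal MHF holds ~tau memory blocks for ~tau steps (tau**2);
    hash iteration (PBKDF2/BCRYPT-style) needs ~constant memory (tau).
    """
    return tau * tau if memory_hard else tau


def guesses_for_budget(budget: int, tau: int, memory_hard: bool) -> int:
    """Number of password guesses an attacker with a fixed AT budget can afford."""
    return budget // area_time_cost(tau, memory_hard)


budget = 2 ** 50  # hypothetical attacker budget in AT units
for log_tau in (16, 18, 20):
    tau = 2 ** log_tau
    print(log_tau,
          guesses_for_budget(budget, tau, memory_hard=False),
          guesses_for_budget(budget, tau, memory_hard=True))
```

Each fourfold increase in $\tau$ divides the hash-iteration attacker's affordable guesses by 4 but the MHF attacker's by 16; at $\tau = 2^{20}$ the budget buys $2^{30}$ guesses against hash iteration and only $2^{10}$ against the ideal MHF.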
There is a type of MHF called a data-independent MHF (iMHF), which is designed to be resistant to side-channel attacks such as cache timing [82], [83]. These functions have a data access pattern that is independent of the input. Multiple attacks have been shown on several iMHFs [30], [84], [31], [85], [86], [87], [88]. Data-dependent MHFs such as SCRYPT [9] have the previously mentioned side-channel vulnerabilities. Even so, SCRYPT has been found to be optimally memory hard with respect to AT complexity [39], [89]. The authors of Argon2 [18], winner of the Password Hashing Competition [8], now recommend running in the hybrid mode Argon2id to balance side-channel resistance and resistance to iMHF attacks.
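For readers who want to experiment, Python’s standard library exposes both a hash-iteration KDF and a data-dependent MHF: `hashlib.pbkdf2_hmac` (PBKDF2) and `hashlib.scrypt` (SCRYPT, available when Python is linked against OpenSSL 1.1 or later). Argon2id is not in the standard library. The sketch below is illustrative only, and its parameter choices are our assumptions rather than recommendations from this paper: 10^5 PBKDF2-SHA256 iterations (the LastPass setting mentioned in the abstract) and scrypt with N = 2^14, r = 8, p = 1 (roughly 16 MiB per evaluation).

```python
import hashlib
import hmac
import os

PBKDF2_ITERATIONS = 100_000                    # 10^5 iterations of PBKDF2-HMAC-SHA256
SCRYPT_N, SCRYPT_R, SCRYPT_P = 2 ** 14, 8, 1   # ~128 * r * N bytes = 16 MiB


def hash_pbkdf2(password: str, salt: bytes) -> bytes:
    # Hash iteration: cost grows with the iteration count, memory stays tiny.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, PBKDF2_ITERATIONS)


def hash_scrypt(password: str, salt: bytes) -> bytes:
    # Data-dependent MHF: each evaluation fills and then rereads ~16 MiB.
    return hashlib.scrypt(password.encode(), salt=salt,
                          n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P, dklen=32)


def verify(password: str, salt: bytes, stored: bytes, kdf) -> bool:
    # Constant-time comparison of the recomputed and stored hashes.
    return hmac.compare_digest(kdf(password, salt), stored)


salt = os.urandom(16)
record = hash_scrypt("correct horse battery staple", salt)
print(verify("correct horse battery staple", salt, record, hash_scrypt))  # True
print(verify("password123", salt, record, hash_scrypt))                   # False
```

The `kdf` argument lets the same `verify` routine work for either function, which makes it easy to time both on the same machine.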

8. Discussion

Our economic analysis decisively shows that traditional key-stretching tools like PBKDF2 and BCRYPT fail to provide adequate protection for user passwords, while memory hard functions do provide meaningful protection against offline attackers. It is time for organizations to upgrade their password hashing algorithms and adopt modern key-stretching such as memory hard functions [9], [8]. Alternatively, could a creative organization adapt customized Bitcoin mining rigs for use in password authentication? For example, the Antminer S9 [81], currently available on Amazon for approximately 3,000 US dollars, is capable of computing SHA256 trillions of times per second. If the organization stored salted and peppered [90], [10] password hash values $(u, s_u, \mathrm{SHA256}(pwd_u \mid s_u \mid p_u))$, then it could potentially use the Antminer S9, or a similar Bitcoin mining rig, to validate a password by quickly enumerating over a (very) large space of secret pepper values $p_u$ (briefly, a pepper is a secret salt value that is not stored, so even an honest party must brute force it).

While our analysis demonstrates that the use of memory hard functions can significantly reduce the fraction of cracked passwords, the damage of an offline attack may still be significant. Thus, we recommend that organizations adopt distributed password hashing [75], [22], [23], [24] whenever feasible, so that an attacker who breaches only one authentication server will not be able to mount an offline attack. Furthermore, we recommend that organizations take additional measures to mitigate the effect of an authentication server breach. Solutions might include mechanisms to detect password breaches through the use of honey accounts or honey passwords [76], multi-factor authentication, and fraud detection/correction algorithms to prevent suspicious/harmful behavior [91].

While solid options for password hashing and key derivation exist [9], [8], [18], [87], the reality is that many organizations and developers select suboptimal password hashing functions [92], [19]. Thus, there is a clear need to provide developers with clear guidance about selecting secure password hash functions. On a positive note, the recent 2017 NIST guidelines do suggest the use of memory hard functions. However, NIST guidelines still allow the use of PBKDF2 with just 10,000 hash iterations. Based on our analysis, we advocate that password hashing standards be updated to require the use of memory hard functions for password hashing and to disallow non-memory-hard functions such as BCRYPT or PBKDF2. It may be expedient for policy makers to audit and/or penalize organizations that fail to follow appropriate standards for password hashing.

We recommend that users primarily focus on selecting passwords that are strong enough to resist targeted online attacks [27], as there is often a vast gap between the entropy required to resist online attacks and that required to resist offline attacks [7]. Extra user effort to memorize a high entropy password might be completely wasted if an organization adopts poor password hashing algorithms like SHA1, MD5 [36] or the identity function [92]. This effort would likely be more productively spent on trying to reduce password reuse [72].
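The salted-and-peppered record $(u, s_u, \mathrm{SHA256}(pwd_u \mid s_u \mid p_u))$ proposed in this section can be illustrated with a short Python sketch. It is a toy under stated assumptions. Here `|` is read as byte concatenation. The pepper space is a hypothetical 2^16 values so the example runs quickly, whereas the proposal envisions a far larger space searched by mining hardware. The function names are our own.

```python
import hashlib
import hmac
import os
import secrets
from typing import Tuple

PEPPER_BITS = 16  # hypothetical, deliberately tiny pepper space
SALT_BYTES = 16

Record = Tuple[str, bytes, bytes]  # (u, s_u, SHA256(pwd_u | s_u | p_u))


def _digest(pwd: str, salt: bytes, pepper: int) -> bytes:
    # Fixed-length salt and pepper keep the concatenation unambiguous.
    return hashlib.sha256(pwd.encode() + salt + pepper.to_bytes(4, "big")).digest()


def register(user: str, pwd: str) -> Record:
    salt = os.urandom(SALT_BYTES)
    pepper = secrets.randbelow(2 ** PEPPER_BITS)  # used once, never stored
    return (user, salt, _digest(pwd, salt, pepper))


def verify(record: Record, pwd: str) -> bool:
    _, salt, stored = record
    # The server itself must brute force the pepper: a wrong password costs
    # all 2**PEPPER_BITS hashes, a correct one stops once the pepper is found.
    for pepper in range(2 ** PEPPER_BITS):
        if hmac.compare_digest(_digest(pwd, salt, pepper), stored):
            return True
    return False


rec = register("alice", "hunter2")
print(verify(rec, "hunter2"))  # True
print(verify(rec, "hunter3"))  # False (after 65,536 hash evaluations)
```

The loop makes the cost asymmetry visible: an offline attacker pays the full pepper-space enumeration for every incorrect guess, while the honest server stops early on average for the correct password.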

9. Acknowledgments

We would like to thank the reviewers for their insightful comments. We would also like to thank Ding Wang for sharing code for computing Zipf fittings. The work was supported by the National Science Foundation under NSF Awards

References

[1] N. Provos and D. Mazieres, “Bcrypt algorithm.” USENIX, 1999.
[2] B. Kaliski, “Pkcs

Security and Privacy (SP), 2012 IEEE Symposium on. IEEE, 2012, pp. 553–567.
[6] C. Herley and P. C. van Oorschot, “A research agenda acknowledging the persistence of passwords,” IEEE Security & Privacy, vol. 10, no. 1, pp. 28–36, 2012.
[7] J. Bonneau, C. Herley, P. C. van Oorschot, and F. Stajano, “Passwords and the evolution of imperfect authentication,” Communications of the ACM, vol. 58, no. 7, pp. 78–87, 2015.
[8] J.-P. A. et al., “Password hashing competition,” 2015, https://password-hashing.net/.
[9] C. Percival, “Stronger key derivation via sequential memory-hard functions,” in BSDCan 2009, 2009.
[10] J. Blocki and A. Datta, “CASH: A cost asymmetric secure hash algorithm for optimal password protection,” in IEEE 29th Computer Security Foundations Symposium, 2016, pp. 371–386.
[11] D. Wang and P. Wang, “On the implications of Zipf’s law in passwords,” in Computer Security - ESORICS 2016 - 21st European Symposium on Research in Computer Security, 2016, pp. 111–131.
[12] J. Bonneau, “The science of guessing: analyzing an anonymized corpus of 70 million passwords,” IEEE, 2012, pp. 538–552.
[13] J. Blocki, A. Datta, and J. Bonneau, “Differentially private password frequency lists,” 2016.
[14] W. Yang, N. Li, I. M. Molloy, Y. Park, and S. N. Chari, “Comparing password ranking algorithms on real-world password datasets,” 2016, pp. 69–90.
[15] W. Melicher, B. Ur, S. M. Segreti, S. Komanduri, L. Bauer, N. Christin, and L. F. Cranor, “Fast, lean and accurate: Modeling password guessability using neural networks,” in Proceedings of USENIX Security, 2016.
[16] M. Fossi, E. Johnson, D. Turner, T. Mack, J. Blackbird, D. McKinney, M. K. Low, T. Adams, M. P. Laucht, and J. Gough, “Symantec report on the underground economy,” November 2008, retrieved 1/8/2013.
[17] R. B. Miller, “Response time in man-computer conversational transactions,” in Proceedings of the December 9-11, 1968, fall joint computer conference, part I. ACM, 1968, pp. 267–277.
[18] A. Biryukov, D. Dinu, and D. Khovratovich, “Argon2: New generation of memory-hard functions for password hashing and other applications,” in IEEE European Symposium on Security and Privacy, EuroS&P 2016, Saarbrücken, Germany, March 21-24, 2016, 2016, pp. 292–302. [Online]. Available: http://dx.doi.org/10.1109/EuroSP.2016.31
[19] A. Naiakshina, A. Danilova, C. Tiefenau, M. Herzog, S. Dechand, and M. Smith, “Why do developers get password storage wrong,” in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ’17, 2017, p. (to appear).
[20] P. A. Grassi, E. M. Newton, R. A. Perlner, A. R. Regenscheid, W. E. Burr, J. P. Richer, N. B. Lefkovitz, J. M. Danker, Y.-Y. Choong, K. Greene et al., “Digital identity guidelines: Authentication and lifecycle management,” Special Publication (NIST SP)-800-63B, 2017.
[21] M. S. Turan, E. B. Barker, W. E. Burr, and L. Chen, “Sp 800-132. recommendation for password-based key derivation: Part 1: Storage applications,” 2010.
[22] J. Camenisch, A. Lysyanskaya, and G. Neven, “Practical yet universally composable two-server password-authenticated secret sharing,” in Proceedings of the 2012 ACM conference on Computer and Communications Security. ACM, 2012, pp. 525–536.
[23] A. Everspaugh, R. Chaterjee, S. Scott, A. Juels, and T. Ristenpart, “The pythia prf service,” 2015, pp. 547–562.
[24] R. W. F. Lai, C. Egger, D. Schröder, and S. S. M. Chow, “Phoenix: Rebirth of a cryptographic password-hardening service,” in

Economics of information security and privacy, pp. 33–53, 2010.
[27] D. Wang, Z. Zhang, P. Wang, J. Yan, and X. Huang, “Targeted online password guessing: An underestimated threat,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2016, pp. 1242–1254.
[28] Y. Zhang, F. Monrose, and M. K. Reiter, “The security of modern password expiration: an algorithmic framework and empirical analysis,” in ACM CCS 10: 17th Conference on Computer and Communications Security, E. Al-Shaer, A. D. Keromytis, and V. Shmatikov, Eds. Chicago, Illinois, USA: ACM Press, Oct. 4–8, 2010, pp. 176–186.
[29] J. Bonneau and S. E. Schechter, “Towards reliable storage of 56-bit secrets in human memory,” in Proceedings of the 23rd USENIX Security Symposium, 2014, pp. 607–623.
[30] A. Biryukov and D. Khovratovich, “Tradeoff cryptanalysis of memory-hard functions,” in International Conference on the Theory and Application of Cryptology and Information Security. Springer, 2014, pp. 633–657.
[31] J. Alwen and J. Blocki, “Efficiently computing data-independent memory-hard functions,” in Advances in Cryptology CRYPTO’16. Springer, 2016.
[32] X. H. Ding Wang, Gaopeng Jian and P. Wang, “Zipf’s law in passwords,” Cryptology ePrint Archive, Report 2014/631, 2014, http://eprint.iacr.org/2014/631.
[33] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” in KDD

Proceedings of the 23rd USENIX Security Symposium, August 2014.
[39] J. Alwen, B. Chen, C. Kamath, V. Kolmogorov, K. Pietrzak, and S. Tessaro, “On the complexity of scrypt and proofs of space in the parallel random oracle model,” Cryptology ePrint Archive, Report 2016/100, 2016, http://eprint.iacr.org/.
[40] R. Morris and K. Thompson, “Password security: A case history,” Communications of the ACM, vol. 22, no. 11, pp. 594–597, 1979. [Online]. Available: http://dl.acm.org/citation.cfm?id=359172
[41] A. Narayanan and V. Shmatikov, “Fast dictionary attacks on passwords using time-space tradeoff,” in Proceedings of ACM CCS, 2005, Conference Proceedings, pp. 364–372. [Online]. Available: http://dl.acm.org/citation.cfm?id=1102168
[42] M. Weir, S. Aggarwal, B. de Medeiros, and B. Glodek, “Password cracking using probabilistic context-free grammars,” in IEEE Symposium on Security and Privacy, 2009, Conference Proceedings, pp. 391–405. [Online]. Available: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5207658
[43] P. G. Kelley, S. Komanduri, M. L. Mazurek, R. Shay, T. Vidas, L. Bauer, N. Christin, L. F. Cranor, and J. Lopez, “Guess again (and again and again): Measuring password strength by simulating password-cracking algorithms,” in IEEE Symposium on Security and Privacy, 2012, Conference Proceedings, pp. 523–537.
[44] R. Veras, C. Collins, and J. Thorpe, “On the semantic patterns of passwords and their security impact,” in Network and Distributed System Security Symposium (NDSS’14), 2014.
[45] C. Castelluccia, M. Dürmuth, and D. Perito, “Adaptive password-strength meters from Markov models,” in Proceedings of NDSS, 2012, Conference Proceedings.
[46] C. Castelluccia, A. Chaabane, M. Dürmuth, and D. Perito, “When privacy meets security: Leveraging personal information for password cracking,” arXiv preprint arXiv:1304.6584, 2013.
[47] J. Ma, W. Yang, M. Luo, and N. Li, “A study of probabilistic password models,” in Proceedings of the 2014 IEEE Symposium on Security and Privacy, 2014, pp. 689–704.
[48] B. Ur, S. M. Segreti, L. Bauer, N. Christin, L. F. Cranor, S. Komanduri, D. Kurilova, M. L. Mazurek, W. Melicher, and R. Shay, “Measuring real-world accuracies and biases in modeling password guessability,” in Proceedings of the 24th USENIX Security Symposium ∼lbauer/papers/2015/usenix2015-guessing.pdf
[49] W. Melicher, B. Ur, S. M. Segreti, S. Komanduri, L. Bauer, N. Christin, and L. F. Cranor, “Fast, lean, and accurate: Modeling password guessability using neural networks,” in USENIX Security Symposium, 2016, pp. 175–191.
[50] J. Yan, A. Blackwell, R. Anderson, and A. Grant, “The memorability and security of passwords: some empirical results,” Technical Report-University Of Cambridge Computer Laboratory, p. 1, 2000.
[51] J. Yan, A. Blackwell, R. Anderson, and A. Grant, “Password memorability and security: Empirical results,” IEEE Security and Privacy, vol. 2, no. 5, pp. 25–31, Sep. 2004.
[52] L. Allodi, “Economic factors of vulnerability trade and exploitation: Empirical evidence from a prominent Russian cybercrime market,” ACM CCS ’17.
[53] J. Campbell, W. Ma, and D. Kleeman, “Impact of restrictive composition policy on user password choices,” Behaviour & Information Technology, vol. 30, no. 3, pp. 379–388, 2011.
[54] S. Komanduri, R. Shay, P. G. Kelley, M. L. Mazurek, L. Bauer, N. Christin, L. F. Cranor, and S. Egelman, “Of passwords and people: measuring the effect of password-composition policies,” in CHI, 2011, Conference Proceedings, pp. 2595–2604. [Online]. Available: http://dl.acm.org/citation.cfm?id=1979321
[55] R. Shay, S. Komanduri, P. G. Kelley, P. G. Leon, M. L. Mazurek, L. Bauer, N. Christin, and L. F. Cranor, “Encountering stronger password requirements: user attitudes and behaviors,” in Proceedings of the Sixth Symposium on Usable Privacy and Security, ser. SOUPS ’10. New York, NY, USA: ACM, 2010, pp. 2:1–2:20. [Online]. Available: http://doi.acm.org/10.1145/1837110.1837113
[56] J. M. Stanton, K. R. Stam, P. Mastrangelo, and J. Jolton, “Analysis of end user security behaviors,” Comput. Secur., vol. 24, no. 2, pp. 124–133, Mar. 2005.
[57] P. G. Inglesant and M. A. Sasse, “The true cost of unusable password policies: Password use in the wild,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ser. CHI ’10. New York, NY, USA: ACM, 2010, pp. 383–392. [Online]. Available: http://doi.acm.org/10.1145/1753326.1753384
[58] R. Shay, S. Komanduri, A. L. Durity, P. S. Huh, M. L. Mazurek, S. M. Segreti, B. Ur, L. Bauer, N. Christin, and L. F. Cranor, “Can long passwords be secure and usable?” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ser. CHI ’14. New York, NY, USA: ACM, 2014, pp. 2927–2936. [Online]. Available: http://doi.acm.org/10.1145/2556288.2557377
[59] A. Adams and M. A. Sasse, “Users are not the enemy,” Communications of the ACM, vol. 42, no. 12, pp. 40–46, 1999.
[60] S. Komanduri, R. Shay, L. F. Cranor, C. Herley, and S. Schechter, “Telepathwords: Preventing weak passwords by reading users’ minds,” in

Proceedings of USENIX Security Symposium, 2012, Conference Proceedings.
[62] X. de Carné de Carnavalet and M. Mannan, “From very weak to very strong: Analyzing password-strength meters,” in Network and Distributed System Security Symposium (NDSS 2014). Internet Society, 2014.
[63] J. Blocki, S. Komanduri, L. F. Cranor, and A. Datta, “Spaced repetition and mnemonics enable recall of multiple strong passwords,” 2015.
[64] R. Shay, S. Komanduri, A. L. Durity, P. S. Huh, M. L. Mazurek, S. M. Segreti, B. Ur, L. Bauer, N. Christin, and L. F. Cranor, “Designing password policies for strength and usability,” ACM Trans. Inf. Syst. Secur., vol. 18, no. 4, p. 13, 2016.
[65] M. Steves, D. Chisnell, A. Sasse, K. Krol, M. Theofanos, and H. Wald, “Report: Authentication diary study,” National Institute of Standards and Technology (NIST), Tech. Rep. NISTIR 7983, 2014.
[66] D. Florêncio, C. Herley, and P. C. Van Oorschot, “An administrator’s guide to Internet password research,” in Proceedings of the 28th USENIX Conference on Large Installation System Administration, ser. LISA’14, 2014, pp. 35–52.
[67] J. Blocki, S. Komanduri, A. Procaccia, and O. Sheffet, “Optimizing password composition policies,” in Proceedings of the fourteenth ACM conference on Electronic commerce

Proceedings of the second symposium on Usable privacy and security. ACM, 2006, pp. 67–78.
[70] W. Yang, N. Li, O. Chowdhury, A. Xiong, and R. W. Proctor, “An empirical study of mnemonic sentence-based password generation strategies,” 2016, p. to appear.
[71] J. Blocki, M. Blum, and A. Datta, “Naturally rehearsing passwords,” in Advances in Cryptology-ASIACRYPT 2013. Springer, 2013, pp. 361–380.
[72] D. Florêncio, C. Herley, and P. C. van Oorschot, “Password portfolios and the finite-effort user: Sustainably managing large numbers of accounts.” in USENIX Security, 2014, pp. 575–590.
[73] M. Blum and S. S. Vempala, “Publishable humanly usable secure password creation schemas,” in Third AAAI Conference on Human Computation and Crowdsourcing, 2015.
[74] J. Blocki, M. Blum, A. Datta, and S. Vempala, “Toward human computable passwords,” Innovations in Theoretical Computer Science, ITCS 2017, 2017.
[75] J. G. Brainard, A. Juels, B. Kaliski, and M. Szydlo, “A new two-server approach for authentication with short secrets.” in USENIX Security, vol. 3, 2003, pp. 201–214.
[76] A. Juels and R. L. Rivest, “Honeywords: Making password-cracking detectable,” in Proceedings of the 2012 ACM conference on Computer and communications security. ACM, 2013.
[77] R. Canetti, S. Halevi, and M. Steiner, Mitigating Dictionary Attacks on Password-Protected Local Storage. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006, pp. 160–179. [Online]. Available: http://dx.doi.org/10.1007/11818175_10
[78] J. Blocki, M. Blum, and A. Datta, “Gotcha password hackers!” in Proceedings of the 2013 ACM workshop on Artificial intelligence and security. ACM, 2013, pp. 25–34.
[79] J. Blocki and H.-S. Zhou, Designing Proof of Human-Work Puzzles for Cryptocurrency and Beyond. Berlin, Heidelberg: Springer Berlin Heidelberg, 2016, pp. 517–546. [Online]. Available: http://dx.doi.org/10.1007/978-3-662-53644-5_20
[80] D. Malone and K. Maher, “Investigating the distribution of password choices,” in Proceedings of the 21st international conference on World Wide Web

IACR Cryptology ePrint Archive, vol. 2013, p. 525, 2013.
[84] J. Alwen and V. Serbinenko, “High Parallel Complexity Graphs and Memory-Hard Functions,” in Proceedings of the Eleventh Annual ACM Symposium on Theory of Computing, ser. STOC ’15, 2015, http://eprint.iacr.org/2014/238.
[85] J. Alwen and J. Blocki, “Towards Practical Attacks on Argon2i and Balloon Hashing,” in Proceedings of the 2nd IEEE European Symposium on Security and Privacy (EuroS&P 2017). IEEE, 2017, pp. 142–157, http://eprint.iacr.org/2016/759.
[86] J. Blocki and S. Zhou, “On the depth-robustness and cumulative pebbling cost of Argon2i,” in TCC 2017: 15th Theory of Cryptography Conference, Part I, ser. Lecture Notes in Computer Science, Y. Kalai and L. Reyzin, Eds., vol. 10677. Baltimore, MD, USA: Springer, Heidelberg, Germany, Nov. 12–15, 2017, pp. 445–465.
[87] J. Alwen, J. Blocki, and B. Harsha, “Practical graphs for optimal side-channel resistant memory-hard functions,” in ACM CCS 17: 24th Conference on Computer and Communications Security, B. M. Thuraisingham, D. Evans, T. Malkin, and D. Xu, Eds. Dallas, TX, USA: ACM Press, Oct. 31 – Nov. 2, 2017, pp. 1001–1017.
[88] J. Alwen, J. Blocki, and K. Pietrzak, “Depth-robust graphs and their cumulative memory complexity,” in Advances in Cryptology – EUROCRYPT 2017, Part II, ser. Lecture Notes in Computer Science, J. Coron and J. B. Nielsen, Eds., vol. 10211. Paris, France: Springer, Heidelberg, Germany, May 8–12, 2017, pp. 3–32.
[89] J. Alwen, B. Chen, K. Pietrzak, L. Reyzin, and S. Tessaro, “Scrypt is Maximally Memory-Hard,” in Advances in Cryptology-EUROCRYPT 2017. Springer, 2017, p. (to appear), http://eprint.iacr.org/2016/989.
[90] U. Manber, “A simple scheme to make passwords based on one-way functions much harder to crack,” Computers & Security, vol. 15, no. 2, pp. 171–176, 1996.
[91] D. Freeman, S. Jain, M. Dürmuth, B. Biggio, and G. Giacinto, “Who are you? A statistical approach to measuring user authenticity,” in ISOC Network and Distributed System Security Symposium – NDSS 2016. San Diego, CA, USA: The Internet Society, Feb. 21–24, 2016.
[92] J. Bonneau and S. Preibusch, “The password thicket: technical and market failures in human authentication on the web,” in Proceedings of the Ninth Workshop on the Economics of Information Security (WEIS), Jun. 2010. [Online]. Available: http://weis2010.econinfosec.org/papers/session3/weis2010_bonneau.pdf

Appendix

Reminder of Theorem 2. If $V/k \ge NL$ and $a = 1$, then a rational adversary will crack at least
$$\sum_{i: f_i \ge j} f_i - \frac{N}{(j-1)!\,L^{j-1}}$$
user passwords, in expectation.

Proof of Theorem 2:

We observe that a user password $pwd_t = pwd_i$ will certainly be cracked if $V \times \Pr[pwd_i] = V p_i \ge k$, since the marginal cost of including an extra guess $pwd$ in the dictionary is at most $k$. Thus, the adversary will compromise at least $\sum_{i: p_i \ge k/V} f_i$ accounts. The problem with this lower bound is that we need to know $p_i = \Pr[pwd_i]$ for each password $pwd_i$ to compute it, and the values $p_i$ are unknown if we do not make assumptions about the shape of the password distribution. However, we can lower bound this quantity. In particular, we say that the estimate $\hat{p}_i = f_i/N$ is a $(j, L)$-bad overestimate if $p_i < \frac{1}{NL}$ but $f_i \ge j$. Let $B_i$ be an indicator random variable for the event that $\hat{p}_i$ is $(j, L)$-bad. Then the sum $\sum_i f_i \times B_i$ computes the total number of users whose password got a $(j, L)$-bad overestimate. The proof now follows from Claims 4, 5 and 6. Claim 4 lower bounds the fraction of cracked passwords in terms of the events $B_i$.

Claim 4. If $V/k \ge NL$, then the number of user passwords in our dataset that a rational adversary cracks is at least
$$\sum_{i: f_i \ge j} f_i - \sum_i f_i \times B_i .$$

Proof: Suppose that a user selects a password $pwd_i$ with $p_i \ge \frac{1}{NL}$. Since $V p_i > k \ge \max_t MC(t)$, the marginal reward of guessing $pwd_i$ always exceeds the marginal cost. Thus, a rational attacker must eventually guess $pwd_i$. However, if $p_i < \frac{1}{NL}$, then either $f_i < j$ or $\hat{p}_i$ is a $(j, L)$-bad overestimate of $p_i$. Let $S$ denote the set of users who picked a password $i$ such that $f_i \ge j$, and let $T \subseteq S$ denote the set of users whose password got a $(j, L)$-bad overestimate. Any user in the set $S \setminus T$ will be compromised eventually. Thus, at least $|S \setminus T| = \sum_{i: f_i \ge j} f_i - \sum_i f_i \times B_i$ users are compromised, since $|T| = \sum_i f_i \times B_i$ and $|S| = \sum_{i: f_i \ge j} f_i$. □

Claim 5 bounds the probability of the event $B_i$, that is, the probability that we observe password $pwd_i$, with $p_i < \frac{1}{NL}$, at least $j$ times, conditioned on the event that we observe $pwd_i$ at least once.

Claim 5. $\Pr[B_i \mid f_i \ge 1] \le \frac{1}{(j-1)!\,L^{j-1}}$.

Proof:

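We first observe that, conditioned on one occurrence of $pwd_i$, the event $B_i$ requires at least $j-1$ further copies of $pwd_i$ among the other samples, so a union bound over the possible positions of these copies gives
$$\Pr[B_i \mid f_i \ge 1] \le \binom{N}{j-1} p_i^{j-1}.$$
Recall that for an event which is $(j, L)$-bad, we have by definition $p_i < \frac{1}{NL}$. Thus
$$\binom{N}{j-1} p_i^{j-1} = \frac{N!}{(j-1)!\,(N-j+1)!}\, p_i^{j-1} \le \frac{N^{j-1}}{(j-1)!}\, p_i^{j-1} \le \frac{1}{(j-1)!\,L^{j-1}}.$$
□ Finally, Claim 6 shows that we cannot have too many bad overestimates.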
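A quick numerical sanity check of Claim 5 may help. The probability that a given sample of $pwd_i$ is accompanied by at least $j-1$ further copies among the other $N-1$ samples (the event $Y_i$ used in Claim 6) is a binomial tail. The Python sketch below compares that exact tail with the intermediate bound $\binom{N}{j-1}p_i^{j-1}$ and the final bound $1/((j-1)!\,L^{j-1})$. The parameter values ($N = 10{,}000$, $L = 10$, $p_i$ just below $1/(NL)$) are arbitrary assumptions chosen for illustration.

```python
import math


def tail_at_least(n: int, k: int, p: float) -> float:
    """Pr[Binomial(n, p) >= k], via the complement of the (short) lower sum."""
    return 1.0 - sum(math.comb(n, l) * p ** l * (1 - p) ** (n - l) for l in range(k))


def union_bound(N: int, j: int, p: float) -> float:
    """Intermediate bound C(N, j-1) * p^(j-1) from the proof of Claim 5."""
    return math.comb(N, j - 1) * p ** (j - 1)


def claim5_bound(j: int, L: float) -> float:
    """Final bound 1 / ((j-1)! * L^(j-1)) from Claim 5."""
    return 1.0 / (math.factorial(j - 1) * L ** (j - 1))


N, L = 10_000, 10
p = 0.999 / (N * L)  # just below the (j, L)-bad threshold 1/(NL)
for j in range(1, 6):
    exact = tail_at_least(N - 1, j - 1, p)
    print(j, f"{exact:.3e}", f"{union_bound(N, j, p):.3e}", f"{claim5_bound(j, L):.3e}")
    assert exact <= union_bound(N, j, p) <= claim5_bound(j, L)
```

Summing the complement over the first $k$ terms avoids evaluating huge binomial coefficients for large $\ell$; for small $j$ the three columns are close, showing the bound is fairly tight in this regime.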

Claim 6. $\mathbb{E}\left[\sum_i f_i \times B_i\right] \le \frac{N}{(j-1)!\,L^{j-1}}$.

Proof:

Consider drawing $N$ samples $x_1, \ldots, x_N$ from our password distribution. Let $Y_i$ be an indicator random variable for the event that the password $x_i$ was sampled at least $j-1$ additional times even though $\Pr[x_i] < \frac{1}{NL}$. Observe that
$$\sum_i f_i \times B_i = \sum_{i \le N} Y_i .$$
By Claim 5 we have $\Pr[Y_i = 1] \le \frac{1}{(j-1)!\,L^{j-1}}$ for all $i \le N$. Thus,
$$\mathbb{E}\left[\sum_i f_i \times B_i\right] = \mathbb{E}\left[\sum_{i \le N} Y_i\right] \le N \max_i \Pr[Y_i = 1] \le \frac{N}{(j-1)!\,L^{j-1}}.$$
□ □

Reminder of Theorem 3. If $\frac{V}{k} \le NL\left(1 - \frac{1+\epsilon}{N}\sum_{i=1}^{t} f_i\right)$ for fixed $\epsilon$ and $t$, then, except with probability $\exp\left(-\frac{\epsilon^2 N \sum_{i=1}^{t} p_i}{2(1+\epsilon)^2}\right)$, a rational adversary will crack at most $\sum_{i: f_i > j} f_i + \mu(N, L, j)$ user passwords, where
$$\mu(N, L, j) = \sum_{i: f_i \le j} f_i \sum_{\ell=0}^{j-1} \binom{N-1}{\ell} \left(\frac{1}{NL}\right)^{\ell} \left(1 - \frac{1}{NL}\right)^{N-\ell-1}.$$

Proof of Theorem 3:

Given $N$ independent samples $pwd_1, \ldots, pwd_N \leftarrow \mathcal{X}$, we use $Pop_t = \{pwd_1, \ldots, pwd_t\}$ to denote the $t$ most common passwords from these samples, and let $X_i$ be an indicator variable for the event $pwd_i \in Pop_t$.

Claim 7. $\sum_{i=1}^{N} X_i \le \sum_{i=1}^{t} f_i$.

Proof: Let $pwd_1, \ldots, pwd_t, \ldots$ denote the list of observed passwords ordered by observed frequency. Let $i_1 > \ldots > i_t$ be given such that $Pop_t = \{pwd_{i_1}, \ldots, pwd_{i_t}\}$. Now we have
$$\sum_{j=1}^{N} X_j = \sum_{j=1}^{t} f_{i_j} \le \sum_{j=1}^{t} f_j .$$
□

Claim 8. We have
$$\sum_{i=1}^{t} p_i \le \frac{1+\epsilon}{N} \sum_{i=1}^{N} X_i$$
except with probability
$$\Pr\left[\sum_{i=1}^{N} X_i \le \frac{N}{1+\epsilon} \sum_{i=1}^{t} p_i\right] \le \exp\left(-\frac{\epsilon^2 N \sum_{i=1}^{t} p_i}{2(1+\epsilon)^2}\right).$$

Proof: Since $\Pr[X_i = 1] = \sum_{\ell=1}^{t} p_\ell$, we have $\mathbb{E}\left[\sum_{i=1}^{N} X_i\right] = N \sum_{\ell=1}^{t} p_\ell$. Then, applying Chernoff bounds with deviation $\delta = \frac{\epsilon}{1+\epsilon}$ (so that $\frac{N}{1+\epsilon}\sum_{\ell=1}^{t} p_\ell = (1-\delta)\,\mathbb{E}\left[\sum_{i=1}^{N} X_i\right]$),
$$\Pr\left[\sum_{i=1}^{N} X_i \le \frac{N}{1+\epsilon} \sum_{i=1}^{t} p_i\right] \le \exp\left(-\frac{\epsilon^2 N \sum_{i=1}^{t} p_i}{2(1+\epsilon)^2}\right).$$
□

Claim 9. With high probability,
$$MC(t) \ge \left(1 - \frac{1+\epsilon}{N} \sum_{i=1}^{t} f_i\right) k .$$

Proof: By Claims 7 and 8, $\sum_{i=1}^{t} p_i \le \frac{1+\epsilon}{N} \sum_{i=1}^{t} f_i$. The proof follows from the observation that $MC(t) = \left(1 - \sum_{i=1}^{t-1} p_i\right) k$ and $p_t \ge 0$. □

Now, we define $\hat{p}_i = f_i/N$ as a $(j, L)$-bad underestimate if $p_i > \frac{1}{NL}$ but $f_i \le j$. Then define $C_i$ as the indicator variable for the event that $\hat{p}_i$ is a $(j, L)$-bad underestimate and $f_i \ge 1$.

Claim 10. If $\frac{V}{k} \le NL\left(1 - \frac{1+\epsilon}{N}\sum_{i=1}^{t} f_i\right)$, then the number of user passwords in our dataset that a rational adversary cracks is at most
$$\sum_{i: f_i \ge j} f_i + \sum_i f_i \times C_i .$$

Proof: Suppose that a user selects a password $pwd_i$ with $p_i \le \frac{1}{NL}$. Since $V p_i < \left(1 - \frac{1+\epsilon}{N}\sum_{i=1}^{t} f_i\right) k \le MC(t)$, the marginal reward of guessing $pwd_i$ never exceeds the marginal cost. Thus, a rational attacker never chooses to guess $pwd_i$. If $p_i > \frac{1}{NL}$, then either $f_i > j$ or $\hat{p}_i$ is a $(j, L)$-bad underestimate of $p_i$. Let $S$ denote the set of users who picked a password $i$ such that $f_i > j$, and let $T$ denote the set of users whose password got a $(j, L)$-bad underestimate. Only the users in the set $S \cup T$ may be compromised eventually. Thus, at most $|S \cup T| \le \sum_{i: f_i > j} f_i + \sum_i f_i \times C_i$ users are compromised, since $|T| = \sum_{i: f_i \le j} f_i \times C_i$ and $|S| = \sum_{i: f_i > j} f_i$. □

Then the following immediately holds, noting that there can be at most $NL$ passwords which are $(j, L)$-bad underestimates (each has probability greater than $\frac{1}{NL}$):

Corollary 11. If $\frac{V}{k} \le NL\left(1 - \frac{1+\epsilon}{N}\sum_{i=1}^{t} f_i\right)$, then the number of user passwords in our dataset that a rational adversary cracks is at most
$$\sum_{i: f_i \ge j} f_i + \sum_{i}^{i+NL} f_i .$$

Claim 12.
$$\Pr[C_i \mid f_i \ge 1] \le \sum_{\ell=0}^{j-1} \binom{N-1}{\ell} \left(\frac{1}{NL}\right)^{\ell} \left(1 - \frac{1}{NL}\right)^{N-\ell-1}$$

Proof: Recall that for $C_i = 1$, we require $p_i > \frac{1}{NL} \ge \frac{j}{N}$ but $f_i \le j$. Then for $j < N/2$,
$$\Pr[C_i \mid f_i \ge 1] = \sum_{\ell=0}^{j-1} \binom{N-1}{\ell} p_i^{\ell} (1-p_i)^{N-\ell-1} \le \sum_{\ell=0}^{j-1} \binom{N-1}{\ell} \left(\frac{1}{NL}\right)^{\ell} \left(1 - \frac{1}{NL}\right)^{N-\ell-1}.$$
□

Claim 13.
$$\sum_{i: f_i \le j} f_i \times \mathbb{E}[C_i \mid f_i \ge 1] \le \mu(N, L, j).$$

Proof:

Follows immediately from Claim 12 by substituting into $\mathbb{E}[C_i \mid f_i \ge 1]$ in the above sum. □ □

Fig. 6: Yahoo! CDF-Zipf Fittings

(a) $v/k = T(y, r, 1)$ for RockYou, CSDN and Yahoo!, plotted against $\log(\tau)$, with the estimated values of $v_{\$}$ and the $\tau$ values for Dropbox, NIST (min), AshleyMadison, LastPass and $\tau$ = (1 sec) marked.
(b) $v_{\$}$ vs. $\tau$ for $v = k \times T(y, r, 1)$ (RockYou, CSDN, Yahoo!), with the same markers as in (a).
(c) $v_{\$}$ versus $T(y, r, 1)$ when $v = k \times T(y, r, 1)$, at fixed values of $k$ (NIST, AshleyMadison, Dropbox, LastPass; RockYou, CSDN, Yahoo!).

Fig. 7: No Diminishing Returns (a = 1