DeepMnemonic: Password Mnemonic Generation via Deep Attentive Encoder-Decoder Model
Yao Cheng, Chang Xu, Zhen Hai*, and Yingjiu Li
Abstract—Strong passwords are fundamental to the security of password-based user authentication systems. In recent years, much effort has been made to evaluate password strength or to generate strong passwords. Unfortunately, the usability or memorability of strong passwords has been largely neglected. In this paper, we aim to bridge the gap between strong password generation and the usability of strong passwords. We propose to automatically generate textual password mnemonics, i.e., natural language sentences, which are intended to help users better memorize passwords. We introduce
DeepMnemonic, a deep attentive encoder-decoder framework that takes a password as input and automatically generates a mnemonic sentence for the password. We conduct extensive experiments to evaluate DeepMnemonic on real-world data sets. The experimental results demonstrate that DeepMnemonic outperforms a well-known baseline in generating semantically meaningful mnemonic sentences. Moreover, a user study further validates that the mnemonic sentences generated by DeepMnemonic are useful in helping users memorize strong passwords.
Index Terms—Password mnemonic generation, deep neural network, attention mechanism, neural machine translation.
• Yao Cheng is with Huawei International, Singapore. E-mail: [email protected]
• Chang Xu is with Data61, CSIRO, Australia. E-mail: [email protected]
• Zhen Hai (corresponding author) is with the Institute for Infocomm Research, A*STAR, Singapore. E-mail: [email protected]
• Yingjiu Li is with the Department of Computer and Information Science, University of Oregon, Eugene, Oregon, United States. E-mail: [email protected]
Submitted on June 3, 2019; accepted on April 2, 2020.

1 INTRODUCTION

Nowadays, user authentication is the key to ensuring the security of user accounts for most online services, such as social media, e-commerce, and online banking. Although various authentication schemes have emerged in recent years, e.g., pattern-based or biometric-based authentication, password-based authentication remains the prevailing choice in most real-world applications, and its security relies on the difficulty of cracking passwords. Choosing strong passwords thus becomes extremely important and necessary.

In reality, service providers often present password policies to aid users in creating strong passwords. Such policies may require that a password be longer than a pre-defined minimum length, or contain multiple types of characters (letters, numbers, and special characters). Such policies are expected to guide the generation of passwords that are resistant to password attacks [1], but in practice users tend to choose passwords that are easy to memorize [2]. As a result, password policies may not be as effective as expected [3].

Much effort has been made to evaluate the strength of passwords [4] [5] [6] or to generate strong passwords [7] [8] [9]. Various methods, on the other hand, have been presented to crack passwords, assess whether passwords are sufficiently strong, or reduce password guessability [10] [11] [12]. However, no rigorous effort has been made to address the memorability of strong passwords. Strong passwords are usually difficult for users to memorize because their entropy values are beyond many users' memorability [13] [14] [15]. The memorability of strong passwords has become one of the biggest hindrances to their wide adoption in real-world applications.

One possible approach to addressing this problem is to generate passwords that are not only secure but also easy to remember. Different strategies have been applied to generating word-based memorable passwords, such as pronounceable passwords [16] [17], meaningful passwords [18], and passwords concatenating random words [19]. Unfortunately, some of these strategies were demonstrated to be vulnerable to certain attacks [18] [20], and moreover, there is no theoretical guarantee that the generated passwords are indeed strong. In addition, various user-defined rules have been adopted for the generation of expression-based memorable passwords [21] [20] [22]. One limitation of this approach is that the strength of the generated passwords largely depends on the uncertainty of the phrases chosen by users. This approach also tends to suffer from the low quality of the generation rules [22].

Instead of generating strong and memorable passwords, recent effort has been made to assist users in remembering passwords using external tools, such as helper cards [23], hint images [24] [25], and engaging games [26]. In particular, Jeyaraman and Topkara [27] proposed a heuristic method that relies on textual hint headlines to deal with the memorability issue of strong passwords.
Given a password, they proposed to search for an existing natural-language headline that suggests the password in a given corpus (i.e., Reuters Corpus Volume 1). If the search is unsuccessful, they would then use an external semantic lexicon named
WordNet [28] to replace certain words in an existing headline with their synonyms. In this way, they could find a headline or a variant headline and use it as a hint for memorizing a given password. This approach is subject to the following limitations: (i) It can only handle passwords composed of alphabetic characters; manual intervention is needed to tackle digits and special characters, which are non-trivial components of strong passwords in reality. (ii) Meaningful hint headlines are often missed by simply searching the given corpus, while creating variant headlines at the word level using an external lexicon may result in syntactically incorrect or semantically inconsistent headline sentences. (iii) It only works for short passwords of 6 or 7 characters, due to the limited lengths of the headlines in the corpus.

In this work, given strong passwords, we propose to automatically generate mnemonics, i.e., natural language sentences, to bridge the gap between the security and the memorability of the passwords. Specifically, we introduce a new mnemonic generation system named DeepMnemonic, which is capable of generating a semantically meaningful mnemonic sentence for each given textual password. The core of DeepMnemonic is an attentive encoder-decoder neural network. It learns to transform any given user password into a mnemonic sentence by first encoding the password character sequence and then decoding the encoded information into the mnemonic sentence via a specially designed attention strategy. Both the encoder and the decoder are implemented with recurrent neural networks that can capture the contextual dependency among pairwise input passwords and mnemonic sentences. The key insight of DeepMnemonic is inspired by a cognitive psychology theory [29], which states that the human ability to memorize and recall certain information is positively influenced by associating additional semantic content with the information [27]. The natural language mnemonic sentences generated by DeepMnemonic provide such semantically meaningful content for users to remember and recall the given strong passwords.

Different from existing textual password hint systems [27], DeepMnemonic enjoys several unique properties: (i) Feature-free: DeepMnemonic is an end-to-end solution that directly maps a given textual password to its corresponding mnemonic sentence; no manual feature engineering is needed in the learning process. (ii) Long-password-friendly: The attention-based recurrent neural network component in DeepMnemonic is capable of identifying salient information in long passwords for generating semantically meaningful sentences. (iii) Learning-adaptive: The learning of DeepMnemonic can be fitted to different types of password-sentence training pairs so as to process diversified passwords and meet various mnemonic generation requirements.

In summary, we have made the following main contributions in this work:

• We introduce DeepMnemonic, a mnemonic sentence generation system, to help users remember strong passwords. DeepMnemonic utilizes an encoder-decoder neural network model to generate meaningful mnemonic sentences for given strong passwords.

• We quantitatively evaluate the capability of DeepMnemonic in mnemonic sentence generation. Our experimental results show that DeepMnemonic achieves 99.09% MP (Mnemonic Proportion) and 16.47 BLEU-4 (BiLingual Evaluation Understudy), outperforming an n-gram language model baseline (83.62% MP and 5.09 BLEU-4).
• We conduct a user study to qualitatively evaluate the helpfulness of DeepMnemonic. The results demonstrate that DeepMnemonic leads to a 54.47% decrease in the time spent on remembering passwords and a 57.14% decrease in recall error, measured by the edit distance between each given password and its recalled version.

The rest of this paper is organized as follows. Section 2 describes the password mnemonic generation problem and its application scenario. Section 3 introduces DeepMnemonic, a deep attentive encoder-decoder system that can generate mnemonic sentences for given textual passwords. In Section 4 and Section 5, we evaluate DeepMnemonic quantitatively and qualitatively in experiments and a user study, respectively. Section 6 discusses several usability issues of DeepMnemonic. Section 7 summarizes related work, and Section 8 concludes the paper.
2 PROBLEM STATEMENT AND APPLICATION SCENARIO
Given a strong textual password, which often consists of random characters, the aim of this work is to assist common users in remembering the password properly. According to a famous cognitive psychology theory [29], in order to memorize a piece of information, it is helpful to associate some external tools or contextual contents with the information. More specifically, the theory [29] explains that the information to be remembered can be a list of items, for example, the list of characters in a password. The associated semantic content is not limited to a vivid image representing, symbolizing, or suggesting each item of the information. The theory indicates that, in order to better memorize a piece of information, users can construct such semantic content and associate it with the information. Inspired by the theory, we propose DeepMnemonic, an intelligent system that helps users construct the semantic content by suggesting a meaningful mnemonic sentence in which each word corresponds to a character in the password. For example, given a text password
Tcahm,“Wac?” to be remembered, a semantic content can be
The child asked her mother, “Who are children?”, where the first letter of each word in the mnemonic sentence suggests and corresponds to a character in the password.

Technically, the problem of password mnemonic generation is closely related to the machine translation problem in natural language processing; both address a sequence-to-sequence learning problem. Specifically, in machine translation, the goal is to generate a translated sentence in a target language given a sentence in a source language, whereas in our case, we focus on generating a mnemonic sentence for a given password (a sequence of characters). We propose to exploit a neural sequence-to-sequence language model [30] to translate a given password (a character sequence) into a meaningful mnemonic sentence (a word sequence).

Fig. 1: An applicable scenario of DeepMnemonic in the password-based authentication system.

Formally, let $\mathcal{D}$ be a set of $N$ pairs of password and mnemonic sentence $(X_n, Y_n)$ ($n \in \{1, \cdots, N\}$), where $X_n$ denotes a password of $T$ characters, $X_n = x_1, x_2, \cdots, x_T$, in which each character $x_t$ ($t \in \{1, \cdots, T\}$) can be an alphabetic, punctuation, digital, or special character, while $Y_n$ refers to a mnemonic sentence of $T$ dictionary terms, $Y_n = y_1, y_2, \cdots, y_T$, in which each term $y_t$ ($t \in \{1, \cdots, T\}$) can be a word, punctuation mark, number, or special token. In the following, we use words or tokens to refer to dictionary terms. For each pair of password and mnemonic sentence, a one-to-one mapping relationship exists between the characters of the password and the words of the mnemonic sentence, where each character of the password may appear at a certain position of the corresponding mnemonic word. For example, given a text password O,y,slt., a mnemonic sentence can be
Oh, yes, something like that., where each alphabetic character of the password corresponds to the first letter of the respective word in the mnemonic sentence.

We formulate mnemonic sentence generation as a natural language generation problem. Given a (strong) password $X_n$, the problem is to automatically generate a semantically coherent and meaningful mnemonic sentence $Y_n$ of the same length as $X_n$. Our solution, named DeepMnemonic, employs a neural attentive encoder-decoder language model [30] to generate textual mnemonic sentences for passwords. DeepMnemonic builds on a sequence-to-sequence learning framework, which encodes an input password (i.e., a sequence of characters) and automatically generates a mnemonic sentence (i.e., a sequence of words) based on the encoded contextual information.

Figure 1 shows an applicable scenario of DeepMnemonic in a conventional password-based authentication system. When a user requests to create a password for registration (steps 1 and 2), the authentication service asks the password generator to generate a strong password (step 3). Typically, the generated password may not be easy for the user to remember due to its randomness [13]. To improve the memorability of the strong password, the password generator requests DeepMnemonic to produce a mnemonic sentence corresponding to the generated password (step 4). Once the password generator receives the response from DeepMnemonic (step 5), it returns the generated password as well as the corresponding mnemonic sentence to the service (step 6). Finally, the service distributes the password and mnemonic information to the user via a secure channel (step 7).

Notice that DeepMnemonic does not modify the strong passwords produced by the password generator, and hence does not compromise the security of the generated passwords, under the assumption that users keep both passwords and the associated mnemonic sentences secure. The passwords' resilience to existing password attacks, such as brute-force attacks, dictionary attacks, and guessing attacks, relies on strong password generation, which has been well studied independently of our research. DeepMnemonic does not generate strong passwords but focuses on the usability of strong passwords: it generates a corresponding mnemonic sentence for a password so as to overcome a key hindrance to the practical use of strong password generation techniques.
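To make the required password-mnemonic correspondence concrete, the following minimal Python sketch (our illustration, not part of the original system; the regex tokenization is a simplifying assumption) checks whether a candidate sentence is a valid mnemonic for a given password:

```python
import re

def is_valid_mnemonic(password: str, sentence: str) -> bool:
    """Check the one-to-one correspondence defined above: alphabetic
    password characters must match the first letter of the respective
    mnemonic word; other characters must appear verbatim as tokens."""
    tokens = re.findall(r"[A-Za-z']+|[^A-Za-z\s]", sentence)
    if len(tokens) != len(password):
        return False
    for ch, tok in zip(password, tokens):
        if ch.isalpha():
            if tok[0].lower() != ch.lower():
                return False
        elif tok != ch:
            return False
    return True

print(is_valid_mnemonic("O,y,slt.", "Oh, yes, something like that."))  # True
```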
3 METHODOLOGY
DeepMnemonic applies a deep attentive encoder-decoder learning model [30] to learn a sequence-to-sequence mapping from an input password (i.e., a character sequence) to an output mnemonic sentence (i.e., a word sequence). The encoder of DeepMnemonic captures the underlying contextual meaning of a password from its sequence of characters, while the decoder generates a corresponding mnemonic sentence based on the encoded content of the password. Typically, the lengths of the given passwords are variable, and strong passwords may consist of long sequences of random characters. DeepMnemonic may lose focus in this process if it treats all characters of an input password equally. To tackle this issue, DeepMnemonic exploits an attention mechanism [30] to dynamically determine which parts of an input password are more relevant to generating a semantically meaningful mnemonic sentence.

Figure 2 shows the encoder-decoder learning framework of DeepMnemonic. The encoder first takes as input a textual password $X_n$ of length $T$,

$$X_n = (x_1, x_2, \cdots, x_t, \cdots, x_T),$$

where $x_t \in \mathbb{R}^{V_C}$ ($t \in \{1, \cdots, T\}$) denotes a character, and $V_C$ is the size of the input character vocabulary. It derives a hidden context-aware summary or “meaning” of the sequence of characters in the input password via a bidirectional recurrent neural network (BiRNN) [31]. Based on the encoded hidden “meaning” of the password, the decoder of DeepMnemonic utilizes an attention mechanism [30] to generate the corresponding mnemonic sentence $Y_n$ word by word,

$$Y_n = (y_1, y_2, \cdots, y_{t'}, \cdots, y_T),$$

where $y_{t'} \in \mathbb{R}^{V_W}$ denotes a word, and $V_W$ is the size of the output word vocabulary.

Fig. 2: The encoder-decoder learning framework of DeepMnemonic.
Given an input password $X_n$, we employ a BiRNN to derive a hidden semantic representation as the meaning of the password. A BiRNN consists of a forward and a backward process: the forward process reads the password character sequence in its original order, while the backward process reads the sequence in reverse order. The BiRNN can thus capture contextual patterns in both directions (left and right) for each password character, summarizing information not only from the preceding characters but also from the following ones in the sequence.

Formally, using gated recurrent units (GRU) [32], a popular choice for modern RNNs, the forward process of the BiRNN computes the hidden state $\vec{h}_t$ at a given time step $t$ as follows. First, the forward reset gate $\vec{r}_t$ is computed as

$$\vec{r}_t = \sigma\big(\vec{W}^{(r)} E^{(C)} x_t + \vec{U}^{(r)} \vec{h}_{t-1}\big), \quad (1)$$

where $\sigma$ is the sigmoid function, $E^{(C)} \in \mathbb{R}^{m \times V_C}$ is the character embedding matrix for the password characters, $\vec{W}^{(r)} \in \mathbb{R}^{n \times m}$ and $\vec{U}^{(r)} \in \mathbb{R}^{n \times n}$ are trainable parameter matrices, and $m$ and $n$ refer to the dimensionalities of the character embedding and the hidden state vector, respectively.

Similarly, the forward update gate $\vec{z}_t$ is computed as

$$\vec{z}_t = \sigma\big(\vec{W}^{(z)} E^{(C)} x_t + \vec{U}^{(z)} \vec{h}_{t-1}\big), \quad (2)$$

where $\vec{W}^{(z)} \in \mathbb{R}^{n \times m}$ and $\vec{U}^{(z)} \in \mathbb{R}^{n \times n}$ are trainable parameter matrices.

Then, the forward hidden state vector $\vec{h}_t$ is computed as

$$\vec{h}_t = (1 - \vec{z}_t) \circ \vec{h}_{t-1} + \vec{z}_t \circ \tilde{\vec{h}}_t, \quad (3)$$

where $\tilde{\vec{h}}_t = \tanh\big(\vec{W} E^{(C)} x_t + \vec{U} [\vec{r}_t \circ \vec{h}_{t-1}]\big)$, both $\vec{W} \in \mathbb{R}^{n \times m}$ and $\vec{U} \in \mathbb{R}^{n \times n}$ are trainable parameters, and $\circ$ denotes element-wise multiplication.

Following the same steps, the backward hidden state $\overleftarrow{h}_t$ can be computed in reverse for any given time step $t$. Note that the character embedding matrix $E^{(C)}$ is shared by the forward and backward processes of the BiRNN. Next, we concatenate the forward and backward hidden states $\vec{h}_t$ and $\overleftarrow{h}_t$ for each time step $t$, and derive a set of overall hidden states $(h_1, \cdots, h_T)$, where

$$h_t = \begin{bmatrix} \vec{h}_t \\ \overleftarrow{h}_t \end{bmatrix}. \quad (4)$$

Based on the encoded hidden states $(h_1, \cdots, h_T)$ of the input password, the decoder of DeepMnemonic employs an attention mechanism to evaluate the importance of the individual hidden states. It then derives an overall context-aware representation of the hidden states in order to generate each target word of the corresponding mnemonic sentence.

In particular, for each encoded hidden state $h_t$ at time step $t$, the decoder first computes an attentive weight $\alpha_{t',t}$ with regard to the contextual state $s_{t'-1}$ at time step $t'$:

$$\alpha_{t',t} = \frac{\exp(e_{t',t})}{\sum_{\tau=1}^{T} \exp(e_{t',\tau})}, \quad (5)$$

where $e_{t',t} = v^{(a)\top} \tanh\big(W^{(a)} s_{t'-1} + U^{(a)} h_t\big)$ is an alignment model that evaluates the relevance between the input hidden state $h_t$ and the previous output hidden state $s_{t'-1}$. Here, $v^{(a)} \in \mathbb{R}^{n'}$, $W^{(a)} \in \mathbb{R}^{n' \times n}$, and $U^{(a)} \in \mathbb{R}^{n' \times 2n}$ are trainable parameters, and $n'$ refers to the dimensionality of the hidden state vector of the attention layer.

Then, the attentive context vector $c_{t'}$ is computed via a weighted sum,

$$c_{t'} = \sum_{t=1}^{T} \alpha_{t',t} h_t. \quad (6)$$
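The encoder and attention computations above can be illustrated with a small numpy sketch. This is our illustration under stated assumptions: the randomly initialized matrices stand in for trained parameters, and the dimensionalities are toy values rather than the trained configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, n_a, V_C, T = 64, 256, 128, 80, 12     # toy dimensionalities

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Randomly initialized placeholders for the trained parameters.
E_C = 0.01 * rng.normal(size=(m, V_C))       # character embedding matrix
W_r, W_z, W = (0.01 * rng.normal(size=(n, m)) for _ in range(3))
U_r, U_z, U = (0.01 * rng.normal(size=(n, n)) for _ in range(3))

def gru_step(x_t, h_prev):
    """One forward encoder GRU step, Eqs. (1)-(3); x_t is one-hot."""
    e = E_C @ x_t                                # character embedding
    r = sigmoid(W_r @ e + U_r @ h_prev)          # reset gate, Eq. (1)
    z = sigmoid(W_z @ e + U_z @ h_prev)          # update gate, Eq. (2)
    h_tilde = np.tanh(W @ e + U @ (r * h_prev))  # candidate state
    return (1.0 - z) * h_prev + z * h_tilde      # new hidden state, Eq. (3)

def attention(s_prev, H, v_a, W_a, U_a):
    """Attention weights and context vector, Eqs. (5)-(6);
    H is (T, 2n): forward/backward states concatenated as in Eq. (4)."""
    e = np.array([v_a @ np.tanh(W_a @ s_prev + U_a @ h_t) for h_t in H])
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()                         # softmax, Eq. (5)
    return alpha @ H                             # context vector, Eq. (6)

# Tiny demo: encode a random password and attend over stand-in states.
h = np.zeros(n)
for t in range(T):
    h = gru_step(np.eye(V_C)[rng.integers(V_C)], h)
H = rng.normal(size=(T, 2 * n))                  # stand-in encoder states
c = attention(np.zeros(n), H, rng.normal(size=n_a),
              rng.normal(size=(n_a, n)), rng.normal(size=(n_a, 2 * n)))
print(c.shape)                                   # (512,), i.e., 2n
```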
We can then compute the hidden state $s_{t'}$ of the decoder at time $t'$, given the hidden states of the encoder, via a decoder GRU:

$$s_{t'} = (1 - z_{t'}) \circ s_{t'-1} + z_{t'} \circ \tilde{s}_{t'}, \quad (7)$$

where

$$\begin{aligned} \tilde{s}_{t'} &= \tanh\big(W E^{(W)} y_{t'-1} + U [r_{t'} \circ s_{t'-1}] + C c_{t'}\big), \\ z_{t'} &= \sigma\big(W^{(z)} E^{(W)} y_{t'-1} + U^{(z)} s_{t'-1} + C^{(z)} c_{t'}\big), \\ r_{t'} &= \sigma\big(W^{(r)} E^{(W)} y_{t'-1} + U^{(r)} s_{t'-1} + C^{(r)} c_{t'}\big). \end{aligned} \quad (8)$$

In the above formulas, $E^{(W)} \in \mathbb{R}^{m \times V_W}$ is the word embedding matrix for the output mnemonic sentence; $W, W^{(z)}, W^{(r)} \in \mathbb{R}^{n \times m}$, $U, U^{(z)}, U^{(r)} \in \mathbb{R}^{n \times n}$, and $C, C^{(z)}, C^{(r)} \in \mathbb{R}^{n \times 2n}$ are all trainable parameters; and $m$ and $n$ are the dimensionalities of the word embedding and the hidden state vector, respectively. Note that the initial hidden state $s_0$ of the decoder is computed as

$$s_0 = \tanh\big(W^{(s)} \overleftarrow{h}_1\big), \quad (9)$$

where $W^{(s)} \in \mathbb{R}^{n \times n}$ is a trainable parameter matrix.

Next, given the decoder state $s_{t'}$, the attentive context vector $c_{t'}$, and the previously generated word $y_{t'-1}$, we follow [30] and define the probability of generating the target word $y_{t'}$ as

$$p(y_{t'} \mid s_{t'}, c_{t'}, y_{t'-1}) \propto \exp\big(y_{t'}^{\top} W^{(o)} l_{t'}\big), \quad (10)$$

where $l_{t'} = [\max\{\tilde{l}_{t',2k-1}, \tilde{l}_{t',2k}\}]^{\top}_{k=1,\cdots,K}$ and $W^{(o)} \in \mathbb{R}^{V_W \times K}$. The hidden state $\tilde{l}_{t'}$ is computed as

$$\tilde{l}_{t'} = U^{(o)} s_{t'} + V^{(o)} E^{(W)} y_{t'-1} + C^{(o)} c_{t'}, \quad (11)$$

where $U^{(o)} \in \mathbb{R}^{2K \times n}$, $V^{(o)} \in \mathbb{R}^{2K \times m}$, and $C^{(o)} \in \mathbb{R}^{2K \times 2n}$ are trainable parameters.
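As an illustration of the output layer, the following numpy sketch implements the pairwise maxout and word softmax of Eqs. (10)-(11), assuming $\tilde{l}_{t'}$ has 2K units that are max-pooled into K; the toy vocabulary size and random matrices are placeholders, not the trained model.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, K, V_W = 64, 256, 500, 1000   # toy sizes; the real vocabulary is larger

# Randomly initialized placeholders for the trained output-layer parameters.
U_o = 0.01 * rng.normal(size=(2 * K, n))
V_o = 0.01 * rng.normal(size=(2 * K, m))
C_o = 0.01 * rng.normal(size=(2 * K, 2 * n))
W_o = 0.01 * rng.normal(size=(V_W, K))
E_W = 0.01 * rng.normal(size=(m, V_W))            # word embedding matrix

def word_distribution(s_t, c_t, y_prev):
    """Probability over the output vocabulary, Eqs. (10)-(11)."""
    l_tilde = U_o @ s_t + V_o @ (E_W @ y_prev) + C_o @ c_t  # Eq. (11)
    l = np.maximum(l_tilde[0::2], l_tilde[1::2])  # pairwise maxout, 2K -> K
    logits = W_o @ l                              # one score per word
    p = np.exp(logits - logits.max())
    return p / p.sum()                            # softmax form of Eq. (10)

p = word_distribution(np.zeros(n), np.zeros(2 * n), np.eye(V_W)[0])
print(p.shape, round(p.sum(), 6))                 # (1000,) 1.0
```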
4 EXPERIMENTS

We conducted quantitative experiments on a large-scale, publicly available dataset to evaluate DeepMnemonic for mnemonic sentence generation. We also carried out a case study to understand the influence of password lengths and digital/special characters on the utility of DeepMnemonic. In addition, we provide a visualized understanding of the attention mechanism in DeepMnemonic.
The “Webis-Simple-Sentences-17” is a publicly available data source for analyzing expression-based passwords, and the sentences in the dataset are similar to human-chosen mnemonics in terms of syllable distribution [22]. After filtering out the sentences that contain non-English words, we derived a password from each remaining sentence by concatenating the first letter of each word and the remaining special characters (including punctuation marks and numerical digits) in the original order in which they appear in the sentence. For example, a password
O,y,slt. can be derived from the following sentence:
Oh, yes, something like that.
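A minimal sketch of this derivation rule follows; the regex-based tokenization is one possible implementation choice, as the exact tokenizer used to build the dataset is not specified here.

```python
import re

def derive_password(sentence: str) -> str:
    """Derive a password from a sentence by concatenating the first
    character of each word or number with every punctuation mark,
    all in their original order (cf. Case 6 in Table 3, where "24"
    contributes only the digit "2")."""
    tokens = re.findall(r"[A-Za-z']+|\d+|[^\w\s]", sentence)
    return "".join(tok[0] for tok in tokens)

print(derive_password("Oh, yes, something like that."))  # -> O,y,slt.
```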
In this way, we collected 500,000 pairs of passwords and sentences in total, forming a ground-truth dataset for building DeepMnemonic. It is worth noting that, in creating the ground-truth data, it is also possible to derive a password by concatenating the last (or a middle) letter of each word and the remaining special characters in each sentence.

In preprocessing the ground-truth data, all words of each sentence were converted to lowercase, which helps reduce the vocabulary size for language generation. In particular, by following the given passwords, we can easily restore the corresponding words of the generated mnemonic sentences to uppercase wherever applicable. In addition, extra symbols, <s> and </s>, were inserted at the start and the end of each sentence to indicate its boundary. A special unknown symbol, <UNK>, was included in the vocabulary and is used in sentence generation when no appropriate word can be predicted. In the pairwise ground-truth dataset, the minimum and maximum lengths of passwords are 8 and 16, respectively, which is compatible with the password length requirements of many authentication systems.

From the ground-truth dataset, we randomly selected 450,000 pairs as training data and used the remaining 50,000 pairs for testing. Among the training data, 20% of the pairs were randomly held out as validation data for model selection. After preprocessing, we obtained a vocabulary of 109,584 words (tokens) for the mnemonic sentence generation task.

In DeepMnemonic, the hidden layer sizes of both encoder and decoder were set to 256, selected using the validation set. The dropout strategy has previously been shown to be effective in preventing neural network models from overfitting [33]; the dropout rate was set to 0.2 in our experiments.

Note that no extra information, e.g., pre-defined generation rules, is required beyond the training data. Once the training process is done, DeepMnemonic can be used to automatically generate a mnemonic sentence for any given password.
Simply generating the word with the highest predictive probability at each time step may not always result in an overall semantically meaningful sentence in practice. Therefore, a left-to-right beam search strategy [34] was employed to find the most likely mnemonic sentence [35] [30] for each given password.

In particular, the beam search based decoder stores a predetermined number b (the beam width) of partial sentences, where each partial sentence is a prefix of a candidate mnemonic. At each time step, each partial sentence in the beam grows with a possible mnemonic word from the decoder vocabulary. Clearly, this process would greatly increase the number of candidate sentences. To overcome this issue, only the b most likely candidates are maintained in terms of their predicted probabilities. Once the end-of-sentence symbol is appended to a candidate mnemonic, the candidate is included in the set of full mnemonic sentences. In general, the wider the beam width b, the more candidate mnemonics the decoder searches over, and thus the better the results that can be achieved.
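A compact sketch of such a beam search decoder is given below, assuming a step(prefix) function that returns (word, log-probability) pairs from the decoder; scoring hypotheses by summed log-probability is one standard convention and an assumption here.

```python
import heapq

def beam_search(step, bos="<s>", eos="</s>", beam_width=5, max_len=20):
    """Left-to-right beam search: keep the beam_width most likely partial
    sentences, growing each by one candidate word per time step."""
    beams = [(0.0, [bos])]            # (sum of log-probabilities, prefix)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, prefix in beams:
            for word, logp in step(prefix):    # decoder's next-word scores
                candidates.append((score + logp, prefix + [word]))
        beams = []
        for score, prefix in heapq.nlargest(beam_width, candidates):
            if prefix[-1] == eos:              # completed sentence
                finished.append((score, prefix))
            else:
                beams.append((score, prefix))
        if not beams:                          # all surviving beams finished
            break
    pool = finished + beams
    return max(pool)[1] if pool else [bos]     # highest-scoring hypothesis
```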
The n-gram language model is widely known as one of the dominant methods for probabilistic language modeling [36]. As a non-parametric learning method, it primarily utilizes the preceding sequence of n−1 words to estimate a conditional probability for predicting the current word. Various values of n were tested, and the bigram language model (n = 2) achieved decent generation performance; it was therefore chosen as the baseline against which to benchmark the proposed DeepMnemonic.

The bigram language model depends on the preceding word to estimate the conditional probability, and then generates the current word with the highest probability. The bigram model is therefore unable to automatically figure out the relationship between a given password and its mnemonic sentence, i.e., the correspondence between the alphabetic characters of the password and the first letters of the mnemonic words in the sentence. To properly apply the bigram language model to password mnemonic generation, we manually adopted a pre-defined generation rule. Specifically, to generate a mnemonic word, it is required that the word not only achieve the highest conditional probability (given its preceding word), but that its first letter also be identical to the corresponding character in the given password.

Two metrics were used to evaluate the quality of the mnemonic sentences generated by DeepMnemonic and the baseline method. One metric is BLEU (BiLingual Evaluation Understudy), one of the most popular metrics for evaluating the quality of machine translation in natural language processing [37]. Following [30], we used BLEU to evaluate the quality of the generated mnemonic sentences with respect to the ground-truth sentences in the test set, where quality refers to the correspondence between each pair of generated sentence and ground-truth sentence. In other words, the closer the generated mnemonic sentence is to the ground-truth sentence, the more meaningful and consistent it is. BLEU-n is defined as follows (the higher, the better):

$$\text{BLEU-}n = B \cdot \exp\Big(\frac{1}{N'} \sum_{n=1}^{N'} \log p_n\Big),$$

where $N'$ is the maximum length of the n-grams and $N' = 4$ was used in the experiments. $B$ refers to the brevity penalty, computed as

$$B = \begin{cases} 1 & \text{if } c > r, \\ e^{1 - r/c} & \text{if } c \leq r, \end{cases}$$

where $c$ is the size of the generated mnemonic set, and $r$ is the size of the ground-truth sentence set. The modified n-gram precision $p_n$ is computed as

$$p_n = \frac{\sum_{C \in \mathcal{H}} \sum_{\text{n-gram} \in C} \mathrm{Count}_{clip}(\text{n-gram})}{\sum_{C' \in \mathcal{H}} \sum_{\text{n-gram}' \in C'} \mathrm{Count}(\text{n-gram}')},$$

where $\mathcal{H}$ is the set of generated mnemonic sentences, $\mathrm{Count}(\text{n-gram}')$ is the number of n-gram's in a generated mnemonic, and $\mathrm{Count}_{clip}(\text{n-gram})$ is the clipped count of the n-grams of a generated mnemonic with regard to the corresponding ground-truth mnemonic sentence.

The other metric, Mnemonic Proportion (MP), is specifically defined to test the matching proportion between pairwise passwords and mnemonic sentences. Given each pair of password $X_n$ and generated mnemonic sentence $Y_n$, MP calculates the proportion of cases in which each word $y_t$ of mnemonic $Y_n$ does start with the corresponding character $x_t$ of password $X_n$:

$$MP = \frac{\sum_{n=1}^{N} MP_n}{N},$$

where $N$ is the size of the test set, and $MP_n$ is defined as

$$MP_n = \frac{|\{y_t \mid y_t \in Y_n \text{ and } \mathrm{FirstLetter}(y_t) = x_t\}|}{|X_n|},$$

where $|X_n|$ is the length of password $X_n$, and $\mathrm{FirstLetter}(\cdot)$ is a function that returns the first letter of a word.

We now report the results of mnemonic sentence generation via DeepMnemonic and the baseline model in terms of MP and BLEU. DeepMnemonic ran on an Nvidia Tesla P100 GPU with 16 GB of memory. Its training process with the 450,000 training pairs took around 15 hours, and inference on the entire test set of 50,000 passwords took about 5 minutes.
Table 1 shows the MP results of DeepMnemonic and the bigram language model (the higher, the better). As we can see, DeepMnemonic achieves much better results than Bigram across different beam widths.

TABLE 1: The MP results (%) of DeepMnemonic and the bigram language model (Bigram) given beam widths b = 1 and b = 5.

                 b = 1    b = 5
  DeepMnemonic   99.09    —
  Bigram         83.62    98.44

Given beam width b = 1, DeepMnemonic attains an MP value of 99.09%, while Bigram only achieves 83.62%. When the beam width increases to 5, the MP of Bigram increases to 98.44%, but is still lower than that of DeepMnemonic. Surprisingly, increasing the beam width does not lead to a significant gain for DeepMnemonic. This suggests that DeepMnemonic can achieve a high MP value given a small beam width b = 1, and that it is not sensitive to the choice of beam width.

We next report the BLEU-n scores for the generated sentences with regard to the ground-truth mnemonic sentences [37] [35]. Table 2 lists the BLEU-n results with order n ∈ {1, 2, 3, 4} (N' = 4) for DeepMnemonic and Bigram given beam widths b ∈ {1, 5}. Overall, DeepMnemonic outperforms Bigram significantly and consistently in all cases.

TABLE 2: The BLEU scores of DeepMnemonic and the bigram language model (Bigram) given beam widths b = 1 and b = 5.

            DeepMnemonic        Bigram
            b = 1    b = 5    b = 1    b = 5
  BLEU-1    45.17    45.29    28.82    37.22
  BLEU-2    30.26    30.38    14.72    22.07
  BLEU-3    22.33    22.44     8.49    14.29
  BLEU-4    16.47    16.57     5.09     9.48
For a fixed beam width of b = 1 or b = 5, the BLEU-n score of each model decreases as the order n grows from 1 to 4. This is expected, as increasing the order n results in a stricter BLEU evaluation of the generated sentences, which is consistent with the findings in [37]. In other words, not only are the words between each pair of generated and ground-truth sentences required to be identical, but the order of the words in each n-gram also needs to be exactly the same.

DeepMnemonic achieves the best BLEU-1 of 45.29 given beam width b = 5. Roughly, this means that about 45 out of 100 generated mnemonic words (unigrams) are identical matches to the ground truth. In addition, the results again show that DeepMnemonic is robust to the width of the beam search. In contrast, the BLEU-n values (from 1 to 4) of Bigram improve significantly when the beam width increases from b = 1 to b = 5, but the best BLEU-1 of Bigram (37.22, at b = 5) is still lower than that of DeepMnemonic at b = 1.

It is also worth noting that BLEU-n is designed to measure the correspondence between each pair of generated mnemonic sentence and ground-truth sentence. A generated mnemonic sentence may be helpful for memorizing a given password and yet receive a low BLEU-n score because it does not match its ground truth very well. To mitigate this issue, Section 5 presents a user study on the helpfulness of DeepMnemonic, in which participants were invited to memorize and recall assigned passwords using mnemonic sentences generated by DeepMnemonic.

In this section, we conduct a case study to provide a complementary understanding of the generated mnemonic sentences. The case study reveals the influence of password lengths and digital/special characters on mnemonic generation. Moreover, we visualize the effectiveness of the attention mechanism in DeepMnemonic to explain its semantic meaningfulness.

Given a list of 10 randomly selected passwords of different lengths, Table 3 shows the corresponding mnemonic sentences generated by DeepMnemonic and Bigram. Note that the row “Original” shows the ground-truth sentences from our ground-truth dataset.
It can be observed that, for short passwords, all sentences generated by either DeepMnemonic or Bigram match all the password characters (i.e., 100% MP). However, as the password length grows, the unknown token “<UNK>” begins to appear more frequently in the mnemonic sentences generated by Bigram than in those generated by DeepMnemonic. The main reason is that, when generating a sentence, it is sometimes difficult for the Bigram model to find a word that both identically matches the given password character and has a conditional probability greater than zero given the preceding word. For example, Bigram generates the unknown token “<UNK>” from the third position onwards in Case 7: given the first generated word “but”, the model identifies the next word “z2”, which has the highest probability and also starts with the character “z” of the given password; however, when continuing the generation from “z2”, Bigram fails to find any word that starts with the password character “e” and has a positive conditional probability. As a result, Bigram generates an unknown token instead. In contrast, DeepMnemonic does not have this issue and, for each given password, completes the automatic generation of an entire meaningful sentence.

Figure 3 plots the impact of password lengths on MP values at b = 5. We can observe that, for both Bigram and DeepMnemonic, MP values decrease as the passwords become longer: the longer the passwords, the more difficult the task of generating semantically sensible sentences. It is clear that the MP values of DeepMnemonic are always better than those of Bigram at all password lengths, suggesting that DeepMnemonic generates better mnemonic sentences from the given passwords. For example, the mnemonic sentence generated by DeepMnemonic for Case 9 in Table 3 is more sensible and memorable than that generated by Bigram, although both sentences match all characters of the given password.
Fig. 3: Impact of the password length on MP values (%) given b = 5.

In many password generation systems, digital and special characters play an important role in creating strong passwords. DeepMnemonic is able to handle such characters when generating mnemonic sentences for strong passwords. As shown in Case 3 of Table 3, DeepMnemonic generates the digital sequence “1937” for the number “1” in the given password, and concatenates it with the previously generated words to form a meaningful expression, “in february 1937”, for the password subsequence “iF1”. In contrast, Bigram generates the words “it for 1” for the same subsequence, which is less readable and not very helpful for remembering the password. Overall, the mnemonic sentence generated by Bigram is not as meaningful as the sentence generated by DeepMnemonic. In Case 6, DeepMnemonic generates the sensible mnemonic phrase “75 casualties in 24 hours” from the password segment “7ci2h”, and also produces an overall semantically meaningful mnemonic sentence. Although Bigram generates the meaningful phrase “70 countries in 2006 have” from the same password segment, it unfortunately ends up with a grammatically incorrect and semantically inconsistent sentence.

In general, it is challenging to handle special password characters such as “/” and “*” when generating mnemonic sentences. For example, in Case 2, given the slash character “/” between “a” and “e” of the password, DeepMnemonic generates a sensible phrase, “author / editor”, where “/” normally represents “or”, while the Bigram model outputs the strange phrase “a / etc” for the same password segment. In Case 4 of Table 3, for the password segment “*g*”, DeepMnemonic generates a reasonable phrase, “*good*”. Though it is not identical to the original one (“*giddy*”), semantically this phrase can be used to highlight the meaning of “good” in the generated sentence. However, Bigram fails to generate either a short sensible phrase or an entire semantically meaningful sentence.
TABLE 3: Randomly selected passwords and the corresponding mnemonic sentences generated by DeepMnemonic and Bigram (Original means that the mnemonic sentences come from the ground-truth dataset).

Case 1 (Short)
  Password:     Y m s , b t o .
  Original:     you'll miss some , but that's okay .
  DeepMnemonic: you may say , but that's ok .
  Bigram:       you might say , but the other .

Case 2 (Short)
  Password:     S I b a a / e ?
  Original:     should i buy a acoustic / electric ?
  DeepMnemonic: should i be an author / editor ?
  Bigram:       so i bought at a / etc ?

Case 3 (Short)
  Password:     T f i d i F 1 .
  Original:     this finding is diagrammed in figure 1 .
  DeepMnemonic: the festival is due in february 1937 .
  Bigram:       the first i do it for 1 .

Case 4 (Medium)
  Password:     B I j c d t * g * t .
  Original:     but i just can't do the * giddy * thing .
  DeepMnemonic: but i just can't do the * good * thing .
  Bigram:       but i just can't do they * go <UNK> <UNK> <UNK>

Case 5 (Medium)
  Password:     T c a h m , “ W a c ? ”
  Original:     the child asks his mother , “ what are circles ? ”
  DeepMnemonic: the child asked her mother , “ who are children ? ”
  Bigram:       they can also have more , “ we are called ? ”

Case 6 (Medium)
  Password:     D t m t b h s 7 c i 2 h ?
  Original:     does this mean the book has sold 7 copies in 24 hours ?
  DeepMnemonic: does this mean that because he sees 75 casualties in 24 hours ?
  Bigram:       during the more than before he said 70 countries in 2006 have ?

Case 7 (Medium)
  Password:     B Z e p a l i h h t c h .
  Original:     but zeus ever pursued and longed in his heart to catch her .
  DeepMnemonic: but zelotes ever pushed a line in his head to crucify him .
  Bigram:       but z2 <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK>

Case 8 (Medium)
  Password:     T , I w l t r t l o K J .
  Original:     today , i would like to recognize the life of ken jablonski .
  DeepMnemonic: today , i would like to read the letter of karl james .
  Bigram:       this , it was like to read the lord of knowledge <UNK> <UNK>

Case 9 (Long)
  Password:     D t s w i N - D , l a t a p .
  Original:     during the six weekends in new - delhi , lunch and tea are provided .
  DeepMnemonic: during the second week in november - december , lakes awoke to a pond .
  Bigram:       during the same way is no - day , look at the air pollution .

Case 10 (Long)
  Password:     A : Y , b t e n t b a i a l .
  Original:     ac : yes , but the egg needs to be altered in a lab .
  DeepMnemonic: alan : yeah , but that's exactly not the best album in a league .
  Bigram:       a : yes , but the early next to be an important as long .

One more interesting example is the colon symbol “:” in Case 10, which is typically used to indicate the start of an utterance. Surprisingly, DeepMnemonic generates a name, “alan” (Alan), for the character “A” in front of the colon “:”, and then generates an utterance for the subsequence of characters following the colon symbol. Bigram again fails to handle this special character.
Generally, a mnemonic sentence that not only literally covers the characters of the given password but is also semantically meaningful is more useful for helping users memorize every password character properly.

As shown in Case 8 of Table 3, DeepMnemonic generates the mnemonic sentence “today, i would like to read the letter of karl james.” from the password “T,IwltrtloKJ.” Although this is not identical to the ground-truth sentence, “today, i would like to recognize the life of ken jablonski.”, the generated mnemonic sentence seems easier for users to follow when recalling the corresponding password. Specifically, DeepMnemonic decodes the given password segment “KJ” as a name, “karl james”, and surprisingly, this turns out to be a new person name that does not even appear in the training data. This demonstrates the ability of DeepMnemonic to capture the semantic context of input passwords as well as the relationship between each pair of password and mnemonic sentence. It also shows that the attention mechanism in DeepMnemonic captures not only the full view of an input password context but also its salient parts, both of which are useful for mnemonic word inference at each generation step.

Figure 4 visualizes the attention weights in the alignment between each pair of input password (y-axis) and generated mnemonic sentence (x-axis) for Case 5 and Case 6. Each pixel denotes the attention weight $\alpha_{t',t}$ of the t-th password character with regard to the t'-th target mnemonic word.

Fig. 4: The heat maps of the attention weights $\alpha_{t',t}$ in grayscale (0: black, 1: white) for (a) Case 5 and (b) Case 6.

The brighter a pixel, the more important the corresponding password character is to the generation of the corresponding mnemonic word. From the heat maps, we can observe that DeepMnemonic concentrates on the important characters of an input password when it decodes the individual target mnemonic words in the generation phase. In addition, the attention layer also takes into account the contextual neighboring characters of each password character during mnemonic sentence generation. Thanks to the attention mechanism, DeepMnemonic automatically learns the alignment and discovers which characters in the input password are more important for generating a semantically meaningful mnemonic sentence.
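Such a heat map can be rendered directly from the recorded attention weights. The following matplotlib sketch illustrates the plotting for Case 5; the random matrix is a stand-in for the weights that would be recorded during decoding.

```python
import numpy as np
import matplotlib.pyplot as plt

password = list('Tcahm,"Wac?"')
words = ["the", "child", "asked", "her", "mother", ",", '"',
         "who", "are", "children", "?", '"']
# Placeholder attention matrix; in practice, record alpha[t', t]
# (Eq. 5) for each generated word t' during decoding.
alpha = np.random.default_rng(0).random((len(words), len(password)))
alpha /= alpha.sum(axis=1, keepdims=True)        # rows sum to 1

plt.imshow(alpha.T, cmap="gray", aspect="auto")  # 0: black, 1: white
plt.xticks(range(len(words)), words, rotation=90)
plt.yticks(range(len(password)), password)
plt.xlabel("generated mnemonic words")
plt.ylabel("password characters")
plt.tight_layout()
plt.show()
```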
5 USER STUDY

In this section, we conduct a user study to analyze the usability of DeepMnemonic and to validate that the mnemonic sentences generated by DeepMnemonic are helpful for users to memorize passwords.
Fig. 5: The user study procedure.
In this user study, we recruited 24 participants from universities and institutes and randomly divided them into three groups, labeled group 0, group 1, and group 2. The lengths of the passwords assigned to the three groups were 8, 12, and 16, respectively. The user study includes three main tasks: password memorization, password recall, and an online questionnaire. The password memorization and password recall tasks were conducted face-to-face in our lab, where we gave task instructions and measured the participants' performance using a timer.

Figure 5 illustrates the user study procedure. In Memorization Phase I, without the aid of mnemonic sentences, each participant was asked to memorize an assigned password p₁, where the time the participant used for memorization is t₁. Then, after a period of time, in Recall Phase I, each participant was asked to recall and write down the assigned password p₁; the recalled password is r₁. In Memorization Phase II, with the aid of the mnemonic sentences generated by DeepMnemonic, each participant was asked to memorize a different assigned password p₂, where the time used for memorization is t₂. Later, in Recall Phase II, the recalled version of p₂ is denoted r₂. Finally, an online questionnaire was used to evaluate the participants' experience of the whole user study process. Following the forgetting curve theory [38], we set the time gap between password memorization and password recall to 48 hours.

Figure 6 shows the average results over all participants within each group (i.e., with the same password length) with and without the aid of the mnemonic sentences generated by DeepMnemonic.

Figure 6a shows the time costs of memorizing the passwords without (t₁) and with (t₂) the aid of mnemonic sentences. Note that t₂ includes the time for memorizing the associated mnemonic sentence in addition to the password. We observe that t₂ is lower than t₁, especially for memorizing longer passwords. We conducted a statistical t-test, which works well with our small set of samples that approximate a normal distribution [39] [40]. The difference between the two sets of time costs is statistically significant at significance level α = 0.05. This indicates that, by using the mnemonic sentences generated by DeepMnemonic, participants were able to memorize the passwords more quickly than without the aid of any mnemonics. For group 2, who memorized passwords of length 16, the observed difference is even more pronounced, i.e., 110 seconds versus 42.6 seconds, with p-value = 0.013. Clearly, DeepMnemonic shows its capability of assisting users in memorizing passwords more effectively, and it is especially helpful for memorizing relatively long passwords.

Figure 6b evaluates password recall in terms of the edit distance between each recalled password r and its assigned password p (the smaller, the better). Overall, the average edit distance with the aid of DeepMnemonic is smaller than without it, and the differences become even larger as the password lengths grow. For group 2, where the password length is 16, the average edit distance with the aid of DeepMnemonic is significantly smaller than without it; our statistical test shows that the difference is significant at level α = 0.05.
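For reference, the recall error above is the standard Levenshtein edit distance (minimum number of single-character insertions, deletions, and substitutions), which can be computed with the usual dynamic program; the helper name below is illustrative only.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb))) # substitution
        prev = curr
    return prev[-1]

# A recalled password with one wrong character has distance 1:
print(edit_distance('Tcahm,"Wac?"', 'Tcahm,"Wac!"'))  # -> 1
```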
However, for shorter passwords in group 0, where the password length is 8, the average edit distances with and without using DeepMnemonic are almost the same (0.75). A possible explanation is that, when a given password is short, the effort required to remember the generated mnemonic sentence and that required to remember the password itself are comparable. In such a case, using mnemonic sentences may incur an extra burden for memorizing short passwords.
1. The detailed user study design as well as the ethical consideration can be found at https://goo.gl/KtwuGB
Fig. 6 data: (a) Memorization time (s), without (t₁) / with (t₂) DeepMnemonic: group 0 (8 characters) 10.875 / 9.75; group 1 (12 characters) 55.375 / 27.875; group 2 (16 characters) 110 / 42.625. (b) Edit distance, without (between p₁ and r₁) / with (between p₂ and r₂): group 0 0.75 / 0.75; group 1 0.875 / 0.5; group 2 3.625 / 1. (c) Complete recall ratio, without / with DeepMnemonic.
Fig. 6: The comparison results of different user groups with and without using DeepMnemonic. We compare thememorization time cost (a), edit distance (b), and complete recall ratio (c) of the three groups without and with theaid of DeepMnemonic. For each comparison, the results for different password lengths are shown separately.case, using mnemonic sentences may incur extra burdensfor memorizing short passwords.This can also be discovered from the complete recall ratioin Figure 6c (the higher, the better). It shows that users canrecall 8-character passwords better without the burden ofmemorizing extra mnemonic sentences. However, the help-fulness of mnemonic sentences becomes obvious for users tomemorize 12-character and 16-character passwords. Overallthe mnemonic sentences generated by DeepMnemonic areeffective in improving the performance of password recall,especially for longer/stronger passwords.In addition, we analyzed the results of online question-naire to evaluate the user experience. Figure 7 shows thatonly 8.3% of the participants indicated that they are not perplexed by memorization of passwords, which clearlysuggests that remembering passwords is indeed a difficulttask. Almost 96% of participants agreed that the passwordsassigned to them are secure and strong. About 46% of themconsidered their assigned passwords easy to remember. Wenote that the majority of them come from group 0 , wheretheir assigned passwords are not very long, and are thusnot very difficult to remember.
Questionnaire responses (Strongly disagree / Disagree / Neutral / Agree / Strongly agree):
- You are perplexed by remembering passwords: 0.0% / 8.3% / 37.5% / 41.7% / 12.5%
- The passwords assigned to you during the user study are strong passwords: 0.0% / 4.2% / 0.0% / 75.0% / 20.8%
- The passwords assigned to you during the user study are easy to remember: 4.2% / 37.5% / 12.5% / 33.3% / 12.5%
- The gist of the mnemonic sentence generated by DeepMnemonic is easy to remember exactly: 0.0% / 0.0% / 29.2% / 58.3% / 12.5%
- The wording of the mnemonic sentence generated by DeepMnemonic is easy to remember exactly: 0.0% / 8.3% / 25.0% / 58.3% / 8.3%
- The mnemonics generated by DeepMnemonic are grammatically correct: 0.0% / 8.3% / 20.8% / 45.8% / 25.0%
- The mnemonics generated by DeepMnemonic are semantically meaningful: 0.0% / 4.2% / 25.0% / 54.2% / 16.7%
- DeepMnemonic is generally helpful in remembering the passwords: 0.0% / 0.0% / 20.8% / 62.5% / 16.7%
Fig. 7: Online questionnaire analysis results.

A large portion of the participants agreed that both the gist and the exact wording of each generated mnemonic sentence are easy to remember: almost 71% and 67% of the participants agreed with the statements “gist is easy to remember” and “wording is easy to remember”, respectively. With regard to grammatical correctness, about 8.3% of the participants noticed a few grammar errors in the generated sentences; for example, in Case 6 of Table 3, the simple present tense in the sentence may not be rigorously correct. The majority of the participants (70.9%) agreed that the mnemonic sentences generated by DeepMnemonic are meaningful. Overall, 79.2% of the participants recognized that DeepMnemonic is generally helpful for remembering the given passwords.
6 DISCUSSION
In this section, we discuss several issues related to the usability of DeepMnemonic.
In general, the successful application of DeepMnemonic to automatic mnemonic generation largely depends on the large-scale training data used for building the sequence-to-sequence learning model. Recall that in our experiments, we generated pairwise password and mnemonic sentence training data from Webis-Simple-Sentences-17. For each sentence in the dataset, we followed one strong-password generation rule proposed in [22] and concatenated all the first letters of the words together with the special characters in the sentence so as to create a password. It is worth noting that DeepMnemonic is not limited to this single password generation rule adopted for constructing the pairwise training data. DeepMnemonic can also be trained on datasets constructed using different password generation rules, such as concatenating the last letter of each word instead of the first. Thanks to the sequence-to-sequence learning model, DeepMnemonic learns language generation patterns automatically from the training data and maps passwords to mnemonic sentences at test time. Therefore, it is possible to train multiple encoder-decoder language generation models in DeepMnemonic using different password generation rules. As a result, users have the flexibility to choose among multiple generated mnemonic sentences for a given password, which may further improve the usability of DeepMnemonic.

The n-gram language model used in our comparative evaluation is one of the most well-known language generation models. We tested various values of n (n ∈ [1, 4]). Since the bigram language model achieved the best performance among the different n, it was selected as the baseline.

We found that when using a unigram language model to map a password sequence to a mnemonic sequence, the model cannot leverage the context, i.e., the preceding words. As a result, words that appear more frequently in the training data are considered more important and are assigned higher probabilities for prediction. For example, given the password character "t", the unigram language model would very likely generate the word "the", regardless of the preceding contextual words. In addition, trigram (n = 3) and 4-gram language models built on the pairwise training data often suffer from sparse co-occurrence information and output the unknown token <UNK>: compared to the bigram model, they are more likely to produce an empty list of candidate words from two or more contextual preceding words and a target password character. According to our experimental results, when n is larger than 2, the language models have more difficulty generating complete sentences.
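For clarity, the constrained bigram decoding described in Section 4 can be sketched as follows; the toy bigram table and vocabulary are hypothetical stand-ins for probabilities estimated from the training sentences, and mapping punctuation to itself is a simplification.

```python
def bigram_generate(password, bigram_prob, vocab):
    """Greedy bigram decoding constrained by the password characters.
    bigram_prob maps (previous_word, word) -> P(word | previous_word).
    Emits <UNK> when no matching word has positive probability."""
    prev, sentence = "<s>", []
    for ch in password:
        if ch.isalnum():
            # Candidate words must start with the password character.
            candidates = [w for w in vocab if w and w[0].lower() == ch.lower()]
        else:
            candidates = [ch]      # punctuation maps to itself
        scored = [(bigram_prob.get((prev, w), 0.0), w) for w in candidates]
        prob, word = max(scored, default=(0.0, "<UNK>"))
        if prob == 0.0:
            word = "<UNK>"
        sentence.append(word)
        prev = word
    return " ".join(sentence)

vocab = ["the", "child", "asked", "her", "mother"]
probs = {("<s>", "the"): 0.5, ("the", "child"): 0.4, ("child", "asked"): 0.3,
         ("asked", "her"): 0.2, ("her", "mother"): 0.2}
print(bigram_generate("Tcahm", probs, vocab))  # -> the child asked her mother
```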
BLEU is one of the most popular automated metrics for evaluating the quality of machine translation [37] in natural language processing. The quality is often measured as the degree of matching between each pair of generated sentence and ground-truth sentence: the closer the generated sentence is to the ground truth, the better the language generation. Our quantitative experimental results show that DeepMnemonic significantly outperforms the well-established bigram language model in terms of BLEU. We note that BLEU is a good metric for evaluating machine translation, where each generated or translated sentence should be as close as possible to the reference translation (ground truth). The state-of-the-art BLEU score is 34.8 for machine translation using deep neural networks [35]. However, BLEU may not be a fair metric for evaluating password mnemonic generation, which aims to assist users in password memorization. The generated mnemonic sentences sometimes do not match the reference sentences (ground truth) perfectly, which may lead to poor BLEU scores. Nevertheless, such mnemonic sentences may still be helpful for memorizing passwords, as long as they are grammatically correct and semantically meaningful by themselves, as shown in Table 3. The metric MP proposed in Section 4.3.1 is thus designed to complement BLEU for mnemonic generation evaluation.
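For completeness, sentence-level BLEU [37] between a generated mnemonic and its reference can be computed with NLTK. This is a hedged illustration only: the paper does not state which BLEU implementation it uses, its reported scores are corpus-level, and the smoothing choice here is ours.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "I love watching movies .".split()   # ground-truth mnemonic
candidate = "I like watching movies .".split()   # generated mnemonic

# sentence_bleu expects a list of reference token lists; smoothing avoids
# zero scores when some higher-order n-grams have no match.
smooth = SmoothingFunction().method1
score = sentence_bleu([reference], candidate, smoothing_function=smooth)
print(f"BLEU: {score:.3f}")
```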
In our user study, we recruited 24 participants to evaluate the usability of DeepMnemonic. Despite the small scale, the user study shows that DeepMnemonic is helpful in assisting users to better memorize passwords, especially when the passwords are relatively long. A limitation of this user study lies in the age distribution of the participants. The ages of the participants range from 24 to 40, which is not representative of younger or older populations. Indeed, this is also a challenge in many other user studies [41]. Our user study suggests that DeepMnemonic is generally helpful for young and middle-aged people. Evaluating DeepMnemonic with other age groups remains future work.
DeepMnemonic does not handle lowercase and uppercase characters differently when generating mnemonic sentences, and thus cannot help users distinguish between them. We have tried to build a variant of the DeepMnemonic model that generates sentences differentiating lowercase and uppercase characters. However, its performance (e.g., BLEU and MP) is not as good as that of the current DeepMnemonic model. We plan to deal with this case-sensitivity issue in future work. In addition, users may have multiple passwords across various platforms. It would be interesting to extend DeepMnemonic to help users memorize their passwords across different platforms.
Currently, the proposed DeepMnemonic has been shown to be effective in generating mnemonics for passwords in English. Similar to the sequence-to-sequence machine translation task in natural language processing [30], the underlying encoder-decoder model of DeepMnemonic can encode any input password into a semantic vectorial representation and decode that representation into a semantically meaningful mnemonic sentence in the target language. When suitable training datasets in other languages (e.g., Chinese) are available, DeepMnemonic can be easily adapted to mnemonic sentence generation in those languages.
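As a miniature reminder of how such an encoder-decoder attends to its input, consider the following simplified sketch. It is illustrative only: we use dot-product scoring for brevity, whereas the attentive alignment of [30] uses an additive scoring network, and none of the tensor shapes below are taken from the paper.

```python
import torch
import torch.nn.functional as F

def attention_context(decoder_state, encoder_states):
    """Score each encoder state against the current decoder state,
    normalize the scores with softmax, and return the weighted sum
    (the context vector) together with the attention weights."""
    scores = encoder_states @ decoder_state        # (src_len,)
    weights = F.softmax(scores, dim=0)             # attention distribution
    context = weights @ encoder_states             # (hidden,)
    return context, weights

enc = torch.randn(5, 8)   # hidden states for 5 password characters
dec = torch.randn(8)      # current decoder hidden state
context, attn = attention_context(dec, enc)
print(attn.sum().item())  # sums to ~1.0
```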
RELATED WORK
Much effort has been made in the past to assess the strength of passwords or to generate strong passwords. The strength of a password can typically be evaluated by two common metrics, namely, entropy and guessability. Entropy measures how unpredictable a password is by considering the length of the password and the distribution of characters in it. One limitation of the entropy-based measurement [42] is that it only supplies users with rough approximations of password strength [3] [43] [44]. In contrast, guessability-based measurement, defined as the number of guesses required to break a password, has become increasingly popular. To evaluate the guessability of a password, one key step is to identify an algorithm for password cracking, such as the Probabilistic Context-Free Grammar model (PCFG) [10] [11] [6] and the Markov n-gram model [12] [4]. These cracking algorithms exploit the password distributions derived from various password datasets disclosed in previous security incidents. It has been revealed that password datasets collected from different user groups exhibit different distributions [45] [46]. The guessability of a password can then be measured using the cracking algorithm.

Previous studies have also evaluated password strength by measuring the popularity of passwords. Schechter et al. [47] evaluated user-chosen passwords by identifying undesirably popular passwords. Passwords within a certain popularity threshold are considered secure under probabilistic attacks. It is argued that existing password creation policies could be replaced with popularity limits so as to strengthen both security and usability in user authentication systems.

Strong password generation aims to strengthen passwords by changing or adding characters to them. Generation of persuasive text passwords is one such approach, which inserts one to four characters at random positions in a given password [7] [8]. Recently, Houshmand and Aggarwal [9] presented an analyze-modify method to generate strong passwords. They first evaluated password strength using PCFG, and then modified identified weak passwords based on a set of editing rules. One limitation of strong password generation is that it does not take into consideration the usability or memorability of passwords. As a consequence, the generated passwords may not be user-friendly in practice.
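Returning to the entropy-based measurement above, the following is a rough, hedged illustration of the idea: a simplified pool-size approximation, not the NIST procedure of [42] or the formula of any specific strength meter.

```python
import math
import string

def pool_entropy_bits(password):
    """Coarse strength estimate: assume each character is drawn uniformly
    from the union of the character classes present in the password."""
    pool = 0
    if any(c.islower() for c in password):
        pool += 26
    if any(c.isupper() for c in password):
        pool += 26
    if any(c.isdigit() for c in password):
        pool += 10
    if any(c in string.punctuation for c in password):
        pool += len(string.punctuation)
    return len(password) * math.log2(pool) if pool else 0.0

print(round(pool_entropy_bits("Ilwm."), 1))   # 5 chars over an 84-symbol pool
```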
Service providers have employed a variety of strategies to create easy-to-remember passwords. One strategy is to generate pronounceable passwords [16] [18]. For example, the pronounceable password “kilakefe52” is comprised of a random sequence of vowel-consonant pairs [16]. Another strategy is to create memorable passwords that comprise multiple components with certain meanings [17]. For example, the password “Ilwm.” can be generated by joining the first character of each word and the punctuation in the following sentence: “I love watching movies.” The expression-based passwords generated by this approach are often supposed to be stronger than those selected intuitively by users [13]; given reference sentences or phrases, the usability cost of memorizing the generated passwords is almost comparable to that of memorizing the intuitive ones. Yang et al. [20] demonstrated that the security level of an expression-based password is largely affected by its generation rules, which was also validated by Kiesel et al. [22]. However, it is not clear how various generation rules at different security levels affect the usability costs of memorizing the generated passwords. Due to users’ behavioral tendency to pick easy-to-remember reference sentences [7], memorable passwords generated from such human-selected language expressions may be easy for attackers to guess.

Researchers have studied various tips for creating mnemonic passwords, including sentence substitution (SenSub), keyboard change (KbCg), using a formula (UsForm), and special character insertion (SpIns) [49]. Alphapwd [50] is a memorable password generation strategy based on password shapes. In order to create memorable passwords, Alphapwd requires a user to remember a shorter sequence of letters that are shown in larger size on top of a normal keyboard; the user may then easily derive a strong password from the keyboard by following the strokes of this shorter sequence of large-size letters. The strength of these generated passwords is evaluated using probabilistic attack algorithms, such as the PCFG algorithm and Markov model, trained on previously disclosed password datasets. Ghazvininejad and Knight [51] proposed to generate passwords in the form of English word sequences, called passphrases, whose lengths range from 31.2 to 87.7 characters on average. However, experiments revealed that it is difficult for common users to reproduce the exact wording of passphrases, even if they manage to remember the gist of these sentences [52]. Complementary to these memorable password generation methods, DeepMnemonic is designed to generate mnemonic sentences for any given password instead of generating new passwords.
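The “Ilwm.” example above follows a simple first-letter rule, the same style of rule used to construct our training pairs. Below is a minimal sketch of one plausible reading of that rule; it is our simplification (only word-initial letters and attached punctuation are kept), not the exact procedure of [22].

```python
import string

def sentence_to_password(sentence):
    """First-letter rule: take the first letter of each word and carry
    over any punctuation attached to the word."""
    chars = []
    for token in sentence.split():
        if token[0].isalpha():
            chars.append(token[0])
        chars.extend(c for c in token if c in string.punctuation)
    return "".join(chars)

print(sentence_to_password("I love watching movies."))   # -> "Ilwm."
```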
To improve the memorability of given strong passwords, various types of hints or mnemonic tools have been used. Atallah et al. [53] were the first to propose the idea of using funny jingles for memorizing a randomly generated password. One challenge of this proposal is the generation of related funny jingles. Several other efforts have been made to assist users in remembering passwords via external tools, such as helper cards [23], hint images [24] [25], and engaging games [26]. Jeyaraman and Topkara [27] proposed to match given passwords to textual headlines, or their variant versions, selected from a given corpus in order to assist users in remembering passwords. One drawback of this approach is that the generated variant headlines may be syntactically incorrect or semantically inconsistent. Moreover, due to the limited length of headline text, it can only generate hints for short passwords and largely fails for long and strong passwords. The memorability of the passwords generated by this approach has not been evaluated.
In natural language processing, statistical language models are used to generate meaningful sentences by computing joint probabilities of sequences of words from the dictionary of a given language. One of the dominant methods for probabilistic language modeling is the n-gram language model [36], a non-parametric learning algorithm. It relies on the preceding sequence of n − 1 words to estimate the conditional probability for predicting (generating) the current, n-th word.

Recently, neural network based language models have become increasingly popular in natural language generation tasks. Bengio et al. [54] proposed a generic neural probabilistic language model, which simultaneously learns the distributed representation of each word and the joint probability function of word sequences. Sutskever et al. [35] proposed a sequence-to-sequence neural language model to address the machine translation problem. One key benefit of their model is that it can automatically generate a translated sentence in the target language, given a sentence in the source language. To improve machine translation, Bahdanau et al. [30] introduced an attentive alignment strategy that enhances the sequence-to-sequence neural language model by learning to dynamically pay more attention to salient parts of the input sentences when generating the translated sentences.

In this paper, we exploit natural language translation techniques to generate human-readable and semantically meaningful mnemonic sentences from any given password so as to help users memorize strong passwords.
CONCLUSION

In this work, we have proposed DeepMnemonic, a deep neural network based approach to the automatic generation of mnemonic sentences for any given textual password. DeepMnemonic builds upon an attentive encoder-decoder language generation framework and works by translating an input sequence of password characters into a natural language sentence of mnemonic words. DeepMnemonic is designed to bridge the gap between strong password generation and the usability of strong passwords. Experimental results show that DeepMnemonic is capable of generating semantically meaningful mnemonic sentences. A user study was conducted to evaluate the usability of DeepMnemonic, showing that the generated mnemonic sentences are helpful in memorizing strong passwords. Specifically, with the aid of DeepMnemonic, the time needed to remember a password is largely reduced, and the password recall quality is significantly improved. In the future, we plan to train DeepMnemonic using more comprehensive and diverse training data.

REFERENCES
[1] R. W. Proctor, M.-C. Lien, K.-P. L. Vu, E. E. Schultz, and G. Salvendy, “Improving computer security for authentication of users: Influence of proactive password restrictions,” Behavior Research Methods, vol. 34, no. 2, pp. 163–169, 2002.
[2] B. Ur, F. Noma, J. Bees, S. M. Segreti, R. Shay, L. Bauer, N. Christin, and L. F. Cranor, “I added ‘!’ at the end to make it secure: Observing password creation in the lab,” in Proc. SOUPS, 2015.
[3] M. Weir, S. Aggarwal, M. Collins, and H. Stern, “Testing metrics for password creation policies by attacking large sets of revealed passwords,” in Proceedings of the 17th ACM Conference on Computer and Communications Security. ACM, 2010, pp. 162–175.
[4] J. Ma, W. Yang, M. Luo, and N. Li, “A study of probabilistic password models,” in 2014 IEEE Symposium on Security and Privacy. IEEE, 2014, pp. 689–704.
[5] W. Melicher, B. Ur, S. M. Segreti, S. Komanduri, L. Bauer, N. Christin, and L. F. Cranor, “Fast, lean, and accurate: Modeling password guessability using neural networks,” in USENIX Security Symposium, 2016, pp. 175–191.
[6] D. Wang, D. He, H. Cheng, and P. Wang, “fuzzyPSM: A new password strength meter using fuzzy probabilistic context-free grammars,” in 2016 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). IEEE, 2016, pp. 595–606.
[7] A. Forget, S. Chiasson, P. C. Van Oorschot, and R. Biddle, “Improving text passwords through persuasion,” in Proceedings of the 4th Symposium on Usable Privacy and Security, 2008, pp. 1–12.
[8] ——, “Persuasion for stronger passwords: Motivation and pilot study,” in International Conference on Persuasive Technology. Springer, 2008, pp. 140–150.
[9] S. Houshmand and S. Aggarwal, “Building better passwords using probabilistic techniques,” in Proceedings of the 28th Annual Computer Security Applications Conference. ACM, 2012, pp. 109–118.
[10] M. Weir, S. Aggarwal, B. De Medeiros, and B. Glodek, “Password cracking using probabilistic context-free grammars,” in 2009 30th IEEE Symposium on Security and Privacy. IEEE, 2009, pp. 391–405.
[11] R. Veras, C. Collins, and J. Thorpe, “On semantic patterns of passwords and their security impact,” in NDSS, 2014.
[12] C. Castelluccia, M. Dürmuth, and D. Perito, “Adaptive password-strength meters from Markov models,” in NDSS, 2012.
[13] J. Yan, A. Blackwell, R. Anderson, and A. Grant, “Password memorability and security: Empirical results,” IEEE Security & Privacy, vol. 2, no. 5, pp. 25–31, 2004.
[14] J. R. Anderson and C. J. Lebiere, The Atomic Components of Thought. Psychology Press, 2014.
[15] G. A. Miller, “The magical number seven, plus or minus two: Some limits on our capacity for processing information,” Psychological Review, vol. 63, no. 2, pp. 81–97, 1956.
[20] W. Yang, N. Li, O. Chowdhury, A. Xiong, and R. W. Proctor, “An empirical study of mnemonic sentence-based password generation strategies,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2016, pp. 1216–1229.
[21] C. Kuo, S. Romanosky, and L. F. Cranor, “Human selection of mnemonic phrase-based passwords,” in Proceedings of the Second Symposium on Usable Privacy and Security, ser. SOUPS ’06. ACM, 2006, pp. 67–78.
[22] J. Kiesel, B. Stein, and S. Lucks, “A large-scale analysis of the mnemonic password advice,” in Proceedings of NDSS, 2017.
[23] U. Topkara, M. J. Atallah, and M. Topkara, “Passwords decay, words endure: Secure and re-usable multiple password mnemonics,” in Proceedings of the 2007 ACM Symposium on Applied Computing. ACM, 2007, pp. 292–299.
[24] M. Fukumitsu, T. Katoh, B. B. Bista, and T. Takata, “A proposal of an associating image-based password creating method and a development of a password creating support system,” in 2010 24th IEEE International Conference on Advanced Information Networking and Applications (AINA). IEEE, 2010, pp. 438–445.
[25] K. A. Juang, S. Ranganayakulu, and J. S. Greenstein, “Using system-generated mnemonics to improve the usability and security of password authentication,” in Proceedings of the Human Factors and Ergonomics Society Annual Meeting, vol. 56, no. 1. SAGE Publications, 2012, pp. 506–510.
[26] J. Doolani et al., “Improving memorization and long term recall of system assigned passwords,” Ph.D. dissertation, 2016.
[27] S. Jeyaraman and U. Topkara, “Have the cake and eat it too - infusing usability into text-password based authentication systems,” in 21st Annual Computer Security Applications Conference (ACSAC’05). IEEE, 2005, 10 pp.
[28] Princeton University, “WordNet,” https://wordnet.princeton.edu/.
[29] G. H. Bower, “Analysis of a mnemonic device: Modern psychology uncovers the powerful components of an ancient system for improving memory,” American Scientist, vol. 58, no. 5, pp. 496–510, 1970.
[30] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” in Proceedings of the 2015 International Conference on Learning Representations (ICLR), 2015.
[31] M. Schuster and K. K. Paliwal, “Bidirectional recurrent neural networks,” IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2673–2681, 1997.
[32] K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using RNN encoder-decoder for statistical machine translation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1724–1734.
[33] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
[34] P. Koehn, “Pharaoh: A beam search decoder for phrase-based statistical machine translation models,” in Conference of the Association for Machine Translation in the Americas. Springer, 2004, pp. 115–124.
[35] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural networks,” in Advances in Neural Information Processing Systems, 2014, pp. 3104–3112.
[36] C. D. Manning and H. Schütze, Foundations of Statistical Natural Language Processing. MIT Press, 1999.
[37] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, “BLEU: A method for automatic evaluation of machine translation,” in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. ACL, 2002, pp. 311–318.
[38] H. Ebbinghaus, “Memory: A contribution to experimental psychology,” Annals of Neurosciences, vol. 20, no. 4, p. 155, 2013.
[39] T. G. Dietterich, “Approximate statistical tests for comparing supervised classification learning algorithms,” Neural Computation, vol. 10, no. 7, pp. 1895–1923, 1998.
[40] N. K. Bakirov and G. J. Szekely, “Student’s t-test for Gaussian scale mixtures,” Journal of Mathematical Sciences, vol. 139, no. 3, pp. 6497–6505, 2006.
[41] Y. Li, Y. Cheng, Y. Li, and R. H. Deng, “What you see is not what you get: Leakage-resilient password entry schemes for smart glasses,” in Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security. ACM, 2017, pp. 327–333.
[42] NIST, “Digital identity guidelines,” https://pages.nist.gov/800-63-3/, 2017.
[43] P. G. Kelley, S. Komanduri, M. L. Mazurek, R. Shay, T. Vidas, L. Bauer, N. Christin, L. F. Cranor, and J. Lopez, “Guess again (and again and again): Measuring password strength by simulating password-cracking algorithms,” in 2012 IEEE Symposium on Security and Privacy. IEEE, 2012, pp. 523–537.
[44] B. Ur, P. G. Kelley, S. Komanduri, J. Lee, M. Maass, M. L. Mazurek, T. Passaro, R. Shay, T. Vidas, L. Bauer et al., “How does your password measure up? The effect of strength meters on password creation,” in USENIX Security Symposium, 2012, pp. 65–80.
[45] D. Wang, P. Wang, D. He, and Y. Tian, “Birthday, name and bifacial-security: Understanding passwords of Chinese web users,” in 28th USENIX Security Symposium, 2019, pp. 1537–1555.
[46] D. Wang, Q. Gu, X. Huang, and P. Wang, “Understanding human-chosen PINs: Characteristics, distribution and security,” in Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security. ACM, 2017, pp. 372–385.
[47] S. Schechter, C. Herley, and M. Mitzenmacher, “Popularity is everything: A new approach to protecting passwords from statistical-guessing attacks,” in Proceedings of the 5th USENIX Conference on Hot Topics in Security. USENIX Association, 2010, pp. 1–8.
[48] R. Shay, P. G. Kelley, S. Komanduri, M. L. Mazurek, B. Ur, T. Vidas, L. Bauer, N. Christin, and L. F. Cranor, “Correct horse battery staple: Exploring the usability of system-assigned passphrases,” in Proceedings of the Eighth Symposium on Usable Privacy and Security, ser. SOUPS ’12. ACM, 2012, pp. 7:1–7:20.
[49] B. Ye, Y. Guo, L. Zhang, and X. Guo, “An empirical study of mnemonic password creation tips,” Computers & Security, vol. 85, pp. 41–50, 2019.
[50] J. Song, D. Wang, Z. Yun, and X. Han, “Alphapwd: A password generation strategy based on mnemonic shape,” IEEE Access, vol. 7, pp. 119052–119059, 2019.
[51] M. Ghazvininejad and K. Knight, “How to memorize a random 60-bit string,” in Proceedings of NAACL-HLT, 2015.
[52] H. A. Yajam, Y. K. Ahmadabadi, and M. Akhaee, “PapiaPass: Sentence-based passwords using dependency trees,” IEEE, 2016, pp. 91–96.
[53] M. J. Atallah, C. J. McDonough, V. Raskin, and S. Nirenburg, “Natural language processing for information assurance and security: An overview and implementations,” in Proceedings of the 2000 Workshop on New Security Paradigms. ACM, 2001, pp. 51–65.
[54] Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin, “A neural probabilistic language model,” Journal of Machine Learning Research, vol. 3, no. Feb, pp. 1137–1155, 2003.
Yao Cheng is currently a senior researcher at Huawei International in Singapore. She received her Ph.D. degree in Computer Science and Technology from the University of Chinese Academy of Sciences. Her research interests include security and privacy in deep learning systems, blockchain technology applications, Android framework vulnerability analysis, mobile application security analysis, and mobile malware detection.

Chang Xu is currently a Postdoctoral Fellow at Data61, CSIRO, Australia. He received his PhD degree in Computer Science from Nanyang Technological University in March 2017. His current interests include robust and explainable deep neural models for natural language processing.

Zhen Hai received the PhD degree in Computer Science and Engineering from Nanyang Technological University in 2014. He has been with the Institute for Infocomm Research, A*STAR, Singapore since 2015. His research interests include natural language processing, text mining, sentiment analysis, information security, and machine learning. He has been invited to serve on the program committees of leading conferences including SIGIR, ACL, EMNLP, AACL, AAAI, IJCAI, CIKM, etc.