Using Image Attributes for Human Identification Protocols

Abstract

A secure human identification protocol aims at authenticating human users to a remote server even when the users' inputs are not hidden from an adversary. Recently, the authors proposed a human identification protocol at the RSA Conference 2007, which is loosely based on the ability of humans to efficiently process an image. The advantage is that an automated adversary is not effective in attacking the protocol without human assistance. This paper extends that work by trying to solve some of the open problems. First, we analyze the complexity of defeating the proposed protocols by quantifying the workload of a human adversary. Secondly, we propose a new construction based on textual CAPTCHAs (Reverse Turing Tests) in order to make the generation of automated challenges easier. We also present a brief experiment involving real human users to find out the number of possible attributes in a given image, and give some guidelines for the selection of challenge questions based on the results. Finally, we analyze the previously proposed protocol in detail for the relationship between the secrets. Our results show that we can construct human identification protocols based on image evaluation with reasonably "quantified" security guarantees based on our model.


Hassan Jameel, Heejo Lee, and Sungyoung Lee

Department of Computer Engineering, Kyung Hee University, 449-701 Suwon, South Korea
{hassan,sylee}@oslab.khu.ac.kr

Department of Computer Science and Engineering, Korea University, Anam-dong, Seongbuk-gu, Seoul 136-701, South Korea
[email protected]


Suppose a student wishes to write a confidential email to disclose the information of a leaked exam paper to his friend. Using a secure email client, the student writes the email, sends it and logs out. The email is encrypted and can only be viewed by the recipient once he logs in to check his email. How can the student be sure that the email was sent securely and that no one apart from the intended recipient could learn anything? The answer relies on the weakest security link: the password. Little did the student know that his computer was being key-logged [18]. There was a hidden camera watching the student's every move. His fellow student also shoulder-surfed the password. Even if no one saw the mail being written, they could log on later to view the sent mail box. It turns out that no matter how secure the email client was, it served its purpose only as long as the password was not compromised.

A similar situation occurs when one enters a PIN at an ATM. We could use biometrics instead of passwords or PINs. But biometric data is secure only as long as the biometric information is kept confidential and the equipment has not been tampered with. So, are these mechanisms insecure or useless? They certainly aren't. They were designed with certain assumptions in mind: the passwords selected by the user should be truly random strings of a suitable length; the PINs selected by the user should be truly random 4-digit numbers; the user has the responsibility to hide her input from peeping eyes. It has been an interesting topic of research in cryptography to devise authentication protocols that meet the security requirements even when the above-mentioned assumptions do not hold. This highly vulnerable security environment has been termed the “naked human in a glass house” model in [16], although the first protocol constructed to be secure in this model was by Matsumoto [2].
We shall call such protocols “Human Identification Protocols”, or HIPs for short, following the terminology in the literature. Human identification protocols in the literature have been proposed with the goal of ease of “human execution” in mind. The protocols should take roughly the same amount of time as a password-based protocol takes. Researchers have tried to construct protocols that are secure and require little or no computation on the human's part. This is indeed a very hard goal to achieve, which is reflected in the fact that relatively few human identification protocols have been proposed over the years.

The situation, however, is not as bad as it seems. Humans possess good cognitive abilities. We can recall a previously viewed image with very high probability when it is presented to us again. We do not need to memorise all the details of the picture. This has led to the use of graphics as essential ingredients of human identification protocols. Proposed identification protocols can be given a graphical implementation. For example, instead of memorizing the string 00101, we can display five pictures each time, where the third and the fifth pictures are shown to the user a priori as the user's secret pictures. We could go a step further and use the things that an image describes to build human identification protocols. In [1] we proposed that instead of using pictures just as memory aids, we can use their internal structure in some way to construct a human identification protocol. The secret could be one of the concepts that the picture satisfies. It was conjectured that it would be hard even for a human adversary to find the secret. However, no exact quantification of this hardness was given. While the hardness of breaking the protocol is evident against automated adversaries (computer programs written to defeat the protocol), the security against “human adversaries” needs more attention.
This study aims to enhance the work proposed in [1] by answering some of the open problems, and moves a step further in trying to quantify the hardness of the underlying problem. We address the following issues:

– We present a new protocol with the aim of generating automated instances of challenges. This construction is loosely based on the Gimpy CAPTCHAs [17] and requires the server to only maintain a dictionary of words.
– We analyze the security of the protocol presented in [1] and the ones presented in this paper from a new perspective. We show how much work has to be performed by a “human” adversary in order to obtain the secret based on our model.
– We show the interrelationship of the two “secrets” of the protocol presented in [1].
– We also present another way of viewing the underlying problem of these protocols using a matrix representation. This view helps to understand the principal hardness of the protocols presented.
– Finally, we show the results of some experiments which indicate the amount of information in a very simple image. This data leads us to some guidelines for selecting the secret, which we also state.

Our study will deal with passive adversaries only. The reason is that if we deal with active adversaries, then we might require the human user to send some random challenges to the remote server in order to authenticate it as well. This generation of random challenges requires an extra amount of computation on the human user's part, and might render the protocols impractical. This is left as future work.

The first work on human identification dates back to [2]. Since then, a lot of other schemes have been proposed in the literature [4],[5],[6],[7],[9],[14]. Some of them were broken in [4], [15]. While most of them involve some numerical calculations, like [2] and the HB protocol [7], they can be implemented using some graphical interface employing pictures as memory aids. We can categorize the human identification protocols into two broad categories: protocols built to be secure against general eavesdropping adversaries, and protocols secure against only “guessing adversaries”, i.e. adversaries who do not see the user's input and hence try to guess the secret or impersonate the user without any a priori knowledge. The protocols mentioned so far fall in the first category. They have a drawback, however, in that they involve extra computation from the user. As an example, in the HB protocol [7], the user is required to compute bit-wise binary multiplication for some number of bits in every iteration. This may not seem much, but to obtain a higher level of security, the number of computations increases significantly.

In the second category, the most well known example is the traditional password-based authentication system. Others include purely graphical schemes like DeJa Vu [10], Passface [11], Point & Click [12] and [3], which require little or no numerical computation whatsoever. The basic theme of [10] and [11] is to present the user with a series of pictures, a subset of which are the secret pictures. The user is authenticated if its selection of the secret pictures among the given set of pictures is correct. On the other hand, in [12] the user is authenticated if it clicks on the correct secret location in the given picture. [3] works similarly by letting the user draw the secret symbol or figure on a display device. Evidently, these purely graphical schemes are not secure against “peeping” attacks [9]. Anyone observing the actions of the user can find out the secret in no time.
For a detailed account of all the schemes, see [9]. In our previous work [1], we proposed to use internal properties of images as secrets. After a secret has been chosen, pictures which satisfy the property are presented randomly among pictures that do not satisfy it. The user has to answer the pictures according to the secret property. It was conjectured that finding out the secret property is a hard problem for adversaries. For automated adversaries, this follows immediately from the definition of CAPTCHAs [8]. But for human adversaries, this hardness is difficult to prove. In this paper, we have tried to quantify this hardness and to answer some of the open questions described above.

Consider the picture of a magic square shown in Figure 1. How many things does the picture represent? A simple glance at it can reveal a lot of things: a magic square, a square, nine small squares, digits, black, white, the digit 4, the digit 2, line(s), the right angle, etc. It is remarkable how much information a rather simple-looking picture contains. We call each piece of this information a feature. We can take one of these features and construct a question out of it, for example: “Does the picture contain a rectangle?” This question is shared as a secret between the server and the user. After that, a series of pictures can be presented to the user such that with probability 1/2 they satisfy the question and with probability 1/2 they don't. The user has to scan the picture and answer ‘yes’ or ‘no’ accordingly. How about the adversary? The adversary would like to know the secret feature. The best way to do so is to extract all features in the pictures presented to the user and then take the intersection (if the user's answer is ‘yes’) or the difference (if the user's answer is ‘no’) to narrow down the number of possible secret features.

From an abstract point of view, we may define a universal set of all features that all pictures could possibly have. Denote this set as $S = \{1, 2, \ldots, n\}$, where $n$ is the total number of features. Any subset of this set represents a picture. Given a subset $A$ of this set (which again is a picture) and its corresponding response bit $a$ from the user, we are interested in how the adversary can find the hidden feature and hence the secret question. Considering the enormous size of the set of all features, it is almost impossible to write a computer program which, given a picture, can filter out all the features that picture describes. This immediately implies the need for human intervention. So, what could be the best strategy to find out the hidden feature?
One way is to have a cursory glance at the pictures and see if there is something common between them. A more efficient way is to find out all the features in the pictures, and then check which features are common between the pictures. But once the features have been extracted, one could make a program that would automatically filter out the common features. Thus we will only be concerned with the workload on the human adversary, which amounts to finding out all the features in the pictures until a single common feature is found. Our analysis will try to quantify the complexity of our proposed schemes based on the workload on the human adversary using the mentioned approach. Namely, given a picture $A$ and its corresponding answer bit $a$, the adversary has to flag all the features as candidate secret features if $a = 1$, or else delete all the features present in this picture from the candidate set of features if $a = 0$. Thus the adversary's job is to find out the features by personally checking every picture and narrow down the set of candidate secret features.

Fig. 1. A simple picture of a magic square may describe a lot of things

We will now present protocols based on this main idea in the next section. Each protocol is followed by a short discussion and a brief account of the workload on the adversary. The detailed analysis follows in Sections VI and VII.

Before we present the proposed schemes, we present the general notation which will be used throughout this manuscript. We assume there to be a pool of distinct pictures $\mathcal{P}$, each element of which is denoted by $P$, and a set of questions $Q$. Each question $q$ has a binary answer when applied to any picture in $\mathcal{P}$. Each question therefore asks whether a certain feature is present in the picture or not. We will also use $q$ to represent the function $q : \mathcal{P} \to \{0, 1\}$, which represents the evaluation of a picture according to the question. The user's answer string is represented by $A$, where $A(i)$ represents the $i$th bit in the string.
We will use $a$ to denote an arbitrary answer bit. From now onwards, the word “adversary” or the symbol $H$ will mean the “human” eavesdropping adversary, unless otherwise specified. The workload of $H$ for each protocol will be based on the complexity of the above-mentioned algorithm (or a slight variant of it), which is described in Section VI. Due to the bulk of notation used in this article, we will abuse this general notation in some sections or subsections without introducing ambiguity. We have the following immediate basic scheme:

Setup.

The user and the server share a secret question $q$ from $Q$.

Protocol.

– Repeat $k$ times:
  • The server picks a bit $b$ uniformly at random, and picks a picture $P$ from $\mathcal{P}$ such that $q(P) = b$. It discards this picture from the pool and presents it to the user.
  • The user submits $a = q(P)$.
– Output accept if all answers are correct; otherwise output reject.
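A minimal Python sketch of one session of the basic protocol may help fix ideas. It is an illustration only: modelling a picture as a set of feature labels, and the names `run_basic_protocol`, `pool` and `user_answer`, are assumptions made here and are not part of the protocol description.

```python
import random

def run_basic_protocol(pool, q, k, user_answer, rng=random):
    """Simulate k rounds of the basic protocol.

    pool        -- iterable of pictures (here: frozensets of feature labels)
    q           -- secret question, a function picture -> 0 or 1
    user_answer -- the responder's strategy, a function picture -> 0 or 1
    Returns True (accept) only if every answer is correct.
    """
    remaining = list(pool)
    for _ in range(k):
        b = rng.randrange(2)                      # server's uniform bit
        matching = [p for p in remaining if q(p) == b]
        if not matching:
            raise ValueError("no unused picture with q(P) = %d" % b)
        picture = rng.choice(matching)
        remaining.remove(picture)                 # a picture is never reused
        if user_answer(picture) != b:
            return False                          # one wrong answer => reject
    return True

# Hypothetical toy pool: ten pictures with a rectangle, ten without.
pool = [frozenset({"rectangle", i}) for i in range(10)]
pool += [frozenset({"circle", i}) for i in range(10)]
q = lambda picture: int("rectangle" in picture)

print(run_basic_protocol(pool, q, 5, q))                   # legitimate user
print(run_basic_protocol(pool, q, 5, lambda p: 1 - q(p)))  # always-wrong responder
```

The sketch rejects at the first wrong answer, which gives the same outcome as checking all answers at the end of the session.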

The scheme is described pictorially in Figure 2, where 2 pictures are shown at each iteration. We could present all $k$ pictures at the same time, depending on whether they can be displayed on the screen or not. In any case, we can find a number $L$ such that we can have $\lceil k/L \rceil$ iterations in one authentication session. According to the analysis in Section VI, the number of pictures the adversary has to observe to obtain the secret feature would be $\log n$, regardless of the value of $L$. This amounts to a total work of $n \log n$, which has to be done by the human adversary. Setting in the values n = 10 , we get a total work of ≈ units.

Fig. 2. The Basic Protocol with $L = 2$ and secret question $q$ = “Does the picture contain a basketball?”

Discussion on the protocol

This protocol is simple and practical. However, there is a big disadvantage: the user is sending its answers in the clear. An adversary thus knows the “correct” answer to all the pictures shown to the user. This makes the life of the adversary a bit easier, since it can glance at the pictures to find out the common secret feature. We would like to somehow “hide” the answer sequence, thus making it hard for the adversary to guess the secret feature. The next protocol attempts to do that. It is the original protocol presented in [1].
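The adversary's bookkeeping against the basic protocol can be sketched in a few lines of Python. This is a sketch under stated assumptions: the features of each observed picture are taken as already written down by hand (which is exactly the human workload being measured), and the function name and the toy transcript are invented for illustration.

```python
def narrow_candidates(transcript, universe):
    """Shrink the set of candidate secret features.

    transcript -- list of (features_of_picture, answer_bit) pairs, where the
                  features were extracted manually by the human adversary
    universe   -- all features the adversary considers possible
    """
    candidates = set(universe)
    for features, answer in transcript:
        if answer == 1:
            candidates &= set(features)   # 'yes': the secret is in this picture
        else:
            candidates -= set(features)   # 'no': the secret is not in this picture
    return candidates

# Hypothetical transcript of four observed (picture, answer) pairs.
universe = {"square", "digit", "heart", "line", "circle"}
transcript = [
    ({"square", "digit", "line"}, 1),
    ({"square", "heart"}, 0),
    ({"digit", "circle", "line"}, 1),
    ({"line", "circle"}, 0),
]
print(narrow_candidates(transcript, universe))  # {'digit'}
```

The set operations are trivial to automate; the cost to the adversary lies entirely in producing each `features_of_picture` set.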

Matrix Interpretation

Consider the set of all features $S = \{1, 2, \ldots, n\}$ as described in the previous section. Each picture contains a subset of this universal set of features. The secret question can also be considered as a subset of this set with a single element. We can represent the features of a presented picture $i$ as a vector $v_i$, with all the features not present in the picture represented by 0's, and the secret question as a vector $x$ with only one entry equal to 1. We can then write the answer bit as $v_i \cdot x = a$. Thus, if we have more pictures, we can represent the protocol as the matrix operation $Vx = a$, where $V$ is the matrix containing the row vectors $v_i$ representing the features of picture $i$, and $a$ is the answer vector. The obvious way to solve this requires $O(n)$ pictures and corresponding answers. However, as we will see in Section VI, not all pictures will contain features from the full feature space. Therefore, the actual number of picture and answer pairs required would be less than that. The real difficulty of the problem, however, is to find out the features in the pictures and thus construct the correct matrix $V$. Our matrix representation shows one way of describing the problem that needs to be solved by $H$.

Suppose we want to present $L$ pictures at a time to the user. The pictures are labeled from 1 to $L$ in sequential order. The idea is to permute the numbers randomly. Out of the resulting permutation, select $l$ numbers, and label the others as “don't care positions”. This permutation string is also shared as a secret between the user and the server. Each time, a series of $L$ pictures is presented. The user answers the pictures according to the order in the permutation string, and fills the “don't care positions” with random bits. So, for example, if $L = 10$ and $l = 5$, then a permutation string has ten positions, five holding picture labels and five marked “∗”, where “∗” represents a don't care position. We label such a permutation string as $\sigma$. We now have two secrets in our scheme: the secret question $q$ and the permutation string $\sigma$. The user thus has to answer a series of pictures according to the function $q_\sigma : \mathcal{P} \to \{0, 1\}$. The user is accepted if the answers are correct at the $l$ positions in each of the $k$ iterations. The rest of the protocol is the same as the basic protocol.

The protocol is described in Figure 3. Here we have taken $L = 5$, $l = 3$ and $k = 1$. The permutation string is 2 ∗ ∗ and the secret question is “Is the picture somehow related to computation?”.

Fig. 3. The Enhanced Protocol with $L = 5$, $l = 3$ and $k = 1$ and secret question $q$ = “Is the picture somehow related to computation?”

In Section VI, we show that the total amount of work required by the adversary is $\frac{L+1}{l+1}\, n \log n$. Putting in the values $L = 10$, $l = 5$ and n = 10 , we get a total work load of ≈ , which is significantly higher than for the previous protocol. The adversary thus gets the secret question. But how about the secret permutation string $\sigma$? The adversary is only successful in impersonating the legitimate user if it knows $\sigma$. It turns out that after the secret question has been revealed, it takes just a handful of observed iterations to guess the permutation string with high probability. We show this in Section V.

Discussion on the protocol

Even though this protocol does have an advantage over the basic protocol in the sense that the correct answer sequence is shuffled, it seems hard to construct a method that hides the answer sequence completely without too much effort on the user's part. It not only adds an extra burden while answering the pictures, it also slows down the process. Secondly, in the two schemes proposed, there is a big question of practicality: how to automatically generate those pictures? We conjecture that it is hard to write a program that can extract all the features from the given pictures. But how about the problem of automatically generating or finding images satisfying a given question? This might not be possible for all questions. In the next scheme, we try to create a protocol that can automatically generate instances.
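The way a user answers one round of the enhanced protocol, and the way the server checks it, can be sketched as follows. This is only an illustration under assumptions made here: the permutation string is encoded as a Python list in which an integer is a 1-based picture label and `None` stands for a don't care position, pictures are modelled as sets of feature labels, and the particular string `[2, None, 5, None, 1]` is a made-up example rather than the one in Figure 3.

```python
import random

def respond(pictures, q, sigma, rng=random):
    """User side: position j of the reply answers picture sigma[j];
    don't care positions (None) are filled with random bits."""
    return [q(pictures[s - 1]) if s is not None else rng.randrange(2)
            for s in sigma]

def verify(pictures, q, sigma, answers):
    """Server side: only the l labelled positions are checked."""
    return all(answers[j] == q(pictures[s - 1])
               for j, s in enumerate(sigma) if s is not None)

# Hypothetical round with L = 5 pictures and l = 3 checked positions.
pictures = [frozenset({"calculator"}), frozenset({"tree"}),
            frozenset({"abacus"}), frozenset({"river"}),
            frozenset({"laptop"})]
computing = {"calculator", "abacus", "laptop"}
q = lambda picture: int(bool(picture & computing))
sigma = [2, None, 5, None, 1]

answers = respond(pictures, q, sigma)
print(verify(pictures, q, sigma, answers))  # True
```

Because the don't care positions are ignored by `verify`, an eavesdropper sees a reply in which some bits carry no information about $q$, which is what hides the true answer sequence.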

Matrix Interpretation

It would be interesting to know whether this protocol can be presented in a matrix representation. We could represent it as $Vx = a$. However, the actual answer sequence is not the same as $a$ in this case. Thus we can represent it as $Vx = p_L(a)$, where $p_L(a)$ is a permutation operation on $a$ taking its components $L$ at a time. However, we still have the case that some of the bits in $a$ are random. This becomes a problem similar to the Learning Parity with Noise (LPN) problem underlying the HB protocol [7]. Again, the real problem in our scheme is to extract all features from the pictures and thus construct a correct matrix $V$.

In this scheme, we use the Gimpy CAPTCHA [17]. Gimpy works by picking several words from a dictionary, distorting the text of these words and presenting the resulting words in the form of an image to a human user. The idea is that current computer programs cannot comprehend the text. We assume a dictionary of size $N$. The algorithm Gimpy($j$) does the same thing as Gimpy, except that it takes a desired number $j$ of words from the dictionary. Let $L$ be a small positive number, e.g. 11. The whole image screen is divided into $L$ boxes. Let $s$ and $t$ be non-negative integers modulo $L$, kept secret between the user and the computer. An initial value $x_0$, another non-negative number modulo $L$, is also kept secret. A secret question $q$ is constructed from the dictionary words. For example, $q$ could be: “Are there more than three words beginning with the letter ‘B’?”. Let Gimpy($q, j$) be the algorithm that takes $j$ words from the dictionary such that the resulting challenge satisfies the question $q$, and let Gimpy($-q, j$) be the one whose challenge does not satisfy $q$. Let Grid($L$) be the procedure that concatenates $L$ images (boxes) into one image in the form of a grid. The protocol is described as follows:

Setup.

Randomly generate integers $s$, $t$ and $x_0$ modulo a public integer $L$, and share them as a secret between the server and the user. Share a secret question $q$ from $Q$.

Protocol.

– For $i = 1$ to $k$, do:
  • Compute $x_i \equiv s x_{i-1} + t \bmod L$.
  • Select a bit $b$ uniformly at random and apply Gimpy($q, j$) to the $x_i$th box if $b = 1$, and Gimpy($-q, j$) otherwise.
  • For each of the remaining boxes, apply Gimpy($j$).
  • Apply Grid($L$), and present it as a challenge to the user.
  • The user computes $x_i \equiv s x_{i-1} + t \bmod L$ and submits $a = q(x_i)$, where $x_i$ denotes the $x_i$th box.
– Output accept if all answers are correct; otherwise output reject.

The protocol is illustrated in Figure 4. Here $L = 4$, $s = 3$, $t = 3$, $x_0 = 5$. Thus $x_1 = 2$, and so the user looks at the picture labelled 2 and answers the question $q$ = “Does the picture contain the names of at least two animals?”. The pictures are taken from the Gimpy webpage [17].

Fig. 4. The Protocol Scheme with $L = 4$, $s = 3$, $t = 3$, $x_0 = 5$ and secret question $q$ = “Does the picture contain the names of at least two animals?”

How about the total amount of work required by the adversary? We see that this protocol has the form of the previous protocol with $L$ pictures per iteration and $l = 1$. Thus the total amount of work required by the adversary is $\frac{L+1}{2}\, n(N) \log_2 (n(N))$. The quantity $n(N)$ denotes that the total number of features (possible questions) is a function of the size $N$ of the dictionary.

Discussion on the protocol

The main theme of the protocol is to force the human adversary to write down all the words presented in the image. Notice the use of the linear congruential generator $x_i \equiv s x_{i-1} + t \bmod L$, for small values of $L$ and hence of $s$ and $t$. This is used to induce randomness in the selection of the secret box. Although this is not a cryptographically strong pseudorandom number generator, and certainly not for small values of $L$, it is used only to inject some kind of randomness. Since the values of $x_i$ are not shown in the clear, it is safe enough for our purposes. Notice that for some values of the parameters $s$, $t$ and $x_0$, the $x_i$'s do not span the whole set of integers modulo $L$. This again is not a worry, as the adversary does not know which parameters are chosen. But the most important concern is: how much do the distorted images of texts and numbers describe? Unfortunately, the full spectrum of features described by a distorted image of a word from a dictionary is not as rich as that of a natural image. Therefore, we should conclude that although the above scheme adds some autonomy to the challenge generation process, it is not as secure as the schemes which use natural images in terms of the workload on the adversary. Details follow in Section VI.

Matrix Interpretation

Once again, we are tempted to use a matrix representation for the protocol. We could represent it as $p(V)x = a$, where $V$ represents the matrix of features of all the $L$ pictures in every iteration, and $p(V)$ represents the function which picks one of those $L$ pictures according to the chosen parameters of the linear congruential generator. Thus this protocol is the opposite of the previous one in the sense that instead of diffusing the answer string, the pictures are, in effect, randomly chosen each time.

In [1] we proposed that we can have a group of questions as a secret, connected by any combination of logical connectives such as AND, OR and NOT. However, we should not have more than a certain number of logically connected features, because otherwise the workload on the legitimate user increases considerably. We could safely use a group of 3 or fewer questions. The adversary's algorithm described above and in the analysis will not work in this case, unless all the questions are connected by the logical AND. There are still ways around this: the adversary now looks for inconsistent features in the picture and eliminates them in each iteration. More precisely, the adversary's task is to find a boolean function consisting of 3 or fewer literals that satisfies the truth assignments of all the literals. Formally, let $V$ denote the matrix consisting of features extracted from the pictures. Each column of this matrix represents the absence or presence of a feature in the corresponding picture. If the adversary also knows the answer vector $a$, then its job is to find a boolean function satisfying the mapping. We could use multiple questions in any of the three protocols above. The basic protocol is then reduced trivially to the problem of finding the boolean function defined above. The enhanced protocol, however, becomes a bit more tricky, since we do not know the exact evaluation of the expression.
The practical scheme also becomes hard, as we do not know the literals being used in the evaluation of the boolean function. There can be two variants of this protocol: the basic protocol and the enhanced protocol. Obviously, the adversary can eliminate inconsistent features in the basic protocol easily, as it does not involve any random replies. However, if we use the enhanced protocol with questions connected by logical operators, we can make the adversary's task harder, as there would be random bits in the answer sequence as well.

Experiments

We did a few experiments in order to get an idea of the efficiency of our scheme. The experimental stage consisted of two main experiments: the first was carried out to see how many distinct features can be extracted from a given picture; the second was to check whether it is easy for a human user to tell whether a given feature is present in a picture or not. For the first experiment, we presented the image in Figure 1 to ten participants (all computer science graduate students). Each of them was asked to write down as many features as they believed the picture contained. A cumulative total of 42 distinct features were extracted by the participants, not counting the multiplicity of some features (such as the digits 1, 2, ..., 9, which are listed together). These features are given in Table 1 along with their frequency, i.e. the number of participants who wrote down the corresponding feature:

Table 1. Experimental Results

| Feature | Freq & Res |
|---|---|
| Numbers (Digits 1-9) | 5 ✔✔✔ |
| Digits without closed loop(s) | 1 ✔✔✔ |
| Black color | 4 ✔✔✔ |
| Digits with closed loop(s) | 1 ✔✔✔ |
| Columns and rows sum to 15 | 3 ✔✔✔ |
| A heart | 1 ✔✘✔ |
| Diagonal sums to 15 | 3 ✔✔✔ |
| Sign board | 1 ✔✘✘ |
| Square(s) | 3 ✔✔✔ |
| Triangle | 1 ✔✘✔ |
| Matrix | 3 ✔✔✔ |
| Cross(es) | 1 ✔✘✔ |
|  | ✔✔✔ |
| Line(s) | 1 ✔✔✔ |
| White color | 2 ✔✔✔ |
| Rectangle(s) | 1 ✔✔✘ |
| Magic square | 2 ✔✔✔ |
| Stair(s) | 1 ✔✔✔ |
| Odd number(s) | 2 ✔✔✔ |
| '+' sign | 1 ✔✔✔ |
| Black line(s) | 2 ✔✔✔ |
| Circle(s) | 1 ✔✔✔ |
| Hook | 1 ✔✘✔ |
| Zig zag path | 1 ✔✘✔ |
| Slide | 1 ✔✘✔ |
| Alphabets C, S, L and O | 1 ✔✘✔ |
| 'X' sign | 1 ✔✘✘ |
| The string 492357816 | 1 ✔✔✔ |
| The string 438951276 | 1 ✔✔✔ |
| Distortion in slanted line(s) | 1 ✔✘✔ |
| Table | 1 ✔✘✔ |
| Array | 1 ✔✔✔ |
| Balance | 1 ✘✘✘ |
| Equilibrium | 1 ✔✘✘ |
| Symmetry | 1 ✔✔✔ |
| Complements | 1 ✔✔✘ |
| Typed digit(s) | 1 ✔✔✔ |
| White area > Black area | 1 ✔✔✔ |
| Even number(s) | 1 ✔✔✔ |
| The right angle | 1 ✔✔✔ |
| Mathematics | 1 ✔✔✘ |
| Arithmetic | 1 ✔✔✘ |

Once the features were collected, they were shown to three separate individuals who had not taken part in the first experiment. They were shown the picture and asked whether each feature in the list found by the participants in the first experiment was present in the picture or not. Their responses are shown in the column labeled “Freq & Res”, where a ✔ indicates that the corresponding participant believed the feature to be present in the picture. Not surprisingly, the features with the higher frequencies were answered correctly by all three users. On the other hand, some of the single-frequency features were also answered correctly by all three users. The ones with differing answers are those that require a ‘keen’ eye, e.g. “hook”. These experiments show that even a picture as simple as the one shown in Figure 1 can have a lot of features, the majority of which are very easy to answer but not so easy to extract. There can still be many more features present in the picture; one such example is “one side of a Rubik's cube”. This survey gives us some guidelines for choosing the pictures and/or secret questions:

– Do not use pictures whose main object is the secret feature. For example, if we chose the picture of Figure 1 as the challenge picture, then the secret question “Does the picture contain the digits 1-9?” would certainly be a bad choice.
– Do not use simple pictures. Simple pictures contain very few features. This is evident from Figure 1. Although one may still be able to think of more features, there does not seem to be a large number of them.
– Do not use secret questions which are hard for the legitimate user to answer. As an example, the feature “Equilibrium” in the above table was answered “no” by two users. It seems hard to find in the picture and needs more of a philosophical eye.
– Always allow for user error. For example, if the user answers 10 pictures, allow 2 to 3 wrong answers. The need for this is clear from the fact that a user answered “no” to the feature “Mathematics”, even though it seems to describe the figure.

In this section, we analyze the relationship between the hidden permutation σ and the secret question q in the enhanced protocol. The purpose of the analysis is to find out the strength of the protocol if one of these secrets is leaked. For the first part, assume that the adversary has somehow found out the hidden permutation σ but not the secret question. Since the adversary knows the hidden permutation, it knows exactly which questions are being answered correctly and which ones are being answered randomly. The adversary can therefore neglect the randomly answered pictures and use the pictures with correct answers to look for the secret question. This reduces to the Basic Protocol in a straightforward manner.

For the other side, assume an adversary H who knows the secret question q. It evaluates each picture for up to k iterations. Let $X_i(t)$ be the variable representing the evaluated bit of the i-th picture in the t-th iteration of the enhanced protocol, where $1 \le t \le k$. H thus obtains the following information after k iterations:

\[
\begin{array}{cccc}
X_1(t) & X_2(t) & \cdots & X_L(t) \\
\hline
b_{11} & b_{12} & \cdots & b_{1L} \\
b_{21} & b_{22} & \cdots & b_{2L} \\
\vdots & \vdots & \vdots & \vdots \\
b_{k1} & b_{k2} & \cdots & b_{kL}
\end{array}
\]

where each $b_{ij}$ is the bit evaluated by H for the corresponding picture. H also has the following response table from the legitimate user:

\[
\begin{array}{cccc}
Y_1(t) & Y_2(t) & \cdots & Y_L(t) \\
\hline
a_{11} & a_{12} & \cdots & a_{1L} \\
a_{21} & a_{22} & \cdots & a_{2L} \\
\vdots & \vdots & \vdots & \vdots \\
a_{k1} & a_{k2} & \cdots & a_{kL}
\end{array}
\]

where each $Y_i(t)$ represents the user's response bit to the i-th picture in the t-th iteration. The adversary now runs the following simple algorithm:

– Initialize σ(.) = null.
– For each $X_i(t)$, check whether there exists a $Y_j(t)$ such that the two match at every corresponding bit position.
  • If there is only one such $Y_j(t)$ then mark σ(j) = i.
  • If there are two such $Y_j(t)$'s then halt.
– Assign * to each unassigned position of σ(.).
– Output the permutation σ and halt.

We now state the following theorem:

Theorem 1

\[
\left( \left(1 - \frac{1}{2^k}\right)^{L-1} \left(\frac{l}{L}\right) + \left(1 - \frac{1}{2^k}\right)^{L} \left(1 - \frac{l}{L}\right) \right)^{L} \le \Pr[\sigma \text{ is correct}] \le \left(1 - \frac{1}{2^k}\left(1 - \frac{l}{L}\right)\right)^{L}
\]

Proof.

Let $A_i$ be the event that the adversary correctly guesses the i-th position of the permutation σ. Without loss of generality, we assume that the adversary starts with the leftmost position and goes on to the next position in sequential order. We have to find the probability:

\[
\Pr[\sigma \text{ is correct}] = \Pr[A_1] \, \Pr[A_2 \mid A_1] \cdots \Pr[A_L \mid A_1 \wedge A_2 \wedge \ldots \wedge A_{L-1}]
\]

Now let $B_i$ be the event that position i is not the don't care position. Let $\bar{B}_i$ be the complementary event. It is clear that:

\[
\Pr[A_i] = \Pr[A_i \mid B_i] \Pr[B_i] + \Pr[A_i \mid \bar{B}_i] \Pr[\bar{B}_i]
\]

Fig. 5. Guessing Probability
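As a numerical aside, the lower and upper bounds stated in Theorem 1 can be evaluated directly. The following is a minimal Python sketch and not part of the original analysis: the function name `sigma_guess_bounds` is hypothetical, and the two expressions are transcribed from the theorem statement, with L pictures per iteration, l positions that are not don't care positions, and k observed iterations.

```python
def sigma_guess_bounds(L, l, k):
    """Return (lower, upper) bounds on Pr[sigma is correct] from Theorem 1.

    L: number of picture positions, l: number of positions that are not
    don't-care positions, k: number of observed iterations.
    """
    no_collision = 1.0 - 0.5 ** k          # chance that one unrelated column differs
    frac = l / L                           # Pr[B_i]: position is not a don't-care position
    # Probability for the first position, Pr[A_1] (used for the lower bound).
    p_first = frac * no_collision ** (L - 1) + (1.0 - frac) * no_collision ** L
    # Probability for the last position given all earlier ones (used for the upper bound).
    p_last = 1.0 - 0.5 ** k * (1.0 - frac)
    return p_first ** L, p_last ** L


if __name__ == "__main__":
    # Illustrative parameters only; they are not taken from the paper.
    for k in (2, 5, 10, 20):
        lower, upper = sigma_guess_bounds(L=10, l=5, k=k)
        print(f"k={k:2d}  lower={lower:.6f}  upper={upper:.6f}")
```

Both bounds approach 1 as k grows, which matches the intuition that an adversary who knows q and observes enough iterations can reconstruct σ.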

It is easy to see that:

\[
\Pr[B_1] = \frac{l}{L} \quad \text{and} \quad \Pr[\bar{B}_1] = 1 - \frac{l}{L}
\]

If $B_1$ is true then $X_1(t)$ matches at least one of the $Y_j(t)$'s. The adversary's algorithm will guess it correctly if there is only one such $Y_j(t)$. So:

\[
\Pr[A_1 \mid B_1] = \Pr[\text{There exists only one } j \text{ such that } Y_j(t) = X_1(t)]
= 1 \times \underbrace{\left(1 - \left(\frac{1}{2}\right)^{k}\right) \times \ldots \times \left(1 - \left(\frac{1}{2}\right)^{k}\right)}_{L-1}
= \left(1 - \left(\frac{1}{2}\right)^{k}\right)^{L-1}
\]

Now if $\bar{B}_1$ is true, then the algorithm will detect this if $X_1(t)$ does not match any of the $Y_j(t)$'s. The probability of this being true is:

\[
\Pr[A_1 \mid \bar{B}_1] = \underbrace{\left(1 - \left(\frac{1}{2}\right)^{k}\right) \cdots \left(1 - \left(\frac{1}{2}\right)^{k}\right)}_{L \text{ times}} = \left(1 - \frac{1}{2^k}\right)^{L}
\]

With this we get:

\[
\Pr[A_1] = \frac{l}{L} \left(1 - \frac{1}{2^k}\right)^{L-1} + \left(1 - \frac{l}{L}\right) \left(1 - \frac{1}{2^k}\right)^{L}
\]

Let us calculate the probability $\Pr[A_L \mid A_1 \wedge A_2 \wedge \ldots \wedge A_{L-1}]$. This is equal to:

\[
\begin{aligned}
\Pr[A_L \mid A_1 \wedge \ldots \wedge A_{L-1}] ={} & \Pr[A_L \mid B_L \wedge A_1 \wedge \ldots \wedge A_{L-1}] \, \Pr[B_L \mid A_1 \wedge \ldots \wedge A_{L-1}] \\
& + \Pr[A_L \mid \bar{B}_L \wedge A_1 \wedge \ldots \wedge A_{L-1}] \, \Pr[\bar{B}_L \mid A_1 \wedge \ldots \wedge A_{L-1}]
\end{aligned}
\]

It is straightforward to see that:

\[
\Pr[B_L \mid A_1 \wedge \ldots \wedge A_{L-1}] = \frac{l}{L} \quad \text{and} \quad \Pr[\bar{B}_L \mid A_1 \wedge \ldots \wedge A_{L-1}] = 1 - \frac{l}{L}
\]

We also have:

\[
\Pr[A_L \mid B_L \wedge A_1 \wedge \ldots \wedge A_{L-1}] = 1 \quad \text{and} \quad \Pr[A_L \mid \bar{B}_L \wedge A_1 \wedge \ldots \wedge A_{L-1}] = 1 - \frac{1}{2^k}
\]

We get the final result:

\[
\Pr[A_L \mid A_1 \wedge \ldots \wedge A_{L-1}] = \frac{l}{L} + \left(1 - \frac{1}{2^k}\right)\left(1 - \frac{l}{L}\right) = 1 - \frac{1}{2^k}\left(1 - \frac{l}{L}\right)
\]

Note that all other conditional probabilities for the events $A_2, A_3, \ldots, A_{L-1}$ must lie between these two calculated probabilities. This gives us the upper and lower bounds for the probability of guessing the correct permutation as:

\[
\left( \left(1 - \frac{1}{2^k}\right)^{L-1} \left(\frac{l}{L}\right) + \left(1 - \frac{1}{2^k}\right)^{L} \left(1 - \frac{l}{L}\right) \right)^{L} \le \Pr[\sigma \text{ is correct}] \le \left(1 - \frac{1}{2^k}\left(1 - \frac{l}{L}\right)\right)^{L}
\]

Let $S = \{1, 2, \ldots, n\}$ denote the universal set of all features. Let $A_1$ and $A_2$ denote two subsets of this set. In practice, $A_1$ and $A_2$ denote the sets of features of two pictures drawn randomly according to an arbitrary distribution. We assume that any feature i in S is equally likely to occur in any of the subsets drawn. We define the following two indicator variables:

\[
I_i = \begin{cases} 1 & \text{if } i \in A_1 \\ 0 & \text{otherwise} \end{cases}
\qquad
J_i = \begin{cases} 1 & \text{if } i \in A_2 \\ 0 & \text{otherwise} \end{cases}
\]

Then $|A_1 \cap A_2| = \sum_{i=1}^{n} I_i J_i$. The number of subsets of S containing a given feature i is $2^{n-1}$. Since each subset of S is equally likely to contain i regardless of the distribution with which the subset is drawn, we have:

\[
\Pr[A_1 \text{ contains } i] = \Pr[A_2 \text{ contains } i] = 2^{n-1} / 2^{n} = 1/2
\]

From this and the fact that the two subsets are drawn independently of each other, we have:

\[
E[|A_1 \cap A_2|] = \sum_{i=1}^{n} E[I_i J_i] = \sum_{i=1}^{n} E[I_i] E[J_i]
\]

Now, $E[I_i] = \sum_{A_1 \subseteq S} I_i \Pr[A_1 \text{ is chosen from } S] = 2^{n-1} / 2^{n} = 1/2$. Similarly, $E[J_i] = 1/2$. Finally, this gives us:

\[
E[|A_1 \cap A_2|] = \sum_{i=1}^{n} E[I_i] E[J_i] = n/4
\]

In general we can see that:

\[
E[|A_1 \cap A_2 \cap \ldots \cap A_t|] = \frac{n}{2^{t}} = \frac{1}{2} E[|A_1 \cap A_2 \cap \ldots \cap A_{t-1}|]
\]

We analyze the three protocols using the result obtained above. According to our discussion in Section III, the adversary will try to narrow down the number of possible secret features by using the algorithm GetBasicQ described in Section III.
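The value $E[|A_1 \cap A_2|] = n/4$, and the halving of the expected intersection size with every additional subset, can be checked exactly for small n by brute-force enumeration. The sketch below is only an illustration under an added assumption: every subset of S is taken to be equally likely (equivalently, each feature appears in a picture independently with probability 1/2). The function name `expected_intersection` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def expected_intersection(n, t):
    """Exact E[|A_1 ∩ ... ∩ A_t|] when each A_j is a uniformly random subset
    of an n-element feature set. Subsets are encoded as n-bit masks."""
    full = (1 << n) - 1
    total = 0
    count = 0
    # Enumerate every t-tuple of subsets; feasible only for small n and t.
    for subsets in product(range(1 << n), repeat=t):
        common = full
        for mask in subsets:
            common &= mask                 # bitwise AND is set intersection
        total += bin(common).count("1")    # size of the intersection
        count += 1
    return Fraction(total, count)


if __name__ == "__main__":
    for n, t in [(4, 1), (4, 2), (3, 3)]:
        print(n, t, expected_intersection(n, t), Fraction(n, 2 ** t))
```

For these small cases the enumerated expectation coincides with $n/2^t$, as the linearity-of-expectation argument predicts.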

The adversary H looks at the current picture and its answer given by the user. It performs the procedure called GetBasicQ described in Algorithm 1. Notice that "Compute $A_k$" means extracting the features of picture $A_k$.

Algorithm 1 GetBasicQ

Input: A set of pictures $A_1, A_2, \ldots$ together with their answers $a_1, a_2, \ldots$
Output: The secret feature q

if the answer bit position is 0 then
    Wait for the next iteration.
else
    Extract the features in the picture as $A_1$
    repeat
        For each picture k:
        if $a_k = 1$ (the answer bit at position j) then
            Compute $A_k$ (the picture at position i) and assign $A_1 \leftarrow A_1 \cap A_k$.
        else
            As $a_k = 0$ (the answer bit at position j), compute $A_k$ (the picture at position i) and assign $A_1 \leftarrow A_1 - A_k$.
        end if
    until $|A_1| = 1$ and halt.
end if

Next we compute the expected number of steps the adversary has to wait until the above algorithm halts.

Theorem 1 If the adversary successfully extracts all the features, then the expected number of steps is $\log n$.

Proof.

Let $A_1^{(k)}$ be the set $A_1$ after the k-th step. Our inductive proof is as follows. First assume that k = 2. There are two cases. First, if $a_k = 1$ then, as computed above:

\[
E\left[\left|A_1^{(2)}\right|\right] = E\left[\left|A_1^{(1)} \cap A_2\right|\right] = n/4
\]

And, if $a_k = 0$, we also get the result:

\[
E\left[\left|A_1^{(2)}\right|\right] = E\left[\left|A_1^{(1)} - A_2\right|\right] = E\left[\left|A_1^{(1)} \cap A_2^{c}\right|\right] = n/4
\]

This is true, since $A_2^{c}$, the complement of $A_2$, is also a subset of S. Now, in general for k = t, we have:

\[
E\left[\left|A_1^{(t)}\right|\right] = n/2^{t}
\]

So, if $a_{t+1} = 1$, then:

\[
E\left[\left|A_1^{(t+1)}\right|\right] = E\left[\left|A_1^{(t)} \cap A_{t+1}\right|\right] = \frac{1}{2} E\left[\left|A_1^{(t)}\right|\right] = \frac{n}{2^{t+1}}
\]

And, if $a_{t+1} = 0$, then again:

\[
E\left[\left|A_1^{(t+1)}\right|\right] = E\left[\left|A_1^{(t)} - A_{t+1}\right|\right] = E\left[\left|A_1^{(t)} \cap A_{t+1}^{c}\right|\right] = \frac{1}{2} E\left[\left|A_1^{(t)}\right|\right] = \frac{n}{2^{t+1}}
\]

The adversary will stop for some k if $E\left[\left|A_1^{(k)}\right|\right] = 1$, which means that:

\[
n / 2^{k} = 1 \Rightarrow k = \log n
\]

where the logarithm is to base 2.

Now, the number of iterations (pictures) the adversary has to observe is $\log n$. At each step the adversary has to extract the features of a picture, hence the expected amount of work at each step is n/2 and hence the total amount of work to be done by the adversary in the basic protocol is $n \log n$. How about the probability of success of the algorithm? We have assumed in this analysis that the adversary can extract all features in the image. In general, an adversary might not be able to extract everything in an image, including the secret feature. We can associate an average probability p with the extraction of the secret feature, meaning that the secret will be extracted with an average probability of p whenever the adversary is presented with a picture with answer '1'. The '1' instances occur with a probability of 1/2. Therefore, the average probability of success of the above algorithm is $p^{\frac{\log n}{2}}$. In the special case where the probability of the secret feature being extracted is 1/2, the average probability would be $\left(\frac{1}{2}\right)^{\frac{\log n}{2}} = \frac{1}{\sqrt{n}}$.

.2 The Enhanced Protocol

Now suppose the adversary H wants to find out the hidden question in the enhanced protocol. This time the adversary cannot use the simple procedure GetBasicQ it used for the basic protocol because of the use of the permutation σ. It has to be selective in its choices. The adversary has to use a slightly modified version of GetBasicQ called GetEnhancedQ, described in Algorithm 2.
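For comparison with Algorithm 2, the GetBasicQ attack on the basic protocol analysed above can be simulated in a few lines. This is a minimal sketch under assumptions that are not in the original text: features are the integers 0..n-1, every picture is a uniformly random subset of them, the user answers '1' exactly when the secret feature is in the picture, and the adversary extracts all features perfectly (p = 1). The names `random_picture` and `get_basic_q` are hypothetical.

```python
import random


def random_picture(n, rng):
    """A picture modelled as a random feature set: each feature present w.p. 1/2."""
    return {f for f in range(n) if rng.random() < 0.5}


def get_basic_q(n, secret, rng):
    """Simulate GetBasicQ; return (recovered feature, pictures processed)."""
    candidates = None          # plays the role of A_1 in Algorithm 1
    steps = 0
    while True:
        picture = random_picture(n, rng)
        answer = secret in picture         # the user's response bit
        if candidates is None:
            if not answer:
                continue                   # answer bit is 0: wait for the next iteration
            candidates = set(picture)      # extract the features of the first '1' picture
            steps = 1
        else:
            steps += 1
            if answer:
                candidates &= picture      # A_1 <- A_1 ∩ A_k
            else:
                candidates -= picture      # A_1 <- A_1 - A_k
        if len(candidates) == 1:
            return next(iter(candidates)), steps


if __name__ == "__main__":
    rng = random.Random(2007)
    n = 1024
    runs = [get_basic_q(n, secret=17, rng=rng)[1] for _ in range(200)]
    print("average pictures processed:", sum(runs) / len(runs))
```

The candidate set always keeps the secret feature, since intersections are taken only with pictures that contain it and differences only with pictures that do not, so the loop ends with the secret as the sole survivor. The simulated average is meant to be compared informally against the $\log n$ estimate; it is not expected to match it exactly.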

Algorithm 2 GetEnhancedQ

Input: A set of at least L pictures $A_1, A_2, \ldots$ together with an equal number of bits $a_1, a_2, \ldots$
Output: The secret feature q

Select a random picture position i between 1 and L.
Select a random answer position j between 1 and L.
if the answer bit position is 0 then
    Wait for the next iteration.
else
    Extract the features in the picture as $A_1$
    repeat
        For each iteration k:
        if $a_k = 1$ then
            Compute $A_k$ and assign $A_1 \leftarrow A_1 \cap A_k$.
        else
            As $a_k = 0$, compute $A_k$ and assign $A_1 \leftarrow A_1 - A_k$.
        end if
    until $|A_1| = 1$ and halt.
end if

Now assume that the adversary does the following: whenever it has guessed an incorrect path, it executes the above algorithm for an expected number of $\log n$ steps and then goes back to choose a different path (this is the expected time until the adversary realizes that it has chosen the wrong picture and answer pair). For each picture position $1 \le i \le L$, there are L possible answer positions. Each of these is equally likely for the adversary to pick. Out of these, only l result in halting the algorithm. If the adversary finds the correct path, it will stop by outputting the feature. This occurs with probability $l/L$. If the adversary chooses the wrong path, it will go back and choose another path. The total number of correct paths in the second iteration would be $l(L-1)$. Thus the probability of the adversary stopping after two iterations would be $\frac{l}{L}\left(\frac{L-l}{L-1}\right)$. Continuing in this fashion, if we let $\Pr[y_i]$ denote the probability of the adversary stopping at the i-th step, we get:

\[
\Pr[y_i] = \frac{l}{L} \left(\frac{L-l}{L-1}\right) \cdots \left(\frac{L-l-(i-2)}{L-(i-1)}\right), \quad \text{for } i \ge 2
\]

Let Y denote the number of steps taken by the adversary. Thus $y_i \in Y$ denotes the number of steps in the i-th path. We get:

\[
\begin{aligned}
E[Y] &= \frac{l}{L} \log n + \frac{l}{L} \left(\frac{L-l}{L-1}\right) (2 \log n) + \cdots + \frac{l}{L} \left(\frac{L-l}{L-1}\right) \left(\frac{L-l-1}{L-2}\right) \cdots \left(\frac{L-l-(L-l-1)}{L-(L-l)}\right) (L-l+1) \log n \\
&= \frac{l}{L} \log n \left( 1 + 2 \left(\frac{L-l}{L-1}\right) + \cdots + (L-l+1) \left(\frac{L-l}{L-1}\right) \left(\frac{L-l-1}{L-2}\right) \cdots \left(\frac{L-l-(L-l-1)}{L-(L-l)}\right) \right) \\
&= \frac{l}{L} \log n \, \frac{(L-l)!}{(L-1)!} \left( \sum_{i=1}^{L-l+1} \frac{i \, (L-i)!}{(L-l-i+1)!} \right) \\
&= \frac{l}{L} \log n \, \frac{(L-l)!}{(L-1)!} \cdot \frac{L (L+1) (L-1)!}{l (l+1) (L-l)!} \\
&= \frac{L+1}{l+1} \log n
\end{aligned}
\]

In light of the previous result, the total amount of work done by the adversary is $\frac{L+1}{l+1} n \log n$. The probability of success of the algorithm depends on whether the adversary has chosen the correct combination of picture and answer position pair. Thus if we again let p be the average probability of successfully extracting the feature, the result comes out to be the same as before: $p^{\frac{\log n}{2}}$.

We could analyze the workload in the practical scheme by viewing the behavior of the linear congruential generator for small values of the arguments. However, for simplicity, we can assume that the position of the next picture is determined analogously to the previous protocol. Thus we can let L = L and l = 1 in our result for the previous protocol. What about the value of n? Of course this should depend on the dictionary size N, which can be anywhere in the range of 10 to 10. However, n represents the number of distinct features and this could not possibly be more than N. We say this because two words might contain the same letters, like "wolf" and "flow", and two words might represent the same concept, like synonyms. Therefore we can assume n = xN where 0 < x < 1. Assuming x = 0 . p +12 N log ( N ) . The probability of success in this case would also come out to be $p^{\frac{\log n}{2}}$.

.4 Comparative Workloads of the Three Protocols

Based on the results obtained in the previous subsections, we can show the comparative workloads on the adversary in the three protocols. First we show the workload by fixing n = 10 and N = 10 and plotting the three graphs as a function of L. In the enhanced scheme we have assumed $l = \lceil L/2 \rceil$. The three plots are shown in Figure 6.

Fig. 6. Comparative workloads of the adversary in the three schemes
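The workload expressions compared in this subsection are simple enough to tabulate directly. The Python sketch below is only an illustration: the function names are hypothetical, the parameter values in the demo are placeholders rather than the ones used for the figures, and the practical scheme is modelled under the stated simplification (l = 1 and n = xN), with x left as a free parameter. The helper `expected_attempts` recomputes the sum of $i \cdot \Pr[y_i]$ numerically so that the closed form $(L+1)/(l+1)$ can be checked.

```python
import math


def expected_attempts(L, l):
    """Sum of i * Pr[y_i]: expected number of paths tried until a correct one,
    when l of the L paths are correct and no path is tried twice."""
    total = 0.0
    miss = 1.0                          # probability that attempts 1..i-1 all failed
    for i in range(1, L - l + 2):
        left = L - (i - 1)              # paths not yet tried
        total += i * miss * l / left    # first success exactly at attempt i
        miss *= (left - l) / left
    return total


def basic_workload(n):
    """n log n, the basic-protocol workload (log to base 2)."""
    return n * math.log2(n)


def enhanced_workload(n, L, l):
    """(L+1)/(l+1) * n log n, the enhanced-protocol workload."""
    return (L + 1) / (l + 1) * n * math.log2(n)


def practical_workload(N, L, x):
    """Assumption: reuse the enhanced result with l = 1 and n = x * N."""
    return enhanced_workload(x * N, L, 1)


if __name__ == "__main__":
    n, N, x = 10**6, 10**4, 0.5         # placeholder values, not the paper's
    for L in (5, 10, 20, 25):
        l = math.ceil(L / 2)
        print(L, l, expected_attempts(L, l), (L + 1) / (l + 1),
              round(basic_workload(n)), round(enhanced_workload(n, L, l)),
              round(practical_workload(N, L, x)))
```

Keeping x and the sizes n and N as explicit arguments makes it easy to see how sensitive the comparison is to those choices.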

The workload of the adversary increases in the two schemes with increasing values of L, as compared to the basic protocol. Notice the use of the "Practical Zone Delimiter". This is placed at a value of L = 25, since we believe that putting more pictures in a given iteration would place a computational burden on the human user. For small values of L, we see that the Basic and the Enhanced schemes work better as compared to the Practical Scheme. As we increase L even beyond the practical zone, the practical scheme becomes better. But this is because we have fixed l to be one half of L in the enhanced scheme. The advantage of the two schemes over the basic scheme does not come without a disadvantage. The memory and processing requirements of the other two schemes also increase with an increment in L. The following table shows the comparison:

Table 2. Qualitative comparison of the three schemes

Scheme      Image Evaluation   Memory                      Computation
Basic       ✔                  q                           ✘
Enhanced    ✔                  q and σ (L, l)              ✘
Practical   ✔                  q and a, b, x_i all < L     L

Figure 7 shows the comparative workloads as a function of n (N in the case of the practical scheme). The value of L is fixed at 20. The range of n is from 10 to 10 and that of N is from 10 to 10. Interestingly, the enhanced scheme becomes better with larger values of n, as should be evident from the fact that N has a much smaller value as compared to n.

Fig. 7. Comparative workloads of the adversary in the three schemes with changing n

Finally we show the interrelationship between l and L in the enhanced scheme. We see that the workload of the scheme increases significantly with lower values of l and higher values of L. This is shown in Figure 8.

Fig. 8. Graph of the adversary's workload in the enhanced scheme with changing l and fixed L

Human identification protocols can be a good alternative to the traditionally less secure password-based systems. Over the years researchers have tried to construct efficient human identification protocols which are secure against passive or active adversaries. However, the protocols fall short in terms of efficiency and security. One such protocol was proposed by us in [1] and its security was based on the "conjectured difficulty" of obtaining the secret after observing some authentication sessions. In this paper, we have extended the work by giving some candidate alternative protocols, finding the exact hardness of these protocols in terms of the effort required by the human adversary, as well as giving a detailed analysis of the protocol proposed in [1]. A brief survey regarding the number of possible features in an image was also carried out. Our results show that a practical implementation of the protocol might be feasible provided we are up against a resource-constrained human adversary.

A notable future line of work is to devise a similar protocol secure against active adversaries. This might involve the human user sending challenges to the server, similar to [19]. However, it remains an open problem whether we can fine-tune the protocols so as to make them secure against active adversaries without increasing the workload on legitimate users. Another direction of future work is to come up with a different model for the distribution of features in images found on the web. This might give a close to realistic quantification of the workload on the adversary.

References

1. Hassan Jameel, Riaz Ahmed Shaikh, Heejo Lee and Sungyoung Lee: Human Identification Through Image Evaluation Using Secret Predicates. Topics in Cryptology - CT-RSA 07, Lecture Notes in Computer Science, Springer-Verlag. (2007) 67–84
2. Matsumoto, T., Imai, H.: Human Identification through Insecure Channel. Advances in Cryptology - EUROCRYPT 91, Lecture Notes in Computer Science, Springer-Verlag. (1991) 409–421
3. Jermyn, I., Mayer, A., Monrose, F., Reiter, M., Rubin, A.: The design and analysis of graphical passwords. 8th USENIX Security Symposium (1999)
4. Wang, C.H., Hwang, T., Tsai, J.J.: On the Matsumoto and Imai's Human Identification Scheme. Advances in Cryptology - EUROCRYPT 95, Lecture Notes in Computer Science, Springer-Verlag. (1995) 382–392
5. Matsumoto, T.: Human-computer cryptography: An attempt. 3rd ACM Conference on Computer and Communications Security, ACM Press. (1996) 68–75
6. Xiang-Yang Li, Shang-Hua Teng: Practical Human-Machine Identification over Insecure Channels. Journal of Combinatorial Optimization. (1999) 347–361
7. Hopper, N.J., Blum, M.: Secure Human Identification Protocols. Advances in Cryptology - Asiacrypt 2001, Lecture Notes in Computer Science, Springer-Verlag.3621