PatternMonitor: a whole pipeline with a much higher level of automation for guessing Android lock patterns from videos
Yangde Wang
Shanghai Jiaotong University
Weidong Qiu
Shanghai Jiaotong University
Yuming Xie
Shanghai Jiaotong University
Yan Zha
Shanghai Jiaotong University
Abstract
Pattern lock is a widely used technique for identity authentication and access authorization on mobile terminal devices such as Android platform devices, but it is vulnerable to attacks proposed by recent research that exploit information leaked by users while drawing patterns. However, existing attacks on pattern lock are environmentally sensitive and rely heavily on manual work, which constrains their practicability. To attain a more practical attack, this paper designs PatternMonitor, a whole pipeline against pattern lock with a much higher level of automation, which extracts guessed candidate patterns from a video containing pattern drawing. Instead of manually cutting the target video and setting thresholds, it first employs recognition models to locate the target phone and the keypoints of the pattern-drawing hand, which enables the gesture to be recognized even when the fingertips are occluded. Then, we extract the frames from the video where the drawing starts and ends. These pre-processed frames are fed into a target tracking model to generate trajectories, which are further transformed into possible candidate patterns by our designed algorithm. To the best of our knowledge, our work is the first attack system to generate candidate patterns by relying only on hand movement instead of accurate fingertip capture. The experimental results demonstrate that our work is as accurate as previous work, giving a success rate of more than 90% within 20 attempts.
Psychological research shows that for the human brain, visual information is more convenient to remember and recall than character and number information (i.e., PIN- or text-based passwords) [16, 38]. Benefiting from the convenience provided by pattern lock, it is widely applied on various mobile terminal devices, including Android devices, to provide identity authentication and access authorization. A recent survey [41] showed that nearly 73% of respondents choose to set up pattern locks on their mobile devices, and mainstream payment applications such as Paypal and Alipay also enable users to choose pattern lock as their individual login method. However, due to the openness of the unlocking scenario, pattern lock users are confronted with various security threats. For the sake of users' security and privacy, it is necessary to reveal the vulnerabilities that pattern lock may encounter in practical scenarios.

Multiple attack systems have been proposed in recent years to exploit the potential vulnerabilities of pattern lock. In 2010, Aviv et al. [4] proposed the smudge attack, which guesses unlock patterns by analyzing the oily residues left on the screen. Zhang et al. [45] designed an attack system which reconstructs unlock patterns by monitoring wireless signals. Zhou et al. [47] developed an app which collects acoustic signals while victims draw the lock pattern, then forwards these signals to a remote server to recover the unlock pattern. However, the practicality of the above attack systems is confronted with restrictions in real-world scenarios. Specifically, the smudge attack may be interfered with by oily residues from the user's historical operations (not necessarily the most recent). The wireless-signal attack requires a complicated network setup, and its success rate is easily disturbed by the environment (e.g., people walking by). The acoustic attack requires that a specific app be installed on the target device, which is unfeasible in reality.
Alternatively, considering that each unlock pattern can be regarded as a combination of finite numbers, it is possible to guess unlock patterns from video footage. In 2017, Ye et al. [41] cracked pattern lock by tracking the victim's fingertip motion from video footage, but their work relies heavily on continuous fingertip motion, which means the attack may fail if the fingertips are not continuously captured by the camera. Moreover, their attack requires significant manual effort, including cutting videos, locating fingertips, deciding the start and end moments of drawing, etc., which implies that their proposed attack system is hard to implement in practical scenarios.

To achieve a more practical attack on pattern lock, this paper guesses the unlock pattern from video footage with recognition models and our designed algorithm, instead of manually locating the fingertips of the person using the target device; in this way we have built the fully automatic attack system PatternMonitor. The advantages of PatternMonitor are as follows. First, as a fully automatic attack system, PatternMonitor prevents the errors and uncertainties caused by manual operation. Second, in comparison with the manual processing in the related literature, the fully automatic mechanism we designed improves the performance of PatternMonitor in processing video footage. According to our experimental implementation, we can process a video containing the whole procedure of unlocking a device (taking out the device, waking up the pattern grid, drawing the pattern, and confirming the device is unlocked) within 60 seconds, which is more time-efficient than manual operation. Third, we provide a novel threat model against pattern lock. The experimental results show that, in addition to fingertips, it is promising to reconstruct the unlock pattern by capturing other keypoints on the hand.
This implies that our work should prompt the academic and industrial communities to reappraise the security of pattern lock, even though it has been extensively applied worldwide.

Contributions.
The contributions of this paper are enumerated as follows.

1. We designed a whole attacking pipeline with a much higher level of automation.
2. We used multi-point tracking of hands to improve the accuracy of recovering user input.
3. We conducted initial experiments under two different recording conditions and showed that our proposed attacking pipeline outperforms state-of-the-art work.

The remainder of the paper is organised as follows: In Section 2 we briefly overview the related work. Section 3 describes the threat model and the attack scenarios in detail. Section 4 elaborates the experimental implementation of our proposed automatic attack system, and in Section 5 we evaluate the experimental results. Section 6 discusses the limiting factors of our proposed attack system and feasible approaches to resist the proposed attack on pattern lock. The last section concludes this paper.

Mobile devices have gained widespread popularity in recent years. However, they often serve in privacy-sensitive environments, and some installed applications [10, 11] which involve sensitive information are also easily exposed to access by unauthorized users.
Threat scenarios
In a nutshell, mobile devices are confronted with the following typical threat scenarios.

Adversary taping: This threat scenario requires an adversary to obtain sensitive information from the victim without the victim being aware of it. Prior work [33] demonstrates that even at a long distance (e.g., 5m/8m), an adversary can still collect enough information to implement the cracking procedure. In addition, the increasing variety of wearable devices [20, 23, 36] makes this attack more convenient and imperceptible.

Surveillance camera monitoring: Surveillance cameras have been widely deployed in public places, and the threat of surveillance cameras to personal privacy has been discussed in [31, 39].

Shoulder surfing: [19, 28] claim that shoulder surfing attacks are more likely to be conducted at a close distance. However, these studies mainly concentrate on peeping at static information, such as instant messages, web pages, etc., while dynamic targets, including password inputting and pattern drawing, are out of consideration (3 unlock patterns and 4 passwords are reported out of 189 samples). Recent works [2, 3, 46] have proposed preventive measures against shoulder surfing, including turning down the screen luminance or changing the operation interface. This paper only sketchily discusses shoulder surfing, since the movement of the fingertip is visible in this scenario.

Cracking lock pattern
This kind of attack can be implemented in various ways. In 2010, Aviv et al. [4] tried to crack lock patterns by collecting information from oily residues left on the screen. The feasibility of this approach is questionable, since users may wipe away the oily residues through their frequent operations on the mobile phone. Besides, this attack can only be successfully implemented on the premise of obtaining the target device. In 2016, Zhang et al. [45] tried to reconstruct lock patterns by monitoring differences in the WiFi signal while the lock pattern is being drawn. This approach is not practical, since the attacker is required to access the router the target device connects to beforehand, then conduct a complicated configuration on it. What's more, this attack approach proved to be environmentally sensitive; that is, its success rate is heavily influenced by changes in the circumstances. In 2017, Zhou et al. [47] came up with a novel approach to reconstruct lock patterns from acoustic signals. However, to hear the acoustic signals, the attacker is required to install a specific app on the target device, so this attack approach is also impractical. Subsequently, Abdelrahman et al. [1] designed a new attack system which monitors thermal information during identity authentication. This attack approach is similar to the smudge attack, but it may be easily disrupted by extra on-screen operations.

In 2018, Ye et al. [42] tried to crack lock patterns from video footage that includes the victim's fingertip motion while drawing the unlock pattern. This is the work most closely related to this paper. However, our work differs from [42] in the following aspects: 1. [42] includes much manual work, such as cutting videos, identifying fingertips, setting phone angles, etc. Our work uses a device recognition module, a hand recognition module, a keypoint tracking module, and a trajectory processing module to do these jobs automatically, which makes it more effective and less ad hoc. 2.
[42] works only when the fingertips can be seen during the whole pattern-drawing procedure, while our work maintains a prioritized list of hand keypoints including the fingertips. Using this list, our tracking module can record each keypoint's motion individually in descending order of priority, and our pattern recognizing module generates guess candidates from the optimized trajectory. 3. [42] makes some assumptions (e.g., before or after unlocking, users tend to pause for a few seconds) to complete the attack. Our work examines this assumption more scientifically, and builds a recognizer based on a machine learning algorithm to automatically locate the start and end frames of drawing the lock pattern.
Video-based attack
A large number of video-based attack approaches have been developed to break identity authentication mechanisms on mobile phones. The attack models presented in [7, 8, 13, 40] can be successfully implemented on the premise of capturing the screen display while a text-based key is typed. Therefore, for a user who intentionally shelters the screen while typing, these attack approaches would not work. Maggi et al. [25] published a feasible attack model which records video while the user is typing, and further extracts private information from the recorded video. To attain this attack, their system benefits from the display feedback mechanism (the enlarged key displayed while it is being typed), and the camera is required to point directly at the screen of the target device. A similar attack on text-form input, including passwords, has been proposed recently by Yue et al. [43, 44]. Their attack is achieved with advanced camera devices such as Google Glass to remotely record the screen and fingertip movements, which significantly increases the stealthiness of the attack. The above works target text-form passwords and PIN codes. However, the security vulnerability of pattern lock, an extremely common identity authentication mechanism, is rarely considered.
Pattern lock is a popular authentication mechanism widely used in mobile devices, both on the unlock screen and in some apps such as Paypal and Alipay, since it is easier to remember and recall than a text-form password [16]. Generally, pattern lock requires the user to configure a graphic pattern that connects a sequence of contact points arranged in a 3 × 3 grid. Figure 1(a) gives a specific instance of the graphic pattern. In practice, a valid graphic pattern should satisfy the following requirements [5]: 1) it must contain at least 4 points; 2) each point may be used only once; 3) it must be entered without lifting the finger; and 4) it may not jump over a previously unselected contact point. Figure 1 shows some patterns that are valid and invalid under these requirements in a 3 × 3 grid.

Figure 1: Some valid and invalid pattern examples. (a) Pattern layout and pattern 1-2-3-4-5-6-7-8-9. (b) Valid pattern examples. (c) Invalid pattern examples.
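The four requirements above can be checked mechanically. The following sketch (ours, not from the paper) validates a candidate sequence against them, with points numbered 1-9 left to right, top to bottom:

```python
# Sketch of the Android 3x3 pattern rules described above (our own code,
# not the paper's). Points are numbered 1-9, left-to-right, top-to-bottom.

def _midpoint(a, b):
    # Return the grid point sitting exactly between a and b, if any
    # (e.g. 2 sits between 1 and 3; 5 sits between 1 and 9).
    (r1, c1), (r2, c2) = divmod(a - 1, 3), divmod(b - 1, 3)
    if (r1 + r2) % 2 == 0 and (c1 + c2) % 2 == 0:
        return ((r1 + r2) // 2) * 3 + (c1 + c2) // 2 + 1
    return None

def is_valid_pattern(seq):
    # Rules 1 and 2: at least 4 points, no point used twice.
    if len(seq) < 4 or len(set(seq)) != len(seq):
        return False
    visited = set()
    for a, b in zip(seq, seq[1:]):
        m = _midpoint(a, b)
        # Rule 4: a segment may not jump over a still-unselected point.
        if m is not None and m not in (a, b) and m not in visited:
            return False
        visited.add(a)
    return True
```

For instance, 1-3-2-6 is invalid (the 1-to-3 segment jumps over the unselected point 2), while 2-1-3-6 is valid because 2 has already been selected.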
According to recent research [12], nearly 40% of users set a pattern lock rather than a PIN as the identity authentication on their mobile devices. Meanwhile, the cameras on personal mobile devices such as phones and pads have ever more powerful photo and video capabilities. In addition, for the sake of public safety, surveillance cameras have been increasingly deployed in public places [9, 18], which also threatens the security of pattern lock. We therefore describe two typical threat scenarios against the security of pattern lock which are rarely mentioned in previous research.
As [18] reports, the number of surveillance cameras is growing all over the world. These cameras are deployed not only in public places, but also in commercial venues such as coffee houses, hotels, and malls. In these places, surveillance cameras are usually set up near the ceiling, that is, about 3 metres above the floor. We choose this scenario as a typical threat for two reasons: 1) Surveillance cameras set up in these places are easy to compromise [14]. Since employees of these places often lack security knowledge, hackers may take advantage of this fact and break into the cameras to steal the video stream. 2) Users in these places may not be aware of being monitored by surveillance cameras, which leads them to let their guard down and unlock their devices casually, i.e., just drawing the unlock pattern without any cover. An example of this scenario is illustrated in Figure 2(a).

3.2.2 Scenario 2: Face-to-face taping
The biggest difference between this scenario and Scenario 3.2.1 is that the screen of the target device and the fingertips of the user's pattern-drawing hand are all inevitably missing. To the best of our knowledge, ours is the first work that can generate pattern guess candidates only on the basis of observed hand movements. Figure 2(b) gives an example of this scenario. It shows that, when the victim draws a pattern, an adversary may sit not far from him, taping the victim's actions. In this scenario, the victim usually has an illusion of self-security for the following reasons: 1) He/She may believe that the screen of his/her phone and fingers are not visible. 2) The adversary can conceal his actions in diverse ways, such as using small wearable devices to tape, or pretending to watch movies while actually taping with a mobile phone. (a) Scenario 1: Surveillance camera monitoring. (b) Scenario 2: Face-to-face taping.
Figure 2: Examples of attack scenarios discussed in our paper.
To overcome the flaws of prior work which make the attack inapplicable, we need to address some new challenges:
Challenge 1:
TBD
Challenge 2:
TBD
Challenge 3:
Which points on the hand can be considered key points? Will different key points affect the accuracy of tracking hand motion? In particular, in Scenario 2, since we cannot see the fingertips, how do we track hand motion?
Challenge 4:
How do we locate the frames in which the user starts drawing the pattern, and the ones in which the user finishes drawing?
Challenge 5:
Under both scenarios, the camera, whether a surveillance camera or a hand-held device, has a vertical offset from the target device, so we cannot simply use an affine transformation to transform the trajectory into the pattern from the user's perspective. How, then, can we generate guess candidates from the trajectories?
To illustrate how our PatternMonitor works, we make several reasonable assumptions, which clarify the attack scenario but do not simplify it, as follows:
Assumption 1:
The user uses pattern lock as the authentication method; that is, to unlock the screen, he will perform the following actions in sequence: (1) take out the target device (i.e., mobile phone); (2) wake the device; (3) start drawing the pattern.
Assumption 2:
The layout of the grid on the target device is 3 × 3, since most modern mobile phones use this layout and do not offer alternative options (e.g., 4 × 4 or 6 × 6).

Assumption 3:
The video footage should contain parts of the drawing hand, but the fingertips, the console's geometry, and the content displayed on the screen are not necessary. We consider this assumption reasonable because people may cover the drawing fingertip but not the whole drawing hand. However, the lack of sight of the fingertips, knowledge of the console's geometry, and content displayed on the screen puts prior video-based work on pattern lock [4, 32, 41, 42] out of action.
Assumption 4:
We assume that the user draws the correct pattern. Although it is possible that the user draws a wrong pattern by mistake, we mainly focus on how to automatically guess the correct pattern from the user's motion, so incorrect motion is not under our consideration. Furthermore, we also ignore the situation in which the user just takes out his device but does not unlock it (e.g., to sweep dust off the screen).
Assumption 5:
We assume that the head of the device points in the same direction as the user's face. With improving head pose estimation techniques [27, 30], we can easily get the direction of the user's head, so we use this direction as the target device's orientation.
Assumption 6:
For the attack scenario mentioned in 3.2.1, we only consider surveillance cameras deployed in relatively small indoor spaces, such as coffee houses or malls, not in open public places or big indoor spaces such as railway stations or airports. We make this assumption because we do not have the professional cameras used in those places, which can record vivid videos even when deployed very high up (e.g., 5 metres or higher). In other words, we only consider surveillance cameras mounted at a height of about 3 metres.
In this section, we first sketchily describe the workflow of thePatternMonitor, then elaborate the detailed implementationsteps of our attack system.
Figure. 3 shows the workflow of our system. Videos thatcontain an unlocking process filmed by surveillance camerasor face-to-face shooting are input into the system. To analysethe video footage, our system uses the following steps toautomatically generate candidate patterns.
Locate start point of drawing unlock pattern.
The input video usually does not perfectly cover just the scene in which the victim draws the graphic pattern; it also contains irrelevant frames. In this step, we extract the frames in which the user starts to draw the pattern via the following substeps: 1) Use an object detection algorithm to identify the target phone. 2) Magnify the area of the bounding box of the target phone and start to detect the user's drawing hand. 3) Determine whether the user is operating the mobile phone from the relative position between the phone and the keypoints of the hand. Examples of these substeps are shown in Figures 5(c), 5(d), and 5(e). After these three substeps, the approximate starting position of drawing the unlock pattern in the video can be considered found.
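The third substep can be sketched as follows. This is a hypothetical simplification of ours: the keypoint format (pixel coordinates), the box format, and the threshold of 5 keypoints are our assumptions, not values given in the paper.

```python
# Hedged sketch of substep 3: declare the start of drawing once enough
# detected hand keypoints fall inside the (enlarged) phone bounding box.
# Box format (x1, y1, x2, y2) and min_inside=5 are illustrative choices.

def inside(box, pt):
    (x1, y1, x2, y2), (px, py) = box, pt
    return x1 <= px <= x2 and y1 <= py <= y2

def drawing_started(phone_box, hand_keypoints, min_inside=5):
    hits = sum(inside(phone_box, kp) for kp in hand_keypoints)
    return hits >= min_inside
```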
Track hand motion.
Once the start of drawing is found and the keypoints of the hand are detected, tracking algorithms can be employed to locate the keypoints in each successive frame. Using the relative position between the phone and the hand, a trajectory which reflects the user's pattern drawing is produced.
Optimize trajectory.
After generating the trajectory of hand motion, we extract the turning points from the trajectory in this step to make identifying the lock pattern more efficient. The original trajectory can be approximated by a few turning points, and the turning points correspond to the ciphers. For example, the trajectory in Figure 8 can be approximated by 7 points.
Generate candidate patterns.
In this step, the processed trajectory (turning points) is mapped to possible patterns. We divide the trajectory into small parts, and each part is mapped to ciphers. By combining these small parts, we obtain all possible results. The candidate patterns are sorted according to their confidence and tried one by one on the target device.
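To make this mapping step concrete, here is an illustrative sketch of our own (a simplification, not the paper's exact algorithm): snap each turning point of a trajectory, normalised to grid coordinates, onto the nearest dot, yielding one candidate cipher sequence. PatternMonitor keeps several nearby dots per point and combines them into a ranked candidate list, whereas this sketch produces only the single best candidate.

```python
# Simplified sketch (ours): grid dots at unit spacing, cipher n at
# coordinates x = (n - 1) % 3, y = (n - 1) // 3.

def nearest_cipher(pt):
    px, py = pt
    return min(range(1, 10),
               key=lambda n: (px - (n - 1) % 3) ** 2 + (py - (n - 1) // 3) ** 2)

def turning_points_to_pattern(points):
    pattern = []
    for p in points:
        c = nearest_cipher(p)
        if not pattern or pattern[-1] != c:   # drop consecutive duplicates
            pattern.append(c)
    return pattern
```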
The first big challenge PatternMonitor faces is how to automatically locate the frames where the target user starts to draw the unlock pattern. Previous video-based attacks on patterns [40, 41] find the video segment based on the hypothesis that the user's fingertip often pauses for nearly 1.5 seconds before and after unlocking. However, as shown in Figure 4, our analysis of our own data sets shows that, although drawing habits vary between users, there is no reliable 1.5-second pause for judging whether the user starts or ends drawing. In the following, we give a new method to locate the frames where the target user is about to draw the pattern. Unlike previous works, our attack system only gives a rough start point, but it exhibits excellent experimental results.

In both scenarios, we apply an object detection algorithm to detect phones in each frame of the video. One example of the workflow of the attack in Scenario 1 is shown in Figure 5. Once the target phone is detected, we magnify the bounding box of the phone into a hand-detection area and use OpenPose [34] to identify keypoints of the user's hand only within this scope. If PatternMonitor finds that the predefined keypoints appear in the hand-detection area, we consider that the user is about to draw the unlock pattern, and these frames are labelled as the start of unlocking. More details on the implementation are described as follows.
Detect phone and phone corner
To identify phones appearing in videos, we introduce YOLOv3 [29], a state-of-the-art real-time object detection system, to get this job done. First, we took 1,000 pictures of mobile phones of different models as the phone training set. We also took another 500 pictures of mobile phones and manually labelled the phone corner in them as another part of the training set, because we also need to automatically mark the phone's corner as the reference for the keypoints of the drawing hand when generating trajectories. Second, to improve the generalization capability of YOLOv3, we applied 20 types of image augmentation to the original training set of 1,500 images, giving us 30,000 images in total to train YOLOv3: 20,000 images with phones only and 10,000 images with both phones and corners.
Detect keypoints of drawing hand
OpenPose [34] is a mature model which uses a multi-camera system to train fine-grained detectors for keypoints that are prone to occlusion, such as the joints of a hand. This is similar to finding keypoints in face detection or body estimation [34, 37], but different from hand detection, which treats the whole hand as one object. The model produces 22 keypoints: the hand has 21 points, while the 22nd point signifies the background. The points are shown in Figure 6(a).

If only one hand appears in the detection area, as Figure 6(b) shows, the location of each keypoint can easily be detected. However, in practice, as Figure 6(c) shows, users tend to use one hand to hold the phone and the other hand to draw the unlock pattern. In that case, OpenPose will give multiple interest areas, since it is not able to distinguish between the right hand and the left hand. If that happens, we simply return all interest areas of keypoints and use the algorithm described in Section 4.3 to exclude the wrong areas by collecting tracking information.
In this module, PatternMonitor takes as input a set of frames in which the user is about to draw the unlock pattern, with the phone, the corner of the phone, and the keypoints of the drawing hand labelled in each of them, and generates trajectories of validated keypoints relative to the corner of the target phone by using a tracking algorithm named CSRT [24].
Selection of keypoints on drawing hand
Figure 3: The workflow of the attack. An adversary takes a video of a user's pattern input, feeds it into PatternMonitor, and then gets the possible patterns.

Figure 4: The cumulative distribution function (CDF) of the time interval between pattern drawing and other on-screen activities.

One of the big differences between Scenario 1 and Scenario 2 is the visible parts of the drawing hand. In Scenario 1, usually all parts of the hand are visible throughout the unlocking process, while in Scenario 2, only parts of the hand can be seen. Under this situation, we select the keypoints labelled 6, 7, and 8 as alternatives in Scenario 1, and 18, 19, and 20 as alternatives in Scenario 2. Attackers can also choose other keypoints based on the video they get; we use this configuration only to show how PatternMonitor works. In Section 4.5, we give a way to combine the trajectories to generate more reliable candidate patterns.
Selection of object tracking algorithm
Many object tracking algorithms have been proposed, such as TLD, BOOSTING, and CSRT. Previous works [32, 33, 41] all used TLD as their tracking tool, but we decided to use CSRT instead for the following three reasons: 1) To make our system completely automatic, no manual work should be involved in tracking; however, TLD needs manual checking when the tracking results have low confidence [32]. 2) Most pattern drawing processes last no more than 10 seconds, so our task can be seen as a short-term tracking problem; CSRT is designed for short-term tracking, while TLD targets long-term tracking. 3) Our work has high requirements for the precision of object tracking, and on the OTB100 benchmark, CSRT scores highest on the average precision plot [24].
Generate trajectory and check validity
Given frames with the phone, the corner of the phone, and the keypoints of the hand labelled in them, we use CSRT to track the motions of the corner of the phone and the keypoints of the hand, then generate the raw trajectory by calculating the relative distance between the two bounding boxes frame by frame. Note that these frames may still contain wrong areas which do not belong to the user's drawing hand; we can exclude these areas with two assertions: 1) the bounding box of a wrong area always moves only slightly; 2) the bounding box of a wrong area always moves differently from the others. Then, we adjust the raw trajectory by excluding frames before the bounding boxes of the hand keypoints enter the phone area and after they leave. Note that this raw trajectory still includes extra user actions, such as sliding the screen to activate the phone or moving outside the phone after drawing the pattern; we will use another algorithm mentioned in ?? to deal with it. However, this tracking process may fail if XXXXXX. To clarify how this module works, we use Figure 22 to demonstrate the whole process, and Figure 7 gives examples of tracking results under the two scenarios mentioned in Section 3.2.

Since raw trajectories always contain noise and redundancy, it is hard to generate candidate patterns directly from them. In this step, we describe in detail how we optimize trajectories to raise the efficiency of pattern identification. As shown in Figure 8, one trajectory can be approximated by some turning points. A line segment defined by two turning points is considered the smallest constituent part of a pattern. Based on these line segments, we can use our algorithm mentioned in Section 4.5 to generate candidate patterns. Our goal in this step is to extract turning points from the raw trajectory and to handle overlapping trajectories in some circumstances. We employ the Ramer-Douglas-Peucker (RDP) algorithm [17] to extract turning points from the trajectory.
RDP is an algorithm that reduces the number of points in a curve while keeping its shape, so we can use the line segments RDP produces to approximate the raw trajectory. Figure 8 shows how RDP works: 1) First, we have a start point (labelled 0) and an end point (labelled 6), and RDP uses these two points to form a line segment.

Figure 5: Find-start workflow. (a) Process the video stream in real time to judge whether a mobile phone appears in the video. (b) The mobile phone is detected and the system returns its bounding box. (c) Enlarge the bounding box as the detection area. (d) Detect the number of hand keypoints in the detection area. (e) The start of unlocking is located when the number of hand keypoints in the detection area is greater than a threshold.

Figure 6: Examples of finger keypoints. (a) The 21 hand keypoints in HandPose. (b) The probability map of the 6th, 7th, and 8th keypoints used in Scenario 3.2.1. (c) The probability map of the 17th, 18th, and 19th keypoints used in Scenario 3.2.2.
2) Consider all other points in the trajectory: the point with the longest distance to the current line segment is added as a new turning point, forming new line segments with the existing vertices, provided that distance reaches a predefined threshold. As Figure 8 shows, the point labelled 2 is considered a new turning point, and the new line segments 0-2 and 2-6 are formed. 3) Repeat substep 2) until no more points qualify as turning points, as Figure 8 shows. After the RDP algorithm is done, we have all the turning points which form the trajectory.

Intuitively, the trajectory of a pattern that has overlapping line segments is an obstacle to extracting turning points with RDP, but in practice, RDP still works well in this situation. There are two kinds of trajectories with overlapping: 1) overlapping at the start or end (or both) of the trajectory; 2) overlapping in the middle of the trajectory. Our RDP-based algorithm can deal with the second kind of overlapping trajectory directly. For the first kind, note that the raw trajectory still includes some redundant frames, i.e., the process of sliding the screen to activate the phone and moving outside the phone after drawing the pattern, so the RDP algorithm can still find the turning points when overlapping occurs at the start or end. Take Figure ?? for example: point 2 would not be considered a turning point because its distance to line 0-3 is near zero. We solved this problem by keeping the redundant trajectories at the beginning and end.

In this step, the optimized trajectory is mapped to possible lock patterns. Our approach generates as many candidate patterns as possible and sorts them by confidence in descending order. [41] used the geometric information of the whole trajectory to get the most likely patterns. But for PatternMonitor, definite start and end points of drawing are hard to get, so we use another method instead.
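The turning-point extraction described above can be sketched in a few lines. This is a generic RDP implementation, not the paper's code; the parameter `epsilon` plays the role of the predefined distance threshold:

```python
import math

# Generic Ramer-Douglas-Peucker sketch: keep the point farthest from the
# chord if its distance exceeds epsilon, then recurse on both halves.

def _dist_to_segment(p, a, b):
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def rdp(points, epsilon):
    if len(points) < 3:
        return list(points)
    # Find the point farthest from the chord between the endpoints.
    dmax, idx = 0.0, 0
    for i in range(1, len(points) - 1):
        d = _dist_to_segment(points[i], points[0], points[-1])
        if d > dmax:
            dmax, idx = d, i
    if dmax <= epsilon:
        return [points[0], points[-1]]
    # Split at the farthest point and merge the two simplified halves.
    return rdp(points[:idx + 1], epsilon)[:-1] + rdp(points[idx:], epsilon)
```

For example, `rdp([(0, 0), (1, 0.05), (2, 0), (3, 1)], 0.2)` keeps the corner at (2, 0) and drops the nearly collinear point (1, 0.05).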
Relationship between optimized trajectory and real pattern
To generate candidate patterns, we need to understand the relationship between the optimized trajectory and its real pattern, so we designed three experiments for this purpose.
Experiment 1: Relationship of LSL between trajectory and real pattern
Due to different taping angles, whether using hand-held devices or surveillance cameras, the LSL (line segment length) deforms inevitably. In this experiment, we calculate the relationship of the LSL between the optimized trajectory and its real pattern. We define the LSL of the real pattern as follows: the distance between adjacent dots, horizontally or vertically (but not at a 45-degree angle), is 1. Under this definition, we can get all possible distances between two points: Distance-Set = {1, √2, 2, √5, 2√2}, which correspond to OA, OC, OA′, OB, OC′ shown in Figure 9 respectively. Then we record every LSL in the trajectory, normalise them by the standard length, and use the Gaussian kernel function [6] (see Equation 1) to calculate the density distribution of each type of LSL. The experimental results are shown in Figure 10(a).

k(x, x′) = e^(−‖x − x′‖² / σ²)    (1)

Experiment 2: Relationship of angles between trajectory and real pattern
Figure 7: Typical tracking scenarios and the trajectory results. The blue rectangles in (a) and (c) are bounding boxes. (b) is the trajectory of (a), whose pattern is 2-7-3-6-9; (d) is the trajectory of (c), whose pattern is 5-2-9-8-6-7-4.

In this experiment, we try to find the relationship of angles between the trajectory and the real pattern. As in Experiment 1, we first define the angle: the smaller angle formed by two intersecting line segments. Under this definition, every angle we discuss is less than 180°. Furthermore, all possible angles are Angle-Set = {18°, 27°, 37°, 45°, 53°, 63°, 72°, 90°, 117°, 135°}, which correspond to the angles marked in Figure 9. We again use the Gaussian kernel function to calculate the density distribution of each type of angle; the results are shown in Figure 10(b).

Intuitively, the flatter a distribution is, the larger the variance of the actual value. If the variance of a feature's distribution is small and its mean is close to the standard value, that feature is more effective for our algorithm. The results show a significant difference between the individual distributions of edge length and of angle in possible pattern locks, and the mean of each distribution is very close to its standard value.
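The kernel density estimates used in Experiments 1 and 2 (Equation 1) can be sketched in a few lines; the bandwidth σ and the sample data below are assumptions for the demonstration, not the paper's values.

```python
# Illustrative sketch of the density estimate: each observed
# line-segment length is normalised by the standard length, then a
# Gaussian kernel (Eq. 1) is centred on every sample.
from math import exp

def gaussian_kernel(x, x_prime, sigma=0.1):
    # Eq. 1: k(x, x') = exp(-||x - x'||^2 / sigma^2), here in 1-D.
    return exp(-((x - x_prime) ** 2) / sigma ** 2)

def density(samples, x, sigma=0.1):
    """Kernel density estimate of the samples at position x."""
    return sum(gaussian_kernel(x, s, sigma) for s in samples) / len(samples)

# Hypothetical normalised lengths clustering near the standard value 1.0:
lengths = [0.95, 1.02, 0.98, 1.40, 1.44]
```

With these samples the density is higher near 1.0 than near 1.42, because more observations sit around the first standard value.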
Experiment 3: Correlation of LSL and angle
Now we have two features that can be used to transform the optimized trajectory into candidate patterns. In this experiment, we therefore test whether one feature can substitute for the other. We introduce the Kendall [22] (see Equation 2) and Spearman [35] (see Equation 3) methods to calculate the correlation between these two features, i.e., LSL and angle. Furthermore, we also use these two methods to calculate the correlation between the LSL of the trajectory and that of its real pattern, and likewise for the angle, as a supplement explaining why the LSL and angle detected from video can be used to generate candidate patterns.

R = 4P / (n(n − 1)) − 1   (2)

ρ = 1 − 6·Σ dᵢ² / (N(N² − 1))   (3)
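Both rank-correlation coefficients (Equations 2 and 3) can be sketched in plain Python; the sketch below ignores ties, and any sample data used with it are invented.

```python
# Minimal implementations of the two rank-correlation measures
# used in Experiment 3 (no tie handling).

def rank(values):
    """1-based ranks of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for pos, i in enumerate(order):
        r[i] = pos + 1
    return r

def spearman(x, y):
    """Eq. 3: rho = 1 - 6 * sum(d_i^2) / (N * (N^2 - 1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rank(x), rank(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

def kendall(x, y):
    """Kendall tau via concordant/discordant pair counting."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            concordant += s > 0
            discordant += s < 0
    return (concordant - discordant) / (n * (n - 1) / 2)
```

Perfectly monotone data give +1 under both measures, and a reversed ordering gives −1, which is the sanity check behind Table 1.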
Figure 8: Using the RDP algorithm to identify the turning points in a trajectory. There are 222 points in the original trajectory; 7 turning points are identified by the RDP algorithm.

Figure 9: All possible angles in patterns.

As Table 1 shows, under both the Kendall coefficient and the Spearman coefficient, the line segment length feature and the angle feature each have a strong correlation with their corresponding standard values, but there is little correlation between them. In other words, LSL and angle are both important features for generating candidate patterns, and because of their low mutual correlation, they cannot replace each other.

Novel method to generate candidate patterns
We proposed a novel method to generate candidate patterns based on the information we have: a trajectory with some redundancy, a sequence of turning points, and the knowledge mentioned before.

Figure 10: The distribution of LSL and angle in trajectories. (a) The distribution of LSL; (b) the distribution of angle.

Factor           Kendall       Spearman
Length           0.66859106    0.81089916
Angle            0.9052319     0.9785986
Length & Angle   -0.0596755    -0.0868365

Table 1: The correlation of distance and angle with their standard values, and the correlation between them.

In our method, we first treat each three sequential turning points as a unit and get all possible patterns that match the shape these three turning points form. Then we use a three-point-sized window to traverse the turning points; each group of three points is denoted as a pair of two vectors. For example, from three points {(x1, y1), (x2, y2), (x3, y3)} we get a combination of three two-dimensional vectors {a = (x2 − x1, y2 − y1), b = (x3 − x2, y3 − y2), c = (‖a‖, ‖b‖)}.

There are 504 possibilities for a three-digit cipher in total. Some ciphers, like those shown in Figure 12, may pass four or five keys. Standard vector pairs corresponding to the ciphers are generated according to the layout of the pattern lock. We calculate the similarity between a standard unit (u, v, w) and a unit (a, b, c) extracted from the trajectory based on the cosine similarity of vectors.
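The three-point sliding window just described can be sketched as follows; the function names are ours, not the paper's.

```python
# Sketch: consecutive turning-point triples -> vector units.
# Each unit is (a, b, c): two direction vectors and a length vector.
from math import hypot

def to_unit(p1, p2, p3):
    a = (p2[0] - p1[0], p2[1] - p1[1])
    b = (p3[0] - p2[0], p3[1] - p2[1])
    c = (hypot(*a), hypot(*b))  # length relation of a and b
    return a, b, c

def units(turning_points):
    """Slide a three-point window over the turning points."""
    return [to_unit(*turning_points[i:i + 3])
            for i in range(len(turning_points) - 2)]
```

Four turning points thus yield two overlapping units, which is what lets adjacent units be stitched together later.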
Figure 11: The example of deformation between the standard pattern and the trajectory. (a) The standard trajectory of example (278369); (b) the trajectory of example (278369) in scenario 1.
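The unit-similarity scoring (Equation 4) can be sketched as below. The source rendering of Eq. 4 is partly garbled, so the averaging of the two direction cosines is our reading; θ = 0.9 follows the text, and all names are ours.

```python
# Hedged sketch of the unit-similarity score: the two direction
# cosines are averaged and weighted by theta, the length-relation
# cosine by (1 - theta).
from math import sqrt

def cos_sim(p, q):
    dot = p[0] * q[0] + p[1] * q[1]
    return dot / (sqrt(p[0] ** 2 + p[1] ** 2) *
                  sqrt(q[0] ** 2 + q[1] ** 2))

def unit_similarity(unit, standard, theta=0.9):
    """Similarity between a trajectory unit and a standard cipher unit.

    Each unit is (a, b, c): two direction vectors plus c = (|a|, |b|).
    Returns None when the cipher is pruned as impossible.
    """
    a, b, c = unit
    u, v, w = standard
    sims = (cos_sim(u, a), cos_sim(v, b), cos_sim(w, c))
    # Prune: any negative cosine marks the cipher as impossible.
    if any(s < 0 for s in sims):
        return None
    return theta * (sims[0] + sims[1]) / 2 + (1 - theta) * sims[2]
```

Identical units score (numerically) 1.0, and a unit whose first direction is reversed is pruned immediately, mirroring the early-abandon rule in the text.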
Figure 12: Some examples of vector units that pass more than three dots. (a) Vector units passing four dots; (b) vector units passing five dots.

S = (θ/2) · (u·a / (‖u‖‖a‖) + v·b / (‖v‖‖b‖)) + (1 − θ) · w·c / (‖w‖‖c‖),  0 ≤ θ ≤ 1   (4)

To reduce computational complexity, if the cosine similarity of any vector in a unit to the corresponding vector of a cipher is less than 0, we consider that cipher impossible and abandon it. The vector c is mainly concerned with the length relation between a and b, so a weighting factor θ is introduced when accounting for its contribution to the similarity. In our initial tests, the length relation was not a good basis for similarity, because lengths are easily influenced by the filming conditions and the user's subjective habits, and the ratio of lengths varies little across different ciphers; the weighting factor θ is therefore set to 0.9.

For each unit, we compute its similarity to up to 504 possible ciphers; impossible ciphers (any of the three terms less than 0) are dropped. The possible patterns are then reconstructed by combining units together. Because the head and tail of the trajectory may not belong to the pattern drawing, the process starts from the middle unit and extends to both sides. At the beginning, the possible patterns for the middle unit are generated and stored with their confidence (similarity times weight) in a set P. When a new unit arrives, the set is updated; Algorithm 4 describes the update. After going through all the units, the possible patterns are sorted from high to low confidence.

Improved schemes
In order to improve the success rate of restoring the password within fewer attempts, two improved schemes are applied.

First, we propose a consensus algorithm to exclude wrong candidate patterns. As the trajectory deformation caused by the camera angle does not affect whether edges intersect with each other, the intersections between the edges of a candidate pattern should match the intersections between the edges of the trajectory. As described in Algorithm 1, LS[] is the edge array of line segments, P[] is the set of possible patterns, tDict is the intersection dict of the trajectory, and pDict is the intersection dict of a candidate pattern. Variable d is the difference of indexes in the edge array. First, we parse the intersection dict of the trajectory and of every pattern in the candidate set. If pDict matches tDict, the pattern is kept; otherwise it is discarded. The way we parse an intersection dict is summarized in Algorithm 2: variable d is again the difference of indexes in the edge array, and c is the string that records intersections. For each d, if P[i] intersects P[i + d], c appends 'T', otherwise it appends 'F'. In the end, for each d, we get an intersection string c.

Algorithm 1
Consistency Verification
Input: LS[]: the set of line segments (edge array); P[]: the set of possible patterns;
Output: N[]: a new set of possible patterns;

tDict ← parseIntersect(LS[])
for each pattern ∈ P[] do
    Lt ← parsePatternToLineSegment(pattern)
    pDict ← parseIntersect(Lt)
    match ← True
    for d ∈ [...] do
        if pDict[d] ∉ tDict[d] then
            match ← False
        end if
    end for
    if match then
        N.append(pattern)
    end if
end for
return N

Algorithm 2
Parse Intersection
Input: P[]: the set of line segments (edge array);
Output: intersectDict: a dict of intersection strings;

for d ∈ [...] do
    c ← CountIntersect(d)
    intersectDict(d) ← c
end for
return intersectDict

Figure 13: Some patterns that contain intersections.

As mentioned in Subsection 4.3, we also track other hand keypoints and obtain trajectories from them to generate candidate patterns. For the candidate patterns generated by different trajectories, the confidence of each pattern is accumulated, and finally the candidate patterns are sorted to obtain the final result. By applying these two schemes, we can crack the lock pattern in fewer attempts; the comparison of results with and without the two schemes is presented in Section 5.6.

In this section, we demonstrate the simulation experiments and their results. The experimental results reported in this section are based on data collected from the authors only; we plan to recruit a number of human participants to validate the performance of the proposed attack pipeline with more realistic data from real-world users. Based on these test samples, we can guess over 90% of videos filmed in scenario 1 and 60% in scenario 2 within ten attempts.
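The edge-intersection test that underlies the consensus scheme of Algorithms 1 and 2 can be sketched with a standard orientation check; this is our own illustration, not the paper's code, and it deliberately ignores collinear edge cases.

```python
# Two segments properly intersect iff the endpoints of each
# straddle the line through the other (opposite orientations).

def orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p)."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)

def segments_intersect(s1, s2):
    p1, p2 = s1
    p3, p4 = s2
    return (orient(p1, p2, p3) != orient(p1, p2, p4) and
            orient(p3, p4, p1) != orient(p3, p4, p2))

def intersect_string(segments, d):
    """For edge-index difference d, record 'T'/'F' per edge pair,
    mirroring the string c built in Algorithm 2."""
    return ''.join(
        'T' if segments_intersect(segments[i], segments[i + d]) else 'F'
        for i in range(len(segments) - d))
```

Comparing the strings produced for a candidate pattern against those of the trajectory is exactly the keep/discard decision of Algorithm 1.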
Figure 14: The distribution of pattern complexity scores.
Pattern selection
TBD
Device selection
TBD

Attack scenarios
As mentioned in [cite, threat model], our system focuses on two main scenarios; the detailed settings are as follows.

Surveillance cameras: To simulate surveillance cameras, we use a tripod to fix the video-taping devices 2.5 meters above the ground. The horizontal distances from the taping devices are 1 meter, 3 meters, and 5 meters, and the directions include in front of and to the left of the volunteer. All volunteers draw patterns in the same position, sitting near a table with the target phone laid on it.

Face-to-face: To simulate this scenario, the video-taping device is held by another volunteer at the same vertical height as the target device. The horizontal distance between the two devices varies among 2 meters, 3 meters, 5 meters, and 10 meters. The target device is held by a volunteer sitting near a table.

To summarize, we record xx videos with xx different patterns in total.

In our experiment, over 99.3% of the video samples in scenario 1 can be successfully cracked within 20 attempts. In scenario 2, we can crack over 92% of the video samples within 20 attempts. The relation between the number of attempts and the success rate is shown in Figure 15.

Figure 15: The successful cracking rate for different numbers of attempts.

TBD
In both scenarios, the vertical distance between the camera and the target phone is relatively fixed, so we mainly consider the impact of horizontal distance on the success rate. There are four distance settings: 1m, 2m, 3m, and 5m.

As shown in Figure 16, for samples filmed by the surveillance camera, we get the highest success rate when the horizontal distance is 1m. As the distance increases, the success rate goes down, both within five attempts and within 20 attempts. In the face-to-face shooting scenario, the success rate is not affected by distance.

Figure 16: The success rate at different shooting distances. (a) Scenario 1; (b) Scenario 2.
In scenario 1, as mentioned in 3.3, the orientation of the phone can be identified by detecting the orientation of the human face. There is no need for PatternMonitor to transform the phone in the video into a completely frontal view. Knowing the head orientation of the phone, even if the identified orientation differs somewhat from the real one, PatternMonitor can still successfully restore the pattern lock.
Figure 17: Impact of phone grip angle.

In scenario 2, the phone's screen is projected onto the vertical plane at an angle that depends on how the user holds it. In the experiment, we found that it was difficult for users to keep the phone in a fixed position while entering the pattern, and difficult to determine the angle between the phone and the vertical direction. Since we track the visible part of the hand, which is not in direct contact with the phone screen in scenario 2, the grip angle of the phone does not directly affect the success rate. Nonetheless, the impact of this angle on the success rate cannot be ignored. We compared the success rates by dividing the samples into three categories according to the degree of skew. The results are shown in Figure 18.

Figure 18: Success rate vs. number of attempts for the three skew categories.

As mentioned in 5.1, we calculate the complexity score of each pattern. In this experiment we divided all the patterns into three groups: complex, medium, and simple. The success rate for each type of pattern is shown in Figure 19.

Figure 19: The successful cracking rate for different types of patterns (simple, medium, complex).

For the two main factors that affect password complexity, length and whether the pattern crosses itself, we also set up an experiment to study each factor's impact on the success rate.

            long    short
cross       90%     80%
no cross    95%     90%

Table 2: The successful cracking rate for patterns of different lengths, with and without crossings.

There are some adjustable parameters in our system, such as the number of frames detected and the keypoints in the hand. In this section, we evaluate the effects of different settings on the success rate.
Frame Detected
In our approach, once a phone is detected, the following several frames are used as input images to detect the phone and the hand. Then the frame whose detection results have the highest confidence is chosen as the tracking start frame. The number of frames is set to 30 in our overall experiment, but other options remain: when the number is too small, the detection results may be unsatisfactory; when the number is too large, tracking may start after the pattern has already been drawn. So we ran the experiment with different settings, as Figure 20 shows.

Figure 20: The successful cracking rate for different numbers of frames detected.
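The start-frame choice above can be sketched as follows; `detect` is a hypothetical stand-in for the paper's phone/hand detector, and the returned dictionary shape is an assumption.

```python
# Among the first n frames after the phone is first detected, keep
# the frame whose detection confidence is highest.

def best_start_frame(frames, detect, n=30):
    """Return (index, detection) with the highest confidence score."""
    best_idx, best_det = -1, None
    for i, frame in enumerate(frames[:n]):
        det = detect(frame)  # e.g. {'confidence': 0.87, ...}
        if det and (best_det is None or
                    det['confidence'] > best_det['confidence']):
            best_idx, best_det = i, det
    return best_idx, best_det
```

This makes the trade-off in the text concrete: a larger `n` gives the detector more chances, but risks starting the tracker after drawing has begun.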
KeyPoints
In scenario 1, the tip of the index finger is the target we chose for tracking, because its motion reveals the pattern drawing quite well. Other parts of the hand can also be treated as targets, such as the second and first joints of the index finger. We chose different parts as the tracking target and measured their influence on the success rate.

Figure 21: The successful cracking rate using different keypoints.
Our approach works well under the assumptions mentioned in Section 3.4; however, it does have some limitations. First, in some places (e.g., railway stations or airports), surveillance cameras may be deployed much higher than our assumption. In that case, the drawing-hand identification and object-tracking algorithms may lose efficacy, which can render our attack invalid. Second, although our approach does not require the fingertip to be visible in the video, we still need parts of the drawing hand to appear continuously in the video, as we have no prediction algorithm for missing keypoints. Third, some people have different drawing habits. For example, a user may have a very simple locking pattern that can be drawn with finger motion alone, while the hand barely moves. In that case, our approach may lose efficacy.
Our work clearly demonstrates the threat of video-based attacks on pattern lock. A potential defence against these attacks is to remember to cover the drawing hand when entering a pattern, or to input a text-based password instead, whether in a public place or in a small room: losing sight of the keypoints always has a large effect on these attacks. Another possible defence is to use multiple authentication methods to protect personal information. In recent years, authentication based on biological characteristics such as fingerprint [26], iris [15], and face [21] recognition has been widely used. These bio-based authentication methods may be vulnerable to other attacks, but they are relatively secure against video-based attacks. So a better way to protect personal information may be multi-factor authentication.
In this paper, we proposed an automatic video-based attack on pattern lock. Our experiments showed that, although prior work has given evidence that pattern lock is vulnerable, the actual threat is still underestimated: an experienced adversary can use different channels, such as face-to-face filming or hacked surveillance cameras, to guess a target user's pattern in a very short time and with a high success rate.
References

[1] Yomna Abdelrahman, Mohamed Khamis, Stefan Schneegass, and Florian Alt. Stay cool! Understanding thermal attacks on mobile-based user authentication. In The 2017 CHI Conference on Human Factors in Computing Systems (CHI 2017), 2017.

[2] Mohammed Eunus Ali, Anika Anwar, Ishrat Ahmed, Tanzima Hashem, Lars Kulik, and Egemen Tanin. Protecting mobile users from visual privacy attacks. In ACM International Joint Conference on Pervasive & Ubiquitous Computing: Adjunct Publication, 2014.

[3] Florian Alt, Andreas Bulling, Gino Gravanis, and Daniel Buschek. GravitySpot: Guiding users in front of public displays using on-screen visual cues. In ACM Symposium, 2015.

[4] Adam Aviv, Katherine Gibson, Evan Mossop, Matt Blaze, and Jonathan Smith. Smudge attacks on smartphone touch screens. In Proceedings of the 4th USENIX Conference on Offensive Technologies, WOOT'10, 12 2010.

[5] Adam J. Aviv, Devon Budzitowski, and Ravi Kuber. Is bigger better? Comparing user-generated passwords on 3x3 vs. 4x4 grid sizes for Android's pattern unlock. In Proceedings of the 31st Annual Computer Security Applications Conference, ACSAC 2015, pages 301–310, New York, NY, USA, 2015. Association for Computing Machinery.

[6] J. Babaud, A. P. Witkin, M. Baudin, and R. O. Duda. Uniqueness of the Gaussian kernel for scale-space filtering. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-8(1):26–33, 1986.

[7] Kiran Balagani, Matteo Cardaioli, Mauro Conti, Paolo Gasti, Martin Georgiev, Tristan Gurtler, Daniele Lain, Charissa Miller, Kendall Molas, Nikita Samarin, et al. PILOT: Password and PIN information leakage from obfuscated typing videos. Journal of Computer Security, 27(4):405–425, 2019.

[8] Kiran Balagani, Mauro Conti, Paolo Gasti, Martin Georgiev, Tristan Gurtler, Daniele Lain, Charissa Miller, Kendall Molas, Nikita Samarin, Eugen Saraci, Gene Tsudik, and Lynn Wu. SILK-TV: Secret Information Leakage from Keystroke Timing Videos. In 23rd European Symposium on Research in Computer Security, ESORICS 2018, Barcelona, Spain, September 3-7, 2018, Proceedings, Part I, pages 263–280, 08 2018.

[9] David Barrett. One surveillance camera for every 11 people in Britain, says CCTV survey. The Telegraph, 10, 2013.

[10] Matthias Böhmer, Brent Hecht, Johannes Schöning, Antonio Krüger, and Gernot Bauer. Falling asleep with Angry Birds, Facebook and Kindle: a large scale study on mobile application usage. In Proceedings of the 13th International Conference on Human Computer Interaction with Mobile Devices and Services, pages 47–56, 2011.

[11] Barry Brown, Moira Mcgregor, and Donald Mcmillan. 100 days of iPhone use: understanding the details of mobile device use. In International Conference on Human-Computer Interaction with Mobile Devices & Services, 2014.

[12] Dirk Van Bruggen. Studying the Impact of Security Awareness Efforts on User Behavior. PhD thesis, 2014.

[13] Tao Chen, Michael Farcasin, and Eric Chan-Tin. Smartphone passcode prediction. IET Information Security, 12, 04 2018.

[14] Andrei Costin. Security of CCTV and video surveillance systems: Threats, vulnerabilities, attacks, and mitigations. In Proceedings of the 6th International Workshop on Trustworthy Embedded Devices, pages 45–54, 2016.

[15] John Daugman. How iris recognition works. In The Essential Guide to Image Processing, pages 715–739. Elsevier, 2009.

[16] Antonella De Angeli, Lynne Coventry, Graham Johnson, and Karen Renaud. Is a picture really worth a thousand words? Exploring the feasibility of graphical authentication systems. International Journal of Human-Computer Studies, 63:128–152, 07 2005.

[17] David H Douglas and Thomas K Peucker. Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Cartographica: The International Journal for Geographic Information and Geovisualization, 10(2):112–122, 1973.

[18] Aaron Doyle, Randy Lippert, and David Lyon. Eyes Everywhere: The Global Growth of Camera Surveillance. Routledge, 2013.

[19] Malin Eiband, Mohamed Khamis, Emanuel Von Zezschwitz, Heinrich Hussmann, and Florian Alt. Understanding shoulder surfing in the wild: Stories from users and observers. In CHI Conference on Human Factors in Computing Systems, 2017.

[20] Meriem Guerar, Mauro Migliardi, Francesco Palmieri, Luca Verderame, and Alessio Merlo. Securing PIN-based authentication in smartwatches with just two gestures. Concurrency and Computation: Practice and Experience, 09 2019.

[21] Yoshihisa Ijiri, Miharu Sakuragi, and Shihong Lao. Security management for mobile devices by face recognition. In , pages 49–49. IEEE, 2006.

[22] Maurice G Kendall. Rank correlation methods. New York: Hafner, 1955. Manuscript received 3/30, 65, 1955.

[23] Chris Xiaoxuan Lu, Bowen Du, Hongkai Wen, Sen Wang, Andrew Markham, Ivan Martinovic, Yiran Shen, and Niki Trigoni. Snoopy: Sniffing your smartwatch passwords via deep sequence learning. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 1(4):1–29, 2018.

[24] Alan Lukezic, Tomas Vojir, Luka Čehovin Zajc, Jiri Matas, and Matej Kristan. Discriminative correlation filter with channel and spatial reliability. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6309–6318, 2017.

[25] Federico Maggi, Alberto Volpatto, Simone Gasparini, Giacomo Boracchi, and Stefano Zanero. A fast eavesdropping attack against touchscreens. In , pages 320–325. IEEE, 2011.

[26] Davide Maltoni, Dario Maio, Anil K Jain, and Salil Prabhakar. Handbook of Fingerprint Recognition. Springer Science & Business Media, 2009.

[27] Erik Murphy-Chutorian and Mohan Manubhai Trivedi. Head pose estimation in computer vision: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(4):607–626, 2008.

[28] Vijay Rajanna, Seth Polsley, Paul Taele, and Tracy Hammond. A gaze gesture-based user authentication system to counter shoulder-surfing attacks. In the 2017 CHI Conference Extended Abstracts, 2017.

[29] Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv, 2018.

[30] Nataniel Ruiz, Eunji Chong, and James M. Rehg. Fine-grained head pose estimation without keypoints. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2018.

[31] Jeremy Schiff, Marci Meingast, Deirdre K. Mulligan, Shankar Sastry, and Ken Goldberg. Respectful cameras: Detecting visual markers in real-time to address privacy concerns. In IEEE/RSJ International Conference on Intelligent Robots & Systems, 2007.

[32] Diksha Shukla, Rajesh Kumar, Abdul Serwadda, and Vir V Phoha. Beware, your hands reveal your secrets! In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pages 904–917, 2014.

[33] Diksha Shukla and Vir Phoha. Stealing passwords by observing hands movement. IEEE Transactions on Information Forensics and Security, PP:1–1, 04 2019.

[34] Tomas Simon, Hanbyul Joo, Iain Matthews, and Yaser Sheikh. Hand keypoint detection in single images using multiview bootstrapping. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

[35] Charles Spearman. "General intelligence" objectively determined and measured. 1961.

[36] Chen Wang, Xiaonan Guo, Yingying Chen, Yan Wang, and Bo Liu. Personal PIN leakage from wearable devices. IEEE Transactions on Mobile Computing, PP:1–1, 08 2017.

[37] Shih-En Wei, Varun Ramakrishna, Takeo Kanade, and Yaser Sheikh. Convolutional pose machines. , pages 4724–4732, 2016.

[38] Roman Weiss and Alexander De Luca. PassShapes: utilizing stroke based authentication to increase password memorability. In Proceedings of the 5th Nordic Conference on Human-Computer Interaction: Building Bridges, pages 383–392, 2008.

[39] Thomas Winkler and Bernhard Rinner. User-centric privacy awareness in video surveillance. Multimedia Systems, 18(2):99–121, 2012.

[40] Yi Xu, Jared Heinly, Andrew White, Fabian Monrose, and Jan-Michael Frahm. Seeing double: Reconstructing obscured typed input from repeated compromising reflections. In Proceedings of the ACM Conference on Computer and Communications Security, 11 2013.

[41] Guixin Ye, Zhanyong Tang, Dingyi Fang, Xiaojiang Chen, Kwang In Kim, Ben Taylor, and Zheng Wang. Cracking Android pattern lock in five attempts. In Proceedings of the 2017 Network and Distributed System Security Symposium (NDSS 17). Internet Society, 2017.

[42] Guixin Ye, Zhanyong Tang, Dingyi Fang, Xiaojiang Chen, Willy Wolff, Adam J Aviv, and Zheng Wang. A video-based attack for Android pattern lock. ACM Transactions on Privacy and Security (TOPS), 21(4):1–31, 2018.

[43] Qinggang Yue, Zhen Ling, Xinwen Fu, Benyuan Liu, Kui Ren, and Wei Zhao. Blind recognition of touched keys on mobile devices. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, CCS '14, pages 1403–1414, New York, NY, USA, 2014. Association for Computing Machinery.

[44] Qinggang Yue, Zhen Ling, Xinwen Fu, Benyuan Liu, Kui Ren, and Wei Zhao. Blind recognition of touched keys on mobile devices. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pages 1403–1414, 2014.

[45] Jie Zhang, Xiaolong Zheng, Zhanyong Tang, Tianzhang Xing, Xiaojiang Chen, Dingyi Fang, Rong Li, Xiaoqing Gong, and Feng Chen. Privacy leakage in mobile sensing: Your unlock passwords can be leaked through wireless hotspot functionality. Mobile Information Systems, 2016:1–14, 01 2016.

[46] Huiyuan Zhou, Vinicius Ferreira, Thamara Alves, Kirstie Hawkey, and Derek Reilly. Somebody is peeking! A proximity and privacy aware tablet interface. In Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems, pages 1971–1976, 2015.

[47] Man Zhou, Qian Wang, Jingxiao Yang, Qi Li, Feng Xiao, Zhibo Wang, and Xiaofeng Chen. PatternListener: Cracking Android pattern lock using acoustic signals. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pages 1775–1787, 2018.
Appendix
Algorithm 3
Check
Input: trajectory: the temporary trajectory while tracking; CR: the threshold number of frames used to check movement;
Output: State: whether the temporary trajectory is valid;

AD ← getAvgDistance(trajectory)
State ← True
d ← distance(trajectory[−CR:])
if d ≤ ... then
    countStatic += 1
else
    countStatic ← 0
end if
if countStatic ≥ ... then
    State ← False
end if
if distance(trajectory[−1]) ≥ ... × AD then
    State ← False
end if
return State
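The validity check of Algorithm 3 can be sketched in Python as below. The thresholds (static tolerance, static-run length, jump factor) are elided in the source, so the values here are illustrative assumptions.

```python
# Reject a tracked trajectory when the point has stayed (nearly)
# static for too many consecutive frames, or when the latest step
# jumps far beyond the average step length.
from math import hypot

def step(p, q):
    return hypot(q[0] - p[0], q[1] - p[1])

def is_valid(trajectory, static_eps=0.5, max_static=15, jump_factor=5.0):
    if len(trajectory) < 2:
        return True
    steps = [step(a, b) for a, b in zip(trajectory, trajectory[1:])]
    avg = sum(steps) / len(steps)
    # Reject a sudden jump in the latest step.
    if avg > 0 and steps[-1] >= jump_factor * avg:
        return False
    # Reject if the tail of the trajectory has been static too long.
    static = 0
    for s in reversed(steps):
        if s <= static_eps:
            static += 1
        else:
            break
    return static < max_static
```

A steadily moving trajectory passes, while one frozen in place or one ending in an implausible jump is discarded, matching the two failure conditions in Algorithm 3.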
Figure 22: The flowchart. Start → ordered detection results and video stream → select a hand keypoint and a phone corner in one frame → for each frame in the video stream: check (pass/fail), track the hand keypoint and the phone corner, and save the movement of the hand keypoint with respect to the phone corner → until PointsNum > Thres → End.

Algorithm 4
Update Possible Patterns Set
Input: LS[]: the set of line segments; P[]: the old set of possible patterns; U<line, line>: an incoming unit; C[]: the set of possible ciphers of U;
Output: P[]: a new set of possible patterns;

for each pattern ∈ P[] do
    nextUnit ← getNext(LS[], pattern)
    lastUnit ← getLast(LS[], pattern)
    if U == nextUnit then
        overlapSegment ← getOverlap(pattern.passed, U)
        for each cipher ∈ C do
            if cipher[overlapSegment] == pattern[overlapSegment] then
                passedUnit ← pattern.passedUnit ∪ U.line
                keys ← pattern.keys ∪ cipher.lastkey
                confidence ← pattern.confidence + dots.similarity × U.weight
                newpattern ← <keys, passedUnit, confidence>
            end if
        end for
    end if
    if U == lastUnit then
        overlapSegment ← getOverlap(pattern.passed, U)
        for each cipher ∈ C do
            if cipher[overlapSegment] == pattern[overlapSegment] then
                passedUnit ← pattern.passedUnit ∪ U.line
                keys ← pattern.keys ∪ cipher.lastkey
                confidence ← pattern.confidence + dots.similarity × U.weight
                newpattern ← <keys, passedUnit, confidence>
            end if
        end for
    end if
    P ← P ∪ newpattern
end for
return P

Algorithm 5 Process
Input: startNum: the start frame number; frames[]: the frames of the video stream;
Output: trajectory[]: point coordinates obtained by tracking;

for each frameNum, frame ∈ enumerate(frames) do
    if frameNum < startNum then
        continue
    end if
    cornerBox ← detectCorner(frame)
    fingerBoxs ← detectFinger(frame)
    trajectory ← track(cornerBox, fingerBoxs[0], frames, frameNum)
    if trajectory != None then
        return trajectory
    end if
    trajectory ← track(cornerBox, fingerBoxs[1], frames, frameNum)
    if trajectory != None then
        return trajectory
    end if
end for
return []

Algorithm 6 track
Input: startNum: the start frame number; frames[]: the frames of the video stream; cBox[]: the location of the phone corner; pBox[]: the location of the phone box; fBox[]: the location of the finger box;
Output: trajectory[]: point coordinates obtained by tracking;

trajectory ← []
entered ← False
for each frameNum, frame ∈ enumerate(frames) do
    if frameNum < startNum then
        continue
    end if
    if frameNum == startNum then
        fingerTracker.init(fBox)
        cornerTracker.init(cBox)
    end if
    fCenter ← Center(fBox)
    cCenter ← Center(cBox)
    trajectory.append((fCenter[0] − cCenter[0], fCenter[1] − cCenter[1]))
    ok ← check(trajectory)
    if !ok then
        return []
    end if
    fingerOK, fBox ← fingerTracker.update(frame)
    cornerOK, cBox ← cornerTracker.update(frame)
    if !fingerOK || !cornerOK then
        return []
    end if
    if !inside(fBox, pBox) then
        if !entered then
            continue
        else
            break
        end if
    else
        entered ← True
    end if
end for
return trajectory
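The core of Algorithm 6 is that the trajectory stores the hand keypoint's position relative to the tracked phone corner, so camera shake and phone movement largely cancel out. A minimal sketch, where `finger_boxes` and `corner_boxes` are hypothetical per-frame tracker outputs in (x, y, w, h) form:

```python
# Relative-coordinate trajectory: hand-keypoint center minus
# phone-corner center, frame by frame.

def center(box):
    x, y, w, h = box
    return (x + w / 2, y + h / 2)

def relative_trajectory(finger_boxes, corner_boxes):
    traj = []
    for f_box, c_box in zip(finger_boxes, corner_boxes):
        fc, cc = center(f_box), center(c_box)
        traj.append((fc[0] - cc[0], fc[1] - cc[1]))
    return traj
```

If the phone and the hand move together (e.g., the whole scene shifts in the frame), the relative trajectory stays constant, which is exactly why the corner is tracked alongside the keypoint.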