SliceType: Fast Gaze Typing with a Merging Keyboard ∗†
Burak Benligiray, Cihan Topal, Cuneyt Akinlar
Abstract
Jitter is an inevitable by-product of gaze detection. Because of this, gaze typing tends to be a slow and frustrating process. In this paper, we propose SliceType, a soft keyboard that is optimized for gaze input. Our main design objective is to use the screen area more efficiently by allocating a larger area to the target keys. We achieve this by determining the keys that will not be used for the next input, and allocating their space to the adjacent keys with a merging animation. Larger keys are faster to navigate towards, and easier to dwell on in the presence of eye tracking jitter. As a result, the user types faster and more comfortably. In addition, we employ a word completion scheme that complements gaze typing mechanics. A character and a related prediction are displayed at each key. Dwelling at a key enters the character, and double-dwelling enters the prediction. While dwelling on a key to enter a character, the user reads the related prediction effortlessly. The improvements provided by these features are quantified using the Fitts’ law. The performance of the proposed keyboard is compared with two other soft keyboards designed for gaze typing, Dasher and GazeTalk. 37 novice users gaze-typed a piece of text using all three keyboards. The results of the experiment show that the proposed keyboard allows faster typing, and is preferred by the users.
∗ This is a pre-print of an article published in Journal on Multimodal User Interfaces. The final authenticated version is available online at https://doi.org/10.1007/s12193-018-0285-z
† We invite the readers to test SliceType, and reach additional material from https://github.com/bbenligiray/slicetype

Typing on a physical keyboard is the most typical method of text entry. The defining characteristic of typing on a keyboard is that every character is mapped to a single key, and each keystroke results in a character being entered. The action of typing is very simple and literal, as each atomic action results in an independent input. This allows for easy adoption, and high performance after training. However, as the spiritual successors of typewriters, physical keyboards have inherited many limitations. They are designed to be used in a desktop setting, which hinders their mobile capabilities. Additionally, they are operated by hand, which may not always be an option.

The alternative to a physical keyboard for text entry is a soft keyboard [1, 2]. Soft keyboards can be designed to be used with discrete (e.g., buttons, blinks) or continuous input (e.g., mouse, stylus, gaze). They are not static like physical keyboards. Soft keyboards can present pop-up menus [3], and modify their layouts [4] and mechanics [5] adaptively.

The flexible nature of soft keyboards allows them to be designed to overcome limitations. For example, mobile devices typically have small screens. Soft keyboards for mobile devices are specifically designed to have a small footprint on the screen [6, 7]. Head mounted displays obstruct the users’ view of their environments, which prevents them from getting visual feedback from a physical keyboard. If the user cannot touch-type, a soft keyboard becomes necessary to enter text [8].

Users suffering from severe neural or motor disabilities (e.g., amyotrophic lateral sclerosis, locked-in syndrome) cannot use physical keyboards for text entry.
In these cases, blinks and facial gestures can be used as discrete input mechanisms [9, 10]. Soft keyboards designed specifically for gaze typing will be critical in improving the life quality of such users [11]. In addition, usage scenarios where non-disabled users depend solely on gaze interaction are becoming more common [12, 13].

In this paper, we propose SliceType, a soft keyboard that is optimized for gaze typing through the following features:
• Based on a language model, the keys that are predicted not to be targeted next disappear, and the resulting space is allocated to the adjacent keys through a merging animation. Larger target keys increase text entry rate and reduce the effect of eye tracking jitter.
• Each key displays a character, and a related prediction based on the language model [14]. Dwelling once at a key enters the character, and dwelling twice enters the prediction. The user reads the prediction during the first dwelling phase, which saves time and effort.
• The keyboard layout is designed as an inner ring composed of frequently used keys, and an outer ring composed of rarely used keys. In this way, travel time between frequently used keys is reduced, and the merging function is facilitated by ensuring that each frequently used key has an adjacent rarely used key.

The improvements provided by these features are quantified using the Fitts’ law. This analysis showed that the merging functionality contributes even more than word completion through predictions. In addition, a user experiment was conducted to compare the proposed keyboard with two soft keyboards designed for gaze typing, Dasher [15] and GazeTalk [16]. The proposed keyboard was found to provide the highest text entry rate, and was preferred over the other keyboards by the users. An interesting result of this experiment was that some users’ keyboard preferences contradicted their performances.

Figure 1: WiViK’s user interface [17]. The key being dwelt on is highlighted.

Figure 2: GazeTalk’s user interface [16]. Only six characters which are most likely to be used are displayed. To enter another character, the user has to select the key labeled as ‘ABCD...’.

In this section, we focus on soft keyboards that are designed to be used with a continuous input device, which can be a mouse, a touch sensitive surface, or a gaze tracker. The input device is used either to dwell on items for selection, or to draw certain patterns through movement. One of the earliest examples of dwelling keyboards is WiViK [17]. The keyboard has the traditional QWERTY layout contained in a rectangular user interface, as shown in Figure 1. To enter a character, the user moves the cursor over the key representing the character and dwells on it for a full dwell period. WiViK predicts words that complete the current prefix, and presents them on the left. Selecting words among these suggestions is meant to improve text entry rate.

GazeTalk [16, 18] is a dwelling soft keyboard designed specifically for gaze typing. Its design consists of a limited number of large keys, as shown in Figure 2. To select a key, the user moves the cursor over it and waits for a dwell period. A progress bar indicates the remaining dwelling time for the key to be selected. The transcribed text is shown at the top-left key. The key below contains a set of predictions. Authors report a text entry rate of 4 wpm with a dwell period of 750 ms.

Figure 3: pEYEwrite’s user interface [19]. The circular pies in the middle are used to enter characters, which appear inside the text-box at the bottom. The keys on the corners provide auxiliary functionality.

pEYEwrite [19] has a hierarchical interface as shown in Figure 3. To enter a character, the user first dwells on the pie slice containing the desired character. Once the dwell period is up, a circular pop-up menu appears, where a single character is displayed inside each slice. The user dwells on the slice with the desired character for a second time to select it.
Authors report a text entry rate of 7.9 wpm with a dwell period of 400 ms.

For dwelling soft keyboards, reducing the dwell period allows for a higher text entry rate, but it also increases typing errors. Most systems use a fixed time interval in the range of 500–1000 ms. Alternative approaches are to adapt the dwell period to user performance [5], or to use different dwell periods for different keys [20].

EyeWrite [21] is an alternative system, where entering a character involves drawing a certain pattern by moving the cursor, rather than dwelling on a key. Patterns resemble pen strokes for writing letters, and are drawn by moving the cursor from one corner of a rectangular window to another, as shown in Figure 4. Authors report a text entry throughput of about 5 wpm. In addition to EyeWrite, Minimum Device Independent Text Input Method (MDTIM) [22] and Eye-S [23] also employ this style of “eye graffiti” communication. In these systems, letters are created by a sequence of fixations on regions called hot-spots. These regions can be made invisible and do not interfere with other applications, meaning that the entire screen area can be utilized by other applications.

Figure 4: EyeWrite’s user interface and the patterns representing each character [21]. In the figure, the letter ‘c’ is being entered.

EyeSwipe [24] has a QWERTY layout, but keys are not selected by dwelling. Instead, the user glances through the keys in sequence, similar to the swiping motion used in soft keyboards for mobile devices. Memorization of the layout is critical for this system, as the user should not scan the keys to search for a character in the middle of typing a word.

Figure 5: Dasher’s user interface [15]. Text is entered by moving the cursor towards the desired characters. In the figure, the word “development” is being written.

Dasher [15] has a radically different design that is operated by a continuous gesture. Characters appear on the right-hand side of the interface as shown in Figure 5.
To select a character, the cursor is moved towards it. This causes the area representing the target character to become larger, which is called dynamic zooming. When the character crosses over the vertical line, it is entered. While approaching the target character, the characters that may follow it appear inside its area. To undo an entry, the cursor must be moved to the left-hand side of the vertical line. This causes the entered characters to move back to the right-hand side of the vertical line and be removed. Authors report a text entry rate of about 20 wpm after one hour of practice.

Zooming is another interesting concept in which the region having the current user focus is made bigger for easy and accurate selection [25, 27]. Figure 6 (a) illustrates this idea in action. A study on how auditory and visual feedback affects gaze typing is presented in [28]. Authors state that proper feedback by the system influences both the text entry rate and error rate.

Figure 6: (a) Zooming user interface to make easy and correct selections [25], (b) A keyboard layout to minimize the cursor movement to enter a book chapter [26].

There is also a good body of research on the optimal layout and size of the keys on the soft keyboard interface [25, 27, 29–31]. For example, Figure 6 (b) shows a keyboard layout optimized to minimize the text entry time for a specific book chapter. Word or phrase completion or prediction is widely employed by many soft keyboards [32–36]. The idea with word completion is to look at the previously entered text and make suggestions based on a language model. The goal is to allow the user to enter the complete word without entering the remaining letters, thus increasing text entry rate. Sharma et al. investigate the effect of prediction list orientation, position, and the number of predictions to be displayed [36].

Let us investigate the design aspects of the aforementioned soft keyboards, and whether they are suitable for gaze typing. The most typical characteristic of gaze input is its inaccuracy, which can manifest as a constant bias or random jitter. This can be combated by ensuring that the interface components are sufficiently large. Since large components are desired and the total area is limited, efficiency is a key aspect of good soft keyboard design. From this standpoint, no area on the interface should remain unutilized, a principle that is not universally adhered to (e.g., pEYEwrite). In relation to this, allocating a separate area for predictions, as WiViK and GazeTalk do, is a common practice that wastes area and causes constant diversion of attention. Both Dasher and the proposed design circumvent the need for a static prediction component that takes up space.

If the input resolution is low, hierarchical selection is a must.
However, having to do multiple actions per entry puts a hard cap on the potential entry rate. Gaze input is noisy, rather than low-resolution. Thus, trying to compensate for this noise by using larger elements is a more suitable alternative for gaze typing. Therefore, pEYEwrite, GazeTalk and other soft keyboards that utilize hierarchical selection are not optimal for gaze typing, especially considering that gaze tracking devices grow in accuracy by the day.

The final design aspect that we would like to touch on is the balance between static and dynamic layouts. Hard keyboards are fully static, which allows the experienced user to touch type with great speed and comfort. On the other hand, soft keyboard layouts can change dynamically, which opens up a path for interesting designs. For optimal performance and comfort, a balance needs to be struck. We believe that the most important rule regarding this trade-off is that the positions of keys should not change dramatically. A completely dynamic layout as in GazeTalk and Dasher requires constant cognitive effort from the user, as the location of each key is unpredictable. It is to be expected that such an effort will have negative consequences on typing performance and comfort.
In this section, the design of the proposed dwelling soft keyboard, SliceType, will be discussed. Some of these discussions apply to any kind of continuous input, while others are specific to gaze typing. With a conventional continuous input device, the user locates the item to be selected, drags the cursor onto the item, and selects it. In gaze typing, where the selection cursor moves with gaze, locating and dragging occur concurrently. As soon as the user sees the item to be selected, the cursor will be on that item. Then, the item is selected by dwelling. The user has to keep looking at the item to dwell, thus the dwelling period cannot be used to search for the next target, or to check a dedicated prediction area. For these reasons, dwelling keyboards for gaze typing require special design considerations.

The implementation of an efficient word prediction proposal method is discussed in Section 3.1. In Section 3.2, we present the layout of the proposed keyboard. The mechanics of key merging are explained in detail in Section 3.3. A brief usage example is presented in Section 3.4.
Word prediction is a common tool used to speed up typing for soft keyboards. Using the previously entered text and a language model, the current word to be entered is predicted and proposed to the user for faster entry. Word prediction is generally considered to be limited to the generation of the prediction, using the recently entered characters and statistical data. However, in the soft keyboard interface context, the method of communicating the predictions to the user is a problem by itself. The generation of the prediction is more in the scope of natural language processing, rather than human–computer interaction. Therefore, we will briefly explain the simple prediction generation scheme used in SliceType, then move on to how the predictions are presented to the user.

3.1.1 The Prediction Engine
A bigram corpus that consists of word pairs and a unigram corpus that consists of single words are maintained. The last word entered and the current word prefix are used to predict the word that the user is currently entering. If there are multiple candidates for prediction, the word that belongs to the more frequently used bigram pattern is preferred. If there are no suitable bigrams, unigram prediction is used as backup.

If the corpora were kept as lists, finding a prediction would become slower as the sizes of the corpora grow. Instead, the unigram and bigram corpora are kept as individual tries. These tries can be updated by adding words, or by rearranging words based on usage frequency, which results in a prediction engine that dynamically adapts to the user. Refer to [37] for more information on n-gram language models.
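The bigram-first, unigram-fallback lookup described above can be illustrated with a minimal sketch. This is not the SliceType implementation: the class names `Trie` and `PredictionEngine`, the method names, and the choice to store one next-word trie per previous word are all assumptions made for illustration.

```python
from collections import defaultdict


class Trie:
    """Prefix tree that stores a usage count at the node ending each word."""

    def __init__(self):
        self.children = {}
        self.count = 0  # greater than 0 only where a stored word ends

    def add(self, word, count=1):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.count += count

    def best_completion(self, prefix):
        """Return the most frequent stored word starting with prefix, or None."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        best_word, best_count = None, 0
        stack = [(node, prefix)]
        while stack:  # depth-first walk of the subtree below the prefix
            cur, word = stack.pop()
            if cur.count > best_count:
                best_word, best_count = word, cur.count
            for ch, child in cur.children.items():
                stack.append((child, word + ch))
        return best_word


class PredictionEngine:
    """Hypothetical engine: bigram prediction with unigram fallback."""

    def __init__(self):
        self.unigrams = Trie()
        # Assumption: one trie of following words per previous word.
        self.bigrams = defaultdict(Trie)

    def learn(self, words):
        prev = None
        for w in words:
            self.unigrams.add(w)
            if prev is not None:
                self.bigrams[prev].add(w)
            prev = w

    def predict(self, prev_word, prefix):
        if prev_word in self.bigrams:
            guess = self.bigrams[prev_word].best_completion(prefix)
            if guess is not None:
                return guess
        return self.unigrams.best_completion(prefix)


engine = PredictionEngine()
engine.learn(["type", "input", "type", "input", "big", "index", "index", "index"])
print(engine.predict("type", "in"))  # bigram context decides
print(engine.predict(None, "in"))    # no context, unigram counts decide
```

Calling `learn` again with newly typed words updates the counts, which is one simple way to obtain the user adaptation mentioned in the text.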
The predictions are generally displayed in a dedicated area of the interface (see Figure 1). This is because they are seen as an addition to improve performance, and not as an essential part of the soft keyboard. However, this separation has many disadvantages for gaze typing:
• While gaze typing, every time the user glances at the predictions, they are also dragging the cursor over. This incurs a direct Fitts’ law cost [38].
• Since the area dedicated to the predictions is limited, not many predictions can be displayed. With fewer guesses, it is less likely that a correct prediction is proposed.
• The user has to read a prediction list until they encounter the desired word. If the desired word is not among the predictions, the user will have to read all of the predictions, which has the highest time cost.
• It is easy for the user to give up on the predictions entirely, e.g., when many false predictions are proposed consecutively. This also presents itself in novice users, where the user enters a state of tunnel vision and simply ignores the predictions to avoid complexity, which delays acquiring the skills needed to use the keyboard efficiently.

We quantified the negative effects of presenting the predictions in a dedicated area, and of not using the predictions, in Section 4.

The conventional prediction engines use the previously entered text to find the most probable continuations. Instead, we propose to add each of the letters to the previously entered text and make a respective prediction, as first used in [14]. Since there are 26 letters in the English alphabet, this will result in 26 different predictions. Each of these predictions will be presented in the respective key. While the user is dwelling on a key to select it, they will read the associated prediction effortlessly. If they wish to use the prediction, they keep on dwelling for an additional dwell period. Otherwise, they move on to the next character. The proposed prediction proposal method improves upon the conventional methods regarding all of the problems stated above. A disadvantage is that every key must be large enough to encase a legible word, which is a problem for keyboards with small keys [39]. In this study, we have partly overcome this problem by enlarging the target keys through merging.

Figure 7: SliceType interface and the layout of the characters.
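The per-key scheme can be sketched in a few lines: each letter is appended to the current prefix and a separate completion is looked up for it. The word counts below are made-up toy values, and `best_completion` and `predictions_per_key` are hypothetical helper names, not part of SliceType.

```python
import string

# Hypothetical toy unigram counts; a real corpus would be far larger.
WORD_COUNTS = {"in": 50, "into": 30, "input": 20, "inside": 15, "index": 10, "ink": 5}


def best_completion(prefix, counts=WORD_COUNTS):
    """Most frequent word that starts with prefix, or None if there is none."""
    candidates = [w for w in counts if w.startswith(prefix)]
    return max(candidates, key=counts.get) if candidates else None


def predictions_per_key(prefix, counts=WORD_COUNTS):
    """One prediction per letter key: the best completion of prefix + letter."""
    return {ch: best_completion(prefix + ch, counts) for ch in string.ascii_lowercase}


preds = predictions_per_key("in")
print(preds["p"])  # shown inside key 'p' while the user dwells on it
print(preds["y"])  # no word starts with "iny" in the toy corpus
```

Keys whose entry is `None` have no valid continuation in the corpus, which is the information the key merging step relies on.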
A dwelling keyboard can either be designed to display all characters at once, or group characters under categories, making them reachable by a hierarchical selection mechanism. The need for an additional selection event for each character slows down typing, but only requires a small number of keys to be displayed at once. Fewer and larger keys can be selected easily with lower resolution input devices. On the other hand, an accurate input device is needed to use a keyboard that displays all characters at once. Considering that eye tracking technology has reached some level of maturity, and will only improve in the future, we opted for displaying all characters for SliceType (see Figure 7).

The keyboard is shaped as a circle encased in a square. The aspect ratio is set to be 1:1, which provides compactness to the keyboard, while minimizing the distance among the characters. The circular keyboard area contains the characters, while the corners are used for additional functionality. The circular area has an innermost circle enclosed by two outer rings. The innermost circle is divided in half, and the outer rings are divided radially into 24 keys, resulting in a key for each letter of the English alphabet.

The typical design objective in the character layout is to minimize the total distance traveled during typing. An obvious solution to this problem is to assign the more frequently visited characters towards the center of the interface. To achieve this, we calculated the frequency statistics of the letters in our English corpus. We placed the two most frequently used letters, ‘e’ and ‘t’, in the two halves of the innermost circle. The remaining 12 frequently used letters are placed in the inner ring, and the 12 rarely used letters are placed in the outer ring, as shown in Figure 7.

Reducing the cursor movement is not our only goal, which is why we avoided the common strategy of placing letters that are likely to follow each other adjacently.
We also want to facilitate key merging, and placing letters that are likely to follow each other next to one another actively prevents this. Instead, the layout described above ensures that each letter has at least one rarely used letter adjacent to it.

The Fitts’ law indicates that user movement to a target becomes faster as the target width increases [38]. Furthermore, gaze tracking systems introduce some error to the system, which manifests itself as a jittering cursor. While dwelling on the target, it is undesirable for the cursor to leave the target due to noise, which will reset the dwell timer and waste time. Thus, increasing the size of the target keys will improve speed and reduce the probability of errors.

Larger keys are clearly easier to select. However, the total area of the graphical interface is limited. In this case, the only way of allocating more area to a key is removing area from another key. Allocating a smaller area to the keys that are less likely to be selected [4] may result in a frustrating experience when they are to be selected. Instead, a more drastic approach is removing the keys that are less likely to be selected. Doing so will result in more area being allocated to the keys that are more likely to be chosen. This removal of a key and the use of its space for an adjacent key is defined to be key merging, and is employed in SliceType.

To decide which keys are to be eliminated, we use the word predictions. See Figure 8 for an example. The text entered so far, “in-”, appears on the top-left corner. The prediction engine does not return a result for the key ‘y’, because there are no words that start with “iny-”. Thus, the key is removed from the interface, and its space is merged into its neighbor, key ‘s’. Similarly, key ‘w’ is removed and its space is merged into key ‘n’.
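The merging decision in this example can be sketched as follows. This is an illustrative simplification, not SliceType's geometry: keys are reduced to abstract areas, the function name `merge_keys` is hypothetical, and the adjacency map is made up except for the y/s and w/n pairs taken from the example above.

```python
def merge_keys(areas, neighbors, predictions):
    """Remove keys without a prediction and hand their area to a live neighbor.

    areas:       key -> area currently allocated to that key
    neighbors:   key -> adjacent keys that may absorb it (hypothetical map)
    predictions: key -> predicted word, or None if the key has no continuation
    """
    merged = {k: a for k, a in areas.items() if predictions.get(k) is not None}
    for key, area in areas.items():
        if key in merged:
            continue
        live = [n for n in neighbors.get(key, []) if n in merged]
        if live:
            merged[live[0]] += area  # the first live neighbor absorbs the space
        # Otherwise the space stays blank: merging is limited to nearby keys.
    return merged


areas = {"s": 1.0, "y": 1.0, "n": 1.0, "w": 1.0, "q": 1.0}
neighbors = {"y": ["s"], "w": ["n"], "q": ["w"]}  # q/w adjacency is made up
predictions = {"s": "inside", "n": "inn", "y": None, "w": None, "q": None}

print(merge_keys(areas, neighbors, predictions))
```

In this toy run, ‘s’ and ‘n’ each double in area, while the space of ‘q’ is left blank because its only neighbor was itself removed.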
Notice that if letters that frequently follow each other were specifically placed next to each other on the interface, such merging would have been less frequent.

The proposed key merging method assumes that the user enters a word that is present in the corpus. If the word to be entered is not in the corpus, the user will notice this when the next key to be entered has merged into another key, and hence does not appear on the screen. Then, the user can undo the typing actions for the last word and enable a non-merging mode of the keyboard that allows typing words that are not present in the corpus. This mode also adds this word to the corpus, so that the user can enter the word in the regular operation mode next time.

Figure 8: Merging of the keys based on the current word prefix “in-”.

See Figure 9 for the prediction proposal mechanism. As seen on the upper-left corner, the user has typed “in-”. The user intends to select the letter ‘p’ by hovering over it. SliceType not only highlights the slice representing the currently selected letter ‘p’ in orange, but also displays the word “input” (generated using the current word prefix “inp-”). Displaying the suggested word in the same slice as the current letter enables the user to read the suggestion without directing their gaze to a different location. Notice that SliceType only displays the prediction within the key that is being dwelt on. Displaying all predictions at all times would have cluttered the interface, which would hinder the user’s peripheral vision from locating the next target.

This section is intended to further clarify the concepts introduced in Section 3, and serve as an instruction guide for testing the proposed keyboard. SliceType uses a direct selection system, where letter selection is performed by dwelling inside the key representing the letter. The default dwell period is 1000 ms, which is intended to be used by novice users.
The user is expected to decrease this parameter manually as they gain confidence.

When the cursor is moved inside the boundaries of a key, it is highlighted in a light orange color. This is illustrated in Figure 10 (a), where the user has just moved the cursor inside the letter ‘i’. To select this letter, the user has to stay inside the key for the entire dwell period. The amount of dwell period that has passed since the cursor has moved inside the key is indicated by the dark orange color that continuously fills the key. In Figure 10 (b), about 50% of the dwell period is up, and in Figure 10 (c), about 90% of the dwell period is up. When the entire dwell period is up, the letter ‘i’ becomes selected and appears on the top-left corner of the interface, as shown in Figure 10 (d). Also notice in Figure 10 (d) that after the letter ‘i’ is selected, some letters have been removed from the interface, and their space has been merged into adjacent keys. In Figure 11, not all blank space is merged into the remaining keys, because we limited the merging to only nearby keys. More aggressive merging changes the layout excessively, which is reported to be harmful [40].

Figure 9: Using the current prefix “in” and the next letter ‘p’, SliceType suggests the word “input”, which appears inside the pie slice representing letter ‘p’.

Figure 10 and Figure 11 demonstrate the details of SliceType’s word suggestion and selection mechanism. Assume that the user wants to type the word “input”. As soon as the user moves the cursor inside the first letter of the word, ‘i’, its color changes to light orange and the first suggested word “in” appears inside the key of letter ‘i’. Since the user desires to type “input”, rather than “in”, they complete the selection of letter ‘i’ by dwelling inside its key for a single dwell period, and move on to the next letter, ‘n’.
SliceType continues suggesting the same word, “in”; but this time, the suggested word is displayed inside the key of letter ‘n’ (see Figure 10 (e-f)). After letter ‘n’ is selected and appears on the top-left corner, the user now moves on to the next letter, ‘p’. This time, SliceType suggests the word “input”, which is displayed inside the key of letter ‘p’ as shown in Figure 11 (a). The user stays inside letter ‘p’ for the entire dwell period to select it as shown in Figure 11 (b), and when the selection is complete, letter ‘p’ appears on the top-left corner, as shown in Figure 11 (c).

Figure 10: (a) The user has just started dwelling upon the letter ‘i’, which is highlighted in light orange. The prediction “in” is proposed. (b) About 50% of the dwell period is up. (c) About 90% of the dwell period is up. (d) The entire dwell period is up. The letter ‘i’ is selected and appears on the top-left corner. (e) The user moves on to the letter ‘n’. Note that the same prediction can be proposed again. (f) The user keeps dwelling on ‘n’.

Figure 11: The user has entered the word prefix “in”. (a) The current focus is on the letter ‘p’, and SliceType suggests the word “input” for completion. (b) The user is about to complete the selection of letter ‘p’. (c) The user continues dwelling inside the letter ‘p’ to select the prediction. The second dwell period progress is illustrated in green. (d) The user is about to complete the selection of the prediction, “input”.

Now, the user wants to select the suggested word “input” displayed inside the key of letter ‘p’, so that they do not have to type the rest of the desired word. To select the suggested word, the user must continue dwelling inside letter ‘p’ for an additional dwell period. This is illustrated in Figure 11 (c), where the dwell period progress is illustrated by the green color.
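The dwell and double-dwell mechanics walked through so far can be sketched as a small state machine. This is a simplified illustration under stated assumptions: the class name `DwellTracker`, the sampled `update(key, dt_ms)` interface, and the event names are hypothetical, and rendering and prediction lookup are left out.

```python
class DwellTracker:
    """Tracks dwell progress on the key under the gaze cursor."""

    def __init__(self, dwell_ms=1000):
        self.dwell_ms = dwell_ms  # 1000 ms is the default mentioned in the text
        self.key = None
        self.elapsed = 0
        self.stage = 0  # 0: nothing entered, 1: character entered, 2: prediction entered

    def update(self, key, dt_ms):
        """Advance time by dt_ms with the cursor on key (None when off all keys).

        Returns "char" after one full dwell period, "prediction" after a
        second consecutive one, and None otherwise.
        """
        if key != self.key:
            # Entering a different key (or leaving, e.g. due to jitter) resets the timer.
            self.key, self.elapsed, self.stage = key, 0, 0
            return None
        if key is None or self.stage == 2:
            return None
        self.elapsed += dt_ms
        if self.elapsed >= self.dwell_ms:
            self.elapsed -= self.dwell_ms
            self.stage += 1
            return "char" if self.stage == 1 else "prediction"
        return None


tracker = DwellTracker(dwell_ms=1000)
events = [tracker.update("p", 250) for _ in range(9)]
print(events)  # one "char" event, then one "prediction" event
```

Because leaving a key resets the state, dwelling on ‘n’, moving away, and returning to ‘n’ produces a second "char" event rather than a "prediction" event.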
When this second dwell period is up, the suggested word “input” is selected, and is sent to the system’s character input stream. The SliceType interface returns to its default state shown in Figure 7 to display all characters again, and the user can start typing a new word.

Words with repeating letters are a common occurrence in English. Consider the word being entered to be “winning”. While the user is dwelling to enter the first ‘n’, the corresponding prediction would be “window”. After dwelling once on ‘n’, continuing to double-dwell would result in the prediction being entered, rather than a second ‘n’. Therefore, the user needs to exit the key ‘n’ after the first dwell towards an arbitrary location, and return to the same key. The keyboard then starts entering the second ‘n’, and proposes “winning” as the prediction. Now, the user can double-dwell to enter the desired word.

SliceType does not have a dedicated area on its interface to display the transcribed text. Instead, it sends the entered text to the system’s character input stream, which can then be read by the program that has the current system focus. This way, the entered text can appear inside any program, e.g., word processor, text-to-speech program, web browser, etc.

The Fitts’ law predicts the time required to move a pointing device to a target [38]. In this study, we used its Shannon variant, which was proposed by MacKenzie [41]. The movement time is estimated as

$$MT = a + b \log_2\left(\frac{A}{W} + 1\right) \qquad (1)$$

where A represents the distance between the movement origin and the target center, W represents the width of the target, and a and b are model parameters. To estimate the time of a specific movement, one must estimate a and b. Our objective is to quantify the relative performances expected with different keyboard settings.
Thus, we will use the form in which only the index of difficulty (ID) is estimated:

$$ID = \log_2\left(\frac{A}{W} + 1\right) \qquad (2)$$

While ID does not indicate the time required for an action directly, we can say that an action with higher ID will take longer, because a and b are defined to be positive. MacKenzie and Buxton propose a method for using the Fitts’ law with 2D targets [42]. The W′ model proposed in that study replaces W with the extent of the target along the line that passes through the movement origin and the target center. To use this model, the target center has to be defined. For the key shapes in SliceType, we used the intersection of the radial symmetry axis of the key and the concentric circle that is equidistant from the circular borders of the key (see Figure 12). The same target definition is used for the two half circle keys in the center of the keyboard. For these two keys, the inner circular borders are assumed to have a radius of 0.

Figure 12: The target center (shown in red) is defined to be the intersection of the radial symmetry axis, and the concentric circle that is equidistant from the circular borders.

After defining the target center, calculating W′ and A is rather straightforward. To find W′, a line passing through the movement origin and target center is drawn. The distance between the two intersections of this line with the key gives the target width (see Figure 13). A is the distance between the movement origin and the target center. These variables are used in Eq. 2 to find the ID of each movement.

The pangram, “The quick brown fox jumps over the lazy dog”, is used to calculate the Fitts’ law ID scores. The comparison is done between combinations of utilizing two features of the proposed soft keyboard. The first feature is instantly completing the word using the proposed prediction. The second feature is the merging of the keys to increase target width.
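As a small numerical illustration of Eq. 2, the sketch below computes the ID of a single movement and sums it over a sequence of movements. The function names and the example distances and widths are made up for illustration. Computing W′ from the actual key geometry is not attempted here.

```python
import math


def index_of_difficulty(distance, width):
    """Shannon formulation of the Fitts' law ID: log2(A / W + 1)."""
    return math.log2(distance / width + 1)


def total_id(moves):
    """Sum of IDs over (distance, width) pairs, one pair per cursor movement."""
    return sum(index_of_difficulty(a, w) for a, w in moves)


# Same travel distance, but the second target is three times wider,
# as it might be after absorbing the space of removed neighbors.
print(index_of_difficulty(3.0, 1.0))
print(index_of_difficulty(3.0, 3.0))
print(total_id([(3.0, 1.0), (3.0, 3.0)]))
```

With the distance held fixed, widening the target lowers the ID, which is the effect that key merging is designed to exploit.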
In the first condition, the whole sample text is written letter by letter, i.e., the predictions proposed by the soft keyboard are not used. This increases the total distance to be traveled, which directly increases the Fitts’ law cost. Furthermore, the keys that do not propose a prediction are not removed, thus the free space is not used to enlarge the remaining keys. Since the key widths are less than ideal, the Fitts’ law cost is again increased. In other words, this condition is expected to produce the worst results. According to the methodology described earlier, the Fitts’ law ID for the test pangram is calculated to be 49.02.

Figure 13: The defined width, W′, is the extent of the target along the line that passes through the movement origin (shown in blue) and the target center (shown in red). A is the distance between the movement origin and the target.

Only word prediction is utilized in the second condition. The user is assumed to use the true predictions as soon as they are proposed, yet the keys that do not propose a prediction are not removed. This condition actually simulates the default usage scenario of the majority of soft keyboards. The Fitts’ law ID is calculated to be 40.40.

In the third condition, contrary to the previous one, keys that do not produce a word prediction are merged into the remaining keys. However, the user is assumed not to utilize the predictions, i.e., all letters are typed individually. The calculated Fitts’ law ID for this condition is 33.68.

The fourth condition assumes the normal usage of SliceType as discussed in Section 3. The predictions are utilized as soon as they are proposed and the keys that do not propose a prediction are removed to make room for the remaining keys. This is the best case for user performance, thus the Fitts’ law ID is expected to be the lowest. Indeed, the calculated value is only 30.13.

Table 1: Fitts’ law IDs calculated under different conditions
                     No prediction   Prediction
No key merging           49.02          40.40
Key merging              33.68          30.13

In Section 3.2, the importance of the method of delivery for prediction proposals was emphasized. Placing all predictions in a single area of the layout has many downsides. Instead, we have proposed to place each related prediction inside a key. In this part, we will try to quantify the achieved user performance improvement using the Fitts’ law. Since key merging is irrelevant in this matter, we will use the Fitts’ law ID result of the second condition (see Sec. 4.2). When the user uses the predictions in the keys, the Fitts’ law ID was calculated to be 40.40. [...] .37. We can see that simply moving the predictions inside the keys results in a 63.1% decrease in the Fitts’ law ID.
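As a back-of-the-envelope check of the two figures quoted above (an ID of 40.40 with in-key predictions and a 63.1% decrease), the following sketch computes the baseline ID that these numbers would imply. The function names are hypothetical, and the implied baseline is only what the quoted percentage suggests, not a value taken from the analysis itself.

```python
def percent_decrease(old, new):
    """Relative decrease from `old` to `new`, in percent."""
    return 100.0 * (old - new) / old

def implied_baseline(new_value, decrease_percent):
    """Baseline value that would yield `new_value` after the given decrease."""
    return new_value / (1.0 - decrease_percent / 100.0)

if __name__ == "__main__":
    baseline = implied_baseline(40.40, 63.1)
    print(f"implied baseline ID: {baseline:.1f}")
    print(f"round trip decrease: {percent_decrease(baseline, 40.40):.1f}%")
```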
The Fitts’ law indices of difficulty (IDs) calculated under different conditions are presented in Table 1. As expected, adding the word prediction and key merging features resulted in a decrease in ID, which will translate to faster text entry. The case in which both predictions are utilized and keys are merged performed best with an ID of 30.13. In contrast, the condition in which predictions and key merging are disabled performed worst with an ID of 49.02. The difference cannot be interpreted as a 38.5% speed-up because, as we have discussed in the beginning of this section, ID is multiplied by a parameter (b) and the result is added to another parameter (a) to find the total time needed for the user to travel through the designated path. However, we can assume that a lower ID will correspond to faster typing.

The more interesting results are seen for the cases in which only one of the features is implemented. Word prediction is commonly implemented by soft keyboards, as it is obvious that typing fewer characters will result in faster text entry. However, we see that the proposed key merging functionality improves performance significantly more than using word predictions. Only using word predictions results in an ID of 40.40, while only using key merging results in an ID of 33.68. We can say that soft keyboard designers should invest more of their focus in effective allocation of area, as doing so may produce better results, even if the total number of actions to enter a piece of text would increase.
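The comparison above can be reproduced with a short Python sketch that computes the relative ID decrease of each condition with respect to the condition without either feature. It also illustrates why an ID decrease is not a speed-up of the same size: the constants `a` and `b` and the movement count `n` below are purely hypothetical placeholders (they are not fitted values, and `n` is held equal across conditions only for illustration).

```python
# Fitts' law IDs for the test pangram, as quoted in the text.
IDS = {
    "neither": 49.02,
    "prediction only": 40.40,
    "merging only": 33.68,
    "both": 30.13,
}

def percent_decrease(old, new):
    """Relative decrease from `old` to `new`, in percent."""
    return 100.0 * (old - new) / old

def total_time(total_id, n_movements, a, b):
    """Sum of (a + b * ID_i) over all movements: n * a + b * sum(ID_i)."""
    return n_movements * a + b * total_id

if __name__ == "__main__":
    base = IDS["neither"]
    for name, value in IDS.items():
        print(f"{name:16s} ID={value:6.2f} decrease={percent_decrease(base, value):5.1f}%")

    a, b, n = 0.2, 0.1, 35  # hypothetical constants, for illustration only
    t_old = total_time(IDS["neither"], n, a, b)
    t_new = total_time(IDS["both"], n, a, b)
    print(f"time decrease with these constants: {percent_decrease(t_old, t_new):.1f}%")
```

With any positive `a` and an equal number of movements, the relative time decrease is smaller than the relative ID decrease, which is the caveat stated in the text.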
In this section, we present the results of our experiment where we compared SliceType with two other publicly available gaze typing keyboards, Dasher [15] and GazeTalk [16]. These keyboards were chosen to cover a wide variety of design choices. SliceType and GazeTalk perform dwelling based selection, while Dasher employs continuous gestures. Itoh et al. have compared GazeTalk and Dasher for typing in Japanese, and predicted Dasher to be better in the long term [43]. To our knowledge, this is the only other study in the literature that compared gaze typing soft keyboards directly.
All three keyboards are operated using a continuous input device. To compare them in terms of gaze typing performance, we used an eye tracking system to translate the user’s eye movements to cursor movements. The eye tracker used in the experiment is the Monocular Edge Analysis System from LC Technologies. It has a sampling rate of 60 Hz, and a gaze position estimation accuracy of 0.[...]°. In the experiments, the eye tracker is used from a typical distance of 50 cm. The calibration was repeated before the use of each keyboard.

The experiment was conducted with 37 undergraduate and graduate computer and electrical-electronics engineering students. None of the participants had used an eye tracker before. The participants were not asked to remove their glasses or contact lenses. They were given written usage instructions for each keyboard prior to the experiment. After reading the instructions, each participant was individually taught how the eye tracking system and each keyboard works for a total of thirty minutes. After this brief training session, the participants were asked to gaze-type a different paragraph with each keyboard. The paragraphs were chosen to represent the complexity of daily conversation.

Instead of having the participants read from the source material, we dictated it to them. This was done to emulate typing as a means of communication. The participants were asked to type as much of the text as possible in 5 minutes. Before the session, the participants were instructed to correct any errors they might make, but they were not actively notified of their errors during the session.

Both the keyboard order and keyboard–paragraph matching were permuted to prevent any bias.
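One way to permute both the keyboard order and the keyboard–paragraph matching is sketched below. The exact assignment scheme used in the experiment is not described, so this is only an assumed round-robin over all 36 order/matching combinations; the paragraph labels P1 to P3 and the function name are hypothetical.

```python
from itertools import permutations, product

KEYBOARDS = ("Dasher", "GazeTalk", "SliceType")
PARAGRAPHS = ("P1", "P2", "P3")  # hypothetical labels for the three paragraphs

def build_schedule(n_participants):
    """Assign each participant a keyboard order and a keyboard-paragraph matching.

    Cycles through all 6 x 6 = 36 combinations so that each is used about
    equally often (an assumed scheme, not the one from the experiment).
    """
    orders = list(permutations(KEYBOARDS))       # 6 possible keyboard orders
    matchings = list(permutations(PARAGRAPHS))   # 6 possible paragraph matchings
    combos = list(product(orders, matchings))    # 36 combinations
    schedule = []
    for i in range(n_participants):
        order, matching = combos[i % len(combos)]
        paragraph_of = dict(zip(KEYBOARDS, matching))
        schedule.append([(kb, paragraph_of[kb]) for kb in order])
    return schedule

if __name__ == "__main__":
    for session in build_schedule(37)[:3]:
        print(session)
```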
The keyboards were sized to cover the left half of a 21” screen with 1920 × [...] GazeTalk were set to 1000 ms, and for Dasher, the speed was set to 0.8 with adaptive speed adjustment enabled.

Figure 14: Average number of characters entered by the participants in 5 minutes.

After the experiment was finished, the participants were asked which keyboard they would prefer to use if operating a soft keyboard with an eye tracker was their only means of communication. The participants ranked the keyboards in order of preference. The reasoning behind this question was that although a user may type slowly using a keyboard, they may have had a better user experience with it, and vice versa. The experiment and the survey were done blindly, meaning that the participants were not informed about which keyboard was being developed in this study.
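Because each session lasts 5 minutes, a raw character count can be turned into a text entry rate in words per minute (wpm). The sketch below assumes the common convention of 5 characters per word; the function names are hypothetical.

```python
def chars_to_wpm(n_chars, minutes=5.0, chars_per_word=5.0):
    """Convert a character count typed in `minutes` to words per minute."""
    return n_chars / chars_per_word / minutes

def wpm_to_chars(wpm, minutes=5.0, chars_per_word=5.0):
    """Inverse conversion: characters typed in `minutes` at a given wpm."""
    return wpm * chars_per_word * minutes

if __name__ == "__main__":
    print(chars_to_wpm(100))   # 100 characters in a 5-minute session
    print(wpm_to_chars(2.0))   # characters typed in 5 minutes at 2 wpm
```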
Let us begin by emphasizing again that the experiment was conducted with novice users. Both the text entry rates and preferences are likely to change as the users gain proficiency. Therefore, the results should be considered as an early prediction, rather than a conclusion.

Figure 14 illustrates the average number of characters entered by all participants in 5 minutes. Clearly, SliceType provides the highest text entry rate, while the participants performed poorly with Dasher. We compared these results with those of our previous experiment [14] in Table 2. The results from Figure 14 are converted to words per minute (wpm) using the 5 characters per word standard. It is important to note that the soft keyboard in [14], eOSK, was an early prototype of SliceType that did not include key merging. Therefore, the mouse experiment result for SliceType is not very representative.

Table 2: Text Entry Rates (Words per Minute)
                          Dasher   GazeTalk   SliceType
Mouse Experiment [14]      3.50      3.07       5.42*
Eye Tracker Experiment     1.68      2.93       3.45
* Without key merging.

As seen in Table 2, SliceType has the highest text entry rate in both experiments, but the ranking between Dasher and GazeTalk is reversed. The difference between the results obtained in this experiment and the results from [...] (F(2, [...]), p < .001). To analyze the results further, we applied post-hoc comparisons with Bonferroni correction (see Table 3). It can be seen that there is a very significant difference between the text entry rates achieved with Dasher and SliceType. The difference between GazeTalk and SliceType is less pronounced (p ≮ [...]) [...]

Table 3: Post-hoc Comparisons with Bonferroni Correction Following One-way ANOVA

Figure 15: Percentage of the participants who typed < 50, 50–100, or > [...] Panels: Dasher, GazeTalk (16%, 70%, 14%), SliceType (22%, 41%, 38%). Legend, in the same order: less than 50 characters, 50 to 100 characters, more than 100 characters.
[...] were able to reach an average typing speed of 50 to 100 characters, while as many participants were able to reach high typing speeds of more than 100 characters. This shows that SliceType is easy to learn and use, and the users can speed up even more with a little practice. Only a small percentage of the participants were not able to use SliceType efficiently. If we examine the average entry rate of these participants across keyboards, we can see that they are all below average. This implies that the participants who underperformed with SliceType may have suffered from a more general problem. This can include variations in eye tracker performance based on the individual, and the participants’ level of motivation and proficiency with computers.

Figure 16: The users were asked to rank the keyboards in their order of preference for future use. Lower rankings indicate higher preference. The results are presented as percentages. Panels: Dasher, GazeTalk (19%, 54%, 27%), SliceType (8%, 38%, 54%). Legend, in the same order: ranked 3rd, ranked 2nd, ranked 1st by the users.

Finally, the participants were asked to rate the keyboards in terms of their order of preference. The results are illustrated in Figure 16. Recall that both the experiments and the survey were conducted blindly to prevent any bias towards a keyboard. The participants were also not informed of their text entry rates with the keyboards before filling out the survey. Intuitively, we can expect the users to prefer the keyboards that they can type faster with. The results of the survey can be summarized as follows:
1. 46% of the users ranked the keyboards in the same order as their typing speed.
2. 59% of the participants ranked the keyboard that they typed fastest with as the best.

Table 4: Post-hoc Comparisons with Bonferroni Correction Following Aligned-rank Transform and One-way ANOVA

[...]
Figure 17: Number of characters entered in 5 minutes by each participant, along with the participant’s ranking of the keyboard. “Ranked 1st” indicates that the keyboard is the most preferred and “Ranked 3rd” indicates that the keyboard is the least preferred by the participant. The solid black line indicates the mean, while the dashed lines indicate ± [...] Panels: Dasher, GazeTalk, SliceType. Horizontal axis: Participants. Vertical axis: Number of Characters per Minutes.

[...]

The main objective while designing a soft keyboard is to allow fast text entry with a comfortable user experience. In this paper, we investigated the specific case of gaze typing keyboards. Increasing the target sizes is the most effective way of improving a dwelling keyboard. That is because large keys are fast to navigate towards, and easy to dwell on. However, statically displaying a few large keys is not ideal. Instead, we focused on dynamically enlarging the probable target keys to use the interface area efficiently. Given a fixed-size interface, the only way to enlarge a key is to shrink others. We proposed a more extreme approach, namely removing the keys that are not likely to be used. Our analysis based on the Fitts’ law showed that this key merging approach improves text entry rate even more than word completion.

Word completion is a fundamental tool, used in nearly all contemporary soft keyboards. While the language model used to produce predictions is important, a more subtle factor is how the predictions are communicated to the user. The common way is to present predictions in a dedicated area, grouped together. The user has to look over to the area to read the predictions, and going through a long list takes time. We propose to display a prediction at each key, so that the user will already be looking at the prediction while dwelling to select a character. The Fitts’ law analysis shows that this modification has a critical impact. Moreover, since these predictions are displayed individually, it is fast, even involuntary, to read them.
A downside of this approach is that the keys not only have to be large enough to contain characters, but also words. However, the key merging function enlarges the majority of the target keys, and is thus complementary as a solution to this problem.

We experimentally compared the keyboard designed using these principles, SliceType, with two other gaze typing keyboards. The results showed that novice users typed faster with SliceType, and preferred it over the others for daily communication. This validates that the design principles we have proposed are in accordance with our objectives. An interesting result is that 68% of the participants typed fastest with SliceType, while only 54% preferred it over the other two. This indicates that further study on the subject should focus more on user comfort than on text entry rate.

References

[1] I. S. MacKenzie and S. X. Zhang, “The design and evaluation of a high-performance soft keyboard,” in Proc. Conf. Human Factors in Computing Syst. (CHI), 1999, pp. 25–31.
[2] I. S. MacKenzie, S. X. Zhang, and R. W. Soukoreff, “Text entry using soft keyboards,” Behaviour & Inform. Technol., vol. 18, no. 4, pp. 235–244, 1999.
[3] P. Isokoski, “Performance of menu-augmented soft keyboards,” in Proc. Conf. Human Factors in Computing Syst. (CHI), 2004, pp. 423–430.
[4] K. Al Faraj, M. Mojahid, and N. Vigouroux, “BigKey: A virtual keyboard for mobile devices,” in Proc. Int. Conf. Human-Comput. Interaction (HCI Int.), 2009, pp. 3–10.
[5] P. Panwar, S. Sarcar, and D. Samanta, “EyeBoard: A fast and accurate eye gaze-based text entry system,” in Proc. Int. Conf. Intelligent Human Comput. Interaction (IHCI), 2012, pp. 1–8.
[6] I. S. MacKenzie and R. W. Soukoreff, “Text entry for mobile computing: Models and methods, theory and practice,” Human–Comput. Interaction, vol. 17, no. 2-3, pp. 147–198, 2002.
[7] M. Romano, L. Paolino, G. Tortora, and G. Vitiello, “The tap and slide keyboard: A new interaction method for mobile device text entry,” Int. J. Human–Comput. Interaction (IJHCI), vol. 30, no. 12, pp. 935–945, 2014.
[8] C. Yu, Y. Gu, Z. Yang, X. Yi, H. Luo, and Y. Shi, “Tap, dwell or gesture?: Exploring head-based text entry techniques for HMDs,” in Proc. Conf. Human Factors in Computing Syst. (CHI), 2017, pp. 4479–4488.
[9] K. Grauman, M. Betke, J. Lombardi, J. Gips, and G. R. Bradski, “Communication via eye blinks and eyebrow raises: Video-based human-computer interfaces,” Universal Access in the Inform. Soc. (UAIS), vol. 2, no. 4, pp. 359–373, 2003.
[10] I. S. MacKenzie and B. Ashtiani, “BlinkWrite: Efficient text entry using eye blinks,” Universal Access in the Inform. Soc. (UAIS), vol. 10, no. 1, pp. 69–80, 2011.
[11] P. Majaranta and K.-J. Räihä, “Twenty years of eye typing: Systems and design issues,” in Proc. Symp. Eye Tracking Res. and Appl. (ETRA), 2002, pp. 15–22.
[12] A. Bulling and H. Gellersen, “Toward mobile eye-based human-computer interaction,” IEEE Pervasive Computing, vol. 9, no. 4, pp. 8–12, 2010.
[13] Y. Zhang, M. K. Chong, J. Müller, A. Bulling, and H. Gellersen, “Eye tracking for public displays in the wild,” Personal and Ubiquitous Computing, vol. 19, no. 5-6, pp. 967–981, 2015.
[14] C. Topal, B. Benligiray, and C. Akinlar, “On the efficiency issues of virtual keyboard design,” in Proc. IEEE Int. Conf. Virtual Environments Human-Comput. Interfaces and Measurement Syst. (VECIMS), 2012, pp. 38–42.
[15] D. J. Ward, A. F. Blackwell, and D. J. MacKay, “Dasher - a data entry interface using continuous gestures and language models,” in Proc. ACM Symp. User Interface Software and Technol. (UIST), 2000, pp. 129–137.
[16] J. P. Hansen, D. W. Hansen, and A. S. Johansen, “Bringing gaze-based interaction back to basics,” in Proc. Int. Conf. Human-Comput. Interaction (HCI Int.), 2001, pp. 325–329.
[17] F. Shein, G. Hamann, N. Brownlow, J. Treviranus, M. Milner, and P. Parnes, “WiViK: A visual keyboard for Windows 3.0,” in Proc. Annu. Conf. of the Rehabilitation Eng. Soc. of North America (RESNA), 1991, pp. 160–162.
[18] J. P. Hansen, A. S. Johansen, D. W. Hansen, K. Itoh, and S. Mashino, “Language technology in a predictive, restricted on-screen keyboard with ambiguous layout for severely disabled people,” in Proc. EACL Workshop on Language Modeling for Text Entry Methods, 2003.
[19] A. Huckauf and M. H. Urbina, “Gazing with pEYEs: towards a universal input for various applications,” in Proc. Symp. Eye Tracking Res. and Appl. (ETRA), 2008, pp. 51–54.
[20] M. E. Mott, S. Williams, J. O. Wobbrock, and M. R. Morris, “Improving dwell-based gaze typing with dynamic, cascading dwell times,” in Proc. Conf. Human Factors in Computing Syst. (CHI), 2017, pp. 2558–2570.
[21] J. O. Wobbrock, J. Rubinstein, M. W. Sawyer, and A. T. Duchowski, “Longitudinal evaluation of discrete consecutive gaze gestures for text entry,” in Proc. Symp. Eye Tracking Res. and Appl. (ETRA), 2008, pp. 11–18.
[22] P. Isokoski and R. Raisamo, “Device independent text input: A rationale and an example,” in Proc. Work. Conf. Advanced Visual Interfaces, 2000, pp. 76–83.
[23] M. Porta and M. Turina, “Eye-S: a full-screen input modality for pure eye-based communication,” in Proc. Symp. Eye Tracking Res. and Appl. (ETRA), 2008, pp. 27–34.
[24] A. Kurauchi, W. Feng, A. Joshi, C. Morimoto, and M. Betke, “EyeSwipe: Dwell-free text entry using gaze paths,” in Proc. Conf. Human Factors in Computing Syst. (CHI), 2016, pp. 1952–1956.
[25] M. Ashmore, A. T. Duchowski, and G. Shoemaker, “Efficient eye pointing with a fisheye lens,” in Proc. Graph. Interface (GI), 2005, pp. 203–210.
[26] G. Francis and E. Johnson, “Speed–accuracy tradeoffs in specialized keyboards,” Int. J. Human-Comput. Stud. (IJHCS), vol. 69, no. 7-8, pp. 526–538, 2011.
[27] R. Bates and H. Istance, “Zooming interfaces!: Enhancing the performance of eye controlled pointing devices,” in Proc. Int. ACM Conf. on Assistive Technologies (ASSETS), 2002, pp. 119–126.
[28] P. Majaranta, I. S. MacKenzie, A. Aula, and K.-J. Räihä, “Auditory and visual feedback during eye typing,” in Proc. CHI Extended Abstracts on Human Factors in Computing Syst., 2003, pp. 766–767.
[29] H. Venkatagiri, “Efficient keyboard layouts for sequential access in augmentative and alternative communication,” Augmentative and Alternative Communication (AAC), vol. 15, no. 2, pp. 126–134, 1999.
[30] S. Zhai, M. Hunter, and B. A. Smith, “The metropolis keyboard - an exploration of quantitative techniques for virtual keyboard design,” in Proc. ACM Symp. User Interface Software and Technol. (UIST), 2000, pp. 119–128.
[31] S. Zhai, M. Hunter, and B. A. Smith, “Performance optimization of virtual keyboards,” Human–Comput. Interaction, vol. 17, no. 2-3, pp. 229–269, 2002.
[32] G. Foster, P. Langlais, and G. Lapalme, “User-friendly text prediction for translators,” in Proc. Conf. Empirical Methods in Natural Language Process. (EMNLP), 2002, pp. 148–155.
[33] K. Grabski and T. Scheffer, “Sentence completion,” in Proc. Int. ACM Conf. on Res. and Develop. in Inform. Retrieval (SIGIR), 2004, pp. 433–439.
[34] S. Bickel, P. Haider, and T. Scheffer, “Predicting sentences using n-gram language models,” in Proc. Conf. Human Language Technol. and Empirical Methods in Natural Language Process., 2005, pp. 193–200.
[35] I. S. MacKenzie and X. Zhang, “Eye typing using word and letter prediction and a fixation algorithm,” in Proc. Symp. Eye Tracking Res. and Appl. (ETRA), 2008, pp. 55–58.
[36] M. K. Sharma, S. Dey, P. K. Saha, and D. Samanta, “Parameters effecting the predictive virtual keyboard,” in Proc. IEEE Students’ Technol. Symp. (TechSym), 2010, pp. 268–275.
[37] C. D. Manning and H. Schütze, Foundations of Statistical Natural Language Processing. MIT Press, 1999.
[38] P. M. Fitts, “The information capacity of the human motor system in controlling the amplitude of movement,” J. Experimental Psychology (JEP), vol. 47, no. 6, p. 381, 1954.
[39] A. Diaz-Tula and C. H. Morimoto, “AugKey: Increasing Foveal Throughput in Eye Typing with Augmented Keys,” in Proc. Conf. Human Factors in Computing Syst. (CHI), 2016, pp. 3533–3544.
[40] A. Gunawardana, T. Paek, and C. Meek, “Usability guided key-target resizing for soft keyboards,” in Proc. Int. Conf. Intelligent User Interfaces (IUI), 2010, pp. 111–118.
[41] I. S. MacKenzie, “Fitts’ law as a research and design tool in human-computer interaction,” Human-Comput. Interaction, vol. 7, no. 1, pp. 91–139, 1992.
[42] I. S. MacKenzie and W. Buxton, “Extending Fitts’ law to two-dimensional tasks,” in Proc. Conf. Human Factors in Computing Syst. (CHI), 1992, pp. 219–226.
[43] K. Itoh, H. Aoki, and J. P. Hansen, “A comparative usability study of two Japanese gaze typing systems,” in Proc. Symp. Eye Tracking Res. and Appl. (ETRA), 2006, pp. 59–66.