A Probabilistic Interpretation of Motion Correlation Selection Techniques
Eduardo Velloso
[email protected]
The University of Melbourne, Victoria, Australia
Carlos Hitoshi Morimoto
University of São Paulo, São Paulo, Brazil
[email protected]
ABSTRACT
Motion correlation interfaces are those that present targets moving in different patterns, which the user can select by matching their motion. In this paper, we re-formulate the task of target selection as a probabilistic inference problem. We demonstrate that previous interaction techniques can be modelled using a Bayesian approach and show how modelling the selection task as transmission of information can help us make explicit the assumptions behind similarity measures. We propose ways of incorporating uncertainty into the decision-making process and demonstrate how the concept of entropy can illuminate the measurement of the quality of a design. We apply these techniques in a case study and suggest guidelines for future work.
CCS CONCEPTS
• Human-centered computing → Interaction techniques; HCI design and evaluation methods; HCI theory, concepts and models.

KEYWORDS
motion correlation, pursuits, computational interaction, probabilistic input, gestures, gaze interaction
ACM Reference Format:
Eduardo Velloso and Carlos Hitoshi Morimoto. 2021. A Probabilistic Interpretation of Motion Correlation Selection Techniques. In CHI Conference on Human Factors in Computing Systems (CHI ’21), May 8–13, 2021, Yokohama, Japan. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/3411764.3445184
INTRODUCTION

Though the dominant interaction paradigm in current gestural user interfaces is still largely based on deictic and manipulative gestures, a different kind of interaction has been gathering interest in recent years—that of motion correlation [28]. The principle underlying this type of interaction relies on the interface presenting moving targets, each with a distinct motion pattern, which the user can mimic in order to signal the intention to select the desired target [28]. The system then measures the similarity between the signal originating from the input device and the signals corresponding to the targets
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
CHI ’21, May 8–13, 2021, Yokohama, Japan
© 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-8096-6/21/05...$15.00
https://doi.org/10.1145/3411764.3445184

moving on the screen in order to determine which target (if any) is being followed.

The literature contains a wealth of examples that successfully employ the principle, including for gaze interaction with smart watches [10, 11], public displays [18, 32], virtual reality [17], and smart homes [30]; for manual control in body-based games [2, 6] and smart TVs [3, 4, 31]; for head control of head-up displays [12]; for one-handed ring-based input [36]; for menu selection with mice [13, 35]; among others.

Despite building upon the same principle, these works employ a wide variety of input devices, data processing pipelines, similarity measures, feedback mechanisms, and evaluation procedures. As a consequence, it is challenging to compare approaches and results in order to advance the state-of-the-art of motion correlation interaction.
The standard analysis approach in related work is empirical, instantiating different designs and collecting performance data (e.g. precision, recall, selection times, and error rates) in user studies. The problem with this is that the design space of motion correlation interfaces is enormous—not only are there many design parameters to be manipulated, but these parameters are not necessarily independent.

In this paper, we propose an interpretation of motion correlation based on probability and information theory, inspired by the early work conducted by Williamson and Murray-Smith [34, 35] and aligned with current trends in computational interaction [23]. As such, we offer the following contributions: (1) We demonstrate that previous works on motion correlation can be modelled as a probabilistic reasoning problem; (2) we demonstrate a series of probabilistic techniques for better understanding combinations of interface design decisions; (3) we offer a case study of the application of these techniques in the context of a motion correlation technique,
Orbits [10]. We argue that such an account offers a holistic approach for understanding motion correlation and a consistent language for framing future work and advancing progress on the topic.
A motion correlation interface presents the user n targets, each moving according to its own characteristics, such as trajectory, velocity, phase, and direction. The user signals their intention to select a target by matching the motion of the desired target. When the system detects that the user has been following one of the targets, the selection of the corresponding target occurs. The main advantage of this technique is that users can signal their intention to select a target by mimicking its motion with their eyes, hand, head, or finger, without the need for an explicit pointing device or selection confirmation mechanism. Previous works have demonstrated a wealth of opportunities for motion correlation interfaces, including enabling calibration-free gaze interaction [30, 32], implicitly calibrating eye trackers [14, 18, 24], enabling cursorless multi-user gestural interaction [2], and enabling interaction with small displays [10, 11].

The central challenges for a motion correlation system are to detect whether the user is trying to select a target, and if so, which target they are trying to select. To accomplish this, system designers must carefully choose a set of parameters that together determine the performance of the system. These parameters include the number of targets, their motion characteristics (e.g. deterministic vs. random, path shape, velocity, acceleration, direction, etc.), the input modality and device (e.g. eye trackers, RGB cameras, depth cameras), the feedback modality and device (e.g. screens, projections, mechanical actuators), the similarity measure used to compare the projection of the user movements onto the input space with the target motions (e.g. Euclidean distance, Pearson’s correlation, frequency analysis, etc.), the amount of data being compared (i.e.
the number of samples within a window), the decision criteria for determining when a selection happens (e.g. the correlation threshold), among many others (see Velloso et al. for an overview of these parameters [28]).

The large number of parameters creates a rich design space, with many opportunities for building novel interfaces. The downside is that thoroughly exploring and understanding this design space becomes a challenge without a systematic process for comparing alternatives. Previous works have taken an empirical approach for such exploration—designing interfaces with different combinations of parameters and evaluating them through user studies. However, recruiting participants is expensive and time-consuming. In this work, we propose techniques for early design space exploration that enable us to analyse design decisions a priori.
Any interaction with a system involves a level of uncertainty, including factors such as noise, user capabilities, and sensor limitations. Probabilistic interaction techniques, in contrast to conventional deterministic ones, explicitly model this uncertainty to enhance and improve the interaction. They typically draw from probability theory, information theory, control theory, machine learning, and other related areas.

For example, Rogers et al. [25] employed a particle filtering approach to improve finger touch estimation using low-resolution capacitive sensing arrays, and Biswas et al. [1] used a Kalman filter algorithm to estimate the cursor position and the probability distribution of possible targets to improve the performance of pointing tasks for people with physical impairments. What these approaches have in common is that they model the likelihoods of possible user states and use them to improve the interaction.

A more general framework for handling input with uncertainty was proposed by Schwarz et al. [26]. The framework tracks the probabilities of alternative inputs, provides a mechanism to dispatch the input to appropriate interactors (or interface elements), and allows the interactors to provide feedback (mediation) or take other appropriate actions to resolve ambiguous input cases. Another general framework based on information theory for modelling interaction is that of
Bayesian Information Gain, which has been demonstrated in applications such as pan-and-zoom navigation and file retrieval [20, 21]. The approach interprets every user action as a clue to what the user is interested in, updating a probabilistic model of the information space. The system can then update its visualisation in order to maximise the expected information gain of the system. We apply ideas inspired by these approaches to the specific problem of selection through motion correlation.

In the context of motion correlation, we consider the literature on probabilistic methods used in gestural user interfaces. A gesture can be represented as a temporal sequence of measurement samples provided by at least one position or motion sensor, for example. A particular gesture can be recognised by matching the shape of the trajectory described by the user’s motion to expected known shapes, as seen in Figure 1a. Sensor noise and personal motion traits are possible sources of variation in the path obtained by the input device that are challenging for any gesture recognition system. Statistical shape analysis methods [9] have been designed to handle such variations in the interpretation of gestures. Active shape and appearance models are examples of such techniques used in computer vision to find objects (such as hands and faces) in images [5] and for recognition of handwritten characters [27].
Figure 1: a) The basic motion correlation idea selects a target moving along a known trajectory using some similarity measure; and b) shows the idea of the pointing without a pointer method proposed by Williamson and Murray-Smith. Black arrows indicate the expected motion of each target. To select target x_i the user must stabilise it by moving the mouse in the opposite direction m.

Previous motion correlation methods have mostly taken a deterministic approach for detecting target selection. Typically, they measure the similarity between the projection of the user’s movements onto the input space and each target’s motion; then, if the similarity crosses a given threshold, the target is selected.

A notable exception was the “pointing without a pointer” technique proposed by Williamson and Murray-Smith [35]. Their idea was to present the user several targets moving according to a pseudo-random disturbance. An arrow attached to each target shows the user the direction in which the target is moving due to the disturbance. To select a target, the user must try to stabilise its motion by moving the mouse in the opposite direction of the
target’s arrow, as illustrated in Figure 1b. The position of a target x_i at time t is computed by adding the mouse motion m to the disturbance for x_i. The variance of this sum, σ_si, is computed over a time window w. The variance of the disturbance applied to each target, σ_fi, is also used to compute the ratio r_i = σ_si / σ_fi. A small r_i can be used as evidence that the user is trying to select x_i. Williamson and Murray-Smith [35] define a threshold v such that when r_i < v, the probability of x_i increases, and decreases otherwise. Starting from an even prior for all targets, they keep track of an intermediary weight value for each target, which increases additively by α × r_i when r_i < v, and is attenuated multiplicatively by β otherwise. These intermediate values are then normalised to produce the new probabilities and, when the resulting entropy falls below a threshold, the target i with the highest p(x_i) is selected. The choices of α, β, and v affect how quickly and how accurately the system decides that the user is making a selection.

In comparison to deterministic approaches, Williamson and Murray-Smith’s probabilistic approach has several advantages. The first is that it considers the probability of the targets jointly and provides a measure of uncertainty that can be used to indicate the most probable target when the uncertainty is low, i.e., when the entropy of the system is sufficiently small. The second is that it allows the evidence from the measurements to be integrated in a probabilistic update process as a function of prior probabilities. A third advantage compared to typical probabilistic gesture-based methods is that it neither requires training nor a specific shape model.
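The update rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than Williamson and Murray-Smith's actual implementation; the parameter values (v, α, β, and the entropy threshold) are hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a discrete distribution."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def update(weights, ratios, v=0.5, alpha=1.0, beta=0.5):
    """One step of the weight update described above: when the variance
    ratio r_i falls below the threshold v, the target's weight grows
    additively by alpha * r_i; otherwise it is attenuated by beta.
    The weights are then normalised into probabilities."""
    new = [w + alpha * r if r < v else w * beta
           for w, r in zip(weights, ratios)]
    total = sum(new)
    return [w / total for w in new]

# Start from an even prior over three targets; the user stabilises
# target 0, so its variance ratio stays low across windows.
p = [1 / 3, 1 / 3, 1 / 3]
for _ in range(5):
    p = update(p, ratios=[0.1, 0.9, 0.8])

# Select the most probable target once the uncertainty is low enough.
if entropy(p) < 1.0:
    selected = max(range(len(p)), key=lambda i: p[i])
```

After a few windows of consistent evidence, the probability mass concentrates on target 0 and the entropy drops below the (hypothetical) decision threshold.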
Despite this early work, works in motion correlation since then have not used probabilistic techniques for inferring which target the user is trying to select.

In this work, we take direct inspiration from the pioneering work of Williamson and Murray-Smith, abstracting it to model later approaches and extending it to enable the analysis and design of new interfaces. Our work is framed within the larger program of computational interaction, which “focuses on the use of algorithms and mathematical models to explain and enhance interaction” [23]. Core to computational interaction is mathematical modelling, typically involving ways of updating the model using data observed from the user [23]. In this paper, we model motion correlation using a probabilistic approach, which enables us to derive a set of techniques for examining motion correlation designs before collecting user data.

Our thesis in this paper is that we can better understand the principles and make more informed and confident interface design decisions by looking at motion correlation interfaces from a probabilistic perspective. As such, our goal is to bring ideas from probabilistic interaction to researchers and practitioners working on the development of motion correlation interfaces. In this and the next few sections, we offer a didactic primer on probabilistic input and demonstrate how previous motion correlation techniques can be formulated in probabilistic terms. As we discuss how to model deterministic techniques as probabilistic, we formalise a set of principles that are useful for the design of motion correlation interfaces.

In addition, we offer a practical example of how to employ the techniques we propose in the case study of
Orbits [10], a gaze-only motion correlation technique for interaction with smart watches. The design of
Orbits was done through an empirical process, collecting user data at each stage of the development. Here, we demonstrate the additional insights that are gained by analysing these interactions through a probabilistic framework.

To begin, we demonstrate how to formalise the motion correlation selection task in a notation that allows us to discuss it in probabilistic terms. The basic challenge in any selection task is to discover the user’s state (e.g. the intention to select a target, if any) given a system state (e.g. a graphical user interface with moving targets) and a sequence of user behaviours (e.g. matching the motion of the intended target, or natural behaviour when not attempting to make a selection). From a probabilistic perspective, we can model these as random variables, where the user behaviour is influenced by the user state and the system state. If we break the system state into separate random variables for each target, we can build the Bayesian network in Figure 2.
Figure 2: In a motion correlation interface, the user movement is influenced by the user state and by the system state, represented here by each separate target’s motion. Observed variables are in grey.
More formally, the user state is a random variable with a sample space containing the intention to select each target (x_i) and the intention to not select any of them (x_∅).

UserState = X ∈ {x_∅, x_1, ..., x_N}    (1)

The target motions (T_i) and the user behaviour (U) are sequences of two-dimensional coordinates, not necessarily in the same coordinate system—e.g. the input device might not be calibrated to the output device.

Like all Bayesian networks, the model in the figure encodes a joint distribution over the modelled variables. Missing edges reflect independence assumptions that are necessary to make probabilistic inferences efficient at runtime (computationally) and in the observations needed for model fitting (statistically). In this case, we know that the variables X, T_1, ..., T_N are pairwise independent by construction of the target motions.

During the interaction, the target motions and the user behaviour are our evidence variables (denoted in the figure in grey), as they are directly observable, whereas the user state is our query variable (denoted in the figure in white), which we would like to infer. This is specified as the following conditional probability, which we can unpack with Bayes’ rule.

P(X | U, T_1, ..., T_N) = P(U | X, T_1, ..., T_N) P(X) / P(U, T_1, ..., T_N)    (2)

The key term in this equation is the likelihood P(U | X, T_1, ..., T_N). This term computes, for each possible user state, the probability of observing a given user behaviour given the motions available on the interface and the user state. A well-designed motion correlation interface displays motions that elicit sufficiently unusual user movements—decreasing P(U | x_∅, T_1, ..., T_N)—and that could only happen given that the motion is displayed on the interface—increasing P(U ≈ T_i | x_i, T_1, ..., T_N)—while balancing subjective user experience factors such as comfort, effort, and social acceptability. The challenge lies in how to compute this likelihood.

The motions of the targets should also be sufficiently different from each other, so that P(U ≈ T_j | x_{i≠j}, T_1, ..., T_N)—the likelihood of observing a user movement similar to target j when trying to select target i—is small. Such difference can be created a priori through the path design or in real time, as in Williamson and Murray-Smith’s decorrelation approach [35]. In their approach, as the probability of a target increases, so does the range of the disturbances on the target’s motion characteristics (e.g. path, speed, etc.), to make the motion of the target even more distinct from the others, as if testing the hypothesis that the user is indeed trying to select it.

With this framing, we can introduce a formal notation for known principles of motion correlation design.

Principle 1: Motions in the interface should elicit a similar movement from the user
We aim to maximise P(U ≈ T_i | x_i, T_1, ..., T_N)—the likelihood that, given a set of target motions and given that the user is trying to select target i, the movement observed from the user is similar to the motion of target i.

Principle 2: Targets should elicit motions distinct from non-interaction behaviour
The motions of the targets on a motion correlation interface should not elicit movements that frequently occur in the user’s natural behaviour when they are not interacting with the system. Formally, we aim to minimise P(U ≈ T_i | x_∅, T_1, ..., T_N)—the likelihood of observing a user movement similar to a target motion when the user is not interacting with the system.

Principle 3: Targets should display motions distinct from each other
We aim to minimise P(U ≈ T_j | x_{i≠j}, T_1, ..., T_N)—the likelihood of observing a user movement similar to target j when the user is trying to select a different target i.

In their evaluation of the performance of
Orbits, Esteves et al. measured the effects of the number of targets (2, 4, 8, 16), trajectory size (4.25°, 2.62°, and 0.98°) and target speed (120°/s, 180°/s, and 240°/s) on the true and false positives in a selection task [11]. In all conditions, they used a fixed window size of 1 s, a 30 Hz eye tracker, and a Pearson’s correlation threshold of 0.8. The targets moved along a circular trajectory, with half of the targets moving in the opposite direction of the other half. This is representative of the types of evaluation procedures found in previous works.

In the case of Orbits, the user state is the desire to select a target or not, and their behaviour is captured by the system as
X, Y coordinates provided by the eye tracker. Because the user most often looks straight at the screen, it is expected that when following a target, the user’s eyes will perform a smooth pursuit movement that closely resembles the motion of the target (Principle 1). The head-mounted eye tracker employed in
Orbits captures data even when the user is not interacting with the system, so we must consider that the x_∅ state will be over-represented—that is, most of the time, the user will not be interacting with the system. However, because the eyes only engage in smooth pursuits when there is a moving target for them to track, if the system is able to accurately detect whether the eyes are in this state, Principle 2 should be satisfied. The targets move around circular trajectories. Differently from shapes with straight edges, no two windows in a circle are the same, so all targets will exhibit a different motion at all points in time (satisfying Principle 3).

The discussion above emphasises that, ultimately, we seek to compute likelihoods. The novel insight in the motion correlation literature was that we can estimate these likelihoods from the similarity between the motion of the target and the movement that the user is making as captured by the input device. We can model this interaction from an information theory perspective, as transmission of information [16]. From this perspective, when the user is attempting to select a target, that target’s motion is like a message passing through a noisy channel and manifesting itself as the user behaviour. As such, the measurements U(t) can be seen as a distorted version of the selected target trajectory (or none). These distortions might include noise, delays, and spatial transformations introduced by the sensors, processing hardware, geometric setup, and user movement inaccuracies. We represent these distortions as a spatial transformation matrix A, a time delay τ, and a noise component ε. Note that here we assume linear distortions, whereas distortions found in the real world, such as those introduced by camera lenses, might not be linear. More complex distortions require more sophisticated modelling approaches or different assumptions made by the interaction technique.
X = x_i ⟹ U(t) = A · T_i(t − τ) + ε(t)    (3)

By treating the interaction process as transmission of information, we can detect the evidence of whether a selection is taking place by measuring how much information is being transmitted. The mutual information—the amount of information you gain about one variable by learning the value of another [22]—can then be used as evidence that the interaction is taking place and that a selection is being attempted. This consolidates Principles 1–3 into a unified principle:
Principle 4: The intent to select is evidenced by the amount of mutual information between the target and user behaviours

P(x_i) ∝ I(U; T_i)—that is, the higher the mutual information between the motions of the user and the target, the higher the likelihood that the user intends to select that target.

The insight that mutual information is the key idea behind the selection means that the term “correlation” in “motion correlation” should be understood in the broad English sense of the word rather than as Pearson’s correlation coefficient. In practice, however, computing the mutual information is challenging [33], so we must estimate it using other similarity measures, such as Pearson’s product-moment correlation coefficient, as used in Vidal et al.’s Pursuits [32]. We denote this similarity measure as r(U, T_i) = r_i. Therefore, instead of computing P(X | U, T_1, ..., T_N), we can compute P(X | r_1, ..., r_N), which is much easier. Our network then becomes the one shown in Figure 3.

Figure 3: By measuring the similarity between the user movement and the motion of each target, we can more easily compute the probability that the user is in a given state. Observed variables are in grey.
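To make the similarity pipeline concrete, the following sketch (our illustration; the window length, sampling rate, and distortion values are hypothetical) computes a Pearson-based similarity between a simulated gaze window and two circular targets moving in opposite directions, as in Orbits. Because Pearson's r is invariant to translation and scale, the uncalibrated "gaze" signal still correlates perfectly with the target it follows.

```python
import math

def pearson(a, b):
    """Pearson's product-moment correlation of two equal-length signals."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa > 0 and sb > 0 else 0.0

def similarity(u, t):
    """Average the per-axis correlations of two 2-D trajectories
    (one of several variants used in the literature)."""
    return (pearson([p[0] for p in u], [p[0] for p in t]) +
            pearson([p[1] for p in u], [p[1] for p in t])) / 2

def circular_target(direction, n=30, hz=30.0, speed=180.0):
    """One second of samples of a target orbiting at `speed` deg/s."""
    w = direction * math.radians(speed)
    return [(math.cos(w * k / hz), math.sin(w * k / hz)) for k in range(n)]

t1 = circular_target(+1)   # one direction
t2 = circular_target(-1)   # the opposite direction
# Uncalibrated "gaze": the user follows t1, but the signal is scaled
# and offset relative to the display coordinates (the matrix A and
# offset of Equation 3, without noise).
gaze = [(1.3 * x + 0.2, 1.3 * y - 0.1) for x, y in t1]
r1, r2 = similarity(gaze, t1), similarity(gaze, t2)  # r1 ≈ 1.0, r2 ≈ 0.0
```

Note how the counter-rotating target scores near zero: its x-coordinate matches the gaze perfectly while its y-coordinate is perfectly anti-correlated, so the per-axis average cancels out, which is exactly why opposite directions are a cheap way to satisfy Principle 3.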
Besides Pearson’s coefficient, many other similarity functions have been used in previous work. We have already described how Williamson and Murray-Smith used a ratio of variances as their similarity function [35]. Fekete et al. proposed several measures computed on the path derivatives, including the Euclidean distance, the normalised Euclidean distance, and Pearson’s correlation, ultimately opting for the normalised Euclidean distance [13]. Carter et al. also used Pearson’s correlation, but prior to computing the coefficient, they rotated the data to ensure an even distribution of variance between the horizontal and vertical axes [2]. Velloso et al. proposed a version of Pearson’s correlation that considered the variance in both axes simultaneously [29]. Drewes et al. used the linear regression slope [8]. Zhang et al. compared the dominant frequencies of the signals [36].

The choice of similarity measure depends on the assumptions about the data. For example, if we assume no calibration between the input device and the display—i.e. we do not know the transformation matrix A—it becomes difficult to find appropriate thresholds for using the linear regression slope or anything based on Euclidean distances. Though measures based on Pearson’s correlation work well in these cases, they are susceptible to statistical factors known to affect this coefficient, such as the distributions of the data being compared, the variability of the data, the lack of linearity, the presence of outliers, etc.—for a review, see Goodwin and Leech [15]. Among them, one that is of particular relevance to the design of motion correlation interfaces is the fact that it expects a similar distribution (preferably normal) in the samples being compared. This is difficult to guarantee with regular shapes, such as the square, where there might be windows with no variation in one of the coordinates when the target is traversing one of the edges.
The correlation coefficient is also susceptible to the presence of outliers, and this effect is more pronounced in small windows. As a rule of thumb, Drewes et al. suggest that the window size should be set to allow the target to move 3–4 times the amplitude of the measurement noise [7]. The choice of measure also depends on which properties characterise the differences between the target motions. Certain measures, like Zhang et al.’s frequency analysis [36], work best when targets display motions with different frequencies, whereas others, like Pearson’s correlation, work best when the targets display motions with different phases, but the same frequency.

Most systems assume a sufficiently small lag τ and error ε(t), and a stable calibration matrix A. However, these can be measured in pilot tests and incorporated into the calculation of the similarity, either by shifting the windows being compared (in the case of a substantial τ) or by filtering the input data (in the case of a substantial ε(t)).

Principle 5: The appropriateness of a similarity measure is determined by how well it estimates the mutual information
Similarity measures should be chosen based on the possible distortions of the channel and how much information about the user state can be determined from the knowledge of the target states.
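As a rough illustration of this principle, mutual information between a user signal and a target signal can be estimated with a simple plug-in estimator: quantise both signals into bins and sum over their joint histogram. This is one possible sketch (our construction, with arbitrary bin counts and test signals); practical estimators need more careful binning and bias correction, which is part of why computing mutual information directly is challenging.

```python
import math
from collections import Counter

def mutual_information(u, t, bins=4):
    """Plug-in estimate of I(U; T) in bits from two 1-D signals,
    quantised into equal-width bins over their combined range."""
    lo, hi = min(u + t), max(u + t)
    width = (hi - lo) or 1.0
    q = lambda x: min(int((x - lo) / width * bins), bins - 1)
    n = len(u)
    pu, pt, joint = Counter(), Counter(), Counter()
    for a, b in zip(u, t):
        pu[q(a)] += 1
        pt[q(b)] += 1
        joint[(q(a), q(b))] += 1
    # I(U;T) = sum p(a,b) log2( p(a,b) / (p(a) p(b)) )
    return sum(c / n * math.log2((c / n) / ((pu[a] / n) * (pt[b] / n)))
               for (a, b), c in joint.items())

target = [math.sin(0.2 * k) for k in range(100)]
follower = [0.8 * s + 0.1 for s in target]   # faithful (scaled) pursuit
unrelated = [0.0 for _ in range(100)]        # no interaction signal

mi_follow = mutual_information(follower, target)   # clearly positive
mi_none = mutual_information(unrelated, target)    # zero: no information
```

A signal that tracks the target transmits information about it and scores high; a signal that ignores the target scores zero, regardless of which similarity measure one would otherwise choose.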
Because
Orbits was designed for interaction with smart watches while the user is wearing a head-mounted eye tracker, we cannot assume that the gaze data will be in the same coordinate system as the GUI on the watch. This means that the matrix A in Equation 3 will contain a scaling factor and that the noise component ε(t) will have a constant translation offset together with a random component. We assume that the two devices will be synchronised, making τ sufficiently small to be ignored. The user’s head and the watch might not be perfectly aligned. This is a problem, particularly in the case of targets moving around a circle, as the rotated image will lead to ambiguity. This can either be fixed by using the scene camera of the eye tracker/the IMU in the watch to re-align them or by assuming that this alignment is a pre-condition for the interaction, so we will not assume that A contains a rotation component. These assumptions mean that certain similarity measures cannot be applied. Ones based on Euclidean distance cannot be used because they are not invariant to translation and scale [13]; one based on the linear regression slope cannot be applied because it is not invariant to scale [8]; and one based on frequency analysis would fail due to the fact that all targets move with the same frequency [36].

Previous works typically make a decision about which target to select deterministically, by simply thresholding the similarity at an empirically obtained value λ. We can model these approaches by using the following conditional probability density functions.

p(x_i | r_i) = 1 if r_i > λ, and 0 otherwise    (4)

p(x_∅ | r_1, ..., r_N) = 1 − Σ_{i=1}^{N} p(x_i | r_i)    (5)

This formulation already exposes a few problems with this approach. First, it does not deal well with cases where more than one target crosses the threshold—all targets that cross get assigned an equal likelihood of 1.
In practical implementations, this is typically solved by only selecting the target with the highest similarity. Nevertheless, it would be good to be able to incorporate this ambiguity into the decision of whether to make a selection and to provide some form of user feedback, reflecting in the interface the uncertainty in the system (for example, Williamson and Murray-Smith showed a growing circle around the target as its likelihood increased in relation to the other targets [35], and Schwarz et al. delayed system actions that were difficult to reverse until the system was confident enough of the user intent [26]).

The second insight is that the step function lacks nuance. Particularly for values near the threshold, this can cause the probability to alternate between 0 and 1 with small perturbations in the output of the similarity measure. A smoother thresholding function could help address this problem and also assign higher probabilities to targets with a higher similarity. A simple solution is to use a probability density function with a soft threshold, such as the probit or logit functions. These approaches do not suffer from the discontinuity at the threshold, smoothly increasing their outputs as the likelihood of the input increases. The downside of these approaches is that they still require the system designer to manually specify a mean threshold.

A potentially better approach is to collect user data to learn these distributions. To do so, we must collect data of users attempting to select targets, along with the target motion data, in order to compute the similarity between them. Likewise, we must also collect data of users not trying to select any target, but rather performing tasks appropriate for the context of the system. For example, a video-on-demand application, as featured in AmbiGaze, should consider collecting data not only of users attempting to control the interface widgets, but also of users watching a variety of videos [30].
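The contrast between the hard threshold of Equations 4–5 and a logistic soft threshold can be sketched as follows. The threshold λ = 0.8 matches the Orbits evaluation described earlier; the steepness k is a hypothetical tuning parameter.

```python
import math

LAMBDA = 0.8  # similarity threshold, as in the Orbits evaluation

def step_likelihood(r, lam=LAMBDA):
    """Hard threshold: every target above lambda looks equally likely."""
    return 1.0 if r > lam else 0.0

def soft_likelihood(r, lam=LAMBDA, k=20.0):
    """Logistic soft threshold: smooth in r, so small perturbations
    near lambda no longer flip the output between 0 and 1, and a
    similarity of 0.95 ranks above one of 0.85."""
    return 1.0 / (1.0 + math.exp(-k * (r - lam)))

hard = (step_likelihood(0.85), step_likelihood(0.95))  # both cross: (1.0, 1.0)
soft = (soft_likelihood(0.85), soft_likelihood(0.95))  # ordered: ~(0.73, 0.95)
```

The hard threshold cannot distinguish two targets that both cross λ, which is exactly the ambiguity discussed above; the logistic version preserves the ranking and degrades gracefully near the threshold.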
One can then simulate target motions to compute the similarity values against the natural behaviour data. The system can then compute the similarity between the gaze behaviour and the true target and estimate a probability density function p(r_i | x_i), where x_i is the true state. The other distributions p(r_i | x_j), where i ≠ j, can be computed by modifying the gaze or trajectory data (e.g. if the target motions differ in phase, this can be simulated by shifting the data). Finally, the trajectory data of the target motion can be compared to the natural behaviour dataset to generate the distribution p(r_i | x_∅).

By applying Bayes' rule, we notice a few more issues:

P(x_i | r_1, …, r_N) = P(r_1, …, r_N | x_i) P(x_i) / P(r_1, …, r_N)   (6)

In order to obtain the step function used in deterministic approaches, the underlying assumption in these works is that, when the user is following target i,

P(r_i | x_i, r_1, …, r_{i−1}, r_{i+1}, …, r_N) = P(r_i | x_i)   (7)

This is not necessarily a problem: by making the similarities conditionally independent given x_i, we reduce the complexity of the inference; this is the assumption that all Naïve Bayes models make. However, the r_i are not necessarily independent: if the motion of a target T_1 is correlated with the motion of target T_2, and T_2 is correlated with T_3, most likely T_1 is also correlated with T_3.¹ The problem is that we often only consider P(r_i | x_i) when computing the likelihood, when in fact all the other P(r_j | x_i) also give us evidence for the probability P(x_i | r_1, …, r_N).

Shifting our attention to the other terms in the equation, the denominator P(r_1, …, r_N) is just a normalisation factor that makes the probabilities add up to 1. The prior probability P(x_i), on the other hand, can open up opportunities for tuning the algorithm.
A naïve, though standard, approach would be to set all prior probabilities to be the same, effectively making this term irrelevant for the state estimation. This is the approach underlying current deterministic approaches, even though there is no real reason to assume that all prior probabilities would be the same in every case. The beauty of probabilistic input is precisely in enabling developers to incorporate prior knowledge in a quantifiable manner.

Another approach would be to use domain knowledge or separate classifiers to estimate reasonable values. For example, a motion correlation interface based on smooth pursuits might use a separate classifier to identify when the eyes engage in a smooth pursuit and use its output as a prior. Alternatively, a gesture recognition algorithm could be used to determine the probability that the gesture has the right shape before trying to determine which target along that shape is being followed. Yet another approach would be to collect statistical data of application usage to compute these values. This is particularly valuable in cases where users exhibit consistent usage patterns, such as when typing or navigating hierarchical menus. A final possibility is using the probabilities computed at previous points in time to adjust our confidence over time. From this perspective, the Bayesian networks we presented so far can be extended into temporal inference models, leveraging techniques such as Kalman filters [1] and particle filters [25].

¹ It is important to note that T_1 and T_3 will only be positively correlated if r(T_1, T_2) and r(T_2, T_3) are sufficiently close to 1, but not necessarily otherwise, as demonstrated by Langford et al. [19]; in other words, being positively correlated is not a transitive property.
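As a sketch of this last idea, yesterday's posterior can serve as today's prior in a simple recursive Bayesian update over the candidate states. The likelihood values below are hypothetical; a real implementation would read them off the probability density functions discussed above.

```python
def bayes_update(priors, likelihoods):
    # One naive-Bayes step: combine per-state priors with the instantaneous
    # likelihoods p(r | x) and renormalise into a posterior.
    posterior = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(posterior)
    if total == 0:  # no evidence for any state: fall back to the priors
        return list(priors)
    return [p / total for p in posterior]

# Three targets plus the null state, starting from uniform priors.
belief = [0.25, 0.25, 0.25, 0.25]

# Hypothetical instantaneous likelihoods over three consecutive windows:
# evidence for target 0 accumulates over time.
for likelihoods in ([0.6, 0.5, 0.1, 0.2],
                    [0.7, 0.4, 0.1, 0.2],
                    [0.8, 0.2, 0.1, 0.1]):
    belief = bayes_update(belief, likelihoods)  # yesterday's posterior is today's prior

print([round(b, 3) for b in belief])  # belief concentrates on target 0
```

More sophisticated temporal models (Kalman or particle filters) refine this same prior-update loop with explicit dynamics and noise models.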
Esteves et al. used a single similarity threshold (λ = .8) in Orbits [10]; to demonstrate our approach, we used their dataset. This dataset contained data of users following circular targets with their eyes, as well as other natural gaze data, including reading the time, watching a video, playing a video game, and reading a news article. The gaze data was sampled at 30 Hz and compared in windows of 30 samples. We computed the similarity between the gaze data and the target using three metrics: the Pearson's correlation coefficient of the lowest axis proposed by Vidal et al. [32], the modification of this measure involving a rotation of the windows to maximise the variance across axes proposed by Carter et al. [2], and the complement of the ratio of the sum of squares computed in two dimensions proposed by Velloso et al. [29]. In the cases where the user was following a target (X = x_i), the similarity was computed between the gaze data and the target positions. In the case where the user was not following any target (X = x_∅), we simulated a target using the same characteristics of the targets in the positive condition. Figure 4 shows the probability density functions P(r_i | x_i) and P(r_i | x_∅), computed using a Gaussian kernel density estimate. As predicted, the empirical probability density functions have a much softer distribution compared to the step function assumed in deterministic approaches. These curves can then be used to choose an appropriate threshold that gives more weight to robustness or responsiveness, depending on the requirements of the system.

In the previous section, we discussed how prior approaches only take into account the similarity between the target motion and the projection of the user movement onto the input space when computing the likelihood that the user is trying to select that target. However, even if the user is following the target perfectly, without any distortion (i.e. U = T_i), this motion can nevertheless present a degree of positive correlation with the other targets, especially if the trajectories are not carefully designed. Ideally, if the true state is x_i, r(U, T_i) should be much higher than r(U, T_j) for all j ≠ i. However, depending on the choice of similarity measure, trajectory shapes, and window sizes, this is not always achievable.
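The kernel density estimation step behind curves like those in Figure 4 can be sketched as follows. The similarity samples here are synthetic stand-ins for the Orbits data, and the fixed bandwidth is an illustrative simplification.

```python
import math, random

def gaussian_kde(samples, bandwidth=0.05):
    # Density estimate built from Gaussian kernels centred on each sample.
    # A fixed bandwidth keeps the sketch simple; a real implementation would
    # pick it from the data (e.g. Silverman's rule).
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    def pdf(r):
        return norm * sum(math.exp(-0.5 * ((r - s) / bandwidth) ** 2) for s in samples)
    return pdf

random.seed(1)
# Synthetic similarity samples: r while following a target (high, tight)
# versus r during natural, non-selecting behaviour (broadly spread).
r_following = [min(1.0, random.gauss(0.9, 0.05)) for _ in range(500)]
r_natural = [max(-1.0, min(1.0, random.gauss(0.2, 0.3))) for _ in range(500)]

p_r_given_x = gaussian_kde(r_following)    # estimate of p(r_i | x_i)
p_r_given_null = gaussian_kde(r_natural)   # estimate of p(r_i | x_null)

# The densities overlap, so any threshold trades robustness for responsiveness.
print(p_r_given_x(0.9) > p_r_given_null(0.9))   # True
print(p_r_given_x(0.2) < p_r_given_null(0.2))   # True
```

Evaluating both densities at a candidate threshold quantifies the robustness/responsiveness trade-off directly, instead of guessing λ.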
Figure 4: Probability density functions for three similarity measures found in the literature. Note that all measures yield distributions that overlap and are therefore substantially different to the step function assumed by deterministic methods.

Figure 5: When the path has straight edges that are longer than the distance traversed by the target within the window, multiple trajectory windows will look the same, making it impossible to distinguish between them. In the example, we see three targets traversing the edge of a square path one after another. Even though they are temporally offset, their relative trajectories will look the same until they reach the corner.

Understanding the expected levels of similarity between U and all T_j when the user is following T_i can therefore help us make decisions about the quality of the trajectory design. This problem is exacerbated in the case of polygons where the distance traversed by the targets within a window is smaller than the length of the edges, such as the one in Figure 5. The figure shows a square where targets traverse half the length of the edge within a window and all three targets move in the same direction. The trajectories in all three windows (as well as all other windows between them) look exactly the same in terms of their relative motion. If a calibration-free system observes a straight vertical line from the input device, it will know that the user is following one of the targets, but not which one. In this case, the system would have to wait until the targets start hitting the corner of the polygon to be able to tell them apart.
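This ambiguity is easy to verify numerically: two windows sampled from the same straight edge, offset in time, are perfectly correlated and therefore indistinguishable to a correlation-based measure. A minimal sketch:

```python
def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

# Two targets on the same vertical edge of a square, offset by 5 samples;
# their y coordinates within a 10-sample window:
target_a = [float(i) for i in range(0, 10)]   # y = 0..9
target_b = [float(i) for i in range(5, 15)]   # y = 5..14: same edge, later in time

print(pearson(target_a, target_b))  # 1.0: the windows are indistinguishable
```

Only once one of the windows includes the corner does the correlation drop below 1 and the targets become separable.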
From an information theory perspective, the corner of the square has more information than the edge, making it more distinguishable. In practice, the differences in the information content of the windows of a trajectory mean that the point in time where the user begins to follow a target matters for the interaction, when ideally it should not: a target that is just entering the edge of the square will be harder to select than one that is about to hit the corner. This might affect the interaction by delaying the selection decision. If we are able to quantify how distinct any given window in a trajectory is from the others, we can have an initial estimate of the quality of that trajectory for a motion correlation interface.
Figure 6: The window size affects the overall entropy of the system. Different windows have different amounts of information, leading to varying levels of entropy. The x-axis represents the index of the window. Increasing the window size does not necessarily lead to a smaller entropy.

The key idea here is that motion correlation interfaces must consider not only the targets' trajectories (i.e. space + time), but also the path itself (i.e. space alone). As such, we propose the use of entropy for understanding how distinguishable each window along a path is from the rest. For each window, we compute the similarity between that window and every other window and normalise these values to obtain probabilities. We then compute the entropy for that window. This measures how similar this window is to all others: a low value of entropy means that there are few windows that capture most of the high similarity values, which is what we want. We finally compute the entropy for all windows of the trajectory to see how it fluctuates as the target moves. The idea here is that by looking at the path itself, comparing the possible windows sampled from it to each other (as opposed to comparing them to windows sampled from the input), we can estimate the upper bound of the confidence that the system can have at a given point in time (as the path itself would correspond to perfect motion matching with no noise).

To demonstrate this technique, we generated a square trajectory with 120 points. We then computed the entropy for all possible windows along the trajectory, varying the window size; this is equivalent to having 120 targets moving along a square while the user follows one of them with no distortion. We used the Pearson's correlation of the most dissimilar axis [10, 32] as the similarity measure, and we repeated this process for windows ranging from 5 to 120 samples. In total, we computed 116 (window sizes) × 120 (windows per size) entropy values.
Figure 7: Average entropy across all windows as a function of the window size, for a square with 120 samples. A window size of approximately 1/3 of the perimeter minimises the entropy.

As the window size increases, the entropy curve becomes sinusoidal (Figure 6-C), with higher baselines and smaller amplitudes, until it becomes a flat horizontal line when the window size is 100% of the perimeter (Figure 6-D). Figure 7 shows how the average entropy across the whole trajectory varies with the window size; each point in the curve is the average of a curve like the ones in Figure 6. In this case, the optimal window size would be 1/3 of the perimeter, but this will depend on the similarity measure and trajectory shape. Naturally, this window size is only optimal in terms of how well it distinguishes a window from the other windows in the shape; in practice, the optimal window will also depend on other factors, such as the interaction latency and robustness to noise. Nevertheless, this analysis can help us reduce the design space of possible window sizes to explore in empirical studies. Further, it allows us to quantify the effect of the starting point of the interaction on the expected performance of the system.

The previous analysis shows that the entropy for each window in the shape is affected by the amount of information in each segment of the shape. An analysis like this can show a priori that the system will have a higher degree of confusion at the edges than at the corners, so the developer can choose to defer a decision until the target reaches the corner or to change the system parameters (e.g. adjusting the window size to always include a corner). Further, the variations in entropy are not simply due to the shape of the trajectory: the measure itself leads to variations in performance along the same shape.
Figure 8: Similarity between a reference target T and every possible target T_i within the same window. Even if the user managed to perfectly follow the target, they would still yield a high similarity with other targets.

This is best demonstrated by how the same pairs of input and target motion windows can yield different similarity values depending on the coordinate system (see a concrete example in the case study below).

In Orbits, because targets move around circular trajectories, no two windows will look exactly the same, regardless of the window size, avoiding the problem shown in Figure 5. However, depending on the point along the trajectory through which the target is passing and how far apart the targets are, they might nevertheless exhibit high correlation. The easiest way to reveal this is to make pairwise comparisons between the target windows themselves (in contrast to the usual approach of comparing the targets' and the user's behaviours). Figure 8 shows an example of this type of analysis in the context of Orbits. Consider 10 targets moving around a circle with 160 samples. The figure shows a comparison between one arbitrary window (w = 15) and all other windows using Vidal et al.'s method based on Pearson's correlation [32]. By comparing the targets to each other, we analyse the system performance as if the user were able to perfectly match the desired target's motion, giving us an upper bound on the system performance. As the bar chart shows, the similarity between the reference target and itself is perfect (r = 1), but neighbouring targets also yield similarities of .96 and .85. In practice, this means that even if the user follows the motion of the target perfectly, we will still observe a high correlation with other targets.

The choice of similarity measure also affects how the entropy of the design fluctuates along a path. To demonstrate this in the context of Orbits, we generated a circular trajectory (radius = 1) with 60 samples, from which we sampled a window with 20 samples. We then simulated user input by distorting this trajectory with Gaussian noise (SD = .1). We then computed the similarity between the original window and its distorted version using four similarity metrics described in the literature [2, 7, 29, 32]. We then rotated these windows (both the original and the distorted) by angles ranging from zero to 360 degrees, computing the similarity for each rotation.

Figure 9-Left shows our results for the four similarity measures. Though the absolute value of the similarity alone is not so important in this case (what matters is how well the measures discriminate between correct and incorrect targets), there should not be much fluctuation in this value. This is because the data being compared is exactly the same; the only difference is how the windows are orientated relative to the coordinate system. Notice that three of the measures are substantially affected by the orientation of the window. The plots in Figures 9-Centre and 9-Right help to illustrate why. In the case of the blue window, because there is little variance in the trajectory data in the Y axis, the noise in the data ends up having a larger effect, leading to a lower correlation than for the pink window. This is the motivation behind the Rotated Correlation, which, prior to calculating the correlation coefficient, rotates the window to better distribute the variance across the axes [2].
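A sketch of this idea, in the spirit of Carter et al.'s Rotated Correlation [2] (the published measure's exact implementation details may differ): rotate both windows so that the target's principal axis sits at 45 degrees, spreading the variance over both axes, before taking the per-axis Pearson correlations.

```python
import math, random

def pearson(a, b):
    n = len(a); ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a); vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def rotate(points, angle):
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def rotated_correlation(user, target):
    # Rotate both windows so the target's principal axis sits at 45 degrees,
    # spreading variance over both axes, then take the minimum per-axis
    # Pearson correlation (as in Vidal et al. [32]).
    xs = [p[0] for p in target]; ys = [p[1] for p in target]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)  # principal-axis angle
    u = rotate(user, math.pi / 4 - theta)
    t = rotate(target, math.pi / 4 - theta)
    return min(pearson([p[0] for p in u], [p[0] for p in t]),
               pearson([p[1] for p in u], [p[1] for p in t]))

# A horizontal segment has almost no variance on the y axis, so the plain
# min-axis correlation collapses, while the rotated version stays high.
random.seed(0)
target = [(float(i), 0.0) for i in range(20)]
user = [(x + random.gauss(0, 0.1), y + random.gauss(0, 0.1)) for x, y in target]

plain = min(pearson([p[0] for p in user], [p[0] for p in target]),
            pearson([p[1] for p in user], [p[1] for p in target]))
print(plain, round(rotated_correlation(user, target), 3))  # plain is 0.0 (degenerate y axis)
```

The rotation makes the outcome independent of how the window happens to be orientated relative to the coordinate system, which is exactly the fluctuation shown in Figure 9-Left.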
In the previous section, we saw that current approaches are limited in how they handle multiple targets yielding high similarities, often just selecting the most similar one. Ideally, we should be able to quantify this uncertainty and make decisions about whether, and which, target to select based on our level of confidence, considering the interface as a whole. Here, we draw directly from Williamson and Murray-Smith, who used a similarity metric to compute a weight for each potential target, normalised the weights into probabilities, and computed the entropy of the system to decide whether to make a selection [22, 35]. For us, entropy is interesting because it quantifies how concentrated the probabilities are around any of the targets. The entropy value can give us a hint as to the number of states between which the system is undecided: if the entropy H(X) = log₂ K, we are uncertain between approximately K states. As we gather more evidence that a certain target x_i is being followed, we increase its probability p(x_i) and reduce the probabilities of the other states {x_j | j ≠ i}. As p(x_i) approaches one (and the others approach zero), the uncertainty is reduced and the entropy tends to zero. Therefore, this allows us to handle the case where multiple targets are above the threshold. When a few of them yield high likelihoods and the others yield low ones, looking at the probabilities individually would lead us to select one or all of them; looking at the entropy, we learn that the system is not confident enough in its estimation, so we might wait until more data is available before making a selection. In practice, we make a selection when the entropy of the system falls below a certain empirically determined threshold.
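This decision rule can be sketched as follows; the 0.5-bit entropy threshold is an illustrative, empirically tunable assumption.

```python
import math

def entropy(probs):
    # Shannon entropy in bits of a discrete distribution.
    return -sum(p * math.log2(p) for p in probs if p > 0)

def decide(likelihoods, threshold_bits=0.5):
    # Normalise per-target likelihoods into probabilities and select the most
    # probable target only when the entropy is low enough to trust the estimate.
    total = sum(likelihoods)
    if total == 0:
        return None
    probs = [l / total for l in likelihoods]
    if entropy(probs) > threshold_bits:
        return None  # too ambiguous: wait for more data
    return probs.index(max(probs))

print(decide([0.9, 0.85, 0.1]))    # None: two targets compete, entropy is high
print(decide([0.95, 0.05, 0.02]))  # 0: probability mass concentrates on one target
```

Note that a purely deterministic rule would happily select a target in the first case, even though two targets are nearly tied.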
Other simpler but related approaches, such as the Gini coefficient or the ratio between the largest and second-largest likelihoods, can also be informative in this regard.

Figure 9: Left: The effect of the coordinate system on the output of different measures. Centre and Right: Correlation between the Y coordinate of the reference window and the noisy window for windows centred at the North and Northwest points of the circle. When the variance in the data is more evenly distributed along each axis (pink), we observe a higher correlation than when it is concentrated in one of the axes (blue).

Figure 10 shows an example of how this works in practice in the case of three targets moving with constant speed around a square path, but slightly offset. The user data is shown in grey and corresponds to a noisy version of the red trajectory. We computed the similarity between the user data and each target trajectory using the Rotated Correlation. We then computed the likelihoods using the corresponding probability density function shown in Figure 4. Though the likelihood curve is similar in shape to the similarity curve, it normalises the values and pushes middling values towards the extremes of the scale. The likelihoods are then normalised into probabilities, from which we compute the overall system entropy. Initially, both the red and the green targets yield high similarities, and a deterministic system would struggle to decide which to select. When the blue target reaches the edge, the uncertainty in the system further increases. This process is reflected in the entropy curve, which increases when the blue target reaches the edge, but drops when the red target reaches the side edge and its trajectory becomes substantially different from the others. Note that the likelihoods shown in the plots are instantaneous (i.e. they only consider the data in the corresponding window, ignoring previous data). The system can be further enhanced by incorporating a dynamic model that considers the previous likelihoods. This way, the probability of the blue target at point B would not be as large as the probabilities of the other targets, and the overall entropy would be lower.
To understand the theoretical limits of a circular trajectory design for motion correlation, we first consider the problem of discovering the maximum number of targets we could add to our interface under the assumption that the user is able to perfectly mimic the motion of the target. Given the sampling rate of the eye tracker (30 Hz), we can generate hypothetical perfect gaze trajectories with 30 × 360/Speed samples, leaving us with trajectories of 45, 60, and 90 samples. For each of these three trajectories, we computed the similarity between all pairs of possible windows and averaged them as per the number of samples between them. This step is necessary because, as we saw in Figures 8 and 9, the correlation values vary depending on the orientation of the window relative to the coordinate system.
Figure 10: Stepping through a probabilistic calculation: three targets move along a square path as the user noisily matches the motion of the red target. We compare the input signal to the trajectories of the targets, calculate the likelihoods using probability density functions, and normalise them to calculate the probabilities and the system entropy. Initially, two of the targets exhibit the same relative trajectory (A), yielding high similarities for both. When the third target reaches the edge (B), the entropy of the system increases. When the red target reaches the side edge of the square (C), its trajectory is sufficiently distinct from the others, leading to a drop in the entropy, which signals to the system that a selection can be made.

Considering that each of these samples could be a potential target leaves us with the problem of distinguishing which of these NSamples targets the user is trying to follow. As such, the initial entropy of the system is log₂(NSamples). Using the same approach as Esteves et al., we thresholded the similarity values at λ = .8. We assigned a likelihood of 1 to all windows above this threshold and zero otherwise. We then normalised the likelihoods to obtain the probabilities and the entropy. Table 1 shows our results. The table only considers targets moving in the same direction. Because in Orbits the authors also included targets moving in the opposite direction, we can double the number of targets. We are able to make this assumption because, if the target is moving in the opposite direction, at least one of the coordinate axes will be inversely correlated, ensuring that the overall correlation is negative. From the table, we see that on average approximately 10-16% of windows have a similarity above the threshold, depending on the overall trajectory size. In other words, there will be confusion between a target and any target within the 36-22.5 degree arc. This means that we need at least this distance between the targets in order to minimise uncertainty. Therefore, we can calculate the maximum number of targets supported by a given design by dividing 360 by this distance, and doubling it if we also consider targets moving in the opposite direction along the same path.

Table 1: Simulation results for the design of Orbits: for each trajectory size, we show the number of windows above the threshold λ, the corresponding proportion of the number of windows, the entropy of the system, and the maximum number of targets that could be placed along this trajectory in order to obtain zero entropy (we multiply this last number by 2 to consider targets moving in the opposite direction).

N  | R > λ | Proportion | Entropy | Max Targets
45 | 7     | 0.156      | 2.81    | 2 × …

These results suggest that, assuming the user is able to follow the targets perfectly and that half of the targets will move in the opposite direction to the rest, we can add up to 14 targets with the Slow and Medium speeds, and up to 22 with the Fast speed. Esteves et al.'s empirical results found acceptable error rates for up to 8 targets, but not for 16 targets, which aligns with our theoretical predictions. However, despite our predictions that the Fast condition could support up to 22 targets, their study found that even 16 were too many in this condition. This is because at high speeds the eyes can no longer maintain a smooth pursuit, engaging instead in a series of saccades. This highlights the importance of user testing, as our theoretical results can only provide an upper limit for the algorithm performance, but do not say much about human capabilities without further data. Nevertheless, the advantage of a probabilistic approach is that it enables us to incorporate motor models as priors, which is a rich direction for future work.
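The pairwise-window analysis above can be sketched as follows, here for the 45-sample trajectory. Note that the exact counts depend on implementation details (e.g. window wrapping and the precise similarity measure), so they will not necessarily reproduce Table 1.

```python
import math

def pearson(a, b):
    n = len(a); ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a); vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def min_axis_similarity(w1, w2):
    # Vidal et al.'s measure [32]: Pearson's correlation of the least similar axis.
    return min(pearson([p[0] for p in w1], [p[0] for p in w2]),
               pearson([p[1] for p in w1], [p[1] for p in w2]))

N, W, LAM = 45, 30, 0.8  # trajectory samples, window size, threshold
circle = [(math.cos(2 * math.pi * i / N), math.sin(2 * math.pi * i / N)) for i in range(N)]

def window(start):
    # Window of W consecutive samples starting at `start`, wrapping around the circle.
    return [circle[(start + k) % N] for k in range(W)]

ref = window(0)
above = sum(1 for s in range(N) if min_axis_similarity(window(s), ref) > LAM)
entropy_bits = math.log2(above)  # uniform confusion among the `above` windows
print(above, round(entropy_bits, 2))
```

Dividing the trajectory length by the number of confusable windows gives the minimum target spacing, which is how the Max Targets column of Table 1 is derived.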
To better understand how a design will handle noisy data, we can repeat the tests above under different noise conditions. As an example, consider Figure 11. To generate this plot, we simulated 16 targets moving around a circle at 180 degrees/s (the Medium speed tested in Orbits). We then simulated user input by adding Gaussian noise to the trajectory data, varying the noise between 5-75% of the radius of the circle. We repeated each simulation 30 times and computed the average entropy across all simulations for each level of noise. The smooth curve in the figure was computed with a cubic regression spline on the data.
Figure 11: Effect of noise on the entropy of the system, considering 16 targets moving with the Medium speed tested in Orbits, with a window size of 30 samples.
The higher the noise level, the lower the correlation with the trajectory data. At small noise levels, this reduces the similarity with incorrect targets, leaving fewer targets above the threshold and therefore reducing the uncertainty. At approximately 20%, however, the uncertainty begins to increase, culminating at an entropy of 4, which is the maximum for 16 targets (log₂(16) = 4). This is not a universal rule, and these results should be computed for each specific combination of variables, but one can use a curve like this to predict the uncertainty in the system given a sensor's noise characteristics. Conversely, given the sensor's noise characteristics, we can specify the minimum radius of the targets to achieve a desired entropy level.
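A simplified version of this noise simulation can be sketched as follows: a single trial per noise level with a hard threshold, rather than the 30 repetitions and full probabilistic pipeline used for Figure 11.

```python
import math, random

def pearson(a, b):
    n = len(a); ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a); vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def min_axis_similarity(w1, w2):
    return min(pearson([p[0] for p in w1], [p[0] for p in w2]),
               pearson([p[1] for p in w1], [p[1] for p in w2]))

random.seed(42)
N, W, LAM, TARGETS = 60, 30, 0.8, 16  # Medium speed: 60 samples per revolution
circle = [(math.cos(2 * math.pi * i / N), math.sin(2 * math.pi * i / N)) for i in range(N)]

def window(start, noise=0.0):
    pts = [circle[(start + k) % N] for k in range(W)]
    return [(x + random.gauss(0, noise), y + random.gauss(0, noise)) for x, y in pts]

def system_entropy(noise):
    # Entropy (bits) when a simulated user follows target 0 under Gaussian noise
    # expressed as a fraction of the circle radius, using a hard threshold.
    offsets = [round(t * N / TARGETS) for t in range(TARGETS)]  # evenly spaced targets
    user = window(0, noise)
    matches = sum(1 for o in offsets if min_axis_similarity(user, window(o)) > LAM)
    return math.log2(matches) if matches > 0 else math.log2(TARGETS)  # no match: total confusion

print(round(system_entropy(0.05), 2), round(system_entropy(0.75), 2))
```

Sweeping the noise parameter and averaging over repeated trials yields a curve like Figure 11, from which a designer can read off the noise budget for a desired entropy level.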
One main contribution of this paper is the re-framing of previous works on motion correlation-based interaction techniques under a probabilistic framework. Throughout this process, we have shown that individual design decisions in a motion correlation interface, such as which window size or similarity measure to use, should not be made in isolation, as they influence each other. Computational techniques drawn from probability and information theory are particularly useful in helping us quantify these decisions. A probabilistic approach offers a common language that allows us to consolidate the existing knowledge within motion correlation and also to relate it to other types of interaction techniques. In particular, it provides us with a notation to formalise the principles behind motion correlation interfaces. Another contribution is the description of practical examples that demonstrate that a probabilistic approach can help us analyse motion correlation designs and understand why they work (or do not). Next, we summarise other relevant outcomes spread throughout this paper as a list of recommendations for future work.
Probabilistic modelling extends previous approaches, while still being compatible with them.
Deterministic approaches are simply particular cases where the probabilities are 1 and 0. By explicitly modelling uncertainty in a system, we can yield valuable design insights, such as quantifying how distinguishable each segment of the path is, understanding the limitations of different similarity measures, and enabling the integration of different sources of information.
Interpret the similarity measure as an estimate of mutual information.
Because selection by motion correlation can be considered as transmission of information about the target trajectory, we argued that measuring motion similarities is related to estimating mutual information. As a consequence, a good similarity measure must not only yield high values when user and target motions are similar, but also yield low values in every other case.
Make explicit the similarity measure assumptions.
Different similarity measures perform differently depending on the assumptions behind the design. By making these assumptions explicit (for example, invariance to scale, rotation, and/or translation), we can make more informed comparisons between measures. As such, when proposing novel similarity measures, future works should clearly label which transformations are necessary to unambiguously transform the trajectory data into the gaze data.
Use softer functions instead of step functions.
Hard thresholds make comparisons unstable when the results are near the threshold and are more vulnerable to confusion when multiple targets yield similarities above the threshold. Using softer thresholds mitigates this problem. Even better, we can incorporate our knowledge of the problem domain by using the probability density functions obtained by a mix of simulations and data collections (see Figure 4).
Consider the similarity values together instead of independently.
Similarity values only consider each possible state individually, but high correlations with incorrect targets can still be consistent with perfectly following the correct target (see Figure 8). Calculating the entropy of the probabilities of the different states enables us to quantify the ambiguity of the inference and make more informed decisions about how to proceed.
Understand how the entropy of the design behaves.
The shape of the trajectory, the number of samples, the window size, and the similarity measures all influence each other. Computing the inherent entropy of each window in relation to the other windows helps us quantify the uncertainty of the design prior to even collecting user data and identify points of risk (see Figure 6).
Simulation is not an excuse for not collecting user data.
Though simulations can be very valuable in explicating design decisions, they do not replace user testing. However, as the community makes progress in modelling user behaviour from a computational perspective, we can expect to see more accurate models that will further empower our own simulations.
This paper frames interaction techniques based on motion correlation as a probabilistic reasoning problem. We argued that previous techniques can be formulated as such, and demonstrated the kinds of analysis that this formalism enables. In particular, we demonstrated analyses of similarity measures in terms of their assumptions about user and interface data, of their behaviour along different trajectory shapes, and of their value as a measure of mutual information. We also made a case for computing conditional probabilities for each potential state and for making decisions based on entropy values rather than on independent similarities. Finally, we argued for the use of entropy as a tool for understanding the information content of a given design.

We highlight a few practical pieces of advice for developers that stem from our analysis: (1) use the advice in Section 4 to select an appropriate similarity metric, (2) collect null data to derive probability density functions for the priors (e.g. Figure 4), (3) select an appropriate window size and path shape based on an analysis of the entropy of the shape, and (4) use entropy or other measures of uncertainty to decide whether to make a selection. All of these ideas can be readily incorporated in the development of systems like the ones described in the literature. We encourage future work to frame their results in probabilistic terms in order to better contextualise their contributions and more consistently measure progress. To facilitate this, we suggested a series of lessons learned that can be taken forward in future work.
ACKNOWLEDGMENTS
This work was partially funded by a FAPESP-University of Melbourne SPRINT Grant (Project Number: 2016/10148-3). Eduardo Velloso is the recipient of an Australian Research Council Discovery Early Career Award (Project Number: DE180100315) funded by the Australian Government, and Carlos Morimoto is the recipient of FAPESP grants no. 2016/10148-3 and 2017/50121-0.
REFERENCES
[1] Pradipta Biswas, Gokcen Aslan Aydemir, Pat Langdon, and Simon Godsill. 2013. Intent recognition using neural networks and Kalman filters. In International Workshop on Human-Computer Interaction and Knowledge Discovery in Complex, Unstructured, Big Data. Springer, Maribor, Slovenia, 112–123. https://doi.org/10.1007/978-3-642-39146-0_11
[2] Marcus Carter, Eduardo Velloso, John Downs, Abigail Sellen, Kenton O'Hara, and Frank Vetere. 2016. PathSync: Multi-User Gestural Interaction with Touchless Rhythmic Path Mimicry. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (Santa Clara, California, USA) (CHI '16). ACM, New York, NY, USA, 3415–3427. https://doi.org/10.1145/2858036.2858284
[3] Christopher Clarke, Alessio Bellino, Augusto Esteves, Eduardo Velloso, and Hans Gellersen. 2016. TraceMatch: A Computer Vision Technique for User Input by Tracing of Animated Controls. In Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing (Heidelberg, Germany) (UbiComp '16). ACM, New York, NY, USA, 298–303. https://doi.org/10.1145/2971648.2971714
[4] Christopher Clarke and Hans Gellersen. 2017. MatchPoint: Spontaneous Spatial Coupling of Body Movement for Touchless Pointing. In Proceedings of the 30th Annual ACM Symposium on User Interface Software and Technology (Québec City, QC, Canada) (UIST '17). ACM, New York, NY, USA, 179–192. https://doi.org/10.1145/3126594.3126626
[5] Timothy F. Cootes and Christopher J. Taylor. 2004. Statistical models of appearance for computer vision.
[6] Travis Cox, Marcus Carter, and Eduardo Velloso. 2016. Public DisPLAY: Social Games on Interactive Public Screens. In Proceedings of the 28th Australian Conference on Computer-Human Interaction (Launceston, Tasmania, Australia) (OzCHI '16). ACM, New York, NY, USA, 371–380. https://doi.org/10.1145/3010915.3010917
[7] Heiko Drewes, Mohamed Khamis, and Florian Alt. 2018. Smooth Pursuit Target Speeds and Trajectories. In Proceedings of the 17th International Conference on Mobile and Ubiquitous Multimedia (Cairo, Egypt) (MUM 2018). ACM, New York, NY, USA, 139–146. https://doi.org/10.1145/3282894.3282913
[8] Heiko Drewes, Mohamed Khamis, and Florian Alt. 2019. DialPlates: Enabling Pursuits-Based User Interfaces with Large Target Numbers. In Proceedings of the 18th International Conference on Mobile and Ubiquitous Multimedia (Pisa, Italy) (MUM '19). ACM, New York, NY, USA, Article 10, 10 pages. https://doi.org/10.1145/3365610.3365626
[9] Ian L. Dryden and Kanti V. Mardia. 1998. Statistical Shape Analysis. John Wiley & Sons, New York, NY.
[10] Augusto Esteves, Eduardo Velloso, Andreas Bulling, and Hans Gellersen. 2015. Orbits: Enabling Gaze Interaction in Smart Watches Using Moving Targets. In Adjunct Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2015 ACM International Symposium on Wearable Computers (Osaka, Japan) (UbiComp/ISWC'15 Adjunct). ACM, New York, NY, USA, 419–422. https://doi.org/10.1145/2800835.2800942
[11] Augusto Esteves, Eduardo Velloso, Andreas Bulling, and Hans Gellersen. 2015. Orbits: Gaze Interaction for Smart Watches Using Smooth Pursuit Eye Movements. In Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology (Daegu, Kyungpook, Republic of Korea) (UIST '15). ACM, New York, NY, USA, 457–466. https://doi.org/10.1145/2807442.2807499
[12] Augusto Esteves, David Verweij, Liza Suraiya, Rasel Islam, Youryang Lee, and Ian Oakley. 2017. SmoothMoves: Smooth Pursuits Head Movements for Augmented Reality. In Proceedings of the 30th Annual ACM Symposium on User Interface Software and Technology (Québec City, QC, Canada) (UIST '17). ACM, New York, NY, USA, 167–178. https://doi.org/10.1145/3126594.3126616
[13] Jean-Daniel Fekete, Niklas Elmqvist, and Yves Guiard. 2009. Motion-pointing: Target Selection Using Elliptical Motions. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Boston, MA, USA) (CHI '09). ACM, New York, NY, USA, 289–298. https://doi.org/10.1145/1518701.1518748
[14] Argenis Ramirez Gomez and Hans Gellersen. 2018. Smooth-i: Smart Re-calibration Using Smooth Pursuit Eye Movements. In Proceedings of the 2018 ACM Symposium on Eye Tracking Research & Applications (Warsaw, Poland) (ETRA '18). ACM, New York, NY, USA, Article 10, 5 pages. https://doi.org/10.1145/3204493.3204585
[15] Laura D. Goodwin and Nancy L. Leech. 2006. Understanding Correlation: Factors That Affect the Size of r. The Journal of Experimental Education 74, 3 (2006), 251–266. https://doi.org/10.3200/JEXE.74.3.249-266
[16] Kasper Hornbæk and Antti Oulasvirta. 2017. What Is Interaction?. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (Denver, Colorado, USA) (CHI '17). ACM, New York, NY, USA, 5040–5052. https://doi.org/10.1145/3025453.3025765
[17] Mohamed Khamis, Carl Oechsner, Florian Alt, and Andreas Bulling. 2018. VRpursuits: Interaction in Virtual Reality Using Smooth Pursuit Eye Movements. In Proceedings of the 2018 International Conference on Advanced Visual Interfaces (Castiglione della Pescaia, Grosseto, Italy) (AVI '18). ACM, New York, NY, USA, Article 18, 8 pages. https://doi.org/10.1145/3206505.3206522
[18] Mohamed Khamis, Ozan Saltuk, Alina Hang, Katharina Stolz, Andreas Bulling, and Florian Alt. 2016. TextPursuits: Using Text for Pursuits-based Interaction and Calibration on Public Displays. In Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing (Heidelberg, Germany) (UbiComp '16). ACM, New York, NY, USA, 274–285. https://doi.org/10.1145/2971648.2971679
[19] Eric Langford, Neil Schwertman, and Margaret Owens. 2001. Is the Property of Being Positively Correlated Transitive? The American Statistician 55, 4 (2001), 322–325. https://doi.org/10.1198/000313001753272286
[20] Wanyu Liu, Rafael Lucas D'Oliveira, Michel Beaudouin-Lafon, and Olivier Rioul. 2017. BIGnav: Bayesian Information Gain for Guiding Multiscale Navigation. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (Denver, Colorado, USA) (CHI '17). ACM, New York, NY, USA, 5869–5880. https://doi.org/10.1145/3025453.3025524
[21] Wanyu Liu, Olivier Rioul, Joanna McGrenere, Wendy E. Mackay, and Michel Beaudouin-Lafon. 2018. BIGFile: Bayesian Information Gain for Fast File Retrieval. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (Montreal, QC, Canada) (CHI '18). ACM, New York, NY, USA, 1–13. https://doi.org/10.1145/3173574.3173959
[22] David J.C. MacKay. 2003. Information Theory, Inference and Learning Algorithms. Cambridge University Press, United Kingdom. https://doi.org/10.2277/0521642981
[23] Antti Oulasvirta, Xiaojun Bi, and Andrew Howes. 2018. Computational Interaction. Oxford University Press, United Kingdom. https://doi.org/10.1093/oso/9780198799603.001.0001
[24] Ken Pfeuffer, Mélodie Vidal, Jayson Turner, Andreas Bulling, and Hans Gellersen. 2013. Pursuit Calibration: Making Gaze Calibration Less Tedious and More Flexible. In Proceedings of the 26th Annual ACM Symposium on User Interface Software and Technology (St. Andrews, Scotland, United Kingdom) (UIST '13). ACM, New York, NY, USA, 261–270. https://doi.org/10.1145/2501988.2501998
[25] Simon Rogers, John Williamson, Craig Stewart, and Roderick Murray-Smith. 2010. FingerCloud: Uncertainty and Autonomy Handover in Capacitive Sensing. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Atlanta, Georgia, USA) (CHI '10). ACM, New York, NY, USA, 577–580. https://doi.org/10.1145/1753326.1753412
[26] Julia Schwarz, Scott Hudson, Jennifer Mankoff, and Andrew D. Wilson. 2010. A Framework for Robust and Flexible Handling of Inputs with Uncertainty. In Proceedings of the 23rd Annual ACM Symposium on User Interface Software and Technology (UIST '10). ACM, New York, NY, USA, 47–56. https://doi.org/10.1145/1866029.1866039
[27] Daming Shi, Steve R. Gunn, and Robert I. Damper. 2003. Handwritten Chinese radical recognition using nonlinear active shape models. IEEE Transactions on Pattern Analysis and Machine Intelligence 25, 2 (2003), 277–280. https://doi.org/10.1109/TPAMI.2003.1177158
[28] Eduardo Velloso, Marcus Carter, Joshua Newn, Augusto Esteves, Christopher Clarke, and Hans Gellersen. 2017. Motion Correlation: Selecting Objects by Matching Their Movement. ACM Trans. Comput.-Hum. Interact. 24, 3, Article 22 (April 2017), 35 pages. https://doi.org/10.1145/3064937
[29] Eduardo Velloso, Flavio Luiz Coutinho, Andrew Kurauchi, and Carlos H. Morimoto. 2018. Circular Orbits Detection for Gaze Interaction Using 2D Correlation and Profile Matching Algorithms. In Proceedings of the 2018 ACM Symposium on Eye Tracking Research & Applications (Warsaw, Poland) (ETRA '18). ACM, New York, NY, USA, Article 25, 9 pages. https://doi.org/10.1145/3204493.3204524
[30] Eduardo Velloso, Markus Wirth, Christian Weichel, Augusto Esteves, and Hans Gellersen. 2016. AmbiGaze: Direct Control of Ambient Devices by Gaze. In Proceedings of the 2016 ACM Conference on Designing Interactive Systems (Brisbane, QLD, Australia) (DIS '16). ACM, New York, NY, USA, 812–817. https://doi.org/10.1145/2901790.2901867
[31] David Verweij, Augusto Esteves, Vassilis-Javed Khan, and Saskia Bakker. 2017. WaveTrace: Motion Matching Input Using Wrist-Worn Motion Sensors. In Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems (Denver, Colorado, USA) (CHI EA '17). ACM, New York, NY, USA, 2180–2186. https://doi.org/10.1145/3027063.3053161
[32] Mélodie Vidal, Andreas Bulling, and Hans Gellersen. 2013. Pursuits: Spontaneous Interaction with Displays Based on Smooth Pursuit Eye Movement and Moving Targets. In Proceedings of the 2013 ACM International Joint Conference on Pervasive and Ubiquitous Computing (Zurich, Switzerland) (UbiComp '13). ACM, New York, NY, USA, 439–448. https://doi.org/10.1145/2493432.2493477
[33] Janett Walters-Williams and Yan Li. 2009. Estimation of Mutual Information: A Survey. In Rough Sets and Knowledge Technology, Peng Wen, Yuefeng Li, Lech Polkowski, Yiyu Yao, Shusaku Tsumoto, and Guoyin Wang (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 389–396. https://doi.org/10.1007/978-3-642-02962-2_49
[34] John Williamson. 2006. Continuous Uncertain Interaction. Ph.D. Dissertation. University of Glasgow.
[35] John Williamson and Roderick Murray-Smith. 2004. Pointing Without a Pointer. In CHI '04 Extended Abstracts on Human Factors in Computing Systems (Vienna, Austria) (CHI EA '04). ACM, New York, NY, USA, 1407–1410. https://doi.org/10.1145/985921.986076
[36] Cheng Zhang, Xiaoxuan Wang, Anandghan Waghmare, Sumeet Jain, Thomas Ploetz, Omer T. Inan, Thad E. Starner, and Gregory D. Abowd. 2017. FingOrbits: Interaction with Wearables Using Synchronized Thumb Movements. In Proceedings of the 2017 ACM International Symposium on Wearable Computers (Maui, Hawaii) (ISWC '17).