Gaze Gestures and Their Applications in Human-Computer Interaction with a Head-Mounted Display
W.X. Chen, X.Y. Cui, Member, IEEE, J. Zheng, J.M. Zhang, S. Chen, and Y.D. Yao, Fellow, IEEE
Abstract—A head-mounted display (HMD) is a portable and interactive display device. With the development of 5G technology, it may become a general-purpose computing platform in the future. Human-computer interaction (HCI) technology for HMDs has also been of significant interest in recent years. In addition to tracking gestures and speech, tracking human eyes as a means of interaction is highly effective. In this paper, we propose two UnityEyes-based convolutional neural network models, UEGazeNet and UEGazeNet*, which can be used for input images with high resolution and low resolution, respectively. These models can perform rapid interactions by classifying gaze trajectories (GTs), and a GTgestures dataset containing data for 10,200 "eye-painting gestures" collected from 15 individuals is established with our gaze-tracking method. We evaluated the performance both indoors and outdoors, and UEGazeNet obtains results 52% and 67% better, respectively, than those of state-of-the-art networks. The generalizability of our GTgestures dataset is evaluated using a variety of gaze-tracking models, and an average recognition rate of 96.71% is obtained by our method.
Index Terms—Human-computer interaction, gaze tracking, head-mounted display, convolutional neural network, deep learning.
1 INTRODUCTION

A head-mounted display (HMD) is a type of computer display worn on the head or built into a helmet. Virtual reality (VR), augmented reality (AR) and mixed reality (MR) are the main applications that use HMDs. Early HMD studies were centered primarily on military applications; however, in recent years, as the cost and size of the hardware have continually decreased, this type of technology has been applied in fields such as medicine [1], education [2], industrial design [3] and entertainment [4]. An HMD is different from a traditional monitor; thus, creating appropriate forms of human-computer interaction (HCI) for HMDs is also of concern. At present, HCI has been well established for gestures and voice input, as in Microsoft's HoloLens [5] and Magic Leap; however, these HCI methods are unsuitable when both hands are occupied or in environments in which speech is not an option. Thus, a simpler and more effective approach to HCI with HMDs is crucial.

Eye tracking is a technique for measuring the gaze point of human eyes and their degree of movement relative to the head pose. The main task is to determine where a human is looking and for how long. The world's first non-invasive eye tracker, developed in Chicago in 1922 by Guy Thomas Buswell [6], used beams reflected from the eyes and recorded them on film to determine the gaze direction. In the 1970s, eye-tracking research advanced rapidly, especially in the field of reading research [7]. Eye tracking has been used to solve HCI problems since the 1980s [8].

Until now, eye-tracking applications have mainly concentrated on behavior analysis and HCI [9]. In terms of behavior analysis, by analyzing human gaze time and changes in gaze angle, we can analyze hand-eye coordination [10], students' attention in class [11], visual fatigue [12], and even emotional state [13]. In addition, eye tracking plays an auxiliary role in the diagnosis of diseases such as autism [14], visual memory impairment [15], and mild amnestic cognitive impairment [16]. Regarding interaction, because humans can freely control their eye movements, eye-tracking technology can be used as a method for HCI. For example, gaze duration has been used to determine whether a human wants to press a button on the screen or to click a pointer on the screen via eye movements [17], [18], [19], [20], [21], [22], [23]. In recent years, HCI research based on gaze gestures has emerged. In this field, eye-tracking data are used to delineate virtual gestures that could be widely applied to HCI for games [25] and medical operations [24], [26], [27].

• The authors are with the College of Medicine and Biological Information Engineering, Northeastern University, Shenyang 110004, Liaoning, People's Republic of China. E-mail: [email protected]

This paper addresses the issue of achieving HMD-based gaze interaction using an inexpensive webcam to detect and track the human gaze direction in real time at close range and to analyze the user's intent based on gaze trajectory data. In previous studies, the fully connected layer of a CNN has typically been used to directly detect 3D gaze coordinates [28] or two rotation vectors that represent the yaw and pitch of the gaze direction [29]. This approach usually requires the support of large-scale data and does not work well in some scenarios, such as those with extreme camera shooting angles. Thus, we propose UEGazeNet, which detects landmarks and obtains the gaze angle from the landmarks.
Our method requires relatively little training data and can fit complete eye information from incomplete eye images, which means that the gaze can be detected when the camera shooting angle is extreme or when the eyes are partially obscured. In addition, we also propose another network, UEGazeNet*, which has a structure similar to that of UEGazeNet but can recognize gaze directions in low-resolution images.
Fig. 1. Overview of UEGazeNet and UEGazeNet*, two near-eye gaze estimation neural networks, and their applications in human-computer interaction.
Moreover, we design an HCI method based on real-time gaze direction tracking. Traditional HCI methods involve directly controlling cursor movements and switching between interaction events through gaze times or through a blinking action. For example, one existing method allows users to slide their gaze back and forth between the center of the screen and the four corners of the screen to move or rotate a camera [24], [26]. Additionally, dividing the screen into several areas, treating the areas as points, and having users gaze at these points in a specific order to draw gestures is also feasible [25], [27]. However, these approaches require gestures to be mapped to the entire screen, and their accuracy depends on gaze tracking. In contrast, our HCI method is based on a gaze gesture classifier that can detect the attention of the user through their gaze gesture. When users wear HMDs, they can accomplish tasks well even if they are not in a stationary, stable state, such as when they are walking or performing quick eye movements.

In addition, we establish a gaze-tracking dataset that contains data from 10,200 gaze trajectories collected from 15 individuals using our gaze-tracking method. To ensure diversity, we create random transformations of standard patterns; the participants supply gaze trajectories based on a displayed indicator map, which avoids the problem of participants always using similar gaze tracks.

To summarize, this work has three main contributions. First, we present an effective HMD-based gaze-tracking neural network called UEGazeNet, which detects landmarks and obtains the gaze angle from the landmarks. Second, we develop a highly robust, flexible and fast-operating HCI method based on the classification of gaze gestures. Third, we design a gaze gesture dataset that is used to train the classifier for HCI. Our HCI approach is faster than regular gaze interaction and very suitable for HMD devices.

The rest of the paper is organized as follows. Section 2 provides a review of eye-tracking methods and eye-tracking datasets. Section 3 provides a detailed description of our method, followed by experiments in Section 4 and discussions in Section 5. Finally, Section 6 concludes the paper and proposes further research areas.
2 RELATED WORK
The process of eye tracking generally includes two steps: eye detection and gaze estimation. For eye detection, there are two main classes of methods: those based on shape and those based on appearance. In shape-based methods, the location of the eyes is decided by voting [30] or matching [31], [32] geometrical eye shapes such as the edge shape of the iris or pupil [33], [34], [35], [36]. Generally, these methods require an a priori model to judge shape complexities. When the facial posture changes significantly or the image has low resolution, there are few features around the eye area. In this case, the corners of the eyes, the eyebrows and other parts of the head can be used to detect eyes. Moreover, the corners of the eyes and the head contour can also be used to constrain the target area [37], [38], [39], [40]. The appearance-based approach uses the appearance of various detection characteristics, such as the original color distribution [41], [42], [43], [44], [45] or the distribution after filtering [46], [47], [48], [49], [50].

The gaze can be estimated using either a model-based approach [51], [52], [53], [54] or an appearance-based approach [55], [56], [57], [58]. Model-based approaches simulate the physical structure of the human eye and typically consider physiological behaviors such as eyelid movement. The 3D gaze direction is estimated by assuming a sphere or ellipsoid or by modeling the corneal surface. Generally, these approaches can be divided into two groups: corneal-reflection-based methods [59], [60] and shape-based methods [61], [62]. In corneal-reflection-based approaches, the cornea is irradiated by light, and the first reflected Purkinje image is used for feature detection, which helps to estimate the optical axis in 3D space. This approach requires at least one infrared light source. Shape-based approaches are the same as the shape-based methods used in eye detection; that is, gaze estimation is further performed in 3D space based on the eye detection results. However, these approaches rely on measurement information and hardware calibration and require other relevant information, such as the camera and monitor positions. In addition, they seldom achieve accurate results near the edge regions of the cornea.

In contrast, appearance-based approaches calculate the extracted features using regression; thus, these methods do not require camera or geometric calibration. Instead, they directly map an image to the gaze direction. Appearance-based approaches can be divided into parametric forms such as polynomials [63], [64] and nonparametric classifiers such as neural networks [29], [65]. The former, which constitutes the usual practice, obtains the gaze direction through polynomial regression on the dark-bright pupil features generated under an infrared light source. The latter learns a mapping from a large number of "eye image-gaze direction" pairs. These approaches can implicitly extract the relevant characteristics used to estimate individual changes and identify items of concern, and they do not require scene geometry or camera calibration. Nevertheless, the costs involved in collecting an appropriate dataset cannot be ignored, and these methods generally do not respond well to changes in head pose.
Several eye-tracking datasets have been developed in recent years, and they can be classified as real-data-based datasets [29], [66], [67], [68] and synthetic-data-based datasets [28], [69], [70], [71]. Real-data-based datasets use cameras to capture images of eyes and obtain their gaze directions; they are often collected under lab conditions [66], [67], [68] and are not completely applicable to outdoor situations. MPIIGaze [29] is a well-constructed dataset collected by recording 15 users' daily laptop use; however, the gaze direction range of this dataset is narrow. It can be used as a verification set and is satisfactory for unconstrained cross-dataset evaluation, but it is still unsatisfactory as a training dataset.
Fig. 2. Training data. (a) The software UnityEyes used to generate the training data. We can customize the parameters related to the camera and gaze direction or randomly generate them. (b) The data generated by a simulation model with different lighting and races. Note that the direction of the gaze shown in the figure is that detected by our method.
In contrast, synthetic datasets can be customized to users' requirements and do not require annotating large numbers of images. UT Multiview [28] uses a combination of virtual and real data and synthesizes data from real data. Due to the gap between the feature distributions of synthetic images and those of real images, learning from synthetic images may not achieve the expected performance. To bridge the gap between a synthetic image distribution and a real image distribution, GazeNet uses a model pretrained on ImageNet to learn from large amounts of data and then trains the resulting model on UT Multiview. Using real data solves the data distribution problem, leading to the proposal of a comprehensive learning method [72]. Apple, Inc. proposed using both synthetic data [73] and real unlabeled data to train a model. This approach uses synthetic data as the input, and these data can be made to approximate real data through a GAN to enhance the authenticity of the synthetic output while retaining the labeling information through unsupervised learning.
3 METHOD
As shown in Fig. 1, our method involves photographing human eyes with a near-eye camera integrated into an HMD to calculate the gaze direction. The perspective affine method is used to map the direction of the gaze to a 3D fixation direction in the target coordinate system. When tracking the gaze direction, blinking is used to switch between interactive events and to start recording the gaze trajectory, which is then used to identify the user's operational intent via a classifier.
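The pipeline above amounts to a small control loop around the gaze estimator. The following minimal Python sketch illustrates that loop; all names (frames, estimate_gaze, map_to_screen, is_blink, classify, dispatch) are hypothetical placeholders supplied by the caller, not the authors' released code.

import numpy as np

def run_interaction(frames, estimate_gaze, map_to_screen, is_blink, classify, dispatch):
    """Sketch of the Fig. 1 pipeline: gaze angles -> screen points -> gesture class."""
    recording, trajectory = False, []
    for eye_img in frames:                                   # near-eye image stream
        if is_blink(eye_img):                                # a blink toggles trajectory recording
            if recording and len(trajectory) > 1:
                dispatch(classify(np.asarray(trajectory)))   # gesture label -> HMD operation
                trajectory = []
            recording = not recording
            continue
        if recording:
            yaw, pitch = estimate_gaze(eye_img)              # UEGazeNet / UEGazeNet* output
            trajectory.append(map_to_screen(yaw, pitch))     # perspective mapping to the HMD plane

Using blinks as the start/stop delimiter keeps gesture segmentation independent of absolute gaze-tracking accuracy, which is what later allows the classifier to tolerate noisy trajectories.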
The training data of this study are based on UnityEyes [71] (see Fig. 2), which combines a generated 3D model of the human eye with a real-time rendering framework based on high-resolution 3D facial scanning. The model includes eyelid animation that conforms to human anatomy, and the image reflected in the cornea is a real image. The simulation of corneal curvature and reflection images produces synthetic data for gaze estimation in difficult field situations.

Our method is based on the HMD's near-eye camera; consequently, we need to customize the range of the head pose distribution to compensate for our method's inability to directly provide head pose information. We collect the data based on this range so that the data are consistent with the images captured by our HMD system. For example, in our method, the camera is in front of and below the human eyes and images the eyes at a certain angle of elevation, so we need to increase the number of samples from that angle and similar angles. In addition to head posture, differences in personal appearance significantly impact gaze estimation [29]; therefore, we randomize the appearance when generating data.
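Gaze labels for such synthetic data are usually stored as 3D look vectors and converted to yaw/pitch angles for training. The short sketch below shows one such conversion; the axis and sign conventions (camera looking along -z, y pointing up) are assumptions for illustration and must be matched to the actual UnityEyes export and camera mounting.

import numpy as np

def look_vec_to_yaw_pitch(look_vec):
    # Normalize, then express the gaze direction as (yaw, pitch) in radians.
    x, y, z = np.asarray(look_vec, dtype=float) / np.linalg.norm(look_vec)
    pitch = np.arcsin(y)        # elevation relative to the camera's optical axis
    yaw = np.arctan2(x, -z)     # azimuth, assuming the camera looks along -z
    return yaw, pitch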
Fig. 3. Hardware system. (a) Homemade HMD system for HCI; (b) eye-tracking camera; (c) target board designed for quantitative experiments, which uses (d) the adjustable mechanical device.
Our hardware is based on the optical waveguide glasses produced by Lingxi AR Technology Co., Ltd. of China (Fig. 3(a)). One camera is integrated inside the glasses to capture eye images from a short distance and conduct gaze tracking without requiring extra light (Fig. 3(b)). In addition, we use an NVIDIA Jetson TX2, a single module based on the NVIDIA Pascal AI supercomputing architecture, for our neural network and 3D rendering synthesis.
Considering that there is no correlation between binocular differences and test results [9], only the data from the left eye are used during model training. To ensure that the eye position is not limited to the middle of the image, we performed the following operations before training.
1) Randomly enlarge the image: using the pupil as the center, we randomly magnified the image by a factor of n ≥ 1.
2) Randomly move the image: using the pupil as the center, we randomly moved the image by w pixels horizontally and h pixels vertically. Note that we ensured the pupil center did not move outside the image boundary; that is, w ∈ [-x, W-x] and h ∈ [-y, H-y], where W and H are the width and height of the image, respectively, and the pupil center coordinates are (x, y).
Fig. 4. UEGazeNet and UEGazeNet*. Additional layers are added on top of ResNet to increase the features extracted over different receptive fields while simultaneously detecting the iris edges and the eyelid.
3) Randomly rotate the image: using the pupil as the center, we rotated the image clockwise by a random number of degrees α within a fixed symmetric range.
4) Randomly reduce the number of image pixels: our input images have a fixed size; to support low-resolution input, we randomly performed Gaussian filtering to blur the images.

As shown in Fig. 4, the neural network structures of our UEGazeNet and UEGazeNet* are partially based on full pre-activation ResNet [74]. However, different from ResNet, the input image synchronously enters a convolutional layer and a residual block; the outputs of the convolutional layer and the residual block are combined and enter the next convolutional layer, and the outputs of both the last convolutional layer and the residual block are connected to a fully connected layer.

For UEGazeNet, the numbers of convolution kernels used for the convolutional layers are 32, 64, 128 and 256, respectively, and each residual block includes 2 residual units (a residual unit successively comprises a batch normalization layer, a ReLU layer and a convolutional layer, with these three layers then repeated once). In particular, the outputs of the two fully connected layers are restricted to extract the landmarks of the eyelid and iris, respectively, which include a total of 55 characteristic points (7 at the corners of the eye, 16 on the eyelid and 32 on the iris). This process is implemented cooperatively by ResNet and the outer nested convolutional layers. Based on the extraction of iris features, the feature points of the eyelids and the corners of the eyes are further extracted. In this way, we can ensure that the feature points of the two parts are relatively independent while retaining the relationship between shape and position. Thus, our UEGazeNet can achieve good results even under adverse conditions, such as when the eyes are partially obscured, as shown in Fig. 5.

For UEGazeNet*, we use 24, 24, 48 and 48 convolution kernels for the convolutional layers, respectively, and 1 residual unit for each residual block. The outputs of the two fully connected layers are connected to a filter and then used to directly calculate the gaze direction by regression. In this neural network structure, a series of convolutional layers is connected in sequence outside ResNet, which allows features to be extracted from the neighborhood of the features extracted by ResNet. This approach allows the integration of different receptive fields and reduces the impact of the quality of the dataset. An illustrative code sketch of this layout is given after the interaction description below.

Traditional HCI methods may be unsuitable when both hands are occupied or in environments in which speech is not an option. In such cases, using gaze as an interaction mechanism is an appropriate choice, as gaze behavior is continuous and easy to control. There have long been ways of simulating mouse and keyboard inputs using the eyes, but such interactions are crude and are not sufficiently fast or convenient for HMD applications. Our interaction method uses the user's gaze to "draw" a variety of gestures on the interface, which can be further mapped to custom operations, as shown in Fig. 6.
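To make the network description concrete, the PyTorch sketch below reproduces the overall layout: at every stage, a pre-activation residual block and a parallel "outer" convolution process the same features, their outputs are merged before the next stage, and two fully connected heads regress the 55 eyelid/corner and iris landmarks. The merge-by-concatenation, stride-2 downsampling, 3x3 kernel sizes and head dimensions are our assumptions, since the text does not specify them; this is an illustrative approximation, not the authors' implementation.

import torch
import torch.nn as nn

class PreActUnit(nn.Module):
    # One "residual unit" from the text: (BN -> ReLU -> conv) twice, plus a skip connection.
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(ch), nn.ReLU(inplace=True), nn.Conv2d(ch, ch, 3, padding=1),
            nn.BatchNorm2d(ch), nn.ReLU(inplace=True), nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

class UEGazeNetSketch(nn.Module):
    # Parallel residual and plain-convolution branches, merged at each stage.
    def __init__(self, in_ch=1, widths=(32, 64, 128, 256), units_per_block=2,
                 n_eyelid=16 + 7, n_iris=32):
        super().__init__()
        self.stem = nn.Conv2d(in_ch, widths[0], 3, padding=1)
        self.stages = nn.ModuleList()
        chs = list(widths) + [widths[-1]]
        for c_in, c_out in zip(chs[:-1], chs[1:]):
            self.stages.append(nn.ModuleDict({
                "res": nn.Sequential(*[PreActUnit(c_in) for _ in range(units_per_block)]),
                "side": nn.Conv2d(c_in, c_in, 3, padding=1),                  # "outer nested" convolution
                "fuse": nn.Conv2d(2 * c_in, c_out, 3, padding=1, stride=2),   # merge branches and downsample
            }))
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc_eyelid = nn.Linear(widths[-1], 2 * n_eyelid)   # 23 eyelid and eye-corner points (x, y)
        self.fc_iris = nn.Linear(widths[-1], 2 * n_iris)       # 32 iris points (x, y)

    def forward(self, x):
        x = self.stem(x)
        for stage in self.stages:
            x = stage["fuse"](torch.cat([stage["res"](x), stage["side"](x)], dim=1))
        feat = self.pool(x).flatten(1)
        return self.fc_eyelid(feat), self.fc_iris(feat)        # 55 landmarks in total

Under the same assumptions, a UEGazeNet*-style variant would use widths (24, 24, 48, 48), a single residual unit per block, and one head that regresses the gaze angles directly instead of landmarks.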
We adopted two collection methods: a long-range method and a close-range method. Long-range data collection captures an image of the user's face through a high-definition camera located in front of the screen to track the gaze direction and record its trajectory for the contrast test. Close-range data collection was based on the HMD; the users start recording data as they use the HMD. We collected a total of 17 patterns for each of the 10 people; data were collected 20 times per person per pattern in two batches, i.e., each participant first traced each pattern with their gaze 10 times and then traced the remainder after a break. Finally, we normalized the collected trajectory coordinates using the unit vector of the 3D gaze direction to ensure that the data are usable with different methods.
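The normalization step can be made explicit: each recorded sample is converted from gaze angles into a unit 3D gaze vector, so that trajectories collected with the long-range and close-range setups are directly comparable. The sketch below assumes the same axis convention as the conversion shown earlier and is only one possible reading of the normalization described above.

import numpy as np

def normalize_trajectory(yaw_pitch):
    # yaw_pitch: array of shape (T, 2) with per-frame gaze angles in radians.
    yaw, pitch = np.asarray(yaw_pitch, dtype=float).T
    x = np.cos(pitch) * np.sin(yaw)
    y = np.sin(pitch)
    z = -np.cos(pitch) * np.cos(yaw)
    return np.stack([x, y, z], axis=1)   # (T, 3); every row is a unit gaze vector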
The collected gestures in the GTgestures dataset consist of two parts. Each of the 10 participants collected 40 patterns (20 for each eye), including a gaze-tracking image and a normalized 2D gaze vector. We applied a lightweight CNN that uses the gaze trajectory image as input to predict the pattern category. We use pixel region relationships to resample the images, scale them to a fixed size, and then classify the results via a convolutional layer, a batch normalization layer, two fully connected layers and a final softmax layer.

Fig. 5. Comparison of the effects of UEGazeNet, UEGazeNet*, and GazeNet; all cameras are on the lower side of the human eye (based on the HMD), and GazeNet does not add information about the head posture. (a) Incomplete eyes; (b), (i), (l) a user wearing glasses; (j), (k), (l) a dark environment; (e) the camera is on the non-frontal side of the user.

Fig. 6. HCI method. In the figure, eyes 1, 2, and 3 show the gaze of an eye at different times; that is, the eye moves from 1 to 2 to 3, drawing a pattern that is recognized by the classifier. The label is also connected as an interface to an operation.
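The lightweight trajectory classifier described above (one convolutional layer, batch normalization, two fully connected layers and a softmax output) can be sketched in a few lines of PyTorch. The input size, channel count and hidden width below are assumptions, since the text does not give them.

import torch.nn as nn

class TrajectoryClassifierSketch(nn.Module):
    def __init__(self, img_size=32, n_classes=17):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1),   # single convolutional layer on the trajectory image
            nn.BatchNorm2d(16),
            nn.ReLU(inplace=True),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * img_size * img_size, 128),   # first fully connected layer
            nn.ReLU(inplace=True),
            nn.Linear(128, n_classes),                  # second fully connected layer
            nn.Softmax(dim=1),   # for inference; train on the logits with CrossEntropyLoss instead
        )

    def forward(self, x):
        return self.head(self.features(x))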
4 EXPERIMENTS

We conducted a series of assessments of our approach. First, the MPIIGaze cross-dataset method [29] is used to evaluate the generalizability of our approach. Then, the eye phantom method is used to evaluate the errors in practical applications, including applications under indoor and outdoor illumination and applications using short-range and long-distance cameras. We also evaluate the GTgestures dataset by using our gaze-tracking methods for HCI applications.
In this experiment, UnityEyes and UT Multiview are used as the training datasets, and MPIIGaze is used as the testing dataset. In contrast to the HMD application scenario, current state-of-the-art eye-tracking methods always capture the human face at a relatively long distance and then extract the eye image using an eye recognition algorithm, so the image resolution for eye tracking is low. To ensure the consistency of the algorithm comparison, we uniformly scale the data of UnityEyes and UT Multiview to the scale of MPIIGaze. The loss function is the Euclidean distance between the real gaze direction and the predicted value, and the evaluation criterion is the angle difference between the real 3D gaze direction and the predicted value. We trained each network for 15 epochs with a batch size of 256 on the training set using the Adam solver with an initial learning rate of 0.0001, multiplied by 0.1 after every 5 epochs. We performed multiple training tests for all models and took the average value instead of adopting the results of multiple random tests after a single training iteration, as the randomly selected subsets sometimes have large distribution differences from the test dataset. We first evaluated the performance on the synthetic and real datasets and then tested the head pose information.
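The stated optimization protocol maps directly onto standard PyTorch components. The sketch below mirrors the settings given in the text (Adam, initial learning rate 1e-4, decayed by a factor of 0.1 every 5 epochs, Euclidean-distance loss); everything else about the training script is left out and would be an assumption.

from torch import nn, optim

def make_training_setup(model, lr=1e-4, step=5, gamma=0.1):
    optimizer = optim.Adam(model.parameters(), lr=lr)
    scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=step, gamma=gamma)  # x0.1 every 5 epochs
    loss_fn = nn.PairwiseDistance(p=2)   # per-sample Euclidean distance; average it over the batch
    return optimizer, scheduler, loss_fn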
Fig. 7. Results of the cross-dataset evaluation. We randomly selected 50,000 images and 64,000 images as the training sets from UnityEyes (left) and UT Multiview (right), respectively, and then tested on 45,000 images randomly selected from MPIIGaze. UT Multiview does not have fine landmark information; thus, ResNet and UEGazeNet are not compared on it.
Fig. 7 shows the mean angular errors of the different methods, including GazeNet, ResNet (UEGazeNet without the outer nested convolutional layers), ResNet* (full pre-activation), random forest (RF), K-nearest neighbors (KNN), UEGazeNet and UEGazeNet*. Bars correspond to the mean error across all baseline methods on the two datasets, and error bars indicate the standard deviations of each method. As can be seen from the figure, when using the UnityEyes training set, GazeNet can obtain good results even without ImageNet pretraining. However, although the error obtained by our UEGazeNet* is the lowest, UEGazeNet did not obtain good results. Similar results can also be found for ResNet and ResNet*, which means that, when the image resolution is low, it is more effective to learn the gaze direction from the image directly. Moreover, when using the UT Multiview dataset, the best-performing model is still UEGazeNet*, which demonstrates the effectiveness of our network structure. In addition, the performance of RF and KNN shows substantial differences between the two datasets, with the average errors on the UnityEyes dataset lower than those on the UT Multiview dataset. The evaluation results of these two methods largely depend on the quality of the datasets, which may indicate that the UnityEyes dataset is closer to the real data distribution and can thus achieve a better learning effect.
Fig. 8. Head pose information. Comparison of the effect of the direct method (UEGazeNet*) with that of GazeNet with and without head pose information; both are trained on UnityEyes and UT Multiview.
To investigate the significance of head posture information, we compared the gaze estimation capability of GazeNet and our UEGazeNet* with and without head posture data. As shown in Fig. 8, the average errors of GazeNet and UEGazeNet* both decrease when head posture information is added, which indicates that the head posture may have a close relationship with the gaze direction. Moreover, our UEGazeNet* achieves better performance than GazeNet both with and without the head posture.
This paper focuses on the near-eye images of HMD applications; thus, we further evaluate the gaze estimation accuracy by using the eye phantom method [75]. As shown in Fig. 3(d), we developed a testing stage with a realistic artificial eye and designed a mechanical device to adjust its corresponding kinematic model, which enables us to precisely evaluate the eye location of an eye tracker and accurately obtain the gaze direction on the target board (Fig. 3(c)) using a laser. Once we determine the four corners of the target range (top left, top right, bottom left, and bottom right), the gaze directions are transformed to the screen coordinates of the HMD by an affine transformation. We used UnityEyes as the training dataset to evaluate algorithm performance in different lighting environments and at different image resolutions.
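The four-corner calibration described here can be implemented with a standard perspective transform; note that four point correspondences define a full homography, of which the affine transform mentioned in the text is a special case. The OpenCV-based sketch below is illustrative, and the corner ordering and the use of raw gaze angles as input coordinates are assumptions.

import numpy as np
import cv2

def gaze_to_screen_mapper(corner_angles, screen_w, screen_h):
    # corner_angles: 4 x (yaw, pitch) measured while fixating the target-board corners,
    # ordered top-left, top-right, bottom-left, bottom-right.
    src = np.asarray(corner_angles, dtype=np.float32)
    dst = np.array([[0, 0], [screen_w, 0], [0, screen_h], [screen_w, screen_h]], dtype=np.float32)
    H = cv2.getPerspectiveTransform(src, dst)

    def to_screen(yaw, pitch):
        pt = np.array([[[yaw, pitch]]], dtype=np.float32)
        return cv2.perspectiveTransform(pt, H)[0, 0]   # (x, y) on the HMD screen
    return to_screen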
Fig. 9. Indoor and outdoor conditions. Indoors, each person is 55 cm from the screen, gazing at the marker points. The result of the estimator is derived, and the angular difference between the two 3D vectors is calculated (left). Outdoor conditions, which are more challenging, are also considered (right).
In this experiment, GazeNet, ResNet, UEGazeNet* and ResNet* are used as the baseline methods. To ensure that the training data meet the needs of each model, the training images are resized to the input size required by GazeNet and to that required by the other methods. Fig. 9 shows the error distributions of the different methods when trained on UnityEyes with the near-eye dataset and tested on our designed evaluation system. Bars correspond to the error distribution interval, and error bars indicate the standard deviations of each method. As can be seen from the figure, our UEGazeNet shows the lowest error in both indoor and outdoor lighting environments, with average errors of 1.52 degrees indoors and 1.78 degrees outdoors. In contrast, the performance of UEGazeNet* is generally worse than in the cross-dataset evaluation, possibly because the eye cannot be fully captured when the gaze direction gradually moves to an extreme angle; thus, landmarks have a significant effect in high-definition images. Moreover, the performance of ResNet (using landmark-based gaze estimation) is not stable, especially outdoors. Compared to GazeNet, the average error of our UEGazeNet is reduced by 52.15% indoors and 67.52% outdoors, which shows the advantage of our network structure in this challenging test method.

To analyze the error distribution, we further map the error to specific locations on the target board. Fig. 10 shows the average error distributions of these methods under indoor and outdoor environments. The indoor error is smaller than the outdoor error, and the errors in both cases are mainly concentrated in the lower-middle area because we record the mapping area using the four corners. When the gaze angle resolution is low, the middle area is not well distinguished by most methods. However, as can be seen from the figure, the error distribution of our UEGazeNet is relatively balanced, whereas the errors of GazeNet change significantly far from the center, which further confirms the validity and stability of our model.

For different evaluation methods, the results obtained by our two networks are quite different. Thus, we also evaluated our UEGazeNet and UEGazeNet* under different training data sizes in indoor environments.
Fig. 10. The error distributions measured by the eye phantom.
Fig. 11. Effects of resolution. Since our direct method requires relatively high-resolution images, we explore the effects of resolution on our approach.

As shown in Fig. 11, image resolution has a major impact on gaze estimation performance, especially for UEGazeNet, which relies on landmark detection. When the resolution is low, it is often difficult to find accurate feature locations, which makes it impossible for the gaze estimator to obtain correct features.

The GTgestures dataset contains 17 easy-to-implement patterns that can be divided into 5 categories. We first analyzed the accuracy of the results obtained by different classifiers using our GTgestures dataset. Then, we tested the accuracy of real HCI by using our HMD device. Finally, we evaluated the time spent by different people learning to use the interactive patterns.
Fig. 12. Effects of different classifiers. Different classification methods were used to test the dataset. All classifiers used the same input image size, and the difference between the ANN and the CNN is the presence or absence of a convolutional layer.

We randomly selected 8,160 samples from 12 individuals in GTgestures as the training set, and the remaining 2,040 samples from 3 individuals were used as the test set. Several commonly used classifiers were utilized to compare and analyze the results. As shown in Fig. 12, conventional machine learning methods cannot achieve good results, especially KNN. However, these methods can often achieve good results for handwritten character recognition, which may indicate that there are some differences between gaze trajectories and handwritten characters. Moreover, the CNN can achieve good recognition results in terms of both time and accuracy. In addition, the experimental results also demonstrate that our GTgestures dataset can be applied to a variety of classifiers and meet the needs of HCI.
Fig. 13. GTgestures. There are 17 patterns in 5 categories, and we evaluated the recognition rate of each pattern. The fifth pattern (circle) has the lowest recognition rate. According to the heat maps, only this pattern produces no clear high-heat area; thus, its classification is not effective. In general, a completely closed shape like a circle is difficult for people to draw consistently.
To further evaluate the HCI performance of our method in real applications, we selected 5 individuals who did not participate in the GTgestures data collection to conduct experiments using our HMD device. During the evaluation, a pattern randomly appeared on the screen, and the participant traced the pattern with their gaze. The average recognition rate of these 17 gaze trajectories reaches 96.71%. We further analyzed the recognition rate of each group and found that the first category has the highest recognition rate, reaching 98%, while the fifth category has the lowest, reaching only 80%. Fig. 13 shows the trajectory probability of the gaze in the form of a heat map. We can clearly see that, among the five categories, only the fifth (circular) has no high-heat area; that is, the circles drawn by each person are not the same, differing not only in position but also in the degree of deformation between the traced pattern and the standard template. The other four categories have clear high-heat areas following the patterns we designed; therefore, the recognition rates for these patterns are particularly high.
Fig. 14. Time spent mastering GTgestures. The time spent collecting gaze gesture data from 15 participants.
When collecting the GTgestures data, we also recorded the time spent by each participant each time they completed a collection task. We collected data from each person in two batches to analyze how users interact after their initial experience and found effective improvements. As shown in Fig. 14, except for participant 1, the time required for the second collection was, on average, half that required for the first collection. Most people require 40-50 min to collect the first batch of data when they are first introduced to this interactive mode, but they require only 20-23 min to collect the second batch, decreasing the time required for the task by nearly half. Thus, users typically become skillful with gaze gestures after the first use.
5 DISCUSSIONS
Unconstrained gaze estimation is the core technology of eye tracking, especially in the HMD field. The relative positions of the camera and the eyes vary from person to person and can shift and rotate with the face when the person wears an HMD. Multiple light sources are often used to deal with these problems [76], and infrared cameras and multiple infrared light sources are used to carry out geometric modeling of the eyeball. However, the use of these devices increases the complexity of the system. In contrast, we designed a single RGB-camera-based HMD that can achieve unconstrained gaze estimation and HCI through our UEGazeNet or UEGazeNet*. Moreover, instead of using real-world data [29], [66], [67], [68], we generate training data by customizing UnityEyes according to our requirements [71]. This method reduces the cost of labeling data and adds data points with extreme angles. As far as the current results are concerned, we believe that training with synthetic data is an effective approach that is not limited to gaze estimation: it can also be applied in fields such as autonomous driving [77].

Our method uses two neural networks in parallel to extract different features and ensure the correlations between features. UEGazeNet and UEGazeNet* have similar structures but different functions. The experimental results show that UEGazeNet* can consider the global features of the image, which makes it applicable to low-resolution inputs; meanwhile, UEGazeNet can achieve a good effect through landmark extraction, even under imperfect eye image conditions, which makes it suitable for close-range detection such as our HMD applications.

To improve HCI performance, we do not simply map the gaze to the cursor [24], [26], as the effect of these methods largely depends on the precision of gaze tracking. Our HCI method is based on a gaze gesture classifier that can detect the attention of the user based on their gaze. In addition, different from previous rule-based classification methods [25], [27], we use a CNN classifier, which does not require very accurate gaze estimation to achieve rapid interaction and can even produce effective interactive operations while the user is in motion.

It is easy to draw inaccurate patterns, especially for users who are just beginning to learn this type of interaction. For example, a new user will take time to think about where to "write" when a trajectory begins or will look elsewhere before the task is complete. We collected a dataset to classify eye patterns without special standardization and accurately detect users' intentions in real time. Based on our extensive evaluation with multiple people and multiple models, the dataset we established is very effective and achieves 96.71% accuracy. However, when collecting the dataset, we found that this interaction mode requires a certain adaptation time, due to people's attention and habits.
6 CONCLUSIONS
This paper proposes a gaze-tracking method that uses deep convolutional neural networks to detect the landmarks of the eyes and obtain gaze directions from them. This method can learn the relationship between the real human eye and the gaze direction from a small amount of synthetic data. The existence of landmarks allows the model to fit the complete eye information even when the camera does not capture the complete eye, thus achieving detection and tracking of the gaze direction. Although this method can adapt to various lighting conditions (including indoor and outdoor environments at different times), its effect also depends on the quality of the input image. Our work demonstrates that it is possible to interact through gaze gestures. Furthermore, we developed the gaze gesture dataset, which we collected during gaze tracking. Regarding the recognition rate of interaction, we evaluated the 17 patterns in the dataset separately, which shows that the user's intention can be detected even if the gaze-tracking result is not ideal or the gaze gesture is not accurate. This method is a feasible solution that can be applied to HCI in future HMDs.
ACKNOWLEDGMENTS
This work was supported by the National Natural Science Foundation of China (61501101, 61771121), the 111 Project (B16009), and the Fundamental Research Funds for the Central Universities (N171904006, N172410006-2).

REFERENCES

[1] T. Joda, G. O. Gallucci, D. Wismeijer, and N. U. Zitzmann, "Augmented and virtual reality in dental medicine: A systematic review,"
Computers in Biology and Medicine , vol. 108, pp. 93-100, May. 2019.[2] J. Lasse, and F. Konradsen, A review of the use of virtual realityhead-mounted displays in education and training,
Education andInformation Technologies , vol. 23, pp. 1515-1529, 2017.[3] A. Y. C. Nee, S. K. Ong, G. Chryssolouris, and D. Mourtzis, ”Augmented reality applications in design and manufacturing,”
CIRP Annals - Manufacturing Technology , vol. 61, pp. 657-679, 2012[4] Z. Michael, From visual simulation to virtual reality to games,
Computer , vol. 38, pp. 25-32, 2005.[5] C. Nikolas, and T. Hllerer, An Evaluation of Bimanual Gestures onthe Microsoft HoloLens,
IEEE Conference on Virtual Reality and 3DUser Interfaces (VR) , pp. 1-8, 2018[6] G. T. Buswell, ”How people look at pictures. A study of thepsychology of perception in art”,
The University of Chicago Press ,1935.[7] P. C. Gordon, R. Hendrick, M. Johnson, and Y. Lee, Similarity-basedinterference during language comprehension: Evidence from eyetracking during reading,
Journal of Experimental Psychology: Learning,Memory, and Cognition , vol. 32, no. 6, pp. 1304-1321, 2006.[8] A. Poole, and L. Ball, ”Eye tracking in human-computer interac-tion and usability research: Current status and future prospects,”
Encyclopedia of Human-Computer Interaction , pp. 211219, 2006.[9] D. W. Hansen, Q. Ji, ”In the eye of the beholder: A survey of modelsfor eyes and gaze”,
IEEE Trans Pattern Anal. Mach. Intell. , vol. 32, no.3, pp. 478-500, Mar. 2010.[10] M. R. Wilson, J. S. McGrath, S. J. Vine, J. P. Brewer, D. Defriend, andR. S. Masters, ”Perceptual impairment and psychomotor control invirtual laparoscopic surgery,”
Surgical Endoscopy vol. 27, no. 9, pp.32053213, 2013. [11] S. K. D’Mello, A. Olney, C. Williams, and P. Hays, ”Gaze tutor:A gaze-reactive intelligent tutoring system”.
Int. J. Hum.-Comput.Stud. , vol. 70, pp. 377-398, 2012.[12] J. Kim, E. C. Lee, and J. S. Lim, ”A new objective visual fatiguemeasurement systeA new objective visual fatigue measurementsystem by using a remote infrared camera,”
JCSSE , pp. 182-186,2011.[13] T. A. Lansu, and W. Troop-Gordon, ”Affective associations withnegativity: Why popular peers attract youths’ visual attention,”
Journal of experimental child psychology , vol. 162, pp. 282-291, 2017.[14] M. Murias, S. Major, K. S. Davlantis, L. Franz, A. Harris, B. Rardin,M. G. Sabatos-DeVito, and G. Dawson, ”Validation of eye-trackingmeasures of social attention as a potential biomarker for autismclinical trials,”
Autism research : official journal of the InternationalSociety for Autism Research , vol. 11, no. 1, pp. 166-174, 2018.[15] M. Bostelmann, B. Glaser, A. N. Zaharia, S. Eliez, and M. Schnei-der, ”Does differential visual exploration contribute to visual mem-ory impairments in 22q11.2 microdeletion syndrome?,”
Journal ofintellectual disability research : JIDR , vol. 61, no. 12, pp. 1174-1184,2017.[16] T. Kawagoe, M. Matsushita, M. Hashimoto, M. Ikeda, and K.Sekiyama, ”Face-specific memory deficits and changes in eye scan-ning patterns among patients with amnestic mild cognitive impair-ment,”
Scientific Reports , vol. 7, id. 14344, Oct. 2017.[17] K. Shyu, P. Lee, M. Lee, M. Lin, R. Lai, and Y. Chiu, ”Develop-ment of a Low-Cost FPGA-Based SSVEP BCI Multimedia ControlSystem,”
IEEE Transactions on Biomedical Circuits and Systems , vol. 4,pp. 125-132, 2010.[18] T. Hagiya, and T. Kato, ”Probabilistic touchscreen keyboard incor-porating gaze point information,”
Mobile HCI , pp. 329-333, 2014.[19] P. Majaranta, U. Ahola, and O. Spakov, ”Fast gaze typing with anadjustable dwell time,”
CHI , pp. 357-360, 2009.[20] P.Majaranta, N. Majaranta, G. Daunys, and O. Spakov, ”TextEditing by Gaze: Static vs. Dynamic Menus,”
COGAIN , May. 2009.[21] M. Porta, A. Ravarelli, and G. Spagnoli, ceCursor, a contextual eyecursor for general pointing in windows environments.
ETRA , 2010.[22] P. Biswas, and P. J. Langdon, A new interaction technique involv-ing eye gaze tracker and scanning system,
ETSA ’13 , 2013.[23] G. Buscher, A. Dengel, L. V. Elst, and F. Mittag, Gen-erating andusing gaze-based document annotations,
CHI Extended Abstracts ,2008.[24] K. Fujii, G. Gras, A. Salerno, and G. Yang, Gaze gesture basedhuman robot interaction for laparoscopic surgery.
Medical imageanalysis , vol. 44, pp. 196-214, 2018.[25] H. O. Istance, A. Hyrskykari, L. Immonen, S. Mansikkamaa, andS. Vickers, ”Designing gaze gestures for gaming: an investigationof performance,”
ETRA , 2010.[26] K. Fujii, A. Salerno, K. Sriskandarajah, K. Kwok, K. Shetty, andG. Yang, ”Gaze contingent cartesian control of a robotic arm forlaparoscopic surgery,” , pp. 3582-3589, 2013.[27] M. Porta, and M. Turina, ”Eye-S: a full-screen input modality forpure eye-based communication,”
ETRA , 2008.[28] Y. Sugano, Y. Matsushita, and Y. Sato, ”Learning-by-synthesis forappear-ance-based 3d gaze estimation,” in Proc. IEEE Conf. Comput.Vis. Pat-tern Recognit. , pp. 1821-1828, 2014.[29] X. Zhang, Y. Sugano, M. Fritz, and A. Bulling, ”MPIIGaze: Real-World Dataset and Deep Appearance-Based Gaze Estimation,”
IEEE Transactions on Pattern Analysis and Machine Intelligence , vol.41, pp. 162-175, 2017.[30] R. Valenti and T. Gevers, ”Accurate Eye Center Location andTracking Using Isophote Curvature,”
Proc. IEEE Conf. ComputerVision and Pattern Recognition , pp. 1-8, 2008.[31] D. Li, D. Winfield, and D.J. Parkhurst, ”Starburst: A hybridalgorithm for video-based eye tracking combining feature-basedand model-based approaches,”
IEEE Computer Society Conference onComputer Vision and Pattern Recognition (CVPR’05) , pp. 79-79, 2005.[32] D.W. Hansen and A.E.C. Pece, ”Eye Tracking in the Wild,”
Com-puter Vision and Image Understanding , vol. 98, no. 1, pp. 182- 210,Apr. 2005.[33] K.N. Kim and R.S. Ramakrishna, ”Vision-Based Eye-Gaze Track-ing for Human Computer Interface,”
Proc. IEEE Int’l Conf. Systems,Man, and Cyber-netics , vol. 2, pp. 324-329, 1999.[34] M. Nixon, ”Eye Spacing Measurement for Facial Recognition,”
Proc. Conf. Soc. Photo-Optical Instrument Eng. , 1985. [35] A. Perez, M.L. Cordoba, A. Garcia, R. Mendez, M.L. Munoz,J.L. Pedraza, and F. Sanchez, ”A Precise Eye-Gaze Detection andTracking System,” J. WSCG , pp. 105-108, 2003.[36] D. Young, H. Tunley, and R. Samuels, ”Specialised Hough Trans-form and Active Contour Methods for Real-Time Eye Tracking,”
Technical Report 386, School of Cognitive and Computing Sciences, Univ.of Sussex , 1995.[37] A. Yuille, P. Hallinan, and D. Cohen, ”Feature Extraction fromFaces Using Deformable Templates,”
Int’l J. Computer Vision , vol. 8,no. 2, pp. 99-111, 1992.[38] K. Lam and H. Yan, ”Locating and Extracting the Eye in HumanFace Images,”
Pattern Recognition , vol. 29, pp. 771-779, 1996.[39] L. Zhang, ”Estimation of Eye and Mouth Corner Point Positionsin a Knowledge-Based Coding System,”
Proc. SPIE , pp. 21-18, 1996.[40] M. Kampmann and L. Zhang, ”Estimation of Eye, Eyebrow andNose Fea-tures in Videophone Sequences,”
Proc. Int’l Workshop VeryLow Bitrate Video Coding , 1998.[41] K. Grauman, M. Betke, J. Gips, and G.R. Bradski, ”Communicationvia Eye Blinks: Detection and Duration Analysis in Real Time,”
Proc. IEEE Conf. Computer Vision and Pattern Recognition , vol. I, pp.1010-1017, 2001.[42] P.W. Hallinan, ”Recognizing Human Eyes,”
Geometric Methods inComputer Vision , pp. 212-226, 1991.[43] J. Huang, D. Ii, X. Shao, and H. Wechsler, ”Pose Discriminationand Eye Detec-tion Using Support Vector Machines (SVMs),”
Proc.Conf. NATO-ASI on Face Recognition: From Theory to Applications , pp.528-536, 1998.[44] Z. Zhu, K. Fujimura, and Q. Ji, ”Real-Time Eye Detection andTracking under Various Light Conditions,”
Proc. Eye Tracking Re-search and Applica-tions Symp. , 2002.[45] F. Samaria and S. Young, ”HMM-Based Architecture for FaceIdentification,”
Image and Vision Computing , vol. 12, no. 8, pp. 537-543, 1994.[46] J. Huang and H. Wechsler, ”Eye Detection Using Optimal WaveletPackets and Radial Basis Functions (RBFs),”
Int’l J. Pattern Recogni-tion and Arti-ficial Intelligence , vol. 13, no. 7, 1999.[47] P. Viola and M. Jones, ”Robust Real-Time Face Detection,”
Proc.Int’l Conf. Computer Vision , vol. 2, pp. 747-747, 2001.[48] D.W. Hansen and J.P. Hansen, ”Robustifying Eye Interaction,”
Proc. Conf. Vision for Human Computer Interaction , pp. 152-158, 2006.[49] I.R. Fasel, B. Fortenberry, and J.R. Movellan, ”A Generative Frame-work for Real Time Object Detection and Classification,”
ComputerVision and Image Understanding , vol. 98, no. 1, pp. 182-210, Apr. 2005.[50] P. Wang, M.B. Green, Q. Ji, and J. Wayman, ”Automatic EyeDetection and Its Validation,”
Proc. 2005 IEEE CS Conf. ComputerVision and Pattern Recognition , vol. 3, pp. 164-164, 2005.[51] J.G. Wang, E. Sung, and R. Venkateswarlu, ”Estimating the EyeGaze from One Eye,”
Computer Vision and Image Understanding , vol.98, no. 1, pp. 83-103, Apr. 2005.[52] A. Villanueva, R. Cabeza, and S. Porta, ”Eye Tracking: PupilOrientation Geometrical Modeling,”
Image and Vision Computing ,vol. 24, no. 7, pp. 663-679, July 2006.[53] A. Villanueva, R. Cabeza, and S. Porta, ”Gaze Tracking SystemModel Based on Physical Parameter,”
Int’l J. Pattern Recognition andArtificial Intelligence (IJPRAI) , vol. 21, pp.855-877, 2007.[54] D. Beymer and M. Flickner, ”Eye Gaze Tracking Using an ActiveStereo Head,”
Proc. IEEE Conf. Computer Vision and Pattern Recogni-tion , vol. 2, pp. 451-458, 2003.[55] X.L.C. Brolly and J.B. Mulligan, ”Implicit Calibration of a RemoteGaze Tracker,”
Proc. 2004 Conf. Computer Vision and Pattern Recogni-tion Workshop , vol. 8, pp. 134-134, 2004.[56] D.W. Hansen, “Comitting Eye Tracking,”
PhD thesis, IT Univ. ofCo-penhagen , 2003.[57] C.H. Morimoto and M.R.M. Mimica, ”Eye Gaze Tracking Tech-niques for Interactive Applications,”
Computer Vision and ImageUnderstanding , vol. 98, no. 1, pp. 4-24, Apr. 2005.[58] D. Witzner Hansen, J.P. Hansen, M. Nielsen, A.S. Johansen, andM.B. Steg-mann, ”Eye Typing Using Markov and Active Appear-ance Models,”
Proc. IEEE Workshop Applications on Computer Vision ,pp. 132-136, 2003.[59] A. Meyer, M. Bohme, T. Martinetz, and E. Barth, A Single CameraRemote Eye Tracker,
Perception and Interactive Technologies , pp. 208-211, 2006.[60] B. Noureddin, P.D. Lawrence, and C.F. Man, A Non-Contact De-vice for Tracking Gaze in a Human Computer Interface,
ComputerVision and Image Understanding , vol. 98, no. 1, pp. 52-82, 2005 [61] D.W. Hansen and A.E.C. Pece, Eye Tracking in the Wild,
ComputerVision and Image Understanding , vol. 98, no. 1, pp. 182-210, Apr. 2005[62] C. Morimoto, A. Amir, and M. Flickner, Detecting Eye Positionand Gaze from a Single Camera and 2 Light Sources,
Proc. Intl Conf.Pattern Recognition , vol. 4, pp. 314-317, 2002.[63] C.W. Huang, Z.S. Jiang, W.F. Kao, and Y.L. Huang, Building a low-cost eye-tracking system.
ICIT 2012 , 2012.[64] Q. Ji and Z. Zhu, Eye and Gaze Tracking for Interactive GraphicDisplay,
Proc. Second Intl Symp. Smart Graphics , pp. 79-85, 2002[65] K. Krafka, A. Khosla, P. Kellnhofer, H. Kannan, S. Bhandarkar, W.Matusik, ”Eye tracking for everyone,” in Proc. IEEE Conf. Comput.Vis. Pattern Recognit. , pp. 21762184, Jun. 2016.[66] Q. He, X. Hong, X. Chai, J. Holappa, G. Zhao, X. Chen, and M.Pietikinen, ”Omeg: Oulu multi-pose eye gaze dataset,” in Proc.Image Anal. , pp. 418-427, 2015.[67] Q. Huang, A. Veeraraghavan, and A. Sabharwal, ”Tabletgaze:Dataset and analysis for unconstrained appearance-based gazeestimation in mobile tab-lets,”
Mach. Vis. Appl. , vol. 28, no. 5, pp.445-461, 2017.[68] K. A. Funes Mora, F. Monay, and J.M. Odobez, ”EYEDIAP: Adatabase for the development and evaluation of gaze estimationalgorithms from RGB and RGB-D cameras,” in Proc. ACM Symp.Eye Tracking Res. , pp. 255-258, 2014.[69] K. A. Funes Mora and J.-M. Odobez, ”Person independent 3d gazeestimation from remote RGB-D cameras,” in Proc. IEEE Int. Conf.Image Process. , pp. 2787-2791, 2013.[70] T. Schneider, B. Schauerte, and R. Stiefelhagen, ”Manifold align-ment for person independent appearance-based gaze estimation,” in Proc. Int. Conf. Pattern Recognit. , pp. 1167-1172, 2014.[71] E. Wood, T. Baltrusaitis, L.P. Morency, P. Robinson, and A. Bulling,”Learning an appearance-based gaze estimator from one millionsynthesised images,” in Proc. ACM Symp. Eye Tracking Res. , pp. 131-138, 2016.[72] E. Wood, T. Baltrusaitis, X. Zhang, Y. Sugano, P. Robinson, andA. Bulling, ”Rendering of eyes for eye-shape registration and gazeestimation,”
Proc. IEEE Int. Conf. Comput. Vis. , pp. 3756-3764, 2015.[73] A. Shrivastava, T. Pfister, O. Tuzel, J. Susskind, W. Wang, R.Webb, ”Learning from simulated and unsupervised images throughadversarial training”,
Proc. IEEE Conf. Comput. Vis. Pattern Recognit. ,pp. 2242-2251, Jun. 2016.[74] K. He, X. Zhang, S. Ren, and J. Sun, ”Deep residual learning forimage recogni-tion,”
Proc. IEEE Conf. Comput. Vis. Pattern Recognit. ,pp. 770778, June 2016.[75] S. Wyder, and P.C. Cattin, ”Eye tracker accuracy: quantitativeevaluation of the invisible eye center location,”
International Journalof Computer Assisted Radiology and Surgery , vol. 13, pp. 1651-1660,2017.[76] A. Plopski, J. Orlosky, Y. Itoh, C. Nitschke, K. Kiyokawa, and G.Klinker, Automated spatial calibration of HMD systems with un-constrained eye-cameras,
Proc. Int. Symp. Mixed Augmented Reality ,pp. 9499, 2016.[77] Y. Zhang, Z. Qiu, T. Yao, D. Liu, and T. Mei, ”Fully ConvolutionalAdaptation Networks for Semantic Segmentation,” , pp. 6810-6818,2018.
Weixing Chen is working toward the BSc degree at Northeastern University, China. At present, he is an intern at the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences. His research experience includes eye tracking, pathological image analysis, low-power electrical impedance measurement and non-contact measurements for electrical stimulators. His research interests mainly include biomedical image processing and pattern recognition.

Xiaoyu Cui received his Bachelor's degree in Electronics and Information Engineering in 2007 from Shenyang University of Technology and received his Master's and Doctoral degrees in Biomedical Engineering in 2009 and 2013, respectively, from Northeastern University. He is currently an associate professor in the Sino-Dutch Biomedical and Information Engineering School at Northeastern University, China. His research interests include optical imaging and machine learning.
Jing Zheng received the BSc degree in machine design from Shenyang University of Technology in 2016 and the MSc degree from Northeastern University. His research interests include computer vision and embedded hardware development.
Jinming Zhang is working toward a bachelor's degree at Northeastern University, China. At present, he is an intern at the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences. His research experience includes eye tracking and non-contact measurements for electrical stimulators. His research interests mainly include medical image processing.
Shuo Chen received the B.E. degree in biomedical engineering from Shanghai Jiaotong University, China, the M.S. degree in biomedical optics from Heidelberg University, Germany, and the Ph.D. degree in biomedical engineering from Nanyang Technological University, Singapore. He is currently an Associate Professor with Northeastern University, China. His research interests include biomedical optical spectroscopy and imaging, noninvasive medical diagnostics, biomedical instrumentation, and biomedical image processing.