Yusuke Sugano
Max Planck Society
Publications
Featured research published by Yusuke Sugano.
Computer Vision and Pattern Recognition | 2015
Xucong Zhang; Yusuke Sugano; Mario Fritz; Andreas Bulling
Appearance-based gaze estimation is believed to work well in real-world settings, but existing datasets have been collected under controlled laboratory conditions and methods have not been evaluated across multiple datasets. In this work we study appearance-based gaze estimation in the wild. We present the MPIIGaze dataset, which contains 213,659 images collected from 15 participants during natural everyday laptop use over more than three months. Our dataset is significantly more variable than existing ones with respect to appearance and illumination. We also present a method for in-the-wild appearance-based gaze estimation using multimodal convolutional neural networks that significantly outperforms state-of-the-art methods in the most challenging cross-dataset evaluation. We present an extensive evaluation of several state-of-the-art image-based gaze estimation algorithms on three current datasets, including our own. This evaluation provides clear insights and allows us to identify key research challenges of gaze estimation in the wild.
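To make the idea of a multimodal gaze network concrete, here is a minimal PyTorch sketch of an eye-image encoder fused with head-pose angles by concatenation. All layer sizes, the 36x60 input resolution, and the class name MultimodalGazeNet are illustrative assumptions, not the published configuration.

import torch
import torch.nn as nn

class MultimodalGazeNet(nn.Module):
    # Toy appearance-based estimator: grayscale eye patch + 2D head pose -> 2D gaze angles.
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 20, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(20, 50, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fc_img = nn.Linear(50 * 6 * 12, 500)   # for a 36x60 input patch
        self.fc_out = nn.Linear(500 + 2, 2)         # +2 for the head-pose angles

    def forward(self, eye_img, head_pose):
        x = self.conv(eye_img).flatten(1)
        x = torch.relu(self.fc_img(x))
        x = torch.cat([x, head_pose], dim=1)        # multimodal fusion by concatenation
        return self.fc_out(x)                       # predicted (yaw, pitch) gaze angles

model = MultimodalGazeNet()
gaze = model(torch.randn(8, 1, 36, 60), torch.randn(8, 2))
print(gaze.shape)   # torch.Size([8, 2])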
International Conference on Computer Vision | 2015
Erroll Wood; Tadas Baltrušaitis; Xucong Zhang; Yusuke Sugano; Peter Robinson; Andreas Bulling
Images of the eye are key in several computer vision problems, such as shape registration and gaze estimation. Recent large-scale supervised methods for these problems require time-consuming data collection and manual annotation, which can be unreliable. We propose synthesizing perfectly labelled photo-realistic training data in a fraction of the time. We used computer graphics techniques to build a collection of dynamic eye-region models from head scan geometry. These were randomly posed to synthesize close-up eye images for a wide range of head poses, gaze directions, and illumination conditions. We used our model's controllability to verify the importance of realistic illumination and shape variations in eye-region training data. Finally, we demonstrate the benefits of our synthesized training data (SynthesEyes) by outperforming state-of-the-art methods for eye-shape registration as well as cross-dataset appearance-based gaze estimation in the wild.
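As a rough illustration of how such a synthesis pipeline produces perfectly labelled pairs, the sketch below samples head poses, gaze directions, and lighting at random and stores the ground-truth label with every rendered image. The render_eye function and all parameter ranges are hypothetical placeholders, not the paper's actual graphics models.

import numpy as np

rng = np.random.default_rng(0)

def render_eye(head_pose, gaze_dir, light_dir):
    # Hypothetical stand-in for the graphics renderer; returns a synthetic eye image.
    return np.zeros((80, 120), dtype=np.uint8)

dataset = []
for _ in range(1000):
    head_pose = rng.uniform([-40.0, -20.0], [40.0, 20.0])   # yaw, pitch in degrees (assumed ranges)
    gaze_dir = rng.uniform([-25.0, -20.0], [25.0, 20.0])    # gaze angles relative to the head
    light_dir = rng.normal(size=3)
    light_dir /= np.linalg.norm(light_dir)                  # random illumination direction
    image = render_eye(head_pose, gaze_dir, light_dir)
    dataset.append((image, head_pose, gaze_dir))            # labels come for free with synthesis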
User Interface Software and Technology | 2015
Yusuke Sugano; Andreas Bulling
Head-mounted eye tracking has significant potential for gaze-based applications such as life logging, mental health monitoring, or the quantified self. A neglected challenge for the long-term recordings required by these applications is that drift in the initial person-specific eye tracker calibration, for example caused by physical activity, can severely impact gaze estimation accuracy and thus system performance and user experience. We first analyse calibration drift on a new dataset of natural gaze data recorded using synchronised video-based and Electrooculography-based eye trackers of 20 users performing everyday activities in a mobile setting. Based on this analysis we present a method to automatically self-calibrate head-mounted eye trackers based on a computational model of bottom-up visual saliency. Through evaluations on the dataset we show that our method 1) is effective in reducing calibration drift in calibrated eye trackers and 2) given sufficient data, can achieve gaze estimation accuracy competitive with that of a calibrated eye tracker, without any manual calibration.
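The core self-calibration idea, that saliency-predicted fixations can substitute for explicit calibration targets, can be sketched as a simple least-squares fit. The affine drift model, the simulated data, and the use of the saliency-map maximum as a fixation proxy are simplifying assumptions; the paper's actual method is more elaborate.

import numpy as np

rng = np.random.default_rng(0)

# Simulated data: true fixation points, drifted tracker output, and per-frame
# saliency-map maxima used as a rough proxy for the true fixations.
true_fix = rng.uniform(0.0, 1.0, size=(500, 2))
raw_gaze = true_fix @ np.array([[1.1, 0.0], [0.05, 0.9]]) + np.array([0.04, -0.03])
saliency_peak = true_fix + rng.normal(scale=0.01, size=(500, 2))

# Self-calibration: fit an affine correction that maps the drifted gaze estimates
# onto the saliency-predicted fixations, without any manual calibration targets.
X = np.hstack([raw_gaze, np.ones((len(raw_gaze), 1))])
A, *_ = np.linalg.lstsq(X, saliency_peak, rcond=None)
corrected = X @ A
print(np.abs(corrected - true_fix).mean())   # residual error after self-calibration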
Computer Vision and Pattern Recognition | 2017
Xucong Zhang; Yusuke Sugano; Mario Fritz; Andreas Bulling
Eye gaze is an important non-verbal cue for human affect analysis. Recent gaze estimation work indicated that information from the full face region can benefit performance. Pushing this idea further, we propose an appearance-based method that, in contrast to a long-standing line of work in computer vision, only takes the full face image as input. Our method encodes the face image using a convolutional neural network with spatial weights applied on the feature maps to flexibly suppress or enhance information in different facial regions. Through extensive evaluation, we show that our full-face method significantly outperforms the state of the art for both 2D and 3D gaze estimation, achieving improvements of up to 14.3% on MPIIGaze and 27.7% on EYEDIAP for person-independent 3D gaze estimation. We further show that this improvement is consistent across different illumination conditions and gaze directions and particularly pronounced for the most challenging extreme head poses.
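A minimal PyTorch sketch of such a spatial-weighting block is shown below: a small 1x1-convolution branch predicts a single-channel weight map that is multiplied onto the feature maps. Channel counts and feature-map sizes are illustrative assumptions rather than the published architecture.

import torch
import torch.nn as nn

class SpatialWeights(nn.Module):
    # Learns a per-location weight map and multiplies it onto the feature maps,
    # so informative face regions can be emphasised and others suppressed.
    def __init__(self, channels):
        super().__init__()
        self.weight_net = nn.Sequential(
            nn.Conv2d(channels, channels // 2, kernel_size=1), nn.ReLU(),
            nn.Conv2d(channels // 2, 1, kernel_size=1), nn.ReLU(),
        )

    def forward(self, feat):
        weights = self.weight_net(feat)   # single-channel spatial weight map
        return feat * weights             # broadcast across all channels

features = torch.randn(4, 256, 13, 13)    # e.g. conv features of a full-face image
weighted = SpatialWeights(256)(features)
print(weighted.shape)                     # torch.Size([4, 256, 13, 13])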
IEEE Transactions on Human-Machine Systems | 2015
Yusuke Sugano; Yasuyuki Matsushita; Yoichi Sato; Hideki Koike
This paper presents an unconstrained gaze estimation method using an online learning algorithm. We focus on a desktop scenario, where a user operates a personal computer, and use mouse-click positions to infer where on the screen the user is looking. Our method continuously captures the user's head pose and eye images with a monocular camera, and each mouse click triggers the acquisition of a learning sample. To handle head pose variations, the samples are adaptively clustered according to the estimated head pose, and local reconstruction-based gaze estimation models are incrementally updated in each cluster. We conducted a prototype evaluation in real-world environments, and our method achieved an estimation accuracy of 2.9°.
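The click-triggered sample collection and pose-dependent clustering can be sketched as follows. The coarse pose binning and the distance-weighted nearest-neighbour estimator stand in for the paper's adaptive clustering and local reconstruction models; they are assumptions for illustration only.

import numpy as np
from collections import defaultdict

clusters = defaultdict(list)   # head-pose bin -> list of (eye_feature, click_position)

def pose_bin(head_pose, step=10.0):
    # Quantise (yaw, pitch) in degrees into a coarse cluster key.
    return tuple(np.round(np.asarray(head_pose) / step).astype(int))

def on_mouse_click(eye_feature, head_pose, click_xy):
    # Each click contributes one implicitly labelled training sample.
    clusters[pose_bin(head_pose)].append((np.asarray(eye_feature), np.asarray(click_xy)))

def estimate_gaze(eye_feature, head_pose, k=5):
    samples = clusters.get(pose_bin(head_pose), [])
    if not samples:
        return None
    feats = np.stack([f for f, _ in samples])
    targets = np.stack([t for _, t in samples])
    dist = np.linalg.norm(feats - np.asarray(eye_feature), axis=1)
    idx = np.argsort(dist)[:k]
    w = 1.0 / (dist[idx] + 1e-6)
    return (targets[idx] * w[:, None]).sum(axis=0) / w.sum()   # weighted neighbour average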
User Interface Software and Technology | 2016
Yusuke Sugano; Xucong Zhang; Andreas Bulling
Gaze is frequently explored in public display research given its importance for monitoring and analysing audience attention. However, current gaze-enabled public display interfaces require either special-purpose eye tracking equipment or explicit personal calibration for each individual user. We present AggreGaze, a novel method for estimating spatio-temporal audience attention on public displays. Our method requires only a single off-the-shelf camera attached to the display, does not require any personal calibration, and provides visual attention estimates across the full display. We achieve this by 1) compensating for errors of state-of-the-art appearance-based gaze estimation methods through on-site training data collection, and by 2) aggregating uncalibrated and thus inaccurate gaze estimates of multiple users into joint attention estimates. We propose different visual stimuli for this compensation: a standard 9-point calibration, moving targets, text and visual stimuli embedded into the display content, as well as normal video content. Based on a two-week deployment in a public space, we demonstrate the effectiveness of our method for estimating attention maps that closely resemble ground-truth audience gaze distributions.
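The aggregation step, turning many noisy uncalibrated gaze estimates into a joint attention map, can be sketched as a smoothed 2D histogram. The grid size, smoothing bandwidth, and the omission of the on-site error-compensation step are simplifications of the described method.

import numpy as np
from scipy.ndimage import gaussian_filter

def attention_map(gaze_points, display_wh=(1920, 1080), cell=40, sigma=2.0):
    # Accumulate on-screen gaze estimates from many users into a coarse grid,
    # then smooth so that individual estimation errors average out.
    w, h = display_wh
    grid = np.zeros((h // cell, w // cell))
    for x, y in gaze_points:
        gx, gy = int(x // cell), int(y // cell)
        if 0 <= gx < grid.shape[1] and 0 <= gy < grid.shape[0]:
            grid[gy, gx] += 1
    return gaussian_filter(grid, sigma=sigma)   # sigma is in grid cells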
IEEE Transactions on Image Processing | 2015
Feng Lu; Yusuke Sugano; Takahiro Okabe; Yoichi Sato
In this paper, we address the problem of free head motion in appearance-based gaze estimation. This problem remains challenging because head motion changes eye appearance significantly, and thus training images captured under an original head pose cannot be directly used for test images captured under other head poses. To overcome this difficulty, we propose a novel gaze estimation method that handles free head motion via eye image synthesis based on a single camera. Compared with conventional fixed-head-pose methods that use only the original training images, our method captures just four additional eye images under four reference head poses and then precisely synthesizes new training images for other, unseen head poses during estimation. To this end, we propose a single-directional (SD) flow model to efficiently handle eye image variations due to head motion. We show how to estimate SD flows for the reference head poses first, and then use them to produce new SD flows for training image synthesis. Finally, with the synthetic training images, a joint optimization is applied that simultaneously solves eye image alignment and gaze estimation. The method was evaluated through experiments to assess its performance and demonstrate its effectiveness.
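As a toy illustration of warping an eye image with a flow field constrained to a single direction, the sketch below resamples each row along a given horizontal flow. Estimating the SD flows themselves and synthesizing flows for unseen head poses, as the paper describes, is not shown; the image size and flow values are arbitrary examples.

import numpy as np

def warp_single_directional(eye_img, flow_x):
    # Resample each image row along a horizontal flow field: the output pixel at x
    # takes the (linearly interpolated) input value at x + flow_x[y, x].
    h, w = eye_img.shape
    out = np.empty((h, w), dtype=float)
    xs = np.arange(w)
    for y in range(h):
        src = np.clip(xs + flow_x[y], 0, w - 1)
        out[y] = np.interp(src, xs, eye_img[y].astype(float))
    return out

eye = np.random.rand(36, 60)
shifted = warp_single_directional(eye, np.full((36, 60), 2.5))   # uniform 2.5-pixel shift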
arXiv: Computer Vision and Pattern Recognition | 2016
Marc Tonsen; Xucong Zhang; Yusuke Sugano; Andreas Bulling
We present labelled pupils in the wild (LPW), a novel dataset of 66 high-quality, high-speed eye region videos for the development and evaluation of pupil detection algorithms. The videos in our dataset were recorded from 22 participants in everyday locations at about 95 FPS using a state-of-the-art dark-pupil head-mounted eye tracker. They cover people of different ethnicities and a diverse set of everyday indoor and outdoor illumination environments, as well as natural gaze direction distributions. The dataset also includes participants wearing glasses, contact lenses, and make-up. We benchmark five state-of-the-art pupil detection algorithms on our dataset with respect to robustness and accuracy. We further study the influence of image resolution and vision aids as well as recording location (indoor, outdoor) on pupil detection performance. Our evaluations provide valuable insights into the general pupil detection problem and allow us to identify key challenges for robust pupil detection on head-mounted eye trackers.
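Benchmarks of this kind commonly report a detection rate, the fraction of frames whose predicted pupil centre falls within a pixel threshold of the ground truth. A minimal sketch of that metric is given below; the 5-pixel default threshold is an arbitrary example value, not the paper's protocol.

import numpy as np

def detection_rate(pred_centres, gt_centres, max_error_px=5.0):
    # Fraction of frames in which the detected pupil centre lies within
    # max_error_px pixels of the hand-labelled ground-truth centre.
    err = np.linalg.norm(np.asarray(pred_centres) - np.asarray(gt_centres), axis=1)
    return float(np.mean(err <= max_error_px))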
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2018
Xucong Zhang; Yusuke Sugano; Mario Fritz; Andreas Bulling
Learning-based methods are believed to work well for unconstrained gaze estimation, i.e. gaze estimation from a monocular RGB camera without assumptions regarding user, environment, or camera. However, current gaze datasets were collected under laboratory conditions and methods were not evaluated across multiple datasets. Our work makes three contributions towards addressing these limitations. First, we present the MPIIGaze dataset, which contains 213,659 full face images and corresponding ground-truth gaze positions collected from 15 users during everyday laptop use over several months. An experience sampling approach ensured continuous gaze and head poses and realistic variation in eye appearance and illumination. To facilitate cross-dataset evaluations, 37,667 images were manually annotated with eye corners, mouth corners, and pupil centres. Second, we present an extensive evaluation of state-of-the-art gaze estimation methods on three current datasets, including MPIIGaze. We study key challenges including target gaze range, illumination conditions, and facial appearance variation. We show that image resolution and the use of both eyes affect gaze estimation performance, while head pose and pupil centre information are less informative. Finally, we propose GazeNet, the first deep appearance-based gaze estimation method. GazeNet improves on the state of the art by 22 percent (from a mean error of 13.9 degrees to 10.8 degrees) for the most challenging cross-dataset evaluation.
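The reported degree errors refer to angular differences between predicted and ground-truth gaze directions. A small helper that computes this standard metric for batches of 3D gaze vectors might look as follows.

import numpy as np

def angular_error_deg(pred, gt):
    # Mean angular difference (in degrees) between predicted and ground-truth
    # 3D gaze direction vectors, computed via the dot product.
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    gt = gt / np.linalg.norm(gt, axis=1, keepdims=True)
    cos = np.clip(np.sum(pred * gt, axis=1), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)).mean())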
User Interface Software and Technology | 2017
Xucong Zhang; Yusuke Sugano; Andreas Bulling
Eye contact is an important non-verbal cue in social signal processing and promising as a measure of overt attention in human-object interactions and attentive user interfaces. However, robust detection of eye contact across different users, gaze targets, camera positions, and illumination conditions is notoriously challenging. We present a novel method for eye contact detection that combines a state-of-the-art appearance-based gaze estimator with a novel approach for unsupervised gaze target discovery, i.e. without the need for tedious and time-consuming manual data annotation. We evaluate our method in two real-world scenarios: detecting eye contact at the workplace, including on the main work display, from cameras mounted to target objects, as well as during everyday social interactions with the wearer of a head-mounted egocentric camera. We empirically evaluate the performance of our method in both scenarios and demonstrate its effectiveness for detecting eye contact independent of target object type and size, camera position, and user and recording environment.
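One way to realise unsupervised gaze target discovery is to cluster the uncalibrated gaze estimates and treat samples in the densest cluster, assumed to correspond to the camera or target object, as eye contact. The DBSCAN-based sketch below illustrates that idea; eps, min_samples, and the Nx2 input format are assumed parameters rather than the paper's actual pipeline.

import numpy as np
from sklearn.cluster import DBSCAN

def detect_eye_contact(gaze_angles_deg, eps=3.0, min_samples=50):
    # Cluster 2D gaze estimates (N x 2, in degrees) and label frames whose estimate
    # falls into the densest cluster as eye contact with the assumed target.
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(gaze_angles_deg)
    valid = labels[labels >= 0]
    if valid.size == 0:
        return np.zeros(len(gaze_angles_deg), dtype=bool)
    target_cluster = np.bincount(valid).argmax()
    return labels == target_cluster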