Publication


Featured research published by Ryo Yonetani.


human factors in computing systems | 2016

Can Eye Help You?: Effects of Visualizing Eye Fixations on Remote Collaboration Scenarios for Physical Tasks

Keita Higuchi; Ryo Yonetani; Yoichi Sato

In this work, we investigate how remote collaboration between a local worker and a remote collaborator changes when the collaborator's eye fixations are presented to the worker. We track the collaborator's points of gaze on a monitor screen displaying a physical workspace and visualize them in that space with a projector or through an optical see-through head-mounted display. Through a series of user studies, we found the following: 1) eye fixations can serve as a fast and precise pointer to objects of the collaborator's interest; 2) eyes and other modalities, such as hand gestures and speech, are used differently for object identification and manipulation; 3) eyes are used for explicit instructions only when they are combined with speech; and 4) the worker can predict some of the collaborator's intentions, such as their current interest and next instruction.
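
As a rough illustration of how a gaze point measured on the monitor could be re-projected into the physical workspace, the sketch below maps a fixation through a homography estimated from four calibration correspondences. The calibration points, screen resolution, and use of OpenCV here are assumptions for illustration, not the authors' implementation.

```python
import numpy as np
import cv2

# Hypothetical calibration: four points on the monitor view of the workspace
# and their corresponding positions in projector coordinates.
monitor_pts = np.float32([[0, 0], [1920, 0], [1920, 1080], [0, 1080]])
projector_pts = np.float32([[35, 20], [1240, 28], [1228, 790], [30, 780]])

# Estimate the monitor-to-projector homography from the correspondences.
H, _ = cv2.findHomography(monitor_pts, projector_pts)

def project_fixation(gaze_xy):
    """Map a gaze fixation (monitor pixels) into projector pixels."""
    pt = np.float32([[gaze_xy]])           # shape (1, 1, 2) as OpenCV expects
    return cv2.perspectiveTransform(pt, H)[0, 0]

print(project_fixation((960, 540)))        # fixation at the screen centre
```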


computer vision and pattern recognition | 2016

Recognizing Micro-Actions and Reactions from Paired Egocentric Videos

Ryo Yonetani; Kris M. Kitani; Yoichi Sato

We aim to understand the dynamics of social interactions between two people by recognizing their actions and reactions using a head-mounted camera. Our work will impact several first-person vision tasks that need a detailed understanding of social interactions, such as automatic video summarization of group events and assistive systems. To recognize micro-level actions and reactions, such as slight shifts in attention, subtle nodding, or small hand actions, where only subtle body motion is apparent, we propose to use paired egocentric videos recorded by two interacting people. We show that the first-person and second-person points-of-view features of the two people, enabled by paired egocentric videos, are complementary and essential for reliably recognizing micro-actions and reactions. We also build a new dataset of dyadic (two-person) interactions comprising more than 1,000 pairs of egocentric videos to enable systematic evaluation of micro-action and reaction recognition.
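
To make the paired-video idea concrete, here is a minimal sketch that fuses first-person and second-person features by concatenation before training a classifier. The random placeholder features, the seven-class label set, and the linear SVM are illustrative assumptions, not the method used in the paper.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Placeholder features: in practice these would be motion/appearance
# descriptors extracted from each person's egocentric video clip.
n_clips, dim = 200, 64
first_person = rng.normal(size=(n_clips, dim))    # wearer's own view
second_person = rng.normal(size=(n_clips, dim))   # partner's view of the wearer
labels = rng.integers(0, 7, size=n_clips)         # e.g., 7 micro-action classes

# The complementary views are fused by simple concatenation before classification.
paired = np.hstack([first_person, second_person])

print(cross_val_score(LinearSVC(max_iter=5000), paired, labels, cv=5).mean())
```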


computer vision and pattern recognition | 2015

Ego-surfing first person videos

Ryo Yonetani; Kris M. Kitani; Yoichi Sato

We envision a future in which wearable cameras (e.g., small cameras in glasses or pinned on a shirt collar) are worn by the masses and record first-person point-of-view (POV) videos of everyday life. While these cameras can enable new assistive technologies and novel research challenges, they also raise serious privacy concerns. For example, first-person videos passively recorded by wearable cameras will necessarily include anyone who comes into the view of a camera, with or without consent. Motivated by these benefits and risks, we develop a self-search technique tailored to first-person POV videos. The key observation of our work is that the egocentric head motion of a target person (i.e., the self) is observed both in the target's own POV video and in an observer's video. The motion correlation between the target person's video and the observer's video can then be used to uniquely identify instances of the self. We incorporate this feature into our proposed approach, which computes the motion correlation over supervoxel hierarchies to localize target instances in observer videos. Our approach significantly improves self-search performance over several well-known face detectors and recognizers. Furthermore, we show how it can enable several practical applications such as privacy filtering, automated video collection, and social group discovery.
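
A toy version of the motion-correlation cue is sketched below: the target's global head-motion signal is correlated with the local motion of each candidate region in an observer's video, and the best-matching region is taken as a likely self instance. The synthetic motion signals and the plain normalized correlation are simplifications; the paper's supervoxel hierarchies are not reproduced.

```python
import numpy as np

def normalized_correlation(a, b):
    """Zero-mean normalized correlation of two 1-D motion signals."""
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    return float(np.mean(a * b))

rng = np.random.default_rng(1)
T = 120                                   # number of frames

# Target's global head motion, e.g., mean optical-flow magnitude per frame.
target_motion = rng.normal(size=T).cumsum()

# Synthetic per-region motion signals in the observer's video; region 3
# secretly follows the target's motion plus noise.
regions = {i: rng.normal(size=T).cumsum() for i in range(8)}
regions[3] = target_motion + 0.3 * rng.normal(size=T)

scores = {i: normalized_correlation(target_motion, m) for i, m in regions.items()}
best = max(scores, key=scores.get)
print(f"region {best} most likely contains the target (score={scores[best]:.2f})")
```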


human factors in computing systems | 2017

EgoScanning: Quickly Scanning First-Person Videos with Egocentric Elastic Timelines

Keita Higuchi; Ryo Yonetani; Yoichi Sato

This work presents EgoScanning, a novel video fast-forwarding interface that helps users find important events in lengthy first-person videos continuously recorded with wearable cameras. The interface features an elastic timeline that adaptively changes playback speeds and emphasizes egocentric cues specific to first-person videos, such as hand manipulations, moving, and conversations with people, based on computer-vision techniques. The interface also allows users to specify which of these cues are relevant to the events they are looking for. Through our user study, we confirm that users can quickly find events of interest in first-person videos thanks to the following benefits of the EgoScanning interface: 1) adaptive changes of playback speed let users watch fast-forwarded videos more easily; 2) emphasized parts of videos act as candidates for events that are actually significant to users; and 3) users can select the relevant egocentric cues depending on the events of interest.
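
The elastic-timeline behaviour can be summarized in a few lines: frames whose selected cue scores are high are played near normal speed, and the rest are fast-forwarded. The cue scores, speed range, and smoothing window in this sketch are illustrative assumptions rather than the interface's actual parameters.

```python
import numpy as np

def elastic_speeds(cue_scores, slow=1.0, fast=16.0, window=15):
    """Map per-frame cue scores in [0, 1] to per-frame playback speeds."""
    scores = np.asarray(cue_scores, dtype=float)
    # Smooth the scores so playback speed does not flicker frame to frame.
    kernel = np.ones(window) / window
    smoothed = np.convolve(scores, kernel, mode="same")
    # High scores -> near normal-speed playback, low scores -> fast-forward.
    return fast - (fast - slow) * np.clip(smoothed, 0.0, 1.0)

# Example: a hand-manipulation cue that fires in the middle of the clip.
scores = np.zeros(300)
scores[120:180] = 1.0
speeds = elastic_speeds(scores)
print(speeds[:5], speeds[150])   # ~16x at the start, ~1x during the event
```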


computer vision and pattern recognition | 2016

Discovering Objects of Joint Attention via First-Person Sensing

Hiroshi Kera; Ryo Yonetani; Keita Higuchi; Yoichi Sato

The goal of this work is to discover objects of joint attention, i.e., objects being viewed by multiple people wearing head-mounted cameras and eye trackers. Such objects of joint attention are expected to act as an important cue for understanding social interactions in everyday scenes. To this end, we develop a commonality-clustering method tailored to first-person videos combined with points-of-gaze sources. The proposed method uses multiscale spatiotemporal tubes around points of gaze as object candidates, making it possible to deal with objects of various sizes observed in the first-person videos. We also introduce a new dataset of multiple pairs of first-person videos and points-of-gaze data. Our experimental results show that our approach outperforms several state-of-the-art commonality-clustering methods.
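
The multiscale-tube idea could be sketched as cropping spatiotemporal boxes of several sizes around the gaze track, as below; the frame size, scales, and tube length are hypothetical values chosen only for illustration.

```python
import numpy as np

def gaze_tubes(frames, gaze_points, scales=(32, 64, 128), length=15):
    """Crop multiscale spatiotemporal tubes centred on the gaze track.

    frames: array of shape (T, H, W, 3); gaze_points: list of (x, y) per frame.
    Returns a list of (scale, tube) pairs, one set per temporal window.
    """
    T, H, W, _ = frames.shape
    tubes = []
    for t in range(0, T - length + 1, length):
        for s in scales:
            half = s // 2
            crops = []
            for dt in range(length):
                x, y = gaze_points[t + dt]
                x0 = int(np.clip(x - half, 0, W - s))
                y0 = int(np.clip(y - half, 0, H - s))
                crops.append(frames[t + dt, y0:y0 + s, x0:x0 + s])
            tubes.append((s, np.stack(crops)))
    return tubes

# Toy input: 30 random frames and a gaze track drifting across the image.
frames = np.random.rand(30, 240, 320, 3)
gaze = [(100 + t, 120) for t in range(30)]
print(len(gaze_tubes(frames, gaze)), "tubes extracted")
```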


international conference on computer vision | 2017

Privacy-Preserving Visual Learning Using Doubly Permuted Homomorphic Encryption

Ryo Yonetani; Vishnu Naresh Boddeti; Kris M. Kitani; Yoichi Sato

We propose a privacy-preserving framework for learning visual classifiers by leveraging distributed private image data. This framework is designed to aggregate multiple classifiers updated locally using private data and to ensure that no private information about the data is exposed during or after the learning procedure. We utilize a homomorphic cryptosystem that can aggregate the local classifiers while they remain encrypted and thus kept secret. To overcome the high computational cost of homomorphic encryption of high-dimensional classifiers, we (1) impose sparsity constraints on local classifier updates and (2) propose a novel efficient encryption scheme named doubly-permuted homomorphic encryption (DPHE), which is tailored to sparse high-dimensional data. DPHE (i) decomposes sparse data into its constituent non-zero values and their corresponding support indices, (ii) applies homomorphic encryption only to the non-zero values, and (iii) employs double permutations on the support indices to keep them secret. Our experimental evaluation on several public datasets shows that the proposed approach achieves performance comparable to state-of-the-art visual recognition methods while preserving privacy, and significantly outperforms other privacy-preserving methods.
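
A highly simplified sketch of the sparse-update idea follows: only the non-zero weights of a local update are passed to (placeholder) homomorphic encryption, while their support indices are hidden behind secret permutations. The `he_encrypt` stub stands in for a real additively homomorphic cryptosystem such as Paillier, and the permutation handling is simplified; this is not the DPHE protocol itself.

```python
import numpy as np

# Placeholder for an additively homomorphic cryptosystem (e.g., Paillier).
# In real DPHE each non-zero value would be encrypted; here the "ciphertext"
# is the plain value so the data flow stays runnable without a crypto library.
def he_encrypt(value):
    return value            # assumption: stand-in for homomorphic encryption

def sparse_decompose(w):
    """Split a sparse classifier update into non-zero values and support indices."""
    support = np.flatnonzero(w)
    return w[support], support

def permute_support(support, dim, seeds=(7, 42)):
    """Hide the support indices behind two secret permutations (simplified)."""
    hidden = support
    for seed in seeds:
        perm = np.random.default_rng(seed).permutation(dim)
        hidden = perm[hidden]
    return hidden

dim = 10
local_update = np.zeros(dim)
local_update[[1, 4, 7]] = [0.5, -0.2, 0.9]        # sparse local update

values, support = sparse_decompose(local_update)
ciphertexts = [he_encrypt(v) for v in values]      # encrypt non-zero values only
hidden_support = permute_support(support, dim)     # indices sent in permuted form
print(hidden_support, ciphertexts)
```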


european conference on computer vision | 2016

Visual Motif Discovery via First-Person Vision

Ryo Yonetani; Kris M. Kitani; Yoichi Sato

Visual motifs are images of visual experiences that are significant and shared across many people, such as an image of an informative sign viewed by many people or of a familiar social situation such as interacting with a clerk at a store. The goal of this study is to discover visual motifs from a collection of first-person videos recorded by a wearable camera. To achieve this goal, we develop a commonality-clustering method that leverages three important aspects: inter-video similarity, intra-video sparseness, and people's visual attention. The problem is posed as normalized spectral clustering and is solved efficiently using a weighted covariance matrix. Experimental results suggest the effectiveness of our method over several state-of-the-art methods in terms of both the accuracy and efficiency of visual motif discovery.
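
The clustering step can be illustrated with off-the-shelf normalized spectral clustering on a precomputed affinity matrix built from segment-to-segment similarity. The random placeholder features, the RBF affinity, and the number of clusters are assumptions; the paper's weighted-covariance formulation and attention weighting are not reproduced here.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)

# Placeholder appearance features for video segments drawn from two "motifs"
# plus background clutter.
motif_a = rng.normal(0.0, 0.3, size=(20, 16))
motif_b = rng.normal(3.0, 0.3, size=(20, 16))
clutter = rng.normal(size=(20, 16)) * 3.0
features = np.vstack([motif_a, motif_b, clutter])

# Inter-segment affinity; the paper additionally accounts for visual attention
# and intra-video sparseness when weighting such similarities.
affinity = rbf_kernel(features, gamma=0.5)

labels = SpectralClustering(n_clusters=3, affinity="precomputed",
                            random_state=0).fit_predict(affinity)
print(labels)
```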


human factors in computing systems | 2018

Dynamic Object Scanning: Object-Based Elastic Timeline for Quickly Browsing First-Person Videos

Seita Kayukawa; Keita Higuchi; Ryo Yonetani; Masanori Nakamura; Yoichi Sato; Shigeo Morishima

This work presents Dynamic Object Scanning (DO-Scanning), a novel interface that helps users quickly browse long, untrimmed first-person videos. The interface offers users a small set of object cues, generated automatically and tailored to the context of a given video. Users choose which cue to highlight, and the interface in turn adaptively fast-forwards the video while playing scenes containing the highlighted cue at the original speed. Our experimental results reveal that DO-Scanning dynamically arranges an efficient and compact set of cues, and that this set is useful for browsing a diverse range of first-person videos.


Proceedings of the Internet of Accessible Things | 2018

Exploring the Role of Tunnel Vision Simulation in the Design Cycle of Accessible Interfaces

Rie Kamikubo; Keita Higuchi; Ryo Yonetani; Hideki Koike; Yoichi Sato

Despite the emphasis on involving users with disabilities in the development of accessible interfaces, user trials come with high costs and effort. Particularly given the diverse range of abilities, as in the case of low vision, simulating the effect of an impairment on interaction with an interface has been explored as an alternative. As a starting point for assessing the role of simulation in the design cycle, this research focuses on allowing sighted individuals to experience an interface under tunnel vision using gaze-contingent simulation. We investigated the reliability of this approach by comparing empirical tests of prototypes between participants under simulation and the intended user groups. We found that the simulation-based approach enables developers not only to examine problems in interfaces but also to be exposed to user feedback from simulated user trials with the necessary evaluation measures. We discuss how the approach can complement the accessibility qualities associated with user involvement at different development phases.
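
A gaze-contingent tunnel-vision simulation can be approximated by masking everything outside a small radius around the current gaze point, as sketched below. The radius, frame size, and hard-edged circular mask are simplifying assumptions; a real simulator would track gaze in real time and typically blur or fade the periphery rather than blacking it out.

```python
import numpy as np

def tunnel_vision(frame, gaze_xy, radius=60):
    """Black out everything outside a circular window around the gaze point."""
    h, w = frame.shape[:2]
    ys, xs = np.ogrid[:h, :w]
    gx, gy = gaze_xy
    mask = (xs - gx) ** 2 + (ys - gy) ** 2 <= radius ** 2
    out = np.zeros_like(frame)
    out[mask] = frame[mask]
    return out

# Toy frame and a gaze sample; in practice the gaze comes from an eye tracker
# and the masked frame is re-rendered on every screen refresh.
frame = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
masked = tunnel_vision(frame, gaze_xy=(320, 240))
print(masked.shape)
```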


international conference on computer graphics and interactive techniques | 2017

EgoScanning: quickly scanning first-person videos with egocentric elastic timelines

Keita Higuchi; Ryo Yonetani; Yoichi Sato

This work presents EgoScanning, a video fast-forwarding interface that helps users find important events from lengthy first-person videos continuously recorded with wearable cameras. This interface features an elastic timeline that adaptively changes playback speeds and emphasizes egocentric cues specific to first-person videos, such as hand manipulations, moving, and conversations with people, on the basis of computer-vision techniques. The interface also allows users to input which of such cues are relevant to events of their interest. Through our user study, we confirm that users can find events of interest quickly from first-person videos thanks to benefits of using the EgoScanning interface.

Collaboration


Dive into Ryo Yonetani's collaborations.

Top Co-Authors

Kris M. Kitani

Carnegie Mellon University

Hideki Koike

Tokyo Institute of Technology
